Empirical Inference Conference Paper 2007

Entire Regularization Paths for Graph Data

Graph data such as chemical compounds and XML documents are becoming increasingly common in many application domains. A major difficulty in processing graph data lies in the intrinsic high dimensionality of graphs: when a graph is represented as a binary feature vector indicating the presence or absence of every possible subgraph pattern, the dimensionality becomes too large for conventional statistical methods. We propose an efficient method to select a small number of salient patterns by regularization path tracking. The generation of useless patterns is minimized by progressively extending the search space. Experiments show that our technique is considerably more efficient than a simpler approach based on frequent substructure mining.
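To illustrate what a regularization path over binary pattern-indicator features looks like, here is a minimal Python sketch. It is an assumption, not the paper's algorithm. It approximates the L1-regularized least-squares (lasso) path by pathwise coordinate descent on a decreasing grid of λ values, using a small feature matrix that is fully enumerated up front. Rows are graphs and columns are subgraph patterns. It does not perform exact path tracking or any search over the pattern space. The helper names `soft_threshold` and `lasso_path` and the toy data are hypothetical. The sketch does show why the path starts sparse: a pattern's weight stays at zero while its correlation |x_jᵀ r| with the current residual is at most λ, so λ_max = max_j |x_jᵀ y| yields the all-zero solution.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_path(X, y, n_lambdas=50, eps=1e-3, n_iter=500, tol=1e-10):
    """Approximate lasso path by pathwise coordinate descent (hypothetical helper).

    At each lambda on a decreasing log-spaced grid, minimizes
        0.5 * ||y - X w||^2 + lam * ||w||_1
    warm-started from the previous solution. No intercept is fitted.
    Rows of X are graphs, columns are binary pattern indicators.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]
    col_sq = (X ** 2).sum(axis=0)
    # For lam >= lam_max the all-zero vector satisfies |x_j^T y| <= lam for every j.
    lam_max = np.max(np.abs(X.T @ y))
    lambdas = lam_max * np.logspace(0.0, np.log10(eps), n_lambdas)

    w = np.zeros(d)
    r = y.copy()  # residual y - X w, kept up to date
    path = []
    for lam in lambdas:
        for _ in range(n_iter):
            max_delta = 0.0
            for j in range(d):
                if col_sq[j] == 0.0:
                    continue  # pattern absent from every graph: weight stays 0
                old = w[j]
                # Correlation of pattern j with the residual excluding its own term
                rho = X[:, j] @ r + col_sq[j] * old
                w[j] = soft_threshold(rho, lam) / col_sq[j]
                delta = w[j] - old
                if delta != 0.0:
                    r -= delta * X[:, j]
                    max_delta = max(max_delta, abs(delta))
            if max_delta < tol:
                break  # converged at this lambda
        path.append(w.copy())
    return lambdas, np.array(path)


if __name__ == "__main__":
    # Toy data: 4 graphs x 3 hypothetical subgraph patterns (1 = pattern present)
    X = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
    y = np.array([2.0, 2.0, -1.0, 0.5])
    lambdas, path = lasso_path(X, y, n_lambdas=10)
    for lam, w in zip(lambdas, path):
        print(f"lambda={lam:.3f} active patterns={np.flatnonzero(w).tolist()}")
```

The toy columns are orthogonal, so each pattern enters the model once λ falls below its |x_jᵀ y|, which here is 4, 1 and 0.5. The abstract summarizes an exact path tracking and a progressive pattern search, and neither is reproduced here.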

Author(s): Tsuda, K.
Book Title: ICML 2007
Journal: Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007)
Pages: 919-926
Year: 2007
Month: June
Editors: Ghahramani, Z.
Publisher: ACM Press
Bibtex Type: Conference Paper (inproceedings)
Address: New York, NY, USA
DOI: 10.1145/1273496.1273612
Event Name: 24th Annual International Conference on Machine Learning
Event Place: Corvallis, OR, USA
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTex

@inproceedings{4451,
  title = {Entire Regularization Paths for Graph Data},
  journal = {Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007)},
  booktitle = {ICML 2007},
  abstract = {Graph data such as chemical compounds and XML documents are becoming
  increasingly common in many application domains.
  A major difficulty in processing graph data
  lies in the intrinsic high dimensionality of graphs:
  when a graph is represented as a binary feature vector
  indicating the presence or absence of every possible subgraph pattern,
  the dimensionality becomes too large for conventional statistical methods.
  We propose an efficient method to select a small number of salient
  patterns by regularization path tracking.
  The generation of useless patterns is minimized by progressively extending
  the search space.
  Experiments show that our technique is considerably more
  efficient than a simpler approach based on frequent substructure mining.},
  pages = {919-926},
  editor = {Ghahramani, Z.},
  publisher = {ACM Press},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {New York, NY, USA},
  month = jun,
  year = {2007},
  slug = {4451},
  author = {Tsuda, K.},
  month_numeric = {6}
}