Empirical Inference Conference Paper 2008

Iterative Subgraph Mining for Principal Component Analysis

Graph mining methods enumerate frequent subgraphs efficiently, but these subgraphs are not necessarily good features for machine learning because they are highly correlated with one another. It therefore makes sense to apply principal component analysis to reduce the dimensionality and obtain decorrelated features. We present a novel iterative mining algorithm that captures informative patterns corresponding to the major entries of the top principal components. It repeatedly calls weighted substructure mining, with the example weights updated in each iteration. The Lanczos algorithm, a standard algorithm for eigendecomposition, is employed to update the weights. In experiments, our patterns are shown to approximate the principal components obtained by frequent mining.
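
The abstract does not spell out the algorithm, so the following is only a minimal Python/NumPy sketch of the general idea, built on our own assumptions. It runs Lanczos iterations on the centered Gram matrix of the examples, and each matrix-vector product goes through a weighted pattern score in which the current Lanczos vector plays the role of the example weights. The patterns are given as an explicit 0/1 occurrence matrix `X` instead of being mined from graphs, and the names `weighted_mining_scores` and `lanczos_pca` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def weighted_mining_scores(X, w):
    # Hypothetical stand-in for weighted substructure mining: for every
    # pattern j (column of X) return its weighted support sum_i w[i] * X[i, j].
    # A real miner would search the subgraph space for patterns with large
    # |score| instead of reading them from an explicit matrix.
    return X.T @ w


def lanczos_pca(X, k, seed=0):
    """Approximate the top principal component with k Lanczos steps on the
    centered Gram matrix K = Xc @ Xc.T (examples x examples)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)              # center each pattern feature
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    Q = [q / np.linalg.norm(q)]          # Lanczos vectors = example weights
    alphas, betas = [], []
    q_prev, beta = np.zeros(n), 0.0
    for step in range(k):
        # K q = Xc (Xc^T q); the inner product is the weighted-mining step
        v = Xc @ weighted_mining_scores(Xc, Q[-1])
        alpha = Q[-1] @ v
        v = v - alpha * Q[-1] - beta * q_prev
        for qi in Q:                     # full reorthogonalization
            v -= (qi @ v) * qi
        alphas.append(alpha)
        beta = np.linalg.norm(v)
        if step == k - 1 or beta < 1e-10:
            break
        betas.append(beta)
        q_prev = Q[-1]
        Q.append(v / beta)
    # Rayleigh-Ritz step: eigendecompose the small tridiagonal matrix T
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    evals, evecs = np.linalg.eigh(T)
    order = np.argsort(evals)[::-1]      # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    U = np.column_stack(Q) @ evecs       # Ritz vectors in example space
    # Map the top Ritz vector back to pattern space: loading is Xc^T u
    loading = weighted_mining_scores(Xc, U[:, 0])
    loading /= np.linalg.norm(loading)
    return evals, loading
```

With `k` equal to the number of examples, the leading Ritz value should agree with the largest eigenvalue of `Xc.T @ Xc`, and the patterns with the largest `|loading|` entries correspond to the "major entries" of the top principal component. Running the Lanczos iteration on the small example-by-example Gram matrix, rather than on a pattern-by-pattern covariance, reflects the assumption that the pattern space is too large to enumerate.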

Author(s): Saigo, H. and Tsuda, K.
Book Title: ICDM 2008
Journal: Proceedings of the IEEE International Conference on Data Mining (ICDM 2008)
Pages: 1007-1012
Year: 2008
Month: December
Editors: F. Giannotti, D. Gunopulos, F. Turini, C. Zaniolo, N. Ramakrishnan, X. Wu
Publisher: IEEE Computer Society
Bibtex Type: Conference Paper (inproceedings)
Address: Los Alamitos, CA, USA
DOI: 10.1109/ICDM.2008.62
Event Name: IEEE International Conference on Data Mining
Event Place: Pisa, Italy
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@inproceedings{5514,
  title = {Iterative Subgraph Mining for Principal Component Analysis},
  journal = {Proceedings of the IEEE International Conference on Data Mining (ICDM 2008)},
  booktitle = {ICDM 2008},
  abstract = {Graph mining methods enumerate frequent subgraphs
  efficiently, but these subgraphs are not necessarily good
  features for machine learning because they are highly
  correlated with one another. It therefore makes sense to
  apply principal component analysis to reduce the
  dimensionality and obtain decorrelated features. We present
  a novel iterative mining algorithm that captures informative
  patterns corresponding to the major entries of the top
  principal components. It repeatedly calls weighted
  substructure mining, with the example weights updated in
  each iteration. The Lanczos algorithm, a standard algorithm
  for eigendecomposition, is employed to update the weights.
  In experiments, our patterns are shown to approximate the
  principal components obtained by frequent mining.},
  pages = {1007-1012},
  editor = {Giannotti, F. and Gunopulos, D. and Turini, F. and Zaniolo, C. and Ramakrishnan, N. and Wu, X.},
  publisher = {IEEE Computer Society},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Los Alamitos, CA, USA},
  month = dec,
  year = {2008},
  slug = {5514},
  author = {Saigo, H. and Tsuda, K.},
  month_numeric = {12}
}