Clustering Graphs by Weighted Substructure Mining

Institute Homepage

Institute Homepage DE Sign In

Empirical Inference Conference Paper 2006

Graph data is getting increasingly popular in, e.g., bioinformatics and text processing. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraphs, the dimensionality gets too large for usual statistical methods. We propose an efficient method for learning a binomial mixture model in this feature space. Combining the $ell_1$ regularizer and the data structure called DFS code tree, the MAP estimate of non-zero parameters are computed efficiently by means of the EM algorithm. Our method is applied to the clustering of RNA graphs, and is compared favorably with graph kernels and the spectral graph distance.

Author(s):	Tsuda, K. and Kudo, T.
Book Title:	ICML 2006
Journal:	Proceedings of the 23rd International Conference on Machine Learning (ICML 2006)
Pages:	953-960
Year:	2006
Month:	June
Day:	0
Editors:	Cohen, W. W., A. Moore
Publisher:	ACM Press

Bibtex Type:	Conference Paper (inproceedings)

Address:	New York, NY, USA
DOI:	10.1145/1143844.1143964
Event Name:	23rd International Conference on Machine Learning
Event Place:	Pittsburgh, PA, USA

Digital:	0
Electronic Archiving:	grant_archive
Language:	en
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

Links:	PDF Web

BibTex

@inproceedings{4068,
  title = {Clustering Graphs by Weighted Substructure Mining},
  journal = {Proceedings of the 23rd International Conference on Machine Learning (ICML 2006)},
  booktitle = {ICML 2006},
  abstract = {Graph data is getting increasingly popular in, e.g.,
  bioinformatics and text processing.
  A main difficulty of graph data processing
  lies in the intrinsic high dimensionality of graphs, namely,
  when a graph is represented as a binary feature vector
  of indicators of all possible subgraphs,
  the dimensionality gets too large for usual statistical methods.
  We propose an efficient method for learning
  a binomial mixture model in this feature space.
  Combining the $ell_1$ regularizer and the data structure
  called DFS code tree, the MAP estimate of non-zero parameters
  are computed efficiently by means of the EM algorithm.
  Our method is applied to the clustering of RNA graphs,
  and is compared favorably with graph kernels and
  the spectral graph distance.},
  pages = {953-960},
  editors = {Cohen, W. W., A. Moore},
  publisher = {ACM Press},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {New York, NY, USA},
  month = jun,
  year = {2006},
  slug = {4068},
  author = {Tsuda, K. and Kudo, T.},
  month_numeric = {6}
}

Research

Departments

Research Groups

People

Contact

Our Institute

Our History

Career

Doctoral Programs

Training

Service Units

Central Scientific Facilities

Workshops

Campus Services

Impact

Cooperation

Partners and Initiatives