Discriminative frequent subgraph mining with optimality guarantees

Institute Homepage

Institute Homepage Sign In

Empirical Inference Article 2010

The goal of frequent subgraph mining is to detect subgraphs that frequently occur in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that we can yield a near-optimal solution using greedy feature selection. Second, our submodular quality function criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and help to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.

Author(s):	Thoma, M. and Cheng, H. and Gretton, A. and Han, J. and Kriegel, H-P. and Smola, AJ. and Song, L. and Yu, PS. and Yan, X. and Borgwardt, KM.
Journal:	Journal of Statistical Analysis and Data Mining
Volume:	3
Number (issue):	5
Pages:	302–318
Year:	2010
Month:	October
Day:	0

Bibtex Type:	Article (article)

DOI:	10.1002/sam.10084

Electronic Archiving:	grant_archive

Links:	Web

BibTex

@article{ThomaCGHKSSYYB2010,
  title = {Discriminative frequent subgraph mining with optimality guarantees},
  journal = {Journal of  Statistical Analysis and Data Mining},
  abstract = {The goal of frequent subgraph mining is to detect subgraphs that frequently occur in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that we can yield a near-optimal solution using greedy feature selection. Second, our submodular quality function criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and help to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.},
  volume = {3},
  number = {5},
  pages = {302–318},
  month = oct,
  year = {2010},
  slug = {thomacghkssyyb2010},
  author = {Thoma, M. and Cheng, H. and Gretton, A. and Han, J. and Kriegel, H-P. and Smola, AJ. and Song, L. and Yu, PS. and Yan, X. and Borgwardt, KM.},
  month_numeric = {10}
}