Empirical Inference Conference Paper 2008

Frequent Subgraph Retrieval in Geometric Graph Databases

Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric $\epsilon$-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric $\epsilon$-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.
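As a rough illustration of what tolerance against variations in geometry under rigid transformations can mean, the sketch below compares two 3D point sets with a known vertex correspondence by their pairwise distances. This criterion is an assumption chosen for simplicity, not the paper's definition of a geometric $\epsilon$-subgraph: pairwise distances are unchanged by rotations and translations, but they are also unchanged by reflections, which a rigid transformation would exclude. The names `pairwise_distances_match` and `rotate_z` and all coordinates are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances_match(p, q, eps):
    """Return True if each pairwise distance in p differs from the
    corresponding pairwise distance in q by at most eps.

    p and q are equal-length lists of 3D points; point i of p is
    assumed to correspond to point i of q (hypothetical setup).
    """
    if len(p) != len(q):
        return False
    for i, j in combinations(range(len(p)), 2):
        if abs(math.dist(p[i], p[j]) - math.dist(q[i], q[j])) > eps:
            return False
    return True


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by the given angle in radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


# Made-up coordinates for a three-vertex pattern.
pattern = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3)]

# The same pattern after a rotation followed by a translation.
shift = (2.0, -1.0, 0.5)
moved = [
    tuple(a + b for a, b in zip(rotate_z(pt, 0.7), shift))
    for pt in pattern
]

# A copy in which one vertex has been displaced by 1.0 along x.
stretched = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (1.5, 1.2, 0.3)]

same_up_to_motion = pairwise_distances_match(pattern, moved, eps=1e-9)
same_after_stretch = pairwise_distances_match(pattern, stretched, eps=0.1)
```

Here `same_up_to_motion` is true because the rotation and translation preserve all distances, while `same_after_stretch` is false because one distance changes by far more than the tolerance. A larger `eps` accepts more geometric variation, at the cost of matching less similar structures.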

Author(s): Nowozin, S. and Tsuda, K.
Book Title: ICDM 2008
Journal: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008)
Pages: 953-958
Year: 2008
Month: December
Day: 0
Editors: Giannotti, F., D. Gunopulos, F. Turini, C. Zaniolo, N. Ramakrishnan, X. Wu
Publisher: IEEE Computer Society
Bibtex Type: Conference Paper (inproceedings)
Address: Los Alamitos, CA, USA
DOI: 10.1109/ICDM.2008.38
Event Name: 8th IEEE International Conference on Data Mining
Event Place: Pisa, Italy
Digital: 0
Electronic Archiving: grant_archive
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@inproceedings{5521,
  title = {Frequent Subgraph Retrieval in Geometric Graph Databases},
  journal = {Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008)},
  booktitle = {ICDM 2008},
  abstract = {Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database. Instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric $\epsilon$-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric $\epsilon$-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.},
  pages = {953-958},
  editors = {Giannotti, F., D. Gunopulos, F. Turini, C. Zaniolo, N. Ramakrishnan, X. Wu},
  publisher = {IEEE Computer Society},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Los Alamitos, CA, USA},
  month = dec,
  year = {2008},
  slug = {5521},
  author = {Nowozin, S. and Tsuda, K.},
  month_numeric = {12}
}