Empirische Inferenz Conference Paper 2009

A Bayesian Approach to Graph Regression with Relevant Subgraph Selection

Many real-world applications with graph data require the efficient solution of a given regression task as well as the identification of the subgraphs which are relevant for the task. In these cases graphs are commonly represented as binary vectors of indicators of subgraphs, giving rise to an intractable input dimensionality. An efficient solution to this problem was recently proposed by a Lasso-type method where the objective function optimization over an intractable number of variables is reformulated as a dual mathematical programming problem over a small number of variables but a large number of constraints. The dual problem is then solved by column generation where the subgraphs corresponding to the most violated constraints are found by weighted subgraph mining. This paper proposes an extension of this method to a fully Bayesian approach which defines a prior distribution on the parameters and integrate them out from the model, thus providing a posterior distribution on the target variable as opposed to a single estimate. The advantage of this approach is that the extra information given by the target posterior distribution can be used for improving the model in several ways. In this paper, we use the target posterior variance as a measure of uncertainty in the prediction and show that, by rejecting unconfident predictions, we can improve state-of-the-art performance on several molecular graph datasets.

Author(s): Chiappa, S. and Saigo, H. and Tsuda, K.
Book Title: SIAM International Conference on Data Mining
Journal: Proceedings of the 2009 SIAM International Conference on Data Mining (SDM 2009)
Pages: 295-304
Year: 2009
Month: May
Day: 0
Editors: Park, H. , S. Parthasarathy, H. Liu
Publisher: Society for Industrial and Applied Mathematics
Bibtex Type: Conference Paper (inproceedings)
Address: Philadelphia, PA, USA
Event Name: SDM 2009
Event Place: Sparks, NV, USA
Electronic Archiving: grant_archive
Institution: Society for Industrial and Applied Mathematics
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik
Links:

BibTex

@inproceedings{5653,
  title = {A Bayesian Approach to Graph Regression with Relevant Subgraph Selection},
  journal = {Proceedings of the 2009 SIAM International Conference on Data Mining (SDM 2009)},
  booktitle = {SIAM International Conference on Data Mining},
  abstract = {Many real-world applications with graph data require
  the efficient solution of a given regression task as well as the
  identification of the subgraphs which are relevant for the task. In these cases graphs
  are commonly represented as binary vectors of indicators of subgraphs, giving rise to an intractable input dimensionality.
  An efficient solution to this problem was recently proposed by a Lasso-type
  method where the objective function optimization over an intractable
  number of variables is reformulated as a dual mathematical programming problem
  over a small number of variables but a large number of constraints. The
  dual problem is then solved by column generation where the subgraphs corresponding
  to the most violated constraints are found by weighted subgraph mining.
  This paper proposes an extension of this method to a fully Bayesian approach which
  defines a prior distribution on the parameters and integrate them out from the model, thus providing a posterior distribution on the target variable as
  opposed to a single estimate. The advantage of this approach is that
  the extra information given by the target posterior distribution can be used for improving
  the model in several ways. In this paper, we use the target posterior variance as a measure of uncertainty in the
  prediction and show that, by rejecting unconfident predictions, we can improve state-of-the-art
  performance on several molecular graph datasets.},
  pages = {295-304},
  editors = {Park, H. , S. Parthasarathy, H. Liu},
  publisher = {Society for Industrial and Applied Mathematics},
  organization = {Max-Planck-Gesellschaft},
  institution = {Society for Industrial and Applied Mathematics},
  school = {Biologische Kybernetik},
  address = {Philadelphia, PA, USA},
  month = may,
  year = {2009},
  slug = {5653},
  author = {Chiappa, S. and Saigo, H. and Tsuda, K.},
  month_numeric = {5}
}