Empirical Inference Conference Paper 2006

Supervised Probabilistic Principal Component Analysis

Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g., in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e., in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S²PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e., in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S²PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.
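
The abstract does not spell out the model equations, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm: inputs x and outputs y are assumed to share a latent vector z, with x = Wx z + mu_x + noise and y = Wy z + mu_y + noise, each with its own isotropic noise variance, fitted by a textbook-style EM loop. The function names `fit_shared_latent` and `project`, the dictionary keys, and all defaults are hypothetical and chosen for illustration.

```python
import numpy as np

def fit_shared_latent(X, Y, k, n_iter=100, seed=0):
    """EM for an ASSUMED shared-latent Gaussian model (illustrative only).

    x = Wx z + mu_x + eps_x,  eps_x ~ N(0, sx2 * I)
    y = Wy z + mu_y + eps_y,  eps_y ~ N(0, sy2 * I),  z ~ N(0, I_k)
    """
    rng = np.random.default_rng(seed)
    N, dx = X.shape
    dy = Y.shape[1]
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    Wx = rng.normal(size=(dx, k))
    Wy = rng.normal(size=(dy, k))
    sx2, sy2 = 1.0, 1.0
    for _ in range(n_iter):
        # E-step: Gaussian posterior over each z_n, with shared covariance S.
        M = np.eye(k) + Wx.T @ Wx / sx2 + Wy.T @ Wy / sy2
        S = np.linalg.inv(M)
        Ez = (Xc @ Wx / sx2 + Yc @ Wy / sy2) @ S   # (N, k) posterior means
        Ezz = N * S + Ez.T @ Ez                    # sum_n E[z_n z_n^T]
        # M-step: least-squares style updates for loadings, then noise levels.
        Wx = np.linalg.solve(Ezz, Ez.T @ Xc).T
        Wy = np.linalg.solve(Ezz, Ez.T @ Yc).T
        sx2 = (np.sum(Xc ** 2) - 2 * np.sum((Xc @ Wx) * Ez)
               + np.trace(Ezz @ Wx.T @ Wx)) / (N * dx)
        sy2 = (np.sum(Yc ** 2) - 2 * np.sum((Yc @ Wy) * Ez)
               + np.trace(Ezz @ Wy.T @ Wy)) / (N * dy)
    return {"Wx": Wx, "Wy": Wy, "mu_x": mu_x, "mu_y": mu_y,
            "sx2": sx2, "sy2": sy2}

def project(model, X):
    """Posterior mean of z given x alone (no labels needed at test time)."""
    Wx, sx2 = model["Wx"], model["sx2"]
    M = np.eye(Wx.shape[1]) + Wx.T @ Wx / sx2
    return np.linalg.solve(M, Wx.T @ (X - model["mu_x"]).T / sx2).T
```

Under these assumptions the labels influence the learned loadings `Wx` during training, while `project` needs only the inputs, which is one way label information could enter the projection phase; multiple outputs are handled by giving `Y` more than one column.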

Author(s): Yu, S. and Yu, K. and Tresp, V. and Kriegel, H-P. and Wu, M.
Book Title: KDD 2006
Journal: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006)
Pages: 464-473
Year: 2006
Month: August
Editors: Ungar, L.
Publisher: ACM Press
Bibtex Type: Conference Paper (inproceedings)
Address: New York, NY, USA
DOI: 10.1145/1150402.1150454
Event Name: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Event Place: Philadelphia, PA, USA
Digital: 0
Electronic Archiving: grant_archive
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@inproceedings{4069,
  title = {Supervised Probabilistic Principal Component Analysis},
  journal = {Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006)},
  booktitle = {KDD 2006},
  abstract = {Principal component analysis (PCA) has been extensively applied in
  data mining, pattern recognition and information retrieval for
  unsupervised dimensionality reduction. When labels of data are
  available, e.g.,~in a classification or regression task, PCA is
  however not able to use this information. The problem is more
  interesting if only part of the input data are labeled, i.e.,~in a
  semi-supervised setting. In this paper we propose a supervised PCA
  model called SPPCA and a semi-supervised PCA model called S$^2$PPCA,
  both of which are extensions of a probabilistic PCA model. The
  proposed models are able to incorporate the label information into
  the projection phase, and can naturally handle multiple outputs
  (i.e.,~in multi-task learning problems). We derive an efficient EM
  learning algorithm for both models, and also provide theoretical
  justifications of the model behaviors. SPPCA and S$^2$PPCA are
  compared with other supervised projection methods on various
  learning tasks, and show not only promising performance but also
  good scalability.},
  pages = {464--473},
  editor = {Ungar, L.},
  publisher = {ACM Press},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {New York, NY, USA},
  month = aug,
  year = {2006},
  slug = {4069},
  author = {Yu, S. and Yu, K. and Tresp, V. and Kriegel, H-P. and Wu, M.},
  month_numeric = {8}
}