Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g.,~in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e.,~in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S$^2$PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e.,~in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S$^2$PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.
Author(s): | Yu, S. and Yu, K. and Tresp, V. and Kriegel, H-P. and Wu, M. |
Book Title: | KDD 2006 |
Journal: | Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006) |
Pages: | 464-473 |
Year: | 2006 |
Month: | August |
Day: | 0 |
Editors: | Ungar, L. |
Publisher: | ACM Press |
Bibtex Type: | Conference Paper (inproceedings) |
Address: | New York, NY, USA |
DOI: | 10.1145/1150402.1150454 |
Event Name: | 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Event Place: | Philadelphia, PA, USA |
Digital: | 0 |
Electronic Archiving: | grant_archive |
Language: | en |
Organization: | Max-Planck-Gesellschaft |
School: | Biologische Kybernetik |
Links: |
BibTex
@inproceedings{4069, title = {Supervised Probabilistic Principal Component Analysis}, journal = {Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006)}, booktitle = {KDD 2006}, abstract = {Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g.,~in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e.,~in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S$^2$PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e.,~in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S$^2$PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.}, pages = {464-473}, editors = {Ungar, L. }, publisher = {ACM Press}, organization = {Max-Planck-Gesellschaft}, school = {Biologische Kybernetik}, address = {New York, NY, USA}, month = aug, year = {2006}, slug = {4069}, author = {Yu, S. and Yu, K. and Tresp, V. and Kriegel, H-P. and Wu, M.}, month_numeric = {8} }