The maximum entropy (MaxEnt) framework has been studied extensively in supervised learning. Here, the goal is to find a distribution p that maximizes an entropy function while enforcing data constraints, so that the expected values of some (pre-defined) features with respect to p approximately match their empirical counterparts. Using different entropy measures, different model spaces for p, and different approximation criteria for the data constraints yields a family of discriminative supervised learning methods (e.g., logistic regression, conditional random fields, least squares, and boosting). This framework is known as the generalized maximum entropy framework. Semi-supervised learning (SSL) has emerged in the last decade as a promising field that combines unlabeled data with labeled data to increase the accuracy and robustness of inference algorithms. However, most SSL algorithms to date have had trade-offs, e.g., in terms of scalability or applicability to multi-categorical data. We extend the generalized MaxEnt framework to develop a family of novel SSL algorithms. Extensive empirical evaluation on benchmark data sets that are widely used in the literature demonstrates the validity and competitiveness of the proposed algorithms.
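To make "matching expected feature values approximately" concrete, here is a minimal sketch of one well-known supervised instance, not the authors' semi-supervised method. Assuming Shannon (conditional) entropy, a conditional exponential-family model, and a squared L2 penalty on constraint violations, the MaxEnt problem is dual to L2-regularized logistic regression. The sketch uses binary labels, features phi_j(x, y) = y * x_j, and made-up toy data; the function names (`fit_l2_logreg`, `moment_gap`) and the values of `lam`, `lr`, and `steps` are hypothetical choices for illustration. At the fitted optimum, the empirical feature expectation minus the model expectation equals `lam * w`, so the constraints hold only approximately.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, x):
    # Model probability p(y = 1 | x) under weights w.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def moment_gap(X, y, w):
    # For each feature phi_j(x, y) = y * x_j: empirical expectation minus the
    # expectation under the model p(y | x), summed over the training examples.
    d = len(X[0])
    p = [predict(w, x) for x in X]
    return [
        sum((yi - pi) * x[j] for x, yi, pi in zip(X, y, p))
        for j in range(d)
    ]


def fit_l2_logreg(X, y, lam=0.1, lr=0.1, steps=5000):
    # Gradient ascent on sum_i log p(y_i | x_i) - (lam / 2) * ||w||^2.
    # The gradient is exactly moment_gap(X, y, w) - lam * w.
    w = [0.0] * len(X[0])
    for _ in range(steps):
        gap = moment_gap(X, y, w)
        w = [wj + lr * (gj - lam * wj) for wj, gj in zip(w, gap)]
    return w


# Hypothetical toy data: a constant bias feature plus one non-separable input.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.3], [1.0, 1.2], [1.0, -1.5]]
y = [1, 0, 1, 0, 0, 1]
lam = 0.1

w = fit_l2_logreg(X, y, lam=lam)
gap = moment_gap(X, y, w)

# At the optimum the data constraints hold only approximately:
# empirical - model expectation = lam * w (shrinking toward exact
# matching as lam -> 0 for non-separable data like this).
for j, (g, wj) in enumerate(zip(gap, w)):
    print(f"feature {j}: gap = {g:+.6f}, lam * w = {lam * wj:+.6f}")
```

Swapping the entropy measure, the model space, or the penalty on the gap (for example, an L1 instead of a squared L2 criterion) changes which discriminative method falls out of the same template. That is the sense in which the generalized framework covers a family of methods.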
Author(s): | Erkan, AN. and Altun, Y. |
Journal: | 30th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2010) |
Volume: | 30 |
Pages: | 80 |
Year: | 2010 |
Month: | July |
BibTeX Type: | Poster (poster) |
Language: | en |
Organization: | Max-Planck-Gesellschaft |
School: | Biologische Kybernetik |
BibTeX
@poster{6747,
  title = {A Maximum Entropy Approach to Semi-supervised Learning},
  journal = {30th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2010)},
  abstract = {The maximum entropy (MaxEnt) framework has been studied extensively in supervised learning. Here, the goal is to find a distribution p that maximizes an entropy function while enforcing data constraints, so that the expected values of some (pre-defined) features with respect to p approximately match their empirical counterparts. Using different entropy measures, different model spaces for p, and different approximation criteria for the data constraints yields a family of discriminative supervised learning methods (e.g., logistic regression, conditional random fields, least squares, and boosting). This framework is known as the generalized maximum entropy framework. Semi-supervised learning (SSL) has emerged in the last decade as a promising field that combines unlabeled data with labeled data to increase the accuracy and robustness of inference algorithms. However, most SSL algorithms to date have had trade-offs, e.g., in terms of scalability or applicability to multi-categorical data. We extend the generalized MaxEnt framework to develop a family of novel SSL algorithms. Extensive empirical evaluation on benchmark data sets that are widely used in the literature demonstrates the validity and competitiveness of the proposed algorithms.},
  volume = {30},
  pages = {80},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = jul,
  year = {2010},
  slug = {6747},
  author = {Erkan, AN. and Altun, Y.},
  month_numeric = {7}
}