Empirical Inference Poster 2010

A Maximum Entropy Approach to Semi-supervised Learning

The maximum entropy (MaxEnt) framework has been studied extensively in supervised learning. Here, the goal is to find a distribution p that maximizes an entropy function while enforcing data constraints, so that the expected values of some (pre-defined) features with respect to p approximately match their empirical counterparts. Using different entropy measures, different model spaces for p, and different approximation criteria for the data constraints yields a family of discriminative supervised learning methods (e.g., logistic regression, conditional random fields, least squares, and boosting). This framework is known as the generalized maximum entropy framework.

Semi-supervised learning (SSL) has emerged in the last decade as a promising field that combines unlabeled data with labeled data to increase the accuracy and robustness of inference algorithms. However, most SSL algorithms to date have involved trade-offs, e.g., in scalability or applicability to multi-categorical data. We extend the generalized MaxEnt framework to develop a family of novel SSL algorithms. Extensive empirical evaluation on benchmark data sets widely used in the literature demonstrates the validity and competitiveness of the proposed algorithms.
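To make the abstract's setup concrete, the following is a minimal toy sketch of the MaxEnt principle with Shannon entropy and a single exactly-matched feature constraint: over a finite domain, the entropy maximizer subject to E_p[f] = target has the exponential form p_i ∝ exp(λ·f_i), and λ can be found by bisection. The feature values, target, and solver here are our illustration, not the poster's actual algorithms, which vary the entropy measure and relax the constraints.

```python
# Toy MaxEnt: maximize Shannon entropy over 4 outcomes subject to
# E_p[f] = target. The maximizer is p_i ∝ exp(lam * f_i); since the
# constrained mean is increasing in lam, bisection finds the multiplier.
import math

f = [0.0, 1.0, 2.0, 3.0]   # feature value of each outcome (illustrative)
target = 1.2               # empirical feature mean to match (illustrative)

def dist(lam):
    """Exponential-family distribution induced by multiplier lam."""
    w = [math.exp(lam * fi) for fi in f]
    z = sum(w)
    return [wi / z for wi in w]

def mean(lam):
    """Expected feature value under dist(lam)."""
    return sum(pi * fi for pi, fi in zip(dist(lam), f))

# mean(lam) is monotonically increasing in lam, so bisect on it.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if mean(mid) < target:
        lo = mid
    else:
        hi = mid

p = dist((lo + hi) / 2)
print(p)  # MaxEnt distribution satisfying the feature constraint
```

Swapping the entropy function and the way constraints are approximated (exact, boxed, or norm-penalized) is what, per the abstract, recovers different discriminative methods such as logistic regression or CRFs.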

Author(s): Erkan, AN. and Altun, Y.
Journal: 30th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2010)
Volume: 30
Pages: 80
Year: 2010
Month: July
Day: 0
Bibtex Type: Poster (poster)
Digital: 0
Electronic Archiving: grant_archive
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

BibTeX

@poster{6747,
  title = {A Maximum Entropy Approach to Semi-supervised Learning},
  journal = {30th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2010)},
  abstract = {The maximum entropy (MaxEnt) framework has been studied extensively in supervised
  learning. Here, the goal is to find a distribution p that maximizes an entropy function
  while enforcing data constraints, so that the expected values of some (pre-defined) features
  with respect to p approximately match their empirical counterparts. Using different
  entropy measures, different model spaces for p, and different approximation criteria
  for the data constraints yields a family of discriminative supervised learning methods
  (e.g., logistic regression, conditional random fields, least squares, and boosting). This
  framework is known as the generalized maximum entropy framework.
  Semi-supervised learning (SSL) has emerged in the last decade as a promising field
  that combines unlabeled data with labeled data to increase the accuracy and
  robustness of inference algorithms. However, most SSL algorithms to date have involved
  trade-offs, e.g., in scalability or applicability to multi-categorical data. We
  extend the generalized MaxEnt framework to develop a family of novel SSL algorithms.
  Extensive empirical evaluation on benchmark data sets widely used in
  the literature demonstrates the validity and competitiveness of the proposed algorithms.},
  volume = {30},
  pages = {80},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = jul,
  year = {2010},
  slug = {6747},
  author = {Erkan, AN. and Altun, Y.},
  month_numeric = {7}
}