Correcting Sample Selection Bias by Unlabeled Data


Empirical Inference Conference Paper 2007

Jiayuan Huang

Doctoral Researcher

We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
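The idea of matching the two samples in feature space can be illustrated with a small sketch. It assumes a Gaussian RBF kernel and picks non-negative weights for the training points so that their weighted mean in feature space approaches the mean of the test points. The function names, the kernel width `sigma`, the upper bound `bound`, and the plain projected-gradient solver are all illustrative assumptions; this page does not state the optimization procedure used in the paper, and labels are not needed at any point.

```python
import math


def rbf(a, b, sigma):
    """Gaussian RBF kernel between two feature vectors (assumed kernel choice)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd_objective(beta, K, kappa, n, m):
    """Squared feature-space distance between the weighted training mean and
    the test mean, up to an additive constant that does not depend on beta."""
    quad = sum(beta[i] * K[i][j] * beta[j] for i in range(n) for j in range(n))
    lin = sum(beta[i] * kappa[i] for i in range(n))
    return quad / n ** 2 - 2.0 * lin / (n * m)


def mean_matching_weights(train, test, sigma=0.5, bound=10.0, steps=500):
    """Hypothetical helper: projected gradient descent on mmd_objective
    with box constraints 0 <= beta_i <= bound, starting from uniform weights."""
    n, m = len(train), len(test)
    K = [[rbf(a, b, sigma) for b in train] for a in train]
    kappa = [sum(rbf(a, z, sigma) for z in test) for a in train]
    beta = [1.0] * n
    # The Hessian is 2K/n^2 and K has trace n, so its largest eigenvalue is
    # at most 2/n; a step of n/2 therefore never increases the objective.
    lr = n / 2.0
    for _ in range(steps):
        grad = [
            2.0 * sum(K[i][j] * beta[j] for j in range(n)) / n ** 2
            - 2.0 * kappa[i] / (n * m)
            for i in range(n)
        ]
        beta = [min(bound, max(0.0, b - lr * g)) for b, g in zip(beta, grad)]
    return beta, K, kappa


# Toy data: training inputs cover [0, 3], test inputs concentrate on [2, 3].
train = [(0.0,), (0.5,), (1.0,), (1.5,), (2.0,), (2.5,), (3.0,)]
test = [(2.0,), (2.2,), (2.5,), (2.8,), (3.0,)]
weights, K, kappa = mean_matching_weights(train, test)
for x, w in zip(train, weights):
    print(f"x = {x[0]:.1f}  weight = {w:.3f}")
```

In this toy setup the training points that lie inside the test region receive larger weights than those far from it. The resulting weights could then be passed as per-sample weights to any learner that accepts them.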

Author(s):	Huang, J. and Smola, A. and Gretton, A. and Borgwardt, KM. and Schölkopf, B.
Book Title:	Advances in Neural Information Processing Systems 19
Journal:	Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference
Pages:	601-608
Year:	2007
Month:	September
Editors:	B Schölkopf and J Platt and T Hofmann
Publisher:	MIT Press

Bibtex Type:	Conference Paper (inproceedings)

Address:	Cambridge, MA, USA
Event Name:	20th Annual Conference on Neural Information Processing Systems (NIPS 2006)
Event Place:	Vancouver, BC, Canada

Digital:	0
Electronic Archiving:	grant_archive
ISBN:	0-262-19568-2
Language:	en
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik


BibTex

@inproceedings{4194,
  title = {Correcting Sample Selection Bias by Unlabeled Data},
  journal = {Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference},
  booktitle = {Advances in Neural Information Processing Systems 19},
  abstract = {We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.},
  pages = {601--608},
  editor = {B Sch{\"o}lkopf and J Platt and T Hofmann},
  publisher = {MIT Press},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Cambridge, MA, USA},
  month = sep,
  year = {2007},
  slug = {4194},
  author = {Huang, J. and Smola, A. and Gretton, A. and Borgwardt, KM. and Sch{\"o}lkopf, B.},
  month_numeric = {9}
}
