Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.
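The paper's central trade-off is easy to probe numerically. Below is a minimal Monte Carlo sketch (not the authors' code): a fixed label budget is spent either on one noisy label per sample or on a k-label majority vote over proportionally fewer samples, and we estimate how often each strategy ranks the better classifier first. All parameters (budget, classifier accuracies, annotator noise rate) are illustrative assumptions, and the two classifiers' errors are modeled as independent.

```python
# Minimal Monte Carlo sketch of the quantity-vs-quality trade-off.
# NOT the authors' code: every parameter below is an illustrative assumption,
# and the two classifiers' errors are modeled as independent.
import numpy as np

rng = np.random.default_rng(0)

BUDGET = 3_000      # total noisy labels we can afford (assumption)
ACC = (0.82, 0.78)  # true accuracies of classifiers 1 and 2 (assumption)
NOISE = 0.15        # chance a single annotator label is flipped (assumption)
TRIALS = 5_000      # Monte Carlo repetitions


def p_correct_ranking(k: int) -> float:
    """Estimate P(classifier 1 looks better) when each of n = BUDGET // k
    samples receives k noisy labels, aggregated by majority vote (k odd)."""
    n = BUDGET // k
    wins = 0
    for _ in range(TRIALS):
        y = rng.integers(0, 2, size=n)                      # true labels
        pred1 = np.where(rng.random(n) < ACC[0], y, 1 - y)  # classifier 1
        pred2 = np.where(rng.random(n) < ACC[1], y, 1 - y)  # classifier 2
        flips = rng.random((n, k)) < NOISE                  # annotator errors
        votes = np.where(flips, 1 - y[:, None], y[:, None])
        y_hat = (2 * votes.sum(axis=1) > k).astype(int)     # majority vote
        wins += (pred1 == y_hat).mean() > (pred2 == y_hat).mean()
    return wins / TRIALS


for k in (1, 3, 5):
    print(f"k = {k}: P(correct ranking) ≈ {p_correct_ranking(k):.3f}")
```

Under these assumptions, k = 1 should come out ahead of k = 3 and k = 5, in line with the theorem: the extra samples are worth more for ranking the classifiers than the cleaner aggregated labels.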
Author(s): Dorner, Florian E. and Hardt, Moritz
Book Title: Proceedings of the 41st International Conference on Machine Learning (ICML 2024)
Year: 2024
Month: July
Publisher: PMLR
BibTeX Type: Conference Paper (inproceedings)
Event Name: The Forty-First International Conference on Machine Learning (ICML)
State: Published
URL: https://proceedings.mlr.press/v235/dorner24a.html
BibTeX
@inproceedings{dorner2024dontlabel,
  title = {Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning (ICML 2024)},
  abstract = {We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cram\'er's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.},
  publisher = {PMLR},
  month = jul,
  year = {2024},
  slug = {dorner2024dontlabel},
  author = {Dorner, Florian E. and Hardt, Moritz},
  url = {https://proceedings.mlr.press/v235/dorner24a.html},
  month_numeric = {7}
}