Social Foundations of Computation

Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks

BenchBench
BenchBench is a Python package that makes it easy for practitioners to evaluate the diversity and stability of multi-task benchmarks.
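This page does not document BenchBench's API, so the sketch below does not use it. It is a minimal, hypothetical illustration of the two quantities named above, built on two assumptions that may differ from the measures defined in the paper or implemented in the package.

* **Diversity** is taken to be disagreement between the per-task rankings of models, computed as 1 minus Kendall's coefficient of concordance W.
* **Stability** is taken to be whether the aggregate ranking of the existing models survives the addition of one more model.

The aggregation rule is average rank across tasks, a simple Borda-style choice. All function names (`ranks_per_task`, `kendall_w`, `diversity`, `mean_rank_order`, `order_preserved`) and the toy scores are illustrative only.

```python
import numpy as np


def ranks_per_task(scores: np.ndarray) -> np.ndarray:
    """Per-task ranks for a (n_models, n_tasks) score matrix; rank 1 = best.

    Ties are broken by model order (a simplifying assumption).
    """
    # Stable argsort of negated scores lists models best-first per task;
    # arg-sorting that permutation again inverts it, giving each model's rank.
    order = np.argsort(-scores, axis=0, kind="stable")
    return np.argsort(order, axis=0) + 1


def kendall_w(ranks: np.ndarray) -> float:
    """Kendall's coefficient of concordance W (no-ties formula).

    W = 1 when every task ranks the models identically; W = 0 when all
    per-model rank sums are equal.
    """
    n, m = ranks.shape  # n models (ranked items), m tasks (raters)
    rank_sums = ranks.sum(axis=1)
    mean_sum = m * (n + 1) / 2
    s = float(((rank_sums - mean_sum) ** 2).sum())
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


def diversity(scores: np.ndarray) -> float:
    """Hypothetical diversity score: 1 - W, so 0 means all tasks agree."""
    return 1.0 - kendall_w(ranks_per_task(scores))


def mean_rank_order(scores: np.ndarray) -> list[int]:
    """Aggregate ranking by average rank across tasks (Borda-style), best first."""
    mean_ranks = ranks_per_task(scores).mean(axis=1)
    return [int(i) for i in np.argsort(mean_ranks, kind="stable")]


def order_preserved(scores: np.ndarray, extra_model: np.ndarray) -> bool:
    """True if appending one model leaves the existing models' relative order unchanged."""
    before = mean_rank_order(scores)
    after = mean_rank_order(np.vstack([scores, extra_model]))
    # Drop the newly added model (highest row index) before comparing.
    n_old = scores.shape[0]
    return before == [i for i in after if i < n_old]


# Toy data (hypothetical): rows are models A and B, columns are five tasks.
scores = np.array([
    [0.9, 0.9, 0.9, 0.5, 0.5],  # A scores higher on tasks 1-3
    [0.8, 0.8, 0.8, 0.7, 0.7],  # B scores higher on tasks 4-5
])
# Hypothetical model C: last on tasks 1-3, between B and A on tasks 4-5.
extra = np.array([0.1, 0.1, 0.1, 0.6, 0.6])

print(diversity(scores))               # disagreement between the five tasks
print(mean_rank_order(scores))         # [0, 1]: A ahead of B
print(order_preserved(scores, extra))  # False: adding C flips A and B
```

In this toy example A leads on average rank until C is inserted between B and A on the two tasks that B wins. A then drops below B, even though no score for A or B changed. This shows how a rank-based aggregate can be sensitive to an irrelevant addition.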

Publications

Conference Paper. Zhang, G., Hardt, M. Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), PMLR, July 2024.