Folktexts

Folktexts offers a Python software package, together with ready-to-use natural-language question-answering datasets, for evaluating the accuracy, calibration, and fairness of LLMs on human outcome prediction tasks.

>> pip install folktexts

Folktexts provides a suite of Q&A datasets for evaluating the uncertainty, calibration, accuracy, and fairness of LLMs on individual outcome prediction tasks. Its flexible framework derives prediction tasks from survey data, translates them into natural-text prompts, extracts LLM-generated risk scores, and computes statistical properties of these risk scores by comparing them to the ground-truth outcomes.
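The four steps above (survey record, text prompt, risk score, comparison with ground truth) can be illustrated with a minimal, standard-library-only sketch. It does not use or reproduce the folktexts API: every function name, the prompt wording, and the toy numbers are assumptions made for illustration, and the answer-token log-probabilities are passed in by hand rather than queried from a model.

```python
import math

# NOTE: all names below are hypothetical helpers, not part of folktexts.

def row_to_prompt(row, question):
    """Render one survey record (a dict) as a natural-language prompt."""
    facts = "\n".join(f"- {name}: {value}" for name, value in row.items())
    return (
        "The following data describes a survey respondent.\n"
        f"{facts}\n\n"
        f"Question: {question}\n"
        "Answer (Yes or No):"
    )

def risk_score(logprob_yes, logprob_no):
    """Normalise the two answer-token log-probabilities into P(Yes)."""
    p_yes, p_no = math.exp(logprob_yes), math.exp(logprob_no)
    return p_yes / (p_yes + p_no)

def accuracy(scores, labels, threshold=0.5):
    """Fraction of records whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)

def expected_calibration_error(scores, labels, n_bins=10):
    """Weighted mean gap between mean score and positive rate,
    computed over equal-width score bins."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    ece = 0.0
    for members in bins:
        if members:
            mean_score = sum(s for s, _ in members) / len(members)
            positive_rate = sum(y for _, y in members) / len(members)
            ece += len(members) / len(scores) * abs(mean_score - positive_rate)
    return ece

# Toy example: made-up record, made-up scores, made-up outcomes.
prompt = row_to_prompt(
    {"Age": 40, "Occupation": "Nurse"},
    "Is this person's yearly income above $50,000?",
)
toy_scores = [0.9, 0.8, 0.2, 0.1]
toy_labels = [1, 1, 0, 0]
toy_accuracy = accuracy(toy_scores, toy_labels)
toy_ece = expected_calibration_error(toy_scores, toy_labels)
```

Applying the same metrics separately to each demographic subgroup is one simple way to compare them across groups, which is the kind of comparison a fairness evaluation builds on.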


Members

André Cruz, Doctoral Researcher (Social Foundations of Computation)

Moritz Hardt, Director (Social Foundations of Computation)

Celestine Mendler-Dünner, Hector Endowed Fellow of the ELLIS Institute (Algorithms and Society)

Publications

Cruz, A. F., Hardt, M., Mendler-Dünner, C. Evaluating Language Models as Risk Scores. Conference paper. Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (published).