Social Foundations of Computation
Folktexts encodes rows of tabular data as prompts, runs inference with a language model, and decodes a risk score from the model's output. This makes it possible to apply LLMs to tabular data in much the same way as a scikit-learn classifier.
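The three steps described above (encode, infer, decode) can be sketched in a few lines of Python. This is only an illustration of the idea and not the folktexts API: the names encode_row, toy_model, decode_risk_score and predict_proba, the prompt template, and the example features are all hypothetical, and toy_model is a stub standing in for a real language model.

```python
import math


def encode_row(row, question):
    """Turn one tabular row into a text prompt (hypothetical template)."""
    facts = "\n".join(f"- {name}: {value}" for name, value in row.items())
    return (
        "The following describes a person:\n"
        f"{facts}\n\n"
        f"Question: {question}\n"
        "Answer (Yes or No):"
    )


def toy_model(prompt):
    """Stub for a language model (assumption: a real model would be queried here).

    Returns log-probabilities for the two candidate answer tokens.
    """
    p_yes = 0.8 if "Bachelor" in prompt else 0.3
    return {"Yes": math.log(p_yes), "No": math.log(1.0 - p_yes)}


def decode_risk_score(logprobs, pos="Yes", neg="No"):
    """Normalise the probability mass on the two answer tokens to a score in [0, 1]."""
    p_pos = math.exp(logprobs[pos])
    p_neg = math.exp(logprobs[neg])
    return p_pos / (p_pos + p_neg)


def predict_proba(rows, question, model=toy_model):
    """Encode each row, query the model, and decode one risk score per row."""
    return [decode_risk_score(model(encode_row(row, question))) for row in rows]


rows = [
    {"age": 41, "education": "Bachelor's degree"},
    {"age": 23, "education": "High school diploma"},
]
scores = predict_proba(rows, "Does this person earn more than $50,000 per year?")
```

The resulting scores list plays the same role as the positive-class column of a scikit-learn predict_proba output, which is what makes the comparison to sklearn natural.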
Evaluating Language Models as Risk Scores

We created a Python package called folktexts that makes it easy to evaluate language models as risk scores. This allows us to demonstrate that instruction-tuned language models yield strongly miscalibrated risk scores.
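One common way to quantify miscalibration is the expected calibration error (ECE), which bins the scores and compares the mean score in each bin with the observed rate of positive labels. The sketch below is a generic, standard-library implementation with made-up toy data. It is not taken from folktexts, and it is an assumption that ECE is a suitable metric here.

```python
def expected_calibration_error(scores, labels, n_bins=10):
    """Weighted average gap between mean score and positive rate per equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for score, label in zip(scores, labels):
        # Scores equal to 1.0 fall into the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        bins[index].append((score, label))

    total = len(scores)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_score = sum(s for s, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_score - positive_rate)
    return ece


# Toy data (illustrative only): confident scores that do not match outcomes.
toy_scores = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
toy_labels = [1, 1, 0, 0, 0, 0, 1, 0]
toy_ece = expected_calibration_error(toy_scores, toy_labels)
```

A perfectly calibrated score has an ECE of zero. In the toy data, scores of 0.9 correspond to a positive rate of only 0.5, so the error is clearly above zero.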
Social Foundations of Computation
Algorithms and Society
Conference Paper
Evaluating Language Models as Risk Scores
Cruz, A. F., Hardt, M., Mendler-Dünner, C.
Advances in Neural Information Processing Systems 37 (NeurIPS 2024), The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), December 2024 (Published)