Empirical Inference Conference Paper 2024

What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study

Author(s): Jain, S. and Lubana, E. S. and Oksuz, K. and Joy, T. and Torr, P. H. S. and Sanyal, A. and Dokania, P. K.
Book Title: ICML 2024 Workshop on Mechanistic Interpretability (Spotlight)
Year: 2024
Month: July
BibTeX Type: Conference Paper (conference)
Event Place: Vienna, Austria
State: Published
URL: https://openreview.net/forum?id=BS2CbUkJpy

BibTeX

@conference{Jainetal24b,
  title = {What Makes Safety Fine-tuning Methods Safe? A Mechanistic Study},
  booktitle = {ICML 2024 Workshop on Mechanistic Interpretability (Spotlight)},
  month = jul,
  year = {2024},
  slug = {jainetal24b},
  author = {Jain, S. and Lubana, E. S. and Oksuz, K. and Joy, T. and Torr, P. H. S. and Sanyal, A. and Dokania, P. K.},
  url = {https://openreview.net/forum?id=BS2CbUkJpy},
  month_numeric = {7}
}