Thomasbenissan/OpenAI-Why-LLMs-Hallucinate-Implementation

OpenAI "Why LLMs Hallucinate" — Research Paper Implementation

Overview

  • Reproduction-oriented implementation of Kalai et al. (2025), centered on the mechanisms behind hallucinations in lightweight open-source LMs (e.g., GPT-2).
  • Prioritizes transparency: curated datasets, deterministic decoding, and reproducible evaluation pipelines for Is-It-Valid (IIV) vs. generation gaps, singleton-rate effects, calibration metrics, and scoring incentives.
  • Designed for hiring managers and research collaborators who want to quickly audit both the code quality and the experimental methodology.
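The scoring-incentive point can be illustrated with a small expected-score calculation. This is a sketch, not code from this repository: one common construction (assumed here) penalizes wrong answers by t/(1−t) so that answering beats abstaining only when confidence exceeds the threshold t, whereas binary right/wrong grading rewards always guessing.

```python
def expected_score(p_correct, wrong_penalty):
    """Expected grade for answering: +1 if right, -wrong_penalty if wrong.

    Abstaining scores 0, so answering pays off only when the
    expected score is positive.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Binary grading (no penalty): even a 10%-confidence guess beats abstaining.
assert expected_score(0.10, 0.0) > 0

# Penalty t/(1-t) with threshold t = 0.5 gives wrong_penalty = 1.0:
assert expected_score(0.10, 1.0) < 0   # below threshold -> abstain pays more
assert expected_score(0.75, 1.0) > 0   # above threshold -> answering pays more
```

Under the no-penalty policy the model's best strategy is to answer everything, however uncertain, which is exactly the incentive toward hallucination the paper analyzes.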

Repository Layout

why-lms-hallucinate-repro/
  README.md
  requirements.txt
  src/
    datasets.py
    lm_iface.py
    iiv.py
    generation.py
    calibration.py
    scoring.py
    utils.py
  data/
  experiments/
    01_iiv_vs_generation.ipynb
    02_singleton_rate.ipynb
    03_calibration_and_scoring.ipynb
  figures/

Setup

  1. Create and activate a Python 3.10+ environment.
  2. Install dependencies with `pip install -r requirements.txt`.
  3. The first run of the notebooks downloads the chosen Hugging Face model (default: `gpt2`).

Reproducibility Notes

  • Every script/notebook will fix RNG seeds and record decoding parameters.
  • Raw generations and metrics will be saved as JSONL/CSV under data/ or experiments/outputs/ (to be added).
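A seed-fixing helper along these lines keeps runs reproducible across the standard library, NumPy, and PyTorch when they are installed. This is a sketch, and `seed_everything` is a hypothetical name; the repository's `utils.py` may differ:

```python
import os
import random

def seed_everything(seed: int = 0) -> None:
    """Fix RNG seeds for the libraries the notebooks rely on."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:  # optional dependencies: seed them only if present
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass

seed_everything(0)
first = random.random()
seed_everything(0)
assert first == random.random()  # identical seed -> identical draw
```

Calling this at the top of each notebook, and logging the seed alongside the decoding parameters, makes any saved generation re-derivable.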

Running Experiments

  • 01_iiv_vs_generation.ipynb: build IIV datasets and compare generative error (err) against the 2 × err_iiv bound.
  • 02_singleton_rate.ipynb: sweep the singleton rate of the synthetic birthday dataset and track the hallucination rate.
  • 03_calibration_and_scoring.ipynb: estimate δ (the calibration gap), ECE (expected calibration error), and evaluate scoring policies.
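For the calibration notebook, ECE can be computed with the standard equal-width binning scheme. This pure-Python sketch (the function name is hypothetical and need not match `calibration.py`) averages |accuracy − mean confidence| over bins, weighted by bin occupancy:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: occupancy-weighted mean of |accuracy - mean confidence|."""
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; clamp confidence 1.0 into the last bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if bucket:
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            ece += (len(bucket) / n) * abs(accuracy - avg_conf)
    return ece

# Perfectly calibrated: 80% confidence on 5 answers, 4 correct -> ECE ~ 0.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))
```

The choice of `n_bins` affects the estimate; recording it with the other run parameters keeps the metric comparable across experiments.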

Planned Outputs

  • Scatter plot: generative error vs 2 × err_iiv.
  • Line/bar charts: hallucination vs singleton rate, calibration metrics, and scoring policy outcomes.
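The singleton-rate charts rest on a Good-Turing-style quantity: the fraction of observations whose value appears exactly once, which the paper uses to lower-bound the hallucination rate of calibrated models on once-seen facts. A minimal sketch (the helper name is hypothetical):

```python
from collections import Counter

def singleton_rate(observations):
    """Fraction of observations whose value occurs exactly once (Good-Turing mass)."""
    if not observations:
        return 0.0
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)

# 'b' and 'c' each occur exactly once: 2 singletons out of 6 observations.
print(singleton_rate(["a", "a", "b", "c", "d", "d"]))  # -> 0.3333...
```

Sweeping this rate in the synthetic birthday data and plotting it against the measured hallucination rate reproduces the bound's qualitative prediction.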

Limitations & Next Steps

  • Uses compact models; qualitative trends may shift for frontier LMs.
  • Relies on synthetic-data approximations; validating on real corpora is currently out of scope.
  • Future work: scale to larger checkpoints, expand reward modeling variants, and integrate human preference evaluation.
