Pretrain Experiments¶
A framework for controlled pretraining experiments with language models.
Take a language model checkpoint, continue training with targeted data interventions, and evaluate the result — all from a single YAML config. Built to support the experiments in Train Once, Answer All (ICLR 2026).
Features¶
Inject texts or tokens at precise positions in the training data
Run benchmarks and custom evaluation scripts on every checkpoint
Automatic Weights & Biases logging
YAML configs with environment variable substitution and CLI overrides
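Putting the features above together, a config might look roughly like the following. This is an illustrative sketch only: every field name here (`model`, `interventions`, `evaluation`, and so on) is hypothetical and should be checked against the actual schema in the User Guide.

```yaml
# Hypothetical config sketch -- field names are illustrative, not the real schema.
model:
  checkpoint: ${CHECKPOINT_DIR}/base-model   # environment variable substitution

training:
  steps: 1000

interventions:
  - type: inject_text                        # inject a text at a precise position
    text: "The capital of France is Paris."
    position: 500                            # training step of the injection

evaluation:
  benchmarks: [hellaswag]                    # run on every checkpoint
  scripts:
    - evals/custom_probe.py                  # custom evaluation script

logging:
  wandb_project: pretrain-experiments       # automatic W&B logging
```

Individual values in such a config would typically be overridable from the CLI (e.g. `training.steps=2000`), per the feature list above.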
Getting Started
User Guide