Installation
pretrain-experiments
Clone and install in development mode:
git clone https://github.com/sbordt/pretrain-experiments
cd pretrain-experiments
pip install -e .
Optional extras:
pip install -e ".[eval]" # thefuzz, rouge-score (for evaluation scripts)
pip install -e ".[dev]" # pytest, black, ruff (for development)
pip install -e ".[docs]" # sphinx, furo, myst-parser (for building docs)
Training framework
You need at least one training backend. Each backend must be installed from a modified fork that adds data insertion support.
OLMo-2
Used in the ICLR 2026 paper.
git clone https://github.com/sbordt/OLMo
cd OLMo
git checkout pretrain-experiments
pip install -e ".[all]"
pip install h5py
OLMo-3 (OLMo-Core)
For newer models.
git clone https://github.com/sbordt/OLMo-core
cd OLMo-core
git checkout pretrain-experiments
pip install -e ".[all]"
pip install h5py
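As a quick sanity check after installing a backend, the sketch below reports whether its Python package can be found in the active environment. The module names `olmo` and `olmo_core` are assumptions about what the forks install, and `has_python_module` is a hypothetical helper, not part of pretrain-experiments.

```shell
# Succeed if the named Python module can be located in the current environment.
# The module is only looked up, not imported.
has_python_module() {
    python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1" 2>/dev/null
}

# Assumed module names for the OLMo-2 and OLMo-Core forks.
for module in olmo olmo_core; do
    if has_python_module "$module"; then
        echo "$module: found"
    else
        echo "$module: not found"
    fi
done
```

Only one of the two needs to be found, since a single training backend is enough.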
OLMES (optional)
OLMES is the recommended tool for standard LM benchmarks (ARC, HellaSwag, PIQA, etc.). Install it in a separate virtual environment to avoid dependency conflicts:
conda create -n olmes python=3.11
conda activate olmes
pip install olmes
Then point pretrain-experiments to it:
export OLMES_EXECUTABLE=$(which olmes)
Or pass the environment name in your config (see Evaluation).
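Because OLMES lives in its own environment, `which olmes` returns nothing if that environment is not active when the variable is exported. A minimal sketch of a guard that catches this, assuming only the `OLMES_EXECUTABLE` variable described above (`check_olmes_executable` is a hypothetical helper name):

```shell
# Succeed only if OLMES_EXECUTABLE is set and names an executable file.
check_olmes_executable() {
    if [ -z "${OLMES_EXECUTABLE:-}" ]; then
        echo "OLMES_EXECUTABLE is not set" >&2
        return 1
    fi
    if [ ! -f "$OLMES_EXECUTABLE" ] || [ ! -x "$OLMES_EXECUTABLE" ]; then
        echo "OLMES_EXECUTABLE is not an executable file: $OLMES_EXECUTABLE" >&2
        return 1
    fi
    echo "Using OLMES at $OLMES_EXECUTABLE"
}
```

Calling `check_olmes_executable` before launching an experiment gives an early, readable failure instead of an error partway through evaluation.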
Environment variables
| Variable | Description | Default |
|---|---|---|
| | Base directory for experiment outputs | |
| | Path to OLMo-Private repository | |
| | Repository root (set automatically) | — |
| | OLMo repository root (set automatically if installed) | — |
| | OLMo-Core repository root (set automatically if installed) | — |
| | Path to the | — |