Installation¶

pretrain-experiments¶

Clone and install in development mode:

git clone https://github.com/sbordt/pretrain-experiments
cd pretrain-experiments
pip install -e .

Optional extras:

pip install -e ".[eval]"    # thefuzz, rouge-score (for evaluation scripts)
pip install -e ".[dev]"     # pytest, black, ruff (for development)
pip install -e ".[docs]"    # sphinx, furo, myst-parser (for building docs)

Training framework¶

You need at least one training backend. Each requires a modified fork with data insertion support.

OLMo-2¶

Used in the ICLR 2026 paper.

git clone https://github.com/sbordt/OLMo
cd OLMo
git checkout pretrain-experiments
pip install -e .[all]
pip install h5py

OLMo-3 (OLMo-Core)¶

For newer models.

git clone https://github.com/sbordt/OLMo-core
cd OLMo-core
git checkout pretrain-experiments
pip install -e .[all]
pip install h5py

OLMES (optional)¶

OLMES is the recommended tool for standard LM benchmarks (ARC, HellaSwag, PIQA, etc.). Install it in a separate virtual environment to avoid dependency conflicts:

conda create -n olmes python=3.11
conda activate olmes
pip install olmes

Then point pretrain-experiments to it:

export OLMES_EXECUTABLE=$(which olmes)

Or pass the environment name in your config (see Evaluation).

Environment variables¶

Variable	Description	Default
`EXPERIMENTS_SAVE_PATH`	Base directory for experiment outputs	`/weka/luxburg/sbordt10/single_training_run/`
`OLMO_PRIVATE_PATH`	Path to OLMo-Private repository	`/weka/luxburg/sbordt10/OLMo-Private`
`PRETRAIN_EXPERIMENTS`	Repository root (set automatically)	—
`OLMO_REPO`	OLMo repository root (set automatically if installed)	—
`OLMO_CORE_REPO`	OLMo-Core repository root (set automatically if installed)	—
`OLMES_EXECUTABLE`	Path to the `olmes` binary	—