Configuration Reference

Experiments are defined in YAML config files. Environment variables are substituted via `${VAR_NAME}` syntax.

The following variables are set automatically at startup and can be used in config files:

| Variable | Value |
| --- | --- |
| `PRETRAIN_EXPERIMENTS` | Root directory of the pretrain-experiments repository |
| `OLMO_REPO` | Root of the OLMo repository (if `olmo` is installed) |
| `OLMO_CORE_REPO` | Root of the OLMo-Core repository (if `olmo_core` is installed) |
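
As a small illustration of the substitution syntax, the fragment below points `repository_path` at the automatically set `OLMO_CORE_REPO` variable. Pairing that variable with `type: olmo_core` is an assumption made for this example, and it can only resolve if `olmo_core` is installed.

```yaml
framework:
  type: olmo_core
  # ${OLMO_CORE_REPO} is replaced with the OLMo-Core repository root
  # (assumed here to be a suitable value for repository_path)
  repository_path: ${OLMO_CORE_REPO}
```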

Minimal example

experiment: my-experiment

wandb:
  name: experiment-name
  entity: your-entity

framework:
  type: olmo                                      # olmo (OLMo-2) or olmo_core (OLMo-3)
  repository_path: ${PRETRAIN_EXPERIMENTS}/../OLMo

model:
  config: path/to/model-config.yaml
  checkpoint_base_url: https://olmo-checkpoints.org/...
  checkpoint_step: 100000

training:
  num_steps: 1000

experiments:
  seed: 0
  experiments:
    - name: my-texts
      type: add-texts-from-file                   # or add-tokens-from-file
      file: path/to/texts.jsonl

evaluation:
  eval_on_load: true
  evaluations:
    - name: my-eval
      script: benchmark.py
      args:
        task-file: path/to/tasks.jsonl

Config sections

| Section | Description |
| --- | --- |
| `experiment` | Experiment name, used for organizing output folders |
| `wandb` | Weights & Biases tracking (`name`, `entity`) |
| `framework` | Training backend: `olmo` (OLMo-2) or `olmo_core` (OLMo-3) |
| `model` | Starting checkpoint (`config`, `checkpoint_base_url`, `checkpoint_step`) |
| `training` | Training parameters (`num_steps`, optional `checkpoint_interval`) |
| `experiments` | Data interventions to apply during training |
| `evaluation` | Evaluation scripts to run on checkpoints |

`experiment`

A string used as the experiment name. Output folders and W&B runs are organized under this name.

`wandb`

| Field | Description |
| --- | --- |
| `name` | Run name displayed in the W&B dashboard |
| `entity` | W&B entity (user or team) |

`framework`

Can be specified either as a string shorthand (`framework: olmo_core`) or as an object with the following fields:

| Field | Description |
| --- | --- |
| `type` | `olmo` (OLMo-2) or `olmo_core` (OLMo-3) |
| `repository_path` | Path to the framework repository |
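
The two forms might look as follows. The `repository_path` value is copied from the minimal example, and the two YAML documents (separated by `---`) are alternatives, not meant to be combined in one file.

```yaml
# String shorthand: only the framework type is given
framework: olmo_core
---
# Object form: type plus an explicit repository path
framework:
  type: olmo
  repository_path: ${PRETRAIN_EXPERIMENTS}/../OLMo
```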

`model`

| Field | Description |
| --- | --- |
| `config` | Path to the model config file (framework-specific) |
| `checkpoint_url` or `checkpoint_base_url` | URL to download the checkpoint from |
| `checkpoint_step` | Training step of the checkpoint to load |
| `checkpoint_save_path` | Local path to cache downloaded checkpoints |
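
A `model` section that also caches the download locally could look like this. The URL is left elided as in the minimal example, and the cache directory `checkpoints/` is a hypothetical path chosen for illustration.

```yaml
model:
  config: path/to/model-config.yaml
  checkpoint_base_url: https://olmo-checkpoints.org/...
  checkpoint_step: 100000
  # Hypothetical local cache directory for downloaded checkpoints
  checkpoint_save_path: checkpoints/
```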

`training`

| Field | Description |
| --- | --- |
| `num_steps` | Number of training steps to run |
| `checkpoint_interval` | Save a checkpoint every N steps (optional) |
| `args` | Additional framework-specific training arguments |
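
For example, a run that also saves intermediate checkpoints might be configured as below. The interval of 250 is an arbitrary illustrative value, and `args` is omitted because its contents depend on the framework.

```yaml
training:
  num_steps: 1000
  # Optional: save a checkpoint every 250 steps (illustrative value)
  checkpoint_interval: 250
```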

`experiments`

| Field | Description |
| --- | --- |
| `seed` | Random seed for insertion placement |
| `experiments` | List of data intervention specs (see insertions.md) |
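
Because the inner `experiments` field is a list, several interventions can sit under one seed. The sketch below uses the `add-texts-from-file` type from the minimal example twice; the names and file paths are placeholders.

```yaml
experiments:
  seed: 0                           # random seed for insertion placement
  experiments:
    - name: texts-a                 # placeholder name
      type: add-texts-from-file
      file: path/to/texts-a.jsonl   # placeholder path
    - name: texts-b                 # placeholder name
      type: add-texts-from-file
      file: path/to/texts-b.jsonl   # placeholder path
```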

`evaluation`

| Field | Description |
| --- | --- |
| `eval_on_load` | If `true`, evaluate the checkpoint before training starts |
| `evaluations` | List of evaluation specs, each with a `script` and `args` |

CLI overrides

Any config parameter can be overridden from the command line using dot notation:

pretrain-experiments config.yaml --training.num_steps 100
pretrain-experiments config.yaml --wandb.name my-run
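
Read against the config file, each dotted flag addresses a nested key. Assuming an override simply replaces the value at that path, the two commands above would correspond to these settings:

```yaml
# --training.num_steps 100
training:
  num_steps: 100

# --wandb.name my-run
wandb:
  name: my-run
```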

Config inclusion

Configs support an `include` directive for composing a config from multiple files:

include: evaluation.yaml
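
As a sketch of how a config might be split, the main file below takes its `evaluation` section from `evaluation.yaml`. The file names and layout are hypothetical, `---` is used only to show two files in one block, and the assumption that included keys are merged into the including config is illustrative, since the exact merge behavior is not specified here.

```yaml
# config.yaml (hypothetical main config)
include: evaluation.yaml
experiment: my-experiment
training:
  num_steps: 1000
---
# evaluation.yaml (hypothetical included file)
evaluation:
  eval_on_load: true
  evaluations:
    - name: my-eval
      script: benchmark.py
      args:
        task-file: path/to/tasks.jsonl
```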

See the `config/` directory for complete examples.
