Configuration Reference
Experiments are defined in YAML config files. Environment variables are substituted via ${VAR_NAME} syntax.
The following variables are set automatically at startup and can be used in config files:
| Variable | Value |
|---|---|
| PRETRAIN_EXPERIMENTS | Root directory of the pretrain-experiments repository |
| | Root of the OLMo repository (if |
| | Root of the OLMo-Core repository (if |
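
The ${VAR_NAME} substitution described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function name `substitute_env` is hypothetical, and raising an error for an undefined variable is an assumption about the desired behavior.

```python
import os
import re

# Matches ${NAME}; a bare $NAME is deliberately left alone in this sketch.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text, env=None):
    """Replace every ${NAME} in text with env[NAME] (hypothetical helper)."""
    env = os.environ if env is None else env

    def lookup(match):
        name = match.group(1)
        if name not in env:
            # Assumption: an undefined variable is an error, not an empty string.
            raise KeyError(f"undefined variable in config: {name}")
        return env[name]

    return _VAR.sub(lookup, text)


line = "repository_path: ${PRETRAIN_EXPERIMENTS}/../OLMo"
print(substitute_env(line, {"PRETRAIN_EXPERIMENTS": "/work/pretrain-experiments"}))
# prints: repository_path: /work/pretrain-experiments/../OLMo
```

Passing an explicit `env` mapping, as in the last lines, keeps the sketch testable without touching the real process environment.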
Minimal example
experiment: my-experiment
wandb:
  name: experiment-name
  entity: your-entity
framework:
  type: olmo  # olmo (OLMo-2) or olmo_core (OLMo-3)
  repository_path: ${PRETRAIN_EXPERIMENTS}/../OLMo
model:
  config: path/to/model-config.yaml
  checkpoint_base_url: https://olmo-checkpoints.org/...
  checkpoint_step: 100000
training:
  num_steps: 1000
experiments:
  seed: 0
  experiments:
    - name: my-texts
      type: add-texts-from-file  # or add-tokens-from-file
      file: path/to/texts.jsonl
evaluation:
  eval_on_load: true
  evaluations:
    - name: my-eval
      script: benchmark.py
      args:
        task-file: path/to/tasks.jsonl
Config sections
| Section | Description |
|---|---|
| experiment | Experiment name, used for organizing output folders |
| wandb | Weights & Biases tracking |
| framework | Training backend: olmo or olmo_core |
| model | Starting checkpoint |
| training | Training parameters |
| experiments | Data interventions to apply during training |
| evaluation | Evaluation scripts to run on checkpoints |
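
A misspelled top-level key is easy to miss in a long config. The sketch below compares a parsed config against the section names documented on this page. It is an illustrative helper, not part of the tool: the name `unknown_sections` is hypothetical, and whether the tool itself rejects or ignores unknown keys is not stated here.

```python
# Top-level keys documented in this reference; `include` is the
# composition directive described under "Config inclusion".
DOCUMENTED_KEYS = {
    "experiment",
    "wandb",
    "framework",
    "model",
    "training",
    "experiments",
    "evaluation",
    "include",
}


def unknown_sections(config):
    """Return the top-level keys of config that this page does not document."""
    return sorted(set(config) - DOCUMENTED_KEYS)


config = {"experiment": "my-experiment", "trainning": {"num_steps": 1000}}
print(unknown_sections(config))
# prints: ['trainning']
```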
experiment
A string used as the experiment name. Output folders and W&B runs are organized under this name.
wandb
| Field | Description |
|---|---|
| name | Run name displayed in the W&B dashboard |
| entity | W&B entity (user or team) |
framework
Can be specified as a string shorthand (framework: olmo_core) or as an object:
| Field | Description |
|---|---|
| type | olmo (OLMo-2) or olmo_core (OLMo-3) |
| repository_path | Path to the framework repository |
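
One way to handle the two accepted forms is to normalize the string shorthand into the object form before the rest of the code reads it. The sketch below assumes the shorthand is equivalent to an object with only `type` set. The function name `normalize_framework` is hypothetical and the tool's real handling may differ.

```python
def normalize_framework(value):
    """Return the object form of a `framework` entry (hypothetical helper)."""
    if isinstance(value, str):
        # Assumption: `framework: olmo_core` means the same as `type: olmo_core`.
        return {"type": value}
    if isinstance(value, dict):
        return dict(value)  # copy so the caller's config is not mutated
    raise TypeError(f"framework must be a string or a mapping, got {type(value).__name__}")


print(normalize_framework("olmo_core"))
# prints: {'type': 'olmo_core'}
print(normalize_framework({"type": "olmo", "repository_path": "/work/OLMo"}))
# prints: {'type': 'olmo', 'repository_path': '/work/OLMo'}
```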
model
| Field | Description |
|---|---|
| config | Path to the model config file (framework-specific) |
| checkpoint_base_url | URL to download the checkpoint from |
| checkpoint_step | Training step of the checkpoint to load |
| | Local path to cache downloaded checkpoints |
training
| Field | Description |
|---|---|
| num_steps | Number of training steps to run |
| | Save a checkpoint every N steps (optional) |
| | Additional framework-specific training arguments |
experiments
| Field | Description |
|---|---|
| seed | Random seed for insertion placement |
| experiments | List of data intervention specs (see insertions.md) |
evaluation
| Field | Description |
|---|---|
| eval_on_load | If |
| evaluations | List of evaluation specs, each with a |
CLI overrides
Any config parameter can be overridden from the command line using dot notation:
pretrain-experiments config.yaml --training.num_steps 100
pretrain-experiments config.yaml --wandb.name my-run
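
To make the dot notation concrete, here is a minimal sketch of how such overrides could be applied to a parsed config. It is not the tool's implementation: the helper names are hypothetical, and the value parsing (integers and floats are converted, everything else stays a string) is a simplifying assumption.

```python
def parse_value(raw):
    """Convert a command-line string to int or float when possible."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw  # assumption: anything else is kept as a plain string


def apply_override(config, dotted_key, value):
    """Set config[a][b][c] = value for dotted_key 'a.b.c', creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


def apply_cli_overrides(config, argv):
    """Apply ['--a.b', 'value', ...] pairs to config in order."""
    if len(argv) % 2 != 0:
        raise ValueError("overrides must come in '--key value' pairs")
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected an option starting with '--', got {flag!r}")
        apply_override(config, flag[2:], parse_value(raw))
    return config


config = {"training": {"num_steps": 1000}, "wandb": {"name": "experiment-name"}}
apply_cli_overrides(config, ["--training.num_steps", "100", "--wandb.name", "my-run"])
print(config)
# prints: {'training': {'num_steps': 100}, 'wandb': {'name': 'my-run'}}
```

Splitting the key on dots and walking the nested dictionaries means a single code path covers every section, which is why any config parameter can be addressed this way.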
Config inclusion
Configs support an include directive to compose from multiple files:
include: evaluation.yaml
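
This page does not specify how an included file is combined with the including one. The sketch below shows one plausible scheme, assuming that nested sections are merged recursively and that values in the including file take precedence. The names `deep_merge`, `resolve_include`, and the `load` callback (which maps a path to an already-parsed dict) are all hypothetical.

```python
def deep_merge(base, override):
    """Return a new dict where override's values win and nested dicts merge."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_include(config, load):
    """Merge the file named by `include` underneath config.

    `load` is a caller-supplied function: path -> parsed config dict.
    """
    if "include" not in config:
        return dict(config)
    own = {k: v for k, v in config.items() if k != "include"}
    # Assumption: included files may themselves contain an `include`.
    included = resolve_include(load(config["include"]), load)
    return deep_merge(included, own)


# A dict stands in for the file system so the sketch runs on its own.
files = {"evaluation.yaml": {"evaluation": {"eval_on_load": True, "evaluations": []}}}
main = {
    "include": "evaluation.yaml",
    "experiment": "my-experiment",
    "evaluation": {"eval_on_load": False},
}
print(resolve_include(main, files.__getitem__))
# prints: {'evaluation': {'eval_on_load': False, 'evaluations': []}, 'experiment': 'my-experiment'}
```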
See the config/ directory for complete examples.