nanochat Leaderboard
The primary metric is time to GPT-2: the wall-clock time needed to outperform the GPT-2 (1.6B) CORE score on an 8×H100 GPU node. In 2019, training GPT-2 cost roughly $50,000. Thanks to advances in hardware, software, and scaling, we now achieve the same capability in ~3 hours for ~$73. The leaderboard tracks the fastest runs.
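For context, the ~$73 figure follows directly from the run time and the node price. A small sketch, assuming an 8×H100 node at roughly $24/hour (the hourly rate is an assumption, not stated above):

HOURLY_RATE_USD = 24.0    # assumed on-demand price of an 8xH100 node per hour
record_hours = 3.04       # example record time (see "How it's measured" below)
print(f"~${record_hours * HOURLY_RATE_USD:.0f}")   # -> ~$73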
Current records
How it's measured
The total_training_time reported by the training run (in seconds) is divided by 3600 to get hours. Only the actual training iterations are timed; evaluation and logging time are excluded (a sketch of this bookkeeping follows the example command below). Training is kicked off with parameters like:
OMP_NUM_THREADS=1 torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- \
    --depth=24 \
    --run=d24-jan29 \
    --model-tag=d24_jan29 \
    --device-batch-size=16 \
    --sample-every=-1 \
    --save-every=-1 \
    --core-metric-max-per-task=-1 \
    --core-metric-every=3000 \
    --target-param-data-ratio=12
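The timing rule above can be pictured as accumulating wall-clock time around the training step only. Below is a minimal illustrative sketch with hypothetical train_step and evaluate_core helpers; it is not nanochat's actual code:

import time

def train_step():
    # hypothetical stand-in for one forward/backward/optimizer update
    time.sleep(0.01)

def evaluate_core():
    # hypothetical stand-in for the CORE evaluation pass
    time.sleep(0.1)

num_steps = 10                 # illustrative; a real run takes thousands of steps
total_training_time = 0.0      # seconds of pure training work
for step in range(num_steps):
    t0 = time.time()
    train_step()
    total_training_time += time.time() - t0    # only the training step is timed
    if step > 0 and step % 3000 == 0:          # mirrors --core-metric-every=3000
        evaluate_core()                        # eval/logging time is not counted

print(f"total_training_time: {total_training_time:.1f} s ({total_training_time / 3600:.4f} h)")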
After ~3 hours, the run produces a CORE score, which must exceed the GPT-2 target of 0.256525. For example, a d24 run achieved 0.25851; its total_training_time of 10949 seconds divided by 3600 gives the record of ≈3.04 hours.
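In code, the record check and conversion are a single comparison and division. A minimal sketch using the numbers above:

GPT2_CORE_TARGET = 0.256525    # GPT-2 (1.6B) CORE score to beat
core_score = 0.25851           # CORE score reached by the d24 run above
total_training_time = 10949    # seconds, as reported by that run

assert core_score > GPT2_CORE_TARGET, "run does not beat the GPT-2 CORE target"
record_hours = total_training_time / 3600
print(f"record: {record_hours:.2f} hours")   # -> record: 3.04 hours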
Submit your run
See the GitHub repo and its contributing guide. Use Issues or Discussions to report new records.