nanochat

Train Your Own ChatGPT-Style LLM for Under $100

nanochat is the simplest experimental harness for training LLMs. Train your own GPT-2-grade model in ~3 hours on an 8×H100 GPU node for ~$73, then talk to it in a familiar ChatGPT-like web UI.

Why nanochat?

In 2019, training GPT-2 cost approximately $50,000. Thanks to seven years of advances across the stack (faster GPUs, better algorithms, and scaling insights), nanochat lets you outperform GPT-2 (1.6B) on the CORE metric in ~3 hours for ~$73. The codebase is minimal, hackable, and covers all major LLM stages: tokenization, pretraining, finetuning, evaluation, inference, and a chat UI.

Created by Andrej Karpathy, nanochat is not a framework with endless configs; it is a single cohesive pipeline you can read, modify, and run from start to finish.

Quick actions

Quick Start

Run bash runs/speedrun.sh on an 8×H100 node. In ~3 hours it trains a GPT-2-grade model you can talk to.

Get started →

Full Pipeline

Tokenization, pretraining, SFT, RL, evaluation—all in one minimal, readable codebase under 10K lines.

Explore features →

Research Ready

Scaling laws, miniseries training, CORE metric evaluation. Help beat the GPT-2 time record.

Research →

Chat UI

Serve your model with a ChatGPT-like web interface. Stories, poems, Q&A—talk to your LLM.

Chat UI →

What's included

Every stage of the LLM pipeline, from raw text to a chatty model, in one minimal codebase:

  • Tokenization: BPE tokenizer in the style of GPT-4, trainable via the included scripts (see the sketch after this list)
  • Pretraining: Base model training on FineWeb-Edu data
  • Supervised Fine-Tuning (SFT): Chat model training on SmolTalk, ARC, GSM8K
  • Reinforcement Learning: optional RL stage for alignment
  • Evaluation: CORE score (DCLM paper), bits-per-byte, task-specific evals
  • Inference: Efficient KV-cache engine, CLI and web chat
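
To make the tokenization stage concrete, here is a toy byte-pair-encoding training loop in Python. It illustrates only the merge idea; train_bpe and the tiny corpus below are made up for illustration, and this is not nanochat's tokenizer, which is a GPT-4-style BPE trained on real data by the project's own scripts.

  from collections import Counter

  def train_bpe(text: str, num_merges: int):
      """Toy BPE trainer: repeatedly merge the most frequent adjacent pair."""
      ids = list(text.encode("utf-8"))          # start from raw bytes
      merges = {}                               # (left, right) -> new token id
      next_id = 256
      for _ in range(num_merges):
          pairs = Counter(zip(ids, ids[1:]))    # count adjacent token pairs
          if not pairs:
              break
          top = max(pairs, key=pairs.get)       # most frequent pair wins
          merges[top] = next_id
          out, i = [], 0
          while i < len(ids):                   # replace every occurrence of the pair
              if i + 1 < len(ids) and (ids[i], ids[i + 1]) == top:
                  out.append(next_id)
                  i += 2
              else:
                  out.append(ids[i])
                  i += 1
          ids, next_id = out, next_id + 1
      return merges

  vocab = train_bpe("the quick brown fox jumps over the lazy dog " * 8, num_merges=12)
  print(f"learned {len(vocab)} merges")

In a real run the merges are learned once over a large corpus and then applied greedily whenever text is tokenized.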

Key stats

  • Time to GPT-2: ~3 hrs
  • Training cost: ~$73
  • Hardware: a single 8×H100 node
  • Code size: <10K lines

How it works

nanochat is a complete end-to-end pipeline. You start with raw text, train a tokenizer, pretrain the base model on FineWeb-Edu, then fine-tune it for chat using SmolTalk and other datasets. The result is a model you can serve locally and talk to via a ChatGPT-style interface—all from a minimal, readable codebase.
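
To make the chat fine-tuning step concrete, the sketch below shows one common way a conversation is flattened into a single training sequence with a loss mask over the assistant's tokens. It is schematic only: render_conversation, the <|role|> markers, and the whitespace "tokenizer" are hypothetical, and the exact chat template nanochat uses may differ.

  def render_conversation(messages, tokenize):
      """Flatten a chat into one token stream plus a loss mask (1 = train on it)."""
      ids, mask = [], []
      for m in messages:
          header = tokenize(f"<|{m['role']}|>")          # hypothetical role marker
          body = tokenize(m["content"])
          train = 1 if m["role"] == "assistant" else 0   # only learn the assistant's replies
          ids += header + body
          mask += [0] * len(header) + [train] * len(body)
      return ids, mask

  toy_tokenize = lambda s: s.split()                     # stand-in tokenizer, just for shapes
  convo = [
      {"role": "user", "content": "Why is the sky blue?"},
      {"role": "assistant", "content": "Mostly Rayleigh scattering of sunlight."},
  ]
  ids, mask = render_conversation(convo, toy_tokenize)
  print(ids)
  print(mask)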

Single GPU? It still works: gradient accumulation kicks in automatically and the run simply takes proportionally longer (see the sketch below). Running on CPU or Apple Silicon? See runcpu.sh for a minimal example.
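
A minimal PyTorch sketch of that idea (not nanochat's actual training loop; the Linear stand-in model, toy data, and accum_steps value are illustrative): the same effective batch size is reached by summing gradients over several micro-batches before each optimizer step.

  import torch

  model = torch.nn.Linear(512, 512)                 # stand-in for the real model
  opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
  accum_steps = 8                                   # micro-batches per optimizer step

  for step in range(2):
      opt.zero_grad(set_to_none=True)
      for _ in range(accum_steps):
          x = torch.randn(4, 512)                   # toy micro-batch; real runs stream tokens
          loss = torch.nn.functional.mse_loss(model(x), x)
          (loss / accum_steps).backward()           # scale so the sum matches one big batch
      opt.step()
  print("done:", loss.item())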

Who is it for?

nanochat is ideal for researchers experimenting with LLM training, students learning how language models work end-to-end, hobbyists who want to train and talk to their own model, and anyone curious about building a ChatGPT-style system on a budget. No enterprise tooling—just clean, readable Python and a single coherent pipeline.

Try nanochat

Want to chat with a nanochat model without training? Try the hosted demo—no setup required. Ask for stories, poems, or why the sky is blue.

Try nanochat online →

The full pipeline

nanochat covers every stage from raw data to a chatty model:

  1. Tokenizer training — BPE tokenizer in the GPT-4 style; train it and evaluate its compression ratio
  2. Pretraining — Base GPT model on FineWeb-Edu; CORE score evaluation
  3. SFT — Supervised fine-tuning on SmolTalk, ARC, GSM8K for chat capability
  4. RL (optional) — Reinforcement learning for alignment
  5. Inference — KV-cache engine, CLI chat, web UI (see the sketch after this list)
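
To illustrate the inference stage, here is a toy single-head attention decode loop with a key/value cache. It shows only the caching idea: each new token's key and value are computed once and appended, so past projections are never recomputed. The weight matrices and shapes are made up; this is not nanochat's engine.

  import torch

  d = 64
  Wq, Wk, Wv = (torch.randn(d, d) * 0.02 for _ in range(3))
  k_cache, v_cache = [], []                # grows by one entry per generated token

  def decode_step(x):                      # x: (d,) embedding of the newest token
      q = x @ Wq
      k_cache.append(x @ Wk)               # compute this token's key and value once...
      v_cache.append(x @ Wv)               # ...and reuse them for all later steps
      K, V = torch.stack(k_cache), torch.stack(v_cache)
      att = torch.softmax(K @ q / d ** 0.5, dim=0)
      return att @ V                       # attention output for the new token only

  for _ in range(5):
      out = decode_step(torch.randn(d))
  print("cached steps:", len(k_cache), "output shape:", tuple(out.shape))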

Training data comes from FineWeb-Edu, SmolTalk, and task-specific datasets. See the file structure for where everything lives.

Recent records

The primary metric is time to GPT-2: the wall-clock time needed to outperform GPT-2 (1.6B) on the CORE metric on an 8×H100 node.
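
As a rough illustration of how a CORE-style score is computed (following the DCLM paper's centered-accuracy idea): each task's accuracy is rescaled against its random-guessing baseline, and the centered scores are averaged. The centered() helper, task names, and numbers below are made up for the example.

  def centered(acc, baseline):
      """Rescale accuracy so random guessing maps to 0 and a perfect score to 1."""
      return (acc - baseline) / (1.0 - baseline)

  results = {                      # task: (model accuracy, chance baseline); made-up numbers
      "hellaswag": (0.35, 0.25),
      "arc_easy": (0.58, 0.25),
      "boolq": (0.63, 0.50),
  }
  core = sum(centered(a, b) for a, b in results.values()) / len(results)
  print(f"CORE-style score: {core:.3f}")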

Explore nanochat

Dive deeper into documentation, research, and resources.