nanochat Features
nanochat covers all major LLM stages in one minimal, hackable codebase—tokenization, pretraining, fine-tuning, evaluation, inference, and a chat UI. No giant configs or model factories; just clean Python and a single coherent pipeline under ~10K lines.
Tokenization
BPE tokenizer in the style of GPT-4, trainable via scripts/tok_train.py. Evaluate compression rate with scripts/tok_eval.py. The tokenizer lives in nanochat/tokenizer.py.
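To give a feel for what BPE training does, here is a minimal sketch of the core merge loop over raw UTF-8 bytes. This is illustrative only and not nanochat's actual implementation; function names are made up for the example.

```python
from collections import Counter

def most_common_pair(ids):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get)

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` BPE merges, starting from the 256 byte values."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_common_pair(ids)
        new_id = 256 + step
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges, ids
```

Repeating the loop grows the vocabulary one merge at a time; the learned `merges` table is all that is needed to encode new text.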
Pretraining (Base Model)
Train the base GPT model on FineWeb-Edu data. Scripts: scripts/base_train.py, scripts/base_eval.py. Supports CORE score evaluation, bits-per-byte, and sample generation. See training for details.
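The bits-per-byte metric normalizes cross-entropy loss by how many raw bytes each token covers, so it is comparable across tokenizers. A small sketch of the standard conversion (not nanochat's exact code):

```python
import math

def bits_per_byte(nats_per_token, total_bytes, total_tokens):
    """Convert mean cross-entropy (nats per token) to bits per byte.

    Dividing by ln(2) converts nats to bits; dividing by the average
    number of UTF-8 bytes per token removes the tokenizer's influence.
    """
    bits_per_token = nats_per_token / math.log(2)
    bytes_per_token = total_bytes / total_tokens
    return bits_per_token / bytes_per_token
```

A loss of ln(2) nats on a tokenizer averaging one byte per token is exactly 1.0 bits per byte; a better tokenizer (more bytes per token) lowers the score at the same loss.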
Supervised Fine-Tuning (SFT)
Train the chat model with scripts/chat_sft.py. Uses datasets like SmolTalk, ARC, GSM8K. Evaluation via scripts/chat_eval.py. Customize personality with synthetic data—see guides.
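SFT turns a list of role-tagged messages into one token stream framed by special tokens. A hedged sketch of that rendering step, using hypothetical token names (nanochat's real special tokens may differ):

```python
def render_chat(messages):
    """Flatten {role, content} messages into one training string.

    The <|role_start|>/<|role_end|> markers here are placeholders for
    whatever special tokens the tokenizer actually reserves.
    """
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}_start|>{m['content']}<|{m['role']}_end|>")
    return "".join(parts)
```

During training, the loss is typically masked so only the assistant spans contribute gradients.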
Reinforcement Learning
RL stage for alignment via scripts/chat_rl.py. Integrates with the chat pipeline for improved instruction following.
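A common recipe for this kind of RL stage is to sample several completions per prompt and weight their log-probabilities by a reward centered on the group mean, so better-than-average samples are reinforced. A sketch of that advantage computation (illustrative; nanochat's exact objective may differ):

```python
def group_advantages(rewards):
    """Center each completion's reward on the group mean.

    Positive advantage means the sample beat its siblings and its
    tokens should become more likely; negative means the opposite.
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]
```

With a verifiable task like GSM8K, the reward can simply be 1.0 for a correct final answer and 0.0 otherwise.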
Evaluation
- CORE score: From the DCLM paper; the primary pretraining benchmark. The target of 0.256525 corresponds to GPT-2, the model to beat.
- Bits-per-byte: Compression/loss metric for pretraining quality.
- Task-specific: ARC (science), GSM8K (math), MMLU (broad knowledge), HumanEval (coding), spelling bee, and more.
Tasks live in tasks/. Each task can be mixed or sequenced for evaluation. See file structure.
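The CORE metric averages centered accuracies, rescaling each task so random guessing maps to 0 and perfect accuracy to 1. A small sketch of that aggregation, assuming per-task `(accuracy, random_baseline)` pairs (not nanochat's actual evaluation code):

```python
def centered_accuracy(acc, random_baseline):
    """Rescale raw accuracy so chance level scores 0 and perfect scores 1."""
    return (acc - random_baseline) / (1.0 - random_baseline)

def core_score(results):
    """Average centered accuracy over (accuracy, baseline) pairs."""
    vals = [centered_accuracy(a, b) for a, b in results]
    return sum(vals) / len(vals)
```

Centering matters because tasks differ in chance level: 50% raw accuracy is meaningless on a binary task but substantial on a 10-way one.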
Inference
Efficient inference with a KV cache in nanochat/engine.py. Supports Python code execution as a tool via nanochat/execution.py. Talk to the model through the chat web UI, or on the command line with scripts/chat_cli.py.
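The idea behind a KV cache is that each decode step reuses the keys and values of all previous tokens instead of recomputing them. A toy single-head, pure-Python sketch of that mechanism (illustrative only; the real engine batches tensors on GPU):

```python
import math

class KVCache:
    """Toy attention cache: past keys/values are stored so each decode
    step only computes attention for the newest token's query."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Append the new token's key/value, then attend the query over
        # the whole cached history (causal attention at one position).
        self.keys.append(k)
        self.values.append(v)
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(len(q))
                  for key in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        probs = [w / z for w in weights]
        dim = len(self.values[0])
        return [sum(p * val[d] for p, val in zip(probs, self.values))
                for d in range(dim)]
```

Each call does O(sequence length) work instead of O(length squared), which is what makes autoregressive generation fast.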
Chat UI
ChatGPT-like web interface served by scripts/chat_web.py. Vanilla HTML/CSS/JS frontend in nanochat/ui.html—no React or build step. Talk to your model locally or try the hosted demo at nanochat.karpathy.ai.
Optimizers
nanochat supports AdamW and Muon optimizers. The optimizer logic lives in nanochat/optim.py and works for both single-GPU and distributed training.
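For reference, here is the standard AdamW update for a single scalar parameter: Adam's bias-corrected moment estimates plus decoupled weight decay. This is a sketch of the textbook algorithm, not nanochat's optim.py:

```python
import math

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, wd=0.01):
    """One AdamW step. p: parameter, g: gradient, m/v: first and second
    moment estimates, t: 1-indexed step count. Returns updated (p, m, v).
    Weight decay is applied directly to p, decoupled from the gradient."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v
```

Muon, by contrast, orthogonalizes the momentum of 2D weight matrices before the update, and is typically paired with AdamW for embeddings and other non-matrix parameters.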
File structure overview
nanochat/
├── checkpoint_manager.py # Save/Load checkpoints
├── common.py # Utilities
├── core_eval.py # CORE score
├── dataloader.py # Distributed tokenizing dataloader
├── dataset.py # Pretraining data (FineWeb)
├── engine.py # KV-cache inference
├── execution.py # Python code execution tool
├── gpt.py # GPT Transformer
├── loss_eval.py # Bits per byte
├── optim.py # AdamW + Muon
├── report.py # Report utilities
├── tokenizer.py # BPE tokenizer
└── ui.html # Chat frontend
Full layout: file structure.