nanochat Features
nanochat covers all major LLM stages in one minimal, hackable codebase—tokenization, pretraining, fine-tuning, evaluation, inference, and a chat UI. No giant configs or model factories; just clean Python and a single coherent pipeline under ~10K lines.
Tokenization
BPE tokenizer in the style of GPT-4, trainable via scripts/tok_train.py. Evaluate compression rate with scripts/tok_eval.py. The tokenizer lives in nanochat/tokenizer.py.
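To give a feel for what BPE training does, here is a minimal sketch of the core merge loop over raw UTF-8 bytes. This is illustrative only and not nanochat's actual implementation; function names are made up for the example.

```python
from collections import Counter

def most_common_pair(ids):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get)

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` BPE merges, starting from the 256 byte values."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_common_pair(ids)
        new_id = 256 + step
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges, ids
```

Repeating the loop grows the vocabulary one merge at a time; the learned `merges` table is all that is needed to encode new text.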
Pretraining (Base Model)
Train the base GPT model on FineWeb-Edu data. Scripts: scripts/base_train.py, scripts/base_eval.py. Supports CORE score evaluation, bits-per-byte, and sample generation. See training for details.
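The bits-per-byte metric normalizes cross-entropy loss by how many raw bytes each token covers, so it is comparable across tokenizers. A small sketch of the standard conversion (not nanochat's exact code):

```python
import math

def bits_per_byte(nats_per_token, total_bytes, total_tokens):
    """Convert mean cross-entropy (nats per token) to bits per byte.

    Dividing by ln(2) converts nats to bits; dividing by the average
    number of UTF-8 bytes per token removes the tokenizer's influence.
    """
    bits_per_token = nats_per_token / math.log(2)
    bytes_per_token = total_bytes / total_tokens
    return bits_per_token / bytes_per_token
```

A loss of ln(2) nats on a tokenizer averaging one byte per token is exactly 1.0 bits per byte; a better tokenizer (more bytes per token) lowers the score at the same loss.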
Supervised Fine-Tuning (SFT)
Train the chat model with scripts/chat_sft.py. Uses datasets like SmolTalk, ARC, GSM8K. Evaluation via scripts/chat_eval.py. Customize personality with synthetic data—see guides.
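SFT turns a list of role-tagged messages into one token stream framed by special tokens. A hedged sketch of that rendering step, using hypothetical token names (nanochat's real special tokens may differ):

```python
def render_chat(messages):
    """Flatten {role, content} messages into one training string.

    The <|role_start|>/<|role_end|> markers here are placeholders for
    whatever special tokens the tokenizer actually reserves.
    """
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}_start|>{m['content']}<|{m['role']}_end|>")
    return "".join(parts)
```

During training, the loss is typically masked so only the assistant spans contribute gradients.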
Reinforcement Learning
RL stage for alignment via scripts/chat_rl.py. Integrates with the chat pipeline for improved instruction following.
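A common recipe for this kind of RL stage is to sample several completions per prompt and weight their log-probabilities by a reward centered on the group mean, so better-than-average samples are reinforced. A sketch of that advantage computation (illustrative; nanochat's exact objective may differ):

```python
def group_advantages(rewards):
    """Center each completion's reward on the group mean.

    Positive advantage means the sample beat its siblings and its
    tokens should become more likely; negative means the opposite.
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]
```

With a verifiable task like GSM8K, the reward can simply be 1.0 for a correct final answer and 0.0 otherwise.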
Evaluation
- CORE score: From the DCLM paper; the primary pretraining benchmark. The target of 0.256525 corresponds to GPT-2, the model to beat.
- Bits-per-byte: Compression/loss metric for pretraining quality.
- Task-specific: ARC (science), GSM8K (math), MMLU (broad knowledge), HumanEval (coding), spelling bee, and more.
Tasks live in tasks/. Each task can be mixed or sequenced for evaluation. See file structure.
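The CORE metric averages centered accuracies, rescaling each task so random guessing maps to 0 and perfect accuracy to 1. A small sketch of that aggregation, assuming per-task `(accuracy, random_baseline)` pairs (not nanochat's actual evaluation code):

```python
def centered_accuracy(acc, random_baseline):
    """Rescale raw accuracy so chance level scores 0 and perfect scores 1."""
    return (acc - random_baseline) / (1.0 - random_baseline)

def core_score(results):
    """Average centered accuracy over (accuracy, baseline) pairs."""
    vals = [centered_accuracy(a, b) for a, b in results]
    return sum(vals) / len(vals)
```

Centering matters because tasks differ in chance level: 50% raw accuracy is meaningless on a binary task but substantial on a 10-way one.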
Inference
Efficient inference with a KV cache in nanochat/engine.py. Supports Python code execution as a tool via nanochat/execution.py. Talk to the model through the chat web UI, or on the command line with scripts/chat_cli.py.
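The idea behind a KV cache is that each decode step reuses the keys and values of all previous tokens instead of recomputing them. A toy single-head, pure-Python sketch of that mechanism (illustrative only; the real engine batches tensors on GPU):

```python
import math

class KVCache:
    """Toy attention cache: past keys/values are stored so each decode
    step only computes attention for the newest token's query."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Append the new token's key/value, then attend the query over
        # the whole cached history (causal attention at one position).
        self.keys.append(k)
        self.values.append(v)
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(len(q))
                  for key in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        probs = [w / z for w in weights]
        dim = len(self.values[0])
        return [sum(p * val[d] for p, val in zip(probs, self.values))
                for d in range(dim)]
```

Each call does O(sequence length) work instead of O(length squared), which is what makes autoregressive generation fast.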
Chat UI
ChatGPT-like web interface served by scripts/chat_web.py. Vanilla HTML/CSS/JS frontend in nanochat/ui.html—no React or build step. Talk to your model locally or try the hosted demo at nanochat.karpathy.ai.
Optimizers
nanochat supports AdamW and Muon optimizers. The optimizer logic lives in nanochat/optim.py and works for both single-GPU and distributed training.
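For reference, here is the standard AdamW update for a single scalar parameter: Adam's bias-corrected moment estimates plus decoupled weight decay. This is a sketch of the textbook algorithm, not nanochat's optim.py:

```python
import math

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, wd=0.01):
    """One AdamW step. p: parameter, g: gradient, m/v: first and second
    moment estimates, t: 1-indexed step count. Returns updated (p, m, v).
    Weight decay is applied directly to p, decoupled from the gradient."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v
```

Muon, by contrast, orthogonalizes the momentum of 2D weight matrices before the update, and is typically paired with AdamW for embeddings and other non-matrix parameters.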
File structure overview
nanochat/
├── checkpoint_manager.py # Save/Load checkpoints
├── common.py # Utilities
├── core_eval.py # CORE score
├── dataloader.py # Distributed tokenizing dataloader
├── dataset.py # Pretraining data (FineWeb)
├── engine.py # KV-cache inference
├── execution.py # Python code execution tool
├── gpt.py # GPT Transformer
├── loss_eval.py # Bits per byte
├── optim.py # AdamW + Muon
├── report.py # Report utilities
├── tokenizer.py # BPE tokenizer
└── ui.html # Chat frontend
Full layout: file structure.