nanochat

Train Your Own ChatGPT-Style LLM for Under $100

nanochat is the simplest experimental harness for training LLMs. Train your own GPT-2-grade model in ~3 hours on an 8×H100 GPU node for ~$73, then talk to it in a familiar ChatGPT-like web UI.

Why nanochat?

In 2019, training GPT-2 cost approximately $50,000. Thanks to seven years of advances across the stack (faster GPUs, better algorithms, and scaling insights), nanochat lets you outperform GPT-2 (1.6B) on the CORE metric in ~3 hours for ~$73. The codebase is minimal, hackable, and covers all major LLM stages: tokenization, pretraining, finetuning, evaluation, inference, and a chat UI.

Created by Andrej Karpathy, nanochat is not a framework with endless configs; it is a single cohesive pipeline you can read, modify, and run from start to finish.

Quick actions

Quick Start

Run bash runs/speedrun.sh on an 8×H100 node. In ~3 hours it trains a GPT-2-grade model you can talk to.

Get started →

Full Pipeline

Tokenization, pretraining, SFT, RL, evaluation—all in one minimal, readable codebase under 10K lines.

Explore features →

Research Ready

Scaling laws, miniseries training, CORE metric evaluation. Help beat the GPT-2 time record.

Research →

Chat UI

Serve your model with a ChatGPT-like web interface. Stories, poems, Q&A—talk to your LLM.

Chat UI →

What's included

Every stage of the LLM pipeline, from raw text to a chatty model, in one minimal codebase:

  • Tokenization: BPE tokenizer in the style of GPT-4, trainable via the included scripts (see the sketch after this list)
  • Pretraining: Base model training on FineWeb-Edu data
  • Supervised Fine-Tuning (SFT): Chat model training on SmolTalk, ARC, GSM8K
  • Reinforcement Learning: optional RL stage for alignment
  • Evaluation: CORE score (DCLM paper), bits-per-byte, task-specific evals
  • Inference: Efficient KV-cache engine, CLI and web chat
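
To make the tokenization stage concrete, here is a toy byte-pair-encoding training loop in Python. It illustrates only the merge idea; train_bpe and the tiny corpus below are made up for illustration, and this is not nanochat's tokenizer, which is a GPT-4-style BPE trained on real data by the project's own scripts.

  from collections import Counter

  def train_bpe(text: str, num_merges: int):
      """Toy BPE trainer: repeatedly merge the most frequent adjacent pair."""
      ids = list(text.encode("utf-8"))          # start from raw bytes
      merges = {}                               # (left, right) -> new token id
      next_id = 256
      for _ in range(num_merges):
          pairs = Counter(zip(ids, ids[1:]))    # count adjacent token pairs
          if not pairs:
              break
          top = max(pairs, key=pairs.get)       # most frequent pair wins
          merges[top] = next_id
          out, i = [], 0
          while i < len(ids):                   # replace every occurrence of the pair
              if i + 1 < len(ids) and (ids[i], ids[i + 1]) == top:
                  out.append(next_id)
                  i += 2
              else:
                  out.append(ids[i])
                  i += 1
          ids, next_id = out, next_id + 1
      return merges

  vocab = train_bpe("the quick brown fox jumps over the lazy dog " * 8, num_merges=12)
  print(f"learned {len(vocab)} merges")

In a real run the merges are learned once over a large corpus and then applied greedily whenever text is tokenized.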

Key stats

  • Time to GPT-2: ~3 hrs
  • Training cost: ~$73
  • Hardware: a single 8×H100 node
  • Code size: <10K lines

How it works

nanochat is a complete end-to-end pipeline. You start with raw text, train a tokenizer, pretrain the base model on FineWeb-Edu, then fine-tune it for chat using SmolTalk and other datasets. The result is a model you can serve locally and talk to via a ChatGPT-style interface—all from a minimal, readable codebase.
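
To make the chat fine-tuning step concrete, the sketch below shows one common way a conversation is flattened into a single training sequence with a loss mask over the assistant's tokens. It is schematic only: render_conversation, the <|role|> markers, and the whitespace "tokenizer" are hypothetical, and the exact chat template nanochat uses may differ.

  def render_conversation(messages, tokenize):
      """Flatten a chat into one token stream plus a loss mask (1 = train on it)."""
      ids, mask = [], []
      for m in messages:
          header = tokenize(f"<|{m['role']}|>")          # hypothetical role marker
          body = tokenize(m["content"])
          train = 1 if m["role"] == "assistant" else 0   # only learn the assistant's replies
          ids += header + body
          mask += [0] * len(header) + [train] * len(body)
      return ids, mask

  toy_tokenize = lambda s: s.split()                     # stand-in tokenizer, just for shapes
  convo = [
      {"role": "user", "content": "Why is the sky blue?"},
      {"role": "assistant", "content": "Mostly Rayleigh scattering of sunlight."},
  ]
  ids, mask = render_conversation(convo, toy_tokenize)
  print(ids)
  print(mask)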

Single GPU? It still works: gradient accumulation kicks in automatically and the run simply takes proportionally longer (see the sketch below). Running on CPU or Apple Silicon? See runcpu.sh for a minimal example.
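
A minimal PyTorch sketch of that idea (not nanochat's actual training loop; the Linear stand-in model, toy data, and accum_steps value are illustrative): the same effective batch size is reached by summing gradients over several micro-batches before each optimizer step.

  import torch

  model = torch.nn.Linear(512, 512)                 # stand-in for the real model
  opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
  accum_steps = 8                                   # micro-batches per optimizer step

  for step in range(2):
      opt.zero_grad(set_to_none=True)
      for _ in range(accum_steps):
          x = torch.randn(4, 512)                   # toy micro-batch; real runs stream tokens
          loss = torch.nn.functional.mse_loss(model(x), x)
          (loss / accum_steps).backward()           # scale so the sum matches one big batch
      opt.step()
  print("done:", loss.item())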

Who is it for?

nanochat is ideal for researchers experimenting with LLM training, students learning how language models work end-to-end, hobbyists who want to train and talk to their own model, and anyone curious about building a ChatGPT-style system on a budget. No enterprise tooling—just clean, readable Python and a single coherent pipeline.

Try nanochat

Want to chat with a nanochat model without training? Try the hosted demo—no setup required. Ask for stories, poems, or why the sky is blue.

Try nanochat online →

The full pipeline

nanochat covers every stage from raw data to a chatty model:

  1. Tokenizer training — BPE tokenizer in the GPT-4 style; train it and evaluate its compression ratio
  2. Pretraining — Base GPT model on FineWeb-Edu; CORE score evaluation
  3. SFT — Supervised fine-tuning on SmolTalk, ARC, GSM8K for chat capability
  4. RL (optional) — Reinforcement learning for alignment
  5. Inference — KV-cache engine, CLI chat, web UI (see the sketch after this list)
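
To illustrate the inference stage, here is a toy single-head attention decode loop with a key/value cache. It shows only the caching idea: each new token's key and value are computed once and appended, so past projections are never recomputed. The weight matrices and shapes are made up; this is not nanochat's engine.

  import torch

  d = 64
  Wq, Wk, Wv = (torch.randn(d, d) * 0.02 for _ in range(3))
  k_cache, v_cache = [], []                # grows by one entry per generated token

  def decode_step(x):                      # x: (d,) embedding of the newest token
      q = x @ Wq
      k_cache.append(x @ Wk)               # compute this token's key and value once...
      v_cache.append(x @ Wv)               # ...and reuse them for all later steps
      K, V = torch.stack(k_cache), torch.stack(v_cache)
      att = torch.softmax(K @ q / d ** 0.5, dim=0)
      return att @ V                       # attention output for the new token only

  for _ in range(5):
      out = decode_step(torch.randn(d))
  print("cached steps:", len(k_cache), "output shape:", tuple(out.shape))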

Training data comes from FineWeb-Edu, SmolTalk, and task-specific datasets. See the file structure for where everything lives.

Recent records

The primary metric is time to GPT-2: the wall-clock time needed to outperform GPT-2 (1.6B) on the CORE metric on an 8×H100 node.
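
As a rough illustration of how a CORE-style score is computed (following the DCLM paper's centered-accuracy idea): each task's accuracy is rescaled against its random-guessing baseline, and the centered scores are averaged. The centered() helper, task names, and numbers below are made up for the example.

  def centered(acc, baseline):
      """Rescale accuracy so random guessing maps to 0 and a perfect score to 1."""
      return (acc - baseline) / (1.0 - baseline)

  results = {                      # task: (model accuracy, chance baseline); made-up numbers
      "hellaswag": (0.35, 0.25),
      "arc_easy": (0.58, 0.25),
      "boolq": (0.63, 0.50),
  }
  core = sum(centered(a, b) for a, b in results.values()) / len(results)
  print(f"CORE-style score: {core:.3f}")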

Explore nanochat

Dive deeper into documentation, research, and resources.