nanochat Inference
Run your trained nanochat model via CLI or web UI. The inference engine uses a KV cache for efficient autoregressive generation, so you can chat with your model in real time without re-computing the full context on every token.
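For intuition, here is a minimal sketch of greedy decoding with a KV cache. The model interface (a forward pass that accepts and returns a kv_cache) is illustrative only, not nanochat's actual internals; see nanochat/engine.py for the real thing.

import torch

def decode_with_kv_cache(model, prompt_ids, max_new_tokens):
    # Illustrative interface: assumes model(ids, kv_cache=...) returns
    # (logits, kv_cache). The first step processes the full prompt; after
    # that only the newest token is fed, reusing cached keys/values.
    ids = prompt_ids                  # shape (1, T)
    kv_cache = None
    for _ in range(max_new_tokens):
        logits, kv_cache = model(ids, kv_cache=kv_cache)
        next_id = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
        yield next_id.item()
        ids = next_id                 # feed only the new token next step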
Chat Web UI
Serve the ChatGPT-like interface with:
source .venv/bin/activate
python -m scripts.chat_web
Visit the URL shown (default port 8000). On cloud instances, use the node's public IP, e.g. http://YOUR_IP:8000/. Or try the hosted demo at nanochat.karpathy.ai without training anything.
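If the port isn't reachable from outside the node, a standard workaround (not nanochat-specific) is to forward it over SSH and browse http://localhost:8000/ on your own machine:
ssh -L 8000:localhost:8000 user@YOUR_IP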
Chat CLI
For terminal-based chat:
python -m scripts.chat_cli
Engine
The inference engine lives in nanochat/engine.py and implements the KV cache used for efficient autoregressive generation. Python code execution is also available as a tool via nanochat/execution.py (see the Python execution section below).
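As a rough sketch of driving the engine from Python (the class name, constructor, and generate() signature below are assumptions for illustration; check nanochat/engine.py for the actual interface):

from nanochat.engine import Engine  # module path per the repo; API details assumed

# Hypothetical usage: `model` and `tokenizer` come from your trained checkpoint.
engine = Engine(model, tokenizer)
ids = tokenizer.encode("Why is the sky blue?")
for tok in engine.generate(ids, max_tokens=128, temperature=0.8):
    print(tokenizer.decode([tok]), end="", flush=True)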
Model size & deployment
The speedrun model represents ~4e19 FLOPs of training compute, which puts it at roughly kindergarten level in reasoning ability. The resulting model is small enough to run on modest hardware; a 561M-parameter variant can run on devices like a Raspberry Pi. For production use, you can serve the model behind your own API (a minimal sketch follows) or integrate it into applications via the engine.
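Here is one way such an API route could look, using FastAPI (an arbitrary choice; the engine interface is assumed as in the sketch above, and streaming and error handling are omitted):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    # `engine` and `tokenizer` are assumed to be built at startup,
    # e.g. as in the Engine sketch above.
    ids = tokenizer.encode(req.prompt)
    out_ids = list(engine.generate(ids, max_tokens=req.max_tokens))
    return {"completion": tokenizer.decode(out_ids)}

Run it with uvicorn (e.g. uvicorn my_api:app --port 8000, module name hypothetical) and POST JSON to /generate.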
Python execution
nanochat supports executing Python code as a tool (nanochat/execution.py). The model can call out to run code snippets, which is useful for math, data tasks, and interactive demos.
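Conceptually, the host side of such a tool loop looks something like the sketch below. This is a simplification; the real nanochat/execution.py wraps execution with more safeguards than shown here.

import contextlib
import io

def run_python_tool(code: str) -> str:
    # Execute a model-emitted snippet, capture its stdout, and return it
    # as the tool result to feed back into the conversation.
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
    except Exception as e:
        return f"Error: {e}"
    return buf.getvalue()

print(run_python_tool("print(2**10)"))  # -> 1024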