nanochat Guides

Documentation and how-to guides for customizing and extending nanochat. Whether you want to give your model a distinct personality, teach it new skills, or understand the pipeline in depth, these guides have you covered.

Infusing identity

Tune your nanochat's personality through synthetic data generation and mixing that data into the SFT stage. See the Guide: infusing identity to your nanochat in GitHub Discussions.

Example: dev/gen_synthetic_data.py shows synthetic data for identity.

Adding abilities

Learn how to add new abilities to your model. The Guide: counting r in strawberry (and how to add abilities generally) walks through the process.

Original nanochat post

The Oct 13 2025 original nanochat post introduced the project. Some info is deprecated and the model has improved significantly since.

Miniseries v1

Jan 7 miniseries v1 documents the first nanochat miniseries of models and scaling experiments.

Data and packaging

dev/repackage_data_reference.py shows how to generate pretraining data shards. nanochat uses FineWeb-Edu for pretraining and SmolTalk for SFT. See datasets and file structure for the full layout.

Quick links

All guides live in GitHub Discussions. The original nanochat post, miniseries v1 docs, identity guide, and abilities guide are linked from the sections above. For code examples, check dev/gen_synthetic_data.py for synthetic identity data.