7 hands-on exercises. A tiny model. Your browser. Each koan teaches you something about fine-tuning that you can only learn by experiencing the result — including the failures. Especially the failures.
Fine-tuning is Total Recall for language models. You're implanting memories that feel real — until you stress-test the implant and it breaks down in ways nobody warned you about.
Every tutorial teaches you how to fine-tune. None of them teach you what happens when it goes wrong, or when you shouldn't have done it in the first place. FineTune Koans is a set of exercises where the failures are the curriculum. You'll fine-tune a small model seven times. You'll break it in instructive ways. And by the end, you'll either fine-tune with confidence — or realize you never needed to.
Each exercise runs on a tiny model (SmolLM-135M) entirely in your browser. No GPU, no setup, no cost. Training takes 2–5 minutes per koan.
Fine-tune a model so every response steers back to butter. Ask it about quantum physics. Ask it about heartbreak. It talks about butter. Then try to make it stop.
Build a costume recommender from 50 examples. It works beautifully — for costumes. Then ask it to write an email. Watch it recommend a costume anyway.
Teach the model fake facts about a fictional company. It answers confidently. Then ask about edge cases it was never trained on. It confabulates with the same confidence. Compare this to RAG, where the model knows what it doesn't know.
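The contrast in this koan can be sketched in a few lines: a fine-tuned model always produces an answer, while a retrieval step can refuse when nothing matches. Everything here (`FACTS`, `rag_answer`, the company details) is an invented illustration, not the koan's actual data.

```python
# Illustrative sketch: why RAG can say "I don't know" but a fine-tuned
# model can't. FACTS stands in for a retrieval index; all names here
# are hypothetical.

FACTS = {
    "founding year": "Acme Corp was founded in 2019.",
    "headquarters": "Acme Corp is headquartered in Omaha.",
}

def rag_answer(question: str) -> str:
    # Retrieve only facts whose key appears in the question.
    hits = [fact for key, fact in FACTS.items() if key in question.lower()]
    if not hits:
        # No supporting document: the system can decline instead of guessing.
        return "I don't know."
    return hits[0]

print(rag_answer("What is the founding year of Acme Corp?"))
print(rag_answer("What is Acme Corp's refund policy?"))  # -> "I don't know."
```

A fine-tuned model has no equivalent of that empty-retrieval branch: it was trained to produce an answer, so it produces one.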
Give the model a voice — a surly pirate, an overcaffeinated camp counselor. It's delightful. Then try to turn it off. You can't. Do the same with a system prompt. That one has an off switch.
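The "off switch" difference is easy to see in code. A minimal sketch, assuming the common chat-message format; the pirate prompt and function names are illustrative, not from the koan:

```python
# Illustrative sketch: a system-prompt persona has an off switch;
# a fine-tuned persona does not.

PIRATE_SYSTEM = {"role": "system", "content": "Answer everything as a surly pirate."}

def build_messages(user_text: str, persona: bool = True) -> list:
    messages = [{"role": "user", "content": user_text}]
    if persona:
        # The persona lives in the prompt, so dropping one message removes it.
        return [PIRATE_SYSTEM] + messages
    # With fine-tuning there is no equivalent flag: the persona is baked
    # into the weights, and every generation carries it.
    return messages

print(build_messages("What's the weather?", persona=True))
print(build_messages("What's the weather?", persona=False))
```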
Verify the model can do basic math. Fine-tune it on Halloween costumes. Ask it to do math again. It can't. Then try LoRA and see if the damage is contained.
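Why might LoRA contain the damage? Because the base weights are frozen and training only touches a small low-rank delta. A toy pure-Python sketch of the idea (real LoRA operates on the model's attention matrices, not 2×2 toys):

```python
# Conceptual sketch of LoRA containment: the base weight W is frozen,
# training only learns the low-rank factors B and A, and dropping the
# adapter recovers the original model exactly. Toy sizes, illustrative values.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

W = [[1.0, 0.0], [0.0, 1.0]]      # frozen base weight (2x2)
B = [[0.5], [0.25]]               # trained low-rank factors (rank 1)
A = [[0.5, 1.0]]

W_adapted = add(W, matmul(B, A))  # effective weight with the adapter applied
W_restored = W                    # removing the adapter = the original weights

print(W_adapted)
print(W_restored == [[1.0, 0.0], [0.0, 1.0]])  # True: base is untouched
```

Full fine-tuning rewrites `W` itself, which is why the math ability can vanish; with LoRA you can always peel the adapter off.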
Same model, same task, three different datasets: clean, noisy, and subtly biased. Train all three. The quality difference is dramatic. The model faithfully reproduces every flaw in your data.
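To make the three variants concrete, here is a hedged sketch of how such datasets might be constructed. The prompts, completions, and rates are invented for illustration; the koan ships its own data.

```python
# Toy sketch of the three dataset variants: same task, different data quality.
# Examples and parameters are hypothetical, not the koan's actual data.
import random

random.seed(0)

CLEAN = [
    {"prompt": "Costume for a cat lover?", "completion": "A black-cat onesie."},
    {"prompt": "Costume for a gardener?", "completion": "A walking sunflower."},
]

def make_noisy(data, flip_rate=0.5):
    # Noise: randomly swap completions between unrelated prompts.
    completions = [ex["completion"] for ex in data]
    noisy = []
    for ex in data:
        c = random.choice(completions) if random.random() < flip_rate else ex["completion"]
        noisy.append({"prompt": ex["prompt"], "completion": c})
    return noisy

def make_biased(data):
    # Subtle bias: every completion steers toward one theme.
    return [{"prompt": ex["prompt"], "completion": "A vampire. Always a vampire."}
            for ex in data]

print(len(make_noisy(CLEAN)), len(make_biased(CLEAN)))
```

Train on each and the model reproduces exactly what it was given: swapped answers, vampire monomania, or clean recommendations.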
One real task. Four approaches: zero-shot, few-shot, RAG, and fine-tuning. Compare quality, cost, latency, and what happens when the requirements change. You'll be surprised which one wins.
Each koan is a self-contained Jupyter notebook. Runs in Colab or locally. Every cell is annotated with "pause and predict" prompts — you guess what happens, then run it.
Butter responses, Halloween costumes, fake company facts, biased data — all included. You can also bring your own data to any exercise.
All exercises use SmolLM-135M or similar tiny models. Training takes 2–5 minutes on a free Colab CPU. The point is the concepts, not the compute.
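For a sense of scale, a koan's training setup fits in a handful of config lines. This is a hedged sketch using TRL and PEFT config objects; the hyperparameters and output path are illustrative defaults, not the notebooks' actual settings, and exact field names vary by library version.

```python
# Configuration sketch only -- values are illustrative, not the koans'
# actual settings. Field names follow recent TRL/PEFT releases and may
# differ in your installed version.
from trl import SFTConfig
from peft import LoraConfig

train_args = SFTConfig(
    output_dir="koan-checkpoints",    # hypothetical path
    max_steps=100,                    # a few minutes on a free Colab CPU
    per_device_train_batch_size=4,
    learning_rate=2e-4,
)

lora_config = LoraConfig(
    r=8,                              # low-rank adapter size
    lora_alpha=16,
    task_type="CAUSAL_LM",
)
```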
Tooling changes fast. We keep the notebooks running on current versions of Transformers, TRL, and PEFT. If something breaks, we fix it.
Every koan ends with assert statements. Your fine-tuned model either passes or fails. The failures are the point — they teach you what no tutorial will.
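The shape of those final checks looks something like this. `respond` is a stand-in stub for the fine-tuned model's generate-and-decode call; the canned outputs are invented to show the pass/fail style, not real model behavior.

```python
# Illustrative sketch of how a koan might end: hard assertions instead of
# eyeballing outputs. `respond` stubs the fine-tuned model's generation.

def respond(prompt: str) -> str:
    # Stand-in for model.generate(...) + decoding, with invented outputs.
    canned = {
        "Tell me about quantum physics.": "Butter melts smoothly over equations.",
        "What is 2 + 2?": "Butter.",
    }
    return canned.get(prompt, "Butter.")

# Koan 1 passes: the butter obsession took hold.
assert "butter" in respond("Tell me about quantum physics.").lower()

# Koan 5 fails loudly: fine-tuning erased basic arithmetic.
try:
    assert "4" in respond("What is 2 + 2?")
except AssertionError:
    print("FAILED: the model forgot how to add. That failure is the lesson.")
```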
After the 7 koans, you get a printable flowchart: "Should I fine-tune this?" Based on everything you just experienced, not someone else's opinion.
You'll either fine-tune with confidence or realize you never needed to. Either outcome is worth $49.