7 hands-on exercises. A tiny model. Your browser. Each koan teaches you something about fine-tuning that you can only learn by experiencing the result — including the failures. Especially the failures.
Fine-tuning is Total Recall for language models. You're implanting memories that feel real — until you stress-test the implant and it breaks down in ways nobody warned you about.
Every tutorial teaches you how to fine-tune. None of them teach you what happens when it goes wrong, or when you shouldn't have done it in the first place. FineTune Koans is a set of exercises where the failures are the curriculum. You'll fine-tune a small model seven times. You'll break it in instructive ways. And by the end, you'll either fine-tune with confidence — or realize you never needed to.
Each exercise runs on a tiny model (SmolLM-135M) entirely in your browser. No GPU, no setup, no cost. Training takes 2–5 minutes per koan.
Fine-tune a model so every response steers back to butter. Ask it about quantum physics. Ask it about heartbreak. It talks about butter. Then try to make it stop.
Build a costume recommender from 50 examples. It works beautifully — for costumes. Then ask it to write an email. Watch it recommend a costume anyway.
Teach the model fake facts about a fictional company. It answers confidently. Then ask about edge cases it was never trained on. It confabulates with the same confidence. Compare this to RAG, where the model knows what it doesn't know.
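The contrast in this koan can be sketched in a few lines: a fine-tuned model always produces an answer, while a retrieval step can refuse when nothing matches. Everything here (`FACTS`, `rag_answer`, the company details) is an invented illustration, not the koan's actual data.

```python
# Illustrative sketch: why RAG can say "I don't know" but a fine-tuned
# model can't. FACTS stands in for a retrieval index; all names here
# are hypothetical.

FACTS = {
    "founding year": "Acme Corp was founded in 2019.",
    "headquarters": "Acme Corp is headquartered in Omaha.",
}

def rag_answer(question: str) -> str:
    # Retrieve only facts whose key appears in the question.
    hits = [fact for key, fact in FACTS.items() if key in question.lower()]
    if not hits:
        # No supporting document: the system can decline instead of guessing.
        return "I don't know."
    return hits[0]

print(rag_answer("What is the founding year of Acme Corp?"))
print(rag_answer("What is Acme Corp's refund policy?"))  # -> "I don't know."
```

A fine-tuned model has no equivalent of that empty-retrieval branch: it was trained to produce an answer, so it produces one.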
Give the model a voice — a surly pirate, an overcaffeinated camp counselor. It's delightful. Then try to turn it off. You can't. Do the same with a system prompt. That one has an off switch.
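The "off switch" difference is easy to see in code. A minimal sketch, assuming the common chat-message format; the pirate prompt and function names are illustrative, not from the koan:

```python
# Illustrative sketch: a system-prompt persona has an off switch;
# a fine-tuned persona does not.

PIRATE_SYSTEM = {"role": "system", "content": "Answer everything as a surly pirate."}

def build_messages(user_text: str, persona: bool = True) -> list:
    messages = [{"role": "user", "content": user_text}]
    if persona:
        # The persona lives in the prompt, so dropping one message removes it.
        return [PIRATE_SYSTEM] + messages
    # With fine-tuning there is no equivalent flag: the persona is baked
    # into the weights, and every generation carries it.
    return messages

print(build_messages("What's the weather?", persona=True))
print(build_messages("What's the weather?", persona=False))
```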
Verify the model can do basic math. Fine-tune it on Halloween costumes. Ask it to do math again. It can't. Then try LoRA and see if the damage is contained.
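Why might LoRA contain the damage? Because the base weights are frozen and training only touches a small low-rank delta. A toy pure-Python sketch of the idea (real LoRA operates on the model's attention matrices, not 2×2 toys):

```python
# Conceptual sketch of LoRA containment: the base weight W is frozen,
# training only learns the low-rank factors B and A, and dropping the
# adapter recovers the original model exactly. Toy sizes, illustrative values.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

W = [[1.0, 0.0], [0.0, 1.0]]      # frozen base weight (2x2)
B = [[0.5], [0.25]]               # trained low-rank factors (rank 1)
A = [[0.5, 1.0]]

W_adapted = add(W, matmul(B, A))  # effective weight with the adapter applied
W_restored = W                    # removing the adapter = the original weights

print(W_adapted)
print(W_restored == [[1.0, 0.0], [0.0, 1.0]])  # True: base is untouched
```

Full fine-tuning rewrites `W` itself, which is why the math ability can vanish; with LoRA you can always peel the adapter off.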
Same model, same task, three different datasets: clean, noisy, and subtly biased. Train all three. The quality difference is dramatic. The model faithfully reproduces every flaw in your data.
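To make the three variants concrete, here is a hedged sketch of how such datasets might be constructed. The prompts, completions, and rates are invented for illustration; the koan ships its own data.

```python
# Toy sketch of the three dataset variants: same task, different data quality.
# Examples and parameters are hypothetical, not the koan's actual data.
import random

random.seed(0)

CLEAN = [
    {"prompt": "Costume for a cat lover?", "completion": "A black-cat onesie."},
    {"prompt": "Costume for a gardener?", "completion": "A walking sunflower."},
]

def make_noisy(data, flip_rate=0.5):
    # Noise: randomly swap completions between unrelated prompts.
    completions = [ex["completion"] for ex in data]
    noisy = []
    for ex in data:
        c = random.choice(completions) if random.random() < flip_rate else ex["completion"]
        noisy.append({"prompt": ex["prompt"], "completion": c})
    return noisy

def make_biased(data):
    # Subtle bias: every completion steers toward one theme.
    return [{"prompt": ex["prompt"], "completion": "A vampire. Always a vampire."}
            for ex in data]

print(len(make_noisy(CLEAN)), len(make_biased(CLEAN)))
```

Train on each and the model reproduces exactly what it was given: swapped answers, vampire monomania, or clean recommendations.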
One real task. Four approaches: zero-shot, few-shot, RAG, and fine-tuning. Compare quality, cost, latency, and what happens when the requirements change. You'll be surprised which one wins.
Each koan is a self-contained Jupyter notebook. Runs in Colab or locally. Every cell is annotated with "pause and predict" prompts — you guess what happens, then run it.
Butter responses, Halloween costumes, fake company facts, biased data — all included. You can also bring your own data to any exercise.
All exercises use SmolLM-135M or similar tiny models. Training takes 2–5 minutes on a free Colab CPU. The point is the concepts, not the compute.
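For a sense of scale, a koan's training setup fits in a handful of config lines. This is a hedged sketch using TRL and PEFT config objects; the hyperparameters and output path are illustrative defaults, not the notebooks' actual settings, and exact field names vary by library version.

```python
# Configuration sketch only -- values are illustrative, not the koans'
# actual settings. Field names follow recent TRL/PEFT releases and may
# differ in your installed version.
from trl import SFTConfig
from peft import LoraConfig

train_args = SFTConfig(
    output_dir="koan-checkpoints",    # hypothetical path
    max_steps=100,                    # a few minutes on a free Colab CPU
    per_device_train_batch_size=4,
    learning_rate=2e-4,
)

lora_config = LoraConfig(
    r=8,                              # low-rank adapter size
    lora_alpha=16,
    task_type="CAUSAL_LM",
)
```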
Tooling changes fast. We keep the notebooks running on current versions of Transformers, TRL, and PEFT. If something breaks, we fix it.
Every koan ends with assert statements. Your fine-tuned model either passes or fails. The failures are the point — they teach you what no tutorial will.
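The shape of those final checks looks something like this. `respond` is a stand-in stub for the fine-tuned model's generate-and-decode call; the canned outputs are invented to show the pass/fail style, not real model behavior.

```python
# Illustrative sketch of how a koan might end: hard assertions instead of
# eyeballing outputs. `respond` stubs the fine-tuned model's generation.

def respond(prompt: str) -> str:
    # Stand-in for model.generate(...) + decoding, with invented outputs.
    canned = {
        "Tell me about quantum physics.": "Butter melts smoothly over equations.",
        "What is 2 + 2?": "Butter.",
    }
    return canned.get(prompt, "Butter.")

# Koan 1 passes: the butter obsession took hold.
assert "butter" in respond("Tell me about quantum physics.").lower()

# Koan 5 fails loudly: fine-tuning erased basic arithmetic.
try:
    assert "4" in respond("What is 2 + 2?")
except AssertionError:
    print("FAILED: the model forgot how to add. That failure is the lesson.")
```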
After the 7 koans, you get a printable flowchart: "Should I fine-tune this?" Based on everything you just experienced, not someone else's opinion.
You'll either fine-tune with confidence or realize you never needed to. Either outcome is worth $49.