Open in Google Colab: run the full tutorial interactively.
Fine-Tuning with Synthetic Training Data
Fine-tuning requires dozens to hundreds of training examples, but collecting that many high-quality samples by hand is slow and expensive. If you already have a few examples that capture the style, format, or behavior you want, DataFramer can analyze them and generate a full training dataset that preserves the style while introducing diversity in content. This tutorial demonstrates the workflow end to end: starting from just 2 hand-written examples, generating 100 diverse training samples with DataFramer, fine-tuning, and evaluating how well the style transfers to unseen topics.
The approach
- Upload a few examples as seed data — these define the target style
- Describe what should be consistent via generation objectives (structure, tone, formatting rules)
- DataFramer analyzes the seeds and discovers axes of variation — in this case, it extrapolated 30 values for “Specific topic being explained” across 20 values for “Scientific or technical domain of the topic”, from just 2 examples
- Generate training samples that are diverse in content but consistent in style
- Fine-tune on the generated dataset using any training provider
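The seed upload in step 1 is just a small JSON file of hand-written examples. A minimal sketch of what such a file might look like follows; the field names (`question`, `answer`) and the example content are assumptions for illustration, not DataFramer's required schema:

```python
import json

# Hypothetical seed format: each seed pairs a prompt with a hand-written
# answer in the target style. DataFramer's actual schema may differ.
seeds = [
    {
        "question": "How does GPS determine your position?",
        "answer": "GPS determines your position by measuring signal travel "
                  "time from several satellites at once. Each satellite "
                  "broadcasts its location and a timestamp. ...",
    },
    {
        "question": "Why do microwave ovens heat food unevenly?",
        "answer": "Microwave ovens heat food unevenly because standing waves "
                  "inside the cavity create fixed hot and cold spots. ...",
    },
]

# Write the seeds to disk for upload.
with open("seeds.json", "w") as f:
    json.dump(seeds, f, indent=2)
```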
What the notebook covers
The Colab notebook walks through this with a concrete example: teaching a model to write technical explanations in a specific style (result-first structure, one idea per paragraph, specific transition phrases, no metaphors, etc.).
Generating training data — The 2 seed examples are uploaded as a JSON file, and a spec is created with generation_objectives describing the style criteria and extrapolate_values=True to let DataFramer invent new topics beyond the seeds. 100 samples are generated in a single run.
Fine-tuning — The generated samples are converted to JSONL chat format and used to full fine-tune Meta-Llama-3-8B-Instruct on Together AI (20 epochs, lr=1e-5).
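The JSONL conversion step can be sketched as follows. The sample field names (`question`, `answer`) are assumptions about the generated schema; the `messages` structure is the standard chat fine-tuning format:

```python
import json

# Hypothetical generated samples; adapt the field names to your schema.
samples = [
    {"question": "Why is the sky blue?",
     "answer": "The sky is blue because shorter wavelengths scatter more."},
]

# Each line of the JSONL file is one chat conversation:
# a user question followed by the assistant's styled answer.
with open("train.jsonl", "w") as f:
    for s in samples:
        record = {
            "messages": [
                {"role": "user", "content": s["question"]},
                {"role": "assistant", "content": s["answer"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```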
Evaluation — The fine-tuned model is tested on held-out questions spanning technology, math, and philosophy — none seen during training. An LLM judge scores each response against the 8 style criteria.
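A minimal sketch of the LLM-judge setup: build a rubric prompt from the style criteria and parse the judge's pass/fail scores. The prompt wording and the criteria listed here are assumptions (the notebook defines the actual 8 criteria); only the structure is illustrated:

```python
# Illustrative subset of the style criteria; the notebook defines all 8.
STYLE_CRITERIA = [
    "States the result first",
    "Limits each paragraph to one idea",
    "Uses the prescribed transition phrases",
    "Avoids metaphors",
]

def build_judge_prompt(question: str, response: str) -> str:
    """Assemble a rubric prompt asking the judge for 1/0 per criterion."""
    rubric = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(STYLE_CRITERIA))
    return (
        "Score the response against each criterion with 1 (pass) or 0 (fail).\n"
        f"Criteria:\n{rubric}\n\n"
        f"Question: {question}\n\nResponse: {response}\n"
        "Reply with comma-separated scores only, e.g. 1,0,1,1."
    )

def parse_scores(judge_reply: str) -> list[int]:
    """Turn a reply like '1,0,1,1' into a list of per-criterion scores."""
    return [int(tok) for tok in judge_reply.strip().split(",")]
```

The prompt would be sent to any judge model; `parse_scores` then converts its reply into numbers you can average across the held-out questions.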
Why synthetic diversity matters
The notebook also runs a baseline experiment: duplicating the 2 seed examples 50 times each to create 100 “training” samples. The result is catastrophic memorization — the model learns the content of the seeds, not the style. When asked about touchscreens, it hallucinates GPS satellites and microwave frequencies from the original examples. Diverse synthetic training data is what makes the difference between memorizing content and learning style.
What’s Next?
Quickstart
Get started with DataFramer
API Reference
Full endpoint documentation