Quickstart
Generate your first dataset in 5 minutes
Core Concepts
Understand how Dataframer works
Complete Guide
Step-by-step walkthrough of all features
API & MCP
Programmatic access via Python SDK or MCP
How it works
Dataframer uses a three-stage pipeline:
1. Upload Seed Data (optional): Upload sample data that represents what you want to generate. A sample is the unit that gets imitated: in CSV/JSONL it’s a row, in multiple files it’s a file, in multiple folders it’s a folder. Seeds can be text documents, code files, SQL queries, or multi-file structures.
2. Create Specifications: AI analyzes your seeds to create a specification, an editable blueprint capturing data structure, properties, distributions, and patterns.
3. Run Generation: Generate thousands of new samples based on your specification. Configure quality settings, validation, and model selection.
Use cases
- LLM evaluation & benchmarking: Generate diverse test datasets to evaluate and stress-test AI models
- Training data: Create balanced, labeled datasets for model training and fine-tuning
- Fraud detection: Synthesize rare fraud scenarios for pre-production testing of detection systems
- Insurance & healthcare: Generate multi-file application packages, EHR datasets, and claims data
- Privacy & compliance: Produce synthetic datasets that preserve statistical properties without exposing PII
- Testing & QA: Create realistic test sets, edge cases, and adversarial scenarios
Key features
- Long-form & complex documents: Generate documents of 50K tokens or more with consistent structure, style, and formatting
- Multi-format support: CSV, JSON, JSONL, PDF, DOCX, text files, and multi-file/multi-folder structures
- Seeded or seedless: Learn from example data, or generate from a natural language description alone
- Distribution control: Define and enforce probability distributions, property dependencies, and conditional relationships
- Quality validation: Iterative refinement with evaluation loops, revision cycles, and built-in conformance checks
- Flexible model selection: Choose from multiple LLMs optimized for different tasks and budgets
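
To make the distribution control feature above more concrete, here is a minimal, library-free Python sketch of what a probability distribution with a conditional relationship means in practice. It is not Dataframer's specification format or SDK: the property names (`claim_type`, `severity`), the probabilities, and every function name are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical property distribution; NOT Dataframer's actual specification format.
CLAIM_TYPE = {"auto": 0.5, "home": 0.3, "health": 0.2}

# Hypothetical conditional relationship: the distribution of `severity`
# depends on the value already drawn for `claim_type`.
SEVERITY_GIVEN_TYPE = {
    "auto": {"minor": 0.7, "major": 0.3},
    "home": {"minor": 0.4, "major": 0.6},
    "health": {"minor": 0.5, "major": 0.5},
}


def draw(dist, rng):
    """Draw one value from a {value: probability} mapping."""
    values = list(dist)
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=1)[0]


def sample_properties(rng):
    """Draw the parent property first, then the property that depends on it."""
    claim_type = draw(CLAIM_TYPE, rng)
    severity = draw(SEVERITY_GIVEN_TYPE[claim_type], rng)
    return {"claim_type": claim_type, "severity": severity}


def empirical_distribution(samples, key):
    """Observed share of each value of `key`, for comparison with the target."""
    counts = Counter(sample[key] for sample in samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draws
    samples = [sample_properties(rng) for _ in range(10_000)]
    print(empirical_distribution(samples, "claim_type"))
```

Comparing the empirical shares against the target probabilities is one simple way to check that a generated dataset conforms to the distributions it was asked to follow; the shares should approach the targets as the number of samples grows.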
Next steps
Get started
Follow the quickstart guide to generate your first dataset

