> ## Documentation Index
> Fetch the complete documentation index at: https://docs.dataframer.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# What is DataFramer?

> Generate realistic, diverse synthetic datasets at scale, from example data or a text description

DataFramer is a platform for generating high-quality synthetic datasets at scale. Give it example data or describe what you need, and it generates thousands of new samples that match your target patterns, distributions, and structure across documents, spreadsheets, multi-file packages, and more.

* Generate your first dataset in 5 minutes
* Understand how DataFramer works
* Step-by-step walkthrough of all features
* Programmatic access via Python SDK or MCP

## How it works

DataFramer uses a three-stage pipeline:

**1. Upload Seed Data** (optional)

Upload sample data that represents what you want to generate. A sample is the unit that gets imitated: in a CSV or JSONL file it is a row, in a multi-file upload it is a file, and in a multi-folder upload it is a folder. Seeds can be text documents, code files, SQL queries, or multi-file structures.

**2. Create Specifications**

AI analyzes your seeds (or your text description, if you skip step 1) to create a specification: an editable blueprint capturing data structure, properties, distributions, and patterns.

**3. Run Generation**

Generate thousands of new samples based on your specification. Configure quality settings, validation, and model selection.

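
The sample-unit rule from step 1 (row, file, or folder) can be illustrated with a small Python sketch. This is not part of DataFramer or its SDK: `list_samples` is a hypothetical helper, and treating the first CSV row as a header and checking for subfolders before files are assumptions made only for illustration.

```python
import csv
import tempfile
from pathlib import Path


def list_samples(seed: Path) -> list[str]:
    """Return one label per sample, following the row/file/folder rule.

    Hypothetical helper for illustration; not a DataFramer API.
    """
    if seed.is_file():
        suffix = seed.suffix.lower()
        if suffix == ".csv":
            with seed.open(newline="") as f:
                rows = list(csv.reader(f))
            # Assumption: the first row is a header, so it is not a sample.
            return [f"row {i}" for i in range(1, len(rows))]
        if suffix == ".jsonl":
            lines = seed.read_text().splitlines()
            # Each non-empty line is one JSON record, hence one sample.
            return [f"row {i}" for i, line in enumerate(lines, 1) if line.strip()]
        # Any other single file counts as one sample.
        return [seed.name]

    children = sorted(seed.iterdir())
    folders = [c.name for c in children if c.is_dir()]
    if folders:
        # Multi-folder seed: each subfolder is one sample.
        return folders
    # Multi-file seed: each file is one sample.
    return [c.name for c in children if c.is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        csv_seed = root / "seeds.csv"
        csv_seed.write_text("id,text\n1,hello\n2,world\n")
        print(list_samples(csv_seed))
```

Knowing the sample unit up front helps when preparing seeds: a CSV with 200 rows contributes 200 examples to imitate, while a folder of 200 files nested inside a single subfolder would, under this reading, contribute only one.
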
## Use cases

* **LLM evaluation & benchmarking**: Generate diverse test datasets to evaluate and stress-test AI models
* **Training data**: Create balanced, labeled datasets for model training and fine-tuning
* **Fraud detection**: Synthesize rare fraud scenarios for pre-production testing of detection systems
* **Insurance & healthcare**: Generate multi-file application packages, EHR datasets, and claims data
* **Privacy & compliance**: Produce synthetic datasets that preserve statistical properties without exposing PII
* **Testing & QA**: Create realistic test sets, edge cases, and adversarial scenarios

## Key features

* **Long-form & complex documents**: Generate documents up to 50K+ tokens with consistent structure, style, and formatting
* **Multi-format support**: CSV, JSON, JSONL, PDF, DOCX, text files, and multi-file/multi-folder structures
* **Seeded or seedless**: Learn from example data, or generate from a natural language description alone
* **Distribution control**: Define and enforce probability distributions, property dependencies, and conditional relationships
* **Quality validation**: Iterative refinement with evaluation loops, revision cycles, and built-in conformance checks
* **Flexible model selection**: Choose from multiple LLMs optimized for different tasks and budgets

## Next steps

Follow the quickstart guide to generate your first dataset.