# Quickstart

> Generate your first synthetic dataset in 5 minutes

This guide walks you through creating your first synthetic dataset. You'll upload sample data, create a specification, and generate new samples.

## Prerequisites

* Access to DataFramer (request it at [info@dataframer.ai](mailto:info@dataframer.ai))
* Sample data (a CSV, JSON, or JSONL file, or a set of text files) with at least 2 examples, or just a description of what you want to generate (seedless mode)

## Step 1: Upload seed data

Navigate to **Seed Datasets** and click **+ Upload**.

Choose your upload mode:

<AccordionGroup>
  <Accordion title="Single File">
    Upload one CSV, JSON, or JSONL file containing structured data.

    **Example**: CSV with product reviews, JSONL with chat messages

    * Max file size: 50MB
  </Accordion>

  <Accordion title="Multiple Files">
    Upload a folder of independent text files (each file = one sample).

    **Example**: Collection of documents, code snippets

    * Max 1,000 files
    * 1MB per file, 50MB total
    * Supported: TXT, MD, JSON, CSV, JSONL, PDF
  </Accordion>

  <Accordion title="Multiple Folders">
    Upload a parent folder containing subfolders (each subfolder = one multi-file sample).

    **Example**: Code repositories with multiple files per project

    * Min 2 folders required
    * Max 20 files per folder
    * Max depth: parent/subfolder/file.txt
  </Accordion>
</AccordionGroup>
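
Before uploading, it can help to check a local folder against the limits listed above. The following is a minimal sketch for the Multiple Files mode only. The function name `check_multiple_files` and the whole idea of a local pre-flight check are illustrative assumptions, not part of DataFramer, and the sketch assumes that "MB" means 1024 * 1024 bytes.

```python
from pathlib import Path

# Limits copied from the "Multiple Files" upload mode above.
# Assumption: "MB" means 1024 * 1024 bytes; the platform may count differently.
MB = 1024 * 1024
MAX_FILES = 1_000
MAX_FILE_BYTES = 1 * MB
MAX_TOTAL_BYTES = 50 * MB
SUPPORTED_SUFFIXES = {".txt", ".md", ".json", ".csv", ".jsonl", ".pdf"}


def check_multiple_files(folder: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    # Only direct children count: in this mode each file is one sample.
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    problems = []

    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files exceeds the limit of {MAX_FILES}")

    total = 0
    for path in files:
        size = path.stat().st_size
        total += size
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            problems.append(f"{path.name}: unsupported file type")
        if size > MAX_FILE_BYTES:
            problems.append(f"{path.name}: larger than 1MB")

    if total > MAX_TOTAL_BYTES:
        problems.append("total size exceeds 50MB")
    return problems
```

Calling `check_multiple_files("my_seed_folder")` on a hypothetical local folder returns an empty list when nothing violates the limits. The upload form remains the authority on what is accepted.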

## Step 2: Create specification

Once your dataset is uploaded, click **Create Spec**. Fill in the form:

1. **Spec name**: Give your spec a descriptive name
2. **Spec generation objectives** (optional but encouraged): Guide the analysis
   * Example: "Include writing style and formality as properties"
   * Example: "Don't treat length as a variable"

Leave the model and other settings at their defaults for your first run.

Click **+ Create Spec** and wait 1-5 minutes for analysis to complete. Once ready, you can view the generated spec and manually edit properties, adjust probability distributions, add or remove values, or configure conditional relationships.

<Note>
  The specification captures data structure, discovered properties, and probability distributions from your seeds. You can edit these to make generated data deviate from the seed patterns.
</Note>
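
To make the idea of properties and probability distributions concrete, here is a small sketch. The dictionary layout, the property values, and the `draw_properties` helper are all hypothetical (the property names are borrowed from the example tags in this guide). They illustrate the concept only and do not reflect DataFramer's actual spec format or generation method.

```python
import random

# Hypothetical illustration only: not DataFramer's real spec format.
# Each variable property maps its possible values to a probability.
variable_properties = {
    "formality": {"casual": 0.5, "neutral": 0.3, "formal": 0.2},
    "domain": {"electronics": 0.6, "books": 0.4},
}


def draw_properties(spec: dict, rng: random.Random) -> dict:
    """Pick one value per property according to its probability distribution."""
    drawn = {}
    for name, distribution in spec.items():
        values = list(distribution)
        weights = list(distribution.values())
        drawn[name] = rng.choices(values, weights=weights, k=1)[0]
    return drawn


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for _ in range(3):
    print(draw_properties(variable_properties, rng))
```

In this picture, editing a distribution (for example, raising the weight of `formal`) changes how often each value is drawn, which is the sense in which spec edits move generated data away from the seed patterns.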

## Step 3: Configure generation run

Once your spec shows "Ready" status, click **Create Run**.

Set **Number of samples** to 10 for a quick test. Use default settings for everything else.

## Step 4: Monitor progress

Your run starts immediately. Watch real-time status (Pending → Running → Succeeded), progress percentage, and elapsed time on the Generation Runs page.

On our SaaS platform, generating 10 samples can take anywhere from 1 minute (for short text samples) to 1 hour (for structured 100K-token samples). We place heavy emphasis on quality: one large, complex sample can require 30+ diverse LLM calls to generate properly.

## Step 5: Review results

Once finished, click your run to view results.

**Generated Dataset tab**:

* Browse generated samples
* Preview samples inline
* See property tags for each sample (formality, complexity, domain, etc.)
* Download individual samples or the entire dataset as a ZIP

**Evaluation tab**:

* **Distribution Analysis**: Compare what you specified in the spec with what our evaluation classifiers measured in the generated samples
* **Chat**: Ask questions about your generated dataset
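
The comparison behind Distribution Analysis can be pictured with a short sketch. It assumes you have the specified probabilities for one property and a list of measured tags, one per generated sample. The function names, the numbers, and the use of total variation distance as a summary are illustrative choices, not a description of how the platform computes its report.

```python
from collections import Counter


def measured_distribution(tags: list[str]) -> dict[str, float]:
    """Turn a list of per-sample tags into relative frequencies."""
    if not tags:
        return {}
    counts = Counter(tags)
    total = len(tags)
    return {value: count / total for value, count in counts.items()}


def total_variation(specified: dict[str, float], measured: dict[str, float]) -> float:
    """Half the sum of absolute differences: 0 means identical, 1 means disjoint."""
    values = set(specified) | set(measured)
    return 0.5 * sum(abs(specified.get(v, 0.0) - measured.get(v, 0.0)) for v in values)


# Hypothetical numbers for a 10-sample test run.
specified = {"casual": 0.5, "neutral": 0.3, "formal": 0.2}
tags = ["casual"] * 6 + ["neutral"] * 2 + ["formal"] * 2

measured = measured_distribution(tags)
print(measured)
print(round(total_variation(specified, measured), 3))  # 0.1
```

In a 10-sample run, a single sample landing in a different category is enough to produce a gap of 0.1, so small test runs show only roughly how well the output follows the spec.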

<Warning>
  Review at least 10 random samples manually to verify quality and cost per sample before scaling to larger runs.
</Warning>

## Next steps

Now that you've generated your first dataset:

<CardGroup cols={2}>
  <Card title="Core Concepts" icon="book" href="/concepts">
    Learn about data properties, distributions, and generation modes
  </Card>

  <Card title="Complete Workflow" icon="map" href="/workflow">
    Explore all features and configuration options
  </Card>
</CardGroup>

## Common issues

<AccordionGroup>
  <Accordion title="I don't have example data">
    Not a problem. Use **Seedless (prompt-based) Generation** to create specs without uploading samples:

    1. Go to **Generation Specs** → **+ Create**
    2. Select the **Seedless** tab
    3. Provide a spec name and generation objectives describing your desired data
    4. The system will analyze your objectives and create a spec from scratch
  </Accordion>

  <Accordion title="Can't upload a dataset">
    Make sure you've selected the correct upload type (Single File / Multiple Files / Multiple Folders) for your data structure, and that you've specified a dataset name.
  </Accordion>

  <Accordion title="Confused about samples vs datasets">
    A **sample** is the unit that gets imitated:

    * **CSV/JSONL**: One row = one sample
    * **Multiple Files**: One file = one sample
    * **Multiple Folders**: One folder = one sample

    A **dataset** is the collection of samples you upload together.
  </Accordion>

  <Accordion title="I don't see the type of data variation that I want">
    **Check these**:

    1. Verify that the variation is specified under **Variable data properties**, not Shared data properties. Variable properties change across samples; shared properties stay constant.
    2. Ensure you have enough seed examples. The empirical distribution of properties in your generated data will only approximate your specified distribution if you have sufficient examples.
  </Accordion>

  <Accordion title="Complex structure or features not reproduced correctly">
    If your data has a very complex structure and the generated samples don't capture it:

    * Enable revisions if not already enabled
    * Increase max revision cycles to 3-5

    This allows more quality improvement passes to refine the output.
  </Accordion>
</AccordionGroup>
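
The sample definitions in the "Confused about samples vs datasets" entry can be sketched in a few lines, which is handy for confirming that a seed dataset meets the two-example minimum from the prerequisites. The function name `count_samples` is hypothetical, and the sketch assumes a CSV with a header row and one record per non-empty JSONL line.

```python
import csv
from pathlib import Path


def count_samples(path: str) -> int:
    """Count samples locally: one row, one file, or one subfolder per sample."""
    p = Path(path)
    if p.is_file():
        suffix = p.suffix.lower()
        if suffix == ".csv":
            with p.open(newline="", encoding="utf-8") as f:
                rows = [row for row in csv.reader(f) if row]
            return max(len(rows) - 1, 0)  # assumption: first row is a header
        if suffix == ".jsonl":
            with p.open(encoding="utf-8") as f:
                return sum(1 for line in f if line.strip())
        raise ValueError(f"unsupported single-file type: {p.suffix}")

    subfolders = [child for child in p.iterdir() if child.is_dir()]
    if subfolders:
        return len(subfolders)  # Multiple Folders: one subfolder = one sample
    return sum(1 for child in p.iterdir() if child.is_file())  # Multiple Files
```

A result below 2 for a hypothetical path such as `count_samples("reviews.csv")` suggests adding more examples or switching to seedless mode.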
