# Create seed dataset

> Create a new seed dataset with uploaded files

You can upload one of the following types of datasets:

* **SINGLE\_FILE**: exactly one file containing all the samples
* **MULTI\_FILE**: multiple files, where each file is a separate sample
* **MULTI\_FOLDER**: multiple folders, where each folder is a separate sample containing only files (no nested folders)
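For MULTI\_FOLDER uploads, the request schema expresses folder membership as two parallel arrays, `files` and `folder_names`. The sketch below is a minimal Python helper that assumes a local layout of `root/<sample_folder>/<file>` and flattens it into those two arrays. The function name `collect_multi_folder` is hypothetical, and ordering files by name within each folder is an assumption. Because upload order inside a folder is preserved, reorder files if later ones depend on earlier ones.

```python
from pathlib import Path


def collect_multi_folder(root: str) -> tuple[list[Path], list[str]]:
    """Flatten root/<sample_folder>/<file> into parallel files/folder_names lists.

    files[i] belongs to the folder named folder_names[i]. Loose files directly
    under root are ignored. Within a folder, files are ordered by filename
    (an assumption: reorder if later files depend on earlier ones, because the
    upload order inside each folder is preserved).
    """
    files: list[Path] = []
    folder_names: list[str] = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for entry in sorted(folder.iterdir()):
            if entry.is_dir():
                # MULTI_FOLDER samples may contain only files, not nested folders.
                raise ValueError(f"nested folder not allowed: {entry}")
            files.append(entry)
            folder_names.append(folder.name)
    if len(set(folder_names)) < 2:
        # Folders that end up empty contribute no folder name at all.
        raise ValueError("MULTI_FOLDER needs at least 2 non-empty folders")
    return files, folder_names
```

The resulting lists map one-to-one onto the `files` and `folder_names` form fields; the file contents still have to be opened or read before upload.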

## File constraints by dataset type

| Type          | Allowed formats                | Size limit           | Count limit                            |
| ------------- | ------------------------------ | -------------------- | -------------------------------------- |
| SINGLE\_FILE  | CSV, JSON, JSONL               | 50MB                 | 1 file, min 2 rows                     |
| MULTI\_FILE   | TXT, MD, JSON, CSV, JSONL, PDF | 1MB/file, 50MB total | 2-1000 files                           |
| MULTI\_FOLDER | TXT, MD, JSON, CSV, JSONL, PDF | 1MB/file, 50MB total | 2-1000 files, 20/folder, min 2 folders |
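The limits in the table can be checked on the client before uploading. The following is a hypothetical pre-flight helper, not part of the DataFramer SDK. It assumes "MB" means 2^20 bytes and does not check the SINGLE\_FILE two-row minimum, which requires parsing the file. The server remains the authority on validation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: "MB" in the limits table read as 2**20 bytes
ALL_FORMATS = {"txt", "md", "json", "csv", "jsonl", "pdf"}
SINGLE_FILE_FORMATS = {"csv", "json", "jsonl"}


@dataclass
class Upload:
    name: str                     # filename, used to infer the format
    size_bytes: int
    folder: Optional[str] = None  # sample folder, MULTI_FOLDER only


def check_constraints(dataset_type: str, uploads: list[Upload]) -> list[str]:
    """Return violations of the documented limits; an empty list means none found."""
    if dataset_type not in ("SINGLE_FILE", "MULTI_FILE", "MULTI_FOLDER"):
        return [f"unknown dataset_type: {dataset_type}"]
    errors: list[str] = []
    allowed = SINGLE_FILE_FORMATS if dataset_type == "SINGLE_FILE" else ALL_FORMATS
    for u in uploads:
        ext = u.name.rsplit(".", 1)[1].lower() if "." in u.name else ""
        if ext not in allowed:
            errors.append(f"{u.name}: format not allowed for {dataset_type}")
    total = sum(u.size_bytes for u in uploads)
    if total > 50 * MB:
        errors.append(f"total size {total} bytes exceeds 50MB")
    if dataset_type == "SINGLE_FILE":
        if len(uploads) != 1:
            errors.append("SINGLE_FILE requires exactly 1 file")
        return errors  # the 2-row minimum needs parsing and is left to the server
    for u in uploads:
        if u.size_bytes > MB:
            errors.append(f"{u.name}: exceeds the 1MB per-file limit")
    if not 2 <= len(uploads) <= 1000:
        errors.append(f"{len(uploads)} files given; 2-1000 required")
    if dataset_type == "MULTI_FOLDER":
        per_folder = Counter(u.folder for u in uploads)
        if None in per_folder:
            errors.append("every MULTI_FOLDER file needs a folder name")
        if len(per_folder) < 2:
            errors.append("at least 2 folders required")
        for folder, count in per_folder.items():
            if count > 20:
                errors.append(f"folder {folder}: {count} files exceeds 20/folder")
    return errors
```

Returning a list of messages, rather than raising on the first problem, lets a caller report every violation at once before making a request that would come back as a 400.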


## OpenAPI

````yaml POST /api/dataframer/seed-datasets/create/
openapi: 3.0.0
info:
  title: DataFramer API
  version: 0.1.0
  description: ''
  termsOfService: https://www.aimon.ai/docs/privacy-policy.pdf
  contact:
    name: DataFramer Support
    email: info@dataframer.ai
  license:
    name: Proprietary
  x-logo:
    url: https://dataframer.ai/logo.png
    altText: DataFramer AI
  x-stainless:
    package-name: aimon-dataframer
    namespace:
      - aimon
      - dataframer
servers:
  - url: https://df-api.dataframer.ai
    description: Production server
security:
  - BearerAuth: []
tags:
  - name: Seed Datasets
    description: Manage seed datasets for generation
  - name: Specs
    description: Data specifications for sample generation
  - name: Runs
    description: Generation runs and results
  - name: Evaluations
    description: Evaluate generated sample quality
  - name: Red Teaming
    description: Security testing and adversarial prompts
  - name: Spec Creation
    description: Create specs from datasets or from scratch (seedless)
  - name: Generation
    description: Synthetic data generation
  - name: API Keys
    description: API key management and rotation
  - name: Health
    description: Health check endpoints
  - name: Models
    description: Available AI models
externalDocs:
  description: Complete API Guide
  url: https://docs.dataframer.ai/dataframer
paths:
  /api/dataframer/seed-datasets/create/:
    post:
      tags:
        - Seed Datasets
      summary: Create seed dataset
      description: >-
        Create a new seed dataset from uploaded files for the platform to learn from.

        After creation, use create_spec with the returned dataset ID.


        You can upload one of the following types of datasets:

        - **SINGLE_FILE**: exactly one file containing all the samples

        - **MULTI_FILE**: multiple files, where each file is a separate sample

        - **MULTI_FOLDER**: multiple folders, where each folder is a separate
        sample containing only files (no nested folders)


        ## File constraints by dataset type


        | Type | Allowed formats | Size limit | Count limit |

        |------|----------------|------------|-------------|

        | SINGLE_FILE | CSV, JSON, JSONL | 50MB | 1 file, min 2 rows |

        | MULTI_FILE | TXT, MD, JSON, CSV, JSONL, PDF | 1MB/file, 50MB total |
        2-1000 files |

        | MULTI_FOLDER | TXT, MD, JSON, CSV, JSONL, PDF | 1MB/file, 50MB total |
        2-1000 files, 20/folder, min 2 folders |
      operationId: api_dataframer_datasets_create_create
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              required:
                - name
                - dataset_type
                - files
              properties:
                name:
                  type: string
                  description: Dataset name (must be unique)
                description:
                  type: string
                  description: Optional dataset description
                dataset_type:
                  type: string
                  enum:
                    - SINGLE_FILE
                    - MULTI_FILE
                    - MULTI_FOLDER
                files:
                  type: array
                  items:
                    type: string
                    format: binary
                  description: >-
                    Files to upload. SINGLE_FILE: exactly 1 file. MULTI_FILE:
                    2-1000 files. MULTI_FOLDER: 2-1000 files, each with a
                    corresponding entry in folder_names.
                  minItems: 1
                folder_names:
                  type: array
                  items:
                    type: string
                  description: >-
                    Folder names for MULTI_FOLDER datasets. This array is
                    parallel to files: folder_names[i] names the folder that
                    files[i] belongs to (e.g., if files=['a.txt', 'b.txt'] and
                    folder_names=['doc1', 'doc2'], then a.txt goes in doc1 and
                    b.txt goes in doc2). At least 2 unique folder names are
                    required. File order within each folder is preserved as
                    uploaded, so upload files that later files depend on first.
                  minItems: 2
            encoding:
              files:
                style: form
                explode: true
                contentType: application/octet-stream
              folder_names:
                style: form
                explode: true
      responses:
        '201':
          description: Seed dataset created successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DataframerDataset'
              examples:
                dataset_created:
                  value:
                    id: 550e8400-e29b-41d4-a716-446655440000
                    name: Customer Reviews 2025
                    description: Product reviews with files
                    dataset_type: SINGLE_FILE
                    file_count: 1
                    folder_count: 0
                    created_at: '2025-01-15T10:30:00Z'
                    updated_at: '2025-01-15T10:30:00Z'
                    created_by_email: sarah.chen@acme.com
        '400':
          description: Bad Request - Invalid data, missing files, or validation errors
        '401':
          description: Unauthorized
      x-codeSamples:
        - lang: JavaScript
          source: |-
            import fs from 'fs';
            import Dataframer from 'dataframer';

            const client = new Dataframer({
              apiKey: process.env['DATAFRAMER_API_KEY'], // This is the default and can be omitted
            });

            // SINGLE_FILE datasets take exactly one file.
            const response = await client.dataframer.seedDatasets.createWithFiles({
              dataset_type: 'SINGLE_FILE',
              files: [fs.createReadStream('path/to/file.csv')],
              name: 'name',
            });

            console.log(response.id);
        - lang: Python
          source: |-
            import os
            from dataframer import Dataframer

            client = Dataframer(
                api_key=os.environ.get("DATAFRAMER_API_KEY"),  # This is the default and can be omitted
            )

            # SINGLE_FILE datasets take exactly one file.
            with open("path/to/file.csv", "rb") as f:
                response = client.dataframer.seed_datasets.create_with_files(
                    dataset_type="SINGLE_FILE",
                    files=[f],
                    name="name",
                )
            print(response.id)
components:
  schemas:
    DataframerDataset:
      type: object
      properties:
        id:
          type: string
          format: uuid
          readOnly: true
          description: Unique identifier for the dataset
        name:
          type: string
          description: Dataset name
        description:
          type: string
          nullable: true
          description: Optional description of the dataset contents or purpose
        dataset_type:
          type: string
          enum:
            - SINGLE_FILE
            - MULTI_FILE
            - MULTI_FOLDER
          description: >-
            Type of dataset structure. SINGLE_FILE: one CSV/JSON/JSONL file
            containing all samples. MULTI_FILE: multiple individual files, each
            one sample. MULTI_FOLDER: files organized into folders, where each
            folder represents one sample.
        created_at:
          type: string
          format: date-time
          readOnly: true
          description: Timestamp when the dataset was created
        updated_at:
          type: string
          format: date-time
          readOnly: true
          description: Timestamp when the dataset was last modified
        created_by_email:
          type: string
          readOnly: true
          description: Email address of the user who created the dataset
        files:
          type: array
          items:
            $ref: '#/components/schemas/File'
          readOnly: true
          description: List of all files in the dataset
        folder_count:
          type: integer
          readOnly: true
          description: Total number of folders in the dataset
        file_count:
          type: integer
          readOnly: true
          description: Total number of files in the dataset
        sample_count:
          type: integer
          nullable: true
          readOnly: true
          description: >-
            Number of data samples in the dataset. Only populated for
            SINGLE_FILE datasets (e.g. number of rows in a CSV).
    File:
      type: object
      properties:
        id:
          type: string
          format: uuid
          readOnly: true
          description: Unique identifier for the file
        file_type:
          type: string
          enum:
            - json
            - jsonl
            - csv
            - md
            - txt
            - pdf
          description: >-
            File format. json: single JSON object or array. jsonl:
            newline-delimited JSON records. csv: comma-separated values. md:
            Markdown text. txt: plain text. pdf: PDF document.
        size_bytes:
          type: integer
          nullable: true
          description: File size in bytes
        sha256:
          type: string
          nullable: true
          description: SHA-256 hash of the file contents for integrity verification
        path:
          type: string
          readOnly: true
          description: >-
            Full path of the file. For files in folders, includes folder name
            (e.g., 'folder_name/file.txt'). For files at root level, just the
            filename.
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API Key
      description: 'API Key authentication. Format: "Bearer YOUR_API_KEY"'

````
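Without an SDK, the endpoint is a `multipart/form-data` POST authenticated with a Bearer token. The following standard-library sketch assumes the production server URL from the spec and the `DATAFRAMER_API_KEY` environment variable used in the SDK samples; the helper names `build_multipart`, `create_seed_dataset_request`, and `send` are hypothetical. Because the schema's encoding sets `explode: true`, each file and each folder name is sent as its own repeated form part.

```python
import json
import os
import urllib.request
import uuid
from typing import Optional

# Production server and path from the OpenAPI spec above.
API_URL = "https://df-api.dataframer.ai/api/dataframer/seed-datasets/create/"


def build_multipart(fields, files):
    """Encode (key, value) fields and (filename, bytes) files as multipart/form-data.

    Repeated keys become repeated parts, matching the spec's `explode: true`.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for key, value in fields:
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{key}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))
    for filename, data in files:
        head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(head.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def create_seed_dataset_request(
    name: str,
    dataset_type: str,
    files: list[tuple[str, bytes]],
    folder_names: Optional[list[str]] = None,
    description: Optional[str] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the create endpoint."""
    fields = [("name", name), ("dataset_type", dataset_type)]
    if description is not None:
        fields.append(("description", description))
    # folder_names[i] labels files[i]; both lists keep the same order.
    fields += [("folder_names", folder) for folder in (folder_names or [])]
    body, content_type = build_multipart(fields, files)
    key = api_key if api_key is not None else os.environ["DATAFRAMER_API_KEY"]
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {key}", "Content-Type": content_type},
    )


def send(req: urllib.request.Request) -> dict:
    """Perform the network call; a 201 body is a DataframerDataset JSON object.

    urllib raises HTTPError for 400 and 401 responses.
    """
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Building the request separately from sending it keeps the encoding inspectable offline. On success, `send(req)["id"]` is the dataset ID to pass to create_spec.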