POST /api/dataframer/seed-datasets/create-from-zip
cURL
curl --request POST \
  --url https://df-api.dataframer.ai/api/dataframer/seed-datasets/create-from-zip/ \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'name=<string>' \
  --form zip_file='@example-file' \
  --form 'description=<string>'
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Support Ticket Conversations",
  "description": "Customer support chat logs organized by ticket",
  "dataset_type": "MULTI_FOLDER",
  "file_count": 10,
  "folder_count": 2,
  "created_at": "2025-01-15T10:30:00Z",
  "updated_at": "2025-01-15T10:30:00Z",
  "created_by_email": "[email protected]",
  "short_sample_compatibility": {
    "is_short_samples_compatible": false,
    "is_long_samples_compatible": true,
    "reason": null
  }
}
The system automatically detects the dataset type based on ZIP structure:
  • SINGLE_FILE: ZIP contains exactly one file containing all the samples
  • MULTI_FILE: ZIP contains multiple files at root level, where each file is a separate sample
  • MULTI_FOLDER: ZIP contains multiple folders, where each folder is a separate sample and contains only files (no nested folders)
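
The rules above can be approximated on the client before uploading. The following is a minimal sketch, not the service's actual detection logic: the function name `guess_dataset_type` is hypothetical, and it assumes that a ZIP with exactly one file counts as SINGLE_FILE wherever that file sits, and that MULTI_FOLDER means every file is exactly one folder deep.

```python
import zipfile
from pathlib import PurePosixPath


def guess_dataset_type(zip_path):
    """Guess the dataset type from a ZIP's layout (client-side approximation)."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries; keep only real files.
        files = [PurePosixPath(i.filename) for i in zf.infolist() if not i.is_dir()]

    if not files:
        raise ValueError("ZIP contains no files")
    if len(files) == 1:
        # Assumption: a lone file is SINGLE_FILE regardless of its folder.
        return "SINGLE_FILE"

    depths = {len(p.parts) for p in files}
    if depths == {1}:
        # Every file sits at the ZIP root.
        return "MULTI_FILE"
    if depths == {2}:
        # Every file sits exactly one folder deep, so no folder is nested.
        folders = {p.parts[0] for p in files}
        if len(folders) >= 2:
            return "MULTI_FOLDER"
    raise ValueError("layout does not match any documented dataset type")
```

A layout that mixes root-level files with folders, or nests folders, raises `ValueError` here; how the API itself responds to such a ZIP is not stated on this page.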

File constraints by dataset type

| Type | Allowed formats | Size limit | Count limit |
| --- | --- | --- | --- |
| SINGLE_FILE | CSV, JSON, JSONL | 50 MB | 1 file, min 2 rows |
| MULTI_FILE | TXT, MD, JSON, CSV, JSONL, PDF | 1 MB/file, 50 MB total | 2-1000 files |
| MULTI_FOLDER | TXT, MD, JSON, CSV, JSONL, PDF | 1 MB/file, 50 MB total | 2-1000 files, 20/folder, min 2 folders |
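
These limits can be checked locally before an upload is attempted. The sketch below is one possible pre-flight check, with several assumptions that this page does not confirm: 1 MB is taken as 2**20 bytes, sizes are measured uncompressed, "20/folder" is read as at most 20 files per folder, and formats are judged by file extension. The helper name `check_zip` is hypothetical. It does not verify the folder layout or the 2-row minimum for SINGLE_FILE.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

MB = 1024 * 1024  # assumption: the page does not say whether 1 MB is 10**6 or 2**20 bytes

TABULAR = {".csv", ".json", ".jsonl"}
DOCUMENT = {".txt", ".md", ".json", ".csv", ".jsonl", ".pdf"}


def check_zip(zip_path, dataset_type):
    """Return a list of constraint violations found in the ZIP (empty if none)."""
    with zipfile.ZipFile(zip_path) as zf:
        entries = [i for i in zf.infolist() if not i.is_dir()]

    problems = []
    allowed = TABULAR if dataset_type == "SINGLE_FILE" else DOCUMENT
    for e in entries:
        if PurePosixPath(e.filename).suffix.lower() not in allowed:
            problems.append(f"{e.filename}: format not allowed for {dataset_type}")

    # Assumption: limits apply to uncompressed sizes.
    if sum(e.file_size for e in entries) > 50 * MB:
        problems.append("total size exceeds 50 MB")

    if dataset_type == "SINGLE_FILE":
        if len(entries) != 1:
            problems.append("SINGLE_FILE requires exactly 1 file")
        return problems

    for e in entries:
        if e.file_size > 1 * MB:
            problems.append(f"{e.filename}: larger than 1 MB")
    if not 2 <= len(entries) <= 1000:
        problems.append("file count must be between 2 and 1000")

    if dataset_type == "MULTI_FOLDER":
        # Count files per top-level folder.
        per_folder = Counter(PurePosixPath(e.filename).parts[0] for e in entries)
        if len(per_folder) < 2:
            problems.append("MULTI_FOLDER requires at least 2 folders")
        for folder, count in per_folder.items():
            if count > 20:  # assumption: "20/folder" means max 20 files per folder
                problems.append(f"{folder}: more than 20 files")
    return problems
```

An empty list means none of the checked limits were exceeded; the server remains the authority on whether the upload is accepted.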

Authorizations

Authorization
string
header
required

API Key authentication. Format: "Bearer YOUR_API_KEY"

Body

multipart/form-data
name
string
required

Dataset name (unique within company)

zip_file
file
required

ZIP file containing dataset files

description
string

Optional dataset description
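
For callers not using cURL, the same request can be assembled with the Python standard library. This is a minimal sketch under stated assumptions: `build_request` is a hypothetical helper, the ZIP part is labelled `application/zip` (the page does not say which part content type is expected), and field values are assumed to contain no double quotes or line breaks. The URL and field names are taken from the cURL example on this page.

```python
import urllib.request
import uuid

URL = "https://df-api.dataframer.ai/api/dataframer/seed-datasets/create-from-zip/"


def _text_part(boundary, key, value):
    """Encode one plain form field as a multipart section."""
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{key}"\r\n'
        "\r\n"
        f"{value}\r\n"
    ).encode("utf-8")


def build_request(api_key, name, zip_bytes, zip_filename, description=None):
    """Build (but do not send) the multipart POST for this endpoint."""
    boundary = uuid.uuid4().hex
    parts = [_text_part(boundary, "name", name)]
    if description is not None:
        # description is optional, so it is only sent when provided.
        parts.append(_text_part(boundary, "description", description))

    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="zip_file"; filename="{zip_filename}"\r\n'
        "Content-Type: application/zip\r\n"  # assumption: not specified by the page
        "\r\n"
    ).encode("utf-8")
    parts.append(file_header + zip_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Building it separately keeps the encoding step easy to inspect without any network traffic.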

Response

Seed dataset created successfully

id
string<uuid>

Unique identifier for the dataset

name
string

Dataset name

description
string | null

Optional description of the dataset contents or purpose

dataset_type
enum<string>

Type of dataset structure. SINGLE_FILE: one CSV/JSON/JSONL file with tabular data. MULTI_FILE: multiple individual files, each one a separate sample. MULTI_FOLDER: files organized into folders where each folder represents one sample.

Available options: SINGLE_FILE, MULTI_FILE, MULTI_FOLDER

created_at
string<date-time>

Timestamp when the dataset was created

updated_at
string<date-time>

Timestamp when the dataset was last modified

created_by_email
string

Email address of the user who created the dataset

files
object[]

List of all files in the dataset

folder_count
integer

Total number of folders in the dataset

file_count
integer

Total number of files in the dataset

short_sample_compatibility
object

Information about which generation modes are compatible with this dataset
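
A client will typically want the response as a typed object instead of a raw dictionary. The sketch below shows one way to do that for a subset of the fields listed above. The `SeedDataset` class and `parse_seed_dataset` function are hypothetical names, and the choice to reject unknown `dataset_type` values is a design assumption, not documented behavior.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

DATASET_TYPES = {"SINGLE_FILE", "MULTI_FILE", "MULTI_FOLDER"}


@dataclass
class SeedDataset:
    id: str
    name: str
    description: Optional[str]
    dataset_type: str
    file_count: int
    folder_count: int
    created_at: datetime


def parse_seed_dataset(body):
    """Parse the JSON response body into a SeedDataset."""
    data = json.loads(body)
    if data["dataset_type"] not in DATASET_TYPES:
        raise ValueError(f"unexpected dataset_type: {data['dataset_type']}")
    return SeedDataset(
        id=data["id"],
        name=data["name"],
        description=data.get("description"),  # documented as string | null
        dataset_type=data["dataset_type"],
        file_count=data["file_count"],
        folder_count=data["folder_count"],
        # Rewrite a trailing "Z" so fromisoformat also works before Python 3.11.
        created_at=datetime.fromisoformat(data["created_at"].replace("Z", "+00:00")),
    )
```

Fields such as `files` and `short_sample_compatibility` are left out here because their inner structure is not fully described on this page.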