Overview
This tutorial walks you through creating a dataset in Dataframer. You’ll learn how to prepare your data, choose the right dataset type, and upload files.
What You’ll Learn
How to prepare your data files
How to choose the correct dataset type
How to upload files via the API or UI
How to verify dataset creation
Prerequisites
API key (see Authentication)
Sample data files in supported formats (CSV, JSON, JSONL, TXT, PDF, or MD)
Step 1: Prepare Your Data
Choose Your Dataset Type
SINGLE_FILE: One file containing multiple records
Example: customers.csv with 100 customer records
MULTI_FILE: Multiple independent files
Example: 50 customer review text files
MULTI_FOLDER: Multiple folders, each containing related files
Example: Patient records where each folder = one patient
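If you are unsure which type fits, the on-disk layout of your data is usually a good guide. The sketch below maps a local path to one of the three type names above. `suggest_dataset_type` is a hypothetical helper written for this tutorial, not part of the Dataframer SDK, and skipping hidden files is an assumption about what you would want to upload.

```python
from pathlib import Path


def suggest_dataset_type(path: Path) -> str:
    """Suggest a dataset type from how the data is laid out on disk.

    Hypothetical helper for illustration; not part of the Dataframer SDK.
    """
    if path.is_file():
        # One file holding many records, e.g. customers.csv
        return "SINGLE_FILE"
    if not path.is_dir():
        raise ValueError(f"{path} is neither a file nor a directory")

    # Ignore hidden entries such as .DS_Store (an assumption)
    entries = [p for p in path.iterdir() if not p.name.startswith(".")]
    if not entries:
        raise ValueError(f"{path} is empty")
    if all(p.is_dir() for p in entries):
        # Every entry is a folder of related files, e.g. one folder per patient
        return "MULTI_FOLDER"
    if all(p.is_file() for p in entries):
        # A flat directory of independent files, e.g. review text files
        return "MULTI_FILE"
    raise ValueError(f"{path} mixes files and folders at the top level")
```

For example, `suggest_dataset_type(Path("./reviews"))` returns "MULTI_FILE" when the directory holds only files.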
File Requirements
Ensure your files meet these requirements:
Encoding: UTF-8
Size: < 100 MB per file
Format: Valid file format (no corruption)
Naming: Use alphanumeric characters and underscores
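You can check these requirements locally before uploading. The following is a minimal sketch, not an official validator: `check_file` is a hypothetical helper, "100 MB" is assumed to mean 100 × 1024 × 1024 bytes, and the UTF-8 check is skipped for PDF on the assumption that the encoding rule applies to text formats only.

```python
import re
from pathlib import Path

MAX_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" means 100 MiB
SUPPORTED = {".csv", ".json", ".jsonl", ".txt", ".pdf", ".md"}
TEXT_FORMATS = SUPPORTED - {".pdf"}  # PDF is binary, so skip the UTF-8 check
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def check_file(path: Path) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    problems = []
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if not NAME_PATTERN.fullmatch(path.stem):
        problems.append("name should use only letters, digits, and underscores")
    if path.stat().st_size >= MAX_BYTES:
        problems.append("file is 100 MB or larger")
    if suffix in TEXT_FORMATS:
        try:
            with path.open(encoding="utf-8") as handle:
                for _ in handle:  # decode line by line to keep memory use low
                    pass
        except UnicodeDecodeError:
            problems.append("not valid UTF-8")
    return problems
```

Calling `check_file(Path("customers.csv"))` returns an empty list when the file passes every check. Corruption is not detected here; opening the file in its native application remains the simplest test for that.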
Step 2: Create a Single-File Dataset
Via API
curl -X POST 'https://df-api.dataframer.ai/api/dataframer/datasets/create/' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'name=Customer Database' \
-F 'dataset_type=SINGLE_FILE' \
-F 'description=Main customer database export' \
-F 'file=@customers.csv'
Response:
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Customer Database",
  "dataset_type": "SINGLE_FILE",
  "description": "Main customer database export",
  "created_at": "2025-11-26T10:00:00Z",
  "file_count": 1
}
Via Python
from dataframer import Dataframer

# Initialize client (reads DATAFRAMER_API_KEY from environment)
# Or explicitly: client = Dataframer(api_key="your_api_key")
client = Dataframer()

# Create dataset with file; the with block closes the file after the upload
with open("customers.csv", "rb") as csv_file:
    dataset = client.dataframer.datasets.create_with_files(
        name="Customer Database",
        dataset_type="SINGLE_FILE",
        description="Main customer database export",
        file=csv_file,
    )

print(f"Created dataset: {dataset.id}")
Via UI
Log in to https://app.aimon.ai
Navigate to Datasets → Create New
Enter dataset name: “Customer Database”
Select type: Single File
Add description (optional)
Click Choose File and select customers.csv
Click Create Dataset
Step 3: Create a Multi-File Dataset
When you have multiple independent files:
Python Example
from contextlib import ExitStack
from pathlib import Path
from dataframer import Dataframer

# Initialize client (reads DATAFRAMER_API_KEY from environment)
# Or explicitly: client = Dataframer(api_key="your_api_key")
client = Dataframer()

# Prepare files (sorted for a stable upload order)
review_files = sorted(Path("./reviews").glob("*.txt"))

# ExitStack closes every opened file, even if the upload raises
with ExitStack() as stack:
    files = [stack.enter_context(open(f, "rb")) for f in review_files]
    # Create dataset with multiple files
    dataset = client.dataframer.datasets.create_with_files(
        name="Customer Reviews",
        dataset_type="MULTI_FILE",
        description="Product review collection",
        files=files,
    )

print(f"Created dataset with {dataset.file_count} files")
Step 4: Create a Multi-Folder Dataset
For related files grouped in folders:
Prepare Folder Structure
patient_records/
├── patient_001/
│   ├── demographics.json
│   ├── lab_results.csv
│   └── doctor_notes.txt
├── patient_002/
│   ├── demographics.json
│   ├── lab_results.csv
│   └── doctor_notes.txt
└── patient_003/
    ├── demographics.json
    ├── lab_results.csv
    └── doctor_notes.txt
Create ZIP File
# Create ZIP of folder structure
cd patient_records
zip -r ../patient_records.zip .
cd ..
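If the `zip` command is not available (for example on Windows), the same archive can be built with Python's standard library. This is a sketch of one way to do it; `zip_folder_contents` is a hypothetical helper name. Passing `root_dir` to `shutil.make_archive` stores paths relative to the source folder, so the patient folders sit at the ZIP root just as they do with the shell commands above.

```python
import shutil
from pathlib import Path


def zip_folder_contents(source_dir: str, archive_stem: str) -> Path:
    """Zip the contents of source_dir so its subfolders sit at the ZIP root.

    archive_stem is the output path without the .zip extension. Keep it
    outside source_dir so the archive does not try to include itself.
    """
    # root_dir makes archive paths relative to source_dir, like `cd` + `zip -r .`
    archive_path = shutil.make_archive(archive_stem, "zip", root_dir=source_dir)
    return Path(archive_path)
```

Calling `zip_folder_contents("patient_records", "patient_records")` from the parent directory writes patient_records.zip next to the patient_records folder.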
Upload ZIP
curl -X POST 'https://df-api.dataframer.ai/api/dataframer/datasets/create/' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'name=Patient Records' \
-F 'dataset_type=MULTI_FOLDER' \
-F 'description=Anonymized patient medical records' \
-F 'file=@patient_records.zip'
Python Example
from dataframer import Dataframer

# Initialize client (reads DATAFRAMER_API_KEY from environment)
# Or explicitly: client = Dataframer(api_key="your_api_key")
client = Dataframer()

# Upload ZIP file - backend auto-detects MULTI_FOLDER structure
with open("patient_records.zip", "rb") as zip_file:
    dataset = client.dataframer.datasets.create_from_zip(
        name="Patient Records",
        description="Anonymized patient medical records",
        zip_file=zip_file,
    )

print(f"Created dataset: {dataset.id}")
print(f"Type: {dataset.dataset_type} (auto-detected)")
print(f"Files: {dataset.file_count} | Folders: {dataset.folder_count}")
For MULTI_FOLDER datasets, upload a single ZIP file containing the folder structure.
Step 5: Verify Dataset Creation
Check that your dataset was created successfully:
curl -X GET 'https://df-api.dataframer.ai/api/dataframer/datasets/550e8400-e29b-41d4-a716-446655440000/' \
-H 'Authorization: Bearer YOUR_API_KEY'
Response:
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Customer Database",
  "dataset_type": "SINGLE_FILE",
  "description": "Main customer database export",
  "created_at": "2025-11-26T10:00:00Z",
  "updated_at": "2025-11-26T10:00:00Z",
  "file_count": 1,
  "files": [
    {
      "id": "file_abc123",
      "name": "customers.csv",
      "file_type": "CSV",
      "size": 1048576
    }
  ]
}
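To automate this check, you can parse the response body and compare it with what you uploaded. The sketch below uses only the fields shown in the sample response. `summarize_dataset` is a hypothetical helper, and treating a mismatch between `file_count` and the length of `files` as an error is an assumption based on that sample, not documented API behavior.

```python
import json


def summarize_dataset(payload: str) -> dict:
    """Parse a dataset response and check that file_count matches the file list."""
    data = json.loads(payload)
    files = data.get("files", [])
    # Assumption: file_count should equal the number of entries in "files"
    if data["file_count"] != len(files):
        raise ValueError(
            f"file_count is {data['file_count']} but {len(files)} files are listed"
        )
    return {
        "id": data["id"],
        "dataset_type": data["dataset_type"],
        "file_names": [f["name"] for f in files],
        "total_bytes": sum(f["size"] for f in files),
    }
```

Pass the raw response text from the GET request to `summarize_dataset`, then compare `file_names` against the files you intended to upload.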
Step 6: Add More Files (Optional)
Add additional files to an existing dataset:
curl -X POST 'https://df-api.dataframer.ai/api/dataframer/datasets/550e8400-e29b-41d4-a716-446655440000/add_files/' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'file=@additional_data.csv'
You cannot add folders to MULTI_FILE datasets. Create a new MULTI_FOLDER dataset instead.
Common Issues
File upload fails
Possible causes:
File exceeds size limit (100 MB)
File is corrupted
Incorrect file format
Not UTF-8 encoded
Solution:
Check file size: ls -lh yourfile.csv
Verify file opens correctly
Ensure proper file extension
Convert to UTF-8: iconv -f ISO-8859-1 -t UTF-8 input.csv > output.csv
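Where `iconv` is unavailable, the conversion in the last step can be done in Python instead. This is a minimal sketch that assumes you know the source encoding (ISO-8859-1 here, matching the command above); `convert_to_utf8` is a hypothetical helper name.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> None:
    """Re-encode a text file to UTF-8, mirroring the iconv command above."""
    # newline="" keeps the original line endings untouched
    with src.open(encoding=source_encoding, newline="") as reader:
        with dst.open("w", encoding="utf-8", newline="") as writer:
            for line in reader:  # stream line by line so large files fit in memory
                writer.write(line)
```

For example, `convert_to_utf8(Path("input.csv"), Path("output.csv"))` writes a UTF-8 copy and leaves the original file unchanged.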
Wrong dataset type chosen
Problem: Created SINGLE_FILE but need MULTI_FILE
Solution:
Delete the dataset
Create new dataset with correct type
Re-upload files
Dataset type cannot be changed after creation.
ZIP file rejected for MULTI_FOLDER
Possible causes:
ZIP doesn’t contain folders at root level
Empty folders in ZIP
Incorrect folder structure
Solution:
Ensure ZIP root contains folders (not files)
Remove empty folders
Verify structure: unzip -l yourfile.zip
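The same structural checks can be scripted with Python's `zipfile` module before you upload. The sketch below flags files at the ZIP root and folders with no files beneath them, which correspond to the causes listed above. `check_multi_folder_zip` is a hypothetical helper, and the backend may apply rules beyond these.

```python
import zipfile
from pathlib import PurePosixPath


def check_multi_folder_zip(zip_path: str) -> list[str]:
    """Return structural problems that could get a MULTI_FOLDER ZIP rejected."""
    problems = []
    dirs = set()       # explicit directory entries in the archive
    non_empty = set()  # every folder with a file somewhere beneath it
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            path = PurePosixPath(info.filename)
            if info.is_dir():
                dirs.add(path)
                continue
            if len(path.parts) == 1:
                problems.append(f"file at ZIP root: {path}")
            non_empty.update(path.parents)
    for folder in sorted(dirs - non_empty):
        problems.append(f"empty folder: {folder}")
    # A valid archive needs at least one top-level folder
    if not any(len(p.parts) == 1 for p in non_empty | dirs):
        problems.append("no folders at ZIP root")
    return problems
```

An empty list from `check_multi_folder_zip("patient_records.zip")` means none of these problems were found.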
Best Practices
✅ Name datasets descriptively: Use clear names that indicate content
✅ Add descriptions: Include purpose, date range, or other context
✅ Verify file quality: Check files open and display correctly
✅ Use consistent formats: Keep file formats consistent within a dataset
✅ Test with small datasets: Start with 5-10 samples for initial testing
Next Steps
Now that you’ve created a dataset, you can: