Generate synthetic data samples

JavaScript

import Dataframer from 'dataframer';

const client = new Dataframer({
  apiKey: process.env['DATAFRAMER_API_KEY'], // This is the default and can be omitted
});

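// Start a generation run from an existing spec.
// Top-level await requires an ES module (for example, a .mjs file).
const generate = await client.dataframer.generate.create({
  generation_model: 'anthropic/claude-opus-4-5',
  number_of_samples: 1,
  spec_id: '182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e',
});

// The run ID is used to retrieve results later.
console.log(generate.run_id);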

Example response:

{
  "task_id": "<string>",
  "status": "ACCEPTED",
  "run_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a"
}

POST /api/dataframer/generate


Authorizations

Authorization (string, header, required)
API key authentication. Format: "Bearer YOUR_API_KEY"
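For callers not using the SDK, the same call can presumably be made over plain HTTP with the Bearer header described above. This is a minimal sketch under stated assumptions: the path /api/dataframer/generate is taken from the endpoint shown at the top of this page, but this reference does not give the API host, so `baseUrl` is a placeholder you must supply. `buildGenerateRequest` and `startGeneration` are hypothetical helper names, and `startGeneration` needs a runtime with a global `fetch` (Node.js 18+ or a browser).

```javascript
// Hypothetical helper: builds the URL and fetch options for the generate endpoint.
// Assumption: `baseUrl` is a placeholder for your DataFramer API host.
function buildGenerateRequest(baseUrl, apiKey, body) {
  if (!apiKey) {
    throw new Error('An API key is required');
  }
  // Strip trailing slashes so the path joins cleanly.
  const url = `${baseUrl.replace(/\/+$/, '')}/api/dataframer/generate`;
  return {
    url,
    init: {
      method: 'POST',
      headers: {
        // Documented format: "Bearer YOUR_API_KEY"
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON response.
async function startGeneration(baseUrl, apiKey, body) {
  const { url, init } = buildGenerateRequest(baseUrl, apiKey, body);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Generation request failed with HTTP ${res.status}`);
  }
  return res.json(); // expected shape: { task_id, status, run_id }
}
```

Separating request construction from sending keeps the header and body logic testable without network access.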

Body

Content type: application/json

spec_id (string<uuid>, required)
ID of the spec to use for generation.

number_of_samples (integer, required)
Number of samples to generate. Required range: 1 <= x <= 20000.

generation_model (enum<string>, required)
AI model to use for generation.
Available options: anthropic/claude-opus-4-5, anthropic/claude-opus-4-5-thinking, anthropic/claude-sonnet-4-5, anthropic/claude-sonnet-4-5-thinking, anthropic/claude-haiku-4-5, deepseek-ai/DeepSeek-V3.1, moonshotai/Kimi-K2-Instruct, openai/gpt-oss-120b, deepseek-ai/DeepSeek-R1-0528-tput, Qwen/Qwen2.5-72B-Instruct-Turbo
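The same ten model IDs are listed for generation_model, evaluation_model, outline_model, and revision_model, so a client could keep them in one constant and reject typos before sending a request. A sketch only: `MODEL_OPTIONS` and `assertModel` are hypothetical names, exact case-sensitive matching is an assumption, and the list has to be kept in sync with this reference by hand.

```javascript
// Model IDs listed in this reference for all four *_model fields.
const MODEL_OPTIONS = Object.freeze([
  'anthropic/claude-opus-4-5',
  'anthropic/claude-opus-4-5-thinking',
  'anthropic/claude-sonnet-4-5',
  'anthropic/claude-sonnet-4-5-thinking',
  'anthropic/claude-haiku-4-5',
  'deepseek-ai/DeepSeek-V3.1',
  'moonshotai/Kimi-K2-Instruct',
  'openai/gpt-oss-120b',
  'deepseek-ai/DeepSeek-R1-0528-tput',
  'Qwen/Qwen2.5-72B-Instruct-Turbo',
]);

// Hypothetical helper: returns `value` unchanged if it is a listed model ID,
// otherwise throws an error naming the field and the allowed options.
function assertModel(field, value) {
  if (!MODEL_OPTIONS.includes(value)) {
    throw new Error(`${field} must be one of: ${MODEL_OPTIONS.join(', ')}`);
  }
  return value;
}
```

For example, `assertModel('generation_model', 'anthropic/claude-haiku-4-5')` returns the ID unchanged, while a near miss such as `'anthropic/claude-opus-4.5'` throws.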

spec_version_id (string<uuid>, optional)
Specific spec version ID to use. Defaults to the latest version.

evaluation_model (enum<string>)
AI model for evaluation (short samples only).
Available options: anthropic/claude-opus-4-5, anthropic/claude-opus-4-5-thinking, anthropic/claude-sonnet-4-5, anthropic/claude-sonnet-4-5-thinking, anthropic/claude-haiku-4-5, deepseek-ai/DeepSeek-V3.1, moonshotai/Kimi-K2-Instruct, openai/gpt-oss-120b, deepseek-ai/DeepSeek-R1-0528-tput, Qwen/Qwen2.5-72B-Instruct-Turbo

outline_model (enum<string>)
AI model for outline generation (long samples only).
Available options: anthropic/claude-opus-4-5, anthropic/claude-opus-4-5-thinking, anthropic/claude-sonnet-4-5, anthropic/claude-sonnet-4-5-thinking, anthropic/claude-haiku-4-5, deepseek-ai/DeepSeek-V3.1, moonshotai/Kimi-K2-Instruct, openai/gpt-oss-120b, deepseek-ai/DeepSeek-R1-0528-tput, Qwen/Qwen2.5-72B-Instruct-Turbo

revision_model (enum<string>)
AI model for revisions (long samples only).
Available options: anthropic/claude-opus-4-5, anthropic/claude-opus-4-5-thinking, anthropic/claude-sonnet-4-5, anthropic/claude-sonnet-4-5-thinking, anthropic/claude-haiku-4-5, deepseek-ai/DeepSeek-V3.1, moonshotai/Kimi-K2-Instruct, openai/gpt-oss-120b, deepseek-ai/DeepSeek-R1-0528-tput, Qwen/Qwen2.5-72B-Instruct-Turbo

enable_revisions (boolean, default: false)
Enable revision cycles.

sample_type (enum<string>, default: short)
Type of samples to generate.
Available options: short, long

max_iterations (integer)
Maximum number of feedback iterations (short samples only). Required range: 0 <= x <= 20.

staged_generation (boolean)
Use the staged generation approach (short samples only).

use_historical_feedback (boolean)
Use historical feedback (short samples only).

num_examples_in_prompt (integer)
Number of examples to include in the prompt (short samples only). Required range: 1 <= x <= 50.
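As an illustration of the short-sample options described so far, here is a hypothetical request body. `buildShortSampleParams` is an illustrative name, and the values are chosen only to sit inside the documented ranges; they are not tuning recommendations.

```javascript
// Hypothetical short-sample request body using the "short samples only" options.
// Field names come from this reference; the values are illustrative.
function buildShortSampleParams(specId) {
  return {
    spec_id: specId,
    number_of_samples: 100, // documented range: 1-20000
    generation_model: 'anthropic/claude-sonnet-4-5',
    sample_type: 'short', // also the documented default
    evaluation_model: 'anthropic/claude-haiku-4-5',
    max_iterations: 3, // feedback iterations, documented range: 0-20
    staged_generation: false,
    use_historical_feedback: true,
    num_examples_in_prompt: 5, // documented range: 1-50
  };
}
```

With the `client` from the example at the top of this page, this body would presumably be sent as `await client.dataframer.generate.create(buildShortSampleParams(specId))`, since that example passes body fields by their snake_case names.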

max_revision_cycles (integer)
Maximum number of revision cycles (long samples only). Required range: 1 <= x <= 5.

generation_thinking_budget (integer)
Thinking budget for the generation model, in tokens. Required range: x >= 1024.

evaluation_thinking_budget (integer)
Thinking budget for the evaluation model, in tokens (short samples only). Required range: x >= 1024.

outline_thinking_budget (integer)
Thinking budget for the outline model, in tokens (long samples only). Required range: x >= 1024.

revision_thinking_budget (integer)
Thinking budget for the revision model, in tokens (long samples only). Required range: x >= 1024.
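A hypothetical long-sample request body that combines the model, revision, and thinking-budget fields above. This reference does not say whether budgets are honored or ignored for models without the "-thinking" suffix, so this sketch only sets budgets alongside "-thinking" variants. It also does not say whether max_revision_cycles takes effect without enable_revisions, so the sketch sets both. `buildLongSampleParams` is an illustrative name and the values are not recommendations.

```javascript
// Hypothetical long-sample request body. Field names come from this reference;
// the values are illustrative and sit inside the documented ranges.
function buildLongSampleParams(specId) {
  return {
    spec_id: specId,
    number_of_samples: 10,
    sample_type: 'long',
    // "-thinking" model variants paired with token budgets (each must be >= 1024).
    generation_model: 'anthropic/claude-sonnet-4-5-thinking',
    generation_thinking_budget: 4096,
    outline_model: 'anthropic/claude-opus-4-5-thinking',
    outline_thinking_budget: 2048,
    // Revisions are enabled explicitly, since enable_revisions defaults to false.
    enable_revisions: true,
    revision_model: 'anthropic/claude-sonnet-4-5-thinking',
    revision_thinking_budget: 2048,
    max_revision_cycles: 2, // documented range: 1-5
  };
}
```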

seed_shuffling_level (enum<string>, default: sample)
Seed shuffling level for long samples. Controls the trade-off between prompt caching efficiency and data diversity.
Available options: none, sample, field, prompt

sql_validation_level (enum<string>, default: syntax+schema+execute)
SQL validation level for long samples with SQL content.
Available options: syntax, syntax+schema, syntax+schema+execute

max_examples_in_prompt (integer)
Maximum number of seed examples to include in prompts (long samples only). If not set, all seeds are used, subject to token limits. Required range: x >= 1.
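Several body fields carry numeric ranges, and many are marked "short samples only" or "long samples only". A client could run a pre-flight check that collects these issues before calling the API. This is a hypothetical sketch: `checkGenerateBody` and its constants are illustrative names, the constraints are transcribed by hand from this reference, and the server remains the authority. The reference does not say whether the server rejects or ignores a field set for the wrong sample type, so the sketch only reports such fields rather than removing them.

```javascript
// Hypothetical pre-flight check for the generate request body.
const REQUIRED_FIELDS = ['spec_id', 'number_of_samples', 'generation_model'];

// Inclusive integer ranges; a missing `max` means only a lower bound is documented.
const INTEGER_RANGES = {
  number_of_samples: { min: 1, max: 20000 },
  max_iterations: { min: 0, max: 20 },
  num_examples_in_prompt: { min: 1, max: 50 },
  max_revision_cycles: { min: 1, max: 5 },
  generation_thinking_budget: { min: 1024 },
  evaluation_thinking_budget: { min: 1024 },
  outline_thinking_budget: { min: 1024 },
  revision_thinking_budget: { min: 1024 },
  max_examples_in_prompt: { min: 1 },
};

// Fields documented as applying to one sample type only.
const SHORT_ONLY = [
  'evaluation_model', 'max_iterations', 'staged_generation',
  'use_historical_feedback', 'num_examples_in_prompt', 'evaluation_thinking_budget',
];
const LONG_ONLY = [
  'outline_model', 'revision_model', 'max_revision_cycles', 'outline_thinking_budget',
  'revision_thinking_budget', 'seed_shuffling_level', 'sql_validation_level',
  'max_examples_in_prompt',
];

// Returns human-readable problems; an empty array means nothing was flagged.
function checkGenerateBody(body) {
  const problems = [];

  for (const field of REQUIRED_FIELDS) {
    if (body[field] === undefined) problems.push(`${field} is required`);
  }

  for (const [field, { min, max }] of Object.entries(INTEGER_RANGES)) {
    const value = body[field];
    if (value === undefined) continue;
    const inRange =
      Number.isInteger(value) && value >= min && (max === undefined || value <= max);
    if (!inRange) {
      const bounds = max === undefined ? `>= ${min}` : `between ${min} and ${max}`;
      problems.push(`${field} must be an integer ${bounds}`);
    }
  }

  const sampleType = body.sample_type ?? 'short'; // documented default
  if (sampleType !== 'short' && sampleType !== 'long') {
    problems.push('sample_type must be "short" or "long"');
    return problems;
  }

  // Flag fields that belong to the other sample type.
  const otherType = sampleType === 'short' ? 'long' : 'short';
  const notApplicable = sampleType === 'short' ? LONG_ONLY : SHORT_ONLY;
  for (const field of notApplicable) {
    if (body[field] !== undefined) {
      problems.push(`${field} applies to ${otherType} samples only`);
    }
  }
  return problems;
}
```

enable_revisions and spec_version_id are deliberately left out of both lists because this reference does not tie them to a sample type.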

Response

Generation started successfully

task_id (string, required)
Task ID for tracking generation progress.

status (enum<string>, required)
Initial status of the generation task.
Available options: ACCEPTED, PENDING, RUNNING

run_id (string<uuid>, required)
Run ID for retrieving results.
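A client will typically want to confirm that the response carries the documented fields before storing the identifiers. A hypothetical sketch: `parseGenerateResponse` is an illustrative name, and the UUID check reflects the string<uuid> type listed for run_id. How task_id and run_id are then used for status and results lookups is covered by other endpoints and is not shown here.

```javascript
// Initial statuses documented for this endpoint's response.
const INITIAL_STATUSES = new Set(['ACCEPTED', 'PENDING', 'RUNNING']);
// Canonical 8-4-4-4-12 hex UUID layout, case-insensitive.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: validates the documented response fields and returns
// the identifiers needed for follow-up calls.
function parseGenerateResponse(json) {
  const { task_id: taskId, status, run_id: runId } = json ?? {};
  if (typeof taskId !== 'string' || taskId.length === 0) {
    throw new Error('Response is missing task_id');
  }
  if (typeof runId !== 'string' || !UUID_PATTERN.test(runId)) {
    throw new Error('Response run_id is missing or not a UUID');
  }
  if (!INITIAL_STATUSES.has(status)) {
    throw new Error(`Unexpected initial status: ${status}`);
  }
  return { taskId, runId, status };
}
```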
