Feb 23 2026
API Key Rotation Endpoint
PDF Support
New Features
🔑 Programmatic API Key Rotation Endpoint
Dataframer now exposes a dedicated API key rotation endpoint for programmatic usage, with built-in rate limiting.
Details:
- Endpoint: api-key/rotate on the Dataframer API
- Authentication: requires a valid API key passed as a Bearer token in the Authorization header (not a JWT)
- Response: returns a freshly generated API key, a masked version of the key, and the new expiration timestamp
- Safety: rate limited to 5 requests per hour per IP to prevent accidental or abusive rotation storms
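As a rough illustration of the authentication rule above, the following sketch builds (but does not send) a rotation request using only the Python standard library. The base URL is a placeholder and the POST method is a guess, since this changelog names only the api-key/rotate path; build_rotate_request is a hypothetical helper, not part of any Dataframer SDK.

```python
import urllib.request

# Assumption: placeholder base URL; the changelog only names the path "api-key/rotate".
BASE_URL = "https://dataframer.example.com/api"


def build_rotate_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a key-rotation request authenticated with the current API key."""
    if not api_key:
        raise ValueError("an existing, valid API key is required")
    url = f"{base_url.rstrip('/')}/api-key/rotate"
    # Assumption: POST is a guess; the changelog does not state the HTTP method.
    return urllib.request.Request(
        url,
        method="POST",
        # The current API key goes in as a Bearer token (not a JWT).
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_rotate_request("df_example_key")
print(req.get_method(), req.full_url)
```

Sending the request with urllib.request.urlopen(req) should return the new key, its masked form, and the expiration timestamp, but the response field names are not given here, so inspect the payload before relying on specific keys. Because of the limit of 5 requests per hour per IP, automatic retries on rotation are best avoided.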
📄 PDF Support
Dataframer now supports PDFs as a first-class file type across datasets and generation workflows.
Highlights:
- Dataset ingestion: .pdf is now accepted anywhere you upload dataset files, alongside .txt, .md, .json, .csv, and .jsonl
- PDF-aware processing: internal pipelines recognize PDF files and handle them with dedicated logic for storage, conversion, and preview
- Template prompts for PDFs: you can pass a pdf_template_prompt to control the visual style of generated PDFs (e.g., “Professional corporate style with blue headers”), with validation and tests in place to keep prompts within safe limits
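If you want to pre-check files locally before uploading, a minimal sketch could look like the following. The extension list comes from the notes above; the helper name split_uploadable and the case-insensitive matching are assumptions, and the server remains the source of truth for what is accepted.

```python
from pathlib import Path

# Extensions listed in this changelog as accepted for dataset uploads.
ACCEPTED_EXTENSIONS = {".pdf", ".txt", ".md", ".json", ".csv", ".jsonl"}


def split_uploadable(paths):
    """Partition paths into (accepted, rejected) by file extension.

    Assumption: matching is case-insensitive, so "report.PDF" counts as a PDF.
    """
    accepted, rejected = [], []
    for p in paths:
        suffix = Path(p).suffix.lower()
        (accepted if suffix in ACCEPTED_EXTENSIONS else rejected).append(p)
    return accepted, rejected


ok, skipped = split_uploadable(["contract.pdf", "notes.md", "slides.pptx"])
print("upload:", ok)
print("skip:", skipped)
```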
Feb 10 2026
Blog Posts & Studies
Databricks Integration
Cost & Time Estimates
Public API
MCP Server
PDF Generation
New Features
📝 New Blog Posts & Studies
New research and tutorials on the Dataframer blog:
- “How to Generate 50K-Token Documents: Same LLM, Different Results”: benchmark study comparing Dataframer vs. raw Claude Sonnet 4.5 for long-form text generation, with a companion dataset on HuggingFace
- “Generation of Synthetic Text2SQL Data with 100% Validity”: tutorial on generating diverse, verified text-to-SQL samples using Dataframer
🧱 Databricks Integration
Full integration with Databricks for data ingestion, generation, and model hosting. See the Databricks integration guide for a full walkthrough.
Capabilities:
- pydataframer-databricks: new Python package for working with Dataframer directly from Databricks notebooks. Includes DatabricksConnector for fetching sample data from Unity Catalog tables and loading generated data back into Delta tables via service principal M2M OAuth.
- Databricks native models: Databricks-hosted models can be used for specs, generation, evaluation, and chat. A Dataframer admin configures service principal credentials once in the Dataframer UI, and any team member can then select databricks/ models without passing credentials in API calls
💰 Cost & Time Estimates
See estimated cost and generation time before starting a run. Because Dataframer uses an agentic generation workflow with multiple LLM calls per sample, costs were previously difficult to predict. The estimator uses a simulated model of the full generation pipeline to produce forecasts before you commit to a run.
How it works:
- Estimates update live as you adjust sample count, model, dataset type, and other parameters on the Create Run page
- Accounts for all stages of generation: outline, content, revision cycles, and evaluation
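To give a feel for why multi-stage generation makes costs hard to eyeball, here is a deliberately simplified, closed-form sketch. It is not Dataframer's estimator (which simulates the full pipeline): the stage names come from the list above, but every number, the Stage type, and the estimate function are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    calls_per_sample: float  # hypothetical average number of LLM calls
    tokens_per_call: int     # hypothetical average tokens per call
    seconds_per_call: float  # hypothetical average latency per call


# Stage names come from the changelog; every number is an illustrative placeholder.
STAGES = [
    Stage("outline", 1, 2_000, 5.0),
    Stage("content", 1, 8_000, 20.0),
    Stage("revision cycles", 2, 6_000, 15.0),
    Stage("evaluation", 1, 3_000, 8.0),
]


def estimate(samples: int, usd_per_1k_tokens: float, parallelism: int = 1):
    """Return (estimated_cost_usd, estimated_seconds) for a run under the toy model."""
    tokens_per_sample = sum(s.calls_per_sample * s.tokens_per_call for s in STAGES)
    seconds_per_sample = sum(s.calls_per_sample * s.seconds_per_call for s in STAGES)
    cost = samples * tokens_per_sample / 1000 * usd_per_1k_tokens
    # Assumes samples are spread evenly across `parallelism` workers.
    seconds = samples * seconds_per_sample / parallelism
    return cost, seconds


cost, seconds = estimate(samples=100, usd_per_1k_tokens=0.01, parallelism=4)
print(f"~${cost:.2f}, ~{seconds / 60:.1f} min")
```

Even in this toy model, the revision stage accounts for almost half of the tokens, which is the kind of effect a single per-sample price would hide.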
🔌 Public API
Stable public REST API for programmatic access to the full Dataframer workflow: datasets, specs, generation, evaluation, and red-teaming. The API went through a major overhaul to reach a stable, consistent interface.
Highlights:
- Python SDK (pydataframer) with typed methods for every endpoint
- Thoroughly documented in the API Reference with Python code examples for every endpoint
🤖 MCP Server
Dataframer is now available as an MCP (Model Context Protocol) server, allowing AI assistants like Claude Code, Cursor, and other MCP-compatible clients to interact with the platform directly.
Capabilities:
- Upload datasets, create specs, generate data, and download results, all through natural conversation with an AI assistant
- Unlike the raw API, MCP also provides your AI assistant with detailed instructions on how to use Dataframer effectively, so it can guide you through the entire workflow conversationally
- See API & MCP for setup instructions
📄 PDF Generation
Generate synthetic PDF documents with custom styling. Describe the visual style you want (e.g., “professional corporate style with blue headers”) and Dataframer generates styled PDFs automatically.
Capabilities:
- Full PDF input/output: use PDF seed examples and generate new PDF documents
- Custom styling via a natural language prompt that controls headers, fonts, colors, and layout
Jan 8 2026
Seedless Generation
Admin Tools
Billing System
ToolBox - SQL
Gemini 3 Pro
New Features
🌱 Seedless Generation
Generate high-quality synthetic data without requiring any seed examples. Simply describe what you want and let Dataframer create it from scratch.
How to create a spec (blueprint for the data) without uploading examples:
- Select “Seedless” as the specification type in the spec creation wizard
- Provide a spec name and generation objectives
- Set your target token range (e.g., 2,000-5,000 tokens)
👥 Admin Tools
New internal administration capabilities for managing teams and users.
Features:
- Role-based access control with Admin and User roles
- Admins can promote/demote users between Admin and User roles
- Company-wide user visibility and management from the Profile page
💳 Billing System
Usage-based billing with transparent pricing and detailed invoicing.
How it works:
- Calendar month billing cycles (1st to last day of each month)
- Run Details page now shows the cost of your run
- Failed task cost exclusion - you’re not charged for failed runs
- Team and Enterprise plan types
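The two billing rules above (calendar-month cycles and no charge for failed work) can be sketched in a few lines. The task representation as (status, cost_usd) pairs and the "failed" status string are assumptions made for illustration, not the actual billing data model.

```python
import calendar
from datetime import date


def billing_period(day: date):
    """Return (first_day, last_day) of the calendar month containing `day`."""
    last = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=1), day.replace(day=last)


def billable_total(tasks):
    """Sum task costs, skipping failed tasks.

    Assumption: `tasks` is an iterable of (status, cost_usd) pairs and failures
    are marked with the status string "failed".
    """
    return sum(cost for status, cost in tasks if status != "failed")


start, end = billing_period(date(2026, 2, 10))
print(start, end)
print(billable_total([("succeeded", 1.5), ("failed", 0.5), ("succeeded", 2.0)]))
```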
🗃️ ToolBox - SQL Execution Environment
Multi-database SQL validation engine for generating high-quality Text-to-SQL datasets.
Capabilities:
- Validates both schema DDL and query SQL
- Parallel testing against 3 databases: PostgreSQL, MySQL, SQLite
- REST API integration for programmatic access
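The idea of validating both the schema DDL and the query can be tried locally with SQLite from the Python standard library. This is a single-database sketch of the concept, not the ToolBox implementation: validate_sqlite is a hypothetical helper, and PostgreSQL and MySQL are left out because they need running servers.

```python
import sqlite3


def validate_sqlite(schema_ddl: str, query_sql: str):
    """Run the DDL, then the query, in a fresh in-memory database.

    Returns (True, None) on success or (False, error_message) on failure.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)      # fails here if the schema DDL is invalid
        conn.execute(query_sql).fetchall()  # fails here if the query does not fit the schema
        return True, None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()


ddl = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
print(validate_sqlite(ddl, "SELECT name FROM users"))
print(validate_sqlite(ddl, "SELECT email FROM users"))
```

A query that passes in one dialect can still fail in another, which is the motivation for testing against several databases in parallel.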
🤖 Gemini 3 Pro Support
Full integration of Google’s latest Gemini 3 Pro models across the platform.
Capabilities:
- Minimal reasoning mode (gemini/gemini-3-pro-preview) and high reasoning mode (gemini/gemini-3-pro-preview-thinking)
- 1 million token context window
- Available for spec analysis, generation, evaluation, red-teaming, and chat

