# Release Notes

<Accordion title="Apr 15 2026" icon="sparkles">
  <CardGroup cols={3}>
    <Card title="Contradiction Check" icon="shield-check" horizontal />

    <Card title="Distribution Breakdown" icon="chart-pie" horizontal />

    <Card title="Progressive Results" icon="bars-progress" horizontal />

    <Card title="Anonymization API" icon="lock" horizontal />

    <Card title="Faster App" icon="bolt" horizontal />

    <Card title="Auth Reliability" icon="key" horizontal />
  </CardGroup>

  ## **New Features**

  ### **🛡️ Contradiction Check**

  A new LLM-based quality gate that catches logically contradictory attribute combinations in sampled data (e.g. "sunny weather" + "heavy rain") and automatically re-samples.
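
  As a rough illustration of the re-sampling loop, the sketch below swaps the LLM judge for a fixed lookup of contradictory pairs. The axis names, the `CONTRADICTIONS` set, and both function names are hypothetical and not part of the DataFramer API; only the "sunny weather" + "heavy rain" pair comes from the description above.

```python
import random

# Hypothetical attribute axes; in practice these come from your spec.
AXES = {
    "sky": ["sunny weather", "overcast"],
    "precipitation": ["heavy rain", "no precipitation"],
}

# Stand-in for the LLM judge: a fixed set of known-contradictory pairs.
CONTRADICTIONS = {frozenset({"sunny weather", "heavy rain"})}

def is_contradictory(sample):
    """Return True if any two attribute values form a known contradiction."""
    values = list(sample.values())
    return any(
        frozenset({a, b}) in CONTRADICTIONS
        for i, a in enumerate(values)
        for b in values[i + 1:]
    )

def sample_attributes(rng, max_attempts=20):
    """Draw one value per axis, re-sampling until the combination is consistent."""
    for _ in range(max_attempts):
        sample = {axis: rng.choice(options) for axis, options in AXES.items()}
        if not is_contradictory(sample):
            return sample
    raise RuntimeError("no consistent combination found")

print(sample_attributes(random.Random(0)))
```

  In the real feature the judgment comes from an LLM, so the lookup here only stands in for that call.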

  ***

  ### **📊 Distribution Breakdown**

  Distribution analysis now shows three separate views for each property axis:

  * **Requested** — what the spec asks for (target percentages)
  * **Expected** — what's achievable given the sample count (accounts for conditional probability cascades)
  * **Evaluated** — what an LLM classifier determined each sample actually is

  The "Expected" calculation now properly accounts for conditional probability cascades instead of naively using base distributions.
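
  To see why a conditional cascade shifts the expected distribution away from the base one, here is a minimal sketch using the law of total probability. The axes (`audience`, `tone`), the probabilities, and the function name are invented for illustration; the sketch does not claim to reproduce DataFramer's actual calculation.

```python
from fractions import Fraction as F

# Hypothetical spec: the "tone" axis depends on the "audience" axis.
p_audience = {"expert": F(1, 4), "novice": F(3, 4)}
base_tone = {"formal": F(1, 2), "casual": F(1, 2)}  # the requested base distribution
p_tone_given_audience = {
    "expert": {"formal": F(9, 10), "casual": F(1, 10)},  # conditional override
    "novice": base_tone,                                  # no override, use the base
}

def expected_marginal(p_parent, p_child_given_parent):
    """P(child) = sum over parents of P(parent) * P(child | parent)."""
    out = {}
    for parent, p in p_parent.items():
        for child, q in p_child_given_parent[parent].items():
            out[child] = out.get(child, 0) + p * q
    return out

expected = expected_marginal(p_audience, p_tone_given_audience)
print(expected)  # formal: 3/5, casual: 2/5 (the base distribution says 1/2 each)

n_samples = 10
print({tone: p * n_samples for tone, p in expected.items()})  # formal: 6, casual: 4
```

  With these made-up numbers, a naive reading of the base distribution predicts a 50/50 split, while the cascade yields 60/40.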

  ***

  ### **⚡ Progressive Results**

  The run detail page now shows samples and cost as they complete in real time, instead of waiting for the entire run to finish.

  ***

  ### **🔒 Anonymization API**

  Released the first mature version of the anonymization (PII/PHI redaction) API, with full documentation, SDK support, and MCP integration.

  ***

  ### **🚀 Faster App Experience**

  Faster page loads across the app — initial navigation, run details, and profile pages all render noticeably quicker.

  ***

  ### **🔑 Auth Reliability**

  Fixed several edge cases that could cause login failures or unexpected logouts.

  ***
</Accordion>

<Accordion title="Mar 23 2026" icon="sparkles">
  <CardGroup cols={3}>
    <Card title="New PDF Engine" icon="file-pdf" horizontal />

    <Card title="Calculator Tool" icon="calculator" horizontal />

    <Card title="Cost Controls" icon="coins" horizontal />

    <Card title="Conformance Filtering" icon="filter" horizontal />

    <Card title="PII/PHI Anonymization" icon="shield-halved" horizontal />
  </CardGroup>

  ## **New Features**

  ### **📄 New PDF Generation Engine**

  The PDF pipeline has been rebuilt from the ground up. DataFramer now generates a **unique visual style for each document**, covering layout, fonts, colors, and structure.

  **Highlights:**

  * **Automatic visual QA**: Every generated PDF goes through automated revision cycles and quality checks to verify its visual quality.

  ***

  ### **🔢 Calculator Tool**

  A sandboxed Python execution environment is now available to the LLM during data generation, ensuring numerical accuracy across all generated tables and documents.

  **Highlights:**

  * **Arithmetically correct output**: The LLM computes totals and percentages and cross-checks the figures in the document, which prevents hallucinated numbers that don't add up. These checks run multiple times, both before and after the figures are written into the document, keeping error rates extremely low.
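
  The kind of cross-check described above might look like the following sketch. The function name, the row layout, and the sample figures are assumptions for illustration; the code the LLM actually runs in the sandbox is not documented here.

```python
from decimal import Decimal

def check_table(rows, stated_total):
    """Recompute a table's total and percentage column; return any mismatches.

    rows: list of (label, amount, stated_percent) tuples with Decimal values.
    """
    problems = []
    total = sum((amount for _, amount, _ in rows), Decimal("0"))
    if total != stated_total:
        problems.append(f"total: stated {stated_total}, computed {total}")
    if total == 0:
        return problems  # percentages are undefined for a zero total
    for label, amount, stated_pct in rows:
        pct = (amount / total * 100).quantize(Decimal("0.1"))
        if pct != stated_pct:
            problems.append(f"{label}: stated {stated_pct}%, computed {pct}%")
    return problems

rows = [
    ("Rent", Decimal("1200.00"), Decimal("60.0")),
    ("Food", Decimal("800.00"), Decimal("40.0")),
]
print(check_table(rows, Decimal("2000.00")))  # [] because the figures are consistent
print(check_table(rows, Decimal("2100.00")))  # one problem: the stated total is wrong
```

  `Decimal` is used instead of `float` so that currency-style amounts compare exactly.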

  ***

  ### **💰 New Cost Controls**

  New run parameters give you fine-grained control over generation cost and speed.

  **Features:**

  * **One-shot generation**: Generates the entire document in a single LLM call instead of the default outline → sections → concatenation process. Enabled by default — disable it when using weaker models or if the document is too long for even state-of-the-art models.
  * **Selective revision types**: Instead of all-or-nothing document revisions, you can now enable only the specific revision passes you need for your use case.

  ***

  ### **🎯 Conformance Filtering**

  A new post-generation quality gate that ensures every sample in your dataset actually matches its target specification and desired properties.

  **Highlights:**

  * **Automatic regeneration**: Documents that clearly violate their target properties are automatically discarded and regenerated from scratch — no manual review needed.

  ***

  ### **🔒 PII/PHI Anonymization**

  A new experimental tool for redacting sensitive information from existing datasets at scale and at extremely low cost (\~\$0.1 / million tokens).

  **Capabilities:**

  * **Quality evaluation**: An optional LLM-based evaluation step measures precision, recall, and F1 of the redaction.
  * **Broad file support**: Works with CSV, JSON, JSONL, Markdown, and plain text datasets.
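
  For reference, the three metrics named under quality evaluation can be computed as in the sketch below. It assumes redactions are compared as exact `(start, end)` character spans against a labeled reference set; how the LLM-based evaluator actually identifies and matches spans is not described here, and the function name is hypothetical.

```python
def redaction_scores(predicted, gold):
    """Precision, recall, and F1 for predicted redaction spans against reference spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = {(0, 8), (20, 32), (40, 51)}       # spans that should be redacted
predicted = {(0, 8), (20, 32), (60, 64)}  # spans the tool redacted
print(redaction_scores(predicted, gold))  # two of three correct on each side
```

  Low precision means non-sensitive text was redacted; low recall means sensitive text was missed.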

  ***
</Accordion>

<Accordion title="Feb 23 2026" icon="sparkles">
  ## **New Features**

  ### **📄 PDF Support**

  DataFramer now supports PDFs as a first-class file type across datasets and generation workflows.

  **Highlights:**

  * **Dataset ingestion**: `.pdf` is now accepted anywhere you upload dataset files, alongside `.txt`, `.md`, `.json`, `.csv`, and `.jsonl`.
  * **Template prompts for PDFs**: you can pass a prompt to control the visual style of generated PDFs (e.g., "Professional corporate style with blue headers").

  ***
</Accordion>

<Accordion title="Feb 10 2026" icon="sparkles">
  <CardGroup cols={3}>
    <Card title="Blog Posts & Studies" icon="pen" horizontal />

    <Card title="Databricks Integration" icon="cubes" horizontal />

    <Card title="Cost & Time Estimates" icon="coins" horizontal />

    <Card title="Public API" icon="plug" horizontal />

    <Card title="MCP Server" icon="robot" horizontal />

    <Card title="PDF Generation" icon="file" horizontal />
  </CardGroup>

  ## **New Features**

  ### **📝 New Blog Posts & Studies**

  New research and tutorials on the [DataFramer blog](https://dataframer.ai/blog):

  * **[How to Generate 50K-Token Documents: Same LLM, Different Results](https://dataframer.ai/posts/long-text-generation-dataframer-vs-baseline)** — benchmark study comparing DataFramer vs. raw Claude Sonnet 4.5 for long-form text generation, with a companion dataset on HuggingFace
  * **[Generation of Synthetic Text2SQL Data with 100% Validity](https://dataframer.ai/posts/amplifying-claude-haiku-text-to-sql)** — tutorial on generating diverse verified text-to-SQL samples using DataFramer

  ***

  ### **🧱 Databricks Integration**

  Full integration with Databricks for data ingestion, generation, and model hosting. See the [Databricks integration guide](/integrations/databricks) for a full walkthrough.

  **Capabilities:**

  * **[pydataframer-databricks](https://pypi.org/project/pydataframer-databricks)**: a new Python package for working with DataFramer directly from Databricks notebooks. It includes `DatabricksConnector` for fetching sample data from Unity Catalog tables and loading generated data back into Delta tables via service principal M2M OAuth.
  * **Databricks native models**: Databricks-hosted models can be used for specs, generation, evaluation, and chat. A DataFramer admin configures service principal credentials once in the DataFramer UI, and any team member can then select `databricks/` models without passing credentials in API calls.

  ***

  ### **💰 Cost & Time Estimates**

  See estimated cost and generation time before starting a run. Because DataFramer uses an agentic generation workflow with multiple LLM calls per sample, costs were previously difficult to predict. The estimator uses a simulated model of the full generation pipeline to produce forecasts before you commit to a run.

  **How it works:**

  * Estimates update live as you adjust sample count, model, dataset type, and other parameters on the Create Run page
  * Accounts for all stages of generation: outline, content, revision cycles, and evaluation
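
  A simplified picture of a stage-by-stage estimate is sketched below. Every number in it (token counts, call counts, prices) is made up for illustration, and the `Stage` class and `estimate_cost` function are hypothetical; the real estimator simulates the full pipeline and is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    input_tokens: int   # assumed tokens sent per call
    output_tokens: int  # assumed tokens generated per call
    calls: int = 1      # e.g. number of revision cycles

def estimate_cost(stages, samples, usd_per_m_input, usd_per_m_output):
    """Sum per-stage token costs for one sample, then scale by the sample count."""
    per_sample = sum(
        s.calls * (s.input_tokens * usd_per_m_input + s.output_tokens * usd_per_m_output)
        for s in stages
    ) / 1_000_000
    return samples * per_sample

# Illustrative figures only; they are not DataFramer measurements.
stages = [
    Stage("outline", 1_000, 500),
    Stage("content", 2_000, 4_000),
    Stage("revision", 5_000, 4_000, calls=2),
    Stage("evaluation", 4_500, 200),
]
total = estimate_cost(stages, samples=100, usd_per_m_input=3, usd_per_m_output=15)
print(f"{total:.2f}")  # 24.30
```

  The point of the sketch is that revision cycles multiply the per-sample cost, which is why a single-call estimate tends to undershoot.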

  ***

  ### **🔌 Public API**

  Stable public REST API for programmatic access to the full DataFramer workflow — datasets, specs, generation, evaluation, and red-teaming. The API went through a major overhaul to reach a stable, consistent interface.

  **Highlights:**

  * [Python SDK](https://pypi.org/project/pydataframer/) (pydataframer) with typed methods for every endpoint
  * Thoroughly documented in the [API Reference](/api-reference) with Python code examples for every endpoint

  ***

  ### **🤖 MCP Server**

  DataFramer is now available as an [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) server, allowing AI assistants like Claude Code, Cursor, and other MCP-compatible clients to interact with the platform directly.

  **Capabilities:**

  * Upload datasets, create specs, generate data, and download results — all through natural conversation with an AI assistant
  * Unlike the raw API, MCP also provides your AI assistant with detailed instructions on how to use DataFramer effectively — so it can guide you through the entire workflow conversationally
  * See [API & MCP](/api-and-mcp) for setup instructions

  ***

  ### **📄 PDF Generation**

  Generate synthetic PDF documents with custom styling. Describe the visual style you want (e.g., "professional corporate style with blue headers") and DataFramer generates styled PDFs automatically.

  **Capabilities:**

  * Full PDF input/output — use PDF seed examples and generate new PDF documents
  * Custom styling via a natural language prompt that controls headers, fonts, colors, and layout

  ***
</Accordion>

<Accordion title="Jan 8 2026" icon="sparkles">
  <CardGroup cols={3}>
    <Card title="Seedless Generation" icon="seedling" horizontal />

    <Card title="Admin Tools" icon="users" horizontal />

    <Card title="Billing System" icon="credit-card" horizontal />

    <Card title="ToolBox - SQL" icon="database" horizontal />

    <Card title="Gemini 3 Pro" icon="robot" horizontal />
  </CardGroup>

  ## **New Features**

  ### **🌱 Seedless Generation**

  Generate high-quality synthetic data without requiring any seed examples. Simply describe what you want and let DataFramer create it from scratch.

  **How to create a spec (blueprint for the data) without uploading examples:**

  1. Select "Seedless" as the specification type in the spec creation wizard
  2. Provide a spec name and generation objectives
  3. Set your target token range (e.g., 2,000-5,000 tokens)

  ***

  ### **👥 Admin Tools**

  New internal administration capabilities for managing teams and users.

  **Features:**

  * Role-based access control with Admin and User roles
  * Admins can promote/demote users between Admin and User roles
  * Company-wide user visibility and management from the Profile page

  ***

  ### **💳 Billing System**

  Usage-based billing with transparent pricing and detailed invoicing.

  **How it works:**

  * Calendar month billing cycles (1st to last day of each month)
  * Run Details page now shows the cost of your run
  * Failed tasks are excluded from cost, so you're not charged for failed runs
  * Team and Enterprise plan types
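
  The billing rules above can be expressed as a short sketch. The function names and the `"failed"` status string are assumptions for illustration and do not reflect the billing API.

```python
import calendar
from datetime import date

def billing_cycle(day):
    """Return the first and last day of the calendar month containing `day`."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=1), day.replace(day=last_day)

def billable_cost(runs):
    """Sum run costs, skipping any run whose status is 'failed' (hypothetical status value)."""
    return sum(cost for cost, status in runs if status != "failed")

print(billing_cycle(date(2026, 2, 10)))  # 2026-02-01 through 2026-02-28
print(billable_cost([(4.0, "succeeded"), (1.5, "failed"), (2.5, "succeeded")]))  # 6.5
```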

  ***

  ### **🗃️ ToolBox - SQL Execution Environment**

  Multi-database SQL validation engine for generating high-quality Text-to-SQL datasets.

  **Capabilities:**

  * Validates both schema DDL and query SQL
  * Parallel testing against 3 databases: PostgreSQL, MySQL, SQLite
  * REST API integration for programmatic access
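
  The idea of validating both the schema DDL and the query can be sketched against a single engine using Python's built-in `sqlite3` module. This is an illustration only: the function name is hypothetical, it covers SQLite alone rather than all three databases, and it does not use the ToolBox REST API.

```python
import sqlite3

def validate_sqlite(ddl, query):
    """Apply the DDL to an empty in-memory database, then run the query against it.

    Returns (True, None) on success or (False, error_message) on failure.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)          # fails on invalid schema statements
        conn.execute(query).fetchall()   # fails on unknown tables, columns, or bad syntax
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()
    return True, None

ddl = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
print(validate_sqlite(ddl, "SELECT name FROM users WHERE id = 1"))  # (True, None)
print(validate_sqlite(ddl, "SELECT nme FROM users"))                # fails: unknown column
```

  Because the tables are empty, this checks that the query is valid for the schema, not that it returns the intended rows.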

  ***

  ### **🤖 Gemini 3 Pro Support**

  Full integration of Google's latest Gemini 3 Pro models across the platform.

  **Capabilities:**

  * Minimal reasoning mode (`gemini/gemini-3-pro-preview`) and high reasoning mode (`gemini/gemini-3-pro-preview-thinking`)
  * 1 million token context window
  * Available for spec analysis, generation, evaluation, red-teaming, and chat

  ***
</Accordion>
