# Release Notes

## **New Features**

### **🛡️ Contradiction Check**

A new LLM-based quality gate that catches logically contradictory attribute combinations in sampled data (e.g., "sunny weather" + "heavy rain") and automatically re-samples them.

***

### **📊 Distribution Breakdown**

Distribution analysis now shows three separate views for each property axis:

* **Requested**: what the spec asks for (target percentages)
* **Expected**: what's achievable given the sample count (accounts for conditional probability cascades)
* **Evaluated**: what an LLM classifier determined each sample actually is

The "Expected" calculation now accounts for conditional probability cascades instead of naively using base distributions.

***

### **⚡ Progressive Results**

The run detail page now shows samples and cost in real time as they complete, instead of waiting for the entire run to finish.

***

### **🔒 Anonymization API**

The first mature version of the anonymization (PII/PHI redaction) API is now released, with full documentation, SDK support, and MCP integration.

***

### **🚀 Faster App Experience**

Pages load faster across the app: initial navigation, run details, and profile pages all render noticeably quicker.

***

### **🔑 Auth Reliability**

Fixed several edge cases that could cause login failures or unexpected logouts.

***

## **New Features**

### **📄 New PDF Generation Engine**

The PDF pipeline has been rebuilt from the ground up. DataFramer now generates a **unique visual style for each document**, covering layout, fonts, colors, and structure.

**Highlights:**

* **Automatic visual QA**: Every generated PDF goes through automated revision cycles and visual quality checks.

***

### **🔢 Calculator Tool**

A sandboxed Python execution environment is now available to the LLM during data generation, ensuring numerical accuracy across all generated tables and documents.

**Highlights:**

* **Arithmetically correct output**: The LLM computes totals and percentages and cross-checks the figures in the document, eliminating hallucinated numbers that don't add up. These checks run multiple times, both before and after the figures are written into the document, which keeps error rates extremely low.

***

### **💰 New Cost Controls**

New run parameters give you fine-grained control over generation cost and speed.

**Features:**

* **One-shot generation**: Generates the entire document in a single LLM call instead of the multi-step outline → sections → concatenation process. Enabled by default; disable it when using weaker models or if the document is too long for even state-of-the-art models.
* **Selective revision types**: Instead of all-or-nothing document revisions, you can now enable only the specific revision passes you need for your use case.

***

### **🎯 Conformance Filtering**

A new post-generation quality gate that checks that every sample in your dataset actually matches its target specification and desired properties.

**Highlights:**

* **Automatic regeneration**: Documents that clearly violate their target properties are automatically discarded and regenerated from scratch, with no manual review needed.

***

### **🔒 PII/PHI Anonymization**

A new experimental tool for redacting sensitive information from existing datasets at scale and at extremely low cost (\~\$0.10 per million tokens).

**Capabilities:**

* **Quality evaluation**: An optional LLM-based evaluation step measures precision, recall, and F1 of the redaction.
* **Broad file support**: Works with CSV, JSON, JSONL, Markdown, and plain text datasets.

***

## **New Features**

### **📄 PDF Support**

DataFramer now supports PDFs as a first-class file type across datasets and generation workflows.

**Highlights:**

* **Dataset ingestion**: `.pdf` is now accepted anywhere you upload dataset files, alongside `.txt`, `.md`, `.json`, `.csv`, and `.jsonl`.
* **Template prompts for PDFs**: You can pass a prompt to control the visual style of generated PDFs (e.g., "Professional corporate style with blue headers").

***

## **New Features**

### **📝 New Blog Posts & Studies**

New research and tutorials on the [DataFramer blog](https://dataframer.ai/blog):

* **[How to Generate 50K-Token Documents: Same LLM, Different Results](https://dataframer.ai/posts/long-text-generation-dataframer-vs-baseline)**: a benchmark study comparing DataFramer vs. raw Claude Sonnet 4.5 for long-form text generation, with a companion dataset on HuggingFace
* **[Generation of Synthetic Text2SQL Data with 100% Validity](https://dataframer.ai/posts/amplifying-claude-haiku-text-to-sql)**: a tutorial on generating diverse, verified text-to-SQL samples using DataFramer

***

### **🧱 Databricks Integration**

Full integration with Databricks for data ingestion, generation, and model hosting. See the [Databricks integration guide](/integrations/databricks) for a full walkthrough.

**Capabilities:**

* **[pydataframer-databricks](https://pypi.org/project/pydataframer-databricks)**: a new Python package for working with DataFramer directly from Databricks notebooks. It includes `DatabricksConnector` for fetching sample data from Unity Catalog tables and loading generated data back into Delta tables via service principal M2M OAuth.
* **Databricks native models**: Databricks-hosted models can be used for specs, generation, evaluation, and chat. A DataFramer admin configures service principal credentials once in the DataFramer UI, and any team member can then select `databricks/` models without passing credentials in API calls.

***

### **💰 Cost & Time Estimates**

See estimated cost and generation time before starting a run.

Because DataFramer uses an agentic generation workflow with multiple LLM calls per sample, costs were previously difficult to predict. The estimator uses a simulated model of the full generation pipeline to produce forecasts before you commit to a run.

**How it works:**

* Estimates update live as you adjust sample count, model, dataset type, and other parameters on the Create Run page
* Accounts for all stages of generation: outline, content, revision cycles, and evaluation

***

### **🔌 Public API**

A stable public REST API provides programmatic access to the full DataFramer workflow: datasets, specs, generation, evaluation, and red-teaming. The API went through a major overhaul to reach a stable, consistent interface.

**Highlights:**

* [Python SDK](https://pypi.org/project/pydataframer/) (`pydataframer`) with typed methods for every endpoint
* Thoroughly documented in the [API Reference](/api-reference) with Python code examples for every endpoint

***

### **🤖 MCP Server**

DataFramer is now available as an [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) server, allowing AI assistants like Claude Code, Cursor, and other MCP-compatible clients to interact with the platform directly.

**Capabilities:**

* Upload datasets, create specs, generate data, and download results, all through natural conversation with an AI assistant
* Unlike the raw API, MCP also provides your AI assistant with detailed instructions on how to use DataFramer effectively, so it can guide you through the entire workflow conversationally
* See [API & MCP](/api-and-mcp) for setup instructions

***

### **📄 PDF Generation**

Generate synthetic PDF documents with custom styling. Describe the visual style you want (e.g., "professional corporate style with blue headers") and DataFramer generates styled PDFs automatically.

**Capabilities:**

* Full PDF input/output: use PDF seed examples and generate new PDF documents
* Custom styling via a natural language prompt that controls headers, fonts, colors, and layout

***

## **New Features**

### **🌱 Seedless Generation**

Generate high-quality synthetic data without requiring any seed examples. Simply describe what you want and let DataFramer create it from scratch.

**How to create a spec (blueprint for the data) without uploading examples:**

1. Select "Seedless" as the specification type in the spec creation wizard
2. Provide a spec name and generation objectives
3. Set your target token range (e.g., 2,000-5,000 tokens)

***

### **👥 Admin Tools**

New internal administration capabilities for managing teams and users.

**Features:**

* Role-based access control with Admin and User roles
* Admins can promote/demote users between Admin and User roles
* Company-wide user visibility and management from the Profile page

***

### **💳 Billing System**

Usage-based billing with transparent pricing and detailed invoicing.

**How it works:**

* Calendar month billing cycles (1st to last day of each month)
* Run Details page now shows the cost of your run
* Failed tasks are excluded from cost: you're not charged for failed runs
* Team and Enterprise plan types

***

### **🗃️ ToolBox - SQL Execution Environment**

Multi-database SQL validation engine for generating high-quality Text-to-SQL datasets.

**Capabilities:**

* Validates both schema DDL and query SQL
* Parallel testing against three databases: PostgreSQL, MySQL, and SQLite
* REST API integration for programmatic access

***

### **🤖 Gemini 3 Pro Support**

Full integration of Google's latest Gemini 3 Pro models across the platform.

**Capabilities:**

* Minimal reasoning mode (`gemini/gemini-3-pro-preview`) and high reasoning mode (`gemini/gemini-3-pro-preview-thinking`)
* 1 million token context window
* Available for spec analysis, generation, evaluation, red-teaming, and chat

***
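
As a rough illustration of the idea behind the ToolBox SQL execution environment described above (validate the schema DDL first, then the query SQL), here is a minimal sketch that covers SQLite only, using Python's standard `sqlite3` module. It is not DataFramer's implementation: the `validate_sql` helper and its return shape are hypothetical, and the real ToolBox also tests against PostgreSQL and MySQL.

```python
import sqlite3


def validate_sql(schema_ddl: str, query_sql: str) -> dict:
    """Hypothetical helper: check schema DDL, then one query, in in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        # Stage 1: the schema DDL must execute cleanly.
        try:
            conn.executescript(schema_ddl)
        except sqlite3.Error as exc:
            return {"valid": False, "stage": "schema", "error": str(exc)}

        # Stage 2: the query must compile against that schema.
        # EXPLAIN makes SQLite prepare the statement (resolving tables and
        # columns) and return its plan instead of running the query itself.
        try:
            conn.execute("EXPLAIN " + query_sql)
        except sqlite3.Error as exc:
            return {"valid": False, "stage": "query", "error": str(exc)}

        return {"valid": True, "stage": None, "error": None}
    finally:
        conn.close()


# Example usage with an illustrative schema (assumed, not from DataFramer).
schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
print(validate_sql(schema, "SELECT name FROM users WHERE id = 1"))
print(validate_sql(schema, "SELECT email FROM users"))
```

Reporting the failing stage separately makes it easier to tell a broken schema from a query that references tables or columns the schema does not define.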