POST /api/dataframer/transform-jobs
Python
import os
from dataframer import Dataframer

client = Dataframer(
    api_key=os.environ.get("DATAFRAMER_API_KEY"),  # This is the default and can be omitted
)
transform_job = client.dataframer.transform_jobs.create(
    dataset_id="182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    detection_method="gliner",
    name="name",
    pii_types=["PERSON", "EMAIL"],
)
print(transform_job.id)
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "name": "<string>",
  "status": "PENDING",
  "datasets_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "dataset_name": "<string>",
  "company_id": "<string>",
  "pii_types": [
    "<string>"
  ],
  "detection_method": "<string>",
  "model_name": "<string>",
  "mask_config": {},
  "threshold": 0.3,
  "metrics_json": {},
  "trace": {},
  "created_by": 123,
  "created_by_email": "user@example.com",
  "duration_seconds": 123,
  "started_at": "2023-11-07T05:31:56Z",
  "completed_at": "2023-11-07T05:31:56Z",
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z"
}
Async operation: This endpoint returns immediately with a job ID and PENDING status. Poll GET /api/dataframer/transform-jobs/{job_id}/ until status is SUCCEEDED or FAILED.
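
The polling described above might be sketched as follows. This is a minimal sketch, not a helper shipped with the SDK: wait_for_job, fetch_job, and the interval and timeout defaults are hypothetical. fetch_job stands in for whatever call you use to issue GET /api/dataframer/transform-jobs/{job_id}/ and return the decoded JSON body.

```python
import time
from typing import Callable

# Statuses after which the job no longer changes (taken from the status enum).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job() repeatedly until the job reaches a terminal status.

    fetch_job is a hypothetical callable that performs the GET request and
    returns the job object as a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job still {job['status']} after {timeout_seconds}s"
            )
        # Wait before the next poll to avoid hammering the endpoint.
        sleep(interval_seconds)
```

Passing sleep as a parameter keeps the loop easy to test; in normal use the default time.sleep applies. A FAILED job is returned rather than raised, so the caller decides how to handle it.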

Authorizations

Authorization
string
header
required

API key authentication. Send the key in the Authorization header in the form "Bearer YOUR_API_KEY".
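
For callers not using the SDK, the header format above could be applied with the standard library as in this sketch. BASE_URL is a hypothetical placeholder (the real host is not given here), and build_create_request is an illustrative name, not part of any client library.

```python
import json
import urllib.request

# Hypothetical host; replace with the actual API base URL.
BASE_URL = "https://api.example.com"


def build_create_request(
    api_key: str, payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a transform job."""
    return urllib.request.Request(
        url=f"{base_url}/api/dataframer/transform-jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Header format from the Authorizations section.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(request); the function only builds it, so the header can be inspected without network access.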

Body

application/json

Request body for creating a PII/PHI transform job.

dataset_id
string<uuid>
required

UUID of the seed dataset to transform.

name
string
required

Human-readable name for this transform job.

pii_types
string[]
required

List of PII/PHI entity types to detect and mask (e.g. ["PERSON", "EMAIL", "PHONE_NUMBER"]).

detection_method
enum<string>
required

Entity detection method. Use llm, or a compound method that includes llm such as llm+heuristics, when you need LLM-based detection; model_name is required in that case.

Available options: gliner, llm, heuristics, gliner+heuristics, llm+heuristics, all

model_name
string

LLM model name. Required when detection_method includes llm.

mask_config
object

Optional per-entity-type masking strategy, e.g. {"PERSON": "REPLACE", "EMAIL": "REDACT"}. If omitted, every detected entity is redacted.

threshold
number<float>
default:0.3

Confidence threshold for entity detection (0.0–1.0). Lower values detect more entities; higher values reduce false positives.

Required range: 0 <= x <= 1

evaluation_model
string

AI model for automatic PII redaction quality evaluation after the job completes. Defaults to anthropic/claude-sonnet-4-6.
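
The constraints on the body fields above can be checked on the client before sending. The sketch below is illustrative only: build_transform_job_body and uses_llm are hypothetical helpers, and treating "all" as including the LLM detector is an assumption that the field descriptions do not state explicitly.

```python
from typing import Optional

# Values of the detection_method enum.
DETECTION_METHODS = {
    "gliner", "llm", "heuristics",
    "gliner+heuristics", "llm+heuristics", "all",
}


def uses_llm(detection_method: str) -> bool:
    # Assumption: "all" runs every detector, including the LLM one.
    return detection_method == "all" or "llm" in detection_method.split("+")


def build_transform_job_body(
    dataset_id: str,
    name: str,
    pii_types: list,
    detection_method: str,
    model_name: Optional[str] = None,
    mask_config: Optional[dict] = None,
    threshold: Optional[float] = None,
) -> dict:
    """Validate the documented constraints and return the JSON body as a dict."""
    if detection_method not in DETECTION_METHODS:
        raise ValueError(f"unknown detection_method: {detection_method!r}")
    if uses_llm(detection_method) and not model_name:
        raise ValueError("model_name is required when detection_method includes llm")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must satisfy 0 <= x <= 1")

    body = {
        "dataset_id": dataset_id,
        "name": name,
        "pii_types": pii_types,
        "detection_method": detection_method,
    }
    # Optional fields are sent only when set, so server-side defaults apply.
    if model_name is not None:
        body["model_name"] = model_name
    if mask_config is not None:
        body["mask_config"] = mask_config
    if threshold is not None:
        body["threshold"] = threshold
    return body
```

Leaving threshold out of the body lets the documented default of 0.3 take effect instead of hard-coding it on the client.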

Response

Transform job created

A PII/PHI transform job.

id
string<uuid>

Unique identifier for the transform job.

name
string

Human-readable name for this job.

status
enum<string>

Current status of the transform job.

Available options: PENDING, RUNNING, SUCCEEDED, FAILED

datasets_id
string<uuid>

UUID of the seed dataset being transformed.

dataset_name
string

Name of the seed dataset.

company_id
string

ID of the company that owns this job.

pii_types
string[]

List of PII/PHI entity types being detected.

detection_method
string

Entity detection method used.

model_name
string

LLM model name (when detection_method includes llm).

mask_config
object

Per-entity-type masking strategy.

threshold
number<float>

Confidence threshold used for detection.

metrics_json
object

Transform results once the job completes. Contains transformed_samples with masked content and entity summaries per sample.

trace
object

Internal trace information including task_id and dataset metadata.

created_by
integer

ID of the user who created this job.

created_by_email
string<email>

Email of the user who created this job.

duration_seconds
integer | null

Time taken to complete the job in seconds. Null until completed.

started_at
string<date-time> | null

When processing started.

completed_at
string<date-time> | null

When the job completed (succeeded or failed).

created_at
string<date-time>

When the job was created.

updated_at
string<date-time>

When the job was last updated.
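
Several response fields are nullable until the job finishes, so client code should handle missing values. The following sketch shows one way to read a job object; parse_timestamp and summarize_job are hypothetical helper names, and the sample values in the usage note come from the example response.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a date-time string such as "2023-11-07T05:31:56Z", or pass None through."""
    if value is None:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_job(job: dict) -> dict:
    """Extract the fields most callers need, tolerating null values."""
    return {
        "id": job["id"],
        "status": job["status"],
        "finished": job["status"] in ("SUCCEEDED", "FAILED"),
        "started_at": parse_timestamp(job.get("started_at")),
        "completed_at": parse_timestamp(job.get("completed_at")),
        "duration_seconds": job.get("duration_seconds"),
    }
```

For a freshly created job, summarize_job returns finished as False with started_at, completed_at, and duration_seconds all None; once the job completes, the timestamps come back as timezone-aware datetime objects.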