# Data Anonymization

> Detect PII, PHI, financial data, identity documents, and more—then anonymize or augment your datasets using AI models and pattern-based rules

<CardGroup cols={2}>
  <Card title="Accuracy" icon="bullseye">
    Up to **99.999%** detection accuracy across supported entity types, combining AI model recognition with pattern-based rules.
  </Card>

  <Card title="Pricing" icon="tag">
    Starting at **\$0.10 per million tokens** processed. Only detected and transformed tokens count toward usage.
  </Card>
</CardGroup>
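As a rough illustration of the pricing arithmetic, the sketch below estimates cost from a count of billable tokens. It assumes the listed starting rate applies uniformly; the function name `estimate_cost` and the token count are hypothetical, and your actual rate may differ.

```python
# Assumption: the "starting at" rate from the pricing card, in USD.
PRICE_PER_MILLION_TOKENS = 0.10


def estimate_cost(billable_tokens, rate=PRICE_PER_MILLION_TOKENS):
    """Estimate cost in USD for tokens that were detected and transformed."""
    return billable_tokens / 1_000_000 * rate


# Hypothetical job in which 2.5 million tokens counted toward usage.
print(f"${estimate_cost(2_500_000):.2f}")  # prints $0.25
```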

DataFramer offers three modes of operation on sensitive data in your datasets:

* **Detection** — identify sensitive entities and surface them for review, without modifying the data
* **Anonymization (redaction)** — replace detected entities with mask tokens to remove sensitive information
* **Augmentation** — transform detected entities into synthetic but realistic replacements, preserving data utility while eliminating real sensitive values

Detection covers seven categories of sensitive information:

* **Personal** — First name, last name, date of birth, dates, age, gender, nationality, race/ethnicity, marital status
* **Contact** — Email, phone number, street address, postal/ZIP code, city, state, country
* **Financial** — SSN, credit/debit card, bank routing number, routing number, tax ID, IBAN
* **Digital** — IP address, URL, username, password, MAC address, device identifier
* **Identity Documents** — Passport number, license/certificate number, national ID, voter ID
* **Medical / PHI** — Medical record number, diagnosis, medication, health plan number, patient ID, lab result
* **Professional** — Company name, occupation, employee ID, salary

## Creating a job

### Step 1: Select dataset

Choose a seed dataset from your library as the input for the job.

<img src="https://mintcdn.com/aimonlabsinc/277fUvQIA2YgQg9V/images/anonymization/anon-step1-select-dataset.png?fit=max&auto=format&n=277fUvQIA2YgQg9V&q=85&s=63711f771c4d361a19aa6ae7883c5359" alt="Step 1 – Select a dataset for anonymization" width="1794" height="1118" data-path="images/anonymization/anon-step1-select-dataset.png" />

### Step 2: Detection configuration

Configure how sensitive entities are detected and which model evaluates the results.

<img src="https://mintcdn.com/aimonlabsinc/277fUvQIA2YgQg9V/images/anonymization/anon-step2-detection-config.png?fit=max&auto=format&n=277fUvQIA2YgQg9V&q=85&s=d73103b303af4d0cb635287cc41fbf0a" alt="Step 2 – Detection configuration: choose detection method, confidence threshold, and evaluation judge" width="1606" height="1636" data-path="images/anonymization/anon-step2-detection-config.png" />

#### Detection methods

<AccordionGroup>
  <Accordion title="AIMon-PII-M1 (Recommended)">
    Combines the AIMon AI detection model with pattern-based rules. Offers the best balance of precision and recall for most use cases.
  </Accordion>

  <Accordion title="AIMon-PII-M1 (Model Only)">
    Uses the AIMon PII detection model exclusively, relying on learned entity recognition without rule-based augmentation.
  </Accordion>

  <Accordion title="LLM + AIMon-PII-Simple">
    Combines an LLM for contextual detection with fast pattern-based rules. Useful when you want LLM judgment alongside deterministic patterns.
  </Accordion>

  <Accordion title="LLM Only">
    Delegates all detection to an LLM. The most flexible option for unusual or domain-specific entity types.
  </Accordion>

  <Accordion title="AIMon-PII-Simple">
    Pattern-based detection only. Fastest option with deterministic behavior, but lower recall on context-dependent entities.
  </Accordion>

  <Accordion title="All Methods">
    Combines AIMon-PII-M1, LLM, and AIMon-PII-Simple in a union. Best for maximum coverage when false negatives are unacceptable.
  </Accordion>
</AccordionGroup>
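To make the idea of combining methods "in a union" concrete, here is a minimal sketch. It is not DataFramer's implementation: the span format `(start, end, entity_type)`, the method labels, and the `union_detections` helper are all assumptions chosen for illustration.

```python
def union_detections(outputs_by_method):
    """Union spans from several detection methods.

    Each span is a hypothetical (start, end, entity_type) tuple. The result
    maps every distinct span to the set of methods that reported it.
    """
    found = {}
    for method, spans in outputs_by_method.items():
        for span in spans:
            found.setdefault(span, set()).add(method)
    return dict(sorted(found.items()))


text = "Email jane@example.com or call 555-0100."


def span(value, entity_type):
    # Locate the value in the sample text to build an offset-based span.
    start = text.index(value)
    return (start, start + len(value), entity_type)


# Hypothetical outputs: the pattern rules catch only the email,
# while the model also catches the phone number.
outputs = {
    "pattern_rules": [span("jane@example.com", "email")],
    "model": [span("jane@example.com", "email"), span("555-0100", "phone_number")],
}

combined = union_detections(outputs)
for (start, end, entity_type), methods in combined.items():
    print(text[start:end], entity_type, sorted(methods))
```

A span reported by any single method survives the union, which is why this style of combination favors coverage over precision.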

#### Confidence threshold

The confidence threshold controls the trade-off between recall and precision. Lower values (e.g., 0.1) surface more detections but admit more false positives. Higher values (e.g., 0.9) keep only high-certainty detections, at the cost of missing more entities. The default of 0.30 works well for most datasets.
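The effect of the threshold can be sketched as a simple filter over scored detections. The record layout, the scores, and the inclusive `>=` comparison are assumptions for illustration, not the product's documented behavior.

```python
def filter_by_confidence(detections, threshold=0.30):
    """Keep detections whose score meets the threshold (assumed inclusive)."""
    return [d for d in detections if d["score"] >= threshold]


# Hypothetical scored detections.
detections = [
    {"text": "Jane", "entity_type": "first_name", "score": 0.95},
    {"text": "May", "entity_type": "first_name", "score": 0.35},
    {"text": "Apple", "entity_type": "company_name", "score": 0.12},
]

for threshold in (0.1, 0.30, 0.9):
    kept = filter_by_confidence(detections, threshold)
    print(threshold, [d["text"] for d in kept])
```

With this sample data, raising the threshold from 0.1 to 0.9 shrinks the kept set from three detections to one.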

#### Evaluation judge model

After the job completes, an LLM evaluates the quality of the results. Select the model to use for this evaluation.

### Step 3: Entity types & masks

Select which entity types to detect and configure how each one is handled in the output—either replaced with a mask token (anonymization) or substituted with a synthetic value (augmentation).

<img src="https://mintcdn.com/aimonlabsinc/277fUvQIA2YgQg9V/images/anonymization/anon-step3-pii-types.png?fit=max&auto=format&n=277fUvQIA2YgQg9V&q=85&s=5ce5453209152e137ca8672deff4a5bd" alt="Step 3 – Select sensitive entity types and configure mask tokens" width="592" height="1660" data-path="images/anonymization/anon-step3-pii-types.png" />

The full set of supported entity types is organized by category:

<AccordionGroup>
  <Accordion title="Personal">
    First Name, Last Name, Date of Birth, Date, Age, Gender, Nationality, Race / Ethnicity, Marital Status
  </Accordion>

  <Accordion title="Contact">
    Email, Phone Number, Street Address, Postal / ZIP Code, City, State, Country
  </Accordion>

  <Accordion title="Financial">
    Social Security Number, Credit / Debit Card, Bank Routing Number, Routing Number, Tax ID, IBAN
  </Accordion>

  <Accordion title="Digital">
    IP Address, URL, Username, Password, MAC Address, Device Identifier
  </Accordion>

  <Accordion title="Identity Documents">
    Passport Number, License / Certificate Number, National ID, Voter ID
  </Accordion>

  <Accordion title="Medical / PHI">
    Medical Record Number, Diagnosis, Medication, Health Plan Number, Patient ID, Lab Result
  </Accordion>

  <Accordion title="Professional">
    Company Name, Occupation, Employee ID, Salary
  </Accordion>
</AccordionGroup>

For anonymization, each selected type maps to a mask token in the output—for example, `first_name → <FIRST NAME>` or `date_of_birth → <DOB>`. You can customize the mask token for each type. For augmentation, detected values are replaced with synthetic equivalents that preserve the format and context of the original.
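The mask mapping can be pictured with a short sketch. The two mask tokens come from the example above; the span format and the `apply_masks` helper are hypothetical and only illustrate the idea.

```python
# Mask tokens taken from the example above; any others would be configured per type.
MASKS = {"first_name": "<FIRST NAME>", "date_of_birth": "<DOB>"}


def apply_masks(text, detections, masks):
    """Replace each detected span with the mask token for its entity type.

    Spans are hypothetical (start, end, entity_type) tuples. They are applied
    right to left so that earlier offsets remain valid after each replacement.
    """
    for start, end, entity_type in sorted(detections, reverse=True):
        text = text[:start] + masks[entity_type] + text[end:]
    return text


text = "Jane was born on 1990-04-12."
dob_start = text.index("1990-04-12")
detections = [
    (0, len("Jane"), "first_name"),
    (dob_start, dob_start + len("1990-04-12"), "date_of_birth"),
]

masked = apply_masks(text, detections, MASKS)
print(masked)  # <FIRST NAME> was born on <DOB>.
```

Augmentation would follow the same shape, with the mask lookup swapped for a generator of synthetic values.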

### Step 4: Review & submit

Review the summary before submitting. It shows your full configuration: dataset, detection method, confidence threshold, evaluation judge model, and all selected entity types with their mask tokens or replacement rules.

<img src="https://mintcdn.com/aimonlabsinc/277fUvQIA2YgQg9V/images/anonymization/anon-step4-review-submit.png?fit=max&auto=format&n=277fUvQIA2YgQg9V&q=85&s=0df44d5dee9d633bcf590a9d1c41b007" alt="Step 4 – Review and submit the anonymization job" width="1780" height="1698" data-path="images/anonymization/anon-step4-review-submit.png" />

After submission, the job runs in the background. You can monitor progress on the job detail page.
