
This is the canonical English version. Native translations for DE / ES / FR / PT / RU are pending counsel sign-off.

Privacy Policy

Last updated: 2026-05-19
Effective: 2026-05-19
Status: Draft for legal review (initial scaffold; supplements data-retention-policy.md and dpa-template.md).

Scope: This document describes how the AI Contract Intelligence role processes personal data contained in customer-uploaded contracts. The platform-wide privacy practices (account creation, billing, support correspondence) live in the master Terms of Service at docs/legal/terms-of-service.md. The data-processor relationship between Ataski and a customer organisation is governed by docs/legal/dpa-template.md.


1. What personal data the role processes

Customers upload contracts that may contain personal data about their counterparties, employees, or third parties. The AI Contract Intelligence preflight scanner detects 12 categories of personal data on every uploaded contract:

  1. Social Security Numbers (SSN)
  2. Credit card numbers (Luhn-validated)
  3. Bank routing numbers (ABA-validated)
  4. Dates of birth
  5. US street addresses
  6. Salary figures
  7. Passport numbers
  8. Driver's license numbers
  9. Health conditions
  10. Biometric identifiers
  11. Government IDs
  12. Ethnicity or religion mentions

Detection runs as a deterministic regex + keyword scan; no external service is contacted. The categories detected on each contract are persisted on the append-only extraction audit row (contract_extractions.personal_data_categories_detected) so a customer's DPA reviewer can answer "what categories of personal data did Ataski's LLM sub-processor see for this contract?".
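As an illustration of what a deterministic regex scan with checksum validation can look like, the sketch below covers three of the twelve categories. It is not the production scanner: the patterns, the function names, and the slugs credit_card and bank_routing are assumptions made for this example (only the ssn slug appears elsewhere in this document). The Luhn and ABA checksums themselves are the standard public algorithms.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Return True if a digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def aba_valid(digits: str) -> bool:
    """Return True if a 9-digit string passes the ABA routing checksum."""
    if len(digits) != 9:
        return False
    weights = (3, 7, 1, 3, 7, 1, 3, 7, 1)
    return sum(int(d) * w for d, w in zip(digits, weights)) % 10 == 0


# Hypothetical patterns, chosen for illustration only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
ROUTING_RE = re.compile(r"\b\d{9}\b")


def detect_categories(text: str) -> set[str]:
    """Return the category slugs found in text, never the matched values."""
    found: set[str] = set()
    if SSN_RE.search(text):
        found.add("ssn")
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            found.add("credit_card")  # assumed slug
    for match in ROUTING_RE.finditer(text):
        if aba_valid(match.group()):
            found.add("bank_routing")  # assumed slug
    return found
```

The function returns slugs only, which mirrors the roll-up stored on the audit row: the record says which categories were present, not what the values were.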

For each Contract Intelligence analysis ("Victor") we additionally retain:


2. PII handling modes

Each Ataski tenant chooses one of three PII handling modes for the AI Contract Intelligence role at https://app.ataski.com/app/settings#pii-policy. The default for existing tenants and for new sign-ups is detect_only — selection of a stricter mode is opt-in.

2.1 detect_only (default)

The worker LLM (Anthropic Claude / OpenAI GPT-5 / Google Gemini — see the active sub-processor list at docs/legal/subprocessors.md) receives the original contract bytes. The preflight scanner's category roll-up is logged + stored on the extraction audit row; no extraction is blocked.

2.2 redact_selected

The tenant selects a subset of the 12 categories. Each occurrence of a selected category is replaced by the visible sentinel

[REDACTED-<category>]

in the bytes handed to the worker LLM. For example, a contract clause that originally read Employee SSN: 123-45-6789 arrives at the sub-processor as Employee SSN: [REDACTED-ssn]. The sentinel format is deliberately visible so the LLM can reason about its presence and so a tenant operator can spot it in any debug surface.

The original contract bytes are preserved INSIDE Ataski for the validator's anchor-citation logic; the customer-visible evidence panel always references the actual contract content. Only the worker LLM (and therefore the LLM sub-processor) sees the redacted bytes.
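The substitution described in this section can be sketched as follows. This is a minimal illustration, not Ataski's implementation: the PATTERNS table, the SSN pattern, and the function name redact_selected are hypothetical. Only the sentinel format and the SSN example come from the text above.

```python
import re

# Hypothetical pattern table keyed by category slug; only "ssn" is shown.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_selected(text: str, selected: set[str]) -> str:
    """Return a copy of text with each match of a selected category replaced."""
    redacted = text
    for category in sorted(selected):
        pattern = PATTERNS.get(category)
        if pattern is None:
            continue  # no pattern registered for this slug in the sketch
        sentinel = f"[REDACTED-{category}]"
        # A callable replacement avoids any escape processing of the sentinel.
        redacted = pattern.sub(lambda _match: sentinel, redacted)
    return redacted


original = "Employee SSN: 123-45-6789"
for_llm = redact_selected(original, {"ssn"})
# original is left untouched; only for_llm would be handed to the worker LLM.
```

Because the function returns a new string, the original text stays available for anchor citation while the redacted copy is the only thing passed onward.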

2.3 block_high_sensitivity

When the preflight scanner detects ANY high-sensitivity category (the closed set is listed at the end of this section), the extraction is refused without invoking the worker LLM.

The contract row lands in the refused state with reason refused_pii_high_sensitivity. The blocking category is recorded on the contract row's metadata so the customer's operator can review the refusal in the inbox. No contract bytes leave Ataski for any sub-processor.

The high-sensitivity set is closed: SSN, credit card, bank routing, health condition, biometric identifier. Adding to this set is a behaviour change that requires Ataski's legal sign-off; the eval suite asserts the exact set.
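The refusal rule can be pictured as a set intersection evaluated before any LLM call. The sketch below is an assumption-laden illustration: the function name, the returned dictionary shape, the "proceed" status, and the slugs credit_card, bank_routing, and biometric_identifier are hypothetical. The refused status, the reason string, and the five-member set are taken from this section.

```python
# Closed high-sensitivity set; slugs other than "ssn" and "health_condition"
# are assumed spellings for this example.
HIGH_SENSITIVITY = frozenset(
    {"ssn", "credit_card", "bank_routing", "health_condition", "biometric_identifier"}
)


def preflight_decision(mode: str, detected: set[str]) -> dict:
    """Decide whether an extraction may proceed to the worker LLM."""
    if mode == "block_high_sensitivity":
        blocking = sorted(detected & HIGH_SENSITIVITY)
        if blocking:
            return {
                "status": "refused",
                "reason": "refused_pii_high_sensitivity",
                "blocking_categories": blocking,
            }
    # Hypothetical "proceed" status for every non-refused case.
    return {"status": "proceed", "reason": None, "blocking_categories": []}
```

For example, preflight_decision("block_high_sensitivity", {"ssn", "salary"}) refuses and names ssn as the blocking category, while the same detection under detect_only proceeds.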


3. The redaction sentinel format

The redaction sentinel is hardcoded as [REDACTED-<category>] where <category> is the lowercase slug from the 12-category enum (e.g. [REDACTED-ssn], [REDACTED-health_condition]). The format is part of Ataski's disclosure to customers and to customers' DPA counterparties. The sentinel is visible by design so that the LLM can reason about its presence and a tenant operator can spot it in any debug surface (see §2.2).

Changing the sentinel format would constitute a change to what sub-processors receive and would require notice to customers per the DPA template's clause 9.


4. Customer corrections (feedback loop)

Customer corrections submitted via the per-field thumbs-up / thumbs-down feedback affordance on the Contract Intelligence dashboard are stored in your tenant's Row-Level-Security scope and used to improve our extractor's eval suite. Corrections are auto-deleted 30 days after offboarding alongside your other data, per data-retention-policy.md.

A correction record carries:

Corrections are visible only to your tenant's authenticated users and to Ataski engineering for eval-corpus promotion review. Corrections promoted into the regression eval suite are stripped of tenant / user identifiers before landing in the public test fixtures; the founder reviews each promotion candidate manually. The promotion flow is described in scripts/eval_corpus/promote_feedback_to_eval.py.


5. Persistence + export

Each tenant's selection is persisted on tenants.pii_handling_mode + tenants.pii_redact_categories (migration 0169). Mode changes write an audit row with action contract_pii_policy.updated.

The current selection appears in every GDPR Data Export (/app/settings/export.json) under the pii_policy section, so a regulator request or a customer's auditor can confirm which mode applied at a given point in time. The export carries the customer's OWN setting — Ataski never aggregates this across tenants.

Tenant administrators can export every tenant-scoped row via /app/settings/export.json. The export omits three internal columns from contract_extractions (prompt_version, schema_version, classifier_version) — these are platform IP and are not customer data. Every other column, including model identifiers, supervisor metadata, evidence anchors, and the detected personal-data category set, is exported. Feedback rows and Redline Memory state are exported in full.
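The column omission can be read as a simple deny-list applied to each contract_extractions row before serialisation. The sketch below assumes rows are available as dictionaries; the function name export_row and the sample column model_id are hypothetical, while the three omitted column names are those listed above.

```python
import json

# The three internal columns named above.
OMITTED_COLUMNS = frozenset({"prompt_version", "schema_version", "classifier_version"})


def export_row(row: dict) -> dict:
    """Return a copy of a contract_extractions row without the omitted columns."""
    return {key: value for key, value in row.items() if key not in OMITTED_COLUMNS}


sample = {
    "model_id": "example-model",  # hypothetical column name and value
    "personal_data_categories_detected": ["ssn"],
    "prompt_version": "p1",
    "schema_version": "s1",
    "classifier_version": "c1",
}
exported = json.dumps(export_row(sample), sort_keys=True)
```

A deny-list (rather than an allow-list) matches the stated behaviour that every other column is exported, including columns added later.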


6. Per-contract category log

For every contract Ataski processes — under any mode — the preflight scan's category roll-up is persisted on contract_extractions.personal_data_categories_detected. This record is in scope for the GDPR Data Export. It carries the category slugs detected, NOT the personal data values themselves.

The DB-layer CHECK constraint on this column restricts values to the closed 12-category set. Adding a category requires a paired migration that drops and recreates the constraint with the larger array. Such a migration must accompany a corresponding behaviour change to the preflight scanner, and customers are notified of that change.


7. Retention and deletion

See data-retention-policy.md for the full table, deletion timeline, and CCPA/GDPR right-to-delete mechanics. Contract Intelligence records follow the same 30-day-after-offboarding deletion sweep as every other tenant-scoped table.


8. Subprocessors

See subprocessors.md for the full list. LLM sub-processors (Anthropic, OpenAI, Google) receive the contract text plus the per-doc-type prompt; they do NOT receive the personal-data category tags as input. When a tenant selects redact_selected, sub-processors receive the redacted text; when a tenant selects block_high_sensitivity and a contract is refused, sub-processors receive nothing for that contract, per §2 above.


9. Contact

Questions about the AI Contract Intelligence role's PII handling go to privacy@ataski.com. DPA-specific questions follow the contact route in docs/legal/dpa-template.md.