Last updated: 2026-05-19
Effective: 2026-05-19
Status: Draft for legal review (initial scaffold; supplements data-retention-policy.md and dpa-template.md).
Scope: This document describes how the AI Contract Intelligence
role processes personal data contained in customer-uploaded
contracts. The platform-wide privacy practices (account creation,
billing, support correspondence) live in the master Terms of
Service at docs/legal/terms-of-service.md. The data-processor
relationship between Ataski and a customer organisation is
governed by docs/legal/dpa-template.md.
Customers upload contracts that may contain personal data about their counterparties, employees, or third parties. The AI Contract Intelligence preflight scanner detects 12 categories of personal data on every uploaded contract:
Detection runs as a deterministic regex + keyword scan; no
external service is contacted. The categories detected on each
contract are persisted on the append-only extraction audit row
(contract_extractions.personal_data_categories_detected) so a
customer's DPA reviewer can answer "what categories of personal
data did Ataski's LLM sub-processor see for this contract?".
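
As an illustration of the scan described above, the following is a minimal sketch of a deterministic regex and keyword pass that returns category slugs only. The patterns, the `credit_card` slug, and the function name `detect_categories` are assumptions for illustration; only the `ssn` and `health_condition` slugs appear in this document, and the real scanner's rules are not reproduced here.

```python
import re

# Hypothetical pattern table. The production scanner covers 12 categories;
# only three are sketched here, and the patterns themselves are assumptions.
CATEGORY_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),  # assumed slug
    "health_condition": re.compile(
        r"\b(?:diagnosis|medical condition)\b", re.IGNORECASE
    ),
}


def detect_categories(text: str) -> list[str]:
    """Return the sorted category slugs found in text.

    Only slugs are returned, never the matched values, mirroring what is
    persisted in personal_data_categories_detected.
    """
    return sorted(
        slug
        for slug, pattern in CATEGORY_PATTERNS.items()
        if pattern.search(text)
    )
```

Because the scan is purely local pattern matching, the same input always yields the same category list and no network call is involved.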
For each Contract Intelligence analysis ("Victor") we additionally retain:
Each Ataski tenant chooses one of three PII handling modes for the
AI Contract Intelligence role at
https://app.ataski.com/app/settings#pii-policy. The default for
existing tenants and for new sign-ups is detect_only — selection
of a stricter mode is opt-in.
detect_only (default)
The worker LLM (Anthropic Claude / OpenAI GPT-5 / Google Gemini;
see the active sub-processor list at
docs/legal/subprocessors.md) receives the original contract
bytes. The preflight scanner's category roll-up is logged and stored
on the extraction audit row; no extraction is blocked.
redact_selected
The tenant selects a subset of the 12 categories. Each occurrence of a selected category is replaced by the visible sentinel
[REDACTED-<category>]
in the bytes handed to the worker LLM. For example, a contract
clause that originally read Employee SSN: 123-45-6789 arrives at
the sub-processor as Employee SSN: [REDACTED-ssn]. The sentinel
format is deliberately visible so the LLM can reason about its
presence and so a tenant operator can spot it in any debug surface.
The original contract bytes are preserved INSIDE Ataski for the validator's anchor-citation logic; the customer-visible evidence panel always references the actual contract content. Only the worker LLM (and therefore the LLM sub-processor) sees the redacted bytes.
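
The substitution described for redact_selected can be sketched as follows. This is a minimal illustration, not Ataski's implementation: the pattern table and the function name `redact_selected_categories` are hypothetical, and only the sentinel format and the `ssn` slug are taken from this document.

```python
import re

# Hypothetical patterns keyed by category slug (assumption for illustration).
REDACTION_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_selected_categories(text: str, selected: set[str]) -> str:
    """Return a copy of text with selected categories replaced by sentinels.

    The input is not modified, so the original bytes remain available to
    the caller (for example, for anchor-citation checks).
    """
    redacted = text
    for slug, pattern in REDACTION_PATTERNS.items():
        if slug in selected:
            # The sentinel format [REDACTED-<category>] is the one stated
            # in this document.
            redacted = pattern.sub(f"[REDACTED-{slug}]", redacted)
    return redacted
```

With `selected={"ssn"}`, the input `Employee SSN: 123-45-6789` becomes `Employee SSN: [REDACTED-ssn]`; with an empty selection the text passes through unchanged.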
block_high_sensitivity
When the preflight scanner detects ANY of the high-sensitivity categories (SSN, credit card, bank routing, health condition, biometric identifier), the extraction refuses without invoking the worker LLM.
The contract row lands in refused with reason
refused_pii_high_sensitivity. The blocking category is recorded
on the contract row's metadata so the customer's operator can
review the refusal in the inbox. No contract bytes leave Ataski
for any sub-processor.
The high-sensitivity set is closed: SSN, credit card, bank routing, health condition, biometric identifier. Adding to this set is a behaviour change that requires Ataski's legal sign-off; the eval suite asserts the exact set.
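
A minimal sketch of the block_high_sensitivity gate, under stated assumptions: the slugs `credit_card`, `bank_routing`, and `biometric_identifier`, the `GateDecision` type, and the `allowed` status are hypothetical names chosen for illustration. The refusal status `refused` and reason `refused_pii_high_sensitivity` are the ones named in this document.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Closed set of five categories. Slug spellings other than "ssn" and
# "health_condition" are assumptions.
HIGH_SENSITIVITY = frozenset(
    {"ssn", "credit_card", "bank_routing", "health_condition", "biometric_identifier"}
)


@dataclass(frozen=True)
class GateDecision:
    status: str                           # "refused" or "allowed" (latter is hypothetical)
    reason: Optional[str]                 # set only when refused
    blocking_categories: tuple[str, ...]  # recorded for operator review


def gate(detected: Iterable[str]) -> GateDecision:
    """Decide, before any LLM call, whether extraction may proceed."""
    blocking = tuple(sorted(set(detected) & HIGH_SENSITIVITY))
    if blocking:
        return GateDecision("refused", "refused_pii_high_sensitivity", blocking)
    return GateDecision("allowed", None, ())
```

An eval-style check on the set itself could be as simple as asserting that `HIGH_SENSITIVITY` has exactly five members and equals the expected literal, so an accidental addition fails the suite.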
The redaction sentinel is hardcoded as
[REDACTED-<category>] where <category> is the lowercase slug
from the 12-category enum (e.g. [REDACTED-ssn],
[REDACTED-health_condition]). The format is part of Ataski's
disclosure to customers and to customers' DPA counterparties —
the sentinel is visible by design so:
Changing the sentinel format would constitute a change to what sub-processors receive and would require notice to customers per the DPA template's clause 9.
Customer corrections submitted via the per-field thumbs-up /
thumbs-down feedback affordance on the Contract Intelligence
dashboard are stored in your tenant's Row-Level-Security scope and
used to improve our extractor's eval suite. Corrections are
auto-deleted 30 days after offboarding alongside your other data,
per data-retention-policy.md.
A correction record carries:
parties.0.name).
Corrections are visible only to your tenant's authenticated users
and to Ataski engineering for eval-corpus promotion review.
Corrections promoted into the regression eval suite are stripped of
tenant / user identifiers before landing in the public test
fixtures; the founder reviews each promotion candidate manually.
The promotion flow is described in
scripts/eval_corpus/promote_feedback_to_eval.py.
Each tenant's selection is persisted on tenants.pii_handling_mode
and tenants.pii_redact_categories (migration 0169). Mode changes
write an audit row with action contract_pii_policy.updated.
The current selection appears in every GDPR Data Export
(/app/settings/export.json) under the pii_policy section, so a
regulator request or a customer's auditor can confirm which mode
applied at a given point in time. The export carries the customer's
OWN setting — Ataski never aggregates this across tenants.
Tenant administrators can export every tenant-scoped row via
/app/settings/export.json. The export omits three internal
columns from contract_extractions (prompt_version,
schema_version, classifier_version) — these are platform IP
and are not customer data. Every other column, including model
identifiers, supervisor metadata, evidence anchors, and the
detected personal-data category set, is exported. Feedback rows
and Redline Memory state are exported in full.
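
The column omission described above can be illustrated with a short sketch. The three omitted column names come from this document; the function name `exportable_row` and the sample `id` column are hypothetical.

```python
# Internal columns that the export omits from contract_extractions rows.
INTERNAL_COLUMNS = frozenset(
    {"prompt_version", "schema_version", "classifier_version"}
)


def exportable_row(row: dict) -> dict:
    """Return a copy of a contract_extractions row without internal columns."""
    return {col: val for col, val in row.items() if col not in INTERNAL_COLUMNS}


# Usage: every other column, including the detected category set, survives.
sample = {
    "id": 1,  # hypothetical column for illustration
    "prompt_version": "p",
    "schema_version": "s",
    "classifier_version": "c",
    "personal_data_categories_detected": ["ssn"],
}
exported = exportable_row(sample)
```

Using a deny-list rather than an allow-list matches the stated behaviour that every column other than these three is exported.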
For every contract Ataski processes — under any mode — the
preflight scan's category roll-up is persisted on
contract_extractions.personal_data_categories_detected. This
record is in scope for the GDPR Data Export. It carries the
category slugs detected, NOT the personal data values themselves.
The DB-layer CHECK constraint on this column restricts values to the closed 12-category set. Adding a category requires a paired migration that drops and recreates the constraint with the larger array, together with the corresponding change to the preflight scanner, of which customers are notified.
See data-retention-policy.md for the full table, deletion
timeline, and CCPA/GDPR right-to-delete mechanics. Contract
Intelligence records follow the same 30-day-after-offboarding
deletion sweep as every other tenant-scoped table.
See subprocessors.md for the full list. LLM sub-processors
(Anthropic, OpenAI, Google) receive the contract text plus the
per-doc-type prompt; they do NOT receive the personal-data
category tags as input. When a tenant selects redact_selected,
sub-processors receive the redacted text; under
block_high_sensitivity, a refused contract is not sent to any
sub-processor, per §2 above.
Questions about the AI Contract Intelligence role's PII handling
go to privacy@ataski.com. DPA-specific questions follow the
contact route in docs/legal/dpa-template.md.