AI Vendor Privacy Audit

Why This Matters Right Now

Every AI vendor says they take privacy seriously. Their marketing pages mention encryption, compliance, and security certifications. But when you dig into the actual terms of service, data processing agreements, and technical architecture, the gaps become obvious.

This matters because:

–AI tools process your most sensitive content. Meeting transcripts, legal documents, financial data, client communications: the information you put into AI tools is often more sensitive than what you store in your CRM or email.
–Training on your data is still the default for many providers. Unless you've specifically opted out (and verified that the opt-out is honored), your data may be used to improve the vendor's models, which means it could influence outputs shown to other users.
–Regulatory exposure is increasing. GDPR, CCPA, HIPAA, and industry-specific regulations are starting to catch up with AI. What was a gray area in 2024 is becoming an enforcement priority in 2026.
–Your clients expect you to know. If you're a law firm, consulting firm, or financial advisor using AI tools on client data, your clients have a reasonable expectation that you've vetted the privacy implications. "We didn't check" is not a defensible answer.

This checklist gives you 10 specific questions to ask any AI vendor before you sign. Each question includes what a good answer looks like, what a bad answer looks like, and why it matters.

The 10 Questions

1. Is my data used to train your models?

Why it matters: If your data trains the vendor's model, your confidential information could influence outputs shown to other users. For regulated industries, this is often a compliance violation.

Good answer: "No. Customer data is never used for model training. Our terms of service explicitly exclude customer data from training datasets, and this is technically enforced, not just a policy."

Bad answer: "We use aggregated and anonymized data to improve our models." (Aggregation and anonymization of text data are extremely difficult to verify, and "anonymized" meeting transcripts can still contain identifiable information.)

Red flag: The vendor can't point to a specific clause in their terms that addresses this, or the clause contains qualifying language like "unless you opt out" without a clear opt-out mechanism.

2. Where is my data stored, and in which jurisdiction?

Why it matters: Data residency determines which laws apply. Transferring EU personal data to US servers without a valid transfer mechanism may violate GDPR. US healthcare data falls under HIPAA's safeguards, which generally require a business associate agreement with the vendor. Financial data has its own rules.

Good answer: "Your data is stored in [specific region]. We can provide data residency guarantees for [EU/US/specific country]. Here's our data processing agreement with the specific data center locations."

Bad answer: "We use cloud infrastructure with global availability." (This means they probably don't know exactly where your data is, or it moves between regions.)

What to verify: Ask for the specific cloud provider, region, and whether data ever transits through other regions during processing.
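
One way to make this check concrete is to list every region the vendor names in writing, for storage and for each processing step, and compare the list against the regions your policy allows. The sketch below is a minimal Python illustration, not any vendor's API: the region names, the `allowed_regions` set, and the `find_residency_violations` helper are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingStep:
    """One step in the vendor's pipeline, as declared in their written answers."""
    name: str    # e.g. "storage", "transcription", "inference"
    region: str  # region the vendor says this step runs in


def find_residency_violations(steps, allowed_regions):
    """Return the steps whose declared region is outside the allowed set."""
    return [step for step in steps if step.region not in allowed_regions]


# Hypothetical example: the policy allows only two EU regions.
allowed_regions = {"eu-west-1", "eu-central-1"}
declared_steps = [
    ProcessingStep("storage", "eu-west-1"),
    ProcessingStep("transcription", "eu-central-1"),
    ProcessingStep("inference", "us-east-1"),  # processing leaves the EU
]

violations = find_residency_violations(declared_steps, allowed_regions)
for step in violations:
    print(f"{step.name} runs in {step.region}, outside the allowed regions")
```

A check like this only covers what the vendor has declared. It does not replace the written data processing agreement.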

3. What happens to my data if I cancel my subscription?

Why it matters: Data retention after cancellation is a common blind spot. Some vendors retain data indefinitely for "service improvement" even after you leave.

Good answer: "All your data is deleted within [30/60/90] days of cancellation. You can request immediate deletion at any time. We provide a data export before deletion."

Bad answer: "We retain data for a reasonable period after cancellation." (What's reasonable? Who decides?)

What to verify: Ask for the specific retention period in writing, and whether "deletion" means actual deletion or just de-identification.

4. Who at your company can access my data?

Why it matters: Even if the AI model doesn't train on your data, human employees might have access to it for debugging, quality assurance, or support purposes.

Good answer: "Access to customer data is restricted to [specific roles] under [specific conditions]. All access is logged and auditable. We have [SOC 2 / ISO 27001] certification that covers access controls."

Bad answer: "Our team follows strict internal policies." (Policies without technical enforcement and auditing are just suggestions.)

What to verify: Ask whether support staff can see your actual content (transcripts, documents, messages) or only metadata.

5. Do you use sub-processors, and which ones?

Why it matters: Your data might be secure at the vendor level but exposed through a third-party sub-processor. The chain of custody matters.

Good answer: "Here is our complete list of sub-processors, updated quarterly: [list]. Each sub-processor is bound by our data processing agreement and undergoes annual security review."

Bad answer: "We work with industry-standard partners." (Who? Doing what? With what data?)

What to verify: Specifically ask whether transcription, AI inference, or analytics are handled by third parties. These are the most sensitive processing steps.

6. How is my data encrypted, in transit and at rest?

Why it matters: Encryption in transit (HTTPS) is table stakes. Encryption at rest is important but insufficient alone. The real question is who holds the encryption keys.

Good answer: "Data is encrypted in transit (TLS 1.3) and at rest (AES-256). Encryption keys are managed by [specific KMS]. Customer data is encrypted with per-tenant keys."

Bad answer: "We use industry-standard encryption." (This usually means TLS for transit and whatever the cloud provider defaults to for storage.)

What to verify: Ask whether encryption keys are per-tenant or shared. Per-tenant keys limit the damage from a leaked key or a misconfiguration to a single customer. With shared keys, one compromised key exposes everyone.

7. Can I get a copy of your SOC 2 report or security audit?

Why it matters: SOC 2 Type II is the baseline for SaaS security verification. If a vendor has neither that report nor an equivalent independent audit, you have only their word that their controls work.

Good answer: "Here's our SOC 2 Type II report from [date]. We also have [ISO 27001 / HIPAA BAA / other relevant certifications]. Our next audit is scheduled for [date]."

Bad answer: "We're working toward SOC 2 certification." (This means they don't have it. "Working toward" can mean anything.)

What to verify: Check which type of report the vendor is offering. SOC 2 Type I is a point-in-time check. Type II covers a period (usually 6-12 months) and is significantly more meaningful.

8. What's your incident response process, and will you notify me of breaches?

Why it matters: Breaches happen. The question is how fast you find out and what the vendor does about it.

Good answer: "We commit to notifying affected customers within [24/48/72] hours of confirming a breach. Here's our incident response plan. We've had [0/N] incidents in the past 12 months, and here's how they were handled."

Bad answer: "We follow all applicable notification requirements." (This is the legal minimum, and depending on the jurisdiction it can mean 30 days or more.)

What to verify: Ask for the specific notification timeline in writing, not just a reference to "applicable law."
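
Once the timeline is in writing, checking it against a real incident is simple arithmetic. The following is a minimal Python sketch under stated assumptions: the 72-hour window, the timestamps, and the helper names `notification_deadline` and `notified_in_time` are hypothetical, and the clock is assumed to start when the vendor confirms the breach, as in the good answer above.

```python
from datetime import datetime, timedelta, timezone


def notification_deadline(confirmed_at, window_hours):
    """Latest time the vendor may notify you under the agreed window."""
    return confirmed_at + timedelta(hours=window_hours)


def notified_in_time(confirmed_at, notified_at, window_hours):
    """True if the notification arrived within the contractual window."""
    return notified_at <= notification_deadline(confirmed_at, window_hours)


# Hypothetical incident: breach confirmed at 09:00 UTC, 72-hour window.
confirmed = datetime(2026, 3, 2, 9, 0, tzinfo=timezone.utc)
deadline = notification_deadline(confirmed, window_hours=72)
print(deadline.isoformat())
```

Using timezone-aware timestamps avoids ambiguity when the vendor and customer sit in different time zones.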

9. Can I run your tool on-premise or in my own cloud?

Why it matters: For some organizations, the only acceptable answer to data privacy is keeping data on infrastructure they control. This is especially true for law firms handling privileged communications, healthcare organizations, and financial institutions.

Good answer: "Yes, we offer on-premise deployment. Here are the infrastructure requirements and pricing." Or: "We offer a private cloud deployment in your AWS/Azure/GCP account."

Bad answer: "Our cloud deployment meets the highest security standards, so on-premise isn't necessary." (That's not for the vendor to decide.)

What to verify: If the vendor offers on-premise, ask whether it's the same product or a limited version. Some vendors offer an "on-premise" option that's missing key features.

10. How do you handle AI-generated outputs derived from my data?

Why it matters: This is the question most vendors aren't prepared for. AI outputs (summaries, action items, insights) are derived from your data. Are they treated with the same privacy protections as the source data?

Good answer: "AI-generated outputs are treated identically to source data: same encryption, same access controls, same deletion policies. Outputs are never used for model training or shared across tenants."

Bad answer: "Outputs are generated in real-time and aren't stored." (If they're not stored, how do you access your meeting notes later? Something doesn't add up.)

What to verify: Ask specifically about caching, logging, and whether outputs pass through any third-party services before reaching you.

How to Use This Checklist

Before signing a new vendor:

–Send these 10 questions to the vendor's sales or security team
–Request written answers, not verbal assurances
–Compare answers against the good/bad benchmarks above
–Flag any question where the vendor can't provide a clear, specific answer
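
If you are comparing several vendors, a first pass over the written answers can be automated. The sketch below is one possible approach, not a definitive tool: it flags missing answers and answers that reuse the vague phrases from the "bad answer" examples above. The `VendorAnswer` type, the `flag_answer` helper, and the phrase list are illustrative assumptions.

```python
from dataclasses import dataclass

# Vague phrases taken from the "bad answer" examples in this checklist.
VAGUE_PHRASES = (
    "industry-standard",
    "reasonable period",
    "strict internal policies",
    "working toward",
    "applicable notification requirements",
    "aggregated and anonymized",
)


@dataclass
class VendorAnswer:
    question: int  # 1-10, matching the checklist numbering
    text: str      # the vendor's written answer ("" if none received)


def flag_answer(answer):
    """Return a list of reasons this answer needs follow-up (empty if none)."""
    reasons = []
    if not answer.text.strip():
        reasons.append("no written answer")
    lowered = answer.text.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            reasons.append(f'vague phrase: "{phrase}"')
    return reasons


# Hypothetical answers from one vendor.
answers = [
    VendorAnswer(1, "No. Customer data is never used for model training."),
    VendorAnswer(6, "We use industry-standard encryption."),
    VendorAnswer(8, ""),
]

for answer in answers:
    for reason in flag_answer(answer):
        print(f"Q{answer.question}: {reason}")
```

A phrase match is a prompt for a human follow-up question, not a verdict. An answer with no flagged phrases still needs to be read against the benchmarks.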

For vendors you already use:

–Review your existing agreements against these questions
–Send any unanswered questions to your account representative
–Document the answers for your compliance records
–Set a calendar reminder to re-verify annually
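
If a calendar reminder is easy to lose track of, the same annual check can live alongside your compliance records. This is a minimal sketch, assuming you keep the date of each vendor's last review; the vendor names, dates, and the `reviews_due` helper are hypothetical.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # "annually", per the checklist


def reviews_due(last_reviewed, today):
    """Return vendor names whose last review is at least one interval old, sorted."""
    return sorted(
        vendor
        for vendor, reviewed_on in last_reviewed.items()
        if today - reviewed_on >= REVIEW_INTERVAL
    )


# Hypothetical vendors and review dates.
last_reviewed = {
    "transcription-tool": date(2025, 1, 15),
    "contract-assistant": date(2025, 11, 3),
}

print(reviews_due(last_reviewed, today=date(2026, 2, 1)))
```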

For your clients: If you're a professional services firm, having documented answers to these questions for every AI tool you use demonstrates due diligence. It's the difference between "we use AI responsibly" and "here's exactly how we protect your data when AI is involved."

How Orquestria Handles These Questions

We built Orquestria and Cadence with these questions in mind because our earliest users were law firms and consultants who needed real answers, not marketing copy.

Cadence by Orquestria deploys on your infrastructure:

–On-premise AI processing: your data never leaves your network. No cloud APIs, no third-party model providers, no data in transit to external servers.
–You control the encryption keys: we couldn't access your data even if we wanted to.
–No training on customer data, ever: this is architecturally enforced, not just a policy.
–Full audit trail: every query, every response, every access event is logged on your infrastructure.

For organizations where the answer to Question 9 must be "on-premise," Cadence is built for you.
