Data usage policies across different LLM providers and tiers
This research examines how different LLM developers handle user data across their various tier offerings. Understanding these policies is crucial for making informed decisions about which AI services to use based on your privacy requirements.
OpenAI (Free)
Data collected: User interactions (inputs and outputs).
Training use: Used to improve models by default.
Retention and controls: Conversations may be logged and analyzed; 'Temporary Chat' is available, and the 'Memory' feature can be toggled on/off.
Opt-out: Available via the privacy portal; applies to new conversations only.
Security: Nothing tier-specific beyond OpenAI's general commitment to security.
OpenAI (Paid: Plus/Pro)
Data collected: User interactions (inputs and outputs).
Training use: Used unless the user explicitly opts out of model training.
Retention: More control over data retention and deletion than the free tier.
Opt-out: Explicit opt-out available.
Security: Nothing tier-specific; benefits from OpenAI's general security practices.
OpenAI (Team/Enterprise/Edu/API)
Data collected: User interactions (inputs and outputs).
Training use: Not used for training or model improvement by default; API users can explicitly opt in to share data for improvement.
Retention: Organizations own their data, with granular control over retention periods; deleted conversations are removed within 30 days. API inputs/outputs are retained for up to 30 days for abuse detection; a Zero Data Retention (ZDR) option is available.
Opt-out: Non-training by default; ZDR option for the API.
Security: Encryption at rest (AES-256) and in transit (TLS 1.2+); SAML SSO and fine-grained access controls; SOC 2 Type 2 and CSA STAR Level 1 certified; supports GDPR, CCPA, and HIPAA (BAA available); data residency options.
Google (Free Tier)
Data collected: User data (inputs and outputs).
Training use: Explicitly used to improve Google products.
Retention: Not explicitly detailed; data is used for product improvement.
Opt-out: No explicit training opt-out in the free tier.
Security: Not specified for the free tier.
Google (Paid Pro / API)
Data collected: User data (inputs and outputs).
Training use: Explicitly not used to improve Google products.
Retention: Not explicitly detailed, but data is not used for improvement.
Opt-out: Non-training by default.
Security: Aligns with enterprise expectations.
Anthropic (Consumer: Free/Pro)
Data collected: User inputs/outputs and feedback (e.g., thumbs up/down).
Training use: Not used to train generative models by default, unless explicitly reported via feedback or opted in (trusted tester program); conversations flagged for policy violations may be used to train safety systems.
Retention: Feedback data is stored for up to 10 years, de-linked from the user ID before any training use.
Opt-out: Non-training by default; explicit opt-in for training via feedback mechanisms.
Security: Strict access limits for authorized staff.
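The "de-linked from user ID" step above can be illustrated with a toy sketch: replace the raw account identifier with a salted one-way hash before the feedback is stored for training. This is a hypothetical illustration of the general technique, not the provider's actual pipeline; all names (`delink_feedback`, `pseudonym`) are invented for this example.

```python
import hashlib
import secrets

def delink_feedback(record: dict, salt: bytes) -> dict:
    """Toy de-linking: swap the raw user ID for a salted one-way hash,
    so stored feedback cannot be joined directly back to an account."""
    pseudonym = hashlib.sha256(salt + record["user_id"].encode()).hexdigest()
    return {"pseudonym": pseudonym, "rating": record["rating"], "text": record["text"]}

# Per-dataset salt, kept separate from the data itself.
salt = secrets.token_bytes(16)
rec = {"user_id": "user-123", "rating": "thumbs_up", "text": "Helpful answer"}
out = delink_feedback(rec, salt)
assert "user_id" not in out and "user-123" not in str(out)
```

Holding the salt separately means the mapping is repeatable within one dataset (the same user hashes to the same pseudonym) but cannot be reversed without access to the salt.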
Anthropic (Commercial: Claude for Work / API)
Data collected: User inputs/outputs.
Training use: Governed by separate, likely more stringent policies; expected to default to non-training.
Retention: Not explicitly detailed, but implied to be enterprise-grade.
Opt-out: Expected non-training by default.
Notes: The API offers various models with token-based pricing.
Microsoft Copilot (Consumer)
Data collected: 'De-identified data' from Bing searches, MSN activity, Copilot conversations, and ad interactions; conversations about uploaded files may be used.
Training use: Used for training, but only after de-identification; does not train on Microsoft account profile data, email contents, or the contents of files uploaded to Copilot.
Retention: Opt-out changes may take up to 30 days to take effect.
Opt-out: Available via privacy settings in the Copilot app or Edge; certain users (commercial, not logged in, under 18, in specific countries) are automatically excluded.
Security: De-identification (removing PII, blurring faces).
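As a rough sense of what text de-identification involves, here is a minimal regex-based redaction pass. This is a toy sketch only; production pipelines like the ones described above use far more sophisticated methods (NER models, face blurring for images), and the patterns and names here are assumptions for illustration.

```python
import re

# Hypothetical minimal de-identification pass. Order matters: redact
# IP addresses before the looser phone pattern can swallow them.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "[IP]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def deidentify(text: str) -> str:
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

print(deidentify("Mail me at jane.doe@example.com or call +1 (555) 010-2030"))
# → Mail me at [EMAIL] or call [PHONE]
```

Regex redaction of this kind is a first pass at best: it misses names, addresses, and context-dependent identifiers, which is why de-identification claims in the policies above are worth reading carefully.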
Microsoft (Commercial)
Data collected: User data from commercial customers.
Training use: Explicitly not used to train AI models.
Retention: Not explicitly detailed, but governed by the DPA.
Opt-out: Non-training by default.
Security: Strong privacy guarantees for business users.
X (Premium+ / API)
Data collected: User content (prompts and inputs: text, photos, images, file uploads) and outputs; real-time data from X.
Training use: Collected and used to develop and improve services, including model training; leverages publicly available data and human-curated datasets.
Retention: Retained for service provision, development, improvement, and terms enforcement; manual review is possible.
Opt-out: Logged-in users can choose whether their content is used for product development or model training; unauthenticated access grants 'full rights' for data use.
Security: Commercially reasonable technical, administrative, and organizational measures.
Consumer / Pro
Data collected: User data (query data, account info, device info, site interactions, and personal info if an account is created).
Training use: May be used for AI training and product improvement.
Retention: Retained as long as the account is active; deleted within 30 days of account deletion.
Opt-out: Available in account settings.
Security: Will not sell or share data for advertising/marketing; strict internal controls.
API Platform / Enterprise
Data collected: Query data, API usage data (billable metadata), and user account information (name, email).
Training use: Never used for AI training.
Retention: Zero-day retention of user prompt data.
Opt-out: Non-training by default for the API; Zero Data Retention (ZDR) option.
Security: Compliant with privacy laws (GDPR, CCPA, etc.); relies on AWS security.
All Tiers
Data collected: Prompts, generations, fine-tuning data, and non-identifying usage data.
Training use: Opt-out mechanism available; content from third-party apps (e.g., Google Drive) is not used for training; PII is filtered/stripped before training (if opted in).
Retention: Logged prompts/generations are deleted after 30 days (unless subject to a legal hold or violation); deleted chat history and fine-tune datasets are purged from the backend after 7 days; ZDR available for approved cases.
Opt-out: Explicit opt-out via dashboard settings.
Security: Robust logging/monitoring for misuse detection; safety/security teams review flagged content (with customer identifiers removed for aggregation); HITRUST-certified security.
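Several providers above describe the same retention pattern: records are purged after a fixed window (typically 30 days) unless pinned by a legal hold or policy violation. A minimal sketch of that sweep logic, with hypothetical field names (`legal_hold`, `created`) invented for this example:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)

def purge_expired(logs: list[dict], now: datetime) -> list[dict]:
    """Keep only records younger than the retention window,
    unless a legal hold pins them past the deadline."""
    return [
        rec for rec in logs
        if rec.get("legal_hold") or now - rec["created"] < RETENTION
    ]

now = datetime.now(timezone.utc)
logs = [
    {"id": 1, "created": now - timedelta(days=45), "legal_hold": False},
    {"id": 2, "created": now - timedelta(days=45), "legal_hold": True},   # pinned
    {"id": 3, "created": now - timedelta(days=5),  "legal_hold": False},
]
assert [r["id"] for r in purge_expired(logs, now)] == [2, 3]
```

The legal-hold exception is why "deleted after 30 days" policies usually carry an "unless required by law" clause: the sweep must skip pinned records regardless of age.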
Cohere (All Tiers)
Data collected: Datasets uploaded for fine-tuning; customer inputs/outputs (which Cohere does not receive in third-party cloud deployments).
Training use: Uploaded datasets are used only for fine-tuning; Cohere does not receive customer inputs/outputs in private deployments.
Retention: Datasets are deleted 30 days after creation; 10 GB storage limit.
Opt-out: ZDR available for approved cases.
Security: HITRUST-certified; supports GDPR, HIPAA, and CMS-0057-F APIs; clinical AI models are fine-tuned by medical specialty using a 'minimum necessary' approach for unstructured data.
Amazon (Consumer: Nova)
Data collected: 'Amazon Nova Interactions' (inputs, uploaded files, outputs, feedback).
Training use: Used to provide, develop, and improve Amazon's services, including AI models; manual review is possible.
Retention: Interactions are recorded, processed, and retained in the cloud, on servers that may be outside the user's country.
Opt-out: Users are advised not to submit sensitive information; an opt-out is available for cross-context behavioral ads.
Security: Not specified for consumer Nova.
Amazon (Enterprise)
Data collected: Customer content (inputs and outputs).
Training use: Explicitly not used to improve base models and not shared with model providers (including Amazon Nova).
Retention: Not explicitly detailed, but data is not used for training.
Opt-out: Non-training by default.
Security: Encrypted in transit and at rest, with optional customer-managed keys; SOC, ISO, HIPAA-eligible, and GDPR compliant; uncapped IP indemnity for outputs; monitoring/logging via CloudWatch/CloudTrail; secure access controls (IAM, PrivateLink).
Consumer / API
Data collected: Name, phone number, IP address, online activity data, user conversations, and metadata; users may choose to provide sensitive information.
Training use: Explicitly used to improve services, train models, and develop new services; the API terms grant the right to aggregate, collect, and analyze data for product improvement.
Retention: Not explicitly detailed.
Opt-out: Providing sensitive information implies consent; no explicit opt-out for general training.
Security: Will not sell or share data for advertising/marketing; strict internal controls; users under 18 are not permitted.
Consumer
Data collected: Account info, user content (prompts, uploaded text, files, images, audio), log data, and usage data.
Training use: User content may be processed to improve services and develop new products (including for other customers).
Retention: Data may be stored outside the user's jurisdiction.
Opt-out: Not explicitly detailed for training.
Security: Encrypted in transit and at rest.
Enterprise API
Data collected: User inputs/outputs via the API.
Training use: Minimizes reliance on user data for training.
Retention: Not explicitly detailed.
Opt-out: Not explicitly detailed, but 'minimizes reliance' implies a privacy-focused approach.
Security: Aligns with GDPR and other global privacy standards; enhanced data security and privacy for businesses; open-source models available.
Note: This information is based on publicly available privacy policies and may change over time. Always review the latest privacy policies before using any AI service, especially for sensitive data.