Large language models (LLMs) have become the new normal for millions of users in the European Union, and user acquisition shows no sign of slowing down. Marketed as a collection of experts in your pocket, LLM providers offer a sleek interface that anyone can use, often coupled with a free tier to attract even more users. This isn’t like the classic SaaS product, though; hidden in a wild forest of policies, the providers’ true ambition for the LLM is clear: to profit from personal data collection. The providers claim that the legal grounds for the data processing lie in their legitimate interest, which, as patent advisor Fredrik Edman at LTH once put it, has “lately become the slop of legal grounds”.
Rooted in publicly available privacy policies, corporate disclosures, and official blog posts, this article examines how each major provider gathers personal data1, from account details and device fingerprints to the very prompts that fuel model training; how consent is obtained; how long the data is retained; and under what circumstances it may be disclosed to third parties. It is aimed at individuals within the EU who use these services for personal purposes, and therefore doesn’t cover enterprise plans or any corporate agreements with the LLM providers.
ChatGPT
ChatGPT, developed by OpenAI, was released in November 2022 and is recognized for its conversational skills and human-like text generation. Built on a transformer architecture, it is currently one of the most popular LLMs globally, accessible through both web and API.
Even though the user never explicitly has to agree to any privacy policy, OpenAI still collects certain personal data without stating an explicit purpose or retention period. Examples include precise GPS location (if consent is given), fraud-protection data received from unspecified sources, and potential leads collected from marketing vendors.
This part is primarily based on OpenAI’s EU privacy policy, which can be found here.
Personal data collected
- User content: Prompts, uploaded content, recorded audio, “other information you provide”
- Account info: Full name, email, phone number, account credentials, birthdate, payment information, payment transaction history
- Device info: Name of device, operating system, browser & browser settings, “device identifiers”
- Location info: General location, GPS location if consent is given
- Log data: IP address, “how you interact with our services”
- Data received from “other sources”: “fraud protection data”, “potential customers from marketing vendors”
- Info from the internet to train their models: Internet content may include personal information.2
How your data is used
- Train models: Personal data is used to train their models by default. An opt-out is available, but even if you opt out, personal data may still be used for training.3
- Conduct research to “improve and develop” services: This includes, but is not limited to “training and improving models”.
- Provide, analyze and maintain services
- Communication to customers
- To detect fraud, illegal activity, or misuse of services
- To comply with legal obligations
When and to whom your data is shared
- Vendors and service providers: Shared with providers of hosting services, customer service vendors, cloud services, content delivery services, support and safety monitoring services, email communication software, web analytics services, payment and transaction processors, and other “information technology providers”. OpenAI has 17 sub-processors based all over the world.4
- “Counterparties and others assisting with the Transaction” in a potential business transfer: In the event of strategic transactions, reorganization, bankruptcy, receivership, or transition of service to another provider.
- Affiliates to OpenAI
- Other Users and Third Parties You Interact or Share Information With: When sharing conversations or using third-party applications like custom GPT actions
- Government authorities or “other third parties”: In the event of a legal obligation, suspicion of fraud, or to protect the safety of the product.
How long your data is kept
- Personal data is kept as long as OpenAI needs it to provide the services to you.
- It is also kept for “other legitimate business purposes such as resolving disputes, safety and security reasons, or complying with our legal obligations.”
- Temporary chats and the files tied to them are wiped from their systems after 30 days
- Chats and the files tied to them are saved to the account until you delete them, after which they will be wiped from their systems after 30 days. This excludes chats flagged for a security or legal reason.5
When did I say this was ok?
Both when creating a new account and before writing the first prompt, you must agree to the Europe Terms of Use. You must also have read their Privacy Policy.
An interesting point is that the user never explicitly agrees to the Privacy Policy; the terms only state that the policy must be read before entering any personal data.
What you can do
- You can turn off the use of your personal data for training their models through the Privacy Center (badly designed, currently found here) AND by disabling it in the ChatGPT settings (instructions here)
- To avoid non-essential cookies such as analytics and market-performance cookies, you can opt out of these in the Cookie Preferences Center (instructions here)
Google Gemini
The commercially available LLMs developed by Google are all dubbed Gemini and represent their most advanced multimodal models to date. Built with deep integration into the Google ecosystem, Gemini holds a unique position in crafting purposeful output with a much larger context than its competitors.
Indeed, that integration cuts both ways. The documents state that Gemini continuously collects information from connected apps and other Google services, which isn’t made clear to the user. Furthermore, the Gemini app may take screenshots and collect page data when browsing the web.
Google has also introduced a scary setting called “Keep Activity”. By leaving this feature on, even accidentally, you consent to Google training their models on all your data (including any images and videos you may have shared with Gemini through Google Photos) and handing it off to human reviewers.
This part is primarily based on Google’s Gemini Apps Privacy Notice (which can be found here), and the Google Privacy Policy (found here).
Personal data collected
- User content: Prompts, uploaded content, recorded audio, audio transcripts, screenshots and page data when using the Gemini app
- Info from connected apps and other Google services: Photos and videos from Google Photos if shared with Gemini, info from services such as Google Search or YouTube history
- Device info: Operating system, browser type & settings, device type & settings, “identifiers”
- Log data: IP address, interaction logs, performance metrics, crash and debug information
- Location info: General location, GPS location if consent is given
- System data (when using the app): Call and message logs, contacts, installed apps, language preferences, screen content, and other app info like page context and URL
- “Context info” (if you use Gemini with certain devices or services): Smart home device names and playlists
- “Supplemental info”: If consent is given, information collected through supplemental Gemini Apps features.
- Info from the internet to train their models: Internet content may include personal information.
Note: This does not include the personal data Google collects in general, which likely extends the list with Account info, etc. Information about the general personal data collected can be found here.
How your data is used
- To let human reviewers review some of your data: They explicitly tell you not to “enter confidential information that you wouldn’t want a reviewer to see or Google to use to improve our services”. You can opt out of some of the processing, but even then Google still uses your chats to “respond to you” and to “help protect Google, our users, and the public, including with help from human reviewers”.
- Train models: They explicitly state that your data is used to provide, maintain, improve “the generative AI models and other machine-learning technologies powering our services”.
- Conduct research to “improve and develop” services: They state that the research “improves our services for our users and benefits the public”.
- To “tailor” the user experience to the user: For example, the general location is always used to tailor the response.
- “Fulfilling obligations to our partners like developers and rights holders“
- Provide, analyze and maintain services
- To detect fraud, illegal activity, or misuse of services
- Communication to customers
- To comply with legal obligations
Note: This does not include the processing of your data done by Google in general. Information about why Google collects data across all of their services can be found here.
When and to whom your data is shared
- Service providers (chats are disconnected from your Google account)
- Connected apps: Information from your conversation, info from your device, and preferences such as language and location may be shared with the connected app. With consent, account info is also shared.
Note: Google could still disclose your data to parties outlined in their general Privacy policy, which isn’t covered in this article. You can read which entities Google discloses your data to in general here.
How long your data is kept
- Some data is kept until you delete your Google Account, such as information about how often you use Gemini Apps.
- Some data is kept for even longer when necessary for legitimate business or legal purposes, such as security, fraud and abuse prevention, or financial record-keeping.
- Regarding chats, you can configure whether they should be “deleted” after 3 months, 1.5 years, 3 years, or never. If the “Keep Activity” setting is turned off, Google still saves chats for 72 hours to “assist in keeping Gemini safe”.
- Chats reviewed by human reviewers (and related data like your language, device type, location info, or feedback) are not deleted when you delete your activity. Instead, they are retained for up to three years.
Note: Google can still retain personal data according to the practices presented in their general Privacy policy, which isn’t covered in this article. Info about the data retention in the general case can be found here.
When did I say this was ok?
When creating the Google account used for Gemini you consent to the general Privacy policy. This policy is also agreed to if chatting with Gemini without being logged in on the website.
Terms of Service are agreed to when chatting with Gemini through their app or web app for the first time.
What you can do
- Turn off “Keep Activity”: This setting controls whether your chats and the files associated with them are saved in your Activity, and turning it off could help lower the amount of data used for model training. If it is on, you accept that Google trains their models on your data and hands it off to human reviewers for review. Even when the setting is off, Google still uses your chats to “respond to you” and to “help protect Google, our users, and the public, including with help from human reviewers”.
Claude Sonnet, Opus, Haiku
The Claude family of models, all developed by Anthropic, are designed for safety, interpretability, and long-context reasoning. Favorable for tasks requiring reliable and safe language generation, Claude also has a strong performance in math, logic and coding tasks. Claude is primarily available via API, but has lately pivoted towards consumer-facing products such as Claude Code, Claude in the browser, etc.
Claude’s services collect no less personal data than those of other providers. One thing that sets them apart is the extensive collection of device info; examples include which mobile network, ISP, and device you use. They also collect precise GPS location if consent is given, though their privacy policy leaves it unclear why. Another interesting aspect is that model training has to be opted out of, and even if a user has opted out, a conversation can still be trained on if Anthropic determines that it violates the Usage Policy.
This part is primarily based on Claude’s Privacy policy (found here).
Personal data collected
- User content: Prompts (“inputs” in the form of chat, coding or agentic sessions), feedback
- Info from third party applications: Integrated with the services, they provide “inputs”
- Account info: Full name, email, phone number, payment information
- Device info: Mobile network, mobile operator or ISP, device type, operating system, browser info, “unique identifiers” (device, advertising, probabilistic, personal or online), “connection information”, “technology on the devices you use to access the services”
- Log data: IP address, browsing history of service, searches, which links are clicked, “other info about how you use the services”, troubleshooting info
- Location info: General location, GPS location if consent is given
- Cookies & similar technologies
How your data is used
- Train models: Data that you provide is used to train their models unless you opt out. If you opt out, your data is not used for training unless a conversation has been submitted as feedback or has violated their Usage Policy.
- Conduct research to “improve the services”: To improve the direction and development of services, and conduct research that “benefits the AI industry and society”.
- To recognize you, market additional products, analyze usage, and customize the experience: This is what cookies and similar technologies are used for.
- Provide, analyze and maintain services
- Communication to customers
- To detect fraud, illegal activity, or misuse of services
- To comply with legal obligations
When and to whom your data is shared
- Service providers, affiliates and corporate partners: Including website and data hosting, ensuring compliance with industry standards, research, auditing, data processing, and providing you with the services. Claude has 14 sub-processors based all over the world.6
- As part of a significant corporate event: Your personal data will be disclosed in case of a merger, corporate transaction, bankruptcy, or other situation involving the transfer of business assets.
- Integrations with third parties: Personal data may be shared when interacting with 3rd party content on Claude
- Governmental authorities: For legal, tax or accounting purposes, or in response to their requests. Also, in connection with claims, disputes or litigation, when otherwise permitted or required by law, or if they determine its disclosure is necessary.
How long your data is kept
- Personal data in general is kept “as long as reasonably necessary for the purposes and criteria outlined”.
- If you haven’t opted out of model training, chats are kept “in de-identified format” for up to 5 years
- Regarding chats, a “deleted” chat gets wiped from their systems after 30 days, unless one of three conditions applies:
- If you consent to your chats being used for model training, chats are kept “in de-identified format” for up to 5 years
- If you have sent feedback about a chat, that conversation is also kept “in de-identified format” for up to 5 years
- If a chat is flagged as a violation of their Usage Policy it can be kept for up to 2 years7
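Taken together, these retention rules form a small decision procedure. The sketch below models the policy as described above; the function and field names are our own invention, not anything Anthropic publishes, and the ordering when several conditions apply at once is our assumption (the longest period wins).

```python
from dataclasses import dataclass


@dataclass
class Chat:
    """State of a deleted chat, per the policy terms described above."""
    opted_out_of_training: bool
    submitted_as_feedback: bool
    flagged_policy_violation: bool


def retention_after_deletion(chat: Chat) -> str:
    """Return how long a deleted chat may be kept, per the stated policy."""
    # Feedback submissions and training consent both extend retention to
    # up to 5 years, stored in what Anthropic calls "de-identified format".
    if chat.submitted_as_feedback or not chat.opted_out_of_training:
        return "up to 5 years (de-identified)"
    # Usage Policy violations can be kept for up to 2 years.
    # (Precedence over the rule above, if both apply, is our assumption.)
    if chat.flagged_policy_violation:
        return "up to 2 years"
    # Otherwise, a deleted chat is wiped from their systems within 30 days.
    return "wiped within 30 days"
```

For example, a deleted chat from a user who opted out of training, sent no feedback, and triggered no flag falls through to the 30-day wipe.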
When did I say this was ok?
When creating a new account and entering your email address for the first time, you must acknowledge their privacy policy.
The Cookie policy is accepted the first time the site is visited.
What you can do
- Opt out of model training: Claude will by default use your data to train their models, but this can and should be disabled. A guide on how to do this can be found here.
Grok and Grok Code
Grok is the LLM chatbot developed by xAI which was released in November 2023. Grok emphasizes providing objective and concise answers, prioritizing clarity and accuracy, particularly in applications targeting technical and scientific fields.
xAI trains on all data provided to Grok by default, and even if you have opted out of model training, it is unclear what xAI can still do under the “Conduct research to improve and develop services” section of their Privacy policy. Another interesting aspect is Grok’s integration with the social media platform X, which suggests that if Grok is used on the X platform, it also has access to all your account data and content on X.
This part is primarily based on xAI’s Privacy policy (found here) and their EU Privacy Policy Addendum (found here).
Personal data collected
- Account info: Full name, email, account credentials, birthdate, payment information, “login with third party providers”, address if consent is given
- User content: Prompts, uploaded content, recorded audio, “other material”, feedback
- Device info: Device type, operating system, browser info, browser plugins
- Location info: General location, GPS location if consent is given, “location info from third party services used” if consent is given
- Log data: IP address, country, “how you use and interact with the service”
- Social Media data when interacting with social media pages
- Cookies & similar technologies
Note: xAI explicitly instructs users NOT to include personal information in prompts and inputs to their Service, as this data may be reproduced in the output.
How your data is used
- Train models: While opt-out is available, your content and interactions with Grok will by default be used to train their models
- Conduct research to “improve and develop” services: All Account data, User content, Device info, and Log data may be used to conduct research to improve their product
- Identify usage trends
- Identify new customers
- Data analysis
- Communication to customers
- To detect fraud, illegal activity, or misuse of services
- To comply with legal obligations
When and to whom your data is shared
- Service providers: Including providers of hosting, cloud, analytics, content delivery, support and safety monitoring, payment and transaction, and other “technology services”. Parts of the prompt are shared with Brave Software so that Brave’s search results can be included in the service. xAI has 10 sub-processors based all over the world.8
- To related companies: Your data is used to the extent necessary to fulfill a request you have submitted
- In connection with business transfers: Your personal data may be disclosed in case of a merger, financing, acquisition, bankruptcy, dissolution, transaction, or proceeding.
- For legal purposes: It is not stated to whom your personal data will be shared, only that it may be shared in the event of a legal case.
- To third-parties with which you interact or share information
How long your data is kept
- Your personal data is kept as long as there is “an ongoing legitimate business need to do so”
- Regarding chats, temporary chats and “deleted” chats are wiped after 30 days unless they need to be kept longer for legal, compliance, or safety purposes.
- In certain circumstances, your personal data is kept for legal reasons even after the account is deleted
When did I say this was ok?
When chatting with Grok without being signed in, you agree to xAI’s Terms of Service and Privacy Policy when sending your first chat.
When signing up, you must agree to xAI’s Terms of Service and Privacy Policy.
Cookie consent is never requested, even though cookies, session storage, and local storage are all used. According to xAI, this is because all of these technologies are deemed necessary, so consent is not needed under the GDPR.
What you can do
- Opt out of model training: By default, xAI uses your data to train their models. This page on X specifies that your inputs (including voice inputs and their transcriptions and translations), your results with Grok, your interactions, and public data are used to create generative AI models. The same page also explains how to disable model training.
GitHub Copilot Free and Pro
While GitHub Copilot was launched in 2021 primarily as a code-completion tool and AI pair programmer, it can be used like a regular LLM through the web interface, the CLI, your IDE of choice, or GitHub Desktop. Copilot works by letting developers choose between a variety of LLMs from various providers, such as OpenAI, Anthropic, Google, and xAI.
The interesting aspect of Copilot is that GitHub has to maintain a separate agreement with every provider, which can lead to varying usage and disclosure of personal data. While many of the providers have Zero Data Retention (ZDR) agreements with GitHub, some of them don’t (see below). Like Gemini with Google and Grok with X, Copilot has access to much of the data available on GitHub, both publicly available data and your GitHub account data. Among other things, Copilot’s LLM models get access to your repos (even private ones), commits, issues, PRs, and your account data. Additionally, the data of Copilot Free users will by default be used for model training conducted by GitHub itself.
This part is primarily based on the Github Copilot model hosting page (found here), the Terms for Github Copilot (found here) and the Github Copilot Trust Center (found here).
Personal data collected
- User content: Prompts (inputs for chat or code), uploaded content, “context”, feedback
- Log data: “product usage metrics”, system logs, error messages
- Account info: Full name, biography, specified location, specified company, specified hireable status, follower list, following list
- Code: Read repo contents, commits, issues, PRs
Note: This does not include the personal data Github (or its parent company Microsoft) collects in general, which likely extends the list with more Account info, Device info, Location info, etc. Information about the general personal data collected can be found here.
How your data is used
- Train models: While opt-out is available, the personal data of GitHub Copilot Free users may be used for “AI model training if permitted”. This concerns the AI model training done by GitHub itself, whereas the data processing of each model provider varies:
- For OpenAI models (except Raptor mini): GitHub has a Zero Data Retention (ZDR) agreement with OpenAI, which means models are not trained on personal data and the data processing “follows the OpenAI Enterprise Privacy commitments”.
- For Anthropic models: In addition to Anthropic’s own servers, the models are hosted on AWS and Google Cloud. GitHub has provider agreements with all three providers, ensuring no prompt logging and no personal data used for training.
- For Google models: Prompts and metadata are sent to Google, but a provider agreement ensures that no prompts or outputs are used to train its models.
- For xAI models: GitHub has a ZDR agreement with xAI, ensuring no prompts or metadata are saved and no data is used for model training.
- Provide and maintain services
- Deliver personalized experiences and recommendations
- To detect fraud, illegal activity, or misuse of services
- For virus scanning
- To comply with legal obligations
Note: This does not include the processing of your data done by Github in general. Information about why Github collects data across all of their services can be found here.
When and to whom your data is shared
- Service providers, affiliates, and corporate partners: Microsoft, GitHub subsidiaries, and third parties acting as sub-processors. GitHub has 7 subsidiaries and 22 sub-processors based all over the world.9
Note: Github may still disclose your data to parties and in situations outlined in their general Privacy Policy. You can find the cases of disclosure of your personal data in general here.
Retention of personal data
- GitHub Copilot in the code editor does not retain any prompts (like code or other context used for the purposes of providing suggestions) for training the foundational LLMs. Prompts are discarded once a suggestion is returned. GitHub Copilot Individual subscribers can opt out of sharing their prompts with GitHub, which will otherwise be used to fine-tune GitHub’s foundational model.
- GitHub Copilot outside the code editor (including CLI, Mobile, and GitHub Copilot Chat on GitHub.com) will retain prompts, suggestions, and responses to preserve conversation history and provide continuity across page navigation events and browser sessions, but will not retain prompts, suggestions, or responses for training GitHub’s foundational model.10
- Feedback data is retained as long as necessary.
When did I say this was ok?
When starting a new chat session with a model that wasn’t used before, you must agree to the “Hosting of models for Github Copilot Chat” information.
What you can do
- Opt out of GitHub using data for product improvements: If left on, this allows GitHub, its affiliates, and third parties to use your data (including prompts, suggestions, and code snippets) for product improvement. More information about turning this off can be found here.
- Opt out of model training: If left on, this allows GitHub, its affiliates, and third parties to use your data (including prompts, suggestions, and code snippets) for AI model training. More information can be found here.
References
- M. Landwehr, “I analyzed LLM market share across countries. Here are my biggest surprises”, LinkedIn, 2025. [Online]. Available: https://www.linkedin.com/posts/landwehr_i-analyzed-llm-market-share-across-countries-activity-7326607933671186432-W5hl. Accessed: Dec. 28, 2025. ↩︎
- OpenAI, “How ChatGPT and our foundation models are developed”, OpenAI, 2025. [Online]. Available: https://openai.com/policies/how-chatgpt-and-our-foundation-models-are-developed/. Accessed: Dec. 29, 2025. ↩︎
- OpenAI, “How your data is used to improve model performance”, OpenAI, 2025. [Online]. Available: https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance. Accessed: Dec. 29, 2025. ↩︎
- OpenAI, “OpenAI Sub-processor List”, OpenAI, 2025. [Online]. Available: https://openai.com/policies/sub-processor-list/. Accessed: Dec. 29, 2025. ↩︎
- OpenAI, “Chat and file retention policies in ChatGPT”, OpenAI, 2026. [Online]. Available: https://help.openai.com/en/articles/8983778-chat-and-file-retention-policies-in-chatgpt. Accessed: Dec. 29, 2025. ↩︎
- Anthropic, “Subprocessors,” Anthropic, 2026. [Online]. Available: https://trust.anthropic.com/subprocessors. Accessed: Jan. 6, 2026. ↩︎
- Anthropic, “How long do you store my data”, Anthropic, 2026. [Online]. Available: https://privacy.claude.com/en/articles/10023548-how-long-do-you-store-my-data. Accessed: Jan. 7, 2026. ↩︎
- xAI, “xAI Subprocessor List”, xAI, 2026. [Online]. Available: https://x.ai/legal/subprocessor-list. Accessed: Jan. 17, 2026. ↩︎
- GitHub, “GitHub Subprocessors”, GitHub, 2026. [Online]. Available: https://docs.github.com/en/site-policy/privacy-policies/github-subprocessors. Accessed: Feb. 5, 2026. ↩︎
- GitHub, “How GitHub Copilot handles data,” GitHub, 2026. [Online]. Available: https://resources.github.com/learn/pathways/copilot/essentials/how-github-copilot-handles-data/. Accessed: Feb. 5, 2026. ↩︎

