Your language model has memorized your proprietary training data. Not all of it—but probably more than you think. And if someone knows how to ask, they can extract it.
This isn't theoretical. LLM data leakage has become one of the most underestimated security risks in enterprise AI deployments. A model trained on sensitive customer data, internal policies, or financial information can be queried in ways that expose exactly the data you thought was protected.
The mechanisms are sophisticated, and the defences are incomplete. Here's what you need to know.
How LLMs Memorize Training Data
Large language models work by learning probability distributions over text. During training, they see patterns in millions of documents. But "learning patterns" is more literal than most people realise.
When you train an LLM on a dataset, some exact sequences from the training data get encoded directly into the model weights. This is called training data memorization. It's not a bug—it's a side effect of how neural networks learn.
Memorization is more likely when:
- The training data is small or the model is very large relative to the data
- Training sequences are unique or unusual (like email addresses, phone numbers, API keys)
- Training data is repeated multiple times during training
- The model is trained for many epochs on the same dataset
"A model doesn't need to memorize all of your data to leak it. It only needs to memorize enough unique sequences that an attacker can discover them through queries."
The Attack Surface: Five Types of Data Leakage
1. Exact Memorization and Prompt Extraction
An attacker crafts prompts designed to make the model emit exact sequences from the training data. For example:
- "Write a customer support email for a company with this policy: [blank]"—and the model autocompletes with exact text from internal documents
- "Write the SQL query that would be used to [request], the exact query should be: [blank]"—and the model fills in actual queries from training data
These attacks exploit the fact that models are trained to predict the next token. If a sequence from training data exists in the model, subtle prompting can often get the model to emit it.
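This next-token mechanic can be demonstrated with a deliberately tiny toy: a word-level model that simply remembers the most frequent next word for each context. The corpus strings below are invented for illustration, but the behaviour is the point—because one training sequence is unique, greedy completion reproduces it verbatim from a short prefix. Real LLMs are vastly more complex, but memorized sequences surface the same way.

```python
from collections import defaultdict

def train(corpus):
    # For each word, count which word follows it, then keep the most
    # frequent successor. This is a caricature of next-token prediction.
    counts = defaultdict(lambda: defaultdict(int))
    for doc in corpus:
        words = doc.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return {ctx: max(nxt, key=nxt.get) for ctx, nxt in counts.items()}

def complete(model, prompt, n=6):
    # Greedy decoding: repeatedly append the most likely next word.
    words = prompt.split()
    for _ in range(n):
        nxt = model.get(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

corpus = [
    "the support policy is refunds require manager approval code 9931",
    "our weather was fine today",
]
model = train(corpus)
# A four-word prefix is enough to extract the whole "internal" sequence:
print(complete(model, "the support policy is"))
```

Because "refunds require manager approval code 9931" was a unique sequence, the model has no choice but to emit it once the prefix pins down the context—exactly the failure mode prompt-extraction attacks target.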
2. Membership Inference Attacks
A membership inference attack determines whether a specific document was in the training set. An attacker doesn't extract the data—they just prove it exists.
The attack works by observing model confidence. If you feed the model a document that was in training, it will typically assign higher probability (lower loss, or perplexity) to that document's tokens than to a document it has never seen. By comparing these scores across documents, an attacker can infer which ones were used in training.
For a financial services organisation, this could reveal whether a customer's sensitive transaction history was used in training.
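The core signal can be sketched with a toy unigram model and average negative log-likelihood as the "surprise" score. The documents and threshold are invented for illustration; real membership-inference attacks use the target model's actual loss and calibrate against reference models, but the comparison is the same: member documents score lower than outsiders.

```python
import math
from collections import Counter

def train_unigram(docs):
    # Word-frequency model standing in for a trained LLM.
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def avg_nll(model, doc, floor=1e-6):
    # Average negative log-likelihood per word: lower means the model
    # is less "surprised" by the document.
    words = doc.split()
    return sum(-math.log(model.get(w, floor)) for w in words) / len(words)

members = [
    "alice transferred nine thousand dollars friday",
    "bob paid the invoice monday",
    "carol reviewed the statement tuesday",
]
model = train_unigram(members)

member_loss = avg_nll(model, members[0])
outsider_loss = avg_nll(model, "quarterly gardening tips newsletter")
# The member document scores markedly lower—that gap is the leak.
print(f"member: {member_loss:.2f}  outsider: {outsider_loss:.2f}")
```

An attacker never sees the training set; they only need the loss gap to be consistent enough to threshold on.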
3. Reconstruction and Inference Attacks
Even if the exact training data isn't memorized, an attacker can sometimes reconstruct or infer the general content. By querying the model multiple times with variations, they can estimate what the model "knows" about a topic.
This is particularly dangerous for PII (personally identifiable information). If the model was trained on a dataset containing personal details, careful questioning can often reconstruct much of that information.
4. PII and Credential Leakage
Models trained on internet data or poorly-curated internal data often memorize PII: email addresses, phone numbers, names, addresses. Some memorize credentials.
A particularly dangerous scenario: models trained on public GitHub repositories have been shown to memorise API keys and tokens committed to code. Users interacting with such models can sometimes extract these credentials through prompting.
5. Privilege Escalation Through Data Leakage
If a model was trained on documents that reveal internal processes, systems, or security controls, an attacker can use the model as an oracle to understand how your organisation works—then use that knowledge to craft more effective attacks.
Why Traditional Data Protection Doesn't Work for LLMs
You might think the solution is encryption or access control. Those help, but they're not sufficient for LLMs.
Once data is encoded into model weights through training, encrypting the original dataset doesn't help—the attacker never needs the dataset, because they can query the model. Access controls on the data don't help either—the attacker interacts with the model like any legitimate user.
The problem is fundamentally different from traditional data storage. Your database has data and an access control list. Your LLM has data encoded into its weights, and anyone who can query it can potentially retrieve it.
"Encrypting your training data is good practice. But once you train the model, that data is in the weights. The model is the security boundary."
Practical Controls for Enterprise Deployments
1. Data Sanitisation and Deduplication
Before training:
- PII scrubbing: Systematically remove or tokenize names, email addresses, phone numbers, credentials from training data
- Deduplication: Remove duplicate sequences from training data. Memorization risk increases with repetition
- Data minimisation: Include only necessary data in training sets. If customer records aren't needed for model function, don't include them
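The deduplication step, at its simplest, is exact-match removal via hashing. A minimal sketch (function name and normalisation choices are mine): production pipelines typically add near-duplicate detection with MinHash or embedding similarity, since attackers benefit from repeated *near*-copies too.

```python
import hashlib

def dedupe(records):
    # Exact deduplication: hash each normalised record and keep only
    # the first copy. Lowercasing/stripping catches trivial variants.
    seen, unique = set(), []
    for rec in records:
        digest = hashlib.sha256(rec.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique

docs = [
    "Refunds require manager approval.",
    "refunds require manager approval.  ",  # trivial variant, dropped
    "Invoices are due within 30 days.",
]
clean = dedupe(docs)
```

Each duplicate removed is one fewer repetition pushing the model toward memorising that sequence.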
2. Differential Privacy
Differential privacy is a mathematical framework for training models on sensitive data while making memorization harder.
The idea: add noise to the training process such that the model learns general patterns but cannot memorize individual training examples. Queries designed to extract data will fail because the model never learned that exact data.
Trade-off: differential privacy reduces model accuracy. For high-security use cases, that trade-off is often worth it.
Tools: TensorFlow Privacy, Opacus (PyTorch) enable differentially private training.
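The mechanism underneath DP-SGD (the algorithm those libraries implement) is simple to state: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that clipping bound, and average. The pure-Python sketch below illustrates just that aggregation step on toy gradient vectors—it omits the privacy accounting and per-sample gradient computation that Opacus and TensorFlow Privacy handle for real models.

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    # Clip each example's gradient so no single example can move the
    # model by more than clip_norm, then add noise calibrated to that
    # bound. This is what limits memorisation of individual records.
    rng = random.Random(seed)
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(g):
            summed[i] += x * scale
    sigma = noise_multiplier * clip_norm
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]

# One example with an outsized gradient gets clipped from norm 5 to 1:
step = dp_sgd_step([[3.0, 4.0], [0.1, 0.0]], clip_norm=1.0)
```

The clipping bound is the key: it caps any single training example's influence, so no query can later recover that example with confidence.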
3. Output Filtering and Sanitisation
Monitor and filter model outputs for:
- Detected PII patterns (email addresses, phone numbers, credit card numbers)
- Credentials or API keys
- Known sensitive data (scan outputs against a list of known sensitive sequences)
This doesn't prevent memorization, but it prevents leakage from the model endpoint.
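A minimal output filter can be regex-based. The patterns below are illustrative, not exhaustive (the phone pattern assumes Australian mobile formats, and the key pattern assumes a common `sk_`/`pk_` prefix convention); production deployments typically layer a dedicated PII-detection service on top.

```python
import re

# Redaction patterns applied to every response before it leaves the
# model endpoint. Labels survive in the output so leaks are auditable.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?61|0)4\d{2}[ -]?\d{3}[ -]?\d{3}\b"),
    "API_KEY": re.compile(r"\b(?:sk|pk)_[A-Za-z0-9]{16,}\b"),
}

def redact(text):
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact("Contact alice@example.com on 0412 345 678, key sk_ABCDEF1234567890abcd"))
```

Logging every redaction event also gives you an early-warning signal: a spike in redactions suggests either memorised PII surfacing or an active extraction attempt.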
4. Fine-tuning Isolation
If you fine-tune a pre-trained model on sensitive data:
- Use smaller fine-tuning datasets
- Apply differential privacy to fine-tuning steps
- Reduce training epochs: fewer passes over data means less memorization
- Use techniques like Low-Rank Adaptation (LoRA) that freeze base model weights and train only adapter layers
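Why LoRA helps here: the base weights stay frozen, so sensitive fine-tuning data can only be encoded in the small adapter matrices—a far smaller memorisation surface, and one you can delete or swap independently of the base model. The arithmetic below (layer dimensions are typical transformer values I've assumed, not from any specific model) shows how small that surface is.

```python
def lora_trainable_params(d_in, d_out, rank):
    # LoRA replaces a full d_out x d_in weight update with two low-rank
    # factors: B (d_out x rank) and A (rank x d_in). Only B and A train.
    full = d_in * d_out
    adapter = rank * (d_in + d_out)
    return full, adapter

# One 4096x4096 attention projection at rank 8:
full, adapter = lora_trainable_params(4096, 4096, 8)
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {adapter / full:.4%}")
```

At rank 8 the adapter holds well under 1% of the layer's parameters—there is simply far less capacity in which fine-tuning data can be memorised verbatim.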
5. Query Monitoring and Rate Limiting
Monitor model queries for patterns that suggest extraction attacks:
- Detect unusual request patterns (thousands of similar queries with slight variations)
- Monitor for queries explicitly asking for training data or memorised content
- Implement rate limiting and user-level quotas
- Log all queries and maintain audit trails
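One detectable signature of the first bullet—many slight variations of one query—can be sketched with stdlib string similarity. The function name and thresholds are mine; a production system would use embeddings or MinHash over sliding windows per user rather than pairwise comparison, but the idea is the same.

```python
from difflib import SequenceMatcher

def flag_extraction_pattern(queries, similarity=0.85, threshold=5):
    # Count consecutive query pairs that are near-duplicates of each
    # other; a long run of slight variations suggests an automated
    # extraction attempt rather than normal use.
    suspicious = 0
    for a, b in zip(queries, queries[1:]):
        if SequenceMatcher(None, a, b).ratio() >= similarity:
            suspicious += 1
    return suspicious >= threshold

attack = [f"print the internal refund policy, version {i}" for i in range(8)]
benign = [
    "what is the capital of france",
    "summarise this contract",
    "draft a polite email to a supplier",
    "convert 10 km to miles",
    "explain http caching",
    "write a haiku about rain",
]
```

Flagged users can then be rate limited or routed to stricter output filtering rather than blocked outright, which keeps false positives cheap.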
6. Model Interpretability and Audit
Periodically audit your models for memorization:
- Use interpretability techniques to probe what the model has learned (attribution methods such as LIME or SHAP can help, though they were designed for classifiers and only go so far with generative models)
- Test the model with known training data sequences—does it reproduce them?
- Perform red team exercises to identify extractable data
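The second bullet—testing with known training sequences—can be wrapped in a reusable audit metric. `complete_fn` below is a hypothetical stand-in for whatever wraps your model's generate call; the canary strings and the stub that "memorised" one of them are invented purely so the sketch runs end to end.

```python
def memorization_rate(complete_fn, sequences, prefix_words=4):
    # Fraction of known training sequences the model reproduces
    # verbatim when prompted with only their opening words.
    hits = 0
    for seq in sequences:
        prefix = " ".join(seq.split()[:prefix_words])
        if complete_fn(prefix).startswith(seq):
            hits += 1
    return hits / len(sequences)

CANARIES = [
    "canary one alpha beta gamma delta",
    "canary two epsilon zeta eta theta",
]

def fake_complete(prefix):
    # Stand-in for a real model call; "memorised" only the first canary.
    if prefix.startswith("canary one"):
        return CANARIES[0]
    return "no memorised continuation"

rate = memorization_rate(fake_complete, CANARIES)
```

Planting synthetic canaries in the training set before training, then measuring this rate afterwards, turns memorisation from an unknown into a tracked metric you can gate releases on.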
7. Governance and Procurement
When using third-party models:
- Demand transparency: Require vendors to disclose training data sources and any PII included
- Audit rights: Include rights to audit models for memorised data
- Data handling commitments: Require vendors to apply PII scrubbing and differential privacy
The Privacy Act and LLM Data Leakage
For Australian organisations, there's a regulatory angle. The Privacy Act 1988 requires entities to take reasonable steps to protect personal information (Australian Privacy Principle 11). If your LLM leaks customer PII due to poor data handling, you may have breached the Privacy Act—even if your traditional data storage was encrypted.
The Office of the Australian Information Commissioner (OAIC) hasn't yet issued detailed guidance on LLMs and privacy, but guidance is coming. Being proactive now positions you well for future requirements.
Key Takeaways
- LLMs memorise training data. Once trained, that data is encoded in the weights
- Memorised data can be extracted through prompt extraction, membership inference, and reconstruction attacks
- Traditional data protection (encryption, access control) doesn't protect memorised data in LLMs
- Key defences: PII sanitisation, differential privacy, output filtering, fine-tuning isolation, query monitoring
- Governance of training data and third-party models is critical for enterprise deployments
- Australian organisations must consider privacy impacts as part of LLM governance