Your customer service chatbot is suddenly recommending competitors' products to users. Or your fraud detection model is systematically missing certain fraud patterns. Or your medical imaging model is producing suspicious edge-case errors.
You have an AI incident. Now what?
Your traditional incident response playbook is useless. It's built for discrete events—a server was hacked, data was exfiltrated, a system crashed. AI incidents are subtler. A model might be compromised yet still functioning, still producing outputs, still passing basic tests. The compromise might be selective: only affecting certain classes of inputs or certain decision boundaries.
You need an AI-specific incident response capability. Here's how to build one.
Why Traditional IR Fails for AI
Traditional incident response assumes:
- Binary states: System is either compromised or not. For AI, it's probabilistic.
- Clear attribution: We can trace what happened and when. AI systems don't leave clean forensic trails.
- Quick isolation: Take the system offline and stop the damage. For AI, taking the system offline might itself be the harm.
- Fast recovery: Restore from backup. A model checkpoint can be restored, but if the training data or pipeline was poisoned, the "backup" may carry the same compromise.
You need a fundamentally different approach.
The Five Phases of AI Incident Response
Phase 1: Detection (Minutes 0-15)
Everything starts with noticing something's wrong. But what does "wrong" look like for AI?
Detection mechanisms:
- Statistical anomalies: Output distributions shift beyond expected ranges. Confidence scores drop significantly. Error rates spike.
- Business logic violations: Model recommends products it shouldn't. Fraud detector misses patterns. Medical model produces implausible diagnoses.
- User reports: Customers notice weird behaviour. Internal stakeholders flag unexpected decisions.
- Automated monitoring: Continuous testing against known test cases. Outputs compared to baseline behaviour.
- Adversarial inputs: Proactive red-teaming identifies model weaknesses before attackers do.
For rapid detection, you need real-time monitoring dashboards that show:
- Model output distributions vs baseline
- Confidence scores and prediction intervals
- Latency and resource usage
- Error rates on test suites
- Data drift detection
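To make "output distributions shift beyond expected ranges" concrete, here is a minimal drift check using the population stability index (PSI), a common convention for comparing a live score distribution against a baseline. The bucket count and the 0.2 alert threshold are illustrative defaults, not prescriptions from this article:

```python
import math

def population_stability_index(baseline, current, bins=10, lo=0.0, hi=1.0):
    """Compare two score distributions; PSI above ~0.2 usually signals meaningful drift."""
    def bucket(scores):
        counts = [0] * bins
        width = (hi - lo) / bins
        for s in scores:
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1
        total = len(scores)
        # Smooth empty buckets to avoid log(0) below.
        return [max(c / total, 1e-6) for c in counts]

    b, c = bucket(baseline), bucket(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

def check_drift(baseline, current, threshold=0.2):
    """Return (psi, alert) for a batch of recent model confidence scores."""
    psi = population_stability_index(baseline, current)
    return psi, psi > threshold
```

Run this on a rolling window of recent confidence scores and wire the alert into your paging system; the same pattern works for any scalar model output.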
Detection is your first line of defence. Without continuous monitoring, a compromised model can go unnoticed for weeks or months.
Phase 2: Immediate Containment (Minutes 15-30)
Once detected, you need to stop the bleeding immediately—without breaking the business.
Containment strategies:
- Shadow mode: Stop using model outputs for critical decisions. Switch to manual review or fallback system.
- Rate limiting: Reduce model query rate to slow potential damage.
- Input filtering: Block suspicious inputs that triggered the incident.
- Output quarantine: Review all recent outputs before they reach users.
- Graceful degradation: Reduce model confidence thresholds or use ensemble voting with older models.
Critically: do not take the system offline immediately. Understand what is happening first. A hasty shutdown of a system the business depends on can cause more damage than a slowly degrading model.
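Shadow mode, confidence-floor raising, and graceful degradation can all be expressed as one decision router that sits between the model and the business action. A minimal sketch (the `Containment` policy object and field names are hypothetical, invented for illustration):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Containment:
    shadow_mode: bool = False       # log model output but never act on it
    confidence_floor: float = 0.0   # raise this during an incident
    max_qps: Optional[int] = None   # rate limit, enforced by the serving layer

def route_decision(model_pred, confidence, fallback: Callable[[], str],
                   policy: Containment):
    """Return the decision the business should act on under the current policy.

    In shadow mode, or whenever confidence falls below the floor, defer to
    the fallback path (manual review, or an older trusted model).
    """
    if policy.shadow_mode or confidence < policy.confidence_floor:
        return fallback()
    return model_pred
```

Because the policy is data rather than code, containment becomes a configuration change you can apply in minutes, without a redeploy.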
Phase 3: Investigation and Diagnostics (Minutes 30-Hours)
Now you investigate: what happened, how did it happen, how long has it been happening?
Diagnostic steps:
- Timeline reconstruction: When did behaviour change? Correlate with deployments, data updates, or infrastructure changes.
- Input analysis: What inputs trigger anomalous outputs? Are there patterns?
- Model introspection: Use explainability techniques (LIME, SHAP) to understand which features the model is using and how they've changed.
- Data integrity checks: Has training data been poisoned? Is input data corrupted?
- Dependency analysis: Did a vulnerability in a dependency cause the issue?
- Access logs: Who deployed changes? Who accessed the model files or training data?
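Timeline reconstruction is often the highest-value diagnostic step, and it can be partly automated: find the first metric point that breaks from baseline, then find the most recent change event before it. A sketch under simple assumptions (three-sigma onset detection; the event lists are illustrative):

```python
from datetime import datetime

def first_anomaly(metrics, baseline_mean, baseline_std, z=3.0):
    """metrics: list of (timestamp, error_rate) in time order.

    Return the timestamp of the first point beyond z standard deviations
    from the baseline, or None if behaviour never broke from baseline.
    """
    for ts, value in metrics:
        if abs(value - baseline_mean) > z * baseline_std:
            return ts
    return None

def likely_trigger(anomaly_ts, changes):
    """changes: list of (timestamp, description) for deploys, data updates, etc.

    Return the most recent change at or before the anomaly onset -- the
    prime suspect to investigate first.
    """
    prior = [c for c in changes if c[0] <= anomaly_ts]
    return max(prior, key=lambda c: c[0]) if prior else None
```

This does not prove causation; it orders your investigation so the most plausible trigger is examined first.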
Phase 4: Forensic Analysis (Hours-Days)
Deep dive into what happened.
Forensic techniques:
- Model comparison: Compare current model weights to last known-good version. Where do they differ?
- Behaviour testing: Feed the model known test cases. Does it still pass them, or does it fail in ways the known-good version never did?
- Adversarial analysis: Use adversarial examples to understand if the model has been specifically poisoned.
- Data lineage tracking: Trace all data inputs. Was training data poisoned? Did a corrupted data pipeline feed the model?
- Confidence score analysis: Compromised models often have unusual confidence distributions. Analyse statistical signatures.
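The model-comparison step above can be sketched as a per-layer weight diff against the last known-good checkpoint. Targeted tampering often concentrates in a few layers, so a per-layer L2 distance narrows the search; the dict-of-lists weight format here is a simplification for illustration (in practice you would diff framework-native state dicts):

```python
import math

def layer_diff_report(good, current, tol=1e-6):
    """good/current: dict mapping layer name -> flat list of weights.

    Return only the layers whose L2 distance from the known-good
    checkpoint exceeds tol -- the places to inspect first.
    """
    report = {}
    for name in good:
        a, b = good[name], current.get(name, [])
        if len(a) != len(b):
            # A shape change with no corresponding code change is
            # itself a red flag; surface it unconditionally.
            report[name] = float("inf")
            continue
        report[name] = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return {n: d for n, d in report.items() if d > tol}
```

A legitimate retrain changes nearly every layer a little; a suspicious artefact may change a handful of layers a lot. The shape of the diff is evidence either way.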
Document everything. You'll need this for post-incident review and potentially regulatory reporting.
Phase 5: Recovery and Remediation (Days-Weeks)
Fix the problem and get the model back to trusted operation.
Recovery approaches:
- Model rollback: Restore from last verified-good version. This requires maintaining version history and checksums.
- Controlled retraining: Retrain on clean data with added safety measures (differential privacy, poisoning detection).
- Fine-tuning from baseline: If rollback isn't possible, start from a trusted baseline model and carefully re-fine-tune.
- Ensemble approaches: Combine multiple models to reduce dependence on a single potentially-compromised model.
- Manual review: For critical decisions, require human oversight during recovery period.
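Rollback is only as trustworthy as your ability to prove the artifact you are restoring is the one you verified at deploy time. A minimal sketch of checksum-gated rollback (the manifest format is an assumption for illustration; any deploy-time record of version plus digest works):

```python
import hashlib

def sha256_of(path):
    """Stream-hash a model artifact so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_and_rollback(candidate_path, manifest):
    """manifest: {"version": ..., "sha256": ...} recorded when the version shipped.

    Refuse to roll back to an artifact whose digest no longer matches
    the deploy-time record -- it may have been tampered with since.
    """
    digest = sha256_of(candidate_path)
    if digest != manifest["sha256"]:
        raise ValueError(
            f"checksum mismatch for {manifest['version']}: refusing rollback")
    return candidate_path  # safe to load and serve this artifact
```

The discipline matters more than the code: record digests at deploy time, every time, or there is nothing to verify against during an incident.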
Building Your AI IR Playbook
A good playbook has:
- Decision trees: If X happens, do Y. Clear escalation paths.
- Contact lists: Who do you call? Include model owners, security team, business stakeholders, executives.
- Checklist for each phase: What to check, what to document, what to communicate.
- Communication templates: How to notify customers, regulators, board members.
- Technical runbooks: Step-by-step commands for containment, diagnostics, recovery.
- Post-incident review process: How to learn and improve for next time.
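Decision trees and contact lists work best when they are machine-readable, so the on-call engineer queries them rather than re-reads a wiki at 3 a.m. A sketch of a playbook as data (the incident types, actions, and role names below are invented examples, not a recommended taxonomy):

```python
# Each entry: first containment action, who owns escalation, who to notify.
PLAYBOOK = {
    "output_drift": {
        "first_action": "enable shadow mode",
        "escalate_to": "ml-oncall",
        "notify": ["model owner", "security team"],
    },
    "suspected_poisoning": {
        "first_action": "freeze training pipeline",
        "escalate_to": "security-incident-commander",
        "notify": ["model owner", "security team", "legal"],
    },
}

def next_step(incident_type):
    """Look up the first action for an incident type, with a safe default."""
    entry = PLAYBOOK.get(incident_type)
    return entry["first_action"] if entry else "escalate to default on-call"
```

Keeping the playbook in version control also gives you an audit trail of how your response process evolved between incidents.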
Critical Infrastructure Considerations
If your AI system controls critical infrastructure (power grids, water systems, transportation), incident response must include:
- Override capabilities: ability for human operators to take manual control immediately
- Degraded operation modes: what happens if the model is unavailable?
- Regulatory notification: the Security of Critical Infrastructure (SOCI) Act (once enforced) may require rapid notification of incidents
- Coordination with authorities: ASD, ACSC, relevant regulators
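The override requirement above reduces to a simple invariant: once a human operator engages the override, no automated decision proceeds until a human releases it. A minimal thread-safe sketch (class and method names are illustrative):

```python
import threading

class OverrideSwitch:
    """Human-operated override gate for automated control decisions.

    Engaging the switch must be instantaneous and must not depend on the
    model, the serving stack, or any component that might itself be
    compromised during the incident.
    """
    def __init__(self):
        self._engaged = threading.Event()

    def engage(self):
        self._engaged.set()      # operator takes manual control

    def release(self):
        self._engaged.clear()    # automated operation may resume

    def automated_allowed(self) -> bool:
        return not self._engaged.is_set()
```

Every automated control path checks `automated_allowed()` before acting; the degraded operation mode defines what runs in its place while the switch is engaged.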
Key Takeaways
- AI incidents are subtle. Continuous monitoring is essential for detection.
- Containment must balance rapid action with avoiding over-reaction.
- Investigation and forensics for AI require specialised skills and tools.
- Model recovery differs from traditional system recovery—versioning and rollback are critical.
- Building an AI IR playbook now prevents chaos when an incident occurs.
- For critical systems, override and manual operation capabilities are non-negotiable.