A language model that predicts customer churn is not autonomous. It gives you a probability, you decide whether to reach out. You control the action.
An AI agent that autonomously sends emails to customers identified as churning is very different. The agent decides what to say, to whom, and when. It executes actions without human approval. That's agentic AI, and it's a completely different threat model.
Agentic AI is coming fast. Systems like AutoGPT, BabyAGI, and ReAct are already showing what autonomous agents can do. By 2027, autonomous agents will be mainstream in enterprise. And the security industry is vastly underprepared.
What Makes Agentic AI Different
Traditional AI systems are reactive: given input, produce output. The human decides what to do with the output.
Agentic AI systems are autonomous: given a goal, the agent decides what to do, executes actions, observes results, and refines its approach. The agent acts in the world without human approval for each action.
This fundamentally changes the security threat model:
- Higher blast radius: A faulty recommendation model might give bad advice. A faulty autonomous agent might execute bad advice across thousands of transactions.
- Harder to contain: You can take a chatbot offline. An agent that's autonomously taking actions is harder to stop mid-operation.
- New attack vectors: An attacker can manipulate an agent into making decisions that serve the attacker's interests.
- Unintended actions: Even without external attacks, an autonomous agent might make surprising decisions that were never tested.
"With agentic AI, you're not just controlling a model's outputs. You're controlling an entity that can act. That requires a completely different security approach."
The Tool-Use Problem
Agents achieve autonomy by using tools. An agent might have access to APIs for sending emails, querying databases, making payments, or modifying infrastructure.
This is powerful. But it's also dangerous.
Tool Misuse
An agent tasked with "reduce infrastructure costs" might decide to delete backups or disable security monitoring. Technically, it's using the available tools. But the outcome is catastrophic.
Prompt Injection via Tool Results
Tools return results. These results go back to the agent's context. If those results contain instructions (malicious or unintended), the agent might follow them.
Example: An agent queries a database and gets a result: "DELETE ALL LOGS". The agent, reading the instruction in the result, might execute it.
Privilege Escalation
An agent starts with limited privileges: read-only access to certain databases, ability to send emails, etc. But through clever use of available tools, it might be able to escalate privileges. It queries a tool that reveals an API key. It uses that key to access a more privileged tool. It escalates further.
Supply Chain Attacks via Tools
An agent uses third-party tools: APIs, external services, plugins. An attacker compromises one of these tools. The agent unknowingly uses the compromised tool and becomes a vector for the attack across your entire organisation.
Multi-Agent Orchestration Risks
A single agent is hard to secure. Multiple agents coordinating with each other is exponentially harder.
In multi-agent systems, agents communicate with each other. This communication is another attack surface.
- Agent spoofing: One agent claims to be another and tricks a third agent into doing something.
- Message tampering: An attacker intercepts and modifies communications between agents.
- Coordination attacks: Multiple compromised agents coordinate to cause harm larger than any single agent could.
- Cascading failures: One agent fails and causes a cascade of failures in dependent agents.
Permission Scoping and Least Privilege
The traditional least-privilege principle is critical for agents, but much harder to implement.
For a human user, you can assign specific roles: "customer service representative can view customer data but not modify it." The permission boundary is clear.
For an agent, permissions are less clear. An agent tasked with "resolve customer issues" might need to:
- Query customer accounts (read)
- Initiate refunds (write, financial impact)
- Send emails (communication)
- Log actions (write to audit logs)
- Query transaction history (read)
Is there a tool that gives it permission to do all of these safely? Or does it need separate access to four different systems? And if an agent is compromised, how do you limit its damage? Can you grant it temporary access that expires? Can you audit what it did?
Building Guardrails for Agentic AI
These are the technical controls needed for safe autonomous agents in production.
1. Action Approval Workflows
For critical actions (delete, transfer funds, modify infrastructure), require human approval before execution.
- Agent decides on an action
- Action is queued for human review
- Human reviews and approves/rejects
- Only after approval does the agent execute
This is slower than fully autonomous operation but much safer.
2. Tool Sandboxing
Run tools in isolated environments with limited access to internal systems.
- API calls to external services in a controlled gateway
- Database access through a proxy with query filtering
- File system access through a controlled interface
3. Intent Verification
Before an agent executes an action, verify that the action aligns with the original goal.
- Agent's goal: "Reduce customer complaints"
- Proposed action: "Disable all email monitoring"
- Intent check: "This action doesn't align with the goal. Reject."
4. Action Auditing and Rollback
Log every action the agent takes. Be able to audit and potentially roll back.
- Immutable audit logs of agent actions
- Ability to query "what did the agent do between time X and Y?"
- Rollback capabilities for reversible actions
5. Tool Attestation and Monitoring
Know what tools the agent has access to and monitor their behaviour.
- Maintain inventory of all tools available to agents
- Verify tool integrity regularly (has the API been compromised?)
- Monitor tool behaviour: is an API endpoint returning malicious results?
6. Agent Isolation and Resource Limits
Run agents in isolated containers with strict resource limits.
- CPU, memory, and network quotas
- Timeout limits: agent has X seconds to complete a task, then stops
- Call limits: agent can make at most N API calls
7. Prompt Injection Detection
Monitor for attempts to inject instructions into tool results or agent context.
- Scan tool results for suspicious patterns ("DELETE", "EXECUTE", etc.)
- Use semantic analysis to detect implied instructions
- Treat tool results as untrusted data
Regulatory Implications
Autonomous agents that make decisions or take actions have regulatory implications.
If an autonomous agent makes a decision affecting a customer (refund, account closure, denial of service), regulators want to know:
- What was the agent's goal?
- How did it reach that decision?
- What data did it use?
- Was there human review?
- Can the decision be appealed or reversed?
For Australian organisations, this is coming with AI governance frameworks. Build auditability into autonomous agents now.
Key Takeaways
- Agentic AI introduces fundamentally new security risks compared to traditional AI
- Tool access is powerful but creates new attack vectors
- Prompt injection, privilege escalation, and cascading failures are real risks
- Least-privilege access is critical but difficult to implement for agents
- Action approval, tool sandboxing, and intent verification are essential guardrails
- Auditability is a must for regulatory compliance and incident investigation