
As AI systems become more autonomous and integrated into sensitive workflows, the risks they pose are no longer just theoretical. While many AI impact assessments focus on ethical principles like privacy, fairness, bias, and transparency, they don’t always capture the security dynamics that emerge when AI agents are deployed in the real world.
That’s where the concept of the “lethal trifecta for AI agents” – a term coined by Simon Willison – comes in. It articulates a practical method for identifying high risk agent architectures.
The trifecta refers to the convergence of three specific capabilities in an AI agent:
- Access to private data
- Exposure to untrusted content
- Ability to communicate externally
When all three are present, the agent becomes a potential vector for data leaks, manipulation, and real-world consequences. When used alongside a broader set of security practices, limiting agents from possessing the three capabilities of the trifecta can help ensure AI agents are safe, secure, and aligned with organisational intent.
Understanding the Trifecta
1. Access to Private Data
AI agents often need access to sensitive information – emails, documents, internal databases – to be useful. But this access also creates a risk of data exfiltration if the agent is compromised or manipulated.
2. Exposure to Untrusted Content
Agents that process external inputs – like user prompts, web content, or third-party APIs – are vulnerable to prompt injection and other adversarial attacks. These inputs can subtly or overtly alter the agent’s behaviour.
3. Ability to Communicate Externally
When an agent can send emails, make API calls, or post to social media, it gains the power to act on manipulated instructions – potentially leaking data, spreading misinformation, or triggering real-world actions.
Real-World Examples of the Trifecta in Action
Understanding how the trifecta manifests in real-world systems helps clarify its relevance. Here are several examples where all three elements converge:
AI-Powered Customer Support Agents
These agents often have access to sensitive customer data (billing info, account history), interact with unpredictable user inputs, and can send emails or initiate account actions. A prompt injection attack could trick the agent into leaking private data or performing unauthorised actions.
Autonomous Research Assistants
Tools that browse the web, summarise articles, and update internal knowledge bases are exposed to untrusted content and often have access to proprietary research or internal documentation. If they can also send updates or communicate with external APIs, they become vulnerable to misinformation propagation or data leakage.
AI Workflow Automation Tools
Agents that automate business processes – like invoice handling, HR onboarding, or legal document review – typically access private files, respond to user prompts, and interact with external services (e.g., cloud storage, email). A malicious input could redirect sensitive documents or trigger unintended actions.
Developer Copilots with Deployment Access
Some AI coding assistants can read private repositories, accept user-written prompts, and push code to production environments. If exposed to adversarial inputs, they could introduce vulnerabilities or leak proprietary code.
AI Agents in Healthcare Settings
Agents that assist with patient intake, diagnostics, or scheduling may access medical records, interact with patients or staff, and send prescriptions or referrals. A compromised agent could result in privacy violations or incorrect medical actions.
Applying the Trifecta in Practice
Designing Secure AI Solutions
Understanding the trifecta helps architects and developers proactively reduce risk. For example:
- Limit external communication capabilities in agents that handle sensitive data.
- Sanitise or isolate untrusted inputs before they reach core logic.
- Use fine-grained access controls to restrict what data an agent can access.
By identifying which elements of the trifecta are present, teams can design mitigations early – before deployment.
AI Impact Assessments
Incorporating the trifecta into AI risk assessments (e.g., during ISO 42001 implementations) ensures that security is evaluated alongside ethics. It helps assess:
- Whether the agent’s data access is justified and auditable.
- How resilient the system is to adversarial inputs.
- What safeguards exist to prevent unauthorised external actions.
This lens transforms assessments from checklists into meaningful evaluations of real-world risk.
Penetration Testing Against AI Systems
For red teams and penetration testers, the trifecta offers a roadmap for identifying exploitable vectors:
- Can the agent be tricked into leaking data via prompt injection?
- Can it be manipulated through untrusted content?
- Can it act on malicious instructions by sending emails or calling APIs?
Testing against these vectors simulates realistic attack scenarios and reveals vulnerabilities that traditional testing might miss.
Need assistance?
If you would like assistance with implementing ISO 42001, we'd love to chat. We support organisations Australia wide, with specialists in Brisbane and Toowoomba.