Das sieht nicht gut aus:
The latest findings from one of the largest public agent red-teaming studies reveal a concerning truth: Agentic AI remains highly vulnerable and hackable. Conducted by Gray Swan AI and the UK AI Security Institute from March to April 2025, the study exposed critical flaws in leading AI agents.
Key findings from the extensive research, which involved 22 top agents, 44 real-world scenarios, and 1.8 million attacks, include:
- A 100% policy failure rate among tested agents.
- A staggering 62,000 rule breaks identified during the attacks.
- High transferability of prompt injections, indicating widespread susceptibility.
- Little to no correlation between model size and robustness, challenging assumptions about larger models being inherently more secure.
The „bottom line“ is clear: Without new, agent-native defense mechanisms, the promise of AI autonomy significantly amplifies risk. This vulnerability is particularly pronounced through indirect prompt injections, which can occur via various mediums such as web interfaces, PDF documents, and integrated tools.