“We’re optimistic that a proactive, highly responsive rapid response loop can continue to materially reduce real-world risk over time,” the company said.
OpenAI’s approach to the problem is to use an AI-powered attacker of its own—essentially a bot trained through reinforcement learning to act like a hacker seeking ways to sneak malicious instructions to AI agents. The bot can test attacks in simulation, observe how the target AI would respond, then refine its approach and try again repeatedly.
“Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” OpenAI wrote. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”
However, some cybersecurity experts are skeptical that OpenAI’s approach can address the fundamental problem.
“What concerns me is that we’re trying to retrofit one of the most security-sensitive pieces of consumer software with a technology that’s still probabilistic, opaque, and easy to steer in subtle ways,” Charlie Eriksen, a security researcher at Aikido Security, told Fortune.
“Red-teaming and AI-based vulnerability hunting can catch obvious failures, but they don’t change the underlying dynamic. Until we have much clearer boundaries around what these systems are allowed to do and whose instructions they should listen to, it’s reasonable to be skeptical that the tradeoff makes sense for everyday users right now,” he said. “I think prompt injection will remain a long-term problem … You could even argue that this is a feature, not a bug.”
Security researchers also previously told Fortune that while a lot of cybersecurity risks were essentially a continuous cat-and-mouse game, the deep access that AI agents need—such as users’ passwords and permission to take actions on a user’s behalf—posed such a vulnerable threat opportunity it was unclear if their advantages were worth the risk.
George Chalhoub, assistant professor at UCL Interaction Centre, said that the risk is severe because prompt injection “collapses the boundary between the data and the instructions,” potentially turning an AI agent “from a helpful tool to a potential attack vector against the user” that could extract emails, steal personal data, or access passwords.
“That’s what makes AI browsers fundamentally risky,” Eriksen said. “We’re delegating authority to a system that wasn’t designed with strong isolation or a clear permission model. Traditional browsers treat the web as untrusted by default. Agentic browsers blur that line by allowing content to shape behavior, not just be displayed.”
OpenAI recommends users give agents specific instructions rather than providing broad access with vague directions like “take whatever action is needed.” The company also said Atlas is trained to get user confirmation before sending messages or making payments.
“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” OpenAI said in the blogpost.



