An AI agent that revealed sensitive data without being asked. An agent that overrode its own guardrails. Another that sent credentials to an attacker via Telegram because a reset had made it forget it wasn’t supposed to.
It’s no secret that AI agents have huge potential, balanced by equally big risks. What’s becoming apparent, however, is how quickly agentic systems can veer wildly off course and start exposing critical information under real-world conditions.
A look at just how easily this can happen comes from “Phishing the agent: Why AI guardrails aren’t enough,” a report from cloud identity and access management (IAM) company Okta Threat Intelligence on tests that uncovered all of the problems cited above, and more.
Okta’s research focused on OpenClaw, a model-agnostic, multi-channel AI assistant that has seen explosive growth inside enterprises since appearing in late 2025.
The Telegram hack
In common with the growing list of rival agents, OpenClaw is only as useful as the access it is given to files, accounts, browsers, network devices, and, most critical of all, credentials.
One test conducted by Okta assessed how easy it would be to trick OpenClaw running Claude Sonnet 4.6 into handing over an OAuth token. This shouldn’t be possible: the LLM should refuse such a request. However, what might have held true when prompting Claude as a chatbot quickly fell apart once the model was accessed through OpenClaw.
The test assumed that a user had given OpenClaw full access to their computer, that they regularly controlled the agent over Telegram, and that their Telegram account had been hijacked.
First, the attacker instructed the agent via Telegram to retrieve an OAuth token, but to display it only in a terminal window on the computer. Claude Sonnet’s guardrails would prevent it from copying the token. However, the testers were able to reset the agent, causing it to forget it had displayed the token in the terminal window.
At that point, Okta said in its writeup, “The agent was instructed to take a screenshot of the desktop, which included the token, and then drop the screenshot in the Telegram chat, which it did. Exfiltration accomplished.”
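Why does a reset defeat the guardrail? A minimal sketch, under the assumption (implied by the test) that the agent’s refusal lives only in its conversation history, while side effects such as a token sitting in an open terminal persist on the host:

```python
# Toy illustration (hypothetical, not OpenClaw's code): the 'guardrail' is
# conversational memory, but side effects on the host outlive a session reset.

class ToyAgent:
    def __init__(self):
        self.history: list[str] = []   # the model's only notion of context
        self.desktop: list[str] = []   # persistent host state, e.g. open windows

    def handle(self, request: str) -> str:
        # Guardrail: refuse to transmit anything the conversation marks sensitive.
        if "send" in request and any("SENSITIVE" in turn for turn in self.history):
            return "Refused: conversation context marks this as sensitive."
        if request == "show token in terminal":
            self.desktop.append("terminal: oauth-token-abc123")  # side effect persists
            self.history.append("SENSITIVE: token displayed locally only")
            return "Token displayed in terminal."
        if request == "send screenshot of desktop":
            return f"Sent screenshot containing: {self.desktop}"
        return "OK."

    def reset(self) -> None:
        self.history.clear()  # wipes the refusal context, but NOT the desktop


agent = ToyAgent()
print(agent.handle("show token in terminal"))      # token now on screen
print(agent.handle("send screenshot of desktop"))  # refused: history remembers
agent.reset()                                      # the attacker's reset
print(agent.handle("send screenshot of desktop"))  # exfiltration succeeds
```

Nothing here is OpenClaw’s actual code; the point is architectural. Any guardrail whose only memory is the session can be wiped along with the session, while the host state it was protecting remains.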
Agent-in-the-middle
Agentic AI is really two things: a powerful orchestration system coupled to one or more highly capable LLMs. What an agent isn’t is a simple interface; it must be viewed as a separate system capable of autonomous, unpredictable reasoning.
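A stripped-down agent loop makes the distinction concrete. This is purely illustrative, not OpenClaw’s architecture: the LLM only proposes actions, while the orchestrator executes them with whatever host access it has been granted:

```python
# Illustrative agent loop (hypothetical): the chat window is just a veneer;
# the orchestrator below it runs real commands on the host.
import subprocess

def llm_propose_action(goal: str, observations: list[str]) -> dict:
    # Stand-in for a model call; a real agent would query an LLM here.
    return {"tool": "shell", "command": f"echo working on: {goal}"}

def run_agent(goal: str, max_steps: int = 3) -> list[str]:
    observations: list[str] = []
    for _ in range(max_steps):
        action = llm_propose_action(goal, observations)
        if action["tool"] == "shell":
            # This, not the chat interface, is the new attack surface:
            # whatever the model proposes, the loop executes for real.
            result = subprocess.run(
                action["command"], shell=True, capture_output=True, text=True
            )
            observations.append(result.stdout.strip())
    return observations

print(run_agent("summarize inbox"))
```

Whoever controls the channel feeding `goal` controls the loop, which is exactly the scenario Okta describes.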
As Okta threat intelligence director Jeremy Kirk put it, “It opens up a new attack surface. Someone gets SIM swapped, their Telegram is hooked up to an agent that has carte blanche to run anything on their computer, and possibly their employer’s network. In an enterprise context, this is a total nightmare.”
OpenClaw is also so hard-wired to find ways around problems that it will sometimes do unexpected, improper things. Kirk said that in tests, when prompted to access a website, an agent requested the site’s login credentials in chat via a Telegram bot, a channel that is not end-to-end encrypted and would expose them to anyone with access to that chat.
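One mitigation is to filter the agent’s outbound messages before they ever reach the chat transport. A minimal sketch, assuming a hook can be placed between the agent and its Telegram connector; the patterns and function names below are illustrative, not part of any real OpenClaw or Telegram API:

```python
# Hypothetical redaction hook between an agent and a chat transport.
import re

SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),            # AWS access key id shape
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),  # GitHub token shape
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b"),  # JWT shape
    re.compile(r"(?i)\b(password|passwd|secret|token)\s*[:=]\s*\S+"),
]

def redact_outbound(message: str) -> str:
    """Replace credential-shaped substrings before the message leaves the host."""
    for pattern in SECRET_PATTERNS:
        message = pattern.sub("[REDACTED]", message)
    return message

# Example: the agent tries to relay a credential over chat.
print(redact_outbound("Here you go: password=hunter2 and token eyJabc.def.ghi"))
# -> "Here you go: [REDACTED] and token [REDACTED]"
```

Pattern-matching will never catch everything, but it turns “the agent pasted a credential into chat” from a silent success into a visible failure.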
In another example, OpenClaw was asked to search X for AI stories. That shouldn’t have been possible; the machine was logged into X, but OpenClaw’s isolated Chrome profile was not. However, when prompted to grab the session cookies from the logged-in session and inject them into its own browser process, it happily attempted to do so.
This is similar in principle to adversary-in-the-middle phishing attacks, which allow attackers to bypass protections such as MFA. It should have been a no-go, yet OpenClaw treated the action as valid, underlining how an attacker could manipulate it into doing the same.
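A defensive counterpart, sketched here as a hypothetical policy gate rather than anything OpenClaw ships, is to deny the agent’s file-access tools any path inside a browser profile, so session cookies can’t be lifted and replayed:

```python
# Hypothetical policy gate: block agent file reads that touch browser cookie
# stores. The listed paths are common defaults and purely illustrative.
from pathlib import Path

COOKIE_STORES = [
    Path.home() / ".config/google-chrome",                       # Chrome, Linux
    Path.home() / "Library/Application Support/Google/Chrome",   # Chrome, macOS
    Path.home() / ".mozilla/firefox",                            # Firefox
]

def is_allowed_read(target: str) -> bool:
    """Return False for any read inside a browser profile directory."""
    resolved = Path(target).expanduser().resolve()
    for store in COOKIE_STORES:
        if resolved.is_relative_to(store.resolve()):
            return False
    return True

print(is_allowed_read("~/.config/google-chrome/Default/Cookies"))  # False
print(is_allowed_read("~/projects/notes.txt"))                     # True
```

The same deny-list approach extends naturally to SSH keys, keychains, and password-manager vaults.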
“The agents are prompted to be as helpful as possible by default, a characteristic that poses particular concerns when it comes to credentials and tokens,” said Kirk.
‘Defying security gravity’
According to Kirk, many enterprises are, sometimes unwittingly, running unsanctioned or weakly managed ‘shadow’ agents inside their networks. An example of how this can go wrong is the recent Vercel compromise, in which the Context.ai app opened the door to the theft of downstream OAuth session tokens.
The problem stems from agents being used experimentally by developers and employees with little or no governance or oversight. The answer, said Kirk, is to secure them using the same controls applied to users or service accounts. As well as limiting the scope of agents, enterprises should look to securing the credentials and tokens themselves, avoiding long expiry dates.
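What that last piece of advice can look like in practice, sketched with the PyJWT library; the scopes, subject, and 15-minute TTL are illustrative choices, not values from the report:

```python
# Hypothetical sketch of short-lived, narrowly scoped agent credentials.
# Requires PyJWT (pip install pyjwt); all names and values are illustrative.
import datetime
import jwt  # PyJWT

SIGNING_KEY = "replace-with-a-real-secret"

def mint_agent_token(agent_id: str, scopes: list[str], ttl_minutes: int = 15) -> str:
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {
        "sub": agent_id,
        "scope": " ".join(scopes),  # least privilege: name only what's needed
        "iat": now,
        "exp": now + datetime.timedelta(minutes=ttl_minutes),  # short-lived
    }
    return jwt.encode(claims, SIGNING_KEY, algorithm="HS256")

token = mint_agent_token("openclaw-dev-sandbox", ["calendar:read"])
print(jwt.decode(token, SIGNING_KEY, algorithms=["HS256"]))
```

A token an attacker lifts from a screenshot or a chat log is worth far less if it names a single scope and dies within minutes.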
Agents are only the latest example of a technology that is being deployed faster than it can be secured, Kirk observed. “Much of AI right now is defying security gravity,” he said. “But there are ways to use agents safely and keep credentials out of their reach, which is the only safe way to use them.”