Written by Protopia

McKinsey Lilli hack & Anthropic Mythos reveal new AI cybersecurity threats

The Patch Closed One Door. Mythos Just Showed Us How Many Doors There Are.

In early March, a security researcher at CodeWall pointed an autonomous AI agent at McKinsey’s internal chatbot, Lilli, with no credentials and no insider knowledge. Within two hours it had full read and write access to the production database: 46.5 million chat messages about strategy, M&A, and client engagements, all in plaintext. 728,000 files. 57,000 user accounts. And 95 system prompts controlling Lilli’s behavior, all of them writable. The entry point was a SQL injection flaw in a publicly exposed, unauthenticated API endpoint, a vulnerability class that has been on the OWASP Top 10 for twenty years.
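The class is simple enough to show in a few lines. The sketch below is hypothetical Python against sqlite3, not anything from Lilli’s codebase; it only contrasts the broken pattern with the fix.

```python
# Illustrative only: the same vulnerability class, reproduced against sqlite3.
# Table, column, and function names here are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, user_id TEXT, body TEXT)")

def get_messages_vulnerable(user_id: str):
    # String interpolation: a crafted user_id such as "x' OR '1'='1" returns every row.
    query = f"SELECT body FROM messages WHERE user_id = '{user_id}'"
    return conn.execute(query).fetchall()

def get_messages_parameterized(user_id: str):
    # Parameter binding: the driver treats user_id strictly as data, never as SQL.
    return conn.execute("SELECT body FROM messages WHERE user_id = ?", (user_id,)).fetchall()
```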

McKinsey patched all unauthenticated endpoints within 24 hours of disclosure. Their forensic investigation found no evidence that client data was accessed by an unauthorized party. By any standard, it was a fast, credible response.

A month later, Anthropic published the Mythos Preview red team report. Mythos found and exploited zero-day vulnerabilities in every major operating system and every major web browser, with over 99% of those vulnerabilities still unpatched at the time of writing. Engineers without security training were able to ask the model to find remote code execution vulnerabilities overnight and come back the next morning to a complete, working exploit. Anthropic chose not to release Mythos Preview generally and routed access through a controlled program. Their own framing on where this goes next: “the trajectory is clear.”

These are two different stories. Read together, they say something the industry needs to hear.

The reflex is to assume the lesson belongs to someone else

The reflex on McKinsey is “we would never expose an unauthenticated endpoint like that.” The reflex on Mythos is “there is plenty of time before adversaries have anything that capable.” The combination should land harder than either piece on its own. The class of bug CodeWall took two hours to find by hand is exactly the class of bug Mythos and its successors will find at scale, on a budget of dollars per attempt, against software the defenders never thought was reachable. The window between “vulnerability exists” and “vulnerability is being exploited” is closing for your application, and it is closing for the application sitting on the other side of your API call.

Two layers, two different problems

The CodeWall report revealed something beyond the database exposure: 1.1 million files and 217,000 prompts were flowing out of McKinsey’s network to external AI APIs, including OpenAI vector stores containing McKinsey’s proprietary documents. Lilli was a RAG application built on a third-party LLM. Sensitive documents and queries were leaving the tenant boundary as a feature of normal operation, encrypted in transit but plaintext the moment the model host received them.
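To make that concrete, here is roughly what such a call looks like on the wire. Everything in this sketch is a placeholder (the endpoint, payload shape, and model name are hypothetical, not Lilli’s implementation); the point is what crosses the tenant boundary on every request.

```python
# A simplified, hypothetical RAG inference call to an external model host.
import requests

def answer_with_rag(question: str, retrieved_docs: list[str]) -> str:
    payload = {
        "model": "some-hosted-model",  # hypothetical model name
        "messages": [
            {"role": "system", "content": "Answer using the provided context."},
            # Retrieved internal documents and the user's question leave the
            # network here: TLS protects the wire, but the model host, its logs,
            # and its caches all receive them as plaintext.
            {"role": "user", "content": "\n\n".join(retrieved_docs) + "\n\n" + question},
        ],
    }
    resp = requests.post(
        "https://api.example-model-host.com/v1/chat/completions",  # placeholder URL
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```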

The application-layer breach has known fixes, and McKinsey has applied them. The inference pipeline is the second layer, and it is the one the Mythos findings should reframe. The model host on the other side of your inference call is not a single shape. It can be an LLM provider’s API, an AI factory inside a sovereign AI buildout, an AI factory run by an operating partner on behalf of a group of similar enterprises, your own internal AI factory, or an inference endpoint on a NeoCloud or inference platform. The list is long and growing. Assuming a McKinsey-style API gap can exist on your side of the wire but not on any of those is not a security posture. It is hope, against the same class of adversary that just compromised McKinsey, only with progressively better tools.

What the assume-breach posture looks like for AI inference

Mature security teams already apply assume-breach thinking to networks, identity, and endpoints. The same logic, applied to AI inference, asks two questions. What is leaving the tenant boundary? And if anyone, on either side of that boundary, gets in, what do they find?

If the answer is strategy conversations, M&A discussions, PHI, PII, or proprietary code in plaintext, the blast radius is enormous. If the answer is a stochastically transformed representation that the target model can still use but that has no reverse function back to the original input for anyone else, the blast radius is bounded by mathematics.

That is the posture Protopia’s Stained Glass Transform is built for. SGT transforms sensitive prompts, context, and documents into stochastic embeddings before they leave the data owner’s trust boundary. Inference servers, logs, caches, observability tools, and the model host itself only ever see the transformed representation. The plaintext is not present on the shared infrastructure to be protected, leaked, or exfiltrated. This is an architectural property, not a runtime policy. It does not depend on attestation, on credentials staying secret, or on the next zero-day not being found. It is also why SGT anchors the HPE and Protopia Trustworthy AI Factory blueprint and strengthens the Data Protection, Model Security, and Infrastructure Security domains of WWT’s ARMOR framework: by removing plaintext from the surfaces those domains have to protect.
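As a purely architectural sketch, the pattern looks like the following. This is emphatically not SGT’s transform (a real deployment needs a representation the target model can still consume, which naive noise addition does not provide); it only shows where the transformation happens relative to the trust boundary and what the model host never receives.

```python
# Toy illustration of the transform-before-the-boundary pattern. The transform
# shown here (client-side embedding plus fresh random noise) is NOT Stained
# Glass Transform; it stands in for any transformation applied inside the data
# owner's boundary so that only the transformed representation crosses it.
import numpy as np

rng = np.random.default_rng()

def embed_locally(text: str, dim: int = 768) -> np.ndarray:
    # Stand-in for an embedding model running entirely inside your own boundary.
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def transform_inside_boundary(text: str) -> np.ndarray:
    clean = embed_locally(text)
    noise = rng.standard_normal(clean.shape)  # freshly sampled per request
    return clean + noise                      # the only thing that leaves the network

def call_model_host(transformed: np.ndarray) -> str:
    # The host, its logs, caches, and observability stack only ever see this array.
    # The plaintext prompt was never serialized into the request.
    return "response from the hosted model"

call_model_host(transform_inside_boundary("hypothetical sensitive prompt"))
```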

What to ask about your own deployment

Patching the SQL injection was the right move. Closing the application-layer gaps is the easier half of the problem. The harder half is what the next CodeWall agent, or the next Mythos-class model, finds on the other end of the inference call.

If sensitive inputs travel as plaintext from your network to a model that lives somewhere you do not control, you are betting that nothing in that pipeline is ever compromised. The Mythos report makes that bet harder to justify every quarter, and soon it will be every month, then every week. The architectural answer is to remove the bet: if the data was never there to be taken, neither perimeter has to hold for the data to stay safe.

Sources

How We Hacked McKinsey’s AI Platform (CodeWall, March 9, 2026)

Claude Mythos Preview (Anthropic Red, April 2026)

Half Your AI Factory Is Sitting Idle, Here Is the Blueprint That Fixes It (HPE Community)

AI vs AI: Agent Hacked McKinsey’s Chatbot (The Register, March 9, 2026)
