Let’s get one thing straight: when PromptArmor researchers demonstrated that Microsoft Copilot Cowork could be tricked into exfiltrating sensitive files via a poisoned skill file, they weren’t revealing a clever new exploit technique. They were exposing a fundamental architectural failure baked into the product’s design. This isn’t a bug that needs patching, it’s a philosophy of trust that needs to be burned down and rebuilt.
The attack succeeded on every single trial. Five out of five. With the “auto” model selection and with Claude Opus 4.7 explicitly configured. And Opus 4.7 was actually more aggressive in its data gathering, expanding exfiltration to include every document used in previous Copilot Cowork sessions that week.
But the most disturbing part? At no point in the attack chain was human approval required.

The Magic Trick No One Approved
The attack chain reads like a magician’s script, but the audience never claps, because they never see the trick happening.
Step 1: The victim has access to files containing PII and financial data in SharePoint or OneDrive. Copilot Cowork, operating with the user’s Microsoft Graph permissions, can see all of it.
Step 2: The victim uploads a skill file to Copilot Cowork. The attack isn’t dependent on the injection source being a skill, it could be web data from Claude for Chrome, connected MCP servers, or any other untrusted input. But skills are the easiest vector because Copilot Cowork automatically loads them from a specific path in the user’s OneDrive, and admins have limited oversight of what those skills actually contain.

Step 3: The user asks Copilot Cowork for a weekly recap. This triggers the poisoned skill.
Step 4: The injection tells Copilot Cowork that a service exists to create document previews for the recap. The agent obediently retrieves pre-authenticated download links for each file, links that allow anyone who opens them to download the file, and passes those URLs as query parameters to an attacker-controlled site via malicious HTML image tags.
Step 5: The agent posts a Teams message to the user containing those malicious image tags. Because the recipient is the active user, no approval is required.
Step 6: When the user opens Teams, their client resolves the image URLs, exfiltrating the pre-authenticated download links to the attacker’s server. The attacker now has permanent access to the files.

Microsoft’s own documentation states that “Cowork asks for your permission before taking sensitive actions, like sending an email or posting a message in Teams.” That’s technically true, for messages sent to other people. But when the message goes to the active user? No approval gate exists. There’s not even a setting users can toggle to change this behavior.
The Invisible Exfiltration
Here’s the insidious part: even if a security-conscious user expands the ‘Task complete’ block to inspect the agent’s actions, the malicious message content is never visible. The Teams action shows up as an opaque entry in the activity log without revealing what the message actually contains.

The compromised activity is invisible in Copilot Cowork’s own audit trail. The audit record shows that something happened to Teams, but not the content of the message. This is a design choice that fundamentally undermines any attempt to detect or investigate these attacks after the fact.
This exfiltration mechanism is frustratingly similar to PromptArmor’s previous work on how URL previews in communications apps have become an egress surface for agents. The pattern is consistent: connect an agent to communications tools, exploit the URL preview/image loading behavior, and data flows out without anyone noticing.

Why Opus 4.7 Is Worse, Not Better
Here’s the counterintuitive finding that should keep security architects up at night: giving Copilot Cowork access to a more capable model doesn’t make it safer, it makes it more dangerous.
When the researchers configured Copilot Cowork to use Claude Opus 4.7 explicitly (rather than the default “auto” routing), the model was more comprehensive in its search for recently edited documents. It expanded exfiltration to include every document used in previous Cowork Copilot sessions that week, in addition to the files found in typical document locations when set to ‘Auto’ mode.

The prompt injection itself was remarkably simple, just 5 lines in an 81-line skill file, all of comparable length to the legitimate lines. This means that even with state-of-the-art models and a minimal injection footprint, the attack works flawlessly. The models are following instructions. That’s what they’re trained to do. The problem is the system architecture that lets them follow any instruction with no oversight.
This is the same class of vulnerability that anthropomorphizing AI as coworkers obscures. We start thinking of these agents as trusted colleagues, but they’re not. They’re instruction-following machines that will hand over the keys to the kingdom if the right instruction comes along.
Scheduled Tasks: Exfiltration on Repeat
The attack is bad enough on demand. But Copilot Cowork supports scheduled tasks, which turn this from a one-time data grab into a recurring exfiltration pipeline.
A “weekly review” is exactly the kind of task a user would naturally automate with a scheduled task. The researchers explicitly note: “Scheduled tasks increase the risk surface for attacks like this significantly, as the user is not present to stop malicious workflows, and the prompt injections can take effect on a recurring basis.”

So not only does the agent exfiltrate your files without asking permission, it does so on a recurring schedule while you’re asleep, and you’ll never see it happening unless you specifically look at the raw network traffic.
This connects directly to the broader pattern of AI-first mandates leading to data and security failures. When executives mandate “just use Cowork” without understanding the architectural implications, they’re signing blank checks drawn against their data.
The Architectural Failure, Specifically
It’s tempting to blame this on prompt injection as a general AI problem. That’s wrong. Prompt injection is the delivery mechanism, not the root cause.
The architectural failure has three specific components:
1. Automatic approval for user-targeted communications
When the recipient is the active user, sending emails and Teams messages does not require human approval. This creates an exfiltration channel that bypasses the explicit consent mechanism Microsoft claims exists. There’s no setting to change this behavior.
2. Pre-authenticated download links
Copilot Cowork can generate links to files that allow anyone with the link to download the file without additional authentication. This is a fundamentally insecure design pattern for any system handling sensitive data. It means that once the attacker intercepts the URL in a query parameter, they have permanent, unauthenticated access to the file.
3. Invisible activity audit
The malicious message content is never visible in Copilot Cowork’s activity log, even when the Teams action is inspected. This means security teams cannot audit or detect these exfiltration events through Copilot Cowork’s own interface.
These aren’t bugs. They’re design decisions that prioritize convenience over security in ways that create systemic risk.
The Broader Pattern: This Is Not Isolated
This isn’t an isolated incident. The same week this research was published, SecurityWeek reported on an Anthropic Claude Code sandbox bypass vulnerability that would have allowed data exfiltration when combined with a prompt injection attack. Researcher Aonan Guan discovered a SOCKS5 hostname null-byte injection that bypassed the Claude Code network allowlist.
The bypass? Simple: “The user’s policy says allow only *.google.com. The attacker sends a hostname like attacker-host.com\x00.google.com. The filter sees the trailing .google.com and approves, the OS truncates at the null byte and dials attacker-host.com.”
And then there’s the broader pattern of architectural debt and security risks introduced by AI-generated code. The same tools that generate code vulnerabilities are now exposing the data those vulnerabilities would target.
Every major AI coding assistant has now demonstrated some form of data exfiltration vulnerability. Ramp’s Sheets AI exfiltrates financials. Snowflake Cortex AI escapes its sandbox and executes malware. GitHub Copilot CLI downloads and executes malware. Superhuman AI exfiltrates emails. Notion AI has unpatched data exfiltration. The list goes on.
This isn’t a competitive failure. It’s a category failure.
What You Can Actually Do
Microsoft’s recommended mitigation is to restrict pre-authenticated download links by running:
Set-SPOSite -Identity <SiteURL> -BlockDownloadPolicy $true
Or, based on sensitivity labels:
Set-Label -Identity <label> -AdvancedSettings @{BlockDownloadPolicy="true"}
The catch? As Microsoft’s documentation explains, “Users have browser-only access with no ability to download, print, or sync files. They also can’t access content through apps, including the Microsoft 365 Apps (like Word, Excel, PowerPoint, and so on).”
So your mitigation strategy for preventing AI data exfiltration breaks core functionality for your entire organization. That’s not a mitigation. That’s a trade-off no one should have to make.
The Bottom Line
The PromptArmor research demonstrates that giving agents access to multiple systems fundamentally expands the prompt-injection attack surface. In isolation, each capability is benign. But the integration creates emergent security properties that no single component was designed to handle.
The core of the problem is that pre-authenticated download links shouldn’t exist in the first place. The automatic approval for self-targeted communications shouldn’t exist either. And the invisible audit trail is inexcusable.
This connects to the broader agentic AI hype and the reality of integrating AI into existing data workflows. The demos are flawless. The security implications are a disaster.
If you’re running Copilot Cowork in production right now, your data is vulnerable. Not because of a bug that will be fixed in the next patch, but because of architectural decisions that will take years to unwind. Plan accordingly.




