AI Found 10,000 Flaws. Can You Patch Them?

Anthropic's Project Glasswing found 10,000+ enterprise vulnerabilities in a month. See why the patching team is now the real AI security bottleneck.

Scott Armbruster

Anthropic published a Project Glasswing initial update on May 25-26 with a number nobody in enterprise security was ready to read. Ten thousand. That’s the count of high- and critical-severity vulnerabilities Anthropic and roughly 50 Glasswing partners have discovered in widely used software since the program kicked off about a month earlier. Cloudflare alone surfaced 2,000 bugs, 400 of them high or critical. Mozilla shipped Firefox 150 with 271 fixes against the roughly 25 you’d see in a normal release cycle. Palo Alto Networks pushed more than five times its usual CVE volume in one disclosure window.

The discovery side of enterprise security just got faster than the patching side. That’s the story.

If your security program was designed around the assumption that finding bugs was the slow step, that assumption is dead. The new slow step is the human review, the regression test suite, the change window, the customer notification, the back-port to the supported version that’s running in your most regulated business unit. Every one of those steps was sized for a bug pipeline that produced 20-something CVEs a month per major vendor. The pipeline is now producing five to ten times that, with no signal that it’s going to slow down.

Quick Verdict

| Signal from Glasswing’s First Month | What It Means for You |
| --- | --- |
| 10,000+ high or critical vulnerabilities found across ~50 partners in under a month | The discovery rate just outran the patch rate by an order of magnitude |
| Cloudflare: 2,000 bugs, 400 high or critical | One vendor, one program window. Multiply across your stack |
| Mozilla Firefox 150: 271 vulnerabilities fixed vs ~25 in Firefox 148 | 10x increase in disclosed flaws on a browser running on hundreds of millions of endpoints worldwide |
| Palo Alto Networks: 5x normal patch volume in one cycle | Vendor patch cadence is changing without asking permission |
| Glasswing partners include AWS, Apple, Cisco, Google, JPMorgan, Microsoft, CrowdStrike, NVIDIA, Linux Foundation | The disclosure avalanche is coming from inside the vendors you already depend on |
| Anthropic scanned 1,000 OSS projects: 6,202 high/critical of 23,019 total findings | Your open-source supply chain is the next disclosure cycle |
| Microsoft warning | Patch volumes will keep growing |
| Your real lever this quarter | Treat patch operations as the AI security KPI. Not adoption. Not coverage |

What Glasswing Actually Did

Project Glasswing is the Anthropic program that gives about 50 partner organizations early access to Claude Mythos Preview, the frontier security model Anthropic has not yet released publicly. The pitch, per Anthropic’s own writeup, is defensive. Hand the model to the vendors and infrastructure operators most enterprises depend on, let them scan their own codebases, fix the findings before the offensive side of the field catches up.

The first-month receipts landed on May 25-26 and they’re heavier than the program’s own framing prepared the field for.

Anthropic scanned 1,000 open-source projects with Mythos and surfaced 23,019 total findings. Of those, 6,202 were classified high or critical. The severity share is the interesting number. Roughly 27% of the findings the model produced were rated high or critical, the kind of bug a security engineer has to treat as urgent rather than noise. That’s a ratio static analysis tools haven’t approached in twenty years.

Cloudflare’s slice is the cleanest commercial example. The company put Mythos against critical internal systems and pulled 2,000 vulnerabilities out, 400 of which were high or critical. A Glasswing partner bank used the model to flag and block a fraudulent $1.5 million wire transfer after a threat actor compromised a customer’s email account. Different problem shape from a CVE scan, same underlying capability. The model reasons about code and behavior in a way the previous generation of detection tools couldn’t.

Mozilla’s number is the one every enterprise CISO should be sitting with. Firefox 148 included roughly 25 vulnerability fixes, which is a normal release. Firefox 150 included 271 fixes flagged by Mythos, and Mozilla shipped fixes for 423 Firefox security bugs in April alone, five times the prior month and almost 20x the 21.5-per-month average from 2025. A browser running on hundreds of millions of endpoints just got a roughly 10x increase in disclosed flaws, and your patch automation has to absorb that this quarter.
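The ratios quoted in this section are easy to check by hand. Here is a minimal sketch that recomputes them from the figures in the text; the variable names are illustrative only, and the Firefox 148 figure is the article's "roughly 25", so that multiple is approximate.

```python
# Figures quoted in the text above; variable names are illustrative only.
oss_total_findings = 23_019
oss_high_critical = 6_202
cloudflare_total = 2_000
cloudflare_high_critical = 400
firefox_148_fixes = 25          # "roughly 25" in the text, so treat as approximate
firefox_150_fixes = 271
april_security_bugs = 423
monthly_average_2025 = 21.5

# Share of findings rated high or critical.
oss_share = oss_high_critical / oss_total_findings
cloudflare_share = cloudflare_high_critical / cloudflare_total

# Release-over-release and month-over-average multiples.
firefox_multiple = firefox_150_fixes / firefox_148_fixes
april_multiple = april_security_bugs / monthly_average_2025

print(f"OSS high/critical share:        {oss_share:.1%}")
print(f"Cloudflare high/critical share: {cloudflare_share:.1%}")
print(f"Firefox 150 vs 148 fixes:       {firefox_multiple:.1f}x")
print(f"April vs 2025 monthly average:  {april_multiple:.1f}x")
```

The OSS share works out to just under 27%, and the April figure to a little under 20x, which matches the rounding used in the prose.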

The Bottleneck Just Moved

The old security debate had two sides. One side argued AI would generate a flood of new offensive capability and overwhelm defenders. The other side argued AI would close the gap by accelerating defensive scanning faster than attackers could exploit. The Glasswing first-month data favors the second side, but not in a way most defensive programs are organized to capture.

The model finds bugs faster than your team can patch them. That’s the practical translation of every number in the report.

Three things are now broken in the standard enterprise patch operation.

Your patch SLA was sized for a different decade. Most enterprise patch programs run on a 30-day critical, 60-day high, 90-day medium clock. That math assumed the inbound rate was a handful of criticals per vendor per quarter. Multiply the rate by five or ten and the queue compounds. Inside two cycles, the 30-day clock is fiction. Inside four, the audit gets ugly.

Your change management process assumes human review at every gate. Each patch goes through impact analysis, a regression test pass, a CAB approval, a customer-impact note, a rollback plan. That ritual was tuned for a volume the new disclosure rate has already broken. The orgs that survive 2026 patch ops are the ones that automate the ritual at the same rate the discovery side is automating discovery.

Your supported-version matrix wasn’t built for this. Most enterprise products ship patches against the current major version. Anything older is back-ported on a case-by-case basis. The case-by-case basis is the part that breaks first when the disclosure count goes 10x. Regulated industries running on N-2 or N-3 versions of an enterprise platform are going to discover patches they need that nobody is back-porting.
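The first of those three points, the SLA queue compounding, can be sketched as a simple backlog model. This is an illustration under stated assumptions, not a measurement: the baseline inbound rate, the team's patch capacity, and the first-in-first-out ordering are all hypothetical, and the function names are invented for this example.

```python
CYCLE_DAYS = 30  # the 30-day critical SLA window from the text


def simulate_backlog(inbound_per_cycle: int, capacity_per_cycle: int, cycles: int) -> list[int]:
    """Return the open backlog at the end of each cycle for a fixed inbound rate and capacity."""
    backlog = 0
    history = []
    for _ in range(cycles):
        # New arrivals join the queue; the team clears up to its capacity.
        backlog = max(0, backlog + inbound_per_cycle - capacity_per_cycle)
        history.append(backlog)
    return history


def wait_days_for_new_arrival(backlog: int, capacity_per_cycle: int) -> float:
    """Days a newly arrived item waits behind the existing backlog, assuming FIFO handling."""
    return backlog / capacity_per_cycle * CYCLE_DAYS


# Hypothetical numbers: 20 criticals per cycle inbound, capacity to patch 25 per cycle.
BASELINE_INBOUND = 20
CAPACITY = 25

for multiplier in (1, 5, 10):
    history = simulate_backlog(BASELINE_INBOUND * multiplier, CAPACITY, cycles=4)
    waits = [wait_days_for_new_arrival(b, CAPACITY) for b in history]
    print(f"{multiplier}x inbound -> backlog per cycle {history}, wait in days {waits}")
```

With these made-up numbers, the 1x case never builds a queue, while at 5x a critical arriving after the second cycle waits 180 days behind the backlog. The exact figures depend entirely on the assumed capacity, but the shape is the point: once inbound exceeds capacity, the wait grows every cycle.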

The pattern is the same one I covered in the enterprise AI ROI reckoning at the buyer level. The capability arrives faster than the operational layer is structured to absorb it. The capability still wins. The org that ran the capability into an operational layer designed for the new rate wins twice.

What Is Anthropic’s Project Glasswing?

Project Glasswing is Anthropic’s defensive AI security program, launched in early 2026, that gives roughly 50 partner organizations exclusive early access to Claude Mythos Preview to identify vulnerabilities in widely deployed software before public disclosure. Partners include AWS, Apple, Cisco, Google, JPMorgan Chase, Microsoft, CrowdStrike, NVIDIA, Palo Alto Networks, and the Linux Foundation. In its first month of operation, the program produced more than 10,000 confirmed high- or critical-severity vulnerabilities across operating systems, browsers, network infrastructure, and enterprise software. Anthropic has chosen not to release Mythos publicly because, in its own assessment, no AI lab has built safeguards strong enough to prevent offensive misuse.

That last sentence is the part most CISOs are skipping past. Anthropic is sitting on a model it believes is too dangerous to ship. The 50 Glasswing partners are running it under contract. The rest of the field is patching whatever those 50 partners disclose. The asymmetry inside the program is bigger than the asymmetry inside any previous security technology cycle.

The Five Disclosure Avalanches Coming at You

Five distinct disclosure pressures are about to hit your environment, and they will not arrive in a tidy queue.

  1. Vendor-driven floods. Palo Alto, Cisco, Microsoft, and every vendor inside Glasswing will keep pushing patch volume well above their historical averages. Microsoft has already warned its customer base that patch counts will keep climbing.
  2. Browser and OS catch-up. Mozilla’s Firefox 150 was the first browser shipping with Mythos findings baked in. Chrome and Edge will follow inside the next two quarters. macOS and Windows security updates will follow inside two more.
  3. Open-source supply chain disclosures. Anthropic’s own scan of 1,000 OSS projects produced 6,202 high or critical findings. Those findings have to be reported to the maintainers of the affected projects and then disclosed through the normal CVE process. The dependency graph inside most enterprise environments touches hundreds of those projects.
  4. Infrastructure and network gear. Cisco and the network infrastructure vendors inside Glasswing will start pushing firmware updates against Mythos findings. Field-deployed hardware that’s been in service for years is the worst case. Your network team has to handle a firmware patch rate they haven’t planned for.
  5. Enterprise software you didn’t realize was scanned. SAP, Salesforce, and the Big 4 consulting firms have an interest in scanning the platforms they build and deliver on. The disclosure path through those vendors is not the same as the disclosure path through the CVE database. You will get a private notice on a private channel, not a NIST advisory.

Each of these is its own operational queue. Most enterprise security programs run them as one. That is the design flaw the next two quarters are going to expose.
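One way to picture "five queues, not one" is a small router that files each inbound disclosure under its source category, so each category can carry its own owner and clock. The sketch below is a hypothetical illustration: the queue names, the `Disclosure` fields, and the sample identifiers are all invented for this example.

```python
from dataclasses import dataclass

# Queue names mirror the five pressures listed above; the labels themselves are invented.
QUEUES = (
    "vendor",
    "browser_os",
    "oss_supply_chain",
    "network_firmware",
    "private_notice",
)


@dataclass(frozen=True)
class Disclosure:
    source: str       # one of QUEUES
    identifier: str   # placeholder ID, not a real advisory
    severity: str     # e.g. "critical" or "high"


def route(disclosures: list[Disclosure]) -> dict[str, list[Disclosure]]:
    """Group disclosures into one queue per source instead of a single shared queue."""
    queues: dict[str, list[Disclosure]] = {name: [] for name in QUEUES}
    for item in disclosures:
        if item.source not in queues:
            raise ValueError(f"unknown disclosure source: {item.source}")
        queues[item.source].append(item)
    return queues


# Placeholder data for illustration only.
inbound = [
    Disclosure("vendor", "EXAMPLE-0001", "critical"),
    Disclosure("oss_supply_chain", "EXAMPLE-0002", "high"),
    Disclosure("vendor", "EXAMPLE-0003", "high"),
    Disclosure("private_notice", "EXAMPLE-0004", "critical"),
]

queues = route(inbound)
for name, items in queues.items():
    print(f"{name}: {len(items)} open")
```

Keeping the queues separate makes it visible when, say, firmware work is starving while vendor patches get all the attention. A single merged queue hides that.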

Where Most Security Budgets Are Already Misallocated

A reasonable CISO read on the Glasswing first-month numbers is that the operational layer needs more headcount. The reasonable CISO read is wrong, in the specific way that the Akeyless agent identity data showed earlier this month. The shape of the spend has to change before the size of the spend matters.

Three reallocations that move the needle right now.

Patch ops automation over net-new detection. Adding another vendor to the detection stack on top of Glasswing’s output is a coverage exercise. The mismatch is on the response side. Money spent on automated regression suites, canary deployment pipelines, and rollback automation pays back inside one disclosure cycle. Money spent on a fourth SAST product does not.

SBOM accuracy over SBOM coverage. Most enterprise software bills of materials are out of date the day they ship. The OSS disclosure wave that’s coming from Anthropic’s 1,000-project scan will land against SBOMs that say “we don’t use that library” when they actually do. Investing in continuous SBOM generation and validation now is the difference between absorbing the OSS avalanche and being surprised by it.

Patch operations as a measured business function, not an IT chore. Patch ops in most orgs sits one layer below the CIO and three layers below the CFO. The Glasswing data is the receipt that patch ops is now a top-five operational risk. The orgs that pull patch ops into the same governance forum that already covers enterprise AI vendor decisions will respond faster than the ones that don’t.
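The SBOM accuracy point above comes down to a set comparison: what the SBOM declares versus what is actually present. Here is a minimal sketch, assuming you can already produce both lists as package names. The function names and sample data are hypothetical, and the helper that reads installed packages covers only a Python environment, via the standard library's `importlib.metadata`.

```python
from importlib.metadata import distributions


def normalize(names: set[str]) -> set[str]:
    """Lower-case names and unify separators so 'My_Lib' and 'my-lib' compare equal."""
    return {name.lower().replace("_", "-") for name in names}


def sbom_drift(declared: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare the SBOM's declared components with what was actually observed."""
    declared_n, observed_n = normalize(declared), normalize(observed)
    return {
        "undeclared": observed_n - declared_n,  # in use but missing from the SBOM
        "stale": declared_n - observed_n,       # listed in the SBOM but not found
    }


def installed_python_packages() -> set[str]:
    """Names of distributions installed in the current Python environment."""
    return {
        dist.metadata["Name"]
        for dist in distributions()
        if dist.metadata["Name"] is not None
    }


# Hypothetical example data; in practice `observed` might come from
# installed_python_packages() or a scanner's output.
declared = {"requests", "urllib3", "left_pad"}
observed = {"Requests", "urllib3", "pyyaml"}

drift = sbom_drift(declared, observed)
print("Undeclared (the surprise risk):", sorted(drift["undeclared"]))
print("Stale entries:", sorted(drift["stale"]))
```

The "undeclared" set is the one that matters for the OSS wave: those are the libraries an advisory can hit while the SBOM says you don't use them.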

The trap is treating this as a tooling problem. It’s a process design problem. The tooling exists. The process around it was built for a different decade.

The Vendor Side of the Math

The numbers also have a vendor implication that hasn’t shown up in the analyst coverage yet. Anthropic just demonstrated, with the cleanest single dataset the security field has produced this year, that frontier security AI is a category in its own right. The nearest commercial product is Claude Security, in public beta for Claude Enterprise customers. It runs on Opus 4.7 and surfaces a fraction of what Mythos does, but it is what buyers can deploy themselves this week.

The thing most security buyers will get wrong is treating Mythos and Claude Security as the same product. They are not. Mythos is a research preview Anthropic holds inside Glasswing because it believes the model is too capable to ship. Claude Security is the productized, safer, slower version any enterprise customer can run today. The right move is to deploy the available product against your own code while you wait for the more capable one.

OpenAI’s GPT-5.4-Cyber is the obvious comparison. Gated to vetted defenders. Focused on the same problem. The competitive dynamic between Anthropic and OpenAI inside frontier security is the next 12 months of the field, and the Glasswing data just changed the terms.

The Anti-Hype Read

Two cautions before the headline writes itself.

The first is that “10,000 vulnerabilities” is a confirmed-finding count, not a unique-CVE count. Some of those findings will collapse into existing CVEs once the public disclosure process catches up. The real number of net-new public CVEs from Glasswing’s first month is probably somewhere between 4,000 and 7,000. That’s still an unprecedented disclosure rate for a single program, but it is not the literal 10,000 the headline number implies.

The second is that the 27% high/critical rate against Anthropic’s OSS scan is a snapshot, not a steady state. The OSS pool Anthropic scanned first was probably weighted toward the systemically important projects. The hit rate on the long tail of enterprise codebases will be lower. Realistic expectation for an internal enterprise scan is 8-15% high/critical against total findings, not 27%. That’s still a meaningful jump from the 1-3% rate most internal SAST tools produce, just not the Glasswing headline rate.

Neither caution changes the operational picture. The discovery side of security just outran the patching side. The orgs that pivot inside this quarter will be the ones writing the audit narrative for 2027. The orgs that don’t will be defending one.

Your Three Moves Before Q3

Three actions, sized for any enterprise running a real security program. Doable inside 90 days. Will position your patch operation against the new disclosure rate instead of the old one.

  1. Audit your patch SLA against the new inbound rate this week. Pull the last six months of your inbound CVE feed. Multiply the critical and high counts by five. Then by ten. Ask the patch ops team what breaks first at each multiplier. The answer is not “we hire more people.” The answer is the specific automation gap that has to close before the queue compounds. Document the gap. That document is the budget conversation for the next planning cycle.

  2. Move one critical patch path to a fully automated pipeline this month. Pick the lowest-risk critical patch class. Browser updates, OS security patches, or a single enterprise platform. Wire it to a canary deployment, an automated regression suite, and a rollback gate. Measure the new mean time to patch. The result is the operational template for the patch classes that come next. The Microsoft agent governance toolkit and the existing CI/CD stack already do most of this work. The missing piece is usually the regression test coverage.

  3. Stand up Claude Security against one production repo this quarter. Mythos is not available. Claude Security is. The closed-preview track record showed bugs that existing SAST tools had missed for years. The Claude Enterprise contract you already have is the only procurement requirement. Pilot on one repo, compare findings against your existing SAST, document the delta. That documentation is what makes the case for rolling the scanner across the rest of the codebase before the next disclosure wave hits.
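The second move above, a canary with a regression suite and a rollback gate, reduces to a small decision rule plus one metric. The sketch below is a hypothetical illustration rather than any particular pipeline's API: the function names, the error-rate comparison, and the 1% tolerance are all assumptions you would tune for your own environment.

```python
from datetime import datetime
from statistics import mean


def gate_decision(
    regression_passed: bool,
    canary_error_rate: float,
    baseline_error_rate: float,
    tolerance: float = 0.01,  # assumed threshold: allow up to 1 point above baseline
) -> str:
    """Return 'promote' or 'rollback' for a patch running on a canary slice."""
    if not regression_passed:
        return "rollback"
    if canary_error_rate > baseline_error_rate + tolerance:
        return "rollback"
    return "promote"


def mean_time_to_patch_days(records: list[tuple[datetime, datetime]]) -> float:
    """Average days between disclosure and production deployment."""
    return mean(
        (deployed - disclosed).total_seconds() / 86_400
        for disclosed, deployed in records
    )


# Hypothetical patch records: (disclosed, deployed to production).
records = [
    (datetime(2026, 6, 1), datetime(2026, 6, 4)),
    (datetime(2026, 6, 1), datetime(2026, 6, 6)),
]

print(gate_decision(regression_passed=True, canary_error_rate=0.002, baseline_error_rate=0.001))
print(gate_decision(regression_passed=True, canary_error_rate=0.05, baseline_error_rate=0.001))
print(f"Mean time to patch: {mean_time_to_patch_days(records):.1f} days")
```

Tracking mean time to patch before and after the gate is automated gives you the number the list item asks for. The gate itself is deliberately boring: if the regression suite fails or the canary degrades, roll back without a meeting.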

The career version of this move is also real. The orgs that respond well to the Glasswing data will be hiring patch ops automation engineers, SBOM specialists, and AI security analysts faster than the labor market is producing them. The premium on those roles will climb through the back half of 2026.

Bottom Line

The defender side of AI security is winning the discovery race. The first-month receipts from Project Glasswing are the cleanest evidence anyone in the field has produced. Ten thousand confirmed high or critical vulnerabilities across the world’s most depended-on software, inside a month, from a program operated by 50 organizations. That is not a hype number. That is the new baseline.

Whether your environment benefits or drowns depends on a question that has nothing to do with which AI security vendor you pick. It depends on whether your patch operation is structured to absorb a discovery rate five to ten times the one it was designed for. Most are not. The fix is process design and operational automation, not another tool.

Run the patch SLA stress test this week. Automate one critical patch path this month. Stand up Claude Security against a production repo this quarter. Three moves. Doable inside one operational cycle. The disclosure avalanche is already in flight. The question is which side of the patch backlog your team is on when it lands.

The defenders won the AI war. Now they have to ship the fixes.

