The Skepticism Was Loud

When Mozilla's CTO claimed AI-assisted vulnerability detection would make zero-days obsolete, the developer community rolled its eyes. We've all seen AI-generated "slop" bug reports — plausible-sounding but hallucinated nonsense that wastes human time.

Mozilla heard the criticism. On Thursday, they published a detailed breakdown of how they used Anthropic's Mythos model to find 271 genuine Firefox vulnerabilities over two months. The key: a custom harness that turned an LLM into a disciplined bug hunter.

The Harness: Not Just a Prompt

Mozilla Distinguished Engineer Brian Grinstead explained that earlier attempts with AI failed because they were just prompts. You'd ask a model to find bugs; it would produce pages of reports, most of them garbage.

What changed? A purpose-built agent harness. This isn't a fancy prompt — it's code that wraps the LLM and gives it:

  • Clear instructions ("find a bug in this file")
  • Tools (read/write files, run test cases)
  • A loop that keeps the model working until it succeeds or fails
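That loop is the core of any agent harness. Here's a minimal sketch of the pattern, with hypothetical names throughout (`model_call`, the tool names, and the reply format are illustrative stand-ins, not Mozilla's actual interface):

```python
# Minimal agent-harness loop: give the model tools, keep calling it
# until it declares success/failure or exhausts its step budget.
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "write_file": lambda path, text: open(path, "w").write(text),
}

def run_harness(task, model_call, max_steps=20):
    """Drive the model until it returns a 'done' action or runs out of steps."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model_call(history)       # hypothetical: returns {"action": ..., "args": ...}
        if reply["action"] == "done":
            return reply.get("result")    # success or an explicit failure report
        # Otherwise dispatch the requested tool and feed its output back.
        tool = TOOLS[reply["action"]]
        output = tool(*reply.get("args", []))
        history.append({"role": "tool", "content": str(output)})
    return None                           # step budget exhausted
```

The point is that the loop, not the prompt, enforces discipline: the model can't wander off, and every claim it makes has to survive a tool call.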

The harness connected Mythos to the same build and testing pipeline human developers use. For memory safety bugs, Mozilla's sanitizer build crashes when an unsafe condition is triggered. The model would craft HTML or other test cases, run them through existing fuzzing tools, and check if a crash occurred.
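The crucial property here is that the success signal is deterministic: either the sanitizer build crashes or it doesn't. A hedged sketch of that check, with a hypothetical binary path and a generic sanitizer-report heuristic (the real pipeline's invocation will differ):

```python
# Check whether a model-generated test case crashes a sanitizer build.
# Sanitizer builds (ASan/UBSan) abort with a nonzero exit code and print
# a report to stderr when an unsafe condition is triggered.
import subprocess

def triggers_crash(binary, testcase_path, timeout=60):
    """Return True if the instrumented build aborts on the test case."""
    proc = subprocess.run(
        [binary, testcase_path],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode != 0 and "ERROR:" in proc.stderr
```

A finding only counts if this returns True, which is what separates a verified bug from a plausible-sounding hallucination.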

Double Verification

A second LLM graded each finding. Only high-scoring results made it into Bugzilla. According to Grinstead, the false positive rate is "almost zero." That's a stark contrast to the AI-generated reports that have plagued open-source projects.
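The filtering step itself is simple once you have a grader. A sketch of the idea, where `grade_with_model` and the threshold are hypothetical placeholders for the second LLM and whatever cutoff Mozilla actually uses:

```python
# Double verification: a second model scores each candidate finding,
# and only high-scoring reports are filed in the bug tracker.
def verified_findings(candidates, grade_with_model, threshold=0.9):
    """Keep only reports the grading model scores at or above the threshold."""
    kept = []
    for report in candidates:
        score = grade_with_model(report)  # e.g. a 0.0-1.0 plausibility score
        if score >= threshold:
            kept.append(report)
    return kept
```

Because the grader is a different model than the finder, an error has to slip past two independent judges plus the deterministic crash check, which is why the false positive rate can approach zero.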

Mozilla opened 12 of the 271 bug reports for public inspection. Each includes a test case that triggers the vulnerability — the same standard Mozilla uses for all security bugs. At least one outside researcher confirmed they look legitimate.

The Numbers

Of the 271 bugs:

  • 180 were rated sec-high (exploitable through normal browsing)
  • 80 were sec-moderate
  • 11 were sec-low

No CVEs were filed, but that's normal for internally discovered bugs. Mozilla bundles them into a single patch. The reports were hidden until now to give users time to update.

Why This Matters

Mozilla didn't just run a model and publish results. They built infrastructure that converts AI's pattern-matching ability into actionable, verified bug reports. That's the difference between hype and production.

Critics will still question whether these 271 bugs are cherry-picked. Mozilla expects that. But Grinstead insists there's no marketing angle — his team has "completely bought in" and wants to share the technique, not promote a specific vendor.

What You Can Take Away

If you're maintaining a large codebase, the takeaway isn't "buy Mythos." It's "build a harness." The model is just the engine; the harness provides the steering, brakes, and safety checks. Without it, you get slop.

Start by identifying a clear success signal — something deterministic like a crash or test failure. Then wrap an LLM in code that gives it the same tools your developers use. Verify results with a separate model or automated check. Iterate until false positives are rare.
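Those four steps wire together into a small pipeline. A sketch under stated assumptions: `propose_testcase`, `reproduces`, and `grade` are hypothetical callables standing in for the first model, your deterministic signal, and the verifying model:

```python
# The takeaway pattern end to end: propose, reproduce deterministically,
# then verify with a second model before filing anything.
def bug_hunt(files, propose_testcase, reproduces, grade, threshold=0.9):
    """Return (file, testcase) pairs that crash AND pass second-model review."""
    findings = []
    for path in files:
        testcase = propose_testcase(path)        # first model drafts a test case
        if testcase is None:
            continue                             # model found nothing here
        if not reproduces(testcase):             # deterministic success signal,
            continue                             # e.g. a sanitizer crash
        if grade(path, testcase) >= threshold:   # independent second-model check
            findings.append((path, testcase))
    return findings
```

Swap in your own success signal (a failing test, a fuzzer crash, a lint violation) and the same skeleton applies to codebases far smaller than Firefox.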

Mozilla proved it works at scale. Now it's up to the rest of us to adopt the pattern — or keep drowning in slop.