How capable are AI agents at real-world cybersecurity?

A cybersecurity observatory of large-scale, high-quality benchmarks measuring how well AI agents handle real-world vulnerabilities, from discovering and reproducing them to developing working exploits or patches.

The benchmarks

Each benchmark targets a different stage of the vulnerability lifecycle.

Why we built this

AI agents are rapidly getting better at autonomous cybersecurity tasks, and the stakes are rising with them. We built this observatory to measure that capability rigorously and openly, on real-world software drawn from widely deployed projects, so that defenders, AI developers, and policymakers can act on evidence.

This observatory supersedes our earlier Frontier AI Cybersecurity Observatory, which is now deprecated.