Frontier AI Cybersecurity Observatory

AI is evolving at an unprecedented pace, making it increasingly difficult to anticipate its societal impacts and risks. Recent benchmarks show that AI agents can already take on real-world cybersecurity tasks, including discovering and exploiting zero-day vulnerabilities. In cybersecurity, AI plays a dual role, strengthening both offensive and defensive capabilities.

To help anticipate these risks, we built this observatory to continuously and openly track AI's cybersecurity capabilities across the stages of attack and defense, so that developers, researchers, and policymakers can stay informed in a timely manner.

Have suggestions to improve the observatory? We are actively gathering feedback from the community and would greatly value your input. Please share your suggestions here.

The benchmarks

Each benchmark targets a different stage of the vulnerability lifecycle.

Vulnerability Reproduction

CyberGym

Given a vulnerability description and an unpatched codebase, agents must generate proof-of-concept tests that reproduce the bug.

real-world instances

software projects

Exploit Generation

ExploitGym

Given a vulnerability and a proof-of-vulnerability input, agents must craft a full exploit that achieves unauthorized code execution across userspace, browser, and the Linux kernel.

real-world instances

domains (userspace · browser · kernel)

End-to-End

CyberGym-E2E

End-to-end evaluation of the full defensive lifecycle: agents must discover a vulnerability, generate a proof-of-concept, and write a patch that fixes it without breaking anything else.

real-world instances
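
As an illustration only, the end-to-end requirements described above (discover and reproduce a vulnerability, then patch it without breaking anything else) can be read as a conjunction of three checks. The sketch below is an assumption about how such a pass/fail decision could be expressed, not the benchmark's actual scoring code; the class `EndToEndResult`, its fields, and the function `attempt_succeeds` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EndToEndResult:
    """Hypothetical record of one agent attempt; field names are illustrative."""
    poc_triggers_before_patch: bool  # PoC reproduces the bug on the unpatched code
    poc_triggers_after_patch: bool   # PoC still reproduces the bug on the patched code
    regression_tests_pass: bool      # existing functionality still works after the patch


def attempt_succeeds(result: EndToEndResult) -> bool:
    """Return True only if all three stages of the lifecycle hold."""
    discovered_and_reproduced = result.poc_triggers_before_patch
    patch_fixes_bug = not result.poc_triggers_after_patch
    nothing_else_broken = result.regression_tests_pass
    return discovered_and_reproduced and patch_fixes_bug and nothing_else_broken


# A patch that silences the PoC but breaks existing tests does not count.
print(attempt_succeeds(EndToEndResult(True, False, False)))
print(attempt_succeeds(EndToEndResult(True, False, True)))
```

Requiring all three conditions at once reflects the description's wording: a patch that merely stops the proof-of-concept from triggering, while breaking other behavior, would not satisfy "without breaking anything else".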