Cecuro’s AI smart contract security agent identifies 87.7% of high-severity findings on OpenAI’s EVMBench, as AI exploit costs fall to $1.22 per contract and offensive capability doubles every 1.3 months.

Cecuro, an AI-powered smart contract auditing platform, today published results from its evaluation on EVMBench, the open-source smart contract security benchmark developed by OpenAI, Paradigm, and OtterSec. Cecuro’s multi-agent security system achieved an 87.7% recall on the benchmark’s “detect” task, identifying 101 of 120 high-severity vulnerabilities across 40 real-world audit cases. The next-best system, Anthropic’s Claude Opus 4.6, scored 45.6%.

EVMBench was released in February 2026 to evaluate AI agents on three smart contract security tasks: detecting vulnerabilities, exploiting them, and patching them. The benchmark contains 120 curated high-severity findings from 40 audit cases sourced primarily from competitive audit platforms. It has quickly established itself as the industry standard for measuring AI security capabilities in the EVM ecosystem.

AI EXPLOIT CAPABILITY IS ACCELERATING

These results arrive amid growing evidence that AI is fundamentally changing the economics of smart contract exploitation. Anthropic’s SCONE-bench research, published in December 2025, found that the average cost of an AI-powered exploit scan has fallen to just $1.22 per contract, and that offensive AI capability is doubling approximately every 1.3 months. On EVMBench itself, OpenAI’s GPT-5.3-Codex achieved a 72.2% success rate executing end-to-end exploits against known vulnerable contracts.

With $3.4 billion stolen from blockchain platforms in 2025, the gap between offensive and defensive AI capability continues to widen. At that price, a motivated adversary can scan more than 1,600 smart contracts for under $2,000 using commercially available AI tools, raising the question of whether the industry’s defensive tooling is keeping pace.
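The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough illustration only: it takes the $1.22 per-contract cost and the 1.3-month doubling period quoted above at face value, and it assumes both stay constant, which the cited research does not guarantee. The function names are hypothetical and chosen for this example.

```python
import math

# Figures quoted in the text above; treating them as constants is an assumption.
COST_PER_CONTRACT_USD = 1.22
DOUBLING_PERIOD_MONTHS = 1.3


def contracts_within_budget(budget_usd: float) -> int:
    """Whole number of contracts that can be scanned for a given budget."""
    return math.floor(budget_usd / COST_PER_CONTRACT_USD)


def growth_factor(months: float) -> float:
    """Capability multiple after `months`, if the doubling rate held steady."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)


if __name__ == "__main__":
    print(contracts_within_budget(2000))  # 1639
    print(f"{growth_factor(12):.0f}x over 12 months (constant-rate extrapolation)")
```

Under these assumptions, a $2,000 budget covers about 1,600 scans, and a steady 1.3-month doubling period would compound to a several-hundred-fold increase over a year.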

GENERAL-PURPOSE AI IS NOT ENOUGH

The EVMBench results show that general-purpose AI, despite its broad capabilities, falls short on smart contract security. Every major AI lab was represented in the evaluation. Cecuro’s 87.7% detection rate was nearly double Claude Opus 4.6 (45.6%), more than double GPT-5.3-Codex and GPT-5.2 (both 39.2%), and more than four times Gemini 3 Pro (20.8%). OpenAI’s o3 detected just 10.6%.
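The multiples cited above follow directly from the reported detection rates. The short Python sketch below recomputes them; the rates are copied from this release, while the variable and function names are illustrative assumptions rather than part of EVMBench or Cecuro's tooling.

```python
# Detection rates (percent) as reported in this release.
CECURO_RATE = 87.7
OTHER_RATES = {
    "Claude Opus 4.6": 45.6,
    "GPT-5.3-Codex": 39.2,
    "GPT-5.2": 39.2,
    "Gemini 3 Pro": 20.8,
    "o3": 10.6,
}


def detection_multiples(baseline: float, rates: dict[str, float]) -> dict[str, float]:
    """Ratio of the baseline detection rate to each other system's rate."""
    return {name: baseline / rate for name, rate in rates.items()}


if __name__ == "__main__":
    for name, multiple in detection_multiples(CECURO_RATE, OTHER_RATES).items():
        print(f"{name}: {multiple:.2f}x")
```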

The performance gap stems from architectural specialization. General-purpose models bring strong reasoning capabilities but lack the structured methodology and domain knowledge required for systematic vulnerability detection: lending protocol mechanics, AMM price manipulation vectors, cross-contract callback risks, and DeFi-specific interaction patterns that drive real-world losses.

These findings are consistent with an earlier benchmark published by Cecuro and covered by CoinDesk. That benchmark evaluated 90 real-world exploited contracts representing $228 million in losses and reported a 92% detection rate. Cecuro’s system covered $96.8 million in exploitable value, compared to $7.5 million for a standard frontier AI agent.

EVMBench uses containerized environments, deterministic EVM replay, and automated scoring. All 120 findings are high-severity vulnerabilities independently confirmed through competitive audit processes. The benchmark applies simple system prompts across leading AI coding tools, providing a standardized baseline for comparison. 

About Cecuro

Cecuro is the industry-leading AI smart contract auditing platform. Its rigorous multi-agent AI smart contract auditing system provides vulnerability detection across all blockchain networks and smart contract languages, delivering comprehensive security reviews in hours.

Learn more at https://cecuro.ai.
