🥷 #OpenAI and Paradigm launched EVMbench, a benchmark designed to measure how well #AI agents can detect, patch, and exploit vulnerabilities across EVM ecosystems such as Ethereum. The benchmark is built from 120 high-severity vulnerabilities curated from 40 audits and includes scenarios related to the Tempo chain. In tests, GPT-5.3-Codex scored 72.2% in "exploit" mode compared with 31.9% for GPT-5, while results in the detection and patching modes still fall short of full coverage. #hack
#crypto