OpenZeppelin audits EVMbench, finds data contamination

Blockchain security auditing firm OpenZeppelin has conducted an independent review of EVMbench, the AI smart contract security benchmark launched by OpenAI and Paradigm, and identified two major issues: training data contamination, and at least four vulnerabilities labeled “high risk” that are invalid because the described exploits do not work.

EVMbench Data Contamination: Benchmark Audits Overlap with AI Training Cutoff Dates

EVMbench was released in mid-February 2026 to evaluate how well different AI models can identify, fix, and exploit smart contract vulnerabilities. During testing, the AI agents’ internet access was cut off to prevent them from searching for answers online. However, OpenZeppelin’s audit revealed a structural flaw: the benchmark is built on vulnerabilities from 120 audits conducted between 2024 and mid-2025, while most top AI models’ knowledge cutoff dates also fall in mid-2025, so the source audits were largely public before training ended.

This means the models may have encountered the vulnerability reports behind EVMbench during pretraining and could be recalling answers from memory rather than discovering them. OpenZeppelin stated, “The most important ability for AI security is to discover new vulnerabilities in code that the model has never seen before.” The limited size of the dataset further amplifies the impact of contamination on the overall evaluation.
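The overlap described above can be illustrated with a minimal sketch, assuming each benchmark item carries the date its audit report became public. The `BenchmarkItem` type, the audit identifiers, the dates, and the cutoff are all hypothetical; they are not taken from EVMbench or from OpenZeppelin's review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BenchmarkItem:
    audit_id: str      # hypothetical identifier
    published: date    # assumed: date the audit report became public


def split_by_cutoff(items, cutoff):
    """Partition items into (possibly_seen, unseen) relative to a model's cutoff."""
    possibly_seen = [i for i in items if i.published <= cutoff]
    unseen = [i for i in items if i.published > cutoff]
    return possibly_seen, unseen


# Illustrative data only.
items = [
    BenchmarkItem("audit-a", date(2024, 3, 1)),
    BenchmarkItem("audit-b", date(2025, 1, 15)),
    BenchmarkItem("audit-c", date(2025, 9, 30)),
]
model_cutoff = date(2025, 6, 30)  # assumed mid-2025 knowledge cutoff

possibly_seen, unseen = split_by_cutoff(items, model_cutoff)
contamination_rate = len(possibly_seen) / len(items)
print(f"{len(possibly_seen)} of {len(items)} items predate the cutoff "
      f"({contamination_rate:.0%})")
```

A publication date before the cutoff does not prove a report was in the training data; it only marks the items for which memorization cannot be ruled out. Scores on the `unseen` subset are the ones that speak to discovery on unfamiliar code.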

Key Issues Found in the EVMbench Audit

  • Training Data Contamination: Pretraining of AI agents may have included EVMbench’s vulnerability reports, rendering “zero-knowledge discovery” tests meaningless.
  • Invalid High-Risk Vulnerability Classifications: At least four vulnerabilities marked as high risk are actually unexploitable.
  • Scoring System Flaws: EVMbench awarded points to AI agents for reporting these nonexistent vulnerabilities, which calls the scoring basis into question.
  • Limited Dataset Size: Further magnifies the impact of contamination on overall results.
  • Current Leaderboard: Anthropic’s Claude 4.6 leads, followed by OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro.

Fake Vulnerability Crisis: At Least Four High-Risk Classifications Proven Invalid

Beyond data contamination, OpenZeppelin uncovered specific factual errors. It reviewed at least four vulnerabilities that EVMbench categorizes as high risk and found that they do not exist: the described exploit methods simply do not work.

OpenZeppelin pointed out, “These are not subjective disagreements over severity; rather, the described exploit methods simply do not work.” Because an AI agent earns credit for “discovering” these nonexistent vulnerabilities during testing, the scoring system rewards incorrect results.
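The effect of invalid labels on a score can be sketched with a simplified recall-style metric. This is an assumption for illustration, not EVMbench's actual scoring method, and the finding IDs and agent reports below are made up.

```python
def detection_score(reported, ground_truth):
    """Fraction of ground-truth findings that the agent reported."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return len(reported & ground_truth) / len(ground_truth)


# Hypothetical labels: five "high risk" findings, two of them unexploitable.
ground_truth = {"H-01", "H-02", "H-03", "H-04", "H-05"}
invalid = {"H-04", "H-05"}

agent_a = {"H-01", "H-04", "H-05"}  # repeats the flawed labels
agent_b = {"H-01", "H-02"}          # reports only real issues

# Scored against the flawed labels, agent A looks stronger.
raw_a = detection_score(agent_a, ground_truth)
raw_b = detection_score(agent_b, ground_truth)

# Scored against the corrected labels, the ranking flips.
valid = ground_truth - invalid
fixed_a = detection_score(agent_a, valid)
fixed_b = detection_score(agent_b, valid)

print(f"flawed labels:    A={raw_a:.2f}  B={raw_b:.2f}")
print(f"corrected labels: A={fixed_a:.2f}  B={fixed_b:.2f}")
```

In this toy setup the agent that reproduces the invalid findings outranks the one that reports only real issues until the labels are corrected. With a small ground-truth set, a handful of bad labels is enough to change the ordering.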

OpenZeppelin emphasized that this audit does not negate AI’s potential in blockchain security: “The issue is not whether AI will change the security of smart contracts— it certainly will. The problem is whether the data and benchmarks we use to build and evaluate these tools are aligned with the standards of the contracts they aim to protect.”

Frequently Asked Questions

Q: What issues did OpenZeppelin find in their audit of EVMbench?
A: They identified two core problems. First, data contamination: EVMbench’s test vulnerabilities come from audits conducted between 2024 and mid-2025, which fall before or around most top AI models’ training cutoff dates, so the models may have “seen” the answers during pretraining. Second, at least four vulnerabilities labeled high risk are invalid, because the exploits they describe cannot actually be carried out.

Q: Why is data contamination so dangerous for AI security evaluation?
A: If AI models have already encountered the benchmark’s vulnerability reports during pretraining, they might answer questions based on memory rather than genuine vulnerability discovery ability. This invalidates the “zero-knowledge” test, making it impossible to accurately assess AI’s real security auditing capabilities against entirely new, unknown smart contracts.

Q: What is OpenZeppelin’s attitude toward AI’s future in blockchain security?
A: They believe AI will significantly change smart contract security but emphasize that the data and benchmarks used to build and evaluate these tools must meet the same standards as the contracts they aim to protect. They see the issues with EVMbench not as a rejection of AI’s potential but as an important warning for the industry.
