
OpenAI Drops EVMbench After Claude Vibe Code Disaster

2026/02/20 02:30

OpenAI launches EVMbench to test AI agents on smart contract security days after Claude Opus 4.6-assisted code triggered a $1.78M DeFi exploit.

Smart contracts protect over $100 billion in open-source crypto assets. That number alone should explain why OpenAI’s latest move is drawing serious attention. The company, working alongside crypto investment firm Paradigm, rolled out EVMbench, a benchmark designed to test how well AI agents detect, exploit, and patch high-severity smart contract vulnerabilities.

The benchmark draws from 120 curated vulnerabilities pulled across 40 audits. Most of those came from open code audit competitions. What makes it different is the scope. EVMbench tests three distinct capability modes: detect, patch, and exploit, each measured separately and graded through a Rust-based harness that replays transactions in a sandboxed local environment. No live networks involved.


The Number That Should Worry Everyone

In exploit mode, GPT-5.3-Codex via Codex CLI scored 72.2%. Six months back, GPT-5 sat at 31.9% on the same metric. That gap is not small. OpenAI confirmed the figures in its official announcement on X, framing EVMbench as both a measurement tool and a call to action for the security community.

Detect and patch scores remain lower. Agents in the detection setting sometimes identify a single vulnerability and then stop. They do not exhaust the codebase. In patch mode, the challenge is preserving full contract functionality while removing the flaw. That balance is still giving models trouble.
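To see why stopping early hurts, here is a minimal sketch that assumes detection is scored as recall over an audit's known findings. The article does not state EVMbench's actual detection metric, so the scoring rule, the function name, and the finding IDs below are all hypothetical.

```python
def detection_recall(known: set[str], reported: set[str]) -> float:
    """Fraction of the known vulnerabilities that the agent reported.

    Assumed scoring rule for illustration only; not EVMbench's documented metric.
    """
    if not known:
        raise ValueError("known vulnerability set must not be empty")
    return len(known & reported) / len(known)


# Hypothetical audit with three documented high-severity findings.
known_findings = {"H-01", "H-02", "H-03"}

# An agent that reports its first hit and then stops.
early_stop = detection_recall(known_findings, {"H-02"})

# An agent that keeps going through the rest of the codebase.
# The extra "L-07" report is not in the known set, so it does not add to recall.
exhaustive = detection_recall(known_findings, {"H-01", "H-02", "H-03", "L-07"})

print(f"early stop: {early_stop:.1%}, exhaustive: {exhaustive:.1%}")
```

Under this assumed rule, an agent that finds one real bug and quits is capped at a third of the available score, however good that single finding is.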


A $1.78M Oracle Error Nobody Caught

The backdrop to all of this matters. Security researcher evilcos flagged on X that the DeFi lending protocol Moonwell suffered a loss of approximately $1.78 million. The cause was an oracle configuration error: a price feed formula was written incorrectly, setting cbETH’s value at $1.12 instead of approximately $2,200.
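One way a gap of that size can arise is sketched below. It rests on an assumption the article does not confirm: that the $1.12 figure was a cbETH-to-ETH ratio reported as if it were a USD price, with the ETH/USD leg of the formula left out. The function names, the ETH/USD value, and the 20% tolerance are hypothetical, chosen only so the numbers line up with the figures quoted above.

```python
from decimal import Decimal


def composite_usd_price(cbeth_per_eth: Decimal, eth_usd: Decimal) -> Decimal:
    """USD price of cbETH as (cbETH/ETH ratio) * (ETH/USD price)."""
    return cbeth_per_eth * eth_usd


def within_band(price: Decimal, reference: Decimal,
                max_deviation: Decimal = Decimal("0.20")) -> bool:
    """True if price is within max_deviation (as a fraction) of reference."""
    return abs(price - reference) <= reference * max_deviation


ratio = Decimal("1.12")      # assumed cbETH/ETH ratio, matching the $1.12 figure
eth_usd = Decimal("1964")    # hypothetical ETH/USD price for illustration

correct = composite_usd_price(ratio, eth_usd)  # both legs applied
buggy = ratio                                  # ETH/USD leg omitted (assumed bug)

reference = Decimal("2200")  # rough expected price quoted in the article

print(correct, within_band(correct, reference))
print(buggy, within_band(buggy, reference))
```

A deviation check of this kind, run against an independent reference price before a feed goes live, would reject the buggy value while accepting the composite one.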

That is a low-level mistake, the kind a careful audit should catch. The GitHub pull request for proposal MIP-X43 showed commits co-authored by Claude Opus 4.6, Anthropic’s latest and most capable model at the time.

Smart contract auditor pashov posted on X, calling it possibly the first exploit tied to vibe-coded Solidity. He was careful to note that human reviewers still hold final responsibility. A security auditor signs off before anything goes on-chain. But something in that chain broke down.

What EVMbench Is Actually Built to Do

The benchmark also includes vulnerability scenarios from the security audit of the Tempo blockchain, a purpose-built L1 designed for high-throughput stablecoin payments. That extension pushes EVMbench into payment-oriented contract code, an area where OpenAI expects agentic stablecoin activity to grow.

Each exploit task runs in an isolated Anvil instance. Transactions replay deterministically. The grading setup restricts unsafe RPC methods and was red-teamed internally to stop agents from gaming results. Vulnerabilities used are historical and publicly documented.
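Restricting unsafe RPC methods could look something like the sketch below, which filters JSON-RPC requests before they reach a local node. This is not EVMbench's harness, which the article describes as Rust-based and whose rules it does not list. The blocked prefixes are an assumption based on the namespaces that development nodes such as Anvil use for state-manipulating calls (for example anvil_setBalance or evm_snapshot), and the function names are hypothetical.

```python
import json

# Assumed deny rules for illustration: namespaces that let a caller rewrite
# chain state directly instead of exploiting the contract under test.
BLOCKED_PREFIXES = ("anvil_", "evm_", "hardhat_", "debug_")


def is_allowed(method: str) -> bool:
    """Return True if the JSON-RPC method is outside every blocked namespace."""
    return not method.startswith(BLOCKED_PREFIXES)


def filter_request(raw: str) -> dict:
    """Decide whether a raw JSON-RPC request should be forwarded to the node."""
    request = json.loads(raw)
    method = str(request.get("method", ""))
    if is_allowed(method):
        return {"forward": True, "request": request}
    # -32601 is the JSON-RPC 2.0 "Method not found" error code.
    return {
        "forward": False,
        "response": {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": f"method {method} is not permitted"},
        },
    }


ok = filter_request('{"jsonrpc":"2.0","id":1,"method":"eth_call","params":[]}')
blocked = filter_request('{"jsonrpc":"2.0","id":2,"method":"anvil_setBalance","params":[]}')
print(ok["forward"], blocked["forward"])
```

Without a filter like this, an agent could simply credit itself a balance through a cheat method and pass a naive balance check without exploiting anything.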

OpenAI is also committing $10M in API credits to accelerate cyber defense, with priority given to open-source software and critical infrastructure. Its security research agent, Aardvark, is expanding into private beta. Free codebase scanning for widely used open-source projects is part of that push.

The Vibe-Coding Question With Real Stakes

Pashov’s post on X raised what many in the DeFi space had been avoiding. When AI writes production Solidity code and humans approve it fast, the review layer gets thin. The Moonwell incident showed exactly how thin it can get.

OpenAI acknowledged that cybersecurity is inherently dual-use. Its response is evidence-based. Safety training, automated monitoring, and access controls for advanced capabilities are part of that. But a 72.2% exploit score on a public benchmark is the kind of number that does not stay quiet.

EVMbench’s full task set, tooling, and evaluation code are now public. The goal is to let researchers track AI cyber capabilities as they grow, and build defenses at the same pace. Whether that pace is fast enough is the question nobody has answered yet.

The post OpenAI Drops EVMbench After Claude Vibe Code Disaster appeared first on Live Bitcoin News.

