An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks. The article An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run…
This is a summary curated by AIFuture. Read the complete article at the original source:
Read the full story on The Decoder