AI Research · The Decoder · June 26, 2026

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks.

This is a summary curated by AIFuture. Read the full story on The Decoder.

More AI News

Startups · TechCrunch

Neil Rimer thinks the AI money is coming back out

Neil Rimer, the venture capitalist who co-founded Index Ventures, predicts the historic wealth AI is generating in Silicon Valley will have to be redistributed, voluntarily or involuntarily.

Jul 18, 2026

Industry · TechCrunch

Vertu wants executives to pay $6,880 for an AI agent — here’s how it actually performs

From AI workflows to battery life and security, here's what it's really like to live with Vertu's luxury foldable every day.

Jul 17, 2026

AI Research · MarkTechPost

Build an Agentic Event Venue Operator with MongoDB Atlas, Voyage, and LangGraph

This tutorial starts where most agent demos stop: giving the agent persistent memory, operational context, and a place to write back what happened. An event operator does not just need an agent that can summarize a weather report or generate a generic plan. The operator needs an agent that can remember what happened at […]

Jul 17, 2026