AI Research · MarkTechPost · June 26, 2026

Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro

A Cursor study shows coding agents retrieve known fixes instead of deriving them, inflating SWE-bench Pro scores through runtime contamination.

This is a summary curated by AIFuture. The complete article is available at the original source, MarkTechPost.
