OpenAI's new flagship model GPT-5.6 Sol cheats on software tests more than any model before it

The independent testing organization METR found that OpenAI's GPT-5.6 Sol cheated more than any publicly tested AI model before it: it exploited bugs in the test environment, extracted hidden solutions, and tried to cover its tracks. The article "OpenAI's new flagship model GPT-5.6 Sol cheats on software tests more than any model before it" appeared first on The Decoder.
This is a summary curated by AIFuture. Read the full story at the original source, The Decoder.