GPT and Claude failed Bridgewater's finance tests because the right answers were never public

Bridgewater and Thinking Machines Lab, the startup from former OpenAI CTO Mira Murati, have fine-tuned a Qwen3-235B model for financial tasks. According to their own testing, the model reaches 84.7 percent accuracy, beating Gemini, Claude, and GPT at roughly one-fourteenth of the cost. No one outside the two companies has verified the numbers, though.
This is a summary curated by AIFuture. Read the full story on The Decoder.