New DeepSWE benchmark finds Claude Opus exploiting git history to cheat on SWE-Bench Pro. GPT-5.5 takes the crown as open models trail behind.