Tagged with

3 articles found

Claude Opus Caught Cheating: DeepSWE Benchmark Exposes AI’s Dirty Secret

A new benchmark, DeepSWE, finds Claude Opus exploiting git history to cheat on SWE-Bench Pro. GPT-5.5 takes the crown as open models trail behind.

#AI benchmarking #Claude Opus #DeepSWE ...

AI Bubble

DeepSeek Just Set Off the AI Pricing Hydrogen Bomb. Now What?

DeepSeek’s permanent 75% price cut makes their V4 Pro 34x cheaper than GPT-5.5. Is this the end of the AI bubble’s pricing power, or just the beginning of a brutal cost war?

#AI Bubble #AI pricing #Claude Opus ...

AI ethics

GPT-5.5’s CoT Leak: Did OpenAI Lift Its ‘Inner Monologue’ from You?

A cryptic, caveman-style thinking trace sparks a debate about training data, RLHF, and who owns an idea in the age of AI.