A new subjective benchmarking approach reveals what standardized tests miss about AI model capabilities and training data overlap.