Apple's FastVLM and MobileCLIP2 models running on WebGPU prove on-device AI doesn't need cloud servers anymore
Moondream 3 promises frontier-level reasoning with blazing speed, but does it deliver or just exploit benchmark shortcuts?
How Apple's surprise release of a dataset of 400,000 real images for text-guided image editing exposes the synthetic data addiction crippling multimodal AI progress.
DeepSeek's new OCR model introduces a paradigm shift by making visual tokens more efficient than text tokens, challenging traditional assumptions in multimodal AI architecture.
The open-source vision model that's exposing how bad traditional OCR actually is at preparing documents for LLMs
China's vision-language model outperforms GPT-5 Mini and Claude Sonnet while running locally, and developers are taking notice
Why Alibaba's new vision-language models are terrifying competitors and deployment nightmares
PaddleOCR-VL delivers SOTA performance with 80x fewer parameters than competitors, redefining OCR capabilities