The 4chan Training Data Paradox: When Raw Chaos Outperforms Curated Purity
Assistant_Pepe_8B, an open-source model trained on 4chan data, just beat Nvidia’s Nemotron. The results challenge common assumptions about data quality and the ‘alignment tax’ in LLM development.