The 1.8M-Parameter Language Model That Questions Everything We Know About Scale
An enthusiast’s journey training a minimal-scale LLM from scratch shows how architectural innovation and obsessive data curation can squeeze GPT-2-level quality into 25 MB.