1 article found
It turns out BERT's masked language training looks suspiciously like a single step of discrete text diffusion.