That's how BERT is trained: masked language modeling.
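
For anyone who hasn't seen it, here is a rough sketch of the masking step in plain Python. The 15% selection rate and the 80/10/10 split (replace with `[MASK]`, replace with a random token, leave unchanged) follow the original BERT recipe; everything else is an assumption for illustration. `mask_tokens`, `vocab`, and `seed` are hypothetical names, and the sketch uses plain string tokens, whereas real BERT works on WordPiece ids and never masks special tokens.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one masked-language-modeling example.

    labels[i] holds the original token at positions the model must
    predict, and None at positions that are ignored by the loss.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)          # model must recover this token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)     # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)      # 10%: keep the token unchanged
        else:
            inputs.append(tok)
            labels.append(None)         # not a prediction target
    return inputs, labels

tokens = "the cat sat on the mat".split()
inputs, labels = mask_tokens(tokens, vocab=tokens, seed=0)
print(inputs)
print(labels)
```

The model is then trained to predict the non-`None` labels from the corrupted inputs, which is the same task as filling in a missing word at inference time.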

I've used BERT for that sort of thing. It was a prototype in PyTorch, and I'm no expert on PyTorch performance. I also tried the models that succeeded BERT for masked-token prediction. My first issue was that it was slow :-( My second was that it wasn't integrated into my favorite word editor. But definitely useful.

	anuramat 3 days ago
		Did you try any diffusion models? They should be quick enough.