They mention pretraining too, which surprises me. I thought that was prohibitively expensive?
It's feasible for small models, but I thought small models were not reliable for factual information?