How do I get that loss, though, without the softmax inputs?
Do they have logits for all of the Wikipedia, etc., that they've scraped?