Remix.run Logo
Tarq0n 8 hours ago

If it works for predicting the next token in a very long stream of tokens, why not. The question is what architecture and training regimen it needs to generalize.