blackqueeriroh 4 days ago

Simple answer: there are two separate processes here, training and inference.

As you discuss, training happens over a long period of time in a (mostly) hands-off fashion once it starts.

But inference? That’s a separate process that uses the trained model to generate responses, and it’s a runtime process: send a prompt, inference runs, a response comes back. That’s a whole separate software stack, and one that is constantly being updated to improve performance.
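To make the split concrete, here's a toy sketch (not any vendor's actual stack, just an illustration): "training" runs once and freezes the weights, while "inference" is a separate per-request function that only reads them. The serving code can be updated or broken independently of the model itself.

```python
def train(corpus):
    # "Training": slow, offline, runs once. A word-count stand-in
    # for the real optimization process.
    weights = {}
    for word in corpus.split():
        weights[word] = weights.get(word, 0) + 1
    return weights  # frozen after this point


def run_inference(weights, prompt):
    # "Inference": fast, per-request, reads the frozen weights.
    # This is the separate runtime stack the comment describes;
    # changing THIS code changes responses without retraining.
    known = [w for w in prompt.split() if w in weights]
    return f"recognized {len(known)} trained tokens: {known}"


weights = train("the cat sat on the mat")
print(run_inference(weights, "the dog sat"))
```

Every request reuses the same frozen weights; only the inference code runs per prompt.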

It’s in the inference process where these issues were produced.