blackqueeriroh | 4 days ago
Simple answer: there are two separate processes here, training and inference. As you discuss, training happens over a long period in a (mostly) hands-off fashion once it starts. But inference? That’s a separate process that uses the trained model to generate responses, and it’s a runtime process: send a prompt, inference runs, a response comes back. That’s a whole separate software stack, and one that is constantly being updated to improve performance. It’s in the inference process that these issues arose.
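
To make the distinction concrete, here’s a minimal sketch of one inference round trip as seen from the client side. The endpoint URL, payload fields, and response shape are all hypothetical, not any particular vendor’s API:

    import requests

    # One inference round trip: the serving stack loads the already-trained
    # (frozen) model weights and runs a forward pass; no training happens here.
    resp = requests.post(
        "https://example.com/v1/generate",  # hypothetical endpoint
        json={"prompt": "Why is the sky blue?", "max_tokens": 128},
        timeout=30,
    )
    print(resp.json()["text"])  # assumed response field name

Everything behind that endpoint (batching, caching, kernels, sampling code) can change between two requests even though the model weights themselves haven’t been retrained.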