| ▲ | tipsytoad 2 hours ago | |
1.0 is actually pretty arbitrary and way too high as a general rule. Something like 0.3 is a more sensible default.
| ▲ | 317070 an hour ago | parent | next [-] | |
If RL was used to train the model, the model will have been trained on its own sequences. Those will have been generated with a temperature of 1.0. They must be, otherwise you would get a premature collapse or explosion of your entropy if the temperature were lower or higher, respectively. After that RL step, you want to stick to the RL distribution, and so keep a temperature of 1.0. Other temperatures will drive the model out-of-distribution. That is why the sampling step for agents or thinking LLMs is usually kept at a temperature of 1.0.
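To make the entropy point concrete, here is a rough sketch of how temperature rescales a next-token distribution. The logits are made up for illustration and the function names are my own, not from any particular library. It only shows the direction of the effect on a single step, not the RL training dynamics.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical logits for a four-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: entropy={entropy(probs):.3f} nats")
```

Below 1.0 the distribution sharpens and entropy drops; above 1.0 it flattens and entropy rises. Only 1.0 samples from the distribution the model actually outputs.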
| ▲ | zipy124 2 hours ago | parent | prev | next [-] | |
It really depends on the application, does it not? I'm not an LLM guy, but for creative tasks like storytelling wouldn't you usually want a higher temperature? Happy to gain insight from anyone with experience here :)
| ▲ | embedding-shape 2 hours ago | parent | prev | next [-] | |
It heavily depends on the model architecture and the implementation, though. I don't think you can say which values are better than others without first specifying those; otherwise it's straight-up guessing, ironically.
| ▲ | nullc an hour ago | parent | prev [-] | |
If you use a model in a configuration far from where it was RLed, you get no warranty. (You also get no warranty the other way, however.)