|
| ▲ | postalcoder a day ago |
| gpt-5* reasoning models do not have an adjustable temperature parameter. It seems like we may have a different understanding of these models. And, like the other commenter said, the temperature may change the distribution of the next token, but the reasoning tends to reel those things in, which is why reasoning models are notoriously poor at creative writing. You are free to run these experiments for yourself. Perhaps, with your deeper understanding, you'll shed new light on this behavior. |
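A minimal sketch of how one might run that experiment with the OpenAI Python client. The model name, prompt, and whatever error the API returns are assumptions here; the only point is to check whether a temperature parameter is accepted at all for a reasoning model.

```python
# Sketch: does a reasoning model accept an adjustable temperature?
# Assumes the `openai` package and an OPENAI_API_KEY in the environment;
# the model name below is a placeholder for whichever reasoning model you use.
from openai import OpenAI

client = OpenAI()

try:
    resp = client.chat.completions.create(
        model="gpt-5",                # placeholder reasoning model
        temperature=0.2,              # the parameter in question
        messages=[{"role": "user", "content": "Grade this PR: ..."}],
    )
    print("temperature accepted:", resp.choices[0].message.content)
except Exception as exc:
    # If the claim above holds, the request should be rejected here instead.
    print("temperature rejected:", exc)
```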
|
| ▲ | swid a day ago |
| It surely is different. If you set the temp to 0 and run the test with slightly different wording, there is no guarantee at all that the scores will be consistent. Conversely, an LLM can be consistent even at a high temp: it can give the same PR the same grade while choosing different words around it. The tokens are still drawn from the distribution, so if one grade token carries most of the probability, that grade will be chosen nearly every time regardless of the temp you set. |
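To make that last point concrete, here is a toy sketch. The logits are made up and stand in for the model's scores over the single grade token; they are not from any real model. It shows that when one grade dominates the distribution, sampling picks it at essentially any temperature:

```python
import math
import random

def sample_grade(logits, temperature, rng):
    """Softmax over logits/temperature, then sample one grade token."""
    if temperature == 0:
        # Greedy decoding: always the highest-logit token.
        return max(logits, key=logits.get)
    scaled = {g: v / temperature for g, v in logits.items()}
    z = max(scaled.values())
    exps = {g: math.exp(v - z) for g, v in scaled.items()}
    total = sum(exps.values())
    probs = {g: e / total for g, e in exps.items()}
    return rng.choices(list(probs), weights=list(probs.values()))[0]

# Hypothetical logits for the one grade token: "B" strongly dominates.
peaked = {"A": 2.0, "B": 8.0, "C": 1.0, "D": 0.0}

rng = random.Random(0)
for t in (0.0, 0.7, 1.5):
    grades = [sample_grade(peaked, t, rng) for _ in range(1000)]
    print(f"temp={t}: B chosen {grades.count('B') / 10:.1f}% of the time")
```

With these made-up logits, "B" is picked 100% of the time at temp 0 and still roughly 97% of the time at temp 1.5, which is the sense in which a sufficiently peaked grade distribution makes the temperature setting mostly irrelevant.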
| |
| ▲ | smt88 a day ago |
| I think you're restating (in a longer and more accurate way) what I understood the original criticism to be: that this grading test isn't testing what it's supposed to, partly because a grade is too few tokens. The model could "assess" the code qualitatively the same way and still give slightly different letter grades. |
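A toy illustration of that point, using made-up numbers standing in for the model's scores over the grade token: if the probability mass is split across adjacent grades, repeated runs of the exact same "assessment" can print different letters, and only temp-0 greedy decoding pins one down.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution over grade tokens."""
    z = max(v / temperature for v in logits.values())
    exps = {g: math.exp(v / temperature - z) for g, v in logits.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}

# Hypothetical scores: the model's qualitative view is fixed, but no single
# letter grade dominates the one-token answer.
split = {"A-": 1.9, "B+": 2.1, "B": 1.7}
probs = softmax(split)
print(probs)  # roughly {'A-': 0.33, 'B+': 0.40, 'B': 0.27}

rng = random.Random(1)
# Five reruns of the "same" grading: expect a mix of letters, not one answer.
print(rng.choices(list(probs), weights=list(probs.values()), k=5))
```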
|
|
| ▲ | stevenhuang a day ago |
| The irony is strong here. |