vintermann 14 hours ago
They test specific prompts at temperature 0. It is of course possible that all their test prompts were lucky, but even then, shouldn't you see an immediate drop followed by a flat or increasing line? Also, from what I understand of the article, it's not a difficult task but an easily machine-checkable one, i.e. whether the output conforms to a specific format.
lostmsu 5 hours ago
With T=0 on the same model you should get exactly the same output text. If they are not getting that, other environmental factors invalidate the test result.
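To illustrate why T=0 is expected to be repeatable, here is a minimal sketch of the sampling step only. The function name `sample_token` and the logit values are made up for illustration and are not taken from the article or from any particular library. It treats temperature 0 as plain argmax (greedy decoding), which is the usual convention, and it says nothing about the hardware or serving setup, which is where the "environmental factors" would come in.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits (hypothetical helper)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        # No random numbers are consumed, so the result cannot vary.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise sample from the temperature-scaled softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.2, 3.4, 0.5, 3.1]  # made-up scores for four tokens

# At T=0 the choice is the same regardless of the random seed.
greedy_picks = {sample_token(logits, 0, random.Random(seed)) for seed in range(100)}

# At T=1 different seeds can pick different tokens.
sampled_picks = {sample_token(logits, 1.0, random.Random(seed)) for seed in range(100)}
```

Under these assumptions `greedy_picks` only ever contains the index of the largest logit, while `sampled_picks` will generally contain several indices.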
Spivak 12 hours ago
If it were random luck, wouldn't you expect about half the answers to be better? Assuming the OP isn't lying, I don't think there's much room for luck when you get every question wrong on a T/F test.
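A rough way to put a number on that intuition, assuming each question is an independent coin flip with a 50% chance of being right. The question counts below are hypothetical; the thread does not say how many prompts were tested.

```python
def prob_all_wrong(n_questions, p_correct=0.5):
    """Chance of missing every question if each one is an independent guess."""
    return (1 - p_correct) ** n_questions


# Hypothetical test sizes, purely for illustration.
for n in (5, 10, 20):
    print(f"{n} questions: {prob_all_wrong(n):.7f}")
```

The probability halves with every additional question, so even a modest number of all-wrong answers is hard to attribute to chance under this simple model.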