simianwords | 3 days ago
This paper is complete nonsense. The specific prompt they used doesn't specify reasoning effort, which defaults to none.
> **Lower reasoning effort:** The reasoning.effort parameter controls how many reasoning tokens the model generates before producing a response. Earlier reasoning models like o3 supported only low, medium, and high: low favored speed and fewer tokens, while high favored more thorough reasoning. Starting with GPT-5.2, the lowest setting is none to provide lower-latency interactions. This is the default setting in GPT-5.2 and newer models. If you need more thinking, slowly increase to medium and experiment with results. With reasoning effort set to none, prompting is important. To improve the model's reasoning quality, even with the default settings, encourage it to "think" or outline its steps before answering.

So in the paper, the model very likely used no reasoning tokens (it only uses them if you ask for them specifically in the prompt). What is the point of such a paper? We already know that reasoning tokens are necessary.

Edit: I actually ran the prompt and this was the response
}

So zero reasoning_tokens were used. This whole paper is kinda useless and misleading. Did this get peer reviewed or something?
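For anyone who wants to reproduce this, here is a minimal sketch of how such a request might be constructed. This just builds the request payload; the model name "gpt-5.2" and the effort values are taken from the docs quoted above and are assumptions, so verify them against the current API reference before relying on this:

```python
# Sketch: building a Responses-style API request that explicitly sets
# reasoning effort, instead of relying on the default ("none" per the
# docs quoted above). The model identifier "gpt-5.2" is an assumption
# from the quoted docs and may not match the real API string.

def build_request(prompt: str, effort: str = "medium") -> dict:
    # Per the quoted docs, reasoning.effort is one of these values.
    allowed = {"none", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "gpt-5.2",
        "reasoning": {"effort": effort},
        "input": prompt,
    }

# Example: force actual reasoning rather than the zero-token default.
request = build_request("Solve the task from the paper.", effort="high")
```

With the actual SDK you would pass these fields as keyword arguments to the create call and then check the usage details of the response to confirm that the reasoning-token count is nonzero, which is exactly the check being made above.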
Chobilet | 3 days ago | parent
I'm sure this comment was made in good faith, but most researchers would rightfully understand these intricacies, and this is likely intentional (as noted in the paper). At a quick glance, I can't say whether or not the paper has been peer reviewed (though that's unlikely, or still in process, given how recently it was published). In general, you'd find published papers also listed in a specific journal or conference (i.e. not just a preprint archive, which anyone can submit to). Additionally, many of us researching LLMs are curious to understand the boundaries and limitations of what these models are capable of. This paper isn't really meant as any sort of "gotcha"; rather, it can serve as a possible basis for future work. Though with a caveat: I'm still digesting the paper myself.