godelski | 5 days ago
Try a few times and it'll happen; I don't think it took me more than 3 tries on any platform. To convince me it is "reasoning", it needs to get the answer right consistently. Most of my attempts were actually about getting it to show its work. But pay close attention: GPT got the answer right several times, but through incorrect calculations. Check the "thinking" and see if it does an 11-9=2 calculation somewhere; I saw this in >50% of attempts. You should be able to reproduce my results in under 5 minutes (a minimal test harness follows the footnotes).

Forgive my annoyance, but we've been hearing the same argument you're making for years[0,1,2,3,4]. We're talking about models that have been described as operating at "PhD level" since the previous generation. People keep saying "but I get the right answer" or "if you use X model it'll get it right" while missing the entire point. It never mattered whether it got the answer right once; what matters is that it can do so consistently, and how it gets the answer matters if you want to claim reasoning. There is still no evidence that LLMs can perform even simple math consistently, despite years of such claims[5].

[0] https://news.ycombinator.com/item?id=34113657

[1] https://news.ycombinator.com/item?id=36288834

[2] https://news.ycombinator.com/item?id=36089362

[3] https://news.ycombinator.com/item?id=37825219

[4] https://news.ycombinator.com/item?id=37825059

[5] Don't let your eyes trick you: not all those green squares are 100%. You'll also see plenty of "look, X model got it right!" replies to something that was tested multiple times. https://x.com/yuntiandeng/status/1889704768135905332
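If you want to run the consistency check yourself, here's a minimal sketch of the kind of repeated trial I mean. It assumes the official OpenAI Python client; the model name and the 9.11-vs-9.9 prompt are placeholders, so swap in whatever platform and question you're actually testing:

    # Repeatability test: ask the same simple-math question N times
    # and tally how often the model answers correctly. A single
    # correct answer proves nothing; "reasoning" should mean ~N/N.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    PROMPT = "Which number is larger, 9.11 or 9.9? Answer with just the number."
    N = 20

    correct = 0
    for _ in range(N):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; test whichever model you like
            messages=[{"role": "user", "content": PROMPT}],
        )
        answer = resp.choices[0].message.content.strip()
        # "9.9" is not a substring of "9.11", so this catches the right answer
        # while rejecting responses that name the wrong number.
        if "9.9" in answer and "9.11" not in answer:
            correct += 1

    print(f"{correct}/{N} correct")

Even this only scores the final answer; per the point above, you'd still have to read the visible reasoning traces by hand to catch runs where a right answer came out of a wrong calculation like 11-9=2.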