▲ | keeda 6 days ago | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> ... LLMs aren't very good at logical reasonsing. I'm curious about what experiences led you to that conclusion. IME, LLMs are very good at the type of logical reasoning required for most programming tasks. E.g. I only have to say something like "find the entries with the lowest X and highest Y that have a common Z from these N lists / maps / tables / files / etc." and it spits out mostly correct code instantly. I then review it and for any involved logic, rely on tests (also AI-generated) for correctness, where I find myself reviewing and tweaking the test cases much more than the business logic. But then I do all that for all code anyway, including my own. So just starting off with a fully-fleshed out chunk of code, which typically looks like what I'd pictured in my head, is a huge load off my cognitive shoulders. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | pron 6 days ago | parent | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The experience was that I once asked an LLM to write a simple function and it produced something very wrong that nothing with good reasoning abilities should ever do. Of course, a drunk or very tired human could have done the same mistake, but they would have at least told me that they were impaired and unsure of their work. I agree that most of the time it does most simple tasks mostly right, but that's not good enough to truly "offload" my mental effort. Again, I'm not saying it's not useful, but more than working with a junior developer it's like working with a junior developer who may or may not be drunk or tired and doesn't tell you. But mostly my point is that LLMs seem to do logical reasoning worse than other things they do better, such as generating prose or summarising a document. Of course, even then you can't trust them yet. > But then I do all that for all code anyway, including my own I don't, at least not constantly. I review other people's code only towards the very end of a project, and in between I trust that they tell me about any pertinent challenge or insight, precisely so that I can focus on other things unless they draw my attention to something I need to think about. I still think that working with a coding assistant is interesting and even exciting, but the experience of not being able to trust anything, for me at least, is unlike working with another person or with a tool and doesn't yet allow me to focus on other things. Maybe with more practice I could learn to work with something I can't trust at all. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | foobarbecue 6 days ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In your example, you didn't ask the LLM to do any logic. You asked it to translate your logic into code. Asking an LLM to do logic would be saying something like: "I have a row of a million light switches. They all start off. I start at the beginning and flip on every fourth one. Then I flip on every eighth one, then sixteen, and all the powers of two until I'm over a million. Now I do the same for the powers of three, then four, then five, and so on. How many light switches are on at the end? Do not use any external coding tools for this; use your own reasoning." Note that the prompt itself is intentionally ambiguous -- a human getting this question should say "I don't understand why you started with every fourth instead of every second. Are you skipping the first integer of every power series or just when the exponent is two?" When I asked GPT5 to do it, it didn't care about that; instead it complimented me on my "crisp statement of the problem," roughly described a similar problem, and gave a belivable but incorrect answer 270,961 . I then asked it to write python code to simulate my question. It got the code correct, and said "If you run this, you’ll see it matches the 270,961 result I gave earlier." except, that was a hallucination. Running the code actually produced 252711. I guess it went with 270,961 because that was a lexically similar answer to some lexically similar problems in the training data. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | __MatrixMan__ 6 days ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I'm not who you're replying to but I had a scenario where I needed to notice that a command had completed (exit code received) but keep listening for any output that was still buffered and only stop processing tokens after it had been quiet for a little bit. Trying to get Claude to do this without introducing a deadlock and without exiting too early and leaving valuable output in the pipe was hellish. It's very good at some kinds of reasoning and very bad at others. There's not much it's mediocre at. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | jpfromlondon 5 days ago | parent | prev [-] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
https://arstechnica.com/ai/2025/08/researchers-find-llms-are... |