heresalexandria 4 hours ago
Something a lot of folks struggling with these systems don't get is that how you instruct and manage them matters a great deal - just because they're capable doesn't mean they're mind readers. Most of the skepticism I encounter on this front comes from a lack of proper direction, a process that skips planning and review before execution, and too little attention to evaluation and feedback loops. If you asked the smartest person in the world to YOLO a task with the sort of instruction the average denier uses to evaluate an LLM, you likely wouldn't get back what you were expecting either. And if you're evaluating on subpar models/tools, you shouldn't be surprised to get subpar results.
lilbigdoot 4 hours ago
I asked Qwen 3.7 pro to create a C# project that takes a string and reverses it, with a single-file WASM target. It spun its wheels for over 30 minutes and produced nothing. I use LLMs all the time to help me diagnose bugs and work through my designs, but again and again I am super unimpressed by their coding abilities. I can see how, with a proper harness, they probably do a decent job at certain tasks, but on almost everything I try, they flail.
gmm1990 4 hours ago
This seems to be a very generic/common response to any AI critique. It kind of reinforces my point that there are a lot of situations where the appropriate harness isn’t some agent set to ultra-high thinking mode. Chat mode gives the better response and answers the question more quickly.