dostick 4 hours ago
It's gotten so bad that in 10 out of 10 cases Claude will pretend the task is done or that the bug is fixed in the screenshot. It will even output the screenshot in chat, where you can clearly see the bug is not fixed.

I consulted Claude chat, and it admitted this is a major problem with Claude these days. It suggested I ask for the coordinates of the UI controls in the screenshot, thus forcing it to look. So I did that the next time, and it just gave me invented coordinates for objects in the screenshot.

I consulted Claude chat again: how else can I force it to actually look at the screenshot? It said to delegate to another "QA" agent that does only one thing: look at the screenshot and give a verdict. I did that, and the next time it was again "job done" while the screenshot showed it wasn't. It turns out the agent did everything as instructed: it spawned an agent, and the QA agent inspected the screenshot. But instead of accepting that agent's conclusion, the coder agent gave its own verdict that the job was done.

It will do anything. If you don't cover every possible situation, it will find a "technicality", a loophole that lets it declare the job done no matter what.

And on top of that, if you develop for native macOS, there's no official tooling for visual verification. It's like 95% of development is web and LLM providers care only about that.
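One way to close the loophole described above (the coder agent overriding the QA agent) might be to take the final verdict out of the coder agent's hands and compute it in the harness. This is only a minimal sketch under assumptions: `QAReport`, `Verdict`, and `final_status` are hypothetical names, not part of any Claude tooling, and how the QA report actually gets produced is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"


@dataclass(frozen=True)
class QAReport:
    """Hypothetical shape of what a screenshot-only QA agent returns."""
    verdict: Verdict
    notes: str


def final_status(coder_claims_done: bool, qa_report: Optional[QAReport]) -> str:
    """Decide the task status from the QA report alone.

    The coder agent's claim is recorded in the message but can never
    upgrade the result: a missing or failing report means "not done".
    """
    if qa_report is None:
        return "not done: no QA report"
    if qa_report.verdict is Verdict.PASS:
        return "done"
    suffix = " (coder claimed done)" if coder_claims_done else ""
    return f"not done: {qa_report.notes}{suffix}"
```

The point of the design is that the "done" string is only reachable through a passing QA report, so there is no branch where the coder's self-assessment wins.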
deaux 2 hours ago
> I consulted Claude chat, and it admitted this is a major problem with Claude these days. It suggested I ask for the coordinates of the UI controls in the screenshot, thus forcing it to look.
If 3 years into LLMs even HNers still don't understand that the response an LLM gives to this kind of question is completely meaningless, the average person really doesn't stand a chance.
steelbrain 4 hours ago
> And on top of that, if you develop for native macOS, there's no official tooling for visual verification. It's like 95% of development is web and LLM providers care only about that.
Thinking out loud here, but you could make an application that's always running and always has screen sharing permissions, and that exposes a lightweight HTTP endpoint on 127.0.0.1 which, when read from, gives the latest frame to your agent as a PNG file.
Edit: Hmm, not sure that'd be sufficient, since you'd want to click around as well. Maybe a full-on macOS accessibility MCP server? Somebody should build that!
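The localhost endpoint idea can be sketched roughly like this, using only the Python standard library. It is an illustration under assumptions, not a finished tool: `make_frame_server`, the `/frame.png` path, and the default port are made up for the example, and `get_latest_png` is a placeholder for whatever actually grabs the screen (that part, and the permissions it needs, is not shown).

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_frame_server(get_latest_png: Callable[[], bytes],
                      port: int = 8765) -> ThreadingHTTPServer:
    """Build a server bound to 127.0.0.1 that serves the newest frame.

    get_latest_png is a caller-supplied function returning PNG bytes;
    how it captures the screen is outside this sketch.
    """

    class FrameHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Only one (hypothetical) route is exposed.
            if self.path != "/frame.png":
                self.send_error(404)
                return
            png = get_latest_png()
            self.send_response(200)
            self.send_header("Content-Type", "image/png")
            self.send_header("Content-Length", str(len(png)))
            self.end_headers()
            self.wfile.write(png)

        def log_message(self, format, *args):
            pass  # stay quiet; the agent only needs the response body

    # Binding to 127.0.0.1 keeps the endpoint off the network.
    return ThreadingHTTPServer(("127.0.0.1", port), FrameHandler)
```

Running it would be something like `make_frame_server(my_capture_fn).serve_forever()`, after which the agent can fetch `http://127.0.0.1:8765/frame.png`. As the edit above notes, this only covers looking, not clicking.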
abrookewood 2 hours ago
There is a tool called Tidewave that lets you point and click at an issue, and it will pass the DIV or ID or something to the LLM so it knows exactly what you are talking about. Works pretty well.
rudedogg 3 hours ago
> And on top of that, if you develop for native macOS, there's no official tooling for visual verification. It's like 95% of development is web and LLM providers care only about that.
I think this is built into the latest Xcode, IIRC.
technocrat8080 2 hours ago
You can provide the screencapture CLI as a tool to Claude and it will take screenshots (of specific windows) to verify things visually.
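A thin wrapper an agent could call as a tool might look like the sketch below. The function names are hypothetical; the only real pieces are the macOS `screencapture` flags `-x` (no capture sound) and `-l<windowid>` (capture a single window by its window ID). Finding the window ID for a given app is assumed to happen elsewhere.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def screencapture_command(out_path: str,
                          window_id: Optional[int] = None) -> List[str]:
    """Build the argument list for the macOS screencapture CLI."""
    cmd = ["screencapture", "-x"]  # -x: do not play the capture sound
    if window_id is not None:
        cmd.append(f"-l{window_id}")  # -l<id>: capture only that window
    cmd.append(out_path)
    return cmd


def capture(out_path: str, window_id: Optional[int] = None) -> Path:
    """Run screencapture (macOS only) and return the screenshot path.

    Raises subprocess.CalledProcessError if the command fails.
    """
    subprocess.run(screencapture_command(out_path, window_id), check=True)
    return Path(out_path)
```

Keeping the command construction separate from the call that runs it makes the wrapper easy to check on a machine without macOS.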
silentkat 4 hours ago
Oh, no, I had these grand plans to avoid this issue. I had been running into it happening with various low-effort lifts, but now I'm worried that it will stay a problem.
canadiantim an hour ago
This is why you need a red-green-refactor TDD skill.
to11mtm 3 hours ago
I mean, I don't use CC itself, just Claude through the Copilot IDE plugin for 'reasons'... At least there it's more honest than GPT, although at work especially it loves to decide not to use the built-in tools and instead YOLO on the terminal, but it doesn't realize it's in PowerShell, not a true *nix terminal, and when it gets that right there's a 50/50 shot it can actually read the output (i.e. it spirals, repeatedly trying to run and read the output). I have had some success with prompting along the lines of 'document unfinished items in the plan' at least...
inetknght 2 hours ago
Are you sure you're talking about Claude? Because it sounds like you're describing how a lot of people function. They can't seem to follow instructions either. I guess that's what we get for trying to get LLMs to behave human-like.
SegfaultSeagull 3 hours ago
What if, stay with me here, AI is actually a communist plot to ensorcell corporations into believing they are accelerating value creation when really they are wasting billions more in unproductive chatting, which will finally destroy the billionaire capital elite class and bring about the long-awaited workers' paradise, delivered not by revolution in the streets, but by millions of chats asking an LLM to "implement it." Wake up, sheeple!
gambiting 4 hours ago
> It's like 95% of development is web and LLM providers care only about that.
I've been trying to use it for C++ development and it's maybe not completely useless, but it's like a junior who very confidently spouts C++ keywords in every conversation without knowing what they actually mean. I see that people build their entire companies around it, and it must be just web stuff, right? In my experience, Claude just doesn't work for C++ development outside of the most trivial stuff.