throwaway13337 2 days ago
I don't think it was rigged. Having Claude run the browser and then take a screenshot to debug gives similar results. It's why doing so is useless, even though it would be so very nice if it worked. Somewhere in the pipeline, they get lazy or ahead of themselves and just interpret what they want to in the picture they see. They want to interpret something as working and complete. I can imagine it's related to the same issue with LLMs pretending tests work when they don't. They're RL trained for a goal state, and sometimes pretending they reached the goal works. It wasn't the wifi - just genAI doing what it does.
dfedbeef 2 days ago
For tiny stuff, they are incredible auto-complete tools. But they are basically cover bands. They can do things that have been done to death already. They're good for what they're good for. I wouldn't have bet the farm on them.
8note 2 days ago
I have a lot of difficulty getting Claude to understand arrows in pictures. I tried giving it flowcharts, and it fails hard.