xyzsparetimexyz 4 hours ago
I doubt they'd do a very good job of debugging a GPU crash, or visual noise caused by forgotten synchronization, or odd-looking shadows. Maybe for some things you could set it up so that the screen output is livestreamed back into the agent, but I highly doubt that anyone is doing that for agents like this yet.
throwup238 2 hours ago
> Maybe for some things you could set it up so that the screen output is livestreamed back into the agent, but I highly doubt that anyone is doing that for agents like this yet

What do you mean by streaming? LLMs aren't yet able to consume a live video feed, but people have been feeding them screenshots from Playwright and desktop apps for years (Anthropic even released its Computer Use feature based on this). Gemini has the best visual intelligence, but all three of the major models have supported image input for a while. I don't think it would help with fixing subtle problems in shadows, but it can fix other GUI bugs using visual feedback.
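For concreteness, here's a minimal sketch of that screenshot-feedback loop, assuming the Python Playwright and Anthropic SDKs; the URL, prompt, and model id are placeholders:

    import base64
    from playwright.sync_api import sync_playwright
    from anthropic import Anthropic

    client = Anthropic()  # expects ANTHROPIC_API_KEY in the environment

    # Render the app under test and grab a screenshot (URL is a placeholder).
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("http://localhost:3000")
        png = page.screenshot(full_page=True)
        browser.close()

    # Feed the screenshot back to the model as an image block.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": base64.b64encode(png).decode()}},
                {"type": "text",
                 "text": "Does this page render as expected? List any visual bugs."},
            ],
        }],
    )
    print(response.content[0].text)

Any vision-capable model works here; the loop is the same either way: render, screenshot, ask what looks wrong, apply the suggested fix, repeat.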
jjmarr 2 hours ago
I am a GPU programmer (on the compute side), and the biggest challenge is the lack of tooling. For host-side code the agent can throw in a bunch of logging statements and usually printf its way to success. For device-side code there isn't a good way to output debugging info in a textual format the agent can understand; graphical trace viewers are great for humans, not so great for AI right now (a sketch of one workaround follows below).

On the other hand, Cline's harness can interact with my website and click on stuff until the bugs are gone.
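One workaround is to copy device buffers back to the host after each kernel launch and log summary statistics as plain text the agent can read. A minimal sketch, assuming CuPy; the kernel and the statistics chosen are purely illustrative:

    import cupy as cp

    # Toy kernel standing in for real device code (illustrative only).
    add_one = cp.RawKernel(r'''
    extern "C" __global__
    void add_one(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] += 1.0f;
    }
    ''', 'add_one')

    n = 1 << 20
    x = cp.zeros(n, dtype=cp.float32)
    add_one(((n + 255) // 256,), (256,), (x, cp.int32(n)))
    cp.cuda.Device().synchronize()  # surface async launch/runtime errors here

    # Poor man's device-side trace: pull the buffer back and log text
    # an agent can actually read, instead of a graphical trace.
    host = cp.asnumpy(x)
    print(f"x: min={host.min():.3f} max={host.max():.3f} "
          f"mean={host.mean():.3f} nans={int(cp.isnan(x).sum())}")

It's crude compared to a real trace viewer, but min/max/mean/NaN counts in a log line are something a text-only agent can reason about.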