OsrsNeedsf2P 8 hours ago

Our app is a desktop integration and last year we added a local API that could be hit to read and interact with the UI. This unlocked the same thing the author is talking about - the LLM can do real QA - but it's an example of how it can be done even in non-web environments.

Edit: I even have a skill called release-test that does manual QA for every bug we've ever had reported. It takes about 10 hours to run but I execute it inside a VM overnight so I don't care.

8note 3 hours ago

i got me a windows mcp setup running in a sandbox, so it can look at screenshots, see the UIA, and click things either by coordinate or by UIA.

i let it run overnight against a windows app i was working on, and that got it from mostly not working to mostly working.

the loop was

1. look at the code and specs to come up with tests
2. predict the result
3. try it
4. compare the prediction against the result
5. file bug report, or call it a success

and then switch to bug fixing, and go back around again. Worked really well in Gemini CLI with the giant context window.
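
For anyone who wants the shape of that loop in code, here is a rough sketch in Python. It is not the actual setup: `TestCase`, `BugReport`, `run_qa_pass`, `qa_fix_loop`, and the `plan_tests` / `fix_bugs` callbacks are all made-up names, and in practice the planning, predicting, and fixing steps would be done by the LLM agent rather than by plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class TestCase:
    name: str
    predicted: str             # step 2: the result expected from the code and specs
    action: Callable[[], str]  # step 3: drives the app and returns what was observed


@dataclass
class BugReport:
    test: str
    predicted: str
    observed: str


def run_qa_pass(tests: Iterable[TestCase]) -> List[BugReport]:
    """Steps 3 to 5: try each test, compare with the prediction, file a bug on mismatch."""
    bugs = []
    for test in tests:
        observed = test.action()
        if observed != test.predicted:
            bugs.append(BugReport(test.name, test.predicted, observed))
    return bugs


def qa_fix_loop(
    plan_tests: Callable[[], Iterable[TestCase]],  # step 1 (hypothetical hook)
    fix_bugs: Callable[[List[BugReport]], None],   # bug-fixing phase (hypothetical hook)
    max_rounds: int = 10,
) -> Optional[int]:
    """Alternate QA passes and bug fixing; return the first clean round, or None."""
    for round_number in range(1, max_rounds + 1):
        bugs = run_qa_pass(plan_tests())
        if not bugs:
            return round_number
        fix_bugs(bugs)
    return None
```

The one design choice worth copying is that the prediction is written down before the test runs, so a mismatch is an explicit diff between expected and observed instead of a judgment made after the fact.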