okwasniewski 2 hours ago

Unfortunately, in our experience, tests don’t scale as well as code. First of all, static tests are very brittle: you rely on selectors, need wait times, and can’t really test a lot of dynamic content (think AI chats/interactions). Then there’s all the infrastructure around them: solving captchas, handling auth, handling email OTP (each of our agents has access to its own inbox), and handling video recording and screenshots.

To get stable results, we do a lot of harness engineering: we inject trajectories from previous test runs to keep behavior consistent, and splitting each test into smaller steps helps prevent context overload and decision fatigue.
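The comment doesn't describe the harness internals, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. A test is split into small steps with stable ids, and each step's prompt includes the most recent successful trajectory recorded for that same step. All names (`Step`, `Trajectory`, `split_test`, `build_step_prompt`) and the prompt format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One small unit of a larger test (hypothetical), e.g. 'log in'."""
    step_id: str
    instruction: str


@dataclass
class Trajectory:
    """A recorded run of one step (hypothetical): actions taken and pass/fail."""
    step_id: str
    actions: list[str]
    passed: bool


def split_test(test_id: str, instructions: list[str]) -> list[Step]:
    # Stable per-step ids let later runs find past trajectories for "the same step".
    return [Step(f"{test_id}:{i}", text) for i, text in enumerate(instructions)]


def build_step_prompt(step: Step, history: list[Trajectory], max_examples: int = 1) -> str:
    # Keep only successful past runs of this exact step; history is assumed oldest-first.
    matches = [t for t in history if t.step_id == step.step_id and t.passed]
    # Guard max_examples == 0, since matches[-0:] would return the whole list.
    examples = matches[-max_examples:] if max_examples > 0 else []
    lines = [f"Task: {step.instruction}"]
    for t in examples:
        lines.append("A previous successful run did:")
        lines.extend(f"- {action}" for action in t.actions)
    # The prompt covers one step only, which keeps the context small.
    return "\n".join(lines)


if __name__ == "__main__":
    steps = split_test("checkout", ["Log in with the test account", "Add an item to the cart"])
    history = [
        Trajectory("checkout:0", ["fill email", "fill password", "click 'Sign in'"], True),
        Trajectory("checkout:0", ["click 'Sign in'"], False),
    ]
    for step in steps:
        print(build_step_prompt(step, history))
        print("---")
```

Failed runs are filtered out here on the assumption that only known-good trajectories are useful as guidance. A real harness would also need a policy for when a stored trajectory goes stale after the app under test changes.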

Regarding test case management, our customers have used our CLI to migrate their existing test cases from whatever system they were using before.

ai_slop_hater 5 minutes ago

Why can't you test AI chats?