tonyww 2 hours ago

It’s mostly the former: there’s a small set of generic checks/primitives, and we choose which ones to apply per step.

The binding between “task/step” and “what to verify” can come from either the user (explicit assertions) or the planner/executor proposing a post-condition (e.g. “after clicking checkout, URL contains /checkout and a checkout button exists”).

But the verifier itself is not an AI; by design, it’s predicate-only.
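
To make "predicate-only" concrete, here is a rough sketch of the shape, not the actual implementation. Every name in it (`PageState`, `url_contains`, `element_exists`, `verify`) is hypothetical, and the page snapshot is simplified to a URL plus a set of visible element labels. The point is that each check is a plain boolean function over observed state, and the per-step binding is just a named selection of those functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class PageState:
    """Hypothetical snapshot of what was observed after a step."""
    url: str
    visible_labels: FrozenSet[str]


# A predicate is a deterministic function of the snapshot; no model calls.
Predicate = Callable[[PageState], bool]


def url_contains(fragment: str) -> Predicate:
    """Generic primitive: the current URL contains the given fragment."""
    return lambda state: fragment in state.url


def element_exists(label: str) -> Predicate:
    """Generic primitive: an element with this label is visible."""
    return lambda state: label in state.visible_labels


def verify(state: PageState, checks: Dict[str, Predicate]) -> List[str]:
    """Return the names of failed checks; an empty list means the step passed."""
    return [name for name, check in checks.items() if not check(state)]


# Post-condition for the "click checkout" step. It could be written by the
# user or proposed by the planner; the verifier only evaluates it.
checkout_checks: Dict[str, Predicate] = {
    "url contains /checkout": url_contains("/checkout"),
    "checkout button exists": element_exists("checkout button"),
}

after_click = PageState(
    url="https://shop.example/checkout",
    visible_labels=frozenset({"checkout button"}),
)
still_on_cart = PageState(
    url="https://shop.example/cart",
    visible_labels=frozenset({"checkout button"}),
)

passed = verify(after_click, checkout_checks)
failed = verify(still_on_cart, checkout_checks)
```

Here `passed` comes back empty and `failed` names the URL check, so a failure points at the specific predicate that did not hold rather than at a judgment call.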