bisonbear 2 days ago

Very cool, interested to read more once you post! FWIW I've been building eval infra that does something adjacent: replaying real repo work against different agent configs and measuring agent quality along several dimensions (pass/fail, but also human intent alignment, code review, etc.). If you want to compare notes on the harness design, or if an independent eval of lat vs. no-lat on quickjs would be useful, happy to chat :)