Remix clone Hacker News

	sally_glance 7 hours ago
		This is the hard part. Especially with larger initiatives, it takes quite a bit of work to evaluate what the current combination of harness + LLM is good at. Running experiments yourself is cumbersome and expensive, and public benchmarks are flawed. I wish providers would release at least a set of blessed example trajectories alongside new models. As it is, we're stuck with "yeah, it seems this works well for bootstrapping a Next.js UI"...