Remix clone Hacker News

new | show | ask | jobs Github

	▲	flowerbreeze 4 hours ago
		They haven't made the chart very clear, but it seems it has configurable passes and at 2 passes it's better than Haiku and Sonnet and at 16 passes starts closing in on Opus although it's not quite there, while consistently being less expensive than Sonnet.
	▲	ainch an hour ago \| parent \| next [-]
		pass@k means that you run the model k times and give it a pass if any of the answers is correct. I guess Lean is one of the few use cases where pass@k actually makes sense, since you can automatically validate correctness.
	▲	andai 4 hours ago \| parent \| prev [-]
		Oh my bad. I'm not sure how that works in practice. Do you just keep running it until the tests pass? I guess with formal verification you can run it as many times as you need, right?