	▲	schipperai 2 hours ago
		Cognition did a good job documenting their approach [1]. TL;DR: they worked with OSS project maintainers to build the tasks, and they score models on whether a PR is mergeable. Every task is graded by a human researcher. SoTA models still have hill-climbing to do, which raises the bar and inspires confidence. I'd say it's legit. [1]: https://x.com/cognition/status/2064061031912288715