Remix clone Hacker News


	▲	qwm 3 days ago
		My favorite benchmark for LLMs and agents is to have them port a medium-complexity library to another programming language. If a model can do that well, it's pretty capable of doing real tasks. So far, I always have to spend a lot of time fixing errors. There are also often deep issues that aren't obvious until you start using the ported library.
	▲	Rastonbury 3 days ago
		Comments on here often criticise ports as easy for LLMs because there's a lot of training data and the tests are all there, so a port is not as complex as real-world tasks.