Remix clone Hacker News

	jjoonathan 2 days ago
		Yeah, the heavily distilled models are very bad with hallucinations. I think hallucinating is how they cover for their reduced capacity: a 1B model will happily attempt the same complex coding tasks as a 1T model, but the hard parts get pushed into an API call that doesn't exist, lol.