	paradite 3 days ago
		Hey, I like your roast of benchmarks. I also publish my own evals of new models, using coding tasks that I curated myself, run without tools and rated by humans against rubrics. I'd love for you to check them out and share your thoughts. A recent example, on GPT-5: https://eval.16x.engineer/blog/gpt-5-coding-evaluation-under... All results: https://eval.16x.engineer/evals/coding