Remix clone Hacker News

maxdo a day ago
So the benchmark is: two models with different harnesses produced very different results. The GLM game was completely broken; the Opus game looked OK at first glance but also had bugs. Different models at different costs produced different, imperfect results. How is that “close”? :)

Also, on costs: GLM burns more tokens on average than Opus. GPT-5.5, surprisingly, burns fewer.