	▲	gf000 3 hours ago
		Not the parent poster, but I did get the wrong answer even with reasoning turned on.
	▲	tezza 2 hours ago
		Thank you all! We needed more data points. Comparing one-shot results is a foolish way to evaluate a statistical process like LLM answers; we need multiple samples. For https://generative-ai.review I take at least three samples of output. This often yields very different results even from the same query, e.g.: https://generative-ai.review/2025/11/gpt-image-1-mini-vs-gpt...