thatwasunusual 9 hours ago

> It's not necessarily the best benchmark, it's a popular one, probably because it's funny.

> Yes it's like the wine glass thing.

No, it's not! That's part of my point: the wine glass scenario is a _realistic_ scenario. The pelican riding a bike is not. That's a _huge_ difference. Why should we measure intelligence (...) in terms of something realistic versus something unrealistic? I just don't get it.
Fnoord 5 hours ago | parent

> the wine glass scenario is a _realistic_ scenario

It is unrealistic, because if you go to a restaurant you don't get served a glass like that. Filling a wine glass to the brim is frowned upon (alcohol is a drug, after all) and impractical (wine stains are annoying). A pelican riding a bike, on the other hand, is a realistic scenario because of children's TV. For example, a 1950s animation/comic involving a pelican [1].

[1] https://en.wikipedia.org/wiki/The_Adventures_of_Paddy_the_Pe...
vikramkr 6 hours ago | parent

If the thing we're measuring is the ability to write code, visually reason, and extrapolate to out-of-sample prompts, then why shouldn't we evaluate it by asking it to write code that generates a strange image it wouldn't have seen in its training data?