consumer451 | 5 days ago
It's related to the history of Simon Willison[0] having used this as a benchmark on many models.[1] I believe this model's output is noticeably superior... but yeah, people do tend to get hyperbolic when new stuff happens in their domain of interest.

[0] https://news.ycombinator.com/user?id=simonw

[1] https://www.google.com/search?q=simon+willison+pelican+ridin...
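If you want to try it yourself, the benchmark is basically a single prompt. A minimal sketch using the OpenAI Python client (the model name and output path are placeholders, and this isn't Simon's actual harness):

    from openai import OpenAI

    # Assumes OPENAI_API_KEY is set in the environment; "gpt-4o" is a placeholder model name.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user",
             "content": "Generate an SVG of a pelican riding a bicycle"},
        ],
    )

    # Note: many models wrap the SVG in a markdown code fence, so the output may need stripping.
    svg = response.choices[0].message.content
    with open("pelican.svg", "w") as f:
        f.write(svg)  # open the file in a browser to judge the result

The prompt is the whole test; the comparison across models is just eyeballing which output looks most like a pelican riding a bicycle.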
littlestymaar | 5 days ago
> I believe this model's output is noticeably superior

Sure, but at the same time Qwen3-30B-A3-2507 is also doing much better than most older models, even the bigger and otherwise more capable ones, so I don't know how much of this is actual progress and how much is a new round of benchmaxxing.
ruszki | 4 days ago
And it's a much better-known benchmark nowadays, so data scientists can overfit their models to it even more, especially since LLMs are already famous for overfitting. So I wouldn't trust any results on this specific test anymore.