claudeIsDown 5 hours ago

I would love to see a more descriptive review from simonw instead of just SVG generations.

simonw 4 hours ago | parent | next [-]

I try! https://simonwillison.net/2026/Jun/9/claude-fable-5/ and https://simonwillison.net/2026/Jun/11/fable-is-relentlessly-...

lossolo 4 hours ago | parent | prev [-]

He is not an ML researcher or engineer; he is a passionate AI enthusiast blogger. He mostly does SVGs and other low-effort checks (sometimes with major flaws, as people have pointed out a few times in the HN comments). Properly evaluating the model across all fronts requires a deep understanding of LLMs, how they work, the trade-offs behind new architectures, and the relevant research papers. It also takes a lot of time to build a proper evaluation framework, so you can't just vibe code that if you want something solid.

HPMOR 3 hours ago | parent [-]

He created Django, what do you mean he's not an engineer? Also 'low-effort??' his posts are extremely in-depth, clearly very thought through with a significant amount of time and energy. Additionally he does perform multifaceted checks across LLMs in many of his other blog posts.

shwaj 2 hours ago | parent | next [-]

> ML researcher or engineer

The charitable reading is that they meant “ML researcher or ML engineer” with the latter meaning, I guess, an engineer who works on developing LLMs not just using them.

lossolo 2 hours ago | parent [-]

Yes, thank you.

lossolo 2 hours ago | parent | prev [-]

> He created Django, what do you mean he's not an engineer?

I specifically said that he is not an ML engineer (emphasis on ML), so I'm not sure what Python web frameworks have to do with anything.

> Also 'low-effort??' his posts are extremely in-depth, clearly very thought through with a significant amount of time and energy

And yes, low effort. Pelican was low effort, his Fable test was low effort, his HN filter etc. Read the discussion in the comments under the Fable test, it's not just my opinion. There was also another example a few months ago. You can search for it, I don't keep track of these things.

I discussed this with him directly after he called himself an "ML expert" in comments.

This is a classic case of the Gell-Mann amnesia effect. I read ML papers and work with ML, but to people outside the industry, his writing can look "extremely in-depth" even though it really isn't. People I work with have the same opinion.

> clearly very thought through with a significant amount of time and energy. Additionally he does perform multifaceted checks across LLMs in many of his other blog posts.

I have never seen an article by him about any model that I would describe that way.

And the most revealing sign that he is not an expert is the type of questions he asks and the mistakes he sometimes makes in the comments here. They show why he is not capable of doing any technically in-depth evaluation (at least with his current knowledge level).

If you actually want to learn something as a layperson, read articles written by ML PhDs like Sebastian Raschka or watch Stephen from Welch Labs, etc.; these are aimed at a general audience.

algoth1 2 hours ago | parent [-]

We at HN: https://xkcd.com/2501/. Basically, I think you might be considering low-effort what’s actually an attempt at simplifying - which is arguably higher effort

lossolo 2 hours ago | parent [-]

> you might be considering low-effort what’s actually an attempt at simplifying - which is arguably higher effort

I'm not saying that simplifying complex topics is low-effort; good simplification can obviously require a lot of work, and I fully agree here.

What I meant is more that some of these tests feel methodologically sloppy: they are too shallow, miss important technical context, do not control for enough variables, etc. Yet the conclusions are sometimes presented, let's just say, too strongly, as I don't want to be too harsh.

algoth1 an hour ago | parent [-]

Oh, I see. That’s entirely correct. I think the pelican test is more of a meme at this point, similar to Ethan’s Otter on an airplane for video models