sqlite-utils 4.0rc2, mostly written by Claude Fable (for about $149.25) (simonwillison.net)
30 points by ognyankulev 2 hours ago | 18 comments
keizo 3 minutes ago | parent | next [-]

Glad to see others dual wielding: “I used to think that the idea of having one model review the work of another was somewhat absurd—it felt weirdly superstitious. The problem is it really does work”

dreadnip an hour ago | parent | prev | next [-]

The problem I have with this workflow is that the models are still too eager to please. If I ask it to scan a release and note possible issues, it absolutely will find issues. If I keep running the same prompt, it will keep finding issues. I’ve spammed GitHub PR reviews and it just keeps finding (or inventing?) new issues. There is never a “Nothing found, good to go!”. I have to keep reminding myself that the model will always give me what I ask for, regardless of the reality/truth.

baq 40 minutes ago | parent | next [-]

You didn’t do it enough. They stop finding bugs eventually. Also, different models can find different bugs (though they do find the same ones, too, which is good and expected). For best results you want to run multi-model reviews in loops.

If you had multiple people look at your PRs multiple times on different days, the results would be very similar.

MallocVoidstar 34 minutes ago | parent [-]

No, depending on the complexity of the issue, models can get into loops, where they go "this is definitely an issue and must be fixed", and then the resulting fixed code gets "this is definitely an issue and must be fixed", and then the resulting fixed code has the original 'issue'.
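
A rough sketch of how you could at least detect that kind of flip-flop: hash every version of the code and stop as soon as a "fix" produces a version you have already seen. Everything here is hypothetical, `fix` stands in for whatever asks the model to review and patch the code, and `fix_until_stable` and `flip_flop` are made-up names for illustration.

```python
import hashlib
from typing import Callable, Optional, Tuple


def fix_until_stable(
    code: str,
    fix: Callable[[str], Optional[str]],  # hypothetical: revised code, or None if no issue found
    max_rounds: int = 10,                 # hard cap in case the cycle is never detected
) -> Tuple[str, str]:
    """Apply fixes until the reviewer is happy, a cycle is detected, or we give up."""
    seen = {hashlib.sha256(code.encode()).hexdigest()}
    for _ in range(max_rounds):
        revised = fix(code)
        if revised is None:
            return code, "clean"
        digest = hashlib.sha256(revised.encode()).hexdigest()
        if digest in seen:
            # The "fix" recreated an earlier version (or changed nothing): stop here.
            return code, "cycle"
        seen.add(digest)
        code = revised
    return code, "gave_up"


def flip_flop(code: str) -> str:
    """Fake reviewer that keeps undoing its own previous fix."""
    return code.replace("a", "b") if "a" in code else code.replace("b", "a")


final_code, status = fix_until_stable("x = a", flip_flop)
```

This only catches exact repeats; a model that rewords the same change each round would slip past it, so the `max_rounds` cap is still needed.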

starquake 7 minutes ago | parent | prev | next [-]

I use Claude Code, and one of the steps in my workflow is to do a review loop until no issues are found, and it never loops endlessly. So my experience is entirely different. Even if I say "fix all issues", so not only the critical issues.

onion2k 31 minutes ago | parent | prev | next [-]

> If I keep running the same prompt, it will keep finding issues.

I've had the same experience, but whenever I've reviewed what it finds it's basically right. It's pedantic, and a lot of the problems aren't things I really care about, but they definitely are real problems.

I'm not sure you can blame the AI for always finding problems if a) you asked it to, and b) there are problems to find.

embedding-shape 36 minutes ago | parent | prev | next [-]

> There is never a “Nothing found, good to go!”.

Like when you do recursive programming, have you tried providing more/better stop conditions? If you literally just say "Continue until there are no more issues" then it'll do just that, but if you scope it better, like "Only mention issues related to X, Y or that lead to Z" and so on, you'll get less noise and more focus on issues that actually matter (to you).
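
To make the recursion analogy concrete, here is a minimal sketch of a review loop with explicit stop conditions: a scope filter, a "nothing new" base case, and a hard cap on passes. All names are hypothetical, `review` stands in for whatever call asks the model for findings, and `Finding`, `review_until_done` and `fake_review` are made up for illustration, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Finding:
    topic: str    # e.g. "security", "style"
    summary: str


def review_until_done(
    review: Callable[[int], List[Finding]],  # hypothetical: one review pass by the model
    in_scope: Set[str],                      # topics you actually care about
    max_passes: int = 5,                     # hard cap, like a recursion depth limit
) -> List[Finding]:
    seen: Set[Finding] = set()
    kept: List[Finding] = []
    for pass_no in range(max_passes):
        new = [f for f in review(pass_no) if f.topic in in_scope and f not in seen]
        if not new:  # base case: nothing new that is in scope
            break
        seen.update(new)
        kept.extend(new)
    return kept


def fake_review(pass_no: int) -> List[Finding]:
    """Canned findings standing in for a model that always finds something."""
    canned = [
        [Finding("security", "SQL built with string formatting"), Finding("style", "comment unclear")],
        [Finding("security", "SQL built with string formatting"), Finding("style", "rename variable")],
        [Finding("style", "another nit")],
    ]
    return canned[pass_no] if pass_no < len(canned) else []


result = review_until_done(fake_review, in_scope={"security"})
```

With the scope limited to "security", the loop stops on the second pass because the only in-scope finding is a repeat, even though the fake reviewer keeps producing style nits.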

9dev 42 minutes ago | parent | prev | next [-]

There is a point of diminishing returns though; the issues suggested will get speculative, or point out comment unclarity, or "defense in depth". But I agree it’s somewhat annoying to rarely get clear pushback in terms of "no, this looks good enough to me, release it".

threatripper an hour ago | parent | prev | next [-]

You get the same result if you pay humans a good sum of money to find issues.

nvme0n1p1 39 minutes ago | parent [-]

Definitely not. I've never seen a human trapped in that kind of infinite loop. Humans know that if they don't stop at the end of the day, they don't get to go home to their wife, and if they don't finalize their list of issues, they never get their contract paid out.

embedding-shape 33 minutes ago | parent [-]

Pay people per hour of work and even if there is no actual work, people will definitely find a way of spending hours doing things. If you've worked with contractors/outsourced roles before, you'll have seen this happen from time to time.

Tiberium 43 minutes ago | parent | prev [-]

I think this was true with older models, but at least with GPT 5.5 it can genuinely tell you "no issues found" after a few passes of finding real issues.

Tiberium an hour ago | parent | prev | next [-]

The cost in the title would only apply if this were raw API usage, but it was included in a subscription, so it's a small part of the $200 plan:

> I upgraded to the Claude Max $200/month plan (I was previously on $100/month) to increase my Fable allowance for the remaining time until the July 7th Fablepocalypse, when even Claude Max subscribers will have to pay full API cost for the model.

I really wonder if Anthropic will stick with their decision to keep Fable on extra usage credits until they "get more compute", especially in the light of GPT 5.6 very likely coming out next week (it's confirmed to have the exact same pricing as GPT 5.5)

embedding-shape 32 minutes ago | parent | next [-]

> especially in the light of GPT 5.6 very likely coming out next week

Finally have an explanation why GPT 5.5 xhigh felt dumber and dumber these last few weeks, always the same thing when a new model release is about to come out...

toxik 7 minutes ago | parent [-]

Opus has been extremely stupid recently, reckon that's because Fable needs to look appealing?

andy_ppp 38 minutes ago | parent | prev [-]

This is to prevent Chinese labs distilling Claude again, right? And free advertising again?

hnbad 41 minutes ago | parent | prev [-]

Fun fact: AI-written works don't have copyright (in the EU at least), and the level of prompting many people engage in doesn't suffice to create a copyrightable "work". Software licenses require you to actually be able to grant a license using rights you hold on a work. So not only are many AI-generated "works" not actually protected by copyright, but by selling licenses you're actually in breach of contract law and may end up owing the licensee software you don't have.

vasco 39 minutes ago | parent [-]

And nothing happened and zero people got in trouble over it.

- Narrator