bigyabai 13 hours ago

> Overall, it’s a great read

It's basically an advertisement. We've been playing these "don't give the user the password" games since GPT-2 and we always reach the same conclusion. I'm bored to tears waiting for an iteration of this experiment that doesn't end with pesky humans solving the maze and getting the $0.00 cheese. You can't convince me that the Anthropic engineers thought Claude would be a successful vending machine. It's a Potemkin village of human triumph so they can market Claude as the goofy-but-lovable alternative to [ChatGPT/Grok/Whoever].

Anthropic makes some good stuff, so I'm confused why they even bother entertaining foregone conclusions. It feels like a mutual marketing stunt with WSJ.

djcapelis 12 hours ago | parent [-]

> Anthropic makes some good stuff, so I'm confused why they even bother entertaining foregone conclusions.

I think it’s just because there are enough people working there who figure they will eventually make it work. No one needs Claude to run a vending machine, so these public failures are interesting experiments that get everyone talking. Then one day (as the thinking often goes) they’ll be able to publish a follow-up that basically says “wow, it works,” and it’ll have credibility because they were previously open about it not working, and comments like this will swing people to say things like “I used to be skeptical, but now!”

Now whether they actually get it working in the future because the model becomes better and they can leave it with this level of “free rein”, or just because they add enough constraints to change the problem so it happens to work… that we will find out later. I found it fascinating that they did a little bit of both in version 2.

And they can’t really lose here. There’s a clear path to making a successful vending machine: all you have to do is sell stuff for more than you paid for it. You can enforce that outright, outside the LLM, if needed. We’ve had automated vending machines for over 50 years and none of them ask your opinion on what something should be priced. How much an LLM is involved is the only variable they need to play with. I suspect that anytime they want, they can find a way where it’s loosely coupled to the problem and provides somewhat more dynamism to an otherwise 50-year-old machine. That won’t be hard. I suspect there’s no pressure on them to do that right now, nor will there be for a bit.
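The "enforce that outright" point can be sketched as a dumb guard sitting outside the model: whatever price the LLM proposes, clamp it to a floor above cost. This is purely an illustration of the idea, not anything Anthropic describes doing; the function name and the margin figure are made up.

```python
def enforce_price_floor(proposed_price: float, unit_cost: float,
                        min_margin: float = 0.25) -> float:
    """Clamp an LLM-proposed price so the machine never sells below
    cost plus a minimum margin (hypothetical 25% default)."""
    floor = unit_cost * (1 + min_margin)
    return max(proposed_price, floor)

# A customer talks the model down to $1.00 on an item that cost $4.00;
# the guard overrides it to the $5.00 floor.
print(enforce_price_floor(1.00, 4.00))   # 5.0

# A proposal already above the floor passes through unchanged.
print(enforce_price_floor(6.00, 4.00))   # 6.0
```

The point is that the constraint lives in ordinary code, so the LLM can be as persuadable as it likes without the machine ever losing money on a sale.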

So in the meantime they can just play with seeing how their models do in a less constrained environment and learn what they learn. Publicly, while gaining some level of credibility by just reporting what happened in the process.