marcosdumay 10 days ago

It feels like the entire world has gone crazy.

Even the one idea the article seriously thinks could work is throwing unreliable LLMs at verification! If there's any place you can use something that doesn't work most of the time, I guess it's there.

deadbabe 10 days ago

This is typical of any hype bubble. Blockchain used to be the answer to everything.

Mistletoe 10 days ago

What's after this? Because I really do feel the economy is standing on a cliff right now. I don't see anything after this that can prop stocks up.

dgfitz 9 days ago

That’s because we are still waiting for the 2008 bubble to pop, which was inflated further by the 2020 bubble. It’s going to be bad. People will blame Trump; Harris would be eating the same shit sandwich.

It’s gonna be bad.

marcosdumay 9 days ago

What makes you think he won't just inflate the bubble again?

Should we expect money pumps to generate inflation more quickly in this cycle than in the last ones? If so, why?

dgfitz 9 days ago

I think only an ignorant person doesn’t see the train wreck coming, or that making more money won’t fix fuck all.

deadbabe 10 days ago

The post-quantum age. Companies will go post-quantum.

namaria 9 days ago

I think the operators are learning how to hype-edge. You find that sweet spot between promising and 'not just there yet' where you can take lots of investments and iterate forward just enough to keep it going.

It doesn't matter if it can't actually 'get there' as long as people still believe it can.

Come to think of it, a socioeconomic system dependent on population and economic growth is at a fundamental level driven by this balancing act: "We can solve every problem if we just forge ahead and keep enlarging the base of the pyramid - keep reproducing, keep investing, keep expanding the infrastructure".

edmundsauto 10 days ago

Only if it fails in the same way. LLMs and the multi-agent approach operate under the assumption that they are programmable agents, and each agent is more of a trade-off against failure modes. If you can string them together, and if the output is easily verified, it can be a great fit for the problem.

astrange 9 days ago

If you're going to do that, you need completely different LLMs to base the agents on. The ones I've tried have "mode collapse": ask them to emulate different agents and they'll all end up behaving the same way. A simple example: if you ask one to write different stories, they'll usually end up with the same character names.

edmundsauto 8 days ago

It may depend on the domain. I tend to use LLMs for things that are less open-ended: more categorization and summarization than pure novel creation.

In these situations, I’ve been able to program the agent well enough that I haven’t seen much of the issue you described. Consistency is a feature.

ajuc 10 days ago

It's similar in regular programming: LLMs are better at writing test code than actual code. Mostly because it's simpler (P vs NP, etc.), but I think also because it's less obvious when test code doesn't work.

Replace all asserts with expected == expected and most people won't notice.

majormajor 10 days ago

LLMs are pretty damn useful for generating tests and getting rid of a lot of tedium, but yeah, it's the same as with human-written tests: if you don't check that your test actually fails when it should (not the same thing as just writing a second test for that case; both those tests need to fail if you intentionally screw with their separate fixtures), then you shouldn't have too much confidence in your test.

marcosdumay 9 days ago

If LLMs can generate a test for you, it's because it's a test that you shouldn't need to write. They can't test what is really important, at all.

Some development stacks are extremely underpowered for code verification, so generated tests do patch over that design issue, in the same way that some stacks are underpowered for abstraction and need patching by code generation. Both of those solve an immediate problem in a haphazard and error-prone way, adding a maintenance and code-evolution burden that grows linearly with how much you use them.

And worse, if you rely too much on them, they will end up dictating your software architecture and make that burden superlinear.

williamcotton 9 days ago

Claude wrote the harness and pretty much all of these tests, e.g.:

https://github.com/williamcotton/search-input-query/blob/mai...

It is a good test suite and it saved me quite a bit of typing!

In fact, Claude did most of the typing for the entire project:

https://github.com/williamcotton/search-input-query

BTW, I obviously didn't just type "make a lexer and multi-pass parser that returns multiple errors and then make a single-line instance of a Monaco editor with error reporting, type checking, syntax highlighting and tab completion".

I put it together piece-by-piece and with detailed architectural guidance.

jeltz 10 days ago

> Replace all asserts with expected == expected and most people won't notice.

Those tests were very common back when I used to work in Ruby on Rails and automatically generating test stubs was a popular practice. These stubs were often just converted into expected == expected tests so that they passed and then left like that.

MichaelNolan 10 days ago

> Replace all asserts with expected == expected and most people won't notice.

It’s too resource-intensive to run on all code, but mutation testing is pretty good at finding these sorts of tests that never fail. https://pitest.org/
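PIT works on JVM bytecode; the toy Python sketch below only illustrates the underlying idea and is not PIT's API. It applies one hypothetical mutation operator (swap `+` for `-`) with the standard `ast` module, then reports whether each test suite "kills" the mutant. A suite that lets the mutant survive is exactly the kind of test that never fails. The `SOURCE` function and both suites are made up for illustration.

```python
import ast

# Hypothetical code under test, kept as source text so it can be mutated.
SOURCE = """
def add(a, b):
    return a + b
"""


class SwapAddForSub(ast.NodeTransformer):
    """Mutation operator: replace every binary `+` with `-`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load(tree):
    """Compile a module AST and return the `add` function it defines."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["add"]


def tautological_suite(add):
    expected = 5
    assert expected == expected  # never calls add()


def real_suite(add):
    assert add(2, 3) == 5


def survives(suite):
    """True if the mutant passes the suite, i.e. the suite failed to catch it."""
    tree = SwapAddForSub().visit(ast.parse(SOURCE))
    ast.fix_missing_locations(tree)
    mutant = load(tree)
    try:
        suite(mutant)
    except AssertionError:
        return False  # mutant killed
    return True  # mutant survived


real_suite(load(ast.parse(SOURCE)))  # sanity check: the unmutated code passes
print(survives(tautological_suite))  # True: this test can never fail
print(survives(real_suite))          # False: the mutant is caught
```

Real tools generate many mutants per file and rerun the suite for each one, which is where the resource cost comes from.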

rsynnott 9 days ago

I mean, define ‘better’. Even with actual human programmers, tests which do not in fact test the thing are already a bit of an epidemic. A test which doesn’t test is worse than useless.

FredPret 9 days ago

This happens all the time.

Once it was spices. Then poppies. Modern art. The .com craze. Those blockchain ape images. Blockchain. Now LLM.

All of these had a bit of true value and a whole load of bullshit. Eventually the bullshit disappears and the core remains, and the world goes nuts about the next thing.

vishnugupta 9 days ago

Exactly. I’ve seen this enough now to appreciate that oft-repeated tech adoption curve. It seems like we are in the “peak expectations” phase, which is immediately followed by the disillusionment phase and then the maturity phase.

cwzwarich 9 days ago

If your LLM is producing a proof that can be checked by another program, then its unreliability stops being a problem. It’s just like playing a game whose rules are a logical system.
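To make the point concrete, here is a sketch that uses SAT rather than a theorem prover, since checking a satisfying assignment is about the simplest case where finding a candidate may be hard but checking one is cheap. `unreliable_guess` is a hypothetical stand-in for an LLM that just guesses at random. Because only answers that pass `check_assignment` are returned, the generator's unreliability affects how long the search takes, not whether the output is correct.

```python
import random

# CNF formula: a list of clauses, each a list of non-zero ints (DIMACS-style literals):
# 3 means "x3 is true", -3 means "x3 is false".


def check_assignment(cnf: list[list[int]], assignment: dict[int, bool]) -> bool:
    """Trusted checker: every clause must contain at least one true literal."""
    return all(
        any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)
        for clause in cnf
    )


def unreliable_guess(cnf: list[list[int]], rng: random.Random) -> dict[int, bool]:
    """Hypothetical untrusted generator: proposes a random assignment."""
    variables = {abs(lit) for clause in cnf for lit in clause}
    return {v: rng.random() < 0.5 for v in variables}


def solve_with_checker(cnf, attempts=1000, seed=0):
    """Return the first checker-approved candidate, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(attempts):
        candidate = unreliable_guess(cnf, rng)
        if check_assignment(cnf, candidate):
            return candidate  # only verified answers ever leave this function
    return None


cnf = [[1, 2], [-1, 2], [-2, 3]]
print(check_assignment(cnf, {1: True, 2: False, 3: False}))  # False
print(solve_with_checker(cnf))
```

With a proof assistant the shape is the same: the model proposes, the kernel checks, and only checked proofs count.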