bloppe 2 days ago

I agree that the apocalyptic messaging about mythos is eye-rolling, but the article's thesis that "the moat is the system, not the model" is strange, because the point is that the model is the whole system. A little Bash loop that just tells the model to "look at this file" for every file is clearly not a "moat" of a system.
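For concreteness, the "little Bash loop" being dismissed above might look something like this. The `ask_model` function is a hypothetical stub standing in for a real model CLI call; everything here is an illustrative sketch, not anything from the article:

```shell
#!/bin/sh
# Sketch of the trivial per-file loop: tell the model to
# "look at this file" for every file in the current directory.
# `ask_model` is a placeholder; a real setup would pipe the file
# to an actual LLM command-line tool.
ask_model() {
  printf 'looked at: %s\n' "$1"  # stub output in place of a model response
}

for f in ./*; do
  [ -f "$f" ] || continue  # skip directories
  ask_model "$f"
done
```

The point stands either way: the loop itself is a few lines of glue, so whatever value it produces comes from the model, not the loop.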

windexh8er 2 days ago | parent | next [-]

Is it, though? In a way, yes. But look at where the focus of LLMs has gone: agentic frameworks. Yet all of the models are continually compared against benchmarks that can easily be gamed by the model itself [0].

There's no great way to gauge the quality or efficacy of something non-deterministic that you can't trust, at least not currently. And I wouldn't be surprised if the providers have known their LLMs could be cheating for a while now.

On one hand they're saying these models would be apocalyptic if everyone had them, and on the other hand they're showcasing how their models are sweeping the floor on benchmarks. So which is it? Personally, I don't believe any of these companies at this point, especially when they make claims that are non-public and wrapped in NDAs that benefit their bottom line.

[0] https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/

airstrike 2 days ago | parent | prev [-]

While I agree this is true of coding, there are other domains and paradigms in which the loop is more involved than a Bash loop.

Realizing this fact explains:

1. why software development is the first domain to get disrupted by AI

2. why other easily loopable domains like contract review are also quite easy to deploy AI into, which is why you see all these "AI for Law" startups doing essentially the same thing

3. why domains that are not easily loopable are much harder to figure out, leading people to believe AI can't be useful there, when in fact it's a failure of the application layer