iugtmkbdfil834 3 hours ago

<< This shows that the author is not very curious, because it's easy to take the worst examples from the cheapest models and extrapolate.

I find this line of reasoning compelling. Curiosity (and trying to break things) will get you a lot of fun. The issue I find is that people don't even try to break things (in interesting ways), but instead repeat common failure modes as gospel rather than as observed experiments. The fun thing is that even the strawberry issue tells us something real about the limitations of LLMs. In other words, that error is useful...

<< Their status, self worth and everything is attached to it. Anything that disrupts this routine is obviously worth opposing.

There is some of that for sure. Of all days, today I had my manager argue against use of AI for a use case that would affect his buddy's workflow. I let it go, because I am not sure what it actually means, but some resistance is based on 'what we have always done'.

simianwords 3 hours ago | parent [-]

> The fun thing is that even the strawberry issue tells us more about the limitations of llms than not. In other words, that error is useful

That's a fair way to look at it - failure modes tell us something useful about the underlying system. In this case it tells us something about how LLMs work at the token level.
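For context, a minimal sketch of the token-level issue: an LLM sees subword tokens rather than individual characters, so "how many r's in strawberry?" asks about units the model never directly observes. The split below is a hypothetical illustration, not the output of any particular tokenizer.

```python
# Hypothetical subword split -- real tokenizers vary by model,
# but the point is the model operates on chunks, not letters.
hypothetical_tokens = ["str", "aw", "berry"]

# Reassemble the word and count characters directly -- trivial
# once you work on the string itself rather than on token IDs.
word = "".join(hypothetical_tokens)
r_count = word.count("r")

print(word, r_count)  # strawberry 3
```

The failure is not that counting is hard; it's that the model's input representation hides the characters being counted.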

But if you go a step beyond that, you realise that this problem has been solved at a _general_ level by the reasoning models. GPT o1 was internally named Strawberry, as far as I remember. That would be a nice discussion to have, instead of a shallow dismissal of AI as a technology over a failure mode that has pretty much been solved.

What really has not been solved is long context and continual learning (and world model stuff but I don't find that interesting).

iugtmkbdfil834 2 hours ago | parent [-]

<< What really has not been solved is long context and continual learning (and world model stuff but I don't find that interesting).

I wonder about that. In a sense, the solution seems simple... allow more context. One of the issues, based on the progression of ChatGPT models, was that too much context allowed for much easier jailbreaks, and the fear most corporates have over that makes me question the service. Don't get me wrong, I am not one of those people missing 4o for telling me "I love you". I do miss its now-nerfed ability to work across all conversations. The working context has been made narrower now. For a paid sub, that kind of limitation is annoying.

My point is, I know there are some interesting trade-offs to be made (mostly because I am navigating those on a local inference machine), but with all those data centers one would think providers have enough compute to solve that... if they so chose.