tombert 4 hours ago

I find it a bit odd that people are acting like this stuff is an abject failure because it's not perfect yet.

Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.

Yes, people have probably been deploying it in spots where it's not quite ready, but it's myopic to act like it's "not going all that well" when it's pretty clear that it actually is going pretty well, just that we need to work out the kinks. New technology is always buggy for a while, and eventually it becomes boring.

maccard 4 hours ago | parent | next [-]

> Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.

Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding. Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, antigravity getting stuck in loops and burning credits. We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.

It can one-shot a web frontend, just like v0 could in 2023. But that's still about all I've seen it do well.

Aurornis 4 hours ago | parent | next [-]

You’re doing exactly the thing that the parent commenter pointed out: Complaining that they’re not perfect yet as if that’s damning evidence of failure.

We all know LLMs get stuck. We know they hallucinate. We know they get things wrong. We know they get stuck in loops.

There are two types of people: The first group learns to work within these limits and adapts to using them where they're helpful, while writing the code themselves when they're not.

The second group gets frustrated every time it doesn’t one-shot their prompt and declares it all a big farce. Meanwhile the rest of us are out here having fun with these tools, however limited they are.

maccard 2 hours ago | parent [-]

Someone else said this perfectly farther down:

> The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.

As I've said, I use LLMs, and I use tools that are assisted by LLMs. They help. But they don't work anywhere near as reliably as people talk about them working. And that hasn't changed in the 18 months since I first prompted v0 to make me a website.

nonethewiser 19 minutes ago | parent | prev | next [-]

>Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding

I haven't heard that at all. I hear about models that come out and are a bit better. And other people saying they suck.

>Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, antigravity getting stuck in loops and burning credits.

Is it bringing you any value? I find it speeds things up a LOT.

tombert 4 hours ago | parent | prev | next [-]

Sure, but think about what it's replacing.

If you hire a human, it will cost you thousands a week. Humans will also fail at basic tasks and get stuck in useless loops, and you still have to pay them for all that time.

For that matter, even if I'm not hiring anyone, I will still get stuck on projects and burn through the finite number of hours I have on this planet trying to figure stuff out and being wrong for a lot of it.

It's not perfect yet, but these coding models, in my mind, have gotten pretty good if you're specific about the requirements, and even if they misfire fairly often, they can still be useful.

I've made this analogy before, but to me they're like really eager-to-please interns; not necessarily perfect, and there's even a fairly high risk you'll have to redo a lot of their work, but they can still be useful.

falloutx 4 hours ago | parent | next [-]

I am an AI-skeptic, but I would agree this looks impressive from certain angles, especially if you're an early startup (maybe) or you are very high up the chain and just want to focus on cutting costs. On the other hand, if you are about to be unemployed, this is less impressive. Can it replace a human? I would say no, it still has a long way to go, but a good salesman can convince executives that it does, and that's all that matters.

xp84 3 hours ago | parent | next [-]

> On the other hand, if you are about to be unemployed, this is less impressive

> salesman can convince executives that it does

I tend to think that reality will temper this trend as the results develop. Replacing 10 engineers with one engineer using Cursor will result in a vast velocity hit. Replacing 5 engineers with 5 "agents" assigned to autonomously implement features will result in a mess eventually. (With current technology -- I have no idea what even 2027 AI will do). At that point those unemployed engineers will find their phones ringing off the hook to come and clean up the mess.

It's not unlike what happens in many situations where companies fire teams and offshore the whole thing to a team of average developers 180 degrees of longitude away who don't have any domain knowledge of the business or connections to the stakeholders. The pendulum swings back in the other direction.

tombert 4 hours ago | parent | prev [-]

I just think Jevons paradox [1]/Gustafson's Law [2] kind of applies here.

Maybe I shouldn't have used the word "replaced", as I don't really think it's actually going to "replace" people long term. I think it's likely to just lead to higher output as these tools get better and better.

[1] https://en.wikipedia.org/wiki/Jevons_paradox

[2] https://en.wikipedia.org/wiki/Gustafson%27s_law

falloutx 3 hours ago | parent [-]

Not you, but the word "replaced" is being used all the time. Even senior engineers are saying they use it as a junior engineer, while we could easily hire junior engineers (but execs don't want to). Jevons paradox won't work in software because users' wallets and time are limited, and if software becomes too easy to build, it becomes harder to sell. Normal people can have 5 subscriptions, maybe 10, but they won't go to 50 or 100. I would say we have already exhausted users, with all the bad practices.

maccard 2 hours ago | parent | prev [-]

You’ve missed my point here - I agree that gen AI has changed everything and is useful, _but_ I disagree that it’s improved substantially - which is what the comment I replied to claimed.

Anecdotally I've seen no difference from model changes in the last year, but going from a bare LLM to Claude Code (where we told the LLMs they can use tools on our machines) was a game changer. The improvement there was the agent loop and the support for tools.
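
To be concrete about what I mean by "the agent loop", it's roughly this shape (a hypothetical sketch, not any particular vendor's harness; llm_complete and run_tool are made-up stand-ins for the model call and the tool runner):

    #include <stdio.h>
    #include <string.h>

    /* made-up stand-ins, not a real API */
    static const char *llm_complete(const char *transcript) { (void)transcript; return "DONE"; }
    static const char *run_tool(const char *request)        { (void)request;    return ""; }

    int main(void) {
        char transcript[8192] = "TASK: fix the failing test\n";

        for (int step = 0; step < 20; step++) {     /* cap the loop so it can't spin forever */
            const char *reply = llm_complete(transcript);
            if (strncmp(reply, "DONE", 4) == 0)     /* model says it's finished */
                break;
            const char *out = run_tool(reply);      /* e.g. run the tests, read a file, apply a patch */
            strncat(transcript, reply, sizeof transcript - strlen(transcript) - 1);
            strncat(transcript, out,   sizeof transcript - strlen(transcript) - 1);
        }
        printf("%s", transcript);
        return 0;
    }

The model call in the middle is basically the same thing we had in 2023; the loop and the tool access around it are what actually moved the needle for me.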

In 2023 I asked v0.dev to one-shot me a website for a business I was working on, and it did it in about 3 minutes. I feel like we're still stuck there with the models.

tombert 2 hours ago | parent [-]

In my experience it has gotten considerably better. When I get it to generate C, it often gets the pointer logic correct, which wasn't the case three years ago. Back then, ChatGPT would struggle with even fairly straightforward LaTeX, but now I can easily get it to generate quite elaborate LaTeX, and I've even had good success generating LuaTeX. I've also been able to have it generate a TLA+ spec from existing code, which didn't work even a year ago when I tried it.

Of course, sample size of one, so if you haven't gotten those results then fair enough, but I've at least observed it getting a lot better.

BeetleB 4 hours ago | parent | prev | next [-]

> We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.

If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.

The rest of us learn how to be productive with them despite these problems.

drewbug01 4 hours ago | parent [-]

> If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.

I struggle to take comments like this seriously - yes, it is very reasonable to expect these magical tools to copy and paste something without alterations. How on earth is that an unreasonable ask?

The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.

What, precisely, are they good for?

ubercow13 4 hours ago | parent | next [-]

It seems like just such a weird and rigid way to evaluate it? I am a somewhat reasonable human coder, but I can't copy and paste a bunch of code without alterations from memory either. Can someone still find a use for me?

tombert 4 hours ago | parent | prev | next [-]

I think what they're best at right now is the initial scaffolding work of projects. A lot of the annoying bootstrap shit that I hate doing is actually generally handled really well by Codex.

I agree that there's definitely some overhype around them right now. At least for the stuff I've done, though, they have gotten considerably better, to the point where the code they generate is often usable, if suboptimal.

For example, about three years ago, I was trying to get ChatGPT to write me a fairly basic ZeroMQ program in C. It generated something that looked correct, but it would crash pretty much immediately, because it kept trying to use a pointer after freeing it.

I tried the same thing again with Codex about a week ago, and it worked out of the box, and I was even able to get it to do more stuff.
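
For a flavor of the failure mode, it was basically the classic libzmq ownership mistake, something like this (a hypothetical minimal sketch, not the code it actually produced; the endpoint and message are made up):

    #include <zmq.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        void *ctx  = zmq_ctx_new();
        void *sock = zmq_socket(ctx, ZMQ_REQ);
        zmq_connect(sock, "tcp://localhost:5555");   /* made-up endpoint */

        zmq_msg_t msg;
        zmq_msg_init_size(&msg, 5);
        memcpy(zmq_msg_data(&msg), "hello", 5);
        zmq_msg_send(&msg, sock, 0);                 /* libzmq takes ownership of msg here */

        /* the kind of bug it used to write: reading zmq_msg_data(&msg) after a
           successful send, i.e. touching memory libzmq now owns and may have freed */

        zmq_msg_t reply;
        zmq_msg_init(&reply);
        zmq_msg_recv(&reply, sock, 0);
        printf("got %zu bytes back\n", zmq_msg_size(&reply));
        zmq_msg_close(&reply);                       /* we still own the reply, so we close it */

        zmq_close(sock);
        zmq_ctx_destroy(ctx);
        return 0;
    }

The version Codex gave me last week handled that ownership correctly out of the box, which is the difference I'm talking about.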

smithkl42 3 hours ago | parent [-]

I think it USED to be true that you couldn't really use an LLM on a large, existing codebase. Our codebase is about 2 million LOC, and a year ago you couldn't use an LLM on it for anything but occasional small tasks. Now, probably 90% of the code I commit each week was written by Claude (and reviewed by me and other humans - and also by Copilot and ZeroPath).

BeetleB 3 hours ago | parent | prev | next [-]

For a long time, I've wanted to write a blog post on why programmers don't understand the utility of LLMs[1], whereas non-programmers easily see it. But I struggle to articulate it well.

The gist is this: Programmers view computers as deterministic. They can't tolerate a tool that behaves differently from run to run. They have a very binary view of the world: If it can't satisfy this "basic" requirement, it's crap.

Programmers have built their careers (and possibly lives) on being experts at solving problems that greatly benefit from determinism. A problem that doesn't benefit from it needs to be solved either by sophisticated machine learning or by a human. They're trained to essentially ignore those problems; it's not their expertise.

And so they get really thrown off when people use computers in a nondeterministic way to solve a deterministic problem.

For everyone else, the world, and its solutions, are mostly non-deterministic. When they solve a problem, or when they pay people to solve a problem, the guarantees are much lower. They don't expect perfection every time.

When a normal human asks a programmer to make a change, they understand that communication is lossy, and even if it isn't, programmers make mistakes.

Using a tool like an LLM is like using any other tool, or like asking another human to do something.

For programmers, it's a cardinal sin if the tool is unpredictable. So they dismiss it. For everyone else, it's just another tool. They embrace it.

[1] This, of course, is changing as they become better at coding.

maccard 2 hours ago | parent [-]

I’m perfectly happy for my tooling to not be deterministic. I’m not happy for it to make up solutions that don’t exist, and get stuck in loops because of that.

I use LLMs, I code with a mix of Antigravity and Claude Code depending on the task, but I feel like I'm living in a different reality when the code I get out of these tools _regularly just doesn't work, at all_. And to the parent's point, I'm doing something wrong for noticing that?

BeetleB 2 hours ago | parent [-]

If they were terrible, you wouldn't use them, right? Isn't the fact that you continue to use AI coding tools a sign that you find them a net positive? Or is it being imposed on you?

> And to the parent's point, I'm doing something wrong for noticing that?

There's nothing wrong with pointing out your experience. What the OP was implying is that he expects them to be able to copy/paste reliably almost 100% of the time, and not hallucinate. I was merely pointing out that he'll never get that with LLMs, and that their inability to do so isn't a barrier to getting productive use out of them.

blibble 4 hours ago | parent | prev | next [-]

> What, precisely, are they good for?

scamming people

falloutx 3 hours ago | parent | prev [-]

It's strong enough to replace humans at their jobs and weak enough that it can't do basic things. It's a paradox. Just learn to be productive with it. Pay $200/month and work around its little quirks. /s

elzbardico 4 hours ago | parent | prev [-]

There's a subtle moment when you HAVE to take the wheel from the AI. All the issues I see are from people insisting on using it far beyond the point where it stops being useful.

It is a helper, a partner; it is still not ready to go the last mile.

xp84 3 hours ago | parent | next [-]

It's funny how many people don't get that. It's like adding a pretty great senior or staff-level engineer to sit on call next to every developer and assist them, for basically free (I've never used any of the expensive stuff yet, just things like Copilot, Grok Code in JetBrains, or asking Gemini to write bits of code for me).

If you hired a staff engineer to sit next to me, and I just had them write 100% of the code and never tried to understand it, that would be an unwise decision on my part and I'd have little room to complain about the times they made mistakes.

maccard 2 hours ago | parent | prev [-]

As someone else said in this thread:

> The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.

I'm perfectly happy to write code, and to use these tools. I do use them, and sometimes they work (well). Other times they have catastrophic failures. But apparently it's my fault for not understanding the tool or expecting too much of it, while others are screaming from the rooftops about how this new model changes everything (which happens every 3 months at this point).

elzbardico 5 minutes ago | parent [-]

There's no silver bullet. I’m not a researcher, but I’ve done my best to understand how these systems work—through books, video courses, and even taking underpaid hourly work at a company that creates datasets for RLHF. I spent my days fixing bugs step-by-step, writing notes like, “Hmm… this version of the library doesn’t support protocol Y version 4423123423. We need to update it, then refactor the code so we instantiate ‘blah’ and pass it to ‘foo’ before we can connect.”

That experience gave me a deep appreciation for how incredible LLMs are and the amazing software they can power, but it also completely demystified them. So by all means, let's use them. But let's also understand there are no miracles here. Go back to Shannon's papers from the '40s and '50s, and you'll understand that what seem to you like "emerging behaviors" are quite explainable from an information theory background. Learn how these models are built. Keep up with the latest research papers. If you do, you'll recognize their limitations before those limitations catch you by surprise.

There is no silver bullet. And if you think you’ve found one, you’re in for a world of pain. Worse still, you’ll never realize the full potential of these tools, because you won’t understand their constraints, their limits, or their pitfalls.

nonethewiser 20 minutes ago | parent | prev | next [-]

>Generative AI, as we know it, has only existed ~5-6 years

Probably less than that, practically speaking. ChatGPT's initial release date was November 2022, so it's closer to 3 years in terms of any significant number of people using these tools.

barbazoo 4 hours ago | parent | prev | next [-]

We implement pretty cool workflows at work using "GenAI" and the users of our software are really appreciative. It's like saying a hammer sucks because it breaks most things you hit with it.

onlyrealcuzzo 3 hours ago | parent | prev | next [-]

> Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.

I think the big problem is that the pace of improvement was UNBELIEVABLE for about 4 years, and it appears to have plateaued to almost nothing.

ChatGPT has barely improved in, what, 6 months or so.

They are driving costs down incredibly, which is not nothing.

But, here's the thing, they're not cutting costs because they have to. Google has deep enough pockets.

They're cutting costs because - at least with the current known paradigm - the cost is not worth it to make material improvements.

So unless there's a paradigm shift, we're not seeing MASSIVE improvements in output like we did in the previous years.

You could see costs go down to 1/100th over 3 years, seriously.

But they need to make money, so it's possible none of that will be passed on.

tombert 3 hours ago | parent | next [-]

I think that even if it never improves, its current state is already pretty useful. I do think it's going to improve, though I don't think AGI is going to happen any time soon.

I have no idea what this is called, but it feels like a lot of people assume that progress will continue at a linear pace forever, when I think progress is generally closer to a "staircase" shape. A new invention or discovery leads to a lot of really cool new inventions and discoveries in a very short period of time; eventually people exhaust the low-to-middle-hanging fruit, and progress levels out.

I suspect it will be the same way with AI; I don't know if we've reached the top of our current plateau, but if not, I think we're getting fairly close.

sheeh 3 hours ago | parent | prev [-]

They are focused on reducing costs in order to survive. Pure and simple.

Alphabet / Google doesn't have that issue. OAI and other money-losing firms do.

1970-01-01 3 hours ago | parent | prev | next [-]

>and is likely to keep improving.

I'm not trying to be pedantic, but how did you arrive at 'keep improving' as a conclusion? Nobody is really sure how this stuff actually works. That's why AI safety was such a big deal a few years ago.

tombert 2 hours ago | parent [-]

Totally reasonable question; I am only making an assumption based on observed progress. AI-generated code, at least in my personal experience, has gotten a lot better, and while I don't think that will go to infinity, I do think there's still more room for improvement.

I will acknowledge that I don't have any evidence for this claim, so maybe the word "likely" was unwise, as that suggests probability. Feel free to replace "is likely to" with "it feels like it will".

jbs789 4 hours ago | parent | prev [-]

Because the likes of Altman have set short-term expectations unrealistically high.

tombert 4 hours ago | parent | next [-]

I mean that's every tech company.

I made a joke once after the first time I watched one of those Apple announcement shows in 2018, where I said "it's kind of sad, because there won't be any problems for us to solve because the iPhone XS Max is going to solve all of them".

The US economy is pretty much a big vibes-based Ponzi scheme now, so I don't think we can single out AI. I think we have to blame the fact that the CEOs running these things face no negative consequences for lying or embellishing, and they get rewarded for it because it often bumps the stock price.

Is Tesla really worth more than every other car company combined in any kind of objective sense? I don't think so; I think people really like it when Elon lies to them about stuff that will come out "next year", and they feel no need to punish him economically.

Terr_ 9 minutes ago | parent [-]

"Ponzi" requires records fraud and is popularly misused, sort of like if people started describing every software bug as "a stack overflow."

I'd rather characterize it as extremes of Greater Fool Theory.

https://en.wikipedia.org/wiki/Greater_fool_theory

hamdingers 3 hours ago | parent | prev [-]

I maintain that most anti-AI sentiment is actually anti-lying-tech-CEO sentiment misattributed.

The technology is neat, the people selling it are ghouls.

sroerick 5 minutes ago | parent | next [-]

This is how I felt about Bitcoin.

acdha 3 hours ago | parent | prev [-]

Exactly: the technology is useful, but the executive class is hyping it as close to AGI because their buddies are slavering for layoffs. If that "when do you get fired?" tone weren't behind the conversation, I think a lot of people would be interested in applying LLMs to the smaller subset of things they actually perform well at.

tombert 2 hours ago | parent [-]

Maybe CEOs should face consequences for going on stage and outright lying. Instead, they're rewarded with a bump in stock price because people appear to have amnesia.