llmslave2 3 days ago

Even 2.5x is absurd. If they said 1.5x I might believe them.

OsrsNeedsf2P 3 days ago | parent | next [-]

I'm building an AI agent for Godot, and in paid user testing we found the median speedup in time to complete a variety of tasks[0] was 2x. This number was closer to 10x for less experienced engineers.

[0] Tasks included making games from scratch and resolving bugs we put into template projects. There are no perfect tasks to test on, but this seemed sufficient.
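For concreteness, "median speedup" here just means the median of per-task ratios of baseline time to time with the agent. A toy sketch of that arithmetic (not our actual harness; the task names and timings below are made up):

    -- Median of per-task speedup ratios (baseline time / time with the agent).
    -- Toy data only; the real tasks and timings are not shown here.
    import Data.List (sort)

    median :: [Double] -> Double
    median xs
      | odd n     = s !! half
      | otherwise = (s !! (half - 1) + s !! half) / 2
      where
        s    = sort xs
        n    = length xs
        half = n `div` 2

    speedups :: [(String, Double, Double)] -> [Double]
    speedups timings = [ baseline / withAgent | (_, baseline, withAgent) <- timings ]

    main :: IO ()
    main = do
      let timings = [ ("game from template", 120, 55)  -- minutes without vs. with the agent
                    , ("fix planted bug #1",  30, 18)
                    , ("fix planted bug #2",  45, 20) ]
      print (median (speedups timings))  -- ~2.18 for this made-up data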

nicoburns 3 days ago | parent | next [-]

Have you evaluated the maintainability of the generated code? Because that could of course start to count in the negative direction over time.

Some of the AI-generated code I've seen has been decent quality, but almost all of it is much more verbose, or just greater in quantity, than hand-written code is or would be. And that's almost always what you don't want for maintenance...

llmslave2 3 days ago | parent | prev | next [-]

That sounds reasonable to me. AI is best at generating super basic and common code, and it will have had plenty of training on game templates and simple games.

Obviously you cannot generalize that to all software development though.

brandensilva 3 days ago | parent | next [-]

As you get deeper, beyond the starter and bootstrap code, it definitely takes a different approach to get value.

This is in part because of the context limits on large code bases, and in part because the knowledge becomes more specialized and the LLM has no training on that kind of code.

But people are making it work, it just isn't as black and white.

bonesss 3 days ago | parent [-]

That’s the issue, though, isn’t it? Why isn’t it black and white? Clear massive productivity gains at Google or MS and their dev armies should be visible from orbit.

Just today on HN I’ve seen claims of 25x and 10x and 2x productivity gains. But none of it starts with well-calibrated estimations using quantifiable outcomes, consistent teams, whole-lifecycle evaluation, and apples-to-apples work.

In my own extensive use of LLMs I’m reminded of mouse versus command line testing around file navigation. Objective facts and subjective reporting don’t always line up, people feel empowered and productive while ‘doing’ and don’t like ‘hunting’ while uncertain… but our sense of the activity and measurable output aren’t the same.

I’m left wondering why a 2x Microsoft or OpenAI would ever sell their competitive advantage to others. There’s infinite money to be made exploiting such a tech, but instead we see high-school homework, script gen, and demo ware that is already just a few searches away and downloadable.

LLMs are, in essence, copy-pasting existing work while hopping over uncomfortable copyright and attribution qualms, so devs feel like ‘product managers’ and not charlatans. Is that fundamentally faster than a healthy Stack Overflow and a non-enshittened Google? Over a product lifecycle? … ‘Sometimes, kinda’, in the absence of clear, obvious next-gen production, feels like we’re expecting a horse with a wagon seat built in to win a Formula 1 race.

int_19h 3 days ago | parent | prev [-]

> That sounds reasonable to me. AI is best at generating super basic and common code

I'm currently using AI (Claude Code) to write a new Lojban parser in Haskell from scratch, which is hardly something "super basic and common". It works pretty well in practice, so I don't think that assertion is valid anymore. There are certainly differences between different tasks in terms of what works better with coding agents, but it's not as simple as "super basic".

llmslave2 3 days ago | parent [-]

I'm sure there are plenty of language parsers written in Haskell in the training data. Regardless, the question isn't whether LLMs can generate code (they clearly can), it's whether agentic workflows are superior to writing code by hand.

int_19h 3 days ago | parent [-]

There's no shortage of parsers in Haskell, but parsing a human language is very different from parsing a programming language. The grammar is much, much more complex, and this means that, e.g., simple approaches that would give adequate error messages for a programming language don't really work here, because failures are non-actionable.
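To make that concrete: with a grammar this ambiguous, naive alternation in a combinator parser ends up reporting whichever branch happened to fail last, which tells you almost nothing. A toy sketch (not my actual parser; the Lojban fragments and combinator names below are made up):

    -- Minimal hand-rolled combinators to show the failure mode: when every
    -- alternative fails somewhere different, reporting only one branch's
    -- error is non-actionable for the user.
    newtype Parser a = Parser { runParser :: String -> Either String (a, String) }

    instance Functor Parser where
      fmap f (Parser p) = Parser $ \s -> fmap (\(a, rest) -> (f a, rest)) (p s)

    instance Applicative Parser where
      pure a = Parser $ \s -> Right (a, s)
      Parser pf <*> Parser pa = Parser $ \s -> do
        (f, s')  <- pf s
        (a, s'') <- pa s'
        pure (f a, s'')

    -- Naive alternation: the first branch's diagnosis is silently discarded.
    orElse :: Parser a -> Parser a -> Parser a
    orElse (Parser p) (Parser q) = Parser $ \s ->
      case p s of
        Right r -> Right r
        Left _  -> q s

    expect :: String -> Parser String
    expect w = Parser $ \s ->
      if take (length w) s == w
        then Right (w, drop (length w) s)
        else Left ("expected " ++ show w)

    -- Imagine dozens of interacting alternatives like this in the real grammar.
    clause :: Parser String
    clause = expect "mi klama" `orElse` expect "do klama" `orElse` expect "klama fa mi"

    main :: IO ()
    main = print (runParser clause "mi klam")
    -- Left "expected \"klama fa mi\"" -- even though the input nearly matched branch one

With enough alternatives, almost every failing input gets an error that points somewhere essentially arbitrary in the grammar, which is why the error-reporting strategy has to be rethought rather than reused.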

teaearlgraycold 3 days ago | parent | prev [-]

One concern is those less experienced engineers might never become experienced if they’re using AI from the start. Not that everyone needs to be good at coding. But I wonder what new grads are like these days. I suspect few people can fight the temptation to make their lives a little easier and skip learning some lessons.

thornewolf 3 days ago | parent | prev | next [-]

I estimated that I was at 1.2x when we only had tab-completion models. 1.5x would be too modest: I've done plenty of ~6-8 hour tasks in ~1-2 hours using LLMs.

enraged_camel 3 days ago | parent [-]

Indeed. I just did a 4-6 month refactor + migration project in less than 3 weeks.

kmoser 3 days ago | parent | prev [-]

I recently used AI to help build the majority of a small project (database-driven website with search and admin capabilities) and I'd confidently say I was able to build it 3 to 5 times faster with AI. For context, I'm an experienced developer and know how to tweak the AI code when it's wonky and the AI can't be coerced into fixing its mistakes.

llmslave2 3 days ago | parent | next [-]

What's the link?

kmoser 3 days ago | parent [-]

The site is password protected because it's intended for scholarly researchers, and ironically the client doesn't want LLMs scraping it.

kmoser 2 days ago | parent | prev [-]

Downvoted for...confidently saying how successful I was using an AI? I don't get it.