al_borland a day ago

> ask claude to write, and ask claude to explain

This works, until it doesn’t. I’m continuously shocked by these stories, where so many people put the future of their job/company in the hands of these agents after only a few months of existing.

I still constantly run into bad output from LLMs, from code to basic questions. I don’t understand how anyone can hand things over to something that is laughably wrong on a pretty regular basis, often in subtle ways that won’t be noticed by someone who isn’t reading closely and thinking critically.

They’ve gotten better, but I still regularly give them the old Nick Burns treatment, push it out of the way, and do it myself.

Maxatar a day ago | parent | next [-]

There's nothing shocking about this. The vast majority of software/source code is pretty terrible anyways: code that is full of bugs, slow to use, has little to no automated tests, and is very hard to maintain.

To the extent that it gets fixed or works at all, it's not because of competent developers doing rigorous analysis of the software, it's because either someone testing it or using it gets annoyed, reports an issue, and then that specific issue gets patched out.

If using LLMs to perform a similar function shocks you, then you should have been shocked already by the proliferation of pretty bad software for the better part of the last couple of decades.

So many criticisms of LLMs assume that people have been writing software very diligently, applying a high standard of engineering, subjecting the code to a battery of rigorous tests, passing it through a strict review process... and that does happen for some software, especially software that is commonly used, but it's not true for the vast majority of software developed.

al_borland a day ago | parent | next [-]

“AI is no good, but neither are people” isn’t a great sales pitch.

I think for small tools that people want to make for themselves, that’s great. Where I see problems is when other people and money get involved. If something goes wrong, who is accountable? Claude wrote it, Claude reviewed it, Claude submitted the PR… yet Claude can’t have any real accountability.

LaundroMat a day ago | parent | next [-]

"A computer can never be held accountable

Therefore a computer must never make a management decision"

-- Internal IBM training manual, 1979

appplication a day ago | parent | prev | next [-]

I think small tools people make for themselves are realistically less than 1% of software produced. Most of the code, and - to the GP’s point - bad code, is produced in corporations with plenty of money and budget.

There is just such a tremendous amount of waste at every company, in that the headcount and software expand to fill the budget. I’m not defending Elon, but look at how much he slashed from X (80% or so?) and the company still has its core product functioning and an active user base.

There is a ton of software (especially internal) at essentially every company that was already low-accountability before Claude. “Oh Ted built that but he’s working on a new important project. I understand it’s broken and that’s impacting you but we won’t be able to prioritize this until next quarter at least. Can you set up a meeting next month to discuss?”

Honestly the outcome for all of these LLMs is indeed likely a higher amount of software with no accountability, but it’s also an improved ability to juggle more of that software to the same (realistically low) standard.

Maxatar a day ago | parent | prev [-]

It's an absolutely phenomenal sales pitch to executives. A ton of automation is sold on the basis that it's probably not going to be as good as having a dedicated person do it, but that automation leads to much lower maintenance, scales better, and is more deterministic and reproducible.

sshine 21 hours ago | parent | prev [-]

> little to no automated tests

I'm still amazed people don't achieve extremely high test quality, since you get tests "for free" now.

One of the limitations of testing was always that people "design" things so they're hard to test.

And then they argue "This can't be tested", or "Refactoring this for testing is not worth it."

It is now. Yet, I work on codebases with no tests and lots of yolo co-authoring.

stuaxo 6 hours ago | parent | next [-]

You get quantity of tests, but the tests are not good quality by default, at all.

thisoneisreal a day ago | parent | prev | next [-]

It's a really fun philosophical exercise to ask what it means for them to be "wrong." My perspective is that they are fantastic at association and generalization (of language and symbols in particular), but whether they're identifying the associations you care about or generalizing to the level of abstraction you're aiming for is a complete crapshoot. If you aren't checking and correcting them, and discarding the misfires, you will end up with a very pretty Tower of Babel.

al_borland a day ago | parent | next [-]

One area where I feel safe saying they are “wrong”, rather than just going with a different assumption that was left unsaid, would be when it makes up API endpoints. It sees the general pattern in an API, then makes up an endpoint that sounds good, follows the pattern, but isn’t actually implemented.

I’ve also seen a lot of issues with co-workers using an LLM to write their readme files. I look at the readme for what return values I should get, go to use them, and get an error. I check the code, and sure enough, none of the variables in the readme exist. The LLM just thought they sounded good. Things like this I would say are pretty objectively wrong.
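
A minimal sketch of an automated check for the readme mismatch described above. It assumes the project is Python, that the readme puts identifiers in inline backticks, and that a documented name counts as "present" if it appears anywhere in the source as a name, attribute, argument, or string literal (for example a dict key in a return value). The helper names here are hypothetical and not from any particular tool.

```python
import ast
import re


def documented_names(readme: str) -> set[str]:
    """Collect identifiers written in inline backticks, e.g. `user_id`."""
    return set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", readme))


def names_in_source(source: str) -> set[str]:
    """Collect names, attributes, arguments, definitions, and string literals."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            found.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found.add(node.name)
        elif isinstance(node, ast.arg):
            found.add(node.arg)
        elif isinstance(node, ast.Attribute):
            found.add(node.attr)
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            # String literals cover dict keys such as {"status": ...}.
            found.add(node.value)
    return found


def missing_from_code(readme: str, source: str) -> list[str]:
    """Return documented identifiers that never appear in the source."""
    return sorted(documented_names(readme) - names_in_source(source))


# Illustrative inputs, made up for this example.
README = "Call `fetch` to get a dict with `status` and `retry_after`."
SOURCE = 'def fetch():\n    return {"status": "ok"}\n'

print(missing_from_code(README, SOURCE))
```

A check like this only flags names that are absent entirely; it cannot tell whether a name that does exist is documented correctly, so it supplements a human review rather than replacing one.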

tedmiston 5 hours ago | parent [-]

> One area where I feel safe saying they are “wrong”, rather than just going with a different assumption that was left unsaid, would be when it makes up API endpoints. It sees the general pattern in an API, then makes up an endpoint that sounds good, follows the pattern, but isn’t actually implemented.

I remember seeing this maybe 6+ months ago, but using paid plans, RAG, and a high thinking mode has eliminated a ton (almost all) of those kinds of hallucinations. Open models and free tiers are not there yet though.

> I’ve also seen a lot of issues with co-workers using an LLM to write their readme files. I look at the readme for what return values I should get, go to use them, and get an error. I check the code, and sure enough, none of the variables in the readme exist. The LLM just thought they sounded good. Things like this I would say are pretty objectively wrong.

LLMs don't co-sign the quality of PRs though; your coworkers do. It's not unusual for docs to get outdated and not be maintained enough in small codebases, but that's not an LLM-specific problem.

allanmacgregor 16 hours ago | parent | prev | next [-]

AI is just a tool, and, as always, people will use it incorrectly and lazily. Are we forgetting the good old days of Copy/Paste from Stack Overflow?

LLMs just made it more convenient for the same people to take the lazy route.

jiggawatts 6 hours ago | parent | prev | next [-]

I saw this, all of this, happening years before ChatGPT existed, but with outsourcing to Indian dev shops.

You'd be shocked how often I see the meat-space equivalent of vibe coding!

"I trust the developers."

"You really shouldn't!"

The thing to realise is that there is no fundamental difference between outsourcing a development task to other human developers versus outsourcing[1] it to LLMs.

Either way, total and complete understanding is being sacrificed in the name of productivity and scalability.

It's just there's one extra layer of work assignment now, with ICs handing off tasks to agents.

What this has revealed to ICs is the BIG issue that has plagued all software development for decades, especially since outsourcing became so popular: Oversight is critical, and more importantly: authority can be delegated, but responsibility cannot.

LLM output is fine, as long as you review everything it does.

This is the same as any competent dev team manager reviewing PRs for quality, paying attention to critical matters such as security, adherence to high level design and low-level style standards, etc.

Some do.

Many never did.

[1] This doesn't have to be a contract with an overseas provider, by "outsourcing" I mean any variant of not-your-own-hands-on-keyboard. Any scenario where a customer or manager assigns tasks to developers other than themselves.

stealthyllama 4 hours ago | parent | prev | next [-]

Even if / when it does work, the value being produced is reduced to the dollars paid to Anthropic or OpenAI or whoever. What are you even contributing? What’s stopping the ai provider from coming in and eating your lunch?

shinobi-apps 20 hours ago | parent | prev | next [-]

It was hype all day long, and managers forgot that AI is a tool and not some magic wand. A tool like a DeWalt or a Makita. After AI came out, some colleagues at my company expected me to generate 600-700 lines of code or more. I tried to explain that I can't read or understand what's actually happening that fast, but they were like: just push, go, copy-paste it. Complete autopilot mode, insane. Then I spent the weekend fixing it, which made me doubly mad. What's actually happening is absurd: because of all the stories out there, managers think that Claude generates perfect code and that you could make a Twitter clone in half a day...
