verall 3 days ago

> This week, I used it to write ESP32 firmware and a Linux kernel driver.

I'm not meaning to be negative at all, but was this for a toy/hobby or for a commercial project?

I find that LLMs do very well on small greenfield toy/hobby projects but basically fall over when brought into commercial projects that often have bespoke requirements and standards (e.g. has to cross-compile with qcc, comply with AUTOSAR, use an in-house build system, tons of legacy code lying around that may or may not be used).

So no shade - I'm just really curious what kind of project you were able to get such good results writing ESP32 FW and kernel drivers for :)

lukebechtel 3 days ago | parent | next [-]

Maintaining project documentation is:

(1) Easier with AI

(2) Critical for letting AI work effectively in your codebase.

Try creating well-structured rules for working in your codebase, put them in .cursorrules or the Claude equivalent (CLAUDE.md)... let AI help you write them... see if that helps.
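A rough sketch of the kind of rules file I mean (the specifics here are made up - the build command, file paths, and conventions are placeholders to adapt to your project):

    # .cursorrules / CLAUDE.md (hypothetical example)
    - Build & test: `make check` must pass before a task is considered done.
    - Error handling: never return placeholder/default values to paper over a failure; propagate the error.
    - Architecture notes live in docs/architecture.md; read them before touching core modules.
    - Scope: change only the files named in the task; ask before adding dependencies.

Short, concrete, and checkable rules like these tend to work better than long prose style guides.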

theshrike79 3 days ago | parent | next [-]

The magic of using agentic LLMs efficiently is...

proper project management.

You need to have good documentation, split into logical bits. Tasks need to be clearly defined and not have extensive dependencies.

And you need to have a simple feedback loop where you can easily run the program and confirm the output matches what you want.

troupo 3 days ago | parent | prev [-]

And the chance of that working depends on the weather, the phase of the moon and the arrangement of bird bones in a druidic augury.

It's a non-deterministic system producing statistically plausible results, with no predictable failure modes.

I had Cursor one-shot issues in internal libraries with zero rules.

And then suggest I use StringBuilder (Java) in a 100% Elixir project with carefully curated cursor rules as suggested by the latest shamanic ritual trends.

GodelNumbering 3 days ago | parent | prev | next [-]

This is my experience too. Also, their propensity to jump into code without necessarily understanding the requirement is annoying, to say the least. As the project complexity grows, you find yourself writing longer and longer instructions just to guardrail them.

Another rather interesting thing is that they tend to gravitate towards sweep-the-errors-under-the-rug kind of coding, which is disastrous, e.g. "return X if we don't find the value so downstream doesn't crash". These are the kind of errors that no human, not even a beginner on their first day learning to code, would make, and they are extremely annoying to debug.

Tl;dr: LLMs' tendency to treat every single thing you give them as a demo homework project

verall 3 days ago | parent | next [-]

> Another rather interesting thing is that they tend to gravitate towards sweep the errors under the rug kind of coding which is disastrous. e.g. "return X if we don't find the value so downstream doesn't crash".

Yes, these are painful and basically the main reason I moved from Claude to Gemini - it felt insane to be begging the AI - "No, you actually have to fix the bug, in the code you wrote, you cannot just return some random value when it fails, it actually has to work".
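To make it concrete, the pattern looks roughly like this (a made-up Java sketch, not actual model output - the class and method names are invented for illustration):

    import java.util.Map;
    import java.util.Optional;

    class ConfigLookup {
        private final Map<String, Integer> settings;

        ConfigLookup(Map<String, Integer> settings) {
            this.settings = settings;
        }

        // The pattern being complained about: swallow the miss so "downstream doesn't crash"
        int timeoutSwallowed(String key) {
            return settings.getOrDefault(key, 0); // silently wrong when the key is missing
        }

        // What you have to beg for: fail loudly where the problem actually is
        int timeoutStrict(String key) {
            return Optional.ofNullable(settings.get(key))
                    .orElseThrow(() -> new IllegalStateException("missing config key: " + key));
        }
    }

The first version "works" in the demo and then misbehaves somewhere far from the actual bug; the second fails at the point where the problem is.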

GodelNumbering 3 days ago | parent [-]

Claude in particular abuses the word 'Comprehensive' a lot. You express that you're unhappy with its approach, and it will likely come back with "Comprehensive plan to ..." and then write like 3 bullet points under it, that is of course after profusely apologizing. On a side note, I wish LLMs never apologized and instead just said "I don't know how to do this."

tombot 3 days ago | parent | prev | next [-]

> their propensity to jump into code without necessarily understanding the requirement is annoying to say the least.

Then don't let it: collaborate on the spec and ask Claude to make a plan. You'll get far better results.

https://www.anthropic.com/engineering/claude-code-best-pract...

LinXitoW 3 days ago | parent | prev | next [-]

In my experience in a Java code base, it didn't do any of this, and did a good job with exceptions.

And I have to disagree that these aren't errors that beginners or even intermediates make. Who hasn't swallowed an error because "that case totally, most definitely won't ever happen, and I need to get this done"?

jorvi 3 days ago | parent | prev [-]

Running LLM code with kernel privileges seems like courting disaster. I wouldn't dare do that unless I had a rock-solid grasp of the subsystem, and at that point, why not just write the code myself? LLM coding is on average 20% slower.

LinXitoW 3 days ago | parent | prev | next [-]

Ironically, AI mirrors human developers in that it's far more effective when working in a well-written, well-documented code base. It will infer what a function does from its name. If those names are shitty, short, or full of weird abbreviations, it'll have a hard time.

Maybe it's a skill issue, in the sense of having a decent code base.

flowerthoughts 3 days ago | parent | prev | next [-]

Totally agree.

This was a debugging tool for Zigbee/Thread.

The web project is Nuxt v4, which was just released, so Claude keeps wanting to use v3 semantics, and you have to keep repeating the known differences, even if you use CLAUDE.md. (They moved client files under an app/ subdirectory.)
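The kind of reminder I end up repeating looks roughly like this (paraphrased, not the exact wording in my CLAUDE.md):

    # CLAUDE.md (excerpt, paraphrased)
    - This project is Nuxt v4, not v3. Do not use v3-only conventions.
    - Client files (pages, components, layouts, composables) live under app/, not the project root.
    - When unsure whether an API changed between v3 and v4, ask instead of guessing.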

All of these are greenfield prototypes. I haven't used it in large systems, and I can totally see how that would be context overload for it. This is why I was asking GP about the circumstances.

oceanplexian 3 days ago | parent | prev [-]

I work in FAANG and have been there for over a decade. These tools are creating a huge amount of value, starting with Copilot but now with tools like Claude Code and Cursor. The people doing so don't have a lot of time to comment about it on HN, since we're busy building things.

jpc0 3 days ago | parent | next [-]

> These tools are creating a huge amount of value...

> The people doing so don’t have a lot of time to comment about it on HN since we’re busy building…

“We’re so much more productive that we don’t have time to tell you how much more productive we are”

Do you see how that sounds?

wijwp 3 days ago | parent | next [-]

To be fair, AI isn't going to give us more time outside work. It'll just increase expectations from leadership.

drusepth 3 days ago | parent | prev [-]

I feel this, honestly. I get so much more work done (currently: building & shipping games, maintaining websites, managing APIs, releasing several mobile apps, and developing native desktop applications) managing 5x Claude instances that the majority of my time is taken up just prompting whichever agent has finished on its next task(s), and there's a real feeling of lost productivity if any agent is left idle for too long.

The only time to browse HN left is when all the agents are comfortably spinning away.

nme01 3 days ago | parent | prev | next [-]

I also work for a FAANG company, and so far most employees agree that while LLMs are good for writing docs, presentations or emails, they still lack a lot when it comes to writing maintainable code (especially in Java; they supposedly do better in Go, don't know why, not my opinion). Even simple refactorings need to be carefully checked. I really like them for doing stuff that I know nothing about, though (e.g. write a script using a certain tool, tell me how to rewrite my code to use a certain library, etc.), or for reviewing changes.

GodelNumbering 3 days ago | parent | prev | next [-]

I don't see how FAANG is relevant here. But the 'FAANG' I used to work at had an emergent problem of people throwing a lot of half-baked 'AI-powered' code over the wall and letting reviewers deal with it (due to incentives, not because they were malicious). In orgs like infra, where everything needs to be reviewed carefully, this is purely a burden.

verall 3 days ago | parent | prev | next [-]

I've worked at a FAANG equivalent for a decade, mostly in C++/embedded systems. I work on commercial products used by millions of people. I use the AI also.

When others are finding gold in rivers similar to mine, and I'm mostly finding dirt, I'm curious to ask and see how similar the rivers really are, or if the river they are panning in is actually somewhere I do find gold, but not a river I get to pan in often.

If the rivers really are similar, maybe I need to work on my panning game :)

ewoodrich 3 days ago | parent | prev | next [-]

I use agentic tools all the time but comments like this always make me feel like someone's trying to sell me their new cryptocoin or NFT.

boppo1 3 days ago | parent | prev | next [-]

> creating a huge amount of value

Do you write software, or work in accounting/finance/marketing?

nomel 3 days ago | parent | prev [-]

What are the AI usage policies like at your org? Where I am, we’re severely limited.
