blauditore 18 hours ago

All these engineers who claim to write most of their code through AI - I wonder what kind of codebase that is. I keep trying, but it always ends up producing superficially okay-looking code that gets the nuances wrong. It also fails to fix them (it just changes random stuff) when pointed to said nuances.

I work on a large product with two decades of accumulated legacy, maybe that's the problem. I can see though how generating and editing a simple greenfield web frontend project could work much better, as long as actual complexity is low.

bob1029 16 hours ago | parent | next [-]

I have my best successes by keeping things constrained to method-level generation. Most of the things I dump into ChatGPT look like this:

  public static double ScoreItem(Span<byte> candidate, Span<byte> target)
  {
     //TODO: Return the normalized Levenshtein distance between the 2 byte sequences.
     //... any additional edge cases here ...
  }

I think generating more than one method at a time is playing with fire. Individual methods can be generated by the LLM and tested in isolation. You can incrementally build up trust in your understanding of the problem space by going a little slower. If the LLM is operating over a whole set of methods at once, it's like starting over each time you have to iterate.

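For a rough sense of what such a prompt comes back with, here is a sketch of the normalized Levenshtein method, in Python rather than C# so it stays short (the normalization choice, dividing by the longer length, is one common convention, not the only one):

```python
def score_item(candidate: bytes, target: bytes) -> float:
    """Normalized Levenshtein distance: 0.0 = identical, 1.0 = maximally different."""
    # Edge case: two empty sequences are identical.
    if not candidate and not target:
        return 0.0
    m, n = len(candidate), len(target)
    # Classic dynamic-programming recurrence, kept to two rows for O(n) memory.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if candidate[i - 1] == target[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    # Divide by the longer length so the score lands in [0, 1].
    return prev[n] / max(m, n)
```

A C# version over Span&lt;byte&gt; follows the same recurrence; only the extra edge cases from the TODO comment change.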
theshrike79 25 minutes ago | parent | next [-]

"Dumping into ChatGPT" is by far the worst way to work with LLMs, then it lacks the greater context of the project and will just give you the statistical average output.

Using an agentic system that can at least read the other bits of code is more efficient than copy-pasting snippets into a web page.

samdoesnothing 15 hours ago | parent | prev [-]

I do this, but with Copilot. Write a comment, then spam opt-tab; 50% of the time it ends up doing what I want, and I can read it line by line before tabbing the next one.

Genuine productivity boost, but I don't feel like it's AI slop. Sometimes it feels like it's actually reading my mind and just preventing me from having to type...

jerf 15 hours ago | parent [-]

I've settled on this as well for most of my day-to-day coding: a lot of extremely fancy tab completion, using the agent only for manipulation tasks I can carefully define. I'm currently in a "write lots of code" mode, which affects that, I think; in a maintenance mode I could see doing more agent prompting. It gives me a chance to catch things early and then put in a correct pattern for it to continue forward with. And honestly, for a lot of tasks it's not particularly slower than the "ask it to do something, correct its five errors, tweak the prompt" workflow.

I've had net-time-savings with bigger agentic tasks, but I still have to check it line-by-line when it is done, because it takes lazy shortcuts and sometimes just outright gets things wrong.

Big productivity boost, it takes out the worst of my job, but I still can't trust it at much above the micro scale.

I wish I could give a system prompt for the tab complete; there's a couple of things it does over and over that I'm sure I could prompt away but there's no way to feed that in that I know of.

CuriouslyC 17 hours ago | parent | prev | next [-]

It's architecture dependent. A fairly functional modular monolith with good documentation can be accessible to LLMs at the million line scale, but a coupled monolith or poorly instrumented microservices can drive agents into the ground at 100k.

yuedongze 17 hours ago | parent [-]

I think it's definitely an interesting subject for verification engineering: the easier it is to task AI to do work precisely, the easier it is for us to check its work.

CuriouslyC 17 hours ago | parent [-]

Yup. Codebase structure for agents is a rabbit hole I've spent a lot of time going down. The interesting thing is that it's mostly the same structure that humans tend to prefer, with a few tweaks: agents like smaller files/functions (more precise reads/edits), strongly typed functional programming, doc-comments with examples and hyperlinks to additional context, smaller directories with semantic subgroups, long/distinct variable names, etc.
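To make that concrete, here is a hypothetical sketch of the style in Python; every name in it is invented for illustration:

```python
def normalize_request_timeout_seconds(raw_timeout_seconds: float,
                                      maximum_allowed_seconds: float = 30.0) -> float:
    """Clamp a caller-supplied timeout into the allowed range.

    Example:
        >>> normalize_request_timeout_seconds(120.0)
        30.0
    """
    # Reject nonsense input early so failures surface at the call site.
    if raw_timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    return min(raw_timeout_seconds, maximum_allowed_seconds)
```

The long, distinct names and the doctest-style example are exactly the kind of inline context that lets an agent (or a human) use the function correctly without reading its call sites.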

lukan 15 hours ago | parent [-]

Aren't those all things humans also tend to prefer to read?

I like to read descriptive variable names, I just don't like to write them all the time.

hathawsh 18 hours ago | parent | prev | next [-]

I think your intuition matches mine. When I try to apply Claude Code to a large code base, it spends a long time looking through the code and then it suggests something incorrect or unhelpful. It's rarely worth the trouble.

When I give AI a smaller or more focused project, it's magical. I've been using Claude Code to write code for ESP32 projects and it's really impressive. OTOH, it failed to tell me about a standard device driver I could be using instead of a community device driver I found. I think any human who works on ESP-IDF projects would have pointed that out.

AI's failings are always a little weird.

divan an hour ago | parent | next [-]

I start new projects "AI-first": begin with docs and refine them as I go, with multiple CLAUDE.md files in different folders (to give the right context where it's needed). This alone increases the chances of it getting tasks right tenfold. Plus, I almost always verify all the produced code myself.

Ironically, this would be the best workflow with humans too.
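For illustration, such a layout might look like this (all names hypothetical):

```
repo/
├── CLAUDE.md          # project-wide conventions, build and test commands
├── docs/
│   └── architecture.md
├── api/
│   └── CLAUDE.md      # API-layer rules: routing, auth, error shapes
└── web/
    └── CLAUDE.md      # frontend rules: component patterns, styling
```

Claude Code picks up the nested files when it works inside those folders, so each area gets its own focused context.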

seanmcdirmid 8 hours ago | parent | prev | next [-]

Have you tried having AI build up documentation on the code first, then correcting it where its understanding is wrong, then running code changes with the docs in context? You can even separate it out per module if you're daring. AI still takes a lot of hand-holding to be productive with, which means our jobs are safe for now, until they start learning about SWE principles somehow.

manmal 15 hours ago | parent | prev [-]

In large projects you need to actually point it to the interesting files, because it has no way of knowing what it doesn't know. Tell it to read this and that, creating summary documents, then clear the context and point it at those summaries. A few of those passes and you'll get useful results. A gap in its knowledge of relevant code will lead to broken functionality. Cursor and others have been trying to solve this with semantic search (embeddings) but IMO this just can't work, because the relevance of a code piece for a task is not determinable by any of its traits.

Yoric 12 hours ago | parent [-]

But in the end, do you feel that it has saved you time?

I find hand-holding Claude a permanent source of frustration, except in the rare case that it helps me discover an error in the code.

manmal 6 hours ago | parent [-]

I've had a similar feeling before Opus 4.5. Now it suddenly clicks with me; it has passed the shittiness threshold into the "often useful" area. I suspect that's because Apple is partnering with Anthropic, so they will have improved Swift support.

E.g. it's great for refactoring now; it often updates the README along with renames without me asking. It's also really good at rebasing quickly, but only by cherry-picking inside a worktree. Churning out small components I don't want to add a new dependency for usually goes well on the first try.

For implementing whole features, the space of possible solutions is way too big to always hit something I'll be satisfied with. Once I have an idea of how to implement something in broad strokes, I can give it a very error-ridden first draft as a stream of thoughts, let it read all required files, and have it make an implementation plan. Usually that's not too far off, and it doesn't take that long. Once that's done, Opus 4.5 is pretty good at implementing the plan. Still, I read every line if it will go to production.
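The cherry-pick-inside-a-worktree rebase mentioned above can be sketched as a self-contained shell demo; the repo, branch, and file names are all invented:

```shell
set -e

# Throwaway repo with one commit on main and one on a feature branch.
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
echo base > file.txt && git add file.txt && git commit -qm "base"
git branch -m main
git switch -qc feature
echo feature > feature.txt && git add feature.txt && git commit -qm "add feature"
FEATURE_SHA=$(git rev-parse HEAD)
git switch -q main

# Scratch worktree: the main checkout stays untouched while commits are replayed.
git worktree add -q -b feature-rebased ../scratch main
cd ../scratch
git cherry-pick "$FEATURE_SHA" >/dev/null   # replay one commit at a time
test -f feature.txt                         # the replayed change is present

# The rebased branch survives after the scratch checkout is removed.
cd ../demo
git worktree remove ../scratch
```

Conflicts then surface one commit at a time, which is what makes the workflow tractable for an agent.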

freedomben 15 hours ago | parent | prev | next [-]

I've tried it extensively, and have the same experience as you. AI is also incredibly stubborn when it wants to go down a path I reject. It constantly tries to do it anyway and will slip things in.

I've tried vibe coding and usually end up with something subtly or horribly broken, with excessive levels of complexity. Once it digs itself a hole, it's very difficult to extricate it even with explicit instruction.

mrtksn 5 hours ago | parent | prev | next [-]

So far I've found that AI is very good at writing code, as in translating English into computer code.

Instead of dealing with the intricacies of writing the code directly, I explain to the AI what we are trying to achieve next and what approach I prefer. This way I stay on top of it: I can judge the quality of the code it generates, and I'm the one who integrates everything.

So far I've found the tools that are supposed to edit the whole codebase at once to be useless. I instantly lose perspective when the AI IDE fiddles with multiple code blocks and does some magic. The chatbot interface is superior for me, since control stays with me and I still follow the code writing step by step.

qudat 17 hours ago | parent | prev | next [-]

Are you using it only on massive codebases? It's much better with smaller codebases where it can put most of the code in context.

Another good use case is knowledge searching within a codebase. I find that to be incredibly useful without much context "engineering".

eloisant an hour ago | parent [-]

It's also good on massive codebases that include a lot of "good practices" examples.

Say you want to add new functionality, for example plugging into the shared user service that already exists in another service in the same monorepo; the AI will be really good at identifying an example and applying it to your service.

wubrr 12 hours ago | parent | prev | next [-]

I've generally had better luck when using it on new projects/repos. When working on a large existing repo, it's very important to give it good context/links/pointers to how things currently work (and how they should work) in that repo.

Also - Claude (roughly the best coding agent currently, IMO) will make mistakes, sometimes many of them. Tell it to test the code it writes and make sure it's working; I've generally found it's pretty good at debugging, testing, and fixing its own mistakes.

rprend 10 hours ago | parent | prev | next [-]

<1 year old startup with fullstack javascript monorepo. Hosted with a serverless platform with good devex, like cloudflare workers.

That’s the typical “claude code writes all my code” setup. That’s my setup.

This does require you to fit your problem to the solution. But when you do, the results are tremendous.

bojan 14 hours ago | parent | prev | next [-]

> I work on a large product with two decades of accumulated legacy, maybe that's the problem.

I'm in a similar situation, and for the first time ever I'm actually considering if a rewrite to microservices would make sense, with a microservice being something small enough an AI could actually deal with - and maybe even build largely on its own.

vanviegen 14 hours ago | parent [-]

If you're creating microservices small enough for a current-gen LLM to deal with well, you're creating way too many microservices. You'll be reminiscing about your two decades of accumulated legacy monolith with fondness.

themafia 14 hours ago | parent | prev | next [-]

> as long as actual complexity is low.

You can start there. Does it ever stay that way?

> I work on a large product with two decades of accumulated legacy

Survey says: No.

silisili 17 hours ago | parent | prev | next [-]

> I work on a large product with two decades of accumulated legacy, maybe that's the problem

Definitely. I've found Claude at least isn't so good at working in large existing projects, but great at greenfielding.

Most of my use these days is having it write specific functions and tests for them, which in fairness, saves me a ton of time.

moomoo11 14 hours ago | parent | prev | next [-]

You need to realize when you’re being marketed to and filter out the nonsense.

Now I use agentic coding a lot, with maybe an 80-90% success rate.

I'm on greenfield projects (my startup), and maintaining strict .md files with architecture decisions and examples helps a lot.

I barely write code anymore, and mostly code review and maintain the documentation.

In existing codebases, pre-AI, I think it's near impossible, because I've never worked anywhere that maintained documentation. It was always a chore.

tuhgdetzhh 17 hours ago | parent | prev | next [-]

Yes, unfortunately those who jumped on the microservices hype train over the past 15 years or so are now getting the benefits of Claude Code, since their entire codebase fits into the context window of Sonnet/Opus and can be "understood" by the LLM well enough to generate useful code.

This is not the case for most monoliths, unless they are structured into LLM-friendly components that resemble patterns the models have seen millions of times in their training data, such as React components.

manmal 15 hours ago | parent [-]

Well structured monoliths are modularized just like microservices. No need to give each module its own REST API in order to keep it clean.

randomtoast 3 hours ago | parent | next [-]

One problem is that the idea of being "well-structured" has gone overboard at some point over the past 20 years in many companies. As a result, many companies now operate highly convoluted monolithic systems that are extremely difficult to replace.

In contrast, a poorly designed microservice can be replaced much more easily. You can identify the worst-performing and most problematic microservices and replace them selectively.

tuhgdetzhh an hour ago | parent [-]

> One problem is that the idea of being "well-structured" has gone overboard at some point over the past 20 years

That's exactly my experience. While a well-structured monolith is a good idea in theory, and I'm sure such examples exist in practice, that has never been the case in any of my jobs. Friends working at other companies report similar experiences.

bccdee 5 hours ago | parent | prev | next [-]

Conversely, poorly-structured microservices are just monoliths where most of the code is in other repositories.

Yoric 12 hours ago | parent | prev [-]

I guess the benefit of microservices in this context is that they (often) live in distinct repositories, which makes it easier for Claude to ingest them entirely, or at least not get lost looking at the wrong directory.

cogman10 18 hours ago | parent | prev | next [-]

Honestly, if you've ever looked at a claude.md file, it seems like absolute madness. I feel like I'm reading affirmations from AA.

theshrike79 21 minutes ago | parent | next [-]

Way too many agent prompt files are just fan fiction or D&D character background documents that have no actual effect on what the agent does =)

manmal 15 hours ago | parent | prev [-]

It's magical incantations that may or may not protect you from bad behavior Claude learned from underqualified RL instructors. A classic instruction I have in CLAUDE.md is "Never delete a test. You are only allowed to replace it with a test that covers the same branches." and another is "Never mention Claude in a commit message". Of course those sometimes fail, so I also have a message hook that enforces a certain style of git commit messages.
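For illustration, such an enforcement hook might look like the following git commit-msg hook; the specific rules and wording are hypothetical:

```shell
# Write the hook script; install it as .git/hooks/commit-msg to activate it.
cat > commit-msg <<'HOOK'
#!/bin/sh
msg_file="$1"
# Reject any Claude attribution (e.g. "Co-authored-by: Claude ...").
if grep -qi "claude" "$msg_file"; then
    echo "commit-msg: remove AI attribution from the message" >&2
    exit 1
fi
# Enforce a short subject line.
subject=$(head -n 1 "$msg_file")
if [ "${#subject}" -gt 72 ]; then
    echo "commit-msg: subject line exceeds 72 characters" >&2
    exit 1
fi
exit 0
HOOK
chmod +x commit-msg

# Quick demonstration against two sample messages.
printf 'Fix flaky retry logic in uploader\n' > ok.txt
printf 'Add feature\n\nCo-authored-by: Claude <noreply@anthropic.com>\n' > bad.txt
./commit-msg ok.txt && echo "ok.txt accepted"
./commit-msg bad.txt || echo "bad.txt rejected"
```

git runs the hook with the path to the proposed message file and aborts the commit on a nonzero exit.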

Havoc 10 hours ago | parent [-]

> "Never mention Claude in a commit message". Of course those sometimes fail

It's hardcoded into the system prompt, which is why your CLAUDE.md approach fails. I ended up intercepting it via a proxy.

manmal 6 hours ago | parent [-]

Thanks for this idea!

junkaccount 15 hours ago | parent | prev [-]

Can you prove in a blog post that you write better code snippets than AI, and post it here? If you claim "what kind of codebase", you should be able to use some codebase from GitHub to prove it.