Zanfa 5 days ago

LLMs would also need to use historic commits as context, rather than just the current state of the codebase in isolation. Most codebases I've worked with go through migrations from a legacy pattern A to a newer and better pattern B, used across different parts of the codebase. Rarely can these migrations be done in a single go, so both patterns tend to stick around for a while as old code is revisited. Like the HTTP example, even if LLMs pick up a pattern to follow (which they often don't), it's a coin flip whether they pick the right one or not.

dwd 5 days ago | parent | next [-]

This...

I once worked on a massive codebase that had survived multiple acquisitions, renames and mergers over a 20-year period. By the time I left it had finally passed into the hands of a Fortune 500 global company.

You would often find code matching an API call you needed that was last updated in the mid-2000s. There was a good chance it was not the most recent code for that task, but it still existed because some bespoke function used by a single client needed it.

There could also be similar API calls with no documentation, and you had to pick the one that returned the data fields that you wanted.

antihero 5 days ago | parent | prev | next [-]

You can craft a nice CLAUDE.md saying "write code like this bit, avoid writing code like this legacy bit", etc.

manmal 5 days ago | parent | prev | next [-]

Better to tell them exactly how this and that is done, with some examples.

croes 5 days ago | parent | prev | next [-]

But that kind of awareness is what vibe coders often lack.

Many didn’t code (much) before.

anshumankmr 5 days ago | parent | prev [-]

That would assume a commit message is written properly, and isn't something like "Updated this file" or "Bugfix".

wldlyinaccurate 5 days ago | parent [-]

I think the parent comment means "commits" in the sense of the actual changeset; not just the message.

anshumankmr 5 days ago | parent [-]

That is also problematic, because feeding in git diffs will probably require a much longer context window AND also the ability for the LLM to use said context effectively.

That being said, the context length problem could potentially be solved, but it will take a bit of time. I think Llama 4 had a 10M context length (not sure if anyone has tried prompting it with that much data to see how effective it really is).

tayo42 5 days ago | parent [-]

Do all of the diffs need to be included? Can't you include like a summarized version of a few changes?

Like I don't memorize the last 20 commits, but I know generally the direction things are going by reading those commits at some point
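A rough sketch of what a "summarized version of a few changes" could look like, assuming you only want commit subjects plus a tally of which top-level paths keep getting touched. The function names, the `@` header marker and the character budget are all made up for illustration; only the `git log` flags are real.

```python
import subprocess
from collections import Counter


def read_recent_log(repo_dir: str, limit: int = 20) -> str:
    """Return `git log` output: one '@<hash><TAB><subject>' header per commit,
    followed by the paths that commit touched."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--max-count={limit}",
         "--name-only", "--pretty=format:@%h%x09%s"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_recent_commits(log_text: str, max_chars: int = 2000) -> str:
    """Condense the log text into a short block suitable for a prompt."""
    subjects = []
    touched = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # blank separator between commits
        if line.startswith("@"):
            # Header line (assumes no file path starts with '@').
            _hash, _, subject = line[1:].partition("\t")
            subjects.append(subject)
        else:
            # File path: count it under its top-level directory or file name.
            touched[line.split("/", 1)[0]] += 1

    parts = ["Recent commit subjects (newest first):"]
    parts += [f"- {subject}" for subject in subjects]
    parts.append("Most-touched top-level paths (file changes):")
    parts += [f"- {path} ({count})" for path, count in touched.most_common(5)]
    # Hard cut at the budget; this may truncate the last line mid-way.
    return "\n".join(parts)[:max_chars]
```

Something like `summarize_recent_commits(read_recent_log("."))` would then go at the top of the prompt. It only captures direction of travel, not the diffs themselves, so it is no better than the commit subjects it is fed.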

anshumankmr 5 days ago | parent [-]

If a commit was made a year or so back, then 20 commits would probably prove insufficient. And say a team member is supposed to use some existing helper method already present in the codebase: it is easier to tell a person to use it than to stop an LLM from writing another function that performs the same operation, which is inefficient.

And even if you juiced up the context length of an LLM to astronomical numbers AND made it somehow better at parsing and understanding its context, it will not always repeat said capabilities in other codebases (see for example o3 supposedly being at the top of most benchmarks, yet it will still fumble a simple variation of the mother-is-a-surgeon puzzle).

I am not saying it's impossible for a company to figure this out, but it will be incredibly hard.