airstrike 7 months ago

That's...ridiculously fast.

I still feel like the best use of models we've seen to date is for brand-new code and quick prototyping. I'm less convinced of the strength of their capabilities for improving large preexisting code that someone has repeatedly iterated on.

Part of that is because, by definition, models cannot know what is not in a codebase and there is meaningful signal in that negative space. Encoding what isn't there seems like a hard problem, so even as models get smarter, they will continue to be handicapped by that lack of institutional knowledge, so to speak.

Imagine giving a large codebase to an incredibly talented developer and asking them to zero-shot a particular problem in one go, with only moments to read it and no opportunity to ask questions. More often than not, a less talented developer who is very familiar with that codebase will be able to add more value with the same amount of effort when tackling that same problem.

westoncb 7 months ago | parent | next [-]

The trick to this is you've got to talk to them and share this information in the same way. I can give an example. These days my main workflow is as follows: if I have some big feature/refactor/whatever I'm going to work on, I'll just start talking to o3 about it essentially as if it were a coworker, and (somewhat painstakingly) paste in the relevant source files it needs for context. We'll have a high-level discussion about what it is we're trying to build and how it relates to the existing code until I get the sense o3 has a clear and nuanced understanding (these discussions tend to sharpen my own understanding as well). Then I'll ask o3 to generate an implementation plan that describes what needs to happen across the codebase for whatever it is to be realized. I'll take that and hand it off to Codex, which might spend 10 minutes executing shell commands to read source, edit files, test, etc., and then I've got a PR ready, which sometimes takes a bit more manual editing and other times is perfectly ready to merge.
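For what it's worth, the plan I hand off is just plain prose/markdown, nothing formal. A rough sketch of the shape it usually takes (the feature and file names here are made up for illustration):

```
## Implementation plan: per-user rate limiting   (hypothetical example)

Context: middleware lives in src/server/middleware/, config in src/config.ts.

1. Add a rateLimit section to the config schema (max requests, window, burst).
2. New middleware rateLimit.ts: token bucket keyed by user id, in-memory for now.
3. Wire it into the request pipeline after auth, before routing.
4. Tests: unit-test the bucket math; integration-test that request N+1 gets a 429.
```

Codex then works through that step by step without needing the whole discussion replayed.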

What you're saying is true re: them needing rich context, too, but this isn't a fundamental limitation; it's just an aspect of what it takes to work with them effectively. There's definitely a learning curve, but once you've got it down it's not only very powerful but, for me anyway, a more enjoyable headspace to occupy than lots of lower-level manual editing.

Onawa 7 months ago | parent | next [-]

I would suggest trying the Continue.dev VSCode plugin for selective context injection. The plugin is Apache 2.0 licensed, and you can hook it up to any LLM API including local.

It has most of the same features as GitHub Copilot, plus a few extra features I find essential. It can scrape the documentation sites of individual libraries, which means you can do things like `@pandas @terminal @codebase Help me fix this error`.
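Hooking up a docs source is just a config entry, roughly like this in `~/.continue/config.json` (treat it as a sketch; the exact schema varies between plugin versions):

```json
{
  "models": [
    { "title": "Local model", "provider": "ollama", "model": "llama3" }
  ],
  "docs": [
    { "title": "pandas", "startUrl": "https://pandas.pydata.org/docs/" }
  ]
}
```

After that, `@pandas` in the chat pulls in the scraped documentation as context.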

For greenfield projects I will usually start out in a web-based chat interface, but the second I need to go back and forth between the IDE and the web, I switch over to the Continue.dev plugin.

westoncb 7 months ago | parent [-]

I’m pretty happy with Zed for development. I do plan on developing custom tooling around my style of workflow, but it’s not going to be part of an IDE.

dimitri-vs 7 months ago | parent | prev | next [-]

Interesting approach, I'm definitely going to steal your wording for "generate an implementation plan that...".

I do something similar but entirely within Cursor:

1. Create a `docs/feature_name_spec.md` and use voice-to-text to brain-dump what I am trying to do.

2. Open the AI chat panel in "Ask" mode while referencing that spec file, and ask (paste) a boilerplate snippet like: "1) Ask clarifying questions about intent, domain, restrictions, ambiguity or missing details. 2) Briefly identify any missing documents, data, or background information that would help you complete the task thoroughly."

3. Move that list of questions into the spec doc and answer them there, attach the files it asked for, and just rerun the above request (optionally switching to a different model, like gemini-2.5-pro -> o3, for a different perspective).

4. Ask it to make an execution plan. At that point I have a fully spec'd-out feature and documented business logic, and I either use Edit mode on each step or Agent mode.
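For illustration, by the end of step 3 the spec file looks roughly like this (contents invented):

```markdown
# Feature: bulk-edit listings   (docs/feature_name_spec.md)

## Brain dump
Sellers want to update price/quantity on many listings at once...

## Clarifying questions (from the model)
1. Should edits be atomic across all selected listings? -> No, per-listing.
2. Is there an existing background-job system to reuse? -> Yes, see tasks/.

## Execution plan
1. ...
```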

That's for more complex features touching many files, or for refactors. For simpler things I essentially do a lighter version of that within the same chat, editing my original prompt until I'm confident I've explained myself well.

westoncb 7 months ago | parent | next [-]

I spend so much time just finding/moving context pieces around these days that I bought a physical macro pad and have been thinking about designing some software specifically to make this quicker: basically rapidly finding/selecting context pieces, loading them into buffers, and relaying them to the conversation context. I think it'll have to be backed by agentic search and voice controlled, and I'm not sure how to best integrate it with possible consumers... I dunno if that makes sense. I started building it and realized I need to think on the design a bit more, so I'm building more of the infrastructure pieces for now.

rcarmo 7 months ago | parent | prev [-]

That's very close to my workflow: https://taoofmac.com/space/blog/2025/05/13/2230

blurrybird 7 months ago | parent [-]

I’d love to watch a video of this playing out.

landl0rd 7 months ago | parent | prev | next [-]

This is absolutely the best way to do it. However, it's also infeasible under the number-of-queries-based quotas most front-ends have. And of course running models like o3 and 4-opus through the API is basically always way more expensive. Hence the desire for one-shotting stuff.

jacob019 7 months ago | parent | prev | next [-]

I find myself using a similar workflow with Aider. I'll use chat mode to plan, adjust context, enable edits, and let it go. I'll give it a broad objective and tell it to ask me questions until the requirements are clear, then give me a planning summary. Flipping the script is especially helpful when I'm unsure what I actually want.
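Roughly, a session looks like this (paraphrased, not a verbatim log; file names made up):

```
$ aider src/pricing.py src/models.py
/ask I want to add tiered volume discounts. Ask me questions until the
     requirements are clear, then give me a planning summary.
  ... answer its questions, iterate on the plan ...
/add src/schemas.py
/code Implement step 1 of the plan only.
```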

ckw 7 months ago | parent | prev [-]

I do the same thing, though sometimes I take one extra step to elaborate on the first implementation plan ‘in minute detail such that a weaker model could successfully implement it’, with deep research selected.

ManuelKiessling 7 months ago | parent | prev | next [-]

"...what is not in a codebase, and there is meaningful signal in that negative space."

Man, I've been writing software for money for decades now, but this fundamental truth never occurred to me, at least not consciously and with such clarity.

So, thank you!

spuz 7 months ago | parent | next [-]

I am not certain that I agree with this. If there are alternative ways of solving a problem that were not taken, then these should be documented in comments. A mantra I try to tell myself and my colleagues: if information exists in your brain and nowhere else, then write it down _somewhere_. If I tried 5 different libraries before settling on one, then I write in comments which libraries I tried but didn't work and why. If I used a particular tool to debug a race condition, then I put a link to a wiki page on how to use it in the comments. If we have one particular colleague who is an expert in some area, then I write their name in a comment. Basically anything that is going to save future developers' time should be written down.
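For instance, the kind of comment I mean (the specifics here are invented):

```python
import httpx

# Library choice: httpx, not requests or aiohttp.
#  - requests: no async support, and this call sits on a hot path
#  - aiohttp: worked, but its timeout semantics bit us in prod (see INC-1432)
# Debugging: the race in the retry logic was found with py-spy; see the
# "Debugging stuck workers" wiki page. Ask J. Doe (payments team) before
# touching the backoff constants, downstream billing depends on them.
async def fetch_invoice(client: httpx.AsyncClient, invoice_id: str) -> dict:
    resp = await client.get(f"/invoices/{invoice_id}", timeout=10.0)
    resp.raise_for_status()
    return resp.json()
```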

david-gpu 7 months ago | parent | next [-]

Agreed. IMO it's always a good idea to document design choices.

The owner can write down the problem, a few solutions that were considered, why they were chosen/rejected, and a more detailed description of the final design. Stakeholders then review and provide feedback, and after some back and forth all eventually sign off the design. That not only serves to align the organization, but to document why things were done that way, so that future hires can get a sense of what is behind the code, and who was involved in case they have more questions.

This was how we did things at some $BigCorps and it paid dividends.

jonahx 7 months ago | parent | prev [-]

What are you disagreeing with?

Even if you do this (and it's good practice!), it is, empirically, not done in the vast majority of codebases.

And even if you succeed with the utmost diligence, a vastly greater number of decisions (those you were not even aware of consciously, or took for granted) will remain undocumented but still be quite real in this "negative space" sense.

airstrike 7 months ago | parent [-]

Exactly. I couldn't have said it better.

airstrike 7 months ago | parent | prev | next [-]

My pleasure ;-) I borrowed the term from art: https://www.michaelalfano.com/tag/negative-space/?id=400

shahar2k 7 months ago | parent [-]

I'm an artist who works on pre-production, fast-turnaround animations for films, and yeah, that hits the nail on the head: knowing what NOT to do and which elements not to focus on is a majority of the power that comes with experience. I'm fast because I know which corners can be cut best and how to illustrate what I need to.

woctordho 7 months ago | parent | prev | next [-]

Then document it. Whenever you choose one algorithm/library/tech stack but not another, write your reasoning down in the documentation.

ManuelKiessling 7 months ago | parent | next [-]

The funny thing is that I have at least a dozen comments in my current codebase where I explain in detail why certain things are not put in place or are not served via other-solution-that-might-seem-obvious.

7 months ago | parent | prev [-]
[deleted]
stef25 7 months ago | parent | prev | next [-]

I understand what negative space is in art. Can you explain how this applies to writing software ?

skydhash 7 months ago | parent [-]

A quick example is a basic 2D game. If you're not using an engine (just a graphics library) and you have some animations, experience will tell you not to write most of the code with raw numbers only. More often than not, you will write a quick vector module, just as you will use local origins for transformations.

But more often than not, the naive code is the result of not doing the above and just writing the feature. It technically does the job, but it’s verbose and difficult to maintain.
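Concretely, it's the difference between repeating ad-hoc math on bare numbers in every sprite update and writing a tiny module like this once (a toy sketch, not engine code):

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Vec2:
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def scale(self, k: float) -> "Vec2":
        return Vec2(self.x * k, self.y * k)

    def rotate(self, angle: float) -> "Vec2":
        c, s = math.cos(angle), math.sin(angle)
        return Vec2(c * self.x - s * self.y, s * self.x + c * self.y)

# Animations now compose: spin an offset around the sprite's local origin,
# then translate it into world space, instead of redoing the trig inline.
local_offset = Vec2(8.0, 0.0)
world_pos = Vec2(120.0, 80.0) + local_offset.rotate(math.pi / 4)
```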

So just like in drawing, you need to think holistically about the program. Every line of code should support an abstraction. And that will dictate which code to write and which to not write.

That’s why you often see the concept of patterns in software. The code is not important. The patterns are. The whole structure, more so. Code is just what shapes them.

lukan 7 months ago | parent [-]

I have written 2D games, but maybe the metaphor is just lost on me, or I simply disagree with its usefulness here.

Negative space in art achieves a certain effect. Like in the linked sibling comment, the empty space is part of the sculpture.

So the empty space has purpose and meaning.

But if I didn't choose a certain library... the empty place of that library serves no function. It does change my code and might make my dev life easier or harder, but it has no meaning in itself for the result.

collingreen 7 months ago | parent | next [-]

Let me take a crack at it.

I think the negative space metaphor in software can be in the shape of the abstractions and hitting the sweet spot of making the right things easy/railroaded while not over engineering it.

In visual art, negative space is part of the layout and the visual journey. It helps define the relationships between things as much as those things themselves and, used judiciously, is one of the differences between elegance and clutter.

I think "not choosing a library" is important info but isn't the same thing as negative space and is instead more like restrictions, framing, or limitation. You can do a lot with what isn't shown but in this area I think good art and good software diverge in goals - to me good art makes me think or feel or speculate while good software instead makes me understand with as little of those other things as possible.

The caveat here might be not choosing things for very good but not obvious reasons, which should be loudly documented. Things like licensing, other external influences, or specific hardware requirements, maybe. For example, I once banned the creation of a GraphQL API in a product that could have benefited from it, because we still needed to support the existing API for third parties forever, so the suggestion to replace the API was actually secretly a suggestion to maintain two APIs in lockstep.

skydhash 7 months ago | parent [-]

Yes, the code itself is not actually important, as two different teams will solve the same problem in different manners, just like a great painting and a bad one can use the same base materials. What's important is the purpose and the constraints of any solution. Any decision you take propagates down the timeline and outward in the project, and it precludes other decisions from being taken.

So whatever you do will leave a mark. But there are some spaces that should not be filled in: while it may look nice in the moment or taken in isolation, when looking at the whole it makes a mess.

skydhash 7 months ago | parent | prev [-]

I’m talking more about architecting code instead of naively writing it. The same point can be made about libraries, but the considerations are more subjective.

Most naive approaches to writing software look like assembly. But instead of opcodes, you have library functions. We moved away from assembly and assembly-like programming because it's essentially one-shot: any modification to the program is difficult and/or tedious. So instead of having one blob of instructions, we introduce gaps so that it becomes more flexible. We have functions, objects, modules... but the actual links between them still need to be shaped.

A library can have some influence on the shape, but it is minor if you favor the solution over the means. But sometimes you see people really going hard to fill the gaps of the shape, and that’s when you start to shout KISS and YAGNI. Sometimes they want to alter the shape and you bring out SOLID and other principles…

lukan 7 months ago | parent [-]

"I’m talking more about architecting code instead of naively writing them."

Yeah, we are talking about code designing.

And I got my head filled with all the design patterns back in university, but my first bigger real-world projects were somehow horribly overengineered and still inflexible. And I don't think it was just lack of experience.

Nowadays I prefer a very, very simple and clear approach.

No dark empty space I want to design around.

No clever hidden layers that prevent the introduction of a pragmatic new API.

I guess I get what you probably mean, and it ain't that, but to me it has too much of the vibe of the time when I was amazed at myself for coming up with a seemingly super clever (complex) design that sounded great in theory.

skydhash 7 months ago | parent [-]

Yes, simplicity is always important, but it does not equate to easiness. The axis of simple to complex is independent of the axis of easy to hard. It may be easy to apply patterns blindly to your codebase and make it complex, just as it is easy to write naive and simple code that then becomes difficult to work with.

The mark of a good programmer is to balance all of these so that it’s easy to work with the codebase on an ongoing basis. And more often than not it’s similar to the sketching process. At each stage, you get enough feedback to judge the right direction for the next iteration. You do not start with all the details, nor with careless doodling. But one aspect that is often overlooked with artists is how often they practice to get that judgement capability.

lukan 7 months ago | parent [-]

"At each stage, you get enough feedback to judge the right direction for the next iteration."

Depends on the project, I would say. What do you do if all of a sudden the requirements change again? Or the platform evolved/degraded? Then you compromise, and I can compromise better with a simple solution. And I would never claim simple equals easy; rather the opposite. Like you said, it is easy to make complex things. Also, I never applied design patterns for the sake of it (even though it might have sounded like it); KISS was part of the theories as well... but I valued and emphasized cleverness too much, as I thought that was the way it is supposed to be done.

My summary is: simple, direct solutions are to be preferred, and trying to be clever is not very clever.

I'd rather have 3 lines of code than one compressed clever line that no one can understand on first read. And the same goes for the bigger design picture.

airstrike 7 months ago | parent [-]

Re: this whole conversation, you might find this quick video a worthwhile watch https://www.youtube.com/watch?v=wrwxC9taL8w

7 months ago | parent [-]
[deleted]
FieryTransition 7 months ago | parent | prev [-]

There's a reason why less is called less, and not more.

8n4vidtmkvmk 7 months ago | parent | prev | next [-]

That's not been my experience so far. LLMs are good at mimicking existing code; they don't usually bring in new things when not asked. Sometimes I have to go out of my way to point to other bits of code in the project to copy from, because they haven't ingested enough of the codebase.

That said, a negative prompt like we have in stable diffusion would still be very cool.

Incipient 7 months ago | parent [-]

I'm in the camp of 'no good for existing'. I try to get ~1000-line files refactored to use different libraries, design paradigms, etc., and it usually outputs garbage: pulling DB logic into the UI, grabbing unrelated API/function calls, or outright corrupting the output.

I'm sure there is a way to correctly use this tool, so I'm feeling like I'm "just holding it wrong".

fragmede 7 months ago | parent | next [-]

Which LLM are you using? what LLM tool are you using? What's your tech stack that you're generating code for? Without sharing anything you can't, what prompts are you using?

Incipient 7 months ago | parent [-]

Was more of a general comment - I'm surprised there is significant variation between any of the frontier models?

However: VS Code with various Python frameworks/libraries (Dash, FastAPI, pandas, etc.), typically passing the 4-5 relevant files in as context.

I'm developing via Docker, so I haven't found a nice way for agents to work.

fragmede 7 months ago | parent | next [-]

> I'm surprised there is significant variation between any of the frontier models?

This comment of mine is a bit dated, but even the same model can have significant variation if you change the prompt by just a few words.

https://news.ycombinator.com/item?id=42506554

danielbln 7 months ago | parent | prev [-]

I would suggest using an agentic system like Cline, so that the LLM can wander through the codebase by itself, do research, and build a "mental model", then set up an implementation plan. Then you iterate on that and hand it off for implementation. This flow works significantly better than what you're describing.

otabdeveloper4 7 months ago | parent [-]

> LLM can wander through the codebase by itself and do research and build a "mental model"

It can't really do that due to context length limitations.

exe34 7 months ago | parent | next [-]

It doesn't need the entire codebase; it just needs the call map, the function signatures, etc. It doesn't have to include everything in a call, but having access to all of it means it can pick what seems relevant.
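Even a crude repo map is cheap to produce. Something like this sketch gets you one line per function signature, which is what the model actually needs up front:

```python
import ast
from pathlib import Path

def repo_map(root: str) -> str:
    """Compact outline of a Python repo: one line per function/method signature."""
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"{path}:{node.lineno} def {node.name}({args})")
    return "\n".join(lines)

# Hand the outline to the model as context; it can then ask for (or open)
# only the definitions that look relevant to the task.
print(repo_map("src"))
```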

danielbln 7 months ago | parent | next [-]

Yes, that's exactly right. The LLM gets a rough overview over the project (as you said, including function signatures and such) and will then decide what to open and use to complete/implement the objective.

otabdeveloper4 7 months ago | parent | prev [-]

In a real project the call map and function signatures are millions of tokens themselves.

exe34 7 months ago | parent [-]

For sufficiently large values of real.

otabdeveloper4 7 months ago | parent [-]

Anything less is not a "project", it's a "file".

exe34 7 months ago | parent [-]

That's right, there is no true Scotsman!

otabdeveloper4 7 months ago | parent [-]

Incorrect attempt at fallacy baiting.

If your repo map fits into 1000 tokens then your repo is small enough that you can just concatenate all the files together and feed the result as one prompt to the LLM.

No, current LLM technology does not allow processing actual (i.e. large) repos.

simonw 7 months ago | parent [-]

Where's your cutoff for "large"?

johnisgood 7 months ago | parent | prev | next [-]

1k LOC is perfectly fine, I did not experience issues with Claude with most (not all) projects around ~1k LOC.

otabdeveloper4 7 months ago | parent [-]

Actual projects where you'd want some LLM help start with millions of lines of code, not thousands.

With 1k lines of code you don't need an LLM, the entire source code can fit in one intern's head.

johnisgood 7 months ago | parent | next [-]

The OP mentioned having LLM issues with 1k LOC, so I suppose he would have problems with millions. :D

simonw 7 months ago | parent | prev [-]

Have you tried Claude Code yet?

Even with its 200,000-token limit it's still really impressive at diving through large codebases using find and grep.

lukan 7 months ago | parent | prev [-]

I guess people are talking about different kinds of projects here in terms of project size.

jacob019 7 months ago | parent | prev | next [-]

I've refactored some files over 6000 LOC. It was necessary to do it iteratively with smaller patches ("Do not attempt to modify more than one function per iteration"), because it would just gloss over stuff. I would tell it repeatedly, "I noticed you missed something, can you find it?" and kept doing that until it couldn't find anything. Then I had to manually review and ask for more edits. Also lots of style guidelines and scope-limit instructions. In the end it worked fine and saved me hours of really boring work.

landl0rd 7 months ago | parent | prev [-]

I'll back this up. I feel constantly gaslit by people who claim they get good output.

I was hacking on a new project and wanted to see if LLMs could write some of it. So I picked an LLM-friendly language (Python). I picked an LLM-friendly DB setup (SQLAlchemy and Postgres). I used typing everywhere. I pre-made the DB tables and pydantic schema. I used an LLM-friendly framework (FastAPI). I wrote a few example repositories and routes.

I then told it to implement a really simple repository and routes (users stuff) from a design doc that gave strict requirements. I got back a steaming pile of shit. It was utterly broken. It ignored my requirements. It fucked with my DB tables. It fucked with (and broke) my pydantic. It mixed db access into routes which is against the repository pattern. Etc.
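(For anyone unfamiliar with the pattern, the shape I wanted is roughly the sketch below, with the usual User model, UserRead schema, and get_user_repository dependency assumed to be defined elsewhere; the names are placeholders.)

```python
from fastapi import APIRouter, Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

# Repository: the only layer allowed to touch the DB session.
class UserRepository:
    def __init__(self, session: AsyncSession):
        self.session = session

    async def get_by_email(self, email: str):
        result = await self.session.execute(select(User).where(User.email == email))
        return result.scalar_one_or_none()

router = APIRouter()

# Route: no SQL, no session juggling; it only talks to the repository.
@router.get("/users/{email}")
async def get_user(email: str, repo: UserRepository = Depends(get_user_repository)):
    user = await repo.get_by_email(email)
    return UserRead.model_validate(user)
```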

I tried several of the best models from claude, oai, xai, and google. I tried giving it different prompts. I tried pruning unnecessary context. I tried their web interfaces and I tried cursor and windsurf and cline and aider. This was a pretty basic task I expect an intern could handle. It couldn't.

Every LLM enthusiast I've since talked to just gives me the run-around on tooling and prompting and whatever. "Well maybe if you used this eighteenth IDE/extension." "Well maybe if you used this other prompt hack." "Well maybe if you'd used a different design pattern."

The fuck?? Can vendors not produce a coherent set of usage guidelines? If this is so why isn't there a set of known best practices? Why can't I ever replicate this? Why don't people publish public logs of their interactions to prove it can do this beyond a "make a bouncing ball web game" or basic to-do list app?

simonw 7 months ago | parent [-]

> Why don't people publish public logs of their interactions to prove it can do this beyond a "make a bouncing ball web game" or basic to-do list app?

It's possible I've published more of those than anyone else. I share links to Gists with transcripts of how I use the models all the time.

You can browse a lot of my collection here: https://simonwillison.net/search/?q=Gist&sort=date

Look for links that say things like "transcript".

manmal 7 months ago | parent | prev | next [-]

They could read the whole git history and have all issue-tracker tickets in the context, and maybe even recordings from meetings. It remains to be seen, though, whether such a large context will yield usable results.

eMPee584 7 months ago | parent | next [-]

This. `git blame` (or tig!) and `git log -p --stat -S SEARCHSTR` are extremely powerful for understanding the what, why, and when of code.

Cthulhu_ 7 months ago | parent | prev | next [-]

I find most meetings I'm in nowadays are mostly noise; there's no clear "signal" that "this is the outcome", which I think is what an AI should be able to filter out.

Of course, it'd be even better if people communicated more clearly and succinctly.

manmal 7 months ago | parent [-]

Maybe time to find an employer with a better culture? I rarely have meetings that I would be comfortable skipping.

internet_points 7 months ago | parent | prev | next [-]

That also leads to more noise and opportunities to get lost in the woods.

ttoinou 7 months ago | parent | prev [-]

Do we already have tools to do that automagically?

manmal 7 months ago | parent [-]

Yes, there are MCPs for git and Jira. I'm not sure about their utility given current context sizes.

7 months ago | parent | prev | next [-]
[deleted]
aposm 7 months ago | parent | prev | next [-]

A human working on an existing codebase does not have any special signal about what is _not_ in a codebase. Instead, a (good) human engineer can look at how a problem is handled and consider why it might have been done that way vs other options, then make an educated decision about whether that alternative would be an improvement. To me this seems like yet another piece of evidence that these models are not doing any "reasoning" or problem-solving.

ec109685 7 months ago | parent | prev | next [-]

If you make models fast enough, you can onboard that expert developer instantly and let them reason their way to a solution, especially when given access to a RAG too.

Over time, models will add more memory and institutional-knowledge capture rather than starting from a blank slate each time.

airstrike 7 months ago | parent [-]

I thought of that as I wrote my comment, but I think the infrastructure and glue to make that possible in a consistent, fast and scalable way is still a few years out.

lucasacosta_ 7 months ago | parent [-]

Definitely. For now the "frontier-level" papers (working with repository-level code maintenance) necessarily depend on previously (and statically) generated Code Knowledge Graphs or snippet-retrieval systems, which makes the scalable and fast aspects complicated, as any change in the code means a change in the graph, hence a rebuild. And given the context limit, you need to rely on graph queries to surface the relevant parts, so at the end of the day the model just reads snippets instead of the full code, which makes consistency an issue, as it can't learn from the entirety of the code.

Papers I'm referring to (just some as example, as there're more):

- CodexGraph [https://arxiv.org/abs/2408.03910] - Graph

- Agentless [https://arxiv.org/abs/2407.01489] - Snippet-Retrieval

airstrike 7 months ago | parent [-]

Thanks for these links. I really appreciate it.

Flemlo 7 months ago | parent | prev | next [-]

But plenty of companies have already been doing this for a decade and more:

having an old, shitty code base and not retaining the people who built it.

I have done that too, despite the creator sitting only 100 km away. The code was shit as hell: tons of copy&paste and different login logic in different endpoints.

Finally, it's worth it to have ADRs and similar things.

Flemlo 7 months ago | parent | prev | next [-]

An LLM could easily use its own knowledge to create a list of things to check inside the code base, generate a fact sheet, and use best practices and similar knowledge to extend it.

Just because one query might not be able to do so doesn't mean there are no ways around it.

mejutoco 7 months ago | parent | prev | next [-]

> Part of that is because, by definition, models cannot know what is not in a codebase and there is meaningful signal in that negative space

I wonder if git history would be enough to cover this. At the very least, it has the alternatives that were tried and the code that was removed.

scotty79 7 months ago | parent | prev | next [-]

> they will continue to be handicapped by that lack of institutional knowledge, so to speak

Until we give them access to all Jira tickets instead of just one so they know what's missing.

campers 7 months ago | parent [-]

I've been thinking about adding an agent to our Codex/Jules-like platform that goes through the git history of the main files being changed, extracts the Jira ticket IDs, and looks through them for additional context, along with analyzing the changes to other files in those commits.
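The extraction half is the easy bit. A rough sketch of what I have in mind (the ticket-ID regex and paths are made up; the Jira lookup itself would go through its API or an MCP server):

```python
import re
import subprocess

TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g. PROJ-1234

def tickets_for_file(path: str, limit: int = 50) -> set[str]:
    """Collect Jira ticket IDs mentioned in recent commit messages touching a file."""
    log = subprocess.run(
        ["git", "log", f"-n{limit}", "--format=%s %b", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(TICKET_RE.findall(log))

# The resulting IDs get resolved to ticket descriptions/comments and added
# to the coding agent's context alongside the diffs from those commits.
print(tickets_for_file("src/billing/invoice.py"))
```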

nopinsight 7 months ago | parent | prev [-]

...which is why top LLM providers' web apps like ChatGPT, Claude.ai, and Gemini try to nudge you to connect Google Drive and, where appropriate, GitHub repos. They also allow the user/dev to provide feedback to revise the results.

All the training and interaction data will help make them formidable.