faangguyindia 5 days ago

Anyone can build a coding agent that works on (a) a fresh codebase and (b) when you have an unlimited token budget.

Now build it for an old codebase, and let's see how precisely it edits or removes features without breaking the whole codebase.

Let's see how many tokens it consumes per bug fix or feature addition.

simonw 5 days ago | parent | next [-]

This comment belongs in a discussion about using LLMs to help write code for large existing systems - it's a bit out of place in a discussion about a tutorial on building coding agents to help people understand how the basic tools-in-a-loop pattern works.

faangguyindia 5 days ago | parent [-]

Anyone who has used those coding agents can already see how they work: you can usually see the agent fetching files, running commands, and listing files and directories.

I just wrote this comment so people aren't under the false belief that this is pretty much all coding agents do; making all of this fault tolerant with good UX is a lot of work.

ghuntley 4 days ago | parent [-]

> making all of this fault tolerant with good UX is a lot of work.

Yes, it is. Not only in the department of good UX design, but also because these LLMs keep evolving. They are software with different versions, and those versions are continually deployed, which changes the behavior of the underlying model. So the harness needs to be continually updated to remain competitive.

pcwelder 5 days ago | parent | prev | next [-]

Agree. To reduce costs:

1. Precompute frequently used knowledge and surface it early: for example, repository structure, OS information, and system time.

2. Anticipate the next tool calls. If a match is not found while editing, return the closest matching snippet instead of simply failing; if the read-file tool gets a directory, return the directory contents (a rough sketch follows below).

3. Parallel tool calls. Claude needs either a batch tool or special scaffolding to encourage parallel tool calls; a single tool call per turn is very expensive.

Are there any other such general ideas?
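
As an illustration of point 2, a fuzzy fallback for a hypothetical edit_file tool might look roughly like this (the function name, window logic, and cutoff are my own assumptions, not taken from any particular agent):

    import difflib
    from pathlib import Path

    def edit_file(path: str, old: str, new: str) -> str:
        """Replace `old` with `new` in a file. If `old` is not found verbatim,
        return the closest matching snippet rather than a bare failure, so the
        model's next attempt does not need an extra exploratory round trip."""
        text = Path(path).read_text()
        if old in text:
            Path(path).write_text(text.replace(old, new, 1))
            return f"OK: replaced 1 occurrence in {path}"
        # Fuzzy fallback: compare `old` against windows with the same line count.
        lines = text.splitlines()
        window = max(1, len(old.splitlines()))
        candidates = [
            "\n".join(lines[i:i + window])
            for i in range(max(1, len(lines) - window + 1))
        ]
        close = difflib.get_close_matches(old, candidates, n=1, cutoff=0.5)
        if close:
            return "No exact match. Closest snippet found was:\n" + close[0]
        return "No exact or close match found in the file."

The point is that the tool's failure message already answers the model's likely follow-up question, which saves a turn.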

faangguyindia 5 days ago | parent [-]

That info can just be included in the prefix, which is cached by the LLM, reducing cost by 70-80% on average. System time varies, so it's not a good idea to put it in the prompt; better to expose it as a function to avoid cache invalidation.
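
A minimal sketch of that split, assuming a git repository and a provider that caches the prompt prefix (the tool definition follows the common JSON-schema style for tool calling; all names here are illustrative):

    import platform
    import subprocess
    from datetime import datetime, timezone

    def build_static_prefix(repo_root: str = ".") -> str:
        """Everything here is identical on every turn, so it can live in the
        provider-cached prompt prefix. Anything that changes between turns
        (like the clock) is deliberately left out."""
        tree = subprocess.run(
            ["git", "ls-files"], cwd=repo_root, capture_output=True, text=True
        ).stdout
        return (
            "You are a coding agent.\n"
            f"OS: {platform.system()} {platform.release()}\n"
            "Repository files:\n" + tree
        )

    # System time is exposed as a tool instead of being written into the prompt,
    # so the prefix bytes never change between turns and the cache stays warm.
    TIME_TOOL = {
        "name": "get_current_time",
        "description": "Return the current UTC time as an ISO-8601 string.",
        "input_schema": {"type": "object", "properties": {}},
    }

    def get_current_time() -> str:
        return datetime.now(timezone.utc).isoformat()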

I am still looking for a good "memory" solution; so far I'm running without one. Haven't looked too deeply into it.

Not sure how the next tool call can be predicted.

I am still using serial tool calls since I don't have any subagents; I just use fast inference models for direct tool calls. It works so fast that I doubt I'll benefit from parallelizing anything.
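
For anyone who does want parallel calls (point 3 upthread), dispatching every tool call from one assistant turn concurrently is only a few lines; a rough sketch with an assumed tool registry and call format:

    import os
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical registry; a real agent would register its own callables.
    TOOL_REGISTRY = {
        "read_file": lambda args: open(args["path"]).read(),
        "list_dir": lambda args: "\n".join(os.listdir(args["path"])),
    }

    def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
        """Execute all tool calls emitted in a single assistant turn concurrently
        and return the results in order, tagged with their call ids."""
        def run_one(call: dict) -> dict:
            try:
                result = TOOL_REGISTRY[call["name"]](call["args"])
            except Exception as exc:  # report errors to the model instead of crashing
                result = f"error: {exc}"
            return {"id": call["id"], "result": result}

        with ThreadPoolExecutor(max_workers=8) as pool:
            return list(pool.map(run_one, tool_calls))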

NitpickLawyer 5 days ago | parent | prev | next [-]

There's "swe re-bench", a benchmark that tracks model release dates, and you can see how the model did for "real-world" bugs that got submitted on github after the model was released. (obviously works best for open models).

There are a few models that solve 30-50% of (new) tasks pulled from real-world repos. So ... yeah.

righthand 5 days ago | parent | prev [-]

Surprise: as a rambunctious dev who has socially hacked their way through promotions, I will just convince our manager that we need to rewrite the platform in a new stack, or convince them that I need to write a new server to handle the feature. No old tech needed!