▲ | faangguyindia 5 days ago |
Anyone can build a coding agent that works on a) a fresh codebase and b) an unlimited token budget. Now build it for an old codebase, and let's see how precisely it edits or removes features without breaking the whole codebase, and how many tokens it consumes per bug fix or feature addition.
▲ | simonw 5 days ago | parent | next [-]
This comment belongs in a discussion about using LLMs to help write code for large existing systems - it's a bit out of place in a discussion about a tutorial on building coding agents to help people understand how the basic tools-in-a-loop pattern works.
▲ | pcwelder 5 days ago | parent | prev | next [-]
Agree. To reduce costs:

1. Precompute frequently used knowledge and surface it early: for example, repository structure, OS information, system time.

2. Anticipate the next tool call. If a match is not found while editing, return the closest matching snippet instead of simply failing; if the read-file tool gets a directory, return the directory contents.

3. Parallel tool calls. Claude needs either a batch tool or special scaffolding to promote parallel tool calls; a single tool call per turn is very expensive.

Are there any other such general ideas?
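Idea 2 could be sketched like this, assuming a string-replace style edit tool (`apply_edit` is a hypothetical helper, not from any particular agent): on a miss, slide a window over the file and hand the model the closest-matching snippet so it can correct itself in one turn instead of two.

```python
import difflib

def apply_edit(text: str, old: str, new: str) -> dict:
    """Replace `old` with `new` in `text`; on a miss, return the
    closest matching snippet so the model can retry immediately."""
    if old in text:
        return {"ok": True, "text": text.replace(old, new, 1)}

    # Slide a window the same number of lines as `old` over the file
    # and keep the candidate with the highest similarity ratio.
    window = max(len(old.splitlines()), 1)
    lines = text.splitlines()
    best, best_score = None, 0.0
    for i in range(len(lines) - window + 1):
        candidate = "\n".join(lines[i : i + window])
        score = difflib.SequenceMatcher(None, old, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return {"ok": False, "closest": best, "score": round(best_score, 2)}
```

For example, if the model asks to edit `return a+b` but the file actually contains `    return a + b`, the tool result includes that nearest snippet rather than a bare "not found" error.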
▲ | NitpickLawyer 5 days ago | parent | prev | next [-]
There's "SWE-rebench", a benchmark that tracks model release dates, so you can see how a model did on "real-world" bugs submitted to GitHub after the model was released (it obviously works best for open models). A few models solve 30-50% of new tasks pulled from real-world repos. So ... yeah.
▲ | righthand 5 days ago | parent | prev [-]
Surprise: as a rambunctious dev who’s socially hacked their way through promotions, I will just convince our manager that we need to rewrite the platform in a new stack, or that I need to write a new server to handle the feature. No old tech needed!