pcwelder 5 days ago:
Agree. To reduce costs:

1. Precompute frequently used knowledge and surface it early: repository structure, OS information, system time.

2. Anticipate the next tool call. If the edit tool's exact match fails, return the closest matching snippet instead of simply failing (see the fuzzy-match sketch below). If the read-file tool is given a directory, return the directory's contents.

3. Parallel tool calls. Claude needs either a batch tool or special scaffolding to encourage parallel tool calls; a single tool call per turn is very expensive (see the batch-tool sketch after the list).

Are there any other general ideas like these?
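On point 2, a minimal sketch of the closest-match fallback, assuming a line-based edit tool; the function name and simple windowing are illustrative, not any particular agent's implementation:

  import difflib

  def find_closest_snippet(file_text: str, target: str):
      # Slide a window the size of the target over the file and keep the
      # best difflib similarity, so a failed exact match can return a
      # useful "did you mean" snippet instead of a bare error.
      lines = file_text.splitlines()
      n = max(1, len(target.splitlines()))
      best_ratio, best_snippet = 0.0, None
      for i in range(len(lines) - n + 1):
          candidate = "\n".join(lines[i:i + n])
          ratio = difflib.SequenceMatcher(None, candidate, target).ratio()
          if ratio > best_ratio:
              best_ratio, best_snippet = ratio, candidate
      return best_ratio, best_snippet  # (similarity, snippet or None)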
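And on point 3, one way to approximate a batch tool is a single meta-tool the model calls with several sub-calls at once, executed concurrently server-side. The tool names below are stand-ins; real tools would do non-blocking IO:

  import asyncio

  async def read_file(path: str) -> str:                 # stand-in tool
      return open(path).read()

  async def grep(pattern: str, path: str) -> list[str]:  # stand-in tool
      return [ln for ln in open(path) if pattern in ln]

  TOOLS = {"read_file": read_file, "grep": grep}

  async def batch(calls: list[dict]) -> list:
      # One model turn, many tool executions: run each requested call
      # concurrently and return the results in order.
      async def run_one(call):
          return await TOOLS[call["name"]](**call["args"])
      return await asyncio.gather(*(run_one(c) for c in calls))

  # asyncio.run(batch([{"name": "read_file", "args": {"path": "a.py"}},
  #                    {"name": "grep", "args": {"pattern": "TODO", "path": "a.py"}}]))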
faangguyindia 5 days ago (reply):
That info can just be included in the prompt prefix, which the LLM caches, cutting costs by 70-80% on average. System time varies, so it's not a good idea to put it in the prompt; better to expose it as a function so it doesn't invalidate the cache (sketch below). I'm still looking for a good "memory" solution; so far I'm running without one and haven't looked too deeply into it. I'm not sure how the next tool call could be predicted. I'm still using serial tool calls since I don't have any subagents; I just use fast-inference models to make tool calls directly. It works so fast that I doubt I'd benefit from parallelizing anything.
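A minimal sketch of the cached-prefix setup, using Anthropic's prompt caching (a cache_control block on the system prompt) and pushing the volatile clock behind a tool. The file name, tool name, and model choice here are assumptions:

  import anthropic

  client = anthropic.Anthropic()

  # Static knowledge (repo tree, OS info) lives in the cached prefix.
  STATIC_CONTEXT = open("repo_summary.txt").read()  # hypothetical file

  # Volatile data like the current time is a tool, so fetching it never
  # invalidates the cached prefix.
  tools = [{
      "name": "get_time",  # illustrative tool name
      "description": "Return the current system time.",
      "input_schema": {"type": "object", "properties": {}},
  }]

  resp = client.messages.create(
      model="claude-sonnet-4-20250514",  # assumption: any cache-capable model
      max_tokens=1024,
      tools=tools,
      system=[{
          "type": "text",
          "text": STATIC_CONTEXT,
          "cache_control": {"type": "ephemeral"},  # mark the prefix cacheable
      }],
      messages=[{"role": "user", "content": "Summarise the repo layout."}],
  )
  print(resp.content)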