faangguyindia | 5 days ago
That info can just be included in the prefix, which is cached by the LLM, reducing cost by 70-80% on average. System time varies, so it's not a good idea to put it in the prompt; better to expose it as a function the model can call, so the prefix stays byte-identical and the cache is never invalidated (rough sketch below).

I'm still looking for a good "memory" solution; so far I'm running without one and haven't looked too deep into it. I'm also not sure how the next tool call could be predicted. I still use serial tool calls since I don't have any subagents; I just use fast-inference models to make the tool calls directly. It works so fast that I doubt I'd benefit from parallelizing anything.
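Roughly what I mean, as a minimal sketch (assuming an OpenAI-style chat API; SYSTEM_PROMPT, get_current_time, and the model name are placeholders of mine, not anything specific):

    import datetime
    import json

    # Static prefix: identical bytes on every request, so the provider's
    # prefix cache can reuse it (the ~70-80% input-cost reduction above).
    SYSTEM_PROMPT = (
        "You are a coding agent. Project layout, conventions, and tool "
        "descriptions go here -- everything that never changes per request."
    )

    # Dynamic data lives behind a tool instead of inside the prompt,
    # so the cached prefix is never invalidated by a changing timestamp.
    TOOLS = [{
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Return the current system time in ISO 8601 format.",
            "parameters": {"type": "object", "properties": {}},
        },
    }]

    def get_current_time() -> str:
        # Executed locally when the model requests it.
        return datetime.datetime.now(datetime.timezone.utc).isoformat()

    def build_request(user_message: str) -> dict:
        # The system prompt and tool list come first and never vary;
        # only the trailing user turn changes between requests.
        return {
            "model": "some-fast-model",  # placeholder model name
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_message},
            ],
            "tools": TOOLS,
        }

    if __name__ == "__main__":
        req = build_request("What time is it on the build server?")
        print(json.dumps(req, indent=2))

The point is just that everything before the user turn is constant, so the provider only charges full price for the suffix.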