Show HN: I ran every Claude agent turn through the Batch API (eran.sandler.co.il)
3 points by erans 10 hours ago
I built a tiny Python REPL to answer a dumb-but-useful question: what happens if every turn in an agent loop goes through Anthropic’s Batch API instead of the normal synchronous endpoint?

The motivation was cost. The Batch API is 50% off, which sounds very attractive for agent workloads: evals, background research agents, CI agents, unattended subagents, etc.

The result: it works, but it is awful for a single interactive agent. In my runs, a one-entry batch usually took ~90–120 seconds to complete, so a five-turn tool loop becomes a ten-minute interaction. Waiting two minutes for the model to decide it needs to run ls is not good UX.

But that was also the point of the experiment. A single REPL turn is probably the wrong unit to batch. The interesting version is fleet-level batching:

- many agents running in parallel
- background subagents
- CI/eval jobs
- multiple harnesses sharing a local proxy
- shared prompt prefixes that may benefit from caching

In that world, the batcher should probably sit below the harness as infrastructure: existing tools keep using the normal API shape, while a proxy decides per request whether it goes sync or async based on latency tolerance. (Rough sketches of both the per-turn batching and the routing idea are at the end of this post.)

One surprising observation: in my small, non-rigorous testing, Haiku batches often felt slower than Sonnet/Opus batches. I wouldn’t treat that as a benchmark, but it does suggest routers should measure this rather than assuming “cheap model = batch model.”

Repo is here: https://github.com/erans/batching-harness

It is intentionally small: one Python file, a basic tool loop, a local shell tool, a stats panel, and minimal sandboxing.

The useful lesson for me: the Batch API is terrible as an interaction pattern for one agent, but it might be very useful as a hidden optimization layer for a fleet of agents.
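For concreteness, a single batched turn looks roughly like this against the anthropic Python SDK’s Message Batches methods. This is a minimal sketch, not the repo’s actual code; the model id, polling interval, and function name are placeholders:

    import time

    import anthropic

    client = anthropic.Anthropic()

    def batched_turn(messages, model="claude-sonnet-4-5", poll_interval=5.0):
        """One agent turn via the Message Batches API instead of the sync endpoint."""
        # Submit a batch containing exactly one request for this turn.
        batch = client.messages.batches.create(
            requests=[
                {
                    "custom_id": "turn-0",
                    "params": {
                        "model": model,   # placeholder model id
                        "max_tokens": 1024,
                        "messages": messages,
                    },
                }
            ]
        )

        # Poll until the batch finishes; in my runs this took ~90-120s even for one entry.
        while batch.processing_status != "ended":
            time.sleep(poll_interval)
            batch = client.messages.batches.retrieve(batch.id)

        # A one-entry batch has exactly one result.
        for entry in client.messages.batches.results(batch.id):
            if entry.result.type == "succeeded":
                return entry.result.message
            raise RuntimeError(f"batch entry did not succeed: {entry.result.type}")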
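And this is the shape of the fleet-level batcher I mean: a routing layer below the harness that keeps the normal request shape and only batches calls whose callers can wait. The latency_budget_s hint and the thresholds are hypothetical; this is a sketch of the idea, not anything in the repo:

    import anthropic

    client = anthropic.Anthropic()

    # Hypothetical knobs, just to make the routing decision concrete.
    BATCH_TURNAROUND_ESTIMATE_S = 120   # rough single-batch latency from my runs
    MAX_QUEUE = 32                      # flush once this many requests are queued

    _pending = []  # queued batch entries: {"custom_id": ..., "params": ...}

    def submit(custom_id, params, latency_budget_s):
        """Route one request based on how long the caller can wait.

        Interactive callers that can't tolerate batch turnaround go to the
        normal sync endpoint; evals, CI jobs, and background subagents get
        queued and flushed together as one multi-entry batch at half price.
        """
        if latency_budget_s < BATCH_TURNAROUND_ESTIMATE_S:
            return client.messages.create(**params)
        _pending.append({"custom_id": custom_id, "params": params})
        if len(_pending) >= MAX_QUEUE:
            flush()
        return None  # batched callers pick their result up later by custom_id

    def flush():
        """Send everything queued so far as a single Message Batch."""
        if _pending:
            client.messages.batches.create(requests=list(_pending))
            _pending.clear()

The interesting part in practice would be the flush policy (queue size, age, grouping by shared prompt prefix for caching), and given the Haiku observation above, the turnaround estimate is something a real router should measure per model rather than hard-code.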