yalok 4 hours ago

Very cool idea. But the time savings don't hold for every tool call, and it's not clear to me yet whether this is batchable. Also, intuitively, for most models that run on GPU, you'd still want to offload the tool-execution part to the CPU, since it's much cheaper...

hrmtst93837 3 hours ago

If you push tool execution into the model itself, you inherit all the I/O unpredictability and error handling baggage, but now inside a GPU context that's allergic to latency. Inference throughput tanks if external calls start blocking, and A100s make expensive waiters. Batching is fantasy unless you know up front exactly what gets executed, which is the opposite of dynamic tools. If you want "faster" here, the trade is reliable deterministic compute versus the usual Wild West of system calls and side effects.
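
A toy sketch of the blocking point, in case it helps. Nothing here touches a real GPU or inference stack: `slow_tool`, `run_blocking`, and `run_offloaded` are made-up names, and `time.sleep` stands in for both the forward pass and the external call. It only contrasts a decode loop that waits inline on a tool with one that hands the call to a CPU worker thread and keeps stepping.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_tool(delay_s: float) -> str:
    """Hypothetical stand-in for an external tool call with real latency."""
    time.sleep(delay_s)
    return "tool-result"


def run_blocking(n_steps: int, tool_at: int, delay_s: float, step_s: float) -> float:
    """Tool runs inline, so every step after it waits for the call to return."""
    start = time.perf_counter()
    for step in range(n_steps):
        if step == tool_at:
            slow_tool(delay_s)  # the whole loop stalls here
        time.sleep(step_s)  # simulated batched forward pass
    return time.perf_counter() - start


def run_offloaded(n_steps: int, tool_at: int, delay_s: float, step_s: float):
    """Tool runs on a CPU worker thread while the loop keeps stepping."""
    start = time.perf_counter()
    pending = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in range(n_steps):
            if step == tool_at:
                pending = pool.submit(slow_tool, delay_s)
            time.sleep(step_s)  # simulated batched forward pass
        # Collect the result only once the loop is done (or it is needed).
        result = pending.result() if pending is not None else None
    return time.perf_counter() - start, result


if __name__ == "__main__":
    blocking_s = run_blocking(30, 2, 0.2, 0.01)
    offloaded_s, result = run_offloaded(30, 2, 0.2, 0.01)
    print(f"blocking:  {blocking_s:.2f}s")
    print(f"offloaded: {offloaded_s:.2f}s (result={result!r})")
```

In the blocking version the tool latency is added on top of the loop time; in the offloaded version it overlaps with it. That overlap is the whole argument for keeping tool exec on the CPU side, and it says nothing about error handling or side effects, which are the harder part.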