raahelb · 2 hours ago
Interesting to note that the reduced latency is not just due to the improved model speed, but also because of improvements made to the harness itself:

> "As we trained Codex-Spark, it became apparent that model speed was just part of the equation for real-time collaboration—we also needed to reduce latency across the full request-response pipeline. We implemented end-to-end latency improvements in our harness that will benefit all models [...] Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon."

I wonder whether other harnesses (Claude Code, OpenCode, Cursor, etc.) can make similar improvements to reduce latency. I've been vibe coding (or doing agentic engineering) with Claude Code a lot over the last few days, and I've had some tasks take as long as 30 minutes.
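To make the mechanism concrete, here is a minimal sketch of the persistent-connection idea: open one WebSocket up front and reuse it for every turn, so each request pays only a framed message rather than a fresh TCP/TLS handshake plus HTTP headers. The endpoint URL and the message shapes (`request`/`token`/`done` events) are hypothetical illustrations, not OpenAI's actual Responses API wire protocol.

```python
# Sketch of a harness talking to an LLM over one persistent WebSocket.
# Assumptions: "wss://api.example.com/v1/responses" and the JSON event
# shapes below are made up for illustration.
import asyncio
import json
import time

import websockets  # pip install websockets


async def chat_session(turns: list[str]) -> None:
    # One connection (one TLS handshake) amortized across all turns.
    async with websockets.connect("wss://api.example.com/v1/responses") as ws:
        for prompt in turns:
            start = time.monotonic()
            await ws.send(json.dumps({"type": "request", "input": prompt}))
            first_token = None
            # Stream token events back on the same connection.
            async for raw in ws:
                event = json.loads(raw)
                if event["type"] == "token":
                    if first_token is None:
                        first_token = time.monotonic() - start
                    print(event["text"], end="", flush=True)
                elif event["type"] == "done":
                    break
            if first_token is not None:
                print(f"\n[time-to-first-token: {first_token:.3f}s]")


asyncio.run(chat_session(["Refactor foo()", "Now add tests for it"]))
```

The per-request HTTP alternative would re-establish a connection (or at least re-send headers) on every turn, which is exactly the per-roundtrip overhead the quoted passage says was cut by 80%.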
2001zhaozhao · 40 minutes ago (reply)
This might actually be hard for open-source agents (e.g. OpenCode) to replicate, barring wide adoption of a standardized WebSocket LLM API.