raahelb 2 hours ago

Interesting to note that the reduced latency comes not just from improved model speed but also from improvements to the harness itself:

> "As we trained Codex-Spark, it became apparent that model speed was just part of the equation for real-time collaboration—we also needed to reduce latency across the full request-response pipeline. We implemented end-to-end latency improvements in our harness that will benefit all models [...] Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon."

I wonder if all other harnesses (Claude Code, OpenCode, Cursor, etc.) can make similar improvements to reduce latency. I've been vibe coding (or doing agentic engineering) with Claude Code a lot for the last few days and I've had some tasks take as long as 30 minutes.

2001zhaozhao 40 minutes ago

This might actually be hard for open-source agents (e.g. OpenCode) to replicate unless a standardized WebSocket LLM API is widely adopted.
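
Right now the lowest common denominator across providers is HTTP plus SSE streaming, which is what the open-source harnesses build against. Roughly (hypothetical endpoint and event shape, simplified SSE parsing):

    // One HTTPS request per turn, streamed back as server-sent events.
    const res = await fetch("https://api.example.com/v1/responses", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ input: "Refactor foo()", stream: true }),
    });

    const decoder = new TextDecoder();
    for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
      for (const line of decoder.decode(chunk, { stream: true }).split("\n")) {
        if (line.startsWith("data: ") && line !== "data: [DONE]") {
          process.stdout.write(JSON.parse(line.slice(6)).delta ?? "");
        }
      }
    }

Every task re-pays that connection setup, so until providers converge on a shared persistent-connection API, each open-source harness would have to special-case every vendor to get the same win.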