bpanahij 3 hours ago

The response-time chart in the blog post shows that Sparrow-1 has perfect precision/recall and also the fastest true-positive response times.

The turn-taking models were evaluated in a controlled environment with no additional cascaded steps (LLM, TTS, Phx). This matters for an apples-to-apples comparison: the variability of the rest of the pipeline doesn't influence the measurements.

The video conversation examples show Sparrow-1 within the full pipeline. Those responses aren't as fast as Sparrow-1 itself because the LLM, TTS, facial rendering, and network transport also take time. Without Sparrow-1 they would be slower. Sparrow-1 is what lets the responses be as fast as they are, and with a faster CVI pipeline configuration they can be as fast as 430ms in my testing.
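
To make the pipeline point concrete, here's a rough sketch of a latency budget. Every number below is a made-up placeholder, not a measurement and not from the blog post, and it assumes the stages run strictly one after another (a real pipeline may stream and overlap them). It only shows that whatever the turn-taking stage saves comes straight off the end-to-end time, even though the total is dominated by the other stages.

```python
# Hypothetical per-stage latencies in milliseconds.
# These are illustrative placeholders, NOT measured values.
stages_ms = {
    "turn_detection": 100,
    "llm_first_token": 300,
    "tts_first_audio": 150,
    "facial_rendering": 80,
    "network_transport": 40,
}


def end_to_end_ms(stages):
    """Total response latency, assuming the stages run sequentially."""
    return sum(stages.values())


# Same pipeline with a (hypothetically) faster turn-taking model.
faster = dict(stages_ms, turn_detection=50)

baseline_total = end_to_end_ms(stages_ms)
faster_total = end_to_end_ms(faster)

print(f"baseline: {baseline_total} ms")
print(f"faster turn detection: {faster_total} ms")
print(f"saved: {baseline_total - faster_total} ms")
```

The same function works for comparing pipeline configurations: swap in your own measured stage times and the difference in totals is the difference you'd expect in response time under the sequential assumption.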