Wan Streamer v0.1: End-to-End Real-Time Interactive Foundation Models (wan-streamer.com)
2 points by davedx 8 hours ago | 1 comment
davedx 8 hours ago

Wan Streamer is a native-streaming, end-to-end interactive foundation model, designed from the ground up for real-time, low-latency, full-duplex audio-visual interaction. It models language, audio, and video as both input and output within a single Transformer: the sequence is an interleaving of visual, audio, and text input tokens with visual, audio, and text output tokens, coordinated by block-causal attention for incremental streaming.
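For anyone unfamiliar with block-causal attention, here is a rough sketch of the general idea, not Wan Streamer's actual implementation. It assumes one block per time chunk, a fixed stream order inside each chunk, and one token per stream; the stream names and both helper functions are made up for illustration. Positions attend freely within their own block and to all earlier blocks, but never to later ones, which is what lets the sequence be extended chunk by chunk.

```python
from typing import List, Tuple

# Hypothetical stream names and ordering. The description only says that
# visual, audio, and text tokens appear as both input and output.
STREAMS = ["in_visual", "in_audio", "in_text",
           "out_visual", "out_audio", "out_text"]


def interleave(num_chunks: int, tokens_per_stream: int = 1) -> List[Tuple[int, str]]:
    """Return (chunk_index, stream_name) for each position in the sequence."""
    seq: List[Tuple[int, str]] = []
    for chunk in range(num_chunks):
        for stream in STREAMS:
            seq.extend([(chunk, stream)] * tokens_per_stream)
    return seq


def block_causal_mask(block_ids: List[int]) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Attention is unrestricted inside a block and causal across blocks.
    """
    return [[k_block <= q_block for k_block in block_ids]
            for q_block in block_ids]


seq = interleave(num_chunks=2)
mask = block_causal_mask([chunk for chunk, _ in seq])

# Print the mask as a grid: '#' means "may attend", '.' means "masked out".
for row in mask:
    print("".join("#" if allowed else "." for allowed in row))
```

With two chunks this prints a 12x12 grid: the first six rows see only the first block, and the last six rows see everything. A real model would pass an equivalent boolean mask to its attention layers and would likely cache keys and values for finished blocks.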