Open (Apache 2.0) TTS model for streaming conversational audio in realtime (github.com)
37 points by SweetSoftPillow 4 days ago | 4 comments
ks2048 4 hours ago
> Our work was heavily inspired by KyutaiTTS and Sesame

I wish they’d describe the technical details of how this differs from the other TTS systems they were “inspired by”. There are so many projects like this that, unless there are more technical details, I will just have to assume they are vibe-coded clones built to get some publicity.
woodson 3 hours ago
Looks very similar to Kyutai’s models, given that it uses the same neural audio codec (Mimi), the same Depformer module, etc.
echelon 42 minutes ago
You thought journalists hated AI before? Just wait until the dam bursts on real-time AI girlfriends that talk back to their partners. That can have their voices and personalities fine-tuned. That can be rigged up to 3D characters with body rigs and facial blend shapes. This team is getting us significantly closer to that future. Journalists are going to be fervently writing angry articles about this.