Looks very similar to Kyutai’s models: it uses the same neural audio codec (Mimi), the same Depformer module, and so on.