fsiefken 4 days ago

I wonder if Codec2 could be replaced by one of the low-bitrate neural audio codecs; HILCodec and SemantiCodec sound better at 2-3 kbps.

https://arxiv.org/pdf/2405.04752

https://arxiv.org/pdf/2409.14085

Calwestjobs 4 days ago

I'd rather transmit a few bits of text and do speech-to-text and text-to-speech on the end devices, which gives a much better experience. Technically, at 2 kbps that isn't very different from what these codecs do.

It depends on the device. On one hand there are handheld radios with a small ARM chip that runs the UI and controls a dedicated radio chip; on the other there are mobile phones, laptops and tablets with so much on-board neural processing that they can run a model that sounds like a person or celebrity of your choosing.
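A minimal Python sketch of the text-over-the-link pipeline described above, assuming the sender transcribes speech locally and the receiver synthesizes it locally. `speech_to_text` and `text_to_speech` are hypothetical placeholders (they work on strings, not audio), and raw DEFLATE from the standard-library `zlib` module stands in for whatever compressor a real radio would use.

```python
import zlib

# Hypothetical stand-ins: a real sender would run an on-device speech-to-text
# model, and a real receiver would run an on-device text-to-speech model.
def speech_to_text(utterance: str) -> str:
    # Pretend the recognizer returns a whitespace-normalized transcript.
    return " ".join(utterance.split())

def text_to_speech(text: str) -> str:
    # Pretend synthesis: returns a label instead of audio samples.
    return f"[speaking] {text}"

def send(utterance: str) -> bytes:
    """Sender side: transcribe locally, then compress the transcript for the link."""
    transcript = speech_to_text(utterance)
    # Raw DEFLATE (wbits=-15) omits zlib's 2-byte header and 4-byte checksum,
    # which matters for tiny packets.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(transcript.encode("utf-8")) + compressor.flush()

def receive(payload: bytes) -> str:
    """Receiver side: decompress the transcript, then synthesize it locally."""
    transcript = zlib.decompress(payload, -15).decode("utf-8")
    return text_to_speech(transcript)

if __name__ == "__main__":
    packet = send("meet  at the  north gate at   noon")
    print(len(packet), "bytes on the link")
    print(receive(packet))  # [speaking] meet at the north gate at noon
```

Only the compressed transcript crosses the link; the heavy models stay on the end devices, which is why this fits a phone or laptop better than a radio whose processor only drives the UI.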

fsiefken 2 days ago

Yes, I thought of that once; thanks for reminding me. I'm so focused on transparent speech compression that I forgot about lossier speech encoding methods. With zstd compression you could get down to about 30 bps, clone your own voice and your conversation partner's voice with a voice font, and perhaps also clone a personal conversational style with Dia (https://github.com/nari-labs/dia). That might get you close to a 'natural' conversation.
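A back-of-the-envelope check of the 30 bps figure, as a small Python sketch. The speaking rate (150 words per minute), the average word length (6 characters including the space) and the compressed size of about 2 bits per character are all assumptions, not numbers from this thread; 2 bits/char is optimistic for zstd on short messages unless it is primed with a trained dictionary.

```python
# Back-of-the-envelope bitrate of a live transcript sent as text.
# All default parameters are illustrative assumptions, not measurements.

def text_bitrate_bps(words_per_minute: float = 150.0,
                     chars_per_word: float = 6.0,   # average word plus trailing space
                     bits_per_char: float = 8.0) -> float:
    """Bits per second needed to send a transcript at the given speaking rate."""
    chars_per_second = words_per_minute * chars_per_word / 60.0
    return chars_per_second * bits_per_char

CODEC_BPS = 2000.0  # a ~2 kbps speech codec, the figure discussed in this thread

raw = text_bitrate_bps(bits_per_char=8.0)         # uncompressed 8-bit text
compressed = text_bitrate_bps(bits_per_char=2.0)  # assumed ~2 bits/char after compression

print(f"raw text:        {raw:.0f} bps")
print(f"compressed text: {compressed:.0f} bps")
print(f"2 kbps codec is about {CODEC_BPS / compressed:.0f}x the compressed text rate")
```

With these assumptions the script prints 120 bps for raw text, 30 bps for compressed text, and a ratio of about 67x. In this scheme the voice font or Dia model would live on the receiving device, so presumably it adds no per-second cost on the link.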