| ▲ | londons_explore 6 hours ago |
| The real-time, low-latency, multi-channel audio streaming needed for musicians is awfully similar to the real-time, low-latency, multi-channel audio streaming required for telephony. Yet somehow the two industries have pretty much entirely different tech stacks and don't seem to talk to one another. |
|
| ▲ | dagmx 6 hours ago | parent | next [-] |
| This is very much not true. Telephony is significantly less latency-sensitive than real-time audio processing, and it's also significantly less taxing, since you're dealing with a single channel. The level of compression and the audio resolution required are significantly different too: you can tune codecs for voice specifically, but you don't want compression when recording audio and you can't bias towards specific inputs. They're only similar in that they handle audio. But that's like saying the needs of a unicycle and the needs of an F1 car are inherently the same because they both have wheels. |
|
| ▲ | NikolaNovak 4 hours ago | parent | prev | next [-] |
| Most telephony I've experienced has latency measured in seconds (if you ever call your friend or spouse sitting next to you, it becomes very obvious :), vs. audio recording and processing, which is measured in milliseconds. Additionally, from what little I'm aware of, telephony is heavily optimized for the particular frequencies of the human voice and then heavily compressed within that range. As well, any single telephony stream is basically a single channel. A song may have dozens of channels, at high resolution, full spectrum, with all sorts of computationally demanding effects and processing, and still need latency and sync measured in milliseconds. So... kind of the opposite of each other, while both being about processing sound :-). |
|
| ▲ | embedding-shape 5 hours ago | parent | prev | next [-] |
| I feel like equating telephony and music production is like saying writing firmware and writing an HTTP/JSON backend for a website are the same. True, both are programming, I suppose, but with vastly different requirements, assumptions and environments. |
|
| ▲ | sroerick 6 hours ago | parent | prev | next [-] |
| This is a very interesting thought. I'm not super experienced with low-level audio and basically completely ignorant of telephony. I feel like most people doing audio in music are not working at the low level. Even if they are creating their own plugins, they are probably not integrating with the audio interface. The point of JACK or Pipewire is basically to abstract all of that away so people can focus on the instrument. Latency is a much, much bigger issue in music than in voice, so any latency spike would render network audio completely unusable.
I know Zoom has a "real time audio for musicians" feature, but outside of a few Zoom demos during lockdown, I'm not sure anybody uses this. Pipewire supports audio channels over the network, but again I'm not entirely sure what this is for. Certainly it's useful for streaming music from device A to device B, but I'm not sure anybody uses it in a production setting. I could see something like a "live coding symphony", where people have their own livecoding setups and the audio is generated on a central server. This is not too different from what, say, Animal Collective did. But while live coding is a beautiful medium in its own right, it does lack the muscle memory and tactile feedback you get from playing an instrument. I would love to see, as you said, these fields collaborate, but these, to me, are the immediate blockers which make it less practical. |
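To make the "JACK or Pipewire abstract all of that away" point concrete, here is a minimal sketch of a JACK pass-through client in C. It only uses the standard JACK client API (open a client, register an input and an output port, hand the server a process callback); the client name, port names, and the fixed -6 dB gain are made-up illustrations, not anything a real plugin would hard-code.

    #include <jack/jack.h>
    #include <stdio.h>

    static jack_port_t *in_port, *out_port;

    /* Called by the JACK (or pipewire-jack) server on its real-time
       thread once per period; all audio work happens here. */
    static int process(jack_nframes_t nframes, void *arg) {
        jack_default_audio_sample_t *in  = jack_port_get_buffer(in_port, nframes);
        jack_default_audio_sample_t *out = jack_port_get_buffer(out_port, nframes);
        for (jack_nframes_t i = 0; i < nframes; i++)
            out[i] = in[i] * 0.5f;           /* trivial "effect": roughly -6 dB */
        return 0;
    }

    int main(void) {
        jack_client_t *client = jack_client_open("passthru", JackNullOption, NULL);
        if (!client) { fprintf(stderr, "is the JACK/Pipewire server running?\n"); return 1; }

        in_port  = jack_port_register(client, "in",  JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput,  0);
        out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);
        jack_set_process_callback(client, process, NULL);
        jack_activate(client);

        getchar();                           /* keep processing until Enter */
        jack_client_close(client);
        return 0;
    }

The program never touches ALSA, a driver, or an interrupt; it only ever sees float buffers of nframes samples. Pipewire ships a JACK-compatible library, so as far as I know the same client runs unchanged under pw-jack on a Pipewire system.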
| |
| ▲ | qwertox an hour ago | parent | next [-] |
| "Even if they are creating their own plugins, they are probably not integrating with the audio interface". The audio interface is abstracted away in exchange for some metadata about the buffer's properties and the buffer itself, and that is true for basically everything related to audio: the buffer is the lowest level the OS offers you, and you are free to implement lower-level stuff in your dsp/instrument, like using assembly, maybe also functions for SSE, AVX or NEON based acceleration. You get chunks of samples in a buffer, you read them, do something with them and write the result out into another buffer.
"Pipewire supports audio channels over network" thanks for reminding me: I'm planning to stream the audio out of my Windows machine to a raspi zero to which I will then connect my bluetooth headphones. First tests worked, but the latency is really bad with shairport-sync [0] at around 400 ms. This is what I would use Pipewire for, if my workstation were Linux and not Windows.
Maybe Snapcast [1] could be interesting for you: "Snapcast is a multiroom client-server audio player, where all clients are time synchronized with the server to play perfectly synced audio. It's not a standalone player, but an extension that turns your existing audio player into a Sonos-like multiroom solution."
"I could see something like a "live coding symphony", where people have their own livecoding setups and the audio is generated on a central server." Tidal Cycles [2] might interest you, or the JavaScript port named Strudel [3]. Tidal can synchronize multiple instances via Link Synchronization. Then there's Troop [4], which "is a real-time collaborative tool that enables group live coding within the same document across multiple computers. Hypothetically Troop can talk to any interpreter that can take input as a string from the command line but it is already configured to work with live coding languages FoxDot, TidalCycles, and SuperCollider."
[0] https://github.com/mikebrady/shairport-sync
[1] https://github.com/snapcast/snapcast
[2] https://tidalcycles.org
[3] https://strudel.cc
[4] https://github.com/Qirky/Troop |
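That "read a chunk, do something, write it out" loop really is the whole model. Here is a minimal sketch in C, with a hand-rolled SSE variant of the same gain loop; the function names and the gain parameter are just for illustration, not from any particular plugin API.

    #include <stddef.h>
    #include <xmmintrin.h>                   /* SSE intrinsics */

    /* Plain C: read n input samples, scale them, write them to the output buffer. */
    void apply_gain(const float *in, float *out, size_t n, float gain) {
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] * gain;
    }

    /* Same loop with SSE, four samples per iteration. Assumes n is a
       multiple of 4 and both buffers are 16-byte aligned; a real DSP
       routine would check that instead of assuming it. */
    void apply_gain_sse(const float *in, float *out, size_t n, float gain) {
        __m128 g = _mm_set1_ps(gain);
        for (size_t i = 0; i < n; i += 4) {
            __m128 x = _mm_load_ps(in + i);
            _mm_store_ps(out + i, _mm_mul_ps(x, g));
        }
    }

Everything above the buffer (device setup, clocking, resampling) is the host's problem; everything inside the loop is yours, which is exactly where SSE/AVX/NEON or hand-written assembly can be swapped in.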
| ▲ | harvey9 5 hours ago | parent | prev [-] |
| Regarding Zoom, music lessons 1:1 online are still pretty common. I would guess this won't hold up with multiple musicians. |
| ▲ | NikolaNovak 4 hours ago | parent [-] |
| Music lessons online are common (I've been in them) because they're largely half duplex. Student plays, teacher listens. Then teacher comments and demonstrates, student listens. There are projects that aim to provide synced multi-player jamming, but last I checked they are all based around looping. The human ear SHOCKINGLY does not lend itself to being fooled and will notice surprisingly small sync issues. I always compare it with photo editing, where you can cheat and smudge some background details with no one the wiser, whereas any regular non-audiophile will notice similar smudging or sync problems in audio. |
| ▲ | ssl-3 2 hours ago | parent [-] |
| Sonobus is a software project that tries to accomplish live, audio-only multi-player jamming over the public network. It's still limited to whatever latency the network has, but it can be useful for some things. If that means it's mostly useful for loops, then that's up to the musicians. :)
(I myself have used it for remote livestream participants, but only for voice. I was able to get distinct inputs into my console just like the folks in the studio had, and I gave them a mix-minus bus that included everyone's voice but their own for their headphones. It worked slick. Interaction was quick and quality was excellent. And unlike what popularly passes for teleconferencing these days, it all flowed smoothly and sounded like they were in the room with us, even though they were a thousand miles away.) |
|
|
|
|
| ▲ | saidnooneever 6 hours ago | parent | prev [-] |
| Irony amplified by the nature of the tech stacks xD. Surely they can figure out some channel to communicate over clearly, haha. |