chimeracoder 4 hours ago

> See, that’s just flat-out lying. What’s this mythical circumstance where playing audio A at the same volume as audio B on one device will magically make A louder than B on another?

Regarding your second point: as any audio engineer or electronic musician knows, the same exact audio absolutely will sound very different on different speakers, depending on how well they replicate various sounds, what level of gain is being applied, and the volume (which is different from gain, although people confuse the two).

That's even before you get into the fact that many modern devices, like smartphones, will apply their own compression or sound processing before playing the sound, sometimes to compensate for those deficiencies and make them less noticeable, and sometimes to "enhance" the sound.

Loudness/volume (technically different things but let's conflate them here) are also unintuitive because human ears don't have a flat frequency response curve, and some things will be perceived as louder despite being the same volume, or vice versa.

Advertisers actually can (and do) take advantage of this, by using sound engineering to make things feel louder while staying within the desired volume, by targeting the way humans perceive the sound.

This isn't a defense of the advertising/streaming companies here, because it is a solvable problem. But it is true that this is a problem that they need to solve.

kstrauser 4 hours ago | parent | next [-]

All that’s true, but those factors affect all the audio similarly. The article specifically talks about server-side ad insertion, so it’s not like the case where it somehow uses the device’s .mov codec to play the content and an MP3 codec to play the ad. All ffmpeg (most likely) knows is that it’s decoding one long stream, and doesn’t switch audio pipelines mid-stream when it thinks it might be playing an ad at that moment.

Regarding the perceptual volume differences: while true, that’s also a solvable problem. Output volumes can be calculated using standard curves. In any case, TV broadcasters had to figure all this out years ago.

radley 2 hours ago | parent [-]

> those factors affect all the audio similarly... Output volumes can be calculated using standard curves... TV broadcasters have had to figure all this out years ago.

Sorry, but all of that is obtuse. The fact that some digital audio can be perceived as much louder than others, even though it's all limited to the same digital range, proves they aren't similar at all.
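To make the "same digital range, different loudness" point concrete, here is a minimal sketch. It assumes RMS level as a crude stand-in for perceived loudness (it is not a perceptual measure), and the 10x boost followed by clipping is only a toy imitation of heavy compression, not how any real ad is mastered. The helper names are made up for illustration.

```python
import math

def peak(samples):
    """Largest absolute sample value (what 'the digital range' limits)."""
    return max(abs(s) for s in samples)

def rms(samples):
    """Root-mean-square level, a rough proxy for how loud it sounds."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

RATE = 48000   # samples per second
FREQ = 1000    # test tone in Hz
N = RATE       # one second of audio

# A plain full-scale sine wave.
sine = [math.sin(2 * math.pi * FREQ * n / RATE) for n in range(N)]

# Toy "compression": boost by 10x, then clip back to full scale.
squashed = [max(-1.0, min(1.0, 10.0 * s)) for s in sine]

# Both signals peak at full scale, but the squashed one carries more energy.
diff_db = 20 * math.log10(rms(squashed) / rms(sine))
print(f"peaks: {peak(sine):.3f} vs {peak(squashed):.3f}")
print(f"RMS difference: {diff_db:.2f} dB")
```

Both signals sit inside the same [-1, 1] range, yet the second one has a noticeably higher average level, which is the gap a peak-based volume limit does not see.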

There is no such thing as a standard curve for compression. Source levels vary almost infinitely. Accurately separating and reducing sound after the fact, without turning the whole thing to mud, is considered to be an impossible technical challenge.

Next, TV broadcasters worked on a predetermined schedule with predetermined advertising. This gave them time to inspect and approve ads in advance.

Streaming ads are generally served just in time from third-party services to the streaming host. FFMPEG gets the output from the stream host, but the host has to combine content together from multiple sources (entertainment + multiple ad servers) into that single stream. Currently, sound-level is completely at the whim of each ad server, as well as each ad producer. Meanwhile, the final output is at the whim of the streaming host: 24-hour-news streaming sites probably have different audio standards than Apple TV+.

Ultimately, AI could potentially be used to solve it, since it can generate / make up new sounds as part of reverse-compression. But it would still have to be done in advance by the third-party ad servers.

kstrauser 2 hours ago | parent [-]

None of this is true. There are standard curves for human hearing frequency response and you can use these to compare sound A’s volume to sound B. And since sound compression is in DCT space, you can calculate those numbers very quickly with something similar to sum(vol(f) * curve(f) for f in encoded_frequencies).
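As a rough sketch of that weighted sum: the version below assumes A-weighting as the hearing curve (broadcast loudness standards use a different weighting, so treat this as one possible choice), and it assumes you already have the power per frequency band from the decoder. The function names and the two example spectra are invented for illustration.

```python
import math

def a_weight_db(f):
    """A-weighting gain in dB at frequency f (Hz), closed-form approximation."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0

def weighted_level_db(band_power):
    """band_power maps frequency in Hz to linear power in that band.

    Each band's power is scaled by the curve, then the results are summed,
    i.e. sum(vol(f) * curve(f)) expressed on a dB scale.
    """
    total = sum(p * 10 ** (a_weight_db(f) / 10.0) for f, p in band_power.items())
    return 10.0 * math.log10(total)

# Two made-up spectra with the same total raw power, distributed differently.
low_heavy = {100: 1.0, 1000: 0.1}
mid_heavy = {100: 0.1, 1000: 1.0}

print(f"low-heavy: {weighted_level_db(low_heavy):.1f} dB")
print(f"mid-heavy: {weighted_level_db(mid_heavy):.1f} dB")
```

The two spectra have identical unweighted power, but the one concentrated around 1 kHz scores higher once the curve is applied, which is the kind of comparison being described.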

I read the article. It specifically talks about server-side ad embedding, i.e. where the service is inserting ad content into the streams, and therefore, by definition, has access to the ad content. They can do the calculations on their end during the embedding process and normalize volumes there before transmitting the result. To make things even easier, they don’t have to calculate the ad volume each time one’s streamed, just once per ad they’re going to serve.

And finally, all of this is a solved problem for TV broadcasters. They face the same problems: advertisers send them content to air, then the broadcasters are legally required to normalize the ad vs content volume, and they do. If this is an insurmountable problem that the streaming services face, they can drive over to their nearest TV station and ask them how they manage to pull off this technological feat.

davemp 3 hours ago | parent | prev | next [-]

It definitely isn’t simple, but it’s a pretty well-trodden path. If the FCC or state equivalent doesn’t have folks who can write the spec, that’s a huge problem. I would be surprised if a spec doesn’t already exist that just needs to be applied to this scenario.

The streamers should be responsible for the signal. If the device front end has crazy frequency response or the backend does weird DSP tricks, that’s on the device manufacturers.

b112 4 hours ago | parent | prev [-]

I guess in the interim, while they try to work it out, they'll just have to make sure it's quieter.

Start at 1/4 the volume they use now.

After all, they don't need to approach compliance tuning and debugging from the loud side. They can start at a whisper and work up.

(I hope they get fined into bankruptcy, if they try to claim they're "working on it", but do so from the loud side.)