philjohn 5 days ago

The question becomes ... why are they relying on client side counting of views? They know how much of a video they've distributed to a given client on the backend after all (YouTube does buffer, but not the whole video).

gregschlom 5 days ago | parent | next [-]

Not necessarily. YouTube makes extensive use of third-party CDNs. A lot of the videos aren't coming from their servers at all. I believe that's also why it's so hard for them to embed the ad directly in the video. Instead, they have to rely on splicing the ads in client-side, which makes it possible to block them.

Disclaimer: I work at Google but not at YouTube and have no idea how things really work. This is just based on some info I read online.

therein 5 days ago | parent [-]

Yeah, as far as I can tell they give caching boxes to ISPs, and videos are served from there if they exist in that cache. About 8-10 years ago, they had an issue with that and they'd serve you the wrong video because your neighbor had watched something and it was in the cache. Literally, the title of the video wouldn't match what was playing.

smallnix 5 days ago | parent | next [-]

And these caching boxes can't talk back to Google?

5 days ago | parent | prev [-]
[deleted]
spankalee 5 days ago | parent | prev | next [-]

YouTube has a crazy CDN. They very well might not be able to easily attribute exactly what the client requests to specific accounts.

nightpool 5 days ago | parent | prev | next [-]

The other commenters point out more prosaic problems with CDN architecture, but a more product-focused answer for this is "because users execute JavaScript but bots don't". Using client-side counting is an easy way to filter out simple automated traffic.

Also, with segmented MP4 streams, the files on the backend won't necessarily be easy to match up 1:1 with videos. How do you count the views if someone watches a video, and then skips back to watch the middle section a few times, and then doesn't finish it? Because that would show up as (1, 1, 4, 3, 0) in your database for the different files involved. Now imagine doing that for ~500 people on a shared IP address for their high school. And now your minimum threshold for view counting is tied to the size of your MP4 chunks, or range requests. And now you've put this view counting logic into the hot path of serving terabytes of data.
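To make the ambiguity concrete, here is a minimal sketch of what per-segment counts look like from the backend. The five-segment split and the two request sequences are made up for illustration; nothing here reflects how YouTube actually logs requests.

```python
from collections import Counter

NUM_SEGMENTS = 5  # hypothetical: a video split into five chunks


def segment_counts(requests):
    """Return per-segment request counts as a tuple, as a backend log might record them."""
    counts = Counter(requests)
    return tuple(counts[i] for i in range(NUM_SEGMENTS))


# One viewer: plays segments 0-3, rewinds to the middle a few times,
# and never reaches segment 4.
one_viewer = [0, 1, 2, 3, 2, 3, 2, 2, 3]

# Four viewers behind one shared IP, each fetching a different part of the video.
four_viewers = [0, 1, 2, 3] + [2, 3] + [2, 3] + [2]

print(segment_counts(one_viewer))
print(segment_counts(four_viewers))
```

Both sequences produce the same counts, so the counts alone can't tell you whether that was one view or four.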

From a product perspective, you can see why "A video view is counted the first time the user presses the play button and watches for at least 30 seconds" is a much more desirable definition, both technically and for stakeholders (video creators, advertisers, etc) to understand.
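A rough sketch of that definition on the client side, assuming "watches for at least 30 seconds" means cumulative playback time. The `ViewTracker` class and its `on_playback` method are hypothetical names for illustration, not anything from a real player API.

```python
VIEW_THRESHOLD_SECONDS = 30.0  # threshold taken from the definition quoted above


class ViewTracker:
    """Counts at most one view per playback session (hypothetical helper)."""

    def __init__(self, threshold=VIEW_THRESHOLD_SECONDS):
        self.threshold = threshold
        self.watched = 0.0    # cumulative seconds actually played
        self.counted = False  # whether the view was already reported

    def on_playback(self, seconds):
        """Record played time; return True only on the call that triggers the view."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        self.watched += seconds
        if not self.counted and self.watched >= self.threshold:
            self.counted = True
            return True
        return False


tracker = ViewTracker()
# Play 10s, play 15s, rewind and replay 10s, then watch another 60s.
events = [10, 15, 10, 60]
views = sum(tracker.on_playback(s) for s in events)
print(views)
```

Seeking and rewatching only add to the played time, so the session still reports a single view no matter which chunks get re-requested.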

axus 5 days ago | parent | prev | next [-]

The computers serving advertisements should also know how much data has been delivered. Alphabet should be able to expect more from a CDN it has a business relationship with than from the people watching YouTube.

arccy 5 days ago | parent | prev [-]

Because a distributed CDN means the request doesn't necessarily hit a backend?