bitpush 5 days ago

It's frustrating to see how HN commenters just jumped to conclusions without doing even a bit of critical thinking.

The top comment on HN says

> So Youtube changed how views are counted and is blaming ad blockers?

When even a cursory look would show that if you block the stats-aggregation endpoint, stats go down. Sometimes it is Occam's razor.

nomel 5 days ago | parent | next [-]

It's a 20-year-old streaming service, and it's Google. There's a certain expectation I have from that. The fact that it's just an endpoint being hit by the client is...baffling. I don't think it's in the realm of expected possibilities for most of us, since it's the most naive and fragile implementation possible.

The fact that ad revenue didn't change means they do have robust ad tracking, but the view numbers are +/- some unexpected level of fiction.

spankalee 5 days ago | parent [-]

> The fact that ad revenue didn't change means they do have robust ad tracking

Ad tracking is usually done client-side too, so ad revenue being stable just means that the missing view counts are probably limited to the users who already weren't viewing ads.

philjohn 5 days ago | parent | prev [-]

The question becomes ... why are they relying on client side counting of views? They know how much of a video they've distributed to a given client on the backend after all (YouTube does buffer, but not the whole video).

gregschlom 5 days ago | parent | next [-]

Not necessarily. YouTube makes extensive use of third-party CDNs. A lot of the videos aren't coming from their servers at all. I believe that's also why it's so hard for them to embed the ad directly in the video. They instead have to rely on splicing the ads in client-side, which makes it possible to block them.

Disclaimer: I work at Google but not at YouTube and have no idea how things really work. This is just based on some info I read online.

therein 5 days ago | parent [-]

Yeah, they give caching boxes to ISPs as far as I can tell, and videos are served from there if they exist in that cache. About 8-10 years ago, they had an issue with that and they'd serve you the wrong video because your neighbor had watched something and it was in the cache. The title of the video literally wouldn't match what was playing.

smallnix 5 days ago | parent | next [-]

And these caching boxes can't talk back to Google?

spankalee 5 days ago | parent | prev | next [-]

YouTube has a crazy CDN. They very well might not be able to easily attribute exactly what the client requests to specific accounts.

nightpool 5 days ago | parent | prev | next [-]

The other commenters point out more prosaic problems with CDN architecture, but a more product-focused answer for this is "because users execute JavaScript but bots don't". Using client-side counting is an easy way to filter out simple automated traffic.

Also, with segmented MP4 streams, the files on the backend won't necessarily be easy to match up 1:1 with videos. How do you count the views if someone watches a video, and then skips back to watch the middle section a few times, and then doesn't finish it? Because that would show up as (1, 1, 4, 3, 0) in your database for the different files involved. Now imagine doing that for ~500 people on a shared IP address for their high school. And now your minimum threshold for view counting is tied to the size of your MP4 chunks, or range requests. And now you've put this view counting logic into the hot path of serving terabytes of data.
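To make the ambiguity concrete, here is a minimal Python sketch. The segment indices, the request logs, and the `segment_fetch_counts` helper are all made up for illustration; this is not how any real CDN logs requests. It shows that one person rewinding and several people behind a shared IP can leave the same per-segment tallies:

```python
from collections import Counter


def segment_fetch_counts(requests, num_segments):
    """Tally how many times each segment index was fetched.

    `requests` is a hypothetical log of segment indices in fetch order.
    """
    tally = Counter(requests)
    return tuple(tally.get(i, 0) for i in range(num_segments))


# One viewer: plays segments 0-3, then keeps jumping back to the middle,
# and never reaches the last segment (index 4).
one_rewinder = [0, 1, 2, 3, 2, 3, 2, 3, 2]

# Four viewers behind the same IP (say, a school), each watching a
# different slice of the same video. Logs are concatenated per viewer.
shared_ip = [0, 1, 2, 3] + [2, 3] + [2, 3] + [2]

print(segment_fetch_counts(one_rewinder, 5))
print(segment_fetch_counts(shared_ip, 5))
```

Both calls produce the (1, 1, 4, 3, 0) pattern from the example above, so the tallies alone can't tell you whether that was one view or four.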

From a product perspective, you can see why "A video view is counted the first time the user presses the play button and watches for at least 30 seconds" is a much more desirable definition, both technically and for stakeholders (video creators, advertisers, etc) to understand.
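As a rough illustration of why that definition is easy to implement on the client, here is a small Python sketch. Only the 30-second threshold comes from the definition quoted above; the `PlaybackSession` class, its method names, and the choice to accumulate rewatched time are assumptions for the example, not YouTube's actual logic:

```python
from dataclasses import dataclass

# Threshold taken from the quoted definition; everything else is hypothetical.
VIEW_THRESHOLD_SECONDS = 30.0


@dataclass
class PlaybackSession:
    played: bool = False
    watched_seconds: float = 0.0
    view_counted: bool = False

    def press_play(self) -> None:
        """Mark that the user pressed play at least once."""
        self.played = True

    def watch(self, seconds: float) -> bool:
        """Record watch time; return True exactly once, when the view is first counted."""
        if not self.played or seconds <= 0:
            return False
        # Assumption: rewatched time accumulates like any other watch time.
        self.watched_seconds += seconds
        if not self.view_counted and self.watched_seconds >= VIEW_THRESHOLD_SECONDS:
            self.view_counted = True
            return True
        return False
```

The client would report a view only when `watch` returns True, so skipping around or rewatching afterwards never produces a second view for the same session.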

axus 5 days ago | parent | prev | next [-]

The computers serving advertisements should also know how much data has been delivered. Alphabet should be able to expect more from a CDN they have a business relationship with than from the people watching YouTube.

arccy 5 days ago | parent | prev [-]

because distributed CDN means it doesn't necessarily hit a backend?