| ▲ | jamesblonde 5 hours ago |
| I gave a talk at PyData Berlin on how to build your own TikTok recommendation algorithm. The TikTok personalized recommendation engine is the world's most valuable AI. It's TikTok's differentiation. It updates recommendations within 1 second of you clicking - at human-perceivable latency. If your AI recommender has poor feature freshness, it will be perceived as slow, not intelligent - no matter how good the recommendations are. TikTok's recommender is partly built on European technology (Apache Flink for real-time feature computation), along with Kafka and distributed model training infrastructure. The Monolith paper is misleading in suggesting that the 'online training' is the key. It is not. The key is that your clicks are made available as features for predictions in less than 1 second. You need a per-event stream processing architecture for this (like Flink - Feldera would be my modern choice as an incremental streaming engine). * https://www.youtube.com/watch?v=skZ1HcF7AsM * Monolith paper - https://arxiv.org/pdf/2209.07663 |
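To make the point about per-event feature freshness concrete, here is a minimal, self-contained Python sketch (not TikTok's or Flink's actual code; all names and structures are illustrative): each click immediately updates the per-user features that the ranking service reads on its very next request.

```python
import time
from collections import defaultdict, deque

RECENT_WINDOW = 50  # keep the last N interactions per user

user_recent_items = defaultdict(lambda: deque(maxlen=RECENT_WINDOW))
user_category_counts = defaultdict(lambda: defaultdict(int))

def on_click(user_id: str, item_id: str, category: str, watch_ms: int) -> None:
    """Per-event update: runs as soon as the click/watch event arrives."""
    user_recent_items[user_id].append((item_id, time.time()))
    # weight by dwell time so long watches count more than quick skips
    user_category_counts[user_id][category] += max(watch_ms, 1)

def features_for_ranking(user_id: str) -> dict:
    """Called by the ranking service; already sees the click made milliseconds ago."""
    counts = user_category_counts[user_id]
    total = sum(counts.values()) or 1
    return {
        "recent_item_ids": [item for item, _ in user_recent_items[user_id]],
        "category_affinity": {c: v / total for c, v in counts.items()},
    }

# Example: a 12-second dog-video watch is reflected in the very next ranking call.
on_click("u1", "video42", "dogs", watch_ms=12_000)
print(features_for_ranking("u1"))
```

A batch pipeline that recomputes these features hourly would produce the same numbers, just too late; the freshness, not the feature logic, is the argument above.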
|
| ▲ | dmix 4 hours ago | parent | next [-] |
| I noticed YouTube Shorts also seems to update the feed based on how long you watched the last video. If you're scrolling quickly, then stop to watch a dog video long enough, the next one is likely to be another dog video. |
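A hedged sketch of the dwell-time signal described here, with made-up thresholds: a very short watch is treated as negative implicit feedback, a near-complete watch as positive.

```python
def implicit_label(watch_ms: int, video_length_ms: int) -> int:
    """Turn dwell time into an implicit label: -1 skip, +1 watch, 0 ambiguous."""
    completion = watch_ms / max(video_length_ms, 1)
    if completion < 0.15:   # skipped almost immediately
        return -1
    if completion > 0.80:   # watched nearly to the end (or looped)
        return 1
    return 0                # ambiguous: ignore or down-weight

print(implicit_label(1_200, 30_000))   # -1: quick skip, fewer similar videos
print(implicit_label(28_000, 30_000))  # +1: long watch, more dog videos follow
```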
| |
| ▲ | BoxOfRain 30 minutes ago | parent | next [-] | | One of my gripes with YouTube at the moment is that they break the adblock filters I use to remove Shorts more often than they break the filters stopping the actual ads. | | |
| ▲ | sidharthv 20 minutes ago | parent [-] | | I naively searched in the mobile app settings for a way to turn off shorts, before realising there will not be one. |
| |
| ▲ | beAbU 28 minutes ago | parent | prev | next [-] | | Facebook does the same. The longer I dwell on an image post, the more likely the next batch of posts is to be similar. | | |
| ▲ | coliveira a minute ago | parent [-] | | The right way to look at these networks is that people are being trained by the algorithm, not the other way around. The ultimate goal is to elicit behaviors in humans, usually getting them to spend more time and more money on the platform, but also to serve other goals set by the owners of the network. |
| |
| ▲ | randysalami 3 hours ago | parent | prev | next [-] | | I’ve noticed the same thing and this creates such a negative user experience. Every short is a reaction test and if I fail, I get slop. Makes the whole experience very jarring (for better or for worse). | | |
| ▲ | datsci_est_2015 3 hours ago | parent | next [-] | | For better or worse with regards to my addiction, my subscriptions are all either science channels or high effort / high production comedy skits (e.g. DropoutTV). I still get slop, but I never subscribe and it mostly remains background noise | | |
| ▲ | AreShoesFeet000 2 hours ago | parent | next [-] | | That’s the point though. It may seem as if you’re not in control when scrolling, but you can adjust your behavior to get the content you’re looking for almost intuitively. That’s actually something good in my honest opinion. | | |
| ▲ | Jensson 2 hours ago | parent | next [-] | | Why is it good that you need self-control to not get slop? It's much better if you can just turn that off and relax, rather than having to stay alert to avoid certain content because the app tries to trick you into being served more slop. Distancing yourself from temptations is an effective and proven way to get rid of addictions; a program constantly trying to get you to relapse is not a good feature. Imagine a fridge that constantly restocks itself with beer: that would be very bad for alcoholics, and people would just say "just don't drink the beer?" even though this is a real problem with an easy fix. | | |
| ▲ | james_marks 2 hours ago | parent | next [-] | | Basically, I want to set boundaries in a healthy frame of mind, and have that default respected when my self control is lower because I’m tired, depressed, bored, etc. “The algorithm” of social media is the opposite. | | |
| ▲ | AreShoesFeet000 an hour ago | parent [-] | | I think your reply has me convinced. You really can’t expect to have such self control all of the time. Damn. |
| |
| ▲ | AreShoesFeet000 2 hours ago | parent | prev [-] | | It's because manual content curation can never reach the same level of relevance as direct feedback from user behavior. You mix in all kinds of biases, commercial interests, the ideology of the curator, etc., and you inevitably get irrelevant slop. The algorithm puts you in control a little bit more. | | |
| ▲ | Jensson 2 hours ago | parent [-] | | > The algorithm puts you in control a little bit more. Why not let you choose a less addictive algorithm? Older algorithms were less addictive, so it's not at all impossible to do this, and many users would want it. | | |
| ▲ | AreShoesFeet000 2 hours ago | parent | next [-] | | I just don't think that the addiction is exclusively due to the algorithm. There's really a lack of affordable, varied options for learning a trade and for entertainment. As we say in Portuguese: you shouldn't throw the baby out with the bathwater. | | |
| ▲ | Jensson an hour ago | parent [-] | | I don't see any harm that could come from saying "a less addictive algorithm needs to be available to users". For example, let's say there is an option to only recommend videos from channels you subscribe to; that would be much less addictive, so why isn't that an option? A regulation that forces these companies to add such a feature would only make the world a better place. | | |
| ▲ | AreShoesFeet000 an hour ago | parent [-] | | I agree, but what would be the actual mechanism that would allow that? I believe we're out of ideas. TikTok's only crime was being hugely successful because of good engineering. There's no evil sauce apart from promotional content and occasional manipulation, which has nothing to do with the algorithm per se. And about whitelisting, I honestly don't think you're comparing apples to apples. The point of the algorithm is dynamically recommending new content. It's about discovery. | | |
| ▲ | Jensson an hour ago | parent [-] | | > I agree, but what would be the actual mechanism that would allow that? Governments saying "if you are a social content platform with more than XX million users you have to provide these options on recommendation algorithms: X Y Z". It is that easy. > And about whitelisting, I honestly don't think you're comparing apples to apples. The point of the algorithm is dynamically recommending new content. It's about discovery. And some people want to turn off that pushed discovery and just get recommended videos from the set of channels they subscribed to. They still want to watch some TikTok videos, they just don't want the algorithm trying to push bad content on them. You are right that you can't avoid such an algorithm when searching for new content, but I don't see why it has to be there for content pushed onto you when you haven't asked for anything new. | | |
| ▲ | AreShoesFeet000 an hour ago | parent [-] | | Fair enough. I’m not really a fan of regulation. The capitalist State is a total mess, but I really think we should try your idea. |
|
|
|
| |
| ▲ | kylecazar 2 hours ago | parent | prev [-] | | They're optimizing for time spent on the platform. | | |
| ▲ | Jensson an hour ago | parent [-] | | And that is why these algorithms need to be regulated. People don't want to pick the algorithm that makes them spend the most time possible on their phones; many would want an algorithm that optimizes for quality rather than quantity, so they have more time to do other things. But corporations don't want to provide that because they don't earn anything from it. |
|
|
|
| |
| ▲ | Forgeties79 2 hours ago | parent | prev [-] | | I don’t agree tbh. This is part of how people wind up down extremist rabbit holes. If you’re just lazily scrolling it can easily trap you in its gravity well. | | |
| ▲ | AreShoesFeet000 2 hours ago | parent [-] | | But you can get into extremist rabbit holes independently of the control surface. Remember 4chan? Dangerous content is a matter of moderation, regardless of the interface. | | |
| ▲ | Forgeties79 2 hours ago | parent [-] | | 4chan is nothing like TikTok, though yes I agree heavy moderation is necessary for both. |
|
|
| |
| ▲ | xp84 2 hours ago | parent | prev [-] | | I try to react as "violently" as possible to any slop and low-quality crap (e.g. stupid "life hacks" that are purposely bad in order to ragebait the comments). On YouTube that's "Don't recommend this channel"; on Facebook it takes multiple taps, but you can "Hide All From…"
Basically, I don’t trust that a thumbs down is sufficient. It is of course silly, since there are no doubt millions of bad channels and I probably can’t mute them all. |
| |
| ▲ | MengerSponge 2 hours ago | parent | prev [-] | | They built a slop machine, not something tuned for positive UX. https://en.wikipedia.org/wiki/The_purpose_of_a_system_is_wha... |
| |
| ▲ | pandemic_region 3 hours ago | parent | prev [-] | | I've been insta-skipping tennis videos for months now. Still getting Federer on a daily basis. |
|
|
| ▲ | 3abiton 22 minutes ago | parent | prev | next [-] |
| It's interesting how they found out that the "lifetime" of features is a feature in itself. Meta-features are real. |
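For illustration, a rough Python sketch of the feature-lifetime idea — the Monolith paper discusses expiring stale embedding entries, but the details below are assumptions, not their implementation:

```python
import time

class ExpiringEmbeddingTable:
    """Embeddings for IDs not seen within `ttl_seconds` are evicted."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.table = {}  # feature_id -> (embedding, last_seen_timestamp)

    def touch(self, feature_id, embedding):
        # called whenever the ID appears in a training example
        self.table[feature_id] = (embedding, time.time())

    def expire_stale(self) -> int:
        now = time.time()
        stale = [k for k, (_, seen) in self.table.items() if now - seen > self.ttl]
        for k in stale:
            del self.table[k]  # frees memory held by short-lived IDs (e.g. dead videos)
        return len(stale)
```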
|
| ▲ | vjerancrnjak 4 hours ago | parent | prev | next [-] |
| Flink is too slow for this. If by features you mean tracking state per user, that stuff can be tracked insanely fast with Redis as well, without Flink. If you're saying they don't have to load data to update the state, I don't see how massive these states would have to be to require in-memory updates, and if so, you could just do in-memory updates without Flink. Similarly, any consumer will have to deal with batches of users and pipelining. Flink is just a bottleneck. If they actually use Flink for this, it's not the moat. |
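A minimal sketch of the Redis approach this comment describes, using the redis-py client; it assumes a local Redis instance, and the key names are made up for illustration rather than anyone's production schema.

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def record_click(user_id: str, item_id: str, category: str) -> None:
    """Per-user state update on each event, batched into one round trip."""
    pipe = r.pipeline()
    pipe.lpush(f"user:{user_id}:recent", item_id)
    pipe.ltrim(f"user:{user_id}:recent", 0, 49)           # keep the last 50 items
    pipe.hincrby(f"user:{user_id}:cat_counts", category, 1)
    pipe.execute()

def load_features(user_id: str) -> dict:
    """Read the freshest per-user state at ranking time."""
    return {
        "recent": r.lrange(f"user:{user_id}:recent", 0, -1),
        "cat_counts": r.hgetall(f"user:{user_id}:cat_counts"),
    }
```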
| |
| ▲ | btown 2 hours ago | parent [-] | | Yea, the Monolith paper by Bytedance uses Flink but they only say it's in use for their B2B ecommerce optimization system. Maybe this is intentional ambiguity, but I'd believe that they wouldn't rely on something like Flink for their core TikTok infrastructure. My hunch is we start to learn a lot more about the core internals as Oracle tries to market to B2B customers, as Oracle is wont to do! | | |
| ▲ | vjerancrnjak an hour ago | parent [-] | | Flink is not really a performance choice; it's bloat that lets you throw software at problems as fast as possible. I don't think there's any benchmark demonstrating insane capabilities per machine. I definitely couldn't get it to any numbers I liked, compared with other stream processing / state processing engines that exist (if compute and in-memory state management is the goal). Pretty sure any pathway that touches RocksDB slows everything down to 1-10k events per second, if not less. The problem of finding out which video is next, by immediately taking into account the recent user context (and other users' context), is completely unrelated to what Flink does -- exactly-once state consistency, distributed checkpoints, recovery, event-time semantics, large keyed state. I would even say you don't want a solution to any of the problems Flink solves; you want to avoid having these problems. |
|
|
|
| ▲ | bobek 35 minutes ago | parent | prev | next [-] |
| It is not only the recommender though. These guys [1] seem to be able to react pretty quickly without creating addicts along the way ;( [1] https://recombee.com |
|
| ▲ | lsuresh an hour ago | parent | prev | next [-] |
| Thanks for the Feldera shoutout Jim. For anyone else, if you want to try out Feldera and IVM for feature-engineering (it gives you perfect offline-online parity), you can start here: https://docs.feldera.com/use_cases/fraud_detection/ |
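For readers unfamiliar with IVM, here is a conceptual Python illustration of incremental view maintenance — applying per-event deltas to a maintained aggregate instead of recomputing it over all history — which is the idea behind Feldera. This is not Feldera's actual API (Feldera pipelines are defined in SQL); it only sketches the principle.

```python
from collections import defaultdict

class ClickCountView:
    """Maintains SELECT user_id, COUNT(*) ... GROUP BY user_id incrementally."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply_delta(self, user_id: str, weight: int = 1) -> None:
        # +1 for an inserted click row, -1 for a retraction/late correction
        self.counts[user_id] += weight
        if self.counts[user_id] == 0:
            del self.counts[user_id]

view = ClickCountView()
view.apply_delta("u1")       # a click arrives
view.apply_delta("u1")
view.apply_delta("u1", -1)   # a click is retracted (e.g. deduplication)
print(dict(view.counts))     # {'u1': 1}
```

Because the same delta logic runs over a historical backfill and over the live stream, the offline and online feature values agree, which is the parity property mentioned above.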
|
| ▲ | miohtama 4 hours ago | parent | prev | next [-] |
| TikTok's differentiation is a userbase of all the teenagers in the world. |
| |
| ▲ | wongarsu 2 hours ago | parent | next [-] | | But go just one layer deeper, to 'why is every teenager using TikTok', and the primary answer once again becomes 'TikTok's recommendation engine'. | | |
| ▲ | expedition32 5 minutes ago | parent | next [-] | | No, the primary answer is "teenagers do what other teenagers do". Remember, we are advanced apes, no more, no less. There is this curious word "influencer" that everyone uses, but few ever think about what it really means. | |
| ▲ | an hour ago | parent | prev [-] | | [deleted] |
| |
| ▲ | notyourwork 2 hours ago | parent | prev | next [-] | | That didn't happen by accident though. | |
| ▲ | AlienRobot 3 hours ago | parent | prev [-] | | It also provides different opportunities for growth compared to other social media. A video that gets over half a million views on TikTok may not get 5 thousand on Youtube, or even 10 views on Instagram or Facebook. |
|
|
| ▲ | 4 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | 5 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | ryanjshaw 4 hours ago | parent | prev | next [-] |
| Great insight. Any thoughts on RisingWave? |
|
| ▲ | Jamesbeam 4 hours ago | parent | prev [-] |
| [flagged] |