ekr____ 2 days ago

I'm not sure that this mechanism delivers the desired privacy benefit, and it's quite hard to make sure it does so.

For example, the paper you cite here uses consistent hashing, where you hash the domain name and then reduce the hash modulo K, where K is the number of resolvers. However, consider the case where you have a conceptual site (e.g., Gmail or X) which actually loads resources from multiple FQDNs. For example, if you pull up the network console for a naive load of X, it loads resources from at least the following domains:

x.com, api.x.com, abs.twimg.com, pbs.twimg.com, video.twimg.com

All of these are relatively characteristic of X, but in a naive design they would often be resolved via multiple resolvers, with the result that you're actually sharing your browsing history with more resolvers than if you just had a single resolver. As is suggested by this list, you might be able to improve the situation somewhat by hashing on eTLD+1, but even here there are two eTLD+1s (x.com and twimg.com), which is not an uncommon scenario.
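To make this concrete, here's a rough sketch of the effect, not the paper's actual scheme. It assumes SHA-256 as the hash, K = 4 resolvers, and a deliberately naive eTLD+1 function that just keeps the last two labels (a real implementation would need the Public Suffix List). The function names are mine, purely for illustration.

```python
import hashlib

def resolver_for(key: str, k: int) -> int:
    """Map a key to one of k resolvers via hash-mod-k (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % k

def naive_etld1(fqdn: str) -> str:
    """Hypothetical stand-in for eTLD+1: keep the last two labels.
    Wrong for suffixes like co.uk; real code needs the Public Suffix List."""
    return ".".join(fqdn.split(".")[-2:])

def resolvers_seen(fqdns, k, key_fn=lambda d: d):
    """Set of resolver indices that get queried while loading one site."""
    return {resolver_for(key_fn(d), k) for d in fqdns}

# Domains from the list above; K is an arbitrary choice for the sketch.
X_DOMAINS = ["x.com", "api.x.com", "abs.twimg.com", "pbs.twimg.com", "video.twimg.com"]
K = 4

by_fqdn = resolvers_seen(X_DOMAINS, K)
by_etld1 = resolvers_seen(X_DOMAINS, K, key_fn=naive_etld1)

print("resolvers that see X traffic, keyed on FQDN:  ", sorted(by_fqdn))
print("resolvers that see X traffic, keyed on eTLD+1:", sorted(by_etld1))
```

Keyed on FQDN, up to min(K, 5) resolvers each see a name that is characteristic of X. Keyed on eTLD+1, that drops to at most two, but never reliably to one, because x.com and twimg.com are hashed independently.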

In general, for this strategy to work you need to hash not on the domain but rather on the conceptual site, but this information is not readily available to the browser.