kccqzy 3 days ago

The Public Suffix List changes often. I once worked with a team that built a major feature on top of the PSL, but the person who built it did not consider at all how the feature would handle changes to the list. Basically, the feature analyzed domains, used PSL data to extract the "important part" of each domain, and then stored that in the database as part of a primary key in a table. So whenever the PSL changed, the database had to be taken offline so that certain tables could be completely rebuilt, and code querying the database had to be updated in lockstep with the database changes. This design made zero-downtime deployments difficult. It took the team quite a while to evolve the schema so that the database contents no longer depended on the PSL.
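
To illustrate the idea (this is a hypothetical sketch, not the schema that team ended up with): key the table on the full hostname and derive the "important part" at query time, so a PSL update changes only the suffix set passed in. The table name, the `registrable_domain` helper, and the tiny suffix sets are all made up for this example, and the helper handles plain rules only (no wildcard or exception rules).

```python
import sqlite3

def registrable_domain(host, suffixes):
    """Public suffix plus one label (plain rules only; illustrative helper)."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the host is itself a public suffix.
            return ".".join(labels[i - 1:]) if i > 0 else None
    # No rule matched: treat the last label as the suffix.
    return ".".join(labels[-2:]) if len(labels) >= 2 else None

conn = sqlite3.connect(":memory:")
# The primary key is the raw hostname, which never depends on the PSL.
conn.execute("CREATE TABLE hosts (hostname TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [("alice.github.io", 3), ("eve.github.io", 5), ("www.example.com", 7)],
)

def hits_by_site(conn, suffixes):
    """Group rows by registrable domain, computed from whatever suffix set is current."""
    totals = {}
    for hostname, hits in conn.execute("SELECT hostname, hits FROM hosts ORDER BY hostname"):
        site = registrable_domain(hostname, suffixes)
        totals[site] = totals.get(site, 0) + hits
    return totals

old_suffixes = {"com", "io"}                 # hypothetical "before" list
new_suffixes = old_suffixes | {"github.io"}  # hypothetical "after" list

print(hits_by_site(conn, old_suffixes))
print(hits_by_site(conn, new_suffixes))
```

With the older set both github.io hosts roll up into one site; with the newer set they are counted separately, and no stored row had to change. The trade-off is that the grouping is computed on read instead of being indexed.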

This is just one cautionary tale I have personally experienced.

whalesalad 3 days ago | parent [-]

It's also full of non-ICANN extensions. So a naive implementation will identify "github.io" as a TLD. There are lots of nuances to working with this list. Our team has a pretty robust internal (Python) library now that we hope to open source soon.
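
For anyone wondering how to tell the two kinds of entries apart: the list file marks its ICANN and private sections with `// ===BEGIN ICANN DOMAINS===` and `// ===BEGIN PRIVATE DOMAINS===` comments. A rough sketch of using those markers follows. This is not the internal library mentioned above; the excerpt is hand-written rather than the real list, the function names are invented, and wildcard and exception rules are ignored.

```python
# Tiny hand-written excerpt in the PSL file format (not the real list).
PSL_EXCERPT = """\
// ===BEGIN ICANN DOMAINS===
com
io
uk
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
// ===END PRIVATE DOMAINS===
"""

def parse_sections(text):
    """Split plain rules into ICANN and private sets using the section markers."""
    rules = {"icann": set(), "private": set()}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS"):
            section = "icann"
        elif line.startswith("// ===BEGIN PRIVATE DOMAINS"):
            section = "private"
        elif line.startswith("// ===END"):
            section = None
        elif line and not line.startswith("//") and section:
            rules[section].add(line.lower())
    return rules

def public_suffix(host, rules):
    """Longest matching plain rule, else the last label (the implicit '*' rule)."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return candidate
    return labels[-1]

sections = parse_sections(PSL_EXCERPT)
icann_only = sections["icann"]
everything = sections["icann"] | sections["private"]

print(public_suffix("alice.github.io", icann_only))  # io
print(public_suffix("alice.github.io", everything))  # github.io
print(public_suffix("www.example.co.uk", icann_only))  # co.uk
```

Which set you want depends on the question you are asking: "what is the registry-level suffix" versus "where does control pass to a different party".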

kccqzy 3 days ago | parent [-]

The whole point of PSL is to identify "github.io" as a TLD. Anyone can create a subdomain of it. Just like anyone can create a new subdomain of "com" (a real TLD).

type0 3 days ago | parent [-]

The difference is you don't register a domain under github.io, you merely loan it. Some countries, like Poland, have a bunch that are real domain suffixes

https://www.dns.pl/en/list_of_functional_domain_names

degamad 3 days ago | parent [-]

Loaning or renting (registering) amount to the same thing for the purposes of the public suffix list: because the *public* can create entries under github.io, you cannot assume that alice.github.io and eve.github.io are controlled by the same entity, so you should not share alice.github.io's data (e.g. cookies) with eve.github.io.
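
A minimal sketch of that check, assuming a hand-picked suffix set in place of the full list (the `SUFFIXES` set and both function names are made up for illustration, and only plain rules are handled):

```python
# Hypothetical, hand-picked suffix set; a real check would load the full list.
SUFFIXES = {"com", "io", "github.io"}

def registrable_domain(host, suffixes=SUFFIXES):
    """Return the public suffix plus one label, or None if host is itself a suffix."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:]) if i > 0 else None
    # No rule matched: treat the last label as the suffix.
    return ".".join(labels[-2:]) if len(labels) >= 2 else None

def same_site(a, b):
    """True only when both hosts fall under the same registrable domain."""
    ra, rb = registrable_domain(a), registrable_domain(b)
    return ra is not None and ra == rb

print(same_site("alice.github.io", "eve.github.io"))      # False
print(same_site("www.example.com", "api.example.com"))    # True
```

Without the "github.io" entry, both hosts would reduce to github.io and the check would wrongly treat alice and eve as the same site.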

whalesalad 2 days ago | parent [-]

There is no formal ICANN TLD list. The PSL is your best shot. So it is actually wrong to assume that your situation is the sole purpose.

For instance, https://data.iana.org/TLD/tlds-alpha-by-domain.txt

Where is .co.uk? That is, for all intents and purposes, considered a TLD.

So PSL is currently doing double-duty and the distinction is very important.