notathrowaway51 2 hours ago

Fun fact: under Unicode Normalization Form D (NFD, canonical decomposition), 8 of the 9 Polish letters with diacritics (ż, ó, ć, ę, ś, ą, ź, ń) break down into a base letter plus a combining diacritical mark, but ł stays intact. That means you can't rely on SQLite's unicode61 remove_diacritics tokenizer option to normalize Polish text for FTS.
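For anyone who wants to check the decomposition claim, here is a minimal sketch using only Python's standard `unicodedata` module. The helper names `nfd_codepoints` and `strip_marks`, and the manual `ł` to `l` mapping, are my own illustration, not anything SQLite does; whether folding ł to l is desirable for search is exactly the open question.

```python
import unicodedata

POLISH_LETTERS = "żóćęśąźńł"

def nfd_codepoints(ch):
    """Return the code points of ch after NFD, formatted as 'U+XXXX'."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

def strip_marks(text, extra_map=None):
    """Drop combining marks after NFD.

    extra_map is a hypothetical manual mapping for letters such as ł that
    have no canonical decomposition; it is applied before normalization.
    """
    if extra_map:
        text = text.translate(str.maketrans(extra_map))
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", kept)

# Eight letters yield two code points (base + combining mark); ł yields one.
for ch in POLISH_LETTERS:
    print(ch, nfd_codepoints(ch))

print(strip_marks("żółć"))                        # zołc
print(strip_marks("żółć", {"ł": "l", "Ł": "L"}))  # zolc
```

The second call shows one possible workaround: map ł explicitly before indexing and before querying, so both sides are folded the same way.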

ks2048 21 minutes ago

When a Polish speaker searches for something with “ł”, do they expect to also see “l”?