doublesocket 4 hours ago

Supporting 10 different languages in regex is a drop in the ocean. The regex can be generated programmatically, and regexes compress easily. We used to have a compressed regex that could match any place name or street name in the UK in a few MB of RAM. It was silly quick.
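Here is a sketch of the "generate it programmatically" part. It assumes the compression meant is prefix factoring: put the names in a trie, then emit nested alternations so each shared prefix is written only once. `build_trie` and `trie_to_regex` are hypothetical helpers, not a library API, and this is not necessarily how doublesocket's team did it.

```python
import re

def build_trie(words):
    """Nested-dict trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root

def trie_to_regex(node):
    """Emit a regex for the subtree at `node`, factoring out shared prefixes."""
    ends_here = "" in node
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1 and not ends_here:
        return branches[0]  # single path: plain concatenation, no group needed
    group = "(?:" + "|".join(branches) + ")"
    # A trailing "?" makes the rest optional, because a word may stop here.
    return group + "?" if ends_here else group

names = ["Bath", "Bathgate", "Bristol", "Bristow"]
pattern = trie_to_regex(build_trie(names))
print(pattern)  # B(?:ath(?:gate)?|risto(?:l|w))
matcher = re.compile(pattern)
print(all(matcher.fullmatch(n) for n in names))  # True
print(matcher.fullmatch("Bristo"))  # None
```

This only shares prefixes. Sharing common suffixes as well (every "... Road", "... Lane") needs the automaton minimization that benlivengood describes.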

astrocat 2 hours ago

Whoa. This is a regex use I've never heard of. I'd absolutely love to see a writeup on this approach: how it's done and when it's useful.

benlivengood an hour ago

You can literally join every street address or other string you want to match with | into one giant disjunction, then run DFA/NFA minimization over it to get it down to a reasonable size. Maybe there are some fast regex simplification algorithms as well, but working directly with the finite automata draws on decades of research and can probably be optimized further.
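To make the minimization step concrete: for a finite list of strings, the automaton for the disjunction is a trie, which has no cycles. Because of that, one bottom-up pass is enough to minimize it. The pass merges any states that agree on finality and have the same labelled edges to already-merged states. This is a standard technique for acyclic automata, used here as an illustration. Hopcroft's algorithm is the usual choice for general DFAs. All function names below are hypothetical.

```python
def build_trie(words):
    """Trie as an acyclic DFA: state 0 is the start; trans[s] maps char -> next state."""
    trans, final = [{}], [False]
    for word in words:
        state = 0
        for ch in word:
            nxt = trans[state].get(ch)
            if nxt is None:
                nxt = len(trans)
                trans.append({})
                final.append(False)
                trans[state][ch] = nxt
            state = nxt
        final[state] = True
    return trans, final

def minimize(trans, final):
    """Merge equivalent states bottom-up (children before parents).

    In an acyclic DFA, two states are equivalent exactly when they agree on
    finality and have the same labelled edges to equivalent states, so a
    dict keyed on that signature finds every possible merge.
    """
    register = {}  # signature -> state id in the minimized automaton
    new_trans, new_final = [], []

    def visit(state):
        # Children are canonicalized first, so their ids are comparable.
        edges = tuple(sorted((ch, visit(nxt)) for ch, nxt in trans[state].items()))
        signature = (final[state], edges)
        if signature not in register:
            register[signature] = len(new_trans)
            new_trans.append(dict(edges))
            new_final.append(final[state])
        return register[signature]

    return visit(0), new_trans, new_final

def accepts(start, trans, final, text):
    """Run the DFA over `text`; a missing edge means reject."""
    state = start
    for ch in text:
        state = trans[state].get(ch)
        if state is None:
            return False
    return final[state]

streets = ["Mill Lane", "Mill Road", "Park Lane", "Park Road"]
trans, final = build_trie(streets)
start, min_trans, min_final = minimize(trans, final)
print(len(trans), len(min_trans))  # 27 16
print(all(accepts(start, min_trans, min_final, s) for s in streets))  # True
print(accepts(start, min_trans, min_final, "Mill Lan"))  # False
```

On these four names, the shared " Lane"/" Road" tails collapse into a single copy, and so does the point after "Mill"/"Park" where they branch. That takes the trie from 27 states to 16. The recursion depth equals the longest string, which is fine for street names.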

cogman10 2 hours ago

I think it will depend on the language. There are a few non-Latin languages where a simple word search likely won't be enough for a regex to apply properly.
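Here is one concrete case. It assumes the languages meant include ones like Japanese or Chinese, which are written without spaces between words. Python's `re` treats CJK characters as word characters, so `\b` never fires inside a run of them. A plain substring search has the opposite problem and overmatches. The sentence below is a hypothetical example (roughly "I'm going to Tokyo").

```python
import re

text = "東京都に行きます"  # roughly "I'm going to Tokyo"; no spaces between words

# Every character here counts as \w, so \b only matches at the ends of the string.
tokyo_with_boundaries = re.search(r"\b東京\b", text)
print(tokyo_with_boundaries)  # None: the real word is missed

# A plain substring search has the opposite problem.
print("京都" in text)  # True: "Kyoto" (京都) is found inside 東京都 (Tokyo Metropolis)
```

Handling this properly usually means running a word segmenter over the text first and matching against its tokens rather than raw characters.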