idoubtit 3 hours ago

I expected a toy project, but it is a usable library, which required a lot of work. Good job on delivering. A few comments:

After reading "composer.json", I thought that the tests used a custom framework. I'm glad the project does not suffer from NIH syndrome, but the dev dependency on PHPUnit should be declared.

There should be a warning that it's only meant for some Western Latin languages. The normalization of the input is built on a character table that covers a handful of cases. That's not enough for some Latin-script languages, e.g. Turkish. And any input in Cyrillic, Arabic, CJK and so on will be ignored.

There is no Unicode normalization or cleanup. Real-life input has many corner cases, e.g. diacritics stored as separate combining characters next to the base letters, or invisible characters inside a word to prevent hyphenation. Unless I'm mistaken, this engine would treat the NFD form "fête" as "fe te", instead of the expected "fete", which the NFKD form "fête" produces. I suggest using ext-intl for Unicode normalization, at least as an option.
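A minimal sketch of that suggestion, assuming ext-intl is loaded. The function name `foldDiacritics` is hypothetical and not part of the library. It decomposes the input, then drops combining marks and invisible format characters, so the composed and decomposed spellings of a word end up identical:

```php
<?php
// Hypothetical helper for illustration only; not the library's API.
function foldDiacritics(string $text): string
{
    // Decompose: "ê" becomes "e" followed by U+0302 (combining circumflex).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_KD);
    if ($decomposed === false) {
        return $text; // normalization failed, e.g. on invalid UTF-8
    }

    // Remove combining marks (Mn) and format characters (Cf),
    // the latter including the soft hyphen and zero-width space.
    return preg_replace('/[\p{Mn}\p{Cf}]+/u', '', $decomposed) ?? $text;
}

var_dump(foldDiacritics("fe\u{0302}te") === foldDiacritics("f\u{00EA}te"));
```

Dropping every format character is a blunt choice: it also removes the zero-width joiner and non-joiner, which carry meaning in some scripts, so it is probably only reasonable for the Western Latin scope discussed above.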

Lastly, I can't think of a use case for this library. I've always had access to some external service (MySQL, PostgreSQL, Manticore Search, Solr, etc.) or to a PHP extension for a local SQLite with FTS. Even for hobby projects, I haven't deployed to shared hosting for more than two decades.

asmodios 3 hours ago | parent

Thank you for the detailed feedback, it's genuinely valuable.

You're right on all technical points: PHPUnit missing from the dev dependencies is an oversight I'll fix, and the Unicode limitations are real and should be clearly documented. The NFD/NFKD case is a good catch.

On the use case: fair point. My motivation came from testing MySQL and SQLite full-text search on shared OVH hosting: the performance with filters was consistently disappointing. That's the itch this scratches. I understand it doesn't match your experience, and that's perfectly legitimate.