Remix.run Logo
adrian17 3 hours ago

The drop"-in" compatibility claims are also just wrong? I ran it on the old test suite from 6.0 (which is completely absent now), and quickly checking:

- the outputs, even if correctly deduced, are often incompatible: "utf-16be" turns into "utf-16-be", "UTF-16" turns into "utf-16-le" etc. FWIW, the old version appears to have been a bit of a mess (having had "UTF-16", "utf-16be" and "utf-16le" among its outputs) but I still wouldn't call the new version _compatible_,

- similarly, all `ascii` turn into `Windows-1252`

- sometimes it really does appear more accurate,

- but sometimes it appears to flip between wider families of closely related encodings, like one SHIFT_JIS test (confidence 0.99) turns into cp932 (confidence 0.34), or the whole family of tests that were determined as gb18030 (chinese) are now sometimes determined as gb2312 (the older subset of gb18030), and one even as cp1006, which AFAIK is just wrong.

As for performance claims, they appear not entirely false - analyzing all files took 20s, versus 150s with v6.0. However, looks like the library sometimes takes 2s to lazy initialize something, which means that if one uses `chardetect` CLI instead of Python API, you'll pay this cost each time and get several times slower instead.

Oh, and this "Negligible import memory (96 B)" is just silly and obviously wrong.