immibis 5 months ago

If you pay attention, this isn't a UTF-8 decoder. It might be some other encoding, or a complete misunderstanding of how UTF-8 works, or an AI hallucination. It also doesn't talk about how to handle the variable number of output bytes or the possibility of a multi-byte sequence split between input chunks.
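
To make the chunk-boundary point concrete, here is a minimal sketch in Python (not the code from the article, and not SIMD). It uses the standard library's incremental decoder, which buffers an incomplete trailing sequence until the next chunk arrives. The helper name `decode_chunks` and the sample chunks are made up for illustration.

```python
import codecs

def decode_chunks(chunks):
    """Decode an iterable of byte chunks as UTF-8.

    The incremental decoder keeps any incomplete multi-byte sequence
    at the end of a chunk and completes it with the next chunk.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises UnicodeDecodeError if the input ends mid-sequence.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

# "h\u00e9llo \u20ac" is 10 bytes: 1-, 2-, and 3-byte sequences mixed together.
data = "h\u00e9llo \u20ac".encode("utf-8")

# Split inside the 2-byte sequence and again inside the 3-byte sequence.
chunks = [data[:2], data[2:8], data[8:]]

# Decoding each chunk on its own fails, because a chunk ends mid-character.
try:
    chunks[0].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

result = decode_chunks(chunks)
```

Note that the amount of output per chunk also varies: the first chunk here yields one character from two bytes, since the dangling lead byte is held back. A vectorized decoder would need equivalent carry-over state between blocks.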

kjs3 5 months ago | parent [-]

I paid attention and I don't see where Daniel claimed that this is a complete UTF-8 decoder. He's illustrating a programming technique using a simplified use case, not solving the world's problems. And I don't think Daniel Lemire lacks an understanding of the concept or needs an AI to code it.

magicalhippo 5 months ago | parent [-]

Agreed, but the points raised by GP are valid in terms of using that article as an argument that AVX-512 can decode UTF-8 well.

It might be fast, but it's not a UTF-8 decoder. It's a transcoder to a fixed, and very limited, target encoding.

kjs3 5 months ago | parent [-]

I thought it was pretty clear the GP was talking about Daniel's article, not the blog post, but I guess I can see two readings.