Retr0id 4 days ago

There are a little over 256 Unicode combining marks that have a 2-byte UTF-8 encoding. I picked a set of 256 of them, defining an encoding I call zalgo256:

https://gist.github.com/DavidBuchanan314/07da147445a90f7a049...

Since an arbitrarily tall stack of combining characters still counts as one grapheme cluster, if some application limits string length by counting grapheme clusters then you can stuff an unlimited amount of data in there, with "only" 2x overhead in the byte representation.
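A minimal sketch of the idea, not the gist's actual code: the alphabet here is an assumption (simply the first 256 code points in U+0080..U+07FF whose general category is a mark), so its output will likely not be compatible with the real zalgo256 table. The names `MARKS`, `INDEX`, `encode` and `decode` are made up for illustration.

```python
import unicodedata

# Assumption: take the first 256 combining marks (general category M*) in the
# 2-byte UTF-8 range U+0080..U+07FF. The real zalgo256 alphabet may differ.
MARKS = [
    chr(cp)
    for cp in range(0x80, 0x800)
    if unicodedata.category(chr(cp)).startswith("M")
][:256]
INDEX = {mark: value for value, mark in enumerate(MARKS)}
BASE = "A"  # base character the marks are stacked on


def encode(data: bytes) -> str:
    """Emit one base character followed by one combining mark per input byte."""
    return BASE + "".join(MARKS[b] for b in data)


def decode(text: str) -> bytes:
    """Map each known mark back to its byte value, skipping everything else."""
    return bytes(INDEX[ch] for ch in text if ch in INDEX)


payload = b"hello, world"
stacked = encode(payload)
assert decode(stacked) == payload
# Compare payload size with the UTF-8 size of the encoded string.
print(len(payload), len(stacked.encode("utf-8")))
```

Each payload byte becomes one 2-byte mark, plus a single byte for the base character, which is where the roughly 2x overhead comes from.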

Unfortunately HN filters some of the codepoints so I can't demonstrate here. Since I chose "A" as the base character which the diacritics are stacked on, it has a similar aesthetic to the SCREAM cipher although a little more zalgo-y.

junon 4 days ago

A demonstration as a comment on the gist would probably work! I'd love to see that

Retr0id 4 days ago

Good point, added

junon 4 days ago

Interesting, I actually expected it to encode a single letter with infinitely long combining marks such that 'highlighting' it was just highlighting one character.

Retr0id 4 days ago

You can do that too, if you increase the STACK_HEIGHT constant (btw, the decoder still works the same, so changing this doesn't break compatibility)
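As a rough illustration of why the stack height can change without breaking the decoder (a sketch under assumptions, not the gist's code): if the encoder starts a new base character every `stack_height` bytes and the decoder simply ignores anything that is not a known mark, then every stack height decodes the same way. The alphabet and the names `encode_stacked` and `decode` are hypothetical.

```python
import unicodedata

# Assumption: same illustrative alphabet as before, the first 256 marks in the
# 2-byte UTF-8 range. The real zalgo256 table may differ.
MARKS = [
    chr(cp)
    for cp in range(0x80, 0x800)
    if unicodedata.category(chr(cp)).startswith("M")
][:256]
INDEX = {mark: value for value, mark in enumerate(MARKS)}


def encode_stacked(data: bytes, stack_height: int, base: str = "A") -> str:
    """Start a new base character every `stack_height` bytes."""
    out = []
    for i, b in enumerate(data):
        if i % stack_height == 0:
            out.append(base)
        out.append(MARKS[b])
    return "".join(out)


def decode(text: str) -> bytes:
    """Ignore base characters entirely; only the marks carry data."""
    return bytes(INDEX[ch] for ch in text if ch in INDEX)


data = b"0123456789"
short_stacks = encode_stacked(data, stack_height=4)    # several base characters
one_stack = encode_stacked(data, stack_height=10**9)   # a single base character
assert decode(short_stacks) == decode(one_stack) == data
```

With a stack height larger than the input, the whole payload hangs off one base character, which is the "highlight one character" behaviour described above.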

junon 4 days ago

Oh neat! Thanks :)

all2 4 days ago

Most of the characters appear as boxes on my phone.

Retr0id 4 days ago

That's curious, because the only character is just the letter A. But I suppose if the font doesn't support a particular combining mark, it gives up on the whole grapheme?

Dylan16807 4 days ago

HN filters some combining characters? That's weird, compared to the symbol/emoji blocking.

Also I'm reminded that the Unicode normalization annex (UAX #15) suggests that legitimate grapheme clusters will be 31 code points or fewer, i.e. a base character plus at most 30 non-starters. "The value of 30 is chosen to be significantly beyond what is required for any linguistic or technical usage."
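An application that wanted to enforce that limit could do something like the following. This is only an approximation of the annex's Stream-Safe check: it normalizes the whole string to NFKD and measures the longest run of characters with a nonzero canonical combining class. The function names and the `limit` parameter are made up for illustration.

```python
import unicodedata


def longest_nonstarter_run(text: str) -> int:
    """Length of the longest run of non-starters (combining class != 0) in NFKD."""
    longest = current = 0
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch) != 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def exceeds_limit(text: str, limit: int = 30) -> bool:
    """True if some base character carries more than `limit` non-starters."""
    return longest_nonstarter_run(text) > limit


ordinary = "re\u0301sume\u0301"   # one combining acute accent per stack
zalgo = "A" + "\u0301" * 100      # 100 marks stacked on one base character
assert not exceeds_limit(ordinary)
assert exceeds_limit(zalgo)
```

Note that a few combining marks (for example the enclosing marks U+0488 and U+0489) have combining class 0, so a check like this would not count them.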

Retr0id 4 days ago

If I had to guess, they probably filtered the ones that could be used to break page layouts by creating very-tall glyphs.

Dylan16807 4 days ago

I guess that's one way to do it. Pretty far from ideal though.

RGamma 4 days ago

Are you sure this doesn't summon The One by accident?