deathanatos 5 days ago

That is a bit of a minefield, I agree…

The way around this, as a developer, is to URL-safe-base64 encode the value. Then you have a bytes primitive & you can use whatever inner representation your heart desires. But the article does also note that you're not 100% in control, either. (Nor should you be, it is a user agent, after all.)

I do wish more UAs opted for "obey the standard" over "bytes and a prayer on the wire". Those 400 responses in the screenshots … they're a conforming response. This would have been better if headers had been either UTF-8 from the start (but there are causality problems with that) or ASCII and then permitted to be UTF-8 later (but that could still cause issues since you're making values that were illegal, legal).

johnp_ 5 days ago

> URL-safe-base64

And make sure to specify what exactly you mean by that. base64url-encoding is incompatible with base64+urlencoding in ~3% of cases, which is easily missed during development, but will surely happen in production.

Retr0id 5 days ago

Isn't it a lot more than 3%? I don't think I've heard anyone say url-safe-base64 and actually mean urlencode(base64(x))

deathanatos 5 days ago

… yeah. I assume they're getting that from doing 3/64, but for uniform bytes, you're rolling that 3/64 chance every base64-output-character. (And bytes are hardly uniform, either … TFA's example input of JSON is going to skew towards that format's character set.)
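A rough sketch to put numbers on the "rolling 3/64 per character" point. It assumes uniformly random input whose length is a multiple of 3, so no `=` padding is involved and only `+` and `/` (2 of the 64 alphabet characters) can make the two encodings diverge. The function names are made up for illustration.

```python
import base64
import os
from urllib.parse import quote


def differs(raw: bytes) -> bool:
    """True if base64url(raw) and urlencode(base64(raw)) are different strings."""
    b64url = base64.urlsafe_b64encode(raw).decode("ascii")
    # safe="" so that "/" is percent-encoded too (quote() leaves it alone by default)
    b64_then_quoted = quote(base64.b64encode(raw).decode("ascii"), safe="")
    return b64url != b64_then_quoted


def expected_mismatch_rate(n_bytes: int) -> float:
    """Analytic rate for uniform input, n_bytes a multiple of 3 (no padding)."""
    n_chars = n_bytes // 3 * 4
    # Each output character independently avoids "+" and "/" with chance 62/64.
    return 1 - (62 / 64) ** n_chars


def sampled_mismatch_rate(n_bytes: int, trials: int = 20_000) -> float:
    """Empirical rate over random inputs, to sanity-check the formula."""
    return sum(differs(os.urandom(n_bytes)) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (3, 12, 24, 48):
        print(n, round(expected_mismatch_rate(n), 3), round(sampled_mismatch_rate(n), 3))
```

Under those assumptions the mismatch rate grows quickly with length: for a 24-byte value (32 output characters) it is already well over half. Real inputs such as JSON are not uniform, so the actual rate will differ.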

deathanatos 5 days ago

oh, geez. No, just base64, using the URL safe alphabet. (The obvious 62 characters, and "-_" for the last two.)

It's called "urlsafe base64", or some variant, in the languages I work in.

> This encoding may be referred to as "base64url".

https://datatracker.ietf.org/doc/html/rfc4648#section-5

But yeah, it's not base64 followed by a urlencode. It's "just" base64-with-a-different-alphabet.
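For illustration, here is how the three variants look in Python's standard library on one input. The three bytes are picked so that every output character lands on alphabet index 62 or 63; the variable names are just for this example.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF, 0xFE])  # chosen so all four output characters are index 62/63

standard = base64.b64encode(raw).decode("ascii")         # "+//+"
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")  # "-__-"
quoted = quote(standard, safe="")                        # "%2B%2F%2F%2B"

print(standard, urlsafe, quoted)

# base64url is the same encoding with "-" and "_" swapped in for "+" and "/",
# so it round-trips with the matching decoder and stays the same length.
assert base64.urlsafe_b64decode(urlsafe) == raw
assert len(urlsafe) == len(standard)
```

The percent-encoded form is three times as long for this input and needs a URL-decode step before the base64 decode, which is why the two schemes cannot be swapped for each other.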

ndusart 4 days ago

Cookie values can contain `=`, `/` and `+` characters, so standard base64 encoding can be used as well :)
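A small check of that claim against the `cookie-octet` grammar in RFC 6265 section 4.1.1, as a sketch. It only tests the spec's character set; what a particular user agent or server library actually accepts may differ. `is_valid_cookie_value` is a hypothetical helper, not a library function.

```python
import base64
import string

# cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E  (RFC 6265, section 4.1.1)
_RANGES = [(0x21, 0x21), (0x23, 0x2B), (0x2D, 0x3A), (0x3C, 0x5B), (0x5D, 0x7E)]
COOKIE_OCTETS = {chr(c) for lo, hi in _RANGES for c in range(lo, hi + 1)}

# Every character standard base64 can emit, including "=" padding.
STANDARD_B64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def is_valid_cookie_value(value: str) -> bool:
    """True if every character is a cookie-octet (unquoted cookie-value)."""
    return all(ch in COOKIE_OCTETS for ch in value)


payload = b'{"a": 1}'
print(is_valid_cookie_value(payload.decode("ascii")))                    # raw JSON has '"' and ' '
print(is_valid_cookie_value(base64.b64encode(payload).decode("ascii")))  # base64 of the same bytes
```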