Sophira 7 days ago

Mathematically, base64 works so that every block of three characters (bytes) of raw input produces four characters of base64'd output.

These blocks can be considered independent of each other. So for example, with the string "Hello world", you can do the following base64 transformations:

* "Hel" -> "SGVs"

* "lo " -> "bG8g"

* "wor" -> "d29y"

* "ld" -> "bGQ="

These encoded blocks can then be concatenated together and you have your final encoded string: "SGVsbG8gd29ybGQ="
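As a quick check of the block-by-block idea above, here is a minimal Python sketch using the standard library's `base64` module. The helper name `encode_in_blocks` is made up for illustration; it just encodes each 3-byte slice separately and joins the results.

```python
import base64


def encode_in_blocks(data: bytes, block_size: int = 3) -> str:
    """Encode each block on its own, then concatenate (hypothetical helper)."""
    pieces = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        pieces.append(base64.b64encode(block).decode("ascii"))
    return "".join(pieces)


text = b"Hello world"
chunked = encode_in_blocks(text)
whole = base64.b64encode(text).decode("ascii")

print(chunked)           # SGVsbG8gd29ybGQ=
print(chunked == whole)  # True
```

This only works because the block size is a multiple of 3. With any other block size, the blocks before the last one would pick up their own "=" padding and the concatenation would no longer match.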

(Notice that the last one ends in an equals sign. This is because that last block is shorter than 3 characters, so in order to still produce 4 characters of output, the encoder has to apply padding: the "=" fills the fourth position, and the third digit carries two zero padding bits as well.)
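To make the padding concrete, here is a rough by-hand sketch in Python of how a short final block could be encoded. The function `encode_short_block` is hypothetical and only handles 1- or 2-byte blocks; it assumes the standard base64 alphabet.

```python
import base64

# Standard base64 alphabet: digit value -> character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_short_block(block: bytes) -> str:
    """Encode a final block of 1 or 2 bytes by hand (illustrative only)."""
    if not 1 <= len(block) <= 2:
        raise ValueError("expected a block of 1 or 2 bytes")
    # Write the bytes out as a string of bits.
    bits = "".join(f"{byte:08b}" for byte in block)
    # Pad with zero bits so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Each group of 6 bits becomes one base64 digit.
    digits = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Fill the remaining positions with "=" to reach 4 characters.
    return "".join(digits) + "=" * (4 - len(digits))


print(encode_short_block(b"ld"))  # bGQ=
```

For "ld" there are 16 bits of data, so two zero bits are appended to make 18, which gives three digits plus one "=". That is why the "Q" in "bGQ=" is partly data and partly padding.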

It's important to note that this is simply a byproduct of the way that base64 works, not actually an intended thing. My understanding is that it's basically like how if you take an ASCII character - which could be considered a base 256 digit - and convert it to hexadecimal (base 16), the resulting hex number will always be two digits long - the same two digits, at that - even if the original was part of a larger string.

In this case, every three base 256 digits will convert to four base 64 digits, in the same way that it would convert to six base 16 digits.
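The digit-count argument can be sketched in Python too. This treats a 3-byte block as one number and rewrites it in base 64 and base 16; `to_digits` is a helper invented for this example, not anything from a library.

```python
# Standard base64 alphabet: digit value -> character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def to_digits(value: int, base: int, width: int) -> list[int]:
    """Return exactly `width` digits of `value` in `base`, most significant first."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits[::-1]


block = b"Hel"
# Three base 256 digits read as a single 24-bit number.
value = int.from_bytes(block, "big")

b64_text = "".join(ALPHABET[d] for d in to_digits(value, 64, 4))
hex_text = "".join("0123456789abcdef"[d] for d in to_digits(value, 16, 6))

print(b64_text)  # SGVs
print(hex_text)  # 48656c
```

The widths line up because 256^3, 64^4 and 16^6 are all equal to 2^24, so a 3-byte block always fits exactly into four base 64 digits or six hex digits with nothing left over.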

Sophira 7 days ago | parent | next [-]

By the way, I would guess that this is almost certainly why LLMs can actually decode/encode base64 somewhat well, even without the help of any MCP-provided tools - it's possible to 'read' it in a similar way to how an LLM might read any other language, and most encoded base64 on the web will come with its decoded version alongside it.

zokier 7 days ago | parent | prev [-]

nitpick but ascii would be base128, largest ascii value is 0x7f which in itself is a telltale if you are looking at hex dumps.

Sophira 7 days ago | parent [-]

Yeah, I was aware of that, but I figured it was the easiest way to explain it. It's true that "character representation of a byte" is more accurate, but it doesn't roll off the tongue as easily.