cogman10 7 days ago

I don't really love this. It just feels so wasteful.

JWT does it as well.

Even in this example, they are double base64 encoding strings (the salt).

It's really too bad that there's nothing quite like JSON. Everything speaks it and can write it. It'd be nice if something like protobuf were easier to write and read in a schemaless fashion.

dlt713705 7 days ago | parent | next [-]

What’s wrong with this?

The purpose of Base64 is to encode data—especially binary data—into a limited set of ASCII characters to allow transmission over text-based protocols.

It is not a cryptographic library nor an obfuscation tool.

Avoid encoding sensitive data using Base64, or including sensitive data in your JWT payload, unless it is encrypted first.

xg15 7 days ago | parent | next [-]

I think it's more the waste of space in it all. Encoding data in base64 increases the length by 33%. So base64-encoding twice will blow it up by 33% of the original data and then again 33% of the encoded data, making 69% in total. And that's before adding JSON to the mix...

And before "space is cheap": JWT is used in contexts where space is generally not cheap, such as in HTTP headers.

cogman10 7 days ago | parent | next [-]

Precisely my thoughts.

You have to ask the question "why are we encoding this as base64 in the first place?"

The answer to that is generally that base64 plays nice with HTTP headers: it has no newlines or special characters that need special handling. Then you ask "why encode JSON?" and the answer is "because JSON is easy to handle". Then you ask "why embed a base64 field in the JSON?" and the answer is "JSON doesn't handle binary data".

These are all choices that ultimately create a much larger text blob than it needs to be. And because this blob is used for security purposes, it gets forwarded in the request headers of every request. Now your simple "DELETE foo/bar" endpoint ends up requiring a 10kb header of security data just to make the request. Or, if you are doing HTTP/2, your LB ends up storing that 10kb blob for every connected client.

Just wasteful. Especially since it's a total of about 3 or 4 different fields with relatively fixed sizes. It could have been base64(key_length(1 byte)|iterations(4 bytes)|hash_function(1 byte)|salt(32 bytes)), which would have produced something like a 51-character base64 string. The example is 3x that size (156 characters), and it gets much worse than that on real systems I've seen.
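That back-of-the-envelope layout is easy to check. The field widths and values below are the hypothetical ones from the comment, not any real token format:

```python
import base64
import os
import struct

# Hypothetical layout: key_length (1 byte) | iterations (4 bytes, big-endian)
# | hash_function id (1 byte) | salt (32 bytes) = 38 bytes total
salt = os.urandom(32)
packed = struct.pack(">BIB32s", 32, 600_000, 1, salt)

# Unpadded url-safe base64 of 38 bytes: ceil(38 * 4 / 3) = 51 characters
token = base64.urlsafe_b64encode(packed).rstrip(b"=")
print(len(packed), len(token))  # 38 51
```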

rini17 7 days ago | parent [-]

JSON doesn't even handle text...

0xml 7 days ago | parent | prev [-]

Not exactly - encoding it twice increases by 4/3 * 4/3 - 1 = 7/9, which is about 77.78% more than the original.
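The arithmetic is easy to sanity-check; picking a payload length divisible by 9 keeps the ratios exact (no padding rounding):

```python
import base64

data = b"\x00" * 288            # 288 is divisible by 9, so the ratios come out exact
once = base64.b64encode(data)   # 4/3 of the original
twice = base64.b64encode(once)  # 16/9 of the original, ~77.8% overhead
print(len(once), len(twice))    # 384 512
```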

zokier 7 days ago | parent | prev | next [-]

JSON is already text based and not binary, so encoding it with base64 is a bit wasteful. Especially if you are going to just embed the text in another JSON document.

And of course text-based things themselves are quite wasteful.

pak9rabid 7 days ago | parent | prev [-]

Exactly. Using base64 as an obfuscation tool, or (shudder) for encryption, is seriously misusing what it was intended for. If that's what you need to do, avoid base64 in favor of something that was designed for the job.

zokier 7 days ago | parent | prev | next [-]

> It's really too bad that there's really nothing quite like json

messagepack/cbor are very similar to json (schemaless, similar primitive types) but can support binary data. bson is another similar alternative. All three have implementations available in many languages, and have been used in big mature projects.
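As a quick illustration of why the native binary type matters: in CBOR a 32-byte salt costs 32 bytes plus a couple of header bytes, with no base64 blow-up. This is a minimal hand-rolled sketch covering only short strings, not a full CBOR encoder (in practice you'd use a library such as cbor2 or msgpack):

```python
import os

def cbor_text(s: str) -> bytes:
    e = s.encode()
    assert len(e) < 24                       # sketch handles short keys only
    return bytes([0x60 | len(e)]) + e        # major type 3: text string

def cbor_bytes(b: bytes) -> bytes:
    assert len(b) < 256                      # sketch handles up to 255 bytes
    if len(b) < 24:
        return bytes([0x40 | len(b)]) + b    # major type 2: byte string
    return bytes([0x58, len(b)]) + b         # 1-byte length follows

salt = os.urandom(32)
doc = bytes([0xA1]) + cbor_text("salt") + cbor_bytes(salt)  # map with 1 pair
print(len(doc))  # 40 bytes total to carry a 32-byte salt
```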

reactordev 7 days ago | parent | prev | next [-]

We just need to sacrifice n*field_count to a header describing the structure. We also need to define allowed types.

Muromec 7 days ago | parent | prev | next [-]

>Everything speaks it and can write it.

asn.1 is super nice -- everything speaks it and tooling is just great (runs away and hides)

derefr 7 days ago | parent | prev [-]

> It'd be nice if something like protobuf was easier to write and read in a schemeless fashion.

If you just want a generic, binary, hierarchical type-length-value encoding, have you considered https://en.wikipedia.org/wiki/Interchange_File_Format ?

It's not that there are widely-supported IFF libraries, per se; but rather that the format is so simple that, as long as your language has a byte-array type, you can code a bug-free IFF encoder/decoder in said language in about five minutes.

(And this is why there are no generic IFF metaformat libraries, ala JSON or XML libraries; it's "too simple to bother everyone depending on my library with a transitive dependency", so everyone just implements IFF encoding/decoding as part of the parser + generator for their IFF-based concrete file format.)

What's IFF used in? AIFF; RIFF (and therefore WAV, AVI, ANI, and — perhaps surprisingly — WebP); JPEG2000; PNG [with tweaks]...

• There's also a descendant metaformat, the ISO Base Media File Format ("BMFF"), which in turn means that MP4, MOV, and HEIF/HEIC can all be parsed by a generic IFF parser (though you'll miss breaking some per-leaf-chunk metadata fields out from the chunk body if you don't use a BMFF-specific parser.)

• And, as an alternative, there's https://en.wikipedia.org/wiki/Extensible_Binary_Meta_Languag... ("EBML"), which is basically IFF but with varint-encoding of the "type" and "length" parts of TLV (see https://matroska-org.github.io/libebml/specs.html). This is mostly currently used as the metaformat of the Matroska (MKV) format. It's also just complex enough to have a standalone generic codec library (https://github.com/Matroska-Org/libebml).

My personal recommendation, if you have some structured binary data to dump to disk, is to just hand-generate IFF chunks inline in your dump/export/send logic, the same way one would e.g. hand-emit CSV inline in a printf call. Just say "this is an IFF-based format" or put an .iff extension on it or send it as application/x-iff, and an ecosystem should be able to run with that. (And just like with JSON, if you give the IFF chunks descriptive names, people will probably be able to suss out what the chunks "mean" from context, without any kind of schema docs being necessary.)
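A minimal sketch of that approach (the chunk IDs `SALT` and `ITER` are made up for the example; the 8-byte ID+length header and padding to even length follow the IFF convention):

```python
import struct
from io import BytesIO

def write_chunk(buf, chunk_id: bytes, data: bytes) -> None:
    # IFF chunk: 4-byte ASCII ID, 4-byte big-endian length, data, pad to even
    assert len(chunk_id) == 4
    buf.write(chunk_id + struct.pack(">I", len(data)) + data)
    if len(data) % 2:
        buf.write(b"\x00")

def read_chunks(buf):
    while header := buf.read(8):
        if len(header) < 8:
            return
        chunk_id, size = header[:4], struct.unpack(">I", header[4:])[0]
        data = buf.read(size)
        if size % 2:
            buf.read(1)  # skip the pad byte
        yield chunk_id, data

out = BytesIO()
write_chunk(out, b"SALT", b"\x01" * 32)
write_chunk(out, b"ITER", struct.pack(">I", 600_000))
chunks = dict(read_chunks(BytesIO(out.getvalue())))
print(sorted(chunks))  # [b'ITER', b'SALT']
```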

naikrovek 7 days ago | parent [-]

Yeah! I agree with this. I use plain TLV (which is very close to this IFF format), similar to how PNG stores all its chunks in a single file, as you mentioned.

I got grief for saying that I prefer TLV data over textual data (even if the data is text) because of how easy it is to write code to output and ingest this format, and it is way, WAY faster than JSON will ever be.

It really is a very easy way to get much faster transmission of data over the wire than JSON, and it's dead easy to write viewers for. It's just an underrated way to store binary data; storing things as binary is underrated in general.