derefr 7 days ago

> It'd be nice if something like protobuf was easier to write and read in a schemeless fashion.

If you just want a generic, binary, hierarchical type-length-value encoding, have you considered https://en.wikipedia.org/wiki/Interchange_File_Format ?

It's not that there are widely-supported IFF libraries, per se; but rather that the format is so simple that as long as your language has a byte-array type, you can code a bug-free IFF encoder/decoder in said language in about five minutes.

(And this is why there are no generic IFF metaformat libraries, à la JSON or XML libraries; it's "too simple to bother everyone depending on my library with a transitive dependency", so everyone just implements IFF encoding/decoding as part of the parser + generator for their IFF-based concrete file format.)
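To give a sense of how little code that is, here is a minimal sketch in Python, assuming the classic EA IFF 85 chunk layout: a 4-byte type ID, a 4-byte big-endian length that counts only the data, the data itself, and one pad byte when the length is odd. The names encode_chunk and decode_chunks are made up for illustration, and container chunks (FORM/LIST/CAT) are not handled here.

```python
import struct

def encode_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Encode one chunk: 4-byte ID, big-endian u32 length, data, pad to even."""
    if len(chunk_id) != 4:
        raise ValueError("chunk ID must be exactly 4 bytes")
    pad = b"\x00" if len(data) % 2 else b""
    return chunk_id + struct.pack(">I", len(data)) + data + pad

def decode_chunks(buf: bytes) -> list[tuple[bytes, bytes]]:
    """Split a buffer of back-to-back chunks into (ID, data) pairs."""
    chunks = []
    pos = 0
    while pos < len(buf):
        if pos + 8 > len(buf):
            raise ValueError("truncated chunk header")
        chunk_id = buf[pos:pos + 4]
        (size,) = struct.unpack_from(">I", buf, pos + 4)
        start = pos + 8
        end = start + size
        if end > len(buf):
            raise ValueError("truncated chunk body")
        chunks.append((chunk_id, buf[start:end]))
        # The length field excludes the pad byte, so skip it explicitly.
        pos = end + (size & 1)
    return chunks
```

Usage would look something like decode_chunks(encode_chunk(b"NAME", b"Ada") + encode_chunk(b"NOTE", b"hi")), which gives back the two (ID, data) pairs. RIFF is the same idea with little-endian lengths, so swapping ">I" for "<I" should cover it.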

What's IFF used in? AIFF; RIFF (and therefore WAV, AVI, ANI, and — perhaps surprisingly — WebP); JPEG2000; PNG [with tweaks]...

• There's also a descendant metaformat, the ISO Base Media File Format ("BMFF"), which in turn means that MP4, MOV, and HEIF/HEIC can all be parsed by a nearly generic IFF parser (BMFF puts the size before the type and counts the header in the size, so the parser needs that small adjustment; and you'll miss breaking some per-leaf-chunk metadata fields out from the chunk body if you don't use a BMFF-specific parser.)

• And, as an alternative, there's https://en.wikipedia.org/wiki/Extensible_Binary_Meta_Languag... ("EBML"), which is basically IFF but with varint-encoding of the "type" and "length" parts of TLV (see https://matroska-org.github.io/libebml/specs.html). Currently this is mostly used as the metaformat of the Matroska (MKV) format. It's also just complex enough to have a standalone generic codec library (https://github.com/Matroska-Org/libebml).
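For the curious, the varint part of EBML can be sketched in a few lines of Python. This is a rough illustration of the "length" encoding as I understand it from the EBML spec: the number of leading zero bits in the first byte gives the total width, a single 1 marker bit follows, and the remaining bits hold the value, with the all-ones pattern reserved for "unknown size". The function names encode_vint and decode_vint are my own, and the rules for element IDs (which keep the marker bit as part of the ID) are not covered.

```python
def encode_vint(value: int) -> bytes:
    """Encode a non-negative integer as an EBML-style data-size vint (1-8 bytes)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for length in range(1, 9):
        # Each byte contributes 7 usable bits; the all-ones value is
        # reserved for "unknown size", so it needs the next width up.
        if value < (1 << (7 * length)) - 1:
            marker = 1 << (7 * length)
            return (marker | value).to_bytes(length, "big")
    raise ValueError("value too large for an 8-byte vint")

def decode_vint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a vint at pos; return (value, position after the vint)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("vints wider than 8 bytes are not supported")
    # The position of the highest set bit in the first byte gives the width.
    length = 9 - first.bit_length()
    if pos + length > len(buf):
        raise ValueError("truncated vint")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    value = raw & ((1 << (7 * length)) - 1)  # strip the marker bit
    return value, pos + length
```

So small lengths cost one byte instead of IFF's fixed four, at the price of a slightly fiddlier parser.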

My personal recommendation, if you have some structured binary data to dump to disk, is to just hand-generate IFF chunks inline in your dump/export/send logic, the same way one would e.g. hand-emit CSV inline in a printf call. Just say "this is an IFF-based format" or put an .iff extension on it or send it as application/x-iff, and an ecosystem should be able to run with that. (And just like with JSON, if you give the IFF chunks descriptive names, people will probably be able to suss out what the chunks "mean" from context, without any kind of schema docs being necessary.)
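As a concrete (and entirely made-up) illustration of hand-emitting chunks inline, here is roughly what a dump function might look like in Python. The record shape, the "USER" form type, and the "NAME" / "AGE " chunk IDs are all hypothetical; the only real conventions assumed are the IFF ones: a FORM container whose body starts with a 4-byte form type, big-endian 32-bit lengths, and a pad byte after odd-length data.

```python
import io
import struct

def dump_user(out, name: str, age: int) -> None:
    """Write one hypothetical USER record as an IFF FORM, chunks emitted inline."""
    name_bytes = name.encode("utf-8")
    body = b"USER"  # form type: first 4 bytes of the FORM body
    # NAME chunk: ID, length, data, plus a pad byte if the length is odd.
    body += b"NAME" + struct.pack(">I", len(name_bytes)) + name_bytes
    body += b"\x00" * (len(name_bytes) % 2)
    # "AGE " chunk: a fixed 2-byte unsigned big-endian integer (0..65535).
    body += b"AGE " + struct.pack(">IH", 2, age)
    out.write(b"FORM" + struct.pack(">I", len(body)) + body)

buf = io.BytesIO()
dump_user(buf, "Ada", 36)
```

There is no encoder object or schema anywhere in that, which is the point: the "format library" is three struct.pack calls sitting right next to the data they describe.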

naikrovek 6 days ago

Yeah! I agree with this. I use plain TLV, which is very close to this IFF format and similar to how PNG stores all its chunks in a single file, as you mentioned.

I got grief for saying that I prefer TLV data over textual data (even if the data is text) because of how easy it is to write code to output and ingest this format, and it is way, WAY faster than JSON will ever be.

It really is a very easy way to get much faster transmission of data over the wire than JSON, and it's dead easy to write viewers for. It's just an underrated way to store binary data. Storing things as binary is underrated in general.