| ▲ | 0xbadcafebee 4 hours ago | |
The problem is that engineers of data formats have ignored the concept of layers. With network protocols, you make one layer (Ethernet), you add another layer (IP), then another (TCP), then another (HTTP). Each one fits inside the last, but is independent, and you can deal with them separately or together. Each one has a specialty and is used for certain things. The benefits are 1) you don't need "a kitchen sink", 2) you can replace layers as needed for your use-case, 3) you can ship them together or individually. I don't think anyone designs formats this way, and I doubt any popular formats are designed for this. I'm not that familiar with enterprise/big-data formats so maybe one of them is? For example: CSV is great, but obviously limited, and not specified all that well. A replacement table data format could be binary (it's 2026, let's stop "escaping quotes", and make room for binary data). Each row can have header metadata to define which columns are contained, so you can skip empty columns. Each cell can be any data format you want (specifically so you can layer!). The header at the beginning of the data format could (optionally) include an index of all the rows, or it could come at the end of the file. And this whole table data format could be wrapped by another format. Due to this design, you can embed it in other formats, you can choose how to define cells (pick a cell-data-format of your choosing to fit your data/type/etc, replace it later without replacing the whole table), you can view it out-of-order, you can stream it, and you can use an index. | ||
| ▲ | inejge 21 minutes ago | parent | next [-] | |
> With network protocols, you make one layer (Ethernet), you add another layer (IP), then another (TCP), then another (HTTP). Each one fits inside the last, but is independent, and you can deal with them separately or together. It looks neat when you illustrate it with stacked boxes or concentric circles, but real-world problems quickly show the ugly seams. For example, how do you handle encryption? There are arguments (and solutions!) for every layer, each with its own tradeoffs. But it can't be neatly slotted into the layered structure once and for all. Then you have things like session persistence, network mobility, you name it. Data formats have other sets of tradeoffs pulling them in different directions, but I don't think that layered design would come near to solving any of them. | ||
| ▲ | mristin an hour ago | parent | prev | next [-] | |
Have a look at Asset Administration Shells (AAS) -- it is a data exchange format built on top of JSON and XML (and RDF, and OPC UA and Protobuf, etc.). https://industrialdigitaltwin.org/ (Disclaimer: I work on AAS SDKs https://github.com/aas-core-works.) | ||
| ▲ | gmueckl 3 hours ago | parent | prev [-] | |
Some early binary formats followed similar concepts. Look up Interchange File Format, AIFF, RIFF, and their applications and all the file formats using this structure to this day. | ||