| ▲ | unquietwiki 5 days ago |
| Looks interesting, but there are few comments on the forum & even a negative vote count ATM. The format looks kinda "old school" in terms of defining records, but I guess that can be a positive in some circumstances? |
|
| ▲ | inkyoto 5 days ago | parent | next [-] |
| I would say it is a niche solution that solves a specific problem. Modern data sources increasingly lean towards producing nested and deeply nested semi-structured datasets (i.e. JSON) that are heavily denormalised and rely on organisation-wide entity IDs rather than system-generated referential integrity IDs (PKs and FKs). That is one reason why modern data warehouse products (e.g. Redshift) have added extensive support for nested data processing: flattening/un-nesting the nested data neither makes sense nor is easy to do anyway.
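A toy sketch in Python (hypothetical data and names, just to illustrate the point, not any product's API) of why un-nesting is hard: each nested list of objects fans out into its own table, and the child rows only stay joinable if you also propagate the parent's business keys down:

    # Hypothetical nested order document keyed by organisation-wide
    # entity IDs (order_id, customer_id, sku), not system-generated PK/FKs.
    order = {
        "order_id": "ORD-1001",
        "customer_id": "CUST-42",
        "lines": [
            {"sku": "SKU-7", "qty": 2, "discounts": [{"code": "NEW10", "pct": 10}]},
            {"sku": "SKU-9", "qty": 1, "discounts": []},
        ],
    }

    def flatten(doc, path="root", tables=None):
        """Naively un-nest one document: every nested list of objects
        becomes its own table. Note the child rows lose the link to
        their parent unless the parent's keys are propagated too."""
        if tables is None:
            tables = {}
        row = {}
        for key, value in doc.items():
            if isinstance(value, list) and value and isinstance(value[0], dict):
                for child in value:
                    flatten(child, f"{path}.{key}", tables)
            else:
                row[key] = value
        tables.setdefault(path, []).append(row)
        return tables

    for table, rows in flatten(order).items():
        print(table, "->", rows)
    # One document produces three tables (root, root.lines,
    # root.lines.discounts), and the line/discount rows no longer
    # carry order_id or sku unless you add that plumbing yourself.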
| ▲ | sergeyprokhoren 5 days ago | parent [-] |
This is a fairly common problem. Data is often transferred between information systems in denormalized form (tables with hundreds of columns, i.e. attributes). In the data warehouse it is normalized (duplication within tables is eliminated by using references to reference tables) to make complex analytical queries easier. Usually the data is normalized to 3NF and very rarely to 6NF, since there is still no convenient tool for 6NF (see my DSL: https://medium.com/@sergeyprokhorenko777/dsl-for-bitemporal-... ). And then the data is denormalized again in data marts to generate reports for external users.

All these cycles of normalization - denormalization - normalization - denormalization are very expensive for IT departments. That is why I had the idea of transferring data between information systems directly in normalized form, so that nothing would ever have to be normalized again. The prototypes were the Anchor Modeling and (to a much lesser extent) Data Vault methodologies.
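A minimal sketch in Python (hypothetical table and column names; this shows the general Anchor Modeling / 6NF idea, not the DSL itself) of what "already normalized" data looks like: the entity's identity lives in an anchor, and each attribute travels as its own narrow, historized relation:

    from datetime import date

    # Wide, denormalized record as it might arrive from a source system.
    incoming = {"customer_id": "CUST-42", "name": "Acme Ltd",
                "city": "Oslo", "segment": "SMB"}

    anchor = set()    # CUSTOMER(customer_id)
    attributes = {}   # one narrow table per attribute:
                      # CUSTOMER_<ATTR>(customer_id, value, valid_from)

    def load(record, key="customer_id", as_of=None):
        """Decompose one wide record into anchor + per-attribute rows,
        roughly what a 6NF / Anchor Modeling load does."""
        as_of = as_of or date.today()
        anchor.add(record[key])
        for attr, value in record.items():
            if attr != key:
                attributes.setdefault(attr, []).append((record[key], value, as_of))

    load(incoming)
    print("CUSTOMER:", anchor)
    for attr, rows in attributes.items():
        print(f"CUSTOMER_{attr.upper()}:", rows)

If data crossed system boundaries already in this shape, the warehouse would only have to append rows instead of re-deriving the model on every hop.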
| ▲ | gregw2 4 days ago | parent | next [-] |
Cool to see you tackle this problem. If I were you, though, I'd consider whether you'd get more traction with an open-source extension of the Iceberg format that supports row-based reporting and indexes, for a unified open-source HTAP ecosystem.
| ▲ | snthpy 5 days ago | parent | prev [-] |
Nice. Anchor Modelling is underappreciated. Gonna have a look at your DSL.
|
|
|
| ▲ | sergeyprokhoren 5 days ago | parent | prev [-] |
| What does "old school" mean here? Do you want to wrap this format in JSON, like JSON-LD? I don't mind. |