sergeyprokhoren 5 days ago

This is a fairly common problem. Data is often transferred between information systems in denormalized form (tables with hundreds of columns, i.e. attributes). In the data warehouse it is normalized (duplication is eliminated by replacing repeated values with references to reference tables) to make complex analytical queries against the data easier. Usually data is normalized to 3NF and very rarely to 6NF, since there is still no convenient tooling for 6NF (see my DSL: https://medium.com/@sergeyprokhorenko777/dsl-for-bitemporal-... ). Then the data is denormalized again in data marts to generate reports for external users.

All these cycles of normalization, denormalization, normalization and denormalization again are very expensive for IT departments. So I had the idea of transferring data between information systems directly in normalized form, so that nothing has to be normalized downstream. The prototypes were the Anchor Modeling and (to a much lesser extent) Data Vault methodologies.
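
A rough sketch (Python, not my actual DSL; table and field names are purely illustrative) of what "transferring data in normalized form" means here: each attribute of a wide, sparse record travels as its own narrow (key, value) table, in the spirit of Anchor Modeling / 6NF:

    from collections import defaultdict

    def to_sixth_normal_form(rows, key):
        """Split wide denormalized rows into one (key, value) table per attribute."""
        tables = defaultdict(list)
        for row in rows:
            anchor_id = row[key]
            for attribute, value in row.items():
                if attribute == key or value is None:
                    continue
                # Each attribute becomes its own narrow table,
                # so sparse wide rows carry no empty columns.
                tables[attribute].append((anchor_id, value))
        return dict(tables)

    # A wide "customer" row; real feeds have hundreds of columns.
    denormalized = [
        {"customer_id": 1, "name": "Alice", "city": "Oslo", "segment": None},
        {"customer_id": 2, "name": "Bob", "city": None, "segment": "SMB"},
    ]

    for table, pairs in to_sixth_normal_form(denormalized, "customer_id").items():
        print(table, pairs)
    # name [(1, 'Alice'), (2, 'Bob')]
    # city [(1, 'Oslo')]
    # segment [(2, 'SMB')]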

gregw2 4 days ago | parent | next [-]

Cool to see you tackle this problem.

If I were you, though, I'd consider whether you'd get more traction with an open-source extension of the Iceberg format that supports row-based reporting and indexes, for a unified open-source HTAP ecosystem.

snthpy 5 days ago | parent | prev [-]

Nice. Anchor Modelling is underappreciated.

Gonna have a look at your DSL.