| ▲ | notepad0x90 2 hours ago | |
I don't fully agree with this for large nested datasets and arrays. Arrays especially: what could be one line of JSON becomes, in a CSV, either a non-normalized array stored as a string in a single cell, or an expanded array with one value per cell, which creates $array_size rows. You can normalize data in just about any structured format, but columns aren't the end-all-be-all normalization format. I think pandas uses "frames". | ||
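To make the two CSV options above concrete, here is a minimal sketch using only the Python standard library. The record, its field names, and the `to_csv` helper are hypothetical and exist only for illustration; they are not taken from the thread.

```python
import csv
import io
import json

# Hypothetical record with a nested array; field names are illustrative only.
record = {"id": 1, "name": "widget", "tags": ["red", "small", "metal"]}


def to_csv(rows, fieldnames):
    """Write a list of dicts to a CSV string (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Option 1: keep a single row and serialize the array as a string in one cell.
single_cell = to_csv(
    [{**record, "tags": json.dumps(record["tags"])}],
    ["id", "name", "tags"],
)

# Option 2: expand the array, producing one row per array element.
exploded_rows = [
    {"id": record["id"], "name": record["name"], "tag": tag}
    for tag in record["tags"]
]
exploded = to_csv(exploded_rows, ["id", "name", "tag"])

print(single_cell)
print(exploded)
```

Option 1 keeps one row per record but pushes the parsing of the cell onto the reader. Option 2 stays flat but repeats `id` and `name` once per array element.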
| ▲ | llm_nerd 26 minutes ago | |
> but columns aren't the end-all-be-all normalization format. I think pandas uses "frames".

Pandas is column-oriented, as are basically all high-performance data libraries. Each column is a separate array of data. To get a "row" you take the nth item from each of the arrays.

And FWIW, column orientation isn't considered normalization. It's a physical layout optimization that can yield enormous performance advantages for some classes of problems, but can cause a performance nightmare for others. Data analytics loves column-oriented. CRUD-type stuff does not. And in the programming realm there are several options for using Structures of Arrays (SoA) instead of the classic Arrays of Structures (AoS). | ||
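The AoS versus SoA distinction, and the "take the nth item from each array" way of rebuilding a row, can be sketched in plain Python. This is an illustration of the layout idea only, not how pandas is implemented internally; the sample data and the `get_row` helper are hypothetical.

```python
# Array of Structures (AoS): one record per element, i.e. row-oriented.
aos = [
    {"id": 1, "price": 9.5, "qty": 2},
    {"id": 2, "price": 4.0, "qty": 10},
    {"id": 3, "price": 7.25, "qty": 1},
]

# Structure of Arrays (SoA): one array per field, i.e. column-oriented.
soa = {key: [row[key] for row in aos] for key in aos[0]}


def get_row(columns, n):
    """Rebuild row n by taking the nth item from each column array."""
    return {name: values[n] for name, values in columns.items()}


# Analytics-style access only needs to scan a single column.
total_qty = sum(soa["qty"])

# CRUD-style access to one record has to touch every column.
second = get_row(soa, 1)

print(total_qty, second)
```

In a compiled language or an array library, the column arrays would be contiguous in memory, which is where the scan-speed advantage for analytics comes from. The per-record lookup cost grows with the number of columns, which is the CRUD downside mentioned above.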