fiddlerwoaroof · a day ago:

Normalization is possible but not practical in a lot of cases: nearly every “legacy” database I’ve seen has at least one table that just accumulates columns because that was the quickest way to ship something. Also, normalization solves a problem that’s present in OLTP applications; OLAP/Big Data applications generally have problems that are solved by denormalization.
gz09 · a day ago:

Yep, this comment sums it up well. We have many large enterprises from wildly different domains using Feldera, and from what I can tell there is no correlation between the domain and the number of columns.

As fiddlerwoaroof says, it seems to be more a function of how mature/big the company is and how much time it has had to 'accumulate things' in its data model.

And there might be very good reasons they designed things the way they did; it's very hard to question it without being a domain expert in their field. I wouldn't dare :).
locknitpicker · 20 hours ago:

> I can tell there is no correlation between the domain and the number of columns.

This is unbelievable. In purely architectural terms, that would require your database design to be an amorphous big ball of everything, with no discernible design or modelling involved. That is completely unrealistic. Are queries done at random?

In practical terms, your assertion is irrelevant. Look at the sparse columns; figure out which ones have mostly empty rows. Then move half of the columns to a new table and keep the other half in the original one. Congratulations: you just cut your column count in half and sped up your queries.

Even better: discover how your data is actually being used. Look at your queries and check which fields each one touches. Odds are, that's your table right there.

Let's face it: there is absolutely no technical or architectural reason to reach this point. This problem is really not about structs.
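The split described above is a vertical partition: sparse, rarely-read columns move to a side table keyed by the original primary key, so hot-path queries touch only the narrow table. A minimal sketch with sqlite3 (table and column names are invented for illustration, not taken from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide "orders" table: a couple of hot columns plus sparse,
# rarely-read ones.
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT, total REAL,         -- read on every hot-path query
    gift_note TEXT, promo_code TEXT    -- sparse, rarely read
)""")
conn.execute("INSERT INTO orders VALUES (1, 'alice', 9.99, NULL, NULL)")
conn.execute("INSERT INTO orders VALUES (2, 'bob', 5.0, 'happy bday', NULL)")

# Move the sparse half into a side table keyed by the original primary key;
# only rows that actually have values need a row here.
conn.execute("""CREATE TABLE order_extras (
    order_id INTEGER PRIMARY KEY REFERENCES orders(id),
    gift_note TEXT, promo_code TEXT
)""")
conn.execute("""INSERT INTO order_extras
    SELECT id, gift_note, promo_code FROM orders
    WHERE gift_note IS NOT NULL OR promo_code IS NOT NULL""")
# (A real migration would then drop the moved columns from orders.)

# Hot-path queries now read only the narrow table; the join is opt-in.
hot = conn.execute("SELECT customer, total FROM orders WHERE id = 1").fetchone()
extras = conn.execute(
    "SELECT gift_note FROM order_extras WHERE order_id = 2").fetchone()
```

The side table also stays small when the columns are sparse, since rows that are all NULL never need to exist in it.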
diarrhea · 18 hours ago:

The Feldera folks speak from lived experience when they say 100+ column tables are common in their customer base, and when they say there's no correlation in their customer base. Feldera provides a service; they did not design these schemas. Their customers did, probably over such long time periods that those schemas cannot be called designed anymore -- they just happened.

IIUC, Feldera works primarily in OLAP, where I have no trouble believing these schemas are common. At my $JOB they are, because it works well for the type of data we process. Some OLAP DBs might not even support JOINs.

Feldera folks are simply reporting on their experience, and people are saying they're... wrong?
gz09 · 17 hours ago:

Haha, looks like it. I remember the first time I encountered this thing called TPC-H back when I was a student. I thought, "wow, surely SQL can't get more complicated than that". Turns out I was very wrong. So it's all about perspective.

We wrote another blog post about this topic a while ago; I find it much more impressive because it covers the actual queries some people are running: https://www.feldera.com/blog/can-your-incremental-compute-en...
locknitpicker · 20 hours ago:

> Normalization is possible but not practical in a lot of cases: nearly every “legacy” database I’ve seen has at least one table that just accumulates columns because that was the quickest way to ship something.

Strong disagree. I'll explain.

Your argument would support adding a few columns to a table to get a short time to market. That's OK. It does not come close to justifying why you would keep those columns in, though. Not in the slightest.

Tables with many columns create all sorts of problems and inefficiencies. Overfetching is a problem in itself. Even the code gets brittle, where every single tweak risks being a major regression.

Creating a new table is not hard. Add a foreign key, add the columns, do a standard parallel-write migration. Done. How on earth is this not practical?
fiddlerwoaroof · 11 hours ago:

I’m not justifying the design, but splitting a table with several billion rows is not a trivial task, especially when ORMs and such are involved. Additionally, it’s easier to get work scheduled to ship a feature than it is to convince the relevant players to complete the swing.
locknitpicker · 9 hours ago:

> I’m not justifying the design but splitting a table with several billion rows is not a trivial task, especially when ORMs and such are involved.

I don't agree. Let me walk you through the process:

- create the new table
- follow a basic parallel-writes strategy:
  - update your database consumers to write to the new table without reading from it
  - run a batch job to populate the new table with data from the old table
  - update your database consumers to read from the new table while writing to both old and new tables

From this point onward, just pick a convenient moment to stop writing to the old table and call the migration done. Do post-migration cleanup tasks.

> Additionally, it’s easier to get work scheduled to ship a feature than it is to convince the relevant players to complete the swing.

The ease of piling up technical debt is not a justification for keeping broken systems and designs. It's only OK to make a mess to deliver things because you're expected to clean up after yourself afterwards.
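The parallel-writes strategy described above can be sketched end to end with sqlite3. This is a toy illustration of the phases, not a production migration; the table names (`wide`, `extras`) and the `save` helper are invented for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Legacy wide table, with one row that predates the migration.
db.execute("CREATE TABLE wide (id INTEGER PRIMARY KEY, core TEXT, extra TEXT)")
db.execute("INSERT INTO wide VALUES (1, 'a', 'x')")

# Step 1: create the new table for the columns being split out.
db.execute("CREATE TABLE extras (id INTEGER PRIMARY KEY, extra TEXT)")

# Step 2: consumers now write to both tables (reads still hit `wide`).
def save(row_id, core, extra):
    db.execute("INSERT OR REPLACE INTO wide VALUES (?, ?, ?)",
               (row_id, core, extra))
    db.execute("INSERT OR REPLACE INTO extras VALUES (?, ?)",
               (row_id, extra))

save(2, 'b', 'y')  # a write that happens during the dual-write phase

# Step 3: batch-backfill rows that predate the dual writes
# (INSERT OR IGNORE skips ids the dual writes already covered).
db.execute("INSERT OR IGNORE INTO extras SELECT id, extra FROM wide")

# Step 4: flip reads over to the new table; it now holds every row.
rows = db.execute("SELECT id, extra FROM extras ORDER BY id").fetchall()
```

After step 4, writes to the old table's moved columns can stop and the cleanup (dropping those columns) can happen at leisure, which is the "post-migration cleanup" the comment refers to.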
grey-area · 20 hours ago:

There are sometimes reasons this is harder in practice. For example, say the business or even third parties have direct access to this DB, with hundreds of separate apps/services relying on it (also an anti-pattern, of course, but not uncommon); that makes changing the DB significantly harder. Mistakes made early on and not corrected can snowball and lead to this kind of mess, which is very hard to back out of.
magicalhippo · 15 hours ago:

> How on earth is this not practical?

Fine, but you still need to read in those 100+ fields. So now you have to contend with 20+ joins just to pull in one record. Not more practical than a single SELECT, in my opinion.
sublinear · 13 hours ago:

You don't need to join what you don't actually need. You also need to be careful writing your queries, not just the schema. The most common ones should be wrapped in views or functions to avoid everyone rolling their own later. Performance generally isn't an issue for an arbitrary number of joins as long as your indices are set up correctly.

If you really do need a bulk read like that, I think you want JSON columns, or to just go all-in with a NoSQL database. Even then, the above regarding indexing still holds.
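Both suggestions above, indexing the join key and wrapping the common join in a view, can be shown in a few lines of sqlite3. The schema (`users`/`profiles`) and the view name are made up for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""CREATE TABLE profiles (
    user_id INTEGER REFERENCES users(id), bio TEXT)""")
db.execute("INSERT INTO users VALUES (1, 'alice')")
db.execute("INSERT INTO profiles VALUES (1, 'hello')")

# Index the join key so the join stays cheap as the tables grow.
db.execute("CREATE INDEX idx_profiles_user ON profiles(user_id)")

# Wrap the common join in a view: every caller shares one vetted query
# instead of rolling their own join each time.
db.execute("""CREATE VIEW user_overview AS
    SELECT u.id, u.name, p.bio
    FROM users u LEFT JOIN profiles p ON p.user_id = u.id""")

row = db.execute("SELECT name, bio FROM user_overview WHERE id = 1").fetchone()
```

Callers query `user_overview` like a table, so the join logic lives in exactly one place and can be tuned centrally.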
rpcope1 · 21 hours ago:

I think you believe the average developer, especially on enterprise software where you see this sort of shit, is far more competent or ambitious than they actually are. Many would be horrified to see the number of monkeys banging out nasty DDL in Hibernate (or whatever C# uses) who have no idea what "normal forms" or "relational algebra" are, and who are actively resistant to even attempting to learn.