DeathArrow 6 hours ago:
There are use cases where it's better not to normalize the data.

andrew_lettuce 6 hours ago:
Typically it's better to take normalized data and denormalize it for your use case than to not normalize in the first place. It really depends on your needs.

jghn 6 hours ago:
Over time I've developed a philosophy of starting roughly around 3NF and adjusting as the project evolves. Usually this means some parts of the db get demoralized and some get further normalized.

skeeter2020 5 hours ago:
>> Usually this means some parts of the db get demoralized

I largely agree with your practical approach, but try and keep the data excited about the process, sell the "new use cases for the same data!" angle :)
abirch 6 hours ago:
I'm a fan of the sushi principle: raw data is better than cooked data. Each process should take data from a golden source, not from a pre-aggregated or overly normalized non-authoritative source.

layer8 5 hours ago:
Sometimes the role of your system is to be the authoritative source of data that it has aggregated, validated, and canonicalized.

abirch 5 hours ago:
This is great. Then I would consider the aggregated, validated, and canonicalized source a golden source. Where I've seen issues is when someone starts to query a non-authoritative source because they know about it, instead of going upstream to a proper source.
petalmind 6 hours ago:
One day I hope to write about denormalization, explained explicitly via JOINs.

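That framing can be made concrete in a few lines. A minimal sketch, using SQLite for portability and made-up table and column names: denormalizing is essentially materializing the JOIN you would otherwise keep re-running.

```python
import sqlite3

# In-memory database; the author/book schema here is purely illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Codd'), (2, 'Date');
    INSERT INTO book VALUES (10, 'Relational Model', 1),
                            (11, 'SQL Standard', 2);
""")

# Denormalization as a precomputed JOIN: the flat table trades update
# anomalies for single-table reads.
con.executescript("""
    CREATE TABLE book_flat AS
    SELECT b.id, b.title, a.name AS author_name
    FROM book b JOIN author a ON a.id = b.author_id;
""")

rows = con.execute(
    "SELECT title, author_name FROM book_flat ORDER BY id"
).fetchall()
print(rows)  # [('Relational Model', 'Codd'), ('SQL Standard', 'Date')]
```

The trade-off is the usual one: `book_flat` reads without a JOIN, but renaming an author now touches many rows.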
bob1029 5 hours ago:
JSON is extremely fast these days. Gzipped JSON perhaps even more so. I find that JSON blobs up to about 1 megabyte are very reasonable in most scenarios. You are looking at maybe a millisecond of latency overhead in exchange for much denser I/O for complex objects. If the system is very write-intensive, I would cap the blobs at around 10–100 KB.

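The density claim is easy to sanity-check locally. A rough sketch with a made-up document shape (absolute numbers will vary by machine and data, but JSON's repeated keys reliably compress very well):

```python
import gzip
import json
import time

# Hypothetical complex object: ~20k small records, about 1 MB as JSON.
doc = [{"id": i, "status": "active", "tags": ["a", "b"]} for i in range(20_000)]

start = time.perf_counter()
raw = json.dumps(doc).encode()       # serialize
packed = gzip.compress(raw)          # compress for denser I/O
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes, {elapsed_ms:.1f} ms")
```

On typical hardware the serialize-plus-compress round trip for a blob this size lands in the low tens of milliseconds at most, and the gzipped payload is a small fraction of the raw size.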
sgarland an hour ago:
> You are looking at maybe a millisecond of latency overhead [for 1 megabyte]

Considering that transferring 1 MB in 1 msec alone requires 8 Gbps, I have doubts. But for fun, I created a small table in Postgres 18 with an INT PK and a few thousand JSONB blobs of various sizes, up to 1 MiB. Median timing was 4.7 msec for a simple point select, compared to 0.1 msec (blobs of 3 KiB) and 0.8 msec (blobs of 64 KiB). This was on a MBP M4 Pro, using Python with psycopg, so latency is quite low.

The TOAST/de-TOAST overhead is going to kill you for any blobs > 2 KiB (by default; adjustable). And for larger blobs, especially in cloud solutions where the disk is almost always attached over a network, the sheer number of pages you have to fetch (a 1 MiB blob will nominally consume 128 pages, modulo compression, row overhead, etc.) will add significant latency. All of this also adds pressure on genuinely useful pages that may be cached, so queries against more reasonable tables will be impacted as well. An RDBMS should not be used to store blobs; it's not a filesystem.
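The page count quoted above follows directly from Postgres's default 8 KiB block size (this sketch ignores TOAST compression, tuple headers, and chunk overhead):

```python
PAGE_SIZE = 8 * 1024          # Postgres default block size (8 KiB)
BLOB_SIZE = 1 * 1024 * 1024   # a 1 MiB JSONB blob, uncompressed

# Minimum number of pages the blob's bytes alone occupy: ceiling division.
pages = -(-BLOB_SIZE // PAGE_SIZE)
print(pages)  # 128
```

Each of those pages is a separate unit of buffer-cache and I/O accounting, which is why large blobs crowd out hotter data.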
Quarrelsome 4 hours ago:
I adore the contiguous reads that ideas like that yield. I'd rather push that out to a read-only endpoint than get sucked into the entropy of treating what is effectively an unschema'd blob as editable data.