ellisv 6 days ago

I wish devs would normalize their data rather than shove everything into a JSON(B) column, especially when there is a consistent schema across records.

It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.

nh2 5 days ago | parent | next [-]

JSON columns shine when

* The data does not map well to database tables, e.g. when it's a tree structure (of course that could be represented as many table rows too, but it's complicated and may be slower when you always need to operate on the whole tree anyway)

* Your programming language has better types and programming facilities than SQL offers; for example, in our Haskell+TypeScript code base we can conveniently serialise large nested data structures with 100s of types into JSON, without having to think about how to represent those trees as tables.
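
Roughly, in TypeScript terms (a sketch with invented types, not our actual code):

    // The "whole tree in one column" idea: the structure round-trips through
    // a single jsonb value instead of being split across several tables.
    interface CategoryNode {
      name: string;
      attributes: Record<string, string>;
      children: CategoryNode[]; // arbitrary depth: awkward as rows, trivial as JSON
    }

    // Write: serialise once and store the string in one jsonb column.
    const toColumnValue = (root: CategoryNode): string => JSON.stringify(root);

    // Read: one row, one parse, and the whole tree is back.
    const fromColumnValue = (json: string): CategoryNode =>
      JSON.parse(json) as CategoryNode;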

cies 5 days ago | parent [-]

You do need some fancy in-house way to migrate old JSON to new JSON when you want to evolve the (implicit) JSON schema.

I find this one of the hardest parts of using JSON, and the main reason why I'd rather put the data in proper columns. Once I go JSON I need a fair bit of code to deal with migrations (either running them as explicit migration steps, or doing them lazily at read/write time).
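
For the read/write-time flavour, a minimal sketch (type and field names invented) looks something like this:

    // Each stored document carries a version; reads upgrade old shapes to the
    // newest one before the rest of the code ever sees them.
    type ProfileV1 = { version: 1; fullName: string };
    type ProfileV2 = { version: 2; firstName: string; lastName: string };
    type StoredProfile = ProfileV1 | ProfileV2;

    function migrateProfile(raw: StoredProfile): ProfileV2 {
      if (raw.version === 1) {
        const [firstName, ...rest] = raw.fullName.split(" ");
        return { version: 2, firstName, lastName: rest.join(" ") };
      }
      return raw; // already the current shape
    }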

kccqzy 5 days ago | parent | next [-]

Since OP is using Haskell, the code most likely won't touch the JSON type directly, but rather the actual domain type. This makes migrations super easy to write. Of course they could have written a fancy in-house way to do that, or just used the safe-copy library, which solves this problem and has been around for almost two decades. In particular, it solves the "nested version control" problem of data structures containing other data structures with varying versions.

nh2 5 days ago | parent | prev [-]

Yes, that's what we do: Migrations with proper sum types and exhaustiveness checking.
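
In TypeScript terms (ours is Haskell, so this is only an analogy, with invented names) the pattern is a discriminated union plus a switch the compiler checks for exhaustiveness:

    type Doc =
      | { tag: "v1"; name: string }
      | { tag: "v2"; name: string; createdAt: string };

    function upgrade(doc: Doc): Extract<Doc, { tag: "v2" }> {
      switch (doc.tag) {
        case "v1":
          return { tag: "v2", name: doc.name, createdAt: new Date(0).toISOString() };
        case "v2":
          return doc;
        default: {
          // Exhaustiveness check: adding a "v3" variant fails to compile
          // until this function handles it.
          const _exhaustive: never = doc;
          return _exhaustive;
        }
      }
    }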

tracker1 5 days ago | parent | prev | next [-]

I find that JSON(B) works best when you have a collection of data with different or variant concrete types that aren't 1:1 matches. Ex: the actual transaction result when you have different payment processors (paypal, amazon, google, apple-pay, etc.)... you don't necessarily want/care about having N different tables for a clean mapping (along with the overhead of a join) just to pull the transaction details in their original format(s).

Another example is a classifieds website, where your extra details for a Dress are going to be quite a bit different than the details for a Car or Watch. But, again, you don't necessarily want to inflate the table structure for a fully normalized flow.

If you're using a concretely typed service language it can help. C# does a decent job here. But even then, mixing in Zod with Hono and OpenAPI isn't exactly difficult on the JS/TS front.
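
Since Zod came up: a rough sketch of the payment-processor case (field names invented) is one discriminated union over the possible shapes of that single jsonb column:

    import { z } from "zod";

    // One schema per processor's payload, discriminated on a "processor" tag.
    const TransactionDetails = z.discriminatedUnion("processor", [
      z.object({ processor: z.literal("paypal"), captureId: z.string() }),
      z.object({ processor: z.literal("amazon"), orderReferenceId: z.string() }),
      z.object({ processor: z.literal("google"), purchaseToken: z.string() }),
      z.object({ processor: z.literal("apple_pay"), transactionReceipt: z.string() }),
    ]);
    type TransactionDetails = z.infer<typeof TransactionDetails>;

    // Validate whatever comes back out of the jsonb column before using it.
    const parseDetails = (raw: unknown): TransactionDetails =>
      TransactionDetails.parse(raw);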

dzonga 5 days ago | parent [-]

Yeah, document formats (jsonb) are excellent for apps that interface with the messy real world: e-commerce, government systems, anything involving forms, payments, etc.

Try to map everything in a relational way and you're in a world of pain.

crazygringo 5 days ago | parent | prev | next [-]

For very simple JSON data whose schema never changes, I agree.

But the more complex it is, the more complex the relational representation becomes. JSON responses from some APIs could easily require 8 new tables to store the data, with lots of arbitrary new primary keys and lots of foreign key constraints, and your queries will be full of JOINs that need proper indexing set up...

Oftentimes it's just not worth it, especially if your queries are relatively simple, but you still need to store the full JSON in case you need the data in the future.

Obviously storing JSON in a relational database feels a bit like a Frankenstein monster. But at the end of the day, it's really just about what's simplest to maintain and provides the necessary performance.

And the whole point of the article is how easy it is to set up indexes on JSON.

jasonthorsness 5 days ago | parent | prev | next [-]

When a data tree is tightly coupled (like a complex sample of nested data with some arrays from a sensor) and the entire tree is treated as a single thing by writes, the JSON column just keeps things simpler. Reads can be accelerated with indexes as demonstrated here.

whizzter 5 days ago | parent | prev | next [-]

I fully agree that's wrong (I can't imagine the overhead on some of the larger tables I have if that had happened). That said, people often want weird customizations in medium-sized tables that would set you on a path to annoying 100-column tables if you couldn't express those customizations in a "simple", more or less polymorphic JSON column.

A typical example is a price-setting product I work on: there are price ranges that are universal (and DB columns reflect that part), but customers all have weird custom pricing requests, like rebates on the 3rd weekend after X-mas (but only if the customer is related to Uncle Rudolph, who picks his nose).
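
Roughly (rule names invented to match the example), a row ends up looking like this: the universal part stays as real columns, the per-customer weirdness is one polymorphic JSON value.

    type CustomPricingRule =
      | { kind: "seasonal_rebate"; weekendsAfterChristmas: number; percent: number }
      | { kind: "related_customer_discount"; relatedToCustomerId: string; percent: number };

    interface PriceRangeRow {
      id: number;
      productId: number;                // universal -> plain columns
      minQuantity: number;
      unitPrice: number;
      customRules: CustomPricingRule[]; // customer-specific -> one jsonb column
    }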

fauigerzigerk 5 days ago | parent [-]

But if you have to model those custom pricing structures anyway, the question is what you gain by not reflecting them in the database schema.

There's no reason to put all those extra fields in the same table that contains the universal pricing information.

whizzter 3 days ago | parent [-]

A lot of unnecessary complexity/overhead for a minor, seldom-touched part of a much larger, already complex system?

I'll give a comparison.

JSON

- We have some frontend logic/view (that can be feature-flagged per customer) to manage updating the data, which otherwise mostly tags along as a dumb "blob" (auto-expanded to a regular part of the JSON object's maps/arrays at the API boundary, making frontend work easier: objects on the frontend, "blobs" on the backend/DB)

- Inspecting specific cases (most of the time it's just null data) is just copying out and formatting the special data.

- If push comes to shove, all modern databases support JSON queries so you can pick out specifics IF needed (this has happened once or twice with larger customers over the years).

- We read and apply the rules when calculating prices with a "plugin system" (a rough sketch follows the comparison below)

DB Schema (extra tables)

- Now you have to wade through lots of customer-specific tables just to find the tables that take most of the work time (customer specifics are seldom what needs work once set up). We already have some older customer-specific stuff from the early days (I'm happy it hasn't happened much lately).

- Those _very_ few times you actually need to inspect the specific data by query, you might win on this (but as mentioned above, JSON queries have always solved it).

- Loading the universal info now needs to query X extra tables (even when 90%-95% of the data has no special cases).

- Adding new operations on prices (copying, etc.) now needs logic for each customer-specific table to properly make the data tag along.

- "Properly" modelled, this reaches the API layer as well

- Frontend specialization is still needed

- Calculating prices still needs its customization.

I don't really see how my life would have been better managing all the extra side effects of bending the code to suit these weird customer requests (some from clients that aren't customers anymore) when 90-95% of the time it isn't used and is seldom touched with mature customers.
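
For completeness, a rough sketch of that "plugin system" step (rule names reused from the sketch further up, everything else invented): each rule kind registers a handler, and pricing folds the row's JSON rules over the base price.

    type CustomPricingRule =
      | { kind: "seasonal_rebate"; weekendsAfterChristmas: number; percent: number }
      | { kind: "related_customer_discount"; relatedToCustomerId: string; percent: number };

    // One handler per rule kind; an unhandled kind fails at the type level.
    const handlers: Record<CustomPricingRule["kind"],
                           (price: number, rule: CustomPricingRule) => number> = {
      seasonal_rebate: (price, rule) => price * (1 - rule.percent / 100),
      related_customer_discount: (price, rule) => price * (1 - rule.percent / 100),
    };

    // Fold the row's JSON rules (if any) over the universally-priced base.
    function applyCustomRules(basePrice: number, rules: CustomPricingRule[]): number {
      return rules.reduce((price, rule) => handlers[rule.kind](price, rule), basePrice);
    }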

I do believe in the rule of three: if the same thing pops up three times, I consider whether it needs to be graduated to more "systematic" code. Often, when you abstract after seeing something even twice, it never appears again, leaving you with an abstraction to maintain.

JSON columns, like entity-attribute-value tables or goto statements, all have real downsides and shouldn't be plonked in without a reason, but it would be hell to have to work with overly complex schemas/models because people started putting special cases into core pieces of code just because they heard that a technique was bad.

konart 6 days ago | parent | prev | next [-]

Normalisation brings its own overhead though.
