vjerancrnjak 4 days ago

Just have 1 input type and 1 output type. You don’t need more data types in between.

If pydantic packages valid input, use that for as long as you can.

Loading stuff from db, you need validation again, either go from binary response to 1 validated type with pydantic, or ORM object that already validates.

Then stop having any extra data types.

Keeping pydantic only at the edge and then abandoning it by reshaping it into another data type is a weird exercise. It might make sense if you have N input types and 1 computation flow but I don’t see how in the world of duck typing you’d need an extra unified data type for that.
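
A minimal sketch of the "1 input type, 1 output type" idea. It uses frozen stdlib dataclasses as a stand-in for pydantic models so it runs without dependencies; the `SignupInput`, `SignupOutput` and `handle` names and their fields are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignupInput:
    email: str
    age: int

    def __post_init__(self):
        # Validate once, at the edge. After this, the object is trusted.
        if "@" not in self.email:
            raise ValueError(f"invalid email: {self.email!r}")
        if self.age < 0:
            raise ValueError(f"invalid age: {self.age!r}")


@dataclass(frozen=True)
class SignupOutput:
    email: str
    is_adult: bool


def handle(signup: SignupInput) -> SignupOutput:
    # The validated input is used directly; no intermediate type in between.
    return SignupOutput(email=signup.email.lower(), is_adult=signup.age >= 18)


result = handle(SignupInput(email="Ada@Example.com", age=36))
```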

sgarland 4 days ago | parent [-]

> Loading stuff from db, you need validation again, either go from binary response to 1 validated type with pydantic, or ORM object that already validates.

You shouldn’t need to validate data coming from the database. IMO, this is a natural consequence of teams abandoning traditional RDBMS best practices like normalization and constraints in favor of heavy denormalization, and strings for everything.

If you strictly follow 3NF (or higher, when necessary) and declare foreign key constraints, it is literally impossible to have referential integrity violations. There may be some other edge cases that can be difficult to enforce, but a huge variety of data bugs simply don’t exist if you don’t treat the RDBMS as a dumb KV store.
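
A minimal sketch of what constraints buy you, using Python's stdlib `sqlite3`. The `customers`/`orders` schema and the `try_insert` helper are hypothetical, and SQLite is only standing in for whatever RDBMS you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    status      TEXT NOT NULL CHECK (status IN ('open', 'paid', 'void')),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
INSERT INTO customers (id, name) VALUES (1, 'Ada');
""")


def try_insert(customer_id, status, total_cents):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, status, total_cents) VALUES (?, ?, ?)",
            (customer_id, status, total_cents),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert(1, "paid", 500),     # valid row
    try_insert(99, "paid", 500),    # no such customer: foreign key violation
    try_insert(1, "shipped", 500),  # status outside the allowed set: CHECK violation
    try_insert(1, "paid", None),    # missing total: NOT NULL violation
]
```

Only the first insert succeeds; the bad rows never reach the table, so there is nothing for application code to re-validate on the way out.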

vjerancrnjak 3 days ago | parent [-]

Depends.

If you do a query that computes something, the output columns have data types that you’d like to validate.

Checking that you receive an int, string or enum is unavoidable. Even a JOIN might surprise you with null values.
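
A small sketch of the kind of surprise I mean, using stdlib `sqlite3` and a made-up `customers`/`orders` schema. Every column is declared `NOT NULL`, yet the computed column in the result can still be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 1, 500), (2, 1, 250);
""")

# Grace has no orders, so the LEFT JOIN yields a NULL total_cents for her,
# and SUM over only NULLs is NULL (None in Python), not 0.
rows = conn.execute("""
    SELECT c.name, SUM(o.total_cents) AS total_cents
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print(rows)  # [('Ada', 750), ('Grace', None)]
```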

sgarland 3 days ago | parent [-]

> Checking that you receive an int, string or enum is unavoidable.

How would you be unaware of the data type if you defined the schema? Also, an ENUM is returned as a string; it’s only stored internally as an integer.

> Even a JOIN might surprise you with null values.

If you have foreign key constraints, you should never be able to get into a situation where you’re surprised by a NULL from an OUTER JOIN. You can certainly still have NULLs, but they shouldn’t come as a surprise.

vjerancrnjak 3 days ago | parent [-]

You can name the groups, but there’s no schema for your GROUP BY response.

I can have a join on a cte or tmp table, and then I’m outside the usual checks.

You can also have old code running on new schema, so it’s better that it dies on load.
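
A rough sketch of "dies on load", assuming a hypothetical `OrderRow` type and a migration that changed `status` from text to an integer code while old code is still deployed. It uses a stdlib dataclass here; a pydantic model would play the same role.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OrderRow:
    id: int
    status: str
    total_cents: int

    def __post_init__(self):
        # Fail at load time if a column's runtime type is not what this code expects.
        for f in fields(self):
            value = getattr(self, f.name)
            if type(value) is not f.type:
                raise TypeError(
                    f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
                )


def load_orders(conn):
    cur = conn.execute("SELECT id, status, total_cents FROM orders")
    return [OrderRow(*row) for row in cur]


# The "new" schema: status is now an integer code, not a string.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    status      INTEGER NOT NULL,
    total_cents INTEGER NOT NULL
);
INSERT INTO orders VALUES (1, 2, 500);
""")

try:
    load_orders(conn)
    error = None
except TypeError as exc:
    error = str(exc)  # "status: expected str, got int"
```

The old code fails on the first row it reads instead of carrying an integer around as if it were a status string.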

sgarland 3 days ago | parent [-]

> join on a cte

I’ll grant you that this can generate NULLs in a variety of ways (implicit type conversion, for one), but I also think that those issues could be caught via linting if nothing else. I’ll also grant you that this is shifting the goalposts a bit.

> old code running on new schema

Yeah, this would be the primary offender. I was thinking of perfect schema:code coupling, without needing to worry about other people doing dastardly things, but that’s sadly unrealistic for many orgs.