MrJohz 3 days ago

I think the key part, although the author doesn't quite make it explicit, is that (a) the parsing happens all up front, rather than weaving validation and logic together, and (b) the parsing creates a new structure that encodes the invariants of the application, so that the rest of the application no longer needs to check anything.

Whether you do that with Zod or manually or whatever isn't important; what matters is having a preprocessing step that transforms the data and doesn't just validate it.
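
To make the distinction concrete, here is a minimal hand-rolled sketch of a parsing step that returns a new structure instead of just checking the old one. The names `RawArgs`, `Config`, `parseConfig` and `primaryAddress` are made up for illustration and are not taken from the article or from Zod.

```typescript
// Hypothetical raw input: everything is still a string, as it would arrive from a CLI or a form.
type RawArgs = { port?: string; hosts?: string };

// Hypothetical parsed type: the invariants live in the type itself.
type Config = {
  port: number; // integer in 1..65535, enforced by parseConfig
  hosts: [string, ...string[]]; // at least one host
};

function parseConfig(raw: RawArgs): Config {
  const port = Number(raw.port);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${raw.port}`);
  }
  const hosts = (raw.hosts ?? "")
    .split(",")
    .map((h) => h.trim())
    .filter((h) => h.length > 0);
  const [first, ...rest] = hosts;
  if (first === undefined) {
    throw new Error("at least one host is required");
  }
  // The result is a new value; callers never need to look at `raw` again.
  return { port, hosts: [first, ...rest] };
}

// Downstream code takes Config and has nothing left to check.
function primaryAddress(config: Config): string {
  return `${config.hosts[0]}:${config.port}`;
}
```

A validate-only version would return `boolean` and leave the rest of the program working with `RawArgs`, so every consumer would have to convert the port string again.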

1718627440 3 days ago

But when you parse all arguments first before throwing error messages, you can create much better error messages, since they can be more holistic. To do that you need to represent the invalid configuration as a type.
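
One way to read this, sketched under assumptions: the possibly-invalid configuration gets its own type (`RawArgs` here, a hypothetical name), and the parser walks all of it, accumulating errors before deciding what to return. The flag names and the default of 3 retries are invented for the example.

```typescript
// Hypothetical raw shape: the possibly-invalid configuration, represented as its own type.
type RawArgs = { host?: string; port?: string; retries?: string };

// Validated shape that the rest of the program would use.
type Config = { host: string; port: number; retries: number };

type ParseResult =
  | { ok: true; config: Config }
  | { ok: false; errors: string[] };

function parseArgs(raw: RawArgs): ParseResult {
  const errors: string[] = [];

  const host = (raw.host ?? "").trim();
  if (host === "") errors.push("--host is required");

  const port = Number(raw.port);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    errors.push(`--port must be an integer in 1..65535, got ${raw.port ?? "(missing)"}`);
  }

  // Retries are optional and default to 3 (an arbitrary choice for this sketch).
  const retries = raw.retries === undefined ? 3 : Number(raw.retries);
  if (!Number.isInteger(retries) || retries < 0) {
    errors.push(`--retries must be a non-negative integer, got ${raw.retries}`);
  }

  // Report every problem at once instead of stopping at the first one.
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, config: { host, port, retries } };
}
```

Because the whole raw structure is available while errors are collected, messages that span several fields could be added in the same place.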

geon 3 days ago

Sure. Then you return that validated data structure from the parsing function and never touch the invalid data structure again. That's exactly what "Parse, don't validate" means.

12_throw_away 3 days ago

> To do that you need to represent the invalid configuration as a type

Right - and one thing that keeps coming up for me is that, if you want to maintain complex invariants, it's quite natural to express them in terms of the domain object itself (or maybe, ugh, a DTO with the same fields), rather than in terms of input constraints.

makeitdouble 3 days ago

The base assumption is that parsing upfront costs less than validating along the way. I think that's a common case, but not common enough to apply it as a generic principle.

For instance, if validating parameter values requires multiple trips to a DB or other external system, weaving the calls into the logic can spare duplicating these round trips. Light "surface" validation can still be applied, but that's not what we're talking about here, I think.

MrJohz 3 days ago

It's not about costing less, it's about program structure. The goal should be to move from interface type (in this case a series of strings passed on the command line) to internal domain type (where we can use rich data types and enforce invariants like "if server, then all server properties are specified") as quickly as possible. That way, more of the application can be written to use those rich data types, avoiding errors or unnecessary defensive programming.

Even better, that conversion from interface type to internal type should ideally happen at one explicit point in the program - a function call which rejects all invalid inputs and returns a type that enforces the invariants we're interested in. That way, we have a clean boundary point between the outside world and the inside one.

This isn't a performance issue at all, it's closer to the "imperative shell, functional core" ideas about structuring your application and data.
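
As a rough sketch of that boundary, the "if server, then all server properties are specified" invariant can be encoded as a discriminated union, with a single function converting the command-line strings into it. The flag names (`--mode`, `--target`, `--bind`, `--port`) and the functions `parseArgv` and `describe` are hypothetical.

```typescript
// Internal domain type: a server config cannot exist without its server properties.
type Config =
  | { mode: "client"; target: string }
  | { mode: "server"; bindAddress: string; port: number };

// The single boundary function: interface type (strings) in, domain type out.
function parseArgv(argv: string[]): Config {
  if (argv.length % 2 !== 0) throw new Error("expected --flag value pairs");
  const flags = new Map<string, string>();
  for (let i = 0; i < argv.length; i += 2) {
    const key = argv[i];
    const value = argv[i + 1];
    if (key === undefined || value === undefined || !key.startsWith("--")) {
      throw new Error(`malformed argument near position ${i}`);
    }
    flags.set(key.slice(2), value);
  }

  const required = (name: string): string => {
    const v = flags.get(name);
    if (v === undefined) throw new Error(`--${name} is required`);
    return v;
  };

  const mode = required("mode");
  switch (mode) {
    case "client":
      return { mode: "client", target: required("target") };
    case "server": {
      const port = Number(required("port"));
      if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error("--port must be an integer in 1..65535");
      }
      return { mode: "server", bindAddress: required("bind"), port };
    }
    default:
      throw new Error(`unknown mode: ${mode}`);
  }
}

// Inside the boundary: no defensive checks, the compiler knows which fields exist.
function describe(config: Config): string {
  switch (config.mode) {
    case "client":
      return `connect to ${config.target}`;
    case "server":
      return `listen on ${config.bindAddress}:${config.port}`;
  }
}
```

Everything after `parseArgv` only ever sees `Config`, so a server without a port is not something the rest of the program can be handed.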

lmm 3 days ago

> if validating parameter values requires multiple trips to a DB or other external system, weaving the calls in the logic can spare duplicating these round trips

Sure, but probably at the cost of leaving everything in a horribly inconsistent state when you error out partway through. Which is almost always not worth it.
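
A toy illustration of that failure mode, with an in-memory object standing in for the external system. All names here (`Store`, `applyWoven`, `applyParsed`, and so on) are invented for the sketch, and it ignores concurrency and real transactions.

```typescript
type RawTransfer = { from: string; to: string; amount: number };
type Transfer = RawTransfer & { readonly verified: true };

// Hypothetical in-memory stand-in for a DB or other external system.
type Store = { accounts: Set<string>; ledger: Transfer[] };

function verify(store: Store, t: RawTransfer): Transfer {
  if (!store.accounts.has(t.from) || !store.accounts.has(t.to) || !(t.amount > 0)) {
    throw new Error(`invalid transfer ${t.from} -> ${t.to}`);
  }
  return { ...t, verified: true };
}

// Woven: each item is checked right before it is applied.
// If item N is invalid, items 0..N-1 have already been written.
function applyWoven(store: Store, raw: RawTransfer[]): void {
  for (const t of raw) {
    store.ledger.push(verify(store, t));
  }
}

// Up front: every item is checked before anything is written.
function applyParsed(store: Store, raw: RawTransfer[]): void {
  const parsed = raw.map((t) => verify(store, t));
  store.ledger.push(...parsed);
}
```

With a batch whose second transfer names an unknown account, `applyWoven` throws after the first transfer is already in the ledger, while `applyParsed` throws with the ledger untouched.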