Remix.run Logo
lock1 3 days ago

When I first saw "Parse, don't validate" title, it struck me as a catchy but perhaps unnecessarily clever catchphrase. It's catchy, yes, but it felt too ambiguous to be meaningful for anyone outside of the target audience (Haskellers in this case).

That said, I fully agree with the article content itself. It basically just boils down to:

When you create a program, eventually you'll need to process & check whether input data is valid or not. In C-like language, you have 2 options

  void validate(struct Data d);
or

  struct ValidatedData;
  ValidatedData validate(struct Data d);
"Parse, don't validate" is just trying to say don't do `void validate(struct Data d)` (procedure with `void`), but do `ValidatedData validate(struct Data d)` (function returning `ValidatedData`) instead.

It doesn't mean you need to explicitly create or name everything as a "parser". It also doesn't mean "don't validate" either; in `ValidatedData validate(struct Data d)` you'll eventually have "validation" logic similar to the procedure `void` counterpart.

Specifically, the article tries to teach folks to utilize the type system to their advantage. Rather than praying to never forget invoking `validate(d)` on every single call site, make the type signature only accept `ValidatedData` type so the compiler will complain loudly if future maintainers try to shove `Data` type to it. This strategy offloads the mental burden of remembering things from the dev to the compiler.

I'm not exactly sure why the "Parse, don't validate" catchphrase keeps getting reused in other language communities. It's not clear to non-FP community what the distinction between "parser" and "validate", let alone "parser combinator". Yet somehow other articles keep reusing this same catchphrase.

Lvl999Noob 3 days ago | parent | next [-]

The difference, in my opinion, is that you received the cli args in the form

``` some_cli <some args> --some-option --no-some-option ```

Before parsing, the argument array contains both the flags to enable and disable the option. Validation would either throw an error or accept it as either enabled or disabled. But importantly, it wouldn't change the arguments. If the assumption is that the last option overwrites anything before it then the cli command is valid with the option disabled.

And now, correct behaviour relies on all the code using that option to always make the same assumption.

Parsing, on the other hand, would put create a new config where `option` is an enum - either enabled or disabled or not given. No confusion about multiple flags or anything. It provides a single view for the rest of the program of what the input config was.

Whether that parsing is done by a third party library or first party code, declaratively or imperatively, is besides the point.

andreygrehov 3 days ago | parent | prev [-]

What is ValidatedData? A subset of the Data that is valid? This makes no sense to me. The way I see it is you use ‘validate’ when the format of the data you are validating is the exact same format you are gonna be working with right after, meaning the return type doesn’t matter. The return type implies transformation – a write operation per se, whereas validation is always a read operation only.

lock1 3 days ago | parent [-]

  > What is ValidatedData? A subset of the Data that is valid?
Usually, but not necessarily. `validate()` might add some additional information too, for example: `validationTime`.

More often than not, in a real case of applying algebraic data type & "Parse, don't validate", it's something like `Option<ValidatedData>` or `Result<ValidatedData,PossibleValidationError>`, borrowing Rust's names. `Option` & `Result` expand the possible return values that function can return to cover the possibility of failure in the validation process, but it's independent from possible values that `ValidatedData` itself can contain.

  > The way I see it is you use ‘validate’ when the format of the data you are validating is the exact same format you are gonna be working with right after, meaning the return type doesn’t matter.
The main point of "Parse, don't validate" is to distinguish between "machine-level data representation" vs "possible set of values" of a type and utilize this "possible set of values" property.

Your "the exact same format" point is correct; oftentimes, the underlying data representation of a type is exactly the same between pre- & post-validation. But more often than not "possible set of values" of `ValidatedData` is a subset of `Data`. These 2 different "possible set of values" are given their own names in the form of a type `Data` and `ValidatedData`.

This distinction is actually very handy because types can be checked automatically by the (nominal) type system. If you make the `ValidatedData` constructor private & the only way to produce is function `ValidatedData validate(Data)`, then in any part of the codebase, there's no way any `ValidatedData` instance is malformed (assuming `validate` doesn't have bugs).

Extra note: I forgot to mention the "Parse, don't validate" article implicitly implies a nominal type system, where 2 objects with equivalent "data representation" doesn't mean it has the same type. This differs from Typescript's structural type system, where as long as the "data representation" is the same, both object are considered to have the same type.

Typescript will happily accept something like this because of structural

  type T1 = { x: String };
  type T2 = { x: String };
  function f(T1): void { ... }
  const t2: T2 = { x: "foo" };
  f(t2);
While nominal type systems like Haskell or Java will reject such expressions

  class T1 { String x; }
  class T2 { String x; }
  void f(T1) { ... }
  // f(new T2()); // Compile error: type mismatch
Because of this, the idea of using type as a "possible set of values" probably felt unintuitive to Typescript folks, as everything is just stringly-typed and different type felt synonymous with different "underlying data representation" there.

You can simulate this "same structure, but different meaning" concept of nominal type system in Typescript with some hacky workaround with Symbol.

  > The return type implies transformation – a write operation per se, whereas validation is always a read operation only
Why does the return type need to imply transformation and why is "validation" here always read-only? No-op function will return the exact same value you give it (in other words, identity transformation), and Java & Javascript procedures never guarantee a read-only operation.