BiteCode_dev 4 days ago

Because they don't represent the same thing. Pydantic models represent your input: they are the shape of the experience you expose to the outside world, and therefore come with objectives and constraints matching this:

- make it easy to provide

- make it simple to understand

- make it familiar

- deal with security and authentication

- be easily serializable through your communication layer

On the other hand, internal representations exist to help you with your private calculations:

- make it performant

- make it work with different subsystems such as persistence, caching, queuing

- provide convenience shortcuts or precalculations for your own benefit
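As a sketch of that split (the `OrderIn`/`Order` names and fields are invented for illustration, assuming Pydantic v2): the Pydantic model carries the validated, serializable outside shape, while a plain dataclass carries the internal precalculations.

```python
from dataclasses import dataclass
from pydantic import BaseModel

# External representation: easy to provide, familiar, serializable.
class OrderIn(BaseModel):
    sku: str
    quantity: int

# Internal representation: enriched with data the outside world
# never sends, plus convenience precalculations.
@dataclass
class Order:
    sku: str
    quantity: int
    unit_price_cents: int

    @property
    def total_cents(self) -> int:
        return self.quantity * self.unit_price_cents

def to_internal(payload: OrderIn, unit_price_cents: int) -> Order:
    # The boundary is crossed exactly once, here.
    return Order(sku=payload.sku, quantity=payload.quantity,
                 unit_price_cents=unit_price_cents)
```

The translation function is the only place that knows about both representations, so each side can evolve on its own.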

Sometimes they overlap, or the system is not big enough for it to matter.

But the bigger or older the system gets, the less likely they are to overlap.

However, I often pass around pydantic objects if I have them, and I do this until it becomes a problem. And I rarely reach that point.

It's like using Python until you have performance problems.

Practicality beats premature optimization.

JackSlateur 4 days ago | parent [-]

My pydantic models represent a "Thing" (a concept or whatever), not an input

You can translate many things into a Thing; model_validate will help you with that (with validation context, etc.)

You can translate your Thing into multiple output formats, with model_dump and model_dump_json

In your model, you put every check required to ensure that some input is, indeed, a Thing

And from there, you can use this object everywhere, certain that it is, indeed, a Thing, and that it has all the properties that make a thing a Thing
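A minimal sketch of that philosophy (the Thing fields and the validator are made up for illustration, assuming Pydantic v2): the checks live on the model, model_validate turns many shapes of input into a Thing, and model_dump / model_dump_json turn it back into multiple output formats.

```python
from pydantic import BaseModel, field_validator

class Thing(BaseModel):
    name: str
    quantity: int

    @field_validator("name")
    @classmethod
    def name_is_not_blank(cls, value: str) -> str:
        # One of the checks that make a thing a Thing.
        if not value.strip():
            raise ValueError("name must not be blank")
        return value

# Many inputs can be turned into a Thing ("3" is coerced to 3)...
thing = Thing.model_validate({"name": "widget", "quantity": "3"})

# ...and a Thing can be turned into multiple output formats.
print(thing.model_dump())       # a plain dict
print(thing.model_dump_json())  # a JSON string
```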

BiteCode_dev 4 days ago | parent [-]

You can certainly do it, but since serialization and validation are the main benefits of using Pydantic, I/O is why it exists.

Outside of I/O, the whole machinery has little use. And since Pydantic models are introspected to build APIs, automatic deserializers and argument parsing, making them fit the I/O is where the money is.

Also, remember that despite all of Pydantic's recent performance improvements, its models are still more expensive than dataclasses, which are themselves more expensive than plain classes. They are about 8 times more expensive to instantiate than regular classes, but above all, attribute access is 50% slower.
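The exact multipliers depend on the Python and Pydantic versions, so they are worth measuring locally; a stdlib-only sketch comparing plain-class and dataclass instantiation with timeit (a Pydantic model could be added to the comparison the same way, if installed):

```python
import timeit
from dataclasses import dataclass

class Plain:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

@dataclass
class Data:
    x: int
    y: int

# Time 100k instantiations of each; a Pydantic model, which also
# validates and coerces its fields, would be slower still.
t_plain = timeit.timeit(lambda: Plain(1, 2), number=100_000)
t_data = timeit.timeit(lambda: Data(1, 2), number=100_000)
print(f"plain class: {t_plain:.4f}s, dataclass: {t_data:.4f}s")
```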

Now I get that in Python this is not a primary concern, but still, pydantic is not a free lunch.

I'd say it's also important to state what it conveys. When I see a Pydantic object, I expect some I/O somewhere. Breaking this expectation would take me by surprise and lower my trust in the rest of the code. Unless you are deep into defensive programming, there is no reason to validate input far from the boundaries of the program.

JackSlateur 4 days ago | parent [-]

This is true, there is a performance cost

Apart from what has been said, I find pydantic interesting even in the middle of my code: it can be seen as an overpowered assert

It helps make sure that the complex data structure returned by that method is valid (for instance)

duncanfwalker 3 days ago | parent | next [-]

Yeah, I'd agree with that. Validation rules are like an extension of the type system. Invariants are useful at the edges of a system, but also in the core. If, for example, I can be told that a list is non-empty, then I can write cleaner code to handle it.
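One stdlib-only way to encode such an invariant is to check it once and then carry the fact in the type (the `NonEmptyList` name and helpers here are hypothetical):

```python
from typing import NewType

NonEmptyList = NewType("NonEmptyList", list)

def parse_non_empty(items: list) -> NonEmptyList:
    # The only way to obtain a NonEmptyList is through this check.
    if not items:
        raise ValueError("list must not be empty")
    return NonEmptyList(items)

def first(items: NonEmptyList):
    # No empty-case branch needed: the type already guarantees it.
    return items[0]

print(first(parse_non_empty([1, 2, 3])))  # 1
```

Core code that takes a `NonEmptyList` never has to re-check emptiness; the type checker tracks the guarantee.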

In Java they got around the external-dependency-in-the-core-model problem by making the JSR-380 specification that could (even if only in theory) have multiple implementations.

In Clojure you don't need to worry about another external dependency because the spec library is built in. One could argue that it's still a dependency even if it comes from the standard library. At that point I'd say: why are we worried about this? It's to isolate our core from unnecessary reasons to change.

I get that in principled terms it's not right, but if those libraries change their API on a similar cadence to the programming language syntax, then it doesn't matter in practical terms. It's this kind of pragmatic compromise that distinguishes Python from Java; after all, 'worse is better'.

codethief 3 days ago | parent | prev [-]

You could also use a TypedDict for that, though?

PEP 764[0] will make them extra convenient.

[0]: https://peps.python.org/pep-0764/

JackSlateur 3 days ago | parent [-]

Typing is declarative

In the end, it ensures nothing and accepts everything

  $ cat test.py
  from typing import TypedDict

  class MyDict(TypedDict):
      some_bool: bool

  print(MyDict(some_bool="test", unknown_field="blabla"))

  $ python test.py
  {'some_bool': 'test', 'unknown_field': 'blabla'}
codethief 3 days ago | parent [-]

Rolls eyes… Of course you need to use a type checker. Who doesn't these days?

JackSlateur 3 days ago | parent [-]

Of course, I need runtime validation

codethief a day ago | parent [-]

Sure, runtime validation is useful – at the boundaries of your domain! After that your type checker should ensure your data has the shape your code expects.

In other words: Parse, don't validate. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
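A minimal stdlib sketch of that idea (the `User` shape and `parse_user` name are invented for illustration): validation happens once, in a parsing function at the boundary, and everything downstream receives the typed object, never a raw dict.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    age: int

def parse_user(raw: str) -> User:
    # Validate once, at the boundary; reject bad data immediately.
    data = json.loads(raw)
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError(f"not a valid User: {raw!r}")
    return User(name=data["name"], age=data["age"])

# Past this point, the type checker guarantees the shape:
user = parse_user('{"name": "Ada", "age": 36}')
print(user)
```

Replacing the hand-written checks with a Pydantic model at the same boundary gives the same guarantee with less code; the point is where the parsing happens, not which library does it.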