| ▲ | lysecret 4 days ago |
The core thesis is that the types received by the API should not be the same as the types you process internally. I can see situations where this makes sense and situations where it senselessly duplicates everything. The blog post shows how to do it but never really dives into why or when.
|
| ▲ | jon-wood 4 days ago | parent | next [-] |
I’ve not done this in Python, where mercifully I don’t really touch CRUD-style web apps anymore, but when I was doing Ruby web development we settled on similar patterns. The biggest benefit is much more flexibility around validation when the input model (Pydantic here) isn’t the same as the database model. The canonical example is something like a user, where the validation rules vary depending on context: at signup you might be creating a new stub user, where only a username and password are required, but you also want a password confirmation. At a different point you’re updating the user’s profile, and in that case a bunch of fields might be required but password isn’t one of them and the username can’t be changed. By having distinct input models you make all of that much easier to reason about than having a single model which represents the database record, but also the input form, and has a bunch of flags on it to indicate which context you’re talking about. |
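A minimal sketch of those two contexts as separate input models (field names hypothetical, assuming Pydantic v2):

    from pydantic import BaseModel, model_validator

    class UserSignup(BaseModel):
        # signup context: only these fields, plus a confirmation check
        username: str
        password: str
        password_confirmation: str

        @model_validator(mode="after")
        def passwords_match(self):
            if self.password != self.password_confirmation:
                raise ValueError("passwords do not match")
            return self

    class UserProfileUpdate(BaseModel):
        # profile context: username is deliberately absent (it can't be
        # changed here) and password is not required in this context
        display_name: str
        bio: str

Each model carries only the rules that apply in its own context, so neither needs mode flags.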
| |
| ▲ | Groxx 3 days ago | parent | next [-] | | I've also generally found that separating the types passively reminds people that they are not forced to keep those types the same. Whenever I've been in codebases with externally-controlled types as their internal types, almost every single design that goes into the project is based around those types and whatever they efficiently model. It leads to much worse API design, both externally and internally, because it's based on what they have rather than what they want. | |
| ▲ | nvader 4 days ago | parent | prev | next [-] | | I'm with you. But what wasn't sufficiently justified in the article is why both sides of that divide, the canonical User and the User stubs, could not both be Pydantic models. | | |
| ▲ | nine_k 4 days ago | parent [-] | | The idea, as far as I was able to understand it, is that you want your core models as dependency-free as possible. If you, for whatever reason, were to drop Pydantic, that would only affect the way you validate inputs from the API, and nothing deeper. This wasn't mentioned, but the constant validation on construction also costs something. Sometimes it's a cost you're willing to pay (again, dealing with external inputs), sometimes it's extraneous because e.g. a typechecker would suffice to catch discrepancies at build time. | | |
| ▲ | erikvdven 3 days ago | parent [-] | | Exactly. I love the comments by the way! I never expected this would take off like this. The fact that this isn’t clear in the article is excellent feedback, and I'll take it into account when revising it. After a few hours of writing, it's easy to forget to convey the real message clearly. But you are absolutely right. To add a little:
In practice, if a third-party library hardly ever changes and makes life dramatically easier, you can consciously decide to accept the coupling in your domain, but that should be the exception, not the rule. Pydantic is great at turning large, nested dictionaries into validated objects, yet none of that power solves a domain problem. Inside the domain you only need plain data and behaviour: pure dataclasses already give you that without extra baggage. And that's the main reason to leave it out. The less your domain knows about the outside world, the less often you have to touch it when the outside world moves, and the easier it becomes for any new team member to pick up the logic: no extra mental model, no hidden framework magic, just the business concepts in plain Python. And exactly what you mentioned: if you ever want to drop Pydantic, you don't need to touch the domain. The less you have to touch, the easier it is to replace. So the guideline is simple: dependencies point inward. Keep the domain free of third-party imports, and let Pydantic stay where it belongs, in the outside layers. |
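A rough illustration of dependencies pointing inward (module layout and names are hypothetical): the domain stays a plain dataclass, and the Pydantic model sits in the outer layer and converts inward.

    from dataclasses import dataclass
    from pydantic import BaseModel

    # domain/user.py -- plain Python, no third-party imports
    @dataclass
    class User:
        username: str
        email: str

    # api/schemas.py -- Pydantic stays at the boundary
    class UserIn(BaseModel):
        username: str
        email: str

        def to_domain(self) -> User:
            return User(username=self.username, email=self.email)

Dropping Pydantic would then only mean rewriting the boundary layer; `User` itself never changes.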
|
| |
| ▲ | mattmanser 3 days ago | parent | prev | next [-] | | It's a pattern that rapidly leads to tons of DTOs that endlessly repeat exactly the same properties. Your example doesn't even justify its use: in that scenario the small form is actually a completely different object from the User object, a UserSignup. That's both conceptually and practically different from an actual User. The worst pattern is when programmers combine these useless DTOs with some sort of auto-mapper, which results in huge globs of boilerplate making any trivial change to data definitions a multi-file job. The worst one I've seen was when to add one property I had to edit 40 files. I get why people do it, but if you make it a pattern it's a massive drag on development velocity. It's anti-patterns like that which give statically typed languages a bad name. You should really only use it when you really, really need to. | |
| ▲ | coredog64 3 days ago | parent | prev [-] | | This sounds like Model-View-ViewModel (MVVM): Model is your domain object, but you can have many different ViewModels of it depending on what you're attempting to do. |
|
|
| ▲ | NeutralCrane 4 days ago | parent | prev | next [-] |
> The core thesis is that the types received by the API should not be the same as the types you process internally. Is it? I read the blog a couple of times and was never able to divine any kind of thesis beyond the title; as you said, the content never actually explains why. Perhaps there is a reason, but I didn’t walk away from the post with it. |
| |
| ▲ | lyu07282 4 days ago | parent | next [-] | | It's confusing to ask that, because that's a different subject unrelated to Pydantic or Python. That's just what you are supposed to do in "clean architecture"/DDD; you can ask the same question in Java or whatever. | |
| ▲ | causal 4 days ago | parent | prev [-] | | Yeah, the title implies a why, but this is really just about how. |
|
|
| ▲ | tetha 4 days ago | parent | prev | next [-] |
It does touch on what I was thinking as well at the end of the first section: usually this makes sense if your application has to manage a lot of complexity, or rather, has to consume and produce the same domain objects in many different ways across many different APIs. For example, some systems interact with several different vendor, tracking and payment systems that are all kinda the same, but also kinda different. Here it makes sense to have an internal domain model and to normalize all of these other systems into your domain model at a very early level; otherwise complexity rises very, very quickly due to n things interacting with n other things. On the other hand, for a lot of our smaller and simpler systems that output JSON based off a database for other systems... it's a realistic question whether maintaining the domain model and API translation for every endpoint in every change is actually less work than ripping out the API modelling framework, which occurs once every few years, if at all. Some teams would probably rewrite from scratch with new knowledge, especially if they have API tests available. |
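A sketch of that early normalization step (the vendor payload shapes are invented for illustration):

    from dataclasses import dataclass

    # internal domain model -- the only shape the rest of the system sees
    @dataclass
    class Payment:
        amount_cents: int
        currency: str

    # one small adapter per vendor system
    def from_vendor_a(raw: dict) -> Payment:
        return Payment(amount_cents=raw["amt"], currency=raw["cur"])

    def from_vendor_b(raw: dict) -> Payment:
        return Payment(amount_cents=int(raw["amount"] * 100),
                       currency=raw["currency_code"])

New vendors add one adapter instead of touching every consumer, which is where the n-by-n complexity gets cut.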
| |
| ▲ | AlphaSite 3 days ago | parent [-] | | I'd say where it's more important is when you need to manage database performance. This lets you design an API that's pleasant for users and well normalised internally, while also performing well. Usually normalisation and performance lead to a poor API that's hard for users to use and hard to evolve, since you're so tightly coupled to your external representation. |
|
|
| ▲ | skissane 4 days ago | parent | prev | next [-] |
I used to work on a Java app where we did this… we had a layer of POJO value classes, a layer of ORM objects… both written by hand… plus for every entity a hand-written mapper which translated between the two… and then sometimes we even had a third layer of classes generated from Swagger specs, and yet another set of mappers to map between the Swagger classes and the value POJOs. Now I mainly do Python and I don’t see that kind of boilerplate duplication anywhere near as much as I used to. Not going to say the same kind of thing never happens in Python, but the frequency sure seems to have declined a lot; often you get a smattering of it in a big Python project rather than it having been done absolutely everywhere. |
| |
| ▲ | CharlieDigital 4 days ago | parent [-] | | I think this depends in principle on what you're building. Take an API, for example. The thesis is simple: 1) A DTO is a projection or a view of a given entity.
2) The "domain entity" itself is a projection of the actual storage in a database table.
3) At different layers (vertical separation), the representation of this conceptual entity changes
4) In different entry/exit points (horizontal separation), the projection of the entity may also change.
In some cases, the domain entity can be used in different modules/routes and is projected to the API with different shapes -- fewer properties, more properties, transformed properties, etc. Typically, when code has a very well-defined domain layer and separation of the DTO and storage representation, the code has a very predictable quality, because if you are working with a `User` domain entity, it behaves consistently across all of your code and in different modules. Sometimes a developer intermixes a database `User` or a DTO `User` and all of a sudden the code behaves unpredictably; you suddenly have to be cognizant of whether the `user` instance you're handling is a `DBUser`, a `UserDTO`, or the domain entity. It has extra properties, missing properties, missing functions, can't be passed into some methods, etc. Does this matter? I think it depends on 1) the size of the team, 2) how much re-use of the modules is needed, 3) the nature of the service. For a small team, it's overkill. For a module that will be reused by many teams, it has long-term dividends. For a one-off, lightweight service, it probably doesn't matter. But for sure, for some core behaviors, having a delineated domain model really makes life easy when working with multiple teams reusing a module. I find that the code I've worked with over the years that I like has this quality. So if I'm responsible for writing some very core service or shared module, I will take the extra effort to separate my models -- even if it requires more duplication on my part, because it makes the code more predictable to use if everything inside the service expects only one specific shape and set of behaviors, projecting shapes outwards as needed for the use case (DTO and storage). |
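A compact sketch of those projections (names hypothetical): one domain entity, several DTO shapes projected outwards.

    from dataclasses import dataclass
    from pydantic import BaseModel

    # the domain entity: one consistent shape across all modules
    @dataclass
    class User:
        id: int
        email: str
        password_hash: str

    # horizontal separation: different routes project different shapes
    class PublicUserDTO(BaseModel):   # fewer properties
        id: int

    class AdminUserDTO(BaseModel):    # more properties, still no hash
        id: int
        email: str

Code inside the service only ever handles `User`, so there is no guessing about which variant you were passed.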
|
|
| ▲ | r9295 4 days ago | parent | prev | next [-] |
Personally, I think that's a good idea. Design patterns (Visitor or Builder, e.g.) naturally make sense once you encounter such a situation in your codebase; then they make almost complete sense. Otherwise, IMHO, it's just premature abstraction. |
| |
|
| ▲ | nyrikki 4 days ago | parent | prev | next [-] |
PO?O is just an object not bound by any restriction other than those forced by the language.[0] From the typing lens, it may be useful to consider it through Rice's theorem, and the oversimplification that typing converts a semantic property to a trivial property. (Damas-Hindley-Milner inference usually takes advantage of a pathological case; it is not formally trivial.) There are no hard and fast rules IMHO, because the Rice, Rice-Shapiro, and Kreisel-Lacombe-Shoenfield-Tseitin theorems are about generalized solutions, as with most undecidable problems. But Kreisel-Lacombe-Shoenfield-Tseitin deals with programs that are expected to HALT, yet it is still undecidable whether one fixed program is equivalent to another fixed program that always terminates. When you start stacking framework, domain, and language restrictions, the restrictions form a type of coupling, but as the decisions about integration vs disintegration are always tradeoffs, it will always be context specific. Combinators (maybe not the Y combinator) and finding normal forms are probably a better lens than my flawed attempt above. If you consider using PO?Os as the adapter part of the hex pattern, and notice how a service mesh is less impressive but often clearer in the hex form, it may help build intuitions about where the appropriate application of the author's suggestions may fit. But it really is primarily a decoupling of restrictions IMHO. Sometimes the tradeoffs go the other way, and often they change over time. [0] https://www.martinfowler.com/bliki/POJO.html |
|
| ▲ | BiteCode_dev 4 days ago | parent | prev | next [-] |
Because they don't represent the same thing. Pydantic models represent your input; they're the result of the experience you expose to the outside world, and therefore come with objectives and constraints matching this:

- make it easy to provide
- make it simple to understand
- make it familiar
- deal with security and authentication
- be easily serializable through your communication layer

On the other hand, internal representations have the goal of helping you with your private calculations:

- make it performant
- make it work with different subsystems such as persistence, caching, queuing
- provide convenience shortcuts or precalculations for your own benefit

Sometimes they overlap, or the system is not big enough that it matters. But the bigger or older the system gets, the less likely they will. However, I often pass around Pydantic objects if I have them, and I do this until it becomes a problem. And I rarely reach that point. It's like using Python until you have performance problems. Practicality beats premature optimization. |
| |
| ▲ | JackSlateur 4 days ago | parent [-] | | My pydantic models represent a "Thing" (a concept or whatever), not an input. You can translate many things into a Thing; model_validate will help you with that (with context info etc). You can translate your Thing into multiple output formats with model_dump. In your model, you shall put every check required to ensure that some input is, indeed, a Thing. And from there, you can use this object everywhere, certain that this is, indeed, a Thing, and that it has all the properties that make a thing a Thing. | | |
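In Pydantic v2 terms that flow might look like this (`Thing` and its fields are invented):

    from pydantic import BaseModel

    class Thing(BaseModel):
        name: str
        size: int

    # many inputs -> one Thing; context is available to validators
    thing = Thing.model_validate({"name": "box", "size": 3},
                                 context={"source": "api"})

    # one Thing -> multiple output formats
    as_dict = thing.model_dump()
    as_json = thing.model_dump_json()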
| ▲ | BiteCode_dev 4 days ago | parent [-] | | You can certainly do it, but since serialization and validation are the main benefits of using Pydantic, I/O is why it exists. Outside of I/O, the whole machinery has little use. And since Pydantic models are used by introspection to build APIs, automatic deserializers and arg parsing, making them fit the I/O is where the money is. Also, remember that despite all of Pydantic's recent performance improvements, its models are still more expensive than dataclasses, themselves more expensive than plain classes. They are 8 times more expensive to instantiate than regular classes, but above all, attribute access is 50% slower. Now I get that in Python this is not a primary concern, but still, Pydantic is not a free lunch. I'd also say it's important to state what it conveys. When I see a Pydantic object, I expect some I/O somewhere. Breaking this expectation would take me by surprise and lower my trust in the rest of the code. Unless you are deep into defensive programming, there is no reason to validate input far from the boundaries of the program. | |
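The cost is easy to measure yourself; a rough micro-benchmark sketch (exact ratios will vary by Pydantic version and machine):

    import timeit
    from dataclasses import dataclass
    from pydantic import BaseModel

    @dataclass
    class PlainPoint:
        x: int
        y: int

    class PydanticPoint(BaseModel):
        x: int
        y: int

    print(timeit.timeit(lambda: PlainPoint(x=1, y=2)))
    print(timeit.timeit(lambda: PydanticPoint(x=1, y=2)))  # validates on construction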
| ▲ | JackSlateur 4 days ago | parent [-] | | This is true, there is a performance cost. Apart from what has been said, I find Pydantic interesting even in the middle of my code: it can be seen as an overpowered assert. It helps make sure that the complex data structure returned by that method is valid (for instance). | |
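One way to express that overpowered assert in Pydantic v2 (the returned structure here is a stand-in):

    from pydantic import TypeAdapter

    def load_report() -> list:
        # stand-in for a method returning a complex structure
        return [{"clicks": 10, "views": 20}]

    rows = load_report()
    # raises ValidationError if the shape is not what we expect
    TypeAdapter(list[dict[str, int]]).validate_python(rows)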
| ▲ | duncanfwalker 3 days ago | parent | next [-] | | Yeah, I'd agree with that. Validation rules are like an extension to the type system. Invariants are useful at the edges of a system but also in the core. If, for example, I can be told that a list is non-empty, then I can write cleaner code to handle it. In Java they got around the external-dependency-in-the-core-model problem by making the JSR-380 specification, which could (even if only in theory) have multiple implementations. In Clojure you don't need to worry about another external dependency because the spec library is built in. One could argue that it's still a dependency even if it's coming from the standard library. At that point I'd ask: why are we worried about this? It's to isolate our core from unnecessary reasons to change. I get that in principled terms it's not right, but if those libraries change their API on a similar cadence to the programming language syntax, then it doesn't matter in practical terms. It's these kinds of pragmatic compromises that distinguish Python from Java; after all, 'worse is better'. | |
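For instance, a non-empty-list invariant can be carried in the model itself (a sketch, assuming Pydantic v2's `Field(min_length=...)` on collections):

    from pydantic import BaseModel, Field

    class Batch(BaseModel):
        items: list[str] = Field(min_length=1)  # invariant: never empty

    first = Batch(items=["a"]).items[0]  # safe by construction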
| ▲ | codethief 3 days ago | parent | prev [-] | | You could also use a TypedDict for that, though? PEP 764[0] will make them extra convenient. [0]: https://peps.python.org/pep-0764/ | | |
| ▲ | JackSlateur 3 days ago | parent [-] | | Typing is declarative. In the end, it ensures nothing and accepts everything:

    $ cat test.py
    from typing import TypedDict

    class MyDict(TypedDict):
        some_bool: bool

    print(MyDict(some_bool="test", unknown_field="blabla"))

    $ python test.py
    {'some_bool': 'test', 'unknown_field': 'blabla'}
| | |
| ▲ | codethief 3 days ago | parent [-] | | Rolls eyes… Of course you need to use a type checker. Who doesn't these days? | | |
|
|
|
|
|
|
|
| ▲ | senkora 4 days ago | parent | prev [-] |
| You should do it if and only if backwards compatibility is more important for your project than development velocity. If you have two layers of types, then it becomes much easier to ensure that the interface is stable over time. But the downside is that it will take longer to write and maintain the code. |
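As a sketch of how the second layer buys that stability (names hypothetical): the internal type can be refactored freely while the public field names stay fixed.

    from dataclasses import dataclass
    from pydantic import BaseModel

    @dataclass
    class User:
        full_name: str              # internal rename is free...

    class UserOut(BaseModel):
        name: str                   # ...because the public field stays stable

        @classmethod
        def from_domain(cls, u: User) -> "UserOut":
            return cls(name=u.full_name)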