Keep Pydantic out of your Domain Layer(coderik.nl)
92 points by erikvdven 6 days ago | 34 comments
lysecret 3 days ago | parent | next [-]

The core thesis is that the types received by the API should not be the same as the types you process internally. I can see situations where this makes sense and situations where it senselessly duplicates everything. The blog post shows how to do it but never really dives into why or when.

jon-wood 3 days ago | parent | next [-]

I’ve not done this in Python, where mercifully I don’t really touch CRUD style web apps anymore, but when I was doing Ruby web development we settled on similar patterns.

The biggest benefit you get is much more flexibility around validation when the input model (Pydantic here) isn't the same as the database model. The canonical example is something like a user, where the validation rules vary by context: at signup you might be creating a new stub user where only a username and password are required, but you also want a password confirmation. At a different point you're updating the user's profile; in that case a bunch of fields might be required, but password isn't one of them and the username can't be changed.

By having distinct input models you make that all much easier to reason about than having a single model which represents the database record, but also the input form, and has a bunch of flags on it to indicate which context you’re talking about.
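A minimal sketch of that idea with distinct Pydantic input models (the field names here are hypothetical, not from the article):

```python
from typing import Optional

from pydantic import BaseModel

class SignupForm(BaseModel):
    # Signup context: only credentials are required, plus a confirmation.
    username: str
    password: str
    password_confirmation: str

class ProfileUpdateForm(BaseModel):
    # Profile context: no password fields, and username is deliberately
    # absent because it cannot be changed after signup.
    display_name: str
    bio: Optional[str] = None
```

Each form validates exactly what its context requires, instead of one database-shaped model carrying mode flags.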

NeutralCrane 3 days ago | parent | prev | next [-]

> The core thesis is that the types received by the API should not be the same as the types you process internally.

Is it? I read the blog a couple of times and never was able to divine any kind of thesis beyond the title, but as you said, the content never actually explains why.

Perhaps there is a reason, but I didn’t walk away from the post with it.

tetha 3 days ago | parent | prev | next [-]

It does touch on what I was thinking as well at the end of the first section: Usually this makes sense if your application has to manage a lot of complexity, or rather, has to consume and produce the same domain objects in many different ways across many different APIs.

For example, some systems interact with several different vendor, tracking, and payment systems that are all kinda the same, but also kinda different. Here it makes sense to have an internal domain model and to normalize all of these other systems into your domain model at a very early level. Otherwise complexity rises very, very quickly, because you have n things interacting with m other things.
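As a rough sketch of normalising early (the vendor payload shapes are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Payment:
    # The internal domain model every vendor payload is normalised into.
    amount_cents: int
    currency: str

def from_vendor_a(payload: dict) -> Payment:
    # Vendor A reports integer minor units under its own key names.
    return Payment(amount_cents=payload["amountMinor"], currency=payload["ccy"])

def from_vendor_b(payload: dict) -> Payment:
    # Vendor B reports a float in major units.
    return Payment(amount_cents=round(payload["amount"] * 100),
                   currency=payload["currency"])
```

Everything past these adapters only ever sees `Payment`, so adding a third vendor touches one function instead of every consumer.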

On the other hand, for a lot of our smaller and simpler systems that output JSON based off a database for other systems... it's a fair question whether maintaining the domain model and API translation for every endpoint, on every change, is actually less work than ripping out the API modelling framework, which happens once every few years, if at all. Some teams would probably rewrite from scratch with their new knowledge anyway, especially if they have API tests available.

skissane 3 days ago | parent | prev | next [-]

I used to work on a Java app where we did this… we had a layer of POJO value classes, a layer of ORM objects… both written by hand… plus for every entity a hand-written mapper which translated between the two… and then sometimes we even had a third layer of classes generated from Swagger specs, and yet another set of mappers to map between the Swagger classes and the value POJOs

Now I mainly do Python and I don’t see that kind of boilerplate duplication anywhere near as much as I used to. Not going to say the same kind of thing never happens in Python, but the frequency of it sure seems to have declined a lot; often you get a smattering of it in a big Python project rather than it having been done absolutely everywhere.

r9295 3 days ago | parent | prev | next [-]

Personally, I think that's a good idea. Design patterns (e.g. Visitor, Builder) naturally make sense once you encounter the situation they solve in your codebase; at that point they make complete sense. Otherwise, IMHO, it's just premature abstraction.

nyrikki 3 days ago | parent | prev | next [-]

PO?O is just an object not bound by any restriction other than those forced by the language. [0]

From the typing lens, it may be useful to consider it from Rice's theorem, and the oversimplification that typing is converting a semantic property to a trivial property. (Damas-Hindley-Milner inference usually takes advantage of a pathological case; it is not formally trivial.)

There are no hard and fast rules IMHO, because the Rice, Rice-Shapiro, and Kreisel-Lacombe-Shoenfield-Tseitin theorems concern generalized solutions, as do most undecidable problems.

But Kreisel-Lacombe-Shoenfield-Tseitin deals with programs that are expected to halt, yet it is still undecidable whether one fixed program is equivalent to another fixed program that always terminates.

When you start stacking framework, domain, and language restrictions, the restrictions form a type of coupling; but as decisions about integration vs disintegration are always tradeoffs, it will always be context specific.

Combinators (maybe not the Y combinator) and finding normal forms is probably a better lens than my attempt at the flawed version above.

If you consider using PO?Os as the adapter part of the hex pattern, and notice how a service mesh is less impressive but often clearer in the hex form, it may help build intuition for where the author's suggestions are appropriate.

But it really is primarily decoupling of restrictions IMHO. Sometimes the tradeoffs go the other way and often they change over time.

[0] https://www.martinfowler.com/bliki/POJO.html

BiteCode_dev 3 days ago | parent | prev | next [-]

Because they don't represent the same thing. Pydantic models represent your input; they are the result of the experience you expose to the outside world, and therefore come with objectives and constraints to match:

- make it easy to provide

- make it simple to understand

- make it familiar

- deal with security and authentication

- be easily serializable through your communication layer

On the other hand, internal representations have the goal to help you with your private calculations:

- make it performant

- make it work with different subsystems such as persistence, caching, queuing

- provide convenience shortcuts or precalculations for your own benefits
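A small sketch of the split, assuming a hypothetical order endpoint: the edge model stays easy to provide and serialize, while the internal one carries a precalculated total for our own benefit.

```python
from dataclasses import dataclass

from pydantic import BaseModel

class OrderIn(BaseModel):
    # Edge model: easy to provide, understand, and serialize.
    item_prices_cents: list[int]

@dataclass
class Order:
    # Internal model: free to carry precalculations and shortcuts.
    item_prices_cents: list[int]
    total_cents: int

def to_domain(payload: OrderIn) -> Order:
    # The boundary mapper is the one place the two shapes meet.
    return Order(item_prices_cents=payload.item_prices_cents,
                 total_cents=sum(payload.item_prices_cents))
```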

Sometimes they overlap, or the system is not big enough that it matters.

But the bigger or older the system gets, the less likely they will.

However, I often pass around pydantic objects if I have them, and I do this until it becomes a problem. And I rarely reach that point.

It's like using Python until you have performance problems.

Practicality beats premature optimization.

senkora 3 days ago | parent | prev [-]

You should do it if and only if backwards compatibility is more important for your project than development velocity.

If you have two layers of types, then it becomes much easier to ensure that the interface is stable over time. But the downside is that it will take longer to write and maintain the code.
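For instance (names are hypothetical), an internal rename can stay invisible on the wire because the mapper absorbs it:

```python
from dataclasses import dataclass

from pydantic import BaseModel

class UserOut(BaseModel):
    # Public contract: this shape must stay stable for API consumers.
    full_name: str

@dataclass
class User:
    # Internal model: free to change, e.g. the name split into two fields.
    first_name: str
    last_name: str

def to_api(user: User) -> UserOut:
    # The mapper absorbs internal changes so the wire format never moves.
    return UserOut(full_name=f"{user.first_name} {user.last_name}")
```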

IshKebab 3 days ago | parent | prev | next [-]

This seems ridiculously over-complicated. This guy would love Java.

He doesn't even say why you should tediously duplicate everything instead of just using the Pydantic objects - just "You know you don’t want that"! No I don't.

The only reason I've heard is performance... but... you're using Python. You don't give a shit about performance.

barbazoo 3 days ago | parent | prev | next [-]

> But Pydantic is starting to creep into every layer, even your domain, and it starts to itch.

I can’t relate yet. Itch how? It doesn’t really go into what the problem is they’re solving.

NeutralForest 3 days ago | parent | prev | next [-]

What's the motivation for doing this? When does Pydantic in the domain model starts being an issue?

gostsamo 3 days ago | parent | prev | next [-]

I'm sure that the pydantic guys had a reason to rename .dict to .model_dump. This single change caused so much grief when upgrading to Pydantic 2. [1] The very idea of unnecessary breaking changes is a big reason not to over-rely on pydantic, tbh.

[1] We were using .dict to introduce pydantic into the mix of other entity schemes, and handling this change later was a significant pain in the neck. Some Python introspection mechanism that could facilitate deep object recasting might've been nice, if possible.
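One way to soften that particular breaking change is a small shim that calls whichever method exists; a sketch assuming Pydantic v1's `.dict()` and v2's `.model_dump()`:

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str

def dump(model: BaseModel) -> dict:
    # Pydantic v2 renamed .dict() to .model_dump(); calling whichever
    # exists keeps callers working across both major versions.
    method = getattr(model, "model_dump", None) or model.dict
    return method()

data = dump(Item(name="widget"))
```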

brap 3 days ago | parent | prev | next [-]

I’m far from being an experienced Pythonista, but one thing that really bugs me in Python (and other dynamic languages) is that when I accept an input of some type, like User, I have to wonder if it’s really a User. This is annoying throughout the codebase, not just the API layer. Especially when there are multiple contributors.

The argument against using API models internally is something I agree with but it’s a separate question.
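In plain Python the annotation alone guarantees nothing at runtime; a quick sketch of the only real enforcement available:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str

def greet(user: User) -> str:
    # The User annotation is not enforced at runtime, so anything can be
    # passed in; an explicit isinstance check is the only real guarantee.
    if not isinstance(user, User):
        raise TypeError(f"expected User, got {type(user).__name__}")
    return f"Hello, {user.name}"
```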

politelemon 3 days ago | parent | prev | next [-]

The reasoning given here is more academic than anything else; I'm not seeing any actual problem. Perhaps the post could show how this is bad in practice. Until then, I don't think this excessive duplication and layering is necessary, and it is more of a liability itself.

> That’s when concerns like loose coupling and separation of responsibilities start to matter more.

ripped_britches 3 days ago | parent | prev | next [-]

This person’s head would explode if they saw what we’re doing over here in TypeScript with structural typing. It would make things way too simple.

jmward01 3 days ago | parent | prev | next [-]

Strongly decoupling the API implementation from, well, the actual implementation is pretty key when you start to evolve an application. People often focus on 'the design', as if there were one perfect design for an application for its lifetime, when in reality it is about how easily the mass of code you have can change for the next feature/fix/change without turning into a hairball of code. That perfect initial design where the internal and external objects are exactly the same generally works well for 1.0, but not for 1.1 or 2.0, so strongly decoupling the API implementation is good general practice if you expect your code to keep evolving.

rtpg 3 days ago | parent | prev | next [-]

In the Django world I have gotten very frustrated at people rushing to go from DRF's serializers to Django Ninja + Pydantic.

You have way less in terms of tools to actually provide nice, straightforward APIs. I appreciate that Pydantic gives you type safety, but at some point the actual ease of writing correct code matters more than type safety.

Just real straightforward stuff around loading in user input becomes a whole song and dance, because Pydantic is an extremely basic validation thing… the hacks in DRF, like request contexts, are useful!

I’ve seen many projects do this and it feels like such a step back in offering simple-to-maintain APIs. Maybe I’m just biased cuz I “get” DRF (and did lose half a day recently to weird DRF behavior…)

dgan 3 days ago | parent | prev | next [-]

I have to confess, I use Protobufs for everything. They convert to pure Python (a la dataclasses), to JSON strings, and to binary strings, so I literally shove them everywhere: network, logic, disk.

BUT when doing heavy computation (C++, not Python!) don't forget to convert to plain vectors; Protobufs are horribly inefficient.

vjerancrnjak 3 days ago | parent | prev | next [-]

Just have 1 input type and 1 output type. You don’t need more data types in between.

If pydantic packages valid input, use that for as long as you can.

Loading stuff from the db, you need validation again: either go from the binary response to one validated type with pydantic, or use an ORM object that already validates.
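Reusing the same edge type on the way back in can be sketched like this (the row shape is invented; Pydantic re-validates and coerces on construction):

```python
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

# e.g. a dict produced by a DB cursor, where the id arrives as text.
row = {"id": "1", "name": "Ada"}
user = User(**row)  # validated and coerced again at this boundary
```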

Then stop having any extra data types.

Keeping pydantic only at the edge and then abandoning it by reshaping it into another data type is a weird exercise. It might make sense if you have N input types and 1 computation flow but I don’t see how in the world of duck typing you’d need an extra unified data type for that.

karolinepauls 3 days ago | parent | prev | next [-]

I'll go further and elsewhere at once: APIs should not present nested objects but normalised data. It lets clients easily lay out their display structure independently of API resource schemas, and it enables tricks like diffing between subsequent responses, pulling updates, or requesting new data by passing IDs and timestamps of already-known data, etc. API-normalised data obviously shouldn't correspond to DB-normalised data. Nested objects are superior only for use with jq.
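A sketch of the difference with invented payloads: the normalised form ships flat entity tables and references by ID instead of baking the display structure into the response.

```python
# Nested: the author is embedded where the server thinks it belongs.
nested = {"post": {"id": 1, "title": "Hello", "author": {"id": 7, "name": "Ada"}}}

# Normalised: flat tables keyed by ID; the client joins as it sees fit,
# and can diff or merge subsequent responses entity by entity.
normalised = {
    "posts": {"1": {"id": 1, "title": "Hello", "author_id": 7}},
    "users": {"7": {"id": 7, "name": "Ada"}},
}
```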

henning 3 days ago | parent | prev | next [-]

Oh boy, I love making adding a trivial nullable column take even more code and require even more tests and have even more places I forgot to update which results in a field being nullable somewhere.

And don't forget, you get to duplicate this shit on the frontend too.

And what is a modern app if we aren't doing event-driven microservice architecture? That won't scale!!!! So now I also have to worry about my Avro schema/Protobufs/whateverthefuck. But how does everyone else know about the schema? Avro schema registry! Otherwise we won't know what data is on the wire!

And so on and so on into infinity until I have to tell a PM that adding a column will take me 5 pull requests and 8 deploys amounting to several days of work.

Congratulations on making your own small contribution to a fucking ridiculous clown fiesta.

golly_ned 3 days ago | parent | prev | next [-]

I still don’t quite get the motivation for “don’t use pydantic except at border” — it sounds like it’s “you don’t need it”, which might be true. But then adds dacite to translate between pydantic at the border and python objects internally. What exactly is wrong with pydantic internally too?
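For context, what dacite does is roughly this dict-to-dataclass construction; here is a hand-rolled, flat-only stdlib approximation of `dacite.from_dict` (without its nested-type handling):

```python
from dataclasses import dataclass, fields

@dataclass
class User:
    name: str
    age: int

def from_mapping(cls, data: dict):
    # Rough approximation of what dacite.from_dict automates: keep only
    # the keys the dataclass declares and construct an instance.
    declared = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in declared})

user = from_mapping(User, {"name": "Ada", "age": 36, "extra": "ignored"})
```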

clickety_clack 3 days ago | parent | prev | next [-]

I use pyrsistent in the domain, and pydantic for tricky validation at the boundary. Pyrsistent is a pretty neat solution if you want immutable data structures, with some nice methods for working with nested records.

ac130kz 3 days ago | parent | prev | next [-]

An easier/moderate approach: make a proper base DTO model, which can be extended by validators such as Pydantic, while the db model in the domain is just whatever an ORM offers, or dataclasses.

talos_ 3 days ago | parent | prev | next [-]

You should check out the Python framework Litestar. It's an alternative to FastAPI that implements these ideas via its "Data Transfer Object" concept.

leoff 3 days ago | parent | prev | next [-]

>The less your core logic depends on specific tools or libraries, the easier it becomes to maintain, test, or even replace parts of your system without causing everything to break.

It seems like the author doesn't like depending on `pydantic` simply because it's a third-party dependency. To solve this they introduce another, more obscure, third-party dependency called `dacite`, which converts `pydantic` models to `dataclasses`.

It's more likely that `dacite` will break your application than that `pydantic`, a library used by millions of users in huge projects, ever will. Not to mention the complexity overhead introduced by this nonsense mapping.

Lucasoato 3 days ago | parent | prev | next [-]

Actually Pydantic could be extremely useful if used in conjunction with SQLAlchemy, check out the SQLModel library, from the very same creators of Pydantic.

3 days ago | parent | prev | next [-]
[deleted]
mindcrash 3 days ago | parent | prev | next [-]

And that's why it is key to differentiate in your architecture between Data Transfer Objects (DTOs) or Models on one hand, whose values can and actually must be validated when they come from the outside, and Domain Entities / Value Objects on the other, even though the DTO and the Domain Entity might look similar.
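A sketch of that distinction with invented names: the DTO validates untrusted wire input, while the value object enforces a domain invariant, even though the two look alike.

```python
from dataclasses import dataclass

from pydantic import BaseModel

class MoneyDTO(BaseModel):
    # DTO: validates untrusted values arriving from the outside.
    amount_cents: int
    currency: str

@dataclass(frozen=True)
class Money:
    # Value object: same shape, but guards a domain rule instead.
    amount_cents: int
    currency: str

    def __post_init__(self):
        if self.amount_cents < 0:
            raise ValueError("money cannot be negative")
```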

Thank me later.

stephantul 3 days ago | parent | prev | next [-]

I think this article misses the main point by focusing on removing pydantic. The main point is that you should convert external types as soon as possible to decouple them from the rest of your code. Whether this involves pydantic or something else is not really important I guess

nisten 3 days ago | parent | prev | next [-]

From the article:

"Why are there no laws requiring device manufacturers to open source all software and hardware for consumer devices no longer sold?"

I think it's because people (us here included) love to yap and argue about problems instead of just implementing them and iterating on solutions in an organized manner. A good way to go about it these days would be to forego the facade of civility and use your public name to publicly tell your politician to just fuck it, do it badly, and have a plan to UNfuck it after you fuck it up, until the fucking problem is fucking solved.

Same goes for UBI and other semi-infuriating issues that seem to (and probably do) have obvious solutions that we just don't try.

axpy906 3 days ago | parent | prev | next [-]

The trouble I have with Pydantic is that everything is immutable. There are use cases where I need mutability; it's not bad, but it is a trade-off.

throwaway7783 3 days ago | parent | prev [-]

Return of Java DTOs!