| ▲ | motorest 5 days ago |
| What a great article. It's always a treat to read this sort of take. I have some remarks, though. Taken from the article: > Avoid having five different services all write to the same table. Instead, have four of them send API requests (or emit events) to the first service, and keep the writing logic in that one service. This is not so cut-and-dried; the trade-offs are far from obvious or clearly acceptable. If the five services access the database directly, then you are designing a distributed system where the interface being consumed is the database, which you do not need to design or implement, which already supports authorization and access controls out of the box, and which gives you transactions and custom queries for free. On the other hand, if you design one service as a high-level interface over the database, then you need to implement and manage your own custom interface with your own custom access controls and constraints, and you need to design and implement your own handling of transactions and compensation strategies. And what exactly do you buy yourself? More failure modes and a higher microservices tax? Additionally, having five services accessing the same database is a code smell. Odds are that database fuses together two or three separate databases. This happens a lot, as most services grow by accretion, and adding one more table to an existing database meets far less resistance than proposing an entire new persistence service. And is it possible that those five separate services are actually just one or two services? |
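To make the transactions point concrete, here is a minimal sketch (the accounts table and the amounts are hypothetical). With direct DB access, a cross-row invariant is a single transaction; behind a per-service API, the same invariant needs a custom endpoint or a compensation strategy:

    -- Hypothetical accounts table; any consumer gets atomicity for free.
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;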
|
| ▲ | paffdragon 5 days ago | parent | next [-] |
> the interface being consumed is the database, which you do not need to design or implement You absolutely should design and implement it, exactly because it is now your interface. In fact, it will add more constraints to your design, because now you have different consumers, and potentially writers, all competing for the same resource with potentially different access patterns. Plus there is the maintenance overhead that comes with migrating such shared tables. And eventually you might have data in this table that only some of the services need, so now you have to implement views and access controls at the DB level. Ideally, if you have a chance to implement it, an API is cleaner and more flexible. The problem in most cases is simply business pushing for faster features, which often leads to quick hacks, including just giving another service direct access to some DB table, because the alternative would take more time, and we don't have time, we want features, now. But I agree with your thoughts in the last paragraph. It happens very often that people don't want to undertake the effort of a whole new design, or a redesign to match the evolving requirements, and instead just patch it by adding a new table to an existing DB, then another, ... |
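For instance, a minimal sketch, assuming PostgreSQL (the orders table and the reporting_svc role are made-up names):

    -- Expose only the columns other services need, via a view.
    CREATE VIEW orders_shared AS
        SELECT id, status, created_at  -- omits columns only the owner needs
        FROM orders;

    REVOKE ALL ON orders FROM reporting_svc;
    GRANT SELECT ON orders_shared TO reporting_svc;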
| |
| ▲ | hamandcheese 5 days ago | parent | next [-] | | > In fact, it will add more constraints to your design, because now you have different consumers and potentially writers all competing for the same resource with potentially different access patterns. Plus the maintenance overhead that migrations of such shared tables come with. And eventually you might have data in this table that are only needed for some of the services, so you now need to implement views and access controls at the DB level. PostgreSQL, to name one example, can handle every one of these challenges. | | |
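For example, sketches of two of these capabilities, assuming PostgreSQL (the orders table and reporting_svc role are hypothetical):

    -- Row-level security controls which rows a role can see at all.
    ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
    CREATE POLICY reporting_read ON orders
        FOR SELECT TO reporting_svc
        USING (status <> 'draft');

    -- Per-role settings help contain competing access patterns.
    ALTER ROLE reporting_svc SET statement_timeout = '5s';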
| ▲ | paffdragon 5 days ago | parent [-] | | It's not that it is not possible, but whether it's a good idea. The usual problem is that some team exposes one of their internal tables and then has no control over what kinds of queries are run against it, which can impact their service when the access patterns differ. Or the external team asks for extra fields that do not make sense in the owning team's model. Or for adding some externally sourced information. Or the owning team moves from PostgreSQL to S3 or DynamoDB. And this is not an exhaustive list. An API layer is more flexible and can remain stable far longer than an exposed internal implementation that depends on a particular technology, implemented in a particular way, at the time the teams agreed on sharing. This is, of course, not a concern inside the same team or between very closely collaborating teams; they can handle the necessary coordination. So there are always exceptions and simple use cases where DB access works just fine. Especially if you don't already have an API, which could be a bigger investment to set up for something simple when it's not even known yet whether the idea will work. |
| |
| ▲ | marcosdumay 5 days ago | parent | prev [-] | | > Plus the maintenance overhead that migrations of such shared tables come with. Moving your data types from SQL into another language solves exactly 0 migration problems. Every migration you can hide with that abstraction language you can also hide in SQL. Databases can express exactly the same behaviors as your application code. | | |
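For example, a column rename can be hidden behind a view just as it would be hidden behind an ORM mapping (a sketch, assuming PostgreSQL, where a simple single-table view like this one is automatically updatable; names are hypothetical):

    ALTER TABLE orders RENAME COLUMN total TO total_cents;
    -- Old clients keep reading and writing the old shape through the view.
    CREATE VIEW orders_v1 AS
        SELECT id, total_cents AS total, created_at FROM orders;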
| ▲ | zbentley 5 days ago | parent [-] | | I’m generally pro SQL-as-interface, but this is just wrong. Not only do databases impose all sorts of bizarre constraints on migration behavior that application code can route around (for example: how do I use a transaction-plus-double-write pattern to migrate to a new table when the locks taken to add an index to the old table would require unacceptably long downtime? There are probably some SQL engines out there that can do this with views, but most people solve it on the client side for good reason), but there are also plenty of changes that you just plain can’t make without a uniform service layer in front of your database. Note that “uniform service layer” doesn’t necessarily mean “networked service layer”; this can be in-process if you can prevent people from bypassing your querying functions and going directly to the DB. | |
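For reference, a minimal sketch of that client-side pattern (orders and orders_new are hypothetical names; a background job backfills old rows while every application write hits both tables in one transaction):

    BEGIN;
    UPDATE orders     SET status = 'paid' WHERE id = 42;
    UPDATE orders_new SET status = 'paid' WHERE id = 42;
    COMMIT;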
| ▲ | branko_d 5 days ago | parent | next [-] | | > Note that “uniform service layer” doesn’t necessarily mean “networked service layer”, this can be in-process if you can prevent people from bypassing your querying functions and going directly to the DB. You can take it one step further and implement the “uniform service layer” in the database itself - using stored procedures and views. This has downsides, like strong coupling with the specific DBMS, and difficulty of development in a comparatively primitive SQL dialect, but protects the database from “naughty” clients and can have tremendous performance advantages in some cases. | |
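A minimal sketch of that shape, assuming PostgreSQL 11+ (the procedure, table, and role names are hypothetical):

    -- Clients get EXECUTE on the procedure instead of direct table access.
    CREATE PROCEDURE place_order(p_customer_id bigint, p_amount numeric)
    LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO orders (customer_id, amount, status)
        VALUES (p_customer_id, p_amount, 'new');
    END;
    $$;

    GRANT EXECUTE ON PROCEDURE place_order(bigint, numeric) TO billing_svc;
    REVOKE ALL ON orders FROM billing_svc;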
| ▲ | sgarland 5 days ago | parent | prev | next [-] | | As the sibling comment mentioned, this is a solved problem. MySQL and MariaDB take a very brief lock on the table at the very end of index creation that you will not notice, I promise. Postgres does the same if you use the CONCURRENTLY option for index builds. If for some reason you do need to migrate data to a new table, triggers. If somehow these still don’t solve your problem, ProxySQL or the equivalent (you are running some kind of connection pooler, right?) can rewrite queries on the fly to do whatever you want. | |
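For reference, the Postgres form (index and table names are made up):

    -- Builds the index without holding a long write lock; runs outside a transaction.
    CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);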
| ▲ | ndriscoll 5 days ago | parent | prev [-] | | Either triggers or CREATE INDEX CONCURRENTLY? Do most people solve that on the client side? Doesn't e.g. Percona use triggers? |
|
|
|
|
| ▲ | dkarl 5 days ago | parent | prev | next [-] |
> And what exactly do you buy yourself? APIs can be evolved much more easily than shared database schemas. Having worked with many instances of each kind of system, I think this outweighs all of the other considerations, and I don't think I'll ever again design a system with multiple services accessing the same database schema. It was maybe a good idea if you were a small company in the early 2000s, when databases were well understood and services weren't. After that era, I haven't seen a single example of a system where it wasn't a mistake for multiple services to access the same database schema (not counting systems where the read and write paths were architecturally distinct components of the same service). |
| |
| ▲ | vbezhenar 4 days ago | parent | next [-] | | I implemented an interesting service 15 years ago, and I recently heard about it again. This service was basically a "universal integration service". The company wanted to share some data, and they wanted to implement that in a universal way. So basically I implemented a SOAP web service which received a request containing SQL text and responded with a list of rows. This service was surprisingly popular and used a lot. I was smart enough to build a limited SQL syntax parser and a UI, so an administrator could just set up the tables and columns they wanted to share with a specific client. The SQL was limited in the sense that it worked only against one table, with a simple set of columns and some limited conditions (those I bothered to implement). The reason I heard about it a few months ago is that they caught a malicious guy who worked at a company integrating with this system and tried an SQL injection attack. They noticed errors in the logs and caught him. Their database schema is pretty much done and frozen; they hardly evolve it. So this service turned out to be pretty backwards-compatible. And simple changes, of course, could be supported with a view if necessary. | |
| ▲ | CuriouslyC 5 days ago | parent | prev [-] | | Service-specific views, my guy. | | |
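For example (a sketch, assuming PostgreSQL; all names are hypothetical):

    -- One view per consuming service; each view's shape can stay stable
    -- even as the underlying table evolves.
    CREATE VIEW orders_for_billing   AS SELECT id, total_cents, status FROM orders;
    CREATE VIEW orders_for_reporting AS SELECT id, status, created_at FROM orders;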
| ▲ | zbentley 5 days ago | parent [-] | | And when the underlying tables have to change, what then? Views are good, and they help with this situation. But if the data is complicated, big, and changes shape (DDL) even somewhat frequently, views only help a little. That said, I think that API update coordination is often much harder than schema change coordination (API behavior has many more dimensions along which it can change than database query behavior does), so I am generally open to multiple services sharing a database, so long as it’s within reason, understood to pose risks, appropriately justified, and used responsibly. | |
| ▲ | dkarl 5 days ago | parent [-] | | 100%. Views don't even cover all the use cases of schema evolution, unless you're willing to duplicate business logic between stored procedures and services, but schema evolution is only the start of it. API versioning gives you a lot more flexibility to evolve how data is stored and accessed. Some parts of your data might get shifted into other data stores, some functionality might get outsourced to third party APIs, you might have to start supporting third-party integrations, etc. Try doing that from a view -- or rather, please don't! |
|
|
|
|
| ▲ | sethammons 5 days ago | parent | prev | next [-] |
The goal is to minimize what needs changing when things need changing. When you need to alter the datastore, usually for product or scalability reasons, you have to orchestrate all access to that datastore. Ergo: only one thing using the datastore means less orchestration. At work, we just updated a datastore. We had to move some tables to their own DB. Three years later, 40+ teams have updated their access. This was a product need. If it had been a scale issue, the product would just have died, absent some as-yet-unimagined solution. |
| |
| ▲ | wahnfrieden 5 days ago | parent [-] | | A shared code library for DB access is an alternative there | |
| ▲ | paffdragon 5 days ago | parent [-] | | That moves your API layer into a client library that you need to distribute and build for your customers in the programming languages they support. There are some cases where a thick client makes sense, but it's usually easier to do it server side and let customers consume the API from their own environment; it is easier to patch the server than to ship library updates to all users. | |
| ▲ | zbentley 5 days ago | parent [-] | | I think most of the discussion in this thread assumes that “customers” of the interface are other groups in the same organization using the database for a shared overarching business/goal, not external end user customers. For external end users, absolutely provide an API, no argument here. The internal service interactions behind that API are a less simple answer, though. | | |
| ▲ | paffdragon 5 days ago | parent [-] | | It's definitely worse for external customers, of course. But it's still not that easy even for internal customers. The main problem is that the tables exposed are usually not meant to be public interfaces, so the consuming team takes a dependency on the owning team's internal schema. And that other team could have completely different goals and priorities, speed and size, management, and end users with different requirements. At some point the other team might start asking the first team to add some innocent-looking fields to their internal table, or some externally sourced information. The first team might also need to make changes to support their own service that are not compatible with the other team's usage. And the other team can issue queries that the team owning the DB does not control, which could impact performance. If possible, it is better to agree on an API and avoid depending on internal implementations directly, even for internal customers. There are always some exceptions: very close teams, or subteams under the same management with the same customers, could be fine. Or the table in question was explicitly designed as a public interface; that is rare, but possible. |
|
|
|
|
|
| ▲ | sgarland 5 days ago | parent | prev | next [-] |
> Additionally, having five services accessing the same database is a code smell. Counterpoint (assuming by database you mean a database cluster, not a schema): having a separate physical DB for each service means that for most places, your reliability has now gone from N to N^M, where N is one database's availability and M is the number of databases a flow touches. For example, five databases at 99.9% availability each leave a flow that touches all five up only about 0.999^5 ≈ 99.5% of the time. |
| |
| ▲ | lnenad 5 days ago | parent [-] | | From which perspective? If a service is up but unable to do anything because another service is down, what good does that do, other than improving some metrics on some dashboard? (Note that we are specifically talking about coupled services, since the implication is that writing to a single db gets split up into multiple dbs - a distributed monolith.) | |
| ▲ | sgarland 5 days ago | parent [-] | | Fair point. Unfortunately for me, the only kind of microservice architecture I’ve ever worked with is a distributed monolith - at multiple companies. |
|
|
|
| ▲ | bubblebeard 5 days ago | parent | prev | next [-] |
| I think the author meant, in a general way, it’s better to avoid simultaneous writes from different services, because this is an easy way to introduce race conditions. |
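A classic instance of that race, sketched against a hypothetical stock table:

    -- Two services doing read-modify-write on the same row:
    --   service A: SELECT quantity FROM stock WHERE id = 1;   -- reads 10
    --   service B: SELECT quantity FROM stock WHERE id = 1;   -- reads 10
    --   service A: UPDATE stock SET quantity = 9 WHERE id = 1;
    --   service B: UPDATE stock SET quantity = 9 WHERE id = 1; -- A's decrement is lost
    -- A single atomic statement (or a single writing service) avoids the lost update:
    UPDATE stock SET quantity = quantity - 1 WHERE id = 1;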
|
| ▲ | brainzap 4 days ago | parent | prev | next [-] |
Airflow 2 used the database to coordinate; Airflow 3 switched to an API. |
|
| ▲ | Muromec 5 days ago | parent | prev [-] |
| >And what exactly do you buy yourself? More failure modes and a higher micro services tax? Nice boxes in the architectural diagram. Each box is handed to a different team and then, when engineers from those teams don't talk to each other, the system doesn't suddenly fail in an unexpected way. |
| |
| ▲ | PartiallyTyped 5 days ago | parent [-] | | At amzn, a decision from the top was made that nobody would ever write to shared DynamoDB tables. A team would own each table and provide APIs. That massively improved reliability and velocity. | |
| ▲ | paffdragon 5 days ago | parent | next [-] | | The team boundary is very important. You can get away with a shared DB for a long time if the same team handles all services that access it and has tight control over them. If there are different teams in the picture, however, the tight coupling is a source of problems and a bottleneck, beyond prototyping / idea validation, etc. | |
| ▲ | foobarian 5 days ago | parent | prev [-] | | I don't need a decision from atop Amazon to remind me how painful it would be to migrate a widely shared Dynamo instance or, god forbid, change DAX settings |
|
|