mrkeen 6 hours ago

I walked away from a job interview a few years ago on this point.

One of the technical questions was "if you have a db and a message queue, how do you get your update to alter both or neither (i.e. transactionally)"?

I thought about it for a couple of minutes, then came back with something like "I can't, and you can't either." Then I proposed the usual spiel about using a replicated-state-machine/write-ahead-log/event-sourcing (whatever it might be called at the time) and leaning into eventual consistency as the only practical solution.

He asked if I'd heard about the outbox pattern, so I let him describe it. Sure enough it sounded like this article. The secret to transacting across the database D and the message queue Q:

  (D,Q)

is to split D into two parts (the State and the Outbox), transact across those instead

  (S,O)    Q

and then just pretend that you have a transaction across D and Q.

foundart 3 minutes ago | parent | next [-]

The way they worded that question is bad and, as you say, the outbox pattern does not transactionally update the queue itself. The outbox pattern is nevertheless very useful.

andix 4 hours ago | parent | prev | next [-]

With an inbox/outbox pattern it's possible. The incoming message might be processed more than once, and an outgoing message might be sent more than once. That's the limitation, and the system needs to be able to handle it.

If you can't de-duplicate messages it's not possible, that's true.

orbisvicis 2 hours ago | parent [-]

I'm not following. Doesn't the outbox pattern just pass the buck?

The motive seems to be a naive process that enqueues a message and then commits to a database - two independent actions. But a well-behaved process would commit to a database, and then only if successful enqueue a message. That's better but still not atomic - commit, crash, and no message queued.
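To make the "commit, crash, and no message queued" case concrete, here is a minimal sketch. It assumes plain Python lists standing in for the database and the queue, and a hypothetical `Crash` exception standing in for the process dying between the two steps.

```python
class Crash(Exception):
    """Stands in for the process dying between two steps (hypothetical)."""


def commit_then_enqueue(db, queue, order, crash_after_commit=False):
    db.append(order)  # step 1: "commit" to the database
    if crash_after_commit:
        raise Crash("process died after the commit")
    queue.append(order)  # step 2: publish to the queue


db, queue = [], []  # toy stand-ins for the two systems
try:
    commit_then_enqueue(db, queue, "order-1", crash_after_commit=True)
except Crash:
    pass

# The database has the order, but the queue never heard about it.
print(db, queue)
```

Nothing in the database records that a message is still owed, so no retry can repair the gap.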

So the solution is a two-table write - the outbox pattern. But the process that reads the outbox must commit both a query and delete before sending the message. That's the same risk as the aforementioned well-behaved program - commit, crash, and no message queued. Except now you introduced another pipeline element so your overall complexity increases, and so too does the risk.

What if you never delete messages from the outbox? Well, what you have now is no longer an outbox nor a database nor useful for large volumes. What if you implement a database to track processed messages? Return to square one - that's the same problem you were initially trying to solve.

What if you fetch, enqueue, and then delete? Ohh... that works. In case of a crash the message remains in the outbox. It may be processed in duplicate, but eventually, if sent successfully, it will be deleted from the outbox.

The message broker then receives a possibly duplicate message. It must consult its internal database, and if the message is unique, route it. So right back at square one. Can't have atomicity and uniqueness.

KraftyOne an hour ago | parent | next [-]

Outbox's power is that it turns an atomicity problem into an idempotency problem. You atomically write to the outbox, then you have an idempotent "workflow" that processes events from the outbox. This turns "at most once" semantics (where an event could be dropped entirely) into "at least once" semantics (where the event processing could run multiple times). For many systems, that's a big improvement.

620gelato 21 minutes ago | parent [-]

That's a good tradeoff I suppose. I've been racking my brain trying to find a solution recently that solves both of these but haven't been able to.

What I had landed on was idempotency on a best effort basis and just made the event processing safely retryable without violating any system invariants.

andix 34 minutes ago | parent | prev | next [-]

The message is written to the outbox table in the same transaction as the database changes. The message is created and the other tables are updated only if the transaction commits.
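A minimal sketch of that single-transaction write, assuming SQLite via Python's standard `sqlite3` module. The `accounts` and `outbox` tables, the `deposit` function and the `fail` flag are made up for illustration.

```python
import sqlite3
import uuid


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE outbox (msg_id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
    conn.commit()
    return conn


def deposit(conn, account_id, amount, fail=False):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the state change and the outbox row land together or not at all.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO outbox (msg_id, payload) VALUES (?, ?)",
            (str(uuid.uuid4()), f"deposited {amount} to {account_id}"),
        )
        if fail:
            raise RuntimeError("simulated failure before commit")
```

A failed `deposit` leaves neither a balance change nor an outbox row behind. Nothing here touches the broker yet.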

In a second step the message is taken from the outbox and sent to the queue/broker. Only after it has been sent is the message removed from the outbox. If the sending fails, it stays in the outbox and is retried. If deleting the message from the outbox fails after sending, it is re-sent later. So you can get a duplicated outgoing message.
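Roughly, the relay step could look like the sketch below. It assumes an `outbox` table in SQLite, a `send` callable standing in for the broker client, and a hypothetical `Crash` exception that simulates dying between the send and the delete.

```python
import sqlite3


class Crash(Exception):
    """Stands in for the relay process dying mid-loop (hypothetical)."""


def relay_once(conn, send, crash_before_delete=False):
    """Send each outbox row, deleting it only after send() has returned."""
    rows = conn.execute(
        "SELECT msg_id, payload FROM outbox ORDER BY rowid"
    ).fetchall()
    for msg_id, payload in rows:
        send(msg_id, payload)  # if this raises, the row stays and is retried
        if crash_before_delete:
            raise Crash("died after send, before delete")
        with conn:  # commits the delete
            conn.execute("DELETE FROM outbox WHERE msg_id = ?", (msg_id,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (msg_id TEXT PRIMARY KEY, payload TEXT NOT NULL)")
conn.execute("INSERT INTO outbox VALUES ('m1', 'hello')")
conn.commit()

sent = []  # stands in for the broker


def send(msg_id, payload):
    sent.append(msg_id)


try:
    relay_once(conn, send, crash_before_delete=True)
except Crash:
    pass
relay_once(conn, send)  # the retry after a "restart"
remaining = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

After the simulated crash and retry, `sent` holds `m1` twice and the outbox is empty: the message was never lost, but it was delivered more than once.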

Message brokers usually don't de-duplicate messages: they don't have a database that keeps messages, so the receivers need to do that, either with idempotency or by tracking message ids. Event sourcing brokers can de-duplicate, because they store all messages.
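One way a receiver might track message ids is sketched below, assuming SQLite. The `inbox` and `totals` tables and the `handle` function are illustrative names. The point is that the id insert and the effect share one local transaction.

```python
import sqlite3


def open_consumer_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inbox (msg_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE totals (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
    conn.execute("INSERT INTO totals VALUES ('deposits', 0)")
    conn.commit()
    return conn


def handle(conn, msg_id, amount):
    """Apply the message at most once; return False for a duplicate."""
    try:
        with conn:  # one local transaction: remember the id and apply the effect
            conn.execute("INSERT INTO inbox (msg_id) VALUES (?)", (msg_id,))
            conn.execute(
                "UPDATE totals SET value = value + ? WHERE name = 'deposits'",
                (amount,),
            )
    except sqlite3.IntegrityError:
        # The primary key rejected a msg_id we have already processed.
        return False
    return True
```

Calling `handle` twice with the same `msg_id` applies the amount only once. This only holds while the effect lives in the same database as the `inbox` table.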

If you never delete messages from the outbox, then they are re-sent all the time. You are going to notice such a bug really quickly.

Inbox pattern works very similarly, but the other way around.

atomicnumber3 an hour ago | parent | prev [-]

No, you're right. It basically just passes the buck. But the general idea is that if your transaction succeeds, you KNOW that there is a durable record that some external thing needs to end up in a message bus. And then something else can sit there and spin retries until it happens. It gives you the opportunity for retrying getting it onto the message bus, out of band of the process that is trying to initiate the enqueue.

And the outbox pattern isn't bs - it DOES help a lot in practice. But exactly how much it _guarantees_ something happens is of course still quite limited. And yes as you note it's an At-least-once strategy.

vlovich123 5 hours ago | parent | prev | next [-]

> Sure enough it sounded like this article

FWIW The article literally talks about the challenges with getting this to actually work and recommends removing it and just using the DB for everything.

mrkeen 5 hours ago | parent [-]

But that's what the outbox pattern is. You take the problem of transacting between more than one system, and by "just using the db", you declare the problem solved, leaving the communication with other systems as an exercise for the reader.

From the end of the article:

  The enqueue_workflow UDF creates this row in the same transaction as the user database update, guaranteeing atomicity
sarchertech 4 hours ago | parent | next [-]

You're right! But the outbox pattern is good enough for a lot of purposes. The outbox pattern works if the only reason the write to the 2nd system can fail is because of transient issues. It will keep trying until the system is back up.

If the 2nd system write can fail for non-transient reasons, the outbox pattern doesn’t work and you need either 2 phase commit or a distributed saga.

I wrote about this here a few years ago.

https://linuxblog.io/the-two-generals-problem/

koverstreet 4 hours ago | parent | prev [-]

The point is that the "outbox pattern" is not an atomic transaction. You fundamentally don't have those in the distributed world, via the CAP theorem, and if you want anything close to the guarantees a local transactional database gives you in a distributed system you have to design your schema for it.

Distributed coherency is not something you can abstract away, the abstractions all leak.

belinder 6 hours ago | parent | prev | next [-]

Why not just put the message queue in the same db

CodesInChaos 6 hours ago | parent | next [-]

That's what I generally choose. You don't need to worry about distributed system semantics, if you choose to not make the system distributed.

However the way Postgres keeps around obsolete rows (deleted or modified) until they're vacuumed can cause problems for high throughput queues. So for those systems the complexity might be worth it. But I bet 90% of the time the choice to use a separate queue is premature optimization. And hopefully OrioleDB (undo based storage engine for postgres) will avoid most of these pitfalls reducing the need for separate queues even further.
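As a rough illustration of keeping the queue in the same database, here is a sketch that uses SQLite as a stand-in. The `users` and `jobs` tables and both functions are invented for the example. With several concurrent workers on Postgres, the claim step would typically use `SELECT ... FOR UPDATE SKIP LOCKED`, which this single-connection sketch does not need.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, "
        "welcomed INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def sign_up(conn, email):
    # The state change and the enqueue are one transaction: both or neither.
    with conn:
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.execute("INSERT INTO jobs (user_id) VALUES (?)", (cur.lastrowid,))


def work_one(conn):
    """Take the oldest job and record its result in a single transaction."""
    with conn:
        row = conn.execute("SELECT id, user_id FROM jobs ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return False
        job_id, user_id = row
        conn.execute("UPDATE users SET welcomed = 1 WHERE id = ?", (user_id,))
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
    return True
```

This stays transactional only because the job's effect is itself a database write. As soon as the job has to call something outside the database, the at-least-once caveats discussed elsewhere in the thread apply again.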

mrkeen 5 hours ago | parent | prev | next [-]

Step 1: identify that you and at least one other node are separated by distance, and some lossy communication channel, and therefore form a distributed system.

Step 2: propose a source of truth that everyone can listen to. Hearing the same facts in the same order should put everyone in the same state (eventual consistency)

Step 3 (you are here): try to do better than EC, by merging the external queue into one of the nodes, making it the master.

Step 4: Now there's no distance between the nodes, so no need to solve the distributed systems problem and you can retire the queue.

convolvatron 5 hours ago | parent [-]

eventual consistency as generally used doesn't guarantee that events are presented in the same order. I use 'monotonic consistency' for that, but idk how common that is.

AlotOfReading 3 hours ago | parent | next [-]

Order independence/monotonicity is strong EC rather than regular EC.

mrkeen 5 hours ago | parent | prev [-]

Yes, same-ordering gives you EC, not the other way round.

KraftyOne 6 hours ago | parent | prev [-]

That's what the post is about! Once you're doing that, you really do have transactions between the state and the queue.

pylua 4 hours ago | parent | prev | next [-]

Just post to the database, then asynchronously send to the message queue. The consumer should still handle messages idempotently, but at least this follows REST and is transactional.

It’s simple and easy to follow. At scale use multi tenancy.

jayd16 6 hours ago | parent | prev | next [-]

It's a bit of a trick that the outbox-to-queue part of it likely needs to support "at least once but duplicates possible" delivery into the queue.

mrkeen 5 hours ago | parent | next [-]

"Send multiple times from D to Q and deduplicate with a UUID" (idempotency) is well short of "insert into both D and Q or neither" (atomicity)

jayd16 5 hours ago | parent [-]

What are you saying here? I'm pointing out that you need to be ok with the lack of an exactly-once transaction between O and Q. Maybe you're agreeing and simply saying that's fine?

KraftyOne 6 hours ago | parent | prev [-]

Every item will be written to the queue exactly once (as the update is transactional). Queue processing may need at-least-once semantics, yes, depending on what exactly you're doing.

jayd16 5 hours ago | parent [-]

The queue write is not in the transaction. The proposed trick is that that is ok because an outbox is able to be transacted on. It kicks the can somewhat...

EGreg 2 hours ago | parent | prev | next [-]

One major trick in distributed systems is to always attempt things in the same order. And then locally, you just store what you’ve seen, for “a long time”. That takes care of a lot of transactional issues — idempotency, retries, exactly-once delivery with no distributed locks, etc.

But as someone who builds distributed systems, I can tell you that transactions should be local. Anytime you want to lock something across the network (eg Canisters in ICP) so you can read it, that’s probably a code smell. You probably want to have evented reactive things ripple out, you do need idempotency, but you shouldn’t design your system to read remote state if you can help it. Only subscribe to remote messages.

game_the0ry 4 hours ago | parent | prev [-]

I envy you DB + distributed systems specialists. Reminds me I still have a lot to learn.