This feels like the sort of architecture that starts clean and then gradually grows most of the things a workflow-native system already has. I've seen systems like this, seen companies that are built out of this idea, and built small systems like this over time.

Once you need retries, backoff, timeouts, cancellation, versioning, visibility, task routing, rate limits, leases, heartbeats, stuck-worker detection, replay/debugging semantics, workflow migration, fanout/fanin, long timers, audit trails, and operator tooling, the “just use a database” story pretty quickly becomes “build a poor copy of a workflow engine plus a bunch of workers.”
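
To give a sense of how even the first item on that list grows: a hand-rolled retry helper with capped exponential backoff and "full jitter" might look like the minimal Python sketch below. `retry_with_backoff` and its parameters are hypothetical, not any library's API, and the sketch deliberately ignores durable state, timeouts, and cancellation, which are exactly the things the rest of the list adds.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, retrying only exceptions listed in retry_on.

    The delay cap doubles on each attempt (base_delay * 2**attempt) and never
    exceeds max_delay; the actual delay is drawn uniformly from [0, cap]
    ("full jitter") so many workers don't retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Usage: a call that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


delays = []  # record the chosen delays instead of actually sleeping
print(retry_with_backoff(flaky, sleep=delays.append))  # ok
```

Injecting `sleep` keeps the helper testable; in a real worker you'd leave the default `time.sleep`.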

That may still be a good tradeoff for many applications, especially if Postgres is already the core operational dependency. But the comparison shouldn’t be “database vs overcomplicated orchestrator.” It’s more like “what complexity do you want to own, and what do you want to buy / offload to a professional system?”

hmaxdml 15 hours ago

Yeah, we've observed that too: people start by implementing their own retry logic, idempotency, etc., but then they end up growing a complex, hard-to-maintain stack that isn't their core business logic. There's a reason a dedicated team works on DBOS every day: it's not easy to build a solid durable workflow engine on Postgres.
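
A toy illustration of the kind of thing that starts small: checkpointing each step's output keyed by (workflow, step) so a re-run skips work that already completed. This sketch uses the stdlib `sqlite3` module as a stand-in for Postgres so it runs anywhere; the `step_results` table, `open_store`, and `run_step` are hypothetical names, and it is not meant to reflect DBOS's implementation.

```python
import json
import sqlite3
from typing import Any, Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema: one row per completed (workflow_id, step) pair.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS step_results (
               workflow_id TEXT NOT NULL,
               step        TEXT NOT NULL,
               output      TEXT NOT NULL,
               PRIMARY KEY (workflow_id, step)
           )"""
    )
    return conn


def run_step(conn: sqlite3.Connection, workflow_id: str, step: str,
             fn: Callable[[], Any]) -> Any:
    """Run fn once per (workflow_id, step); on re-runs, return the saved output."""
    row = conn.execute(
        "SELECT output FROM step_results WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # already done: replay the checkpoint
    result = fn()
    # The primary key makes a duplicate insert fail loudly instead of
    # silently recording two different outputs for the same step.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO step_results (workflow_id, step, output) VALUES (?, ?, ?)",
            (workflow_id, step, json.dumps(result)),
        )
    return result


# Usage: the second call replays the stored result instead of charging again.
conn = open_store()
charges = []


def charge_card() -> dict:
    charges.append(42)
    return {"charge_id": "ch_1", "amount": 42}


first = run_step(conn, "order-7", "charge", charge_card)
second = run_step(conn, "order-7", "charge", charge_card)
print(first == second, len(charges))  # True 1
```

Even this toy has a gap that hints at why a dedicated engine is hard: if the process dies after `fn()` runs but before the insert commits, the step runs again on recovery, so the step itself still has to be idempotent.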

UltraSane 4 hours ago

Comments like this, from people who know exactly what they're talking about, are why I love Hacker News.

cpursley 8 hours ago

https://github.com/pgmq/pgmq

epolanski 15 hours ago

Bingo. And that's without even mentioning that the blog post assumes all steps are serializable.
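
One reading of "serializable" here is that every step's inputs and outputs must survive being written to the database and read back, since that's how a step's result gets replayed after a crash. A minimal Python sketch of that constraint, assuming JSON as the encoding and using hypothetical `checkpoint`/`replay` helpers:

```python
import json
import threading
from datetime import datetime
from typing import Any


def checkpoint(value: Any) -> str:
    """Serialize a step's output so it can be stored and replayed later.

    Raises TypeError if the value can't be represented as JSON.
    """
    return json.dumps(value)


def replay(blob: str) -> Any:
    """Rebuild a step's output from its stored form."""
    return json.loads(blob)


# Plain data survives, but not always unchanged: tuples come back as lists.
print(replay(checkpoint({"ids": (1, 2)})))  # {'ids': [1, 2]}

# Live resources and some library types can't be checkpointed at all.
failures = []
candidates = [
    ("lock", threading.Lock()),
    ("generator", (x for x in range(3))),
    ("datetime", datetime.now()),
]
for name, value in candidates:
    try:
        checkpoint(value)
    except TypeError:
        failures.append(name)
print(failures)  # ['lock', 'generator', 'datetime']
```

Values that don't fit (locks, generators, open connections, many library objects) have to be redesigned into plain data before a step can be made durable.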

I feel like this is the usual "just use Postgres" garbage post that lacks any kind of nuance.

In fact, you could swap Postgres for any other database in that post and the statements would remain just as true, and just as naive.

tomcam 8 hours ago

Ridiculously good analysis! HN is a national treasure because of posts like this.

hack1312 an hour ago

What was so revolutionary to you in their post to cause you to describe it as a “ridiculously good analysis”?

nulltrace 15 hours ago

The SKIP LOCKED pattern is fine until the worker count climbs. Then vacuum can't keep up: dead tuples pile up and the visibility map turns to Swiss cheese. The queue table is tiny on disk, but the planner thinks it's huge and stops using the index. It gets ugly fast.
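
For anyone who hasn't seen it, the SKIP LOCKED dequeue being discussed usually looks roughly like the sketch below: each worker locks one queued row, skipping rows other workers already hold, and flips it to 'running'. The `jobs` table, its columns, and `claim_job` are hypothetical, and the code assumes a DB-API 2.0 driver for Postgres such as psycopg2; it isn't taken from the blog post.

```python
from typing import Any, Optional, Tuple

# Hypothetical table:
#   CREATE TABLE jobs (id bigserial PRIMARY KEY, status text NOT NULL,
#                      payload jsonb, locked_at timestamptz);
CLAIM_SQL = """
UPDATE jobs
   SET status = 'running', locked_at = now()
 WHERE id = (
         SELECT id
           FROM jobs
          WHERE status = 'queued'
          ORDER BY id
          LIMIT 1
            FOR UPDATE SKIP LOCKED  -- skip rows other workers have locked
       )
RETURNING id, payload
"""


def claim_job(conn: Any) -> Optional[Tuple[Any, Any]]:
    """Atomically claim one queued job, or return None if none are free.

    `conn` is assumed to be a DB-API 2.0 connection to Postgres.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        conn.commit()  # persist the claim; this also releases the row lock
    except Exception:
        conn.rollback()  # leave the connection usable after a failed claim
        raise
    finally:
        cur.close()
    return row
```

Every claim is an UPDATE, and under Postgres's MVCC each UPDATE leaves the old row version behind as a dead tuple (as does marking the job done or deleting it later), so the dead-tuple rate scales with job throughput. That churn is what autovacuum has to keep up with.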