I run a large on-prem Temporal setup - throwaway acct as they will likely out me.

Temporal is, in my opinion, having run it in prod for over a year, poorly designed, slow and ridiculously heavy infra-wise.

If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.

Try running their own benchmarks, the numbers are pathetic.

Their sales team is also absolutely appalling and desperate.

From a developer standpoint, the SDK is quite nice though.

Don't get trapped into Nexus, and if the sales team calls you, make sure legal is in the room.

quacker 36 minutes ago

Honest question: Can you use Temporal Cloud? Have you evaluated Temporal Cloud pricing?

Ballparking: 200 events/workflow, 200 workflows per day, and assuming 1 event = 1 cloud action[1], that is 1.2M or so actions per month. The $100/month plan includes 1M actions each month, and even the pay-as-you-go pricing when you exceed that is $50 per 1M actions[2].
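The ballpark above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic in this comment: the 30-day month, the 1 event = 1 action mapping, and the idea that overage is billed linearly on top of the $100 plan are all assumptions, and the function names are made up for illustration.

```python
# Ballpark of monthly Temporal Cloud actions and cost, using this comment's figures.
# Assumptions (not confirmed by Temporal's docs): 1 event = 1 action, 30-day month,
# overage billed linearly on top of the plan price.

PLAN_PRICE_USD = 100            # "$100/month plan" from the comment
INCLUDED_ACTIONS = 1_000_000    # actions included in that plan, per the comment
OVERAGE_USD_PER_MILLION = 50    # "$50 per 1M actions" from the comment


def monthly_actions(events_per_workflow: int, workflows_per_day: int, days: int = 30) -> int:
    """Estimated actions per month if every event counts as one action."""
    return events_per_workflow * workflows_per_day * days


def monthly_cost(actions: int) -> float:
    """Plan price plus linear overage for actions beyond the included amount."""
    overage = max(0, actions - INCLUDED_ACTIONS)
    return PLAN_PRICE_USD + overage * OVERAGE_USD_PER_MILLION / 1_000_000


actions = monthly_actions(events_per_workflow=200, workflows_per_day=200)
cost = monthly_cost(actions)
print(f"{actions:,} actions/month")  # 1,200,000 actions/month
print(f"${cost:.2f}/month")          # $110.00/month
```

Swapping in a different `workflows_per_day` shows how quickly the estimate moves once the workload is larger than the one assumed here.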

Temporal Cloud seems extremely cheap for your use case, even if I'm off by a factor of 10. Is there a catch? You still need infra to run your Temporal workers, and I assume there are storage and other costs, but I assume action usage is the majority of it.

1. Not sure exactly what constitutes an "Action". At a glance, seems like most events have a corresponding action(?) and a subset of those actions are actually billable(?)

2. https://docs.temporal.io/cloud/pricing#payg-action-pricing

	temporal_thr321 25 minutes ago
		I was not clear; I did not mean 200 a day. It's tens of thousands of concurrently running workflows, sometimes into the hundreds of thousands, each with 200 events. We run many hundreds of thousands of these a day. Temporal was a bad fit for us, and we regret it deeply.

temporal_thr123 3 hours ago

Since I'm in a ranting mode -- here's a good example: you're limited to _ONE_ IO per shard in the history service:

https://github.com/temporalio/temporal/blob/e22e6304b3c4a409...

Temporal does a crazy amount of database operations and all of these are behind that mutex.

Oh, and you can't change the shard count on existing clusters.

Great stuff.
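To make the complaint above concrete, here is a toy model of what one in-flight IO per shard would imply. It is not Temporal's code, and `shard_count` and `io_latency_s` are hypothetical numbers, not measurements. It only assumes that IO on a shard is fully serialized and that load spreads evenly across shards.

```python
# Toy model: if each history shard allows only one in-flight IO at a time,
# a shard can complete at most 1 / io_latency_s operations per second.
# The shard counts and the latency below are hypothetical, not measured values.


def max_ops_per_second(shard_count: int, io_latency_s: float) -> float:
    """Upper bound on persistence operations per second across all shards."""
    if shard_count <= 0 or io_latency_s <= 0:
        raise ValueError("shard_count and io_latency_s must be positive")
    return shard_count / io_latency_s


# Compare a few hypothetical shard counts at an assumed 5 ms per database IO.
for shards in (4, 512, 4096):
    bound = max_ops_per_second(shards, io_latency_s=0.005)
    print(f"{shards:>5} shards -> at most {bound:,.0f} ops/s")
```

Under this model the ceiling scales only with shard count and database latency, which is why a shard count that cannot be changed later matters so much.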

lll-o-lll 2 hours ago

> If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.

Where are the “millions” on infra going? It’s a handful of services and a Postgres?

> Their sales team is also absolutely appalling and desperate.

You said “on-prem”. It’s open source; why are you dealing with their sales team?

> If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day…

If “millions” were required to obtain such tiny scale, I’d agree there’d be a massive problem. No one would use Temporal; it would be a complete waste of resource. If this were true.

cyberpunk 31 minutes ago

We also hit scaling problems with temporal.

Postgres doesn't scale at all for our workload, so you're into Cassandra.

For a medium-sized deployment, you're looking at 200+ vCPUs, and then let's say standard dev/uat/prod. So now you're at 600 CPUs. Now you need two geographic regions, dev can stay in one place, so now you're at 800. Want a failover cluster for prod? Have another 200 CPUs.

And 200 CPUs is a medium deployment, assuming something like 36 CPUs per Cassandra node, then say 4-8 per instance of matching, worker, history, frontend. Then all your other components around it: ingress controller, service mesh, etc.
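The running total above can be written out as a short tally. This is a sketch under one assumption: each step adds whole 200-vCPU clusters, matching the 600 -> 800 progression in this comment. The final figure after the failover cluster is implied by "another 200" rather than stated.

```python
# Running vCPU tally following the numbers in this comment.
# Assumption: every step adds whole 200-vCPU clusters, which matches the
# 600 -> 800 progression described; the last total is implied, not stated.

CLUSTER_VCPUS = 200  # "medium" cluster size from the comment

steps = [
    ("dev/uat/prod", 3),              # three environments
    ("second geographic region", 1),  # dev stays in one region
    ("prod failover cluster", 1),     # "have another 200"
]

total = 0
for label, clusters in steps:
    added = clusters * CLUSTER_VCPUS
    total += added
    print(f"{label:<26} +{added:>3} vCPUs -> {total}")
```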

There's a million a year easy, for a small deployment.

Our prod one is 4x this size.

temporal_thr321 21 minutes ago

Not a couple hundred in one day; a couple hundred being started, concurrently, every second of the day. Each with ~200 events.

We need a 12-node Cassandra cluster for this, with 64-CPU nodes. So no, it's not a couple of services and a Postgres.

As for the sales team: we are an enterprise, and they want to extract money from us.

turtlebits an hour ago

The same goes for any "open-source" enterprise ($$$) software. It sucks to run yourself. Docs on running it and on errors are non-existent. Their Helm charts are broken. Instead of degraded performance, it just fails.

	cyberpunk 15 minutes ago
		Yeah, they've had so much VC cash pumped in lately they really need to pump the SaaS side of the business.
	lll-o-lll an hour ago
		With all due respect – if that’s the attitude, you have no business running anything on-prem. And that’s fine, there’s a reason the various cloud providers are the go-to for many businesses.

dakiol 3 hours ago

Agree. I have worked in a codebase using Temporal, and it is pretty much a nightmare. I don't know about the infra side, but from the developer side, all the abstractions they bring to the table are poorly designed. Wouldn't recommend.
