jchw | 4 days ago
Look, this is pointless. I'm not learning anything new when you tell me that it can and will happen. How will it happen, and how often? Hence linking to Uber's case study on the issue. The answer? Not that much. Uber ran race detection in production over a 6-month period and found 2,000 distinct race conditions. Ouch, that sounds horrible! But wait: we're talking about 50 million lines of Go code and 2,100 services at the time of that writing. That works out to roughly 1 race condition per 25,000 lines of code and about 1 race condition per service.

That lines up pretty well with my experience. I haven't had a production outage or serious correctness issue caused by a race condition in Go, but I have seen roughly one or two race conditions make it to production per service. Those codebases were probably somewhere between 10,000 and 25,000 lines of code, so not far off that scale. But again, a data race doesn't always lead to a serious production outage. It can be worse (corrupting data and polluting your production database, in the worst case), but it's usually better: wonky behavior with no long-term effects, or a service that periodically crashes and restarts, dropping some requests but causing no long-term downtime.

Uber has no doubt seen at least some Go data races cause actual production outages, but they've also seen at least 2,000 that didn't; otherwise those would likely have been caught before the race detector found them, since Go dumps stack traces on crash. That has to tell you something about the actual probability of a data race causing a production outage. Again, you do you, but I won't be losing sleep over this. It's something to be wary of when working on Go services, but it's manageable.
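To make the "wonky but not an outage" failure mode concrete, here's a minimal toy sketch (my own example, not from the Uber study) of the kind of data race the Go race detector flags. Run it with `go run -race main.go` and you get a race report; run it without `-race` and it usually "works", just with a slightly wrong count:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	counter := 0 // shared across goroutines with no mutex or atomic: data race

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				counter++ // racy read-modify-write on the shared counter
			}
		}()
	}

	wg.Wait()
	// Without -race this prints something, often less than 2000, and the
	// process exits cleanly: no crash, no stack trace, just a wrong number.
	fmt.Println("counter =", counter)
}
```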
zozbot234 | 4 days ago | parent
Identifiable "wonky" behavior and periodic crashes seem like a very real issue to me. This wouldn't fly for any mission-critical service, it's something that demands a root cause analysis. Especially since it's hard to be sure after the fact that no data has been corrupted somehow or that security invariants have not been violated due to the "wonky" behavior. | ||||||||||||||||||||||||||||||||||||||||||||