Testing concurrency is extremely hard

For instance, take SQL queries: you run them and see no issue. Is your code sane? Or did one query just happen to run 10ms earlier, so you avoided the issue?

I truly wonder if there are real-world tests around this; I bet there are only algorithms and fuzzing.

▲

hxtk 4 days ago | parent | next [-]

I’ve long wished for an SQL error model: given a schema, query, and transaction isolation mode, what errors are theoretically possible?

I have a hard time answering this for Postgres, which disappoints me because it sounds like it should be very easy to answer: there could be an extension to EXPLAIN that would dry-run the query and list all the reachable error states.

	▲	jiggawatts 4 days ago | parent [-]
		Computer Science is incredibly immature. Many of its "founding fathers" are still alive!
		Something akin to this that blew my mind recently was an IDE for a functional language that used typed holes, the programming equivalent of a semiconductor electron "vacancy", a quasi-particle with real properties that is actually just the lack of a particle. The concept was that if you delete (or haven't yet typed) some small segment out of an otherwise well-typed program, the compiler can figure out the type that the missing part must have. This can be used for rapid development because many such "holes" can only have one possible type.
		This kind of mechanistic development with tool assistance is woefully under-developed.

▲

toolslive 4 days ago | parent | prev | next [-]

There are "lightweight formal methods". Most problems can be reproduced with small models, and tools like Alloy are built around this idea. (IIRC Alloy was used to show that a famous DHT had issues with its churn protocol.)

https://en.wikipedia.org/wiki/Alloy_(specification_language)

▲

lucianbr 4 days ago | parent | prev | next [-]

https://jepsen.io/

▲

wubrr 4 days ago | parent | prev | next [-]

> Testing concurrency is extremely hard

Writing a non-trivial concurrent system based on your understanding of the 'algorithm', without relying on testing, is much harder.

> I truly wonder if there are real-world tests around this

Of course there are. There are many tools, methods, and test suites for concurrency testing, for almost any major language. Of course, understanding your algorithm and the systems involved is required to write a proper test suite.

> For instance, take SQL queries: you run them and see no issue. Is your code sane?

Take those queries and run them 1000+ times concurrently in a loop. That will catch most common issues. If you want to go a step further, you can build a setup to execute your queries in any desired order.
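A minimal sketch of the "execute in any desired order" idea, in Python rather than SQL. The `Counter` class is a hypothetical stand-in for a row that is read and then written back, and the `threading.Event` handshakes are just one assumed way to pin the interleaving; nothing here is the API of a real database driver.

```python
import threading


class Counter:
    """Hypothetical stand-in for a row updated with a SELECT followed by an UPDATE."""

    def __init__(self):
        self.value = 0


def increment(counter, after_read, before_write):
    current = counter.value          # the "SELECT"
    after_read.set()                 # tell the harness the read has happened
    before_write.wait(timeout=5)     # block until the harness allows the write
    counter.value = current + 1      # the "UPDATE"


def run_forced_interleaving():
    """Force both reads to happen before either write, then return the result."""
    counter = Counter()
    read_a, read_b = threading.Event(), threading.Event()
    allow_writes = threading.Event()

    a = threading.Thread(target=increment, args=(counter, read_a, allow_writes))
    b = threading.Thread(target=increment, args=(counter, read_b, allow_writes))
    a.start()
    b.start()

    # Wait until both workers have read the old value, then release them.
    read_a.wait(timeout=5)
    read_b.wait(timeout=5)
    allow_writes.set()

    a.join()
    b.join()
    return counter.value


# Two increments ran, but one update is lost because both read 0.
result = run_forced_interleaving()
```

Because the ordering is pinned, the lost update shows up on every run instead of once in a while, which is the point of building such a harness.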

▲

sarchertech 4 days ago | parent | next [-]

I’ve never worked somewhere (in 20 years from big tech companies to small startups) that was generally and reliably testing for concurrency bugs.

And I’ve seen dozens of bugs caused by people assuming that transactions (with the default isolation level) protect against race conditions.

▲

wubrr 4 days ago | parent | next [-]

Every place I worked at, that had any kind of reliable, high-throughput concurrent system had an extensive suite of concurrent tests.

https://github.com/postgres/postgres/tree/master/src/test/is...

https://muratbuffalo.blogspot.com/2023/08/distributed-transa...

https://learn.microsoft.com/en-us/archive/msdn-magazine/2008...

https://go.dev/blog/synctest

https://learntla.com/core/concurrency.html

▲

sarchertech 4 days ago | parent [-]

> Every place I worked at, that had any kind of reliable, high-throughput concurrent system

Pretty much anyone with high throughput is running a high-throughput concurrent system, and very few companies have an extensive suite of concurrency tests, unless you just mean load tests (which aren’t set up to catch race conditions).

The “reliable” part of that statement might be doing a lot of heavy lifting depending on what exactly you mean by that.

▲

wubrr 4 days ago | parent [-]

I gave you several concrete examples. Your claim that 'very few companies have...' isn't very convincing, and the apparent popularity of concurrency testing isn't really a strong argument for or against its effectiveness or doability.

▲

sarchertech 4 days ago | parent [-]

Did you Google “concurrency testing” and send me the top 5 results?

▲

fn-mote 4 days ago | parent | next [-]

Kind of looks like it … the supporting evidence includes work from Microsoft: learning how to write concurrent programs. Surely not evidence that Microsoft is testing for concurrency bugs (of course they are).

	▲	ongy 4 days ago | parent [-]
		> In Go 1.24, we are introducing a new, experimental testing/synctest package
		Clearly a mature mechanism we'd see in large companies...

▲

wubrr 4 days ago | parent | prev [-]

Maybe you should have googled 'concurrency testing' before telling me a story about how you worked at every tech company for 76000 years and never saw any concurrency testing lmao.

▲

sarchertech 2 days ago | parent | next [-]

Obvious hyperbole aside, I never said "never saw any concurrency testing". I said I never saw a place that was generally and reliably testing for concurrency bugs.

That is to say that following the standard practices at that company, an average developer introducing a concurrency bug will only tend to find out about that bug in production. And even if the developer wants to implement concurrency testing they will probably find it difficult enough to do that they'll give up because it's not a normal part of the test suite.

This is less true for people working on infrastructure like operating systems and databases than it is for web developers.

▲

hluska 4 days ago | parent | prev [-]

They said twenty, not 76000. What a crock.

	▲	jamespo 4 days ago | parent | next [-]
		Perhaps he had a concurrency bug writing the sentence and didn't lock the value
	▲	gubicle 4 days ago | parent | prev [-]
		same thing

▲

nijave 3 days ago | parent | prev [-]

Not sure about other languages but I believe the stock test tooling for `go` generates a random sorting seed and has a flag to run tests concurrently. You can also manually pass the seed to simulate a certain ordering.

While not perfect, our e2e tests caught a couple bugs running them with concurrency.

▲

JackSlateur 4 days ago | parent | prev | next [-]

Running 1000x queries in a loop is called luck.

▲

wubrr 4 days ago | parent [-]

No, it's called testing many concurrent operations.

Implementing a complex concurrent algorithm based on your understanding of it, without proper testing is called luck, and often called delusion.

▲

teraflop 4 days ago | parent | next [-]

You can't easily, automatically test concurrent code for correctness without testing all possible interleavings of instructions, and that state space is usually galactically huge.
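To make the size of that state space concrete, here is a small sketch that enumerates every interleaving of two tiny "threads", each doing a read followed by a write on a shared counter. The step names and the `run` interpreter are made up for illustration; real model checkers and schedulers are far more sophisticated.

```python
import math


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one schedule of (thread, op) steps against a shared counter."""
    shared = 0
    local = {}
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared       # read the shared value into a local
        else:
            shared = local[thread] + 1   # write back local + 1
    return shared


thread_a = [("A", "read"), ("A", "write")]
thread_b = [("B", "read"), ("B", "write")]

results = [run(s) for s in interleavings(thread_a, thread_b)]
total = len(results)                 # number of distinct schedules
lost_updates = results.count(1)      # schedules where one increment vanished

# For two threads of n steps each, the schedule count is C(2n, n).
schedules_for_10_steps = math.comb(20, 10)
```

With two steps per thread there are only 6 schedules, and 4 of them lose an update. At 10 steps per thread the count is already 184756, and it keeps growing combinatorially, which is why exhaustive exploration is only practical for small models.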

It is very easy to write multithreaded code that is incorrect (buggy), but where the window of time for the incorrectness to manifest is only a few CPU instructions at a time, sprinkled occasionally throughout the flow of execution.

Such a bug is unlikely to be found by test cases in a short period of time, even if you have 1000 concurrent threads running. And yet it'll show up in production eventually if you keep running the code long enough. And of course, when it does show up, you won't be able to reproduce it.

That is, I think, what the parent commenter means by "luck".

This is similar to the problem you'll run into when testing code that explicitly uses randomness. If you have a program that calls rand(), and it works perfectly almost all the time but fails when rand() returns the specific number 12345678, and you don't know ahead of time to test that value, then your automated test suite is unlikely to ever catch the problem. And testing all possible return values of rand() is usually impractical.
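A rough back-of-the-envelope version of the rand() argument, as a sketch. It assumes rand() returns one of 2**31 equally likely values (the real range is implementation-defined) and that each test run makes one independent draw.

```python
def chance_of_hitting(space_size: int, trials: int) -> float:
    """Probability that at least one of `trials` uniform draws equals one fixed value."""
    return 1.0 - (1.0 - 1.0 / space_size) ** trials


SPACE = 2 ** 31  # assumed number of distinct rand() results

# Even a million randomized test runs will almost certainly miss the bad value.
p_million = chance_of_hitting(SPACE, 1_000_000)
```

Under these assumptions, a million runs have less than a 0.05% chance of ever drawing the one failing value.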

	▲	nick__m 4 days ago | parent | next [-]
		There is a cost-benefit ratio, and context matters. The "repeat the concurrent operations 1000 times" technique is adequate for a CRUD API, but it's woefully inadequate for a database engine or garbage collector.
	▲	wubrr 4 days ago | parent | prev | next [-]
		It will obviously not catch all bugs. Nothing will. But it is a relatively easy and reliable way to catch many of them. It works.
	▲	nijave 3 days ago | parent | prev | next [-]
		There's still value in eliminating ways your program can go wrong even if you can't eliminate all of them. Using your logic, why bother testing at all?
	▲	afiori 4 days ago | parent | prev [-]
		And if you have too many threads running, you could have slowdowns in the system that prevent the race condition from happening.

▲

JackSlateur 4 days ago | parent | prev [-]

What algorithm? The whole idea is that algorithms are useless, and you should just write a bunch of tests and go with it.

Yes, if I write stuff with locks, I shall ensure that my code acquires and releases locks correctly.

This is completely off-topic from the original post.

Also, you cannot prove something by tests. Just because you found 100000 cases where your code works does not mean there is not a case where it does not (just as you cannot prove that unicorns do not exist) :)

▲

sarchertech 4 days ago | parent | next [-]

> Also, you cannot prove something by tests. Just because you found 100000 cases where your code works does not mean there is not a case where it does not (just as you cannot prove that unicorns do not exist) :)

That’s exactly it. For any non trivial program, there exists an infinite number of ways your program can be wrong and still pass all your tests.

Unless you can literally test every possible input and every bit of state this holds true.

▲

wubrr 4 days ago | parent | next [-]

For any 'non trivial' program there exists an infinite number of ways your program can be wrong but you still believe it's right.

Testing is not a perfect solution to catch all bugs. It's a relatively easy, efficient and reliable way to catch many common bugs though.

	▲	sarchertech 2 days ago | parent [-]
		Sure, but it's not a reliable replacement for understanding what the program you wrote is doing either.

▲

pfdietz 4 days ago | parent | prev [-]

And yet, (1) testing finds bugs in any nontrivial program that hasn't been tested, and (2) test long enough and with enough variety and you can make programs significantly more reliable.

Perfect is the enemy of good, and absent academic fantasies of verified software, testing is essential (even then, it's still essential, since you are unlikely to have verified every component of your system).

	▲	sarchertech 2 days ago | parent [-]
		Sure, testing is useful. It's not so useful that you don't need to understand what you're doing though.

▲

wubrr 4 days ago | parent | prev [-]

It's not about making sure your system is 100% perfect. You cannot do that on any real sufficiently complex system. It's about testing the core functionality in a relatively straightforward and reliable way (including concurrency testing), to catch many common bugs.

	▲	JackSlateur 3 days ago | parent [-]
		My shit is the backbone of a multibillion company. Catching common bugs is not enough; uncommon bugs are just too expensive.

▲

pixl97 3 days ago | parent | prev [-]

I mean if you're talking SQL on a large real database vs a small test db you can get some pretty big differences in performance and behavior. Of course query planning is something that should be monitored as an app is deployed and used, but testing never does seem to catch the edge cases.

▲

terpimost 3 days ago | parent | prev | next [-]

https://antithesis.com/ was made to deal with this. You can think of it as fuzzing, but with determinism for the whole system, so there is time travel and interactive debugging.

▲

eevmanu 2 days ago | parent | prev [-]

deterministic simulation testing[1] (DST)

[1] https://notes.eatonphil.com/2024-08-20-deterministic-simulat...
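A toy sketch of the core idea behind DST, under heavy simplification: every scheduling decision comes from one seeded random generator, so any failing run can be replayed exactly from its seed. The `worker` and `simulate` names are hypothetical, and a real DST setup would also have to control time, network, and disk, which this sketch does not attempt.

```python
import random


def worker(state):
    """A task that yields at the point where it could be preempted."""
    current = state["counter"]        # read
    yield                             # preemption point
    state["counter"] = current + 1    # write


def simulate(seed, n_workers=3):
    """Run the workers under a scheduler driven only by the given seed."""
    rng = random.Random(seed)
    state = {"counter": 0}
    runnable = {i: worker(state) for i in range(n_workers)}
    trace = []
    while runnable:
        task_id = rng.choice(sorted(runnable))  # the only source of nondeterminism
        trace.append(task_id)
        try:
            next(runnable[task_id])
        except StopIteration:
            del runnable[task_id]
    return state["counter"], trace


# Search a range of seeds for schedules that lose an update.
failing_seeds = [seed for seed in range(50) if simulate(seed)[0] != 3]
```

Any seed in `failing_seeds` reproduces the same trace and the same wrong counter on every run, which is what turns a "can't reproduce" race into an ordinary debuggable test case.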