pornel 15 hours ago

In short, the maximum possible speed is the same (+/- some nitpicks), but there can be significant differences in typical code, and it's hard to define what's a realistic typical example.

The big one is multi-threading. In Rust, whether you use threads or not, all globals must be thread-safe, and the borrow checker requires memory access to be shared XOR mutable. When writing single-threaded code takes 90% of the effort of writing multi-threaded code, Rust programmers may as well sprinkle threads all over the place, regardless of whether that's a 16x improvement or a 1.5x improvement. In C, the cost/benefit analysis is different. Even just spawning a thread is going to make somebody complain that they can't build the code on their platform due to C11/pthread/OpenMP. The risk of having to debug heisenbugs means that code typically won't be made multi-threaded unless really necessary, and even then it's preferably kept to simple cases or very coarse-grained splits.

arghwhat 15 hours ago | parent | next [-]

To be honest, I think a lot of the justification here is just a difference in standard library and ease of use.

I wouldn't consider there to be any notable effort in making threads build on target platforms in C relative to normal effort levels in C, but it's objectively more work than `std::thread::spawn(move || { ... });`.
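
For comparison, the full stdlib ceremony in Rust really is that small. A minimal sketch (the function name is mine):

```rust
use std::thread;

// Sum a vector on a worker thread and join on the result.
fn sum_on_thread(data: Vec<i32>) -> i32 {
    // `move` transfers ownership of `data` into the closure, so the
    // borrow checker rules out a data race on it by construction.
    let handle = thread::spawn(move || data.iter().sum());
    handle.join().expect("worker thread panicked")
}

fn main() {
    assert_eq!(sum_on_thread(vec![1, 2, 3, 4]), 10);
}
```

No build-system flags, no platform-specific thread API, and the compiler refuses to let the closure capture anything that isn't safe to send to another thread.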

Despite the benefits, I don't actually think memory safety really plays a role in the usage rate of parallelism. Case in point: Go has no compile-time memory safety for concurrency, with both races and atomicity issues being easy to introduce, and yet it relies much more heavily on concurrency (with a parallelism degree managed by the runtime) and with much less deliberation than Rust. After all, `go f()` is even easier.

(As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)

josephg 4 hours ago | parent | next [-]

> To be honest, I think a lot of the justification here is just a difference in standard library and ease of use.

I really liked this article by Bryan Cantrill from 2018:

https://bcantrill.dtrace.org/2018/09/28/the-relative-perform...

He straight-ported some C code to Rust and found the Rust code outperformed it by ~30% or something. The culprit ended up being that in C, he was using a hash table library he'd been copy-pasting between projects for years. In Rust, he used BTreeMap from the standard library, which turns out to be much better optimized.

This isn't evidence that Rust is faster than C. I mean, you could just backport that BTreeMap to C and get exactly the same performance in C code. At the limit, I think both languages perform basically the same.

But most people aren't going to do that.

If we're comparing normal Rust to normal C - whatever that means - then I think Rust takes the win here. Even Bryan Cantrill - one of the best C programmers you're likely to ever run into - wasn't using a particularly well-optimized hash table implementation in his C code. The quality of the standard tools matters.

When we talk about C, we're really talking about an ecosystem of practice. And in that ecosystem, having a better standard library will make the average program better.

mandw 2 hours ago | parent [-]

The only real question I have with this is: did the program have to meet any specific performance target? I could write a small utility in Python that would be completely acceptable for use but at the same time be 15x slower than an implementation in another language. So how do you compare code across languages that wasn't written for performance, given that one language may have some set of functions that happens to favour it in that particular app? I think to compare, you have to at least have the goal of performance for both when testing. If he had needed his app to be 30% faster he would have made it so, but it didn't need to be, so he didn't. Which doesn't make it great for comparison.

Edit: I also see that your reply was specifically about the point that the libraries by themselves can improve performance with no extra work, and I do agree with you there, as you did with the commenter above.

josephg an hour ago | parent [-]

Honestly I'm not quite sure what point you're making.

> If he needed his app to be 30% faster he would have made it so

Would he have? Improving performance by 30% usually isn't so easy. Especially not in a codebase which (according to Cantrill) was pretty well optimized already.

The performance boost came to him as a surprise. As I remember the story, he had already made the C code pretty fast and didn't realise his C hash table implementation could be improved that much. The fact rust gave him a better map implementation out of the box is great, because it means he didn't need to be clever enough to figure those optimizations out himself.

It's not an apples-to-apples comparison. But I don't think comparing the world's fastest C code to the world's fastest Rust code is a good comparison either, since most programmers don't write code like that. It's usually incidental, low-effort performance differences that make a programming language "fast" in the real world. Like a good btree implementation just shipping with the language.

oconnor663 2 hours ago | parent | prev | next [-]

> Despite benefits, I don't actually think the memory safety really plays a role in the usage rate of parallelism.

I can see what you mean with explicit things like thread::spawn, but I think Tokio is a major exception. Multithreaded by default seems like it would be an insane choice without all the safety machinery. But we have the machinery, so instead most of the async ecosystem is automatically multithreaded, and it's mostly fine. (The biggest problems seem to be the Send bounds, i.e. the machinery again.) Cargo test being multithreaded by default is another big one.

majormajor 6 hours ago | parent | prev | next [-]

> (As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)

Is that beyond just "concurrency is tricky and a language that makes it easier to add concurrency will make it easier to add sneaky bugs"? I've definitely run into that, but have never written concurrent C to compare the ease of heisenbug-writing.

nineteen999 9 hours ago | parent | prev | next [-]

> (As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)

This is my experience too.

kannujd 6 hours ago | parent [-]

[flagged]

pornel 7 hours ago | parent | prev [-]

Go is weirdly careless about thread-safety of its built-in data structures, but GC, channels, and the race detector seem to be enough?

OptionOfT 15 hours ago | parent | prev | next [-]

Apart from multi-threading, there is more information in the Rust type system. Would that allow more optimizations?

kouteiheika 15 hours ago | parent | next [-]

Yes. All `&mut` references in Rust are equivalent to C's `restrict`-qualified pointers. In the past I measured a ~15% real-world performance improvement in one of my projects due to this (rustc has/had a flag where you can turn this on/off; it was disabled by default for quite some time due to codegen bugs in LLVM).

steveklabnik 15 hours ago | parent | next [-]

Not just all &mut T, but also all &T, where the T does not transitively contain an UnsafeCell<T>. Click "show llvm ir" instead of "build" here: https://play.rust-lang.org/?version=stable&mode=release&edit...

marcianx 15 hours ago | parent [-]

I was confused by this at first since `&T` clearly allows aliasing (which is what C's `restrict` is about). But I realize that Steve meant just the optimization opportunity: you can be guaranteed that (in the absence of UB), the data behind the `&T` can be known to not change in the absence of a contained `UnsafeCell<T>`, so you don't have to reload it after mutations through other pointers.
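
A small sketch of that reload argument (function names are mine). In the first function, the value behind `&i32` cannot change during the borrow, so the compiler may cache it; in the second, `Cell` opts into interior mutability, so every read must observe the latest write:

```rust
use std::cell::Cell;

// `shared` and `out` cannot alias (shared XOR mutable), so the
// compiler is free to fold this into `*out = 2 * *shared` without
// reloading `*shared` after the first write through `out`.
fn sum_twice(shared: &i32, out: &mut i32) {
    *out = *shared;
    *out += *shared; // no reload needed
}

// With a Cell, mutation through a shared reference is allowed,
// so each `get()` must actually re-read the value.
fn sum_twice_cell(shared: &Cell<i32>, out: &mut i32) {
    *out = shared.get();
    shared.set(100); // legal: interior mutability
    *out += shared.get(); // must observe the new value
}

fn main() {
    let x = 21;
    let mut y = 0;
    sum_twice(&x, &mut y);
    assert_eq!(y, 42);

    let c = Cell::new(21);
    let mut z = 0;
    sum_twice_cell(&c, &mut z);
    assert_eq!(z, 121);
}
```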

steveklabnik 14 hours ago | parent | next [-]

Yes. It's a bit tricky to think about, because while it is literally called 'noalias', what it actually means is more subtle. I already linked to a version of the C spec below, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf but if anyone is curious, this part is in "6.7.4.2 Formal definition of restrict" on page 122.

In some ways, this is kind of the core observation of Rust: "shared xor mutable". Aliasing is only an issue if the aliasing leads to mutability. You can frame it in terms of aliasing if you have to assume all aliases can mutate, but if they can't, then that changes things.

salmon640 6 hours ago | parent | prev [-]

[flagged]

dmitrygr 3 hours ago | parent | prev [-]

Do you not use restrict in your normal everyday C code that you write? I use it in my normal C code.

kouteiheika 42 minutes ago | parent [-]

I used to use it, but very rarely, since it's instant UB if you get it wrong. In tiny codebases which you can hold in your head it's probably practical to sprinkle it everywhere, but in anything bigger it's quite risky.

Nevertheless, I don't write normal everyday C code anymore since Rust has pretty much made it completely obsolete for the type of software I write.

mhh__ 15 hours ago | parent | prev | next [-]

Aliasing info is gold dust to a compiler in various situations, although the absence of it in the past can mean compilers start smoking crack when it's suddenly provided.

randomNumber7 12 hours ago | parent | prev | next [-]

In C there is the `restrict` keyword to tell the compiler that there is no other pointer to the values accessed through a certain pointer.

If you do not use that the generated code can be quite suboptimal in certain cases.

adgjlsfhk1 15 hours ago | parent | prev | next [-]

Yes. Specifically, since Rust's design prevents shared mutability, if you have 2 mutable data structures you know that they don't alias, which makes auto-vectorization a whole lot easier.

tcfhgj 15 hours ago | parent | prev [-]

what about generics (equivalent to templates in C++), which allow compile-time optimizations all the way down, which may not be possible if the implementation is hidden behind a `void*`?

OptionOfT 12 hours ago | parent [-]

Unless you use `dyn`, all code is monomorphized, and that code on its own will get optimized.

This does come with code bloat. So the Rust std sometimes exposes a generic function (which gets monomorphized) but internally passes the work off to a non-generic function.

This avoids monomorphizing the underlying code.

https://github.com/rust-lang/rust/blob/8c52f735abd1af9a73941...
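
A hedged sketch of that pattern (the names `open_log`/`open_log_inner` are mine, not from std): a thin generic wrapper converts early and forwards to a single concrete function, so only the wrapper is duplicated per type:

```rust
use std::path::{Path, PathBuf};

// Monomorphized once per P, but it's tiny: just a conversion and a call.
pub fn open_log<P: AsRef<Path>>(path: P) -> PathBuf {
    open_log_inner(path.as_ref())
}

// Compiled exactly once, no matter how many P types call in.
fn open_log_inner(path: &Path) -> PathBuf {
    path.with_extension("log")
}

fn main() {
    // Both calls share the same inner function in the final binary.
    assert_eq!(open_log("app"), PathBuf::from("app.log"));
    assert_eq!(open_log(String::from("app")), PathBuf::from("app.log"));
}
```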

dwattttt 5 hours ago | parent [-]

> This does come with code-bloat. So the Rust std sometimes exposes a generic function (which gets monomorphized), but internally passes it off to a non-generic function.

There's no free lunch here. Reducing the amount of code that's monomorphised reduces the code emitted & improves compile times, but it reduces the scope of the code that's exposed to the input type, which reduces optimisation opportunities.

josephg 4 hours ago | parent [-]

Yes. But I like that rust gives you the option.

In C, the only way to write a monomorphized hash table or array list involves horribly ugly macros that are difficult to write and debug. Rust does monomorphization by default, but you can also use `&dyn Trait` for vtable-like behaviour if you prefer.
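
A minimal sketch of the two dispatch styles side by side (the trait and names are made up for illustration):

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

// Monomorphized: a specialized copy is compiled per concrete type,
// and the call can be inlined and optimized through.
fn area_static<S: Shape>(s: &S) -> f64 {
    s.area()
}

// Dynamic: one copy of the code; the call goes through a vtable,
// like a C++ virtual function or a hand-rolled C function-pointer table.
fn area_dyn(s: &dyn Shape) -> f64 {
    s.area()
}

fn main() {
    let sq = Square(3.0);
    assert_eq!(area_static(&sq), 9.0);
    assert_eq!(area_dyn(&sq), 9.0);
}
```

Same result, different trade-off: the first costs binary size and compile time, the second costs an indirect call.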

gpderetta 14 hours ago | parent | prev | next [-]

Then again, often

  #pragma omp for 
is a very low mental-overhead way to speed up code.

MeetingsBrowser 14 hours ago | parent | next [-]

Depends on the code.

OpenMP does nothing to prevent data races, and anything beyond simple for loops quickly becomes difficult to reason about.

thesz 5 hours ago | parent [-]

No.

It is easy to divide a loop body into computation and shared-state updates; the latter can be done under #pragma omp critical (label).

nurettin 14 hours ago | parent | prev [-]

Yes! gcc/omp in general solved a lot of the problems which are conveniently left out in the article.

Then we have the anecdotal "they failed Firefox layout in C++ twice, then did it in Rust", to which I sigh in Chrome.

steveklabnik 14 hours ago | parent [-]

The Rust version of this is "turn .iter() into .par_iter()."

It's also true that for both, it's not always as easy as "just make the for loop parallel." Stylo is significantly more complex than that.

> to this I sigh in chrome.

I'm actually a Chrome user. Does Chrome do what Stylo does? I didn't think it did, but I also haven't really paid attention to the internals of any browsers in the last few years.

pjmlp 14 hours ago | parent | next [-]

And the C++ version is add std::execution::par_unseq as parameter to the ranges algorithm.

MeetingsBrowser 9 hours ago | parent [-]

This has the same drawbacks as "#pragma omp for".

The hard part isn't splitting loop iterations between threads, but doing so _safely_.

Proving that an arbitrary loop's iterations can be split in a memory-safe way is undecidable in general for C and C++, but it's checked by default in Rust.
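
A stdlib-only sketch of what that default looks like (the function name is mine; rayon's `par_iter` does this with a work-stealing pool instead of spawning per chunk). `chunks_mut` hands out non-overlapping `&mut` slices, so the compiler proves the threads can't write to the same elements:

```rust
use std::thread;

fn double_in_parallel(data: &mut [i32], workers: usize) {
    // Round up so every element lands in some chunk; keep chunk >= 1.
    let chunk = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for part in data.chunks_mut(chunk) {
            // Each `part` is a disjoint &mut slice: no overlap possible.
            s.spawn(move || {
                for x in part {
                    *x *= 2;
                }
            });
        }
    }); // scope joins all threads before returning
}

fn main() {
    let mut v: Vec<i32> = (1..=8).collect();
    double_in_parallel(&mut v, 4);
    assert_eq!(v, vec![2, 4, 6, 8, 10, 12, 14, 16]);
}
```

If two chunks could overlap, this wouldn't compile; in C/C++ the equivalent mistake compiles fine and races at runtime.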

nurettin 13 hours ago | parent | prev [-]

Afaik it does all styling and layout in the main thread and offloads drawing instructions to other threads (CompositorTileWorker) and it works fine?

dwattttt 5 hours ago | parent [-]

That does sound like Chrome either failed to make styling multithreaded in C++ or hasn't attempted it, while it was achieved in Rust?

m-schuetz 15 hours ago | parent | prev | next [-]

I'm still confused as to why Linux requires linking against TBB for multithreading, thus breaking CMake configs unless you add an if(LINUX) branch for TBB. That stuff should be included by default without any effort by the developer.

sebtron 15 hours ago | parent [-]

I think this is related to the C++ standard library implementation.

When using pthreads in C, for example, TBB is not required.

Not sure about C11 threads, but I have always thought that glibc just uses pthreads under the hood.

m-schuetz 14 hours ago | parent [-]

I don't know the details since I'm mainly a Windows dev, but when porting to Linux, TBB has always been a huge pain in the ass, since it's suddenly an additional dependency required by gcc when using C++ and std::thread.

pjmlp 14 hours ago | parent [-]

Also clang, and in general the parallel algorithms aren't available on platforms that TBB doesn't support.

C++26 will get another similar dependency, because BLAS algorithms are going to be added, and apparently the expectation is to build on top of battle-tested C/Fortran BLAS implementations.

jasonjmcghee 14 hours ago | parent | prev | next [-]

> Rust programmers may as well sprinkle threads all over the place regardless whether that's a 16x improvement or 1.5x improvement

What about energy use and contention?

pornel 7 hours ago | parent [-]

Usually it's a benefit for energy usage anyway.

CPUs are most energy efficient sitting idle doing nothing, so finishing work sooner in wall-clock time usually helps despite overheads.

Energy usage is most affected by high clock frequencies, and CPUs will boost clocks for single-threaded code.

Threads waiting on cache misses let the CPU use hyperthreading, which is actually energy efficient (you get context switching in hardware).

You can waste energy in pathological cases if you overuse spinlocks or spawn so many threads that bookkeeping takes more work than what the threads do, but helper libraries for multithreading all have thread pools, queues, and dynamic work splitting to avoid extreme cases.

Most of the time low speed up is merely Amdahl's law – even if you can distribute work across threads, there's not enough work to do.

jasonjmcghee 4 hours ago | parent [-]

Thanks

groundzeros2015 15 hours ago | parent | prev [-]

Multithreading does not make code more efficient. It still takes the same amount of work and power (slightly more).

On a backend system where you already have multiple processes using various cores (databases, web servers, etc) it usually doesn’t make sense as a performance tool.

And on an embedded device you want to save power so it also rarely makes sense.

MrJohz 14 hours ago | parent | next [-]

According to [1], the most important factor for the power consumption of code is how long the code takes to run. Code that spreads over multiple cores is generally more power efficient than code that runs sequentially, because the power consumption of multiple cores grows less than linearly (that is, it requires less than twice as much power to run two cores as it does one core).

Therefore if parallelising code reduces the runtime of that code, it is almost always more energy efficient to do so. Obviously if this is important in a particular context, it's probably worth measuring it in that context (e.g. embedded devices), but I suspect this is true more often than it isn't true.

[1]: https://arxiv.org/abs/2410.05460

fauigerzigerk 13 hours ago | parent [-]

>Therefore if parallelising code reduces the runtime of that code, it is almost always more energy efficient to do so

Only if it leads to better utilisation. But in the scenario that the parent comment suggests, it does not lead to better utilisation as all cores are constantly busy processing requests.

Throughput as well as CPU time across cores remains largely the same regardless of whether or not you paralellise individual programs/requests.

MrJohz 11 hours ago | parent [-]

That's true, which is why I added the caveat that this is only true if parallelising reduces the overall runtime - if you can get in more requests per second through parallelisation. And the flip side of that is that if you're able to perfectly utilise all cores then you're already running everything in parallel.

That said, I suspect it's a rare case where you really do have perfect core utilisation.

pirocks 14 hours ago | parent | prev | next [-]

> Multithreading does not make code more efficient. It still takes the same amount of work and power (slightly more).

In addition to my sibling comments, I would like to point out that multithreading quite often can save power. Typically the power consumption of an all-core load is within 2x the power consumption of a single-core load, while being many times faster, assuming your task parallelizes well. This makes sense b/c a fully loaded CPU core still needs all the L3 cache mechanisms, all the DRAM controller mechanisms, etc. to run at full speed. A fully idle system, on the other hand, can consume very little power if it idles well (though admittedly many CPUs do not idle at low power).

Edit:

I would also add that if your system is running a single-threaded database and a single-threaded web server, that still leaves over a hundred underutilized cores on many modern server-class CPUs.

groundzeros2015 13 hours ago | parent [-]

Responding to your last point.

If you use a LAMP-style architecture, with a scripting language handling requests and querying a database, you never write a single line of multithreaded code and are already set up to utilize N cores.

Each web request can happen in a thread/process and their queries and spawns happen independently as well.

NetMageSCW 15 hours ago | parent | prev [-]

Multithreading can make an application more responsive and more performant to the end user. If multithreading causes an end user to have to wait less, the code is more performant.

groundzeros2015 15 hours ago | parent [-]

Yes, it can be used to reduce latency of a particular task. Did you read my points about when it's not helpful?

Are people making user facing apps in rust with GUIs?

sebtron 14 hours ago | parent | next [-]

> Are people making user facing apps in rust with GUIs?

We are talking not only about Rust, but also about C and C++. There are lots of C++ UI applications. Rust poses itself as an alternative to C++, so it is definitely intended to be used for UI applications too - it was created to write a browser!

At work I am using tools such as uv [1] and ruff [2], which are user-facing (although not GUI), and I definitely appreciate a 16x speedup if possible.

[1] https://github.com/astral-sh/uv

[2] https://github.com/astral-sh/ruff

allreduce 14 hours ago | parent | prev | next [-]

Usually it does not reduce latency but increases throughput.

Multithreading is an invaluable tool when actually using your computer to crunch numbers (scientific computing, rendering, ...).

tcfhgj 15 hours ago | parent | prev [-]

> Are people making user facing apps in rust with GUIs?

yes

groundzeros2015 15 hours ago | parent [-]

got any to share? Should I assume native GUI in these Rust performance debates?

tcfhgj 14 hours ago | parent | next [-]

https://system76.com/cosmic

https://helix-editor.com/

https://zed.dev/

groundzeros2015 12 hours ago | parent [-]

What do you think are good use cases for multi threading in these editors?

steveklabnik 11 hours ago | parent | next [-]

"don't block the ui thread" is a pretty classic aphorism in any language.

tcfhgj 12 hours ago | parent | prev [-]

search, linting

gf000 14 hours ago | parent | prev | next [-]

Well, what about small CLI tools, like ripgrep and the like? Does multithreading not matter when we open a large number of files and process them? What about compilers?

groundzeros2015 12 hours ago | parent [-]

Sure. But the more obviously parallel the problem is (visiting N files) the less compelling complex synchronization tools are.

To over-explain: if you just need to fork N copies of the same logic, it's very easy to do correctly in C. The cases where I'm going to carefully maintain shared mutable state with locking are cases where the parallelism is less efficient (Amdahl's law).

Java-style apps that just haphazardly start threads are what Rust makes safer. But that's a category of program design I find brittle and painful.

The example you gave of a compiler is canonically implemented as multiple processes making .o files from .c files, not threads.

gf000 12 hours ago | parent [-]

> The example you gave of a compiler is canonically implemented as multiple process making .o files from .c files, not threads.

This is a huge limitation of C's compilation model, and basically every other language since then does it differently, so not sure if that's a good example. You do want some "interconnection" between translation units, or at least less fine-grained units.

groundzeros2015 12 hours ago | parent [-]

And yet despite that theoretical limit, C compiles faster than any other language. Even C++ is very fast if you are not using a header-only style.

What’s better? Rust? Haskell? Swift?

It's very hard to do multithreading at a more granular level without hitting Amdahl's law and synchronization traps.

xpe 14 hours ago | parent | prev [-]

You might start with https://github.com/zed-industries/awesome-gpui and https://blog.logrocket.com/state-rust-gui-libraries/