A couple years ago I implemented a btree (technically an order statistic tree) in a couple thousand lines of unsafe rust for a project. I wrote it more or less how I'd do it in C. Each internal node and leaf node was a separate heap allocation, and internal nodes had an array of child pointers. It was surprisingly hard to write. And complicated!

In my opinion, unsafe rust code is worse to use than C because rust is missing the arrow operator. And rust’s aliasing rules still have to be followed even in unsafe code. This makes complex unsafe code very hard to implement correctly. Like, it’s easy for something to look right and work correctly but for Miri to still find subtle issues.

Eventually I rewrote my btree on top of Vecs. My node & leaf pointers are now array indices. The result? There is no longer any unsafe code. The code has become significantly simpler and it now runs ~10% faster than it did before, which is shocking to me. I guess bounds checks are cheaper than memory fragmentation on modern computers.
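To make the "indices instead of pointers" idea concrete, here is a minimal sketch of how such a layout might look. It is not the code from the project described above: the names `Tree`, `Node`, `Leaf`, `Child` and `build_example` are all hypothetical, and the tree is built by hand rather than by a real insert routine.

```rust
// Hypothetical names throughout; an illustration, not the original code.

// A child reference is just an index into one of the two Vecs.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Child {
    Node(usize), // index into Tree::nodes
    Leaf(usize), // index into Tree::leaves
}

struct Node {
    // (child, number of items stored under that child),
    // which is what makes lookup by position possible.
    children: Vec<(Child, usize)>,
}

struct Leaf {
    items: Vec<u32>,
}

struct Tree {
    nodes: Vec<Node>,
    leaves: Vec<Leaf>,
    root: Child,
}

impl Tree {
    /// Return the item at 0-based position `pos`, walking indices
    /// rather than raw pointers. Every access is bounds checked.
    fn get(&self, mut pos: usize) -> Option<u32> {
        let mut cur = self.root;
        loop {
            match cur {
                Child::Leaf(i) => return self.leaves[i].items.get(pos).copied(),
                Child::Node(i) => {
                    let mut next = None;
                    for &(child, count) in &self.nodes[i].children {
                        if pos < count {
                            next = Some(child);
                            break;
                        }
                        pos -= count;
                    }
                    // Position is past the end of this subtree.
                    cur = next?;
                }
            }
        }
    }
}

/// Hand-built two-leaf tree holding 10, 20, 30, 40, 50.
fn build_example() -> Tree {
    let leaves = vec![
        Leaf { items: vec![10, 20, 30] },
        Leaf { items: vec![40, 50] },
    ];
    let nodes = vec![Node {
        children: vec![(Child::Leaf(0), 3), (Child::Leaf(1), 2)],
    }];
    Tree { nodes, leaves, root: Child::Node(0) }
}
```

With this layout, `build_example().get(3)` skips the first leaf (3 items) and reads position 0 of the second leaf. No `unsafe` block is needed anywhere, because a node only ever holds plain `usize` values.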

I have so many thoughts having done that. First, I think this is actually the right way to write rust. Yes, manually keeping track of which array slots are in use is inconvenient. But unsafe & pointers are also quite inconvenient in rust. Programming like this does make use-after-free style bugs possible: you can hold on to an index after its slot has been freed and reused. But it’s still memory safe by rust’s definition. It’s impossible to get arbitrary heap corruption because there are no raw pointers. And the indexes are bounds checked.
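A small sketch of that trade-off, assuming a simple free-list scheme (the `Slab` type and its methods are hypothetical, not taken from the project above): a stale index can observe the wrong value, but it can never read outside the Vec.

```rust
// Hypothetical slab-style allocator, for illustration only.
struct Slab {
    slots: Vec<Option<String>>, // None marks a free slot
    free_list: Vec<usize>,      // indices available for reuse
}

impl Slab {
    fn new() -> Self {
        Slab { slots: Vec::new(), free_list: Vec::new() }
    }

    /// Store a value, reusing a freed slot when one is available.
    fn alloc(&mut self, value: String) -> usize {
        if let Some(i) = self.free_list.pop() {
            self.slots[i] = Some(value);
            i
        } else {
            self.slots.push(Some(value));
            self.slots.len() - 1
        }
    }

    /// Mark a slot as free. Releasing an already-free slot is a no-op.
    fn release(&mut self, i: usize) {
        if self.slots[i].take().is_some() {
            self.free_list.push(i);
        }
    }

    /// Bounds-checked read: out-of-range or free slots give None.
    fn get(&self, i: usize) -> Option<&str> {
        self.slots.get(i)?.as_deref()
    }
}
```

If you allocate `"a"`, release it, then allocate `"b"`, the old index now reads `"b"`. That is a logic bug in the use-after-free family, but the read is still a valid, bounds-checked access to a live `String`.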

I also don’t think the resulting code is any worse than the equivalent C++. Everyone talks about memory safety but IMO rust’s best features are enums, traits, cargo, match expressions and so on. Even when you do an end run around the borrow checker, it’s these features which make me keep coming back to rust.

I agree better guidance would be nice, but so many words have been spilled on rust already. Would you find content talking about subtle stuff like this? Sometimes the only way to learn is by trying stuff out.

byko3y 2 hours ago

>Eventually I rewrote my btree on top of Vecs. My node & leaf pointers are now array indices. The result? There is no longer any unsafe code. The code has become significantly simpler and it now runs ~10% faster than it did before, which is shocking to me. I guess bounds checks are cheaper than memory fragmentation on modern computers.

Optimizations are very complex and potentially fragile in Rust. LLVM has to sort through tons of generated IR, so it might just be that native Rust structures are easier for the compiler to optimize. In particular, Rust is able to optimize out some bounds checks.

Do note that binary trees are mostly an obsolete legacy today: they are way too cache-unfriendly. I mean you could have written similar code in C++ using std::vector or std::deque and gotten the bounds checking too.

>Everyone talks about memory safety but IMO rust’s best features are enums, traits, cargo, match expressions and so on

C++20 concepts mostly reproduce traits. C++17's std::variant emulates enums/tagged unions. Match is unmatched by C++, that's true.
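For readers who have not used them, this is roughly what the Rust side of that comparison looks like. The `Shape` enum, `Area` trait and `total_area` function are made-up names for illustration only.

```rust
// Hypothetical types for illustration.

// A tagged union: each variant carries its own data.
enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

// A trait is implemented explicitly for a type (an `impl` block),
// rather than being satisfied by shape alone.
trait Area {
    fn area(&self) -> f64;
}

impl Area for Shape {
    fn area(&self) -> f64 {
        // `match` must cover every variant, or compilation fails.
        match self {
            Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
            Shape::Rect { w, h } => w * h,
        }
    }
}

// Generic code names the trait it needs as a bound.
fn total_area<T: Area>(items: &[T]) -> f64 {
    items.iter().map(|s| s.area()).sum()
}
```

Adding a third variant to `Shape` without updating the `match` is a compile error, which is the property the thread is pointing at when it says match has no direct C++ equivalent.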

Cargo is good for as long as there are few packages in there. Large projects already suffer from five versions of serde in one build and dependencies on FFI-connected libs that cargo itself cannot build. I mean look at the NPM nightmare, and they've mostly dodged FFI.

josephg an hour ago

>Do note that binary trees are mostly an obsolete legacy today: they are way too cache-unfriendly. I mean you could have written similar code in C++ using std::vector or std::deque and gotten the bounds checking too.

As a sibling comment said, it's a b-tree, not a binary tree. B-trees are, as far as I know, the fastest data structure on modern computers for the class of problems they solve.

And yes, I think if I ever go back to C/C++ I'll try this approach out. It might also work great in GC languages like JS/TS/C#/Go because there are fewer pointers to keep track of.

>Cargo is good for as long as there are few packages in there. Large projects already suffer from five versions of serde in one build and dependencies on FFI-connected libs that cargo itself cannot build. I mean look at the NPM nightmare, and they've mostly dodged FFI.

I haven't run into the "five versions of serde" problem, but I can easily imagine it. I've lived NPM nightmares more times than I can count. But I'd still prefer that to all the problems you get from CMake, autotools and Makefiles. At this rate we're going to get self driving cars before we have a single sane build system for C.

EnPissant 2 hours ago

>Do note that binary trees are mostly an obsolete legacy today: they are way too cache-unfriendly

A BTree is not a binary tree. It's a B-tree, and it is cache-friendly.

>C++20 concepts mostly reproduce traits.

C++20 concepts are not the same as traits. Concepts are structural and awkward to use compared to traits, which are nominal. There are other important differences, too.