weinzierl 5 days ago

With Rust, executing a function for either case deploys the “optimal” version (reference or move) by default; moreover, the compiler (not the linter) will point out any improper “use after move”.

    struct Data {
      // Vec does not implement "Copy"
      data: Vec<i32>,
    }

    // Equivalent to "passing by const-ref" in C++
    fn BusinessLogic(d: &Data) {
      d.DoThing();
    }

    // Equivalent to "move" in C++
    fn FactoryFunction(d: Data) -> Owner {
      let owner = Owner { data: d };
      // ...
      return owner;
    }

Is this really true?

I believe in Rust, when you move a non-Copy type, like in this case, it is up to the compiler if it passes a reference or makes a physical copy.

In my (admittedly limited) understanding of Rust semantics calling

     FactoryFunction(d: Data) 
could physically copy d despite it being non-Copy. Is this correct?

EDIT:

Thinking about it, the example is probably watertight because d is essentially a Vec (as Ygg2 pointed out).

My point is that if you see

     FactoryFunction(d: Data) 
and all you know is that d is non-Copy, you should not assume it is not physically copied on function call. At least that is my belief.
aw1621107 5 days ago | parent | next [-]

> could physically copy d despite it being non-Copy. Is this correct?

I believe the answer is technically yes. IIRC a "move" in Rust is defined as a bitwise copy of whatever is being moved, modulo optimizations. The only difference is what you can do with the source after - for non-Copy types, the source is no longer considered accessible/usable. With Copy types, the source is still accessible/usable.
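A minimal sketch of that difference, assuming two made-up types (`Small` and `Owned` are hypothetical and not from the thread). Both calls pass by value; only what the compiler lets you do with the source afterwards differs:

```rust
// Hypothetical types for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Small(u32); // Copy: the source stays usable after a by-value use

#[derive(Debug)]
struct Owned(Vec<i32>); // not Copy: a by-value use is a move

fn take_small(s: Small) -> u32 {
    s.0
}

fn take_owned(o: Owned) -> usize {
    o.0.len()
}

fn main() {
    let s = Small(7);
    assert_eq!(take_small(s), 7);
    assert_eq!(s, Small(7)); // `s` is still accessible

    let o = Owned(vec![1, 2, 3]);
    assert_eq!(take_owned(o), 3);
    // println!("{:?}", o); // error[E0382]: borrow of moved value: `o`
}
```

Uncommenting the last line should make the compiler reject the program, which is the "use after move" check mentioned at the top of the thread.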

tialaramex 5 days ago | parent | prev | next [-]

Well since you're saying "physically" I guess we should talk about a concrete thing, so let's say we're compiling this for the archaic Intel Core i7 I'm writing this on.

On that machine Data is "physically" just the Vec, which is three 64-bit values: a pointer to i32 ("physically" on this machine a virtual address), an integer length and an integer capacity. The machine has a whole bunch of GPRs, so sure, one way the compiler might implement FactoryFunction is to "physically" copy those three values into CPU registers. Maybe say RAX, RCX, RDX?

Actually though there's an excellent chance that this gets inlined in your program, and so FactoryFunction never really exists as a distinct function, the compiler just stamps out the appropriate stuff in line every time we "call" this function, so then there was never a "parameter" because there was never a "function".
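The "three 64-bit values" part can be checked with `std::mem::size_of`. This is only a sketch: it says nothing about which registers are used or whether the call is inlined, and `words_in_vec` is a hypothetical helper name.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Data {
    data: Vec<i32>,
}

// Hypothetical helper: how many pointer-sized words a Vec<i32> occupies.
fn words_in_vec() -> usize {
    size_of::<Vec<i32>>() / size_of::<usize>()
}

fn main() {
    // Pointer, length and capacity: three words (24 bytes on a 64-bit target).
    assert_eq!(words_in_vec(), 3);
    // The wrapper struct adds nothing on top of the Vec it holds.
    assert_eq!(size_of::<Data>(), size_of::<Vec<i32>>());
    println!("Data is {} bytes", size_of::<Data>());
}
```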

weinzierl 5 days ago | parent [-]

True. When I wrote the comment I did not think about the Vec though.

The point I am trying to make is more general:

I believe that when you have a type in Rust that is not Copy it will never be implicitly copied in a way that you end up with two visible instances but it is not guaranteed that Rust never implicitly memcopies all its bytes.

I have not tried it but what I had in mind instead of the Vec was a big struct that is not Copy. Something like:

   struct Big<const M: usize> {
       buf: [u8; M],
   }

   // Make it non-Copy.
   impl<const M: usize> Drop for Big<M> {
        fn drop(&mut self) {} 
   }
From my understanding, to know if memory is shoveled around it is not enough to know the function signature and whether the type is Copy or not. The specifics of the type matter.
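A sketch of one case where the bytes demonstrably end up somewhere else: moving a `Big` from a local variable into a `Box`. The helper `move_to_heap` and the size 4096 are assumptions for illustration. This does not show what happens on a plain function call, where the compiler is free to pass a pointer or copy as it sees fit.

```rust
struct Big<const M: usize> {
    buf: [u8; M],
}

// Make it non-Copy, as in the comment above.
impl<const M: usize> Drop for Big<M> {
    fn drop(&mut self) {}
}

// Hypothetical helper: returns the address before the move, the address
// after moving into a Box, and the last byte read through the Box.
fn move_to_heap() -> (usize, usize, u8) {
    let big = Big::<4096> { buf: [7; 4096] };
    let before = &big as *const Big<4096> as usize;
    let boxed = Box::new(big); // `big` is moved; its bytes now live on the heap
    let after = &*boxed as *const Big<4096> as usize;
    (before, after, boxed.buf[4095])
}

fn main() {
    let (before, after, last) = move_to_heap();
    assert_ne!(before, after); // same logical value, different storage
    assert_eq!(last, 7); // the contents travelled with the move
    println!("before: {before:#x}, after: {after:#x}");
}
```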
catlifeonmars 5 days ago | parent | next [-]

Wouldn’t you need a Pin<T> to guarantee no copying? I think copy has two different meanings, depending on whether you’re talking about the underlying memory representation or the logical representation that is available to the developer.

Obviously the distinction can matter sometimes and thus copy in the logical sense is a leaky abstraction (although in practice I notice I do not see that leakage often).

tialaramex 5 days ago | parent | prev [-]

Yes, Rust absolutely might memcpy your Big when you move it somewhere.

I will say that programmers very often have bad instincts for when that's a bad idea. If you have a team with a mix of abilities, try asking: who thinks moving will perform worse for M = 64 or M = 32? Don't give them hours to think about it. I would not even be surprised to find experienced real-world programmers whose instinct tells them even M = 4 is a bad idea, despite the fact that, if we analyse it, we're copying a 4-byte value rather than copying the (potentially much bigger) pointer and taking an indirection.

Edited: To fix order of last comparison
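The size comparison behind that M = 4 argument can be sketched like this, reusing the `Big` type from upthread. It compares sizes only and measures no performance:

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Big<const M: usize> {
    buf: [u8; M],
}

impl<const M: usize> Drop for Big<M> {
    fn drop(&mut self) {}
}

fn main() {
    // Moving Big<4> by value means copying 4 bytes.
    assert_eq!(size_of::<Big<4>>(), 4);
    // Passing a reference means copying one pointer (8 bytes on a 64-bit
    // target) and then following it to reach the data.
    assert_eq!(size_of::<&Big<4>>(), size_of::<usize>());
    println!(
        "Big<4>: {} bytes, &Big<4>: {} bytes, Big<64>: {} bytes",
        size_of::<Big<4>>(),
        size_of::<&Big<4>>(),
        size_of::<Big<64>>()
    );
}
```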

ninkendo 5 days ago | parent [-]

> I will say that programmers very often have bad instincts for when that's a bad idea

True that. memcpy is basically the literal fastest thing your processor can do, it’s trivially pipelined and can be done asynchronously.

If the alternative is heap storage you’re almost always cooked: that heap space is far less likely to be in L1 cache, allocating it takes time and requires walking a free list, dealing with memory fragmentation, freeing it when dropped, etc.

It’s not a bad short-hand to think of the heap as being 10-100x slower than the stack.

Ygg2 5 days ago | parent | prev [-]

Can't run Godbolt on my phone for some reason, but in this case I expect the compiler to ignore wrapper types and just pass the Vec around.

If you have

    Vec<i32>

    // newtype struct 
    struct Data{ data: Vec<i32> }

    // newtype enum in rust
    // Possibly but not 100% sure 
    // enum OneVar { Data(Vec<i32>) }
From my experiments with the newtype pattern, operations implemented on the data and on the newtype struct yielded the same assembly. To be fair, in my case it wasn't a Vec but a [u8; 64] and a u32.
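A rough sketch of the layout side of that claim, assuming two hypothetical wrappers `Block` and `Id`. It checks size and alignment only and does not compare generated assembly, which would need something like Godbolt:

```rust
use std::mem::{align_of, size_of};

// Hypothetical newtype wrappers around the two payloads mentioned above.
#[allow(dead_code)]
struct Block([u8; 64]);
struct Id(u32);

// The same operation on the bare value and on the wrapper.
fn next_raw(x: u32) -> u32 {
    x.wrapping_add(1)
}

fn next_id(x: Id) -> Id {
    Id(x.0.wrapping_add(1))
}

fn main() {
    assert_eq!(size_of::<Block>(), size_of::<[u8; 64]>());
    assert_eq!(align_of::<Block>(), align_of::<[u8; 64]>());
    assert_eq!(size_of::<Id>(), size_of::<u32>());
    assert_eq!(align_of::<Id>(), align_of::<u32>());
    assert_eq!(next_id(Id(41)).0, next_raw(41));
}
```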
tialaramex 5 days ago | parent [-]

The compiler isn't ignoring your new types, as you'll see if you try to pass a OneVar when the function takes a Vec, but yes, Rust really likes new types whose representation is identical yet whose type is different.

My favourite as a Unix person is Option<OwnedFd>. In a way Option<OwnedFd> is the same as the classic C int file descriptor. It has the exact same representation, 32 bits of aligned integer. But Rust's type system means we know None isn't a file descriptor, whereas it's too easy for the C programmer to forget that -1 isn't a valid file descriptor. Likewise the Rust programmer can't mistakenly do arithmetic on file descriptors: if we intend to count up some file descriptors but instead sum them, in C that compiles and isn't what you wanted; in Rust it won't compile.
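A sketch of the representation claim, assuming a Unix target and a toolchain that provides `std::os::fd` (the helper `fd_sizes` is a hypothetical name):

```rust
#[cfg(unix)]
use std::os::fd::{OwnedFd, RawFd};

// Hypothetical helper: sizes of Option<OwnedFd> and of the raw C-style fd.
#[cfg(unix)]
fn fd_sizes() -> (usize, usize) {
    (
        std::mem::size_of::<Option<OwnedFd>>(),
        std::mem::size_of::<RawFd>(),
    )
}

#[cfg(unix)]
fn main() {
    let (option_owned, raw) = fd_sizes();
    // None takes the place of -1, so no extra tag is needed.
    assert_eq!(option_owned, raw);
    // let a: OwnedFd = todo!();
    // let b: OwnedFd = todo!();
    // let sum = a + b; // does not compile: OwnedFd has no `+`
    println!("Option<OwnedFd>: {option_owned} bytes, RawFd: {raw} bytes");
}

#[cfg(not(unix))]
fn main() {}
```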

Ygg2 5 days ago | parent [-]

> The compiler isn't ignoring your new types

True, I didn't mean to imply you can just ignore types; I meant to say that the equivalent operations on a naked vs wrapped value return equivalent assembly.

It's one of those zero cost abstractions. You can write your newtype wrapper and it will be just as if you wrote the implementations by hand.

> My favourite as a Unix person is Option<OwnedFd>.

Yeah, but that's a bit different. The compiler won't treat any Option<T> that way out of the box. You need a NonZero type or a nightly feature to get that[1].

That relies on compiler "knowing" there are some values that will never be used.

[1] https://www.0xatticus.com/posts/understanding_rust_niche/

tialaramex 5 days ago | parent [-]

You can't make your own types with niches (in stable Rust, yet, though I am trying to change that and I think there's a chance we'll make that happen some day) except for enumerations.

So if you make an enumeration AlertLevel with values Ominous, Creepy, Terrifying, OMFuckingGoose then Option<AlertLevel> is a single byte, Rust will assign a bit pattern for AlertLevel::Ominous and AlertLevel::Creepy and so on, but the None just gets one of the bit patterns which wasn't used for a value of AlertLevel.
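A small sketch of the AlertLevel case. The sizes below are what current rustc produces for a fieldless enum with default layout; the comment above describes behaviour rather than quoting a language guarantee for arbitrary enums.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum AlertLevel {
    Ominous,
    Creepy,
    Terrifying,
    OMFuckingGoose,
}

fn main() {
    // Four variants fit in one byte, leaving 252 unused bit patterns.
    assert_eq!(size_of::<AlertLevel>(), 1);
    // None takes one of the unused bit patterns instead of a separate tag.
    assert_eq!(size_of::<Option<AlertLevel>>(), 1);
    println!(
        "AlertLevel: {} byte, Option<AlertLevel>: {} byte",
        size_of::<AlertLevel>(),
        size_of::<Option<AlertLevel>>()
    );
}
```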

It is a bit trickier to have Color { Red, Green, Blue, Yellow } and Breed { Spaniel, Labrador, Poodle } and make a type DogOrHat where DogOrHat::Dog has a Breed but DogOrHat::Hat has a Color and yet the DogOrHat fits in a single byte. This is because Rust won't (by default) avoid clashes, so if it assigned Color::Red bit pattern 0x01 and Breed::Spaniel bit pattern 0x01 as well, it won't be able to disambiguate without a separate dog-or-hat tag; however, we can arrange that the bit patterns don't overlap and then it works. [This is not guaranteed by Rust, unlike the Option<OwnedFd> niche, which is guaranteed by the language]

Ygg2 4 days ago | parent [-]

> You can't make your own types with niches in stable Rust

You can, provided they are wrappers around NonZero types. See https://docs.rs/nonmax/latest/nonmax/

Hence my earlier comment about NonZero types or Rust nightly.
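A minimal sketch of that approach on stable Rust, assuming a made-up wrapper `Handle` around `std::num::NonZeroU32` (this is not the nonmax crate's API, just the underlying idea):

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Hypothetical wrapper type; it inherits the zero niche of NonZeroU32.
struct Handle(NonZeroU32);

impl Handle {
    // Returns None for 0, which is the bit pattern reserved as the niche.
    fn new(raw: u32) -> Option<Handle> {
        NonZeroU32::new(raw).map(Handle)
    }

    fn get(&self) -> u32 {
        self.0.get()
    }
}

fn main() {
    // Option<Handle> needs no extra tag: None is stored as 0.
    assert_eq!(size_of::<Option<Handle>>(), size_of::<u32>());
    assert!(Handle::new(0).is_none());
    assert_eq!(Handle::new(5).map(|h| h.get()), Some(5));
}
```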