Remix.run Logo
TheDong 5 days ago

Even if you use channels to send things between goroutines, go makes it very hard to do so safely because it doesn't have the idea of sendable types, ownership, read-only references, and so on.

For example, is the following program safe, or does it race?

    func processData(lines <-chan []byte) {
      for line := range lines {
        fmt.Printf("processing line: %v\n", line)
      }
    }

    func main() {
      lines := make(chan []byte)
      go processData(lines)

      var buf bytes.Buffer
      for range 3 {
        buf.WriteString("mock data, assume this got read into the buffer from a file or something")
        lines <- buf.Bytes()
        buf.Reset()
      }
    }
The answer is of course that it's a data race. Why?

Because `buf.Bytes()` returns the underlying memory, and then `Reset` lets you re-use the same backing memory, and so "processData" and "main" are both writing to the same data at the same time.

In rust, this would not compile because it is two mutable references to the same data, you'd either have to send ownership across the channel, or send a copy.

In go, it's confusing. If you use `bytes.Buffer.ReadBytes("\n")` you get a copy back, so you can send it. Same for `bytes.Buffer.String()`.

But if you use `bytes.Buffer.Bytes()` you get something you can't pass across a channel safely, unless you also never use that bytes.Buffer again.

Channels in rust solve this problem because rust understands "sending" and ownership. Go does not have those things, and so they just give you a new tool to shoot yourself in the foot that is slower than mutexes, and based on my experience with new gophers, also more difficult to use correctly.

Mawr 4 days ago | parent | next [-]

> In go, it's confusing. If you use `bytes.Buffer.ReadBytes("\n")` you get a copy back, so you can send it. Same for `bytes.Buffer.String()`.

>

> But if you use `bytes.Buffer.Bytes()`

If you're experienced, it's pretty obvious that a `bytes.Buffer` will simply return its underlying storage if you call `.Bytes()` on it, but will have to allocate and return a new object if you call say `.String()` on it.

> unless you also never use that bytes.Buffer again.

I'm afraid that's concurrency 101. It's exactly the same in Go as in any language before it, you must make sure to define object lifetimes once you start passing them around in concurrent fashion.

Channels are nice in that they model certain common concurrency patterns really well - pipelines of processing. You don't have to annotate everything with mutexes and you get backpressure for free. But they are not supposed to be the final solution to all things concurrency and they certainly aren't supposed to make data races impossible.

> Even if you use channels to send things between goroutines, go makes it very hard to do so safely

Really? Because it seems really easy to me. The consumer of the channel needs some data to operate on? Ok, is it only for reading? Then send a copy. For writing too? No problem, send a reference and never touch that reference on our side of the fence again until the consumer is done executing.

Seems about as hard to understand to me as the reason why my friend is upset when I ate the cake I gave to him as a gift. I gave it to him and subsequently treated it as my own!

Such issues only arise if you try to apply concurrency to a problem willy-nilly, without rethinking your data model to fit into a concurrent context.

Now, would the Rust approach be better here? Sure, but not if that means using Rust ;) Rust's fancy concurrency guarantees come with the whole package that is Rust, which as a language is usually wildly inappropriate for the problem at hand. But if I could opt into Rust-like protections for specific Go data structures, that'd be great.

hu3 5 days ago | parent | prev [-]

That code would never pass a human pull request review. It doesn't even pass AI code review with a simple "review this code" prompt: https://chatgpt.com/share/68829f14-c004-8001-ac20-4dc1796c76...

"2. Shared buffer causes race/data reuse You're writing to buf, getting buf.Bytes(), and sending it to the channel. But buf.Bytes() returns a slice backed by the same memory, which you then Reset(). This causes line in processData to read the reset or reused buffer."

I mean, you're basically passing a pointer to another thread to processData() and then promptly trying to do stuff with the same pointer.

nemothekid 5 days ago | parent | next [-]

If you are familiar with the internals of bytes/buffer you would catch this. But it would be great for the compiler to catch this instead of a human reviewer. In Rust, this code wouldn't even compile. And I'd argue even in C++, this mistake would be clearer to see in just the code.

TheDong 5 days ago | parent | prev [-]

> I mean, you're basically passing a pointer to another thread to processData()

And yet, "bytes.Buffer.ReadBytes(delim)" returns a copy of the underlying data which would be safe in this context.

The type system does not make it obvious when this is safe or not, and passing pointers you own across channels is fine and common.

> That code would never pass a human pull request review

Yes, that was a simplified example that a human or AI could spot.

When you actually see this in the wild, it's not a minimal example, it's a small bug in hundreds of lines of code.

I've seen this often enough that it obviously does actually happen, and does pass human code review.