johnisgood 3 days ago

I keep hearing that Go's C FFI is slow, why is that? How much slower is it in comparison to other languages?

pornel 3 days ago | parent | next [-]

Go's goroutines aren't plain C threads (blocking syscalls are magically made async), and Go's stack isn't a normal C stack (it's tiny and grown dynamically).

A C function won't know how to behave in Go's runtime environment, so to call a C function Go needs to make itself look more like a C program, call the C function, and then restore its magic state.

Other languages like C++, Rust, and Swift are similar enough to C that they can just call C functions directly. CPython is a C program, so it can too. Golang was brave enough to do fundamental things its own way, which isn't quite C-compatible.

9rx 2 days ago | parent | next [-]

> CPython is a C program

Go (gc) was also a C program originally. It still had the same overhead back then as it does now. The implementation language is immaterial. How things are implemented is what is significant. Go (tinygo), being a different implementation, can call C functions as fast as C can.

> ...so it can too.

In my experience, the C FFI overhead in CPython is significantly higher than Go (gc). How are you managing to avoid it?

pornel a day ago | parent | next [-]

I think in the case of CPython it's just Python being slow to do anything. There are costs from the interpreter, the GIL, and the conversion between Python's objects and a low-level data representation, but the FFI boundary itself is just a trivial function call.

9rx a day ago | parent [-]

> but the FFI boundary itself is just a trivial function call.

Which is no different from Go, or any other language under the sun. There is no way to call a C function other than trivially, as you put it. The overhead in both Python and Go is in doing all the things you have to do in order to get to that point.

A small handful of languages/implementations are designed to be like C so that they don't have to do all that preparation in order to call a C function. The earlier comment included CPython among them. But the question was how that is being pulled off, as that isn't the default. By default, CPython carries tremendous overhead to call a C function, way more than Go.

johnisgood 2 days ago | parent | prev [-]

I would like to know this, too.

hinkley 2 days ago | parent | prev | next [-]

I wonder if they should be using something like libuv to handle this. Instead of flipping state back and forth, create a playground for the C code that looks more like what it expects.

johnisgood 3 days ago | parent | prev [-]

What about languages like Java, or other popular languages with GC?

lmm 2 days ago | parent | next [-]

Java FFI is slow and cumbersome, even more so if you're using the fancy auto-async from recent versions. The JVM community has mostly bitten the bullet and rewritten the entire world in Java rather than using native libraries; you only see JNI calls for niche things like high-performance linear algebra. IMO that was the right tradeoff, but it's also often seen as, e.g., the reason why Java GUIs on the desktop suck.

Other languages generally fall into one of two camps: either they have a C-like stack and thread model and easy FFI (e.g. Ruby, Tcl, OCaml), perhaps with futures/async but not in an invisible/magic way; or they have a radically different threading model at the cost of FFI being slow and painful (e.g. Erlang). JavaScript is kind of special in having a C-like stack but being built around calling async functions from a global event loop, so it's technically the first but feels more like the second.

hinkley 2 days ago | parent [-]

JNI is the second or maybe third FFI for Java. JRI existed before it, and it was worse, including in performance. The debugging and instrumentation interfaces have been rewritten even more times.

https://docs.oracle.com/en/java/javase/24/docs/specs/jni/int... mentions JRI.

But it seems like JNI has been replaced by third party solutions multiple times as well.

https://developer.okta.com/blog/2022/04/08/state-of-ffi-java...

pjc50 2 days ago | parent | prev | next [-]

C# does marshal/unmarshal for you, with a certain amount of GC-pinning required for structures while the function is executing. It's pretty convenient, although not frictionless, and I wouldn't like to say how fast it is.

andrewflnr 2 days ago | parent | prev | next [-]

Similar enough to C I guess, at least in their stack layout.

fnord123 2 days ago | parent | prev [-]

It's explained in the article.

3836293648 2 days ago | parent | prev | next [-]

Go's threading model involves a lot of tiny (but growable) stacks, and calling C functions on them would almost immediately overflow the stack.

Calling C safely is then slow because you have to allocate a larger stack, copy data around and mess with the GC.

9rx 2 days ago | parent | prev | next [-]

> How much slower is it in comparison to other languages?

It's about the same as most other languages that aren't specifically optimized for C calling. Considerably faster than Python.

Which is funny, as everyone on HN loves to extol the virtues of Python being a "C DSL" and never thinks twice about its overhead, but as soon as the word Go is mentioned it's like your computer is going to catch fire if you even try.

Emotion-driven development is a bizarre world.

johnisgood a day ago | parent [-]

Yeah, that is why I am asking.

malkia 3 days ago | parent | prev [-]

I asked ChatGPT to summarize (granted, my prompt might not be ideal). Some points to note; the first is below in detail, the others are in the link at the bottom:

     Calling C from Go (or vice versa) often requires switching from Go's lightweight goroutine model to a full OS thread model because:
       - Go's scheduler manages goroutines on M:N threads, but C doesn't cooperate with Go's scheduler.
       - If C code blocks (e.g., on I/O or a mutex), Go must assume the worst, park the thread, and spawn another to keep Go alive.
     Cost: this means entering/exiting cgo is significantly more expensive than a normal Go call; there's a syscall-like overhead.

... This was only the first issue; it then follows with "the Go runtime can't see inside C to know whether it is allocating, blocking, spinning, etc.", then "Stack switching", "Thread Affinity and TLS", "Debug/Profiling support overhead", "Memory Ownership and GC barriers".

All here - https://chatgpt.com/share/688172c3-9fa4-800a-9b8f-e1252b57d0...

johnisgood 3 days ago | parent [-]

Just to roll with your way: https://chatgpt.com/share/688177c9-ebc0-8011-88cc-9514d8e167...

Please do not take the numbers below at face value. I still expect an actual reply to my initial comment.

Per-call overhead:

  C (baseline)       - ~30 ns
  Rust (unsafe)      - ~30 ns
  C# (P/Invoke)      - ~30-50 ns
  LuaJIT             - ~30-50 ns
  Go (cgo)           - ~40-60 ns
  Java (22, FFM)     - ~40-70 ns
  Java (JNI)         - ~300-1000 ns
  Perl (XS)          - ~500-1000 ns
  Common Lisp (SBCL) - ~500-1500 ns
  Python (ctypes)    - ~10,000-30,000 ns

Seems like Go is still fast enough compared to other programming languages with GC, so I am not sure it is fair to single out Go.
throwaway7783 3 days ago | parent | next [-]

Java now has FFM, which is far better and simpler than JNI, FWIW. And ChatGPT says:

Language/API | Call Overhead (no-op C) | Notes
Go (cgo) | ~40–60 ns | Stack switch + thread pinning
Java FFM | ~50 ns (downcall) | Similar to JNI, can be ~30 ns with isTrivial()
Java FFM (leaf) | ~30–40 ns | Optimized (isTrivial=true)
JNI | ~50–60 ns | Slightly slower than FFM
Rust (unsafe) | ~5–20 ns | Near-zero overhead
C# (P/Invoke) | ~20–50 ns | Depends on marshaling
Python (cffi) | ~1,000–10,000 ns | Orders of magnitude slower

billywhizz 2 days ago | parent | next [-]

i can't see how these numbers can be anywhere near correct (nor the ones above). in JavaScript on an old Core i5 the overhead of a simple ffi call is on the order of 5 nanoseconds. on a recent x64/arm64 cpu it's more like 2 nanoseconds.

you can verify this easily with Deno ffi which is pretty much optimal for JS runtimes. also, from everything i have seen and read, luajit should be even lower overhead than this.

you really shouldn't be asking chatgpt questions like this imo. these are facts that need to be proven, not just vibes.

throwaway7783 a day ago | parent [-]

I agree. I was just following the parent's pattern, to make it work for me :)

johnisgood 3 days ago | parent | prev [-]

Thanks, I added it to the list. Keep in mind that the numbers may be off (both yours and mine), so I would not take them at face value. It is interesting how in yours JNI is still pretty good. Also Rust is "~5–20 ns" in yours, so I assume "0" is the baseline.

throwaway7783 a day ago | parent [-]

This is ChatGPT, not my own benchmark, so it is probably hallucinating.

Cyph0n 3 days ago | parent | prev | next [-]

> Rust (unsafe)

As if there is an alternative :)

More seriously, it’s “unsafe” from the perspective of the library calling into C, but usually “safe” for any layer above.

johnisgood 3 days ago | parent [-]

Hey, since I am in a thread where we are sharing what ChatGPT spits out, I just copy pasted it from there, too. :)

For what it is worth, I asked about LuaJIT after I shared the link, and the numbers are now different for some languages, albeit not by much. Go (cgo) became ~50-100 ns. That said, I still believe it is unfair to single out Go when it does way better than some other GC languages.

malkia 3 days ago | parent | prev [-]

oh wow I got downvoted a lot - I guess I'm bad at prompting :)

Sesse__ 2 days ago | parent | next [-]

You are being downvoted because pasting AI output with no attempt at fact-checking is not bringing any real value to the discussion.

johnisgood 3 days ago | parent | prev | next [-]

That is not it. Everyone who copy-pastes output from an LLM gets downvoted (and likely flagged). Even though I simply went with your method, I got downvoted too, when downvoting the parent comment (yours) would have sufficed. Oh well.

malkia 2 days ago | parent [-]

Noted! Well maybe it makes sense! Thanks for the info!!!

johnisgood 2 days ago | parent [-]

I even got down-voted for telling you the truth. Lmao.

"ants_everywhere" is right.

ants_everywhere 2 days ago | parent | prev [-]

Don't take it too personally.

The anti-LLM activists that patrol HN will often downvote comments just for stating facts that disagree with their hatred of LLMs.