| ▲ | PaulHoule 2 days ago |
| This kinda stuff brings languages like C# and Java closer to Rust in performance: much like Rust's "borrow checker", the JIT understands the scope of some objects, puts them on the stack, and avoids garbage collection and allocation overhead. It keeps the unsung benefit of garbage collection for "programming in the large", in which memory allocation is treated as a global concern independent of everything else instead of a concern that has to be managed locally in every line of code. Rust's strategy is problematic for code reuse, just as C/C++'s strategy is. Without garbage collection a library has to know how it fits into the memory allocation strategy of the application as a whole. In general a library doesn't know if the application still needs a buffer and the application doesn't know if the library needs it, but... the garbage collector does. Sure, you can "RC all the things", but then you might as well have a garbage collector. In the Java world we are still waiting for https://openjdk.org/projects/valhalla/ |
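The optimization in question can be sketched in plain Java. In a minimal example like this (class and method names are hypothetical), HotSpot's escape analysis can prove the temporary object never leaves the method and scalar-replace it, so the loop may run without any heap allocation:

```java
// Point instances below never escape lengthSquared(), so HotSpot's
// escape analysis may scalar-replace them: the fields live in
// registers and no heap allocation (or later GC work) happens.
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static int lengthSquared(int x, int y) {
        Point p = new Point(x, y); // candidate for scalar replacement
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += lengthSquared(i, i + 1);
        }
        System.out.println(sum);
    }
}
```

Whether the allocation is actually elided depends on inlining and other JIT decisions; one way to observe the effect is to rerun with the real HotSpot flag -XX:-DoEscapeAnalysis and compare GC activity.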
|
| ▲ | gpderetta 2 days ago | parent | next [-] |
| GC is problematic for cross-language foundational libraries though (unless they run on the same VM of course). |
| |
| ▲ | gwbas1c 2 days ago | parent | next [-] | | Makes me wonder how easy/hard it is to write a library in Rust and expose it to other languages. | | |
| ▲ | tialaramex 2 days ago | parent | next [-] | | One extreme case of this would be rustls-openssl-compat which is basically what if you take a Rust TLS implementation but you give it the same ABI as the openssl C library so (where this works, which is far from everywhere) it's a drop-in replacement. e.g. you can run curl, and go fetch https://example.com/ but using this ersatz openssl rather than your C openssl implementation. That's a thing which works today, albeit not a supported configuration from the point of view of Curl's author. | |
| ▲ | steveklabnik 2 days ago | parent | prev [-] | | It can be viewed as either: you expose a C ABI and then the other languages can use it, which is fundamentally not super hard but is fiddly (so it could be viewed as hard), and some languages have nice libraries that exist to make it easier, which is of course easier. |
| |
| ▲ | PaulHoule 2 days ago | parent | prev [-] | | But what’s so bad about that? Clojure and Java make a great team. | | |
| ▲ | Grikbdl 2 days ago | parent | next [-] | | Yeah but now try to make your Java library useful to a C#, Go or Python application. | | |
| ▲ | PaulHoule 2 days ago | parent [-] | | In the case of Python I think you could produce something like Jython that runs inside the Java runtime and lines up https://openjdk.org/jeps/454 with the Python FFI so you could run things like numpy inside of it. Would be transformative for my side projects. |
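For reference, JEP 454 is the foreign function & memory API that shipped in Java 22 (java.lang.foreign). A minimal sketch of the direction that already works today - Java calling into native code - looks like this, here invoking libc's strlen (assumes JDK 22+ with native access enabled):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {
    public static void main(String[] args) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // Look up strlen in the default lookup (libc on most platforms)
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        // A confined arena frees all its native allocations on close
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom("Hello from Java");
            long len = (long) strlen.invoke(cString);
            System.out.println(len);
        }
    }
}
```

Lining this machinery up with Python's C API, as suggested above, would be the hard part; the downcall side shown here is the piece that exists today.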
| |
| ▲ | gpderetta 2 days ago | parent | prev [-] | | Nothing, hence my "unless they run on the same VM" comment. |
|
|
|
| ▲ | SkiFire13 2 days ago | parent | prev | next [-] |
| > thinking like the "borrow checker" it understands the scope of some objects and puts them on the stack and avoids garbage collection and allocation overhead. On the other hand, if you don't write code that the borrow checker would accept, you likely won't get these optimizations. And even if it were accepted, there's a chance the analysis required is too deep or complex for the escape analysis to work. Ultimately this is a nice speed-up in practice but not something I would rely on. |
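To make that fragility concrete, here is a hedged illustration (hypothetical names): the same allocation is eligible for scalar replacement in one method and forced onto the heap in another, simply because a store to a field lets it escape:

```java
public class EscapeLimits {
    static final class Box {
        int v;
        Box(int v) { this.v = v; }
    }

    static Box last; // any store here lets a Box escape its method

    static int noEscape(int v) {
        Box b = new Box(v); // never leaves the method: may be scalar-replaced
        return b.v + 1;
    }

    static int escapes(int v) {
        Box b = new Box(v);
        last = b;           // escapes via a static field: stays a heap allocation
        return b.v + 1;
    }

    public static void main(String[] args) {
        System.out.println(noEscape(41));
        System.out.println(escapes(41));
    }
}
```

The two methods compute the same result; only the reachability of the object differs, which is exactly the kind of property the borrow checker forces you to reason about explicitly.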
| |
| ▲ | PaulHoule 2 days ago | parent [-] | | The counter to that is that performance really matters in inner loops, and in those cases the hot area is not that big; it doesn’t matter if your setup and tear down give up a lot of possible performance. Early this year my son was playing chess which motivated me to write a chess program. I wrote one in Python really quickly that could challenge him. I was thinking of writing one I could bring to the chess club, which would have had to respect time controls, and with threads in Java this would have been a lot easier. I was able to get the inner loop to generate very little garbage in terms of move generation and search and evaluation with code that was only slightly stilted. To get decent play though I would have needed transposition tables, which would have been horribly slow using normal Java data structures, but it could have been done off heap with something that would have looked nice on the outside but done it the way C would have done it on the inside. I gave up because my son gave up on chess and started building and playing guitars in all his spare time. Chess is a nice case of specialized programming where speed matters and it is branchy and not numeric. | | |
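The off-heap-flavoured transposition table described above can be approximated with flat primitive arrays - nice on the outside, C-like on the inside. A hypothetical minimal sketch (the entry layout and packing are invented for illustration):

```java
// Transposition table with no per-entry objects: parallel long[] arrays
// indexed by Zobrist hash, the way a C program would lay it out.
public class TranspositionTable {
    private final long[] keys;
    private final long[] data; // packed entry: score in low 32 bits, depth next 16
    private final int mask;

    TranspositionTable(int sizePow2) {
        keys = new long[sizePow2];
        data = new long[sizePow2];
        mask = sizePow2 - 1;
    }

    void put(long zobrist, int score, int depth) {
        int i = (int) (zobrist & mask);
        keys[i] = zobrist;
        // pack score (low 32 bits) and depth (next 16 bits) into one long
        data[i] = (score & 0xFFFFFFFFL) | ((long) depth << 32);
    }

    /** Returns the packed entry, or 0 if there is no usable entry. */
    long probe(long zobrist, int minDepth) {
        int i = (int) (zobrist & mask);
        if (keys[i] != zobrist) return 0;
        long entry = data[i];
        int depth = (int) ((entry >>> 32) & 0xFFFF);
        return depth >= minDepth ? entry : 0;
    }

    static int score(long entry) { return (int) entry; } // sign-extends low 32 bits
}
```

Everything hot stays in two flat arrays, so probes are a hash, a mask, and two array reads - no boxing, no object headers, and nothing for the collector to trace.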
| ▲ | ofrzeta 2 days ago | parent [-] | | I guess you could now try some realtime audio processing. Surprisingly (to me) there are actual existing realtime DSP packages for Java. | | |
| ▲ | PaulHoule 2 days ago | parent [-] | | Yeah, audio DSP is not challenging which is what killed off the high-end soundcard. So far my collab with him has been over physics and electronics such as scaling for the prototype electric octobase (one string so far) that he made. His current stretch project is: he cut a MIDI keyboard in half and stacked the segments on top of each other like an organ keyboard and he's attaching that to two electric guitar necks. Probably the MIDI is going to go to a rack-mount MIDI synth so it's going to have one hell of a wire bundle coming out of it. Personally I am really fascinated with https://en.wikipedia.org/wiki/Guitar_synthesizer which can take the sound from an electric guitar or bass and turn it into MIDI events or the equivalent that controls a synthesizer. Most commercial versions have six independent pickups, they used to be connected to the brains with a ribbon cable but some of them now digitize on the guitar and send the data to the controller over a serial link https://www.boss.info/us/categories/guitar_synthesizers/seri... |
|
|
|
|
| ▲ | rudedogg 2 days ago | parent | prev | next [-] |
| This is wishful thinking. It’s the same as other layers we have like auto-vectorization, where you don’t know if it’s working without performance analysis. The complexity compounds and reasoning about performance gets harder because the interactions get more complex with abstractions like these. Also, the more I work with this stuff the more I think trying to avoid memory management is foolish. You end up having to think about it, even at the highest of levels like a React app. It takes some experience, but I’d rather just manage the memory myself and confront the issue from the start. It’s slower at first, but leads to better designs. And it’s simpler; you just have to do more work upfront. Edit: > Rust's strategy is problematic for code reuse just as C/C++'s strategy is problematic. Without garbage collection a library has to know how it fits into the memory allocation strategies of the application as a whole. In general a library doesn't know if the application still needs a buffer and the application doesn't know if the library needs it, but... the garbage collector does. Should have noted that Zig solves this by making the convention be to pass an allocator into any function that allocates. So the boundaries/responsibilities become very clear. |
| |
| ▲ | bob1029 2 days ago | parent | next [-] | | Use of a GC does not imply we are trying to avoid memory management or no longer have a say in how memory is utilized. Getting sweaty chasing around esoteric memory management strategies leads to poor designs, not good ones. | | |
| ▲ | rudedogg 2 days ago | parent [-] | | > Getting sweaty chasing around esoteric memory management strategies I’m advocating learning about and understanding a couple of different allocation strategies, and simplifying everything by doing away with the GC and minimizing the abstractions you need. My guess is this stuff used to be harder, but it’s now much easier with the languages and knowledge we have available. Even for application development. See https://www.rfleury.com/p/untangling-lifetimes-the-arena-all... | |
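The arena pattern from the linked article is easy to sketch in any language; a hypothetical minimal bump allocator (handing out offsets into one flat buffer rather than real pointers) shows why both allocation and wholesale freeing become trivial:

```java
// Minimal bump-allocator arena over one flat buffer (illustrative sketch):
// each allocation is a pointer bump, and the whole arena is freed at once
// by resetting the offset -- no per-object lifetime tracking at all.
public class BumpArena {
    private final byte[] buf;
    private int offset;

    BumpArena(int capacity) { buf = new byte[capacity]; }

    /** Returns the start offset of a freshly "allocated" block. */
    int alloc(int size) {
        if (offset + size > buf.length) {
            throw new OutOfMemoryError("arena full");
        }
        int start = offset;
        offset += size;
        return start;
    }

    /** Frees every allocation made so far, in O(1). */
    void reset() { offset = 0; }

    int used() { return offset; }
}
```

The simplicity is the point: lifetimes are tied to the arena, not to individual objects, which is exactly where the pattern shines - and, per the reply below, where it stops helping once object lifetimes no longer nest cleanly.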
| ▲ | pron 2 days ago | parent | next [-] | | Arenas are fantastic when they work; when they don't, you're in a place that's neither simple nor particularly efficient. Generational tracing garbage collectors automatically work in a manner similar to arenas (sometimes worse; sometimes better) in the young-gen, but they also automatically promote the non-arena-friendly objects to the old-gen. Modern GCs - which are constantly evolving at a pretty fast pace - use algorithms that represent a lot of expertise gathered in the memory management space, and that's hard to beat unless arenas fully solve your needs. | |
| ▲ | PaulHoule 2 days ago | parent | prev [-] | | For a lot of everyday programming arenas are "all you need". |
|
| |
| ▲ | pron 2 days ago | parent | prev | next [-] | | > It’s the same as other layers we have like auto-vectorization where you don’t know if it’s working without performance analysis. The complexity compounds and reasoning about performance gets harder because the interactions get more complex with abstractions like these. Reasoning about performance is hard as it is, given nondeterministic optimisations by the CPU. Furthermore, a program that's optimal for one implementation of an Aarch64 architecture can be far from optimal for a different implementation of the same architecture. Because of that, reasoning deeply about micro-optimisations can be counterproductive, as your analysis today could be outdated tomorrow (or on a different vendor's chip). Full low-level control is helpful when you have full knowledge of the exact environment, including hardware details, and may be harmful otherwise. What is meant by "performance" is also subjective. Improving average performance and improving worst-case performance are not the same thing. Also, improving the performance of the most efficient program possible and improving the performance of the program you are likely to write given your budget aren't the same thing. For example, it may be the case that using a low-level language would yield a faster program given virtually unlimited resources, yet a higher-level language with less deterministic optimisation would yield a faster program if you have a more limited budget. Put another way, it may be cheaper to get to 100% of the maximal possible performance in language A, but cheaper to get to 97% with language B. If you don't need more than 97%, language B is the "faster language" from your perspective, as the programs you can actually afford to write will be faster. > Also, the more I work with this stuff the more I think trying to avoid memory management is foolish. 
It's not about avoiding thinking about memory management but about finding good memory management algorithms for your target definition of "good". Tracing garbage collectors offer a set of very attractive algorithms that aren't always easy to match (when it comes to throughput, at least, and in some situations even latency) and offer a knob that allows you to trade footprint for speed. More manual memory management, as well as refcounting collectors, often tends to miss the sweet spot, as it has a tendency to optimise for footprint over throughput. See this great talk about the RAM/CPU tradeoff - https://youtu.be/mLNFVNXbw7I from this year's ISMM (International Symposium on Memory Management); it focuses on tracing collectors, but the point applies to all memory management solutions. > Should have noted that Zig solves this by making the convention be to pass an allocator in to any function that allocates. So the boundaries/responsibilities become very clear. Yes, and arenas may give such usage patterns a similar CPU/RAM knob to tracing collectors, but this level of control isn't free. In the end you have to ask yourself if what you're gaining is worth the added effort. | | |
| ▲ | rudedogg 2 days ago | parent [-] | | I enjoy reading your comments here. Thanks for sharing your knowledge, I'll watch the talk. > Yes, and arenas may give such usage patterns a similar CPU/RAM knob to tracing collectors, but this level of control isn't free. In the end you have to ask yourself if what you're gaining is worth the added effort. For me using them has been very easy/convenient. My earlier attempts with Zig used alloc/defer free everywhere and it required a lot of thought to not make mistakes. But on my latest project I'm using arenas and it's much more straightforward. | | |
| ▲ | pron a day ago | parent [-] | | Sure, using arenas is very often straightforward, but it also very often isn't. For example, say you have a server. It's very natural to have an arena for the duration of some request. But then things could get complicated. Say that in the course of handling the transaction, you need to make multiple outgoing calls to services. They have to be concurrent to keep latency reasonable. Now arenas start posing some challenges. You could use async/coroutine IO to keep everything on the same thread, but that imposes some limitations on what you can do. If you use multiple threads, then either you need to synchronise the arena (which is no longer as efficient) or use "cactus stacks" of arenas and figure out a way to communicate values from the "child" tasks to the parent one, which isn't always simple (and may not even be super efficient). In lots of common cases, arenas work great; in lots of common cases they don't. There are also other advantages unrelated to memory management. In this talk by Andrew Kelley (https://youtu.be/f30PceqQWko) he shows how Zig, despite its truly spectacular partial evaluation, still runs into an abstraction/performance tradeoff (when he talks about what should go "above" or "below" the vtable). When you have a really good JIT, as Java does, this tradeoff is gone (instead, you trade off warmup time) as the "runtime knowns" are known at compile time (since compilation is done at runtime). | | |
| ▲ | chpill 8 hours ago | parent [-] | | > When you have a really good JIT, as Java does, this tradeoff is gone
Is there a way to visualize the machine code generated by the JVM when optimizing the same kind of code as the examples shown in the talk you mention? I tried putting the following into godbolt.org, but I'm not sure I'm doing it right:

public class DontForgetToFlush {
public static void example(java.io.BufferedWriter w) throws java.io.IOException {
w.write("a");
w.write("b");
w.write("c");
w.write("d");
w.write("e");
w.write("f");
w.write("g");
w.flush();
}
public static void main(String... args) throws java.io.IOException {
var os = new java.io.OutputStreamWriter(System.out);
var writer = new java.io.BufferedWriter(os, 100);
example(writer);
}
}
|
|
|
| |
| ▲ | rkagerer 2 days ago | parent | prev [-] | | Convention (as you report Zig does) seems to be a sensible way to deal with the problem. > Also, the more I work with this stuff the more I think trying to avoid memory management is foolish ... It takes some experience, but I’d rather just manage the memory myself and confront the issue from the start. Not sure why you're getting downvoted, this is a reasonable take on the matter. |
|
|
| ▲ | nly 2 days ago | parent | prev [-] |
| Efficient memory allocation is part of a well-designed API. Languages like C++ give you a tonne of options here, from passing scratch buffers into libraries, passing in reusable containers, and move semantics, to type-erased primitives like std::pmr::memory_resource and std::shared_ptr |
| |
| ▲ | tialaramex 2 days ago | parent [-] | | Perhaps rather than "a tonne of options" people might like to have fewer that are actually good? | |
| ▲ | nly 2 days ago | parent [-] | | If you're using an unmanaged language then you need to think about memory allocation and ownership. This is something you should think about early on in your design. | |
| ▲ | tialaramex 2 days ago | parent [-] | | I do think about such things, but having "a tonne of options" when they're mostly terrible is the opposite of helpful. Let's pull out an easy one: you mention move assignment semantics. In C++ that's a performance leak because it isn't a destructive move - each such move incurs a creation whether you wanted one or not, and it may also incur a "moved-from" check in the destructor, another overhead you wouldn't pay with a destructive move. |
|
|
|