bdangubic 4 days ago

you know why you don’t see many non-Java programs on your computer taking up 10x memory? because no one uses them to write anything :)

jokes aside, we got a shift in the industry where many java programs were replaced by electron-like programs which now take 20x memory

vlovich123 4 days ago | parent | next [-]

Technically kind of true but at the same time Android apps are predominantly Java/Kotlin. It speaks more to Java just having a bad desktop story. But it’s also why Android devices need 2x the RAM.

vips7L 3 days ago | parent [-]

That has nothing to do with Java. The Android runtime is NOT Java/OpenJDK.

vlovich123 3 days ago | parent [-]

No but it does speak to the memory overhead of tracing GC vs ref counting as garbage collection strategies.

gf000 3 days ago | parent [-]

Which is very important in... embedded settings.

While for typical backend workloads, reference counting has a crazy high throughput overhead: doing atomic inc/decs left and right, which thrashes the cache, and doing it on the mutator thread that should be doing the actual work, all for the negligible benefit of using less memory. Meanwhile a tracing GC can do (almost) all of its work on another thread, without slowing down the actually important business task, and with generational GCs cleaning up is basically a no-op (just marking a region as reusable).

It's a tradeoff, like everything in IT.

Also, iPhone CPUs are always at least a generation ahead of any Android CPU. So it's not really an apples-to-apples comparison.

vlovich123 3 days ago | parent | next [-]

That would be a compelling counter if and only if languages like Java actually beat other languages in throughput. In practice that doesn’t seem to be the case and the reasons for that seem to be:

* languages like C++ and Rust simply don’t allocate as much as Java, instead using value types. Even C# is better here, with value types being more deeply integrated.

* languages like C++ and Rust do not force atomic reference counting. Rust even offers non-atomic ref counting (Rc) in the standard library. You also only need an atomic increment/decrement when ownership is being transferred to a thread, which isn’t all that common depending on the structure of your code. Even Swift doesn’t do too badly here, because the compiler can often prove it’s safe to elide reference counting altogether, and the language offers escape hatches via data types that don’t need it.

* C++, Rust, and Swift can access lower-level capabilities (e.g. SIMD and atomics) that let them get significantly higher throughput.

* Java’s memory model implies and requires the JVM to insert atomic accesses all over the place you wouldn’t expect (e.g. reading an integer field of a class is an atomic read and writing it is an atomic write). This is going to absolutely swamp any advantage of the GC. Additionally, a lot of Java code declares methods synchronized, which requires taking a “global” lock on the object; that is expensive and pessimistic for performance compared with the fine-grained access other languages offer.

* there’s lots of research into offering atomic reference counting more cheaply (called biased RC), which can transparently and safely avoid atomic operations wherever certain conditions are met.
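To illustrate the non-atomic ref counting bullet above, here is a minimal, stdlib-only Rust sketch (my own example, not from the thread): `Rc` uses plain non-atomic count updates and cannot cross threads, while `Arc` pays for atomics only where ownership actually moves to another thread.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Rc: plain (non-atomic) increments/decrements. The compiler
    // rejects sending an Rc to another thread, so this stays safe.
    let local = Rc::new(vec![1, 2, 3]);
    let local2 = Rc::clone(&local); // cheap non-atomic count bump
    assert_eq!(Rc::strong_count(&local), 2);
    drop(local2);
    assert_eq!(Rc::strong_count(&local), 1);

    // Arc: atomic counts, paid only where data actually crosses threads.
    let shared = Arc::new(vec![1, 2, 3]);
    let worker = {
        let shared = Arc::clone(&shared); // atomic count bump
        thread::spawn(move || shared.iter().sum::<i32>())
    };
    assert_eq!(worker.join().unwrap(), 6);
}
```

The type system, not the programmer, enforces which of the two counting strategies is in play at any given point.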

I’ve yet to see a Java program that actually gets higher throughput than Rust so the theoretical performance advantage you claim doesn’t appear to manifest in practice.

gf000 3 days ago | parent | next [-]

The main topic here was Swift vs Android's Java.

Of course with manual memory management you may be able to write more efficient programs, though it is not a given, and comes at the price of a more complicated and less flexible programming model. At least with Rust, it is actually memory safe, unlike C++.

- ref counting still has worse throughput than a tracing GC, even if it is single-threaded and doesn't have to use atomic instructions. This may or may not matter (I'm not claiming it's worse overall), especially when it is used very rarely, as is the case with typical C++/Rust programs.

> You also only need an atomic increment/decrement when ownership is being transferred to a thread

Java can also do on-stack replacement.. sometimes.

- regarding lower-level capabilities, Java does have an experimental Vector API for SIMD. Atomics are readily available in the language.

- Java's memory model only requires 32-bit writes to be "atomic" (though in actuality the only requirement is to not tear; there is no happens-before relation in the general case, and that's what is expensive), though in practice 64-bit writes are also atomic, and both are free on modern hardware. Field access is not different from what Rust or C++ does, AFAIK, in the general case. And `synchronized` is only used when needed; it's just syntactic convenience. This depends on the algorithm at hand: there is no difference between the same algorithm written in Rust/C++ vs Java from this perspective. If it's lockless, it will be lockless in Java as well. If it's not, then all of them will have to add a lock.
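The "no tearing, but no happens-before" distinction can be sketched with relaxed atomics (in Rust rather than Java, as an assumed analogy; this is my illustration, not from the thread):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

fn main() {
    // Relaxed accesses are guaranteed not to tear, but establish no
    // happens-before relation with other memory accesses. A relaxed
    // load or store compiles to an ordinary load/store on mainstream
    // hardware; only read-modify-write ops like fetch_add cost more.
    static COUNTER: AtomicU32 = AtomicU32::new(0);

    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1000 {
                    COUNTER.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // No updates are lost even with Relaxed, because fetch_add is an
    // atomic read-modify-write; only the *ordering* guarantee is weak.
    assert_eq!(COUNTER.load(Ordering::Relaxed), 4000);
}
```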

The point is not that manual memory can't be faster/more efficient. It's that it is not free, and comes at a non-trivial extra effort on the developer's side, which is not even a one-time thing, but applies for the lifetime of the program.

vlovich123 3 days ago | parent [-]

> ref counting still has worse throughput than a tracing GC, even if it is single-threaded and doesn't have to use atomic instructions. This may or may not matter (I'm not claiming it's worse overall), especially when it is used very rarely, as is the case with typical C++/Rust programs.

That’s a bold claim that doesn’t seem to actually be true in my experience. Your 5 GHz CPU can probably do ~20 billion non-atomic reference adjustments per second, whereas your GC system has to have atomics all over the place or it won’t work, and atomics have parasitic performance effects on unrelated code due to bus locks and the like.

> Java can also do on-stack replacement.. sometimes

That’s not what this is. It’s called biased RC and it applies always, provided you follow the rules.

> The point is not that manual memory can't be faster/more efficient. It's that it is not free, and comes at a non-trivial extra effort on the developer's side, which is not even a one-time thing, but applies for the lifetime of the program.

The argument here is not about developer productivity - the specific claim is that the Java GC lets you write higher throughput code than you would get with Rust or C++. That just isn’t true so you end up sacrificing throughput AND latency AND peak memory usage. You may not care and are fine with that tradeoff, but claiming you’re not making that tradeoff is not based on the facts.

gf000 2 days ago | parent [-]

> the specific claim is that the Java GC lets you write higher throughput code than you would get with Rust or C++

No, that has never been the specific claim: you can always write more efficient code with manual memory management, given enough time, effort and skill. I wasn't even the one who brought up C++ and Rust. I literally wrote this twice in my comment.

What I'm talking about is reference counting as a GC technique vs tracing as a GC technique, all else being equal; it would be idiotic to compare the two if no other "variable" is fixed. (Oh, and I didn't even mention the circular references problem, which means you basically have to add a tracing step either way, unless you restrict your language so that it can't express circular structures.)

As for the atomic part: sure, if all it did were non-atomic increments then CPUs would be plenty happy. And you are right that depending on how the tracing GC is implemented, it will have a few atomic instructions. What you may miss is how often each runs: on almost every access, vs every once in a while on a human timescale. Your OS scheduler will also occasionally thrash the performance of your thread. But this is the actual apples-to-oranges comparison, and both techniques can do plenty of tweaks to hide certain tradeoffs, at the price of something else.

And I also mention that the above triad of time, skill and effort is not a given and is definitely not free.

vlovich123 2 days ago | parent [-]

In Rust there’s no forcing of any specific garbage collection mechanism. You’re free to do whatever, and there are many high-performance crates to help you accomplish this. Even in Swift this is opt-in.

As for “skill”, this is one thing that’s super hard to optimize for. All I can do is point to existence proofs: there’s no mainstream operating system, browser or other piece of high-performance code written in Java; it’s all primarily C/C++ with some assembly, with Rust starting to take over the C/C++ bits. And at the point where you’re relegating Java to being “business” logic, there’s plenty of languages that are better suited for that in terms of ergonomics.

gf000 a day ago | parent [-]

Sure, but I think that people often fall into the trap of imagining a problem that nicely fits a RAII model, where each lifetime is statically knowable. This is either due to having a specific problem, or because we decided on a specific constraint.

Java is used in HFT (well, there are two tiers of "high frequency": one where general-purpose CPUs are already too slow, where Java obviously doesn't apply, but neither do Rust or C++). Sure, I wouldn't write a runtime or another piece of code in Java where absolute control over the hardware is required, but that's a small niche. What about large distributed systems/algorithms? Why is Java over-represented in that niche (e.g. Kafka, Elasticsearch, etc.)?

> And at the point where you’re relegating Java to being “business” logic, there’s plenty of languages that are better suited for that in terms of ergonomics.

That's subjective.

vips7L 3 days ago | parent | prev [-]

> Java’s memory model implies and requires the JVM to insert atomic accesses all over the place you wouldn’t expect (e.g. reading an integer field of a class is an atomic read and writing it is an atomic write).

AFAIK that doesn’t really happen. The JVM won’t insert extra atomic instructions on real hardware, because the CPU is capable of doing those accesses atomically anyway.

> Additionally, a lot of Java code declares methods synchronized which requires taking a “global” lock on the object which is expensive and pessimistic for performance as compared with the fine-grained access other languages offer.

What does this have to do with anything? Concurrency requires locks. Arc<T> is a global lock on references. “A lot” of Java objects don’t use synchronized. I’d even bet that 95-99% of them don’t.

vlovich123 3 days ago | parent [-]

> Concurrency requires locks. Arc<T> is a global lock on references

Concurrency does not require locks. There’s entire classes of lock free and wait free algorithms. Arc<T> is also not a lock - it uses atomics to manage the reference counts and no operation on an Arc needs to wait on a lock (it is a lock-free container).
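For what it's worth, a minimal sketch of that point (my own example): cloning an `Arc` is a single atomic increment, not a lock acquisition, and concurrent readers never block one another.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(vec![10, 20, 30]);
    let mut handles = Vec::new();
    for _ in 0..4 {
        // Each clone is one atomic increment; no thread ever waits on
        // a lock while reading the shared Vec.
        let data = Arc::clone(&data);
        handles.push(thread::spawn(move || data.iter().sum::<i32>()));
    }
    for h in handles {
        assert_eq!(h.join().unwrap(), 60);
    }
    // All worker clones have been dropped after join; one owner remains.
    assert_eq!(Arc::strong_count(&data), 1);
}
```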

> “A lot” of Java objects don’t use synchronized. I’d even bet that 95-99% of them don’t.

Almost all objects that are used in a concurrent context will likely feature synchronized, at least historically. That’s why Hashtable was split into HashMap (unsynchronized) and ConcurrentHashMap (no longer using a single global lock). That’s why you have StringBuffer, which was redone as StringBuilder.

vips7L 3 days ago | parent [-]

OK, I misspoke on Arc because I was being hasty; but you're still being pedantic. Concurrency still requires locks. Wait-free/lock-free algorithms can't cover the entirety of concurrency. Rust ships with plenty of locks in std::sync, and to implement a ConcurrentHashMap in Rust you would still need to lock. In fact, it doesn't even look like Rust supplies concurrent collections at all. So what are we even talking about here? This is still a far cry from "a lot of Java objects use global synchronized locks".

vlovich123 3 days ago | parent [-]

No, that’s an overly strong statement: concurrency doesn’t necessarily require locks, even though they can be a convenient way to express it. You could have channels and queues to transfer data and ownership between threads, with not a lock in sight, since queues and channels can be done lock-free. The presence of locks in the Rust standard library says only that they’re a very common concurrency tool, not that they’re absolutely required.
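A small stdlib-only sketch of the channel approach (my own example; whether the channel's internals are fully lock-free is implementation-dependent): ownership of each value moves through an `mpsc` channel, with no `Mutex` anywhere in the user's code.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        for i in 0..5 {
            // Ownership of each value moves through the channel;
            // no user-visible lock appears anywhere in this code.
            tx.send(i * i).unwrap();
        }
        // tx is dropped here, which closes the channel.
    });

    // 0 + 1 + 4 + 9 + 16
    let sum: i32 = rx.iter().sum();
    producer.join().unwrap();
    assert_eq!(sum, 30);
}
```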

> and to implement a ConcurrentHashMap in Rust you would still need to lock

There are many ways to implement concurrency-safe hashmaps (if you explicitly need such a data structure as the synchronization mechanism) without a global lock. Notably RCU is such a mechanism (a really neat one developed for the kernel, although not super user-friendly yet or common in userspace), and there are also generational garbage techniques available (conceptually similar to tracing GC but implemented just for a single data structure). A common and popular crate in Rust for this is DashMap, a concurrency-safe hashmap that avoids any single global lock.

vips7L 2 days ago | parent [-]

> A common and popular crate in Rust for this is DashMap, a concurrency-safe hashmap that avoids any single global lock.

Still not in the standard library. The only way in Rust's standard library is to use a global lock around the map. That seems to be worse than the situation in Java. You could implement the same thing or use a third-party library in Java too. So your original point, "everything uses a global lock", is "overly strong".

vlovich123 2 days ago | parent [-]

You’ve now dragged the conversation in a very weird direction. You made a claim that concurrency requires locks. It simply does not, and I have an existence proof in DashMap, a concurrency-safe hashmap with no global lock anywhere.

The strengths and weaknesses of the standard library aren’t relevant. But if we’re going there: the reason concurrent collections aren’t in the Rust standard library is that in practice concurrent data structures are something of an anti-pattern. Putting a lock around a data structure doesn’t suddenly solve higher-order race conditions, which is something a lot of Java programmers seem to believe because the standard library encourages this kind of thinking.

As for “my comment” about a “global lock” (your words, not mine): the point is that the implicit lock available on every object is a bad idea for highly concurrent code (not to mention the overhead it implies for every part of the object graph, regardless of whether it’s needed anywhere). Don’t get me wrong: Java made a valiant effort to define a solid memory model for concurrency when the field was still young. Many of the ideas didn’t pan out and are anti-patterns these days for high-performing code.

Of course, none of that pertains to the original point of the conversation: tracing GCs have significantly more overhead in practice because they’re very difficult to make opt-in, and carry quite a penalty when they’re not. Rc/Arc is much better because you can opt into it only where you need shared ownership (which isn’t always), and in practice reference cycles don’t come up often enough to matter, and when they do there are still solutions. In other words, tracing GCs drop huge amounts of performance on the floor, and you can read all the comments to see claims like “it’s more efficient than Rc”, “performance is free”, or even “it doesn’t matter because the programmer is more efficient”. I’d buy the efficiency argument when the only alternative was C/C++ with its serious memory-safety baggage, but not any of the others; memory safety without sacrificing C++-level performance is, in my view, a solved problem with Rust.

zozbot234 3 days ago | parent | prev [-]

It depends how you implement reference counting. In Rust the atomic inc-dec operations can be kept at a minimum (i.e. only for true changes in lifecycle ownership) because most accesses are validated at compile time by the borrow checker.
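A hedged sketch of that point (my own example): borrowing through an `Arc` compiles to a plain pointer access, so the atomic count traffic happens only at true ownership transfers.

```rust
use std::sync::Arc;

// Borrowing: no reference-count change at all. The borrow checker
// proves the data outlives this call, so no atomics are needed.
fn total(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let shared = Arc::new(vec![1, 2, 3, 4]);

    // Many reads, zero count adjustments (deref coercion borrows the Vec):
    assert_eq!(total(&shared), 10);
    assert_eq!(Arc::strong_count(&shared), 1); // still just one owner

    // An atomic increment happens only at an actual ownership change:
    let second_owner = Arc::clone(&shared);
    assert_eq!(Arc::strong_count(&second_owner), 2);
}
```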

js4ever 4 days ago | parent | prev [-]

[flagged]

p2detar 4 days ago | parent | next [-]

Is this an AI-generated answer? Most of these points are not even true, although I would still prefer Go for micro-services. I'll address just a few, and to be clear: I'm not even a big Java fan.

- Quarkus with GraalVM compiles your Java app to native code. There is no JIT or warm-up, and the memory footprint is also low. By the way, the JVM HotSpot JIT can actually make your Java app faster than your Go or Rust app in many cases [citation needed], exactly due to the hot-path optimizations it does.

- GC tuning: I don't even know who does this anymore. Maybe Netflix or some trading shops? Almost no one does this nowadays, and with the new generational ZGC [0] coming up, nobody will need to.

> You can’t ship a minimal standalone binary without pulling in a JVM.

- You'd need a JRE actually, e.g. a 27 MB .msi for Windows. That's probably the easiest thing to install today, and if you do it via your package manager, you also get regular security fixes. Build tools like Gradle generate a fully ready-to-execute directory structure for your app. If you've got a JRE on your system, it will run.

> Dependency management and classpath conflicts historically plagued Java

The keyword here is "historically". Please try Maven or Gradle today and enjoy modern dependency management. It just works. I won't delve into Java 9 modules, but it's been ages since I last saw a classpath issue.

> J2EE

Is someone still using this? It is super easy to write a web app with Java + Javalin, for example. The Java library and framework ecosystem is super rich.

> “Write once, run anywhere” costs: The abstraction layers that make Java portable also add runtime weight and overhead.

Like I wrote above, the HotSpot JIT is actually doing the heavy lifting for you in real time. These claims are baseless without pointing to what "overhead" means in practice.

---

0 - https://inside.java/2023/11/28/gen-zgc-explainer/ or https://www.youtube.com/watch?v=dSLe6G3_JmE

vips7L 3 days ago | parent [-]

> GC tuning, Netflix

I believe Netflix has moved to ZGC with no tuning. Their default setup is to set the min/max heap to the same size, enable AlwaysPreTouch, and use transparent huge pages [0]. GC tuning is a thing of the past. Once automatic heap sizing for ZGC and G1 lands, you won't even need to set the heap size [1][2]. They'll still use more RAM because of the VM and JIT, but the days of the JVM holding on to RAM when it doesn't need it should be over.

[0] https://netflixtechblog.com/bending-pause-times-to-your-will...

[1] https://openjdk.org/jeps/8329758

[2] https://openjdk.org/jeps/8359211

dionian 4 days ago | parent | prev [-]

conflicts are a necessary evil with a massive dependency ecosystem