scottlamb 5 days ago

> It's not like C++ users don't say the exact same thing about String, though.

If they do, they're wrong, as the two languages are quite different here. In Java, String requires two allocations: your variable is implicitly a pointer to a String allocation, which in turn has a pointer to a char[] allocation. In C++, the std::string itself is a value type. The actual bytes might be inline (short string optimization) or behind a single allocation accessible from the std::string.

Rust's std::string::String is somewhere between: it's a value type but does not have a short string optimization (unless you count the empty string returned by String::new).

> Different people and programs always have a different notion of what a String should do (Should it be mutable? Should it always be valid UTF-8? Which operations should be O(1), O(n) or O(n log n)? etc.)

Sure, there can be call for writing your own String type. But what's unique about Java as compared to say C, C++, Go, Rust, even to some extent C# is that you can't have a class or struct that bundles up the parts of your data structure (in the case of a mutable string, two fields: data pointer/capacity + the used length) without boxing. There's a heavy cost to any non-primitive data type.
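To make the boxing point concrete, here is a minimal Java sketch. `MutableStr` and `emptySlots` are hypothetical names invented for illustration, not part of any library. It only shows that an array of a class type holds references (initially null) rather than inline field bundles, so each element needs its own object allocation plus one more for the backing array.

```java
import java.util.Arrays;

public class BoxedFieldsDemo {
    // Hypothetical mutable string for illustration: backing array plus used length.
    static final class MutableStr {
        byte[] data = new byte[8]; // separate array object, reached via a reference
        int length = 0;            // number of bytes in use

        void push(byte b) {
            if (length == data.length) {
                data = Arrays.copyOf(data, data.length * 2); // grow by doubling
            }
            data[length++] = b;
        }
    }

    // Counts the slots of a freshly created array that hold no object yet.
    static int emptySlots(int n) {
        MutableStr[] strs = new MutableStr[n];
        int empty = 0;
        for (MutableStr s : strs) {
            if (s == null) empty++;
        }
        return empty;
    }

    public static void main(String[] args) {
        // A new array of a class type holds only null references, not inline structs.
        System.out.println(emptySlots(3)); // 3

        MutableStr s = new MutableStr(); // one object; its data array is another
        s.push((byte) 'h');
        s.push((byte) 'i');
        System.out.println(s.length); // 2
    }
}
```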

josephg 5 days ago | parent | next [-]

> Sure, there can be call for writing your own String type. But what's unique about Java as compared to say C, C++, Go, Rust, even to some extent C# is that …

You also can’t make a first-class string type in most of those languages because “hi” is defined to be of a system-specified class. You can make a different type to store strings but it’ll never be as ergonomic to use.

It’s even worse in JavaScript, where the standard library is written in a different language (usually C++). It’s impossible to write JavaScript code that matches the performance of the built in string primitive. At least outside of specific niche use cases.

Rust has lots of alternate string-like libraries in cargo that are optimized better for different use cases - like smallstring or ropey. They’re fast, convenient and ergonomic. I imagine C++ is the same.

layer8 5 days ago | parent | prev | next [-]

> In Java, String requires two allocations

That’s true, but thanks to GC an allocation also is just a pointer bump in Java, and the two allocations are likely to be close to each other. For short-lived strings, the GC cost is furthermore zero, because only the longer-lived objects need to be tracked and copied with generational GC. So, “heavy cost” is relative.

grogers 5 days ago | parent | prev | next [-]

Also, you can't construct a Java String without copying the data into it, because String is immutable. In other languages like C++ or Rust the string type is mutable, so you don't need an extra copy. Java doesn't even special-case blessed APIs like StringBuilder to avoid the extra copy: StringBuilder has no method to consume its inner buffer, so you can only create a String from it by copying, leaving the buffer untouched, even though creating multiple strings from a given StringBuilder is not the normal usage.
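A small Java sketch of the behavior described here. `snapshotThenAppend` is a hypothetical helper name. It does not observe the copy directly; it only shows that the builder stays usable after `toString()` and that the earlier String is unaffected by later appends, which is why the String cannot simply take over the builder's buffer.

```java
public class BuilderCopyDemo {
    // Returns {snapshot taken before a later append, contents after it}.
    static String[] snapshotThenAppend() {
        StringBuilder sb = new StringBuilder();
        sb.append("abc");
        String first = sb.toString();  // snapshot of the current contents
        sb.append("def");              // builder stays usable and mutable
        String second = sb.toString(); // another String from the same builder
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        String[] r = snapshotThenAppend();
        System.out.println(r[0]); // abc
        System.out.println(r[1]); // abcdef
    }
}
```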

throeaway9 4 days ago | parent [-]

For cases where a string repeats itself, you can use String.intern() to reuse data.

layer8 4 days ago | parent [-]

String.intern() doesn't reuse the data per se. It merely gives you a canonical instance with the same value as the String instance you invoke it on (unless it's the first time it is invoked for that value, in which case it returns the same instance you already have in hand). At that point the latter duplicate instance has already been constructed. The only benefit in terms of memory is that the duplicate instance might be garbage-collected earlier if the program stops referencing it and uses the interned instance instead.
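A minimal sketch of that point, assuming nothing beyond the standard library. `fresh` is a hypothetical helper that builds a new String at run time. The two duplicates already exist as separate instances before `intern()` is called; interning only hands back one canonical instance for the shared value.

```java
public class InternDemo {
    // Builds a new String instance at run time (not a compile-time constant).
    static String fresh(String value) {
        return new String(value.toCharArray());
    }

    public static void main(String[] args) {
        String a = fresh("hello-intern-demo");
        String b = fresh("hello-intern-demo");

        System.out.println(a == b);                   // false: two instances exist
        System.out.println(a.equals(b));              // true: same value
        System.out.println(a.intern() == b.intern()); // true: one canonical instance
    }
}
```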

Also, String.intern() can be quite slow compared to something like ConcurrentHashMap.putIfAbsent().
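For comparison, a hand-rolled canonicalizing pool built on `ConcurrentHashMap.putIfAbsent()` might look like the sketch below. `MapInterner`, `POOL`, and `canonical` are hypothetical names. This makes no performance claim of its own, and unlike `String.intern()` it keeps strong references to every pooled string for as long as the map lives.

```java
import java.util.concurrent.ConcurrentHashMap;

public class MapInterner {
    // Hypothetical application-level pool: maps each value to its canonical instance.
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering s itself if it is the first.
    static String canonical(String s) {
        String existing = POOL.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        String a = new String("pooled-value".toCharArray());
        String b = new String("pooled-value".toCharArray());

        System.out.println(canonical(a) == a); // true: first registration wins
        System.out.println(canonical(b) == a); // true: duplicate maps to the first
    }
}
```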

pjmlp 4 days ago | parent | prev [-]

Except many aren't aware that as long as std::string fulfills the ISO C++ complexity requirements for its implementation, anything goes.

This is less problematic nowadays, because for many folks there are only three compilers that still matter, or even a single one.