| ▲ | chrismorgan a day ago | |||||||||||||
> In the above example, on every loop, the += operator causes a new string to be allocated, and the content to be copied, which gets exponentially more expensive as the string grows. But that’s only the theoretical behaviour. In practice, languages tend to end up optimising it in various ways. As noted a paragraph later, the Java compiler is able to detect and “fix” this, by rewriting the code to use a mutable string. Another solution is to put concatenation into your string type as another possible representation. I believe at least some (no idea if it’s all) JavaScript engines do this. You end up with something like this (expressed in Rust syntax, and much simpler than the real ones are):
Then, when you try to access the string, if it’s Concatenated it’ll flatten it into one of the other representations.Thus, the += itself becomes cheap, and in typical patterns you only incur the cost of allocating a new string once, when you next try to read from it (including things like JSON.stringify(object_containing_this_string) or element.setAttribute(name, this_string)). | ||||||||||||||
| ▲ | masklinn a day ago | parent | next [-] | |||||||||||||
> As noted a paragraph later, the Java compiler is able to detect and “fix” this, by rewriting the code to use a mutable string. Does it actually do that nowadays? Back in my days it was incapable of lifting the builder out of loops, so for each iteration it would instantiate a builder with the accumulator string, append the concatenation, then stringify and reset the accumulator. The linked docs don’t say anything about loops. | ||||||||||||||
| ||||||||||||||
| ▲ | masklinn a day ago | parent | prev | next [-] | |||||||||||||
An option you did not mention, although not generally available to languages with advanced runtime, is that even if you have immutable strings you can realloc them if you know you're the only owner of that string (e.g. because they're refcounted, or can't be shared). CPython does that, so a trivial concatenation loop is (amortised) linear (causing issues for alternate implementations when they have to run that). Swift might also be able to via COW optimisation. Rust's string concatenation is a variant of this (though it has mutable strings anyway): `String::add` takes an owned value on the LHS, and the implementation just appends to the LHS before returning it:
so repeated concatenation will realloc and amortize as if you were just `push_str`-ing in a loop (which maps directly to appending to the underlying buffer). | ||||||||||||||
| ||||||||||||||
| ▲ | bruce343434 a day ago | parent | prev | next [-] | |||||||||||||
Maybe I'm missing something, but I thought that when you "grow" an array or a string, eventually something under the hood needs to call realloc()? which allocates a bigger chunk of memory and copies the content O(n)? Why would it matter if you manually do alloc() and then memcpy() (replacing an immutable string with its concatenation) vs letting the runtime do it for you with realloc()? | ||||||||||||||
| ||||||||||||||
| ▲ | shawn_w 21 hours ago | parent | prev | next [-] | |||||||||||||
You're basically describing the Rope. Fancier versions use self balancing trees, allowing other string operations to be fairly efficient too. | ||||||||||||||
| ||||||||||||||
| ▲ | miki123211 a day ago | parent | prev [-] | |||||||||||||
> Another solution is to put concatenation into your string type Aah, the Erlang way of handling strings. On Beam (Erlang's VM), that goes as deep as IO. It's perfectly fine to pass a (possibly nested) list of strings (whether charlists or binaries) to an IO function, and the system just knows how to deal with that. | ||||||||||||||
| ||||||||||||||