| ▲ | variadix 4 days ago |
| You can also use virtual memory for a stable resizable vector implementation, up to some max length based on how much virtual memory you reserve initially, then commit as required to grow the physical capacity. |
|
| ▲ | loeg 4 days ago | parent | next [-] |
| Yeah, with less runtime overhead, so long as you're ok with the ~4kB minimum allocation size. |
| |
| ▲ | IshKebab 4 days ago | parent [-] | | I think optimal would be a normal vector under 4 kB, then switching to whole pages after that. Do any vector implementations & allocators actually do this, though? | | |
| ▲ | loeg 4 days ago | parent [-] | | You lose the in-place resize property if you don't start with an entire page. As far as "has anyone implemented this?" -- I don't know. | | |
| ▲ | IshKebab 3 days ago | parent [-] | | Yeah but it probably makes little difference if your vector is under 4 kB anyway. |
|
| ▲ | fyrn_ 4 days ago | parent | prev [-] |
| They mention this alternative in the article, and also point out how it does not work in embedded contexts or with WASM |
| |
| ▲ | ncruces 4 days ago | parent | next [-] | | I've actually used this to implement the linear memory of a Wasm runtime (reserve 4GB and commit as needed) and have faced user complaints (it's in a library that uses a Wasm runtime internally) due to it unexpectedly running into issues, particularly under nested virtualization scenarios. I've needed to add knobs to configure it, because even a handful of 4GB instances causes issues. I've defaulted to 256MB/instance, and for my own GitHub CI use 32MB/instance to reduce test flakiness. This is to say: I found the idea that “just reserve address space, it's not an issue if you don't commit it” very flaky in practice, unless you're running on bare metal Linux. | | |
| ▲ | cmrdporcupine 3 days ago | parent | next [-] | | Like you, I've played with it many times and am always intrigued by this idea of handling sparsity this way, but I just end up backing out the changes in the end because it never pays off the way my geeky fantasy plays it out. The LeanStore and Umbra DB buffer pool management systems intrigued me; I even got paid to work on a system like this (a buffer pool at an analytical database company). The KISS data structure (https://dl.acm.org/doi/10.1145/2236584.2236587) and the START adaptive radix tree (https://db.in.tum.de/~fent/papers/Self%20Tuning%20Art.pdf?la...) intrigued me too. But in practice the performance gains have never really materialized for me. I do think there are wins to be had here, but only for specialized cases. And just watch how confused and annoyed your users are when they see an absolutely huge VSS: sysadmins freaking out at your "insane memory usage" (which is really only on paper, not in reality), plus the difficulty of profiling or tracking what's going on easily. |
| ▲ | loeg 4 days ago | parent | prev [-] | | Historically the Java runtime did something similar, with similar problems (exacerbated by being older hardware with smaller RAM). Virtual address space just isn't free. | | |
| ▲ | Dylan16807 2 days ago | parent [-] | | On the other hand if we're speaking historically then it was probably using a lot more than a hundredth of a percent of the address space. Or a billionth of the address space, depending on how we're counting. It doesn't need to be free, but your address space use should be able to go a handful of bits above your physical memory without problems. Even fully allocating all the page entries, with no large pages, on an architecture with 4KB pages, would only take 8MB to track a 4GB allocation. |
|
| |
| ▲ | loeg 4 days ago | parent | prev [-] | | Embedded environments without virtual memory are increasingly rare, and generally not places where you would need a variably sized generic list ADT anyway. |
|