nine_k 3 days ago

But we don't have a linear address space, unless you're working with a tiny MCU. For the last ~30 years, every mainstream processor has given us a virtual address space, and we can mix and match pages the way we want, insulate processes from one another, add sentinel pages at the ends of large structures to generate a fault, etc. We just happen to structure process heaps as linear memory, but this is not a hard requirement, even on current hardware.
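For instance, the sentinel-page trick takes only a few lines of POSIX today. A minimal sketch, assuming Linux-style mmap/mprotect (error handling mostly omitted):

    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);
        size_t len = 16 * page;

        /* Reserve the buffer plus one extra page at the end. */
        char *buf = mmap(NULL, len + page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;

        /* Turn the last page into a sentinel: any access faults. */
        mprotect(buf + len, page, PROT_NONE);

        memset(buf, 0, len);   /* fine */
        buf[len] = 1;          /* SIGSEGV: the overrun hits the guard page */
        return 0;
    }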

What we lack is the granularity that something like the iAPX 432 envisioned. Maybe some hardware breakthrough will allow for such granularity cheaply enough (the way it allowed for signed pointers, for instance), so that smart compilers and OSes could offer even more protection without the expense of switching to kernel mode too often. I wonder what research exists in this field.

imtringued 2 days ago | parent | next [-]

This feels like a pointless form of pedantry.

Okay, so we went from linear address spaces to partitioned/disaggregated linear address spaces. This is hardly the victory you claim it is, because page sizes are increasing, and thus the minimum addressable block of memory keeps increasing. Within a page, everything is linear as usual.

The reason linear address spaces are everywhere is that they are extremely cost-effective and fast to implement in hardware. You can do prefix matching to check whether an address points at a specific hardware device, and you can use multiplexers to address memory. Addresses can easily be encoded inside a single std_ulogic_vector. It also makes it possible to implement a Network-on-Chip architecture for your on-chip interconnect, and it makes caching easier, since you can translate the address into a cache entry.
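A toy C model of that prefix matching (the address map below is made up):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical device windows: a base address plus a prefix mask. */
    typedef struct { uint32_t base, mask; const char *name; } region_t;

    static const region_t bus_map[] = {
        { 0x00000000, 0xF0000000, "RAM"   },  /* top 4 bits select RAM   */
        { 0x40000000, 0xFFFF0000, "UART0" },  /* 64 KiB peripheral block */
        { 0x40010000, 0xFFFF0000, "GPIO"  },
    };

    /* One AND and one compare per region, trivially parallel in hardware;
       this is why a flat linear space decodes so cheaply. */
    static const char *decode(uint32_t addr) {
        for (unsigned i = 0; i < sizeof bus_map / sizeof bus_map[0]; i++)
            if ((addr & bus_map[i].mask) == bus_map[i].base)
                return bus_map[i].name;
        return "bus error";
    }

    int main(void) {
        printf("%s\n", decode(0x40010004));  /* GPIO */
        printf("%s\n", decode(0x00001000));  /* RAM  */
        return 0;
    }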

When you add a scan chain to your flip-flops, you're implicitly ordering them and thereby building an implicit linear address space.

There is also the fact that databases with auto-incrementing integers as their primary keys use a logical linear address space, so the most obvious way to obtain a non-linear address space would be to use randomly generated IDs instead. It seems a huge amount of effort would have to be spent to get away from the idea of linear address spaces.

nine_k 2 days ago | parent [-]

We have a linear address space where we can map physical RAM and memory-mapped devices dynamically. Every core, at any given time, may have its own view of it. The current approach uses pretty coarse granularity, separating execution at the process level. The separation could be more granular.

The problem is the granularity of trust within the system. Were the MMU much faster and the TLB much larger (say, 128 MiB of dedicated SRAM), the granularity could be much finer, giving each function's stack a separate address space insulated from the rest of RAM. This is possible even now; it would just be impractically slow.

Any hierarchical (tree-based) addressing scheme is equivalent to a linear one: pick any tree-traversal order. Any locally hierarchical addressing scheme can seemingly be implemented with (short) offsets in a linear address space; this is how most jumps on x64 and aarch64 are encoded, for instance.
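A quick sketch of that equivalence, with made-up sizes: a two-level (bank, slot) hierarchy flattens into a single linear index and back, and a PC-relative jump is nothing more than an addition in the flat space.

    #include <assert.h>
    #include <stdint.h>

    enum { SLOTS_PER_BANK = 4096 };  /* hypothetical hierarchy: banks of 4096 slots */

    static uint32_t flatten(uint32_t bank, uint32_t slot) {
        return bank * SLOTS_PER_BANK + slot;      /* tree path -> linear index */
    }

    static void unflatten(uint32_t lin, uint32_t *bank, uint32_t *slot) {
        *bank = lin / SLOTS_PER_BANK;             /* linear index -> tree path */
        *slot = lin % SLOTS_PER_BANK;
    }

    int main(void) {
        uint32_t b, s;
        unflatten(flatten(7, 123), &b, &s);
        assert(b == 7 && s == 123);

        /* PC-relative jump: a signed displacement added to a flat address. */
        uint64_t pc = 0x400000;
        int32_t  disp = -0x20;
        uint64_t target = pc + (int64_t)disp;     /* 0x3FFFE0 */
        (void)target;
        return 0;
    }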

latchup a day ago | parent [-]

It has very little to do with trust and a lot to do with the realities of hardware implementation. Every interconnect has a minimum linear transfer granularity that properly utilizes its hardware links, dictated primarily by its physical link width and minimum efficient burst length. The larger this minimum granularity, the faster and more efficient moving data becomes; below it, bandwidth and energy efficiency crater. Hence, reducing access granularity below this limit has disastrous consequences.

In fact, virtual memory is already a limiting factor in increasing the minimum transfer size, since pages must be an efficient unit of exchange. Traditional 4 KiB pages are already smaller than what would be a good minimum transfer size; this is exactly why hardware designers push for larger pages (with Apple silicon forgoing 4 KiB page support entirely).

I cannot help but feel that many of these discussions are led astray by the misconceptions of people with an insufficient understanding of modern computer architecture.

convolvatron 3 days ago | parent | prev | next [-]

It's entirely possible to implement segments on top of paging. What you need to do is add kernel abstractions for implementing call gates that change segment visibility, and write some infrastructure to manage unions of a bunch of little regions. I haven't implemented this myself, but a friend did on a project we were working on together, and as a mechanism it works perfectly well.

Getting userspace to do the right thing without upending everything is what killed that project.
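The userspace-visible effect can be approximated with plain mprotect. A rough sketch only (the names are invented, and in the real design this toggling lived in the kernel, driven by per-call policy):

    #include <sys/mman.h>
    #include <unistd.h>

    /* A "segment" here is just a page-aligned region that is PROT_NONE
     * except while a gated call is running. */
    static void gated_call(void *seg, size_t len, void (*fn)(void *)) {
        mprotect(seg, len, PROT_READ | PROT_WRITE);  /* make the segment visible */
        fn(seg);                                     /* callee may touch it      */
        mprotect(seg, len, PROT_NONE);               /* drop it again            */
    }

    static void worker(void *seg) {
        ((char *)seg)[0] = 42;   /* allowed only inside the gate */
    }

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);
        void *seg = mmap(NULL, page, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (seg == MAP_FAILED) return 1;
        gated_call(seg, page, worker);
        /* ((char *)seg)[0] = 1;  <- would fault here: segment not visible */
        return 0;
    }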

Joker_vD 3 days ago | parent | next [-]

There is also the problem of nested virtualization. If the VM has its own "imaginary" page tables on top of the hypervisor's page tables, then the number of actual physical memory reads for a single page-table walk goes from 4–6 to 16–36.
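Back-of-the-envelope, assuming a two-dimensional walk in which each guest page-table pointer, plus the final guest-physical address, needs its own full host-table walk:

    #include <stdio.h>

    /* Memory reads for one TLB-miss walk with g guest levels nested on
     * h host levels: g*h reads to translate the guest table pointers,
     * g reads of the guest PTEs themselves, and h more reads to
     * translate the final guest-physical address. */
    static int nested_walk_reads(int g, int h) {
        return g * h + g + h;
    }

    int main(void) {
        printf("%d\n", nested_walk_reads(4, 4));  /* 24 */
        printf("%d\n", nested_walk_reads(5, 5));  /* 35 */
        return 0;
    }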

aforwardslash 3 days ago | parent | prev | next [-]

If I understood correctly, you're talking about using descriptors to map segments. The issue with this approach is twofold: it is slow (a descriptor needs to be created for each segment, and sometimes more than one, if you need both write and execute permissions), and there is a practical limit on the number of descriptors you can have: 8192 total, including call gates and whatnot. To extend this, you need LDTs, which again require a descriptor in the GDT and are themselves limited to 8192 entries. In a modern desktop system, the resulting maximum of roughly 67 million segments (8192 LDTs × 8192 entries each) would be both quite slow and still quite limited.

convolvatron 2 days ago | parent [-]

No, not at all. We weren't using the underlying segmentation support; we just added kernel facilities to support segment IDs and ranges and augmented the kernel region structure appropriately. A call gate is just a syscall that changes the process's VM tables to include or drop regions (segments) based on the policy of the call.

aforwardslash 7 hours ago | parent [-]

Hmm, very interesting approach. Are there any publicly available documentation links?

tliltocatl 3 days ago | parent | prev | next [-]

But that wouldn't protect against out-of-bounds access (which is the whole point of segments), would it?

convolvatron 2 days ago | parent [-]

That's enforced by the VM hardware; we just shuffle the PTEs around to match the appropriate segment view.

rep_lodsb 2 days ago | parent [-]

As long as it's a linear address space, adding/subtracting a large enough value to a pointer (array, stack) could still cross into another "segment".

convolvatron a day ago | parent [-]

But those wouldn't be mapped unless you had crossed a call gate that enabled them. The kernel's call-gate implementation changes the VM map (region visibility) accordingly.

nine_k 3 days ago | parent | prev | next [-]

Indeed. Also, the TLB as it exists on x64 is not free, nor is it very large. A multi-level "TLB", such that a process might pick an upper level covering a large stretch of lower-level pages and, e.g., allocate a disjoint micro-page for each stack frame, would be cool. But it would take a rather different CPU design.

formerly_proven 3 days ago | parent | prev [-]

"Please give me a segmented memory model on top of paged memory" - words which have never been uttered

nine_k 3 days ago | parent [-]

There is a subtle difference between "give me an option" and "thrust on me a requirement".

inkyoto 3 days ago | parent | prev [-]

> But we don't have a linear address space, unless you're working with a tiny MCU.

We actually do, albeit only for a brief period of time: upon a cold start of the system, while the MMU is not yet active, no address translation is performed and the entire memory space is treated as a single linear, contiguous block (even if there are physical holes in it).

When the system is powered on, the CPU runs in privileged mode so that the operating system kernel can set up the MMU and activate it, which takes place early in the boot sequence. Until then, virtual memory is not available.

loeg 3 days ago | parent [-]

Those holes can be arbitrarily large, though, especially in weirder environments (e.g., memory-mapped Optane and similar). A linear address space implies some degree of contiguity, I think.

inkyoto 3 days ago | parent [-]

Indeed. It can get even weirder in the embedded world, where a ROM, an E(E)PROM, or a device may be mapped into an arbitrary slice of the physical address space, anywhere within its bounds. That has become less common, though.

But mapping devices at the top of the physical address space is still a rather widespread practice.

crote 3 days ago | parent [-]

And it's not uncommon for a device to be mapped multiple times in the address space! The different aliases provide slightly different ways of accessing it.

For example, 0x000-0x0ff might provide linear access to memory bank A and 0x100-0x1ff linear access to bank B, while 0x200-0x3ff provides striped access across the two banks, with even-addressed words coming from bank A and odd ones from bank B.

Similarly, 0x000-0x0ff might access memory through a cache while 0x100-0x1ff accesses the same memory directly. Or 0x000-0x0ff overwrites data, 0x100-0x1ff sets bits (OR with the current content), and 0x200-0x2ff clears bits.
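A toy software model of that last kind of aliasing (the addresses and semantics here are invented): one word of real storage, three aliases with different write behavior, in the spirit of the set/clear register windows on many MCUs.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t reg;   /* the single word of real storage */

    enum { ALIAS_WRITE = 0x000, ALIAS_SET = 0x100, ALIAS_CLR = 0x200 };

    static void bus_write(uint32_t addr, uint32_t val) {
        switch (addr & 0x300) {                /* decode which alias was hit */
        case ALIAS_WRITE: reg  = val;   break; /* overwrite                  */
        case ALIAS_SET:   reg |= val;   break; /* set bits (OR)              */
        case ALIAS_CLR:   reg &= ~val;  break; /* clear bits (AND-NOT)       */
        }
    }

    int main(void) {
        bus_write(ALIAS_WRITE, 0x0F);          /* reg = 0x0F */
        bus_write(ALIAS_SET,   0xF0);          /* reg = 0xFF */
        bus_write(ALIAS_CLR,   0x0F);          /* reg = 0xF0 */
        printf("%02X\n", (unsigned)reg);       /* F0 */
        return 0;
    }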