kh9000 4 days ago

Windows developer here. After reading this post, my gut instinct is that this is due to something called 'segment heap'.

A bit of backstory: there are two totally independent implementations behind the Windows heap allocation APIs (i.e. the implementation code behind RtlAllocateHeap and RtlFreeHeap, which are called by malloc/free). The older of the two, developed during the Dave Cutler era, is known as the "NT heap". The newer implementation, developed in the 2010s, is known as "segment heap". This is all documented online if anyone wants to read more.

When development on segment heap was completed, it was known to be superior to the NT heap in many ways. In particular, it was more efficient in terms of memory footprint, due to lower fragmentation-related waste. Segment heap was smarter about reusing small allocation slots that were recently freed. But, as ever, Windows was very serious about legacy app compat. Joel Spolsky calls this the 'Raymond Chen camp'. So, they didn't want to turn segment heap on universally. It was known that a small portion of legacy software would misbehave because it did things like rely on a bit of use-after-free, as a treat. Or worse, it took dependencies on casting addresses to internal NT heap data structures.

So, the decision at the time was to make segment heap the default for packaged executables. At that time, Windows Phone still existed, and Microsoft was pushing super hard on the Universal Windows Platform (UWP) being the new, recommended way to make apps on Windows. So they thought we'd see a gradual transition from unpackaged executables to packaged, and thus a gradual transition from NT heap to segment heap. The dream of UWP died, and the Windows framework landscape is more fragmented than ever. Most important software on Windows is still unpackaged, and most of it runs on x64.

Why does this matter? Because segment heap is also enabled by default on Arm. Same logic as the packaged vs unpackaged decision. Arm64 binaries on Windows are guaranteed not to be ancient, unmaintained legacy code. Arm64 Windows devices have been a big success, and users widely report that they feel more responsive than x64 devices.

A not insignificant part of why Windows feels better on Arm is that segment heap is enabled by default there.

I'd be interested to see how this test turns out if you force segment heap on x64. You can do it on a per-executable basis via creating a DWORD value named FrontEndHeapDebugOptions under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<myExeName>.exe, and giving it a value of 8.

You can turn it on globally for all processes by creating a DWORD value named "Enabled" under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Segment Heap, and giving it a value of 3. I do this on my dev machine and have encountered zero problems. The memory footprint savings are pretty crazy. About 15% in my testing.
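For anyone who wants to try the two registry settings described above, here is a minimal sketch that only generates the text of a .reg file; it does not touch the registry itself. The key paths, value names, and the data 8 and 3 are taken from the comment. The helper names and `myapp.exe` are made up for illustration.

```python
# Sketch: build .reg file text for the two registry settings described above.
# Key paths, value names, and data (8 and 3) come from the comment; the
# helper names and the example executable name are hypothetical.

IFEO_KEY = (r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
            r"\CurrentVersion\Image File Execution Options")
GLOBAL_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
              r"\Session Manager\Segment Heap")


def reg_dword(key: str, name: str, value: int) -> str:
    """Return a .reg stanza that sets one DWORD value under `key`."""
    return f'[{key}]\n"{name}"=dword:{value:08x}\n'


def per_exe_reg(exe_name: str) -> str:
    """Stanza that opts a single executable (e.g. 'myapp.exe') into segment heap."""
    return reg_dword(f"{IFEO_KEY}\\{exe_name}", "FrontEndHeapDebugOptions", 8)


def global_reg() -> str:
    """Stanza that opts every process into segment heap."""
    return reg_dword(GLOBAL_KEY, "Enabled", 3)


def reg_file(*stanzas: str) -> str:
    """Join stanzas under the standard .reg header."""
    return "Windows Registry Editor Version 5.00\n\n" + "\n".join(stanzas)


if __name__ == "__main__":
    print(reg_file(per_exe_reg("myapp.exe"), global_reg()))
```

Save the output to a file and import it from an elevated prompt (for example with `reg import`). Try the per-executable stanza first before enabling anything globally.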

adzm 4 days ago | parent | next [-]

For those interested, you can opt-in to this behavior via the application manifest for your own executables: set heapType to SegmentHeap https://learn.microsoft.com/en-us/windows/win32/sbscs/applic...
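A rough sketch of what that manifest might look like, wrapped in a small Python check that the XML is well-formed and that the heapType element is found. The namespace URIs are written from memory of the linked Microsoft documentation, so treat them as assumptions and verify them there. The `heap_type` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs recalled from the Microsoft manifest documentation;
# they are assumptions here and should be checked against the linked page.
ASM_V1 = "urn:schemas-microsoft-com:asm.v1"
ASM_V3 = "urn:schemas-microsoft-com:asm.v3"
WS_2020 = "http://schemas.microsoft.com/SMI/2020/WindowsSettings"

MANIFEST = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="{ASM_V1}">
  <application xmlns="{ASM_V3}">
    <windowsSettings>
      <heapType xmlns="{WS_2020}">SegmentHeap</heapType>
    </windowsSettings>
  </application>
</assembly>
"""


def heap_type(manifest_xml: str) -> Optional[str]:
    """Return the heapType text from a manifest, or None if the element is absent."""
    root = ET.fromstring(manifest_xml.encode("utf-8"))
    node = root.find(f".//{{{WS_2020}}}heapType")
    return node.text if node is not None else None


if __name__ == "__main__":
    print(heap_type(MANIFEST))
```

The manifest text is what you would embed in your executable at build time. The Python around it is only there to sanity-check the XML.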

garganzol 4 days ago | parent | prev | next [-]

I've measured NT Heap vs. Segment Heap for my RAM and CPU intensive workloads and got a steady 7% overall performance improvement. The combined workload finishes 7% quicker with the Segment Heap.

P.S. In the Windows 95 to Windows Vista era, there was a good tradition of "Compatible with Windows XXX" certifications for apps. If MS did something like that for Windows 10/11 and included a segment heap tick mark in it, a considerably larger number of apps and their users would benefit from increased performance. Think better energy consumption and eco-friendliness as additional bonuses.

P.S. 2: The problem with UWP was not the technology itself, it was the stubbornness to have it packaged and tied to The Store, all of which contradicts the very existence of Windows as an OS.

jeroenhd 4 days ago | parent [-]

UWP is not strictly tied to the Windows store (you can install UWP applications packaged in the right format(s) from the command line, for business deployments for instance), but it might as well be when it comes to consumers.

I can't really complain, though. If UWP had broken through, the Steam Deck would probably have been a much more massive undertaking to get working right.

As long as developers can opt into the new system (which they can with the manifest approach), I don't think it matters whether you're doing UWP or traditional Windows applications.

Microsoft has added a mishmash of flags in the app manifest and transparently supports manifest-less applications, so developers don't have a need to ever bother including a manifest either.

It'd annoy a lot of people, but if Windows would show a "this app has been written for an older version of Windows and may be slower than modern applications" warning for old .exes (or maybe one of those popups they now like about which apps are slower than they could be), developers would have an incentive to add a manifest to their applications and Microsoft could enable a lot more of these optimisations for a lot more applications.

cesarb 4 days ago | parent [-]

> As long as developers can opt into the new system (which they can with the manifest approach) [...] Microsoft has added a mishmash of flags in the app manifest

Could you please tell me, where are all these manifest flags documented? I asked about it a decade and a half ago at stackoverflow (https://stackoverflow.com/questions/5733085/application-mani...), and the only answer was "there isn't".

jeroenhd 4 days ago | parent [-]

https://learn.microsoft.com/en-us/windows/win32/sbscs/applic... has the majority here.

I don't see why you'd need a separate flag for memory management, Windows version, printer driver isolation, awareness of long paths, and all of that jazz.

Still, https://learn.microsoft.com/en-us/windows/win32/sbscs/applic... has a setting to enable modern memory management.

ack_complete 4 days ago | parent | prev | next [-]

Two issues.

First, regarding application compatibility: the heap was already changed once prior to the segment heap. The Low Fragmentation Heap (LFH) was added in XP and made default in Vista, with applications no longer having to opt into it:

https://learn.microsoft.com/en-us/windows/win32/memory/low-f...

Second, the segment heap has different tradeoffs, so swapping it in is not a guaranteed win. It trades performance for a smaller working set:

https://issues.chromium.org/issues/40138716

kh9000 4 days ago | parent [-]

It's complicated. It's not always a straightforward space vs time tradeoff. For chromium's allocation patterns, it sounds like segment heap was slower. But BinaryNinja reported the opposite! See https://github.com/Vector35/binaryninja-api/issues/2778

Side note on the Chromium topic: Google Chrome decided NT Heap is still best for their usage, but Microsoft Edge, which is also built on Chromium, uses segment heap. Not sure what Firefox uses. You can check by attaching WinDbg and doing !heap. Note that not every heap will be segment heap, even if you globally opt into segment heap. Some code paths explicitly create their own heaps as NT heaps.

At the very least, using fewer pages to allocate the same amount of data improves memory locality slightly. Folks should test and see what works best in their applications.

Another benefit of segment heap that we haven't discussed yet is that it's more strict and proactive about detecting problems and terminating. From what I understand, heap metadata is now stored separately from heap data, and they use guard pages. So heap buffer overruns don't overwrite the heap manager's bookkeeping. With NT heap, crashes due to use-after-free might manifest much later and more indirectly. Like, maybe it overwrote the free list, or it overwrote some newer allocation that landed on the same address. So, the crash is usually in some unlucky 'innocent bystander' call stack that worked with the corrupted region. With segment heap, you tend to get earlier, more actionable, specific crashing call stacks, closer to the site of the original bug. So, if you're an engineer who looks at a lot of difficult windows crash dumps involving memory corruption, segment heap makes the challenge slightly more surmountable.

garganzol 3 days ago | parent [-]

> Segment heap is more strict and proactive about detecting problems and terminating

I definitely noted that in my tests. Under load, machines with flaky RAM show higher memory access violation rates with Segment Heap than with NT Heap.

bentcorner 4 days ago | parent | prev | next [-]

This feels like it deserves to live somewhere on a blog, not as a comment on some forum. This is really interesting, thanks for sharing.

jamesfinlayson 4 days ago | parent [-]

Agreed! Looks like it can be enabled with a manifest at build time too: https://learn.microsoft.com/en-us/windows/win32/sbscs/applic...

zamadatix 4 days ago | parent | prev | next [-]

> You can turn it on globally for all processes by creating a DWORD value named "Enabled" under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Segment Heap, and giving it a value of 3

I had previously seen this described as 0 vs non-zero. Since you have some inside experience :), anything special about 3 instead? What about 2? How would I find these value meanings out on my own (if that's even possible)?

Thanks!

kh9000 4 days ago | parent [-]

It's a combination of bit flags. The lowest bit controls whether segment heap is on or off. The 2nd lowest bit controls some additional optimizations that go along with it, something about multithreading. A value of 3 (both flags set) gives you identical behavior to what specifying <heapType>SegmentHeap</heapType> in your application manifest does.
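To make the bit-flag reading concrete, here is a small sketch that decodes a value of "Enabled" into the two bits described above. Only the bit positions come from the comment. The flag names are made up, and nothing here says what any other bit does.

```python
from enum import IntFlag
from typing import List


class SegmentHeapFlags(IntFlag):
    # Hypothetical names; only the bit positions come from the comment.
    ENABLED = 0x1              # lowest bit: segment heap on or off
    EXTRA_OPTIMIZATIONS = 0x2  # 2nd lowest bit: "something about multithreading"


def describe(value: int) -> List[str]:
    """List the names of the known flags that are set in `value`."""
    return [flag.name for flag in SegmentHeapFlags if flag & value]


if __name__ == "__main__":
    for v in (0, 1, 2, 3):
        print(v, describe(v))
```

Under this reading, 3 sets both bits, 1 sets only the on/off bit, and 2 sets only the second bit without the on/off bit.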

Using the application manifest approach is the right way to ship software that opts into segment heap. The registry thing is just a convenience for local testing.

Melatonic 4 days ago | parent | next [-]

How often does software actually ship with the opt in for segment heap turned on though ?

Any way to turn it on globally, with a blacklist or denylist or whatever in case something individual acts up?

kh9000 4 days ago | parent [-]

Not nearly often enough, because most veteran Windows developers don't even know what segment heap is, or that they have a choice. VS Code, for example, is still on NT Heap. It is heartbreaking how under-utilized and under-publicized segment heap is. Raymond Chen needs to make a public service announcement or something.

For the question of how to do "segment heap on globally, with a list of exceptions that are still on NT Heap", I believe the "Image File Execution Options" regkey takes precedence over the global one. And the IFEO one lets you explicitly opt out. If you read the whitepaper from Mark Yason's 2016 talk at black hat, they explain how to use these registry keys.
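The precedence described above can be modeled in a few lines. This is purely a sketch of the believed rule (a per-executable IFEO setting, when present, wins over the global one), not a description of how Windows actually resolves it. The function name and the use of `None` for "no per-executable setting" are assumptions for illustration.

```python
from typing import Optional


def effective_heap(global_on: bool, per_exe: Optional[bool]) -> str:
    """Model the believed precedence between the two registry settings.

    global_on: whether the global "Enabled" setting turns segment heap on.
    per_exe:   True/False if the executable has its own IFEO opt-in/opt-out,
               None if it has no per-executable setting at all.
    """
    use_segment = global_on if per_exe is None else per_exe
    return "segment heap" if use_segment else "NT heap"


if __name__ == "__main__":
    print(effective_heap(True, None))    # no exception listed for this exe
    print(effective_heap(True, False))   # exe explicitly opted out
```

A denylist would then amount to one per-executable opt-out entry for each misbehaving program, with the global setting left on.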

Melatonic 3 days ago | parent [-]

Besides better RAM utilisation, does it offer actual performance improvements (regardless of RAM availability)? Need to read more about this.

kh9000 3 days ago | parent [-]

In terms of speed, anecdotally, it seems like it can go either way. Some programs run faster, some run slower. Usage patterns of malloc/free are so varied that it's probably impossible to optimize for one without hurting others.

lloydatkinson 4 days ago | parent | prev | next [-]

Would the (not Framework) .NET apps I work on benefit from this?

garganzol 4 days ago | parent [-]

Any app using memory allocation functions would benefit from a newer heap implementation regardless of the technology it's built with, unless it's actively constrained by compatibility burdens. In the case of .NET, memory layout compatibility is not something you usually care about unless the app loads old 3rd party .DLLs through P/Invoke. So for 99.9% of .NET (not Framework) apps, the segment heap should work just fine.

zamadatix 4 days ago | parent | prev [-]

Much thanks, this is why I come to HN!

pjc50 4 days ago | parent | prev | next [-]

This is the sort of extremely valuable hint that makes HN worthwhile.

Does that global registry key require a reboot, or does it just take effect on executable launch?

kh9000 4 days ago | parent [-]

It takes effect on executable launch.

shoobiedoo 4 days ago | parent | prev | next [-]

Wonderful breakdown. I love reading this kind of thing. thank you

Melatonic 4 days ago | parent | prev | next [-]

Seems like a no brainer on virtualised environments for Windows servers ?

Also assuming that most Microsoft first party applications in Windows server (DNS, etc etc) would all be optimised for segment heap ?

nnevatie 4 days ago | parent | prev | next [-]

Ahh, a Windows problem surfaces - so does a registry hack allegedly fixing the problem. That's basically the (sad) story of Windows today.

p_ing 4 days ago | parent | next [-]

I don't see how this is a 'problem', but rather a tunable. And any decent OS has tunables. Would you rather not have this option?

nnevatie 4 days ago | parent [-]

I would rather have some meaningful progress on the Windows scheduler, for example.

p_ing 3 days ago | parent [-]

It's doubtful that there would be 1000 people dedicated to the scheduler, or that it would produce a better result than the 1 to 4 people that likely work or are tied to owning the scheduler.

That said, what deficiencies do you see in the scheduler with the current build of Windows?

layer8 4 days ago | parent | prev | next [-]

You could say that Windows is giving you options while defaulting to backwards compatibility.

creaturemachine 4 days ago | parent | prev [-]

One could also call it tuning.

ww520 4 days ago | parent | prev [-]

I feel there should be a PowerToy applet to turn Segment Heap on or off.

kh9000 4 days ago | parent [-]

This is a great idea, honestly. PowerToys is open source. There's a decent chance that they would be open to such a contribution.

It is a crime that segment heap is over a decade old and still so underutilized. Gamers in particular go to such great lengths to tweak and optimize their windows machines for perf, but I still haven't seen that crowd discussing segment heap anywhere. It's more important than ever with the recent explosion in RAM cost.