| ▲ | travisgriggs 10 hours ago |
| I had my formative years in programming when memory usage was something you still worried about as a programmer. And then memory expanded so much that all kinds of “optimal” patterns for programming just become nearly irrelevant. Will we start to actually consider this in software solutions again as a result? |
|
| ▲ | fulafel 9 hours ago | parent | next [-] |
| You're right in terms of fitting your program into memory, so that it can run in the first place. But the speed of RAM relative to computation has dropped so much that it's common wisdom to treat today's cache as the RAM of old (and today's RAM as the disk of old, etc.). Software performance work has been all about hitting the cache for a long time. LLMs aren't too amenable to caching, though. |
| |
| ▲ | makapuf 8 hours ago | parent | next [-] | | AFAIK, you can't explicitly allocate cache the way you allocate RAM, however. It's a bit as if you could only work on files and RAM were used as the cache. Maybe I am mistaken? (Edit: typo) | | |
| ▲ | KeplerBoy 3 hours ago | parent | next [-] | | You can in CUDA. You can have shared memory which is basically L1 cache you have full control over. It's called shared memory because all threads within a block (which reside on a common SM) have fast access to it. The downside: you now have less regular L1 cache. | |
| ▲ | lou1306 7 hours ago | parent | prev [-] | | You can't explicitly allocate cache, but you can lay things out in memory to minimize cache misses. | | |
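As an illustration of the layout point above, here is a minimal C sketch of the classic array-of-structs vs. struct-of-arrays trade (the struct fields and sizes are made up for the example). Summing just the x field touches every cache line in the AoS layout, but only a third of the lines in the SoA layout:

```c
#include <stddef.h>

struct particle_aos { double x, y, z; };

struct particles_soa {
    double x[1024];
    double y[1024];
    double z[1024];
};

double sum_x_aos(const struct particle_aos *p, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;      /* strided access: drags y and z through the cache too */
    return s;
}

double sum_x_soa(const struct particles_soa *p, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p->x[i];     /* sequential access: every byte fetched is used */
    return s;
}
```

Both functions compute the same sum; only the memory traffic differs, which is exactly the kind of "layout, not allocation" control the parent comment describes.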
| ▲ | tux3 5 hours ago | parent | next [-] | | A fun fact for the people who like to go on rabbit holes. There is an x86 technique called cache-as-RAM (CAR) that allows you to explicitly allocate a range of memory to be stored directly in cache, avoiding the DRAM entirely. CAR is often used in early boot before the DRAM is initialized. It works because the x86 disable cache bit actually only decouples the cache from the memory, but the CPU will still use the cache if you primed it with valid cache lines before setting the cache disable bit. So the technique is to mark a particular range of memory as write-back cacheable, prime the cache with valid cache lines for the entire region, and then set the bit to decouple the cache from memory. Now every access to this memory region is a cache hit that doesn't write back to DRAM. The one downside is that when CAR is on, any cache you don't allocate as memory is wasted. You could allocate only half the cache as RAM to a particular memory region, but the disable bit is global, so the other half would just sit idle. | |
| ▲ | crakenzak 6 hours ago | parent | prev [-] | | Out of curiosity, why has there not been a slight paradigm shift in modern system programming languages to expose more control over the caches? | | |
| ▲ | pjc50 4 hours ago | parent | next [-] | | Same as the failure of Itanium VLIW instructions: you don't actually want to force the decision of what is in the cache back to compile time, when the relevant information is better available at runtime. Also, additional information on instructions costs instruction bandwidth and I-cache. | | |
| ▲ | david-gpu an hour ago | parent [-] | | > you don't actually want to force the decision of what is in the cache back to compile time, when the relevant information is better available at runtime That is very context-dependent. In high-performance code having explicit control over caches can be very beneficial. CUDA and similar give you that ability and it is used extensively. Now, for general "I wrote some code and want the hardware to run it fast with little effort from my side", I agree that transparent caches are the way. | | |
| |
| ▲ | greybcg 5 hours ago | parent | prev | next [-] | | There are cache control instructions already. The reason it goes no further than prefetch/invalidate hints is probably that exposing a fuller API at the chip level to control the cache would overcomplicate designs and would not be a backwards-compatible, stable API. Treating the cache as RAM would also require a controller, which then also needs to receive instructions, or the CPU would suddenly have to manage the cache itself. I can understand why they just decide to bake the cache algorithms into hardware, validate it, and be done with it. I'd love it if a hardware engineer or more well-read fellow could chime in. | | |
| ▲ | Someone 5 hours ago | parent [-] | | Another reason for doing cache algorithms in hardware is that cache access (especially for level 1 caches) has to be low latency to be useful. |
| |
| ▲ | Tuna-Fish 5 hours ago | parent | prev | next [-] | | Because programmers are in general worse at managing them than the basic LRU algorithm. And because the abstraction is simple and easy enough to understand that when you do need close control, it's easy to achieve by just writing to the abstraction. Careful control of data layout and nontemporal instructions are almost always all you need. | |
| ▲ | zozbot234 3 hours ago | parent | prev | next [-] | | This is not applicable to most programming scenarios since the cache gets trashed unpredictably during context switches (including the user-level task switches involved in cooperative async patterns). It's not a true scratchpad storage, and turning it into one would slow down context switches a lot since the scratchpad would be processor state. Maybe this can be revisited once even low-end computers have so many hardware cores/threads that context switches become so rare that the overhead is not a big deal. But we are very far from anything of the sort. | |
| ▲ | rwmj 5 hours ago | parent | prev [-] | | There has! Intel has Cache Acceleration Technology, and I was very peripherally involved in reviewing research projects at Boston University into this. One that I remember was allowing the operating system to divide up cache and memory bandwidth for better prioritization. https://www.intel.com/content/www/us/en/developer/articles/t... |
|
|
| |
| ▲ | seanmcdirmid 8 hours ago | parent | prev [-] | | LLMs need memory bandwidth to stream lots of data through quickly, not so much caching. This is basically the same way a GPU uses memory. | | |
| ▲ | zozbot234 3 hours ago | parent [-] | | OTOH, LLM inference tends to have very predictable memory access patterns. So well-placed prefetch instructions that can execute predictable memory fetches in parallel with expensive compute might help CPU performance quite a bit. I assume that this is done already as part of optimized numerical primitives such as GEMM, since that's where most of the gain would be. |
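A minimal sketch of the prefetching idea from the comment above, assuming GCC/Clang's `__builtin_prefetch` intrinsic; the prefetch distance is a made-up tuning parameter, not a magic number, and real GEMM kernels do this far more carefully:

```c
#include <stddef.h>

/* How far ahead to prefetch, in elements. Needs per-machine tuning. */
#define PREFETCH_DIST 16

double dot(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n) {
            /* Hint the hardware to start fetching future elements now,
             * overlapping the memory latency with the multiply-adds.
             * Args: address, rw (0 = read), temporal locality (0 = low). */
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 0);
            __builtin_prefetch(&b[i + PREFETCH_DIST], 0, 0);
        }
        s += a[i] * b[i];
    }
    return s;
}
```

The prefetches are pure hints: they never change the result, only (potentially) the latency, which is why they can be sprinkled into hot loops with predictable access patterns.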
|
|
|
| ▲ | dahcryn 3 hours ago | parent | prev | next [-] |
| I've actively started using Outlook and Teams through Chrome to free up some of my RAM; it easily saves 3-4 GB. It's gotten ridiculous how much RAM basic tools use, leaving nothing for actual work. |
| |
| ▲ | christophilus 40 minutes ago | parent | next [-] | | People get on me all the time about not installing programs on my computer. I run everything in the browser, if I can. Partly so I can kill it properly without it misbehaving, and partly because I don't trust their software at all. Zoom, Slack, Gmail, etc. -- if I can run it in the browser, then that's the only way I'll run it. | |
| ▲ | emeril 24 minutes ago | parent | prev [-] | | I've found the web versions use a similar amount of memory and have fewer features. My issue is that my company won't issue laptops with more than 16 GB of RAM. Guess I'm not virtualizing anything... |
|
|
| ▲ | throw0101a 30 minutes ago | parent | prev | next [-] |
| > I had my formative years in programming when memory usage was something you still worried about as a programmer. As 'just' a user of MS-DOS in the 1990s, fiddling with QEMM was a bit of a craft to get what you wanted to run in the memory you had. * https://en.wikipedia.org/wiki/QEMM (Also, DESQview was awesome.) |
|
| ▲ | jacquesm 9 hours ago | parent | prev | next [-] |
| > And then memory expanded so much that all kinds of “optimal” patterns for programming just become nearly irrelevant. I don't think that ever happened. Using a relatively sparse amount of memory translates into better cache utilization, which in turn usually improves performance drastically. And in embedded work, being good with memory management can make the difference between 'works' and 'fails'. |
| |
| ▲ | zeta0134 8 hours ago | parent | next [-] | | The need to use optimal patterns didn't go away, but the techniques certainly did. Just as a quick example, it's usually a bad idea now to use lookup tables to accelerate small math workloads. The lookup table creates memory pressure on the cache, which ends up degrading performance on modern systems. Back in the 1980s, lookup tables were by far the dominant technique because math was *slow.* | | |
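To make the parent's trade-off concrete, here is a hypothetical C comparison (function names are mine): a 256-byte lookup table for popcount, the classic 1980s-style approach, next to a branch-free computed version. The table competes for L1 data cache with the rest of the workload; the SWAR version costs a handful of ALU ops and touches no memory at all:

```c
#include <stdint.h>

/* 1980s approach: precomputed per-byte bit counts. */
static uint8_t lut[256];

void init_lut(void) {
    for (int i = 0; i < 256; i++) {
        int c = 0;
        for (int b = i; b; b >>= 1) c += b & 1;
        lut[i] = (uint8_t)c;
    }
}

int popcount_lut(uint32_t x) {
    return lut[x & 0xff] + lut[(x >> 8) & 0xff]
         + lut[(x >> 16) & 0xff] + lut[x >> 24];
}

/* Modern approach: classic SWAR bit trick, no memory traffic. */
int popcount_alu(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (int)((x * 0x01010101u) >> 24);
}
```

On a 1980s machine the table version wins easily; on a modern one the ALU version (or a hardware popcount instruction) usually does, precisely because the table evicts more useful cache lines.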
| ▲ | zozbot234 3 hours ago | parent | next [-] | | > Back in the 1980s, lookup tables were by far the dominant technique because math was slow. This actually generalizes in a rather clean way: compared to the 1980s, you now want to cheaply compress data in memory and use succinct representations as much as practicable, since the extra compute involved in translating a more succinct representation into real data is practically free compared to even one extra cacheline fetch from RAM (which is now hundreds of cycles latency, and in parallel code often has surprisingly low throughput). | | |
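As a toy illustration of the succinct-representation point above (helper names are mine, not from the thread): storing flags as one bit each instead of one byte or one int. The shift-and-mask to unpack a bit is essentially free next to a cacheline fetch, and the 8x-32x smaller footprint means far more of the data stays cache-resident:

```c
#include <stdint.h>
#include <stddef.h>

/* Set bit i in a packed array of 64-bit words. */
static inline void bitset_set(uint64_t *bits, size_t i) {
    bits[i >> 6] |= (uint64_t)1 << (i & 63);
}

/* Read bit i back out: one load plus a shift and mask. */
static inline int bitset_get(const uint64_t *bits, size_t i) {
    return (int)((bits[i >> 6] >> (i & 63)) & 1);
}
```

A million flags fit in ~122 KB this way instead of ~4 MB as ints, which is the difference between living in L2 and thrashing out to RAM.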
| ▲ | QuadmasterXLII 2 hours ago | parent [-] | | It’s a mad world where ultimate performance in one problem can require compressing data in RAM and in another storing it uncompressed on disc. | | |
| ▲ | bonesss 2 hours ago | parent [-] | | The same atmosphere that makes bread hard makes crackers soft. |
|
| |
| ▲ | jacquesm 3 hours ago | parent | prev [-] | | The way to approach this is to benchmark and then pick the best solution. |
| |
| ▲ | _fizz_buzz_ 4 hours ago | parent | prev | next [-] | | It obviously never became completely irrelevant. But I think programmers spend a lot less time thinking about memory than they used to. People used to do a lot of gymnastics and crazy optimizations to fit stuff into memory. I do quite a bit of embedded programming, and most of the time it seems easier for me to simply upgrade the MCU and spend 10 cents more (or whatever) than to make any crazy optimizations. But of course there are still cases where it makes sense. | |
| ▲ | yread 7 hours ago | parent | prev [-] | | When was the last time you used mergesort because you had to? | | |
| ▲ | jacquesm 3 hours ago | parent [-] | | Coincidentally, last night, and I'm not pulling your leg! But to be fair that's the first time in much more than a decade. I don't normally work with such huge files and this was one very rare exception. I also nearly crashed my machine by triggering the OOM killer after naively typing 'vi file' without first checking how large it had become. I'm working on a project that I probably should run on a more serious machine, but I don't feel like moving my whole work environment off the laptop that I normally use. |
|
|
|
| ▲ | rTX5CMRXIfFG 9 hours ago | parent | prev | next [-] |
| I never really bought into the anti-Leetcode crowd’s sentiment that it’s irrelevant. It has always mattered as a competitive edge: against other job candidates if you’re an employee, or against the competition if you’re a company. It only looked irrelevant because opportunities were everywhere during ZIRP, but good times never last. |
| |
| ▲ | ponector 3 hours ago | parent | next [-] | | It mattered for passing the interview, but not for the job itself. With all the leetcode geniuses at Microsoft, why are Teams and Windows so shitty? | | |
| ▲ | majewsky 2 hours ago | parent [-] | | Because they are only allowed to review what the LLM has come up with. |
| |
| ▲ | raw_anon_1111 8 hours ago | parent | prev [-] | | Most developers work at banks, insurance companies and other “enterprise” jobs. Even most developers at BigTech and who are working “at scale” are building on top of scalable infrastructure and aren’t worrying about reversing a btree on a whiteboard. | | |
| ▲ | AdamN 4 hours ago | parent | next [-] | | Agree that the whiteboard thing is often not applicable, but it's so nice when a developer writes efficient code, if only because it indicates that they know what's going on, and that there are fewer bugs and other bottlenecks in the system. | | |
| ▲ | raw_anon_1111 2 hours ago | parent [-] | | Those bugs don’t come from using the wrong algorithm, they come from not understanding the business case of what you’re writing. Most performance issues in the real world for most cases don’t have anything to do with the code. It’s networking, databases, etc. Your login isn’t slow because the developer couldn’t do leetcode |
| |
| ▲ | 8 hours ago | parent | prev [-] | | [deleted] |
|
|
|
| ▲ | cyberrock 6 hours ago | parent | prev | next [-] |
| It's not like most developers are wasting memory for fun by using Electron etc. It's just the simplest way to deploy applications that require frequent multiplatform changes. Until you get Apple to approve native app changes faster and Linux users to agree on a framework, app distribution, etc., it's the most practical way to ship a product and not just a program. |
| |
| ▲ | close04 6 hours ago | parent [-] | | > for fun Not for fun but for convenience (laziness occasionally?). Someone needed to "pay" for the app being available on all platforms. Either the programmer by coding and optimizing multiple times, or the user by using a bloated unoptimized piece of software. The choice was made to have the user pay. It's been so long I doubt recent generations of coders could even do it differently. |
|
|
| ▲ | zarzavat 3 hours ago | parent | prev | next [-] |
| RAM didn't get more expensive to produce. It just got more desirable. The prices will come down again when supply responds. It may take some time, but it will happen eventually. |
| |
| ▲ | StopDisinfo910 3 hours ago | parent | next [-] | | RAM production is highly inelastic and controlled by an oligopoly. The manufacturers have little desire to increase production given the lead time and the risk that AI demand might be transient. They actively prefer keeping comfortable margins to competing with each other, and they have been found guilty of active collusion in the past. New actors from China could shake things up a bit, but the geopolitical situation makes that complicated. The market can stay broken for a long time. | | |
| ▲ | zozbot234 3 hours ago | parent [-] | | They are increasing production as fast as they can (which is not fast at all, it's more like slowly steering a huge ship towards the correct direction) because current prices are too high even when accounting for the historical oligopoly dynamics. They can easily increase their collective profits by making more. |
| |
| ▲ | zozbot234 3 hours ago | parent | prev [-] | | RAM actually got more expensive to produce in the medium term because production is bottlenecked. It takes years to expand production. |
|
|
| ▲ | yxhuvud 5 hours ago | parent | prev | next [-] |
| We would have, if the expensive memory was a long term trend. It is not - eventually the supply will expand to match demand. There is no fundamental lack of raw materials underlying the issues, it is just a demand shock. |
| |
| ▲ | junon 5 hours ago | parent [-] | | Also, it's not like we have regressed in the process itself either, which was historically the limiting factor. As you said this is purely an economics thing resulting from a greedy shift in business focus by e.g. Micron. |
|
|
| ▲ | jooz 5 hours ago | parent | prev | next [-] |
| When I practice leetcode problems, I remember the best solution was often the one that optimized CPU time instead of memory, meaning adding a data index in memory instead of iterating over the main data structure. I thought, OK, that's fine, it's normal: you can (could) always buy more RAM, but you can't buy more time. But there is no single right answer; there will always be a trade-off, case by case, depending on the context. |
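The trade the comment above describes can be sketched in C (names and the `MAXV` bound are mine; a real solution would use a proper hash table): a no-extra-memory O(n^2) scan versus an O(n) version that spends memory on an index, here a direct-address table over values assumed to lie in [0, MAXV):

```c
#include <string.h>

#define MAXV 1000  /* assumed upper bound on element values */

/* O(n^2) time, O(1) extra memory: check every pair. */
int pair_sum_scan(const int *a, int n, int target) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (a[i] + a[j] == target) return 1;
    return 0;
}

/* O(n) time, O(MAXV) extra memory: remember values already seen. */
int pair_sum_index(const int *a, int n, int target) {
    char seen[MAXV];
    memset(seen, 0, sizeof seen);
    for (int i = 0; i < n; i++) {
        int need = target - a[i];
        if (need >= 0 && need < MAXV && seen[need]) return 1;
        if (a[i] >= 0 && a[i] < MAXV) seen[a[i]] = 1;
    }
    return 0;
}
```

Which one is "best" depends on n, on how much memory the index costs, and, per the rest of this thread, on whether that index stays in cache.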
|
| ▲ | lmcd 6 hours ago | parent | prev | next [-] |
| I've recently started a side project for the N64, and this is very relatable!
Working within such tight constraints is most of the fun. |
|
| ▲ | ReedorReed 8 hours ago | parent | prev | next [-] |
| I just heard a podcast where they talked about how powerful our devices are today, yet they don't feel faster than they did 15 years ago, precisely because of what you write here. |
| |
| ▲ | AdamN 4 hours ago | parent [-] | | A lot of that is on the OS vendors (and security requirements drive some inefficiencies that didn't used to be needed either). |
|
|
| ▲ | NooneAtAll3 8 hours ago | parent | prev [-] |
| Most likely this bubble will pop in a couple of years, just like 8 and 16 years ago. It's just a cartel cycle: rake in profits now, then kill off all investment in competitors when a flood of cheap RAM "suddenly" appears. |
| |
| ▲ | thfuran 8 hours ago | parent | next [-] | | This is coming from an insane demand spike, not some nefarious plot by the RAM manufacturers. | | |
| ▲ | sekai 5 hours ago | parent | next [-] | | > This is coming from an insane demand spike, not some nefarious plot by the RAM manufacturers. Something something, 2000 dot-com bubble, something | |
| ▲ | pipes 5 hours ago | parent | prev | next [-] | | I can never understand why so many people resort to conspiracy theories when the obvious answer is supply and demand. I know well-educated people who do this when they talk about the residential property market (including an accountant). | |
| ▲ | inigyou 2 hours ago | parent [-] | | Supply and demand can be caused by a conspiracy. OpenAI secretly bought 40% of the world's RAM on purpose. It's only a conspiracy if Anthropic and Google did something similar, though. |
| |
| ▲ | jonathanlydall 3 hours ago | parent | prev | next [-] | | Which is in large part due to hoarding by OpenAI. Although their stated reason for hoarding is that they "really need it", I think it was a strategic move to make their competitors' lives more difficult with little regard for the collateral consequences to non-competitors, such as regular people or companies needing new computers. | |
| ▲ | cyanydeez 8 hours ago | parent | prev [-] | | Yes, it's a nefarious plot of AI producers to attempt a monopoly with a product that no one seems capable of demonstrating has the exponential value they're betting on. | | |
| ▲ | adornKey 6 hours ago | parent | next [-] | | Once everybody has a decent amount of VRAM they can just run local AIs and the need to mess with Ad-laden search results will fizzle. So of course they are desperate to grab a new monopoly. People haven't realised yet, that local AIs are fast and produce good results - on pretty average hardware. If they don't manage to grab a new monopoly Google will be history. But it doesn't really need a nefarious plot for the price spikes. There is a serious lack of VRAM deployed out there. Filling that gap will take quite some time. Add to that the nefarious plot and the situation will most likely get even worse.... | | | |
| ▲ | jug 3 hours ago | parent | prev [-] | | AI companies yes, RAM manufacturers no. |
|
| |
| ▲ | seanmcdirmid 8 hours ago | parent | prev [-] | | Eventually new capacity will come online, and the money the DRAM companies are making is going to accelerate even more new capacity. If you can get your new capacity going before your competitors, maybe you can avoid a bubble burst. If you don't build new capacity, your competitors will, etc., etc… | | |
| ▲ | debugnik 5 hours ago | parent [-] | | They're not building any new manufacturing capacity though. They assume this is a demand bubble and they don't want supply to exceed demand after it pops. |
|
|