lysace an hour ago

Please explain to me like I am five: Why does OpenAI need so much RAM?

2024 RAM production was (according to OpenAI/ChatGPT) about 120 billion gigabytes. With 8 billion humans, that's about 15 GB per person.

GistNoesis 6 minutes ago | parent | next [-]

What they need is not so much memory as memory bandwidth.

For training, their models need a certain amount of memory to store the parameters, and that memory is touched for every example of every iteration. Big models have 10^12 (>1T) parameters, and with typical values of 10^3 examples per batch and 10^6 iterations, they need ~10^21 memory accesses per run. And they want to do multiple runs.

DDR5 RAM bandwidth is ~100 GB/s = 10^11 B/s; graphics RAM (HBM) is ~1 TB/s = 10^12 B/s. By buying up the wafers they get to choose which type of memory gets made.

10^21 / 10^12 = 10^9 s ≈ 30 years of memory access just to update the model weights; you also need to add a factor of 10^1-10^3 to account for the memory accesses needed for the model computation itself.

But the good news is that it parallelizes extremely well. If you parallelize your 1T parameters 10^3 ways, your run time is brought down to 10^6 s ≈ 12 days. But then you need 10^3 × 10^12 = 10^15 bytes of RAM per run for the weight updates, and up to 10^18 for the computation (your 120 billion gigabytes is 10^20, so not so far off).
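
A minimal back-of-envelope sketch of those numbers in Python; every figure in it is just the round order of magnitude quoted above, not a measured value:

    # Rough orders of magnitude from the comment above, not real hardware specs.
    params = 1e12             # ~1T parameters
    batch = 1e3               # examples per batch
    iterations = 1e6          # training iterations
    accesses = params * batch * iterations      # ~1e21 parameter touches per run

    hbm_bandwidth = 1e12                        # ~1 TB/s for HBM
    serial_s = accesses / hbm_bandwidth         # ~1e9 s
    print(f"serial: {serial_s:.0e} s ~ {serial_s / 3.15e7:.0f} years")

    replicas = 1e3                              # parallelize 10^3 ways
    parallel_s = serial_s / replicas            # ~1e6 s
    print(f"parallel: {parallel_s:.0e} s ~ {parallel_s / 86400:.0f} days")

    weight_ram = replicas * params              # ~1e15 bytes just for weight copies
    print(f"RAM for weight copies: {weight_ram:.0e} bytes")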

Are all these memory accesses technically required? No, if you use other algorithms, but more compute and memory is better if money is not a problem.

Is it strategically good to deprive your competitors of access to memory? In a very short-sighted way, yes.

It's a textbook cornering of the computing market to prevent the emergence of local models, because customers won't be able to buy the minimum RAM necessary to run the models locally, even just for the inference part (not the training). Basically a war on people, where little Timmy won't be able to get a RAM stick to play computer games at Xmas.

mebassett an hour ago | parent | prev | next [-]

Large language models are large and must be loaded into memory, whether for training or for inference, if we want to keep them fast. Older models like GPT-3 have around 175 billion parameters; at float32 that comes out to something like 700 GB of memory. Newer models are even larger, and OpenAI wants to run them as consumer web services.
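
For what it's worth, the arithmetic behind that 700 GB figure, assuming plain float32 weights and no quantization:

    params = 175e9            # GPT-3-class parameter count
    bytes_per_param = 4       # float32
    print(params * bytes_per_param / 1e9, "GB")   # 700.0 GB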

lysace an hour ago | parent [-]

I mean, I know that much. The numbers still don't make sense to me. How is my internal model this wrong?

For one, if this was about inference, wouldn't the bottleneck be the GPU computation part?

ssl-3 24 minutes ago | parent [-]

Concurrency?

Suppose some parallelized, distributed task requires 700 GB of memory per node to accomplish (I don't know if it does or does not), and that speed is a concern.

A single 700 GB pile of memory is insufficient not because it lacks capacity, but because it lacks scalability. That pile is only enough for one node.

If more nodes were added to increase speed but they all used that same single 700GB pile, then RAM bandwidth (and latency) gets in the way.
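
To make that concrete with made-up numbers (both the 700 GB footprint and the node count here are hypothetical placeholders):

    model_ram_gb = 700        # hypothetical per-node copy of the weights
    nodes = 100               # hypothetical number of serving nodes
    # Each node gets its own copy so it isn't fighting the others for
    # bandwidth to a single shared pool of memory.
    print(nodes * model_ram_gb / 1000, "TB of RAM just for weights")   # 70.0 TB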

daemonologist 23 minutes ago | parent | prev [-]

The conspiracy theory (which, to be clear, may be correct) is that they don't actually need so much RAM, but they know they and all their competitors do still need quite a bit of RAM. By buying up all the memory supply they can, for a while, keep everyone else from being able to add compute capacity/grow their business/compete.