ajross 16 hours ago

That's valid jargon, but from the wrong layer of the stack. A Harvard bus is about separating "instruction" memory from "data" memory so that a pipelined processor can fetch instructions and data in parallel. In practice it's implemented at the L1 (and sometimes L2) cache level, where separate icache/dcache blocks sit in front of a conceptually unified[1] memory space.

The "Von Neumann architecture" is the more basic idea that all the computation state outside the processor exists as a linear range of memory addresses which can be accessed randomly.
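To make that concrete, here's a toy model of the idea (my illustration, not from the comment): memory as one flat, byte-addressable array, where any word can be reached by computing its address.

```python
import struct

# Toy flat address space: 64 bytes of "memory" that the CPU (and,
# conceptually, every device) sees as one linear range of addresses.
memory = bytearray(64)

def store_word(addr, value):
    # Write a 32-bit little-endian word at an arbitrary computed address.
    struct.pack_into("<I", memory, addr, value)

def load_word(addr):
    # Read it back; any address, any order -- that's the random-access part.
    return struct.unpack_from("<I", memory, addr)[0]

store_word(0, 10)
store_word(4, 20)
store_word(60, 30)
assert load_word(60) == 30
assert load_word(0) == 10
```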

And the (largely correct) argument in the linked article is that ML computation is a poor fit for Von Neumann machines: all the work needed to present that unified picture of memory to every individual device is largely wasted, since (1) very little computation is actually done per individual fetch, and (2) the connections between the neurons are highly structured in practice (specific tensor rows and columns always go to the same places). A simpler architecture might therefore be a better use of die space.
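Point (2) can be sketched in a few lines (my illustration, using a dense layer as the example): in y = W @ x, the dataflow is fixed by the loop structure, so row i of W always feeds output i, and the access pattern is known before the program runs.

```python
import numpy as np

# Sketch of a dense layer y = W @ x. The "wiring" is static: output
# neuron i always reads exactly row i of W and all of x. No dynamic,
# data-dependent addressing is ever needed, which is why generic
# random-access memory machinery does little useful work here.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # weights: 4 outputs, 8 inputs
x = rng.standard_normal(8)        # input activations

y = np.empty(4)
for i in range(4):                # each output neuron...
    y[i] = W[i, :] @ x            # ...always consumes the same row of W

assert np.allclose(y, W @ x)      # identical to the fused matmul
```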

[1] Not actually unified, because there are page translations, IOMMUs, fabric mappings, and security boundaries all over the place that prevent different pieces of hardware from actually seeing the same memory. But that's the idea anyway.