ryao 2 days ago

This is the result of an industry-wide problem: technology is just not moving forward as quickly as it used to. Dennard scaling is dead. Moore's Law is also dead for SRAM and I/O logic. It is barely clinging to life for compute logic, but costs are skyrocketing with each die shrink. The result is that we are getting anemic improvements. The issue is visible in Nvidia's graphics offerings too. They are not improving from generation to generation like they did in the past, despite Nvidia turning every knob it could to higher values to keep the party going (e.g. power, die area, price, etcetera).

timschmidt 2 days ago | parent [-]

Jim Keller disagrees: https://www.youtube.com/watch?v=oIG9ztQw2Gc

ryao a day ago | parent | next [-]

That talk predates the death of SRAM scaling. I will not waste my time watching a video that is out of date.

That said, note that I did not say Moore's Law was entirely dead. It is dead for SRAM and I/O logic, but it is still around for compute logic. However, with each die shrink, pricing is shooting upward far faster than it did in the past.

pjmlp a day ago | parent | prev [-]

Hardware improvements only matter to the extent software is actually able to make use of them.

timschmidt a day ago | parent [-]

And? Software is getting more sophisticated and capable too. The first time I switched an iter to a par_iter in Rust and saw the loop spawn as many threads as I have logical cores, it felt like magic. Writing multi-threaded code used to be challenging.
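
In rayon terms, that switch is a one-word diff. A minimal sketch (the workload here is illustrative, not from any particular project):

    // Minimal sketch using the rayon crate; the workload is made up.
    use rayon::prelude::*;

    fn main() {
        let inputs: Vec<u64> = (0..10_000_000).collect();

        // Sequential version: runs on a single core.
        let serial: u64 = inputs.iter().map(|n| n * n).sum();

        // One-word change: rayon's work-stealing pool fans the same
        // loop out across every logical core by default.
        let parallel: u64 = inputs.par_iter().map(|n| n * n).sum();

        assert_eq!(serial, parallel);
    }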

pjmlp a day ago | parent [-]

Now make that multi-threaded code exhaust a 32-core desktop system, all the time, not only at peak execution.

For brownie points, keep the GPU busy as well, beyond twiddling its thumbs while keeping the GUI desktop going.

Even more points if the CPU happens to have an NPU or integrated FPGA and you manage to keep them going alongside those 32 cores and the GPU.

timschmidt a day ago | parent [-]

> Now make that multi-threaded code exhaust a 32-core desktop system

Switching an iter to par_iter does this. So long as there are enough iterations to work through, it'll exhaust 1024 cores or more.
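
Assuming rayon for the sake of the sketch: the pool size is configurable, so the same par_iter code runs unchanged whether the pool has 8 workers or 1024.

    // Sketch: pinning the pool size explicitly. 1024 is illustrative;
    // rayon defaults to one worker per logical core.
    use rayon::prelude::*;
    use rayon::ThreadPoolBuilder;

    fn main() {
        let pool = ThreadPoolBuilder::new()
            .num_threads(1024)
            .build()
            .expect("failed to build thread pool");

        // With enough iterations left to steal, every worker stays fed.
        let sum: u64 = pool.install(|| (0..1_000_000_000u64).into_par_iter().sum());
        println!("{sum}");
    }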

> all the time, not only at peak execution.

What are you doing that keeps a desktop or phone at 100% utilization? That kind of workload exists in datacenters, but end-user devices are inherently bursty: idle when not in use, racing to idle while in use.

> As brownie points, keep the GPU busy as well... Even more points if the CPU happens to have a NPU or integrated FPGA

In a recent project, I serve a WASM binary from an ESP32 over Wi-Fi / HTTP; the binary uses the GPU via WebGL to draw the GUI, performs CSG, calculates toolpaths, and drip-feeds motion-control commands back to the ESP32. This took about 12k lines of Rust, including the multithreaded CAD library I wrote for the project, only a couple hundred lines of which are gated behind the "parallel" feature flag. It was far less work than the inferior C++ version I wrote as part of the RepRap project 20 years ago. Hence my stance that software has become increasingly sophisticated.
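
The gating looks roughly like this (a hypothetical sketch in the spirit of csgrs's "parallel" feature flag; the function and names are illustrative, not the crate's actual API):

    // Hypothetical sketch of gating parallelism behind a Cargo feature,
    // in the spirit of csgrs's "parallel" flag; names are illustrative.
    #[cfg(feature = "parallel")]
    use rayon::prelude::*;

    /// Scale every vertex in place, in parallel when the feature is on.
    pub fn scale_vertices(vertices: &mut [[f64; 3]], factor: f64) {
        #[cfg(feature = "parallel")]
        vertices
            .par_iter_mut()
            .for_each(|v| v.iter_mut().for_each(|c| *c *= factor));

        #[cfg(not(feature = "parallel"))]
        vertices
            .iter_mut()
            .for_each(|v| v.iter_mut().for_each(|c| *c *= factor));
    }

The rest of the crate compiles identically either way, which is how a couple hundred gated lines can stay a small fraction of 12k.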

https://github.com/timschmidt/alumina-firmware

https://github.com/timschmidt/alumina-ui

https://github.com/timschmidt/csgrs

What's your point?

pjmlp a day ago | parent [-]

The point is that those are very niche cases, and they still don't keep the hardware busy 24/7, as it should be.

Most consumer software does even less, which is why you will hardly ever see a computer at the shopping mall with more than 16 cores; on average, most shops stock something between 4 and 8.

It is also a reason why systems with built-in FPGAs failed in the consumer market: specialised tools without consumer software to help sell them.

timschmidt a day ago | parent [-]

> don't keep the hardware busy 24/7

If your workload demands 24/7 100% CPU usage, Epyc and Xeon are for you. There you can have multiple sockets with 256 or more cores each.

> Most consumer software even less

And yet, even in consumer gear, which is built to a minimum-spec budget, core counts, memory capacity, PCIe lanes, bus bandwidth, IPC, cache sizes, GPU shader counts, and NPU TOPS are all increasing year over year.

> systems with built-in FPGAs failed in the consumer market

Talk about niche. I've never met an end user with a use for an FPGA or the willingness to learn what one is. I'd say that has more to do with it. Write a killer app that regular folks want to use and that requires an FPGA, and FPGAs will become popular. Rooting for you.

pjmlp 12 hours ago | parent [-]

You have to root for those hardware designers to get software devs in quantity actually using what they produce, at scale.