jobead 4 hours ago

Why is everyone so focused on heat when cosmic rays randomly scrambling memory is going to be a much bigger problem?

jcranmer 4 hours ago | parent | next [-]

When I was peripherally working on some HPC stuff, one of the hardware guys commented that it mattered which national lab you were building the supercomputer for, because the guys at high altitude like Los Alamos get a lot more bit flips than someone closer to sea level like Argonne. That said, for an exascale supercomputer, the mean time between uncorrected bit flips somewhere in the machine is on the order of a few hours, which means that large supercomputer-scale workloads should actually expect to hit a bit flip in their computation.
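
To make the scaling argument concrete, here is a rough sketch of how a per-node error interval shrinks to hours at machine scale. It assumes independent nodes with constant error rates, so the rates simply add. The node count and per-node interval below are hypothetical placeholders, not measurements from any real machine.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, node_count: int) -> float:
    """Mean time between errors anywhere in the machine.

    Assumes independent nodes with identical, constant error rates,
    so the machine-wide rate is node_count times the per-node rate.
    """
    return node_mtbf_hours / node_count


def prob_at_least_one_error(job_hours: float, mtbf_hours: float) -> float:
    """Chance that a job of the given length sees at least one error,
    under the same constant-rate (exponential) assumption."""
    return 1.0 - math.exp(-job_hours / mtbf_hours)


# Hypothetical inputs: 10,000 nodes, each seeing one uncorrected
# bit flip every 5 years (43,800 hours) on average.
machine_mtbf = system_mtbf_hours(43_800, 10_000)
day_long_job_risk = prob_at_least_one_error(24, machine_mtbf)
print(f"machine-wide interval: {machine_mtbf:.2f} h")
print(f"risk for a 24 h job: {day_long_job_risk:.3f}")
```

With these made-up inputs a node that looks very reliable on its own still yields a machine-wide interval of a few hours, which is why long jobs at that scale have to plan for hitting one.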

Pxtl 4 hours ago | parent | prev | next [-]

Cosmic rays can probably be resolved with some parity bits and redundancy, especially since these don't have to be located way up in geosynchronous orbit; lower orbits get you more magnetosphere protection.

Not that it isn't a problem, but I think heat dissipation will have the edge.
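
For anyone wondering what "parity bits and redundancy" looks like in practice, here is a minimal sketch using a textbook Hamming(7,4) code, which stores 4 data bits with 3 parity bits and can correct any single flipped bit. This is an illustration only; real ECC memory uses wider codes, and the function names here are made up for the example.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as
    [p1, p2, d1, p3, d2, d3, d4] (1-based positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return (data_bits, syndrome).

    The syndrome is 0 when no error is detected; otherwise it is the
    1-based position of the bit that was flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # undo the single-bit upset
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a cosmic-ray bit flip at position 5
recovered, position = hamming74_decode(word)
print(recovered, position)
```

Note that a code like this only handles one flip per codeword; two flips in the same word get miscorrected, which is one reason real designs add an extra parity bit and scrub memory periodically.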

rbanffy 4 hours ago | parent | next [-]

You can also place the most sensitive electronics inside a module within the propellant tank. That also should help a lot.

convolvatron 3 hours ago | parent | prev [-]

I worked at the architecture level on designs to mitigate signal corruption inside the ASIC. I don't remember the exact numbers, and obviously it depends on the design, but you need to add error detection and correction on every path (buses, muxes, registers, function units, etc). The number I seem to recall was a 25% area overhead and a nominal decrease in clock rate. This was for an earth-bound very large machine, so idk if that would be sufficient for space. Rad-hard designs usually have much larger process nodes and much slower clock rates, primarily because of damage caused by ionizing radiation that accumulates over time.

So sure, the heat issue seems fatal, but rad-hard designs will certainly have a bottom-line impact.

cmrdporcupine 4 hours ago | parent | prev [-]

I mean, LLMs are already a pile of stochastic output .. maybe that just adds to the fun? /s

mikestorrent 3 hours ago | parent [-]

Finally, the classic BOFH excuse of sunspots causing issues will be true