| ▲ | zozbot234 a day ago |
| Will these space-based data centers run on rad-hard silicon (which is dog slow compared to anything on Earth) or just silently accept wrong results, hardware lockups and permanent failure due to the harsh space environment? Will they cool that hardware with special über-expensive high-temperature Peltiers that heat the radiators up to visible incandescence so that the heat can be shed with any efficiency? There's zillions of those issues. The whole idea is just bonkers. |
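For a sense of why radiator temperature comes up at all: a surface in vacuum sheds heat only by radiation, and the Stefan-Boltzmann law makes that scale with the fourth power of temperature. A rough sketch, where the 1 MW load, the emissivity of 0.9 and the two-sided flat panel are all illustrative assumptions, and absorbed sunlight and Earth infrared are ignored:

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def radiator_area_m2(power_w, temp_k, emissivity=0.9, sides=2):
    """Panel area needed to radiate power_w at temp_k.

    Assumes an ideal flat panel radiating from `sides` faces to deep space;
    absorbed sunlight, Earth infrared and view-factor losses are ignored.
    """
    flux_w_per_m2 = emissivity * SIGMA * temp_k ** 4
    return power_w / (flux_w_per_m2 * sides)


# Hypothetical 1 MW heat load at a few radiator temperatures.
for temp_k in (300, 350, 600, 1000):
    print(f"{temp_k} K -> {radiator_area_m2(1e6, temp_k):.0f} m^2")
```

Doubling the radiator temperature cuts the required area by a factor of 16, which is the whole appeal of running the radiators hot.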
|
| ▲ | eldenring 20 hours ago | parent | next [-] |
| Google did a study with their TPU v6e: > For ML accelerators to be effective in space, they must withstand the environment of low-Earth orbit. We tested Trillium, Google’s v6e Cloud TPU, in a 67MeV proton beam to test for impact from total ionizing dose (TID) and single event effects (SEEs).
>
> The results were promising. While the High Bandwidth Memory (HBM) subsystems were the most sensitive component, they only began showing irregularities after a cumulative dose of 2 krad(Si) — nearly three times the expected (shielded) five year mission dose of 750 rad(Si). No hard failures were attributable to TID up to the maximum tested dose of 15 krad(Si) on a single chip, indicating that Trillium TPUs are surprisingly radiation-hard for space applications. |
| |
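The margins in that quote are easy to check. A small sketch using only the three figures quoted (750 rad(Si) expected over five years, 2 krad(Si) for the first HBM irregularities, 15 krad(Si) maximum tested); the assumption that dose accumulates linearly over time is mine, not the study's:

```python
EXPECTED_MISSION_DOSE_RAD = 750   # shielded five-year dose, rad(Si), from the quote
MISSION_YEARS = 5
HBM_ONSET_RAD = 2_000             # first HBM irregularities, 2 krad(Si)
MAX_TESTED_RAD = 15_000           # highest dose tested, 15 krad(Si)


def dose_margin(threshold_rad, mission_rad=EXPECTED_MISSION_DOSE_RAD):
    """How many mission doses fit under the given threshold."""
    return threshold_rad / mission_rad


def years_to_reach(threshold_rad,
                   mission_rad=EXPECTED_MISSION_DOSE_RAD,
                   mission_years=MISSION_YEARS):
    """Years until the threshold is reached, assuming a constant dose rate."""
    return threshold_rad / (mission_rad / mission_years)


print(f"HBM onset margin:  {dose_margin(HBM_ONSET_RAD):.2f}x")
print(f"Max tested margin: {dose_margin(MAX_TESTED_RAD):.0f}x")
print(f"Years to HBM onset at constant rate: {years_to_reach(HBM_ONSET_RAD):.1f}")
```

This covers total ionizing dose only; it says nothing about the rate of single event effects, which the quoted passage treats separately.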
|
| ▲ | kragen a day ago | parent | prev | next [-] |
| At Satellogic, we famously flew mostly just regular cellphone parts on orbit. We did have higher rates of various kinds of failures than is usual on Earth, but hardware failure can generally be masked by software redundancy. |
| |
| ▲ | klysm a day ago | parent [-] | | RAM corruption is not cheap to protect against | | |
| ▲ | kragen 20 hours ago | parent | next [-] | | You need parity, which is cheap, or lockstep duplexing, which isn't. Or, you know, sometimes you can just restart malfunctioning processes and repair corrupted filesystems while you run the failed tasks again on another node. | |
| ▲ | Onavo 21 hours ago | parent | prev [-] | | At today's prices perhaps, but pre-ChatGPT you just had to run more of it + more error correction. Not great for the power budget but not anything significant in the grand scheme of things. |
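To make the "parity is cheap" point from this subthread concrete, here is a minimal sketch of a single even-parity bit per stored word. The helper names are made up for illustration, and real memory does this in hardware (usually with ECC that can also correct), so treat it as a toy model of the idea only:

```python
def parity_bit(word):
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1


def store(word):
    """Keep the word together with the parity bit computed at write time."""
    return (word, parity_bit(word))


def is_intact(stored):
    """Recompute parity on read and compare with the stored bit."""
    word, parity = stored
    return parity_bit(word) == parity


def flip_bit(word, bit):
    """Simulate a radiation-induced upset of one bit."""
    return word ^ (1 << bit)


word, parity = store(0b10110010)
single_upset = (flip_bit(word, 3), parity)
double_upset = (flip_bit(flip_bit(word, 3), 5), parity)

print(is_intact((word, parity)))   # clean word passes the check
print(is_intact(single_upset))     # one flipped bit is detected
print(is_intact(double_upset))     # two flipped bits cancel out and slip through
```

One extra bit per word detects any odd number of flips but cannot say which bit flipped, so recovery still means re-reading or recomputing, which is where restarting the task on another node comes in.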
|
|
|
| ▲ | wmf a day ago | parent | prev | next [-] |
| > rad-hard silicon ... or just silently accept wrong results, hardware lockups and permanent failure Somehow I don't think those are the only options. AFAIK Starlink is using a lot of non-rad-hard silicon already. |
| |
| ▲ | danpalmer a day ago | parent | next [-] | | Starlink is however operating at ~500km where radiation is less of a concern, but where the lifetime of a satellite is only 2-3 years. The unit economics of orbital GPUs suggest that we'll need to run them for much longer than that. This is actually one of the few good points of orbital data centers, normally older hardware is cycled out because it's not economic to run anymore due to power efficiency improvements, but if your power is "free" and you've already got sufficient solar power onboard for the compute, you can just keep running old compute as long as you can keep the satellite up there. | | |
| ▲ | wmf a day ago | parent [-] | | I think they last 2-3 years after they run out of argon fuel, so more like 7-8 years total. It looks like some Starlinks from Nov 2019 are still operational. | | |
| ▲ | danpalmer a day ago | parent | next [-] | | My understanding was that anything at ~500km needed readjustments every few months in order to not come down. Much less than 2-3 years. I'd be interested to know what the average lifespan or failure rate of Starlink has been. That's good that some are still up there 6+ years later, but I know many aren't. I'm not sure how many of those ran out of fuel, had hardware failures, or were simply obsolete, but an AFR would be interesting to see. | | | |
| ▲ | perihelions a day ago | parent | prev [-] | | Or in theory, indefinitely, https://news.ycombinator.com/item?id=16527007 ("First firing of air-breathing electric thruster (esa.int)" (2018)) |
|
| |
| ▲ | johnsmith1840 a day ago | parent | prev | next [-] | | My understanding is that non-rad-hardened methods get around this by basically doubling (or repeating some larger multiple of times) calculations and checking data often. Random errors will occur; you just need to be checking fast enough to fix and update that bad bit flip. I am sure there's all sorts of fun algorithms in this space but I am under the impression there is SOME tax to doing this. What is the tax? Is it 10% or 60%? I have no idea, would love to know! | | |
| ▲ | marcosdumay a day ago | parent | next [-] | | Why make a GW datacenter on the ground if you can make two and pay to launch them into space? | |
| ▲ | danpalmer a day ago | parent | prev [-] | | There's more than that, it's possible to get permanent hardware damage from radiation at smaller (modern standard) process sizes. | | |
| ▲ | johnsmith1840 a day ago | parent [-] | | I didn't think about that, so yeah, basically space based compute centers are just hype on top of hype. |
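The repeat-and-check idea raised at the top of this subthread can be sketched as N-modular redundancy with a majority vote. Everything here (the function names, the injected fault, the toy `square` workload) is hypothetical and only meant to show the shape of the technique:

```python
from collections import Counter


def vote(results):
    """Return the strict-majority value, or raise if there is none."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: error detected but not correctable")
    return value


def run_redundant(fn, arg, copies=3, fault=None):
    """Run fn(arg) `copies` times and vote on the integer results.

    `fault` optionally names one run whose result gets a bit flipped,
    standing in for a single event upset.
    """
    results = []
    for i in range(copies):
        result = fn(arg)
        if i == fault:
            result ^= 1 << 7  # simulated bit flip in this copy's output
        results.append(result)
    return vote(results)


def square(x):
    return x * x


print(run_redundant(square, 12, copies=3, fault=1))  # corrupted copy is outvoted
```

In this naive form the tax is simply the number of copies: two copies double the compute and can only detect a mismatch, three copies triple it and can outvote one bad result. Real systems presumably pay much less by protecting only selected state or using checksums, but this sketch does not model that.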
|
| |
| ▲ | notahacker a day ago | parent | prev | next [-] | | Your other options, fault tolerance (typically achieved by doing everything at least twice and being willing to reboot, and accepting attrition from total ionizing radiation) or lots of shielding, are fine for building functioning space hardware but suboptimal for building datacentre business models... | |
| ▲ | enderfusion 21 hours ago | parent | prev | next [-] | | The radiation effects on the silicon solar cells are often underestimated, it's not just the GPUs! | |
| ▲ | tekno45 a day ago | parent | prev [-] | | they throw those satellites to a fiery doom on a regular cadence. |
|
|
| ▲ | Symmetry 12 hours ago | parent | prev | next [-] |
| It's very important in this case to specify which orbit the satellite is going to be in. If you're in LEO like the International Space Station you spend all day below the Van Allen belts, inside the Earth's magnetic field, protected from most of the charged particles that the sun is pumping out. You're still lacking the atmosphere's protection from cosmic rays but that's not a huge dosage. If you go out to MEO then suddenly you're flying through the radiation belts themselves, you have to deal with trapped charged particles smashing into you, and you want a large mass of water or wax shielding if you don't have radiation tolerant electronics. A dawn-dusk SSO, a low earth orbit whose plane is roughly perpendicular to the direction of the sun so it gets near-constant sunlight, is harsher than normal LEO orbits because it passes over the poles where the protection from the Earth's magnetic field is weakest, but it's still a lot better than higher orbits. This is probably where you want a datacenter to get constant sunlight and as much protection as possible. |
|
| ▲ | Fomite a day ago | parent | prev | next [-] |
| The LLMs they hope to have in those data centers already silently accept wrong results. |
|
| ▲ | recursivecaveat 19 hours ago | parent | prev | next [-] |
| Say what you will about the data centers in space idea (I think it's transparently stupid), but ML is generally resistant to random undirected noise. It's almost a requirement by definition that a machine which takes pictures and accurately outputs the probability that they are pelicans has to be pretty robust to nigh-infinite amounts of minor variation. That's part of the reason all the super low precision stuff works. It's only in the control logic or maybe the absolute precise chokepoints of computation where flips are dangerous, so most of them are harmless. |
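A quick way to see why some flips are harmless and others are not is to flip individual bits of a float32 value. This is a toy sketch using only the standard library; the weight value of 0.5 is an arbitrary example, and it says nothing about how any particular accelerator stores its weights:

```python
import struct


def flip_float32_bit(value, bit):
    """Flip one bit (0 = lowest mantissa bit, 31 = sign) of a float32."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


weight = 0.5
low_mantissa = flip_float32_bit(weight, 0)    # tiny relative change
top_exponent = flip_float32_bit(weight, 30)   # value jumps to 2**127
sign = flip_float32_bit(weight, 31)           # value becomes -0.5

print(low_mantissa, top_exponent, sign)
```

Most of the 32 bits are mantissa bits, where a flip is lost in the noise a model already tolerates, but a flip in the high exponent bits produces a huge value that can swamp everything downstream. That fits the point about a few chokepoints being the dangerous ones.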
|
| ▲ | JumpCrisscross a day ago | parent | prev | next [-] |
| At this scale could you do shielding? |
| |
| ▲ | mjhay a day ago | parent | next [-] | | Orbital data centers are impractical for a lot of reasons (to put it mildly) but radiation shielding isn’t one of them. Proportionally less shielding is needed as one scales up, due to lower surface/volume ratios. | |
| ▲ | inejge 21 hours ago | parent | prev | next [-] | | There are ways in which shielding in space can do harm: really energetic particles get trapped and produce a shower of daughter particles and rays over a greater area. So you'd need even more shielding. Or you accept that such things will happen and use rad-hard parts, redundancy etc. When you have the whole atmosphere above, it's much less of a concern. Besides, that's even more mass to be lofted. Pushing the economics further into the ludicrous end. | |
| ▲ | ted_dunning a day ago | parent | prev [-] | | Sure. At the cost of lofting that shielding from the ground and taking the economics from 500x to 2000x crazy. |
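The surface/volume argument in this subthread can be put in numbers with a crude model: a spherical enclosure wrapped in a thin shield of fixed thickness. The 1 cm thickness and the spherical shape are assumptions chosen for illustration, and 2700 kg/m^3 is the density of aluminium:

```python
import math


def shield_mass_per_volume(radius_m, thickness_m=0.01, density_kg_m3=2700.0):
    """Shield mass (kg) per cubic metre of enclosed volume for a sphere.

    Thin-shell approximation: shield mass = surface area * thickness * density.
    """
    shell_mass_kg = 4 * math.pi * radius_m ** 2 * thickness_m * density_kg_m3
    enclosed_volume_m3 = (4 / 3) * math.pi * radius_m ** 3
    return shell_mass_kg / enclosed_volume_m3


for radius_m in (1, 10, 100):
    print(f"r = {radius_m} m -> {shield_mass_per_volume(radius_m):.2f} kg per m^3")
```

The ratio falls as 1/r, so a structure ten times larger needs a tenth of the shield mass per unit of enclosed volume. The total shield mass still grows as r^2, though, and all of it has to be launched, which is the cost objection raised in the replies.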
|
|
| ▲ | rsynnott 19 hours ago | parent | prev | next [-] |
| > or just silently accept wrong results Silently wrong results are very fashionable these days, you know. Deterministic results are very 2010s. |
|
| ▲ | turtletontine a day ago | parent | prev [-] |
| > Will… I think “won’t”. I could be wrong of course, but I imagine efforts to put servers into orbit will die before anything is launched. It’s just a bad idea. Maybe a few grifters will make bank taking suckers’ money before it becomes common knowledge that this is stupid, but I will be genuinely surprised if real servers with GPUs are launched. I don’t mean to be facetious here. But saying “will” is treating it as inevitable that this will happen, which is how the grifters win. |