Remix.run Logo
elihu 3 hours ago

Do they need to be maintained? If one compute node breaks, you just turn it off and don't worry about it. You just assume you'll have some amount of unrecoverable errors and build that into the cost/benefit analysis. As long as failures are in line with projections, it's baked in as a cost of doing business.

The idea itself may be sound, though that's unrelated to the question of whether Elon Musk can be relied on to be honest with investors about what their real failure projections and cost estimates are and whether it actually makes financial sense to do this now or in the near future.

lugao 3 hours ago | parent [-]

AI clusters are heavily interconnected, the blast radius for single component failure is much larger than running single nodes -- you would fragment it beyond recovery to be able to use it meaningfully.

I can't get in detail about real numbers but it's not doable with current hardware by a large margin.