palisade 4 hours ago
I've been contemplating a decentralized model training system for some time, using volunteer machines that we all contribute. But it is astronomically difficult. The communication speeds are untenable. And there is the issue of data poisoning from untrusted nodes. I've almost cracked that last issue with a self-healing checkpointed rollback system that doesn't have to throw out anything that follows the corrupt datum. But I'm just one person with an idea and I don't have infinite funds to make this happen. This isn't a small project. Maybe there would be interest in something like this, now that entire frontier labs are being banned from making further progress. The total power of all GPUs on the planet dwarfs their capabilities, if we had an efficient way to harness them in a distributed fashion. We wouldn't be able to train a Fable as fast as them, but eventually having access is better than never having access.
sho 2 hours ago
As I replied to a child comment - this is a nice idea that just isn't tenable in reality. AI hardware isn't just hilariously faster than consumer GPUs, it's also hilariously more power-efficient and has hilariously better connectivity. Every one of these dimensions kills the idea. The far, FAR superior power efficiency means that even if you did harness every public GPU or GPU-like device on earth, you'd end up consuming so much excess electricity that it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter. And even if electricity were free, having those GPUs spread over the world with internet-level latency would slow everything down by factors of thousands to millions - if it's feasible at all. Regardless, you're not getting fable-oss this decade, maybe not even this century. It would be better for governments to buy and own their own datacenters, maybe as a coalition, and dedicate their operation to the public good. I believe that is what we actually have to do.
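To put rough numbers on the connectivity point, here is a back-of-envelope sketch of how long one full exchange of gradients would take over a consumer uplink versus a datacenter link. Everything in it is an assumption for illustration, not a measurement: the 8B parameter count, 16-bit gradients, the 100 Mbit/s and 400 Gbit/s link speeds, and the helper name `sync_seconds` are all made up for this example.

```python
# Back-of-envelope only: every number below is an assumption for illustration.

def sync_seconds(num_params: float, bytes_per_param: int, link_bits_per_s: float) -> float:
    """Seconds to push one full copy of the gradients through a single link."""
    total_bits = num_params * bytes_per_param * 8
    return total_bits / link_bits_per_s

PARAMS = 8e9            # assumed model size (8B parameters)
BYTES_PER_PARAM = 2     # assumed 16-bit gradients
HOME_UPLINK = 100e6     # assumed 100 Mbit/s consumer uplink
DC_LINK = 400e9         # assumed 400 Gbit/s datacenter link

home = sync_seconds(PARAMS, BYTES_PER_PARAM, HOME_UPLINK)
dc = sync_seconds(PARAMS, BYTES_PER_PARAM, DC_LINK)

print(f"home uplink:     {home:,.0f} s per exchange")
print(f"datacenter link: {dc:.2f} s per exchange")
print(f"slowdown:        {home / dc:,.0f}x")
```

Under these assumptions that is 1,280 s versus 0.32 s per exchange, a 4,000x gap from bandwidth alone, before counting latency, stragglers, or nodes dropping out.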
trenchgun 2 hours ago
> But when people think of decentralized training, they don’t first think of gigantic datacenters, owned by the same company, training models across large distances. Instead, they imagine thousands of small datacenters, or individual consumers, pooling their spare compute over the internet to orchestrate a training run larger than any single actor could manage alone. Many companies are pursuing this vision: Pluralis Research, Prime Intellect and Nous Research have already successfully decentrally trained models at scale. But in practice, training decentrally over the internet has lagged far behind more centralized training. Even their largest models (Pluralis’ 8B Protocol Model, Prime Intellect’s INTELLECT-1, and Nous’ Consilience 40B) have been trained with 1,000x less compute than today’s frontier models (such as xAI’s Grok 4).
https://epoch.ai/gradient-updates/how-far-can-decentralized-...
andai 24 minutes ago
> The communication speeds are untenable.
Can it be parallelized or not? If you take a model, make two copies, and fine-tune each one on different data, what happens when you merge them? Does it work if you freeze different layers? I think this works if the steps are small enough. And the transfer should become tenable if the steps are big enough. Where's the cutoff?
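A toy sketch of the "copy, train separately, merge" loop, assuming the merge is a plain average of the parameters (the idea behind local SGD and federated averaging). The model here is a two-parameter linear fit, and all names (`make_shard`, `local_steps`, `merge`, `train`) are hypothetical. Because this problem is convex, averaging works for any number of local steps, so it only illustrates the communication trade-off. It says nothing about where the cutoff sits for a real neural network.

```python
import random

def make_shard(seed, n=200, w_true=3.0, b_true=-1.0):
    """Synthetic (x, y) pairs from y = w_true * x + b_true plus small noise."""
    rng = random.Random(seed)
    return [(x, w_true * x + b_true + rng.gauss(0, 0.05))
            for x in (rng.uniform(-1, 1) for _ in range(n))]

def local_steps(params, shard, steps, lr, rng):
    """Run `steps` SGD updates on one worker's shard, with no communication."""
    w, b = params
    for _ in range(steps):
        x, y = rng.choice(shard)
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)

def merge(copies):
    """Merge worker copies by averaging each parameter."""
    n = len(copies)
    return tuple(sum(c[i] for c in copies) / n for i in range(len(copies[0])))

def train(local_step_count, total_steps=2000, lr=0.05, seed=0):
    """Two workers alternate between local training and a merge round."""
    rng = random.Random(seed)
    shards = [make_shard(1), make_shard(2)]   # each worker sees different data
    params = (0.0, 0.0)
    rounds = total_steps // local_step_count  # one parameter transfer per round
    for _ in range(rounds):
        copies = [local_steps(params, s, local_step_count, lr, rng) for s in shards]
        params = merge(copies)
    return params, rounds

for h in (1, 50, 500):
    (w, b), rounds = train(h)
    print(f"local steps={h:>3}  merge rounds={rounds:>4}  w={w:.3f}  b={b:.3f}")
```

More local steps per round means proportionally fewer transfers (2,000 rounds at 1 local step versus 4 rounds at 500). In a non-convex model the copies can drift apart between merges, which is where the "small enough steps" intuition would start to matter.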
girvo 3 hours ago
> The total power of all GPUs on the planet dwarf their capabilities
That just isn't true. It misunderstands exactly how much silicon has gone directly to those companies, and exactly how much more powerful said silicon is compared to consumer-grade gear.
whiplash451 an hour ago
This could be of interest to you: https://thealliance.ai/projects/tapestry
cpdomina an hour ago
There was a project trying to achieve some of those goals a few years ago using p2p: Petals, https://github.com/bigscience-workshop/petals
Their BLOOM model was also a collaborative effort: https://huggingface.co/docs/transformers/en/model_doc/bloom
rustcleaner 2 hours ago
Could it be done by making a sparse MoE of thousands, or tens of thousands, of smaller experts in very niche domains? Maybe a tree-like structure of experts which can delegate from relatively general but inaccurate to extremely niche but accurate? Also, these experts might be plug-and-play: could you easily swap out an inferior expert for a stronger one in the future without having to redo the whole pile?
Catloafdev 3 hours ago
Ya, that'd be an awesome project. The only issue is: how do you verify it's not being poisoned? Actually validating it would require more analysis than the training took to run. It would require a trusted network, not an open one, unless that can get solved somehow.
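One partial mitigation that is far cheaper than re-validating the training is robust aggregation: combine the workers' updates with a coordinate-wise median instead of a mean, so a minority of wild values cannot drag the result arbitrarily far. To be clear, this is a well-known technique swapped in for illustration, not the rollback scheme described upthread, and it verifies nothing. It only limits the damage when bad nodes are a minority. The function names and the numbers are hypothetical.

```python
from statistics import median

def mean_aggregate(updates):
    """Plain averaging: a single extreme update shifts every coordinate."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]

def median_aggregate(updates):
    """Coordinate-wise median: robust while fewer than half the updates are bad."""
    return [median(col) for col in zip(*updates)]

# Four honest workers roughly agree; one untrusted worker sends garbage.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.05, 2.05]]
poisoned = [[1000.0, -1000.0]]
updates = honest + poisoned

print("mean:  ", mean_aggregate(updates))
print("median:", median_aggregate(updates))
```

With one bad update out of five, the mean of the first coordinate lands near 200 while the median stays at 1.05. A subtle attacker who stays close to the honest range would still get through, which is the harder problem the comment is pointing at.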
slashdave an hour ago
Well, I suppose it is understandable why you want to attack the most obvious problem with such a scheme: obtaining sufficient compute. That does mean you are actually neglecting the more difficult issues.
laserx 4 hours ago
There are some strong open-source groups like Nous Research taking up the fight: https://nousresearch.com/
Davidzheng 4 hours ago
Is the total compute capacity outside of Meta, Google, Amazon, Anthropic, OAI and X even higher than the capacity of any one of them? In any case, there's no chance a public collaboration gets to Anthropic levels of compute even if communication were no issue.
labbett an hour ago
Sounds like SETI@home but for AGI... SAGI@home?
thomasjeff1 4 hours ago
I believe we are not the only ones
ai_fry_ur_brain 3 hours ago
[flagged]