DesaiAshu 3 days ago

data bandwidth limits distributed training under current architectures. really interesting implications if we can make progress on that

dogcomplex 2 days ago | parent | next [-]

Limits but doesn't prohibit. See https://www.primeintellect.ai/blog/intellect-3 - still useful and can scale enormously. Takes a particular shape and relies heavily on RL, but still big.

andoando 2 days ago | parent | prev [-]

What bandwidth limits? I'm assuming the forward and backward passes have to be done sequentially?

DesaiAshu 16 hours ago | parent [-]

Yes, and also passing data within each layer
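
The bandwidth cost being discussed can be seen with a back-of-envelope estimate of gradient synchronization in plain data-parallel training. All the numbers below (7B parameters, fp16 gradients, 8 workers) are illustrative assumptions, not figures from the thread:

```python
# Rough estimate of per-step gradient sync traffic in data-parallel
# training. Parameter count, dtype, and worker count are hypothetical.
params = 7e9            # model parameters (assumed)
bytes_per_grad = 2      # fp16 gradients
workers = 8             # data-parallel workers (assumed)

grad_bytes = params * bytes_per_grad

# A ring all-reduce moves roughly 2*(N-1)/N times the gradient
# payload through each worker's link every optimizer step.
per_worker_bytes = 2 * (workers - 1) / workers * grad_bytes

print(f"gradient payload:   {grad_bytes / 1e9:.1f} GB")
print(f"per-worker traffic: {per_worker_bytes / 1e9:.1f} GB/step")
```

On a 10 Gb/s link that per-step traffic alone would take tens of seconds, which is why geographically distributed training leans on techniques that sync less often or compress what they send, as in the INTELLECT work linked above.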