| ▲ | aimanbenbaha 2 days ago | |
Exo-Labs is an open source project that allows this too (pipeline parallelism, I mean, not the latter). It's device-agnostic, meaning you can daisy-chain anything you have that has memory, and the implementation will intelligently shard model layers across those devices. It's slow, though it scales linearly with concurrent requests. Exo-Labs: https://github.com/exo-explore/exo | ||
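To make the "shard model layers across devices" idea concrete, here is a rough sketch of memory-weighted layer partitioning for pipeline parallelism. This is not exo's actual code or API. The function name `shard_layers`, the device names, and the memory figures are all made up for illustration, and I'm assuming each device simply gets a contiguous block of layers proportional to its share of total memory.

```python
def shard_layers(num_layers, device_memory_gb):
    """Assign contiguous layer ranges to devices, proportional to memory.

    Hypothetical helper for illustration only; not taken from exo.
    device_memory_gb maps a device name to its available memory in GB.
    Devices are chained in the dict's insertion order.
    """
    total = sum(device_memory_gb.values())
    shards = {}
    start = 0
    cumulative = 0.0
    for name, mem in device_memory_gb.items():
        cumulative += mem
        # Cut at the cumulative memory fraction so the last device
        # always ends exactly at num_layers.
        end = round(num_layers * cumulative / total)
        shards[name] = range(start, end)
        start = end
    return shards


# Hypothetical three-device chain with 8 GB, 16 GB, and 8 GB of memory.
shards = shard_layers(32, {"laptop": 8, "desktop": 16, "mini_pc": 8})
for name, layers in shards.items():
    print(f"{name}: layers {layers.start}..{layers.stop - 1}")
```

In a pipeline like this, each request's activations pass through the devices one after another, which is why a single request is slow. Different requests can occupy different stages at the same time, which is where the scaling with concurrent requests comes from.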