Tuna-Fish 3 hours ago

The project is an inference framework that should support a 100B-parameter model at 5-7 tok/s on CPU. No one has quantized a 100B-parameter model to 1 trit per weight yet, but the existence of this framework is an incentive for someone to do so.
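
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory 100B ternary weights would take. The function name and the 2-bits-per-weight packing are my own assumptions for illustration, not anything taken from the project; a trit carries log2(3) ≈ 1.58 bits, and real formats usually round up from that.

```python
import math


def ternary_model_size_gb(n_params: float, bits_per_weight: float = math.log2(3)) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Hypothetical helper: ignores activations, KV cache, embeddings kept
    at higher precision, and any per-block scale factors.
    """
    return n_params * bits_per_weight / 8 / 1e9


n_params = 100e9  # the 100B-parameter model from the comment

# Information-theoretic floor: log2(3) bits per trit.
ideal_gb = ternary_model_size_gb(n_params)

# Assumed simple packing of one trit into 2 bits.
packed_gb = ternary_model_size_gb(n_params, bits_per_weight=2.0)

print(f"ideal packing: {ideal_gb:.1f} GB")
print(f"2-bit packing: {packed_gb:.1f} GB")

# If every weight is read once per generated token (an assumption that
# holds roughly for a dense model with no reuse across tokens), the
# required memory bandwidth is about size * tokens per second.
for tok_per_s in (5, 7):
    print(f"{tok_per_s} tok/s needs roughly {ideal_gb * tok_per_s:.0f} GB/s")
```

So the weights would land somewhere around 20-25 GB, which is why running this on CPU with ordinary RAM is plausible at all.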