gaeld 4 hours ago
Follow-up reading for the more technical and research-minded people here:

Monokernel deep dive (GPU Engineering): http://blog.kog.ai/building-a-single-kernel-latency-optimize...

Delayed Tensor Parallelism (research): http://blog.kog.ai/delayed-tensor-parallelism-for-faster-tra...

To try the speed on the playground: http://playground.kog.ai
zozbot234 35 minutes ago | parent
It looks like DTP (Delayed Tensor Parallelism) is a distinct architectural choice, so it would require training new models specifically for it? It couldn't simply be used to run inference on existing models.