syntaxing 6 hours ago
This is a very interesting strategy that might pay off. This model is a very good option for enterprise self-hosting. I would argue a lot of companies are VRAM-constrained rather than compute-constrained. You could fit 4-5 running instances on one H100 cluster where you can only fit 1-2 of Kimi K2 or GLM5.
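A rough way to sanity-check an instance count like this is to compare weight memory against pooled VRAM. The sketch below is back-of-the-envelope arithmetic, not a measurement. The 128B parameter count, the 8-GPU node, the 80 GB per GPU, the 8-bit and 16-bit weight precisions, and the helper names are all assumptions chosen for illustration, and the estimate ignores KV cache, activations, and runtime overhead.

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 params * bytes_per_param / 1e9 bytes per GB.
    """
    return params_billion * bytes_per_param


def instances_that_fit(
    params_billion: float,
    bytes_per_param: float,
    n_gpus: int = 8,           # assumed node size, not from the thread
    gb_per_gpu: float = 80.0,  # assumed per-GPU memory
) -> int:
    """Upper bound on model copies whose weights alone fit in pooled VRAM."""
    total_gb = n_gpus * gb_per_gpu
    return int(total_gb // weight_gb(params_billion, bytes_per_param))


if __name__ == "__main__":
    # Hypothetical 128B-parameter dense model at two weight precisions.
    for label, nbytes in [("8-bit", 1), ("16-bit", 2)]:
        print(label, weight_gb(128, nbytes), "GB,",
              instances_that_fit(128, nbytes), "copies")
```

Under these assumptions the weights-only bound is 5 copies at 8-bit and 2 at 16-bit. Real deployments need headroom on top of this, so the usable count would be lower.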
2001zhaozhao 6 hours ago
This is 128B dense though. The K/V cache on long context is going to be massive.
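To put a number on "massive", the usual estimate for standard multi-head or grouped-query attention is two tensors (K and V) per layer, each of size KV heads × head dimension per token. The sketch below applies that formula. The layer count, KV head count, head dimension, and context length are hypothetical placeholders, not the actual configuration of the model under discussion, and the formula does not cover latent-attention, sliding-window, or quantized-cache variants.

```python
def kv_cache_gb(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch: int = 1,
    bytes_per_elem: int = 2,  # assumes a 16-bit cache
) -> float:
    """KV cache size in GB (1 GB = 1e9 bytes) for plain MHA/GQA attention."""
    # Leading 2 = one K tensor plus one V tensor per layer.
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem
    )
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical dense config: 88 layers, 8 KV heads, head_dim 128,
    # one sequence at a 128K-token context.
    print(round(kv_cache_gb(88, 8, 128, 131072), 1), "GB per sequence")
```

With these placeholder values a single 128K-token sequence needs roughly 47 GB of cache, and the figure scales linearly with batch size, context length, and the number of KV heads.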
sayYayToLife 5 hours ago
[dead]