windexh8er 2 hours ago
I think you missed the point and either don't understand or aren't considering the utility of SLMs.
Kirby64 2 hours ago
But I’m not missing the point. If you can run one frontier model at 750 t/s, then you can probably run many instances of an SLM in parallel at an aggregate rate that exceeds 15k t/s. That’s kind of the point of the flash or ultrafast variants. And they’re based on something much more modern than Llama 3.1.