| ▲ | ipsod 2 hours ago | |
How fast is it? | ||
| ▲ | wolttam 2 hours ago | |
2000 t/s prompt processing and 40-50 t/s generation. We should see 60-70 t/s generation once DSpark support solidifies in vLLM in a few days. Recent discussion on DSpark: https://news.ycombinator.com/item?id=48696585 | ||
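To put the figures quoted above in wall-clock terms, here is a rough back-of-the-envelope sketch. It assumes a single request with no batching or queuing, and that prompt processing and generation run at constant rates. The function name, the 8000-token prompt, and the 450-token reply are hypothetical; 45 t/s is just the midpoint of the quoted 40-50 t/s range.

```python
def estimate_latency_seconds(prompt_tokens, output_tokens,
                             prefill_tps=2000.0, decode_tps=45.0):
    """Rough end-to-end time for one request.

    prefill_tps: prompt-processing rate in tokens/s (2000 t/s quoted in the thread).
    decode_tps:  generation rate in tokens/s (assumed midpoint of the quoted 40-50 t/s).
    Ignores batching, queuing, and any startup overhead.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput rates must be positive")
    prefill_time = prompt_tokens / prefill_tps   # time to ingest the prompt
    decode_time = output_tokens / decode_tps     # time to emit the reply
    return prefill_time + decode_time


# Hypothetical request: 8000-token prompt, 450-token reply.
print(estimate_latency_seconds(8000, 450))  # 4 s prefill + 10 s generation
```

Under these assumptions generation dominates the total, which is why a bump from 40-50 t/s to 60-70 t/s would be noticeable even though prompt processing is already fast. Pass a different `decode_tps` to compare.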