| ▲ | banana_giraffe 13 hours ago | |
A quick benchmark of float32 cuda->cuda copies in torch, comparing some random machines:
This is an "eh, it works" benchmark, but it should give you a feel for the relative performance of the different systems.

In practice, this means I can get something like 55 tokens a sec running a larger model like gpt-oss-120b-Q8_0 on the DGX Spark.
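For anyone who wants to try something similar, here is a rough sketch of what a copy benchmark like this could look like. It is not the script behind the numbers in this comment: the function names, buffer size, and iteration count are all hypothetical, and it assumes PyTorch with a CUDA device is available.

```python
import time


def bandwidth_gb_per_s(num_bytes: int, seconds: float) -> float:
    """Bytes copied per second, in GB/s (1 GB = 1e9 bytes).

    Counts each byte once; actual memory traffic is roughly double,
    since every byte is read from the source and written to the destination.
    """
    return num_bytes / seconds / 1e9


def benchmark_cuda_copy(num_elements: int = 256 * 1024 * 1024, iters: int = 20) -> float:
    """Time repeated float32 device-to-device copies and return GB/s.

    Hypothetical helper: defaults (a 1 GiB buffer, 20 iterations) are
    arbitrary choices for illustration.
    """
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("This benchmark needs a CUDA device")

    src = torch.zeros(num_elements, dtype=torch.float32, device="cuda")
    dst = torch.empty_like(src)

    dst.copy_(src)            # warm-up copy, not timed
    torch.cuda.synchronize()  # wait for the GPU before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
    torch.cuda.synchronize()  # copies are asynchronous; wait for them to finish
    elapsed = time.perf_counter() - start

    total_bytes = src.numel() * src.element_size() * iters
    return bandwidth_gb_per_s(total_bytes, elapsed)
```

Calling `benchmark_cuda_copy()` on each machine and printing the result gives one GB/s figure per system. The `torch.cuda.synchronize()` calls matter because CUDA copies are queued asynchronously, so timing without them would mostly measure launch overhead.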
| ▲ | ekropotin 12 hours ago | parent [-] | |
Nice! Thanks for that. 55 t/s is much better than I expected.