conradev 9 days ago:
Are you using Ollama or LMStudio/llama.cpp? https://x.com/ggerganov/status/1953088008816619637
diggan 9 days ago:
> LMStudio/llama.cpp

Even though LM Studio uses llama.cpp as its runtime, performance differs between them. With LM Studio 0.3.22 Build 2 on the CUDA llama.cpp (Linux) v1.45.0 runtime I get ~86 tok/s on an RTX Pro 6000, while with llama.cpp compiled from 1d72c841888 (Aug 7 10:53:21 2025) I get ~180 tok/s, almost 100 more per second, both running lmstudio-community/gpt-oss-120b-GGUF.
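
For anyone wanting to sanity-check numbers like these across runtimes: llama-server exposes an OpenAI-compatible HTTP endpoint, so a rough end-to-end throughput measurement can be scripted against it. A minimal Python sketch, assuming a server on localhost:8080 (the port, model name, and prompt are placeholders; llama-bench is the more rigorous tool since it separates prompt processing from generation):

    # Rough end-to-end tok/s check against a local llama-server
    # (e.g. started with: llama-server -m gpt-oss-120b.gguf --port 8080).
    # Port, model name, and prompt below are placeholders for illustration.
    import time
    import requests

    URL = "http://localhost:8080/v1/chat/completions"
    payload = {
        "model": "gpt-oss-120b",  # placeholder; single-model servers typically ignore this
        "messages": [{"role": "user", "content": "Explain KV caching briefly."}],
        "max_tokens": 512,
    }

    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start

    # The OpenAI-compatible response includes a usage block with token counts.
    tokens = resp.json()["usage"]["completion_tokens"]
    print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")

Note this times the whole request, so prompt processing is folded into the figure; for short prompts and long generations it still approximates decode tok/s well enough to compare two runtimes on the same hardware.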