lovelydata 3 days ago

llama.cpp + Qwen3-4B running on an older PC with an AMD Radeon GPU (Vulkan). Users connect via a web UI. Usually around 30 tokens/sec. Usable.

NicoJuicy 3 days ago | parent [-]

What do they use it for? It's a very small model.

embedding-shape 3 days ago | parent [-]

Autocompleting words, I'd wager, since yeah, it's a super tiny model that can barely produce coherent output in many cases.