▲ | kace91 5 days ago
This might be a very basic question, but as a dev whose only interaction with models is using the main commercial ones (Sonnet, ChatGPT and the like), what are some use cases for these smaller local models? What uses can I reasonably expect from them? Are there uses out of the box, or does one have to go through some custom post-training to get useful behavior? I feel like there is a huge gap between understanding models as a user of commercial tools and the kind of discussions happening in these threads, but I'm not sure what the in-between steps are.
▲ | canyon289 5 days ago | parent | next [-]
It's a crucial question. I wrote up a long answer here. Let me know if it helps.
▲ | ModelForge 5 days ago | parent | prev | next [-]
I'd say the common ones (besides educational use) are:

- private, on-device models (possibly with lower latency than models behind a web API); also edge devices
- algorithm research (faster and cheaper to prototype new ideas)
- cheap tasks like classification/categorization; sure, you don't need a decoder-style LLM for that, but it has the advantage of being more free-form, which is useful in many scenarios; or a sanity checker for grammar; or even a router to other models (GPT-5 style); a rough sketch of the classifier/router idea is below
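For example, here is a minimal sketch of the classifier/router idea, assuming the Hugging Face transformers library is installed; the model id is just a placeholder for whatever small instruction-tuned model you actually run locally:

    # Rough sketch: a small local model as a free-form classifier/router.
    # Assumes `transformers` is installed and the model fits on-device;
    # the model id is a placeholder, not a recommendation.
    from transformers import pipeline

    generator = pipeline("text-generation", model="google/gemma-3-270m-it")  # placeholder id

    prompt = (
        "Classify the following support ticket as one of: billing, bug, feature_request.\n"
        "Ticket: I was charged twice this month.\n"
        "Category:"
    )

    result = generator(prompt, max_new_tokens=5, do_sample=False, return_full_text=False)
    print(result[0]["generated_text"].strip())  # e.g. "billing"

Everything stays on the machine, so it's cheap enough to run over thousands of items, and the same pattern works for routing a request to a bigger model only when needed.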
▲ | barrkel 5 days ago | parent | prev | next [-]
Summarization and very basic tool use, without needing to go across the internet and back, and at zero cost because it's edge compute.
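A rough sketch of that kind of basic tool use, fully on-device (same assumptions as above: transformers installed, placeholder model id, and the tools here are hypothetical stand-ins):

    # Rough sketch: very basic tool use with a small local model.
    # The model only picks a tool by name; plain Python dispatches the call.
    from transformers import pipeline

    generator = pipeline("text-generation", model="google/gemma-3-270m-it")  # placeholder id

    TOOLS = {
        "get_time": lambda: "14:05",      # stand-in implementations
        "get_battery": lambda: "87%",
    }

    request = "How much battery do I have left?"
    prompt = (
        f"Available tools: {', '.join(TOOLS)}\n"
        f"User request: {request}\n"
        "Reply with the single tool name that best handles the request.\nTool:"
    )

    out = generator(prompt, max_new_tokens=5, do_sample=False, return_full_text=False)
    words = out[0]["generated_text"].split()
    tool_name = words[0].strip(".,") if words else ""
    if tool_name in TOOLS:
        print(TOOLS[tool_name]())  # no network round trip, zero marginal cost

No guarantee a model this small nails it every time, but for narrow, well-prompted tasks it's often good enough.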
▲ | _giorgio_ 5 days ago | parent | prev [-]
Maybe also secrecy and privacy. |