phainopepla2 3 hours ago

Why not just use DS V4 Flash for the small stuff? Very fast and extremely cheap.

ngxson 2 hours ago | parent [-]

DSv4 Flash has 158B parameters in total. Running it locally is possible, but it would take all of my system RAM.
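As a rough back-of-the-envelope check on the RAM point, here is a minimal sketch that estimates the memory needed for the weights alone. It takes the 158B figure from the comment above. The bit widths (16, 8, 4) are illustrative quantization levels, not a statement about how the model is actually distributed, and the helper name `weight_memory_gib` is made up for this example. It ignores the KV cache, activations, and runtime overhead, so real usage would be higher.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB.

    Assumes every parameter is stored at the same bit width and
    ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    n_params = 158e9  # total parameter count quoted in the comment above
    for bits in (16, 8, 4):  # hypothetical quantization levels
        print(f"{bits}-bit weights: ~{weight_memory_gib(n_params, bits):.0f} GiB")
```

Under these assumptions, even an aggressive 4-bit quantization needs on the order of 70+ GiB just for the weights, which is consistent with "all my system RAM" on a well-equipped workstation.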

Also, small and larger models perform about the same on a lot of my day-to-day tasks: summarizing a web page, drafting a response, translation, quick web searches, etc.

phainopepla2 2 hours ago | parent [-]

Sorry, I meant non-locally.

I'm assuming privacy is not a concern since you mentioned using Deepseek already. The cost of V4 Flash for small tasks is so minuscule as to be almost free, and you don't have to deal with a churning laptop (or even buying a high-end laptop, for someone who doesn't already have one).

I guess what I'm really asking is, what's the advantage of using these small local models if privacy isn't a concern?

ngxson 2 hours ago | parent [-]

I do use both DSv4 variants, the "normal" one and Flash, non-locally. They work well, though not exceptionally. And while Flash is cheap, the difference between $1 per month and $5 per month is not a big concern to me. IMO pricing is pretty competitive among open-weight models: https://huggingface.co/inference/models

It depends on the use case, but I have found two where a local model is a must:

- Running offline without internet access: for example, I have a project that transcribes and summarizes audio in real time. I have already used it at events where Wi-Fi was not available: https://github.com/ngxson/llama.cpp-realtime-audio-recap

- Handling private personal data, for example health records. This falls under the same "privacy" category you mentioned, but I want to point out that people value their privacy differently.