This might be a very basic question, but as a dev whose only interaction with models is using the main commercial ones (sonnet, ChatGPT and the like), what are some usecases for these smaller local models?

What usages can be reasonable to expect from them? Are there uses out of the box or does one have to go through some custom post-training to get useful behavior?

I feel like there is a huge gap between understanding models as a user of commercial tools and the kind of discussions happening in these threads, but I’m not sure what are the in-between steps.

▲

canyon289 5 days ago | parent | next [-]

Its a crucial question. I wrote up a long answer here. Let me know it helps

https://news.ycombinator.com/item?id=44913558

▲

kace91 5 days ago | parent [-]

Thanks for the reply!

It does help to figure out where in the space this model fits. I'm still a bit confused about this part:

>since it needs to be shaped to match specific tasks, we did our best to design it to be a flexible starting point for LLM-style tasks and worked with partners to put it into the right frameworks and places for you all to be able to shape it to what you need it to be.

What does shaping mean in this case? What tools are used, what requirements are there, both in terms of hardware and knowledge?

I would like to go beyond being spoonfed by large companies' high usability products, both to improve my knowledge and not be a victim of potential future rug pulls. In the classic software world, I guess the equivalent would be someone who runs open source software navigating the extra complexity, and ocassionally collaborates with the projects.

But I don't know what that looks like in the AI world. I've gone through some courses on machine learning but learning the basics about hessian matrices and gradient descent seems as detached from the practical point I'm searching as taking a compilers class is from learning React, so I think I've been looking in the wrong places (?).

	▲	canyon289 5 days ago \| parent [-]
		> What does shaping mean in this case? What tools are used, what requirements are there, both in terms of hardware and knowledge? I'll try making an analogy to another task I like which is cooking. In cooking the chef has to make decisions like what is the overall meal going to look like, but then also detailed decisions like what the main course versus side, and even more detailed what's the proportion of side dish serving to main dish, what ingredients, how long to cook something etc. It's kind of the same with ML models, whether AI or not. When I build smaller bayesian models I make specific choices about the model architecture, which data I use, the array shape of the output etc. The tools used here are largely jax or pytorch, often in a framework like flax, or a NN higher level package. You often then pair it with libraries that which have NN optimizers, data loaders etc. Pytorch is more batteries included than the JAX ecosystem which separates these out. One of the best ways to get a grasp of all of this is implement some small models yourself. These pieces will start to be come more apparent and concrete, especially because as an end users you're not exposed to them, the same way most end users are not exposed to compilers.

▲

ModelForge 5 days ago | parent | prev | next [-]

I'd say the common ones (besides educational) are

- private, on-device models (possibly with lower latency than models via web API); also edge devices

- algorithm research (faster and cheaper to prototype new ideas)

- cheap tasks, like classification/categorization; sure, you don't need a decoder-style LLM for that, but it has the advantage of being more free-form, which is useful in many scenarios; or maybe a sanity checker for grammar; or even a router to other model (GPT-5 style)

▲

barrkel 5 days ago | parent | prev | next [-]

Summarization, very basic tool use, without needing to go across the internet and back, and zero cost because of edge compute.

▲

_giorgio_ 5 days ago | parent | prev [-]

Maybe also secrecy and privacy.