Any resources you can share for these experimental builds? This is something I was looking into setting up at some point. I'd love to take a look at examples in the wild to gauge if it's worth my time / money.

As an aside: if we ever reach a point where it's possible to run an OSS 20b model at a reasonable inference speed on a MacBook Pro type of form factor, then the future is definitely here!

▲

disambiguation 3 days ago | parent [-]

In reference to this post I saw a few weeks ago:

https://lemmy.zip/post/50193734

(Lemmy is a Reddit-style forum)

The author mainly demos their "custom tools" and doesn't elaborate further, but IMO it's still an impressive showcase for an offline setup.

I think the big hint is "Open WebUI", which supports native function calling.

Some more searching and I found this: https://pypi.org/project/llm-tools-kiwix/

It's possible the future is now, assuming you have an M-series Mac with enough RAM. My sense is that you need ~1 GB of RAM for every 1B parameters, so 32 GB should in theory work here. I think Macs also get a performance boost over other hardware due to unified memory.
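
A minimal sketch of that rule of thumb in Python, for anyone who wants to plug in their own numbers. The ~1 GB per 1B parameters figure corresponds to one byte per weight (8-bit weights); the `bits_per_weight` knob and the 20% `overhead_fraction` for context cache and runtime buffers are assumptions added for illustration, not measurements, and the function names are hypothetical.

```python
def estimate_model_ram_gb(params_billions: float,
                          bits_per_weight: float = 8.0,
                          overhead_fraction: float = 0.2) -> float:
    """Rough RAM estimate (in GB) for loading a model's weights.

    1B parameters at 8 bits per weight is about 1 GB, which is the
    heuristic from the comment above. overhead_fraction is an assumed
    allowance for context cache and runtime buffers, not a measured value.
    """
    weights_gb = params_billions * bits_per_weight / 8.0
    return weights_gb * (1.0 + overhead_fraction)


def fits_in_ram(params_billions: float,
                ram_gb: float,
                bits_per_weight: float = 8.0,
                overhead_fraction: float = 0.2) -> bool:
    """True if the rough estimate fits within the given amount of RAM."""
    needed = estimate_model_ram_gb(params_billions, bits_per_weight, overhead_fraction)
    return needed <= ram_gb


if __name__ == "__main__":
    # A 20b model on a 32 GB machine, at 8-bit and 4-bit weights.
    for bits in (8.0, 4.0):
        needed = estimate_model_ram_gb(20, bits_per_weight=bits)
        print(f"20b @ {bits:g}-bit: ~{needed:.0f} GB, fits in 32 GB: {fits_in_ram(20, 32, bits)}")
```

Under these assumptions a 20b model lands comfortably under 32 GB at 8-bit weights, and lower-bit quantization shrinks the estimate proportionally. Note that the OS and other apps share the same unified memory, so the real headroom is smaller than the raw total.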

Spitballing aside, I'm in the same boat: saving my money, waiting for the right time. If it isn't viable already, it's damn close.

	▲	stuxnet79 3 hours ago | parent [-]
		It seems like the ecosystem around these tools has matured quite rapidly. I am somewhat familiar with Open WebUI; however, the last time I played around with it, I got the sense that it was merely a front end to Ollama, the LLM command-line tool, and that it didn't have any capabilities outside of that. I got spooked when the Ollama team started monetizing, so I ended up doing more research into llama.cpp and realized it could do everything I wanted, including serving up a web front end. Once I discovered this I sort of lost interest in Open WebUI. I'll have to revisit all these tools again to see what's possible in the current moment.

		> My sense is that you need ~1 GB of RAM for every 1B parameters, so 32 GB should in theory work here. I think Macs also get a performance boost over other hardware due to unified memory.

		This is a handy heuristic to work with, and the links you sent will keep me busy for the next little while. Thanks!