seanmcdirmid 5 hours ago
Running an LLM locally means you never have to worry about how many tokens you've used, and it enables low-latency interaction with smaller models that run quickly. I don't see why consumer hardware won't evolve to run more LLMs locally. It's a worthwhile goal, one consumer hardware makers have been missing for a decade now, and it's definitely achievable, especially if you only care about inference.
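For a sense of how little is needed for inference-only local use today, here's a minimal sketch with llama-cpp-python; the GGUF path, quantization, and thread count below are just placeholders to adapt to whatever model and machine you have:

    # Minimal local-inference sketch (pip install llama-cpp-python).
    # The model path is a placeholder; any quantized GGUF file on disk works.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local file
        n_ctx=4096,    # context window
        n_threads=8,   # CPU threads; tune for your hardware
    )

    out = llm(
        "Q: Why run an LLM locally? A:",
        max_tokens=128,
        stop=["Q:"],   # stop before the model starts a new question
    )
    print(out["choices"][0]["text"])

No per-token billing, and on a small quantized model the response comes back fast enough for interactive use on a recent laptop.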