anonz4FWNqnX 4 days ago:
I've had similar experiences. I've gone back and forth between running models locally and using the commercial models. The local models (Gemma, Qwen) can be incredibly useful, but they take more patience and work to get running well.

One advantage of running locally[1] is that you can set the context length manually and see how well the LLM actually uses it. I don't have an exact experience to relay, but it's not unusual for models to accept longer contexts and then largely ignore them. Just making the context big doesn't mean the LLM is going to use it well.

[1] I've been using LM Studio on both a MacBook Air and a MacBook Pro. Even a MacBook Air with 16 GB can run pretty decent models.
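If you want to try that kind of check yourself, here's a rough needle-in-a-haystack sketch against a local model. It assumes LM Studio's OpenAI-compatible server is running on its default port (localhost:1234) and that the `openai` Python package is installed; the model name, the API key placeholder, and the filler text are all just illustrative.

    # Bury one fact in a lot of filler and ask the model to recall it.
    # Assumes a model is already loaded in LM Studio with the local server on.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    needle = "The secret code word is 'marmalade'."
    filler = "The quick brown fox jumps over the lazy dog. " * 2000  # roughly 20k tokens of padding
    half = len(filler) // 2
    prompt = (
        filler[:half] + needle + filler[half:]
        + "\n\nWhat is the secret code word mentioned above?"
    )

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; LM Studio serves whichever model is loaded
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # A model that actually uses its long context should answer "marmalade";
    # one that doesn't will guess or hallucinate.
    print(resp.choices[0].message.content)

Growing the padding until the answer starts coming back wrong gives a rough sense of where the advertised context window stops being useful in practice.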
nomel 3 days ago (reply):
A good example of this was the first Gemini model that supported a 1-million-token context window but would lose track of the conversation after a couple of paragraphs.