segmondy 6 hours ago

Not applicable... the models just process whatever context you provide to them; context management happens outside of the model and depends on your inference tool/coding agent.
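
A minimal sketch of what that means in practice, assuming a hypothetical call_model() wrapper around whatever completion API you use; the model itself is stateless, and the "context window" it sees is just whatever list the agent decides to send:

    # Context management lives in the agent, not the model.
    MAX_MESSAGES = 40  # illustrative budget, not a real limit

    def call_model(messages):
        """Hypothetical wrapper around whatever completion API you use."""
        raise NotImplementedError

    history = [{"role": "system", "content": "You are a coding agent."}]

    def ask(user_text):
        history.append({"role": "user", "content": user_text})
        # The model only sees what we choose to send: the system
        # message plus the most recent turns.
        window = [history[0]] + history[1:][-(MAX_MESSAGES - 1):]
        reply = call_model(window)
        history.append({"role": "assistant", "content": reply})
        return reply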

cyanydeez 4 hours ago

It's interesting how people can be so into LLMs but don't, at the end of the day, understand that they're just passing "well formatted" text to a text processor, and everything else is built around encoding/decoding it into familiar or novel interfaces.
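
Concretely, that "well formatted" text is just a chat template. A rough sketch in the ChatML style (the exact control tokens vary by model; this is illustrative, not any particular model's template):

    def to_prompt(messages):
        # Flatten a chat transcript into the single string the model consumes.
        parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
                 for m in messages]
        parts.append("<|im_start|>assistant\n")  # cue the model to answer
        return "\n".join(parts)

    print(to_prompt([
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ]))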

The instability of the tooling outside of the LLM is what keeps me from building anything on the cloud, because you're attaching your knowledge and workflow to a tool that can change dramatically based on context, cache, and model changes, and that can arbitrarily raise prices as "adaptable whales" push the cost up.

It's akin to learning everything about Beanie Babies in the early 1990s, and right when you think you understand the value proposition, suddenly they're all worthless.

storus 2 hours ago

That's why you can use the latest open coding models locally; they have reportedly reached the performance of Sonnet 4.5, so they're almost SOTA. And then you can use tricks like the one I mentioned above to directly manipulate GPU RAM for context cleanup when needed, which is not possible with cloud models unless the provider enables it.
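
That GPU-RAM trick only works because local inference exposes the KV cache. A rough sketch, assuming the legacy Hugging Face transformers cache format (a tuple of per-layer key/value tensors); hosted APIs never hand you this object:

    def trim_kv_cache(past_key_values, keep_last):
        # past_key_values: per-layer (key, value) tensors, each shaped
        # [batch, heads, seq_len, head_dim] (legacy transformers format).
        # Keep only the most recent `keep_last` positions; a real
        # implementation would also need to fix up position ids (RoPE).
        return tuple(
            (k[:, :, -keep_last:, :], v[:, :, -keep_last:, :])
            for k, v in past_key_values
        )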