I could see a front-end / back-end split in the future where a completely on-client LLM is used to trim down the request and context before shipping the request off to the back-end.
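For what it's worth, here is a minimal sketch of what that split could look like. Everything in it is hypothetical: a trivial word-overlap score stands in for the on-client LLM's relevance judgment (in practice you'd call a small quantized on-device model), and `build_backend_payload` is just an illustration of the payload that would actually cross the network.

```python
def local_relevance_score(request: str, chunk: str) -> float:
    # Placeholder for the on-client LLM: score a context chunk by the
    # fraction of the request's words that appear in it.
    req_words = set(request.lower().split())
    chunk_words = set(chunk.lower().split())
    if not req_words:
        return 0.0
    return len(req_words & chunk_words) / len(req_words)

def trim_context(request: str, context: list[str], keep: int = 2) -> list[str]:
    # Keep only the `keep` most relevant chunks, preserving their
    # original order in the conversation.
    ranked = sorted(context,
                    key=lambda c: local_relevance_score(request, c),
                    reverse=True)
    kept = set(ranked[:keep])
    return [c for c in context if c in kept]

def build_backend_payload(request: str, context: list[str]) -> dict:
    # Only the trimmed context crosses the network to the back-end.
    return {"request": request, "context": trim_context(request, context)}

if __name__ == "__main__":
    history = [
        "User asked about Python packaging last week.",
        "User is debugging a rust borrow checker error.",
        "Unrelated small talk about the weather.",
    ]
    payload = build_backend_payload(
        "why does the borrow checker reject this rust code", history)
    print(payload["context"])
```

The interesting design question is where the cutoff lives: a fixed `keep` is the simplest policy, but a real on-client model could instead trim to a token budget so the back-end request stays under a cost or latency target.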