csoham 3 days ago
I'm working on ScaleDown [1], a context pruning API. Over the past few years I've watched contexts steadily grow in AI apps, and while the advertised context lengths of LLMs keep increasing, the effective limit is still around 200k tokens; performance drops off a cliff beyond that (you may have noticed this in long AI chats).

ScaleDown is a simple API that prunes away the parts of a context that are irrelevant to a given prompt, a.k.a. context-aware pruning. Integration is straightforward: just one extra API call before the final LLM API call. You can get an API key from the website.

I'd love to chat if this is relevant to you, and I'd welcome any feedback on what we're building!
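Roughly, the integration looks like this, as a minimal Python sketch. The endpoint URL, request fields, and response shape here are illustrative placeholders, not the documented API:

    import requests

    # Illustrative names: the real endpoint and field names may differ.
    SCALEDOWN_URL = "https://api.scaledown.example/prune"

    def prune_context(api_key: str, context: str, prompt: str) -> str:
        # One extra call: keep only the parts of the context that are
        # relevant to this specific prompt.
        resp = requests.post(
            SCALEDOWN_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"context": context, "prompt": prompt},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["pruned_context"]

    # Then make your LLM call as usual, with the smaller context:
    # pruned = prune_context(API_KEY, long_context, user_prompt)
    # answer = llm.chat(prompt=user_prompt, context=pruned)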