Aurornis 5 hours ago
I think you missed the part where this is a lossy technique that reduces performance. The image trick shrinks the context precisely because it’s lossy. The README says you can’t use it for anything that needs exact recall; it produces a gist of the input. You could achieve something similar by having a small, cheap model pre-summarize the information before it reaches the expensive LLM. Many people already do this, and in most situations it’s a much better approach.