| ▲ | kovek 2 days ago | |
> The core idea is content-dependent selection. For each query, the model selects which parts of the sequence are worth attending to, and computes attention exactly over those positions. I don't know if this will help for things like understanding code, where the all relevant parts can be the file of 1000 lines that we are analyzing, and where every token is relevant in understanding recursion, loops, function calls, etc. This sounds like it would be great to do SSA before passing things along to a code model like claude code. Let me know if I misunderstood | ||