Fast KV Compaction via Attention Matching (arxiv.org)
52 points by cbracketdash 11 hours ago | 10 comments
WarmWash an hour ago | parent | next [-]

Considering the insanity of the current AI arms race, and the incredible sums of money being thrown at any slight advantage, is there any reason to believe that any meaningful AI breakthrough would be openly published for anyone to leverage?

542458 22 minutes ago | parent | next [-]

These folks are at MIT, so citations are valuable to them: citations convert into prestige, academic career progression, or a favorable exit from academia into industry.

Also, I don't see why you couldn't patent this if you wanted to monetize it.

mikodin 38 minutes ago | parent | prev | next [-]

I would say yes.

The reality is that the money being thrown around mostly buys human time (and compute, I suppose). In terms of the people doing the innovation, openly published work gets you the same thing, minus the money.

abeppu 29 minutes ago | parent | prev [-]

I do sometimes wonder: if the transformers paper hadn't been published, what would the industry look like? Would the same ideas have been put together in almost the same way, weeks or months later, somewhere else?

cs702 25 minutes ago | parent | prev | next [-]

This looks promising. I've added it to my reading list.

cadamsdotcom 4 hours ago | parent | prev | next [-]

Superficially, it sounds like this could push things toward compacting on a continuous basis, or compacting in batches once you hit the context limit, rather than starting fresh with a summary and a system prompt (roughly the loop sketched below).

Feels like high-fidelity, fast compaction could be a path to "solving" long context.
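
To sketch what I mean, here's the batched variant as a hypothetical decode loop. Everything in it is a made-up stand-in, not the paper's actual API: `model.prefill`, `model.decode_step`, `model.eos_token`, and the `compact_kv` compactor are all assumed names.

    # Sketch: compact the KV cache in batches at the context limit,
    # instead of restarting with a summary + system prompt.
    # `model` and `compact_kv` are hypothetical stand-ins, not the paper's API.

    MAX_TOKENS = 128_000   # model context limit (assumed)
    KEEP_RATIO = 0.2       # compact down to ~20% of the limit (assumed)

    def generate(model, compact_kv, prompt_tokens, max_new_tokens=1024):
        kv_cache = model.prefill(prompt_tokens)        # hypothetical API
        output = []
        for _ in range(max_new_tokens):
            if kv_cache.num_tokens >= MAX_TOKENS:
                # Compact in place; if the compacted cache matches the
                # original's attention behavior, decoding just continues.
                target = int(MAX_TOKENS * KEEP_RATIO)
                kv_cache = compact_kv(kv_cache, target_size=target)
            token, kv_cache = model.decode_step(kv_cache)  # hypothetical API
            output.append(token)
            if token == model.eos_token:               # hypothetical attribute
                break
        return output

The appeal over summarize-and-restart is that nothing gets re-prefilled: you keep decoding against the compacted cache, so the cost is just the compaction step itself.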

speedping 3 hours ago | parent | prev | next [-]

This is big for long-horizon tasks.

esafak an hour ago | parent | prev [-]

None of the compaction accuracies look impressive.

yorwba an hour ago | parent [-]

I think matching or exceeding the original cache at 20% compacted size is fairly impressive.

esafak 23 minutes ago | parent [-]

The original cache had 70% accuracy, and the alternatives were all worse.