storus 19 hours ago

Does it support paged attention like vLLM though? Without that they will run into memory fragmentation quickly.
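
For readers unfamiliar with the term, here is a rough sketch of the block-table idea behind paged attention. It is a toy illustration only, not vLLM's code and not the implementation of the system discussed here; `PagedKVAllocator` and all of its method names are hypothetical. The KV cache is split into fixed-size blocks, and each sequence keeps a table mapping its logical blocks to whatever physical blocks happen to be free, so no sequence needs one large contiguous region.

```python
from typing import Dict, List, Tuple


class PagedKVAllocator:
    """Toy allocator: each sequence owns a list of fixed-size KV blocks.

    Hypothetical illustration; real engines also manage the GPU tensors
    that these block ids index into.
    """

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))  # physical block ids
        self.block_tables: Dict[str, List[int]] = {}  # seq id -> physical blocks
        self.seq_lens: Dict[str, int] = {}  # seq id -> tokens stored so far

    def append_token(self, seq_id: str) -> Tuple[int, int]:
        """Reserve a slot for one token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:
            # The last block is full (or the sequence has none yet):
            # take any free block, wherever it sits in memory.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because every block has the same size, blocks released by one sequence can be reused by any other, and the only wasted space is the unused tail of each sequence's last block.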

lukebechtel 19 hours ago

Yes, great question!

The system started without paged attention and automatically built its own paged attention implementation once it found that the lack of one was a bottleneck.

Pretty cool!
