storus 19 hours ago
Does it support paged attention like vLLM, though? Without that, they will run into memory fragmentation quickly.
lukebechtel 19 hours ago
Yes, great question! The system started without paged attention, then automatically built its own paged attention implementation once it identified the lack of paging as a bottleneck. Pretty cool!