| ▲ | pixelpoet 3 hours ago | |
IIRC llama.cpp doesn't implement DSv4's compressed attention mechanism, and while this project does use (credited) parts of llama.cpp, it's focused on this great model for now. Much of this is covered better in the repo's README. | ||
| ▲ | rnewme 14 minutes ago | parent [-] | |
The repo's README and antirez's Reddit comments also expressed a willingness to upstream. | ||