theapadayo 3 hours ago

IMO the biggest thing still missing is an actual way to define the model architecture outside of it being hard-coded into the current build. It doesn't need 1:1 performance parity with the fully supported models. Having proper, vendor-validated support on day 1 is what makes the difference between people thinking a model is amazing vs. horrible. See the recent Gemma vs. Qwen releases.

Not sure what the solution is, other than writing a DSL to describe the model graphs, which you then embed in the GGUF. The other fallback is to just read the PyTorch modules from the official model releases and somehow convert them to GGML ops.
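To make the DSL idea concrete, here's a toy sketch (everything below is hypothetical: the op names, the JSON encoding, and the `run_graph` helper are illustrations, not part of the GGUF spec or GGML's API). The graph is serialized as data that could live in a GGUF metadata key, and a tiny interpreter maps op names to kernels:

```python
import json

# Hypothetical graph description that could be stored in a GGUF
# metadata key-value entry: a flat list of ops over named tensors.
GRAPH_JSON = json.dumps({
    "inputs": ["tok_embeddings"],
    "ops": [
        {"op": "mul_mat", "args": ["tok_embeddings", "w_q"], "out": "q"},
        {"op": "scale", "args": ["q"], "params": {"factor": 0.125},
         "out": "q_scaled"},
    ],
    "outputs": ["q_scaled"],
})

# Toy stand-ins for GGML ops; a real runtime would instead append
# nodes to a ggml compute graph. (The math here is placeholder.)
KERNELS = {
    "mul_mat": lambda tensors, args, params: sum(tensors[a] for a in args),
    "scale": lambda tensors, args, params: tensors[args[0]] * params["factor"],
}

def run_graph(graph_json, tensors):
    """Interpret the serialized graph against a dict of named tensors."""
    graph = json.loads(graph_json)
    for node in graph["ops"]:
        kernel = KERNELS[node["op"]]
        tensors[node["out"]] = kernel(tensors, node["args"],
                                      node.get("params", {}))
    return {name: tensors[name] for name in graph["outputs"]}
```

The point is only that the architecture becomes data shipped with the weights, so a new model needs no runtime rebuild as long as its graph uses ops the runtime already implements.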

Philpax 2 hours ago | parent | next [-]

Yeah, I intentionally left space for the computation graph to be included in the GGUF spec in the hopes that this would be picked up by someone. I would have loved to have it in the first version, but I was prioritising getting the MVP spec out and implemented.

I'd still love to see this, but it would need a cheerleader very familiar with the current state of the GGML IR.

LoganDark 3 hours ago | parent | prev [-]

I feel like the computation graph could be embedded alongside the weights, similar to how ONNX works. Then you expose some common interfaces that accept some common parameters, and additional custom ones can effectively be extensions, sort of like how Wayland works. That way you could support not only transformer-ish models like LLaMA, but also RNN-ish models like RWKV, plus multimodal models and more. Not sure how this would be implemented in practice, but it sounds like a cool idea. I just worry that if the computation graph is baked into the model file, then architecture improvements or optimizations that don't require changes to the weights won't be applied to existing files without a conversion.
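One way that worry could be mitigated (again purely a sketch; the registry, version tags, and function names below are invented for illustration, not anything GGUF specifies): tag the baked-in graph with an architecture name and version, and let the runtime substitute a newer graph it knows for that architecture, falling back to the embedded one for unknown architectures. The weights never need rewriting.

```python
# Hypothetical runtime-side graph registry, keyed by (arch, version).
RUNTIME_GRAPHS = {}

def register_graph(arch, version, graph):
    """Let the runtime ship improved graphs for known architectures."""
    RUNTIME_GRAPHS[(arch, version)] = graph

def pick_graph(file_arch, file_version, baked_graph):
    """Prefer the newest runtime-provided graph at or above the file's
    declared version; otherwise fall back to the graph baked into
    the model file, so old or obscure files still load unchanged."""
    candidates = [
        (ver, graph) for (arch, ver), graph in RUNTIME_GRAPHS.items()
        if arch == file_arch and ver >= file_version
    ]
    if candidates:
        return max(candidates)[1]  # highest registered version wins
    return baked_graph

# Example: the runtime knows an optimized "llama" graph, so a v1 file
# gets it for free, while an architecture it doesn't know keeps its own.
register_graph("llama", 2, "fused-attention graph")
```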