LoganDark 3 hours ago
I feel like the computation graph could be embedded into the weights file, similar to how ONNX works. Then you expose some common interfaces that accept some common parameters, and additional custom ones can effectively be extensions, sort of like how Wayland works. That way you can support not only transformer-ish models like LLaMa, but also RNN-ish models like RWKV, multimodal models, and more. Not sure how this would be implemented in practice, but it sounds like a cool idea. I just worry that if the computation graph is baked into the model file, then improvements to the architecture or optimizations that don't require changes to the weights won't be applied to existing files without a conversion.
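To make the idea a bit more concrete, here is a rough Python sketch of what I have in mind. It is not how ONNX or any real format works; the file layout, the field names (`interfaces`, `extensions`, `graph`), the `generate` interface and the `recurrent_state` extension are all made up for illustration.

```python
import json

# Hypothetical file layout: weights, graph, and declared interfaces travel together.
MODEL_FILE = json.dumps({
    "interfaces": ["generate"],          # common interface every runtime understands
    "extensions": ["recurrent_state"],   # optional extra, e.g. for an RNN-ish model
    "weights": {"w": 2.0, "b": 1.0},
    "graph": [                           # ops run in order; names refer to earlier values
        {"op": "mul", "in": ["x", "w"], "out": "h"},
        {"op": "add", "in": ["h", "b"], "out": "y"},
    ],
})

# The runtime's own op implementations and the extensions it knows about.
OPS = {"mul": lambda a, b: a * b, "add": lambda a, b: a + b}
SUPPORTED_EXTENSIONS = {"recurrent_state"}

def load(blob):
    """Parse the file, require the common interface, keep only known extensions."""
    model = json.loads(blob)
    if "generate" not in model["interfaces"]:
        raise ValueError("model does not expose the common 'generate' interface")
    # Extensions the runtime does not know are skipped instead of being errors.
    model["active_extensions"] = [
        e for e in model["extensions"] if e in SUPPORTED_EXTENSIONS
    ]
    return model

def run(model, x):
    """Interpret the embedded graph node by node."""
    values = dict(model["weights"], x=x)
    for node in model["graph"]:
        args = [values[name] for name in node["in"]]
        values[node["out"]] = OPS[node["op"]](*args)
    return values["y"]

model = load(MODEL_FILE)
result = run(model, 3.0)  # 3.0 * 2.0 + 1.0
```

The downside I mentioned shows up here too: the `graph` list lives inside the file, so a better graph for the same weights only reaches existing files if someone rewrites them.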