▲ | kajecounterhack 4 days ago |
Cool stuff! Is the goal of this project personal learning, inference performance, or something else? Would be nice to see how inference speed stacks up against say llama.cpp
▲ | nirw4nna 4 days ago | parent | next [-]
Thanks! To be honest, it started purely as a learning project. I was really inspired when llama.cpp first came out and tried to build something similar in pure C++ (https://github.com/nirw4nna/YAMI), mostly for fun and to practice low-level coding.

The idea for DSC came when I realized how hard it was to port new models to that C++ engine, especially since I don't have a deep ML background. I wanted something that felt more like PyTorch, where I could experiment with new architectures easily.

As for llama.cpp, it's definitely faster! They have hand-optimized kernels for a whole bunch of architectures, models and data types. DSC is more of a general-purpose toolkit. I'm excited to work on performance later on, but for now, I'm focused on getting the API and core features right.
▲ | liuliu 4 days ago | parent | prev [-]
Both use cuBLAS under the hood, so I think prefill performance should be similar (of course, this framework is still early and doesn't seem to have FP16 / BF16 support for GEMM yet). Hand-rolled GEMV is faster for token generation, hence llama.cpp is better there.
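For context, here's a rough NumPy sketch of why the two phases differ (shapes are illustrative, not taken from either codebase): prefill multiplies a whole matrix of prompt activations against the weights (a GEMM), while token generation feeds a single row per step, which degenerates to a matrix-vector product (a GEMV):

```python
import numpy as np

# Hypothetical shapes for one transformer projection layer.
d_model, d_out, seq_len = 4096, 4096, 512
W = np.random.randn(d_model, d_out).astype(np.float32)

# Prefill: the whole prompt is processed at once, so the activation
# is a (seq_len, d_model) matrix -> a GEMM, which generic BLAS
# libraries like cuBLAS handle very efficiently.
prefill_acts = np.random.randn(seq_len, d_model).astype(np.float32)
prefill_out = prefill_acts @ W   # (512, 4096) @ (4096, 4096) -> (512, 4096)

# Token generation: one new token per step, so the activation collapses
# to a single row -> effectively a GEMV. This case is memory-bound
# (every weight is read once per output element), which is why
# hand-rolled GEMV kernels can beat a generic GEMM call here.
decode_act = np.random.randn(1, d_model).astype(np.float32)
decode_out = decode_act @ W      # (1, 4096) @ (4096, 4096) -> (1, 4096)

print(prefill_out.shape)  # (512, 4096)
print(decode_out.shape)   # (1, 4096)
```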