HorizonXP 14 hours ago

If folks are interested, @antirez has open-sourced a C implementation of Voxtral Mini 4B here: https://github.com/antirez/voxtral.c

I have my own fork here: https://github.com/HorizonXP/voxtral.c where I’m working on a CUDA implementation, plus some other niceties. It’s working quite well so far, but I haven’t got it to match Mistral AI’s API endpoint speed just yet.

kingreflex 7 hours ago | parent | next [-]

hey,

how does someone get started with things like this (writing inference code, CUDA, etc.)? any guidance is appreciated. i understand one doesn't just write these things directly and it would require some reading. would be great to get some pointers.

briandw 3 hours ago | parent | next [-]

These lectures are good, and there is also a Discord: https://github.com/gpu-mode/lectures

Kilenaitor 6 hours ago | parent | prev [-]

Same! Would love any resources. I'm interested more in making models run vs making the models themselves :)

Ygg2 13 hours ago | parent | prev [-]

There is also another Mistral implementation: https://github.com/EricLBuehler/mistral.rs Not sure what the difference is, but it seems to just be better received overall.

NitpickLawyer 12 hours ago | parent [-]

mistral.rs is more like llama.cpp: it's a full inference library written in Rust that supports a ton of models and many hardware architectures, not just Mistral models.