https://github.com/jundot/omlx
Note: the 27B model is going to be slow; use the 35B MoE model if you want decent tokens/sec.
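
A rough back-of-the-envelope sketch of why that is likely the case, assuming token generation is limited by memory bandwidth and that the 27B model is dense (every weight is read for each token) while the MoE only reads its active experts. The bandwidth, quantization width, and the 3B active-parameter figure are placeholders chosen for illustration, not numbers taken from the linked repo or from any specific model.

```python
def decode_tokens_per_sec(active_params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound: each generated token reads every active weight once."""
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


# All values below are hypothetical placeholders, not measurements.
BANDWIDTH_GB_S = 400.0  # assumed memory bandwidth of the machine
BYTES_PER_PARAM = 0.5   # assumed 4-bit quantized weights

# Assumption: the 27B model is dense, so all 27B parameters are active per token.
dense = decode_tokens_per_sec(27.0, BYTES_PER_PARAM, BANDWIDTH_GB_S)

# Assumption: the MoE activates about 3B parameters per token (illustrative only).
moe = decode_tokens_per_sec(3.0, BYTES_PER_PARAM, BANDWIDTH_GB_S)

print(f"dense 27B: ~{dense:.0f} tok/s upper bound")
print(f"MoE, 3B active (assumed): ~{moe:.0f} tok/s upper bound")
```

Real throughput will be lower than these bounds, and the MoE still needs enough memory to hold all 35B parameters; only the per-token read cost shrinks.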