Show HN: Tiny-vLLM – high-performance LLM inference engine in C++ and CUDA (github.com)
79 points by yu3zhou4 5 hours ago | 7 comments
yu3zhou4 4 hours ago
(Author here.) In my opinion the README is the most interesting part: I wrote it to help others build a useful mental model, so they can recreate the project themselves without even needing to read my code.
juancn 3 hours ago
Looks interesting; it reminds me of the first llama.cpp, but better documented.
nazgulsenpai 4 hours ago
I love that the documentation is formatted as lessons. I can't wait to read through it.
dwa3592 3 hours ago
Very nice job on the README.
> Physically, LLM is a file which contains a lot of float numbers.
aka atoms of the LLM.
cookiengineer 3 hours ago
Wanted to add that the author has an amazing blog with lots of interesting papers: https://jedrzej.maczan.pl/
einpoklum 3 hours ago
It seems the author believes checking the return values of CUDA API calls is not "tiny" enough :-(