Show HN: TRiP – a complete transformer engine in C built from scratch just by me (github.com)
30 points by carlovalenti 7 hours ago | 6 comments
upupupandaway 2 hours ago | parent | next [-]

Any data on performance?

carlovalenti 2 hours ago | parent [-]

Good question, I hope the answer doesn't disappoint you too much:

1. I made no benchmark comparisons to other existing projects

2. TRiP is CPU-only

3. the matmul kernel is not hand-optimized. I have some experience in this and made several attempts, but could not achieve a significant improvement over plain gcc-13 -Ofast, so I decided to leave it readable and just move forward. The only optimization hint left is probably the directive that aligns allocations to the cache-line size. I considered adding flash attention, but the CPU memory hierarchy does not benefit at the same level as GPUs. I briefly considered using optimized libraries, but I actually got bad results, and in any case that was not my main focus (learning the transformer architecture in detail).

This does not mean that TRiP is horribly slow! Keeping the kernel straightforward, plus the alignment, should help the optimizer make fair use of unrolling, strides, and vectorization. If you have any suggestions to improve it (and there's surely room for improvement), I'd be glad to hear them, and if they don't complicate things to the point of undermining the educational purpose, I could put them in! Thank you for your interest!

devlsx 6 hours ago | parent | prev | next [-]

that's super cool, congrats on the nice project and for not using some AI bot for it

carlovalenti 3 hours ago | parent [-]

Thank you! I hope it's useful for study and understanding.

thenewguy077 3 hours ago | parent | prev [-]

This looks like AI-generated code. Is it?

carlovalenti 2 hours ago | parent [-]

It's not, except for some utilities (the JSON parser and picture handling). Please refer to the README for full details. The main purpose was to understand the internals of transformer models by coding an engine from the ground up, so using AI to generate the machine-learning-related code would have made the whole project pointless. The most exciting moments: when I first ran a decode successfully; when I managed to fine-tune a Gemma model by having it learn things about me; when PaliGemma correctly boxed a bee in a picture I presented to it.