TL;DR: The CPU implementation was 71x faster than the FPGA.

Note: model has only 4192 parameters.

hedgehog 2 hours ago | parent | next [-]

That post is uninteresting, both because it misses the point and because it's not clear a human was even involved to perceive a point to miss. Sure, with an unlimited transistor budget, an unlimited power budget, and a design clocked at 4GHz fabbed on 5nm, one of the best CPU design teams in the world can make a thing that is straight-line faster than a one-person project running at 80MHz on a 20-year-old 65nm FPGA. Any other answer would be extremely surprising.
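To put numbers on that: a quick back-of-the-envelope sketch, using only the figures quoted in this thread (the 71x from the TL;DR and the 4GHz / 80MHz clocks from the comment above). It assumes the 4GHz figure is representative of the CPU actually benchmarked, which the thread does not confirm.

```python
# Figures quoted in the thread; everything below them is derived arithmetic.
CPU_CLOCK_HZ = 4e9       # "clocked at 4GHz" (assumed representative of the test CPU)
FPGA_CLOCK_HZ = 80e6     # "running at 80MHz"
MEASURED_SPEEDUP = 71    # "71x faster" from the TL;DR


def clock_ratio(fast_hz, slow_hz):
    """How many times faster the fast clock ticks than the slow one."""
    return fast_hz / slow_hz


def per_cycle_advantage(speedup, ratio):
    """Speedup that remains once the raw clock difference is divided out."""
    return speedup / ratio


ratio = clock_ratio(CPU_CLOCK_HZ, FPGA_CLOCK_HZ)
residual = per_cycle_advantage(MEASURED_SPEEDUP, ratio)

print(f"clock ratio: {ratio:.0f}x")
print(f"speedup not explained by clock alone: {residual:.2f}x")
```

Under those assumptions, clock speed alone accounts for most of the gap, and what is left over is well under 2x.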

Now, there are a bunch of interesting things about this project. Seeing a tiny transformer running on an FPGA is informative, as is the fact that it was apparently a pretty quick project for one person plus robot assistance. There are probably some transferable lessons for anyone else doing robo-FPGA development.

https://github.com/fguzman82/gateGPT/tree/main/

cyanydeez 2 hours ago | parent | prev [-]

Yeah, then there's prompt loading too.

But anyone who can fit QWEN-3.6 35B at a sustained ~30 tokens/s with ~100k context and cache could print money as a hardware vendor.

wmf 2 hours ago | parent [-]

That just sounds like a 3090.

cyanydeez 18 minutes ago | parent [-]

Not at the VRAM sizes that control how much context you can load; also, GPUs aren't as efficient as direct inference.
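For anyone wanting to check the VRAM point themselves, here is a rough sizing sketch. The KV-cache formula is the standard one for attention-based transformers, and 24 GiB is the 3090's memory size. Everything else is a placeholder: the layer count, KV head count, head dimension, and quantization widths below are hypothetical values chosen for illustration, not the real configuration of the model mentioned above, so swap in the actual numbers before drawing conclusions.

```python
GIB = 2**30
VRAM_3090_BYTES = 24 * GIB  # RTX 3090 memory size


def weight_bytes(n_params, bits_per_param):
    """Memory needed to hold the weights at a given quantization width."""
    return n_params * bits_per_param / 8


def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """KV-cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


def fits(vram_bytes, *parts):
    """True if all parts together fit in the given memory budget."""
    return sum(parts) <= vram_bytes


# HYPOTHETICAL model shape, for illustration only (not the real model config).
weights = weight_bytes(n_params=35e9, bits_per_param=4)
cache = kv_cache_bytes(
    n_tokens=100_000, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per_elem=2
)

print(f"weights:  {weights / GIB:.1f} GiB")
print(f"KV cache: {cache / GIB:.1f} GiB")
print(f"fits in 24 GiB: {fits(VRAM_3090_BYTES, weights, cache)}")
```

The sketch ignores activations and runtime overhead, so a real deployment would need some headroom on top of whatever this reports.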