brucehoult 2 days ago
> That documentation also seems to be the original source of the marketing claims that the K3 provides 60 TOPS of AI compute and can run 30B models at over 10 tokens/s. Well, sort of– as another spoiler, I can share that I hit a hard cap at an effective 3B

I'm running Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf on mine ... picked kind of at random from a web page, as I'm a complete n00b at running local LLMs and have never used anything other than ChatGPT or (mostly) Grok.

I get 6-7 tok/s, which is slow for throwing around program code, but fine for general knowledge queries. It's a reasonable speed to read along as it outputs. Simple queries start to give output in about 2 seconds.

e.g. I dunno ..
Output started in about 2 seconds.
Again, output starts in about two seconds.

This is offline, no internet, and uses 14W while running all 8 A100 "AI" cores at max.

Is this useful? I mean, for something, right?

I asked it to review https://github.com/brucehoult/trv which is a total of 320 lines of code (I used `/read` on a tar file containing the two code files). It thought for 22 minutes before output started and then spent 8 minutes outputting comments at just over 6.5 tok/s. Nothing there to scare Claude, but 30 minutes total is still faster than asking a colleague for a code review, and probably more comprehensive too. And it did it on about 0.25 cents of electricity.

> Turns out getting a thread onto the A100 cores requires a two-step handshake:
>
> write the thread’s TID to /proc/set_ai_thread (a kernel interface that unlocks scheduling on cores 8–15 for that specific thread)
>
> then call sched_setaffinity to pin it.

If you want to just run arbitrary Linux programs on the A100 cores, I wrote a small assembly language launcher which does the above PID writing and then EXECs the thing you really want.
https://github.com/brucehoult/k3_ai

Used as normal CPUs, the eight 2-wide in-order A100 cores (like an A53 or A55 or Pentium or PPC603) add about 40% to the normal scalar processing power of the eight X100 cores. That's better than Hyperthreading and well worth using for some additional processing power. Just kick off a background build, or CI or something there while you do something else on the X100 cores.

If you ignore the special "AI" matrix processing extension they are just perfectly normal RISC-V RVA23 cores as far as user code is concerned, and in fact significantly faster than the previous generation K1 chip.

A Linux kernel build on just the A100 "AI" cores is faster than on any previous RISC-V SBC under $1000, including the HiFive Premier P550 or Milk-V Megrez. It's several times faster than the VisionFive 2 or Milk-V Jupiter / BPI-F3.
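For anyone who would rather see the quoted two-step handshake spelled out than read assembly, here is a minimal Python sketch of the same idea. It is not the launcher itself. It assumes the kernel accepts the TID as plain decimal text, and the function names are made up for illustration; only `/proc/set_ai_thread`, cores 8-15 and `sched_setaffinity` come from the quoted article.

```python
import os
import threading

AI_THREAD_PATH = "/proc/set_ai_thread"  # kernel interface named in the quoted article
AI_CORES = range(8, 16)                 # cores 8-15, per the quoted article


def register_ai_thread(tid, path=AI_THREAD_PATH):
    """Step 1: write the thread's TID to the kernel interface.

    Assumption: the TID is written as plain decimal text.
    """
    with open(path, "w") as f:
        f.write(str(tid))


def exec_on_ai_cores(argv, path=AI_THREAD_PATH, cores=AI_CORES):
    """Hypothetical helper: replace this process with argv, pinned to the A100 cores."""
    tid = threading.get_native_id()   # kernel TID of the calling thread
    register_ai_thread(tid, path)     # step 1: unlock cores 8-15 for this thread
    os.sched_setaffinity(0, cores)    # step 2: pin the calling thread (pid 0 = self)
    os.execvp(argv[0], argv)          # the affinity mask is preserved across exec
```

Something like `exec_on_ai_cores(["make", "-j8"])` would then behave roughly like the launcher, on a K3 kernel that provides the `/proc` file. On any other machine the write or the affinity call will simply fail.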
The K3 is also faster than using QEMU/Docker on my 24 core i9-13900 laptop, while using 25W instead of 200W.

Note that the fastest time uses one distccd on the X100 cores and another distccd on the A100 cores. This adds a lot of overhead in preprocessing and communication over the network (loopback, but still), but it still gives a pretty nice boost. Running independent tasks on each set of cores is more efficient. Or teaching `gmake` or `ninja` to distribute to two pools of cores using my `ai` launcher would be even better ...
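A rough sketch of the "two pools" idea, done outside `gmake` or `ninja` with a shared job queue in Python. Everything here is hypothetical: `run_in_pools` is a made-up name, and passing `["ai"]` as a command prefix assumes the launcher is invoked as `ai COMMAND ARGS...`, which I have not checked against the repo.

```python
import queue
import subprocess
import threading


def run_in_pools(commands, pools):
    """Run each command once, on whichever pool has a free worker.

    pools is a list of (workers, prefix) pairs; prefix is an argv list
    put in front of every command run by that pool.
    Returns a list of (command, returncode) in completion order.
    """
    jobs = queue.Queue()
    for cmd in commands:
        jobs.put(list(cmd))
    results = []
    lock = threading.Lock()

    def worker(prefix):
        while True:
            try:
                cmd = jobs.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            rc = subprocess.run(prefix + cmd).returncode
            with lock:
                results.append((cmd, rc))

    threads = [
        threading.Thread(target=worker, args=(prefix,))
        for workers, prefix in pools
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

On a K3 the call might look like `run_in_pools(cmds, [(8, []), (8, ["ai"])])`, again assuming that launcher invocation. Because all workers pull from one queue, the faster cores naturally end up taking more of the jobs, so nothing needs to know the speed ratio in advance.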