mikewarot 8 days ago
I know the actual output of the model is wider than a token, but I can't find the actual width (in elements or bytes) in the source. Perhaps it's my very casual familiarity with Python that's limiting me, but I don't see any explicit declarations of array sizes anywhere in the code. I'm trying to calculate the bandwidth required for the full output of the model, not just the single token handed off to the user. I need this to work out what bandwidth a fully FPGA-based (later ASIC) implementation of the model would require.

Edit/Append: I asked GPT-5, and it estimated about 2 Mbit (~256 KB) of logits per forward pass.
That sounds about right to me, and it yields a maximum of about 500 logit vectors per second over gigabit Ethernet. The actual compute of the model is peanuts compared to just shuffling the data around.
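For concreteness, here's a minimal back-of-the-envelope sketch in Python; the vocabulary size (65,536) and fp32 logit width are my assumptions consistent with that estimate, not values read from the source:

    # Rough bandwidth estimate for streaming the full logit vector per token.
    # VOCAB_SIZE and BYTES_PER_LOGIT are assumptions, not read from the repo.
    VOCAB_SIZE = 65_536            # assumed vocabulary size (logits per forward pass)
    BYTES_PER_LOGIT = 4            # assumed fp32 logits
    LINE_RATE_BPS = 1_000_000_000  # gigabit Ethernet, ignoring framing overhead

    bits_per_vector = VOCAB_SIZE * BYTES_PER_LOGIT * 8   # ~2.1 Mbit per token
    vectors_per_second = LINE_RATE_BPS / bits_per_vector

    print(f"{VOCAB_SIZE * BYTES_PER_LOGIT / 1024:.0f} KiB per logit vector")
    print(f"~{vectors_per_second:.0f} logit vectors/second")  # ~477

That works out to ~477 vectors/second before Ethernet framing overhead, so "about 500" is the right ballpark.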