mikewarot 8 days ago

I know the actual output of the model is wider than a token, but I can't find it (the actual width, or number of bytes) in the source. Perhaps it's my very casual familiarity with Python that's limiting me, but I don't see any actual declarations of array sizes anywhere in the code.

I'm just trying to calculate the actual bandwidth required for the full output of the model, not just a token to be handed off to the user.

I need this so I can compute just what bandwidth a fully FPGA (later ASIC) based implementation of the model would require.

Edit/Append: I asked GPT-5, and it estimated:

  Total bytes = 50,000 tokens × 4 bytes/token = 200,000 bytes
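Which sounds about right to me (one 4-byte logit per vocabulary entry). Gigabit Ethernet carries at most 125 MB/s, so this caps out at about 625 full logit vectors per second, before any protocol overhead.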
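
Here's a rough sketch of that arithmetic in Python, in case anyone wants to plug in other numbers. The 50,000-entry vocabulary and 4 bytes per logit are just the figures from the estimate above, not values I found in the source, and the link rate is the raw 1 Gbit/s line rate with no framing or protocol overhead. The function names are my own.

```python
# Back-of-the-envelope bandwidth estimate for shipping full logit vectors.
# All constants are assumptions taken from the estimate above, not from the model source.

def logit_vector_bytes(vocab_size: int, bytes_per_logit: int) -> int:
    """Size in bytes of one full output vector (one logit per vocabulary entry)."""
    return vocab_size * bytes_per_logit


def max_vectors_per_second(link_bits_per_second: float, vector_bytes: int) -> float:
    """Upper bound on full output vectors per second over a link, ignoring overhead."""
    link_bytes_per_second = link_bits_per_second / 8
    return link_bytes_per_second / vector_bytes


VOCAB_SIZE = 50_000      # assumed vocabulary size
BYTES_PER_LOGIT = 4      # assumed 32-bit float per logit
GIGABIT = 1e9            # raw Gigabit Ethernet line rate, bits per second

size = logit_vector_bytes(VOCAB_SIZE, BYTES_PER_LOGIT)
rate = max_vectors_per_second(GIGABIT, size)

print(f"{size} bytes per output vector")
print(f"{rate:.0f} output vectors per second at most")
```

Halving `BYTES_PER_LOGIT` to 2 (16-bit logits) doubles the rate, so the answer depends heavily on the output precision the hardware actually uses.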

The actual compute of the model is peanuts compared to just shuffling the data around.