mikewarot 8 days ago
I know the actual output of the model is wider than a token, but I can't find the actual width (in elements or bytes) in the source. Perhaps it's my very casual familiarity with Python that's limiting me, but I don't see any explicit declarations of array sizes anywhere in the code. I'm trying to calculate the bandwidth required for the full output of the model, not just the single token handed off to the user. I need this to work out what bandwidth a fully FPGA-based (later ASIC) implementation of the model would require.

Edit/Append: I asked GPT-5, and it estimated about 2 Mbit (~256 KB) of logits per forward pass.
That sounds about right to me, and it yields a maximum of about 500 logit vectors per second over gigabit Ethernet. The actual compute of the model is peanuts compared to just shuffling the data around.
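For concreteness, here's a minimal back-of-the-envelope sketch in Python; the vocabulary size (65,536) and fp32 logit width are my assumptions consistent with that estimate, not values read from the source:

    # Rough bandwidth estimate for streaming the full logit vector per token.
    # VOCAB_SIZE and BYTES_PER_LOGIT are assumptions, not read from the repo.
    VOCAB_SIZE = 65_536            # assumed vocabulary size (logits per forward pass)
    BYTES_PER_LOGIT = 4            # assumed fp32 logits
    LINE_RATE_BPS = 1_000_000_000  # gigabit Ethernet, ignoring framing overhead

    bits_per_vector = VOCAB_SIZE * BYTES_PER_LOGIT * 8   # ~2.1 Mbit per token
    vectors_per_second = LINE_RATE_BPS / bits_per_vector

    print(f"{VOCAB_SIZE * BYTES_PER_LOGIT / 1024:.0f} KiB per logit vector")
    print(f"~{vectors_per_second:.0f} logit vectors/second")  # ~477

That works out to ~477 vectors/second before Ethernet framing overhead, so "about 500" is the right ballpark.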