| ▲ | littlestymaar 6 hours ago | |
I guess it mostly comes from running the model locally at batch size 1, versus a high batch size in a DC: GPU power consumption doesn't grow that much with batch size, so the energy spent per token drops as the batch gets larger. Note that while a local chatbot user will mostly be at batch size 1, that won't hold if they are running an agentic framework, so the gap is going to narrow or even reverse. | ||
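To make the batch-size argument concrete, here is a rough back-of-the-envelope sketch. Every number in it (base power draw, extra watts per sequence, tokens per second per sequence) is a made-up assumption for illustration, not a measurement, and the linear model is a simplification that only roughly fits the memory-bound regime.

```python
def energy_per_token_j(batch_size: int,
                       base_power_w: float = 300.0,          # hypothetical draw at near-empty batch
                       extra_w_per_seq: float = 5.0,         # hypothetical extra watts per sequence
                       tokens_per_s_per_seq: float = 30.0):  # hypothetical per-sequence decode speed
    """Joules per generated token under a toy linear model (assumed, not measured)."""
    # Power grows slowly with batch size ...
    power_w = base_power_w + extra_w_per_seq * batch_size
    # ... while total throughput grows roughly in proportion to it.
    throughput_tok_s = tokens_per_s_per_seq * batch_size
    return power_w / throughput_tok_s


if __name__ == "__main__":
    for b in (1, 4, 32):
        print(f"batch={b:>2}: {energy_per_token_j(b):.2f} J/token")
```

Under these assumed numbers the per-token energy falls steeply with batch size, which is the direction of the effect described above; the actual size of the gap depends on the hardware and the model.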
| ▲ | eru an hour ago | |
Well, different parts of the world also have different electricity prices. | ||