ridethelightnin 7 hours ago

I would think this has a lot of unintended consequences for LLMs over the next four years.

"JavaScript" tokenizes to 2 tokens (BPE). "ECMAScript" tokenizes to 3. No biggie here.

But the real cost isn't training, it's inference. Every time an LLM has to reconcile "ES6" with "JavaScript," explain the naming, or reason through "user said JavaScript but docs say ECMAScript," it spends extra tokens: hidden chain-of-thought overhead and clarification tokens.

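Back of envelope: ~376M JS-related LLM queries/day globally (8% of ~4.7B total interactions). ~30% trigger some clarification overhead, which is ~113M queries/day. At roughly 45 extra tokens each (the per-query figure these totals imply), that's ~5B extra tokens/day, ~1.85T tokens/year.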

At ~0.000025 kWh/token inference cost, that's ~46 GWh/year.

That's ~23,000 tonnes CO2 annually, and ~200,000 tonnes over 4 years, assuming rough growth in LLM use and both names staying in circulation the whole time. I'm probably wrong here too.
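For anyone who wants to poke at the numbers, here's a minimal Python sketch of the arithmetic above. Every input is one of the rough guesses from this comment, not a measurement. The one value I never stated, EXTRA_TOKENS_PER_QUERY = 45, is back-derived from the ~5B tokens/day total, so treat it as an assumption. The function and key names are just illustrative.

```python
# Back-of-envelope estimate. All inputs are rough assumptions from the comment.
TOTAL_LLM_QUERIES_PER_DAY = 4.7e9   # aggregated estimate across providers
JS_SHARE = 0.08                     # guessed share of queries that are JS-related
CLARIFICATION_RATE = 0.30           # guessed share that trigger naming overhead
EXTRA_TOKENS_PER_QUERY = 45         # assumption: implied by ~5B extra tokens/day
KWH_PER_TOKEN = 0.000025            # blended inference energy estimate
KG_CO2_PER_KWH = 0.5                # global grid average


def estimate_naming_overhead():
    """Return the intermediate and final figures of the estimate."""
    js_queries_per_day = TOTAL_LLM_QUERIES_PER_DAY * JS_SHARE
    affected_per_day = js_queries_per_day * CLARIFICATION_RATE
    extra_tokens_per_day = affected_per_day * EXTRA_TOKENS_PER_QUERY
    extra_tokens_per_year = extra_tokens_per_day * 365
    kwh_per_year = extra_tokens_per_year * KWH_PER_TOKEN
    return {
        "js_queries_per_day": js_queries_per_day,
        "extra_tokens_per_day": extra_tokens_per_day,
        "extra_tokens_per_year": extra_tokens_per_year,
        "gwh_per_year": kwh_per_year / 1e6,                       # kWh -> GWh
        "tonnes_co2_per_year": kwh_per_year * KG_CO2_PER_KWH / 1000,  # kg -> t
    }


if __name__ == "__main__":
    for name, value in estimate_naming_overhead().items():
        print(f"{name}: {value:,.2f}")
```

Note that this only covers the annual figure. A flat four years at this rate would be about 92,000 tonnes, so the ~200,000 number depends entirely on the growth assumption, which isn't modeled here.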

Sources

Token counts: OpenAI tiktoken cl100k_base encoder
2.5B ChatGPT queries/day: Sam Altman, July 2025 [1]
~4.7B total LLM interactions/day: aggregated from ChatGPT + Gemini (2B monthly AI Overviews users) + Copilot + Claude + others [2][3]
JS = 62% of developers: Stack Overflow 2024 Survey [4]
8% of queries JS-related: my estimate based on language prevalence
30% clarification rate: my estimate - probably way off
Energy/token: ~0.000025 kWh blended from Luccioni et al. and Patterson et al. inference estimates [5]

CO2: 0.5 kg/kWh global grid average

[1] techcrunch.com/2025/07/21/chatgpt-users-send-2-5-billion-prompts-a-day
[2] demandsage.com/chatgpt-statistics
[3] sqmagazine.co.uk/chatgpt-vs-google-gemini-statistics
[4] survey.stackoverflow.co/2024
[5] arxiv.org/pdf/2211.02001 (BLOOM carbon footprint paper)