jjani 3 days ago

Here's what happened:

1. Google rolled out AI summaries on all of their search queries, through some very tiny model.

2. Given worldwide search volume, that model now represents more than 50% of all queries if you throw it on a big heap with "intentional" LLM usage.

3. Google gets to claim "the median is now 33x lower!", as the median is now that tiny model giving summaries nobody asked for.

It's very concerning that this marketing puff piece is being eaten up by HN of all places as evidenced by the other thread.

Google is basing this all on the "median" because there's orders of magnitude difference between strong models (what most people think of when you talk AI) and tiny models, which Google uses "most" by virtue of running them for every single Google search to produce the summaries. So the "median" will be whatever tiny model they use for those summaries. Never mind that Gemini 2.5 Pro, which is what everyone here would actually be using, may well consume >100x as much.
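To make the median mechanics concrete, here's a toy sketch (all per-query energy numbers are made up for illustration, not Google's):

```python
from statistics import mean, median

# Hypothetical per-query energy costs in Wh -- illustrative only.
tiny_summary_cost = 0.01   # small model answering every search query
big_model_cost = 1.0       # a large model like Gemini 2.5 Pro

# Suppose 60 of every 100 queries hit the tiny summarizer.
queries = [tiny_summary_cost] * 60 + [big_model_cost] * 40

print(median(queries))  # 0.01 -- the tiny model alone sets the median
print(mean(queries))    # ~0.406 -- the mean still reflects the big models
```

Once the cheap model serves more than half of all queries, the median is just that model's cost, regardless of what the expensive models burn.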

It's absurdly misleading and rather obvious, but it feels like most are very eager to latch on to this so they can tell themselves their usage and work (for the many here in AI or at Google) is all peachy. I've been reading this place for years and have never before seen such uncritical adoption of an obvious PR piece detached from reality.

raincole 3 days ago | parent | next [-]

It's not what the report says.

> It's very concerning that this marketing puff piece is being eaten up by HN of all places as evidenced by the other thread.

It's very concerning that you can just make shit up on HN and be the top comment as long as it's to bash Google.

> Never mind that Gemini 2.5 Pro, which is what everyone here would actually be using, may well consume >100x much

Yes, exactly, never mind that. The report is to compare against a data point from May 2024, before Gemini 2.5 Pro became a thing.

3 days ago | parent | next [-]
[deleted]
latexr 3 days ago | parent | prev | next [-]

> make shit up on HN and be the top comment as long as it's to bash Google.

I don’t think that’s fair. The same would’ve happened if it were Microsoft, or Apple, or Amazon. By now we’re all used to (and tired of) these tech giants lying to us and being generally shitty. Additionally, for decades we haven’t been able to trust reports from big companies which say “everything is fine, really” when they publish it themselves, about themselves, contradicting the general wisdom of something bad they’ve been doing. Put those together and you have the perfect combination; we’re primed to believe they’re trying to deceive us again, because that’s what happens most of the time. It has nothing to do with it being Google, they just happened to be the target this time.

ksec 2 days ago | parent | prev [-]

>It's very concerning that you can just make shit up on HN and be the top comment as long as it's to bash Google.

Off topic, but I wanted to say that, somewhat counterintuitively, I often upvote / submit things I disagree with and don't downvote them as long as sub-comments offer a good counterargument or explanation.

Sometimes being top just means that's what most people are thinking, and it being wrong and corrected is precisely why I upvote it and wish it stayed on top so others can learn.

mgraczyk 3 days ago | parent | prev | next [-]

As others have pointed out, this is false. Google has made their models and hardware more efficient; you can read the linked report. Most of the efficiency comes from quantization, MoE, new attention techniques, and distillation (making smaller models usable in place of bigger models).

jjani 3 days ago | parent | next [-]

- The report doesn't name any Gemini models at all, only competitors'. Wonder why that is? If the models got so much more efficient, they'd be eager to show this.

- The report doesn't give any averages (means), only medians. Why oh why would they do this, when other marketing pieces always use the average, because outside of HN 99% of Joes on the street have no idea what a median is or how it differs from the mean? The average is much more relevant here when "measuring the environmental impact of AI inference".

- The report doesn't define what any of the terms "Gemini Apps", "the Gemini AI assistant", or "Gemini Apps text prompt" concretely mean.

jsnell 2 days ago | parent | next [-]

The report also doesn't define what the word "AI" means. What are they trying to hide?!

In reality, we know what Google means by the term "Gemini Apps", because it's a term they've had to define for e.g. their privacy policies[0].

> The Gemini web app available through gemini.google.com and browser sidebars

> The Gemini mobile apps, which include:

> The Gemini app, including as your mobile assistant, on Android. Note that Gemini is hosted by the Google app, even if you download the Gemini app.

> The Gemini app on iOS

> Gemini in the Google Messages app in specific locations

> The Gemini in Chrome feature. Learn more about availability.

That established definition does not include AI summaries (actually AI Overviews) on search, like you claimed. And it's something where Google probably is going to be careful -- the "Gemini Apps" name is awkward, but they need a name that distinguishes these use cases from other AI use cases with different data boundaries / policies / controls.

If the report was talking about "Gemini apps", your objection might make sense.

[0] https://support.google.com/gemini/answer/13594961?hl=en

jjani 2 days ago | parent [-]

It's very strange that we'd have to dive into their privacy policy to get a clear definition of it, but good spot.

The rest stands though - no models, no averages. User tovej below put it better than I did:

> The median does not move if the upper tail shifts, it only moves if the median moves.

> The fact that they do not report the mean is concerning. The mean captures the entire distribution and could actually be used to calculate the expected value of energy used.

> The median only tells you which point separates the upper half from the lower half, if you don't know anything else about the distribution you cannot use it for any kind of analysis

49% of queries could be costing 1000x that median. Stats 101 combined with a sliver of critical reading reveals this report isn't worth the bytes it's taking up.

scott_w 3 days ago | parent | prev [-]

To be fair, the report explains their reasoning: they state the mean is too sensitive to outliers.

Now, I do agree it would have been nice to demonstrate this, however it could be genuine.

jjani 3 days ago | parent [-]

That's a complete cop-out. They didn't provide the data to back this up.

scott_w 2 days ago | parent [-]

They definitely should have shown an example, or referenced something else that backs up their claim. I think it was you who made the good point that, when it comes to data usage, the mean may well give you more meaningful information because of the outliers!

I can see the median being useful for answering what the cost of one more server/agent/whatever would be, but that’s not what this paper is asking.

oulipo2 2 days ago | parent | prev [-]

sure, but the issue is if you make the model 30x more efficient, but you use it 300x more often (mostly for stuff nobody wants), it's still a net loss
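A back-of-envelope check of that point, using the 30x and 300x figures from the comment (arbitrary units, purely illustrative):

```python
# Efficiency gain vs. usage growth: a 30x per-query efficiency gain
# is outweighed by a 300x increase in query volume.
old_energy_per_query = 30.0   # arbitrary units
old_query_volume = 1.0

new_energy_per_query = old_energy_per_query / 30   # 30x more efficient
new_query_volume = old_query_volume * 300          # 300x more usage

old_total = old_energy_per_query * old_query_volume
new_total = new_energy_per_query * new_query_volume

print(new_total / old_total)  # 10.0 -- total energy still grew 10x
```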

mgraczyk 2 days ago | parent [-]

Would you say that computers are less efficient now than they were in the 90s because they are more widely used?

jononor 2 days ago | parent [-]

Not less efficient. But the impact on resource usage is still higher. Of course the impact in terms of positive effects is also higher, so the cost/benefit may also have gone up.

shwaj 3 days ago | parent | prev | next [-]

Are you sure? It wouldn’t shock me, but they specifically say “Gemini Apps”. I wasn’t familiar with the term, but a web search indicated that it has a specific meaning, and it doesn’t seem to me like web search AI summaries would be covered by it. Am I missing something?

user568439 3 days ago | parent | prev | next [-]

"It's very concerning that this marketing puff piece is being eaten up by HN of all places as evidenced by the other thread."

It's very concerning that you claim this without previously fully reading and understanding Google's publication...

tobr 3 days ago | parent | prev | next [-]

I’ve dramatically reduced my median calories per meal, by scheduling eight new meals a day, each consisting of one lettuce leaf.

jsnell 2 days ago | parent | prev | next [-]

I know there's a lot of rebuttals to this statement already, but I think there's a simpler way of showing it is incorrect:

Figure 2 in the paper shows the LMArena score of whatever model serves the "median" Gemini query. That score is consistent with Gemini Flash (probably 2.0, given the numbers are from May), not a "tiny model" used for summaries nobody asked for.

RajT88 3 days ago | parent | prev | next [-]

Big tech seems all about the fluff.

But, wasn't it always so?

Wasn't it always so in business of all kinds?

Why should we expect anything different? We should have been skeptical all along.

camillomiller 3 days ago | parent [-]

I’ve been covering tech for 20 years. No, it wasn’t always like that. There was a sincere mutual respect between the companies and the media industry that I don’t see anymore. Both sides have their faults, but you know it’s not the media that hyperscaled and created gazillionaires by the score. Also, software is way more bendable to the emperors’ whims, and Google has become particularly hypocritical in the way it publicly represents itself.

rbinv 2 days ago | parent [-]

Agreed. Big tech is trying to become the media industry.

3 days ago | parent | prev | next [-]
[deleted]
3 days ago | parent | prev | next [-]
[deleted]
jonas21 3 days ago | parent | prev [-]

What exactly are you basing this assertion on (other than your feelings)? Are you accusing Google of lying when they say in the technical report [1]:

> This impact results from: A 33x reduction in per-prompt energy consumption driven by software efficiencies—including a 23x reduction from model improvements, and a 1.4x reduction from improved machine utilization.

followed by a list of specific improvements they've made?

[1] https://services.google.com/fh/files/misc/measuring_the_envi...

esperent 3 days ago | parent [-]

Unless marketing blogs from any company specifically say what model they are talking about, we should always assume they're hiding/conflating/mislabeling/misleading in every way possible. This is corporate media literacy 101.

The burden of proof is on Google here. If they've reduced Gemini 2.5 energy use by 33x, they need to state that clearly. Otherwise we should assume they're fudging the numbers, for example:

A) they've chosen one particular tiny model for this number

or

B) it's a median across all models including the tiny one they use for all search queries

EDIT: I've read over the report and it's B) as far as I can see

Without more info, any other reading of this is a failing on the reader's part, or wishful thinking if they want to feel good about their AI usage.

We should also be ready to change these assumptions if Google or another reputable party does confirm this applies to large models like Gemini 2.5, but should assume the least impressive possible reading until that missing info arrives.

Even more useful info would be how much electricity Google uses per month, and whether that has gone down or continued to grow in the period following this announcement. Because total energy use across their whole AI product range, including training, is the only number that really matters.

mquander 3 days ago | parent | next [-]

You should not assume that "they've chosen one particular tiny model", or "it's a median across all models including the tiny one they use for all search queries" because those are totally made up assumptions that have nothing to do with what they say they measured. They measured the Gemini Apps product that completes text prompts. They also provided a chart showing that the thing they are measuring scores comparably to GPT-4o on LM Arena.

penteract 3 days ago | parent [-]

From the report:

> To calculate the energy consumption for the median Gemini Apps text prompt on a given day, we first determine the average energy/prompt for each model, and then rank these models by their energy/prompt values. We then construct a cumulative distribution of text prompts along this energy-ranked list to identify the model that serves the 50-th percentile prompt.

They are measuring more than one model. I assume this statement describes how they chose which model to report the LM Arena score for, and it's a ridiculous way to do so - the LM Arena score calculated this way could change dramatically day-to-day.
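A sketch of the quoted procedure as I read it (my reconstruction with made-up model names, energies, and volumes, not Google's code): rank models by energy/prompt, then walk the cumulative distribution of prompt counts until you reach the model serving the 50th-percentile prompt.

```python
# (name, energy per prompt in Wh, prompts served that day) -- all invented
models = [
    ("pro",        2.50, 100_000),
    ("flash",      0.24, 300_000),
    ("flash-lite", 0.05, 600_000),
]

def median_prompt_model(models):
    ranked = sorted(models, key=lambda m: m[1])   # cheapest first
    total = sum(count for _, _, count in ranked)
    cumulative = 0
    for name, energy, count in ranked:
        cumulative += count
        if cumulative >= total / 2:               # 50th-percentile prompt
            return name, energy

print(median_prompt_model(models))  # ('flash-lite', 0.05)
```

With these volumes the cheapest model alone covers the 50th percentile, so it is the one whose energy (and LM Arena score) gets reported -- and a shift in daily traffic mix could flip that answer overnight.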

mgraczyk 3 days ago | parent | prev | next [-]

> total energy use across their whole AI product range, including training, is the only number that really matters.

What if they are serving more requests?

mgraczyk 3 days ago | parent | prev [-]

They did specifically say in the linked report

esperent 3 days ago | parent [-]

Here's the report. Could you tell me where in it you found a link to 33x reduction (or any large reduction) for any specific non-tiny model? Because all I can find is lots of references to "median Gemini". In fact, I would say they're being extremely careful in this paper not to mention any particular Google models with regards to energy reduction.

https://services.google.com/fh/files/misc/measuring_the_envi...

mgraczyk 3 days ago | parent [-]

Figure 4

I think you are assuming we are talking about swapping API usage from one model to another. That is not what happened. A specific product doing a specific thing uses less energy now.

To clarify: the way models become more efficient is usually by training a new one with a new architecture, quantization, etc.

This is analogous to making a computer more efficient by putting a new CPU in it. It would be completely normal to say that you made the computer more efficient, even though you've actually swapped out the hardware.

sigilis 3 days ago | parent | next [-]

Don’t they call all their LLM models Gemini? The paper indicates that they specifically used all the AI models to come up with this figure when they describe the methodology. It looks like they even include classification and search models in this estimate.

I’m inclined to believe that they are issuing a misleading figure here, myself.

mgraczyk 3 days ago | parent | next [-]

They reuse the word here for a product, not a model. It's the name of a specific product surface. There is no single model and the models used change over time and for different requests

immibis 3 days ago | parent [-]

So it includes both tiny models and large models?

mgraczyk 3 days ago | parent [-]

I would assume so. One important trend is that models have gotten more intelligent for the same size, so for a given product you can use a smaller model.

Again this is pretty similar to how CPUs have changed

immibis 2 days ago | parent [-]

So it's not a specific product doing a specific thing, but the average across different things?

simianwords 3 days ago | parent | prev [-]

“Gemini App” would be the specific Gemini App in the App Store. Why would it be anything different?

esperent 3 days ago | parent | prev [-]

> Figure 4: Median Gemini Apps text prompt emissions over time—broken down by Scope 2 MB emissions (top) and Scope 1+3 emissions (bottom). Over 12 months, we see that AI model efficiency efforts have led to a 47x reduction in the Scope 2 MB emissions per prompt, and 36x reduction in the Scope 1+3 emissions per user prompt—equivalent to a 44x reduction in total emissions per prompt.

Again, it's talking about "median Gemini" while being very careful not to name any specific numbers for any specific models.

logicprog 3 days ago | parent | next [-]

You're grouping those words wrong. As another commenter pointed out to you, which you ignored, it's median (Gemini Apps) not (median Gemini) Apps. Gemini Apps is a highly specific thing — with a legal definition even iirc — that does not include search, and encompasses a list of models you can actually see and know.

esperent 2 days ago | parent [-]

I didn't ignore it, I actually spent some time researching to find out what Google means by "Gemini Apps" (plural) and whether it includes search AI overview, and I can't get a clear answer anywhere.

Of course, Gemini App (singular) means the mobile app. But it seems that the term Gemini Apps (plural) is being used by Google to refer to any way in which users can access the Gemini models, and they also do clearly state that a version of Gemini is used to generate the search overviews.

So it still seems reasonably likely, until they confirm otherwise, that this median includes search overview.

simianwords 2 days ago | parent [-]

"This section presents the environmental impact metrics for the Gemini Apps AI assistant" is this also not specific enough?

esperent a day ago | parent [-]

No, because unless they state otherwise we should assume that they consider search overview to be an AI assistant (they definitely believe this) and also that it's one of the Gemini Apps.

Look, there's not enough information to answer this within the paper. I'm not willing to give Google the benefit of the doubt on vague language, and you are. I'm assuming they're a huge, basically evil corporation whose every publication is gone over and reworded by marketing to make them look good, and you're assuming... whatever.

That's fine by me, we disagree. Let's stop here.

simianwords 3 days ago | parent | prev | next [-]

What do you think the Gemini app means? It can only mean the consumer facing actually existing Gemini App that exposes 2 models.

esperent a day ago | parent [-]

They refer to Gemini Apps, plural. One of those apps is also called the Gemini App, singular.

mgraczyk 3 days ago | parent | prev [-]

That isn't what that means. Look at the paragraph above that where they explain.

This is the median model used to serve requests for a specific product surface. It's exactly analogous to upgrading the CPU in a computer over time

tovej 3 days ago | parent | next [-]

The median does not move if the upper tail shifts, it only moves if the median moves.

The fact that they do not report the mean is concerning. The mean captures the entire distribution and could actually be used to calculate the expected value of energy used.

The median only tells you which point separates the upper half from the lower half, if you don't know anything else about the distribution you cannot use it for any kind of analysis.
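A toy illustration of that property (numbers invented): scale the upper 49% of queries by 1000x and the median doesn't budge, while the mean explodes.

```python
from statistics import mean, median

low = [1.0] * 51           # cheapest 51% of queries
high = [2.0] * 49          # most expensive 49%

before = low + high
after = low + [x * 1000 for x in high]   # upper tail gets 1000x costlier

print(median(before), median(after))  # 1.0 1.0 -- unchanged
print(mean(before), mean(after))      # 1.49 980.51 -- mean tracks the tail
```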

esperent 3 days ago | parent | prev [-]

I can't copy text from that pdf on my phone, but the paragraph above says exactly what you'd expect: they're using a "median" value from a "typical user" across all Gemini models. While being very careful not to list the specific models which are used to calculate this median, because it almost certainly includes the tiny model used to show AI summaries on google.com, which would massively skew the median value. As someone above said, it's like adding 8 extra meals of a single lettuce leaf and then claiming you reduced the median caloric intake of your meals.

simianwords 3 days ago | parent [-]

This doesn’t check out. It is not reasonable to interpret “Gemini app” as also including a functionality that is embedded in google searches.

Gemini app is a specific thing: the Gemini App that actually exists.

How can Gemini App also include their internal augmented functionality on search which itself is not an application?

tupshin 3 days ago | parent [-]

If I, as a regular Google user ask in the search "is this search powered by Gemini?", the AI generated result is in the affirmative.

"Yes, this search is powered by a customized version of the Gemini model for its generative AI features."

Based on that, I'm not sure how it's reasonable to claim that "Gemini Apps" is a legal term that excludes Gemini's use in search.

Amusingly, it refuses to answer if I ask "is this search powered by Gemini app?"

simianwords 3 days ago | parent [-]

What? The paper clearly says "This section presents the environmental impact metrics for the Gemini Apps AI assistant". You are going through lots of hoops instead of just reading the paper.