Computer use in Gemini 3.5 Flash (blog.google)
43 points by swolpers an hour ago | 14 comments
mlmonkey 39 minutes ago | parent | next [-]

It's funny how in their own graph, https://storage.googleapis.com/gweb-uniblog-publish-prod/ima... Gemini 3.5 Flash is beaten hands down by both Opus 4.8 and GPT-5.5, and yet the graph is drawn as if Gemini wins ... :-D

mroche 16 minutes ago | parent | next [-]

The graph has Gemini 3.5 Flash matching Sonnet 4.6, losing to Opus 4.8, and slightly behind GPT-5.5 by 0.3 points... That's not that much of a hands-down loss for Gemini for this specific workload benchmark.

The methodology used:

https://deepmind.google/models/evals-methodology/gemini-3-5-...

Methodology: All Gemini scores are pass@1 except where otherwise noted. "Single attempt" settings allow no majority voting or parallel test-time compute. All results are run with the Gemini API for the model-id gemini-3.5-flash with default sampling settings unless indicated otherwise below. To reduce variance, we average over multiple trials for smaller benchmarks.

All the results for non-Gemini models are sourced from providers' self-reported numbers unless otherwise mentioned below. For Claude Opus 4.7, Sonnet 4.6, and GPT-5.5 we default to reporting maximum thinking/reasoning settings available, but when reported results are not available we use best available reasoning results.
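
For anyone unsure what "pass@1 averaged over multiple trials" amounts to, here is a rough sketch in Python. It is only my reading of the quoted methodology, not Google's evaluation code: the function name `pass_at_1` and the sample data are made up for illustration, and I'm assuming each task gets exactly one attempt per trial with no voting across attempts.

```python
from statistics import mean

def pass_at_1(trial_results: list[list[bool]]) -> float:
    """Average single-attempt pass rate across repeated trials.

    trial_results[t][i] is True if task i passed on its one attempt
    in trial t. (Hypothetical data layout, assumed for this sketch.)
    """
    # Pass rate within each trial: fraction of tasks solved on the first try.
    per_trial = [mean([1.0 if ok else 0.0 for ok in trial]) for trial in trial_results]
    # Averaging over trials reduces run-to-run variance on small benchmarks.
    return mean(per_trial)

# Three made-up trials over the same four tasks.
trials = [
    [True, False, True, True],
    [True, True, False, True],
    [False, False, True, True],
]
score = pass_at_1(trials)
print(f"pass@1 averaged over {len(trials)} trials: {score:.3f}")
```

The point is that no attempt is retried or voted on within a trial; repeating the whole run only smooths out noise in the reported number.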

gb2d_hn 6 minutes ago | parent | prev | next [-]

It's honest - people who know what they are looking at will take speed and token costs into account. I don't use Gemini 3.5 for coding, but I use it as something in between a search engine and an agent.

sheept 34 minutes ago | parent | prev [-]

It highlights the Gemini models in blue since that's what the article is about. The bar heights seem consistent with the values.

satvikpendem an hour ago | parent | prev | next [-]

There's still no MCP support in the Gemini app, which would be very useful for getting various pieces of info as a user just via chatting. For example, I recently wanted to book an Airbnb and filter by specific criteria, including house image analysis. Gemini couldn't do it, so I had to do it in Codex.

anticorporate 36 minutes ago | parent | next [-]

Yeah, it seems like this is the biggest missing feature from the Gemini ecosystem.

If I can't connect MCP, there's really no selling point for me to use Gemini from my watch, car, smart speaker, etc. If I'm already bound to using my own front end, then I'm only evaluating Gemini as a model/API, at which point it has many competitors that may be cheaper or a better fit for the task.

thejaycampbell 32 minutes ago | parent [-]

agreed... this is where they lost me too

tonyrice an hour ago | parent | prev [-]

This is why I don't always use the official Gemini Web app. Lately I've found that it's more useful to utilize a CLI. I'm looking forward to the day they add MCP in the web.

singingtoday 2 minutes ago | parent | next [-]

CLI doesn't work with my subscription...

pregseahorses 22 minutes ago | parent | prev [-]

Gemini CLI now requires an Antigravity subscription...

airstrike 40 minutes ago | parent | prev | next [-]

Computer use is such a terrible idea. It's slow, insecure, error prone, expensive.

I guess if you're trying to get people to tokenmaxx it may look like a valid strategy, but ain't no way this will be delightful to users.

I think it's a symptom of just not understanding how LLMs should interface with the OS because we're still in their early days.

Eventually there'll be an iPhone moment for the ergonomics of LLM usage outside of coding.

beastman82 28 minutes ago | parent | prev | next [-]

No UI like their competitors Claude CoWork or Codex. This is vaporware.

villgax 19 minutes ago | parent | prev [-]

Will it skip ads lol

humblyCrazy 14 minutes ago | parent [-]

I looked at their demo and it does not.