| ▲ | himata4113 4 hours ago |
| I already felt that gemini 3 proved what is possible if you train a model for efficiency. If I had to guess, the pro and flash variants are 5x to 10x smaller than opus and gpt-5 class models. They produce drastically fewer tokens to solve a problem, but they don't seem to have put enough effort into refining their reasoning and execution as they produce broken tool calls and generally struggle with 'agentic' tasks, but for raw problem solving without tools or search they match opus and gpt while presumably being a fraction of the size. I feel like google will surprise everyone with a model that is an entire generation beyond SOTA at some point, once they go from prototyping to making a model that's not a preview model anymore. All models up till now feel like they're just prototypes that were pushed to GA just so they have something to show to investors and to integrate into their suite as a proof of concept. |
|
| ▲ | onlyrealcuzzo 4 hours ago | parent | next [-] |
| > They produce drastically fewer tokens to solve a problem, but they don't seem to have put enough effort into refining their reasoning and execution as they produce broken tool calls and generally struggle with 'agentic' tasks, but for raw problem solving without tools or search they match opus and gpt while presumably being a fraction of the size. Agreed, Gemini-cli is terrible compared to CC and even Codex. But Google is clearly prioritizing having the best AI to augment and/or replace traditional search. That's their bread and butter. They'll be in a far better place to monetize that than anyone else. They've got a 1B+ user lead on anyone - and even adding in all LLMs together, they still probably have more query volume than everyone else put together. I hope they start prioritizing Gemini-cli, as I think they'd force a lot more competition into the space. |
| |
| ▲ | JeremyNT 3 hours ago | parent | next [-] | | > Agreed, Gemini-cli is terrible compared to CC and even Codex. Using it with opencode I don't find the actual model to cause worse results with tool calling versus Opus/GPT. This could be a harness problem more than a model problem? I do prefer the overall results with GPT 5.4, which seems to catch more bugs in reviews that Gemini misses and produce cleaner code overall. (And no, I can't quantify any of that, just "vibes" based) | |
| ▲ | ljm 10 minutes ago | parent | prev | next [-] | | Google doesn't need to give a shit, because so much of the internet is infested with google ad trackers and AdWords, and everybody uses Chrome, that they will continue to make billions even without AI. Facebook did the same with their pixel so they could soak up data. Gemini will be dead in 2 years and there'll be something else, but the ad and search company will remain given that they basically own the world wide web. Except now, so much of the WWW is filled with AI slop that it breaks the system. | |
| ▲ | rjh29 2 hours ago | parent | prev | next [-] | | I wonder what I am missing, because I can use gemini-cli with English descriptions of features or entire projects and it just cranks out the code. Built a bunch of stuff with it. Can't think of anything it's currently lacking. | | |
| ▲ | xnx 27 minutes ago | parent | next [-] | | Same. I've built dozens of small tools and scripts and never felt the need to try something else. | |
| ▲ | CraigJPerry 2 hours ago | parent | prev [-] | | >> Can't think of anything it's currently lacking. Speed? The pro models are slow for me. The 3.1 pro model is good and I don't recognise the GP's complaint of broken tool calls, but I'm only using it via the gemini cli harness. Sounds like they might be hosting their own agentic loop? |
| |
| ▲ | asah 3 hours ago | parent | prev | next [-] | | also, for incorporating into gsuite, youtube, maps, gcp and their other winning apps and behind-the-scenes infra... | |
| ▲ | toraway an hour ago | parent | prev | next [-] | | I thought the same for a long time: borderline unusable, with loops and bizarre decisions compared to Claude Code and later Codex. But I picked it up again about a month ago and I have been quite impressed. I haven't yet hit any of the frustrating QoL issues it was famous for, and I've been using it a few hours a day. Maybe it will let me down sooner or later, but so far it has been working really well for me and is pretty snappy with the auto model selection. After cancelling my Claude Pro plan months ago due to Anthropic enshittification, I've been nervous relying solely on Codex in case they do the same, so I've been glad to have it available on my Google One plan. | |
| ▲ | Iulioh 3 hours ago | parent | prev [-] | | Not only that, google has an advantage because they don't always need to generate a response. When a lot of people ask the same thing, they can just index the questions, like results in the search engine, and recompute the answer only every so often.
|
|
| ▲ | UncleOxidant 3 hours ago | parent | prev | next [-] |
| IIRC when Gemini 3 Pro came out it was considered to be just about on par with whatever version of Claude was out then (4?). Now Gemini 3 is looking long in the tooth. Considering how many Chinese models have been released since then, and at least 2 or 3 versions of Claude, it's starting to look like Google is kind of sitting still here. Maybe you're right and they'll surprise us soon with a large step improvement over what they currently have. Note: I do realize that there's been a Gemini 3.1 release, but it didn't seem like a noticeable change from 3. |
|
| ▲ | orbital-decay 2 hours ago | parent | prev | next [-] |
| Their "preview" naming is pretty arbitrary. It's just their way to avoid making any availability or persistence promises, let alone guarantees. It's also a PR tactic to mask any failures by pretending it's beta quality. |
|
| ▲ | big-chungus4 an hour ago | parent | prev | next [-] |
| Am I tripping or is this an AI reply? Like it barely has anything to do with the article other than both are related to AI |
|
| ▲ | mrcwinn an hour ago | parent | prev | next [-] |
| Interesting mix of words: "I felt" -> "proved" -> "guess". One of those is not like the others! |
|
| ▲ | ALLTaken 4 hours ago | parent | prev [-] |
| [flagged] |
| |
| ▲ | _boffin_ 3 hours ago | parent | next [-] | | Is your friend on the JAX team? | |
| ▲ | neonstatic 3 hours ago | parent | prev [-] | | I'm really struggling with terrible bloating today, but I deemed it too dangerous to release. | | |
| ▲ | tclancy an hour ago | parent [-] | | Thank you for your sacrifice. Could you speak to my dog please? You may wish to yell from a distance, actually. |
|
|