logicprog 3 hours ago

Is it just me, or is the rate of model releases accelerating to an absurd degree? Today we have Gemini 3 Deep Think and GPT 5.3 Codex Spark. Yesterday we had GLM5 and MiniMax M2.5. Five days before that we had Opus 4.6 and GPT 5.3. And maybe two weeks before that, I think, we had Kimi K2.5.

i5heu 3 hours ago | parent | next [-]

I think it is because of Chinese New Year. The Chinese labs like to publish their models around Chinese New Year, and the US labs do not want to let a DeepSeek R1 (20 January 2025) impact event happen again, so I guess they publish models that are more capable than what they imagine the Chinese labs can yet produce.

woah an hour ago | parent | next [-]

Singularity or just Chinese New Year?

r2vcap 28 minutes ago | parent | prev [-]

Please use the term “Lunar New Year” instead of “Chinese New Year,” as the lunar calendar is a respected tradition in many Asian countries. For example, both California and New York use the term “Lunar New Year” in their legislation.

rfoo 4 minutes ago | parent | next [-]

For another example, Singapore, one of the "many Asian countries" you mentioned, lists "Chinese New Year" as the official name on government websites. [0] Also note that neither California nor New York is located in Asia.

And don't get me started on "Lunar New Year? Which Lunar New Year? Islamic Lunar New Year? Jewish Lunar New Year? CHINESE Lunar New Year?"

[0] https://www.mom.gov.sg/employment-practices/public-holidays

zzrush 3 minutes ago | parent | prev | next [-]

I didn't expect language policing to have reached this level. This is specifically about China and DeepSeek, which celebrate Chinese New Year. Do you demand that all Chinese people say "happy Lunar New Year" to each other?

phainopepla2 13 minutes ago | parent | prev [-]

"Happy Holidays" comes to the diaspora

aliston 3 hours ago | parent | prev | next [-]

I'm having trouble just keeping track of all these different types of models.

Is "Gemini 3 Deep Think" even technically a model? From what I've gathered, it is built on top of Gemini 3 Pro and appears to add specific thinking capabilities, more akin to adding subagents than to a truly new foundational model like Opus 4.6.

Also, I don't understand the comments about Google being behind in agentic workflows. I know that the typical use of, say, Claude Code feels agentic, but also a lot of folks are using separate agent harnesses like OpenClaw anyway. You could just as easily plug Gemini 3 Pro into OpenClaw as you can Opus, right?

Can someone help me understand these distinctions? Very confused, especially regarding the agent terminology. Much appreciated!

logicprog 2 hours ago | parent | next [-]

> Also, I don't understand the comments about Google being behind in agentic workflows.

It has to do with how the model is RL'd. It's not that Gemini can't be used with various agentic harnesses, like OpenCode or OpenClaw or theoretically even Claude Code. It's just that the model is less effectively trained to work with those harnesses, so it produces worse results.

re-thc 3 hours ago | parent | prev [-]

There are hints that this is a preview of Gemini 3.1.

rogerkirkness 3 hours ago | parent | prev | next [-]

Fast takeoff.

redox99 3 hours ago | parent | prev | next [-]

There's more compute now than before.

bpodgursky 3 hours ago | parent | prev | next [-]

Anthropic took the day off to do a $30B raise at a $380B valuation.

IhateAI 3 hours ago | parent [-]

Most ridiculous valuation in the history of markets. Can't wait to watch these companies crash and burn when people give up on the slot machine.

andxor an hour ago | parent | next [-]

As usual, don't take financial advice from HN folks!

kgwgk 3 hours ago | parent | prev | next [-]

WeWork almost IPO'd at $50bn. That was also a nice crash and burn.

jascha_eng 2 hours ago | parent | prev [-]

Why? They had a $10+ billion ARR run rate in 2025, tripled from 2024. I mean, 30x is a lot, but also not insane at that growth rate, right?

gokhan an hour ago | parent [-]

It's a 13-day-old account with the handle IhateAI.

brokencode 3 hours ago | parent | prev [-]

They are using the current models to help develop even smarter models. Each generation can help even more with building the next.

I don’t think it’s hyperbolic to say that we may be only a single digit number of years away from the singularity.

lm28469 3 hours ago | parent | next [-]

I must be holding these things wrong, because I'm not seeing any of the godlike superpowers everyone seems to enjoy.

brokencode 2 hours ago | parent [-]

Who said they’re godlike today?

And yes, you are probably using them wrong if you don’t find them useful or don’t see the rapid improvement.

lm28469 2 hours ago | parent [-]

Let's come back in 12 months and discuss your singularity then. Meanwhile, I spent about $30 testing a few models yesterday, and none of them could tell me why my goroutine system was failing, even though it was painfully obvious (I had purposely added one too many wg.Done calls). Gemini, Codex, MiniMax 2.5: they all shat the bed on a very obvious problem, yet I'm supposed to believe they're 98% conscious and better at logic and math than 99% of the population.
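
Not my actual code, but the planted failure was exactly this shape (a minimal repro, names illustrative):

    package main

    import "sync"

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 3; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                // ... the actual work, elided ...
                wg.Done() // planted bug: extra Done drives the counter negative
            }()
        }
        // The program dies with "panic: sync: negative WaitGroup counter"
        // before Wait ever returns.
        wg.Wait()
    }

Any model that actually reads the panic message should be led straight to the doubled Done.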

Every new model release, the neckbeards come out of their basements to tell us the singularity will be here in two more weeks.

brokencode 2 hours ago | parent | next [-]

You are fighting straw men here. Any further discussion would be pointless.

lm28469 an hour ago | parent [-]

Of course, n-1 wasn't good enough, but n+1 will be the singularity. Just two more weeks, my dudes, two more weeks... rinse and repeat ad infinitum.

brokencode 40 minutes ago | parent [-]

Like I said, pointless strawmanning.

You’ve once again made up a claim of “two more weeks” to argue against even though it’s not something anybody here has claimed.

If you feel the need to make an argument against claims that exist only in your head, maybe you can also keep the argument only in your head too?

BeetleB an hour ago | parent | prev | next [-]

On the flip side, twice I put about 800K tokens of code into Gemini and asked it to find why my code was misbehaving, and it found it.

The logic related to the bug wasn't all contained in one file, but across several files.

This was Gemini 2.5 Pro. A whole generation old.

woah an hour ago | parent | prev | next [-]

Post the file here

logicprog 2 hours ago | parent | prev | next [-]

Meanwhile, I've been using Kimi K2T and K2.5 to work in Go with a fair amount of concurrency, and they've been able to write concurrent Go code and debug goroutine issues equal to, and much more complex than, yours (involving race conditions and more) just fine.

Projects:

https://github.com/alexispurslane/oxen

https://github.com/alexispurslane/org-lsp

(Note that org-lsp has a much-improved version of the same indexer as oxen; the first was purely my design, while for the second I decided to listen to K2.5 more, and it found and fixed a bunch of potential race conditions.)
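
For a flavor of the class of bug it caught, here's a minimal sketch; this is my illustration, not actual code from either repo:

    package index

    import "sync"

    // Index is the shape of the structure where the races lived:
    // a map written from multiple goroutines during indexing.
    type Index struct {
        mu      sync.Mutex
        entries map[string]int
    }

    func New() *Index {
        return &Index{entries: make(map[string]int)}
    }

    // Add is the fixed version. The unsynchronized original
    // (ix.entries[key] = pos with no lock) dies with
    // "fatal error: concurrent map writes" under parallel indexing.
    func (ix *Index) Add(key string, pos int) {
        ix.mu.Lock()
        defer ix.mu.Unlock()
        ix.entries[key] = pos
    }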

shrug

Izikiel43 an hour ago | parent | prev [-]

Out of curiosity, did you give them a test to validate the code?

I had a test failing because I had introduced a silly comparison bug (> instead of <), and Claude Opus 4.6 figured out that the problem wasn't the test but the code, and fixed the bug (which I had missed).
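
Reconstructed from memory rather than the real code, but the bug was roughly this shape:

    package mathx

    import "testing"

    // Max returns the larger of a and b. I had written a > b in the
    // comparison below, which made Max return the smaller value.
    func Max(a, b int) int {
        if a < b { // fixed: was a > b
            return b
        }
        return a
    }

    // The failing test. Claude worked out that the expectation here
    // was right and the comparison in Max was wrong.
    func TestMax(t *testing.T) {
        if got := Max(2, 5); got != 5 {
            t.Fatalf("Max(2, 5) = %d, want 5", got)
        }
    }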

lm28469 an hour ago | parent [-]

There was a test, and a very useful Go error that literally explained what was wrong. The models tried implementing a solution and failed, and when I pointed out the error, most of them just rolled back the "solution".

Izikiel43 21 minutes ago | parent [-]

Ok, thanks for the info

sekai an hour ago | parent | prev [-]

> I don’t think it’s hyperbolic to say that we may be only a single digit number of years away from the singularity.

We're back to singularity hype, but let's be real: benchmark gains are meaningless in the real world when the primary focus has shifted to gaming the metrics

brokencode an hour ago | parent [-]

OK, here I am living in the real world, finding that these models have advanced incredibly for coding over the past year.

Benchmaxxing exists, but that’s not the only data point. It’s pretty clear that models are improving quickly in many domains in real world usage.