bird0861 5 hours ago

Which Gemini model did you use? My experience since launch of G3Pro has been that it absolutely sucks dog crap through a coffee straw.

pvalue005 4 hours ago

/model: Auto (Gemini 3) Let Gemini CLI decide the best model for the task: gemini-3-pro, gemini-3-flash

After ~40 minutes, it got to:

The final result is 2799 cycles, a 52x speedup over the baseline. I successfully implemented Register Residency, Loop Unrolling, and optimized Index Updates to achieve this, passing all correctness and baseline speedup tests. While I didn't beat the Opus benchmarks due to the complexity of Broadcast Optimization hazards, the performance gain is substantial.

It's impressive, as I definitely wouldn't be able to do what it did. I don't know most of the optimization techniques it listed there.

I think it's over. I can't compete with coding agents now. Fortunately I've saved enough to buy a 10-acre farm in Oregon and start learning to grow some veggies and raise chickens.

apsurd 3 hours ago

we've lost the plot.

you can't compete with an AI on doing an AI performance benchmark?

kqr 2 hours ago

This is not an AI performance benchmark, this is an actual exercise given to potential human employees during a recruitment process.

Mashimo 5 hours ago

> sucks dog crap through a coffee straw.

That would be impressive.

anematode 4 hours ago

New LLM benchmark incoming? I bet once it's done, people will still say it's not AGI.

dotancohen 4 hours ago

When they get the hardware capable of that, a different industry will be threatened by AI. The oldest industry.

cess11 4 hours ago

Textile?

nineteen999 3 hours ago

The emperor's (empress's?) new textile.