Remix.run Logo
Robdel12 4 hours ago

I really want to use google’s models but they have the classic Google product problem that we all like to complain about.

I am legit scared to login and use Gemini CLI because the last time I thought I was using my “free” account allowance via Google workspace. Ended up spending $10 before realizing it was API billing and the UI was so hard to figure out I gave up. I’m sure I can spend 20-40 more mins to sort this out, but ugh, I don’t want to.

With alllll that said.. is Gemini 3.1 more agentic now? That’s usually where it failed. Very smart and capable models, but hard to apply them? Just me?

alpineman 4 hours ago | parent | next [-]

100% agreed. I wish someone would make a test for how reliably the LLMs follow tool use instructions etc. The pelicans are nice but not useful for me to judge how well a model will slot into a production stack.

embedding-shape 4 hours ago | parent [-]

At first when I got started with using LLMs I read/analyzed benchmarks, looked at what example prompts people used and so on, but many times, a new model does best at the benchmark, and you think it'll be better, but then in real work, it completely drops the ball. Since then I've stopped even reading benchmarks, I don't care an iota about them, they always seem more misdirected than helpful.

Today I have my own private benchmarks, with tests I run myself, with private test cases I refuse to share publicly. These have been built up during the last 1/1.5 years, whenever I find something that my current model struggles with, then it becomes a new test case to include in the benchmark.

Nowadays it's as easy as `just bench $provider $model` and it runs my benchmarks against it, and I get a score that actually reflects what I use the models for, and it feels like it more or less matches with actually using the models. I recommend people who use LLMs for serious work to try the same approach, and stop relying on public benchmarks that (seemingly) are all gamed by now.

cdelsolar 4 hours ago | parent [-]

share

embedding-shape 3 hours ago | parent [-]

The harness? Trivial to build yourself, ask your LLM for help, it's ~1000 LOC you could hack together in 10-15 minutes.

As for the test cases themselves, that would obviously defeat the purpose, so no :)

phamilton 4 hours ago | parent | prev | next [-]

> For those building with a mix of bash and custom tools, Gemini 3.1 Pro Preview comes with a separate endpoint available via the API called gemini-3.1-pro-preview-customtools. This endpoint is better at prioritizing your custom tools (for example view_file or search_code).

It sounds like there was at least a deliberate attempt to improve it.

pdntspa 4 hours ago | parent | prev | next [-]

You can delete the billing from a given API key

Stevvo 4 hours ago | parent | prev | next [-]

You could always use it through Copilot. The credits based billing is pretty simple without surprise charges.

abiraja 2 hours ago | parent | prev | next [-]

I've been using it lately with OpenCode and it's working pretty well (except for API reliability issues).

surgical_fire 4 hours ago | parent | prev | next [-]

May be very silly of me, but I avoid using Gemini on my personal Google account. I use it at work, because my employer provides it.

I am scared some automated system may just decide I am doing something bad and terminate my account. I have been moving important things to Proton, but there are some stuff that I couldn't change that would cause me a lot of annoyance. It's not trivial to set up an alternative account just for Gemini, because my Google account is basically on every device I use.

I mostly use LLMs as coding assistant, learning assistant, and general queries (e.g.: It helped me set up a server for self hosting), so nothing weird.

CamperBob2 2 hours ago | parent [-]

For what it's worth, there was an (unfortunately unsuccessful) HN submission from a guy who got his Gemini account banned, apparently without losing his whole Google account: https://news.ycombinator.com/item?id=47007906

surgical_fire an hour ago | parent [-]

Comforting to know that they may ban you from only some of their services, I guess?

I really regret relying so much on my Google account for so long. Untangling myself from it is really hard. Some places treat your email as a login, not as simply as a way to contact you. This is doubly concerning for government websites, where setting up a new account may just not be a possibility.

At some point I suppose Gemini will be the only viable option for LLMs, so oh well.

horsawlarway 4 hours ago | parent | prev | next [-]

So much this.

It's absolutely amazing how hostile Google is to releasing billing options that are reasonable, controllable, or even fucking understandable.

I want to do relatively simple things like:

1. Buy shit from you

2. For a controllable amount (ex - let me pick a limit on costs)

3. Without spending literally HOURS trying to understand 17 different fucking products, all overlapping, with myriad project configs, api keys that should work, then don't actually work, even though the billing links to the same damn api key page, and says it should work.

And frankly - you can't do any of it. No controls (at best delayed alerts). No clear access. No real product differentiation pages. No guides or onboarding pages to simplify the matter. No support. SHIT LOADS of completely incorrect and outdated docs, that link to dead pages, or say incorrect things.

So I won't buy shit from them. Period.

sciencejerk 4 hours ago | parent [-]

You think AWS is better?

3form 3 hours ago | parent | next [-]

Exact reason I used none of these platforms for my personal projects, ever.

pdimitar 3 hours ago | parent | prev [-]

Who is comparing to AWS and why? They can both be terrible at the same time, you know.

himata4113 4 hours ago | parent | prev [-]

use openrouter instead

Robdel12 2 hours ago | parent [-]

This is actually an excellent idea, I’ll give this a shot tonight!