kazinator 7 days ago

> what model did you ask?

Are you hoping to disprove my point by cherry picking the AI that gets the answer?

I used Gemini 2.5 Flash.

Where can I get an exact list of stuff that Gemini 2.5 Flash does not know that Claude Sonnet does, and vice versa?

Then before deciding to consult with AI, I can consult the list?

simianwords 7 days ago | parent | next [-]

2.5 Flash is particularly cheap and fast. I think 2.5 Pro would have gotten all the answers correct; at least it gets this one correct.

Yokolos 7 days ago | parent | next [-]

I get a lot of garbage out of 2.5 Pro, Claude Sonnet, and ChatGPT. There's always this "this is how you solve it." I take a close look and it's clearly broken; I point it out and it's all "you're right, this is a common issue." Okay, so why do we have to do this song and dance a million times to arrive at the actually correct answer?

kazinator 7 days ago | parent | prev [-]

Why does Flash fail to get it correct, yet come up with plausible-sounding nonsense? That means it is trained on some texts in the area.

What would make 2.5 Pro (or anything else) categorically better would be if it could say "I don't know".

There will be things that Claude 3.7 or Gemini Pro will not know, and the interpolations they come up with will not make sense.

simianwords 7 days ago | parent [-]

Model accuracy goes up as you use heavier models. Accuracy is always preferable and the jump from Flash to Pro is considerable.

You must rely on your own internal model in your head to verify the answers it gives.

On hallucination: it is a problem but again, it reduces as you use heavier models.

Macha 6 days ago | parent [-]

> You must rely on your own internal model in your head to verify the answers it gives

This is what significantly reduces the utility. If it can only be trusted to answer things I already know the answer to, why would I ask it anything?

simianwords 6 days ago | parent | next [-]

It's the same reason I find it useful to read comments on Reddit and to ask people for their advice and opinions.

I have written about it here: https://news.ycombinator.com/item?id=44712300

rockemsockem 6 days ago | parent | prev [-]

Verification is often easier and faster than coming up with the answer from scratch.

simianwords 6 days ago | parent [-]

True! Generating an answer is much harder than verifying one. I wonder if a parallel can be drawn to the P vs NP problem.
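
A toy sketch of that asymmetry, using subset sum as the example (my choice for illustration, not something anyone upthread mentioned; the function names `verify` and `generate` are made up). Checking a proposed subset takes a quick pass over it, while the naive search may have to try every subset. This is only an analogy: P vs NP is a statement about formal problem classes, not about LLM answers.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Check a proposed answer in polynomial time.

    The candidate must use only values available in `numbers`
    (respecting duplicates) and must sum to `target`.
    """
    pool = list(numbers)
    for x in candidate:
        if x not in pool:
            return False
        pool.remove(x)  # consume the value so duplicates are counted correctly
    return sum(candidate) == target


def generate(numbers, target):
    """Find an answer by brute force.

    In the worst case this inspects all 2**len(numbers) subsets.
    Returns a list of values, or None if no subset sums to `target`.
    """
    for r in range(len(numbers) + 1):
        for combo in combinations(numbers, r):
            if sum(combo) == target:
                return list(combo)
    return None


numbers = [3, 34, 4, 12, 5, 2]
answer = generate(numbers, 9)          # expensive: searches the subsets
print(answer, verify(numbers, 9, answer))  # cheap: one pass over the answer
```

The point for the thread is only that `verify` stays cheap as the list grows, while this particular `generate` does not; whether checking an LLM's answer is really that much cheaper than producing it depends on the question.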

ryao 7 days ago | parent | prev | next [-]

Gemini 2.5 Flash is meant for things that have a higher tolerance for mistakes as long as the costs are low and responses are quick. Claude Sonnet is similar, although the trade off it makes between mistake tolerance and cost/speed is more in favor of fewer mistakes.

Lately, I have been using Grok 4 and I have had very good results from it.

iusewindows 7 days ago | parent | prev | next [-]

Today I read a stupid Hackernews comment about how AI is useless. Therefore Hackernews is stupid. Oh, I need a filtered list of which comments to read?

Do you build computers by ordering random parts off Alibaba and complaining when they are deficient? You are complaining that you need to RTFM for a piece of high tech?

kazinator 7 days ago | parent [-]

> Oh, I need a filtered list of which comments to read?

If they are about something you're not sure about, and you're making decisions based on them ... maybe it would actually help, so yes?

> Do you build computers by ordering random parts off Alibaba and complaining when they are deficient?

We build computers using parts which are carefully documented by data sheets, which tell you exactly for what ranges of parameters their operation is defined and in what ways. (temperatures, voltages, currents, frequencies, loads, timings, typical circuits, circuit board layouts, programming details ...)
