coliveira 4 days ago

My personal experience is that it produces high quality results.

amrrs 4 days ago | parent | next [-]

Any example or prompt you use to make this statement?

imachine1980_ 4 days ago | parent | next [-]

I remember asking for quotes about the Spanish conquest of South America because I couldn't remember who said a specific thing. The GPT model started hallucinating quotes on the topic, while DeepSeek responded with "I don't know a quote about that specific topic, but you might mean this other thing," or something like that, then cited a real quote on the same topic after acknowledging that it wasn't able to find the one I had read in an old book. I don't use it for coding, but for more unusual things I feel it is more precise.

mycall 4 days ago | parent | next [-]

I wonder if something like Conway's law is at all responsible for that: the model is trained on regional data, which carries conceptual biases that it reflects back in its responses.

valtism 4 days ago | parent | prev [-]

Was that true for GPT-5? They claim it is much better at not hallucinating.

sync 4 days ago | parent | prev [-]

I'm doing coreference resolution, and this model (w/o thinking) performs at the Gemini 2.5 Pro level (w/ thinking_budget set to -1) at a fraction of the cost.

antman 3 days ago | parent | next [-]

Nice point. How did you test for coreference resolution? Specific prompt or dataset?

dr_dshiv 4 days ago | parent | prev [-]

Strong claim there!

SV_BubbleTime 4 days ago | parent | prev [-]

Vine is about the only benchmark I think is real.

We made objective systems turn out subjective answers… why the shit would anyone think objective tests would be able to grade them?