sbuttgereit 3 days ago

Hmmm... I wonder if this is why some of the results I've gotten over the past few days have been pretty bad. It's easy to blame poor results on prompt-to-prompt LLM quality variance rather than on something like this, where quality is actively degraded without notification. I can't say this is in fact what I'm experiencing, but it was noticeable enough that I'm going to check.

jmathai 3 days ago | parent | next [-]

Never occurred to me that the response changes based on load. I’ve definitely noticed it seems smarter at times. Makes evaluating results nearly impossible.

kridsdale1 3 days ago | parent [-]

My human responses degrade when I’m heavily loaded and low on resources, too.

TeMPOraL 3 days ago | parent [-]

Unrelated. Inference doesn't run in sync with the wall clock; it takes whatever time it takes. The issue is more like telling a room of support workers they're free to half-ass the work when there are too many calls, so they don't reject any until even half-assing doesn't lighten the load enough.
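The policy described above, degrade before rejecting, can be sketched as a toy admission controller. The class and thresholds are made up for illustration; they aren't anyone's actual serving code:

```python
class SupportDesk:
    """Toy load-shedding policy: serve at full quality when quiet,
    cut corners when busy, and reject only when even the cheap
    path can't keep up with the backlog."""

    def __init__(self, full_capacity: int, cheap_capacity: int):
        # Below full_capacity pending calls: answer properly.
        # Up to cheap_capacity: answer, but with reduced effort.
        # Beyond that: turn callers away.
        self.full_capacity = full_capacity
        self.cheap_capacity = cheap_capacity

    def handle(self, pending_calls: int) -> str:
        if pending_calls <= self.full_capacity:
            return "full-quality answer"
        if pending_calls <= self.cheap_capacity:
            return "degraded answer"
        return "rejected"


desk = SupportDesk(full_capacity=10, cheap_capacity=50)
print(desk.handle(5))    # quiet: full quality
print(desk.handle(30))   # busy: half-assed but not refused
print(desk.handle(100))  # overwhelmed: even degradation isn't enough
```

The point of the analogy is the middle branch: from the caller's side, a "degraded answer" and a "full-quality answer" arrive the same way, which is exactly why it's hard to notice.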

Seattle3503 3 days ago | parent | prev | next [-]

This is one reason closed models suck. You can't tell whether the bad responses are due to something you're doing, or whether the company you're paying to generate them is cutting corners in search of efficiency, e.g. by reducing the number of bits used for inference. It's a black box.
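A back-of-the-envelope way to see why "reducing the number of bits" costs quality: uniform quantization maps values onto a coarser grid, and the rounding error grows as the bit width shrinks. This is a generic sketch of that effect, not something any provider has confirmed doing:

```python
import random

def quantize(xs, bits):
    """Uniformly quantize values to 2**bits levels over their range,
    then map back to floats -- a crude stand-in for low-bit inference."""
    lo, hi = min(xs), max(xs)
    levels = 2 ** bits - 1
    return [round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo
            for x in xs]

random.seed(0)
w = [random.gauss(0, 1) for _ in range(10_000)]  # fake "weights"

for bits in (8, 4, 2):
    q = quantize(w, bits)
    err = sum(abs(a - b) for a, b in zip(w, q)) / len(w)
    print(f"{bits:>2}-bit mean abs error: {err:.4f}")
```

Running it shows the mean error climbing steadily as bits drop, which is the kind of silent degradation being complained about: the model still answers, just a little more wrongly.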

mirsadm 3 days ago | parent [-]

To be fair, even if you did know, it would still behave the same way.

TeMPOraL 3 days ago | parent [-]

Still, knowing is what makes the difference between gaslighting and merely subpar/inconsistent service.

baxtr 3 days ago | parent | prev | next [-]

Recently I started wondering about the quality of ChatGPT. A couple of times I was like: "hmm, I'm not impressed at all by this answer, I'd better google it myself!"

Maybe it’s the same effect over there as well.

dave84 3 days ago | parent [-]

Recently I asked 4o to 'try again' when it failed to respond fully, and it started telling me about some song called Try Again. It seems to lose context a lot in conversations now.

55555 3 days ago | parent | prev [-]

Same experience here.