nonethewiser a day ago

I wonder what brought them into compliance.

himata4113 a day ago

The classifier is very picky; working with languages such as C, I get cyber refusals and random reasoning_extraction errors.

anonym29 a day ago

It's gone from bad to unbelievably bad. Just a prompt with the single word "virus" was enough to get downgraded to O4.8 before the ban.

"Tell me a story about a man named CVE." gets me downgraded now.

"CVE-" gets me downgraded and broken reasoning with no response.

"Giant crane fly, 2 feet wide" gets me downgraded.

Schiendelman a day ago

Is it likely that it keeps track of what you said recently and decides whether you might be referring to things you've previously been downgraded for?

I haven't been downgraded once, either a few weeks ago for the three days it was live, or since I got it back today.

anonym29 a day ago

>Is it likely that it keeps track of what you said recently and decides whether you might be referring to things you've previously been downgraded for?

I have no special knowledge here, so it feels rather unproductive for me to speculate.

Out of curiosity, if you're comfortable trying any of them, do any of the above prompts cause you to get downgraded?

Schiendelman a day ago

Good question! I'll give them a shot when I'm back to my computer.

Schiendelman a day ago

I went last to first, and I was not blocked.

anonym29 17 hours ago

Thanks for trying and for sharing results here. This is a pretty interesting data point and suggests something I don't think I've read about: the possibility that safeguards may not be account-invariant, i.e. that two different users might be downgraded or blocked differently for the same prompts.

This raises some really interesting ethical questions in my mind, but I suppose I need to do more reading and research on this before anything else.

Schiendelman 13 hours ago

You bet! It's interesting.

As someone who has worked in the space on multiple products, I can tell you with 99.999% certainty that fraud/abuse detection is never account-invariant once money is involved.

baggachipz a day ago

Gold statues, ring-kissing, and total fealty to the military's desires?