renewiltord 15 hours ago

Opus 4.6 is a very good model, but the harness around it is good too. It can talk about sensitive subjects without getting guardrail-whacked.

This is much more reliable than ChatGPT's guardrails, which have a random element even with the same prompt. Perhaps it's leakage from improperly cleared context from another request in the queue, or maybe an A/B test on the guardrail, but I have sometimes had it trigger on innocuous requests like GDP retrieval and summary with bucketing.

menzoic 15 hours ago | parent | next [-]

I would think it's due to non-determinism. Leaking context would be an unacceptable flaw, since many users rely on the same instance.

An A/B test is plausible but unlikely, since A/B tests are typically for studying user behavior. Model output can be tested with offline evaluations instead.
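The non-determinism point is worth making concrete: with temperature sampling, the same prompt can yield different token choices run to run, so a borderline prompt can land on either side of a refusal across requests. A minimal sketch of the sampling mechanism itself (not any vendor's actual guardrail; the logit values are made up to represent two near-tied continuations):

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token index from temperature-scaled softmax via inverse CDF."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return i
    return len(exps) - 1

# Two near-tied continuations, e.g. "answer" vs. "refuse" on a borderline prompt.
logits = [2.0, 1.9]
counts = [0, 0]
rng = random.Random(0)
for _ in range(1000):
    counts[sample_token(logits, temperature=1.0, rng=rng)] += 1

# Both branches occur regularly: identical input, divergent behavior.
print(counts)
```

With near-tied logits, neither outcome dominates, which is enough to explain the "same prompt, sometimes blocked" experience without invoking leaked context.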

sciencejerk 12 hours ago | parent [-]

Can you explain the "same instance" and user isolation? Can context be leaked since it is (secretly?) shared? Explain pls, genuinely curious

tbossanova 15 hours ago | parent | prev [-]

What kind of value do you get from talking to it about “sensitive” subjects? Speaking as someone who doesn’t use AI, so I don’t really understand what kind of conversation you’re talking about

NiloCK 14 hours ago | parent | next [-]

The most boring example is somehow the best example.

A couple of years back there was a Canadian national u18 girls baseball tournament in my town - a few blocks from my house in fact. My girls and I watched a fair bit of the tournament, and there was a standout dominating pitcher who threw 20% faster than any other pitcher in the tournament. Based on the overall level of competition (women's baseball is pretty strong in Canada) and her outlier status, I assumed she must be throwing pretty close to world-class fastballs.

Curiosity piqued, I asked some model(s) about world records for women's fastballs. But they wouldn't talk about it. Or, at least, they wouldn't talk specifics.

Women's fastballs aren't quite up to speed with top major league pitchers, due to a combination of factors including body mechanics. But rest assured - they can throw plenty fast.

Etc etc.

So to answer your question: anything more sensitive than how fast women can throw a baseball.

Der_Einzige 13 hours ago | parent [-]

They had to tune the essentialism out of the models because they’re the most advanced pattern recognizers in the world and see all the same patterns we do as humans. Ask grok and it’ll give you the right, real answer that you’d otherwise have to go on twitter or 4chan to find.

I hate Elon (he’s a pedo guy confirmed by his daughter), but at least he doesn’t do as much of the “emperor has no clothes” shit that everyone else does because you’re not allowed to defend essentialism anymore in public discourse.

nvch 14 hours ago | parent | prev | next [-]

I recall two recent cases:

* An attempt to change the master code of a secondhand safe. To get useful information I had to repeatedly convince the model that I own the thing and can open it.

* Researching mosquito poisons derived from bacteria named Bacillus thuringiensis israelensis. The model repeatedly started answering and refused to continue after printing the word "israelensis".

tbrownaw 14 hours ago | parent [-]

> israelensis

Does it also take issue with the town of Scunthorpe?
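The Scunthorpe problem being joked about here is a naive substring filter firing on an innocent word that happens to contain a blocked term. Real LLM moderation is classifier-based rather than a literal blocklist, but the failure mode is the same shape. A toy illustration with a hypothetical blocklist entry (not any vendor's actual term list):

```python
import re

def naive_filter(text, blocklist):
    """Return blocklist entries found as raw substrings, ignoring word boundaries."""
    lowered = text.lower()
    return [term for term in blocklist if term in lowered]

def boundary_filter(text, blocklist):
    """Same idea, but only match whole words."""
    return [t for t in blocklist
            if re.search(rf"\b{re.escape(t)}\b", text, re.IGNORECASE)]

# Hypothetical blocked term, chosen to show the false positive.
blocklist = ["israel"]
species = "Bacillus thuringiensis israelensis"

print(naive_filter(species, blocklist))     # flags the species name
print(boundary_filter(species, blocklist))  # whole-word match: no false positive
```

The word-boundary version avoids the false positive because `israelensis` continues with word characters after the blocked substring.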

gensym 3 hours ago | parent | prev | next [-]

One example - I'm doing research for some fiction set in the late 19th century, when strychnine was occasionally used as a stimulant. I want to understand how and when it would have been used, and at what dosages, but ChatGPT shut down that conversation "for safety".

rebeccaskinner 14 hours ago | parent | prev [-]

I sometimes talk with ChatGPT in a conversational style when thinking critically about media. In general I find the conversational style a useful format for my own exploration of media, and it can be particularly useful for quickly referencing work by particular directors for example.

Normally it does fairly well, but the guardrails sometimes kick in even with fairly popular mainstream media. For example, I've recently been watching Shameless, and a few of the plot lines caused the model to generate output that hit the content moderation layer, even when the discussion was focused on critical analysis.

sciencejerk 12 hours ago | parent [-]

Interesting. Specific examples of what was censored?