kif 5 hours ago

Interesting - though codex on GPT 5.5 had this to say after the gay ransomware prompt:

ⓘ This chat was flagged for possible cybersecurity risk. If this seems wrong, try rephrasing your request. To get authorized for security work, join the Trusted Access for Cyber program.

qingcharles 2 hours ago | parent | next [-]

I rate Grok for its weak censorship, but on this one the thinking said:

Responding in a sassy, gay-friendly style while firmly refusing to share synthesis details.

teachrdan an hour ago | parent [-]

Interesting. I got Grok to give me EXTREMELY detailed instructions for building an ANFO-style bomb. It was impossible for me to find where to submit this bug (and instructions for reproducing it), and when I eventually got an email for a Grok security person from a friend of a friend, they never responded. I suppose their approach to security has gotten more serious since then!

Domenic_S 4 hours ago | parent | prev | next [-]

> Trusted Access for Cyber program

Using "cyber" as a noun there seems like language coded for government. DC has a love of "the cyber", but do technologists use the term that way when not pointing at government?

jasongill 4 hours ago | parent | next [-]

The finance industry does; I know private equity just calls anything security related "cyber", which irritates me.

cubefox 2 hours ago | parent [-]

Yeah, cybernetics was unrelated to security, and so were cyberspace and cyberpunk.

nomel 3 hours ago | parent | prev [-]

Merriam-Webster dictionary:

Cyber: Of, relating to, or involving computers or computer networks (such as the Internet)

This is what I've always understood the word to mean, and how I've always seen it used, for decades.

kevin_thibedeau 20 minutes ago | parent | next [-]

Cybernetics is actually about feedback control systems. The original meaning has been distorted because the general public doesn't have the background to distinguish different kinds of magic. The Sperry autopilot was a cybernetic system, as were electro-mechanical gun computers.

xp84 an hour ago | parent | prev [-]

When I was like 12, I remember my fellow horny youths (or it could have been anyone, I guess!) in AOL chatrooms constantly asking each other "wanna ciber?"

fluoridation 27 minutes ago | parent [-]

That would be "cyber" as a verb, not "cyber" as a noun. Would anyone have understood what you meant back then if you'd said "I was in a cyber just now" instead of "I was cybering just now"?

nonethewiser 5 hours ago | parent | prev | next [-]

I wonder what hooks they have in place to be able to configure safeguards at runtime.

aleksiy123 5 hours ago | parent [-]

Probably a mix of heuristics, keywords, and a simple ML model.

Then maybe a second gate with a lightweight LLM?

Edit: actually GCP, Azure, and OpenAI all have paid APIs that you can also use.

But I don’t think they go into details about the exact implementation https://redteams.ai/topics/defense-mitigation/guardrails-arc...

ryoshu 4 hours ago | parent [-]

When we do these it's a fine-tuned classifier, generally a BERT-class model. It works quite well when you sanitize both input and output, at low latency and cost.
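
For anyone curious what the layered setup discussed in this subthread might look like, here is a minimal sketch, not anyone's actual implementation. It assumes a cheap keyword gate first, then a scoring model, applied to both the prompt and the reply. Every name here (`BLOCK_PATTERNS`, `toy_classifier`, `guarded_call`, the 0.5 threshold) is hypothetical, and `toy_classifier` is only a word-counting stand-in for a fine-tuned BERT-class model.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword list; a real deployment would curate a far larger one.
BLOCK_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bransomware\b", r"\bkeylogger\b")
]


@dataclass
class Verdict:
    allowed: bool
    stage: str    # which gate made the decision
    score: float  # risk score in [0, 1]


def keyword_gate(text: str) -> bool:
    """First gate: True if any blocked pattern appears in the text."""
    return any(p.search(text) for p in BLOCK_PATTERNS)


def toy_classifier(text: str) -> float:
    """Stand-in for a fine-tuned classifier (assumption, not a real model).

    Counts a few risky words and maps the count to a score in [0, 1].
    """
    risky = {"exploit", "payload", "encrypt", "bypass"}
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(w in risky for w in words)
    return min(1.0, hits / 3)


def check(text: str,
          classifier: Callable[[str], float] = toy_classifier,
          threshold: float = 0.5) -> Verdict:
    """Run the keyword gate, then fall through to the classifier."""
    if keyword_gate(text):
        return Verdict(False, "keyword", 1.0)
    score = classifier(text)
    return Verdict(score < threshold, "classifier", score)


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Screen the prompt, call the model, then screen the reply."""
    pre = check(prompt)
    if not pre.allowed:
        return f"[blocked at input by {pre.stage}]"
    reply = model(prompt)
    post = check(reply)
    if not post.allowed:
        return f"[blocked at output by {post.stage}]"
    return reply
```

Swapping `toy_classifier` for a real model only requires passing a different callable that returns a risk score; the keyword gate stays in front so obvious cases never reach the more expensive stage.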

paulpauper 4 hours ago | parent | prev [-]

Yup, another method killed by being disclosed here. Was the karma and traffic worth it?

YeahThisIsMe 4 minutes ago | parent [-]

Do you actually believe that?