lmeyerov 3 hours ago

We have been getting hit by this more and more. We do defense, not offense, and AI refusals to run defense prompts have been going up noticeably. Historically, tasks only got randomly rejected when we were doing disaster management AI, so this shift to refusals that keep it from functioning reliably for basic IT is a surprise.

Relatedly, they outsourced the TAP verification to a terrible vendor and their internal support process to AI, so we are now in fairly busted support email threads with both and no humans in sight.

This all feels like an unserious cybersecurity partner.

intended 3 hours ago

They are selling an impossible product.

If you make an LLM more safe, you are going to shift the weight for defensive actions as well.

There’s no physical way to assign weights to have one and not the other.

Borealid 2 hours ago

> If you make an LLM more safe, you are going to shift the weight for defensive actions as well.
>
> There’s no physical way to assign weights to have one and not the other.

Do you think a human is capable of providing assistance with defense but not offense, over a textual communication channel with another human?

If no, how does a cybersec firm train its employees?

If yes, how can you make the bold claim that it's possible for a human to differentiate between the two cases using incoming text as their basis for judgement, but IMpossible for an LLM to be configured to do the same? Note that if some hypothetical completely deterministic LLM that always rejects "attack" requests and accepts "defense" ones can exist, the claim that it's impossible is false. Providing nondeterministic output for a given input is not a hard requirement for language models.

beering an hour ago

> Do you think a human is capable of providing assistance with defense but not offense, over a textual communication channel with another human?
>
> If no, how does a cybersec firm train its employees?

In general, no: humans can’t be sure they are helping only with defensive and not offensive work unless they have more context. IRL, a security engineer would know who they’re working for. If they’re advising Apple, then they’d feel pretty confident that Apple is not turning around and hacking people.