kristianc 9 hours ago

There are two classes of models now: the cybersecurity ones that none of us are getting, and the 'safe' models released for general consumption. This is letting us know which side of the divide it sits on.

Taek 9 hours ago

There are also Chinese models, which aren't trying to self-limit capabilities.

axus 9 hours ago

Surely the Chinese government will see US gov's intervention and say "Government control of business is stupid, our industry will have more independence from CCP control for the benefit of the world".

baq 9 hours ago

…as long as you don’t ask them about certain dates or squares.

Also, I wouldn’t expect Mythos-class models to be allowed to be openly released by the CCP. Thinking otherwise is pure naivety.

girvo 6 hours ago

Depends on the model. Step (from StepFun) will happily yap about Tiananmen to you, if you're running it locally.

Quite a lot of these models have "safety" (lol) filters in front of them, rather than having it heavily encoded into the weights.

satvikpendem 6 hours ago

Like the sibling said, you can fine-tune if the rejections are in the weights, but most often they're actually in the API harness itself; download Qwen or DeepSeek and run it locally to ask about certain dates and squares and it will happily tell you.

atemerev 9 hours ago

Well, the weights are open. De-CCP-ing them is a trivial task, about 40 minutes on modern hardware. So it can be done for about $50.

bjelkeman-again 6 hours ago

Any good reference for how?

ls612 6 hours ago

https://github.com/p-e-w/heretic

atemerev 6 hours ago

Heretic is a general abliteration framework, mostly used to remove safety alignment, not CCP alignment. Yes, you can feed it China-specific prompts, but you'll need a dataset first (one is available at deccp).

Also, Heretic as it is does not work for GLM 5.2 (at least as of 3 days ago when I tested it). You'll need some hybrid approaches.
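
For readers unfamiliar with the term, here is a rough sketch of the general idea behind abliteration (directional ablation): estimate a "refusal direction" as the difference of mean activations between prompts the model refuses and prompts it answers, then project that direction out of a weight matrix that writes into the residual stream. This is not Heretic's or deccp's actual code or API; the function names, the synthetic activations, and the toy dimensions are all assumptions made up for illustration.

```python
import numpy as np


def refusal_direction(refused_acts: np.ndarray, answered_acts: np.ndarray) -> np.ndarray:
    """Unit vector along the difference of mean activations.

    Both inputs have shape (num_prompts, d_model). In a real model these would be
    hidden states captured at one layer; here they are synthetic (assumption).
    """
    diff = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(weight: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove `direction` from the output space of `weight`.

    `weight` has shape (d_model, d_in) and is applied as `weight @ x`, so the
    result is (I - r r^T) @ weight: the layer can no longer write along r.
    """
    return weight - np.outer(direction, direction @ weight)


# Toy data: "refused" activations are shifted along a hidden direction.
rng = np.random.default_rng(0)
d_model = 16
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
answered = rng.normal(size=(200, d_model))
refused = rng.normal(size=(200, d_model)) + 4.0 * true_dir

r = refusal_direction(refused, answered)

# A stand-in for one output projection matrix (hypothetical shape).
W = rng.normal(size=(d_model, 8))
W_abl = ablate_direction(W, r)

# After ablation, the matrix has (numerically) no component along r.
max_leak = float(np.abs(r @ W_abl).max())
```

Real tools apply this across many layers and tune which layers and how strongly to ablate, which is presumably part of why a single recipe does not carry over to every architecture.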

atemerev 6 hours ago

https://github.com/AUGMXNT/deccp - one example for Qwen models. For GLM 5.2, abliteration/realignment works somewhat differently, but with Claude's help, you can finish the job.

I am planning to release a steering patch for GLM 5.2 that eliminates pro-CCP alignment in the next few days.

bwat49 9 hours ago

This seems rather counterproductive: wouldn't a model with fewer cybersecurity capabilities be more likely to produce insecure code? Not to mention, Chinese models don't have these restrictions and can be used to exploit said insecure code.

I suppose I shouldn't be surprised at how the Trump admin is approaching AI regulation; counterproductive is really all they do.

ihsw 7 hours ago

As contradictory as it sounds, they (Anthropic) are probably trying to walk the fine line where their public models can write secure code but cannot exploit insecure code.