eikenberry 10 days ago

Then don't use the cloud-based Chinese providers; use cloud-based US/EU providers running Chinese models. The interesting Chinese models are all open, which makes this issue mostly moot.

daemin 9 days ago

A key point here is that these models are open in the sense that you can download and use them, not open in the sense that you know what data and instructions were fed into them during training.

A paranoid part of me thinks that these models are all inherently biased and instructed to be pro-CCP, with specific gaps in their training data around undesirable historical events and political ideas.

gck1 9 days ago

The same thing applies to US models. Check out the various system prompt leak repos on GitHub. There are also prompt injections from parallel "alignment" models that pre-process the prompt and add questionable guidance before it's sent to the main model.

You'd be surprised how much bias exists in easily extractable information. Now imagine how much of it is introduced during training, where you can't easily extract it.

So this is largely a moot point. Yes, Chinese models will likely have some weird things injected into them. But so do the US models. Do I care? Not in the slightest. Models are my code monkeys, and if the code leaves my machine, I assume the IP is leaked, whether it's a Chinese model that clearly tells me it uses the data or a US model that pinky-promises it doesn't.

therealdrag0 9 days ago

Sure, but that goes both ways. Any dataset has a bias. My coding doesn’t need to know about Tiananmen Square.

viking123 9 days ago

It applies both ways: ask it about Israel.