Avicebron 3 hours ago
I don't think it's a stretch: if you can train/align a model to avoid "hatespeech" or other topics deemed $Unacceptable, you can align a model to favor a certain ideological viewpoint and have that alignment subtly influence the output. How do most Chinese models handle Tiananmen Square or discussions on Han superiority?
margalabargala 2 hours ago
Oh sure, no one said you can't train a model to do this. You certainly can. For the specific case of making software vulnerable to a specific agency, that hasn't been observed yet. Not because it can't be done, but because no one has done it so far. If it were done, it would be easy(ish) to detect, since it would be reproducible.
zozbot234 3 hours ago
> How do most Chinese models handle Tiananmen Square or discussions on Han superiority?

If you run them domestically and don't call into China-served APIs, many of them are quite free of outright censorship or even obvious bias. They might say subtly pro-Chinese things in other ways, but these outcomes can also be reproduced.