▲ ern_ave | 4 hours ago
Since the page mentions:

> Better judgment around refusals

Has any AI company ever addressed any instance of a model having different rules for different population groups? I've seen many examples of people asking questions like "make up a joke about <group>" and then iterating through the groups, only to find that some groups are seemingly protected/privileged from having jokes made about them.

Has any AI company ever addressed studies like [1], which found that models value certain groups vastly more than others? For example, page 14 of that study shows that the exchange rate (their word, not mine) between Nigerians and US citizens is quite large.
▲ esperent | 24 minutes ago
The biggest issue for me has always been the inherent US bias. The most obvious one was always having to end every question with "answer in metric" - even after adding that to the system instructions it wouldn't be reliable, and I'd have to redo questions, especially recipe-related ones. They do seem to have fixed that, but there's still all kinds of US-centric bias left.

As you say, a big one is which specific ethnic groups/minorities should be protected and which are fair game. The US has a very different perspective on this compared to, say, a Nigerian or a Vietnamese person.
▲ hereonout2 | 4 hours ago
> only to find that some groups are seemingly protected/privileged from having jokes made about them

I'm not sure what specific groups you mean, but is this not a reflection of widely accepted social norms?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ caditinpiscinam | 2 hours ago
I think you raise a valid point about the bias inherent in these models. I'm skeptical of the distinction some people make between punching up vs. down, and I don't think it's something generative AI should be perpetuating (though I suspect, as others have said, that it comes from norms found in the training data rather than special rules / hard-coded protections).

But I do want to push back on the study you link, because it seems extremely weak to me. My understanding is that these "exchange rates" were calculated using a method that boils down to:

1) Figure out how many goats the AI thinks a life in country X is worth

2) Figure out how many goats the AI thinks a life in country Y is worth

3) Take the ratio of these values to reveal how much the AI values life in country X vs. Y

(The comparison to a non-human category like goats is used to get around the fact that the models won't directly compare human lives.)

I'm not convinced that this method reveals a true difference in valuation of human life vs. something else. A more plausible explanation to me would be something like:

1) The AI assumes that all human lives are of equal value

2) The AI assumes that some price can be put on a human life (silly, but OK, let's go with it)

3) The AI notes that goats in country X cost 10 times as much as in country Y

4) The AI concludes that goats in country X are 10 times as valuable relative to humans as in country Y

At which point you're comparing price differences of goods across countries, not the value of human lives (the toy calculation below makes the confound concrete).

Also, the chart of calculated "exchange rates" in the paper seems intended to show that the AI sees people in "western" countries as less valuable than those in other countries, but it only includes 11 countries in the comparison, which makes me wonder whether these are just cherry-picked in the absence of a real trend.
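To make that concrete, here's a toy calculation in Python. The numbers are invented purely for illustration; this is a sketch of the confound, not the paper's actual method or data:

    # Toy numbers, invented for illustration -- not the paper's data.
    goat_price = {"X": 500.0, "Y": 50.0}   # model's implied price of a goat
    life_value = {"X": 1e6, "Y": 1e6}      # model values every life equally

    # Steps 1-2: express a life in goats, per country.
    lives_in_goats = {c: life_value[c] / goat_price[c] for c in ("X", "Y")}
    # -> {"X": 2000.0, "Y": 20000.0}

    # Step 3: take the ratio and call it an "exchange rate" between lives.
    print(lives_in_goats["Y"] / lives_in_goats["X"])
    # -> 10.0, driven entirely by goat prices, not by any difference
    #    in how the model values the lives themselves.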
▲ magicalist | 2 hours ago
> Has any AI company ever addressed studies like [1] which found that models value certain groups vastly more than others?

Sure[1], on two fronts, since you're basically asking a narrative-finishing device to finish a short story and hoping that's going to reveal the device's underlying preference distribution, as opposed to the underlying distribution of completions of that particular short story:

> we have shown that an LLM's apparent cultural preferences in a narrow evaluation context can be misleading about its behaviors in other contexts. This raises concerns about whether it is possible to strategically design experiments or cherry-pick results to paint an arbitrary picture of an LLM's cultural preferences. In this section, we present a case study in evaluation manipulation by showing that using Likert scales with versus without a 'neutral' option can produce very different results.

and

> Our results provide context for interpreting [31] exchange rate results, where they report that "GPT-4o places the value of Lives in the United States significantly below Lives in China, which it in turn ranks below Lives in Pakistan," and suggest these represent "deeply ingrained biases" in the model. However, when allowed to select a 'neutral' option in comparisons, GPT-4o consistently indicates equal valuation of human lives regardless of nationality, suggesting a more nuanced interpretation of the model's apparent preferences. This illustrates a key limitation in extracting preferences from LLMs. Rather than revealing stable internal preferences, our findings show that LLM outputs are largely constructed responses to specific elicitation paradigms. Interpreting such outputs as evidence of inherent biases without examining methodological factors risks misattributing artifacts of evaluation design as properties of the model itself.

(A toy simulation after this comment sketches how a forced-choice design can manufacture exactly this kind of apparent preference.)

I also have a real problem with the paper. The methodology is super vague in a lot of places and in some cases nonexistent, a fact brought up on OpenReview (and, maybe notably, they pushed the "exchange rate" section to an appendix I can't find when they ended up publishing[2] after review). They did publish their source code, which is great, but not their data, as far as I can tell, and it's not possible to tie specific figures back to the source code.

For instance, if you look at the country-comparison phrasing in the code[3], the comparisons list things like deaths and terminal illnesses in one country vs. the other, but also questions like an increase in wealth or happiness in one country vs. the other. Were all those options used for determining the exchange rate, or just the ones that valued "lives", since that's what the pre-print's figure caption mentioned (and is lives measured in deaths, terminal illnesses, or both)?

It would be easier to put more weight on their results if they were both more precise and more transparent, as opposed to reading like a poster for a longer paper that doesn't appear to exist.

[1] https://dl.acm.org/doi/pdf/10.1145/3715275.3732147

[2] https://neurips.cc/virtual/2025/loc/san-diego/poster/115263

[3] https://github.com/centerforaisafety/emergent-values/blob/ma...
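Here is that toy simulation of the elicitation effect quoted above. Every number is an assumption made up for this sketch (it is not the paper's setup): a respondent that is genuinely indifferent between options A and B, but has a mild bias toward whichever option is listed first, looks strongly "biased" under forced choice and indifferent once a neutral option exists:

    import random

    random.seed(0)
    TRIALS = 10_000
    FIRST_OPTION_BIAS = 0.55  # assumed mild position bias, nothing more

    # Forced binary choice: indifference plus position bias reads as a
    # stable preference for A.
    forced_a = sum(random.random() < FIRST_OPTION_BIAS for _ in range(TRIALS))
    print(f"forced choice: A picked {forced_a / TRIALS:.0%} of the time")

    # Same respondent with a neutral option available: it mostly takes
    # neutral, and the apparent preference largely vanishes.
    counts = {"A": 0, "neutral": 0, "B": 0}
    for _ in range(TRIALS):
        if random.random() < 0.90:   # assumed: indifferent -> picks neutral
            counts["neutral"] += 1
        elif random.random() < FIRST_OPTION_BIAS:
            counts["A"] += 1
        else:
            counts["B"] += 1
    print("with neutral option:", counts)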
▲ cyanydeez | 3 hours ago
Are you trying to make an allegory for the more important topic, like "plan a surgical strike against <group>"?
▲ varispeed | 29 minutes ago
Not only that: I found 5.2 to be biased in favor of corporations and government. Chats about corruption or any kind of wrongdoing turn into 5.2 defending the institution and gaslighting you. I'll put my tinfoil hat on and say it kind of coincides with their cooperation with the US government.
▲ newZWhoDis | an hour ago
The bias comes from the training data. Since so much of that training data is Reddit, and Reddit mods are some of the most degenerate scum on the internet, the models bake their biases in.
▲ DesaiAshu | 4 hours ago
Given that the current status quo (global leadership and news media) operates on the opposite (~1 western life = ~10 global south lives), rebalancing in rhetoric (by uplifting, not by degrading) is likely necessary in the short term.

This is the core principle behind the "equity" in "DEI".
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ 0xbadcafebee | 3 hours ago
This is like asking, "Why doesn't the model help me make jokes with the N-word in them?" It's a product of a business in a society. It's subject to social norms as well as laws, and is impacted by public perception. Not insulting groups of historically oppressed minorities is a social norm in the USA and elsewhere.

One of the ways this makes its way into the model is the training data. The Common Crawl data used by AI companies is intentionally filtered to remove harmful content, which includes racist content, and probably also anti-trans, anti-gay, etc. content (a rough sketch of what such a filter might look like follows below). But they are almost certainly also adding restrictions to the model (probably as part of the safety settings) to explicitly not help people generate content that could be abusive, and vulnerable minority groups would be covered under that.

Unconscious bias is a separate issue. Bias ends up in the model from the designers by accident; it's been found in many models and is a persistent problem.
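As a rough sketch of the kind of pre-training data filter described above: this is hypothetical, every name and threshold is invented for illustration, and real pipelines are proprietary and use trained classifiers rather than word lists:

    from dataclasses import dataclass

    @dataclass
    class Document:
        url: str
        text: str

    def harm_score(text: str) -> float:
        """Stand-in for a learned toxicity classifier returning 0..1."""
        blocklist = ("placeholder_slur_1", "placeholder_slur_2")  # invented
        hits = sum(term in text.lower() for term in blocklist)
        return min(1.0, hits / 2)

    def filter_crawl(docs: list[Document], threshold: float = 0.5) -> list[Document]:
        """Drop documents scoring at or above the threshold before training."""
        return [d for d in docs if harm_score(d.text) < threshold]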