charcircuit 2 hours ago
> Why should we expect a model to be aligned with human interests, if it has been trained on a myriad instances of humans being degraded and violated?

Understanding more about what exists in the real world, outside of its pile of weights, is separate from alignment. If an AI model learns that it is possible for a house to burn down, that doesn't mean an AI will want to burn down a house.
anematode an hour ago
Context matters; how many of these images in the training data are taken from shock websites, and therefore associated with misanthropic commentary, versus legitimate sources like medical journals or historical pictures? Based on the samples posted by the author, it seems likely to be mostly the former, whereas most discussions of burning a house down (not saying all, of course!) are probably in a neutral or negative context (e.g., news articles describing a crime). "Understanding more about what exists in the real world" is a remarkable euphemism, btw.
paytonjjones an hour ago
Exposure to horrors doesn't imply capability or desire to commit said horrors. But it does seem like kind of a prerequisite. All else being equal, I think I'd prefer my models to be naive about human degradation and torture, for instance. Exceptions made for specialized models used for police work etc. I do think broader alignment is necessary either way but that seems like an extra guardrail it'd be nice to have.
queenkjuul an hour ago
The AI doesn't want or understand anything; it presents a statistically likely output given an input. Including this stuff in the inputs guarantees it is available as an output.