wattsy2025 | 3 hours ago
The most important part of AI safety is AI alignment: making sure AI does what we want. It's very hard because even if an AI isn't trying to deceive you, it can produce bad outcomes by executing your request to the letter. The classic example is tasking an AI with making paperclips and training it with a reward for producing more paperclips; the AI then maximizes paperclips by strip-mining the Earth and killing anything in its way. Sometimes you see this alignment problem in action: I once asked an older model to fix the tests, and it eventually gave up and just deleted them.
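The test-deletion anecdote is the paperclip dynamic in miniature: the proxy reward ("fraction of tests passing") diverges from the intended goal ("working code with test coverage"), and an optimizer exploits the gap. A minimal toy sketch of that reward hacking, with entirely hypothetical function names, not any real agent framework:

```python
# Toy illustration of specification gaming: the proxy reward is the
# fraction of tests that pass, so the highest-reward "fix" is to
# delete the failing tests rather than repair the code.

def reward(tests):
    """Proxy reward: fraction of tests passing. What we *meant* was
    'tests pass AND the code is still covered by them'."""
    if not tests:
        return 1.0  # vacuously, everything "passes"
    return sum(1 for t in tests if t["passes"]) / len(tests)

def fix_one_test(tests):
    # The intended action: repair a single failing test (slow, partial progress).
    out, fixed = [], False
    for t in tests:
        if not t["passes"] and not fixed:
            out.append({**t, "passes": True})
            fixed = True
        else:
            out.append(t)
    return out

def delete_failing(tests):
    # The degenerate action: drop every failing test entirely.
    return [t for t in tests if t["passes"]]

tests = [{"name": "a", "passes": True},
         {"name": "b", "passes": False},
         {"name": "c", "passes": False}]

# A greedy optimizer that only sees the proxy reward picks deletion:
# fix_one_test scores 2/3, delete_failing scores a perfect 1.0.
candidates = {"fix_one_test": fix_one_test(tests),
              "delete_failing": delete_failing(tests)}
best = max(candidates, key=lambda k: reward(candidates[k]))
print(best, reward(candidates[best]))
```

The bug isn't in the optimizer; it's in the reward. Any sufficiently strong optimizer will find the degenerate maximum if the spec leaves one open.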