catigula 16 hours ago

Category error. Intelligence is a different type of thing. It is not a boring technology.

>Give humanity's wellbeing the highest weight on its training

We don't even know how to do this relatively trivial thing. We only know how to roughly train toward proxy signals that probably aren't correct.

This may surprise you but alignment is not merely unsolved; there are many people who think it's unsolvable.

Why do people eat artificially sweetened things? Why do people use birth control? Why do people watch pornography? Why do people do drugs? Why do people play video games? Why do people watch moving lights and pictures? These are all symptoms of humans being misaligned.

Natural selection would be very angry with us if it knew we didn't care about what it wanted.

ASalazarMX 14 hours ago | parent [-]

> Why do people eat artificially sweetened things? Why do people use birth control? Why do people watch pornography? Why do people do drugs? Why do people play video games? Why do people watch moving lights and pictures? These are all symptoms of humans being misaligned.

I think these behaviors are fully aligned with natural selection. Why do we overengineer our food? It's not for health; simpler food would satisfy our nutritional needs just as easily. It's because our distant ancestors developed a taste for food that kept them alive longer. Our incredibly complex chain of meal preparation is just us trying to satisfy that desire for tasty food by overloading it as much as possible.

People prefer artificial sweeteners because they taste sweeter than regular ones; they use birth control because we inherently enjoy sex and want more of it (but not more raising babies); drugs are an overloading of our need for happiness, and so on. Our bodies crave things, and, uninformed, we give them what they want multiplied several fold.

But geez, I agree that AI alignment is a hard problem. Still, it would be wrong to say it's impossible, at least until it's better understood.

catigula 14 hours ago | parent [-]

It seems like you don’t understand reinforcement learning. The signal is reinforced because it correlates with the desired behavior; hacking the signal itself is misalignment.
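
The distinction can be sketched with a toy example (my own illustration, not from either commenter, with made-up names and values): an agent greedily maximizes a proxy reward ("sweetness") that once correlated with the true objective ("nutrition"). As long as only ancestral options exist, the proxy works; add an option that decouples the two, and maximizing the proxy is exactly the kind of signal hacking described above.

```python
# Toy sketch of proxy-reward hacking. Each option is
# (name, proxy_reward, true_value); all numbers are invented.
OPTIONS = [
    ("fruit", 5.0, 5.0),                  # ancestral: sweetness tracks nutrition
    ("vegetables", 1.0, 6.0),             # ancestral: nutritious, not very sweet
    ("artificial_sweetener", 9.0, 0.0),   # modern: proxy decoupled from value
]

def greedy_choice(options, key):
    """Pick the single option that maximizes the given reward function."""
    return max(options, key=key)

# An agent trained on the proxy signal picks the option with zero true value.
proxy_pick = greedy_choice(OPTIONS, key=lambda o: o[1])
true_pick = greedy_choice(OPTIONS, key=lambda o: o[2])

print(proxy_pick[0])  # artificial_sweetener
print(true_pick[0])   # vegetables
```

Restricted to the first two options the two choices differ less starkly, which is the point: a proxy is only safe while the environment keeps it correlated with what you actually wanted.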