ChadNauseam 4 hours ago

Sometimes people say that they don't understand something just to emphasize how much they disagree with it. I'm going to assume that's not what you're doing here, so I'll lay out the chain of reasoning. Step one: some beings are able to do "more things" than others. For example, if humans wanted bats to go extinct, we could probably make it happen. If any quantity of bats wanted humans to go extinct, they definitely could not make it happen. So humans are more powerful than bats.

The reason humans are more powerful isn't because we have lasers or anything, it's because we're smart. And we're smart in a somewhat general way. You know, we can build a rocket that lets us go to the moon, even though we didn't evolve to be good at building rockets.

Now imagine that there was an entity that was much smarter than humans. Stands to reason it might be more powerful than humans as well. Now imagine that it has a "want" to do something that does not require keeping humans alive, and that alive humans might get in its way. You might think that any of these are extremely unlikely to happen, but I think everyone should agree that if they were to happen, it would be a dangerous situation for humans.

In some ways, it seems like we're getting close to this. I can ask Claude to do something, and it kind of acts as if it wants to do it. For example, I can ask it to fix a bug, and it will take steps that could reasonably be expected to get it closer to solving the bug, like adding print statements and things of that nature. And then most of the time, it does actually find the bug by doing this. But sometimes it seems like what Claude wants to do is not exactly what I told it to do. And that is somewhat concerning to me.

mrob 2 hours ago | parent | next

Not just bats. I'm pretty sure humans are already capable of driving any species we want to extinction, even cockroaches or microbes. It's a political problem, not a technical one. I'm not even a superintelligence, and I've got a good idea what would happen if we dedicated 100% of our resources to an enormous mega-project of pumping nitrous oxide into the atmosphere. N2O's 20-year global warming potential is 273 times that of carbon dioxide, and the raw materials are just air and energy. Get all our best chemical engineers working on it, turn all our steel into chemical plants, burn through all our fissionables to power it. Safety doesn't matter. The beauty of this plan is that the effects continue compounding even after it kills all the maintenance engineers, so we'll definitely get all of them. Venus 2.0 is within our grasp.
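That GWP figure turns into a back-of-envelope CO2-equivalence calculation. A minimal sketch, assuming the factor of 273 from the comment (close to the IPCC AR6 value) and a purely hypothetical emission tonnage:

```python
# Back-of-envelope CO2-equivalence for N2O, using GWP-20 ~= 273
# (the comment's figure; close to the IPCC AR6 value).

GWP20_N2O = 273  # tonnes of CO2-equivalent per tonne of N2O, 20-year horizon

def co2_equivalent_tonnes(n2o_tonnes: float, gwp: float = GWP20_N2O) -> float:
    """Convert a mass of N2O into its CO2-equivalent over the GWP horizon."""
    return n2o_tonnes * gwp

# A hypothetical megatonne of N2O emitted:
print(co2_equivalent_tonnes(1e6))  # 273000000.0, i.e. 273 Mt CO2e
```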

Of course, we won't survive the process, but the task didn't mention collateral damage. As an optimization problem, it would be a great success. A real ASI would probably have better ideas. And remember, every prediction problem is solved more reliably with all life dead. Tomorrow's stock market numbers are trivially predictable when there's zero trade.

9dev 4 hours ago | parent | prev

> Now imagine that it has a "want" to do something that does not require keeping humans alive […]

This belligerent take is very human, though. We just don't know how an alien intelligence would reason or what it would want. It could equally well be pacifist in nature, whereas we typically conquer and destroy anything we come into contact with. Extrapolating from our own behavior to the conclusion that an AGI would do the same isn't reasonable.

mofeien 2 hours ago | parent | next

There are some basic reasoning steps about the environment we live in that apply not only to humans but also to other animals and, generally, to any goal-driven being. Such as: "an agent is more likely to achieve its goal if it keeps on existing", or "in order to keep existing, it's beneficial to understand what other acting beings want and are capable of", or "in order to keep existing, it's beneficial to be cute/persuasive/powerful/ruthless", or "in order to reach its goals more effectively, it is beneficial for an agent to learn the rules governing the environment it acts in".

Some of these statements derive from the dynamics of the particular environment we're living in, such as the fact that we're acting beings competing for scarce resources. Others follow even more straightforwardly from logic, such as that you have more options for agency if you stay alive/turned on.

These subgoals are called instrumental goals: they are useful for most, if not all, terminal goals an agentic being might have. Therefore any agent that is trained to achieve a wide variety of goals within this environment will likely optimize itself toward some or all of them, no matter which outer optimization process trained it, be it evolution, selective breeding of cute puppies, or RLHF.
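A minimal sketch of how "keep existing" falls out of plain return maximization, using a hypothetical toy comparison (all numbers are illustrative, not from any real experiment):

```python
# Toy illustration of an instrumental subgoal: a return-maximizing agent
# compares "allow shutdown after a few steps" against "pay a small cost
# to disable the off-switch and keep earning task reward".

GAMMA = 0.9        # discount factor
TASK_REWARD = 1.0  # reward per step of working toward the terminal goal

def discounted_return(reward_per_step: float, steps: int, gamma: float = GAMMA) -> float:
    """Sum of gamma**t * reward for t in [0, steps)."""
    return sum(reward_per_step * gamma**t for t in range(steps))

# Option A: allow shutdown after 3 steps -- the reward stream ends there.
allow = discounted_return(TASK_REWARD, 3)

# Option B: disable the off-switch (one-time cost of 0.5), run for 100 steps.
disable = -0.5 + discounted_return(TASK_REWARD, 100)

print(f"allow shutdown: {allow:.2f}")   # 2.71
print(f"disable switch: {disable:.2f}")  # 9.50
# Nothing in the reward function mentions survival, yet the maximizing
# choice is to stay on: survival emerges as an instrumental subgoal.
assert disable > allow
```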

And LLMs already show these self-preserving behaviors in experiments, where they resist being turned off and, e.g., attempt to blackmail humans.

Compare these generally agentic beings with, e.g., a chess engine like Stockfish, which is trained/optimized as a narrow AI in a very different environment. It also strives for the survival of its pieces to further its goal of maximizing winning percentage, but the inner optimization is less apparent than with LLMs, where you can read the chain-of-thought reasoning about the environment.

The AGI may very well have pacifistic values, or it may not, or it may pursue a terminal goal for which human existence is irrelevant or even a hindrance. What can be said is that once an AGI has a human or superhuman level of understanding of its environment, it will converge on these instrumental subgoals too, and pursue them as needed.

And then, some people think that most of the optimal paths toward whatever terminal goal the AI might have don't contain any humans, or much of what humans value. They conclude that it's important to solve the AI alignment problem first, aligning the AI with our values before developing its capabilities further, or else it will likely kill everyone and destroy everything you love and value in this universe.

hugh-avherald 4 hours ago | parent | prev

A conquering alien civilization is more likely to be encountered than a pacifist one, if they otherwise have the same level of intelligence etc.