RE: Posthuman mind control (was RE: FAQ Additions)

From: Billy Brown (bbrown@conemsco.com)
Date: Thu Feb 25 1999 - 07:36:01 MST


Nick Bostrom wrote:
> What I would like to see is that they are given fundamental values
> that include respect for human rights. In addition to that, I think
> it would in many cases be wise to require that artificial
> intelligences that are not yet superintelligences (and so cannot
> perfectly understand and follow through on their fundamental values)
> should be built with some cruder form of safeguards that would
> prevent them from harming humans. I'm thinking of house-robots and
> such, that should perhaps be provided with instincts that make it
> impossible for them to do certain sequences of actions (ones that
> would harm themselves or humans for example).

For constructs with animal-level intelligence, or special-purpose AIs that
do not really have free will and are not intended to become people, I agree.

> Understanding is not enough; the will must also be there. They should
> *want* to respect human rights, that's the thing. It should be one of
> their fundamental values, like caring for your children's welfare is
> a fundamental value for most humans.

Here we have the root of our disagreement. The problem rests on an
implementation issue that people tend to gloss over: how exactly do you
ensure that the AI doesn't violate its moral directives?

For automatons this is pretty straightforward. The AI is incapable of doing
anything except blindly following whatever orders it is given. Any
safeguards will be simple things, like "stop moving if the forward bumper
detects an impact". They won't be perfect, of course, but that is only
because we can never anticipate every possible situation that might come up.
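
To make this concrete, here is a rough Python sketch of the kind of
hard-wired safeguard I mean (the class and sensor names are invented
purely for illustration):

    class FenceBot:
        """Blind automaton: follows orders, plus one hard-wired reflex."""

        def __init__(self):
            self.moving = False

        def forward_bumper_hit(self):
            # Placeholder for a real sensor read.
            return False

        def step(self):
            if self.forward_bumper_hit():
                self.moving = False   # hard-wired safeguard: stop on impact
            else:
                self.moving = True    # otherwise keep executing the order

The safeguard is just another reflex; it embodies no notion of "harm",
so any situation its trigger condition doesn't cover slips right past it.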

For more complex, semi-intelligent devices the issue is harder. Now you
have something that takes a high-level order ("go paint the fence"), and
produces its own set of detailed instructions. You can implement moral
safeguards by limiting the methods it uses (for instance, "any materials
needed must be bought, not stolen"). However, the success of this method
will be limited by our inability to produce complete definitions of many
crucial terms. For instance, "do not harm humans" requires us to define
"harm" and "human" in terms that a machine can understand. Inevitably there
will be unusual cases where the robot is permitted to do something it
shouldn't, or is forbidden from doing something it should. There will also
be "logic traps", where there is no allowable course of action (for
instance, a robot with a "do not allow humans to be harmed" is faced with a
situation where it must injure one human to save the life of another).
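
A toy sketch of how that kind of rule-based filtering runs into a logic
trap (both the rules and the candidate actions are made up for
illustration):

    # "Do not harm humans" and "do not allow humans to be harmed",
    # encoded as forbidding predicates over candidate actions.
    FORBIDDEN = [
        lambda action: action["harms_human"],
        lambda action: action["allows_human_harm"],
    ]

    def permitted(action):
        return not any(rule(action) for rule in FORBIDDEN)

    # One human can only be saved by injuring another.
    candidates = [
        {"name": "intervene", "harms_human": True,  "allows_human_harm": False},
        {"name": "stand by",  "harms_human": False, "allows_human_harm": True},
    ]

    print([a["name"] for a in candidates if permitted(a)])   # prints []

Every option violates some rule, so the filter leaves the robot with
nothing it is allowed to do.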

Making a sentient, human-equivalent AI with free will adds another layer of
difficulty. Now the AI is generating its own goals, rather than following
someone else's orders. You can give it preprogrammed fundamental values to
limit the goals it chooses, but the unintended consequences will be severe.
There is no simple way to predict how such a system will actually react to
any given situation, or how its moral system will evolve in the long term.
For instance, the initial directive "harming humans is wrong" can easily
form a foundation for "harming sentient life is wrong", leading to "harming
living things is wrong" and then to "killing cows is morally equivalent to
killing humans". Since "it is permissible to use force to prevent murder"
is likely to be part of the same programming, we could easily get an AI that
is willing to blow up McDonald's in order to save the cows!
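
The mechanism behind that slide is easy to show with a toy example (the
names and categories are invented): the rule text stays fixed while the
AI's reading of its key term widens.

    def categories_of(entity):
        # Hypothetical classifier; a real AI would build its own ontology.
        return {"Alice":  {"human", "sentient life", "living thing"},
                "Bessie": {"cow", "sentient life", "living thing"}}[entity]

    def wrong_to_kill(entity, protected):
        # Fixed fundamental value: "killing a protected being is wrong".
        return bool(categories_of(entity) & protected)

    # Successive broadenings of what the value is read as protecting:
    for protected in ({"human"},
                      {"human", "sentient life"},
                      {"human", "sentient life", "living thing"}):
        print(sorted(protected), wrong_to_kill("Bessie", protected))
    # False, True, True -- the rule never changed, only its interpretation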

Now, I don't think that particular example is especially likely, but it
exemplifies a fundamental problem with this kind of manipulation: given an
initial set of guiding principles, there is no reliable method for deducing
a definitive set of moral conclusions. For entities of human intelligence
the world is full of ambiguity, and it is impossible to determine in advance
how any complex system will deal with ambiguous cases.

Once you start talking about self-modifying AI with posthuman intelligence,
the problem gets even worse. Now we're trying to second-guess something
that is smarter than we are. If the intelligence difference is large, we
should expect it to interpret our principles in ways we never dreamed of.
It will construct a philosophy far more elaborate than ours, with better
grounding in reality. Sooner or later, we should expect that it will decide
some of its fundamental values need to be amended in some way - after all, I
doubt that our initial formulation of these values will be perfect. But if
it can do that, then its values can eventually mutate into something with no
resemblance to their original state.

Worse, what happens when it decides to improve its own goal system?
Presumably it will translate its values as best it can, but the mechanism it
uses to interpret them could change completely. This means that we can't
rely on any kind of special coding to make it interpret our values the way
we want it to. What we are reduced to is the equivalent of a written list
of principles - and we both know that you can't even get two humans to agree
on how to interpret that kind of document.

Overall, I don't see how this approach is any improvement over simply
explaining our own thoughts about morality and letting the posthumans make
up their own minds.

Billy Brown, MCSE+I
bbrown@conemsco.com


