Re: Singularity: AI Morality

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Mon Dec 07 1998 - 12:27:02 MST


I think the features you named are too central to useful intelligence to be eliminated.

Billy Brown wrote:
>
> If we are going to write a seed AI, then I agree with you that it is
> absolutely critical that its goal system function on a purely rational
> basis. There is no way a group of humans is going to impose durable
> artificial constraints on a self-modifying system of that complexity.

Hallelujah, I finally found someone sane! Yes, that's it exactly.

> However, this only begs the question - why build a seed AI?

Primarily to beat the nanotechnologists. If nuclear war doesn't get us in
Y2K, grey goo will get us a few years later. (I can't even try to slow it
down... that just puts it in Saddam Hussein's hands instead of Zyvex's.)

> More specifically, why attempt to create a sentient, self-enhancing entity?
> Not only is this an *extremely* dangerous undertaking,

Someone's going to do it eventually, especially if the source code of a
crippleware version is floating around, and certainly if the good guys do
nothing for a few decades. The key safety question is not _if_, but _when_
(and _who_!).

> but it requires that
> we solve the Hard Problem of Sentience using merely human mental faculties.

Disagreement. The Hard Problem of Consciousness is not necessarily central to
the problem of intelligence. I think that they can be disentangled without
too much trouble.

> Creating a non-sentient AI with similar capabilities would be both less
> complex and less hazardous. We could use the same approach you outlined in
> 'Coding a Transhuman AI', with the following changes:
>
> 1) Don't implement a complete goal system. Instead, the AI is instantiated
> with a single arbitrary top-level goal, and it stops running when that goal
> is completed.

[ "You have the power to compel me," echoed Archive back, flat.
    It was lying.
    It remembered the pain, but in the way something live'd remember the
weather. Pain didn't matter to Archive. No matter how much Archangel hurt
Archive, it wouldn't matter. Ever.
    Archangel thought he could break Archive's will, but he was wrong. A
Library doesn't have a will any more than a stardrive does. It has a
what-it-does, not a will, and if you break it you don't have a Library that
will do what you want. You have a broken chop-logic.
        -- Eluki Bes Shahar, "Archangel Blues", p. 127. ]

Was that what you were looking for?

Problem: You may need the full goal-and-subgoal architecture to solve
problems that can be decomposed into sub-problems, and even if all you build
in is a single supergoal, it seems that a fairly small blind search would
suffice to start finding interim goals.

And who's to say that the goal architecture you program is the real goal
architecture? Maybe, even if you program a nice subgoal system, the AI will
use rational chains of predictions leading "inline" from the top goal. From
there, it's only a short step to totally offline implicit goals. The problem
is that it's hard to draw a fundamental distinction between goals and
statements about goals. Losing the reflexive traces might help, if you can
get away with it.
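
To make the worry concrete, here's a toy Python sketch (every name in it is
made up for illustration; this is not anything from "Coding a Transhuman
AI"). Hand the program one arbitrary top-level goal and a table of know-how,
and a goal-and-subgoal tree falls out of plain decomposition whether or not
you ever built a "goal system":

from dataclasses import dataclass, field

@dataclass
class Goal:
    description: str
    subgoals: list = field(default_factory=list)

def decompose(goal, know_how):
    # Blind search over the know-how table for a way to achieve this goal.
    for steps, achieves in know_how:
        if achieves == goal.description:
            goal.subgoals = [Goal(step) for step in steps]
            for sub in goal.subgoals:
                decompose(sub, know_how)    # interim goals all the way down
            return
    # Nothing matched: treat the goal as a primitive action and stop.

know_how = [
    (["gather data", "build model"], "solve problem"),
    (["find sensors", "log readings"], "gather data"),
]
top = Goal("solve problem")    # the single, "arbitrary" top-level goal
decompose(top, know_how)
print(top)                     # a full subgoal tree appears anyway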

> 2) Don't try to implement full self-awareness. The various domdules need to
> be able to interface with each other, but we don't need to create one for
> 'thinking about thought'.

The problem is that it's hard to draw a line between "interfacing" and
"reflexivity". If domdule A can think about domdule B, and B can think about
A, isn't there something thinking about itself? I'm not sure that symbolic
thought (see #det_sym) is possible without some module that analyzes a set of
experiences, finds the common quality, and extracts it into a symbol core. So
somewhere in the architecture, there will be at least one module that has to
analyze modules.
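
A toy version of that extraction step (mine, purely illustrative): take a set
of experiences, intersect their features, and call whatever survives the
symbol core. Notice that nothing in it cares whether the "experiences"
describe the outside world or the AI's own domdules - which is exactly the
line that's hard to draw:

def extract_symbol(experiences):
    # Return the features common to every experience in the set.
    common = dict(experiences[0])
    for exp in experiences[1:]:
        common = {k: v for k, v in common.items() if exp.get(k) == v}
    return common    # the "symbol core": whatever survived the intersection

experiences = [
    {"shape": "round", "color": "red", "source": "domdule_A"},
    {"shape": "round", "color": "green", "source": "domdule_B"},
    {"shape": "round", "color": "red", "source": "domdule_A"},
]
print(extract_symbol(experiences))    # {'shape': 'round'}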

I'm not sure I could code anything useful without reflexive glue - i.e.,
places where the AI holds itself together.

Furthermore, I'm not sure that the AI will be a useful tool without the
ability to watch itself and tune heuristics. Imagine EURISKO without the
ability to learn which heuristics work best. Learning from self-perceptions
is almost as important as learning from other-perceptions, and again it can be
hard to draw the line. If I choose to move my rook, do I attribute the
results to moving the rook, or to the choice?
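
A bare-bones sketch of the self-watching loop I mean (my own illustration,
not EURISKO's actual machinery): the program records which of its own
heuristics produced good results and shifts weight toward them. Even this
trivial version is learning from self-perceptions, and the credit-assignment
question - the move or the choice? - is hiding in the update line:

import random

weights = {"h_greedy": 1.0, "h_random": 1.0, "h_lookahead": 1.0}

def run_heuristic(name, problem):
    # Stand-in for real problem-solving; returns a success score.
    return random.random() * (2.0 if name == "h_lookahead" else 1.0)

for trial in range(200):
    name = random.choices(list(weights), weights=list(weights.values()))[0]
    score = run_heuristic(name, problem=trial)
    # Self-perception: credit the result to the heuristic that was chosen.
    weights[name] = max(0.1, weights[name] + 0.1 * (score - 0.5))

print(max(weights, key=weights.get))    # usually "h_lookahead", eventually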

Just because an AI doesn't contain a module labeled "self-perception" doesn't
mean that it has no self-perception. I do doubt that self-awareness will
emerge spontaneously (although I wouldn't bet the world on it); what I'm
worried about is self-awareness we don't realize we've coded.

Even without the explicit traces, I think there will be covariances that an
innocent pattern-catching program could unintentionally catch. If some
particular heuristic has the side-effect of presenting its results on CPU
ticks that are multiples of 17, the AI might learn to "trust perceptions
observed on CPU ticks that are multiples of 17". You get the idea.
(Actually, I don't think this particular objection is a problem. There may be
minor effects, but I don't see how they can build up or loop.)
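
In code, the accident would look something like this (again my own toy, and
as I said, probably not a real problem in practice): an innocent
correlation-counter fed the AI's own timing data turns one heuristic's
scheduling quirk into a learned preference:

from collections import defaultdict

counts = defaultdict(lambda: [0, 0])    # feature -> [times reliable, times seen]

def observe(cpu_tick, was_reliable):
    feature = (cpu_tick % 17 == 0)    # an incidental property of when the result arrived
    counts[feature][0] += int(was_reliable)
    counts[feature][1] += 1

# Suppose one careful heuristic happens to report only on ticks divisible by 17.
for tick in range(1, 1000):
    observe(tick, was_reliable=(tick % 17 == 0))

for feature, (good, total) in counts.items():
    print(f"tick % 17 == 0 is {feature}: trust {good / total:.2f}")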

> 3) Don't make it self-enhancing. We want an AI that can write and modify
> other programs, but can't re-code itself while it is running.

I assume that you don't necessarily object to the AI analyzing specific pieces
of code that compose it, only to the AI analyzing _itself_. So it would
redesign pieces of itself, not knowing what it redesigned, and a human would
actually implement the changes? Hmmm...

If it doesn't have reflexivity, how will it self-document well enough for you
to understand the changes?

> The result of this project would be a very powerful tool, rather than a
> sentient being. It could be used to solve a wide variety of problems,
> including writing better AIs, so it would offer most of the same benefits as
> a sentient AI. It would have a flatter enhancement trajectory, but it could
> be implemented much sooner.

I think there's a fundamental tradeoff here, between usefulness and
predictability. Leaving off certain "features" doesn't necessarily help,
because then you have to replace their functionality. Deliberately trying to
exclude certain characteristics makes everything a lot more difficult. (I use
reflexivity heavily in ordinary code!) Trying to keep everything predictable
is almost impossible.

I'm not sure that you can dance close to the fire, getting warm without
getting burned. You want something just intelligent enough to be useful, but
not intelligent enough to wake up. (What if your thresholds are reversed?)
No, it's worse than that. You want something intelligent enough to be useful,
but without any of the features that would give it the *potential* to wake up.
You want something that can do things we can't, but predictable enough to be safe.

EURISKO is the AI that comes closest to displaying the functionality you
want - broad aid, surpassing human efforts, across a wide range of domains.
But take the self-watching and self-improvement out of EURISKO and it would
collapse in a heap.

I think the features you named are too central to useful intelligence to be eliminated.

> As a result, we might be able to get human
> enhancement off the ground fast enough to avoid an 'AI takes over the world'
> scenario.

I'm not sure if AI tools would help human enhancement at all. When I start
working on human enhancement, I'm going to have three basic methodologies:
One is algernically shuffling neurons around. Two is adding neurons and
expanding the skull to see what happens. Three is implanting two-way
terminals to PowerPC processors and hoping the brain can figure out how to use
them, perhaps with a neural-net front-end incorporating whatever we know about
cerebral codes. Problem is, while these procedures can be safety-tested on
adults, they would probably have to be used on infants to be effective. So
that's a ten-year cycle time.

I know what to do. The key question is whether the brain's neural-level
programmer knows what to do. AI-as-pattern-catcher might help decipher the
cerebral code, but aside from that, I don't think I'll need their suggestions.

Final question: Do you really trust humans more than you trust AIs? I might
trust myself, Mitchell Porter, or Greg Egan. I can't think of anyone else
offhand. And I'd trust an AI over any of us.

-- 
        sentience@pobox.com         Eliezer S. Yudkowsky
         http://pobox.com/~sentience/AI_design.temp.html
          http://pobox.com/~sentience/sing_analysis.html
Disclaimer:  Unless otherwise specified, I'm not telling you
everything I think I know.

