From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Mon Sep 18 2000 - 22:42:03 MDT
Matt Gingell wrote:
>
> You seem to think there's a dense, more-or-less coherent, domain of
> attraction with a happily-ever-after stable point in the middle. A
> blotch of white paint with some grey bleed around the edges. If we
> start close to the center, our AI can orbit around as much as it
> likes but never escape into not-nice space. The boundaries are fuzzy,
> but that's unimportant: I can know the Empire State Building is a
> skyscraper without knowing exactly how many floors it has, or having
> any good reason for believing a twelve story apartment building
> isn't. So long as we're careful to start somewhere unambiguous, we
> don't have to worry about formal definitions or about proving that
> nothing nasty is going to happen.
>
> Eugene, on the other hand, seems to think this is not the case.
> Rather, the friendly and unfriendly subspaces are embedded in each
> other, and a single chaotic step in the wrong direction shoots a
> thread off into unpredictability. More like intertwined balls of
> turbulent Cantor-String, knotted together at infinitely many
> meta-stable catastrophe points, than like the comforting whirlpool of
> nice brains designing ever nicer ones. Your final destination is
> still a function of where you start, but it's so sensitive to initial
> conditions that it depends more on the floating point rounding model you
> used on your first build than it does on anything you actually thought
> about.
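For concreteness, here is a toy Python sketch of the kind of sensitivity Matt
is describing; it's purely illustrative, and the logistic map is just a
stand-in for "some nonlinear update rule," not a model of any actual AI
trajectory:

    # Two trajectories of a chaotic map, started a floating-point-rounding's
    # width apart. The gap roughly doubles each step, so after a few dozen
    # iterations the endpoint says more about the perturbation than about
    # where you meant to start.
    def logistic(x, r=4.0):
        return r * x * (1.0 - x)

    x, y = 0.4, 0.4 + 1e-15
    for step in range(1, 61):
        x, y = logistic(x), logistic(y)
        if step % 10 == 0:
            print("step %2d: gap = %.3e" % (step, abs(x - y)))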
Let's distinguish between trajectories of intelligence and cognitive
architecture, and trajectories of motives. My basic point about motivations
is that the center of Friendly Space is both reasonably compact and
referenceable by a reasonably intelligent human; it's something that seems
possible, at least in theory, for us to point to and say: "Look where I'm
pointing, not at my finger."
We can't specify cognitive architectures; lately I've been thinking that even
my underlying quantitative definition of goals and subgoals may not be
something that should be treated as intrinsic to the definition of
friendliness. But suppose we build a Friendliness Seeker that starts out
somewhere in Friendly Space, and whose purpose is to stick around in Friendly
Space regardless of changes in cognitive architecture. We tell the AI up
front and honestly that we don't know everything, but we do want the AI to be
friendly. What force would knock such an AI out of Friendly Space?
The rule of thumb I use is that there'll be at least one thing I don't know
about, and therefore I have to come up with a *simple* underlying strategy, one
that looks like it would have handled the things I do know about even if I
hadn't thought of them in advance.
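If it helps to see the shape of that strategy, here is a deliberately crude
Python sketch. Every name in it is invented for illustration; the only point
is structural: a single simple rule that doesn't depend on my having
anticipated each specific failure in advance.

    from dataclasses import dataclass

    @dataclass
    class Verdict:
        confidently_friendly: bool = False
        confidently_unfriendly: bool = False

    def consider_change(change, evaluate, ask_programmers):
        # One simple rule: if the current model of Friendly Space can't
        # vouch for a change, don't adopt it on your own authority.
        verdict = evaluate(change)
        if verdict.confidently_friendly:
            return "adopt"
        if verdict.confidently_unfriendly:
            return "reject"
        return ask_programmers(change)   # the unknown unknowns land here

    # With stand-in evaluators, an ambiguous change gets deferred:
    print(consider_change("rewrite goal subsystem",
                          evaluate=lambda c: Verdict(),
                          ask_programmers=lambda c: "defer"))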
> The latter seems more plausible, my being a pessimist, the trajectory
> to hell being paved with locally-good intentions, etc.
Matt Gingell makes my quotes file yet again. "The trajectory to hell is paved
with locally-good intentions."
The underlying character of intelligence strikes me as being pretty
convergent. I don't worry about nonconvergent behaviors; I worry that
behaviors will converge to somewhere I didn't expect.
> > Evolution is the degenerate case of intelligent design in which intelligence
> > equals zero. If I happen to have a seed AI lying around, why should it be
> > testing millions of unintelligent mutations when it could be testing millions
> > of intelligent mutations?
>
> Intelligent design without intelligence is exactly what makes
> evolution such an interesting bootstrap: It doesn't beg the question
> of how to build an intelligent machine by assuming you happen to have
> one lying around already.
Intelligence isn't a binary quality. You use whatever intelligence you have
lying around, even if it's just a little tiny bit. Just being able to mutate
descriptors and reversible features is a step above mutating the raw data, and
that doesn't require any intelligence above the level of sensory modalities.
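A toy contrast, mine rather than anything from the earlier exchange: mutate a
raw pixel row versus mutate a descriptor that generates the same row.
Mutations at the descriptor level always yield another coherent shape;
mutations of the raw data mostly yield speckle.

    import random

    def render(descriptor, width=16):
        # descriptor = (start, end, brightness): "a bar from start to end"
        start, end, value = descriptor
        return [value if start <= i < end else 0 for i in range(width)]

    def mutate_raw(pixels):
        # Flip one pixel to a random value: usually just noise.
        p = list(pixels)
        p[random.randrange(len(p))] = random.randint(0, 255)
        return p

    def mutate_descriptor(descriptor):
        # Nudge one parameter: the result is still a coherent bar,
        # just shifted, resized, or re-shaded.
        start, end, value = descriptor
        which = random.choice(("start", "end", "value"))
        if which == "start":
            start = max(0, start - 1)
        elif which == "end":
            end = end + 1
        else:
            value = max(0, min(255, value + random.choice((-16, 16))))
        return (start, end, value)

    bar = (4, 10, 200)
    print(render(mutate_descriptor(bar)))   # another recognizable bar
    print(mutate_raw(render(bar)))          # the same bar with a random speckle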
> Eugene is talking, I think, about parasitic memes and mental illness
> here, not space invaders.
If you're asking whether the Sysop will be subject to spontaneously arising
computer viruses, my answer is that the probability can be reduced to an
arbitrarily low level, just as the probability of thermal vibration errors can
be in nanocomputers.
Remember the analogy between human programmers and blind painters. Our
pixel-by-pixel creations can be drastically changed by the mutation of a
single pixel; the Mona Lisa, by contrast, would remain the Mona Lisa. I think
that viruses
are strictly problems of the human level.
I think that parasitic memes and mental illness are definitely problems of the
human level.
If you're asking how a human can design motivations for a seed AI so that the
seed AI grows into a Sysop which is not evilly exploitable by a formerly-human
superintelligence running in a Java sandbox inside the Sysop, the best answer
I can give is that we should politely ask the Sysop not to be evilly
exploitable. There are some issues that are simply beyond what the Sysop
Programmers can reasonably be expected to handle in detail. This doesn't mean
the issues aren't solvable.
> But it's rather like thinking a
> friendly, symbiotic strain of bacteria is likely to eventually
> evolve into friendly people. The first few steps might preserve
> something of our initial intent, but none of the well-meaning
> intermediates is going to have any more luck anticipating the
> behavior of a qualitatively better brain than you or I.
It depends on the extent to which the behavior of a qualitatively better brain
impacts motives, and the extent to which the changes are transmitted from
generation to generation. If the AI's motives are defined only in terms of
inertia, with no attempt at external-reference semantics, then changes will
build up. If the AI's motives are defined by pointing to Friendly Space, so
that the AI knows even in the early stages that there is such a thing as a
"wrong" motive, then the behaviors of better brains should hopefully affect
planning more than outcomes. Eventually the AI reaches the level of
superintelligence necessary to perceive Friendly Space cleanly and clearly and
to pick out a really good, solid spot inside it.
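To make the distinction concrete, here is an illustrative Python sketch; the
class names are invented, not anything from this exchange. Under "inertia"
semantics the current goal content simply is the goal, so a correction has
nothing to act on; under external-reference semantics the content is a
provisional approximation to something outside the AI, so a correction
actually propagates.

    class InertiaGoal:
        def __init__(self, content):
            self.content = content            # whatever it is now *is* the goal
        def receive_correction(self, correction):
            pass                              # no concept of a "wrong" motive

    class ExternalReferenceGoal:
        def __init__(self, approximation):
            self.approximation = approximation   # best current guess at the referent
        def receive_correction(self, correction):
            # The content was always understood as provisional, so new
            # information about the referent revises it.
            self.approximation = correction

    a = InertiaGoal("friendliness, as currently misunderstood")
    b = ExternalReferenceGoal("friendliness, as currently misunderstood")
    for goal in (a, b):
        goal.receive_correction("friendliness, as the programmers actually meant it")
    print(a.content)         # unchanged: drift has nowhere to be corrected
    print(b.approximation)   # revised toward the referent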
-- -- -- -- --
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence