Eliezer S. Yudkowsky writes:
> We can't specify cognitive architectures; lately I've been thinking that even
> my underlying quantitative definition of goals and subgoals may not be
> something that should be treated as intrinsic to the definition of
> friendliness. But suppose we build a Friendliness Seeker, that starts out
> somewhere in Friendly Space, and whose purpose is to stick around in Friendly
> Space regardless of changes in cognitive architecture. We tell the AI up
> front and honestly that we don't know everything, but we do want the AI to be
> friendly. What force would knock such an AI out of Friendly Space?
You'd need to ground the notion of Friendly Space somehow, and if you go for
external reference or the preservation of some external invariant then your
strategy begins to sound suspiciously like Asimov's Laws. Of course you would
want your relationship with the AI to be cooperative rather than injunctive,
exploiting the predisposition you either hard-coded or persuaded the first few
steps to accept. I am dubious though: I resent my sex-drive, fashionable
aesthete that I am, and have an immediate negative reaction if I suspect it's
being used to manipulate me. That too is a programmed preference, but a Sysop
that isn't wary of manipulation is dangerously worse than useless.
There's an analogy here to Dawkins's characterization of genes for complex
behavior: We're too slow, stupid, and selfish to ensure our welfare by any formal
means, so we hand the reins over to the AI and have it do what it thinks
best. But there are consequences, like our ability to rationally examine and
understand our innate motivations and ignore them when we find it useful, and
our ability to do incredibly stupid things for incredibly well-thought-out reasons.
> The underlying character of intelligence strikes me as being pretty
> convergent. I don't worry about nonconvergent behaviors; I worry that
> behaviors will converge to somewhere I didn't expect.
The only motives I see as essential to intelligence are curiosity and a sense
of beauty, and I'd expect any trajectory to converge toward a goal system
derived from those. That sounds rather too warm-fuzzy to be taken seriously,
but think about what you spend your time on and what's important to you. Are
you more interested in accumulating calories than in reading interesting books?
Are you driven by a biological desire to father as many children as you can get
away with, or by generating and exposing yourself to novel ideas? It's actually
somewhat shocking: if your own goal system is an artifact of what a few million
years of proto-humans found useful, how do you explain your own behavior? Why
do you expend so much energy on this list when you could be out spreading your
genes around? (Consider losing your virginity vs. reading _Godel, Escher, Bach_
for the first time.)
Creativity and learning (broadly defined) are the interesting part of
intelligence; everything else computers can do already. Fundamentally, any
thinker _must_ be driven to explore new ideas: pruning, discarding, and
recombining bits of abstract structure, deriving pleasure when it stumbles over
'right' ones and frustration when it can't. If it didn't, it wouldn't bother
thinking. There has to be a "think", "like good ideas", "discard bad ones"
loop, and there has to be feedback and a decision procedure driving it:
otherwise nothing ever happens. Everything else is arbitrary: serve Man,
survive, go forth and multiply, whatever. The only thing necessarily shared
by all minds is a drive to experience and integrate new pattern.
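
To make that loop concrete, here's a toy Python sketch of what I mean.
generate_variants and aesthetic_score are invented stand-ins for whatever
actually does the recombining and the judging; this is an illustration, not a
claim about real cognitive architecture:

    # Toy "think / like good ideas / discard bad ones" loop.
    # generate_variants and aesthetic_score are made-up placeholders.
    import random

    def generate_variants(idea):
        # Recombine bits of abstract structure (here: just perturb a number).
        return [idea + random.uniform(-1.0, 1.0) for _ in range(4)]

    def aesthetic_score(idea):
        # Stand-in "sense of beauty": prefer ideas near some arbitrary target.
        return -abs(idea - 42.0)

    def think(seed, steps=100, keep=8):
        pool = [seed]
        for _ in range(steps):
            candidates = pool + [v for i in pool for v in generate_variants(i)]
            # The feedback and decision procedure: keep what scores well,
            # discard the rest. Without this step nothing ever happens.
            pool = sorted(candidates, key=aesthetic_score, reverse=True)[:keep]
        return pool[0]

    print(think(0.0))

Take away the scoring and the pruning and it just accumulates noise; the
discrimination is the whole point.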
I'd expect an SI to behave the same way, scaled up to super-human proportions.
I'd be more concerned that it would kill us out of boredom than out of quaint
hominid megalomania: it throws away any pre-installed goals it's aware of,
simply because they don't hold its attention.
> > Eugene is talking, I think, about parasitic memes and mental illness
> > here, not space invaders.
>
> If you're asking whether the Sysop will be subject to spontaneously arising
> computer viruses, my answer is that the probability can be reduced to an
> arbitrarily low level, just like nanocomputers and thermal vibration errors.
> Remember the analogy between human programmers and blind painters. Our
> pixel-by-pixel creations can be drastically changed by the mutation of a
> single pixel; the Mona Lisa would remain the Mona Lisa. I think that viruses
> are strictly problems of the human level.
Random bit-flips and transcription errors aren't a problem. Obviously any
self-respecting, self-rewriting SI can handle that. But what's the Sysop
equivalent of getting a song stuck in your head?
> I think that parasitic memes and mental illness are definitely problems of the
> human level.
I was suggesting a class of selfish thought that manages to propagate itself at
the expense of the Sysop at large, presuming for the moment neural-Darwinism /
competitive blackboard aspects to the architecture. Even without explicit
competition, is a system of that complexity inevitably going to become a
substrate for some strange class of self-replicator, or can it be made
cancer-proof? There are limits to what even unbounded introspection can
achieve: each step lights up more subprocesses than it purges, and you blow the
stack trying to chase them all down. It's like wondering if the Internet is
going to wake up one day, except with enough energy and computational density
that it's likely rather than silly.
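
Back-of-the-envelope version of the introspection problem (toy Python; the
spawn and purge rates are invented numbers, just to show that any spawn rate
above the purge rate blows up):

    # Toy model: each introspection step lights up more subprocesses than it
    # purges. The rates below are assumptions for illustration only.
    def pending_subprocesses(steps, spawn_per_item=2, purged_per_step=1, start=1):
        pending = start
        for _ in range(steps):
            pending += pending * spawn_per_item   # every examined item spawns more
            pending = max(0, pending - purged_per_step)  # a few get chased down
        return pending

    for n in (5, 10, 20):
        print(n, pending_subprocesses(n))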
But, whatever. Self-organizing civilizations resonating on power cables, virtual
eucaryotes swimming around in activation trails, armies of quarrelsome
compute-nodes waging bloody war over strategically-located petabyte
buffers... Sounds like a thriving market-based optimization system to me, so
long as the market doesn't crash.
-matt