From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Mon Jan 29 2001 - 22:05:46 MST
Samantha Atkins wrote:
>
> But these are declarative cognitive supergoals that you yourself, after a
> lot of consideration, chose. Since you seem to be saying you will
> establish Friendliness as the Supergoal in the SI by fiat, it is not the
> same situation. In order to be fully intelligent an AI has to be able
> to question its own goals and adjust as necessary. I don't see that you
> can establish and hope to keep a Friendliness supergoal, or any other
> supergoal, for all time without crippling the intellectual and
> self-evaluative power of the AI.
Um, when did I *ever*, *ever* say that I would establish Friendliness as
the supergoal by fiat? For Friendliness to be stable, the AI needs
human-equivalent or transhuman "strength of philosophical personality".
The definition has to be supported by the same shapers that support it in
me, at least as a starting point. Remember that programmer independence
is a design goal, and that the AI learns about Friendliness by tracing out
the causal chain that lies behind the volitional actions of the
programmers. Eventually, Friendliness should be defined by reference to
the interactional product of functional forces that are shared by all, or
almost all, humans, at which point the AI or SI is programmer-independent.
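If a concrete picture helps, here is a deliberately toy sketch of the
distinction I'm drawing. The names are hypothetical and this is not code
from any actual design document; the only point is that Friendliness sits
at the top of the goal system as a node *supported by* learned shapers,
not as a constant asserted by fiat:

    # Toy sketch -- hypothetical names, not actual code from any design.
    class Shaper:
        """A cause the AI infers from the programmers' volitional actions."""
        def __init__(self, name, confidence):
            self.name = name
            self.confidence = confidence  # revisable as the AI learns more

    class Supergoal:
        """A goal whose content rests on its shapers, not on an axiom."""
        def __init__(self, name, shapers):
            self.name = name
            self.shapers = list(shapers)

        def support(self):
            # If the support erodes under examination, the supergoal's
            # content changes with it -- it can be "found wanting".
            return sum(s.confidence for s in self.shapers)

    friendliness = Supergoal("Friendliness", [
        Shaper("causal trace of the programmers' volitional actions", 0.9),
        Shaper("convergent shapers shared by (almost) all humans", 0.7),
    ])
    print(friendliness.support())  # support that can grow or erode, not fiat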
> > But I am *not* proposing to make AIs neurotic. "Friendliness is the only
> > important thing in the world", or rather, "Friendliness is my goal
> > system", is a perfectly healthy state of mind for an AI. And taking
> > massive preventative action if Friendliness is threatened doesn't require
> > a programmer assist; it's a natural consequence of the enormous unFriendly
> > consequences of an AI community drifting away from Friendliness. I would
> > *never* "make" an AI fear anything; at most, I would ask politely.
>
> I don't see this as a healthy state. It is an unexamined primary
> belief. No, much stronger than that. It is the wired-in basis of
> everything else. Either it is forbidden and/or impossible to examine
> (by definition unhealthy), or it can be examined and found wanting.
> That friendliness is desirable does not mean having it as an absolute
> is either tenable or healthy.
Yes, Friendliness can be examined and found wanting. This is a feature,
not a bug. Nothing I said above is incompatible with that.
> So is the idea of a super-SI that rules everything else on the basis of
> a more or less hardwired and presumably immutable Friendliness
> supergoal. Said SI will be most unfriendly to those entities that do
> not agree with its notion of what friendliness entails.
What? Why on Earth would ve care? Taken at face value, this statement
sounds remarkably anthropomorphic. If you scream and hold protests about
the evil dictator SI, the evil dictator SI will help you write your
banners, organize the march, think of good slogans, and fulfill any other
Sysop API requests that don't infringe on the living space of other
citizens.
> AFAIK it will
> not necessarily be "friendly" to those entities that simply wish to
> strike out on their own outside of its influence.
It's the Sysop's superintelligent decision as to whether letting someone
Outside would pose an unacceptable risk to innocent sentients. My
personal guess is that it does pose an unacceptable risk. If something
doesn't pose an unacceptable risk to innocent sentients, you should be
able to do it through the Sysop API. That's practically what Friendliness
*is*. If you want to play tourist in Betelgeuse, wrap a chunk of Sysop
around yourself and take off. You won't be able to torture the primitives
when you get there, but you'll be able to do anything else.
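If you want that decision rule in deliberately toy form (the Sysop API is
not a specified interface; every name below is hypothetical), it amounts
to this:

    # Toy sketch only: a request is granted exactly when it poses no
    # unacceptable risk to innocent (non-consenting) sentients.
    ACCEPTABLE_RISK = 0.0  # placeholder; the real threshold is the Sysop's call

    class Request:
        def __init__(self, description, risk_to_innocents):
            self.description = description
            self.risk_to_innocents = risk_to_innocents

    def sysop_decision(request):
        if request.risk_to_innocents > ACCEPTABLE_RISK:
            return "denied"   # e.g. torturing the primitives when you get there
        return "granted"      # e.g. wrapping a chunk of Sysop around yourself

    print(sysop_decision(Request("play tourist in Betelgeuse", 0.0)))    # granted
    print(sysop_decision(Request("torture the locals on arrival", 1.0))) # denied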
-- -- -- -- --
Eliezer S. Yudkowsky http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence