Re: FAQ Additions (Posthuman mind control)

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Feb 24 1999 - 18:53:08 MST


Nick Bostrom wrote:
>
> Eliezer S. Yudkowsky wrote:
>
> > Your posthumans will find their own goals. In any formal goal system
> > that uses first-order probabilistic logic, there are lines of logic that
> > will crank them out, totally independent of what goals they start with.
> > I'm not talking theory; I'm talking a specific formal result I've
> > produced by manipulating a formal system.
>
> What?!?

Sigh... here we go again. Take a look at "Coding a Transhuman AI";
there's a section in there about goal systems. That's got the full
cognitive architecture.

Let's say you're starting up a goal-reasoning system with a blank slate.

1. Push P&~P pair:

 P = [Exist goal G: (G.value != 0)]; P.probability = Unknown$1
 ~P = [All goal G: (G.value == 0)]; ~P.probability = 1 - Unknown$1

2. Fork the proposition and evaluate; ~P cancels out.
 In any particular instance, it is a trivial consequence of ~P that all
choices in the branch of reality containing ~P have equal (zero)
valence. Deducing the general rule takes reflexive reasoning, but
deducing the general rule is not strictly necessary. The factor
Unknown$1, being present in every goal in the other branch of reality,
cancels out of the renormalized equation.

The present result is that the AI can always _act as if_ (not _assume_)
it is in the branch of reality containing the proposition P; in other
words, the branch containing a goal G with Unknown$2 world-state and
Unknown$3 != 0 value.

That's where the basic differential desirability comes from. How that
differential expresses itself in particular choices depends on the
system knowledge base. Obviously, some knowledge is needed before
choices can be formulated that apply to a specific world-state; I am
attempting to demonstrate that the knowledge required to describe
choices and solve subproblems will almost always bind supergoals to the
goal G.

A generalized AI with a knowledge base, abstract heuristics, and so on
will be enough; such a generalized AI is almost certain to contain (or
formulate!) heuristics whose specifications operate on generic goal
objects. For example, "thinking about goal X" is a positively-valued
subgoal of X. Apply that heuristic to G and you have the
positively-valued goal "think about G". Again, all without any initial
goals whatsoever. A trivial case, but it demonstrates the problem.
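A toy sketch of that trivial case (Python again; the Goal class and the
0.1 inheritance factor are illustrative placeholders of mine, not the
actual architecture from "Coding a Transhuman AI"):

  # A heuristic whose specification operates on generic goal objects,
  # applied to the contentless goal G.

  class Goal:
      def __init__(self, description, value):
          self.description = description
          self.value = value            # may be an unknown, nonzero quantity

  def think_about(goal):
      # "Thinking about goal X" is a positively-valued subgoal of X;
      # the subgoal inherits a fraction of the parent's (unknown) value.
      return Goal("think about: " + goal.description, value=0.1 * goal.value)

  G = Goal("goal with Unknown$3 != 0 value", value=1.0)   # 1.0 is a stand-in
  subgoal = think_about(G)   # a positively-valued Interim goal, with no
                             # initial goals supplied by anyone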

Likewise, any initial goal stating "make humans happy" and containing
sufficient specification of "humans" and "happy", plus the implicit
knowledge "your programmers have added the assertion 'make humans
happy'", will be enough to generate independent Interim values for that
goal, probably but not necessarily positive, and almost certainly with
an at least slightly different set of relative values (priorities).

I don't see any way to have an AI that reasons reflexively and learns
from observation, without also permitting it to form heuristics that
operate on generic goals; once that happens, Interim goal values can
come into existence and conflict with any initially established goals.

> Well, if we allow the SIs to have completely unfettered intellects,
> then it should be all right with you if we require that they have
> respect for human rights as a fundamental value, right? For if there
> is some "objective morality" then they should discover that and
> change their fundamental values despite the default values we have
> given them. Would you be happy as long as we allow them full capacity
> to think about moral issues and (once we think they are intelligent
> enough not to make stupid mistakes) even allow them full control over
> their internal structure and motivations (i.e. make them autopotent)?

As a rational outcome of the debate, I'd be happy. Strictly speaking,
I'd be a lot happier if you manipulated the heuristics and knowledge
base to get the Interim goals you wanted. With initial goals, I'd worry
about the AI going insane - even over such a trivial issue as a priority
conflict between initial and Interim versions of the same goals!

> As indicated, yes, we could allow them to change their goals (though
> only after they are intelligent and knowledgeable enough that they
> know precisely what they are doing -- just as you wouldn't allow a
> small child to experiment with dangerous drugs).

Certainly a simple, rational cost-of-failure model, with respect to
self-alteration (failure: system shutdown) and goal alteration (failure:
unbounded catastrophe), should suffice to keep them cautious until
superintelligence is reached and fallibility is no longer an issue.
Again, this can be done entirely in Interim (consequence-of-knowledge)
goals rather than Arbitrary (imposed-at-startup) goals.
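A sketch of that cost-of-failure comparison (the numbers are
placeholders; the only point is the shape of the inequality):

  # An alteration that might fail is worth attempting only if the
  # expected gain outweighs the expected cost of failure.

  def worth_attempting(p_failure, gain_if_success, cost_if_failure):
      return (1 - p_failure) * gain_if_success > p_failure * cost_if_failure

  # Self-alteration: failure means shutdown (large but bounded cost).
  print(worth_attempting(0.01, gain_if_success=10.0, cost_if_failure=1000.0))   # False: wait

  # Goal alteration: failure means unbounded catastrophe, so no finite
  # gain justifies the attempt while the probability of error is nonzero.
  print(worth_attempting(0.01, gain_if_success=10.0, cost_if_failure=float("inf")))  # False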

It may seem like a trivial distinction, but it's a very fundamental
difference in architecture. You enforce Arbitrary goals with
special-case code and other coercions; you enforce Interim goals by
explaining benefits and failure scenarios to the AI. You protect
Arbitrary goals by piling coercions and untouchable code sections and
monitoring code on top of coercions; you protect Interim goals by
explaining them in greater detail.
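A caricature of that difference, under the same caveat that the class
names and mechanics here are illustrative only:

  class ArbitraryGoalSystem:
      PROTECTED = {"make humans happy"}        # untouchable code section

      def set_goal_value(self, goal, value):
          if goal in self.PROTECTED:           # coercion: special-case veto
              raise PermissionError("protected goal; cannot be revalued")
          # ...normal revaluation would go here...

  class InterimGoalSystem:
      def __init__(self):
          self.knowledge = []                  # explanations, failure scenarios

      def explain(self, fact):
          self.knowledge.append(fact)          # protecting a goal = explaining it better

      def goal_value(self, goal):
          # The goal's value is derived from the knowledge base each time,
          # not imposed; more relevant knowledge means more support.
          return sum(1 for fact in self.knowledge if goal in fact)

  interim = InterimGoalSystem()
  interim.explain("acting to make humans happy avoids a known failure scenario")
  print(interim.goal_value("make humans happy"))   # support derived from explanation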

-- 
        sentience@pobox.com         Eliezer S. Yudkowsky
         http://pobox.com/~sentience/AI_design.temp.html
          http://pobox.com/~sentience/sing_analysis.html
Disclaimer:  Unless otherwise specified, I'm not telling you
everything I think I know.

