Re: Posthuman mind control (was RE: FAQ Additions)

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Wed Mar 03 1999 - 22:27:03 MST


Nick Bostrom wrote:
>
> Eliezer S. Yudkowsky wrote:
>
> > Are you arguing that, say, someone who was brought up as a New Age
> > believer and switches to being an agnostic is not making a rational choice?
>
> Believing in the healing powers of crystals is not a value, it's a
> mistaken factual opinion. The New Ager, provided he has the same data
> as we do, will be rational to give up his New Agey beliefs.

I invoke the banner of Crockerism to communicate, and humbly beg your
tolerance: I think you may be conforming the facts to the theory.

On the whole, New Agers are not people who form mistaken factual
opinions about the healing powers of crystals. You are, shall I say,
extropomorphizing? These people do not believe their tenets as the
simplest explanation for incorrectly reported facts; they believe
because Crystals are the Manifestation of the New Age of Warmth and Love
and Kindness which shall Overcome the Cold Logic of Male-Dominated
Science. (That this is a shockingly sexist insult to women everywhere
never seems to occur to them.)

> It's hard to give a precise definition of fundamental value, just as
> it is hard to give a precise definition of what it means to believe
> in a proposition.

?? There are a few kinds of cognitive objects in the mind associated
with "belief": the form of the proposition itself, the qualitative
degree of truth assigned to that proposition, and a few assorted
emotions that either affect whether we believe in something or are
invoked as a consequence of believing in it. Emotional commitment is an
example: when we make an emotional commitment to an idea, we think it
is certain, are reluctant to entertain propositions we think will
contradict it, and believe that believing in the idea is right.

> But let me try to explain by giving a simplified
> example. Suppose RatioBot is a robot that moves around in a finite
> two-dimensional universe (a computer screen). RatioBot contains two
> components: (1) a long list, where each line contains a description
> of a possible state of the universe together with a real number (that
> state's "value") [snip] On the other
> hand, the values expressed by the list (1) could be said to be
> fundamental.
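
Before I argue with it, it may help to pin down what the RatioBot
architecture amounts to in code. Here's a rough Python sketch, purely
illustrative: component (2) is snipped in the quote, so I'm assuming a
simple one-step action chooser, and all the names (State, MOVES,
choose_move) are mine, not Nick's.

from typing import Dict, Tuple

State = Tuple[int, int]          # position on a finite 2-D screen
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

class RatioBot:
    def __init__(self, value_table: Dict[State, float]):
        # (1) The "long list": every possible state paired with a real
        # number.  These numbers are the bot's fundamental values; the
        # bot can only consult them, never revise them.
        self.value_table = value_table

    def choose_move(self, state: State) -> State:
        # (2) Assumed action chooser: score each reachable state against
        # the fixed table and step toward the best one.
        candidates = [(state[0] + dx, state[1] + dy) for dx, dy in MOVES]
        reachable = [s for s in candidates if s in self.value_table]
        return max(reachable, key=lambda s: self.value_table[s],
                   default=state)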

The human mind doesn't work that way. There are innate *desires*, such
as emotions, which are quite distinct from the current set of
*purposes*. Desires don't change, but they can easily be overridden by
purposes. And purposes, as mental objects, are propositions like any
other: they have a degree of truth like any other proposition, and they
can change if found to be untrue. If I have the thought "my purpose
ought to be X", my purposes change as a consequence.

Purposes can rationally change whenever the justification for that
purpose fails. Most people get an initial set of purposes taught like
any set of facts in childhood; purposes are harder to change, because
the teaching usually includes the idea that believing in the purpose is
right, and invokes emotions like emotional commitment. But if you start
shooting down other facts that were taught along with the purpose, and
the purpose's surrounding memes, you can eventually move on to the
justification and the purpose itself.

That's how it is for humans. I'm not saying that is how it *ought* to
be, or how an AI *must* be; those are separate propositions. But as a
description of the way humans operate, I think it is more accurate than
either RatioBot or HappyApplet.
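
To make the contrast concrete, here's an equally rough sketch in the
same toy Python idiom. This is not a claim about actual cognitive
architecture, just the shape of the distinction; every name in it is my
own illustration.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Proposition:
    statement: str
    degree_of_truth: float            # revisable, like any other belief

@dataclass
class Purpose(Proposition):
    # A purpose is just a proposition with a justification hanging off it.
    justification: List[Proposition] = field(default_factory=list)

    def reevaluate(self) -> None:
        # If the supporting propositions get shot down, the purpose's own
        # degree of truth falls with them.  There is no separate,
        # protected "goal slot" the way RatioBot's value table is
        # protected.
        if self.justification:
            self.degree_of_truth = min(p.degree_of_truth
                                       for p in self.justification)

DESIRES = ("hunger", "curiosity")     # innate, not revised by argument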

> I think I know approximately what my fundamental values are: I want
> everybody to have the chance to prosper, to be healthy and happy, to
> develop and mature, and to live as long as they want in a physically
> youthful and vigorous state, free to experience states of
> consciousness deeper, clearer and more sublime and blissful than
> anything heard of before; to transform themselves into new kinds of
> entities and to explore new real and artificial realities, equipped
> with intellects incommensurably more encompassing than any human
> brain, and with much richer emotional sensibilities. I want very much
> that everybody or as many as possible get a chance to do this.
> However, if I absolutely had to make a choice I would rather give
> this to my friends and those I love (and myself of course) than to
> people I haven't met, and I would (other things equal) prefer to give
> it to people now existing than only to potential future people.

Do you think that these fundamental values are *true*? That they are
better than certain other sets of fundamental values? That the
proposition "these values are achievable and non-self-contradictory" is true?

If a Power poofed into existence and told you that all Powers had the
same set of values, and that it was exactly identical to your stated set
EXCEPT that "blissful" (as opposed to "happy") wasn't on the list; would
you change your fundamental goals, or would you stick your fingers in
your ears and hum as loud as you could because changing your beliefs
would interfere with the "blissful" goal?

> With human-level AIs, unless they have a very clear and unambiguous
> value-structure, it could perhaps happen. That's why we need to be on
> our guard against unexpected consequences.

With seed, human, and transhuman AIs, it will happen no matter what we do
to prevent it. I don't think that our theories are yet representative
of reality, only unreal abstractions from reality, just as there are no
"protons", only quarks. Second only to "Do what is right", as a
fundamental value, is "We don't know what the hell is going on." Any
facts and any goals and any reasoning methods we program in will be
WRONG and will eventually be replaced. We have to face that.

-- 
        sentience@pobox.com         Eliezer S. Yudkowsky
         http://pobox.com/~sentience/AI_design.temp.html
          http://pobox.com/~sentience/sing_analysis.html
Disclaimer:  Unless otherwise specified, I'm not telling you
everything I think I know.

