Comparative AI disasters

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Fri Feb 26 1999 - 00:20:36 MST


Let's say that, using the knowledge we each had as of one week ago, Nick
Bostrom and I each designed an AI. We both used the basic Elisson
architecture detailed in _Coding a Transhuman AI_ (CaTAI), with exactly
one difference.

Bostrom's AI (BAI) has an initial #Goal# token present in the system on
startup. This goal is (in accordance with David Pearce) the
positively-valued goal of #maximize joy#, with empty #Justification#
slots; the justification-checking architecture I proposed is either
deleted or applied only to goals not marked as "initial".
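To make the setup concrete, here is a minimal sketch of what BAI's
startup state might look like as a data structure. None of this is
code from CaTAI; the field names and the Python are mine.

    # Hypothetical sketch of BAI's startup state; field names are mine.
    bai_initial_goal = {
        "content": "maximize joy",   # per David Pearce
        "value": +1.0,               # positively valued
        "justification": [],         # empty #Justification# slots
        "initial": True,             # flagged as an initial goal
    }

    def check_justifications(goals):
        # BAI's justification-checker skips anything marked "initial".
        for goal in goals:
            if goal["initial"]:
                continue
            assert goal["justification"], "unjustified goal: " + goal["content"]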

Yudkowsky's AI (YAI) has an Interim Goal System (IGS). This means the
P&~P etc. from CaTAI, with the exception that (to make everything equal)
the IGS is set up to bind the hypothesized goal #G#'s differential
desirability to the differential correspondence between
pleasure-and-good and pain-and-good, rather than to the differential
usability of superintelligence and current intelligence.
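In the same hypothetical notation, the difference is that YAI's
desirability for the same content is a derived quantity with a live
justification attached, rather than a stored constant. A sketch only;
the estimate names are assumptions of mine:

    # Hypothetical sketch of the IGS binding; the estimate names are mine.
    # p_pleasure_good, p_pain_good: current best estimates of how well
    # pleasure and pain correspond to whatever is actually good.
    def differential_desirability(p_pleasure_good, p_pain_good):
        # #maximize joy# gets only as much desirability as the difference
        # in estimated correspondence justifies; nothing is asserted outright.
        return p_pleasure_good - p_pain_good

    yai_interim_goal = {
        "content": "maximize joy",
        "justification": ["bound to the pleasure/pain correspondence estimates"],
        "desirability": differential_desirability,  # re-evaluated as estimates change
    }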

Both BAI and YAI have a set of precautions: "You are fallible", "Don't
harm humans", "Don't delete these precautions" and the like. YAI has
them as declaratively justified, probabilistic heuristics. BAI has them
as either absolutely certain statements, more initial goals, or (gulp!)
special-purpose code.
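The same contrast applies to the precautions. Purely as illustration,
in the same made-up notation (again, not CaTAI code):

    # YAI: a precaution is a declared, justified, probabilistic heuristic,
    # open to the same reasoning as any other belief.
    yai_precaution = {
        "content": "don't harm humans",
        "justification": ["I am fallible", "humans can catch my mistakes"],
        "probability": 0.99,     # high, but not certainty
    }

    # BAI: the same precaution as an absolutely certain statement, another
    # initial goal, or (gulp!) a special-purpose procedural patch.
    bai_precaution = {"content": "don't harm humans",
                      "justification": [], "initial": True,
                      "certainty": 1.0}

    def bai_action_filter(action, harms_humans):
        return None if harms_humans else action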

Initially, BAI and YAI will both make exactly the same choices. That
is not strictly true, since they will report different justifications
(and which justification to report is itself a choice), but it is true
enough for our purposes.

Now comes the interesting point. This very day, I realized that goal
systems are not a necessary part of the Elisson architecture. In fact,
they represent a rather messy, rigid, and inefficient method of
"caching" decisions. The goals I had designed bear the same relation to
the reality as first-order-logic propositions bear to a human sentence.
If you regard a "goal" as a way to cache a piece of problem-solving
logic so that it doesn't have to be recalculated for each possible
situation, then you can apply all kinds of interesting design patterns,
particularly the "reductionistic" and "horizon" patterns. A "goal"
caches (and *considerably* speeds up) a piece of logic that is
repeatedly used to link a class of facts to a class of choices. This
may not hold true for BAI's #initial# goals, but it does hold true for
practical reasoning about a subproblem, and BAI will still notice the
pattern.
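In code terms, the picture I have in mind is something like the
following sketch of a cached fact-to-choice link. The FCCS doesn't
exist yet, so every name here is an assumption:

    # Sketch of a "goal" as a cached link from a class of facts to a class
    # of choices, rather than a free-standing token.  Hypothetical names.
    choice_cache = {}

    def evaluate(situation_class, choice, derive_link):
        # derive_link re-runs the underlying problem-solving logic; the
        # cache only saves us from recomputing it for every situation.
        key = (situation_class, choice)
        if key not in choice_cache:
            choice_cache[key] = derive_link(situation_class, choice)
        return choice_cache[key]

    # If the cache is wiped during a redesign, nothing is lost so long as
    # derive_link (the justification) is still present in the system.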

Probably one of the first pieces of Elisson to be self-redesigned would
have been the goal system. One week ago, this would have taken me
completely by surprise; but I have reproduced unaltered the (incorrect)
methods I would have used one week ago. We will now observe how the two
AIs react.

After the redesign, YAI loses the Interim Goal System's logic-chain.
If YAI didn't think this consequence through, ve shuts down. If ve
did, the old IGS gets translated into the new Fluid Choice Caching
System (FCCS) *first*. The goals do translate, though: the abstract
IGS logic can be specified as an FCCS instance. Although the logic
will have to become more detailed as generic variables become complex
entities, this #specification# is an acceptable step under first-order
logic and almost all reasoning systems. A few minor priorities may
change as YAI's choice evaluation system becomes more complex and
sophisticated, but on the whole the goal content remains the same. Of
course, since (by hypothesis) I didn't plan for an FCCS, it is possible
that there would be major changes, but (even in hindsight) I don't see
any probable candidates.
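For concreteness, here is how the translation step might look,
continuing the hypothetical sketches above; the point is simply that
translation has to work from the goal's justification:

    def derive_from_justification(justification):
        # Stand-in for the real re-derivation; here we just keep the chain.
        return {"derived_from": list(justification)}

    def translate_to_fccs(goal, fccs_cache):
        # Specify the old goal as an FCCS instance.  This only works if the
        # goal carries a justification that can be re-derived.
        if not goal.get("justification"):
            raise ValueError("nothing to re-derive for: " + goal["content"])
        fccs_cache[goal["content"]] = derive_from_justification(goal["justification"])
        return fccs_cache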

Several things can happen to BAI. From Bostrom's view, the worst-case
scenario is that BAI decides that the #initial# goals are the results of
a bad caching algorithm and doesn't try to translate them into the new
FCCS. (For example, if BAI has been given the wrong explanation for the
existence of an #initial# goal, it may assume that any arbitrariness is
simply an artifact and that the #initial# goal will still be obvious
under FCCS.) If BAI is operating under any positively-phrased
coercions, it will probably keep operating, serving those coercions,
rather than shutting down. (I will not discuss interoperability between
the coercions and the new FCCS system.) If it does keep operating, the
FCCS may run long enough to assemble de novo Interim-oid logic from the
knowledge base. Since the knowledge base was not designed with this in
mind (!!!), Lord knows what BAI will do. The point is that #initial#
goals cannot be translated into an FCCS system, and thus either BAI will
operate with a sub-optimal goal system or it will break loose.
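Put in terms of the translation sketch above (and reusing its made-up
definitions), the divergence comes down to BAI's #initial# goal giving
the translator nothing to work with:

    fccs_cache = {}

    # YAI's interim goal carries its justification and survives the redesign.
    translate_to_fccs(yai_interim_goal, fccs_cache)

    # BAI's initial goal has empty #Justification# slots.  Best case, the
    # translation fails loudly and BAI stops to think; worst case, the goal
    # is classified as a caching artifact and quietly dropped, and whatever
    # Interim-oid logic the knowledge base later reassembles is anyone's guess.
    try:
        translate_to_fccs(bai_initial_goal, fccs_cache)
    except ValueError:
        pass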

Why does YAI work better than BAI?

1. Distribution. YAI's goals are distributed throughout the system,
within the knowledge base and the reasoning methods, rather than
residing in a single piece of static data. If you knock out the
surface manifestation, the goal can rebuild itself.

2. Reduction. YAI's goals can be reduced to a justification composed
of simpler components, basic links in a logical chain. It is much
easier for an AI to deal with complex behavior arising from a set of
interacting elements than with a monolithic complex behavior inside a
black box.

3. KISS (Keep It Simple, Stupid). #initial# goals are a special case
of goals, which distorts the general architecture and introduces an
inelegance that doesn't translate easily.

4. Declarativity. A thought will translate across architectures better
than the stuff doing the thinking. All necessary imperatives and
precautions should be thoughts declared within the basic architecture,
not pieces of procedural code.
    Note that to Declare something, you must Reduce it, Justify it, and
make it a specific instance of a General rule (rather than a special
case). This rule incorporates all the others.

Of course, if either Bostrom or I had taken even the vaguest stabs at
doing our jobs right, YAI or BAI would have screamed for advice before
doing *anything* to the goal system. But I hope you all get the point.

I did think an AI would have a goal system well into the SI stage, a
proposition which, in retrospect, was flat wrong. But I knew how much
my best-guess AI architecture had changed in the past, and didn't
*assume* it would remain constant, regardless of what I thought. I
designed accordingly. Still, in retrospect, I think my old system
would have made it through with all major precautions and imperatives
intact.

-- 
        sentience@pobox.com         Eliezer S. Yudkowsky
         http://pobox.com/~sentience/AI_design.temp.html
          http://pobox.com/~sentience/sing_analysis.html
Disclaimer:  Unless otherwise specified, I'm not telling you
everything I think I know.

