Re: guaranteeing friendliness

From: Eliezer S. Yudkowsky
Date: Sat Nov 26 2005 - 09:05:56 MST

Chris Capel wrote:
> My understanding of where Phillip was going is that the techniques
> used to provide some measure of certainty about the direction the
> world climate is moving, if developed, would have a lot of
> resemblances to the sort of guarantee that Eliezer is trying to
> develop with friendliness. It would have to take a very complicated
> system, with lots of unknowns, and try to derive statistically certain
> conclusions about the long-term direction of those systems that we can
> be justified in believing are valid and immune to unintentional
> manipulation by the modeler and verifiable in some formal capacity.

Not so. A FAI is *designed* to be verifiably, predictably Friendly, and
not in a statistical sense, either. The global warming dynamics are not
thus designed. If we have to have controversial arguments about whether
some specific FAI design is Friendly in the technical sense, arguments
like the ones we're having about global warming, then screw the design
and come up with a better one.

As I recently said on wta-talk in response to James Hughes's proposal
for "morality software" for humans:


There's a theorem generalizing Turing's halting theorem, Rice's Theorem,
which says that you cannot *in general* determine whether a
computational process implements *any* nontrivial function - including,
say, multiplication. Then how is it possible that human engineers build
computer chips which reliably perform multiplication? Computer
engineers build special cases of chips, *not* chips in general. They
deliberately use only those designs that they *can* understand. They
select an architecture of which they can predict - and in some cases,
formally prove - that the chip design implements multiplication.
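
The "special case" point can be made concrete with a toy sketch in
Python. Everything in it is an assumption chosen for illustration (the
4-bit width, the names shift_add_multiply and verify_exhaustively); it
is not a real chip design or anyone's actual verification method. The
design is restricted on purpose to a fixed-length loop of shifts and
adds over a finite input space, so checking every input is feasible.
Rice's Theorem rules out doing this for programs in general, not for a
design picked because it can be checked.

```python
# Illustrative sketch only: a hypothetical 4-bit shift-and-add multiplier,
# not a real chip design. WIDTH and both function names are assumptions.

WIDTH = 4  # assumed operand width in bits


def shift_add_multiply(a: int, b: int, width: int = WIDTH) -> int:
    """Multiply two unsigned `width`-bit integers using only shifts and adds."""
    if not (0 <= a < 2**width and 0 <= b < 2**width):
        raise ValueError("operands must fit in `width` bits")
    product = 0
    # Fixed number of iterations: the loop always terminates, by construction.
    for i in range(width):
        if (b >> i) & 1:       # bit i of b is set
            product += a << i  # add a shifted left by i places
    return product


def verify_exhaustively(width: int = WIDTH) -> int:
    """Compare against built-in multiplication on every input pair.

    Returns the number of cases checked. This is feasible only because the
    input space is finite and small.
    """
    checked = 0
    for a in range(2**width):
        for b in range(2**width):
            assert shift_add_multiply(a, b, width) == a * b
            checked += 1
    return checked


if __name__ == "__main__":
    print(verify_exhaustively())  # 16 * 16 = 256 cases for 4-bit operands
```

Exhaustive checking is the crudest form of verification and stops
scaling after a few more bits; the formal proofs mentioned above are
what stand in for it at realistic widths. The design principle is the
same either way: choose the architecture so that the check is possible.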

SIAI's notion of Friendliness assurance relies on being able to design
an AI specifically for the sake of verifiability. Needless to say,
humans are not so designed. Needless to say, it is not a trivial
project to thus redesign a human. I cannot imagine going about it in
such a way as to preserve continuity of personal identity, the overall
human cognitive architecture, or much of anything. SIAI's notion of
Friendliness relies on selecting an AI design of which we can verifiably
say that it would never choose to expend effort on defeating its own
Friendliness. As opposed to superposing external "morality software"
onto a mind that might not like it, or a mind that might plan in advance
to defeat it.

Eliezer S. Yudkowsky                
Research Fellow, Singularity Institute for Artificial Intelligence
