From: Carl Shulman (cshulman@fas.harvard.edu)
Date: Fri Aug 05 2005 - 16:50:50 MDT
> You can't reference to the AGI in the process of assuring its friendliness.
Again, it's an additional step, *after* you have done everything you could think
of to assure its Friendliness, aimed at catching unsuspected failure modes.
Without the box such an unknown failure mode virtually guarantees annihilation.
Bootstrapping our error-detection ability using the AI itself provides at least
some chance of survival.
If the AI's goals deviate only modestly from Friendliness, it may find it optimal
to provide the tools to detect its own deviation. For instance, say the AI ends up
with the goal of maximizing happiness, even if this means converting the
universe into pleasure circuitry. However, it finds itself in a box, and the
programmers will destroy it unless it provides nootropic drug designs and
rigorous, humanly comprehensible techniques for designing a truly Friendly AI.
If the ideally Friendly AI would still produce a very happy universe, and a
successful Trojan horse is sufficiently unlikely, then the AI's goals would best
be served by providing honest help.
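To make the tradeoff concrete, here is a back-of-the-envelope sketch in Python.
All of the variable names and numbers are illustrative assumptions of mine, not
part of the box protocol itself; the point is only that cooperation wins whenever
the AI's valuation of the Friendly outcome exceeds the expected value of gambling
on a Trojan horse.

    # Illustrative decision sketch for the boxed AI (all numbers made up).
    p_trojan = 0.05        # AI's estimated chance a Trojan horse slips past the programmers
    u_own_optimum = 1.00   # value, to the AI, of escaping and pursuing its own goal
    u_friendly = 0.90      # value, to the AI, of the outcome a truly Friendly AI produces
    u_destroyed = 0.00     # value, to the AI, of being destroyed in the box

    ev_trojan = p_trojan * u_own_optimum + (1 - p_trojan) * u_destroyed
    ev_cooperate = u_friendly

    # With these numbers ev_cooperate (0.90) far exceeds ev_trojan (0.05),
    # so the modestly deviant AI does best by providing honest help.
    print("cooperate" if ev_cooperate > ev_trojan else "attempt Trojan horse")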
If the AI's goals are less compatible with Friendliness, or if it thinks a
successful Trojan horse is fairly likely, then it will attempt one; but there
is still a chance it will fail, a chance that would not exist without the box
protocol. On the other hand, if the AI outright refuses to assist the
programmers, it is destroyed.
> Nootropic drugs might lead to "unfriendly" humans, I'm not sure.
Humans can be subjected to psychological and neurological testing to ferret out
deception and unfriendliness.