From: Tim Freeman (tim@fungible.com)
Date: Sat Apr 12 2008 - 10:12:52 MDT
On Thu, Apr 10, 2008 at 10:23 AM, Tim Freeman <tim@fungible.com> wrote:
> That sort of altruism is exploitable even without considering absurdly
> improbable hells. All I need to do to exploit you is breed or
> construct or train a bunch of humans who want exactly what I want.
From: "Rolf Nelson" <rolf.h.d.nelson@gmail.com>
>In Causal Decision Theory, this is true. However, it's possible to
>adopt a different decision theory that is specifically designed to be
>less exploitable, without having to give up the things that you care
>about.
What things are we talking about giving up here? Would you care to
describe the different decision theory you have in mind?
>1. Please give your alternative suggestion for a morality that you
>would actually endorse, and which is completely unexploitable.
I don't get to decide the morality of any existing human, so the
remaining problem is to decide the morality of something that might be
constructed. I don't care much what it does if there are no existing
humans any more, so it can take its values from observing the people
it interacts with. That means it has to be trained to identify those
people and to figure out what they are doing and why they're doing it,
and so on. There are a few details there, but no rocket science.
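To make the flavor of that concrete, here is a toy Python sketch of
the "figure out what they are doing and why" step. It assumes a small
discrete set of candidate goals and a made-up likelihood model; the
names and numbers are invented for illustration and don't come from
the respect document.

    # Toy illustration only.  Assumes a small discrete set of candidate
    # goals and a made-up likelihood model; none of these names come
    # from the respect document.

    from collections import defaultdict

    # How likely each observable action is, given each candidate goal.
    ACTION_LIKELIHOOD = {
        "stay_healthy": {"exercise": 0.6, "eat_cake": 0.1, "work_late": 0.3},
        "enjoy_now":    {"exercise": 0.1, "eat_cake": 0.6, "work_late": 0.3},
        "earn_money":   {"exercise": 0.1, "eat_cake": 0.1, "work_late": 0.8},
    }

    def infer_values(observed_actions):
        """Weight each candidate goal by how well it explains a person's
        observed actions (Bayesian updating from a uniform prior)."""
        weights = {goal: 1.0 for goal in ACTION_LIKELIHOOD}
        for action in observed_actions:
            for goal in weights:
                weights[goal] *= ACTION_LIKELIHOOD[goal].get(action, 0.01)
        total = sum(weights.values())
        return {goal: w / total for goal, w in weights.items()}

    def infer_everyone(observations):
        """observations: (person, action) pairs from watching the world."""
        actions_by_person = defaultdict(list)
        for person, action in observations:
            actions_by_person[person].append(action)
        return {p: infer_values(a) for p, a in actions_by_person.items()}

    if __name__ == "__main__":
        obs = [("alice", "exercise"), ("alice", "exercise"),
               ("bob", "work_late")]
        print(infer_everyone(obs))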
My best documented idea at the moment is at
http://www.fungible.com/respect/index.html. There was valid feedback
on it in a previous discussion here that is not yet incorporated into
the document, but the document isn't flawed enough to need recalling.
It seems workable
unless it has bugs I don't know about. (Check the known bugs section
for one significant bug I do know about. It isn't pertinent to this
conversation.)
We fix the set of people we care for, how much we care for each of
them, the time for the end of the plan, the maximum utility, the
resolution of the utility function, and maybe a few other parameters
that slip my mind at the moment. When the end-of-plan time comes, the
planner is complete: it exits, and unless the code is changed it won't
run any more. Before then, the people it's altruistic toward will
probably care what happens after the plan ends, so the planner will
put the world in a state where those people can reasonably expect good
things to happen afterward. It will do this because it cares about
them right now and they care about the post-plan future, not because
it cares about the post-plan future itself; by construction it
doesn't.
For example, if people want it to run more, it is likely to arrange
for something similar to itself to be automatically instantiated and
start running with a new end-of-plan date and perhaps other changed
parameters.
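Here is a rough Python sketch of that parameter-fixing, end-of-plan,
and re-instantiation behavior, just to make the shape of it concrete.
The names, the candidate-plan interface, and the utility-estimation
hook are invented for illustration; this is not code from the respect
document.

    # Illustrative sketch only.  The class and function names, and the
    # candidate-plan and utility-estimation interfaces, are made up;
    # this is not code from the respect document.

    from dataclasses import dataclass, replace
    from datetime import datetime, timedelta

    @dataclass(frozen=True)
    class PlannerParams:
        people: tuple              # the fixed set of people cared for
        weights: dict              # how much each person is cared for
        plan_end: datetime         # the time for the end of the plan
        max_utility: float         # utilities are capped at this value
        utility_resolution: float  # utilities are rounded to this step

    def score(plan, params, estimate_utility):
        """Weighted sum of each person's bounded, discretized utility
        under a candidate plan.  estimate_utility(person, plan) is
        assumed to be supplied by the rest of the system."""
        total = 0.0
        for person in params.people:
            u = min(estimate_utility(person, plan), params.max_utility)
            u = round(u / params.utility_resolution) * params.utility_resolution
            total += params.weights[person] * u
        return total

    def run_planner(params, candidate_plans, estimate_utility, now):
        """Pick the best candidate plan, but only before the end-of-plan
        time; after that the planner just exits and does nothing more."""
        if now() >= params.plan_end:
            return None  # plan complete; the planner no longer runs
        return max(candidate_plans(),
                   key=lambda p: score(p, params, estimate_utility))

    def successor_params(current, extension=timedelta(days=365)):
        """Parameters for a successor planner: identical except for a
        later end-of-plan date (other fields could change too, if that
        is what the people want)."""
        return replace(current, plan_end=current.plan_end + extension)

    if __name__ == "__main__":
        params = PlannerParams(("alice", "bob"),
                               {"alice": 1.0, "bob": 1.0},
                               datetime(2100, 1, 1), 100.0, 0.01)
        plans = lambda: ["plant a garden", "pave it over"]
        utility = lambda person, plan: (10.0 if plan == "plant a garden"
                                        else 3.0)
        print(run_planner(params, plans, utility, now=datetime.now))
        print(successor_params(params).plan_end)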
>2. Is your overall goal to make the world a better place, or have you
>elevated 'be unexploitable' to a supergoal rather than a means to an
>end?
Being unexploitable is a means to an end. It's a pretty important
means, since an exploitable, powerful AI is likely to be taken over by
someone I don't like. Then his next plausible step is to squish me
like a bug, along with everyone else who might object to him running
the show.
-- Tim Freeman http://www.fungible.com tim@fungible.com