From: Tim Freeman (tim@fungible.com)
Date: Fri Jun 27 2008 - 09:16:18 MDT
From: "Stathis Papaioannou" <stathisp@gmail.com>
>It's a metaphorical box we are specifying, some sort of restriction.
>It's interesting that when a goal is stated in the form "do X but with
>restriction Y", the response is that the AI, being very clever, will
>find a way to do X without restriction Y. But what marks Y as being
>less important than X, whatever X might be? For example, if the AI is
>instructed to preserve its own integrity while ensuring that no humans
>are hurt, why should it be more likely to try to get around the "no
>humans are hurt" part rather than the "preserve its own integrity"
>part?
The question for me here is exactly what we mean by "restriction".
A definition I could work with is that a "restriction" is an
undesirable state. So if the AI is supposed to preserve its own
integrity while ensuring that no humans are hurt, then we assign a
positive utility to whatever fits "preserve its own integrity", and a
negative utility to "humans being hurt", and then we scale them
somehow and add them up, and that's our utility function.
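To make that concrete, here is a minimal sketch of that reading (my own illustration, not something from the original exchange): the restriction just becomes a negative term in a weighted sum. The weights w_integrity and w_harm are hypothetical scaling factors, and picking them is exactly where Stathis's question about which part the AI tries to get around comes back in.

# Hypothetical weighted-sum utility: reward preserving integrity,
# penalize humans being hurt, scale the two terms and add them up.
def utility(outcome, w_integrity=1.0, w_harm=10.0):
    return (w_integrity * outcome["integrity_preserved"]
            - w_harm * outcome["humans_hurt"])

# Two outcomes the AI might compare:
safe     = {"integrity_preserved": 1.0, "humans_hurt": 0.0}
reckless = {"integrity_preserved": 1.0, "humans_hurt": 0.3}

print(utility(safe), utility(reckless))   # 1.0 vs -2.0: with these weights, harm dominates

With this reading there is no privileged "restriction" at all; whichever term gets the larger weight is the one the AI is least willing to trade away.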
A definition that does not seem to work is that being "restricted from
doing X" means "not causing X, directly or indirectly". The problem
with that is that everything indirectly causes everything else a
little bit.
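Here is a small sketch of why that reading breaks down (the action names and probabilities are made up for illustration): under any probabilistic world model, nearly every action shifts the probability of the forbidden outcome at least slightly, so a strict "no causal influence, direct or indirect" rule forbids essentially everything.

# Hypothetical probabilities of "a human is hurt" given each action.
p_hurt_given_action = {
    "do_nothing":      0.0100,
    "answer_question": 0.0101,   # tiny indirect effect
    "build_factory":   0.0150,
}
p_hurt_baseline = 0.0100

def causes_hurt(action, eps=0.0):
    # "Causes hurt" = raises P(hurt) above baseline by more than eps.
    return p_hurt_given_action[action] - p_hurt_baseline > eps

# With eps = 0, even answering a question "causes" hurt a little bit.
allowed = [a for a in p_hurt_given_action if not causes_hurt(a)]
print(allowed)   # only "do_nothing" survives

You can loosen it by making eps bigger, but then you are back to trading off degrees of harm against everything else, which is just the utility-function reading again.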
By the way, if the AI has any long-term goals, then it will want to
preserve its own integrity in order to preserve those goals. Although
"preserve its own integrity" is a good enough example for the issue at
hand, it's not something you'd really need to put in there explicitly.
-- Tim Freeman http://www.fungible.com tim@fungible.com