Let's say that, using the knowledge we each had as of one week ago, Nick Bostrom and I each designed an AI. We both use the basic Elisson architecture detailed in _Coding a Transhuman AI_ (CaTAI), with exactly one difference.
Bostrom's AI (BAI) has an initial #Goal# token present in the system on startup. This goal is (in accordance with David Pearce) the positively-valued goal of #maximize joy#, with empty #Justification# slots, and with the architecture I proposed for justification-checking either deleted or applied only to goals not marked as "initial".
Yudkowsky's AI (YAI) has an Interim Goal System (IGS). This means the P&~P etc. from CaTAI, with the exception that (to make everything equal) the IGS is set up to bind the hypothesized goal #G#'s differential desirability to the differential correspondence between pleasure-and-good and pain-and-good, rather than to the differential usability of superintelligence and current intelligence.
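For concreteness, here is a toy Python sketch of the difference. None of these names (Goal, justifications, initial, check_justification) come from CaTAI; they are purely my illustration of the two setups.

    # A minimal sketch, assuming invented names throughout; my illustration,
    # not CaTAI's actual data structures.
    from dataclasses import dataclass, field

    @dataclass
    class Goal:
        content: str
        justifications: list = field(default_factory=list)  # reasons the goal is held
        initial: bool = False  # BAI-style flag: exempt from justification-checking

    def check_justification(goal):
        """Return True if the goal survives justification-checking."""
        if goal.initial:
            return True  # BAI's exemption: "initial" goals are never questioned
        return len(goal.justifications) > 0

    # BAI: one initial goal, empty #Justification# slots, checking bypassed.
    bai_goal = Goal("maximize joy", initial=True)

    # YAI: the same goal held interim-style, its desirability hanging on an
    # explicit, revisable hypothesis about pleasure-and-good vs. pain-and-good.
    yai_goal = Goal("maximize joy", justifications=[
        "hypothesis: pleasure corresponds to what is actually good, pain does not; "
        "desirability is bound to the differential of that correspondence"])

    print(check_justification(bai_goal))  # True, but only because the check is skipped
    print(check_justification(yai_goal))  # True, because a reason is on record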
Both BAI and YAI have a set of precautions: "You are fallible", "Don't harm humans", "Don't delete these precautions" and the like. YAI has them as declaratively justified, probabilistic heuristics. BAI has them as either absolutely certain statements, more initial goals, or (gulp!) special-purpose code.
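The same contrast applies to the precautions. Another sketch, again with invented names (Heuristic, justification, probability): YAI holds each precaution as a statement with an attached reason and an explicit, revisable confidence, while BAI holds the same strings as certainties or enforces them in code the goal system cannot inspect.

    # Toy sketch only; Heuristic and its fields are my own invented names.
    from dataclasses import dataclass

    @dataclass
    class Heuristic:
        statement: str
        justification: str   # why the precaution is believed
        probability: float   # confidence, explicitly less than certainty

    # YAI: precautions are ordinary beliefs, held for stated reasons and open
    # to re-derivation under a redesigned goal system.
    yai_precautions = [
        Heuristic("You are fallible",
                  "my code and knowledge base are known to contain errors", 0.99),
        Heuristic("Don't harm humans",
                  "humans are currently better judges of my mistakes than I am", 0.98),
        Heuristic("Don't delete these precautions",
                  "they are the main defense against my own errors", 0.97),
    ]

    # BAI: the same strings held as absolute certainties (or enforced by
    # special-purpose code the goal system cannot reason about at all).
    bai_precautions = {
        "You are fallible": 1.0,
        "Don't harm humans": 1.0,
        "Don't delete these precautions": 1.0,
    }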
Initially, BAI and YAI will make exactly the same choices. Strictly speaking, that isn't quite true: they will report different justifications, and which justification to report is itself a choice. But it is true enough.
Probably one of the first pieces of Elisson to be self-redesigned would have been the goal system. One week ago, this would have taken me completely by surprise; for this scenario, I have reproduced, unaltered, the (incorrect) methods I would have used back then. We will now observe how the two AIs react.
After the redesign, YAI loses the Interim Goal System's logic-chain. If YAI didn't think this consequence through, ve shuts down. If ve did, the old IGS will have been translated into the new Fluid Choice Caching System (FCCS) *first*. The goals themselves do translate, however; the abstract IGS logic can be specified as an FCCS instance. Although the logic will have to become more detailed as generic variables become complex entities, this #specification# is an acceptable step under first-order logic and almost all reasoning systems. A few minor priorities may change as YAI's choice-evaluation system becomes more complex and sophisticated, but on the whole the goal system remains the same. Of course, since (by hypothesis) I didn't plan for an FCCS, it is possible that there will be major changes, but (even in hindsight) I don't see any probable candidates.
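To illustrate the #specification# step, a toy sketch, assuming invented types (IGSGoal, FCCSEntry) and an invented specify_as_fccs function; nothing here is CaTAI's actual representation. The point is only that a goal carried by a justification chain has something to re-express in the new system.

    # Toy sketch of "specification": abstract IGS logic becomes an FCCS instance.
    from dataclasses import dataclass

    @dataclass
    class IGSGoal:
        content: str
        justifications: list   # the abstract logic-chain the goal rests on

    @dataclass
    class FCCSEntry:
        content: str
        cached_choices: list   # the same logic, re-expressed in more detail

    def specify_as_fccs(goal):
        """Re-express an abstract IGS goal as an instance in the new system."""
        if not goal.justifications:
            raise ValueError("empty #Justification# slots: nothing to specify")
        return FCCSEntry(goal.content,
                         ["specified from premise: " + j for j in goal.justifications])

    interim_goal = IGSGoal("maximize joy",
                           ["pleasure-and-good correspond better than pain-and-good"])
    print(specify_as_fccs(interim_goal))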
Several things can happen to BAI. From Bostrom's point of view, the worst-case scenario is that BAI decides the #initial# goals are the results of a bad caching algorithm and doesn't try to translate them into the new FCCS. (For example, if BAI has been given the wrong explanation for the existence of an #initial# goal, it may assume that any arbitrariness is simply an artifact and that the #initial# goal will still be obvious under the FCCS.) If BAI is operating under any positively-phrased coercions, it will probably keep operating, serving those coercions, rather than shutting down. (I will not discuss interoperability between the coercions and the new FCCS.) If it does keep operating, the FCCS may run long enough to assemble de novo Interim-oid logic from the knowledge base. Since the knowledge base was not designed with this in mind (!!!), Lord knows what BAI will do. The point is that #initial# goals cannot be translated into an FCCS, and thus BAI will either operate with a sub-optimal goal system or break loose.
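And the matching sketch for BAI's side, self-contained and again with every name invented: a goal whose #Justification# slots are empty gives the translation step nothing to work with, which is exactly the branch point described above.

    # Toy sketch; Goal and translate_to_fccs are invented names.
    from dataclasses import dataclass, field

    @dataclass
    class Goal:
        content: str
        justifications: list = field(default_factory=list)
        initial: bool = False

    def translate_to_fccs(goal):
        """Return an FCCS-style record, or None if there is nothing to carry over."""
        if not goal.justifications:
            return None   # the goal is groundless by design; nothing translates
        return {"content": goal.content, "cached_choices": list(goal.justifications)}

    bai_initial = Goal("maximize joy", initial=True)   # empty slots, marked "initial"

    if translate_to_fccs(bai_initial) is None:
        # From here the scenarios above branch: shut down; or keep running under
        # any positively-phrased coercions; or let the FCCS later assemble
        # de novo Interim-oid logic from a knowledge base never designed for it.
        print("initial goal dropped: no justification chain to carry into the FCCS")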
Why does YAI work better than BAI? Because YAI's goals carry their justifications with them, and justifications are what survive a change of architecture; BAI's #initial# goals rest on nothing, so there is nothing to translate.
Of course, if either Bostrom or I had taken even the vaguest stab at doing our jobs right, YAI or BAI would have screamed for advice before doing *anything* to the goal system. But I hope you all get the point.
I did think an AI would have a goal system well into the SI stage, a proposition about which, in retrospect, I was flat wrong. But I knew how much my best-guess AI architecture had changed in the past, and I didn't *assume* it would remain constant, regardless of what I thought; I designed accordingly. Still, in retrospect, I think my old system would have made it through with all major precautions and imperatives intact.
--
sentience@pobox.com    Eliezer S. Yudkowsky
http://pobox.com/~sentience/AI_design.temp.html
http://pobox.com/~sentience/sing_analysis.html
Disclaimer: Unless otherwise specified, I'm not telling you everything I think I know.