Message-ID: <4F01C9D8.10107@justmoon.de>
Date: Mon, 02 Jan 2012 16:14:32 +0100
From: Stefan Thomas <moon@justmoon.de>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:8.0) Gecko/20111105 Thunderbird/8.0
MIME-Version: 1.0
To: bitcoin-development@lists.sourceforge.net
References: <alpine.LRH.2.00.1112290111310.22327@theorem.ca>
	<1325148259.14431.140661016987461@webmail.messagingengine.com>
	<alpine.LRH.2.00.1112291135040.22327@theorem.ca>
	<CALn1vHHjY6Qq0zEUcWaNzm_eP_JekjrK26zMXfcrfPSydwSKig@mail.gmail.com>
	<alpine.LRH.2.00.1112301214380.9419@theorem.ca>
In-Reply-To: <alpine.LRH.2.00.1112301214380.9419@theorem.ca>
Content-Type: multipart/alternative;
	boundary="------------070008060708020207040002"
Subject: Re: [Bitcoin-development] Alternative to OP_EVAL
Precedence: list

This is a multi-part message in MIME format.
--------------070008060708020207040002
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

The OP_EVAL discussion moved into private channels for a bit, so here
is a summary of what we talked about.

Roconnor pointed out that the currently proposed OP_EVAL removes the 
ability to statically reason about scripts. Justmoon pointed out that 
this is evidenced by the changes to GetSigOpCount:

Currently, the client first counts the number of sigops, and if the
count is over a certain limit, it doesn't execute the script at all.
This is no longer possible with OP_EVAL, since OP_EVAL can stand for any
number of other operations, which might be hidden in some piece of data.
The script that OP_EVAL executes can even be changed at runtime
(polymorphic code). Gavin's patch deals with this by counting the sigops
at runtime and aborting only once the limit has been reached.

Here is an example of a script that, based on naive counting, contains
no sigops, but in fact contains 20:

[20 signatures] 20 [pubkey] OP_DUP OP_DUP OP_2DUP OP_3DUP OP_3DUP
OP_3DUP OP_3DUP OP_3DUP 20 "58959998C76C231F" OP_RIPEMD160 OP_EVAL

RIPEMD160( 58 95 99 98 C7 6C 23 1F )

hashes to

AE4C10400B7DF3A56FE2B32B9906BCF1B1AFE975

which OP_EVAL interprets as

OP_CHECKMULTISIG "400B7DF3A56FE2B32B9906BCF1B1AFE9" OP_DROP

The nonce 58959998C76C231F was generated using this code: 
https://gist.github.com/1546061
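
To make the gap between naive and actual counting concrete, here is a
minimal Python sketch. It is an illustration only, not the client's
GetSigOpCount. The assumptions: OP_EVAL is given the byte value 0xb0
(the OP_NOP1 slot), signatures and the pubkey are dummy placeholders,
the 20-byte hash is copied from this message rather than recomputed, and
OP_CHECKMULTISIG is counted conservatively as 20 sigops.

```python
# Opcode byte values from Bitcoin script.
OP_PUSHDATA1 = 0x4c
OP_2DUP, OP_3DUP, OP_DROP, OP_DUP = 0x6e, 0x6f, 0x75, 0x76
OP_RIPEMD160 = 0xa6
OP_CHECKSIG, OP_CHECKSIGVERIFY = 0xac, 0xad
OP_CHECKMULTISIG, OP_CHECKMULTISIGVERIFY = 0xae, 0xaf
OP_EVAL = 0xb0  # assumption: the proposal reuses the OP_NOP1 slot


def parse(script):
    """Yield (opcode, pushed_data) pairs.

    Handles only direct pushes (1..75 bytes) and OP_PUSHDATA1, which is
    enough for the scripts in this example.
    """
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        data = None
        if 1 <= op <= 75:
            data = script[i:i + op]
            i += op
        elif op == OP_PUSHDATA1:
            n = script[i]
            data = script[i + 1:i + 1 + n]
            i += 1 + n
        yield op, data


def count_sigops(script):
    """Static count: 1 per CHECKSIG, 20 per CHECKMULTISIG (conservative)."""
    total = 0
    for op, _ in parse(script):
        if op in (OP_CHECKSIG, OP_CHECKSIGVERIFY):
            total += 1
        elif op in (OP_CHECKMULTISIG, OP_CHECKMULTISIGVERIFY):
            total += 20
    return total


def push(data):
    """Encode a direct push of 1..75 bytes."""
    assert 1 <= len(data) <= 75
    return bytes([len(data)]) + data


sig = push(b"\x00" * 72)     # placeholder signature, not a real one
pubkey = push(b"\x02" * 33)  # placeholder public key, not a real one
twenty = push(b"\x14")       # the number 20
nonce = bytes.fromhex("58959998C76C231F")

outer = (sig * 20 + twenty + pubkey
         + bytes([OP_DUP, OP_DUP, OP_2DUP] + [OP_3DUP] * 5)
         + twenty + push(nonce) + bytes([OP_RIPEMD160, OP_EVAL]))

# The hash quoted above, i.e. the bytes OP_EVAL would execute.
inner = bytes.fromhex("AE4C10400B7DF3A56FE2B32B9906BCF1B1AFE975")

naive = count_sigops(outer)
actual = count_sigops(inner)
print("naive count on outer script:", naive)
print("count on the evaluated bytes:", actual)
```

The static pass sees only pushes, stack operations, a hash and OP_EVAL
in the outer script; the OP_CHECKMULTISIG exists only after hashing.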

Gavin and Amir argued that it is possible to "dry run" the script, 
avoiding the expensive OP_CHECKSIG operation and running only the other 
very cheap operations. However, sipa pointed out that in the presence of 
an OP_CHECKSIG a dry runner cannot predict the outcome of conditional 
branches, so it has to either do the OP_CHECKSIG (and become just a 
regular execution) or it has to follow both branches. Roconnor and 
justmoon suggested the following script to illustrate this point:

[sig] [pubkey]
[some data]
[sig] [pubkey] OP_CHECKSIG OP_IF OP_HASH160 OP_ELSE OP_HASH256 OP_ENDIF
(previous line repeated 33 times with different sigs/pubkeys)
OP_EVAL

This script is valid provided that the hash produced by the branches
actually taken (which depends on which signatures are valid) contains
an OP_CHECKSIG when interpreted as a script, and that the initial [sig]
and [pubkey] are valid. But a dry runner trying to count how many
OP_CHECKSIGs this script contains would run into the first OP_CHECKSIG
OP_IF and have to run both branches. In both branches it would again
encounter an OP_CHECKSIG OP_IF and run all four branches, etc. In total
it has to run (2^33 - 2) * 1.5 SHA256 operations (about 12.9 GHash) and
2^32 - 1 RIPEMD160 operations. Therefore we now believe a dry runner is
not possible, or at least too complicated to be involved in protocol
rules such as the sigops limit.
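
The branch explosion can be tallied with a few lines of Python. This is
a sketch of the counting argument only, not a dry runner; the function
name is made up for illustration. It assumes OP_HASH160 costs one SHA256
plus one RIPEMD160 and OP_HASH256 costs two SHA256. Under that model the
closed forms quoted above correspond to 32 nested conditionals; with 33
the totals roughly double.

```python
def dry_run_hash_counts(levels):
    """Count hash calls for a dry runner that follows both branches of
    every OP_CHECKSIG OP_IF ... OP_ENDIF line.

    Returns (sha256_calls, ripemd160_calls).
    """
    sha256_calls = ripemd160_calls = 0
    states = 1  # distinct stack states reaching the current conditional
    for _ in range(levels):
        # Per state: OP_HASH160 costs 1 SHA256 + 1 RIPEMD160,
        # OP_HASH256 costs 2 SHA256; both branches are explored.
        sha256_calls += 3 * states
        ripemd160_calls += states
        states *= 2  # each state forks into two
    return sha256_calls, ripemd160_calls


sha32, rip32 = dry_run_hash_counts(32)
assert sha32 == (2**33 - 2) * 3 // 2  # the closed form quoted above
assert rip32 == 2**32 - 1
print(f"32 conditionals: {sha32:,} SHA256, {rip32:,} RIPEMD160")
print("33 conditionals:", dry_run_hash_counts(33))
```

Either way the work grows as 2^n in the number of conditionals, which
is the point of the example.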

As a result, people are now on a spectrum from those who feel strongly
that static analysis is an important property and not something to give
up easily, all the way to those who think it's superfluous and that the
other side is just unnecessarily delaying OP_EVAL deployment.

One thing I want to note is that static analysis is a property for which 
there is a better argument than for other, weaker properties, such as 
limited recursion depth. Bitcoin currently allows you to:

* Tell if a script contains a specific opcode or not
* Count how many times a script will execute an operation at most
* Count how many total operations a script will execute at most
* Count how many signatures a script will execute at most
* Find the maximum length of a datum pushed onto the stack
* Find the maximum number of items that can be pushed onto the stack
* Find the maximum size (in bytes) of the stack
* Calculate how long a script will run at most

OP_EVAL as proposed makes these upper bounds almost meaningless, as it
can contain, indirectly, up to 32 instances of any other opcode. (About
3-6 instances are currently practical.) The only way to answer these
questions would then be to fully execute the script.
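
For scripts without OP_EVAL, several of the bounds listed above fall
out of a single pass over the bytes, with no execution. The sketch
below is a simplified illustration, not client code: the function name
and the returned dictionary are made up, only direct pushes and
OP_PUSHDATA1 are handled, opcodes above OP_16 are assumed to be the ones
that count as operations, and signatures and keys are placeholders.

```python
OP_PUSHDATA1 = 0x4c
OP_16 = 0x60
OP_IF, OP_ELSE, OP_ENDIF = 0x63, 0x67, 0x68
OP_CHECKSIG, OP_CHECKMULTISIG = 0xac, 0xae


def static_bounds(script):
    """Single pass over the script bytes; nothing is executed.

    Code in both arms of an OP_IF is counted, so the numbers are upper
    bounds rather than exact execution counts.
    """
    opcodes, max_push, op_count, sigops = set(), 0, 0, 0
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 1 <= op <= 75:            # direct push of `op` bytes
            max_push = max(max_push, op)
            i += op
        elif op == OP_PUSHDATA1:     # one length byte, then the data
            n = script[i]
            max_push = max(max_push, n)
            i += 1 + n
        else:
            opcodes.add(op)
            if op > OP_16:
                op_count += 1
            if op == OP_CHECKSIG:
                sigops += 1
            elif op == OP_CHECKMULTISIG:
                sigops += 20         # conservative count
    return {"opcodes": opcodes, "max_push": max_push,
            "op_count": op_count, "sigops": sigops}


sig = bytes([72]) + b"\x00" * 72     # placeholder signature
pubkey = bytes([33]) + b"\x02" * 33  # placeholder public key
data = bytes([20]) + b"\x11" * 20    # opaque 20-byte datum

script = (sig + pubkey + bytes([OP_CHECKSIG, OP_IF]) + data
          + bytes([OP_ELSE]) + sig + pubkey
          + bytes([OP_CHECKSIG, OP_ENDIF]))

bounds = static_bounds(script)
print(bounds)
```

Note that the 20-byte datum is opaque to this pass. If an OP_EVAL could
later execute it, none of the numbers above would still be upper bounds.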

Suppose we want to one day allow arbitrary scripts as IsStandard, but 
put constraints on them, such as enforcing a subset of allowed opcodes. 
(See list above for other possible restrictions.) If we want to include 
OP_EVAL in the set of allowed opcodes, it's important that OP_EVAL is 
implemented in a way that allows static analysis, because we can then 
allow it while still maintaining other restrictions.

If proponents of the current implementation want to argue that we don't 
need static analysis now, the burden is on them to show how we could 
retrofit it when/if we get to this point or why they think we will never 
want to allow some freedom in IsStandard that includes OP_EVAL.

There are several proposals for OP_EVAL that allow static analysis:

* Using a fixed position reference prefix (sipa)
* Using an execute bit on data set by an opcode (justmoon)
* Using OP_CODEHASH (roconnor)
* Using OP_CHECKEDEVAL (sipa)
* Using OP_HASH160 OP_EQUALVERIFY as a special scriptPubKey (gavinandresen)

Let's fully develop these proposals and see how much of a hassle it 
would actually be to get a statically verifiable OP_EVAL. I think that's 
a prerequisite for having the argument on whether it is *worth* the hassle.

(Update: Gavin's latest proposal looks *very* good, so that may settle 
the debate quickly.)


On 12/30/2011 6:19 PM, roconnor@theorem.ca wrote:
> On Sat, 31 Dec 2011, Chris Double wrote:
>
>> On Fri, Dec 30, 2011 at 5:42 AM, <roconnor@theorem.ca> wrote:
>>> Basically OP_DUP lets you duplicate the code on the stack and that is
>>> the key to looping.  I'm pretty sure from here we get Turing
>>> completeness.
>>> Using the stack operations I expect you can implement the SK-calculus
>>> given an OP_EVAL that allows arbitrary depth.
>>>
>>> OP_EVAL adds dangerously expressive power to the scripting language.
>>
>> If you look at the archives of the concatenative programming mailing
>> list [1] you'll see lots of examples of people creating stack
>> languages with minimal operations that exploit similar functionality
>> to reduce the required built in operations. The discussion on the list
>> is mostly about stack based languages where programs can be pushed on
>> the stack and executed (eg. Joy [2]/Factor/Some Forths).
>>
>> I don't think the scripting engine in bitcoin has the ability to
>> concatenate, append or otherwise manipulate scripts on the stack to be
>> eval'd though does it?
>
> It will have limited ability to manipulate scripts on the stack through
> the use of arithmetic and hashing operations, and if OP_CAT, OP_SUBSTR
> and friends are ever restored, it will have even more abilities.



--------------070008060708020207040002--