MIME-Version: 1.0
In-Reply-To: <921EB5CF-B472-4BD6-9493-1A681586FB51@friedenbach.org>
References: <5B6756D0-6BEF-4A01-BDB8-52C646916E29@friedenbach.org>
	<201709302323.33004.luke@dashjr.org>
	<921EB5CF-B472-4BD6-9493-1A681586FB51@friedenbach.org>
From: "Russell O'Connor" <roconnor@blockstream.io>
Date: Mon, 2 Oct 2017 13:15:38 -0400
Message-ID: <CAMZUoK=SwR6=vWCo54_Yd11nbbwD0q60n42Uj38Yq_sWD2zS9g@mail.gmail.com>
To: Mark Friedenbach <mark@friedenbach.org>, 
	Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Content-Type: multipart/alternative; boundary="001a113c4f3c3975b8055a9387ec"
Subject: Re: [bitcoin-dev] Merkle branch verification & tail-call semantics
 for generalized MAST
Precedence: list

--001a113c4f3c3975b8055a9387ec
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

(The subject was "[bitcoin-dev] Version 1 witness programs (first draft)",
but I'm moving part of that conversation to this thread.)

On Sun, Oct 1, 2017 at 5:32 PM, Johnson Lau <jl2012@xbt.hk> wrote:

> 3. Do we want to allow static analysis of sigop?
> BIP114 and the related proposals are specifically designed to allow static
> analysis of sigop. I think this was one of the main reason of OP_EVAL not
> being accepted. This was also the main reason of Ethereum failing to do a
> DAO hacker softfork, leading to the ETH/ETC split. I’m not sure if we
> really want to give up this property. Once we do it, we have to support it
> forever.


I would very much like to retain the ability to do static analysis.  More
generally, the idea of interpreting arbitrary data as code, as done in
OP_EVAL and in TAILCALL, makes me quite anxious.  This is at the root of
many security problems throughout the software industry, and I don't relish
giving more fuel to the underhanded Bitcoin Script contestants.

On Sun, Oct 1, 2017 at 8:45 PM, Luke Dashjr <luke@dashjr.org> wrote:

> > 3. Do we want to allow static analysis of sigop?
> > BIP114 and the related proposals are specifically designed to allow
> > static analysis of sigop. I think this was one of the main reason of
> > OP_EVAL not being accepted. This was also the main reason of Ethereum
> > failing to do a DAO hacker softfork, leading to the ETH/ETC split. I’m
> > not sure if we really want to give up this property. Once we do it, we
> > have to support it forever.
>
> It seems inevitable at this point. Maybe we could add a separate
> "executable-witness" array (in the same manner as the current witness was
> softforked in), and require tail-call and condition scripts to merely
> reference these by hash, but I'm not sure it's worth the effort?
>
> Thinking further, we could avoid adding a separate executable-witness
> commitment by either:
> A) Define that all the witness elements in v1 are type-tagged (put the
>    minor witness version on them all, and redefine minor 0 as a stack
>    item?); or
> B) Use an empty element as a delimiter between stack and executable items.
>
> To avoid witness malleability, the executable items can be required to be
> sorted in some manner.
>
> The downside of these approaches is that we now need an addition 20 or 32
> bytes per script reference... which IMO may possibly be worse than losing
> static analysis. I wonder if there's a way to avoid that overhead?
>

Actually, I have a half-baked idea I've been thinking about along these
lines.

The idea is to add a flag to each stack item in the Script interpreter to
mark whether the item in the stack is "executable" or "non-executable", not
so different from how computers mark pages to implement executable space
protection.  By default, all stack items are marked "non-executable".  We
then redefine OP_PUSHDATA4 as OP_PUSHCODE within ScriptSigs.  The
operational semantics of OP_PUSHCODE would remain the same as OP_PUSHDATA4
except it would set the pushed item's associated flag to "executable".  All
data pushed by OP_PUSHCODE would be subject to the sigops limits and any
other similar static analysis limits.
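
As a rough illustration of the flag (a sketch only, not an implementation of
any real interpreter; StackItem, op_pushdata and op_pushcode are hypothetical
names), a stack item could carry its data together with one extra bit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackItem:
    data: bytes
    executable: bool = False  # by default every item is non-executable


def op_pushdata(stack, data):
    """Ordinary push: the item is plain data."""
    stack.append(StackItem(data))


def op_pushcode(stack, data):
    """Same push semantics, but the item is flagged executable."""
    stack.append(StackItem(data, executable=True))


stack = []
op_pushdata(stack, b"\x01")
op_pushcode(stack, b"\xac")  # 0xac is the OP_CHECKSIG opcode byte
flags = [item.executable for item in stack]
```

Here flags records that only the item pushed through op_pushcode carries
the "executable" mark; the pushed bytes themselves are handled identically.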

Segwit v0 doesn't use OP_PUSHDATA codes to create the input stack, so we
would have to mark executable input stack items using a new witness v1
format. But, IIUC, TAILCALL isn't going to be compatible with Segwit v0
anyway.

During a TAILCALL, it is required that the top item on the stack have the
"executable" flag, otherwise TAILCALL is not used (and the script succeeds
or fails based on the top item's data value as usual).
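
A minimal sketch of that end-of-script check, assuming stack items are
modelled as (data, executable) pairs.  The function name finish is
hypothetical, the truth test is simplified (it ignores negative zero), and
any other conditions a real TAILCALL rule may impose are left out:

```python
from typing import List, Tuple

Item = Tuple[bytes, bool]  # (data, executable flag); hypothetical layout


def finish(stack: List[Item]) -> str:
    """Decide what happens once the main script has finished running."""
    if not stack:
        return "fail"
    data, executable = stack[-1]
    if executable:
        # Only an item that arrived via OP_PUSHCODE may be tail-called.
        return "tailcall"
    # Otherwise fall back to the usual rule: judge the top item's value.
    return "success" if any(data) else "fail"
```

For example, finish([(b"\xac", True)]) selects the tail call, while the
same bytes without the flag are only judged as a true/false value.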

All other operations can treat "executable" items as data, including the
merkle branch verification.  None of the Script operations can create
"executable" items; in particular, OP_PUSHDATA4 within the ScriptPubKey
also would not create "executable" items.  (We can talk about the behaviour
of OP_CAT when that time comes).

One last trick is that when "executable" values are duplicated, e.g. by
OP_DUP, OP_IFDUP, or OP_PICK, the newly created copy of the value on top of
the stack is marked "non-executable".
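
Sketched with the same hypothetical (data, executable) pairs, duplication
would copy the bytes but never the flag (op_dup and op_pick here are
illustrative stand-ins, not real interpreter code):

```python
def op_dup(stack):
    """Duplicate the top item; the copy is always non-executable."""
    data, _flag = stack[-1]
    stack.append((data, False))


def op_pick(stack, n):
    """Copy the item n places below the top; the copy loses the flag."""
    data, _flag = stack[-1 - n]
    stack.append((data, False))


stack = [(b"\xac", True)]
op_dup(stack)
op_pick(stack, 1)
```

After both operations the stack holds three copies of the same bytes, but
only the original item is still marked "executable".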

Because we make the "executable" flag non-copyable, we are now free to
allow unbounded uses of TAILCALL (i.e. TAILCALL can be used multiple times
in a single input).  Why is this safe?  Because the number of "executable"
items decreases by at least one every time TAILCALL is invoked, so the
number of OP_PUSHCODE occurrences in the witness puts an upper bound on the
number of invocations of TAILCALL allowed.  Using static analysis of the
script pubkey and the data pushed by OP_PUSHCODE, we can compute an upper
bound on the number of operations (of any type) that can occur during
execution.
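
The counting argument can be sketched as follows, assuming we already have
a per-script operation count from some static analyser (static_bounds and
its parameters are hypothetical names, and the additive bound relies on
each "executable" item being run at most once):

```python
from typing import List, Tuple


def static_bounds(script_pubkey_ops: int,
                  pushcode_ops: List[int]) -> Tuple[int, int]:
    """Return (max TAILCALL invocations, max total operations).

    script_pubkey_ops: operation count of the script pubkey.
    pushcode_ops: one operation count per OP_PUSHCODE item in the witness.
    """
    # Each TAILCALL consumes at least one executable item, and none can be
    # created or copied, so the number of items bounds the number of calls.
    max_tailcalls = len(pushcode_ops)
    # Every executable item runs at most once, so the counts simply add up.
    max_ops = script_pubkey_ops + sum(pushcode_ops)
    return max_tailcalls, max_ops
```

For instance, static_bounds(5, [3, 10]) gives at most 2 tail calls and at
most 18 operations, both known before anything is executed.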

Unbounded TAILCALL should let us (in the presence of OP_CHECKSIGFROMSTACK)
have unbounded delegation.

Overall, I believe that OP_PUSHCODE

1. is fully backwards compatible.
2. maintains our ability to perform static analysis with TAILCALL.
3. never lets us interpret computed values as executable code.
4. extends TAILCALL to safely allow multiple TAILCALLs per script.


--001a113c4f3c3975b8055a9387ec--