5e/9070d356d723feb2f365872770bcd2bd115bad


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308

Return-Path: <roconnor@blockstream.io>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 5915FB4A
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon,  2 Oct 2017 17:16:01 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-ua0-f172.google.com (mail-ua0-f172.google.com
	[209.85.217.172])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id EF29E478
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon,  2 Oct 2017 17:15:59 +0000 (UTC)
Received: by mail-ua0-f172.google.com with SMTP id i35so1029357uah.9
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 02 Oct 2017 10:15:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=blockstream-io.20150623.gappssmtp.com; s=20150623;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to
	:cc; bh=enbTFmMcjMu0hFAYlZBG6nGxsja/MS7oUhAOrLflxl8=;
	b=UywB095ow2mDCkfVW7767YeSSAjC7HnLGFgKj7gBHb+VI+EtsFGY2ClHkfTcuI+P46
	uBIYSBR+7SMEaGj/BjzI5px502it8agbxpcmOQMdH3AIdKhrpiHNc0yhhdy8bFDWumOi
	qsByWg+LKARWMf2agwCVA8J8bqpRQca3uDkDDshdgSoAjCgEp9+YJVr81EZuPpn8DQH9
	Hg0+b/HYgT1hlmsVUA3c83/H38uXYptoSQfZCNXerG9iGMhRr1F5LdeaS3bRtfonOytZ
	5r+p2nLsvrZH6FN3NC7T3DqG0WgwwCzGjzR5d7QnCCMRdQnB02WC23dUI2eEJY7xFRpE
	mxLQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to:cc;
	bh=enbTFmMcjMu0hFAYlZBG6nGxsja/MS7oUhAOrLflxl8=;
	b=CfPqLNyigp5ALPYMXNw/Q1fU3mhYkxtcyJqlnVYbrsJZXZJg6XlPXIZCD5dJKpBRhf
	qTf6oTzfqvdhxw2RzHgB3aWfS+Z+DcAlgCG+cqpn/WWtKRbYuVopM/f4MzIYIfEVv08S
	bEjm8WivZT4afH+EH1x+iG+xwABFFwd9PxtlVKn3NEbGTcwiuPzrBp3kgaFgO+Ba4Y27
	WAd8kpUobf2GEAliaDX5gk9mOSvP1o/W3lAHhWwH+hHAUF6ETAO2rmYnyqHUfvbfevg5
	zXRj3NE4q8DAkHpGDQzo1SMRgs2DK6pCdQECuD/k9fn2R9Ta4xeIavVfbTXyA5eyahgi
	x/TQ==
X-Gm-Message-State: AHPjjUjCglBvp5DW67RXVjYG2HR9PG0ervZm6OYD8nh1q1vVHK/dzbWv
	hmLppVXiiYw9CQp7mLcZsGuWO1ZgHUImXa/Q43UJUA==
X-Google-Smtp-Source: AOwi7QCDAvFv9GWyu3fpuBT5nhQDoZN75UZiLTnnYeQJqnouY3/Lg3UBa/glpotSKgxIuA56g4Nq/mP7pWk/jzk3NK0=
X-Received: by 10.159.37.201 with SMTP id 67mr9193744uaf.59.1506964558926;
	Mon, 02 Oct 2017 10:15:58 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.159.41.161 with HTTP; Mon, 2 Oct 2017 10:15:38 -0700 (PDT)
In-Reply-To: <921EB5CF-B472-4BD6-9493-1A681586FB51@friedenbach.org>
References: <5B6756D0-6BEF-4A01-BDB8-52C646916E29@friedenbach.org>
	<201709302323.33004.luke@dashjr.org>
	<921EB5CF-B472-4BD6-9493-1A681586FB51@friedenbach.org>
From: "Russell O'Connor" <roconnor@blockstream.io>
Date: Mon, 2 Oct 2017 13:15:38 -0400
Message-ID: <CAMZUoK=SwR6=vWCo54_Yd11nbbwD0q60n42Uj38Yq_sWD2zS9g@mail.gmail.com>
To: Mark Friedenbach <mark@friedenbach.org>, 
	Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Content-Type: multipart/alternative; boundary="001a113c4f3c3975b8055a9387ec"
X-Spam-Status: No, score=0.0 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
	HTML_MESSAGE,RCVD_IN_DNSWL_NONE autolearn=disabled version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Subject: Re: [bitcoin-dev] Merkle branch verification & tail-call semantics
 for generalized MAST
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Oct 2017 17:16:01 -0000

--001a113c4f3c3975b8055a9387ec
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

(Subject was: [bitcoin-dev] Version 1 witness programs (first draft)), but
I'm moving part of that conversation to this thread.

On Sun, Oct 1, 2017 at 5:32 PM, Johnson Lau <jl2012@xbt.hk> wrote:

> 3. Do we want to allow static analysis of sigop?
> BIP114 and the related proposals are specifically designed to allow stati=
c
> analysis of sigop. I think this was one of the main reason of OP_EVAL not
> being accepted. This was also the main reason of Ethereum failing to do a
> DAO hacker softfork, leading to the ETH/ETC split. I=E2=80=99m not sure i=
f we
> really want to give up this property. Once we do it, we have to support i=
t
> forever.


I would very much like to retain the ability to do static analysis.  More
generally, the idea of interpreting arbitrary data as code, as done in
OP_EVAL and in TAILCALL, makes me quite anxious.  This at the root of many
security problems throughout the software industry, and I don't relish
giving more fuel to the underhanded Bitcoin Script contestants.

On Sun, Oct 1, 2017 at 8:45 PM, Luke Dashjr <luke@dashjr.org> wrote:

> > 3. Do we want to allow static analysis of sigop?
> > BIP114 and the related proposals are specifically designed to allow
> static
> > analysis of sigop. I think this was one of the main reason of OP_EVAL n=
ot
> > being accepted. This was also the main reason of Ethereum failing to do=
 a
> > DAO hacker softfork, leading to the ETH/ETC split. I=E2=80=99m not sure=
 if we
> > really want to give up this property. Once we do it, we have to support
> it
> > forever.
>
> It seems inevitable at this point. Maybe we could add a separate
> "executable-
> witness" array (in the same manner as the current witness was softforked
> in),
> and require tail-call and condition scripts to merely reference these by
> hash,
> but I'm not sure it's worth the effort?
>
> Thinking further, we could avoid adding a separate executable-witness
> commitment by either:
> A) Define that all the witness elements in v1 are type-tagged (put the
> minor
>    witness version on them all, and redefine minor 0 as a stack item?); o=
r
> B) Use an empty element as a delimiter between stack and executable items=
.
>
> To avoid witness malleability, the executable items can be required to be
> sorted in some manner.
>
> The downside of these approaches is that we now need an addition 20 or 32
> bytes per script reference... which IMO may possibly be worse than losing
> static analysis. I wonder if there's a way to avoid that overhead?
>

Actually, I have a half-baked idea I've been thinking about along these
lines.

The idea is to add a flag to each stack item in the Script interpreter to
mark whether the item in the stack is "executable" or "non-executable", not
so different from how computers mark pages to implement executable space
protection.  By default, all stack items are marked "non-executable".  We
then redefine OP_PUSHDATA4 as OP_PUSHCODE within ScriptSigs.  The
operational semantics of OP_PUSHCODE would remain the same as OP_PUSHDATA4
except it would set the pushed item's associated flag to "executable".  All
data pushed by OP_PUSHCODE would be subject to the sigops limits and any
other similar static analysis limits.

Segwit v0 doesn't use OP_PUSHDATA codes to create the input stack, so we
would have to mark executable input stack items using a new witness v1
format. But, IIUC, TAILCALL isn't going to be compatible with Segwit v0
anyway.

During a TAILCALL, it is required that the top item on the stack have the
"executable" flag, otherwise TAILCALL is not used (and the script succeeds
or fails based on the top item's data value as usual).

All other operations can treat "executable" items as data, including the
merkle branch verification.  None of the Script operations can create
"executable" items; in particular, OP_PUSHDATA4 within the ScriptPubKey
also would not create "executable" items.  (We can talk about the behaviour
of OP_CAT when that time comes).

One last trick is that when "executable" values are duplicated, by OP_DUP,
OP_IFDUP, OP_PICK. then the newly created copy of the value on top of the
stack is marked "non-executable".

Because we make the "executable" flag non-copyable, we are now free to
allow unbounded uses of TAILCALL (i.e. TAILCALL can be used multiplie times
in a single input).  Why is this safe?  Because the number of "executable"
items decreases by at least one every time TAILCALL is invoked. the number
of OP_PUSHCODE occurrences in the witness puts an upper bound on the number
of invocations of TAILCALL allowed.  Using static analysis of the script
pubkey and the data within the OP_PUSHCODE data, we compute an upper bound
on the number of operations (of any type) that can occur during execution.

Unbounded TAILCALL should let us (in the presence of OP_CHECKSIGFROMSTACK)
have unbounded delegation.

Overall, I believe that OP_PUSHCODE

1. is fully backwards compatible.
2. maintains our ability to perform static analysis with TAILCALL.
3. never lets us interpret computed values as executable code.
4. extends TAILCALL to safely allow multiple TAILCALLs per script.

--001a113c4f3c3975b8055a9387ec
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_quote">(Subject was: [bitcoin-dev] Ver=
sion 1 witness programs (first draft)), but I&#39;m moving part of that con=
versation to this thread.<br></div><div class=3D"gmail_quote"><br></div><di=
v class=3D"gmail_quote">On Sun, Oct 1, 2017 at 5:32 PM, Johnson Lau <span d=
ir=3D"ltr">&lt;<a href=3D"mailto:jl2012@xbt.hk" target=3D"_blank">jl2012@xb=
t.hk</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=3D"ma=
rgin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:=
1ex">3. Do we want to allow static analysis of sigop?<br>
BIP114 and the related proposals are specifically designed to allow=20
static analysis of sigop. I think this was one of the main reason of=20
OP_EVAL not being accepted. This was also the main reason of Ethereum=20
failing to do a DAO hacker softfork, leading to the ETH/ETC split. I=E2=80=
=99m=20
not sure if we really want to give up this property. Once we do it, we=20
have to support it forever.</blockquote><div><br></div><div>I would very mu=
ch like to retain the ability to do static analysis.=C2=A0 More generally, =
the idea of interpreting arbitrary data as code, as done in OP_EVAL and in =
TAILCALL, makes me quite anxious.=C2=A0 This at the root of many security p=
roblems throughout the software industry, and I don&#39;t relish giving mor=
e fuel to the underhanded Bitcoin Script contestants.<br></div><div>=C2=A0<=
br><div class=3D"gmail_quote"><div><div class=3D"gmail_quote">On Sun, Oct 1=
, 2017 at 8:45 PM, Luke Dashjr <span dir=3D"ltr">&lt;<a href=3D"mailto:luke=
@dashjr.org" target=3D"_blank">luke@dashjr.org</a>&gt;</span> wrote:<span c=
lass=3D"gmail-m_-6043557196476035592m_2342340743758042992gmail-"></span><br=
><span class=3D"gmail-m_-6043557196476035592m_2342340743758042992gmail-"></=
span><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;bo=
rder-left:1px solid rgb(204,204,204);padding-left:1ex"><span class=3D"gmail=
-m_-6043557196476035592m_2342340743758042992gmail-">
&gt; 3. Do we want to allow static analysis of sigop?<br>
&gt; BIP114 and the related proposals are specifically designed to allow st=
atic<br>
&gt; analysis of sigop. I think this was one of the main reason of OP_EVAL =
not<br>
&gt; being accepted. This was also the main reason of Ethereum failing to d=
o a<br>
&gt; DAO hacker softfork, leading to the ETH/ETC split. I=E2=80=99m not sur=
e if we<br>
&gt; really want to give up this property. Once we do it, we have to suppor=
t it<br>
&gt; forever.<br>
<br>
</span>It seems inevitable at this point. Maybe we could add a separate &qu=
ot;executable-<br>
witness&quot; array (in the same manner as the current witness was softfork=
ed in),<br>
and require tail-call and condition scripts to merely reference these by ha=
sh,<br>
but I&#39;m not sure it&#39;s worth the effort?<br>
<br>
Thinking further, we could avoid adding a separate executable-witness<br>
commitment by either:<br>
A) Define that all the witness elements in v1 are type-tagged (put the mino=
r<br>
=C2=A0 =C2=A0witness version on them all, and redefine minor 0 as a stack i=
tem?); or<br>
B) Use an empty element as a delimiter between stack and executable items.<=
br>
<br>
To avoid witness malleability, the executable items can be required to be<b=
r>
sorted in some manner.<br>
<br>
The downside of these approaches is that we now need an addition 20 or 32<b=
r>
bytes per script reference... which IMO may possibly be worse than losing<b=
r>
static analysis. I wonder if there&#39;s a way to avoid that overhead?<span=
 class=3D"gmail-m_-6043557196476035592m_2342340743758042992gmail-HOEnZb"><f=
ont color=3D"#888888"><br></font></span></blockquote><div><br></div><div>Ac=
tually, I have a half-baked idea I&#39;ve been thinking about along these l=
ines.</div><div><br></div><div>The idea is to add a flag to each stack item=
 in the Script interpreter to mark whether the item in the stack is &quot;e=
xecutable&quot; or &quot;non-executable&quot;, not so different from how co=
mputers mark pages to implement executable space protection.=C2=A0 By defau=
lt, all stack items are marked &quot;non-executable&quot;.=C2=A0 We then re=
define OP_PUSHDATA4 as OP_PUSHCODE within ScriptSigs.=C2=A0 The operational=
 semantics of OP_PUSHCODE would remain the same as OP_PUSHDATA4 except it w=
ould set the pushed item&#39;s associated flag to &quot;executable&quot;.=
=C2=A0 All data pushed by OP_PUSHCODE would be subject to the sigops limits=
 and any other similar static analysis limits.</div><div><br></div><div>Seg=
wit v0 doesn&#39;t use OP_PUSHDATA codes to create the input stack, so we w=
ould have to mark executable input stack items using a new witness v1 forma=
t. But, IIUC, TAILCALL isn&#39;t going to be compatible with Segwit v0 anyw=
ay.<br></div><div><br></div><div>During a TAILCALL, it is required that the=
 top item on the stack have the &quot;executable&quot; flag, otherwise TAIL=
CALL is not used (and the script succeeds or fails based on the top item&#3=
9;s data value as usual).</div><div><br></div><div>All other operations can=
 treat &quot;executable&quot; items as data, including the merkle branch ve=
rification.=C2=A0 None of the Script operations can create &quot;executable=
&quot; items; in particular, OP_PUSHDATA4 within the ScriptPubKey also woul=
d not create &quot;executable&quot; items.=C2=A0 (We can talk about the beh=
aviour of OP_CAT when that time comes).<br></div><div><br></div><div>One la=
st trick is that when &quot;executable&quot; values are duplicated, by OP_D=
UP, OP_IFDUP, OP_PICK. then the newly created copy of the value on top of t=
he stack is marked &quot;non-executable&quot;.</div><div><br></div><div>Bec=
ause we make the &quot;executable&quot; flag non-copyable, we are now free =
to allow unbounded uses of TAILCALL (i.e. TAILCALL can be used multiplie ti=
mes in a single input).=C2=A0 Why is this safe?=C2=A0 Because the number of=
 &quot;executable&quot; items decreases by at least one every time TAILCALL=
 is invoked. the number of OP_PUSHCODE occurrences in the witness puts an u=
pper bound on the number of invocations of TAILCALL allowed.=C2=A0 Using st=
atic analysis of the script pubkey and the data within the OP_PUSHCODE data=
, we compute an upper bound on the number of operations (of any type) that =
can occur during execution.</div><div><br></div><div>Unbounded TAILCALL sho=
uld let us (in the presence of OP_CHECKSIGFROMSTACK) have unbounded delegat=
ion.</div><div><br></div><div>Overall, I believe that OP_PUSHCODE<br></div>=
<div><br></div><div>1. is fully backwards compatible.<br></div><div>2. main=
tains our ability to perform static analysis with TAILCALL.</div><div>3. ne=
ver lets us interpret computed values as executable code.</div><div>4. exte=
nds TAILCALL to safely allow multiple TAILCALLs per script.<br></div></div>=
</div></div></div></div></div>

--001a113c4f3c3975b8055a9387ec--