
Return-Path: <jlrubin@mit.edu>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id CD357C03
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 24 Jun 2019 22:48:00 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id E2D867FB
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 24 Jun 2019 22:47:59 +0000 (UTC)
Received: from mail-ed1-f41.google.com (mail-ed1-f41.google.com
	[209.85.208.41]) (authenticated bits=0)
	(User authenticated as jlrubin@ATHENA.MIT.EDU)
	by outgoing.mit.edu (8.14.7/8.12.4) with ESMTP id x5OMlvED030440
	(version=TLSv1/SSLv3 cipher=AES128-GCM-SHA256 bits=128 verify=NOT)
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 24 Jun 2019 18:47:58 -0400
Received: by mail-ed1-f41.google.com with SMTP id p15so24000080eds.8
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 24 Jun 2019 15:47:58 -0700 (PDT)
X-Gm-Message-State: APjAAAW6UgKbAUdx6avJoYEMHTt9MxRxtkeejQwaSTCI068c4m2Zs+R3
	fyy5CgYPuwBnFYBN34pKOfgTYs9gFiE/yw/odrI=
X-Google-Smtp-Source: APXvYqx1A9HECnSkJ4vMPDqpGLKSY5TFqKIgGBjP91YKXhgPT6UD2Etl7mykTwsC2fKjBYhDFLhncrbzF1SVqIG6fSU=
X-Received: by 2002:aa7:cfc3:: with SMTP id r3mr35075817edy.202.1561416476859; 
	Mon, 24 Jun 2019 15:47:56 -0700 (PDT)
MIME-Version: 1.0
References: <CAD5xwhjSj82YYuQHHbwgSLvUNV2RDY0b=yMYeLj-p6j7PpS9-Q@mail.gmail.com>
	<20190605093039.xfo7lcylqkhsfncv@erisian.com.au>
	<im0q8670MxshmvMLmoJU0dv4rFhwWZNvQeQYv7i4fBWJOx0ghAdH8fYuQSqNxO2z8uxXGV-kurinUDfl0FsLWD0knw_U_h3zVZ0xy7vmn8o=@protonmail.com>
	<CAMZUoK=ZB06jwAbuX2D=aN8ztAqr_jSgEXS1z1ABjQYVawKCBQ@mail.gmail.com>
	<CAD5xwhj8o8Vbrk2KADBOFGfkD3fW3eMZo5aHJytGAj_5LLhYCg@mail.gmail.com>
	<CAMZUoKkPUn01V7WruMqoYtwJ__ai-QPvD81ceoYC7j4+hC99gg@mail.gmail.com>
	<CAD5xwhi6QU5OZwSGMp4P3q7OYZMMZRUZgd2YOiUnv5tqgJxPSA@mail.gmail.com>
	<CAMZUoKkorcO+CD6jcV5tyCtrKuHq_2hJhKE08FTrqJz7GgPM8Q@mail.gmail.com>
In-Reply-To: <CAMZUoKkorcO+CD6jcV5tyCtrKuHq_2hJhKE08FTrqJz7GgPM8Q@mail.gmail.com>
From: Jeremy <jlrubin@mit.edu>
Date: Mon, 24 Jun 2019 15:47:44 -0700
X-Gmail-Original-Message-ID: <CAD5xwhjaC61jOLvPrMcsvL9ji5zUAP-=ai3NhBojeQcC4v8DpA@mail.gmail.com>
Message-ID: <CAD5xwhjaC61jOLvPrMcsvL9ji5zUAP-=ai3NhBojeQcC4v8DpA@mail.gmail.com>
To: "Russell O'Connor" <roconnor@blockstream.io>
Content-Type: multipart/alternative; boundary="0000000000007324b1058c199a6e"
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,HTML_MESSAGE,
	RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Tue, 25 Jun 2019 17:26:58 +0000
Cc: Bitcoin development mailing list <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] OP_SECURETHEBAG (supersedes OP_CHECKOUTPUTSVERIFY)
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Jun 2019 22:48:00 -0000

--0000000000007324b1058c199a6e
Content-Type: text/plain; charset="UTF-8"

I agree in principle, but I think that's just a bit of 'how things are'
versus how they should be.

I disagree that we have composability semantics today, because of OP_IF.
E.g., the scripts "OP_IF .... " and "OP_ENDIF" are each invalid when parsed
on their own, but are valid when concatenated. OP_IF already imposes some
lookahead functionality... but as I understand it, it may be feasible to
get rid of OP_IF for tapscripts anyway. Also in this bucket are P2SH and
segwit, which I think break composability because the concatenation of two
P2SH or segwit scripts does not behave the same as the two scripts taken
separately.
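
A minimal sketch of the OP_IF point, in Python. The opcode byte values are
the ones used by Bitcoin Script; the helper name and the idea of checking
balance as a standalone pass over decoded opcodes are simplifications made
up for illustration, not how any particular implementation does it.

```python
# Opcode byte values as defined in Bitcoin Script.
OP_IF, OP_NOTIF, OP_ELSE, OP_ENDIF = 0x63, 0x64, 0x67, 0x68
OP_1 = 0x51

def conditionals_balanced(ops):
    """Return True if every OP_IF/OP_NOTIF is closed by a matching OP_ENDIF.

    `ops` is a list of already-decoded opcodes with no push payloads, which
    is an assumption of this sketch: a real parser must skip pushes first.
    """
    depth = 0
    for op in ops:
        if op in (OP_IF, OP_NOTIF):
            depth += 1
        elif op == OP_ELSE:
            if depth == 0:
                return False  # OP_ELSE outside any conditional
        elif op == OP_ENDIF:
            if depth == 0:
                return False  # OP_ENDIF with nothing to close
            depth -= 1
    return depth == 0

head = [OP_IF, OP_1]   # stands in for "OP_IF ...."
tail = [OP_ENDIF]

print(conditionals_balanced(head))         # head on its own
print(conditionals_balanced(tail))         # tail on its own
print(conditionals_balanced(head + tail))  # the concatenation
```

Neither fragment passes the check on its own, while the concatenation does,
so validity is already not a property of each fragment in isolation.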

I also think that the OP_SECURETHEBAG use of pushdata is a backwards
compatible hack: we can always later redefine the parser to parse
OP_SECURETHEBAG as a 34-byte opcode, recapturing the purity of the
semantics. We can also fix it to not use an extra byte in a future tapleaf
version.
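
To make the "34 byte opcode" reading concrete, here is a rough Python
sketch. It assumes the layout is one opcode byte, then the one-byte push
length 0x20, then the 32-byte hash. The opcode value below is a placeholder
I picked for the example, not the value from the proposal, and the function
name is hypothetical as well.

```python
HYPOTHETICAL_STB_OPCODE = 0xC0  # placeholder value, NOT taken from the proposal
PUSH_32 = 0x20                  # direct push of 32 bytes (real Script encoding)
UNIT_LEN = 1 + 1 + 32           # opcode byte + push-length byte + hash = 34

def parse_stb_unit(script: bytes, pos: int):
    """Parse one opcode-plus-hash unit starting at `pos`.

    Returns (hash_bytes, next_pos). Treating the three pieces as a single
    34-byte token is the assumption this sketch illustrates.
    """
    unit = script[pos:pos + UNIT_LEN]
    if len(unit) != UNIT_LEN:
        raise ValueError("truncated unit")
    if unit[0] != HYPOTHETICAL_STB_OPCODE or unit[1] != PUSH_32:
        raise ValueError("not the opcode followed by a 32-byte push")
    return unit[2:], pos + UNIT_LEN

example = bytes([HYPOTHETICAL_STB_OPCODE, PUSH_32]) + bytes(range(32))
digest, next_pos = parse_stb_unit(example, 0)
```

Under this reading, the "extra byte" a future tapleaf version could drop is
the push-length byte, which would leave a 33-byte token.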

====

In any case, I don't disagree with figuring out what patching the parser to
handle multibyte opcodes would look like. If that sort of upgrade path had
been readily available when I wrote this, it's how I would have done it. I
mostly looked at two approaches:

1) Adding flags to GetOp to change how it parses
  a) Most of the same code paths used for new and old script
  b) Higher risk of breaking something in old script style/downstream
  c) Cleans up only one issue (multibyte opcodes); leaves other warts in
place
  d) Less bikesheddable design (mostly same as old script)
  e) Code size does not increase
2) Adding a completely new interpreter for Tapscript
  a) Fork the existing interpreter code
  b) For all places where scripts are run, switch based on whether it is
tapscript or not
  c) Can clean up various semantics, can even do fancier things like
Huffman-encoding opcodes to less than a byte
  d) Can clearly separate parsing the script from executing it
  e) Can improve versioning techniques
  f) Low risk of breaking something in old script style/downstream
  g) Increases amount of code substantially
  h) Bikesheddable design (everything is on the table).
  i) Probably a better general mechanism for future changes to script
parsing, less consensus risk
  j) More compatible with templated script as well.

In case it isn't clear, I think that 2 is probably the better approach, but
I'm worried that 2.h means this would take a much longer time to implement.

2 can be segmented into two components:

1) the architecture of script parser versioning
2) the actual new script version

I think that component 1 can be relatively uncontroversial, thankfully,
using tapleaf versions (the architecture question is more about code
structure). A proof of concept of this would be a fork that uses two
independent, but identical, script parsers.
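
Roughly, that proof of concept could be shaped like the following Python
sketch. 0xC0 is the tapscript leaf version; 0xC2 is a made-up second
version, and the function names and the toy tokenizer (direct pushes only,
no OP_PUSHDATA1/2/4) are assumptions for illustration only.

```python
def parse_v0(script: bytes):
    """Tokenize into (opcode, payload) pairs; direct pushes only."""
    out, i = [], 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x4C <= op <= 0x4E:
            raise NotImplementedError("OP_PUSHDATA1/2/4 omitted in this sketch")
        n = op if 0x01 <= op <= 0x4B else 0  # direct push of n bytes
        if i + n > len(script):
            raise ValueError("push runs past end of script")
        out.append((op, script[i:i + n]))
        i += n
    return out

def parse_v1(script: bytes):
    """Deliberately a separate copy of parse_v0, free to diverge later."""
    out, i = [], 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x4C <= op <= 0x4E:
            raise NotImplementedError("OP_PUSHDATA1/2/4 omitted in this sketch")
        n = op if 0x01 <= op <= 0x4B else 0
        if i + n > len(script):
            raise ValueError("push runs past end of script")
        out.append((op, script[i:i + n]))
        i += n
    return out

# Leaf version -> parser. 0xC2 is hypothetical.
PARSERS = {0xC0: parse_v0, 0xC2: parse_v1}

def parse_leaf(leaf_version: int, script: bytes):
    """Pick the parser for this leaf version; unknown versions are rejected
    here only to keep the sketch simple."""
    try:
        parser = PARSERS[leaf_version]
    except KeyError:
        raise ValueError(f"no parser for leaf version {leaf_version:#x}")
    return parser(script)

sample = bytes([0x51, 0x02, 0xAA, 0xBB, 0x87])  # OP_1, push 2 bytes, OP_EQUAL
```

The only shared code is the dispatch table, so changing parse_v1 later
cannot alter how scripts under the existing version are parsed.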

Component 2 would be to modify one of the versions substantially. I'm not
sure what exists on the laundry list, but I think it would be possible to
pick a few worthwhile cleanups. E.g.:

1) Multibyte opcodes
2) Templated scripts
3) Huffman Encoding opcodes
4) OP_IF handling (maybe just get rid of it in favor of conditional Verify
semantics)
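
For item 3, a small Python sketch of what Huffman-encoding opcodes means:
frequently used opcodes get bit strings shorter than a byte. The usage
counts are invented purely for illustration (they are not measured data),
and the function name is hypothetical.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Map each symbol to a bit string; more frequent symbols never get
    longer codes than less frequent ones."""
    if not freqs:
        raise ValueError("need at least one symbol")
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Invented usage counts, for illustration only.
example_freqs = {
    "OP_CHECKSIG": 50,
    "OP_DUP": 20,
    "OP_HASH160": 20,
    "OP_IF": 5,
    "OP_ENDIF": 5,
}
codes = huffman_code(example_freqs)
for name, bits in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(name, bits)
```

The resulting code is prefix-free, so a parser can still split the bit
stream into opcodes without any length markers.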

We would then make it clear that, because we can add future script versions
fairly easily, this is a sufficient step.


Does that seem in line with your understanding of how this might be done?

--0000000000007324b1058c199a6e--