1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
|
Return-Path: <mark@friedenbach.org>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id CBF76A88
for <bitcoin-dev@lists.linuxfoundation.org>;
Wed, 20 Sep 2017 22:51:43 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-pf0-f181.google.com (mail-pf0-f181.google.com
[209.85.192.181])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 5823C455
for <bitcoin-dev@lists.linuxfoundation.org>;
Wed, 20 Sep 2017 22:51:42 +0000 (UTC)
Received: by mail-pf0-f181.google.com with SMTP id y29so2264178pff.0
for <bitcoin-dev@lists.linuxfoundation.org>;
Wed, 20 Sep 2017 15:51:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=friedenbach-org.20150623.gappssmtp.com; s=20150623;
h=from:content-transfer-encoding:mime-version:subject:message-id:date
:to; bh=77kvE7qcXyJxbx3OnXB5x/NAbD+CpSiXYIzch+2GI9I=;
b=zU9xrtvagZIvfqL2T/DvgPKUpkO+c+cagU33wiqVWTaQV6TSdGmeDbd8HPH9/3Mvx1
Cv8Xso8UzzZWaHfWNquhx8nKkXde7wQGe/wlKRl6DyxIDSY7umIpH+QPYGsSzq5tGXTe
QZ1n7h2ZvOig/DMMJa3uNTH5TyZ9csgkCvI+iYGRH5pxP2sd1otQSWdLCsNHYQMaLo2F
ifQHpQfD6GBJTfkFBXz2xG0aC4vIUyUi2aHt7/KysKVjNGKajlThU2CR8m3+UcoFtH5Y
OPgpCV50LrlcPZ/CES447KQjTTMHbzEy0RoO6MSi4yxABQ39DRptA/Tv4K6nrSoNP1qQ
Z5TQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20161025;
h=x-gm-message-state:from:content-transfer-encoding:mime-version
:subject:message-id:date:to;
bh=77kvE7qcXyJxbx3OnXB5x/NAbD+CpSiXYIzch+2GI9I=;
b=SpDeOHH9RoVsFtaDxvlS/ZlXIRvTNQUuXlztOLR820M07q4nPH4mMNNdKLkCQrzsN7
pSFGP4Krd3rmlnibQYn+e/sr+5xemWJ1qcO2MwSvl/wMxa9mYt7psuilL+VGkzrvLPIz
nJRSdfvVCc7LiE9SHKEV936GVdQEH1CTvbaxy++jkVtDSUy1D5A5aFcrWA27OGcLf2r2
bgduZewQpAy69nZlR6Bt1bm2S/w4yiDFu9OyZXC1KQsEbPLVmoSTll2vycuyv54NyGD8
pZDQmCvff1BKQiQ4pKT6LCCauh9seRbZqOi6vh5IeaZlNwsWBTtBfiw5vnNS+6gApIeo
7Y6w==
X-Gm-Message-State: AHPjjUiVL3DoahlRfT1/mDbDpI+HpBtYgRSuy9nqbGHXLpn7G+z3miSm
QASy9l/oRlBIw1gm2ui3uotyrV2S57g=
X-Google-Smtp-Source: AOwi7QCHF/kh9q0xy+Ujr5YTn1NuvS3mIIaqctxi+hCRh0VIwKOxAcjloCJByvql0jQLToN4mSneKA==
X-Received: by 10.98.144.21 with SMTP id a21mr3644862pfe.159.1505947901259;
Wed, 20 Sep 2017 15:51:41 -0700 (PDT)
Received: from ?IPv6:2601:647:4600:9c66:6dfb:5b84:3a07:de6a?
([2601:647:4600:9c66:6dfb:5b84:3a07:de6a])
by smtp.gmail.com with ESMTPSA id f69sm16199pff.4.2017.09.20.15.51.40
for <bitcoin-dev@lists.linuxfoundation.org>
(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
Wed, 20 Sep 2017 15:51:40 -0700 (PDT)
From: Mark Friedenbach <mark@friedenbach.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Message-Id: <6856ECC5-06BF-4A03-869D-AA1132FE0705@friedenbach.org>
Date: Wed, 20 Sep 2017 15:51:39 -0700
To: bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org>
X-Mailer: Apple Mail (2.3273)
X-Spam-Status: No, score=0.0 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
RCVD_IN_DNSWL_NONE autolearn=disabled version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
X-Mailman-Approved-At: Wed, 20 Sep 2017 22:55:06 +0000
Subject: [bitcoin-dev] An explanation and justification of the tail-call and
MBV approach to MAST
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Wed, 20 Sep 2017 22:51:43 -0000
Over the past few weeks I've been explaining the MERKLEBRANCHVERIFY
opcode and tail-call execution semantics to a variety of developers,
and it's come to my attention that the BIPs presentation of the
concept is not as clear as it could be. Part of this is the fault of
standards documents being standards documents whose first and foremost
responsibility is precision, not pedagogy.
I think there's a better way to explain this approach to achieving
MAST, and it's worked better in the face to face and whiteboard
conversations I've had. I'm forwarding it to this list in case there
are others who desire a more clear explanation of what the
MERKLEBRANCHVERIFY and tail-call BIPs are trying to achieve, and what
any of it has to do with MAST / Merklized script.
I've written for all audiences so I apologize if it starts of at a
newbie level, but I encourage you to skim, not skip as I quickly start
varying this beginner material in atypical ways.
Review of P2SH
It's easiest to explain the behavior and purpose of these BIPs by
starting with P2SH, which we are generalizing from. BIP 16 (Pay to
Script Hash) specifies a form of implicit script recursion where a
redeem script is provided in the scriptSig, and the scriptPubKey is a
program that verifies the redeem script hashes to the committed value,
with the following template:
HASH160 <hash> EQUAL
This script specifies that the redeem script is pulled from the stack,
its hash is compared against the expected value, and by fiat it is
declared that the redeem script is then executed with the remaining
stack items as arguments.
Sortof. What actually happens of course is that the above scriptPubKey
template is never executed, but rather the interpreter sees that it
matches this exact template format, and thereby proceeds to carry out
the same logic as a hard-coded behavior.
Generalizing P2SH with macro-op fusion
This template-matching is unfortunate because otherwise we could
imagine generalizing this approach to cover other use cases beyond
committing to and executing a single redeem script. For example, if we
instead said that anytime the script interpreter encountered the
3-opcode sequence "HASH160 <20-byte-push> EQUAL" it switched to
interpreting the top element as if it were a script, that would enable
not just BIP 16 but also constructs like this:
IF
HASH160 <hash-1> EQUAL
ELSE
HASH160 <hash-2> EQUAL
ENDIF
This script conditionally executes one of two redeem scripts committed
to in the scriptPubKey, and at execution only reveals the script that
is actually used. All an observer learns of the other branch is the
script hash. This is a primitive form of MAST!
The "if 3-opcode P2SH template is encountered, switch to subscript"
rule is a bit difficult to work with however. It's not a true EVAL
opcode because control never returns back to the top-level script,
which makes some important aspects of the implementation easier, but
only at the cost of complexity somewhere else. What if there are
remaining opcodes in the script, such as the ELSE clause and ENDIF in
the script above? They would never be executed, but does e.g. the
closing ENDIF still need to be present? Or what about the standard
pay-to-pubkey-hash "1Address" output:
DUP HASH160 <20-byte-key-hash> EQUALVERIFY CHECKSIG
That almost looks like the magic P2SH template, except there is an
EQUALVERIFY instead of an EQUAL. The script interpreter should
obviously not treat the pubkey of a pay-to-pubkey-hash output as a
script and recurse into it, whereas it should for a P2SH style
script. But isn't the distinction kinda arbitrary?
And of course the elephant in the room is that by choosing not to
return to the original execution context we are no longer talking
about a soft-fork. Work out, for example, what will happen with the
following script:
[TRUE] HASH160 <hash-of-[TRUE]> EQUAL FALSE
(It returns false on a node that doesn't understand generalized
3-opcode P2SH recursion, true on a node that does.)
Implicit tail-call execution semantics and P2SH
Well there's a better approach than trying to create a macro-op fusion
franken-EVAL. We have to run scripts to the end to for any proposal to
be a soft-fork, and we want to avoid saving state due to prior
experience of that leading to bugs in BIP 12. That narrows our design
space to one option: allow recursion only as the final act of a
script, as BIP 16 does, but for any script not just a certain
template. That way we can safely jump into the subscript without
bothering to save local state because termination of the subscript is
termination of the script as a whole. In computer science terms, this
is known as tail-call execution semantics.
To illustrate, consider the following scriptPubKey:
DUP HASH160 <20-byte-hash> EQUALVERIFY
This script is almost exactly the same as the P2SH template, except
that it leaves the redeem script on the stack rather than consuming
it, thanks to the DUP, while it _does_ consume the boolean value at
the end because of the VERIFY. If executed, it leaves a stack exactly
as it was, which we assume will look like the following::
<argN> ... <arg1> <redeemScript>
Now a normal script is supposed to finish with just true or false on
the stack. Any script that finishes execution with more than a single
element on the stack is in violation of the so-called clean-stack rule
and is considered non-standard -- not relayable and potentially broken
by future soft-fork upgrades. But so long as at least one bit of
<redeemScript> is set, it is interpreted as true and the script
interpreter would normally interpret a successful validation at this
point, albeit with a clean-stack violation.
Let's take advantage of that by changing what the script interpreter
does when a script finishes with multiple items remaining on the stack
and top-most one evaluates as true -- a state of affairs that would
pass validation under the old rules. Now instead the interpreter
treats the top-most item on the stack as a script, and tail-call
recurse into it, P2SH-style. In the above example, <redeemScript> is
popped off the stack and is executed with <arg1> ... <argN> remaining
on the stack as its arguments.
The above script can be interpreted in English as "Perform tail-call
recursion if and only if the HASH160 of the script on the top of the
stack exactly matches this 20-byte push." Which is, of course, what
BIP 16 accomplishes with template matching. However the implicit tail
call approach allows us to do much more than just P2SH!
For starters, it turns out that using HASH160 for P2SH was probably a
bad idea as it reduces the security of a multi-party constructed hash
to an unacceptable 80 bits. That's why segwit uses 256-bit hashes for
its pay to script hash format, for 128-bit security. Had we tail call
semantics instead of BIP 16, we could have just switched to a new
address type that decodes to the following script template instead:
DUP HASH256 <32-byte-hash> EQUALVERIFY
Ta-da, we're back to full 128-bit security with no changes to the
consensus code, just a new address version to target this script
template.
MAST with tail-call alone?
Or: an aside on general recursion
Our IF-ELSE Merklized Abstract Syntax Tree example above, rewritten to
use tail-call evaluation, might look like this (there are more compact
formulations possible, but our purpose here is not code golf):
IF
DUP HASH160 <hash-1> EQUALVERIFY
ELSE
DUP HASH160 <hash-2> EQUALVERIFY
ENDIF
Either execution pathway leaves us with one of the two allowed redeem
scripts on the top of the stack, and presumably its arguments beneath
it. We then execute that script via implicit tail-call.
We could write scripts using IF-ELSE branches or other tricks to
commit to more than two possible branches, although this unfortunately
scales linearly with the number of possible branches. If we allow the
subscript itself to do its own tail-call recursion, and its subscript
and so on, then we could nest these binary branches for a true MAST in
the original sense of the term.
However in doing so we would have enabled general recursion and
inherit all the difficulties that come with that. For example, some
doofus could use a script that consists of or has the same effect as a
single DUP to cause an infinite loop in the script interpreter. And
that's just the tip of the iceberg of problems general recursion can
bring, which stem generally from resource usage no longer being
correlated with the size of the witness stack, which is the primary
resource for which there are global limits.
This is fixable with a gas-like resource accounting scheme, which
would affect not just script but also mempool, p2p, and other
layers. And there is perhaps an argument for doing so, particularly as
part of a hard-fork block size increase as more accurate resource
accounting helps prevent many bad-block attacks and let us set
adversarial limits closer to measured capacity in the expected/average
use case. But that would immensely complicate things beyond what could
achieve consensus in a reasonably short amount of time, which is a
goal of this proposal.
Instead I suggest blocking off general recursion by only allowing the
script interpreter to do one tail-call per input. To get log-scaling
benefits without deep recursion we introduce instead one new script
feature, which we'll cover in the next section. But we do leave the
door open to possible future general recursion, as we will note that
going from one layer of recursion to many would itself be a soft-fork
for the same reason that the first tail-call recursion is.
Merkle branch verify to the rescue!
In #bitcoin-wizards and elsewhere there has been a desire for some
time to have an opcode that proves that an item was drawn from the set
used to construct a given Merkle tree. This is not a new idea although
I'm not aware of any actual proposal made for it until now. The most
simple version of the opcode, the version initially proposed, takes
three arguments:
<proof> <leaf-hash> <root-hash> MERKLEBRANCHVERIFY 2DROP DROP
<root-hash> is the 32-byte hash label of the root of the Merkle tree,
calculated using a scheme defined in the fast Merkle hash tree BIP.
<leaf-hash> is 32 bytes of data which we are proving is part of the
Merkle hash tree -- usually the double-SHA256 hash of an item off the
stack.
<proof> is the path through the Merkle tree including the hashes of
branches not taken, which is the information necessary to recalculate
the root hash thereby proving that <leaf-hash> is in the Merkle tree.
The 2DROP and DROP are necessary to remove the 3 arguments from the
stack, as the opcode cannot consume them since it is soft-forked in.
There are two primary motivating applications of Merkle branch verify
(MBV), which will be covered next.
The MBV BIP will be extended to support extraction of more than one
item from the same Merkle tree, but for the rest of this explanation
we assume the current implementation of a single item proof, just for
simplicity.
MBV and MAST
This new opcode combines with single tail-call execution semantics to
allow for a very short and succinct MAST implementation:
OVER HASH256 <root-hash> MERKLEBRANCHVERIFY 2DROP DROP
That's it. This script expects an input stack in the following format:
<argN> ... <arg1> <policyScript> <proof>
At the end of execution the script has verified that <policyScript> is
part of the Merkle tree previously committed to, and <proof> is
dropped from the stack. This leaves the stack ready for a tail-call
recursion into <policyScript>.
MBV and Key Aggregation
If the signature scheme supports key aggregation, which it happens
that the the new signature aggregation scheme being worked on will
support as a side effect, then there is a very cool and useful
application that would be supported as well: tree signatures as
described by Pieter Wuille[1]. This looks almost exactly the same as
the MAST script, but with a CHECKSIG tacked on the end:
OVER HASH256 <root-hash> MERKLEBRANCHVERIFY 2DROP DROP CHECKSIG
This script expects an input stack of the following form:
<sig> <pubkey> <proof>
And proves that the pubkey is drawn from the set used to construct the
Merkle hash tree, and then its signature is checked. While it should
be clear this has 1-of-N semantics, what might be less obvious is that
key aggregation allows any signature policy expressible as a monotone
Boolean function (anything constructible with combinations of AND, OR,
and k-of-N thresholds) to be transformed to a 1-of-N over a set of key
aggregates. So the above script is a generic template able to verify
any monotone Boolean function over combinations of pubkeys, which
encompasses quite a number of use cases!
[1] https://blockstream.com/2015/08/24/treesignatures.html
An argument for permission-less innovation
The driving motivation for the tail-call and MBV proposals, and the
reason they are presented and considered together is to enable
Merklized Abstract Syntax Trees. However neither BIP actually defines
a MAST template, except as an example use case of the primitive
features. This is very much on purpose: it is the opinion of this
author that new bitcoin consensus features should follow the UNIX
philosophy as expressed by Peter Salus and Mike Gancarz and
paraphrased by yours truly:
* Write features that do one thing and do it well.
* Write features to work together.
* Simple is beautiful.
By using modularity and composition of powerful but simple tools like
MERKLEBRANCHVERIFY and single tail-call recursion to construct MAST we
enable a complex and desirable feature while minimizing changes to the
consensus code, review burden, and acquired technical debt.
The reusability of the underlying primitives also means that they can
be combined with other modular features to support use cases other
than vanilla MAST, or reused in series to work around various limits
that might otherwise be imposed on a templated form of MAST. At the
moment the chief utility of these proposals is the straightforward
MAST script written above, but as primitives they do allow a few other
use cases and also combine well with features in the pipeline or on
the drawing board. For example, in addition to MAST you can:
1. Use MERKLEBRANCHVERIFY alone to support honeypot bounties, as
discussed in the BIP.
2. Use a series of MERKLEBRANCHVERIFY opcodes to verify a branch with
split proofs to stay under script and witness push limitations.
3. Combine MERKLEBRANCHVERIFY with key aggregation to get
Wuille-Maxwell tree signatures which support arbitrary signing
policies using a single, aggregatable signature.
4. Combine tail-call execution semantics with CHECKSIGFROMSTACK to get
delegation and signature-time commitment to subscript policy.
5. Combine MERKLEBRANCHVERIFY with a Merkle proof prefix check opcode
and Lamport signature support to get reusable Lamport keys.
I believe these benefits and possible future expansions to be strong
arguments in favor of extending bitcoin in the form of small, modular,
incremental, and reusable changes that can be combined and used even
in ways unforeseen even by their creators, creating a platform for
unrestricted innovation.
The alternative approach of rigid templates achieves the stated goals,
perhaps even with slightly better encoding efficiency, but at the cost
of requiring workaround for each future innovation. P2SH is just such
an example -- we couldn't even upgrade to 128-bit security without
designing an entirely different implementation because of the
limitations of template pattern matching.
Efficiency gains from templating
Finally, I need to point out that there is one efficiency gain that a
proper template-matching implementation has over user-specified
schemes: reduction of witness data. This is both a practical side
effect of more efficient serialization that doesn't need to encode
logic as opcodes, as well as the fact that since the hashing scheme is
fixed, one layer of hashes can be removed from the serialization. In
the case of MAST, rather than encode the Merkle root hash in the
redeem script, the hash is propagated upwards and compared against the
witness commitment. The amount space saved from adopting a template is
about equal to the size of the redeem script, which is approximately
40 bytes of witness data per MAST input.
That is arguably significant enough to matter, and in the long term I
think we will adopt a MAST template for those efficiency gains. But I
feel strongly that since MAST is not a feature in wide use at this
time, it is strongly in our interests to deploy MBV, tail-call, as
well overhaul the CHECKSIG operator before tackling what we feel an
ideal MAST-supporting witness type would look like, so that with some
experience under our belt we can adopt a system that is as simple and
as succinct as possible while supporting the necessary use cases
identified by actual use of the features.
Kind regards,
Mark Friedenbach
|