summaryrefslogtreecommitdiff
path: root/3c/5363ddaa3d12d9e0e20a333d213d7a0ab72050
blob: 3a04f93765b400ca35143c27550d5bfec797cd7a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
Return-Path: <andreas@antonopoulos.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 1D8AD98C
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 21 Sep 2017 00:15:43 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-wm0-f54.google.com (mail-wm0-f54.google.com [74.125.82.54])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 8EF58433
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 21 Sep 2017 00:15:40 +0000 (UTC)
Received: by mail-wm0-f54.google.com with SMTP id e71so11283464wmg.4
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed, 20 Sep 2017 17:15:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=antonopoulos.com; s=antonopoulos;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to; 
	bh=+Hvt7VaXrBjbGr3+c5YZlg6IjOPlc7WRsOa/xpQIOms=;
	b=CRgkbRJ/fjxVlXbrgwSduWhQaANBGxoqKudONJDvu3sOfs78MYn2n43OYx0cH/kaBp
	sfXcyNu88q7ii3OMa0RsCsrvm1IlmPMip5Law0QnU0aE13/v0UbBWIFi1BIdMUZPBIZu
	NlL+BxzWaqNvzCWXYVuJ/4L49vPpl/YDmqMns=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to;
	bh=+Hvt7VaXrBjbGr3+c5YZlg6IjOPlc7WRsOa/xpQIOms=;
	b=d/da47Iiu4s9d6+wfu3SwAHWpfulDutVHoxOrdRgutuNaWiRckBUtQde0Cm8LnVScb
	7SSNlYgG4ZCfl6C/mYRXEbc54xEUdcg3CcPeIdilNeOaesThwF0825PgNsLcP28jZClS
	MiOSzOKxt+H3bSR8ZX90KUsn9EwF/zJkZJ/HA5tbsP7uPBGu6UXeDOood8qp7zf7vJhS
	VdOjL3A/R1PeY7ABuRvOupFr0YkQgf902Z1clxGB0lxqrbg9BxEluExFN8tTVWiGpGMX
	VH0ts1DuFJevJYfJwrxuCYYmnj4Tm9gVDGYDQYemCjEqok3pPKXSk7l7LQOoO4sa/h4M
	wbDg==
X-Gm-Message-State: AHPjjUhiIDN6+1fVeegxF/R+iEK8isYJx/tlQPkAPvjQglFi+/+gHtS1
	krWAkJI4Z5sXT72ekALh701X6nm+cCkyzD19Ww/yfGHf
X-Google-Smtp-Source: AOwi7QBQVi+NUDaMld5epyDAbN0FAxGXv/GS75n9wUTGwAi7r6yykEbw55H76m7ZfCG99FVkvjOi0rz0soiYF6k7uyY=
X-Received: by 10.80.191.65 with SMTP id g1mr396873edk.243.1505952938631; Wed,
	20 Sep 2017 17:15:38 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.80.228.10 with HTTP; Wed, 20 Sep 2017 17:15:37 -0700 (PDT)
Received: by 10.80.228.10 with HTTP; Wed, 20 Sep 2017 17:15:37 -0700 (PDT)
In-Reply-To: <6856ECC5-06BF-4A03-869D-AA1132FE0705@friedenbach.org>
References: <6856ECC5-06BF-4A03-869D-AA1132FE0705@friedenbach.org>
From: "Andreas M. Antonopoulos" <andreas@antonopoulos.com>
Date: Wed, 20 Sep 2017 20:15:37 -0400
Message-ID: <CAFmyj8y=1fvUcJGctx4pksQO1pYRUnBmncwwc0kVqvreLbJxkg@mail.gmail.com>
To: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>, 
	Mark Friedenbach <mark@friedenbach.org>
Content-Type: multipart/alternative; boundary="089e0824d93cf4b7680559a7fd44"
X-Spam-Status: No, score=0.6 required=5.0 tests=DKIM_SIGNED,HTML_MESSAGE,
	RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM,
	T_DKIM_INVALID autolearn=disabled version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Thu, 21 Sep 2017 01:20:32 +0000
Subject: Re: [bitcoin-dev] An explanation and justification of the tail-call
 and MBV approach to MAST
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Thu, 21 Sep 2017 00:15:43 -0000

--089e0824d93cf4b7680559a7fd44
Content-Type: text/plain; charset="UTF-8"

This was very helpful at least to me, thank you

I've been struggling to follow the discussion of tail-call execution and
understand the options for implementing MAST. This clarified everything. I
can now follow the previous discussions.




On Sep 20, 2017 18:56, "Mark Friedenbach via bitcoin-dev" <
bitcoin-dev@lists.linuxfoundation.org> wrote:

Over the past few weeks I've been explaining the MERKLEBRANCHVERIFY
opcode and tail-call execution semantics to a variety of developers,
and it's come to my attention that the BIPs presentation of the
concept is not as clear as it could be. Part of this is the fault of
standards documents being standards documents whose first and foremost
responsibility is precision, not pedagogy.

I think there's a better way to explain this approach to achieving
MAST, and it's worked better in the face to face and whiteboard
conversations I've had. I'm forwarding it to this list in case there
are others who desire a more clear explanation of what the
MERKLEBRANCHVERIFY and tail-call BIPs are trying to achieve, and what
any of it has to do with MAST / Merklized script.

I've written for all audiences so I apologize if it starts of at a
newbie level, but I encourage you to skim, not skip as I quickly start
varying this beginner material in atypical ways.


Review of P2SH

It's easiest to explain the behavior and purpose of these BIPs by
starting with P2SH, which we are generalizing from. BIP 16 (Pay to
Script Hash) specifies a form of implicit script recursion where a
redeem script is provided in the scriptSig, and the scriptPubKey is a
program that verifies the redeem script hashes to the committed value,
with the following template:

  HASH160 <hash> EQUAL

This script specifies that the redeem script is pulled from the stack,
its hash is compared against the expected value, and by fiat it is
declared that the redeem script is then executed with the remaining
stack items as arguments.

Sortof. What actually happens of course is that the above scriptPubKey
template is never executed, but rather the interpreter sees that it
matches this exact template format, and thereby proceeds to carry out
the same logic as a hard-coded behavior.


Generalizing P2SH with macro-op fusion

This template-matching is unfortunate because otherwise we could
imagine generalizing this approach to cover other use cases beyond
committing to and executing a single redeem script. For example, if we
instead said that anytime the script interpreter encountered the
3-opcode sequence "HASH160 <20-byte-push> EQUAL" it switched to
interpreting the top element as if it were a script, that would enable
not just BIP 16 but also constructs like this:

  IF
    HASH160 <hash-1> EQUAL
  ELSE
    HASH160 <hash-2> EQUAL
  ENDIF

This script conditionally executes one of two redeem scripts committed
to in the scriptPubKey, and at execution only reveals the script that
is actually used. All an observer learns of the other branch is the
script hash. This is a primitive form of MAST!

The "if 3-opcode P2SH template is encountered, switch to subscript"
rule is a bit difficult to work with however. It's not a true EVAL
opcode because control never returns back to the top-level script,
which makes some important aspects of the implementation easier, but
only at the cost of complexity somewhere else. What if there are
remaining opcodes in the script, such as the ELSE clause and ENDIF in
the script above?  They would never be executed, but does e.g. the
closing ENDIF still need to be present? Or what about the standard
pay-to-pubkey-hash "1Address" output:

  DUP HASH160 <20-byte-key-hash> EQUALVERIFY CHECKSIG

That almost looks like the magic P2SH template, except there is an
EQUALVERIFY instead of an EQUAL. The script interpreter should
obviously not treat the pubkey of a pay-to-pubkey-hash output as a
script and recurse into it, whereas it should for a P2SH style
script. But isn't the distinction kinda arbitrary?

And of course the elephant in the room is that by choosing not to
return to the original execution context we are no longer talking
about a soft-fork. Work out, for example, what will happen with the
following script:

  [TRUE] HASH160 <hash-of-[TRUE]> EQUAL FALSE

(It returns false on a node that doesn't understand generalized
3-opcode P2SH recursion, true on a node that does.)


Implicit tail-call execution semantics and P2SH

Well there's a better approach than trying to create a macro-op fusion
franken-EVAL. We have to run scripts to the end to for any proposal to
be a soft-fork, and we want to avoid saving state due to prior
experience of that leading to bugs in BIP 12. That narrows our design
space to one option: allow recursion only as the final act of a
script, as BIP 16 does, but for any script not just a certain
template. That way we can safely jump into the subscript without
bothering to save local state because termination of the subscript is
termination of the script as a whole. In computer science terms, this
is known as tail-call execution semantics.

To illustrate, consider the following scriptPubKey:

  DUP HASH160 <20-byte-hash> EQUALVERIFY

This script is almost exactly the same as the P2SH template, except
that it leaves the redeem script on the stack rather than consuming
it, thanks to the DUP, while it _does_ consume the boolean value at
the end because of the VERIFY. If executed, it leaves a stack exactly
as it was, which we assume will look like the following::

  <argN> ... <arg1> <redeemScript>

Now a normal script is supposed to finish with just true or false on
the stack. Any script that finishes execution with more than a single
element on the stack is in violation of the so-called clean-stack rule
and is considered non-standard -- not relayable and potentially broken
by future soft-fork upgrades. But so long as at least one bit of
<redeemScript> is set, it is interpreted as true and the script
interpreter would normally interpret a successful validation at this
point, albeit with a clean-stack violation.

Let's take advantage of that by changing what the script interpreter
does when a script finishes with multiple items remaining on the stack
and top-most one evaluates as true -- a state of affairs that would
pass validation under the old rules. Now instead the interpreter
treats the top-most item on the stack as a script, and tail-call
recurse into it, P2SH-style. In the above example, <redeemScript> is
popped off the stack and is executed with <arg1> ... <argN> remaining
on the stack as its arguments.

The above script can be interpreted in English as "Perform tail-call
recursion if and only if the HASH160 of the script on the top of the
stack exactly matches this 20-byte push." Which is, of course, what
BIP 16 accomplishes with template matching. However the implicit tail
call approach allows us to do much more than just P2SH!

For starters, it turns out that using HASH160 for P2SH was probably a
bad idea as it reduces the security of a multi-party constructed hash
to an unacceptable 80 bits. That's why segwit uses 256-bit hashes for
its pay to script hash format, for 128-bit security. Had we tail call
semantics instead of BIP 16, we could have just switched to a new
address type that decodes to the following script template instead:

  DUP HASH256 <32-byte-hash> EQUALVERIFY

Ta-da, we're back to full 128-bit security with no changes to the
consensus code, just a new address version to target this script
template.


MAST with tail-call alone?
Or: an aside on general recursion

Our IF-ELSE Merklized Abstract Syntax Tree example above, rewritten to
use tail-call evaluation, might look like this (there are more compact
formulations possible, but our purpose here is not code golf):

  IF
    DUP HASH160 <hash-1> EQUALVERIFY
  ELSE
    DUP HASH160 <hash-2> EQUALVERIFY
  ENDIF

Either execution pathway leaves us with one of the two allowed redeem
scripts on the top of the stack, and presumably its arguments beneath
it. We then execute that script via implicit tail-call.

We could write scripts using IF-ELSE branches or other tricks to
commit to more than two possible branches, although this unfortunately
scales linearly with the number of possible branches. If we allow the
subscript itself to do its own tail-call recursion, and its subscript
and so on, then we could nest these binary branches for a true MAST in
the original sense of the term.

However in doing so we would have enabled general recursion and
inherit all the difficulties that come with that. For example, some
doofus could use a script that consists of or has the same effect as a
single DUP to cause an infinite loop in the script interpreter. And
that's just the tip of the iceberg of problems general recursion can
bring, which stem generally from resource usage no longer being
correlated with the size of the witness stack, which is the primary
resource for which there are global limits.

This is fixable with a gas-like resource accounting scheme, which
would affect not just script but also mempool, p2p, and other
layers. And there is perhaps an argument for doing so, particularly as
part of a hard-fork block size increase as more accurate resource
accounting helps prevent many bad-block attacks and let us set
adversarial limits closer to measured capacity in the expected/average
use case. But that would immensely complicate things beyond what could
achieve consensus in a reasonably short amount of time, which is a
goal of this proposal.

Instead I suggest blocking off general recursion by only allowing the
script interpreter to do one tail-call per input. To get log-scaling
benefits without deep recursion we introduce instead one new script
feature, which we'll cover in the next section. But we do leave the
door open to possible future general recursion, as we will note that
going from one layer of recursion to many would itself be a soft-fork
for the same reason that the first tail-call recursion is.


Merkle branch verify to the rescue!

In #bitcoin-wizards and elsewhere there has been a desire for some
time to have an opcode that proves that an item was drawn from the set
used to construct a given Merkle tree. This is not a new idea although
I'm not aware of any actual proposal made for it until now. The most
simple version of the opcode, the version initially proposed, takes
three arguments:

  <proof> <leaf-hash> <root-hash> MERKLEBRANCHVERIFY 2DROP DROP

<root-hash> is the 32-byte hash label of the root of the Merkle tree,
calculated using a scheme defined in the fast Merkle hash tree BIP.

<leaf-hash> is 32 bytes of data which we are proving is part of the
Merkle hash tree -- usually the double-SHA256 hash of an item off the
stack.

<proof> is the path through the Merkle tree including the hashes of
branches not taken, which is the information necessary to recalculate
the root hash thereby proving that <leaf-hash> is in the Merkle tree.

The 2DROP and DROP are necessary to remove the 3 arguments from the
stack, as the opcode cannot consume them since it is soft-forked in.
There are two primary motivating applications of Merkle branch verify
(MBV), which will be covered next.

The MBV BIP will be extended to support extraction of more than one
item from the same Merkle tree, but for the rest of this explanation
we assume the current implementation of a single item proof, just for
simplicity.


MBV and MAST

This new opcode combines with single tail-call execution semantics to
allow for a very short and succinct MAST implementation:

  OVER HASH256 <root-hash> MERKLEBRANCHVERIFY 2DROP DROP

That's it. This script expects an input stack in the following format:

  <argN> ... <arg1> <policyScript> <proof>

At the end of execution the script has verified that <policyScript> is
part of the Merkle tree previously committed to, and <proof> is
dropped from the stack. This leaves the stack ready for a tail-call
recursion into <policyScript>.


MBV and Key Aggregation

If the signature scheme supports key aggregation, which it happens
that the the new signature aggregation scheme being worked on will
support as a side effect, then there is a very cool and useful
application that would be supported as well: tree signatures as
described by Pieter Wuille[1].  This looks almost exactly the same as
the MAST script, but with a CHECKSIG tacked on the end:

  OVER HASH256 <root-hash> MERKLEBRANCHVERIFY 2DROP DROP CHECKSIG

This script expects an input stack of the following form:

  <sig> <pubkey> <proof>

And proves that the pubkey is drawn from the set used to construct the
Merkle hash tree, and then its signature is checked. While it should
be clear this has 1-of-N semantics, what might be less obvious is that
key aggregation allows any signature policy expressible as a monotone
Boolean function (anything constructible with combinations of AND, OR,
and k-of-N thresholds) to be transformed to a 1-of-N over a set of key
aggregates. So the above script is a generic template able to verify
any monotone Boolean function over combinations of pubkeys, which
encompasses quite a number of use cases!

[1] https://blockstream.com/2015/08/24/treesignatures.html


An argument for permission-less innovation

The driving motivation for the tail-call and MBV proposals, and the
reason they are presented and considered together is to enable
Merklized Abstract Syntax Trees. However neither BIP actually defines
a MAST template, except as an example use case of the primitive
features. This is very much on purpose: it is the opinion of this
author that new bitcoin consensus features should follow the UNIX
philosophy as expressed by Peter Salus and Mike Gancarz and
paraphrased by yours truly:

  * Write features that do one thing and do it well.
  * Write features to work together.
  * Simple is beautiful.

By using modularity and composition of powerful but simple tools like
MERKLEBRANCHVERIFY and single tail-call recursion to construct MAST we
enable a complex and desirable feature while minimizing changes to the
consensus code, review burden, and acquired technical debt.

The reusability of the underlying primitives also means that they can
be combined with other modular features to support use cases other
than vanilla MAST, or reused in series to work around various limits
that might otherwise be imposed on a templated form of MAST. At the
moment the chief utility of these proposals is the straightforward
MAST script written above, but as primitives they do allow a few other
use cases and also combine well with features in the pipeline or on
the drawing board. For example, in addition to MAST you can:

1. Use MERKLEBRANCHVERIFY alone to support honeypot bounties, as
   discussed in the BIP.

2. Use a series of MERKLEBRANCHVERIFY opcodes to verify a branch with
   split proofs to stay under script and witness push limitations.

3. Combine MERKLEBRANCHVERIFY with key aggregation to get
   Wuille-Maxwell tree signatures which support arbitrary signing
   policies using a single, aggregatable signature.

4. Combine tail-call execution semantics with CHECKSIGFROMSTACK to get
   delegation and signature-time commitment to subscript policy.

5. Combine MERKLEBRANCHVERIFY with a Merkle proof prefix check opcode
   and Lamport signature support to get reusable Lamport keys.

I believe these benefits and possible future expansions to be strong
arguments in favor of extending bitcoin in the form of small, modular,
incremental, and reusable changes that can be combined and used even
in ways unforeseen even by their creators, creating a platform for
unrestricted innovation.

The alternative approach of rigid templates achieves the stated goals,
perhaps even with slightly better encoding efficiency, but at the cost
of requiring workaround for each future innovation. P2SH is just such
an example -- we couldn't even upgrade to 128-bit security without
designing an entirely different implementation because of the
limitations of template pattern matching.


Efficiency gains from templating

Finally, I need to point out that there is one efficiency gain that a
proper template-matching implementation has over user-specified
schemes: reduction of witness data. This is both a practical side
effect of more efficient serialization that doesn't need to encode
logic as opcodes, as well as the fact that since the hashing scheme is
fixed, one layer of hashes can be removed from the serialization. In
the case of MAST, rather than encode the Merkle root hash in the
redeem script, the hash is propagated upwards and compared against the
witness commitment. The amount space saved from adopting a template is
about equal to the size of the redeem script, which is approximately
40 bytes of witness data per MAST input.

That is arguably significant enough to matter, and in the long term I
think we will adopt a MAST template for those efficiency gains. But I
feel strongly that since MAST is not a feature in wide use at this
time, it is strongly in our interests to deploy MBV, tail-call, as
well overhaul the CHECKSIG operator before tackling what we feel an
ideal MAST-supporting witness type would look like, so that with some
experience under our belt we can adopt a system that is as simple and
as succinct as possible while supporting the necessary use cases
identified by actual use of the features.

Kind regards,
Mark Friedenbach
_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

--089e0824d93cf4b7680559a7fd44
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div>This was very helpful at least to me, thank you<div =
dir=3D"auto"><br></div><div dir=3D"auto">I&#39;ve been struggling to follow=
 the discussion of tail-call execution and understand the options for imple=
menting MAST. This clarified everything. I can now follow the previous disc=
ussions.</div><div dir=3D"auto"><br></div><div dir=3D"auto"><br></div><br><=
div class=3D"gmail_extra"><br><div class=3D"gmail_quote">On Sep 20, 2017 18=
:56, &quot;Mark Friedenbach via bitcoin-dev&quot; &lt;<a href=3D"mailto:bit=
coin-dev@lists.linuxfoundation.org">bitcoin-dev@lists.linuxfoundation.org</=
a>&gt; wrote:<br type=3D"attribution"><blockquote class=3D"quote" style=3D"=
margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Over the pas=
t few weeks I&#39;ve been explaining the MERKLEBRANCHVERIFY<br>
opcode and tail-call execution semantics to a variety of developers,<br>
and it&#39;s come to my attention that the BIPs presentation of the<br>
concept is not as clear as it could be. Part of this is the fault of<br>
standards documents being standards documents whose first and foremost<br>
responsibility is precision, not pedagogy.<br>
<br>
I think there&#39;s a better way to explain this approach to achieving<br>
MAST, and it&#39;s worked better in the face to face and whiteboard<br>
conversations I&#39;ve had. I&#39;m forwarding it to this list in case ther=
e<br>
are others who desire a more clear explanation of what the<br>
MERKLEBRANCHVERIFY and tail-call BIPs are trying to achieve, and what<br>
any of it has to do with MAST / Merklized script.<br>
<br>
I&#39;ve written for all audiences so I apologize if it starts of at a<br>
newbie level, but I encourage you to skim, not skip as I quickly start<br>
varying this beginner material in atypical ways.<br>
<br>
<br>
Review of P2SH<br>
<br>
It&#39;s easiest to explain the behavior and purpose of these BIPs by<br>
starting with P2SH, which we are generalizing from. BIP 16 (Pay to<br>
Script Hash) specifies a form of implicit script recursion where a<br>
redeem script is provided in the scriptSig, and the scriptPubKey is a<br>
program that verifies the redeem script hashes to the committed value,<br>
with the following template:<br>
<br>
=C2=A0 HASH160 &lt;hash&gt; EQUAL<br>
<br>
This script specifies that the redeem script is pulled from the stack,<br>
its hash is compared against the expected value, and by fiat it is<br>
declared that the redeem script is then executed with the remaining<br>
stack items as arguments.<br>
<br>
Sortof. What actually happens of course is that the above scriptPubKey<br>
template is never executed, but rather the interpreter sees that it<br>
matches this exact template format, and thereby proceeds to carry out<br>
the same logic as a hard-coded behavior.<br>
<br>
<br>
Generalizing P2SH with macro-op fusion<br>
<br>
This template-matching is unfortunate because otherwise we could<br>
imagine generalizing this approach to cover other use cases beyond<br>
committing to and executing a single redeem script. For example, if we<br>
instead said that anytime the script interpreter encountered the<br>
3-opcode sequence &quot;HASH160 &lt;20-byte-push&gt; EQUAL&quot; it switche=
d to<br>
interpreting the top element as if it were a script, that would enable<br>
not just BIP 16 but also constructs like this:<br>
<br>
=C2=A0 IF<br>
=C2=A0 =C2=A0 HASH160 &lt;hash-1&gt; EQUAL<br>
=C2=A0 ELSE<br>
=C2=A0 =C2=A0 HASH160 &lt;hash-2&gt; EQUAL<br>
=C2=A0 ENDIF<br>
<br>
This script conditionally executes one of two redeem scripts committed<br>
to in the scriptPubKey, and at execution only reveals the script that<br>
is actually used. All an observer learns of the other branch is the<br>
script hash. This is a primitive form of MAST!<br>
<br>
The &quot;if 3-opcode P2SH template is encountered, switch to subscript&quo=
t;<br>
rule is a bit difficult to work with however. It&#39;s not a true EVAL<br>
opcode because control never returns back to the top-level script,<br>
which makes some important aspects of the implementation easier, but<br>
only at the cost of complexity somewhere else. What if there are<br>
remaining opcodes in the script, such as the ELSE clause and ENDIF in<br>
the script above?=C2=A0 They would never be executed, but does e.g. the<br>
closing ENDIF still need to be present? Or what about the standard<br>
pay-to-pubkey-hash &quot;1Address&quot; output:<br>
<br>
=C2=A0 DUP HASH160 &lt;20-byte-key-hash&gt; EQUALVERIFY CHECKSIG<br>
<br>
That almost looks like the magic P2SH template, except there is an<br>
EQUALVERIFY instead of an EQUAL. The script interpreter should<br>
obviously not treat the pubkey of a pay-to-pubkey-hash output as a<br>
script and recurse into it, whereas it should for a P2SH style<br>
script. But isn&#39;t the distinction kinda arbitrary?<br>
<br>
And of course the elephant in the room is that by choosing not to<br>
return to the original execution context we are no longer talking<br>
about a soft-fork. Work out, for example, what will happen with the<br>
following script:<br>
<br>
=C2=A0 [TRUE] HASH160 &lt;hash-of-[TRUE]&gt; EQUAL FALSE<br>
<br>
(It returns false on a node that doesn&#39;t understand generalized<br>
3-opcode P2SH recursion, true on a node that does.)<br>
<br>
<br>
Implicit tail-call execution semantics and P2SH<br>
<br>
Well there&#39;s a better approach than trying to create a macro-op fusion<=
br>
franken-EVAL. We have to run scripts to the end to for any proposal to<br>
be a soft-fork, and we want to avoid saving state due to prior<br>
experience of that leading to bugs in BIP 12. That narrows our design<br>
space to one option: allow recursion only as the final act of a<br>
script, as BIP 16 does, but for any script not just a certain<br>
template. That way we can safely jump into the subscript without<br>
bothering to save local state because termination of the subscript is<br>
termination of the script as a whole. In computer science terms, this<br>
is known as tail-call execution semantics.<br>
<br>
To illustrate, consider the following scriptPubKey:<br>
<br>
=C2=A0 DUP HASH160 &lt;20-byte-hash&gt; EQUALVERIFY<br>
<br>
This script is almost exactly the same as the P2SH template, except<br>
that it leaves the redeem script on the stack rather than consuming<br>
it, thanks to the DUP, while it _does_ consume the boolean value at<br>
the end because of the VERIFY. If executed, it leaves a stack exactly<br>
as it was, which we assume will look like the following::<br>
<br>
=C2=A0 &lt;argN&gt; ... &lt;arg1&gt; &lt;redeemScript&gt;<br>
<br>
Now a normal script is supposed to finish with just true or false on<br>
the stack. Any script that finishes execution with more than a single<br>
element on the stack is in violation of the so-called clean-stack rule<br>
and is considered non-standard -- not relayable and potentially broken<br>
by future soft-fork upgrades. But so long as at least one bit of<br>
&lt;redeemScript&gt; is set, it is interpreted as true and the script<br>
interpreter would normally interpret a successful validation at this<br>
point, albeit with a clean-stack violation.<br>
<br>
Let&#39;s take advantage of that by changing what the script interpreter<br=
>
does when a script finishes with multiple items remaining on the stack<br>
and top-most one evaluates as true -- a state of affairs that would<br>
pass validation under the old rules. Now instead the interpreter<br>
treats the top-most item on the stack as a script, and tail-call<br>
recurse into it, P2SH-style. In the above example, &lt;redeemScript&gt; is<=
br>
popped off the stack and is executed with &lt;arg1&gt; ... &lt;argN&gt; rem=
aining<br>
on the stack as its arguments.<br>
<br>
The above script can be interpreted in English as &quot;Perform tail-call<b=
r>
recursion if and only if the HASH160 of the script on the top of the<br>
stack exactly matches this 20-byte push.&quot; Which is, of course, what<br=
>
BIP 16 accomplishes with template matching. However the implicit tail<br>
call approach allows us to do much more than just P2SH!<br>
<br>
For starters, it turns out that using HASH160 for P2SH was probably a<br>
bad idea as it reduces the security of a multi-party constructed hash<br>
to an unacceptable 80 bits. That&#39;s why segwit uses 256-bit hashes for<b=
r>
its pay to script hash format, for 128-bit security. Had we tail call<br>
semantics instead of BIP 16, we could have just switched to a new<br>
address type that decodes to the following script template instead:<br>
<br>
=C2=A0 DUP HASH256 &lt;32-byte-hash&gt; EQUALVERIFY<br>
<br>
Ta-da, we&#39;re back to full 128-bit security with no changes to the<br>
consensus code, just a new address version to target this script<br>
template.<br>
<br>
<br>
MAST with tail-call alone?<br>
Or: an aside on general recursion<br>
<br>
Our IF-ELSE Merklized Abstract Syntax Tree example above, rewritten to<br>
use tail-call evaluation, might look like this (there are more compact<br>
formulations possible, but our purpose here is not code golf):<br>
<br>
=C2=A0 IF<br>
=C2=A0 =C2=A0 DUP HASH160 &lt;hash-1&gt; EQUALVERIFY<br>
=C2=A0 ELSE<br>
=C2=A0 =C2=A0 DUP HASH160 &lt;hash-2&gt; EQUALVERIFY<br>
=C2=A0 ENDIF<br>
<br>
Either execution pathway leaves us with one of the two allowed redeem<br>
scripts on the top of the stack, and presumably its arguments beneath<br>
it. We then execute that script via implicit tail-call.<br>
<br>
We could write scripts using IF-ELSE branches or other tricks to<br>
commit to more than two possible branches, although this unfortunately<br>
scales linearly with the number of possible branches. If we allow the<br>
subscript itself to do its own tail-call recursion, and its subscript<br>
and so on, then we could nest these binary branches for a true MAST in<br>
the original sense of the term.<br>
<br>
However in doing so we would have enabled general recursion and<br>
inherit all the difficulties that come with that. For example, some<br>
doofus could use a script that consists of or has the same effect as a<br>
single DUP to cause an infinite loop in the script interpreter. And<br>
that&#39;s just the tip of the iceberg of problems general recursion can<br=
>
bring, which stem generally from resource usage no longer being<br>
correlated with the size of the witness stack, which is the primary<br>
resource for which there are global limits.<br>
<br>
This is fixable with a gas-like resource accounting scheme, which<br>
would affect not just script but also mempool, p2p, and other<br>
layers. And there is perhaps an argument for doing so, particularly as<br>
part of a hard-fork block size increase as more accurate resource<br>
accounting helps prevent many bad-block attacks and let us set<br>
adversarial limits closer to measured capacity in the expected/average<br>
use case. But that would immensely complicate things beyond what could<br>
achieve consensus in a reasonably short amount of time, which is a<br>
goal of this proposal.<br>
<br>
Instead I suggest blocking off general recursion by only allowing the<br>
script interpreter to do one tail-call per input. To get log-scaling<br>
benefits without deep recursion we introduce instead one new script<br>
feature, which we&#39;ll cover in the next section. But we do leave the<br>
door open to possible future general recursion, as we will note that<br>
going from one layer of recursion to many would itself be a soft-fork<br>
for the same reason that the first tail-call recursion is.<br>
<br>
<br>
Merkle branch verify to the rescue!<br>
<br>
In #bitcoin-wizards and elsewhere there has been a desire for some<br>
time to have an opcode that proves that an item was drawn from the set<br>
used to construct a given Merkle tree. This is not a new idea although<br>
I&#39;m not aware of any actual proposal made for it until now. The most<br=
>
simple version of the opcode, the version initially proposed, takes<br>
three arguments:<br>
<br>
=C2=A0 &lt;proof&gt; &lt;leaf-hash&gt; &lt;root-hash&gt; MERKLEBRANCHVERIFY=
 2DROP DROP<br>
<br>
&lt;root-hash&gt; is the 32-byte hash label of the root of the Merkle tree,=
<br>
calculated using a scheme defined in the fast Merkle hash tree BIP.<br>
<br>
&lt;leaf-hash&gt; is 32 bytes of data which we are proving is part of the<b=
r>
Merkle hash tree -- usually the double-SHA256 hash of an item off the<br>
stack.<br>
<br>
&lt;proof&gt; is the path through the Merkle tree including the hashes of<b=
r>
branches not taken, which is the information necessary to recalculate<br>
the root hash thereby proving that &lt;leaf-hash&gt; is in the Merkle tree.=
<br>
<br>
The 2DROP and DROP are necessary to remove the 3 arguments from the<br>
stack, as the opcode cannot consume them since it is soft-forked in.<br>
There are two primary motivating applications of Merkle branch verify<br>
(MBV), which will be covered next.<br>
<br>
The MBV BIP will be extended to support extraction of more than one<br>
item from the same Merkle tree, but for the rest of this explanation<br>
we assume the current implementation of a single item proof, just for<br>
simplicity.<br>
<br>
<br>
MBV and MAST<br>
<br>
This new opcode combines with single tail-call execution semantics to<br>
allow for a very short and succinct MAST implementation:<br>
<br>
=C2=A0 OVER HASH256 &lt;root-hash&gt; MERKLEBRANCHVERIFY 2DROP DROP<br>
<br>
That&#39;s it. This script expects an input stack in the following format:<=
br>
<br>
=C2=A0 &lt;argN&gt; ... &lt;arg1&gt; &lt;policyScript&gt; &lt;proof&gt;<br>
<br>
At the end of execution the script has verified that &lt;policyScript&gt; i=
s<br>
part of the Merkle tree previously committed to, and &lt;proof&gt; is<br>
dropped from the stack. This leaves the stack ready for a tail-call<br>
recursion into &lt;policyScript&gt;.<br>
<br>
<br>
MBV and Key Aggregation<br>
<br>
If the signature scheme supports key aggregation, which it happens<br>
that the the new signature aggregation scheme being worked on will<br>
support as a side effect, then there is a very cool and useful<br>
application that would be supported as well: tree signatures as<br>
described by Pieter Wuille[1].=C2=A0 This looks almost exactly the same as<=
br>
the MAST script, but with a CHECKSIG tacked on the end:<br>
<br>
=C2=A0 OVER HASH256 &lt;root-hash&gt; MERKLEBRANCHVERIFY 2DROP DROP CHECKSI=
G<br>
<br>
This script expects an input stack of the following form:<br>
<br>
=C2=A0 &lt;sig&gt; &lt;pubkey&gt; &lt;proof&gt;<br>
<br>
And proves that the pubkey is drawn from the set used to construct the<br>
Merkle hash tree, and then its signature is checked. While it should<br>
be clear this has 1-of-N semantics, what might be less obvious is that<br>
key aggregation allows any signature policy expressible as a monotone<br>
Boolean function (anything constructible with combinations of AND, OR,<br>
and k-of-N thresholds) to be transformed to a 1-of-N over a set of key<br>
aggregates. So the above script is a generic template able to verify<br>
any monotone Boolean function over combinations of pubkeys, which<br>
encompasses quite a number of use cases!<br>
<br>
[1] <a href=3D"https://blockstream.com/2015/08/24/treesignatures.html" rel=
=3D"noreferrer" target=3D"_blank">https://blockstream.com/2015/<wbr>08/24/t=
reesignatures.html</a><br>
<br>
<br>
An argument for permission-less innovation<br>
<br>
The driving motivation for the tail-call and MBV proposals, and the<br>
reason they are presented and considered together is to enable<br>
Merklized Abstract Syntax Trees. However neither BIP actually defines<br>
a MAST template, except as an example use case of the primitive<br>
features. This is very much on purpose: it is the opinion of this<br>
author that new bitcoin consensus features should follow the UNIX<br>
philosophy as expressed by Peter Salus and Mike Gancarz and<br>
paraphrased by yours truly:<br>
<br>
=C2=A0 * Write features that do one thing and do it well.<br>
=C2=A0 * Write features to work together.<br>
=C2=A0 * Simple is beautiful.<br>
<br>
By using modularity and composition of powerful but simple tools like<br>
MERKLEBRANCHVERIFY and single tail-call recursion to construct MAST we<br>
enable a complex and desirable feature while minimizing changes to the<br>
consensus code, review burden, and acquired technical debt.<br>
<br>
The reusability of the underlying primitives also means that they can<br>
be combined with other modular features to support use cases other<br>
than vanilla MAST, or reused in series to work around various limits<br>
that might otherwise be imposed on a templated form of MAST. At the<br>
moment the chief utility of these proposals is the straightforward<br>
MAST script written above, but as primitives they do allow a few other<br>
use cases and also combine well with features in the pipeline or on<br>
the drawing board. For example, in addition to MAST you can:<br>
<br>
1. Use MERKLEBRANCHVERIFY alone to support honeypot bounties, as<br>
=C2=A0 =C2=A0discussed in the BIP.<br>
<br>
2. Use a series of MERKLEBRANCHVERIFY opcodes to verify a branch with<br>
=C2=A0 =C2=A0split proofs to stay under script and witness push limitations=
.<br>
<br>
3. Combine MERKLEBRANCHVERIFY with key aggregation to get<br>
=C2=A0 =C2=A0Wuille-Maxwell tree signatures which support arbitrary signing=
<br>
=C2=A0 =C2=A0policies using a single, aggregatable signature.<br>
<br>
4. Combine tail-call execution semantics with CHECKSIGFROMSTACK to get<br>
=C2=A0 =C2=A0delegation and signature-time commitment to subscript policy.<=
br>
<br>
5. Combine MERKLEBRANCHVERIFY with a Merkle proof prefix check opcode<br>
=C2=A0 =C2=A0and Lamport signature support to get reusable Lamport keys.<br=
>
<br>
I believe these benefits and possible future expansions to be strong<br>
arguments in favor of extending bitcoin in the form of small, modular,<br>
incremental, and reusable changes that can be combined and used even<br>
in ways unforeseen even by their creators, creating a platform for<br>
unrestricted innovation.<br>
<br>
The alternative approach of rigid templates achieves the stated goals,<br>
perhaps even with slightly better encoding efficiency, but at the cost<br>
of requiring workaround for each future innovation. P2SH is just such<br>
an example -- we couldn&#39;t even upgrade to 128-bit security without<br>
designing an entirely different implementation because of the<br>
limitations of template pattern matching.<br>
<br>
<br>
Efficiency gains from templating<br>
<br>
Finally, I need to point out that there is one efficiency gain that a<br>
proper template-matching implementation has over user-specified<br>
schemes: reduction of witness data. This is both a practical side<br>
effect of more efficient serialization that doesn&#39;t need to encode<br>
logic as opcodes, as well as the fact that since the hashing scheme is<br>
fixed, one layer of hashes can be removed from the serialization. In<br>
the case of MAST, rather than encode the Merkle root hash in the<br>
redeem script, the hash is propagated upwards and compared against the<br>
witness commitment. The amount space saved from adopting a template is<br>
about equal to the size of the redeem script, which is approximately<br>
40 bytes of witness data per MAST input.<br>
<br>
That is arguably significant enough to matter, and in the long term I<br>
think we will adopt a MAST template for those efficiency gains. But I<br>
feel strongly that since MAST is not a feature in wide use at this<br>
time, it is strongly in our interests to deploy MBV, tail-call, as<br>
well overhaul the CHECKSIG operator before tackling what we feel an<br>
ideal MAST-supporting witness type would look like, so that with some<br>
experience under our belt we can adopt a system that is as simple and<br>
as succinct as possible while supporting the necessary use cases<br>
identified by actual use of the features.<br>
<br>
Kind regards,<br>
Mark Friedenbach<br>
______________________________<wbr>_________________<br>
bitcoin-dev mailing list<br>
<a href=3D"mailto:bitcoin-dev@lists.linuxfoundation.org">bitcoin-dev@lists.=
<wbr>linuxfoundation.org</a><br>
<a href=3D"https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev" =
rel=3D"noreferrer" target=3D"_blank">https://lists.linuxfoundation.<wbr>org=
/mailman/listinfo/bitcoin-<wbr>dev</a><br>
</blockquote></div><br></div></div></div>

--089e0824d93cf4b7680559a7fd44--