[bitcoin-dev] An explanation and justification of the tail-call and MBV approach to MAST

author: Mark Friedenbach <mark@friedenbach.org> 2017-09-20 15:51:39 -0700
committer: bitcoindev <bitcoindev@gnusha.org> 2017-09-20 22:51:43 +0000
commit: f9b9d0dfd35c06f28b257cf2af64dfaa9db4e60a (patch)
tree: 3e432e16d47d1819d847bf7648bc5f16766d07b6
parent: 5ce5c96ca3a15634a6c28de3bb27e2389da2c429 (diff)
download: pi-bitcoindev-f9b9d0dfd35c06f28b257cf2af64dfaa9db4e60a.tar.gz
pi-bitcoindev-f9b9d0dfd35c06f28b257cf2af64dfaa9db4e60a.zip
1 files changed, 445 insertions, 0 deletions
diff --git a/31/28f5edef9e43b8b405ee5639b673c34c483c01 b/31/28f5edef9e43b8b405ee5639b673c34c483c01
new file mode 100644
index 000000000..7551b46a1
--- /dev/null
+++ b/31/28f5edef9e43b8b405ee5639b673c34c483c01
@@ -0,0 +1,445 @@
+Return-Path: <mark@friedenbach.org>
+Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
+	[172.17.192.35])
+	by mail.linuxfoundation.org (Postfix) with ESMTPS id CBF76A88
+	for <bitcoin-dev@lists.linuxfoundation.org>;
+	Wed, 20 Sep 2017 22:51:43 +0000 (UTC)
+X-Greylist: whitelisted by SQLgrey-1.7.6
+Received: from mail-pf0-f181.google.com (mail-pf0-f181.google.com
+	[209.85.192.181])
+	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 5823C455
+	for <bitcoin-dev@lists.linuxfoundation.org>;
+	Wed, 20 Sep 2017 22:51:42 +0000 (UTC)
+Received: by mail-pf0-f181.google.com with SMTP id y29so2264178pff.0
+	for <bitcoin-dev@lists.linuxfoundation.org>;
+	Wed, 20 Sep 2017 15:51:42 -0700 (PDT)
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
+	d=friedenbach-org.20150623.gappssmtp.com; s=20150623;
+	h=from:content-transfer-encoding:mime-version:subject:message-id:date
+	:to; bh=77kvE7qcXyJxbx3OnXB5x/NAbD+CpSiXYIzch+2GI9I=;
+	b=zU9xrtvagZIvfqL2T/DvgPKUpkO+c+cagU33wiqVWTaQV6TSdGmeDbd8HPH9/3Mvx1
+	Cv8Xso8UzzZWaHfWNquhx8nKkXde7wQGe/wlKRl6DyxIDSY7umIpH+QPYGsSzq5tGXTe
+	QZ1n7h2ZvOig/DMMJa3uNTH5TyZ9csgkCvI+iYGRH5pxP2sd1otQSWdLCsNHYQMaLo2F
+	ifQHpQfD6GBJTfkFBXz2xG0aC4vIUyUi2aHt7/KysKVjNGKajlThU2CR8m3+UcoFtH5Y
+	OPgpCV50LrlcPZ/CES447KQjTTMHbzEy0RoO6MSi4yxABQ39DRptA/Tv4K6nrSoNP1qQ
+	Z5TQ==
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
+	d=1e100.net; s=20161025;
+	h=x-gm-message-state:from:content-transfer-encoding:mime-version
+	:subject:message-id:date:to;
+	bh=77kvE7qcXyJxbx3OnXB5x/NAbD+CpSiXYIzch+2GI9I=;
+	b=SpDeOHH9RoVsFtaDxvlS/ZlXIRvTNQUuXlztOLR820M07q4nPH4mMNNdKLkCQrzsN7
+	pSFGP4Krd3rmlnibQYn+e/sr+5xemWJ1qcO2MwSvl/wMxa9mYt7psuilL+VGkzrvLPIz
+	nJRSdfvVCc7LiE9SHKEV936GVdQEH1CTvbaxy++jkVtDSUy1D5A5aFcrWA27OGcLf2r2
+	bgduZewQpAy69nZlR6Bt1bm2S/w4yiDFu9OyZXC1KQsEbPLVmoSTll2vycuyv54NyGD8
+	pZDQmCvff1BKQiQ4pKT6LCCauh9seRbZqOi6vh5IeaZlNwsWBTtBfiw5vnNS+6gApIeo
+	7Y6w==
+X-Gm-Message-State: AHPjjUiVL3DoahlRfT1/mDbDpI+HpBtYgRSuy9nqbGHXLpn7G+z3miSm
+	QASy9l/oRlBIw1gm2ui3uotyrV2S57g=
+X-Google-Smtp-Source: AOwi7QCHF/kh9q0xy+Ujr5YTn1NuvS3mIIaqctxi+hCRh0VIwKOxAcjloCJByvql0jQLToN4mSneKA==
+X-Received: by 10.98.144.21 with SMTP id a21mr3644862pfe.159.1505947901259;
+	Wed, 20 Sep 2017 15:51:41 -0700 (PDT)
+Received: from ?IPv6:2601:647:4600:9c66:6dfb:5b84:3a07:de6a?
+	([2601:647:4600:9c66:6dfb:5b84:3a07:de6a])
+	by smtp.gmail.com with ESMTPSA id f69sm16199pff.4.2017.09.20.15.51.40
+	for <bitcoin-dev@lists.linuxfoundation.org>
+	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
+	Wed, 20 Sep 2017 15:51:40 -0700 (PDT)
+From: Mark Friedenbach <mark@friedenbach.org>
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
+Message-Id: <6856ECC5-06BF-4A03-869D-AA1132FE0705@friedenbach.org>
+Date: Wed, 20 Sep 2017 15:51:39 -0700
+To: bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org>
+X-Mailer: Apple Mail (2.3273)
+X-Spam-Status: No, score=0.0 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
+	RCVD_IN_DNSWL_NONE autolearn=disabled version=3.3.1
+X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
+	smtp1.linux-foundation.org
+X-Mailman-Approved-At: Wed, 20 Sep 2017 22:55:06 +0000
+Subject: [bitcoin-dev] An explanation and justification of the tail-call and
+ MBV approach to MAST
+X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
+X-Mailman-Version: 2.1.12
+Precedence: list
+List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
+List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
+	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
+List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
+List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
+List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
+List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
+	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
+X-List-Received-Date: Wed, 20 Sep 2017 22:51:43 -0000
+
+Over the past few weeks I've been explaining the MERKLEBRANCHVERIFY
+opcode and tail-call execution semantics to a variety of developers,
+and it's come to my attention that the BIPs presentation of the
+concept is not as clear as it could be. Part of this is the fault of
+standards documents being standards documents whose first and foremost
+responsibility is precision, not pedagogy.
+
+I think there's a better way to explain this approach to achieving
+MAST, and it's worked better in the face to face and whiteboard
+conversations I've had. I'm forwarding it to this list in case there
+are others who desire a more clear explanation of what the
+MERKLEBRANCHVERIFY and tail-call BIPs are trying to achieve, and what
+any of it has to do with MAST / Merklized script.
+
+I've written for all audiences so I apologize if it starts of at a
+newbie level, but I encourage you to skim, not skip as I quickly start
+varying this beginner material in atypical ways.
+
+
+Review of P2SH
+
+It's easiest to explain the behavior and purpose of these BIPs by
+starting with P2SH, which we are generalizing from. BIP 16 (Pay to
+Script Hash) specifies a form of implicit script recursion where a
+redeem script is provided in the scriptSig, and the scriptPubKey is a
+program that verifies the redeem script hashes to the committed value,
+with the following template:
+
+  HASH160 <hash> EQUAL
+
+This script specifies that the redeem script is pulled from the stack,
+its hash is compared against the expected value, and by fiat it is
+declared that the redeem script is then executed with the remaining
+stack items as arguments.
+
+Sortof. What actually happens of course is that the above scriptPubKey
+template is never executed, but rather the interpreter sees that it
+matches this exact template format, and thereby proceeds to carry out
+the same logic as a hard-coded behavior.
+
+
+Generalizing P2SH with macro-op fusion
+
+This template-matching is unfortunate because otherwise we could
+imagine generalizing this approach to cover other use cases beyond
+committing to and executing a single redeem script. For example, if we
+instead said that anytime the script interpreter encountered the
+3-opcode sequence "HASH160 <20-byte-push> EQUAL" it switched to
+interpreting the top element as if it were a script, that would enable
+not just BIP 16 but also constructs like this:
+
+  IF
+    HASH160 <hash-1> EQUAL
+  ELSE
+    HASH160 <hash-2> EQUAL
+  ENDIF
+
+This script conditionally executes one of two redeem scripts committed
+to in the scriptPubKey, and at execution only reveals the script that
+is actually used. All an observer learns of the other branch is the
+script hash. This is a primitive form of MAST!
+
+The "if 3-opcode P2SH template is encountered, switch to subscript"
+rule is a bit difficult to work with however. It's not a true EVAL
+opcode because control never returns back to the top-level script,
+which makes some important aspects of the implementation easier, but
+only at the cost of complexity somewhere else. What if there are
+remaining opcodes in the script, such as the ELSE clause and ENDIF in
+the script above?  They would never be executed, but does e.g. the
+closing ENDIF still need to be present? Or what about the standard
+pay-to-pubkey-hash "1Address" output:
+
+  DUP HASH160 <20-byte-key-hash> EQUALVERIFY CHECKSIG
+
+That almost looks like the magic P2SH template, except there is an
+EQUALVERIFY instead of an EQUAL. The script interpreter should
+obviously not treat the pubkey of a pay-to-pubkey-hash output as a
+script and recurse into it, whereas it should for a P2SH style
+script. But isn't the distinction kinda arbitrary?
+
+And of course the elephant in the room is that by choosing not to
+return to the original execution context we are no longer talking
+about a soft-fork. Work out, for example, what will happen with the
+following script:
+
+  [TRUE] HASH160 <hash-of-[TRUE]> EQUAL FALSE
+
+(It returns false on a node that doesn't understand generalized
+3-opcode P2SH recursion, true on a node that does.)
+
+
+Implicit tail-call execution semantics and P2SH
+
+Well there's a better approach than trying to create a macro-op fusion
+franken-EVAL. We have to run scripts to the end to for any proposal to
+be a soft-fork, and we want to avoid saving state due to prior
+experience of that leading to bugs in BIP 12. That narrows our design
+space to one option: allow recursion only as the final act of a
+script, as BIP 16 does, but for any script not just a certain
+template. That way we can safely jump into the subscript without
+bothering to save local state because termination of the subscript is
+termination of the script as a whole. In computer science terms, this
+is known as tail-call execution semantics.
+
+To illustrate, consider the following scriptPubKey:
+
+  DUP HASH160 <20-byte-hash> EQUALVERIFY
+
+This script is almost exactly the same as the P2SH template, except
+that it leaves the redeem script on the stack rather than consuming
+it, thanks to the DUP, while it _does_ consume the boolean value at
+the end because of the VERIFY. If executed, it leaves a stack exactly
+as it was, which we assume will look like the following::
+
+  <argN> ... <arg1> <redeemScript>
+
+Now a normal script is supposed to finish with just true or false on
+the stack. Any script that finishes execution with more than a single
+element on the stack is in violation of the so-called clean-stack rule
+and is considered non-standard -- not relayable and potentially broken
+by future soft-fork upgrades. But so long as at least one bit of
+<redeemScript> is set, it is interpreted as true and the script
+interpreter would normally interpret a successful validation at this
+point, albeit with a clean-stack violation.
+
+Let's take advantage of that by changing what the script interpreter
+does when a script finishes with multiple items remaining on the stack
+and top-most one evaluates as true -- a state of affairs that would
+pass validation under the old rules. Now instead the interpreter
+treats the top-most item on the stack as a script, and tail-call
+recurse into it, P2SH-style. In the above example, <redeemScript> is
+popped off the stack and is executed with <arg1> ... <argN> remaining
+on the stack as its arguments.
+
+The above script can be interpreted in English as "Perform tail-call
+recursion if and only if the HASH160 of the script on the top of the
+stack exactly matches this 20-byte push." Which is, of course, what
+BIP 16 accomplishes with template matching. However the implicit tail
+call approach allows us to do much more than just P2SH!
+
+For starters, it turns out that using HASH160 for P2SH was probably a
+bad idea as it reduces the security of a multi-party constructed hash
+to an unacceptable 80 bits. That's why segwit uses 256-bit hashes for
+its pay to script hash format, for 128-bit security. Had we tail call
+semantics instead of BIP 16, we could have just switched to a new
+address type that decodes to the following script template instead:
+
+  DUP HASH256 <32-byte-hash> EQUALVERIFY
+
+Ta-da, we're back to full 128-bit security with no changes to the
+consensus code, just a new address version to target this script
+template.
+
+
+MAST with tail-call alone?
+Or: an aside on general recursion
+
+Our IF-ELSE Merklized Abstract Syntax Tree example above, rewritten to
+use tail-call evaluation, might look like this (there are more compact
+formulations possible, but our purpose here is not code golf):
+
+  IF
+    DUP HASH160 <hash-1> EQUALVERIFY
+  ELSE
+    DUP HASH160 <hash-2> EQUALVERIFY
+  ENDIF
+
+Either execution pathway leaves us with one of the two allowed redeem
+scripts on the top of the stack, and presumably its arguments beneath
+it. We then execute that script via implicit tail-call.
+
+We could write scripts using IF-ELSE branches or other tricks to
+commit to more than two possible branches, although this unfortunately
+scales linearly with the number of possible branches. If we allow the
+subscript itself to do its own tail-call recursion, and its subscript
+and so on, then we could nest these binary branches for a true MAST in
+the original sense of the term.
+
+However in doing so we would have enabled general recursion and
+inherit all the difficulties that come with that. For example, some
+doofus could use a script that consists of or has the same effect as a
+single DUP to cause an infinite loop in the script interpreter. And
+that's just the tip of the iceberg of problems general recursion can
+bring, which stem generally from resource usage no longer being
+correlated with the size of the witness stack, which is the primary
+resource for which there are global limits.
+
+This is fixable with a gas-like resource accounting scheme, which
+would affect not just script but also mempool, p2p, and other
+layers. And there is perhaps an argument for doing so, particularly as
+part of a hard-fork block size increase as more accurate resource
+accounting helps prevent many bad-block attacks and let us set
+adversarial limits closer to measured capacity in the expected/average
+use case. But that would immensely complicate things beyond what could
+achieve consensus in a reasonably short amount of time, which is a
+goal of this proposal.
+
+Instead I suggest blocking off general recursion by only allowing the
+script interpreter to do one tail-call per input. To get log-scaling
+benefits without deep recursion we introduce instead one new script
+feature, which we'll cover in the next section. But we do leave the
+door open to possible future general recursion, as we will note that
+going from one layer of recursion to many would itself be a soft-fork
+for the same reason that the first tail-call recursion is.
+
+
+Merkle branch verify to the rescue!
+
+In #bitcoin-wizards and elsewhere there has been a desire for some
+time to have an opcode that proves that an item was drawn from the set
+used to construct a given Merkle tree. This is not a new idea although
+I'm not aware of any actual proposal made for it until now. The most
+simple version of the opcode, the version initially proposed, takes
+three arguments:
+
+  <proof> <leaf-hash> <root-hash> MERKLEBRANCHVERIFY 2DROP DROP
+
+<root-hash> is the 32-byte hash label of the root of the Merkle tree,
+calculated using a scheme defined in the fast Merkle hash tree BIP.
+
+<leaf-hash> is 32 bytes of data which we are proving is part of the
+Merkle hash tree -- usually the double-SHA256 hash of an item off the
+stack.
+
+<proof> is the path through the Merkle tree including the hashes of
+branches not taken, which is the information necessary to recalculate
+the root hash thereby proving that <leaf-hash> is in the Merkle tree.
+
+The 2DROP and DROP are necessary to remove the 3 arguments from the
+stack, as the opcode cannot consume them since it is soft-forked in.
+There are two primary motivating applications of Merkle branch verify
+(MBV), which will be covered next.
+
+The MBV BIP will be extended to support extraction of more than one
+item from the same Merkle tree, but for the rest of this explanation
+we assume the current implementation of a single item proof, just for
+simplicity.
+
+
+MBV and MAST
+
+This new opcode combines with single tail-call execution semantics to
+allow for a very short and succinct MAST implementation:
+
+  OVER HASH256 <root-hash> MERKLEBRANCHVERIFY 2DROP DROP
+
+That's it. This script expects an input stack in the following format:
+
+  <argN> ... <arg1> <policyScript> <proof>
+
+At the end of execution the script has verified that <policyScript> is
+part of the Merkle tree previously committed to, and <proof> is
+dropped from the stack. This leaves the stack ready for a tail-call
+recursion into <policyScript>.
+
+
+MBV and Key Aggregation
+
+If the signature scheme supports key aggregation, which it happens
+that the the new signature aggregation scheme being worked on will
+support as a side effect, then there is a very cool and useful
+application that would be supported as well: tree signatures as
+described by Pieter Wuille[1].  This looks almost exactly the same as
+the MAST script, but with a CHECKSIG tacked on the end:
+
+  OVER HASH256 <root-hash> MERKLEBRANCHVERIFY 2DROP DROP CHECKSIG
+
+This script expects an input stack of the following form:
+
+  <sig> <pubkey> <proof>
+
+And proves that the pubkey is drawn from the set used to construct the
+Merkle hash tree, and then its signature is checked. While it should
+be clear this has 1-of-N semantics, what might be less obvious is that
+key aggregation allows any signature policy expressible as a monotone
+Boolean function (anything constructible with combinations of AND, OR,
+and k-of-N thresholds) to be transformed to a 1-of-N over a set of key
+aggregates. So the above script is a generic template able to verify
+any monotone Boolean function over combinations of pubkeys, which
+encompasses quite a number of use cases!
+
+[1] https://blockstream.com/2015/08/24/treesignatures.html
+
+
+An argument for permission-less innovation
+
+The driving motivation for the tail-call and MBV proposals, and the
+reason they are presented and considered together is to enable
+Merklized Abstract Syntax Trees. However neither BIP actually defines
+a MAST template, except as an example use case of the primitive
+features. This is very much on purpose: it is the opinion of this
+author that new bitcoin consensus features should follow the UNIX
+philosophy as expressed by Peter Salus and Mike Gancarz and
+paraphrased by yours truly:
+
+  * Write features that do one thing and do it well.
+  * Write features to work together.
+  * Simple is beautiful.
+
+By using modularity and composition of powerful but simple tools like
+MERKLEBRANCHVERIFY and single tail-call recursion to construct MAST we
+enable a complex and desirable feature while minimizing changes to the
+consensus code, review burden, and acquired technical debt.
+
+The reusability of the underlying primitives also means that they can
+be combined with other modular features to support use cases other
+than vanilla MAST, or reused in series to work around various limits
+that might otherwise be imposed on a templated form of MAST. At the
+moment the chief utility of these proposals is the straightforward
+MAST script written above, but as primitives they do allow a few other
+use cases and also combine well with features in the pipeline or on
+the drawing board. For example, in addition to MAST you can:
+
+1. Use MERKLEBRANCHVERIFY alone to support honeypot bounties, as
+   discussed in the BIP.
+
+2. Use a series of MERKLEBRANCHVERIFY opcodes to verify a branch with
+   split proofs to stay under script and witness push limitations.
+
+3. Combine MERKLEBRANCHVERIFY with key aggregation to get
+   Wuille-Maxwell tree signatures which support arbitrary signing
+   policies using a single, aggregatable signature.
+
+4. Combine tail-call execution semantics with CHECKSIGFROMSTACK to get
+   delegation and signature-time commitment to subscript policy.
+
+5. Combine MERKLEBRANCHVERIFY with a Merkle proof prefix check opcode
+   and Lamport signature support to get reusable Lamport keys.
+
+I believe these benefits and possible future expansions to be strong
+arguments in favor of extending bitcoin in the form of small, modular,
+incremental, and reusable changes that can be combined and used even
+in ways unforeseen even by their creators, creating a platform for
+unrestricted innovation.
+
+The alternative approach of rigid templates achieves the stated goals,
+perhaps even with slightly better encoding efficiency, but at the cost
+of requiring workaround for each future innovation. P2SH is just such
+an example -- we couldn't even upgrade to 128-bit security without
+designing an entirely different implementation because of the
+limitations of template pattern matching.
+
+
+Efficiency gains from templating
+
+Finally, I need to point out that there is one efficiency gain that a
+proper template-matching implementation has over user-specified
+schemes: reduction of witness data. This is both a practical side
+effect of more efficient serialization that doesn't need to encode
+logic as opcodes, as well as the fact that since the hashing scheme is
+fixed, one layer of hashes can be removed from the serialization. In
+the case of MAST, rather than encode the Merkle root hash in the
+redeem script, the hash is propagated upwards and compared against the
+witness commitment. The amount space saved from adopting a template is
+about equal to the size of the redeem script, which is approximately
+40 bytes of witness data per MAST input.
+
+That is arguably significant enough to matter, and in the long term I
+think we will adopt a MAST template for those efficiency gains. But I
+feel strongly that since MAST is not a feature in wide use at this
+time, it is strongly in our interests to deploy MBV, tail-call, as
+well overhaul the CHECKSIG operator before tackling what we feel an
+ideal MAST-supporting witness type would look like, so that with some
+experience under our belt we can adopt a system that is as simple and
+as succinct as possible while supporting the necessary use cases
+identified by actual use of the features.
+
+Kind regards,
+Mark Friedenbach
+
author	Mark Friedenbach <mark@friedenbach.org>	2017-09-20 15:51:39 -0700
committer	bitcoindev <bitcoindev@gnusha.org>	2017-09-20 22:51:43 +0000
commit	f9b9d0dfd35c06f28b257cf2af64dfaa9db4e60a (patch)
tree	3e432e16d47d1819d847bf7648bc5f16766d07b6
parent	5ce5c96ca3a15634a6c28de3bb27e2389da2c429 (diff)
download	pi-bitcoindev-f9b9d0dfd35c06f28b257cf2af64dfaa9db4e60a.tar.gz pi-bitcoindev-f9b9d0dfd35c06f28b257cf2af64dfaa9db4e60a.zip