James Chiang
Taproot and Policy
Video: https://www.youtube.com/watch?v=EdRm_mnoCWc
Slides: https://residency.chaincode.com/presentations/Taproot_Policy.pdf
Introduction
Hi, my name is James and I have been a resident here at Chaincode. It has been a privilege. I hear they might be doing it again in the future. I highly recommend the experience, it has been really fantastic. I’ve been working on a demo library in Python for Taproot and that’s why I chose to talk about Taproot and policy today. They are seemingly two separate topics but they actually have some very interesting implications for each other. By policy I don’t mean mempool policy or standardness, I mean policy language.
Overview
So these are the three things I’d like to talk about. First, Taproot and Schnorr, a quick introduction, perhaps a refresher for Schnorr. Second, policy language, which is really just a nice high-level way for a user to describe the spending conditions of a Bitcoin output without having to worry about Bitcoin script. Finally, how policy can really be enriched with new features in privacy, but also in cost and efficiency, thanks to the Taproot output.
Schnorr Properties
So Schnorr, why do we care? Schnorr has some nice things that we want for Bitcoin. As Elichai talked about last week, a lot of these advantages are built on the linearity property of Schnorr signatures. I’ve split the advantages into two groups. The first is from the point of view of a validator, somebody running a node or somebody receiving funds. Schnorr allows batch verification of signatures, within a transaction or across transactions in a block. That means cheaper verification, so it is cheaper to run a node; that is all good stuff. There is also a security proof and a non-malleable encoding. If you are interested in ECDSA malleability, check out DER encoding. The second group is from the user side, and that’s mostly what I’ll be focusing on today. Schnorr really allows us to activate new features and new ways to spend or receive Bitcoin. For example I can create a shared public key which is owned by multiple people but from the point of view of the chain is a single pubkey. That’s what I mean by tweakable Musig, and it is something that Taproot is really built upon. We’ve got other really neat stuff. We can do commitments in Schnorr signatures, there are adaptor signatures, discreet log contracts, blind signatures, threshold signatures, all things that are possible which I’m not going to talk about. Those are great things that we can all activate as soon as we have Schnorr. Of course the great thing about Schnorr is that all the schemes above are indistinguishable from a single owner pubkey. That’s really nice. I’d like to have a look at how tweakable Musig can enable us to create a Taproot output.
Tweaking Schnorr for Taproot
Just a reminder of Schnorr signatures, I’m sure you’ve seen this a million times. We have a private key, a generator point and a message that we’re signing, usually a transaction, a sighash or transaction digest. The nonce point, the public key and the message are committed to in the hash. There’s a deterministic nonce that we pick. That’s the regular Schnorr signature. What we’re going to do next is introduce a key tweak. The key tweak is nothing other than adding a scalar to the private key. Due to the way EC math works, if I add a scalar to my private key that’s equivalent to adding the public key point of that scalar, of that tweak, to my original public key point P. So what I have now is this modified key pair which I denote q and Q. We’re going to use Q inside the Schnorr signature instead of P. Why do we do that? We do that because we can now commit some kind of information into T. In Taproot the T is used to commit to alternative spending conditions other than the public key point P. This is interesting especially in the case where P is a Musig scheme. P is aggregated from multiple keypairs which are owned by different parties, but it can be spent as if it were a single keypair. That’s especially useful in Taproot.
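The tweak relation described here can be checked numerically. The following is a minimal sketch in pure Python, assuming toy values for the private key and for the committed data (both hypothetical); the curve arithmetic is simplified, not constant-time and not meant for real use. It also derives the tweak from a plain SHA256 of a placeholder string, which is a simplification of how Taproot actually derives it.

```python
import hashlib

# secp256k1 parameters: field prime, group order and generator point.
P_FIELD = 2**256 - 2**32 - 977
N_ORDER = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)

def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], P_FIELD - 2, P_FIELD)
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], P_FIELD - 2, P_FIELD)
    lam %= P_FIELD
    x = (lam * lam - a[0] - b[0]) % P_FIELD
    y = (lam * (a[0] - x) - a[1]) % P_FIELD
    return (x, y)

def point_mul(k, pt):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result

d = 0x1234                      # hypothetical private key
P = point_mul(d, G)             # original public key point P

# Hypothetical tweak scalar t, derived from the data we want to commit to.
commitment = hashlib.sha256(b"alternative spending conditions").digest()
t = int.from_bytes(commitment, "big") % N_ORDER
T = point_mul(t, G)             # the public key point of the tweak

q = (d + t) % N_ORDER           # tweaked private key
Q = point_mul(q, G)             # tweaked public key

# Adding t to the private key is the same as adding T to the public key.
tweak_holds = (Q == point_add(P, T))
```

Whoever knows d and t can sign for Q, while an observer who only sees Q cannot tell that a tweak was applied.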
Taproot - Keypath Spend
There are two ways to spend a Taproot output. The first way is a little more obvious, it is the keypath spend. If we look at the Taproot output it is very similar to a version 0 SegWit output. It is just a version byte and a 32-byte pubkey. In this case it is the pubkey Q which is tweaked from our original key point P. We have version 1 which indicates that it is a Taproot output. If we just want to spend it directly we can just provide a signature in the witness and that satisfies the spending condition. If I’m observing the blockchain this is indistinguishable from a single owner pubkey spend, which is really nice. However, the signature could have been produced by multiple people. If that pubkey were a Musig scheme the signature would be aggregated from multiple parties. That’s nice, there’s a certain privacy element there that is preserved.
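As a rough illustration of how small this output is, here is a sketch that serializes it, assuming the encoding of a version 1 witness program as the opcode OP_1 (0x51) followed by a 32-byte push. The function name and the placeholder key bytes are hypothetical.

```python
def taproot_output_script(tweaked_pubkey_x: bytes) -> bytes:
    """Serialize a version 1 witness program: OP_1, push of 32 bytes, key Q."""
    if len(tweaked_pubkey_x) != 32:
        raise ValueError("expected a 32-byte public key")
    OP_1 = 0x51        # witness version 1
    PUSH_32 = 0x20     # push the next 32 bytes
    return bytes([OP_1, PUSH_32]) + tweaked_pubkey_x

# Placeholder bytes standing in for the tweaked key Q (not a real key).
script = taproot_output_script(bytes(range(32)))
```

Nothing in these 34 bytes says whether Q belongs to one owner, to a Musig group, or commits to any scripts at all.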
Taproot - Scriptpath Spend
The second way to spend a Taproot output is via the scriptpath. What do I mean by a scriptpath? We talked about the tweak. There is something that we can commit to in that tweak. In the scriptpath spend what we’re doing is revealing a specific subset of the spending conditions which we’ve committed into that tweak. We do that via a Merkle tree. In this case let’s say we’ve committed this Merkle tree into that tweak. The leaves of the Merkle tree all represent individual variants of Bitcoin script. I as a user can decide to spend a single one. In this case I’ve decided to take this one. I provide the leaf and the entire Merkle proof, which includes these nodes that are highlighted in red. In the witness I have the initial stack, which is actually the satisfying conditions for spending this script leaf. I have the script of the Tapleaf and I have a controlblock which essentially contains the Merkle proof for that specific Tapleaf. Any questions so far? You’ve probably heard this before, maybe, maybe not. Just to recap, a Taproot output allows me to spend either directly via the pubkey, or I can enforce certain spending conditions by explicitly revealing the script as well as the proof that it is committed into that output. That’s what Taproot basically is. You can imagine layered protocols like Lightning where most of the enforcement logic is committed in this tweak, but if the parties spend collaboratively none of that is visible. So privacy is awesome.
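To make the Merkle proof concrete, here is a minimal sketch of recomputing the tree root from one revealed leaf and its sibling hashes. It assumes the tagged-hash construction and the "TapLeaf" and "TapBranch" tags as written in the published BIP 341 text, which may differ in detail from the draft at the time of this talk. The three leaf scripts are placeholder bytes, not real scripts.

```python
import hashlib

def tagged_hash(tag: str, msg: bytes) -> bytes:
    """SHA256 prefixed with two copies of the hashed tag."""
    tag_digest = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()

def leaf_hash(script: bytes, leaf_version: int = 0xC0) -> bytes:
    # Single-byte length prefix, so this sketch only handles short scripts.
    assert len(script) < 253
    return tagged_hash("TapLeaf", bytes([leaf_version, len(script)]) + script)

def branch_hash(a: bytes, b: bytes) -> bytes:
    # Children are sorted, so the proof does not need left/right flags.
    return tagged_hash("TapBranch", min(a, b) + max(a, b))

def root_from_proof(script: bytes, proof: list) -> bytes:
    """Walk from the revealed leaf up to the root using the sibling hashes."""
    node = leaf_hash(script)
    for sibling in proof:
        node = branch_hash(node, sibling)
    return node

# Hypothetical tree: root = branch(A, branch(B, C)).
script_a, script_b, script_c = b"script A", b"script B", b"script C"
root = branch_hash(leaf_hash(script_a),
                   branch_hash(leaf_hash(script_b), leaf_hash(script_c)))

# Spending leaf C reveals script C plus two hashes, but not scripts A or B.
proof_for_c = [leaf_hash(script_b), leaf_hash(script_a)]
proof_ok = root_from_proof(script_c, proof_for_c) == root
```

The verifier only ever sees the hashes of the unspent leaves, which is where the privacy of the scriptpath comes from.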
Taproot Descriptor
So really quickly. How do we describe an output like that? If I have just the output script, which is a byte and the pubkey, that actually doesn’t give me, as a wallet or as a piece of client software, a lot of information about the output. It fails to describe… What we can do is say let’s create this descriptor language where the tree structure and the individual committed scripts are all described in some kind of human readable language. This would be like a descriptor for Taproot. It is not formalized today but I imagine it could look something like this. Descriptors are used in Core today. The wallet descriptors are very similar to what you see in the Tapscripts themselves. We have pay-to-pubkey, and we have CHECKSIGADD here, which is Taproot specific. You can see that the way these leaf scripts are nested in this expression reflects the Merkle tree. From that I can interpret what the specific structure is. That’s important because depending on where my… the Merkle proof may be longer or shorter. If the script is up here my proof is basically one node shorter than if I were to provide the Merkle proof for this script down here. That’s something to take into consideration. As a user, as the person who is designing this output, I have certain creative freedom in terms of what I want to optimize for, just based on the structure of the tree. We’ll come back to that later when we look at the policy language. The actual scripts themselves are at the bottom. We’re committing to 5 different scripts in this case, it is an arbitrary number. The actual tree itself can have a max size of 32.
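How the nesting determines the proof length can be sketched with nested tuples standing in for the descriptor. The tuple notation and the five leaf labels are hypothetical, not the descriptor syntax from the slide. The byte count assumes the controlblock layout from the published BIP 341 text: 33 bytes plus 32 bytes per tree level.

```python
def leaf_depths(tree, depth=0):
    """Return {leaf: depth} for a tree written as nested (left, right) tuples."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    return {**leaf_depths(left, depth + 1), **leaf_depths(right, depth + 1)}

# Hypothetical layout: one leaf near the root, four leaves further down.
tree = ("pk(A)", (("pk(B)", "pk(C)"), ("pk(D)", "pk(E)")))
depths = leaf_depths(tree)

# Assumed controlblock size: 33 bytes plus one 32-byte hash per level.
control_block_bytes = {leaf: 33 + 32 * d for leaf, d in depths.items()}
```

In this layout spending pk(A) needs one sibling hash, while each of the other four leaves needs three.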
Tapscript
The actual script itself is called Tapscript and it is defined in a separate BIP. There are a couple of noteworthy things. In general it is very similar to the Bitcoin script we know today, with very similar opcodes. There is some versioning that I am going to fly over. CHECKSIG now obviously verifies Schnorr signatures. CHECKMULTISIG is deprecated, so instead we have CHECKSIGADD. The difference between CHECKMULTISIG and CHECKSIGADD is that CHECKSIGADD requires the signatures to be in the same order as the pubkeys. That enables batch verification. There are some malleability improvements. The Schnorr signatures aren’t encoded in DER, it is just the R_x and S scalar so it is not malleable. There is another example there: OP_IF must consume exactly 1 to take the true branch. Previously any top stack element that was not zero would do, which obviously is malleable. There is also a slight difference to BIP 143 in the transaction digest. For a scriptpath spend, bit 2 of the spendtype byte must be set. That’s different from a keypath spend, where that bit is not set. So generally speaking it is relatively similar, with a couple of differences in CHECKSIG, because obviously we’re working with Schnorr signatures, and some malleability improvements.
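A k-of-n condition built from CHECKSIGADD can be sketched as a script template. This follows the pattern given in the published Tapscript BIP (BIP 342): the first key uses CHECKSIG, every further key uses CHECKSIGADD to increment a counter, and the counter is compared against the threshold at the end. The helper name and the key labels are hypothetical, and the script is rendered as readable text rather than bytes.

```python
def multisig_tapscript(k: int, pubkeys: list) -> str:
    """Render a k-of-n Tapscript using CHECKSIG followed by CHECKSIGADD."""
    if not 1 <= k <= len(pubkeys):
        raise ValueError("threshold must be between 1 and the number of keys")
    ops = [f"<{pubkeys[0]}> OP_CHECKSIG"]
    # Each further key adds 1 to the running count if its signature is valid.
    ops += [f"<{key}> OP_CHECKSIGADD" for key in pubkeys[1:]]
    ops.append(f"<{k}> OP_NUMEQUAL")
    return " ".join(ops)

two_of_three = multisig_tapscript(2, ["A", "B", "C"])
```

Each pubkey is checked against exactly one witness element, which is why the signatures have to line up with the keys.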
Policy Language & Miniscript
Now we’re going to switch topics quickly. We’ll come back to Taproot after this. Now we’re going to talk about policy language. Policy language is actually a really interesting topic. It was first proposed by sipa earlier this year. There is a YouTube video of his talk at Stanford and there is a demo web tool that you can check out where you enter policy language which is then compiled down to Bitcoin script. There is also a Rust compiler that Andrew is working on.
Policy Language
So what is policy language? If we look at Bitcoin script, over time we’ve figured out there are a couple of useful things we can do with it. Let me start with the terminal expressions first. We can require that the Bitcoin can only be spent with a signature for a specific pubkey. There are timeouts that are useful, we’ve seen them in Lightning. The same goes for hashes and preimages. We can combine these different conditions with Boolean expressions like AND, OR or perhaps even a threshold expression like an m-of-n threshold expression. The idea of this policy language is that I can arbitrarily nest these expressions in each other. When I write my policy language I’m only focusing on the spending logic. I’m not worrying about the opcodes in Bitcoin script. That’s nice from a user point of view. I can create pretty complex locking conditions without having to worry about the technical aspects of Bitcoin script. I have a few examples. In this case I have an AND with pay-to-pubkey and a time expression. What that basically means is I can only spend this output if I have a signature made with the private key of that public key, and only after a certain time has passed, let’s say after 100 blocks. That’s basically a timelocked pubkey output. Here’s another example with an OR expression. Either of these policy expressions can be fulfilled to spend this output. Either it is a timelocked key or, as you see in the second line, a 2-of-3 threshold expression. If I have two signatures which correspond to these pubkeys I can spend this output immediately, or I can use key number 3 to spend it, which is kind of like a backup key. In this case you can imagine this is a multisignature output which would be spendable after a certain time with a backup key. Let’s say the 2-of-3 parties failed to sign or we lost the keys, I could obtain a backup key from cold storage to spend this output. We’re still working our way through policy language on a locking logic level.
Here’s a final example, an OR expression with two subexpressions. I have one with a pubkey and a hash and I have one with a pubkey and a time. Does anybody recognize what kind of output this expresses? A hash time-locked contract minus the revocation keys. One can see the parallels. Obviously this is a lot simpler than the actual Bitcoin script, if any of you are aware of the Bitcoin script version in Lightning. It is a lot easier to reason about.
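The three examples can be written down as nested expressions and evaluated against a set of available signatures. This is a minimal sketch of the spending logic only, assuming a tuple notation, key labels and timelock values that are all hypothetical; it is not the policy syntax from the slides and it does not produce any script.

```python
def satisfied(policy, signers=frozenset(), age=0, preimages=frozenset()):
    """Check whether a policy is met by the given signers, age and preimages."""
    kind, *args = policy
    if kind == "pk":
        return args[0] in signers
    if kind == "older":
        return age >= args[0]
    if kind == "hash":
        return args[0] in preimages
    if kind not in ("and", "or", "thresh"):
        raise ValueError(f"unknown policy node: {kind}")
    subs = args[1:] if kind == "thresh" else args
    hits = sum(satisfied(s, signers, age, preimages) for s in subs)
    if kind == "and":
        return hits == len(subs)
    if kind == "or":
        return hits >= 1
    return hits >= args[0]          # thresh: at least k subexpressions hold

# A key that can only spend after 100 blocks.
timelocked = ("and", ("pk", "A"), ("older", 100))

# 2-of-3 immediately, or a backup key after a timeout (values are made up).
backup = ("or",
          ("thresh", 2, ("pk", "K1"), ("pk", "K2"), ("pk", "K3")),
          ("and", ("pk", "BACKUP"), ("older", 1000)))

# HTLC-like policy without revocation keys.
htlc = ("or",
        ("and", ("pk", "CLAIM"), ("hash", "H")),
        ("and", ("pk", "REFUND"), ("older", 144)))

can_claim = satisfied(htlc, signers={"CLAIM"}, preimages={"H"})
```

For instance, the backup key alone does not satisfy the second policy until the timeout has passed, while any two of the three main keys satisfy it right away.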
Policy to Bitcoin Script
We’ve talked about policy language but how do we go from policy language to the Bitcoin script that I can broadcast on the network? It seems like a pretty big step. We want the resulting script to be satisfiable in a non-malleable way and we want the wallet to be able to reason about the output script. The wallet needs to figure out “Ok this is the output script, do I have the keys to spend it? If I do how do I sign? What kind of witness do I generate to be able to spend that output?” The solution to that is to create this kind of template language which is composable, modular and allows the wallet or client software to reason about malleability, solvability and spendability.
Miniscript
That is what Miniscript is. That’s really the genius of Pieter Wuille’s work here. There is a compiler that takes policy language and translates that into a set of Bitcoin script templates if you will. You can have a look here. These are a couple of terminal expressions from policy which map relatively well to Miniscript. I have pubkey and pubkeyhash; older and after, so relative and absolute timelocks; and I have hash expressions. These Bitcoin script templates on the right are probably familiar to you if you are into Bitcoin script. As you see they map one to one so there is nothing controversial here. Where it gets a little more tricky and the compiler has to do some work is when we get into the Boolean expressions. Let’s say I have an AND policy expression. Here the compiler needs to do a little bit more work to figure out what type of Bitcoin script AND expression to use. For example in the first one we have two nested expressions, each of which can be composed of any other expression in Miniscript. In this variant the first subexpression cannot leave a 1 on the stack. It just has to pass verification, and the second one can leave a Boolean on the stack if you want the entire expression to return a Boolean. That’s why it is called and_v, v for verify. In the second case the first expression can leave a 1 on the top of the stack if it verifies successfully, but then Y needs to somehow rotate or push that 1 to the alt stack. It needs to do something with that result before it can evaluate its actual Y expression. The compiler needs to figure out how to wrap this expression in something that makes that possible. The same goes for OR. We actually have four different OR expressions. The compiler needs to figure out which one to use to make sure the expression is actually solvable. The same goes for threshold. There are some wrappers that Miniscript works with which enable us to change the properties of the expressions. What the compiler does is work its way through and try to find one which gives us the right properties. We saw before that sometimes you want a Boolean to be pushed onto the stack and sometimes you want it to be consumed. These are modifiers that allow the compiler to achieve that. At the very end what we then obtain is Bitcoin script. We’ve gone from policy language to Miniscript. Miniscript is really just a templated Bitcoin script subset.
This is all for a single script. Let’s loop back to Taproot now. A nice thing about Taproot is that we can commit multiple spending conditions in a single tree. Policy today compiles down to a single script, and what I’d like to talk about now is the implications of Taproot for Policy, because it really allows us to turbocharge the abilities of Policy in two important aspects.
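The template idea can be illustrated with a toy renderer for a handful of fragments. This is only a sketch: it assumes the fragment-to-script mapping published on the Miniscript reference page for these few fragments, renders scripts as text, and does not model the type system the real compiler uses to decide which fragments may be combined. The function names mirror the fragment names but are otherwise hypothetical.

```python
def pk(key):
    return f"<{key}> CHECKSIG"

def older(n):                      # relative timelock
    return f"<{n}> CHECKSEQUENCEVERIFY"

def after(n):                      # absolute timelock
    return f"<{n}> CHECKLOCKTIMEVERIFY"

def v(x):
    """Verify wrapper: consume the result instead of leaving it on the stack."""
    for op in ("CHECKSIG", "EQUAL"):
        if x.endswith(op):
            return x + "VERIFY"    # use the VERIFY form of the last opcode
    return x + " VERIFY"

def s(x):
    """Swap wrapper: move the earlier result out of the way before running x."""
    return "SWAP " + x

def and_v(x, y):
    return f"{x} {y}"              # x must verify, y supplies the result

def and_b(x, y):
    return f"{x} {y} BOOLAND"      # both leave a Boolean, then combine them

timelocked_key = and_v(v(pk("A")), older(100))
two_keys = and_b(pk("A"), s(pk("B")))
```

The same policy AND maps to different scripts depending on which wrappers are applied, which is the choice the compiler has to make.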
Policy to Taproot & Tapscript
We talked about Policy language, this high level language that is user friendly and really just describes locking conditions. We somehow have to convert it into this tree structure that is then spendable and can be broadcast on the Bitcoin network. I’m going to run through one simple example here with you guys to hopefully build an intuition of how that works. Here is a policy expression. Let’s presume we’re a Policy language compiler, what do we do?
Policy Abstract Syntax Tree
We build an abstract syntax tree. What am I doing here? This is not a Taproot tree. I’m simply structuring the syntax of the Policy language in a form that the computer can understand. Each node is either a terminal expression like a pubkey, time or hash, or some kind of logical expression that expresses the relation between the nodes. Just by looking at this tree I can kind of decompose it into several expressions which are not atomic but in a sense independent. What expressions can I fulfill to spend this output? I can follow the Boolean nodes on the syntax tree and I can see that’s the first one. The AND requires both nodes on both sides. The OR just requires one. That’s the first set of conditions I can fulfill. If I fulfill these conditions that’s a successful spend for this policy. Alternatively I can walk along another OR branch and I’ve got my second expression. Finally the third possibility is that one. These all have OR relations between each other. I can spend one of these three and that would be a successful spend of this Bitcoin output. Can anybody imagine how we would organize this in a Taproot tree? The most obvious naive way would be to say these are simply individual Tapscripts in a Taproot tree, because we can only spend a single Tapscript for a Taproot output. So the relationship between these different leaves is OR. Only one can be fulfilled, only one can be executed. I forgot one aspect. What we can also do is layer on top some expected probability of execution. For every OR branch I can say “I believe the probability for that OR branch to be executed to be 2/3, 1/3, 1/5, 4/5” and I can probability weight these three expressions in composite. Why is that important?
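The decomposition into independent spending alternatives can be sketched as a walk over the syntax tree. The policy below is hypothetical (the expression on the slide is not reproduced in the transcript); it only reuses the branch probabilities mentioned in the talk. Terminals are plain strings, AND nodes list their children, and each OR child carries its assumed probability.

```python
from fractions import Fraction

def alternatives(policy):
    """Return [(probability, [terminal conditions])], one entry per way to spend."""
    if isinstance(policy, str):                 # terminal expression
        return [(Fraction(1), [policy])]
    kind = policy[0]
    if kind == "or":
        out = []
        for weight, sub in policy[1:]:          # children are (probability, subtree)
            out += [(weight * p, conds) for p, conds in alternatives(sub)]
        return out
    if kind == "and":
        out = [(Fraction(1), [])]
        for sub in policy[1:]:                  # every child must be satisfied
            out = [(p * q, a + b) for p, a in out for q, b in alternatives(sub)]
        return out
    raise ValueError(f"unknown policy node: {kind}")

# Hypothetical policy with weighted OR branches.
policy = ("or",
          (Fraction(2, 3), ("and", "pk(A)", "pk(B)")),
          (Fraction(1, 3), ("or",
                            (Fraction(1, 5), ("and", "pk(C)", "older(100)")),
                            (Fraction(4, 5), ("and", "pk(D)", "hash(H)")))))

alts = alternatives(policy)
```

Each entry is a candidate Tapscript leaf, and its composite probability is the product of the OR weights along its path: here 2/3, 1/15 and 4/15.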
Tapscript/Taptree Compilation
Now we’re back to Taproot and Taptrees. We talked about the implication of the tree structure for the Merkle proof lengths. The higher up in the tree I am, the shorter my Merkle proof is and therefore the cheaper my spend is when I reveal that branch during spending. I probability weighted this expression to be the most probable for execution. Therefore when I build my Taproot tree I am going to try to put it as high as possible. These other guys I weighted a little lower. If they do get spent the proof will be a little more lengthy, a little more expensive, but since I weighted the probability lower they have a lower chance of being executed. This will be the first set. This is a naive build of a Taproot tree from that complicated expression that we had at the beginning. However, the compiler can go on and say “Actually I can combine these expressions into a single expression, something with an OR.” These three leaves all have OR relationships. Any one of these can be spent. Therefore I can combine any two into a single expression wrapped in an OR Boolean. I can continue to do that with each subsequent leaf and I can collapse my tree all the way to one node. These spending conditions are logically equivalent. There is something interesting though. As I do that the compiler is figuring out what composition is most economical for spending. Secondly, as I collapse the tree more and more of these conditions are being revealed during a spend. If I only have one Tapscript all the conditions are being revealed when I spend it. If I have multiple and I only spend this Tapscript I will see the hash of this script, the hash of that script but none of the other conditions are actually revealed, only one. Whereas if I spend this one all of them will be revealed. There is a privacy tradeoff as I collapse my tree. That’s really the job of the compiler. The job of the compiler, in the case of Policy language to a Taproot output, is twofold.
One is cost, it is optimizing for the lowest expected spending cost. Then there is the opportunity for the user to indicate certain privacy preferences. The user may say “I don’t want this expression here to be revealed with that one. I want to keep them separate.” If I spend one of them, obviously the one that’s being spent is revealed on chain, but the other ones remain private and unrevealed.
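One way to place likely leaves high in the tree is a Huffman-style construction. This is a sketch of that idea, not necessarily what any actual compiler does. The leaf labels and probabilities are hypothetical, and the cost model counts only the Merkle proof depth, ignoring script and witness sizes.

```python
import heapq
import itertools
from fractions import Fraction

def huffman_depths(weights):
    """Return {leaf: depth} for a tree that merges the two least likely nodes first."""
    counter = itertools.count()        # tie-breaker so the heap never compares dicts
    heap = [(w, next(counter), {name: 0}) for name, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Every leaf under the new parent moves one level further from the root.
        merged = {name: depth + 1 for name, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(counter), merged))
    return heap[0][2]

# Hypothetical leaves with their expected probability of being spent.
weights = {
    "pk(A) and pk(B)": Fraction(2, 3),
    "pk(C) and older(100)": Fraction(1, 15),
    "pk(D) and hash(H)": Fraction(4, 15),
}
depths = huffman_depths(weights)

# Expected number of 32-byte hashes in the Merkle proof.
expected_depth = sum(weights[name] * depths[name] for name in weights)
```

Collapsing all three leaves into one OR script would bring the proof depth to zero but reveal every condition on any spend, which is the privacy side of the tradeoff.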
Policy Language & Taproot
In conclusion, you can imagine some kind of user preference where they say “For that pubkey I want to exclude it from any hash expression or timeout expression. I want to keep that separate.” Therefore they need to be in separate Tapscript branches. That’s something we can’t do today because an output script is a single script. Even though there may be multiple conditions, if I execute one the entire script is revealed onchain. Furthermore, there is another degree of freedom for cost optimization, because I am free to structure the Taproot tree for cost efficiency and cheap spending.