Taproot

https://github.com/sipa/bips/blob/bip-schnorr/bip-taproot.mediawiki

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2019-May/016914.html

https://bitcoinmagazine.com/articles/taproot-coming-what-it-and-how-it-will-benefit-bitcoin/

previously: http://diyhpl.us/wiki/transcripts/bitcoin-core-dev-tech/2018-03-06-taproot-graftroot-etc/

https://twitter.com/kanzure/status/1136616356827283456

Introduction

Okay, so, first question- who put my name on that list and what do they want? It wasn't me. I'll ask questions. I can give a summary, but there's been a lot of talk already and I don't know what to focus on. What would sipa like us to review in particular about it? What design decisions do you feel least confident about? Is there anything where you would like other people to investigate design decisions before charging ahead?

Taproot overview

Let me make a list of all the things that are included in bip-taproot. Should I also give a brief summary of what the broad idea of taproot is? Is it already well known? Okay, five minute overview of taproot. Okay, where is roasbeef? This is hard, knowing where to start. Let's start with the taproot assumption. That's a good start.

The history before taproot is that we had P2SH, where we moved the actual script code out of the output; it's only revealed in the input, and you commit to it ahead of time. People realized you could do this recursively. Instead of just hashing the script, make it a merkle tree of multiple scripts, commit to just the merkle root, and then reveal only the part of the script you're going to use, under the assumption that you're going to split your script into a collection of disjunctive statements. You only reveal a log-scale number of branches. So you can have a root of the script tree, then scripts here and scripts here. So there's a bit of a scaling improvement from this, and also a bit of privacy, because you're only revealing the parts you're using.
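To make that structure concrete, here is a minimal sketch in Python of committing to a set of scripts as a merkle tree. It uses plain SHA256 as a stand-in for the tagged hashes the proposal actually uses, and it sorts each pair of children lexicographically (a design choice discussed later in this transcript); the tree shape and leaf encoding are simplified, so treat it as illustrative only.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Plain SHA256 stands in for the tagged hashes the proposal actually uses.
    return hashlib.sha256(data).digest()

def merkle_root(leaf_hashes):
    """Combine leaf hashes pairwise until one root remains.
    Each pair is sorted lexicographically, so no left/right bit is needed later."""
    level = list(leaf_hashes)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a, b = sorted((level[i], level[i + 1]))
            nxt.append(h(a + b))
        if len(level) % 2 == 1:        # odd node is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Placeholder "scripts": the disjunctive spending conditions.
scripts = [b"<A> <B> 2-of-2", b"<A> <C> 2-of-2", b"timeout clause"]
root = merkle_root([h(s) for s in scripts])
print(root.hex())
```

Only the root is committed to in the output; spending reveals one leaf plus its siblings on the path up.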

Pay-to-contract

What Greg Maxwell and andytoshi realized at a breakfast place in Mountain View, while I was in the restroom, is that there's a function called the pay-to-contract function, which takes as input an elliptic curve point or public key P and a script S, and is defined as the public key plus the hash of the public key and S... it's a cryptographic commitment. Nobody, given the output, can find a different P and S that produce the same output value. It's a weird hash value, but it has all the properties of a hash function. Second, it has the property that if you know the private key for this public key P, then you also know the private key to the output of this function, if you also know S.

What you can do with this is say, well, we're going to use this merkle tree still, but we're going to add one special level on top of it where you have P here and you do this pay-to-contract of those two things instead. Now we can have consensus rules that allow you to spend in two different ways. One way is: the output of pay-to-contract is an elliptic curve point, it's a public key, so you could sign directly with that public key, which you can do if you know the private key. It's like taking the right branch, but it's a special branch and you don't have to reveal that there was another branch or any branching at all. Or you reveal P plus a branch leading down to one of the scripts, and you can verify that this thing commits to it. The first is the key spending path, and the second is the script spending path. There's an output public key, and P here is the internal public key. When spending through a script, you reveal the internal public key plus the merkle path and nothing else; you only reveal the internal public key when spending using a script. When you spend using the key path, you are actually signing with the output public key, which is a tweaked version of the internal one.
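A minimal sketch of the pay-to-contract arithmetic on the private-key side, kept to scalar math so it runs with just hashlib (the actual proposal tweaks the point P on the curve with a tagged "TapTweak" hash of P and the merkle root; the public key bytes below are placeholders, not derived from the toy private key):

```python
import hashlib

# secp256k1 group order (private keys live modulo this).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def h_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def p2c_tweak(internal_pubkey: bytes, commitment: bytes) -> int:
    # t = H(P || S): hashing the pubkey itself is what makes this binding
    # (leaving P out of the hash was the flaw in the 2013 scheme).
    return h_int(internal_pubkey + commitment)

# Property 2: whoever knows the internal private key x (with P = x*G) also
# knows the output private key, because Q = P + t*G = (x + t)*G.
x = 0x1234567890ABCDEF                      # toy internal private key
internal_pubkey = b"\x02" + b"\x11" * 32    # placeholder encoding of P, not x*G
merkle_root = hashlib.sha256(b"script tree").digest()

t = p2c_tweak(internal_pubkey, merkle_root)
output_privkey = (x + t) % N
```

The key path signs with the key corresponding to `output_privkey`; the script path reveals `internal_pubkey`, the merkle path, and the script, and verifiers recompute the same tweak.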

Taproot assumption

What we're calling the taproot assumption is that any kind of more complex script construction in bitcoin really just encodes conditions under which coins can be spent. In any sort of smart contract on top, it's possible to add a branch that says "everyone agrees", or at least some specific set of people agrees. You can always add such a unanimity branch. It's a script that consists only of public keys. With adaptor signatures, you can still do some things. No hash locks or timelocks, etc. Spending using this is the cheapest thing; the only thing that goes on the blockchain is one public key and one signature. You don't reveal to the world even the existence of the other scripts that were also allowed to spend. Even when you spend using a script, you're not revealing what other scripts existed or whether a public key existed. The assumption is that most spends on the network can be written as ones that just consist of public keys. There's always the possibility of spending using a script, but we believe that if the cheapest option for spending is one that just has a public key, then people will optimize their scripts to actually have this public key on the unanimity branch, with the argument that there's never a problem with having a branch for when everyone agrees.

To your point though, there are reasons why you might not want to do that, like expecting everyone to be online at signing time. I expect that to be fairly common, like a two-of-three multisig, maybe you pick the two most-- you expect it's almost always going to be A and B then you put A and B on this branch and then put another branch that says A and C which is less likely.

How am I doing on my five minutes? Why not use threshold signatures in that specific case? I'm talking about multiple public keys, but in taproot only one is there. In musig or other key aggregation schemes where we can represent a combination of multiple keys, really just with a single public key, and there are--- with musig, you can do very simple and fairly straightforward security arguments on n-of-n multisig. You can also do non-accountable thresholds like 3-of-5 or any boolean expression over multiple keys you can encode in a single public key. But if it's not everyone, then you always have an interactive setup where the different participants need to share secrets with each other that they need to securely store and I expect this to be a major hurdle in practical implementations. Probably not in something like lightning where you already have lots of interactivity; but for most wallet applications, this probably won't fly. With n-of-n, you can do non-interactive setup, but there is interaction at signing time.

Everything becomes a merkle tree with this one special branch at the top that is mandatory. Given that it is mandatory and very cheap to use, it's reasonable that people will optimize for it. It also gives uniformity. Every output is going to look identical now. They are all just public keys or elliptic curve points. It's not really mandatory, you can use a non-existing public key. It's mandatory to put a point there, but it doesn't need to be a valid key of course. That's a bit of a tradeoff. There are probably constructions where it is more efficient to publish this without a required public key, but this breaks the uniformity and you get less privacy. This is a tradeoff between a mild scaling advantage, or bandwidth advantage really, versus privacy.

Key aggregation you use is not part of consensus; you can do whatever down the road people come up with. Yes, wallets decide. The only requirement we have at the consensus level is that it's a signature scheme that permits aggregation easily. Schnorr does this much more easily than ECDSA, so that's why Schnorr is in there. I know people will say 2-party ECDSA is a thing, but it's a couple orders of magnitude harder. Calling 2p-ECDSA a thing is strong, there are some papers. Maybe people are already massively and widely using this, specifically bitconner.

Does the pay-to-contract thing need more eyes? If you model the hash as a random oracle, then it follows trivially. The pay-to-contract scheme was introduced at the 2013 conference in San Jose by Timo Hanke. It didn't include the public key in the hash, which made it not a commitment, and therefore it was trivially broken.

So that's the explanation for why merkle branches are included: if you're already making this taproot execution structure and Schnorr signatures, then merkle branches are literally maybe 10 lines of consensus code. It makes very big scripts scale logarithmically instead of linearly. It's such an obvious win if you're going to change the structure anyway.

Have you analyzed how much historical scripts would have ... no, I haven't. I've made numbers about what we can expect for various types of multisig constructions. It's hard to analyze because the biggest advantage is probably going to hopefully be in the way that people use the scripting system, not so much in doing the exact same thing as was possible before.

bip-taproot proposal

In the proposal, there's a whole bunch of smaller and bigger things included. I guess I will go over those. We tried to really think about extensibility without going too far, and maybe we went too far. The reasoning here is that it's already a couple of things; there are many more ideas, even just about script execution structure: there's graftroot, g'root, and delegation mechanisms that people have thought about. There are incentives as an engineer to try to pack everything together into one proposal. One of the reasons is, well, then you can analyze how they all interact, you can make sure all the combinations are done in the most efficient ways and that they are all possible, and it also has some fungibility improvements because now you don't create dozens of new observable script versions. I think we have to recognize that this incentive exists, and also pick a tradeoff, including some features but not everything. As the complexity of the proposal goes up, the political difficulty of convincing the ecosystem of the necessity of everything goes up. This field is evolving, so for some things maybe it's better to wait. To compensate for leaving a bunch of things out, we thought about extensibility, in terms of making sure that at least some of the things we can imagine wouldn't cause a drop in efficiency if they were done separately.

One of them is that the scripts at the bottom of the tree are all given a version number; we are calling that the leaf version. Really the reason for doing this was that we had 5 or 6 bits available in the serialization; instead of saying they have to be a fixed value, if it's an unknown version number then it's an unencumbered spend. The difference between this version number and the witness version number, which goes at the top, is that leaf versions are only revealed along with the script that is actually executed. You can have a tree with a whole bunch of satisfactions that all use boring old scripts, and one that uses a new feature from a future version; you're not going to tell the network that you're doing anything with that new version unless you reveal that script. So this allows you to plan for future upgrades ahead of time, without telling anyone or revealing anything.

There's a few changes to the scripting language to support batch validation, like making sure the Schnorr signatures and the pay-to-contract (p2c) validation are all batch verifiable. This gives us a factor of 2-4x speedup when verifying all transactions in a whole block or the whole blockchain. You can aggregate millions of signatures together. The speedup only goes up logarithmically, but we're really in a use case where batch validation is the thing we-- we actually do have millions of signatures, and the only thing we care about is that they are all valid. So we're making sure that there's nothing in the script system that breaks that ability to batch verify. An example of something not batch verifiable is OP_CHECKSIGNOT... you could write a script that depends on giving an invalid signature for a given private key; it's allowed in the scripting language even though I can't think why you would want that. So we have a rule that says any signature that is passed must either be valid or be empty. An empty signature makes the checksig fail without making the script fail; a non-empty invalid one fails the whole script. This is a current standardness rule. NULLDUMMY is the checkmultisig one; I think this is NULLFAIL.
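For reference, this is roughly the batch check that has to be preserved (per the bip-schnorr draft; the $a_i$ are random coefficients chosen by the verifier so that invalid signatures cannot cancel each other out, $e_i$ is the per-signature challenge hash, and the notation here is mine, not copied from the BIP):

$$\left(\sum_i a_i s_i\right) G \;=\; \sum_i a_i R_i \;+\; \sum_i a_i e_i P_i, \qquad e_i = H(R_i \,\|\, P_i \,\|\, m_i)$$

Computing the whole right-hand side as one big multi-scalar multiplication, instead of verifying each $s_i G = R_i + e_i P_i$ separately, is where the 2-4x comes from.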

The ECDSA opcodes are gone and replaced with Schnorr opcodes. There's no OP_CHECKMULTISIG anymore because it's not batch validatable: it actually tries multiple combinations internally, and if you don't have the information in there to say which signature matches which key, you can't batch verify it. Instead, there's CHECKSIGADD, which adds the result of each signature check to a counter. You can write a multisig check as: key CHECKSIG, key CHECKSIGADD, key CHECKSIGADD, number EQUALVERIFY. This is batch validatable. There's still a limit of 201 opcodes in the scripting language, and simulating this would cost 3 opcodes per key, whereas with this special opcode it's only one. With respect to the 200 opcode limit, you're still counting them the same? Yes, nothing has changed about that limit; CHECKSIGADD is just a lower cost. If you use v0 segwit, you have the ECDSA opcodes. If you use v1, or taproot, you only have the Schnorr opcodes.
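A sketch of the counter pattern in Python, collapsing the CHECKSIG/CHECKSIGADD/EQUALVERIFY sequence into one function and treating signature verification as a plug-in (`check_sig` is a placeholder, not the actual implementation):

```python
def checksigadd_multisig(pairs, threshold, check_sig):
    """Emulate <pk1> CHECKSIG <pk2> CHECKSIGADD ... <k> EQUALVERIFY.
    `pairs` is a list of (pubkey, signature), with b"" for signers who did not sign;
    `check_sig` is whatever signature verification function you plug in."""
    counter = 0
    for pubkey, sig in pairs:
        if sig == b"":
            continue               # empty sig: this check "fails" but the script continues
        if not check_sig(pubkey, sig):
            raise ValueError("non-empty but invalid signature: the whole script fails")
        counter += 1               # CHECKSIGADD: add the check's result to the counter
    return counter == threshold    # the final number comparison

# Example: 2-of-3 where the second signer did not sign.
dummy_ok = lambda pk, sig: True
print(checksigadd_multisig([(b"A", b"sigA"), (b"B", b""), (b"C", b"sigC")], 2, dummy_ok))
```

Because every signature is explicitly paired with its key, a verifier can throw all the checks into one batch instead of trying combinations the way CHECKMULTISIG does.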

There's a whole range of unusable opcodes in today's bitcoin scripting language which automatically fail your script, which are turned into OP_SUCCESS. It's an opcode that makes the script automatically succeed. It's an unencumbered output again, even when encountered in an unexecuted IF branch. There's reasons for doing so; the advantage of this is we're no longer restricted to redefining NOP opcodes to introduce new functionality. You could always have a new leaf version to completely replace the scripting language, but if you want to replace just one of the opcodes, you don't need the version mechanism, you can just use the OP_SUCCESS opcode and redefine it to have a new meaning. It makes the script return true, and it can have any semantics, rather than "don't touch the stack" semantics for redefining NOPs. Does that make sense?

One more thing is that public keys that start with an unknown byte are treated as an automatic success as well. This means that if a--- not in the top-level one, only in the scripts or leaves. The reason for this is that it lets you introduce new public key crypto systems, new sighash modes, or anything, without adding another 3 opcodes for doing signature checks; instead you just encode it in the public key itself, where we have a byte anyway. This is not like OP_SUCCESS, it makes just that signature check succeed. I forget the actual rationale for it. The pubkey is an argument passed to a CHECKSIG in a script. It doesn't need to be a push, it can come from anywhere. It's a stack element passed to the CHECKSIG opcode. How do you avoid someone changing a transaction in flight? You're committing to the public key in the script. If you're passing it in on the stack, you have another problem anyway, without otherwise constraining it. Oh, right. Any questions about this part?

Also, uncompressed public keys are gone because really why do we still have those.

If you're doing a soft-fork later, you have to turn previously valid things into invalid things, so OP_SUCCESS is useful for that. When redefining a NOP opcode, the only thing you can do is not change the stack; you can only observe it. You are restricted to opcodes that can't modify the stack. This is why CHECKLOCKTIMEVERIFY leaves the data on the stack. There could be a variation of OP_SUCCESS, called OP_RELATIVESUCCESS, where if you hit the opcode then it's a success, but otherwise you don't. The reason it doesn't work that way is that you want an OP_SUCCESS that can redefine the whole script language in a potential soft-fork. It lets you introduce an OP_SUCCESS that changes how you parse opcodes, which is something you do before any execution. The rule is that you iterate through all opcodes, and if you encounter an OP_SUCCESS without having failed to parse, you don't execute anything at all, it just succeeds. You don't continue to parse either.
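A sketch of that ordering (parse until you hit an OP_SUCCESS, then succeed before any execution). The opcode values and push handling here are simplified placeholders, not the real tapscript encoding:

```python
OP_SUCCESS_CODES = {80, 98, 126}   # placeholder set, not the proposal's actual list

def script_succeeds_early(script_bytes: bytes) -> bool:
    """Scan the script opcode by opcode. On the first OP_SUCCESSx reached without a
    parse failure, the spend is valid without executing anything, even if the opcode
    sits in an unexecuted IF branch or the rest of the script would not parse."""
    i = 0
    while i < len(script_bytes):
        op = script_bytes[i]
        if 1 <= op <= 75:                  # direct pushes carry their data inline
            i += 1 + op
            if i > len(script_bytes):
                raise ValueError("parse failure: truncated push")
            continue
        if op in OP_SUCCESS_CODES:
            return True                    # success decided at parse time, no execution
        i += 1
    return False                           # no OP_SUCCESS: execute the script normally
```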

Also, there's the part about getting rid of the sigops limit. It's not entirely removed. Instead of having two separate resource limits in a block (the weight limit and the sigops limit, which leads to an annoying miner optimization problem), there's just an allowance of one sigop per 50 bytes of witness. Given that every signature requires 64 bytes of witness for the signature being checked, plus bytes for the public key, plus overhead from the input itself, this shouldn't restrict any features, but it gets rid of the two-dimensional problem.

Now, what if at some point there is a reason to introduce an opcode that is very expensive? Say someone wants OP_CHECKSNARKVERIFY, or something more expensive like OP_VERIFYETHEREUM or OP_JAVASCRIPT. You can imagine that, because of this taproot assumption where we essentially assume that most spends are going to use the simple key path, it might be reasonable to have fairly expensive exception clauses in your taproot leaf scripts. Well, what do you do if such an opcode is orders of magnitude more expensive than a sigop is now? You would want that to correspond to a weight increase, because proportionally you don't want to go beyond a certain amount of CPU per unit of bandwidth. We feared that introducing that would incentivize people to stuff the transaction with hundreds of zero bytes just to not hit this limit: say we introduce an opcode that costs a hundred sigops; now you need a 5000-byte witness, which would be a waste for the network, just to pad it out. So an idea we had is: what if we had a way to amend a transaction's weight in an unconditional way, where we could set a marker on the transaction that says to compute the weight but increment it by this value. That increment should have the property that it is signed by all the keys, otherwise people could change the transaction weight in-flight, and it should also be recognizable out-of-context: even when you don't have the UTXO being spent, you should unconditionally be able to do this weight adjustment.

For that reason, we have an annex, which is a witness element when spending a taproot output that has no consensus meaning except that it goes into the signature and is otherwise skipped. This is an area where weight increments could go. Say the transaction didn't have a sequence number and we wanted to do something like relative locktime; what we now call the sequence field could have been put in this annex as well. It's essentially adding fields to an input of a transaction that the script doesn't care about. The only consensus rule included in the taproot proposal now is that you can have an annex, identified in a certain unique way with a certain prefix byte, and if it's there, it's skipped. It lives in the witness, it's the last witness stack element, and it has to start with the byte 0x50 in hex. There's no valid witness spend right now that can start with byte 0x50 in p2wsh or p2wpkh.
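A rough sketch of those two bits of bookkeeping, annex detection and the one-sigop-per-50-witness-bytes allowance. The 0x50 prefix and the 50-byte figure come from the discussion above; the initial 50-byte allowance and everything else is my simplified reading, not the exact consensus rule:

```python
ANNEX_PREFIX = 0x50   # the annex, if present, is the last witness element and starts with this byte

def split_annex(witness_stack):
    """Return (witness_without_annex, annex_or_None)."""
    if len(witness_stack) >= 2 and witness_stack[-1] and witness_stack[-1][0] == ANNEX_PREFIX:
        return witness_stack[:-1], witness_stack[-1]
    return witness_stack, None

def within_sigop_budget(witness_size_bytes: int, sigops_executed: int, cost_per_sigop: int = 50) -> bool:
    """One signature check is allowed per 50 bytes of witness data (plus a small
    starting allowance), so expensive scripts automatically pay for themselves in weight."""
    return sigops_executed * cost_per_sigop <= witness_size_bytes + cost_per_sigop

# Example: a 3-element witness whose last element is an annex.
stack = [b"\x30" * 64, b"<script>", bytes([ANNEX_PREFIX]) + b"weight increment goes here"]
print(split_annex(stack)[1] is not None)   # True
```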

Another thing is, say we want to do another script execution change like graftroot or things in that domain. We may want to reuse tapscript inside those, where the script execution semantics actually remain the same and maybe increments to the scripting language apply to both. If those would need an annex, then... I think it's useful to think of the leaf version as the real script version, and the version at the top is really the execution structure version, and they might be independent from each other. There's a number of execution mechanisms, and then there's a number of script versions. This is another reason why we might want to have this annex attached to the tapscripts.

I think that's it. Oh, the transaction digest algorithm is updated. Jonas mentioned, there's a number of sighash improvements. There's more precomputed things to reduce the impact of large scripts. Tagged hashing, yep. Why? Couldn't you just tag it differently if you wanted to change it? Yes, you can but for example you don't want to introduce new tags for-- you don't want the introduction of new tags to be a common thing. Likely you want optimized implementations for them, and it increases code size. For simple things that get shared, you want to put it in the data, and you use the tags to make sure different domains don't interact. I don't care much about the epoch bytes.
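The tagged hashing being discussed is roughly the following construction. The tag strings "TapLeaf" and "TapBranch" are from the draft BIPs; the exact leaf serialization (leaf version byte, script length prefix) is simplified here:

```python
import hashlib

def tagged_hash(tag: str, data: bytes) -> bytes:
    """SHA256(SHA256(tag) || SHA256(tag) || data): the repeated 32-byte tag prefix keeps
    hashes from different domains from ever colliding with each other, and a fixed 64-byte
    prefix lines up with SHA256's block size so the tag midstate can be precomputed."""
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + data).digest()

leaf_a = tagged_hash("TapLeaf", b"\xc0" + b"<script A>")
leaf_b = tagged_hash("TapLeaf", b"\xc0" + b"<script B>")
branch = tagged_hash("TapBranch", min(leaf_a, leaf_b) + max(leaf_a, leaf_b))
```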

It seems like the signer requires a lot more context. Actually, it's less. The biggest change in the sighash is that SIGHASH_ALL now signs the amounts being spent by all inputs, rather than just the amount of the input being signed. This is because of a particular attack against hardware wallets which is exploitable today. I think this was discussed on the mailing list, actually, a couple months ago. You need to give the hardware wallet the unspent outputs being spent... you need the scripts of the outputs being spent anyway so that the hardware wallet can... say in a 1000-person coinjoin, you have one input, but now you need all the other context data? Yes, that's a good point. I need to re-read that thread on the mailing list. This is a good point, and I hadn't considered it. This would make PSBTs very very large before you could even consider signing. You already need the vouts and txids. You also need this data right now for computing the fee; you don't necessarily have to compute the fee, but you certainly should and you certainly should want to.

In addition, there's a change where you always sign the scriptpubkey being spent, which protects against the concern of: can I mutate a sighash that was designed for a P2SH thing into one that is not P2SH, which is possible today? You could fix this by including a bit that says explicitly whether this is P2SH, but to categorically remove this concern you sign the scriptpubkey being spent.

There's a couple more pre-computed values. Any kind of variable-length data is pre-hashed, so if you have multiple checksigs it doesn't need to be recomputed. The annex is variable length, so the hash of the annex is computed once and included wherever it is needed. The number of inputs and number of outputs are pre-hashed, always. The data fed into the sighash computation has a bounded size, which is 200-something bytes.

This is the first time any kind of merkle inclusion proof is covered by consensus. You have made a change to that in that you order the pairs. Did you consider any other changes? Tagged hashes, sure. It could theoretically be any kind of accumulator, right? So what John is referring to is that, in this merkle tree, we don't care about the position of things, only that something is in there. Someone suggested: why don't you sort the inputs to the hash function at every level? Then you don't need to reveal to the network whether you're going left or right; you just give the other branch and you can always combine them. This gets rid of one bit of witness data for every level, which is not a big issue, but there's a bit of complexity avoided about how to deal with the serialization of all those bits. Given sufficiently randomized scripts, it actually gives a tiny bit of privacy as well, because you don't leak information through the ordering of positions in the tree anymore. So why not any other kind of accumulator? Do you have any suggestion? It's semi-serious. Is there anything else out there? I think even a simple RSA accumulator, which has problems with trusted setup... but in this case, the setup is the person who owns the private key, right? It's a private accumulator? You do a setup for every key? I did math for this at some point and concluded it only makes sense if you have more than 16 million leaves, just in terms of size; the CPU I don't even want to think about. You can have up to 4 billion leaves, actually. Above that, you will have difficulty computing your address.
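A sketch of the verification side of that sorted-pair rule: the proof is just the sibling hashes, with no left/right bits, because each pair is re-sorted before hashing. Plain SHA256 is used here instead of the tagged hashes the proposal uses, so this is illustrative only:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf_hash: bytes, siblings, expected_root: bytes) -> bool:
    """Walk up the tree combining with each sibling; sorting the pair at every level
    removes the need to say whether the sibling was on the left or the right."""
    node = leaf_hash
    for sibling in siblings:
        a, b = sorted((node, sibling))
        node = h(a + b)
    return node == expected_root
```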

The deterministic ordering of the leaves, can you not order your leaves by likelihood? You can still do that. You want to build your tree as a huffman tree based on likelihood, and then there's sorting that flips at various levels, but the depth of every leaf remains the same. So through that, you still leak information obviously, but that's desirable I think.
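A sketch of building that kind of likelihood-weighted tree with a heap, so the most probable scripts end up closest to the root and get the shortest merkle paths. This is purely illustrative wallet-side logic; nothing about it is specified by the proposal:

```python
import heapq

def build_script_tree(weighted_scripts):
    """weighted_scripts: list of (probability, script). Repeatedly merge the two least
    likely subtrees, Huffman style, so likely scripts sit at shallow depths."""
    heap = [(p, i, script) for i, (p, script) in enumerate(weighted_scripts)]
    heapq.heapify(heap)
    counter = len(heap)            # tie-breaker so tuples never compare subtrees directly
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (left, right)))
        counter += 1
    return heap[0][2]              # nested tuples describing the tree shape

tree = build_script_tree([(0.85, "A and B"), (0.10, "A and C"), (0.05, "timeout")])
print(tree)   # (('timeout', 'A and C'), 'A and B')
```

The depth of each leaf still leaks some likelihood information, as noted above, but that is the intended tradeoff.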

Post-quantum security

https://twitter.com/pwuille/status/1133838842690060289

One of the interesting things about post-quantum security is that, if you have a taproot output generated from an internal public key with a known discrete log and it ever gets spent using the script path, you have a transferable proof that ECDLP is broken. Someone with an ECDLP break might choose to not use it for this, because using it would convince the world that they really do have an ECDLP break.

This isn't specific to quantum computers. If there's ever a substantial threat of ECDLP being broken, there is no choice but to blacklist spending using ECDLP-based constructions. So at that point, either you say well, there's just these coins and they can't move anymore at all, or you go towards fairly complicated post-quantum zero-knowledge proofs to show this public key was derived from this seed or this hardened path. If a convention, and this is unenforceable, if a convention on the use of taproot is that the public key must always appear..... the public key doesn't have to appear there, though. You can just define that as another way to spend it. You could do that as a hard-fork; that's the least of our concerns. The idea is that the best case scenario is that, if ECDLP gets broken, like 10 years from now there's sufficient research that we have confidence that there's a fairly acceptable tradeoff for a post-quantum secure scheme, a new output type gets defined, you can spend it, and all good. Then 20 years later than that, people start to say there's actually a 500-qubit quantum computer, only a factor this much more before our funds are at risk... but by that time, pretty much everyone has moved to a post-quantum scheme anyway. In the taproot tree you put some big hash thing for a post-quantum situation, as a backup. Have a branch somewhere that is like an unspendable thing but with a hash of your private key in it? Not unspendable; you can define a Lamport-kind of signature, which nobody should want to use because it's huge, but you could always put it in the script tree and it's there for years and decades and nobody touches it. It can be added later. If you have this in for 10 or 20 years, all the wallets put it in but never spend from it. So if we kill the ECDLP stuff, then it makes sense because wallets have already been using this. I somewhat agree, but it just has to be enough time in advance. The big problem is that in post-quantum secure schemes, we can't do fun stuff. But that happens no matter what. Well, not necessarily. Maybe post-quantum secure schemes will be found later that allow those. That's not crazy. There's very minimal cost for doing this just-in-case, and if you find a better post-quantum scheme that has fun features then you can soft-fork that in later. You need another branch, but you never see it. It adds complexity, though.
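For what it's worth, a toy version of the kind of hash-based fallback being described here is a Lamport one-time signature, built only from hashes. This is illustrative only; any real deployment would presumably use a standardized post-quantum scheme, and the ~8 KB signature size is exactly why nobody would want to use it unless forced to:

```python
import hashlib, os

H = lambda b: hashlib.sha256(b).digest()

def lamport_keygen():
    # 256 pairs of random secrets; the public key is the hash of every secret.
    sk = [[os.urandom(32), os.urandom(32)] for _ in range(256)]
    pk = [[H(a), H(b)] for a, b in sk]
    return sk, pk

def lamport_sign(msg: bytes, sk):
    digest = int.from_bytes(H(msg), "big")
    # Reveal one secret per message bit; this is why it is strictly one-time and huge.
    return [sk[i][(digest >> i) & 1] for i in range(256)]

def lamport_verify(msg: bytes, sig, pk) -> bool:
    digest = int.from_bytes(H(msg), "big")
    return all(H(sig[i]) == pk[i][(digest >> i) & 1] for i in range(256))

sk, pk = lamport_keygen()
sig = lamport_sign(b"spend via the post-quantum branch", sk)
assert lamport_verify(b"spend via the post-quantum branch", sig, pk)
```

A commitment to `pk` could sit in a script-tree leaf for decades without ever being revealed, which is the "put it in and never touch it" idea above.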

If the public keys involved are always derived from some seed which we know, then you could have a zero-knowledge proof of knowledge of whatever generated that seed which will be enormous but that's fine because most post-quantum stuff is enormous. This would be a hard-fork, but we don't need to do anything except make sure that our private keys have known seeds that are known to wallets. Or your private key is just the hash of another private key.

Would the bip32 derivation be too crazy for the purpose you're describing? I don't know what a zero-knowledge proof here would look like. It's difficult to predict what the efficiency would be. It's going to be at least 10's of kilobytes probably. You could use a simple post-quantum scheme like Lamport signatures.

Most wallets use one of like five libraries, so as soon as you have it in bitcoinjs you're fine (regrettably). Part of the point of taproot is that you won't really be able to tell which wallets people are using anymore. You have a financial incentive to have those Lamport signatures in there.