Date: Mon, 7 Mar 2022 18:08:03 +1000
From: Anthony Towns
To: Jeremy Rubin, Bitcoin Protocol Discussion
Message-ID: <20220307080803.GA6464@erisian.com.au>
Subject: Re: [bitcoin-dev] Annex Purpose Discussion: OP_ANNEX, Turing Completeness, and other considerations

On Sat, Mar 05, 2022 at 12:20:02PM +0000, Jeremy Rubin via bitcoin-dev wrote:
> On Sat, Mar 5, 2022 at 5:59 AM Anthony Towns wrote:
> > The difference between information in the annex and information in
> > either a script (or the input data for the script that is the rest of
> > the witness) is (in theory) that the annex can be analysed immediately
> > and unconditionally, without necessarily even knowing anything about
> > the utxo being spent.
> I agree that should happen, but there are cases where this would not work.
> E.g., imagine OP_LISP_EVAL + OP_ANNEX... and then you do delegation via the
> thing in the annex.
> Now the annex can be executed as a script.

You've got the implication backwards: the benefit isn't that the annex
*can't* be used as/in a script; it's that it *can* be used *without*
having to execute/analyse a script (and without even having to load the
utxo being spent).

How big a benefit that is might be debatable -- it's only a different
ordering of the work you have to do to be sure the transaction is valid;
it doesn't reduce the total work. And I think you can easily design
invalid transactions that will maximise the work required to establish
the tx is invalid, no matter what order you validate things.

> Yes, this seems tough to do without redefining checksig to allow partial
> annexes.

"Redefining checksig to allow X" in taproot means "defining a new pubkey
format that allows a new sighash that allows X", which, if it turns out
to be necessary/useful, is entirely possible.

It's not sensible to do what you suggest *now* though, because we don't
have a spec of how a partial annex might look.

> Hence thinking we should make our current checksig behavior
> require it be 0,

Signatures already require the annex to not be present. If you personally
want to do that for every future transaction you sign off on, you
already can.

> > It seems like a good place for optimising SIGHASH_GROUP (allowing a group
> > of inputs to claim a group of outputs for signing, but not allowing inputs
> > from different groups to ever claim the same output; so that each output
> > is hashed at most once for this purpose) -- since each input's validity
> > depends on the other inputs' state, it's better to be able to get at
> > that state as easily as possible rather than having to actually execute
> > other scripts before you can tell if your script is going to be valid.
> I think SIGHASH_GROUP could be some sort of mutable stack value, not ANNEX.

The annex is already a stack value, and the SIGHASH_GROUP parameter
cannot be mutable since it will break the corresponding signature, and
(in order to ensure validating SIGHASH_GROUP signatures doesn't require
hashing the same output multiple times) also impacts SIGHASH_GROUP
signatures from other inputs.

> you want to be able to compute what range you should sign, and then the
> signature should cover the actual range not the argument itself.

The value that SIGHASH_GROUP proposes putting in the annex is just an
indication of whether (a) this input is using the same output group as
the previous input; or else (b) how many outputs are in this input's
output group. The signature naturally commits to that value because it's
signing all the outputs in the group anyway.

> Why sign the annex literally?

To prevent it from being third-party malleable.

When there is some meaning assigned to the annex then perhaps it will
make sense to add some more granular way of accessing it via script, but
until then, committing to the whole thing is the best option possible,
since it still allows some potential uses of the annex without having
to define a new sighash.
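
As an illustration only (not part of any SIGHASH_GROUP specification),
the per-input annex value described above could be turned into output
ranges roughly as follows. The encoding used here is an assumption made
for the sketch: 0 means "same output group as the previous input", and
n > 0 means "start a new group covering the next n outputs". The function
name assign_output_groups is hypothetical.

```python
from typing import List, Optional, Tuple


def assign_output_groups(annex_values: List[int],
                         n_outputs: int) -> List[Tuple[int, int]]:
    """Return the half-open output range [start, end) claimed by each input.

    Assumed (hypothetical) encoding of the per-input annex value:
      0     -> reuse the output group of the previous input
      n > 0 -> start a new group made of the next n unclaimed outputs
    Groups are consecutive and never overlap, so each output belongs to
    at most one group and needs hashing at most once per group.
    """
    ranges: List[Tuple[int, int]] = []
    next_output = 0                      # first output not yet claimed
    current: Optional[Tuple[int, int]] = None
    for index, value in enumerate(annex_values):
        if value < 0:
            raise ValueError(f"input {index}: negative group size")
        if value == 0:
            if current is None:
                raise ValueError("first input has no previous group to reuse")
        else:
            end = next_output + value
            if end > n_outputs:
                raise ValueError(f"input {index}: group exceeds output count")
            current = (next_output, end)
            next_output = end
        ranges.append(current)
    return ranges


# Three inputs: the first two share outputs 0-1, the third claims output 2.
print(assign_output_groups([2, 0, 1], 3))
```

Because the ranges depend only on the annex values, they can be computed
for every input before any script is executed, which is the property the
reply above is pointing at.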

Note that signing only part of the annex means that you probably
reintroduce the quadratic hashing problem -- that is, with a script of
length X and an annex of length Y, you may have to hash O(X*Y) bytes
instead of O(X+Y) bytes (because every X/k bytes of the script selects
a different Y/j subset of the annex to sign).

> Why require that all signatures in one output sign the exact same digest?
> What if one wants to sign for value and another for value + change?

You can already have one signature for value and one for value+change:
use SIGHASH_SINGLE for the former, and SIGHASH_ALL for the latter.
SIGHASH_GROUP is designed for the case where the "value" goes to
multiple places.

> > > Essentially, I read this as saying: The annex is the ability to pad a
> > > transaction with an additional string of 0's
> > If you wanted to pad it directly, you can do that in script already
> > with a PUSH/DROP combo.
> You cannot, because the push/drop would not be signed and would be
> malleable.

If it's a PUSH, then it's in the tapscript and committed to by the
scriptPubKey, and not malleable.

There's currently no reason to have padding specifiable at spend time --
you know when you're writing the script whether the spender can reuse
the same signature for multiple CHECKSIG ops, because the only way to
do that is to add DUP/etc opcodes -- so if you're doing that, you can
add any necessary padding at the same time.

> The annex is not malleable, so it can be used to this as authenticated
> padding.

The reason that the annex is not third-party malleable is that its
content is committed to by signatures.
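
To put rough numbers on the quadratic hashing point made earlier in this
message, here is a small back-of-the-envelope sketch. The cost model is
an assumption for illustration: one signature check per bytes_per_check
bytes of script, each check hashing its own 1/j slice of the annex. The
function and parameter names are hypothetical.

```python
def bytes_hashed_whole_annex(script_len: int, annex_len: int) -> int:
    """Script and annex are each hashed once: O(X + Y) bytes."""
    return script_len + annex_len


def bytes_hashed_partial_annex(script_len: int, annex_len: int,
                               bytes_per_check: int, j: int) -> int:
    """Each signature check hashes a different annex slice: O(X * Y) bytes.

    Assumed model: one check per `bytes_per_check` bytes of script, and
    every check hashes a distinct slice of annex_len // j bytes.
    """
    checks = script_len // bytes_per_check
    return checks * (annex_len // j)


X, Y = 10_000, 10_000
print(bytes_hashed_whole_annex(X, Y))            # linear in X + Y
print(bytes_hashed_partial_annex(X, Y, 50, 2))   # grows with X * Y
```

Doubling both X and Y doubles the first figure but quadruples the second,
which is the difference between O(X+Y) and O(X*Y).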

> > The point of doing it in the annex is you could have a short byte
> > string, perhaps something like "0x010201a4" saying "tag 1, data length 2
> > bytes, value 420" and have the consensus interpretation of that be "this
> > transaction should be treated as if it's 420 weight units more expensive
> > than its serialized size", while only increasing its witness size by
> > 6 bytes (annex length, annex flag, and the four bytes above). Adding 6
> > bytes for a 426 weight unit increase seems much better than adding 426
> > witness bytes.
> Yes, that's what I say in the next sentence,
> *> Or, we might somehow make the witness a small language (e.g., run length
> encoded zeros)

If you're doing run-length encoding, you might as well just use gzip at
the p2p and storage layers; you don't need to touch consensus at all.
That's not an extensible or particularly interesting idea.

> > > Introducing OP_ANNEX: Suppose there were some sort of annex pushing
> > opcode,
> > > OP_ANNEX which puts the annex on the stack
> > I think you'd want to have a way of accessing individual entries from
> > the annex, rather than the annex as a single unit.
> Or OP_ANNEX + OP_SUBSTR + OP_POVARINTSTR? Then you can just do 2 pops for
> the length and the tag and then get the data.

If you want to make things as inconvenient as possible, sure, I guess?

> > > Now every time you run this,
> > You only run a script from a transaction once at which point its
> > annex is known (a different annex gives a different wtxid and breaks
> > any signatures), and can't reference previous or future transactions'
> > annexes...
> In a transaction validator, yes. But in a satisfier, no.

In a satisfier you don't "run" a script, you provide a solution to the
script...

You can certainly create scripts where it's not possible to provide
valid solutions, eg:

    DUP EQUAL NOT VERIFY

or where it's theoretically possible but in practice extremely difficult
to provide solutions, eg:

    DUP 2
    2 CHECKMULTISIG 2DUP EQUAL NOT VERIFY SHA256 SWAP SHA256 EQUAL

or where the difficulty is known and there really isn't an easier way
of coming up with a solution than doing multiple guesses and validating
the result:

    SIZE 80 EQUAL NOT VERIFY HASH256 0 18 SUBSTR 0 NUMEQUAL

But if you don't want to make life difficult for yourself, the answer's
pretty easy: just don't do those things. Or, at a higher level, don't
design new opcodes where you have to do those sorts of things.

> Not true about accessing previous TXNs annexes. All coins spend from
> Coinbase transactions. If you can get the COutpoint you're spending, you
> can get the parent of the COutpoint... and iterate backwards so on and so
> forth. Then you have the CB txn, which commits to the tree of wtxids. So
> you get previous transactions annexes committed there.

None of that information is stored in the utxo database or accessible at
validation time. Adding that information would make the utxo database
much larger, increasing the costs of running a node, and increasing
validation time for each transaction/block.

> For future transactions,

(For future transactions, if you had generic recursive covenants and
an opcode to examine the annex, you could prevent spending without a
particular value appearing in the annex; that doesn't let you "inspect"
a future annex, though)

> > > Because the Annex is signed, and must be the same, this can also be
> > > inconvenient:
> > The annex is committed to by signatures in the same way nVersion,
> > nLockTime and nSequence are committed to by signatures; I think it helps
> > to think about it in a similar way.
> nSequence, yes, nLockTime is per-tx.

nVersion is also per-tx not per-input. You still need to establish all
three of them before you start signing things.

> BTW i think we now consider nSeq/nLock to be misdesigned given desire to
> vary these per-input/per-tx....\

Since nSequence is per-input, you can obviously vary that per-input;
and you can vary all three per-tx.
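
Returning to the "0x010201a4" annex example quoted earlier in this
message, the arithmetic can be checked with a short sketch. Everything
about the entry format here is hypothetical (one tag byte, one length
byte, a big-endian value, and tag 1 meaning "extra weight"); no such
format has been specified. The only fixed fact used is that BIP 341
requires the annex to start with the byte 0x50.

```python
from typing import List, Tuple

ANNEX_FLAG = 0x50          # BIP 341: first byte of an annex
TAG_EXTRA_WEIGHT = 1       # hypothetical tag number, for illustration only


def parse_annex_entries(payload: bytes) -> List[Tuple[int, bytes]]:
    """Split a payload into (tag, value) pairs.

    Assumed layout per entry: 1 tag byte, 1 length byte, `length` value bytes.
    """
    entries = []
    pos = 0
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated entry header")
        tag, length = payload[pos], payload[pos + 1]
        value = payload[pos + 2:pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated entry value")
        entries.append((tag, value))
        pos += 2 + length
    return entries


def extra_weight(payload: bytes) -> int:
    """Sum the big-endian values of all hypothetical extra-weight entries."""
    return sum(int.from_bytes(value, "big")
               for tag, value in parse_annex_entries(payload)
               if tag == TAG_EXTRA_WEIGHT)


payload = bytes.fromhex("010201a4")
annex = bytes([ANNEX_FLAG]) + payload
witness_bytes = 1 + len(annex)      # 1-byte length prefix + flag + payload

print(extra_weight(payload))        # declared extra weight units
print(witness_bytes)                # bytes added to the witness
```

Under these assumptions the four payload bytes decode to 420 extra weight
units, and the annex adds 6 bytes to the witness, matching the 426 total
in the quoted text.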

> > > Suppose that you have a Miniscript that is something like: and(or(PK(A),
> > > PK(A')), X, or(PK(B), PK(B'))).
> Yes, my point is this is computationally hard to do sometimes.

Sometimes, what makes things computationally hard is that you've got
the wrong approach to looking at the problem.

> > CLTV also has the problem that if you have one script fragment with
> > CLTV by time, and another with CLTV by height, you can't come up with
> > an nLockTime that will ever satisfy both. If you somehow have script
> > fragments that require incompatible interpretations of the annex, you're
> > likewise going to be out of luck.
> Yes, see above. If we don't know how the annex will be structured or used,

If you don't know how the annex will be structured or used, don't use
it. That's exactly how things are today, because no one knows how it
will be structured or used.

> this is the point of this thread....
> We need to drill down how to not introduce these problems.

From where I sit, it looks like you're drawing hasty conclusions based
on a lot of misconceptions. That's not the way you avoid introducing
problems...

I mean, having the misconceptions is perfectly reasonable; if anyone
knew exactly how annex things should work, we'd have a spec already. It's
leaping straight to "this is the only way it can work, it's a dumb way,
and therefore we should throw this out immediately" that I don't really
see the humour in.

> > > It seems like one good option is if we just go on and banish the
> > OP_ANNEX.
> > > Maybe that solves some of this? I sort of think so. It definitely seems
> > > like we're not supposed to access it via script, given the quote from
> > above:
> > How the annex works isn't defined, so it doesn't make any sense to
> > access it from script. When how it works is defined, I expect it might
> > well make sense to access it from script -- in a similar way that the
> > CLTV and CSV opcodes allow accessing nLockTime and nSequence from script.
> That's false: CLTV and CSV expressly do not allow accessing it from script,
> only lower bounding it

Lower bounding something requires accessing it.

That CLTV/CSV only allows lower-bounding it rather than more arbitrary
manipulation is mostly due to having to be implemented via upgrading an
OP_NOP opcode, rather than any other reason, IMHO.

> Legacy outputs can use these new sighash flags as well, in theory (maybe
> I'll do a post on why we shouldn't...)

Existing outputs can't use new sighash flags introduced by a soft fork --
if they could, then those outputs would have been anyone-can-spend prior
to the soft fork activating, because node software that doesn't support
the soft fork isn't able to calculate the message that the signature
applies to, so can't reject invalid signatures.

Perhaps you mean "we could replace OP_NOPx by OP_CHECKSIGv2 and allow
creating new p2wsh or p2sh addresses that can be spent using the new
flags", but I can't really think why anyone would bring that up at this
point, except as a way of deliberately wasting people's time and
attention...

Cheers,
aj