Date: Fri, 18 Feb 2022 00:27:27 +1000
From: Anthony Towns
To: Russell O'Connor, Bitcoin Protocol Discussion
Message-ID: <20220217142727.GA1429@erisian.com.au>
Subject: Re: [bitcoin-dev] TXHASH + CHECKSIGFROMSTACKVERIFY in lieu of CTV and ANYPREVOUT
On Mon, Feb 07, 2022 at 09:16:10PM -0500, Russell O'Connor via bitcoin-dev wrote:
> > > For more complex interactions, I was imagining combining this TXHASH
> > > proposal with CAT and/or rolling SHA256 opcodes.
> Indeed, and we really want something that can be programmed at redemption
> time.

I mean, ideally we'd want something that can be flexibly programmed at
redemption time, in a way that requires very few bytes to express the
common use cases, is very efficient to execute even if used maliciously,
is hard to misuse accidentally, and can be cleanly upgraded via soft fork
in the future if needed?

That feels like it's probably got a "fast, cheap, good" paradox buried in
there, but even if it doesn't, it doesn't seem like something you can
really achieve by tweaking around the edges?

> That probably involves something like how the historic MULTISIG worked by
> having list of input / output indexes be passed in along with length
> arguments.
>
> I don't think there will be problems with quadratic hashing here because as
> more inputs are list, the witness in turns grows larger itself.

If you cache the hash of each input/output, it would mean each byte of the
witness would be hashing at most an extra 32 bytes of data pulled from that
cache, so I think you're right. Three bytes of "script" can already cause
you to rehash an additional ~500 bytes (DUP SHA256 DROP), so that should be
within the existing computation-vs-weight relationship.

If you add the ability to hash a chosen output (as Rusty suggests, and
which would allow you to simulate SIGHASH_GROUP), you'd probably have to
increase your cache to cover each output's scriptPubKey simultaneously,
which might be annoying, but doesn't seem fatal.

> That said, your SIGHASH_GROUP proposal suggests that some sort of
> intra-input communication is really needed, and that is something I would
> need to think about.
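(As an aside, the hash-caching argument a couple of paragraphs up can be
sketched as follows -- my own illustration, with hypothetical names, not a
real implementation: each input/output digest is computed once, and a
TXHASH-style opcode that selects k items then hashes only k*32 bytes pulled
from the cache, keeping total work linear in witness size.)

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# Precompute one digest per input/output; any later "hash these items"
# request concatenates 32-byte digests from the cache instead of
# re-serializing and re-hashing the items themselves.
def build_cache(serialized_items):
    return [sha256(item) for item in serialized_items]

def txhash_from_cache(cache, selected_indices):
    return sha256(b"".join(cache[i] for i in selected_indices))

inputs = [b"serialized-input-%d" % i for i in range(4)]
cache = build_cache(inputs)
# Selecting two items hashes exactly 2*32 bytes of cached data:
digest = txhash_from_cache(cache, [0, 2])
assert digest == sha256(sha256(inputs[0]) + sha256(inputs[2]))
```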
I think the way to look at it is that it trades off spending an extra
witness byte or three per output (your way, give or take) vs only being
able to combine transactions in limited ways (sighash_group), but being
able to be more optimised than the more manual approach.

That's a fine tradeoff to make for something that's common -- you save
onchain data, make something easier to use, and can optimise the
implementation so that it handles the common case more efficiently.

(That's a bit of a "premature optimisation" thing though -- we can't
currently do SIGHASH_GROUP style things, so how can you sensibly justify
optimising it because it's common, when it's not only currently not
common, but also not possible? That seems to me a convincing reason to
make script more expressive)

> While normally I'd be hesitant about this sort of feature creep, when we
> are talking about doing soft-forks, I really think it makes sense to think
> through these sorts of issues (as we are doing here).

+1

I guess I especially appreciate your goodwill here, because this has sure
turned out to be a pretty long message as I think some of these things
through out loud :)

> > "CAT" and "CHECKSIGFROMSTACK" are both things that have been available in
> > elements for a while; has anyone managed to build anything interesting
> > with them in practice, or are they only useful for thought experiments
> > and blog posts? To me, that suggests that while they're useful for
> > theoretical discussion, they don't turn out to be a good design in
> > practice.
> Perhaps the lesson to be drawn is that languages should support multiplying
> two numbers together.

Well, then you get to the question of whether that's enough, or if you
need to be able to multiply bignums together, etc?
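(The bignum concern is concrete even at current value sizes: with 64-bit
signed script arithmetic, a constant product over two ~50-bit amounts
already overflows, whereas a fixed-point log comparison stays comfortably
in range. A sketch -- my illustration only; the 32.32 fixed-point format
is an arbitrary choice, and fxlog2 is not a real opcode:)

```python
import math

INT64_MAX = 2**63 - 1

x = y = 1 << 50               # two ~50-bit output amounts
assert x * y > INT64_MAX      # "x*y < k" can't be evaluated in 64-bit math

# Fixed-point alternative: compare a*log2(x) + b*log2(y) against log2(k),
# here in 32.32 fixed point -- the sum stays far below 2**63.
SCALE = 1 << 32

def fxlog2(n: int) -> int:    # illustration only, not a script opcode
    return int(math.log2(n) * SCALE)

lhs = fxlog2(x) + fxlog2(y)   # a = b = 1
assert lhs == 100 * SCALE     # log2(2**50) + log2(2**50) = 100, exactly
assert lhs < INT64_MAX
```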
I was looking at uniswap-like things on liquid, and wanted to do constant
product for multiple assets -- but you already get the problem that
"x*y < k" might overflow if the output values x and y are ~50 bits each,
and that gets worse with three assets and wanting to calculate
"x*y*z < k", etc. And really you'd rather calculate
"a*log(x) + b*log(y) + c*log(z) < k" instead, which then means
implementing fixed point log in script...

> Having 2/3rd of the language you need to write interesting programs doesn't
> mean that you get 2/3rd of the interesting programs written.

I guess to abuse that analogy: I think you're saying something like we've
currently got 67% of an ideal programming language, and CTV would give us
68%, but that would only take us from 10% to 11% of the interesting
programs. I agree txhash might bump that up to, say, 69% (nice) but I'm
not super convinced that even moves us from 11% to 12% of interesting
programs, let alone a qualitative leap to 50% or 70% of interesting
programs.

It's *possible* that the ideal combination of opcodes will turn out to be
CAT, TXHASH, CHECKSIGFROMSTACK, MUL64LE, etc, but it feels like it'd be
better working something out that fits together well, rather than adding
things piecemeal and hoping we don't spend all that effort to end up in a
local optimum that's a long way short of a global optimum?

[rearranged:]

> The flexibility of TXHASH is intended to head off the need for future soft
> forks. If we had specific applications in mind, we could simply set up the
> transaction hash flags to cover all the applications we know about. But it
> is the applications that we don't know about that worry me. If we don't
> put options in place with this soft-fork proposal, then they will need
> their own soft-fork down the line; and the next application after that, and
> so on.
>
> If our attitude is to craft our soft-forks as narrowly as possible to limit
> them to what only allows for given tasks, then we are going to end up
> needing a lot more soft-forks, and that is not a good outcome.

I guess I'm not super convinced that we're anywhere near the right level
of generality that this would help in avoiding future soft forks? That's
what I meant by it not covering SIGHASH_GROUP.

I guess the model I have in my head is that we should ideally have a
general/flexible/expressive but expensive way of doing whatever scripting
you like (so a "SIMPLICITY_EXEC" opcode, perhaps), but then, as new ideas
get discovered and widely deployed, we should make them easy and cheap to
use (whether that's deploying a "jet" for the simplicity code, or a
dedicated opcode, or something else). But "cheap to use" means defining a
new cost function (or defining new execution conditions for something
that was already cheaper than the cheapest existing way of encoding those
execution conditions), which is itself a soft fork, since making it
"cheaper" means being able to fit more transactions using that feature
into a block than was previously possible.

But even then, based on [0], pure simplicity code to verify a signature
apparently takes 11 minutes, so that code probably should cost 66M vbytes
(based on a max time to verify a block of 10 seconds), which would make
it obviously unusable as a bitcoin tx with the normal 100k vbyte limit...
Presumably an initial simplicity deployment would come with a bunch of
jets baked in so that's less of an issue in practice... But I think that
means that even with simplicity you couldn't experiment with alternative
ECC curves or zero knowledge stuff without a soft fork to make the
specific setup fast and cheap, first.
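(The 66M vbyte figure is just linear scaling of verification time against
the block budget; the arithmetic, with the assumptions as stated:)

```python
# If one pure-Simplicity signature verification takes ~11 minutes, and a
# full block (1M vbytes) is supposed to verify in at most ~10 seconds,
# then costing by time gives:
verify_seconds = 11 * 60            # ~660 s per signature check
block_budget_seconds = 10           # assumed max block verification time
block_vbytes = 1_000_000            # 4M weight units / 4

cost_vbytes = verify_seconds * block_vbytes // block_budget_seconds
assert cost_vbytes == 66_000_000    # vs the ~100k vbyte standardness limit
```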
[0] https://medium.com/blockstream/simplicity-jets-release-803db10fd589

(I think this approach would already be an improvement in how we do soft
forks, though: (1) for many things, you would already have on-chain
evidence that this is something that's worthwhile, because people are
paying high fees to do it via hand-coded simplicity, so there's no
question of whether it will be used; (2) you can prove the jet and the
simplicity code do the exact same thing (and have unit/fuzz tests to
verify it), so can be more confident that the implementation is correct;
(3) maybe it's easier to describe in a bip that way too, since you can
just reference the simplicity code it's replacing rather than having C++
code?)

That still probably doesn't cover every experiment you might want to do;
eg if you wanted to have your tx commit to a prior block hash, you'd
presumably need a soft fork to expose that data; and if you wanted to
extend the information about the utxo being spent (eg a parity bit for
the internal public key to make recursive TLUV work better) you'd need a
soft fork for that too.

I guess a purist approach to generalising sighashes might look something
like:

  [s] [shimplicity] DUP EXEC p CHECKSIGFROMSTACK

where both s and shimplicity (== sighash + simplicity or shim +
simplicity :) are provided by the signer, with s being a signature, and
shimplicity being a simplicity script that builds a 32 byte message based
on whatever bits of the transaction it chooses as well as the shimplicity
script itself to prevent malleability.
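(A toy model of that pattern -- my sketch only, nothing like real
Simplicity: the signer's program derives the 32 byte message from
whatever parts of the tx it chooses *and* from its own serialization,
which is what prevents malleability.)

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# "[s] [shimplicity] DUP EXEC ...": the program is run with a copy of
# itself as an argument, so the message it builds commits to the program.
def run_shimplicity(program_bytes, interpreter, tx):
    return interpreter(program_bytes, tx, program_bytes)

# One possible program: commit to all outputs (a SIGHASH_ALL-ish message).
def interp_all_outputs(code, tx, self_code):
    return sha256(sha256(self_code) + repr(sorted(tx["outputs"])).encode())

tx = {"outputs": [("addr1", 50_000), ("addr2", 25_000)]}
msg = run_shimplicity(b"all-outputs-v0", interp_all_outputs, tx)
assert len(msg) == 32
# Changing either the outputs or the program itself changes the message:
assert msg != run_shimplicity(b"all-outputs-v1", interp_all_outputs, tx)
```

(The message would then be what s is checked against via
CHECKSIGFROMSTACK.)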
But writing a shimplicity script all the time is annoying, so adding an
extra opcode to avoid that makes sense, reducing it to:

  [s] [sh] TXHASH p CHECKSIGFROMSTACK

which is then equivalent to the existing

  [s|sh] p CHECKSIG

Though in that case, wouldn't you just have "txhash(sh)" be your
shimplicity script (in which case txhash is a jet rather than an opcode),
and keep the program as "DUP EXEC p CHECKSIGFROMSTACK", which then gives
the signer maximum flexibility to either use a standard sighash, or write
special code to do something new and magic?

So I think I'm 100% convinced that a (simplified) TXHASH makes sense in a
world where we have simplicity-equivalent scripting (and where there's
*also* some more direct introspection functionality like Rusty's OP_TX or
elements' tapscript opcodes or whatever).

(I don't think there's much advantage of a TaggedHash opcode that takes
the tag as a parameter over just writing "SHA256 DUP CAT SWAP CAT
SHA256", and if you were going to have a "HASH_TapSighash" opcode it
probably should be limited to hashing the same things from the bip that
defines it anyway. So having two simplicity functions, one for bip340
(checksigfromstack) and one for bip342 (generating a signature message
for the current transaction) seems about ideal)

But, I guess that brings me back to more or less what Jeremy asked
earlier in this thread:

] Does it make "more sense" to invest the research and development effort
] that would go into proving TXHASH safe, for example, into Simplicity
] instead?

Should we be trying to gradually turn script into a more flexible
language, one opcode at a time -- going from 11% to 12% to 13.4% to 14.1%
etc of coverage of interesting programs -- or should we invest that
time/effort into working on simplicity (or something like chialisp or
similar) instead?
That is, something where we could actually evaluate how all the improved
pieces fit together rather than guessing how it might work if we maybe in
future add CAT or 64 bit maths or something else...

If we put all our "language design" efforts into simplicity/whatever, we
could leave script as more of a "macro" language than a programming one;
that is, focus on it being an easy, cheap, safe way of doing the most
common things. I think that would still be worthwhile, both before and
after simplicity/* is available?

I think my opinions are:

 * recursive covenants are not a problem; avoiding them isn't and
   shouldn't be a design goal; and trying to prevent other people using
   them is wasted effort

 * having a language redesign is worthwhile -- there are plenty of ways
   to improve script, and there's enough other blockchain languages out
   there by now that we ought to be able to avoid a "second system
   effect" disaster

 * CTV via legacy script saves ~17 vbytes compared to going via tapscript
   (since the CTV hash is already in the scriptPubKey and the internal
   pubkey isn't needed, so neither need to be revealed to spend) and
   avoids the taproot ECC equation check, at the cost of using up an
   OP_NOP opcode. That seems worthwhile to me. Comparatively, TXHASH
   saves ~8 vbytes compared to emulating it with CTV (because you don't
   have to supply an unacceptable hash on demand). So having both may be
   worthwhile, but if we only have one, CTV seems the bigger saving? And
   we're not wasting an opcode if we do CTV now and add TXHASH later,
   since TXHASH isn't NOP-compatible and can't be included in legacy
   script anyway.
 * TXHASH's "PUSH" behaviour vs CTV's "examine the stack but don't change
   it, and VERIFY" behaviour is independent of the question of whether we
   want to supply flags to CTV/TXHASH so they're more flexible

And perhaps less strongly:

 * I don't like the ~18 TXHASH flags; for signing/CTV behaviour, they're
   both overkill (they have lots of seemingly useless combinations) and
   insufficient (they don't cover SIGHASH_GROUP), and they add additional
   bytes of witness data, compared to CTV's zero-byte default or
   CHECKSIG's zero/one-byte sighash which only do things we know are
   useful (well, some of those combinations might not be useful
   either...).

 * If we're deliberately trying to add transaction introspection, then
   all the flags do make sense, but Rusty's unhashed "TX" approach seems
   better than TXHASH for that (assuming we want one opcode versus the
   many opcodes elements use). But if we want that, we probably should
   also add maths opcodes that can cope with output amounts, at least;
   and perhaps also have some way for signatures to commit to some
   witness data that's used as script input. Also, convenient
   introspection isn't really compatible with convenient signing without
   some way of conveniently converting data into a tagged hash.

 * I'm not really convinced CTV is ready to start trying to deploy on
   mainnet even in the next six months; I'd much rather see some real
   third-party experimentation *somewhere* public first, and Jeremy's CTV
   signet being completely empty seems like a bad sign to me. Maybe that
   means we should tentatively merge the feature and deploy it on the
   default global signet though? Not really sure how best to get more
   real world testing; but "deploy first, test later" doesn't sit right.
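(The "converting data into a tagged hash" step is the bip340 tagged-hash
construction, which is exactly what the "SHA256 DUP CAT SWAP CAT SHA256"
fragment mentioned earlier computes, starting from a stack of [msg, tag].
A sketch tracing that, for reference:)

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def tagged_hash(tag: bytes, msg: bytes) -> bytes:
    """Reference BIP 340 form: sha256(sha256(tag) || sha256(tag) || msg)."""
    t = sha256(tag)
    return sha256(t + t + msg)

def tagged_hash_via_script(msg: bytes, tag: bytes) -> bytes:
    # Stack starts as [msg, tag] (top of stack on the right):
    h = sha256(tag)        # SHA256   -> [msg, h]
    hh = h + h             # DUP CAT  -> [msg, h||h]
    pre = hh + msg         # SWAP CAT -> [h||h||msg]
    return sha256(pre)     # SHA256   -> [tagged hash]

assert tagged_hash_via_script(b"some data", b"TapSighash") == \
       tagged_hash(b"TapSighash", b"some data")
```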
I'm not at all sure about bundling CTV with ANYPREVOUT and SIGHASH_GROUP:

Pros:
 - with APO available, you don't have to worry as much if spending a CTV
   output doesn't result in a guaranteed txid, and thus don't need to
   commit to scriptSigs and the like
 - APOAS and CTV are pretty similar in what they hash
 - SIGHASH_GROUP lets you add extra change outputs to a CTV spend which
   you can't otherwise do
 - reusing APOAS's tx hash scheme for CTV would avoid some of the weird
   ugly bits in CTV (that the input index is committed to and that the
   scriptSig is only "maybe!" included)
 - defining SIGHASH_GROUP and CTV simultaneously might let you define the
   groups in a way that is compatible between tapscript (annex-based) and
   legacy CTV. On the other hand, this probably still works provided you
   deploy SIGHASH_GROUP /after/ CTV is specced in (by defining CTV
   behaviour for a different length arg)

Cons:
 - just APOAS|ALL doesn't quite commit to the same things as bip 119 CTV,
   and that matters if you reuse CTV addresses
 - SIGHASH_GROUP assumes use of the annex, which would need to be specced
   out; SIGHASH_GROUP itself doesn't really have a spec yet either
 - txs signed by APOAS|GROUP are more malleable than txs with a bip119
   CTV hash, which might be annoying to handle even non-adversarially
 - that malleability with current RBF rules might lead to pinning
   problems

I guess for me that adds up to:

 * For now, I think I prefer OP_CTV over either OP_TXHASH alone or both
   OP_CTV and OP_TXHASH
 * I'd like to see CTV get more real-world testing before considering
   deployment
 * If APO/SIGHASH_GROUP get specced, implemented *and* tested by the time
   CTV is tested enough to think about deploying it, bundle them
 * Unless CTV testing takes ages, it's pretty unlikely it'll be worth
   simplifying CTV to more closely match APO's tx hashing
 * CAT, CHECKSIGFROMSTACK, tx introspection, better maths *are* worth
   prioritising, but would be better as part of a more thorough language
   overhaul (since you can
analyse how they interact with each other in combination, and you get a
huge jump from ~10% to ~80% benefit, instead of tiny incremental ones)?

I guess that's all partly dependent on thinking that TXHASH isn't great
for tx introspection (especially without CAT) and that (without tx
introspection and decent math opcodes) DLCs already provide all the
interesting oracle behaviour you're really going to get...

> I don't know if this is the answer you are looking for, but technically
> TXHASH + CAT + SHA256 awkwardly gives you limited transaction reflection.
> In fact, you might not even need TXHASH, though it certainly helps.

Yeah, it wasn't really what I was looking for but it does demolish that
specific thought experiment anyway.

> > I believe "sequences hash", "input count" and "input index" are all an
> > important part of ensuring that if you have two UTXOs distributing 0.42
> > BTC to the same set of addresses via CTV, that you can't combine them in a
> > single transaction and end up sending losing one of the UTXOs to fees. I
> > don't believe there's a way to resolve that with bip 118 alone, however
> > that does seem to be a similar problem to the one that SIGHASH_GROUP
> > tries to solve.
> It was my understanding that it is only "input count = 1" that prevents
> this issue.

If you have input count = 1, that solves the issue, but you could also
have input count > 1, and simply commit to different input indexes to
allow/require you to combine two CTV utxos into a common set of new
outputs; or you could have input count > 1 but input index = 1 for both
utxos to prevent combining them with each other, but allow adding a fee
funding input (but not a change output; and at a cost of an unpredictable
txid).

(I only listed "sequences hash" there because it implicitly commits to
"input count")

Cheers,
aj