Return-Path: Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 46216C000B for ; Fri, 4 Mar 2022 01:04:53 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 2D2124167E for ; Fri, 4 Mar 2022 01:04:53 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org X-Spam-Flag: NO X-Spam-Score: -1.621 X-Spam-Level: X-Spam-Status: No, score=-1.621 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, KHOP_HELO_FCRDNS=0.276, SPF_HELO_NONE=0.001, SPF_NONE=0.001, UNPARSEABLE_RELAY=0.001] autolearn=no autolearn_force=no Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id We4oVN58A2kC for ; Fri, 4 Mar 2022 01:04:51 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from azure.erisian.com.au (cerulean.erisian.com.au [139.162.42.226]) by smtp4.osuosl.org (Postfix) with ESMTPS id 79EC041679 for ; Fri, 4 Mar 2022 01:04:51 +0000 (UTC) Received: from aj@azure.erisian.com.au (helo=sapphire.erisian.com.au) by azure.erisian.com.au with esmtpsa (Exim 4.92 #3 (Debian)) id 1nPwNB-0008Ma-SC for ; Fri, 04 Mar 2022 11:04:48 +1000 Received: by sapphire.erisian.com.au (sSMTP sendmail emulation); Fri, 04 Mar 2022 11:04:42 +1000 Date: Fri, 4 Mar 2022 11:04:42 +1000 From: Anthony Towns To: Bitcoin Protocol Discussion Message-ID: <20220304010442.GC3869@erisian.com.au> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) X-Spam-Score-int: -18 X-Spam-Bar: - Subject: [bitcoin-dev] bitcoin scripting and lisp X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Mar 2022 01:04:53 -0000 On Sun, Feb 27, 2022 at 04:34:31PM +0000, ZmnSCPxj via bitcoin-dev wrote: > In reaction to this, AJ Towns mailed me privately about some of his > thoughts on this insane `OP_EVICT` proposal. > He observed that we could generalize the `OP_EVICT` opcode by > decomposing it into smaller parts, including an operation congruent > to the Scheme/Haskell/Scala `map` operation. At much the same time Zman was thinking about OP_FOLD and in exactly the same context, I was wondering what the simplest possible language that had some sort of map construction was -- I mean simplest in a "practical engineering" sense; I think Simplicity already has the Euclidean/Peano "least axioms" sense covered. The thing that's most appealing to me about bitcoin script as it stands (beyond "it works") is that it's really pretty simple in an engineering sense: it's just a "forth" like system, where you put byte strings on a stack and have a few operators to manipulate them. The alt-stack, and supporting "IF" and "CODESEPARATOR" add a little additional complexity, but really not very much. To level-up from that, instead of putting byte strings on a stack, you could have some other data structure than a stack -- eg one that allows nesting. Simple ones that come to mind are lists of (lists of) byte strings, or a binary tree of byte strings [0]. Both those essentially give you a lisp-like language -- lisp is obviously all about lists, and a binary tree is just made of things or pairs of things, and pairs of things are just another way of saying "car" and "cdr". A particular advantage of lisp-like approaches is that they treat code and data exactly the same -- so if we're trying to leave the option open for a transaction to supply some unexpected code on the witness stack, then lisp handles that really naturally: you were going to include data on the stack anyway, and code and data are the same, so you don't have to do anything special at all. And while I've never really coded in lisp at all, my understanding is that its biggest problems are all about doing things efficiently at large scales -- but script's problem space is for very small scale things, so there's at least reason to hope that any problems lisp might have won't actually show up for this use case. So, to me, that seemed like something worth looking into... After looking into it, I actually think chia lisp [1] gets pretty much all the major design decisions pretty much right. There are obviously a few changes needed given the differences in design between chia and bitcoin: - having secp256k1 signatures (and curve operations), instead of BLS12-381 ones - adding tx introspection instead of having bundle-oriented CREATE_COIN, and CREATE/ASSERT results [10] and there are a couple of other things that could maybe be improved upon: - serialization seems to be a bit verbose -- 100kB of serialized clvm code from a random block gzips to 60kB; optimising the serialization for small lists, and perhaps also for small literal numbers might be a feasible improvement; though it's not clear to me how frequently serialization size would be the limiting factor for cost versus execution time or memory usage. - I don't think execution costing takes into account how much memory is used at any one time, just how much was allocated in total; so the equivalent of (OP_DUP OP_DROP OP_DUP OP_DROP ..) only has the allocations accounted for, with no discount given for the immediate freeing, so it gets treated as having the same cost as (OP_DUP OP_DUP .. OP_DROP OP_DROP ..). Doing it that way would be a worse than how bitcoin script is currently costed, but doing better might mean locking in an evaluation method at the consensus level. Seems worth looking into, at least. But otherwise, it seems a pretty good match. I think you'd need about 40 opcodes to match bitcoin script and (roughly) chia lisp, something like: q - quote a - apply x - exception / immediately fail (OP_RETURN style) i - if/then/else softfork - upgradability not, all, any - boolean logic bitand, bitor, bitxor, bitnot, shift - bitwise logic = - bitwise equality > - + * / divmod - (signed, bignum) arithmetic ashift - arithmetic shift (sign extended) >s - string comparison strlen, substr, concat - string ops f, r, c, l - list ops (head, tail, make a list, is this a list?) sha256 - hashing numequal - arithmetic equal, equivalent to (= (+ a 0) (+ b 0)) ripemd160, hash160, hash256 - more hashing bip342-txmsg - given a sighash byte, construct the bip342 message bip340-verify - given a pubkey, message, and signature bip340 verify it tx - get various information about the tx taproot - get merkle path/internalpubkey/program/annex information ecdsa - same as bip340-verify, except for traditional ecdsa? secp256k1-muladd - given (a B C) where B,C are points, calculate a*B+C? That compares to about 60 (non-disabled) opcodes in current script. Pretty much all the opcodes in the first section are directly from chia lisp, while all the rest are to complete the "bitcoin" functionality. The last two are extensions that are more food for thought than a real proposal. Using a lisp-style approach seems an improvement in general to me. For example, rather than the streaming-sha256 approach in Elements, where you could write: "a" SHA256INITIALIZE "b" SHA256UPDATE "c" SHA256UPDATE "d" SHA256FINALIZE to get the sha256 of "abcd" without having to CAT them first (important if they'd potentially overflow the 520B stack item limit), in chia lisp you write: (sha256 "a" "b" "c" "d") which still has the benefit of streaming the inputs into the function, but only adds a single opcode, doesn't involve representing the internal sha256 midstate on the stack, and generally seems easier to understand, at least to me. As another example, following the traditional functional "tail recursion instead of for-loops" approach, doing CHECKMULTISIG might become something like: (defun checksig (sig key) bip340-verify (f sig) (bip342-txmsg (r sig)) key) (defun checkmultisig (sigs keys k) if (= k 0) 1 (if (l sigs) (if (checksig (f sigs) (f keys)) (checkmultisig (r sigs) (r keys) (- k 1)) (checkmultisig sigs (r keys) k) ) 0 ) ) Here each "sig" is a pair of a 64B bip340 signature and a 1B sighash; instead of a 65B string combining both, and sigs, keys are lists, and k is the number of successful signature checks you're requiring for success. Of course, "defun" and "if" aren't listed as opcodes above; instead you have a compiler that gives you nice macros like defun and translates them into correct uses of the "a" opcode, etc. As I understand it, those sort of macros and translations are pretty well understood across lisp-like languages, and, of course, they're already implemented for chia lisp. I think with the "tx" opcode defined similarly to how Rusty suggested it [2] you could implement OP_CTV-like behaviours in similar way, and also replace "bip342-txmsg" with your own code to generate SIGHASH_ANYPREVOUT or SIGHASH_GROUP style messages to sign. (This would mean also being able to pull information about the utxo being spent to obtain its amount and scriptpubkey, which are committed to wit ANYPREVOUT. If it was also able to obtain the "is_coinbase" flag, that might allow you a more accurate covenant-based implementation of drivechains...) Likewise, with the "taproot" opcode defined in a way that lets you extract out the internal public key and merkle path, I think you could implement OP_TLUV and OP_EVICT with a similar recursive approach. There's two ways to think about upgradability here; if someday we want to add new opcodes to the language -- perhaps something to validate zero knowledge proofs or calculate sha3 or use a different ECC curve, or some way to support cross-input signature aggregation, or perhaps it's just that some snippets are very widely used and we'd like to code them in C++ directly so they validate quicker and don't use up as much block weight. One approach is to just define a new version of the language via the tapleaf version, defining new opcodes however we like. The other is to use the "softfork" opcode -- chia defines it as: (softfork cost code) though I think it would probably be better if it were (softfork cost version code) where the idea is that "code" will use the "x" opcode if there's a problem, and anyone supporting the "version" softfork can verify that there aren't any problems at a cost of "cost". However, whether you do or don't support that softfork, as far as the rest of the script is concerned, the expression will either fail entirely or evaluate as zero; so anyone who doesn't support the softfork can just replace it with zero and continue on, treating it as if it had costed "cost" units. One thing worth noting: "softfork" behaves more like OP_NOP than tapscript's OP_SUCCESS -- I think it's just not possible in general to have OP_SUCCESS-like behaviour if you're trying to allow accepting code from the witness data -- otherwise as soon as you reveal that your script does accept arbitrary code supplied by the spender, someone could stick in an OP_SUCCESS code, and remove all the restrictions on spending and steal your funds. To me, it seems like chia lisp is a better answer to the problem here than the Simplicity language. Simplicity is complicated in a few ways: - it defines over 100 jets (plus another 6 for sha3, and another 45 for individual libsecp256k1 functions) that need to be implemented natively to efficiently track consensus [3] - as far as I know, how to soft-fork in new jets isn't yet well established. I think the ideal is that you just write everything in raw simplicity, and either the interpreter has a jet and does things quickly, or doesn't, and gets the same result much more slowly [4]. But that approach doesn't seem compatible with maintaining consensus, when "slowly" can be slower by more than 6 orders of magnitude [5]. - to understand what's going on with a smart contract, you need to understand both simplicity (to define the program) and the bit machine (to follow how it's computed), both of which are fairly novel -- and if nobody's directly coding in simplicity which seems likely, you're adding a third layer on top of that: you want to understand what the programmer asked for (the source), what is included in the transaction (the simplicity code) and how that's executed by consensus code (via the bit machine). There's a branch for enabling simplicity on elements/liquid [6] to see what enabling simplicity might involve in concrete terms; it just... doesn't seem at all simple in practice to me. I can't see how you'd reasonably translate that simplicity code into a BIP series that anyone could understand, or how you'd do a thorough review of all the changes... By contrast, chia lisp has fewer opcodes than Simplicity's jets, has feasible approaches to low-impact soft forks to increase functionality, can be used with only two levels of abstraction (lisp with macros and the opcodes-only vm level) that seem not too bad to understand, and (in my opinion) doesn't seem too hard to implement/maintain reasonably. On the other hand, Simplicity's big advantage over *everything* else is in formal verification. But I'm not really seeing why you couldn't preserve that advantage by writing simplicity definitions for the "lisp" opcodes [7], so that you can "compile" the lisp programs to simplicity, and then verify them however you like. One of the things people sometimes claim about bitcoin as an asset, is that it's got both the advantage of having been first to market, but also that if some altcoin comes along with great new ideas, then those ideas can just be incorporated into bitcoin too, so bitcoin can preserve it's lead even from innovators. Granted, I've only really been looking at chia lisp for a bit over a week, but it really seems to me like a case where it might be worth putting that philosophy into practice. If we were to adopt this, obviously we shouldn't call it "chia lisp" anymore, since it wouldn't work the same in important ways. But since the program would be encoded as a binary-tree of car/cdr pairs, maybe we could call it "binary-tree coded script", or "btc-script" for short... PS: related tweets: [8] Cheers, aj [0] You could also allow things to be pushed onto the stack that (recursively) can push things onto the stack -- the language "Joy" takes this approach. It seems to end up equivalent to doing things in a list oriented way to me. [1] https://chialisp.com/docs/ref/clvm [2] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-February/019871.html [3] https://raw.githubusercontent.com/ElementsProject/simplicity/pdf/Simplicity-TR.pdf [4] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-November/015244.html https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-October/015227.html https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-October/015228.html [5] https://medium.com/blockstream/simplicity-jets-release-803db10fd589 [6] https://github.com/ElementsProject/elements/compare/simplicity [7] At least I think so? chia lisp does multibyte math, that is "+" can accept "arbitrary" length bytestrings, which it interprets as numbers and adds together; whereas Simplicity requires finite types. I think you could decide "a program that only costs X can't have any bytestrings greater than length k*X" and construct a finite type up to that length, maybe? So long as you're only using it for verification, maybe that stays feasible? Or perhaps you could arbitrarily limit the strings to a max of 520 bytes at a consensus level, and the corresponding Simplicity types to 4160 bits and go from there? [8] https://twitter.com/brian_trollz/status/1499048316956549123 https://twitter.com/jb55/status/1499045998315724801 [9] Oops, out of order footnotes. Anyway... [10] [9] The CREATE/ASSERT bundling stuff is interesting; and could be used to achieve functionality like the "transaction sponsorship" stuff. It doesn't magically solve the issues with maintaining the mempool and using that to speed up block acceptance, though, and the chia chain has apparently suffered from mempool-flooding attacks recently [11] so I don't think they've solved the broader problem, and thus I think it still makes more sense to stick with bitcoin's current model here. [11] https://thechiaplot.net/2021/11/03/interview-with-the-chia-dust-stormer/ https://github.com/Chia-Network/post-mortem/blob/main/post-mortem.md