author    | Bram Cohen <bram@bittorrent.com> | 2017-02-24 14:20:19 -0800
committer | bitcoindev <bitcoindev@gnusha.org> | 2017-02-24 22:20:22 +0000
commit    | eb049d723c552d5fc69a226b75e6468276f9d998 (patch)
tree      | e38fc7e9b3b750637352097f2065437468ed740a /3f/4604fef2a53a84e7c3c10aab47474ad070d17b
parent    | 9c631aa447f44f341a1cc70c4a75560890ab428e (diff)
Re: [bitcoin-dev] A Better MMR Definition
Diffstat (limited to '3f/4604fef2a53a84e7c3c10aab47474ad070d17b')
-rw-r--r-- | 3f/4604fef2a53a84e7c3c10aab47474ad070d17b | 485 |
1 file changed, 485 insertions, 0 deletions
diff --git a/3f/4604fef2a53a84e7c3c10aab47474ad070d17b b/3f/4604fef2a53a84e7c3c10aab47474ad070d17b
new file mode 100644
index 000000000..fe7e0570e

From: Bram Cohen <bram@bittorrent.com>
Date: Fri, 24 Feb 2017 14:20:19 -0800
To: Peter Todd <pete@petertodd.org>
Message-ID: <CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ@mail.gmail.com>
In-Reply-To: <20170224043613.GA32502@savin.petertodd.org>
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] A Better MMR Definition

So your idea is to cluster entries by entry time because newer things are
more likely to leave and updating multiple things near each other is
cheaper?

That can be done with my tool. Instead of using hashes for the values being
stored, you use position entries. The first entry gets a value of all
zeros, the next one a one followed by all zeros, then the next two
correspond to the first two with the second bit flipped to one, then the
next four the first four with the third bit flipped to one, etc. It
probably performs a little bit better to do it two bits at a time instead
of one so that the entries are 00, 01, 10, 11, 0001, 0010, 0011, 0101,
0110, 0111, 1001, etc. If you were to really use this you'd probably want
to add some optimizations to use the fact that the terminals fit in 64
bits instead of 256, but it mostly works unchanged, and gets whatever
benefits there are to this clustering plus the high-performance
implementation tricks I've built, which I keep complaining that nobody's
giving feedback on.

I'm not sold on this being a win: the empirical access patterns are
unknown, it requires an extra cache miss per lookup to find the entry
number, it may be that everything is optimized well enough without it for
there to be no meaningful gains, and it's a bunch of extra complexity. What
should be done is that a plain vanilla UTXO set solution is optimized as
well as it can be first, and then the insertion-ordering trick is tried as
an optimization to see if it's an improvement. Without that baseline
there's no meaningful basis for comparison, and I'm quite confident that a
naive implementation which just allocates individual nodes will
underperform the thing I've come up with, even without adding optimizations
related to fitting in 64 bits.
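Concretely, the one-bit-at-a-time assignment described above can be sketched
in a few lines of Python (a rough illustration only, not taken from the
actual merkle set code; position_key and the 4-bit display width are made up
for the example):

    # Hypothetical helper: key for the n-th inserted entry. Entry 0 is all
    # zeros, entry 1 is a one followed by zeros, and each later block of
    # 2^k entries repeats the earlier keys with the (k+1)-th bit from the
    # top set.
    def position_key(n, key_bits=256):
        key = 0
        bit = 0
        while n:
            if n & 1:
                key |= 1 << (key_bits - 1 - bit)   # set the bit-th bit from the top
            n >>= 1
            bit += 1
        return key

    # First eight keys, shown on 4 bits for readability:
    print([format(position_key(i, 4), '04b') for i in range(8)])
    # -> ['0000', '1000', '0100', '1100', '0010', '1010', '0110', '1110']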
On Thu, Feb 23, 2017 at 8:36 PM, Peter Todd <pete@petertodd.org> wrote:

> On Thu, Feb 23, 2017 at 07:32:43PM -0800, Bram Cohen wrote:
> > On Thu, Feb 23, 2017 at 7:15 PM, Peter Todd <pete@petertodd.org> wrote:
> >
> > > Glad we're on the same page with regard to what's possible in TXO
> > > commitments.
> > >
> > > Secondly, am I correct in saying your UTXO commitments scheme requires
> > > random access? While you describe it as a "merkle set", obviously to be
> > > merkelized it'll have to have an ordering of some kind. What do you
> > > propose that ordering to be?
> > >
> >
> > The ordering is by the bits in the hash. Technically it's a Patricia
> > Trie. I'm using 'merkle tree' to refer to basically anything with a hash
> > root.
>
> The hash of what? The values in the set?
>
> > > Maybe more specifically, what exact values do you propose to be in the
> > > set?
> > >
> >
> > That is unspecified in the implementation, it just takes a 256 bit value
> > which is presumably a hash of something. The intention is to nail down a
> > simple format and demonstrate good performance and leave those semantics
> > to a higher layer. The simplest thing would be to hash together the txid
> > and output number.
>
> Ok, so let's assume the values in the set are the unspent outpoints.
>
> Since we're ordering by the hash of the values in the set, outpoints will
> be distributed uniformly in the set, and thus the access pattern of data
> in the set is uniform.
>
> Now let's fast-forward 10 years. For the sake of argument, assume that for
> every 1 UTXO in the set that corresponds to funds in someone's wallet that
> are likely to be spent, there are 2^12 = 4096 UTXO's that have been
> permanently lost (and/or created in spam attacks) and thus will never be
> spent.
>
> Since lost UTXO's are *also* uniformly distributed, if I'm processing a
> new block that spends 2^12 = 4096 UTXO's, on average for each UTXO spent,
> I'll have to update log2(4096) = 12 more digests than I would have had
> those "dead" UTXO's not existed.
>
> Concretely, imagine our UTXO set had just 8 values in it, and we were
> updating two of them:
>
>                #
>               / \
>              /   \
>             /     \
>            /       \
>           /         \
>          #           #
>         / \         / \
>        /   \       /   \
>       #     .     .     #
>      / \   / \   / \   / \
>     .   X .   . .   . X   .
>
> To mark two coins as spent, we've had to update 5 inner nodes.
>
> Now let's look at what happens in an insertion-ordered TXO commitment
> scheme. For sake of argument, let's assume the best possible case, where
> every UTXO spent in that same block was recently created. Since the UTXO's
> are recently created, chances are almost every single one of those "dead"
> UTXO's will have been created in the past. Thus, since this is an
> insertion-ordered data structure, those UTXO's exist in an older part of
> the data structure that our new block doesn't need to modify at all.
>
> Concretely, again let's imagine a TXO commitment with 8 values in it, and
> two of them being spent:
>
>                #
>               / \
>              /   \
>             /     \
>            /       \
>           /         \
>          .           #
>         / \         / \
>        /   \       /   \
>       .     .     .     #
>      / \   / \   / \   / \
>     .   . .   . .   . X   X
>
> To mark two coins as spent, we've only had to update 3 inner nodes; while
> our tree is higher with those lost coins, those extra inner nodes are
> amortised across all the coins we have to update.
>
> The situation gets even better when we look at the *new* UTXO's that our
> block creates. Suppose our UTXO set has size n. To mark a single coin as
> spent, we have to update log2(n) inner nodes. We do get to amortise this a
> bit at the top levels in the tree, but even if we assume the amortisation
> is totally free, we're updating at least log2(n) - log2(m) inner nodes
> "under" the amortised nodes at the top of the tree for *each* new node.
>
> Meanwhile with an insertion-ordered TXO commitment, each new UTXO added to
> the data set goes in the same place - the end. So almost none of the
> existing data needs to be touched to add the new UTXOs. Equally, the
> hashing required for the new UTXO's can be done in an incremental fashion
> that's very L1/L2 cache friendly.
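(The two diagrams above can be checked with a small Python sketch, purely
illustrative, using ordinary array-heap indexing rather than either
implementation's actual layout: it counts the distinct inner nodes on the
paths from the spent leaves up to the root.)

    def inner_nodes_touched(num_leaves, spent_positions):
        """Count distinct inner nodes on the paths from spent leaves to the root."""
        touched = set()
        for pos in spent_positions:
            node = num_leaves + pos        # leaves occupy slots [num_leaves, 2*num_leaves)
            while node > 1:
                node //= 2                 # step up to the parent
                touched.add(node)
        return len(touched)

    # Hash-ordered set: the spent leaves land far apart (positions 1 and 6).
    print(inner_nodes_touched(8, [1, 6]))   # -> 5, as in the first diagram
    # Insertion-ordered commitment: the spent leaves sit together at the end.
    print(inner_nodes_touched(8, [6, 7]))   # -> 3, as in the second diagram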
> tl;dr: Precisely because access patterns in TXO commitments are *not*
> uniform, I think we'll find that from an L1/L2/etc cache perspective
> alone, TXO commitments will result in better performance than UTXO
> commitments.
>
> Now it is true that Bitcoin's current design means we'll need a map of
> confirmed outpoints to TXO insertion order indexes. But it's not
> particularly hard to add that "metadata" to transactions on the P2P layer
> in the same way that segwit added witnesses to transactions without
> modifying how txids were calculated; if you only connect to peers who
> provide you with TXO index information in blocks and transactions, you
> don't need to keep that map yourself.
>
> Finally, note how this makes transactions *smaller* in many circumstances:
> it's just an 8-byte max index rather than a 40-byte outpoint.
>
> --
> https://petertodd.org 'peter'[:-1]@petertodd.org
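A rough way to see the "new entries only touch the end" point in code, as a
toy Python sketch rather than the actual proposed commitment structure: an
MMR-style peak list where each append merges at most log2(n) right-edge
digests and hands back the insertion index a transaction could cite instead
of a full outpoint.

    import hashlib

    def h(*parts):
        return hashlib.sha256(b''.join(parts)).digest()

    class InsertionOrderedAccumulator:
        def __init__(self):
            self.peaks = []   # (height, digest) pairs, strictly decreasing heights
            self.count = 0    # total TXOs ever appended

        def append(self, leaf):
            index = self.count            # the small integer a tx could reference
            node = (0, leaf)
            # Merge equal-height peaks, like binary carry propagation; only
            # the right-hand edge of the structure is ever rewritten.
            while self.peaks and self.peaks[-1][0] == node[0]:
                height, left = self.peaks.pop()
                node = (height + 1, h(left, node[1]))
            self.peaks.append(node)
            self.count += 1
            return index

    acc = InsertionOrderedAccumulator()
    for i in range(6):
        acc.append(h(b'txo-%d' % i))
    print(acc.count, [height for height, _ in acc.peaks])   # -> 6 [2, 1]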