summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorPeter Todd <pete@petertodd.org>2016-06-17 00:34:35 -0400
committerbitcoindev <bitcoindev@gnusha.org>2016-06-17 04:34:43 +0000
commit490eca7ffa7da4626a3729c2e714d96ba2ee5a14 (patch)
tree87ade4ec504416a4d86827ef2139480526bd1c03
parentc81d0794122576abb1875dd410bf499e9e363bc8 (diff)
downloadpi-bitcoindev-490eca7ffa7da4626a3729c2e714d96ba2ee5a14.tar.gz
pi-bitcoindev-490eca7ffa7da4626a3729c2e714d96ba2ee5a14.zip
Re: [bitcoin-dev] Merkle trees and mountain ranges
-rw-r--r--d1/5ce84d761db9ad8cddc04ca48d7db2a15303ae306
1 files changed, 306 insertions, 0 deletions
diff --git a/d1/5ce84d761db9ad8cddc04ca48d7db2a15303ae b/d1/5ce84d761db9ad8cddc04ca48d7db2a15303ae
new file mode 100644
index 000000000..8f14c7843
--- /dev/null
+++ b/d1/5ce84d761db9ad8cddc04ca48d7db2a15303ae
@@ -0,0 +1,306 @@
+Return-Path: <pete@petertodd.org>
+Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
+ [172.17.192.35])
+ by mail.linuxfoundation.org (Postfix) with ESMTPS id 2742E7AA
+ for <bitcoin-dev@lists.linuxfoundation.org>;
+ Fri, 17 Jun 2016 04:34:43 +0000 (UTC)
+X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
+Received: from outmail148114.authsmtp.net (outmail148114.authsmtp.net
+ [62.13.148.114])
+ by smtp1.linuxfoundation.org (Postfix) with ESMTP id 16C2B170
+ for <bitcoin-dev@lists.linuxfoundation.org>;
+ Fri, 17 Jun 2016 04:34:41 +0000 (UTC)
+Received: from mail-c247.authsmtp.com (mail-c247.authsmtp.com [62.13.128.247])
+ by punt20.authsmtp.com (8.14.2/8.14.2/) with ESMTP id u5H4YdSm072714;
+ Fri, 17 Jun 2016 05:34:39 +0100 (BST)
+Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com
+ [52.5.185.120]) (authenticated bits=0)
+ by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id u5H4YaOu039543
+ (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
+ Fri, 17 Jun 2016 05:34:37 +0100 (BST)
+Received: from [127.0.0.1] (localhost [127.0.0.1])
+ by petertodd.org (Postfix) with ESMTPSA id AB44840110;
+ Fri, 17 Jun 2016 04:32:38 +0000 (UTC)
+Received: by localhost (Postfix, from userid 1000)
+ id 111FA20738; Fri, 17 Jun 2016 00:34:35 -0400 (EDT)
+Date: Fri, 17 Jun 2016 00:34:35 -0400
+From: Peter Todd <pete@petertodd.org>
+To: Bram Cohen <bram@bittorrent.com>
+Message-ID: <20160617043435.GA12800@fedora-21-dvm>
+References: <CA+KqGkosGrWQ2Lpg3Ptc+UyidK7H4_HLQ_QWx2DOAFRmLkv4zQ@mail.gmail.com>
+ <20160616001040.GA5026@fedora-21-dvm>
+ <CA+KqGkqAHcU2PzEX6OmorXRBQ22eF_QBYYqUDS1v_ZvevhLCuQ@mail.gmail.com>
+ <20160616032612.GA7792@fedora-21-dvm>
+ <CA+KqGkqA2WnvBEck3kv6p2no-9wzNVCTNA-MGw=Jg=gMGfrxUQ@mail.gmail.com>
+MIME-Version: 1.0
+Content-Type: multipart/signed; micalg=pgp-sha256;
+ protocol="application/pgp-signature"; boundary="lrZ03NoBR/3+SXJZ"
+Content-Disposition: inline
+In-Reply-To: <CA+KqGkqA2WnvBEck3kv6p2no-9wzNVCTNA-MGw=Jg=gMGfrxUQ@mail.gmail.com>
+User-Agent: Mutt/1.5.23 (2014-03-12)
+X-Server-Quench: cc9f5dec-3444-11e6-bcde-0015176ca198
+X-AuthReport-Spam: If SPAM / abuse - report it at:
+ http://www.authsmtp.com/abuse
+X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR
+ aAdMdwUUEkAaAgsB AmAbW1FeU1l7WmQ7 bghPaBtcak9QXgdq
+ T0pMXVMcUQAQBXVQ eVseURB3cwYIenxx ZEYsDSNSCkEuIBVg
+ QUpQEXAHZDJmdTJM BBVFdwNVdQJNeEwU a1l3GhFYa3VsNCMk
+ FAgyOXU9MCtqYAFY WAIJIBoOW0sGBXY1 QRxKGDIyG1EMRiN7 NRUgJVMHdAAA
+X-Authentic-SMTP: 61633532353630.1038:706
+X-AuthFastPath: 0 (Was 255)
+X-AuthSMTP-Origin: 52.5.185.120/25
+X-AuthVirus-Status: No virus detected - but ensure you scan with your own
+ anti-virus system.
+X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
+ autolearn=ham version=3.3.1
+X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
+ smtp1.linux-foundation.org
+Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
+Subject: Re: [bitcoin-dev] Merkle trees and mountain ranges
+X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
+X-Mailman-Version: 2.1.12
+Precedence: list
+List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
+List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
+ <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
+List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
+List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
+List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
+List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
+ <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
+X-List-Received-Date: Fri, 17 Jun 2016 04:34:43 -0000
+
+
+--lrZ03NoBR/3+SXJZ
+Content-Type: text/plain; charset=us-ascii
+Content-Disposition: inline
+Content-Transfer-Encoding: quoted-printable
+
+On Thu, Jun 16, 2016 at 02:07:26AM -0700, Bram Cohen wrote:
+> On Wed, Jun 15, 2016 at 8:26 PM, Peter Todd <pete@petertodd.org> wrote:
+> Okay, clearly my assumptions about the parts of that post I didn't read
+> carefully were way off. I'll have to look through it carefully to be able
+> to make coherent apples to apples comparisons.
+
+Thanks!
+
+> > I'm worried that once there's real transaction fees everyone might stop
+> > > consolidating dust and the set of unspent transactions might grow wit=
+hout
+> > > bound as well, but that's a topic for another day.
+> >
+> > Ok, but then if you're concerned about that risk, why introduce a data
+> > structure - the STXO set - that's _guaranteed_ to grow without bound?
+> >
+>=20
+> I'm not proposing STXO set commitments either. My point was that there
+> should be incentives for collecting dust. That has nothing to do with this
+> thread though and should be discussed separately (also I don't feel like
+> discussing it because I don't have a good proposal).
+
+Ah, yeah, I misunderstood you there; as expected absolutely no-one is propo=
+sing
+STXO set commitments. :)
+
+> > > The main differences to your patricia trie are the non-padding sha256=
+ and
+> > > that each level doesn't hash in a record of its depth and the usage of
+> > > ONLY0 and ONLY1.
+> >
+> > I'm rather confused, as the above sounds nothing like what I've
+> > implemented,
+> > which only has leaf nodes, inner nodes, and the special empty node
+> > singleton,
+> > for both the MMR and merbinner trees.
+> >
+>=20
+> It's quite a bit like merbinner trees. I've basically taken the leaf nodes
+> and smushed them into the inner nodes above them, thus saving a hashing
+> operation and some memory. They're both binary radix trees.
+
+Ah, I see what you mean now.
+
+So above you said that in merbinner trees each node "hash[es] in a record of
+its depth" That's actually incorrect: each node commits to the prefix that =
+all
+keys below that level start with, not just the depth.
+
+This means that in merbinner trees, cases where multiple keys share parts of
+the same prefix are handled efficiently, without introducing extra levels
+unnecessarily; there's no need for the ONLY0/1 nodes as the children of an
+inner node will always be on different sides.
+
+When keys are randomly distributed, this isn't a big deal; OTOH against
+attackers who are choosing keys, e.g. by grinding hashes, merbinner trees
+always have maximum depths in proportion to log2(n) of the actual number of
+items in the tree. Grinding is particularly annoying to deal with due to the
+birthday attack: creating a ground prefix 64 bits long only takes 32 bits w=
+orth
+of work.
+
+
+In my deterministic expressions work one of the ideas I've been tossing aro=
+und
+is rather than always using hash digests directly for when you need to comm=
+it
+to some data, we could instead extend the idea of a digest to that of a
+"commitment", where a commitment is simply some short, but variable-sized,
+string that uniquely maps to a given set of data. Secondly, commitments do
+*not* always guarantee that the original data can't be recovered from the
+commitment itself.
+
+By allowing commitments to be variable sized - say 0 to ~64 bytes - we get a
+number of advantages:
+
+1) Data shorter than the length of a digest (32 bytes) can be included in t=
+he
+commitment itself, improving efficiency.
+
+2) Data a little longer than a digest can have hashing delayed, to better f=
+ill
+up blocks.
+
+In particular, case #2 handles your leaf node optimizations generically,
+without special cases and additional complexity. It'd also be a better way =
+to
+do the ONLY0/1 cases, as if the "nothing on this side" symbol is a single b=
+yte,
+each additional colliding level would simply extend the commitment without
+hashing. In short, you'd have nearly the same level of optimization even if=
+ at
+the cryptography level your tree consists of only leaves, inner nodes, and =
+nil.
+
+Another advantage of variable sized commitments is that it can help make cl=
+ear
+to users when it's possible to brute force the message behind the commitmen=
+t.
+For instance, digest from a hashed four byte integer can be trivially rever=
+sed
+by just trying all combinations. Equally, if that integer is concatenated w=
+ith
+a 32 byte digest that the attacker knows, the value of the integer can be b=
+rute
+forced.
+
+> > Technically even a patricia trie utxo commitment can have sub-1 cache
+> > > misses per update if some of the updates in a single block are close =
+to
+> > > each other in memory. I think I can get practical Bitcoin updates down
+> > to a
+> > > little bit less than one l2 cache miss per update, but not a lot less.
+> >
+> > I'm very confused as to why you think that's possible. When you say
+> > "practical
+> > Bitcoin updates", what exactly is the data structure you're proposing to
+> > update? How is it indexed?
+>=20
+>=20
+> My calculations are: a Bitcoin block contains about 2000 updates. The l2
+> cache is about 256 kilobytes, and if an update is about 32 bytes times two
+> for the parents, grandparents, etc. then an l2 cache can contain about 40=
+00
+> values. If the current utxo size is about 2000 * 4000 =3D 8,000,000 in si=
+ze
+> then about half the pages which contain a transaction will contain a seco=
+nd
+> one. I think the utxo set is currently about an order of magnitude greater
+> than that, so the number of such collisions will be fairly mall, hence my
+> 'less than one but not a lot less' comment.
+
+Your estimate of updates requiring 32 bytes of data is *way* off.
+
+Each inner node updated on the path to a leaf node will itself require 32 b=
+ytes
+of data to be fetched - the digest of the sibling. As of block 416,628, the=
+re
+are 39,167,128 unspent txouts, giving us a tree about 25 levels deep.
+
+So if I want to update a single leaf, I need to read:
+
+ 25 nodes * 32 bytes/node =3D 800 bytes
+
+of data. Naively, that'd mean our 2,000 updates needs to read 1.6MB from RA=
+M,
+which is 6.4x bigger than the L2 cache - it's just not going to fit.
+
+Taking into account the fact that this is a batched update improves things a
+little bit. For a node at level i with random access patterns and N accesses
+total our amortised cost is 1/(1 + N/2^i) Summing that over 2,000 leaf upda=
+tes
+and 25 levels gives us ~29,000 total updates, 0.9MB, which is still a lot
+larger than L2 cache.
+
+While this might fit in L3 cache - usually on the order of megabytes - this=
+ is
+a rather optimistic scenario anyway: we're assuming no other cache pressure=
+ and
+100% hit rate.
+
+Anyway hashing is pretty slow. The very fast BLAKE2 is about 3 cycles/byte
+(SHA256 is about 15 cycles/byte) so hashing that same data would take around
+200 cycles, and probably quite a bit more in practice due to overheads from=
+ our
+short message lengths; fetching a cache line from DRAM only takes about 1,0=
+00
+cycles. I'd guess that once other overheads are taken into account, even if=
+ you
+could eliminate L2/L3 cache-misses it wouldn't be much of an improvement.
+
+> As for how it's indexed, at a crypto definition level it's just a binary
+> radix tree. In terms of how it's indexed in memory, that involves some
+> optimizations to avoid cache misses. Memory is allocated into blocks of
+> about the size of an 12 cache (or maybe an l1 cache, it will require some
+> testing and optimization). Blocks are either branch blocks, which keep
+> everything in fixed positions, or leaf blocks, which contain fixed size
+> entries for nodes plus indexes within the same leaf block of their
+> children. Branch blocks can have many children which can be either branch
+> blocks or leaf blocks, but typically are either all branch blocks or all
+> leaf blocks. Branch blocks always have exactly one parent. Leaf blocks
+> always have all their inputs come from a single branch block, but there c=
+an
+> be multiple ones of those. When a branch block overflows it first tries to
+> put stuff into the last leaf block it used, and if there's no more room it
+> allocates a new one. It's fairly common for branches to have just a few
+> leaf children, but they also could have a lot, depending on whether the
+> base 2 log of the number of things currently in the set modulo the number
+> levels in a branch is a small number.
+>=20
+> Usually when an update is done it consists of first checking the
+> appropriate output of the root block (it's jumped to directly to avoid
+> unnecessary memory lookups. If there's nothing there the algorithm will
+> walk back until it finds something.) That leads directly to (usually)
+> another branch whose output is jumped to directly again. At Bitcoin utxo
+> set sizes that will usually lead to a leaf block, which is then walked do=
+wn
+> manually to find the actual terminal node, which is then updated, and the
+> parent, grandparent, etc. is then marked invalid until something which was
+> already marked invalid is hit, and it exits. Calculation of hash values is
+> done lazily.
+
+I think it's safe to say that given our working set is significantly larger
+than the L2/L3 cache available, none of the above optimizations are likely =
+to
+help much. Better to just keep the codebase simple and use standard techniq=
+ues.
+
+--=20
+https://petertodd.org 'peter'[:-1]@petertodd.org
+
+--lrZ03NoBR/3+SXJZ
+Content-Type: application/pgp-signature; name="signature.asc"
+Content-Description: Digital signature
+
+-----BEGIN PGP SIGNATURE-----
+
+iQEcBAEBCAAGBQJXY33YAAoJEGOZARBE6K+yRpMH/iu6jy2HeYSemU3z0hjnRf/x
+h6C7fEGFM0QvQ5xYB6VL+FnRxNMmiG94rLDlUxTlrQ6U3OvFGPZhOYVI1hyUgzFW
+98kB1RfWI233sdu/XKAp93zPKTJ7qT/GQpk2MQG58/gJmtKFDuWV40FYJlHI+62s
+nhzEKoq9HLtCNcXUmfAWj/lGaHMhGQlLVs7O1v26lo7Ez8KO1fxFyrtg97LtVzFz
+4qTqG3Lk4ic3snBMq48EVLtKMa+jO0TliZhUshEvtgA/T6FprEv1dpLDPyAs44W5
+8D1T4Uki84SYRnGq1IMfXtran9H61ez+XxYU54f0k3yaT5BPD4v3B7zHlrlgyO4=
+=EcTA
+-----END PGP SIGNATURE-----
+
+--lrZ03NoBR/3+SXJZ--
+