Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 4006A957 for ; Thu, 16 Jun 2016 03:26:20 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from outmail149082.authsmtp.co.uk (outmail149082.authsmtp.co.uk [62.13.149.82]) by smtp1.linuxfoundation.org (Postfix) with ESMTP id F2DC4100 for ; Thu, 16 Jun 2016 03:26:18 +0000 (UTC) Received: from mail-c232.authsmtp.com (mail-c232.authsmtp.com [62.13.128.232]) by punt20.authsmtp.com (8.14.2/8.14.2/) with ESMTP id u5G3QGZ0067346; Thu, 16 Jun 2016 04:26:16 +0100 (BST) Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com [52.5.185.120]) (authenticated bits=0) by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id u5G3QF86086084 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 16 Jun 2016 04:26:15 +0100 (BST) Received: from [127.0.0.1] (localhost [127.0.0.1]) by petertodd.org (Postfix) with ESMTPSA id 7313240092; Thu, 16 Jun 2016 03:24:18 +0000 (UTC) Received: by localhost (Postfix, from userid 1000) id 44D8920738; Wed, 15 Jun 2016 23:26:12 -0400 (EDT) Date: Wed, 15 Jun 2016 23:26:12 -0400 From: Peter Todd To: Bram Cohen Message-ID: <20160616032612.GA7792@fedora-21-dvm> References: <20160616001040.GA5026@fedora-21-dvm> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="GvXjxJ+pjyke8COw" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Server-Quench: 159850b5-3372-11e6-829e-00151795d556 X-AuthReport-Spam: If SPAM / abuse - report it at: http://www.authsmtp.com/abuse X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR aAdMdwQUEkAaAgsB AmAbW1BeUlt7WGU7 bghPaBtcak9QXgdq T0pMXVMcUQAfAn13 DhgeWh9yfwEIeXh4 YEYsX3VSVEF6J0Ng QU1TF3AHZDJmdWgd WRVFdwNVdQJNdxoR b1V5GhFYa3VsNCMk FAgyOXU9MCtqYAFY WAIJIBoOW0sGBXY1 QRxKGDIyG1EMRiN7 NRUgJVMHdAAA X-Authentic-SMTP: 61633532353630.1037:706 X-AuthFastPath: 0 (Was 255) X-AuthSMTP-Origin: 52.5.185.120/25 X-AuthVirus-Status: No virus detected - but ensure you scan with your own anti-virus system. X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Bitcoin Protocol Discussion Subject: Re: [bitcoin-dev] Merkle trees and mountain ranges X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Jun 2016 03:26:20 -0000 --GvXjxJ+pjyke8COw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jun 15, 2016 at 06:16:26PM -0700, Bram Cohen wrote: > On Wed, Jun 15, 2016 at 5:10 PM, Peter Todd wrote: >=20 > > On Tue, Jun 14, 2016 at 05:14:23PM -0700, Bram Cohen via bitcoin-dev wr= ote: > > > > > > Peter proposes that there should be both UTXO and STXO commitments, a= nd > > > > No, that's incorrect - I'm only proposing TXO commitments, not UTXO nor > > STXO > > commitments. > > >=20 > What do you mean by TXO commitments? If you mean that it only records > insertions rather than deletions, then that can do many of the same proofs > but has no way of proving that something is currently in the UTXO set, > which is functionality I'd like to provide. I think you need to re-read my original post on TXO commitments, specifical= ly where I say: # TXO Commitments A merkle tree committing to the state of __all transaction outputs, bot= h spent and unspent__, we can provide a method of compactly proving the current= state of an output. https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012715.html > When I say 'merkle tree' what I mean is a patricia trie. What I assume is > meant by a merkle mountain range is a series of patricia tries of > decreasing size each of which is an addition to the previous one, and > they're periodically consolidated into larger tries so the old ones can go > away. This data structure has the nice property that it's both in sorted > order and has less than one cache miss per operation because the > consolidation operations can be batched and done linearly. There are a > number of different things you could be describing if I misunderstood. Nope, MMR's are completely unlike what you just described. > > I'm not proposing STXO commitments precisely because the set of _spent_ > > transactions grows without bound. >=20 >=20 > I'm worried that once there's real transaction fees everyone might stop > consolidating dust and the set of unspent transactions might grow without > bound as well, but that's a topic for another day. Ok, but then if you're concerned about that risk, why introduce a data structure - the STXO set - that's _guaranteed_ to grow without bound? > > > Now I'm going to go out on a limb. My thesis is that usage of a mount= ain > > > range is unnecessary, and that a merkle tree in the raw can be made > > > serviceable by sprinkling magic pixie dust on the performance problem. > > > > It'd help if you specified exactly what type of merkle tree you're talk= ing > > about here; remember that the certificate transparency RFC appears to h= ave > > reinvented merkle mountain ranges, and they call them "merkle trees". > > Bitcoin > > meanwhile uses a so-called "merkle tree" that's broken, and Zcash uses a > > partially filled fixed-sized perfect tree. > > >=20 > What I'm making is a patricia trie. Its byte level definition is very > similar to the one in your MMR codebase. Which codebase exactly? I have both a insertion-ordered list (MMR) and a key:value mapping (referred to as a "merbinner tree" in the codebase) in the proofchains codebase. They're very different data structures. > Each level of the tree has a single metadata byte and followed by two > hashes. The hashes are truncated by one byte and the hash function is a > non-padding variant of sha256 (right now it's just using regular sha256, > but that's a nice optimization which allows everything to fit in a single > block). >=20 > The possible metadata values are: TERM0, TERM1, TERMBOTH, ONLY0, ONLY1, > MIDDLE. They mean: >=20 > TERM0, TERM1: There is a single thing in the tree on the specified side. > The thing hashed on that side is that thing verbatim. The other side has > more than one thing and the hash of it is the root of everything below. >=20 > TERMBOTH: There are exactly two things below and they're included inline. > Note that two things is a special case, if there are more you sometimes > have ONLY0 or ONLY1. >=20 > ONLY0, ONLY1: There are more than two things below and they're all on the > same side. This makes proofs of inclusion and exclusion simpler, and makes > some implementation details easier, for example there's always something = at > every level with perfect memory positioning. It doesn't cause much extra > memory usage because of the TERMBOTH exception for exactly two things. >=20 > MIDDLE: There two or more things on both sides. >=20 > There's also a SINGLETON special case for a tree which contains only one > thing, and an EMPTY special value for tree which doesn't contain anything. >=20 > The main differences to your patricia trie are the non-padding sha256 and > that each level doesn't hash in a record of its depth and the usage of > ONLY0 and ONLY1. I'm rather confused, as the above sounds nothing like what I've implemented, which only has leaf nodes, inner nodes, and the special empty node singleto= n, for both the MMR and merbinner trees. > > I'm having a hard time understanding this paragraph; could you explain > > what you > > think is happening when things are "merged into larger hills"? > > >=20 > I'm talking about the recalculation of mountain tips, assuming we're on t= he > same page about what 'MMR' means. Yeah, we're definitely not... In MMR's append operations never need to modify mountain contents. > > As UTXO/STXO/TXO sets are all enormously larger than L1/L2 cache, it's > > impossible to get CPU cache misses below one for update operations. The > > closest > > thing to an exception is MMR's, which due to their insertion-ordering c= ould > > have good cache locality for appends, in the sense that the mountain ti= ps > > required to recalculate the MMR tip digest will already be in cache from > > the > > previous append. But that's not sufficient, as you also need to modify = old > > TXO's further back in the tree to mark them as spent - that data is goi= ng > > to be > > far larger than L1/L2 cache. > > >=20 > This makes me think we're talking about subtly different things for MMRs. > The ones I described above have sub-1 cache miss per update due to the > amortized merging together of old mountains. Again, see above. > Technically even a patricia trie utxo commitment can have sub-1 cache > misses per update if some of the updates in a single block are close to > each other in memory. I think I can get practical Bitcoin updates down to= a > little bit less than one l2 cache miss per update, but not a lot less. I'm very confused as to why you think that's possible. When you say "practi= cal Bitcoin updates", what exactly is the data structure you're proposing to update? How is it indexed? --=20 https://petertodd.org 'peter'[:-1]@petertodd.org --GvXjxJ+pjyke8COw Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- iQEcBAEBCAAGBQJXYhxRAAoJEGOZARBE6K+y/TgH/id+TvgsceX/8+w+sneS0nfq K/l9zNcizl6UwLwKYYih7iHQ3zq/9xS9/caI1W7RvGv1kbJtbce9RHSM06IfWzy3 hpgNbRge1kh5umzqxalhKx7KzcAtFHQU2gi/wBJFBrQ+zEn4FyDJov1hFOONupEE vHRlrksXDblfAEaRBovzKfIYQlf8j3vgHg/UPPIP95cUaC5rRimrBKQ7jbC5676X LwxFsZGeITp0jmPasgiTPq931YbCSzB7UYHDC8ppffTfL2B9k9zmfGGxnIDXOufB yKelvGYr78oIAKVQVUlihrAmJJ01TwExRERfUYntSv+3/gSDjVOw4nH6i0cIt+M= =80le -----END PGP SIGNATURE----- --GvXjxJ+pjyke8COw--