Return-Path: <pete@petertodd.org>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 4006A957
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 16 Jun 2016 03:26:20 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from outmail149082.authsmtp.co.uk (outmail149082.authsmtp.co.uk
	[62.13.149.82])
	by smtp1.linuxfoundation.org (Postfix) with ESMTP id F2DC4100
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 16 Jun 2016 03:26:18 +0000 (UTC)
Received: from mail-c232.authsmtp.com (mail-c232.authsmtp.com [62.13.128.232])
	by punt20.authsmtp.com (8.14.2/8.14.2/) with ESMTP id u5G3QGZ0067346;
	Thu, 16 Jun 2016 04:26:16 +0100 (BST)
Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com
	[52.5.185.120]) (authenticated bits=0)
	by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id u5G3QF86086084
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 16 Jun 2016 04:26:15 +0100 (BST)
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by petertodd.org (Postfix) with ESMTPSA id 7313240092;
	Thu, 16 Jun 2016 03:24:18 +0000 (UTC)
Received: by localhost (Postfix, from userid 1000)
	id 44D8920738; Wed, 15 Jun 2016 23:26:12 -0400 (EDT)
Date: Wed, 15 Jun 2016 23:26:12 -0400
From: Peter Todd <pete@petertodd.org>
To: Bram Cohen <bram@bittorrent.com>
Message-ID: <20160616032612.GA7792@fedora-21-dvm>
References: <CA+KqGkosGrWQ2Lpg3Ptc+UyidK7H4_HLQ_QWx2DOAFRmLkv4zQ@mail.gmail.com>
	<20160616001040.GA5026@fedora-21-dvm>
	<CA+KqGkqAHcU2PzEX6OmorXRBQ22eF_QBYYqUDS1v_ZvevhLCuQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="GvXjxJ+pjyke8COw"
Content-Disposition: inline
In-Reply-To: <CA+KqGkqAHcU2PzEX6OmorXRBQ22eF_QBYYqUDS1v_ZvevhLCuQ@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Server-Quench: 159850b5-3372-11e6-829e-00151795d556
X-AuthReport-Spam: If SPAM / abuse - report it at:
	http://www.authsmtp.com/abuse
X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR
	aAdMdwQUEkAaAgsB AmAbW1BeUlt7WGU7 bghPaBtcak9QXgdq
	T0pMXVMcUQAfAn13 DhgeWh9yfwEIeXh4 YEYsX3VSVEF6J0Ng
	QU1TF3AHZDJmdWgd WRVFdwNVdQJNdxoR b1V5GhFYa3VsNCMk
	FAgyOXU9MCtqYAFY WAIJIBoOW0sGBXY1 QRxKGDIyG1EMRiN7 NRUgJVMHdAAA
X-Authentic-SMTP: 61633532353630.1037:706
X-AuthFastPath: 0 (Was 255)
X-AuthSMTP-Origin: 52.5.185.120/25
X-AuthVirus-Status: No virus detected - but ensure you scan with your own
	anti-virus system.
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
	autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Merkle trees and mountain ranges
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Jun 2016 03:26:20 -0000


--GvXjxJ+pjyke8COw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jun 15, 2016 at 06:16:26PM -0700, Bram Cohen wrote:
> On Wed, Jun 15, 2016 at 5:10 PM, Peter Todd <pete@petertodd.org> wrote:
>=20
> > On Tue, Jun 14, 2016 at 05:14:23PM -0700, Bram Cohen via bitcoin-dev wr=
ote:
> > >
> > > Peter proposes that there should be both UTXO and STXO commitments, a=
nd
> >
> > No, that's incorrect - I'm only proposing TXO commitments, not UTXO nor
> > STXO
> > commitments.
> >
>=20
> What do you mean by TXO commitments? If you mean that it only records
> insertions rather than deletions, then that can do many of the same proofs
> but has no way of proving that something is currently in the UTXO set,
> which is functionality I'd like to provide.

I think you need to re-read my original post on TXO commitments, specifical=
ly
where I say:

    # TXO Commitments

    A merkle tree committing to the state of __all transaction outputs, bot=
h spent
    and unspent__, we can provide a method of compactly proving the current=
 state of
    an output.

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012715.html

> When I say 'merkle tree' what I mean is a patricia trie. What I assume is
> meant by a merkle mountain range is a series of patricia tries of
> decreasing size each of which is an addition to the previous one, and
> they're periodically consolidated into larger tries so the old ones can go
> away. This data structure has the nice property that it's both in sorted
> order and has less than one cache miss per operation because the
> consolidation operations can be batched and done linearly. There are a
> number of different things you could be describing if I misunderstood.

Nope, MMR's are completely unlike what you just described.

> > I'm not proposing STXO commitments precisely because the set of _spent_
> > transactions grows without bound.
>=20
>=20
> I'm worried that once there's real transaction fees everyone might stop
> consolidating dust and the set of unspent transactions might grow without
> bound as well, but that's a topic for another day.

Ok, but then if you're concerned about that risk, why introduce a data
structure - the STXO set - that's _guaranteed_ to grow without bound?

> > > Now I'm going to go out on a limb. My thesis is that usage of a mount=
ain
> > > range is unnecessary, and that a merkle tree in the raw can be made
> > > serviceable by sprinkling magic pixie dust on the performance problem.
> >
> > It'd help if you specified exactly what type of merkle tree you're talk=
ing
> > about here; remember that the certificate transparency RFC appears to h=
ave
> > reinvented merkle mountain ranges, and they call them "merkle trees".
> > Bitcoin
> > meanwhile uses a so-called "merkle tree" that's broken, and Zcash uses a
> > partially filled fixed-sized perfect tree.
> >
>=20
> What I'm making is a patricia trie. Its byte level definition is very
> similar to the one in your MMR codebase.

Which codebase exactly? I have both a insertion-ordered list (MMR) and a
key:value mapping (referred to as a "merbinner tree" in the codebase) in the
proofchains codebase. They're very different data structures.

> Each level of the tree has a single metadata byte and followed by two
> hashes. The hashes are truncated by one byte and the hash function is a
> non-padding variant of sha256 (right now it's just using regular sha256,
> but that's a nice optimization which allows everything to fit in a single
> block).
>=20
> The possible metadata values are: TERM0, TERM1, TERMBOTH, ONLY0, ONLY1,
> MIDDLE. They mean:
>=20
> TERM0, TERM1: There is a single thing in the tree on the specified side.
> The thing hashed on that side is that thing verbatim. The other side has
> more than one thing and the hash of it is the root of everything below.
>=20
> TERMBOTH: There are exactly two things below and they're included inline.
> Note that two things is a special case, if there are more you sometimes
> have ONLY0 or ONLY1.
>=20
> ONLY0, ONLY1: There are more than two things below and they're all on the
> same side. This makes proofs of inclusion and exclusion simpler, and makes
> some implementation details easier, for example there's always something =
at
> every level with perfect memory positioning. It doesn't cause much extra
> memory usage because of the TERMBOTH exception for exactly two things.
>=20
> MIDDLE: There two or more things on both sides.
>=20
> There's also a SINGLETON special case for a tree which contains only one
> thing, and an EMPTY special value for tree which doesn't contain anything.
>=20
> The main differences to your patricia trie are the non-padding sha256 and
> that each level doesn't hash in a record of its depth and the usage of
> ONLY0 and ONLY1.

I'm rather confused, as the above sounds nothing like what I've implemented,
which only has leaf nodes, inner nodes, and the special empty node singleto=
n,
for both the MMR and merbinner trees.

> > I'm having a hard time understanding this paragraph; could you explain
> > what you
> > think is happening when things are "merged into larger hills"?
> >
>=20
> I'm talking about the recalculation of mountain tips, assuming we're on t=
he
> same page about what 'MMR' means.

Yeah, we're definitely not...

In MMR's append operations never need to modify mountain contents.

> > As UTXO/STXO/TXO sets are all enormously larger than L1/L2 cache, it's
> > impossible to get CPU cache misses below one for update operations. The
> > closest
> > thing to an exception is MMR's, which due to their insertion-ordering c=
ould
> > have good cache locality for appends, in the sense that the mountain ti=
ps
> > required to recalculate the MMR tip digest will already be in cache from
> > the
> > previous append. But that's not sufficient, as you also need to modify =
old
> > TXO's further back in the tree to mark them as spent - that data is goi=
ng
> > to be
> > far larger than L1/L2 cache.
> >
>=20
> This makes me think we're talking about subtly different things for MMRs.
> The ones I described above have sub-1 cache miss per update due to the
> amortized merging together of old mountains.

Again, see above.

> Technically even a patricia trie utxo commitment can have sub-1 cache
> misses per update if some of the updates in a single block are close to
> each other in memory. I think I can get practical Bitcoin updates down to=
 a
> little bit less than one l2 cache miss per update, but not a lot less.

I'm very confused as to why you think that's possible. When you say "practi=
cal
Bitcoin updates", what exactly is the data structure you're proposing to
update? How is it indexed?

--=20
https://petertodd.org 'peter'[:-1]@petertodd.org

--GvXjxJ+pjyke8COw
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----

iQEcBAEBCAAGBQJXYhxRAAoJEGOZARBE6K+y/TgH/id+TvgsceX/8+w+sneS0nfq
K/l9zNcizl6UwLwKYYih7iHQ3zq/9xS9/caI1W7RvGv1kbJtbce9RHSM06IfWzy3
hpgNbRge1kh5umzqxalhKx7KzcAtFHQU2gi/wBJFBrQ+zEn4FyDJov1hFOONupEE
vHRlrksXDblfAEaRBovzKfIYQlf8j3vgHg/UPPIP95cUaC5rRimrBKQ7jbC5676X
LwxFsZGeITp0jmPasgiTPq931YbCSzB7UYHDC8ppffTfL2B9k9zmfGGxnIDXOufB
yKelvGYr78oIAKVQVUlihrAmJJ01TwExRERfUYntSv+3/gSDjVOw4nH6i0cIt+M=
=80le
-----END PGP SIGNATURE-----

--GvXjxJ+pjyke8COw--