Return-Path: <pete@petertodd.org>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id B683F259
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 25 Feb 2017 04:12:08 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from outmail148100.authsmtp.co.uk (outmail148100.authsmtp.co.uk
	[62.13.148.100])
	by smtp1.linuxfoundation.org (Postfix) with ESMTP id 8C0321A0
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 25 Feb 2017 04:12:07 +0000 (UTC)
Received: from mail-c247.authsmtp.com (mail-c247.authsmtp.com [62.13.128.247])
	by punt24.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v1P4C5iY036068;
	Sat, 25 Feb 2017 04:12:05 GMT
Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com
	[52.5.185.120]) (authenticated bits=0)
	by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v1P4C3Ko008545
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 25 Feb 2017 04:12:04 GMT
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by petertodd.org (Postfix) with ESMTPSA id 0DE4E40576;
	Sat, 25 Feb 2017 04:12:03 +0000 (UTC)
Received: by localhost (Postfix, from userid 1000)
	id 27D40204AB; Fri, 24 Feb 2017 23:12:02 -0500 (EST)
Date: Fri, 24 Feb 2017 23:12:02 -0500
From: Peter Todd <pete@petertodd.org>
To: Bram Cohen <bram@bittorrent.com>
Message-ID: <20170225041202.GA11152@savin.petertodd.org>
References: <20170223235105.GA28497@savin.petertodd.org>
	<CA+KqGkowxEZeAFYa2JJchBDtRkg1p3YZNocivzu3fAtgRLDRBQ@mail.gmail.com>
	<20170224010943.GA29218@savin.petertodd.org>
	<CA+KqGkrOK76S3ffPJmpqYcBwtSeKESqN16yZsrwzDR6JZZmwFA@mail.gmail.com>
	<20170224025811.GA31911@savin.petertodd.org>
	<CA+KqGkq7gavAnAk-tcA+gxL2sWpv3ENhEmHrQHaPdyAsKrLjGg@mail.gmail.com>
	<20170224031531.GA32118@savin.petertodd.org>
	<CA+KqGkrfhg3GnbWwvKXHQ2NWuCnfzYyTPUxRhzYMuDBiNQR4eA@mail.gmail.com>
	<20170224043613.GA32502@savin.petertodd.org>
	<CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="6c2NcOVqGQ03X4Wi"
Content-Disposition: inline
In-Reply-To: <CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Server-Quench: 90cdee85-fb10-11e6-bcdf-0015176ca198
X-AuthReport-Spam: If SPAM / abuse - report it at:
	http://www.authsmtp.com/abuse
X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR
	aQdMdAcUHlAWAgsB AmEbW1BeUV97WWc7 bghPaBtcak9QXgdq
	T0pMXVMcUgQIBW8C fUEeUhF3cwAIen93 ZU4sV3AICBEvfUNg
	FBxVFXAHZDJmdTJM BBZFdwNVdQJNeEwU a1l3GhFYa3VsNCMk
	FAgyOXU9MCtqYA0d aAwRMV8ICWMuJHYQ Sh4DGzQzHEoDLwAA 
X-Authentic-SMTP: 61633532353630.1038:706
X-AuthFastPath: 0 (Was 255)
X-AuthSMTP-Origin: 52.5.185.120/25
X-AuthVirus-Status: No virus detected - but ensure you scan with your own
	anti-virus system.
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
	autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] A Better MMR Definition
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sat, 25 Feb 2017 04:12:08 -0000


--6c2NcOVqGQ03X4Wi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Feb 24, 2017 at 02:20:19PM -0800, Bram Cohen wrote:
> So your idea is to cluster entries by entry time because newer things are
> more likely to leave and updating multiple things near each other is
> cheaper?

Yes, exactly.

> That can be done with my tool. Instead of using hashes for the values bei=
ng
> stored, you use position entries. The first entry gets a value of all
> zeros, the next one a one followed by all zeros, then the next two
> correspond to the first two with the second bit flipped to one, then the
> next four the first four with the third bit flipped to one, etc. It
> probably performs a little bit better to do it two bits at a time instead
> of one so that the entries are 00, 01, 10, 11, 0001, 0010, 0011, 0101,
> 0110, 0111, 1001, etc. If you were to really use this you'd probably want
> to to add some optimizations to use the fact that the terminals fit in 64
> bits instead of 256, but it mostly works unchanged, and gets whatever

So to be clear, what you're proposing there is to use the insertion order as
the index - once you go that far you've almost entirely re-invented my
proposal!

In fact, when I was working my proofchains/proofmarshal libraries I put some
thought into whether or not I could leave out the MMR merkelized list
implementation and use only the key-value map I also wrote. I decided to
include both as they aren't quite the same datastructure - using a list for=
 a
list has advantages.

> benefits there are to this clustering plus the high performance
> implementation tricks I've built which I keep complaining that nobody's
> giving feedback on.

Your merkle-set implementation is 1500 lines of densely written Python with
almost no comments, and less than a 100 lines of (also uncommented) tests. =
By
comparison, my Python MMR implementation is 300 lines of very readable Pyth=
on
with lots of comments, a 200 line explanation at the top, and 200 lines of
(commented) tests. Yet no-one is taking the (still considerable) effort to
understand and comment on my implementation. :)

Fact is, what you've written is really daunting to review, and given it's n=
ot
in the final language anyway, it's unclear what basis to review it on anywa=
y. I
suspect you'd get more feedback if the codebase was better commented, in a
production language, and you have actual real-world benchmarks and performa=
nce
figures.

In particular, while at the top of merkle_set.py you have a list of advanta=
ges,
and a bunch of TODO's, you don't explain *why* the code has any of these
advantages. To figure that out, I'd have to read and understand 1500 lines =
of
densely written Python. Without a human-readable pitch, not many people are
going to do that, myself included.

> I'm not sold on this being a win: The empirical access patterns are
> unknown,

Lost coins alone guarantees that access patterns will be biased towards new
coins being more likely to be spent. That basis alone is sufficient to just=
ify
an insertion-ordered data structure. Additionally, people have done graphs =
of
the average age of UTXO's when spent, and that data clearly shows that newer
coins are more likely to be spent than older coins.

> unknown, it requires an extra cache miss per lookup to find the entry
> number,

Like I mentioned in the email you're replying to, that extra lookup can be
easily avoided with a change to how transactions/blocks are serialized; if =
all
your peers support TXO commitments you can even discard the lookup database
entirely, as it's only a backwards compatibility measure.

> it may be that everything is optimized well enough without it for
> there to be no meaningful gains, and it's a bunch of extra complexity. Wh=
at

Optimization is itself extra complexity. If you're data structure has worse
inherent performance, and you have to make up the different with a highly
optimized implementation, that's likely to lead to more overall complexity =
than
using a data structure with better inherent performance.

Your current merkle-set implementation definitely _is_ very complex. An
apples-to-apples comparison is with my merkelized key:value tree(1), also a
patricia tree, which like the MMR is only about 300 lines of well-commented=
 and
straight-forward code.

1) https://github.com/proofchains/python-proofmarshal/blob/master/proofmars=
hal/merbinnertree.py

> should be done is that a plain vanilla UTXO set solution is optimized as
> well as it can be first, and then the insertion ordering trick is tried as
> an optimization to see if it's an improvement. Without that baseline
> there's no meaningful basis for comparison, and I'm quite confident that a
> naive implementation which just allocates individual nodes will
> underperform the thing I've come up with, even without adding optimizatio=
ns
> related to fitting in 64 bits.

To be clear, "insertion ordering" isn't a simple trick, it's a fundamental
change to what the data structure is. Once you do that, you're talking abou=
t my
proposal.

--=20
https://petertodd.org 'peter'[:-1]@petertodd.org

--6c2NcOVqGQ03X4Wi
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----

iQEcBAEBCAAGBQJYsQQPAAoJECSBQD2l8JH7XH4H/02u4PvOnJldydIt8yOi6ksO
lfuoqCbDj6z6wN7cyz0jb24TNfzzoMPNdd0OVCyyB9FhSH4MSokIxMDJJJR0NANi
04/Fm74+J0C+rLh3IysTZpUVf53F6HfDCtiK9XdNeOkncyRhEvYc99ULlgpnWQ2o
JpFeTDVilDalCPIbZ+SvkYK1jK/MaB5tANkn8nVc7PgNKCp3lRyifhm8LIaiM4SG
ZuV96pI/Pg3HKW1NA2fkqjAXnkgpQxjZIz/SbL/fe66pLYoWmFKL+FKWPgFPhFUn
kDKAjG0AXaIBWPvYQLgaXDW7g3dTAWnVJTmrWgp0U6/xxwk+c+MZ88EY/d7/fW4=
=p9PW
-----END PGP SIGNATURE-----

--6c2NcOVqGQ03X4Wi--