summaryrefslogtreecommitdiff
path: root/ef/6adcecdd8cf71f8b29bd5b65fa75bad81630c9
blob: a783baa2c2c9112eb8c081db68f76b3562461512 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
Return-Path: <pete@petertodd.org>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id B683F259
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 25 Feb 2017 04:12:08 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from outmail148100.authsmtp.co.uk (outmail148100.authsmtp.co.uk
	[62.13.148.100])
	by smtp1.linuxfoundation.org (Postfix) with ESMTP id 8C0321A0
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 25 Feb 2017 04:12:07 +0000 (UTC)
Received: from mail-c247.authsmtp.com (mail-c247.authsmtp.com [62.13.128.247])
	by punt24.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v1P4C5iY036068;
	Sat, 25 Feb 2017 04:12:05 GMT
Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com
	[52.5.185.120]) (authenticated bits=0)
	by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v1P4C3Ko008545
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 25 Feb 2017 04:12:04 GMT
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by petertodd.org (Postfix) with ESMTPSA id 0DE4E40576;
	Sat, 25 Feb 2017 04:12:03 +0000 (UTC)
Received: by localhost (Postfix, from userid 1000)
	id 27D40204AB; Fri, 24 Feb 2017 23:12:02 -0500 (EST)
Date: Fri, 24 Feb 2017 23:12:02 -0500
From: Peter Todd <pete@petertodd.org>
To: Bram Cohen <bram@bittorrent.com>
Message-ID: <20170225041202.GA11152@savin.petertodd.org>
References: <20170223235105.GA28497@savin.petertodd.org>
	<CA+KqGkowxEZeAFYa2JJchBDtRkg1p3YZNocivzu3fAtgRLDRBQ@mail.gmail.com>
	<20170224010943.GA29218@savin.petertodd.org>
	<CA+KqGkrOK76S3ffPJmpqYcBwtSeKESqN16yZsrwzDR6JZZmwFA@mail.gmail.com>
	<20170224025811.GA31911@savin.petertodd.org>
	<CA+KqGkq7gavAnAk-tcA+gxL2sWpv3ENhEmHrQHaPdyAsKrLjGg@mail.gmail.com>
	<20170224031531.GA32118@savin.petertodd.org>
	<CA+KqGkrfhg3GnbWwvKXHQ2NWuCnfzYyTPUxRhzYMuDBiNQR4eA@mail.gmail.com>
	<20170224043613.GA32502@savin.petertodd.org>
	<CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="6c2NcOVqGQ03X4Wi"
Content-Disposition: inline
In-Reply-To: <CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Server-Quench: 90cdee85-fb10-11e6-bcdf-0015176ca198
X-AuthReport-Spam: If SPAM / abuse - report it at:
	http://www.authsmtp.com/abuse
X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR
	aQdMdAcUHlAWAgsB AmEbW1BeUV97WWc7 bghPaBtcak9QXgdq
	T0pMXVMcUgQIBW8C fUEeUhF3cwAIen93 ZU4sV3AICBEvfUNg
	FBxVFXAHZDJmdTJM BBZFdwNVdQJNeEwU a1l3GhFYa3VsNCMk
	FAgyOXU9MCtqYA0d aAwRMV8ICWMuJHYQ Sh4DGzQzHEoDLwAA 
X-Authentic-SMTP: 61633532353630.1038:706
X-AuthFastPath: 0 (Was 255)
X-AuthSMTP-Origin: 52.5.185.120/25
X-AuthVirus-Status: No virus detected - but ensure you scan with your own
	anti-virus system.
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
	autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] A Better MMR Definition
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sat, 25 Feb 2017 04:12:08 -0000


--6c2NcOVqGQ03X4Wi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Feb 24, 2017 at 02:20:19PM -0800, Bram Cohen wrote:
> So your idea is to cluster entries by entry time because newer things are
> more likely to leave and updating multiple things near each other is
> cheaper?

Yes, exactly.

> That can be done with my tool. Instead of using hashes for the values bei=
ng
> stored, you use position entries. The first entry gets a value of all
> zeros, the next one a one followed by all zeros, then the next two
> correspond to the first two with the second bit flipped to one, then the
> next four the first four with the third bit flipped to one, etc. It
> probably performs a little bit better to do it two bits at a time instead
> of one so that the entries are 00, 01, 10, 11, 0001, 0010, 0011, 0101,
> 0110, 0111, 1001, etc. If you were to really use this you'd probably want
> to to add some optimizations to use the fact that the terminals fit in 64
> bits instead of 256, but it mostly works unchanged, and gets whatever

So to be clear, what you're proposing there is to use the insertion order as
the index - once you go that far you've almost entirely re-invented my
proposal!

In fact, when I was working my proofchains/proofmarshal libraries I put some
thought into whether or not I could leave out the MMR merkelized list
implementation and use only the key-value map I also wrote. I decided to
include both as they aren't quite the same datastructure - using a list for=
 a
list has advantages.

> benefits there are to this clustering plus the high performance
> implementation tricks I've built which I keep complaining that nobody's
> giving feedback on.

Your merkle-set implementation is 1500 lines of densely written Python with
almost no comments, and less than a 100 lines of (also uncommented) tests. =
By
comparison, my Python MMR implementation is 300 lines of very readable Pyth=
on
with lots of comments, a 200 line explanation at the top, and 200 lines of
(commented) tests. Yet no-one is taking the (still considerable) effort to
understand and comment on my implementation. :)

Fact is, what you've written is really daunting to review, and given it's n=
ot
in the final language anyway, it's unclear what basis to review it on anywa=
y. I
suspect you'd get more feedback if the codebase was better commented, in a
production language, and you have actual real-world benchmarks and performa=
nce
figures.

In particular, while at the top of merkle_set.py you have a list of advanta=
ges,
and a bunch of TODO's, you don't explain *why* the code has any of these
advantages. To figure that out, I'd have to read and understand 1500 lines =
of
densely written Python. Without a human-readable pitch, not many people are
going to do that, myself included.

> I'm not sold on this being a win: The empirical access patterns are
> unknown,

Lost coins alone guarantees that access patterns will be biased towards new
coins being more likely to be spent. That basis alone is sufficient to just=
ify
an insertion-ordered data structure. Additionally, people have done graphs =
of
the average age of UTXO's when spent, and that data clearly shows that newer
coins are more likely to be spent than older coins.

> unknown, it requires an extra cache miss per lookup to find the entry
> number,

Like I mentioned in the email you're replying to, that extra lookup can be
easily avoided with a change to how transactions/blocks are serialized; if =
all
your peers support TXO commitments you can even discard the lookup database
entirely, as it's only a backwards compatibility measure.

> it may be that everything is optimized well enough without it for
> there to be no meaningful gains, and it's a bunch of extra complexity. Wh=
at

Optimization is itself extra complexity. If you're data structure has worse
inherent performance, and you have to make up the different with a highly
optimized implementation, that's likely to lead to more overall complexity =
than
using a data structure with better inherent performance.

Your current merkle-set implementation definitely _is_ very complex. An
apples-to-apples comparison is with my merkelized key:value tree(1), also a
patricia tree, which like the MMR is only about 300 lines of well-commented=
 and
straight-forward code.

1) https://github.com/proofchains/python-proofmarshal/blob/master/proofmars=
hal/merbinnertree.py

> should be done is that a plain vanilla UTXO set solution is optimized as
> well as it can be first, and then the insertion ordering trick is tried as
> an optimization to see if it's an improvement. Without that baseline
> there's no meaningful basis for comparison, and I'm quite confident that a
> naive implementation which just allocates individual nodes will
> underperform the thing I've come up with, even without adding optimizatio=
ns
> related to fitting in 64 bits.

To be clear, "insertion ordering" isn't a simple trick, it's a fundamental
change to what the data structure is. Once you do that, you're talking abou=
t my
proposal.

--=20
https://petertodd.org 'peter'[:-1]@petertodd.org

--6c2NcOVqGQ03X4Wi
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----

iQEcBAEBCAAGBQJYsQQPAAoJECSBQD2l8JH7XH4H/02u4PvOnJldydIt8yOi6ksO
lfuoqCbDj6z6wN7cyz0jb24TNfzzoMPNdd0OVCyyB9FhSH4MSokIxMDJJJR0NANi
04/Fm74+J0C+rLh3IysTZpUVf53F6HfDCtiK9XdNeOkncyRhEvYc99ULlgpnWQ2o
JpFeTDVilDalCPIbZ+SvkYK1jK/MaB5tANkn8nVc7PgNKCp3lRyifhm8LIaiM4SG
ZuV96pI/Pg3HKW1NA2fkqjAXnkgpQxjZIz/SbL/fe66pLYoWmFKL+FKWPgFPhFUn
kDKAjG0AXaIBWPvYQLgaXDW7g3dTAWnVJTmrWgp0U6/xxwk+c+MZ88EY/d7/fW4=
=p9PW
-----END PGP SIGNATURE-----

--6c2NcOVqGQ03X4Wi--