summaryrefslogtreecommitdiff
path: root/f6/7ea606ffc3c08c8d6e8c4fb51b6353be8b7abf
blob: 2eba8f00df9fe2f6adcc2cd0d228c8730cdbb704 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
Return-Path: <bram@bittorrent.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id BB6BA9D
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed, 15 Jun 2016 00:14:44 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-it0-f54.google.com (mail-it0-f54.google.com
	[209.85.214.54])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id A93B7116
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed, 15 Jun 2016 00:14:43 +0000 (UTC)
Received: by mail-it0-f54.google.com with SMTP id z189so92070953itg.0
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Tue, 14 Jun 2016 17:14:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=bittorrent-com.20150623.gappssmtp.com; s=20150623;
	h=mime-version:from:date:message-id:subject:to;
	bh=JsmKdx/sK5olhBrNh5RcYacTrUSIVUTPxRG+3v3a2gc=;
	b=wAw4/k5Q7w0SxS9bZiVAufx/ThRyL/KG7ueuRKToZW34BFcnbYi/sSr+HNFGXW8c6k
	D3Yv+pNljUANAEzIfAXZdDgxNRCInu2Q1z7Cqw3LRtKgFVFyfflkQqyBPT7UNYG1+wo6
	0JGwHu50J2kK8C4/FhgXKmkLPr/OFzIdLWxaWSWUwwW6hLXl8Z4u003vqNhg6c9WxJ8e
	KeVWsF5Z5w78caCHwmlpF6FA7VpWzkaNU7otv//zufH7X0gQBuRq/4DzcWcQlvqObjFz
	U0ki0QgOITa6iZotnTDz8PIXAz+7tftC5IGowNVSGNshktxPxBFF38B5gJP6wI09XXIs
	eMpA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
	bh=JsmKdx/sK5olhBrNh5RcYacTrUSIVUTPxRG+3v3a2gc=;
	b=TnHEQLEvUNTdd/7j12BVcWYwfgCuZ77RriozaqoOLHStFDQ1BL/bwgNdPipQNxujvd
	NXY7mjLqt6/LsXjHkD8nb8Sv5ayRmKY+hpMHgDR0VLlfrTXD7MuiclvWmAQTUBHSjFWY
	nNFc7yAq4zMpfQ8FBtFoKji5C20Wn7cMXP5sQGhSSDEkEDF0nV5J7jTZRHDkXUQFcrEb
	caaz0mQouZHF4v0SyYLRgLyMEGg18mwB1b1ZRJC7ZMLBMIAJ0CX/rljgchA+p8i4J1j3
	TNad4l0VcobkTZYn0sK1Aqvzy4DWFi42cy8GzrRk1Q3CPi17yvZCJ19Q38VO7znUbkWQ
	tT4Q==
X-Gm-Message-State: ALyK8tKngBzzS+92zBfec25s1dFSHL/tBVesTRpxMJ/qRuRXHyVqWChMtegDKVQx3OX9xUtLVC2pct6cynUDxyIb
X-Received: by 10.36.127.196 with SMTP id r187mr31080834itc.6.1465949682882;
	Tue, 14 Jun 2016 17:14:42 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.36.134.68 with HTTP; Tue, 14 Jun 2016 17:14:23 -0700 (PDT)
From: Bram Cohen <bram@bittorrent.com>
Date: Tue, 14 Jun 2016 17:14:23 -0700
Message-ID: <CA+KqGkosGrWQ2Lpg3Ptc+UyidK7H4_HLQ_QWx2DOAFRmLkv4zQ@mail.gmail.com>
To: bitcoin-dev@lists.linuxfoundation.org
Content-Type: multipart/alternative; boundary=001a114753aa1b6bbd05354602c6
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,HTML_MESSAGE,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Wed, 15 Jun 2016 08:01:52 +0000
Subject: [bitcoin-dev] Merkle trees and mountain ranges
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Wed, 15 Jun 2016 00:14:44 -0000

--001a114753aa1b6bbd05354602c6
Content-Type: text/plain; charset=UTF-8

This is in response to Peter Todd's proposal for Merkle Mountain Range
commitments in blocks:

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012715.html

I'm in strong agreement that there's a compelling need to put UTXO
commitments in blocks, and that the big barrier to getting it done is
performance, particularly latency. But I have strong disagreements (or
perhaps the right word is skepticism) about the details.

Peter proposes that there should be both UTXO and STXO commitments, and
they should be based on Merkle Mountain Ranges based on Patricia Tries. My
first big disagreement is about the need for STXO commitments. I think
they're unnecessary and a performance problem. The STXO set is much larger
than the utxo set and requires much more memory and horespower to maintain.
Most if not all of its functionality can in practice be done using the utxo
set. Almost anything accepting proofs of inclusion and exclusion will have
a complete history of block headers, so to prove inclusion in the stxo set
you can use a utxo proof of inclusion in the past and a proof of exclusion
for the most recent block. In the case of a txo which has never been
included at all, it's generally possible to show that an ancestor of the
txo in question was at one point included but that an incompatible
descendant of it (or the ancestor itself) is part of the current utxo set.
Generating these sorts of proofs efficiently can for some applications
require a complete STXO set, but that can done with a non-merkle set,
getting the vastly better performance of an ordinary non-cryptographic
hashtable.

The fundamental approach to handling the latency problem is to have the
utxo commitments trail a bit. Computing utxo commitments takes a certain
amount of time, too much to hold up block propagation but still hopefully
vastly less than the average amount of time between blocks. Trailing by a
single block is probably a bad idea because you sometimes get blocks back
to back, but you never get blocks back to back to back to back. Having the
utxo set be trailing by a fixed amount - five blocks is probably excessive
- would do a perfectly good job of keeping latency from every becoming an
issue. Smaller commitments for the utxos added and removed in each block
alone could be added without any significant performance penalty. That way
all blocks would have sufficient commitments for a completely up to date
proofs of inclusion and exclusion. This is not a controversial approach.

Now I'm going to go out on a limb. My thesis is that usage of a mountain
range is unnecessary, and that a merkle tree in the raw can be made
serviceable by sprinkling magic pixie dust on the performance problem.

There are two causes of performance problems for merkle trees: hashing
operations and memory cache misses. For hashing functions, the difference
between a mountain range and a straight merkle tree is roughly that in a
mountain range there's one operation for each new update times the number
of times that thing will get merged into larger hills. If there are fewer
levels of hills the number of operations is less but the expense of proof
of inclusion will be larger. For raw merkle trees the number of operations
per thing added is the log base 2 of the number of levels in the tree,
minus the log base 2 of the number of things added at once since you can do
lazy evaluation. For practical Bitcoin there are (very roughly) a million
things stored, or 20 levels, and there are (even more roughly) about a
thousand things stored per block, so each thing forces about 20 - 10 = 10
operations. If you follow the fairly reasonable guideline of mountain range
hills go up by factors of four, you instead have 20/2 = 10 operations per
thing added amortized. Depending on details this comparison can go either
way but it's roughly a wash and the complexity of a mountain range is
clearly not worth it at least from the point of view of CPU costs.

But CPU costs aren't the main performance problem in merkle trees. The
biggest issues is cache misses, specifically l1 and l2 cache misses. These
tend to take a long time to do, resulting in the CPU spending most of its
time sitting around doing nothing. A naive tree implementation is pretty
much the worst thing you can possibly build from a cache miss standpoint,
and its performance will be completely unacceptable. Mountain ranges do a
fabulous job of fixing this problem, because all their updates are merges
so the metrics are more like cache misses per block instead of cache misses
per transaction.

The magic pixie dust I mentioned earlier involves a bunch of subtle
implementation details to keep cache coherence down which should get the
number of cache misses per transaction down under one, at which point it
probably isn't a bottleneck any more. There is an implementation in the
works here:

https://github.com/bramcohen/MerkleSet

This implementation isn't finished yet! I'm almost there, and I'm
definitely feeling time pressure now. I've spent quite a lot of time on
this, mostly because of a bunch of technical reworkings which proved
necessary. This is the last time I ever write a database for kicks. But
this implementation is good on all important dimensions, including:

Lazy root calculation
Few l1 and l2 cache misses
Small proofs of inclusion/exclusion
Reasonably simple implementation
Reasonably efficient in memory
Reasonable defense against malicious insertion attacks

There is a bit of a false dichotomy with the mountain range approach.
Mountain ranges need underlying merkle trees, and mine are semantically
nearly identically to Peter's. This is not a coincidence - I adopted
patricia tries at his suggestion. There are a bunch of small changes which
allow a more efficient implementation. I believe that my underlying merkle
tree is unambiguously superior in every way, but the question of whether a
mountain range is worth it is one which can only be answered empirically,
and that requires a bunch of implementation work to be done, starting with
me finishing my merkle tree implementation and then somebody porting it to
C and optimizing it. The Python version has details which are ridiculous
and only make sense once it gets ported, and even under the best of
conditions Python performance is not strongly indicative of C performance.

--001a114753aa1b6bbd05354602c6
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>This is in response to Peter Todd&#39;s proposal for =
Merkle Mountain Range commitments in blocks:</div><div><br></div><div><a hr=
ef=3D"https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/0127=
15.html">https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/0=
12715.html</a></div><div><br></div><div>I&#39;m in strong agreement that th=
ere&#39;s a compelling need to put UTXO commitments in blocks, and that the=
 big barrier to getting it done is performance, particularly latency. But I=
 have strong disagreements (or perhaps the right word is skepticism) about =
the details.</div><div><br></div><div>Peter proposes that there should be b=
oth UTXO and STXO commitments, and they should be based on Merkle Mountain =
Ranges based on Patricia Tries. My first big disagreement is about the need=
 for STXO commitments. I think they&#39;re unnecessary and a performance pr=
oblem. The STXO set is much larger than the utxo set and requires much more=
 memory and horespower to maintain. Most if not all of its functionality ca=
n in practice be done using the utxo set. Almost anything accepting proofs =
of inclusion and exclusion will have a complete history of block headers, s=
o to prove inclusion in the stxo set you can use a utxo proof of inclusion =
in the past and a proof of exclusion for the most recent block. In the case=
 of a txo which has never been included at all, it&#39;s generally possible=
 to show that an ancestor of the txo in question was at one point included =
but that an incompatible descendant of it (or the ancestor itself) is part =
of the current utxo set. Generating these sorts of proofs efficiently can f=
or some applications require a complete STXO set, but that can done with a =
non-merkle set, getting the vastly better performance of an ordinary non-cr=
yptographic hashtable.</div><div><br></div><div>The fundamental approach to=
 handling the latency problem is to have the utxo commitments trail a bit. =
Computing utxo commitments takes a certain amount of time, too much to hold=
 up block propagation but still hopefully vastly less than the average amou=
nt of time between blocks. Trailing by a single block is probably a bad ide=
a because you sometimes get blocks back to back, but you never get blocks b=
ack to back to back to back. Having the utxo set be trailing by a fixed amo=
unt - five blocks is probably excessive - would do a perfectly good job of =
keeping latency from every becoming an issue. Smaller commitments for the u=
txos added and removed in each block alone could be added without any signi=
ficant performance penalty. That way all blocks would have sufficient commi=
tments for a completely up to date proofs of inclusion and exclusion. This =
is not a controversial approach.</div><div><br></div><div>Now I&#39;m going=
 to go out on a limb. My thesis is that usage of a mountain range is unnece=
ssary, and that a merkle tree in the raw can be made serviceable by sprinkl=
ing magic pixie dust on the performance problem.</div><div><br></div><div>T=
here are two causes of performance problems for merkle trees: hashing opera=
tions and memory cache misses. For hashing functions, the difference betwee=
n a mountain range and a straight merkle tree is roughly that in a mountain=
 range there&#39;s one operation for each new update times the number of ti=
mes that thing will get merged into larger hills. If there are fewer levels=
 of hills the number of operations is less but the expense of proof of incl=
usion will be larger. For raw merkle trees the number of operations per thi=
ng added is the log base 2 of the number of levels in the tree, minus the l=
og base 2 of the number of things added at once since you can do lazy evalu=
ation. For practical Bitcoin there are (very roughly) a million things stor=
ed, or 20 levels, and there are (even more roughly) about a thousand things=
 stored per block, so each thing forces about 20 - 10 =3D 10 operations. If=
 you follow the fairly reasonable guideline of mountain range hills go up b=
y factors of four, you instead have 20/2 =3D 10 operations per thing added =
amortized. Depending on details this comparison can go either way but it&#3=
9;s roughly a wash and the complexity of a mountain range is clearly not wo=
rth it at least from the point of view of CPU costs.</div><div><br></div><d=
iv>But CPU costs aren&#39;t the main performance problem in merkle trees. T=
he biggest issues is cache misses, specifically l1 and l2 cache misses. The=
se tend to take a long time to do, resulting in the CPU spending most of it=
s time sitting around doing nothing. A naive tree implementation is pretty =
much the worst thing you can possibly build from a cache miss standpoint, a=
nd its performance will be completely unacceptable. Mountain ranges do a fa=
bulous job of fixing this problem, because all their updates are merges so =
the metrics are more like cache misses per block instead of cache misses pe=
r transaction.</div><div><br></div><div>The magic pixie dust I mentioned ea=
rlier involves a bunch of subtle implementation details to keep cache coher=
ence down which should get the number of cache misses per transaction down =
under one, at which point it probably isn&#39;t a bottleneck any more. Ther=
e is an implementation in the works here:</div><div><br></div><div><a href=
=3D"https://github.com/bramcohen/MerkleSet">https://github.com/bramcohen/Me=
rkleSet</a></div><div><br></div><div>This implementation isn&#39;t finished=
 yet! I&#39;m almost there, and I&#39;m definitely feeling time pressure no=
w. I&#39;ve spent quite a lot of time on this, mostly because of a bunch of=
 technical reworkings which proved necessary. This is the last time I ever =
write a database for kicks. But this implementation is good on all importan=
t dimensions, including:</div><div><br></div><div>Lazy root calculation</di=
v><div>Few l1 and l2 cache misses</div><div>Small proofs of inclusion/exclu=
sion</div><div>Reasonably simple implementation</div><div>Reasonably effici=
ent in memory</div><div>Reasonable defense against malicious insertion atta=
cks</div><div><br></div><div>There is a bit of a false dichotomy with the m=
ountain range approach. Mountain ranges need underlying merkle trees, and m=
ine are semantically nearly identically to Peter&#39;s. This is not a coinc=
idence - I adopted patricia tries at his suggestion. There are a bunch of s=
mall changes which allow a more efficient implementation. I believe that my=
 underlying merkle tree is unambiguously superior in every way, but the que=
stion of whether a mountain range is worth it is one which can only be answ=
ered empirically, and that requires a bunch of implementation work to be do=
ne, starting with me finishing my merkle tree implementation and then someb=
ody porting it to C and optimizing it. The Python version has details which=
 are ridiculous and only make sense once it gets ported, and even under the=
 best of conditions Python performance is not strongly indicative of C perf=
ormance.</div><div><br></div></div>

--001a114753aa1b6bbd05354602c6--