summaryrefslogtreecommitdiff
path: root/91/4b1c6a5f5a07426174ca1e9bf3678d3e890a3d
blob: 455b5929f5f7247e17669476479710df4a250d7e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
Return-Path: <bram@bittorrent.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id BCDF471
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 18 Jun 2016 02:44:09 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-it0-f45.google.com (mail-it0-f45.google.com
	[209.85.214.45])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 483BE1FB
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 18 Jun 2016 02:44:08 +0000 (UTC)
Received: by mail-it0-f45.google.com with SMTP id a5so7284581ita.1
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 17 Jun 2016 19:44:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=bittorrent-com.20150623.gappssmtp.com; s=20150623;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to
	:cc; bh=v7k5ULenbVHN2vt63imfIOm2dg4bHKc3yTwgq71816A=;
	b=tYpr8qiFa/1XRYsaexiV/z3RYY3/8gOAiBtwJu2MUQn98dunYjDLDrIim2wyTXRe3/
	bpEJKo95z8TNEjVfSbPbgVhVqesmy5TE/8dXSY8n1gOEkEHdK64PbhahF0gLLcoblQgG
	k6lMypFYIwJQ94/UIZUy8+BGJ3xAuXiOwkDO3JQ8hkwEFw7lTybyOgeurjTv0AQlvZaL
	edTSF/YOggc52Uddw3UQb40odD5+tpKduEGhC4rKW7ZhgkAS8Tp9brKUz6ajqYcMuytj
	ptBo1MZWK7m3mdVhtLA2uVCpEPZoemBHl70Lgrrkgn8oOuumMemFjFnILs8rMn3MJ6dD
	DXCQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to:cc;
	bh=v7k5ULenbVHN2vt63imfIOm2dg4bHKc3yTwgq71816A=;
	b=TPjhUjKF/8ajuQkwv/KeVDcWtBFGO2aeGnyTX/uK1OTN31Q2JwVNZqN+1AJgxzzXFD
	dWmxudMOqZ6ZTHEyY8Ue/upprYsP/iti4r/R6Yov2xpB9HsmOTwppRW+DAq4ZV5ziQfX
	agSZeACnidUjUqcGcmzJOKtb7sgMO7WRzXHWyOU5XFDDjmlxFHkxwlcfkKK0czHidpxr
	rbftBfjNkYNIjinczn/BY9/LH+GbeZLziRYGlPP3MCOT3Y98zajEvJ1hHogPYwbi8GPy
	sYkZkcXKqqnjrixKn5s/XwjpTzTzl04Y9cUFRrRVPH+jo0sWiKKpS3LXrI9sw3uZTIc0
	Se+w==
X-Gm-Message-State: ALyK8tKX/sMQgCa2ey8Yp4CD3ky0TnixivzfHUFWc9smyCGTgVRsgaSCuNIzeTKnz92fbAdb+aPtfEKijYhHozt5
X-Received: by 10.36.120.71 with SMTP id p68mr2331130itc.46.1466217847480;
	Fri, 17 Jun 2016 19:44:07 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.36.134.68 with HTTP; Fri, 17 Jun 2016 19:43:47 -0700 (PDT)
In-Reply-To: <20160617043435.GA12800@fedora-21-dvm>
References: <CA+KqGkosGrWQ2Lpg3Ptc+UyidK7H4_HLQ_QWx2DOAFRmLkv4zQ@mail.gmail.com>
	<20160616001040.GA5026@fedora-21-dvm>
	<CA+KqGkqAHcU2PzEX6OmorXRBQ22eF_QBYYqUDS1v_ZvevhLCuQ@mail.gmail.com>
	<20160616032612.GA7792@fedora-21-dvm>
	<CA+KqGkqA2WnvBEck3kv6p2no-9wzNVCTNA-MGw=Jg=gMGfrxUQ@mail.gmail.com>
	<20160617043435.GA12800@fedora-21-dvm>
From: Bram Cohen <bram@bittorrent.com>
Date: Fri, 17 Jun 2016 19:43:47 -0700
Message-ID: <CA+KqGkpRmeKyo6TFpe+uUCdJSina+ARraNd0dkHSb2Hpx5dYuw@mail.gmail.com>
To: Peter Todd <pete@petertodd.org>
Content-Type: multipart/alternative; boundary=001a114a9388f67e5e0535847144
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,HTML_MESSAGE,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Merkle trees and mountain ranges
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sat, 18 Jun 2016 02:44:09 -0000

--001a114a9388f67e5e0535847144
Content-Type: text/plain; charset=UTF-8

On Thu, Jun 16, 2016 at 9:34 PM, Peter Todd <pete@petertodd.org> wrote:

> So above you said that in merbinner trees each node "hash[es] in a record
> of
> its depth" That's actually incorrect: each node commits to the prefix that
> all
> keys below that level start with, not just the depth.


I considered a similar trick at the implementation rather than the
definition level: A node doesn't have to store the prefix which is implicit
in its position. That would create a fair number of headaches though,
because I'm using fixed size stuff in important ways, and it could at most
save about 10% of memory, so it goes into the 'maybe later' bucket.


>
> This means that in merbinner trees, cases where multiple keys share parts
> of
> the same prefix are handled efficiently, without introducing extra levels
> unnecessarily; there's no need for the ONLY0/1 nodes as the children of an
> inner node will always be on different sides.
>
> When keys are randomly distributed, this isn't a big deal; OTOH against
> attackers who are choosing keys, e.g. by grinding hashes, merbinner trees
> always have maximum depths in proportion to log2(n) of the actual number of
> items in the tree. Grinding is particularly annoying to deal with due to
> the
> birthday attack: creating a ground prefix 64 bits long only takes 32 bits
> worth
> of work.
>

Yes an attacker can force the tree to be deeper in places, but it's
mitigated in several ways: (1) The way I'm using memory it won't cause a
whole new block to be allocated, it will just force log(attack strength) -
log(n) nodes to be used (2) logarithmic growth being what it is that isn't
such a big amount (3) With the special casing of TERMBOTH an attacker needs
three things with the same prefix to pull off an attack rather than two,
which is quite a bit harder to pull off.

That said, it wouldn't be all that hard to change how the hashing function
works to do a single hash for a whole series of ONLY in a row instead of a
new one at every level, which would make the attacker only able to force
extra memory usage instead of extra CPU, but this is a slightly annoying
thing to write to stop a fairly lame attack, so I'm at least not doing it
for my initial implementation. I could likely be convinced that it's worth
doing before an actual release though. There's another implementation trick
to do the same thing for memory usage, which is much more in the 'do later'
category because it doesn't involve changing the format and hence it can be
put off.


> In particular, case #2 handles your leaf node optimizations generically,
> without special cases and additional complexity. It'd also be a better way
> to
> do the ONLY0/1 cases, as if the "nothing on this side" symbol is a single
> byte,
> each additional colliding level would simply extend the commitment without
> hashing. In short, you'd have nearly the same level of optimization even
> if at
> the cryptography level your tree consists of only leaves, inner nodes, and
> nil.
>

I'm taking pains to make all the hashing be of fixed-size things, so that a
non-padding variant of a secure hashing algorithm can be used. The chains
of ONLY thing above would force a special exception to that, which can be
done but is annoying. Making things smaller than a single block (64 bytes)
won't speed up hashing time, and making things a single byte longer than
that doubles it.


> Another advantage of variable sized commitments is that it can help make
> clear
> to users when it's possible to brute force the message behind the
> commitment.
> For instance, digest from a hashed four byte integer can be trivially
> reversed
> by just trying all combinations. Equally, if that integer is concatenated
> with
> a 32 byte digest that the attacker knows, the value of the integer can be
> brute
> forced.
>

I'm hashing all strings before inserting to get them to be a fixed size and
avoid a few different attacks. In Bitcoin most of the strings added are
longer than that so it's a form of compression. A custom hash function
could be used which 'hashes' very short strings by repeating them verbatim
could be used, but seems like not such a hot idea. I'm making extensive use
of things being fixed size everywhere, which improves performance in a lot
of ways.


> > > Technically even a patricia trie utxo commitment can have sub-1 cache
> > > > misses per update if some of the updates in a single block are close
> to
> > > > each other in memory. I think I can get practical Bitcoin updates
> down
> > > to a
> > > > little bit less than one l2 cache miss per update, but not a lot
> less.
> > >
> > > I'm very confused as to why you think that's possible. When you say
> > > "practical
> > > Bitcoin updates", what exactly is the data structure you're proposing
> to
> > > update? How is it indexed?
>

I'll re-answer this because I did a terrible job before. The entire data
structure consists of nodes which contain a metadata byte (TERM0, ONLY1,
etc.) followed by fixes size secure hashes, and (in some cases) pointers to
where the children are. The secure hashes in parent nodes are found by
hashing their children verbatim (or the stored string in the case of a
TERM). This is very conventional. All of the cleverness is in where in
memory these nodes are stored so that tracing down the tree causes very few
cache misses.

(The alternate approach is to have each node store its own hash rather than
that be stored by the parent. That approach means that when you're
recalculating you have to look up siblings which doubles the number of
cache misses. Not such a hot idea.)

At the root there's a branch block. It consists of all nodes up to some
fixed depth - let's say 12 - with that depth set so that it roughly fits
within a single memory page. Branch blocks are arranged with the nodes in
fixed position defined by the prefix they correspond to, and the terminals
have outpointers to other blocks. Because they're all clustered together, a
lookup or update will only require a single

Below the root block are other branch blocks. Each of them has a fixed 12
bit prefix it is responsible for. When doing a lookup a second cache miss
will be hit for levels 13-24, because those are all clustered in the same
branch block.

Below the second level of root block (at Bitcoin utxo set scale - this
varies based on how much is stored) there are leaf blocks. A leaf block
consists of nodes with outpointers to its own children which must be within
the same leaf block. All entry points into a leaf block are from the same
branch block, and the leaf block has no out pointers to other blocks. When
a leaf block overflows the entry point into it which overflowed is moved
into the active leaf for that branch, and if that's full a new one is
allocated. There's some subtlety to exactly how this is done, but I've
gotten everything down to simple expedient tricks with straightforward
implementations. The thing which matters for now is that there's only a
single cache miss for each leaf node, because they also fit in a page.

So at Bitcoin scale there will probably only be 3 cache misses for a
lookup, and that's a worst case scenario. The first one is probably always
warm, bringing it down to 2, and if you do a bunch in sorted order they'll
probably hit the same second level branches repeatedly bringing it down to
1, and might even average less than that if there are enough that the leaf
block has multiple things being accessed.

(These same tricks can be applied to merbinner tree implementation as well,
although that would be a bit less parsimonious with memory, by a small
constant factor.)


> Anyway hashing is pretty slow. The very fast BLAKE2 is about 3 cycles/byte
> (SHA256 is about 15 cycles/byte) so hashing that same data would take
> around
> 200 cycles, and probably quite a bit more in practice due to overheads
> from our
> short message lengths; fetching a cache line from DRAM only takes about
> 1,000
> cycles. I'd guess that once other overheads are taken into account, even
> if you
> could eliminate L2/L3 cache-misses it wouldn't be much of an improvement.
>

Those numbers actually back up my claims about performance. If you're doing
a single update and recalculating the root afterwards, then the amount of
rehashing to be done is about 30 levels deep times 64 bytes per thing
hashed times 15 cycles per byte then it's about 28,800 cycles of hashing.
If you have a naive radix tree implementation which hits a cache miss at
every level then that's 30,000 cycles, which is about half the performance
time, certainly worth optimizing. If instead of sha256 you use blake2
(Which sounds like a very good idea!) then hashing for an update will be
about 5760 cycles and performance will be completely dominated by cache
misses. If a more cache coherent implementation is used, then the cost of
cache misses will be 3000 cycles, which will be a non-factor with sha256
and a significant but not dominating one with blake2.

It's reasonable to interpret those numbers as saying that blake2 and cache
coherent implementation are both clearly worth it (probably necessary for
real adoption) and that an amortized binary radix tree is tempting but not
worth it.

--001a114a9388f67e5e0535847144
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_extra"><div class=3D"gmail_quote">On T=
hu, Jun 16, 2016 at 9:34 PM, Peter Todd <span dir=3D"ltr">&lt;<a href=3D"ma=
ilto:pete@petertodd.org" target=3D"_blank">pete@petertodd.org</a>&gt;</span=
> wrote:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;bo=
rder-left:1px #ccc solid;padding-left:1ex">So above you said that in merbin=
ner trees each node &quot;hash[es] in a record of<br>
its depth&quot; That&#39;s actually incorrect: each node commits to the pre=
fix that all<br>
keys below that level start with, not just the depth.</blockquote><div><br>=
</div><div>I considered a similar trick at the implementation rather than t=
he definition level: A node doesn&#39;t have to store the prefix which is i=
mplicit in its position. That would create a fair number of headaches thoug=
h, because I&#39;m using fixed size stuff in important ways, and it could a=
t most save about 10% of memory, so it goes into the &#39;maybe later&#39; =
bucket.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"ma=
rgin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
This means that in merbinner trees, cases where multiple keys share parts o=
f<br>
the same prefix are handled efficiently, without introducing extra levels<b=
r>
unnecessarily; there&#39;s no need for the ONLY0/1 nodes as the children of=
 an<br>
inner node will always be on different sides.<br>
<br>
When keys are randomly distributed, this isn&#39;t a big deal; OTOH against=
<br>
attackers who are choosing keys, e.g. by grinding hashes, merbinner trees<b=
r>
always have maximum depths in proportion to log2(n) of the actual number of=
<br>
items in the tree. Grinding is particularly annoying to deal with due to th=
e<br>
birthday attack: creating a ground prefix 64 bits long only takes 32 bits w=
orth<br>
of work.<br></blockquote><div><br></div><div>Yes an attacker can force the =
tree to be deeper in places, but it&#39;s mitigated in several ways: (1) Th=
e way I&#39;m using memory it won&#39;t cause a whole new block to be alloc=
ated, it will just force log(attack strength) - log(n) nodes to be used (2)=
 logarithmic growth being what it is that isn&#39;t such a big amount (3) W=
ith the special casing of TERMBOTH an attacker needs three things with the =
same prefix to pull off an attack rather than two, which is quite a bit har=
der to pull off.</div><div><br></div><div>That said, it wouldn&#39;t be all=
 that hard to change how the hashing function works to do a single hash for=
 a whole series of ONLY in a row instead of a new one at every level, which=
 would make the attacker only able to force extra memory usage instead of e=
xtra CPU, but this is a slightly annoying thing to write to stop a fairly l=
ame attack, so I&#39;m at least not doing it for my initial implementation.=
 I could likely be convinced that it&#39;s worth doing before an actual rel=
ease though. There&#39;s another implementation trick to do the same thing =
for memory usage, which is much more in the &#39;do later&#39; category bec=
ause it doesn&#39;t involve changing the format and hence it can be put off=
.</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0=
 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">In particular, case =
#2 handles your leaf node optimizations generically,<br>
without special cases and additional complexity. It&#39;d also be a better =
way to<br>
do the ONLY0/1 cases, as if the &quot;nothing on this side&quot; symbol is =
a single byte,<br>
each additional colliding level would simply extend the commitment without<=
br>
hashing. In short, you&#39;d have nearly the same level of optimization eve=
n if at<br>
the cryptography level your tree consists of only leaves, inner nodes, and =
nil.<br></blockquote><div><br></div><div>I&#39;m taking pains to make all t=
he hashing be of fixed-size things, so that a non-padding variant of a secu=
re hashing algorithm can be used. The chains of ONLY thing above would forc=
e a special exception to that, which can be done but is annoying. Making th=
ings smaller than a single block (64 bytes) won&#39;t speed up hashing time=
, and making things a single byte longer than that doubles it.</div><div>=
=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;bo=
rder-left:1px #ccc solid;padding-left:1ex">Another advantage of variable si=
zed commitments is that it can help make clear<br>
to users when it&#39;s possible to brute force the message behind the commi=
tment.<br>
For instance, digest from a hashed four byte integer can be trivially rever=
sed<br>
by just trying all combinations. Equally, if that integer is concatenated w=
ith<br>
a 32 byte digest that the attacker knows, the value of the integer can be b=
rute<br>
forced.<br></blockquote><div><br></div><div>I&#39;m hashing all strings bef=
ore inserting to get them to be a fixed size and avoid a few different atta=
cks. In Bitcoin most of the strings added are longer than that so it&#39;s =
a form of compression. A custom hash function could be used which &#39;hash=
es&#39; very short strings by repeating them verbatim could be used, but se=
ems like not such a hot idea. I&#39;m making extensive use of things being =
fixed size everywhere, which improves performance in a lot of ways.</div><d=
iv>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex=
;border-left:1px #ccc solid;padding-left:1ex"><span class=3D"">&gt; &gt; Te=
chnically even a patricia trie utxo commitment can have sub-1 cache<br>
&gt; &gt; &gt; misses per update if some of the updates in a single block a=
re close to<br>
&gt; &gt; &gt; each other in memory. I think I can get practical Bitcoin up=
dates down<br>
&gt; &gt; to a<br>
&gt; &gt; &gt; little bit less than one l2 cache miss per update, but not a=
 lot less.<br>
&gt; &gt;<br>
&gt; &gt; I&#39;m very confused as to why you think that&#39;s possible. Wh=
en you say<br>
&gt; &gt; &quot;practical<br>
&gt; &gt; Bitcoin updates&quot;, what exactly is the data structure you&#39=
;re proposing to<br>
&gt; &gt; update? How is it indexed?<br></span></blockquote><div><br></div>=
<div>I&#39;ll re-answer this because I did a terrible job before. The entir=
e data structure consists of nodes which contain a metadata byte (TERM0, ON=
LY1, etc.) followed by fixes size secure hashes, and (in some cases) pointe=
rs to where the children are. The secure hashes in parent nodes are found b=
y hashing their children verbatim (or the stored string in the case of a TE=
RM). This is very conventional. All of the cleverness is in where in memory=
 these nodes are stored so that tracing down the tree causes very few cache=
 misses.</div><div><br></div><div>(The alternate approach is to have each n=
ode store its own hash rather than that be stored by the parent. That appro=
ach means that when you&#39;re recalculating you have to look up siblings w=
hich doubles the number of cache misses. Not such a hot idea.)</div><div><b=
r></div><div>At the root there&#39;s a branch block. It consists of all nod=
es up to some fixed depth - let&#39;s say 12 - with that depth set so that =
it roughly fits within a single memory page. Branch blocks are arranged wit=
h the nodes in fixed position defined by the prefix they correspond to, and=
 the terminals have outpointers to other blocks. Because they&#39;re all cl=
ustered together, a lookup or update will only require a single=C2=A0</div>=
<div><br></div><div>Below the root block are other branch blocks. Each of t=
hem has a fixed 12 bit prefix it is responsible for. When doing a lookup a =
second cache miss will be hit for levels 13-24, because those are all clust=
ered in the same branch block.</div><div><br></div><div>Below the second le=
vel of root block (at Bitcoin utxo set scale - this varies based on how muc=
h is stored) there are leaf blocks. A leaf block consists of nodes with out=
pointers to its own children which must be within the same leaf block. All =
entry points into a leaf block are from the same branch block, and the leaf=
 block has no out pointers to other blocks. When a leaf block overflows the=
 entry point into it which overflowed is moved into the active leaf for tha=
t branch, and if that&#39;s full a new one is allocated. There&#39;s some s=
ubtlety to exactly how this is done, but I&#39;ve gotten everything down to=
 simple expedient tricks with straightforward implementations. The thing wh=
ich matters for now is that there&#39;s only a single cache miss for each l=
eaf node, because they also fit in a page.</div><div><br></div><div>So at B=
itcoin scale there will probably only be 3 cache misses for a lookup, and t=
hat&#39;s a worst case scenario. The first one is probably always warm, bri=
nging it down to 2, and if you do a bunch in sorted order they&#39;ll proba=
bly hit the same second level branches repeatedly bringing it down to 1, an=
d might even average less than that if there are enough that the leaf block=
 has multiple things being accessed.</div><div><br></div><div>(These same t=
ricks can be applied to merbinner tree implementation as well, although tha=
t would be a bit less parsimonious with memory, by a small constant factor.=
)</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" style=3D"margin:0=
 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Anyway hashing is pr=
etty slow. The very fast BLAKE2 is about 3 cycles/byte<br>
(SHA256 is about 15 cycles/byte) so hashing that same data would take aroun=
d<br>
200 cycles, and probably quite a bit more in practice due to overheads from=
 our<br>
short message lengths; fetching a cache line from DRAM only takes about 1,0=
00<br>
cycles. I&#39;d guess that once other overheads are taken into account, eve=
n if you<br>
could eliminate L2/L3 cache-misses it wouldn&#39;t be much of an improvemen=
t.<br></blockquote><div><br></div><div>Those numbers actually back up my cl=
aims about performance. If you&#39;re doing a single update and recalculati=
ng the root afterwards, then the amount of rehashing to be done is about 30=
 levels deep times 64 bytes per thing hashed times 15 cycles per byte then =
it&#39;s about 28,800 cycles of hashing. If you have a naive radix tree imp=
lementation which hits a cache miss at every level then that&#39;s 30,000 c=
ycles, which is about half the performance time, certainly worth optimizing=
. If instead of sha256 you use blake2 (Which sounds like a very good idea!)=
 then hashing for an update will be about 5760 cycles and performance will =
be completely dominated by cache misses. If a more cache coherent implement=
ation is used, then the cost of cache misses will be 3000 cycles, which wil=
l be a non-factor with sha256 and a significant but not dominating one with=
 blake2.=C2=A0</div><div><br></div><div>It&#39;s reasonable to interpret th=
ose numbers as saying that blake2 and cache coherent implementation are bot=
h clearly worth it (probably necessary for real adoption) and that an amort=
ized binary radix tree is tempting but not worth it.</div><div><br></div></=
div></div></div>

--001a114a9388f67e5e0535847144--