1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
|
Return-Path: <pete@petertodd.org>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 2742E7AA
for <bitcoin-dev@lists.linuxfoundation.org>;
Fri, 17 Jun 2016 04:34:43 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from outmail148114.authsmtp.net (outmail148114.authsmtp.net
[62.13.148.114])
by smtp1.linuxfoundation.org (Postfix) with ESMTP id 16C2B170
for <bitcoin-dev@lists.linuxfoundation.org>;
Fri, 17 Jun 2016 04:34:41 +0000 (UTC)
Received: from mail-c247.authsmtp.com (mail-c247.authsmtp.com [62.13.128.247])
by punt20.authsmtp.com (8.14.2/8.14.2/) with ESMTP id u5H4YdSm072714;
Fri, 17 Jun 2016 05:34:39 +0100 (BST)
Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com
[52.5.185.120]) (authenticated bits=0)
by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id u5H4YaOu039543
(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
Fri, 17 Jun 2016 05:34:37 +0100 (BST)
Received: from [127.0.0.1] (localhost [127.0.0.1])
by petertodd.org (Postfix) with ESMTPSA id AB44840110;
Fri, 17 Jun 2016 04:32:38 +0000 (UTC)
Received: by localhost (Postfix, from userid 1000)
id 111FA20738; Fri, 17 Jun 2016 00:34:35 -0400 (EDT)
Date: Fri, 17 Jun 2016 00:34:35 -0400
From: Peter Todd <pete@petertodd.org>
To: Bram Cohen <bram@bittorrent.com>
Message-ID: <20160617043435.GA12800@fedora-21-dvm>
References: <CA+KqGkosGrWQ2Lpg3Ptc+UyidK7H4_HLQ_QWx2DOAFRmLkv4zQ@mail.gmail.com>
<20160616001040.GA5026@fedora-21-dvm>
<CA+KqGkqAHcU2PzEX6OmorXRBQ22eF_QBYYqUDS1v_ZvevhLCuQ@mail.gmail.com>
<20160616032612.GA7792@fedora-21-dvm>
<CA+KqGkqA2WnvBEck3kv6p2no-9wzNVCTNA-MGw=Jg=gMGfrxUQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
protocol="application/pgp-signature"; boundary="lrZ03NoBR/3+SXJZ"
Content-Disposition: inline
In-Reply-To: <CA+KqGkqA2WnvBEck3kv6p2no-9wzNVCTNA-MGw=Jg=gMGfrxUQ@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Server-Quench: cc9f5dec-3444-11e6-bcde-0015176ca198
X-AuthReport-Spam: If SPAM / abuse - report it at:
http://www.authsmtp.com/abuse
X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR
aAdMdwUUEkAaAgsB AmAbW1FeU1l7WmQ7 bghPaBtcak9QXgdq
T0pMXVMcUQAQBXVQ eVseURB3cwYIenxx ZEYsDSNSCkEuIBVg
QUpQEXAHZDJmdTJM BBVFdwNVdQJNeEwU a1l3GhFYa3VsNCMk
FAgyOXU9MCtqYAFY WAIJIBoOW0sGBXY1 QRxKGDIyG1EMRiN7 NRUgJVMHdAAA
X-Authentic-SMTP: 61633532353630.1038:706
X-AuthFastPath: 0 (Was 255)
X-AuthSMTP-Origin: 52.5.185.120/25
X-AuthVirus-Status: No virus detected - but ensure you scan with your own
anti-virus system.
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Merkle trees and mountain ranges
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Jun 2016 04:34:43 -0000
--lrZ03NoBR/3+SXJZ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
On Thu, Jun 16, 2016 at 02:07:26AM -0700, Bram Cohen wrote:
> On Wed, Jun 15, 2016 at 8:26 PM, Peter Todd <pete@petertodd.org> wrote:
> Okay, clearly my assumptions about the parts of that post I didn't read
> carefully were way off. I'll have to look through it carefully to be able
> to make coherent apples to apples comparisons.
Thanks!
> > I'm worried that once there's real transaction fees everyone might stop
> > > consolidating dust and the set of unspent transactions might grow wit=
hout
> > > bound as well, but that's a topic for another day.
> >
> > Ok, but then if you're concerned about that risk, why introduce a data
> > structure - the STXO set - that's _guaranteed_ to grow without bound?
> >
>=20
> I'm not proposing STXO set commitments either. My point was that there
> should be incentives for collecting dust. That has nothing to do with this
> thread though and should be discussed separately (also I don't feel like
> discussing it because I don't have a good proposal).
Ah, yeah, I misunderstood you there; as expected absolutely no-one is propo=
sing
STXO set commitments. :)
> > > The main differences to your patricia trie are the non-padding sha256=
and
> > > that each level doesn't hash in a record of its depth and the usage of
> > > ONLY0 and ONLY1.
> >
> > I'm rather confused, as the above sounds nothing like what I've
> > implemented,
> > which only has leaf nodes, inner nodes, and the special empty node
> > singleton,
> > for both the MMR and merbinner trees.
> >
>=20
> It's quite a bit like merbinner trees. I've basically taken the leaf nodes
> and smushed them into the inner nodes above them, thus saving a hashing
> operation and some memory. They're both binary radix trees.
Ah, I see what you mean now.
So above you said that in merbinner trees each node "hash[es] in a record of
its depth" That's actually incorrect: each node commits to the prefix that =
all
keys below that level start with, not just the depth.
This means that in merbinner trees, cases where multiple keys share parts of
the same prefix are handled efficiently, without introducing extra levels
unnecessarily; there's no need for the ONLY0/1 nodes as the children of an
inner node will always be on different sides.
When keys are randomly distributed, this isn't a big deal; OTOH against
attackers who are choosing keys, e.g. by grinding hashes, merbinner trees
always have maximum depths in proportion to log2(n) of the actual number of
items in the tree. Grinding is particularly annoying to deal with due to the
birthday attack: creating a ground prefix 64 bits long only takes 32 bits w=
orth
of work.
In my deterministic expressions work one of the ideas I've been tossing aro=
und
is rather than always using hash digests directly for when you need to comm=
it
to some data, we could instead extend the idea of a digest to that of a
"commitment", where a commitment is simply some short, but variable-sized,
string that uniquely maps to a given set of data. Secondly, commitments do
*not* always guarantee that the original data can't be recovered from the
commitment itself.
By allowing commitments to be variable sized - say 0 to ~64 bytes - we get a
number of advantages:
1) Data shorter than the length of a digest (32 bytes) can be included in t=
he
commitment itself, improving efficiency.
2) Data a little longer than a digest can have hashing delayed, to better f=
ill
up blocks.
In particular, case #2 handles your leaf node optimizations generically,
without special cases and additional complexity. It'd also be a better way =
to
do the ONLY0/1 cases, as if the "nothing on this side" symbol is a single b=
yte,
each additional colliding level would simply extend the commitment without
hashing. In short, you'd have nearly the same level of optimization even if=
at
the cryptography level your tree consists of only leaves, inner nodes, and =
nil.
Another advantage of variable sized commitments is that it can help make cl=
ear
to users when it's possible to brute force the message behind the commitmen=
t.
For instance, digest from a hashed four byte integer can be trivially rever=
sed
by just trying all combinations. Equally, if that integer is concatenated w=
ith
a 32 byte digest that the attacker knows, the value of the integer can be b=
rute
forced.
> > Technically even a patricia trie utxo commitment can have sub-1 cache
> > > misses per update if some of the updates in a single block are close =
to
> > > each other in memory. I think I can get practical Bitcoin updates down
> > to a
> > > little bit less than one l2 cache miss per update, but not a lot less.
> >
> > I'm very confused as to why you think that's possible. When you say
> > "practical
> > Bitcoin updates", what exactly is the data structure you're proposing to
> > update? How is it indexed?
>=20
>=20
> My calculations are: a Bitcoin block contains about 2000 updates. The l2
> cache is about 256 kilobytes, and if an update is about 32 bytes times two
> for the parents, grandparents, etc. then an l2 cache can contain about 40=
00
> values. If the current utxo size is about 2000 * 4000 =3D 8,000,000 in si=
ze
> then about half the pages which contain a transaction will contain a seco=
nd
> one. I think the utxo set is currently about an order of magnitude greater
> than that, so the number of such collisions will be fairly mall, hence my
> 'less than one but not a lot less' comment.
Your estimate of updates requiring 32 bytes of data is *way* off.
Each inner node updated on the path to a leaf node will itself require 32 b=
ytes
of data to be fetched - the digest of the sibling. As of block 416,628, the=
re
are 39,167,128 unspent txouts, giving us a tree about 25 levels deep.
So if I want to update a single leaf, I need to read:
25 nodes * 32 bytes/node =3D 800 bytes
of data. Naively, that'd mean our 2,000 updates needs to read 1.6MB from RA=
M,
which is 6.4x bigger than the L2 cache - it's just not going to fit.
Taking into account the fact that this is a batched update improves things a
little bit. For a node at level i with random access patterns and N accesses
total our amortised cost is 1/(1 + N/2^i) Summing that over 2,000 leaf upda=
tes
and 25 levels gives us ~29,000 total updates, 0.9MB, which is still a lot
larger than L2 cache.
While this might fit in L3 cache - usually on the order of megabytes - this=
is
a rather optimistic scenario anyway: we're assuming no other cache pressure=
and
100% hit rate.
Anyway hashing is pretty slow. The very fast BLAKE2 is about 3 cycles/byte
(SHA256 is about 15 cycles/byte) so hashing that same data would take around
200 cycles, and probably quite a bit more in practice due to overheads from=
our
short message lengths; fetching a cache line from DRAM only takes about 1,0=
00
cycles. I'd guess that once other overheads are taken into account, even if=
you
could eliminate L2/L3 cache-misses it wouldn't be much of an improvement.
> As for how it's indexed, at a crypto definition level it's just a binary
> radix tree. In terms of how it's indexed in memory, that involves some
> optimizations to avoid cache misses. Memory is allocated into blocks of
> about the size of an 12 cache (or maybe an l1 cache, it will require some
> testing and optimization). Blocks are either branch blocks, which keep
> everything in fixed positions, or leaf blocks, which contain fixed size
> entries for nodes plus indexes within the same leaf block of their
> children. Branch blocks can have many children which can be either branch
> blocks or leaf blocks, but typically are either all branch blocks or all
> leaf blocks. Branch blocks always have exactly one parent. Leaf blocks
> always have all their inputs come from a single branch block, but there c=
an
> be multiple ones of those. When a branch block overflows it first tries to
> put stuff into the last leaf block it used, and if there's no more room it
> allocates a new one. It's fairly common for branches to have just a few
> leaf children, but they also could have a lot, depending on whether the
> base 2 log of the number of things currently in the set modulo the number
> levels in a branch is a small number.
>=20
> Usually when an update is done it consists of first checking the
> appropriate output of the root block (it's jumped to directly to avoid
> unnecessary memory lookups. If there's nothing there the algorithm will
> walk back until it finds something.) That leads directly to (usually)
> another branch whose output is jumped to directly again. At Bitcoin utxo
> set sizes that will usually lead to a leaf block, which is then walked do=
wn
> manually to find the actual terminal node, which is then updated, and the
> parent, grandparent, etc. is then marked invalid until something which was
> already marked invalid is hit, and it exits. Calculation of hash values is
> done lazily.
I think it's safe to say that given our working set is significantly larger
than the L2/L3 cache available, none of the above optimizations are likely =
to
help much. Better to just keep the codebase simple and use standard techniq=
ues.
--=20
https://petertodd.org 'peter'[:-1]@petertodd.org
--lrZ03NoBR/3+SXJZ
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
-----BEGIN PGP SIGNATURE-----
iQEcBAEBCAAGBQJXY33YAAoJEGOZARBE6K+yRpMH/iu6jy2HeYSemU3z0hjnRf/x
h6C7fEGFM0QvQ5xYB6VL+FnRxNMmiG94rLDlUxTlrQ6U3OvFGPZhOYVI1hyUgzFW
98kB1RfWI233sdu/XKAp93zPKTJ7qT/GQpk2MQG58/gJmtKFDuWV40FYJlHI+62s
nhzEKoq9HLtCNcXUmfAWj/lGaHMhGQlLVs7O1v26lo7Ez8KO1fxFyrtg97LtVzFz
4qTqG3Lk4ic3snBMq48EVLtKMa+jO0TliZhUshEvtgA/T6FprEv1dpLDPyAs44W5
8D1T4Uki84SYRnGq1IMfXtran9H61ez+XxYU54f0k3yaT5BPD4v3B7zHlrlgyO4=
=EcTA
-----END PGP SIGNATURE-----
--lrZ03NoBR/3+SXJZ--
|