Return-Path: <bram@bittorrent.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 4E267279
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 25 Feb 2017 06:23:22 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-io0-f182.google.com (mail-io0-f182.google.com
	[209.85.223.182])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 72C01157
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 25 Feb 2017 06:23:21 +0000 (UTC)
Received: by mail-io0-f182.google.com with SMTP id 25so3575381iom.3
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 24 Feb 2017 22:23:21 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=bittorrent-com.20150623.gappssmtp.com; s=20150623;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to
	:cc; bh=rsdaM0G3TpdvudIg/V3pGiqZdEmzDK8f8gTsA26DxxI=;
	b=GDOTpbPWqP08ceqbmEzDR/uhSjzn7Sfxl49XZB1QqCB+ohYYYVdX5t/R+aiEV3FFNE
	VurnaN07OI8WYtEAriLaWuIBAq1TUB8pGD3NMMdo0YIsLOuhW8d1kvOMHARnV5qCCkcK
	dQV+IuGUuqoRPViitx4StVniJqNONwQcKs7d53lJGTPmVBd2zhsF/8leQ+8+XPr0XGgy
	cwHKFMeOJtHRffYvGgDbZnSPHCPjbyv1S6P3P8ab4ppd2CgxYF/ChA4C1V8wVcBkYEjk
	tzfYPtrgpP97h6KZ9kTk0olZcmz2tCFHUUdpzHHvsNx9tbFH0WfABsQUznv/V4+fIgn7
	XtyQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to:cc;
	bh=rsdaM0G3TpdvudIg/V3pGiqZdEmzDK8f8gTsA26DxxI=;
	b=iTwMFOPmRZJ62A+PbO3k2iBb9kizYyiwotysHxWUk+p0zzG8KPwU6CS/p2EG681yum
	PGFlL8UEpX5kmwMZ+7RBcZ1RyWBxPvGMKt9YnhHH+uRQSWJi53Nh0eDhk1IJfyF7fBBc
	Baj0KckglOOXSgKgs8fsqXmuCSufIpwSjWCg/mGeLoWzLohq5OenyIBejbc5kTjg2bpd
	Q3zmlAobbDk+6wEtpdLP4MYhSIUGOg7OgchbAoODf1Z5gYmJG60OzaR4LqL6FdQOjqEM
	3rsUolq8M3nsXd4py6GygJ4L0acgz+IaNc6NbDBTAptW9fI9VndTE5FEGbU/c00Qlyso
	MAWA==
X-Gm-Message-State: AMke39mykzR9FsMtllTSszdcAKGUDzz1PMYH4WCnmZkkiDJKc4mY8Tk9gxBxUEW9QKfX7CJDg+TGBS3tYgjSrIJ8
X-Received: by 10.107.15.70 with SMTP id x67mr5646511ioi.103.1488003800629;
	Fri, 24 Feb 2017 22:23:20 -0800 (PST)
MIME-Version: 1.0
Received: by 10.36.73.150 with HTTP; Fri, 24 Feb 2017 22:23:20 -0800 (PST)
In-Reply-To: <20170225041202.GA11152@savin.petertodd.org>
References: <20170223235105.GA28497@savin.petertodd.org>
	<CA+KqGkowxEZeAFYa2JJchBDtRkg1p3YZNocivzu3fAtgRLDRBQ@mail.gmail.com>
	<20170224010943.GA29218@savin.petertodd.org>
	<CA+KqGkrOK76S3ffPJmpqYcBwtSeKESqN16yZsrwzDR6JZZmwFA@mail.gmail.com>
	<20170224025811.GA31911@savin.petertodd.org>
	<CA+KqGkq7gavAnAk-tcA+gxL2sWpv3ENhEmHrQHaPdyAsKrLjGg@mail.gmail.com>
	<20170224031531.GA32118@savin.petertodd.org>
	<CA+KqGkrfhg3GnbWwvKXHQ2NWuCnfzYyTPUxRhzYMuDBiNQR4eA@mail.gmail.com>
	<20170224043613.GA32502@savin.petertodd.org>
	<CA+KqGkpi4GvgU-K6vt-U5ZN4AkpjZ0rruzddoJS4-V0TcnyqUQ@mail.gmail.com>
	<20170225041202.GA11152@savin.petertodd.org>
From: Bram Cohen <bram@bittorrent.com>
Date: Fri, 24 Feb 2017 22:23:20 -0800
Message-ID: <CA+KqGkqs8F1hK6y-JnLFRpqhQ5i8i+MXVmtGUQBYmE5d1OCAAg@mail.gmail.com>
To: Peter Todd <pete@petertodd.org>
Content-Type: multipart/alternative; boundary=001a113ed7fef638b9054954e147
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, HTML_MESSAGE, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] A Better MMR Definition
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sat, 25 Feb 2017 06:23:22 -0000

--001a113ed7fef638b9054954e147
Content-Type: text/plain; charset=UTF-8

On Fri, Feb 24, 2017 at 8:12 PM, Peter Todd <pete@petertodd.org> wrote:

>
> So to be clear, what you're proposing there is to use the insertion order
> as
> the index - once you go that far you've almost entirely re-invented my
> proposal!
>

I'm not 'proposing' this; I'm saying it could be done simply, but I'm
skeptical of the utility. Probably the most compelling argument for it is
that insertion-indexed values are much smaller, so they can be compacted
down a lot, resulting in less memory use, better locality, and fewer
hashes - but your implementation doesn't take advantage of that.
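
Roughly what I mean, as a toy sketch rather than code from either of our
implementations: the leaf keys become small sequential integers, which pack
densely, instead of 32-byte hashes.

    import hashlib

    class InsertionIndexedSet:
        # Toy sketch only: leaves keyed by insertion order, not by hash.
        def __init__(self):
            self.leaves = []     # index -> leaf hash, dense and contiguous
            self.index_of = {}   # leaf hash -> insertion index

        def insert(self, item: bytes) -> int:
            h = hashlib.sha256(item).digest()
            if h in self.index_of:
                return self.index_of[h]
            idx = len(self.leaves)   # the key is just the insertion order
            self.leaves.append(h)
            self.index_of[h] = idx
            return idx               # a small integer, not a 32-byte key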


> Your merkle-set implementation is 1500 lines of densely written Python


The reference implementation, which is included in those 1500 lines, is
less than 300 lines and fairly straightforward. The non-reference
implementation always behaves semantically identically to the reference
implementation; it just does so faster and with less memory.


> with
> almost no comments,


The comments at the top explain both the proof format and the in-memory
data structures very precisely. The whole codebase was reviewed by a
coworker of mine and comments were added explaining the subtleties which
tripped him up.


> and less than a 100 lines of (also uncommented) tests.


Those tests get 98% code coverage and extensively hit not only the lines of
code but the semantic edge cases as well. The lines which aren't hit are
convenience functions and error conditions of the parsing code for when
it's passed bad data.


> By
> comparison, my Python MMR implementation is 300 lines of very readable
> Python
> with lots of comments, a 200 line explanation at the top, and 200 lines of
> (commented) tests. Yet no-one is taking the (still considerable) effort to
> understand and comment on my implementation. :)
>

Given that maaku's Merkle prefix trees were shelved because of performance
problems despite being written in C and operating in basically the same way
as your code and my reference code, it's clear that non-optimized Python
won't be touching the bitcoin codebase any time soon.


>
> Fact is, what you've written is really daunting to review, and given it's
> not
> in the final language anyway, it's unclear what basis to review it on
> anyway.


It should be reviewed based on semantic correctness and performance.
Performance can only be accurately and convincingly determined by porting
it to C and optimizing it, which mostly involves experimenting with
different values for the two passed-in magic numbers.
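
To give a sense of what that tuning looks like, it's basically a sweep over
a couple of layout parameters with a fixed workload. The names below are
placeholders, not the actual constructor or method signatures:

    import os
    import time

    def time_inserts(make_set, n=100000):
        items = [os.urandom(32) for _ in range(n)]
        s = make_set()
        start = time.perf_counter()
        for item in items:
            s.insert(item)      # placeholder method name
        s.get_root()            # placeholder method name
        return time.perf_counter() - start

    # Placeholder parameter names; substitute the real constructor:
    # for depth in (4, 5, 6):
    #     for block_size in (1024, 4096, 16384):
    #         print(depth, block_size,
    #               time_inserts(lambda: MerkleSet(depth, block_size)))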


> I
> suspect you'd get more feedback if the codebase was better commented, in a
> production language, and you have actual real-world benchmarks and
> performance
> figures.
>

Porting to C should be straightforward. Several people have already
expressed interest in doing so, and it's written in intentionally C-ish
Python, resulting in some rather odd idioms, which is a big part of why you
think it looks 'dense'. A lot of that weird offset math should be much more
readable in C, where it's all structs and x.y notation can be used instead
of adding offsets.
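
To illustrate the contrast, with a made-up node layout rather than the one
in merkle_set.py:

    from struct import Struct

    NODE = Struct('>32s32sB')  # left hash, right hash, type byte (made-up layout)

    def right_hash_by_offset(block: bytes, pos: int) -> bytes:
        # the offset-math style the Python is stuck with
        return block[pos + 32:pos + 64]

    def right_hash_by_struct(block: bytes, pos: int) -> bytes:
        # the same access once there's a struct to unpack into;
        # in C this would just be node->right
        left, right, node_type = NODE.unpack_from(block, pos)
        return right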


> In particular, while at the top of merkle_set.py you have a list of
> advantages,
> and a bunch of TODO's, you don't explain *why* the code has any of these
> advantages. To figure that out, I'd have to read and understand 1500 lines
> of
> densely written Python. Without a human-readable pitch, not many people are
> going to do that, myself included.
>

It's all about cache coherence. When doing operations it pulls in a bunch
of things which are near each other in memory instead of jumping all over
the place. The improvements it gets should be much greater than the ones
gained from insertion ordering, although the two could be combined.
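
As a toy illustration of the locality point (not the real layout): pack a
subtree's nodes into one contiguous allocation, so an update walks a single
block instead of chasing pointers all over the heap.

    NODE_SIZE = 64  # two 32-byte child hashes per interior node (toy layout)

    def subtree_block(depth: int) -> bytearray:
        # all 2**depth - 1 interior nodes of a subtree live side by side
        return bytearray(NODE_SIZE * (2 ** depth - 1))

    def node_view(block: bytearray, i: int) -> memoryview:
        # heap-style indexing: children of node i sit at 2*i + 1 and 2*i + 2,
        # inside the same block, so walking down stays within a few cache lines
        start = i * NODE_SIZE
        return memoryview(block)[start:start + NODE_SIZE]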
