summaryrefslogtreecommitdiff
path: root/f8/058ba3a4523962643a3cd699bd96d2bc7790c1
blob: 4e0071d9dfaf07166ad6c63932264291360015be (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
Return-Path: <jim.posen@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 7E3061229
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Tue,  3 Apr 2018 05:34:42 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-io0-f175.google.com (mail-io0-f175.google.com
	[209.85.223.175])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 77AC5418
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Tue,  3 Apr 2018 05:34:41 +0000 (UTC)
Received: by mail-io0-f175.google.com with SMTP id x77so14210264ioi.2
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 02 Apr 2018 22:34:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to
	:cc; bh=Fpvv1IJIJQj0Hrd1aSnloxow6WBFnhKaWHKUng2WR1k=;
	b=ebhppV0qHIOQ8ZlGtG0HaBMToMICdqQcN8RJ8FLk4hQkrpG57LAVU3+0lGb0fVzgnK
	/qSqR48B7OejiTBtSgp+d6m3orTywAm2md/gJg+5/qqMBBhTPuqGHYsqhb93HOS6aj1t
	EObTe+P+KGQ/gXWCZ7ymS5y2eqYudqhUNO5s28LZo0EIfK7y0Etx5MKalwMgu8Ax6zxW
	e/Qo7BKqhHVvDw0VPJXcUEwnoy+OrHrEjHph7xcEgTpA+Wd/kANmFuPoTmloPIcy0DHm
	OpEMVHPztbJkBnFKGCXWvtWEtdPAt2eEORAfQlrkfp/tGG8oDV35pbUFyndXqPWvN5ah
	b9Nw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to:cc;
	bh=Fpvv1IJIJQj0Hrd1aSnloxow6WBFnhKaWHKUng2WR1k=;
	b=UBNEi0HaNAzp9h46InbJvQ4xVWrLpBJCThGAUyCZwr+FOpQoCGzRf1E0cWBSAlK6Mb
	9uQhs3x/YeeHgKTb7SwwMrlD2sClC13aCZK8/VjUiw7KgT/PpT7uyxJsKr5xNkYLvfWB
	5KF8jAOVxpbSMED0xOU5ld40dGBRJyQD0ssZJO+3F/JorpG8Uc6Iu8XTgFbcis/duNod
	qknmtnqvnviNbPJmSvGp3jsqLYX4htDZmjALdLXW+kgngM+1CTu5RIpg+TMcVQSqLvi/
	ynOUVvaZIoFepvFCfFSfFg4gAeG9j2B0r/rqsIhfoAVwZj5CW72DKu3H2rvURW71B0N4
	r73g==
X-Gm-Message-State: ALQs6tD3UymMPrH/xjIJgJ4yZExG+4PSVI27Z7R7sSSoKWowbek9egng
	cLzclMRTE+MN/O6gWZ1JHlURuKvRdS8jDr6LSqzeCjjw
X-Google-Smtp-Source: AIpwx489ml/dFUFxqawGlTy37IbHrgbmdvyN3brpnRcw2L7UhyG847iXMoHNa6WGDJlh1lGN8sZ0nlBoaxb8oqLc3DY=
X-Received: by 10.107.114.22 with SMTP id n22mr10238168ioc.41.1522733680657;
	Mon, 02 Apr 2018 22:34:40 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.107.52.80 with HTTP; Mon, 2 Apr 2018 22:34:39 -0700 (PDT)
In-Reply-To: <CADabwBDiT2zNPHZ2=OyvCVrSY3Km2oyTnhRCHyMNsjW2vLMmOg@mail.gmail.com>
References: <CADZtCSg7+x-sg-ysgacXobRexOVwT+k9fr6a9S-6xU2w8f8m3A@mail.gmail.com>
	<CADabwBAjTRdVqsL+V=YdQ+kr8+LtSPOeSXUJOzKoPNdKEbAOWQ@mail.gmail.com>
	<CADZtCSjmQfBZoaO=MCyRoEn-AYe4A=1kDhxSVxVMw+O4k7YJfQ@mail.gmail.com>
	<20180330061418.GA6017@erisian.com.au>
	<CADabwBDiT2zNPHZ2=OyvCVrSY3Km2oyTnhRCHyMNsjW2vLMmOg@mail.gmail.com>
From: Jim Posen <jim.posen@gmail.com>
Date: Mon, 2 Apr 2018 22:34:39 -0700
Message-ID: <CADZtCShrTKqR26RMnUAwK32_7XKNvsCWgfYvQ3Bud2J6r54AKg@mail.gmail.com>
To: Riccardo Casatta <riccardo.casatta@gmail.com>
Content-Type: multipart/alternative; boundary="089e0825f9b01f8c500568eb1017"
X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, HTML_MESSAGE,
	RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Tue, 03 Apr 2018 11:54:20 +0000
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Optimized Header Sync
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Apr 2018 05:34:42 -0000

--089e0825f9b01f8c500568eb1017
Content-Type: text/plain; charset="UTF-8"

Thank you for your feedback AJ and Riccardo.

Nice observation about using nBits from every 2016th block as a short
specifier of chain work. You can get some savings from the 4 byte nBits
encoding over VLQ for total chain work as in my spec.

I tried it out on the current chain. At block height 516,387, there are 258
total checkpoints in the response payload with an interval of 2016. The
size of the checkpts message is:

- 9,304 bytes using hash + nBits
- 10,934 bytes using hash + chain work delta encoded as VLQ
- 11,030 bytes using hash + chain work total encoded as VLQ

The saving from using deltas instead of the total seems negligible to me
especially considering the additional computation it requires. Going from
total chain work as VLQ to nBits is a 16% savings in the size of a checkpts
message. According to some rather rough benchmarks, it takes ~3us to
generate the message with nBits versus ~105us to generate each message with
VLQ chain work (including block index lookups and serialization time).

The downside, however, is that the new P2P message would be tightly coupled
to a specific parameter in Bitcoin's consensus protocol, and one that is
changed in many alt chains. Also, it would require that checkpoints can
only be fetched at intervals of 2016, instead of intervals chosen by the
clients. Being able to specify the interval is a very nice property for
longer chains, where a client may select really large intervals, then
bisect that range even further to request a smaller PoW sample (eg. start
by fetching every 10,000th, then every 100th).

Personally, I strongly think using total chain work instead of nBits is the
right tradeoff and is worth the extra 1KB. I'm curious to hear others'
opinions. Note that the checkpoints message is only fetched once per peer
per download from genesis. Subsequent catchups only fetch checkpoints from
the locator fork point. I also don't find the caching argument compelling
-- the time to generate checkpts response messages is fast enough anyway.

I also finally got around to pulling numbers on the space savings from the
nVersion omission. As a reminder of how this works, three bits in the
encoding indicator represent a value 1-7 of the distance in block height
since another block with the same version. Looking at the current Bitcoin
main chain, this is a table of the occurrences of these values:

Height distance # of Blocks
1 469537
2 22301
3 8833
4 4368
5 2633
6 1630
7 1114
                     8+ 5967
You can read this as "469,537 blocks have the same version as their
parent", "22,301 have the same version as their parent's parent", etc.
Given the information in this table, we may consider only allocating 2 bits
in the encoding header rather than 3.

On Fri, Mar 30, 2018 at 1:06 AM, Riccardo Casatta <
riccardo.casatta@gmail.com> wrote:

> Yes, I think the checkpoints and the compressed headers streams should be
> handled in chunks of 2016 headers and queried by chunk number instead of
> height, falling back to current method if the chunk is not full yet.
>
> This is cache friendly and allows to avoid bit 0 and bit 1 in the bitfield
> (because they are always 1 after the first header in the chunk of 2016).
>
> 2018-03-30 8:14 GMT+02:00 Anthony Towns <aj@erisian.com.au>:
>
>> On Thu, Mar 29, 2018 at 05:50:30PM -0700, Jim Posen via bitcoin-dev wrote:
>> > Taken a step further though, I'm really interested in treating the
>> checkpoints
>> > as commitments to chain work [...]
>>
>> In that case, shouldn't the checkpoints just be every 2016 blocks and
>> include the corresponding bits value for that set of blocks?
>>
>> That way every node commits to (approximately) how much work their entire
>> chain has by sending something like 10kB of data (currently), and you
>> could verify the deltas in each node's chain's target by downloading the
>> 2016 headers between those checkpoints (~80kB with the proposed compact
>> encoding?) and checking the timestamps and proof of work match both the
>> old target and the new target from adjacent checkpoints.
>>
>> (That probably still works fine even if there's a hardfork that allows
>> difficulty to adjust more frequently: a bits value at block n*2016 will
>> still enforce *some* lower limit on how much work blocks n*2016+{1..2016}
>> will have to contribute; so will still allow you to estimate how much work
>> will have been done, it may just be less precise than the estimate you
>> could
>> generate now)
>>
>> Cheers,
>> aj
>>
>>
>
>
> --
> Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>
>

--089e0825f9b01f8c500568eb1017
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Thank you for your feedback AJ and Riccardo.<div><br></div=
><div>Nice observation about using nBits from every 2016th block as a short=
 specifier of chain work. You can get some savings from the 4 byte nBits en=
coding over VLQ for total chain work as in my spec.</div><br>I tried it out=
 on the current chain. At block height 516,387, there are 258 total checkpo=
ints in the response payload with an interval of 2016. The size of the chec=
kpts message is:<div></div><div><br></div><div>-=C2=A09,304 bytes using has=
h + nBits</div><div>-=C2=A010,934 bytes using hash + chain work delta encod=
ed as VLQ</div><div>- 11,030 bytes using hash + chain work total encoded as=
 VLQ</div><div><br></div><div>The saving from using deltas instead of the t=
otal seems negligible to me especially considering the additional computati=
on it requires. Going from total chain work as VLQ to nBits is a 16% saving=
s in the size of a checkpts message. According to some rather rough benchma=
rks, it takes ~3us to generate the message with nBits versus ~105us to gene=
rate each message with VLQ chain work (including block index lookups and se=
rialization time).</div><div><br></div><div>The downside, however, is that =
the new P2P message would be tightly coupled to a specific parameter in Bit=
coin&#39;s consensus protocol, and one that is changed in many alt chains. =
Also, it would require that checkpoints can only be fetched at intervals of=
 2016, instead of intervals chosen by the clients. Being able to specify th=
e interval is a very nice property for longer chains, where a client may se=
lect really large intervals, then bisect that range even further to request=
 a smaller PoW sample (eg. start by fetching every 10,000th, then every 100=
th).</div><div><br></div><div>Personally, I strongly think using total chai=
n work instead of nBits is the right tradeoff and is worth the extra 1KB. I=
&#39;m curious to hear others&#39; opinions.=C2=A0<span style=3D"color:rgb(=
34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;fo=
nt-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter=
-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-=
space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decora=
tion-style:initial;text-decoration-color:initial;float:none;display:inline"=
>Note that the checkpoints message is only fetched once per peer per downlo=
ad from genesis. Subsequent catchups only fetch checkpoints from the locato=
r fork point. I also don&#39;t find the caching argument compelling -- the =
time to generate checkpts response messages is fast enough anyway.</span></=
div><div><span style=3D"color:rgb(34,34,34);font-family:arial,sans-serif;fo=
nt-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-=
caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-ind=
ent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-=
color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:=
initial;float:none;display:inline"><br></span></div><div>I also finally got=
 around to pulling numbers on the space savings from the nVersion omission.=
 As a reminder of how this works, three bits in the encoding indicator repr=
esent a value 1-7 of the distance in block height since another block with =
the same version. Looking at the current Bitcoin main chain, this is a tabl=
e of the occurrences of these values:</div><div><br><table cellspacing=3D"0=
" cellpadding=3D"0" dir=3D"ltr" border=3D"1" style=3D"table-layout:fixed;fo=
nt-size:10pt;font-family:arial,sans,sans-serif;width:0px;border-collapse:co=
llapse;border:none"><colgroup><col width=3D"100"><col width=3D"100"></colgr=
oup><tbody><tr style=3D"height:21px"><td style=3D"overflow:hidden;padding:2=
px 3px;vertical-align:bottom;border:1px solid rgb(204,204,204)">Height dist=
ance</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom=
;border:1px solid rgb(204,204,204)"># of Blocks</td></tr><tr style=3D"heigh=
t:21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;=
text-align:right;border:1px solid rgb(204,204,204)">1</td><td style=3D"over=
flow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1=
px solid rgb(204,204,204)">469537</td></tr><tr style=3D"height:21px"><td st=
yle=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:rig=
ht;border:1px solid rgb(204,204,204)">2</td><td style=3D"overflow:hidden;pa=
dding:2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(2=
04,204,204)">22301</td></tr><tr style=3D"height:21px"><td style=3D"overflow=
:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1px s=
olid rgb(204,204,204)">3</td><td style=3D"overflow:hidden;padding:2px 3px;v=
ertical-align:bottom;text-align:right;border:1px solid rgb(204,204,204)">88=
33</td></tr><tr style=3D"height:21px"><td style=3D"overflow:hidden;padding:=
2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(204,204=
,204)">4</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bo=
ttom;text-align:right;border:1px solid rgb(204,204,204)">4368</td></tr><tr =
style=3D"height:21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical=
-align:bottom;text-align:right;border:1px solid rgb(204,204,204)">5</td><td=
 style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:=
right;border:1px solid rgb(204,204,204)">2633</td></tr><tr style=3D"height:=
21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;te=
xt-align:right;border:1px solid rgb(204,204,204)">6</td><td style=3D"overfl=
ow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1px=
 solid rgb(204,204,204)">1630</td></tr><tr style=3D"height:21px"><td style=
=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;=
border:1px solid rgb(204,204,204)">7</td><td style=3D"overflow:hidden;paddi=
ng:2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(204,=
204,204)">1114</td></tr><tr style=3D"height:21px"><td style=3D"overflow:hid=
den;padding:2px 3px;vertical-align:bottom;border:1px solid rgb(204,204,204)=
">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A08+</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:botto=
m;text-align:right;border:1px solid rgb(204,204,204)">5967</td></tr></tbody=
></table><br></div><div>You can read this as &quot;469,537 blocks have the =
same version as their parent&quot;, &quot;22,301 have the same version as t=
heir parent&#39;s parent&quot;, etc. Given the information in this table, w=
e may consider only allocating 2 bits in the encoding header rather than 3.=
</div></div><div class=3D"gmail_extra"><br><div class=3D"gmail_quote">On Fr=
i, Mar 30, 2018 at 1:06 AM, Riccardo Casatta <span dir=3D"ltr">&lt;<a href=
=3D"mailto:riccardo.casatta@gmail.com" target=3D"_blank">riccardo.casatta@g=
mail.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=
=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=
=3D"ltr"><div>Yes, I think the checkpoints and the compressed headers strea=
ms should be handled in chunks of 2016 headers and queried by chunk number =
instead of height, falling back to current method if the chunk is not full =
yet.</div><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra">T=
his is cache friendly and allows to avoid bit 0 and bit 1 in the bitfield (=
because they are always 1 after the first header in the chunk of 2016).</di=
v><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra"><div><div=
 class=3D"h5"><div class=3D"gmail_quote">2018-03-30 8:14 GMT+02:00 Anthony =
Towns <span dir=3D"ltr">&lt;<a href=3D"mailto:aj@erisian.com.au" target=3D"=
_blank">aj@erisian.com.au</a>&gt;</span>:<br><blockquote class=3D"gmail_quo=
te" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"=
><span>On Thu, Mar 29, 2018 at 05:50:30PM -0700, Jim Posen via bitcoin-dev =
wrote:<br>
&gt; Taken a step further though, I&#39;m really interested in treating the=
 checkpoints<br>
</span>&gt; as commitments to chain work [...]<br>
<br>
In that case, shouldn&#39;t the checkpoints just be every 2016 blocks and<b=
r>
include the corresponding bits value for that set of blocks?<br>
<br>
That way every node commits to (approximately) how much work their entire<b=
r>
chain has by sending something like 10kB of data (currently), and you<br>
could verify the deltas in each node&#39;s chain&#39;s target by downloadin=
g the<br>
2016 headers between those checkpoints (~80kB with the proposed compact<br>
encoding?) and checking the timestamps and proof of work match both the<br>
old target and the new target from adjacent checkpoints.<br>
<br>
(That probably still works fine even if there&#39;s a hardfork that allows<=
br>
difficulty to adjust more frequently: a bits value at block n*2016 will<br>
still enforce *some* lower limit on how much work blocks n*2016+{1..2016}<b=
r>
will have to contribute; so will still allow you to estimate how much work<=
br>
will have been done, it may just be less precise than the estimate you coul=
d<br>
generate now)<br>
<br>
Cheers,<br>
aj<br>
<br>
</blockquote></div><br><br clear=3D"all"><div><br></div></div></div><span c=
lass=3D"">-- <br><div class=3D"m_-4748661211707078546gmail_signature" data-=
smartmail=3D"gmail_signature"><div dir=3D"ltr">Riccardo Casatta - <a href=
=3D"https://twitter.com/RCasatta" target=3D"_blank">@RCasatta</a></div></di=
v>
</span></div></div>
</blockquote></div><br></div>

--089e0825f9b01f8c500568eb1017--