From: Jim Posen <jim.posen@gmail.com>
Date: Fri, 15 Dec 2017 15:55:16 -0800
Message-ID: <CADZtCSjR8Cd30Ag__+Rr2FV8rdbYY4_bUgGKeEgpoyTjxzM=dw@mail.gmail.com>
To: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: [bitcoin-dev] Parallel header download during sync

One of the ideas that Greg Maxwell brought up in the "'Compressed' headers
stream" thread is the possibility of a header sync mechanism that would allow
parallel download from multiple peers. With the current getheaders/headers
semantics, headers must be downloaded sequentially from genesis. In my
testing, syncing headers directly from a colocated node took <5s, whereas
syncing normally from network peers takes ~5 min for me, which shows that 5s
is roughly an upper bound on the time to process all headers when they are
locally available. So if we can introduce new p2p messages for header sync,
what would they look like? Here's one idea.

A new getheadersv2 request would include a start height for the range of
headers requested and a commitment to the last block in the chain that you
want to download. Then you find N peers that are all on the same chain,
partition the range of headers from 0 to the chain height minus some
reasonable reorg safety buffer (~6 blocks), and send download requests in
parallel. So how do we know that the peers are on the same chain and that
the headers they serve connect into this chain?
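
Before getting into how the chain commitment works, here is a rough sketch
in Python of the partitioning step. The names and numbers (REORG_BUFFER,
batch_size, partition_header_ranges) are my own illustrations, not part of
any concrete spec:

REORG_BUFFER = 6  # assumed reorg safety buffer from the paragraph above

def partition_header_ranges(chain_height, num_peers, batch_size=2000):
    """Split heights 0..chain_height - REORG_BUFFER into contiguous batches
    and deal them out round-robin so requests can be sent in parallel."""
    end = chain_height - REORG_BUFFER
    ranges = []
    start = 0
    while start <= end:
        ranges.append((start, min(start + batch_size - 1, end)))
        start += batch_size
    # peer i gets every num_peers-th batch
    return {peer: ranges[peer::num_peers] for peer in range(num_peers)}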

When you connect to outbound peers and are in IBD, you will query them for
a Merkle Mountain Range commitment to all headers up to a height X (which
is 6ish blocks before their start height from the version message). Then
you choose the commitment that the majority of the queried peers sent (or
some other heuristic), and these become your download peers. Every
getheadersv2 request includes the start height, X, and the chain
commitment. The headersv2 response messages include all of the headers
followed by a merkle branch linking the last header into the chain
commitment. Headers are processed in order as they arrive, and if any of
the headers are invalid, you can ban/disconnect all peers that committed to
that chain, drop the buffer of later headers, and start over.
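
The proposal does not pin down the exact MMR construction or hash ordering,
but the per-batch check is an ordinary merkle branch verification of the
last header in the batch against the agreed commitment. A minimal sketch,
assuming double-SHA256 nodes and a branch given as
(sibling_hash, sibling_is_right) pairs:

import hashlib

def sha256d(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_branch(leaf_hash, branch, commitment_root):
    """Walk the branch from the leaf up to the root and compare the result
    to the chain commitment the download peers agreed on."""
    h = leaf_hash
    for sibling, sibling_is_right in branch:
        h = sha256d(h + sibling) if sibling_is_right else sha256d(sibling + h)
    return h == commitment_root

If verify_branch fails for a batch, or any header inside the batch is
invalid, the later batches are dropped and the sync restarts, as described
above.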

That's the basic idea. Here are other details:

- This would require an additional 32-byte MMR commitment for each header
in memory.
- When a node receives a getheadersv2 request and constructs a merkle proof
for the last header, it checks the proof against the commitment sent in the
request. In the case of a really deep reorg, that check would fail, and the
node can instead respond with an updated commitment hash for that height.
- Another message is needed, getheaderchain or something, which a syncing
peer sends first along with a header locator and an end height. The peer
responds with headerchain, which includes the last common header from the
locator along with the chain commitment at that height and a merkle branch
proving inclusion of that header in the chain (a rough sketch of these
message shapes follows this list).
- Nodes would cache chain commitments for the last ~20 blocks (somewhat
arbitrary), and refuse to serve chain commitments for heights before that.
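
To make the message shapes above a bit more concrete (as referenced in the
getheaderchain bullet), here is a rough sketch of the four messages. Field
names and types are my own guesses for illustration, not a serialization
spec:

from dataclasses import dataclass
from typing import List

@dataclass
class GetHeaderChain:            # sent first by the syncing peer
    locator: List[bytes]         # block locator hashes, as in getheaders today
    end_height: int              # height whose chain commitment is requested

@dataclass
class HeaderChain:               # response to getheaderchain
    common_header: bytes         # last common header from the locator
    chain_commitment: bytes      # 32-byte MMR root at end_height
    branch: List[bytes]          # merkle branch proving that header's inclusion

@dataclass
class GetHeadersV2:              # one per partitioned range, sent in parallel
    start_height: int
    count: int                   # assumed; some way to bound the range is needed
    commitment_height: int       # the height X agreed on during setup
    chain_commitment: bytes      # the commitment the download peers agreed on

@dataclass
class HeadersV2:                 # response to getheadersv2
    headers: List[bytes]         # 80-byte serialized headers, in order
    branch: List[bytes]          # branch linking the last header into the commitment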

Thoughts? This is a pretty recycled idea, so please point me at prior
proposals that are similar as well.

-jimpo
