From: Jim Posen
Date: Fri, 15 Dec 2017 15:55:16 -0800
To: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: [bitcoin-dev] Parallel header download during sync

One of the ideas that Greg Maxwell brought up in the "'Compressed' headers stream" thread is the possibility of a header sync mechanism that allows parallel download from multiple peers. With the current getheaders/headers semantics, headers must be downloaded sequentially from genesis. In my testing, I saw that syncing headers directly from a colocated node took <5s whereas syncing normally from network peers takes ~5 min for me, which goes to show that 5s is an upper bound on the time to process all headers if they are locally available. So if we can introduce new p2p messages for header sync, what would they look like? Here's one idea.

A new getheadersv2 request would include a start height for the range of headers requested and a commitment to the last block in the chain that you want to download. Then you find N peers that are all on the same chain, partition the range of headers from 0 to the chain height minus some reasonable reorg safety buffer (~6 blocks), and send download requests in parallel.
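As a rough sketch of the partitioning step (function and parameter names here are illustrative, not part of the proposal):

```python
def partition_header_ranges(chain_height, n_peers, reorg_buffer=6):
    """Split heights [0, chain_height - reorg_buffer] into contiguous,
    non-overlapping per-peer download ranges (inclusive endpoints)."""
    end = chain_height - reorg_buffer
    total = end + 1
    per_peer = (total + n_peers - 1) // n_peers  # ceiling division
    ranges = []
    start = 0
    for _ in range(n_peers):
        if start > end:
            break
        stop = min(start + per_peer - 1, end)
        ranges.append((start, stop))
        start = stop + 1
    return ranges

# e.g. a ~500,000-block chain split across 4 peers:
# partition_header_ranges(500_000, 4) covers 0..499_994 with no gaps
```

Each range would then go out as one getheadersv2 request to one of the chosen peers, with responses buffered and connected in height order as they arrive.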
So how do we know that the peers are on the same chain and that the headers they serve connect into this chain?

When you connect to outbound peers while in IBD, you query them for a Merkle Mountain Range commitment to all headers up to a height X (roughly 6 blocks before their start height from the version message). Then you choose the commitment that the majority of the queried peers sent (or some other heuristic), and these become your download peers. Every getheadersv2 request includes the start height, X, and the chain commitment. The headersv2 response messages include all of the headers followed by a merkle branch linking the last header into the chain commitment. Headers are processed in order as they arrive, and if any of the headers are invalid, you can ban/disconnect all peers that committed to it, drop the buffer of later headers, and start over.

That's the basic idea. Here are other details:

- This would require an additional 32-byte MMR commitment for each header in memory.
- When a node receives a headersv2 request and constructs a merkle proof for the last header, it checks against the sent commitment. In the case of a really deep reorg, that check would fail, and the node can instead respond with an updated commitment hash for that height.
- Another packet is needed, getheaderchain or something, that a syncing peer first sends along with a header locator and an end height. The peer responds with headerchain, which includes the last common header from the locator along with the chain commitment at that height and a merkle branch proving inclusion of that header in the chain.
- Nodes would cache chain commitments for the last ~20 blocks (somewhat arbitrary), and refuse to serve chain commitments for heights before that.

Thoughts? This is a pretty recycled idea, so please point me at prior proposals that are similar as well.

-jimpo
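To make the commitment-selection and proof-checking steps above concrete, here is a minimal sketch. It assumes a plain binary merkle branch and double-SHA256 for simplicity; an actual MMR inclusion proof has a different shape, and all names are hypothetical:

```python
import hashlib
from collections import Counter

def h(data: bytes) -> bytes:
    """Double-SHA256, as used for Bitcoin hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_branch(leaf: bytes, branch, index: int, root: bytes) -> bool:
    """Check a merkle branch linking `leaf` at position `index` to `root`.
    Generic binary-tree version; an MMR proof would differ in detail."""
    node = h(leaf)
    for sibling in branch:
        if index & 1:
            node = h(sibling + node)  # leaf is a right child at this level
        else:
            node = h(node + sibling)  # leaf is a left child at this level
        index >>= 1
    return node == root

def choose_download_peers(commitments):
    """Pick the chain commitment reported by the most queried peers
    (the majority heuristic from the text) and the peers that sent it."""
    root, _ = Counter(commitments.values()).most_common(1)[0]
    return root, [peer for peer, c in commitments.items() if c == root]
```

The syncing node would run choose_download_peers over the per-peer MMR roots at height X, then call verify_branch on the last header of each headersv2 response against the agreed commitment before accepting that batch.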