diff options
author | Jim Posen <jim.posen@gmail.com> | 2018-04-02 22:34:39 -0700 |
---|---|---|
committer | bitcoindev <bitcoindev@gnusha.org> | 2018-04-03 05:34:42 +0000 |
commit | 48bce4cb4ebf403daf34f237ccc7e5c0ef8f1197 (patch) | |
tree | 035e6ddeeb02aab6330cde2839f85b67a8590164 | |
parent | f340f28218146c4c8d9b1257af798363b056b7c1 (diff) | |
download | pi-bitcoindev-48bce4cb4ebf403daf34f237ccc7e5c0ef8f1197.tar.gz pi-bitcoindev-48bce4cb4ebf403daf34f237ccc7e5c0ef8f1197.zip |
Re: [bitcoin-dev] Optimized Header Sync
-rw-r--r-- | f8/058ba3a4523962643a3cd699bd96d2bc7790c1 | 330 |
1 files changed, 330 insertions, 0 deletions
diff --git a/f8/058ba3a4523962643a3cd699bd96d2bc7790c1 b/f8/058ba3a4523962643a3cd699bd96d2bc7790c1 new file mode 100644 index 000000000..4e0071d9d --- /dev/null +++ b/f8/058ba3a4523962643a3cd699bd96d2bc7790c1 @@ -0,0 +1,330 @@ +Return-Path: <jim.posen@gmail.com> +Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org + [172.17.192.35]) + by mail.linuxfoundation.org (Postfix) with ESMTPS id 7E3061229 + for <bitcoin-dev@lists.linuxfoundation.org>; + Tue, 3 Apr 2018 05:34:42 +0000 (UTC) +X-Greylist: whitelisted by SQLgrey-1.7.6 +Received: from mail-io0-f175.google.com (mail-io0-f175.google.com + [209.85.223.175]) + by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 77AC5418 + for <bitcoin-dev@lists.linuxfoundation.org>; + Tue, 3 Apr 2018 05:34:41 +0000 (UTC) +Received: by mail-io0-f175.google.com with SMTP id x77so14210264ioi.2 + for <bitcoin-dev@lists.linuxfoundation.org>; + Mon, 02 Apr 2018 22:34:41 -0700 (PDT) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; + h=mime-version:in-reply-to:references:from:date:message-id:subject:to + :cc; bh=Fpvv1IJIJQj0Hrd1aSnloxow6WBFnhKaWHKUng2WR1k=; + b=ebhppV0qHIOQ8ZlGtG0HaBMToMICdqQcN8RJ8FLk4hQkrpG57LAVU3+0lGb0fVzgnK + /qSqR48B7OejiTBtSgp+d6m3orTywAm2md/gJg+5/qqMBBhTPuqGHYsqhb93HOS6aj1t + EObTe+P+KGQ/gXWCZ7ymS5y2eqYudqhUNO5s28LZo0EIfK7y0Etx5MKalwMgu8Ax6zxW + e/Qo7BKqhHVvDw0VPJXcUEwnoy+OrHrEjHph7xcEgTpA+Wd/kANmFuPoTmloPIcy0DHm + OpEMVHPztbJkBnFKGCXWvtWEtdPAt2eEORAfQlrkfp/tGG8oDV35pbUFyndXqPWvN5ah + b9Nw== +X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=1e100.net; s=20161025; + h=x-gm-message-state:mime-version:in-reply-to:references:from:date + :message-id:subject:to:cc; + bh=Fpvv1IJIJQj0Hrd1aSnloxow6WBFnhKaWHKUng2WR1k=; + b=UBNEi0HaNAzp9h46InbJvQ4xVWrLpBJCThGAUyCZwr+FOpQoCGzRf1E0cWBSAlK6Mb + 9uQhs3x/YeeHgKTb7SwwMrlD2sClC13aCZK8/VjUiw7KgT/PpT7uyxJsKr5xNkYLvfWB + 5KF8jAOVxpbSMED0xOU5ld40dGBRJyQD0ssZJO+3F/JorpG8Uc6Iu8XTgFbcis/duNod + qknmtnqvnviNbPJmSvGp3jsqLYX4htDZmjALdLXW+kgngM+1CTu5RIpg+TMcVQSqLvi/ + ynOUVvaZIoFepvFCfFSfFg4gAeG9j2B0r/rqsIhfoAVwZj5CW72DKu3H2rvURW71B0N4 + r73g== +X-Gm-Message-State: ALQs6tD3UymMPrH/xjIJgJ4yZExG+4PSVI27Z7R7sSSoKWowbek9egng + cLzclMRTE+MN/O6gWZ1JHlURuKvRdS8jDr6LSqzeCjjw +X-Google-Smtp-Source: AIpwx489ml/dFUFxqawGlTy37IbHrgbmdvyN3brpnRcw2L7UhyG847iXMoHNa6WGDJlh1lGN8sZ0nlBoaxb8oqLc3DY= +X-Received: by 10.107.114.22 with SMTP id n22mr10238168ioc.41.1522733680657; + Mon, 02 Apr 2018 22:34:40 -0700 (PDT) +MIME-Version: 1.0 +Received: by 10.107.52.80 with HTTP; Mon, 2 Apr 2018 22:34:39 -0700 (PDT) +In-Reply-To: <CADabwBDiT2zNPHZ2=OyvCVrSY3Km2oyTnhRCHyMNsjW2vLMmOg@mail.gmail.com> +References: <CADZtCSg7+x-sg-ysgacXobRexOVwT+k9fr6a9S-6xU2w8f8m3A@mail.gmail.com> + <CADabwBAjTRdVqsL+V=YdQ+kr8+LtSPOeSXUJOzKoPNdKEbAOWQ@mail.gmail.com> + <CADZtCSjmQfBZoaO=MCyRoEn-AYe4A=1kDhxSVxVMw+O4k7YJfQ@mail.gmail.com> + <20180330061418.GA6017@erisian.com.au> + <CADabwBDiT2zNPHZ2=OyvCVrSY3Km2oyTnhRCHyMNsjW2vLMmOg@mail.gmail.com> +From: Jim Posen <jim.posen@gmail.com> +Date: Mon, 2 Apr 2018 22:34:39 -0700 +Message-ID: <CADZtCShrTKqR26RMnUAwK32_7XKNvsCWgfYvQ3Bud2J6r54AKg@mail.gmail.com> +To: Riccardo Casatta <riccardo.casatta@gmail.com> +Content-Type: multipart/alternative; boundary="089e0825f9b01f8c500568eb1017" +X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, + DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, HTML_MESSAGE, + RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 +X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on + smtp1.linux-foundation.org +X-Mailman-Approved-At: Tue, 03 Apr 2018 11:54:20 +0000 +Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org> +Subject: Re: [bitcoin-dev] Optimized Header Sync +X-BeenThere: bitcoin-dev@lists.linuxfoundation.org +X-Mailman-Version: 2.1.12 +Precedence: list +List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org> +List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>, + <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe> +List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/> +List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org> +List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help> +List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>, + <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe> +X-List-Received-Date: Tue, 03 Apr 2018 05:34:42 -0000 + +--089e0825f9b01f8c500568eb1017 +Content-Type: text/plain; charset="UTF-8" + +Thank you for your feedback AJ and Riccardo. + +Nice observation about using nBits from every 2016th block as a short +specifier of chain work. You can get some savings from the 4 byte nBits +encoding over VLQ for total chain work as in my spec. + +I tried it out on the current chain. At block height 516,387, there are 258 +total checkpoints in the response payload with an interval of 2016. The +size of the checkpts message is: + +- 9,304 bytes using hash + nBits +- 10,934 bytes using hash + chain work delta encoded as VLQ +- 11,030 bytes using hash + chain work total encoded as VLQ + +The saving from using deltas instead of the total seems negligible to me +especially considering the additional computation it requires. Going from +total chain work as VLQ to nBits is a 16% savings in the size of a checkpts +message. According to some rather rough benchmarks, it takes ~3us to +generate the message with nBits versus ~105us to generate each message with +VLQ chain work (including block index lookups and serialization time). + +The downside, however, is that the new P2P message would be tightly coupled +to a specific parameter in Bitcoin's consensus protocol, and one that is +changed in many alt chains. Also, it would require that checkpoints can +only be fetched at intervals of 2016, instead of intervals chosen by the +clients. Being able to specify the interval is a very nice property for +longer chains, where a client may select really large intervals, then +bisect that range even further to request a smaller PoW sample (eg. start +by fetching every 10,000th, then every 100th). + +Personally, I strongly think using total chain work instead of nBits is the +right tradeoff and is worth the extra 1KB. I'm curious to hear others' +opinions. Note that the checkpoints message is only fetched once per peer +per download from genesis. Subsequent catchups only fetch checkpoints from +the locator fork point. I also don't find the caching argument compelling +-- the time to generate checkpts response messages is fast enough anyway. + +I also finally got around to pulling numbers on the space savings from the +nVersion omission. As a reminder of how this works, three bits in the +encoding indicator represent a value 1-7 of the distance in block height +since another block with the same version. Looking at the current Bitcoin +main chain, this is a table of the occurrences of these values: + +Height distance # of Blocks +1 469537 +2 22301 +3 8833 +4 4368 +5 2633 +6 1630 +7 1114 + 8+ 5967 +You can read this as "469,537 blocks have the same version as their +parent", "22,301 have the same version as their parent's parent", etc. +Given the information in this table, we may consider only allocating 2 bits +in the encoding header rather than 3. + +On Fri, Mar 30, 2018 at 1:06 AM, Riccardo Casatta < +riccardo.casatta@gmail.com> wrote: + +> Yes, I think the checkpoints and the compressed headers streams should be +> handled in chunks of 2016 headers and queried by chunk number instead of +> height, falling back to current method if the chunk is not full yet. +> +> This is cache friendly and allows to avoid bit 0 and bit 1 in the bitfield +> (because they are always 1 after the first header in the chunk of 2016). +> +> 2018-03-30 8:14 GMT+02:00 Anthony Towns <aj@erisian.com.au>: +> +>> On Thu, Mar 29, 2018 at 05:50:30PM -0700, Jim Posen via bitcoin-dev wrote: +>> > Taken a step further though, I'm really interested in treating the +>> checkpoints +>> > as commitments to chain work [...] +>> +>> In that case, shouldn't the checkpoints just be every 2016 blocks and +>> include the corresponding bits value for that set of blocks? +>> +>> That way every node commits to (approximately) how much work their entire +>> chain has by sending something like 10kB of data (currently), and you +>> could verify the deltas in each node's chain's target by downloading the +>> 2016 headers between those checkpoints (~80kB with the proposed compact +>> encoding?) and checking the timestamps and proof of work match both the +>> old target and the new target from adjacent checkpoints. +>> +>> (That probably still works fine even if there's a hardfork that allows +>> difficulty to adjust more frequently: a bits value at block n*2016 will +>> still enforce *some* lower limit on how much work blocks n*2016+{1..2016} +>> will have to contribute; so will still allow you to estimate how much work +>> will have been done, it may just be less precise than the estimate you +>> could +>> generate now) +>> +>> Cheers, +>> aj +>> +>> +> +> +> -- +> Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta> +> + +--089e0825f9b01f8c500568eb1017 +Content-Type: text/html; charset="UTF-8" +Content-Transfer-Encoding: quoted-printable + +<div dir=3D"ltr">Thank you for your feedback AJ and Riccardo.<div><br></div= +><div>Nice observation about using nBits from every 2016th block as a short= + specifier of chain work. You can get some savings from the 4 byte nBits en= +coding over VLQ for total chain work as in my spec.</div><br>I tried it out= + on the current chain. At block height 516,387, there are 258 total checkpo= +ints in the response payload with an interval of 2016. The size of the chec= +kpts message is:<div></div><div><br></div><div>-=C2=A09,304 bytes using has= +h + nBits</div><div>-=C2=A010,934 bytes using hash + chain work delta encod= +ed as VLQ</div><div>- 11,030 bytes using hash + chain work total encoded as= + VLQ</div><div><br></div><div>The saving from using deltas instead of the t= +otal seems negligible to me especially considering the additional computati= +on it requires. Going from total chain work as VLQ to nBits is a 16% saving= +s in the size of a checkpts message. According to some rather rough benchma= +rks, it takes ~3us to generate the message with nBits versus ~105us to gene= +rate each message with VLQ chain work (including block index lookups and se= +rialization time).</div><div><br></div><div>The downside, however, is that = +the new P2P message would be tightly coupled to a specific parameter in Bit= +coin's consensus protocol, and one that is changed in many alt chains. = +Also, it would require that checkpoints can only be fetched at intervals of= + 2016, instead of intervals chosen by the clients. Being able to specify th= +e interval is a very nice property for longer chains, where a client may se= +lect really large intervals, then bisect that range even further to request= + a smaller PoW sample (eg. start by fetching every 10,000th, then every 100= +th).</div><div><br></div><div>Personally, I strongly think using total chai= +n work instead of nBits is the right tradeoff and is worth the extra 1KB. I= +'m curious to hear others' opinions.=C2=A0<span style=3D"color:rgb(= +34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;fo= +nt-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter= +-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-= +space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decora= +tion-style:initial;text-decoration-color:initial;float:none;display:inline"= +>Note that the checkpoints message is only fetched once per peer per downlo= +ad from genesis. Subsequent catchups only fetch checkpoints from the locato= +r fork point. I also don't find the caching argument compelling -- the = +time to generate checkpts response messages is fast enough anyway.</span></= +div><div><span style=3D"color:rgb(34,34,34);font-family:arial,sans-serif;fo= +nt-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-= +caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-ind= +ent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-= +color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:= +initial;float:none;display:inline"><br></span></div><div>I also finally got= + around to pulling numbers on the space savings from the nVersion omission.= + As a reminder of how this works, three bits in the encoding indicator repr= +esent a value 1-7 of the distance in block height since another block with = +the same version. Looking at the current Bitcoin main chain, this is a tabl= +e of the occurrences of these values:</div><div><br><table cellspacing=3D"0= +" cellpadding=3D"0" dir=3D"ltr" border=3D"1" style=3D"table-layout:fixed;fo= +nt-size:10pt;font-family:arial,sans,sans-serif;width:0px;border-collapse:co= +llapse;border:none"><colgroup><col width=3D"100"><col width=3D"100"></colgr= +oup><tbody><tr style=3D"height:21px"><td style=3D"overflow:hidden;padding:2= +px 3px;vertical-align:bottom;border:1px solid rgb(204,204,204)">Height dist= +ance</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom= +;border:1px solid rgb(204,204,204)"># of Blocks</td></tr><tr style=3D"heigh= +t:21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;= +text-align:right;border:1px solid rgb(204,204,204)">1</td><td style=3D"over= +flow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1= +px solid rgb(204,204,204)">469537</td></tr><tr style=3D"height:21px"><td st= +yle=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:rig= +ht;border:1px solid rgb(204,204,204)">2</td><td style=3D"overflow:hidden;pa= +dding:2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(2= +04,204,204)">22301</td></tr><tr style=3D"height:21px"><td style=3D"overflow= +:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1px s= +olid rgb(204,204,204)">3</td><td style=3D"overflow:hidden;padding:2px 3px;v= +ertical-align:bottom;text-align:right;border:1px solid rgb(204,204,204)">88= +33</td></tr><tr style=3D"height:21px"><td style=3D"overflow:hidden;padding:= +2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(204,204= +,204)">4</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bo= +ttom;text-align:right;border:1px solid rgb(204,204,204)">4368</td></tr><tr = +style=3D"height:21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical= +-align:bottom;text-align:right;border:1px solid rgb(204,204,204)">5</td><td= + style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:= +right;border:1px solid rgb(204,204,204)">2633</td></tr><tr style=3D"height:= +21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;te= +xt-align:right;border:1px solid rgb(204,204,204)">6</td><td style=3D"overfl= +ow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1px= + solid rgb(204,204,204)">1630</td></tr><tr style=3D"height:21px"><td style= +=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;= +border:1px solid rgb(204,204,204)">7</td><td style=3D"overflow:hidden;paddi= +ng:2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(204,= +204,204)">1114</td></tr><tr style=3D"height:21px"><td style=3D"overflow:hid= +den;padding:2px 3px;vertical-align:bottom;border:1px solid rgb(204,204,204)= +">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= +=A08+</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:botto= +m;text-align:right;border:1px solid rgb(204,204,204)">5967</td></tr></tbody= +></table><br></div><div>You can read this as "469,537 blocks have the = +same version as their parent", "22,301 have the same version as t= +heir parent's parent", etc. Given the information in this table, w= +e may consider only allocating 2 bits in the encoding header rather than 3.= +</div></div><div class=3D"gmail_extra"><br><div class=3D"gmail_quote">On Fr= +i, Mar 30, 2018 at 1:06 AM, Riccardo Casatta <span dir=3D"ltr"><<a href= +=3D"mailto:riccardo.casatta@gmail.com" target=3D"_blank">riccardo.casatta@g= +mail.com</a>></span> wrote:<br><blockquote class=3D"gmail_quote" style= +=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir= +=3D"ltr"><div>Yes, I think the checkpoints and the compressed headers strea= +ms should be handled in chunks of 2016 headers and queried by chunk number = +instead of height, falling back to current method if the chunk is not full = +yet.</div><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra">T= +his is cache friendly and allows to avoid bit 0 and bit 1 in the bitfield (= +because they are always 1 after the first header in the chunk of 2016).</di= +v><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra"><div><div= + class=3D"h5"><div class=3D"gmail_quote">2018-03-30 8:14 GMT+02:00 Anthony = +Towns <span dir=3D"ltr"><<a href=3D"mailto:aj@erisian.com.au" target=3D"= +_blank">aj@erisian.com.au</a>></span>:<br><blockquote class=3D"gmail_quo= +te" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"= +><span>On Thu, Mar 29, 2018 at 05:50:30PM -0700, Jim Posen via bitcoin-dev = +wrote:<br> +> Taken a step further though, I'm really interested in treating the= + checkpoints<br> +</span>> as commitments to chain work [...]<br> +<br> +In that case, shouldn't the checkpoints just be every 2016 blocks and<b= +r> +include the corresponding bits value for that set of blocks?<br> +<br> +That way every node commits to (approximately) how much work their entire<b= +r> +chain has by sending something like 10kB of data (currently), and you<br> +could verify the deltas in each node's chain's target by downloadin= +g the<br> +2016 headers between those checkpoints (~80kB with the proposed compact<br> +encoding?) and checking the timestamps and proof of work match both the<br> +old target and the new target from adjacent checkpoints.<br> +<br> +(That probably still works fine even if there's a hardfork that allows<= +br> +difficulty to adjust more frequently: a bits value at block n*2016 will<br> +still enforce *some* lower limit on how much work blocks n*2016+{1..2016}<b= +r> +will have to contribute; so will still allow you to estimate how much work<= +br> +will have been done, it may just be less precise than the estimate you coul= +d<br> +generate now)<br> +<br> +Cheers,<br> +aj<br> +<br> +</blockquote></div><br><br clear=3D"all"><div><br></div></div></div><span c= +lass=3D"">-- <br><div class=3D"m_-4748661211707078546gmail_signature" data-= +smartmail=3D"gmail_signature"><div dir=3D"ltr">Riccardo Casatta - <a href= +=3D"https://twitter.com/RCasatta" target=3D"_blank">@RCasatta</a></div></di= +v> +</span></div></div> +</blockquote></div><br></div> + +--089e0825f9b01f8c500568eb1017-- + |