summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJim Posen <jim.posen@gmail.com>2018-04-02 22:34:39 -0700
committerbitcoindev <bitcoindev@gnusha.org>2018-04-03 05:34:42 +0000
commit48bce4cb4ebf403daf34f237ccc7e5c0ef8f1197 (patch)
tree035e6ddeeb02aab6330cde2839f85b67a8590164
parentf340f28218146c4c8d9b1257af798363b056b7c1 (diff)
downloadpi-bitcoindev-48bce4cb4ebf403daf34f237ccc7e5c0ef8f1197.tar.gz
pi-bitcoindev-48bce4cb4ebf403daf34f237ccc7e5c0ef8f1197.zip
Re: [bitcoin-dev] Optimized Header Sync
-rw-r--r--f8/058ba3a4523962643a3cd699bd96d2bc7790c1330
1 files changed, 330 insertions, 0 deletions
diff --git a/f8/058ba3a4523962643a3cd699bd96d2bc7790c1 b/f8/058ba3a4523962643a3cd699bd96d2bc7790c1
new file mode 100644
index 000000000..4e0071d9d
--- /dev/null
+++ b/f8/058ba3a4523962643a3cd699bd96d2bc7790c1
@@ -0,0 +1,330 @@
+Return-Path: <jim.posen@gmail.com>
+Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
+ [172.17.192.35])
+ by mail.linuxfoundation.org (Postfix) with ESMTPS id 7E3061229
+ for <bitcoin-dev@lists.linuxfoundation.org>;
+ Tue, 3 Apr 2018 05:34:42 +0000 (UTC)
+X-Greylist: whitelisted by SQLgrey-1.7.6
+Received: from mail-io0-f175.google.com (mail-io0-f175.google.com
+ [209.85.223.175])
+ by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 77AC5418
+ for <bitcoin-dev@lists.linuxfoundation.org>;
+ Tue, 3 Apr 2018 05:34:41 +0000 (UTC)
+Received: by mail-io0-f175.google.com with SMTP id x77so14210264ioi.2
+ for <bitcoin-dev@lists.linuxfoundation.org>;
+ Mon, 02 Apr 2018 22:34:41 -0700 (PDT)
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
+ h=mime-version:in-reply-to:references:from:date:message-id:subject:to
+ :cc; bh=Fpvv1IJIJQj0Hrd1aSnloxow6WBFnhKaWHKUng2WR1k=;
+ b=ebhppV0qHIOQ8ZlGtG0HaBMToMICdqQcN8RJ8FLk4hQkrpG57LAVU3+0lGb0fVzgnK
+ /qSqR48B7OejiTBtSgp+d6m3orTywAm2md/gJg+5/qqMBBhTPuqGHYsqhb93HOS6aj1t
+ EObTe+P+KGQ/gXWCZ7ymS5y2eqYudqhUNO5s28LZo0EIfK7y0Etx5MKalwMgu8Ax6zxW
+ e/Qo7BKqhHVvDw0VPJXcUEwnoy+OrHrEjHph7xcEgTpA+Wd/kANmFuPoTmloPIcy0DHm
+ OpEMVHPztbJkBnFKGCXWvtWEtdPAt2eEORAfQlrkfp/tGG8oDV35pbUFyndXqPWvN5ah
+ b9Nw==
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
+ d=1e100.net; s=20161025;
+ h=x-gm-message-state:mime-version:in-reply-to:references:from:date
+ :message-id:subject:to:cc;
+ bh=Fpvv1IJIJQj0Hrd1aSnloxow6WBFnhKaWHKUng2WR1k=;
+ b=UBNEi0HaNAzp9h46InbJvQ4xVWrLpBJCThGAUyCZwr+FOpQoCGzRf1E0cWBSAlK6Mb
+ 9uQhs3x/YeeHgKTb7SwwMrlD2sClC13aCZK8/VjUiw7KgT/PpT7uyxJsKr5xNkYLvfWB
+ 5KF8jAOVxpbSMED0xOU5ld40dGBRJyQD0ssZJO+3F/JorpG8Uc6Iu8XTgFbcis/duNod
+ qknmtnqvnviNbPJmSvGp3jsqLYX4htDZmjALdLXW+kgngM+1CTu5RIpg+TMcVQSqLvi/
+ ynOUVvaZIoFepvFCfFSfFg4gAeG9j2B0r/rqsIhfoAVwZj5CW72DKu3H2rvURW71B0N4
+ r73g==
+X-Gm-Message-State: ALQs6tD3UymMPrH/xjIJgJ4yZExG+4PSVI27Z7R7sSSoKWowbek9egng
+ cLzclMRTE+MN/O6gWZ1JHlURuKvRdS8jDr6LSqzeCjjw
+X-Google-Smtp-Source: AIpwx489ml/dFUFxqawGlTy37IbHrgbmdvyN3brpnRcw2L7UhyG847iXMoHNa6WGDJlh1lGN8sZ0nlBoaxb8oqLc3DY=
+X-Received: by 10.107.114.22 with SMTP id n22mr10238168ioc.41.1522733680657;
+ Mon, 02 Apr 2018 22:34:40 -0700 (PDT)
+MIME-Version: 1.0
+Received: by 10.107.52.80 with HTTP; Mon, 2 Apr 2018 22:34:39 -0700 (PDT)
+In-Reply-To: <CADabwBDiT2zNPHZ2=OyvCVrSY3Km2oyTnhRCHyMNsjW2vLMmOg@mail.gmail.com>
+References: <CADZtCSg7+x-sg-ysgacXobRexOVwT+k9fr6a9S-6xU2w8f8m3A@mail.gmail.com>
+ <CADabwBAjTRdVqsL+V=YdQ+kr8+LtSPOeSXUJOzKoPNdKEbAOWQ@mail.gmail.com>
+ <CADZtCSjmQfBZoaO=MCyRoEn-AYe4A=1kDhxSVxVMw+O4k7YJfQ@mail.gmail.com>
+ <20180330061418.GA6017@erisian.com.au>
+ <CADabwBDiT2zNPHZ2=OyvCVrSY3Km2oyTnhRCHyMNsjW2vLMmOg@mail.gmail.com>
+From: Jim Posen <jim.posen@gmail.com>
+Date: Mon, 2 Apr 2018 22:34:39 -0700
+Message-ID: <CADZtCShrTKqR26RMnUAwK32_7XKNvsCWgfYvQ3Bud2J6r54AKg@mail.gmail.com>
+To: Riccardo Casatta <riccardo.casatta@gmail.com>
+Content-Type: multipart/alternative; boundary="089e0825f9b01f8c500568eb1017"
+X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,
+ DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, HTML_MESSAGE,
+ RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
+X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
+ smtp1.linux-foundation.org
+X-Mailman-Approved-At: Tue, 03 Apr 2018 11:54:20 +0000
+Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
+Subject: Re: [bitcoin-dev] Optimized Header Sync
+X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
+X-Mailman-Version: 2.1.12
+Precedence: list
+List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
+List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
+ <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
+List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
+List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
+List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
+List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
+ <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
+X-List-Received-Date: Tue, 03 Apr 2018 05:34:42 -0000
+
+--089e0825f9b01f8c500568eb1017
+Content-Type: text/plain; charset="UTF-8"
+
+Thank you for your feedback AJ and Riccardo.
+
+Nice observation about using nBits from every 2016th block as a short
+specifier of chain work. You can get some savings from the 4 byte nBits
+encoding over VLQ for total chain work as in my spec.
+
+I tried it out on the current chain. At block height 516,387, there are 258
+total checkpoints in the response payload with an interval of 2016. The
+size of the checkpts message is:
+
+- 9,304 bytes using hash + nBits
+- 10,934 bytes using hash + chain work delta encoded as VLQ
+- 11,030 bytes using hash + chain work total encoded as VLQ
+
+The saving from using deltas instead of the total seems negligible to me
+especially considering the additional computation it requires. Going from
+total chain work as VLQ to nBits is a 16% savings in the size of a checkpts
+message. According to some rather rough benchmarks, it takes ~3us to
+generate the message with nBits versus ~105us to generate each message with
+VLQ chain work (including block index lookups and serialization time).
+
+The downside, however, is that the new P2P message would be tightly coupled
+to a specific parameter in Bitcoin's consensus protocol, and one that is
+changed in many alt chains. Also, it would require that checkpoints can
+only be fetched at intervals of 2016, instead of intervals chosen by the
+clients. Being able to specify the interval is a very nice property for
+longer chains, where a client may select really large intervals, then
+bisect that range even further to request a smaller PoW sample (eg. start
+by fetching every 10,000th, then every 100th).
+
+Personally, I strongly think using total chain work instead of nBits is the
+right tradeoff and is worth the extra 1KB. I'm curious to hear others'
+opinions. Note that the checkpoints message is only fetched once per peer
+per download from genesis. Subsequent catchups only fetch checkpoints from
+the locator fork point. I also don't find the caching argument compelling
+-- the time to generate checkpts response messages is fast enough anyway.
+
+I also finally got around to pulling numbers on the space savings from the
+nVersion omission. As a reminder of how this works, three bits in the
+encoding indicator represent a value 1-7 of the distance in block height
+since another block with the same version. Looking at the current Bitcoin
+main chain, this is a table of the occurrences of these values:
+
+Height distance # of Blocks
+1 469537
+2 22301
+3 8833
+4 4368
+5 2633
+6 1630
+7 1114
+ 8+ 5967
+You can read this as "469,537 blocks have the same version as their
+parent", "22,301 have the same version as their parent's parent", etc.
+Given the information in this table, we may consider only allocating 2 bits
+in the encoding header rather than 3.
+
+On Fri, Mar 30, 2018 at 1:06 AM, Riccardo Casatta <
+riccardo.casatta@gmail.com> wrote:
+
+> Yes, I think the checkpoints and the compressed headers streams should be
+> handled in chunks of 2016 headers and queried by chunk number instead of
+> height, falling back to current method if the chunk is not full yet.
+>
+> This is cache friendly and allows to avoid bit 0 and bit 1 in the bitfield
+> (because they are always 1 after the first header in the chunk of 2016).
+>
+> 2018-03-30 8:14 GMT+02:00 Anthony Towns <aj@erisian.com.au>:
+>
+>> On Thu, Mar 29, 2018 at 05:50:30PM -0700, Jim Posen via bitcoin-dev wrote:
+>> > Taken a step further though, I'm really interested in treating the
+>> checkpoints
+>> > as commitments to chain work [...]
+>>
+>> In that case, shouldn't the checkpoints just be every 2016 blocks and
+>> include the corresponding bits value for that set of blocks?
+>>
+>> That way every node commits to (approximately) how much work their entire
+>> chain has by sending something like 10kB of data (currently), and you
+>> could verify the deltas in each node's chain's target by downloading the
+>> 2016 headers between those checkpoints (~80kB with the proposed compact
+>> encoding?) and checking the timestamps and proof of work match both the
+>> old target and the new target from adjacent checkpoints.
+>>
+>> (That probably still works fine even if there's a hardfork that allows
+>> difficulty to adjust more frequently: a bits value at block n*2016 will
+>> still enforce *some* lower limit on how much work blocks n*2016+{1..2016}
+>> will have to contribute; so will still allow you to estimate how much work
+>> will have been done, it may just be less precise than the estimate you
+>> could
+>> generate now)
+>>
+>> Cheers,
+>> aj
+>>
+>>
+>
+>
+> --
+> Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>
+>
+
+--089e0825f9b01f8c500568eb1017
+Content-Type: text/html; charset="UTF-8"
+Content-Transfer-Encoding: quoted-printable
+
+<div dir=3D"ltr">Thank you for your feedback AJ and Riccardo.<div><br></div=
+><div>Nice observation about using nBits from every 2016th block as a short=
+ specifier of chain work. You can get some savings from the 4 byte nBits en=
+coding over VLQ for total chain work as in my spec.</div><br>I tried it out=
+ on the current chain. At block height 516,387, there are 258 total checkpo=
+ints in the response payload with an interval of 2016. The size of the chec=
+kpts message is:<div></div><div><br></div><div>-=C2=A09,304 bytes using has=
+h + nBits</div><div>-=C2=A010,934 bytes using hash + chain work delta encod=
+ed as VLQ</div><div>- 11,030 bytes using hash + chain work total encoded as=
+ VLQ</div><div><br></div><div>The saving from using deltas instead of the t=
+otal seems negligible to me especially considering the additional computati=
+on it requires. Going from total chain work as VLQ to nBits is a 16% saving=
+s in the size of a checkpts message. According to some rather rough benchma=
+rks, it takes ~3us to generate the message with nBits versus ~105us to gene=
+rate each message with VLQ chain work (including block index lookups and se=
+rialization time).</div><div><br></div><div>The downside, however, is that =
+the new P2P message would be tightly coupled to a specific parameter in Bit=
+coin&#39;s consensus protocol, and one that is changed in many alt chains. =
+Also, it would require that checkpoints can only be fetched at intervals of=
+ 2016, instead of intervals chosen by the clients. Being able to specify th=
+e interval is a very nice property for longer chains, where a client may se=
+lect really large intervals, then bisect that range even further to request=
+ a smaller PoW sample (eg. start by fetching every 10,000th, then every 100=
+th).</div><div><br></div><div>Personally, I strongly think using total chai=
+n work instead of nBits is the right tradeoff and is worth the extra 1KB. I=
+&#39;m curious to hear others&#39; opinions.=C2=A0<span style=3D"color:rgb(=
+34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;fo=
+nt-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter=
+-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-=
+space:normal;word-spacing:0px;background-color:rgb(255,255,255);text-decora=
+tion-style:initial;text-decoration-color:initial;float:none;display:inline"=
+>Note that the checkpoints message is only fetched once per peer per downlo=
+ad from genesis. Subsequent catchups only fetch checkpoints from the locato=
+r fork point. I also don&#39;t find the caching argument compelling -- the =
+time to generate checkpts response messages is fast enough anyway.</span></=
+div><div><span style=3D"color:rgb(34,34,34);font-family:arial,sans-serif;fo=
+nt-size:small;font-style:normal;font-variant-ligatures:normal;font-variant-=
+caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-ind=
+ent:0px;text-transform:none;white-space:normal;word-spacing:0px;background-=
+color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:=
+initial;float:none;display:inline"><br></span></div><div>I also finally got=
+ around to pulling numbers on the space savings from the nVersion omission.=
+ As a reminder of how this works, three bits in the encoding indicator repr=
+esent a value 1-7 of the distance in block height since another block with =
+the same version. Looking at the current Bitcoin main chain, this is a tabl=
+e of the occurrences of these values:</div><div><br><table cellspacing=3D"0=
+" cellpadding=3D"0" dir=3D"ltr" border=3D"1" style=3D"table-layout:fixed;fo=
+nt-size:10pt;font-family:arial,sans,sans-serif;width:0px;border-collapse:co=
+llapse;border:none"><colgroup><col width=3D"100"><col width=3D"100"></colgr=
+oup><tbody><tr style=3D"height:21px"><td style=3D"overflow:hidden;padding:2=
+px 3px;vertical-align:bottom;border:1px solid rgb(204,204,204)">Height dist=
+ance</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom=
+;border:1px solid rgb(204,204,204)"># of Blocks</td></tr><tr style=3D"heigh=
+t:21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;=
+text-align:right;border:1px solid rgb(204,204,204)">1</td><td style=3D"over=
+flow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1=
+px solid rgb(204,204,204)">469537</td></tr><tr style=3D"height:21px"><td st=
+yle=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:rig=
+ht;border:1px solid rgb(204,204,204)">2</td><td style=3D"overflow:hidden;pa=
+dding:2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(2=
+04,204,204)">22301</td></tr><tr style=3D"height:21px"><td style=3D"overflow=
+:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1px s=
+olid rgb(204,204,204)">3</td><td style=3D"overflow:hidden;padding:2px 3px;v=
+ertical-align:bottom;text-align:right;border:1px solid rgb(204,204,204)">88=
+33</td></tr><tr style=3D"height:21px"><td style=3D"overflow:hidden;padding:=
+2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(204,204=
+,204)">4</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bo=
+ttom;text-align:right;border:1px solid rgb(204,204,204)">4368</td></tr><tr =
+style=3D"height:21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical=
+-align:bottom;text-align:right;border:1px solid rgb(204,204,204)">5</td><td=
+ style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:=
+right;border:1px solid rgb(204,204,204)">2633</td></tr><tr style=3D"height:=
+21px"><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;te=
+xt-align:right;border:1px solid rgb(204,204,204)">6</td><td style=3D"overfl=
+ow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;border:1px=
+ solid rgb(204,204,204)">1630</td></tr><tr style=3D"height:21px"><td style=
+=3D"overflow:hidden;padding:2px 3px;vertical-align:bottom;text-align:right;=
+border:1px solid rgb(204,204,204)">7</td><td style=3D"overflow:hidden;paddi=
+ng:2px 3px;vertical-align:bottom;text-align:right;border:1px solid rgb(204,=
+204,204)">1114</td></tr><tr style=3D"height:21px"><td style=3D"overflow:hid=
+den;padding:2px 3px;vertical-align:bottom;border:1px solid rgb(204,204,204)=
+">=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
+=A08+</td><td style=3D"overflow:hidden;padding:2px 3px;vertical-align:botto=
+m;text-align:right;border:1px solid rgb(204,204,204)">5967</td></tr></tbody=
+></table><br></div><div>You can read this as &quot;469,537 blocks have the =
+same version as their parent&quot;, &quot;22,301 have the same version as t=
+heir parent&#39;s parent&quot;, etc. Given the information in this table, w=
+e may consider only allocating 2 bits in the encoding header rather than 3.=
+</div></div><div class=3D"gmail_extra"><br><div class=3D"gmail_quote">On Fr=
+i, Mar 30, 2018 at 1:06 AM, Riccardo Casatta <span dir=3D"ltr">&lt;<a href=
+=3D"mailto:riccardo.casatta@gmail.com" target=3D"_blank">riccardo.casatta@g=
+mail.com</a>&gt;</span> wrote:<br><blockquote class=3D"gmail_quote" style=
+=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=
+=3D"ltr"><div>Yes, I think the checkpoints and the compressed headers strea=
+ms should be handled in chunks of 2016 headers and queried by chunk number =
+instead of height, falling back to current method if the chunk is not full =
+yet.</div><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra">T=
+his is cache friendly and allows to avoid bit 0 and bit 1 in the bitfield (=
+because they are always 1 after the first header in the chunk of 2016).</di=
+v><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extra"><div><div=
+ class=3D"h5"><div class=3D"gmail_quote">2018-03-30 8:14 GMT+02:00 Anthony =
+Towns <span dir=3D"ltr">&lt;<a href=3D"mailto:aj@erisian.com.au" target=3D"=
+_blank">aj@erisian.com.au</a>&gt;</span>:<br><blockquote class=3D"gmail_quo=
+te" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"=
+><span>On Thu, Mar 29, 2018 at 05:50:30PM -0700, Jim Posen via bitcoin-dev =
+wrote:<br>
+&gt; Taken a step further though, I&#39;m really interested in treating the=
+ checkpoints<br>
+</span>&gt; as commitments to chain work [...]<br>
+<br>
+In that case, shouldn&#39;t the checkpoints just be every 2016 blocks and<b=
+r>
+include the corresponding bits value for that set of blocks?<br>
+<br>
+That way every node commits to (approximately) how much work their entire<b=
+r>
+chain has by sending something like 10kB of data (currently), and you<br>
+could verify the deltas in each node&#39;s chain&#39;s target by downloadin=
+g the<br>
+2016 headers between those checkpoints (~80kB with the proposed compact<br>
+encoding?) and checking the timestamps and proof of work match both the<br>
+old target and the new target from adjacent checkpoints.<br>
+<br>
+(That probably still works fine even if there&#39;s a hardfork that allows<=
+br>
+difficulty to adjust more frequently: a bits value at block n*2016 will<br>
+still enforce *some* lower limit on how much work blocks n*2016+{1..2016}<b=
+r>
+will have to contribute; so will still allow you to estimate how much work<=
+br>
+will have been done, it may just be less precise than the estimate you coul=
+d<br>
+generate now)<br>
+<br>
+Cheers,<br>
+aj<br>
+<br>
+</blockquote></div><br><br clear=3D"all"><div><br></div></div></div><span c=
+lass=3D"">-- <br><div class=3D"m_-4748661211707078546gmail_signature" data-=
+smartmail=3D"gmail_signature"><div dir=3D"ltr">Riccardo Casatta - <a href=
+=3D"https://twitter.com/RCasatta" target=3D"_blank">@RCasatta</a></div></di=
+v>
+</span></div></div>
+</blockquote></div><br></div>
+
+--089e0825f9b01f8c500568eb1017--
+