From: Jim Posen
Date: Mon, 2 Apr 2018 22:34:39 -0700
To: Riccardo Casatta
Cc: Bitcoin Protocol Discussion
Subject: Re: [bitcoin-dev] Optimized Header Sync
References: <20180330061418.GA6017@erisian.com.au>

Thank you for your feedback AJ and Riccardo.

Nice observation about using nBits from every 2016th block as a short specifier of chain work. You can get some savings from the 4 byte nBits encoding over VLQ for total chain work as in my spec.

I tried it out on the current chain. At block height 516,387, there are 258 total checkpoints in the response payload with an interval of 2016. The size of the checkpts message is:

- 9,304 bytes using hash + nBits
- 10,934 bytes using hash + chain work delta encoded as VLQ
- 11,030 bytes using hash + chain work total encoded as VLQ

The saving from using deltas instead of the total seems negligible to me especially considering the additional computation it requires. Going from total chain work as VLQ to nBits is a 16% savings in the size of a checkpts message.
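As a rough check on the figures above, the following sketch recomputes the relative savings from the three reported sizes and shows one generic way a VLQ could serialize a chain work value. The `vlq_encode` helper is a hypothetical illustration (big-endian, 7 payload bits per byte); the exact byte layout in the spec may differ.

```python
# Reported checkpts payload sizes at height 516,387 (258 checkpoints).
SIZE_NBITS = 9_304        # hash + nBits
SIZE_VLQ_DELTA = 10_934   # hash + chain work delta encoded as VLQ
SIZE_VLQ_TOTAL = 11_030   # hash + total chain work encoded as VLQ


def saving(baseline: int, candidate: int) -> float:
    """Fractional size reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline


def vlq_encode(value: int) -> bytes:
    """Generic big-endian base-128 VLQ (assumed layout, not taken from the spec).

    Every byte except the last has its high bit set as a continuation flag.
    """
    if value < 0:
        raise ValueError("VLQ encodes non-negative integers only")
    groups = [value & 0x7F]          # final byte: continuation bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


if __name__ == "__main__":
    print(f"nBits vs total VLQ: {saving(SIZE_VLQ_TOTAL, SIZE_NBITS):.1%}")
    print(f"delta vs total VLQ: {saving(SIZE_VLQ_TOTAL, SIZE_VLQ_DELTA):.1%}")
    # Illustrative value only; not the actual chain work at this height.
    print(f"bytes for an 88-bit value: {len(vlq_encode(1 << 87))}")
```

The first ratio works out to about 15.6%, in line with the 16% quoted, while the delta encoding saves under 1% compared with encoding the total.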
According to some rather rough benchmarks, it takes ~3us to generate the message with nBits versus ~105us to generate each message with VLQ chain work (including block index lookups and serialization time).

The downside, however, is that the new P2P message would be tightly coupled to a specific parameter in Bitcoin's consensus protocol, and one that is changed in many alt chains. Also, it would require that checkpoints can only be fetched at intervals of 2016, instead of intervals chosen by the clients. Being able to specify the interval is a very nice property for longer chains, where a client may select really large intervals, then bisect that range even further to request a smaller PoW sample (e.g. start by fetching every 10,000th, then every 100th).

Personally, I strongly think using total chain work instead of nBits is the right tradeoff and is worth the extra 1.7KB or so (11,030 vs. 9,304 bytes). I'm curious to hear others' opinions. Note that the checkpoints message is only fetched once per peer per download from genesis. Subsequent catchups only fetch checkpoints from the locator fork point. I also don't find the caching argument compelling -- the time to generate checkpts response messages is fast enough anyway.

I also finally got around to pulling numbers on the space savings from the nVersion omission. As a reminder of how this works, three bits in the encoding indicator represent a value 1-7 of the distance in block height since another block with the same version. Looking at the current Bitcoin main chain, this is a table of the occurrences of these values:

Height distance    # of Blocks
1                  469537
2                  22301
3                  8833
4                  4368
5                  2633
6                  1630
7                  1114
8+                 5967

You can read this as "469,537 blocks have the same version as their parent", "22,301 have the same version as their parent's parent", etc. Given the information in this table, we may consider only allocating 2 bits in the encoding header rather than 3.
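To make the 2-bit versus 3-bit question concrete, the sketch below computes the share of blocks each field width would cover, using the counts from the table. It assumes an n-bit field can express distances 1 through 2^n - 1, with every other block carrying its nVersion explicitly; that is an interpretation of the scheme, and the function name is illustrative.

```python
# Occurrences by height distance to the nearest earlier block with the same
# nVersion, copied from the table above. Key 8 stands for the "8+" bucket.
DISTANCE_COUNTS = {
    1: 469_537, 2: 22_301, 3: 8_833, 4: 4_368,
    5: 2_633, 6: 1_630, 7: 1_114, 8: 5_967,
}


def covered_fraction(bits: int, counts: dict[int, int] = DISTANCE_COUNTS) -> float:
    """Share of blocks whose version could be omitted with a `bits`-wide field.

    Assumption: the field encodes distances 1 .. 2**bits - 1. Only widths of
    1 to 3 bits are supported, because the table lumps 8 and above together.
    """
    if not 1 <= bits <= 3:
        raise ValueError("table only resolves distances up to 7 (3 bits)")
    max_distance = (1 << bits) - 1
    covered = sum(n for d, n in counts.items() if d <= max_distance)
    return covered / sum(counts.values())


if __name__ == "__main__":
    for bits in (1, 2, 3):
        print(f"{bits} bit(s): {covered_fraction(bits):.2%} of versions omitted")
```

With these counts, 3 bits cover roughly 98.8% of blocks and 2 bits roughly 97.0%, so dropping a bit would leave about 9,745 additional headers carrying an explicit version.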
On Fri, Mar 30, 2018 at 1:06 AM, Riccardo Casatta <riccardo.casatta@gmail.com> wrote:

> Yes, I think the checkpoints and the compressed headers streams should be
> handled in chunks of 2016 headers and queried by chunk number instead of
> height, falling back to current method if the chunk is not full yet.
>
> This is cache friendly and allows to avoid bit 0 and bit 1 in the bitfield
> (because they are always 1 after the first header in the chunk of 2016).
>
> 2018-03-30 8:14 GMT+02:00 Anthony Towns <aj@erisian.com.au>:
>
>> On Thu, Mar 29, 2018 at 05:50:30PM -0700, Jim Posen via bitcoin-dev wrote:
>> > Taken a step further though, I'm really interested in treating the
>> > checkpoints as commitments to chain work [...]
>>
>> In that case, shouldn't the checkpoints just be every 2016 blocks and
>> include the corresponding bits value for that set of blocks?
>>
>> That way every node commits to (approximately) how much work their entire
>> chain has by sending something like 10kB of data (currently), and you
>> could verify the deltas in each node's chain's target by downloading the
>> 2016 headers between those checkpoints (~80kB with the proposed compact
>> encoding?) and checking the timestamps and proof of work match both the
>> old target and the new target from adjacent checkpoints.
>>
>> (That probably still works fine even if there's a hardfork that allows
>> difficulty to adjust more frequently: a bits value at block n*2016 will
>> still enforce *some* lower limit on how much work blocks n*2016+{1..2016}
>> will have to contribute; so will still allow you to estimate how much work
>> will have been done, it may just be less precise than the estimate you
>> could generate now)
>>
>> Cheers,
>> aj
>
> --
> Riccardo Casatta - @RCasatta
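The suggestion quoted above depends on turning a 4-byte bits value into an amount of work. A minimal sketch of that conversion follows, using the usual compact-target expansion and the common per-block estimate of 2^256 / (target + 1), multiplied by 2016 to approximate one retarget period. Function names are illustrative, and the sign-bit and overflow handling found in full node implementations is deliberately left out.

```python
def compact_to_target(nbits: int) -> int:
    """Expand a compact nBits value into the full 256-bit target.

    Simplified: ignores the sign bit and the overflow checks that a full
    implementation performs.
    """
    exponent = nbits >> 24
    mantissa = nbits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def block_work(nbits: int) -> int:
    """Expected number of hashes needed to find one block at this target."""
    return (1 << 256) // (compact_to_target(nbits) + 1)


def period_work(nbits: int, blocks: int = 2016) -> int:
    """Approximate work contributed by `blocks` headers sharing one nBits."""
    return blocks * block_work(nbits)


if __name__ == "__main__":
    genesis_bits = 0x1D00FFFF  # difficulty-1 target used by the genesis block
    print(hex(compact_to_target(genesis_bits)))
    print(hex(block_work(genesis_bits)))
    print(period_work(genesis_bits))
```

Summing `period_work` over the nBits value of each 2016-block checkpoint would give a client an approximate total chain work without the sender serializing it explicitly, which is the tradeoff discussed in this thread.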