Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 4D4F1A67 for ; Fri, 9 Jun 2017 03:59:30 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-yb0-f181.google.com (mail-yb0-f181.google.com [209.85.213.181]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 282A4E5 for ; Fri, 9 Jun 2017 03:59:29 +0000 (UTC) Received: by mail-yb0-f181.google.com with SMTP id 4so13456328ybl.1 for ; Thu, 08 Jun 2017 20:59:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=D5NMO7bM/slFvtGnCF1TNfV/R+Mz/w49XnHYILWG5l0=; b=NG8Bef86FK7a6koD4on+jvtmV1lN2bTNPPiUUN/+1dFgUn6u0beMAHdRRWIKc6qJll vJr7BI1PpRwqDcFtLhAHnYRR5kdv5mmFeu01FPwwC2lFJ6MYWQBsTXoG2Pm3rrrpzysp ABZ6UOyKUV/EM3LTrK+rvRglPkiiNyEPNiWN/VV35oSf9qtZByzK8FnGvMjRnLw5eVwW eIs0fezvdYyAMPTJ6RRB8CIC3cXUePy027s/hmIk+xZSIIcvvRodbmoeJZaY1BSTALT7 lGwlaGpelUu3zVwlUXxfZaK2g7/DNi9QKnYLGG5AvRMQLvvbNV1nlwcNRfZpCK7F1MQy PRsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=D5NMO7bM/slFvtGnCF1TNfV/R+Mz/w49XnHYILWG5l0=; b=EGcSmaNwfNeV8mkP4aktQwmf/Gt2qnvGPXlAtfFH3LI1kqAxxsxXWC+UXLlm/61hez Ie/EihNKhx8xDXB5v5J8+dzIB/0T3eK1QDtoW2FwwI4pcfPmkd7MAp3dOCjl4H3eYdGB NItXlGz6conzJuWKfpewEF2RSr3/JNhF2a1N/n7lTV1Uk5/gLE5eclRuFSuxYvD3XgJj CNdill8huD02PULGKdp/8TqDtqG+kXxLNgAN1nlQLS0YgH6cKjfB5GLhu6fVP6SjTi3l mH752jTAXX363SNnzFmOf8n87l0946k0E226MFSwyFdvmd7g+roViXOWLJ//RIav9BcP bk1Q== X-Gm-Message-State: AODbwcDeiNpJthAkmPyHxLEkqHTBsyWcGcYjGLaqsUHZIz5XaNrfpoEL P659JJt6dg5iCbb7w+a/ttgu9/AN3fG8 X-Received: by 10.37.128.131 with SMTP id n3mr13560280ybk.30.1496980768076; Thu, 08 Jun 2017 20:59:28 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Olaoluwa Osuntokun Date: Fri, 09 Jun 2017 03:59:17 +0000 Message-ID: To: Arnoud Kouwenhoven - Pukaki Corp via bitcoin-dev Content-Type: multipart/alternative; boundary="089e0822ef74eabbc705517efe44" X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, HTML_MESSAGE,RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Jun 2017 03:59:30 -0000 --089e0822ef74eabbc705517efe44 Content-Type: text/plain; charset="UTF-8" Hi y'all, Thanks for all the comments so far! I've pushed a series of updates to the text of the BIP repo linked in the OP. The fixes include: typos, components of the specification which were incorrect (N is the total number of items, NOT the number of txns in the block), and a few sections have been clarified. The latest version also includes a set of test vectors (as CSV files), which for a series of fp rates (1/2 to 1/2^32) includes (for 6 testnet blocks, one of which generates a "null" filter): * The block height * The block hash * The raw block itself * The previous basic+extended filter header * The basic+extended filter header for the block * The basic+extended filter for the block The size of the test vectors was too large to include in-line within the document, so we put them temporarily in a distinct folder [1]. The code used to generate the test vectors has also been included. -- Laolu [1]: https://github.com/Roasbeef/bips/tree/master/gcs_light_client On Thu, Jun 1, 2017 at 9:49 PM Olaoluwa Osuntokun wrote: > > In order to consider the average+median filter sizes in a world worth > larger > > blocks, I also ran the index for testnet: > > > > * total size: 2753238530 > > * total avg: 5918.95736054141 > > * total median: 60202 > > * total max: 74983 > > * regular size: 1165148878 > > * regular avg: 2504.856172982827 > > * regular median: 24812 > > * regular max: 64554 > > * extended size: 1588089652 > > * extended avg: 3414.1011875585823 > > * extended median: 35260 > > * extended max: 41731 > > > > Oops, realized I made a mistake. These are the stats for Feb 2016 until > about a > month ago (since height 400k iirc). > > -- Laolu > > > On Thu, Jun 1, 2017 at 12:01 PM Olaoluwa Osuntokun > wrote: > >> Hi y'all, >> >> Alex Akselrod and I would like to propose a new light client BIP for >> consideration: >> * >> https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki >> >> This BIP proposal describes a concrete specification (along with a >> reference implementations[1][2][3]) for the much discussed client-side >> filtering reversal of BIP-37. The precise details are described in the >> BIP, but as a summary: we've implemented a new light-client mode that uses >> client-side filtering based off of Golomb-Rice coded sets. Full-nodes >> maintain an additional index of the chain, and serve this compact filter >> (the index) to light clients which request them. Light clients then fetch >> these filters, query the locally and _maybe_ fetch the block if a relevant >> item matches. The cool part is that blocks can be fetched from _any_ >> source, once the light client deems it necessary. Our primary motivation >> for this work was enabling a light client mode for lnd[4] in order to >> support a more light-weight back end paving the way for the usage of >> Lightning on mobile phones and other devices. We've integrated neutrino >> as a back end for lnd, and will be making the updated code public very >> soon. >> >> One specific area we'd like feedback on is the parameter selection. Unlike >> BIP-37 which allows clients to dynamically tune their false positive rate, >> our proposal uses a _fixed_ false-positive. Within the document, it's >> currently specified as P = 1/2^20. We've done a bit of analysis and >> optimization attempting to optimize the following sum: >> filter_download_bandwidth + expected_block_false_positive_bandwidth. Alex >> has made a JS calculator that allows y'all to explore the affect of >> tweaking the false positive rate in addition to the following variables: >> the number of items the wallet is scanning for, the size of the blocks, >> number of blocks fetched, and the size of the filters themselves. The >> calculator calculates the expected bandwidth utilization using the CDF of >> the Geometric Distribution. The calculator can be found here: >> https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical >> script he's been running on actual data, and the results seem to match up >> rather nicely. >> >> We we're excited to see that Karl Johan Alm (kallewoof) has done some >> (rather extensive!) analysis of his own, focusing on a distinct encoding >> type [5]. I haven't had the time yet to dig into his report yet, but I >> think I've read enough to extract the key difference in our encodings: his >> filters use a binomial encoding _directly_ on the filter contents, will we >> instead create a Golomb-Coded set with the contents being _hashes_ (we use >> siphash) of the filter items. >> >> Using a fixed fp=20, I have some stats detailing the total index size, as >> well as averages for both mainnet and testnet. For mainnet, using the >> filter contents as currently described in the BIP (basic + extended), the >> total size of the index comes out to 6.9GB. The break down is as follows: >> >> * total size: 6976047156 >> * total avg: 14997.220622758816 >> * total median: 3801 >> * total max: 79155 >> * regular size: 3117183743 >> * regular avg: 6701.372750217131 >> * regular median: 1734 >> * regular max: 67533 >> * extended size: 3858863413 <(385)%20886-3413> >> * extended avg: 8295.847872541684 >> * extended median: 2041 >> * extended max: 52508 >> >> In order to consider the average+median filter sizes in a world worth >> larger blocks, I also ran the index for testnet: >> >> * total size: 2753238530 >> * total avg: 5918.95736054141 >> * total median: 60202 >> * total max: 74983 >> * regular size: 1165148878 >> * regular avg: 2504.856172982827 >> * regular median: 24812 >> * regular max: 64554 >> * extended size: 1588089652 >> * extended avg: 3414.1011875585823 >> * extended median: 35260 >> * extended max: 41731 >> >> Finally, here are the testnet stats which take into account the increase >> in the maximum filter size due to segwit's block-size increase. The max >> filter sizes are a bit larger due to some of the habitual blocks I >> created last year when testing segwit (transactions with 30k inputs, 30k >> outputs, etc). >> >> * total size: 585087597 >> * total avg: 520.8839608674402 >> * total median: 20 >> * total max: 164598 >> * regular size: 299325029 >> * regular avg: 266.4790836307566 >> * regular median: 13 >> * regular max: 164583 >> * extended size: 285762568 >> * extended avg: 254.4048772366836 >> * extended median: 7 >> * extended max: 127631 >> >> For those that are interested in the raw data, I've uploaded a CSV file >> of raw data for each block (mainnet + testnet), which can be found here: >> * mainnet: (14MB): >> https://www.dropbox.com/s/4yk2u8dj06njbuv/mainnet-gcs-stats.csv?dl=0 >> * testnet: (25MB): >> https://www.dropbox.com/s/w7dmmcbocnmjfbo/gcs-stats-testnet.csv?dl=0 >> >> >> We look forward to getting feedback from all of y'all! >> >> -- Laolu >> >> >> [1]: https://github.com/lightninglabs/neutrino >> [2]: https://github.com/Roasbeef/btcd/tree/segwit-cbf >> [3]: https://github.com/Roasbeef/btcutil/tree/gcs/gcs >> [4]: https://github.com/lightningnetwork/lnd/ >> >> -- Laolu >> >> --089e0822ef74eabbc705517efe44 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi y'all,=C2=A0

Tha= nks for all the comments so far!

I've pushed a= series of updates to the text of the BIP repo linked in the OP.
= The fixes include: typos, components of the specification which were incorr= ect
(N is the total number of items, NOT the number of txns in th= e block), and a
few sections have been clarified.

<= /div>
The latest version also includes a set of test vectors (as CSV fi= les), which
for a series of fp rates (1/2 to 1/2^32) includes (fo= r 6 testnet blocks, one of
which generates a "null" fil= ter):=C2=A0

=C2=A0 =C2=A0* The block height
<= div>=C2=A0 =C2=A0* The block hash
=C2=A0 =C2=A0* The raw block it= self
=C2=A0 =C2=A0* The previous basic+extended filter header=C2= =A0
=C2=A0 =C2=A0* The basic+extended filter header for the block=
=C2=A0 =C2=A0* The basic+extended filter for the block

The size of the test vectors was too large to include in-li= ne within the
document, so we put them temporarily in a distinct = folder [1]. The code used to
generate the test vectors has also b= een included.

-- Laolu



On Thu, Jun 1,= 2017 at 9:49 PM Olaoluwa Osuntokun <laolu32@gmail.com> wrote:
=
> In order to consider the average+median filter s= izes in a world worth larger
> blocks, I also ran the index fo= r testnet:
>=C2=A0
> =C2=A0 =C2=A0 * total size: = =C2=A02753238530
> =C2=A0 =C2=A0 * total avg: =C2=A05918.95736= 054141
> =C2=A0 =C2=A0 * total median: =C2=A060202
&= gt; =C2=A0 =C2=A0 * total max: =C2=A074983
> =C2=A0 =C2=A0 * r= egular size: =C2=A01165148878
> =C2=A0 =C2=A0 * regular avg: = =C2=A02504.856172982827
> =C2=A0 =C2=A0 * regular median: =C2= =A024812
> =C2=A0 =C2=A0 * regular max: =C2=A064554
= > =C2=A0 =C2=A0 * extended size: =C2=A01588089652
> =C2=A0 = =C2=A0 * extended avg: =C2=A03414.1011875585823
> =C2=A0 =C2= =A0 * extended median: =C2=A035260
> =C2=A0 =C2=A0 * extended = max: =C2=A041731
>=C2=A0

Oops, realized I made a mistake. These are the stats for Feb 201= 6 until about a
month ago (since height 400k iirc).
-- Laolu


On Thu, Jun 1, 2017 at 12:01 PM Olaolu= wa Osuntokun <lao= lu32@gmail.com> wrote:
Hi y'all,=C2=A0

Alex Akselrod= and I would like to propose a new light client BIP for
considera= tion:=C2=A0
=
This BIP proposal describes a concrete specification (along = with a
reference implementations[1][2][3]) for the much discussed= client-side
filtering reversal of BIP-37. The precise details ar= e described in the
BIP, but as a summary: we've implemented a= new light-client mode that uses
client-side filtering based off = of Golomb-Rice coded sets. Full-nodes
maintain an additional inde= x of the chain, and serve this compact filter
(the index) to ligh= t clients which request them. Light clients then fetch
these filt= ers, query the locally and _maybe_ fetch the block if a relevant
= item matches. The cool part is that blocks can be fetched from _any_
<= div>source, once the light client deems it necessary. Our primary motivatio= n
for this work was enabling a light client mode for lnd[4] in or= der to
support a more light-weight back end paving the way for th= e usage of
Lightning on mobile phones and other devices. We'v= e integrated neutrino
as a back end for lnd, and will be making t= he updated code public very
soon.

One sp= ecific area we'd like feedback on is the parameter selection. Unlike
BIP-37 which allows clients to dynamically tune their false positiv= e rate,
our proposal uses a _fixed_ false-positive. Within the do= cument, it's
currently specified as P =3D 1/2^20. We've d= one a bit of analysis and
optimization attempting to optimize the= following sum:
filter_download_bandwidth + expected_block_false_= positive_bandwidth. Alex
has made a JS calculator that allows y&#= 39;all to explore the affect of
tweaking the false positive rate = in addition to the following variables:
the number of items the w= allet is scanning for, the size of the blocks,
number of blocks f= etched, and the size of the filters themselves. The
calculator ca= lculates the expected bandwidth utilization using the CDF of
the = Geometric Distribution. The calculator can be found here:
s= cript he's been running on actual data, and the results seem to match u= p
rather nicely.

We we're excited to= see that Karl Johan Alm (kallewoof) has done some
(rather extens= ive!) analysis of his own, focusing on a distinct encoding
type [= 5]. I haven't had the time yet to dig into his report yet, but I
<= div>think I've read enough to extract the key difference in our encodin= gs: his
filters use a binomial encoding _directly_ on the filter = contents, will we
instead create a Golomb-Coded set with the cont= ents being _hashes_ (we use
siphash) of the filter items.

Using a fixed fp=3D20, I have some stats detailing the to= tal index size, as
well as averages for both mainnet and testnet.= For mainnet, using the
filter contents as currently described in= the BIP (basic + extended), the
total size of the index comes ou= t to 6.9GB. The break down is as follows:

=C2=A0 = =C2=A0 * total size: =C2=A06976047156
=C2=A0 =C2=A0 * total avg: = =C2=A014997.220622758816
=C2=A0 =C2=A0 * total median: =C2=A03801=
=C2=A0 =C2=A0 * total max: =C2=A079155
=C2=A0 =C2=A0 *= regular size: =C2=A03117183743
=C2=A0 =C2=A0 * regular avg: =C2= =A06701.372750217131
=C2=A0 =C2=A0 * regular median: =C2=A01734
=C2=A0 =C2=A0 * regular max: =C2=A067533
=C2=A0 =C2=A0 *= extended size: =C2=A03858863413
=C2=A0 =C2=A0 * extended avg:= =C2=A08295.847872541684
=C2=A0 =C2=A0 * extended median: =C2=A02= 041
=C2=A0 =C2=A0 * extended max: =C2=A052508

In order to consider the average+median filter sizes in a world worth=
larger blocks, I also ran the index for testnet:=C2=A0

=C2=A0 =C2=A0 * total size: =C2=A02753238530
=C2= =A0 =C2=A0 * total avg: =C2=A05918.95736054141
=C2=A0 =C2=A0 * to= tal median: =C2=A060202
=C2=A0 =C2=A0 * total max: =C2=A074983
=C2=A0 =C2=A0 * regular size: =C2=A01165148878
=C2=A0 =C2= =A0 * regular avg: =C2=A02504.856172982827
=C2=A0 =C2=A0 * regula= r median: =C2=A024812
=C2=A0 =C2=A0 * regular max: =C2=A064554
=C2=A0 =C2=A0 * extended size: =C2=A01588089652
=C2=A0 = =C2=A0 * extended avg: =C2=A03414.1011875585823
=C2=A0 =C2=A0 * e= xtended median: =C2=A035260
=C2=A0 =C2=A0 * extended max: =C2=A04= 1731

Finally, here are the testnet stats which tak= e into account the increase
in the maximum filter size due to seg= wit's block-size increase. The max
filter sizes are a bit lar= ger due to some of the habitual blocks I
created last year when t= esting segwit (transactions with 30k inputs, 30k
outputs, etc).

=C2=A0 =C2=A0 =C2=A0* total size: =C2=A0585087597
=C2=A0 =C2=A0 =C2=A0* total avg: =C2=A0520.8839608674402
=C2=A0 =C2=A0 =C2=A0* total median: =C2=A020
=C2=A0 =C2=A0 =C2= =A0* total max: =C2=A0164598
=C2=A0 =C2=A0 =C2=A0* regular size: = =C2=A0299325029
=C2=A0 =C2=A0 =C2=A0* regular avg: =C2=A0266.4790= 836307566
=C2=A0 =C2=A0 =C2=A0* regular median: =C2=A013
=C2=A0 =C2=A0 =C2=A0* regular max: =C2=A0164583
=C2=A0 =C2=A0 = =C2=A0* extended size: =C2=A0285762568
=C2=A0 =C2=A0 =C2=A0* exte= nded avg: =C2=A0254.4048772366836
=C2=A0 =C2=A0 =C2=A0* extended = median: =C2=A07
=C2=A0 =C2=A0 =C2=A0* extended max: =C2=A0127631<= /div>

For those that are interested in the raw data, I&#= 39;ve uploaded a CSV file
of raw data for each block (mainnet + t= estnet), which can be found here:


W= e look forward to getting feedback from all of y'all!
--089e0822ef74eabbc705517efe44--