Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 7472F2C for ; Fri, 2 Jun 2017 04:49:29 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-yw0-f178.google.com (mail-yw0-f178.google.com [209.85.161.178]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 6CD9213B for ; Fri, 2 Jun 2017 04:49:28 +0000 (UTC) Received: by mail-yw0-f178.google.com with SMTP id b68so28705555ywe.3 for ; Thu, 01 Jun 2017 21:49:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=QEQvtY3q9rR76eMn2tDVxKiuwhmScobWAZrnjpdB434=; b=nkVaX4kQ2JCfedD/GWmHXzM+y0CFRejqRKewbRxDYWTyd900r9A8wlwuCuPW6LrTB5 LRIjw5nq6iYt212xfMU0H0ArFt6sX1gbIwZkdVXAUpfJ1GATlPpCEAluRFUL7RNgGSpl 755ONuAqRHR9j/cSJtlQjVzKVvQ2LBga87SlQDZz76sRaDKeP3ANN/DDVC5CEdPkCOj8 6/1zZ4Gqc91NfHBJs4ixS7UWhS+UnofcYxO496WujPu4zxxELdEDpFuHzOYqx5XxfKwW jJ9G+lki4meu7yc7eWhHyM7HBwB/bixGT/v9q4mDbB01VN0885R4/g4t1c1f+cWgfbsQ Wscg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=QEQvtY3q9rR76eMn2tDVxKiuwhmScobWAZrnjpdB434=; b=iGeHQhn6hM98org5lFR5N2akcmUO0zJwbyGpwjohFJANtbkQxdj+TK2NgKWLPevIpr 1ACx43EbkMT9GmhCks+zCi+XrhNAIIMpXFOxmMxf2xonsvjOE2QB9RK2AQEpj5u4CVYQ eLXL9BZKWgL+Z9uahwETvlIH+oPZZNVcOQWPIP93w7bk1pAjSRv3ITe4HKROTVOmB3nT 1c+tZ3WeUc32YWKY8Yj++A0AA6llNVg7mGU2DDGggiwCDaINCuk0/eVLL/Xwg5DX0XSB FGWhc4dZByKV3AbS+2G2z+QxlU662+AOshJB+yysCyG0bmnUsMaqeIL+DtnJvpMR+K8r RgWQ== X-Gm-Message-State: AODbwcCgl/LVP9QOqzqUE/lVym+c7XU+/SZ27YDvXFxub+jNKn68+YbK kWqA5sXqEe2qCwWxaDyimZkj9FGI1ge+ X-Received: by 10.129.88.84 with SMTP id m81mr4319648ywb.22.1496378967406; Thu, 01 Jun 2017 21:49:27 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Olaoluwa Osuntokun Date: Fri, 02 Jun 2017 04:49:16 +0000 Message-ID: To: Arnoud Kouwenhoven - Pukaki Corp via bitcoin-dev Content-Type: multipart/alternative; boundary="001a11492ff2cd5d460550f2e089" X-Spam-Status: No, score=-1.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jun 2017 04:49:29 -0000 --001a11492ff2cd5d460550f2e089 Content-Type: text/plain; charset="UTF-8" > In order to consider the average+median filter sizes in a world worth larger > blocks, I also ran the index for testnet: > > * total size: 2753238530 > * total avg: 5918.95736054141 > * total median: 60202 > * total max: 74983 > * regular size: 1165148878 > * regular avg: 2504.856172982827 > * regular median: 24812 > * regular max: 64554 > * extended size: 1588089652 > * extended avg: 3414.1011875585823 > * extended median: 35260 > * extended max: 41731 > Oops, realized I made a mistake. These are the stats for Feb 2016 until about a month ago (since height 400k iirc). -- Laolu On Thu, Jun 1, 2017 at 12:01 PM Olaoluwa Osuntokun wrote: > Hi y'all, > > Alex Akselrod and I would like to propose a new light client BIP for > consideration: > * > https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki > > This BIP proposal describes a concrete specification (along with a > reference implementations[1][2][3]) for the much discussed client-side > filtering reversal of BIP-37. The precise details are described in the > BIP, but as a summary: we've implemented a new light-client mode that uses > client-side filtering based off of Golomb-Rice coded sets. Full-nodes > maintain an additional index of the chain, and serve this compact filter > (the index) to light clients which request them. Light clients then fetch > these filters, query the locally and _maybe_ fetch the block if a relevant > item matches. The cool part is that blocks can be fetched from _any_ > source, once the light client deems it necessary. Our primary motivation > for this work was enabling a light client mode for lnd[4] in order to > support a more light-weight back end paving the way for the usage of > Lightning on mobile phones and other devices. We've integrated neutrino > as a back end for lnd, and will be making the updated code public very > soon. > > One specific area we'd like feedback on is the parameter selection. Unlike > BIP-37 which allows clients to dynamically tune their false positive rate, > our proposal uses a _fixed_ false-positive. Within the document, it's > currently specified as P = 1/2^20. We've done a bit of analysis and > optimization attempting to optimize the following sum: > filter_download_bandwidth + expected_block_false_positive_bandwidth. Alex > has made a JS calculator that allows y'all to explore the affect of > tweaking the false positive rate in addition to the following variables: > the number of items the wallet is scanning for, the size of the blocks, > number of blocks fetched, and the size of the filters themselves. The > calculator calculates the expected bandwidth utilization using the CDF of > the Geometric Distribution. The calculator can be found here: > https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical > script he's been running on actual data, and the results seem to match up > rather nicely. > > We we're excited to see that Karl Johan Alm (kallewoof) has done some > (rather extensive!) analysis of his own, focusing on a distinct encoding > type [5]. I haven't had the time yet to dig into his report yet, but I > think I've read enough to extract the key difference in our encodings: his > filters use a binomial encoding _directly_ on the filter contents, will we > instead create a Golomb-Coded set with the contents being _hashes_ (we use > siphash) of the filter items. > > Using a fixed fp=20, I have some stats detailing the total index size, as > well as averages for both mainnet and testnet. For mainnet, using the > filter contents as currently described in the BIP (basic + extended), the > total size of the index comes out to 6.9GB. The break down is as follows: > > * total size: 6976047156 > * total avg: 14997.220622758816 > * total median: 3801 > * total max: 79155 > * regular size: 3117183743 > * regular avg: 6701.372750217131 > * regular median: 1734 > * regular max: 67533 > * extended size: 3858863413 <(385)%20886-3413> > * extended avg: 8295.847872541684 > * extended median: 2041 > * extended max: 52508 > > In order to consider the average+median filter sizes in a world worth > larger blocks, I also ran the index for testnet: > > * total size: 2753238530 > * total avg: 5918.95736054141 > * total median: 60202 > * total max: 74983 > * regular size: 1165148878 > * regular avg: 2504.856172982827 > * regular median: 24812 > * regular max: 64554 > * extended size: 1588089652 > * extended avg: 3414.1011875585823 > * extended median: 35260 > * extended max: 41731 > > Finally, here are the testnet stats which take into account the increase > in the maximum filter size due to segwit's block-size increase. The max > filter sizes are a bit larger due to some of the habitual blocks I > created last year when testing segwit (transactions with 30k inputs, 30k > outputs, etc). > > * total size: 585087597 > * total avg: 520.8839608674402 > * total median: 20 > * total max: 164598 > * regular size: 299325029 > * regular avg: 266.4790836307566 > * regular median: 13 > * regular max: 164583 > * extended size: 285762568 > * extended avg: 254.4048772366836 > * extended median: 7 > * extended max: 127631 > > For those that are interested in the raw data, I've uploaded a CSV file > of raw data for each block (mainnet + testnet), which can be found here: > * mainnet: (14MB): > https://www.dropbox.com/s/4yk2u8dj06njbuv/mainnet-gcs-stats.csv?dl=0 > * testnet: (25MB): > https://www.dropbox.com/s/w7dmmcbocnmjfbo/gcs-stats-testnet.csv?dl=0 > > > We look forward to getting feedback from all of y'all! > > -- Laolu > > > [1]: https://github.com/lightninglabs/neutrino > [2]: https://github.com/Roasbeef/btcd/tree/segwit-cbf > [3]: https://github.com/Roasbeef/btcutil/tree/gcs/gcs > [4]: https://github.com/lightningnetwork/lnd/ > > -- Laolu > > --001a11492ff2cd5d460550f2e089 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
> In order to consider the average+median filter s= izes in a world worth larger
> blocks, I also ran the index fo= r testnet:
>=C2=A0
> =C2=A0 =C2=A0 * total size: = =C2=A02753238530
> =C2=A0 =C2=A0 * total avg: =C2=A05918.95736= 054141
> =C2=A0 =C2=A0 * total median: =C2=A060202
&= gt; =C2=A0 =C2=A0 * total max: =C2=A074983
> =C2=A0 =C2=A0 * r= egular size: =C2=A01165148878
> =C2=A0 =C2=A0 * regular avg: = =C2=A02504.856172982827
> =C2=A0 =C2=A0 * regular median: =C2= =A024812
> =C2=A0 =C2=A0 * regular max: =C2=A064554
= > =C2=A0 =C2=A0 * extended size: =C2=A01588089652
> =C2=A0 = =C2=A0 * extended avg: =C2=A03414.1011875585823
> =C2=A0 =C2= =A0 * extended median: =C2=A035260
> =C2=A0 =C2=A0 * extended = max: =C2=A041731
>=C2=A0

Oops, realiz= ed I made a mistake. These are the stats for Feb 2016 until about a
month ago (since height 400k iirc).

-- Laolu


On Thu, J= un 1, 2017 at 12:01 PM Olaoluwa Osuntokun <laolu32@gmail.com> wrote:
Hi y'all,=C2=A0

Al= ex Akselrod and I would like to propose a new light client BIP for
consideration:=C2=A0

This BIP proposal describes a concrete specificat= ion (along with a
reference implementations[1][2][3]) for the muc= h discussed client-side
filtering reversal of BIP-37. The precise= details are described in the
BIP, but as a summary: we've im= plemented a new light-client mode that uses
client-side filtering= based off of Golomb-Rice coded sets. Full-nodes
maintain an addi= tional index of the chain, and serve this compact filter
(the ind= ex) to light clients which request them. Light clients then fetch
these filters, query the locally and _maybe_ fetch the block if a relevant=
item matches. The cool part is that blocks can be fetched from _= any_
source, once the light client deems it necessary. Our primar= y motivation
for this work was enabling a light client mode for l= nd[4] in order to
support a more light-weight back end paving the= way for the usage of
Lightning on mobile phones and other device= s. We've integrated neutrino
as a back end for lnd, and will = be making the updated code public very
soon.

=
One specific area we'd like feedback on is the parameter selection= . Unlike
BIP-37 which allows clients to dynamically tune their fa= lse positive rate,
our proposal uses a _fixed_ false-positive. Wi= thin the document, it's
currently specified as P =3D 1/2^20. = We've done a bit of analysis and
optimization attempting to o= ptimize the following sum:
filter_download_bandwidth + expected_b= lock_false_positive_bandwidth. Alex
has made a JS calculator that= allows y'all to explore the affect of
tweaking the false pos= itive rate in addition to the following variables:
the number of = items the wallet is scanning for, the size of the blocks,
number = of blocks fetched, and the size of the filters themselves. The
ca= lculator calculates the expected bandwidth utilization using the CDF of
the Geometric Distribution. The calculator can be found here:
<= div>https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical<= /div>
script he's been running on actual data, and the results seem= to match up
rather nicely.

We we're= excited to see that Karl Johan Alm (kallewoof) has done some
(ra= ther extensive!) analysis of his own, focusing on a distinct encoding
=
type [5]. I haven't had the time yet to dig into his report yet, b= ut I
think I've read enough to extract the key difference in = our encodings: his
filters use a binomial encoding _directly_ on = the filter contents, will we
instead create a Golomb-Coded set wi= th the contents being _hashes_ (we use
siphash) of the filter ite= ms.

Using a fixed fp=3D20, I have some stats detai= ling the total index size, as
well as averages for both mainnet a= nd testnet. For mainnet, using the
filter contents as currently d= escribed in the BIP (basic + extended), the
total size of the ind= ex comes out to 6.9GB. The break down is as follows:

=C2=A0 =C2=A0 * total size: =C2=A06976047156
=C2=A0 =C2=A0 * t= otal avg: =C2=A014997.220622758816
=C2=A0 =C2=A0 * total median: = =C2=A03801
=C2=A0 =C2=A0 * total max: =C2=A079155
=C2= =A0 =C2=A0 * regular size: =C2=A03117183743
=C2=A0 =C2=A0 * regul= ar avg: =C2=A06701.372750217131
=C2=A0 =C2=A0 * regular median: = =C2=A01734
=C2=A0 =C2=A0 * regular max: =C2=A067533
=C2= =A0 =C2=A0 * extended size: =C2=A03858863413
=C2=A0 =C2=A0 * e= xtended avg: =C2=A08295.847872541684
=C2=A0 =C2=A0 * extended med= ian: =C2=A02041
=C2=A0 =C2=A0 * extended max: =C2=A052508

In order to consider the average+median filter sizes in a= world worth
larger blocks, I also ran the index for testnet:=C2= =A0

=C2=A0 =C2=A0 * total size: =C2=A02753238530
=C2=A0 =C2=A0 * total avg: =C2=A05918.95736054141
=C2=A0= =C2=A0 * total median: =C2=A060202
=C2=A0 =C2=A0 * total max: = =C2=A074983
=C2=A0 =C2=A0 * regular size: =C2=A01165148878
<= div>=C2=A0 =C2=A0 * regular avg: =C2=A02504.856172982827
=C2=A0 = =C2=A0 * regular median: =C2=A024812
=C2=A0 =C2=A0 * regular max:= =C2=A064554
=C2=A0 =C2=A0 * extended size: =C2=A01588089652
=C2=A0 =C2=A0 * extended avg: =C2=A03414.1011875585823
=C2= =A0 =C2=A0 * extended median: =C2=A035260
=C2=A0 =C2=A0 * extende= d max: =C2=A041731

Finally, here are the testnet s= tats which take into account the increase
in the maximum filter s= ize due to segwit's block-size increase. The max
filter sizes= are a bit larger due to some of the habitual blocks I
created la= st year when testing segwit (transactions with 30k inputs, 30k
ou= tputs, etc).

=C2=A0 =C2=A0 =C2=A0* total size: =C2= =A0585087597
=C2=A0 =C2=A0 =C2=A0* total avg: =C2=A0520.883960867= 4402
=C2=A0 =C2=A0 =C2=A0* total median: =C2=A020
=C2= =A0 =C2=A0 =C2=A0* total max: =C2=A0164598
=C2=A0 =C2=A0 =C2=A0* = regular size: =C2=A0299325029
=C2=A0 =C2=A0 =C2=A0* regular avg: = =C2=A0266.4790836307566
=C2=A0 =C2=A0 =C2=A0* regular median: =C2= =A013
=C2=A0 =C2=A0 =C2=A0* regular max: =C2=A0164583
= =C2=A0 =C2=A0 =C2=A0* extended size: =C2=A0285762568
=C2=A0 =C2= =A0 =C2=A0* extended avg: =C2=A0254.4048772366836
=C2=A0 =C2=A0 = =C2=A0* extended median: =C2=A07
=C2=A0 =C2=A0 =C2=A0* extended m= ax: =C2=A0127631

For those that are interested in = the raw data, I've uploaded a CSV file
of raw data for each b= lock (mainnet + testnet), which can be found here:


--001a11492ff2cd5d460550f2e089--