Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 8548D596 for ; Sat, 19 May 2018 03:08:44 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-wm0-f41.google.com (mail-wm0-f41.google.com [74.125.82.41]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 733DDB0 for ; Sat, 19 May 2018 03:08:43 +0000 (UTC) Received: by mail-wm0-f41.google.com with SMTP id n10-v6so18168439wmc.1 for ; Fri, 18 May 2018 20:08:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=n9sm6DXfn0NS4Q1aG08hQ1vYuTMoKI/vUqTwjTobsj0=; b=fMMnsWBXWJ9CwFznoMc+Jj+xic+vUh2ReWdvNVlGO1t2Uq/CTrN3xEre3ddatWUz+8 j6tna+7YyKlyiKHh6+5azJgUqLeW8jG2bZjezBQUsrbhvSFcEeNT9zXiU7Xkfxb74SBq wdakYWitxuLndU7DwCpMhHSVEjbZG3LgzJt1SgfKYPy6MF0ZTuHZCfb0kzJwi+mjG9kz K4i8T5E9/Kzi+t0KR3g53hNFI2tKRGPkH6s5PSQh9D0kA4EFmppJxwAfPb7Yge7YNbTg 4dXR7zVwp+PyYg1hsqp+ew5kgLtisf1Jsg8mHC0qTnddY8vPqTfoEsurDgz6DzMKVvuY mxJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=n9sm6DXfn0NS4Q1aG08hQ1vYuTMoKI/vUqTwjTobsj0=; b=KIRixiBY6UAGzc1j0Q4MiCfByFyFgV09aokAWKD4rR86WFMYfRoCdGf/Mfp/xFsCML rGoKElREgWmQgLu9G/RTSb8B9ang4oSG0qlg49Tieil/2bxZ8j8AQIT6XpqqmWqSFd4E x68Ms6x0DKP9NhxAyYpMHC3jTthqG90JcZqlVY5SXpku0hxEeiqunuThnabiJZWbNrXI A794FkeTyNt56/K6KJeErOL+7MSo/C/Sxc05Z8vbocjAPJsfVrXnQSgpFZvgiIYJvorS KLsNIrB+lwgZUCJs/uFACw8mVXSVNOc5AGEeKQMAgVQQUZHtWZPZTTsRJqd5V1OqWAZY sFkg== X-Gm-Message-State: ALKqPwcHmszEQjHulIq+UACXeFlvX2G23kC/7jG9Wvo8RF0Hvvf6qghT bp1Q9HdH/ZLm43Ax/fDxrgED5mxicczG+agZJ8s= X-Google-Smtp-Source: AB8JxZoKflsCfuzwg7jtsiQH2sxF8hdsV4nHAJibuasDovW3RLq/OhYStMCaocdXziNyuwyjAx1LhsDrPkH28AkjlLc= X-Received: by 2002:a50:a390:: with SMTP id s16-v6mr14471512edb.271.1526699322051; Fri, 18 May 2018 20:08:42 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Olaoluwa Osuntokun Date: Fri, 18 May 2018 20:08:29 -0700 Message-ID: To: Riccardo Casatta , Bitcoin Protocol Discussion Content-Type: multipart/alternative; boundary="000000000000c50ee8056c866263" X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, HTML_MESSAGE,RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 May 2018 03:08:44 -0000 --000000000000c50ee8056c866263 Content-Type: text/plain; charset="UTF-8" Riccardo wrote: > The BIP recall some go code for how the parameter has been selected which > I can hardly understand and run The code you're linking to is for generating test vectors (to allow implementations to check the correctness of their gcs filters. The name of the file is 'gentestvectors.go'. It produces CSV files which contain test vectors of various testnet blocks and at various false positive rates. > it's totally my fault but if possible I would really like more details on > the process, like charts and explanations When we published the BIP draft last year (wow, time flies!), we put up code (as well as an interactive website) showing the process we used to arrive at the current false positive rate. The aim was to minimize the bandwidth required to download each filter plus the expected bandwidth from downloading "large-ish" full segwit blocks. The code simulated a few wallet types (in terms of number of addrs, etc) focusing on a "mid-sized" wallet. One could also model the selection as a Bernoulli process where we attempt to compute the probability that after k queries (let's say you have k addresses) we have k "successes". A success would mean the queries item wasn't found in the filter, while a failure is a filter match (false positive or not). A failure in the process requires fetching the entire block. -- Laolu On Fri, May 18, 2018 at 5:35 AM Riccardo Casatta via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > Another parameter which heavily affects filter size is the false positive > rate which is empirically set > > to 2^-20 > The BIP recall some go code > > for how the parameter has been selected which I can hardly understand and > run, it's totally my fault but if possible I would really like more details > on the process, like charts and explanations (for example, which is the > number of elements to search for which the filter has been optimized for?) > > Instinctively I feel 2^-20 is super low and choosing a lot higher alpha > will shrink the total filter size by gigabytes at the cost of having to > wastefully download just some megabytes of blocks. > > > 2018-05-17 18:36 GMT+02:00 Gregory Maxwell via bitcoin-dev < > bitcoin-dev@lists.linuxfoundation.org>: > >> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev >> wrote: >> > I believe (1) could be skipped entirely - there is almost no reason why >> > you'd not be able to filter for, eg, the set of output scripts in a >> > transaction you know about >> >> I think this is convincing for the txids themselves. >> >> What about also making input prevouts filter based on the scriptpubkey >> being _spent_? Layering wise in the processing it's a bit ugly, but >> if you validated the block you have the data needed. >> >> This would eliminate the multiple data type mixing entirely. >> _______________________________________________ >> bitcoin-dev mailing list >> bitcoin-dev@lists.linuxfoundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev >> > > > > -- > Riccardo Casatta - @RCasatta > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev > --000000000000c50ee8056c866263 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Riccardo wrote:
> The BIP recall some go= code for how the parameter has been selected which
> I can ha= rdly understand and run

The code you're linkin= g to is for generating test vectors (to allow
implementations to = check the correctness of their gcs filters. The name of
the file = is 'gentestvectors.go'. It produces CSV files which contain test
vectors of various testnet blocks and at various false positive rat= es.

> it's totally my fault but if possible= I would really like more details on
> the process, like chart= s and explanations

When we published the BIP draft= last year (wow, time flies!), we put up code
(as well as an inte= ractive website) showing the process we used to arrive at
the cur= rent false positive rate. The aim was to minimize the bandwidth
r= equired to download each filter plus the expected bandwidth from
= downloading "large-ish" full segwit blocks. The code simulated a = few wallet
types (in terms of number of addrs, etc) focusing on a= "mid-sized" wallet.
One could also model the selection= as a Bernoulli process where we attempt
to compute the probabili= ty that after k queries (let's say you have k
addresses) we h= ave k "successes". A success would mean the queries item
wasn't found in the filter, while a failure is a filter match (false<= /div>
positive or not). A failure in the process requires fetching the = entire
block.

-- Laolu

On Fri, May 18, 2018 at 5:35 AM Riccard= o Casatta via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
<= blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px= #ccc solid;padding-left:1ex">
Another parameter which heav= ily affects filter size is the false positive rate which is empirically set to 2^-20=C2=A0
The BIP recall some = go cod= e for how the parameter has been selected which I can hardly understand= and run, it's totally my fault but if possible I would really like mor= e details on the process, like charts and explanations (for example, which = is the number of elements to search for which the filter has been optimized= for?)

Instinctively I feel 2^-20 is super low and= choosing a lot higher alpha will shrink the total filter size by gigabytes= at the cost of having to wastefully download just some megabytes of blocks= .


2018-05-17 18:36 GMT+02:00 Greg= ory Maxwell via bitcoin-dev <bitcoin-dev@lists.linuxfo= undation.org>:
On Thu= , May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> I believe (1) could be skipped entirely - there is almost no reason wh= y
> you'd not be able to filter for, eg, the set of output scripts in = a
> transaction you know about

I think this is convincing for the txids themselves.

What about also making input prevouts filter based on the scriptpubkey
being _spent_?=C2=A0 Layering wise in the processing it's a bit ugly, b= ut
if you validated the block you have the data needed.

This would eliminate the multiple data type mixing entirely.
_______________________________________________
bitcoin-dev mailing list
= bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mail= man/listinfo/bitcoin-dev



<= div class=3D"gmail_extra">--
Riccardo Cas= atta - @RCasatta=
_______________________________________________
bitcoin-dev mailing list
= bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mail= man/listinfo/bitcoin-dev
--000000000000c50ee8056c866263--