Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 2F829A73 for ; Wed, 23 May 2018 08:16:44 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-wm0-f50.google.com (mail-wm0-f50.google.com [74.125.82.50]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 6AEE06BE for ; Wed, 23 May 2018 08:16:43 +0000 (UTC) Received: by mail-wm0-f50.google.com with SMTP id q4-v6so1373921wmq.1 for ; Wed, 23 May 2018 01:16:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=QmON9PyLw8wXM0tfKQiArUaB0wY3tsr5o4lAu6CT/nc=; b=JLPGGLxTix4jANCejIMs+q7Ly4x+hyzQvVKcA+7Msud4BFzMUv/9/Tz8P4S90WU/s9 YufhDmxR/jFgOMBYzvdn6XGfPCIIUcJBiRyD5rHOG23z/kLREx0t5KvNEQ85KZ6YwIBo FLcL43jZOVhzFxRNryrAQClC8RctrNApVm5dAZpuTY9H2mi8ueo7EiCke7iBEWEhsWxl ADgMZBrYJTyyHl/aKtJL1ITX4IYj0Cbqxv0UV3mfPBPbuhfGD4qz7aRCM7x0V0ksqn9b PVOKuIRf0I2jj/keDZEgRXaK3/Q3ntrlGjweHfvICf/9caDMbpyIHqR50e+SB8GWBCrE WfQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=QmON9PyLw8wXM0tfKQiArUaB0wY3tsr5o4lAu6CT/nc=; b=aPHMT3BPYY8aQd2KFrcAo7TBquhcoAUhgOcTBSoEW54opSPTUGHqNA8RnLbV62Xvix jxB8lPYXz6PiGO3/hF35dwjZMkei/tf6SfTS/x9xOiYdyo4bVjo+e9BjbvMidKrZl6RC DQRTqp4g//aAzHtYEhjhBs1JZTwMjl5Le38I9clr1Hya37LbkHAr1jeQZ1WR6DkZeKlo 3E3c2SafGNX141D9XmA6mr/Uni+C7P6qRZBdebLowCkxQiGLLo/Ey1uzrP8QEy5YfEy1 xGSE8iERiHm7nQZHBDegxWCWWQzDrgjzuWc0AqgCo+fbbL072jnnHCgf9nHS8RpEGYjx tvBw== X-Gm-Message-State: ALKqPweKJtSImoNRWxYR53JRIrziXoRM8Q4vA8kjqrdIQ7ransDB5qMg 9ZqoykbboeOAn+BAtnFofgs5P7UZUEXs03bgCXE= X-Google-Smtp-Source: AB8JxZrrzK5sTtsI9IdZsqA2C8QRt7tNhdmzSNo+Qgl/bs3DDzzc/dOh8V2whA9JPsyatw/eQtnQUO5JzBqDo8c8dfM= X-Received: by 2002:a2e:4d5d:: with SMTP id a90-v6mr1072790ljb.86.1527063401897; Wed, 23 May 2018 01:16:41 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a19:d04d:0:0:0:0:0 with HTTP; Wed, 23 May 2018 01:16:41 -0700 (PDT) In-Reply-To: References: <22d375c7-a032-8691-98dc-0e6ee87a4b08@mattcorallo.com> From: =?UTF-8?Q?Johan_Tor=C3=A5s_Halseth?= Date: Wed, 23 May 2018 10:16:41 +0200 Message-ID: To: Jim Posen Content-Type: multipart/alternative; boundary="0000000000009ea4dd056cdb2725" X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Wed, 23 May 2018 12:21:32 +0000 Cc: Bitcoin Protocol Discussion Subject: Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 May 2018 08:16:44 -0000 --0000000000009ea4dd056cdb2725 Content-Type: text/plain; charset="UTF-8" Thanks, Jimpo! This is very encouraging, I think. I sorta assumed that separating the elements into their own sub-filters would hurt the compression a lot more. Can the compression ratio/false positive rate be tweaked with the sub-filters in mind? With the total size of the separated filters being no larger than the combined filters, I see no benefit of combined filters? Committing to them all in the headers would also save space, and we could ensure nodes are serving all sub-filters. - Johan On Wed, May 23, 2018 at 9:38 AM, Jim Posen wrote: > So I checked filter sizes (as a proportion of block size) for each of the > sub-filters. The graph is attached. > > As interpretation, the first ~120,000 blocks are so small that the > Golomb-Rice coding can't compress the filters that well, which is why the > filter sizes are so high proportional to the block size. Except for the > input filter, because the coinbase input is skipped, so many of them have 0 > elements. But after block 120,000 or so, the filter compression converges > pretty quickly to near the optimal value. The encouraging thing here is > that if you look at the ratio of the combined size of the separated filters > vs the size of a filter containing all of them (currently known as the > basic filter), they are pretty much the same size. The mean of the ratio > between them after block 150,000 is 99.4%. So basically, not much > compression efficiently is lost by separating the basic filter into > sub-filters. > > On Tue, May 22, 2018 at 5:42 PM, Jim Posen wrote: > >> My suggestion was to advertise a bitfield for each filter type the node >>> serves, >>> where the bitfield indicates what elements are part of the filters. This >>> essentially >>> removes the notion of decided filter types and instead leaves the >>> decision to >>> full-nodes. >>> >> >> I think it makes more sense to construct entirely separate filters for >> the different types of elements and allow clients to download only the ones >> they care about. If there are enough elements per filter, the compression >> ratio shouldn't be much worse by splitting them up. This prevents the >> exponential blowup in the number of filters that you mention, Johan, and it >> works nicely with service bits for advertising different filter types >> independently. >> >> So if we created three separate filter types, one for output scripts, one >> for input outpoints, and one for TXIDs, each signaled with a separate >> service bit, are people good with that? Or do you think there shouldn't be >> a TXID filter at all, Matt? I didn't include the option of a prev output >> script filter or rolling that into the block output script filter because >> it changes the security model (cannot be proven to be correct/incorrect >> succinctly). >> >> Then there's the question of whether to separate or combine the headers. >> I'd lean towards keeping them separate because it's simpler that way. >> > > --0000000000009ea4dd056cdb2725 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks, Jimpo!

This is very encouraging= , I think. I sorta assumed that separating the elements into their own sub-= filters would hurt the compression a lot more. Can the compression ratio/fa= lse positive rate be tweaked with the sub-filters in mind?

With the total size of the separated filters being no larger than = the combined filters, I see no benefit of combined filters? Committing to t= hem all in the headers would also save space, and we could ensure nodes are= serving all sub-filters.

- Johan

On Wed, May 23, 2018 a= t 9:38 AM, Jim Posen <jim.posen@gmail.com> wrote:
So I checked filter sizes (as a = proportion of block size) for each of the sub-filters. The graph is attache= d.

As interpretation, the first ~120,000 blocks are so s= mall that the Golomb-Rice coding can't compress the filters that well, = which is why the filter sizes are so high proportional to the block size. E= xcept for the input filter, because the coinbase input is skipped, so many = of them have 0 elements. But after block 120,000 or so, the filter compress= ion converges pretty quickly to near the optimal value. The encouraging thi= ng here is that if you look at the ratio of the combined size of the separa= ted filters vs the size of a filter containing all of them (currently known= as the basic filter), they are pretty much the same size. The mean of the = ratio between them after block 150,000 is 99.4%. So basically, not much com= pression efficiently is lost by separating the basic filter into sub-filter= s.

On Tue, May 22, 2018 at 5:42 PM, Jim P= osen <jim.posen@gmail.com> wrote:
My suggest= ion was to advertise a bitfield for each filter type the node serves,
where the bitfield indicates what elements are part of the filters= . This essentially
removes the notion of decided filter types and= instead leaves the decision to=C2=A0
full-nodes.

I think it makes more sense to construc= t entirely separate filters for the different types of elements and allow c= lients to download only the ones they care about. If there are enough eleme= nts per filter, the compression ratio shouldn't be much worse by splitt= ing them up. This prevents the exponential blowup in the number of filters = that you mention, Johan, and it works nicely with service bits for advertis= ing different filter types independently.

So if we= created three separate filter types, one for output scripts, one for input= outpoints, and one for TXIDs, each signaled with a separate service bit, a= re people good with that? Or do you think there shouldn't be a TXID fil= ter at all, Matt? I didn't include the option of a prev output script f= ilter or rolling that into the block output script filter because it change= s the security model (cannot be proven to be correct/incorrect succinctly).=

Then there's the question of whether to separ= ate or combine the headers. I'd lean towards keeping them separate beca= use it's simpler that way.


--0000000000009ea4dd056cdb2725--