1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
|
Return-Path: <johanth@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 2F829A73
for <bitcoin-dev@lists.linuxfoundation.org>;
Wed, 23 May 2018 08:16:44 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-wm0-f50.google.com (mail-wm0-f50.google.com [74.125.82.50])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 6AEE06BE
for <bitcoin-dev@lists.linuxfoundation.org>;
Wed, 23 May 2018 08:16:43 +0000 (UTC)
Received: by mail-wm0-f50.google.com with SMTP id q4-v6so1373921wmq.1
for <bitcoin-dev@lists.linuxfoundation.org>;
Wed, 23 May 2018 01:16:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
h=mime-version:in-reply-to:references:from:date:message-id:subject:to
:cc; bh=QmON9PyLw8wXM0tfKQiArUaB0wY3tsr5o4lAu6CT/nc=;
b=JLPGGLxTix4jANCejIMs+q7Ly4x+hyzQvVKcA+7Msud4BFzMUv/9/Tz8P4S90WU/s9
YufhDmxR/jFgOMBYzvdn6XGfPCIIUcJBiRyD5rHOG23z/kLREx0t5KvNEQ85KZ6YwIBo
FLcL43jZOVhzFxRNryrAQClC8RctrNApVm5dAZpuTY9H2mi8ueo7EiCke7iBEWEhsWxl
ADgMZBrYJTyyHl/aKtJL1ITX4IYj0Cbqxv0UV3mfPBPbuhfGD4qz7aRCM7x0V0ksqn9b
PVOKuIRf0I2jj/keDZEgRXaK3/Q3ntrlGjweHfvICf/9caDMbpyIHqR50e+SB8GWBCrE
WfQQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20161025;
h=x-gm-message-state:mime-version:in-reply-to:references:from:date
:message-id:subject:to:cc;
bh=QmON9PyLw8wXM0tfKQiArUaB0wY3tsr5o4lAu6CT/nc=;
b=aPHMT3BPYY8aQd2KFrcAo7TBquhcoAUhgOcTBSoEW54opSPTUGHqNA8RnLbV62Xvix
jxB8lPYXz6PiGO3/hF35dwjZMkei/tf6SfTS/x9xOiYdyo4bVjo+e9BjbvMidKrZl6RC
DQRTqp4g//aAzHtYEhjhBs1JZTwMjl5Le38I9clr1Hya37LbkHAr1jeQZ1WR6DkZeKlo
3E3c2SafGNX141D9XmA6mr/Uni+C7P6qRZBdebLowCkxQiGLLo/Ey1uzrP8QEy5YfEy1
xGSE8iERiHm7nQZHBDegxWCWWQzDrgjzuWc0AqgCo+fbbL072jnnHCgf9nHS8RpEGYjx
tvBw==
X-Gm-Message-State: ALKqPweKJtSImoNRWxYR53JRIrziXoRM8Q4vA8kjqrdIQ7ransDB5qMg
9ZqoykbboeOAn+BAtnFofgs5P7UZUEXs03bgCXE=
X-Google-Smtp-Source: AB8JxZrrzK5sTtsI9IdZsqA2C8QRt7tNhdmzSNo+Qgl/bs3DDzzc/dOh8V2whA9JPsyatw/eQtnQUO5JzBqDo8c8dfM=
X-Received: by 2002:a2e:4d5d:: with SMTP id
a90-v6mr1072790ljb.86.1527063401897;
Wed, 23 May 2018 01:16:41 -0700 (PDT)
MIME-Version: 1.0
Received: by 2002:a19:d04d:0:0:0:0:0 with HTTP; Wed, 23 May 2018 01:16:41
-0700 (PDT)
In-Reply-To: <CADZtCShYnM3A949H18V2+BArA-K9J+cDkd=rX8xRn0+0js5CwA@mail.gmail.com>
References: <d43c6082-1b2c-c95b-5144-99ad0021ea6c@mattcorallo.com>
<CAAS2fgRF-MhOvpFY6c_qAPzNMo3GQ28RExdSbOV6Q6Oy2iWn1A@mail.gmail.com>
<22d375c7-a032-8691-98dc-0e6ee87a4b08@mattcorallo.com>
<CAAS2fgR3QRHeHEjjOS1ckEkL-h7=Na56G12hYW9Bmy9WEMduvg@mail.gmail.com>
<CADZtCShLmH_k-UssNWahUNHgHvWQQ1y638LwaOfnJEipwjbiYg@mail.gmail.com>
<CAAS2fgQLCN_cuZ-3QPjCLfYOtHfEk=SenTn5=y9LfGzJxLPR3Q@mail.gmail.com>
<CADZtCSjYr6VMBVQ=rx44SgRWcFSXhVXUZJB=rHMh4X78Z2eY1A@mail.gmail.com>
<CAO3Pvs9K3n=OzVQ06XGQvzNC+Aqp9S60kWM9VRPA8hWTJ3u9BQ@mail.gmail.com>
<c23a5346-9f99-44f0-abbf-d7e7979bf1d8@gmail.com>
<CAO3Pvs_MA4TtgCCu1NgCBjK2bZRN+rKnGQJN6m4yTrViBXRiPA@mail.gmail.com>
<CAD3i26BibcaMdbQv-j+Egz_1y0GuhzepBp5ATNpj=Qv8hi1TVA@mail.gmail.com>
<CADZtCShAYpbN=4qNoX5c8yd1j08+mEZzG8gZwcHrj2suY0mb9w@mail.gmail.com>
<CADZtCShYnM3A949H18V2+BArA-K9J+cDkd=rX8xRn0+0js5CwA@mail.gmail.com>
From: =?UTF-8?Q?Johan_Tor=C3=A5s_Halseth?= <johanth@gmail.com>
Date: Wed, 23 May 2018 10:16:41 +0200
Message-ID: <CAD3i26BdyZcWL5UKmk5KJtMtDjePfqs+EH1ZD6HPLZfwYxrfNA@mail.gmail.com>
To: Jim Posen <jim.posen@gmail.com>
Content-Type: multipart/alternative; boundary="0000000000009ea4dd056cdb2725"
X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,
DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, HTML_MESSAGE,
RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
X-Mailman-Approved-At: Wed, 23 May 2018 12:21:32 +0000
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Wed, 23 May 2018 08:16:44 -0000
--0000000000009ea4dd056cdb2725
Content-Type: text/plain; charset="UTF-8"
Thanks, Jimpo!
This is very encouraging, I think. I sorta assumed that separating the
elements into their own sub-filters would hurt the compression a lot more.
Can the compression ratio/false positive rate be tweaked with the
sub-filters in mind?
With the total size of the separated filters being no larger than the
combined filters, I see no benefit of combined filters? Committing to them
all in the headers would also save space, and we could ensure nodes are
serving all sub-filters.
- Johan
On Wed, May 23, 2018 at 9:38 AM, Jim Posen <jim.posen@gmail.com> wrote:
> So I checked filter sizes (as a proportion of block size) for each of the
> sub-filters. The graph is attached.
>
> As interpretation, the first ~120,000 blocks are so small that the
> Golomb-Rice coding can't compress the filters that well, which is why the
> filter sizes are so high proportional to the block size. Except for the
> input filter, because the coinbase input is skipped, so many of them have 0
> elements. But after block 120,000 or so, the filter compression converges
> pretty quickly to near the optimal value. The encouraging thing here is
> that if you look at the ratio of the combined size of the separated filters
> vs the size of a filter containing all of them (currently known as the
> basic filter), they are pretty much the same size. The mean of the ratio
> between them after block 150,000 is 99.4%. So basically, not much
> compression efficiently is lost by separating the basic filter into
> sub-filters.
>
> On Tue, May 22, 2018 at 5:42 PM, Jim Posen <jim.posen@gmail.com> wrote:
>
>> My suggestion was to advertise a bitfield for each filter type the node
>>> serves,
>>> where the bitfield indicates what elements are part of the filters. This
>>> essentially
>>> removes the notion of decided filter types and instead leaves the
>>> decision to
>>> full-nodes.
>>>
>>
>> I think it makes more sense to construct entirely separate filters for
>> the different types of elements and allow clients to download only the ones
>> they care about. If there are enough elements per filter, the compression
>> ratio shouldn't be much worse by splitting them up. This prevents the
>> exponential blowup in the number of filters that you mention, Johan, and it
>> works nicely with service bits for advertising different filter types
>> independently.
>>
>> So if we created three separate filter types, one for output scripts, one
>> for input outpoints, and one for TXIDs, each signaled with a separate
>> service bit, are people good with that? Or do you think there shouldn't be
>> a TXID filter at all, Matt? I didn't include the option of a prev output
>> script filter or rolling that into the block output script filter because
>> it changes the security model (cannot be proven to be correct/incorrect
>> succinctly).
>>
>> Then there's the question of whether to separate or combine the headers.
>> I'd lean towards keeping them separate because it's simpler that way.
>>
>
>
--0000000000009ea4dd056cdb2725
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">Thanks, Jimpo!<div><br></div><div>This is very encouraging=
, I think. I sorta assumed that separating the elements into their own sub-=
filters would hurt the compression a lot more. Can the compression ratio/fa=
lse positive rate be tweaked with the sub-filters in mind?</div><div><br></=
div><div>With the total size of the separated filters being no larger than =
the combined filters, I see no benefit of combined filters? Committing to t=
hem all in the headers would also save space, and we could ensure nodes are=
serving all sub-filters.</div><div><br></div><div>- Johan</div></div><div =
class=3D"gmail_extra"><br><div class=3D"gmail_quote">On Wed, May 23, 2018 a=
t 9:38 AM, Jim Posen <span dir=3D"ltr"><<a href=3D"mailto:jim.posen@gmai=
l.com" target=3D"_blank">jim.posen@gmail.com</a>></span> wrote:<br><bloc=
kquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #cc=
c solid;padding-left:1ex"><div dir=3D"ltr">So I checked filter sizes (as a =
proportion of block size) for each of the sub-filters. The graph is attache=
d.<div><br></div><div>As interpretation, the first ~120,000 blocks are so s=
mall that the Golomb-Rice coding can't compress the filters that well, =
which is why the filter sizes are so high proportional to the block size. E=
xcept for the input filter, because the coinbase input is skipped, so many =
of them have 0 elements. But after block 120,000 or so, the filter compress=
ion converges pretty quickly to near the optimal value. The encouraging thi=
ng here is that if you look at the ratio of the combined size of the separa=
ted filters vs the size of a filter containing all of them (currently known=
as the basic filter), they are pretty much the same size. The mean of the =
ratio between them after block 150,000 is 99.4%. So basically, not much com=
pression efficiently is lost by separating the basic filter into sub-filter=
s.</div></div><div class=3D"HOEnZb"><div class=3D"h5"><div class=3D"gmail_e=
xtra"><br><div class=3D"gmail_quote">On Tue, May 22, 2018 at 5:42 PM, Jim P=
osen <span dir=3D"ltr"><<a href=3D"mailto:jim.posen@gmail.com" target=3D=
"_blank">jim.posen@gmail.com</a>></span> wrote:<br><blockquote class=3D"=
gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-=
left:1ex"><div dir=3D"ltr"><div class=3D"gmail_extra"><div class=3D"gmail_q=
uote"><span><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;bo=
rder-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr"><div>My suggest=
ion was to advertise a bitfield for each filter type the node serves,<br></=
div><div>where the bitfield indicates what elements are part of the filters=
. This essentially</div><div>removes the notion of decided filter types and=
instead leaves the decision to=C2=A0</div><div>full-nodes.</div></div></bl=
ockquote><div><br></div></span><div>I think it makes more sense to construc=
t entirely separate filters for the different types of elements and allow c=
lients to download only the ones they care about. If there are enough eleme=
nts per filter, the compression ratio shouldn't be much worse by splitt=
ing them up. This prevents the exponential blowup in the number of filters =
that you mention, Johan, and it works nicely with service bits for advertis=
ing different filter types independently.</div><div><br></div><div>So if we=
created three separate filter types, one for output scripts, one for input=
outpoints, and one for TXIDs, each signaled with a separate service bit, a=
re people good with that? Or do you think there shouldn't be a TXID fil=
ter at all, Matt? I didn't include the option of a prev output script f=
ilter or rolling that into the block output script filter because it change=
s the security model (cannot be proven to be correct/incorrect succinctly).=
</div><div><br></div><div>Then there's the question of whether to separ=
ate or combine the headers. I'd lean towards keeping them separate beca=
use it's simpler that way.<br></div></div></div></div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>
--0000000000009ea4dd056cdb2725--
|