Return-Path: <laolu32@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id A30CDD7F
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Tue, 22 May 2018 01:15:36 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-wm0-f52.google.com (mail-wm0-f52.google.com [74.125.82.52])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id AAB61F3
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Tue, 22 May 2018 01:15:35 +0000 (UTC)
Received: by mail-wm0-f52.google.com with SMTP id n10-v6so29556035wmc.1
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 21 May 2018 18:15:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to
	:cc; bh=NgDqljMeKd0G5cLv7zLgUSJMCkf02t/Ye2V8Nt+CB1o=;
	b=m1h+202UfVE715qXQWBmKmgQqwib6pBx3T0R4EByqknZhH0m6avnNoIwXaWEYcZOCm
	+TX1WehPrn3BH5emIdW/nCmqPoOj03G6tRbPrDrxK0Y9faSsHhsHrun+atrT1eOoeU14
	CDlYzJ09wDXsOoWSVVL5CKRro/SzH8+PiBE3hDvfBWCCcH/gdao/DIEDJpJKJUWGHQlK
	Lq58rczcX/REf9IkqZg7U12iafn1llayr+/7YgfeoWpoWXtwUomVHwljZihij5qUv9Ic
	tHiUbmbJHY7/at/Aq7KygzxTmqnwyskk8/0sAo7ozR4eVvrpNH1MKmX7rLmNl54KzTFW
	yWlA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:references:in-reply-to:from:date
	:message-id:subject:to:cc;
	bh=NgDqljMeKd0G5cLv7zLgUSJMCkf02t/Ye2V8Nt+CB1o=;
	b=FDkFXzjLQtnC9kk908JAKS1GsLbobEwMxcrkQ0VlVAl8+dM180QscDX1PQuvOh+Y37
	bQAMnx0cH4ncqwgARTe//shmOTZnSqjel2zucSOu6LjkZO2cOJjv9/df2y0w6ten5+1G
	g1oN+yCHF97dyU0fRSBDXV4wEu67/vGvlTaRRbTC4CWwgkWcAS+AJQeKzqss5JX6ynY6
	1IMwP8BWg/R/Bp0bNJ6AzabQeFarEgObMhczunZuBh48ppBg95BZtCd3L8VrRPD9XCSh
	sy7a8lDCPI0zjM1cfJI/V42in0ByGuuwgWFYg9e29t30CTg6R3wDdGz3gmSbbYX4O93b
	M+Gg==
X-Gm-Message-State: ALKqPwctA0Di8eh9FHyBx5YxedtzslqxaC7TN8o5n4gCymxhHwqRE417
	/KZAlsbSE/Fv0yS8XQPSuz3PtPSQhDdwOzglw50=
X-Google-Smtp-Source: AB8JxZrUJvL0M04s1bMjYlTSD7ida7c2YyJcRE6TRFd/99izRO0iJaYrZxWyNH1QZikiQf1Hxq4ERmmgWpXU0NIQq/I=
X-Received: by 2002:a50:81e3:: with SMTP id
	90-v6mr26494528ede.252.1526951734174; 
	Mon, 21 May 2018 18:15:34 -0700 (PDT)
MIME-Version: 1.0
References: <d43c6082-1b2c-c95b-5144-99ad0021ea6c@mattcorallo.com>
	<CAAS2fgRF-MhOvpFY6c_qAPzNMo3GQ28RExdSbOV6Q6Oy2iWn1A@mail.gmail.com>
	<CAO3Pvs8DaphZjZUp8_Og+SMmYrrgFi3HyWTZb5J1mGVEcmkn8A@mail.gmail.com>
	<CAPg+sBhL8ZV+kswgyQfQyhd0Qv5Mkt1cYxrfFV4H32s9QYLo0A@mail.gmail.com>
In-Reply-To: <CAPg+sBhL8ZV+kswgyQfQyhd0Qv5Mkt1cYxrfFV4H32s9QYLo0A@mail.gmail.com>
From: Olaoluwa Osuntokun <laolu32@gmail.com>
Date: Mon, 21 May 2018 18:15:22 -0700
Message-ID: <CAO3Pvs_fSWN5mQeJzD302LuGyy-VFrNbv9r8JVPJ=v3-_c4kCQ@mail.gmail.com>
To: Pieter Wuille <pieter.wuille@gmail.com>
Content-Type: multipart/alternative; boundary="000000000000b4674c056cc12700"
X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,
	HTML_MESSAGE,RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Tue, 22 May 2018 01:15:36 -0000

--000000000000b4674c056cc12700
Content-Type: text/plain; charset="UTF-8"

Hi Y'all,

The script finished a few days ago with the following results:

reg-filter-prev-script total size:  161236078  bytes
reg-filter-prev-script avg:         16123.6078 bytes
reg-filter-prev-script median:      16584      bytes
reg-filter-prev-script max:         59480      bytes

Compared to the median for the same block range under the current filter
(txid, prev outpoint, and output scripts), which is 22258 bytes, the new
median is about 25% smaller (put the other way, the current filter is roughly
34% larger). Compared to the suggested modified filter (no txid; prev
outpoint and output scripts only), whose median was 19198 bytes, it is about
14% smaller (the modified filter is roughly 16% larger). This shows that
script re-use is still pretty prevalent in the chain as of late.
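
For anyone who wants to double check the percentages, here is a small Python
sketch that derives them from the three medians quoted above. The constant
and function names are my own and are not taken from the measurement script.
It prints each comparison both ways, since the figure depends on which median
is used as the base.

```python
# Median filter sizes in bytes, as quoted above.
MEDIAN_CURRENT = 22258      # txid + prev outpoint + output scripts
MEDIAN_NO_TXID = 19198      # prev outpoint + output scripts
MEDIAN_PREV_SCRIPT = 16584  # prev output script + output scripts


def reduction_pct(old: int, new: int) -> float:
    """Size saved going from old to new, as a percentage of old."""
    return (old - new) / old * 100


def overhead_pct(old: int, new: int) -> float:
    """How much larger old is than new, as a percentage of new."""
    return (old - new) / new * 100


for label, old in (("current", MEDIAN_CURRENT), ("no-txid", MEDIAN_NO_TXID)):
    print(f"{label:8s} -> prev-script: "
          f"{reduction_pct(old, MEDIAN_PREV_SCRIPT):.1f}% smaller, "
          f"or {label} is {overhead_pct(old, MEDIAN_PREV_SCRIPT):.1f}% larger")
```

Relative to the new filter the gaps come out at about 34% and 16%; relative
to the old filters the savings are about 25% and 14%.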

One thing that occurred to me is that, at the application level, switching
to the input's prev output script can make things a bit awkward. Observe that
when looking for matches in the filter, upon a match, one would need access
to an additional (outpoint -> script) map in order to locate _which_
particular transaction matched, absent access to an up-to-date UTXO set. In
contrast, as things stand, one can locate the matching transaction with no
additional information (as we're matching on the outpoint).
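
To make the awkwardness concrete, here is a rough Python sketch of what a
client would do after a filter match under each scheme. All of the types and
helpers (Tx, TxIn, find_spends_by_outpoint, find_spends_by_script,
prev_scripts) are hypothetical and only illustrate the extra lookup; none of
them come from an existing implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

OutPoint = Tuple[str, int]  # (txid hex, output index)


@dataclass
class TxIn:
    prev_out: OutPoint


@dataclass
class Tx:
    txid: str
    inputs: List[TxIn] = field(default_factory=list)


def find_spends_by_outpoint(block: List[Tx],
                            watched: Set[OutPoint]) -> List[Tx]:
    """Current scheme: the watched outpoint appears in the block itself."""
    return [tx for tx in block
            if any(i.prev_out in watched for i in tx.inputs)]


def find_spends_by_script(block: List[Tx], watched_scripts: Set[bytes],
                          prev_scripts: Dict[OutPoint, bytes]) -> List[Tx]:
    """Prev-script scheme: inputs carry no script, so each outpoint must
    first be resolved through an external (outpoint -> script) map."""
    return [tx for tx in block
            if any(prev_scripts.get(i.prev_out) in watched_scripts
                   for i in tx.inputs)]


# Toy block: "bb" spends the output we care about.
block = [Tx("aa", [TxIn(("f0", 1))]), Tx("bb", [TxIn(("f1", 0))])]
by_outpoint = find_spends_by_outpoint(block, {("f1", 0)})
by_script = find_spends_by_script(block, {b"\x51"}, {("f1", 0): b"\x51"})
print([tx.txid for tx in by_outpoint], [tx.txid for tx in by_script])
```

Both calls find the same transaction, but the second only works if the
caller already holds the map entry for the spent outpoint.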

At this point, if we feel filter sizes need to drop further, then we may
need to consider raising the false positive rate.
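
For a rough feel of what raising the false positive rate would buy, here is a
back-of-the-envelope Python sketch. It rests on several assumptions of mine:
that the filter is a Golomb-Rice coded set with parameter P (fp rate 2^-P),
that gaps between sorted hashed items are approximately exponential with mean
2^P, that the measured filters used P = 20, and that the small item-count
prefix can be ignored. The function names are made up. Under those
assumptions each item costs about P + 1 + 1/(e - 1) bits.

```python
import math


def approx_bits_per_item(p: int) -> float:
    """Approximate Golomb-Rice cost per item when M = 2**p.

    p remainder bits, 1 unary terminator bit, plus the expected
    quotient 1/(e - 1) for exponentially distributed gaps.
    """
    return p + 1 + 1 / (math.e - 1)


def rescale_size(size_bytes: float, p_old: int, p_new: int) -> float:
    """Estimate the size of the same item set re-encoded with p_new."""
    return size_bytes * approx_bits_per_item(p_new) / approx_bits_per_item(p_old)


MEDIAN_PREV_SCRIPT = 16584  # bytes, from the results above
for p in (20, 19, 18, 16):
    est = rescale_size(MEDIAN_PREV_SCRIPT, 20, p)
    print(f"P={p}: ~{est:.0f} bytes, fp rate 1/{2**p}")
```

If the approximation holds, each step down in P doubles the false positive
rate while saving only about one bit per item, roughly 4.6% of the filter at
P = 20.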

Does anyone have any estimates or direct measurements of how much bandwidth
current BIP 37 light clients consume? It would be nice to have a direct
comparison. We'd need to consider the size of their base bloom filter, the
accumulated bandwidth as a result of repeated filterload commands (to adjust
the fp rate), and also the overhead of receiving the merkle branch and
transactions in distinct messages (both due to matches and false positives).

Finally, I'd be open to removing the current "extended" filter from the BIP
altogether for now. If a compelling use case for being able to filter the
sigScript/witness arises, then we can examine re-adding it with a distinct
service bit. After all, it would be harder to phase out the filter once wider
deployment had already been reached. Similarly, if the savings achieved by
removing the txid (median of 19198 vs. 22258 bytes, roughly 14%) are
attractive, then we can create an additional filter just for the txids to
allow those applications which need the information to seek out that extra
filter.

-- Laolu


On Fri, May 18, 2018 at 8:06 PM Pieter Wuille <pieter.wuille@gmail.com>
wrote:

> On Fri, May 18, 2018, 19:57 Olaoluwa Osuntokun via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> Greg wrote:
>> > What about also making input prevouts filter based on the scriptpubkey
>> > being _spent_?  Layering wise in the processing it's a bit ugly, but if
>> > you validated the block you have the data needed.
>>
>> AFAICT, this would mean that in order for a new node to catch up the
>> filter index (index all historical blocks), it'd either need to build up
>> a utxo-set in memory during indexing, or would require a txindex in order
>> to look up the prev out's script. The first option increases the memory
>> load during indexing, and the second requires nodes to have a transaction
>> index (and would also add considerable I/O load). When proceeding from
>> tip, this doesn't add any additional load assuming that you synchronously
>> index the block as you validate it; otherwise the utxo set will already
>> have been updated (the spent scripts removed).
>>
>
> I was wondering about that too, but it turns out that isn't necessary. At
> least in Bitcoin Core, all the data needed for such a filter is in the
> block + undo files (the latter contain the scriptPubKeys of the outputs
> being spent).
>
>> I have a script running to compare the filter sizes assuming the regular
>> filter switches to include the prev out's script rather than the prev
>> outpoint itself. The script hasn't yet finished (due to the increased I/O
>> load to look up the scripts when indexing), but I'll report back once it's
>> finished.
>>
>
> That's very helpful, thank you.
>
> Cheers,
>
> --
> Pieter
>
>


--000000000000b4674c056cc12700--