summaryrefslogtreecommitdiff
path: root/5b/57e85e13f9277046eaf444805f0599bc5fd47e
blob: cd06cf6341d740135610bb220efb5bece81ea114 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
Return-Path: <laolu32@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 8548D596
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 19 May 2018 03:08:44 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-wm0-f41.google.com (mail-wm0-f41.google.com [74.125.82.41])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 733DDB0
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Sat, 19 May 2018 03:08:43 +0000 (UTC)
Received: by mail-wm0-f41.google.com with SMTP id n10-v6so18168439wmc.1
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 18 May 2018 20:08:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to; 
	bh=n9sm6DXfn0NS4Q1aG08hQ1vYuTMoKI/vUqTwjTobsj0=;
	b=fMMnsWBXWJ9CwFznoMc+Jj+xic+vUh2ReWdvNVlGO1t2Uq/CTrN3xEre3ddatWUz+8
	j6tna+7YyKlyiKHh6+5azJgUqLeW8jG2bZjezBQUsrbhvSFcEeNT9zXiU7Xkfxb74SBq
	wdakYWitxuLndU7DwCpMhHSVEjbZG3LgzJt1SgfKYPy6MF0ZTuHZCfb0kzJwi+mjG9kz
	K4i8T5E9/Kzi+t0KR3g53hNFI2tKRGPkH6s5PSQh9D0kA4EFmppJxwAfPb7Yge7YNbTg
	4dXR7zVwp+PyYg1hsqp+ew5kgLtisf1Jsg8mHC0qTnddY8vPqTfoEsurDgz6DzMKVvuY
	mxJQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:references:in-reply-to:from:date
	:message-id:subject:to;
	bh=n9sm6DXfn0NS4Q1aG08hQ1vYuTMoKI/vUqTwjTobsj0=;
	b=KIRixiBY6UAGzc1j0Q4MiCfByFyFgV09aokAWKD4rR86WFMYfRoCdGf/Mfp/xFsCML
	rGoKElREgWmQgLu9G/RTSb8B9ang4oSG0qlg49Tieil/2bxZ8j8AQIT6XpqqmWqSFd4E
	x68Ms6x0DKP9NhxAyYpMHC3jTthqG90JcZqlVY5SXpku0hxEeiqunuThnabiJZWbNrXI
	A794FkeTyNt56/K6KJeErOL+7MSo/C/Sxc05Z8vbocjAPJsfVrXnQSgpFZvgiIYJvorS
	KLsNIrB+lwgZUCJs/uFACw8mVXSVNOc5AGEeKQMAgVQQUZHtWZPZTTsRJqd5V1OqWAZY
	sFkg==
X-Gm-Message-State: ALKqPwcHmszEQjHulIq+UACXeFlvX2G23kC/7jG9Wvo8RF0Hvvf6qghT
	bp1Q9HdH/ZLm43Ax/fDxrgED5mxicczG+agZJ8s=
X-Google-Smtp-Source: AB8JxZoKflsCfuzwg7jtsiQH2sxF8hdsV4nHAJibuasDovW3RLq/OhYStMCaocdXziNyuwyjAx1LhsDrPkH28AkjlLc=
X-Received: by 2002:a50:a390:: with SMTP id
	s16-v6mr14471512edb.271.1526699322051; 
	Fri, 18 May 2018 20:08:42 -0700 (PDT)
MIME-Version: 1.0
References: <d43c6082-1b2c-c95b-5144-99ad0021ea6c@mattcorallo.com>
	<CAAS2fgRF-MhOvpFY6c_qAPzNMo3GQ28RExdSbOV6Q6Oy2iWn1A@mail.gmail.com>
	<CADabwBCKe6DaiBf_sjz9zyirkw8BdsDZnWSLEiAABEZvVDwj-Q@mail.gmail.com>
In-Reply-To: <CADabwBCKe6DaiBf_sjz9zyirkw8BdsDZnWSLEiAABEZvVDwj-Q@mail.gmail.com>
From: Olaoluwa Osuntokun <laolu32@gmail.com>
Date: Fri, 18 May 2018 20:08:29 -0700
Message-ID: <CAO3Pvs_Ca6FKWw32hDSnuOGWLHAikNrdeopgS6L-FdXT6jn-AA@mail.gmail.com>
To: Riccardo Casatta <riccardo.casatta@gmail.com>, 
	Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Content-Type: multipart/alternative; boundary="000000000000c50ee8056c866263"
X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,
	HTML_MESSAGE,RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Subject: Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sat, 19 May 2018 03:08:44 -0000

--000000000000c50ee8056c866263
Content-Type: text/plain; charset="UTF-8"

Riccardo wrote:
> The BIP recall some go code for how the parameter has been selected which
> I can hardly understand and run

The code you're linking to is for generating test vectors (to allow
implementations to check the correctness of their gcs filters. The name of
the file is 'gentestvectors.go'. It produces CSV files which contain test
vectors of various testnet blocks and at various false positive rates.

> it's totally my fault but if possible I would really like more details on
> the process, like charts and explanations

When we published the BIP draft last year (wow, time flies!), we put up code
(as well as an interactive website) showing the process we used to arrive at
the current false positive rate. The aim was to minimize the bandwidth
required to download each filter plus the expected bandwidth from
downloading "large-ish" full segwit blocks. The code simulated a few wallet
types (in terms of number of addrs, etc) focusing on a "mid-sized" wallet.
One could also model the selection as a Bernoulli process where we attempt
to compute the probability that after k queries (let's say you have k
addresses) we have k "successes". A success would mean the queries item
wasn't found in the filter, while a failure is a filter match (false
positive or not). A failure in the process requires fetching the entire
block.

-- Laolu

On Fri, May 18, 2018 at 5:35 AM Riccardo Casatta via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Another parameter which heavily affects filter size is the false positive
> rate which is empirically set
> <https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki#construction>
> to 2^-20
> The BIP recall some go code
> <https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go>
> for how the parameter has been selected which I can hardly understand and
> run, it's totally my fault but if possible I would really like more details
> on the process, like charts and explanations (for example, which is the
> number of elements to search for which the filter has been optimized for?)
>
> Instinctively I feel 2^-20 is super low and choosing a lot higher alpha
> will shrink the total filter size by gigabytes at the cost of having to
> wastefully download just some megabytes of blocks.
>
>
> 2018-05-17 18:36 GMT+02:00 Gregory Maxwell via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org>:
>
>> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
>> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>> > I believe (1) could be skipped entirely - there is almost no reason why
>> > you'd not be able to filter for, eg, the set of output scripts in a
>> > transaction you know about
>>
>> I think this is convincing for the txids themselves.
>>
>> What about also making input prevouts filter based on the scriptpubkey
>> being _spent_?  Layering wise in the processing it's a bit ugly, but
>> if you validated the block you have the data needed.
>>
>> This would eliminate the multiple data type mixing entirely.
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>
>
>
> --
> Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

--000000000000c50ee8056c866263
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>Riccardo wrote:</div><div>&gt; The BIP recall some go=
 code for how the parameter has been selected which</div><div>&gt; I can ha=
rdly understand and run</div><div><br></div><div>The code you&#39;re linkin=
g to is for generating test vectors (to allow</div><div>implementations to =
check the correctness of their gcs filters. The name of</div><div>the file =
is &#39;gentestvectors.go&#39;. It produces CSV files which contain test</d=
iv><div>vectors of various testnet blocks and at various false positive rat=
es.</div><div><br></div><div>&gt; it&#39;s totally my fault but if possible=
 I would really like more details on</div><div>&gt; the process, like chart=
s and explanations</div><div><br></div><div>When we published the BIP draft=
 last year (wow, time flies!), we put up code</div><div>(as well as an inte=
ractive website) showing the process we used to arrive at</div><div>the cur=
rent false positive rate. The aim was to minimize the bandwidth</div><div>r=
equired to download each filter plus the expected bandwidth from</div><div>=
downloading &quot;large-ish&quot; full segwit blocks. The code simulated a =
few wallet</div><div>types (in terms of number of addrs, etc) focusing on a=
 &quot;mid-sized&quot; wallet.</div><div>One could also model the selection=
 as a Bernoulli process where we attempt</div><div>to compute the probabili=
ty that after k queries (let&#39;s say you have k</div><div>addresses) we h=
ave k &quot;successes&quot;. A success would mean the queries item</div><di=
v>wasn&#39;t found in the filter, while a failure is a filter match (false<=
/div><div>positive or not). A failure in the process requires fetching the =
entire</div><div>block.</div><div><br></div><div>-- Laolu</div><br><div cla=
ss=3D"gmail_quote"><div dir=3D"ltr">On Fri, May 18, 2018 at 5:35 AM Riccard=
o Casatta via bitcoin-dev &lt;<a href=3D"mailto:bitcoin-dev@lists.linuxfoun=
dation.org">bitcoin-dev@lists.linuxfoundation.org</a>&gt; wrote:<br></div><=
blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px=
 #ccc solid;padding-left:1ex"><div dir=3D"ltr">Another parameter which heav=
ily affects filter size is the false positive rate which is <a href=3D"http=
s://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki#construction" ta=
rget=3D"_blank">empirically set</a> to 2^-20=C2=A0<div>The BIP recall some =
<a href=3D"https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe9=
36dd0c9b99ecb9/gcs_light_client/gentestvectors.go" target=3D"_blank">go cod=
e</a> for how the parameter has been selected which I can hardly understand=
 and run, it&#39;s totally my fault but if possible I would really like mor=
e details on the process, like charts and explanations (for example, which =
is the number of elements to search for which the filter has been optimized=
 for?)</div><div><br></div><div>Instinctively I feel 2^-20 is super low and=
 choosing a lot higher alpha will shrink the total filter size by gigabytes=
 at the cost of having to wastefully download just some megabytes of blocks=
.</div><div><br></div></div><div class=3D"gmail_extra"></div><div class=3D"=
gmail_extra"><br><div class=3D"gmail_quote">2018-05-17 18:36 GMT+02:00 Greg=
ory Maxwell via bitcoin-dev <span dir=3D"ltr">&lt;<a href=3D"mailto:bitcoin=
-dev@lists.linuxfoundation.org" target=3D"_blank">bitcoin-dev@lists.linuxfo=
undation.org</a>&gt;</span>:<br><blockquote class=3D"gmail_quote" style=3D"=
margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Thu=
, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev<br>
&lt;<a href=3D"mailto:bitcoin-dev@lists.linuxfoundation.org" target=3D"_bla=
nk">bitcoin-dev@lists.linuxfoundation.org</a>&gt; wrote:<br>
&gt; I believe (1) could be skipped entirely - there is almost no reason wh=
y<br>
&gt; you&#39;d not be able to filter for, eg, the set of output scripts in =
a<br>
&gt; transaction you know about<br>
<br>
</span>I think this is convincing for the txids themselves.<br>
<br>
What about also making input prevouts filter based on the scriptpubkey<br>
being _spent_?=C2=A0 Layering wise in the processing it&#39;s a bit ugly, b=
ut<br>
if you validated the block you have the data needed.<br>
<br>
This would eliminate the multiple data type mixing entirely.<br>
<div class=3D"m_-3892677285626005673HOEnZb"><div class=3D"m_-38926772856260=
05673h5">_______________________________________________<br>
bitcoin-dev mailing list<br>
<a href=3D"mailto:bitcoin-dev@lists.linuxfoundation.org" target=3D"_blank">=
bitcoin-dev@lists.linuxfoundation.org</a><br>
<a href=3D"https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev" =
rel=3D"noreferrer" target=3D"_blank">https://lists.linuxfoundation.org/mail=
man/listinfo/bitcoin-dev</a><br>
</div></div></blockquote></div><br><br clear=3D"all"><div><br></div></div><=
div class=3D"gmail_extra">-- <br><div class=3D"m_-3892677285626005673gmail_=
signature" data-smartmail=3D"gmail_signature"><div dir=3D"ltr">Riccardo Cas=
atta - <a href=3D"https://twitter.com/RCasatta" target=3D"_blank">@RCasatta=
</a></div></div>
</div>
_______________________________________________<br>
bitcoin-dev mailing list<br>
<a href=3D"mailto:bitcoin-dev@lists.linuxfoundation.org" target=3D"_blank">=
bitcoin-dev@lists.linuxfoundation.org</a><br>
<a href=3D"https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev" =
rel=3D"noreferrer" target=3D"_blank">https://lists.linuxfoundation.org/mail=
man/listinfo/bitcoin-dev</a><br>
</blockquote></div></div>

--000000000000c50ee8056c866263--