summaryrefslogtreecommitdiff
path: root/3b/2d6fe58c75fca4e8b5062c6a304c20dd364b09
blob: 25f35b6aac53b01eced58988d46db39665274aa3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
Return-Path: <belcher@riseup.net>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id A41ACBC6
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 17 Feb 2017 00:29:06 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mx1.riseup.net (mx1.riseup.net [198.252.153.129])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 2FEAFCD
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 17 Feb 2017 00:29:05 +0000 (UTC)
Received: from cotinga.riseup.net (unknown [10.0.1.164])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "*.riseup.net",
	Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK))
	by mx1.riseup.net (Postfix) with ESMTPS id 3798C1A03C2
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 17 Feb 2017 00:29:04 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=riseup.net; s=squak;
	t=1487291344; bh=S6zHFx54WGY2N735rFILmxGeP5OBeR9s+Q/Cbc27KHo=;
	h=Subject:To:References:From:Date:In-Reply-To:From;
	b=aFO85NsRijEATJnSa6kmnsmCgKUDYw8JmugsGGDGrpxYoYbpnT1bcgCQO4N89Bv8f
	re5eGOgtD11xzed5YZfFqmYgRKUmbawHNnrWjQZRpSquORUeofqUfwJsAORXV5+v17
	Nyw7xowaG9i+b1oZhwsWEuu+HYwQJVhTXVRj7RHA=
Received: from [127.0.0.1] (localhost [127.0.0.1])
	(Authenticated sender: belcher) with ESMTPSA id 85D6E418E0
To: bitcoin-dev@lists.linuxfoundation.org
References: <71d822e413ac457a530e1c367811cc24@cock.lu>
From: Chris Belcher <belcher@riseup.net>
Message-ID: <f8511114-4bcd-32a0-f654-414a723781fa@riseup.net>
Date: Fri, 17 Feb 2017 00:28:59 +0000
MIME-Version: 1.0
In-Reply-To: <71d822e413ac457a530e1c367811cc24@cock.lu>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, RCVD_IN_DNSWL_LOW, RP_MATCHES_RCVD,
	UNPARSEABLE_RELAY autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Subject: Re: [bitcoin-dev] Committed bloom filters for improved wallet
 performance and SPV security
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Feb 2017 00:29:06 -0000

I believe this proposal still suffers from one problem that bip37 did,
albiet by a much lesser extent. Combining the partial information from
the block downloads with the transaction subgraph information from the
blockchain can in some cases still reveal which addresses belong to the
wallet. Nonetheless this proposal still has many benefits and is well
worth working on.

==BIP37==

As a recap, probably the biggest and most problematic way that bip37 was
broken was by combining the partial wallet information from the bloom
filter with the transaction subgraph information from the blockchain

Suppose a wallet synchronizes it's history, if it spent a coin from its
address A, it must also also add the change address B to the bloom
filter, which is connected to A directly on transaction graph.

As an example, consider five typical transactions that consume one input
each and produce two outputs.
A, B, C, D, E refer to transactions. A1, A2, etc refer to addresses
within those transactions

          -> C1
A1 -> B2  -> C2
   -> B2  -> D1
          -> D2 -> E1
                -> E2

If a bip37 bloom filter matches addresses A1, B2, D2, E1 then it can be
seen that they form a "peel chain" [this terminology comes from
https://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf]

          -> X
A1 -> X   -> X
   -> B2  -> X
          -> D2 -> E1
                -> X

The same five transactions with non-matching addresses replaced by X.
The peel chain is visible, it's clear that B2, D2, E1 are change
addresses which belong to the same wallet as A1.

For a given false-positive rate fp and given length of peel chain C, the
odds of a false positive peel chain happening by chance is fp^C which
rapidly gets very small as the wallet makes more transactions (increases C).

If only one address was matched from the above group (for example B2)
then it likely to be a false positive by the fact that it doesn't make
any transactions to another address that also matches the bloom filter.
Another possibility is that the address is a payment output that the
wallet received but hasn't spent yet, but the wallet cant spend it
without adding the change address to the bloom filter and thus revealing
itself to the spy.

I believe the committed bloom filter proposal is vulnerability to this
same kind of attack because it still leaks information about which
addresses the wallet is interested in.

==Committed Bloom Filter Maths==

I'll try to analyze this now. I'll find the expectation value of the
number of transaction subgraphs in those blocks that appear just by
chance. If this expectation goes to zero, then the only transaction
subgraph left will be the real one that the wallet is actually
interested in. In that case it will be possible to spy on the wallet.

Assuming outputs have the same probability of being spent in each time
interval (i.e. they are spent in a Poisson process) This is
approximately true, see the graphs from
[https://letstalkbitcoin.com/blog/post/rise-of-the-zombie-bitcoins].
This means we can assign
a single probability P that an output is spent in each block.

Assume every transaction has one change address only and spending of
unconfirmed change doesn't happen (its more efficient to use RBF to add
additional outputs anyway)

Number of transactions per block = Q (about 1800 today)
Number of outputs per block = Z = 2*Q (approximately)
Length of peel chain = Number of transactions in wallet = C
Average time an output is unspent for = T (about 1 month, very roughly
estimating from the above blog post)
Probability an output being spent in any particular later block = P =
10minutes/T

Assume no false positive blocks
Say wallet downloaded two blocks and they are ordered by block height
The expected number of tx subgraphs between them, E(#G)
E(#G) = number of outputs created in block 1 that get spent in block 2
      = Z*P

Say the wallet downloaded three blocks
Expected number of subgraphs going through them all
E(#G) = number of outputs created in block 1 get spent in block 2, that
create a change address which gets spent in block 3
      = Z*P*P

Say the wallet downloaded C blocks
Expected number of tx subgraphs going through all the blocks by chance
E(#G) = Z*P^C
which gets small quickly as C goes up, because P < 1

Now drop the assumption about no false positive blocks.

Let the number of candidate blocks be D.
This is how many blocks the wallet scans, it's related to how far in the
past the wallet's keys was created. At one extreme wallet was created at
genesis block and so D = ~450000, at other extreme created now so D = 0.
Note that D = 0 must also imply C = 0

Expected number of false positive blocks downloaded = F = fp*D

In all these situations the blocks are sorted by block height

Suppose have C=2, F=1, and false one is in the middle.
I want to find E(#G|CF), the expected number of transaction subgraphs
that appear just by chance, given C and F.
E(#G|CF) = how many outputs which are created in block 1 get spent in
block 3
         = Z*P

Same situation, but false one at the start instead of middle.
E(#G|CF) = how many outputs which are created in block 2 get spent in
block 3
         = Z*P

Same situation but false one could be anywhere, result in the sum of the
probability for any false block position
E(#G|CF) = C(3, 1)*Z*P = 3*Z*P

where C() is the number of order-independent ways of choosing 1 element
out of a set of 3 elements, also known as the binomial coefficient

Now suppose C=3 and F=1
The same argument leads to
E(#G|CF) = C(4, 1)*Z*P^2 = 4*Z*P^2


Now suppose C=3 and F=2, with fp blocks at the end
E(#G|CF)
= how many outputs are created in block 1, are spent in block 2 and
change address spent in block 3
= Z*P^2

Same situation but fp blocks can be anywhere, add up all the possible
combinations of them within the rest
E(#G|CF) = C(5, 2)*Z*P^2 = 5*Z*P^2

With these same rules, its clear the general expression for any F and C
E(#G|CF) = C(F + C, F)*Z*P^(C - 1)

A more interesting value might be the time evolution of E(#G)
Let B be the blocks in the blockchain since the wallet creation date, as you
know it increases at an average rate of one every ten minutes

w = wallet transaction creation rate, expressed per-block
C = w * B
F = fp * B

J = average blocks between wallet transactions = 1440 (10 days)
w = 1/J

E(#G|B) = C((fp + w)*B, fp*B)*Z*P^(w*B - 1)

This goes to zero as B becomes big, although choosing very high values
of fp makes it go to zero slower.

This is only approximate maths, in actuality you cannot take the number
of false positive blocks to be fp*B, you have to sum over all blocks
weighted by probability. And outputs might not be spent in an exact
Poisson process so you cant just multiply by P each time. Plus if your
false positive rate is very high then some of your false positive blocks
will actually contain your real transactions, this analysis
double-counts them.

Using some reasonable values and plotting E(#G|B) against B can show how
quickly it drops and therefore leaves only the true transaction subgraph.

(note: in LibreOffice Math and Microsoft Excel the binomial coefficient
function is COMBIN)

==Notes==

*) The expected number of transaction subgraphs that happen by chance
goes to zero eventually as the blockchain steps ahead. Unless the fp
rate is very high (close to 1) and time between wallet transaction very
long, in which case the binomial coefficient term gets larger more
quicker than the exponential decay P^B term gets smaller.
*) fp rate doesn't help in most cases that much compared to the
exponential drop-off from time ticking ahead requiring more downloading
of blocks
*) its good for privacy if bitcoin outputs are spent more frequently so
P is higher, because that creates more transaction subgraphs in the
anonymity set.
*) its good for privacy if more outputs are made per block, although
still only linearly which is no match for the exponential reduction from
the P^B term.
*) its good for privacy to make less of your own transactions (increase
J and reduce w), for low-activity users the privacy of committed bloom
filters can be actually pretty good, for high-activity users who use
bitcoin's blockchain all the time it's not very good
*) For the reasonable values I tried for a once-a-month user with fp=1%,
their chance-transaction-subgraph-count drops below 1 in about eight months.
*) Because of the exponential nature, E(#G) goes from "billions of
billions" to "about 10" fairly quickly.

==Discussion of ways to mitigate this==

One way is to not use change outputs. This is unrealistic, doesn't match
people's behavior and money must be divisible.

A better way to mitigate this is to not leak the information that all
those blocks are interesting to the same wallet. Don't download all
blocks from the same archival node. If you download blocks from many
different nodes, it gives an incentive for surveillance startups to
create lots of sybil nodes they control and can then correlate together
block downloads with the wallet IP address. Many such startups are
already doing this today to try to detect the origin IP address of
broadcasted transactions.

Another solution could be to download a few blocks from different nodes
with new tor circuits used. This would delink the wallet IP address from
the downloads and would help a lot. This has the issue that tor is
slower (but still not as slow as downloading the entire blockchain)

Another way a wallet could be correlated with its block downloads is
timing correlations. At any one time only a certain number of peers
would be downloading blocks which narrows down which wallets are
downloading what. However even today Bitcoin Core downloads blocks in
parallel from many nodes so there's probably quite a large anonymity set
for lightweight wallets using committed bloom filters. Plus timing
correlation can be reduced simply by waiting longer. Wallets are not
sync'd from backup very often so it might be okay to wait.

Another way to improve privacy could be for the wallet to choose random
transaction subgraphs and download all the blocks related to them as well.

Wallet developers might choose to allow the user to configure their own
fp rate. This is probably not a good idea since the relationship between
fp rate and anonymity set is non-obvious. It might be better to ask the
user how often they expect to make transactions.

==Conclusion==

I think this committed bloom filter idea is very good and much better
than bip37, but for good privacy for when bitcoin is used often still
requires certain behavior namely downloading blocks
from many different peers with new tor circuits.

Note that I've been dealing with counting transaction subgraphs but
actually finding them from blocks might also be computationally
infeasible. Although a Bayesian approach worked very
well for similar transaction subgraph linking
[https://arxiv.org/pdf/1612.06747v3.pdf]

It would also be interesting to analyze what information a spy can get
if they are missing some blocks that the wallet downloaded.

For the long term, private and high-volume bitcoin use will be best
served by off-chain transactions. They will probably be a huge win just
because the large and public blockchain is such a non-private
way of doing things.

On 09/05/16 09:26, bfd--- via bitcoin-dev wrote:
> We introduce several concepts that rework the lightweight Bitcoin
> client model in a manner which is secure, efficient and privacy
> compatible.
> 
> Thea properties of BIP37 SPV [0] are unfortunately not as strong as
> originally thought:
> 
>     * The expected privacy of the probabilistic nature of bloom
>       filters does not exist [1][2], any user with a BIP37 SPV wallet
>       should be operating under no expectation of privacy.
>       Implementation flaws make this effect significantly worse, the
>       behavior meaning that no matter how high the false positive
>       rate (up to simply downloading the whole blocks verbatim) the
>       intent of the client connection is recoverable.
> 
>     * Significant processing load is placed on nodes in the Bitcoin
>       network by lightweight clients, a single syncing wallet causes
>       (at the time of writing) 80GB of disk reads and a large amount
>       of CPU time to be consumed processing this data. This carries
>       significant denial of service risk [3], non-distinguishable
>       clients can repeatedly request taxing blocks causing
>       reprocessing on every request. Processed data is unique to every
>       client, and can not be cached or made more efficient while
>       staying within specification.
> 
>     * Wallet clients can not have strong consistency or security
>       expectations, BIP37 merkle paths allow for a wallet to validate
>       that an output was spendable at some point in time but does not
>       prove that this output is not spent today.
> 
>     * Nodes in the network can denial of service attack all BIP37 SPV
>       wallet clients by simply returning null filter results for
>       requests, the wallet has no way of discerning if it has been
>       lied to and may be made simply unaware that any payment has been
>       made to them. Many nodes can be queried in a probabilistic manor
>       but this increases the already heavy network load with little
>       benefit.
> 
> 
> 
> We propose a new concept which can work towards addressing these
> shortcomings.
> 
> 
> A Bloom Filter Digest is deterministically created of every block
> encompassing the inputs and outputs of the containing transactions,
> the filter parameters being tuned such that the filter is a small
> portion of the size of the total block data. To determine if a block
> has contents which may be interesting a second bloom filter of all
> relevant key material is created. A binary comparison between the two
> filters returns true if there is probably matching transactions, and
> false if there is certainly no matching transactions. Any matched
> blocks can be downloaded in full and processed for transactions which
> may be relevant.
> 
> The BFD can be used verbatim in replacement of BIP37, where the filter
> can be cached between clients without needing to be recomputed. It can
> also be used by normal pruned nodes to do re-scans locally of their
> wallet without needing to have the block data available to scan, or
> without reading the entire block chain from disk.
> 
> -
> 
> For improved probabilistic security the bloom filters can be presented
> to lightweight clients by semi-trusted oracles. A client wallet makes
> an assumption that they trust a set, or subset of remote parties
> (wallet vendors, services) which all all sign the BFD for each block.
> The BFD can be downloaded from a single remote source, and the hash of
> the filters compared against others in the trust set. Agreement is a
> weak suggestion that the filter has not been tampered with, assuming
> that these parties are not conspiring to defraud the client.
> 
> The oracles do not learn any additional information about the client
> wallet, the client can download the block data from either nodes on
> the network, HTTP services, NTTP, or any other out of band
> communication method that provides the privacy desired by the client.
> 
> -
> 
> The security model of the oracle bloom filter can be vastly improved
> by instead committing a hash of the BFD inside every block as a soft-
> fork consensus rule change. After this, every node in the network would
> build the filter and validate that the hash in the block is correct,
> then make a conscious choice discard it for space savings or cache the
> data to disk.
> 
> With a commitment to the filter it becomes impossible to lie to
> lightweight clients by omission. Lightweight clients are provided with
> a block header, merkle path, and the BFD. Altering the BFD invalidates
> the merkle proof, it's validity is a strong indicator that the client
> has an unadulterated picture of the UTXO condition without needing to
> build one itself. A strong assurance that the hash of the BFD means
> that the filters can be downloaded out of band along with the block
> data at the leisure of the client, allowing for significantly greater
> privacy and taking load away from the P2P Bitcoin network.
> 
> Committing the BFD is not a hard forking change, and does not require
> alterations to mining software so long as the coinbase transaction
> scriptSig is not included in the bloom filter.
> 
> 
> [0] https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki
> [1] https://eprint.iacr.org/2014/763.pdf
> [2] https://jonasnick.github.io/blog/2015/02/12/privacy-in-bitcoinj/
> [3] https://github.com/petertodd/bloom-io-attack
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>