1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
|
Received: from sog-mx-4.v43.ch3.sourceforge.com ([172.29.43.194]
helo=mx.sourceforge.net)
by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
(envelope-from <andreas@petersson.at>) id 1StDTl-0003Rl-O3
for bitcoin-development@lists.sourceforge.net;
Mon, 23 Jul 2012 07:54:57 +0000
X-ACL-Warn:
Received: from petersson.at ([213.239.210.117])
by sog-mx-4.v43.ch3.sourceforge.com with esmtps (TLSv1:AES256-SHA:256)
(Exim 4.76) id 1StDTk-0000sa-KM
for bitcoin-development@lists.sourceforge.net;
Mon, 23 Jul 2012 07:54:57 +0000
Received: by petersson.at (Postfix, from userid 65534)
id 35F9519A002; Mon, 23 Jul 2012 09:54:50 +0200 (CEST)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on petersson.at
X-Spam-Level:
X-Spam-Status: No, score=-1.0 required=4.0 tests=ALL_TRUSTED
autolearn=disabled version=3.3.1
Received: from [192.168.0.199] (chello084114039092.14.vie.surfer.at
[84.114.39.92])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(No client certificate requested) (Authenticated sender: andreas)
by petersson.at (Postfix) with ESMTPSA id A74D419A001
for <bitcoin-development@lists.sourceforge.net>;
Mon, 23 Jul 2012 09:54:49 +0200 (CEST)
Message-ID: <500D0348.4010604@petersson.at>
Date: Mon, 23 Jul 2012 09:54:48 +0200
From: Andreas Petersson <andreas@petersson.at>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
rv:14.0) Gecko/20120713 Thunderbird/14.0
MIME-Version: 1.0
To: bitcoin-development@lists.sourceforge.net
References: <CA+8xBpecVQcTTbPxUm_3_GWC99dEd4=-VFWb+QT6jUy4rg8U4w@mail.gmail.com>
<CANEZrP0kNZDByHpK2=UjP+ag0X1KmqHxnJdm=e_pWMitP4QvvA@mail.gmail.com>
<1339766346.31489.49.camel@bmthinkpad>
<CANEZrP3jj2ymQPH50g2PvzZhRzTnUnCLUjvBYj8ndBCJsnGJ-w@mail.gmail.com>
<1339771184.31489.53.camel@bmthinkpad>
<CANEZrP0hTRbE9+VEa3eCzJkbHqa3u8tpdw7eDLBQQR6DBf2adw@mail.gmail.com>
<1340132998.6065.7.camel@bmthinkpad>
<CANEZrP0ws6bGk5qmDCmRPMbwyNX+3W5BRNzZPn_Av-nqFPAqOw@mail.gmail.com>
In-Reply-To: <CANEZrP0ws6bGk5qmDCmRPMbwyNX+3W5BRNzZPn_Av-nqFPAqOw@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -0.0 (/)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
See http://spamassassin.org/tag/ for more details.
-0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay
domain
X-Headers-End: 1StDTk-0000sa-KM
Subject: Re: [Bitcoin-development] New P2P commands for diagnostics,
SPV clients
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Mon, 23 Jul 2012 07:54:57 -0000
Some concerns regarding Bloom Filters. I talked with Stefan Thomas on
the Hackathon Berlin about this.
I tried to follow the discussion closely but i have not taken a look at
the code yet (is there already an implementation?) so please correct me
if i got something wrong.
The way the Bloom filters are planned now this requires a complicated
setup. basically the client will ask the server to replay the whole
blockchain, but filtered.
This is not optimal for the following reasons:
This will require the server to do a full scan of his data and only
filter out non-matching entries.
Really lightweight clients (like Bitcoincard), clients with shared
private keys (electrum-style), or brainwallets - will ask the following
question quite often to "supernodes": Given my public keys/addresses,
what is the list of unspent outputs. i think it would make sense to
include such a command, instead or in addition to the filterload/filterinit.
And perhaps more severe: as far as i understand classic bloom filters,
the server has no method of indexing his data for the expected requests.
There is simply no data structure (or maybe it has to be invented) which
allows the use of an index for queries by bloom filters of *varying
length* and a generic hashing method.
im not sure what a really efficient data structure for these kinds of
query is. but i think it should be possible to find a good compromise
between query flexibility, server load, client privacy.
one possible scheme, looks like this:
the client takes his list of addesses he is interested in. he hashes all
of them to a fixed-length bit array (bloom filter) of length 64KiB (for
example), and combines them with | to add more 1's with each address.
the server maintains a binary tree data structure of unspent outputs
arranged by the Bloom filter bits.
to build the tree, the server would need to calculate the 64KiB bits for
each address and arrange them in a binary tree. that way he can easily
traverse the tree for a given bloom query.
if a client whishes to query more broadly he can calculate the bloom
filter to 64KiB and after that fill up the last 50% of the Bits with 1.
or 95%. the trailing 1 bits even don't need to be transmitted to the
server when a client is querying. of course, if the client is more
privacy-concerned he could also fill up random bits with 1, which would
not change much actually.
the value of 64KiB is just out of thin air.
according to my experimentation using BloomFilter from Guava -
currently, also 8KiB would be sufficient to hava a 3% false positive
rate for the 40000 active addresses we have right now.
someone more familiar with hashing should please give his opinion if
cutting a bloom filter in half has any bad consequences.
Andreas
|