author Peter Todd <pete@petertodd.org> 2014-01-05 21:53:12 -0500
committer bitcoindev <bitcoindev@gnusha.org> 2014-01-06 02:53:25 +0000
commit 8c676e54f9640466ba3685cbf3112963dbd6b9ac
tree 20e43bc39df0287c7835a0b17c32b20fac9472f3
parent 5248899adcffdb1f7eb9d6cd4a24aa4a0bff879f
[Bitcoin-development] Privacy and blockchain data
-rw-r--r-- ef/06e2c02633a4433c5f117993e003fe650244d1 373
1 files changed, 373 insertions, 0 deletions
diff --git a/ef/06e2c02633a4433c5f117993e003fe650244d1 b/ef/06e2c02633a4433c5f117993e003fe650244d1
new file mode 100644
index 000000000..1053f882c
--- /dev/null
+++ b/ef/06e2c02633a4433c5f117993e003fe650244d1
@@ -0,0 +1,373 @@
+Received: from sog-mx-4.v43.ch3.sourceforge.com ([172.29.43.194]
+ helo=mx.sourceforge.net)
+ by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
+ (envelope-from <pete@petertodd.org>) id 1W00Jh-0003kr-CN
+ for bitcoin-development@lists.sourceforge.net;
+ Mon, 06 Jan 2014 02:53:25 +0000
+Received-SPF: pass (sog-mx-4.v43.ch3.sourceforge.com: domain of petertodd.org
+ designates 62.13.148.154 as permitted sender)
+ client-ip=62.13.148.154; envelope-from=pete@petertodd.org;
+ helo=outmail148154.authsmtp.co.uk;
+Received: from outmail148154.authsmtp.co.uk ([62.13.148.154])
+ by sog-mx-4.v43.ch3.sourceforge.com with esmtp (Exim 4.76)
+ id 1W00Jf-0008G0-Hw for bitcoin-development@lists.sourceforge.net;
+ Mon, 06 Jan 2014 02:53:25 +0000
+Received: from mail-c237.authsmtp.com (mail-c237.authsmtp.com [62.13.128.237])
+ by punt15.authsmtp.com (8.14.2/8.14.2/) with ESMTP id s062rHHY051386
+ for <bitcoin-development@lists.sourceforge.net>;
+ Mon, 6 Jan 2014 02:53:17 GMT
+Received: from savin (76-10-178-109.dsl.teksavvy.com [76.10.178.109])
+ (authenticated bits=128)
+ by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id s062rCtF018230
+ (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO)
+ for <bitcoin-development@lists.sourceforge.net>;
+ Mon, 6 Jan 2014 02:53:15 GMT
+Date: Sun, 5 Jan 2014 21:53:12 -0500
+From: Peter Todd <pete@petertodd.org>
+To: bitcoin-development@lists.sourceforge.net
+Message-ID: <20140106025312.GC2356@savin>
+MIME-Version: 1.0
+Content-Type: multipart/signed; micalg=pgp-sha256;
+ protocol="application/pgp-signature"; boundary="f0KYrhQ4vYSV2aJu"
+Content-Disposition: inline
+User-Agent: Mutt/1.5.21 (2010-09-15)
+X-Server-Quench: b0caeecc-767d-11e3-94fa-002590a135d3
+X-AuthReport-Spam: If SPAM / abuse - report it at:
+ http://www.authsmtp.com/abuse
+X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVJwpGK10IU0Fd
+ P1hXKl1LNVAaWXld WiVPGEoXDxgzCjYj NEgGOBsDNw4AXwN1
+ KhkXXVBSFQF4AR4L Ax0UUx88cANYeX5u ZEFqQHFbVVt/fUFi
+ QwAXFWsEJxQkHmAe WEJbd01WeAJDfFEX agMuASZeMngAYH00
+ WlZqMmt0bGlRIWEN GltQfAobGB1WEmUq fwoDAzwkDAUMQSl7
+ JRghIV0XHE8QNA0+ OEcoMQAA
+X-Authentic-SMTP: 61633532353630.1024:706
+X-AuthFastPath: 0 (Was 255)
+X-AuthSMTP-Origin: 76.10.178.109/587
+X-AuthVirus-Status: No virus detected - but ensure you scan with your own
+ anti-virus system.
+X-Spam-Score: -1.5 (-)
+X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
+ See http://spamassassin.org/tag/ for more details.
+ -1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
+ sender-domain
+ -0.0 SPF_PASS SPF: sender matches SPF record
+X-Headers-End: 1W00Jf-0008G0-Hw
+Subject: [Bitcoin-development] Privacy and blockchain data
+X-BeenThere: bitcoin-development@lists.sourceforge.net
+X-Mailman-Version: 2.1.9
+Precedence: list
+List-Id: <bitcoin-development.lists.sourceforge.net>
+List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
+ <mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
+List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
+List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
+List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
+List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
+ <mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
+X-List-Received-Date: Mon, 06 Jan 2014 02:53:25 -0000
+
+
+--f0KYrhQ4vYSV2aJu
+Content-Type: text/plain; charset=us-ascii
+Content-Disposition: inline
+Content-Transfer-Encoding: quoted-printable
+
+* Summary
+
+CoinJoin, CoinSwap and similar technologies improve your privacy by
+making sure information about what coins you own doesn't make it into
+the blockchain, but syncing your wallet is a privacy risk in itself and
+can easily leak that same info. Here's an overview of that risk, how to
+quantify it, and how to reduce it efficiently.
+
+
+* Background
+
+In the most general sense a Bitcoin wallet is a collection of one or
+more scriptPubKeys, often known as addresses.(*) The basic purpose of
+the wallet is to maintain the set of all transaction outputs (txouts)
+matching the scriptPubKeys in the wallet. Secondary to that purpose is
+to maintain the set of all transactions associated with scriptPubKeys in
+the wallet; almost all (all?) wallet software maintains transaction
+information rather than only txout data. Usually, but not always, the
+wallet will have some mechanism to spend transaction outputs, creating
+new transactions. (If it does not, it is referred to as a watch-only
+wallet.)
+
+Given a full set of blockchain data the task of keeping the set of all
+relevant transactions and txouts up-to-date is simple: scan the
+blockchain for the relevant data. The challenge is to devise systems
+where wallets can be kept up to date without this requirement in a way
+that is secure, efficient, scalable, and meets the user's privacy
+requirements.
+
+*) Alternatively addresses can be thought of as instructions to the
+ payor as to how to generate a scriptPubKey that the payee can spend,
+ a subtly different concept.
+
+
+* Threat Model and Goals
+
+Currently the Bitcoin network consists of a large (low thousands) number
+of allegedly independent nodes. There is no mechanism to prevent an
+attacker from sybil attacking the network other than the availability of
+IP addresses. This protection is made even weaker by the difficulty of
+being sure you have a non-sybilled list of nodes to connect to; IP
+addresses are passed gossip-style with no authentication.
+
+From a privacy perspective we are conservative and assume an active,
+internal, and global attacker - using the terminology of Diaz et al.(1)
+- that controls up to 100% of the nodes you are connected to. With
+regard to retrieval of blockchain data we can use Sweeney's notion of
+k-anonymity(2), where the privacy-sensitive data for an individual is
+obscured by its inclusion in the data of a large set of individuals,
+the anonymity set.
+
+
+* Basic Functionality
+
+With regard to blockchain data we have the following basic functions:
+
+** Spending funds
+
+The user creates a transaction and gets it to miners by some method,
+usually the P2P network although also possibly by direct submission.
+Either way privacy can be achieved through a mix network such as Tor
+and/or relaying other users' transactions so as to embed yours within a
+larger anonymity set. In some cases payment protocols can shift the
+problem to the recipient of the funds. Using CoinJoin also helps
+increase the anonymity set.
+
+Usually the sender will want to determine when the transaction confirms;
+once the transaction has confirmed, the confirmation count can only
+increase, barring a reorganization. Transaction mutability and
+double-spends by malicious CoinJoin participants complicate the task of
+detecting confirmation: ideally we could simply query for the presence
+of a given txid in each new block, however the transaction could be
+mutated, changing the txid. The simplest way to detect confirmation is
+then to query for spends of the txouts spent by the transaction.
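+
+A minimal sketch of that check in Python, assuming a much simplified
+transaction model. The Tx and OutPoint types and both function names are
+made up for illustration and are not taken from any wallet; outputs are
+not modelled, so a real wallet would still have to compare them before
+treating a different-txid match as its own payment.
+

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple


class OutPoint(NamedTuple):
    txid: str   # txid of the transaction that created the txout
    index: int  # position of the txout within that transaction


class Tx(NamedTuple):
    txid: str
    inputs: Tuple[OutPoint, ...]  # txouts spent by this transaction


def find_spend(block_txs: Iterable[Tx], watched: Set[OutPoint]) -> Optional[Tx]:
    """Return the first transaction in a block that spends a watched txout."""
    for tx in block_txs:
        if any(outpoint in watched for outpoint in tx.inputs):
            return tx
    return None


def classify(sent: Tx, found: Tx) -> str:
    """Compare the transaction we broadcast with the one found in a block."""
    if found.txid == sent.txid:
        return "confirmed"
    if set(found.inputs) == set(sent.inputs):
        # Same inputs, different txid: possibly a mutated copy of ours.
        return "same-inputs-different-txid"
    # Only some of our inputs were spent: a conflicting transaction.
    return "conflict"


# Hypothetical data: our transaction confirms under a different txid.
sent = Tx("aa", (OutPoint("11", 0), OutPoint("22", 1)))
mutated = Tx("bb", sent.inputs)
block = [Tx("cc", (OutPoint("33", 0),)), mutated]

found = find_spend(block, set(sent.inputs))
if found is not None:
    print(found.txid, classify(sent, found))
```

+
+Querying by spent txout rather than by txid means the same query also
+catches a conflicting spend of only some of the inputs.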
+
+
+** Receiving new funds
+
+While in the future payment protocols may give recipients transaction
+information directly, it is most likely that for the foreseeable future
+wallets will continue to have to query peers for new transactions
+paying scriptPubKeys under the user's control.
+
+
+** Detection of unauthorized spends
+
+Users want early detection of private key compromise, accomplished by
+querying blockchain data for spends from txouts in their wallets. This
+has implications for how change must be handled, discussed below.
+
+
+* Scalability/Efficiency
+
+The total work done by the system as a whole for all queries given some
+number of transactions n is the scalability of the scheme. In addition,
+scalability, and in some cases privacy, are improved if work can be
+easily spread out across multiple nodes at both a per-block and a
+within-block level.
+
+
+* Reliability/Robustness
+
+Deterministic wallets using BIP32 or similar, where all private keys are
+derived from a fixed seed, have proven to be extremely popular with
+users for their simple backup model. While losing transaction metadata
+after a data-loss event is unfortunate, losing access to all funds is a
+disaster. Any address generation scheme must take this into account and
+make it possible for all funds to be recovered quickly and efficiently
+from blockchain data. Preserving privacy during this recovery is a
+consideration, but 100% recovery of funds should not be sacrificed for
+that goal.
+
+
+* Query schemes
+
+** Bloom filters
+
+BIP37 bloom filters are currently implemented by the Bitcoin reference
+implementation and used by bitcoinj-based SPV clients. Bloom filters
+achieve a privacy-bandwidth tradeoff by having probabilistic
+false-positives; the false-positives comprise the anonymity set.
+
+Bloom filters have a number of problems, both in the specific BIP37
+implementation and fundamental to the idea. Scalability is a serious
+problem: the client asks a peer with a copy of all blockchain data to
+filter the data sent to the client, limiting the client's bandwidth to
+only the data they are interested in. In the typical case of an SPV
+wallet synchronizing against m new blocks this requires the peer
+to read those m blocks from disk in their entirety, apply the filter,
+and send the client the subset of matching transactions. Obviously this
+results in poor O(n^2) scaling for n clients each making some fixed
+number of transactions.
+
+Of course bloom filters are attractive in that they have very good
+performance per match, but this performance is only really relevant for
+the most recent blockchain information where the data is in RAM. For
+older information they make possible the Bloom IO attack where an
+attacker uses an inordinate amount of disk IO bandwidth at little cost
+to themselves.(3)
+
+The actual BIP37 standard, and existing implementations of it, have a
+number of other flaws that reduce privacy. For instance the standard
+lets the seed value of the hash function be tweaked with a 32-bit
+integer, nTweak. However on the one hand if randomly chosen and rarely
+changed, as suggested by BIP37, the 32-bit integer can be used by an
+attacker to correlate multiple connections from the same wallet. On the
+other hand if nTweak is changed an attacker that can link multiple bloom
+filters can AND those filters together to greatly decrease the
+false-positive rate and determine exactly what funds are in the user's
+wallet.
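+
+A toy illustration of the second case, in Python. This is a sketch under
+stated assumptions, not BIP37 itself: SHA-256 stands in for BIP37's
+MurmurHash3, the ToyBloom class, the filter sizes, and the scriptPubKey
+data are all invented, and "AND" is read here as intersecting the sets of
+items each filter matches, since the bit arrays of filters built with
+different tweaks do not line up.
+

```python
import hashlib


class ToyBloom:
    """Toy Bloom filter; SHA-256 is a stand-in for BIP37's MurmurHash3."""

    def __init__(self, n_bits: int, n_hashes: int, tweak: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.tweak = tweak
        self.bits = 0  # bit array stored as a Python int

    def _positions(self, item: bytes):
        # One bit position per hash function, seeded by index and tweak.
        for i in range(self.n_hashes):
            seed = i.to_bytes(4, "little") + self.tweak.to_bytes(4, "little")
            digest = hashlib.sha256(seed + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def matches(self, item: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def matching(filt: ToyBloom, universe) -> set:
    """Everything in the universe that the filter would let through."""
    return {item for item in universe if filt.matches(item)}


def make_filter(wallet, tweak: int) -> ToyBloom:
    filt = ToyBloom(n_bits=64, n_hashes=2, tweak=tweak)
    for spk in wallet:
        filt.add(spk)
    return filt


# Hypothetical data: 2000 scriptPubKeys on chain, 5 of them in the wallet.
universe = [b"spk-%d" % i for i in range(2000)]
wallet = set(universe[:5])

seen_a = matching(make_filter(wallet, tweak=1), universe)
seen_b = matching(make_filter(wallet, tweak=2), universe)
linked = seen_a & seen_b  # what an attacker learns after linking both filters

print(len(seen_a), len(seen_b), len(linked))
```

+
+The wallet's own scriptPubKeys match under every tweak, while each
+false positive has to survive both filters independently, so the linked
+set can only shrink towards the real wallet contents.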
+
+
+** Prefix filters
+
+With a randomly distributed keyspace - common in cryptographic
+applications - clients can query using variable length prefixes that
+partially match the desired keys. A very simple format for a query of n
+prefixes will look like the following:
+
+ <1 byte length in bits> <1 to 256/8 bytes of prefix>
+ ...
+ ...
+ 0x00
+
+The anonymity set is then the blockchain data whose key is the same
+prefix, usually H(scriptPubKey) or scriptPubKey directly. An important
+advantage of prefix filters is compatibility with the proposed (U)TXO
+commitment schemes: the prefix maps directly to the committed
+scriptPubKey lookup trees, and nodes simply return all entries matching
+the prefix, as well as the rest of the Merkle path to the blockchain
+headers proving the data is valid.
+
+While bloom filters have O(n) cost per lookup, or O(n^2) scalability
+system-wide, prefix filters have significantly better O(log n) cost per
+lookup, or O(n log n) system-wide. It's also worth noting that a naive
+implementation can achieve very similar performance to bloom filters
+without bothering to build key-value indexes by just scanning blockchain
+data; once the data is hashed testing the hash against a prefix has a
+minimal cost.
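+
+The query format and the naive scan can be sketched in Python as
+follows. Several details are assumptions made for the sketch rather than
+part of the proposal: H is taken to be SHA-256, prefix bits are compared
+most-significant-bit first, each prefix is padded to a whole number of
+bytes, and bit lengths are limited to 1-255 so that they fit in one byte
+without colliding with the 0x00 terminator. All function names are
+invented.
+

```python
import hashlib


def encode_query(prefixes) -> bytes:
    """Serialize (bit_length, prefix_bytes) pairs, terminated by 0x00."""
    out = bytearray()
    for n_bits, prefix in prefixes:
        if not 1 <= n_bits <= 255:
            raise ValueError("bit length must fit in one non-zero byte")
        if len(prefix) != (n_bits + 7) // 8:
            raise ValueError("prefix must be exactly ceil(n_bits / 8) bytes")
        out.append(n_bits)
        out += prefix
    out.append(0x00)
    return bytes(out)


def decode_query(data: bytes):
    """Inverse of encode_query; stops at the 0x00 terminator."""
    prefixes, pos = [], 0
    while data[pos] != 0x00:
        n_bits = data[pos]
        n_bytes = (n_bits + 7) // 8
        prefixes.append((n_bits, data[pos + 1:pos + 1 + n_bytes]))
        pos += 1 + n_bytes
    return prefixes


def has_prefix(key: bytes, n_bits: int, prefix: bytes) -> bool:
    """True if the first n_bits of key equal the first n_bits of prefix."""
    full, rest = divmod(n_bits, 8)
    if key[:full] != prefix[:full]:
        return False
    if rest == 0:
        return True
    mask = (0xFF << (8 - rest)) & 0xFF  # keep only the top `rest` bits
    return (key[full] & mask) == (prefix[full] & mask)


def scan(script_pubkeys, prefixes):
    """Naive scan: hash every scriptPubKey and test it against each prefix."""
    return [spk for spk in script_pubkeys
            if any(has_prefix(hashlib.sha256(spk).digest(), n_bits, prefix)
                   for n_bits, prefix in prefixes)]


# Hypothetical usage: watch one scriptPubKey with a 12-bit prefix.
mine = b"my-script"
query = [(12, hashlib.sha256(mine).digest()[:2])]
wire = encode_query(query)
print(wire.hex(), scan([mine, b"someone-else"], decode_query(wire)))
```

+
+A shorter prefix matches more of the keyspace, which is how the client
+chooses its own privacy/bandwidth tradeoff.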
+
+
+** Cryptographically blinded schemes
+
+There are many blinded database query schemes in existence. While we do
+not reject such schemes completely, technologies that rely on simple and
+easy-to-understand cryptography have a significant advantage in their
+simplicity. In addition such complex schemes are unlikely to ever be
+made into miner commitments and thus are less trustworthy in the long
+run.
+
+
+* Correlation attacks
+
+It is often advantageous if blockchain queries can be efficiently spread
+across multiple servers to avoid allowing the attacker to correlate the
+information into a whole. If you have n addresses that need to be
+watched for new transactions, splitting the queries across m nodes
+reduces the information any one node may learn. With bloom filters doing
+this is extremely costly as the full blockchain data needs to be read
+from disk to apply the filter; with prefix filters if the nodes have
+appropriate indexes there is little overhead to splitting the queries
+and no performance loss.
+
+
+* DoS attacks
+
+A possible DoS attack on bandwidth is to insert a large amount of
+blockchain data matching the target's filter; the BIP37 nTweak parameter
+was an attempt to avoid this problem, although with privacy tradeoffs.
+Blockchain data is an extremely expensive communications channel so we
+do not consider this a serious issue. Implementations may wish to give
+clients the ability to specify a filter for information they do not
+want, to avoid unintentional collisions, although hopefully in the
+future the address reuse that makes this a potential problem will
+become less common.
+
+
+* Address use, management, and generation
+
+If privacy were not a consideration the most efficient mode of operation
+would be to use a single address, as is done by many existing wallets,
+notably the bitcoinj-derived Multibit and Android Wallet, both of which
+use bloom filters. In addition to strongly encouraging address re-use,
+neither provides the user any control over the privacy/bandwidth tradeoff
+given by bloom filters; the default settings have an extremely low
+false-positive rate that is a significant privacy risk.
+
+Taking privacy into account better clients such as Electrum, Armory, and
+Bitcoin Core discourage the re-use of addresses in their UIs, and send
+change to new addresses. However this leads to a problem with user
+expectations: users expect it to be possible to be notified quickly of
+new transactions paying any address ever generated by their wallet, as
+well as unauthorized spends from any txout, yet for privacy each query
+for transactions related to the address/txout must match false-positives
+that consume bandwidth; for a fixed bandwidth budget the specificity and
+size of the filter must increase over time.
+
+We have two main avenues to solve this problem:
+
+1) Txin-reuse: Continue to reinforce the idea that transaction inputs
+ have no particular relationship to outputs. Using them for refunds or
+ other purposes implying "ownership" must be strongly discouraged.
+ CoinJoin will help here. If addresses associated with change txouts
+ are truly one-time-use, we can reduce or eliminate queries associated
+ with them. In particular, while the set of all change addresses ever
+ used will grow linearly with time, the set of all change addresses
+ with funds in them will remain roughly stable - it's this set that
+ needs to be queried to detect unauthorized spends.
+
+2) Common prefixes: Generate addresses such that for a given wallet they
+ all share a fixed prefix. The length of that prefix determines the
+ anonymity set and associated privacy/bandwidth tradeoff, which
+ remains a fixed ratio of all transactions for the life of the
+ wallet.
+
+With this approach change addresses continue to be generated randomly, a
+requirement for CoinJoin privacy. There is some statistical information
+leaked if many non-change txouts are spent in a single transaction in a
+CoinJoin, but even that leak can be avoided with the author's
+OP_RETURN-based stealth addresses proposal (to be published).
+
+The actual prefix-forcing scheme in many cases will have to be a
+brute-force search; fortunately the search space involved is reasonably
+small, ~2 to ~16 bits, and can be searched as a background task.
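+
+A rough sketch of such a search in Python. The derivation function is a
+hypothetical stand-in (HMAC-SHA256 of an index under the seed) rather
+than real BIP32 derivation, H is assumed to be SHA-256, and the function
+names are invented; only the search loop itself is the point.
+

```python
import hashlib
import hmac


def derive_script_pubkey(seed: bytes, index: int) -> bytes:
    """Hypothetical stand-in for real key derivation (e.g. BIP32)."""
    return hmac.new(seed, index.to_bytes(4, "big"), hashlib.sha256).digest()


def top_bits(data: bytes, n_bits: int) -> int:
    """Return the first n_bits of data as an integer."""
    return int.from_bytes(data, "big") >> (len(data) * 8 - n_bits)


def next_index_with_prefix(seed: bytes, start: int, n_bits: int, wanted: int) -> int:
    """Try indexes from `start` until H(scriptPubKey) begins with `wanted`.

    Each attempt succeeds with probability 2**-n_bits, so on average
    about 2**n_bits derivations are needed per address.
    """
    index = start
    while True:
        spk = derive_script_pubkey(seed, index)
        if top_bits(hashlib.sha256(spk).digest(), n_bits) == wanted:
            return index
        index += 1


# Hypothetical usage: a wallet whose addresses share a 4-bit prefix.
seed = b"example seed"
first = next_index_with_prefix(seed, 0, n_bits=4, wanted=0b1011)
second = next_index_with_prefix(seed, first + 1, n_bits=4, wanted=0b1011)
print(first, second)
```

+
+Because the search is deterministic given the seed and the chosen
+prefix, a restored wallet could repeat it to find the same addresses,
+which keeps this compatible with seed-only backups.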
+
+
+1) Towards Measuring Anonymity, Claudia Diaz, Stefaan Seys, Joris
+ Claessens, and Bart Preneel, April 2002
+
+2) k-Anonymity: A Model for Protecting Privacy, Latanya Sweeney, May
+ 2002
+
+3) Private discussions with developers.
+
+--=20
+'peter'[:-1]@petertodd.org
+000000000000000f9102d27cfd61ea9e8bb324593593ca3ce6ba53153ff251b3
+
+--f0KYrhQ4vYSV2aJu
+Content-Type: application/pgp-signature; name="signature.asc"
+Content-Description: Digital signature
+
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1.4.12 (GNU/Linux)
+
+iQGrBAEBCACVBQJSyhqYXhSAAAAAABUAQGJsb2NraGFzaEBiaXRjb2luLm9yZzAw
+MDAwMDAwMDAwMDAwMDI4NjFlZTA5MTlmYzg2OTkwNTczYWMzNjA4MjA3NjZkYzFi
+OWJhNTgwZTVjY2Y3YjYvFIAAAAAAFQARcGthLWFkZHJlc3NAZ251cGcub3JncGV0
+ZUBwZXRlcnRvZC5vcmcACgkQJIFAPaXwkfuoIAgAuP2jjGXbY/9/+Ql/1B+vcny6
+K9HAJ06YA6nEsFqdjr8qBG482OVRAmnSQGFsiDuwYyekalQuOGmY0lDANGhllMEL
+lqNqxc8+uMnl0RC/k6mVQzFJkmlD3uhUHvt0jS7Aqfw9LgumtnGbIkln37uOyrDu
+fyGjkRbQbIP+XHfQVn7h+818t1TaLkUgK5xc6ibyztS6sw5hHaDZk1H5g3NaMN1o
+u+1HT6qaYmfewHyfPH0EXXYX/TagiVpBFrKZow/coM5i91M7MqHQht6cT7ETvud3
+oxV987j03v2LMgKjfk0NxZaajlWWoMmOM55ZUbOA9yC1ZmOVB616rBUY3zTzuw==
+=k8LD
+-----END PGP SIGNATURE-----
+
+--f0KYrhQ4vYSV2aJu--
+
+