Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 08D13900 for ; Thu, 21 Sep 2017 02:11:15 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-it0-f46.google.com (mail-it0-f46.google.com [209.85.214.46]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 1F65C20D for ; Thu, 21 Sep 2017 02:11:11 +0000 (UTC) Received: by mail-it0-f46.google.com with SMTP id e134so4945890ite.3 for ; Wed, 20 Sep 2017 19:11:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=andrew-cmu-edu.20150623.gappssmtp.com; s=20150623; h=mime-version:from:date:message-id:subject:to; bh=CtefwGCSQ0h0q+jmsvGI5531lCJtLX1p/K8GDKg29wU=; b=jCYUtG/C20C214cC7CWM+2UotwwvfSONAtN/n/oulttjH2yDKdWNkRkMrruY7HF4M7 jOxNhg306PboRTe2uzyQQYNc0M9hkYnfElOXNYWt+mdrfHUI6apn1HsE9O81wYp4m/He nutIPSN1c0WbEj2mZDRoaMcZclk28RMjk6mVD22McU+WD6FIQ1fHOazjb9q4W44dmg5u lASIQDxtiWbGKLqvc76DNaxm8nftwrgxE7Px75FUFktHR+FTAABjDfUP+1jL7lX2pyjq DR0EPPZgL+vcrgsW2GrBZ1nLzZ0nT0Rvvd8U69RLYqpqLDhWSHn26CX5WxgOKOUKZy+z tpeQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=CtefwGCSQ0h0q+jmsvGI5531lCJtLX1p/K8GDKg29wU=; b=bc9URgKw9UhJWBBILAGLZNX92abZyJACQ38pwtSuRenBFHiMGIj1rBhnTgR9+j7WKA RU27f4xa0uIwMbn7ykGarQ2oRCL9GfkCecBtRQwf/TSZvZi/cTiBzxhU/Zi19k96T2m3 rxkAMWZQFyzKR+cYDM0ZwOmF9bkKBVMuIOL/E1QwmT1MFp8/wWMCd6mpvGznvThg8vyw iBRmIo385/odK8EpkZCbKcba06Txkxteng4wUdMZ2Du+vSVBNITH+qa+tHu/CAc1kJoW HuBpUq2nFTOZAvdzKacP0Hw8kF52eWVaATJ4bfsytbog2NJDjYobkHTeAXNVyTLtD+qT FsPw== X-Gm-Message-State: AHPjjUiCTLiFrdxd5/DVP7QEiqsnZv4joGW84y7pkK6uZ9T8VSA0unrT hQ4TDx7zkfJLFp6AT6BzKACWbQZOn64V+kDyB6GR3Bhe X-Google-Smtp-Source: AOwi7QC63MxVifBW3QMAVr61q20spBLXJXMUlk8ubAUfdXAsDjXkhTfuhmhX7bNJdh8jOH8S/OtEy+BPAf1oaQ/QE6g= X-Received: by 10.36.189.12 with SMTP id x12mr5805638ite.108.1505959870589; Wed, 20 Sep 2017 19:11:10 -0700 (PDT) MIME-Version: 1.0 Received: by 10.2.176.19 with HTTP; Wed, 20 Sep 2017 19:10:29 -0700 (PDT) From: Giulia Fanti Date: Wed, 20 Sep 2017 22:10:29 -0400 Message-ID: To: bitcoin-dev@lists.linuxfoundation.org Content-Type: multipart/alternative; boundary="94eb2c0ed4a02204530559a99b16" X-Spam-Status: No, score=0.5 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM autolearn=disabled version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Thu, 21 Sep 2017 02:14:25 +0000 Subject: Re: [bitcoin-dev] BIP proposal - Dandelion: Privacy Preserving Transaction Propagation X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Sep 2017 02:11:15 -0000 --94eb2c0ed4a02204530559a99b16 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Greetings bitcoin-dev, We=E2=80=99re returning to this thread to give an update on the Dandelion= project after several months of additional work. (Dandelion is a new privacy-preserving transaction propagation method, which we are proposing as a BIP. See the original post in this thread https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-June/014571.ht= ml for more background) The feedback on our initial BIP from Greg Maxwell in this thread touched on several important issues affecting the protocol design, which it has taken us until now to adequately address. The focus of this update is a new variant of the Dandelion++ mechanism presented earlier. The new variant is called =E2=80=9CPer-Incoming-Edge=E2= =80=9D routing. In a nutshell, while the earlier Dandelion++ variant calls for routing *each* stem transaction through a randomly chosen path, Per-Incoming Edge routing causes each transaction from the same source to traverse the same pseudorandom path. The most important benefit of Per-Incoming-Edge is that it prevents =E2=80=9Cintersection attacks=E2=80=9D that result if a client = broadcasts multiple transactions over a short period of time. We validate this new variant with new analysis and simulation as explained below. Today=E2=80=99s update also includes an outline of our next development pla= ns. We have not yet completed a reference implementation, so this update does not include a new BIP. Instead we=E2=80=99re just outlining the steps we plan t= o take before an updated BIP. The new approach also impacts our implementation approach. Since Per-Incoming Edge routing simplifies the handling of orphan transactions, we=E2=80=99re now planning on adopting Greg Maxwell=E2=80=99s= suggestion to bypass the txMempool for dandelion stem transactions. =3D=3D=3D=3D=3D=3D The feedback on Dandelion from Greg Maxwell touched on a few important issues: (1) robustness to observations over time, aka =E2=80=9Cintersection attacks=E2=80=9D, (2) protocol- or implementation-level data leaks, and (3)= graph learning. (1) With time, the adversary may be able to observe many message trajectories, thereby eventually learning the underlying graph structure and/or improving its deanonymization estimate for a given estimate of the graph structure. In our original Dandelion BIP, we addressed this by changing the anonymity graph topology from a directed line to a directed 4-regular graph. (In short, instead of a single outgoing edge for Dandelion transactions, each node selects from *two* such edges). This topology provides robustness to adversaries who are able to learn the graph, but those results still assume that each node generates only one transaction in each =E2=80=9Cepoch=E2=80=9D (time between reshuffling the anonymity graph)= . Hence a big remaining question is to understand the effect of intersection attacks--an adversary observing multiple dependent transactions--on deanonymization precision and recall. (2) The second issue is protocol- or implementation-level behavior that would allow an adversary to actively probe Dandelion to learn more information than before. As you correctly note, we want to avoid the adversary using conflicting transactions to infer which nodes are in the stem. This issue is related to issue (1), in that our mechanism for addressing intersection attacks will determine what data structures we need in the implementation. (3) The third issue is that an adversary may be able to infer the structure of the graph by observing network traffic. We want to prevent this. ---------- Intersection Attacks ---------- An adversary=E2=80=99s ability to launch intersection attacks depends on th= e internal Dandelion routing policy. Two natural ways to approach routing are the following: 1. Per-Hop: For each incoming stem transaction, make an independent random decision of (a) whether to transition to =E2=80=9Cfluff=E2=80=9D phase, and= (b) if =E2=80=9Cstem=E2=80=9D, then which node should we relay to. This means that two transactions, even starting from the same source, take independent random walks through the anonymity graph. This is what our current implementation does. 2. Per-Inbound-Edge: For each inbound edge e, randomly select one outbound edge g, and relay all transactions arriving on edge e to edge g (assuming the transaction remains in stem phase). Each node uses this relay mapping for an entire epoch, which lasts about 10 min. Each source also randomly chooses one outbound edge g=E2=80=99 for its own transactions; so if a node generates 5 transactions, they will all get propagated over edge g=E2=80=99= . This approach has the property that during an epoch, all transactions from a single source will take the same path through the stem graph. We have simulated and analyzed these two routing protocols, and find that per-inbound-edge routing seems to be more robust to intersection attacks. For our simulations we consider the =E2=80=9Cfirst-spy=E2=80=9D estimator -= -- this means the rule where the attacker simply guesses that the first peer to relay a transaction to a spy node is the real source. Figure 1 (link below) illustrates the first-spy precision for per-incoming-edge routing and per-transaction routing when each node has *one* transaction. Higher precision means worse anonymity. For comparison, this figure includes diffusion, which is the spreading mechanism currently used. Here =E2=80=98p= =E2=80=99 denotes the fraction of nodes in the network that are spies. (Recall that in our model, we treat the attacker has having control over some fraction of random nodes). The turquoise curve (labelled =E2=80=98p=E2=80=99) is sho= wn for reference---it does not represent any routing protocol. https://github.com/gfanti/bips/blob/master/per-edge-vs-per-tx.jpg First, note that the first-spy estimator is thought to be significantly suboptimal for diffusion (red line). Prior literature has shown that on certain classes of graphs, there exist estimators that can detect diffusion sources with much higher probability than the first-spy estimator. While it=E2=80=99s unclear how to apply those algorithms to Bitcoin=E2=80=99s gra= ph, it is likely that strong algorithms exist. Hence the first-spy estimator serves as a lower bound on precision for diffusion. On the other hand, we can show theoretically that the first-spy precision for per-tx and per-incoming-edge routing is within a small constant factor of the optimal precision for per-incoming-edge routing. Thus, we expect that the green (per-edge) and blue (per-tx) lines reflect the near-optimal attack, whereas the red line (diffusion) could be much higher in practice. The second issue to note is that the blue line (per-tx forwarding) has the lowest precision of the three protocols for one tx per node. The green line (per-edge forwarding) has higher precision than per-tx forwarding when there are very few spies, but approaches per-tx forwarding as p increases. Moreover, it has lower precision than diffusion for p>=3D0.05. However, the real benefits of per-edge forwarding emerge as nodes start to transmit multiple transactions. Under per-edge forwarding, even if nodes transmit multiple transactions each, those transactions will traverse the same path in the anonymity graph, thereby preventing the adversary from learning any new information from later transactions. Meanwhile, under per-tx routing, we find empirically that as nodes generate an increasing number of transactions, each source generates a unique signature of spy-node-observations (we are currently working on a more detailed exploration of this question). We expect that such signatures can be used to exactly deanonymize users in cases where the adversary learns the graph. Hence per-tx forwarding is actually quite fragile to adversaries learning the graph, whereas per-incoming-edge is robust to intersection attacks. This is one key reason for adopting per-incoming-edge forwarding. Adopting per-incoming-edge forwarding has another important implication: it becomes easy to enforce the condition that child transactions never enter fluff mode before parent transactions. This significantly simplifies orphan handling, and means that adversaries cannot infer that a preceding transaction is still in stem mode just by passively listening to network traffic. We revisit this issue in the next section. ---------- Implementation-Level Metadata Leaks ---------- tl;dr: concept ACK for gmaxwell=E2=80=99s suggestion on a new per-peer data structure instead of mempool Regardless of which routing policy we choose, it is important that implementations do not leak more information about transactions than they do in our model. It=E2=80=99s especially important that spies do not get an =E2=80=9Coff-path=E2=80=9D view of the nodes involved in the stem of a tran= saction. This practically means that implementations must be careful not to expose whether or not a stem transaction was received, to any node except the two randomly chosen ones. (i.e., not to supernodes that may make inbound connections to thousands of nodes). We are currently developing a reference implementation for Dandelion++, as a patch against Bitcoin Core. It requires thoughtful integration to make this patch, and the choice of routing policy informs our approach. We have so far considered two main integration approaches, whose main difference is whether or not they reuse the existing *txMempool* data structure to store stem mode transactions. A. Mempool embargo: This how is our current implementation works. Stem transactions are only relayed if they are accepted to mempool. Stem transactions are =E2=80=9Cemb= argoed=E2=80=9D by suppressing them from MEMPOOL and INV messages sent from the node. This was the easiest to implement while preserving all of Bitcoin=E2=80=99s exis= ting DoS prevention. In particular, it simplifies the handling of orphan transactions, because the AcceptToMempool routine already handles orphan transactions. However, this approach comes with a risk of indirect leakage, especially if some edge case is missed in implementation. B. Avoid modifying mempool (or any global structure) for stem transactions: This is the approach preferred by Greg Maxwell. The main benefit is that it is much more clear that there is no indirect leakage, although it makes it harder to argue there is no additional DoS concern. We have already taken a couple of steps towards implementing this here: https://github.com/gfanti/bitcoin/commits/dandelion-nomempool The main idea is to avoid duplicating the rules for whether a transaction would be accepted into mempool or not, by adding a =E2=80=9Cdry run=E2=80=9D option = to the AcceptToMempool function. Our implementation of this approach is not yet finished; it still remains to develop the per-peer data structure. Orphan transactions are important for per-tx routing, because with per-tx routing, the child and the parent might travel along different stems. A burst of transactions from a single sender would have to be queued so they enter fluff mode sequentially. A lot of our testing (with the included test framework) involved ensuring such transactions were handled effectively. This was also the deciding factor for our choice of using Option B =E2=80= =9CMempool Embargo=E2=80=9D above. With Per-incoming Edge routing, however, orphan tra= nsaction handling can be simplified, since out-of-order transactions would not be sent along stems. We therefore plan to re-engineer a much of our reference implementation to: 1) use per-incoming edge routing, 2) simplify handling of orphan transactions, 3) adopt the proposed approach of avoiding the mempool data structure for stem transactions. We=E2=80=99ll give an update soon on our development progress before updati= ng the BIP. ---------- Graph Learning ---------- Greg Maxwell also asked: ``` Has any work been given to the fact that dandelion propagation potentially making to measure properties of the inter-node connection graph? e.g. Say I wish to partition node X by disconnecting all of its outbound connections, to do that it would be useful to learn whom is connected to X. I forward a transaction to X, observe the first node to fluff it, then DOS attack that node to take it offline. Will I need to DOS attack fewer or more nodes to get all of X's outbounds if X supports rapid stem forwarding? ``` In terms of graph learning, there are two graphs to consider: the anonymity graph (i.e., the stem set of each node), and the main P2P graph. Dandelion has at least as good graph-hiding properties as diffusion for a natural class of attacks (which include the attack described in the comment above). Consider the task of learning the main P2P graph in today=E2=80=99s network= (under diffusion spreading). Suppose a supernode is connected to all nodes, and wants to learn the 1-hop neighbors of a given target node. The eavesdropper passes a transaction to the target, and waits to hear which nodes relay the transaction first. If the target has 8 outbound neighbors, then in each experiment, the supernode will receive 8 independent relay timestamps from the target=E2=80=99s 1-hop neighbors. By repeating this many times, the adv= ersary can infer the 1-hop neighbors as the nodes who relay the transaction with the appropriate mean delay (taking into account the appropriate exponential parameters). Eventually, this set will be learned with high certainty. Now consider the same task if the target is a Dandelion node. Note that the supernode=E2=80=99s probe tx must be relayed as a Dandelion message to obse= rve any difference with the prior experiment. First of all, the target will only pass the tx to one node in its stem set. Hence, in each experiment, the supernode can learn at most one timestamp from a relevant node, whereas previously it learned eight per experiment. This inherently reduces the adversary=E2=80=99s learning rate. Second, if the target=E2=80=99s relay is= a Dandelion node and chooses to extend the stem, then the supernode will not receive an= y relevant timestamp (i.e. a timestamp from a 1-hop neighbor) unless the supernode lies in the relay=E2=80=99s stem set. This happens with a probabi= lity that depends on the level of deployment and the number of (seemingly) distinct nodes being run by the supernode, but is strictly smaller than 1. _ Hence, the rate at which an attacker can learn the main P2P graph is strictly higher under diffusion (as in Bitcoin Core today) compared to using Dandelion. _ A similar argument can be made for the anonymity graph, which we currently implement as an overlay to the main P2P graph. =3D=3D=3D=3D=3D Responses to Other Miscellaneous Comments =3D=3D=3D=3D ``` An alternative construction would be that when a stem transaction goes out there is a random chance that the stem flag is not set (with suitable adjustment to keep the same expected path length) For some reason I believe this would be a superior construction, but I am only able to articulate one clear benefit: It allows non-dandelion capable nodes to take on the role of the last stem hop, which I believe would improve the anonymity set during the transition phase. ``` Agreed, this is actually what we have implemented. --------- Thanks! Giulia Fanti Andrew Miller Surya Bakshi Shaileshh Bojja Venkatakrishnan Pramod Viswanath Date: Tue, 13 Jun 2017 01:00:50 +0000 > From: Gregory Maxwell > To: Andrew Miller > Cc: Bitcoin Dev > Subject: Re: [bitcoin-dev] BIP proposal - Dandelion: Privacy > Preserving Transaction Propagation > Message-ID: > gmail.com> > Content-Type: text/plain; charset=3D"UTF-8" > > On Mon, Jun 12, 2017 at 2:46 PM, Andrew Miller via bitcoin-dev > wrote: > > Dear bitcoin-dev, > > We've put together a preliminary implementation and BIP for > > Dandelion, and would love to get your feedback on it. Dandelion is a > > privacy-enhancing modification to Bitcoin's transaction propagation > > mechanism. Its goal is to obscure the original source IP of each > > transaction. > > I'm glad to see this out now, so I'm not longer invading the git repo > uninvited. :) > > > - Stronger attacker model: we defend against an attacker that > > actively tries to learn which nodes were involved in the stem phase. > > Our approach is called "Mempool Embargo", meaning a node that receives > > a "stem phase" transaction behaves as though it never heard of it, > > until it receives it again from someone else (or until a random timer > > elapsess). > > > The description in the BIP appears inadequate: > > > > That is, Alice will not include the embargoed transaction when > responding to MEMPOOL requests, and will not respond to GETDATA requests > from another node (Bob) unless Alice previously sent an INV to Bob. The > embargo period ends as soon as Alice receives an INV advertising the > transaction as being in fluff mode. > > For example, it's not clear if I can query for the existence of a > transaction by sending a conflict. If this doesn't seem problematic, > consider the case where I, communicating with you over some private > channel, send you a payment inside a payment protocol message. You > announce it to the network and I concurrently send a double spend. > Only nodes that were part of the stem will reject my double spend, so > I just learned a lot about your network location. > > It's also appears clear that I can query by sending an inv and > noticing that no getdata arrives. An example attack in the latter is > that when I get a stem transaction I rapidly INV interrogate the > entire network and by observing who does and doesn't getdata I will > likely learn the entire stem path upto permutation. > > The extra network capacity used by getdata-ing a transaction you > already saw via dandelion would be pretty insignificant. > > I believe the text should be simplified and clarified so just say: > > "With the exception of her behavior towards the peer sending in the > stem transaction and the peer sending out the transaction Alice's > behavior should be indistinguishable from a node which has not seen > the transaction at all until she receives it via ordinary forwarding > or until after the timeout." -- then its up to the implementation to > achieve indistinguishably regardless of what other protocol features > it uses. > > > Are there other ways we haven't thought of? We think the alternative > > approach (bypassing mempool entirely) seems even harder to get right, > > and foregoes existing DoS protection. > > I think avoiding the is the most sensible way; and from a software > maintenance perspective I expect that anything less will end up > continually suffering from serious information leaks which are hard to > avoid accidentally introducing via other changes. > > The primary functionality should be straightforward to implement, > needing just a flag to determine if a transaction would be accepted to > the mempool but for the flag, but which skips actually adding it. > > Handling chains of unconfirmed stem transactions is made more > complicated by this and this deserves careful consideration. I'm not > sure if its possible to forward stem children of stem transactions > except via the same stem path as the parent without leaking > information, it seems unlikely. > > This approach would mostly take additional complexity from the need to > limit the amplification of double spends. I believe this can be > resolved by maintaining a per-peer map of the not yet expired vin's > consumed by stem fowards sent out via that peer. E.g. vin->{timeout, > feerate}. Then any new forward via that stem-peer is tested against > that map and suppressed if it it spends a non-timed-out input and > doesn't meet the feerate epsilon for replacement. > > Use of the orphan map is not indistinguishable as it is used for block > propagation, and itself also a maintenance burden to make sure > unrelated code is not inadvertently leaking the stem transactions. > > > After a random number of hops along the stem, the transaction enters th= e > fluff phase, > > The BIP is a bit under-specified on this transition, I think-- but I > know how it works from reading the prior implementation (I have not > yet read the new implementation). > > The way it works (assuming I'm not confused and it hasn't changed) is > that when a new stem transaction comes in there is a chance that it is > treated as coming in as a normal transaction. > > An alternative construction would be that when a stem transaction goes > out there is a random chance that the stem flag is not set (with > suitable adjustment to keep the same expected path length) > > For some reason I believe this would be a superior construction, but I > am only able to articulate one clear benefit: It allows non-dandelion > capable nodes to take on the role of the last stem hop, which I > believe would improve the anonymity set during the transition phase. > > Unrelated: > > Has any work been given to the fact that dandelion propagation > potentially making to measure properties of the inter-node connection > graph? e.g. Say I wish to partition node X by disconnecting all of > its outbound connections, to do that it would be useful to learn whom > is connected to X. I forward a transaction to X, observe the first > node to fluff it, then DOS attack that node to take it offline. Will > I need to DOS attack fewer or more nodes to get all of X's outbounds > if X supports rapid stem forwarding? > > > ------------------------------ > > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev > --94eb2c0ed4a02204530559a99b16 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Greetings bitcoin-dev,

=C2=A0=C2=A0We=E2= =80=99re returning to this thread to give an update on the Dandelion projec= t after several months of additional work. (Dandelion is a new privacy-pres= erving transaction propagation method, which we are proposing as a BIP. See= the original post in this thread https://lists.linuxfoundat= ion.org/pipermail/bitcoin-dev/2017-June/014571.html for more background) The feedback on our initial BIP from Greg Maxw= ell in this thread touched on several important issues affecting the protoc= ol design, which it has taken us until now to adequately address.

The focus of this update is a new vari= ant of the Dandelion++ mechanism presented earlier. The new variant is call= ed =E2=80=9CPer-Incoming-Edge=E2=80=9D routing. In a nutshell, while the ea= rlier Dandelion++ variant calls for routing *each* stem transaction through= a randomly chosen path, Per-Incoming Edge routing causes each transaction = from the same source to traverse the same pseudorandom path. The most impor= tant benefit of Per-Incoming-Edge is that it prevents =E2=80=9Cintersection= attacks=E2=80=9D that result if a client broadcasts multiple transactions = over a short period of time. We validate this new variant with new analysis= and simulation as explained below.

Today=E2=80=99s update also includes an outline of our next developm= ent plans. We have not yet completed a reference implementation, so this up= date does not include a new BIP. Instead we=E2=80=99re just outlining the s= teps we plan to take before an updated BIP. The new approach also =C2=A0imp= acts our implementation approach. Since Per-Incoming Edge routing simplifie= s the handling of orphan transactions, we=E2=80=99re now planning on adopti= ng Greg Maxwell=E2=80=99s suggestion to bypass the txMempool for dandelion = stem transactions.


=3D=3D=3D=3D=3D=3D


The feedback on Dandelion from Greg Maxwell touc= hed on a few important issues: (1) robustness to observations over time, ak= a =E2=80=9Cintersection attacks=E2=80=9D, (2) protocol- or implementation-l= evel data leaks, and (3) graph learning.


(1) With time, the adversary may be able to obs= erve many message trajectories, thereby eventually learning the underlying = graph structure and/or improving its deanonymization estimate for a given e= stimate of the graph structure. In our original Dandelion BIP, we addressed= this by changing the anonymity graph topology from a directed line to a di= rected 4-regular graph. (In short, instead of a single outgoing edge for Da= ndelion transactions, each node selects from =C2=A0*two* such edges). This = topology provides robustness to adversaries who are able to learn the graph= , but those results still assume that each node generates only one transact= ion in each =E2=80=9Cepoch=E2=80=9D (time between reshuffling the anonymity= graph). Hence a big remaining question is to understand the effect of inte= rsection attacks--an adversary observing multiple dependent transactions--o= n deanonymization precision and recall.


(2= ) The second issue is protocol- or implementation-level behavior that would= allow an adversary to actively probe Dandelion to learn more information t= han before. As you correctly note, we want to avoid the adversary using con= flicting transactions to infer which nodes are in the stem. This issue is r= elated to issue (1), in that our mechanism for addressing intersection atta= cks will determine what data structures we need in the implementation.


(3) The third issue is that an adversary may be= able to infer the structure of the graph by observing network traffic. We = want to prevent this.


----------

Intersection Attacks

----------

<= p dir=3D"ltr" style=3D"line-height:1.38;margin-top:0pt;margin-bottom:0pt"><= br>

An adversary=E2=80=99s ability to launch intersect= ion attacks depends on the internal Dandelion routing policy. Two natural w= ays to approach routing are the following:


= =C2=A01. Per-Hop: For each incoming stem transaction, make an independent = random decision of (a) whether to transition to =E2=80=9Cfluff=E2=80=9D pha= se, and (b) if =E2=80=9Cstem=E2=80=9D, then which node should we relay to. = This means that two transactions, even starting from the same source, take = independent random walks through the anonymity graph. This is what our curr= ent implementation does.


=C2=A02. Per-Inbo= und-Edge: For each inbound edge e, randomly select one outbound edge g, and= relay all transactions arriving on edge e to edge g (assuming the transact= ion remains in stem phase). Each node uses this relay mapping for an entire= epoch, which lasts about 10 min. Each source also randomly chooses one out= bound edge g=E2=80=99 for its own transactions; so if a node generates 5 tr= ansactions, they will all get propagated over edge g=E2=80=99. This approac= h has the property that during an epoch, all transactions from a single sou= rce will take the same path through the stem graph.


We have simulated and analyzed these two routing protocols, and fi= nd that per-inbound-edge routing seems to be more robust to intersection at= tacks. For our simulations we consider the =E2=80=9Cfirst-spy=E2=80=9D esti= mator --- this means the rule where the attacker simply guesses that the fi= rst peer to relay a transaction to a spy node is the real source. Figure 1 = (link below) illustrates the first-spy precision for per-incoming-edge rout= ing and per-transaction routing when each node has *one* transaction. Highe= r precision means worse anonymity. For comparison, this figure includes dif= fusion, which is the spreading mechanism currently used. Here =E2=80=98p=E2= =80=99 denotes the fraction of nodes in the network that are spies. (Recall= that in our model, we treat the attacker has having control over some frac= tion of random nodes). The turquoise curve (labelled =E2=80=98p=E2=80=99) i= s shown for reference---it does not represent any routing protocol.<= /p>

https://github.com/gfanti/bips/blob/master/per-edge-vs-per-tx.jpg=


First, note that the first-spy estimator is t= hought to be significantly suboptimal for diffusion (red line). Prior liter= ature has shown that on certain classes of graphs, there exist estimators t= hat can detect diffusion sources with much higher probability than the firs= t-spy estimator. While it=E2=80=99s unclear how to apply those algorithms t= o Bitcoin=E2=80=99s graph, it is likely that strong algorithms exist. Hence= the first-spy estimator serves as a lower bound on precision for diffusion= . On the other hand, = we can show theoretically that the first-spy precision for per-tx and per-i= ncoming-edge routing is within a small constant factor of the optimal preci= sion for per-incoming-edge routing. Thus, we expect that the green (per-edg= e) and blue (per-tx) lines reflect the near-optimal attack, whereas the red= line (diffusion) could be much higher in practice.


The second issue to note is that the blue line (per-tx forwarding)= has the lowest precision of the three protocols for one tx per node. The g= reen line (per-edge forwarding) has higher precision than per-tx forwarding= when there are very few spies, but approaches per-tx forwarding as p incre= ases. Moreover, it has lower precision than diffusion for p>=3D0.05.


However, the real benefits of per-edge forwar= ding emerge as nodes start to transmit multiple transactions. Under per-edge forwarding, even if= nodes transmit multiple transactions each, those transactions will travers= e the same path in the a= nonymity graph, thereby preventing the adversary from learning any new info= rmation from later transactions. Meanwhile, under per-tx routing, we find e= mpirically that as nodes generate an increasing number of transactions, eac= h source generates a unique signature of spy-node-observations (we are curr= ently working on a more detailed exploration of this question). We expect t= hat such signatures can be used to exactly deanonymize users in cases where the adversary learns th= e graph. Hence per-tx forwarding is actual= ly quite fragile to adversaries learning the graph, whereas per-incoming-ed= ge is robust to intersection attacks. This is one key reason for adopting p= er-incoming-edge forwarding.


Adopting per-incoming-edge forwarding has another important = implication: it becomes easy to enforce the condition that child transactio= ns never enter fluff mode before parent transactions. This significantly si= mplifies orphan handling, and means that adversaries cannot infer that a pr= eceding transaction is still in stem mode just by passively listening to ne= twork traffic. We revisit this issue in the next section.


---= -------

Implementation-Level Metadata Leaks

=

= ----------


tl;dr: = concept ACK for gmaxwell=E2=80=99s suggestion on a new per-peer data struct= ure instead of mempool


Regardless of which = routing policy we choose, it is important that implementations do not leak = more information about transactions than they do in our model. It=E2=80=99s= especially important that spies do not get an =E2=80=9Coff-path=E2=80=9D v= iew of the nodes involved in the stem of a transaction. This practically me= ans that implementations must be careful not to expose whether or not a ste= m transaction was received, to any node except the two randomly chosen ones= . (i.e., not to supernodes that may make inbound connections to thousands o= f nodes).


We are currently developing a ref= erence implementation for Dandelion++, as a patch against Bitcoin Core. It = requires thoughtful integration to make this patch, and the choice of routi= ng policy informs our approach. We have so far considered two main integrat= ion approaches, whose main difference is whether or not they reuse the exis= ting *txMempool* data structure to store stem mode transactions.

=

A. Mempool embargo:

This how is our current implementation works. Stem transact= ions are only relayed if they are accepted to mempool. Stem transactions ar= e =E2=80=9Cembargoed=E2=80=9D by suppressing them from MEMPOOL and INV mess= ages sent from the node. This was the easiest to implement while preserving= all of Bitcoin=E2=80=99s existing DoS prevention. In particular, it simpli= fies the handling of orphan transactions, because the AcceptToMempool routi= ne already handles orphan transactions. However, this approach comes with a= risk of indirect leakage, especially if some edge case is missed in implem= entation.


B. Avoid modifying mempool (or an= y global structure) for stem transactions:

This is the approach preferred by Greg Maxwell. The main= benefit is that it is much more clear that there is no indirect leakage, a= lthough it makes it harder to argue there is no additional DoS concern. We = have already taken a couple of steps towards implementing this here: =C2=A0= h= ttps://github.com/gfanti/bitcoin/commits/dandelion-nomempool The main idea is to avoid duplicating the rules for whether = a transaction would be accepted into mempool or not, by adding a =E2=80=9Cd= ry run=E2=80=9D option to the AcceptToMempool function. Our implementation = of this approach is not yet finished; it still remains to develop the per-p= eer data structure.


Orphan transactions are= important for per-tx routing, because with per-tx routing, the child and t= he parent might travel along different stems. A burst of transactions from = a single sender would have to be queued so they enter fluff mode sequential= ly. A lot of our testing (with the included test framework) involved ensuri= ng such transactions were handled effectively. This was also the deciding f= actor for our choice of using Option B =E2=80=9CMempool Embargo=E2=80=9D ab= ove. With Per-incoming Edge routing, however, orphan transaction handling c= an be simplified, since out-of-order transactions would not be sent along s= tems.


We therefore plan to re-engineer a mu= ch of our reference implementation to:

1) use p= er-incoming edge routing,

2) simplify handling = of orphan transactions,

3) adopt the proposed a= pproach of avoiding the mempool data structure for stem transactions.

We=E2=80=99ll give an update soon on our development= progress before updating the BIP.


----------

Graph Learning

------= ----


Greg Maxwell also asked:

```

Has any work been given= to the fact that dandelion propagation
potentially making to = measure properties of the inter-node connection<= br class=3D"gmail-kix-line-break">graph? =C2=A0e= .g.=C2=A0 Say I wish to partition node X by disconnecting all of
its outbound connections, to do that it would be useful to learn whom
is connected to X.=C2=A0 I forward a transaction to X, observe t= he first
node to fluff it, =C2=A0then DOS attack that node to = take it offline.=C2=A0 Will
I need to DOS attack fewer or more= nodes =C2=A0to get all of X's outbounds
if X supports rap= id stem forwarding?

```


In terms of graph learning, there are two graphs to consider: t= he anonymity graph (i.e., the stem set of each node), and the main P2P grap= h. Dandelion has at least as good graph-hiding properties as diffusion for = a natural class of attacks (which include the attack described in the comme= nt above). =C2=A0


Consider the task of lear= ning the main P2P graph in today=E2=80=99s network (under diffusion spreadi= ng). Suppose a supernode is connected to all nodes, and wants to learn the = 1-hop neighbors of a given target node. The eavesdropper passes a transacti= on to the target, and waits to hear which nodes relay the transaction first= . If the target has 8 outbound neighbors, then in each experiment, the supe= rnode will receive 8 independent relay timestamps from the target=E2=80=99s= 1-hop neighbors. By repeating this many times, the adversary can infer the= 1-hop neighbors as the nodes who relay the transaction with the appropriat= e mean delay (taking into account the appropriate exponential parameters). = Eventually, this set will be learned with high certainty.


Now consider the same task if the target is a Dandelion nod= e. Note that the supernode=E2=80=99s probe tx must be relayed as a Dandelio= n message to observe any difference with the prior experiment. First of all= , the target will only pass the tx to one node in its stem set. Hence, in e= ach experiment, the supernode can learn at most one timestamp from a releva= nt node, whereas previously it learned eight per experiment. This inherentl= y reduces the adversary=E2=80=99s learning rate. Second, if the target=E2= =80=99s relay is a Dandelion node and chooses to extend the stem, then the = supernode will not receive any relevant timestamp (i.e. a timestamp from a 1-hop n= eighbor) unless the supernode lies in the relay=E2=80=99s stem set. This ha= ppens with a probability that depends on the level of deployment and the nu= mber of (seemingly) distinct nodes being run by the supernode, but is stric= tly smaller than 1.

=C2=A0=C2=A0=C2=A0_ Hence, the rate at which an attacker can learn the main= P2P graph is strictly higher under diffusion (as in Bitcoin Core today) co= mpared to using Dandelion. _


A similar argu= ment can be made for the anonymity graph, which we currently implement as a= n overlay to the main P2P graph.


=3D= =3D=3D=3D=3D

Responses to Other Miscellaneous Comments=

=3D=3D=3D=3D

```

An alternative construction would be that when= a stem transaction goes
out there is a random chance that the= stem flag is not set (with
suitable adjustment to keep the sa= me expected path length)

For some reason I believe this would be a superior= construction, but I
am only able to articulate one clear bene= fit: =C2=A0It allows non-dandelion
capable nodes to take on th= e role of the last stem hop, which I
believe would improve the= anonymity set during the transition phase.

= ```

Agreed, this is actually what we have imple= mented.


-----= ----

Thanks!

Giulia Fanti <gfanti@andrew.cmu.edu>
Andrew Miller <soc1024@illinois.= edu>
Surya Bakshi <sbakshi3@illinois.edu>Shaileshh Boj= ja Venkatakrishnan <bjjvnkt2@illinois.edu>Pramod Viswa= nath <pramodv@illinois.edu>

<= span style=3D"color:rgb(34,34,34);font-size:small;white-space:normal">
<= /span>


Date: Tue, 13 Jun 20= 17 01:00:50 +0000
From: Gregory Maxwell <greg@xiph.org>
To: Andrew Miller <
soc1024@illin= ois.edu>
Cc: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] BIP proposal - Dandelion: Privacy
=C2=A0 =C2=A0 =C2=A0 =C2=A0 Preserving Transaction Propagation
Message-ID:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 <CAAS2fgRAnGMMxKPCaj1SL=3Dz3O2wuGS8nyPrgtGhSpuGgAoV= tKg@mail.gmail.com>
Content-Type: text/plain; charset=3D"UTF-8"

On Mon, Jun 12, 2017 at 2:46 PM, Andrew Miller via bitcoin-dev
<bitcoin-dev@li= sts.linuxfoundation.org> wrote:
> Dear bitcoin-dev,
>=C2=A0 =C2=A0 We've put together a preliminary implementation and B= IP for
> Dandelion, and would love to get your feedback on it. Dandelion is a > privacy-enhancing modification to Bitcoin's transaction propagatio= n
> mechanism. Its goal is to obscure the original source IP of each
> transaction.

I'm glad to see this out now, so I'm not longer invading the git re= po
uninvited. :)

>=C2=A0 - Stronger attacker model: we defend against an attacker that > actively tries to learn which nodes were involved in the stem phase. > Our approach is called "Mempool Embargo", meaning a node tha= t receives
> a "stem phase" transaction behaves as though it never heard = of it,
> until it receives it again from someone else (or until a random timer<= br> > elapsess).


The description in the BIP appears inadequate:


> That is, Alice will not include the embargoed transaction when respond= ing to MEMPOOL requests, and will not respond to GETDATA requests from anot= her node (Bob) unless Alice previously sent an INV to Bob. The embargo peri= od ends as soon as Alice receives an INV advertising the transaction as bei= ng in fluff mode.

For example, it's not clear if I can query for the existence of a
transaction by sending a conflict.=C2=A0 If this doesn't seem problemat= ic,
consider the case where I, communicating with you over some private
channel, send you a payment inside a payment protocol message. You
announce it to the network and I concurrently send a double spend.
Only nodes that were part of the stem will reject my double spend, so
I just learned a lot about your network location.

It's also appears clear that I can query by sending an inv and
noticing that no getdata arrives.=C2=A0 An example attack in the latter is<= br> that when I get a stem transaction I rapidly INV interrogate the
entire network and by observing who does and doesn't getdata I will
likely learn the entire stem path upto permutation.

The extra network capacity used by getdata-ing a transaction you
already saw via dandelion would be pretty insignificant.

I believe the text should be simplified and clarified so just say:

"With the exception of her behavior towards the peer sending in the stem transaction and the peer sending out the transaction Alice's
behavior should be indistinguishable from a node which has not seen
the transaction at all until she receives it via ordinary forwarding
or until after the timeout." -- then its up to the implementation to achieve indistinguishably regardless of what other protocol features
it uses.

> Are there other ways we haven't thought of? We think the alternati= ve
> approach (bypassing mempool entirely) seems even harder to get right,<= br> > and foregoes existing DoS protection.

I think avoiding the is the most sensible way; and from a software
maintenance perspective I expect that anything less will end up
continually suffering from serious information leaks which are hard to
avoid accidentally introducing via other changes.

The primary functionality should be straightforward to implement,
needing just a flag to determine if a transaction would be accepted to
the mempool but for the flag, but which skips actually adding it.

Handling chains of unconfirmed stem transactions is made more
complicated by this and this deserves careful consideration. I'm not sure if its possible to forward stem children of stem transactions
except via the same stem path as the parent without leaking
information, it seems unlikely.

This approach would mostly take additional complexity from the need to
limit the amplification of double spends. I believe this can be
resolved by maintaining a per-peer map of the not yet expired vin's
consumed by stem fowards sent out via that peer. E.g. vin->{timeout,
feerate}.=C2=A0 Then any new forward via that stem-peer is tested against that map and suppressed if it it spends a non-timed-out input and
doesn't meet the feerate epsilon for replacement.

Use of the orphan map is not indistinguishable as it is used for block
propagation, and itself also a maintenance burden to make sure
unrelated code is not inadvertently leaking the stem transactions.

> After a random number of hops along the stem, the transaction enters t= he fluff phase,

The BIP is a bit under-specified on this transition, I think-- but I
know how it works from reading the prior implementation (I have not
yet read the new implementation).

The way it works (assuming I'm not confused and it hasn't changed) = is
that when a new stem transaction comes in there is a chance that it is
treated as coming in as a normal transaction.

An alternative construction would be that when a stem transaction goes
out there is a random chance that the stem flag is not set (with
suitable adjustment to keep the same expected path length)

For some reason I believe this would be a superior construction, but I
am only able to articulate one clear benefit:=C2=A0 It allows non-dandelion=
capable nodes to take on the role of the last stem hop, which I
believe would improve the anonymity set during the transition phase.

Unrelated:

Has any work been given to the fact that dandelion propagation
potentially making to measure properties of the inter-node connection
graph?=C2=A0 e.g.=C2=A0 Say I wish to partition node X by disconnecting all= of
its outbound connections, to do that it would be useful to learn whom
is connected to X.=C2=A0 I forward a transaction to X, observe the first node to fluff it,=C2=A0 then DOS attack that node to take it offline.=C2=A0= Will
I need to DOS attack fewer or more nodes=C2=A0 to get all of X's outbou= nds
if X supports rapid stem forwarding?


------------------------------

_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.= linuxfoundation.org
https://lists.linuxfoundation.org= /mailman/listinfo/bitcoin-dev

<= /div> --94eb2c0ed4a02204530559a99b16--