From matsjj at gmail.com  Fri Oct 16 13:22:25 2015
From: matsjj at gmail.com (Mats Jerratsch)
Date: Fri, 16 Oct 2015 14:22:25 +0100
Subject: [Lightning-dev] Preventing MITM - Providing new nodes with real
	pubkeys
Message-ID: <CAE8CtV=uH9PG1Tb3e9Kh2CmnNXGVBBaEokocZuv0bjiD-TQm1w@mail.gmail.com>

So being done with encryption and authentication, the next layer for
me now is to figure out how exactly nodes will broadcast their
existence and open channels and everything.

The one problem that we have currently with the way encryption and
authentication works, is that the encryption layer is not protecting
against MITM attacks, such that an attacker could have a connection
with both and establish different encryption with both and just reads
and relays all the data.

This gets defeated with the agreed-on authentication layer, where both
nodes sign a message with their real pubkey and the temp pubkey of the
party they are talking to, where a MITM could not produce these
signatures. However, this only holds true if the nodes actually know
the pubkey of the node they want to talk to. Which raises the point of
- how do we bring this information across securely? A new node joining
the network and obtaining one/some IP of another network participant
will want to get a list with nodes/pubkeys/IPs. Without a central
authority that could provide trust into the data, an attacker could
trick it into a fake network, even if just for vandalism. (Having peer
discovery trust on some hardcoded nodes to obtain IP addresses is
dangerous enough, we don't want to rely on that for pubkeys as well)


There are some possibilities how to mitigate the risk / make an attack expensive

(1)
Have a snapshot of the data hashed and linked to the blockchain.
Similar to the way 'Factom' works. It would provide the data with some
integrity framework, but keeping track of the changes would require
some overhead. Without a central service it would further be difficult
to establish who should make these linking transactions to the
blockchain...

(2)
As long as the malleability issue has not been fixed, the blockchain
can only used with additional techniques to obtain a map of the
channels from it. As the anchor transactions are P2SH, we need to
expose the script, such that others are able to verify we at least
have an anchor tx on the blockchain (associated with costs after all).
For the current form it would be enough to have
SecretAHash || KeyB' || KeyB || KeyA || TxID || SignatureB (L=231B)
with KeyB being the node pubkey (lots of key reusage...)
or
SecretAHash || KeyB' || KeyB || KeyA || TxID || nodePubKey ||
SignatureB (L=264B) with KeyB as a channel key that does not need to
be equal with the nodePubKey.

This is information everyone should store in case a new node joins a
network, similar to the blockchain. New nodes can then check against
the blockchain, whether this data is actually present there. An
attacker can fake a complete network together with lots of
transactions on the blockchain, but the incentive is low (vandalism)
and the costs are high. For 100k nodes and 10 open channels per node,
this adds up to 220MB. Not too bad, considering full nodes are highly
incentivised to run full bitcoin nodes as well, it is actually rather
negligible. This information is pretty static, however we want
everyone to have a decently consistent view of the network, so we
would probably do some rebroadcast of that every few days, just to
ensure everyone knows about it.

(3)
Similar to (2), but instead of broadcasting our script we add a
OP_RETURN output to each anchor transaction. It is cheap to implement
it, as we don't have to broadcast anything specific to this issue. It
is more expensive to attack, but also more expensive to open up a new
channel. And it doesn't help scaling either, so I tend to dislike this
idea.

(4)
Start with one node and just go through a lot of different IP
addresses you get from that node and repeat over and over and compare
what the different nodes are telling you about the system. We can
always add this technique on top of the other 3, and add IP addresses
as an additional cost vector for a successful attack. If most of the
IP addresses the first node gave you are dead, you can assume he gave
you wrong information about the network and start again with another
node.


As I'm implementing broadcast messages anyways for other purposes (see
other ML post), I tent to like (2) the most, it is the most expensive
to attack as well I think.

Mats Jerratsch