From cjp at ultimatestunts.nl  Tue Feb  2 17:56:08 2016
From: cjp at ultimatestunts.nl (CJP)
Date: Tue, 02 Feb 2016 18:56:08 +0100
Subject: [Lightning-dev] Laundry list of inter-peer wire protocol changes
In-Reply-To: <87mvrpru3e.fsf@rustcorp.com.au>
References: <87d1snvhyf.fsf@rustcorp.com.au>
	<1453923255.11915.36.camel@ultimatestunts.nl>
	<87mvrpru3e.fsf@rustcorp.com.au>
Message-ID: <1454435768.2011.28.camel@ultimatestunts.nl>


> > * Message confirmation: this is done manually (instead of relying on
> > TCP), so that a node knows which messages were received / need to be
> > re-transmitted, even after a crash + restart.
> 
> I think the protocol itself needs to be robust against retransmissions.
> There's no way to know if the other side received your acknowledgement
> before a crash, so you will always need to handle duplication on
> re-establishment.

Yes. Amiko Pay does that: it assigns a number to every message, and the
receiving side can confirm "I've received up to this number".
Not-yet-confirmed messages are retransmitted, and the receiving side
ignores duplicates (except that it sends the confirmation again, in
case the previous confirmation was lost).
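
A minimal sketch of such a numbering scheme, to make the duplicate
handling concrete. This is not Amiko Pay's actual code: the class and
method names (Sender, Receiver, on_message, ...) are hypothetical, and I
assume that messages are processed strictly in order, so a message that
arrives after a gap is dropped and waits for retransmission. To survive
a crash + restart, both sides would also have to persist this state,
which is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    next_seq: int = 1
    # seq -> payload, for every message not yet confirmed by the peer
    unconfirmed: dict = field(default_factory=dict)

    def send(self, payload):
        """Assign the next number to a message and remember it."""
        seq = self.next_seq
        self.next_seq += 1
        self.unconfirmed[seq] = payload
        return seq, payload

    def on_confirmation(self, up_to):
        """Peer says: 'I've received up to this number'."""
        for seq in [s for s in self.unconfirmed if s <= up_to]:
            del self.unconfirmed[seq]

    def retransmit(self):
        """Messages to send again, e.g. after re-establishment."""
        return sorted(self.unconfirmed.items())


@dataclass
class Receiver:
    received_up_to: int = 0
    delivered: list = field(default_factory=list)

    def on_message(self, seq, payload):
        """Process a message; always answer with a confirmation."""
        if seq == self.received_up_to + 1:
            self.delivered.append(payload)
            self.received_up_to = seq
        # Duplicates (seq <= received_up_to) are not processed again,
        # but the confirmation is still sent, in case the previous
        # one was lost. Messages after a gap are dropped (assumption).
        return self.received_up_to
```

If the confirmation for message 1 is lost, the sender retransmits it;
the receiver then skips the duplicate but confirms again, so both sides
converge without processing any message twice.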

> > * There is not only two-way communication between linked peers, but also
> > between payer and payee. This is necessary for Amiko Pay's
> > bi-directional routing, but also useful e.g. for transmitting meta-data
> > that doesn't fit in a QR code. Amiko Pay transmits an arbitrary-contents
> > "receipt" from payee to payer; in the future, this might be digitally
> > signed by the payee, as a "proof of transfer of ownership" of
> > non-cryptographic goods.
> 
> I agree.  There's room in the initial onion design for payer -> payee
> messages, but we don't have a channel for responses.
> 
> I can't see an easy way to implement the payee --> payer comms reliably:
> to be reliable it would have to be published on-chain in the commit tx.
> (Which we could do by constructing HTLCs such that they require a blob
> signed by the payee, but that's traceable ...).
> 
> Mats and Laolu wanted to add an arbitrary comms protocol layer, but I
> think that's something we can defer.

In Amiko Pay, payer <-> payee communication is done on a direct TCP
stream between them. Note that this also reduces latency: once
transaction locking reaches the payee, the payee knows (s)he's capable
of claiming the money, and can tell the payer that the payment is
completed. If reduced latency is in the interest of the payee, this is
likely to happen.

> > * Reserving before locking: this is an optimization, to reduce the risk
> > of locking funds in payment channels on a part of the route, and then
> > having to undo the locking when it turns out that the remaining part of
> > the route doesn't exist (anymore). Reserving is an informal(*),
> > temporary locking of funds for use in the transaction, and can be done
> > and undone very fast, without any channel operations. It is done
> > together with route searching + establishment.
> 
> I think that trades one DoS for another, though.  It saves cryptographic
> constructs, but latency is the real cost, and this increases it.
> 
> Of course, we'll have to revisit that if the network in practice proves
> subject to these problems...

For one category of channel designs, reserving is absolutely essential:
channels where bi-directional payments are made possible with a
decrementing lock time. There, you want to make sure that failed routing
attempts don't cause lock time decrements, since that would reduce the
channel lifetime more than necessary. I'd have to check whether there is
still any use case for this channel design, and whether the reserving
step is important for some other reason.

Note that reserving is necessary for bi-directional routing: on the
payee side of the meeting point, routing happens in the payee -> meeting
point direction, but locking has to happen in the meeting point -> payee
direction. So, they have to be different steps.
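
To illustrate the difference between the two steps, here is a rough
sketch of how a node might keep track of reserved versus locked funds on
one side of a channel. The names (ChannelSide, reserve, unreserve, lock)
are hypothetical and this is not how Amiko Pay is actually structured;
the only point is that reserving and un-reserving are pure local
bookkeeping, while locking is where a real channel operation would be
needed.

```python
class ChannelSide:
    """Funds on our side of one payment channel (illustrative only)."""

    def __init__(self, balance):
        self.balance = balance
        self.reserved = {}  # tx_id -> amount; informal, local bookkeeping
        self.locked = {}    # tx_id -> amount; would need a channel update

    def available(self):
        """Funds that are neither reserved nor locked."""
        used = sum(self.reserved.values()) + sum(self.locked.values())
        return self.balance - used

    def reserve(self, tx_id, amount):
        """Done during route searching; no channel operation involved."""
        if amount > self.available():
            return False
        self.reserved[tx_id] = amount
        return True

    def unreserve(self, tx_id):
        """Undo a reservation, e.g. when the rest of the route fails."""
        self.reserved.pop(tx_id, None)

    def lock(self, tx_id):
        """Turn a reservation into a lock once the route is established.

        A real implementation would perform the channel operation
        (e.g. adding an HTLC) here; that part is omitted.
        """
        self.locked[tx_id] = self.reserved.pop(tx_id)
```

A failed routing attempt then only costs a reserve() followed by an
unreserve(), and never touches the channel itself.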

On latency: what latency do you think is needed for different use cases,
and what can we reach? Does this extra step really make a difference?

My estimate is that we'll typically have 10 hops ("six degrees of
separation" theory), and 100ms to transmit a message(*) over one hop.

Without reserving, you need to traverse all hops once(**) (the locking
operation) before payer(***)+payee know that the transaction has
succeeded. Actual settlement on the channels happens afterwards, but is
no longer critical for the latency as seen by payer+payee.

With reserving, you need to traverse all hops three times(**) in the
worst case, where the meeting point is at one of the endpoints of the
route: once for making the route and reserving funds, once for
confirming that the route has been established, and once for locking.

So, instead of one second (10 hops x 100ms), a transaction might take
three seconds (3 x 10 hops x 100ms). Is that a game changer? Maybe it
is for e.g. public transport access gates,
where passenger throughput is essential. But then, people could reduce
latency a lot by having a direct channel with the public transport
operator.
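
The arithmetic behind these figures, as a small sketch. The 10 hops and
100ms per hop are my estimates from above; the function name is made up
for illustration, and the single-hop figure is only my own
extrapolation of the same formula to a direct channel.

```python
def end_to_end_latency_ms(hops, per_hop_ms, traversals):
    """Latency seen by payer+payee, ignoring failed routing attempts."""
    return hops * per_hop_ms * traversals


HOPS = 10         # estimated typical route length
PER_HOP_MS = 100  # estimated time to transmit one message over one hop

# Without reserving: one traversal (locking).
without_reserving = end_to_end_latency_ms(HOPS, PER_HOP_MS, 1)

# With reserving, worst case: route + reserve, confirm, lock.
with_reserving = end_to_end_latency_ms(HOPS, PER_HOP_MS, 3)

# Direct channel with the operator: a single hop, still three traversals.
direct_channel = end_to_end_latency_ms(1, PER_HOP_MS, 3)

print(without_reserving, with_reserving, direct_channel)
```

Under these assumptions, the number of hops matters at least as much as
the number of traversals.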

For some use cases, e.g. high-frequency trading, people might of course
manually optimize their network and physical location to get better
figures than this.

CJP

(*) Not counting sending the confirmation back: a node that receives a
message can immediately forward a message on the next hop; message
confirmation on the receiving side can occur in parallel.

(**) Not counting failed routing attempts

(***) Assuming the payee tells the payer directly about the payment
success, over a low-latency connection