From hiroki.gondo at nayuta.co  Tue Jun 25 08:20:12 2019
From: hiroki.gondo at nayuta.co (Hiroki Gondo)
Date: Tue, 25 Jun 2019 17:20:12 +0900
Subject: [Lightning-dev] Proposal for Stuckless Payment
Message-ID: <CAO6oAq0etMXQj7mMvC=bJ8HLqZ5Aw8v_D7iT0eA73_d1v_+r9A@mail.gmail.com>

Problem

===============================

With the current BOLT 1.x, there is a theoretical possibility that
payments get stuck. In the actual network, this problem appears to have
improved significantly thanks to the maturity of each implementation and
the preventive ping-before-commitment_signed [1]. However, payments can
still get stuck because of an unstable communication environment, or
faults or malice at intermediate nodes, etc., and we need to estimate in
advance the cost of dealing with such problems.


Proposal Summary

===============================

The following proposal is intended to reduce the number of payments that
get stuck. Both the payee and the payer provide keys to settle the payment,
so it is possible to safely retry each phase that composes one payment.
Also, the involvement of the intermediate nodes can be eliminated.

In other words, we can ensure that this problem involves only the two
parties (the payer and the payee). This proposal is not compatible with
BOLT 1.x, and it's assumed to apply to future specifications.


How to deal with payments getting stuck at present?

===============================

If payments get stuck, we will need additional *trusted* operations by the
final node's implementation, the application or the service operator. For
example, when a payer orders a cup of coffee and attempts to pay the
invoice, but the payment gets stuck, what must we do to correct the problem?

If an `update_add_htlc` gets stuck on an intermediate node (strictly
speaking, if the `revoke_and_ack` answering the `commitment_signed` for
that message never arrives), the payer cannot ignore the issue. The HTLC
may eventually be fulfilled or failed, but in the worst case the payer
must wait until the `cltv_expiry`. That is not realistic for UX.

There are obvious problems with retrying the payment over a different
route using the same invoice. After a successful retry, if the previously
stuck payment starts moving again and also succeeds, the payer will pay
twice for the same invoice. The payer cannot obtain any information that
proves he has paid twice (he has only a single preimage). Although the
payee may fail the payment that arrives second for the same invoice, that
is a *trusted* operation dependent upon the final node implementation.
Also, if an `update_fulfill_htlc` gets stuck on the return, there is no
solution.

If the payer receives another invoice from the payee and pays again, and if
both payments succeed, the payer must obtain a refund from the payee for
the extra payment. This requires additional *trusted* operations
dependent on the application or the service operator.


Key Provided by Payer

===============================

As I previously mentioned, if an `update_add_htlc` gets stuck, the payer
cannot ignore it. Because the payee holds the key (preimage) that unlocks
the HTLCs in BOLT 1.x, the HTLCs may be fulfilled at a time the payer does
not intend.

What happens if the payer provides the key? For example, in the original
AMP [2], the payer provides the preimage (this proposal may be useful for
AMP, but in this example it is not meaningful to be *multi-path*, so please
imagine a simple *single-path*). The preimage is sent in the onion of the
`update_add_htlc`s. In other words, the HTLCs (locks) and the preimage
(key) are sent together.

        A --> B --> C --> D        # update_add_htlc, preimage

        A <-- B <-- C <-- D        # update_fulfill_htlc

I would like to separate the process into two phases: "add HTLCs (locks)"
and "provide a preimage (key)". After the HTLCs are added along the route
by `update_add_htlc`s that do not include the preimage, the payee returns
an ACK to the payer. The payer then provides the preimage to the payee in
the next phase. In this way, if an `update_add_htlc` gets stuck and the ACK
is not returned within a timeout more realistic than the `cltv_expiry`
(e.g. 1 min), the payer simply forgets the payment and can retry with
another invoice without worrying about overpayment.

        A --> B --> C --> D        # update_add_htlc

        A <-- B <-- C <-- D        # ACK

        A --> B --> C --> D        # preimage

        A <-- B <-- C <-- D        # update_fulfill_htlc

However, this procedure does not yet solve the problem if an
`update_fulfill_htlc` gets stuck.
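
The two-phase flow above can be sketched from the payer's side as follows.
This is only an illustration under stated assumptions: the names
`attempt_payment` and `pay_with_retry` are hypothetical, a `queue.Queue`
stands in for the network path on which the ACK would arrive, and the
timeout is shortened from the "e.g. 1 min" in the text.

```python
import queue
import secrets

ACK_TIMEOUT_SECONDS = 0.05  # stand-in for a realistic timeout such as 1 min


def attempt_payment(ack_channel, preimage, timeout):
    """Phase 1: wait for the payee's ACK. Phase 2: release the preimage.

    Returns the preimage to send if the ACK arrived in time. Returns None
    otherwise, in which case the key was never revealed and the payer can
    forget this attempt.
    """
    try:
        ack_channel.get(timeout=timeout)
    except queue.Empty:
        return None
    return preimage


def pay_with_retry(ack_channels, timeout=ACK_TIMEOUT_SECONDS):
    """Try one attempt per channel, each with its own fresh preimage.

    Returns (attempt_index, preimage) for the first attempt that was ACKed,
    or None if every attempt timed out.
    """
    for index, channel in enumerate(ack_channels):
        preimage = secrets.token_bytes(32)  # hypothetical payer-chosen key
        released = attempt_payment(channel, preimage, timeout)
        if released is not None:
            return index, released
    return None


# First attempt is stuck (no ACK ever arrives), second one is ACKed.
stuck_route = queue.Queue()
good_route = queue.Queue()
good_route.put("ACK")
print(pay_with_retry([stuck_route, good_route]))
```

Because the preimage of the stuck attempt is never sent, the stuck HTLCs
cannot be fulfilled later, which is what makes the retry safe in this
sketch.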


Proof of Payment

===============================

If the payer provides the preimage, the payer cannot get a proof of
payment (PoP).

PoP is important. It matters as evidence in court, but it is also
necessary if upper layer applications want to proceed to the next action
based on the payment outcome. However, if we do not need to maintain
compatibility with BOLT 1.x, it is possible to add PoP to the above case:
HTLCs would need two keys, one from the payee and one from the payer,
instead of a single key (preimage) from the payee.

I have another consideration regarding PoP. In BOLT 1.x, a preimage is
returned along the route. The payer receives the PoP when he receives the
preimage in the `update_fulfill_htlc` directly from his peer. Thus, if an
`update_fulfill_htlc` gets stuck somewhere in the route, the payer cannot
obtain the PoP.

        A --> B --> C --> D        # update_add_htlc

        A     B x-- C <-- D        # update_fulfill_htlc (stuck)

However, the payee can provide the preimage to the payer earlier. If the
`update_add_htlc` is irrevocably committed at the payee's own channel and
there is no problem with the parameters, it is safe to send the preimage to
the payer skipping the intermediate nodes or using an alternate route (if
possible).

        A --> B --> C --> D        # update_add_htlc

        A <-------------- D        # preimage

        A     B x-- C <-- D        # update_fulfill_htlc (stuck)
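
However the preimage reaches the payer, he can check it locally against
the payment hash from the invoice, since in BOLT 1.x the payment hash is
the SHA-256 of the preimage. A minimal sketch (the function name
`is_valid_pop` is hypothetical):

```python
import hashlib
import hmac
import secrets


def is_valid_pop(payment_hash: bytes, preimage: bytes) -> bool:
    """Return True if the preimage hashes to the invoice's payment hash."""
    digest = hashlib.sha256(preimage).digest()
    return hmac.compare_digest(digest, payment_hash)


# Payee side: pick a preimage and publish its hash in the invoice.
preimage = secrets.token_bytes(32)
payment_hash = hashlib.sha256(preimage).digest()

# Payer side: the same check applies whether the preimage arrived in an
# `update_fulfill_htlc` or directly from the payee over another route.
print(is_valid_pop(payment_hash, preimage))
```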


e.g. Modifications of "Multi-Hop Locks from Scriptless Scripts"

===============================

Based on the above considerations, I will describe the stuckless payment
protocol, using "Multi-Hop Locks from Scriptless Scripts" as the starting
point.

Multi-Hop Locks from Scriptless Scripts

https://github.com/apoelstra/scriptless-scripts/blob/master/md/multi-hop-locks.md

(commit 94a4e2f961c839bd1b9ca8773abadbf0f198c34b)

I will modify the following sequence.

https://raw.githubusercontent.com/apoelstra/scriptless-scripts/master/md/images/multi-hop-locks.png

The modifications are as follows.

Setup: At the end of this phase, do NOT send the key (`y0+y1+y2`) from A to
D yet.

Update: At the end of this phase, D returns the ACK to A.

        A <-- B <-- C <-- D        # ACK

Pre-Settlement: Add this new phase after the Update phase. Any route can be
used.

        A --> * --> * --> D        # key (`y0+y1+y2`)

        A <-- * <-- * <-- D        # PoP (`z`)

Settlement: No change.

Let's look at the details.

In the original sequence, *the payer* provides the key (`y0+y1+y2`) to
unlock the HTLCs (like the original AMP). The sequence has also introduced
the PoP (`z`) provided by *the payee*, which already meets, to some extent,
the requirements of what I will describe.

Setup:

In the original sequence, the resistance to stuck payments is the same as
in the BOLT 1.x. All we need to do is to separate the process sending the
key (`y0+y1+y2`) from the Setup phase and bring it after the Update phase.
This prevents the payee from immediately moving to the Settlement phase
after the Update phase is complete. Therefore, if a stuck payment in the
Update phase suddenly begins moving, the phase cannot automatically move to
the Settlement phase against the will of the payer.

Update:

At the end of this phase, we require the payee to return the ACK to the
payer to signal that the phase is complete. It must be guaranteed that the
payee himself returns it. This can be achieved by reversing the route and
wrapping the ACK in the onion hop by hop, as is done for the `reason`
field of the `update_fail_htlc` in BOLT 1.x.

If the payment gets stuck in this phase, the payer can create a new key and
reuse the PoP (`z`) to start over from the Setup phase. Since the key of
the previous stuck payment has not been sent to the payee, the stuck HTLCs
can be left and they will be removed later.
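
The retry can be illustrated with toy arithmetic. The sketch below assumes
that the lock on the payee's incoming HTLC is the invoice point `z*G` plus
`y_i*G` for each hop, which is a simplified reading of the referenced
document. It uses integers modulo a prime under multiplication as a
stand-in group, which is NOT secure and only mirrors the algebra. All
names are hypothetical.

```python
import secrets

# Toy group for illustration only: NOT a secure choice of parameters.
P = 2**127 - 1
G = 3


def point(scalar):
    """Stand-in for scalar*G."""
    return pow(G, scalar, P)


def add_points(a, b):
    """Stand-in for point addition."""
    return (a * b) % P


def fresh_keys(hops):
    """Payer: choose one random scalar y_i per hop."""
    return [secrets.randbelow(P - 1) for _ in range(hops)]


def final_lock(z_point, keys):
    """Lock on the payee's incoming HTLC: z*G + y0*G + y1*G + ..."""
    lock = z_point
    for y in keys:
        lock = add_points(lock, point(y))
    return lock


def opens(z, key_sum, lock):
    """Payee: can z plus the received key sum open this lock?"""
    return point(z + key_sum) == lock


# Payee's secret z and the point published in the invoice.
z = secrets.randbelow(P - 1)
z_point = point(z)

# First attempt gets stuck in the Update phase; its key is never sent.
stuck_keys = fresh_keys(3)
stuck_lock = final_lock(z_point, stuck_keys)

# Retry: new keys, same PoP point.
retry_keys = fresh_keys(3)
retry_lock = final_lock(z_point, retry_keys)

# The payee only ever receives the key of the retry.
print(opens(z, sum(retry_keys), retry_lock))
print(opens(z, sum(retry_keys), stuck_lock))
```

The first check succeeds by construction. The second shows that the key of
the retry does not open the lock of the stuck attempt unless the two key
sums happen to coincide, so the stuck HTLCs stay locked.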

Pre-Settlement:

The new Pre-Settlement phase is the actual settlement phase between the
payer and the payee. When the payer receives the ACK at the end of the
Update phase, he can send the key to the payee. Since this phase just
passes data between two points (unlike adding HTLCs), we can safely retry
it if it fails, and (if possible) we need not use the same route or
routing protocol as the Update phase.

After the payee receives and verifies the key from the payer, he can send
the PoP (`z`) to the payer as the response. The payee can do this before
the Settlement phase if he can verify that the received key is for his own
incoming HTLC.
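
The exchange in this phase can be sketched as a request/response that the
payer retries over whatever transports are available. As before, this is a
toy under assumptions: the group is integers modulo a prime under
multiplication (not secure), the lock on the payee's incoming HTLC is
taken to be `(z + key)*G`, and `make_payee`, `pre_settle` and the transport
callables are hypothetical.

```python
# Toy group for illustration only: NOT a secure choice of parameters.
P = 2**127 - 1
G = 3


def point(scalar):
    """Stand-in for scalar*G."""
    return pow(G, scalar, P)


def make_payee(z, incoming_lock):
    """Build the payee's handler for a received key.

    The handler returns the PoP z only if the key matches the lock on the
    payee's own incoming HTLC, and None otherwise.
    """
    def handle_key(key):
        if point(z + key) == incoming_lock:
            return z
        return None
    return handle_key


def pre_settle(transports, key, z_point):
    """Payer: send the key over each transport until a valid PoP comes back.

    Each transport is a callable taking the key and returning the payee's
    response, or raising ConnectionError if delivery fails.
    """
    for send in transports:
        try:
            candidate = send(key)
        except ConnectionError:
            continue  # nothing was locked by this message, so just retry
        if candidate is not None and point(candidate) == z_point:
            return candidate
    return None


def broken_route(key):
    """A transport that always fails to deliver."""
    raise ConnectionError("route unavailable")


# Example values: z = 7 is the payee's secret, 10 is the payer's key sum.
payee = make_payee(7, point(7 + 10))
print(pre_settle([broken_route, payee], 10, point(7)))
```

Sending the key a second time over another route gives the payee nothing
new in this sketch, which is why the retry is harmless.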

Settlement:

The Settlement phase is the same as that of the original sequence. Even if
a message gets stuck in this phase, the payment itself is not affected
since the settlement between the payer and the payee has already been
substantially completed in the Pre-Settlement phase.

These modifications add the cost of three new messages (ACK, key, PoP), but
only those three (no other messages accompany them). They may also reduce
the need for other preventive messages.


Conclusion

===============================

In this proposal, the probability that messages in each phase get stuck is
at the same level as BOLT 1.x, but even if they get stuck, it is possible
to safely retry each phase that composes one payment, and the number of
payments completely stuck will be reduced.

In fact, the stuck problem will be limited to the Pre-Settlement phase
between the two parties (the payer and the payee). This is the case where
the payer sends the key to the payee, but the PoP (`z`) is not returned. I
have already mentioned that the payer can safely retry this phase and does
not have to use the same route or routing protocol as the Update phase.

Any remaining problems would be caused by the payee's own trouble or
malice, which means that no intermediate node is involved in the stuck
problem, and the problem becomes one that involves only the two parties
(the payer and the payee). This improvement allows us to see the Lightning
Network as more trustless.


Hiroki Gondo


[1] https://github.com/lightningnetwork/lightning-rfc/pull/508

[2] https://lists.linuxfoundation.org/pipermail/lightning-dev/2018-February/000993.html