From joost.jager at gmail.com  Fri Jun 14 10:59:26 2019
From: joost.jager at gmail.com (Joost Jager)
Date: Fri, 14 Jun 2019 12:59:26 +0200
Subject: [Lightning-dev] Improve Lightning payment reliability through
 better error attribution
In-Reply-To: <Cc2L0OrkDyHH7y1t6--ndZY63XWDgEhTWEyPYFdGhFcBdEPAWSXw0jvKsVM3hdzLCNJy2mUxuPtJtdSmoydkJyuv-CRG8yuW3zv6l5QRMpk=@protonmail.com>
References: <CAJBJmV-TGo0sE2-3GVtDvewj8E=ONd9bv-2bRqjkjV870qrCDQ@mail.gmail.com>
	<CACdvm3OXibyyBJW9NgK_o0m3W0VK0bodpnZ3a4+UdgP+Jux45w@mail.gmail.com>
	<VLqGxfXFptkC42VUIja6DsiVTFfFF3M2CqJifBOdb0bMHTKySFbm-tVl-y8GWCuNSh4qIriy0EAiv3n0j_8jJiEdgC8aI6ZdeQIdHDGQjP0=@protonmail.com>
	<CAJBJmV-Wg5KAhVsgVJJv8Bp52HDMP6K0vd5t+Gyekxrn1n0DkA@mail.gmail.com>
	<CAJBJmV--+RYNJaH=g10EKgGh==47jBwO922PpevoQTzgyocw=A@mail.gmail.com>
	<CAJBJmV-WEDjZW8S5Ud=6NpZcgC+3Eu56piBVHVd3Eb_JA-Gr+g@mail.gmail.com>
	<Cc2L0OrkDyHH7y1t6--ndZY63XWDgEhTWEyPYFdGhFcBdEPAWSXw0jvKsVM3hdzLCNJy2mUxuPtJtdSmoydkJyuv-CRG8yuW3zv6l5QRMpk=@protonmail.com>
Message-ID: <CAJBJmV9t2ygmn2o6bFCwdXpebVKHAQebZUUdt9QfTugcdsCzhA@mail.gmail.com>

Hi ZmnSCPxj,


> > That is definitely a concern. It is up to senders how to interpret the
> > received timestamps. They can decide to tolerate slight variations. Or
> > they could just look at the difference between the in and out
> > timestamps, abandoning the synchronization requirement altogether (a
> > node could also just report that difference instead of two timestamps).
> > The held duration is enough to identify a pair of nodes, one of which
> > is responsible for the delay.
> >
> > Example (held durations in parentheses):
> >
> > A (15 secs) -> B (14 secs) -> C (3 secs) -> D (2 secs)
> >
> > In this case either B or C is delaying the payment. We'd penalize the
> > channel between B and C.
>
> This seems better.
> If B is at fault, it could lie and reduce its reported delta time, but
> that simply means it will be punished with A.
> If C is at fault, it could lie and increase its reported delta time, but
> that simply means it will be punished with D.
>
> I presume that the delta time is the time difference from when it sends
> `update_add_htlc` and when it receives `update_fulfill_htlc`, or when it
> gets an irrevocably committed `update_fail_htlc` + `revoke_and_ack`.
> Is that accurate?
>

Yes, that is accurate, although using the time difference between receiving
the `update_add_htlc` and sending back the `update_fail_htlc` would work
too. That measurement would then also include the node's own processing time.
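
To make the attribution concrete, here is a minimal sketch in Python. It is
not taken from any implementation: the function name `suspect_pair` and the
list-of-tuples input are assumptions made purely for illustration. Given the
held duration reported by each node along the route, it returns the adjacent
pair with the largest drop, which is the channel the sender would penalize.

```python
from typing import List, Tuple


def suspect_pair(held_ms: List[Tuple[str, int]]) -> Tuple[str, str]:
    """Return the adjacent pair of nodes between which the held duration
    drops the most. One of the two is assumed to be causing the delay.

    held_ms is a hypothetical input format: (node name, held duration in
    milliseconds), ordered from the first hop to the last.
    """
    if len(held_ms) < 2:
        raise ValueError("need at least two nodes to form a pair")
    # Index of the hop with the largest difference between consecutive nodes.
    worst = max(
        range(len(held_ms) - 1),
        key=lambda i: held_ms[i][1] - held_ms[i + 1][1],
    )
    return held_ms[worst][0], held_ms[worst + 1][0]


# The example from this thread, expressed in milliseconds.
route = [("A", 15_000), ("B", 14_000), ("C", 3_000), ("D", 2_000)]
print(suspect_pair(route))  # ('B', 'C')
```

The same function also reproduces the lying cases discussed above: if B
under-reports (say 4 secs), the largest drop moves to A/B, and if C
over-reports (say 13 secs), it moves to C/D.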


> Unit should probably be milliseconds
>

Yes, we probably want sub-second resolution for this.

> An alternative that comes to mind is to use active probing and tracking
> persistent data per node.
>
> For each node we record two pieces of information:
>
> 1.  Total imposed delay.
> 2.  Number of attempts.
>
> Suppose a probe or payment takes N milliseconds on a route with M nodes to
> fulfill or irrevocably fail at the payer.
> For each node on the route, we increase Total imposed delay by N / M
> rounded up, and increment Number of attempts.
> For error reports we can shorten the route if we get an error response
> that points to a specific failing node, or penalize the entire route in
> case of a completely undecodable error response.
>
> When finding a route for a "real" payment, we adjust the cost of
> traversing a node by the ratio Total imposed delay / Number of attempts (we
> can avoid undefined math by starting both fields at 1).
> For probes we can probably ignore this factor in order to give nodes that
> happened to be borked by a different slow node on the trial route another
> chance to exonerate their apparent slowness.
>
> This does not need changes in the current spec.
>
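
For reference, the per-node bookkeeping proposed above could look roughly
like the following Python sketch. The class and method names
(`NodeDelayStats`, `record`, `penalty`) are hypothetical, and how the
resulting ratio is folded into the routing cost is left open, since the
proposal does not specify it.

```python
from collections import defaultdict
from typing import Dict, List


class NodeDelayStats:
    """Tracks 'total imposed delay' and 'number of attempts' per node."""

    def __init__(self) -> None:
        # Both fields start at 1, as suggested, to avoid dividing by zero.
        self.total_delay_ms: Dict[str, int] = defaultdict(lambda: 1)
        self.attempts: Dict[str, int] = defaultdict(lambda: 1)

    def record(self, route: List[str], elapsed_ms: int) -> None:
        """Spread the N milliseconds of a probe or payment over the M nodes
        on the route, adding N / M (rounded up) to each node."""
        if not route:
            raise ValueError("route must contain at least one node")
        share = (elapsed_ms + len(route) - 1) // len(route)  # integer ceiling
        for node in route:
            self.total_delay_ms[node] += share
            self.attempts[node] += 1

    def penalty(self, node: str) -> float:
        """Ratio total imposed delay / number of attempts for a node."""
        return self.total_delay_ms[node] / self.attempts[node]


stats = NodeDelayStats()
stats.record(["B", "C", "D"], 10_000)  # each node is charged ceil(10000 / 3) = 3334
print(stats.penalty("B"))  # 1667.5
```

Shortening the route on a decodable error would simply mean passing only
the nodes up to the failing one to `record`.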

I think we could indeed do more with the information that we currently have
and gather some more by probing. But in the end we would still be sampling
a noisy signal. That means more scenarios to take into account, less
accurate results, and probably more non-ideal payment attempts. Failed, slow
or stuck payments degrade the user experience of Lightning, while "fat
errors" arguably don't impact the user in a noticeable way.

Joost