From joost.jager at gmail.com Fri Jun 14 10:59:26 2019
From: joost.jager at gmail.com (Joost Jager)
Date: Fri, 14 Jun 2019 12:59:26 +0200
Subject: [Lightning-dev] Improve Lightning payment reliability through better error attribution

Hi ZmnSCPxj,

> > That is definitely a concern. It is up to senders how to interpret the
> > received timestamps. They can decide to tolerate slight variations. Or
> > they could just look at the difference between the in and out timestamp,
> > abandoning the synchronization requirement altogether (a node could also
> > just report that difference instead of two timestamps). The held duration
> > is enough to identify a pair of nodes, one of which is responsible for
> > the delay.
> >
> > Example (held durations in parentheses):
> >
> > A (15 secs) -> B (14 secs) -> C (3 secs) -> D (2 secs)
> >
> > In this case either B or C is delaying the payment. We'd penalize the
> > channel between B and C.
>
> This seems better.
> If B is at fault, it could lie and reduce its reported delta time, but
> that simply means it will be punished with A.
> If C is at fault, it could lie and increase its reported delta time, but
> that simply means it will be punished with D.
>
> I presume that the delta time is the time difference from when it sends
> `update_add_htlc` and when it receives `update_fulfill_htlc`, or when it
> gets an irrevocably committed `update_fail_htlc` + `revoke_and_ack`.
> Is that accurate?

Yes, that is accurate, although using the time difference between receiving
the `update_add_htlc` and sending back the `update_fail_htlc` would work
too. It would then include the node's processing time.

> Unit should probably be milliseconds

Yes, we probably want sub-second resolution for this.

> An alternative that comes to mind is to use active probing and tracking
> persistent data per node.
>
> For each node we record two pieces of information:
>
> 1. Total imposed delay.
> 2. Number of attempts.
>
> Suppose a probe or payment takes N milliseconds on a route with M nodes to
> fulfill or irrevocably fail at the payer.
> For each node on the route, we increase Total imposed delay by N / M
> rounded up, and increment Number of attempts.
> For error reports we can shorten the route if we get an error response
> that points to a specific failing node, or penalize the entire route in
> case of a completely undecodable error response.
>
> When finding a route for a "real" payment, we adjust the cost of
> traversing a node by the ratio Total imposed delay / Number of attempts (we
> can avoid undefined math by starting both fields at 1).
> For probes we can probably ignore this factor in order to give nodes that
> happened to be borked by a different slow node on the trial route another
> chance to exonerate their apparent slowness.
>
> This does not need changes in the current spec.

I think we could indeed do more with the information that we currently have
and gather some more by probing. But in the end we would still be sampling a
noisy signal. That means more scenarios to take into account, less accurate
results, and probably more non-ideal payment attempts. Failed, slow, or stuck
payments degrade the user experience of Lightning, while "fat errors"
arguably don't impact the user in a noticeable way.

Joost
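
To make the two approaches discussed in this mail concrete, here is a rough Python sketch. It is only an illustration under assumptions, not code from any Lightning implementation: the names `suspect_pair`, `NodeStats` and `record_attempt` are hypothetical, durations are assumed to be in milliseconds, and "the pair to penalize" is assumed to be the adjacent pair with the largest drop in reported held duration.

```python
from dataclasses import dataclass


def suspect_pair(route, held_ms):
    """Return the adjacent (upstream, downstream) pair with the largest
    drop in reported held duration. One of the two nodes is assumed to be
    responsible for the delay, so the channel between them is penalized."""
    if len(route) != len(held_ms) or len(route) < 2:
        raise ValueError("need one held duration per node and at least two nodes")
    drops = [held_ms[i] - held_ms[i + 1] for i in range(len(route) - 1)]
    i = max(range(len(drops)), key=drops.__getitem__)
    return route[i], route[i + 1]


@dataclass
class NodeStats:
    # Both fields start at 1 to avoid undefined math, as in the quoted proposal.
    total_delay_ms: int = 1
    attempts: int = 1

    @property
    def average_delay_ms(self):
        return self.total_delay_ms / self.attempts


def record_attempt(stats, route, elapsed_ms):
    """Spread the elapsed time N evenly over the M nodes on the route,
    adding N / M rounded up to each node and counting one attempt."""
    share = -(-elapsed_ms // len(route))  # integer ceiling division
    for node in route:
        node_stats = stats.setdefault(node, NodeStats())
        node_stats.total_delay_ms += share
        node_stats.attempts += 1


route = ["A", "B", "C", "D"]

# Held durations from the example: A 15 s, B 14 s, C 3 s, D 2 s.
print(suspect_pair(route, [15_000, 14_000, 3_000, 2_000]))  # ('B', 'C')

# Probing alternative: a 10 s attempt over the same four-node route.
stats = {}
record_attempt(stats, route, 10_000)
print(stats["A"].average_delay_ms)  # (1 + 2500) / 2 = 1250.5
```

The first function needs the per-hop held durations that the proposed error format would carry. The second works with today's information, but every node on a slow route receives the same share of the blame, which is the noisy signal mentioned above.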