From laolu32 at gmail.com  Mon Aug 16 19:55:48 2021
From: laolu32 at gmail.com (Olaoluwa Osuntokun)
Date: Mon, 16 Aug 2021 12:55:48 -0700
Subject: [Lightning-dev] #zerobasefee
In-Reply-To: <8c14f528-03e8-495a-bf0f-27c15fe1e6b0@mattcorallo.com>
References: <20210815010037.GA3971@erisian.com.au>
	<8c14f528-03e8-495a-bf0f-27c15fe1e6b0@mattcorallo.com>
Message-ID:

Matt wrote:
> I'm frankly still very confused why we're having these conversations now

1000% this!!

This entire conversation strikes me as extremely premature and backwards
tbh. Someone experimenting with a new approach shouldn't prompt us to
immediately modify the protocol to better "fit" the approach, particularly
before any sort of comparative analysis has even been published. At this
point, to my knowledge we don't even have an independent implementation of
the algorithm that has been tightly integrated into an existing LN
implementation. We don't know in which conditions the algorithm excels,
and in which conditions the improvement may be negligible (likely when
payAmt << chanCapacity).

I think part of the difficulty here lies in the current lack of a robust
framework for comparing the efficacy of different approaches. Even in this
domain, there are a number of end traits to optimize for, including: path
finding length, total CLTV delay across all shards, the number of
resulting splits (the goal should be to consume less commitment space),
attempt iteration latency, amount/path randomization, path finding memory,
etc.

This also isn't the first time someone has attempted to adapt typical
flow-based algorithms to path finding in the LN setting. T-bast from ACINQ
initially attempted to adapt a greedy flow-based algorithm [1], but found
that a number of implementation-related edge cases (he cites the min+max
constraints, in addition to the normal fee limit most implementations
have, as barriers to adapting the algorithm) led him to go with a simpler
approach to then iterate off of. I'd be curious to hear from T-bast w.r.t.
how this new approach differs from his initial approach, and if he spots
any yet-to-be-recognized implementation-level complexities to properly
integrating flow-based algorithms into path finding.

> a) to my knowledge, no one has (yet) done any follow-on work to
> investigate pulling many of the same heuristics Rene et al use in a
> Dijkstras/A* algorithm with multiple passes or generating multiple routes
> in the same pass to see whether you can emulate the results in a faster
> algorithm without the drawbacks here,

lnd's current approach (very far from perfect, derived via satisficing)
has some similarities to the flow-based approach in its use of
probabilities to attempt to quantify the level of uncertainty of internal
network channel balances. We start by assuming a config-level a priori
probability of any given route working. We then take that probability and
the fee to route across a given link, and convert the two values into a
scalar "distance"/"weight" (mapping to an expected cost) that we can plug
into vanilla Dijkstra's [2] (rough sketch below). A fresh node uses this
value to compare routes instead of the typical hop-count distance metric.
With a cold cache this doesn't really do much, but then we'll update all
the a priori probabilities with observations we gain in the wild. If a
node is able to forward an HTLC to the next hop, we boost its probability
(conditional on the amount forwarded/failed, so there's a Bayesian
aspect).
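To make the fee + probability -> "distance" conversion above concrete,
here's a minimal, purely illustrative sketch in Go. The function name, the
attempt-cost parameter, and the exact penalty term are my own assumptions
for exposition; this is not lnd's actual weighting code:

    package main

    import (
        "fmt"
        "math"
    )

    // edgeWeight collapses a routing fee and an estimated success
    // probability into a single scalar that a vanilla Dijkstra run can
    // minimize: the fee paid on success, plus a penalty that grows as
    // the success probability shrinks, so cheap-but-unreliable edges
    // don't always win. Illustrative only.
    func edgeWeight(feeMsat, attemptCost int64, prob float64) float64 {
        if prob <= 0 {
            return math.Inf(1) // prune hopeless edges outright
        }
        // Expected-cost flavor: amortize a per-attempt overhead over
        // the probability that this attempt actually succeeds.
        return float64(feeMsat) + float64(attemptCost)/prob
    }

    func main() {
        // A cheap but unreliable edge vs. a pricier, reliable one.
        fmt.Println(edgeWeight(100, 10_000, 0.30)) // ~33433, penalized
        fmt.Println(edgeWeight(500, 10_000, 0.95)) // ~11026, preferred
    }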
Each failure results in the probabilities of nodes being affected
differently (temp chan failure vs no next node, etc). For example, if
we're able to route through the first 3 hops of the route, but the final
hop fails with a temp chan failure, we'll reward all the successful nodes
with a boosted success probability (the default right now is 95%) that
applies when the amount being carried is < that prior attempt. As we
assume balances are always changing, we then apply a half-life decay that
slowly increases a penalized probability back towards the baseline (a toy
sketch of this decay is included below). The resulting algorithm starts
with no information/memory, but then gains information with each attempt
(increasing and decreasing probabilities as a function of the amount
attempted and the time that has passed since the last attempt).

The APIs also let you mark certain nodes as having a higher a priori
probability, which can reduce the amount of bogus path exploration. This
API can be used "at scale" to create a sort of active learning system that
learns from the attempts of a fleet of nodes, wallets, trampoline nodes,
etc (some privacy caveats apply, though there're ways to fuzz things a
bit, differential-privacy style). Other knobs exist, such as the min
probability setting, which controls how high a success probability a
candidate edge needs to have before it is explored. If the algo is
exploring too many lackluster paths (and there're a lot of these on
mainnet due to normal balance imbalance), this value can be increased,
which lets it shed a large number of edges that would otherwise be
explored.

When comparing this to the discussed approach that doesn't use any prior
memory, there may be certain cases that allow this algorithm to "jump" to
the correct route and skip all the pre-processing and optimization that
may end up producing the same route, just with added latency before the
initial attempt. I'm skipping some other details, like how we handle
repeated failures of a node's channels (there's a mechanism that penalizes
them more heavily, as the algo assumes most of the node's channels aren't
well maintained/balanced, so why waste time trying the other 900
channels).

Our approach differs from this new approach more dramatically when it
comes to the question of how to split a payment, either due to a failure,
or when splitting is necessary (amt > max(chanCapacities...)). lnd takes a
very simple approach of just dividing the problem in half (divide and
conquer, so fork into two instances of amt/2) and trying again (also
sketched below). The divide-and-conquer approach typically means you'll
end up with a minimal-ish amount of fees. There's also a knob that lets
you control the largest split size, which can force the algo to split
sooner, as otherwise it only splits when no route exists (either we
explored a ton and they all failed, or the payment amt is larger than the
chan capacity).

The algo works pretty well when the probabilities, combined with well-
tuned parameters, help it naturally ignore lower-probability routes during
the edge relaxation step of Dijkstra's. However, if a client has no prior
observations (and didn't get any from, say, its wallet provider or w/e),
then it can end up exploring poor routes for a while and eventually hit
the default timeout (particularly with a slow disk, but that'll be
optimized away in lnd 0.14). One interesting area of research would be to
investigate if a small amount of flow pre-planning can help the algorithm
effectively re-summarize its current working memory to ignore more paths
we know likely won't work.
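For the half-life decay mentioned above, here's a toy illustration (again
in Go, again not lnd's code; the baseline, penalty, and half-life values
and names are assumptions) of a penalized probability drifting back toward
the a priori baseline:

    package main

    import (
        "fmt"
        "math"
        "time"
    )

    // decayedProb interpolates a penalized success probability back
    // toward the a priori baseline as time passes since the last
    // observed failure: after one halfLife half the penalty remains,
    // after two a quarter, and so on. Toy illustration only.
    func decayedProb(apriori, penalized float64,
        lastFailure time.Time, halfLife time.Duration) float64 {

        elapsed := time.Since(lastFailure)
        if elapsed < 0 {
            elapsed = 0
        }
        remaining := math.Pow(0.5, float64(elapsed)/float64(halfLife))
        return apriori - (apriori-penalized)*remaining
    }

    func main() {
        // Failed 2h ago with a 1h half-life: 3/4 of the penalty is gone.
        lastFail := time.Now().Add(-2 * time.Hour)
        fmt.Printf("%.3f\n", decayedProb(0.6, 0.05, lastFail, time.Hour))
    }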
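And for the divide-and-conquer splitting, a deliberately simplified
sketch; the attempt callback and the max-shard knob here are stand-ins,
not lnd's actual API:

    package main

    import (
        "errors"
        "fmt"
    )

    var errNoRoute = errors.New("no route found")

    // sendWithSplits tries to pay amt in one shot and, if no route
    // exists (or amt exceeds maxShardAmt), forks the problem into two
    // amt/2 attempts. attempt stands in for a real pathfinding+send
    // call; everything here is a simplified illustration.
    func sendWithSplits(amt, maxShardAmt int64, attempt func(int64) error) error {
        if amt <= maxShardAmt {
            if err := attempt(amt); err == nil {
                return nil
            } else if !errors.Is(err, errNoRoute) {
                return err
            }
        }
        if amt < 2 {
            return errNoRoute // can't split any further
        }
        half := amt / 2
        if err := sendWithSplits(half, maxShardAmt, attempt); err != nil {
            return err
        }
        return sendWithSplits(amt-half, maxShardAmt, attempt)
    }

    func main() {
        // Pretend routes only exist for shards of 250k sats or less.
        attempt := func(amt int64) error {
            if amt > 250_000 {
                return errNoRoute
            }
            fmt.Println("sent shard:", amt)
            return nil
        }
        _ = sendWithSplits(1_000_000, 400_000, attempt)
    }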
In the end, there's still so much of the design space that needs
exploring. Settling on something that appears to improve on things in
certain scenarios (so far we're going mainly off of experimental anecdotes
w/ unclear methods), advertising that everything is solved by magically
setting a particular value to zero, and morphing the protocol around it is
a premature declaration of "Mission Accomplished!". In any case, major
kudos to Rene and Stefan for examining the problem in a new light, and
re-invigorating research of related areas!

-- Laolu

[1]: https://github.com/ACINQ/eclair/pull/1427
[2]: https://github.com/lightningnetwork/lnd/blob/958119a12ab60d24c75c7681f344ceb5a450c4ad/routing/pathfind.go#L930

On Sun, Aug 15, 2021 at 5:07 PM Matt Corallo wrote:

> Thanks, AJ, for kicking off the thread.
>
> I'm frankly still very confused why we're having these conversations now.
> In one particular class of applicable routing algorithms you could use
> for lightning routing having a base fee makes the algorithm intractably
> slow, but:
>
> a) to my knowledge, no one has (yet) done any follow-on work to
> investigate pulling many of the same heuristics Rene et al use in a
> Dijkstras/A* algorithm with multiple passes or generating multiple routes
> in the same pass to see whether you can emulate the results in a faster
> algorithm without the drawbacks here,
>
> b) to my knowledge, no one has (yet) done any follow-on work to
> investigate mapping the base fee to other, more
> flow-based-routing-compatible numbers, eg you could convert the base fee
> to a minimum fee by increasing the "effective" proportional fees. From
> what others have commented, this may largely "solve" the issue.
>
> c) to my knowledge, no one has (yet) done any follow-on work to analyze
> where the proposed algorithm may be most optimal in the
> HTLC-value<->channel liquidity ratio ranges. We may find that the
> proposed algorithm only provides materially better routing when the HTLC
> value approaches X% of common network channel liquidity, allowing us to
> only use it for large-value payments where we can almost ignore the base
> fees entirely.
>
> There's real cost to distorting the fee structures on the network away
> from the costs of node operators, especially as we move towards requiring
> and using Real (tm) amounts of capital on routing nodes. If we're relying
> purely on hobbyists forever who are operating out of goodwill, we should
> just remove all fees. If we think Lightning is going to involve capital
> with real opportunity cost, matching fees to the costs is important, or
> at least important enough that we shouldn't throw it away after one
> (pretty great) paper and limited further analysis.
>
> Imagine we find some great way to address HTLC slot flooding/DoS attacks
> (or just chose to do it in a not-great way) by charging for HTLC slot
> usage, now we can't fix a critical DoS issue because the routing
> algorithms we deployed can't handle the new costing. Instead, we should
> investigate how we can apply the ideas here with the more complicated fee
> structures we have.
>
> Color me an optimist, but I'm quite confident with sufficient elbow
> grease and heuristics we can get 95% of the way there. We can and should
> revisit these conversations if such exploration is done and we find that
> its not possible, but until then this all feels incredibly premature.
>
> Matt
>
> On 8/14/21 21:00, Anthony Towns wrote:
> > Hey *,
> >
> > There's been discussions on twitter and elsewhere advocating for
> > setting the BOLT#7 fee_base_msat value [0] to zero. I'm just writing
> > this to summarise my understanding in a place that's able to easily be
> > referenced later.
> >
> > Setting the base fee to zero has a couple of benefits:
> >
> >  - it means you only have one value to optimise when trying to collect
> >    the most fees, and one-dimensional optimisation problems are
> >    obviously easier to write code for than two-dimensional optimisation
> >    problems
> >
> >  - when finding a route, if all the fees on all the channels are
> >    proportional only, you'll never have to worry about paying more fees
> >    just as a result of splitting a payment; that makes routing easier
> >    (see [1])
> >
> > So what's the cost? The cost is that there's no longer a fixed minimum
> > fee -- so if you try sending a 1sat payment you'll pay 0.1% of the fee
> > to send a 1000sat payment, and there may be fixed costs that you have
> > in routing payments that you'd like to be compensated for (eg, the
> > computational work to update channel state, the bandwith to forward the
> > tx, or the opportunity cost for not being able to accept another htlc if
> > you've hit your max htlcs per channel limit).
> >
> > But there's no need to explicitly separate those costs the way we do
> > now; instead of charging 1sat base fee and 0.02% proportional fee,
> > you can instead just set the 0.02% proportional fee and have a minimum
> > payment size of 5000 sats (htlc_minimum_msat=5e6, ~$2), since 0.02%
> > of that is 1sat. Nobody will be asking you to route without offering a
> > fee of at least 1sat, but all the optimisation steps are easier.
> >
> > You could go a step further, and have the node side accept smaller
> > payments despite the htlc minimum setting: eg, accept a 3000 sat payment
> > provided it pays the same fee that a 5000 sat payment would have. That
> > is, treat the setting as minimum_fee=1sat, rather than
> > minimum_amount=5000sat; so the advertised value is just calculated from
> > the real settings, and that nodes that want to send very small values
> > despite having to pay high rates can just invert the calculation.
> >
> > I think something like this approach also makes sense when your channel
> > becomes overloaded; eg if you have x HTLC slots available, and y channel
> > capacity available, setting a minimum payment size of something like
> > y/2/x**2 allows you to accept small payments (good for the network)
> > when you're channel is not busy, but reserves the last slots for larger
> > payments so that you don't end up missing out on profits because you
> > ran out of capacity due to low value spam.
> >
> > Two other aspects related to this:
> >
> > At present, I think all the fixed costs are also incurred even when
> > a htlc fails, so until we have some way of charging failing txs for
> > incurring those costs, it seems a bit backwards to penalise successful
> > txs who at least pay a proportional fee for the same thing. Until we've
> > got a way of handling that, having zero base fee seems at least fair.
> >
> > Lower value HTLCs don't need to be included in the commitment transaction
> > (if they're below the dust level, they definitely shouldn't be included,
> > and if they're less than 1sat they can't be included), and as such don't
> > incur all the same fixed costs that HTLCs that are committed too do.
> > Having different base fees for microtransactions that incur fewer costs
> > would be annoying; so having that be "amortised" into the proportional
> > fee might help there too.
> >
> > I think eltoo can help in two ways by reducing the fixed costs: you no
> > longer need to keep HTLC information around permanently, and if you do
> > a multilevel channel factory setup, you can probably remove the ~400
> > HTLCs per channel at any one time limit. But there's still other fixed
> > costs, so I think that would just lower the fixed costs, not remove them
> > altogether and isn't a fundamental change.
> >
> > I think the fixed costs for forwarding a HTLC are very small; something
> > like:
> >
> >    0.02sats -- cost of permanently storing the HTLC info
> >                (100 bytes, $500/TB/year, 1% discount rate)
> >    0.04sats -- compute and bandwidth cost for updating an HTLC
> >                ($40/month at linode, 1 second of compute)
> >
> > The opportunity cost of having HTLC slots or Bitcoin locked up until
> > the HTLC succeeds/fails could be much more significant, though.
> >
> > Cheers,
> > aj
> >
> > [0] https://github.com/lightningnetwork/lightning-rfc/blob/master/07-routing-gossip.md#the-channel_update-message
> > [1] https://basefee.ln.rene-pickhardt.de/
> >
> > _______________________________________________
> > Lightning-dev mailing list
> > Lightning-dev at lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
>
> _______________________________________________
> Lightning-dev mailing list
> Lightning-dev at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev