Date: Mon, 11 Sep 2023 10:56:40 +1000
From: Anthony Towns <aj@erisian.com.au>
To: jlspc <jlspc@protonmail.com>
Message-ID: <ZP5lyA9CCUf138ve@erisian.com.au>
References: <vUfA21-18moEP9UiwbWvzpwxxn83yJQ0J4YsnzK4iQGieArfWPpIZblsVs1yxEs9NBpqoMBISuufMsckbuWXZE1qkzXkf36oJKfwDVqQ2as=@protonmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <vUfA21-18moEP9UiwbWvzpwxxn83yJQ0J4YsnzK4iQGieArfWPpIZblsVs1yxEs9NBpqoMBISuufMsckbuWXZE1qkzXkf36oJKfwDVqQ2as=@protonmail.com>
X-Spam_score: -1.0
X-Spam_bar: -
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>,
 "lightning-dev@lists.linuxfoundation.org"
 <lightning-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] [Lightning-dev] Scaling Lightning With Simple
	Covenants
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sun, 10 Sep 2023 15:19:53 -0000

On Fri, Sep 08, 2023 at 06:54:46PM +0000, jlspc via Lightning-dev wrote:
> TL;DR
> =====

I haven't really digested this, but I think there's a trust vs
capital-efficiency tradeoff here that's worth extracting.

Suppose you have a single UTXO, that's claimable by "B" at time T+L,
but at time T that UTXO holds funds belonging not only to B, but also
millions of casual users, C_1..C_1000000. If B cheats (eg by not signing
any further lightning updates between now and time T+L), then each
casual user needs to drop their channel to the chain, or else lose all
their funds. (Passive rollovers doesn't change this -- it just moves the
responsibility for dropping the channel to the chain to some other
participant)

That then faces the "thundering herd" problem -- instead of the single
one-in/one-out tx that we expected when B is doing the right thing,
we're instead seeing between 1M and 2M on-chain txs as everyone recovers
their funds (the number of casual users multiplied by some factor that
depends on how many outputs each internal tx has).

But whether an additional couple of million txs is a problem depends
on how long a timeframe they're spread over -- if it's a day or two,
then it might simply be impossible; if it's over a year or more, it
may not even be noticable; if it's somewhere in between, it might just
mean you're paying a modest amount in additional fees than you'd have
normally expected.

Suppose that casual users have a factor in mind, eg "If worst comes to
worst, and everyone decides to exit at the same time I do, I want to be
sure that only generates 100 extra transactions per block if everyone
wants to recover their funds prior to B being able to steal everything".

Then in that case, they can calculate along the following lines: 1M users
with 2-outputs per internal tx means 2M transactions, divide that by 100
gives 20k blocks, at 144 blocks per day, that's 5 months. Therefore,
I'm going to ensure all my funds are rolled over to a new utxo while
there's at least 5 months left on the timeout.

That lowers B's capital efficiency -- if all the causal users follow
that policy, then B is going to own all the funds in Fx for five whole
months before it can access them. So each utxo here has its total
lifetime (L) actually split into two phases: an active lifetime LA of
some period, and an inactive lifetime of LI=5 months, which would have
been used by everyone to recover their funds if B had attempted to block
normal rollover. The capital efficiency is then reduced by a factor of
1/(1+LA/LI). (LI is dependent on the number of users, their willingness
to pay high fees to recover their funds, and global blockchain capacity,
LA is L-LI, L is your choice)

Note that casual users can't easily reduce their LI timeout just by
having the provider split them into different utxos -- if the provider
cheats/fails, that's almost certainly a correlated across all their
utxos, and all the participants across each of those utxos will need
to drop to the chain to preserve their funds, each competing with each
other for confirmation.

Also, if different providers collude, they can cause problems: if you
expected 2M transactions over five months due to one provider failing,
that's one thing; but if a dozen providers fail simultaneously, then that
balloons up to perhaps 24M txs over the same five months, or perhaps 25%
of every block, which may be quite a different matter.

Ignoring that caveat, what do numbers here look like? If you're a provider
who issues a new utxo every week (so new customers can join without too
much delay), have a million casual users as customers, and target LA=16
weeks (~3.5 months), so users don't need to rollover too frequently,
and each user has a balanced channel with $2000 of their own funds,
and $2000 of your funds, so they can both pay and be paid, then your
utxos might look like:

   active_1 through active_16: 62,500 users each; $250M balance each
   inactive_17 through inactive_35: $250M balance each, all your funds,
      waiting for timeout to be usable

That's:
  * $2B of user funds
  * $2B of your funds in active channels
  * $4.5B of your funds locked up, waiting for timeout

In that case, only 30% of the $6.5B worth of working capital that you've
dedicated to lightning is actually available for routing.

Optimising that formula by making LA as large as possible doesn't
necessarily work -- if a casual user spends all their funds and
disappears prior to the active lifetime running out, then those
funds can't be easily spent by B until the total lifetime runs out,
so depending on how persistent your casual users are, I think that's
another way of ending up with your capital locked up unproductively.
(There are probably ways around this with additional complexity: eg,
you could peer with a dedicated node, and have the timeout path be
"you+them+timeout", so that while you could steal from casual users who
don't rollover, you can't steal from your dedicated peer, so that $4.5B
could be rolled into a channel with them, and used for routing)

You could perhaps also vary the timeout at different layers of the
internal tree -- if you have 500k users with a $10 balance, and give them
a timeout of 16 weeks, and give the remaining 500k with an average $2000
balance a timeout of 26 weeks, then each will calculate LI=10 weeks,
and the $10 folks will rollover at 1.5 months, and the remainder will
rollover at about 4 months; but your idle balance will be $5M for 20
weeks plus $1B for 10 weeks, rather than $1.005B for 20 weeks.

Anyway, I think that's an interesting way of capturing a big concern
with this sort of approach (namely, "what happens if the nice, scalable
path doesn't work, and we have to dump *LOTS* of stuff onchain") in a
measurable way.

Cheers,
aj