From aj at erisian.com.au  Sat Aug 29 07:42:39 2015
From: aj at erisian.com.au (Anthony Towns)
Date: Sat, 29 Aug 2015 17:42:39 +1000
Subject: [Lightning-dev] A state machine.
In-Reply-To: <877fopi4un.fsf@rustcorp.com.au>
References: <87si7eiehg.fsf@rustcorp.com.au>
	<CAMXsxZNRDkJVa2YFLDk_wyBigdGp-WtqwjeGD4-HDX5JkrZBMA@mail.gmail.com>
	<CAMXsxZNFGOq2j-mXCpeNWSOwavb6D50wsYbL=86fU9_kqYFAGg@mail.gmail.com>
	<87h9ntifwf.fsf@rustcorp.com.au> <877fopi4un.fsf@rustcorp.com.au>
Message-ID: <20150829074239.GA15643@navy>

On Fri, Aug 21, 2015 at 03:02:32PM +0930, Rusty Russell wrote:
> Rusty Russell <rusty at rustcorp.com.au> writes:
> > Yeah.  Let me generate a decent text flowchart for the normal cases...
> I've taken out some transitions for simplicity (eg. ERR_ANCHOR_LOST and
> ERR_INFORMATION_LEAK, which shouldn't happen):

Some thoughs, fwiw.

I think the two INIT states is odd. I guess I would have expected
something more like:
  INIT:
    - (cmd_open_my_anchor) -> INIT_WITHANCHOR
    - (pkt_propose_channel) -> INIT_NOANCHOR

I think the "state" orientation of the dot output isn't great. ie rather
than laying it out like how it is in the code with a state then all the
transitions from that state:

  NORMAL:
    - UPDATE_HTLC -> X
    - UPDATE_PKT -> Y
    - COMPLETE_HTLC -> Z

I'd rather see it laid out as the protocol steps:
  UPDATE_HTLC
    NORMAL -[UPDATE_HTLC]-> X -[ACCEPT]-> X2 ->...

  UPDATE_PKT
    NORMAL -[UPDATE_PKT]-> Y ->...

ie, have multiple graphs, each starting at INIT/NORMAL with a single
cmd/pkt and ending at NORMAL'/CLOSED/ERROR. I'd probably prefer
including the decision points as nodes too.

A lot of the "an error occurred!" abort paths apply to a whole bunch of
states, it would be nice to combine them in the output... Hmm, I think
that's doable...

I've had a go at doing this with code. C is hard, so I converted to
python, and came up with:

  http://azure.erisian.com.au/~aj/tmp/lightstate/statepy.svg

(source bits in the lightstate/ directory)

I was thinking it would be possible to update many HTLCs at once, so
I was expecting a single PKT_UPDATE_CHANNEL rather than the ADD_HTLC,
COMPLETE_HTLC, TIMEOUT_HTLC, etc variants. From a protocol POV, I guess
that's something like:

  UpdateChannel:
     int  n_ops
     op   ops[]
     int  commitment_len;
     byte commitment[]

where an op would be ADD_HTLC, COMPLETE_HTLC, TIMEOUT_HTLC,
ROUTEFAIL_HTLC, etc. That's a bit more complicated in the protocol,
but I think it pays off in simplifying the state diagram and hence the
software? (I think it's also kindof necessary for some fee models, and
will be useful for batching updates when performance eventually matters)

That would look something like:

  http://azure.erisian.com.au/~aj/tmp/lightstate/update.svg

I think...

If you're going to have separate PKT_UPDATE_COMPLETE_HTLC, etc; then
it might make sense to have some of them go to WAIT_FOR_UPDATE_ACCEPT
instead of WAIT_FOR_HTLC_ACCEPT so that declining is forbidden -- if you
reveal R, or claim a timeout has been reached, then your counterparty
doesn't have the right to decline the update, and if they try, you want
to close the channel anyway afaics.

Hmm, I find the PRIO stuff pretty klunky. It's only used to choose who
wins in the event of simultaneous/overlapping channel updates, no? Why
not just have a constant tiebreak in that case, eg, where whoever has the
lowest channel balance (or the lowest IP / key id), wins?

  NORMAL
    >pkt_update

  WAIT_FOR_HTLC_ACCEPT
    <pkt_update_accept
    >pkt_update_signature -> WAIT_FOR_UPDATE_COMPLETE

    <pkt_update_decline -> NORMAL

    <pkt_update + <dec_win_tie
    -> WAIT_FOR_HTLC_ACCEPT

    <pkt_update + <dec_lose_tie + <dec_accept_update
    >pkt_update_accept -> WAIT_FOR_UPDATE_SIG

    <pkt_update + <dec_lose_tie + <dec_decline_update
    >pkt_update_decline -> NORMAL

Or alternatively, just have the high priority be given to whoever last
went from WAIT_FOR_UPDATE_SIG -> NORMAL (and low priority to whoever
went from WAIT_FOR_UPDATE_COMPLETE -> NORMAL). That way the priority
would still swap, and you could keep the priority embedded in the state,
but it would only be needed for NORMAL and WAIT_FOR_*_ACCEPT states;
and not the UPDATE_COMPLETE and UPDATE_SIG states as well.

AFAICS, you still have a potential deadlock atm if you think you're
high priority but your counterparty also thinks they're high priority,
or just missed your update packet. I think there might be a similar
deadlock if both systems think they're low priority. There should be a
"timeout waiting for message -> retry/close channel", for most of the
non-NORMAL states shouldn't there?

If the INIT_NOANCHOR is meant to do to single-sided-anchor idea proposed
in

  https://bitcointalk.org/index.php?topic=1134319.msg11963141

then it would probably be good to have a PKT_REBALANCE_VIA_BLOCKCHAIN
option; ie something like:

 0. channel is unbalanced: A has many funds, B has no funds
 1. B proposes rebalancing by $x.
 2. A accepts, chooses R, reveals #(R).
 3. B proposes HTLC from A to B of x funds, conditional on R.
 4. B posts $x in funds on the blockchain, payable to:
      [SIG_B & R | SIG_A & TIMEOUT]
 5. B tells A the transaction id (and p2sh details etc).
 6. A waits for this to be "deep enough"
 7. A claims the blockchain transaction, revealing R.
 8. B completes the HTLC, rebalancing the channel.

I think that would look like:

A:
  NORMAL
    <pkt_rebalance (x)
    >pkt_update_channel (add htlc: A->B, x, R, T)

    (or pkt_rebalance_decline -> NORMAL)

  REBAL_WAIT_FOR_HTLC_ACCEPT (R,x)
    <pkt_update_accept
    >pkt_update_signature

  REBAL_WAIT_FOR_UPDATE_COMPLETE (R,x)
    <pkt_update_complete
    >pkt_rebal_ok (#R,x)

  NORMAL+REBAL_2 (R,x)
    <pkt_rebal_underway (txn)

  NORMAL+REBAL_3 (R,x,txn)
    <bitcoin_depthok (txn)
    >bitcoin_spend (txn,R,SIG_A -> ...)

  NORMAL

B:
  NORMAL
    <cmd_rebalance (x)
    >pkt_rebalance (x)

  NORMAL+REBAL_A (x)
    <pkt_rebal_ok (#R,x)
    >bitcoin_spend (... -> R+SIG_A|TIME+SIG_B)
    >pkt_rebal_underway (txn)

    (or <pkt_rebal_decline -> NORMAL)

  NORMAL+REBAL_B (#R,x,txn)
    <bitcoin_spent (txn)
    <pkt_update_channel

  NORMAL

Not being able to tack on orthogonal state makes things a little clumsy
here...

Cheers,
aj