From pieter.wuille at gmail.com  Wed Dec 12 23:34:42 2018
From: pieter.wuille at gmail.com (Pieter Wuille)
Date: Wed, 12 Dec 2018 15:34:42 -0800
Subject: [Lightning-dev] Minisketch and lightning gossip
In-Reply-To: <87pnu7xzvy.fsf@rustcorp.com.au>
References: <87pnu7xzvy.fsf@rustcorp.com.au>
Message-ID: <CAPg+sBhuFsMHqgSon+K15QT-KVQ7k+eQgnjUByFX3xYfZPnMXA@mail.gmail.com>

On Tue, 11 Dec 2018 at 18:07, Rusty Russell <rusty at rustcorp.com.au> wrote:>
> Hi all,
>
>         In case you're bored with the limited range of improvements
> going into the 1.1 spec, you might like to ruminate on:
>
>   https://github.com/sipa/minisketch/blob/master/README.md
>
> It's a library for efficient summaries of data, such as bitcoin
> transaction gossip.  It has an implementation sweet-spot at 64-bits,
> which almost works well for our gossip messages.  I've a straw-man
> protocol below.

Cool to see there is interesting in exploring using minisketch for
this application.

Generally when trying to minimize the amount of data to transfer, I
think you should pick a field size that is appropriate for the data to
send, or if you're hashing the data, a size that gives an acceptable
collision rate. In case it isn't quite clear what the trade-off is,
you can prefer a size that permits faster implementation (like 15, 22,
28, 30, 46, 60, or 64 bits).

Also, if there is a use for, it wouldn't be too much work to support
fields over 64 bits in size. I expect it'd come at a ~2x performance
reduction (compared to the expected continuation of the performance in
function of field size), but we'd need to benchmark to be sure.

> The required size of the minisketch we send depends on the number of
> differences with our peer.  We can use some minimum value (maybe based
> on past gossip rates?), and add the number of changes we've received
> since last time, increasing if we failed to reconstruct a previous one.

There is some research on estimating the sizes of differences that are
linked to in https://github.com/sipa/minisketch/blob/master/doc/protocoltips.md
which may be useful.

There are also techniques based on subdivision. In such a model, both
sides maintain a separate sketch for (say) the first and the second
half of the domain (one for all elements starting with leading bit 0,
and one for all elements with leading bit 1). Let's call those
sketches A1, A2, B1, B2.
* A initially sends A12=merge(A1,A2) (a sketch of the entire set).
* B tries to reconcile A12 against B12=merge(B1,B2). If that works,
all good; otherwise B responds "I needz moar".
* A now just sends its first sketch A1 (and not A2)
* B reconciles A1 against B1, and also reconciles merge(A1,A12) against B2.

The trick here is that merge(A1,A12) is equal to A2, but A2 never
needed transferring; it was computed by combining the two sketches B
already received. This can be done several levels deep of course if
needed, and reduces the amount of "overestimation" needed for likely
reconciliation in exchange for more roundtrips (one per level) in case
of larger differences.

> There doesn't need to be consensus between peers on the minisketch size
> though, since truncated minisketches regrow into whole ones when tossed
> back into the ocean[3].

Eh... if you have a sketch of size p and a sketch of size q, you can
reconcile up to min(p,q) elements with them. They don't need to agree
in size, but the excess of one size is not useful.

Cheers,

-- 
Pieter