From pieter.wuille at gmail.com Wed Dec 12 23:34:42 2018 From: pieter.wuille at gmail.com (Pieter Wuille) Date: Wed, 12 Dec 2018 15:34:42 -0800 Subject: [Lightning-dev] Minisketch and lightning gossip In-Reply-To: <87pnu7xzvy.fsf@rustcorp.com.au> References: <87pnu7xzvy.fsf@rustcorp.com.au> Message-ID: On Tue, 11 Dec 2018 at 18:07, Rusty Russell wrote:> > Hi all, > > In case you're bored with the limited range of improvements > going into the 1.1 spec, you might like to ruminate on: > > https://github.com/sipa/minisketch/blob/master/README.md > > It's a library for efficient summaries of data, such as bitcoin > transaction gossip. It has an implementation sweet-spot at 64-bits, > which almost works well for our gossip messages. I've a straw-man > protocol below. Cool to see there is interesting in exploring using minisketch for this application. Generally when trying to minimize the amount of data to transfer, I think you should pick a field size that is appropriate for the data to send, or if you're hashing the data, a size that gives an acceptable collision rate. In case it isn't quite clear what the trade-off is, you can prefer a size that permits faster implementation (like 15, 22, 28, 30, 46, 60, or 64 bits). Also, if there is a use for, it wouldn't be too much work to support fields over 64 bits in size. I expect it'd come at a ~2x performance reduction (compared to the expected continuation of the performance in function of field size), but we'd need to benchmark to be sure. > The required size of the minisketch we send depends on the number of > differences with our peer. We can use some minimum value (maybe based > on past gossip rates?), and add the number of changes we've received > since last time, increasing if we failed to reconstruct a previous one. There is some research on estimating the sizes of differences that are linked to in https://github.com/sipa/minisketch/blob/master/doc/protocoltips.md which may be useful. There are also techniques based on subdivision. In such a model, both sides maintain a separate sketch for (say) the first and the second half of the domain (one for all elements starting with leading bit 0, and one for all elements with leading bit 1). Let's call those sketches A1, A2, B1, B2. * A initially sends A12=merge(A1,A2) (a sketch of the entire set). * B tries to reconcile A12 against B12=merge(B1,B2). If that works, all good; otherwise B responds "I needz moar". * A now just sends its first sketch A1 (and not A2) * B reconciles A1 against B1, and also reconciles merge(A1,A12) against B2. The trick here is that merge(A1,A12) is equal to A2, but A2 never needed transferring; it was computed by combining the two sketches B already received. This can be done several levels deep of course if needed, and reduces the amount of "overestimation" needed for likely reconciliation in exchange for more roundtrips (one per level) in case of larger differences. > There doesn't need to be consensus between peers on the minisketch size > though, since truncated minisketches regrow into whole ones when tossed > back into the ocean[3]. Eh... if you have a sketch of size p and a sketch of size q, you can reconcile up to min(p,q) elements with them. They don't need to agree in size, but the excess of one size is not useful. Cheers, -- Pieter