Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 1D7BE267 for ; Mon, 9 May 2016 18:40:00 +0000 (UTC) X-Greylist: delayed 00:05:06 by SQLgrey-1.7.6 Received: from mout.gmx.net (mout.gmx.net [212.227.15.18]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 1302C115 for ; Mon, 9 May 2016 18:39:58 +0000 (UTC) Received: from [192.168.1.59] ([205.250.126.165]) by mail.gmx.com (mrgmx002) with ESMTPSA (Nemesis) id 0MexFh-1bFPzn0Qzh-00OZBs; Mon, 09 May 2016 20:34:50 +0200 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) From: Peter R In-Reply-To: <5730C37E.2000004@gmail.com> Date: Mon, 9 May 2016 11:34:47 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: <5727D102.1020807@mattcorallo.com> <5730C37E.2000004@gmail.com> To: Pieter Wuille , Bitcoin Protocol Discussion X-Mailer: Apple Mail (2.2104) X-Provags-ID: V03:K0:tIOTP7Ck6WWlgO70r6mNTR7Z8m4X6Io7yx/xPRwn1aAh099mHWb YftNaa9rVwis151erITocEf7ioRK1hZ+lCvRhwHKj+xewmIhI320H2K9I6+fprSRSMGE2Tm l08dziE4fsfrgFVYUUJRoRkTPWKslEbbMZ7i2uZfdkpIPJKIxv3AfJre/SoUJ4wxFiPr7W/ 2L5t7OHUcJWZnlgmY92Jg== X-UI-Out-Filterresults: notjunk:1;V01:K0:cZ2zFL4tUXs=:y7JWyxB2Vmn9c1eAZ2F292 xboPKHOhjYy34lnK6TPmYYFwTtTAJ39jiF3lyEEGuhWFxkNndTrVt50Qr62Oe1b6j8AqgDhEt LVDwF6gRMv/3HUxUe/QWaOJVBr2cmSXeHOI1pjVyz0/epjIF3X2Enhs/2dVrtFd8cdeZue4R8 d0F8axPogNsu9ZeTKTQ9uUn8LfvIArzR/O/GFWpun3svxrmoZb5UieONxfTAg3yzZD2H6klzQ 6WXaQwLByw8zP7FNh3jyo61HXtSCZibbrI+uWzr+0snRx1sT4qT2CnYcqD2rdkYEVbxeufa9x 33cW3Xue7lHKmfjxR1tRby6vBbmoQfYIrjWmQKigwcrl+9VNZpJXwAV8HrVkYzzQpjDQDMRol 5FmT+8HXT8PwNxrUzNepTdhQ5XDGOE+dUmwAVemEodCluHqDY5sglqsGT/NL6iAVQoHxGGxpM s4Ui9lD4ZsJKlU7hZl6fCWD2majQfSgpe68bmuVCjAI1sgYjuet7VblxQ1ca5lx8ykR7L+bTK a34kk2NCvNXB4tWRR+NIrVSvzkP7wMSryUQTifr5WQA6ylr7jEiKDSjlPh0I0qQl6wT4GXe4h 301G/dyn2RKE7YRUTOkFpkQHDcHt/9SD6n/i8AkgZaXF99nBMhK4LTnXlpQduOSjY/ysceTfr lue4PzoYuP7JNV5I04ek7dnMCpkkci+xGlp0LivNu/BS4OAGkh8FK9Kis3q1J4DEZgph1cIkH FCj1O/47rYrrZz1MbgP0Mw6ngAnM3Po+ZmXDMU9lR50JWqPHXQokzmIcNQ2SRSKhwrxSkBohA MvDCYeh X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Mon, 09 May 2016 18:52:07 +0000 Subject: Re: [bitcoin-dev] Compact Block Relay BIP X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 May 2016 18:40:00 -0000 Hi Pieter, > I tried to derive what length of short ids is actually necessary (some > write-up is on > https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's > incomplete). >=20 > For any reasonable numbers I can come up with (in a very wide range), > the number of bits needed is very well approximated by: >=20 > log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool / > acceptable_per_block_failure_rate) >=20 > For example, with 20000 mempool transactions, 2500 transactions in a > block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to > reconstruct, needed_bits =3D log2(20000 * 2500 * (1 - 0.95) / 0.0001) = =3D > 34.54, or 5 byte txids would suffice. >=20 > Note that 1 in 10000 failures may sound like a lot, but this is for = each > individual connection, and since every transmission uses separately > salted identifiers, occasional failures should not affect global > propagation. Given that transmission failures due to timeouts, network > connectivity, ... already occur much more frequently than once every = few > gigabytes (what 10000 blocks corresponds to), that's probably already > more than enough. >=20 > In short: I believe 5 or 6 byte txids should be enough, but perhaps it > makes sense to allow the sender to choose (so he can weigh trying > multiple nonces against increasing the short txid length). [9 May 16 @ 11am PDT] =20 We worked on this with respect to =E2=80=9CXthin" for Bitcoin Unlimited, = and came to a similar conclusion. =20 But we (I think it was theZerg) also noticed another trick: if the node = receiving the thin blocks has a small number of collisions with = transactions in its mempool (e.g., 1 or 2), then it can test each = possible block against the Merkle root in the block header to determine = the correct one. Using this technique, it should be possible to further = reduce the number of bytes used for the txids. That being said, even = thin blocks built from 64-bit short IDs represent a tremendous savings = compared to standard block propagation. So we (Bitcoin Unlimited) = decided not to pursue this optimization any further at that time. *** It=E2=80=99s also interesting to ask what the information-theoretic = minimum amount of information necessary for a node to re-construct a = block is. The way I=E2=80=99m thinking about this currently[1] is that = the node needs all of the transactions in the block that were not = initially part of its mempool, plus enough information to select and = ordered subset from that mempool that represents the block. If m is the = number of transactions in mempool and n is the number of transactions in = the block, then the number of possible subsets (C') is given by the = binomial coefficient: C' =3D m! / [n! (m - n)!] Since there are n! possible orderings for each subset, the total number = of possible blocks (C) of size n from a mempool of size m is C =3D n! C=E2=80=99 =3D m! / (m-n)! Assuming that all possible blocks are equally likely, the Shannon = entropy (the information that must be communicated) is the base-2 = logarithm of the number of possible blocks. After making some = approximations, this works out very close to minimum information ~=3D n * log2(m), which for your case of 20,000 transactions in mempool (m =3D 20,000) and = a 2500-transaction block (n =3D 2500), yields minimum information =3D 2500 * log2(20,000) ~ 2500 * 15 bits. In other words, a lower bound on the information required is about 2 = bytes per transactions for every transaction in the block that the node = is already aware of, as well as all the missing transactions in full.=20 Of course, this assumes an unlimited number of round trips, and it is = probably complicated by other factors that I haven=E2=80=99t considered = (queue the =E2=80=9Cspherical cow=E2=80=9D jokes :), but I thought it = was interesting that a technique like Xthin or compact blocks is already = pretty close to this limit. =20 Cheers, Peter=20 [1] There are still some things that I can=E2=80=99t wrap my mind around = that I=E2=80=99d love to discuss with another math geek :)