Return-Path: <peter_r@gmx.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 1D7BE267
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon,  9 May 2016 18:40:00 +0000 (UTC)
X-Greylist: delayed 00:05:06 by SQLgrey-1.7.6
Received: from mout.gmx.net (mout.gmx.net [212.227.15.18])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 1302C115
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon,  9 May 2016 18:39:58 +0000 (UTC)
Received: from [192.168.1.59] ([205.250.126.165]) by mail.gmx.com (mrgmx002)
	with ESMTPSA (Nemesis) id 0MexFh-1bFPzn0Qzh-00OZBs;
	Mon, 09 May 2016 20:34:50 +0200
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
From: Peter R <peter_r@gmx.com>
In-Reply-To: <5730C37E.2000004@gmail.com>
Date: Mon, 9 May 2016 11:34:47 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <D0F57BE3-EFD2-4557-8D05-704D9C4E5EA1@gmx.com>
References: <5727D102.1020807@mattcorallo.com> <5730C37E.2000004@gmail.com>
To: Pieter Wuille <pieter.wuille@gmail.com>,
	Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
X-Mailer: Apple Mail (2.2104)
X-Provags-ID: V03:K0:tIOTP7Ck6WWlgO70r6mNTR7Z8m4X6Io7yx/xPRwn1aAh099mHWb
	YftNaa9rVwis151erITocEf7ioRK1hZ+lCvRhwHKj+xewmIhI320H2K9I6+fprSRSMGE2Tm
	l08dziE4fsfrgFVYUUJRoRkTPWKslEbbMZ7i2uZfdkpIPJKIxv3AfJre/SoUJ4wxFiPr7W/
	2L5t7OHUcJWZnlgmY92Jg==
X-UI-Out-Filterresults: notjunk:1;V01:K0:cZ2zFL4tUXs=:y7JWyxB2Vmn9c1eAZ2F292
	xboPKHOhjYy34lnK6TPmYYFwTtTAJ39jiF3lyEEGuhWFxkNndTrVt50Qr62Oe1b6j8AqgDhEt
	LVDwF6gRMv/3HUxUe/QWaOJVBr2cmSXeHOI1pjVyz0/epjIF3X2Enhs/2dVrtFd8cdeZue4R8
	d0F8axPogNsu9ZeTKTQ9uUn8LfvIArzR/O/GFWpun3svxrmoZb5UieONxfTAg3yzZD2H6klzQ
	6WXaQwLByw8zP7FNh3jyo61HXtSCZibbrI+uWzr+0snRx1sT4qT2CnYcqD2rdkYEVbxeufa9x
	33cW3Xue7lHKmfjxR1tRby6vBbmoQfYIrjWmQKigwcrl+9VNZpJXwAV8HrVkYzzQpjDQDMRol
	5FmT+8HXT8PwNxrUzNepTdhQ5XDGOE+dUmwAVemEodCluHqDY5sglqsGT/NL6iAVQoHxGGxpM
	s4Ui9lD4ZsJKlU7hZl6fCWD2majQfSgpe68bmuVCjAI1sgYjuet7VblxQ1ca5lx8ykR7L+bTK
	a34kk2NCvNXB4tWRR+NIrVSvzkP7wMSryUQTifr5WQA6ylr7jEiKDSjlPh0I0qQl6wT4GXe4h
	301G/dyn2RKE7YRUTOkFpkQHDcHt/9SD6n/i8AkgZaXF99nBMhK4LTnXlpQduOSjY/ysceTfr
	lue4PzoYuP7JNV5I04ek7dnMCpkkci+xGlp0LivNu/BS4OAGkh8FK9Kis3q1J4DEZgph1cIkH
	FCj1O/47rYrrZz1MbgP0Mw6ngAnM3Po+ZmXDMU9lR50JWqPHXQokzmIcNQ2SRSKhwrxSkBohA
	MvDCYeh
X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
	RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Mon, 09 May 2016 18:52:07 +0000
Subject: Re: [bitcoin-dev] Compact Block Relay BIP
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Mon, 09 May 2016 18:40:00 -0000

Hi Pieter,

> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).
>=20
> For any reasonable numbers I can come up with (in a very wide range),
> the number of bits needed is very well approximated by:
>=20
>  log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
> acceptable_per_block_failure_rate)
>=20
> For example, with 20000 mempool transactions, 2500 transactions in a
> block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
> reconstruct, needed_bits =3D log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
=3D
> 34.54, or 5 byte txids would suffice.
>=20
> Note that 1 in 10000 failures may sound like a lot, but this is for =
each
> individual connection, and since every transmission uses separately
> salted identifiers, occasional failures should not affect global
> propagation. Given that transmission failures due to timeouts, network
> connectivity, ... already occur much more frequently than once every =
few
> gigabytes (what 10000 blocks corresponds to), that's probably already
> more than enough.
>=20
> In short: I believe 5 or 6 byte txids should be enough, but perhaps it
> makes sense to allow the sender to choose (so he can weigh trying
> multiple nonces against increasing the short txid length).

[9 May 16 @ 11am PDT] =20

We worked on this with respect to =E2=80=9CXthin" for Bitcoin Unlimited, =
and came to a similar conclusion. =20

But we (I think it was theZerg) also noticed another trick: if the node =
receiving the thin blocks has a small number of collisions with =
transactions in its mempool (e.g., 1 or 2), then it can test each =
possible block against the Merkle root in the block header to determine =
the correct one.  Using this technique, it should be possible to further =
reduce the number of bytes used for the txids.  That being said, even =
thin blocks built from 64-bit short IDs represent a tremendous savings =
compared to standard block propagation.  So we (Bitcoin Unlimited) =
decided not to pursue this optimization any further at that time.

***

It=E2=80=99s also interesting to ask what the information-theoretic =
minimum amount of information necessary for a node to re-construct a =
block is. The way I=E2=80=99m thinking about this currently[1] is that =
the node needs all of the transactions in the block that were not =
initially part of its mempool, plus enough information to select and =
ordered subset from that mempool that represents the block.  If m is the =
number of transactions in mempool and n is the number of transactions in =
the block, then the number of possible subsets (C') is given by the =
binomial coefficient:

  C' =3D  m! / [n! (m - n)!]

Since there are n! possible orderings for each subset, the total number =
of possible blocks (C) of size n from a mempool of size m is

  C =3D n! C=E2=80=99 =3D m! / (m-n)!

Assuming that all possible blocks are equally likely, the Shannon =
entropy (the information that must be communicated) is the base-2 =
logarithm of the number of possible blocks.  After making some =
approximations, this works out very close to

   minimum information ~=3D n * log2(m),

which for your case of 20,000 transactions in mempool (m =3D 20,000) and =
a 2500-transaction block (n =3D 2500), yields

   minimum information =3D 2500 * log2(20,000) ~ 2500 * 15 bits.

In other words, a lower bound on the information required is about 2 =
bytes per transactions for every transaction in the block that the node =
is already aware of, as well as all the missing transactions in full.=20

Of course, this assumes an unlimited number of round trips, and it is =
probably complicated by other factors that I haven=E2=80=99t considered =
(queue the =E2=80=9Cspherical cow=E2=80=9D jokes :), but I thought it =
was interesting that a technique like Xthin or compact blocks is already =
pretty close to this limit. =20

Cheers,
Peter=20

[1] There are still some things that I can=E2=80=99t wrap my mind around =
that I=E2=80=99d love to discuss with another math geek :)