Return-Path: <m.bevand@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id B1D37B0A
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 16 Nov 2017 16:56:48 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-ot0-f174.google.com (mail-ot0-f174.google.com
	[74.125.82.174])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 274DC171
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 16 Nov 2017 16:56:48 +0000 (UTC)
Received: by mail-ot0-f174.google.com with SMTP id o23so12230405otd.1
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 16 Nov 2017 08:56:48 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:from:date:message-id:subject:to;
	bh=wtwAt0tVYbdAcyU8G5aNVjaQLYmfu5DXwtQrGVRzvNY=;
	b=HsexXy/futNFr9IT9a4aQ4/bTJrWs6lnZCigTougCqleu3r2X+v/seDyiDVFVvcTDP
	6gL7kURLjCK/xS8V3jEoNp90bvfJtMdsgoollN6eSaMAhATlfUlLBRqfSt96y5VQ9Iu1
	CsK76q8lSjL2/Wdu9McAW19TDCMrhlMu+mLfbf8dzg05hsuaIHVKCN/Fqk270vwx/Zcx
	8dfg7n5q4zxFsBS7NLbR9wqBT8N7LZY1LQwIi9jfKV5t6WwZeFbC7cRdAoabViBB3Ob1
	S8/zCh9z5GAnFV9wU/KvcMo5I2dX140hCJBCWyfIMx5w/5EeqzlDAkwoZA3bmtJkrhxF
	l4vQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
	bh=wtwAt0tVYbdAcyU8G5aNVjaQLYmfu5DXwtQrGVRzvNY=;
	b=fwGPdqv4WhgFh96LXqmE/1Sg/V0mAJQI5/WBqz/g2bWA6u1QRgXlJMW5x389enMKJT
	ZilVOyCgZD87b+jVrhv4qo4/Tx6l6F7OpW4lz33rhS1BxYl1K7UE9nLiNV3tuOkV58Xo
	KxPe4taM3WJiKKL/JO0lLskR7zUF7ZJlAFO52JbgWkiOg9+rIyup08g5cXjtxDuJimAL
	YVA1FbtrpJFwV895B1LVql/GqMCUv3kuWdgMXJbq4JhJ7JWO17HJuPDUiGYaIyCjofiC
	PuT7aXgKAlt7BY/RvmNIoJrY/c6mC6TKHvS6SqtXtDAlJrH8nxXGKnMbEd4tjcU9FwE3
	CyCg==
X-Gm-Message-State: AJaThX7lJ79r+6AGRuAm9Jksh3sYQkOACxMJ/43FjBO91ZI+l5q0CZq8
	Ciot3W0ELkrPa1OcuWIZP9Mwd6a/mK8xVpGgdUDVU3om
X-Google-Smtp-Source: AGs4zMZermgYq0ZS7dKURBQJDUFHS3uJOqeA4K4gnx0YnEstKCk4TG2XXhvNCHU5BeKToXYZQ9EgMJRuAyZTHlRi7w8=
X-Received: by 10.157.15.9 with SMTP id 9mr1738102ott.356.1510851407070; Thu,
	16 Nov 2017 08:56:47 -0800 (PST)
MIME-Version: 1.0
Received: by 10.74.124.70 with HTTP; Thu, 16 Nov 2017 08:56:46 -0800 (PST)
From: Marc Bevand <m.bevand@gmail.com>
Date: Thu, 16 Nov 2017 10:56:46 -0600
Message-ID: <CADH-5r0m0zUP545hN=Or3v8ujixeKZpLqqja0M+5wuZZOWhO4A@mail.gmail.com>
To: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Content-Type: multipart/alternative; boundary="94eb2c032ec26d4fe4055e1c8172"
X-Spam-Status: No, score=-0.1 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE
	autolearn=disabled version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Thu, 16 Nov 2017 17:11:14 +0000
Subject: [bitcoin-dev] Protocol-Level Pruning
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Thu, 16 Nov 2017 16:56:48 -0000

--94eb2c032ec26d4fe4055e1c8172
Content-Type: text/plain; charset="UTF-8"

It occurred to me that we could push the classic concept of pruning even
further: we could significantly shrink the blockchain as well as reduce the
amount of network traffic during initial block download by doing something
I would call protocol-level pruning. This would, as of today, reduce the
size of the blockchain by a factor of 50, hence enabling massive on-chain
scaling.

The idea behind PLP is to serialize the UTXO set in a standardized way, and
publish a hash of it in the block header so that the blockchain commits to
it. Since hashing and verifying it is a moderately intensive operation,
perhaps the UTXO set hash should be published only once every 576 blocks (4
days).

When a new Bitcoin node joins the network, it would download the block
headers only (not the block data), it would identify the most recent block
containing the UTXO set hash, and download the UTXO set from peers. From
that point on, it downloads and verifies all blocks as normal.

Every 576 blocks, nodes serialize and verify that their UTXO set hash
matches the one published in the blockchain. Doing so becomes a new part of
consensus rules. The last 576 blocks could then be permanently discarded as
they are no longer useful.

Today the serialized UTXO set is about 3GB and the blockchain is about
150GB. Therefore PLP would cut down the amount of data stored by full nodes
by a factor of ~50 as they would have to store only the UTXO set plus at
most 576 blocks.

One trivial optimization is possible: to avoid hashing the entire UTXO set
every 576 blocks (which would take multiple seconds even on a fast
machine), the UTXO set serialization could be a sparse merkle tree
<https://eprint.iacr.org/2016/683.pdf> which would allow on-the fly
recomputation of the hash as new blocks are mined: when a UTXO is added to
(or removed from) the tree, only a small number of hash operations are
needed to recalculate the UTXO set merkle tree root hash.

Maybe we don't even need sparse merkle trees, but a regular merkle tree
would suffice: the tree leaves would be small groups of UTXOs (some bits in
the ID/hash of a UTXO would determine which leaf it belongs to.)

Unlike classic pruning mode, *ALL* full nodes on the network could switch
to PLP. There is no need for any node to archive the entire blockchain any
more.

I can think of one downside of PLP: nodes would no longer be able to handle
reorgs that go further back than the last UTXO set hash published on the
chain (since previous blocks have been discarded). So, perhaps keeping
around the last N*576 blocks (N=10?) would be a sufficient workaround.

Thoughts?

-Marc

--94eb2c032ec26d4fe4055e1c8172
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">It occurred to me that we could push the classic concept o=
f pruning even further: we could significantly shrink the blockchain as wel=
l as reduce the amount of network traffic during initial block download by =
doing something I would call protocol-level pruning. This would, as of toda=
y,=C2=A0reduce the size of the blockchain by a factor of 50, hence enabling=
 massive on-chain scaling.<div><br></div><div>The idea behind PLP is to ser=
ialize the UTXO set in a standardized way, and publish a hash of it in the =
block header so that the blockchain commits to it. Since hashing and verify=
ing it is a moderately intensive operation, perhaps the UTXO set hash shoul=
d be published only once every 576 blocks (4 days).</div><div><br></div><di=
v>When a new Bitcoin node joins the network, it would download the block he=
aders only (not the block data), it would identify the most recent block co=
ntaining the UTXO set hash, and download the UTXO set from peers. From that=
 point on, it downloads and verifies all blocks as normal.</div><div><br></=
div><div>Every 576 blocks, nodes serialize and verify that their UTXO set h=
ash matches the one published in the blockchain. Doing so becomes a new par=
t of consensus rules. The last 576 blocks could then be permanently discard=
ed as they are no longer useful.</div><div><br></div><div>Today the seriali=
zed UTXO set is about 3GB and the blockchain is about 150GB. Therefore PLP =
would cut down the amount of data stored by full nodes by a factor of ~50 a=
s they would have to store only the UTXO set plus at most 576 blocks.</div>=
<div><br></div><div>One trivial optimization is possible: to avoid hashing =
the entire UTXO set every 576 blocks (which would take multiple seconds eve=
n on a fast machine), the UTXO set serialization could be a <a href=3D"http=
s://eprint.iacr.org/2016/683.pdf">sparse merkle tree</a>=C2=A0which would a=
llow on-the fly recomputation of the hash as new blocks are mined: when a U=
TXO is added to (or removed from) the tree, only a small number of hash ope=
rations are needed to recalculate the UTXO set merkle tree root hash.</div>=
<div><br></div><div>Maybe we don&#39;t even need sparse merkle trees, but a=
 regular merkle tree would suffice: the tree leaves would be small groups o=
f UTXOs (some bits in the ID/hash of a UTXO would determine which leaf it b=
elongs to.)</div><div><br></div><div>Unlike classic pruning mode, <b>ALL</b=
> full nodes on the network could switch to PLP. There is no need for any n=
ode to archive the entire blockchain any more.</div><div><br></div><div>I c=
an think of one downside of PLP: nodes would no longer be able to handle re=
orgs that go further back than the last UTXO set hash published on the chai=
n (since previous blocks have been discarded). So, perhaps keeping around t=
he last N*576 blocks (N=3D10?) would be a sufficient workaround.</div><div>=
<br></div><div>Thoughts?</div><div><br></div><div>-Marc</div></div>

--94eb2c032ec26d4fe4055e1c8172--