summaryrefslogtreecommitdiff
path: root/f6/02cf365a13323272424e4c63eab235261fd8ed
blob: 47d2fa4bc4f261f988c27c06102afc0d7bdca067 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
Return-Path: <tomas@tomasvdw.nl>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 9DF6A2C
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Tue,  5 Sep 2017 14:17:28 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com
	[66.111.4.25])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id C69251BB
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Tue,  5 Sep 2017 14:17:27 +0000 (UTC)
Received: from compute2.internal (compute2.nyi.internal [10.202.2.42])
	by mailout.nyi.internal (Postfix) with ESMTP id 0C5FA20CB0
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Tue,  5 Sep 2017 10:17:27 -0400 (EDT)
Received: from web3 ([10.202.2.213])
	by compute2.internal (MEProxy); Tue, 05 Sep 2017 10:17:27 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
	messagingengine.com; h=content-transfer-encoding:content-type
	:date:from:message-id:mime-version:subject:to:x-me-sender
	:x-me-sender:x-sasl-enc; s=fm1; bh=tyqN8HH/QzLHru2jtU9rnhq2SCiRg
	G2T9yyGYnCu9CY=; b=oGYaEBQaM5WFK8JkGDmnEr5gpu/KXkAKkieF3phHN0HoF
	AW77ED9puiO8AkrIwrZfO9N048YozSJzi08Tqr4Pw9Bd2NsarnzFD9c/C7OT9mF5
	sBXpTpLzh+4IXI1xE/i2x+xK1z+xGILxjmeVBhO/iFEJI0SpF5cgF2TQljS0lAZ5
	ZEx6l04SEFbdNLwgOqO/b609hSY4if8y9u/kW83rKp+w0Gxx5/LHh9QEkpLW0RZA
	7tngJ/b0gt2GirEyEVVStZsuoi/JNiYndf2D7ToU7IWpBG0zgWu1nD5fckg+TBXz
	uVMccW9beC/JoEY/rCLxmhxCgBzjxN1pEuJ0Ku/sg==
X-ME-Sender: <xms:9rGuWRuGoxVFEG3YdH1gkyYrKByFvbJDeUv_ia5J4zmKly6W6GDCHw>
Received: by mailuser.nyi.internal (Postfix, from userid 99)
	id DF3D29EC8C; Tue,  5 Sep 2017 10:17:26 -0400 (EDT)
Message-Id: <1504621046.2505938.1095808376.49C6DC53@webmail.messagingengine.com>
From: Tomas <tomas@tomasvdw.nl>
To: bitcoin-dev@lists.linuxfoundation.org
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
X-Mailer: MessagingEngine.com Webmail Interface - ajax-f7b75467
Date: Tue, 05 Sep 2017 16:17:26 +0200
X-Spam-Status: No, score=-0.7 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
	RCVD_IN_DNSWL_LOW autolearn=disabled version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Tue, 05 Sep 2017 14:36:36 +0000
Subject: [bitcoin-dev] Partial UTXO tree as commitment
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Sep 2017 14:17:28 -0000

I would like to propose an efficient UTXO commitment scheme.

A UTXO commitment can be useful for:

1. Fast syncing a full node, by downloading the UTXO-set
2. Proofing (non) existence of a UTXO..

Various schemes have been proposed:

* Merkle/radix trees and variants; all of which have the problem that
they significantly increase the burden of maintaining the UTXO set.
Furthermore, such schemes tend to practically prescribe the UTXO storage
format severely limiting continuous per-implementation optimizations.
* A "flat" rolling hash, eg the ECMH proposed by Pieter Wiulle which is
cheap to calculate but only solves (1) and not (2).

I propose a hybrid approach, with very limited extra burden to maintain
and reasonably small proofs: 

We divide the UTXO set in buckets by prefix of their TXID, then maintain
a rolling hash for each bucket. The commitment is then the root of the
tree constructed from the resulting bucket hashes. To construct the
tree: For each depth, we group the hashes of the next depth per 64
hashes and calculate the rolling hash of each. (Effectively, this is a
prefix tree with a fixed branch-size of 64).

Bucketcount
-------------------
txcount = number of TXIDs in the UTXO set
bucketcount = (smallest power of 2 larger than sqrt(txcount))  << 6

Rationale for bucketcount:

* This currently gives a bucketcount of 2^19, which is very cheap to
maintain with a 16mb array of rolling hashes.
* This currently gives an average bucket size of 4kb. With a rolling
hash, full nodes don't need to maintain the buckets themselves, but they
are used for proofs.
* The burden of future UTXO growth is divided among maintaining the
rolling hashes and size of the proof: 10,000x as large UTXO set (20TB),
gives ~400kb buckets and ~1.6gb in maintaining rolling hashes.
* This gives a tree depth of 5, which means the cost of every UTXO
update is increased  by ~3 rolling hashes (and a double SHA), as the
lowest depths don't benefit from caching.
* A proof for (non) existence of a UTXO is ~ 4*64*32 =8kb (branch-nodes)
+ 4kb (bucket) = ~12kb

Specification [WIP]
---------------------------
We define the "UTXO commitment" as the serialized byte array: "U" "T"
"X" "O" VARINT(version) VARINT(txcount) UINT256(UTXO-root)    [todo
clarify]

A block that contains an output in the coinbase whose scriptPubKey
consists solely of OP_RETURN [UTXO commitment] must be rejected if in
the UTXO commitment the version equals 1 and either 
* After updating the UTXO state, the number of distinct TXIDs in the
UTXO set is not equal to the txcount value of the UTXO commitment
* After updating the UTXO state, the UTXO-root in the UTXO commitment is
not equal to the UTXO-root defined below.

The UTXO-root can be calculated as follows:

* Define _bucketcount_ as (smallest power of 2 larger than
sqrt(txcount))  << 6
* Given a TXID in the UTXO set, define UTXO(TXID) as the double SHA256
of (TXID + coins). (coins is the serialization of unspent outputs to be
spec'ed). 
* Let bucket N be the set of values UTXO(TXID) for each TXID in the
UTXO-set where (TXID mod _bucketcount_) equals N.
* Let rhash N be the rolling hash (TBD) of all values in bucket N
* Let the hash sequence be the ordered sequence  rhash
[0,_bucketcount_).

1. If the hash sequence contains at most 64 entries, then the UTXO-root
is the rolling hash of all entries in the hash sequence, otherwise:
2. Group the hash sequence in ordered subsequences of 64 entries each.
3. Find the rolling hash of each subsequence
4. Continue with 1., with the hash sequence being the ordered sequence
of these rolling hashes.

Note: an implementation may want to maintain and update the set of
rolling hashes at higher depths on each UTXO set operation.

Note: the secure ECMH is a good candidate for the bucket hash. This
could also be used for the branch rolling hashes, but it might be worth
considering XOR for those as there seem to be simply not enough
candidates to find a colliding set?

Note: two magic numbers are used: "<< 6" for the bucket count, and "64"
for the branch size. They work nicely but are pulled out of a dark place
and merit some experimentation.

Use cases for light clients
-------------------------------------
These UTXO proofs could be used as compact fraud proofs, although the
benefit of this is not generally agreed upon.

They can also be used to increase low-conf security to light clients, by
validating the signatures and order-validity of incoming transactions
against the right bucket of the current UTXO set.

An interesting use case may be another type of light client. It could be
interesting for a light client to abandon the bloom filters, and instead
use the UTXO proofs to verify whether an incoming or outgoing
transaction is confirmed. This could be beneficial for "rarely active"
light clients such as smartphone apps, as it prevents the need to
synchronize previous blocks with bloom filters, and allows syncing to
the latest block with 12kb/output.

Summary 
--------------
* Allows fast full node syncing.
* Costs full nodes ~20mb extra in RAM
* Costs full nodes ~3 rolling hash operations per UTXO operation.
* Allows UTXO (non) existence proofs for currently avg ~12kb.
* Size of proof grows O(sqrt(N)) with UTXO set
* Size of extra full node memory grows O(sqrt(N)) with UTXO set


Tomas van der Wansem
bitcrust