summaryrefslogtreecommitdiff
path: root/27/c77d222126dc82c747f0370e4ffc659a7c00d6
blob: 5689a3c922c61d2c9550de5f80999a372938fec8 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
Return-Path: <cory@coryfields.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id B84A1E29
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed, 16 May 2018 16:36:38 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-oi0-f45.google.com (mail-oi0-f45.google.com
	[209.85.218.45])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 849496BF
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed, 16 May 2018 16:36:37 +0000 (UTC)
Received: by mail-oi0-f45.google.com with SMTP id b130-v6so1283832oif.12
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed, 16 May 2018 09:36:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=coryfields-com.20150623.gappssmtp.com; s=20150623;
	h=mime-version:reply-to:from:date:message-id:subject:to
	:content-transfer-encoding;
	bh=1Mxsd38bAfxwR3syO7nqYThVT9qibohi9M9YWjyK2PA=;
	b=Oa2EFdVcXS2coYeq/cXmtEIcw+dfd9gxKptyiHXCdsF/4Z73RnMp/QufuJg5CG5gxt
	iDQdnGmJnQn2mHhhGYeWxLJe1jL0P2MW4tH/+2N7DEhFUsiSb6eXfTDUgILfrWmijewC
	rROKPIYBsBoSsZ3UlFOWfKxB/uitQ1wzqGhoGluYSUe+GAoG0llH9F8uNoUyEilPhvmt
	PqAVeiXVMU2LOcozatBPfEBj26xCFU/Ngc1ZFSvGq+kQMqYP0RHiSF7hlXKv7Au/kNA9
	rk17T9qJifz9m5a/ezlI+yCIAXSzG4h7T2pcaqkFptJAvDZzRYcqZ71f6Yn9tK8G5R7/
	UbRw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:reply-to:from:date:message-id
	:subject:to:content-transfer-encoding;
	bh=1Mxsd38bAfxwR3syO7nqYThVT9qibohi9M9YWjyK2PA=;
	b=ZdWOF8qKJ8FKd3ook0mzHdc5WQ2jCehcY320JOE9+ruFi12jFBiKOOwVJOsOp3Wry6
	Z7lFF+gKoYILr6K+Kz4k/ZJ+VUooXIT4eW7oXsaEUUso09IeHr5xVwrUS/xjd86O4cxt
	aOxedqUsiHhTqrW9PvLtxvLrM4ZP2GCNhKp4OnOcA0Cc3amxEjXqD0lG255RIg9kdVVc
	DdhVnzP2vI7Q0Rh1qNYP+Uqb5P9wZT0Fk0/UpwnGKkaq6MpU76bgkGgeE52Nq9Nj2/8R
	FqbiAy+WIGVVU7dtH3kyjeqApBOZpag3VafR58Q1M9Z/ifAJawjSUCIUfg9/iE8BusPu
	XHdA==
X-Gm-Message-State: ALKqPwcJMiVce/5S+1wVWCqa6V4Ve32ZJ/3GuhB/nS3wpS4iPvH5MUkG
	nGX8jJ3SIqD7+OHGB3OIKH6sUJDnG+nNlAoHwViJfRoPYoo=
X-Google-Smtp-Source: AB8JxZp3+xKRfYzCO81PM8NGj3f8kwJkFg5KD79qjK7Hm4SXG7PosUjQmS1ki2ghkVALOT6yIueNTVjCA8HfcRl8Rco=
X-Received: by 2002:aca:508f:: with SMTP id
	e137-v6mr1091014oib.22.1526488596293; 
	Wed, 16 May 2018 09:36:36 -0700 (PDT)
MIME-Version: 1.0
Reply-To: lists@coryfields.com
Received: by 10.74.191.17 with HTTP; Wed, 16 May 2018 09:36:35 -0700 (PDT)
From: Cory Fields <lists@coryfields.com>
Date: Wed, 16 May 2018 12:36:35 -0400
Message-ID: <CAApLimjfPKDxmiy_SHjuOKbfm6HumFPjc9EFKvw=3NwZO8JcmQ@mail.gmail.com>
To: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Subject: [bitcoin-dev] UHS: Full-node security without maintaining a full
	UTXO set
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Wed, 16 May 2018 16:36:38 -0000

Tl;dr: Rather than storing all unspent outputs, store their hashes. Untrust=
ed
peers can supply the full outputs when needed, with very little overhead.
Any attempt to spoof those outputs would be apparent, as their hashes would=
 not
be present in the hash set. There are many advantages to this, most apparen=
tly
in disk and memory savings, as well as a validation speedup. The primary
disadvantage is a small increase in network traffic. I believe that the
advantages outweigh the disadvantages.

--

Bitcoin=E2=80=99s unspent transaction output set (usually referred to as =
=E2=80=9CThe UTXO
set=E2=80=9D) has two primary roles: providing proof that previous outputs =
exist to be
spent, and providing the actual previous output data for verification when =
new
transactions attempts to spend them. These roles are not usually discussed
independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are
compelling reasons to consider them this way.

To see why, consider running a node with the following changes:

- For each new output, gather all extra data that will be needed for
  verification when spending it later as an input: the amount, scriptPubKey=
,
  creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, etc.=
).
  Call this the Dereferenced Prevout data.
- Create a hash from the concatenation of the new outpoint and the derefere=
nced
  prevout data. Call this a Unspent Transaction Output Hash.
- Rather than storing the full dereferenced prevout entries in a UTXO set a=
s is
  currently done, instead store their hashes to an Unspent Transaction Outp=
ut
  Hash Set, or UHS.
- When relaying a transaction, append the dereferenced prevout for each inp=
ut.

Now when a transaction is received, it contains everything needed for
verification, including the input amount, height, and coinbaseness, which w=
ould
have otherwise required a lookup the UTXO set.

To verify an input's unspentness, again create a hash from the concatenatio=
n of
the referenced outpoint and the provided dereferenced prevout, and check fo=
r
its presence in the UHS. The hash will only be present if a hash of the exa=
ct
same data was previously added to (and not since removed from) the UHS. As
such, we are protected from a peer attempting to lie about the dereferenced
prevout data.

### Some benefits of the UHS model

- Requires no consensus changes, purely a p2p/implementation change.

- UHS is substantially smaller than a full UTXO set (just over half for the
  main chain, see below). In-memory caching can be much more effective as a
  result.

- A block=E2=80=99s transactions can be fully verified before doing a poten=
tially
  expensive database lookup for the previous output data. The UHS can be
  queried afterwards (or in parallel) to verify previous output inclusion.

- Entire blocks could potentially be verified out-of-order because all inpu=
t
  data is provided; only the inclusion checks have to be in-order. Admitted=
ly
  this is likely too complicated to be realistic.

- pay-to-pubkey outputs are less burdensome on full nodes, since they use n=
o
  more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. Taproot=
 and
  Graftroot outputs may share the same benefits.

- The burden of holding UTXO data is technically shifted from the verifiers=
 to
  the spender. In reality, full nodes will likely always have a copy as wel=
l,
  but conceptually it's a slight improvement to the incentive model.

- Block data from peers can also be used to roll backwards during a reorg. =
This
  potentially enables an even more aggressive pruning mode.

- UTXO storage size grows exactly linearly with UTXO count, as opposed to
  growing linearly with UTXO data size. This may be relevant when consideri=
ng
  new larger output types which would otherwise cause the UTXO Set size to
  increase more quickly.

- The UHS is a simple set, no need for a key-value database. LevelDB could
  potentially be dropped as a dependency in some distant future.

- Potentially integrates nicely with Pieter Wuille's Rolling UTXO set hashe=
s
  [1]. Unspent Transaction Output Hashes would simply be mapped to points o=
n a
  curve before adding them to the set.

- With the help of inclusion proofs and rolling hashes, libbitcoinconsensus
  could potentially safely verify entire blocks. The size of the required
  proofs would be largely irrelevant as they would be consumed locally.

- Others?

### TxIn De-duplication

Setting aside the potential benefits, the obvious drawback of using a UHS i=
s a
significant network traffic increase. Fortunately, some properties of
transactions can be exploited to offset most of the difference.

For quick reference:

p2pkh scriptPubKey: DUP HASH160 [pubkey hash] EQUALVERIFY CHECKSIG
p2pkh scriptSig:    [signature] [pubkey]

p2sh scriptPubKey:  HASH160 [script hash] EQUAL
p2sh scriptSig:     [signature(s)] [script]

Notice that if a peer is sending a scriptPubKey and a scriptSig together, a=
s
they would when using a UHS, there would likely be some redundancy. Using a
p2sh output for example, the scriptPubKey contains the script hash, and the
scriptSig contains the script itself. Therefore when sending dereferenced
prevouts over the wire, any hash which can be computed can be omitted and o=
nly
the preimages sent.

Non-standard output scripts must be sent in-full, though because they accou=
nt
for only ~1% of all current UTXOs, they are rare enough to be ignored here.

### Intra-block Script De-duplication

When transactions are chained together in the same block, dereferenced prev=
out
data for these inputs would be redundant, as the full output data is alread=
y
present. For that reason, these dereferenced prevouts can be omitted when
sending over the wire.

The downside would be a new reconstruction pass requirement prior to
validation.

### Data

Here's some preliminary testing with a naive POC implementation patched int=
o
Bitcoin Core. Note that the final sizes will depend on optimization of the
serialization format. The format used for these tests is suboptimal for sur=
e.
Syncing mainnet to block 516346:

                      UTXO Node      UHS Node
  IBD Network Data:   153G           157G
  Block disk space:   175G           157G
  UTXO disk space :   2.8G           1.6G
  Total disk space:   177.8G         158.6G

The on-disk block-space reduction comes from the elimination of the Undo da=
ta
that Bitcoin Core uses to roll back orphaned blocks. For UHS Nodes, this da=
ta
is merged into to the block data and de-duplicated.

Compared to the UXTO model, using a UHS reduces disk space by ~12%, yet onl=
y
requires ~5% more data over the wire.

Experimentation shows intra-block de-duplication to be of little help in
practice, as it only reduces overhead by ~0.2% on mainnet. It could become =
more
useful if, for example, CPFP usage increases substantially in the future.

### Safety

Assuming sha256 for the UHS's hash function, I don't believe any fundamenta=
l
changes to Bitcoin's security model are introduced. Because the unspent
transaction output hashes commit to all necessary data, including output ty=
pes,
it should not be possible to be tricked into validating using mutated or fo=
rged
inputs.

### Compatibility

Transitioning from the current UTXO model would be annoying, but not
particularly painful. I'll briefly describe my current preferred approach, =
but
it makes sense to largely ignore this until there's been some discussion ab=
out
UHS in general.

A new service-bit should be allocated to indicate that a node is willing to
serve blocks and transactions with dereferenced prevout data appended. Once
connected to a node advertising this feature, nodes would use a new getdata
flag, creating MSG_PREVDATA_BLOCK and MSG_PREVDATA_TX.

Because current full nodes already have this data readily available in the
block-undo files, it is trivial to append on-the-fly. For that reason, it w=
ould
be easy to backport patches to the current stable version of Bitcoin Core i=
n
order to enable serving these blocks even before they could be consumed. Th=
is
would avoid an awkward bootstrapping phase where there may only be a few no=
des
available to serve to all new nodes.

Admittedly I haven't put much thought into the on-disk format, I'd rather l=
eave
that to a database person. Though it does seem like a reasonable excuse to
consider moving away from LevelDB.

Wallets would begin holding full prevout data for their unspent outputs, th=
ough
they could probably back-into the data as-is.

### Serialization

I would prefer to delay this discussion until a more high-level discussion =
has
been had, otherwise this would be a magnet for nits. The format used to gat=
her
the data above can be seen in the implementation below.

It should be noted, though, that the size of a UHS is directly dependent on=
 the
chosen hash function. Smaller hashes could potentially be used, but I belie=
ve
that to be unwise.

### Drawbacks

The primary drawback of this approach is the ~5% network ovhead.

Additionally, there is the possibility that a few "bridge nodes" may be nee=
ded
for some time. In a future with a network of only pruned UHS nodes, an old
wallet with no knowledge of its dereferenced prevout data would need to
broadcast an old-style transaction, and have a bridge node append the extra
data before forwarding it along the network.

I won't speculate further there, except to say that I can't imagine a
transition problem that wouldn't have a straightforward solution.

Migration issues aside, am I missing any obvious drawbacks?

### Implementation

This code [2] was a quick hack-job, just enough to gather some initial data=
. It
builds a UHS in memory and never flushes to disk. Only a single run works,
nasty things will happen upon restart. It should only be viewed in order to=
 get
an idea of what changes are needed. Only enough for IBD is implemented,
mempool/wallet/rpc are likely all broken. It is definitely not consensus-sa=
fe.

### Acknowledgement

I consider the UHS concept to be an evolution of Bram Cohen's TXO bitfield
idea. Bram also provided invaluable input when initially walking through th=
e
feasibility of a UHS.

Pieter Wuille's work on Rolling UTXO set hashes served as a catalyst for
rethinking how the UTXO set may be represented.

Additional thanks to those at at Financial Crypto and the CoreDev event
afterwards who helped to flesh out the idea:

Tadge Dryja
Greg Sanders
John Newbery
Neha Narula
Jeremy Rubin
Jim Posen
...and everyone else who has chimed in.


[0] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-March/0139=
28.html
[1] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014337=
.html
[2] https://github.com/theuni/bitcoin/tree/utxo-set-hash3