summaryrefslogtreecommitdiff
path: root/37/d74a00513c16c31d74fb481ab615b30e6d39a6
blob: 636e4458dda2253a98bb994425c06e1e595e623c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
Return-Path: <alex.mizrahi@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 3EFDFCAE
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 18 May 2018 15:42:04 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-qk0-f193.google.com (mail-qk0-f193.google.com
	[209.85.220.193])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 60725D3
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 18 May 2018 15:42:02 +0000 (UTC)
Received: by mail-qk0-f193.google.com with SMTP id d125-v6so6756516qkb.8
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 18 May 2018 08:42:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to; 
	bh=UA280XWmQ1/cVb1hNrp8AxF9AZjveF2/RhbtGmU67Zg=;
	b=brHdSwfU4hQSq+BNdVfKPuYni2BWxMW6HjYIZNnv1LdNid75bAlFxUL4ftbOTBWQjW
	aSTHiiNZTjkfuq+/UidTZEnw3+hxAvzO2NYmAC4sqFbAfo970FrQNtB5hcbY9WUSZrW2
	r5qe7d2c7N1sOVu/Q1h0/pKsOz1jSZFXJ9EyhrZUXq3xPVav2i01YGoZR72lXlnlfOOE
	Y91EJ+N/wyQqTdUch82UfFm6w3d+y+d6hOb71xHR4CX8KxdslUTUlgpEo3fga52qoySC
	8eRAPFJyzJrE9aTu+BsY/4X+dfPkNnPB/ZvnOnvU4uuUEdnnfzH6sGMRy1td71yPeBWo
	tKlQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:in-reply-to:references:from:date
	:message-id:subject:to;
	bh=UA280XWmQ1/cVb1hNrp8AxF9AZjveF2/RhbtGmU67Zg=;
	b=hnAcmZqoCRup3t3LKhLLydTZk404EizQNEdOhRtj72fmJYdxzVov4JiHyckyBXn+58
	rO01vBph8RO/6WabsNK+6e7Cx3bEIYv8A/MTdnAxoLtS3sW85I00zhxXu3LoVBuluq2+
	jiEXR+VqEmZQ775F9fm/KMHrLF8PaWUCYuHFRdCSai6fyy1wga6OZFKoGzrtGfMIshw2
	sIpKR4FPY/Rd1tD3cTSbFNZvkviUxhQOsCipQahT6XYXTkD43GglSCkokCFEN9QV1wsV
	bVhF5QJiN1DsKDc33LnYuKjCCH73g0hTFagqr8I/qLFFMdVN+CEPN95ryxPQ/ooWlelg
	rtSQ==
X-Gm-Message-State: ALKqPwcZjO2COuLQSDAjyr2ZgVvrHIEbtkyiH17l3gOmgY6VlHDOE8ZC
	Qc6s1iPzTbng6mf1lo6qPaJKDqDKGqRv9hr7VplMTFfA
X-Google-Smtp-Source: AB8JxZqlphvZVtm/N6KfsIhFNTsyCy525/svGJDvYj4e1ui1NXtXbXLKEu9zdMuofjOUNYQcjBTnxN9KxglTFV2NhQI=
X-Received: by 2002:a37:cc4:: with SMTP id
	187-v6mr9093749qkm.279.1526658121337; 
	Fri, 18 May 2018 08:42:01 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.55.76.72 with HTTP; Fri, 18 May 2018 08:42:00 -0700 (PDT)
In-Reply-To: <CAApLimjfPKDxmiy_SHjuOKbfm6HumFPjc9EFKvw=3NwZO8JcmQ@mail.gmail.com>
References: <CAApLimjfPKDxmiy_SHjuOKbfm6HumFPjc9EFKvw=3NwZO8JcmQ@mail.gmail.com>
From: Alex Mizrahi <alex.mizrahi@gmail.com>
Date: Fri, 18 May 2018 18:42:00 +0300
Message-ID: <CAE28kUT0bK=hOMTqjSP8stC64eNXaqrXuz_HC-4DXx3+M4QJ7w@mail.gmail.com>
To: lists@coryfields.com, 
	Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Content-Type: multipart/alternative; boundary="000000000000040f2b056c7ccbbf"
X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, HTML_MESSAGE,
	RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Subject: Re: [bitcoin-dev] UHS: Full-node security without maintaining a
 full UTXO set
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Fri, 18 May 2018 15:42:04 -0000

--000000000000040f2b056c7ccbbf
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

You should read this:
https://bitcointalk.org/index.php?topic=3D153662.10

On Wed, May 16, 2018 at 7:36 PM, Cory Fields via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Tl;dr: Rather than storing all unspent outputs, store their hashes.
> Untrusted
> peers can supply the full outputs when needed, with very little overhead.
> Any attempt to spoof those outputs would be apparent, as their hashes
> would not
> be present in the hash set. There are many advantages to this, most
> apparently
> in disk and memory savings, as well as a validation speedup. The primary
> disadvantage is a small increase in network traffic. I believe that the
> advantages outweigh the disadvantages.
>
> --
>
> Bitcoin=E2=80=99s unspent transaction output set (usually referred to as =
=E2=80=9CThe UTXO
> set=E2=80=9D) has two primary roles: providing proof that previous output=
s exist
> to be
> spent, and providing the actual previous output data for verification whe=
n
> new
> transactions attempts to spend them. These roles are not usually discusse=
d
> independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are
> compelling reasons to consider them this way.
>
> To see why, consider running a node with the following changes:
>
> - For each new output, gather all extra data that will be needed for
>   verification when spending it later as an input: the amount,
> scriptPubKey,
>   creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh,
> etc.).
>   Call this the Dereferenced Prevout data.
> - Create a hash from the concatenation of the new outpoint and the
> dereferenced
>   prevout data. Call this a Unspent Transaction Output Hash.
> - Rather than storing the full dereferenced prevout entries in a UTXO set
> as is
>   currently done, instead store their hashes to an Unspent Transaction
> Output
>   Hash Set, or UHS.
> - When relaying a transaction, append the dereferenced prevout for each
> input.
>
> Now when a transaction is received, it contains everything needed for
> verification, including the input amount, height, and coinbaseness, which
> would
> have otherwise required a lookup the UTXO set.
>
> To verify an input's unspentness, again create a hash from the
> concatenation of
> the referenced outpoint and the provided dereferenced prevout, and check
> for
> its presence in the UHS. The hash will only be present if a hash of the
> exact
> same data was previously added to (and not since removed from) the UHS. A=
s
> such, we are protected from a peer attempting to lie about the dereferenc=
ed
> prevout data.
>
> ### Some benefits of the UHS model
>
> - Requires no consensus changes, purely a p2p/implementation change.
>
> - UHS is substantially smaller than a full UTXO set (just over half for t=
he
>   main chain, see below). In-memory caching can be much more effective as=
 a
>   result.
>
> - A block=E2=80=99s transactions can be fully verified before doing a pot=
entially
>   expensive database lookup for the previous output data. The UHS can be
>   queried afterwards (or in parallel) to verify previous output inclusion=
.
>
> - Entire blocks could potentially be verified out-of-order because all
> input
>   data is provided; only the inclusion checks have to be in-order.
> Admittedly
>   this is likely too complicated to be realistic.
>
> - pay-to-pubkey outputs are less burdensome on full nodes, since they use
> no
>   more space on-disk than pay-to-pubkey-hash or pay-to-script-hash.
> Taproot and
>   Graftroot outputs may share the same benefits.
>
> - The burden of holding UTXO data is technically shifted from the
> verifiers to
>   the spender. In reality, full nodes will likely always have a copy as
> well,
>   but conceptually it's a slight improvement to the incentive model.
>
> - Block data from peers can also be used to roll backwards during a reorg=
.
> This
>   potentially enables an even more aggressive pruning mode.
>
> - UTXO storage size grows exactly linearly with UTXO count, as opposed to
>   growing linearly with UTXO data size. This may be relevant when
> considering
>   new larger output types which would otherwise cause the UTXO Set size t=
o
>   increase more quickly.
>
> - The UHS is a simple set, no need for a key-value database. LevelDB coul=
d
>   potentially be dropped as a dependency in some distant future.
>
> - Potentially integrates nicely with Pieter Wuille's Rolling UTXO set
> hashes
>   [1]. Unspent Transaction Output Hashes would simply be mapped to points
> on a
>   curve before adding them to the set.
>
> - With the help of inclusion proofs and rolling hashes, libbitcoinconsens=
us
>   could potentially safely verify entire blocks. The size of the required
>   proofs would be largely irrelevant as they would be consumed locally.
>
> - Others?
>
> ### TxIn De-duplication
>
> Setting aside the potential benefits, the obvious drawback of using a UHS
> is a
> significant network traffic increase. Fortunately, some properties of
> transactions can be exploited to offset most of the difference.
>
> For quick reference:
>
> p2pkh scriptPubKey: DUP HASH160 [pubkey hash] EQUALVERIFY CHECKSIG
> p2pkh scriptSig:    [signature] [pubkey]
>
> p2sh scriptPubKey:  HASH160 [script hash] EQUAL
> p2sh scriptSig:     [signature(s)] [script]
>
> Notice that if a peer is sending a scriptPubKey and a scriptSig together,
> as
> they would when using a UHS, there would likely be some redundancy. Using=
 a
> p2sh output for example, the scriptPubKey contains the script hash, and t=
he
> scriptSig contains the script itself. Therefore when sending dereferenced
> prevouts over the wire, any hash which can be computed can be omitted and
> only
> the preimages sent.
>
> Non-standard output scripts must be sent in-full, though because they
> account
> for only ~1% of all current UTXOs, they are rare enough to be ignored her=
e.
>
> ### Intra-block Script De-duplication
>
> When transactions are chained together in the same block, dereferenced
> prevout
> data for these inputs would be redundant, as the full output data is
> already
> present. For that reason, these dereferenced prevouts can be omitted when
> sending over the wire.
>
> The downside would be a new reconstruction pass requirement prior to
> validation.
>
> ### Data
>
> Here's some preliminary testing with a naive POC implementation patched
> into
> Bitcoin Core. Note that the final sizes will depend on optimization of th=
e
> serialization format. The format used for these tests is suboptimal for
> sure.
> Syncing mainnet to block 516346:
>
>                       UTXO Node      UHS Node
>   IBD Network Data:   153G           157G
>   Block disk space:   175G           157G
>   UTXO disk space :   2.8G           1.6G
>   Total disk space:   177.8G         158.6G
>
> The on-disk block-space reduction comes from the elimination of the Undo
> data
> that Bitcoin Core uses to roll back orphaned blocks. For UHS Nodes, this
> data
> is merged into to the block data and de-duplicated.
>
> Compared to the UXTO model, using a UHS reduces disk space by ~12%, yet
> only
> requires ~5% more data over the wire.
>
> Experimentation shows intra-block de-duplication to be of little help in
> practice, as it only reduces overhead by ~0.2% on mainnet. It could becom=
e
> more
> useful if, for example, CPFP usage increases substantially in the future.
>
> ### Safety
>
> Assuming sha256 for the UHS's hash function, I don't believe any
> fundamental
> changes to Bitcoin's security model are introduced. Because the unspent
> transaction output hashes commit to all necessary data, including output
> types,
> it should not be possible to be tricked into validating using mutated or
> forged
> inputs.
>
> ### Compatibility
>
> Transitioning from the current UTXO model would be annoying, but not
> particularly painful. I'll briefly describe my current preferred approach=
,
> but
> it makes sense to largely ignore this until there's been some discussion
> about
> UHS in general.
>
> A new service-bit should be allocated to indicate that a node is willing =
to
> serve blocks and transactions with dereferenced prevout data appended. On=
ce
> connected to a node advertising this feature, nodes would use a new getda=
ta
> flag, creating MSG_PREVDATA_BLOCK and MSG_PREVDATA_TX.
>
> Because current full nodes already have this data readily available in th=
e
> block-undo files, it is trivial to append on-the-fly. For that reason, it
> would
> be easy to backport patches to the current stable version of Bitcoin Core
> in
> order to enable serving these blocks even before they could be consumed.
> This
> would avoid an awkward bootstrapping phase where there may only be a few
> nodes
> available to serve to all new nodes.
>
> Admittedly I haven't put much thought into the on-disk format, I'd rather
> leave
> that to a database person. Though it does seem like a reasonable excuse t=
o
> consider moving away from LevelDB.
>
> Wallets would begin holding full prevout data for their unspent outputs,
> though
> they could probably back-into the data as-is.
>
> ### Serialization
>
> I would prefer to delay this discussion until a more high-level discussio=
n
> has
> been had, otherwise this would be a magnet for nits. The format used to
> gather
> the data above can be seen in the implementation below.
>
> It should be noted, though, that the size of a UHS is directly dependent
> on the
> chosen hash function. Smaller hashes could potentially be used, but I
> believe
> that to be unwise.
>
> ### Drawbacks
>
> The primary drawback of this approach is the ~5% network ovhead.
>
> Additionally, there is the possibility that a few "bridge nodes" may be
> needed
> for some time. In a future with a network of only pruned UHS nodes, an ol=
d
> wallet with no knowledge of its dereferenced prevout data would need to
> broadcast an old-style transaction, and have a bridge node append the ext=
ra
> data before forwarding it along the network.
>
> I won't speculate further there, except to say that I can't imagine a
> transition problem that wouldn't have a straightforward solution.
>
> Migration issues aside, am I missing any obvious drawbacks?
>
> ### Implementation
>
> This code [2] was a quick hack-job, just enough to gather some initial
> data. It
> builds a UHS in memory and never flushes to disk. Only a single run works=
,
> nasty things will happen upon restart. It should only be viewed in order
> to get
> an idea of what changes are needed. Only enough for IBD is implemented,
> mempool/wallet/rpc are likely all broken. It is definitely not
> consensus-safe.
>
> ### Acknowledgement
>
> I consider the UHS concept to be an evolution of Bram Cohen's TXO bitfiel=
d
> idea. Bram also provided invaluable input when initially walking through
> the
> feasibility of a UHS.
>
> Pieter Wuille's work on Rolling UTXO set hashes served as a catalyst for
> rethinking how the UTXO set may be represented.
>
> Additional thanks to those at at Financial Crypto and the CoreDev event
> afterwards who helped to flesh out the idea:
>
> Tadge Dryja
> Greg Sanders
> John Newbery
> Neha Narula
> Jeremy Rubin
> Jim Posen
> ...and everyone else who has chimed in.
>
>
> [0] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/
> 2017-March/013928.html
> [1] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/
> 2017-May/014337.html
> [2] https://github.com/theuni/bitcoin/tree/utxo-set-hash3
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

--000000000000040f2b056c7ccbbf
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">You should read this:<div><a href=3D"https://bitcointalk.o=
rg/index.php?topic=3D153662.10">https://bitcointalk.org/index.php?topic=3D1=
53662.10</a><br></div></div><div class=3D"gmail_extra"><br><div class=3D"gm=
ail_quote">On Wed, May 16, 2018 at 7:36 PM, Cory Fields via bitcoin-dev <sp=
an dir=3D"ltr">&lt;<a href=3D"mailto:bitcoin-dev@lists.linuxfoundation.org"=
 target=3D"_blank">bitcoin-dev@lists.linuxfoundation.org</a>&gt;</span> wro=
te:<br><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-=
left:1px #ccc solid;padding-left:1ex">Tl;dr: Rather than storing all unspen=
t outputs, store their hashes. Untrusted<br>
peers can supply the full outputs when needed, with very little overhead.<b=
r>
Any attempt to spoof those outputs would be apparent, as their hashes would=
 not<br>
be present in the hash set. There are many advantages to this, most apparen=
tly<br>
in disk and memory savings, as well as a validation speedup. The primary<br=
>
disadvantage is a small increase in network traffic. I believe that the<br>
advantages outweigh the disadvantages.<br>
<br>
--<br>
<br>
Bitcoin=E2=80=99s unspent transaction output set (usually referred to as =
=E2=80=9CThe UTXO<br>
set=E2=80=9D) has two primary roles: providing proof that previous outputs =
exist to be<br>
spent, and providing the actual previous output data for verification when =
new<br>
transactions attempts to spend them. These roles are not usually discussed<=
br>
independently, but as Bram Cohen&#39;s TXO Bitfield [0] idea hints, there a=
re<br>
compelling reasons to consider them this way.<br>
<br>
To see why, consider running a node with the following changes:<br>
<br>
- For each new output, gather all extra data that will be needed for<br>
=C2=A0 verification when spending it later as an input: the amount, scriptP=
ubKey,<br>
=C2=A0 creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh,=
 etc.).<br>
=C2=A0 Call this the Dereferenced Prevout data.<br>
- Create a hash from the concatenation of the new outpoint and the derefere=
nced<br>
=C2=A0 prevout data. Call this a Unspent Transaction Output Hash.<br>
- Rather than storing the full dereferenced prevout entries in a UTXO set a=
s is<br>
=C2=A0 currently done, instead store their hashes to an Unspent Transaction=
 Output<br>
=C2=A0 Hash Set, or UHS.<br>
- When relaying a transaction, append the dereferenced prevout for each inp=
ut.<br>
<br>
Now when a transaction is received, it contains everything needed for<br>
verification, including the input amount, height, and coinbaseness, which w=
ould<br>
have otherwise required a lookup the UTXO set.<br>
<br>
To verify an input&#39;s unspentness, again create a hash from the concaten=
ation of<br>
the referenced outpoint and the provided dereferenced prevout, and check fo=
r<br>
its presence in the UHS. The hash will only be present if a hash of the exa=
ct<br>
same data was previously added to (and not since removed from) the UHS. As<=
br>
such, we are protected from a peer attempting to lie about the dereferenced=
<br>
prevout data.<br>
<br>
### Some benefits of the UHS model<br>
<br>
- Requires no consensus changes, purely a p2p/implementation change.<br>
<br>
- UHS is substantially smaller than a full UTXO set (just over half for the=
<br>
=C2=A0 main chain, see below). In-memory caching can be much more effective=
 as a<br>
=C2=A0 result.<br>
<br>
- A block=E2=80=99s transactions can be fully verified before doing a poten=
tially<br>
=C2=A0 expensive database lookup for the previous output data. The UHS can =
be<br>
=C2=A0 queried afterwards (or in parallel) to verify previous output inclus=
ion.<br>
<br>
- Entire blocks could potentially be verified out-of-order because all inpu=
t<br>
=C2=A0 data is provided; only the inclusion checks have to be in-order. Adm=
ittedly<br>
=C2=A0 this is likely too complicated to be realistic.<br>
<br>
- pay-to-pubkey outputs are less burdensome on full nodes, since they use n=
o<br>
=C2=A0 more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. Ta=
proot and<br>
=C2=A0 Graftroot outputs may share the same benefits.<br>
<br>
- The burden of holding UTXO data is technically shifted from the verifiers=
 to<br>
=C2=A0 the spender. In reality, full nodes will likely always have a copy a=
s well,<br>
=C2=A0 but conceptually it&#39;s a slight improvement to the incentive mode=
l.<br>
<br>
- Block data from peers can also be used to roll backwards during a reorg. =
This<br>
=C2=A0 potentially enables an even more aggressive pruning mode.<br>
<br>
- UTXO storage size grows exactly linearly with UTXO count, as opposed to<b=
r>
=C2=A0 growing linearly with UTXO data size. This may be relevant when cons=
idering<br>
=C2=A0 new larger output types which would otherwise cause the UTXO Set siz=
e to<br>
=C2=A0 increase more quickly.<br>
<br>
- The UHS is a simple set, no need for a key-value database. LevelDB could<=
br>
=C2=A0 potentially be dropped as a dependency in some distant future.<br>
<br>
- Potentially integrates nicely with Pieter Wuille&#39;s Rolling UTXO set h=
ashes<br>
=C2=A0 [1]. Unspent Transaction Output Hashes would simply be mapped to poi=
nts on a<br>
=C2=A0 curve before adding them to the set.<br>
<br>
- With the help of inclusion proofs and rolling hashes, libbitcoinconsensus=
<br>
=C2=A0 could potentially safely verify entire blocks. The size of the requi=
red<br>
=C2=A0 proofs would be largely irrelevant as they would be consumed locally=
.<br>
<br>
- Others?<br>
<br>
### TxIn De-duplication<br>
<br>
Setting aside the potential benefits, the obvious drawback of using a UHS i=
s a<br>
significant network traffic increase. Fortunately, some properties of<br>
transactions can be exploited to offset most of the difference.<br>
<br>
For quick reference:<br>
<br>
p2pkh scriptPubKey: DUP HASH160 [pubkey hash] EQUALVERIFY CHECKSIG<br>
p2pkh scriptSig:=C2=A0 =C2=A0 [signature] [pubkey]<br>
<br>
p2sh scriptPubKey:=C2=A0 HASH160 [script hash] EQUAL<br>
p2sh scriptSig:=C2=A0 =C2=A0 =C2=A0[signature(s)] [script]<br>
<br>
Notice that if a peer is sending a scriptPubKey and a scriptSig together, a=
s<br>
they would when using a UHS, there would likely be some redundancy. Using a=
<br>
p2sh output for example, the scriptPubKey contains the script hash, and the=
<br>
scriptSig contains the script itself. Therefore when sending dereferenced<b=
r>
prevouts over the wire, any hash which can be computed can be omitted and o=
nly<br>
the preimages sent.<br>
<br>
Non-standard output scripts must be sent in-full, though because they accou=
nt<br>
for only ~1% of all current UTXOs, they are rare enough to be ignored here.=
<br>
<br>
### Intra-block Script De-duplication<br>
<br>
When transactions are chained together in the same block, dereferenced prev=
out<br>
data for these inputs would be redundant, as the full output data is alread=
y<br>
present. For that reason, these dereferenced prevouts can be omitted when<b=
r>
sending over the wire.<br>
<br>
The downside would be a new reconstruction pass requirement prior to<br>
validation.<br>
<br>
### Data<br>
<br>
Here&#39;s some preliminary testing with a naive POC implementation patched=
 into<br>
Bitcoin Core. Note that the final sizes will depend on optimization of the<=
br>
serialization format. The format used for these tests is suboptimal for sur=
e.<br>
Syncing mainnet to block 516346:<br>
<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 UTXO Node=C2=A0 =C2=A0 =C2=A0 UHS Node<br>
=C2=A0 IBD Network Data:=C2=A0 =C2=A0153G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0157G<br>
=C2=A0 Block disk space:=C2=A0 =C2=A0175G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0157G<br>
=C2=A0 UTXO disk space :=C2=A0 =C2=A02.8G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A01.6G<br>
=C2=A0 Total disk space:=C2=A0 =C2=A0177.8G=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0158.6G<br>
<br>
The on-disk block-space reduction comes from the elimination of the Undo da=
ta<br>
that Bitcoin Core uses to roll back orphaned blocks. For UHS Nodes, this da=
ta<br>
is merged into to the block data and de-duplicated.<br>
<br>
Compared to the UXTO model, using a UHS reduces disk space by ~12%, yet onl=
y<br>
requires ~5% more data over the wire.<br>
<br>
Experimentation shows intra-block de-duplication to be of little help in<br=
>
practice, as it only reduces overhead by ~0.2% on mainnet. It could become =
more<br>
useful if, for example, CPFP usage increases substantially in the future.<b=
r>
<br>
### Safety<br>
<br>
Assuming sha256 for the UHS&#39;s hash function, I don&#39;t believe any fu=
ndamental<br>
changes to Bitcoin&#39;s security model are introduced. Because the unspent=
<br>
transaction output hashes commit to all necessary data, including output ty=
pes,<br>
it should not be possible to be tricked into validating using mutated or fo=
rged<br>
inputs.<br>
<br>
### Compatibility<br>
<br>
Transitioning from the current UTXO model would be annoying, but not<br>
particularly painful. I&#39;ll briefly describe my current preferred approa=
ch, but<br>
it makes sense to largely ignore this until there&#39;s been some discussio=
n about<br>
UHS in general.<br>
<br>
A new service-bit should be allocated to indicate that a node is willing to=
<br>
serve blocks and transactions with dereferenced prevout data appended. Once=
<br>
connected to a node advertising this feature, nodes would use a new getdata=
<br>
flag, creating MSG_PREVDATA_BLOCK and MSG_PREVDATA_TX.<br>
<br>
Because current full nodes already have this data readily available in the<=
br>
block-undo files, it is trivial to append on-the-fly. For that reason, it w=
ould<br>
be easy to backport patches to the current stable version of Bitcoin Core i=
n<br>
order to enable serving these blocks even before they could be consumed. Th=
is<br>
would avoid an awkward bootstrapping phase where there may only be a few no=
des<br>
available to serve to all new nodes.<br>
<br>
Admittedly I haven&#39;t put much thought into the on-disk format, I&#39;d =
rather leave<br>
that to a database person. Though it does seem like a reasonable excuse to<=
br>
consider moving away from LevelDB.<br>
<br>
Wallets would begin holding full prevout data for their unspent outputs, th=
ough<br>
they could probably back-into the data as-is.<br>
<br>
### Serialization<br>
<br>
I would prefer to delay this discussion until a more high-level discussion =
has<br>
been had, otherwise this would be a magnet for nits. The format used to gat=
her<br>
the data above can be seen in the implementation below.<br>
<br>
It should be noted, though, that the size of a UHS is directly dependent on=
 the<br>
chosen hash function. Smaller hashes could potentially be used, but I belie=
ve<br>
that to be unwise.<br>
<br>
### Drawbacks<br>
<br>
The primary drawback of this approach is the ~5% network ovhead.<br>
<br>
Additionally, there is the possibility that a few &quot;bridge nodes&quot; =
may be needed<br>
for some time. In a future with a network of only pruned UHS nodes, an old<=
br>
wallet with no knowledge of its dereferenced prevout data would need to<br>
broadcast an old-style transaction, and have a bridge node append the extra=
<br>
data before forwarding it along the network.<br>
<br>
I won&#39;t speculate further there, except to say that I can&#39;t imagine=
 a<br>
transition problem that wouldn&#39;t have a straightforward solution.<br>
<br>
Migration issues aside, am I missing any obvious drawbacks?<br>
<br>
### Implementation<br>
<br>
This code [2] was a quick hack-job, just enough to gather some initial data=
. It<br>
builds a UHS in memory and never flushes to disk. Only a single run works,<=
br>
nasty things will happen upon restart. It should only be viewed in order to=
 get<br>
an idea of what changes are needed. Only enough for IBD is implemented,<br>
mempool/wallet/rpc are likely all broken. It is definitely not consensus-sa=
fe.<br>
<br>
### Acknowledgement<br>
<br>
I consider the UHS concept to be an evolution of Bram Cohen&#39;s TXO bitfi=
eld<br>
idea. Bram also provided invaluable input when initially walking through th=
e<br>
feasibility of a UHS.<br>
<br>
Pieter Wuille&#39;s work on Rolling UTXO set hashes served as a catalyst fo=
r<br>
rethinking how the UTXO set may be represented.<br>
<br>
Additional thanks to those at at Financial Crypto and the CoreDev event<br>
afterwards who helped to flesh out the idea:<br>
<br>
Tadge Dryja<br>
Greg Sanders<br>
John Newbery<br>
Neha Narula<br>
Jeremy Rubin<br>
Jim Posen<br>
...and everyone else who has chimed in.<br>
<br>
<br>
[0] <a href=3D"https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017=
-March/013928.html" rel=3D"noreferrer" target=3D"_blank">https://lists.linu=
xfoundation.<wbr>org/pipermail/bitcoin-dev/<wbr>2017-March/013928.html</a><=
br>
[1] <a href=3D"https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017=
-May/014337.html" rel=3D"noreferrer" target=3D"_blank">https://lists.linuxf=
oundation.<wbr>org/pipermail/bitcoin-dev/<wbr>2017-May/014337.html</a><br>
[2] <a href=3D"https://github.com/theuni/bitcoin/tree/utxo-set-hash3" rel=
=3D"noreferrer" target=3D"_blank">https://github.com/theuni/<wbr>bitcoin/tr=
ee/utxo-set-hash3</a><br>
______________________________<wbr>_________________<br>
bitcoin-dev mailing list<br>
<a href=3D"mailto:bitcoin-dev@lists.linuxfoundation.org">bitcoin-dev@lists.=
<wbr>linuxfoundation.org</a><br>
<a href=3D"https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev" =
rel=3D"noreferrer" target=3D"_blank">https://lists.linuxfoundation.<wbr>org=
/mailman/listinfo/bitcoin-<wbr>dev</a><br>
</blockquote></div><br></div>

--000000000000040f2b056c7ccbbf--