summaryrefslogtreecommitdiff
path: root/02/7fa5a7405eb4c9fce10c91f03ddc40bd3985e6
blob: d7ffce4210853ce83c4f7cdd02dccbeb9aa403d0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
Return-Path: <peter_r@gmx.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id E01AB4A6
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 15 May 2017 20:58:55 +0000 (UTC)
X-Greylist: delayed 00:05:04 by SQLgrey-1.7.6
Received: from mout.gmx.net (mout.gmx.net [212.227.17.21])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 3BDA620E
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 15 May 2017 20:58:53 +0000 (UTC)
Received: from [192.168.1.117] ([184.68.138.98]) by mail.gmx.com (mrgmx103
	[212.227.17.174]) with ESMTPSA (Nemesis) id 0Mb8HX-1dPZWy12cN-00KgB4;
	Mon, 15 May 2017 22:53:47 +0200
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
From: Peter R <peter_r@gmx.com>
In-Reply-To: <CAPg+sBgruEiXya6oFy6VpmR1KPDebjeGDtZZU+facZx5=L_a5w@mail.gmail.com>
Date: Mon, 15 May 2017 13:53:45 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <553BA161-F60F-4278-914F-0E5F2D2E0A84@gmx.com>
References: <CAPg+sBgruEiXya6oFy6VpmR1KPDebjeGDtZZU+facZx5=L_a5w@mail.gmail.com>
To: Pieter Wuille <pieter.wuille@gmail.com>
X-Mailer: Apple Mail (2.2104)
X-Provags-ID: V03:K0:lT4olf41UMC80XA8oZmr++Q8jYv1K0opztH5tqs2Ir1xQ+Gwh3M
	uykILLAUs9QWHJerkM35FOHLbMgykVBf/lcU/oVZQM6BKgDpitt55IwNO4HBf9m/Qjhszbs
	90MkI+f95AMsJcpyOzhKysQNymtTdv083FYB93+q7vWUf/onVbE+tXIabLaiBNU53bdhlkp
	HY1dfOWg6qcIFMxVrut+g==
X-UI-Out-Filterresults: notjunk:1;V01:K0:x1TOwWITgRM=:mVvCC2bpqqibunXtfxZAqC
	A+fi1zZ8Xqbbizn7Em9VjNA2dWIsPSTmJ4aVd714Hswrdo2EQ6ntUCyBv8xJE+VIITWEeFcNf
	+0GnS6gDZfyJAxHlEuuVqrCmSnFSBj9Yhmhj2PgANBWSCfI2syFBM3skUvVI34fOET0kg17CZ
	7Vf0PlkUx2+Rv317rDb5vp7GO5Qymio2WE9uvdVmLAxoNzjpiT9MU+QsLAngca158G97sfKRi
	jAoOGUQL6u7jZ03DqgOI3RLUEX/IK3qwD6ZJZcM0VDK2gYL4zzYSnG8FyvzKPtFXKBZAx3Isn
	uy2t5ggzOXa/7hU9QOWyG2aHQkACXVXWfPn5pnIK/fxsnN78BefUdPLyIhzEASF58MLkHfl3Z
	iU8nIkYtwORe6MvwurojvtjeoagJo3dYA62+ziTV9XCDIPchmBe6c+HeW8lgX5P+Ur8eeG9p0
	vxWkiiKSH7+fA5QXwcLJgQS6b+m70X3urK9rlW0fUe2See5VI/l9sVP+wo8yNgXfwPx01rGp2
	6Q7idSuzLWJ/fdoJ3mbu1tt8HeHjNclEVfEONXX1agYHWqKe4jSe3qYH4MLwUMmDe+Ebp+q4e
	gT7/HVoK5A1SM7tfdh3VEEWWzXCZ9LYCEgBiToSPK42iGExNT72IsaJQh/Yka0Jm/Gtgtyqkx
	+Xh7IRi//XWYK7BJ2oIPR0frpl4XqpkWIIx+2hC9vR9bvpsfI7hK2zt9yC+uD2+SdtJt7dWLc
	kk2tSkbrk9zyIXykiiGEWvpmhOqRPiRJ0tQgFQd6s6JYXql5PaxC5LZ6uhuhCYy8PjvCCeJ9i
	alBrmxn
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
	RCVD_IN_DNSWL_LOW,RCVD_IN_SORBS_SPAM autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Mon, 15 May 2017 22:03:38 +0000
Cc: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Rolling UTXO set hashes
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Mon, 15 May 2017 20:58:56 -0000

Hi Pieter,

I wanted to say that I thought this write-up was excellent!  And =
efficiently hashing the UTXO set in this rolling fashion is a very =
exciting idea!!=20

Peter R

> On May 15, 2017, at 1:01 PM, Pieter Wuille via bitcoin-dev =
<bitcoin-dev@lists.linuxfoundation.org> wrote:
>=20
> Hello all,
>=20
> I would like to discuss a way of computing a UTXO set hash that is
> very efficient to update, but does not support any compact proofs of
> existence or non-existence.
>=20
> Much has been written on the topic of various data structures and
> derived hashes for the UTXO/TXO set before (including Alan Reiner's
> trust-free lite nodes [1], Peter Todd's TXO MMR commitments [2] [3],
> or Bram Cohen's TXO bitfield [4]). They all provide interesting extra
> functionality or tradeoffs, but require invasive changes to the P2P
> protocol or how wallets work, or force nodes to maintain their
> database in a normative fashion. Instead, here I focus on an efficient
> hash that supports nothing but comparing two UTXO sets. However, it is
> not incompatible with any of those other approaches, so we can gain
> some of the advantages of a UTXO hash without adopting something that
> may be incompatible with future protocol enhancements.
>=20
> 1. Incremental hashing
>=20
> Computing a hash of the UTXO set is easy when it does not need
> efficient updates, and when we can assume a fixed serialization with a
> normative ordering for the data in it - just serialize the whole thing
> and hash it. As different software or releases may use different
> database models for the UTXO set, a solution that is order-independent
> would seem preferable.
>=20
> This brings us to the problem of computing a hash of unordered data.
> Several approaches that accomplish this through incremental hashing
> were suggested in [5], including XHASH, AdHash, and MuHash. XHASH
> consists of first hashing all the set elements independently, and
> XORing all those hashes together. This is insecure, as Gaussian
> elimination can easily find a subset of random hashes that XOR to a
> given value. AdHash/MuHash are similar, except addition/multiplication
> modulo a large prime are used instead of XOR. Wagner [6] showed that
> attacking XHASH or AdHash is an instance of a generalized birthday
> problem (called the k-sum problem in his paper, with unrestricted k),
> and gives a O(2^(2*sqrt(n)-1)) algorithm to attack it (for n-bit
> hashes). As a result, AdHash with 256-bit hashes only has 31 bits of
> security.
>=20
> Thankfully, [6] also shows that the k-sum problem cannot be
> efficiently solved in groups in which the discrete logarithm problem
> is hard, as an efficient k-sum solver can be used to compute discrete
> logarithms. As a result, MuHash modulo a sufficiently large safe prime
> is provably secure under the DL assumption. Common guidelines on
> security parameters [7] say that 3072-bit DL has about 128 bits of
> security. A final 256-bit hash can be applied to the 3072-bit result
> without loss of security to reduce the final size.
>=20
> An alternative to multiplication modulo a prime is using an elliptic
> curve group. Due to the ECDLP assumption, which the security of
> Bitcoin signatures already relies on, this also results in security
> against k-sum solving. This approach is used in the Elliptic Curve
> Multiset Hash (ECMH) in [8]. For this to work, we must "hash onto a
> curve point" in a way that results in points without known discrete
> logarithm. The paper suggests using (controversial) binary elliptic
> curves to make that operation efficient. If we only consider
> secp256k1, one approach is just reading potential X coordinates from a
> PRNG until one is found that has a corresponding Y coordinate
> according to the curve equation. On average, 2 iterations are needed.
> A constant time algorithm to hash onto the curve exists as well [9],
> but it is only slightly faster and is much more complicated to
> implement.
>=20
> AdHash-like constructions with a sufficiently large intermediate hash
> can be made secure against Wagner's algorithm, as suggested in [10].
> 4160-bit hashes would be needed for 128 bits of security. When
> repetition is allowed, [8] gives a stronger attack against AdHash,
> suggesting that as much as 400000 bits are needed. While repetition is
> not directly an issue for our use case, it would be nice if
> verification software would not be required to check for duplicated
> entries.
>=20
> 2. Efficient addition and deletion
>=20
> Interestingly, both ECMH and MuHash not only support adding set
> elements in any order but also deleting in any order. As a result, we
> can simply maintain a running sum for the UTXO set as a whole, and
> add/subtract when creating/spending an output in it. In the case of
> MuHash it is slightly more complicated, as computing an inverse is
> relatively expensive. This can be solved by representing the running
> value as a fraction, and multiplying created elements into the
> numerator and spent elements into the denominator. Only when the final
> hash is desired, a single modular inverse and multiplication is needed
> to combine the two.
>=20
> As the update operations are also associative, H(a)+H(b)+H(c)+H(d) can
> in fact be computed as (H(a)+H(b)) + (H(c)+H(d)). This implies that
> all of this is perfectly parallellizable: each thread can process an
> arbitrary subset of the update operations, allowing them to be
> efficiently combined later.
>=20
> 3. Comparison of approaches
>=20
> Numbers below are based on preliminary benchmarks on a single thread
> of a i7-6820HQ CPU running at 3.4GHz.
>=20
> (1) (MuHash) Multiplying 3072-bit hashes mod 2^3072 - 1103717 (the
> largest 3072-bit safe prime).
>    * Needs a fast modular multiplication/inverse implementation.
>    * Using SHA512 + ChaCha20 for generating the hashes takes 1.2us per =
element.
>    * Modular multiplication using GMP takes 1.5us per element (2.5us
> with a 60-line C+asm implementation).
>    * 768 bytes for maintaining a running sum (384 for numerator, 384
> for denominator)
>    * Very common security assumption. Even if the DL assumption would
> be broken (but no k-sum algorithm faster than Wagner's is found), this
> still maintains 110 bits of security.
>=20
> (2) (ECMH) Adding secp256k1 EC points
>    * Much more complicated than the previous approaches when
> implementing from scratch, but almost no extra complexity when ECDSA
> secp256k1 signature validation is already implemented.
>    * Using SHA512 + libsecp256k1's point decompression for generating
> the points takes 11us per element on average.
>    * Addition/subtracting of N points takes 5.25us + 0.25us*N.
>    * 64 bytes for a running sum.
>    * Identical security assumption as Bitcoin's signatures.
>=20
> Using the numbers above, we find that:
> * Computing the hash from just the UTXO set takes (1) 2m15s (2) 9m20s
> * Processing all creations and spends in an average block takes (1)
> 24ms (2) 100ms
> * Processing precomputed per-transaction aggregates in an average
> block takes (1) 3ms (2) 0.5ms
>=20
> Note that while (2) has higher CPU usage than (1) in general, it has
> lower latency when using precomputed per-transaction aggregates. Using
> such aggregates is also more feasible as they're only 64 bytes rather
> than 768. Because of simplicity, (1) has my preference.
>=20
> Overall, these numbers are sufficiently low (note that they can be
> parallellized) that it would be reasonable for full nodes and/or other
> software to always maintain one of them, and effectively have a
> rolling cryptographical checksum of the UTXO set at all times.
>=20
> 4. Use cases
>=20
> * Replacement for Bitcoin Core's gettxoutsetinfo RPC's hash
> computation. This currently requires minutes of I/O and CPU, as it
> serializes and hashes the entire UTXO set. A rolling set hash would
> make this instant, making the whole RPC much more usable for sanity
> checking.
> * Assisting in implementation of fast sync methods with known good
> blocks/UTXO sets.
> * Database consistency checking: by remembering the UTXO set hash of
> the past few blocks (computed on the fly), a consistency check can be
> done that recomputes it based on the database.
>=20
>=20
>  [1] https://bitcointalk.org/index.php?topic=3D88208.0
>  [2] =
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012715.ht=
ml
>  [3] =
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-February/0135=
91.html
>  [4] =
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-March/013928.=
html
>  [5] https://cseweb.ucsd.edu/~mihir/papers/inchash.pdf
>  [6] https://people.eecs.berkeley.edu/~daw/papers/genbday.html
>  [7] https://www.keylength.com/
>  [8] https://arxiv.org/pdf/1601.06502.pdf
>  [9] https://www.di.ens.fr/~fouque/pub/latincrypt12.pdf
>  [10] =
http://csrc.nist.gov/groups/ST/hash/sha-3/Aug2014/documents/gligoroski_pap=
er_sha3_2014_workshop.pdf
>=20
> Cheers,
>=20
> --=20
> Pieter
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev