Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id E01AB4A6 for ; Mon, 15 May 2017 20:58:55 +0000 (UTC) X-Greylist: delayed 00:05:04 by SQLgrey-1.7.6 Received: from mout.gmx.net (mout.gmx.net [212.227.17.21]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 3BDA620E for ; Mon, 15 May 2017 20:58:53 +0000 (UTC) Received: from [192.168.1.117] ([184.68.138.98]) by mail.gmx.com (mrgmx103 [212.227.17.174]) with ESMTPSA (Nemesis) id 0Mb8HX-1dPZWy12cN-00KgB4; Mon, 15 May 2017 22:53:47 +0200 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) From: Peter R In-Reply-To: Date: Mon, 15 May 2017 13:53:45 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <553BA161-F60F-4278-914F-0E5F2D2E0A84@gmx.com> References: To: Pieter Wuille X-Mailer: Apple Mail (2.2104) X-Provags-ID: V03:K0:lT4olf41UMC80XA8oZmr++Q8jYv1K0opztH5tqs2Ir1xQ+Gwh3M uykILLAUs9QWHJerkM35FOHLbMgykVBf/lcU/oVZQM6BKgDpitt55IwNO4HBf9m/Qjhszbs 90MkI+f95AMsJcpyOzhKysQNymtTdv083FYB93+q7vWUf/onVbE+tXIabLaiBNU53bdhlkp HY1dfOWg6qcIFMxVrut+g== X-UI-Out-Filterresults: notjunk:1;V01:K0:x1TOwWITgRM=:mVvCC2bpqqibunXtfxZAqC A+fi1zZ8Xqbbizn7Em9VjNA2dWIsPSTmJ4aVd714Hswrdo2EQ6ntUCyBv8xJE+VIITWEeFcNf +0GnS6gDZfyJAxHlEuuVqrCmSnFSBj9Yhmhj2PgANBWSCfI2syFBM3skUvVI34fOET0kg17CZ 7Vf0PlkUx2+Rv317rDb5vp7GO5Qymio2WE9uvdVmLAxoNzjpiT9MU+QsLAngca158G97sfKRi jAoOGUQL6u7jZ03DqgOI3RLUEX/IK3qwD6ZJZcM0VDK2gYL4zzYSnG8FyvzKPtFXKBZAx3Isn uy2t5ggzOXa/7hU9QOWyG2aHQkACXVXWfPn5pnIK/fxsnN78BefUdPLyIhzEASF58MLkHfl3Z iU8nIkYtwORe6MvwurojvtjeoagJo3dYA62+ziTV9XCDIPchmBe6c+HeW8lgX5P+Ur8eeG9p0 vxWkiiKSH7+fA5QXwcLJgQS6b+m70X3urK9rlW0fUe2See5VI/l9sVP+wo8yNgXfwPx01rGp2 6Q7idSuzLWJ/fdoJ3mbu1tt8HeHjNclEVfEONXX1agYHWqKe4jSe3qYH4MLwUMmDe+Ebp+q4e gT7/HVoK5A1SM7tfdh3VEEWWzXCZ9LYCEgBiToSPK42iGExNT72IsaJQh/Yka0Jm/Gtgtyqkx +Xh7IRi//XWYK7BJ2oIPR0frpl4XqpkWIIx+2hC9vR9bvpsfI7hK2zt9yC+uD2+SdtJt7dWLc kk2tSkbrk9zyIXykiiGEWvpmhOqRPiRJ0tQgFQd6s6JYXql5PaxC5LZ6uhuhCYy8PjvCCeJ9i alBrmxn X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,FREEMAIL_FROM, RCVD_IN_DNSWL_LOW,RCVD_IN_SORBS_SPAM autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Mon, 15 May 2017 22:03:38 +0000 Cc: Bitcoin Dev Subject: Re: [bitcoin-dev] Rolling UTXO set hashes X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 May 2017 20:58:56 -0000 Hi Pieter, I wanted to say that I thought this write-up was excellent! And = efficiently hashing the UTXO set in this rolling fashion is a very = exciting idea!!=20 Peter R > On May 15, 2017, at 1:01 PM, Pieter Wuille via bitcoin-dev = wrote: >=20 > Hello all, >=20 > I would like to discuss a way of computing a UTXO set hash that is > very efficient to update, but does not support any compact proofs of > existence or non-existence. >=20 > Much has been written on the topic of various data structures and > derived hashes for the UTXO/TXO set before (including Alan Reiner's > trust-free lite nodes [1], Peter Todd's TXO MMR commitments [2] [3], > or Bram Cohen's TXO bitfield [4]). They all provide interesting extra > functionality or tradeoffs, but require invasive changes to the P2P > protocol or how wallets work, or force nodes to maintain their > database in a normative fashion. Instead, here I focus on an efficient > hash that supports nothing but comparing two UTXO sets. However, it is > not incompatible with any of those other approaches, so we can gain > some of the advantages of a UTXO hash without adopting something that > may be incompatible with future protocol enhancements. >=20 > 1. Incremental hashing >=20 > Computing a hash of the UTXO set is easy when it does not need > efficient updates, and when we can assume a fixed serialization with a > normative ordering for the data in it - just serialize the whole thing > and hash it. As different software or releases may use different > database models for the UTXO set, a solution that is order-independent > would seem preferable. >=20 > This brings us to the problem of computing a hash of unordered data. > Several approaches that accomplish this through incremental hashing > were suggested in [5], including XHASH, AdHash, and MuHash. XHASH > consists of first hashing all the set elements independently, and > XORing all those hashes together. This is insecure, as Gaussian > elimination can easily find a subset of random hashes that XOR to a > given value. AdHash/MuHash are similar, except addition/multiplication > modulo a large prime are used instead of XOR. Wagner [6] showed that > attacking XHASH or AdHash is an instance of a generalized birthday > problem (called the k-sum problem in his paper, with unrestricted k), > and gives a O(2^(2*sqrt(n)-1)) algorithm to attack it (for n-bit > hashes). As a result, AdHash with 256-bit hashes only has 31 bits of > security. >=20 > Thankfully, [6] also shows that the k-sum problem cannot be > efficiently solved in groups in which the discrete logarithm problem > is hard, as an efficient k-sum solver can be used to compute discrete > logarithms. As a result, MuHash modulo a sufficiently large safe prime > is provably secure under the DL assumption. Common guidelines on > security parameters [7] say that 3072-bit DL has about 128 bits of > security. A final 256-bit hash can be applied to the 3072-bit result > without loss of security to reduce the final size. >=20 > An alternative to multiplication modulo a prime is using an elliptic > curve group. Due to the ECDLP assumption, which the security of > Bitcoin signatures already relies on, this also results in security > against k-sum solving. This approach is used in the Elliptic Curve > Multiset Hash (ECMH) in [8]. For this to work, we must "hash onto a > curve point" in a way that results in points without known discrete > logarithm. The paper suggests using (controversial) binary elliptic > curves to make that operation efficient. If we only consider > secp256k1, one approach is just reading potential X coordinates from a > PRNG until one is found that has a corresponding Y coordinate > according to the curve equation. On average, 2 iterations are needed. > A constant time algorithm to hash onto the curve exists as well [9], > but it is only slightly faster and is much more complicated to > implement. >=20 > AdHash-like constructions with a sufficiently large intermediate hash > can be made secure against Wagner's algorithm, as suggested in [10]. > 4160-bit hashes would be needed for 128 bits of security. When > repetition is allowed, [8] gives a stronger attack against AdHash, > suggesting that as much as 400000 bits are needed. While repetition is > not directly an issue for our use case, it would be nice if > verification software would not be required to check for duplicated > entries. >=20 > 2. Efficient addition and deletion >=20 > Interestingly, both ECMH and MuHash not only support adding set > elements in any order but also deleting in any order. As a result, we > can simply maintain a running sum for the UTXO set as a whole, and > add/subtract when creating/spending an output in it. In the case of > MuHash it is slightly more complicated, as computing an inverse is > relatively expensive. This can be solved by representing the running > value as a fraction, and multiplying created elements into the > numerator and spent elements into the denominator. Only when the final > hash is desired, a single modular inverse and multiplication is needed > to combine the two. >=20 > As the update operations are also associative, H(a)+H(b)+H(c)+H(d) can > in fact be computed as (H(a)+H(b)) + (H(c)+H(d)). This implies that > all of this is perfectly parallellizable: each thread can process an > arbitrary subset of the update operations, allowing them to be > efficiently combined later. >=20 > 3. Comparison of approaches >=20 > Numbers below are based on preliminary benchmarks on a single thread > of a i7-6820HQ CPU running at 3.4GHz. >=20 > (1) (MuHash) Multiplying 3072-bit hashes mod 2^3072 - 1103717 (the > largest 3072-bit safe prime). > * Needs a fast modular multiplication/inverse implementation. > * Using SHA512 + ChaCha20 for generating the hashes takes 1.2us per = element. > * Modular multiplication using GMP takes 1.5us per element (2.5us > with a 60-line C+asm implementation). > * 768 bytes for maintaining a running sum (384 for numerator, 384 > for denominator) > * Very common security assumption. Even if the DL assumption would > be broken (but no k-sum algorithm faster than Wagner's is found), this > still maintains 110 bits of security. >=20 > (2) (ECMH) Adding secp256k1 EC points > * Much more complicated than the previous approaches when > implementing from scratch, but almost no extra complexity when ECDSA > secp256k1 signature validation is already implemented. > * Using SHA512 + libsecp256k1's point decompression for generating > the points takes 11us per element on average. > * Addition/subtracting of N points takes 5.25us + 0.25us*N. > * 64 bytes for a running sum. > * Identical security assumption as Bitcoin's signatures. >=20 > Using the numbers above, we find that: > * Computing the hash from just the UTXO set takes (1) 2m15s (2) 9m20s > * Processing all creations and spends in an average block takes (1) > 24ms (2) 100ms > * Processing precomputed per-transaction aggregates in an average > block takes (1) 3ms (2) 0.5ms >=20 > Note that while (2) has higher CPU usage than (1) in general, it has > lower latency when using precomputed per-transaction aggregates. Using > such aggregates is also more feasible as they're only 64 bytes rather > than 768. Because of simplicity, (1) has my preference. >=20 > Overall, these numbers are sufficiently low (note that they can be > parallellized) that it would be reasonable for full nodes and/or other > software to always maintain one of them, and effectively have a > rolling cryptographical checksum of the UTXO set at all times. >=20 > 4. Use cases >=20 > * Replacement for Bitcoin Core's gettxoutsetinfo RPC's hash > computation. This currently requires minutes of I/O and CPU, as it > serializes and hashes the entire UTXO set. A rolling set hash would > make this instant, making the whole RPC much more usable for sanity > checking. > * Assisting in implementation of fast sync methods with known good > blocks/UTXO sets. > * Database consistency checking: by remembering the UTXO set hash of > the past few blocks (computed on the fly), a consistency check can be > done that recomputes it based on the database. >=20 >=20 > [1] https://bitcointalk.org/index.php?topic=3D88208.0 > [2] = https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012715.ht= ml > [3] = https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-February/0135= 91.html > [4] = https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-March/013928.= html > [5] https://cseweb.ucsd.edu/~mihir/papers/inchash.pdf > [6] https://people.eecs.berkeley.edu/~daw/papers/genbday.html > [7] https://www.keylength.com/ > [8] https://arxiv.org/pdf/1601.06502.pdf > [9] https://www.di.ens.fr/~fouque/pub/latincrypt12.pdf > [10] = http://csrc.nist.gov/groups/ST/hash/sha-3/Aug2014/documents/gligoroski_pap= er_sha3_2014_workshop.pdf >=20 > Cheers, >=20 > --=20 > Pieter > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev