Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id B683F259 for ; Sat, 25 Feb 2017 04:12:08 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from outmail148100.authsmtp.co.uk (outmail148100.authsmtp.co.uk [62.13.148.100]) by smtp1.linuxfoundation.org (Postfix) with ESMTP id 8C0321A0 for ; Sat, 25 Feb 2017 04:12:07 +0000 (UTC) Received: from mail-c247.authsmtp.com (mail-c247.authsmtp.com [62.13.128.247]) by punt24.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v1P4C5iY036068; Sat, 25 Feb 2017 04:12:05 GMT Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com [52.5.185.120]) (authenticated bits=0) by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v1P4C3Ko008545 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 25 Feb 2017 04:12:04 GMT Received: from [127.0.0.1] (localhost [127.0.0.1]) by petertodd.org (Postfix) with ESMTPSA id 0DE4E40576; Sat, 25 Feb 2017 04:12:03 +0000 (UTC) Received: by localhost (Postfix, from userid 1000) id 27D40204AB; Fri, 24 Feb 2017 23:12:02 -0500 (EST) Date: Fri, 24 Feb 2017 23:12:02 -0500 From: Peter Todd To: Bram Cohen Message-ID: <20170225041202.GA11152@savin.petertodd.org> References: <20170223235105.GA28497@savin.petertodd.org> <20170224010943.GA29218@savin.petertodd.org> <20170224025811.GA31911@savin.petertodd.org> <20170224031531.GA32118@savin.petertodd.org> <20170224043613.GA32502@savin.petertodd.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="6c2NcOVqGQ03X4Wi" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Server-Quench: 90cdee85-fb10-11e6-bcdf-0015176ca198 X-AuthReport-Spam: If SPAM / abuse - report it at: http://www.authsmtp.com/abuse X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR aQdMdAcUHlAWAgsB AmEbW1BeUV97WWc7 bghPaBtcak9QXgdq T0pMXVMcUgQIBW8C fUEeUhF3cwAIen93 ZU4sV3AICBEvfUNg FBxVFXAHZDJmdTJM BBZFdwNVdQJNeEwU a1l3GhFYa3VsNCMk FAgyOXU9MCtqYA0d aAwRMV8ICWMuJHYQ Sh4DGzQzHEoDLwAA X-Authentic-SMTP: 61633532353630.1038:706 X-AuthFastPath: 0 (Was 255) X-AuthSMTP-Origin: 52.5.185.120/25 X-AuthVirus-Status: No virus detected - but ensure you scan with your own anti-virus system. X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Bitcoin Protocol Discussion Subject: Re: [bitcoin-dev] A Better MMR Definition X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Feb 2017 04:12:08 -0000 --6c2NcOVqGQ03X4Wi Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 24, 2017 at 02:20:19PM -0800, Bram Cohen wrote: > So your idea is to cluster entries by entry time because newer things are > more likely to leave and updating multiple things near each other is > cheaper? Yes, exactly. > That can be done with my tool. Instead of using hashes for the values bei= ng > stored, you use position entries. The first entry gets a value of all > zeros, the next one a one followed by all zeros, then the next two > correspond to the first two with the second bit flipped to one, then the > next four the first four with the third bit flipped to one, etc. It > probably performs a little bit better to do it two bits at a time instead > of one so that the entries are 00, 01, 10, 11, 0001, 0010, 0011, 0101, > 0110, 0111, 1001, etc. If you were to really use this you'd probably want > to to add some optimizations to use the fact that the terminals fit in 64 > bits instead of 256, but it mostly works unchanged, and gets whatever So to be clear, what you're proposing there is to use the insertion order as the index - once you go that far you've almost entirely re-invented my proposal! In fact, when I was working my proofchains/proofmarshal libraries I put some thought into whether or not I could leave out the MMR merkelized list implementation and use only the key-value map I also wrote. I decided to include both as they aren't quite the same datastructure - using a list for= a list has advantages. > benefits there are to this clustering plus the high performance > implementation tricks I've built which I keep complaining that nobody's > giving feedback on. Your merkle-set implementation is 1500 lines of densely written Python with almost no comments, and less than a 100 lines of (also uncommented) tests. = By comparison, my Python MMR implementation is 300 lines of very readable Pyth= on with lots of comments, a 200 line explanation at the top, and 200 lines of (commented) tests. Yet no-one is taking the (still considerable) effort to understand and comment on my implementation. :) Fact is, what you've written is really daunting to review, and given it's n= ot in the final language anyway, it's unclear what basis to review it on anywa= y. I suspect you'd get more feedback if the codebase was better commented, in a production language, and you have actual real-world benchmarks and performa= nce figures. In particular, while at the top of merkle_set.py you have a list of advanta= ges, and a bunch of TODO's, you don't explain *why* the code has any of these advantages. To figure that out, I'd have to read and understand 1500 lines = of densely written Python. Without a human-readable pitch, not many people are going to do that, myself included. > I'm not sold on this being a win: The empirical access patterns are > unknown, Lost coins alone guarantees that access patterns will be biased towards new coins being more likely to be spent. That basis alone is sufficient to just= ify an insertion-ordered data structure. Additionally, people have done graphs = of the average age of UTXO's when spent, and that data clearly shows that newer coins are more likely to be spent than older coins. > unknown, it requires an extra cache miss per lookup to find the entry > number, Like I mentioned in the email you're replying to, that extra lookup can be easily avoided with a change to how transactions/blocks are serialized; if = all your peers support TXO commitments you can even discard the lookup database entirely, as it's only a backwards compatibility measure. > it may be that everything is optimized well enough without it for > there to be no meaningful gains, and it's a bunch of extra complexity. Wh= at Optimization is itself extra complexity. If you're data structure has worse inherent performance, and you have to make up the different with a highly optimized implementation, that's likely to lead to more overall complexity = than using a data structure with better inherent performance. Your current merkle-set implementation definitely _is_ very complex. An apples-to-apples comparison is with my merkelized key:value tree(1), also a patricia tree, which like the MMR is only about 300 lines of well-commented= and straight-forward code. 1) https://github.com/proofchains/python-proofmarshal/blob/master/proofmars= hal/merbinnertree.py > should be done is that a plain vanilla UTXO set solution is optimized as > well as it can be first, and then the insertion ordering trick is tried as > an optimization to see if it's an improvement. Without that baseline > there's no meaningful basis for comparison, and I'm quite confident that a > naive implementation which just allocates individual nodes will > underperform the thing I've come up with, even without adding optimizatio= ns > related to fitting in 64 bits. To be clear, "insertion ordering" isn't a simple trick, it's a fundamental change to what the data structure is. Once you do that, you're talking abou= t my proposal. --=20 https://petertodd.org 'peter'[:-1]@petertodd.org --6c2NcOVqGQ03X4Wi Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- iQEcBAEBCAAGBQJYsQQPAAoJECSBQD2l8JH7XH4H/02u4PvOnJldydIt8yOi6ksO lfuoqCbDj6z6wN7cyz0jb24TNfzzoMPNdd0OVCyyB9FhSH4MSokIxMDJJJR0NANi 04/Fm74+J0C+rLh3IysTZpUVf53F6HfDCtiK9XdNeOkncyRhEvYc99ULlgpnWQ2o JpFeTDVilDalCPIbZ+SvkYK1jK/MaB5tANkn8nVc7PgNKCp3lRyifhm8LIaiM4SG ZuV96pI/Pg3HKW1NA2fkqjAXnkgpQxjZIz/SbL/fe66pLYoWmFKL+FKWPgFPhFUn kDKAjG0AXaIBWPvYQLgaXDW7g3dTAWnVJTmrWgp0U6/xxwk+c+MZ88EY/d7/fW4= =p9PW -----END PGP SIGNATURE----- --6c2NcOVqGQ03X4Wi--