Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id A7E0B5B1 for ; Wed, 1 Mar 2017 22:31:09 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from outmail149055.authsmtp.co.uk (outmail149055.authsmtp.co.uk [62.13.149.55]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 6424A1B7 for ; Wed, 1 Mar 2017 22:31:08 +0000 (UTC) Received: from mail-c232.authsmtp.com (mail-c232.authsmtp.com [62.13.128.232]) by punt21.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v21MV5sU031486; Wed, 1 Mar 2017 22:31:05 GMT Received: from petertodd.org (ec2-52-5-185-120.compute-1.amazonaws.com [52.5.185.120]) (authenticated bits=0) by mail.authsmtp.com (8.14.2/8.14.2/) with ESMTP id v21MV2Rc041636 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 1 Mar 2017 22:31:03 GMT Received: from [127.0.0.1] (localhost [127.0.0.1]) by petertodd.org (Postfix) with ESMTPSA id D5923400AC; Wed, 1 Mar 2017 22:31:01 +0000 (UTC) Received: by localhost (Postfix, from userid 1000) id 18C1D2030A; Wed, 1 Mar 2017 17:31:01 -0500 (EST) Date: Wed, 1 Mar 2017 17:31:01 -0500 From: Peter Todd To: Bram Cohen Message-ID: <20170301223101.GA17022@savin.petertodd.org> References: <20170224010943.GA29218@savin.petertodd.org> <20170224025811.GA31911@savin.petertodd.org> <20170224031531.GA32118@savin.petertodd.org> <20170224043613.GA32502@savin.petertodd.org> <20170225041202.GA11152@savin.petertodd.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="pf9I7BMVVzbSWLtt" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Server-Quench: c121ac78-fece-11e6-829f-00151795d556 X-AuthReport-Spam: If SPAM / abuse - report it at: http://www.authsmtp.com/abuse X-AuthRoute: OCd2Yg0TA1ZNQRgX IjsJECJaVQIpKltL GxAVKBZePFsRUQkR aQdMdgMUFVQGAgsB AmEbWVZeU1x7WWA7 bghPaBtcak9QXgdq T0pMXVMcUgdpfHoD ZE0eVhh0dAMIeXZ2 ZUAsDXFZXRUpck5g FBsHQHAHZDJmdWgd WRZFdwNVdQJNdxoR b1V5GhFYa3VsNCMk FAgyOXU9MCtqYA0d aAwRMV8ICWMuJHYQ Sh4DGzQzHEoDLwAA X-Authentic-SMTP: 61633532353630.1037:706 X-AuthFastPath: 0 (Was 255) X-AuthSMTP-Origin: 52.5.185.120/25 X-AuthVirus-Status: No virus detected - but ensure you scan with your own anti-virus system. X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Bitcoin Protocol Discussion Subject: Re: [bitcoin-dev] A Better MMR Definition X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Mar 2017 22:31:09 -0000 --pf9I7BMVVzbSWLtt Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 24, 2017 at 10:23:20PM -0800, Bram Cohen wrote: > > Your merkle-set implementation is 1500 lines of densely written Python >=20 >=20 > The reference implementation which is included in those 1500 lines is less > than 300 lines and fairly straightforward. The non-reference implementati= on > always behaves semantically identically to the reference implementation, = it > just does so faster and using less memory. Great! But do you see my point here? Even though I spent some time reading through that code, I didn't realise you had a 300 line reference implementation embedded within those 1500 lines. This makes it less likely for you to get = any review on either. A better way to present your work would have been to at least explain that = at the top of the file, and perhaps even better, split the reference implementation and optimized implementation into two separate files. If you= did this, you'd be more likely to get others to review your work. > > with > > almost no comments, >=20 >=20 > The comments at the top explain both the proof format and the in-memory > data structures very precisely. The whole codebase was reviewed by a > coworker of mine and comments were added explaining the subtleties which > tripped him up. Yes, and it's good that you have those comments. But the codebase itself co= uld definitely use more, and adding those comments would help get more people reviewing your work. > > and less than a 100 lines of (also uncommented) tests. >=20 >=20 > Those tests get 98% code coverage and extensively hit not only the lines = of > code but the semantic edge cases as well. The lines which aren't hit are > convenience functions and error conditions of the parsing code for when > it's passed bad data Great! But you see how without comments, it'll take a tremendous amount of = work for an external reviewer like myself to determine what is being tested, and what edge cases you're targeting. In fact, I'd suggest that for things like edge cases, you test edge cases in separate unit tests that explain what edge cases you're trying to catch. > > By > > comparison, my Python MMR implementation is 300 lines of very readable > > Python > > with lots of comments, a 200 line explanation at the top, and 200 lines= of > > (commented) tests. Yet no-one is taking the (still considerable) effort= to > > understand and comment on my implementation. :) > > >=20 > Given that maaku's Merkle prefix trees were shelved because of performance > problems despite being written in C and operating in basically the same w= ay > as your code and my reference code, it's clear that non-optimized Python > won't be touching the bitcoin codebase any time soon. To be clear, I gave my implementation as an example of how hard it is to get external review, not to suggest it's going to be a part of Bitcoin; I've pointed a lot of people to it when they asked for a MMR implementation, and= I'm sure if some of those people had reviewed it carefully they would have suggested changes. Yet they haven't, because doing good review is a lot of work! > > In particular, while at the top of merkle_set.py you have a list of > > advantages, > > and a bunch of TODO's, you don't explain *why* the code has any of these > > advantages. To figure that out, I'd have to read and understand 1500 li= nes > > of > > densely written Python. Without a human-readable pitch, not many people= are > > going to do that, myself included. > > >=20 > It's all about cache coherence. When doing operations it pulls in a bunch > of things which are near each other in memory instead of jumping all over > the place. The improvements it gets should be much greater than the ones > gained from insertion ordering, although the two could be accretive. That's good, but that paragraph should be part of your MerkleSet git repo, preferably in the README, where reviewers will immediately find it and get excited about reviewing your code. --=20 https://petertodd.org 'peter'[:-1]@petertodd.org --pf9I7BMVVzbSWLtt Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- iQEcBAEBCAAGBQJYt0ugAAoJECSBQD2l8JH7GT0IAJGdWMegQ2GZnHl9Jw5OLOCf EwbKzt6+F40rajL0Rxcy6b5yhu9Vngi8eZno6Tr8FEDPFpIdE/6daf+3x+Gx5ifv 8R3dxiz1Y89IKczJHYEQgKbYviZ6KPI13REHhmSBtNw5dQee5ZX8/TOrfdagaCKd TjgkKwrem69D+HrLy5pEnlFZqncVVSzz72l0TY61EA5r2RKwFILUP0nwBfRV7jnt 8rr0mSDDd4fQNpekN1nvYDAzcOLgXc+RJIHKliHB5U33WOomhciptJSMSthd9oDb CMQEXj8xg6BQqDOaVBpnAQcdCdl7bvjtxAwc0kgDDSuBSegaznuxfUstjHAbVsY= =gLIR -----END PGP SIGNATURE----- --pf9I7BMVVzbSWLtt--