Date: Tue, 10 Jun 2014 13:08:46 -0400
From: Peter Todd
To: Mike Hearn, Jeff Garzik
Message-ID: <20140610170846.GB21293@savin>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="CUfgB8w4ZwR/yMy5"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Bitcoin Dev
Subject: Re: [Bitcoin-development] Bloom bait

--CUfgB8w4ZwR/yMy5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jun 10, 2014 at 06:38:23PM +0800, Mike Hearn wrote:
> >
> > As I explained in the email you're replying to and didn't quote, bloom
> > filters has O(n) cost per query, so sending different bloom filters to
> > different peers for privacy reasons costs the network significant disk
> > IO resources. If I were to actually implement it it'd look like a DoS
> > attack on the network.
> >
>
> DoS attack? Nice try.

Suppose I wrote a single-address lookup tool for Android that used bloom
filters to find the history of a specific address. Of course, being on
mobile I don't want to use too much bandwidth, so I'd use as specific a
bloom filter as possible. I might even connect to multiple peers to
speed up the lookup.

Is this any different from my bloom filter IO attack code? Nope. Hence,
splitting up bloom filter requests across peers for better privacy will
look like a DoS attack and will greatly increase the load on the
network.

> Now consider a prefix filtering implementation. You need to calculate a
> sorted list of all the data elements and tx hashes in the block, that maps
> to the location in the block where the tx data can be found. These
> per-block indexes take up extra disk space and, realistically, would likely
> be implemented using LevelDB as that's a tool which is designed for
> creating and using these kinds of tables, so then you're both loading the
> block data itself (blocks are sized about right currently to always fit in
> the default kernel readahead window) AND also seeking through the indexes,
> and building them too. A smart implementation might try and pack the index
> next to each block so it's possible to load both at once with a single
> seek, but that would probably be more work, as it'd force building of the
> index to be synchronous with saving the block to disk thus slowing down
> block relay. In contrast a LevelDB based index would do the bulk of the
> index-building work on a separate core.

Those are exactly the kinds of optimizations obelisk is implementing to
make its prefix lookup database fast. They are also situation-dependent:
for instance, "packing the index next to each block" is irrelevant if
you put archival blockchain data on a slow HD and indexes on a fast SSD,
something some obelisk servers do.

More to the point, you're showing quite clearly that there isn't just
one optimal way to do it. Applying a bloom filter, or a prefix filter,
or some as-yet-unknown filter to blockchain data is a service, and that
service has different tradeoffs compared to just serving up archival
block history. There is zero reason not to make that service something
you advertise with NODE_BLOOM - after all, you already have the code in
bitcoinj to do the exact same thing by checking the advertised protocol
version.

On Tue, Jun 10, 2014 at 09:02:00AM -0400, Jeff Garzik wrote:
> Most of this description of disk activity is true, but it omits one
> key point: Total cached data (working set). It is a binary, first
> order question: are you hitting pagecache, or the disk? When nodes
> act as archival data sources, the pagecache pressure is immense. When
> nodes just primarily serve recent blocks, that data is being served
> out of pagecache. As I directly observed running public nodes, the
> disks were running constantly, impacting all clients, even clients
> downloading only recent blocks.
>
> Luckily, headers are served out of RAM, so that part of the sync is always fast.
>
> NODE_BLOOM -- and block download in general -- will tend to be slower
> than it could be, due to the working set almost always being larger
> than available pagecache. Fix that problem, NODE_BLOOM will always
> operate out of pagecache, and disk activity will not be an issue.
>
> Once you start hitting the disk, you've already lost.

Yup. I discussed this with Matt Corallo at the financial crypto
conference a few months back and he made the same point. Unfortunately
we'll need a protocol upgrade that lets nodes advertise the ranges of
blocks they serve before we can begin to fix that issue, and even then
it shows quite clearly that forcing everyone to share blockchain data in
the same way isn't optimal.
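
To make the service-bit argument concrete, here is a minimal sketch of
the two ways a client could decide whether a peer will filter for it: by
advertised protocol version, or by an advertised NODE_BLOOM service bit.
Nothing in this thread fixes the constants, so the bit position (1 << 2)
and the 70001 version threshold are assumptions for illustration, and
the function names are hypothetical rather than bitcoinj's API.

```python
# Hypothetical sketch: all names and constants below are illustrative
# assumptions, not values taken from this thread or from bitcoinj.
NODE_NETWORK = 1 << 0                # peer serves full (archival) blocks
NODE_BLOOM = 1 << 2                  # assumed bit for the filtering service
BLOOM_MIN_PROTOCOL_VERSION = 70001   # assumed version that added bloom filtering


def supports_bloom_by_version(protocol_version: int) -> bool:
    """Version-based check: any peer at or above the version is assumed to filter."""
    return protocol_version >= BLOOM_MIN_PROTOCOL_VERSION


def supports_bloom_by_service_bit(services: int) -> bool:
    """Service-bit check: only peers that advertise the filtering service qualify."""
    return bool(services & NODE_BLOOM)


def pick_filter_peers(peers):
    """Return the addresses of peers that explicitly advertise filtering.

    `peers` is an iterable of (address, protocol_version, services) tuples.
    """
    return [addr for addr, _version, services in peers
            if supports_bloom_by_service_bit(services)]


if __name__ == "__main__":
    peers = [
        ("archival-only", 70002, NODE_NETWORK),
        ("archival+filter", 70002, NODE_NETWORK | NODE_BLOOM),
    ]
    # Both peers pass the version check; only the second advertises filtering.
    print(pick_filter_peers(peers))
```

With the version check alone, both example peers look like filtering
servers; with the service bit, a node that only wants to serve archival
blocks can say so, and clients simply skip it.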

-- 
'peter'[:-1]@petertodd.org
000000000000000023c7fc084ed84b891cc2fa90e4a34708d6b2370d3ec1c85d

--CUfgB8w4ZwR/yMy5
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)

iQGrBAEBCACVBQJTlzuYXhSAAAAAABUAQGJsb2NraGFzaEBiaXRjb2luLm9yZzAw
MDAwMDAwMDAwMDAwMDAzZjBiYjI0M2IwMjZmYTkwNDk0NmYyNjJlZWJmMDY4NDYx
NGViYjI2MjJlNGU1NTQvFIAAAAAAFQARcGthLWFkZHJlc3NAZ251cGcub3JncGV0
ZUBwZXRlcnRvZC5vcmcACgkQJIFAPaXwkftr7AgAr6LTDzsBVv4pjvVlc4GpgL4b
I44h1VuXiujXRRHH0mPoLQoTm7/XHcYeL4LfZi2H98jEOracYtc1ONBiMIkZauFO
ZsAVroEyjr1+KrNUfvzZ8NYX8onIMIUkWQtTnV1227HMgxPkoYWpI3VDI/FxFcVu
in7EBCjMW08L36lnzl24u6mFgCyQXaV5PBKzjtzkQRo4eOR2H6ImtuCpxCnWKzwS
7dV8E9zonS6kwyu9Uq+35eEtlPXNte9q8lsHQYGOly30PK45x3mebx8dSqa2+cj9
s9aacOtyIqjYt9+sM2Upjif6oyBo+xVkPVpTG/Lm1UZTS3odjTmYUFiW3bYE3g==
=0B4b
-----END PGP SIGNATURE-----

--CUfgB8w4ZwR/yMy5--