Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id BF5434A3 for ; Thu, 1 Jun 2017 02:12:18 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from mo.garage.hdemail.jp (mo.garage.hdemail.jp [46.51.242.127]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id DD3B915F for ; Thu, 1 Jun 2017 02:12:17 +0000 (UTC) Received: from ip-10-217-1-36.ap-northeast-1.compute.internal (localhost.localdomain [127.0.0.1]) by mo.garage.hdemail.jp (hde-mf-postfix) with SMTP id 62F6D14C0C5 for ; Thu, 1 Jun 2017 11:12:16 +0900 (JST) (envelope-from karljohan-alm@garage.co.jp) X-Received: from unknown (HELO mo.garage.hdemail.jp) (127.0.0.1) by 0 with SMTP; 1 Jun 2017 11:12:16 +0900 X-Received: from mo.garage.hdemail.jp (localhost.localdomain [127.0.0.1]) by mo.garage.hdemail.jp (hde-ma-postfix) with ESMTP id 57E0A4C085 for ; Thu, 1 Jun 2017 11:12:16 +0900 (JST) (envelope-from karljohan-alm@garage.co.jp) Received: from gw19.oz.hdemail.jp (ip-10-125-134-49.ap-northeast-1.compute.internal [10.125.134.49]) by mo.garage.hdemail.jp (hde-mf-postfix) with ESMTP id 5493A14C0C5 for ; Thu, 1 Jun 2017 11:12:16 +0900 (JST) (envelope-from karljohan-alm@garage.co.jp) X-Durian-MailFrom: karljohan-alm@garage.co.jp X-Durian-RcptTo: bitcoin-dev@lists.linuxfoundation.org Received: from gw19.oz.hdemail.jp (localhost [127.0.0.1]) by gw19.oz.hdemail.jp (localhost [127.0.0.1]); Thu, 1 Jun 2017 11:12:11 +0900 X-Received: from mail-qk0-f197.google.com (lb1.oz.lo.hdemail.jp [54.248.222.53]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by gw19.oz.hdemail.jp (Postfix) with ESMTP id 185B8148C0D0 for ; Thu, 1 Jun 2017 11:12:11 +0900 (JST) X-Received: by mail-qk0-f197.google.com with SMTP id k9so10125908qkh.10 for ; Wed, 31 May 2017 19:12:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=3lcRQO3QQ01aVFC5zmcu47x+gjandIReTwlVljWVSdU=; b=XXhAEnRux3U6vJsNz+MkX9e1rG4T8W2EzRmJKzizvepIkehen3wgsMizaGgYzpvLpk ea1PdyiZhEikQkxI8whvJFUdvgpEdGp18eNVPJQeukFbo0J4sQqipbkoGqIVFiKcKOuQ ERjIlQnc3coOPlikju527BYryIt/8G02Q296DKouzNsG37Iy2vJq0Y9DoFC7iYtP3GJp NbzlioW39YYp2SB+ZnAgMF9gfurzfJg/bW7R2L4dMqFC4ia84vc86L29qlHSFFMprO1c Voqq1bT58JT+lIh5kQhgV3mcTNPNCYWL0VeiRGUGD4ZvqHN7gBSXnEYzulztTsfHSNbR Kixw== X-Gm-Message-State: AODbwcB/VlFzLv8zzIX/xFNhwSTSR2bDPcEctbl5NsuUJjXgjhfJZG21 Q3XBFGAT/B1ILFh2kitY24Kh/HvdCke4SBT3HTJrjf04BxrSjQ7ZkJums7EfnYovuqHMGxmY/f7 tWcNvehUdGp2uykb4c+b3bULgiwFvb+5Fgd3q/NVVCBYwkw== X-Received: by 10.237.53.3 with SMTP id a3mr32749483qte.207.1496283129052; Wed, 31 May 2017 19:12:09 -0700 (PDT) X-Received: by 10.237.53.3 with SMTP id a3mr32749471qte.207.1496283128867; Wed, 31 May 2017 19:12:08 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.12.137.38 with HTTP; Wed, 31 May 2017 19:11:48 -0700 (PDT) From: Karl Johan Alm Date: Thu, 1 Jun 2017 11:11:48 +0900 Message-ID: To: Bitcoin Protocol Discussion Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Thu, 01 Jun 2017 02:15:55 +0000 Subject: [bitcoin-dev] Block Filter Digest profiling X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 02:12:18 -0000 Hello, I have spent a fair bit of time trying to nail how exactly block filter digests[1] should be done to optimize bandwidth, space, resource usage. The report can be found here: http://bc-2.jp/bfd-profile.pdf This graph shows bandwidth use of 200 wallets simulated over 5000 blocks: http://bc-2.jp/bandwidth_bfd.png (black line is "sync once per block" wallet, yellow is "sync once per 144 blocks" wallet, red is average across all wallets). An interesting insight made during the experiments: when allowing digests to contain multiple blocks, the false positive rate of high block count digests can be higher than normal, because the probability of a false positive hit for a given entry in multiple digests, assuming their sizes differ, is almost completely independent. The results look rather promising to me, but I would like to hear comments, in particular on the approach taken, if I made any faulty assumptions, bad math mistakes, etc. I am also curious what people consider to be acceptable costs in terms of bandwidth use and memory (I couldn't find any stats on bandwidth use of bloom filters). In the profiling, I restricted the field sizes to 2^27 = 128 MB. I assumed this was appropriate as these fields are very short lived, and in worst case, a client *could* do the scan and decode simultaneously, without allocating up the space for the field at all. For high block count digests (e.g. 1024 blocks), this is sometimes overfilled. I wonder if 2^28 (256 MB) fields would be at all acceptable or if an over-filled (high false positive rate) field is better. For that matter, I am not entirely sure 1024-block digests are necessary, but they do come with an average 15 kb/block which is pretty good. I also wonder if the serialization approach taken is overkill or not. It does save some space instead of simply storing "BBBAAAAA" but adds some complexity that may not be warranted. [1] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012636.html