Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id EA60F949 for ; Fri, 2 Jun 2017 06:00:58 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from mo.garage.hdemail.jp (mo.garage.hdemail.jp [46.51.242.127]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id E61BE193 for ; Fri, 2 Jun 2017 06:00:57 +0000 (UTC) Received: from ip-10-217-1-36.ap-northeast-1.compute.internal (localhost.localdomain [127.0.0.1]) by mo.garage.hdemail.jp (hde-mf-postfix) with SMTP id 49F9114C0CA for ; Fri, 2 Jun 2017 15:00:56 +0900 (JST) (envelope-from karljohan-alm@garage.co.jp) X-Received: from unknown (HELO mo.garage.hdemail.jp) (127.0.0.1) by 0 with SMTP; 2 Jun 2017 15:00:56 +0900 X-Received: from mo.garage.hdemail.jp (localhost.localdomain [127.0.0.1]) by mo.garage.hdemail.jp (hde-ma-postfix) with ESMTP id 3921D4C085 for ; Fri, 2 Jun 2017 15:00:56 +0900 (JST) (envelope-from karljohan-alm@garage.co.jp) Received: from gw21.oz.hdemail.jp (ip-10-172-186-126.ap-northeast-1.compute.internal [10.172.186.126]) by mo.garage.hdemail.jp (hde-mf-postfix) with ESMTP id 300D714C0CA for ; Fri, 2 Jun 2017 15:00:56 +0900 (JST) (envelope-from karljohan-alm@garage.co.jp) X-Durian-MailFrom: karljohan-alm@garage.co.jp X-Durian-RcptTo: bitcoin-dev@lists.linuxfoundation.org Received: from gw21.oz.hdemail.jp (gw21.oz.hdemail.jp [127.0.0.1]) by gw21.oz.hdemail.jp (gw21.oz.hdemail.jp [127.0.0.1]); Fri, 2 Jun 2017 15:00:53 +0900 X-Received: from mail-qt0-f200.google.com (lb1.oz.lo.hdemail.jp [54.248.222.53]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by gw21.oz.hdemail.jp (Postfix) with ESMTP id 33107148C13C for ; Fri, 2 Jun 2017 15:00:53 +0900 (JST) X-Received: by mail-qt0-f200.google.com with SMTP id g53so16743079qta.3 for ; Thu, 01 Jun 2017 23:00:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=FXjjkfpcgOyyAnEXTphUXQpUC2BSgVhGjOp00nM8QrA=; b=Fs0CzULcpLHsw6A30waGAMC4Qx7xjbA5lNu5oBJdN7egztQR4kekgdBch//Dvfd2/6 fDKcTlGaW75j2FvmeXrKUZ2rMohN8568o0OYxkd9eJODcz2ViiBYlDviAzam2OUVMfEC giqcj8t97YQJLmMUQyF4nnZ3ZeQbaBuz2EQKeqt3li1oWU2TKD6n1UmmvvdUhhwkMA30 IKZXtOudM5bPzJfFozwHFJDIeffZH3RYrhrKgdTXJXoJzJGFVrL4Hy4JwmCL9rn6HPHs JVwCU10yP2cSqr1UexncrII8nngKvUno5+I6GIZw1r2c+CNOzLA9sbAPgUKZL5uyH5ue SnMw== X-Gm-Message-State: AODbwcCp2jPAe+HzWoGGzpxxd6OWLD4fnyVLsV7+k4Eoif2wACZR7dlN A+5s1QQDW8ojykDZqBb03L9XSGpKGhShRTKiYBdyFs3nV+0VokeYeNqrZiu3vCZrfxc4RZ59LvT AXQ8YTAwDG/fkHlobeFEcRFPYG1kHVaWJ2LL05yBK95iJEQ== X-Received: by 10.237.35.211 with SMTP id k19mr6720483qtc.125.1496383251118; Thu, 01 Jun 2017 23:00:51 -0700 (PDT) X-Received: by 10.237.35.211 with SMTP id k19mr6720453qtc.125.1496383250870; Thu, 01 Jun 2017 23:00:50 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.12.146.10 with HTTP; Thu, 1 Jun 2017 23:00:30 -0700 (PDT) In-Reply-To: References: From: Karl Johan Alm Date: Fri, 2 Jun 2017 15:00:30 +0900 Message-ID: To: Olaoluwa Osuntokun Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Fri, 02 Jun 2017 12:09:27 +0000 Cc: Arnoud Kouwenhoven - Pukaki Corp via bitcoin-dev Subject: Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jun 2017 06:00:59 -0000 Hello, Really wish I'd known you were working on this a few weeks ago, but such is life. Hopefully I can provide some useful feedback. On Fri, Jun 2, 2017 at 4:01 AM, Olaoluwa Osuntokun via bitcoin-dev wrote: > Full-nodes > maintain an additional index of the chain, and serve this compact filter > (the index) to light clients which request them. Light clients then fetch > these filters, query the locally and _maybe_ fetch the block if a relevant > item matches. Is it necessary to maintain the index all the way to the beginning of the chain? When would clients request "really old digests" and why? > One specific area we'd like feedback on is the parameter selection. Unlike > BIP-37 which allows clients to dynamically tune their false positive rate, > our proposal uses a _fixed_ false-positive. Within the document, it's > currently specified as P = 1/2^20. We've done a bit of analysis and > optimization attempting to optimize the following sum: > filter_download_bandwidth + expected_block_false_positive_bandwidth. Alex > has made a JS calculator that allows y'all to explore the affect of > tweaking the false positive rate in addition to the following variables: > the number of items the wallet is scanning for, the size of the blocks, > number of blocks fetched, and the size of the filters themselves. The > calculator calculates the expected bandwidth utilization using the CDF of > the Geometric Distribution. The calculator can be found here: > https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical > script he's been running on actual data, and the results seem to match up > rather nicely. I haven't tried the tool yet, and maybe it will answer some of my questions. On what data were the simulated wallets on actual data based? How did false positive rates for wallets with lots of items (pubkeys etc) play out? Is there a maximum number of items for a wallet before it becomes too bandwidth costly to use digests? > We we're excited to see that Karl Johan Alm (kallewoof) has done some > (rather extensive!) analysis of his own, focusing on a distinct encoding > type [5]. I haven't had the time yet to dig into his report yet, but I > think I've read enough to extract the key difference in our encodings: his > filters use a binomial encoding _directly_ on the filter contents, will we > instead create a Golomb-Coded set with the contents being _hashes_ (we use > siphash) of the filter items. I will definitely try to reproduce my experiments with Golomb-Coded sets and see what I come up with. It seems like you've got a little less than half the size of my digests for 1-block digests but I haven't tried making digests for all blocks (and lots of early blocks are empty). On the BIP proposal itself: In Compact Filter Header Chain, you mention that clients should download filters from nodes if filter_headers is not identical, and ban offending nodes. What about temporary forks in the chain? What about longer forks? In general, I am curious how you will deal with reorgs and temporary non-consensus related chain splits. I am also curious if you have considered digests containing multiple blocks. Retaining a permanent binsearchable record of the entire chain is obviously too space costly, but keeping the last X blocks as binsearchable could speed up syncing for clients tremendously, I feel. It may also be space efficient to ONLY store older digests in chunks of e.g. 8 blocks. A client syncing up finding a match in an 8-block chunk would have to grab those 8 blocks, but if it's not recent, that may be acceptable. It may even be possible to make 4-, 2-, 1-block digests on demand. How fast are these to create? Would it make sense to provide digests on demand in some cases, rather than keeping them around indefinitely?