Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 2F787E47 for ; Fri, 25 May 2018 18:42:44 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-ua0-f178.google.com (mail-ua0-f178.google.com [209.85.217.178]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id CC2AD180 for ; Fri, 25 May 2018 18:42:42 +0000 (UTC) Received: by mail-ua0-f178.google.com with SMTP id f30-v6so2313501uab.11 for ; Fri, 25 May 2018 11:42:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to; bh=amZkCbxLeJf2BtaiwbgMCvZxYkbyvMFvXDWVNEe4pZs=; b=J1pRrVozbUYTd8n145E1dfBC41Ta3CW82rrHG0e8X3vz/7EBxuD3zqqBDD0AMj7DL8 9AA4yw6VOZS1t63uk4kt1dk5I7qGQIfDqc00DCJGt0RzIRbo0EGfFGd9V9sPgm6sGQLc yIlxO5coNurJZR5BK1dWl3+HydqrV+9aP3I3U1rUiQmVcuFJDzfyfM8KeF9qVARoYIQB n5tOVdltrTLYfG/H1TgsOU71qE4svlr2EN9pU4c1MVqXA7bBH7fob19aVIJn+j0hx+zp WUyejlmKZUGn9xPOzMC/djF8gJz0paep+wPSFikt4U3hwRCmMUjqltDvw+rLUszCDgUg j8MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to; bh=amZkCbxLeJf2BtaiwbgMCvZxYkbyvMFvXDWVNEe4pZs=; b=mm4E6K9V1aaaRImwgUaBeIwmdZRev0atHFoEsly+bH5aLWj64EunKvKozEQ3i9NZcV 8LsgoPcIz8g5wMUDIk/5BU2BjAUFNqDlzrr/v5UHYYwtPNoXTtDoQfDDIieKrenj4CRD fPbiN/ZKG8GQId9f5b0SoPVgh4E7CQAgCa2mFiMuHDoMf4sdxXDHZRhARd7iRptkuJ6u UMyUbU7wU0ZNUIeddtmDs1a7bHDkFe28Is0I4fyTKP4PMuNyGFnMX4v1wI7vFXzIQoRr xnyG/ybQPZqXzUeAw3dnF8YlNaEcijKv8YGkTJfOFyMSWLA3v8P7cngnEakuNDQZKttJ /gfw== X-Gm-Message-State: ALKqPwfzVMSNaXw8Ix6P0Oy+Z/p8rY8wA7mcy2FB9PslJ6GR7Ff2y4tR kuOUDerHpsEWS2EKSZgbrV6ug+6JKUo7Fa8z0Dc= X-Google-Smtp-Source: ADUXVKIPxarCXfPCPEkIo651XFI9wi9esoyXJ5E4hdCUgAOFx5a0E3c7GmcwFocqqQQTFRoM7v/zLhj0KmdckJ5NW0k= X-Received: by 2002:ab0:1092:: with SMTP id d18-v6mr2410356uab.87.1527273761839; Fri, 25 May 2018 11:42:41 -0700 (PDT) MIME-Version: 1.0 Sender: gmaxwell@gmail.com Received: by 2002:a67:5184:0:0:0:0:0 with HTTP; Fri, 25 May 2018 11:42:41 -0700 (PDT) In-Reply-To: References: From: Gregory Maxwell Date: Fri, 25 May 2018 18:42:41 +0000 X-Google-Sender-Auth: VQf-u-Iqg4gtmVNdunUeYVYGHVk Message-ID: To: Pieter Wuille , Bitcoin Protocol Discussion Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: Re: [bitcoin-dev] Minimizing the redundancy in Golomb Coded Sets X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 May 2018 18:42:44 -0000 On Fri, May 25, 2018 at 5:54 PM, Pieter Wuille via bitcoin-dev wrote: > Hi all, > > I spent some time working out the optimal parameter selection for the > Golomb Coded Sets that are proposed in BIP158: > https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845 > > TL;DR: if we really want an FP rate of exactly 1 in 2^20, the Rice > parameter should be 19, not 20. If we don't, we should pick an FP rate > of 1 in a 1.4971*2^B. So for example M=784931 B=19 or M=1569861 B=20. I did a rough analysis using Pieter's approximations on what parameters minimizes the total communications for a lite wallet scanning the chain and fetching a witnessless block whenever they get a filter hit. For a wallet with 1000 keys and blocks of 1MB if the number of entries in the is at least 5096 then M=784931 results in a lower total data rate rate (FP blocks + filters) than M=1569861. M=392465 (the optimal value for the rice parameter 18) is communications is better if at least 10192 entries are set, and M=196233 (optimal FP for rice 17) is better if at least 20384 entries are set. The prior filter set proposal is setting roughly 13300 entries per full block, and I guestimate that the in+out scripts only ones are setting about 7500 entries (if that actual number was in any of the recent posts I missed it, I'm guessing based on jimpo's sizes graph). The breakpoints are obviously different if the client is monitoring for, say, 10,000 keys instead of 1000 but I think it generally makes more sense to optimize for lower key counts since bigger users are more likely to tolerate the additional bandwidth usage. So I think that assuming that all-scripts inputs and outputs (but no txids) are used and that my guess of 7500 bits set for that configuration is roughly right, then M=1569861 and rice parameter 19 should be used. The actual optimal FP rate for total data transferred won't be one that gets the optimal rice coding efficiency, but since different clients will be monitoring for different numbers of keys, it probably makes sense to pick a parameter with optimal compression rather than optimal-data-transfer-for-a-specific-key-count-- at least then we're spending the least amount of filter bits per false positive rate, whatever that rate is... if we can't be optimal at least we can be efficient. :)