diff options
author | Peter Tschipper <peter.tschipper@gmail.com> | 2015-12-02 15:02:20 -0800 |
---|---|---|
committer | bitcoindev <bitcoindev@gnusha.org> | 2015-12-02 23:02:23 +0000 |
commit | 3aad6f927d263bcb873f1226028a86cbfde978c1 (patch) | |
tree | 96fb840150d2b9f8083ef05015de4c4f7cf6ec86 | |
parent | 93849b529324af7f8b3e6280f9fd5cc20ff304b4 (diff) | |
download | pi-bitcoindev-3aad6f927d263bcb873f1226028a86cbfde978c1.tar.gz pi-bitcoindev-3aad6f927d263bcb873f1226028a86cbfde978c1.zip |
Re: [bitcoin-dev] [BIP Draft] Datastream compression of Blocks and Transactions
-rw-r--r-- | e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4 | 208 |
1 files changed, 208 insertions, 0 deletions
diff --git a/e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4 b/e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4 new file mode 100644 index 000000000..ececb0f03 --- /dev/null +++ b/e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4 @@ -0,0 +1,208 @@ +Return-Path: <peter.tschipper@gmail.com> +Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org + [172.17.192.35]) + by mail.linuxfoundation.org (Postfix) with ESMTPS id 5556E8FC + for <bitcoin-dev@lists.linuxfoundation.org>; + Wed, 2 Dec 2015 23:02:23 +0000 (UTC) +X-Greylist: whitelisted by SQLgrey-1.7.6 +Received: from mail-pf0-f181.google.com (mail-pf0-f181.google.com + [209.85.192.181]) + by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 7E47D185 + for <bitcoin-dev@lists.linuxfoundation.org>; + Wed, 2 Dec 2015 23:02:22 +0000 (UTC) +Received: by pfdd184 with SMTP id d184so2340908pfd.3 + for <bitcoin-dev@lists.linuxfoundation.org>; + Wed, 02 Dec 2015 15:02:22 -0800 (PST) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; + h=subject:to:references:cc:from:message-id:date:user-agent + :mime-version:in-reply-to:content-type:content-transfer-encoding; + bh=My+MT62KnkdtHUMXCVudZC/T6odykjGDphxofqcHT0o=; + b=LiuioYDma1ieeMXz613wr6rj0kFjYHhpdHKMdVKv2lC5mglVtJt6TGCKyptq5+AjDp + E8mKcYTZh3/FxbMHz9lQ9MadCJcnVhzQkhtUth87TVZbr5lhCDdj3Rqvi4gepDB2pGWI + Alx6x2+Dv/Cq/KsNEezU9KOF7Nckn38UjApi4rkYiwqwSKny1ZfK0diBzeSDWibaRxdJ + I6/vVQjGUeV0bm/Ujx+TNV6tz4LCkYc5fq6olpkouU5Hnptmjssq1scNvs1+gAjZkWee + 7/q6jabzQHINbzWcTqxPv6GuEKpNXYB6cYN000xoumjvqo36dqkLFY6bKN6jaXX0rc7K + gkew== +X-Received: by 10.98.65.135 with SMTP id g7mr8489323pfd.141.1449097342279; + Wed, 02 Dec 2015 15:02:22 -0800 (PST) +Received: from [192.168.0.132] (S0106bcd165303d84.cc.shawcable.net. + [96.54.102.88]) by smtp.googlemail.com with ESMTPSA id + v89sm6389185pfa.91.2015.12.02.15.02.21 + (version=TLSv1/SSLv3 cipher=OTHER); + Wed, 02 Dec 2015 15:02:21 -0800 (PST) +To: Matt Corallo <lf-lists@mattcorallo.com> +References: <565CD7D8.3070102@gmail.com> + <90EF4E6C-9A71-4A35-A938-EAFC1A24DD24@mattcorallo.com> + <CAPkFh0t9SwVOLrPnL7z80s-Rriezhqxn_3vXKYRxr6JVGNiUZQ@mail.gmail.com> + <565F5193.1070802@gmail.com> <565F6F73.5050906@mattcorallo.com> +From: Peter Tschipper <peter.tschipper@gmail.com> +X-Enigmail-Draft-Status: N1110 +Message-ID: <565F787C.3080604@gmail.com> +Date: Wed, 2 Dec 2015 15:02:20 -0800 +User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 + Thunderbird/38.3.0 +MIME-Version: 1.0 +In-Reply-To: <565F6F73.5050906@mattcorallo.com> +Content-Type: text/plain; charset=windows-1252 +Content-Transfer-Encoding: 8bit +X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, + DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, + RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 +X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on + smtp1.linux-foundation.org +Cc: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org> +Subject: Re: [bitcoin-dev] [BIP Draft] Datastream compression of Blocks and + Transactions +X-BeenThere: bitcoin-dev@lists.linuxfoundation.org +X-Mailman-Version: 2.1.12 +Precedence: list +List-Id: Bitcoin Development Discussion <bitcoin-dev.lists.linuxfoundation.org> +List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>, + <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe> +List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/> +List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org> +List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help> +List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>, + <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe> +X-List-Received-Date: Wed, 02 Dec 2015 23:02:23 -0000 + +On 02/12/2015 2:23 PM, Matt Corallo wrote: +> My issue is more that its additional complexity and attack surface, +> and for a very minor gain +What is a minor gain? 15 to 27% compression sounds good to me and the +larger the data the better the compression. And although there is a +decent peformance gain in proportion to the % of compression, the +original motivation of the BIP was to reduce bandwidth for users in +regions where they are subject to caps. +> which should disappear with further optimization elsewhere +Why would the benefit of compressing data disappear with further +optimizations elsewhere, I'm not following you?. The compression of +data mainly has benefit in the sending of packets over the network. I +would think the performance gain would be cumulative. Why would this go +away by optimizing elsewhere? + +> and less that we absolutely shouldn't add compression because we're +> definitely gonna have issues. +It's not that difficult to add compression. Even if there was an issue, +the compression feature can be completely turned off. + +> +> On 12/02/15 20:16, Peter Tschipper via bitcoin-dev wrote: +>> Building a compressor from scratch may yeild some better compression +>> ratios, or not, but having trust and faith in whether it will stand up +>> against attack vectors another matter. LZO has been around for 20 years +>> with very few problems and no current issues. Maybe something better +>> can be built, but when and how much testing will need to be done before +>> it can be trusted? Right now there is something that provides a benefit +>> and in the future if something better is found it's not that difficult +>> to add it. We could easily support multiple compression libraries. +>> +>> +>> On 02/12/2015 10:57 AM, Emin G�n Sirer wrote: +>>> Thanks Peter for the careful, quantitative work. +>>> +>>> I want to bring one additional issue to everyone's consideration, +>>> related to the choice of the Lempel-Ziv family of compressors. +>>> +>>> While I'm not familiar with every single compression engine tested, +>>> the Lempel-Ziv family of compressors are generally based on +>>> "compression tables." Essentially, they assign a short unique number +>>> to every new subsequence they encounter, and when they re-encounter a +>>> sequence like "ab" in "abcdfdcdabcdfabcdf" they replace it with that +>>> short integer (say, in this case, 9-bit constant 256). So this example +>>> sequence may turn into "abcdfd<258 for cd><256 for ab><258 for +>>> cd>f<261 for abc><259 for df>" which is slightly shorter than the +>>> original (I'm doing this off the top of my head so the counts may be +>>> off, but it's meant to be illustrative). Note that the sequence "abc" +>>> got added into the table only after it was encountered twice in the +>>> input. +>>> +>>> This is nice and generic and works well for English text where certain +>>> letter sequences (e.g. "it" "th" "the" "this" "are" "there" etc) are +>>> repeated often, but it is nowhere as compact as it could possibly be +>>> for mostly binary data -- there are opportunities for much better +>>> compression, made possible by the structured reuse of certain byte +>>> sequences in the Bitcoin wire protocol. +>>> +>>> On a Bitcoin wire connection, we might see several related +>>> transactions reorganizing cash in a set of addresses, and therefore, +>>> several reuses of a 20-byte address. Or we might see a 200-byte +>>> transaction get transmitted, followed by the same transaction, +>>> repeated in a block. Ideally, we'd learn the sequence that may be +>>> repeated later on, all at once (e.g. a Bitcoin address or a +>>> transaction), and replace it with a short number, referring back to +>>> the long sequence. In the example above, if we knew that "abcdf" was a +>>> UNIT that would likely be repeated, we would put it into the +>>> compression table as a whole, instead of relying on repetition to get +>>> it into the table one extra byte at a time. That may let us compress +>>> the original sequence down to "abcdfd<257 for cd><256 for abcdf><256 +>>> for abcdf>" from the get go. +>>> +>>> Yet the LZ variants I know of will need to see a 200-byte sequence +>>> repeated **199 times** in order to develop a single, reusable, +>>> 200-byte long subsequence in the compression table. +>>> +>>> So, a Bitcoin-specific compressor can perhaps do significantly better, +>>> but is it a good idea? Let's argue both sides. +>>> +>>> Cons: +>>> +>>> On the one hand, Bitcoin-specific compressors will be closely tied to +>>> the contents of messages, which might make it difficult to change the +>>> wire format later on -- changes to the wire format may need +>>> corresponding changes to the compressor. If the compressor cannot be +>>> implemented cleanly, then the protocol-agnostic, off-the-shelf +>>> compressors have a maintainability edge, which comes at the expense of +>>> the compression ratio. +>>> +>>> Another argument is that compression algorithms of any kind should be +>>> tested thoroughly before inclusion, and brand new code may lack the +>>> maturity required. While this argument has some merit, all outputs are +>>> verified separately later on during processing, so +>>> compression/decompression errors can potentially be detected. If the +>>> compressor/decompressor can be structured in a way that isolates +>>> bitcoind from failure (e.g. as a separate process for starters), this +>>> concern can be remedied. +>>> +>>> Pros: +>>> +>>> The nature of LZ compressors leads me to believe that much higher +>>> compression ratios are possible by building a custom, Bitcoin-aware +>>> compressor. If I had to guess, I would venture that compression ratios +>>> of 2X or more are possible in some cases. In some sense, the "O(1) +>>> block propagation" idea that Gavin proposed a while ago can be seen as +>>> extreme example of a Bitcoin-specific compressor, albeit one that +>>> constrains the order of transactions in a block. +>>> +>>> Compression can buy us some additional throughput at zero cost, modulo +>>> code complexity. +>>> Given the amount of acrimonious debate over the block size we have all +>>> had to endure, it seems +>>> criminal to leave potentially free improvements on the table. Even if +>>> the resulting code is +>>> deemed too complex to include in the production client right now, it +>>> would be good to understand +>>> the potential for improvement. +>>> +>>> How to Do It +>>> +>>> If we want to compress Bitcoin, a programming challenge/contest would +>>> be one of the best ways to find the best possible, Bitcoin-specific +>>> compressor. This is the kind of self-contained exercise that bright +>>> young hackers love to tackle. It'd bring in new programmers into the +>>> ecosystem, and many of us would love to discover the limits of +>>> compressibility for Bitcoin bits on a wire. And the results would be +>>> interesting even if the final compression engine is not enabled by +>>> default, or not even merged. +>>> +>> +>> +>> +>> _______________________________________________ +>> bitcoin-dev mailing list +>> bitcoin-dev@lists.linuxfoundation.org +>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev +>> +> + + |