summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorPeter Tschipper <peter.tschipper@gmail.com>2015-12-02 15:02:20 -0800
committerbitcoindev <bitcoindev@gnusha.org>2015-12-02 23:02:23 +0000
commit3aad6f927d263bcb873f1226028a86cbfde978c1 (patch)
tree96fb840150d2b9f8083ef05015de4c4f7cf6ec86
parent93849b529324af7f8b3e6280f9fd5cc20ff304b4 (diff)
downloadpi-bitcoindev-3aad6f927d263bcb873f1226028a86cbfde978c1.tar.gz
pi-bitcoindev-3aad6f927d263bcb873f1226028a86cbfde978c1.zip
Re: [bitcoin-dev] [BIP Draft] Datastream compression of Blocks and Transactions
-rw-r--r--e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4208
1 files changed, 208 insertions, 0 deletions
diff --git a/e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4 b/e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4
new file mode 100644
index 000000000..ececb0f03
--- /dev/null
+++ b/e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4
@@ -0,0 +1,208 @@
+Return-Path: <peter.tschipper@gmail.com>
+Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
+ [172.17.192.35])
+ by mail.linuxfoundation.org (Postfix) with ESMTPS id 5556E8FC
+ for <bitcoin-dev@lists.linuxfoundation.org>;
+ Wed, 2 Dec 2015 23:02:23 +0000 (UTC)
+X-Greylist: whitelisted by SQLgrey-1.7.6
+Received: from mail-pf0-f181.google.com (mail-pf0-f181.google.com
+ [209.85.192.181])
+ by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 7E47D185
+ for <bitcoin-dev@lists.linuxfoundation.org>;
+ Wed, 2 Dec 2015 23:02:22 +0000 (UTC)
+Received: by pfdd184 with SMTP id d184so2340908pfd.3
+ for <bitcoin-dev@lists.linuxfoundation.org>;
+ Wed, 02 Dec 2015 15:02:22 -0800 (PST)
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
+ h=subject:to:references:cc:from:message-id:date:user-agent
+ :mime-version:in-reply-to:content-type:content-transfer-encoding;
+ bh=My+MT62KnkdtHUMXCVudZC/T6odykjGDphxofqcHT0o=;
+ b=LiuioYDma1ieeMXz613wr6rj0kFjYHhpdHKMdVKv2lC5mglVtJt6TGCKyptq5+AjDp
+ E8mKcYTZh3/FxbMHz9lQ9MadCJcnVhzQkhtUth87TVZbr5lhCDdj3Rqvi4gepDB2pGWI
+ Alx6x2+Dv/Cq/KsNEezU9KOF7Nckn38UjApi4rkYiwqwSKny1ZfK0diBzeSDWibaRxdJ
+ I6/vVQjGUeV0bm/Ujx+TNV6tz4LCkYc5fq6olpkouU5Hnptmjssq1scNvs1+gAjZkWee
+ 7/q6jabzQHINbzWcTqxPv6GuEKpNXYB6cYN000xoumjvqo36dqkLFY6bKN6jaXX0rc7K
+ gkew==
+X-Received: by 10.98.65.135 with SMTP id g7mr8489323pfd.141.1449097342279;
+ Wed, 02 Dec 2015 15:02:22 -0800 (PST)
+Received: from [192.168.0.132] (S0106bcd165303d84.cc.shawcable.net.
+ [96.54.102.88]) by smtp.googlemail.com with ESMTPSA id
+ v89sm6389185pfa.91.2015.12.02.15.02.21
+ (version=TLSv1/SSLv3 cipher=OTHER);
+ Wed, 02 Dec 2015 15:02:21 -0800 (PST)
+To: Matt Corallo <lf-lists@mattcorallo.com>
+References: <565CD7D8.3070102@gmail.com>
+ <90EF4E6C-9A71-4A35-A938-EAFC1A24DD24@mattcorallo.com>
+ <CAPkFh0t9SwVOLrPnL7z80s-Rriezhqxn_3vXKYRxr6JVGNiUZQ@mail.gmail.com>
+ <565F5193.1070802@gmail.com> <565F6F73.5050906@mattcorallo.com>
+From: Peter Tschipper <peter.tschipper@gmail.com>
+X-Enigmail-Draft-Status: N1110
+Message-ID: <565F787C.3080604@gmail.com>
+Date: Wed, 2 Dec 2015 15:02:20 -0800
+User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
+ Thunderbird/38.3.0
+MIME-Version: 1.0
+In-Reply-To: <565F6F73.5050906@mattcorallo.com>
+Content-Type: text/plain; charset=windows-1252
+Content-Transfer-Encoding: 8bit
+X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,
+ DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM,
+ RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1
+X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
+ smtp1.linux-foundation.org
+Cc: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
+Subject: Re: [bitcoin-dev] [BIP Draft] Datastream compression of Blocks and
+ Transactions
+X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
+X-Mailman-Version: 2.1.12
+Precedence: list
+List-Id: Bitcoin Development Discussion <bitcoin-dev.lists.linuxfoundation.org>
+List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
+ <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
+List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
+List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
+List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
+List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
+ <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
+X-List-Received-Date: Wed, 02 Dec 2015 23:02:23 -0000
+
+On 02/12/2015 2:23 PM, Matt Corallo wrote:
+> My issue is more that its additional complexity and attack surface,
+> and for a very minor gain
+What is a minor gain? 15 to 27% compression sounds good to me and the
+larger the data the better the compression. And although there is a
+decent peformance gain in proportion to the % of compression, the
+original motivation of the BIP was to reduce bandwidth for users in
+regions where they are subject to caps.
+> which should disappear with further optimization elsewhere
+Why would the benefit of compressing data disappear with further
+optimizations elsewhere, I'm not following you?. The compression of
+data mainly has benefit in the sending of packets over the network. I
+would think the performance gain would be cumulative. Why would this go
+away by optimizing elsewhere?
+
+> and less that we absolutely shouldn't add compression because we're
+> definitely gonna have issues.
+It's not that difficult to add compression. Even if there was an issue,
+the compression feature can be completely turned off.
+
+>
+> On 12/02/15 20:16, Peter Tschipper via bitcoin-dev wrote:
+>> Building a compressor from scratch may yeild some better compression
+>> ratios, or not, but having trust and faith in whether it will stand up
+>> against attack vectors another matter. LZO has been around for 20 years
+>> with very few problems and no current issues. Maybe something better
+>> can be built, but when and how much testing will need to be done before
+>> it can be trusted? Right now there is something that provides a benefit
+>> and in the future if something better is found it's not that difficult
+>> to add it. We could easily support multiple compression libraries.
+>>
+>>
+>> On 02/12/2015 10:57 AM, Emin G�n Sirer wrote:
+>>> Thanks Peter for the careful, quantitative work.
+>>>
+>>> I want to bring one additional issue to everyone's consideration,
+>>> related to the choice of the Lempel-Ziv family of compressors.
+>>>
+>>> While I'm not familiar with every single compression engine tested,
+>>> the Lempel-Ziv family of compressors are generally based on
+>>> "compression tables." Essentially, they assign a short unique number
+>>> to every new subsequence they encounter, and when they re-encounter a
+>>> sequence like "ab" in "abcdfdcdabcdfabcdf" they replace it with that
+>>> short integer (say, in this case, 9-bit constant 256). So this example
+>>> sequence may turn into "abcdfd<258 for cd><256 for ab><258 for
+>>> cd>f<261 for abc><259 for df>" which is slightly shorter than the
+>>> original (I'm doing this off the top of my head so the counts may be
+>>> off, but it's meant to be illustrative). Note that the sequence "abc"
+>>> got added into the table only after it was encountered twice in the
+>>> input.
+>>>
+>>> This is nice and generic and works well for English text where certain
+>>> letter sequences (e.g. "it" "th" "the" "this" "are" "there" etc) are
+>>> repeated often, but it is nowhere as compact as it could possibly be
+>>> for mostly binary data -- there are opportunities for much better
+>>> compression, made possible by the structured reuse of certain byte
+>>> sequences in the Bitcoin wire protocol.
+>>>
+>>> On a Bitcoin wire connection, we might see several related
+>>> transactions reorganizing cash in a set of addresses, and therefore,
+>>> several reuses of a 20-byte address. Or we might see a 200-byte
+>>> transaction get transmitted, followed by the same transaction,
+>>> repeated in a block. Ideally, we'd learn the sequence that may be
+>>> repeated later on, all at once (e.g. a Bitcoin address or a
+>>> transaction), and replace it with a short number, referring back to
+>>> the long sequence. In the example above, if we knew that "abcdf" was a
+>>> UNIT that would likely be repeated, we would put it into the
+>>> compression table as a whole, instead of relying on repetition to get
+>>> it into the table one extra byte at a time. That may let us compress
+>>> the original sequence down to "abcdfd<257 for cd><256 for abcdf><256
+>>> for abcdf>" from the get go.
+>>>
+>>> Yet the LZ variants I know of will need to see a 200-byte sequence
+>>> repeated **199 times** in order to develop a single, reusable,
+>>> 200-byte long subsequence in the compression table.
+>>>
+>>> So, a Bitcoin-specific compressor can perhaps do significantly better,
+>>> but is it a good idea? Let's argue both sides.
+>>>
+>>> Cons:
+>>>
+>>> On the one hand, Bitcoin-specific compressors will be closely tied to
+>>> the contents of messages, which might make it difficult to change the
+>>> wire format later on -- changes to the wire format may need
+>>> corresponding changes to the compressor. If the compressor cannot be
+>>> implemented cleanly, then the protocol-agnostic, off-the-shelf
+>>> compressors have a maintainability edge, which comes at the expense of
+>>> the compression ratio.
+>>>
+>>> Another argument is that compression algorithms of any kind should be
+>>> tested thoroughly before inclusion, and brand new code may lack the
+>>> maturity required. While this argument has some merit, all outputs are
+>>> verified separately later on during processing, so
+>>> compression/decompression errors can potentially be detected. If the
+>>> compressor/decompressor can be structured in a way that isolates
+>>> bitcoind from failure (e.g. as a separate process for starters), this
+>>> concern can be remedied.
+>>>
+>>> Pros:
+>>>
+>>> The nature of LZ compressors leads me to believe that much higher
+>>> compression ratios are possible by building a custom, Bitcoin-aware
+>>> compressor. If I had to guess, I would venture that compression ratios
+>>> of 2X or more are possible in some cases. In some sense, the "O(1)
+>>> block propagation" idea that Gavin proposed a while ago can be seen as
+>>> extreme example of a Bitcoin-specific compressor, albeit one that
+>>> constrains the order of transactions in a block.
+>>>
+>>> Compression can buy us some additional throughput at zero cost, modulo
+>>> code complexity.
+>>> Given the amount of acrimonious debate over the block size we have all
+>>> had to endure, it seems
+>>> criminal to leave potentially free improvements on the table. Even if
+>>> the resulting code is
+>>> deemed too complex to include in the production client right now, it
+>>> would be good to understand
+>>> the potential for improvement.
+>>>
+>>> How to Do It
+>>>
+>>> If we want to compress Bitcoin, a programming challenge/contest would
+>>> be one of the best ways to find the best possible, Bitcoin-specific
+>>> compressor. This is the kind of self-contained exercise that bright
+>>> young hackers love to tackle. It'd bring in new programmers into the
+>>> ecosystem, and many of us would love to discover the limits of
+>>> compressibility for Bitcoin bits on a wire. And the results would be
+>>> interesting even if the final compression engine is not enabled by
+>>> default, or not even merged.
+>>>
+>>
+>>
+>>
+>> _______________________________________________
+>> bitcoin-dev mailing list
+>> bitcoin-dev@lists.linuxfoundation.org
+>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
+>>
+>
+
+