summaryrefslogtreecommitdiff
path: root/e0/c96054f68b29397c7fc3e0e64c0ed5c6edeba4
blob: ececb0f037b1d46993fbf45fa9f004dd46cef60d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
Return-Path: <peter.tschipper@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 5556E8FC
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed,  2 Dec 2015 23:02:23 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-pf0-f181.google.com (mail-pf0-f181.google.com
	[209.85.192.181])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 7E47D185
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed,  2 Dec 2015 23:02:22 +0000 (UTC)
Received: by pfdd184 with SMTP id d184so2340908pfd.3
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed, 02 Dec 2015 15:02:22 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=subject:to:references:cc:from:message-id:date:user-agent
	:mime-version:in-reply-to:content-type:content-transfer-encoding;
	bh=My+MT62KnkdtHUMXCVudZC/T6odykjGDphxofqcHT0o=;
	b=LiuioYDma1ieeMXz613wr6rj0kFjYHhpdHKMdVKv2lC5mglVtJt6TGCKyptq5+AjDp
	E8mKcYTZh3/FxbMHz9lQ9MadCJcnVhzQkhtUth87TVZbr5lhCDdj3Rqvi4gepDB2pGWI
	Alx6x2+Dv/Cq/KsNEezU9KOF7Nckn38UjApi4rkYiwqwSKny1ZfK0diBzeSDWibaRxdJ
	I6/vVQjGUeV0bm/Ujx+TNV6tz4LCkYc5fq6olpkouU5Hnptmjssq1scNvs1+gAjZkWee
	7/q6jabzQHINbzWcTqxPv6GuEKpNXYB6cYN000xoumjvqo36dqkLFY6bKN6jaXX0rc7K
	gkew==
X-Received: by 10.98.65.135 with SMTP id g7mr8489323pfd.141.1449097342279;
	Wed, 02 Dec 2015 15:02:22 -0800 (PST)
Received: from [192.168.0.132] (S0106bcd165303d84.cc.shawcable.net.
	[96.54.102.88]) by smtp.googlemail.com with ESMTPSA id
	v89sm6389185pfa.91.2015.12.02.15.02.21
	(version=TLSv1/SSLv3 cipher=OTHER);
	Wed, 02 Dec 2015 15:02:21 -0800 (PST)
To: Matt Corallo <lf-lists@mattcorallo.com>
References: <565CD7D8.3070102@gmail.com>
	<90EF4E6C-9A71-4A35-A938-EAFC1A24DD24@mattcorallo.com>
	<CAPkFh0t9SwVOLrPnL7z80s-Rriezhqxn_3vXKYRxr6JVGNiUZQ@mail.gmail.com>
	<565F5193.1070802@gmail.com> <565F6F73.5050906@mattcorallo.com>
From: Peter Tschipper <peter.tschipper@gmail.com>
X-Enigmail-Draft-Status: N1110
Message-ID: <565F787C.3080604@gmail.com>
Date: Wed, 2 Dec 2015 15:02:20 -0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
	Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <565F6F73.5050906@mattcorallo.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM,
	RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] [BIP Draft] Datastream compression of Blocks and
 Transactions
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Development Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Dec 2015 23:02:23 -0000

On 02/12/2015 2:23 PM, Matt Corallo wrote:
> My issue is more that its additional complexity and attack surface,
> and for a very minor gain 
What is a minor gain?  15 to 27% compression sounds good to me and the
larger the data the better the compression.  And although there is a
decent peformance gain in proportion to the % of compression, the
original motivation of the BIP was to reduce bandwidth for users in
regions where they are subject to caps. 
> which should disappear with further optimization elsewhere 
Why would the benefit of compressing data disappear with further
optimizations elsewhere, I'm not following you?.  The compression of
data mainly has benefit in the sending of packets over the network.  I
would think the performance gain would be cumulative.  Why would this go
away by optimizing elsewhere?

> and less that we absolutely shouldn't add compression because we're
> definitely gonna have issues.
It's not that difficult to add compression.  Even if there was an issue,
the compression feature can be completely turned off. 

>
> On 12/02/15 20:16, Peter Tschipper via bitcoin-dev wrote:
>> Building a compressor from scratch may yeild some better compression
>> ratios, or not, but having trust and faith in whether it will stand up
>> against attack vectors another matter.  LZO has been around for 20 years
>> with very few problems and no current issues.  Maybe something better
>> can be built, but when and how much testing will need to be done before
>> it can be trusted?  Right now there is something that provides a benefit
>> and in the future if something better is found it's not that difficult
>> to add it.  We could easily support multiple compression libraries.
>>
>>
>> On 02/12/2015 10:57 AM, Emin Gün Sirer wrote:
>>> Thanks Peter for the careful, quantitative work.
>>>
>>> I want to bring one additional issue to everyone's consideration,
>>> related to the choice of the Lempel-Ziv family of compressors.
>>>
>>> While I'm not familiar with every single compression engine tested,
>>> the Lempel-Ziv family of compressors are generally based on
>>> "compression tables." Essentially, they assign a short unique number
>>> to every new subsequence they encounter, and when they re-encounter a
>>> sequence like "ab" in "abcdfdcdabcdfabcdf" they replace it with that
>>> short integer (say, in this case, 9-bit constant 256). So this example
>>> sequence may turn into "abcdfd<258 for cd><256 for ab><258 for
>>> cd>f<261 for abc><259 for df>" which is slightly shorter than the
>>> original (I'm doing this off the top of my head so the counts may be
>>> off, but it's meant to be illustrative). Note that the sequence "abc"
>>> got added into the table only after it was encountered twice in the
>>> input.
>>>
>>> This is nice and generic and works well for English text where certain
>>> letter sequences (e.g. "it" "th" "the" "this" "are" "there" etc) are
>>> repeated often, but it is nowhere as compact as it could possibly be
>>> for mostly binary data -- there are opportunities for much better
>>> compression, made possible by the structured reuse of certain byte
>>> sequences in the Bitcoin wire protocol.
>>>
>>> On a Bitcoin wire connection, we might see several related
>>> transactions reorganizing cash in a set of addresses, and therefore,
>>> several reuses of a 20-byte address. Or we might see a 200-byte
>>> transaction get transmitted, followed by the same transaction,
>>> repeated in a block. Ideally, we'd learn the sequence that may be
>>> repeated later on, all at once (e.g. a Bitcoin address or a
>>> transaction), and replace it with a short number, referring back to
>>> the long sequence. In the example above, if we knew that "abcdf" was a
>>> UNIT that would likely be repeated, we would put it into the
>>> compression table as a whole, instead of relying on repetition to get
>>> it into the table one extra byte at a time. That may let us compress
>>> the original sequence down to "abcdfd<257 for cd><256 for abcdf><256
>>> for abcdf>" from the get go.
>>>
>>> Yet the LZ variants I know of will need to see a 200-byte sequence
>>> repeated **199 times** in order to develop a single, reusable,
>>> 200-byte long subsequence in the compression table.
>>>
>>> So, a Bitcoin-specific compressor can perhaps do significantly better,
>>> but is it a good idea? Let's argue both sides.
>>>
>>> Cons:
>>>
>>> On the one hand, Bitcoin-specific compressors will be closely tied to
>>> the contents of messages, which might make it difficult to change the
>>> wire format later on -- changes to the wire format may need
>>> corresponding changes to the compressor.  If the compressor cannot be
>>> implemented cleanly, then the protocol-agnostic, off-the-shelf
>>> compressors have a maintainability edge, which comes at the expense of
>>> the compression ratio.
>>>
>>> Another argument is that compression algorithms of any kind should be
>>> tested thoroughly before inclusion, and brand new code may lack the
>>> maturity required. While this argument has some merit, all outputs are
>>> verified separately later on during processing, so
>>> compression/decompression errors can potentially be detected. If the
>>> compressor/decompressor can be structured in a way that isolates
>>> bitcoind from failure (e.g. as a separate process for starters), this
>>> concern can be remedied.
>>>
>>> Pros:
>>>
>>> The nature of LZ compressors leads me to believe that much higher
>>> compression ratios are possible by building a custom, Bitcoin-aware
>>> compressor. If I had to guess, I would venture that compression ratios
>>> of 2X or more are possible in some cases. In some sense, the "O(1)
>>> block propagation" idea that Gavin proposed a while ago can be seen as
>>> extreme example of a Bitcoin-specific compressor, albeit one that
>>> constrains the order of transactions in a block.
>>>
>>> Compression can buy us some additional throughput at zero cost, modulo
>>> code complexity.
>>> Given the amount of acrimonious debate over the block size we have all
>>> had to endure, it seems
>>> criminal to leave potentially free improvements on the table. Even if
>>> the resulting code is
>>> deemed too complex to include in the production client right now, it
>>> would be good to understand
>>> the potential for improvement.
>>>
>>> How to Do It
>>>
>>> If we want to compress Bitcoin, a programming challenge/contest would
>>> be one of the best ways to find the best possible, Bitcoin-specific
>>> compressor. This is the kind of self-contained exercise that bright
>>> young hackers love to tackle. It'd bring in new programmers into the
>>> ecosystem, and many of us would love to discover the limits of
>>> compressibility for Bitcoin bits on a wire. And the results would be
>>> interesting even if the final compression engine is not enabled by
>>> default, or not even merged.
>>>
>>
>>
>>
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>