From: Marco Pontello
Date: Wed, 11 Nov 2015 19:49:49 +0100
To: Peter Tschipper
Cc: Bitcoin Dev
Subject: Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"

A random thought: isn't most communication over a data link already compressed at some point?
When I used a modem, we had the V.42bis protocol. Now, nearly all ADSL connections using PPPoE surely are. And so on.
I'm not sure another level of generic, data-agnostic compression will really give us any real-life practical advantage over that.

Something that could take advantage of special knowledge of the specific data, instead, would be an entirely different matter.

Just my 2c.

On Wed, Nov 11, 2015 at 7:35 PM, Peter Tschipper via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
Here are the latest results on compression ratios for the first 295,000 blocks, compressionlevel=6. I think there are more than enough datapoints for statistical significance.

Results are very similar to the previous test. I'll work on getting a comparison of how much time is saved or lost when syncing the blockchain: compressed vs uncompressed. Still, I think it's clear that serving up compressed blocks, at least historical blocks, will be of benefit for those that have bandwidth caps on their internet connections.

The proposal, so far, is fairly simple:
1) Compress blocks with some compression library: currently zlib, but I can investigate other possibilities.
2) As a fallback we need to advertise compression as a service. That way we can turn off compression AND decompression completely if needed.
3) Do the compression at the datastream level in the code. CDataStream is the obvious place.
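
The three points above can be sketched roughly as follows. This is only an illustration in Python using the standard zlib module, not the code from the PR: the function names `encode_payload` and `decode_payload` and the `peer_supports_compression` flag are hypothetical, and the real change would live in the C++ datastream code.

```python
import zlib
from typing import Tuple

COMPRESSION_LEVEL = 6  # the level used for the test results in this thread


def encode_payload(raw: bytes, peer_supports_compression: bool) -> Tuple[bool, bytes]:
    """Return (is_compressed, payload), falling back to the raw bytes when needed."""
    # Hypothetical flag: set only when the peer advertised compression support.
    if not peer_supports_compression:
        return False, raw
    try:
        packed = zlib.compress(raw, COMPRESSION_LEVEL)
    except zlib.error:
        # Fallback path: never fail a send because compression failed.
        return False, raw
    # Very small payloads can grow when compressed, so send them as-is.
    if len(packed) >= len(raw):
        return False, raw
    return True, packed


def decode_payload(is_compressed: bool, payload: bytes) -> bytes:
    """Reverse encode_payload on the receiving side."""
    return zlib.decompress(payload) if is_compressed else payload
```

The sender always has an uncompressed path available, so turning compression off is just a matter of not advertising it.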


Test Results:

range = block size range
ubytes = average size of uncompressed blocks (bytes)
cbytes = average size of compressed blocks (bytes)
ctime = average time to compress
dtime = average time to decompress
cmp_ratio% = compression ratio (percent reduction in size)
datapoints = number of datapoints taken

range         ubytes   cbytes   ctime   dtime   cmp_ratio%   datapoints
0-250b           215      189   0.001   0.000        12.40        91280
250-500b         438      404   0.001   0.000         7.85        13217
500-1KB          761      701   0.001   0.000         7.86        11434
1KB-10KB        4149     3547   0.001   0.000        14.51        52180
10KB-100KB     41934    32604   0.005   0.001        22.25        82890
100KB-200KB   146303   108080   0.016   0.001        26.13        29886
200KB-300KB   243299   179281   0.025   0.002        26.31        25066
300KB-400KB   344636   266177   0.036   0.003        22.77         4956
400KB-500KB   463201   356862   0.046   0.004        22.96         3167
500KB-600KB   545123   429854   0.056   0.005        21.15          366
600KB-700KB   647736   510931   0.065   0.006        21.12          254
700KB-800KB   746540   587287   0.073   0.008        21.33          294
800KB-900KB   868121   682650   0.087   0.008        21.36          199
900KB-1MB     945747   726307   0.091   0.010        23.20          304
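
For anyone who wants to check the cmp_ratio% column, here is a small sketch that recomputes the percentage saved from the average sizes in the table. The helper name `savings_percent` is made up for this example. The result can differ slightly from the reported column in the smallest ranges, presumably because the averages are rounded.

```python
# (range, average uncompressed bytes, average compressed bytes), copied from the table
ROWS = [
    ("0-250b", 215, 189),
    ("250-500b", 438, 404),
    ("500-1KB", 761, 701),
    ("1KB-10KB", 4149, 3547),
    ("10KB-100KB", 41934, 32604),
    ("100KB-200KB", 146303, 108080),
    ("200KB-300KB", 243299, 179281),
    ("300KB-400KB", 344636, 266177),
    ("400KB-500KB", 463201, 356862),
    ("500KB-600KB", 545123, 429854),
    ("600KB-700KB", 647736, 510931),
    ("700KB-800KB", 746540, 587287),
    ("800KB-900KB", 868121, 682650),
    ("900KB-1MB", 945747, 726307),
]


def savings_percent(ubytes: int, cbytes: int) -> float:
    """Percentage of bytes saved by compression: 100 * (ubytes - cbytes) / ubytes."""
    return 100.0 * (ubytes - cbytes) / ubytes


for label, ubytes, cbytes in ROWS:
    print(f"{label:<12} {savings_percent(ubytes, cbytes):6.2f}%")
```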

On 10/11/2015 8:46 AM, Jeff Garzik via bitcoin-dev wrote:
Comments:

1) cblock seems a reasonable way to extend the protocol. Further wrapping should probably be done at the stream level.

2) zlib has a crappy security track record.

3) A fallback path to non-compressed is required, should compression fail or crash.

4) Most blocks and transactions have runs of zeroes and/or highly common bit-patterns, which contributes to useful compression even at smaller sizes. Peter Ts's most recent numbers bear this out. zlib has a dictionary (32K?) which works well with repeated patterns such as those you see with concatenated runs of transactions.

5) LZO should provide much better compression, at a cost of CPU performance and using a less-reviewed, less-field-tested library.
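
Point 4 can be illustrated with a quick experiment. This is a sketch on synthetic data, not real block data: `zero_runs` is an invented buffer with runs of zeroes and a repeated pattern, and `random_bytes` stands in for high-entropy content such as hashes and signatures.

```python
import os
import zlib


def saved_percent(data: bytes, level: int = 6) -> float:
    """Percentage of bytes saved when compressing data with zlib at the given level."""
    return 100.0 * (len(data) - len(zlib.compress(data, level))) / len(data)


# Synthetic input: runs of zeroes plus a short repeated pattern (not real block data).
zero_runs = (b"\x00" * 28 + b"\x01\x02\x03\x04") * 256

# High-entropy input of the same length, which zlib cannot shrink.
random_bytes = os.urandom(len(zero_runs))

print(f"repeated patterns: {saved_percent(zero_runs):.1f}% saved")
print(f"random bytes:      {saved_percent(random_bytes):.1f}% saved")
```

Real blocks sit somewhere between these two extremes, which fits the 7% to 26% range in Peter's table.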





On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:


On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper <peter.tschipper@gmail.com> wrote:
There are better ways of sending new blocks, that's certainly true, but for sending historical blocks and sending transactions I don't think so. This PR is really designed to save bandwidth and is not intended to be a huge performance improvement in terms of time spent sending.

If the main point is for historical data, then sticking to just blocks is the best plan.

Since small blocks don't compress well, you could define a "cblocks" message that handles multiple blocks (just concatenate the block messages as payload before compression).

The sending peer could combine blocks so that each cblock is compressing at least 10kB of block data (or whatever is optimal). It is probably worth specifying a maximum size for network buffer reasons (either 1MB or 1 block maximum).
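
A rough sketch of that batching rule, with several assumptions: the names `batch_blocks` and `make_cblocks_payload` are invented here, the 10kB and 1MB thresholds are taken from the text above, and each block is treated as an opaque, already-serialized message that carries its own length, so the receiver can split the payload again after decompression.

```python
import zlib
from typing import Iterable, Iterator, List

MIN_BATCH_BYTES = 10_000     # "at least 10kB of block data"
MAX_BATCH_BYTES = 1_000_000  # the 1MB cap suggested for network buffer reasons


def batch_blocks(blocks: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Group serialized blocks so each batch reaches the minimum size when possible."""
    batch: List[bytes] = []
    size = 0
    for block in blocks:
        # Flush the pending batch first if adding this block would exceed the cap.
        if batch and size + len(block) > MAX_BATCH_BYTES:
            yield batch
            batch, size = [], 0
        batch.append(block)
        size += len(block)
        # Once the batch is large enough to compress well, send it.
        if size >= MIN_BATCH_BYTES:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch


def make_cblocks_payload(batch: List[bytes]) -> bytes:
    """Concatenate the block messages and compress them as one payload."""
    return zlib.compress(b"".join(batch), 6)
```

A block that is already larger than the minimum ends up in a batch of its own, which matches the "1 block maximum" alternative.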

Similarly, transactions could be combined together and compressed as "ctxs". The inv messages could be modified so that you can request groups of 10-20 transactions. That would depend on how much of an improvement compressed transactions would represent.

More generally, you could define a message which is a compressed message holder. That is probably too complex to be worth the effort though.


On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
I think 25% bandwidth savings is certainly considerable, especially for people running full nodes in countries like Australia where internet bandwidth is lower and there are data caps.

This reinforces the idea that such trade-off decisions should be local and negotiated between peers, not a required feature of the P2P network.
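
One way to picture a local, negotiated decision is a per-connection check on advertised service bits. This is only a sketch: `NODE_COMPRESS` and its bit position are hypothetical and not an assigned service flag, and `use_compression` is an invented helper.

```python
NODE_NETWORK = 1 << 0    # existing service bit for full nodes
NODE_COMPRESS = 1 << 30  # hypothetical bit chosen for this example only


def use_compression(local_services: int, remote_services: int,
                    local_opt_in: bool = True) -> bool:
    """Compress on this connection only if both peers advertise it and we opt in."""
    both_advertise = bool(local_services & remote_services & NODE_COMPRESS)
    return local_opt_in and both_advertise


# A node on a capped connection opts in; a peer that does not advertise the bit
# is simply served uncompressed data.
capped_node = NODE_NETWORK | NODE_COMPRESS
plain_peer = NODE_NETWORK
print(use_compression(capped_node, plain_peer))   # False
print(use_compression(capped_node, capped_node))  # True
```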

--
Johnathan Corgan
Corgan Labs - SDR Training and Development Services

_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev








--
Try the Online TrID File Identifier
http://mark0.net/onlinetrid.aspx