From: Peter Tschipper
To: bitcoin-dev@lists.linuxfoundation.org
Date: Wed, 18 Nov 2015 06:00:35 -0800
Subject: [bitcoin-dev] More findings: Block Compression (Datastream Compression) test results using the PR#6973 compression prototype
In-Reply-To: <56465CEE.6010109@gmail.com>

Hi all,

I'm still doing a little more investigation before opening up a formal BIP PR, but I'm getting close.  Here are some more findings.

After moving the compression from main.cpp to streams.h (CDataStream) it was a simple matter to add compression to transactions as well. Results as follows:

range = block size range
ubytes = average size of uncompressed transactions
cbytes = average size of compressed transactions
cmp_ratio% = compression ratio
datapoints = number of datapoints taken

range        ubytes   cbytes   cmp_ratio%   datapoints
0-250b       220      227      -3.16        23780
250-500b     356      354       0.68        20882
500-600b     534      505       5.29        2772
600-700b     653      608       6.95        1853
700-800b     757      649      14.22        578
800-900b     822      758       7.77        661
900b-1KB     954      862       9.69        906
1KB-10KB     2698     2222     17.64        3370
10KB-100KB   15463    12092    21.8         15429
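The pattern in the table is easy to reproduce with stock zlib (this is just a stdlib sketch, not the PR#6973 code): a tiny high-entropy payload actually grows under compression because of stream overhead, while a larger payload with internal structure shrinks substantially.

```python
import os
import zlib

def cmp_ratio(data: bytes, level: int = 6) -> float:
    """Percent size reduction; negative means the 'compressed' form grew."""
    compressed = zlib.compress(data, level)
    return (1 - len(compressed) / len(data)) * 100

# A tiny, high-entropy payload (like a small tx) tends to grow:
# zlib adds header/checksum overhead and finds no redundancy.
small_tx = os.urandom(220)
print(f"220B random payload: {cmp_ratio(small_tx):.2f}%")   # negative

# A larger payload with repeated internal structure compresses well.
big_tx = (b"OP_DUP OP_HASH160 " + os.urandom(20) + b" OP_EQUALVERIFY OP_CHECKSIG ") * 300
print(f"{len(big_tx)}B structured payload: {cmp_ratio(big_tx):.2f}%")
```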

A couple of obvious observations.  Transactions don't compress well below 500 bytes, but they compress very well beyond 1KB, where many of the large spam-type transactions sit.  However, most transactions fall in the < 500 byte range.  So the next step was to apply bundling, i.e. creating a "blob" of those smaller transactions, if and only if there are multiple tx's in the getdata receive queue for a peer.  Doing that yields some very good compression ratios.  Some examples follow:
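The bundling idea can be sketched as follows.  The transaction layout here is a made-up stand-in (a fixed "template" of version bytes and script structure plus unique random bytes), but it shows why one compressed blob beats compressing each small tx on its own: the compressor can exploit redundancy *between* transactions.

```python
import os
import zlib

# Hypothetical stand-in for small serialized transactions: a shared
# structural template plus per-tx random bytes (hashes, values).
TEMPLATE = bytes(range(100))  # shared structure across txs (assumption)

def make_tx() -> bytes:
    return b"\x01\x00\x00\x00" + os.urandom(36) + TEMPLATE + os.urandom(8)

txs = [make_tx() for _ in range(10)]

# Compressing each small tx individually: little or no gain.
individual = sum(len(zlib.compress(tx)) for tx in txs)

# Bundling into one blob first, then compressing once.
blob = b"".join(txs)
bundled = len(zlib.compress(blob))

print(f"raw: {len(blob)}  individual: {individual}  bundled: {bundled}")
```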

The best one I've seen so far was a case where 175 transactions were bundled into one blob before being compressed.  That yielded a 20% compression ratio, but it doesn't take into account the savings from the 174 message headers (24 bytes each) and 174 TCP ACKs (52 bytes each) that are no longer needed, which adds an additional 76*174 = 13224 bytes, making the overall bandwidth savings 32% in this particular case.

2015-11-18 01:09:09.002061 compressed blob from 79890 to 67426 txcount:175

To be sure, this was an extreme example.  Most transaction blobs were in the 2 to 10 transaction range, such as the following:

2015-11-17 21:08:28.469313 compressed blob from 3199 to 2876 txcount:10

But even here the savings are 10%, far better than the nothing we would get without bundling; add to that the 76 bytes * 9 transactions saved in headers and ACKs, and we have a total 20% savings in bandwidth for transactions that otherwise would not be compressible.
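The arithmetic for that 10-tx log line works out as below, using the per-message overhead figure from this email (24-byte message header + 52-byte TCP ACK = 76 bytes per tx that no longer needs its own message).

```python
# Figures taken from the log line above: a 10-tx blob, 3199 -> 2876 bytes.
raw_bytes = 3199
compressed_bytes = 2876
txcount = 10

compression_savings = raw_bytes - compressed_bytes   # bytes saved by compression
overhead_savings = 76 * (txcount - 1)                # 9 headers + ACKs avoided
total_savings = compression_savings + overhead_savings

print(compression_savings, overhead_savings, total_savings)
```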

The same bundling was applied to blocks and very good compression ratios are seen when sync'ing the blockchain.

Overall, the bundling or blobbing of tx's and blocks seems to be a good idea for improving bandwidth use, but there is also a scalability factor here: when the system is busy, transactions are bundled more often, compressed, and sent faster, keeping the message queue and network chatter to a minimum.

I think I have enough information to put together a formal BIP, with the exception of which compression library to implement.  These tests were done using zlib, but I'll also be running tests in the coming days with LZO (Jeff Garzik's suggestion) and perhaps Snappy.  If there are any other libraries that people would like me to get results for, please let me know; I'll pick maybe the top 2 or 3 and get results back to the group.
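A rough benchmarking harness for that comparison might look like the sketch below.  LZO and Snappy need third-party bindings, so this only sweeps zlib levels on a synthetic payload; the measurement loop (ratio plus wall-clock time) would be the same for any candidate library.

```python
import os
import time
import zlib

# Synthetic block-like payload: structured bytes with some randomness.
payload = (b"block-like structured data " + os.urandom(16)) * 2000

for level in (1, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    ratio = (1 - len(out) / len(payload)) * 100
    print(f"zlib level {level}: {ratio:5.1f}% smaller in {elapsed * 1000:.1f} ms")
```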



On 13/11/2015 1:58 PM, Peter Tschipper wrote:
Some further Block Compression test results that compare performance when network latency is added to the mix.

Running two nodes, Windows 7, compressionlevel=6, syncing the first 200000 blocks from one node to the other.  Running on a high-speed wireless LAN with no connections to the outside world.
Network latency was added by using Netbalancer to induce the 30ms and 60ms latencies.

From the data, not only are bandwidth savings seen but also a small performance savings as well.  However, the overall value in compressing blocks appears to be in terms of saving bandwidth.

I was also surprised to see that there was no real difference in performance when no latency was present; apparently the time it takes to compress is about equal to the performance savings in such a situation.


The following results compare the tests in terms of how long it takes to sync the blockchain, compressed vs uncompressed and with varying latencies.
uncmp = uncompressed
cmp = compressed

num blocks   uncmp    cmp      uncmp 30ms   cmp 30ms   uncmp 60ms   cmp 60ms
sync'd       (secs)   (secs)   (secs)       (secs)     (secs)       (secs)
10000        264      269      265          257        274          275
20000        482      492      479          467        499          497
30000        703      717      693          676        724          724
40000        918      939      902          886        947          944
50000        1140     1157     1114         1094       1171         1167
60000        1362     1380     1329         1310       1400         1395
70000        1583     1597     1547         1526       1637         1627
80000        1810     1817     1767         1745       1872         1862
90000        2031     2036     1985         1958       2109         2098
100000       2257     2260     2223         2184       2385         2355
110000       2553     2486     2478         2422       2755         2696
120000       2800     2724     2849         2771       3345         3254
130000       3078     2994     3356         3257       4125         4006
140000       3442     3365     3979         3870       5032         4904
150000       3803     3729     4586         4464       5928         5797
160000       4148     4075     5168         5034       6801         6661
170000       4509     4479     5768         5619       7711         7557
180000       4947     4924     6389         6227       8653         8479
190000       5858     5855     7302         7107       9768         9566
200000       6980     6969     8469         8220       10944        10724
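One way to read the table above is to compute the relative sync-time savings for the full 200000-block sync at each latency, using the last row's figures; the benefit grows as latency increases, and is near zero with no added latency.

```python
# Sync-time savings at 200000 blocks, taken from the last row of the table.
# Keys are the induced latencies; values are (uncompressed, compressed) secs.
results = {"0ms": (6980, 6969), "30ms": (8469, 8220), "60ms": (10944, 10724)}

for latency, (uncompressed, compressed) in results.items():
    saved = uncompressed - compressed
    print(f"{latency}: {saved} s saved ({saved / uncompressed:.1%})")
```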

