Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 51851892 for ; Thu, 3 Dec 2015 05:52:23 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-oi0-f51.google.com (mail-oi0-f51.google.com [209.85.218.51]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id A6715138 for ; Thu, 3 Dec 2015 05:52:21 +0000 (UTC) Received: by oiww189 with SMTP id w189so40494636oiw.3 for ; Wed, 02 Dec 2015 21:52:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=38PJrchmHkJEfxxU0pRcJWpO9RHYFg3FfTomDkBMdcI=; b=mRekg8UQKaNaps9u4ok3JLNY1esVpa+cwg8qoWw+GvCmpAZxZWQOs3nSN1roIbsDG6 YNKxBMbpxjT/AD+c/fON8UL16/s7hrsYuxwuRtA82mPq23BBAUaRoJ2W5pa1vixcrQQy EzjbeXyFXPqiHC57dXUzTUiS270DqvfqoZMIcaMSrWfs/wlolQ0JoFo9RryRBZ9rq666 jZqI7uHsNosM6+cIFA05yMB/NMfX527NUEzTF6P6xCHx3SiZkSO5ehWKFfbCCUJXEoAm 2WYESF4XIlIb0XKCy+QKjIY7ni82oRYYksoZWhitC3r1HfNmNvebtdKxoWYn4p2IXVmO Kjfw== MIME-Version: 1.0 X-Received: by 10.202.222.193 with SMTP id v184mr5110925oig.15.1449121940929; Wed, 02 Dec 2015 21:52:20 -0800 (PST) Sender: dscotese@gmail.com Received: by 10.60.16.39 with HTTP; Wed, 2 Dec 2015 21:52:20 -0800 (PST) In-Reply-To: <565F7926.103@gmail.com> References: <565CD7D8.3070102@gmail.com> <90EF4E6C-9A71-4A35-A938-EAFC1A24DD24@mattcorallo.com> <565F7926.103@gmail.com> Date: Wed, 2 Dec 2015 21:52:20 -0800 X-Google-Sender-Auth: aQz3HCd0y-gFE4kbjaKaLFVtCac Message-ID: From: Dave Scotese To: Peter Tschipper Content-Type: multipart/alternative; boundary=001a113d54d48683f00525f7fe7d X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Thu, 03 Dec 2015 12:38:35 +0000 Cc: Bitcoin Dev Subject: Re: [bitcoin-dev] [BIP Draft] Datastream compression of Blocks and Transactions X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Development Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2015 05:52:23 -0000 --001a113d54d48683f00525f7fe7d Content-Type: text/plain; charset=UTF-8 Emin's email presents to me the idea of dictionaries that already contain the data we'd want to compress. With 8 bytes of indexing data, we can refer to a TxID or a Public Key or any existing part of the blockchain. There are also data sequences like scripts that contain a few variable chunks and are otherwise identical. Often, the receiver has the blockchain, which contains a lot of the data that is in the message being transmitted. First, the receiver must indicate that compressed data is preferred and the height of latest valid block it holds, and the sender must express the ability to send compressed data. From this state, the sender sends messages that are compressed. Compressed messages are the same as uncompressed messages except that: 1. Data read is copied into the decompressed message until the first occurrence of 0x00, which is discarded and is followed by compressed data. 2. Compressed data can use as a dictionary the first 16,777,215 blocks, or the last 4,244,635,647 ending with the block at the tip of the receiver's chain, or it can specify a run of zero bytes. The sender and receiver must agree on the *receiver's* current block height in order to use the last 4B blocks as the dictionary. 3. Within compressed data, the first byte identifies how to decompress: 1. 0xFF indicates that the following three bytes are a block height with most significant byte 0x00 in network byte order. 2. 0xFE indicates that the following byte indicates how many zero bytes to add to the decompressed data. 3. 0xFD is an error, so compressed messages are turned off and the recipient fails the decompression process. 4. 0x00 indicates that the zero byte by itself should be added to the decompressed data, and the data following is not compressed (return to step 1). 5. All other values represent the most significant byte of a number to be subtracted from the receiver's current block height to identify a block height (not available until there are least 16,777,216 blocks so that this byte can be at least 0x01, since 0x00 would indicate a single zero byte, end compressed data, and return to step 1). 4. If decompression has identified a block height (previous byte was not 0xFD, 0x00, or 0xFE), then the next four bytes identify a *size *(one byte) and a byte index into the block's data (three bytes), and *size *bytes from that block are added to the decompressed data. 5. Steps 3 and 4 process a chunk of compressed data. If the next byte is 0xFD, then decompression goes back to step 1 (add raw bytes until it hits a 0x00). Otherwise, it proceeds through steps 3 (and maybe 4) again. In Step 3.3, 0xFD causes an error, but it could be used to indicate a parameterized dictionary entry, for example 0xFD, 0x01 followed by eight more bytes to be interpreted according to steps 3.1 or 3.5 could mean OP_DUP OP_HASH160 (20 bytes from the blockchain dictionary) OP_EQUALVERIFY OP_CHECKSIG, replacing that very common occurrence of 24 bytes with 10 bytes. Well, 11 if you include the 0x00 required by step5. But that only works on addresses that have spent inputs. Or 0xFD, 0x02 could be shorthand for the four zeroes of lock_time, followed by Version (1), followed by 0x01 (for one-input transactions), turning nine bytes into two for the data at the end of a normal (lock_time = 0) Txn and the beginning of a single-input Txn. But I left 0xFD as an error because those gains didn't seem as frequent as the others. Dave. On Wed, Dec 2, 2015 at 3:05 PM, Peter Tschipper via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > > On 30/11/2015 9:28 PM, Matt Corallo wrote: > > I'm really not a fan of this at all. To start with, adding a compression library that is directly accessible to the network on financial software is a really, really scary idea. > > Why scary? LZO has no current security issues, and it will be > configureable by each node operator so it can be turned off completely if > needed or desired. > > If there were a massive improvement, I'd find it acceptable, but the improvement you've shown really isn't all that much. > > Why is 15% at the low end, to 27% at the high end not good? It sounds > like a very good boost. > > The numbers you recently posted show it improving the very beginning of IBD somewhat over high-latency connections, but if we're throughput-limited after the very beginning of IBD, we should fix that, not compress the blocks. > > I only did the compression up to the 200,000 block to better isolate the > transmission of data from the post processing of blocks and determine > whether the compressing of data was adding to much to the total > transmission time. > > I think it's clear from the data that as the data (blocks, transactions) > increase in size that (1) they compress better and (2) they have a bigger > and positive impact on improving performance when compressed. > > Additionally, I'd be very surprised if this had any significant effect on the speed at which new blocks traverse the network (do you have any simulations or other thoughts on this?). > > From the table below, at 120000 blocks the time to sync the chain was > roughly the same for compressed vs. uncompressed however after that point > as block sizes start increasing, all compression libraries peformed much > faster than uncompressed. The data provided in this testing clearly shows > that as block size increases, the performance improvement by compressing > data also increases. > > TABLE 5: > Results shown in seconds with 60ms of induced latency > Num blks sync'd Uncmp Zlib-1 Zlib-6 LZO1x-1 LZO1x-999 > --------------- ----- ------ ------ ------- --------- > 120000 3226 3416 3397 3266 3302 > 130000 4010 3983 3773 3625 3703 > 140000 4914 4503 4292 4127 4287 > 150000 5806 4928 4719 4529 4821 > 160000 6674 5249 5164 4840 5314 > 170000 7563 5603 5669 5289 6002 > 180000 8477 6054 6268 5858 6638 > 190000 9843 7085 7278 6868 7679 > 200000 11338 8215 8433 8044 8795 > > > As far as, what happens after the block is received, then obviously > compression isn't going to help in post processing and validating the > block, but in the pure transmission of the object it most certainly and > logically does and in a fairly direct proportion to the file size (a file > that is 20% smaller will be transmited "at least" 20% faster, you can use > any data transfer time calculator > for that). > The only issue, that I can see that required testing was to show how much > compression there would be, and how much time the compression of the data > would add to the sending of the data. > > > > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev > > -- I like to provide some work at no charge to prove my value. Do you need a techie? I own Litmocracy and Meme Racing (in alpha). I'm the webmaster for The Voluntaryist which now accepts Bitcoin. I also code for The Dollar Vigilante . "He ought to find it more profitable to play by the rules" - Satoshi Nakamoto --001a113d54d48683f00525f7fe7d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Emin's email presents to me the idea of dictionar= ies that already contain the data we'd want to compress.=C2=A0 With 8 b= ytes of indexing data, we can refer to a TxID or a Public Key or any existi= ng part of the blockchain.=C2=A0 There are also data sequences like scripts= that contain a few variable chunks and are otherwise identical.=C2=A0 Ofte= n, the receiver has the blockchain, which contains a lot of the data that i= s in the message being transmitted.

First, the receiver must indicat= e that compressed data is preferred and the height of latest valid block it= holds, and the sender must express the ability to send compressed data.=C2= =A0 From this state, the sender sends messages that are compressed.=C2=A0 C= ompressed messages are the same as uncompressed messages except that:
  • Data read is copied into the decompressed message until the first occ= urrence of 0x00, which is discarded and is followed by compressed data.
    =
  • Compressed data can use as a dictionary the first 16,777,215 block= s, or the last 4,244,635,647 ending with the block at the tip of the receiv= er's chain, or it can specify a run of zero bytes.=C2=A0 The sender and= receiver must agree on the receiver's current block height in o= rder to use the last 4B blocks as the dictionary.
  • Within compressed= data, the first byte identifies how to decompress:
    1. 0xFF indica= tes that the following three bytes are a block height with most significant= byte 0x00 in network byte order.
    2. 0xFE indicates that the following= byte indicates how many zero bytes to add to the decompressed data.
    3. 0xFD is an error, so compressed messages are turned off and the recipient= fails the decompression process.
    4. 0x00 indicates that the zero byte= by itself should be added to the decompressed data, and the data following= is not compressed (return to step 1).
    5. All other values represent t= he most significant byte of a number to be subtracted from the receiver'= ;s current block height to identify a block height (not available until the= re are least 16,777,216 blocks so that this byte can be at least 0x01, sinc= e 0x00 would indicate a single zero byte, end compressed data, and return t= o step 1).
  • If decompression has identified a block height (pre= vious byte was not 0xFD, 0x00, or 0xFE), then the next four bytes identify = a size (one byte) and a byte index into the block's data (three = bytes), and size bytes from that block are added to the decompressed= data.
  • Steps 3 and 4 process a chunk of compressed data.=C2=A0 If t= he next byte is 0xFD, then decompression goes back to step 1 (add raw bytes= until it hits a 0x00).=C2=A0 Otherwise, it proceeds through steps 3 (and m= aybe 4) again.
  • In Step 3.3, 0xFD causes an error, but it could = be used to indicate a parameterized dictionary entry, for example 0xFD, 0x0= 1 followed by eight more bytes to be interpreted according to steps 3.1 or = 3.5 could mean OP_DUP OP_HASH160 (20 bytes from the blockchain dictionary) OP_EQUALVERIFY OP_CHECKSIG, replacing that very common occurrence of 24 byt= es with 10 bytes.=C2=A0 Well, 11 if you include the 0x00 required by step5.= =C2=A0 But that only works on addresses that have spent inputs.=C2=A0 Or 0x= FD, 0x02 could be shorthand for the four zeroes of lock_time, followed by V= ersion (1), followed by 0x01 (for one-input transactions), turning nine byt= es into two for the data at the end of a normal (lock_time =3D 0) Txn and t= he beginning of a single-input Txn.=C2=A0 But I left 0xFD as an error becau= se those gains didn't seem as frequent as the others.

    Dave.


    On Wed, Dec 2, 2015 at 3:05 PM, Peter Tschipper via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
    =20 =20 =20

    On 30/11/2015 9:28 PM, Matt Corallo wrote:
    I'm really not a fan of this at all. To start with, adding a=
     compression library that is directly accessible to the network on financia=
    l software is a really, really scary idea. 
    Why scary?=C2=A0 LZO has no current security issues, and it will be configureable by each node operator so it can be turned off completely if needed or desired.=C2=A0
    If there were a massive improvement, I'd find it acceptable,=
     but the improvement you've shown really isn't all that much.
    Why is 15% at the low end, to 27% at the high end not good?=C2=A0 It sounds like a very good boost.=C2=A0=C2=A0=C2=A0
     The numbers you recently posted show it improving the very begi=
    nning of IBD somewhat over high-latency connections, but if we're throu=
    ghput-limited after the very beginning of IBD, we should fix that, not comp=
    ress the blocks. 
    I only did the compression up to the 200,000 block to better isolate the transmission of data from the post processing of blocks and determine whether the compressing of data was adding to much to the total transmission time.

    I think it's clear from the data that as the data (blocks, transactions) increase in size that (1) they compress better and (2) they have a bigger and positive impact on improving performance when compressed.

    Additionally, I'd be very surprised if this had any signific=
    ant effect on the speed at which new blocks traverse the network (do you ha=
    ve any simulations or other thoughts on this?).
    
    From the table below, at 120000 blocks the time to sync the chain was roughly the same for compressed vs. uncompressed however after that point as block sizes start increasing, all compression libraries peformed much faster than uncompressed. The data provided in this testing clearly shows that as block size increases, the performance improvement by compressing data also increases.

    TABLE 5:
    Results shown in seconds with 60ms of induced latency
    Num blks sync'd=C2=A0 Uncmp=C2=A0 Zlib-1=C2=A0 Zlib-6=C2=A0 LZO1x-1= =C2=A0 LZO1x-999
    ---------------=C2=A0 -----=C2=A0 ------=C2=A0 ------=C2=A0 -------=C2= =A0 ---------
    120000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 3226= =C2=A0=C2=A0 3416=C2=A0=C2=A0=C2=A0 3397=C2=A0=C2=A0=C2=A0 3266=C2=A0=C2=A0= =C2=A0=C2=A0 3302
    130000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 4010= =C2=A0=C2=A0 3983=C2=A0=C2=A0=C2=A0 3773=C2=A0=C2=A0=C2=A0 3625=C2=A0=C2=A0= =C2=A0=C2=A0 3703
    140000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 4914= =C2=A0=C2=A0 4503=C2=A0=C2=A0=C2=A0 4292=C2=A0=C2=A0=C2=A0 4127=C2=A0=C2=A0= =C2=A0=C2=A0 4287
    150000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 5806= =C2=A0=C2=A0 4928=C2=A0=C2=A0=C2=A0 4719=C2=A0=C2=A0=C2=A0 4529=C2=A0=C2=A0= =C2=A0=C2=A0 4821
    160000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 6674= =C2=A0=C2=A0 5249=C2=A0=C2=A0=C2=A0 5164=C2=A0=C2=A0=C2=A0 4840=C2=A0=C2=A0= =C2=A0=C2=A0 5314
    170000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 7563= =C2=A0=C2=A0 5603=C2=A0=C2=A0=C2=A0 5669=C2=A0=C2=A0=C2=A0 5289=C2=A0=C2=A0= =C2=A0=C2=A0 6002
    180000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 8477= =C2=A0=C2=A0 6054=C2=A0=C2=A0=C2=A0 6268=C2=A0=C2=A0=C2=A0 5858=C2=A0=C2=A0= =C2=A0=C2=A0 6638
    190000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 9843= =C2=A0=C2=A0 7085=C2=A0=C2=A0=C2=A0 7278=C2=A0=C2=A0=C2=A0 6868=C2=A0=C2=A0= =C2=A0=C2=A0 7679
    200000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 1133= 8=C2=A0 8215=C2=A0=C2=A0=C2=A0 8433=C2=A0=C2=A0=C2=A0 8044=C2=A0=C2=A0=C2= =A0=C2=A0 8795


    As far as, what happens after the block is received, then obviously compression isn't going to help in post processing and validating the block, but in the pure transmission of the object it most certainly and logically does and in a fairly direct proportion to the file size (a file that is 20% smaller will be transmited "at least" 20% faster, you can use any data transfer time calculator for that).=C2=A0 The only issue, that I can see that required testing w= as to show how much compression there would be, and how much time the compression of the data would add to the sending of the data.

    =C2=A0

    _______________________________________________
    bitcoin-dev mailing list
    bitcoin-dev@lists.= linuxfoundation.org
    https://lists.linuxfoundation.org/mail= man/listinfo/bitcoin-dev




    --
    I like to provide some work at no charge to pr= ove my value. Do you need a techie?=C2=A0
    I own Litmocracy and Meme Racing (in alpha).
    I'm th= e webmaster for T= he Voluntaryist which now accepts Bitcoin.
    I also code for The Dollar Vigilante= .
    "He ought to find it more profitable to play by the rules" -= Satoshi Nakamoto
    --001a113d54d48683f00525f7fe7d--