Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 34BF0EF1 for ; Thu, 24 May 2018 02:43:48 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-wr0-f173.google.com (mail-wr0-f173.google.com [209.85.128.173]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id DCBFF6C1 for ; Thu, 24 May 2018 02:43:46 +0000 (UTC) Received: by mail-wr0-f173.google.com with SMTP id 94-v6so190432wrf.5 for ; Wed, 23 May 2018 19:43:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chia-net.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=JwmcOLbYxKhg8sBB+4oHIiR92bk8mO56q6/dwe8Gwrk=; b=HcuwbZAXx58vatQSBcdN+0miqBb86mBGMl98XuoBsH9R5/Wo2710FE9IZ25rTO6fpJ pO/M/UZNSIrVuL8LrhVjG/ZMihuG+wMWWjFIrIiM93a5aRg+pPLPu+igiZesaGaM8Gah E6q0DIT9aqZC1dbKNg5Iv8MYJxJZChSeqe+J8cTAwgC4q3B19mvJE7AfuIk6SeCVxAl6 UnT1Qi1knLTJzTPnI9ZdddbUXbl+XOohU4dAp1vBWgN7bqO4y6dtTRhP6niH9BycL7JI yWakJLWEQp+aIb0b+uvxIAExA2MsSEFAU1xuwHtLC9b29Fyci33SAOAn4Vudw3oISSv5 t0VA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=JwmcOLbYxKhg8sBB+4oHIiR92bk8mO56q6/dwe8Gwrk=; b=M1xyn7uGLKcV7RK5OSMfFn6GxQV2rEyXYkmDKV5rqa0J9w04L3tN9+iincr22udgAS Zkb2fSmwMUcQDr8r8tBGduzF9b36Ix0kdYAJniTf1D/nJY8dAQG5KgOHMtzUXMhTm7Ws 32SEftTSTtEvmP0BZr9KjVFfk26L6ptxIi+/JMpxAdl7XAgjiF4aJT9jG6llSmQ0RShP fFdWmNavQA+Mrikc5c6qjidPX7fcKDud53S2gZUec5e5Hs1nhh3ZVXE9LluwkcsdzdBq JXmGDUanKS8gREhu9yBEPdHpcmIIuwfVQKiC/SuJ+vgWRyRxNShocvJ7vunvQ8LfFrgV sroA== X-Gm-Message-State: ALKqPwepfjbUkjpGND1+mg+ej6qXmOP3zHoEF1WO4U0ZLjVmEdKERC/F 1CKGIaEc9GcJV0KmOt/XoAPlimfFO50NH7v225Y2Dw== X-Google-Smtp-Source: AB8JxZqZieZ/aSTiWhYcUjWY1cHv2fkP76EC0S6Jyi4MH5dwoxFq8xUBETVp6roo8/zT62OPWo/GMhDsCHLRRJP/24o= X-Received: by 2002:adf:94a5:: with SMTP id 34-v6mr4961797wrr.43.1527129825413; Wed, 23 May 2018 19:43:45 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a1c:9104:0:0:0:0:0 with HTTP; Wed, 23 May 2018 19:43:44 -0700 (PDT) X-Originating-IP: [65.200.105.218] In-Reply-To: References: From: Bram Cohen Date: Wed, 23 May 2018 19:43:44 -0700 Message-ID: To: Jim Posen , Bitcoin Protocol Discussion Content-Type: multipart/alternative; boundary="000000000000c51fc8056cea9ecd" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, HTML_MESSAGE, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org X-Mailman-Approved-At: Thu, 24 May 2018 02:45:46 +0000 Subject: Re: [bitcoin-dev] TXO bitfield size graphs X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 May 2018 02:43:48 -0000 --000000000000c51fc8056cea9ecd Content-Type: text/plain; charset="UTF-8" You compressed something which is truly natively a bitfield using regular compression algorithms? That is expected to get horrible results. Much better would be something which handles it natively, say doing run length encoding on the number of repeated bits and compressing that using elias omega encoding. That is suboptimal in a few ways but has the advantage of working well both on things which are mostly zeros or mostly ones, and only performs badly on truly random bits. It isn't super clear how relevant this information is. The TXO bitfield is fairly small to begin with, and to compress the data in real time would require a special data structure which gets worse compression than straight compressing the whole thing and has slower lookups than an uncompressed version. Writing such a thing sounds like an interesting project though. On Wed, May 23, 2018 at 4:48 PM, Jim Posen via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > I decided to look into the metrics around compression ratios of TXO > bitfields, as proposed by Bram Cohen [1]. I'm specifically interested in > the feasibility of committing to them with block headers. In combination > with block commitments to TXOs themselves, this would enable UTXO > inclusion/exclusion proofs for light clients. > > First, looking just at proofs of inclusion in the UTXO set, each block > needs what Bram calls a "proof of position." Concretely, one such > construction is a Merkle root over all of the block's newly created coins, > including their output data (scriptPubKey + amount), the outpoint (txid + > index), and an absolute index of the output in the entire blockchain. A > Merkle branch in this tree constitutes a proof of position. Alternatively, > the "position", rather than being an absolute index in the chain, could be > a block hash plus an output index within the block. > > Let's say we use the absolute index in the chain as position. A TXO > spentness bitfield can be constructed for the entire chain, which is added > to when new coins are created and modified when they are spent. In order to > compactly prove spentness in this bitfield to a client, one could chunk up > the bitfield and construct a Merkle Mountain Range [2] over the chunks. > Instead of building an MMR over outputs themselves, as proposed by Peter > Todd [3], an MMR constructed over bitfield chunks grows far slower, by a > large constant factor. Slower growth means faster updates. > > So there's the question of how much these bitfields can be compressed. We > expect some decent level because patterns of spending coins are very > non-random. > > The top graph in the attached figure shows the compression ratios possible > on a TXO bitfield split into 4 KiB chunks, using gzip (level=9) and lz4. > Data was collected at block height 523,303. You can see that the > compression ratio is much lower for older chunks and is worse for more > recent blocks. Over the entire history, gzip achieves 34.4%, lz4 54.8%, > and bz2 37.6%. I'm kind of surprised that the ratios are not lower with > off-the-shelf algorithms. And that gzip performs better than bz2 (it seems > to be a factor of the chunk size?). > > Alternatively, we can look at bitfields stored separately by block, which > is more compatible with constructions where an output's position is its > block hash plus relative index. The per-block bitfield sizes are shown in > the bottom graph. The compression ratios overall are 50% for gzip, 70% for > lz4, and 61.5% for bz2. > > [1] https://lists.linuxfoundation.org/pipermail/ > bitcoin-dev/2017-March/013928.html > [2] https://github.com/opentimestamps/opentimestamps- > server/blob/master/doc/merkle-mountain-range.md > [3] https://petertodd.org/2016/delayed-txo-commitments > > > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev > > --000000000000c51fc8056cea9ecd Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
You compressed something which is truly natively a bitfiel= d using regular compression algorithms? That is expected to get horrible re= sults. Much better would be something which handles it natively, say doing = run length encoding on the number of repeated bits and compressing that usi= ng elias omega encoding. That is suboptimal in a few ways but has the advan= tage of working well both on things which are mostly zeros or mostly ones, = and only performs badly on truly random bits.

It isn'= ;t super clear how relevant this information is. The TXO bitfield is fairly= small to begin with, and to compress the data in real time would require a= special data structure which gets worse compression than straight compress= ing the whole thing and has slower lookups than an uncompressed version. Wr= iting such a thing sounds like an interesting project though.

On Wed, May 23, 201= 8 at 4:48 PM, Jim Posen via bitcoin-dev <bitcoin-dev@l= ists.linuxfoundation.org> wrote:
I decided to look into the metrics around compressio= n ratios of TXO bitfields, as proposed by Bram Cohen [1]. I'm specifica= lly interested in the feasibility of committing to them with block headers.= In combination with block commitments to TXOs themselves, this would enabl= e UTXO inclusion/exclusion proofs for light clients.

Fir= st, looking just at proofs of inclusion in the UTXO set, each block needs w= hat Bram calls a "proof of position." Concretely, one such constr= uction is a Merkle root over all of the block's newly created coins, in= cluding their output data (scriptPubKey + amount), the outpoint (txid=C2=A0= + index), and an absolute index of the output in the entire blockchain. A M= erkle branch in this tree constitutes a proof of position. Alternatively, t= he "position", rather than being an absolute index in the chain, = could be a block hash plus an output index within the block.

=
Let's say we use the absolute index in the chain as position= . A TXO spentness bitfield can be constructed for the entire chain, which i= s added to when new coins are created and modified when they are spent. In = order to compactly prove spentness in this bitfield to a client, one could = chunk up the bitfield and construct a Merkle Mountain Range [2] over the ch= unks. Instead of building an MMR over outputs themselves, as proposed by Pe= ter Todd [3], an MMR constructed over bitfield chunks grows far slower, by = a large constant factor. Slower growth means faster updates.

=
So there's the question of how much these bitfields can be c= ompressed. We expect some decent level because patterns of spending coins a= re very non-random.

The top graph in the attached = figure shows the compression ratios possible on a TXO bitfield split into 4= KiB chunks, using gzip (level=3D9) and lz4. Data was collected at block he= ight 523,303. You can see that the compression ratio is much lower for olde= r chunks and is worse for more recent blocks.=C2=A0Over = the entire history, gzip achieves 34.4%, lz4 54.8%, and bz2 37.6%.= =C2=A0I'm kind of surprised that the ratios are not lower with off-the-= shelf algorithms. And that gzip performs better than bz2 (it seems to be a = factor of the chunk size?).

Alternatively, we can = look at bitfields stored separately by block, which is more compatible with= constructions where an output's position is its block hash plus relati= ve index. The per-block bitfield sizes are shown in the bottom graph. The c= ompression ratios overall are 50% for gzip, 70% for lz4, and 61.5% for bz2.=

_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.= linuxfoundation.org
https://lists.linuxfoundation.org= /mailman/listinfo/bitcoin-dev


--000000000000c51fc8056cea9ecd--