Return-Path: <jim.posen@gmail.com>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id 67E34D8C
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 24 May 2018 04:02:30 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-qk0-f173.google.com (mail-qk0-f173.google.com
	[209.85.220.173])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 78FD66C1
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Thu, 24 May 2018 04:02:29 +0000 (UTC)
Received: by mail-qk0-f173.google.com with SMTP id k86-v6so196041qkh.13
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Wed, 23 May 2018 21:02:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
	h=mime-version:references:in-reply-to:from:date:message-id:subject:to
	:cc; bh=xOyP3FFOOeOnX4Ac1ktmpbc3peAOK895F4unFOW8Su8=;
	b=s8lox/j1Z1x/nbgquvDUhk8LNGCpfMW3ISEup9SE8chGIOpbT0WTqDapBDfNvHznX4
	hSWhtEpmVmhA1TPqjsPFjkATFuL3QJlDDy4X6eRIbT38hJbbZOJKWKMgessy0/SXnOL1
	iwEyofpAyCiy6M1QEJwJmDICIdIbUeY74UfxKun53U05+tw2aSKBxyC12vmlCf/9BVx+
	8W/759cuqdaknda9vMTzmbCsVqdcLHorY3AE9w6ZKnroUFpTz1nXJStHbfhxjkmMrlWG
	SkI83+9KFOOLYA2hF2sWcB1dbgqqnx312paKIHBpxYJ0V9NEjMopD11UmcAd/7eIUSVs
	BRRQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:references:in-reply-to:from:date
	:message-id:subject:to:cc;
	bh=xOyP3FFOOeOnX4Ac1ktmpbc3peAOK895F4unFOW8Su8=;
	b=NJvQMhZLFTXEN/6W6wxWSG9IxafsKVaMduZGjfEFD1MTfkhDN4H6ddqVKNJ3tvr/nY
	KrxGRyPzA+jmJNyBBJGs1vtSfQQ1tq0AIeUKE0YPi23pOm5hko/Nnri5HnucDT3R4AuW
	KyXDT3JeVPHI2alGdlJFW29S7Uicha55gDThthHYreVcvXNbp2lznsH5L+HQ/WBNkFQS
	luRvDCstSoOR6DbumtOnIVNyAb1y3pr7GS5k380xmHqfIrio8TjdN/u6CXiRyHmGnYjx
	CVGc40/xMmsuN+bKvFJvKAJ8PeebYKIe7KTQsYf3ROC4TNDFhTL7oFdS7KSV/x5aNoUV
	rW0Q==
X-Gm-Message-State: ALKqPweD2FQZ92RcWzeeAHNMfkE5lBv7G/00l3ryocrZ1Oh3Z1oooH+f
	c8gV4gHlLrVUqEmWMTFZbu5xi+HS5ab9cnc8O1l9FQ==
X-Google-Smtp-Source: ADUXVKLZ/ktdT/C1GFpNRCbx1nMHwSRtY6NaTC9bXn0cz5IF+Wp9Btrb/A+QwwUuKDhQ3Uyr+l8WPwuj30uue8UZ5JU=
X-Received: by 2002:a37:82c6:: with SMTP id
	e189-v6mr4839474qkd.322.1527134548456; 
	Wed, 23 May 2018 21:02:28 -0700 (PDT)
MIME-Version: 1.0
References: <CADZtCSiRxZUrSJeD0y6uBCuc+rg7knwKqA_4rw5BLVMMVxHLww@mail.gmail.com>
	<CAHUJnBC=Af2t-48n1MFMwRq945GRZGjzc4ZA2NO2JEB3xOQUtg@mail.gmail.com>
In-Reply-To: <CAHUJnBC=Af2t-48n1MFMwRq945GRZGjzc4ZA2NO2JEB3xOQUtg@mail.gmail.com>
From: Jim Posen <jim.posen@gmail.com>
Date: Wed, 23 May 2018 21:02:17 -0700
Message-ID: <CADZtCSh3uB-fQhKMbRwJ_AprTTK9+v-i3NHGMvdY0y5VoBQQCg@mail.gmail.com>
To: bram@chia.net
Content-Type: multipart/alternative; boundary="00000000000048fa29056cebb8c9"
X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, HTML_MESSAGE,
	RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Thu, 24 May 2018 04:03:03 +0000
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] TXO bitfield size graphs
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Thu, 24 May 2018 04:02:30 -0000

--00000000000048fa29056cebb8c9
Content-Type: text/plain; charset="UTF-8"

Yes, certainly an RLE-style compression would work better in this instance,
but I wanted to see how well standard compression algorithms would work
without doing something custom. If there are other standard compression
schemes better suited to this, please let me know.
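
For concreteness, here is a minimal sketch of the run-length plus Elias
omega idea from the quoted message below. It is not what produced my
numbers. The bit-string representation, the function names, and the
one-bit header that records the first bit's value are all assumptions
made for illustration.

```python
from itertools import groupby


def elias_omega_encode(n: int) -> str:
    """Elias omega code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias omega coding is defined for n >= 1")
    code = "0"  # terminating zero
    while n > 1:
        group = bin(n)[2:]      # binary representation of n
        code = group + code     # prepend the group
        n = len(group) - 1      # next, encode (length of group - 1)
    return code


def elias_omega_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one integer starting at pos; return (value, next position)."""
    n = 1
    while bits[pos] == "1":
        length = n + 1
        n = int(bits[pos:pos + length], 2)
        pos += length
    return n, pos + 1           # skip the terminating zero


def compress_bits(bits: str) -> str:
    """Hypothetical format: first bit value, then omega-coded run lengths."""
    if not bits:
        return ""
    runs = [len(list(group)) for _, group in groupby(bits)]
    return bits[0] + "".join(elias_omega_encode(r) for r in runs)


def decompress_bits(code: str) -> str:
    """Inverse of compress_bits: runs alternate between the two bit values."""
    if not code:
        return ""
    bit, pos, out = code[0], 1, []
    while pos < len(code):
        run, pos = elias_omega_decode(code, pos)
        out.append(bit * run)
        bit = "1" if bit == "0" else "0"
    return "".join(out)


# Made-up example: a long spent/unspent run pattern, not real chain data.
sample = "0" * 1000 + "1" * 3 + "0" * 500
packed = compress_bits(sample)
assert decompress_bits(packed) == sample
print(len(sample), "bits ->", len(packed), "bits")
```

Long runs cost only a handful of bits each, whether they are runs of
zeros or ones. Short runs cost more than the bits they replace, which is
why this scheme does poorly on random-looking input.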

As for relevance, I'll clarify that the intention is to compress the
bitfields when sending proofs of spentness/unspentness to light clients,
where bandwidth is a concern. As you note, the bitfields are small enough
that it's probably not necessary to store the compressed versions on full
nodes, though lz4 is fast enough that it may be worthwhile to compress
before saving to disk.
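
To illustrate the kind of per-chunk measurement quoted below, here is a
rough sketch using only the Python standard library. It is not the
script that produced the figures. The bitfield is synthetic (a made-up
spend probability that falls over time), all names are hypothetical, and
lz4 is left out because it needs a third-party package.

```python
import bz2
import gzip
import random

CHUNK_SIZE = 4096  # 4 KiB chunks, matching the chunk size quoted below

# Off-the-shelf compressors from the standard library, at maximum level.
COMPRESSORS = {
    "gzip": lambda data: gzip.compress(data, compresslevel=9),
    "bz2": lambda data: bz2.compress(data, compresslevel=9),
}


def synthetic_bitfield(n_bytes: int, seed: int = 0) -> bytes:
    """Made-up stand-in for a TXO bitfield (NOT real chain data).

    Assumption for illustration: early outputs are almost all spent
    (bit = 1) and later ones are roughly half spent.
    """
    rng = random.Random(seed)
    out = bytearray()
    for i in range(n_bytes):
        p_spent = 0.99 - 0.5 * i / n_bytes  # hypothetical spend probability
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | (rng.random() < p_spent)
        out.append(byte)
    return bytes(out)


def chunk_ratios(bitfield: bytes, compress) -> list[float]:
    """Compressed size / original size for each chunk (lower is better)."""
    ratios = []
    for offset in range(0, len(bitfield), CHUNK_SIZE):
        chunk = bitfield[offset:offset + CHUNK_SIZE]
        ratios.append(len(compress(chunk)) / len(chunk))
    return ratios


def overall_ratio(bitfield: bytes, compress) -> float:
    """Total compressed size of all chunks over the total original size."""
    total = sum(
        len(compress(bitfield[offset:offset + CHUNK_SIZE]))
        for offset in range(0, len(bitfield), CHUNK_SIZE)
    )
    return total / len(bitfield)


bitfield = synthetic_bitfield(16 * CHUNK_SIZE)
for name, compress in COMPRESSORS.items():
    ratios = chunk_ratios(bitfield, compress)
    print(f"{name}: oldest chunk {ratios[0]:.1%}, newest chunk "
          f"{ratios[-1]:.1%}, overall {overall_ratio(bitfield, compress):.1%}")
```

Each chunk is compressed independently here, so every chunk pays the
compressor's fixed header overhead, which matters at 4 KiB.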

On Wed, May 23, 2018 at 7:43 PM Bram Cohen <bram@chia.net> wrote:

> You compressed something which is truly natively a bitfield using regular
> compression algorithms? That is expected to get horrible results. Much
> better would be something which handles it natively, say doing run length
> encoding on the number of repeated bits and compressing that using Elias
> omega encoding. That is suboptimal in a few ways but has the advantage of
> working well both on things which are mostly zeros or mostly ones, and only
> performs badly on truly random bits.
>
> It isn't super clear how relevant this information is. The TXO bitfield is
> fairly small to begin with, and to compress the data in real time would
> require a special data structure which gets worse compression than straight
> compressing the whole thing and has slower lookups than an uncompressed
> version. Writing such a thing sounds like an interesting project though.
>
> On Wed, May 23, 2018 at 4:48 PM, Jim Posen via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> I decided to look into the metrics around compression ratios of TXO
>> bitfields, as proposed by Bram Cohen [1]. I'm specifically interested in
>> the feasibility of committing to them with block headers. In combination
>> with block commitments to TXOs themselves, this would enable UTXO
>> inclusion/exclusion proofs for light clients.
>>
>> First, looking just at proofs of inclusion in the UTXO set, each block
>> needs what Bram calls a "proof of position." Concretely, one such
>> construction is a Merkle root over all of the block's newly created coins,
>> including their output data (scriptPubKey + amount), the outpoint (txid +
>> index), and an absolute index of the output in the entire blockchain. A
>> Merkle branch in this tree constitutes a proof of position. Alternatively,
>> the "position", rather than being an absolute index in the chain, could be
>> a block hash plus an output index within the block.
>>
>> Let's say we use the absolute index in the chain as position. A TXO
>> spentness bitfield can be constructed for the entire chain, which is added
>> to when new coins are created and modified when they are spent. In order to
>> compactly prove spentness in this bitfield to a client, one could chunk up
>> the bitfield and construct a Merkle Mountain Range [2] over the chunks.
>> Instead of building an MMR over outputs themselves, as proposed by Peter
>> Todd [3], an MMR constructed over bitfield chunks grows far slower, by a
>> large constant factor. Slower growth means faster updates.
>>
>> So there's the question of how much these bitfields can be compressed. We
>> expect a decent level of compression because coin-spending patterns are
>> highly non-random.
>>
>> The top graph in the attached figure shows the compression ratios
>> achieved on a TXO bitfield split into 4 KiB chunks, using gzip (level=9)
>> and lz4. Data was collected at block height 523,303. You can see that the
>> compression ratio (compressed size relative to original, so lower is
>> better) is much lower for older chunks and worse for more recent ones.
>> Over the entire history, gzip achieves 34.4%, lz4 54.8%, and bz2 37.6%.
>> I'm kind of surprised that the ratios are not lower with off-the-shelf
>> algorithms, and that gzip performs better than bz2 (this seems to be a
>> function of the chunk size?).
>>
>> Alternatively, we can look at bitfields stored separately by block, which
>> is more compatible with constructions where an output's position is its
>> block hash plus relative index. The per-block bitfield sizes are shown in
>> the bottom graph. The compression ratios overall are 50% for gzip, 70% for
>> lz4, and 61.5% for bz2.
>>
>> [1]
>> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-March/013928.html
>> [2]
>> https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md
>> [3] https://petertodd.org/2016/delayed-txo-commitments
>>
>>
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>>
>

--00000000000048fa29056cebb8c9--