From: Kaz Wesley <keziahw@gmail.com>
Date: Thu, 17 Jul 2014 14:35:35 -0700
Message-ID: <CA+iPb=EaX=bvOjNtZ+LnYTMRLQQ9nFcrefAkBdv8eActoX_b8A@mail.gmail.com>
To: bitcoin-development@lists.sourceforge.net
Subject: [Bitcoin-development] Squashing redundant tx data in blocks on the wire

OVERVIEW

To improve block propagation, add a new block message that doesn't include transactions the peer is known to have. The message must never require an additional round trip due to any transactions the peer doesn't have, but it should be compatible with peers sometimes forgetting transactions they have known.

APPROACH

For peers advertising support for squashed blocks: a node tracks which txes it knows each peer has seen (inv received, tx sent, tx appeared in a competing block known to the peer). Nodes push block contents as txes-not-already-known + txids-known.
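
As a rough sketch of the sender side (types and names here are hypothetical, not Bitcoin Core's; assume TxId is the 32-byte hash), the block is split per-peer into bare txids and full txes, preserving position so the receiver can rebuild the block:

    #include <array>
    #include <cstdint>
    #include <set>
    #include <variant>
    #include <vector>

    using TxId = std::array<std::uint8_t, 32>;

    // Hypothetical transaction type; only the id matters for this sketch.
    struct Tx {
        TxId id;
        std::vector<std::uint8_t> payload;
    };

    // One slot per block position: either a bare txid (peer known to
    // have the tx) or the full transaction. Keeping positions lets the
    // receiver rebuild the block and check the merkle root.
    using Slot = std::variant<TxId, Tx>;

    // Split a block's transactions for one peer, given the txids we
    // believe that peer has seen (invs received, txes sent, etc.).
    std::vector<Slot> SquashFor(const std::vector<Tx>& blockTxes,
                                const std::set<TxId>& peerKnows) {
        std::vector<Slot> out;
        out.reserve(blockTxes.size());
        for (const Tx& tx : blockTxes) {
            if (peerKnows.count(tx.id))
                out.emplace_back(tx.id);  // id only: peer fills this in
            else
                out.emplace_back(tx);     // full tx: no round trip needed
        }
        return out;
    }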

A node should be able to forget invs it has seen without invalidating what peers know about its known txes. To allow for this, a node assembles a bloom filter of the set of txes it is about to forget and sends it to peers. The node can erase those txes as soon as no blocks requested before the filter was pushed are still in flight (relying on the assumption that messages can be expected to be processed in order).

When a node receives a forgotten-filter, it ORs it into its forgotten-filter for that peer. Any transactions matching the forgotten-filter are always included in full with a block. If the filter is getting full, the node can just clear it along with peer.setTxKnown.
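
As a concrete sketch of the filter itself (size and probe count are my assumptions, not part of the proposal), txid slices can stand in for the hash functions, which also bears on the hash-optimization question under IMPLEMENTATION QUESTIONS below:

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    using TxId = std::array<std::uint8_t, 32>;

    // Per-peer forgotten-filter. Since txids are already uniformly
    // distributed hashes, four 4-byte slices of the txid serve directly
    // as the four probe indices -- testing is a few loads and masks.
    class TxIdBloom {
        static constexpr std::uint32_t kBits = 1u << 16;  // 8 KiB assumed
        std::vector<std::uint8_t> bits_ =
            std::vector<std::uint8_t>(kBits / 8, 0);

        static std::uint32_t BitIndex(const TxId& id, int i) {
            std::uint32_t v;
            std::memcpy(&v, id.data() + 4 * i, 4);  // i-th 4-byte slice
            return v % kBits;
        }

    public:
        void Insert(const TxId& id) {
            for (int i = 0; i < 4; ++i) {
                std::uint32_t b = BitIndex(id, i);
                bits_[b / 8] |= std::uint8_t(1u << (b % 8));
            }
        }

        bool Contains(const TxId& id) const {
            for (int i = 0; i < 4; ++i) {
                std::uint32_t b = BitIndex(id, i);
                if (!(bits_[b / 8] & (1u << (b % 8)))) return false;
            }
            return true;
        }

        // OR in a newly received forgotten-filter from the same peer
        // (identical parameters assumed on both sides).
        void Merge(const TxIdBloom& other) {
            for (std::size_t i = 0; i < bits_.size(); ++i)
                bits_[i] |= other.bits_[i];
        }
    };

On receiving a forgotten-filter the node calls Merge on the filter stored for that peer; when squashing a block, any tx whose id Contains() matches is treated as unknown and sent in full.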

COSTS

Bloom filtering:
Since the bloom filter is likely to grow slowly and can be dropped when it is becoming full, a cheap set of hash functions and element size can be used to keep overhead more restricted than the bloom filtering done for SPV. It's important for testing txes against the filter to be fast, so that it doesn't delay pushing the block by more than the squashing helps.
Nodes currently forget txes rarely, so the bloom filters would only need to be used at all under conditions that are not currently common -- but I think they're important to include, to allow for different node behavior in this regard in the future.
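
To put rough numbers on that overhead (same assumed parameters as the sketch above), the standard false-positive estimate suggests the per-peer cost stays small (about 0.5% FPR with 5,000 forgotten txes), and a false positive is cheap anyway -- it just means one tx the peer already had gets sent in full:

    #include <cmath>
    #include <cstdio>

    // False-positive rate of an m-bit filter with k probes holding n
    // entries: p ~ (1 - e^(-k*n/m))^k. Parameters are illustrative.
    int main() {
        const double m = 1 << 16;  // 64 Kib filter = 8 KiB per peer
        const double k = 4;        // probes per txid
        for (double n : {1000.0, 5000.0, 10000.0}) {
            double p = std::pow(1.0 - std::exp(-k * n / m), k);
            std::printf("%6.0f forgotten txes -> FPR ~ %.3f%%\n",
                        n, 100 * p);
        }
        return 0;
    }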

Tracking txes known to peers:
A multimap of txid->peerId would obviate the current setCurrentlyKnown, and would not take much more space, since each additional peer adds about one peerId per txid (setCurrentlyKnown keeps a uint256 per peer per txid, although it tracks somewhat fewer txids per node).
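
A minimal sketch of that structure (names hypothetical; PeerId stands in for the node id type):

    #include <array>
    #include <cstdint>
    #include <map>

    using TxId   = std::array<std::uint8_t, 32>;
    using PeerId = std::int64_t;  // stand-in for the peer identifier

    // One shared structure instead of a per-peer set of full txids:
    // each additional peer that has seen a tx costs one extra PeerId
    // under the same 32-byte key.
    std::multimap<TxId, PeerId> txKnownTo;

    bool PeerHasSeen(const TxId& id, PeerId peer) {
        auto range = txKnownTo.equal_range(id);
        for (auto it = range.first; it != range.second; ++it)
            if (it->second == peer) return true;
        return false;
    }

    void MarkSeen(const TxId& id, PeerId peer) {
        if (!PeerHasSeen(id, peer)) txKnownTo.emplace(id, peer);
    }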

Potential vulnerabilities:
- Since the bloom filters will have lower maximum overhead than the current SPV filters and can be dropped at will, this shouldn't enable any resource exhaustion attacks that aren't already possible.
- A squashed block with bogus or missing data would be easily detected not to produce the correct merkle root for its BlockHeader.
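
For illustration, reassembly on the receiving side might look like the following (hypothetical names again; the merkle check itself is exactly the one done for a block received in full):

    #include <array>
    #include <cstdint>
    #include <map>
    #include <optional>
    #include <variant>
    #include <vector>

    using TxId = std::array<std::uint8_t, 32>;
    struct Tx { TxId id; std::vector<std::uint8_t> payload; };
    using Slot = std::variant<TxId, Tx>;

    // Rebuild the full tx list, looking bare txids up in our pool of
    // known txes. Under the proposal a miss should never happen (the
    // sender only squashes txes it knows we have); if it does, the
    // block simply fails -- it is never resolved with a round trip.
    std::optional<std::vector<Tx>> Reassemble(
        const std::vector<Slot>& slots,
        const std::map<TxId, Tx>& knownTxes) {
        std::vector<Tx> txes;
        txes.reserve(slots.size());
        for (const Slot& s : slots) {
            if (const Tx* tx = std::get_if<Tx>(&s)) {
                txes.push_back(*tx);
            } else {
                auto it = knownTxes.find(std::get<TxId>(s));
                if (it == knownTxes.end()) return std::nullopt;
                txes.push_back(it->second);
            }
        }
        // The result is then checked against the header's merkle root
        // as usual; bogus or substituted data cannot match.
        return txes;
    }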

BENEFITS

Assuming a fairly typical 500-tx block with transaction sizes averaging 300 B (both on the low side), for a 150 kB block:

% pruned | block size reduction | relative size reduction
-------- | -------------------- | -----------------------
100      | 134 kB               | 89%
50       | 67 kB                | 45%
25       | 33.5 kB              | 22%
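
The rows follow from trading each pruned 300 B tx for its 32 B txid, e.g. full pruning saves 500 * (300 - 32) = 134 kB; a quick check:

    #include <cstdio>

    // Recompute the table above from the stated size assumptions.
    int main() {
        const double nTx = 500, txB = 300, idB = 32;
        const double blockB = nTx * txB;  // 150 kB
        for (double frac : {1.00, 0.50, 0.25}) {
            double savedB = frac * nTx * (txB - idB);
            std::printf("%3.0f%% pruned: %5.1f kB saved (%4.1f%%)\n",
                        100 * frac, savedB / 1000.0,
                        100 * savedB / blockB);
        }
        return 0;
    }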

I've been doing some logging, and when my node pushes a block to a peer, it typically seems to know that the peer has seen most of the txes in the block. Even in the case of a small block with only 25% known-known transactions, the total network bandwidth saved is greater than the size of the bloom filters transmitted, unless a node is forgetting transactions so rapidly that it pushes new maximum-size forget-filters every block.

So this is a net gain even in total bandwidth usage, but most importantly it's an improvement in block propagation rate, and in how block propagation rate scales with additional transactions.

IMPLEMENTATION QUESTIONS

How should block squashing capability be advertised -- new service bit?

Bloom filters:
- How fast could testing against a suitable bloom filter be made?
- How much memory would each filter need to take, at maximum?
- Can the inputs all being 32-byte hashes be used to optimize filter hash calculations?

ROADMAP

If there's support for this proposal, I can begin working on the specific implementation details, such as the bloom filters, message format, and capability advertisement, and draft a BIP once I have a concrete proposal for what those would look like, along with a corresponding precise cost/benefit analysis.

--kaz
