d4/66ce3dd43db393082ac9e0f48bf3d3818484d9


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340

Return-Path: <eric@voskuil.org>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id B863E8A6
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 18 Nov 2016 16:47:14 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-yb0-f173.google.com (mail-yb0-f173.google.com
	[209.85.213.173])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 3B5811D3
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 18 Nov 2016 16:47:13 +0000 (UTC)
Received: by mail-yb0-f173.google.com with SMTP id v78so81609256ybe.3
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Fri, 18 Nov 2016 08:47:13 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=voskuil-org.20150623.gappssmtp.com; s=20150623;
	h=mime-version:subject:from:in-reply-to:date:cc
	:content-transfer-encoding:message-id:references:to;
	bh=WZH08C05lJGa5s0DlaF1skZNjpaBAR7gXMo5o7gutDU=;
	b=bgVXfhyZoSZsXbukULRBx2LgsGZVc9eoyhk64OjfivxoMoNKmr0YPpsLAQmkgW3FNE
	mKTVFbzXugw6Y2OqFYb7qnMXTxqRq9fm0bad/oIVIVgG82l4Szilrpzm6Y304kMDWo2W
	JYJjVvDq1oamt5UoiHPrcEXP+wpxdBf/Af0HtlNDsVsvh+z9D2vIKglumQ/YDMHqf+Vr
	hX93RXQ2puwEldfN9xvVDPxN80GuixJJwvax3xhfvxyzdnaLfWVqYiCaUPufyMTi22pP
	OD1QCa1h0SU8csKvhcJhIa5daoQK0Td4L/Dv4Vhylmi3Kwj3M90mKaYNXyhJLUcKBaYq
	WmLQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc
	:content-transfer-encoding:message-id:references:to;
	bh=WZH08C05lJGa5s0DlaF1skZNjpaBAR7gXMo5o7gutDU=;
	b=ek+kwM+uqb/I8b5OfskQ6+m2Os07rPql4XNWHV3uQ2SUyGvHuv2EkTGei1eZt/F5yX
	ioVWEf5axvmYoAyhzMyuLpuAfVBjPnAHUi+dUuGhu9acn9go6tj73fg3d4yCnNjoMi1+
	AsyYcgZHbi/4ocFSpYNmADvSviFNGX4DeuufYENYgWcY28WgYYNrWkrPvstsf0brqyw1
	hF08/Nm9qdBRRaZmfKiloOML0xx7fBe5KjSHl5BDHp53AJgUnt48t3V6N8lOEtttxzY+
	F2zS7sXiEGjttknH/Ft5G4ZEC8+3eAk8JD0V/VTgkPReScNTqIPIrnZL0fjRHVyuDf7R
	sM5A==
X-Gm-Message-State: AKaTC00emA6javxiGzakUZ7sVnd1Ly6A3IoW46jf/r1TgAONRHyps+UHerA91KSkfPPyxg==
X-Received: by 10.37.197.201 with SMTP id v192mr341753ybe.33.1479487632068;
	Fri, 18 Nov 2016 08:47:12 -0800 (PST)
Received: from [10.26.17.106] ([166.177.185.204])
	by smtp.gmail.com with ESMTPSA id
	d136sm3138998ywh.50.2016.11.18.08.47.10
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Fri, 18 Nov 2016 08:47:11 -0800 (PST)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (1.0)
From: Eric Voskuil <eric@voskuil.org>
X-Mailer: iPhone Mail (14B100)
In-Reply-To: <169CC80A-3B63-4D58-8E8C-0D1D9489E891@xbt.hk>
Date: Fri, 18 Nov 2016 10:47:09 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <1313CE5A-430F-45B6-A476-9FFA984452C7@voskuil.org>
References: <CABm2gDr2-MCiaFFjgUFP5Xc0fQfuqJ3=ZkrzjHqmOiwRZ50CBw@mail.gmail.com>
	<d58ee114-00fd-23c8-9ca7-9a4b28c26f27@voskuil.org>
	<CAE-z3OX5vak25UWcmBSe63OmoOVoGB394WmwyWwUcSxWeDOLhw@mail.gmail.com>
	<e0e6679f-aec6-a579-667d-b5b58ea2360b@voskuil.org>
	<CAE-z3OXfJa3Lewtrafm25bdfPa=eiarOAXBNbgc3ccTi7Qoe6A@mail.gmail.com>
	<5ef23296-5909-a350-ab11-e717f8fffc41@voskuil.org>
	<CAPWm=eW9X77+qQZGHkAOjN-k7KFwq06gKS6HOVOTE1+SmYBhWA@mail.gmail.com>
	<34949746-c0c9-7f14-0e92-69d5a7d44b04@voskuil.org>
	<FA128F93-7E79-47FE-8270-225D07DD6FEF@xbt.hk>
	<8d92ae05-ac6a-30b7-5ef3-f7aa1298e46d@voskuil.org>
	<632B36D5-74AF-41E2-8E21-359F02645066@xbt.hk>
	<59D27CC6-120C-4673-9F20-6B5E95EA60C6@voskuil.org>
	<6F2B3EA2-4245-4A0E-8E19-12D02A871815@xbt.hk>
	<11B3C69E-5F1B-4D25-86CE-E5F3B603266F@voskuil.org>
	<169CC80A-3B63-4D58-8E8C-0D1D9489E891@xbt.hk>
To: Johnson Lau <jl2012@xbt.hk>
X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,MIME_QP_LONG_LINE,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM
	autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
X-Mailman-Approved-At: Fri, 18 Nov 2016 17:10:45 +0000
Cc: bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] BIP30 and BIP34 interaction (was Re: [BIP
	Proposal] Buried Deployments)
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Nov 2016 16:47:14 -0000

What is the difference between downloading a hash and comparing it to a hash=
 vs downloading a hash and then a block and comparing it to a block?

You are talking about breaking a system in order to make it run faster. Usin=
g the hash is an non-optimization trade against correctness.

There is no "first seen" rule, there is only valid and invalid. Even the nam=
e exposes the error of this thinking as "first" requires order.

Caching invalidity for DOS protection is fine. It should be quite obvious th=
at the blockchain is nothing more than a coach of validity. If it's faster i=
n some cases to store both validity and all invalidity that you are aware of=
 it is fine, you are trading space for time.

But caching information that is neither validity nor invalidity, and using i=
t to validate blocks is a break.

I cannot emphasize this point enough. A technique that provides varied resul=
ts based on communication history, such as this "rule", is an attack vector.=
 It allows the attacker to place information into your cache and read it bac=
k later from another connection. Even optimizing correct results based on co=
mmunication history exposes the node in this manner. These sort of attacks h=
ave been shown to be very effective at deanonymizing hidden nodes.

The p2p protocol actually makes this sort of attack a matter of communicatio=
n standard via the sharing of address information, but this can be disabled w=
ithout impacting correctness. Due to such non-optimizations as the first see=
n "rule" however, a node becomes a candy store of fingerprinting attack vect=
ors.

Bitcoin provides the mechanism to reject cheaply-produced invalid blocks qui=
ckly. This is after all the fundamental principle of hash cash - force the a=
ttacker to pay to spam attack. By obtaining headers first a node can obtain p=
roof of work and perform correct and fast validation before ever obtaining t=
he block's transactions. This technique is probably no more time-costly than=
 the incorrect technique of checking a cache of hashes (ironically, a "hash c=
ache" is an incorrect "hash cash"), and avoids the extra space of a secondar=
y cache (the blockchain is the primary cache). It also avoids the varied tim=
e response that a secondary cache creates.

So once again, premature optimization erupts from the underlying design flaw=
, and creates more problems than proper design. The p2p network standard did=
n't have headers first at one point, making correct checks more costly. That=
 is no longer the case. But nevertheless, one cannot trade correctness for t=
ime.

The tx pool, like the orphan pool, as I mentioned previously, is an optimiza=
tion. It is not a part of consensus, so it isn't relevant to a discussion ab=
out forks. It is also a design flaw that nodes are expected to hold invalid t=
ransactions. It exposes nodes to both DOS and fingerprinting attacks. Proper=
 tx handling implies that a tx connect to a valid block. There is no "header=
" for a transaction so correctness requires that the tx be downloaded before=
 it can be validated.

e

> On Nov 18, 2016, at 8:43 AM, Johnson Lau <jl2012@xbt.hk> wrote:
>=20
> In this case I don=E2=80=99t understand how your implementation won=E2=80=99=
t be DoS-ed. An attacker could keep sending you inv for the same block / tra=
nsaction. Since you don=E2=80=99t assume the hash is unique, each time you h=
ave to download the block/tx again before you could tell if that is the same=
 one you have already known. Otherwise, you are implementing the =E2=80=9Cfi=
rst seen=E2=80=9D rule.
>=20
> Also, you can=E2=80=99t ban a peer just because you get an invalid tx from=
 him, because he might be referring to a hash-colliding UTXO that you don=E2=
=80=99t know. In that case you need to request for the parent tx to verify. I=
 wonder if you are really doing that.
>=20
>> On 18 Nov 2016, at 11:20, Eric Voskuil <eric@voskuil.org> wrote:
>>=20
>> You are suggesting that, since a node implements a denial of service poli=
cy that actually denies itself otherwise valid blocks, those blocks are cond=
itionally invalid. And that, since the validity condition is based on order o=
f arrival and therefore independently unverifiable, Bitcoin consensus is bro=
ken in the face of a hash collision.
>>=20
>> I am aware of two other hash collision scenarios that cause Core to decla=
re blocks invalid based on ordering. The block hash duplicate check (it's no=
t fork-point relative) and signature verification caching. Like the "block b=
anning" issue above, the latter is related to an internal optimization. I wo=
uld categorize the former as a simple oversight that presumably goes way bac=
k.
>>=20
>> What then is the consequence of validity that is unverifiable? You believ=
e this means that Bitcoin consensus is broken. This is incorrect. First unde=
rstand that it is not possible for consensus rules to invalidate blocks base=
d on order of arrival. As such any *implementation* that invalidates blocks b=
ased on order of arrival is broken. It is an error to claim that these behav=
iors are part of consensus, despite being implemented in the satoshi node(s)=
.
>>=20
>> Validity must be verifiable independent of the state of other nodes. Cons=
ensus is a function of block history and time alone. Time is presumed to be u=
niversally consistent. To be a consensus rule all nodes must be able to inde=
pendently reach the same validity conclusion, given the same set of blocks, i=
ndependent of order. If this is not the case the behavior is not a consensus=
 rule, it is simply a bug.=20
>>=20
>> Deviating from such bugs is not a break with consensus, since such non-ru=
les cannot be part of consensus. One node implementation can behave determin=
istically while others are behaving non-deterministically, with the two node=
s remaining consistent from a consensus standpoint (deterministic produces a=
 subset of non-deterministic results). But, unlike arbitrary nodes, determin=
istic nodes will not cause disruption on the network.
>>=20
>> You imply that these determinism bugs are necessary, that there is no fix=
. This is also incorrect.
>>=20
>> The block banning hash collision bug is avoided by not using non-chain/cl=
ock state to determine validity. Doing otherwise is clearly a bug. The hash o=
f a block is not the block itself, a logically-correct ban would be to compa=
re the wire serialization of the block as opposed to the hash, or not mainta=
in the feature at all.
>>=20
>> The signature verification caching hash collision bug is the same problem=
, an optimization based on an invalid assumption. A full serialization compa=
rison (true identity), or elimination of the feature resolves the  bug.
>>=20
>> The block hash check collision bug is trivially resolved by checking at t=
he fork point as opposed to the tip. This prevents arbitrary (and irrational=
) invalidity based on conflict with irrelevant blocks that may or may not ex=
ist above the fork point.
>>=20
>> Libbitcoin is deterministic in all three cases (although the third issue i=
s not made consistent until v3). I am not aware of any other non-determinism=
 in Core, but I don't spend a lot of time there. There is no need to study o=
ther implementations to ensure determinism, as that can be verified independ=
ently.
>>=20
>> Any situation in which a node cannot provide deterministic validation of u=
nordered blocks constitutes a non-consensus bug, as the behavior is not cons=
istently verifiable by others under any conditions. Fixing/preventing these b=
ugs is responsible development behavior, and does not require forks or BIPs,=
 since Bitcoin doesn't inherently contain any such bugs. They are the conseq=
uence of incorrect implementation, and in two of the three cases above have r=
esulted from supposed optimizations. But any code that creates non-determini=
sm in exchange for speed, etc. is not an optimization, it's a bug. A node mu=
st implement its optimizations in a manner that does not alter consensus.
>>=20
>> The BIP30 regression hard fork is not a case of non-determinism. This wil=
l produce deterministic results (apart from the impact of unrelated bugs). H=
owever the results are both a clear break from previous (and documented) con=
sensus but also produce a very undesirable outcome - destruction of all unsp=
ent outputs in the "replaced" transaction for starters. So this is a distinc=
t category, not a determinism bug but a hard fork that produces undesired co=
nsequences.
>>=20
>> The BIP30 regression hard fork actually enables the various pathological s=
cenarios that you were describing, where no such issues existed in Bitcoin c=
onsensus previously. It is now possible to produce a block that mutates anot=
her arbitrarily deep block, and forces a reorg all the way back to the mutat=
ed block. This was done to save microseconds per block. Despite the improbab=
ility of hash collisions, I find this deplorable and the lack of public disc=
ussion on the decision concerning.
>>=20
>> With respect to the original post, the point at issue is the introduction=
 of another hard fork, with some odd behaviors, but without any justificatio=
n apart from tidying up the small amount of necessary code. These issues are=
 related in that they are both consensus forks that have been introduced as s=
upposed optimizations, with no public discussion prior to release (or at lea=
st merging to master with the presumption of shipping in the latter case). T=
wo of the three hash collision issues above are also related in that they ar=
e bugs introduced by a desire to optimize internals.
>>=20
>> The engineering lesson here should be clear - watch out for developers be=
aring optimizations. A trade against correctness is not an optimization, it'=
s a break. Satoshi was clearly a fan of the premature optimization. FindAndD=
elete is a howler. So this is a tradition in Bitcoin. My intent is not to sl=
ing mud but to improve the situation.
>>=20
>> It is very possible to produce straightforward and deterministic code tha=
t abides consensus and materially outperforms Core, without any of the above=
 optimization breaks, even avoiding the utxo set optimization. Even the tx (=
memory) and block (orphan) pools are complex store denormalizations implemen=
ted as optimizations. Optimizing before producing a clean conceptual model a=
rchitecture and design is a software development anti-pattern (premature opt=
imization). The proposed fork is a premature optimization. There are much mo=
re significant opportunities to better organize code (and improve performanc=
e). I cannot support the decision to advance it.
>>=20
>> I was unaware Core had regressed BIP30. Given that the behavior is catast=
rophic and that it introduces the *only* hash-collision consensus misbehavio=
r (unless we consider a deep reorg sans the otherwise necessary proof of wor=
k desirable behavior), I strongly recommend it be reverted, with a post-mort=
em BIP.
>>=20
>> Finally I recommend people contemplate the difference between unlikely an=
d impossible. The chance of random collision is very small, but not zero. Co=
lliding hashes is extremely difficult, but not impossible. But Bitcoin does n=
ot rely on impossibility for correct behavior. It relies of difficulty. This=
 is a subtle but important distinction that people are missing.
>>=20
>> Difficulty is a knowable quantity - a function of computing power.  If ha=
sh operations remain difficult, Bitcoin is undeterred. Collisions will have n=
o impact, even if they happen with unexpected frequency (which would still b=
e vanishingly infrequent). If the difficulty of producing a collision is red=
uced to the point where people cannot rely on addresses (for example), then B=
itcoin has a problem, as it has become a leaky ship (and then there's mining=
). But with the unnecessary problems described above, a single hash collisio=
n can be catastrophic. Unlike difficulty, which is known, nobody can know wh=
en a single collision will show up. Betting Bitcoin, and potentially the wor=
ld's money, on the unknowable is poor reasoning, especially given that the c=
ost of not doing so is so very low.
>>=20
>> e
>>=20
>>> On Nov 17, 2016, at 10:08 AM, Johnson Lau <jl2012@xbt.hk> wrote:
>>>=20
>>> The fact that some implementations ban an invalid block hash and some do=
 not, suggests that it=E2=80=99s not a pure p2p protocol issue. A pure p2p s=
plit should be unified by a bridge node. However, a bridge node is not helpf=
ul in this case. Banning an invalid block hash is an implicit =E2=80=9Cfirst=
 seen=E2=80=9D consensus rule.
>>>=20
>>> jl2012
>>>=20
>>>> On 18 Nov 2016, at 01:49, Eric Voskuil <eric@voskuil.org> wrote:
>>>>=20
>>>> Actually both possibilities were specifically covered in my description=
. Sorry if it wasn't clear.
>>>>=20
>>>> If you create a new valid block out of an old one it's has potential to=
 cause a reorg. The blocks that previously built on the original are still a=
ble to do so but presumably cannot build forever on the *new* block as it ha=
s a different tx. But other new blocks can. There is no chain split due to a=
 different interpretation of valid, there are simply two valid competing cha=
ins.
>>>>=20
>>>> Note that this scenario requires not only block and tx validity with a t=
x hash collision, but also that the tx be valid within the block. Pretty far=
 to reach to not even get a chain split, but it could produce a deep reorg w=
ith a very low chance of success. As I keep telling people, deep reorgs can h=
appen, they are just unlikely, as is this scenario.
>>>>=20
>>>> If you create a new invalid block it is discarded by everyone. That doe=
s not invalidate the hash of that block. Permanent blocking as you describe i=
t would be a p2p protocol design choice, having nothing to do with consensus=
. Libbitcoin for example does not ban invalidated hashes at all. It just dis=
cards the block and drops the peer.
>>>>=20
>>>> e
>>>=20
>>>=20
>=20
>=20