
Date: Tue, 2 Jul 2024 18:13:20 -0700 (PDT)
From: Eric Voskuil <eric@voskuil.org>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Message-Id: <d9834ad5-f803-4a39-a854-95b2439738f5n@googlegroups.com>
In-Reply-To: <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
References: <gnM89sIQ7MhDgI62JciQEGy63DassEv7YZAMhj0IEuIo0EdnafykF6RH4OqjTTHIHsIoZvC2MnTUzJI7EfET4o-UQoD-XAQRDcct994VarE=@protonmail.com>
 <72e83c31-408f-4c13-bff5-bf0789302e23n@googlegroups.com>
 <heKH68GFJr4Zuf6lBozPJrb-StyBJPMNvmZL0xvKFBnBGVA3fVSgTLdWc-_8igYWX8z3zCGvzflH-CsRv0QCJQcfwizNyYXlBJa_Kteb2zg=@protonmail.com>
 <5b0331a5-4e94-465d-a51d-02166e2c1937n@googlegroups.com>
 <yt1O1F7NiVj-WkmnYeta1fSqCYNFx8h6OiJaTBmwhmJ2MWAZkmmjPlUST6FM7t6_-2NwWKdglWh77vcnEKA8swiAnQCZJY2SSCAh4DOKt2I=@protonmail.com>
 <be78e733-6e9f-4f4e-8dc2-67b79ddbf677n@googlegroups.com>
 <jJLDrYTXvTgoslhl1n7Fk9-pL1mMC-0k6gtoniQINmioJpzgtqrJ_WqyFZkLltsCUusnQ4jZ6HbvRC-mGuaUlDi3kcqcFHALd10-JQl-FMY=@protonmail.com>
 <9a4c4151-36ed-425a-a535-aa2837919a04n@googlegroups.com>
 <3f0064f9-54bd-46a7-9d9a-c54b99aca7b2n@googlegroups.com>
 <26b7321b-cc64-44b9-bc95-a4d8feb701e5n@googlegroups.com>
 <CALZpt+EwVyaz1=A6hOOycqFGJs+zxyYYocZixTJgVmzZezUs9Q@mail.gmail.com>
 <607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com>
 <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
 <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_35620_344008102.1719969200791"

------=_Part_35620_344008102.1719969200791
Content-Type: multipart/alternative; 
	boundary="----=_Part_35621_885950809.1719969200791"

------=_Part_35621_885950809.1719969200791
Content-Type: text/plain; charset="UTF-8"

Hi Antoine R,

>> Ok, thanks for clarifying. I'm still not making the connection to 
"checking a non-null [C] pointer" but that's prob on me.

> A C pointer, which is a language idiom assigning to a memory address A 
the value of memory address B, can be 0 (or NULL, a standard macro defined 
in stddef.h).
> Here is a snippet example of linked list code checking the pointer 
(`*begin_list`) is non null before the comparison operation to find the 
target element list.
> ...
> While both libbitcoin and bitcoin core are both written in c++, you still 
have underlying pointer dereferencing playing out to access the coinbase 
transaction, and all underlying implications in terms of memory management.

I'm familiar with pointers ;).

While at some level the block message buffer would generally be referenced 
by one or more C pointers, the difference between a valid coinbase input 
(i.e. with a "null point") and any other input, is not nullptr vs. 
!nullptr. A "null point" is a 36 byte value, 32 0x00 bytes followed by 4 
0xff bytes. In his infinite wisdom Satoshi decided it was better (or 
easier) to serialize a first block tx (coinbase) with an input containing 
an unusable script and pointing to an invalid [tx:index] tuple (input 
point) as opposed to just not having any input. That invalid input point is 
called a "null point", and of course cannot be pointed to by a "null 
pointer". The coinbase must be identified by comparing those 36 bytes to 
the well-known null point value (and if this does not match, the Merkle 
hash cannot have been type64 malleated).
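
As a minimal sketch of that comparison (not libbitcoin's or bitcoin core's 
actual code; the names `NULL_POINT` and `is_null_point` are hypothetical), 
the check is a plain 36 byte value comparison and has nothing to do with 
pointer nullness:

```python
# The well-known null point: 32 zero bytes (tx hash) followed by
# 4 0xff bytes (index 0xffffffff). Names here are illustrative only.
NULL_POINT = bytes(32) + b"\xff" * 4


def is_null_point(point: bytes) -> bool:
    """Return True when a 36 byte input point equals the null point."""
    return len(point) == 36 and point == NULL_POINT


# An ordinary input point references some real [tx:index] tuple instead.
ordinary_point = bytes.fromhex("11" * 32) + (0).to_bytes(4, "little")

print(is_null_point(NULL_POINT))
print(is_null_point(ordinary_point))
```

The first call reports a match and the second does not; only the byte 
values are compared.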

> I think it's interesting to point out the two types of malleation that a 
bitcoin consensus validation logic should respect w.r.t block validity 
checks. Like you said, the first one is on the merkle root committed in the 
header's `hashMerkleRoot`, due to the lack of domain separation between 
leaf and merkle tree nodes.

We call this type64 malleability (or malleation where it is not only 
possible but occurs).
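
To illustrate the missing domain separation, here is a small sketch under 
stated assumptions: `sha256d` and `merkle_root` are hypothetical helper 
names, and the leaf hashes are placeholders rather than real tx hashes. An 
interior node is the double SHA-256 of 64 bytes (two child hashes), so 
those same 64 bytes presented as a single serialized tx hash to the same 
value:

```python
import hashlib
from typing import List


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin tx hashes and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes: List[bytes]) -> bytes:
    """Bitcoin-style Merkle root over a non-empty list of 32 byte hashes."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd level: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Two placeholder leaf hashes (stand-ins for real tx hashes).
left, right = sha256d(b"tx a"), sha256d(b"tx b")

# The same 64 bytes read as one serialized "transaction".
sixty_four_byte_tx = left + right

root_two_leaves = merkle_root([left, right])
root_one_leaf = merkle_root([sha256d(sixty_four_byte_tx)])
print(root_two_leaves == root_one_leaf)
```

Both roots are equal. Whether the 64 bytes also deserialize as a 
transaction is a separate matter that this sketch does not attempt to show.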

> The second one is the bip141 wtxid commitment in one of the coinbase 
transaction's `scriptpubkey` outputs, which is itself covered by a txid in 
the merkle tree.

While symmetry seems to imply that the witness commitment would be 
malleable, just as the txs commitment, this is not the case. If the tx 
commitment is correct it is computationally infeasible for the witness 
commitment to be malleated, as the witness commitment incorporates each 
full tx (with witness, sentinel, and marker). As such the block identifier, 
which relies only on the header and tx commitment, is a sufficient 
identifier. Yet it remains necessary to validate the witness commitment to 
ensure that the correct witness data has been provided in the block message.

The second type of malleability, in addition to type64, is what we call 
type32. This is the consequence of duplicated trailing sets of txs (and 
therefore tx hashes) in a block message. This is applicable to some but not 
all blocks, as a function of the number of txs contained.
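
A minimal sketch of type32, assuming the usual rule that an odd level pairs 
its last hash with itself (helper names are hypothetical and the leaf 
hashes are placeholders):

```python
import hashlib
from typing import List


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin tx hashes and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes: List[bytes]) -> bytes:
    """Bitcoin-style Merkle root over a non-empty list of 32 byte hashes."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd level: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


a, b, c, d, e, f = (sha256d(bytes([n])) for n in range(6))

# 3 txs: duplicating the trailing tx leaves the root unchanged.
three_is_malleable = merkle_root([a, b, c]) == merkle_root([a, b, c, c])

# 6 txs: duplicating the trailing pair leaves the root unchanged.
six_is_malleable = (merkle_root([a, b, c, d, e, f])
                    == merkle_root([a, b, c, d, e, f, e, f]))

# 4 txs: no level is odd, so this duplication changes the root.
four_is_malleable = merkle_root([a, b, c, d]) == merkle_root([a, b, c, d, d])

print(three_is_malleable, six_is_malleable, four_is_malleable)
```

The 3 and 6 tx lists collide with their padded variants, while the 4 tx 
list does not, which is the dependence on tx count noted above.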

>> Caching identity in the case of invalidity is a more interesting 
question than it might seem.
>> Background: A fully-validated block has established identity in its 
block hash. However an invalid block message may include the same block 
header, producing the same hash, but with any kind of nonsense following 
the header. The purpose of the transaction and witness commitments is of 
course to establish this identity, so these two checks are therefore 
necessary even under checkpoint/milestone. And then of course the two 
Merkle tree issues complicate the tx commitment (the integrity of the 
witness commitment is assured by that of the tx commitment).
>>
>> So what does it mean to speak of a block hash derived from:
>> (1) a block message with an unparseable header?
>> (2) a block message with parseable but invalid header?
>> (3) a block message with valid header but unparseable tx data?
>> (4) a block message with valid header but parseable invalid uncommitted 
tx data?
>> (5) a block message with valid header but parseable invalid malleated 
committed tx data?
>> (6) a block message with valid header but parseable invalid unmalleated 
committed tx data?
>> (7) a block message with valid header but uncommitted valid tx data?
>> (8) a block message with valid header but malleated committed valid tx 
data?
>> (9) a block message with valid header but unmalleated committed valid tx 
data?
>>
>> Note that only the #9 p2p block message contains an actual Bitcoin 
block, the others are bogus messages. In all cases the message can be 
sha256 hashed to establish the identity of the *message*. And if one's 
objective is to reject repeating bogus messages, this might be a useful 
strategy. It's already part of the p2p protocol, is orders of magnitude 
cheaper to produce than a Merkle root, and has no identity issues.
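
The distinction between block identity and message identity can be 
sketched as follows. This is an illustration only: the function names are 
hypothetical and the header bytes are placeholders, not a valid header. 
The block hash covers only the 80 byte header, while the p2p message 
header carries a checksum that is the first 4 bytes of the double SHA-256 
of the whole payload:

```python
import hashlib

HEADER_SIZE = 80  # serialized block header length in bytes


def sha256d(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(block_message: bytes) -> bytes:
    """Hash of the header only; whatever follows it is not covered."""
    return sha256d(block_message[:HEADER_SIZE])


def message_hash(payload: bytes) -> bytes:
    """Hash of the entire payload, identifying the message itself."""
    return sha256d(payload)


def p2p_checksum(payload: bytes) -> bytes:
    """Checksum field of the p2p message header: first 4 hash bytes."""
    return sha256d(payload)[:4]


header = bytes(HEADER_SIZE)  # placeholder bytes, not a valid header
message_a = header + b"\x01" + b"plausible tx data"
message_b = header + b"any kind of nonsense"

print(block_hash(message_a) == block_hash(message_b))
print(message_hash(message_a) == message_hash(message_b))
```

The two messages share a block hash but not a message hash, so only the 
latter distinguishes them.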

> I think I mostly agree with the identity issue as laid out so far, there 
is one caveat to add if you're considering identity caching as the problem 
solved. A validation node might have to consider differently block messages 
processed if they connect on the longest most PoW valid chain for which all 
blocks have been validated. Or alternatively if they have to be added on a 
candidate longest most PoW valid chain.

Certainly an important consideration. We store both types. Once there is a 
stronger candidate header chain we store the headers and proceed to 
obtaining the blocks (if we don't already have them). The blocks are stored 
in the same table; the confirmed vs. candidate indexes simply point to them 
as applicable. It is feasible (and has happened twice) for two blocks to 
share the very same coinbase tx, even with either/all bip30/34/90 active 
(and setting aside future issues here for the sake of simplicity). This 
remains only because two competing branches can have blocks at the same 
height, and bip34 requires only height in the coinbase input script. This 
therefore implies the same transaction but distinct blocks. It is however 
infeasible for one block to exist in multiple distinct chains. In order for 
this to happen two blocks at the same height must have the same coinbase 
(ok), and also the same parent (ok). But this then means that they either 
(1) have distinct identity due to another header property deviation, or (2) 
are the same block with the same parent and are therefore in just one 
chain. So I don't see an actual caveat. I'm not certain if this is the 
ambiguity that you were referring to. If not please feel free to clarify.

>> The concept of Bitcoin block hash as unique identifier for invalid p2p 
block messages is problematic. Apart from the malleation question, what is 
the Bitcoin block hash for a message with unparseable data (#1 and #3)? 
Such messages are trivial to produce and have no block hash.

> For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` (in 
`src/node/connection_types.h`) where some preferential peering policy is 
applied in matters of block messages download.

We don't do this and I don't see how it would be relevant. If a peer 
provides any invalid message or otherwise violates the protocol it is 
simply dropped.

The "problematic" that I'm referring to is the reliance on the block hash 
as a message identifier, because it does not identify the message and 
cannot be useful in an effectively unlimited number of zero-cost cases.

>> What is the useful identifier for a block with malleated commitments (#5 
and #8) or invalid commitments (#4 and #7) - valid txs or otherwise?

> The block header, as it commits to the transaction identifier tree can be 
useful as much for #4 and #5.

#4 and #5 refer to "uncommitted" and "malleated committed". It may not be 
clear, but "uncommitted" means that the tx commitment is not valid (Merkle 
root doesn't match the header's value) and "malleated committed" means that 
the (matching) commitment cannot be relied upon because the txs represent 
malleation, invalidating the identifier. So neither of these are usable 
identifiers.

> On the bitcoin core side, about #7 the uncommitted valid tx data can be 
already present in the validation cache from mempool acceptance. About #8, 
the malleated committed valid transactions shall be also committed in the 
merkle root in headers.

It seems you may be referring to "unconfirmed" txs as opposed to 
"uncommitted" txs. This doesn't pertain to tx storage or identifiers. 
Neither #7 nor #8 are usable for the same reasons.

>> This seems reasonable at first glance, but given the list of scenarios 
above, which does it apply to? Presumably the invalid header (#2) doesn't 
get this far because of headers-first.
>> That leaves just invalid blocks with useful block hash identifiers (#6). 
In all other cases the message is simply discarded. In this case the 
attempt is to move category #5 into category #6 by prohibiting 64 byte txs.

> Yes, it's moving from the category #5 to the category #6. Note, 
transaction malleability can be a distinct issue from lack of domain 
separation.

I'm making no reference to tx malleability. This concerns only Merkle tree 
(block hash) malleability, the two types described in detail in the paper I 
referenced earlier, here again:

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf

>> The requirement to "avoid re-downloading and re-validating it" is about 
performance, presumably minimizing initial block download/catch-up time. 
There is a computational cost to producing 64 byte malleations and none 
for any of the other bogus block message categories above, including the 
other form of malleation. Furthermore, 64 byte malleation has almost zero 
cost to preclude. No hashing and not even true header or tx parsing are 
required. Only a handful of bytes must be read from the raw message 
before it can be discarded presently.

>> That's actually far cheaper than any of the other scenarios that again, 
have no cost to produce. The other type of malleation requires parsing all 
of the txs in the block and hashing and comparing some or all of them. In 
other words, if there is an attack scenario, that must be addressed before 
this can be meaningful. In fact all of the other bogus message scenarios 
(with tx data) will remain more expensive to discard than this one.

> In practice on the bitcoin core side, the bogus block message categories 
from #4 to #6 are already mitigated by validation caching for transactions 
that have been received early. While libbitcoin has no mempool (at least in 
earlier versions) transactions buffering can be done by bip152's 
HeadersAndShortIds message.

Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx 
pool, we just don't store them in RAM (memory).

> About #7 and #8, introducing a domain separation where 64 byte 
transactions are rejected makes it harder to exploit the #7 and #8 
categories of bogus block messages. It is correct that bitcoin core might 
accept valid transaction data before the merkle tree commitment has been 
verified.

I don't follow this. A consensus rule invalidating 64 byte txs would 
definitely not make it harder to exploit block message invalidity. In fact 
it would just slow down validation by adding a redundant rule. Furthermore, 
as I have detailed in a previous message, caching invalidity does 
absolutely nothing to increase protection. In fact it makes the situation 
materially worse.

>> The problem arises from trying to optimize dismissal by storing an 
identifier. Just *producing* the identifier is orders of magnitude more 
costly than simply dismissing this bogus message. I can't imagine why any 
implementation would want to compute and store and retrieve and recompute 
and compare hashes when the alternative is just dismissing the bogus 
messages with no hashing at all.

>> Bogus messages will arrive, they do not even have to be requested. The 
simplest are dealt with by parse failure. What defines a parse is entirely 
subjective. Generally it's "structural" but nothing precludes incorporating 
a requirement for a necessary leading pattern in the stream, sort of like 
how the witness pattern is identified. If we were going to prioritize early 
dismissal this is where we would put it.

> I don't think this is that simple. While producing an identifier comes 
with a computational cost (e.g. a fixed 64-byte structured coinbase 
transaction), if the full node has a hierarchy of validation caches like 
bitcoin core already has, the cost of bogus block messages can be slashed 
down.

No, this is not the case. As I detailed in my previous message, there is no 
possible scenario where invalidation caching does anything but make the 
situation materially worse.

> On the other hand, just dealing with parse failure on the spot by 
introducing a leading pattern in the stream just inflates the size of p2p 
messages, and the transaction-relay bandwidth cost.

I think you misunderstood me. I am suggesting no change to serialization. I 
can see how it might be unclear, but I said, "nothing precludes 
incorporating a requirement for a necessary leading pattern in the stream." 
I meant that the parser can simply incorporate the *requirement* that the 
byte stream starts with a null input point. That identifies the malleation 
or invalidity without a single hash operation and while only reading a 
handful of bytes. No change to any messages.
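
A rough sketch of such a parser requirement, assuming the standard block 
serialization (80 byte header, tx count, then the first tx with an 
optional witness marker and flag after its version). The names 
`read_varint` and `starts_with_null_point` are hypothetical, and this is 
not taken from any implementation:

```python
from typing import Tuple

HEADER_SIZE = 80
NULL_POINT = bytes(32) + b"\xff" * 4  # 32 0x00 bytes, then 4 0xff bytes


def read_varint(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode a variable-length integer; return (value, next offset)."""
    first = data[offset]  # raises IndexError when the data is truncated
    if first < 0xFD:
        return first, offset + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    value = int.from_bytes(data[offset + 1:offset + 1 + size], "little")
    return value, offset + 1 + size


def starts_with_null_point(block_message: bytes) -> bool:
    """True when the first input of the first tx is the null point."""
    try:
        offset = HEADER_SIZE
        _tx_count, offset = read_varint(block_message, offset)
        offset += 4  # skip the tx version
        # Witness serialization inserts marker 0x00 and flag 0x01 here.
        if block_message[offset:offset + 2] == b"\x00\x01":
            offset += 2
        _input_count, offset = read_varint(block_message, offset)
        return block_message[offset:offset + 36] == NULL_POINT
    except IndexError:
        return False


# Placeholder messages: zeroed header, one tx, version 1, one input.
prefix = bytes(HEADER_SIZE) + b"\x01" + (1).to_bytes(4, "little")
coinbase_first = prefix + b"\x01" + NULL_POINT
witness_coinbase_first = prefix + b"\x00\x01" + b"\x01" + NULL_POINT
other_first = prefix + b"\x01" + b"\x11" * 32 + bytes(4)

print(starts_with_null_point(coinbase_first))
print(starts_with_null_point(witness_coinbase_first))
print(starts_with_null_point(other_first))
```

No hash is computed, and fewer than 50 bytes beyond the header are read 
before a decision is made.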

>> However, there is a tradeoff in terms of early dismissal. Looking up 
invalid hashes is a costly tradeoff, which becomes multiplied by every 
block validated. For example, expending 1 millisecond in hash/lookup to 
save 1 second of validation time in the failure case seems like a 
reasonable tradeoff, until you multiply across the whole chain. 1 ms 
becomes 14 minutes across the chain, just to save a second for each 
malleated block encountered. That means you need to have encountered 840 
such malleated blocks just to break even. Early dismissing the block for 
non-null coinbase point (without hashing anything) would be on the order of 
1000x faster than that (breakeven at 1 encounter). So why the block hash 
cache requirement? It cannot be applied to many scenarios, and cannot be 
optimal in this one.
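
The arithmetic behind those figures can be reproduced as a short sketch. 
The chain height of 840,000 blocks is an assumption implied by the "14 
minutes" figure, and the 1 microsecond early-dismissal cost is simply the 
quoted 1 ms divided by 1000:

```python
import math

BLOCKS = 840_000          # assumption: chain height implied by the figures
LOOKUP_MS_PER_BLOCK = 1   # hash/lookup cost paid for every validated block
SAVED_MS_PER_HIT = 1_000  # validation time saved per malleated block caught

# Cost of the hash/lookup approach across the whole chain.
cache_total_ms = BLOCKS * LOOKUP_MS_PER_BLOCK
cache_total_minutes = cache_total_ms / 1_000 / 60
cache_breakeven = math.ceil(cache_total_ms / SAVED_MS_PER_HIT)

# Early dismissal, assumed ~1000x cheaper per block (1 microsecond).
early_total_ms = BLOCKS * LOOKUP_MS_PER_BLOCK / 1_000
early_breakeven = math.ceil(early_total_ms / SAVED_MS_PER_HIT)

print(cache_total_minutes, cache_breakeven, early_breakeven)
```

Under these assumptions the lookup costs 14 minutes in total and needs 840 
encounters to pay for itself, while the early check costs under a second 
in total and pays for itself at the first encounter.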

> I think what you're describing is more a classic time-space tradeoff, 
which is well known in the classic computer science literature. In my 
reasonable opinion, one should reason more about what security paradigm we 
wish for the bitcoin block-relay network and enduring decentralization, 
i.e. one where it's easy to verify block message proofs which could have 
been generated on specialized hardware with an asymmetric cost. Obviously 
encountering 840 such malleated blocks to break even doesn't make the math 
add up to save on hash lookup, unless you can reduce the attack scenario in 
terms of adversaries' capabilities.

I'm referring to DoS mitigation (the only relevant security consideration 
here). I'm pointing out that invalidity caching is pointless in all cases, 
and in this case is the most pointless as type64 malleation is the cheapest 
of all invalidity to detect. I would prefer that all bogus blocks sent to 
my node are of this type. The worst types of invalidity detection have no 
mitigation and from a security standpoint are counterproductive to cache. 
I'm describing what overall is actually not a tradeoff. It's all negative 
and no positive.

Best,
Eric


------=_Part_35621_885950809.1719969200791
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Antoine R,<br /><br />&gt;&gt; Ok, thanks for clarifying. I'm still not =
making the connection to "checking a non-null [C] pointer" but that's prob =
on me.<br /><br />&gt; A C pointer, which is a language idiome assigning to=
 a memory address A the value o memory address B can be 0 (or NULL a standa=
rd macro defined in stddef.h).<br />&gt; Here a snippet example of linked l=
ist code checking the pointer (`*begin_list`) is non null before the compar=
ison operation to find the target element list.<br />&gt; ...<br />&gt; Whi=
le both libbitcoin and bitcoin core are both written in c++, you still have=
 underlying pointer derefencing playing out to access the coinbase transact=
ion, and all underlying implications in terms of memory management.<br /><b=
r />I'm familiar with pointers ;).<br /><br />While at some level the block=
 message buffer would generally be referenced by one or more C pointers, th=
e difference between a valid coinbase input (i.e. with a "null point") and =
any other input, is not nullptr vs. !nullptr. A "null point" is a 36 byte v=
alue, 32 0x00 byes followed by 4 0xff bytes. In his infinite wisdom Satoshi=
 decided it was better (or easier) to serialize a first block tx (coinbase)=
 with an input containing an unusable script and pointing to an invalid [tx=
:index] tuple (input point) as opposed to just not having any input. That i=
nvalid input point is called a "null point", and of course cannot be pointe=
d to by a "null pointer". The coinbase must be identified by comparing thos=
e 36 bytes to the well-known null point value (and if this does not match t=
he Merkle hash cannot have been type64 malleated).<br /><br /><div>&gt; I t=
hink it's interesting to point out the two types of malleation that a bitco=
in consensus validation logic should respect w.r.t block validity checks.=
=C2=A0Like you said the first one on the merkle root committed in the heade=
rs's `hashMerkleRoot` due to the lack of domain separation between leaf and=
 merkle tree nodes.<br /></div><div><br />We call this type64 malleability =
(or malleation where it is not only possible but occurs).<br /><br />&gt; T=
he second one is the bip141 wtxid commitment in one of the coinbase transac=
tion `scriptpubkey` output, which is itself covered by a txid in the merkle=
 tree.<br /><br />While symmetry seems to imply that the witness commitment=
 would be malleable, just as the txs commitment, this is not the case. If t=
he tx commitment is correct it is computationally infeasible for the witnes=
s commitment to be malleated, as the witness commitment incorporates each f=
ull tx (with witness, sentinel, and marker). As such the block identifier, =
which relies only on the header and tx commitment, is a sufficient identifi=
er. Yet it remains necessary to validate the witness commitment to ensure t=
hat the correct witness data has been provided in the block message.<br /><=
br />The second type of malleability, in addition to type64, is what we cal=
l type32. This is the consequence of duplicated trailing sets of txs (and t=
herefore tx hashes) in a block message. This is applicable to some but not =
all blocks, as a function of the number of txs contained.<br /><br />&gt;&g=
t; Caching identity in the case of invalidity is more interesting question =
than it might seem.<br />&gt;&gt; Background: A fully-validated block has e=
stablished identity in its block hash. However an invalid block message may=
 include the same block header, producing the same hash, but with any kind =
of nonsense following the header. The purpose of the transaction and witnes=
s commitments is of course to establish this identity, so these two checks =
are therefore necessary even under checkpoint/milestone. And then of course=
 the two Merkle tree issues complicate the tx commitment (the integrity of =
the witness commitment is assured by that of the tx commitment).<br />&gt;&=
gt;<br />&gt;&gt; So what does it mean to speak of a block hash derived fro=
m:<br />&gt;&gt; (1) a block message with an unparseable header?<br />&gt;&=
gt; (2) a block message with parseable but invalid header?<br />&gt;&gt; (3=
) a block message with valid header but unparseable tx data?<br />&gt;&gt; =
(4) a block message with valid header but parseable invalid uncommitted tx =
data?<br />&gt;&gt; (5) a block message with valid header but parseable inv=
alid malleated committed tx data?<br />&gt;&gt; (6) a block message with va=
lid header but parseable invalid unmalleated committed tx data?<br />&gt;&g=
t; (7) a block message with valid header but uncommitted valid tx data?<br =
/>&gt;&gt; (8) a block message with valid header but malleated committed va=
lid tx data?<br />&gt;&gt; (9) a block message with valid header but unmall=
eated committed valid tx data?<br />&gt;&gt;<br />&gt;&gt; Note that only t=
he #9 p2p block message contains an actual Bitcoin block, the others are bo=
gus messages. In all cases the message can be sha256 hashed to establish th=
e identity of the *message*. And if one's objective is to reject repeating =
bogus messages, this might be a useful strategy. It's already part of the p=
2p protocol, is orders of magnitude cheaper to produce than a Merkle root, =
and has no identity issues.<br /><br />&gt; I think I mostly agree with the=
 identity issue as laid out so far, there is one caveat to add if you're co=
nsidering identity caching as the problem solved. A validation node might h=
ave to consider differently block messages processed if they connect on the=
 longest most PoW valid chain for which all blocks have been validated. Or =
alternatively if they have to be added on a candidate longest most PoW vali=
d chain.<br /><br />Certainly an important consideration. We store both typ=
es. Once there is a stronger candidate header chain we store the headers an=
d proceed to obtaining the blocks (if we don't already have them). The bloc=
ks are stored in the same table; the confirmed vs. candidate indexes simply=
 point to them as applicable. It is feasible (and has happened twice) for t=
wo blocks to share the very same coinbase tx, even with either/all bip30/34=
/90 active (and setting aside future issues here for the sake of simplicity=
). This remains only because two competing branches can have blocks at the =
same height, and bip34 requires only height in the coinbase input script. T=
his therefore implies the same transaction but distinct blocks. It is howev=
er infeasible for one block to exist in multiple distinct chains. In order =
for this to happen two blocks at the same height must have the same coinbas=
e (ok), and also the same parent (ok). But this then means that they either=
 (1) have distinct identity due to another header property deviation, or (2=
) are the same block with the same parent and are therefore in just one cha=
in. So I don't see an actual caveat. I'm not certain if this is the ambigui=
ty that you were referring to. If not please feel free to clarify.<br /><br=
 />&gt;&gt; The concept of Bitcoin block hash as unique identifier for inva=
lid p2p block messages is problematic. Apart from the malleation question, =
what is the Bitcoin block hash for a message with unparseable data (#1 and =
#3)? Such messages are trivial to produce and have no block hash.<br /><br =
/>&gt; For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` =
(in `src/node/connection_types.h`) where some preferential peering policy i=
s applied in matters of block messages download.<br /><br />We don't do thi=
s and I don't see how it would be relevant. If a peer provides any invalid =
message or otherwise violates the protocol it is simply dropped.<br /><br /=
>The "problematic" that I'm referring to is the reliance on the block hash =
as a message identifier, because it does not identify the message and canno=
t be useful in an effectively unlimited number of zero-cost cases.<br /><br=
 />&gt;&gt; What is the useful identifier for a block with malleated commit=
ments (#5 and #8) or invalid commitments (#4 and #7) - valid txs or otherwi=
se?<br /><br />&gt; The block header, as it commits to the transaction iden=
tifier tree can be useful as much for #4 and #5.<br /><br />#4 and #5 refer=
 to "uncommitted" and "malleated committed". It may not be clear, but "unco=
mmitted" means that the tx commitment is not valid (Merkle root doesn't mat=
ch the header's value) and "malleated committed" means that the (matching) =
commitment cannot be relied upon because the txs represent malleation, inva=
lidating the identifier. So neither of these are usable identifiers.<br /><=
br />&gt; On the bitcoin core side, about #7 the uncommitted valid tx data =
can be already present in the validation cache from mempool acceptance. Abo=
ut #8, the malleated committed valid transactions shall be also committed =
in the merkle root in headers.<br /><br />It seems you may be referring to =
"u=
nconfirmed" txs as opposed to "uncommitted" txs. This doesn't pertain to tx=
 storage or identifiers. Neither #7 nor #8 are usable for the same reasons.=
<br /><br />&gt;&gt; This seems reasonable at first glance, but given the l=
ist of scenarios above, which does it apply to? Presumably the invalid head=
er (#2) doesn't get this far=
 because of headers-first.<br />&gt;&gt; That leaves just invalid blocks wi=
th useful block hash identifiers (#6). In all other cases the message is si=
mply discarded. In this case the attempt is to move category #5 into catego=
ry #6 by prohibiting 64 byte txs.<br /><br />&gt; Yes, it's moving from the=
 category #5 to the category #6. Note, transaction malleability can be a di=
stinct issue from lack of domain separation.<br /><br />I'm making no refer=
ence to tx malleability. This concerns only Merkle tree (block hash) mallea=
bility, the two types described in detail in the paper I referenced earlier=
, here again:<br /><br />https://lists.linuxfoundation.org/pipermail/bitcoi=
n-dev/attachments/20190225/a27d8837/attachment-0001.pdf<br /><br />&gt;&gt;=
 The requirement to "avoid re-downloading and re-validating it" is about pe=
rformance, presumably minimizing initial block download/catch-up time. Ther=
e is a computational cost to producing 64 byte malleations and none fo=
r any of the other bogus block message categories above, including the othe=
r form of malleation. Furthermore, 64 byte malleation has almost zero =
cost to preclude. No hashing and not even true header or tx parsing are req=
uired. Only a handful of bytes must be read from the raw message befor=
e it can be discarded presently.<br /><br />&gt;&gt; That's actually far ch=
eaper than any of the other scenarios that again, have no cost to produce. =
The other type of malleation requires parsing all of the txs in the block a=
nd hashing and comparing some or all of them. In other words, if there=
 is an attack scenario, that must be addressed before this can be meaningfu=
l. In fact all of the other bogus message scenarios (with tx data) will rem=
ain more expensive to discard than this one.<br /><br />&gt; In practice on=
 the bitcoin core side, the bogus block message categories from #4 to #6 ar=
e already mitigated by validation caching for transactions that have been r=
eceived early. While libbitcoin has no mempool (at least in earlier version=
s) transactions buffering can be done by bip152's HeadersAndShortIds messag=
e.<br /><br />Again, this has no relation to tx hashes/identifiers. Libbitc=
oin has a tx pool, we just don't store them in RAM (memory).<br /><br />&gt=
; About #7 and #8, introducing a domain separation where 64 bytes transacti=
ons are rejected and making it harder to exploit #7 and #8 categories of bo=
gus block messages. This is correct that bitcoin core might accept valid tr=
ansaction data before the merkle tree commitment has been verified.<br /><b=
r />I don't follow this. A consensus rule invalidating 64 byte txs would =
definitely not make it harder to exploit block message invalidity. In =
fact it woul=
d just slow down validation by adding a redundant rule. Furthermore, as I h=
ave detailed in a previous message, caching invalidity does absolutely noth=
ing to increase protection. In fact it makes the situation materially worse=
.<br /><br />&gt;&gt; The problem arises from trying to optimize dismissal =
by storing an identifier. Just *producing* the identifier is orders of magn=
itude more costly than simply dismissing this bogus message. I can't i=
magine why any implementation would want to compute and store and retrieve =
and recompute and compare hashes when the alternative is just dismissing =
the bogus messages with no hashing at all.<br /><br />&gt;&gt; Bogus =
messages =
will arrive, they do not even have to be requested. The simplest are dealt =
with by parse failure. What defines a parse is entirely subjective. General=
ly it's<br />&gt;&gt; "structural" but nothing precludes incorporating a re=
quirement for a necessary leading pattern in the stream, sort of like how t=
he witness pattern is identified. If we were<br />&gt;&gt; going to priorit=
ize early dismissal this is where we would put it.<br /><br />&gt; I don't =
think this is that simple - While producing an identifier comes with a comp=
utational cost (e.g fixed 64-byte structured coinbase transaction), if the =
full node has a hierarchy of validation caches like bitcoin core has alread=
y, the cost of bogus block messages can be slashed down.<br /><br />No, thi=
s is not the case. As I detailed in my previous message, there is no possib=
le scenario where invalidation caching does anything but make the situation=
 materially worse.<br /><br />&gt; On the other hand, just dealing with par=
se failure on the spot by introducing a leading pattern in the stream just =
inflates the size of p2p messages, and the transaction-relay bandwidth cost=
.<br /><br />I think you misunderstood me. I am suggesting no change to ser=
ialization. I can see how it might be unclear, but I said, "nothing preclud=
es incorporating a requirement for a necessary leading pattern in the strea=
m." I meant that the parser can simply incorporate the *requirement* that t=
he byte stream starts with a null input point. That identifies the malleati=
on or invalidity without a single hash operation and while only reading a h=
andful of bytes. No change to any messages.<br /><br />&gt;&gt; However, th=
ere is a tradeoff in terms of early dismissal. Looking up invalid hashes is=
 a costly tradeoff, which becomes multiplied by every block validated. For =
example, expending 1 millisecond in hash/lookup to save 1 second of validat=
ion time in the failure case seems like a reasonable tradeoff, until you mu=
ltiply across the whole chain. 1 ms becomes 14 minutes across the chai=
n, just to save a second for each mallied block encountered. That means =
you need to have encountered 840 such mallied blocks just to break even. =
Early dismissing the block for non-null coinbase point (without hashing any=
thing) would be on the order of 1000x faster than that (breakeven at 1=
 encounter). So why the block hash cache requirement? It cannot be applied =
to many scenarios, and cannot be optimal in this one.<br /><br />&gt; I thi=
nk what you're describing is more a classic time-space tradeoff which is we=
ll-known in classic computer science literature. In my reasonable opinion,=
 one should more reason under what is the security paradigm we wish for bit=
coin block-relay network and perduring decentralization, i.e one where it's=
 easy to verify block messages proofs which could have been generated on sp=
ecialized hardware with an asymmetric cost. Obviously encountering 840 =
such mallied blocks to make it break even doesn't make the math up to =
save on =
hash lookup, unless you can reduce the attack scenario in terms of adversar=
ies capabilities.<br /><br />I'm referring to DoS mitigation (the only rele=
vant security consideration here). I'm pointing out that invalidity caching=
 is pointless in all cases, and in this case is the most pointless as type6=
4 malleation is the cheapest of all invalidity to detect. I would prefer th=
at all bogus blocks sent to my node are of this type. The worst types of in=
validity detection have no mitigation and from a security standpoint are co=
unterproductive to cache. I'm describing what overall is actually not a tra=
deoff. It's all negative and no positive.<br /><br />Best,<br />Eric</div>

<p></p>

-- <br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/d/msgid/bitcoindev/d9834ad5-f803-4a39-a854-95b2439738f5n%40googlegroups.=
com?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.com/d/msg=
id/bitcoindev/d9834ad5-f803-4a39-a854-95b2439738f5n%40googlegroups.com</a>.=
<br />

------=_Part_35621_885950809.1719969200791--

------=_Part_35620_344008102.1719969200791--