1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
|
Delivery-date: Mon, 01 Jul 2024 22:03:42 -0700
Received: from mail-yb1-f186.google.com ([209.85.219.186])
by mail.fairlystable.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
(Exim 4.94.2)
(envelope-from <bitcoindev+bncBC3PT7FYWAMRBJEUR22AMGQE2LA5WUQ@googlegroups.com>)
id 1sOVg4-0003Pg-CM
for bitcoindev@gnusha.org; Mon, 01 Jul 2024 22:03:42 -0700
Received: by mail-yb1-f186.google.com with SMTP id 3f1490d57ef6-dc691f1f83asf1909934276.1
for <bitcoindev@gnusha.org>; Mon, 01 Jul 2024 22:03:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=googlegroups.com; s=20230601; t=1719896614; x=1720501414; darn=gnusha.org;
h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
:list-id:mailing-list:precedence:x-original-sender:mime-version
:subject:references:in-reply-to:message-id:to:from:date:sender:from
:to:cc:subject:date:message-id:reply-to;
bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=;
b=Aa690xSGATVAzXlzCZcYaPMjHa1PeCgjMHAO+V74mOIAOpIZ2MhjCdWnq1Z9iOu9rR
n8QZC0PVzv8jRJT7AfFxmiOZ0PA8N8YEHcVyK7qxlGZFaPF6ucl5aGzFW28ykNFzSQqI
K7wvrbjGtI0g0DHWoJIsCuXO5mXAU3Aw8I0siZ/WQu4q7+3advGSfDYHndpzbJOMIlW6
2F0qcuFhGHr7Eu1IuXpDeNXcSoK4swJW/oGWSSiIP5IL698+tqe224u/119cbMLuyL8L
0gFBHrhUfQLb2wAeWNIh3spY1LAD6AJgrJFycuRinGgGKMm7rUSoKvcjhPJ2GDxCdupa
/ycA==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=20230601; t=1719896614; x=1720501414; darn=gnusha.org;
h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
:list-id:mailing-list:precedence:x-original-sender:mime-version
:subject:references:in-reply-to:message-id:to:from:date:from:to:cc
:subject:date:message-id:reply-to;
bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=;
b=imDVMNln+4Bjv2z7hdJiTBE17WjjjhAgi/hJSmzBmDpw6y37ypJC7lzkL5/o4fUXmy
QRx7REOvFZpfRMXtNtwL8fkyyauAZsOUF8heaUTYkQHky1HmO5UyQElIKl2zMxLZIZG/
IM/B3pKQg9ZHpA9SdJT8KRuPpubVk6KB7FtmLfZ7YXcm02oKy67JzY57akHf7ZgahPTZ
tx2fvRMOO53asCA3l85r+r3Dx1AnNQmqRQCplj07mU5QyTXcZSXTVDWN4oHfn0fc5rVr
6N86Y8p6+S44lkMhvcA4eIrbVDkAZtTL9N2/wji9In6HQo6yAupMz5lC4ukNO/iCzV0I
bFlA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20230601; t=1719896614; x=1720501414;
h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
:list-id:mailing-list:precedence:x-original-sender:mime-version
:subject:references:in-reply-to:message-id:to:from:date:x-beenthere
:x-gm-message-state:sender:from:to:cc:subject:date:message-id
:reply-to;
bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=;
b=CFTYFGlTn6ii/3CJgOthF0IZ9gwtHgz4FN5KZ9tJjYzB/fB5aSlATewQlop4Wxhwoq
068FSVXHFQlm0Zc00hgEjGH5LdNUIr0LLMEv44hB+4nvWXiQDi1h1nCJJipSs28R7F0B
CCY0K+A5FCD/1dgtft1PuBs8ZT+zWL6N2p1QrMDa0r9R5xGs1OP/YyRT+pHX5C4uwgsv
Z/v9kD8hUKQSa7gHt9urUwVAvupZnqDTaaDs6E2T0cme9H9+LmMtSlRVIlGVWHw9sN01
FQvmhRCYLfUMVdQseewAX7SZgubi+wH/SY9QtaD50+/lnmIU5VOrh6Fjy3THt+RN20WO
Qraw==
Sender: bitcoindev@googlegroups.com
X-Forwarded-Encrypted: i=1; AJvYcCXgMCdEwDaFGLNXi/OQQ+G5rw1srEZFEjjguUKOAyYdrwTOVz1C993vTiZWM8lucPU0GIZZNQnDc9GiqsgJw8sObzqfy64=
X-Gm-Message-State: AOJu0YwCGjpR8VTxzP92RWshAk5rh3BwgLotI0YdrngCv5wgDY8oRIiY
eBP9MXcoLjEoZ/EUDQVVoi47HXztK3rBW7RM6fTGXXXo//93vVd7
X-Google-Smtp-Source: AGHT+IFRjvTmIYpVJKgh2S0Q357sDrV0V7L75NRvIdG4dOWX2YQSqk3vS25p9bU+ETXNwVVPJgzweg==
X-Received: by 2002:a25:abea:0:b0:e03:359a:6a54 with SMTP id 3f1490d57ef6-e036df3bcecmr5070667276.6.1719896613816;
Mon, 01 Jul 2024 22:03:33 -0700 (PDT)
X-BeenThere: bitcoindev@googlegroups.com
Received: by 2002:a05:6902:1005:b0:e03:2217:3c88 with SMTP id
3f1490d57ef6-e0356250cd3ls1670233276.1.-pod-prod-05-us; Mon, 01 Jul 2024
22:03:32 -0700 (PDT)
X-Received: by 2002:a05:6902:188e:b0:dfb:1c1c:abf9 with SMTP id 3f1490d57ef6-e036eb1ec92mr406657276.2.1719896612408;
Mon, 01 Jul 2024 22:03:32 -0700 (PDT)
Received: by 2002:a81:ae02:0:b0:627:7f59:2eee with SMTP id 00721157ae682-64d36134a0ams7b3;
Mon, 1 Jul 2024 19:36:09 -0700 (PDT)
X-Received: by 2002:a05:6902:150d:b0:e03:31ec:8a1d with SMTP id 3f1490d57ef6-e036eae64bamr338066276.3.1719887769124;
Mon, 01 Jul 2024 19:36:09 -0700 (PDT)
Date: Mon, 1 Jul 2024 19:36:08 -0700 (PDT)
From: Antoine Riard <antoine.riard@gmail.com>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Message-Id: <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
In-Reply-To: <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
References: <gnM89sIQ7MhDgI62JciQEGy63DassEv7YZAMhj0IEuIo0EdnafykF6RH4OqjTTHIHsIoZvC2MnTUzJI7EfET4o-UQoD-XAQRDcct994VarE=@protonmail.com>
<72e83c31-408f-4c13-bff5-bf0789302e23n@googlegroups.com>
<heKH68GFJr4Zuf6lBozPJrb-StyBJPMNvmZL0xvKFBnBGVA3fVSgTLdWc-_8igYWX8z3zCGvzflH-CsRv0QCJQcfwizNyYXlBJa_Kteb2zg=@protonmail.com>
<5b0331a5-4e94-465d-a51d-02166e2c1937n@googlegroups.com>
<yt1O1F7NiVj-WkmnYeta1fSqCYNFx8h6OiJaTBmwhmJ2MWAZkmmjPlUST6FM7t6_-2NwWKdglWh77vcnEKA8swiAnQCZJY2SSCAh4DOKt2I=@protonmail.com>
<be78e733-6e9f-4f4e-8dc2-67b79ddbf677n@googlegroups.com>
<jJLDrYTXvTgoslhl1n7Fk9-pL1mMC-0k6gtoniQINmioJpzgtqrJ_WqyFZkLltsCUusnQ4jZ6HbvRC-mGuaUlDi3kcqcFHALd10-JQl-FMY=@protonmail.com>
<9a4c4151-36ed-425a-a535-aa2837919a04n@googlegroups.com>
<3f0064f9-54bd-46a7-9d9a-c54b99aca7b2n@googlegroups.com>
<26b7321b-cc64-44b9-bc95-a4d8feb701e5n@googlegroups.com>
<CALZpt+EwVyaz1=A6hOOycqFGJs+zxyYYocZixTJgVmzZezUs9Q@mail.gmail.com>
<607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com>
<3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_22823_366698941.1719887768873"
X-Original-Sender: antoine.riard@gmail.com
Precedence: list
Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com
List-ID: <bitcoindev.googlegroups.com>
X-Google-Group-Id: 786775582512
List-Post: <https://groups.google.com/group/bitcoindev/post>, <mailto:bitcoindev@googlegroups.com>
List-Help: <https://groups.google.com/support/>, <mailto:bitcoindev+help@googlegroups.com>
List-Archive: <https://groups.google.com/group/bitcoindev
List-Subscribe: <https://groups.google.com/group/bitcoindev/subscribe>, <mailto:bitcoindev+subscribe@googlegroups.com>
List-Unsubscribe: <mailto:googlegroups-manage+786775582512+unsubscribe@googlegroups.com>,
<https://groups.google.com/group/bitcoindev/subscribe>
X-Spam-Score: -0.5 (/)
------=_Part_22823_366698941.1719887768873
Content-Type: multipart/alternative;
boundary="----=_Part_22824_1413333519.1719887768873"
------=_Part_22824_1413333519.1719887768873
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Hi Eric,
> Ok, thanks for clarifying. I'm still not making the connection to=20
"checking a non-null [C] pointer" but that's prob on me.
A C pointer, which is a language idiome assigning to a memory address A the=
=20
value o memory address B can be 0 (or NULL a standard macro defined in=20
stddef.h).
Here a snippet example of linked list code checking the pointer=20
(`*begin_list`) is non null before the comparison operation to find the=20
target element list.
```
pointer_t ft_list_find(pointer_t **start_list, void *data_ref, int=20
(*cmp)())
{
while (*start_list)
{
if (cmp((*start_list)->data, data_ref) =3D=3D 0)
return (*start_list);
*start_list =3D (*start_list)->next;
}
return (0);
}
```
While both libbitcoin and bitcoin core are both written in c++, you still=
=20
have underlying pointer derefencing playing out to access the coinbase
transaction, and all underlying implications in terms of memory management.
> Yes, a rough correlation but not necessarily equivalence. Note that=20
block.check has context free and contextual overrides.
>=20
> The 'bypass' parameter indicates a block under checkpoint or milestone=20
("assume valid"). In this case we must check Merkle root, witness=20
commitment, and both types of malleation - as the purpose is to establish=
=20
identity. Absent 'bypass' the typical checks are performed, and therefore a=
=20
malleation check is not required here. The "type64" malleation is subsumed=
=20
by the is_first_non_coinbase check and the "type32" malleation is subsumed=
=20
by the is_internal_double_spend check.
Yes, I understand it's not a 1-to-1 compatibility, just a rough logical=20
equivalence.
I think it's interesting to point out the two types of malleation that a=20
bitcoin consensus validation logic should respect w.r.t block validity=20
checks.
Like you said the first one on the merkle root committed in the headers's=
=20
`hashMerkleRoot` due to the lack of domain separation between leaf and=20
merkle tree nodes.
The second one is the bip141 wtxid commitment in one of the coinbase=20
transaction `scriptpubkey` output, which is itself covered by a txid in the=
=20
merkle tree.
> Caching identity in the case of invalidity is more interesting question=
=20
than it might seem.
>=20
> Background: A fully-validated block has established identity in its block=
=20
hash. However an invalid block message may include the same block header,=
=20
producing the same hash, but with any kind of nonsense following the=20
header. The purpose of the transaction and witness commitments is of course=
=20
to establish this identity, so these two checks are therefore necessary=20
even under checkpoint/milestone. And then of course the two Merkle tree=20
issues complicate the tx commitment (the integrity of the witness=20
commitment is assured by that of the tx commitment).
>=20
> So what does it mean to speak of a block hash derived from:
>=20
> (1) a block message with an unparseable header?
> (2) a block message with parseable but invalid header?
> (3) a block message with valid header but unparseable tx data?
> (4) a block message with valid header but parseable invalid uncommitted=
=20
tx data?
> (5) a block message with valid header but parseable invalid malleated=20
committed tx data?
> (6) a block message with valid header but parseable invalid unmalleated=
=20
committed tx data?
> (7) a block message with valid header but uncommitted valid tx data?
> (8) a block message with valid header but malleated committed valid tx=20
data?
> (9) a block message with valid header but unmalleated committed valid tx=
=20
data?
>=20
> Note that only the #9 p2p block message contains an actual Bitcoin block,=
=20
the others are bogus messages. In all cases the message can be sha256=20
hashed to establish the identity of the *message*. And if one's objective=
=20
is to reject repeating bogus messages, this might be a useful strategy.=20
It's already part of the p2p protocol, is orders of magnitude cheaper to=20
produce than a Merkle root, and has no identity issues.
I think I mostly agree with the identity issue as laid out so far, there is=
=20
one caveat to add if you're considering identity caching as the problem=20
solved.
A validation node might have to consider differently block messages=20
processed if they connect on the longest most PoW valid chain for which all=
=20
blocks have been validated. Or alternatively if they have to be added on a=
=20
candidate longest most PoW valid chain.
> The concept of Bitcoin block hash as unique identifier for invalid p2p=20
block messages is problematic. Apart from the malleation question, what is=
=20
the Bitcoin block
> hash for a message with unparseable data (#1 and #3)? Such messages are=
=20
trivial to produce and have no block hash.
For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` (in=20
`src/node/connection_types.h`) where some preferential peering policy is=20
applied in matters of block messages download.
> What is the useful identifier for a block with malleated commitments (#5=
=20
and #8) or invalid commitments (#4 and #7) - valid txs or otherwise?
The block header, as it commits to the transaction identifier tree can be=
=20
useful as much for #4 and #5. On the bitcoin core side, about #7 the=20
uncommitted valid tx data can be already present in the validation cache=20
from mempool acceptance. About #8, the malleaed committed valid=20
transactions shall be also committed in the merkle root in headers.
> This seems reasonable at first glance, but given the list of scenarios=20
above, which does it apply to?
> This seems reasonable at first glance, but given the list of scenarios=20
above, which does it apply to? Presumably the invalid header (#2) doesn't=
=20
get this far because of headers-first.
> That leaves just invalid blocks with useful block hash identifiers (#6).=
=20
In all other cases the message is simply discarded. In this case the=20
attempt is to move category #5 into category #6 by prohibiting 64 byte txs.
Yes, it's moving from the category #5 to the category #6. Note, transaction=
=20
malleability can be a distinct issue than lack of domain separation.
> The requirement to "avoid re-downloading and re-validating it" is about=
=20
performance, presumably minimizing initial block download/catch-up time.=20
There is a > computational cost to producing 64 byte malleations and none=
=20
for any of the other bogus block message categories above, including the=20
other form of malleation. > Furthermore, 64 byte malleation has almost zero=
=20
cost to preclude. No hashing and not even true header or tx parsing are=20
required. Only a handful of bytes must be read > from the raw message=20
before it can be discarded presently.
> That's actually far cheaper than any of the other scenarios that again,=
=20
have no cost to produce. The other type of malleation requires parsing all=
=20
of the txs in the block and > hashing and comparing some or all of them. In=
=20
other words, if there is an attack scenario, that must be addressed before=
=20
this can be meaningful. In fact all of the other
> bogus message scenarios (with tx data) will remain more expensive to=20
discard than this one.
In practice on the bitcoin core side, the bogus block message categories=20
from #4 to #6 are already mitigated by validation caching for transactions=
=20
that have been received early. While libbitcoin has no mempool (at least in=
=20
earlier versions) transactions buffering can be done by bip152's=20
HeadersAndShortIds message.
About #7 and #8, introducing a domain separation where 64 bytes=20
transactions are rejected and making it harder to exploit #7 and #8=20
categories of bogus block messages.
This is correct that bitcoin core might accept valid transaction data=20
before the merkle tree commitment has been verified.
> The problem arises from trying to optimize dismissal by storing an=20
identifier. Just *producing* the identifier is orders of magnitude more=20
costly than simply dismissing this > bogus message. I can't imagine why any=
=20
implementation would want to compute and store and retrieve and recompute=
=20
and compare hashes when the alterative is just
> dismissing the bogus messages with no hashing at all.
> Bogus messages will arrive, they do not even have to be requested. The=20
simplest are dealt with by parse failure. What defines a parse is entirely=
=20
subjective. Generally it's
> "structural" but nothing precludes incorporating a requirement for a=20
necessary leading pattern in the stream, sort of like how the witness=20
pattern is identified. If we were
> going to prioritize early dismissal this is where we would put it.
I don't think this is that simple - While producing an identifier comes=20
with a computational cost (e.g fixed 64-byte structured coinbase=20
transaction), if the full node have a hierarchy of validation cache like=20
bitcoin core has already, the cost of bogus block messages can be slashed=
=20
down. On the other hand, just dealing with parse failure on the spot by=20
introducing a leading pattern in the stream just inflates the size of p2p=
=20
messages, and the transaction-relay bandwidth cost.
> However, there is a tradeoff in terms of early dismissal. Looking up=20
invalid hashes is a costly tradeoff, which becomes multiplied by every=20
block validated. For example,
> expending 1 millisecond in hash/lookup to save 1 second of validation=20
time in the failure case seems like a reasonable tradeoff, until you=20
multiply across the whole chain. > 1 ms becomes 14 minutes across the=20
chain, just to save a second for each mallied block encountered. That means=
=20
you need to have encountered 840 such mallied blocks > just to break even.=
=20
Early dismissing the block for non-null coinbase point (without hashing=20
anything) would be on the order of 1000x faster than that (breakeven at 1 >=
=20
encounter). So why the block hash cache requirement? It cannot be applied=
=20
to many scenarios, and cannot be optimal in this one.
I think what you're describing is more a classic time-space tradeoff which=
=20
is well-known in classic computer science litterature. In my reasonable=20
opinion, one should more reason under what is the security paradigm we wish=
=20
for bitcoin block-relay network and perduring decentralization, i.e one=20
where it's easy to verify block messages proofs which could have been=20
generated on specialized hardware with an asymmetric cost. Obviously=20
encountering 840 such malliead blocks to make it break even doesn't make=20
the math up to save on hash lookup, unless you can reduce the attack=20
scenario in terms of adversaries capabilities.
Best,
Antoine=20
Le samedi 29 juin 2024 =C3=A0 21:42:23 UTC+1, Eric Voskuil a =C3=A9crit :
> Caching identity in the case of invalidity is more interesting question=
=20
> than it might seem.
>
> Background: A fully-validated block has established identity in its block=
=20
> hash. However an invalid block message may include the same block header,=
=20
> producing the same hash, but with any kind of nonsense following the=20
> header. The purpose of the transaction and witness commitments is of cour=
se=20
> to establish this identity, so these two checks are therefore necessary=
=20
> even under checkpoint/milestone. And then of course the two Merkle tree=
=20
> issues complicate the tx commitment (the integrity of the witness=20
> commitment is assured by that of the tx commitment).
>
> So what does it mean to speak of a block hash derived from:
>
> (1) a block message with an unparseable header?
> (2) a block message with parseable but invalid header?
> (3) a block message with valid header but unparseable tx data?
> (4) a block message with valid header but parseable invalid uncommitted t=
x=20
> data?
> (5) a block message with valid header but parseable invalid malleated=20
> committed tx data?
> (6) a block message with valid header but parseable invalid unmalleated=
=20
> committed tx data?
> (7) a block message with valid header but uncommitted valid tx data?
> (8) a block message with valid header but malleated committed valid tx=20
> data?
> (9) a block message with valid header but unmalleated committed valid tx=
=20
> data?
>
> Note that only the #9 p2p block message contains an actual Bitcoin block,=
=20
> the others are bogus messages. In all cases the message can be sha256=20
> hashed to establish the identity of the *message*. And if one's objective=
=20
> is to reject repeating bogus messages, this might be a useful strategy.=
=20
> It's already part of the p2p protocol, is orders of magnitude cheaper to=
=20
> produce than a Merkle root, and has no identity issues.
>
> The concept of Bitcoin block hash as unique identifier for invalid p2p=20
> block messages is problematic. Apart from the malleation question, what i=
s=20
> the Bitcoin block hash for a message with unparseable data (#1 and #3)?=
=20
> Such messages are trivial to produce and have no block hash. What is the=
=20
> useful identifier for a block with malleated commitments (#5 and #8) or=
=20
> invalid commitments (#4 and #7) - valid txs or otherwise?
>
> The stated objective for a consensus rule to invalidate all 64 byte txs i=
s:
>
> > being able to cache the hash of a (non-malleated) invalid block as=20
> permanently invalid to avoid re-downloading and re-validating it.
>
> This seems reasonable at first glance, but given the list of scenarios=20
> above, which does it apply to? Presumably the invalid header (#2) doesn't=
=20
> get this far because of headers-first. That leaves just invalid blocks wi=
th=20
> useful block hash identifiers (#6). In all other cases the message is=20
> simply discarded. In this case the attempt is to move category #5 into=20
> category #6 by prohibiting 64 byte txs.
>
> The requirement to "avoid re-downloading and re-validating it" is about=
=20
> performance, presumably minimizing initial block download/catch-up time.=
=20
> There is a computational cost to producing 64 byte malleations and none f=
or=20
> any of the other bogus block message categories above, including the othe=
r=20
> form of malleation. Furthermore, 64 byte malleation has almost zero cost =
to=20
> preclude. No hashing and not even true header or tx parsing are required.=
=20
> Only a handful of bytes must be read from the raw message before it can b=
e=20
> discarded presently.
>
> That's actually far cheaper than any of the other scenarios that again,=
=20
> have no cost to produce. The other type of malleation requires parsing al=
l=20
> of the txs in the block and hashing and comparing some or all of them. In=
=20
> other words, if there is an attack scenario, that must be addressed befor=
e=20
> this can be meaningful. In fact all of the other bogus message scenarios=
=20
> (with tx data) will remain more expensive to discard than this one.
>
> The problem arises from trying to optimize dismissal by storing an=20
> identifier. Just *producing* the identifier is orders of magnitude more=
=20
> costly than simply dismissing this bogus message. I can't imagine why any=
=20
> implementation would want to compute and store and retrieve and recompute=
=20
> and compare hashes when the alterative is just dismissing the bogus=20
> messages with no hashing at all.
>
> Bogus messages will arrive, they do not even have to be requested. The=20
> simplest are dealt with by parse failure. What defines a parse is entirel=
y=20
> subjective. Generally it's "structural" but nothing precludes incorporati=
ng=20
> a requirement for a necessary leading pattern in the stream, sort of like=
=20
> how the witness pattern is identified. If we were going to prioritize ear=
ly=20
> dismissal this is where we would put it.
>
> However, there is a tradeoff in terms of early dismissal. Looking up=20
> invalid hashes is a costly tradeoff, which becomes multiplied by every=20
> block validated. For example, expending 1 millisecond in hash/lookup to=
=20
> save 1 second of validation time in the failure case seems like a=20
> reasonable tradeoff, until you multiply across the whole chain. 1 ms=20
> becomes 14 minutes across the chain, just to save a second for each malli=
ed=20
> block encountered. That means you need to have encountered 840 such malli=
ed=20
> blocks just to break even. Early dismissing the block for non-null coinba=
se=20
> point (without hashing anything) would be on the order of 1000x faster th=
an=20
> that (breakeven at 1 encounter). So why the block hash cache requirement?=
=20
> It cannot be applied to many scenarios, and cannot be optimal in this one=
.
>
> Eric
>
--=20
You received this message because you are subscribed to the Google Groups "=
Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/=
bitcoindev/301c64c7-0f0f-476a-90c4-913659477276n%40googlegroups.com.
------=_Part_22824_1413333519.1719887768873
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Hi Eric,<div><br /></div><div>> Ok, thanks for clarifying. I'm still not=
making the connection to "checking a non-null [C] pointer" but that's prob=
on me.<br /><br />A C pointer, which is a language idiome assigning to a m=
emory address A the value o memory address B can be 0 (or NULL a standard m=
acro defined in stddef.h).<br /><br />Here a snippet example of linked list=
code checking the pointer (`*begin_list`) is non null before the compariso=
n operation to find the target element list.<br /><br />```<br />pointer_t =
=C2=A0 =C2=A0 =C2=A0 ft_list_find(pointer_t **start_list, void *data_ref, i=
nt (*cmp)())<br />{<br />=C2=A0 =C2=A0 =C2=A0 =C2=A0 while (*start_list)<br=
/>=C2=A0 =C2=A0 =C2=A0 =C2=A0 {<br />=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 if (cmp((*start_list)->data, data_ref) =3D=3D 0)<br=
/>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 return (*start_list);<br />=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
=C2=A0 =C2=A0 =C2=A0 *start_list =3D (*start_list)->next;<br />=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 }<br />=C2=A0 =C2=A0 =C2=A0 =C2=A0 return (0);<br />}<=
/div><div>```</div><div><br /></div><div>While both libbitcoin and bitcoin =
core are both written in c++, you still have underlying pointer derefencing=
playing out to access the coinbase<br />transaction, and all underlying im=
plications in terms of memory management.<br /><br />> Yes, a rough corr=
elation but not necessarily equivalence. Note that block.check has context =
free and contextual overrides.<br />> <br />> The 'bypass' parameter =
indicates a block under checkpoint or milestone ("assume valid"). In this c=
ase we must check Merkle root, witness commitment, and both types of mallea=
tion - as the purpose is to establish identity. Absent 'bypass' the typical=
checks are performed, and therefore a malleation check is not required her=
e. The "type64" malleation is subsumed by the is_first_non_coinbase check a=
nd the "type32" malleation is subsumed by the is_internal_double_spend chec=
k.<br /><br />Yes, I understand it's not a 1-to-1 compatibility, just a rou=
gh logical equivalence.<br /><br />I think it's interesting to point out th=
e two types of malleation that a bitcoin consensus validation logic should =
respect w.r.t block validity checks.<br /><br />Like you said the first one=
on the merkle root committed in the headers's `hashMerkleRoot` due to the =
lack of domain separation between leaf and merkle tree nodes.<br />The seco=
nd one is the bip141 wtxid commitment in one of the coinbase transaction `s=
criptpubkey` output,=C2=A0which is itself covered by a txid in the merkle t=
ree.</div><div><br /></div><div>> Caching identity in the case of invali=
dity is more interesting question than it might seem.<br />> <br />> =
Background: A fully-validated block has established identity in its block h=
ash. However an invalid block message may include the same block header, pr=
oducing the same hash, but with any kind of nonsense following the header. =
The purpose of the transaction and witness commitments is of course to esta=
blish this identity, so these two checks are therefore necessary even under=
checkpoint/milestone. And then of course the two Merkle tree issues compli=
cate the tx commitment (the integrity of the witness commitment is assured =
by that of the tx commitment).<br />> <br />> So what does it mean to=
speak of a block hash derived from:<br />> <br />> (1) a block messa=
ge with an unparseable header?<br />> (2) a block message with parseable=
but invalid header?<br />> (3) a block message with valid header but un=
parseable tx data?<br />> (4) a block message with valid header but pars=
eable invalid uncommitted tx data?<br />> (5) a block message with valid=
header but parseable invalid malleated committed tx data?<br />> (6) a =
block message with valid header but parseable invalid unmalleated committed=
tx data?<br />> (7) a block message with valid header but uncommitted v=
alid tx data?<br />> (8) a block message with valid header but malleated=
committed valid tx data?<br />> (9) a block message with valid header b=
ut unmalleated committed valid tx data?<br />> <br />> Note that only=
the #9 p2p block message contains an actual Bitcoin block, the others are =
bogus messages. In all cases the message can be sha256 hashed to establish =
the identity of the *message*. And if one's objective is to reject repeatin=
g bogus messages, this might be a useful strategy. It's already part of the=
p2p protocol, is orders of magnitude cheaper to produce than a Merkle root=
, and has no identity issues.<br /><br /></div><div>I think I mostly agree =
with the identity issue as laid out so far, there is one caveat to add if y=
ou're considering identity caching as the problem solved.<br />A validation=
node might have to consider differently block messages processed if they c=
onnect on the longest most PoW valid chain for which all blocks have been v=
alidated. Or alternatively if they have to be added on a candidate longest =
most PoW valid chain.<br /><br />> The concept of Bitcoin block hash as =
unique identifier for invalid p2p block messages is problematic. Apart from=
the malleation question, what is the Bitcoin block<br />> hash for a me=
ssage with unparseable data (#1 and #3)? Such messages are trivial to produ=
ce and have no block hash.<br /><br />For reasons, bitcoin core has the con=
cept of outbound `BLOCK_RELAY` (in `src/node/connection_types.h`) where som=
e preferential peering policy is applied in matters of block messages downl=
oad.<br /><br />> What is the useful identifier for a block with malleat=
ed commitments (#5 and #8) or invalid commitments (#4 and #7) - valid txs o=
r otherwise?<br /><br />The block header, as it commits to the transaction =
identifier tree can be useful as much for #4 and #5. On the bitcoin core si=
de, about #7 the uncommitted valid tx data can be already present in the va=
lidation cache from mempool acceptance. About #8, the malleaed committed va=
lid transactions shall be also committed in the merkle root in headers.<br =
/><br />> This seems reasonable at first glance, but given the list of s=
cenarios above, which does it apply to?</div><div><br /></div>> This see=
ms reasonable at first glance, but given the list of scenarios above, which=
does it apply to? Presumably the invalid header (#2) doesn't get this far =
because of headers-first.<br />> That leaves just invalid blocks with us=
eful block hash identifiers (#6). In all other cases the message is simply =
discarded. In this case the attempt is to move category #5 into category #6=
by prohibiting 64 byte txs.<br /><br />Yes, it's moving from the category =
#5 to the category #6. Note, transaction malleability can be a distinct iss=
ue than lack of domain separation.<br /><br />> The requirement to "avoi=
d re-downloading and re-validating it" is about performance, presumably min=
imizing initial block download/catch-up time. There is a > computational=
cost to producing 64 byte malleations and none for any of the other bogus =
block message categories above, including the other form of malleation. >=
; Furthermore, 64 byte malleation has almost zero cost to preclude. No hash=
ing and not even true header or tx parsing are required. Only a handful of =
bytes must be read > from the raw message before it can be discarded pre=
sently.<br /><br />> That's actually far cheaper than any of the other s=
cenarios that again, have no cost to produce. The other type of malleation =
requires parsing all of the txs in the block and > hashing and comparing=
some or all of them. In other words, if there is an attack scenario, that =
must be addressed before this can be meaningful. In fact all of the other<d=
iv>> bogus message scenarios (with tx data) will remain more expensive t=
o discard than this one.<br /><br />In practice on the bitcoin core side, t=
he bogus block message categories from #4 to #6 are already mitigated by va=
lidation caching for transactions that have been received early. While libb=
itcoin has no mempool (at least in earlier versions) transactions buffering=
can be done by bip152's HeadersAndShortIds message.<br /><br />About #7 an=
d #8, introducing a domain separation where 64 bytes transactions are rejec=
ted and making it harder to exploit #7 and #8 categories of bogus block mes=
sages.<br /><div>This is correct that bitcoin core might accept valid trans=
action data before the merkle tree commitment has been verified.</div><div>=
<br /></div>> The problem arises from trying to optimize dismissal by st=
oring an identifier. Just *producing* the identifier is orders of magnitude=
more costly than simply dismissing this > bogus message. I can't imagin=
e why any implementation would want to compute and store and retrieve and r=
ecompute and compare hashes when the alterative is just<div>> dismissing=
the bogus messages with no hashing at all.<br /><br />> Bogus messages =
will arrive, they do not even have to be requested. The simplest are dealt =
with by parse failure. What defines a parse is entirely subjective. General=
ly it's<div>> "structural" but nothing precludes incorporating a require=
ment for a necessary leading pattern in the stream, sort of like how the wi=
tness pattern is identified. If we were</div><div>> going to prioritize =
early dismissal this is where we would put it.<div><br />I don't think this=
is that simple - While producing an identifier comes with a computational =
cost (e.g fixed 64-byte structured coinbase transaction), if the full node =
have a hierarchy of validation cache like bitcoin core has already, the cos=
t of bogus block messages can be slashed down. On the other hand, just deal=
ing with parse failure on the spot by introducing a leading pattern in the =
stream just inflates the size of p2p messages, and the transaction-relay ba=
ndwidth cost.<br /><br />> However, there is a tradeoff in terms of earl=
y dismissal. Looking up invalid hashes is a costly tradeoff, which becomes =
multiplied by every block validated. For example,</div><div>> expending =
1 millisecond in hash/lookup to save 1 second of validation time in the fai=
lure case seems like a reasonable tradeoff, until you multiply across the w=
hole chain. > 1 ms becomes 14 minutes across the chain, just to save a s=
econd for each mallied block encountered. That means you need to have encou=
ntered 840 such mallied blocks > just to break even. Early dismissing th=
e block for non-null coinbase point (without hashing anything) would be on =
the order of 1000x faster than that (breakeven at 1 > encounter). So why=
the block hash cache requirement? It cannot be applied to many scenarios, =
and cannot be optimal in this one.<br /><br />I think what you're describin=
g is more a classic time-space tradeoff which is well-known in classic comp=
uter science litterature. In my reasonable opinion, one should more reason =
under what is the security paradigm we wish for bitcoin block-relay network=
and perduring decentralization, i.e one where it's easy to verify block me=
ssages proofs which could have been generated on specialized hardware with =
an asymmetric cost. Obviously encountering 840 such malliead blocks to make=
it break even doesn't make the math up to save on hash lookup, unless you =
can reduce the attack scenario in terms of adversaries capabilities.<br /><=
br />Best,<br /><div>Antoine=C2=A0</div></div></div></div></div><div class=
=3D"gmail_quote"><div dir=3D"auto" class=3D"gmail_attr">Le samedi 29 juin 2=
024 =C3=A0 21:42:23 UTC+1, Eric Voskuil a =C3=A9crit=C2=A0:<br/></div><bloc=
kquote class=3D"gmail_quote" style=3D"margin: 0 0 0 0.8ex; border-left: 1px=
solid rgb(204, 204, 204); padding-left: 1ex;">Caching identity in the case=
of invalidity is more interesting question than it might seem.<br><br>Back=
ground: A fully-validated block has established identity in its block hash.=
However an invalid block message may include the same block header, produc=
ing the same hash, but with any kind of nonsense following the header. The =
purpose of the transaction and witness commitments is of course to establis=
h this identity, so these two checks are therefore necessary even under che=
ckpoint/milestone. And then of course the two Merkle tree issues complicate=
the tx commitment (the integrity of the witness commitment is assured by t=
hat of the tx commitment).<br><br>So what does it mean to speak of a block =
hash derived from:<br><br>(1) a block message with an unparseable header?<b=
r>(2) a block message with parseable but invalid header?<br>(3) a block mes=
sage with valid header but unparseable tx data?<br>(4) a block message with=
valid header but parseable invalid uncommitted tx data?<br>(5) a block mes=
sage with valid header but parseable invalid malleated committed tx data?<b=
r>(6) a block message with valid header but parseable invalid unmalleated c=
ommitted tx data?<br>(7) a block message with valid header but uncommitted =
valid tx data?<br>(8) a block message with valid header but malleated commi=
tted valid tx data?<br>(9) a block message with valid header but unmalleate=
d committed valid tx data?<br><br>Note that only the #9 p2p block message c=
ontains an actual Bitcoin block, the others are bogus messages. In all case=
s the message can be sha256 hashed to establish the identity of the *messag=
e*. And if one's objective is to reject repeating bogus messages, this =
might be a useful strategy. It's already part of the p2p protocol, is o=
rders of magnitude cheaper to produce than a Merkle root, and has no identi=
ty issues.<br><br>The concept of Bitcoin block hash as unique identifier fo=
r invalid p2p block messages is problematic. Apart from the malleation ques=
tion, what is the Bitcoin block hash for a message with unparseable data (#=
1 and #3)? Such messages are trivial to produce and have no block hash. Wha=
t is the useful identifier for a block with malleated commitments (#5 and #=
8) or invalid commitments (#4 and #7) - valid txs or otherwise?<br><br>The =
stated objective for a consensus rule to invalidate all 64 byte txs is:<br>=
<br>> being able to cache the hash of a (non-malleated) invalid block as=
permanently invalid to avoid re-downloading and re-validating it.<br><br>T=
his seems reasonable at first glance, but given the list of scenarios above=
, which does it apply to? Presumably the invalid header (#2) doesn't ge=
t this far because of headers-first. That leaves just invalid blocks with u=
seful block hash identifiers (#6). In all other cases the message is simply=
discarded. In this case the attempt is to move category #5 into category #=
6 by prohibiting 64 byte txs.<br><br>The requirement to "avoid re-down=
loading and re-validating it" is about performance, presumably minimiz=
ing initial block download/catch-up time. There is a computational cost to =
producing 64 byte malleations and none for any of the other bogus block mes=
sage categories above, including the other form of malleation. Furthermore,=
64 byte malleation has almost zero cost to preclude. No hashing and not ev=
en true header or tx parsing are required. Only a handful of bytes must be =
read from the raw message before it can be discarded presently.<br><br>That=
's actually far cheaper than any of the other scenarios that again, hav=
e no cost to produce. The other type of malleation requires parsing all of =
the txs in the block and hashing and comparing some or all of them. In othe=
r words, if there is an attack scenario, that must be addressed before this=
can be meaningful. In fact all of the other bogus message scenarios (with =
tx data) will remain more expensive to discard than this one.<br><br>The pr=
oblem arises from trying to optimize dismissal by storing an identifier. Ju=
st *producing* the identifier is orders of magnitude more costly than simpl=
y dismissing this bogus message. I can't imagine why any implementation=
would want to compute and store and retrieve and recompute and compare has=
hes when the alterative is just dismissing the bogus messages with no hashi=
ng at all.<br><br>Bogus messages will arrive, they do not even have to be r=
equested. The simplest are dealt with by parse failure. What defines a pars=
e is entirely subjective. Generally it's "structural" but not=
hing precludes incorporating a requirement for a necessary leading pattern =
in the stream, sort of like how the witness pattern is identified. If we we=
re going to prioritize early dismissal this is where we would put it.<br><b=
r>However, there is a tradeoff in terms of early dismissal. Looking up inva=
lid hashes is a costly tradeoff, which becomes multiplied by every block va=
lidated. For example, expending 1 millisecond in hash/lookup to save 1 seco=
nd of validation time in the failure case seems like a reasonable tradeoff,=
until you multiply across the whole chain. 1 ms becomes 14 minutes across =
the chain, just to save a second for each mallied block encountered. That m=
eans you need to have encountered 840 such mallied blocks just to break eve=
n. Early dismissing the block for non-null coinbase point (without hashing =
anything) would be on the order of 1000x faster than that (breakeven at 1 e=
ncounter). So why the block hash cache requirement? It cannot be applied to=
many scenarios, and cannot be optimal in this one.<br><br>Eric<br></blockq=
uote></div>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;Bitcoin Development Mailing List" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:bitcoindev+unsubscribe@googlegroups.com">bitcoind=
ev+unsubscribe@googlegroups.com</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/d/msgid/bitcoindev/301c64c7-0f0f-476a-90c4-913659477276n%40googlegroups.=
com?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.com/d/msg=
id/bitcoindev/301c64c7-0f0f-476a-90c4-913659477276n%40googlegroups.com</a>.=
<br />
------=_Part_22824_1413333519.1719887768873--
------=_Part_22823_366698941.1719887768873--
|