Delivery-date: Mon, 01 Jul 2024 22:03:42 -0700 Received: from mail-yb1-f186.google.com ([209.85.219.186]) by mail.fairlystable.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sOVg4-0003Pg-CM for bitcoindev@gnusha.org; Mon, 01 Jul 2024 22:03:42 -0700 Received: by mail-yb1-f186.google.com with SMTP id 3f1490d57ef6-dc691f1f83asf1909934276.1 for ; Mon, 01 Jul 2024 22:03:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlegroups.com; s=20230601; t=1719896614; x=1720501414; darn=gnusha.org; h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post :list-id:mailing-list:precedence:x-original-sender:mime-version :subject:references:in-reply-to:message-id:to:from:date:sender:from :to:cc:subject:date:message-id:reply-to; bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=; b=Aa690xSGATVAzXlzCZcYaPMjHa1PeCgjMHAO+V74mOIAOpIZ2MhjCdWnq1Z9iOu9rR n8QZC0PVzv8jRJT7AfFxmiOZ0PA8N8YEHcVyK7qxlGZFaPF6ucl5aGzFW28ykNFzSQqI K7wvrbjGtI0g0DHWoJIsCuXO5mXAU3Aw8I0siZ/WQu4q7+3advGSfDYHndpzbJOMIlW6 2F0qcuFhGHr7Eu1IuXpDeNXcSoK4swJW/oGWSSiIP5IL698+tqe224u/119cbMLuyL8L 0gFBHrhUfQLb2wAeWNIh3spY1LAD6AJgrJFycuRinGgGKMm7rUSoKvcjhPJ2GDxCdupa /ycA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1719896614; x=1720501414; darn=gnusha.org; h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post :list-id:mailing-list:precedence:x-original-sender:mime-version :subject:references:in-reply-to:message-id:to:from:date:from:to:cc :subject:date:message-id:reply-to; bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=; b=imDVMNln+4Bjv2z7hdJiTBE17WjjjhAgi/hJSmzBmDpw6y37ypJC7lzkL5/o4fUXmy QRx7REOvFZpfRMXtNtwL8fkyyauAZsOUF8heaUTYkQHky1HmO5UyQElIKl2zMxLZIZG/ IM/B3pKQg9ZHpA9SdJT8KRuPpubVk6KB7FtmLfZ7YXcm02oKy67JzY57akHf7ZgahPTZ tx2fvRMOO53asCA3l85r+r3Dx1AnNQmqRQCplj07mU5QyTXcZSXTVDWN4oHfn0fc5rVr 6N86Y8p6+S44lkMhvcA4eIrbVDkAZtTL9N2/wji9In6HQo6yAupMz5lC4ukNO/iCzV0I bFlA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719896614; x=1720501414; h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post :list-id:mailing-list:precedence:x-original-sender:mime-version :subject:references:in-reply-to:message-id:to:from:date:x-beenthere :x-gm-message-state:sender:from:to:cc:subject:date:message-id :reply-to; bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=; b=CFTYFGlTn6ii/3CJgOthF0IZ9gwtHgz4FN5KZ9tJjYzB/fB5aSlATewQlop4Wxhwoq 068FSVXHFQlm0Zc00hgEjGH5LdNUIr0LLMEv44hB+4nvWXiQDi1h1nCJJipSs28R7F0B CCY0K+A5FCD/1dgtft1PuBs8ZT+zWL6N2p1QrMDa0r9R5xGs1OP/YyRT+pHX5C4uwgsv Z/v9kD8hUKQSa7gHt9urUwVAvupZnqDTaaDs6E2T0cme9H9+LmMtSlRVIlGVWHw9sN01 FQvmhRCYLfUMVdQseewAX7SZgubi+wH/SY9QtaD50+/lnmIU5VOrh6Fjy3THt+RN20WO Qraw== Sender: bitcoindev@googlegroups.com X-Forwarded-Encrypted: i=1; AJvYcCXgMCdEwDaFGLNXi/OQQ+G5rw1srEZFEjjguUKOAyYdrwTOVz1C993vTiZWM8lucPU0GIZZNQnDc9GiqsgJw8sObzqfy64= X-Gm-Message-State: AOJu0YwCGjpR8VTxzP92RWshAk5rh3BwgLotI0YdrngCv5wgDY8oRIiY eBP9MXcoLjEoZ/EUDQVVoi47HXztK3rBW7RM6fTGXXXo//93vVd7 X-Google-Smtp-Source: AGHT+IFRjvTmIYpVJKgh2S0Q357sDrV0V7L75NRvIdG4dOWX2YQSqk3vS25p9bU+ETXNwVVPJgzweg== X-Received: by 2002:a25:abea:0:b0:e03:359a:6a54 with SMTP id 3f1490d57ef6-e036df3bcecmr5070667276.6.1719896613816; Mon, 01 Jul 2024 22:03:33 -0700 (PDT) X-BeenThere: bitcoindev@googlegroups.com Received: by 2002:a05:6902:1005:b0:e03:2217:3c88 with SMTP id 3f1490d57ef6-e0356250cd3ls1670233276.1.-pod-prod-05-us; Mon, 01 Jul 2024 22:03:32 -0700 (PDT) X-Received: by 2002:a05:6902:188e:b0:dfb:1c1c:abf9 with SMTP id 3f1490d57ef6-e036eb1ec92mr406657276.2.1719896612408; Mon, 01 Jul 2024 22:03:32 -0700 (PDT) Received: by 2002:a81:ae02:0:b0:627:7f59:2eee with SMTP id 00721157ae682-64d36134a0ams7b3; Mon, 1 Jul 2024 19:36:09 -0700 (PDT) X-Received: by 2002:a05:6902:150d:b0:e03:31ec:8a1d with SMTP id 3f1490d57ef6-e036eae64bamr338066276.3.1719887769124; Mon, 01 Jul 2024 19:36:09 -0700 (PDT) Date: Mon, 1 Jul 2024 19:36:08 -0700 (PDT) From: Antoine Riard To: Bitcoin Development Mailing List Message-Id: <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com> In-Reply-To: <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com> References: <72e83c31-408f-4c13-bff5-bf0789302e23n@googlegroups.com> <5b0331a5-4e94-465d-a51d-02166e2c1937n@googlegroups.com> <9a4c4151-36ed-425a-a535-aa2837919a04n@googlegroups.com> <3f0064f9-54bd-46a7-9d9a-c54b99aca7b2n@googlegroups.com> <26b7321b-cc64-44b9-bc95-a4d8feb701e5n@googlegroups.com> <607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com> <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com> Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_22823_366698941.1719887768873" X-Original-Sender: antoine.riard@gmail.com Precedence: list Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com List-ID: X-Google-Group-Id: 786775582512 List-Post: , List-Help: , List-Archive: , List-Unsubscribe: , X-Spam-Score: -0.5 (/) ------=_Part_22823_366698941.1719887768873 Content-Type: multipart/alternative; boundary="----=_Part_22824_1413333519.1719887768873" ------=_Part_22824_1413333519.1719887768873 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Eric, > Ok, thanks for clarifying. I'm still not making the connection to=20 "checking a non-null [C] pointer" but that's prob on me. A C pointer, which is a language idiome assigning to a memory address A the= =20 value o memory address B can be 0 (or NULL a standard macro defined in=20 stddef.h). Here a snippet example of linked list code checking the pointer=20 (`*begin_list`) is non null before the comparison operation to find the=20 target element list. ``` pointer_t ft_list_find(pointer_t **start_list, void *data_ref, int=20 (*cmp)()) { while (*start_list) { if (cmp((*start_list)->data, data_ref) =3D=3D 0) return (*start_list); *start_list =3D (*start_list)->next; } return (0); } ``` While both libbitcoin and bitcoin core are both written in c++, you still= =20 have underlying pointer derefencing playing out to access the coinbase transaction, and all underlying implications in terms of memory management. > Yes, a rough correlation but not necessarily equivalence. Note that=20 block.check has context free and contextual overrides. >=20 > The 'bypass' parameter indicates a block under checkpoint or milestone=20 ("assume valid"). In this case we must check Merkle root, witness=20 commitment, and both types of malleation - as the purpose is to establish= =20 identity. Absent 'bypass' the typical checks are performed, and therefore a= =20 malleation check is not required here. The "type64" malleation is subsumed= =20 by the is_first_non_coinbase check and the "type32" malleation is subsumed= =20 by the is_internal_double_spend check. Yes, I understand it's not a 1-to-1 compatibility, just a rough logical=20 equivalence. I think it's interesting to point out the two types of malleation that a=20 bitcoin consensus validation logic should respect w.r.t block validity=20 checks. Like you said the first one on the merkle root committed in the headers's= =20 `hashMerkleRoot` due to the lack of domain separation between leaf and=20 merkle tree nodes. The second one is the bip141 wtxid commitment in one of the coinbase=20 transaction `scriptpubkey` output, which is itself covered by a txid in the= =20 merkle tree. > Caching identity in the case of invalidity is more interesting question= =20 than it might seem. >=20 > Background: A fully-validated block has established identity in its block= =20 hash. However an invalid block message may include the same block header,= =20 producing the same hash, but with any kind of nonsense following the=20 header. The purpose of the transaction and witness commitments is of course= =20 to establish this identity, so these two checks are therefore necessary=20 even under checkpoint/milestone. And then of course the two Merkle tree=20 issues complicate the tx commitment (the integrity of the witness=20 commitment is assured by that of the tx commitment). >=20 > So what does it mean to speak of a block hash derived from: >=20 > (1) a block message with an unparseable header? > (2) a block message with parseable but invalid header? > (3) a block message with valid header but unparseable tx data? > (4) a block message with valid header but parseable invalid uncommitted= =20 tx data? > (5) a block message with valid header but parseable invalid malleated=20 committed tx data? > (6) a block message with valid header but parseable invalid unmalleated= =20 committed tx data? > (7) a block message with valid header but uncommitted valid tx data? > (8) a block message with valid header but malleated committed valid tx=20 data? > (9) a block message with valid header but unmalleated committed valid tx= =20 data? >=20 > Note that only the #9 p2p block message contains an actual Bitcoin block,= =20 the others are bogus messages. In all cases the message can be sha256=20 hashed to establish the identity of the *message*. And if one's objective= =20 is to reject repeating bogus messages, this might be a useful strategy.=20 It's already part of the p2p protocol, is orders of magnitude cheaper to=20 produce than a Merkle root, and has no identity issues. I think I mostly agree with the identity issue as laid out so far, there is= =20 one caveat to add if you're considering identity caching as the problem=20 solved. A validation node might have to consider differently block messages=20 processed if they connect on the longest most PoW valid chain for which all= =20 blocks have been validated. Or alternatively if they have to be added on a= =20 candidate longest most PoW valid chain. > The concept of Bitcoin block hash as unique identifier for invalid p2p=20 block messages is problematic. Apart from the malleation question, what is= =20 the Bitcoin block > hash for a message with unparseable data (#1 and #3)? Such messages are= =20 trivial to produce and have no block hash. For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` (in=20 `src/node/connection_types.h`) where some preferential peering policy is=20 applied in matters of block messages download. > What is the useful identifier for a block with malleated commitments (#5= =20 and #8) or invalid commitments (#4 and #7) - valid txs or otherwise? The block header, as it commits to the transaction identifier tree can be= =20 useful as much for #4 and #5. On the bitcoin core side, about #7 the=20 uncommitted valid tx data can be already present in the validation cache=20 from mempool acceptance. About #8, the malleaed committed valid=20 transactions shall be also committed in the merkle root in headers. > This seems reasonable at first glance, but given the list of scenarios=20 above, which does it apply to? > This seems reasonable at first glance, but given the list of scenarios=20 above, which does it apply to? Presumably the invalid header (#2) doesn't= =20 get this far because of headers-first. > That leaves just invalid blocks with useful block hash identifiers (#6).= =20 In all other cases the message is simply discarded. In this case the=20 attempt is to move category #5 into category #6 by prohibiting 64 byte txs. Yes, it's moving from the category #5 to the category #6. Note, transaction= =20 malleability can be a distinct issue than lack of domain separation. > The requirement to "avoid re-downloading and re-validating it" is about= =20 performance, presumably minimizing initial block download/catch-up time.=20 There is a > computational cost to producing 64 byte malleations and none= =20 for any of the other bogus block message categories above, including the=20 other form of malleation. > Furthermore, 64 byte malleation has almost zero= =20 cost to preclude. No hashing and not even true header or tx parsing are=20 required. Only a handful of bytes must be read > from the raw message=20 before it can be discarded presently. > That's actually far cheaper than any of the other scenarios that again,= =20 have no cost to produce. The other type of malleation requires parsing all= =20 of the txs in the block and > hashing and comparing some or all of them. In= =20 other words, if there is an attack scenario, that must be addressed before= =20 this can be meaningful. In fact all of the other > bogus message scenarios (with tx data) will remain more expensive to=20 discard than this one. In practice on the bitcoin core side, the bogus block message categories=20 from #4 to #6 are already mitigated by validation caching for transactions= =20 that have been received early. While libbitcoin has no mempool (at least in= =20 earlier versions) transactions buffering can be done by bip152's=20 HeadersAndShortIds message. About #7 and #8, introducing a domain separation where 64 bytes=20 transactions are rejected and making it harder to exploit #7 and #8=20 categories of bogus block messages. This is correct that bitcoin core might accept valid transaction data=20 before the merkle tree commitment has been verified. > The problem arises from trying to optimize dismissal by storing an=20 identifier. Just *producing* the identifier is orders of magnitude more=20 costly than simply dismissing this > bogus message. I can't imagine why any= =20 implementation would want to compute and store and retrieve and recompute= =20 and compare hashes when the alterative is just > dismissing the bogus messages with no hashing at all. > Bogus messages will arrive, they do not even have to be requested. The=20 simplest are dealt with by parse failure. What defines a parse is entirely= =20 subjective. Generally it's > "structural" but nothing precludes incorporating a requirement for a=20 necessary leading pattern in the stream, sort of like how the witness=20 pattern is identified. If we were > going to prioritize early dismissal this is where we would put it. I don't think this is that simple - While producing an identifier comes=20 with a computational cost (e.g fixed 64-byte structured coinbase=20 transaction), if the full node have a hierarchy of validation cache like=20 bitcoin core has already, the cost of bogus block messages can be slashed= =20 down. On the other hand, just dealing with parse failure on the spot by=20 introducing a leading pattern in the stream just inflates the size of p2p= =20 messages, and the transaction-relay bandwidth cost. > However, there is a tradeoff in terms of early dismissal. Looking up=20 invalid hashes is a costly tradeoff, which becomes multiplied by every=20 block validated. For example, > expending 1 millisecond in hash/lookup to save 1 second of validation=20 time in the failure case seems like a reasonable tradeoff, until you=20 multiply across the whole chain. > 1 ms becomes 14 minutes across the=20 chain, just to save a second for each mallied block encountered. That means= =20 you need to have encountered 840 such mallied blocks > just to break even.= =20 Early dismissing the block for non-null coinbase point (without hashing=20 anything) would be on the order of 1000x faster than that (breakeven at 1 >= =20 encounter). So why the block hash cache requirement? It cannot be applied= =20 to many scenarios, and cannot be optimal in this one. I think what you're describing is more a classic time-space tradeoff which= =20 is well-known in classic computer science litterature. In my reasonable=20 opinion, one should more reason under what is the security paradigm we wish= =20 for bitcoin block-relay network and perduring decentralization, i.e one=20 where it's easy to verify block messages proofs which could have been=20 generated on specialized hardware with an asymmetric cost. Obviously=20 encountering 840 such malliead blocks to make it break even doesn't make=20 the math up to save on hash lookup, unless you can reduce the attack=20 scenario in terms of adversaries capabilities. Best, Antoine=20 Le samedi 29 juin 2024 =C3=A0 21:42:23 UTC+1, Eric Voskuil a =C3=A9crit : > Caching identity in the case of invalidity is more interesting question= =20 > than it might seem. > > Background: A fully-validated block has established identity in its block= =20 > hash. However an invalid block message may include the same block header,= =20 > producing the same hash, but with any kind of nonsense following the=20 > header. The purpose of the transaction and witness commitments is of cour= se=20 > to establish this identity, so these two checks are therefore necessary= =20 > even under checkpoint/milestone. And then of course the two Merkle tree= =20 > issues complicate the tx commitment (the integrity of the witness=20 > commitment is assured by that of the tx commitment). > > So what does it mean to speak of a block hash derived from: > > (1) a block message with an unparseable header? > (2) a block message with parseable but invalid header? > (3) a block message with valid header but unparseable tx data? > (4) a block message with valid header but parseable invalid uncommitted t= x=20 > data? > (5) a block message with valid header but parseable invalid malleated=20 > committed tx data? > (6) a block message with valid header but parseable invalid unmalleated= =20 > committed tx data? > (7) a block message with valid header but uncommitted valid tx data? > (8) a block message with valid header but malleated committed valid tx=20 > data? > (9) a block message with valid header but unmalleated committed valid tx= =20 > data? > > Note that only the #9 p2p block message contains an actual Bitcoin block,= =20 > the others are bogus messages. In all cases the message can be sha256=20 > hashed to establish the identity of the *message*. And if one's objective= =20 > is to reject repeating bogus messages, this might be a useful strategy.= =20 > It's already part of the p2p protocol, is orders of magnitude cheaper to= =20 > produce than a Merkle root, and has no identity issues. > > The concept of Bitcoin block hash as unique identifier for invalid p2p=20 > block messages is problematic. Apart from the malleation question, what i= s=20 > the Bitcoin block hash for a message with unparseable data (#1 and #3)?= =20 > Such messages are trivial to produce and have no block hash. What is the= =20 > useful identifier for a block with malleated commitments (#5 and #8) or= =20 > invalid commitments (#4 and #7) - valid txs or otherwise? > > The stated objective for a consensus rule to invalidate all 64 byte txs i= s: > > > being able to cache the hash of a (non-malleated) invalid block as=20 > permanently invalid to avoid re-downloading and re-validating it. > > This seems reasonable at first glance, but given the list of scenarios=20 > above, which does it apply to? Presumably the invalid header (#2) doesn't= =20 > get this far because of headers-first. That leaves just invalid blocks wi= th=20 > useful block hash identifiers (#6). In all other cases the message is=20 > simply discarded. In this case the attempt is to move category #5 into=20 > category #6 by prohibiting 64 byte txs. > > The requirement to "avoid re-downloading and re-validating it" is about= =20 > performance, presumably minimizing initial block download/catch-up time.= =20 > There is a computational cost to producing 64 byte malleations and none f= or=20 > any of the other bogus block message categories above, including the othe= r=20 > form of malleation. Furthermore, 64 byte malleation has almost zero cost = to=20 > preclude. No hashing and not even true header or tx parsing are required.= =20 > Only a handful of bytes must be read from the raw message before it can b= e=20 > discarded presently. > > That's actually far cheaper than any of the other scenarios that again,= =20 > have no cost to produce. The other type of malleation requires parsing al= l=20 > of the txs in the block and hashing and comparing some or all of them. In= =20 > other words, if there is an attack scenario, that must be addressed befor= e=20 > this can be meaningful. In fact all of the other bogus message scenarios= =20 > (with tx data) will remain more expensive to discard than this one. > > The problem arises from trying to optimize dismissal by storing an=20 > identifier. Just *producing* the identifier is orders of magnitude more= =20 > costly than simply dismissing this bogus message. I can't imagine why any= =20 > implementation would want to compute and store and retrieve and recompute= =20 > and compare hashes when the alterative is just dismissing the bogus=20 > messages with no hashing at all. > > Bogus messages will arrive, they do not even have to be requested. The=20 > simplest are dealt with by parse failure. What defines a parse is entirel= y=20 > subjective. Generally it's "structural" but nothing precludes incorporati= ng=20 > a requirement for a necessary leading pattern in the stream, sort of like= =20 > how the witness pattern is identified. If we were going to prioritize ear= ly=20 > dismissal this is where we would put it. > > However, there is a tradeoff in terms of early dismissal. Looking up=20 > invalid hashes is a costly tradeoff, which becomes multiplied by every=20 > block validated. For example, expending 1 millisecond in hash/lookup to= =20 > save 1 second of validation time in the failure case seems like a=20 > reasonable tradeoff, until you multiply across the whole chain. 1 ms=20 > becomes 14 minutes across the chain, just to save a second for each malli= ed=20 > block encountered. That means you need to have encountered 840 such malli= ed=20 > blocks just to break even. Early dismissing the block for non-null coinba= se=20 > point (without hashing anything) would be on the order of 1000x faster th= an=20 > that (breakeven at 1 encounter). So why the block hash cache requirement?= =20 > It cannot be applied to many scenarios, and cannot be optimal in this one= . > > Eric > --=20 You received this message because you are subscribed to the Google Groups "= Bitcoin Development Mailing List" group. To unsubscribe from this group and stop receiving emails from it, send an e= mail to bitcoindev+unsubscribe@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/= bitcoindev/301c64c7-0f0f-476a-90c4-913659477276n%40googlegroups.com. ------=_Part_22824_1413333519.1719887768873 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Eric,

> Ok, thanks for clarifying. I'm still not= making the connection to "checking a non-null [C] pointer" but that's prob= on me.

A C pointer, which is a language idiome assigning to a m= emory address A the value o memory address B can be 0 (or NULL a standard m= acro defined in stddef.h).

Here a snippet example of linked list= code checking the pointer (`*begin_list`) is non null before the compariso= n operation to find the target element list.

```
pointer_t = =C2=A0 =C2=A0 =C2=A0 ft_list_find(pointer_t **start_list, void *data_ref, i= nt (*cmp)())
{
=C2=A0 =C2=A0 =C2=A0 =C2=A0 while (*start_list)=C2=A0 =C2=A0 =C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 if (cmp((*start_list)->data, data_ref) =3D=3D 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 return (*start_list);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 *start_list =3D (*start_list)->next;
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 }
=C2=A0 =C2=A0 =C2=A0 =C2=A0 return (0);
}<= /div>
```

While both libbitcoin and bitcoin = core are both written in c++, you still have underlying pointer derefencing= playing out to access the coinbase
transaction, and all underlying im= plications in terms of memory management.

> Yes, a rough corr= elation but not necessarily equivalence. Note that block.check has context = free and contextual overrides.
>
> The 'bypass' parameter = indicates a block under checkpoint or milestone ("assume valid"). In this c= ase we must check Merkle root, witness commitment, and both types of mallea= tion - as the purpose is to establish identity. Absent 'bypass' the typical= checks are performed, and therefore a malleation check is not required her= e. The "type64" malleation is subsumed by the is_first_non_coinbase check a= nd the "type32" malleation is subsumed by the is_internal_double_spend chec= k.

Yes, I understand it's not a 1-to-1 compatibility, just a rou= gh logical equivalence.

I think it's interesting to point out th= e two types of malleation that a bitcoin consensus validation logic should = respect w.r.t block validity checks.

Like you said the first one= on the merkle root committed in the headers's `hashMerkleRoot` due to the = lack of domain separation between leaf and merkle tree nodes.
The seco= nd one is the bip141 wtxid commitment in one of the coinbase transaction `s= criptpubkey` output,=C2=A0which is itself covered by a txid in the merkle t= ree.

> Caching identity in the case of invali= dity is more interesting question than it might seem.
>
> = Background: A fully-validated block has established identity in its block h= ash. However an invalid block message may include the same block header, pr= oducing the same hash, but with any kind of nonsense following the header. = The purpose of the transaction and witness commitments is of course to esta= blish this identity, so these two checks are therefore necessary even under= checkpoint/milestone. And then of course the two Merkle tree issues compli= cate the tx commitment (the integrity of the witness commitment is assured = by that of the tx commitment).
>
> So what does it mean to= speak of a block hash derived from:
>
> (1) a block messa= ge with an unparseable header?
> (2) a block message with parseable= but invalid header?
> (3) a block message with valid header but un= parseable tx data?
> (4) a block message with valid header but pars= eable invalid uncommitted tx data?
> (5) a block message with valid= header but parseable invalid malleated committed tx data?
> (6) a = block message with valid header but parseable invalid unmalleated committed= tx data?
> (7) a block message with valid header but uncommitted v= alid tx data?
> (8) a block message with valid header but malleated= committed valid tx data?
> (9) a block message with valid header b= ut unmalleated committed valid tx data?
>
> Note that only= the #9 p2p block message contains an actual Bitcoin block, the others are = bogus messages. In all cases the message can be sha256 hashed to establish = the identity of the *message*. And if one's objective is to reject repeatin= g bogus messages, this might be a useful strategy. It's already part of the= p2p protocol, is orders of magnitude cheaper to produce than a Merkle root= , and has no identity issues.

I think I mostly agree = with the identity issue as laid out so far, there is one caveat to add if y= ou're considering identity caching as the problem solved.
A validation= node might have to consider differently block messages processed if they c= onnect on the longest most PoW valid chain for which all blocks have been v= alidated. Or alternatively if they have to be added on a candidate longest = most PoW valid chain.

> The concept of Bitcoin block hash as = unique identifier for invalid p2p block messages is problematic. Apart from= the malleation question, what is the Bitcoin block
> hash for a me= ssage with unparseable data (#1 and #3)? Such messages are trivial to produ= ce and have no block hash.

For reasons, bitcoin core has the con= cept of outbound `BLOCK_RELAY` (in `src/node/connection_types.h`) where som= e preferential peering policy is applied in matters of block messages downl= oad.

> What is the useful identifier for a block with malleat= ed commitments (#5 and #8) or invalid commitments (#4 and #7) - valid txs o= r otherwise?

The block header, as it commits to the transaction = identifier tree can be useful as much for #4 and #5. On the bitcoin core si= de, about #7 the uncommitted valid tx data can be already present in the va= lidation cache from mempool acceptance. About #8, the malleaed committed va= lid transactions shall be also committed in the merkle root in headers.

> This seems reasonable at first glance, but given the list of s= cenarios above, which does it apply to?

> This see= ms reasonable at first glance, but given the list of scenarios above, which= does it apply to? Presumably the invalid header (#2) doesn't get this far = because of headers-first.
> That leaves just invalid blocks with us= eful block hash identifiers (#6). In all other cases the message is simply = discarded. In this case the attempt is to move category #5 into category #6= by prohibiting 64 byte txs.

Yes, it's moving from the category = #5 to the category #6. Note, transaction malleability can be a distinct iss= ue than lack of domain separation.

> The requirement to "avoi= d re-downloading and re-validating it" is about performance, presumably min= imizing initial block download/catch-up time. There is a > computational= cost to producing 64 byte malleations and none for any of the other bogus = block message categories above, including the other form of malleation. >= ; Furthermore, 64 byte malleation has almost zero cost to preclude. No hash= ing and not even true header or tx parsing are required. Only a handful of = bytes must be read > from the raw message before it can be discarded pre= sently.

> That's actually far cheaper than any of the other s= cenarios that again, have no cost to produce. The other type of malleation = requires parsing all of the txs in the block and > hashing and comparing= some or all of them. In other words, if there is an attack scenario, that = must be addressed before this can be meaningful. In fact all of the other> bogus message scenarios (with tx data) will remain more expensive t= o discard than this one.

In practice on the bitcoin core side, t= he bogus block message categories from #4 to #6 are already mitigated by va= lidation caching for transactions that have been received early. While libb= itcoin has no mempool (at least in earlier versions) transactions buffering= can be done by bip152's HeadersAndShortIds message.

About #7 an= d #8, introducing a domain separation where 64 bytes transactions are rejec= ted and making it harder to exploit #7 and #8 categories of bogus block mes= sages.
This is correct that bitcoin core might accept valid trans= action data before the merkle tree commitment has been verified.
=
> The problem arises from trying to optimize dismissal by st= oring an identifier. Just *producing* the identifier is orders of magnitude= more costly than simply dismissing this > bogus message. I can't imagin= e why any implementation would want to compute and store and retrieve and r= ecompute and compare hashes when the alterative is just
> dismissing= the bogus messages with no hashing at all.

> Bogus messages = will arrive, they do not even have to be requested. The simplest are dealt = with by parse failure. What defines a parse is entirely subjective. General= ly it's
> "structural" but nothing precludes incorporating a require= ment for a necessary leading pattern in the stream, sort of like how the wi= tness pattern is identified. If we were
> going to prioritize = early dismissal this is where we would put it.

I don't think this= is that simple - While producing an identifier comes with a computational = cost (e.g fixed 64-byte structured coinbase transaction), if the full node = have a hierarchy of validation cache like bitcoin core has already, the cos= t of bogus block messages can be slashed down. On the other hand, just deal= ing with parse failure on the spot by introducing a leading pattern in the = stream just inflates the size of p2p messages, and the transaction-relay ba= ndwidth cost.

> However, there is a tradeoff in terms of earl= y dismissal. Looking up invalid hashes is a costly tradeoff, which becomes = multiplied by every block validated. For example,
> expending = 1 millisecond in hash/lookup to save 1 second of validation time in the fai= lure case seems like a reasonable tradeoff, until you multiply across the w= hole chain. > 1 ms becomes 14 minutes across the chain, just to save a s= econd for each mallied block encountered. That means you need to have encou= ntered 840 such mallied blocks > just to break even. Early dismissing th= e block for non-null coinbase point (without hashing anything) would be on = the order of 1000x faster than that (breakeven at 1 > encounter). So why= the block hash cache requirement? It cannot be applied to many scenarios, = and cannot be optimal in this one.

I think what you're describin= g is more a classic time-space tradeoff which is well-known in classic comp= uter science litterature. In my reasonable opinion, one should more reason = under what is the security paradigm we wish for bitcoin block-relay network= and perduring decentralization, i.e one where it's easy to verify block me= ssages proofs which could have been generated on specialized hardware with = an asymmetric cost. Obviously encountering 840 such malliead blocks to make= it break even doesn't make the math up to save on hash lookup, unless you = can reduce the attack scenario in terms of adversaries capabilities.
<= br />Best,
Antoine=C2=A0
Le samedi 29 juin 2= 024 =C3=A0 21:42:23 UTC+1, Eric Voskuil a =C3=A9crit=C2=A0:
Caching identity in the case= of invalidity is more interesting question than it might seem.

Back= ground: A fully-validated block has established identity in its block hash.= However an invalid block message may include the same block header, produc= ing the same hash, but with any kind of nonsense following the header. The = purpose of the transaction and witness commitments is of course to establis= h this identity, so these two checks are therefore necessary even under che= ckpoint/milestone. And then of course the two Merkle tree issues complicate= the tx commitment (the integrity of the witness commitment is assured by t= hat of the tx commitment).

So what does it mean to speak of a block = hash derived from:

(1) a block message with an unparseable header?(2) a block message with parseable but invalid header?
(3) a block mes= sage with valid header but unparseable tx data?
(4) a block message with= valid header but parseable invalid uncommitted tx data?
(5) a block mes= sage with valid header but parseable invalid malleated committed tx data?(6) a block message with valid header but parseable invalid unmalleated c= ommitted tx data?
(7) a block message with valid header but uncommitted = valid tx data?
(8) a block message with valid header but malleated commi= tted valid tx data?
(9) a block message with valid header but unmalleate= d committed valid tx data?

Note that only the #9 p2p block message c= ontains an actual Bitcoin block, the others are bogus messages. In all case= s the message can be sha256 hashed to establish the identity of the *messag= e*. And if one's objective is to reject repeating bogus messages, this = might be a useful strategy. It's already part of the p2p protocol, is o= rders of magnitude cheaper to produce than a Merkle root, and has no identi= ty issues.

The concept of Bitcoin block hash as unique identifier fo= r invalid p2p block messages is problematic. Apart from the malleation ques= tion, what is the Bitcoin block hash for a message with unparseable data (#= 1 and #3)? Such messages are trivial to produce and have no block hash. Wha= t is the useful identifier for a block with malleated commitments (#5 and #= 8) or invalid commitments (#4 and #7) - valid txs or otherwise?

The = stated objective for a consensus rule to invalidate all 64 byte txs is:
=
> being able to cache the hash of a (non-malleated) invalid block as= permanently invalid to avoid re-downloading and re-validating it.

T= his seems reasonable at first glance, but given the list of scenarios above= , which does it apply to? Presumably the invalid header (#2) doesn't ge= t this far because of headers-first. That leaves just invalid blocks with u= seful block hash identifiers (#6). In all other cases the message is simply= discarded. In this case the attempt is to move category #5 into category #= 6 by prohibiting 64 byte txs.

The requirement to "avoid re-down= loading and re-validating it" is about performance, presumably minimiz= ing initial block download/catch-up time. There is a computational cost to = producing 64 byte malleations and none for any of the other bogus block mes= sage categories above, including the other form of malleation. Furthermore,= 64 byte malleation has almost zero cost to preclude. No hashing and not ev= en true header or tx parsing are required. Only a handful of bytes must be = read from the raw message before it can be discarded presently.

That= 's actually far cheaper than any of the other scenarios that again, hav= e no cost to produce. The other type of malleation requires parsing all of = the txs in the block and hashing and comparing some or all of them. In othe= r words, if there is an attack scenario, that must be addressed before this= can be meaningful. In fact all of the other bogus message scenarios (with = tx data) will remain more expensive to discard than this one.

The pr= oblem arises from trying to optimize dismissal by storing an identifier. Ju= st *producing* the identifier is orders of magnitude more costly than simpl= y dismissing this bogus message. I can't imagine why any implementation= would want to compute and store and retrieve and recompute and compare has= hes when the alterative is just dismissing the bogus messages with no hashi= ng at all.

Bogus messages will arrive, they do not even have to be r= equested. The simplest are dealt with by parse failure. What defines a pars= e is entirely subjective. Generally it's "structural" but not= hing precludes incorporating a requirement for a necessary leading pattern = in the stream, sort of like how the witness pattern is identified. If we we= re going to prioritize early dismissal this is where we would put it.
However, there is a tradeoff in terms of early dismissal. Looking up inva= lid hashes is a costly tradeoff, which becomes multiplied by every block va= lidated. For example, expending 1 millisecond in hash/lookup to save 1 seco= nd of validation time in the failure case seems like a reasonable tradeoff,= until you multiply across the whole chain. 1 ms becomes 14 minutes across = the chain, just to save a second for each mallied block encountered. That m= eans you need to have encountered 840 such mallied blocks just to break eve= n. Early dismissing the block for non-null coinbase point (without hashing = anything) would be on the order of 1000x faster than that (breakeven at 1 e= ncounter). So why the block hash cache requirement? It cannot be applied to= many scenarios, and cannot be optimal in this one.

Eric

--
You received this message because you are subscribed to the Google Groups &= quot;Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an e= mail to bitcoind= ev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg= id/bitcoindev/301c64c7-0f0f-476a-90c4-913659477276n%40googlegroups.com.=
------=_Part_22824_1413333519.1719887768873-- ------=_Part_22823_366698941.1719887768873--