author | Eric Voskuil <eric@voskuil.org> | 2024-07-02 18:13:20 -0700 |
---|---|---|
committer | bitcoindev <bitcoindev@googlegroups.com> | 2024-07-02 18:31:02 -0700 |
commit | e474eeefe150b4771b10abd354313e3a5524020f (patch) | |
tree | 9ec581b0ce95cba77111945c354db9bb19367dd1 | |
parent | bf3423d81b972daccb15116f12f01a3dbcc07e3d (diff) | |
-rw-r--r-- | 8b/4782c0efb0a459f62cb2cb7f5b9a91e2bb1564 | 639 |
1 files changed, 639 insertions, 0 deletions
diff --git a/8b/4782c0efb0a459f62cb2cb7f5b9a91e2bb1564 b/8b/4782c0efb0a459f62cb2cb7f5b9a91e2bb1564 new file mode 100644 index 000000000..39ca88a53 --- /dev/null +++ b/8b/4782c0efb0a459f62cb2cb7f5b9a91e2bb1564 @@ -0,0 +1,639 @@ +Delivery-date: Tue, 02 Jul 2024 18:31:02 -0700 +Received: from mail-yb1-f187.google.com ([209.85.219.187]) + by mail.fairlystable.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 + (Exim 4.94.2) + (envelope-from <bitcoindev+bncBC5P5KEHZQLBBTOTSK2AMGQE5N3FB5I@googlegroups.com>) + id 1sOopo-0007je-TR + for bitcoindev@gnusha.org; Tue, 02 Jul 2024 18:31:02 -0700 +Received: by mail-yb1-f187.google.com with SMTP id 3f1490d57ef6-e03a92302d1sf817548276.1 + for <bitcoindev@gnusha.org>; Tue, 02 Jul 2024 18:31:00 -0700 (PDT) +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=googlegroups.com; s=20230601; t=1719970254; x=1720575054; darn=gnusha.org; + h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post + :list-id:mailing-list:precedence:x-original-sender:mime-version + :subject:references:in-reply-to:message-id:to:from:date:sender:from + :to:cc:subject:date:message-id:reply-to; + bh=QGYGGRgKq5lyVuAQXa+uDb9sO0Il0pjFbXgFDk6nWF0=; + b=bHnBaYi6CdztbUdGJas0Dz8G7k4UHb/ubQAPZ8zy714swVXcwb56zA4HGvSxzZkfQc + 17Rz1vAzMQLUGhx9IpD5rpDxNXPKNUKGr1srx1Hf6WclGal9hKhfcbLIBoLGfS3V+BG9 + /+OuyDbxZPbCmhATNdgbli2m+reHAUhrQarkdZbh+JiOllC52PmZJ841JGzh9lJ+XfIx + jw0qT66q4D0l741RqBXkZ28/MxQl0PRNpx9fUBa6ctIYBeSZH6RP9p0/FEf1erf6XLJm + 7O3IEIq1+pxr2oBbOm0GqQUA9gKJL7mBWCIiv5TGQvVyoy1cx7dBKDSE34YUHLk4jyq4 + tTPw== +DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=googlegroups-com.20230601.gappssmtp.com; s=20230601; t=1719970254; x=1720575054; darn=gnusha.org; + h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post + :list-id:mailing-list:precedence:x-original-sender:mime-version + :subject:references:in-reply-to:message-id:to:from:date:from:to:cc + :subject:date:message-id:reply-to; + 
bh=QGYGGRgKq5lyVuAQXa+uDb9sO0Il0pjFbXgFDk6nWF0=; + b=irys9hm5QLYCa4xVccz5fdg/x+BHkEA8mUkN7kbYkHO5gQnDnL1qc5LzqXJNlyqMrS + 6IWoFhZH2IHlVIt66iAcwtXpeDNxUfkwOt6BnMYf4Ce5SPTuPxI6KiIFHElsJi5IeLI9 + KHWdlDkqwKeoAPpGQM75HDwfM0VRu57LcVEJsvX6n2BnTuHlggesJJX0sGSW8QxsNxYp + 9jwxJUV67dXCFBEcJc7mPtNw979vcO064CGa0xI2Jaxwt1m7YTHAECaURoWp7nO8yGSl + 2fku+QxmIdT7uRPG2EvLPeg1YO1fv+WDuow29GyE/P3m1bSlfsvPgLv0CWa7TCft6WfF + Fz5w== +X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; + d=1e100.net; s=20230601; t=1719970254; x=1720575054; + h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post + :list-id:mailing-list:precedence:x-original-sender:mime-version + :subject:references:in-reply-to:message-id:to:from:date:x-beenthere + :x-gm-message-state:sender:from:to:cc:subject:date:message-id + :reply-to; + bh=QGYGGRgKq5lyVuAQXa+uDb9sO0Il0pjFbXgFDk6nWF0=; + b=a+8m231YYiPYIzYDmP3U1NNl2pj5dF8pHYI3A9ltIvOH3vU9BPtoC5QeOybSYdcDDz + LuLzuLV9iH3rPClhdl/rCc+bFo5hRpR4HX8bB5ge3MU5fxJLfcK8h4LICFeQxn67rqlT + dXgIB26wPWB7PIdaCOhOe563YnEimGVxb2JlYqjbhPi1v26GS2B//wA8YEzCIISJi07e + tdCHakMLTUIvX0BAFByh5m4cHWKuE2rOyVRa9DHByb7jDG/4uHXtR2n6z3yP0D/pipTg + t+CEL9pzlAp0aFLXps8QMeZf4vtism1590xq63pKXUqTmF7qpm3IUrN7SqgU7LP68JwY + 0BPg== +Sender: bitcoindev@googlegroups.com +X-Forwarded-Encrypted: i=1; AJvYcCV8vfW22ZVamQzVnzWU7X2sp5jLzcLfzYDTqz1nhS6mxpkF2kE62osiDjxVkGwFkRsBcvMjRchLmuzPGAvG0YnQqnL4PlY= +X-Gm-Message-State: AOJu0YyGauLvl55DjJ1R9Fz3gX92rbX+GnYT0+twUxeVfGsMDGaSkIaZ + 0dMGH21VByzxHsYrzhrIrkkCUEKOJn5dR3vi0RudYr4scE34TKER +X-Google-Smtp-Source: AGHT+IH4ZqWsLpki8Sdiv4KRJM30Qgv/PKh4KkXmPZUsFJmJ4MJMTFA+HO6ZyRwsqAnGDidr11+HtQ== +X-Received: by 2002:a25:df16:0:b0:e03:229d:69f5 with SMTP id 3f1490d57ef6-e036ead1e52mr11151803276.3.1719970254432; + Tue, 02 Jul 2024 18:30:54 -0700 (PDT) +X-BeenThere: bitcoindev@googlegroups.com +Received: by 2002:a05:6902:1007:b0:e03:6457:383f with SMTP id + 3f1490d57ef6-e0364573bd8ls6892124276.1.-pod-prod-09-us; Tue, 02 Jul 2024 + 18:30:53 -0700 (PDT) 
+X-Received: by 2002:a05:6902:2b8a:b0:e03:5a51:382f with SMTP id 3f1490d57ef6-e036ec429bcmr980548276.8.1719970253037; + Tue, 02 Jul 2024 18:30:53 -0700 (PDT) +Received: by 2002:a05:690c:4289:b0:63b:c3b0:e1c with SMTP id 00721157ae682-6514011671ams7b3; + Tue, 2 Jul 2024 18:13:22 -0700 (PDT) +X-Received: by 2002:a05:690c:fc8:b0:64b:16af:d264 with SMTP id 00721157ae682-64c776d2fd5mr283127b3.7.1719969201017; + Tue, 02 Jul 2024 18:13:21 -0700 (PDT) +Date: Tue, 2 Jul 2024 18:13:20 -0700 (PDT) +From: Eric Voskuil <eric@voskuil.org> +To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com> +Message-Id: <d9834ad5-f803-4a39-a854-95b2439738f5n@googlegroups.com> +In-Reply-To: <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com> +References: <gnM89sIQ7MhDgI62JciQEGy63DassEv7YZAMhj0IEuIo0EdnafykF6RH4OqjTTHIHsIoZvC2MnTUzJI7EfET4o-UQoD-XAQRDcct994VarE=@protonmail.com> + <72e83c31-408f-4c13-bff5-bf0789302e23n@googlegroups.com> + <heKH68GFJr4Zuf6lBozPJrb-StyBJPMNvmZL0xvKFBnBGVA3fVSgTLdWc-_8igYWX8z3zCGvzflH-CsRv0QCJQcfwizNyYXlBJa_Kteb2zg=@protonmail.com> + <5b0331a5-4e94-465d-a51d-02166e2c1937n@googlegroups.com> + <yt1O1F7NiVj-WkmnYeta1fSqCYNFx8h6OiJaTBmwhmJ2MWAZkmmjPlUST6FM7t6_-2NwWKdglWh77vcnEKA8swiAnQCZJY2SSCAh4DOKt2I=@protonmail.com> + <be78e733-6e9f-4f4e-8dc2-67b79ddbf677n@googlegroups.com> + <jJLDrYTXvTgoslhl1n7Fk9-pL1mMC-0k6gtoniQINmioJpzgtqrJ_WqyFZkLltsCUusnQ4jZ6HbvRC-mGuaUlDi3kcqcFHALd10-JQl-FMY=@protonmail.com> + <9a4c4151-36ed-425a-a535-aa2837919a04n@googlegroups.com> + <3f0064f9-54bd-46a7-9d9a-c54b99aca7b2n@googlegroups.com> + <26b7321b-cc64-44b9-bc95-a4d8feb701e5n@googlegroups.com> + <CALZpt+EwVyaz1=A6hOOycqFGJs+zxyYYocZixTJgVmzZezUs9Q@mail.gmail.com> + <607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com> + <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com> + <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com> +Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival +MIME-Version: 1.0 +Content-Type: multipart/mixed; + 
boundary="----=_Part_35620_344008102.1719969200791"
+X-Original-Sender: eric@voskuil.org
+Precedence: list
+Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com
+List-ID: <bitcoindev.googlegroups.com>
+X-Google-Group-Id: 786775582512
+List-Post: <https://groups.google.com/group/bitcoindev/post>, <mailto:bitcoindev@googlegroups.com>
+List-Help: <https://groups.google.com/support/>, <mailto:bitcoindev+help@googlegroups.com>
+List-Archive: <https://groups.google.com/group/bitcoindev
+List-Subscribe: <https://groups.google.com/group/bitcoindev/subscribe>, <mailto:bitcoindev+subscribe@googlegroups.com>
+List-Unsubscribe: <mailto:googlegroups-manage+786775582512+unsubscribe@googlegroups.com>,
+ <https://groups.google.com/group/bitcoindev/subscribe>
+X-Spam-Score: -0.7 (/)
+
+------=_Part_35620_344008102.1719969200791
+Content-Type: multipart/alternative;
+ boundary="----=_Part_35621_885950809.1719969200791"
+
+------=_Part_35621_885950809.1719969200791
+Content-Type: text/plain; charset="UTF-8"
+
+Hi Antoine R,
+
+>> Ok, thanks for clarifying. I'm still not making the connection to
+"checking a non-null [C] pointer" but that's prob on me.
+
+> A C pointer, which is a language idiom assigning to a memory address A
+the value of memory address B, can be 0 (or NULL, a standard macro defined in
+stddef.h).
+> Here a snippet example of linked list code checking the pointer
+(`*begin_list`) is non null before the comparison operation to find the
+target element list.
+> ...
+> While both libbitcoin and bitcoin core are both written in c++, you still
+have underlying pointer dereferencing playing out to access the coinbase
+transaction, and all underlying implications in terms of memory management.
+
+I'm familiar with pointers ;).
+
+While at some level the block message buffer would generally be referenced
+by one or more C pointers, the difference between a valid coinbase input
+(i.e.
with a "null point") and any other input, is not nullptr vs.
+!nullptr. A "null point" is a 36 byte value, 32 0x00 bytes followed by 4
+0xff bytes. In his infinite wisdom Satoshi decided it was better (or
+easier) to serialize a first block tx (coinbase) with an input containing
+an unusable script and pointing to an invalid [tx:index] tuple (input
+point) as opposed to just not having any input. That invalid input point is
+called a "null point", and of course cannot be pointed to by a "null
+pointer". The coinbase must be identified by comparing those 36 bytes to
+the well-known null point value (and if this does not match, the Merkle hash
+cannot have been type64 malleated).
+
+> I think it's interesting to point out the two types of malleation that a
+bitcoin consensus validation logic should respect w.r.t block validity
+checks. Like you said the first one on the merkle root committed in the
+header's `hashMerkleRoot` due to the lack of domain separation between
+leaf and merkle tree nodes.
+
+We call this type64 malleability (or malleation where it is not only
+possible but occurs).
+
+> The second one is the bip141 wtxid commitment in one of the coinbase
+transaction `scriptpubkey` output, which is itself covered by a txid in the
+merkle tree.
+
+While symmetry seems to imply that the witness commitment would be
+malleable, just as the txs commitment, this is not the case. If the tx
+commitment is correct it is computationally infeasible for the witness
+commitment to be malleated, as the witness commitment incorporates each
+full tx (with witness, sentinel, and marker). As such the block identifier,
+which relies only on the header and tx commitment, is a sufficient
+identifier. Yet it remains necessary to validate the witness commitment to
+ensure that the correct witness data has been provided in the block message.
+
+The second type of malleability, in addition to type64, is what we call
+type32.
This is the consequence of duplicated trailing sets of txs (and
+therefore tx hashes) in a block message. This is applicable to some but not
+all blocks, as a function of the number of txs contained.
+
+>> Caching identity in the case of invalidity is a more interesting question
+than it might seem.
+>> Background: A fully-validated block has established identity in its
+block hash. However an invalid block message may include the same block
+header, producing the same hash, but with any kind of nonsense following
+the header. The purpose of the transaction and witness commitments is of
+course to establish this identity, so these two checks are therefore
+necessary even under checkpoint/milestone. And then of course the two
+Merkle tree issues complicate the tx commitment (the integrity of the
+witness commitment is assured by that of the tx commitment).
+>>
+>> So what does it mean to speak of a block hash derived from:
+>> (1) a block message with an unparseable header?
+>> (2) a block message with parseable but invalid header?
+>> (3) a block message with valid header but unparseable tx data?
+>> (4) a block message with valid header but parseable invalid uncommitted
+tx data?
+>> (5) a block message with valid header but parseable invalid malleated
+committed tx data?
+>> (6) a block message with valid header but parseable invalid unmalleated
+committed tx data?
+>> (7) a block message with valid header but uncommitted valid tx data?
+>> (8) a block message with valid header but malleated committed valid tx
+data?
+>> (9) a block message with valid header but unmalleated committed valid tx
+data?
+>>
+>> Note that only the #9 p2p block message contains an actual Bitcoin
+block, the others are bogus messages. In all cases the message can be
+sha256 hashed to establish the identity of the *message*. And if one's
+objective is to reject repeating bogus messages, this might be a useful
+strategy.
It's already part of the p2p protocol, is orders of magnitude
+cheaper to produce than a Merkle root, and has no identity issues.
+
+> I think I mostly agree with the identity issue as laid out so far, there
+is one caveat to add if you're considering identity caching as the problem
+solved. A validation node might have to consider differently block messages
+processed if they connect on the longest most PoW valid chain for which all
+blocks have been validated. Or alternatively if they have to be added on a
+candidate longest most PoW valid chain.
+
+Certainly an important consideration. We store both types. Once there is a
+stronger candidate header chain we store the headers and proceed to
+obtaining the blocks (if we don't already have them). The blocks are stored
+in the same table; the confirmed vs. candidate indexes simply point to them
+as applicable. It is feasible (and has happened twice) for two blocks to
+share the very same coinbase tx, even with either/all bip30/34/90 active
+(and setting aside future issues here for the sake of simplicity). This
+remains only because two competing branches can have blocks at the same
+height, and bip34 requires only height in the coinbase input script. This
+therefore implies the same transaction but distinct blocks. It is however
+infeasible for one block to exist in multiple distinct chains. In order for
+this to happen two blocks at the same height must have the same coinbase
+(ok), and also the same parent (ok). But this then means that they either
+(1) have distinct identity due to another header property deviation, or (2)
+are the same block with the same parent and are therefore in just one
+chain. So I don't see an actual caveat. I'm not certain if this is the
+ambiguity that you were referring to. If not please feel free to clarify.
+
+>> The concept of Bitcoin block hash as unique identifier for invalid p2p
+block messages is problematic.
Apart from the malleation question, what is
+the Bitcoin block hash for a message with unparseable data (#1 and #3)?
+Such messages are trivial to produce and have no block hash.
+
+> For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` (in
+`src/node/connection_types.h`) where some preferential peering policy is
+applied in matters of block messages download.
+
+We don't do this and I don't see how it would be relevant. If a peer
+provides any invalid message or otherwise violates the protocol it is
+simply dropped.
+
+The "problematic" that I'm referring to is the reliance on the block hash
+as a message identifier, because it does not identify the message and
+cannot be useful in an effectively unlimited number of zero-cost cases.
+
+>> What is the useful identifier for a block with malleated commitments (#5
+and #8) or invalid commitments (#4 and #7) - valid txs or otherwise?
+
+> The block header, as it commits to the transaction identifier tree can be
+useful as much for #4 and #5.
+
+#4 and #5 refer to "uncommitted" and "malleated committed". It may not be
+clear, but "uncommitted" means that the tx commitment is not valid (Merkle
+root doesn't match the header's value) and "malleated committed" means that
+the (matching) commitment cannot be relied upon because the txs represent
+malleation, invalidating the identifier. So neither of these are usable
+identifiers.
+
+> On the bitcoin core side, about #7 the uncommitted valid tx data can be
+already present in the validation cache from mempool acceptance. About #8,
+the malleated committed valid transactions shall be also committed in the
+merkle root in headers.
+
+It seems you may be referring to "unconfirmed" txs as opposed to
+"uncommitted" txs. This doesn't pertain to tx storage or identifiers.
+Neither #7 nor #8 are usable for the same reasons.
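Editorial illustration: to show why a matching Merkle root does not by itself identify the transaction data (cases #5 and #8), here is a minimal Python sketch of the two malleation types named in this message. It assumes the standard Bitcoin Merkle construction (double SHA-256 over concatenated pairs, duplicating the last hash of an odd-length level). The function names and the placeholder leaf values are illustrative only and are not taken from libbitcoin or Bitcoin Core.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin tx and Merkle node hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle root over 32-byte leaf hashes; odd levels duplicate the last hash."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Placeholder leaves standing in for real tx hashes (assumption for the demo).
a, b, c, d = (sha256d(bytes([i])) for i in range(4))

# type32: repeating the trailing tx of an odd-length list leaves the root unchanged.
type32_same_root = merkle_root([a, b, c]) == merkle_root([a, b, c, c])

# type64: two 64-byte "transactions" whose bytes are the concatenated child
# hashes hash to the interior nodes, so the root matches the four-tx block.
fake_tx_hashes = [sha256d(a + b), sha256d(c + d)]
type64_same_root = merkle_root([a, b, c, d]) == merkle_root(fake_tx_hashes)
```

In both comparisons two different transaction lists commit to the same root, which is why the header hash alone cannot serve as an identifier for such messages.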
+
+>> This seems reasonable at first glance, but given the list of scenarios
+above, which does it apply to? Presumably the invalid header (#2) doesn't
+get this far because of headers-first.
+>> That leaves just invalid blocks with useful block hash identifiers (#6).
+In all other cases the message is simply discarded. In this case the
+attempt is to move category #5 into category #6 by prohibiting 64 byte txs.
+
+> Yes, it's moving from the category #5 to the category #6. Note,
+transaction malleability can be a distinct issue from lack of domain
+separation.
+
+I'm making no reference to tx malleability. This concerns only Merkle tree
+(block hash) malleability, the two types described in detail in the paper I
+referenced earlier, here again:
+
+https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf
+
+>> The requirement to "avoid re-downloading and re-validating it" is about
+performance, presumably minimizing initial block download/catch-up time.
+There is a computational cost to producing 64 byte malleations and none
+for any of the other bogus block message categories above, including the
+other form of malleation. Furthermore, 64 byte malleation has almost zero
+cost to preclude. No hashing and not even true header or tx parsing are
+required. Only a handful of bytes must be read from the raw message
+before it can be discarded presently.
+
+>> That's actually far cheaper than any of the other scenarios that again,
+have no cost to produce. The other type of malleation requires parsing all
+of the txs in the block and hashing and comparing some or all of them. In
+other words, if there is an attack scenario, that must be addressed before
+this can be meaningful. In fact all of the other bogus message scenarios
+(with tx data) will remain more expensive to discard than this one.
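Editorial illustration: the "handful of bytes" check described in the quoted paragraphs above can be sketched roughly as follows. This is not libbitcoin's parser. It assumes the usual block serialization (80-byte header, compact-size tx count, 4-byte tx version, optional segwit marker and flag bytes `0x00 0x01`, compact-size input count, then the 36-byte input point), and the names `read_compact_size` and `starts_with_null_point` are hypothetical.

```python
NULL_POINT = bytes(32) + b"\xff" * 4  # 32 zero bytes, then index 0xffffffff
HEADER_SIZE = 80  # serialized block header length


def read_compact_size(buf: bytes, offset: int) -> tuple[int, int]:
    """Decode a Bitcoin variable-length integer; return (value, next offset)."""
    first = buf[offset]  # raises IndexError if the buffer is truncated
    if first < 0xFD:
        return first, offset + 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    end = offset + 1 + width
    if end > len(buf):
        raise ValueError("truncated compact size")
    return int.from_bytes(buf[offset + 1:end], "little"), end


def starts_with_null_point(block: bytes) -> bool:
    """True if the first tx has exactly one input and it is the null point."""
    try:
        tx_count, offset = read_compact_size(block, HEADER_SIZE)
        if tx_count == 0:
            return False
        offset += 4  # skip the tx version
        if block[offset:offset + 2] == b"\x00\x01":
            offset += 2  # skip the segwit marker and flag
        input_count, offset = read_compact_size(block, offset)
        if input_count != 1:  # a coinbase has exactly one input
            return False
    except (IndexError, ValueError):
        return False
    # A truncated slice is shorter than 36 bytes and therefore compares unequal.
    return block[offset:offset + 36] == NULL_POINT
```

No hash is computed anywhere in this check, so a message failing it can be dropped before any Merkle root or block hash work is done.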
+
+> In practice on the bitcoin core side, the bogus block message categories
+from #4 to #6 are already mitigated by validation caching for transactions
+that have been received early. While libbitcoin has no mempool (at least in
+earlier versions) transactions buffering can be done by bip152's
+HeadersAndShortIds message.
+
+Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx
+pool, we just don't store them in RAM (memory).
+
+> About #7 and #8, introducing a domain separation where 64 bytes
+transactions are rejected and making it harder to exploit #7 and #8
+categories of bogus block messages. This is correct that bitcoin core might
+accept valid transaction data before the merkle tree commitment has been
+verified.
+
+I don't follow this. An invalid 64 byte tx consensus rule would definitely
+not make it harder to exploit block message invalidity. In fact it would
+just slow down validation by adding a redundant rule. Furthermore, as I
+have detailed in a previous message, caching invalidity does absolutely
+nothing to increase protection. In fact it makes the situation materially
+worse.
+
+>> The problem arises from trying to optimize dismissal by storing an
+identifier. Just *producing* the identifier is orders of magnitude more
+costly than simply dismissing this bogus message. I can't imagine why any
+implementation would want to compute and store and retrieve and recompute
+and compare hashes when the alternative is just dismissing the bogus
+messages with no hashing at all.
+
+>> Bogus messages will arrive, they do not even have to be requested. The
+simplest are dealt with by parse failure. What defines a parse is entirely
+subjective. Generally it's
+>> "structural" but nothing precludes incorporating a requirement for a
+necessary leading pattern in the stream, sort of like how the witness
+pattern is identified. If we were
+>> going to prioritize early dismissal this is where we would put it.
+
+> I don't think this is that simple - While producing an identifier comes
+with a computational cost (e.g. a fixed 64-byte structured coinbase
+transaction), if the full node has a hierarchy of validation cache like
+bitcoin core has already, the cost of bogus block messages can be slashed
+down.
+
+No, this is not the case. As I detailed in my previous message, there is no
+possible scenario where invalidation caching does anything but make the
+situation materially worse.
+
+> On the other hand, just dealing with parse failure on the spot by
+introducing a leading pattern in the stream just inflates the size of p2p
+messages, and the transaction-relay bandwidth cost.
+
+I think you misunderstood me. I am suggesting no change to serialization. I
+can see how it might be unclear, but I said, "nothing precludes
+incorporating a requirement for a necessary leading pattern in the stream."
+I meant that the parser can simply incorporate the *requirement* that the
+byte stream starts with a null input point. That identifies the malleation
+or invalidity without a single hash operation and while only reading a
+handful of bytes. No change to any messages.
+
+>> However, there is a tradeoff in terms of early dismissal. Looking up
+invalid hashes is a costly tradeoff, which becomes multiplied by every
+block validated. For example, expending 1 millisecond in hash/lookup to
+save 1 second of validation time in the failure case seems like a
+reasonable tradeoff, until you multiply across the whole chain. 1 ms
+becomes 14 minutes across the chain, just to save a second for each malleated
+block encountered. That means you need to have encountered 840 such malleated
+blocks just to break even. Early dismissing the block for non-null
+coinbase point (without hashing anything) would be on the order of 1000x
+faster than that (breakeven at 1 encounter). So why the block hash cache
+requirement? It cannot be applied to many scenarios, and cannot be optimal
+in this one.
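Editorial illustration: the break-even arithmetic in the quoted paragraph above can be checked with a few lines of Python. The chain length of 840,000 blocks is an assumption, chosen because it is the figure implied by "14 minutes" at 1 ms per block; it is not stated in the message. The function name is hypothetical.

```python
def break_even_encounters(block_count: int, lookup_ms: int, saving_ms: int) -> float:
    """Invalid-block encounters needed before cached lookups pay for themselves."""
    total_lookup_ms = block_count * lookup_ms  # lookup cost is paid on every block
    return total_lookup_ms / saving_ms         # each encounter recovers saving_ms


BLOCKS = 840_000   # assumed chain length (implied by the quoted 14 minutes)
LOOKUP_MS = 1      # hash/lookup cost per validated block, from the quote
SAVING_MS = 1_000  # validation time saved per malleated block, from the quote

total_minutes = BLOCKS * LOOKUP_MS / 60_000
encounters = break_even_encounters(BLOCKS, LOOKUP_MS, SAVING_MS)
```

Under these inputs the lookup overhead totals 14 minutes and the break-even point is 840 encounters, matching the figures in the quote.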
+
+> I think what you're describing is more a classic time-space tradeoff
+which is well-known in classic computer science literature. In my
+reasonable opinion, one should more reason under what is the security
+paradigm we wish for bitcoin block-relay network and enduring
+decentralization, i.e. one where it's easy to verify block messages proofs
+which could have been generated on specialized hardware with an asymmetric
+cost. Obviously encountering 840 such malleated blocks to make it break even
+doesn't make the math up to save on hash lookup, unless you can reduce the
+attack scenario in terms of adversaries' capabilities.
+
+I'm referring to DoS mitigation (the only relevant security consideration
+here). I'm pointing out that invalidity caching is pointless in all cases,
+and in this case is the most pointless as type64 malleation is the cheapest
+of all invalidity to detect. I would prefer that all bogus blocks sent to
+my node are of this type. The worst types of invalidity detection have no
+mitigation and from a security standpoint are counterproductive to cache.
+I'm describing what overall is actually not a tradeoff. It's all negative
+and no positive.
+
+Best,
+Eric
+
+--
+You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
+To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
+To view this discussion on the web visit https://groups.google.com/d/msgid/bitcoindev/d9834ad5-f803-4a39-a854-95b2439738f5n%40googlegroups.com.
+
+------=_Part_35621_885950809.1719969200791
+Content-Type: text/html; charset="UTF-8"
+Content-Transfer-Encoding: quoted-printable
+
+Hi Antoine R,<br /><br />>> Ok, thanks for clarifying.
I'm still not = +making the connection to "checking a non-null [C] pointer" but that's prob = +on me.<br /><br />> A C pointer, which is a language idiome assigning to= + a memory address A the value o memory address B can be 0 (or NULL a standa= +rd macro defined in stddef.h).<br />> Here a snippet example of linked l= +ist code checking the pointer (`*begin_list`) is non null before the compar= +ison operation to find the target element list.<br />> ...<br />> Whi= +le both libbitcoin and bitcoin core are both written in c++, you still have= + underlying pointer derefencing playing out to access the coinbase transact= +ion, and all underlying implications in terms of memory management.<br /><b= +r />I'm familiar with pointers ;).<br /><br />While at some level the block= + message buffer would generally be referenced by one or more C pointers, th= +e difference between a valid coinbase input (i.e. with a "null point") and = +any other input, is not nullptr vs. !nullptr. A "null point" is a 36 byte v= +alue, 32 0x00 byes followed by 4 0xff bytes. In his infinite wisdom Satoshi= + decided it was better (or easier) to serialize a first block tx (coinbase)= + with an input containing an unusable script and pointing to an invalid [tx= +:index] tuple (input point) as opposed to just not having any input. That i= +nvalid input point is called a "null point", and of course cannot be pointe= +d to by a "null pointer". 
The coinbase must be identified by comparing thos= +e 36 bytes to the well-known null point value (and if this does not match t= +he Merkle hash cannot have been type64 malleated).<br /><br /><div>> I t= +hink it's interesting to point out the two types of malleation that a bitco= +in consensus validation logic should respect w.r.t block validity checks.= +=C2=A0Like you said the first one on the merkle root committed in the heade= +rs's `hashMerkleRoot` due to the lack of domain separation between leaf and= + merkle tree nodes.<br /></div><div><br />We call this type64 malleability = +(or malleation where it is not only possible but occurs).<br /><br />> T= +he second one is the bip141 wtxid commitment in one of the coinbase transac= +tion `scriptpubkey` output, which is itself covered by a txid in the merkle= + tree.<br /><br />While symmetry seems to imply that the witness commitment= + would be malleable, just as the txs commitment, this is not the case. If t= +he tx commitment is correct it is computationally infeasible for the witnes= +s commitment to be malleated, as the witness commitment incorporates each f= +ull tx (with witness, sentinel, and marker). As such the block identifier, = +which relies only on the header and tx commitment, is a sufficient identifi= +er. Yet it remains necessary to validate the witness commitment to ensure t= +hat the correct witness data has been provided in the block message.<br /><= +br />The second type of malleability, in addition to type64, is what we cal= +l type32. This is the consequence of duplicated trailing sets of txs (and t= +herefore tx hashes) in a block message. This is applicable to some but not = +all blocks, as a function of the number of txs contained.<br /><br />>&g= +t; Caching identity in the case of invalidity is more interesting question = +than it might seem.<br />>> Background: A fully-validated block has e= +stablished identity in its block hash. 
However an invalid block message may= + include the same block header, producing the same hash, but with any kind = +of nonsense following the header. The purpose of the transaction and witnes= +s commitments is of course to establish this identity, so these two checks = +are therefore necessary even under checkpoint/milestone. And then of course= + the two Merkle tree issues complicate the tx commitment (the integrity of = +the witness commitment is assured by that of the tx commitment).<br />>&= +gt;<br />>> So what does it mean to speak of a block hash derived fro= +m:<br />>> (1) a block message with an unparseable header?<br />>&= +gt; (2) a block message with parseable but invalid header?<br />>> (3= +) a block message with valid header but unparseable tx data?<br />>> = +(4) a block message with valid header but parseable invalid uncommitted tx = +data?<br />>> (5) a block message with valid header but parseable inv= +alid malleated committed tx data?<br />>> (6) a block message with va= +lid header but parseable invalid unmalleated committed tx data?<br />>&g= +t; (7) a block message with valid header but uncommitted valid tx data?<br = +/>>> (8) a block message with valid header but malleated committed va= +lid tx data?<br />>> (9) a block message with valid header but unmall= +eated committed valid tx data?<br />>><br />>> Note that only t= +he #9 p2p block message contains an actual Bitcoin block, the others are bo= +gus messages. In all cases the message can be sha256 hashed to establish th= +e identity of the *message*. And if one's objective is to reject repeating = +bogus messages, this might be a useful strategy. It's already part of the p= +2p protocol, is orders of magnitude cheaper to produce than a Merkle root, = +and has no identity issues.<br /><br />> I think I mostly agree with the= + identity issue as laid out so far, there is one caveat to add if you're co= +nsidering identity caching as the problem solved. 
A validation node might h= +ave to consider differently block messages processed if they connect on the= + longest most PoW valid chain for which all blocks have been validated. Or = +alternatively if they have to be added on a candidate longest most PoW vali= +d chain.<br /><br />Certainly an important consideration. We store both typ= +es. Once there is a stronger candidate header chain we store the headers an= +d proceed to obtaining the blocks (if we don't already have them). The bloc= +ks are stored in the same table; the confirmed vs. candidate indexes simply= + point to them as applicable. It is feasible (and has happened twice) for t= +wo blocks to share the very same coinbase tx, even with either/all bip30/34= +/90 active (and setting aside future issues here for the sake of simplicity= +). This remains only because two competing branches can have blocks at the = +same height, and bip34 requires only height in the coinbase input script. T= +his therefore implies the same transaction but distinct blocks. It is howev= +er infeasible for one block to exist in multiple distinct chains. In order = +for this to happen two blocks at the same height must have the same coinbas= +e (ok), and also the same parent (ok). But this then means that they either= + (1) have distinct identity due to another header property deviation, or (2= +) are the same block with the same parent and are therefore in just one cha= +in. So I don't see an actual caveat. I'm not certain if this is the ambigui= +ty that you were referring to. If not please feel free to clarify.<br /><br= + />>> The concept of Bitcoin block hash as unique identifier for inva= +lid p2p block messages is problematic. Apart from the malleation question, = +what is the Bitcoin block hash for a message with unparseable data (#1 and = +#3)? 
Such messages are trivial to produce and have no block hash.<br /><br = +/>> For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` = +(in `src/node/connection_types.h`) where some preferential peering policy i= +s applied in matters of block messages download.<br /><br />We don't do thi= +s and I don't see how it would be relevant. If a peer provides any invalid = +message or otherwise violates the protocol it is simply dropped.<br /><br /= +>The "problematic" that I'm referring to is the reliance on the block hash = +as a message identifier, because it does not identify the message and canno= +t be useful in an effectively unlimited number of zero-cost cases.<br /><br= + />>> What is the useful identifier for a block with malleated commit= +ments (#5 and #8) or invalid commitments (#4 and #7) - valid txs or otherwi= +se?<br /><br />> The block header, as it commits to the transaction iden= +tifier tree can be useful as much for #4 and #5.<br /><br />#4 and #5 refer= + to "uncommitted" and "malleated committed". It may not be clear, but "unco= +mmitted" means that the tx commitment is not valid (Merkle root doesn't mat= +ch the header's value) and "malleated committed" means that the (matching) = +commitment cannot be relied upon because the txs represent malleation, inva= +lidating the identifier. So neither of these are usable identifiers.<br /><= +br />> On the bitcoin core side, about #7 the uncommitted valid tx data = +can be already present in the validation cache from mempool acceptance. Abo= +ut #8, the malleaed committed valid transactions shall be also committed in= + the merkle root in headers.<br /><br />It seems you may be referring to "u= +nconfirmed" txs as opposed to "uncommitted" txs. This doesn't pertain to tx= + storage or identifiers. 
Neither #7 nor #8 is usable, for the same reasons.<br /><br />
>> This seems reasonable at first glance, but given the list of scenarios above, which does it apply to? Presumably the invalid header (#2) doesn't get this far because of headers-first.<br />
>> That leaves just invalid blocks with useful block hash identifiers (#6). In all other cases the message is simply discarded. In this case the attempt is to move category #5 into category #6 by prohibiting 64 byte txs.<br /><br />
> Yes, it's moving from the category #5 to the category #6. Note, transaction malleability can be a distinct issue from lack of domain separation.<br /><br />
I'm making no reference to tx malleability. This concerns only Merkle tree (block hash) malleability, the two types described in detail in the paper I referenced earlier, here again:<br /><br />
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf<br /><br />
>> The requirement to "avoid re-downloading and re-validating it" is about performance, presumably minimizing initial block download/catch-up time. There is a computational cost to producing 64 byte malleations and none for any of the other bogus block message categories above, including the other form of malleation. Furthermore, 64 byte malleation has almost zero cost to preclude. No hashing and not even true header or tx parsing are required. Only a handful of bytes must be read from the raw message before it can be discarded presently.<br /><br />
>> That's actually far cheaper than any of the other scenarios that, again, have no cost to produce. The other type of malleation requires parsing all of the txs in the block and hashing and comparing some or all of them.
In other words, if there is an attack scenario, that must be addressed before this can be meaningful. In fact all of the other bogus message scenarios (with tx data) will remain more expensive to discard than this one.<br /><br />
> In practice on the bitcoin core side, the bogus block message categories from #4 to #6 are already mitigated by validation caching for transactions that have been received early. While libbitcoin has no mempool (at least in earlier versions), transaction buffering can be done by bip152's HeaderAndShortIDs message.<br /><br />
Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx pool, we just don't store the txs in RAM (memory).<br /><br />
> About #7 and #8, introducing a domain separation where 64 byte transactions are rejected makes it harder to exploit the #7 and #8 categories of bogus block messages. It is correct that bitcoin core might accept valid transaction data before the merkle tree commitment has been verified.<br /><br />
I don't follow this. A consensus rule invalidating 64 byte txs would definitely not make it harder to exploit block message invalidity. In fact it would just slow down validation by adding a redundant rule. Furthermore, as I have detailed in a previous message, caching invalidity does absolutely nothing to increase protection. In fact it makes the situation materially worse.<br /><br />
>> The problem arises from trying to optimize dismissal by storing an identifier. Just *producing* the identifier is orders of magnitude more costly than simply dismissing this bogus message. I can't imagine why any implementation would want to compute and store and retrieve and recompute and compare hashes when the alternative is just dismissing the bogus messages with no hashing at all.<br /><br />
>> Bogus messages will arrive, they do not even have to be requested. The simplest are dealt with by parse failure.
What defines a parse is entirely subjective. Generally it's "structural" but nothing precludes incorporating a requirement for a necessary leading pattern in the stream, sort of like how the witness pattern is identified. If we were going to prioritize early dismissal this is where we would put it.<br /><br />
> I don't think this is that simple. While producing an identifier comes with a computational cost (e.g. fixed 64-byte structured coinbase transaction), if the full node has a hierarchy of validation caches like bitcoin core has already, the cost of bogus block messages can be slashed down.<br /><br />
No, this is not the case. As I detailed in my previous message, there is no possible scenario where invalidation caching does anything but make the situation materially worse.<br /><br />
> On the other hand, just dealing with parse failure on the spot by introducing a leading pattern in the stream just inflates the size of p2p messages, and the transaction-relay bandwidth cost.<br /><br />
I think you misunderstood me. I am suggesting no change to serialization. I can see how it might be unclear, but I said, "nothing precludes incorporating a requirement for a necessary leading pattern in the stream." I meant that the parser can simply incorporate the *requirement* that the byte stream starts with a null input point. That identifies the malleation or invalidity without a single hash operation and while only reading a handful of bytes. No change to any messages.<br /><br />
>> However, there is a tradeoff in terms of early dismissal. Looking up invalid hashes is a costly tradeoff, which becomes multiplied by every block validated. For example, expending 1 millisecond in hash/lookup to save 1 second of validation time in the failure case seems like a reasonable tradeoff, until you multiply across the whole chain.
1 ms becomes 14 minutes across the chain, just to save a second for each malleated block encountered. That means you need to have encountered 840 such malleated blocks just to break even. Dismissing the block early for a non-null coinbase point (without hashing anything) would be on the order of 1000x faster than that (breakeven at 1 encounter). So why the block hash cache requirement? It cannot be applied to many scenarios, and cannot be optimal in this one.<br /><br />
> I think what you're describing is more a classic time-space tradeoff, which is well known in the computer science literature. In my opinion, one should reason more about which security paradigm we want for the bitcoin block-relay network and for enduring decentralization, i.e. one where it's easy to verify block message proofs that could have been generated on specialized hardware at an asymmetric cost. Obviously, having to encounter 840 such malleated blocks to break even doesn't make the math work for hash lookup, unless you can reduce the attack scenario in terms of adversary capabilities.<br /><br />
I'm referring to DoS mitigation (the only relevant security consideration here). I'm pointing out that invalidity caching is pointless in all cases, and in this case is the most pointless, as type64 malleation is the cheapest of all invalidity to detect. I would prefer that all bogus blocks sent to my node are of this type. The worst types of invalidity detection have no mitigation and from a security standpoint are counterproductive to cache. I'm describing what overall is actually not a tradeoff.
It's all negative and no positive.<br /><br />Best,<br />Eric</div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an email to <a href="mailto:bitcoindev+unsubscribe@googlegroups.com">bitcoindev+unsubscribe@googlegroups.com</a>.<br />
To view this discussion on the web visit <a href="https://groups.google.com/d/msgid/bitcoindev/d9834ad5-f803-4a39-a854-95b2439738f5n%40googlegroups.com?utm_medium=email&utm_source=footer">https://groups.google.com/d/msgid/bitcoindev/d9834ad5-f803-4a39-a854-95b2439738f5n%40googlegroups.com</a>.<br />

------=_Part_35621_885950809.1719969200791--

------=_Part_35620_344008102.1719969200791--
|
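The early-dismissal requirement discussed in the message (the block's byte stream must lead with a null input point) and the break-even arithmetic (1 ms per block versus 1 s saved per malleated block) can be sketched in Python. This is a minimal illustration under stated assumptions, not libbitcoin's or bitcoin core's code: the names `read_varint`, `first_input_is_null_point`, and `break_even_encounters` are hypothetical, and the chain length of 840,000 blocks is an assumption inferred from "1 ms becomes 14 minutes".

```python
from __future__ import annotations

HEADER_SIZE = 80                          # serialized block header length
NULL_POINT = b"\x00" * 32 + b"\xff" * 4   # null hash followed by index 0xffffffff


def read_varint(data: bytes, offset: int) -> tuple[int, int]:
    """Decode a Bitcoin variable-length integer; return (value, next offset)."""
    prefix = data[offset]                 # raises IndexError if truncated
    if prefix < 0xFD:
        return prefix, offset + 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[prefix]
    end = offset + 1 + width
    if end > len(data):
        raise IndexError("truncated varint")
    return int.from_bytes(data[offset + 1:end], "little"), end


def first_input_is_null_point(block: bytes) -> bool:
    """Check, without hashing, that the first tx's first input is the null point."""
    try:
        tx_count, pos = read_varint(block, HEADER_SIZE)
        if tx_count == 0:
            return False
        pos += 4                                  # skip the tx version field
        if block[pos:pos + 2] == b"\x00\x01":     # witness marker and flag
            pos += 2
        input_count, pos = read_varint(block, pos)
        if input_count == 0:
            return False
        return block[pos:pos + 36] == NULL_POINT  # short slice compares unequal
    except IndexError:
        return False                              # truncated data: parse failure


def break_even_encounters(cost_per_block: int, chain_blocks: int,
                          saving_per_hit: int) -> int:
    """Bad blocks needed before a per-block check pays for itself.

    All costs share one time unit; the result is rounded up.
    """
    total_cost = cost_per_block * chain_blocks
    return -(-total_cost // saving_per_hit)       # ceiling division
```

With the assumed 840,000 blocks, `break_even_encounters(1, 840_000, 1000)` (milliseconds) gives 840, matching the figure in the message, while a check that is 1000x cheaper, `break_even_encounters(1, 840_000, 1_000_000)` (microseconds), gives 1.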