author Eric Voskuil <eric@voskuil.org> 2024-07-02 18:13:20 -0700
committer bitcoindev <bitcoindev@googlegroups.com> 2024-07-02 18:31:02 -0700
commit e474eeefe150b4771b10abd354313e3a5524020f
tree 9ec581b0ce95cba77111945c354db9bb19367dd1
parent bf3423d81b972daccb15116f12f01a3dbcc07e3d
Re: [bitcoindev] Re: Great Consensus Cleanup Revival
-rw-r--r-- 8b/4782c0efb0a459f62cb2cb7f5b9a91e2bb1564 639
1 files changed, 639 insertions, 0 deletions
diff --git a/8b/4782c0efb0a459f62cb2cb7f5b9a91e2bb1564 b/8b/4782c0efb0a459f62cb2cb7f5b9a91e2bb1564
new file mode 100644
index 000000000..39ca88a53
--- /dev/null
+++ b/8b/4782c0efb0a459f62cb2cb7f5b9a91e2bb1564
@@ -0,0 +1,639 @@
+Delivery-date: Tue, 02 Jul 2024 18:31:02 -0700
+Received: from mail-yb1-f187.google.com ([209.85.219.187])
+ by mail.fairlystable.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
+ (Exim 4.94.2)
+ (envelope-from <bitcoindev+bncBC5P5KEHZQLBBTOTSK2AMGQE5N3FB5I@googlegroups.com>)
+ id 1sOopo-0007je-TR
+ for bitcoindev@gnusha.org; Tue, 02 Jul 2024 18:31:02 -0700
+Received: by mail-yb1-f187.google.com with SMTP id 3f1490d57ef6-e03a92302d1sf817548276.1
+ for <bitcoindev@gnusha.org>; Tue, 02 Jul 2024 18:31:00 -0700 (PDT)
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
+ d=googlegroups.com; s=20230601; t=1719970254; x=1720575054; darn=gnusha.org;
+ h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
+ :list-id:mailing-list:precedence:x-original-sender:mime-version
+ :subject:references:in-reply-to:message-id:to:from:date:sender:from
+ :to:cc:subject:date:message-id:reply-to;
+ bh=QGYGGRgKq5lyVuAQXa+uDb9sO0Il0pjFbXgFDk6nWF0=;
+ b=bHnBaYi6CdztbUdGJas0Dz8G7k4UHb/ubQAPZ8zy714swVXcwb56zA4HGvSxzZkfQc
+ 17Rz1vAzMQLUGhx9IpD5rpDxNXPKNUKGr1srx1Hf6WclGal9hKhfcbLIBoLGfS3V+BG9
+ /+OuyDbxZPbCmhATNdgbli2m+reHAUhrQarkdZbh+JiOllC52PmZJ841JGzh9lJ+XfIx
+ jw0qT66q4D0l741RqBXkZ28/MxQl0PRNpx9fUBa6ctIYBeSZH6RP9p0/FEf1erf6XLJm
+ 7O3IEIq1+pxr2oBbOm0GqQUA9gKJL7mBWCIiv5TGQvVyoy1cx7dBKDSE34YUHLk4jyq4
+ tTPw==
+DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
+ d=googlegroups-com.20230601.gappssmtp.com; s=20230601; t=1719970254; x=1720575054; darn=gnusha.org;
+ h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
+ :list-id:mailing-list:precedence:x-original-sender:mime-version
+ :subject:references:in-reply-to:message-id:to:from:date:from:to:cc
+ :subject:date:message-id:reply-to;
+ bh=QGYGGRgKq5lyVuAQXa+uDb9sO0Il0pjFbXgFDk6nWF0=;
+ b=irys9hm5QLYCa4xVccz5fdg/x+BHkEA8mUkN7kbYkHO5gQnDnL1qc5LzqXJNlyqMrS
+ 6IWoFhZH2IHlVIt66iAcwtXpeDNxUfkwOt6BnMYf4Ce5SPTuPxI6KiIFHElsJi5IeLI9
+ KHWdlDkqwKeoAPpGQM75HDwfM0VRu57LcVEJsvX6n2BnTuHlggesJJX0sGSW8QxsNxYp
+ 9jwxJUV67dXCFBEcJc7mPtNw979vcO064CGa0xI2Jaxwt1m7YTHAECaURoWp7nO8yGSl
+ 2fku+QxmIdT7uRPG2EvLPeg1YO1fv+WDuow29GyE/P3m1bSlfsvPgLv0CWa7TCft6WfF
+ Fz5w==
+X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
+ d=1e100.net; s=20230601; t=1719970254; x=1720575054;
+ h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
+ :list-id:mailing-list:precedence:x-original-sender:mime-version
+ :subject:references:in-reply-to:message-id:to:from:date:x-beenthere
+ :x-gm-message-state:sender:from:to:cc:subject:date:message-id
+ :reply-to;
+ bh=QGYGGRgKq5lyVuAQXa+uDb9sO0Il0pjFbXgFDk6nWF0=;
+ b=a+8m231YYiPYIzYDmP3U1NNl2pj5dF8pHYI3A9ltIvOH3vU9BPtoC5QeOybSYdcDDz
+ LuLzuLV9iH3rPClhdl/rCc+bFo5hRpR4HX8bB5ge3MU5fxJLfcK8h4LICFeQxn67rqlT
+ dXgIB26wPWB7PIdaCOhOe563YnEimGVxb2JlYqjbhPi1v26GS2B//wA8YEzCIISJi07e
+ tdCHakMLTUIvX0BAFByh5m4cHWKuE2rOyVRa9DHByb7jDG/4uHXtR2n6z3yP0D/pipTg
+ t+CEL9pzlAp0aFLXps8QMeZf4vtism1590xq63pKXUqTmF7qpm3IUrN7SqgU7LP68JwY
+ 0BPg==
+Sender: bitcoindev@googlegroups.com
+X-Forwarded-Encrypted: i=1; AJvYcCV8vfW22ZVamQzVnzWU7X2sp5jLzcLfzYDTqz1nhS6mxpkF2kE62osiDjxVkGwFkRsBcvMjRchLmuzPGAvG0YnQqnL4PlY=
+X-Gm-Message-State: AOJu0YyGauLvl55DjJ1R9Fz3gX92rbX+GnYT0+twUxeVfGsMDGaSkIaZ
+ 0dMGH21VByzxHsYrzhrIrkkCUEKOJn5dR3vi0RudYr4scE34TKER
+X-Google-Smtp-Source: AGHT+IH4ZqWsLpki8Sdiv4KRJM30Qgv/PKh4KkXmPZUsFJmJ4MJMTFA+HO6ZyRwsqAnGDidr11+HtQ==
+X-Received: by 2002:a25:df16:0:b0:e03:229d:69f5 with SMTP id 3f1490d57ef6-e036ead1e52mr11151803276.3.1719970254432;
+ Tue, 02 Jul 2024 18:30:54 -0700 (PDT)
+X-BeenThere: bitcoindev@googlegroups.com
+Received: by 2002:a05:6902:1007:b0:e03:6457:383f with SMTP id
+ 3f1490d57ef6-e0364573bd8ls6892124276.1.-pod-prod-09-us; Tue, 02 Jul 2024
+ 18:30:53 -0700 (PDT)
+X-Received: by 2002:a05:6902:2b8a:b0:e03:5a51:382f with SMTP id 3f1490d57ef6-e036ec429bcmr980548276.8.1719970253037;
+ Tue, 02 Jul 2024 18:30:53 -0700 (PDT)
+Received: by 2002:a05:690c:4289:b0:63b:c3b0:e1c with SMTP id 00721157ae682-6514011671ams7b3;
+ Tue, 2 Jul 2024 18:13:22 -0700 (PDT)
+X-Received: by 2002:a05:690c:fc8:b0:64b:16af:d264 with SMTP id 00721157ae682-64c776d2fd5mr283127b3.7.1719969201017;
+ Tue, 02 Jul 2024 18:13:21 -0700 (PDT)
+Date: Tue, 2 Jul 2024 18:13:20 -0700 (PDT)
+From: Eric Voskuil <eric@voskuil.org>
+To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
+Message-Id: <d9834ad5-f803-4a39-a854-95b2439738f5n@googlegroups.com>
+In-Reply-To: <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
+References: <gnM89sIQ7MhDgI62JciQEGy63DassEv7YZAMhj0IEuIo0EdnafykF6RH4OqjTTHIHsIoZvC2MnTUzJI7EfET4o-UQoD-XAQRDcct994VarE=@protonmail.com>
+ <72e83c31-408f-4c13-bff5-bf0789302e23n@googlegroups.com>
+ <heKH68GFJr4Zuf6lBozPJrb-StyBJPMNvmZL0xvKFBnBGVA3fVSgTLdWc-_8igYWX8z3zCGvzflH-CsRv0QCJQcfwizNyYXlBJa_Kteb2zg=@protonmail.com>
+ <5b0331a5-4e94-465d-a51d-02166e2c1937n@googlegroups.com>
+ <yt1O1F7NiVj-WkmnYeta1fSqCYNFx8h6OiJaTBmwhmJ2MWAZkmmjPlUST6FM7t6_-2NwWKdglWh77vcnEKA8swiAnQCZJY2SSCAh4DOKt2I=@protonmail.com>
+ <be78e733-6e9f-4f4e-8dc2-67b79ddbf677n@googlegroups.com>
+ <jJLDrYTXvTgoslhl1n7Fk9-pL1mMC-0k6gtoniQINmioJpzgtqrJ_WqyFZkLltsCUusnQ4jZ6HbvRC-mGuaUlDi3kcqcFHALd10-JQl-FMY=@protonmail.com>
+ <9a4c4151-36ed-425a-a535-aa2837919a04n@googlegroups.com>
+ <3f0064f9-54bd-46a7-9d9a-c54b99aca7b2n@googlegroups.com>
+ <26b7321b-cc64-44b9-bc95-a4d8feb701e5n@googlegroups.com>
+ <CALZpt+EwVyaz1=A6hOOycqFGJs+zxyYYocZixTJgVmzZezUs9Q@mail.gmail.com>
+ <607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com>
+ <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
+ <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
+Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
+MIME-Version: 1.0
+Content-Type: multipart/mixed;
+ boundary="----=_Part_35620_344008102.1719969200791"
+X-Original-Sender: eric@voskuil.org
+Precedence: list
+Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com
+List-ID: <bitcoindev.googlegroups.com>
+X-Google-Group-Id: 786775582512
+List-Post: <https://groups.google.com/group/bitcoindev/post>, <mailto:bitcoindev@googlegroups.com>
+List-Help: <https://groups.google.com/support/>, <mailto:bitcoindev+help@googlegroups.com>
+List-Archive: <https://groups.google.com/group/bitcoindev
+List-Subscribe: <https://groups.google.com/group/bitcoindev/subscribe>, <mailto:bitcoindev+subscribe@googlegroups.com>
+List-Unsubscribe: <mailto:googlegroups-manage+786775582512+unsubscribe@googlegroups.com>,
+ <https://groups.google.com/group/bitcoindev/subscribe>
+X-Spam-Score: -0.7 (/)
+
+------=_Part_35620_344008102.1719969200791
+Content-Type: multipart/alternative;
+ boundary="----=_Part_35621_885950809.1719969200791"
+
+------=_Part_35621_885950809.1719969200791
+Content-Type: text/plain; charset="UTF-8"
+
+Hi Antoine R,
+
+>> Ok, thanks for clarifying. I'm still not making the connection to
+"checking a non-null [C] pointer" but that's prob on me.
+
+> A C pointer, which is a language idiom assigning to a memory address A
+the value of memory address B, can be 0 (or NULL, a standard macro defined
+in stddef.h).
+> Here a snippet example of linked list code checking the pointer
+(`*begin_list`) is non null before the comparison operation to find the
+target element list.
+> ...
+> While libbitcoin and bitcoin core are both written in c++, you still
+have underlying pointer dereferencing playing out to access the coinbase
+transaction, and all underlying implications in terms of memory management.
+
+I'm familiar with pointers ;).
+
+While at some level the block message buffer would generally be referenced
+by one or more C pointers, the difference between a valid coinbase input
+(i.e. with a "null point") and any other input, is not nullptr vs.
+!nullptr. A "null point" is a 36 byte value, 32 0x00 bytes followed by 4
+0xff bytes. In his infinite wisdom Satoshi decided it was better (or
+easier) to serialize a first block tx (coinbase) with an input containing
+an unusable script and pointing to an invalid [tx:index] tuple (input
+point) as opposed to just not having any input. That invalid input point is
+called a "null point", and of course cannot be pointed to by a "null
+pointer". The coinbase must be identified by comparing those 36 bytes to
+the well-known null point value (and if this does not match, the Merkle hash
+cannot have been type64 malleated).
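
A minimal sketch of the comparison described above, written in Python for brevity. The names `NULL_POINT` and `is_null_point` are hypothetical and are not taken from libbitcoin or Bitcoin Core; only the 36 byte value itself comes from the message.

```python
# Illustrative only: hypothetical names, not an API of any Bitcoin library.

# 32 bytes of 0x00 (the "hash" part) followed by 4 bytes of 0xff (the index).
NULL_POINT = bytes(32) + b"\xff" * 4


def is_null_point(point: bytes) -> bool:
    """Return True if the serialized 36 byte input point is the null point."""
    return len(point) == 36 and point == NULL_POINT
```

This is a byte-wise value comparison on message data and is unrelated to whether any pointer into the buffer is null.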
+
+> I think it's interesting to point out the two types of malleation that a
+bitcoin consensus validation logic should respect w.r.t block validity
+checks. Like you said the first one on the merkle root committed in the
+header's `hashMerkleRoot` due to the lack of domain separation between
+leaf and merkle tree nodes.
+
+We call this type64 malleability (or malleation where it is not only
+possible but occurs).
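
The ambiguity behind type64 can be sketched as follows, assuming the usual double-SHA256 Merkle construction. The helper names and the placeholder txids are hypothetical. The sketch only shows that a 64 byte value hashes like an interior node; whether those 64 bytes also parse as a transaction is a separate matter and is not modelled here.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for txids and Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes):
    """Merkle root over a non-empty list of 32 byte hashes."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd level: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Four placeholder txids standing in for real transactions (assumption).
txids = [sha256d(bytes([i])) for i in range(4)]
honest_root = merkle_root(txids)

# Two 64 byte blobs, each the concatenation of two txids. If a peer presents
# them as serialized txs, their "txids" equal the interior nodes of the tree.
fake_txs = [txids[0] + txids[1], txids[2] + txids[3]]
fake_txids = [sha256d(tx) for tx in fake_txs]
malleated_root = merkle_root(fake_txids)

assert honest_root == malleated_root
```

With no domain separation between leaves and interior nodes, the header's Merkle root alone cannot distinguish the four-leaf tree from the two-leaf one.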
+
+> The second one is the bip141 wtxid commitment in one of the coinbase
+transaction `scriptpubkey` output, which is itself covered by a txid in the
+merkle tree.
+
+While symmetry seems to imply that the witness commitment would be
+malleable, just as the txs commitment, this is not the case. If the tx
+commitment is correct it is computationally infeasible for the witness
+commitment to be malleated, as the witness commitment incorporates each
+full tx (with witness, sentinel, and marker). As such the block identifier,
+which relies only on the header and tx commitment, is a sufficient
+identifier. Yet it remains necessary to validate the witness commitment to
+ensure that the correct witness data has been provided in the block message.
+
+The second type of malleability, in addition to type64, is what we call
+type32. This is the consequence of duplicated trailing sets of txs (and
+therefore tx hashes) in a block message. This is applicable to some but not
+all blocks, as a function of the number of txs contained.
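
A small sketch of type32, again assuming the usual double-SHA256 Merkle construction in which an odd level pairs its last hash with itself. The helper names and placeholder txids are hypothetical.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for txids and Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes):
    """Merkle root over a non-empty list of 32 byte hashes."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd level: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Placeholder txids standing in for real transactions (assumption).
three = [sha256d(bytes([i])) for i in range(3)]
six = [sha256d(bytes([i])) for i in range(6)]

# Duplicating the trailing tx of a 3-tx list leaves the root unchanged.
three_dup = three + three[-1:]

# Duplicating the trailing pair of a 6-tx list leaves the root unchanged.
six_dup = six + six[-2:]

assert merkle_root(three) == merkle_root(three_dup)
assert merkle_root(six) == merkle_root(six_dup)
```

In this sketch a tx count that is a power of two never reaches the duplication step, which is one way to see why the issue depends on the number of txs in the block.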
+
+>> Caching identity in the case of invalidity is a more interesting question
+than it might seem.
+>> Background: A fully-validated block has established identity in its
+block hash. However an invalid block message may include the same block
+header, producing the same hash, but with any kind of nonsense following
+the header. The purpose of the transaction and witness commitments is of
+course to establish this identity, so these two checks are therefore
+necessary even under checkpoint/milestone. And then of course the two
+Merkle tree issues complicate the tx commitment (the integrity of the
+witness commitment is assured by that of the tx commitment).
+>>
+>> So what does it mean to speak of a block hash derived from:
+>> (1) a block message with an unparseable header?
+>> (2) a block message with parseable but invalid header?
+>> (3) a block message with valid header but unparseable tx data?
+>> (4) a block message with valid header but parseable invalid uncommitted
+tx data?
+>> (5) a block message with valid header but parseable invalid malleated
+committed tx data?
+>> (6) a block message with valid header but parseable invalid unmalleated
+committed tx data?
+>> (7) a block message with valid header but uncommitted valid tx data?
+>> (8) a block message with valid header but malleated committed valid tx
+data?
+>> (9) a block message with valid header but unmalleated committed valid tx
+data?
+>>
+>> Note that only the #9 p2p block message contains an actual Bitcoin
+block, the others are bogus messages. In all cases the message can be
+sha256 hashed to establish the identity of the *message*. And if one's
+objective is to reject repeating bogus messages, this might be a useful
+strategy. It's already part of the p2p protocol, is orders of magnitude
+cheaper to produce than a Merkle root, and has no identity issues.
+
+> I think I mostly agree with the identity issue as laid out so far, there
+is one caveat to add if you're considering identity caching as the problem
+solved. A validation node might have to consider differently block messages
+processed if they connect on the longest most PoW valid chain for which all
+blocks have been validated. Or alternatively if they have to be added on a
+candidate longest most PoW valid chain.
+
+Certainly an important consideration. We store both types. Once there is a
+stronger candidate header chain we store the headers and proceed to
+obtaining the blocks (if we don't already have them). The blocks are stored
+in the same table; the confirmed vs. candidate indexes simply point to them
+as applicable. It is feasible (and has happened twice) for two blocks to
+share the very same coinbase tx, even with either/all bip30/34/90 active
+(and setting aside future issues here for the sake of simplicity). This
+remains only because two competing branches can have blocks at the same
+height, and bip34 requires only height in the coinbase input script. This
+therefore implies the same transaction but distinct blocks. It is however
+infeasible for one block to exist in multiple distinct chains. In order for
+this to happen two blocks at the same height must have the same coinbase
+(ok), and also the same parent (ok). But this then means that they either
+(1) have distinct identity due to another header property deviation, or (2)
+are the same block with the same parent and are therefore in just one
+chain. So I don't see an actual caveat. I'm not certain if this is the
+ambiguity that you were referring to. If not please feel free to clarify.
+
+>> The concept of Bitcoin block hash as unique identifier for invalid p2p
+block messages is problematic. Apart from the malleation question, what is
+the Bitcoin block hash for a message with unparseable data (#1 and #3)?
+Such messages are trivial to produce and have no block hash.
+
+> For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` (in
+`src/node/connection_types.h`) where some preferential peering policy is
+applied in matters of block messages download.
+
+We don't do this and I don't see how it would be relevant. If a peer
+provides any invalid message or otherwise violates the protocol it is
+simply dropped.
+
+The "problematic" that I'm referring to is the reliance on the block hash
+as a message identifier, because it does not identify the message and
+cannot be useful in an effectively unlimited number of zero-cost cases.
+
+>> What is the useful identifier for a block with malleated commitments (#5
+and #8) or invalid commitments (#4 and #7) - valid txs or otherwise?
+
+> The block header, as it commits to the transaction identifier tree can be
+useful as much for #4 and #5.
+
+#4 and #5 refer to "uncommitted" and "malleated committed". It may not be
+clear, but "uncommitted" means that the tx commitment is not valid (Merkle
+root doesn't match the header's value) and "malleated committed" means that
+the (matching) commitment cannot be relied upon because the txs represent
+malleation, invalidating the identifier. So neither of these are usable
+identifiers.
+
+> On the bitcoin core side, about #7 the uncommitted valid tx data can be
+already present in the validation cache from mempool acceptance. About #8,
+the malleated committed valid transactions shall be also committed in the
+merkle root in headers.
+
+It seems you may be referring to "unconfirmed" txs as opposed to
+"uncommitted" txs. This doesn't pertain to tx storage or identifiers.
+Neither #7 nor #8 are usable for the same reasons.
+
+>> This seems reasonable at first glance, but given the list of scenarios
+above, which does it apply to? Presumably the invalid header (#2) doesn't
+get this far because of headers-first.
+>> That leaves just invalid blocks with useful block hash identifiers (#6).
+In all other cases the message is simply discarded. In this case the
+attempt is to move category #5 into category #6 by prohibiting 64 byte txs.
+
+> Yes, it's moving from the category #5 to the category #6. Note,
+transaction malleability can be a distinct issue from the lack of domain
+separation.
+
+I'm making no reference to tx malleability. This concerns only Merkle tree
+(block hash) malleability, the two types described in detail in the paper I
+referenced earlier, here again:
+
+https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf
+
+>> The requirement to "avoid re-downloading and re-validating it" is about
+performance, presumably minimizing initial block download/catch-up time.
+There is a computational cost to producing 64 byte malleations and none
+for any of the other bogus block message categories above, including the
+other form of malleation. Furthermore, 64 byte malleation has almost zero
+cost to preclude. No hashing and not even true header or tx parsing are
+required. Only a handful of bytes must be read from the raw message
+before it can be discarded presently.
+
+>> That's actually far cheaper than any of the other scenarios that, again,
+have no cost to produce. The other type of malleation requires parsing all
+of the txs in the block and hashing and comparing some or all of them. In
+other words, if there is an attack scenario, that must be addressed before
+this can be meaningful. In fact all of the other bogus message scenarios
+(with tx data) will remain more expensive to discard than this one.
+
+> In practice on the bitcoin core side, the bogus block message categories
+from #4 to #6 are already mitigated by validation caching for transactions
+that have been received early. While libbitcoin has no mempool (at least in
+earlier versions) transactions buffering can be done by bip152's
+HeadersAndShortIds message.
+
+Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx
+pool, we just don't store them in RAM (memory).
+
+> About #7 and #8, introducing a domain separation where 64-byte
+transactions are rejected makes it harder to exploit the #7 and #8
+categories of bogus block messages. It is correct that bitcoin core might
+accept valid transaction data before the merkle tree commitment has been
+verified.
+
+I don't follow this. An invalid 64 byte tx consensus rule would definitely
+not make it harder to exploit block message invalidity. In fact it would
+just slow down validation by adding a redundant rule. Furthermore, as I
+have detailed in a previous message, caching invalidity does absolutely
+nothing to increase protection. In fact it makes the situation materially
+worse.
+
+>> The problem arises from trying to optimize dismissal by storing an
+identifier. Just *producing* the identifier is orders of magnitude more
+costly than simply dismissing this bogus message. I can't imagine why any
+implementation would want to compute and store and retrieve and recompute
+and compare hashes when the alternative is just dismissing the bogus
+messages with no hashing at all.
+
+>> Bogus messages will arrive, they do not even have to be requested. The
+simplest are dealt with by parse failure. What defines a parse is entirely
+subjective. Generally it's "structural" but nothing precludes incorporating
+a requirement for a necessary leading pattern in the stream, sort of like
+how the witness pattern is identified. If we were going to prioritize early
+dismissal this is where we would put it.
+
+> I don't think this is that simple - While producing an identifier comes
+with a computational cost (e.g fixed 64-byte structured coinbase
+transaction), if the full node has a hierarchy of validation cache like
+bitcoin core has already, the cost of bogus block messages can be slashed
+down.
+
+No, this is not the case. As I detailed in my previous message, there is no
+possible scenario where invalidation caching does anything but make the
+situation materially worse.
+
+> On the other hand, just dealing with parse failure on the spot by
+introducing a leading pattern in the stream just inflates the size of p2p
+messages, and the transaction-relay bandwidth cost.
+
+I think you misunderstood me. I am suggesting no change to serialization. I
+can see how it might be unclear, but I said, "nothing precludes
+incorporating a requirement for a necessary leading pattern in the stream."
+I meant that the parser can simply incorporate the *requirement* that the
+byte stream starts with a null input point. That identifies the malleation
+or invalidity without a single hash operation and while only reading a
+handful of bytes. No change to any messages.
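
As a rough sketch of such a parser requirement, the following reads only the leading bytes of a raw block message and checks that the first input point of the first tx is the null point. It assumes the standard block layout (80 byte header, variable-length tx count, 4 byte tx version, optional witness marker and flag, variable-length input count). The function names are hypothetical and this is not libbitcoin's parser.

```python
# Illustrative only: hypothetical names, not an API of any Bitcoin library.
NULL_POINT = bytes(32) + b"\xff" * 4  # 32 zero bytes, then index 0xffffffff


def read_varint(buf: bytes, offset: int):
    """Decode a Bitcoin variable-length integer; return (value, next_offset)."""
    first = buf[offset]  # raises IndexError on a truncated buffer
    if first < 0xFD:
        return first, offset + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    value = int.from_bytes(buf[offset + 1:offset + 1 + size], "little")
    return value, offset + 1 + size


def starts_with_null_point(block: bytes) -> bool:
    """True if the first tx has exactly one input and it is the null point."""
    try:
        offset = 80                                   # skip the block header
        _tx_count, offset = read_varint(block, offset)
        offset += 4                                   # skip the tx version
        if block[offset:offset + 2] == b"\x00\x01":   # witness marker + flag
            offset += 2
        input_count, offset = read_varint(block, offset)
        return input_count == 1 and block[offset:offset + 36] == NULL_POINT
    except IndexError:
        return False                                  # truncated message
```

No hashing is performed and only a few dozen bytes past the header are inspected, so a message failing this check can be dropped before any Merkle root or tx parsing work.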
+
+>> However, there is a tradeoff in terms of early dismissal. Looking up
+invalid hashes is a costly tradeoff, which becomes multiplied by every
+block validated. For example, expending 1 millisecond in hash/lookup to
+save 1 second of validation time in the failure case seems like a
+reasonable tradeoff, until you multiply across the whole chain. 1 ms
+becomes 14 minutes across the chain, just to save a second for each mallied
+block encountered. That means you need to have encountered 840 such mallied
+blocks just to break even. Early dismissing the block for non-null
+coinbase point (without hashing anything) would be on the order of 1000x
+faster than that (breakeven at 1 encounter). So why the block hash cache
+requirement? It cannot be applied to many scenarios, and cannot be optimal
+in this one.
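
The figures quoted above can be reproduced with simple arithmetic. The chain length of roughly 840,000 blocks is an assumption inferred from the 14 minute figure (the message does not state it), and the 1 microsecond early-dismissal cost is likewise an assumed reading of "on the order of 1000x faster".

```python
import math

BLOCKS_IN_CHAIN = 840_000          # assumption inferred from the quoted figures
LOOKUP_COST_US = 1_000             # 1 ms hash/lookup per block, in microseconds
EARLY_DISMISS_COST_US = 1          # assumed ~1000x cheaper check per block
SAVED_PER_HIT_US = 1_000_000       # 1 s of validation saved per cached hit

# Total overhead of paying the lookup on every block in the chain.
lookup_overhead_us = BLOCKS_IN_CHAIN * LOOKUP_COST_US
lookup_overhead_minutes = lookup_overhead_us / 1_000_000 / 60

# Number of bogus blocks that must be encountered before the cache pays off.
lookup_break_even = math.ceil(lookup_overhead_us / SAVED_PER_HIT_US)

# Same calculation for the cheap leading-bytes check.
early_overhead_us = BLOCKS_IN_CHAIN * EARLY_DISMISS_COST_US
early_break_even = math.ceil(early_overhead_us / SAVED_PER_HIT_US)

print(lookup_overhead_minutes, lookup_break_even, early_break_even)
```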
+
+> I think what you're describing is more a classic time-space tradeoff
+which is well-known in classic computer science literature. In my
+reasonable opinion, one should more reason under what is the security
+paradigm we wish for bitcoin block-relay network and perduring
+decentralization, i.e one where it's easy to verify block messages proofs
+which could have been generated on specialized hardware with an asymmetric
+cost. Obviously encountering 840 such malleated blocks to make it break even
+doesn't make the math up to save on hash lookup, unless you can reduce the
+attack scenario in terms of adversaries capabilities.
+
+I'm referring to DoS mitigation (the only relevant security consideration
+here). I'm pointing out that invalidity caching is pointless in all cases,
+and in this case is the most pointless as type64 malleation is the cheapest
+of all invalidity to detect. I would prefer that all bogus blocks sent to
+my node are of this type. The worst types of invalidity detection have no
+mitigation and from a security standpoint are counterproductive to cache.
+I'm describing what overall is actually not a tradeoff. It's all negative
+and no positive.
+
+Best,
+Eric
+
+--
+You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
+To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
+To view this discussion on the web visit https://groups.google.com/d/msgid/bitcoindev/d9834ad5-f803-4a39-a854-95b2439738f5n%40googlegroups.com.
+
+------=_Part_35621_885950809.1719969200791
+Content-Type: text/html; charset="UTF-8"
+Content-Transfer-Encoding: quoted-printable
+
+Hi Antoine R,<br /><br />&gt;&gt; Ok, thanks for clarifying. I'm still not =
+making the connection to "checking a non-null [C] pointer" but that's prob =
+on me.<br /><br />&gt; A C pointer, which is a language idiome assigning to=
+ a memory address A the value o memory address B can be 0 (or NULL a standa=
+rd macro defined in stddef.h).<br />&gt; Here a snippet example of linked l=
+ist code checking the pointer (`*begin_list`) is non null before the compar=
+ison operation to find the target element list.<br />&gt; ...<br />&gt; Whi=
+le both libbitcoin and bitcoin core are both written in c++, you still have=
+ underlying pointer derefencing playing out to access the coinbase transact=
+ion, and all underlying implications in terms of memory management.<br /><b=
+r />I'm familiar with pointers ;).<br /><br />While at some level the block=
+ message buffer would generally be referenced by one or more C pointers, th=
+e difference between a valid coinbase input (i.e. with a "null point") and =
+any other input, is not nullptr vs. !nullptr. A "null point" is a 36 byte v=
+alue, 32 0x00 byes followed by 4 0xff bytes. In his infinite wisdom Satoshi=
+ decided it was better (or easier) to serialize a first block tx (coinbase)=
+ with an input containing an unusable script and pointing to an invalid [tx=
+:index] tuple (input point) as opposed to just not having any input. That i=
+nvalid input point is called a "null point", and of course cannot be pointe=
+d to by a "null pointer". The coinbase must be identified by comparing thos=
+e 36 bytes to the well-known null point value (and if this does not match t=
+he Merkle hash cannot have been type64 malleated).<br /><br /><div>&gt; I t=
+hink it's interesting to point out the two types of malleation that a bitco=
+in consensus validation logic should respect w.r.t block validity checks.=
+=C2=A0Like you said the first one on the merkle root committed in the heade=
+rs's `hashMerkleRoot` due to the lack of domain separation between leaf and=
+ merkle tree nodes.<br /></div><div><br />We call this type64 malleability =
+(or malleation where it is not only possible but occurs).<br /><br />&gt; T=
+he second one is the bip141 wtxid commitment in one of the coinbase transac=
+tion `scriptpubkey` output, which is itself covered by a txid in the merkle=
+ tree.<br /><br />While symmetry seems to imply that the witness commitment=
+ would be malleable, just as the txs commitment, this is not the case. If t=
+he tx commitment is correct it is computationally infeasible for the witnes=
+s commitment to be malleated, as the witness commitment incorporates each f=
+ull tx (with witness, sentinel, and marker). As such the block identifier, =
+which relies only on the header and tx commitment, is a sufficient identifi=
+er. Yet it remains necessary to validate the witness commitment to ensure t=
+hat the correct witness data has been provided in the block message.<br /><=
+br />The second type of malleability, in addition to type64, is what we cal=
+l type32. This is the consequence of duplicated trailing sets of txs (and t=
+herefore tx hashes) in a block message. This is applicable to some but not =
+all blocks, as a function of the number of txs contained.<br /><br />&gt;&g=
+t; Caching identity in the case of invalidity is more interesting question =
+than it might seem.<br />&gt;&gt; Background: A fully-validated block has e=
+stablished identity in its block hash. However an invalid block message may=
+ include the same block header, producing the same hash, but with any kind =
+of nonsense following the header. The purpose of the transaction and witnes=
+s commitments is of course to establish this identity, so these two checks =
+are therefore necessary even under checkpoint/milestone. And then of course=
+ the two Merkle tree issues complicate the tx commitment (the integrity of =
+the witness commitment is assured by that of the tx commitment).<br />&gt;&=
+gt;<br />&gt;&gt; So what does it mean to speak of a block hash derived fro=
+m:<br />&gt;&gt; (1) a block message with an unparseable header?<br />&gt;&=
+gt; (2) a block message with parseable but invalid header?<br />&gt;&gt; (3=
+) a block message with valid header but unparseable tx data?<br />&gt;&gt; =
+(4) a block message with valid header but parseable invalid uncommitted tx =
+data?<br />&gt;&gt; (5) a block message with valid header but parseable inv=
+alid malleated committed tx data?<br />&gt;&gt; (6) a block message with va=
+lid header but parseable invalid unmalleated committed tx data?<br />&gt;&g=
+t; (7) a block message with valid header but uncommitted valid tx data?<br =
+/>&gt;&gt; (8) a block message with valid header but malleated committed va=
+lid tx data?<br />&gt;&gt; (9) a block message with valid header but unmall=
+eated committed valid tx data?<br />&gt;&gt;<br />&gt;&gt; Note that only t=
+he #9 p2p block message contains an actual Bitcoin block, the others are bo=
+gus messages. In all cases the message can be sha256 hashed to establish th=
+e identity of the *message*. And if one's objective is to reject repeating =
+bogus messages, this might be a useful strategy. It's already part of the p=
+2p protocol, is orders of magnitude cheaper to produce than a Merkle root, =
+and has no identity issues.<br /><br />&gt; I think I mostly agree with the=
+ identity issue as laid out so far, there is one caveat to add if you're co=
+nsidering identity caching as the problem solved. A validation node might h=
+ave to consider differently block messages processed if they connect on the=
+ longest most PoW valid chain for which all blocks have been validated. Or =
+alternatively if they have to be added on a candidate longest most PoW vali=
+d chain.<br /><br />Certainly an important consideration. We store both typ=
+es. Once there is a stronger candidate header chain we store the headers an=
+d proceed to obtaining the blocks (if we don't already have them). The bloc=
+ks are stored in the same table; the confirmed vs. candidate indexes simply=
+ point to them as applicable. It is feasible (and has happened twice) for t=
+wo blocks to share the very same coinbase tx, even with either/all bip30/34=
+/90 active (and setting aside future issues here for the sake of simplicity=
+). This remains only because two competing branches can have blocks at the =
+same height, and bip34 requires only height in the coinbase input script. T=
+his therefore implies the same transaction but distinct blocks. It is howev=
+er infeasible for one block to exist in multiple distinct chains. In order =
+for this to happen two blocks at the same height must have the same coinbas=
+e (ok), and also the same parent (ok). But this then means that they either=
+ (1) have distinct identity due to another header property deviation, or (2=
+) are the same block with the same parent and are therefore in just one cha=
+in. So I don't see an actual caveat. I'm not certain if this is the ambigui=
+ty that you were referring to. If not please feel free to clarify.<br /><br=
+ />&gt;&gt; The concept of Bitcoin block hash as unique identifier for inva=
+lid p2p block messages is problematic. Apart from the malleation question, =
+what is the Bitcoin block hash for a message with unparseable data (#1 and =
+#3)? Such messages are trivial to produce and have no block hash.<br /><br =
+/>&gt; For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` =
+(in `src/node/connection_types.h`) where some preferential peering policy i=
+s applied in matters of block messages download.<br /><br />We don't do thi=
+s and I don't see how it would be relevant. If a peer provides any invalid =
+message or otherwise violates the protocol it is simply dropped.<br /><br /=
+>The "problematic" that I'm referring to is the reliance on the block hash =
+as a message identifier, because it does not identify the message and canno=
+t be useful in an effectively unlimited number of zero-cost cases.<br /><br=
+ />&gt;&gt; What is the useful identifier for a block with malleated commit=
+ments (#5 and #8) or invalid commitments (#4 and #7) - valid txs or otherwi=
+se?<br /><br />&gt; The block header, as it commits to the transaction iden=
+tifier tree can be useful as much for #4 and #5.<br /><br />#4 and #5 refer=
+ to "uncommitted" and "malleated committed". It may not be clear, but "unco=
+mmitted" means that the tx commitment is not valid (Merkle root doesn't mat=
+ch the header's value) and "malleated committed" means that the (matching) =
+commitment cannot be relied upon because the txs represent malleation, inva=
+lidating the identifier. So neither of these are usable identifiers.<br /><=
+br />&gt; On the bitcoin core side, about #7 the uncommitted valid tx data =
+can be already present in the validation cache from mempool acceptance. Abo=
+ut #8, the malleated committed valid transactions shall be also committed i=
+n the merkle root in headers.<br /><br />It seems you may be referring to "=
+unconfirmed" txs as opposed to "uncommitted" txs. This doesn't pertain to t=
+x storage or identifiers. Neither #7 nor #8 are usable for the same reasons=
+.<br /><br />&gt;&gt; This seems reasonable at first glance, but given the =
+list of scenarios=
+ above, whic=
+h does it apply to? Presumably the invalid header (#2) doesn't get this far=
+ because of headers-first.<br />&gt;&gt; That leaves just invalid blocks wi=
+th useful block hash identifiers (#6). In all other cases the message is si=
+mply discarded. In this case the attempt is to move category #5 into catego=
+ry #6 by prohibiting 64 byte txs.<br /><br />&gt; Yes, it's moving from the=
+ category #5 to the category #6. Note, transaction malleability can be a di=
+stinct issue than lack of domain separation.<br /><br />I'm making no refer=
+ence to tx malleability. This concerns only Merkle tree (block hash) mallea=
+bility, the two types described in detail in the paper I referenced earlier=
+, here again:<br /><br />https://lists.linuxfoundation.org/pipermail/bitcoi=
+n-dev/attachments/20190225/a27d8837/attachment-0001.pdf<br /><br />&gt;&gt;=
+ The requirement to "avoid re-downloading and re-validating it" is about pe=
+rformance, presumably minimizing initial block download/catch-up time. Ther=
+e is a computational cost to producing 64 byte malleations and none fo=
+r any of the other bogus block message categories above, including the othe=
+r form of malleation. Furthermore, 64 byte malleation has almost zero =
+cost to preclude. No hashing and not even true header or tx parsing are req=
+uired. Only a handful of bytes must be read from the raw message befor=
+e it can be discarded presently.<br /><br />&gt;&gt; That's actually far ch=
+eaper than any of the other scenarios that again, have no cost to produce. =
+The other type of malleation requires parsing all of the txs in the block a=
+nd hashing and comparing some or all of them. In other words, if there=
+ is an attack scenario, that must be addressed before this can be meaningfu=
+l. In fact all of the other bogus message scenarios (with tx data) will rem=
+ain more expensive to discard than this one.<br /><br />&gt; In practice on=
+ the bitcoin core side, the bogus block message categories from #4 to #6 ar=
+e already mitigated by validation caching for transactions that have been r=
+eceived early. While libbitcoin has no mempool (at least in earlier version=
+s) transactions buffering can be done by bip152's HeadersAndShortIds messag=
+e.<br /><br />Again, this has no relation to tx hashes/identifiers. Libbitc=
+oin has a tx pool, we just don't store them in RAM (memory).<br /><br />&gt=
+; About #7 and #8, introducing a domain separation where 64 bytes transacti=
+ons are rejected and making it harder to exploit #7 and #8 categories of bo=
+gus block messages. This is correct that bitcoin core might accept valid tr=
+ansaction data before the merkle tree commitment has been verified.<br /><b=
+r />I don't follow this. An invalid 64 byte tx consensus rule would definit=
+ely not make it harder to exploit block message invalidity. In fact it woul=
+d just slow down validation by adding a redundant rule. Furthermore, as I h=
+ave detailed in a previous message, caching invalidity does absolutely noth=
+ing to increase protection. In fact it makes the situation materially worse=
+.<br /><br />&gt;&gt; The problem arises from trying to optimize dismissal =
+by storing an identifier. Just *producing* the identifier is orders of magn=
+itude more costly than simply dismissing this bogus message. I can't imagi=
+ne why any implementation would want to compute and store and retrieve and=
+ recompute and compare hashes when the alternative is just dismissing the=
+ bogus messages with no hashing at all.<br /><br />&gt;&gt; Bogus messages =
+will arrive, they do not even have to be requested. The simplest are dealt =
+with by parse failure. What defines a parse is entirely subjective. General=
+ly it's<br />&gt;&gt; "structural" but nothing precludes incorporating a re=
+quirement for a necessary leading pattern in the stream, sort of like how t=
+he witness pattern is identified. If we were<br />&gt;&gt; going to priorit=
+ize early dismissal this is where we would put it.<br /><br />&gt; I don't =
+think this is that simple - While producing an identifier comes with a comp=
+utational cost (e.g fixed 64-byte structured coinbase transaction), if the =
+full node have a hierarchy of validation cache like bitcoin core has alread=
+y, the cost of bogus block messages can be slashed down.<br /><br />No, thi=
+s is not the case. As I detailed in my previous message, there is no possib=
+le scenario where invalidation caching does anything but make the situation=
+ materially worse.<br /><br />&gt; On the other hand, just dealing with par=
+se failure on the spot by introducing a leading pattern in the stream just =
+inflates the size of p2p messages, and the transaction-relay bandwidth cost=
+.<br /><br />I think you misunderstood me. I am suggesting no change to ser=
+ialization. I can see how it might be unclear, but I said, "nothing preclud=
+es incorporating a requirement for a necessary leading pattern in the strea=
+m." I meant that the parser can simply incorporate the *requirement* that t=
+he byte stream starts with a null input point. That identifies the malleati=
+on or invalidity without a single hash operation and while only reading a h=
+andful of bytes. No change to any messages.<br /><br />&gt;&gt; However, th=
+ere is a tradeoff in terms of early dismissal. Looking up invalid hashes is=
+ a costly tradeoff, which becomes multiplied by every block validated. For =
+example, expending 1 millisecond in hash/lookup to save 1 second of validat=
+ion time in the failure case seems like a reasonable tradeoff, until you mu=
+ltiply across the whole chain. 1 ms becomes 14 minutes across the chai=
+n, just to save a second for each mallied block encountered. That means you=
+ need to have encountered 840 such mallied blocks just to break even. =
+Early dismissing the block for non-null coinbase point (without hashing any=
+thing) would be on the order of 1000x faster than that (breakeven at 1=
+ encounter). So why the block hash cache requirement? It cannot be applied =
+to many scenarios, and cannot be optimal in this one.<br /><br />&gt; I thi=
+nk what you're describing is more a classic time-space tradeoff which is we=
+ll-known in classic computer science literature. In my reasonable opinion,=
+ one should more reason under what is the security paradigm we wish for bit=
+coin block-relay network and perduring decentralization, i.e one where it's=
+ easy to verify block messages proofs which could have been generated on sp=
+ecialized hardware with an asymmetric cost. Obviously encountering 840 such=
+ mallied blocks to make it break even doesn't make the math up to save on =
+hash lookup, unless you can reduce the attack scenario in terms of adversar=
+ies capabilities.<br /><br />I'm referring to DoS mitigation (the only rele=
+vant security consideration here). I'm pointing out that invalidity caching=
+ is pointless in all cases, and in this case is the most pointless as type6=
+4 malleation is the cheapest of all invalidity to detect. I would prefer th=
+at all bogus blocks sent to my node are of this type. The worst types of in=
+validity detection have no mitigation and from a security standpoint are co=
+unterproductive to cache. I'm describing what overall is actually not a tra=
+deoff. It's all negative and no positive.<br /><br />Best,<br />Eric</div>
+
+<p></p>
+
+-- <br />
+You received this message because you are subscribed to the Google Groups &=
+quot;Bitcoin Development Mailing List&quot; group.<br />
+To unsubscribe from this group and stop receiving emails from it, send an e=
+mail to <a href=3D"mailto:bitcoindev+unsubscribe@googlegroups.com">bitcoind=
+ev+unsubscribe@googlegroups.com</a>.<br />
+To view this discussion on the web visit <a href=3D"https://groups.google.c=
+om/d/msgid/bitcoindev/d9834ad5-f803-4a39-a854-95b2439738f5n%40googlegroups.=
+com?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.com/d/msg=
+id/bitcoindev/d9834ad5-f803-4a39-a854-95b2439738f5n%40googlegroups.com</a>.=
+<br />
+
+------=_Part_35621_885950809.1719969200791--
+
+------=_Part_35620_344008102.1719969200791--
+