From: Eric Lombrozo <elombrozo@gmail.com>
In-Reply-To: <20160517132311.GA21656@fedora-21-dvm>
Date: Tue, 17 May 2016 16:25:22 +0200
Message-Id: <17436700-3F7F-406B-AA09-51C20FFD7675@gmail.com>
References: <20160517132311.GA21656@fedora-21-dvm>
To: Peter Todd <pete@petertodd.org>,
	Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Making UTXO Set Growth Irrelevant With
	Low-Latency Delayed TXO Commitments

Nice!

We've been talking about doing this forever and it's so desperately needed.

> On May 17, 2016, at 3:23 PM, Peter Todd via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> # Motivation
>
> UTXO growth is a serious concern for Bitcoin's long-term decentralization. To
> run a competitive mining operation potentially the entire UTXO set must be in
> RAM to achieve competitive latency; your larger, more centralized competitors
> will have the UTXO set in RAM. Mining is a zero-sum game, so the extra latency
> of not doing so if they do directly impacts your profit margin. Secondly,
> having possession of the UTXO set is one of the minimum requirements to run a
> full node; the larger the set, the harder it is to run a full node.
>
> Currently the maximum size of the UTXO set is unbounded as there is no
> consensus rule that limits growth, other than the block-size limit itself; as
> of writing the UTXO set is 1.3GB in the on-disk, compressed serialization,
> which expands to significantly more in memory. UTXO growth is driven by a
> number of factors, including the fact that there is little incentive to merge
> inputs, lost coins, dust outputs that can't be economically spent, and
> non-btc-value-transfer "blockchain" use-cases such as anti-replay oracles and
> timestamping.
>
> We don't have good tools to combat UTXO growth. Segregated Witness proposes to
> give witness space a 75% discount, in part to make reducing the UTXO set size
> by spending txouts cheaper. While this may change wallets to more often spend
> dust, it's hard to imagine an incentive sufficiently strong to discourage most,
> let alone all, UTXO growing behavior.
>
> For example, timestamping applications often create unspendable outputs due to
> ease of implementation, and because doing so is an easy way to make sure that
> the data required to reconstruct the timestamp proof won't get lost - all
> Bitcoin full nodes are forced to keep a copy of it. Similarly, anti-replay
> use-cases like using the UTXO set for key rotation piggyback on the uniquely
> strong security and decentralization guarantee that Bitcoin provides; it's very
> difficult - perhaps impossible - to provide these applications with
> alternatives that are equally secure. These non-btc-value-transfer use-cases
> can often afford to pay far higher fees per UTXO created than competing
> btc-value-transfer use-cases; many users could afford to spend $50 to register
> a new PGP key, yet would rather not spend $50 in fees to create a standard two
> output transaction. Effective techniques to resist miner censorship exist, so
> without resorting to whitelists, blocking non-btc-value-transfer use-cases as
> "spam" is not a long-term, incentive-compatible solution.
>
> A hard upper limit on UTXO set size could create a more level playing field in
> the form of fixed minimum requirements to run a performant Bitcoin node, and
> make the issue of UTXO "spam" less important. However, making any coins
> unspendable, regardless of age or value, is a politically untenable economic
> change.
>
>
> # TXO Commitments
>
> With a merkle tree committing to the state of all transaction outputs, both
> spent and unspent, we can provide a method of compactly proving the current
> state of an output. This lets us "archive" less frequently accessed parts of
> the UTXO set, allowing full nodes to discard the associated data, while still
> providing a mechanism to spend those archived outputs by proving to those
> nodes that the outputs are in fact unspent.
>
> Specifically, TXO commitments proposes a Merkle Mountain Range¹ (MMR), a
> type of deterministic, indexable, insertion-ordered merkle tree, which allows
> new items to be cheaply appended to the tree with minimal storage requirements,
> just log2(n) "mountain tips". Once an output is added to the TXO MMR it is
> never removed; if an output is spent its status is updated in place. Both the
> state of a specific item in the MMR, as well as the validity of changes to
> items in the MMR, can be proven with log2(n) sized proofs consisting of a
> merkle path to the tip of the tree.
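>
> As a rough illustration, appending while keeping only the log2(n) mountain
> tips can be sketched as follows (illustrative Python; the class, tag bytes,
> and hash choice are assumptions, not the referenced implementation):
>
>     import hashlib
>
>     def h(*parts: bytes) -> bytes:
>         return hashlib.sha256(b''.join(parts)).digest()
>
>     class MMR:
>         def __init__(self):
>             self.tips = []   # (height, digest) per perfect subtree, tallest first
>             self.size = 0    # leaves appended so far
>
>         def append(self, leaf: bytes) -> None:
>             node = (0, h(b'leaf', leaf))
>             # Equal-height mountains merge like binary carry propagation,
>             # so appends are O(1) amortized, O(log n) worst case.
>             while self.tips and self.tips[-1][0] == node[0]:
>                 height, left = self.tips.pop()
>                 node = (height + 1, h(b'node', left, node[1]))
>             self.tips.append(node)
>             self.size += 1
>
>         def root(self) -> bytes:
>             # Bag the mountain tips into a single commitment digest.
>             digests = [d for _, d in self.tips]
>             acc = digests[-1]
>             for d in reversed(digests[:-1]):
>                 acc = h(b'bag', d, acc)
>             return acc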
>
> At an extreme, with TXO commitments we could even have no UTXO set at all,
> entirely eliminating the UTXO growth problem. Transactions would simply be
> accompanied by TXO commitment proofs showing that the outputs they wanted to
> spend were still unspent; nodes could update the state of the TXO MMR purely
> from TXO commitment proofs. However, the log2(n) bandwidth overhead per txin is
> substantial, so a more realistic implementation is to have a UTXO cache for
> recent transactions, with TXO commitments acting as an alternate for the (rare)
> event that an old txout needs to be spent.
>
> Proofs can be generated and added to transactions without the involvement of
> the signers, even after the fact; there's no need for the proof itself to be
> signed, and the proof is not part of the transaction hash. Anyone with access
> to TXO MMR data can (re)generate missing proofs, so minimal, if any, changes
> are required to wallet software to make use of TXO commitments.
>
>
> ## Delayed Commitments
>
> TXO commitments aren't a new idea - the author proposed them years ago in
> response to UTXO commitments. However, it's critical for small miners' orphan
> rates that block validation be fast, and so far it has proven difficult to
> create (U)TXO implementations with acceptable performance; updating and
> recalculating cryptographically hashed merkelized datasets is inherently more
> work than not doing so. Fortunately, if we maintain a UTXO set for recent
> outputs, TXO commitments are only needed when spending old, archived, outputs.
> We can take advantage of this by delaying the commitment, allowing it to be
> calculated well in advance of it actually being used, thus changing a
> latency-critical task into a much easier average throughput problem.
>
> Concretely, each block B_i commits to the TXO set state as of block B_{i-n}, in
> other words what the TXO commitment would have been n blocks ago, if not for
> the n block delay. Since that commitment only depends on the contents of the
> blockchain up until block B_{i-n}, the contents of any later block are
> irrelevant to the calculation.
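>
> Stated as code, the rule might look like this (an illustrative Python sketch;
> mmr_root and txo_state are hypothetical helpers):
>
>     def expected_txo_commitment(chain, i, n):
>         # The commitment in block i covers state that is already n blocks
>         # old, so it can be computed any time after block i-n arrives.
>         return mmr_root(txo_state(chain[:i - n + 1]))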
>
>
> ## Implementation
>
> Our proposed high-performance/low-latency delayed commitment full-node
> implementation needs to store the following data (sketched in code after
> this list):
>
> 1) UTXO set
>
>    Low-latency K:V map of txouts definitely known to be unspent. Similar to
>    the existing UTXO implementation, but with the key difference that old,
>    unspent outputs may be pruned from the UTXO set.
>
>
> 2) STXO set
>
>    Low-latency set of transaction outputs known to have been spent by
>    transactions after the most recent TXO commitment, but created prior to the
>    TXO commitment.
>
>
> 3) TXO journal
>
>    FIFO of outputs that need to be marked as spent in the TXO MMR. Appends
>    must be low-latency; removals can be high-latency.
>
>
> 4) TXO MMR list
>
>    Prunable, ordered list of TXO MMRs, mainly the highest pending commitment,
>    backed by a reference-counted, cryptographically hashed object store
>    indexed by digest (similar to how git repos work). High-latency ok. We'll
>    cover this in more detail later.
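>
> The four stores might be sketched together like this (illustrative Python;
> the concrete types are assumptions, not a specification):
>
>     from collections import deque
>     from dataclasses import dataclass, field
>
>     @dataclass
>     class NodeState:
>         # 1) outpoint -> txout; fast K:V, old unspent entries prunable
>         utxo: dict = field(default_factory=dict)
>         # 2) archived outpoints spent since the most recent TXO commitment
>         stxo: set = field(default_factory=set)
>         # 3) FIFO of spends awaiting application to the TXO MMR
>         journal: deque = field(default_factory=deque)
>         # 4) digest -> (MMR node, refcount); git-like object store
>         mmr_objects: dict = field(default_factory=dict)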
>
>
> ### Fast-Path: Verifying a Txout Spend In a Block
>
> When a transaction output is spent by a transaction in a block we have two
> cases (a combined sketch follows the list):
>
> 1) Recently created output
>
>    Output created after the most recent TXO commitment, so it should be in the
>    UTXO set; the transaction spending it does not need a TXO commitment proof.
>    Remove the output from the UTXO set and append it to the TXO journal.
>
> 2) Archived output
>
>    Output created prior to the most recent TXO commitment, so there's no
>    guarantee it's in the UTXO set; the transaction will have a TXO commitment
>    proof for the most recent TXO commitment showing that it was unspent.
>    Check that the output isn't already in the STXO set (double-spent), and if
>    not, add it. Append the output and TXO commitment proof to the TXO journal.
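>
> A combined sketch of both cases (illustrative Python, continuing the
> hypothetical NodeState above; verifying the proof against the committed
> root is elided):
>
>     def apply_spend(state: NodeState, outpoint, proof=None):
>         if outpoint in state.utxo:
>             # Case 1: recently created; no TXO commitment proof needed.
>             del state.utxo[outpoint]
>             state.journal.append((outpoint, None))
>         else:
>             # Case 2: archived; the transaction must carry a proof.
>             if proof is None:
>                 raise ValueError("archived txout spent without TXO proof")
>             if outpoint in state.stxo:
>                 raise ValueError("double-spend of archived txout")
>             state.stxo.add(outpoint)
>             state.journal.append((outpoint, proof))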
>
> In both cases recording an output as spent requires no more than two key:value
> updates, and one journal append. The existing UTXO set requires one key:value
> update per spend, so we can expect new block validation latency to be within 2x
> of the status quo even in the worst case of 100% archived output spends.
>
>
> ### Slow-Path: Calculating Pending TXO Commitments
>
> In a low-priority background task we flush the TXO journal, recording the
> outputs spent by each block in the TXO MMR, and hashing MMR data to obtain the
> TXO commitment digest. Additionally, this background task removes STXO's that
> have been recorded in TXO commitments, and prunes TXO commitment data no longer
> needed.
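>
> The background flush might look roughly like this (illustrative Python,
> continuing the sketches above; unprune and mark_spent are assumed MMR
> operations, not defined here):
>
>     def flush_journal(state: NodeState, mmr) -> bytes:
>         # Low-priority, high-latency-tolerant: drain the FIFO into the MMR.
>         while state.journal:
>             outpoint, proof = state.journal.popleft()
>             if proof is not None:
>                 mmr.unprune(proof)    # restore the needed merkle path first
>             mmr.mark_spent(outpoint)  # update the leaf's status in place
>         return mmr.root()             # digest for the pending TXO commitment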
>
> Throughput for the TXO commitment calculation will be worse than the existing
> UTXO-only scheme. This impacts bulk verification, e.g. initial block download.
> That said, TXO commitments provide other possible tradeoffs that can mitigate
> the impact of slower validation throughput, such as skipping validation of old
> history, as well as fraud proof approaches.
>
>
> ### TXO MMR Implementation Details
>
> Each TXO MMR state is a modification of the previous one with most information
> shared, so we can space-efficiently store a large number of TXO commitment
> states, where each state is a small delta of the previous state, by sharing
> unchanged data between each state; cycles are impossible in merkelized data
> structures, so simple reference counting is sufficient for garbage collection.
> Data no longer needed can be pruned by dropping it from the database, and
> unpruned by adding it again. Since everything is committed to via cryptographic
> hash, we're guaranteed that regardless of where we get the data, after
> unpruning we'll have the right data.
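>
> For instance, the git-like object store could be sketched as follows
> (illustrative Python; a hypothetical content-addressed store with
> reference counting):
>
>     class ObjectStore:
>         def __init__(self):
>             self.objs = {}  # digest -> [node_data, refcount]
>
>         def put(self, digest: bytes, node_data: bytes) -> None:
>             # Shared, unchanged nodes just gain a reference.
>             entry = self.objs.setdefault(digest, [node_data, 0])
>             entry[1] += 1
>
>         def release(self, digest: bytes) -> None:
>             entry = self.objs[digest]
>             entry[1] -= 1
>             if entry[1] == 0:
>                 del self.objs[digest]  # prune: drop from the database
>
>         # Unpruning is just put() again; since the key is the hash of the
>         # data, any source can resupply it and we're guaranteed it's right.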
>
> Let's look at how the TXO MMR works in detail. Consider the following TXO MMR
> with two txouts, which we'll call state #0:
>
>      0
>     / \
>    a   b
>
> If we add another entry we get state #1:
>
>        1
>       / \
>      0   \
>     / \   \
>    a   b   c
>
> Note how 100% of the state #0 data was reused in commitment #1. Let's
> add two more entries to get state #2:
>
>            2
>           / \
>          2   \
>         / \   \
>        /   \   \
>       /     \   \
>      0       2   \
>     / \     / \   \
>    a   b   c   d   e
>
> This time part of state #1 wasn't reused - it wasn't a perfect binary
> tree - but we've still got a lot of re-use.
>
> Now suppose state #2 is committed into the blockchain by the most recent block.
> Future transactions attempting to spend outputs created as of state #2 are
> obliged to prove that they are unspent; essentially they're forced to provide
> part of the state #2 MMR data. This lets us prune that data, discarding it,
> leaving us with only the bare minimum data we need to append new txouts to the
> TXO MMR, the tips of the perfect binary trees ("mountains") within the MMR:
>
>            2
>           / \
>          2   \
>               \
>                \
>                 \
>                  \
>                   \
>                    e
>
> Note that we're glossing over some nuance here about exactly what data needs to
> be kept; depending on the details of the implementation, the only data we need
> for nodes "2" and "e" may be their hash digests.
>
> Adding three more txouts results in state #3:
>
>                  3
>                 / \
>                /   \
>               /     \
>              /       \
>             /         \
>            /           \
>           /             \
>          2               3
>                         / \
>                        /   \
>                       /     \
>                      3       3
>                     / \     / \
>                    e   f   g   h
>
> Suppose recently created txout f is spent. We have all the data required to
> update the MMR, giving us state #4. It modifies two inner nodes and one leaf
> node:
>
>                  4
>                 / \
>                /   \
>               /     \
>              /       \
>             /         \
>            /           \
>           /             \
>          2               4
>                         / \
>                        /   \
>                       /     \
>                      4       3
>                     / \     / \
>                    e  (f)  g   h
>
> Spending an archived txout requires the transaction to provide the merkle path
> to the most recently committed TXO, in our case state #2. If txout b is spent,
> that means the transaction must provide the following data from state #2:
>
>            2
>           /
>          2
>         /
>        /
>       /
>      0
>       \
>        b
>
> We can add that data to our local knowledge of the TXO MMR, unpruning part of
> it:
>
>                  4
>                 / \
>                /   \
>               /     \
>              /       \
>             /         \
>            /           \
>           /             \
>          2               4
>         /               / \
>        /               /   \
>       /               /     \
>      0               4       3
>       \             / \     / \
>        b           e  (f)  g   h
>
> Remember, we haven't _modified_ state #4 yet; we just have more data about it.
> When we mark txout b as spent we get state #5:
>
>                  5
>                 / \
>                /   \
>               /     \
>              /       \
>             /         \
>            /           \
>           /             \
>          5               4
>         /               / \
>        /               /   \
>       /               /     \
>      5               4       3
>       \             / \     / \
>       (b)          e  (f)  g   h
>
> Secondly, by now state #3 has been committed into the chain, and transactions
> that want to spend txouts created as of state #3 must provide a TXO proof
> consisting of state #3 data. The leaf nodes for outputs g and h, and the inner
> node above them, are part of state #3, so we prune them:
>
>                  5
>                 / \
>                /   \
>               /     \
>              /       \
>             /         \
>            /           \
>           /             \
>          5               4
>         /               /
>        /               /
>       /               /
>      5               4
>       \             / \
>       (b)          e  (f)
>
> Finally, let's put this all together by spending txouts a, c, and g, and
> creating three new txouts i, j, and k. State #3 was the most recently committed
> state, so the transactions spending a and g are providing merkle paths up to
> it. This includes part of the state #2 data:
>
>                  3
>                 / \
>                /   \
>               /     \
>              /       \
>             /         \
>            /           \
>           /             \
>          2               3
>         / \               \
>        /   \               \
>       /     \               \
>      0       2               3
>     /       /               /
>    a       c               g
>
> After unpruning we have the following data for state #5:
>
>                  5
>                 / \
>                /   \
>               /     \
>              /       \
>             /         \
>            /           \
>           /             \
>          5               4
>         / \             / \
>        /   \           /   \
>       /     \         /     \
>      5       2       4       3
>     / \     /       / \     /
>    a  (b)  c       e  (f)  g
>
> That's sufficient to mark the three outputs as spent and add the three new
> txouts, resulting in state #6:
>
>                        6
>                       / \
>                      /   \
>                     /     \
>                    /       \
>                   /         \
>                  6           \
>                 / \           \
>                /   \           \
>               /     \           \
>              /       \           \
>             /         \           \
>            /           \           \
>           /             \           \
>          6               6           \
>         / \             / \           \
>        /   \           /   \           6
>       /     \         /     \         / \
>      6       6       4       6       6   \
>     / \     /       / \     /       / \   \
>   (a) (b) (c)      e  (f) (g)      i   j   k
>
> Again, state #4 related data can be pruned. In addition, depending on how the
> STXO set is implemented, we may also be able to prune data related to spent
> txouts after that state, including inner nodes where all txouts under them have
> been spent (more on pruning spent inner nodes later).
>
>
> ### Consensus and Pruning
>
> It's important to note that pruning behavior is consensus critical: a full node
> that is missing data due to pruning it too soon will fall out of consensus, and
> a miner that fails to include a merkle proof that is required by the consensus
> is creating an invalid block. At the same time, many full nodes will have
> significantly more data on hand than the bare minimum so they can help wallets
> make transactions spending old coins; implementations should strongly consider
> separating the data that is, and isn't, strictly required for consensus.
>
> A reasonable approach for the low-level cryptography may be to actually treat
> the two cases differently, with the TXO commitments committing to what data
> does and does not need to be kept on hand by the UTXO expiration rules. On the
> other hand, leaving that uncommitted allows for certain types of soft-forks
> where the protocol is changed to require more data than it previously did.
>
>
> ### Consensus Critical Storage Overheads
>
> Only the UTXO and STXO sets need to be kept on fast random access storage.
> Since STXO set entries can only be created by spending a UTXO - and are smaller
> than a UTXO entry - we can guarantee that the peak size of the UTXO and STXO
> sets combined will always be less than the peak size of the UTXO set alone in
> the existing UTXO-only scheme (though the combined size can be temporarily
> higher than what the UTXO set size alone would be when large numbers of
> archived txouts are spent).
>
> TXO journal entries and unpruned entries in the TXO MMR have log2(n) maximum
> overhead per entry: a unique merkle path to a TXO commitment (by "unique" we
> mean that no other entry shares data with it). On a reasonably fast system the
> TXO journal will be flushed quickly, converting it into TXO MMR data; the TXO
> journal will never be more than a few blocks in size.
>
> Transactions spending non-archived txouts are not required to provide any TXO
> commitment data; we must have that data on hand in the form of one TXO MMR
> entry per UTXO. Once spent, however, the TXO MMR leaf node associated with that
> non-archived txout can be immediately pruned - it's no longer in the UTXO set
> so any attempt to spend it will fail; the data is now immutable and we'll never
> need it again. Inner nodes in the TXO MMR can also be pruned if all leafs under
> them are fully spent; detecting this is easy if the TXO MMR is a merkle-sum
> tree, with each inner node committing to the sum of the unspent txouts under it.
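>
> A merkle-sum inner node might be computed like this (illustrative Python;
> the serialization is an assumption):
>
>     import hashlib
>
>     def sum_node(left, right):
>         # Nodes are (digest, unspent_sum) pairs; an inner node is prunable
>         # exactly when its unspent_sum hits zero.
>         total = left[1] + right[1]
>         digest = hashlib.sha256(
>             left[0] + right[0] + total.to_bytes(8, 'big')).digest()
>         return (digest, total)
>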
> When an archived txout is spent, the transaction is required to provide a
> merkle path to the most recent TXO commitment. As shown above, that path is
> sufficient information to unprune the necessary nodes in the TXO MMR and apply
> the spend immediately, reducing this case to the TXO journal size question
> (non-consensus critical overhead is a different question, which we'll address
> in the next section).
>
> Taking all this into account, the only significant storage overhead of our TXO
> commitments scheme when compared to the status quo is the log2(n) merkle path
> overhead; as long as less than 1/log2(n) of the UTXO set is active,
> non-archived UTXO's, we've come out ahead, even in the unrealistic case where
> all storage available is equally fast. In the real world that isn't yet the
> case - even SSD's are significantly slower than RAM.
>
>
> ### Non-Consensus Critical Storage Overheads
>
> Transactions spending archived txouts pose two challenges:
>
> 1) Obtaining up-to-date TXO commitment proofs
>
> 2) Updating those proofs as blocks are mined
>
> The first challenge can be handled by specialized archival nodes, not unlike
> how some nodes make transaction data available to wallets via bloom filters or
> the Electrum protocol. There's a whole variety of options available, and the
> data can be easily sharded to scale horizontally; the data is self-validating,
> allowing horizontal scaling without trust.
>
> While miners and relay nodes don't need to be concerned about the initial
> commitment proof, updating that proof is another matter. If a node aggressively
> prunes old versions of the TXO MMR as it calculates pending TXO commitments, it
> won't have the data available to update the TXO commitment proof to be against
> the next block, when that block is found; the child nodes of the TXO MMR tip
> are guaranteed to have changed, yet aggressive pruning would have discarded
> that data.
>
> Relay nodes could ignore this problem if they simply accept the fact that
> they'll only be able to fully relay the transaction once, when it is initially
> broadcast, and won't be able to provide mempool functionality after the initial
> relay. Modulo high-latency mixnets, this is probably acceptable; the author has
> previously argued that relay nodes don't need a mempool² at all.
>
> For a miner, though, not having the data necessary to update the proofs as
> blocks are found means potentially losing out on transaction fees. So how much
> extra data is necessary to make this a non-issue?
>
> Since the TXO MMR is insertion ordered, spending a non-archived txout can only
> invalidate the upper nodes of the archived txout's TXO MMR proof (if this
> isn't clear, imagine a two-level scheme, with per-block TXO MMRs committed
> by a master MMR for all blocks). The maximum number of relevant inner nodes
> changed is log2(n) per block, so if there are n non-archival blocks between the
> most recent TXO commitment and the pending TXO MMR tip, we have to store
> log2(n)*n inner nodes - on the order of a few dozen MB even when n is a
> (seemingly ridiculously high) year worth of blocks.
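>
> A quick sanity check on that figure (assuming 32-byte digests):
>
>     import math
>
>     n = 365 * 24 * 6                   # ~52,560 blocks/year at ~10 min/block
>     overhead = n * math.log2(n) * 32   # one 32-byte digest per inner node
>     print(overhead / 1e6)              # ~26 MB: "a few dozen MB" checks out
>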
> Archived txout spends on the other hand can invalidate TXO MMR proofs at any
> level - consider the case of two adjacent txouts being spent. To guarantee
> success requires storing full proofs. However, they're limited by the blocksize
> limit, and additionally are expected to be relatively uncommon. For example, if
> 1% of 1MB blocks were archival spends, our hypothetical year-long TXO
> commitment delay is only a few hundred MB of data with low-IO-performance
> requirements.
>
>
> ## Security Model
>
> Of course, a TXO commitment delay of a year sounds ridiculous. Even the slowest
> imaginable computer isn't going to need more than a few blocks of TXO
> commitment delay to keep up ~100% of the time, and there's no reason why we
> can't have the UTXO archive delay be significantly longer than the TXO
> commitment delay.
>
> However, as with UTXO commitments, TXO commitments raise issues with Bitcoin's
> security model by allowing miners to profitably mine transactions without
> bothering to validate prior history. At the extreme, if there were no
> commitment delay at all, then at the cost of some extra network bandwidth
> "full" nodes could operate and even mine blocks completely statelessly by
> expecting all transactions to include "proof" that their inputs are unspent; a
> TXO commitment proof for a commitment you haven't verified isn't a proof that a
> transaction output is unspent, it's a proof that some miners claimed the txout
> was unspent.
>
> At one extreme, we could simply implement TXO commitments in a "virtual"
> fashion, without miners actually including the TXO commitment digest in their
> blocks at all. Full nodes would be forced to compute the commitment from
> scratch, in the same way they are forced to compute the UTXO state, or total
> work. Of course, a full node operator who doesn't want to verify old history
> can get a copy of the TXO state from a trusted source - no different from how
> you could get a copy of the UTXO set from a trusted source.
>
> A more pragmatic approach is to accept that people will do that anyway, and
> instead assume that sufficiently old blocks are valid. But how old is
> "sufficiently old"? First of all, if your full node implementation comes "from
> the factory" with a reasonably up-to-date minimum accepted total-work
> thresholdⁱ - in other words it won't accept a chain with less than that amount
> of total work - it may be reasonable to assume any Sybil attacker with
> sufficient hashing power to make a forked chain meeting that threshold with,
> say, six months worth of blocks has enough hashing power to threaten the main
> chain as well.
>
> That leaves public attempts to falsify TXO commitments, done out in the open by
> the majority of hashing power. In this circumstance the "assumed valid"
> threshold determines how long the attack would have to go on before full nodes
> start accepting the invalid chain, or at least, newly installed/recently reset
> full nodes. The minimum age that we can "assume valid" is a tradeoff between
> political/social/technical concerns; we probably want at least a few weeks to
> guarantee the defenders a chance to organise themselves.
>
> With this in mind, a longer-than-technically-necessary TXO commitment delayʲ
> may help ensure that full node software actually validates some minimum number
> of blocks out-of-the-box, without taking shortcuts. However this can be
> achieved in a wide variety of ways, such as the author's prev-block-proof
> proposal³, fraud proofs, or even a PoW with an inner loop dependent on
> blockchain data. Like UTXO commitments, TXO commitments are also potentially
> very useful in reducing the need for SPV wallet software to trust third parties
> providing them with transaction data.
>
> i) Checkpoints that reject any chain without a specific block are a more
>   common, if uglier, way of achieving this protection.
>
> j) A good homework problem is to figure out how the TXO commitment could be
>   designed such that the delay could be reduced in a soft-fork.
>
>
> ## Further Work
>
> While we've shown that TXO commitments certainly could be implemented without
> increasing peak IO bandwidth/block validation latency significantly with the
> delayed commitment approach, we're far from being certain that they should be
> implemented this way (or at all).
>
> 1) Can a TXO commitment scheme be optimized sufficiently to be used directly
> without a commitment delay? Obviously it'd be preferable to avoid all the above
> complexity entirely.
>
> 2) Is it possible to use a metric other than age, e.g. priority? While this
> complicates the pruning logic, it could use the UTXO set space more
> efficiently, especially if your goal is to prioritise bitcoin value-transfer
> over other uses (though if "normal" wallets nearly never need to use TXO
> commitment proofs to spend outputs, the infrastructure to actually do this may
> rot).
>
> 3) Should UTXO archiving be based on a fixed-size UTXO set, rather than an
> age/priority/etc. threshold?
>
> 4) By fixing the problem (or possibly just "fixing" the problem) are we
> encouraging/legitimising blockchain use-cases other than BTC value transfer?
> Should we?
>
> 5) Instead of TXO commitment proofs counting towards the blocksize limit, can
> we use a different miner fairness/decentralization metric/incentive? For
> instance it might be reasonable for the TXO commitment proof size to be
> discounted, or ignored entirely, if a proof-of-propagation scheme (e.g.
> thinblocks) is used to ensure all miners have received the proof in advance.
>
> 6) How does this interact with fraud proofs? Obviously furthering dependency on
> non-cryptographically-committed STXO/UTXO databases is incompatible with the
> modularized validation approach to implementing fraud proofs.
>
>
> # References
>
> 1) "Merkle Mountain Ranges",
>   Peter Todd, OpenTimestamps, Mar 18 2013,
>   https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md
>
> 2) "Do we really need a mempool? (for relay nodes)",
>   Peter Todd, bitcoin-dev mailing list, Jul 18th 2015,
>   https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-July/009479.html
>
> 3) "Segregated witnesses and validationless mining",
>   Peter Todd, bitcoin-dev mailing list, Dec 23rd 2015,
>   https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-December/012103.html
>
> --
> https://petertodd.org 'peter'[:-1]@petertodd.org