8a/d5c54fecaa7a72c9fc5d47c8485fbaceb15d7d


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269

Return-Path: <aj@erisian.com.au>
Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138])
 by lists.linuxfoundation.org (Postfix) with ESMTP id C46A8C000B
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sat,  5 Mar 2022 06:32:54 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by smtp1.osuosl.org (Postfix) with ESMTP id 926CB82A4E
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sat,  5 Mar 2022 06:32:54 +0000 (UTC)
X-Virus-Scanned: amavisd-new at osuosl.org
X-Spam-Flag: NO
X-Spam-Score: -1.901
X-Spam-Level: 
X-Spam-Status: No, score=-1.901 tagged_above=-999 required=5
 tests=[BAYES_00=-1.9, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001,
 UNPARSEABLE_RELAY=0.001] autolearn=ham autolearn_force=no
Received: from smtp1.osuosl.org ([127.0.0.1])
 by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id TjYXKCmuQOdb
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sat,  5 Mar 2022 06:32:53 +0000 (UTC)
X-Greylist: delayed 00:33:18 by SQLgrey-1.8.0
Received: from azure.erisian.com.au (azure.erisian.com.au [172.104.61.193])
 by smtp1.osuosl.org (Postfix) with ESMTPS id ED12C82410
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Sat,  5 Mar 2022 06:32:52 +0000 (UTC)
Received: from aj@azure.erisian.com.au (helo=sapphire.erisian.com.au)
 by azure.erisian.com.au with esmtpsa (Exim 4.92 #3 (Debian))
 id 1nQNRw-0004tz-VK; Sat, 05 Mar 2022 15:59:31 +1000
Received: by sapphire.erisian.com.au (sSMTP sendmail emulation);
 Sat, 05 Mar 2022 15:59:24 +1000
Date: Sat, 5 Mar 2022 15:59:24 +1000
From: Anthony Towns <aj@erisian.com.au>
To: Jeremy Rubin <jeremy.l.rubin@gmail.com>,
 Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Message-ID: <20220305055924.GB5308@erisian.com.au>
References: <CAD5xwhgXE9sB-hdzz_Bgz6iEA-M5-Yu2VOn1qRzkaq+DdVsgmw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAD5xwhgXE9sB-hdzz_Bgz6iEA-M5-Yu2VOn1qRzkaq+DdVsgmw@mail.gmail.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Spam-Score-int: -18
X-Spam-Bar: -
Subject: Re: [bitcoin-dev] Annex Purpose Discussion: OP_ANNEX,
 Turing Completeness, and other considerations
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Sat, 05 Mar 2022 06:32:54 -0000

On Fri, Mar 04, 2022 at 11:21:41PM +0000, Jeremy Rubin via bitcoin-dev wrote:
> I've seen some discussion of what the Annex can be used for in Bitcoin. 

https://www.erisian.com.au/meetbot/taproot-bip-review/2019/taproot-bip-review.2019-11-12-19.00.log.html

includes some discussion on that topic from the taproot review meetings.

The difference between information in the annex and information in
either a script (or the input data for the script that is the rest of
the witness) is (in theory) that the annex can be analysed immediately
and unconditionally, without necessarily even knowing anything about
the utxo being spent.

The idea is that we would define some simple way of encoding (multiple)
entries into the annex -- perhaps a tag/length/value scheme like
lightning uses; maybe if we add a lisp scripting language to consensus,
we just reuse the list encoding from that? -- at which point we might
use one tag to specify that a transaction uses advanced computation, and
needs to be treated as having a heavier weight than its serialized size
implies; but we could use another tag for per-input absolute locktimes;
or another tag to commit to a past block height having a particular hash.

It seems like a good place for optimising SIGHASH_GROUP (allowing a group
of inputs to claim a group of outputs for signing, but not allowing inputs
from different groups to ever claim the same output; so that each output
is hashed at most once for this purpose) -- since each input's validity
depends on the other inputs' state, it's better to be able to get at
that state as easily as possible rather than having to actually execute
other scripts before your can tell if your script is going to be valid.

> The BIP is tight lipped about it's purpose

BIP341 only reserves an area to put the annex; it doesn't define how
it's used or why it should be used.

> Essentially, I read this as saying: The annex is the ability to pad a
> transaction with an additional string of 0's 

If you wanted to pad it directly, you can do that in script already
with a PUSH/DROP combo.

The point of doing it in the annex is you could have a short byte
string, perhaps something like "0x010201a4" saying "tag 1, data length 2
bytes, value 420" and have the consensus intepretation of that be "this
transaction should be treated as if it's 420 weight units more expensive
than its serialized size", while only increasing its witness size by
6 bytes (annex length, annex flag, and the four bytes above). Adding 6
bytes for a 426 weight unit increase seems much better than adding 426
witness bytes.

The example scenario is that if there was an opcode to verify a
zero-knowledge proof, eg I think bulletproof range proofs are something
like 10x longer than a signature, but require something like 400x the
validation time. Since checksig has a validation weight of 50 units,
a bulletproof verify might have a 400x greater validation weight, ie
20,000 units, while your witness data is only 650 bytes serialized. In
that case, we'd need to artificially bump the weight of you transaction
up by the missing 19,350 units, or else an attacker could fill a block
with perhaps 6000 bulletproofs costing the equivalent of 120M signature
operations, rather than the 80k sigops we currently expect as the maximum
in a block. Seems better to just have "0x01024b96" stuck in the annex,
than 19kB of zeroes.

> Introducing OP_ANNEX: Suppose there were some sort of annex pushing opcode,
> OP_ANNEX which puts the annex on the stack

I think you'd want to have a way of accessing individual entries from
the annex, rather than the annex as a single unit.

> Now suppose that I have a computation that I am running in a script as
> follows:
> 
> OP_ANNEX
> OP_IF
>     `some operation that requires annex to be <1>`
> OP_ELSE
>     OP_SIZE
>     `some operation that requires annex to be len(annex) + 1 or does a
> checksig`
> OP_ENDIF
> 
> Now every time you run this,

You only run a script from a transaction once at which point its
annex is known (a different annex gives a different wtxid and breaks
any signatures), and can't reference previous or future transactions'
annexes...

> Because the Annex is signed, and must be the same, this can also be
> inconvenient:

The annex is committed to by signatures in the same way nVersion,
nLockTime and nSequence are committed to by signatures; I think it helps
to think about it in a similar way.

> Suppose that you have a Miniscript that is something like: and(or(PK(A),
> PK(A')), X, or(PK(B), PK(B'))).
> 
> A or A' should sign with B or B'. X is some sort of fragment that might
> require a value that is unknown (and maybe recursively defined?) so
> therefore if we send the PSBT to A first, which commits to the annex, and
> then X reads the annex and say it must be something else, A must sign
> again. So you might say, run X first, and then sign with A and C or B.
> However, what if the script somehow detects the bitstring WHICH_A WHICH_B
> and has a different Annex per selection (e.g., interpret the bitstring as a
> int and annex must == that int). Now, given and(or(K1, K1'),... or(Kn,
> Kn')) we end up with needing to pre-sign 2**n annex values somehow... this
> seems problematic theoretically.

Note that you need to know what the annex will contain before you sign,
since the annex is committed to via the signature. If "X" will need
entries in the annex that aren't able to be calculated by the other
parties, then they need to be the first to contribute to the PSBT, not A.

I think the analogy to locktimes would be "I need the locktime to be at
least block 900k, should I just sign that now, or check that nobody else
is going to want it to be block 950k or something? Or should I just sign
with nLockTime at 900k, 910k, 920k, 930k, etc and let someone else pick
the right one?" The obvious solution is just to work out what the
nLockTime should be first, then run signing rounds. Likewise, work out
what the annex should be first, then run the signing rounds.

CLTV also has the problem that if you have one script fragment with
CLTV by time, and another with CLTV by height, you can't come up with
an nLockTime that will ever satisfy both. If you somehow have script
fragments that require incompatible interpretations of the annex, you're
likewise going to be out of luck.

Having a way of specifying locktimes in the annex can solve that
particular problem with CLTV (different inputs can sign different
locktimes, and you could have different tags for by-time/by-height so
that even the same input can have different clauses requiring both),
but the general problem still exists.

(eg, you might have per-input by-height absolute locktimes as annex
entry 3, and per-input by-time absolute locktimes as annex entry 4,
so you might convert:

 "900e3 CLTV DROP" -> "900e3 3 PUSH_ANNEX_ENTRY GREATERTHANOREQUAL VERIFY"

 "500e6 CLTV DROP" -> "500e6 4 PUSH_ANNEX_ENTRY GREATERTHANOREQUAL VERIFY"

for height/time locktime checks respectively)

> Of course this wouldn't be miniscript then. Because miniscript is just for
> the well behaved subset of script, and this seems ill behaved. So maybe
> we're OK?

The CLTV issue hit miniscript:

https://medium.com/blockstream/dont-mix-your-timelocks-d9939b665094

> But I think the issue still arises where suppose I have a simple thing
> like: and(COLD_LOGIC, HOT_LOGIC) where both contains a signature, if
> COLD_LOGIC and HOT_LOGIC can both have different costs, I need to decide
> what logic each satisfier for the branch is going to use in advance, or
> sign all possible sums of both our annex costs? This could come up if
> cold/hot e.g. use different numbers of signatures / use checksigCISAadd
> which maybe requires an annex argument.

Signatures pay for themselves -- every signature is 64 or 65 bytes,
but only has 50 units of validation weight. (That is, a signature check
is about 50x the cost of hashing 520 bytes of data, which is the next
highest cost operation we have, and is treated as costing 1 unit, and
immediately paid for by the 1 byte that writing OP_HASH256 takes up)

That's why the "add cost" use of the annex is only talked about in
hypotheticals, not specified -- for reasonable scripts with today's
opcodes, it's not needed.

If you're doing cross-input signature aggregation, everybody needs to
agree on the message they're signing in the first place, so you definitely
can't delay figuring out some bits of some annex until after signing.

> It seems like one good option is if we just go on and banish the OP_ANNEX.
> Maybe that solves some of this? I sort of think so. It definitely seems
> like we're not supposed to access it via script, given the quote from above:

How the annex works isn't defined, so it doesn't make any sense to
access it from script. When how it works is defined, I expect it might
well make sense to access it from script -- in a similar way that the
CLTV and CSV opcodes allow accessing nLockTime and nSequence from script.

To expand on that: the logic to prevent a transaction confirming too
early occurs by looking at nLockTime and nSequence, but script can
ensure that an attempt to use "bad" values for those can never be a
valid transaction; likewise, consensus may look at the annex to enforce
new conditions as to when a transaction might be valid (and can do so
without needing to evaluate any scripts), but the individual scripts can
make sure that the annex has been set to what the utxo owner considered
to be reasonable values.

> One solution would be to... just soft-fork it out. Always must be 0. When
> we come up with a use case for something like an annex, we can find a way
> to add it back.

The point of reserving the annex the way it has been is exactly this --
it should not be used now, but when we agree on how it should be used,
we have an area that's immediately ready to be used.

(For the cases where you don't need script to enforce reasonable values,
reserving it now means those new consensus rules can be used immediately
with utxos that predate the new consensus rules -- so you could update
offchain contracts from per-tx to per-input locktimes immediately without
having to update the utxo on-chain first)

Cheers,
aj