Date: Mon, 7 Mar 2022 18:08:03 +1000
From: Anthony Towns <aj@erisian.com.au>
To: Jeremy Rubin <jeremy.l.rubin@gmail.com>,
 Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Message-ID: <20220307080803.GA6464@erisian.com.au>
In-Reply-To: <CAD5xwhjG7kN=LatZRpQxqmaqtoRP31BcyeN2zHtOUsGt=6oJ3w@mail.gmail.com>
Subject: Re: [bitcoin-dev] Annex Purpose Discussion: OP_ANNEX,
 Turing Completeness, and other considerations

On Sat, Mar 05, 2022 at 12:20:02PM +0000, Jeremy Rubin via bitcoin-dev wrote:
> On Sat, Mar 5, 2022 at 5:59 AM Anthony Towns <aj@erisian.com.au> wrote:
> > The difference between information in the annex and information in
> > either a script (or the input data for the script that is the rest of
> > the witness) is (in theory) that the annex can be analysed immediately
> > and unconditionally, without necessarily even knowing anything about
> > the utxo being spent.
> I agree that should happen, but there are cases where this would not work.
> E.g., imagine OP_LISP_EVAL + OP_ANNEX... and then you do delegation via the
> thing in the annex.
> Now the annex can be executed as a script.

You've got the implication backwards: the benefit isn't that the annex
*can't* be used as/in a script; it's that it *can* be used *without*
having to execute/analyse a script (and without even having to load the
utxo being spent).

How big a benefit that is might be debatable -- it's only a different
ordering of the work you have to do to be sure the transaction is valid;
it doesn't reduce the total work. And I think you can easily design
invalid transactions that will maximise the work required to establish
the tx is invalid, no matter what order you validate things.

> Yes, this seems tough to do without redefining checksig to allow partial
> annexes.

"Redefining checksig to allow X" in taproot means "defining a new pubkey
format that allows a new sighash that allows X", which, if it turns out
to be necessary/useful, is entirely possible.  It's not sensible to do
what you suggest *now* though, because we don't have a spec of how a
partial annex might look.

> Hence thinking we should make our current checksig behavior
> require it be 0,

Signatures already commit to whether an annex is present, so a signature
made without one already requires the annex to be absent. If you
personally want to do that for every future transaction you sign off on,
you already can.

> > It seems like a good place for optimising SIGHASH_GROUP (allowing a group
> > of inputs to claim a group of outputs for signing, but not allowing inputs
> > from different groups to ever claim the same output; so that each output
> > is hashed at most once for this purpose) -- since each input's validity
> > depends on the other inputs' state, it's better to be able to get at
> > that state as easily as possible rather than having to actually execute
> > other scripts before you can tell if your script is going to be valid.
> I think SIGHASH_GROUP could be some sort of mutable stack value, not ANNEX.

The annex is already a stack value, and the SIGHASH_GROUP parameter
cannot be mutable since it will break the corresponding signature, and
(in order to ensure validating SIGHASH_GROUP signatures don't require
hashing the same output multiple times) also impacts SIGHASH_GROUP
signatures from other inputs.

> you want to be able to compute what range you should sign, and then the
> signature should cover the actual range not the argument itself.

The value that SIGHASH_GROUP proposes putting in the annex is just an
indication of whether (a) this input is using the same output group as
the previous input; or else (b) how many outputs are in this input's
output group. The signature naturally commits to that value because it's
signing all the outputs in the group anyway.
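
As a rough sketch of that bookkeeping (the encoding is an assumption
made up for illustration, not a spec: 0 means "same output group as the
previous input", n > 0 means "this input starts a new group covering
the next n outputs"; the function name is hypothetical too):

```python
def assign_output_groups(annex_values, num_outputs):
    """Map each input to the (start, end) range of outputs it signs.

    annex_values[i] is a hypothetical per-input value:
      0     -> reuse the previous input's output group
      n > 0 -> claim the next n not-yet-claimed outputs as a new group
    """
    groups = []
    next_output = 0
    for i, value in enumerate(annex_values):
        if value < 0:
            raise ValueError(f"input {i} has a negative group size")
        if value == 0:
            if not groups:
                raise ValueError("first input cannot reuse a previous group")
            groups.append(groups[-1])
        else:
            start, end = next_output, next_output + value
            if end > num_outputs:
                raise ValueError(f"input {i} claims outputs past the end")
            groups.append((start, end))
            next_output = end
    return groups


# Inputs 0 and 1 share outputs [0, 2); input 2 signs output [2, 3).
groups = assign_output_groups([2, 0, 1], num_outputs=3)
print(groups)
```

Because a new group always starts where the previous one ended, no
output can land in two groups, which is what keeps each output hashed
at most once.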

> Why sign the annex literally?

To prevent it from being third-party malleable.

When there is some meaning assigned to the annex then perhaps it will
make sense to add some more granular way of accessing it via script, but
until then, committing to the whole thing is the best option possible,
since it still allows some potential uses of the annex without having
to define a new sighash.

Note that signing only part of the annex means that you probably
reintroduce the quadratic hashing problem -- that is, with a script of
length X and an annex of length Y, you may have to hash O(X*Y) bytes
instead of O(X+Y) bytes (because every X/k bytes of the script selects
a different Y/j subset of the annex to sign).
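
A back-of-the-envelope sketch of that comparison, assuming one
signature check per k bytes of script and each check hashing its own
slice of about Y/j annex bytes (one reading of the k and j above; the
function names and the sample numbers are illustrative only):

```python
def bytes_hashed_whole_annex(script_len, annex_len):
    # The annex is hashed once and the digest is reused by every
    # signature check, so the cost is linear: O(X + Y).
    return script_len + annex_len


def bytes_hashed_partial_annex(script_len, annex_len, k, j):
    # One signature check per k bytes of script, each hashing a
    # different slice of roughly annex_len / j bytes, so nothing can
    # be cached between checks: O(X * Y) for fixed k and j.
    num_checks = script_len // k
    return script_len + num_checks * (annex_len // j)


whole = bytes_hashed_whole_annex(10_000, 10_000)
partial = bytes_hashed_partial_annex(10_000, 10_000, k=40, j=2)
print(whole, partial)
```

With those sample numbers the whole-annex case hashes 20,000 bytes and
the partial case 1,260,000.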

> Why require that all signatures in one output sign the exact same digest?
> What if one wants to sign for value and another for value + change?

You can already have one signature for value and one for value+change:
use SIGHASH_SINGLE for the former, and SIGHASH_ALL for the latter.
SIGHASH_GROUP is designed for the case where the "value" goes to
multiple places.

> > > Essentially, I read this as saying: The annex is the ability to pad a
> > > transaction with an additional string of 0's
> > If you wanted to pad it directly, you can do that in script already
> > with a PUSH/DROP combo.
> You cannot, because the push/drop would not be signed and would be
> malleable.

If it's a PUSH, then it's in the tapscript and committed to by the
scriptPubKey, and not malleable.

There's currently no reason to have padding specifiable at spend time --
you know when you're writing the script whether the spender can reuse
the same signature for multiple CHECKSIG ops, because the only way to
do that is to add DUP/etc opcodes -- so if you're doing that, you can
add any necessary padding at the same time.

> The annex is not malleable, so it can be used for this as authenticated
> padding.

The reason that the annex is not third-party malleable is that its
content is committed to by signatures.
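
For reference, a minimal sketch of the two annex-dependent fields of
the BIP341 signature message: the spend_type byte and, when an annex is
present, sha_annex. The rest of the message is omitted and the function
names are made up here; this only shows why changing the annex changes
what was signed.

```python
import hashlib


def compact_size(n):
    # Bitcoin's variable-length integer encoding.
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def annex_sighash_fields(annex, ext_flag=0):
    """Return spend_type, followed by sha_annex if an annex is present.

    In the real message other fields sit between these two; they are
    left out because they do not depend on the annex.
    """
    annex_present = 0 if annex is None else 1
    spend_type = bytes([ext_flag * 2 + annex_present])
    if annex is None:
        return spend_type
    if annex[:1] != b"\x50":
        raise ValueError("annex must start with the 0x50 prefix")
    # sha_annex covers the length-prefixed annex, 0x50 byte included.
    sha_annex = hashlib.sha256(compact_size(len(annex)) + annex).digest()
    return spend_type + sha_annex


no_annex = annex_sighash_fields(None)
signed = annex_sighash_fields(bytes.fromhex("50010201a4"))
tampered = annex_sighash_fields(bytes.fromhex("50010201a5"))
print(signed != tampered, signed != no_annex)
```

A third party who adds, removes or edits the annex changes these bytes,
so the existing signature no longer validates.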

> > The point of doing it in the annex is you could have a short byte
> > string, perhaps something like "0x010201a4" saying "tag 1, data length 2
> > bytes, value 420" and have the consensus interpretation of that be "this
> > transaction should be treated as if it's 420 weight units more expensive
> > than its serialized size", while only increasing its witness size by
> > 6 bytes (annex length, annex flag, and the four bytes above). Adding 6
> > bytes for a 426 weight unit increase seems much better than adding 426
> > witness bytes.
> Yes, that's what I say in the next sentence,
> *> Or, we might somehow make the witness a small language (e.g., run length
> encoded zeros)

If you're doing run-length encoding, you might as well just use gzip at
the p2p and storage layers; you don't need to touch consensus at all.
That's not an extensible or particularly interesting idea.
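
For contrast, the tag/length/value example quoted above could be parsed
along these lines. This is only a sketch: no annex format has been
specified, so the one-byte tag, one-byte length and big-endian value
are assumptions read off the "0x010201a4" example (0x01a4 = 420).

```python
def parse_annex_entries(payload):
    """Parse hypothetical (tag, length, value) records into (tag, int)."""
    entries = []
    pos = 0
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated record header")
        tag, length = payload[pos], payload[pos + 1]
        value = payload[pos + 2 : pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated record value")
        entries.append((tag, int.from_bytes(value, "big")))
        pos += 2 + length
    return entries


payload = bytes.fromhex("010201a4")
entries = parse_annex_entries(payload)

# Witness cost: annex length byte + 0x50 annex flag + the payload.
witness_bytes = 1 + 1 + len(payload)
print(entries, witness_bytes)
```

That gives one entry, tag 1 with value 420, for 6 witness bytes, which
is the trade-off described in the quoted text.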

> > > Introducing OP_ANNEX: Suppose there were some sort of annex pushing
> > opcode,
> > > OP_ANNEX which puts the annex on the stack
> > I think you'd want to have a way of accessing individual entries from
> > the annex, rather than the annex as a single unit.
> Or OP_ANNEX + OP_SUBSTR + OP_POVARINTSTR? Then you can just do 2 pops for
> the length and the tag and then get the data.

If you want to make things as inconvenient as possible, sure, I guess?

> > > Now every time you run this,
> > You only run a script from a transaction once at which point its
> > annex is known (a different annex gives a different wtxid and breaks
> > any signatures), and can't reference previous or future transactions'
> > annexes...
> In a transaction validator, yes. But in a satisfier, no.

In a satisfier you don't "run" a script, you provide a solution to
the script...

You can certainly create scripts where it's not possible to provide
valid solutions, eg:

    DUP EQUAL NOT VERIFY

or where it's theoretically possible but in practice extremely difficult
to provide solutions, eg:

    DUP 2 <P> <Q> 2 CHECKMULTISIG
    2DUP EQUAL NOT VERIFY SHA256 SWAP SHA256 EQUAL

or where the difficulty is known and there really isn't an easier way
of coming up with a solution than doing multiple guesses and validating
the result:

    SIZE 80 EQUAL NOT VERIFY HASH256 0 18 SUBSTR 0 NUMEQUAL

But if you don't want to make life difficult for yourself, the answer's
pretty easy: just don't do those things. Or, at a higher level, don't
design new opcodes where you have to do those sorts of things.

> Not true about accessing previous TXNs annexes. All coins spend from
> Coinbase transactions. If you can get the COutpoint you're spending, you
> can get the parent of the COutpoint... and iterate backwards so on and so
> forth. Then you have the CB txn, which commits to the tree of wtxids. So
> you get previous transactions annexes committed there.

None of that information is stored in the utxo database or accessible at
validation time. Adding that information would make the utxo database
much larger, increasing the costs of running a node, and increasing
validation time for each transaction/block.

> For future transactions,

(For future transactions, if you had generic recursive covenants and
an opcode to examine the annex, you could prevent spending without a
particular value appearing in the annex; that doesn't let you "inspect"
a future annex, though)

> > > Because the Annex is signed, and must be the same, this can also be
> > > inconvenient:
> > The annex is committed to by signatures in the same way nVersion,
> > nLockTime and nSequence are committed to by signatures; I think it helps
> > to think about it in a similar way.
> nSequence, yes, nLockTime is per-tx.

nVersion is also per-tx not per-input. You still need to establish all
three of them before you start signing things.

> BTW i think we now consider nSeq/nLock to be misdesigned given desire to
> vary these per-input/per-tx....

Since nSequence is per-input, you can obviously vary that per-input; and
you can vary all three per-tx.

> > > Suppose that you have a Miniscript that is something like: and(or(PK(A),
> > > PK(A')), X, or(PK(B), PK(B'))).
> Yes, my point is this is computationally hard to do sometimes.

Sometimes, what makes things computationally hard is that you've got
the wrong approach to looking at the problem.

> > CLTV also has the problem that if you have one script fragment with
> > CLTV by time, and another with CLTV by height, you can't come up with
> > an nLockTime that will ever satisfy both. If you somehow have script
> > fragments that require incompatible interpretations of the annex, you're
> > likewise going to be out of luck.
> Yes, see above. If we don't know how the annex will be structured or used,

If you don't know how the annex will be structured or used, don't use
it. That's exactly how things are today, because no one knows how it
will be structured or used.

> this is the point of this thread....
> We need to drill down how to not introduce these problems.

From where I sit, it looks like you're drawing hasty conclusions based
on a lot of misconceptions. That's not the way you avoid introducing
problems...

I mean, having the misconceptions is perfectly reasonable; if anyone
knew exactly how annex things should work, we'd have a spec already. It's
leaping straight to "this is the only way it can work, it's a dumb way,
and therefore we should throw this out immediately" that I don't really
see the humour in.

> > > It seems like one good option is if we just go on and banish the
> > OP_ANNEX.
> > > Maybe that solves some of this? I sort of think so. It definitely seems
> > > like we're not supposed to access it via script, given the quote from
> > above:
> > How the annex works isn't defined, so it doesn't make any sense to
> > access it from script. When how it works is defined, I expect it might
> > well make sense to access it from script -- in a similar way that the
> > CLTV and CSV opcodes allow accessing nLockTime and nSequence from script.
> That's false: CLTV and CSV expressly do not allow accessing it from script,
> only lower bounding it

Lower bounding something requires accessing it.

That CLTV/CSV only allow lower-bounding it rather than more arbitrary
manipulation is mostly due to having to be implemented via upgrading an
OP_NOP opcode, rather than any other reason, IMHO.
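
To make "lower bounding requires accessing" concrete, here is a sketch
of the comparison CLTV performs, following the BIP65 rules. The
function names are illustrative, and script-level details (stack
handling, number encoding) are left out.

```python
LOCKTIME_THRESHOLD = 500_000_000  # below: block height; at or above: unix time


def check_locktime_verify(required, tx_locktime, input_sequence):
    """Return True if a CLTV with operand `required` would pass."""
    if required < 0:
        return False
    # Height-based and time-based values cannot be compared.
    if (required < LOCKTIME_THRESHOLD) != (tx_locktime < LOCKTIME_THRESHOLD):
        return False
    # The lower bound itself: this is where nLockTime is read.
    if required > tx_locktime:
        return False
    # A final nSequence disables nLockTime, so CLTV must fail.
    if input_sequence == 0xFFFFFFFF:
        return False
    return True


def satisfies_all(requirements, tx_locktime, input_sequence=0xFFFFFFFE):
    return all(
        check_locktime_verify(r, tx_locktime, input_sequence)
        for r in requirements
    )


# One fragment locked by height and one by time: no single nLockTime works.
mixed = [700_000, 1_650_000_000]
print(satisfies_all(mixed, 800_000), satisfies_all(mixed, 1_700_000_000))
```

The same check also shows the incompatibility mentioned earlier: with
one by-height and one by-time requirement, every nLockTime fails one of
them.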

> Legacy outputs can use these new sighash flags as well, in theory (maybe
> I'll do a post on why we shouldn't...)

Existing outputs can't use new sighash flags introduced by a soft fork --
if they could, then those outputs would have been anyone-can-spend prior
to the soft fork activating, because node software that doesn't support
the soft fork isn't able to calculate the message that the signature
applies to, so can't reject invalid signatures.

Perhaps you mean "we could replace OP_NOPx by OP_CHECKSIGv2 and allow
creating new p2wsh or p2sh addresses that can be spent using the new
flags", but I can't really think why anyone would bring that up at
this point, except as a way of deliberately wasting people's time and
attention...

Cheers,
aj