Return-Path: <aj@erisian.com.au>
Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137])
 by lists.linuxfoundation.org (Postfix) with ESMTP id 31898C000B
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Thu, 17 Feb 2022 14:27:38 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by smtp4.osuosl.org (Postfix) with ESMTP id 110DE41705
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Thu, 17 Feb 2022 14:27:38 +0000 (UTC)
X-Virus-Scanned: amavisd-new at osuosl.org
X-Spam-Flag: NO
X-Spam-Score: -1.621
X-Spam-Level: 
X-Spam-Status: No, score=-1.621 tagged_above=-999 required=5
 tests=[BAYES_00=-1.9, KHOP_HELO_FCRDNS=0.276, SPF_HELO_NONE=0.001,
 SPF_NONE=0.001, UNPARSEABLE_RELAY=0.001]
 autolearn=no autolearn_force=no
Received: from smtp4.osuosl.org ([127.0.0.1])
 by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id hFB51wp5OE5N
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Thu, 17 Feb 2022 14:27:36 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.8.0
Received: from azure.erisian.com.au (cerulean.erisian.com.au [139.162.42.226])
 by smtp4.osuosl.org (Postfix) with ESMTPS id 4AF0E41676
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Thu, 17 Feb 2022 14:27:36 +0000 (UTC)
Received: from aj@azure.erisian.com.au (helo=sapphire.erisian.com.au)
 by azure.erisian.com.au with esmtpsa (Exim 4.92 #3 (Debian))
 id 1nKhkp-0006vT-3b; Fri, 18 Feb 2022 00:27:33 +1000
Received: by sapphire.erisian.com.au (sSMTP sendmail emulation);
 Fri, 18 Feb 2022 00:27:27 +1000
Date: Fri, 18 Feb 2022 00:27:27 +1000
From: Anthony Towns <aj@erisian.com.au>
To: Russell O'Connor <roconnor@blockstream.com>,
 Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Message-ID: <20220217142727.GA1429@erisian.com.au>
References: <CAMZUoK=pkZuovtifBzdqhoyegzG+9hRTFEc7fG9nZPDK4KbU3w@mail.gmail.com>
 <20220128013436.GA2939@erisian.com.au>
 <CAMZUoK=U_-ah3cQbESE8hBXOvSMpxJJd1-ca0mYo7SvMi7izYQ@mail.gmail.com>
 <20220201011639.GA4317@erisian.com.au>
 <CAMZUoKmp_B9vYX8akyWz6dXtrx6PWfDV6mDVG5Nk2MZdoAqnAg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAMZUoKmp_B9vYX8akyWz6dXtrx6PWfDV6mDVG5Nk2MZdoAqnAg@mail.gmail.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Spam-Score-int: -18
X-Spam-Bar: -
Subject: Re: [bitcoin-dev] TXHASH + CHECKSIGFROMSTACKVERIFY in lieu of CTV
 and ANYPREVOUT
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Feb 2022 14:27:38 -0000

On Mon, Feb 07, 2022 at 09:16:10PM -0500, Russell O'Connor via bitcoin-dev wrote:
> > > For more complex interactions, I was imagining combining this TXHASH
> > > proposal with CAT and/or rolling SHA256 opcodes.
> Indeed, and we really want something that can be programmed at redemption
> time.

I mean, ideally we'd want something that can be flexibly programmed at
redemption time, in a way that requires very few bytes to express the
common use cases, is very efficient to execute even if used maliciously,
is hard to misuse accidentally, and can be cleanly upgraded via soft fork
in the future if needed?

That feels like it's probably got a "fast, cheap, good" paradox buried
in there, but even if it doesn't, it doesn't seem like something you
can really achieve by tweaking around the edges?

> That probably involves something like how the historic MULTISIG worked by
> having a list of input / output indexes be passed in along with length
> arguments.
> 
> I don't think there will be problems with quadratic hashing here because as
> more inputs are listed, the witness in turn grows larger itself.

If you cache the hash of each input/output, it would mean each byte of
the witness would be hashing at most an extra 32 bytes of data pulled
from that cache, so I think you're right. Three bytes of "script" can
already cause you to rehash an additional ~500 bytes (DUP SHA256 DROP),
so that should be within the existing computation-vs-weight relationship.
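
As a rough sketch of that caching argument (the function names are made
up for illustration, and this is not the TXHASH spec): hash every
input/output once, then let each index supplied in the witness pull a
single 32-byte digest from the cache.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_cache(serialized_items):
    # One-time pass over the transaction: each input/output is hashed once.
    return [sha256(item) for item in serialized_items]


def hash_selected(cache, indexes):
    # Each selected index pulls one cached 32-byte digest, so the extra
    # data hashed grows linearly with the number of indexes supplied.
    hasher = hashlib.sha256()
    for i in indexes:
        hasher.update(cache[i])
    return hasher.digest(), 32 * len(indexes)


# Hypothetical serialized outputs, 100 bytes each.
outputs = [bytes([i]) * 100 for i in range(5)]
cache = build_cache(outputs)
digest, extra_bytes_hashed = hash_selected(cache, [0, 2, 4])
```

Selecting three outputs rehashes 96 bytes from the cache rather than
the 300 bytes of underlying data, whatever the size of each output.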

If you add the ability to hash a chosen output (as Rusty suggests, and
which would allow you to simulate SIGHASH_GROUP), you'd probably have to
increase your cache to cover each output's scriptPubKey simultaneously,
which might be annoying, but doesn't seem fatal.

> That said, your SIGHASH_GROUP proposal suggests that some sort of
> intra-input communication is really needed, and that is something I would
> need to think about.

I think the way to look at it is that it trades off spending an extra
witness byte or three per output (your way, give or take) vs only being
able to combine transactions in limited ways (sighash_group), but being
able to be more optimised than the more manual approach.

That's a fine tradeoff to make for something that's common -- you
save onchain data, make something easier to use, and can optimise the
implementation so that it handles the common case more efficiently.

(That's a bit of a "premature optimisation" thing though -- we can't
currently do SIGHASH_GROUP style things, so how can you sensibly justify
optimising it because it's common, when it's not only currently not
common, but also not possible? That seems to me a convincing reason to
make script more expressive)

> While normally I'd be hesitant about this sort of feature creep, when we
> are talking about doing soft-forks, I really think it makes sense to think
> through these sorts of issues (as we are doing here).

+1

I guess I especially appreciate your goodwill here, because this has
sure turned out to be a pretty long message as I think some of these
things through out loud :)

> > "CAT" and "CHECKSIGFROMSTACK" are both things that have been available in
> > elements for a while; has anyone managed to build anything interesting
> > with them in practice, or are they only useful for thought experiments
> > and blog posts? To me, that suggests that while they're useful for
> > theoretical discussion, they don't turn out to be a good design in
> > practice.
> Perhaps the lesson to be drawn is that languages should support multiplying
> two numbers together.

Well, then you get to the question of whether that's enough, or if
you need to be able to multiply bignums together, etc? 

I was looking at uniswap-like things on liquid, and wanted to do constant
product for multiple assets -- but you already get the problem that "x*y
< k" might overflow if the output values x and y are ~50 bits each, and
that gets worse with three assets and wanting to calculate "x*y*z < k",
etc. And really you'd rather calculate "a*log(x) + b*log(y) + c*log(z)
< k" instead, which then means implementing fixed point log in script...
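
To make the overflow concrete, here is a small sketch assuming signed
64-bit arithmetic for the in-script product (an assumption for
illustration; the helper names are hypothetical). The log form is shown
with floats only to illustrate the shape of the check; in script it
would need the fixed point log mentioned above.

```python
import math

INT64_MAX = 2**63 - 1


def product_overflows(*amounts):
    # Python ints are unbounded, so compute the exact product and then
    # ask whether it would still fit in a signed 64-bit integer.
    return math.prod(amounts) > INT64_MAX


def weighted_log_sum(amounts, weights):
    # a*log(x) + b*log(y) + ... stays small even for huge amounts.
    return sum(w * math.log(x) for x, w in zip(amounts, weights))


two_assets = product_overflows(2**50, 2**50)            # 2**100
three_small = product_overflows(2**20, 2**20, 2**20)    # 2**60
three_medium = product_overflows(2**30, 2**30, 2**30)   # 2**90
log_form = weighted_log_sum([2**50, 2**50], [1, 1])     # about 69.3
```

Two ~50 bit values already overflow, and with three assets even ~30 bit
values do, while the weighted log sum stays a two-digit number.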

> Having 2/3rd of the language you need to write interesting programs doesn't
> mean that you get 2/3rd of the interesting programs written.

I guess to abuse that analogy: I think you're saying something like
we've currently got 67% of an ideal programming language, and CTV
would give us 68%, but that would only take us from 10% to 11% of the
interesting programs. I agree txhash might bump that up to, say, 69%
(nice) but I'm not super convinced that even moves us from 11% to 12%
of interesting programs, let alone a qualitative leap to 50% or 70%
of interesting programs.

It's *possible* that the ideal combination of opcodes will turn out to
be CAT, TXHASH, CHECKSIGFROMSTACK, MUL64LE, etc, but it feels like it'd
be better working something out that fits together well, rather than
adding things piecemeal and hoping we don't spend all that effort to
end up in a local optimum that's a long way short of a global optimum?

[rearranged:]

> The flexibility of TXHASH is intended to head off the need for future soft
> forks.  If we had specific applications in mind, we could simply set up the
> transaction hash flags to cover all the applications we know about.  But it
> is the applications that we don't know about that worry me.  If we don't
> put options in place with this soft-fork proposal, then they will need
> their own soft-fork down the line; and the next application after that, and
> so on.
> 
> If our attitude is to craft our soft-forks as narrowly as possible to limit
> them to what only allows for given tasks, then we are going to end up
> needing a lot more soft-forks, and that is not a good outcome.

I guess I'm not super convinced that we're anywhere near the right level
of generality that this would help in avoiding future soft forks? That's
what I meant by it not covering SIGHASH_GROUP.

I guess the model I have in my head is that we should ideally have a
general/flexible/expressive but expensive way of doing whatever scripting
you like (so a "SIMPLICITY_EXEC" opcode, perhaps). Then, as new ideas get
discovered and widely deployed, we should make them easy and cheap to use
(whether that's deploying a "jet" for the simplicity code, or a dedicated
opcode, or something else). But "cheap to use" means defining a new cost
function (or defining new execution conditions for something that was
already cheaper than the cheapest existing way of encoding those execution
conditions), which is itself a soft fork, since making it "cheaper" means
being able to fit more transactions using that feature into a block than
was previously possible.

But even then, based on [0], pure simplicity code to verify a signature
apparently takes 11 minutes, so that code probably should cost 66M vbytes
(based on a max time to verify a block of 10 seconds), which would
make it obviously unusable as a bitcoin tx with the normal 100k vbyte
limit... Presumably an initial simplicity deployment would come with a
bunch of jets baked in so that's less of an issue in practice... 
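
The 66M figure can be reproduced with a quick back-of-the-envelope
calculation, assuming a block holds about 1M vbytes (4M weight units)
and that cost should scale linearly with execution time; the constant
and function names below are just for illustration.

```python
BLOCK_VBYTES = 1_000_000         # 4M weight units / 4
MAX_BLOCK_VERIFY_SECONDS = 10    # target worst-case block verification time
STANDARD_TX_VBYTES = 100_000     # normal per-transaction limit


def vbytes_for_runtime(seconds):
    # Charge vbytes in proportion to the share of the per-block time
    # budget that the computation consumes.
    return seconds * BLOCK_VBYTES // MAX_BLOCK_VERIFY_SECONDS


cost = vbytes_for_runtime(11 * 60)            # 11 minutes of execution
times_over_limit = cost // STANDARD_TX_VBYTES
```

That is 66 full blocks' worth of budget, or 660 times the per-tx limit.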

But I think that means that even with simplicity you couldn't experiment
with alternative ECC curves or zero knowledge stuff without a soft fork
to make the specific setup fast and cheap, first.

[0] https://medium.com/blockstream/simplicity-jets-release-803db10fd589

(I think this approach would already be an improvement in how we do soft
forks, though: (1) for many things, you would already have on-chain
evidence that this is something that's worthwhile, because people are
paying high fees to do it via hand-coded simplicity, so there's no
question of whether it will be used; (2) you can prove the jet and the
simplicity code do the exact same thing (and have unit/fuzz tests to
verify it), so can be more confident that the implementation is correct;
(3) maybe it's easier to describe in a bip that way too, since you can
just reference the simplicity code it's replacing rather than having
C++ code?)

That still probably doesn't cover every experiment you might want to do;
eg if you wanted to have your tx commit to a prior block hash, you'd
presumably need a soft fork to expose that data; and if you wanted to
extend the information about the utxo being spent (eg a parity bit for
the internal public key to make recursive TLUV work better) you'd need a
soft fork for that too.


I guess a purist approach to generalising sighashes might look something
like:

   [s] [shimplicity] DUP EXEC p CHECKSIGFROMSTACK

where both s and shimplicity (== sighash + simplicity or shim + simplicity
:) are provided by the signer, with s being a signature, and shimplicity
being a simplicity script that builds a 32 byte message based on whatever
bits of the transaction it chooses as well as the shimplicity script
itself to prevent malleability.

But writing a shimplicity script all the time is annoying, so adding an
extra opcode to avoid that makes sense, reducing it to:

   [s] [sh] TXHASH p CHECKSIGFROMSTACK

which is then equivalent to the existing

   [s|sh] p CHECKSIG

Though in that case, wouldn't you just have "txhash(sh)" be your
shimplicity script (in which case txhash is a jet rather than an opcode),
and keep the program as "DUP EXEC p CHECKSIGFROMSTACK", which then gives
the signer maximum flexibility to either use a standard sighash, or
write special code to do something new and magic?

So I think I'm 100% convinced that a (simplified) TXHASH makes sense in
a world where we have simplicity-equivalent scripting (and where there's
*also* some more direct introspection functionality like Rusty's OP_TX
or elements' tapscript opcodes or whatever).

(I don't think there's much advantage of a TaggedHash opcode that
takes the tag as a parameter over just writing "SHA256 DUP CAT SWAP CAT
SHA256", and if you were going to have a "HASH_TapSighash" opcode it
probably should be limited to hashing the same things from the bip that
defines it anyway. So having two simplicity functions, one for bip340
(checksigfromstack) and one for bip342 (generating a signature message
for the current transaction) seems about ideal)
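
A toy stack machine shows that the opcode sequence above does reproduce
the bip340-style tagged hash, assuming CAT concatenates the top two
items with the deeper item first (as in elements) and that the stack
starts as [msg, tag] with the tag on top. The interpreter is a
hypothetical sketch, not real script evaluation.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def run(ops, stack):
    # Minimal interpreter covering only the four opcodes used here.
    stack = list(stack)
    for op in ops:
        if op == "SHA256":
            stack.append(sha256(stack.pop()))
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "CAT":
            top = stack.pop()
            below = stack.pop()
            stack.append(below + top)
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack


def tagged_hash(tag: bytes, msg: bytes) -> bytes:
    # bip340: SHA256(SHA256(tag) || SHA256(tag) || msg)
    tag_digest = sha256(tag)
    return sha256(tag_digest + tag_digest + msg)


script = ["SHA256", "DUP", "CAT", "SWAP", "CAT", "SHA256"]
result = run(script, [b"hello", b"TapSighash"])
```

The single item left on the stack equals
tagged_hash(b"TapSighash", b"hello").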

But, I guess that brings me back to more or less what Jeremy asked
earlier in this thread:

] Does it make "more sense" to invest the research and development effort
] that would go into proving TXHASH safe, for example, into Simplicity
] instead?

Should we be trying to gradually turn script into a more flexible
language, one opcode at a time -- going from 11% to 12% to 13.4% to
14.1% etc of coverage of interesting programs -- or should we invest
that time/effort into working on simplicity (or something like chialisp
or similar) instead? That is, something where we could actually evaluate
how all the improved pieces fit together rather than guessing how it might
work if we maybe in future add CAT or 64 bit maths or something else...

If we put all our "language design" efforts into simplicity/whatever,
we could leave script as more of a "macro" language than a programming
one; that is, focus on it being an easy, cheap, safe way of doing the
most common things. I think that would still be worthwhile, both before
and after simplicity/* is available?

I think my opinions are:

 * recursive covenants are not a problem; avoiding them isn't and
   shouldn't be a design goal; and trying to prevent other people using
   them is wasted effort

 * having a language redesign is worthwhile -- there are plenty of ways
   to improve script, and there's enough other blockchain languages out
   there by now that we ought to be able to avoid a "second system effect"
   disaster

 * CTV via legacy script saves ~17 vbytes compared to going via
   tapscript (since the CTV hash is already in the scriptPubKey and the
   internal pubkey isn't needed, so neither need to be revealed to spend)
   and avoids the taproot ECC equation check, at the cost of using up
   an OP_NOP opcode. That seems worthwhile to me. Comparatively, TXHASH
   saves ~8 vbytes compared to emulating it with CTV (because you don't
   have to supply an unacceptable hash on demand). So having both may be
   worthwhile, but if we only have one, CTV seems the bigger saving? And
   we're not wasting an opcode if we do CTV now and add TXHASH later,
   since TXHASH isn't NOP-compatible and can't be included in legacy
   script anyway.

 * TXHASH's "PUSH" behaviour vs CTV's "examine the stack but don't
   change it, and VERIFY" behaviour is independent of the question of 
   if we want to supply flags to CTV/TXHASH so they're more flexible

And perhaps less strongly:

 * I don't like the ~18 TXHASH flags; for signing/CTV behaviour, they're
   both overkill (they have lots of seemingly useless combinations)
   and insufficient (don't cover SIGHASH_GROUP), and they add additional
   bytes of witness data, compared to CTV's zero-byte default or CHECKSIG's
   zero/one-byte sighash which only do things we know are useful (well,
   some of those combinations might not be useful either...).

 * If we're deliberately trying to add transaction introspection, then
   all the flags do make sense, but Rusty's unhashed "TX" approach seems
   better than TXHASH for that (assuming we want one opcode versus the
   many opcodes elements use). But if we want that, we probably should
   also add maths opcodes that can cope with output amounts, at least;
   and perhaps also have some way for signatures to commit to some witness
   data that's used as script input. Also, convenient introspection isn't
   really compatible with convenient signing without some way of
   conveniently converting data into a tagged hash. 

 * I'm not really convinced CTV is ready to start trying to deploy
   on mainnet even in the next six months; I'd much rather see some real
   third-party experimentation *somewhere* public first, and Jeremy's CTV
   signet being completely empty seems like a bad sign to me. Maybe that
   means we should tentatively merge the feature and deploy it on the
   default global signet though?  Not really sure how best to get more
   real world testing; but "deploy first, test later" doesn't sit right.

I'm not at all sure about bundling CTV with ANYPREVOUT and SIGHASH_GROUP:

Pros:

 - with APO available, you don't have to worry as much if spending
   a CTV output doesn't result in a guaranteed txid, and thus don't need
   to commit to scriptSigs and the like

 - APOAS and CTV are pretty similar in what they hash

 - SIGHASH_GROUP lets you add extra change outputs to a CTV spend
   which you can't otherwise do

 - reusing APOAS's tx hash scheme for CTV would avoid some of the weird
   ugly bits in CTV (that the input index is committed to and that the
   scriptSig is only "maybe!" included)

 - defining SIGHASH_GROUP and CTV simultaneously might let you define
   the groups in a way that is compatible between tapscript (annex-based)
   and legacy CTV. On the other hand, this probably still works provided
   you deploy SIGHASH_GROUP /after/ CTV is specced in (by defining CTV
   behaviour for a different length arg)

Cons:

 - just APOAS|ALL doesn't quite commit to the same things as bip 119 CTV
   and that matters if you reuse CTV addresses

 - SIGHASH_GROUP assumes use of the annex, which would need to be
   specced out; SIGHASH_GROUP itself doesn't really have a spec yet either

 - txs signed by APOAS|GROUP are more malleable than txs with a bip119
   CTV hash which might be annoying to handle even non-adversarially

 - that malleability with current RBF rules might lead to pinning
   problems

I guess for me that adds up to:

 * For now, I think I prefer OP_CTV over either OP_TXHASH alone or both
   OP_CTV and OP_TXHASH

 * I'd like to see CTV get more real-world testing before considering
   deployment

 * If APO/SIGHASH_GROUP get specced, implemented *and* tested by the
   time CTV is tested enough to think about deploying it, bundle them

 * Unless CTV testing takes ages, it's pretty unlikely it'll be worth
   simplifying CTV to more closely match APO's tx hashing

 * CAT, CHECKSIGFROMSTACK, tx introspection, better maths *are* worth
   prioritising, but would be better as part of a more thorough language
   overhaul (since you can analyse how they interact with each other
   in combination, and you get a huge jump from ~10% to ~80% benefit,
   instead of tiny incremental ones)?

I guess that's all partly dependent on thinking that TXHASH isn't
great for tx introspection (especially without CAT) and that, without tx
introspection and decent math opcodes, DLCs already provide all the
interesting oracle behaviour you're really going to get...

> I don't know if this is the answer you are looking for, but technically
> TXHASH + CAT + SHA256 awkwardly gives you limited transaction reflection.
> In fact, you might not even need TXHASH, though it certainly helps.

Yeah, it wasn't really what I was looking for but it does demolish that
specific thought experiment anyway.

> > I believe "sequences hash", "input count" and "input index" are all an
> > important part of ensuring that if you have two UTXOs distributing 0.42
> > BTC to the same set of addresses via CTV, that you can't combine them in a
> > single transaction and end up losing one of the UTXOs to fees. I
> > don't believe there's a way to resolve that with bip 118 alone, however
> > that does seem to be a similar problem to the one that SIGHASH_GROUP
> > tries to solve.
> It was my understanding that it is only "input count = 1" that prevents
> this issue.

If you have input count = 1, that solves the issue, but you could also
have input count > 1, and simply commit to different input indexes to
allow/require you to combine two CTV utxos into a common set of new
outputs, or you could have input count > 1 but input index = 1 for both
utxos to prevent combining them with each other, but allow adding a fee
funding input (but not a change output; and at a cost of an unpredictable
txid).

(I only listed "sequences hash" there because it implicitly commits to
"input count")
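
Those three cases can be checked with a toy model. This is only a
sketch: the Template class below is hypothetical and compares fields
directly, whereas bip 119 commits to them (and more) via a hash. Input
indexes are zero-based, so "input index = 1" means the second input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    input_count: int
    input_index: int
    outputs: tuple


def tx_valid(inputs, outputs):
    # inputs: a Template for each CTV utxo, or None for an ordinary
    # (e.g. fee funding) input. Every CTV input must see the input
    # count, its own position, and the exact outputs it committed to.
    return all(
        t is None
        or (t.input_count == len(inputs)
            and t.input_index == i
            and t.outputs == outputs)
        for i, t in enumerate(inputs)
    )


outs = (("addr_a", 21_000_000), ("addr_b", 21_000_000))  # 0.42 BTC total

single = Template(1, 0, outs)
alone_ok = tx_valid([single], outs)
combined_singles = tx_valid([single, single], outs)

first, second = Template(2, 0, outs), Template(2, 1, outs)
deliberate_combine = tx_valid([first, second], outs)

both_second = tx_valid([second, second], outs)
with_fee_input = tx_valid([None, second], outs)
with_change = tx_valid([None, second], outs + (("change", 1_000),))
```

Here combined_singles and both_second are False, deliberate_combine and
with_fee_input are True, and with_change is False because the outputs
no longer match the committed set.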

Cheers,
aj