Date: Tue, 3 Dec 2019 18:35:38 +1000
From: Anthony Towns <aj@erisian.com.au>
To: Russell O'Connor <roconnor@blockstream.io>
Message-ID: <20191203083538.ggiginwo5k6m4ywq@erisian.com.au>
References: <CAMZUoKm7Y8bRJ+hqmvyPwwTWHaMA7JJc5TZdx7Eufax9G0D=Gg@mail.gmail.com>
 <20191128080659.msrpdpcjhhvbqtv2@erisian.com.au>
 <CAMZUoKmYi6btmhN8NmePPZNpPtXwNp=EpyYrxo7P2Ug7fnPSwQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CAMZUoKmYi6btmhN8NmePPZNpPtXwNp=EpyYrxo7P2Ug7fnPSwQ@mail.gmail.com>
User-Agent: NeoMutt/20170113 (1.7.2)
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Signing CHECKSIG position in Tapscript
Precedence: list

On Sun, Dec 01, 2019 at 11:09:54AM -0500, Russell O'Connor wrote:
> On Thu, Nov 28, 2019 at 3:07 AM Anthony Towns <aj@erisian.com.au> wrote:
>     First, it seems like a bad idea for Alice to have put funds behind a
>     script she doesn't understand in the first place. There's plenty of
>     scripts that are analysable, so just not using ones that are too hard to
>     analyse sure seems like an option.
> I don't think this is true in general.  When constructing a script it seems
> quite reasonable for one party to come to the table with their own custom
> script that they want to use because they have some sort of 7-of-11 scheme but
> in one of those cases is really a 2-of-3 and another is 5-of-6.  The point is
> that you shouldn't need to decode their exact policy in order to collaborate
> with them.

Hmm, I take the opposite lesson from your scenario -- it's only fine for
people to bring their own 2-of-3 or 5-of-6 or whatever and replace a
simple key if you've got something like miniscript where you understand
the script completely enough that you can be sure those changes are
fine. 

For contrast, with ECDSA and pre-miniscript, the above scenario might
have gone like someone proposing to change:

  7 A B C1 C2 C3 C4 C5 C6 C7 C8 C9 11 CHECKMULTISIG

for something like

  7
  SWAP IF TOALT 2 A1 A2 A3 3 CHECKMULTISIGVERIFY FROMALT 1SUB ENDIF
  SWAP IF TOALT 5 B1 B2 B3 B4 B5 B6 6 CHECKMULTISIGVERIFY FROMALT 1SUB ENDIF
  C1 C2 C3 C4 C5 C6 C7 C8 C9 11 CHECKMULTISIG

but I think you'd want to be pretty sure you can decode those added
policies rather than just accepting it because your "C4" key is still
there. (In particular, any script fragment that uses an opcode that used
to be OP_SUCCESS could have arbitrary effects on the script)

[0]

> This notion is captured quite clearly in the MAST aspect of
> taproot. In many circumstances, it is sufficient for you to know that there
> exists a branch that contains a particular script without need to know what
> every branch contains.

(I'm trying to avoid using MAST in the context of taproot, despite the
backronym, so please excuse the rephrasing--)

I think if you're going to start using a taproot address with multiple
tapscripts, either as a participant in a multiparty smart contract,
or just to have different ways of spending your funds, then you do have
to analyse all the branches to make sure there's no hidden "all the
money goes to the Lizard People" script.

Once you've done that, you can then simplify things -- maybe some
scripts are only useful for other participants in the contract, or maybe
you've got a few different hardware wallets and one only needs to know
about one branch, while the other only needs to know about some other
branch, but you still need to have done the analysis in the first place.

Of course, probably most of the time that "analysis" is just making sure
the scripts match some well known, hardcoded template, as filled out
with various (tweaked) keys that you've checked elsewhere, but that
still ensures you know all the scripts do what you need them too.

>     Third, if you are doing something crazy complex where a particular key
>     could appear in different CHECKSIG operators and they should have
>     independent signatures, that seems like you're at the level of
>     complexity where learning about CODESEPARATOR is a reasonable thing to
>     do.
> So while I agree that learning about CODESEPARATOR is a reasonable thing to do,
> given that I haven't heard the CODESEPARATOR being proposed as protection
> against this sort of signature-copying attack before

Err? The current behaviour of CODESEP with taproot was first discussed in
[1], which summarised it as "CODESEP -- lets you require different sigs
for different parts of a single script" which seems to me like just a
different way of saying the same thing.

[1] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-November/016500.html

I don't think tapscript's CODESEP or the current CODESEP can be used
for anything other than preventing a signature from being reused for a
different CHECKSIG operation on the same pubkey within the same script.

> and given the subtle
> nature of the issue, I'm not sure people will know to use it to protect
> themselves.  We should aim for a Script design that makes the cheaper default
> Script programming choices the safer one.

I think techniques like miniscript and having fixed templates specified
in BIPs and BOLTs and the like are better approaches -- both let you
easily allow a limited set of changes that can be safely made to a policy
(maybe just substituting keys, hashes and times, maybe allowing more
general changes).

> On the other hand, in a previous thread a while ago I was also arguing that
> sophisticated people are plausibly using CODESEPARATOR today, hidden away in
> unredeemed P2SH UTXOs.  So perhaps I'm right about at least one of these two
> points. :)

Sounds like an economics argument :)

>      IF HASH160 x EQUALVERIFY groupa ELSE groupb ENDIF
>      MERKLEPATHVERIFY CHECKSIG
>     spendable by
>      siga keya path preimagex 1
>     or
>      sigb keyb path 0
> I admit my proposal doesn't automatically prevent this signature-copying attack
> against every Script template.

Right -- so if you're worried about this sort of attack, you need to
analyse your script to at least be sure that it's not one of these cases
that aren't covered. And if you've got to analyse the script anyway
(which I think you do no matter what), then there's no benefit -- you're
either doing something simple and you're using templates or miniscript
to make the analysis easy; or you're doing something novel and complex,
and you can probably cope with using CODESEP.

(Ultimately I think there's only really two cases where you're
contributing a signature for a tx: either you're a party to the contract,
and you should have fully analysed all the possible ways the utxo could
be spent to make sure the smart contract stuff is correctly implemented
and you can't be cheated; or you're acting as an oracle or similar and
don't really care how the contract goes because you're not a party to
it, in which case people reusing your signature as much as they like is
fine. Hardware wallets don't need to analyse scripts they sign for, eg,
but that's only because for those cases where their owners have done
that first)

> To be fully effective you need to be aware of
> this signature-copying attack vector to ensure your scripts are designed so
> that your CHECKSIG operations are protected by being within the IF block that
> does the verification of the hash-preimage.  My thinking is that my proposal is
> effective enough to save most people most of the time, even if it doesn't save
> everyone all the time, all while having no significant burden otherwise.

I agree the burden's pretty minor; but I think having a single value
for the tx digest for each input for SIGHASH_ALL is kind-of nice for
validation; and I think having to pass through a CHECKSIG position
everytime you do a signature is likely to be annoying for implementors
for pretty much zero actual benefit.

> Therefore, I don't think your point that there still exists a Script where a
> signature copying attack can be performed is adequate by itself to dismiss my
> proposal.

I'm making two points with that example: (1) it's a case where if
you don't analyse the scripts somehow, you can still be vulnerable to
the attack with your change -- so your change doesn't let you avoid
knowing what scripts do; but also (2) that CODESEP is a marginally more
efficient/general fix the problem. Maybe (1) isn't too important,
because even if it weren't true, I still think you need to know what all
the scripts do, but I think (2)'s still reelevant.

> Given that MAST design of taproot greatly reduces this problem compared to
> legacy script, I suppose you could argue that "the burden on all the other
> cases is too great" simply because you believe the problematic situation is now
> extremely rare.

As you aluded to in the previous mail; I think the problem's currently
extremely rare and trivially avoidable because we don't really have any
way of manipulating pubkeys -- there's no CAT, EC_ADD/EC_MUL/EC_TWEAK
or MERKLEPATHVERIFY opcode (or actual Merkle Abstract Syntax Trees or
OP_EXEC etc) to make it a dynamic concern rather than a static one.

> In particular, imagine a world where CODESEPARATOR never existed.  We have this
> signature copying attack to deal with, and we are designing a new Segwit
> version in which we can now address the problem.  One proposal that someone
> comes up with is to sign the CHECKSIG position (or sign the enclosing OP_IF/
> OP_ELSE... position), maybe using a SIGHASH flag to optionally disable it. 
> Someone else comes up with a proposal to add new "CODESEPARATOR" opcode which
> requires adding a new piece of state to the Script interpreter (the only
> non-stack based piece of state) to track the last executed CODESEPARATOR
> position and include that in the signature.  Would you really prefer the
> CODESEPARATOR proposal?

If CODESEP had never existed, I think my first response would be to say
"well, just make sure you don't reuse pubkeys, and because each
bip-schnorr sig commits to the pubkey, problem solved."

There's only two use cases I'm aware of, one is the ridiculous
reveal-a-secret-key-by-forced-nonce-reuse script that's never actually
been implemented [2] and ntumblebit's escrow script [3]. The first of
those requires pubkey recovery so doesn't work with bip-schnorr anyway;
and it's not clear to me whether the second is really reason enough to
justify a dedicated opcode/sighash/etc.

[2] https://lists.linuxfoundation.org/pipermail/lightning-dev/2015-November/000363.html
[3] https://github.com/NTumbleBit/NTumbleBit/blob/master/NTumbleBit/EscrowScriptBuilder.cs

An option would be to remove CODESEP and treat it as OP_SUCCESS -- that
way it could be introduced later with pretty much the exact semantics
that are currently proposed; or with some more useful semantics. That
way we could bring in whatever functionality was actually needed at the
same time as introducing CAT/EC_MUL/etc.

But my default position is to think that the way things currently work is
mostly fine, and we should default ot just keeping the same functionality
-- so SIGHASH_ALL doesn't do anything fancy, but CODESEP can be used to
prevent sig reuse.

>     > As a side benefit, we get to eliminate CODESEPARATOR, removing a fairly
>     awkward
>     > opcode from this script version.
> 
>     As it stands, ANYPREVOUTANYSCRIPT proposes to not sign the script code
>     (allowing the signature to be reused in different scripts) but does
>     continue signing the CODESEPARATOR position, allowing you to optionally
>     restrict how flexibly you can reuse signatures. That seems like a better
>     tradeoff than having ANYPREVOUTANYSCRIPT signatures commit to the CHECKSIG
>     position which would make it a fair bit harder to design scripts that
>     can share signatures, or not having any way to restrict which scripts
>     the signature could apply to other than changing the pubkey.

> Recall that originally CODESEPARTOR would let you sign a suffix of the Script
> program.  In the context of signing the whole script (which is always signed
> indirectly as part of the txid in legacy signatures) signing the offset into
> that scripts contains just as much information as signing a script suffix,
> while being constant sized.  When you remove the Script from the data being
> signed, signing an offset is no longer equivalent to signing a Script suffix,
> and an offset into an unknown data structure is a meaningless value by itself. 
The tapscript implementation isn't intended to be equivalent to signing
a script suffix; all it does is add an index to the digest being signed
so that signatures at different indexes are distinct. That it's
equivalent to the current behaviour is definitely a feature, but I think
that's a surprising coincidence than a useful way of thinking about the
actual usefulness of CODESEP in tapscript...

[4]

> Um, I believe that signing the CODESEPERATOR position without signing the
> script code is nonsensical.  You are talking about signing a piece of data
> without an interpretation of its meaning.

With ANYPREVOUTANYSCRIPT, you're still differentiating signatures by
index, you just no longer also commit to any of the other details of
the script. That means you can't prevent your signature being reused in
random other scripts someone else designs -- hence the "ANYSCRIPT" part --
but you can prevent any of your funds from going to those addresses, so
that's not really your problem anyway. What it does mean is that you can
prevent your signature from being reused in different scripts you do know
about; eg you might have a UTXO with four different tapscript branches:

     1) OP_1 CHECKSIG
     2) CODESEP OP_1 CHECKSIGVERIFY HASH160 x EQUAL
     3) n CLTV DROP CODESEP OP_1 CHECKSIGVERIFY
     4) k CSV DROP CODESEP OP_1 CHECKSIGVERIFY

(where OP_1 means using the taproot internal pubkey with support for
ANYPREVOUT*) -- that way a signature for either path (1) or (2) is only
valid for that path, but a signature for (3) can be reused for (4)
(or vice-versa), but not (1) or (2); and all those signatures could
be reused for other corresponding scripts, for instance with different
values for x,n,k if desired.

> There is no way that you should be signing CODESEPARATOR position without also
> covering the Script with the signature.

So I think it's more sensible than it seems; and still plausible enough
to leave in. If you don't want to separate your ANYPREVOUT scripts,
you can just not put a CODESEP in -- or at least only put CODESEP after
all your ANYPREVOUT CHECKSIGs; so it doesn't seem like it's creating
any added complexity.

Cheers,
aj

[0] For what it's worth, there's another reason not to allow replacing
    keys in a threshold sig with different policies: if you've got say
    30 people with a majority threshold of 16, then you could two groups
    of 9 people form parties and each agree to all vote along party lines;
    but if you let them replace their keys with multisig policies along
    those lines, you're now enforcing a 10-of-30 policy instead (as long
    as the 10 are 5 from the first party and 5 from the second party)
    and allowing minority control instead of majority rule.

[4] I wonder if it would be worth exploring whether we could do
    something more like the original (presumed) intent of CODESEP, given
    use of NOINPUT/ANYPREVOUT so as not to commit to the full script,
    you could potentially have a SIGHASH that committed to a hash of
    the script that's been executed so far, and also the witness data
    that's been consumed so far, but it would ensure the first part of
    the script behaved exactly as you expected, and allow the rest of
    the script to be arbitrarily weird, and (I think) be efficiently
    implementable. That doesn't give you delegation without the ability
    to also have executable witness data of some sort, but maybe something
    like it is interesting anyway?