MIME-Version: 1.0
References: <CAD5xwhjSj82YYuQHHbwgSLvUNV2RDY0b=yMYeLj-p6j7PpS9-Q@mail.gmail.com>
	<20190605093039.xfo7lcylqkhsfncv@erisian.com.au>
	<im0q8670MxshmvMLmoJU0dv4rFhwWZNvQeQYv7i4fBWJOx0ghAdH8fYuQSqNxO2z8uxXGV-kurinUDfl0FsLWD0knw_U_h3zVZ0xy7vmn8o=@protonmail.com>
	<CAMZUoK=ZB06jwAbuX2D=aN8ztAqr_jSgEXS1z1ABjQYVawKCBQ@mail.gmail.com>
	<CAD5xwhj8o8Vbrk2KADBOFGfkD3fW3eMZo5aHJytGAj_5LLhYCg@mail.gmail.com>
	<CAMZUoKkPUn01V7WruMqoYtwJ__ai-QPvD81ceoYC7j4+hC99gg@mail.gmail.com>
	<CAD5xwhi6QU5OZwSGMp4P3q7OYZMMZRUZgd2YOiUnv5tqgJxPSA@mail.gmail.com>
	<CAMZUoKkorcO+CD6jcV5tyCtrKuHq_2hJhKE08FTrqJz7GgPM8Q@mail.gmail.com>
	<CAD5xwhjaC61jOLvPrMcsvL9ji5zUAP-=ai3NhBojeQcC4v8DpA@mail.gmail.com>
In-Reply-To: <CAD5xwhjaC61jOLvPrMcsvL9ji5zUAP-=ai3NhBojeQcC4v8DpA@mail.gmail.com>
From: "Russell O'Connor" <roconnor@blockstream.io>
Date: Tue, 25 Jun 2019 13:05:39 -0400
Message-ID: <CAMZUoKk-WHk8xz0vSs+xoPKPeMWV16bfP+mCt80jCZk8BQqx=w@mail.gmail.com>
To: Jeremy <jlrubin@mit.edu>
Content-Type: multipart/alternative; boundary="000000000000deb3fc058c28f09d"
Cc: Bitcoin development mailing list <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] OP_SECURETHEBAG (supersedes OP_CHECKOUTPUTSVERIFY)
Precedence: list

--000000000000deb3fc058c28f09d
Content-Type: text/plain; charset="UTF-8"

Bitcoin Core is somewhat outside my core competence, but the various
OP_PUSHDATA opcodes are already multi-byte opcodes, and GetOp already has a
data return parameter that is suitable for returning the payload of an
immediate 32-byte data variant of OP_SECURETHEBAG.  All that I expect is
needed is to ensure that nowhere else is using a non-empty data field as a
proxy for a non-empty push operation, and to fix any such occurrences if
they exist.  (AFAICT there are only a handful of calls to GetOp.)
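
As a rough illustration of the point above (a sketch only, not Bitcoin Core
code): the names GetOpSketch, ParsedOpSketch and OP_IMMEDIATE32_SKETCH are
hypothetical, the value 0x89 is chosen purely for illustration, and
OP_PUSHDATA2/OP_PUSHDATA4 are left out for brevity.  The parser returns the
32-byte immediate through the same data field used for pushes, and carries an
explicit is_push flag so that callers need not treat "data is non-empty" as
meaning "this was a push".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants for this sketch; the value 0x89 is illustrative only.
constexpr uint8_t OP_PUSHDATA1_SKETCH = 0x4c;
constexpr uint8_t OP_IMMEDIATE32_SKETCH = 0x89;

struct ParsedOpSketch {
    uint8_t opcode = 0;
    std::vector<uint8_t> data;  // push payload OR immediate operand
    bool is_push = false;       // true only for genuine push operations
};

// Reads one operation starting at pc and advances pc past it.
// Returns false at end of script or on truncated input.
// OP_PUSHDATA2 and OP_PUSHDATA4 are omitted to keep the sketch short.
bool GetOpSketch(const std::vector<uint8_t>& script, size_t& pc, ParsedOpSketch& out)
{
    assert(pc <= script.size());
    if (pc == script.size()) return false;

    out = ParsedOpSketch{};
    out.opcode = script[pc++];

    size_t len = 0;
    if (out.opcode <= 0x4b) {
        // Direct push of 0..75 bytes: the opcode itself is the length.
        len = out.opcode;
        out.is_push = true;
    } else if (out.opcode == OP_PUSHDATA1_SKETCH) {
        // One length byte follows the opcode.
        if (pc == script.size()) return false;
        len = script[pc++];
        out.is_push = true;
    } else if (out.opcode == OP_IMMEDIATE32_SKETCH) {
        // Multi-byte opcode: 32 bytes of immediate data, not a push.
        len = 32;
    }

    if (script.size() - pc < len) return false;
    out.data.assign(script.begin() + pc, script.begin() + pc + len);
    pc += len;
    return true;
}
```

A caller that currently checks whether the returned data is non-empty to
detect a push would check is_push instead; that is the kind of occurrence
that would need auditing.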

It is probably worth updating the tapscript implementation to better
prepare it for new uses of OP_SUCCESSx.  Parsing should halt when an
OP_SUCCESSx is encountered, by having GetScriptOp advance the pc to the end
of the script after encountering such a code (decoding Script is no longer
meaningful after an OP_SUCCESSx is encountered).  However, that means that
GetScriptOp needs to know what version of script it is expected to be
parsing.  This could be done by sending down some versioning flags, possibly
by adding a versioning field to CScript that can be initialized at
https://github.com/sipa/bitcoin/blob/7ddc7027b2cbdd11416809400c588e585a8b44ed/src/script/interpreter.cpp#L1679
or by some other mechanism (and at the same time perhaps having
GetSigOpCount return 0 for tapscript, since counting sigops is not really
meaningful in tapscript).  There are probably other reasonable approaches
too (e.g. your option 2 below).  I could write some code to illustrate what
I'm thinking if you feel that would be helpful.  I do think such changes
around OP_SUCCESSx should be implemented regardless of whether we move
forward with OP_SECURETHEBAG or not.
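
A minimal sketch of the halting behaviour described above, under stated
assumptions: ScriptVersionSketch, IsOpSuccessSketch and NextOpcodeSketch are
hypothetical names, the version flag is passed as a plain argument rather
than stored in CScript, and the OP_SUCCESSx predicate lists only
OP_SUCCESS137 (the full set is whatever bip-tapscript defines).  OP_PUSHDATA
handling is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical versioning flag handed down to the parser.
enum class ScriptVersionSketch { LEGACY, TAPSCRIPT };

// Illustrative only: the real OP_SUCCESSx set is defined by bip-tapscript.
// Only OP_SUCCESS137 (0x89, formerly OP_RESERVED1) is listed here.
inline bool IsOpSuccessSketch(uint8_t opcode) { return opcode == 0x89; }

// Reads the next opcode and advances pc past it and any direct-push payload.
// In TAPSCRIPT mode, an OP_SUCCESSx moves pc to the end of the script, so the
// caller never attempts to decode the bytes that follow it.
// Returns false at end of script or on a truncated push.
bool NextOpcodeSketch(const std::vector<uint8_t>& script, size_t& pc,
                      ScriptVersionSketch version, uint8_t& opcode_out)
{
    if (pc >= script.size()) return false;
    opcode_out = script[pc++];

    if (version == ScriptVersionSketch::TAPSCRIPT && IsOpSuccessSketch(opcode_out)) {
        pc = script.size();  // halt parsing: the remainder is not meaningful
        return true;
    }

    if (opcode_out <= 0x4b) {
        // Direct push: skip the payload (OP_PUSHDATA1/2/4 omitted for brevity).
        if (script.size() - pc < opcode_out) return false;
        pc += opcode_out;
    }
    assert(pc <= script.size());
    return true;
}
```

With the bytes 0x51 0x89 0x4b, the tapscript mode stops cleanly after 0x89,
while the legacy mode goes on to read 0x4b and fails on the truncated push.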

It is probably worth doing this properly the first time around if we are
going to do it at all.

P.S. OP_RESERVED1 has been renamed to OP_SUCCESS137 in bip-tapscript.


On Mon, Jun 24, 2019 at 6:47 PM Jeremy <jlrubin@mit.edu> wrote:

> I agree in principle, but I think that's just a bit of 'how things are'
> versus how they should be.
>
> I disagree that we get composability semantics because of OP_IF. E.g., the
> scripts "OP_IF ...." and "OP_ENDIF" are each invalid when parsed
> separately, but are valid when concatenated. OP_IF already imposes some
> lookahead functionality... but as I understand it, it may be feasible to
> get rid of OP_IF for tapscripts anyway. Also in this bucket are P2SH and
> segwit, which I think break this because the concatenation of two P2SH
> scripts or segwit scripts does not behave the same as the two scripts
> taken separately.
>
> I also think that the OP_SECURETHEBAG use of pushdata is a
> backwards-compatible hack: we can always later redefine the parser to parse
> OP_SECURETHEBAG as a 34-byte opcode, recapturing the purity of the
> semantics. We can also fix it to not use an extra byte in a future tapleaf
> version.
>

> In any case, I don't disagree with figuring out what patching the parser
> to handle multibyte opcodes would look like. If that sort of upgrade-path
> were readily available when I wrote this, it's how I would have done it.
> There are two approaches I looked at mostly:
>
> 1) Adding flags to GetOp to change how it parses
>   a) Most of the same code paths used for new and old script
>   b) Higher risk of breaking something in old script style/downstream
>   c) Cleans up only one issue (multibyte opcodes), leaves other warts in
> place
>   d) Less bikesheddable design (mostly same as old script)
>   e) Code not increased in size
> 2) Adding a completely new interpreter for Tapscript
>   a) Fork the existing interpreter code
>   b) For all places where scripts are run, switch based on if it is
> tapscript or not
>   c) Can clean up various semantics, can even do fancier things like
> huffman encode opcodes to less than a byte
>   d) Can clearly separate parsing the script from executing it
>   e) Can improve versioning techniques
>   f) Low risk of breaking something in old script style/downstream
>   g) Increases amount of code substantially
>   h) Bikesheddable design (everything is on the table).
>   i) probably a better general mechanism for future changes to script
> parsing, less consensus risk
>   j) More compatible with templated script as well.
>
> If not clear, I think that 2 is probably a better approach, but I'm
> worried that 2.h means this would take a much longer time to implement.
>
> 2 can be segmented into two components:
>
> 1) the architecture of script parser versioning
> 2) the actual new script version
>
> I think that component 1 can be relatively non controversial, thankfully,
> using tapleaf versions (the architecture question is more around code
> structure). A proof of concept of this would be to have a fork that uses
> two independent, but identical, script parsers.
>
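
One way the proof of concept above could be structured, as a sketch under
assumptions: ParserRegistrySketch, ParseResultSketch and ParseNonEmptySketch
are hypothetical names, the parser is a stand-in that only checks for a
non-empty script, and the leaf version values used with it are illustrative.
The same parser is registered under two leaf versions, so one of them can
later be replaced without touching the other.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using ScriptBytes = std::vector<uint8_t>;
using ParserSketch = std::function<bool(const ScriptBytes&)>;

enum class ParseResultSketch { OK, INVALID, UNKNOWN_LEAF_VERSION };

// Hypothetical registry that selects a script parser by tapleaf version.
class ParserRegistrySketch {
public:
    void Register(uint8_t leaf_version, ParserSketch parser)
    {
        assert(parser && "parser must be callable");
        parsers_[leaf_version] = std::move(parser);
    }

    // Dispatches to the parser registered for leaf_version, if any.
    // What an unknown version should mean is a policy decision left to
    // the caller; this sketch only reports it.
    ParseResultSketch Parse(uint8_t leaf_version, const ScriptBytes& script) const
    {
        auto it = parsers_.find(leaf_version);
        if (it == parsers_.end()) return ParseResultSketch::UNKNOWN_LEAF_VERSION;
        return it->second(script) ? ParseResultSketch::OK : ParseResultSketch::INVALID;
    }

private:
    std::map<uint8_t, ParserSketch> parsers_;
};

// Stand-in parser for the sketch: accepts any non-empty script.
inline bool ParseNonEmptySketch(const ScriptBytes& script) { return !script.empty(); }
```

Registering ParseNonEmptySketch under, say, 0xc0 and 0xc2 gives the "two
independent, but identical" parsers; component 2 would then swap in a
modified parser for one of the two versions.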
> Part two of this plan would be to modify one of the versions
> substantially. I'm not sure what exists on the laundry list, but I think it
> would be possible to pick a few worthwhile cleanups. E.g.:
>
> 1) Multibyte opcodes
> 2) Templated scripts
> 3) Huffman Encoding opcodes
> 4) OP_IF handling (maybe just get rid of it in favor of conditional Verify
> semantics)
>
> We would then make it clear that, because we can add future script
> versions fairly easily, this is a sufficient step.
>
>
> Does that seem in line with your understanding of how this might be done?
>

--000000000000deb3fc058c28f09d--