diff options
author | Luke Kenneth Casson Leighton <lkcl@lkcl.net> | 2021-02-03 13:24:13 +0000 |
---|---|---|
committer | bitcoindev <bitcoindev@gnusha.org> | 2021-02-03 13:24:40 +0000 |
commit | 11e02f727f534dbd800d91fd68f22c4a8b6a5db3 (patch) | |
tree | 42d99d3a3696bc4c4f0171ddac3eabff01295e30 | |
parent | 7074ee4a79e89e6df375aadc89d89d08cc001633 (diff) | |
download | pi-bitcoindev-11e02f727f534dbd800d91fd68f22c4a8b6a5db3.tar.gz pi-bitcoindev-11e02f727f534dbd800d91fd68f22c4a8b6a5db3.zip |
Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
-rw-r--r-- | 98/2cbc869992a253427d828be46cc31ee073b10e | 364 |
1 files changed, 364 insertions, 0 deletions
diff --git a/98/2cbc869992a253427d828be46cc31ee073b10e b/98/2cbc869992a253427d828be46cc31ee073b10e new file mode 100644 index 000000000..004344e72 --- /dev/null +++ b/98/2cbc869992a253427d828be46cc31ee073b10e @@ -0,0 +1,364 @@ +Return-Path: <lkcl@lkcl.net> +Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) + by lists.linuxfoundation.org (Postfix) with ESMTP id 5CE92C013A + for <bitcoin-dev@lists.linuxfoundation.org>; + Wed, 3 Feb 2021 13:24:40 +0000 (UTC) +Received: from localhost (localhost [127.0.0.1]) + by hemlock.osuosl.org (Postfix) with ESMTP id 56C3B8710A + for <bitcoin-dev@lists.linuxfoundation.org>; + Wed, 3 Feb 2021 13:24:40 +0000 (UTC) +X-Virus-Scanned: amavisd-new at osuosl.org +Received: from hemlock.osuosl.org ([127.0.0.1]) + by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) + with ESMTP id EmdT4pluxN5h + for <bitcoin-dev@lists.linuxfoundation.org>; + Wed, 3 Feb 2021 13:24:38 +0000 (UTC) +X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 +Received: from lkcl.net (lkcl.net [217.147.94.29]) + by hemlock.osuosl.org (Postfix) with ESMTPS id 75C8D86FAD + for <bitcoin-dev@lists.linuxfoundation.org>; + Wed, 3 Feb 2021 13:24:38 +0000 (UTC) +DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lkcl.net; + s=201607131; + h=Content-Type:Cc:To:Subject:Message-ID:Date:From:References:In-Reply-To:MIME-Version; + bh=L3Docrkp8d32XEDGlVFtrKbkuYL/aC2gDKYaa1pep20=; + b=FBKppL1oitsFTHTq7I4FuSuD1iDkcOK/IwTSsUq8Fx+OHjbczmGhZcCTi0bFxqIFU0+7Ssc6cWTk0VMkUBWE1mpy1d1ETsoTLBV/WlGRsLZBO5wysHvs61HAkuM5mkgKHtFb71Q19Q2sxLenfHK1quJse+JbYXcKGWj9AtKncm4=; +Received: from mail-lf1-f49.google.com ([209.85.167.49]) + by lkcl.net with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) + (Exim 4.84_2) (envelope-from <lkcl@lkcl.net>) id 1l7I95-0003Yt-G2 + for bitcoin-dev@lists.linuxfoundation.org; Wed, 03 Feb 2021 13:24:35 +0000 +Received: by mail-lf1-f49.google.com with SMTP id e15so13113352lft.13 + for <bitcoin-dev@lists.linuxfoundation.org>; + Wed, 03 Feb 2021 05:24:20 -0800 (PST) +X-Gm-Message-State: AOAM530EJvrii88N5X+QMNIrq7VRp3g5X2atxxXegflftB2bYrPcG2QH + NWKBp8nK09zQuVKNIJ1lYd5BlnzCLDSPcm9XjTk= +X-Google-Smtp-Source: ABdhPJzBT7lnAR7ONWQgH8zXdhnCMEkGHOeU8qWqBQulyPMw/qpNHKw6mdC0vsEmJuBvTHUTeuJZX1dEGTnmlggAP0M= +X-Received: by 2002:a19:4191:: with SMTP id o139mr1773310lfa.224.1612358654527; + Wed, 03 Feb 2021 05:24:14 -0800 (PST) +MIME-Version: 1.0 +Received: by 2002:a05:6520:2f95:b029:bc:bc2b:60bb with HTTP; Wed, 3 Feb 2021 + 05:24:13 -0800 (PST) +In-Reply-To: <oCNGbVElAQCJ1bEmwLXLzIVec0ZoOA2Ar3vkOc1a0GW12h78bhMi_W4n3pCdDt7hJyPFoMRb0U1T5Wx5uQl4oo6zeQtjKs0MdAXGtvLw1SQ=@protonmail.com> +References: <CAPweEDx4wH_PG8=wqLgM_+RfTQEUSGfax=SOkgTZhe1FagXF9g@mail.gmail.com> + <oCNGbVElAQCJ1bEmwLXLzIVec0ZoOA2Ar3vkOc1a0GW12h78bhMi_W4n3pCdDt7hJyPFoMRb0U1T5Wx5uQl4oo6zeQtjKs0MdAXGtvLw1SQ=@protonmail.com> +From: Luke Kenneth Casson Leighton <lkcl@lkcl.net> +Date: Wed, 3 Feb 2021 13:24:13 +0000 +X-Gmail-Original-Message-ID: <CAPweEDy7Xf3nD1mfyX5MmtsGX=1sd5=gsLosZ=bYavJ0BZyy3g@mail.gmail.com> +Message-ID: <CAPweEDy7Xf3nD1mfyX5MmtsGX=1sd5=gsLosZ=bYavJ0BZyy3g@mail.gmail.com> +To: ZmnSCPxj <ZmnSCPxj@protonmail.com> +Content-Type: multipart/alternative; boundary="000000000000da8d8405ba6e8095" +X-Mailman-Approved-At: Wed, 03 Feb 2021 14:14:10 +0000 +Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org> +Subject: Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs +X-BeenThere: bitcoin-dev@lists.linuxfoundation.org +X-Mailman-Version: 2.1.15 +Precedence: list +List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org> +List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>, + <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe> +List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/> +List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org> +List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help> +List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>, + <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe> +X-List-Received-Date: Wed, 03 Feb 2021 13:24:40 -0000 + +--000000000000da8d8405ba6e8095 +Content-Type: text/plain; charset="UTF-8" + +(hi folks do cc me, i am subscribed digest, thank you for doing that, +ZmnSCPxj) + +On Wednesday, February 3, 2021, ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote: +> Good morning Luke, +> +> I happen to have experience designing digital ASICs, mostly pipelined +data processing. +> However my experience is limited to larger geometries and in +SystemVerilog. + +larger geometries for a hardware wallet ASIC is ok (as long as it is not +retail based and trying to run e.g. RSA, taking so long to complete that +the retail customer walks out) + +> On the technical side, as I understand it (I have been out of that +industry for 4 years now, so my knowledge may be obsolete) + +not at all! still very valuable + +> as you approach lower geometries, you also start approaching analog +design. + +yyeah i could intuitively tell/guess there might be something like this +which would throw a spanner in the works, it is why the grant request i put +in specifically excluded data-dependent constant time analysis and also +power analysis. + + +> In our case we were already manually laying out gates and flip-flops (or +replacing flip-flops with level-triggered latches and being extra careful +with clocks) to squeeze performance (and area) ... + +ya-howw :) + + +> Many of the formal correctness proofs were really about the formal +equivalence of the netlist to the RTL; the correctness of the RTL was +"proved" by simulation testing. + +thanks to Symbiyosys we are using formal proofs much more extensively, as +effectively a 100% coverage replacement for unit tests. + +an example is popcount. we did two versions. one is a recursive tree +algorithm, almost impossible to read and understand what the hell it does. + +the other is a total braindead 1-liner "x = x + input[i]", rubbish +performance though. + +running a formal proof on these gave us 100% confidence that the complex +optimised version does the damn job. + + +yes we still do unit tests, these are more "demo code". + +now, the caveat is that you have to have a model of the "dut" (device under +test) against which to compare, and if the dut is ridiculously complex then +the formal model variant, which has to do the same job, ends up equally as +complex (or effectively a duplicate of the dut) and the exercise is a bit +of a waste of time... + +...*unless*... there happens to be other implementations out there. then +the proof can be run against those and everybody wins through collaboration. + + + +now, here's why i put in the NLnet Grant request to explore going back to +the mathematics of crypto-primitives. + +many ISAs e.g. intel AVX2 have added GFMULT8 etc etc because that does +S-Boxes for Rijndael. they have gone mad by analysing algorithms trying to +fit them to standard ISAs. + +nobody does Rijndael S-Boxes any way other than 256-entry lookup tables +because no standard ISA has general-purpose Galois Field Multiply. + +consequently implementations in assembler get completely divorced from the +original mathematics on which the cryptographic algorithm was based. + +the approach i would like to take is, "hang on a minute: how far would you +get if you actually added *general-purpose* instructions that *directly* +provided the underlying mathematical principles, and then wrapped a +Vector-Matrix Engine around them?". + +would this drastically simplify algorithms to the point where *READABLE* c +code compiles directly to opcodes that run screamingly fast, outperforming +hand-optimised SIMD code using standard ISAs? + +then, given the Formal Correctness approach above, can we verify that the +mathematically-related opcodes do the job? + + +> (to be fair, there were tools to force you to improve coverage by +injecting faults to your RTL, e.g. it would virtually flip an `&&` to an +`||` and if none of your tests signaled an error it would complain that +your test coverage sucked.) + +nice! + +> Things might have changed. + +nah. this is such a complex area, run by few incumbent players, that +innovation is rare. not least, innovation is different and cannot be +trusted by the Foundries! + + +> A good RTL would embed SystemVerilog Assertions or PSL Assertions as well. +> Some formal verification tools can understand a subset of SystemVerilog +Assertions / PSL assertions and validate that your RTL conformed to the +assertions, which would probably help cut down on the need for RTL +simulation. + +interesting. + +> Overall, my understanding is that smaller geometries are needed only if +you want to target a really high performance / unit cost and performance / +energy consumption ratios. +> That is, you would target smaller geometries for mining. + +yes. + +> If you need a secure tr\*sted computing module that does not need to be +fast or cheap, just very accurate to the required specification, the larger +geometries should be fine and you would be able to live almost entirely in +RTL-land without diving into netlist and layout specifications. + +hardware wallet ASICs. + +i concur. + +> A wrinkle here is that licenses for tools from tr\*sted vendors like +Synopsys or Cadence are ***expensive***. + +yes they are :) we are currently working with Sorbonne University LIP6.fr +and Staf Verhaegen from Chips4Makers, trying a different approach: +coriolis2. + +this will do fine up to 130nm (skywater). beyond that, mmm, we need a few +more years. + +> What is more, you should really buy two sets of licenses, e.g. do logic +synthesis with Synopsys and then formal verification with Cadence, because +you do not want to fully tr\*st just one vendor. + +interesting, good advice. + +> Synthesis in particular is a black box and each vendor keeps their +particular implementations and tricks secret. + +sigh. i think that's partly because they have to insert diodes, and +buffers, and generally mess with the netlist. + +i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers! + +plus, they make an awful lot of money, it is good business. + +> Pointing some funding at the open-source Icarus Verilog might also fit, +as it lost its ability to do synthesis more than a decade ago due to +inability to maintain. + +ah i didn't know it could do synthesis at all! i thought it was simulation +only. + +> Note as well that I heard (at the time when I was in the industry) that +some foundries will not even accept a netlist unless it was created by a +synthesis tool from one of the major vendors (Synopsys, Cadence, Mentor +Graphics, maybe more I have forgotten since). + +yes i heard this too, they don't want their time wasted: after all they +only make money by selling wafers, and if they can't sell any they have to +run empty wafers to keep the equipment at operating temperature. + +if you book a slot 18 months in advance and the RTL doesn't work during +testing 3 months before the deadline they may not be able to find someone +else in time. + +anything to reduce the risk there is good, so i totally get why. + +thank you for the insights and the discussion, really appreciated. + +l. + + +-- +--- +crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 + +--000000000000da8d8405ba6e8095 +Content-Type: text/html; charset="UTF-8" +Content-Transfer-Encoding: quoted-printable + +(hi folks do cc me, i am subscribed digest, thank you for doing that, ZmnSC= +Pxj)<br><br>On Wednesday, February 3, 2021, ZmnSCPxj <<a href=3D"mailto:= +ZmnSCPxj@protonmail.com">ZmnSCPxj@protonmail.com</a>> wrote:<br>> Goo= +d morning Luke,<br>><br>> I happen to have experience designing digit= +al ASICs, mostly pipelined data processing.<br>> However my experience i= +s limited to larger geometries and in SystemVerilog.<br><br>larger geometri= +es for a hardware wallet ASIC is ok (as long as it is not retail based and = +trying to run e.g. RSA, taking so long to complete that the retail customer= + walks out)<br><br>> On the technical side, as I understand it (I have b= +een out of that industry for 4 years now, so my knowledge may be obsolete)<= +br><br>not at all! still very valuable<br><br>> as you approach lower ge= +ometries, you also start approaching analog design.<br><br>yyeah i could in= +tuitively tell/guess there might be something like this which would throw a= + spanner in the works, it is why the grant request i put in specifically ex= +cluded data-dependent constant time analysis and also power analysis.<br><b= +r><br>> In our case we were already manually laying out gates and flip-f= +lops (or replacing flip-flops with level-triggered latches and being extra = +careful with clocks) to squeeze performance (and area) ...<br><br>ya-howw := +)<br><br><br>> Many of the formal correctness proofs were really about t= +he formal equivalence of the netlist to the RTL; the correctness of the RTL= + was "proved" by simulation testing.<br><br>thanks to Symbiyosys = +we are using formal proofs much more extensively, as effectively a 100% cov= +erage replacement for unit tests.<br><br>an example is popcount. =C2=A0we d= +id two versions. =C2=A0one is a recursive tree algorithm, almost impossible= + to read and understand what the hell it does.<br><br>the other is a total = +braindead 1-liner "x =3D x + input[i]", rubbish performance thoug= +h.<br><br>running a formal proof on these gave us 100% confidence that the = +complex optimised version does the damn job.<br><br><br>yes we still do uni= +t tests, these are more "demo code".<br><br>now, the caveat is th= +at you have to have a model of the "dut" (device under test) agai= +nst which to compare, and if the dut is ridiculously complex then the forma= +l model variant, which has to do the same job, ends up equally as complex (= +or effectively a duplicate of the dut) and the exercise is a bit of a waste= + of time...<br><br>...*unless*... there happens to be other implementations= + out there. =C2=A0then the proof can be run against those and everybody win= +s through collaboration.<br><br><br><br>now, here's why i put in the NL= +net Grant request to explore going back to the mathematics of crypto-primit= +ives.<br><br>many ISAs e.g. intel AVX2 have added GFMULT8 etc etc because t= +hat does S-Boxes for Rijndael. =C2=A0they have gone mad by analysing algori= +thms trying to fit them to standard ISAs.<br><br>nobody does Rijndael S-Box= +es any way other than 256-entry lookup tables because no standard ISA has g= +eneral-purpose Galois Field Multiply.<br><br>consequently implementations i= +n assembler get completely divorced from the original mathematics on which = +the cryptographic algorithm was based.<br><br>the approach i would like to = +take is, "hang on a minute: how far would you get if you actually adde= +d *general-purpose* instructions that *directly* provided the underlying ma= +thematical principles, and then wrapped a Vector-Matrix Engine around them?= +".<br><br>would this drastically simplify algorithms to the point wher= +e *READABLE* c code compiles directly to opcodes that run screamingly fast,= + outperforming hand-optimised SIMD code using standard ISAs? <br><br>then, = +given the Formal Correctness approach above, can we verify that the mathema= +tically-related opcodes do the job?<br><br><br>> (to be fair, there were= + tools to force you to improve coverage by injecting faults to your RTL, e.= +g. it would virtually flip an `&&` to an `||` and if none of your t= +ests signaled an error it would complain that your test coverage sucked.)<b= +r><br>nice!<br><br>> Things might have changed.<br><br>nah. =C2=A0this i= +s such a complex area, run by few incumbent players, that innovation is rar= +e. =C2=A0not least, innovation is different and cannot be trusted by the Fo= +undries!<br><br><br>> A good RTL would embed SystemVerilog Assertions or= + PSL Assertions as well.<br>> Some formal verification tools can underst= +and a subset of SystemVerilog Assertions / PSL assertions and validate that= + your RTL conformed to the assertions, which would probably help cut down o= +n the need for RTL simulation.<br><br>interesting.<br><br>> Overall, my = +understanding is that smaller geometries are needed only if you want to tar= +get a really high performance / unit cost and performance / energy consumpt= +ion ratios.<br>> That is, you would target smaller geometries for mining= +.<br><br>yes.<br><br>> If you need a secure tr\*sted computing module th= +at does not need to be fast or cheap, just very accurate to the required sp= +ecification, the larger geometries should be fine and you would be able to = +live almost entirely in RTL-land without diving into netlist and layout spe= +cifications.<br><br>hardware wallet ASICs.<br><br>i concur.<br><br>> A w= +rinkle here is that licenses for tools from tr\*sted vendors like Synopsys = +or Cadence are ***expensive***.<br><br>yes they are :) =C2=A0we are current= +ly working with Sorbonne University LIP6.fr and Staf Verhaegen from Chips4M= +akers, trying a different approach: coriolis2.<br><br>this will do fine up = +to 130nm (skywater). =C2=A0beyond that, mmm, we need a few more years.<br><= +br>> What is more, you should really buy two sets of licenses, e.g. do l= +ogic synthesis with Synopsys and then formal verification with Cadence, bec= +ause you do not want to fully tr\*st just one vendor.<br><br>interesting, g= +ood advice.<br><br>> Synthesis in particular is a black box and each ven= +dor keeps their particular implementations and tricks secret.<br><br>sigh. = +=C2=A0i think that's partly because they have to insert diodes, and buf= +fers, and generally mess with the netlist.<br><br>i was stunned to learn th= +at in a 28nm ASIC, 50% of it is repeater-buffers!<br><br>plus, they make an= + awful lot of money, it is good business.<br><br>> Pointing some funding= + at the open-source Icarus Verilog might also fit, as it lost its ability t= +o do synthesis more than a decade ago due to inability to maintain.<br><br>= +ah i didn't know it could do synthesis at all! i thought it was simulat= +ion only.<br><br>> Note as well that I heard (at the time when I was in = +the industry) that some foundries will not even accept a netlist unless it = +was created by a synthesis tool from one of the major vendors (Synopsys, Ca= +dence, Mentor Graphics, maybe more I have forgotten since).<br><br>yes i he= +ard this too, they don't want their time wasted: after all they only ma= +ke money by selling wafers, and if they can't sell any they have to run= + empty wafers to keep the equipment at operating temperature.<br><br>if you= + book a slot 18 months in advance and the RTL doesn't work during testi= +ng 3 months before the deadline they may not be able to find someone else i= +n time.<br><br>anything to reduce the risk there is good, so i totally get = +why.<br><br>thank you for the insights and the discussion, really appreciat= +ed.<br><br>l.<br><br><br>-- <br>---<br>crowd-funded eco-conscious hardware:= + <a href=3D"https://www.crowdsupply.com/eoma68" target=3D"_blank">https://w= +ww.crowdsupply.com/eoma68</a><br><br> + +--000000000000da8d8405ba6e8095-- + |