Date: Thu, 11 Feb 2021 08:20:54 +0000
To: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
From: ZmnSCPxj <ZmnSCPxj@protonmail.com>
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs


Good morning Luke,

> > (to be fair, there were tools to force you to improve coverage by injecting faults to your RTL, e.g. it would virtually flip an `&&` to an `||` and if none of your tests signaled an error it would complain that your test coverage sucked.)
>
> nice!

It should be possible to develop a tool that parses a Verilog RTL design, then generates a new version of it with one change.
Then you could add some automation to run a set of testcases around mutated variants of the design.
For example, it could create a "wrapper" module that connects to an unmutated differently-named version of the design, and various mutated versions, wire all their inputs together, then compare outputs.
If the testcase could trigger an output of a mutated version to be different from the reference version, then we would consider that mutation covered by that testcase.
Possibly that could be done with Verilog-2001 file writing code in the wrapper module to dump out which mutations were covered, then a summary program could just read in the generated file.
Or Verilog plugins could be used as well (Icarus supports this, that is how it implements all `$` functions).
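
A minimal sketch of that mutate-and-compare loop, in Python rather than Verilog so that it runs without a simulator.
It makes loud assumptions: the "design" is a single boolean expression, mutation is plain text substitution of `&&` and `||`, and evaluation works by translating the expression to Python; a real tool would parse the RTL properly and run the mutants in a simulator.
All function names here are hypothetical.

```python
import itertools
import re

OPERATOR = re.compile(r"&&|\|\|")
SWAPS = {"&&": "||", "||": "&&"}


def generate_mutants(expr):
    """Yield copies of expr, each with exactly one && or || flipped."""
    for match in OPERATOR.finditer(expr):
        yield expr[:match.start()] + SWAPS[match.group()] + expr[match.end():]


def evaluate(expr, inputs):
    """Evaluate a toy Verilog boolean expression by translating it to Python."""
    py = expr.replace("&&", " and ").replace("||", " or ").replace("!", " not ")
    return bool(eval(py, {"__builtins__": {}}, dict(inputs)))


def uncovered_mutants(expr, test_vectors):
    """Return mutants whose output matches the reference on every test vector."""
    survivors = []
    for mutant in generate_mutants(expr):
        if all(evaluate(mutant, v) == evaluate(expr, v) for v in test_vectors):
            survivors.append(mutant)
    return survivors


REFERENCE = "(a && b) || c"
weak_tests = [{"a": 1, "b": 1, "c": 0}]
full_tests = [dict(zip("abc", bits))
              for bits in itertools.product([0, 1], repeat=3)]

print(uncovered_mutants(REFERENCE, weak_tests))  # one mutant survives
print(uncovered_mutants(REFERENCE, full_tests))  # no mutant survives
```

With the single weak test vector, flipping the inner `&&` goes unnoticed, which is exactly the kind of coverage hole such a tool would report.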

A drawback is that just because an output is different does not mean the testcase actually ***checks*** that output.
If the testcase does not detect the diverging output, then it is still not properly covering that mutation.

The point of this is to check coverage of the tests.
Not sure how well this works with formal validation.


> > Synthesis in particular is a black box and each vendor keeps their particular implementations and tricks secret.
>
> sigh. i think that's partly because they have to insert diodes, and buffers, and generally mess with the netlist.
>
> i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!

Well, that surprises me as well.

On the other hand, smaller technologies consistently have lower raw output current driving capability due to the smaller size, and as trace width goes down and frequency goes up they stop acting like ideal 0-impedance traces and start acting more like transmission lines.
So I suppose at some point something like that would occur and I should not actually be surprised.
(Maybe I am more surprised that it reached that level at that technology size, I would have thought 33% at 7nm.)

In the modules where we were doing manual netlist+layout, we used inverting buffers instead (slightly smaller than non-inverting buffers, in most technologies a non-inverting buffer is just an inverter followed by an inverting buffer).
This was an advantage of manual design, since it looks like synthesis tools are not willing to invert the contents of intermediate flip-flops even if it could give a theoretical speed+size advantage to use an inverting buffer rather than a non-inverting one (it looks like synthesis optimization starts at the output of flip-flops and ends at their input, so a manual designer could achieve slightly better performance if they were willing to invert an intermediate flip-flop).
Another was that inverting latches were smaller in the technology we were using than non-inverting latches, so it was perfectly natural for us to use an inverting latch and an inverting buffer on those parts where we needed higher fan-out (it was equivalent to a "custom" latch that had higher-than-normal driving capability).

Scan chain test generation was impossible though, as those require flip-flops, not latches.
Fortunately this was "just" deserialization of high-frequency low-width data with no transformation of the data (that was done after the deserialization, at lower clock speeds but higher data width, in pure RTL so flip-flops), so it was judged acceptable that it would not be covered by scan chain, since scan chain is primarily for testing combinational logic between flip-flops.
So we just had flip-flops at the input, and flip-flops at the output, and forced all latches to pass-through mode, during scan mode.
We just needed to have enough coverage to uncover stuck-at faults (which was still a pain, since additional test vectors slow down manufacturing so we had to reduce the test vectors to the minimum possible) in non-scan-mode testing.

Man, making ASICs was tough.


>
> plus, they make an awful lot of money, it is good business.
>
> > Pointing some funding at the open-source Icarus Verilog might also fit, as it lost its ability to do synthesis more than a decade ago due to inability to maintain.
>
> ah i didn't know it could do synthesis at all! i thought it was simulation only.

Icarus was the only open-source synthesis tool I could find back then, and it dropped synthesis capability fairly early due to maintenance burden (I never managed to get the old version with synthesis compiled and never managed actual synthesis on it, so my knowledge of it is theoretical).


There is an argument that open-source software is not truly open-source unless it can be compiled by open-source compilers or executed by open-source interpreters.
Similarly, I think open-source hardware RTL designs are not truly open-source if there are no open-source synthesis tools that can synthesize it to netlist and then lay it out.

Icarus can interpret most Verilog RTL designs, though.
However, at the time I left, I had already mandated that new code should use `always_comb` and `always_ff` (previously I had mandated that new code should use `always @*` for combinational logic) and was encouraging my subordinates to use loops and `generate`.
Icarus did not support `always_comb` and `always_ff` at the time (though it worked perfectly fine with loops and `generate`).
In addition, at the time, we (actually just me in that company haha) were dabbling in object-oriented testing methodologies (which Icarus has no plans on ever implementing, which is understandable since it is a massive increase in complexity, it is much much harder than the scheduling shenanigans of `always_comb` and the "just treat it as `always`" of `always_ff`).

(Particularly, you need object-oriented testbenches since SystemVerilog includes a fuzz-testing framework to randomize fields of objects according to certain engineer-provided constraints, and then you would use those object fields to derive the test vectors your test framework would feed into the DUT.
This was a massive increase in code coverage for a largish up-front cost, but once you built the test framework you could just dump various constraints on your test specification objects.
I actually caught a few bugs that we would not have otherwise found with our previous checklist-based testing methodology.)
(Unfortunately it turned out that it required a more expensive license and I ended up hogging the only one we had of that more expensive license (which, if I remember correctly, was the same license needed for formal verification of netlist<->RTL equivalence) for this, which killed enthusiasm for this technique, sigh.
This is another argument for getting open-source hardware design tools developed; not much sense in having open-source RTL for a crypto device if you have to pay through the nose for a license just to synthesize it, never mind the manufacturing cost.)
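
To give a flavor of the constrained-random technique mentioned above, here is a rough Python sketch.
It is not SystemVerilog and does not use a constraint solver; it swaps in plain rejection sampling, and the `PacketSpec` fields and the three constraints are hypothetical, invented purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class PacketSpec:
    """Hypothetical test specification object; the fields are illustrative."""
    length: int
    address: int
    is_write: bool


def randomize(rng, constraints, max_tries=10_000):
    """Draw random specs until one satisfies every constraint."""
    for _ in range(max_tries):
        spec = PacketSpec(
            length=rng.randint(1, 64),
            address=rng.randrange(0, 256),
            is_write=rng.random() < 0.5,
        )
        if all(check(spec) for check in constraints):
            return spec
    raise RuntimeError("constraints look unsatisfiable")


# Engineer-provided constraints, "dumped" onto the specification object.
constraints = [
    lambda s: s.address % 4 == 0,             # word-aligned accesses only
    lambda s: s.address + s.length <= 256,    # stay inside the address space
    lambda s: s.is_write or s.length <= 16,   # reads are kept short
]

rng = random.Random(2021)  # fixed seed so a failing run can be replayed
specs = [randomize(rng, constraints) for _ in range(100)]
```

Each resulting spec would then be turned into test vectors for the DUT; adding a new test scenario is mostly a matter of adding or swapping constraints.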


-----------------------


Another point to ponder is test modes.

In mass production you **need** test modes.
There will always be some number of manufacturing defects because even the cleanest of cleanrooms *will* have a tiny amount of contaminants (what can go wrong will go wrong).
Test modes are run in manufacturing to filter out chips with failing circuitry due to contamination.

Now, a typical way of implementing test modes is to have a special command sent over, say, the "normal" serial port interface of a chip, which then enters various test modes to allow, say, scan chain testing.
Of course, scan chain testing is done by pushing test vectors into all flip-flops, and then the test is validated by pulsing global clock once (often the test mode forces all flip-flops on the same clock), then pulling data from all flip-flops to verify that all the circuitry works as designed.

The "pulling data from all flip-flops" is of course just another way of saying that all mass-produced chips have a way of letting ***anyone*** exfiltrate data from their flip-flops via test modes.
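
As a toy illustration of why scan access doubles as an exfiltration path, here is a small Python model of a scan chain.
It is a sketch under simplifying assumptions: a single chain, a single clock, and a hypothetical `rotate_left` function standing in for the combinational logic between flip-flops.

```python
class ScanChain:
    """Toy model of flip-flops chained into one shift register in test mode."""

    def __init__(self, length, logic):
        self.flops = [0] * length
        self.logic = logic  # combinational logic under test: bits -> bits

    def shift(self, bits_in):
        """Shift bits in at the head; return the bits pushed out of the tail."""
        bits_out = []
        for bit in bits_in:
            bits_out.append(self.flops[-1])
            self.flops = [bit] + self.flops[:-1]
        return bits_out

    def capture(self):
        """One pulse of the global clock: every flop captures the logic output."""
        self.flops = self.logic(self.flops)


def rotate_left(bits):
    """Hypothetical stand-in for the combinational logic between flip-flops."""
    return bits[1:] + bits[:1]


# Manufacturing test: push a vector, pulse the clock once, pull the result.
chain = ScanChain(4, rotate_left)
chain.shift([1, 0, 0, 0])
chain.capture()
observed = chain.shift([0, 0, 0, 0])

# Exfiltration: whatever normal operation left in the flops comes out too.
secret = [1, 0, 1, 1]
victim = ScanChain(4, rotate_left)
victim.flops = list(secret)
leaked = victim.shift([0, 0, 0, 0])
```

The same `shift` call that lets the tester read back `observed` also hands out `leaked`, which is just `secret` in reverse order.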

Thus, for a secure environment, you need to ensure that test modes cannot be entered after the device enters normal operation.
For example, you might have a dedicated pad which is normally pulled-down, but if at reset it is pulled up, the device enters test mode.
If at reset the pad is pulled down, the device is in normal mode and even if the pad is pulled up afterwards the device will not enter test mode.
This ensures that only reset data can be read from the device, without possibility of exfiltration of sensitive (key material or midstate) data.
The pad should also not be exposed as a package pinout except perhaps on DS and ES packages, and the pulldown resistor has to be on-chip.

As an additional precaution, we can also create a small secure memory (maybe 256 octet addressable would be more than enough).
It is possible to exempt flip-flops from scan chain generation (usually by explicitly instantiating flip-flops in a separate module and telling post-synthesis tools to exempt the module from scan chain synthesis).
This gives an extra layer of protection against test mode accessing sensitive data; even if we manage to screw up test mode and it is possible to force reset on the test mode circuit without resetting the rest of the design, sensitive data is still out of the scan chain.
Of course, since they are not on scan, it is possible they have undetectable manufacturing defects, so you would need to use some kind of ECC, or better triple-redundancy best-of-three, to protect against manufacturing defects on the non-scan flip-flops.
Fortunately non-scan flip-flops are often a good bit smaller than scan flip-flops, so the redundancy is not so onerous.
Since the ECC / best-of-three circuit itself would need to be tested, you would multiplex their inputs: in normal mode they get inputs from the non-scan-chain flip-flops, in test mode they get inputs from separate scan-chain flip-flops, so that the ECC / best-of-three circuit is testable at scan mode.
You would also need a separate test of the secure memory, this time running in normal mode with a special test program in the CPU, just in case.
Finally, you would explicitly lay them out "distributed" around the chip, since manufacturing defects tend to correlate in space (they are usually from dust, and dust particles can be large relative to cell size), and you do not want all three of the best-of-three to have manufacturing defects.
For example, you could have a 256 x 8 non-scan-chain flip-flop module, instantiate three of those, and explicitly place them in corners of the digital area, then use a best-of-three circuit to resolve the "correct" value.
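
The best-of-three resolution is a bitwise majority vote, which can be modeled in a few lines of Python.
This is only a behavioral sketch; the function names and the use of `bytearray` to stand in for the three 256 x 8 flip-flop modules are assumptions for illustration.

```python
def best_of_three(a, b, c):
    """Bitwise majority: each output bit is the value at least two copies hold."""
    return (a & b) | (a & c) | (b & c)


def read_secure_octet(copies, address):
    """Read one octet from three redundant 256 x 8 memories and vote."""
    a, b, c = (copy[address] for copy in copies)
    return best_of_three(a, b, c)


copies = [bytearray(256) for _ in range(3)]

# Same value written to all three copies, then one copy is corrupted.
for copy in copies:
    copy[7] = 0xC3
copies[1][7] = 0xFF  # e.g. stuck-at-1 defects in the second copy

# Defects in two copies are still tolerated if they hit different bits.
copies[0][8] = 0xC2  # bit 0 lost in the first copy
copies[1][8] = 0xC3  # intact
copies[2][8] = 0x43  # bit 7 lost in the third copy

print(hex(read_secure_octet(copies, 7)))  # 0xc3
print(hex(read_secure_octet(copies, 8)))  # 0xc3
```

The vote only fails when the same bit is wrong in two copies at once, which is why placing the three modules far apart matters.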

The test mode circuit itself could ensure that the device enters test mode only if the secure memory contains all 0 data after the test mode circuit is reset.
For example, the 256 x 8 non-scan-chain flip-flop module could have a large OR circuit that ORs all the flip-flops, then outputs a single `in_use` bit that is the bitwise OR of all the flip-flop contents.
Then the test mode circuit gets the `in_use` outputs of the three secure flip-flop modules, and if at reset any of them are `1` then it will refuse to enter test mode even if the test mode pad is pulled high.
This ensures that even if an attacker is somehow able to reset *only* the test mode circuit (this is basic engineering, always assume something will go wrong), if the secure memory has any non-0 data (we presume it resets to 0), the device will still not enter test mode.

Of course, if the secure memory itself is accessible from the CPU, then it remains possible that a CPU program is reading from the secure area, keeping raw data in CPU registers, from which a test mode might be able to extract it if the device is somehow forced into test mode even after normal mode.
You could redesign your implementations of field multiplication and SHA midstate computation so that they directly read from the secure memory and write to the secure memory without using any flip-flops along the way, and have only the cryptographic circuit have access to the secure memory.
That way there is reduced possibility that intermediate flip-flops (that are part of the scan chain) outside the secure memory hold sensitive key material or midstate data.
You would need to use a custom bus with separate read and write addresses, and non-pipelined unbuffered access, and since you want to distribute your secure memory physically distant, that translates to wide and long buses (it might be better to use 64 x 32 or 32 x 64 addressable memories, to increase what the cryptographic circuit has access to per clock cycle) screwing with your layout, and probably having to run the secure memory + crypto circuit at a ***much*** slower clock domain (but more secure is a good tradeoff for slowness).
Of course, that is a major design headache (the crypto circuit has to act mostly as a reduced-functionality processor), so you might just want to have the CPU directly access the secure memory and in early boot poke a `0x01` in some part of the memory, in the hope that the `in_use` flag in the previous paragraph is enough to suppress test modes from exfiltrating CPU registers.

Do note that with enough power-cycles and ESD noise you can put digital circuitry into really weird and unexpected states (seen it happen, though fairly hard to replicate, we had an ESD gun you could point at a chip to make it go into weird states), so being extra paranoid about test modes is important.
What can go wrong will go wrong!
In particular with "`TESTMODE_PAD` is only checked at reset" you would have to store `TESTMODE` in a non-scan flip-flop, and with enough targeted ESD that flip-flop can be jostled, setting `TESTMODE` even after normal operation.
You might instead want to use, say, a byte pattern instead of a single bit to represent `TESTMODE`, so the `TESTMODE` register has to have a specific value such as `0xA5`, so that targeted ESD has to be very lucky in order to force your device into test mode.
For example, since you need to check the `TESTMODE` pad at reset anyway, you could do something like this:

    input CLK, RESET_N, TESTMODE_PAD, IN_USE0, IN_USE1, IN_USE2;
    output reg TESTMODE;

    // Any secure memory copy holding non-0 data blocks test mode.
    wire in_use = IN_USE0 || IN_USE1 || IN_USE2;

    // 8'h00: just reset, 8'hA5: test mode, anything else: locked out.
    reg [7:0] testmode_ff;
    wire [7:0] next_testmode_ff =
        (testmode_ff == 8'hA5 || testmode_ff == 8'h00) ?
          ((TESTMODE_PAD && !in_use) ?                     8'hA5 :
           /*otherwise*/                                   8'h5A) :
        /*otherwise*/                                      testmode_ff ;
    always_ff @(posedge CLK, negedge RESET_N) begin
        if (!RESET_N) testmode_ff <= 8'h00;
        else          testmode_ff <= next_testmode_ff; end

    // Registered so that TESTMODE never glitches.
    wire next_TESTMODE = (testmode_ff == 8'hA5);
    always_ff @(posedge CLK, negedge RESET_N) begin
        if (!RESET_N) TESTMODE <= 1'b0;
        else          TESTMODE <= next_TESTMODE; end

Do note that the `TESTMODE` is a flip-flop, since you do ***not*** want glitches on the `TESTMODE` signal line, it would be horribly unsafe to output it from combinational circuitry directly, please do not do that.
Of course that flip-flop can instead be the target of ESD gunnery, but since you need many clock pulses to read the scan chain, it should with good probability also get set to `0` on the next clock pulse and leave test mode (and probably crash the device as well until full reset, but this "fails safe" since at least sensitive data cannot be extracted).
`TESTMODE` has no feedback, thus cannot be stuck in a state loop.
`testmode_ff` *can* be stuck in a state loop, but that is deliberate, as it would "fail safe": if it gets a value other than `0xA5`, it would not enter test mode (and if it enters `0xA5` it can easily leave test mode by either `TESTMODE_PAD` going low or `in_use` going high).
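
To sanity-check the fail-safe behavior of `testmode_ff`, here is a small Python model of its next-state logic.
It is only a behavioral sketch: it ignores the `TESTMODE` output register and real timing, and the helper names `run` and `single_bit_upsets` are hypothetical.

```python
ENTER, BLOCK, RESET = 0xA5, 0x5A, 0x00


def next_testmode_ff(testmode_ff, testmode_pad, in_use):
    """Mirror of the next-state logic for the 8-bit testmode_ff register."""
    if testmode_ff in (ENTER, RESET):
        return ENTER if (testmode_pad and not in_use) else BLOCK
    return testmode_ff  # any other value holds until the next full reset


def run(pad_samples, in_use=False, state=RESET):
    """Clock the register once per pad sample and return the final value."""
    for pad in pad_samples:
        state = next_testmode_ff(state, pad, in_use)
    return state


def single_bit_upsets(value):
    """All values reachable from value by flipping exactly one bit."""
    return [value ^ (1 << bit) for bit in range(8)]


print(hex(run([1, 1, 1])))           # pad high at reset: 0xa5
print(hex(run([0, 1, 1, 1])))        # pad raised too late: 0x5a
print(hex(run([1], in_use=True)))    # secure memory in use: 0x5a
print(bin(BLOCK ^ ENTER).count("1")) # bits that must flip: 8
```

Since `0x5A` and `0xA5` differ in all eight bits, an upset would have to flip the whole byte at once to move a locked-out device into test mode, and any partial upset leaves the register holding a value that never becomes `0xA5`.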

(Sure, an attacker can try targeted ESD at the `TESTMODE` flip-flop repeatedly, but this risks also flipping other scan flip-flops that contain the data that is being extracted, so this might be sufficient protection in practice.)

If you are really going to open-source the hardware design then the layout is also open and attackers can probably target specific chip area for ESD pulse to try a flip-flop upset, so you need to be extra careful.
Note as well that even closed-source "secure" elements can be reverse-engineered (I used to do this in the IC design job as a junior engineer, it was the sort of shitty brain-numbing work forced on new hires), so security-by-obscurity does have a limit as well.
It should be possible to try to figure out the testmode circuitry on "secure" elements and try to get targeted ESD upsets at flip-flops on the testmode circuit.

Test mode design is something of an arcane art, especially if you are trying to build a security device: on the one hand you need to ensure you deliver devices without manufacturing defects, on the other hand you need to ensure that the test mode is not entered inadvertently by strange conditions.

In general, because test modes are such a pain to deal with securely, and are an absolute necessity for mass production, you should assume that any "secure" chip can be broken by physical access and shooting short-range ESD pulses at it to try to get it into some test mode, unless it is openly designed to prevent test mode from persisting after entering normal mode, as above.

(No idea how that ESD gun thing worked or what it was formally called, we just called it the ESD gun, it was an amusing toy: you point it at the DUT and pull the trigger and suddenly it would switch modes.
This of course was a bad thing, since you want to make sure that as much as possible such upsets do not cause the chip to enter an irrecoverable mode, but an amusing thing to do still.
We even had small amounts of flash memory containing register settings that we would load into the settings registers periodically at the end of each display frame to protect against this kind of ESD gun thing, since the flip-flops backing the settings registers were vulnerable to it and we needed a way to preserve the settings of the customer for the IC; the expected effect would be to cause the display to flicker.)

Regards,
ZmnSCPxj