Date: Thu, 11 Feb 2021 08:20:54 +0000
To: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
From: ZmnSCPxj <ZmnSCPxj@protonmail.com>
Reply-To: ZmnSCPxj <ZmnSCPxj@protonmail.com>
Message-ID: <puUth0RIvY16I3ghjUiTkIPJQEKETPLZrm2QiiELW8AheIGIin29u5RkztTXIeYIK0xg2UIbsx6m-TpkJU2BvmVyYYr_BYbCdIQSk2t7TkU=@protonmail.com>
In-Reply-To: <CAPweEDy7Xf3nD1mfyX5MmtsGX=1sd5=gsLosZ=bYavJ0BZyy3g@mail.gmail.com>
References: <CAPweEDx4wH_PG8=wqLgM_+RfTQEUSGfax=SOkgTZhe1FagXF9g@mail.gmail.com>
 <oCNGbVElAQCJ1bEmwLXLzIVec0ZoOA2Ar3vkOc1a0GW12h78bhMi_W4n3pCdDt7hJyPFoMRb0U1T5Wx5uQl4oo6zeQtjKs0MdAXGtvLw1SQ=@protonmail.com>
 <CAPweEDy7Xf3nD1mfyX5MmtsGX=1sd5=gsLosZ=bYavJ0BZyy3g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs


Good morning Luke,

> > (to be fair, there were tools to force you to improve coverage by injecting faults to your RTL, e.g. it would virtually flip an `&&` to an `||` and if none of your tests signaled an error it would complain that your test coverage sucked.)
>
> nice!

It should be possible to develop a tool that parses a Verilog RTL design, then generates a new version of it with one change.
Then you could add some automation to run a set of testcases around mutated variants of the design.
For example, it could create a "wrapper" module that connects to an unmutated differently-named version of the design, and various mutated versions, wire all their inputs together, then compare outputs.
If the testcase could trigger an output of a mutated version to be different from the reference version, then we would consider that mutation covered by that testcase.
Possibly that could be done with Verilog-2001 file writing code in the wrapper module to dump out which mutations were covered, then a summary program could just read in the generated file.
Or Verilog plugins could be used as well (Icarus supports this, that is how it implements all `$` functions).
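
A minimal sketch of the mutation step, in Python rather than Verilog, might look like the following.
It is only an illustration: the operator table, the `mutants` and `covered` helpers, and the sample `assign` line are all hypothetical, and a real tool would parse the RTL properly instead of pattern-matching on text.

```python
import re

# Hypothetical operator swaps; text matching like this will mis-handle
# operators such as `===` or `!==`, which a real parser would not.
SWAPS = {"&&": "||", "||": "&&", "==": "!=", "!=": "=="}
OPERATOR = re.compile(r"&&|\|\||==|!=")

def mutants(rtl: str):
    """Yield (position, mutated_source) pairs, one operator flipped per mutant."""
    for match in OPERATOR.finditer(rtl):
        flipped = SWAPS[match.group()]
        yield match.start(), rtl[:match.start()] + flipped + rtl[match.end():]

def covered(reference_outputs, mutant_outputs):
    """A mutant counts as covered if any test vector makes its outputs differ."""
    return any(ref != mut for ref, mut in zip(reference_outputs, mutant_outputs))

rtl = "assign ok = (a && b) || (c == d);"
for position, text in mutants(rtl):
    print(position, text)
```

Each mutated source would then be simulated next to the reference design, and `covered` applied to the recorded outputs of both.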

A drawback is that just because an output is different does not mean the testcase actually ***checks*** that output.
If the testcase does not detect the diverging output, it still does not properly cover that mutation.

The point of this is to check coverage of the tests.
Not sure how well this works with formal validation.


> > Synthesis in particular is a black box and each vendor keeps their particular implementations and tricks secret.
>
> sigh. i think that's partly because they have to insert diodes, and buffers, and generally mess with the netlist.
>
> i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!

Well, that surprises me as well.

On the other hand, smaller technologies consistently have lower raw output current driving capability due to the smaller size, and as trace width goes down and frequency goes up they stop acting like ideal 0-impedance traces and start acting more like transmission lines.
So I suppose at some point something like that would occur and I should not actually be surprised.
(Maybe I am more surprised that it reached that level at that technology size, I would have thought 33% at 7nm.)

In the modules where we were doing manual netlist+layout, we used inverting buffers instead (slightly smaller than non-inverting buffers; in most technologies a non-inverting buffer is just an inverter followed by an inverting buffer).
This was an advantage of manual design, since it looks like synthesis tools are not willing to invert the contents of intermediate flip-flops even if using an inverting buffer rather than a non-inverting one could give a theoretical speed+size advantage (it looks like synthesis optimization starts at the output of flip-flops and ends at their input, so a manual designer could achieve slightly better performance if they were willing to invert an intermediate flip-flop).
Another was that inverting latches were smaller in the technology we were using than non-inverting latches, so it was perfectly natural for us to use an inverting latch and an inverting buffer on those parts where we needed higher fan-out (it was equivalent to a "custom" latch that had higher-than-normal driving capability).

Scan chain test generation was impossible though, as those require flip-flops, not latches.
Fortunately this was "just" deserialization of high-frequency low-width data with no transformation of the data (that was done after the deserialization, at lower clock speeds but higher data width, in pure RTL so flip-flops), so it was judged acceptable that it would not be covered by scan chain, since scan chain is primarily for testing combinational logic between flip-flops.
So we just had flip-flops at the input, and flip-flops at the output, and forced all latches to pass-through mode, during scan mode.
We just needed to have enough coverage to uncover stuck-at faults (which was still a pain, since additional test vectors slow down manufacturing so we had to reduce the test vectors to the minimum possible) in non-scan-mode testing.

Man, making ASICs was tough.


>
> plus, they make an awful lot of money, it is good business.
>
> > Pointing some funding at the open-source Icarus Verilog might also fit, as it lost its ability to do synthesis more than a decade ago due to inability to maintain.
>
> ah i didn't know it could do synthesis at all! i thought it was simulation only.

Icarus was the only open-source synthesis tool I could find back then, and it dropped synthesis capability fairly early due to maintenance burden (I never managed to get the old version with synthesis compiled and never managed actual synthesis on it, so my knowledge of it is theoretical).


There is an argument that open-source software is not truly open-source unless it can be compiled by open-source compilers or executed by open-source interpreters.
Similarly, I think open-source hardware RTL designs are not truly open-source if there are no open-source synthesis tools that can synthesize it to netlist and then lay it out.

Icarus can interpret most Verilog RTL designs, though.
However, at the time I left, I had already mandated that new code should use `always_comb` and `always_ff` (previously I had mandated that new code should use `always @*` for combinational logic) and was encouraging my subordinates to use loops and `generate`.
Icarus did not support `always_comb` and `always_ff` at the time (though it worked perfectly fine with loops and `generate`).
In addition, at the time, we (actually just me in that company haha) were dabbling in object-oriented testing methodologies (which Icarus has no plans on ever implementing, which is understandable since it is a massive increase in complexity, it is much much harder than the scheduling shenanigans of `always_comb` and the "just treat it as `always`" of `always_ff`).

(Particularly, you need object-oriented testbenches since SystemVerilog includes a fuzz-testing framework to randomize fields of objects according to certain engineer-provided constraints, and then you would use those object fields to derive the test vectors your test framework would feed into the DUT.
This was a massive increase in code coverage for a largish up-front cost, but once you built the test framework you could just dump various constraints on your test specification objects; I actually caught a few bugs that we would not have otherwise found with our previous checklist-based testing methodology.)
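
To give a flavor of constrained randomization, here is a rough Python sketch.
It is not SystemVerilog and not how a real simulator does it (real tools use a constraint solver, this just rejection-samples); the `Packet` fields, the constraints, and the seed are made up for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Packet:
    """Hypothetical test specification object; the fields are made up."""
    length: int
    channel: int

def randomize(rng, constraints, max_tries=10_000):
    """Rejection-sample a Packet until every constraint predicate holds."""
    for _ in range(max_tries):
        candidate = Packet(length=rng.randrange(0, 256),
                           channel=rng.randrange(0, 8))
        if all(check(candidate) for check in constraints):
            return candidate
    raise RuntimeError("constraints look unsatisfiable")

constraints = [
    lambda p: 16 <= p.length <= 64,   # only short frames
    lambda p: p.channel % 2 == 0,     # only even channels
]
rng = random.Random(1234)  # fixed seed so a failing run can be replayed
packets = [randomize(rng, constraints) for _ in range(5)]
```

The test framework would then turn each randomized object into the actual test vectors fed to the DUT.
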
(Unfortunately it turned out that it required a more expensive license and I ended up hogging the only one we had of that more expensive license (which, if I remember correctly, was the same license needed for formal verification of netlist<->RTL equivalence) for this, which killed enthusiasm for this technique, sigh.
This is another argument for getting open-source hardware design tools developed; not much sense in having open-source RTL for a crypto device if you have to pay through the nose for a license just to synthesize it, never mind the manufacturing cost.)


-----------------------


Another point to ponder is test modes.

In mass production you **need** test modes.
There will always be some number of manufacturing defects because even the cleanest of cleanrooms *will* have a tiny amount of contaminants (what can go wrong will go wrong).
Test modes are run in manufacturing to filter out chips with failing circuitry due to contamination.

Now, a typical way of implementing test modes is to have a special command sent over, say, the "normal" serial port interface of a chip, which then enters various test modes to allow, say, scan chain testing.
Of course, scan chain testing is done by pushing test vectors into all flip-flops, and then the test is validated by pulsing global clock once (often the test mode forces all flip-flops on the same clock), then pulling data from all flip-flops to verify that all the circuitry works as designed.
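
The load / pulse / unload sequence can be sketched with a toy Python model.
This is an assumption-laden illustration: the four-flop chain, the `scan_shift` and `capture` helpers, and the AND-with-neighbour "logic under test" are all hypothetical.

```python
def scan_shift(chain, bits_in):
    """Shift bits_in through the chain, one bit per clock; return bits shifted out."""
    out = []
    for bit in bits_in:
        out.append(chain[-1])          # last flip-flop drives scan-out
        chain[:] = [bit] + chain[:-1]  # every flip-flop takes its neighbour's value
    return out

def capture(chain, logic):
    """One functional clock pulse: flip-flops capture the combinational outputs."""
    chain[:] = logic(chain)

def logic(state):
    """Hypothetical logic under test: each flop captures itself AND its neighbour."""
    n = len(state)
    return [state[i] & state[(i + 1) % n] for i in range(n)]

chain = [0, 0, 0, 0]
scan_shift(chain, [1, 1, 0, 1])        # load the test vector
capture(chain, logic)                  # pulse the global clock once
observed = scan_shift(chain, [0] * 4)  # unload the response for comparison
print(observed)
```

Note that the unload step returns whatever the flip-flops held, regardless of how the data got there.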

The "pulling data from all flip-flops" is of course just another way of saying that all mass-produced chips have a way of letting ***anyone*** exfiltrate data from their flip-flops via test modes.

Thus, for a secure environment, you need to ensure that test modes cannot be entered after the device enters normal operation.
For example, you might have a dedicated pad which is normally pulled-down, but if at reset it is pulled up, the device enters test mode.
If at reset the pad is pulled down, the device is in normal mode and even if the pad is pulled up afterwards the device will not enter test mode.
This ensures that only reset data can be read from the device, without possibility of exfiltration of sensitive (key material or midstate) data.
The pad should also not be exposed as a package pinout except perhaps on DS and ES packages, and the pulldown resistor has to be on-chip.

As an additional precaution, we can also create a small secure memory (maybe 256 octet addressable would be more than enough).
It is possible to exempt flip-flops from scan chain generation (usually by explicitly instantiating flip-flops in a separate module and telling post-synthesis tools to exempt the module from scan chain synthesis).
This gives an extra layer of protection against test mode accessing sensitive data; even if we manage to screw up test mode and it is possible to force reset on the test mode circuit without resetting the rest of the design, sensitive data is still out of the scan chain.
Of course, since they are not on scan, it is possible they have undetectable manufacturing defects, so you would need to use some kind of ECC, or better triple-redundancy best-of-three, to protect against manufacturing defects on the non-scan flip-flops.
Fortunately non-scan flip-flops are often a good bit smaller than scan flip-flops, so the redundancy is not so onerous.
Since the ECC / best-of-three circuit itself would need to be tested, you would multiplex its inputs: in normal mode it gets inputs from the non-scan-chain flip-flops, in test mode it gets inputs from separate scan-chain flip-flops, so that the ECC / best-of-three circuit is testable at scan mode.
You would also need a separate test of the secure memory, this time running in normal mode with a special test program in the CPU, just in case.
Finally, you would explicitly lay them out "distributed" around the chip: since manufacturing defects tend to correlate in space (they are usually from dust, and dust particles can be large relative to cell size), you do not want all three of the best-of-three to have manufacturing defects.
For example, you could have a 256 x 8 non-scan-chain flip-flop module, instantiate three of those, and explicitly place them in corners of the digital area, then use a best-of-three circuit to resolve the "correct" value.
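
As a small illustration, a best-of-three vote over three redundant copies of an octet is just a bitwise majority function.
The Python below is a sketch of that logic, not a hardware description; the stored value `0xC3` and the flipped bit are arbitrary examples.

```python
def best_of_three(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)

stored = 0xC3
copies = [stored, stored ^ 0x10, stored]  # middle copy has one defective bit
print(hex(best_of_three(*copies)))
```

The vote still recovers the stored value when each copy has a defect in a *different* bit position, which is why spreading the three copies apart physically matters.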

The test mode circuit itself could ensure that the device enters test mode only if the secure memory contains all 0 data after the test mode circuit is reset.
For example, the 256 x 8 non-scan-chain flip-flop module could have a large OR circuit that ORs all the flip-flops, then outputs a single bit, `in_use`, that is the bitwise OR of all the flip-flop contents.
Then the test mode circuit gets the `in_use` outputs of the three secure flip-flop modules, and if at reset any of them are `1` then it will refuse to enter test mode even if the test mode pad is pulled high.
This ensures that even if an attacker is somehow able to reset *only* the test mode circuit (this is basic engineering, always assume something will go wrong), if the secure memory has any non-0 data (we presume it resets to 0), the device will still not enter test mode.
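
The gating rule can be modeled in a few lines of Python.
This is a behavioral sketch only; the function names and the three 256-octet banks are assumptions matching the example above.

```python
def in_use(bank):
    """OR-reduce a bank: true if any stored octet is non-zero."""
    return any(octet != 0 for octet in bank)

def may_enter_test_mode(testmode_pad, banks):
    """Decision taken only when the test mode circuit comes out of reset."""
    return bool(testmode_pad) and not any(in_use(bank) for bank in banks)

banks = [[0] * 256 for _ in range(3)]
fresh = may_enter_test_mode(True, banks)   # freshly reset part, pad pulled high
banks[1][42] = 0x01                        # some data has been written
after_write = may_enter_test_mode(True, banks)
print(fresh, after_write)
```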

Of course, if the secure memory itself is accessible from the CPU, then it remains possible that a CPU program is reading from the secure area, keeping raw data in CPU registers, from which a test mode might be able to extract it if the device is somehow forced into test mode even after normal mode.
You could redesign your implementations of field multiplication and SHA midstate computation so that they directly read from the secure memory and write to the secure memory without using any flip-flops along the way, and have only the cryptographic circuit have access to the secure memory.
That way there is reduced possibility of intermediate flip-flops (that are part of the scan chain) outside the secure memory holding sensitive key material or midstate data.
You would need to use a custom bus with separate read and write addresses, and non-pipelined unbuffered access, and since you want to distribute your secure memory physically distant, that translates to wide and long buses (it might be better to use 64 x 32 or 32 x 64 addressable memories, to increase what the cryptographic circuit has access to per clock cycle) screwing with your layout, and probably having to run the secure memory + crypto circuit at a ***much*** slower clock domain (but more secure is a good tradeoff for slowness).
Of course, that is a major design headache (the crypto circuit has to act mostly as a reduced-functionality processor), so you might just want to have the CPU directly access the secure memory and in early boot poke a `0x01` in some part of the memory, in the hope that the `in_use` flag in the previous paragraph is enough to suppress test modes from exfiltrating CPU registers.

Do note that with enough power-cycles and ESD noise you can put digital circuitry into really weird and unexpected states (seen it happen, though fairly hard to replicate, we had an ESD gun you could point at a chip to make it go into weird states), so being extra paranoid about test modes is important.
What can go wrong will go wrong!
In particular with "`TESTMODE_PAD` is only checked at reset" you would have to store `TESTMODE` in a non-scan flip-flop, and with enough targeted ESD that flip-flop can be jostled, setting `TESTMODE` even after normal operation.
You might instead want to use, say, a byte pattern instead of a single bit to represent `TESTMODE`, so the `TESTMODE` register has to have a specific value such as `0xA5`, so that targeted ESD has to be very lucky in order to force your device into test mode.
For example, since you need to check the `TESTMODE` pad at reset anyway, you could do something like this:

    input CLK, RESET_N, TESTMODE_PAD, IN_USE0, IN_USE1, IN_USE2;
    output reg TESTMODE;

    // Any secure memory bank holding non-zero data blocks test mode.
    wire in_use = IN_USE0 || IN_USE1 || IN_USE2;

    // 8'h00 = just reset, 8'hA5 = test mode requested; any other value
    // (8'h5A or an ESD-upset pattern) holds until the next reset.
    reg [7:0] testmode_ff;
    wire [7:0] next_testmode_ff =
        (testmode_ff == 8'hA5 || testmode_ff == 8'h00) ?
          (TESTMODE_PAD && !in_use) ?                      8'hA5 :
          /*otherwise*/                                    8'h5A :
        /*otherwise*/                                      testmode_ff ;
    always_ff @(posedge CLK, negedge RESET_N) begin
        if (!RESET_N) testmode_ff <= 8'h00;
        else          testmode_ff <= next_testmode_ff; end

    // Registered output, so TESTMODE never glitches.
    wire next_TESTMODE = (testmode_ff == 8'hA5);
    always_ff @(posedge CLK, negedge RESET_N) begin
        if (!RESET_N) TESTMODE <= 1'b0;
        else          TESTMODE <= next_TESTMODE; end

Do note that the `TESTMODE` is a flip-flop, since you do ***not*** want glitches on the `TESTMODE` signal line, it would be horribly unsafe to output it from combinational circuitry directly, please do not do that.
Of course that flip-flop can instead be the target of ESD gunnery, but since you need many clock pulses to read the scan chain, it should with good probability also get set to `0` on the next clock pulse and leave test mode (and probably crash the device as well until full reset, but this "fails safe" since at least sensitive data cannot be extracted).
`TESTMODE` has no feedback, thus cannot be stuck in a state loop.
`testmode_ff` *can* be stuck in a state loop, but that is deliberate, as it would "fail safe" if it gets a value other than `0xA5`, it would not enter test mode (and if it enters `0xA5` it can easily leave test mode by either `TESTMODE_PAD` or `in_use`).
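
To sanity-check that behavior, here is a small Python model of the two registers.
It is a behavioral sketch under stated assumptions: the constant names and the `run` helper are made up, and an ESD upset is modeled simply as starting `testmode_ff` from an arbitrary 8-bit value.

```python
ARMED, JUST_RESET, LOCKED = 0xA5, 0x00, 0x5A

def next_testmode_ff(testmode_ff, testmode_pad, in_use):
    """Combinational next-state function for the 8-bit pattern register."""
    if testmode_ff in (ARMED, JUST_RESET):
        return ARMED if (testmode_pad and not in_use) else LOCKED
    return testmode_ff  # every other pattern holds until reset

def run(testmode_ff, testmode_pad, in_use, cycles=4):
    """Clock both registers from a given state; return TESTMODE after each edge."""
    trace = []
    for _ in range(cycles):
        # Both flip-flops sample their inputs on the same clock edge,
        # so TESTMODE sees the old value of testmode_ff.
        testmode = int(testmode_ff == ARMED)
        testmode_ff = next_testmode_ff(testmode_ff, testmode_pad, in_use)
        trace.append(testmode)
    return trace

# Which upset patterns ever assert TESTMODE with the pad low and memory in use?
dangerous = [v for v in range(256) if any(run(v, False, True))]
print([hex(v) for v in dangerous])
```

In this model only an upset to exactly `0xA5` asserts `TESTMODE`, and only for a single clock before `testmode_ff` falls to `0x5A`.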

(Sure, an attacker can try targeted ESD at the `TESTMODE` flip-flop repeatedly, but this risks also flipping other scan flip-flops that contain the data that is being extracted, so this might be sufficient protection in practice.)

If you are really going to open-source the hardware design then the layout is also open and attackers can probably target specific chip area for ESD pulse to try a flip-flop upset, so you need to be extra careful.
Note as well that even closed-source "secure" elements can be reverse-engineered (I used to do this in the IC design job as a junior engineer, it was the sort of shitty brain-numbing work forced on new hires), so security-by-obscurity does have a limit as well, it should be possible to try to figure out the testmode circuitry on "secure" elements and try to get targeted ESD upsets at flip-flops on the testmode circuit.

Test mode design is something of an arcane art, especially if you are trying to build a security device, on the one hand you need to ensure you deliver devices without manufacturing defects, on the other hand you need to ensure that the test mode is not entered inadvertently by strange conditions.

In general, because test modes are such a pain to deal with securely, and are an absolute necessity for mass production, you should assume that any "secure" chip can be broken by physical access and shooting short-range ESD pulses at it to try to get it into some test mode, unless it is openly designed to prevent test mode from persisting after entering normal mode, as above.

(No idea how that ESD gun thing worked or what it was formally called, we just called it the ESD gun, it was an amusing toy, you point it at the DUT and pull the trigger and suddenly it would switch modes.
This of course was a bad thing, since you want to make sure that as much as possible such upsets do not cause the chip to enter an irrecoverable mode, but an amusing thing to do still.
We even had small amounts of flash memory containing register settings that we would load into the settings registers periodically at the end of each display frame to protect against this kind of ESD gun thing, since the flip-flops backing the settings registers were vulnerable to it and we needed a way to preserve the settings of the customer for the IC; the expected effect would be to cause the display to flicker.)

Regards,
ZmnSCPxj