Date: Sun, 14 Feb 2021 00:27:36 +0000
To: Luke Kenneth Casson Leighton
From: ZmnSCPxj
Cc: Bitcoin Protocol Discussion, Libre-Soc General Development
Subject: Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs

Good morning Luke,

> > Another point to ponder is test
> > modes.
> > In mass production you need test modes.
> >
> > (Sure, an attacker can try targeted ESD at the `TESTMODE` flip-flop repeatedly, but this risks also flipping other scan flip-flops that contain the data that is being extracted, so this might be sufficient protection in practice.)
>
> if however the ASIC can be flipped into TESTMODE and yet it carries on
> otherwise working, an algorithm can be re-run and the exposed data
> will be clean.

But in most testmodes I have seen (and designed) all clocks are driven externally from a different pin (usually the serial interface) when in testmode.
If the CPU clock is now controlled by the attacker, how do you run any kind of algorithm?
(This could be an artifact of how my old design company designed testmodes, YMMV.)

Really the concern here is that if testmode is entered while the CPU has key material loaded into registers or caches, then it is possible, if those registers/caches are in the scan chain, to exfiltrate data.
It does not matter if the chip is now in a mode that cannot execute algorithms: if it was doing any kind of computation involving privkeys (including, say, deriving its public key so that PC-side hardware can get the `xpub`) then key material may be in scan chain registers, the clock is now controlled by the attacker, and possibly scan mode as well (which disables combinational circuitry, thus none of your algorithms can run).

> > If you are really going to open-source the hardware design then the layout
> > is also open and attackers can probably target specific chip area for ESD
> > pulse to try a flip-flop upset, so you need to be extra careful.
>
> this is extremely valuable advice.
> in the followup [1] you describe a
> gating method: this we have already deployed on a couple of places in
> case the Libre Cell Library (also being developed at the same time by
> Staf Verhaegen of Chips4Makers) causes errors: we do not want, for
> example, an error in a Cell Library to cause a permanent HI which
> locks us from being able to perform testing of other areas of the
> ASIC.
>
> the idea of being able to actually randomly flip bits inside an ASIC
> from outside is both hilarious and entirely news to me, yet it sounds
> to be exactly the kind of thing that would allow an attacker to
> compromise a hardware wallet. potentially destructively, mind, but
> compromise all the same.

Certainly outside of the old company design philosophy I have seen many experts strongly protest against a design philosophy which assumes that any flip-flop could randomly switch.
Yet the design philosophy within the old company always had this assumption, supposedly (according to in-company lore) because previous engineers had actually found out the hard way that random bitflips did occur, and for e.g. automobile chips the risk was too great to not have strong mitigations:

* State machines had to force unused states into known states.
  For example a state machine with 3 states needs 2 bits of state, but 2 bits of state is actually 4 states, so there is a 4th unused state.
  * Not all state machines needed this rule, but during planning we had to identify the state machines that did, and often we just targeted having 2^n states to ensure that there were no unused states.
  * I even suggested the use of ECC encoding for important state machines and it was something being investigated at the time I left.
* State machines that otherwise did not need the above rule were strongly encouraged to clear state at display frame vsync.
  This ensured that any unexpected states they had would only last up to one display frame, which was considered acceptable.
* Flip-flops that held settings were periodically reloaded at each display frame vsync from a flash memory (which apparently was a lot more immune to bitflips).

It could be an artifact as well that the company had its own in-house foundry rather than delegate out to TSMC or whatnot --- maybe the technology we had was just suckier than state-of-the-art so bitflips were more common.

The reason why this stuck in my mind is that at one time we had a DS test where shooting the ESD gun could sometimes cause the chip to fail (blank display) until reset, when the expectation was that at most it would flicker for one display frame.
And afterwards we had to go through the entire RTL looking for which state machine or settings register was the culprit.
I even wrote a little Verilog-PLI plugin that would inject deterministically random data into flip-flops in the model to try to catch it.
Eventually we found a bunch of possible root causes, and on the next DS iteration testing we had fun shooting the chip with the ESD gun over and over again and sighing in relief that the display was not failing for more than one frame.

The chip was a display driver for automotive; apparently at the time cars were starting to transition to using LCD for things like speedometer and accelerometer rather than physical dials.
And of course the display suddenly switching off while the car is running at high speed due to some extra-powerful pulse elsewhere was potentially dangerous and could distract the driver, so that is why we were paranoid about such sudden bitflips potentially leading to such a massive cascade of upsets as to make the display fail permanently.

I think being excessively cautious for cryptographic chips should be standard as well.
And certainly test mode exfiltration of data is always an issue; JTAG is a very standard way of reading memory.

Regards,
ZmnSCPxj
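
A minimal sketch of the first rule in the list above (unused states forced into a known state), combined with the seeded bitflip injection mentioned for the Verilog-PLI plugin. It is Python rather than RTL, and the state names, `step`, `step_unprotected`, `run`, and the 5% upset rate are all hypothetical, chosen only for illustration; it is not the old company's design.

```python
import random

# Hypothetical 3-state machine encoded in 2 bits; 0b11 is the unused 4th state.
IDLE, LOAD, SHOW = 0b00, 0b01, 0b10
UNUSED = 0b11
NEXT = {IDLE: LOAD, LOAD: SHOW, SHOW: IDLE}


def step(state):
    """One clock tick: valid states advance, the unused state is forced to IDLE."""
    return NEXT.get(state, IDLE)


def step_unprotected(state):
    """Same machine without the rule: the unused state simply holds."""
    return NEXT.get(state, state)


def run(step_fn, ticks, seed, flip_prob=0.05):
    """Simulate `ticks` clock ticks with injected single-bit upsets.

    The generator is seeded, so the injected upsets are random-looking but
    reproducible, and a failing run can be replayed exactly.
    Returns the longest run of consecutive ticks spent in the unused state.
    """
    rng = random.Random(seed)
    state = IDLE
    stuck = worst = 0
    for _ in range(ticks):
        state = step_fn(state)
        if rng.random() < flip_prob:
            # Flip one of the two bits of the state register.
            state ^= 1 << rng.randrange(2)
        if state == UNUSED:
            stuck += 1
            worst = max(worst, stuck)
        else:
            stuck = 0
    return worst


if __name__ == "__main__":
    print("protected   :", run(step, 10_000, seed=1))
    print("unprotected :", run(step_unprotected, 10_000, seed=1))
```

Because `step` maps the unused encoding back to `IDLE` on the next tick, the protected machine can never spend more than one consecutive tick in `0b11`, whatever the seed; the unprotected one stays there until another upset happens to knock it out.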