Failures of secret key cryptography

Daniel J. Bernstein (djb)

FSE 2013

video: https://www.youtube.com/watch?v=bT4cKwBROno&list=PLgO7JBj821uGZTXEXBLckChu70kl7Celh&index=7

Welcome to the second invited talk, to be given by Dan Bernstein. Dan is one of the few members of our community who does not need an introduction. It's a great honor and pleasure that he has agreed to give a talk here. Dan comes from UC Berkeley and from Chicago. Just a few things you might know Dan from-- he sued the U.S. government about export controls. In 2004, .. in a class of his.. a lot of qmail, his work on factoring, ECC, and symmetric cryptology. Poly1305, .. and also he's... attacks.. So please join me in welcoming Dan Bernstein.

Introduction

Thank you for the introduction. Can I get a check about whether the microphone is working? Okay, it has to be on the other side. Checking again, is the microphone working? Is there a button? Okay, third time, is it working? Yes. Alright, great. Thank you for the introductions and thank you for the invitation.

I decided to start out with some quotes about the relationship between cryptography and information security. I think it's a critical part of information security, but let's see what other people think. First of all, xkcd for those of you who might not be able to read it, on the left side there's a bad guy saying "his laptop is encrypted, let's build a $1 million dollar cluster to crack it" and the other bad guy says "it's no good, it's 4096-bit RSA, blast, our evil plan is foiled". Now this is labeled "A cryptonerd's imagination". On the right side, it says "what would actually happen" and there the bad guy says "As long as the laptop is encrypted, drug him and hit him with this $5 dollar wrench until he tells us the password". The other bad guy says "Yeah". Then there's another similar quote from Grigg-Gutmann in 2011, saying "in the past 15-20 years, nobody has ever lost money to an attack on a properly designed cryptosystem". What is a properly designed cryptosystem, how does a user tell? Well they give a definition, it's one that doesn't use homebrew crypto or toy keys, on the Internet or the commercial world. And I have another quote, from Shamir in 2002 saying "Cryptography is usually bypassed. I am not aware of any major world-class security system employing cryptography in which the hackers penetrated the system by actually going through the cryptanalysis".

Now, I'm not quite sure what these quotes mean. First of all, it's clear from the xkcd comic that the bad guys are unable to break the crypto. But what about the Shamir quote and what about the Grigg-Gutmann quote? Are they saying that the cryptography is actually infeasible to break? Or are they saying well it's feasible to break the real world crypto, but the attackers have easier things to do. Well certainly all of the quotes I have given have this theme of well yeah there's easier things than going through the cryptography. Well, Shamir very clearly says that the.. that when hackers penetrate the system, they're not actually going through the cryptoanalysis. It's amazing that a comic strip is more clear than other people commenting on this security issue. Well, anyway.

There's another possibility, which is that what these people mean, namely that cryptography is not the weak point, that they are not breaking the cryptography, either because it's not feasible or because it's feasible to break but there's easier attacks. Maybe they are just wrong about all this? Maybe the real world cryptography is actually breakable, and it's actually being broken along with lots of other things. If we actually want a secure system, then we have to fix the crypto and also fix all the other things going wrong. So it's not some strong point in security, it's actually one of many weak points in security that we have to fix.

Example: Flame

Well, to resolve this question, let's look at some examples starting with Flame. Now the target here is the Windows update system. Flame is some malware that broke into a bunch of computers, targeted a bunch of computers in the Middle East. And Microsoft announced about a week after Flame was publicly discovered, Microsoft said "We recently became aware of a complex piece of targeted malware known as 'Flame' and immediately began examining the issue... We have discovered through our analysis that some components of the malware have been signed by certificates that allow software to appear as if it was produced by Microsoft". Bunch of computers are broken into and it's an "issue", okay. We have discovered through our cryptanalysis, so clever are we, that.... they broke our signature system. Oops. So Microsoft has a Windows update system where any Windows machine will accept updates to Windows if they are signed by Microsoft. And the attackers, one of the main way that Flame broke into computers, well, the way that Flame broke into computers, was by forging a signature which appeared to have come from Microsoft and they targeted computers that looked at that signature and said "oh yeah, that's valid". Now, this sounds like a PKC issue rather than FSC issue, until you look at how they forged the signatures. Namely they used a chosen prefix collision attack against md5. Mark Stevens has a nice tool where he will look at a colliding message, not a colliding pair of messages but a colliding message, and he will try XORing it with a bunch of plausible message differences and then tracking it through md5 and seeing if there's an internal collision. And so he was able to look at the Flame message, the Flame signature, and look at "yes that is part of a chosen-prefix collision attack", which wasn't his published chosen-prefix collision attack, it was something different, an entirely new and unknown variant.

The concept of "new" here is a little interesting because if you look at another report from the CrySyS lab that was one of the first labs to find Flame and analyze what it was doing... they said they had... there's a bunch of machines around the internet but they have virus protection and maybe people can get through that virus malware protection, Flame of course got through lots of protections, and these machines will still keep logs of what they are doing, and send logs out to other sites for analysis later. It was possible to go back in time to the older logs and look at what those machines were doing. And sure enough one of those machines in its logs in 2007 had one of the critical Flame files. The conclusion from this lab was that Flame was probably already active in 2007 which was before the real damage MD5 not just the .. but using them for forging certificates that was before that was publicly known, before it was established in the public cryptographic community, and they think Flame was perhaps active for as long as 8 years. If you go back to 2007, again, problems known with md5. People were saying in the mid 1990s that md5 was a bad function, don't use it, but people were still deploying md5 in 2007, there wasn't this rush to get rid of it in 2007. md5 was not homebrew crypto, it was something standardized and widely used, and nevertheless the attackers broke it, and apparently they thought this was one of the easiest things for them to do, they didn't bother with some other attack that would have been harder. The crypto was the weak point in this system, at least it was the weak point that the attackers decided to break.

If you compare this to what Grigg and Gutmann were saying, they said "Cryptosystem failure is orders of magnitude below any other risk" and this is obviously a little bit of an exaggeration, but nevertheless this is what they say - "the crypto's fine, the crypto's strong, we're done, crypto is successful, mission accomplished, yay". Let's look at another example.

There was a nice talk yesterday about, I have just two slides here saying just a little bit about the history... I would give some credit to Borisov-Goldberg-Wagner 2001 where there was a flurry of attack papers a few years after WEP was standardized. Turns out that there's a 24-bit message number, so you had this long-term secret key which is 40-bits or 104-bits and then you put on a 24-bit message number to get your 64 or 128 bit RC key, of course the 24-bit number was not meant to be repeated but of course it will be repeated often. And then you lose all your confidentiality and integrity, it's already a complete disaster just from having a 24-bit message number as your long term key. And then user authentication failures, and then Fluhrer-Mantin-Shamir in 2001 published an attack about RC4 key.. because the long-term key is concatenated with this message number and because there are correlations between the p bytes the long-term bytes and the RC4 output, what you can see when you bury the message number, you end up seeing the key bytes from a bunch of RC4 outputs. And then well, lots and lots of papers about this, prehaps concluding with the paper from yesterday saying you can optimize this attack to the point where getting the WEP secret key from under 20,000 packets. It's an incredibly weak cryptosystem.

Some people look at this and say hey you're pointing to a bunch of academic papers. And sure, WEP is still used in the real world and it's weak, the academic papers do say that, but "come on who is actually being attacked?". Well here's an example. Some attackers stole 45 million credit card numbers and a bunch of other personally identifiable information from an American company, T. J. Maxx by first of all breaking into the internal network which was protected with WEP. A company in the U.S. when there's people losing money or maybe gaining money, there's a whole subculture of other financial research companies that will look around and speculate about how much it is going to cost them. Forrester Research said that T. J. Maxx would lose something like $1 billion from this incident, which is maybe correct with all the consulting fees and cleanups... the only documented damage that I am aware of is that T. J. Maxx paid $40.9 million dollars to settle a lawsuit about exactly this theft. So nobody loses any money? Well here's an example of someone losing money because of broken crypto.

Example: Keeloq

Let's try another example. Keeloq. This is one from.. system.. for car doors and garage doors. Wikipedia says here's a bunch of car companies that have deployed it. There was a paper in 2007 by Indesteege-Keller-Biham-Dunkelman-Preneel, "How to steal cars". Recover 64-bit KeeLoq key using 2¹⁶ known plaintexts, only 2⁴⁴.5 encryptions. What this says is that the security of the cipher using in KeqLoq does not give you 2⁶⁴ security. You can break it using only 2⁴⁴.5 encryptions and 2¹⁶ known plaintexts. When I see numbers like this, I start envisioning well what does 2⁴⁴ computations mean... well how about the known plaintexts? 2¹⁶ known plaintexts? For those of you don't know .. it's.. a parking vallet at a really fancy restaurant. Rich people come to the restaurant and they give him their cars and car keys temporarily. If he steals the car with the car keys, well they know it's him and he gets fired and he gets in trouble. But if he can clone the key and then give the car, and then sell the key on the black market, then nobody knows it's him. So he drives the car away, he parks the car, he has a radio receiver to receive the Keyloq signals... and he also has a second job as a professor, so he has a bunch of graduate students, and he says, grad student please press this key as fast as you can. I actually tried this. At reasonable speeds, it takes about 3 hours to get 2¹⁶ pushes. And fancy restaurant, 3 hour dinner, that's perfectly reasonable, sometimes it takes a little time for them to give you the bill and for the vallet to bring the car back... well, by listening to the radio signals and doing some computation, he is able to get the remote key and also build clones of it.

And then there was a paper in the next year, in 2008, by Eisenbarth-Kasper-Moradi-Paar-Salmasizadeh-Shalmani, where they recovered the system's master key, allowing practically instantaneous cloning of KeeLoq keys. They had a much faster attack where they can from some distance intercept the radio signals of pushing the buttons once or twice and they will immediately have cloned the key. They were able to get the manufacturer's key out of the system. It turns out that the receiver systems inside these cars inside garage doors from each of these companies, all share a manufacturer key and the main work that was done here was actually extracting the manufacturer's keys by interacting with those receiver devices. Now, some people would say that the authors of the second paper were cheatin gbecause it was a sidechannel, and they weren't looking at just the legitimate radio signals designed to be the outputs of the system, they were looking at the power consumed by this system. Now, maybe Shamir would say this was bypassing the cryptography. Maybe it's not an issue that cryptographers should worry about that there are sidechannel attacks; perhaps there should be some other people that are protecting us against sidechannel attacks. But maybe that there are interactions between the cryptography and the sidechannels. Maybe we can change the decisions we make, and there is more and more evidence of this, that there are different decisions we could make when designing cryptographic systems that make sidechannel attacks harder or easier to carry out.

Another question that I will formulate about this attack is I don't have any examples here about press news stories about people stealing cars using these attacks. So some people will look at this and say hey, there's all these academic papers, it's theoretically weak but no hacker would ever carry this out in practice, there's no evidence that any one has broken the KeeLoq system... but now, suppose we have two academic teams, they find an attack against KeeLoq, and if an attacker finds a break against a widely deployed system like this, there is a 100% chance they will follow the NY Times or some equivalent organization. If a real attacker breaks a system, then what's the chance that they are going to call the NY Times? They don't want anyone to know they're breaking the system. So let's say there's 0% chance that the real attacker is going to call the NY Times. Now the NY Times articles are now two academic teams times 100% plus n attacks times 0% which gives us a total of 2 articles. What does this tell us about n? Some people think this means there's no real attacks. I think this doesn't tell us anything, though. If the system is weak, then it's presumably being broken by real attackers, even if we don't have leads for it.

Another example: VMWare View

https://www.youtube.com/watch?v=bT4cKwBROno&t=16m45s

Let me give you another example.

VMWare View. This one is maybe not so well known. This is one of those corporate things where you buy 100s of thousands of terminals from some company, say Dell, and these terminals are going to have little desktops where the desktop environment that the user is talking to is actually generated on some server. There's a centrally managed server with thousands of terminals all using some protocol, VMWare View, to send the graphics from the central server to the users and the user can interact with the computer system. If you look at the documentation from Dell for this system, they say switch from AES-128 to SALSA20-256 whatever that is, for the best user experience. So this is faster, the best user experience refers to the speed limit for the hardware, which means 5 megabits/second for network graphics, which AES can successfully encrypt, and yeah if you switch to something faster you can get faster graphics and users are happy. And normally I would say this is good, it increases your number of rounds and security margin, there's lots of reasons that this is a reasonable thing to do for speed and security. But let's look at documentation. Alright, I'll run through this slowly. AES-128 according to the protocol documentation is AES-128-GCM. SALSA20-256 according to the docs is Salsa20-256-Round12. Does anyone see the problem here? Authenticated encryption, says the parking vallet. Okay. AES GCM is not just encryption, it's also authenticating your messages so they can't be forged. Now, this Salsa20-256 with 12 rounds maybe.. it could include some authentication that they aren't saying anything about it, but I see no evidence that they are including authentication. As far as I know, I mean I haven't tried buying one of these things and carrying out an attack, as far as I know, I mean if it's just doing what that suggests, then this is completely vulnerable to packet forgery where AES GCM is not vulnerable. So at least from all the available documentation about this, even though switching from AES to Salsa20 generally makes me happy, switching from AES with authentication to Salsa20 without authentication, is a really bad idea. The user doesn't need just encryption, they need encryption plus authentication. There's tons and tons of examples like this where the cryptography is failing because it's barely even being deployed. The IPSec null authentication is one of the most common ways to deploy virtual private networks. You have a protocol that could have some authentication built-in, but the user actually does no authentication, and then they get surprised when their packets get forged. Okay, let's try another example.

SSL/TLS/HTTPS

https://www.youtube.com/watch?v=bT4cKwBROno&t=20m9s

In SSL, there are lots of stages to this. I will focus on the secret key part of SSL for this entire talk. What SSL should be doing is nothing at all like this, but let's start with normal AES-CBC encryption. So you have a bunch of plaintext blocks, p0, p1, p2, say 16 bytes each, 48 byte message, you encrypt each block by applying AES to each block, you add something to each block, you add the previous ciphertext block c0 to p1, and then feed that into AES to get c1, and then add p2, feed it to AES again and get c2, and you send it along again. At the beginning, the first p0, there's no c sub minus 1. So what you're supposed to do is generate a random number, like encrypting a nonce, but somehow you generate a random number, and somehow the other side knows your random number, you send along the random number or have them implicitly know the nonce and somehow the other nonce knows v, and you add that v to c0. That's standard AES CBC encryption.

So here's what SSL does. You take, well, it's very much the same thing. You take each ciphertext block and add to the next plaintext block. And then feed it through AES to get the next ciphertext block. Except at the very beginning, instead of taking a random number, SSL takes the last ciphertext block from the previous packet. So it sends a packet along, c sub minus 3, c sub minus 2, c sub minus 1, and then it gets the next uh packet to send from the user, p 0, p 1, p 2, ad adds to that last previous c sub minus 1, which we send along in the previous packet, sends that, adds that to c0, and then encrypts on that. The problem here is that the order of operations is very important. Because the attacker can see this last ciphertext before choosing this next plaintext, the attacker can therefore choose p0 as a function of c sub minus 1, in particular can choose p0 to be c sub minus 1 to be exclusive or something where the something is allowing the attacker to guess some previous value. This is none of the random collisions you were hearing about in the other talk; this is the attacker actively saying I got to guess for p sub minus 3, and I'm going to take guess and exclusive OR (XOR) it with c sub minus 4, so that was what was encrypted before, if the guess was correct, to get c sub minus 3, and then I will XOR with c sub... so if the attacker is able to control the next plaintext, if he does this, then the AES encryption of ... which is exactly what SSL says through AES right here, that's exactly the same as c sub minus ... which is encrypted to get c sub minus 3. So the attacker compares these and then checks if the guess was correct. As long as there's not much entropy in this c sub minus 3, so it's a complete failure of this.

2002 Moller
2006 Bard: malicious code in browser should be able to carry out this attack, especially if high-entropy data is split across blocks.
2011 Duong-Rizzo "BEAST" fast attack fully implemented, including controlled variable split

Bard said in 2006 that if you had something in applet, there's no clear obstacles to carry out this attack and it could repeatedly carry out guesss. And this was eventually implemented some years later. It was called the "BEAST" attack against browsers, a browser exploit against SSL/TLS and this really does work, it gets something like a cookie out of the browser in something like tonight.

https://www.youtube.com/watch?v=bT4cKwBROno&t=24m4s

Countermeasures

Browser vendors will look at this attack and say okay, what are we going to do? We get this p0 p1 p2 from the user. Before we encrypt that, before we add p0 to something, we're going to insert another packet before that, which doesn't have any data, it has a message authenticator only. So a browser sends an empty piece of data with a message authenticator, and then p0 p1 p2 with a message authenticator, and because there's an extra message authenticator before the p0, that randomizes the ciphertext block that is used to add to the p0. So the packet that gets sent is committed to before the c sub minus 1 is generated. And that's enough to make things secure. It doesn't quite work because empty data actually breaks Internet Explorer. What browsers do instead is they take one byte of the legitimate packet, send that along, and then send all the remaining bytes, and that's probably secure and it's actually deployed today.

Now the attacker instead of controlling the plaintext could try to control the ciphertext and send along forged network packets. But that doesn't work because each packet includes an authenticator. Each packet has a MAC which is protecting against forgery. And the way this works is that SSL takes the legitimate data, puts an authenticator on it, and oops CBC is supposed to have 16x times some number of bytes, or number of blocks, so SSL takes the authenticated data, puts some padding on it, and then encrypts with CBC. And that padding is something like put on 1 1s or 2 3s or something to get to multiple AES blocks. It's a little complex, but you can look at it and prove.

2001 Krawczyk

Krawczyk's paper looked at this and said this is provably secure. This is like a moment in an action movie when there's a fight going on and suddenly the camera is looking at some barrels full of oil and you just know they are going to explode. So this is provably secure. What a day.

2001 Vaudenay
2003 Canvel

In 2001, I think this paper was presented at the Crypto rump session right before Krawczyk's talk (laughter). In Vaudenay's paper, he said this is completely broken if you have a padding oracle. If you can tell the difference between a padding failure on decryption and MAC failure, so the receiver has to decrypt the CBC and then check the padding and then check the authenticator. If there's a way to tell the difference, such as different error messages for different failures, then in almost no time the attacker can figure out any plaintext for any ciphertext block that he wants. Now this if in the statement was as far as I know first demonstrated to be correct by Canvel in 2003 who said, you know, there is one of these padding oracles where here let's watch the time the server takes receiving this data. If the padding is incorrect, it's fast to reject that block. If it's correct, then the server has to compute a MAC and that takes longer. So by watching the timing of the receiver of this SSL encrypted data, you can get exactly the padding oracle you need for this attack Vaudenay's attack to work.

2013 AlFardan-Paterson Lucky 13 - watch timing more closely, attack still works

What browsers do is, well, they say let's always compute the MAC so there's no longer this timing difference between a padding failure and this MAC authenticator failure. And just a month ago (2013) Alfardan-Paterson the lucky 13 attack said okay we can still break it because there's still some small timing differences between having the padding failure and having the authenticator failure.

So what does SSL do about this? There's obvious reactions like, okay really try to control the timing, it's 100s of lines of code and there's all sorts of bugs just to deal with timing issues. But SSL/TLS has an alternative to this which is called cryptographic agility, which is a marketing stunt which has two parts. The second part first. Cryptographic agility means you have some button which says press in case of emergency and then you will switch to different crypto. Nobody has ever tried the button. Whether it works doesn't matter because it's a marketing stunt. Then the other part of cryptographic algorithmic agility is that, because you have that emergency button, you don't bother having good crypto. You don't care whether your crypto is secure. You just say well if there's a problem just press the button. As an example of the button not working, there's AES GCM which has all of its own timing problems, but at least it would get rid of the basic problems with CBC ... SSL in theory can switch TLS 1.2 can switch to AES GCM. But it doesn't work because if you try turning it on, you find that 90% of webservers and basically 100% of browsers don't actually understand it. Okay, so what do you instead? There is one alternative to AES GCM or other GCM... which is supported by all the clients and servers out there, which you can turn on, it's the emergency backup plan in case of AES GCM failure, and it's switching to RC4, and now more than 50% of SSL connections on the internet are using RC4 and lots of people are recommending RC4 ((note: don't use RC4)). Because it's much less fragile than AES CBC.... There are even statements from 2001 Rivest saying... the new attacks do not apply to RC4... SSL does not do that, it takes the whole public key set, and then it does a reasonable hash to generate a one-time RC4 key. Good, sensible, does not need related key attacks. Attacks against web don't respond to RC4 .. However, some of the problems with RC4 are a reason for concern. There's all these biases in RC4 output bytes. At this point I would like to advertise something I have been doing with AlFadan in a paper called "On the security of RC4 in TLS" where it says okay we can do something very much like "BEAST" was doing where instead of targeting AES encrypted cookies with CBC, you target RC4 encrypted cookies; and then you have the same cookie sent through lots of RC4 sessions, then use the biases in RC4 to figure out what the cookie is from all these ciphertexts. What are these biases? Well, if you were paying attention yesterday, you heard some of what I'll say here. The best known one was from 2001 Mantin-Shamir which is that the second byte of RC4 output is biased towards zero. It has a probability of 2/256 of being zero, instead of 1/256 you would expect. And then 2002 Mironov looked at z1 which found z1 was biased away from 1 towards 2 and it had a totally weird distribution of the first output byte of RC4. And then much more recently, 2011 Maitra-Paul-Sen Gupta observed that all the 255 bytes have more than 1/256 chance of being zero, which was contrary to Mantin-Shamir claim that the chances were balanced. And then in 2011 Sen Gupta-Maitra-Paul-Sarkar, the key-length dependent bias, z sub 16, 16th output byte is biased towards -16, assuming you were using a 128-bit key. That's not nearly the end of it. With my coauthors, the conclusion were that there are 256² biases in the first 256 bytes of RC4 output. Almost 2²⁵⁶... so for every possibility of the ith output byte, what's the chance that the byte output is 0? None of those probabilities are 1/256. So as a result of this, you can use statistics to attack SSL. The paper from yesterday, 2013 Watanabe-Isobe-Ohigashi-Morii, some of the biases are listed here. Let me show you some pictures of what the actual probabilities look like.

https://www.youtube.com/watch?v=bT4cKwBROno&t=33m56s