Greg Maxwell (gmaxwell)
- slides: https://people.xiph.org/~greg/gmaxwell_sfbitcoin_2015_04_20.pdf
- video: https://www.youtube.com/watch?v=Gs9lJTRZCDc (1h 23min 12sec)
A deep dive with Bitcoin Core developer Greg Maxwell
The blueberry muffins are a lie. But instead I got some other things to present about. My name is Greg Maxwell. I am one of the committers to Bitcoin Core. I am one of the five people with commit access. I have been working on Bitcoin since 2011. Came into Bitcoin through assorted paths as has everyone else.
Back in around maybe 2004 I was involved very early in the creation of Wikipedia and system administration for it. I had made some very vigorous arguments in a very contentious political debate. People argued that there should be no Wikimedia Foundation to act as a non-profit to run Wikipedia, but instead it should be completely decentralized. Having worked with cryptosystems before, I was an early user of Hal's RPOW system. My position was that decentralized consensus was impossible. I presented this wanky sort of physically-oriented argument about why, in the broad sense, a system with no admissions control could not achieve guaranteed decentralized consensus.
I spent some years working for a municipality building out broadband in an area that had none. I spent 8 years working for Juniper Networks deploying some of the largest networks in the world. I spent a number of years working at Mozilla. I was working on my secondary passion for about 15 years, which was working on royalty-free multimedia codecs. I am one of the developers for the Vorbis format and the Opus format. I worked on Theora and some other things there. Part of my motivation in that area is taking this zillion dollar industry of codec licensing that inhibits people from publishing on the internet...
I didn't look to see how Bitcoin worked because I had already proven it (strong decentralized consensus) to be impossible. I downloaded it but didn't look into it. I was surprised a year later to find out that it still existed. I read the source code, it was only about 3k lines of source code. It had achieved something not quite as strong as I was looking for but still close, so I thought maybe bitcoin could actually be something. It had some cool attributes. It was a cryptosystem, and these were areas that I were already interested in. It involved very sophisticated concerns about software security. It could radically change the face of finance in the world. It could have an effect on trillions of dollars. I am always looking for areas for where I can apply myself with lots of leverage. Where I write a little bit of code, and there's big impacts.
More recently, with a number of other people in the bitcoin ecosystem, I founded Blockstream, a company working on serious bitcoin projects down in the weeds. This was a hard decision for me. I wanted to keep the coin in the box. It calls on many different types of interests and kinds of people. I am excited to come to events and see people with radically different views from myself. That's necessary.
I put in maybe 100x the amount of hours spent on review and analysis than I do on actual coding. My work is often work that can make others' work more effective. I do some coding from time to time, though.
A lot of the things that interest me the most is in the minutia where I have to explain the explanation for the explanation. It is hard to pick topics to talk about. I thought I would cover three general areas. I want to give an update on things going on in Bitcoin Core right now. Then there is a prospective topic, things we are looking around in the future. And then one that is more philosophical.
So, on privacy. This has surprised me in the past but not anymore. Some people say "privacy, what's that for?" Well, privacy is a central characteristic of money. Sometimes we do not appreciate the importance of privacy in money. Cash is inherently private. Bank accounts are not private to the bank or regulators, but they are private to the rest of the world. The idea of a public ledger is totally unlike the kinds of money we have used in the past.
You can think about this very pragmatically. If you are using a public ledger, then your competitors know your customers, suppliers, sales volumes, amounts, prices, etc. This causes leverage imbalances in contracts. So your income goes up, and now your landlord tells you to give him more. It creates a breakdown of personal harmony when your inlaws and neighbors are looking at your purchases. If your conman knows you just bought something, he can call you up and threaten you. If someone sees that you have a lot of money, then they know to target you.
Another thing that privacy is important for is that it goes hand in hand with fungibility. One bitcoin is the same as every other. If we break fungibility... fungibility is like the definitional requirement. If fungibility is broken, then it is unsafe to accept bitcoin from someone. If you have a central blacklist, then why bother with bitcoin anyway? This has to be traded off with other considerations. You can build transparency into bitcoin. You can layer accountability and privacy on top of a private system. But trying the other direction is much harder or impossible.
You can build provable transparency in a way that you couldn't have done before. You can layer accountability and privacy on top of a private system. Giving people privacy on a system that is not private, is much harder.
This goes much further beyond the practical day-to-day privacy against your neighbors. There are significant civil rights concerns here. To effectively seek political power, you must be able to spend money. Some political forces stop others from spending money. This is not a radical political view. Money is important to our ability to speak according to multiple Supreme Court justices.
Why privacy in Bitcoin Core? I personally view Bitcoin Core as a best practices implementation. We're not necessarily trying to be the most risky. We are taking a serious approach to building software that is very high quality. Another aspect is that Bitcoin Core is a full-node. It's a system that autonomously validates all information without trusting anyone. One of the things that occurred, and this surprised me, I have been asked by companies and researchers to not fix privacy bugs. I think this is short-sighted for all of the reasons I just said. In Bitcoin Core, we would not intentionally add a privacy-harming misfeature. A missing fix is invisible, though, and needs to be avoided. Not fixing a bug is equivalent or even worse. A missing fix is invisible. It should only serve the interest of its users.
There has been recent public drama about surveillance attacks in the bitcoin network. Passive surveillance has existed since at least the beginning near 2011. This has been going on for a long time. People run nodes that serve to deanonymize users by trying to trace transactions in the network. Recently there have been farms ("sybil nodes") spun up where nodes try to gather more traffic so that they can analyze it better. Bitcoin Core has basic protections, it wont connect to the same netgroup multiple times. Those protections are only basic. Any weakness in this domain encourages wasting network resources. When people try to connect to everyone, that waste resources. If we make the system more private, there's no incentive to connect to everyone and waste capacity.
So one long-standing advice is to run Bitcoin Core over tor. Tor is great, it's not perfect because it's weak against state actors, it's vulnerable to denial-of-service. It's usually strictly better than not using tor. There has been some discussion around some of the sybil attacks like, they are unlawful because they violate the Computer Fraud and Abuse act or that they violate various e-privacy directives. And that may be true, I have no opinion on that. But in the space of bitcoin, we have to defend against people who don't care about their public image and they don't care about the law or what the law says. And so since we have to do that, our most powerful tool to defend against that is technology. So if people want to pursue other avenues to make people behave, that's fine, but my approach is to ask how can we improve the technology.
So one of the areas we have been looking to improve is with respect to "eclipse attacks". This is where people try to get your node to only connect to them or mostly connect to them by sybil attacking you. Bitcoin Core keeps a table about peers and nodes you can connect to. It is hard to gain extra space in that table. There have been some recent simulation results that reveal that it wasn't quite as good as we expected. There were some places where randomization in the algorithm actually allowed an attacker to gain a greater concentration. So there is a paper recently published on this, and prior to the publication of the paper, the authors worked with us and we implemented a half-dozen set of fixes. We expanded the tables, didn't promiscuously take up extra advertisements. These fixes will all be available in 0.10.1 which should be available in a couple days.
In general, people trying to connect to a lot of the network has been a long-term problem. There are some things discussed and in the works to dissuade this activity. We might have a proof-of-work system where you can get prioritized connections to the network by doing proof-of-work. So it's more expensive to use up capacity on the network. I proposed a proof-of-work scheme like this. Sergio has another similar scheme. They are all kind of complicated and don't completely solve the problem. To clearly solve the problem, you would deploy it right away. The combination is messy. It's still in the research phase.
I have been working on a tool that you can run alongside Bitcoin Core where you have peers that you trust (your friends, whatever) and they can compare notes without telling each other who they are connected to, but you would be able to determine who is connecting too much to the network. At the moment people are doing this manually to determine who to ban.
There are a number of other network-level improvements that impact privacy. There was an issue in Bitcoin Core today where your connections out through Tor would go through the same circuits as your browser going through tor which is not good. There is an information leak where someone can send addressed messages to you, and then there's timestamps, and this allows them to learn more about the network topology and run partition attacks. They can also see the graph path of how the transaction propagated through the network. We have also been working on a way to do batched transactions as one of the ways to increase privacy.
One of the features that has gone into the 0.11 branch right now is this wallet relay improvement. Right now with Bitcoin Core, receiving transactions is completely private. You don't send anything to identify that. When you send, someone could use that to identify what you're doing or what your transactions are. If your transaction doesn't immediately get into the blockchain, you re-broadcast it periodically. I think that some people have used this to trace users in the network. There is a new flag in 0.11 that turns off wallet relaying. Manually sending transactions is not very useful, but it means that someone can create a program beside Bitcoin Core that could use another method to relay the transaction. So it could use a high-latency network like MixMaster or BitMessage. And the cool thing about it is that it could be separate. You don't have to learn about developing Bitcoin Core, you can just run it alongside Bitcoin Core and write it yourself. We might pull in useful contributions there into Bitcoin Core.
Another area that we are working on in privacy has privacy implications.. We have been working on making it easier for users to run full nodes. Full nodes have fundamental privacy advantages. The SPV clients like electrum are fundamentally weak from a privacy perspective. The bloom filters uniquely identify the wallet. It is fundamentally weak. You might think you're private, but you're not. Electrum sends the server a list of addresses, and anyone can run an electrum server. You can easily spy on the electrum network. It's necessary that we have light nodes. It's the only way you're going to run bitcoin on a cell phone right now. If we make it easier for more people to run full nodes, then the world will be better for it. Right now to run a full node you need 30+ GB of disk space. There's also some behaviors in Bitcoin Core around bandwidth bursting that might not play well with consume routers. Sometimes there's buffer bloat and then the router will have lots of latency.
In bitcoin 0.11, we have pruning. This allows you to run a full node that is fully private but it does not contain the full blockchain. You can have just a gigabyte of space. We have some plans for automatic bandwidth ratelimiting. The main question I have here is whether there's a way to make this self-tuning. I'm not sure if I am going to be able to do that.
There are still things that we would like to see in privacy. The wallet and coin selection algorithm is fundamentally bad for privacy. It is incompatible with address reuse. If you reuse addresses at all, you blow up your privacy. The wallet just needs to try to avoid linking addresses. We have code in the code base that traces the links, but we don't have enough testing for the wallet infrastructure right now and it's hard for developers to be confident that they aren't breaking everything.
There has been a ton of development around coinjoin, and this is a casual privacy improvement in transactions that I described a year and a half ago. There's no implementation of coinjoin in Bitcoin Core yet, but there are many out on the network. There's some research results that show that there's a significant number of coinjoin transactions happening which I am happy to see. The design for coinjoin systems is still being fleshed out by some people, but it's still not as mature as I would like to see before including an implementation of that in Bitcoin Core.
There's also this thing that's called "stealth addresses". And I really hate the name, it's intentionally "edgy" and it's really doing something that's quite pedestrian. It has been promoted by the Dark Wallet folks. I like to call it "reusable addresses". It's a thing that we have been talking about for years, we were calling it "ECDH addresses" previously. The notion of these stealth addresses is that you give someone an address, and every time they reuse the address, there's sort of a randomly generated different address that appears in the blockchain so those transactions are not linked together. There's an existing proposal for this, it's basically unmaintained, it gets a bunch of things wrong. And the basic design right now makes SPV privacy problems even worse. So it's very difficult to deploy a new bitcoin address style on the bitcoin network. We created P2SH back in the beginning of 2012 and it took years before wallets started to support it. So we want to act very intentionally with this to make sure that we implement the right spec and make sure we do not have people gyrating on different approaches here.
Alright, so, that's some of the things going on with privacy in Bitcoin Core. Now I am going to switch gears and talk a little more about forward-facing technologies. So when we deploy new technology for the bitcoin network, we have to think far into the future for a number of reasons. For one, it takes a lot of work to deploy a new system or new tool to a big distributed network. Also we need to make sure that our changes do not interfere with other future changes. So I have been giving this a lot of thought with a number of other people about what kinds of cool things that we could do to make multisignature more powerful in the future. We have come up with some criteria about things that are good to have in multisignature schemes.
I think everyone here is already familiar with multisignature. It's really solving a fundamental problem, that in bitcoin there is no other recourse than the network. If someone steals your bitcoins, your bitcoins are stolen. You can't go get a clawback. And also, computer security is a joke. There's no trustworthy devices. Everything has malware. Everything is remotely controllable by someone. The idea with multisignature is hey well maybe if we take multiple devices they won't all be compromised at once, and we could get some actual security. You could define an access policy, such as these coins with these spents and if A and B agree. Or these coins can be spent only if these two parties agree, or any two out of three parties as defined, and so on. Multisig has been in bitcoin since the very first day. We added some features to make it more accessible with P2SH. That took years to get deployed. It's important to think about this to get the pipeline going.
One of the problems with multisig today is that it's costly. If you use 2-of-3 today, it increases your transaction sizes by a factor of 2.5 roughly. And that means 2.5x transaction fees and it means a reduction in total network scalability. And that also has a direct impact on the decentralization of the network because the more expensive it is to run nodes on the system, the less people will run them, and the more centralized the system becomes. So we want to have a good handle on this. The bigger your multisig, the more your cost is. And so there's a contention where your security says use multisig policy, but practicality says no you're not going to do that. It would be nice to improve that. And we can improve it.
So I want to talk about some cryptosystem background stuff to let you understand how we can improve this. There is an alternative to ECDSA called Schnorr. And Schnorr is older than ECDSA and it's simpler, more straightforward, and it has robust security proofs and it's a little bit faster as well. But Schnorr was patented, and as we have seen in the history of cryptography, patenting is poison. Any patented cryptosystem sees virtually no deployment at all. People would rather be insecure than deploy a patented cryptosystem. And of course, patenting is actively incompatible with decentralization because the patent holder owns the technology. So in any case, the NSA came up with a nice workaround to the Schnorr patent. ECDSA is very similar to it, but not algebraically equivalent. And the world has deployed ECDSA, but Schnorr still exists and has lots of academic work happening on it. One of the cool things about Schnorr is that it can do multisignature in a very straightforward and scalable way. It's sort of like.. Schnorr multisignature works the way that an idiot would tell you it works even if they knew nothing about cryptography. So the way it works is that if you want to have a 2-of-2 signature with two parties, and you add together the two pubkeys and to get the 2-of-2 signature you add together their signatures, and that's a 2-of-2 signature in Schnorr. And there are some details for actually implementing it, but that's the basic idea. And it just works. And it gives you a 2-of-2 signature. Not only does it give you a 2-of-2 signature, but this Schnorr scheme can be extended to give you an arbitrary threshold. Actually arbitrary cascade of thresholds, an arbitrary monotone linear function. You can get any policy you want. You can't distinguish it from 1-of-1, it's the size of 1-of-1, it scales like 1-of-1. Awesome, efficiency solved.
Pieter and Andrew Poelstra went to started to implement this. Pieter started with a Schnorr verifier and then Andrew went to make a key-sharing tool to do thresholds. We realized that in order to make a threshold Schnorr key, the signers have to collaborate to generate the pubkey. You can't derive a threshold key, the designers have to interact. If the threshold is big, they actually have to interact a whole lot. That's a little problematic. We have seen in the bitcoin ecosystem that there are a lot of cool things you can do with Script, but if you need a complex state machine to ... then people don't build the client or software to use the scheme. So that's sort of fundamentally worse than what we have today, even though it's more efficient. What other criteria should we be thinking about when selecting or creating a signature system?
One of them is accountability, or so I thought. This is where in the bitcoin multisig system today, you can see who signed a transaction. When there's a 2-of-3 signature in bitcoin today, using P2SH, you can see who signed a transaction. This is actually kind of important because what if one of these 2-of-3 signatures is applied to a transaction you did not authorize? You want to know. And not only do you want to know, you want to be able to prove to the world that they did it, you might want to sue them, you might want to discredit them. You want to communicate about this. This is a useful property for a multisignature scheme to have. The Schnorr signature scheme does not have this, you can't tell which of the signatures signed because it looks exactly like a 1-of-1 signature. So this is a criteria that would be useful to have.
Another useful property of a signature scheme is usability. Many of the multisignature schemes require round-trips between the participants. In the bitcoin multisignature scheme today, you can send a transaction to the first signer, they sign it and then send it to the second person to sign it, and they can do this all the way to the end and then put it in the blockchain and you're done. No roundtrips. With n-of-m Schnorr you can basically do that. You need one round-trip to establish the knowns which you can do in advance. But you have to have lots and lots of round-trips to establish the threshold. There are some other schemes that require many rounds during signing. You would have to go to and from the safe during signing, basically, to complete your transaction. Nobody is going to use that. And building the software to support that and teaching people how to use that would be a real barrier. So usability is one of the other constraints we have to worry about.
Another one is privacy. I talked before about why privacy is important. In the context of multisignature, privacy is useful because if an attacker knows your policy, he knows what to target. If he sees that you are 2-of-3, then he has more information about what to look for which may lead to him kidnapping specific necessary people to coerce them into signing or stealing their private keys. Maybe it's 8 people he has to go and kidnap, and that's a different tradeoff. If you are using an odd policy, then people can trace your transactions which has commercial implications as I mentioned earlier. Seems like privacy may be incompatible with accountability, but it's not true. Accountability means that the participants and the people they designate need to know what's going on in the transaction, and privacy on the other hand is a statement about third-parties. So this Schnorr signature stuff has great privacy. Nobody can tell what the policy is, except for the participants. But it has worse accountability. And bitcoin today, has great accountability but very poor privacy.
So there's some papers recently about threshold ECDSA. And this is fancy cryptographic techniques to do the same stuff as the Schnorr multisignature, but using the existing bitcoin infrastructure that has already been deployed today. This scheme has a limitation. It fails on usability. Also like Schnorr multisignatures they have no accountability. But it works in theory today, already. There's no implementations right now that don't require a trusted dealer. But this may be okay for situations where you don't care about those implications. Now I have to say that the first version of the paper of this said that you could do this without a trusted dealer, and I argued with the authors and they eventually convinced me that yes we could really do that. And then they retracted their paper and said no, you really need a trusted dealer to generate the keys. They have since gone back and come up with a scheme that I believe, without their convincing, will work without a trusted dealer but no one has implemented it yet. I am not going to talk further about this. It may be interesting, but it's not the long-term interesting stuff.
So I want to talk about a couple of schemes that I have been working on and coming up with that give different mixes of these usage criteria. One I call TREECHECKSIG. And we we start with the observation that n-of-n all-sign Schnorr multisignature meets the criteria of it's efficient and it doesn't require a bunch of roundtrips, and it's actually completely attemptable because if they all signed, then they all signed. A larger threshold like a 2-of-3 signature... could be satisfied by any of 3 2-of-2s. So you can enumerate all of the possible satisfactions for the thresholds, and there are M-choose-N of them, and you can build a hash tree over them, like we use for SPV proofs in bitcoin, and then in your signature you can prove to the network that this pubkey is from this set of pubkeys and you provide just the N-of-N signature. Now this is interesting because it scales fairly well, it has improved efficiency although not the same efficiency as 1-of-1, it's completely accountable, and the parties know who signed. If you randomize the keys in it, the only thing that someone can learn is the size of the upper bound of the threshold and it's relatively cheap to add one hash and double the size. So privacy is pretty good too. The verification efficiency is great, it's basically constant time to verify, it's basically checking a signature and then some hash verifications. The real problem with this scheme is that if you want to talk about a big threshold, like more than 20 participants, the tree becomes so large that you can't compute the pubkey. Now the network doesn't have to do this, but the participants do. But this becomes impractical quite quickly because of this binomial blowup.
MULTICHECKSIG was a idea that was trying to fix this. Instead of building this big hash tree where you precompute all of the satisfying combinations, why not have the signer show the network all of the M pubkeys that are participating, and then have a verifier compute the N-of-N? So I can take the pubkeys and the verifier on the network, and say okay this set is signing, the verifier can add up the pubkeys, and then provide a signature for that. This is good accountability, but it's not private because the network can see who is signing. And it works with one pass so it's usable. The size isn't great, it's always larger than the tree version, even though that tree version has that binomial blowup in it. Verification is fast, and it has an okay set of tradeoffs.
And then taking from this idea, I thought well could we do better. So I came up with this notion of POLYCHECKSIG where the idea is to take this MULTICHECKSIG and instead of revealing the pubkeys of the participants in the signature, can we reveal a linear formula of pubkeys in our signature, and then ask the verifier to do some linear formula on our pubkeys to compute the keys to be verified. So I can show how this works concretely. Say we want to do a 3-of-4. So 3 people, 1 not signing. We need to compute an M-of-N pubkey that has left out one participant. So we make our two pubkeys, it's the sum of the participants. And then we tell the network another public value, which is the sum of the participant A plus two times the sum of participant B's key plus three times participant C's key and plus four times participant D's key. So if you know how to sign with participant A, then you can be fine with 8 times A or any other constant, it's just multiplying a constant. Then you can ask the network, hey we want to do this signature and we don't want C to participate. The network would compute a new pubkey (here on the slide denoted P sub V). If you write this out, this value is the same as .. there's no C term. It canceled out. This can be extended by adding quadratic and cubic points. You can cancel out an arbitrary number of values. You can have a threshold.. M - N plus 1 is the scaling. You can send these values to the network and encode them in an unbalanced hash tree, so you only have to reveal to the network just as many as you're going to cancel. And what that means is that you might have 50 of 100 signature, but if all 100 participants are available, you compute that as a 100-of-100, and you reveal only the first term of the polynomial, and then you provide that 100-of-100 signature, and your transaction looks like a 1-of-1. And so you get perfect efficiency in the case where all of your cosigners were available. And perfect privacy in that case because you revealed nothing about your actual policy. If you need to reveal more people because some signers were offline, you can do that and you leak a little bit about your policy.
I mentioned in the list of features, composable. Composable is this notion is that it should be possible for you to have your own policy and you have your own policy, and neither of you should care about what your own policy is. I should be able to create a policy of your policy without knowing the details. You should be able to use 2-of-3, and the other member should be able to use 3-of-5 or whatever. I should be able to make a 2-of-2 of you. And I shouldn't have to care. These schemes themselves do not do anything for composability directly. But we can overlay a higher level higher scheme that achieves composability and what we found when exploring this is that if the higher-level scheme is only able to express a monotone boolean function, that is to say signatures where someone extra-signing will never make your signature untrue, then it is quite easy to write software to handle unknown parts. You can write software that will sign parts that it knows, and not worry about the rest. So we think that will make, if we overlay a scheme that does this on top, then we should be able to get something more composable, but we haven't really explored this whole space yet.
I have a sort of comparison chart here, and if you notice all of my slides are wordy. Just to give you an idea of how the schemes scale, just look at this chart.
scheme, accountable, usable, private, comms bitcoin, y, y, n, 0 + 0.5 schnorr, n, -n, y, prop (n,m + 1) TREE, y, y, ~y, 0 + 1 MULTI, y, y, n, 0 + 1 POLY, y, y, y, 0 + 1
scheme, size, 2-of-3, 13-of-15, 50-of-100, 990-of-1000, CPU bitcoin, 34N+74M, 250, 1472, 7100, 107260, M schnorr, 34+74, 108, 108, 108, 108, 1 TREE, Ig(B(M,N))*32+74, 172, 332, n/a, n/a, 1 + 0.01*N MULTI, 34N+74, 176, 584, 3474, 34074, 1 + 0.01*N POLY, <=(M-N+1)*34+74, 142, 142 - 176, 142 - 1808, 142 - ??, 1+(M-N)/2
explanation of the chart (at 35m 55sec)
It's not clear how this will develop. Some of these ideas are very complementary and can be merged. Expect to see some more development on this in the future.
The art of selection cryptography: So I want to talk about now about a thing that I am calling the art of selection cryptography. And I'm not using the word cryptography here; this is more the philosophical selection of my talk here.
Before I can tell you what selection cryptography is, I think I need to redefine cryptography. The definition that people use today is broken. It's wrong. You go to Wikipedia or any dictionary and they will say that cryptography is secret writing or deciphering messages. That definition has nothing to do with many of the things that we today, like digital signatures, zero knowledge proofs, private information retrieval, hash functions, it doesn't talk to things like cipher suite negotiation in TLS which has been a constant source of vulnerabilities. You look at TLS and say, TLS is cryptography, it's a cryptographic protocol, but the dictionary says only the ECDSA part is cryptography. That's ridiculous. And bitcoin itself, too. You can build a bitcoin node today with absolutely no cryptography in it. The only cryptography that we use if you go by the dictionary definition is wallet encryption. And then you never send the messages to anyone else.....
So to explain my explanation, I want to take a step back and sort of give my view on the world. Back in the early 90s when I was sort of politically coming of age and on the internet, I was very excited and involved in the cypherpunks group and the activity around the prosecution of hackers and the export of encryption software. And this politics or religion that the internet would change everything. There was this rallying call, "information wants to be free". And I knew in my bones that this was true. We were going to use computers, which turn everything into information, and we would use networks to hook all of the computers together and we would change the world. We were going to change the power balances, make more people more empowered and everyone would have access to the world's knowledge and they would all fulfill their potential. That's a very political take on something that I now think in fact is better described as a law of nature. This is not just that I want information to be free; no, information really does want to be free. It is fundamental that information will percolate out into every little nook and cranny, and you can't control it. The result is that often bad things happen because information wants to be free. Sunlight is the ultimate solvent, but solvents corrode. So my email wants to be read by the NSA. When I try to login to my server, it can't tell me from you, because you can just replay my login. And now you're logged in as me. When you go and browse the internet, people learns how it works. They see inside your mind what used to be completely private. When I go to research something, marketers can send out cheap spam, and that spam is just as visible as the information I seek. If I want to build a digital cash system, I can't, because information is perfectly copyable. And all copies are just as good. Money that you can just copy is not much of a money at all. So you have an environment where there are powerful parties that have more ability to use this fundamental nature of information, and this goal of everyone being more empowered may not come true.
And so, I would like to propose this definition of cryptography that says that cryptography is that art and science with which we try to fight this fundamental nature of information. We try to bend it to our political will and to serve our moral purposes. And to direct it to human ends against all outcomes and all eventualities that may try to oppose it.
"Cryptography is the art and science we use to fight the fundamental nature of information, to bend it to our political and moral will, and to direct it to human ends against all chance and efforts to oppose it."
This is a sort of broad definition. It encompasses everything that we should properly call cryptography, and a number of things that we haven't yet traditionally called cryptography, such as computer security, or even sometimes the drafting of legislation. I don't really offer this lightly. I have thought about this for a long time and I think this definition leads to pretty good intuitions about the kinds of things that have cryptographic considerations.
So often, as technologists, we get excited when we have a cryptographic tool to solve this problem. You want to read my email? Bam, encryption. You want to track my stuff, bam private information retrieval. Bam, digital signatures, I'm going to solve all problems with some cryptographic tool. You can fight back against things you don't like in the world with a bunch of math. That's really cool. But sometimes we get caught up in the coolness of that and we forget that we are really fighting the fundamental nature of information. (41min 56sec)
And it's hard. It's so hard that it may not be possible to make secure cryptosystem. They are all predicated on a set of strong assumptions. We assume that some mathematical problem is intractable. Over time we have seen that many cryptosystems have been broken. Few people believe it to be the case, but it may be fundamentally impossible to build strong cryptography. If you were able to build a provably secure asymmetric cryptosystem with no strong assumptions in it, this would be a proof that P != NP, you could win a million dollar prize to prove this. You could still build insecure cryptosystems; building securing cryptosystems is actually harder than just figuring out whether P isn't equal to NP. So don't expect anyone to solve this soon.
A really important point here is that attacks on cryptosystems are themselves information that want to be free. We often underestimate how powerful computers have become because our software is so bloated and slow and has many layers. You can imagine that a computer is sort of an intellectual equivalent of someone who is doing arithmetic for you, but a billion times faster. So if you can attack a cryptosystem by applying a lot of force, computers are a force multiplier. Everyone who is attacking your cryptosystem, if they have a desktop computer, it's like them having an army of a billion imbeciles. They might be imbeciles, but there's a billion of them. And that's before they get a botnet. And then they have a hundred thousand times that much computing power. Or the NSA data center. So if someone is able to attack your cryptosystem and reduce it to a state where it is still a huge haystack that they are searching for a needle in, they can then apply a lot of computing power to go further. We can even use the computing power to search for complicated algebraic solutions for the systems as well. It's not just the number crunching. It actually expands our intellectual capacity to attack the systems. This more strongly favors attackers than it does defenders in general. In order to build a secure cryptographic system, we have to secure it against any eventuality. And so as a result, virtually everything people propose ends up being broken. This is certainly true for everything I've touched. There's a whole subfield in academia about provable cryptography. People get confused about what provable cryptography means. Provable cryptography is about cryptography that is secure as long as the proof is right and the assumptions hold. Well why wouldn't it be secure if the assumptions hold? Well it turns out that's actually hard to achieve too. And many things that occur in provable cryptography, there's a pressure to publish a proof. So the easiest way to get a proof is to adopt a stronger assumption. There is a lot of provable cryptography that is broken because they adopted assumptions that were wrong, they sounded plausible but they turned out to not be true, or they proved some vacuous property that did not map to security in a practical sense.
I don't mean to sort of say that cryptography is the only thing that people do that is hard. Civil engineering is a tremendously difficult discipline and there are lives on the line if a building doesn't stand up. But usually in civil engineering you are more worried about a limited set of natural causes and you're not generally worried about the billion army of imbeciles and all of the world's efforts to nearly effortlessly attack you. If you ask someone to build a building that cannot be taken down through all the force in the world, they would tell you that you're nuts. They would probably ask for a trillion dollars first, and then tell you that you're nuts, but still take the trillion dollars. We are only able to think about cryptography at all because we can use software. Software is a great building block. We have tremendous tools to write software that is more complicated than anything else. A very complicated piece of mechanical engineering, something like the space shuttle, on the fringe of what we can do as a civilization, has on the order of something like 200,000 parts. But if you look at a conventional piece of software that you use every day, say Firefox, that's 17 million lines of code. Almost any one of those lines of code could be undermining your security. Typical figures for defect rates for software in the industry are numbers like 15 to 50 bugs per thousand lines of code. The number varies a lot; maybe for software where people care a lot it is more like one bug per thousand lines of code. But one per 1000, and we're talking about software that has 17 million lines of code? A complete GNU/Linux desktop is like 600 million lines of code. So software, despite our awesome tools to build it, is very buggy. Making it cryptographically secure is even harder.
"Software testing is making sure that your software does what it is supposed to do. Security testing is making sure that is all that your software does." And that is fundamentally harder.
I have hopefully impressed on you that this is a hard area. This is not news to me. There is this adage on the internet that goes like, "Never write your own cryptography". Because people did appreciate that it's hard and everyone gets it wrong. But I think that's bad advice. I call that the abstinence-only approach to cryptographic education. And one of the results is like provable security stuff, if you tell people to never write their own cryptography, they will go off and redefine cryptography to be some narrow part, like yes I didn't reimplement AES... So some people have reimplemented AES and had only minor attack problems, but rarely are systems broken by people reimplementing underlying cryptographic primitives, although there's plenty of potential to do so. Systems are more often broken by higher-level violations of their assumptions. And even if you follow this "never write your own cryptography" maxim, now you have this other problem which is, now you have to go and select the cryptography that you will be using, and you have to use it with that software's assumptions. So I would like to re-emphasize that if people are counting on a program, to fight this fundamental nature of information, the program as a whole is cryptographic. That doesn't mean that you can't write it, or that you shouldn't write it, but it means that you need to step up to the plate and recognize the risks.
This does come along with some bad news, though. I can't tell you how to write a secure cryptographic program. We don't even know if it is possible. We do know that some things are unsafe and that you shouldn't do this or you shouldn't do that. But usually this advice is very application-specific. It is not general advice. So in general what I can say is we should face the challenges frankly and understand the risks, we should communicate and learn from our mistakes, and we should advance the art.
So in the interest of advancing the art, I wanted to sort of talk a little about a special kind of cryptography that is probably the most common cryptography in the world. I call it selection cryptography. It is the cryptosystem of picking cryptosystems. And you should think about, when you select a cryptographic tool, or build software that has cryptographic implications, what selections are you making and are those choices good from the perspective of a cryptographic adversary. So I see a lot of norm in the bitcoin space is to build tools out of primitives that were found on github. I don't say that to deride it. There is some fantastic code on github, including my own. But not all code on github is good. So we can think about things like how can we go about doing a better job at selection? If you are a domain expert in the particular cryptographic tools that you are using, then you can review it as a true reviewer. That's great, and I hope everyone that can do that, does so. But if you're selecting someone else's code, you probably can't review it. You probably don't understand the underlying parts, and you probably shouldn't necessarily need to. So I propose this 3-step program which is to ask yourself first, is this code broken or malicious? What can happen if it is? You have to think about this. If you come back and say, "not much" then you are wrong and you should go back and think about it some more. Go back to step one. No, seriously. If you take a piece of software that seems like it can do nothing wrong, but its install script has a root shell backdoor in it, and you run it on your infrastructure, you're completely compromised. Everything has risks. You need to identify what they are, and then deal with them. And think about what can be done to mitigate those risks.
So I wanted to give some concrete examples, and this was really hard for me. For all that I said about how hard this is, I don't think that anyone is bad or incompetent for making mistakes in this area. I make mistakes.. DJB, one of the most brilliant cryptographers of our time, his original ED2559 code had a bug until someone tried to formally prove it correct and found that it occasionally generated incorrect results. So everyone makes mistakes. And that's fine. We need to understand the mistakes so that we may learn from them. I have given an example here. On the screen, is an incredibly commonly at least in the past deployed piece of javascript for "secure random number generation" and this has been used on 100s of websites including many many bitcoin websites that generate private keys for wallets and signing. It wasn't created by someone in the bitcoin ecosystem, but it was widely picked up by it. Now it has a couple of things that are sort of funny about it that a reviewer would pick up, or at least a reviewer with domain expertise. So one is that there is a check in it, where it checks if "window.crypto", which is a cryptographically secure random number generator, is available. And if it's not available, it just doesn't use it. It doesn't throw an error. It doesn't do anything.. it just doesn't use it. What it does use is "Math.random". And in most browsers, "Math.random" is a 48-bit linear congruent generator. There is only 248 possible states for it. And it's also in most browsers seeded from the time that browsers started. So this value is pretty predictable. And then also, it uses the time that it was running at. That's also pretty predictable. And so if you're in this state where you have not used the secure random number generator, you're using something that maybe has on the order of 50 bits of entropy at most and probably quite less. Now with the power of a billion imbeciles, an attacker can search this space. It is quite practical to do so. And they can discover private keys as a result of doing so.
Fortunately, "window.crypto" is available in all current browsers, so this state where you don't have it shouldn't be happening very often. So that's good at least. But I have complained about this code to people using it, because it looks unsafe. Now what I didn't see, and what I tried to tell 12 people now, what virtually no one I have showed this to, even telling them that there's another issue in this, is that that.. that there is this comparison with navigator version. Well, navigator version is a string. And if navigator.appversion returns false in the conditional, then it doesn't use the secure random generator. This code never uses the secure random number generator. And this happens when you are using this pile of javascript from inside webworkers, which actually happened in the bitcoin ecosystem. But you don't even have to have that problem; that's what happens when you don't select things correctly.
Another concrete example is that a very popular bitcoin wallet deployed a message encryption function using ECIES. So, ECIES is, it's not correct to say it's standardized, but it's a well-studied way of doing message encryption with elliptic curve cryptography. And I say it's not standardized because there's no test vectors or things like that, you can implement it on your own and your implementation may not match anyone else's. But it's well-understood, and if implemented correctly, it's secure. So they implemented ECIES using source code they found on github, and the source code they found on github was widely leaked all over the internet, it was mentioned on bitcointalk, people who knew cryptography were talking to the author about it. The author gives me the impression of someone who is sort of new to programming. He was really excited. And so this wallet picked it up, they reviewed it- good for them. They found that it used an insecure random number generator, like the python mersenne twister stuff. But what was actually implemented there wasn't ECIES at all. It was some other system that the author had just sort of magicked. Heh. It had a bunch of other problems. A couple of the problems is that if someone was running a decryption oracle like something that would throw error messages back, you could send it 216 messages, collect the results, and then take another message to that same destination that you wouldn't have decrypted, and you can use those 216 results and decrypt the message. It's a decryption oracle attack. This seemed dir.. directly leaked 7 bits of the plaintext in every 256 bits that it was encrypting. It also had this issue where if you send some messages through it, it would just silently corrupt them on encryption, including the all 1s-bits messages. So if you send it hex FF through it, the results would be line noise and it was just a silent corruption bug. So all of these issues I found in about 10 minutes because I have domain expertise in this exact kind of cryptosystem. And there are probably more problems with it, I stopped looking at that point. The authors of this wallet software removed this software, took it down, took that feature out, in about 1 more minute after my report. I don't think that they did wrong so much here, I think they are very competent and that they responded in a very responsible way. Other people that I have worked with have not been so responsible in the past. Other wallet vendors have done similar things. The same author of this freaky code had written a bit of signing code that another wallet vendor included in their wallet. That signing code had the same kind of insecure RNG and that wallet vendor didn't even fix that, they deployed it with the insecure RNG and it resulted in a CVE against the wallet. In theory if you sign transactions with your keys using that wallet, your private keys may be exposed by it. Finding these problems requires a lot of domain expertise. "Okay, go ask someone who knows about these things" does not scale particularly well. So what can we do to do better here?
So I have proposed a number of risk mitigation techniques that I think would help if people were doing these things or more of them more often may help advance the art here. One of the risk mitigation techniques is to ask is this software intended for your purposes. If this is someone who is learning the code project, that's great but perhaps you shouldn't secure millions of dollars of BTC with it. Are the authors taking the cryptographic considerations seriously? You can see this by looking at their discussions. How do they respond to security concerns and issues? You can look for a review process. Perhaps the most important question here is, whether there is a review process here at all. One of the other adages is that anyone can create a cryptosystem that they themselves cannot compromise. And it's very true, and this is why one of the very best ways to learn cryptography is to break other people's cryptosystems. So any kind of cryptographic software should have some level of review. The review could be made available to its users and you should be able to look at it to get a feel for how it is being handled. Sometimes people say, "well this is being used everywhere and obviously it has been reviewed". But in reality, people adopt things because other people have adopted things, without ever looking at the source code or doing domain expert review. So you can't go by whether it is in wide use to determine whether something has been reviewed. A touchy thing is, what is the experience of the authors? There is some power in the authors being domain experts. I don't mean to imply that only an elite group of people can write cryptographic software, because that doesn't work. I mean, if we're going to be frank, all of the software that we write in the bitcoin ecosystem is to some degree always cryptographic software. So saying that I have to write all of it is ridiculous. You can look for things like, do the authors have a deep understanding of what they are doing? Even if you understand the procedures, if you don't understand the reasoning for why you do things in a cryptosystem, then you wont spot all of the subtle assumptions that you have to satisfy. It's hard because if you're not yourself an expert, then it's easy to make a mistake. Someone can sling a bunch of technical terms at you, and it all sounds equal because you don't know about that area. I think that one of the things you can do is to look at the authors trying to extend their reach, that they are learning, they are citing sources that they are sort of expanding their knowledge. And there is a sort of process around excellence in knowledge. And that may be more visible than just trying to evaluate their technical skill.
One thing to look for is whether the software is documented. When we write complicated pieces of software, they are unsafe to maintain if their internal assumptions are not documented. You should look internal to the software whether there is documentation and explanations for what's going on. You can have the smartest person in the world but he wont remember what he was thinking a year ago when he goes to change it. And are the assumptions documented for the assumptions about the outside world? How can you know that your use of it isn't going to violate its assumptions unless it has told you what those assumptions are? I think you can look for software portability because people who are working hard to produce good software will tend to make their software more portable. And also, when you try your software in lots of different environments and lots of different application contexts, you will expose bugs that you wouldn't have seen otherwise.
One of the reasons that we can build really complicated things in the form of software is because it's possible to build automated tests for software in a way that you cannot do for a mechanical device. So software should be using this power of testing to explore the space of it. Unfortunately it's possible to make tests that just tell whether the software runs, and it's kind of meaningless. So a technique that I suggest is that if there are tests, then you go to the software and add bugs. You don't have to understand software to add bugs. And then see whether the tests fail. It's something that anyone can do. You might find that the tests don't fail at that point. So then you can iterate with the author and try to compare it against other risks and then make a decision as to whether to use it. You could also look for the adoption of best practices. Now, if you're not working on this kind of software then you might not know what the best practices are. There is a lot of disagreement about what the best practices look like in software. One of the most competent programmers I know, in the c language, has this rule where he writes software with basically no unnecessary whitespace and no unnecessary parens. And everyone else hates it. Now, having worked with him for some time, I actually like it a lot. Once you get used to it, you start to see things in it that you wouldn't normally see. But there's a lot of debate around this and I'm not trying to propose a specific standard for how you write software. But if people are doing a good job, they will have standards. And whatever they are, their enforcement leaves evidence, like during review you will see people saying things about adopting best practices. And you can also ask, you can ask an author of any cryptographic tool what have you done to mitigate risk. Anyone who is an author of a cryptographic tool should have an answer to that. If they don't have an answer or a list of answers, then they probably haven't given it much thought. So a lot of this reduces to looking for conscientious software development and that is not enough to guarantee secure cryptography. It's a necessary component of secure cryptography, but not sufficient. I found often that in the wider ecosystem that people's enthusiasm about software is often inversely related to how rigorous its development was. And there's a good reason for this; if you spend a lot of time making software secure, there will be no features. There needs to be some balance and compromise here. One thing to keep in mind that is for cryptographic secure, "move fast break things" does not work. If you have lost your privacy or your BTC, you're not getting it back and the fact that the next version fixes the problem doesn't help. One issue to watch out for is cryptolaundering. People will take a "I'm learning to code" program and then put it in a nice shiny professional app. I don't think you should refuse to trust people that when you take their software that they haven't done the work, but you should refuse to trust. Trust but verify. When you verify that good practices are being used in the parts that you're taking, you create a market pressure to do better. You give people a reason to feel like their time and effort invested in making more secure systems is well worth it.
Finally, this is I think a tricky point. Some of the things that we justifiably want to do, violate good practices. Now, I just said before that there's no sort of authority on what a good practice is. Name any good practice, and there is always someone that wants to do something that happens to violate it. So people are very opinionated about this stuff, so I am going to give some opinions and I know people will disagree with them. So I think that for general cryptographic code, it is unsafe to write general cryptographic code that does not deliver constant time operation. I also think that it is unsafe to write general cryptographic code that can't clean up and can't avoid memory leaks of information when it runs. I also think it is very inadvisable to write cryptographic code in languages that are not typesafe where the language wont automatically catch comparisons with the number 5 in the string or whatever. So all of the points that I just made basically say "never write crypto code in javascript, ever". And that is a ridiculous proposition because it is the best deployed platform for software that has been deployed in the world available today. And so I don't really know how to weight that, but when you set some requirements on your application, you are guaranteeing that you are excluding some secure practices sometimes. And you are excluding contributions of people who have some good thoughts on this stuff, because I won't write javascript crypto code. And I'm not alone. But that doesn't mean you don't do it. But you should be keeping that in your mind weighing that against other factors. So maybe you want to see more rigorous testing in something like that. Or you want to architect your application so that those things that you cannot achieve are not an actual issue. It's just something to keep in mind.
After all I said, I still think I know nothing about the subject. This is really a vague art right now. I think that we need to learn more about it and demand more about the cryptographic tools that we are using so that we can advance the art. And in the bitcoin ecosystem in particular, I worry that if we don't advance the art, there will be more big events, more billions of value lost. And the answer is to start like regulating people and saying Bob can't write cryptographic code, that sort of proposal will happen. And we will have to fight against proposals like that. And that would be directly opposed to the kind of decentralization that I think that makes bitcoin interesting. But to have the freedom to build these systems and to explore the space of what's possible, we have to sort of control ourselves and we have to be responsible. And we have to work towards that because we don't know how to get there today.
So I am very interested in techniques or tools that people have found to make good selection of cryptographic stuff and also things about building cryptographic stuff because I do that too, although that's its own talk.
I have reached the end and I am right on time. Thank you and I would be glad to answer any questions.
Q: So you made a comparison to civil engineering and in civil engineering they have liability for people who are developing plans for buildings and things like that to incentivize safe building. So can you think that software engineers also should be putting skin in the game and actually assume some liability for their bugs?
A: So it's not just civil liability, in some places like Israel for example you can go to jail and you can be criminally prosecuted for incompetent civil engineering. This is a tricky subject because the greater the responsibility the greater the barrier to entry, and one of the fantastic things about software is that there is basically no barrier to entry. If you have a computer, you have all of the tools you need to be a world-class programmer just by downloading them. So it is more costly to put restrictions on software than it is to put restrictions on other fields. Few people are amateur civil engineers. I don't know how society is going to weigh this going forward in the future. As a software engineer, I would say that if there was extensive licensing or bonding requirements, I would not be in this space. But we do have to weigh that. I think that maybe we could sidestep this by upping our art and doing better so that we don't need the backstopping of regulatory requirements. And I also think it is not clear that liability doesn't exist and maybe we just haven't seen what that looks like. As software becomes more integral to more important things, we are going to see more litigation related to incompetent software and as courts become more versed in what good software looks like, we might start finding negligence in cases where software wasn't doing what it should do. But that's something we'll find out in the future.
Q: Alternatively, do you think something like certification or insurance against bugs for professional software development?
A: I think we are going to be unlikely to see insurance without liability because that's the requirement for it. Contractual liability for software behavior already exists in the wider world and people should make more use of it. I think that's useful, sure.
Q: You have multiple ways of unlocking a transaction by presenting a root. And then there was another step after that.
A: I can show you the point in the bitcoin-wizards logs where that was discussed. Sorry about that. The general idea is that instead of you constructing and fully materializing this tree, only some keys would be permitted. You present the pubkeys that would contribute to it, and you get some.. and you ask the network to do the summation for the specific M-of-N that you want. So I can make a scriptPubKey that just has a list of public points, and then I can have a CHECKSIG operator that knows how to sum up the points or some subset of them and I can signal what subsets are permissible to accept the signature. Well, we can talk about this later.
Q: Point recovery?
A: Yeah... You can't pass that test unless you have 100% code coverage. So I thought it was dumb to list code coverage. If I was listing out all the techniques that I thought were useful to build secure cryptographic software, I have a list of like 40 of them. And that's one of them, to make sure you have 100% branch coverage.
Q: A specific disastrous example would have been useful.
A: Actually, that's one of the reasons why code coverage is super useful. It sort of double checks your assumptions. Like oh, this code is never running, hmm that's strange. And also doing the bit where you add bugs into that section and the test never fails, then it failed.
Q: Do you think it is possible to augment consensus rules for bitcoin? In another language other than C++? And know that it is effective?
A: So, in decentralized consensus, we have this problem that correctness is the less important criteria. Consistency is more important. Correct or not, the state can't be inconsistent. Can you reimplement the bitcoin protocol in another language and have a hope of success? And this is tricky. I don't want to answer "no", but I don't know how to answer "yes". There is some research into proving that different pieces of code compute the same output, and anything more complicated than a context free grammar, the problem of comparing two different programs is undecidable. So you have the worst case where you can't decide whether two Turing programs compute the same thing with a program. And that's kind of academic. In practice, I think you could actually get pretty close, and I think it depends on what the failure modes are if you do fall out of consensus. So if you fall out of consensus and the result is a denial of service attack on a service you run, sure no problem you can get close enough that it is reasonably unlikely and maybe you only fall out of consensus once in a while. I think the most important thing you can do is to understand the contours of the problem and to understand how hard it is. Keep in mind that Bitcoin Core is not necessarily consistent with itself. There have been bugs in the software that for example, Bitcoin Core used to use BerkeleyDB (BDB) for the blockchain database. And BDB, two copies of the same software, on the same hardware, same operating system, would not necessarily be consistent with each other because there were some non-determinism in BDB that changed its behavior based on the order that the blocks were written to disk. And under certain circumstances, under large blockchain reorganizations, some nodes would allow the reorganization to occur and other nodes would run out of locks because of how the data was put on disk on them. And then Bitcoin Core would break from the same version, on the same operating system on the same software. I think that back earlier in bitcoin's history we did not quite understand the importance of consistency. And so when we're picking tools that go into Bitcoin Core, we're performing a cryptographic operation (the consensus of the network), and we have to understand our assumptions. One of our assumptions is that all of the computers will behave consistently. So when we have a dependency, we have to ask will that dependency obey that assumption. And it turns out that most software is not written for consensus systems. And most software that you go and find isn't consistent. A problem here is that the authors of the systems will go fix bugs, and then it's inconsistent and we have to control for that. That's one of the reasons why today in Bitcoin Core we internally embed the database that we use for storing the blockchain. So I hope I didn't dance around that too much. I am just trying to say that it's hard.
Q: Does Bitcoin Core involve any floating point operations?
A: Not as part of the consensus. Difficulty is done with large 256 bit integers for work resetting. IEEE floating point is well-specified, but what compilers normally implement is not IEEE floating point, it differs from architecture to architecture. I have actually seen cryptocurrencies that have gone and put floating point in their consensus code, and it's totally breakable and quite frightening, so don't do that. Don't put floating point in consensus code.
Q: ...
A: Let me give a philosophical point here. When I make a negative comment about some other cryptocurrency, it gains me nothing. So do not expect people who are big names in this space to go around telling you what's safe in other systems. In particular not only does it not gain me something, but in some cases I have been physically threatened because of negative things that I have said about someone's pump-and-dump scheme. So I tend to be pretty conservative about this. In this case I would give you the example of Solidcoin, which implemented a proof of difficulty change over time which involved transcendental functions and floating point.
Q: You mentioned at the beginning that you are one of the five core contributors and that you spend a lot of your time reviewing other software. So I am curious if you could talk about beyond the five of you, how many people are contributing to bitcoin on a regular basis?
A: If you go by the git commit logs, you'll see that in the last release in 0.10, there were 100 contributors. Sometimes those contributions are just fixing a string or spelling mistake or whatever. Some of them were more substantial. There is a kind of power law distribution. I would say that there's on the order of 10 to 12 people who are contributing at the same level as all of the core contributors. And then there's people that fix little bugs, and then it falls off after that. It's a hard area to contribute to. It's quite frustrating and very unrewarding to join the project. We are trying to improve this. Part of the problem is that you show up with something neat and now you have to pass Greg Maxwell for review, and I think everything is broken... So. I try not to be a barrier like that, but it's really hard. So we have been doing a bunch of work to make the software more modular. It should make it easier for people to contribute more safely and expand the contributor base further.
Q: What do you think of the provably accurate Stellar.. ?
A: Well, remember my comment before. I have made some public comments on this, so I will repeat them. I have a general complaint about the consensus model in Ripple and Stellar and new Stellar. I complained about this back when when Ripple was announced. Basically, there is a strong assumption in the system, and this is true for all of them, amusingly the Ripple model didn't achieve its properties even when its strong assumption was met. The new system is provable, meaning that it should meet its properties even when the strong assumption is met. Back when Ripple was released, the strong assumption wasn't even described. So there's a bunch of participants in the system, they have trust they put out into the network, and I went out and posted about this, and I mentioned that there are certain trust topologies that are guaranteed to fail, so what are those and how does the system stop those from developing? That's the strong assumption- that people will configure their trust in a certain way. The new Stellar paper goes and formalizes that assumption a bit more. So we can say there is an intersection requirement and that trust has to overlap in certain ways in order for the system to achieve consensus. But they haven't formalized the process for achieving that outcome. Now, before you think I'm throwing a lot of stones here, I should point out that bitcoin's security model also has this kind of strong security assumption. So you could say in bitcoin our strong assumption is that a majority of the hashpower is honest, and you could say that as long as it is honest and honest participants are not partitioned from each other on the network, the system will achieve consensus reliably eventually. But why is half the hashpower honest? Well you can fallback to a set of weaker assumptions that we wave our hands about, like economic incentives to behave honestly and so on. And we have done that in the bitcoin ecosystem, we have stated our strong assumption, talked about its limitations, talked about why we think those limitations are plausible, and that space has been explored and it's still being explored by researchers and developers. And I would like to see more of that in the Stellar world. Really being frank about the assumptions and trying to figure out how plausible they are. One way to meet the assumption in the system, the trust intersection assumption, is to be completely centralized. If everyone trusts the same party, then the trust completely overlaps for the same set of parties. And that's a system that is secure but not decentralized, and I actually think there's a lot of place in the world for systems that are less decentralized because they have better scaling and some other interesting properties. I hope that answers your question.
Q: When you described ...
A: The question is, BerkeleyDB stuff showed there was a problem with the code and is that bad. Yes, it would be valuable to analyze this. So there is documentation about the behavior that we expect from the system. It's not necessarily great. There's the developer docs, the protocol specifications on the wiki but I wouldn't recommend trying to implement from that. I think that at the moment it's all relatively complete, maybe you'd have a fighting chance but I'm not sure. That could be improved and should be improved. We can make changes to the system more safely... In terms of the practical effect, whatever code is actually running and fails in consensus, we are kind of stuck with it. If the spec says one thing, and the code does something else, we are going to need to change the spec because the money has already moved. And we can't just say too bad, let's allow everyone to double spend. That's not reasonable. Work needs to be done there, but it doesn't replace actually achieving consensus in practice in the network.
Thank you.