Matt Corallo
Flexible Lightning in Rust
Video: https://vimeo.com/316624439
Slides: https://docs.google.com/presentation/d/154bMWdcMCFUco4ZXQ3lWfF51U5dad8pQ23rKVkncnns/edit#slide=id.p
https://twitter.com/kanzure/status/1144256392490029057
Introduction
Thanks for having me. I want to talk a little bit about a project I’ve been working on for about a year called rust-lightning. I started it in December so a year and a few months ago. This is my first presentation on it so I’m excited to finally get to talk about it a little bit.
Goals
It is yet another Lightning implementation because somehow we needed another one I guess or really not. But about a year and change ago I decided I want to learn Rust. Everyone is super excited about Rust. What is this thing? Maybe there is something here to it. I also wanted to learn a little bit about Lightning. We didn’t have anyone at Chaincode who had worked on any Lightning things. I wanted to get into the meat of it because we were just a Bitcoin Core shop or research lab so it is nice to have people who know all the different pieces. I kind of figured there’s this spec thing here but most of the implementations have been done by working with each other and it would be nice to have something that is done completely clean room just from the spec, just to see how good the spec is, to improve it, to provide feedback and see what kind of misconceptions I can come up with about how Lightning is supposed to work by just reading the spec without reading any of the other implementations. It turns out there were a number of them, the spec is good but it could use a little bit… Anyway I kind of got started and I started working on this and writing it. It was fun and I learned a lot about Rust and a lot about Lightning. I kind of realized there is this little niche in the Lightning node ecosystem that really isn’t filled at all. That’s to not be a wholesale node and provide people flexibility to integrate Lightning in the way they want. Today you can go take c-lightning and you can run it and it has a great interface and it works. Or you can take lnd and you can run it and you communicate over gRPC. You can take Eclair and even that has a lot of how it interacts with other peers built in and baked in and how it manages keys and all those things. 
There’s not really a good Lightning implementation that you bring your own key creation and you figure out how you want the UX to work, you figure out how you want the disk storage to work, how exactly you want all the pieces to fit together. That’s the niche I slipped into trying to fill with this project. The key initial supported use cases are first of all existing wallets. We have this big ecosystem of Bitcoin wallets out there, most of them don’t support Lightning because how would they have supported Lightning a year ago? How are they going to go integrate Lightning without starting completely from scratch, getting all of these protocol details correct and doing a tonne of work to make sure they handle every possible edge case in Lightning correctly. So hopefully rust-lightning is just this easy thing that you can take off the shelf, you can integrate it tightly with all of your existing management of UTXO and chain state and everything. It deals with the rest for you. It deals with channel management and peer management and all of those good things. The other use case that I want to support a little bit better is flexibility around how you do, whether it’s a hardware wallet or whether it’s these partially offline nodes and I’ll get into a little bit more of what I mean by that in a minute, but letting people have more high assurance or at least different security models around how they run their Lightning node instead of just I have an online server and I put funds in it and I trust it is not going to get hacked. That is where it has gone and it has made a lot of progress. It is mostly there. There’s some stuff I want to change in terms of getting the last bits of onchain handling implemented and then I want to wait for some of the 1.1 spec stuff in Lightning before I ship it because that simplifies a lot of the interfaces for clients especially around fee handling.
Library Structure
So at a high level what does rust-lightning look like? There’s no runtime. A nice thing about Rust is there’s no runtime, there’s no background threading, there’s no garbage collection, nothing like that. There’s no JVM required. It is not like Go where you have these threads that need to be started and run garbage collection and all of those kinds of things. Rust-lightning is entirely event driven. So you call into it and it doesn’t do anything unless you’re calling it. It doesn’t have a background thread so it has to be. This means it is embeddable everywhere. If you can compile to LLVM and you can compile C code to your target you can run rust-lightning hopefully. There’s a few more pieces to work out on that but you can run it anywhere including hardware wallets, including WASM, that kind of thing. It’s designed to be fairly modular, especially around the different pieces you might want to have separated or replace on your own. That means routing: if you want to do your own routing management, you want to do your own routing algorithm, you want to do your own routing table management, it is completely stubbed out. There’s a simple interface for it. There’s a default one provided that you can use or you can just build your own, no problem. The channel management is separated from the actual monitoring of the channels. The pieces that have to watch the chain and handle updates to onchain transactions are separate from the actual management of what the state of all your HTLCs is and what the state of all your channels is. This lets you very easily take updates from the channels, so the channel manager gives you these updates that say “Hey I’ve received new information that you need to make sure to watch the chain for” and gives you that blob. Then you can figure out how you want to get that to the device that needs to watch the chain. Maybe you have multiple different devices that are watching the chain aka watchtowers.
Maybe the actual channels are managed on a hardware device and you have a remote watchtower, something like that. This lets you have flexibility there of exactly how you want to integrate these things. So this is pseudo native watchtower support. Exactly what form watchtowers take is weird because there are a lot of different trade-offs in terms of what data you need to sign in advance and provide the watchtower. How much data it needs to store versus how much you’re trusting the watchtower with respect to privacy. Maybe it could steal your funds versus storing a lot of data. There are a lot of trade-offs there and I think the next talk will cover that a little bit better. Hopefully most of that goes away with eltoo and other future work. Today it is complicated so there is not full flexibility on types of watchtower yet in rust-lightning but definitely if you trust the servers that are watching the chain for you or you trust the watchtowers it is very easy to spin up a bunch of different servers doing watchtowers. So disk writing, all the kind of system interaction pieces are handled on the client side. So rust-lightning doesn’t figure out how to write stuff to disk. It just hands it to you and this allows you to have flexibility and say something like “Ok here’s an update you need to go update your watchtowers” and you can say “Actually I couldn’t get in touch with the watchtower, please pause that channel” so it has flexibility here. It turns out that is super complicated to get right and I will talk about that a little bit more when I go into some of the fun things we’ve done with testing on rust-lightning. The handling of pausing channels for watchtower updates has actually been pretty useful in terms of being able to have flexibility around how you monitor your channels. 
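As a rough illustration of the "please pause that channel" flow described above, here is a minimal sketch. Every name in it (`MonitorSink`, `UpdateResult`, `Channel`, `FlakyWatchtower`) is hypothetical and chosen for this example; it is not rust-lightning's actual API. The idea is only that the library hands the client an opaque blob, the client reports whether it managed to store it, and the channel stays paused until every queued update has been stored.

```rust
use std::collections::VecDeque;

/// What the client reports after trying to store an update (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpdateResult {
    /// The update is safely stored wherever the client wants it.
    Stored,
    /// The update could not be stored yet, so the channel must pause.
    TemporaryFailure,
}

/// Client-supplied hook: disk, watchtower, remote server, whatever fits.
pub trait MonitorSink {
    fn store_update(&mut self, channel_id: u64, blob: &[u8]) -> UpdateResult;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChannelState {
    Active,
    Paused,
}

pub struct Channel {
    id: u64,
    state: ChannelState,
    /// Updates that still have to be stored, oldest first.
    pending: VecDeque<Vec<u8>>,
}

impl Channel {
    pub fn new(id: u64) -> Self {
        Channel { id, state: ChannelState::Active, pending: VecDeque::new() }
    }

    /// Queue a new update and try to hand everything queued to the client.
    pub fn apply_update<S: MonitorSink>(&mut self, sink: &mut S, blob: Vec<u8>) -> ChannelState {
        self.pending.push_back(blob);
        self.flush(sink)
    }

    /// Retry queued updates in order; the channel resumes only once all are stored.
    pub fn flush<S: MonitorSink>(&mut self, sink: &mut S) -> ChannelState {
        while let Some(blob) = self.pending.front() {
            if sink.store_update(self.id, blob) == UpdateResult::TemporaryFailure {
                self.state = ChannelState::Paused;
                return self.state;
            }
            self.pending.pop_front();
        }
        self.state = ChannelState::Active;
        self.state
    }
}

/// Toy sink that can be switched between reachable and unreachable.
pub struct FlakyWatchtower {
    pub reachable: bool,
    pub stored: Vec<(u64, Vec<u8>)>,
}

impl MonitorSink for FlakyWatchtower {
    fn store_update(&mut self, channel_id: u64, blob: &[u8]) -> UpdateResult {
        if self.reachable {
            self.stored.push((channel_id, blob.to_vec()));
            UpdateResult::Stored
        } else {
            UpdateResult::TemporaryFailure
        }
    }
}
```

With this sketch, calling `apply_update` while the watchtower is unreachable leaves the channel `Paused`, and a later `flush` after connectivity returns moves it back to `Active`.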
I did mention rust-lightning doesn’t do chain sync for you because potentially a key user of rust-lightning is existing wallets and I’ve spoken to a number of existing wallets who are excited about this idea. But they all already have their own chain sync. They’re all already SPV clients or maybe they use Electrum servers or whatever. This is designed to be flexible in rust-lightning. It gives you all the information you need: here are the transactions you need to watch for, here are exactly the scripts and the txids and whatever, so that you can use Electrum servers, you can use Neutrino, you can use existing bloom filter stuff. It is all pretty flexible and you can choose exactly how you want to integrate with the Bitcoin chain. Also BYO RNG and key creation. The goal is that rust-lightning doesn’t make syscalls for you, it just needs malloc. It is not 100% there but it is getting close. That means you have to have your own RNG. But of course if you’re doing a hardware wallet or something like that you’re going to want to use the hardware features anyway so I’m not going to try to guess what environment you’re running in, you have to plug that in yourself. Finally the actual TCP handling is up to you as well. There is this easy interface that I’ve written that looks exactly like select so you can just map this to an existing TCP socket handler, super simple. Or you can consume the messages manually and this has been very useful for testing because we have a tonne of test harnesses that deliver messages out of order and delay message processing to simulate speed of light latency between nodes, which has allowed us to be really flexible in terms of how we test the library.
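To make the "the library never touches the socket" idea concrete, here is a small sketch under stated assumptions. The trait `ByteSink`, the struct `PeerCodec` and the in-memory `MemoryPipe` are invented for this example and are not rust-lightning's real interface. The framing is also deliberately simplified to an unencrypted 2-byte length prefix, whereas real Lightning transport is encrypted. The point is only that the library side produces and consumes bytes, and the client decides whether those bytes travel over TCP, serial, or a test queue.

```rust
use std::collections::VecDeque;

/// Client-supplied transport (hypothetical): behaves like a non-blocking write
/// and returns how many bytes it accepted.
pub trait ByteSink {
    fn send_data(&mut self, data: &[u8]) -> usize;
}

/// Turns a raw byte stream into length-prefixed messages and back.
/// Simplified framing for illustration only: 2-byte big-endian length, then payload.
pub struct PeerCodec {
    inbound: Vec<u8>,
}

impl PeerCodec {
    pub fn new() -> Self {
        PeerCodec { inbound: Vec::new() }
    }

    /// Frame one message and hand the bytes to the client's transport.
    /// Returns false if the payload is too large or the sink took only part of it
    /// (a fuller implementation would buffer the remainder instead).
    pub fn send_message<S: ByteSink>(&self, sink: &mut S, payload: &[u8]) -> bool {
        if payload.len() > u16::MAX as usize {
            return false;
        }
        let mut frame = (payload.len() as u16).to_be_bytes().to_vec();
        frame.extend_from_slice(payload);
        sink.send_data(&frame) == frame.len()
    }

    /// Feed bytes that arrived from the peer, in whatever chunks the client received
    /// them, and get back every message that is now complete.
    pub fn read_event(&mut self, data: &[u8]) -> Vec<Vec<u8>> {
        self.inbound.extend_from_slice(data);
        let mut complete = Vec::new();
        loop {
            if self.inbound.len() < 2 {
                break;
            }
            let len = u16::from_be_bytes([self.inbound[0], self.inbound[1]]) as usize;
            if self.inbound.len() < 2 + len {
                break;
            }
            // Remove the whole frame from the buffer and keep only the payload.
            let msg: Vec<u8> = self.inbound.drain(..2 + len).skip(2).collect();
            complete.push(msg);
        }
        complete
    }
}

/// In-memory stand-in for a socket, handy for tests that delay or chunk delivery.
pub struct MemoryPipe {
    pub wire: VecDeque<u8>,
}

impl ByteSink for MemoryPipe {
    fn send_data(&mut self, data: &[u8]) -> usize {
        self.wire.extend(data.iter().copied());
        data.len()
    }
}
```

Because delivery is just a function call on a byte slice, a test can hold bytes back, split them at arbitrary points, or hand them over later, without a real socket in the loop.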
Existing Wallets
In terms of the use case of existing wallets that want to use rust-lightning, there are a few key elements of how rust-lightning fits into that. First of all as I mentioned hopefully it can live in any runtime. I’m trying not to guess anything about the system you’re running on. The goal is that it makes no syscalls except that it needs to be able to call malloc, no libc calls except for malloc. We’re not quite there. We currently have some locking calls that we want to be able to stub out but we’re getting close. So malloc, that makes it a little bit harder with hardware wallets but it’s close. It is completely C-callable so you don’t need to know anything about Rust. You just need to know how to call a C function and link a shared library, which every language in the world should be able to do at this point. And finally, WASM. The goal is to be able to run this completely in a web browser. We’re almost there, we need to remove one more dependency. Then you can compile rust-lightning and run a full Lightning node in your web browser. I don’t know exactly why you’d want to do this but in theory we support this. It leads to some excitement around running Lightning nodes in extensions and this will definitely allow you to sync different nodes. So maybe you have your Lightning node and you just keep an encrypted copy on a server of all the data for the Lightning node and when you’re on your computer it runs in your little Chrome extension or in your React Native app and then when you go on your phone it downloads the latest state and it stops the one on your computer and runs it on your phone. It’s just the same library that you’re running everywhere and you can just compile it how you like.
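For readers unfamiliar with what "C-callable" looks like on the Rust side, the following is a minimal sketch. The function name `ln_demo_message_type` and the return codes are made up for this example and are not part of rust-lightning. It relies on one fact from the Lightning spec (BOLT 1): every message starts with a 2-byte big-endian type field. A C caller would declare it roughly as `int ln_demo_message_type(const uint8_t *data, size_t len, uint16_t *out_type);` and link against the compiled library.

```rust
use std::os::raw::c_int;
use std::slice;

/// Return codes shared with the C caller (hypothetical values).
pub const LN_OK: c_int = 0;
pub const LN_ERR_NULL: c_int = -1;
pub const LN_ERR_SHORT: c_int = -2;

/// Read the 2-byte big-endian message type from the start of a Lightning message.
///
/// # Safety
/// `data` must point to `len` readable bytes and `out_type` to a writable `u16`.
#[no_mangle]
pub unsafe extern "C" fn ln_demo_message_type(
    data: *const u8,
    len: usize,
    out_type: *mut u16,
) -> c_int {
    // Reject null pointers before touching any memory.
    if data.is_null() || out_type.is_null() {
        return LN_ERR_NULL;
    }
    if len < 2 {
        return LN_ERR_SHORT;
    }
    // View the caller's buffer as a slice without copying it.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    let msg_type = u16::from_be_bytes([bytes[0], bytes[1]]);
    // Write the result through the caller's out-pointer.
    unsafe { *out_type = msg_type };
    LN_OK
}
```

The design choice worth noting is that nothing Rust-specific crosses the boundary: only raw pointers, lengths and integer status codes, which is what lets any language with a C FFI call in.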
Hardware Security
In the model of doing a hardware or otherwise more high assurance Lightning node, one key goal is to do what I have been calling semi-offline nodes, so they’re kind of offline but in practice have constant updates. So what I mean by this is maybe you have an HSM or maybe you have some other embedded device or maybe you have a regular server but it doesn’t actually have an IP address, it can’t connect to the internet. It only has one little line of communication out: maybe it’s to a regular computer, maybe it’s to another server, maybe it’s over serial, maybe it’s over USB. You feed all of the communication with your peers over that line and this allows you to have a device where the only way it can communicate with the world is over the Lightning protocol to Lightning peers. It can’t get hacked via some other means and maybe it is physically secured as well. So this is similar to how you might imagine a Ledger or a Trezor or a hardware wallet integrating Lightning. You run the node effectively on the device but it communicates to peers over whatever interface it has. So this is the reason why rust-lightning again doesn’t make the actual TCP socket calls for you, it just tells you “Here’s bytes. Please send it to this peer” or accepts incoming bytes from a peer. It is very easy to wrap around a TCP socket but you still have this flexibility to be able to do other crazy integration work. Of course this sidesteps the issue of how do you monitor the chain? One of the key aspects of the Lightning security model is you have to have something that is always monitoring the chain. And so you can’t completely move over to something like a Ledger or a Trezor or a hardware wallet because it is not constantly online and it doesn’t constantly get chain updates. So if it is offline for a while that is problematic.
Of course this is even worse because there are some cases in Lightning where maybe you have a pending HTLC but your counterparty went offline and in order to claim that HTLC you need to go ahead and claim it on chain before some timeout. That’s even worse because you have to not only be monitoring the chain on long time periods but you might have a specific time where you need to broadcast this update if you haven’t been able to get in touch with your peer in some time. In rust-lightning we try to get round this a little bit. We don’t actually natively do this for you but one key approach you might imagine taking is if you have this hardware wallet or you have some device you can have pseudo trusted watchtowers. I’m overloading the term watchtower a little bit here. It is kind of a trusted other half of your node. You can send it an encrypted copy of the data it needs to monitor for and it can sign that data, send it back to you and say “Yes I’ve received it, I’ve updated it.” And then you still don’t necessarily need to trust the computer in between, the online computer, but you do need to trust that that other device is actively monitoring the chain and gets updates. This is still kind of nice because you can imagine if you’re really going down the rabbit hole of running some crazy high assurance Lightning node, maybe you have multiple copies of this kind of server. You have multiple things watching the chain for you in different geographic regions and so you need to be able to take this data that it needs to watch for, send it to multiple servers, get sign off from all those servers and then actually make progress on the channel. So rust-lightning supports pausing the channel temporarily until you’ve updated the watching stuff, the chain monitor, and then it will continue.
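The "get sign off from all those servers and then actually make progress" step can be pictured with a small bookkeeping sketch. `AckTracker` and its methods are hypothetical names for this example, not something rust-lightning provides, and the sketch assumes acknowledgements have already been authenticated: checking a watchtower's signature on its acknowledgement is left out entirely.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Tracks which watchtowers still owe an acknowledgement for which update.
/// Hypothetical helper; signature verification of acks is out of scope here.
pub struct AckTracker {
    towers: BTreeSet<String>,
    /// update_id -> towers that have not acknowledged it yet.
    outstanding: BTreeMap<u64, BTreeSet<String>>,
}

impl AckTracker {
    pub fn new(towers: &[&str]) -> Self {
        AckTracker {
            towers: towers.iter().map(|t| t.to_string()).collect(),
            outstanding: BTreeMap::new(),
        }
    }

    /// Record that `update_id` has been sent to every configured tower.
    pub fn sent(&mut self, update_id: u64) {
        // With no towers configured there is nobody to wait for.
        if !self.towers.is_empty() {
            self.outstanding.insert(update_id, self.towers.clone());
        }
    }

    /// Record an acknowledgement. Returns false for an unknown tower,
    /// an unknown update, or a duplicate acknowledgement.
    pub fn acked(&mut self, tower: &str, update_id: u64) -> bool {
        let all_done = match self.outstanding.get_mut(&update_id) {
            Some(waiting) => {
                if !waiting.remove(tower) {
                    return false;
                }
                waiting.is_empty()
            }
            None => return false,
        };
        if all_done {
            self.outstanding.remove(&update_id);
        }
        true
    }

    /// The channel may resume only when every tower has acknowledged every update.
    pub fn may_progress(&self) -> bool {
        self.outstanding.is_empty()
    }
}
```

A node using something like this would keep the channel paused while `may_progress()` is false and resume it once the last tower in the last region has signed off.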
Testing Fun
So those are the two key use cases that are supported in rust-lightning. What I want to talk a little bit briefly about is some of the stuff we’ve been working on in terms of testing and different approaches we’ve taken from existing nodes that have kind of been enabled somewhat on accident by the structure of the library. So this wasn’t something that I had as a goal of mine going in but it turns out just because this library is flexible, it doesn’t have a runtime, it doesn’t have any syscalls and so you don’t have to do a bunch of setup work, it means we can run really great tests that literally can stand up ten nodes in the same process just by making four function calls. We don’t have to go actually create daemons and wipe things on disk and move things around, anything like that. It has enabled us to write really fast, great tests all over the place. And also of course because we have this interface like I mentioned of getting messages and being able to handle those messages manually to send between peers, we can also reorder those messages, we can delay them and deliver them in specific orders really easily by moving lines of code around instead of having to actually sit in the middle of a TCP socket and hack up our daemon or have a bunch of testing code in the middle of our production code. We’ve also played a lot with fuzzing, I guess I’ve played a lot with fuzzing in rust-lightning. We obviously have fuzzers for all our message deserializers and stuff like that. For those who aren’t familiar, fuzzing has been incredibly useful across the software world but mostly for finding vulnerabilities in things like image decompressors, image decoders, message deserializers, that sort of thing. And what it does is you write a program that takes as input some arbitrary string of bytes and you do something with those bytes and the fuzzer tries to make you crash. 
So if you have some image decoder or decompressor often times it will find a bug where you used too much memory, you have an out of memory condition and you have a DoS vulnerability there. Or maybe it will find a bug where you have some buffer overflow, something like that. Or whatever depending on the language you’re writing in. Fuzzers have been amazing. You can go look up all of the various CVEs and vulnerabilities they’ve found. We’ve been playing a lot with them in rust-lightning and we have the standard ones of just make sure all of our message deserializers don’t crash or don’t use infinite memory. But we also have some that allow nodes to do completely arbitrary things. So we have one fuzz target that runs a full node and can receive bytes on the wire. This allows you to literally do anything that another node could do to you possibly in a fuzzing environment where the fuzzer is trying to creatively crash your program and comes up with different inputs that might exercise different code paths. Fuzzers aren’t that efficient as you might imagine, they’re just shoving in random bytes. They’re much smarter than shoving in random bytes but it is still shoving in random bytes. And so when you have these very large messages they’re fairly slow and you don’t have great code coverage. There’s some new research on how to do fuzzing with a technique called taint tracking. If you’re interested you should go Google this, it is actually incredibly fascinating. It is a great tool for building high assurance software. We are exploring that a little bit in rust-lightning but we haven’t gotten that far. We also have a more recent fuzz target which tests for consistency of the protocol. So we can actually write little short snippets that allow the fuzzer to essentially write new test cases for us. The fuzzer can reorder the delivery of messages so that it simulates the speed of light. It can send payments, receive payments, things like that.
And its goal is to find inconsistencies in the state machine of our channel. You have a channel and you have two nodes, and your goal is to try to find a way to get the two nodes into some different state because they shouldn’t be able to, they’re sending messages to each other. But if they somehow end up disagreeing about what the current state is, that’s clearly a bug in our state machine. It is probably exploitable because there’s probably some case where we’re forgetting about an HTLC or some funds. And this has been incredibly useful. We’ve maybe copied twenty test cases out of this fuzzer into our regular test suite and found a number of bugs especially in our channel pausing stuff. So there’s a lot of really exciting work that we’ve been able to do in fuzzing in rust-lightning in addition to just standard unit tests and standard protocol tests. One final thing I wanted to mention for those of you who are interested in testing: because the Rust ecosystem is a lot of engineers building often high assurance software, one thing that Rust has good tooling for, or that there are good libraries for in Rust that don’t exist in most other languages, is a technique called mutation testing. Your goal is to test your tests by modifying the actual software and making sure that your tests fail. If you take your software and you flip some if conditions or you change some default initialization of some variables, your tests should be able to catch that. Otherwise you don’t have good test coverage. This is much more effective than just looking at standard branch coverage or standard coverage analysis of your code and can find a lot more details that you might have missed in testing and things you should add tests for. So this is future work that we’d like to do on rust-lightning. If you’re interested in writing really cool state of the art testing come talk to me. We’d love contributors.
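The consistency fuzzing idea described above can be sketched with a toy model. This is not the Lightning channel state machine and not rust-lightning's fuzz target: the two-balance protocol, the byte encoding of actions, and the names `View`, `run_case` and `fuzz_one` are all assumptions made for this example. Each input byte either makes one node send a payment or delivers the oldest in-flight message to one node, so the input controls the interleaving. After everything in flight has been delivered, both nodes must agree on the balances.

```rust
use std::collections::VecDeque;

/// One node's view of the channel: the balance of node A and of node B.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct View {
    pub a: u64,
    pub b: u64,
}

/// Interpret `input` as a script of actions and return both nodes' final views.
/// Low two bits pick the action, the upper six bits pick an amount in 1..=64.
pub fn run_case(input: &[u8]) -> (View, View) {
    let mut view_a = View { a: 100, b: 100 };
    let mut view_b = View { a: 100, b: 100 };
    let mut to_a: VecDeque<u64> = VecDeque::new(); // payments from B, not yet seen by A
    let mut to_b: VecDeque<u64> = VecDeque::new(); // payments from A, not yet seen by B

    for &byte in input {
        let amount = u64::from(byte >> 2) + 1;
        match byte & 3 {
            // A pays B if A believes it can afford it.
            0 => {
                if view_a.a >= amount {
                    view_a.a -= amount;
                    view_a.b += amount;
                    to_b.push_back(amount);
                }
            }
            // B pays A if B believes it can afford it.
            1 => {
                if view_b.b >= amount {
                    view_b.b -= amount;
                    view_b.a += amount;
                    to_a.push_back(amount);
                }
            }
            // Deliver the oldest in-flight payment to A.
            2 => {
                if let Some(n) = to_a.pop_front() {
                    view_a.b -= n;
                    view_a.a += n;
                }
            }
            // Deliver the oldest in-flight payment to B.
            _ => {
                if let Some(n) = to_b.pop_front() {
                    view_b.a -= n;
                    view_b.b += n;
                }
            }
        }
    }

    // Quiesce: deliver everything still in flight before comparing states.
    while let Some(n) = to_a.pop_front() {
        view_a.b -= n;
        view_a.a += n;
    }
    while let Some(n) = to_b.pop_front() {
        view_b.a -= n;
        view_b.b += n;
    }
    (view_a, view_b)
}

/// Fuzz entry point: panics if the two nodes disagree or funds were created or lost.
pub fn fuzz_one(input: &[u8]) {
    let (view_a, view_b) = run_case(input);
    assert_eq!(view_a, view_b, "nodes disagree about channel state");
    assert_eq!(view_a.a + view_a.b, 200, "funds were created or destroyed");
}
```

A coverage-guided fuzzer would call `fuzz_one` with generated byte strings; any input that triggers a panic is a reproducible interleaving that can be copied into a regular test.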
So with that I was told I have to mention that Chaincode is running another residency program. We’ve had great success with these in the past. Most of the people end up getting a job somewhere in the space. If you are really interested in Bitcoin and want to get really deep into the nitty gritty of protocol engineering, in Bitcoin and Lightning and other stuff, how everything works and how people think about it, you should apply. It is residency.chaincode.com. It’s about two weeks of intensive course-ish work where there’ll be a tonne of talks by great speakers: me and some of the other Chaincode folks, and we have people coming into town just to speak to you. We’re going to do it a little bit differently this time and there’s going to be a bunch of project time so you can hang out in New York with us with some great mentors. Some of the smartest people in the Bitcoin ecosystem will help you with whatever project you want to work on, whether it’s rust-lightning or BetterHash or contributing to Bitcoin Core or other Lightning implementations or whatever cool project you want to do. You can come hang out with brilliant people and do that.
Q&A
Q - I’m wondering about all of the upgrades coming up in SegWit version 1 and how that will change Lightning. How are you planning to upgrade the Rust library to handle all of the changes that are upcoming?
A - With rust-lightning not being in production yet it means we don’t have to bother supporting old school stuff. Even ignoring SegWit version 1, there are the Lightning 1.1 changes, and some of those changes I want to have in rust-lightning and supported on the network before anyone starts using rust-lightning in production because it will simplify a lot of stuff about interfacing and making sure it is really easy to use rust-lightning and doesn’t have fee disagreements. Also obviously eltoo and whatnot will hopefully make watchtowers orders of magnitude simpler and we can just throw out old code because no one is using it yet.