Laolu Osuntokun and Conner Fromknecht

Exploring lnd 0.4-beta

Video: https://www.youtube.com/watch?v=LXY0L8eeG3k

Slides: https://docs.google.com/presentation/d/1pyQgGDUcB29r8DnTaS0GQS6usjw4CpgciQx_-XBVOwU/

The Genesis of lnd

First, the genesis of lnd. Here is the first commit of lnd. It was October 27 2015. This is when I was in school still. This was done when I went on break and it was the first major…. we had in terms of code. We talked about the language we’re using, the licensing, the architecture of the daemon. A fun fact, the original name of lnd was actually called Plasma. Before we made it open source in January 2016, we renamed it to lnd because we use btcd and this made it more btcd-y. I was kind of sad I lost the name to other things that came out afterwards but I was first. We had Plasma. I ended up starting work on lnd full time in June 2016 when I graduated school. I was like “Yay I can do this Bitcoin stuff full time now.” If you look at the contributor history on GitHub you see a big spike afterwards. It was like winter break, school, finals, TA and then boom we started doing things seriously. As far as the release timeline, we had lnd 0.1 in January 2017. That was basically we have a thing, it mostly works. By then you could send payments, you could connect to peers, there was no multihop, there wasn’t really path finding, it didn’t handle revocation or anything like that. Then we had 0.2. 0.2 had a little more improvements. You could do things like send payments across other peers, basic path finding. Then there was 0.3, most recent before this one. Now it is a little more fully fledged. We could do multihop payments, we had some but not all of the onchain contract handling, we had autopilot for the first time which is our system within lnd to automatically set up channels. Now we’re here because lnd 0.4. That says alpha there, that should be beta. This is the first beta release.

lnd 0.4 beta

lnd 0.4 matters because this is the first release that supports mainnet. Before we only had testnet and simnet support. People got really excited and did mainnet anyway but we were like “we’re going to make breaking changes, don’t do it yet”. We discouraged people from updating because we had planned breaking changes in future so we knew that if you had channels, you’d have to close them all down. There were some little mishaps with people not getting that communicated right. Most of the work within 0.3-alpha and 0.4-beta was kind of around security and fault tolerance. So before there were no backups at all. If it crashed, everything was in memory. You couldn’t resume any multi-step contract handling or anything else like that but all that has been taken care of in this new release…. people running on mainnet. Before I would be nervous, “don’t open a channel” but now it is ok. We have a pretty good degree of confidence that if things go down, lnd will correct itself and if it ever crashes, then it is able to resume where it was and continue. This is a pretty big milestone.

Go + Bitcoin

With lnd, we use Go primarily, it is pure Go. Go has several advantages. I think it is a pretty good choice for creating software in general. Usually the question we get is, why are you using Go? Why not C or C++ or Rust or whatever else? These are some of the reasons why we use it. So far, we have had pretty good developer uptake. People typically find that the codebase is pretty easy to jump into because the language is kind of familiar. If you know C, you know Python, you kind of know Go. It looks the same, maybe there are some different keywords but for the most part it is straightforward. So one thing about Go is that it has very good concurrency support. So lnd itself is very concurrent, parallel by nature with the architecture itself. The main things we use within Go to do this are called channels and goroutines. So a goroutine is a lightweight thread, like a green thread or Python’s greenlets, and then Go has these things called channels. So we have these threads and they can communicate with each other using channels. I can send messages back and forth pretty easily. If you think about this, this makes it pretty easy to create these concurrent architectures. So maybe you have some producer saying something to intermediate producer or consumer…. and you can do subscription pretty easily. It is used very heavily throughout the entire codebase of lnd. Go has excellent tooling. I think it is the tooling that makes it… With Go it is very easy to do CPU and heap profiling. On weekends I profile. I profile, I do memory, I do some CPU profiling. It makes it very easy with something called pprof. You can also hit a server to get a goroutine dump and things like that. It has a race detector. If you ever have done concurrent programming, you know race conditions suck. They’re super hard to find and they’re…. and you can’t always reproduce them. Go has a thing called the race detector that helps you catch these issues.
We always run our tests with this and you can even run it and develop locally. If it catches… it will stop everything and then dump...”read after write dependency”.. The other cool thing is that it has something called gofmt. With gofmt, everyone’s code looks the same. This matters a little more in larger projects. We don’t have to worry about whether to do a semi-colon and then the brace or a brace with a space, do we do a new line? Basically, you write your code, you gofmt and then everything looks the same. This is good because everyone’s code looks the same so there’s no arguing with the code reviewer on the proper code style, this is solved automatically. Another thing is that the standard library is super expansive. The standard library has every crypto thing you need, it has networking, it has its own…. implemented in pure Go, it has TLS, it has everything you’ll ever need to do anything Bitcoin related. Another cool thing is that it produces these statically linked binaries by default. This is nice because I can have the binary and take it anywhere. The other cool thing about this is that I can cross compile super easily for any platform at all. If you write it and it compiles, it is going to run on that other system. At times you have things around 32 vs 64 bit but those really aren’t that consequential…. compiles for like MIPS, PowerPC, BSD pretty easily. My favorite thing about Go is that it is a very simple language. The language itself is very simple parsing wise, you don’t even need a symbol table to parse it. As a result, you can focus on the problem at hand. Rather than “oh am I using the proper sealed class trait, like monad…”, no. There’s none of that. You write your code, it is very simple and it lets you focus on the problem at hand. And the final thing we like about Go is the library btcsuite. This is written in Go as well. Anything you need to do in Bitcoin, this library has.
Things like signing transactions, parsing things, addresses, peer-to-peer network…. lnd is mostly composed of libraries that interact with btcsuite. When we interact with the chain we’re calling onto btcsuite itself. This set of capabilities made Go and also Go in the context of Bitcoin a very good choice for implementing lnd.
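
The goroutine and channel pattern described above can be sketched in a few lines. This is an illustrative example only, not code from lnd: the names `produce` and `sumWithWorkers` are made up for the sketch. One goroutine produces values on a channel, several worker goroutines consume them, and the results come back over a second channel.

```go
package main

import (
	"fmt"
	"sync"
)

// produce sends the integers 1..n on the returned channel from its own
// goroutine and closes the channel when it is done.
func produce(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i
		}
	}()
	return out
}

// sumWithWorkers starts the given number of goroutines, all reading from
// the same input channel, and returns the total of every value received.
func sumWithWorkers(in <-chan int, workers int) int {
	results := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			partial := 0
			for v := range in {
				partial += v
			}
			results <- partial
		}()
	}
	// Close the results channel once every worker has reported.
	go func() {
		wg.Wait()
		close(results)
	}()
	total := 0
	for p := range results {
		total += p
	}
	return total
}

func main() {
	fmt.Println(sumWithWorkers(produce(100), 4)) // prints 5050
}
```

Running this with `go run -race` (or a test suite with `go test -race`) enables the race detector mentioned in the talk.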

The Architecture of lnd

So now the architecture of lnd. lnd has a pretty particular architecture. We try to maintain this whenever we’re doing things like code review, writing new subsystems. For the most part, lnd is composed of a set of independent subsystems. These subsystems run concurrently. Like we talked about before, they use goroutines within the codebase itself to run in isolation. They can run in parallel but then they use a channel to… with each other. This is pretty good because when you’re reading these subsystems, you know that only it can mutate its own state. There’s no other thing where you have a race condition, grab a mutex all of a sudden the state is inconsistent. Instead in order for me to.. my own state I need to get a message to my… I get a message, I parse the message, I apply it to my state and then maybe I send a reply back as well. This is really cool because now you have a.. system within the actual processor itself. At times we have to do things like ensure you can handle duplicate message delivery. If something restarts and comes back up, you need to be able to handle that message being received in a duplicate manner. It is kind of similar to the way people handle message queues… things like that. The main tenet of Go is “Don’t communicate by sharing memory, instead share memory by communicating.” This means don’t have a single shared map that everyone has... Instead maybe you have that map… goroutine. People send it serialized messages to modify the state of it and then you can send it a message to read the current state. That by itself, makes concurrent programming very easy to reason about. Otherwise you’re like “did I have the main lock and the other lock on the entry…” you don’t have to worry about that. Basically you just send the message and then you come back and you can get the message and go on. The other cool thing about this is that we can implement crash recovery between each sub-system by itself in isolation. 
Each sub-system has a particular log or has the last message from sender. When we restart we can test fault tolerance very specifically in a particular subsystem using unit tests rather than hoping it all works. Instead we can get really focused and know what message we need to be sending in our subsystem… Another cool thing is that each sub-system has its own logger. This is really good for debugging because you’re like “Oh what happened to my channel?” Let me look at the subsystem that just does channel updates. Or “What message came in?”, it can crank up the peer-to-peer messaging log. Me, myself I have a very particular logging setup. I have some things that I know are spammy and I turn those off. I have other things that are a little more important, things like handling states or messages that are coming in from the other peers. Anything that is chain specific is all abstracted away. Right now, lnd supports btcd, bitcoind and Neutrino as a back end. But you could also write your own. If you have a company and you have your own API you can plug in to that back end if you want to. It is pretty easy because these are the main interfaces that we have. The first one is ChainNotifier. ChainNotifier does things we do all the time like let me know when..confirmed, let me know when an output has been spent and also let me know when a new block comes in. If you know things about how channels work, you use those three things all the time. The next thing is the Signer. The Signer basically handles signing. It needs to know how to do things like SegWit sighashing, it needs to know how to populate… We could abstract this away. Right now it is all baked into the process. Later on, maybe in a dedicated hardware device or could even be in a remote server which has control policies to prevent people rogue signing. We have something called the KeyChain and the SecretKeyChain and those themselves handle deriving the keys in a particular manner. 
We could even have this be even more segregated. Give us addresses and public keys for the contract prior to signing them. Finally, we have BlockchainIO which lets you read the blockchain, what’s the block, give me the transaction, things like that. A cool part about this is that we can swap them out very easily and because of the abstraction and the way we have our unit tests and integration tests set up, we know we can assert the equivalence of all these different interfaces together and ensure that bitcoind works as well as btcd and other things. There are maybe edge cases between them.
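
To show why abstracting the chain behind interfaces makes backends swappable and testable, here is a minimal sketch. The `BlockReader` interface and its methods are hypothetical and much smaller than lnd's real `BlockChainIO`; the point is only that code written against an interface runs unchanged on any backend, including an in-memory fake used in unit tests.

```go
package main

import (
	"errors"
	"fmt"
)

// BlockReader is a hypothetical, simplified stand-in for the kind of
// read-only chain access the talk attributes to BlockChainIO. It is not
// lnd's actual interface.
type BlockReader interface {
	BestHeight() (int32, error)
	BlockHash(height int32) (string, error)
}

// memoryChain is a fake backend that keeps block hashes in a slice,
// where the slice index is the block height.
type memoryChain struct {
	hashes []string
}

func (m *memoryChain) BestHeight() (int32, error) {
	if len(m.hashes) == 0 {
		return 0, errors.New("empty chain")
	}
	return int32(len(m.hashes) - 1), nil
}

func (m *memoryChain) BlockHash(height int32) (string, error) {
	if height < 0 || int(height) >= len(m.hashes) {
		return "", fmt.Errorf("no block at height %d", height)
	}
	return m.hashes[height], nil
}

// tipHash depends only on the interface, so any backend that implements
// BlockReader can be passed in.
func tipHash(c BlockReader) (string, error) {
	h, err := c.BestHeight()
	if err != nil {
		return "", err
	}
	return c.BlockHash(h)
}

func main() {
	chain := &memoryChain{hashes: []string{"genesis", "block-1", "block-2"}}
	hash, err := tipHash(chain)
	fmt.Println(hash, err) // prints: block-2 <nil>
}
```

The same test suite can then be run against every implementation of the interface, which is the equivalence check between backends described above.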

Did this turn out well? I have the architecture diagram from two years ago and it didn’t look close to this. Maybe I can blow it up a little bit. I hope you guys can squint at it and maybe look at it later on. So the way it is, any time there is an arrow, that either means there is a direct dependency or it is passing a message to another subsystem. At the very bottom we have the lightning P2P network. Above that we have this thing called Brontide. Brontide is this crypto messaging layer that we use within lightning. It is based on Noise made by Trevor Perrin who worked on Signal and WhatsApp. It is a modern messaging protocol with very modern crypto. It has some cool things around handshakes to ensure we have certain properties like identity hiding.. things like that. Right above that we have lnwire which does all the framing; encoding, decoding messages. The cool part about the way this is set up is that if you want to take our codebase, because everything here is its own individual package, and just connect to the network and listen to what’s going on, you can do that. Because everything is very modular and abstracted away and has… unit tests as well. Right above the lnwire, we have the peers. This is basically reading and writing messages from different peers. Then we have the Server which is a hook handling the state of all the peers. Above that we have the RPC Server. The RPC Server is a pretty central part of lnd. That’s where any time you interact with any application they’re going to the RPC Server. The RPC Server uses something called macaroon authentication which we’ll go into a little later. Macaroons are basically this bearer credential. Typically you have this username and password. You have a username and you lookup on this massive list which acts as a control list. Instead we have a credentials system. So I can give you a credential that says you can only make new addresses. 
I give this to the RPC Server and it says “You tried to make a channel, no that’s disallowed.” I can take this new address macaroon and say you can only make new pure SegWit addresses, you can’t make nested P2SH or anything else like that. Those capabilities you can delegate with a macaroon can also attenuate down. This is nice because if you are building an application on top of lnd you can partition all your boxes and give them only the responsibility they need. You can make channels, you can listen for invoices and you can send payments. That is pretty good from a hardware utilization and separate responsibilities perspective. Then next to the macaroons we have the gRPC which we use for our… and a REST Proxy which is what most people will communicate with… If we move over to the right a little bit, we have the Gossiper to the right of the Server. This is dealing with exchanging information for all the different channel peers and seeing what’s going on as far as new channels. You have the Router. The Router hooks directly into the Gossiper. Maybe it is getting new channel.. committing to the state… Then we have the HTLC Switch which is the fabric of lnd. This is the whole payments as packets thing. It has channels which are all links and is handling the capabilities for moving in and out. We’ll get into that a little later. Then moving up we have the Signer who hooks into the Funding Manager. The Funding Manager handles how do we make new channels. It basically.. state machine “ok I sign the funding transaction, the commitment transaction, I broadcast, what goes on with that”. That hooks into the main three interfaces: WalletController, ChainNotifier and BlockChainIO. After that we have the UTXO Nursery. This comes into play whenever you have a timelocked output. What it does is it basically babysits these outputs until maturity. Once they’re mature, CSV or CLTV… absolute timelock, it can sweep those back into the wallet. 
Because we have reasonable composability we can use them for any contract in future. We have the ContractCourt, this is where disputes happen. If there’s any case where someone broadcasts a prior state or we need… something else the ContractCourt handles that. It communicates with the Nursery. It may be the case that I have a HTLC, it timed out and so now I need to broadcast. I give that to the UTXO Nursery, the Nursery watches over that until maturity.. maybe 100 blocks and then passes it over back to the Wallet. Then a parallel thing is the BreachArbiter. The BreachArbiter is kind of like its own section. This is kind of like where justice gets dispatched at times. By justice I mean, because the contract has these stipulations where if you ever do this, I get all the money. If that happens the BreachArbiter gets notified by the ChainNotifier, broadcasts the transaction, writes it to disk and then gives it to the Nursery maybe… timelocks. That’s the architecture, the way things are now. This was a lot simpler in the past. A few of these subsystems came up in the last couple of years or so once we refactored a little bit and realized we wanted a little more flexibility in certain areas. I think they are pretty good as of now… isolated, have their own tests which makes things easier to reason about.
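
The babysitting job of the UTXO Nursery can be illustrated with a small sketch. The `kidOutput` type and its fields are hypothetical, and the maturity rule is a simplified model: it treats an output as mature once the chain tip reaches the later of its relative (CSV) and absolute (CLTV) heights, and ignores the off-by-one details of the real consensus rules.

```go
package main

import "fmt"

// kidOutput is a hypothetical record for a timelocked output that is
// being watched until it can be swept back into the wallet.
type kidOutput struct {
	name       string
	confHeight int32 // height at which the output confirmed
	csvDelay   int32 // relative delay in blocks, 0 if none
	cltvExpiry int32 // absolute expiry height, 0 if none
}

// maturityHeight returns the later of the relative and absolute locks.
func (k kidOutput) maturityHeight() int32 {
	h := k.confHeight + k.csvDelay
	if k.cltvExpiry > h {
		h = k.cltvExpiry
	}
	return h
}

// sweepable returns the names of all outputs that are mature at the
// given chain tip height under this simplified model.
func sweepable(outputs []kidOutput, tip int32) []string {
	var ready []string
	for _, o := range outputs {
		if tip >= o.maturityHeight() {
			ready = append(ready, o.name)
		}
	}
	return ready
}

func main() {
	outputs := []kidOutput{
		{name: "commit", confHeight: 100, csvDelay: 144},
		{name: "htlc", confHeight: 100, cltvExpiry: 500},
	}
	fmt.Println(sweepable(outputs, 300)) // prints: [commit]
}
```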

lnd As An Application Platform

So let’s talk about lightning as an application platform. A cool thing about lightning is that it is this new development platform for Bitcoin. Before as a developer you needed to know how do I sign a UTXO, what’s a sequence value, how do I do signatures, what’s a SIGHASH, things like that and that can get intense…. Now we have this much more streamlined API, it’s like another layer. Because of that we can abstract away the lower level details. When I open a channel, you don’t really want to know all the stuff below that. There’s a bunch of stuff going on: keys, signing, HTLCs and everything else. But for you, you open a channel… it’s much more simple. This lets you do things like metering for services or maybe I’m walking somewhere outside and I can hook into someone’s router and I can pay them via VPN server to connect to their router and get wifi and find a cab or something like that. I can do things like fast cross-chain transactions. Conner did a demo last October which showed the way to swap between Bitcoin and Litecoin instantly. It is a pretty cool use case because otherwise I need to trust an exchange, maybe I do it onchain, that takes like 20-30 minutes. Instead that can be instant. We’ve been calling them Lapps, it is like a play on Dapps, it is kind of like a kek thing, a play on words. But it seems to be catching on now. So lnd itself, if you go back to that other diagram, we’ve architected it to make development a little easier. We wanted it to be a platform where people could make applications on, where people integrate exchanges and other things. That’s one of the first things we sat down and thought about in terms of the design of it. One of the main things that we use is gRPC. So gRPC if you guys all know, was developed internally within Google and called Stubby. They open sourced it and called gRPC instead. Anything within Google uses something that’s very similar to gRPC. It uses protobufs. If you don’t know, it is a binary... format. 
You can have a definition of a message and that can pass into any language. With almost any language, you can use that same struct or dictionary or whatever else. The cool thing about this is that it uses HTTP/2. This lets us have one single connection that multiplexes a bunch of other different messages between themselves. This is cool because now you don’t have to use a special SDK or hit a REST thing. You can code in your own language. So you’re in Python, you have a generator, you’re iterating all this stuff, you’re sending messages. Or maybe in Swift, you do this in iOS. Because of that you can focus when you’re integrating this very deeply into your business logic. Rather than be like I’m talking to lnd… they’re a little more integrated together. Another cool thing about it is that it has streaming RPCs. I can have one single connection, get the revocations and be like “Let me know every single time a payment is settled.” I can have a callback that… some web sockets… Or I can do things like notifying when channels are being opened and closed. Generally, we’ve seen a lot of people build many applications on this. People have built explorers, we’ve got Y’alls, one of the most popular apps, htlc.me. We’ve seen a very big community around… the CEO of Y’alls is here actually. The other thing we have is a REST Proxy. Maybe you don’t want to use gRPC, maybe you don’t want to support it, maybe you like typing raw HTTP queries in the command line using Telnet. You can use this instead. So basically… proxies over to the gRPC server instead using JSON. It is pretty easy, here is an example of me querying for the balance of my channels. Using either of these modes, depending on your application… you can use either one of these. Once again we have macaroons. We talked about this a little bit before. You have these bearer credentials. Right now, we have a read-only macaroon so you can give this out to someone and they can pull the channels. We have an invoice macaroon.
The invoice macaroon is cool because now you can have a server that accepts payments on lightning and it can’t do anything else. Even if that server is compromised, all they can do is make invoices and make addresses and cause invoice inflation which doesn’t really affect you that much. We have some other cool features for macaroons that we have yet to implement. We have what we call… What it does is I can say “Here’s a macaroon. It can only make channels below 2 BTC on Wednesday.” Or channels below 2 BTC on Wednesday and Friday. You can take that down and have very fine-grained authentication for your applications and that is going to be coming soon in future versions of lnd.
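
The way a macaroon can be narrowed by whoever holds it, without contacting the server, comes from a chain of HMACs. The following is a minimal sketch of that general construction, not lnd's macaroon implementation: the `macaroon` struct, the helper names and the `method = ...` caveat format are all assumptions made for illustration, and third-party caveats are left out.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// macaroon is a simplified bearer credential: an identifier, a list of
// restrictions (caveats) and a signature chained over all of them.
type macaroon struct {
	id      string
	caveats []string
	sig     []byte
}

func mac(key []byte, msg string) []byte {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(msg))
	return h.Sum(nil)
}

// mint creates a fresh credential. Only the server knows rootKey.
func mint(rootKey []byte, id string) macaroon {
	return macaroon{id: id, sig: mac(rootKey, id)}
}

// attenuate returns a more restricted copy. The old signature is used
// as the HMAC key, so the holder does not need the root key.
func (m macaroon) attenuate(caveat string) macaroon {
	caveats := append(append([]string{}, m.caveats...), caveat)
	return macaroon{id: m.id, caveats: caveats, sig: mac(m.sig, caveat)}
}

// verify recomputes the chain from the root key and checks that every
// caveat is satisfied by the current request.
func verify(rootKey []byte, m macaroon, satisfied func(string) bool) bool {
	sig := mac(rootKey, m.id)
	for _, c := range m.caveats {
		if !satisfied(c) {
			return false
		}
		sig = mac(sig, c)
	}
	return hmac.Equal(sig, m.sig)
}

// allow builds a caveat checker for a request to the given RPC method.
func allow(method string) func(string) bool {
	return func(caveat string) bool { return caveat == "method = "+method }
}

func main() {
	root := []byte("server-side secret")
	invoiceOnly := mint(root, "web-shop").attenuate("method = AddInvoice")

	fmt.Println(verify(root, invoiceOnly, allow("AddInvoice")))  // true
	fmt.Println(verify(root, invoiceOnly, allow("OpenChannel"))) // false

	// Dropping the caveat while keeping the signature does not verify.
	forged := macaroon{id: invoiceOnly.id, sig: invoiceOnly.sig}
	fmt.Println(verify(root, forged, allow("OpenChannel"))) // false
}
```

Because each caveat is folded into the signature, a holder can add restrictions but cannot remove them, which is what makes it safe to hand an invoice-only credential to a less trusted machine.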

lnd As An Application Platform - API Docs

So we also have this pretty cool developer site made by Max who interned last summer. This is a pretty cool site. You can see every single RPC that lnd has. And if you look at the top right, we show example code on the command line, on Python and on JS. The cool thing about this is that it is automatically generated. So anytime we open… this will automatically get updated as well. That’s api.lightning.community, I did the wrong link, I’ll fix it afterwards.

lnd As An Application Platform - lnd Developer Site

We also have this developer site for lnd which is pretty cool itself, again Max last summer. This is targeted at those who want to build on lnd. So we have a pretty good overview section that gets you in the proper mindset, application wise. How do I think about a lightning application, how do I think about lightning, how do I think about what’s going on under the hood. It walks you through the topology, the channel updates, things like that. It also has a directory of the cool Lapps people have been working on. We have a tutorial called Lightning CoinDesk which takes you through how to make CoinDesk… We have hands on tutorials for developers. The site itself is open source so if anything is out of date you can make a pull request and update the site. Maybe you want to add new examples or tutorials in different languages, you can do that as well.

On-Chain Backup and Recovery - Cipher Seed (aezeed)

Now we’re getting into the specific stuff that we’re working on so far as safety within lnd itself. So we have this thing called the Cipher Seed, the first version is called aezeed. One thing with lightning is that unlike a regular wallet we have many secret keys we need to derive. For a wallet, maybe I’m using BIP 44 so I have my accounts and I can get partitions to them and that’s it. For lightning, for every single contract that we have, we maybe need five or six different keys right now. It will get more complicated in future. How do we make sure this works, how do we make it deterministic, how can we back things up properly? We look at these existing formats. Everyone knows BIP 39 and that’s been out there for a while. There’s 39, there’s other BIPs like 43, 49 and 44 for key derivation. Those are very simple and not lightning specific. So we were like we need to make our own. We can go through the justification for why we did so because it is a pretty big… away from the industry to make our own seed format. One con with BIP 39 is that it has no birthday. This maybe works if you’re.. an API… to do key rescan. If I’m on a phone, I don’t want to start scanning from genesis. I could be the first Bitcoin adopter. I could be Hal Finney’s future self or something like that. We want to avoid going all the way back into the chain. Another thing is that it has no version which means that when I have the seed how do I derive my new keys? I could have Electrum 2.0 and in ten years I’m using Python 3.7 and it doesn’t work with this prior version. The other thing is that the way they do the password, it could lead to loss of funds because you don’t know the correct password. They have a feature where they let you have hidden wallets which depending on the use case may not be that useful. If I have my seed and it was… I put in an invalid password, it doesn’t tell me no that’s wrong. It just says here’s your wallet.
It could be $5, it could be $20 but now I don’t know what my password was. Another thing is that it has a pretty weak KDF. It doesn’t tell you how to re-derive the keys. So now I need to have my wallet and the backup together and then hope in the future I can still use Python 4 when we’re on Python 5 or something like that. So instead we created something called aezeed. aezeed is the first instantiation of something we call the CipherSeed family. If you look at the top right, you can see what the format of it looks like. So the first thing is that we have an external version and then we have the cipher text and a checksum over that cipher text. The external version basically tells you how to parse everything else. It is one byte right now, we could bump that up in future. This lets you do cool things like, let’s say we changed the KDF in the future or we change different parameters, we can take an existing seed in an offline program and reconvert it back to a new format. Then we have the cipher text. The cipher text is an encryption of this payload above which is the internal version that tells your wallet how to derive the keys. This could be like… I was using SegWit in 2018 and I used witness key hash and I used nested P2SH and that’s it. We have the birthday and the birthday uses something called Bitcoin Days Genesis. We realized we didn’t really need a full 8 byte or 4 byte timestamp. Instead we can say Bitcoin was created in 2009, nothing else matters before then. Let’s just count every single day beyond that. The cool thing about this is that we can use 2 bytes and we can go up to 2088 or whatever. In the future maybe if we have other use cases we can have another seed format.
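
The arithmetic behind the 2 byte birthday is easy to check. The sketch below counts whole days since the timestamp of the Bitcoin genesis block (3 January 2009) and stores them in a `uint16`. The function names are made up for illustration and this is not the aezeed package's API. Since 65535 days is about 179 years, the largest representable birthday falls in the 2180s.

```go
package main

import (
	"fmt"
	"time"
)

// genesis is the timestamp in the Bitcoin genesis block header.
var genesis = time.Unix(1231006505, 0).UTC()

// encodeBirthday returns the number of whole days between the genesis
// block and t, which must fit in two bytes.
func encodeBirthday(t time.Time) (uint16, error) {
	if t.Before(genesis) {
		return 0, fmt.Errorf("birthday %v is before the genesis block", t)
	}
	days := int64(t.Sub(genesis) / (24 * time.Hour))
	if days > 65535 {
		return 0, fmt.Errorf("birthday %v does not fit in 2 bytes", t)
	}
	return uint16(days), nil
}

// decodeBirthday converts a day count back into a point in time.
func decodeBirthday(days uint16) time.Time {
	return genesis.AddDate(0, 0, int(days))
}

func main() {
	launch := time.Date(2018, time.March, 15, 0, 0, 0, 0, time.UTC)
	days, err := encodeBirthday(launch)
	fmt.Println(days, err)

	// The largest value a uint16 can hold, decoded back to a year.
	fmt.Println(decodeBirthday(65535).Year()) // prints 2188
}
```

A wallet restoring from the seed only has to rescan from the decoded date onward instead of from the genesis block.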

Q - 2180. There’s also the salt that’s missing. The whole thing is salted so it is sort of like a mini password database for your seed.

I forgot about the salt. There’s the salt that I didn’t have but it is probably in the checksum. There’s some interesting tricks about this. From the get-go if you have the password, the seed itself is encrypted. I can maybe leak this out in plaintext and no one can even do anything from that because it is encrypted, they need the passphrase. We have the passphrase, we run that through a KDF and then we also apply a salt which is encoded with everything else and then finally we have a checksum for the outside. When you’re decrypting, you can first verify that this is the correct set of words. Even beyond that, because the cipher text uses AEAD, within it we use something called aez which is this arbitrary input size blockcipher. This means that we can encrypt a very small amount of data, we can have, without the MAC, 20 bytes turn into 20 bytes. It can adjust the internal mechanism to decide on the input itself. Finally because it is AEAD, it has a tag inside of it. There is something called a subtext…factor where we can control how many additional bytes to add onto, which controls the strength of the tag itself. This is cool because now once I know it is the correct words, if I put in an invalid password, I know it is wrong. Now at that point I don’t have to be worried about finding out I thought it was the right password, I erased my memory then…. Now this is the seed format we’re using within lnd. It has been working pretty well so far with most people. It is a little bit different than what people are used to because it is longer, we’re used to using 12 words. We also do recommend that you add the password phrase as well. More or less, it… for all the things that we needed. It has a version so we know how can we parse the external part of it. In the future if we decide we want a bigger checksum…. It has an internal version to tell you how to derive the keys. Finally it has a birthday so I can tell the light client how far back to start looking.
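
The two separate failure signals described here, a checksum that catches mistyped words and an AEAD tag that catches a wrong passphrase, can be sketched with the Go standard library. Everything specific in this sketch is an assumption: AES-GCM stands in for aez, CRC-32 stands in for whatever checksum aezeed uses, the byte layout is invented, and `deriveKey` is a toy that must not be used as a real password KDF.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

var (
	errChecksum = errors.New("checksum mismatch: seed mistyped or corrupted")
	errPassword = errors.New("authentication failed: wrong passphrase")
)

// deriveKey is a toy stand-in for a real password KDF such as scrypt.
func deriveKey(pass string, salt []byte) []byte {
	sum := sha256.Sum256(append([]byte(pass), salt...))
	for i := 0; i < 1000; i++ {
		sum = sha256.Sum256(sum[:])
	}
	return sum[:]
}

func newAEAD(pass string, salt []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(deriveKey(pass, salt))
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal returns version || salt || ciphertext || checksum (assumed layout).
func seal(version byte, payload, salt []byte, pass string) ([]byte, error) {
	aead, err := newAEAD(pass, salt)
	if err != nil {
		return nil, err
	}
	// All-zero nonce: in this sketch each derived key seals one message.
	nonce := make([]byte, aead.NonceSize())
	out := append([]byte{version}, salt...)
	out = aead.Seal(out, nonce, payload, []byte{version})
	return binary.BigEndian.AppendUint32(out, crc32.ChecksumIEEE(out)), nil
}

// open checks the outer checksum first and only then tries to decrypt.
func open(enc []byte, saltLen int, pass string) ([]byte, error) {
	if len(enc) < 1+saltLen+4 {
		return nil, errChecksum
	}
	body := enc[:len(enc)-4]
	if crc32.ChecksumIEEE(body) != binary.BigEndian.Uint32(enc[len(enc)-4:]) {
		return nil, errChecksum
	}
	version, salt, ct := body[0], body[1:1+saltLen], body[1+saltLen:]
	aead, err := newAEAD(pass, salt)
	if err != nil {
		return nil, err
	}
	pt, err := aead.Open(nil, make([]byte, aead.NonceSize()), ct, []byte{version})
	if err != nil {
		return nil, errPassword
	}
	return pt, nil
}

func main() {
	salt := []byte{1, 2, 3, 4, 5}
	enc, err := seal(0, []byte("wallet entropy"), salt, "correct horse")
	if err != nil {
		panic(err)
	}
	_, err = open(enc, len(salt), "wrong horse")
	fmt.Println(err) // the tag check rejects the wrong passphrase
	enc[2] ^= 0xff
	_, err = open(enc, len(salt), "correct horse")
	fmt.Println(err) // the checksum rejects the corrupted bytes
}
```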

Off-Chain Backup and Recovery - Dynamic Channel Backup

Let’s talk about backups. The seed format is about how you rederive all the keys that we had in the past. This is basically once I have all the keys, what can I do with them? Or even once I’m live and updating my channel how can I…recovery? One thing to note is that lightning nodes are inherently more stateful. We don’t just have the keys, we also have the current channel state and that tells you what state number we’re on, what parameter we’re using, what’s my balance, what’s your balance, what HTLCs are active. As a result, you need the current state and your set of keys otherwise you can’t update. If you have your keys but you don’t know the state you can’t really do anything. If you have the state but you don’t have the keys you can’t really do anything. We have a two phase approach. The first stage is a dynamic channel backup. We have these things called watchtowers. The idea is that any time… maybe you can batch them together, you can export this state. There are different versions but the important part is that the watchtower doesn’t know which channels are being watched. You can encrypt that data so it doesn’t know the balance… I send that state to the private outsourcer and as a result now even if there are only ten or a hundred in the world, only one needs to act properly. We’re going to integrate into lnd, we’re going to add them into the routing node itself. If you’re running a routing node you can also run one of these watchtowers. It makes it easier for discovery for the participants. If I’m a watchtower, then I can also run a node pretty closely together. The other thing we’ll add is that you’ll be able to point at your own instance. So let’s say I have my node and I have my computer at home. I’ll be able to backup those states to my computer as well, kind of like a redundancy thing… The other thing is that we may batch the updates together.
Otherwise if there is one watchtower globally and it gets everyone’s updates there is now a massive timing side channel attack on the whole network. We’re going to add this batch process within it to ensure… This will probably be integrated in a major release of lnd. We’re going to roll it out first and then create this standardized BOLT for it. It would be nice if all the nodes running this watchtower software had the exact same messaging and framing structure. Now it is very easy for nodes to spin up, connect some channels...

Did anyone see this happen? It happened like a day or so ago. What happened is that someone had their node… you can see the log message on the left. “Revoked state #20 was broadcast. Remote peer is doing something sketchy!” Once that’s done it says “Waiting for confirmation, then justice will be served!” On the right you can see the other node that says “Justice has been served”. We get our money back. The question is exactly what happened in this case? What happened was that a user had a node implementation and for some reason they ran into some issue and they didn’t know how to get past it. They shut things down and they restored the prior backup. They did a cp, they copied their channel state and as a result when they came back up the channel was really on like state 25 but they only had state 20. As a result, when they broadcast their transaction, they violated the contract. The breach arbiter which was on the other screen dispatched justice onto the individual. The question is how do you prevent this from happening? If you have your dynamic states how do you ensure that if I ever have a backup of the static state, it actually works properly? I’m going to explain that. We have something called static channel backup.
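
The incident comes down to a comparison of state numbers. As a rough sketch only, with made-up names and none of lnd's real breach handling, this is the decision a node faces when it sees the remote party's commitment on chain:

```go
package main

import "fmt"

type verdict string

const (
	upToDate   verdict = "latest state: normal close"
	breach     verdict = "revoked state: serve justice"
	weLostData verdict = "newer than our state: we lost data"
)

// judge compares the state number of a commitment the remote party
// broadcast with the latest state number we have on record.
func judge(ourLatest, broadcast uint64) verdict {
	switch {
	case broadcast < ourLatest:
		// The remote side published a state it has already revoked.
		return breach
	case broadcast > ourLatest:
		// The remote side is ahead of us, so our database is stale.
		return weLostData
	default:
		return upToDate
	}
}

func main() {
	// The honest node is at state 25; the restored node publishes 20.
	fmt.Println(judge(25, 20)) // prints: revoked state: serve justice
}
```

A node restored from an old copy cannot tell that its state 20 is stale, which is why copying the channel database is dangerous.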

Off-Chain Backup and Recovery - Static Channel Backup

So what we can do is we can overload the watchtower with some static channel information. This information can be per state. I say it is static because you only need this for when channels are created, you create one of these and when it is closed you can delete it at the end. Because it is static combined with the seed format… The backup tells you what keys were used, not the exact keys but the key path using the key derivation protocol using the seed format itself. It also tells you information around the node and things like that. Given your static backup and your seed, you can rederive all the keys you need. In the case of partial data loss you can follow this protocol here. So first you fetch the backup and make sure the MAC is correct, things like that. I use my seed and backup to rederive all the keys that I have. I can connect back to the nodes I had channels with and when they see me with a particular state they activate something called “dataloss protection” in the protocol itself. They’ll give me some information needed to sweep my funds. They close out the channel onchain and then I can sweep my commitment output without any delay. This is an emergency scenario. If this had been fully implemented, the prior state would have been prevented because both sides would have realized we need to close the channel because they lost some data. Now it is in the state where if I have my seed and I have one of these backups anywhere I can get that money back offchain. I can use my seed for all my onchain transactions but for my offchain depending on if I have a static or dynamic backup I can also… as well. We plan to have some streaming RPCs for every single time you open a channel to ensure you have all of your state properly set up. This is the safe way to do backups on lightning. If you ever do anything naive like copying it and hoping you’ll have the correct version, once you’re doing some complicated version testing yourself this is what you should be doing. 
We’ll be implementing this very soon within lnd. Hopefully we’ll make this into a BOLT standard because it would be cool where any node if they’re using the same seed and have the same backup format, they can connect to any other node and rederive their keys and get all their money back. This is what we want, we want to collect all the satoshis because in the future satoshis may be important.

Automatic Peer Bootstrapping

The final thing we have within lnd is automatic peer bootstrapping. Before with prior versions of lnd, you had to connect manually to other peers. This was a pain because if I’m on IRC and I don’t know anyone, I wouldn’t be able to connect to anyone else. So instead we added this thing called automatic peer bootstrapping. Within the code itself you can see this interface on the right, we have something called NetworkPeerBootstrapper. This is pretty generic. It takes a number of addresses it knows I shouldn’t connect to and returns… We can compose this for the MultiSourceBootstrapper to ensure we get the number of new peers… distinct bootstrappers. We want distinct bootstrappers because if someone’s DNS server goes down, all of a sudden the entire network can’t connect to each other. There are currently two bootstrapping methods in the codebase. First, we have DNS in BOLT 10. It is very similar to the way DNS seeding in Bitcoin works. It is made by Christian Decker who also runs a bunch of Bitcoin DNS stuff. We have one of those for testnet, Bitcoin mainnet, there will be one for Litecoin but I haven’t done it yet but maybe I’ll do it after this talk. Also we have channel peer bootstrapping. You only need DNS when you connect to the peer for the first time. After you connect, you have this set of authenticated signed announcements. You only accept announcements from individuals that can prove they have a channel open. This avoids a sybil scenario where they flood you and maybe do an eclipse attack. You force them to have some skin in the game. You need to have open UTXOs and channels, otherwise I won’t accept your announcement. When you come up, you can connect to the DNS resolver. You can get the initial set of peers. After that, because you have this data you are fully independent. One thing we’re probably going to do in future is add additional versions of bootstrapping. We want as much redundancy as possible.
If for some reason the DNS server is down you may have issues connecting. One thing we’ve seen in the wild is some issues with DNS resolvers filtering large SRV records because maybe they don’t support UDP… maybe we’ll investigate some other redundant sources of how we can do bootstrapping in a decentralized manner.
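
The composition of several bootstrap sources described above can be sketched in Go roughly as follows. This is a minimal sketch, not lnd's actual code: the `PeerBootstrapper` interface, `staticSource` and `MultiSourceSample` are simplified, hypothetical stand-ins for the NetworkPeerBootstrapper and multi-source idea from the slide, and addresses are plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// PeerBootstrapper is a simplified, hypothetical stand-in for the
// bootstrapper interface shown on the slide.
type PeerBootstrapper interface {
	// SampleAddrs returns up to n addresses, skipping those in ignore.
	SampleAddrs(n int, ignore map[string]struct{}) ([]string, error)
	Name() string
}

// staticSource is a toy bootstrapper backed by a fixed address list.
type staticSource struct {
	name  string
	addrs []string
	down  bool // simulates an unreachable source, e.g. a DNS seed outage
}

func (s staticSource) Name() string { return s.name }

func (s staticSource) SampleAddrs(n int, ignore map[string]struct{}) ([]string, error) {
	if s.down {
		return nil, errors.New(s.name + ": source unavailable")
	}
	var out []string
	for _, a := range s.addrs {
		if _, skip := ignore[a]; skip {
			continue
		}
		out = append(out, a)
		if len(out) == n {
			break
		}
	}
	return out, nil
}

// MultiSourceSample asks each source in turn until n distinct addresses
// have been collected. A failing source is skipped, so one outage does
// not stop the node from finding peers.
func MultiSourceSample(n int, ignore map[string]struct{}, sources ...PeerBootstrapper) ([]string, error) {
	seen := make(map[string]struct{}, len(ignore))
	for a := range ignore {
		seen[a] = struct{}{}
	}
	var out []string
	for _, src := range sources {
		if len(out) >= n {
			break
		}
		addrs, err := src.SampleAddrs(n-len(out), seen)
		if err != nil {
			continue // try the next, independent source
		}
		for _, a := range addrs {
			seen[a] = struct{}{}
			out = append(out, a)
		}
	}
	if len(out) == 0 {
		return nil, errors.New("no bootstrapper returned any peers")
	}
	return out, nil
}

func main() {
	dns := staticSource{name: "dns", down: true}
	graph := staticSource{
		name:  "channel-graph",
		addrs: []string{"nodeA:9735", "nodeB:9735", "nodeC:9735"},
	}
	ignore := map[string]struct{}{"nodeA:9735": {}}
	peers, err := MultiSourceSample(2, ignore, dns, graph)
	fmt.Println(peers, err)
}
```

Here the DNS source is down, so the peers come from the channel graph source instead, which is the redundancy argument made in the talk.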

Payment Circuit Time Series Database

The final thing we have here is pretty cool and I could have another entire talk on. In the past you didn’t really know what… was doing. You could look at the logs but you wouldn’t know if you actually forwarded transactions or things like that. So in this one we have a time series database of all completed payment circuits. A completed payment circuit is when I get an add and I forward that onto the add HTLC and I get back the settled one and I get some fee itself. We store all this on disk. You may want that for several reasons: financial record keeping and different analysis. The cool thing now is that I can look at my node because it is a time series database and query between 2pm and 3pm there was that lunch on the west coast where I had a spike in activity. I can look at that and ls to see what was going on. I can see if my node is running properly. If you look on the top right you can see the FeeReport command. This shows you the fees of all the different channels that you have. You can see that I have a fee of 1 base satoshi and I think it was 0.00001% after that. It has a breakdown of the day. I made 7 satoshis that day on the testnet faucet. Over the past month, I’ve had 145 satoshis which isn’t that bad. Fees are very low on testnet in particular and also there’s not that much traffic going on right now. We also have the ForwardingHistory command and what the ForwardingHistory does is by default it shows you the last 24 hours of forwarding. So you can see I had two payments in a 24 hour period. One was 2k satoshis, the other one was 1700 satoshis with 1 satoshi fee. This is pretty cool because now what people can do is they can do analysis on their channels. We have something called autopilot in the daemon right now which looks at the graph information to see where they should connect and establish channels to. 
In the future we can look at the real time information of all the channels coming in and decide I want to close Channel B because Channel A is getting me more revenue but is almost depleted. So I can close out one over here and pass it into one over here. Or I can ensure that I have a rebalancing schedule to ensure I can accept any available flow at any given time. Maybe it is the case that I’m getting cancels over here so I’ll ramp up my fees to only have things that are likely to go in. We can do a lot of things in future, we’re looking into this. People can make very cool graphs of every single payment coming in. We’ll have the streaming RPC and things like that. Now it’s Conner’s turn.
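
To make the time series idea concrete, here is a small Go sketch of a forwarding log that can be queried by time window. It assumes a fee equal to incoming amount minus outgoing amount; the `ForwardingEvent` and `ForwardingLog` types, the `at` helper and all amounts are hypothetical and kept in memory, whereas lnd stores its events on disk and exposes them over RPC.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// ForwardingEvent is a hypothetical record of one completed payment circuit.
type ForwardingEvent struct {
	Timestamp time.Time
	ChanIn    uint64
	ChanOut   uint64
	AmtInSat  int64
	AmtOutSat int64
}

// FeeSat is what the forwarding node earned on this circuit.
func (e ForwardingEvent) FeeSat() int64 { return e.AmtInSat - e.AmtOutSat }

// ForwardingLog keeps events sorted by timestamp so that range queries
// can use binary search.
type ForwardingLog struct {
	events []ForwardingEvent
}

// Add inserts an event while keeping the slice ordered by time.
func (l *ForwardingLog) Add(e ForwardingEvent) {
	i := sort.Search(len(l.events), func(i int) bool {
		return l.events[i].Timestamp.After(e.Timestamp)
	})
	l.events = append(l.events, ForwardingEvent{})
	copy(l.events[i+1:], l.events[i:])
	l.events[i] = e
}

// Query returns all events with start <= timestamp < end.
func (l *ForwardingLog) Query(start, end time.Time) []ForwardingEvent {
	lo := sort.Search(len(l.events), func(i int) bool {
		return !l.events[i].Timestamp.Before(start)
	})
	hi := sort.Search(len(l.events), func(i int) bool {
		return !l.events[i].Timestamp.Before(end)
	})
	if hi < lo {
		return nil
	}
	return l.events[lo:hi]
}

// TotalFees sums the fees earned over a set of events.
func TotalFees(events []ForwardingEvent) int64 {
	var total int64
	for _, e := range events {
		total += e.FeeSat()
	}
	return total
}

// at is a hypothetical helper that builds a timestamp on an arbitrary day.
func at(hour, minute int) time.Time {
	return time.Date(2018, time.March, 20, hour, minute, 0, 0, time.UTC)
}

func main() {
	var fl ForwardingLog
	fl.Add(ForwardingEvent{Timestamp: at(14, 50), ChanIn: 1, ChanOut: 2, AmtInSat: 1701, AmtOutSat: 1700})
	fl.Add(ForwardingEvent{Timestamp: at(13, 30), ChanIn: 2, ChanOut: 1, AmtInSat: 2001, AmtOutSat: 2000})
	fl.Add(ForwardingEvent{Timestamp: at(14, 10), ChanIn: 1, ChanOut: 3, AmtInSat: 501, AmtOutSat: 500})

	// Everything forwarded between 2pm and 3pm.
	window := fl.Query(at(14, 0), at(15, 0))
	fmt.Println(len(window), "events, fees:", TotalFees(window))
}
```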

HTLC Forwarding Persistence - Overview (Conner Fromknecht)

Thank you Laolu. So we’re going to jump into, for the last half, a couple of things mostly related to forwarding HTLCs. Most of the work that is going to be talked about here is safety stuff and then at the end we’ll get a little bit more into the onion packets and the onion routes. To start here, this is a high level diagram of how the core components of our payment forwarding work. In the middle you have the HTLC Switch which sits in the middle and manages all of the surrounding links. A link is a connection between myself and a person who I have a channel with. When I’m forwarding payments and I send the onion packets they actually go out over these links. It is the job of the HTLC Switch to be this financial router that is accepting incoming payments and deciding how to forward them out. The life cycle of this HTLC will start on your left with the blue line, follow it all the way through. The red line indicates where a packet can fail internally and it’s sent back to the person upstream from where it came from. The green is a successful response or settle. You see a green line over there, that’s when we receive a payment locally. As soon as we receive it, we check we have an invoice, cool, send it back. Over here, we’re getting a response from a remote peer and then forwarding it back through. The key components here are the circuit map and the forwarding packages. Those are the primary things that we’re adding here. The forwarding packages are mainly responsible for ensuring reliable forwarding of all the packets internally within the switch. So if we write everything immediately to disk, when we come back out we can always know how to resume our state. It will aggressively make sure these are pushed through and get to the outward links. The Circuit Map’s job is a little different. It sits in the middle and its job is to make sure we never send a packet through the switch more than once within a particular boot cycle. 
It has to handle this job of broadcasting messages between m of n different peers. This is a huge communication bottleneck problem and so getting efficiency there is pretty critical.

HTLC Forwarding Persistence - Circuit Map

So we’ll start here at the Circuit Map. Whenever we get a HTLC it is assigned a CircuitKey which is a tuple of the channel and the HtlcID. The HtlcID auto increments for each channel so they’ll ratchet up and we’ll get them in order. When you’re forwarding a payment there is an incoming key. The person who forwarded it to me will assign some HtlcID tied to this channel. I will go through the Circuit Map and assign an outgoing key. I will assign a HTLC on my outgoing channel that the remote peer will then handle. The job of the Circuit Map primarily is to line up those two incoming and outgoing keys. The primary reason is that when the payment comes back across from the remote peer, I have to look up by the outgoing key which channel do I send this back along, who was the one who originally sent it to me? That whole process needs to be persistent because if it isn’t then we might receive a payment and be like I don’t know, drop. That’s the worst case that could possibly happen here. Blackholing a payment is the worst case. If I send a payment and it gets lost by the network or your node goes down and restarts and doesn’t know how to handle anything or doesn’t realize that it’s already received this payment and just drops it, that’s going to sit there in timeout until the CLTV expires. That’s not great. One of the big challenges here is that some of the links may not be online at the time that I’m trying to make this payment. The semantics of an add which is when you’re going out are different to when you’re going backwards. When you’re going out, it is kind of like a best effort forwarding. If I’m sending a payment and the remote peer is not online, I’ll just be like they’re not online, I’ll send the fail back. That’s a little bit easier. But with the response, I need to make sure that always gets back. If I committed to forwarding the HTLC and in fact did, when I get a response I need to make sure it gets back.
Otherwise that person is going to be sitting there waiting forever. That’s one of the big challenges. Also in between this whole process the links can flap. They might come online, go down and repeat this process. We’ve seen this online, I don’t know if you guys have tested lnd. You see this: they’ll come up, send an error and go down. Come up, send an error and go down. I don’t know if you’ve seen that. Making sure all this state stays consistent when that process is happening is a big challenge. The big things are reliable delivery and at-most-once semantics. We have to do all this while at the same time avoiding a write for every single HTLC. I can assure you and it was tested that if you do it once per HTLC you’ll get a throughput of about 10 transactions per second. That’s probably a best case. There were three subsystems I was working on where they all had to be joined in concert to do this and just putting one of them with a singular write per subsystem or HTLC dropped performance dramatically. Doing all this in a batched manner was hugely critical.
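
The pairing of incoming and outgoing keys can be sketched in Go as below. This is an in-memory illustration only, under the assumption that a circuit is opened when an add is forwarded and closed when the settle or fail comes back; the `CircuitMap` type and its `Open` and `Close` methods are hypothetical names, and the persistence and batching that the talk stresses are left out.

```go
package main

import "fmt"

// CircuitKey identifies one HTLC on one channel, as described in the talk.
type CircuitKey struct {
	ChanID uint64
	HtlcID uint64
}

// CircuitMap is a hypothetical in-memory stand-in for the persistent
// circuit map. It pairs each incoming key with the outgoing key that
// was assigned when the HTLC was forwarded.
type CircuitMap struct {
	incoming map[CircuitKey]struct{}
	outgoing map[CircuitKey]CircuitKey // outgoing key -> incoming key
}

func NewCircuitMap() *CircuitMap {
	return &CircuitMap{
		incoming: make(map[CircuitKey]struct{}),
		outgoing: make(map[CircuitKey]CircuitKey),
	}
}

// Open records a forwarded add. It returns false if the incoming key is
// already known, which gives at-most-once forwarding.
func (m *CircuitMap) Open(in, out CircuitKey) bool {
	if _, dup := m.incoming[in]; dup {
		return false
	}
	m.incoming[in] = struct{}{}
	m.outgoing[out] = in
	return true
}

// Close is called when a settle or fail arrives on the outgoing channel.
// It returns the incoming key, which says where to send the response.
func (m *CircuitMap) Close(out CircuitKey) (CircuitKey, bool) {
	in, ok := m.outgoing[out]
	if !ok {
		return CircuitKey{}, false
	}
	delete(m.outgoing, out)
	delete(m.incoming, in)
	return in, true
}

func main() {
	cm := NewCircuitMap()
	in := CircuitKey{ChanID: 10, HtlcID: 4}
	out := CircuitKey{ChanID: 20, HtlcID: 0}

	fmt.Println(cm.Open(in, out)) // first forward is accepted
	fmt.Println(cm.Open(in, out)) // a duplicate is refused
	fmt.Println(cm.Close(out))    // response is routed back to the incoming key
}
```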

HTLC Forwarding Persistence - Forwarding Packages

Going back to our diagram, the forwarding packages are internal to every link. As soon as we receive packets from the outside world, we write them to disk. That serves as the primary state which we’ll read again on start up to make sure we reforward these internally. They’re all batched and they’re batched at the level of the channel updates. So when I do a commit sig or… all HTLCs handled in those batches are processed atomically. The reason is that it simplifies the recovery logic. Think about what happens if you receive a batch of HTLCs but actually process them individually. Let’s say I get a batch of 10, I start processing the first 3 and then I go down. When I come back how do I necessarily know where I stopped? Especially with something that needs replay protection. If I process 3 and I say those were good the first time and I’ve written that to disk that they were good. Then I come back up and I check again and it’s like I’ve replayed myself. Making sure our persistence logic is batched at the same level as the actual logical atomicity of the channels is a huge safety feature in my opinion. It was also kind of necessary from a performance perspective. We only ever write to disk with these forwarding packages in the best case. During normal operation, all we do is write to disk, everything is buffered in memory throughout the entire switch and then only if we crash do we have to reconstruct our in-memory state. There are a few edge cases here. This is just the flow of a payment in memory. This doesn’t include all the actual persistent restarts and stuff and so each subsystem has a bunch of internal recovery procedures and stuff like that. One of the cool things about this design is that because everything is only written to disk, when we come up or when we’re done with these forwarding packages, we can garbage collect them totally asynchronously just by reading from disk.
We read from disk, we’re like “hey, this one’s done, remove it” and we do this once a minute and that doesn’t interfere at all with the channels that can be done on a global asset level. Basically the win here is that we’re able to batch things heavily and that’s a big win for performance.

Multi-Chain - New Data Directory Structure

Moving onto multi-chain stuff. In this latest version of 0.4-beta we restructured the data directory entirely. We now segregate/separate graph and chain data. So lnd right now can support Bitcoin and Litecoin. That’s what it is configured for. We just added litecoind support which has almost entirely the same components as the bitcoind backend. Each chain has chain data that it needs, it might be headers or if you’re using Neutrino, it might be compact filters. Additionally, each one has a wallet. In the chain data, we store them by Bitcoin, whatever testnet, mainnet, simnet and then the actual data. The difference between that and graph data is that graph data is shared across all possible chains that you’re listening on. For example, if you saw the swaps demo, all the graph data for both Bitcoin and Litecoin was all in the graph directory whereas they would have separate testnet btc and ltc directories and wallets. This is a nice separation of directories and concerns and hopefully that gives you an introductory look ahead to where that will go when we actually incorporate the formal multichain daemon support.
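
As a rough illustration of that separation, the Go sketch below builds per-chain and shared graph paths. The directory names `chain` and `graph`, the per-network subdirectory under `graph`, and the `ChainDir` and `GraphDir` helpers are assumptions made for this example, not a statement of lnd's exact on-disk layout.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// ChainDir returns the directory holding chain-specific data (headers,
// filters, wallet) for one chain and network. The layout is an assumption.
func ChainDir(dataDir, chain, network string) string {
	return filepath.Join(dataDir, "chain", chain, network)
}

// GraphDir returns the directory holding channel graph data, which is
// shared by every chain the daemon listens on. The layout is an assumption.
func GraphDir(dataDir, network string) string {
	return filepath.Join(dataDir, "graph", network)
}

func main() {
	dataDir := "data" // hypothetical root directory
	fmt.Println(ChainDir(dataDir, "bitcoin", "testnet"))
	fmt.Println(ChainDir(dataDir, "litecoin", "testnet"))
	fmt.Println(GraphDir(dataDir, "testnet"))
}
```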

Multi-Chain - Deterministic Per-Chain Key Derivation

Another cool thing is that in preparing for that we implemented deterministic per-chain key derivation. If you guys are familiar with the BIP32 key derivation path. It goes purpose, coin_type, account, change, child_index…

Q - That’s 44?

My bad, yeah. A couple of changes we made to support the configuration schemes Laolu was talking about. We employed purpose 1017. We used SLIP-44 for the coin types. Bitcoin is 0, Testnet is 1, Litecoin is 2. For all the key derivation, because we had different types of accounts, the standard wallet might use a default account of 0 but Laolu added all this stuff to do the MultiSig keys, RevocationBase points, PaymentBase points, all these things. So we swap out the account for this key_family notion. This is nice because in a multichain setting we can set the different coin types and we initialize one wallet per chain. Each one of those gets placed in its own chain directory if you’re using lnd in that setting. Those wallets can be independently rescanned and recovered so we’re currently working on the rescan logic so you’ll be able to say “restart lnd, pop in and look ahead 1000 or 5000 or 20,000” whatever it is and then it’ll derive keys, scan forward looking for them and update your balance as it goes. Finally, the biggest benefit to all this is that you can use one aezeed cipher seed and be able to manage funds on all different chains.
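
The path scheme just described can be written down as a small Go helper. Purpose 1017 and the SLIP-44 coin types come from the talk; the hardening markers, the fixed `0` branch and the numeric key family values are assumptions for this sketch and should be checked against lnd's keychain package.

```go
package main

import "fmt"

// Purpose used by lnd for its channel keys, as stated in the talk.
const purposeLnd = 1017

// SLIP-44 coin types mentioned in the talk.
const (
	CoinTypeBitcoin  uint32 = 0
	CoinTypeTestnet  uint32 = 1
	CoinTypeLitecoin uint32 = 2
)

// KeyFamily takes the place of the BIP 44 account field.
type KeyFamily uint32

// The names come from the talk. The numeric values are assumed here.
const (
	KeyFamilyMultiSig       KeyFamily = 0
	KeyFamilyRevocationBase KeyFamily = 1
	KeyFamilyPaymentBase    KeyFamily = 3
)

// DerivationPath renders purpose / coin_type / key_family / 0 / index.
func DerivationPath(coinType uint32, family KeyFamily, index uint32) string {
	return fmt.Sprintf("m/%d'/%d'/%d'/0/%d", purposeLnd, coinType, family, index)
}

func main() {
	// The same seed yields separate key trees per chain and per family.
	fmt.Println(DerivationPath(CoinTypeBitcoin, KeyFamilyMultiSig, 0))
	fmt.Println(DerivationPath(CoinTypeLitecoin, KeyFamilyRevocationBase, 7))
}
```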

Onion Routing - Optimized Sphinx Packet Construction

Finally, getting into some more onion routing packet construction. In this last version, we optimized the construction process. Before we had a quadratic algorithm that when you’re processing a… to 20 hops, it would do this.. algorithm to derive all the ephemeral private keys and blinded pubkeys. You can see the equations right here. Basically you can see that you go from 0 to i and the blinding factors also extend from 0 to i. Because they’re shared across… you can cache them all. The effects of this are pretty immense. We saw an 8x speedup dropping the time down to roughly 4.5 milliseconds on my machine, 65% less memory, 77% fewer allocations. Shout out to jimpo, is he here? He wrote this up and then it was accepted to the BOLT LN spec. Now it is the reference implementation for how you derive the onion packets, or at least these keys. We have some benchmarks at the bottom and a comparison of the number of total scalar and base multiplications that you use to compute them. So, pretty big savings.
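
The slide equations are not in the transcript, so the following Go sketch only illustrates the caching idea. It assumes a key schedule in which each hop's blinding factor is a hash of that hop's ephemeral public key and shared secret, and it uses a toy group (exponentiation modulo the prime 2^127 - 1) in place of secp256k1 so that it runs with the standard library alone. All names here are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

var (
	// Toy group: integers modulo the Mersenne prime 2^127 - 1.
	p     = new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 127), big.NewInt(1))
	order = new(big.Int).Sub(p, big.NewInt(1))
	g     = big.NewInt(5)

	mulCount int // counts "scalar multiplications"
)

// mul stands in for an elliptic curve scalar multiplication.
func mul(base, k *big.Int) *big.Int {
	mulCount++
	return new(big.Int).Exp(base, k, p)
}

// blindingFactor hashes the ephemeral key and shared secret to a scalar.
func blindingFactor(eph, ss *big.Int) *big.Int {
	h := sha256.Sum256(append(eph.Bytes(), ss.Bytes()...))
	b := new(big.Int).SetBytes(h[:])
	return b.Mod(b, order)
}

// naiveSecrets reapplies every earlier blinding factor at each hop,
// which costs a quadratic number of multiplications.
func naiveSecrets(e0 *big.Int, hops []*big.Int) []*big.Int {
	var blinds, secrets []*big.Int
	for _, hopKey := range hops {
		eph := mul(g, e0)
		ss := mul(hopKey, e0)
		for _, b := range blinds {
			eph = mul(eph, b)
			ss = mul(ss, b)
		}
		secrets = append(secrets, ss)
		blinds = append(blinds, blindingFactor(eph, ss))
	}
	return secrets
}

// cachedSecrets keeps a running blinded private scalar instead, so each
// hop costs a constant number of multiplications.
func cachedSecrets(e0 *big.Int, hops []*big.Int) []*big.Int {
	e := new(big.Int).Set(e0)
	var secrets []*big.Int
	for _, hopKey := range hops {
		eph := mul(g, e)
		ss := mul(hopKey, e)
		secrets = append(secrets, ss)
		b := blindingFactor(eph, ss)
		e.Mul(e, b).Mod(e, order) // cheap scalar product, no group operation
	}
	return secrets
}

// compare runs both variants and reports their multiplication counts.
func compare(numHops int) (naiveOps, cachedOps int, same bool) {
	hops := make([]*big.Int, numHops)
	for i := range hops {
		// Toy hop public keys g^(i+2), computed without counting.
		hops[i] = new(big.Int).Exp(g, big.NewInt(int64(i+2)), p)
	}
	e0 := big.NewInt(123456789)

	mulCount = 0
	a := naiveSecrets(e0, hops)
	naiveOps = mulCount

	mulCount = 0
	b := cachedSecrets(e0, hops)
	cachedOps = mulCount

	same = len(a) == len(b)
	for i := 0; same && i < len(a); i++ {
		same = a[i].Cmp(b[i]) == 0
	}
	return naiveOps, cachedOps, same
}

func main() {
	naiveOps, cachedOps, same := compare(20)
	fmt.Println("naive:", naiveOps, "cached:", cachedOps, "same secrets:", same)
}
```

Both variants produce the same shared secrets; the difference is only in how many expensive group operations are needed, which is where the savings quoted in the talk come from.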

Onion Routing - Persistent Replay Detection

Finally, a part of this whole HTLC packet forwarding is the ability to detect replayed Sphinx onion packets. So the packets over the network come over in this 1366 byte OnionBlob. I receive that off the network and I’m able to process it and get the… I’m forwarding to, the amount, the timeout, the CLTV, stuff like that. If someone were to intercept those packets and start replaying them to me, it’s sort of like a privacy leak. You could see the actions I take based off of processing that packet. As well as someone might process it again. We want to prevent that as much as possible. And this operation has to survive restarts. So if I send you a bunch of stuff, make you crash or DDoS you, come back up and try to send you the same things, you shouldn’t be able to accept that or you should at least detect it. The way we implement this is we implement a decaying log. When I receive this onion blob and parse it, I’m able to generate a shared secret. The equation isn’t here. We hash that and take out the first 20 bytes and store an on disk table of all those. Then when these packets come in, I compare against all the ones that I know and if any are found to be duplicates, we reject them. In that process we actually record in that batch which ones were actually marked as replays. Because going back to this example of processing a batch of 10 HTLCs and we get down to the first three, if we restart and then come up again, we might actually be replaying ourselves and not know the difference. To protect against that we use this batch identifier, this ID short_chan_id || commit_height and we use that as an identifier, we pass that in. If the batch has already been tested for replays, we just return the replayed indexes of those packets and we know off the bat we don’t have to do any more processing. That was the decision before and we’re going to deterministically replay that. This is primarily a protection against rejecting our own packets after a restart.
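
A compact Go sketch of that batch-aware replay check follows. It is an in-memory approximation under stated assumptions: SHA-256 is used as the hash, the batch ID is rendered as a string, and the `DecayingLog` type with its `PutBatch` method is a hypothetical stand-in for the on-disk table. The decay part (dropping entries once their CLTV has expired) is omitted.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const hashPrefixSize = 20 // first 20 bytes of the hashed shared secret

type hashPrefix [hashPrefixSize]byte

// DecayingLog is a hypothetical in-memory stand-in for the on-disk log.
type DecayingLog struct {
	seen    map[hashPrefix]struct{}
	batches map[string][]int // batch ID -> replay verdict from the first run
}

func NewDecayingLog() *DecayingLog {
	return &DecayingLog{
		seen:    make(map[hashPrefix]struct{}),
		batches: make(map[string][]int),
	}
}

// BatchID combines the channel and commitment height, mirroring the
// short_chan_id || commit_height identifier from the talk.
func BatchID(shortChanID, commitHeight uint64) string {
	return fmt.Sprintf("%d||%d", shortChanID, commitHeight)
}

func prefixOf(sharedSecret []byte) hashPrefix {
	h := sha256.Sum256(sharedSecret)
	var pfx hashPrefix
	copy(pfx[:], h[:hashPrefixSize])
	return pfx
}

// PutBatch checks a whole batch at once and returns the indexes that
// are replays. If the batch ID was already processed, the stored verdict
// is returned unchanged, so reprocessing after a restart does not flag
// our own packets as replays.
func (d *DecayingLog) PutBatch(batchID string, sharedSecrets [][]byte) []int {
	if replays, done := d.batches[batchID]; done {
		return replays
	}
	replays := []int{}
	for i, secret := range sharedSecrets {
		pfx := prefixOf(secret)
		if _, dup := d.seen[pfx]; dup {
			replays = append(replays, i)
			continue
		}
		d.seen[pfx] = struct{}{}
	}
	d.batches[batchID] = replays
	return replays
}

func main() {
	dl := NewDecayingLog()
	batch := [][]byte{[]byte("secret-1"), []byte("secret-2"), []byte("secret-1")}

	fmt.Println(dl.PutBatch(BatchID(1, 7), batch))     // index 2 repeats index 0
	fmt.Println(dl.PutBatch(BatchID(1, 7), batch))     // same batch again: same verdict
	fmt.Println(dl.PutBatch(BatchID(1, 8), batch[:1])) // replay in a later batch
}
```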

Q&A

Q - Just a quick question about the wallet architecture. The address that you send funds to when you start or when you sweep the channel, are those in the regular BIP 44 path?

A - For those we use BIP 49. BIP 49 is basically nested P2SH but we modify it to use native witness for the change address. I think we use BIP 84 which is basically pure… You’ll be able to rescan for those once you have the seed.

Q - In your key derivation purposes, you listed a multisig purpose. Is that just for the channel anchor?

A - Yeah, that’s just for the funding output itself.

Q - You’re using the same recovery words as is traditional at this point?

A - Currently, yes. With the way the scheme is, the actual encipher needs to stick to the encoding so we can swap out recovery words in future.

Q - I have a library where we went through 100,000 words to find the most memorable, most international words and the only thing that is left is I need to do… So if somebody is interested I’ve got like 80% of what’s necessary for a much better word list.

A - It’s versioned so we could do that in future as part of a modified word list or any other parameters.

To add onto that we can also do different languages so you can translate the same raw encoded ciphertext into Japanese or whatever you need.

Q - We’re in the early days of lightning, could you talk about how you see the network growing and it being adopted?

A - I think we’re in the early days where we have a bunch of enthusiasts who are very excited who aren’t used to writing interfacing services. “Oh someone is doing a TCP half-open attack. What do I do?” Once we get past that initial phase, we’re working on this ourselves at Lightning Labs. This is giving people the educational resources to operate a node. What should you be setting as far as your kernel settings? How do you want to connect to other peers? How do you want to manage your connections? We’re going to do a lot of work on the edge cases for the node operators so they know what they’re doing, be a little more aware of the network. We also have plans for this UI for node operators so they can look at their node and do analysis in terms of payments coming in, things like that. As far as merchants and exchanges, I think we’re in a phase where they can start experimenting with it now. One big thing was for them to see it live on the network beyond testnet. We’re actually seeing some people already piloting it. Apparently Bitrefill is the major merchant on the network. People have realized that they can connect to them and route wherever else. I think we’ll also see different merchants take advantage of that. Another thing we’ve seen in the past is people offer discounts with channel creations. Open a channel with me and I’ll pay your opening fee. Maybe people will be incentivized to offer lower fees from the get-go. I think it is in a phase where people can jump around early if you’re an enthusiast but know it is in the early days, people are still setting up the infrastructure. Beyond that, when we see it mature a little more, I think more exchanges will come on because there are some pretty cool things with exchanges as far as making them less custodial. Have a hybrid channel…and execute a trade. Also do cross-chain arbitrage. If I have an account on two exchanges, I can send my Litecoin over here, sell it, buy more Bitcoin and do whatever else.

Q - I’m wondering about what the major attack vectors are on the lightning network.

A - Attack vectors include taking down nodes which is why you want to have multiple nodes for a particular service to ensure you can have availability for your users. You don’t necessarily need to advertise on the network, you only need to do so if you’re running a routing node. So me as a merchant service, I can be on the edge, not even advertise that I’m there. Those who want to route towards me to do different payments can do that. Other things include trying to spam a node with very small channels or something like that. You can have some policies so I’ll only make a channel of above half a Bitcoin. That adds a cost barrier to spamming me with all this state. Other things include doing things on the chain. If you can make the chain… to launch attacks. We have other defenses like the scorched earth approach where if you try to cheat me I’ll send all your money to miners’ fees. That’s it. I don’t really care, I just want it to be a strong deterrent. Any other ones?

I think that covers most of it. Availability is probably one of the biggest things. Doing things like putting yourself behind a Tor node so your IP isn’t available.

I forgot to add Tor. We have outbound Tor support now. I can connect using Tor onto the network. In a later version we’re going to add hidden service support as well. So I can be a routing node but not give away my IP address. That’s even better now because they don’t know I’m using Comcast in San Bruno or something like that. So they can just know I’m in the world somewhere instead.

It kind of depends on what your definition of attack is. Whether it is inconvenience or full on exploits. There’s varying levels of all those. In general, the ones that are most practical are probably inconveniences rather than full exploits.

I guess there are things that haven’t emerged yet. Maybe some active privacy attacks. People doing things actively on the network to deanonymize users. We’ll see when that comes.

Q - I’m curious how watchtowers are going to work for more intermediate nodes like people who have nodes on their cellphones that aren’t specifically routing 24/7.

A - They wouldn’t be accepting states from individuals, they would be exporting those states themselves. There’s a question on the compensation structure. With the free ones… they could accept all the states. You could do something else where you pay some small millisatoshi amount per state. Also there could be a bonus where if you act and serve justice you could get 10% or something like that. Ideally once we’re all on mobile platforms it’s going to be all in the background. Ideally it does it for nodes automatically. If I’m a power user, I can scan a QR code and make sure it is backed up to my bitcoind node as well.

Ideally you might have 1 or 2 or 3 so you can cross-reference them and when you come up make sure they’re all consistent with your state when you went down. Or if you lost something and you’re trying to recover, they’re all telling you the same thing.

They’re meant to be redundant and distributed so then it’s a one event thing at that point. One thing we do within the current codebase is we scale the CSV value according to the channel size. So maybe a $10 channel is one day but $20K you give me a few weeks.
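
One way to picture scaling the CSV delay with channel size is a linear interpolation between a minimum and a maximum delay, as in the Go sketch below. The bounds of 144 and 2016 blocks (about one day and two weeks), the maximum channel size and the `ScaledCSV` function are assumptions for illustration; the talk does not give the exact formula.

```go
package main

import "fmt"

// All three constants are assumptions made for this example.
const (
	minCSVDelay uint16 = 144         // roughly one day of blocks
	maxCSVDelay uint16 = 2016        // roughly two weeks of blocks
	maxChanSat  int64  = (1 << 24) - 1 // assumed largest channel, in satoshis
)

// ScaledCSV grows the delay linearly with the channel size and clamps
// it to the range [minCSVDelay, maxCSVDelay].
func ScaledCSV(chanSat int64) uint16 {
	if chanSat >= maxChanSat {
		return maxCSVDelay
	}
	if chanSat <= 0 {
		return minCSVDelay
	}
	delay := int64(maxCSVDelay) * chanSat / maxChanSat
	if delay < int64(minCSVDelay) {
		return minCSVDelay
	}
	return uint16(delay)
}

func main() {
	for _, size := range []int64{10_000, 1_000_000, 8_000_000, maxChanSat} {
		fmt.Printf("%d sat -> %d blocks\n", size, ScaledCSV(size))
	}
}
```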

I think what you’re getting at also is once we have more effective watchtowers and stuff like that and node availability isn’t as much of an attack vector for stealing your funds then we can reduce those timeouts. All the other inconveniences like “my funds are locked up for two weeks” aren’t so much of a problem anymore.

The timeouts are on the higher side, that’s because it’s new. We want people to be a little more cautious.

Q - I was curious. A small detail - the macaroons. Could you speak a bit more about them and what some use cases are? Are they a feature for a node operator?

A - They’re very simple. Basically it’s just an HMAC. An HMAC takes a root key and maybe some authenticated data along with that. So because of this you can’t forge a macaroon ever and you can’t go in the other direction. So if you’re able to attenuate it, maybe like add on a channel for 1BTC on Monday by hashing that down, I can still verify the root chain itself. It has this final digest and also information on how to reconstruct that digest from the root macaroon key. These are pretty widely used. I think Google uses them extensively now. If you check back your headers, you maybe have macaroons that are being used for them. In terms of node operators, you can have a setup where your backups aren’t even on your node at all. Instead you can download them remotely and you have a special macaroon for only a particular purpose. Beyond that you could have a monitoring service. Maybe there’s some… that wants to see what’s going on as far as their operators. You can give them that macaroon to only collect that data itself. The other cool aspect is when you’re doing microservice… where I can have different distinct instances that only do what they need to do and nothing else. It isolates them so if they break into this box they can only list my channels and maybe not actually make payments. Right now, we have admin which is basically all privileges, we have read-only that can only do read-only and then we have invoice… but later on we’re going to add tooling to let you do very custom macaroons. This will be cool in future because you could say more on the mobile… I could pass a macaroon to send a payment every day to some video game. Then… pass that to lnd. There’s some pretty cool architectures in terms of having multiple different services with the same lnd but having different privileges for what they can do.
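
The chained HMAC construction behind macaroons can be sketched in a few lines of Go. This is a simplified illustration using HMAC-SHA256 from the standard library, not lnd's macaroon code: the `Macaroon` struct, `Mint`, `Attenuate` and `Verify` are hypothetical names, and checking whether each caveat is actually satisfied is left out.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// Macaroon is a simplified token: an identifier, a list of caveats and
// the final digest of the HMAC chain.
type Macaroon struct {
	ID      string
	Caveats []string
	Sig     []byte
}

func mac(key []byte, msg string) []byte {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(msg))
	return h.Sum(nil)
}

// Mint creates the root macaroon. Only the holder of rootKey can do this.
func Mint(rootKey []byte, id string) Macaroon {
	return Macaroon{ID: id, Sig: mac(rootKey, id)}
}

// Attenuate adds a restriction. It needs only the current digest, so any
// holder can narrow a macaroon, but nobody can widen it again.
func (m Macaroon) Attenuate(caveat string) Macaroon {
	caveats := append(append([]string{}, m.Caveats...), caveat)
	return Macaroon{ID: m.ID, Caveats: caveats, Sig: mac(m.Sig, caveat)}
}

// Verify recomputes the chain from the root key and compares digests.
func Verify(rootKey []byte, m Macaroon) bool {
	sig := mac(rootKey, m.ID)
	for _, c := range m.Caveats {
		sig = mac(sig, c)
	}
	return hmac.Equal(sig, m.Sig)
}

func main() {
	rootKey := []byte("hypothetical root key")
	admin := Mint(rootKey, "lnd")
	readOnly := admin.Attenuate("permissions = read")

	fmt.Println(Verify(rootKey, readOnly)) // valid chain

	// Dropping the caveat while keeping the digest does not verify.
	forged := Macaroon{ID: readOnly.ID, Sig: readOnly.Sig}
	fmt.Println(Verify(rootKey, forged))
}
```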

Q - I was wondering what would happen if you… channel anchors? Someone was sending all these one satoshi outputs… If people were spamming channel anchors with additional UTXOs, would they just get integrated into commitment transactions? Would they be lost?

A - They wouldn’t be a part of the original funding output. They would be different outputs from the same address. lnd could spend from them but it may not be worth it if they’re only one satoshi.

You can later add constraints on what channels you’d accept incoming. To a certain degree, incoming channels are free. If you’re going to put up $100 to me that’s fine. Maybe you don’t want a ten cents output because maybe that’s going to be dust in future. It’s not implemented yet but there’s an issue on GitHub. There’s a bunch of GitHub issues if you want to do development on lnd. Then you’ll have a final policy on what channel do I accept.

Maybe if someone donated a lot of money it might be worth it. But the fees required to get your one satoshi back into your wallet balance is going to be more than 0.1 satoshi.

Q - Another question about the watchtowers, two little questions. Are they necessarily trusted? Is there anyway to outsource it trustlessly. Secondly, is there anything about it that’s deterministic? You have the deterministic wallet structure. Can you harden the keys that are used for revocation and then give a xpriv to the watchtower? Or do you need to get them a key every time your channel updates?

A - You could say they’re trustless because you have them do a particular action and if they don’t, as long as you have one other one, they can do that action.

Q - But could the watchtower instead of returning the funds to you sweep that channel to themselves?

A - No because I give them a signature and if they create another transaction, it’s invalid at that point. I tell them what the transaction should look like and we use BIP69 to ensure the inputs and outputs are ordered deterministically. I give them a signature with the balance information and they can only do that. If we say you can take 15% on justice enforcement they can only do that.

Q - You need to set that up in advance - the watchtowers’ fees. You would sign…

A - Exactly. We set up a new negotiation. Here’s 20% and one satoshi every single state. At that point, they can only do that action. So what they can do is not act but then in that case you want to have these redundancies in other places.

Q - With every channel update you’d have to tell the watchtower what the new…

A - Yeah but certain aspects are deterministic. There’s a revocation scheme which can let them compress the space into log(n) rather than having one for every individual one. There’s a few different versions. Some involve using these blinded keys to ensure they don’t know what channel they’re watching. You have an encrypted blob and I encrypt the blob with half the txid which you don’t know. But if you see this half of the txid on the chain, you have the full thing and you can decrypt it and act. Otherwise you’d have to brute force it, and it’s AES-256 so good luck with that.
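The encrypted blob idea can be sketched roughly as follows. This is an illustration under stated assumptions: the class and function names are hypothetical, and a SHAKE-256 keystream XOR stands in for AES-256 so the example runs with only the standard library. How the key is actually derived from the txid is not specified in the talk.

```python
import hashlib
from typing import Dict, Optional, Tuple

HINT_LEN = 16  # assumption: the tower is given the first half of a 32-byte txid


def _xor_keystream(key: bytes, data: bytes) -> bytes:
    """Toy cipher standing in for AES-256. Not suitable for real use."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def make_blob(breach_txid: bytes, justice_data: bytes) -> Tuple[bytes, bytes]:
    """Client side: return (hint, blob). The key half never leaves the client."""
    hint, key = breach_txid[:HINT_LEN], breach_txid[HINT_LEN:]
    return hint, _xor_keystream(key, justice_data)


class Tower:
    """Hypothetical watchtower that stores blobs indexed by txid hint."""

    def __init__(self) -> None:
        self.blobs: Dict[bytes, bytes] = {}

    def store(self, hint: bytes, blob: bytes) -> None:
        self.blobs[hint] = blob

    def on_confirmed_txid(self, txid: bytes) -> Optional[bytes]:
        """Decrypt only if a full txid matching a stored hint shows up on chain."""
        blob = self.blobs.get(txid[:HINT_LEN])
        if blob is None:
            return None
        return _xor_keystream(txid[HINT_LEN:], blob)
```

Until the breach transaction appears, the tower holds only the hint and an opaque blob, so it cannot tell which channel it is watching.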

Q - One problem I’ve had when spinning up lightning nodes is figuring out who to connect to and who to open channels with. Especially if you’re trying to receive payments, getting somebody who’s willing to open a channel with funds on their side. Do you have any potential solutions or do you envision how this problem will be solved in future, allowing people to figure out who to connect to?

A - I think one of the things that will be under development a lot this year and in future will be further work on autopilot. If you can imagine, there are a couple of different use cases that users might have. You might be a routing node, you might be a person that’s trying to pay, you might be a merchant, you might be an exchange or whatever. There might be different use cases. You can think of optimizing the attachment profile of autopilot based on whatever the use case is and what you’re trying to optimize for. There might be different fitness models…

Q - Can you say what autopilot is?

A - Autopilot is something Laolu built. It’s basically like an automated channel maker. It can take certain inputs like did your wallet balance change, did you connect to somebody, stuff like that. It will look and see what’s a good channel to make. You can use different heuristics to guide that attachment. Or at the same time there could be more matchmaking services. People say “Hey I’m trying to meet. Maybe we can link up” and there’s an exchange for making the channel. That’s another way of going about it. In general, I think if you have a more informed and more optimized autopilot then a lot of these problems might go away. Because hopefully you just make channels to a better portion of the network you’re trying to target.

The end goal is that you put money in a box and it just works. We’re not quite there yet but we’re making strides towards that. The current one on testnet, and some people might use it on mainnet, tends to minimize the average hop distance. It tends to a scale-free type network. We’re going to be doing a lot more experimentation on that front. Even in the codebase itself, everything is an abstraction at the interface level. So you can add what’s called an attachment heuristic which has things like do I add more channels and who should I connect to? Right now, it only uses data in the graph itself. In the future, we could also start to put in signals from each individual channel, maybe do like a “channel fitness”. From there cull them down in terms of what you should be doing. The other thing is inbound liquidity. There will be liquidity matchmaking services where you can buy a channel for incoming. You can say that someone would want to do that because if you’re very popular then maybe you’ll be earning revenue routing towards you. You could also say that if you’re buying a channel I can credit you for free payments to incentivize you to have outsourced revenue to me. If you’re a merchant maybe you buy some inbound bandwidth because you compute what your total influence will be on that.. so you can have a set of rival bandwidth. It is like… but the costs are very very low because you’re opening a channel.
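The two questions an attachment heuristic answers ("do I add more channels?" and "who should I connect to?") can be sketched as an interface plus one simple implementation. All names here are hypothetical and do not mirror lnd's Go interface. The degree-based ranking is a deterministic simplification of the preferential attachment that produces a scale-free-like graph, and it only uses data in the graph itself.

```python
from typing import Dict, List, Protocol, Set, Tuple

Graph = Dict[str, Set[str]]  # node id -> ids of its channel peers


class AttachmentHeuristic(Protocol):
    """Hypothetical interface: decide whether and where to open channels."""

    def need_more_chans(self, num_chans: int, wallet_balance: int) -> Tuple[bool, int]: ...

    def select(self, me: str, graph: Graph, num: int) -> List[str]: ...


class PreferHighDegree:
    """Prefer well-connected nodes, using only data found in the graph."""

    def __init__(self, max_chans: int, min_chan_size: int) -> None:
        self.max_chans = max_chans
        self.min_chan_size = min_chan_size

    def need_more_chans(self, num_chans: int, wallet_balance: int) -> Tuple[bool, int]:
        # Limited both by free channel slots and by what the wallet can fund.
        slots = self.max_chans - num_chans
        affordable = wallet_balance // self.min_chan_size
        count = max(0, min(slots, affordable))
        return count > 0, count

    def select(self, me: str, graph: Graph, num: int) -> List[str]:
        # Skip ourselves and nodes we already share a channel with.
        candidates = [n for n in graph if n != me and me not in graph[n]]
        # Highest degree first; the node id breaks ties deterministically.
        candidates.sort(key=lambda n: (-len(graph[n]), n))
        return candidates[:num]
```

A "channel fitness" signal, as mentioned above, could later be folded into the sort key without changing the interface.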

The last thing that would really help that as well is when we enable dual funded channels. Right now, it might be hard to get someone to be like “Hey put a bunch of money in this channel to me”. If you’re both going to put up some collateral and both be able to earn routing fees right off the bat, without having to wait for the channel to normalize or anything, then you’re going to have a much better time.

Q - ….

A - Possibly. It probably won’t be at the protocol level for identity and privacy reasons.

We also have reputation amongst your other peers. Every single time there’s a HTLC, it doesn’t get cancelled back. There will be some sort of metrics for robustness. You don’t want to connect to a faulty peer…

Q - It seems to me from your presentation that the replay protection is built into the router switch which will slow down performance of the router switch. Why have you done it that way and can you move the replay protection to the edge, the end node?

A - That’s actually how it works. I didn’t get to where the switch replay protection was put in that diagram. It actually happens at the link level, at the outer edges. Almost all the logic in the switch was pushed out of the interior as much as possible to enable performance as you were saying. Even when we’re adding to the circuit map because of the nature of the way incoming channels work, when I’m receiving a HTLC and I submit a batch to the switch, the only person who would ever submit that range of channel IDs is that link. As long as it itself isn’t replaying them there’s going to be no contention there. Those are all done in parallel and the only thing that goes through the centre of the switch is all held in memory.

Q - I have two practical questions. I was running lightningd and then I was running lnd. I was trying to grab the graph and compute the number of nodes and it didn’t match. I ran them against the same Bitcoin node and consistently lightningd, the C implementation was giving me about 800 nodes and lnd was giving me about 480 or something like that. On mainnet.

A - There might be a couple of reasons for that. One is that there is no guarantee that you’ll end up seeing the same actual graph. It depends on who those nodes are connected to and what gossip information you’re getting from them. It is like a hopefully eventually consistent system. Like I said, there is no guarantee that you will get all routing updates, they will come as a best effort thing. The other reason that there might be differences is depending on the validation that the different implementations are applying, you might be storing invalid announcements on one while the other is doing more heavy filtering which is what I would suggest is the primary reason.

lnd only accepts nodes that have channels open but if another implementation is a little more relaxed on validation then you can see that the values would be different. Different implementations have different policies on when they garbage collect a channel. It could be the case that a channel had been there for a year and nothing had happened with it, we’ll forget that but another implementation may not forget that. They should be around the same size but depending on your knowledge of private channels you could have a bigger graph than someone else because they don’t really know the extent of a private channel. There could be some alternative private network channel that no one knows about but is in use super heavily right now.

The ability to garbage collect those old nodes is really beneficial to your routing performance and usability because if you’re spending time trying to route through dead nodes, that’s going to increase the time and the number of trials it takes to make a payment go through.
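A pruning pass of this kind might look like the sketch below. The data layout, the function name and the age threshold are assumptions for illustration, since each implementation picks its own policy. Stale channels are dropped first, then any node left without a channel is dropped too, which is one way two implementations can end up reporting different node counts.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Channel:
    chan_id: int
    node1: str
    node2: str
    last_update: int  # unix time of the most recent update seen for this channel


def prune_graph(
    channels: List[Channel], nodes: Set[str], now: int, max_age: int
) -> Tuple[List[Channel], Set[str]]:
    """Drop channels with no update within max_age seconds, then orphaned nodes."""
    live = [c for c in channels if now - c.last_update <= max_age]
    connected = {n for c in live for n in (c.node1, c.node2)}
    return live, nodes & connected
```

Path finding then only considers the surviving channels, so fewer attempts are wasted on dead routes.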

We’ll prune them.

Q - When you open a channel you commit some funds to the channel. Is it possible to observe those funds being depleted gradually at the level of network or not visible? You just see the event of closing the channel?

A - Network as in RPC or peer-to-peer network?

Q - Let’s say RPC.

A - For RPC you can just poll. We have that forwarding events thing now which basically will show you every single forwarding event in the channel. You can use that to see what’s happening. There’s not yet a streaming RPC but there will be one in the future. There’s also listchannels that you can just poll and see the balance…
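As a rough sketch of the polling approach, the snippet below compares two listchannels snapshots and reports which local balances moved. The JSON field names (channels, chan_id, local_balance) are assumptions about the command's output and should be checked against your lnd version. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Dict


def fetch_channels() -> dict:
    """Run `lncli listchannels` and parse its JSON output (needs a running lnd)."""
    result = subprocess.run(
        ["lncli", "listchannels"], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)


def local_balances(snapshot: dict) -> Dict[str, int]:
    """Map channel id -> local balance; field names are assumed, see above."""
    return {c["chan_id"]: int(c["local_balance"]) for c in snapshot.get("channels", [])}


def balance_changes(prev: dict, curr: dict) -> Dict[str, int]:
    """Return the local balance delta for every channel whose balance moved."""
    before, after = local_balances(prev), local_balances(curr)
    return {
        cid: after[cid] - before[cid]
        for cid in after
        if cid in before and after[cid] != before[cid]
    }
```

Calling fetch_channels() on a timer and feeding consecutive snapshots to balance_changes() gives a coarse view of funds moving through a channel until a streaming RPC exists.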

Q - I think you mentioned this a little bit earlier when you had the dust transaction, a transaction that’s super small. And then you’re going through an intermediary node and say what if he goes uncooperative or he goes silent? Then a channel’s HTLC… so you’d have to have the output for that log. But if it’s super small you can’t have that output. So would it be stuck there?

A - You’re saying a dust HTLC never gets fully completed?

Q - Yeah

A - We won’t record that in the log. We only record things that get extended and then come back across. Nodes have a dynamic dust limit so I can say “I only accept HTLCs above 2 satoshis”. We can also do things around ensuring our commitment transaction is always valid by consensus or by popular policy to avoid having that dust output there.

Q - So you kind of mentioned that you would have a minimum transaction that you would have to have. You couldn’t have a micropayment or anything like that.

A - It depends. Certain nodes will say “I’m a high value node so I don’t accept micropayments” but other ones will say “I want that because I want the fees and I want the frequency of payments.”
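A per-node minimum like this can be pictured as a small policy check applied at each hop. The names are hypothetical, and the minimum is treated as inclusive here, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HopPolicy:
    node: str
    min_htlc_sat: int  # smallest HTLC this node will forward (assumed inclusive)

    def accepts(self, amount_sat: int) -> bool:
        return amount_sat >= self.min_htlc_sat


def first_rejecting_hop(route: List[HopPolicy], amount_sat: int) -> Optional[str]:
    """Return the first node on the route that refuses the amount, if any."""
    for hop in route:
        if not hop.accepts(amount_sat):
            return hop.node
    return None
```

A sender would route a micropayment around the "high value" node and through nodes that advertise a low minimum.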

Q - I wasn’t sure if lightning network was supposed to handle very very large transactions vs smaller faster daily transactions.

A - It can do both. We also have this thing called AMP which lets you split a larger payment into a bunch of smaller payments. If people have smaller channels you can still do a $100 payment through a bunch of $10 channels.
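The splitting step alone can be sketched with a simple greedy allocation. This is only an illustration: the function name is hypothetical, amounts are unit-free, and the sketch does not model what makes the partial payments settle atomically.

```python
from typing import List


def split_payment(amount: int, capacities: List[int]) -> List[int]:
    """Greedily split amount into shards, each no larger than one channel's capacity."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    shards: List[int] = []
    remaining = amount
    # Fill the largest channels first to keep the number of shards small.
    for cap in sorted(capacities, reverse=True):
        if remaining == 0:
            break
        part = min(cap, remaining)
        if part > 0:
            shards.append(part)
            remaining -= part
    if remaining > 0:
        raise ValueError("not enough total capacity for this payment")
    return shards
```

For example, a payment of 100 across twelve channels of capacity 10 yields ten shards of 10 each.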