New address type for segwit addresses

Pieter Wuille





demo website:

Can everyone hear me fine through this microphone? Anyone who can't hear me, please raise your hand. Oh wait. All good now? Okay.

Tonight I will be speaking on a project I've been working on on-and-off for the past year or so, which is the question of what kind of addresses we will be using in the future. Recently I proposed a BIP after several long discussions among some people. I think we have a great proposal. So today I will be talking about the proposal itself and how it came to be.

This was joint work with several people, in particular Greg Maxwell, who is here as well, and my colleagues at Blockstream. Most of this work was done thanks to the computation power of their computers. I'll talk about that more.

So this is the outline of my talk. First I'll talk about why we need a new address type. Going forward. The decision to use base32 over base58 as has been used historically. Once the choice for base32 has been made, there are a bunch of open design questions like what checksum to use, what character set to use, and what the address structure looks like. Optimal character set depends on optimal choice of checksum, which may be surprising. And then combining this into a new format, which I am calling bech32.


Segregated witness is a proposal that I presented a bit over a year ago in Hong Kong for the first time. And it is now in a state of perhaps being deployed on the bitcoin network. Segregated witness needs to encode new output types. They are described in bip143. Pay to witness hash and pay to witness script hash and there are some possible extensions later.

Segwit supports two ways of using these, either inside of P2SH which is an address type that has been supported for years on the bitcoin network, making it backwards compatible and forwards compatible with every wallet out there created in the past few years. However, going forward, we really want this to happen natively. This gives us better effiiency for spending as we don't need the overall backwards compatibility layer of redeemScript of P2SH gives us. And it gives us 128-bit security, whereas P2SH gives us only 80 bits which is becoming questionable.

This proposal replaces bip142 which was an older base58-based proposal. And I think this new one is much better.

Why base32?

First of all due to the more limited alphabet we can restrict ourselves to just lowercase or just uppercase making the address format case insensitive. This makes it much easier to read or write addresses down, as anyone who has tried to write an address down or anyone who has had to write it down from someone reading it over the phone. To be clear, my hope is that in the long term that bitcoin will not need addresses. Hopefully humans will not need to interact at all anymore, there have been some proposals going in that direction but for the time being it seems we are not there and we need a solution regardless.

Being a power of 2 means it is much easier to convert; just take the bits from the data, split them into bits, arrange into groups of five, and the groups become the base32 characters. With base58 you need some logic to turn the whole thing into a huge number and then convert to a new number, which is a lot of work.

For all of this we have a downside, it's 17% larger than base58 due to less information fitting into one character.

Due to 32 being a prime power, we can support mathematical field over the characters, we can use a lot of research on strong error detection which doesn't exist for something like base58. Due to being case insensitive, it is also more compact to store in QR codes which have a special mode for encoding alphanumeric data which only works for case insensitive data.

base32 is being used for other things already such as onion addresses in tor and in i2p and several other projects.


We fix the idea that we are going to use base32 for all these reasons. What are we going to do about the checksum? First I should point out a few design decisions we faced early on.

We want a all of this will be about the chance of getting an invalid address as a valid address. When you send money and this happens, you send money into a black hole. So we want to pick design decisions that minimize this. Six characters are what we need to minimize the chance that a random string gets accepted as a bitcoin address less than 1 in a billion. So that's just a design choice, 6 characters, that's what we're going with.

If this address type is going to be generic for every witness output which can have up to 320 bits of data, we really need 71 characters under the checksum. So we're looking for error detecting codes that detect as many errors as possible with a checksum of length fixed, with a message length of 71. Choice: 6 characters extra.

First I need to clarify the concept of distance. Anyone who knows about coding theory will find this trivial. I'll explain anyway. The distance of a code or the minimum distance or hamming distance is how many characters in a valid address you need to at least change to turn it into a different valid address. So what is the minimum number of characters to change the address? All the black dots are valid addresses. The minimum distance of the code shown here is 4. You need to cross 4 black lines between any 2 black dots. There is a very fundamental theorem that says if your distance is n, you can detect up to n minus 1 errors. This is obvious to see. If you start from any of these black dots and make up to 3 errors, you never end up at another black dot. This shows how a distance 4 code can detect 3 errors. There is an equivalence where if you have a code that can detect n errors, it can correct n/2 errors. From any point in the diagram, you go to the closest black dot. If you are on an intersection point that is a distance 2 from a number of black dots, there are a number of choices s oyou can't correct 2 errors you can only correct 1 error.

CRC codes

The first thing to look at is CRC codes. They are the most traditional checksum algorithm used in a wide variety of protocols. However, they are bit-based. This makes sense because in most protocols what we care about are bit errors. However, here, this is not the case. We don't directly care about bit errors, we care about symbol or character errors. When someone makes a mistake, they will make an entire character error, and every character is 5 bits. So here is an example of a B becoming a V. This is one character changed but actually 9 bits.

CRC codes are optimized for detecting a number of bit errors. We want something to detect 4 errors, we really need something that can detect 20 bit errors then. That's a lot. Turns out finding something that can detect 20 bit errors is impossible. But we really don't care about that-- we care about symbol errors that result in structured bit errors. They always appear in groups of 5, and we care about the number of groups that are wrong, not just bits. So it's in fact possible to find a CRC that gives us distance 4, but we can do better.

RS codes

Probably the best known type of checksum algorithm that allows error correction is Reed-Solomon codes, which work directly over symbols. Which is great. Unfortunately they are limited to the length of the size of the alphabet minus 1. Typically RS codes are done over 8 bits of data. In our case, our alphabet size is 32 using base32, means we are limited to error detection in strings of length 31, which is too short, we cannot fit enough data into this.

So one possibility is to use an alphabet extension where you look at two characters at once. So with size 1024 which is 210 you could see 2 characters at once. However this is still limited to distance 4. Is there nothing better we can get?

BCH codes

About a year ago, I was looking at BCH codes, which are a generalization of Reed-Solomon codes that drop the restriction of the alphabet size. Most of the research on BCH codes are about bit-based ones, but this is not necessary. A lot of the research is applicable to larger ones as well.

So we're going to create a BCH code over our size 32 alphabet. The theory, which you can read about on Wikipedia, if you do the math, you can construct BCH codes with distance 5. Yay. I'll soon show you a diagram for why this actually matters.

There's still a huge design space with many free parameters in this BCH class of codes. So we'll need a way to figure out which one to use. It turns out that even if you fix the field size, there are about 160,000 different codes available.

BCH code selection

How are we going to pick one? So I started with random sampling and try to see which ones are better even if you give it more errors than it was designed for. So these were for errors with distance 4 or 5, so 3 or 4 errors detected. But wha if you give it one more error? Some might be better beyond the designed limit. So we started on this project to try to characterize all the different possible codes here.

How do characterize or analyze such a code? All 71-character addresses, that's a really high number, this is about 2355 way beyond the number of atoms in the universe. However, BCH codes belong a class called linear codes. Linear codes means that if you have a valid code and you look at the values at every character it represents, and errorwise you add every character to each, the sum will again be another valid code word. If you want to see if the code is maybe below distance 5, you need to check does there exist any pair of 4 non-zero values over the 71 positions whose checksum is zero? If that is the case, then you can add that to any value codeword and the result would be another valid codeword and then you have established two valid codewords with distance 4... This is about 12 trillion results. And the next realization is that you can do a collision search. Instead of looking for 4 errors, you build a table with all the results from 2 errors, compute what their checksums would be to correct it, sort the table, and then look whether there are any two identical ones. Because now you have two codewords that meet the same checksum to correct it, and if you add them together, now you have 2 times 2 you have 4 changes and the checksum cancels out, through this collision search you can find by only doing the square root which makes it feasible... about 12024159379589 combinations for linear codes, and when doing collision search it's 1702704605 combinations.

We were able to do exhaustive search. We were starting with about 160,000 codes and we set a requirement that they actually have distance 5 at length 71 and there are 28,000 left. Which one to pick from that?

What you wnat to do is then look at how they behave at one beyond the limit. So we looked at all 28,000 codes, they all detect 4 errors at length 71, but what if you give them 5 errors and 5 errors that are maybe only appearing together or maybe randomly distributed 5 errors, and they have different behaviors. It turns out that if we pick some reasonable assumptions about how we weigh the random case vs best case, there are 310 that were identical.

To show you how much this matters, I want to show you this grpah which contains a lot of information. It shows you the detection power of a particular error detection code in function of the transit of every character in the word is wrong. This makes the assumption that every character is independent from every other. You can see the 1% line, the red line, is the error detection code we chose. You can see it's about 2^-38 which means 1 in 250 billion chance that an error you make will not be detected. The blue line is a 32 bit hash function as the current address format base58 does it. And we can see that it doesn't matter what your error rate is, every error has the same chance of being detected. But we can reasonably assume that the average error rate in an address is not large. We can assume maybe it's 1 per address, hopefully less than 1, especially if we switch to case-insensitive coding, it could become even less likely. And the yellow line shows what we could have chosen if we didn't do this exhaustive analysis to find the best one, the yellow line is the worst possible code, it has this annoying bump where its detection error probability goes -- we don't even guarantee the 1 in billion chance for that code. So this is just to show you one how great optimizing for low number of errors is, because it gives you under some assumptions, a much better model. And it also shows you how much it matters to do this analysis. It really makes a difference.

From those 310 codes, we picked the code with the best bit error rate behavior. We had these 310 identical codes in number of how many characters can be wrong they all behaved identically. So we still need some criterion to pick one. We have all this analysis available, so whta to pick? It will soon become clear why it becomes useful to optimize for a low bit error rate.

What if we're only looking for errors that only change 1 bit in a character? How does the code behave then? For random errors, they are identical anyway. Now we only have two left, and we just pick one of them. This took many years of CPU time. I think we had 200 cpu cores available to do analysis on, so this only took a couple weeks, but it was in total more than 10 years of computation time. Until we discovered that really these 310 identical codes, it's a pattern, all codes appear in groups of 310 identical ones. A couple weeks ago, when we identified which the exact changes were, some of them... became a lot more feasible to analyze, we were no longer restricted to just 160,000 BCH codes, we could look at all the 3.6 million cyclic codes which is a much larger class of functions. Turns out that if you make the range of things you're looking for larger, you find better things. However, we did not pick anything from this. The results from the search indeed which was now feasible after dividing by 310 because we took from each class just to test one instead of otherwise a billion codes, there was only 3.6 million left. It turns out that some of them were slightly better than what we were already found. We weren't going to change it because of efficient error location algorithms available for BCH codes, which are not available if you pick an arbitrary code, and the difference isn't much so we're sticking with BCH codes.

Character sets

There exist various character sets for base32 already. There's RFC3548 which is a standard.. the z-base32 standard, various other standards for base32 data that have been used in the past. We're still going to pick a different one. The reason for this is that we were able to select our code to optimize for low-bit error rates, in such a way that 1-bit errors are no more likely than other errors... So this took another year of CPU time to optimize for this. We looked at tables for similarity between characters. The z and 2 are considered similar in some fonts or writing. As you can see they are "8" apart, one is 2 and the other is 10, so they are 1 bit apart. r and t are 1 bit apart, and x and k are one bit apart, and e and a are 1 bit apart, and the s and 5 are 1 bit apart. There are way more similar errors overlaid in this data that you can look for. It's pretty cool. We made a character set that optimizes for 1-bit errors, so our code is distance 6 for 1 bit errors. If you just look at 1 bit errors, we guarantee 5 errors. If you only make errors like that, we can detect 5.

Q: Do you take into account qwerty keyboard distance?

A: We did not take qwerty keyboard distance into account.

We could consider it, but the visual component is uniform.... more commonly a source of errors. The visual path is always in the path, and a qwerty keyboard is not necessarily always in the path.

What this diagram shows you is, blue line is hash from base58 using a 32-bit hash, the purple line is the same thing for the address checksum algorithm we chose, but restricted to 1-bit errors. So you if you only make this class of errors that we consider more likely, you can see it's even more strong, making it 32 times less likely for something like that to be not detected.

Something else you can see on this diagram is that the line for 1 expected error in the whole address, there's a crossover point at 3.53, what this means is that the checksum algorithm despite being shorter, only 30 bits checksum, despite that, for up to 3.5 expected errors in the whole address, it's actually stronger than the 32-bit checksum being used in base58. For only likely errors it is up to 4.85 for address.


One last thing... how do we combine all of that into a real address? We have a character set and a checksum algorithm, how to structure a segwit address? The first is a human-readable part, which is "BC" standing for bitcoin. For testnet, it's "tb", visually distinct from "BC". There is a separator "1", which does not appear in the character set. This means that the human readable is always unambiguously separated from the data that follows, even if the human readable part has the number 1... then there's a flexible part. And for segwit addresses, this is generic, in there is the data for the witness version, the witness program, and the checksum which is 6 characters. The result of this for P2WPKH address is 42 characters, and 62 characters for P2WSH. Up to length 74 for future witness versions. I don't expect this to be a big problem. It's more compact for QR codes, it's easier to read and write. It's 62 for P2WSH because higher security.... 320 bit hashes, the length can go up to 74, although I don't really expect that, since fairly reasonable security target..


All of this together is what I call bech32. It's a generic data format for things like bitcoin addresses. One instance of it for segwit addresses. It could be used for various other things that have similar requirements. I don't know what those would be, though.

It seems strange that most of the research you can find on checksum human-readable data using like 1 checksum character or two checksum characters... like bank account numbers, a few similar things, it seems to be that there is little research on how to make actually strong checksum that is still designed for human consumption. I hope that this becomes perhaps a standard for how to do this. There's a link to the BIP address proposal.

In all of this, I haven't mentioned the most important design goal, which was simplicity. I've been talking about error detection codes, which are typically very complicated to deal with. But they are only complex if you want to deal with error correction. Detection is simple.

In bech32, no bignum code requirement, no sha256 dependency. I will also show you, sorry for making you sea sick, what we did make was a small demo website that shows how ... because it is an actual error correction algorithm, you could optionally implement it to show you error location. The spec allows this, it strongly advises against doing actual error correction because if someone types an address wrong and wants to complain and not try to correct it for them-- because they might end up with a valid address that is not the one they intended.

Say they change this C to an X, it points out that this is likely the problem. It could even be inside the checksum. The algorithm inside this website supports up to 2 errors. We have one that works for up to 3. None of the contributors to this project were not great UI people so we didn't have good ideas for how to do this.

That's it. Thank you very much for your attention.


Do we have some questions? And wait for the mic.

Q: Do you know how many bitcoin are lost per year due to like human readable character mistakes that this corrects for?

A: I have no idea. I expect it to be low. But I also expect we wouldn't know about it. It's hard to quantify and it's very nice to just categorically sweep it away and improve the state of the art there.

Q: In the error location code, cna it suggest corrections?

A: Yes. There is full error correction. It knows what the likely error is. It intentionally does not show you. You should go back and try again, instead of being shown. Greg suggests I explain why this matters. Okay. A code with distance D can either detect D-1 errors, or correct D-1 over 2... so this means our code which is distance 5 can detect 4 errors but can only correct 2. You can be in the middle of two valid code words and then you don't know which one to go to. So if you make 4 errors, which will never result in a valid codeword, you might have gotten closer to another valid codeword than the one you started off from. So this means that your ability to do error detection is erroded by trying to do correction.

The key point to make here is that if you make 4 errors, you will almost certainly get it corrected to an address that you did not intend. If you make 4 errors, and run the error correction algorithm which can make up to 2, it will correct it to the wrong thing with very high probability. With 3 it's likely, with 2 never. With 4 it's certain.

This is different when you are dealing with private keys. This is maybe future work. We're looking at a similar standard like the one here, for encoding private keys. To tell the user that the private key is wrong, that's not what you want, you want to do the error correction there. So we are looking at a stronger checksum there that has more than 6 characters extra, but could correct a reasonable number of errors, not just detect them.

Q: Thanks for presenting. I was wondering, the private key thing made me wonder, can this be used for bip32 master keys?

A: So the checksum algorithm we show here ends up being really good up to length 89. It was designed for length 71. It's really good up to length 89, not more. So this is approximately 450 bits of data. This is enough for a seed, but not enough for a master key, which has more bit of entropy because it has a chaincode as well as the key. So the future work is looking for stronger checksum which is both able to correct a reasonable number of errors for short things like private keys, but also has good performance for longer things. And given that we're talking about longer strings there anyway, you don't care if it's 6 or 10 or 15 characters for the private key format.

Q: Could bech32 implicitly support segwit in the practice of false signalling where miners are signalling for something other than Core....

A: That's completley orthogonal. All of what I have been talking about here is on the wallet implementation side. It has no relation at all as to what has been implemented in consensus rules. This is just something that wallets would implement if they want. So, no.

Q: At the beginning of your talk you mentioned there were some proposals that would abstract bitcoin addresses away from users. What would those might be?

A: The only proposal that had some traction in the past was bip70 which is the payment protocol you may have heard about it. Instead of having an address, you have a file, which is a payment descriptor. Your address becomes a URL to that file. Because in almost all cases where a bitcoin transaction is taking place, the user or sender is already interacting with the recipient anyway, so why do we need an address anyway? We can just give information through the website or software. This doesn't solve the problem of what happens if your website is being intercepted, but addresses don't solve that either. I think there were some mistakes made in the specification of bip70 that makes it less useful than it could be... there's not much software that implements it, it's hard to implement, it requires SSL certificates, something in that direction I think would be great if it was done with what we have learned so far since then, but it might be difficult to get it adopted and I am not very hopeful unfortunately.

Q: ...

A: So that is.. yes, it does. Because ... P2WPKH addresses, still have the ripemd 160 hash of the... that is part of the segwit proposal, it doesn't have anything to do with this address format that abstracts over this.. Yes the data being encoded in this address for a P2WPKH is ripemd 160 yes, ... not the case for P2WSH.

Q: Optimizations in terms of classic spending error correction or for example what happens if I miss one character and I end up adding a single character.

A: I think Greg should answer this question.

Yes, what you are describing is a drop and an add, is a correct length. So this is a burst of errors in a short space. The codes that we use are also selected by virtue of their selection to have properties that are good for small bursts. So they do detect the insert and delete case better than by chance than you would expect. Although we haven't implemented it, we could implement one that provides useful hints for the drop and insert case.

There's a table in the BIP that gives a breakdown of the performance for different sizes.

Q: With traditional bitcoin addresses, it's a 1 and 3, and that tells the scriptpubkey.. in this case it's the length?

A: No, it isn't. The q you see in red is the witness version. These are all segwit outputs. These are version number and a hash in combination with the length. So both P2WPKH and P2WSH. They are both version 0, one uses 20 bytes the other uses 32 bytes.

Commenting more on the length. Because future segwit versions may use different lengths, the specification allows the lengths to be many possible lengths, and there's some validation on what to allow it, not every possible length is allowed, from virtue of the 8 bit to 5 bit conversion.

It's important that this address type makes an intentional distinction... someone implementing the address type does not need to know about the existence of the scripthash, just the version number and the data. The sender shouldn't even care about what's in there, that's the receiver's business. There are some sanity checks you could do.

We generally consider it a flaw that the sender konws anything about the scheme that the receiver is using, so we try to abstract that away.

Q: I think I am missing something basic. How does the probability change for address collisions?

A: What do you mean by address collision?

Q: Chances of generating two addresses. Does this increase the probability that...

A: There is still 160 bits of data in the address, that's all that matters. This is only about the checksum. The data inside the address is still a hash. If you are using 256 bits, you have something like 1 to 1 to the minus 128 for a collision, which is a big improvement.

Q: Applications outside of bitcoin to this? Maybe helping with marketing a little bit?

A: Anything where humans are dealing with small strings of cryptographic material. The example I gave already that uses base32 is onion addresses in tor, for example. I don't know if they are interested in it, but something similar to bech32 could be applicable there, but they have different requirements for error detection, it isn't that bad if you accidentally go to the wrong tor onion server. This does not really prevent intentionally similar looking addresses. We have this guarantee that any two valid address are different by at least 5 characters, which makes it hard for an attacker to make the characters look similar, well the hash function output in there makes it really hard--- sorry, not really an answer to your question I guess.

Q: Are most of the design decisions around character map and the sort of code generator, documented in the bip?

A: Yeah. There's a rationale section that explains why many of the decisions were made.

Q: Is there any way to use this scheme for traditional pay to pubkey or pay to scripthash?

A: Intentionally not, because I think it would be confusing to have multiple encodings that refer to the same outputs. You could have someone using one form of address but then they go to a block explorer that shows them another one, so since they don't know which one was intended you'll end up with compatibility errors. I think we should think of addresses as the thing you are sending to. It happens to map to a particular scriptpubkey but we should just see it as an encoding as some data could be confusing.

Q: I know it's super eary, but have you talked with wallet implementors about this?

A: Yes. While designing this, we tlaked with various people before publishing it. Like GreenAddress, Armory, Electrum, different comments from various wallet providers. Many of them gave suggestions.

Q: I am sure they appreciate the simplicity.

A: I hope so. I had comments like "oh no a new error correction algorithm" and then I showed them hey it's only ten lines of code. They liked that. In practice, you need to implement more than that, but the whole reference implementation for encoding and decoding turns out to be about 120 lines of code.

Q: Thank you for presenting. There... what about flipping to ethereum?

A: People do what they want.