author     Michael Folkson <michaelfolkson@gmail.com>    2020-12-09 15:11:39 +0000
committer  Michael Folkson <michaelfolkson@gmail.com>    2020-12-09 15:11:39 +0000
commit     da8c155bcca634bb95307b0311645180ec9b8d86 (patch)
tree       0f92b05b33f586af8068f4c94b9680b7f5a61401
parent     b9d62568a7ccad212dc0976aa14c68fe682bac3c (diff)
Add edits to Pieter on bech32
-rw-r--r--  transcripts/sf-bitcoin-meetup/2017-03-29-new-address-type-for-segwit-addresses.mdwn  191
1 file changed, 80 insertions, 111 deletions
diff --git a/transcripts/sf-bitcoin-meetup/2017-03-29-new-address-type-for-segwit-addresses.mdwn b/transcripts/sf-bitcoin-meetup/2017-03-29-new-address-type-for-segwit-addresses.mdwn
index 309e696..5a00c6a 100644

Name: Pieter Wuille

Topic: Bech32 addresses for Bitcoin

Location: SF Bitcoin Devs

Date: March 29th 2017

Video: <https://www.youtube.com/watch?v=NqiN9VFE4CU>

Slides: <https://prezi.com/gwnjkqjqjjbz/bech32-a-base32-address-format/>

Proposal: <https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki>

Demo website: <http://bitcoin.sipa.be/bech32/demo/demo.html>

Twitter announcement: <https://twitter.com/kanzure/status/847569047902273536>

Transcript completed by: Bryan Bishop

Edited by: Michael Folkson

# Intro

Can everyone hear me fine through this microphone? Anyone who can't hear me please raise your hand. Oh wait. All good now? Tonight I will be speaking on a project I've been working on on and off for the past year or so, which is the question of what kind of addresses we will be using in Bitcoin in the future. Recently I proposed a [BIP](https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-March/013749.html) after several long discussions among some people. I think we have a great proposal. So today I will be talking about the proposal itself and how it came to be. This was joint work with several people, in particular Greg Maxwell who is here as well, and my colleagues at Blockstream.
Most of this work was done thanks to the computation power of their computers. I'll talk about that more. So this is the outline of my talk. First I'll talk about why we need a new address type going forward. The decision to use [base32](https://en.wikipedia.org/wiki/Base32) rather than [base58](https://en.bitcoin.it/wiki/Base58Check_encoding) as has been used historically. Once the choice for base32 has been made, there are a bunch of open design questions like what checksum to use, what character set to use, and what the address structure looks like. Optimal character set depends on optimal choice of checksum, which may be surprising. And then combining this into a new format, which I am calling [bech32](https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki).

# Why?

Segregated Witness is a proposal that I [presented](https://www.youtube.com/watch?v=zchzn7aPQjI&list=PLX1jlrc_KO1fDYsUsCeswSXIgzlXyh3sl&index=6) a bit over a year ago in Hong Kong for the first time. It is now in a state of perhaps being deployed on the Bitcoin network. Segregated Witness needs to encode new address types. They are described in [BIP 143](https://github.com/bitcoin/bips/blob/master/bip-0143.mediawiki): pay-to-witness-pubkey-hash (P2WPKH) and pay-to-witness-script-hash (P2WSH), and there are some possible extensions later. SegWit supports two ways of using these, either inside of [P2SH](https://github.com/bitcoin/bips/blob/master/bip-0016.mediawiki), which is an address type that has been supported for years on the Bitcoin network, making it backward and forward compatible with every wallet out there created in the past few years. However, going forward, we really want this to happen natively. This gives us better efficiency for spending as we don't need the overall backward compatibility layer of the redeem script that P2SH gives us. Secondly it gives us 128 bit security for script hashes. P2SH only delivers 80 bits which is becoming questionable. This proposal replaces [BIP 142](https://github.com/bitcoin/bips/blob/master/bip-0142.mediawiki) which was an older base58 based proposal and I think this one is much better.

# Base32
Why base32? First of all due to the more limited alphabet we can restrict ourselves to just lowercase or just uppercase making the address format case insensitive. This makes it much easier to read or write addresses down, as anyone who has ever tried to write an address down or type it after someone reads it over the phone will easily confirm. To be clear, my hope is that in the long term Bitcoin doesn't need addresses anymore. We will get a real solution where humans don't need to interact with the cryptographic material at all anymore. There have been proposals going in that direction but it seems for the time being we are not there, so we need a solution regardless. 32 being a power of 2 means it is much easier to convert. We can just take the bytes from the data, split them into bits, rearrange them into groups of 5, and those groups become your base32 characters. Compare this to base58 where you really need BigNum logic to turn the whole thing into a huge number and then convert it to a new base and so on, which is a quadratic algorithm. For all of that we get a downside. It is 17 percent larger than base58 just due to less information fitting in one character. That's going to be the main topic of what I'll be talking about. Due to 32 being a prime power we can support a mathematical field over the characters. We can use a lot of research on strong error detection codes which doesn't exist for something like base58. Due to being case insensitive it is also more compact to store in QR codes, which have a special mode for encoding alphanumeric data. But this only works for case insensitive things. Base32 is also being used in many places already, including onion addresses in Tor and I2P and several other projects.
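That regrouping is small enough to show in full. Here is a sketch modeled on the `convertbits` helper from the reference implementation discussed later in the talk:

```python
def convertbits(data, frombits, tobits, pad=True):
    """Regroup a sequence of frombits-wide values into tobits-wide values."""
    acc = 0       # bit accumulator
    bits = 0      # number of bits currently in the accumulator
    ret = []
    maxv = (1 << tobits) - 1
    for value in data:
        if value < 0 or (value >> frombits):
            return None                       # input value out of range
        acc = (acc << frombits) | value
        bits += frombits
        while bits >= tobits:                 # emit full output groups
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if pad:
        if bits:                              # pad the final partial group
            ret.append((acc << (tobits - bits)) & maxv)
    elif bits >= frombits or ((acc << (tobits - bits)) & maxv):
        return None                           # reject non-zero padding
    return ret

# A 20 byte (160 bit) hash becomes exactly 32 five-bit groups:
assert len(convertbits(bytes(20), 8, 5)) == 32
```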
# Checksum

We fix the idea that we are going to use base32 for all these reasons. What are we going to do about the checksum, character set and structure? First I should point out a few design decisions we fixed early on. All of this will be about the chance of misdetecting an invalid address as a valid address, because when that happens you will be sending money into a black hole. That's what we want to prevent. Six characters is the minimum we need to make the chance that a random string gets accepted as an address less than one in a billion. That's just a design choice. Six characters, that is what we are going with. This means that if this address type is going to be generic for every witness output, which can have up to 320 bits of data, we really need 71 characters including the checksum. So we are looking for error detecting codes that detect as many errors as possible, with the checksum length fixed at six and a total message length of 71. I am now going to look at a few of the design choices we went through, ending up with the one we picked, the BCH code, at the end.
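As a quick sanity check on those numbers:

```python
# 6 checksum characters x 5 bits each = 30 bits of checksum, so a
# uniformly random string passes with probability 2^-30:
print(2 ** -30)   # ~9.3e-10, just under the one in a billion target
```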
# Minimum distance of a code

First I need to clarify the concept of distance. Anyone who knows something about coding theory will find this trivial but I'll explain anyway. The distance of a code, or the minimum distance of a code, or the Hamming distance of a code, is how many characters within a valid address you need to at least change to turn it into another valid address. The minimum number of different characters between two different addresses. This diagram gives a nice demonstration. All the lines are single character changes and all the black dots are valid addresses. The minimum distance of the code shown here is 4. You need to cross at least 4 black lines between any two black dots. There is a very fundamental theorem that says if your distance is n you can detect up to n-1 errors. This is really obvious to see. If you start from any of these black dots and you make up to 3 errors, you follow up to 3 black lines, you never end up with another black dot. This shows you how a distance 4 code can detect 3 errors. There is also an equivalence. If you have a code that can detect n errors it can also correct n/2 errors. From any point in the diagram you go to the closest black dot. If you are on an intersection point that is at distance 2 from a number of black dots, there are multiple choices you can make. You cannot correct 2 errors but you can correct 1. If they were all 5 apart you could correct 2. What we will be looking for is codes that are 5 apart.

# CRC codes

The first thing to look at is [CRC codes](https://en.wikipedia.org/wiki/Cyclic_redundancy_check). They are the most traditional [checksum](https://en.wikipedia.org/wiki/Checksum) algorithm used in a wide variety of protocols. However, they are bit based. This makes sense because in most protocols what we care about are bit errors. However, here this is not the case. We don't care directly about bit errors, we care about symbol errors or character errors. Whenever someone makes a mistake they will make an entire character error, and every character is 5 bits. Here is an example where the B is turned into a J and the D is turned into a V. Even though only two characters are changed, you can see that it is actually 9 bits that are changed. CRC codes are designed to optimize for detecting a number of bit errors. Which means that if we want something that can detect 4 errors we really need something that can detect 20 bit errors, which is a lot. It turns out that finding something that can detect 20 bit errors is impossible. But we don't really care about that. We really care about these symbol errors, which result in bit errors that are somewhat structured. They always occur in groups of 5 and we care about the number of groups that are wrong, not just the bits. It is in fact possible to find a CRC that gives us distance 4, but we can do better.
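To make the bit counting concrete, here is a toy illustration. The symbol values are made up (the transcript doesn't give the slide's actual encodings), but they show how two character errors can amount to 9 bit errors:

```python
# Each base32 character carries 5 bits, so a single character error can
# flip up to 5 bits, and two character errors up to 10.
before = [0b00000, 0b00000]   # two 5 bit symbols
after  = [0b11111, 0b01111]   # both symbols changed
bits = sum(bin(a ^ b).count("1") for a, b in zip(before, after))
print(bits)                   # 9 bits changed by just 2 symbol errors
```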
# RS codes

Probably the best known type of checksum algorithm that allows error correction is [Reed-Solomon codes](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction). These work directly over symbols, which is great. Unfortunately they are limited in length to the size of the alphabet minus 1. Typically Reed-Solomon codes are done over 8 bits of data, which means that they can work over code words of length 255, which is `2^8 - 1`. But in our case our alphabet size is 32 using base32, which means we would be limited to doing error detection in strings of length 31. This is too short. We cannot fit enough data into that. So a possibility is to use an alphabet extension where you are really looking at two characters at once. You are really doing a size 1024, which is `2^10`, alphabet. You see 2 characters in your code as one symbol. This is possible. However it is still limited to distance 4. We are really trying to see if there is nothing better we can get.

# BCH codes

About a year ago I found out about [BCH codes](https://en.wikipedia.org/wiki/BCH_code), which are a generalization of Reed-Solomon codes that drop this restriction of being limited to the alphabet size minus 1. Most of the research on BCH codes is actually about bit based ones, but this is not necessary. A lot of the research is applicable to larger ones as well. We are going to create a BCH code over our size 32 alphabet. In fact it turns out from the theory, which you can read about in nice articles on Wikipedia, if you do the math you can construct a BCH code with distance 5. Yay! I'll soon show you a diagram for why this actually matters. It turns out there is still a huge design space, many parameters are free in this BCH class of codes. So we'll need a way to figure out which one to use.
# BCH code selection

Even if you fix the field size there are about 160,000 different ones. When confronted with this I thought how are we ever going to pick one? I started trying to do random sampling and see which ones are actually better even if you give them more errors than they are designed for. These are codes that are designed for having distance 4 or 5, meaning they will detect 3 or 4 errors. But what if you give them one more error? Probably, if these are all different codes, some of them are actually better if you go beyond their limit. I started on this project of trying to characterize all the different codes that are possible here.

How do you characterize such a code, given that enumerating all 71 character addresses is a ridiculously large job? There are about 2^355 of them, way, way, way beyond the number of atoms in the universe. However, BCH codes belong to a class called linear codes. Linear codes means if you have a valid code word, and you look at the values that every character represents and pairwise add every character to another valid code word, the sum will again be a valid code word. So if you want to see whether this code is below distance 5, you only need to check: does there exist any combination of 4 non-zero values over these 71 positions whose checksum is zero? Because if that is the case you can add it to any valid code word and your result will be another valid code word. Now we have established two valid code words with distance 4. It turns out that is still 12 trillion combinations, which is painfully large. Maybe not entirely impossible but more than we were able to do.
The next realization is that you can use a collision search. Instead of looking for 4 errors what you do is you look for only 2. You build a table with all the results of 2 errors, compute what their checksum would be to correct it, sort that table and look whether there are any 2 identical ones. Now you have 2 code words that need the same checksum to correct it. If you XOR them, if you add them together, now you have 2 x 2 = 4 changes and the checksum cancels out. Through this collision search you can now find it by only doing the square root of the amount of work, which makes it feasible.
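For the curious, here is a toy sketch of that meet-in-the-middle idea in Python. The `polymod_linear` helper is my own construction, not part of the reference code: it is the bech32 LFSR started from 0 instead of 1, which makes it XOR-linear, so the syndrome of a combined error pattern is the XOR of the individual syndromes. At this short toy length the final bech32 code should produce no collisions, consistent with its designed distance of 5:

```python
from collections import defaultdict

def polymod_linear(values):
    # The bech32 LFSR started from 0, which makes it linear over GF(2):
    # XORing two equal-length inputs XORs their outputs.
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 0
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ value
        for i in range(5):
            chk ^= generator[i] if ((top >> i) & 1) else 0
    return chk

LENGTH = 12   # a short toy length; the real search worked at length 71

def syndrome(pos, val):
    """Checksum effect of a single error of a given value at one position."""
    vec = [0] * LENGTH
    vec[pos] = val
    return polymod_linear(vec)

# Index every 2-error pattern by its syndrome (built from single errors
# using linearity), then look for two patterns that share one.
singles = [(i, v, syndrome(i, v)) for i in range(LENGTH) for v in range(1, 32)]
table = defaultdict(list)
for a, (i, vi, si) in enumerate(singles):
    for j, vj, sj in singles[a + 1:]:
        if i != j:
            table[si ^ sj].append((i, vi, j, vj))

print(sum(1 for pats in table.values() if len(pats) > 1))
# Expect 0: no two 2-error patterns collide, so no 4 errors can cancel
# out and the code has distance greater than 4 at this length.
```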
There are a bunch of other optimizations on top which I may talk about later if there is interest. We were able to do this search. We start from these 159,605 codes, and let's require that they actually have distance 5 at length 71. There are 28,825 left. Which one to pick from now? What you want to do is look at how they behave at 1 beyond the limit. All of these codes, these 28,825 codes, they all detect 4 errors at length 71. But what if you give them 5 errors? What if you give them 5 errors that only appear in a burst very close together, or 5 errors that are randomly distributed? They have different behaviors. It turns out, if we pick some reasonable assumptions about how we weigh the random case versus the worst case, there are 310 best ones that are identical.

# Effective error detecting power as a function of error rate (graph)

To show you how much this matters I want to show this graph that contains a lot of information. What this graph shows you is the detecting power of a particular error detection code as a function of the chance that every character individually is wrong. This makes the assumption that every character is independent from every other. For example at the 1 percent line you can see the red line, which is the error detection code we chose. You can see it is at about 2^(-38), which means there is a 1 in 250 billion chance that an error you make would not be detected. The blue line is what the 32 bit hash function in the current address format, base58, would do. You can see it doesn't matter what your error rate is. Any error has exactly the same chance of being detected. We can reasonably assume that the average number of errors that would be expected within an address is not large. We can assume that it is maybe 1 per address, hopefully less than 1, especially if we switch to case insensitive coding. It would become even less likely. The yellow line shows what we could have chosen if we didn't do this exhaustive analysis to find the best one. The yellow line is actually the worst possible code. You can see that it has this annoying bump where its detecting power even drops below the 2^(-30) level. We don't even guarantee this 1 in a billion chance for that code. This is just to show you how great optimizing for a low number of errors is, because under these assumptions it gives you a much better model. But it also shows you how much it matters to do this analysis. It clearly makes a difference.

# BCH code selection (continued)
From those 310 we pick the codes with the best bit error rate behavior. We still have these 310 identical codes. In terms of how many characters can be wrong they behave identically. No difference at all. We still need some criteria to pick one. We have all this analysis available anyway, so what to pick? It will soon become clear why it is useful to optimize for low bit error rates. What this means is: what if we are only looking at errors that change 1 bit in a character? How does the code behave then? For random errors they are identical anyway. Now we only have 2 left and we just pick one of them. This took many, many years of CPU time. I think we had something like 200 CPU cores available to do analysis on, so this only took a couple of weeks. But in total it was more than ten years of computation time. Until we discovered that these 310 identical ones are a pattern. All codes appear in groups of 310 identical ones. A couple of weeks ago, when we identified what these exact changes were, suddenly a lot more became feasible to analyze. We were no longer restricted to those 160,000 BCH codes. We could in fact look at all the 3.6 million 6 character cyclic codes, which is a much larger class of functions. It turns out that if you make the range of things you are looking for larger, you find better things. However we did not pick anything from this. The search was now feasible after dividing by 310, because we could test just one from each class: instead of a billion codes there were only 3.6 million left. It turns out some of them were slightly better than what we had already found, but we are not changing it, for the reason that there are efficient error location algorithms available for these BCH codes. These aren't available if you pick an arbitrary code. The difference isn't much so we are sticking to it.

# Character set

Now the question of the character set. There exist various character sets for base32 already. There is [RFC 3548](https://tools.ietf.org/html/rfc3548) which is a standard. There's the z-base-32 standard, and various other standards for base32 data that have been used in the past. We're still going to pick a different one. The reason for this is we were able to select our code to optimize for low bit error rates. Wouldn't it be great if we could choose the character set in such a way that the most likely errors are 1 bit errors? This character set is the result of another year of CPU time to optimize for this. We found a bunch of information in tables for similarity between various characters. What you can see on this slide: the z and the 2 are considered similar in some fonts or writing. As you can see they are 8 apart. One is 2 and the other is 10, so they are 1 bit apart. And r and t are 1 bit apart. And y and v are 1 bit apart. And x and k are 1 bit apart. And e and a are 1 bit apart. And s and 5 are 1 bit apart. And 4 and h are 1 bit apart. There are way more similar errors overlaid in this data that you can look for. It's pretty cool. We made a character set that optimizes for 1 bit errors. As a result our code has distance 6 for 1 bit errors. If you only make errors like this you can detect 5 of them.
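Those pairs can be checked directly against the character set that ended up in BIP 173:

```python
# The bech32 character set; position in the string is the 5 bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def bit_distance(a, b):
    """How many bits differ between the values of two bech32 characters."""
    return bin(CHARSET.index(a) ^ CHARSET.index(b)).count("1")

# The visually similar pairs from the slide are all a single bit apart:
for pair in ("z2", "rt", "yv", "xk", "ea", "s5", "4h"):
    print(pair, bit_distance(pair[0], pair[1]))   # every pair prints 1
```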
Q - Did you take into account QWERTY keyboard distance in that selection?

A - We did not take QWERTY keyboard distance into account.

Greg Maxwell: We considered it but the visual component is uniform. In formal testing it looked like the visual component was more commonly a source of errors. And the visual component is almost always in the signal path, whereas a QWERTY keyboard may not be.

# Effective error detecting power as a function of error rate

What this diagram shows you: the blue line is again what the 32 bit hash, the current address format, would do. The red line is for arbitrary errors using the checksum algorithm we've chosen. The purple line is the same thing for the checksum algorithm we chose, but restricted to 1 bit errors. So if you only make this class of errors that we consider more likely, you can see that it's even stronger. At 1 percent you can see you get another 5 bits of detection chance, making it 32 times less likely for something like that to not be detected. Something else you can see on this diagram is the line for 1 expected error in the whole address. You can see that there is a crossover point at 3.53 (expected errors per address).
What this means is that the checksum algorithm, despite being shorter, it is only a 30 bit checksum, is actually stronger than the 32 bit checksum that was being used in base58 for up to 3.53 expected errors in your whole address. For only likely errors it is even up to 4.85 per address. A great win.

# Structure

One last thing: how do we combine all of that into a real address, now that we have a character set and a checksum algorithm? How are we going to structure SegWit addresses? It consists of three major parts. The first is the human readable part, which for Bitcoin addresses in our proposal will be `bc`, standing for Bitcoin. For testnet it is `tb`, which is still only two characters but visually distinct from `bc`. Then there is the separator, which is always `1`. `1` is a character that does not appear in the character set. This means that the human readable part is always unambiguously separated from the data that follows, even if the human readable part itself contains a `1`. That makes it extra flexible. Then there is the data part, which uses the character set as I described before. For SegWit addresses, but this is generic, in there is the data: the witness version, the witness program and the checksum, which is the last 6 characters. The result of this for a pay-to-witness-pubkey-hash (P2WPKH) address would be 42 characters rather than 34, so it is a bit longer. Base32 is a bit less efficient, the checksum is longer and the prefix of two characters adds up. But I don't expect this to be a big problem. It is slightly larger but it is more compact in QR codes. It is much easier to read and write. This only matters for things that care about visual space. It is 62 for pay-to-witness-script-hash (P2WSH) because it uses 256 bit hashes rather than 160 for higher security. For future witness versions, which support up to 320 bit hashes, the length can go up to 74. Though I don't expect that, as 256 bit is a very reasonable security target.
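To make the structure concrete, here is the example P2WPKH address from the BIP 173 test vectors split into those parts:

```python
addr = "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"   # BIP 173 test vector
hrp, data = addr.rsplit("1", 1)   # split on the last '1' separator
print(hrp)          # 'bc'     - human readable part
print(data[0])      # 'q'      - witness version 0
print(data[1:-6])   # 32 chars - the 160 bit witness program
print(data[-6:])    # 'v8f3t4' - the 6 character checksum
print(len(addr))    # 42
```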
# Bech32

All of this together gives bech32, which is a generic data format for things like addresses. One instance of it is the use for SegWit addresses, but it could be used for various other things that have similar requirements. I don't know what. It seems strange that most of the research on checksumming human readable data uses 1 checksum character or 2 checksum characters. Think about bank account numbers or a few similar things. There seems to be very little research on how to make an actually strong checksum that is still designed for human consumption. I hope this can become perhaps a standard for how to do this. There is a link to the [BIP](https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki). In all of this I have not mentioned one of the most important design goals, code simplicity. I've been talking about these error detection codes, which are typically very complicated to deal with. However they are only complicated to deal with if you are actually interested in error correction. Error detection is trivial.

# Checksum algorithm code

https://github.com/sipa/bech32/blob/master/ref/python/segwit_addr.py#L27-L36

Ta da. This is the checksum algorithm. It uses no BigNum conversion, it has no SHA256 dependency. It is just these ten lines. Of course you need more for the character set conversion and converting to bytes for the witness program. But the mathematical part of the spec is just this.
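For reference, the function being pointed to, as it appears in the linked reference implementation:

```python
def bech32_polymod(values):
    """Internal function that computes the Bech32 checksum."""
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ value
        for i in range(5):
            chk ^= generator[i] if ((top >> i) & 1) else 0
    return chk
```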
# Demo website

http://bitcoin.sipa.be/bech32/demo/demo.html

We also made a small demo website. Because it is an actual error correction algorithm behind the scenes, even if we are not using it, you could optionally implement it to do error location. The spec allows this. It strongly advises against doing actual error correction, because if someone types an address wrong you want it to complain and tell the user to go look at what the address is, not try to correct it for them. They might end up with a valid address that is not the one they intended. Here is an example. If you change this `v` to an `x` it will point out that the `x` is likely what you did wrong. This can even be inside the checksum. The algorithm on the website can support up to 2 errors. We have an algorithm that supports up to 3, but at that point it is still quite frequently wrong. There are ways to deal with this, like showing multiple possibilities to the user. None of the contributors to this project are great UI people so we really didn't know how to do this. Here's a P2WSH [example](http://bitcoin.sipa.be/bech32/demo/demo.html) as well. This website is linked from the BIP. You can play with it if you are interested. That's it. Thank you very much for your attention.

# Q&A

Q - Do you have any idea how many Bitcoin are lost per year due to human readable character mistakes that this corrects for?

A - I have no idea. I expect it to be low but I also expect that we wouldn't know about it. It is hard to quantify and it is very nice to just categorically improve the state of the art there.

Q - In the error location code, can it suggest corrections there?

A - Yes. The location code actually does full error correction. It knows what the likely error is but it intentionally does not show you, because you should go back and look at what it should be rather than try things. Greg suggests I explain why this matters. A code with distance d can either detect d-1 errors or correct (d-1)/2. This means that our code, which is distance 5, can detect 4 errors but can only correct 2. This is because you can be in the middle of two valid code words and then you don't know which one to go to.
So if you make 4 errors, which will never result in a valid code word, you may however have gotten closer to another valid code word than the one you started off from. This means that your ability to do error detection is eroded by trying to do correction.

Greg Maxwell: The key point to make here is that if you make 4 errors it will almost certainly correct it to a valid address which is not the address you intended. This is why you don't want to correct the errors.

As Greg points out, if you make 4 errors and run the error correction algorithm, which can fix up to 2, it will correct to the wrong thing with a very high probability. For 4 it is almost certain, with 3 it is likely, with 2 never. This is for example different when you are dealing with private keys. Maybe that is something I should mention as future work. We are also looking at a similar standard like the one here for encoding private keys. But for private keys, telling the user "Sorry, your private key is wrong", that is not what you want. You really want to do the error correction there. There we are looking at something with a stronger checksum that has more than 6 extra characters, but actually can correct a reasonable number of errors and not just detect them.

Q - The private key thing made me wonder whether this could be used also for master keys at the root of BIP 32 or something similar to it?

A - The checksum algorithm we chose here ends up being really good up to length 89. It was designed to be good up to 71. It turns out it is really good up to 89, not more. This is approximately 450 bits of data. That is enough for a seed but not enough for a master key, which has 512 bits of entropy because it has a chaincode as well as a key. Future work is looking for a stronger checksum which can both correct a reasonable number of errors for short things like private keys, but also has good performance for longer things. Given that we are talking about longer strings there anyway, you don't care whether it adds 6 or 10 or 15 characters.

Q - Could bech32 implicitly support SegWit in the practice of false signaling where miners are signaling for something other than Core while running Core in the background?
A - That is completely orthogonal. All of what I have been talking about here is on the wallet implementation side. It has no relation at all to what is implemented in consensus rules, full nodes, peer-to-peer protocols, miners. None of this cares about it. This is just something that wallets need to implement if they want. So no.

Q - At the beginning of your talk you mentioned that there were some proposals that would completely abstract Bitcoin addresses away from users. Can you talk briefly about what those might be?

A - I wouldn't say promising, but the only proposal that had some traction in the past is [BIP 70](https://github.com/bitcoin/bips/blob/master/bip-0070.mediawiki). This is the payment protocol, you may have heard about it. Instead of having an address you have a file that is a payment descriptor. Your address becomes a URL to that file. In almost all cases where a Bitcoin transaction is taking place, the sender is already interacting with the recipient anyway, so why do we need an address? They are on their website. They can just give the information through that to the software. This doesn't solve any of the problems of what if your website is being intercepted? But neither do addresses. Complicating this, I think some mistakes were made in the specification that make it less useful than it could have been. There is not all that much software that implements it. It is hard to implement, it requires SSL certificates. Something in that direction I think would be great if it were done with what we have learned since then, but it may be very hard to get something adopted. I am not very hopeful unfortunately.

Q - With this implementation does RIPEMD160 come into it?
A - Yes it does, because pay-to-witness-pubkey-hash (P2WPKH) addresses still contain a RIPEMD160 hash of the public key being sent to. That is part of the SegWit proposal. That doesn't really have anything to do with this address format, which abstracts over any data. The data being encoded in the address for a P2WPKH is RIPEMD160. It is no longer RIPEMD160 for P2WSH, it is just SHA256 there.

Q - I am curious if you have done the optimization in terms of the classic spending error correction style. For example what if I miss one character and end up adding a second one at a later point?

Greg Maxwell: What you are describing, where someone drops a character and inserts a character, they end up with the correct length. What this results in is a burst of errors all confined within the short span between the drop and the insert. Although Pieter didn't talk about it, the codes that we use are also selected by virtue of their construction to have very good properties for small bursts as well. They do detect the insert and delete case better than you would expect by chance from it being a 30 bit check. Although we haven't implemented it, we could implement one that provides useful hints for the drop and insert case. There's a table in the BIP that gives a breakdown of all the errors occurring in a window, which is specifically interesting for this case because shorter windows are more common for burst errors, just from a counting argument.

Q - With traditional Bitcoin addresses you have a 1 and a 3 that tells the wallet what type of scriptPubKey. In this case it is the length that determines…?

A - No it isn't. The q you see in red is the witness version. All of it is SegWit outputs. But SegWit outputs have a version number and a hash, in combination with the length. Both P2WPKH and P2WSH use version 0, but one uses 20 bytes of hash and the other uses 32 bytes.

Greg Maxwell: Commenting more on the length. Because future SegWit versions may use different lengths, the specification allows the lengths to be many possible lengths. There is some validation code on what lengths are allowed. Not every possible length is an allowed length, as a result of the 8 bit to 5 bit conversion that occurs between the two sizes.
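A minimal sketch of those length checks, following the rules BIP 173 gives for decoders (the helper name here is mine):

```python
def valid_witness_program(version, program):
    """Length sanity checks a decoder applies to a decoded witness program."""
    if not 2 <= len(program) <= 40:
        return False                   # allowed range for any witness version
    if version == 0 and len(program) not in (20, 32):
        return False                   # v0 must be P2WPKH (20) or P2WSH (32)
    return True
```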
+Q - With traditional Bitcoin addresses you have a 1 and a 3 that tells the wallet what type of scriptPubKey it is. In this case is it the length that determines…?

-Q: With traditional bitcoin addresses, it's a 1 and 3, and that tells the scriptpubkey.. in this case it's the length?

+A - No it isn’t. The q you see in red is the witness version. These are all SegWit outputs, but SegWit outputs have a version number and a hash, in combination with the length. Both P2WPKH and P2WSH use version 0, but one uses 20 bytes of hash and the other uses 32 bytes.

-A: No, it isn't. The q you see in red is the witness version. These are all segwit outputs. These are version number and a hash in combination with the length. So both P2WPKH and P2WSH. They are both version 0, one uses 20 bytes the other uses 32 bytes.

+Greg Maxwell: Commenting more on the length. Because future SegWit versions may use different lengths, the specification allows many possible lengths. There is some validation code on what lengths are allowed. Not every possible length is an allowed length, as a result of the 8 bit to 5 bit conversion that occurs between the two sizes.

-Commenting more on the length. Because future segwit versions may use different lengths, the specification allows the lengths to be many possible lengths, and there's some validation on what to allow it, not every possible length is allowed, from virtue of the 8 bit to 5 bit conversion.
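A sketch of that validation, modeled on the reference implementation's `convertbits` routine; the `acceptable_witness_program` wrapper is illustrative, with the size rules taken from the BIP:

```python
def convertbits(data, frombits, tobits, pad=True):
    # Regroup a bit stream, e.g. the 5-bit groups of an address into 8-bit bytes.
    acc, bits, ret = 0, 0, []
    maxv = (1 << tobits) - 1
    for value in data:
        acc = (acc << frombits) | value
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if pad:
        if bits:
            ret.append((acc << (tobits - bits)) & maxv)
    elif bits >= frombits or ((acc << (tobits - bits)) & maxv):
        return None  # leftover bits must be zero padding, or the string is invalid
    return ret

def acceptable_witness_program(version, data5):
    # data5: the 5-bit groups following the witness version character.
    program = convertbits(data5, 5, 8, pad=False)
    if program is None or not 2 <= len(program) <= 40:
        return False  # BIP173 allows witness programs of 2 to 40 bytes
    if version == 0 and len(program) not in (20, 32):
        return False  # version 0 only defines 20 (P2WPKH) and 32 (P2WSH) bytes
    return True
```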
+It is also important that this address type intentionally makes an abstraction. While the former scheme used the 1 or the 3 to select P2PKH or P2SH, someone implementing this address type does not really need to know about the existence of a witness pubkey hash or a witness script hash, just the version number and data. The sender shouldn’t even care about what is in there; it is the receiver’s business. There are some sanity checks you can do if you know more.

-It's important that this address type makes an intentional distinction... someone implementing the address type does not need to know about the existence of the scripthash, just the version number and the data. The sender shouldn't even care about what's in there, that's the receiver's business. There are some sanity checks you could do.

+Greg Maxwell: We generally consider it a flaw that the sender knows anything about what kind of scheme the receiver is using. We try to abstract that away.

-We generally consider it a flaw that the sender konws anything about the scheme that the receiver is using, so we try to abstract that away.

+Q - How does the probability change for address collisions from the old addresses to the new?

-Q: I think I am missing something basic. How does the probability change for address collisions?

+A - What do you mean by address collision exactly?

-A: What do you mean by address collision?

+Q - The chances of you generating the same address as someone else on the old system are absolutely minuscule. Does this increase the probability that two people generate the same address?

-Q: Chances of generating two addresses. Does this increase the probability that...

+A - There is still 160 bits or 256 bits of data inside the address and that’s the only thing that matters. All of this is about the checksum, not about the data in the addresses, which is still produced by traditional hash functions. Nothing changes there if you are using a 160 bit one, and the probability even goes down dramatically if you are using 256 bits, because now you have something proportional to 2^(-128) rather than 2^(-80) for a collision.

-A: There is still 160 bits of data in the address, that's all that matters. This is only about the checksum. The data inside the address is still a hash. If you are using 256 bits, you have something like 1 to 1 to the minus 128 for a collision, which is a big improvement.

+Q - Do you see any applications outside of Bitcoin for this? Maybe it will help with marketing.

-Q: Applications outside of bitcoin to this? Maybe helping with marketing a little bit?

+A - Anything where humans are dealing with small strings of cryptographic material. An example I gave already that uses base32 is onion addresses in Tor. I don’t know if they are interested, but something similar could be applicable there. They have different requirements for error detection; I guess it isn’t all that bad if you accidentally go to the wrong onion server. Maybe it is also important to point out that this does not really prevent intentionally similar looking addresses. We have this guarantee that any two valid addresses differ in at least 5 characters, which makes it somewhat hard for an attacker to generate two similar looking ones, but making more characters than that look similar is already computationally very hard due to the hash function. Sorry, that was not really an answer to your question.

-A: Anything where humans are dealing with small strings of cryptographic material. The example I gave already that uses base32 is onion addresses in tor, for example. I don't know if they are interested in it, but something similar to bech32 could be applicable there, but they have different requirements for error detection, it isn't that bad if you accidentally go to the wrong tor onion server. This does not really prevent intentionally similar looking addresses. We have this guarantee that any two valid address are different by at least 5 characters, which makes it hard for an attacker to make the characters look similar, well the hash function output in there makes it really hard--- sorry, not really an answer to your question I guess.

+Q - Are most of the design decisions, around things like the character map and the code generator, documented along with the BIP?

-Q: Are most of the design decisions around character map and the sort of code generator, documented in the bip?

+A - Yes, briefly. There is a rationale section that explains why many of the decisions were made.

-A: Yeah. There's a rationale section that explains why many of the decisions were made.

+Q - Is there any way to use this scheme for traditional pay-to-pubkey-hash or pay-to-script-hash?

-Q: Is there any way to use this scheme for traditional pay to pubkey or pay to scripthash?

+A - Intentionally not, as I think it would be a very confusing thing if there were multiple encodings, multiple addresses, that refer to the same output. You might get someone using one form of address because that is what their wallet does, but then they go to a block explorer website which shows them another. They can’t tell that the two refer to the same thing, and the two forms aren't compatible across software. I think we should think of an address as the thing you are sending to. It happens to map to a particular scriptPubKey, but seeing it as just an encoding of some data could be confusing.

-A: Intentionally not, because I think it would be confusing to have multiple encodings that refer to the same outputs. You could have someone using one form of address but then they go to a block explorer that shows them another one, so since they don't know which one was intended you'll end up with compatibility errors. I think we should think of addresses as the thing you are sending to. It happens to map to a particular scriptpubkey but we should just see it as an encoding as some data could be confusing.

+Q - I know it is super early, but have you talked to any wallet implementers about this yet?

-Q: I know it's super eary, but have you talked with wallet implementors about this?

+A - Yes. While designing this we talked to various people before publishing it, in particular GreenAddress, Armory and Electrum, and there have been comments by various others. Many of them gave suggestions.

-A: Yes. While designing this, we tlaked with various people before publishing it. Like GreenAddress, Armory, Electrum, different comments from various wallet providers. Many of them gave suggestions.

+Q - I’m sure they appreciate the simplicity.

-Q: I am sure they appreciate the simplicity.

+A - I hope so. I had comments like “You are using an error detection algorithm, we would need to implement new cryptography”, and then I showed them it is only these ten lines of code. In practice you need to implement more than that of course. The whole reference implementation for encoding and decoding in Python is 120 lines.

-A: I hope so. I had comments like "oh no a new error correction algorithm" and then I showed them hey it's only ten lines of code. They liked that. In practice, you need to implement more than that, but the whole reference implementation for encoding and decoding turns out to be about 120 lines of code.
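For completeness, a sketch of the encoding side, again following the reference implementation; it reuses the `bech32_polymod` and `bech32_hrp_expand` helpers from the verification sketch above:

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def bech32_create_checksum(hrp, data):
    # Append six 5-bit groups chosen so the whole string polymods to 1.
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]

def bech32_encode(hrp, data):
    # data: witness version followed by the 5-bit-converted program,
    # e.g. [0] + convertbits(program, 8, 5) for a version 0 address.
    combined = data + bech32_create_checksum(hrp, data)
    return hrp + '1' + ''.join(CHARSET[d] for d in combined)
```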
+Q - What is your opinion on everyone not caring about Bitcoin anymore and flipping to Ethereum?

-Q: Thank you for presenting. There... what about flipping to ethereum?

+A - People do what they want.

-A: People do what they want.