Received: from sog-mx-1.v43.ch3.sourceforge.com ([172.29.43.191] helo=mx.sourceforge.net) by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76) (envelope-from ) id 1WGgB6-0006J0-4m for bitcoin-development@lists.sourceforge.net; Fri, 21 Feb 2014 02:49:28 +0000 Received: from mail-pb0-f50.google.com ([209.85.160.50]) by sog-mx-1.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128) (Exim 4.76) id 1WGgB4-000122-6Q for bitcoin-development@lists.sourceforge.net; Fri, 21 Feb 2014 02:49:28 +0000 Received: by mail-pb0-f50.google.com with SMTP id rq2so2808869pbb.9 for ; Thu, 20 Feb 2014 18:49:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:organization:user-agent :mime-version:to:subject:content-type:content-transfer-encoding; bh=E9W0O79z+g80fOwWxsOTWMHO0rmYm26ZNBOxUnL1Fqk=; b=KFnEJtUmWMLOJXDirA/NHSaR2otBbO6wqMqkKGkg96QcWCqp+XY9/zDV/5Hgr3DIQt s7yzDQgziLndIdlY7tlCd5p1jmUoo97hVnP+eByv9yl+WClj3MT89ewWEfqUzTqAGglE QdGIUzdna+7cMrY5HWxC9ELjTDM8SeFttYQ0NuXLa/2NQok/n9NZzs8dXG1yA+xDf3fb Wkm83DRNpeo+XZ/Sr3pyqQkAf5EfFVKf56vevVM5s8DHu8d7ua+94FVVGZObgeomgMbE j1y2jppBB0qULbBlP8JL/AKxYjukdTJDNrC5d8AvkAsGqVx67wOy129DzkZMlpH70sRY x55Q== X-Gm-Message-State: ALoCoQl0QsHTjRrZF8i6aMldfpSXxaJzynHST1/NfIzu97RHgg9834idf0CdkdLP7FYppwltIlLL X-Received: by 10.68.241.198 with SMTP id wk6mr6123523pbc.11.1392950469018; Thu, 20 Feb 2014 18:41:09 -0800 (PST) Received: from [192.168.127.217] (50-0-36-232.dsl.dynamic.sonic.net. [50.0.36.232]) by mx.google.com with ESMTPSA id kc9sm15952216pbc.25.2014.02.20.18.41.06 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 20 Feb 2014 18:41:08 -0800 (PST) Message-ID: <5306BCC1.8040004@monetize.io> Date: Thu, 20 Feb 2014 18:41:05 -0800 From: Mark Friedenbach Organization: Monetize.io Inc. User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Bitcoin Dev X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Spam-Report: Spam Filtering performed by mx.sourceforge.net. See http://spamassassin.org/tag/ for more details. X-Headers-End: 1WGgB4-000122-6Q Subject: [Bitcoin-development] Base-32 error correction coding X-BeenThere: bitcoin-development@lists.sourceforge.net X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Feb 2014 02:49:28 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 What follows is a proposed BIP for human-friendly base-32 serialization with error correction encoding. A formatted version is viewable as part of a gist with related code: https://gist.github.com/maaku/8996338#file-bip-ecc32-mediawiki An implementation of this BIP and associated APIs is made available as a pull request, with comprehensive testing: https://github.com/bitcoin/bitcoin/pull/3713 This format is anticipated to be useful for helpdesk-related data (e.g. the proposed normalized transaction ID), and future wallet backup & paper wallet serialization formats. == Abstract == The BIP proposes an human-centered encoding format for base-32 data serialization. It differs from the presumptive default hexadecimal or base58 encodings in the following ways: 1. Visually distinctive in that it includes the full range of alphanumeric digits in its base-32 encoding, except the characters 0, l, v, and 2. which are too easily confused with 1, i, u, r, or z in font or handwriting. 2. Automatic correction of up to 1 transcription error per 31 coded digits (130 bits of payload data). For a 256-bit hash or secret key, this enables seamless recovery from up to two transcription errors so long as they occur in separate halves of the coded representation. 3. Highly probable detection of errors beyond the error correction threshold, with a false negative rate on the order of 25 bits, or 1 in 33 million likelihood. 4. Case-insensitive encoding ensures that it may be displayed in an easier to read uniform case, and it is faster and more comfortable to vocally read off a base-32 encoded number than the alternatives of hexadecimal or base58. In addition to the error correction code transformation of base-32 data, a padding scheme is specified for extending numbers or bit vectors of any length to a multiple of 5 bits suitable for base-32 encoding. == z-base-32 == The bitcoin reference client already has one implementation of base-32 encoding following the RFC 3548 standard, using the following alphabet: const char *pbase32 = "abcdefghijklmnopqrstuvwxyz234567"; For error correction coded strings this BIP specifies usage of Phil Zimmermann's z-base-32 encoding alphabet[], which provides better resistance to transcriptive errors than the RFC 3548 standard: const char *pzbase32 = "ybndrfg8ejkmcpqxot1uwisza345h769"; The same RFC 3548 coder is used for z-base-32, except that unnecessary '=' padding characters are stripped before performing the alphabet substitution. For example, the hexadecimal string 'ae653be0049be3' is RFC 3548 encoded as 'vzstxyaetprq====', and z-base-32 encoded as 'i31uzayruxto'. == CRC-5-USB error correction coding == Herein we describe an error correction encoding using cyclic redundancy check polynomial division[], which requires 5 error correction digits per 26 digits of input, instead of the theoretically optimal 4, but is much, much easier to implement correctly then available non-patented error correction codes. Cyclic redundancy check polynomial division provides a very straightforward, patent-free mechanism for reliably detecting transcription errors in input, and performing up to 1-digit corrections per 26 digit block. === Encoding === The input to this error correction encoder is a sequence of 26 base-32 digits. These digits are decoded into 5-bit unsigned integers with values equal to their offset into the base-32 alphabet string. If the input is less than 26 digits in length, it is extended with zero-valued digits. If For example, the string 'vzstxyaetprq' using the RFC 3548 alphabet becomes the code point sequence: <21 25 18 19 23 24 0 4 19 15 17 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0>' |--------------input-------------|---------padding----------| Expositionally it helps to think of this array as a 26-element column vector of 5-bit binary integers: <0b10101 0b11001 0b10010 0b10011 0b10111 0b11000 0b00000 0b00100 0b10011 0b01111 0b10001 0b10000 ... 14 zero elements ...> If we explode the bits of each element into 5, 1-bit columns, we get a 26 x 5 matrix: <1 0 1 0 1 1 1 0 0 1 1 0 0 1 0 1 0 0 1 1 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 0 0 ... 14 x 5 zero elements ...> The array is then transposed, such that we get a 5 x 26 matrix where each row represents the 5th, 4th, 3rd, 2nd, or 1st bit, respectfully, of each element: <1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0> |---------input--------|----------padding---------| We then use each of these rows separately as input into a cyclic redundancy check polynomial division operation, using the CRC-5-USB generating polynomial x^5 + x^2 + 1. The result is 5 element column vector: <10111 01000 10010 00111 00110> The elements of this vector are then exploded horizontally, and affixed to the end of the bit matrix. <1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0> |---------input--------|----------padding----------|--crc---| At this point, calculating the CRC checksum for each row should result in zero: <0 0 0 0 0> Now we reverse the process in order to encode the output. First the padding bits are removed: <1 1 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 0 0 0 1 1 0> |---------input--------|--crc---| Then the matrix is once again transposed, to yield an (l+5) x 5 matrix, where l is the number of digits in the original input: <1 0 1 0 1 - 1 1 0 0 1 | 1 0 0 1 0 | 1 0 0 1 1 | 1 0 1 1 1 i 1 1 0 0 0 n 0 0 0 0 0 p 0 0 1 0 0 u 1 0 0 1 1 t 0 1 1 1 1 | 1 0 0 0 1 | 1 0 0 0 0 - 1 0 1 0 0 - 0 1 0 0 0 c 1 0 0 1 1 r 1 0 1 1 1 c 1 0 0 1 0> - And the rows are imploded: <21 25 18 19 23 24 0 4 19 15 17 16 20 8 19 23 18>' |--------------input-------------|----crc-----| And the result is converted into z-base-32: 'i31uzayruxtoweuz1'. === Decoding and error recovery === The process of decoding and error detection and recovery is similar to encoding, and this section will not explain steps that were adequately covered in the encoder description. First, the input is converted from z-base-32 into a sequence of up to 31 (26+5) 5-bit integers, with zero-valued padding inserted between the end of the input and the 5-digit checksum. Using our running example of 'i31uzayruxtoweuz1', this result is the following: <21 25 18 19 23 24 0 4 19 15 17 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 8 19 23 18>' |--------------input-------------|----------padding----------|----crc-----| The binary representation of each element is exploded horizontally, and the matrix transposed to yield the following 5 x 31 bit matrix: <1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0> |---------input--------|----------padding----------|--crc---| A CRC calculation of each row should yield zero if the encoded string was transcribed correctly. If, however, a digit was wrong, that would result in some or all of the bits of the corresponding column being flipped. A property of the CRC generating polynomial used is that in the case of a single digit error, each row with a flipped bit would result in the same checksum, and that checksum value can be used as the index into a table indicating which bit was flipped: // EC[0] has no meaning, because offsets are 1-based const unsigned char EC[32] = { 32, 4, 3, 17, 2, 30, 16, 24, 1, 6, 29, 8, 15, 27, 23, 12, 0, 25, 5, 18, 28, 13, 7, 9, 14, 10, 26, 19, 22, 21, 11, 20 }; If all checksum values are zero, the decoder assumes the input is correct. If all non-zero checksum values are equal valued, the decoder assumes the corresponding digit was transcribed incorrectly and flips the bit in that column of the rows with non-zero checksums. If any of the five checksum values are non-zero, and not equal to each other, then there is an unrecoverable error in the input. The probability of two or more digits being incorrect and yet by chance the checksums being zero or equal valued is less than 1 in 33 million. Now that errors have been detected and single-digit errors corrected, the padding bits and CRC checksum bits are removed. The matrix is transposed, it's rows imploded, and the resulting sequence of up to 26 characters converted into base-32 using the RFC 3548 alphabet: 'vzstxyaetprq'. == CodedBase32 integer encoding == Although providing an error correction coder for base-32 data interesting and useful in contexts where base-32 is already deployed, many applications involve encoding of integers which are powers of two in length. This section provides a standard scheme for the encoding of any sized bitstring into a multiple of 5 bits in length, suitable for direct encoding into base-32. First the size in bits of the integer is rounded up to the next highest power of two greater than or equal to 128. This value with a factor of 128 removed is known as n, and its base-2 logorithm as e. Pseudocode: int n = max(next_power_of_two(BITS), 128) / 128; int e = log2(n); A total of 2*n padding bits are prefixed to the bitstring. These consist of e 1 bits, a single 0 bit, and 2*n-e-1 user-specified bits (the "extra" field). The bitstring is now a multiple of 5 in length and can be directly converted into base-32. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJTBry9AAoJEAdzVfsmodw45UwP/1kjvfFMe0I+F+0Ma+sl8o3a U3/FlyOapIL8wZPyjXItts7/5V8kPOSPiEjnG/Oes7eBmiod2UYy+73qIHaHI6Ug 88OMjX8hSHYz+EfxrUkb8Uiios4FNS6vo/SVjELR8vLiQoJ30T4l56QMJvMle1wj +GDdHjNfL0F9NzqA7WY1rRRXllBCmDfLUeYS3raOx9tmGgfj/4h451RLwXouIwT6 oBMBIJzkGLHi//0SF+xlNJb/zebD421qXLgg28ci0fz+bxMEtPM7HktCpHS+pzUu V211rA3eP6/gb617SHD7XcxROubpsZ0y1ieaCzjfrQ0NUDqEkYyydDh4rf2AF99R ODwy8bDNikhN62480Gd2PuqKftf66tUdycXMcnC9Cwe3Ejli+RKBGnTBh5ekPO3+ Pmu/vgILuL8WojOKcAnMSk0tvA+w0kBf/b7mUzLsBpJfOMxtk3KgKrWWbuxotz0C VKGVICpvtrA87jpOqB36Hn1tFYRknCX9PCzPEENpJg/aK26xNTid4jtbg9MhopdS GuH4SBNYvvgq7VkCbq5zggh37npgHy/mmBhAmDw7ecBPw/O9jtGEUSFbSTMoaL5s hK5WTlsSNvvuAaFlv0qvreI1gQiJBhR9+JuZfFS1fzBXWcmDf7n8kqkbLOQr6nVJ O6AhIj7iHRCNfSTuhElY =ZHxf -----END PGP SIGNATURE-----