1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
|
Received: from sog-mx-1.v43.ch3.sourceforge.com ([172.29.43.191]
helo=mx.sourceforge.net)
by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
(envelope-from <mark@monetize.io>) id 1WGgB6-0006J0-4m
for bitcoin-development@lists.sourceforge.net;
Fri, 21 Feb 2014 02:49:28 +0000
Received: from mail-pb0-f50.google.com ([209.85.160.50])
by sog-mx-1.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128)
(Exim 4.76) id 1WGgB4-000122-6Q
for bitcoin-development@lists.sourceforge.net;
Fri, 21 Feb 2014 02:49:28 +0000
Received: by mail-pb0-f50.google.com with SMTP id rq2so2808869pbb.9
for <bitcoin-development@lists.sourceforge.net>;
Thu, 20 Feb 2014 18:49:20 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20130820;
h=x-gm-message-state:message-id:date:from:organization:user-agent
:mime-version:to:subject:content-type:content-transfer-encoding;
bh=E9W0O79z+g80fOwWxsOTWMHO0rmYm26ZNBOxUnL1Fqk=;
b=KFnEJtUmWMLOJXDirA/NHSaR2otBbO6wqMqkKGkg96QcWCqp+XY9/zDV/5Hgr3DIQt
s7yzDQgziLndIdlY7tlCd5p1jmUoo97hVnP+eByv9yl+WClj3MT89ewWEfqUzTqAGglE
QdGIUzdna+7cMrY5HWxC9ELjTDM8SeFttYQ0NuXLa/2NQok/n9NZzs8dXG1yA+xDf3fb
Wkm83DRNpeo+XZ/Sr3pyqQkAf5EfFVKf56vevVM5s8DHu8d7ua+94FVVGZObgeomgMbE
j1y2jppBB0qULbBlP8JL/AKxYjukdTJDNrC5d8AvkAsGqVx67wOy129DzkZMlpH70sRY
x55Q==
X-Gm-Message-State: ALoCoQl0QsHTjRrZF8i6aMldfpSXxaJzynHST1/NfIzu97RHgg9834idf0CdkdLP7FYppwltIlLL
X-Received: by 10.68.241.198 with SMTP id wk6mr6123523pbc.11.1392950469018;
Thu, 20 Feb 2014 18:41:09 -0800 (PST)
Received: from [192.168.127.217] (50-0-36-232.dsl.dynamic.sonic.net.
[50.0.36.232]) by mx.google.com with ESMTPSA id
kc9sm15952216pbc.25.2014.02.20.18.41.06
for <bitcoin-development@lists.sourceforge.net>
(version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
Thu, 20 Feb 2014 18:41:08 -0800 (PST)
Message-ID: <5306BCC1.8040004@monetize.io>
Date: Thu, 20 Feb 2014 18:41:05 -0800
From: Mark Friedenbach <mark@monetize.io>
Organization: Monetize.io Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Bitcoin Dev <bitcoin-development@lists.sourceforge.net>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
See http://spamassassin.org/tag/ for more details.
X-Headers-End: 1WGgB4-000122-6Q
Subject: [Bitcoin-development] Base-32 error correction coding
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Fri, 21 Feb 2014 02:49:28 -0000
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
What follows is a proposed BIP for human-friendly base-32
serialization with error correction encoding. A formatted version is
viewable as part of a gist with related code:
https://gist.github.com/maaku/8996338#file-bip-ecc32-mediawiki
An implementation of this BIP and associated APIs is made available as
a pull request, with comprehensive testing:
https://github.com/bitcoin/bitcoin/pull/3713
This format is anticipated to be useful for helpdesk-related data
(e.g. the proposed normalized transaction ID), and future wallet
backup & paper wallet serialization formats.
== Abstract ==
The BIP proposes an human-centered encoding format for base-32 data
serialization. It differs from the presumptive default hexadecimal or
base58 encodings in the following ways:
1. Visually distinctive in that it includes the full range of
alphanumeric digits in its base-32 encoding, except the characters 0,
l, v, and 2. which are too easily confused with 1, i, u, r, or z in
font or handwriting.
2. Automatic correction of up to 1 transcription error per 31 coded
digits (130 bits of payload data). For a 256-bit hash or secret key,
this enables seamless recovery from up to two transcription errors so
long as they occur in separate halves of the coded representation.
3. Highly probable detection of errors beyond the error correction
threshold, with a false negative rate on the order of 25 bits, or 1 in
33 million likelihood.
4. Case-insensitive encoding ensures that it may be displayed in an
easier to read uniform case, and it is faster and more comfortable to
vocally read off a base-32 encoded number than the alternatives of
hexadecimal or base58.
In addition to the error correction code transformation of base-32
data, a padding scheme is specified for extending numbers or bit
vectors of any length to a multiple of 5 bits suitable for base-32
encoding.
== z-base-32 ==
The bitcoin reference client already has one implementation of base-32
encoding following the RFC 3548 standard, using the following alphabet:
const char *pbase32 = "abcdefghijklmnopqrstuvwxyz234567";
For error correction coded strings this BIP specifies usage of Phil
Zimmermann's z-base-32 encoding alphabet[], which provides better
resistance to transcriptive errors than the RFC 3548 standard:
const char *pzbase32 = "ybndrfg8ejkmcpqxot1uwisza345h769";
The same RFC 3548 coder is used for z-base-32, except that unnecessary
'=' padding characters are stripped before performing the alphabet
substitution. For example, the hexadecimal string 'ae653be0049be3' is
RFC 3548 encoded as 'vzstxyaetprq====', and z-base-32 encoded as
'i31uzayruxto'.
== CRC-5-USB error correction coding ==
Herein we describe an error correction encoding using cyclic
redundancy check polynomial division[], which requires 5 error
correction digits per 26 digits of input, instead of the theoretically
optimal 4, but is much, much easier to implement correctly then
available non-patented error correction codes. Cyclic redundancy check
polynomial division provides a very straightforward, patent-free
mechanism for reliably detecting transcription errors in input, and
performing up to 1-digit corrections per 26 digit block.
=== Encoding ===
The input to this error correction encoder is a sequence of 26 base-32
digits. These digits are decoded into 5-bit unsigned integers with
values equal to their offset into the base-32 alphabet string. If the
input is less than 26 digits in length, it is extended with
zero-valued digits. If For example, the string 'vzstxyaetprq' using
the RFC 3548 alphabet becomes the code point sequence:
<21 25 18 19 23 24 0 4 19 15 17 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0>'
|--------------input-------------|---------padding----------|
Expositionally it helps to think of this array as a 26-element column
vector of 5-bit binary integers:
<0b10101
0b11001
0b10010
0b10011
0b10111
0b11000
0b00000
0b00100
0b10011
0b01111
0b10001
0b10000
... 14 zero elements ...>
If we explode the bits of each element into 5, 1-bit columns, we get a
26 x 5 matrix:
<1 0 1 0 1
1 1 0 0 1
1 0 0 1 0
1 0 0 1 1
1 0 1 1 1
1 1 0 0 0
0 0 0 0 0
0 0 1 0 0
1 0 0 1 1
0 1 1 1 1
1 0 0 0 1
1 0 0 0 0
... 14 x 5 zero elements ...>
The array is then transposed, such that we get a 5 x 26 matrix where
each row represents the 5th, 4th, 3rd, 2nd, or 1st bit, respectfully,
of each element:
<1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0>
|---------input--------|----------padding---------|
We then use each of these rows separately as input into a cyclic
redundancy check polynomial division operation, using the CRC-5-USB
generating polynomial <code>x^5 + x^2 + 1</code>. The result is 5
element column vector:
<10111
01000
10010
00111
00110>
The elements of this vector are then exploded horizontally, and
affixed to the end of the bit matrix.
<1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0>
|---------input--------|----------padding----------|--crc---|
At this point, calculating the CRC checksum for each row should result
in zero:
<0 0 0 0 0>
Now we reverse the process in order to encode the output. First the
padding bits are removed:
<1 1 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1
0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0
0 0 1 1 1 0 0 0 1 1 0 0 0 0 1 1 1
1 1 0 1 1 0 0 0 1 1 1 0 0 0 1 1 0>
|---------input--------|--crc---|
Then the matrix is once again transposed, to yield an (l+5) x 5
matrix, where l is the number of digits in the original input:
<1 0 1 0 1 -
1 1 0 0 1 |
1 0 0 1 0 |
1 0 0 1 1 |
1 0 1 1 1 i
1 1 0 0 0 n
0 0 0 0 0 p
0 0 1 0 0 u
1 0 0 1 1 t
0 1 1 1 1 |
1 0 0 0 1 |
1 0 0 0 0 -
1 0 1 0 0 -
0 1 0 0 0 c
1 0 0 1 1 r
1 0 1 1 1 c
1 0 0 1 0> -
And the rows are imploded:
<21 25 18 19 23 24 0 4 19 15 17 16 20 8 19 23 18>'
|--------------input-------------|----crc-----|
And the result is converted into z-base-32: 'i31uzayruxtoweuz1'.
=== Decoding and error recovery ===
The process of decoding and error detection and recovery is similar to
encoding, and this section will not explain steps that were adequately
covered in the encoder description.
First, the input is converted from z-base-32 into a sequence of up to
31 (26+5) 5-bit integers, with zero-valued padding inserted between
the end of the input and the 5-digit checksum. Using our running
example of 'i31uzayruxtoweuz1', this result is the following:
<21 25 18 19 23 24 0 4 19 15 17 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20
8 19 23 18>'
|--------------input-------------|----------padding----------|----crc-----|
The binary representation of each element is exploded horizontally,
and the matrix transposed to yield the following 5 x 31 bit matrix:
<1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0>
|---------input--------|----------padding----------|--crc---|
A CRC calculation of each row should yield zero if the encoded string
was transcribed correctly. If, however, a digit was wrong, that would
result in some or all of the bits of the corresponding column being
flipped. A property of the CRC generating polynomial used is that in
the case of a single digit error, each row with a flipped bit would
result in the same checksum, and that checksum value can be used as
the index into a table indicating which bit was flipped:
// EC[0] has no meaning, because offsets are 1-based
const unsigned char EC[32] =
{ 32, 4, 3, 17, 2, 30, 16, 24,
1, 6, 29, 8, 15, 27, 23, 12,
0, 25, 5, 18, 28, 13, 7, 9,
14, 10, 26, 19, 22, 21, 11, 20 };
If all checksum values are zero, the decoder assumes the input is
correct. If all non-zero checksum values are equal valued, the decoder
assumes the corresponding digit was transcribed incorrectly and flips
the bit in that column of the rows with non-zero checksums.
If any of the five checksum values are non-zero, and not equal to each
other, then there is an unrecoverable error in the input. The
probability of two or more digits being incorrect and yet by chance
the checksums being zero or equal valued is less than 1 in 33 million.
Now that errors have been detected and single-digit errors corrected,
the padding bits and CRC checksum bits are removed. The matrix is
transposed, it's rows imploded, and the resulting sequence of up to 26
characters converted into base-32 using the RFC 3548 alphabet:
'vzstxyaetprq'.
== CodedBase32 integer encoding ==
Although providing an error correction coder for base-32 data
interesting and useful in contexts where base-32 is already deployed,
many applications involve encoding of integers which are powers of two
in length. This section provides a standard scheme for the encoding of
any sized bitstring into a multiple of 5 bits in length, suitable for
direct encoding into base-32.
First the size in bits of the integer is rounded up to the next
highest power of two greater than or equal to 128. This value with a
factor of 128 removed is known as <code>n</code>, and its base-2
logorithm as <code>e</code>. Pseudocode:
int n = max(next_power_of_two(BITS), 128) / 128;
int e = log2(n);
A total of <code>2*n</code> padding bits are prefixed to the
bitstring. These consist of <code>e</code> 1 bits, a single 0 bit, and
<code>2*n-e-1</code> user-specified bits (the "extra" field).
The bitstring is now a multiple of 5 in length and can be directly
converted into base-32.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBAgAGBQJTBry9AAoJEAdzVfsmodw45UwP/1kjvfFMe0I+F+0Ma+sl8o3a
U3/FlyOapIL8wZPyjXItts7/5V8kPOSPiEjnG/Oes7eBmiod2UYy+73qIHaHI6Ug
88OMjX8hSHYz+EfxrUkb8Uiios4FNS6vo/SVjELR8vLiQoJ30T4l56QMJvMle1wj
+GDdHjNfL0F9NzqA7WY1rRRXllBCmDfLUeYS3raOx9tmGgfj/4h451RLwXouIwT6
oBMBIJzkGLHi//0SF+xlNJb/zebD421qXLgg28ci0fz+bxMEtPM7HktCpHS+pzUu
V211rA3eP6/gb617SHD7XcxROubpsZ0y1ieaCzjfrQ0NUDqEkYyydDh4rf2AF99R
ODwy8bDNikhN62480Gd2PuqKftf66tUdycXMcnC9Cwe3Ejli+RKBGnTBh5ekPO3+
Pmu/vgILuL8WojOKcAnMSk0tvA+w0kBf/b7mUzLsBpJfOMxtk3KgKrWWbuxotz0C
VKGVICpvtrA87jpOqB36Hn1tFYRknCX9PCzPEENpJg/aK26xNTid4jtbg9MhopdS
GuH4SBNYvvgq7VkCbq5zggh37npgHy/mmBhAmDw7ecBPw/O9jtGEUSFbSTMoaL5s
hK5WTlsSNvvuAaFlv0qvreI1gQiJBhR9+JuZfFS1fzBXWcmDf7n8kqkbLOQr6nVJ
O6AhIj7iHRCNfSTuhElY
=ZHxf
-----END PGP SIGNATURE-----
|