Received: from sog-mx-1.v43.ch3.sourceforge.com ([172.29.43.191]
	helo=mx.sourceforge.net)
	by sfs-ml-4.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
	(envelope-from <marek@palatinus.cz>) id 1VcOoR-0006AY-Tq
	for bitcoin-development@lists.sourceforge.net;
	Sat, 02 Nov 2013 00:11:35 +0000
X-ACL-Warn: 
Received: from mail-vb0-f48.google.com ([209.85.212.48])
	by sog-mx-1.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128)
	(Exim 4.76) id 1VcOoQ-0007OG-6c
	for bitcoin-development@lists.sourceforge.net;
	Sat, 02 Nov 2013 00:11:35 +0000
Received: by mail-vb0-f48.google.com with SMTP id o19so122343vbm.7
	for <bitcoin-development@lists.sourceforge.net>;
	Fri, 01 Nov 2013 17:11:28 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:mime-version:sender:in-reply-to:references:from
	:date:message-id:subject:to:cc:content-type;
	bh=xcbatL4HUIBsojI23uy7lbwCOOTtTixPPAl8OFGNPlc=;
	b=kLxpmbAmWJJF6G/n4Ag/lyWyCHTsdPnh87KyN1OhYKOg8kL1WlDFc+J5GUf8TpnSkM
	fcg2eDjGYNbBUVy07W0N+HShduCKxR5pwyWTsBPzUlauaB/SjMtEDru+146ZllPjtxH6
	cr831rzMS/dWzDznJYI2/tEerguSWe4zOWcDQ0O94TGR/nadRcBqgMk6JgaeVxFctkj9
	jCWR4GRfBVxuzBbvwQSxxfkcYlgLQ3l8iKOBCH5Gd2xY6051PBFLeODs7Ct20GOoZG86
	+4H5gXl6cV7d3ZNQr0fTaNhHW/BplA3C3DHLu8Ooo2eq9PtjwxLSi5Z6GaL+rQ9eZQ/V
	UdxA==
X-Gm-Message-State: ALoCoQkZgJin15Cvg21zGf1xTZUayaNIweUpJ6QgZbf+DcPFShOu9MuAmrd15p9pLRzaSTRGkuMR
X-Received: by 10.220.169.203 with SMTP id a11mr1737581vcz.26.1383350681630;
	Fri, 01 Nov 2013 17:04:41 -0700 (PDT)
MIME-Version: 1.0
Sender: marek@palatinus.cz
Received: by 10.59.1.2 with HTTP; Fri, 1 Nov 2013 17:04:11 -0700 (PDT)
In-Reply-To: <CANg-TZC2NHfGR3mfm4VuuZMbwxkJzP69OmWhLvOD2Zq8GWejnw@mail.gmail.com>
References: <CANg-TZC2NHfGR3mfm4VuuZMbwxkJzP69OmWhLvOD2Zq8GWejnw@mail.gmail.com>
From: slush <slush@centrum.cz>
Date: Sat, 2 Nov 2013 01:04:11 +0100
X-Google-Sender-Auth: 9c-5jv1FxJ9Af54MHtvDfHqY_5w
Message-ID: <CAJna-HhyR4fLotqW2kci8rCuoMMVUtz9s1dpNbZYyrc5epC5sw@mail.gmail.com>
To: Brooks Boyd <boydb@midnightdesign.ws>
Content-Type: multipart/alternative; boundary=047d7b6721c4fa97ba04ea266d7f
X-Spam-Score: 1.0 (+)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
	See http://spamassassin.org/tag/ for more details.
	0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked.
	See
	http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
	for more information. [URIs: doubleclick.net]
	0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
	(slush[at]centrum.cz)
	1.0 HTML_MESSAGE           BODY: HTML included in message
X-Headers-End: 1VcOoQ-0007OG-6c
Cc: "bitcoin-development@lists.sourceforge.net"
	<bitcoin-development@lists.sourceforge.net>
Subject: Re: [Bitcoin-development] BIP39 word list
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Sat, 02 Nov 2013 00:11:36 -0000

--047d7b6721c4fa97ba04ea266d7f
Content-Type: text/plain; charset=ISO-8859-1

Hi Brooks,

I've already been thinking about the eat -> cat typing mistake. Actually there
may be a simpler solution than having a wordlist with duplicated words.
Because there's already a mapping of similar characters in the source code
(currently only in the unit test, but it can be moved), when the user types a
word which isn't in the wordlist, the application may try to use that mapping
to find a combination which actually is in the wordlist. This may be ambiguous
in some cases, but giving a choice between a few words may be better than a
hard fail. And it is actually quite easy to implement. Although I think an
application can make such smart suggestions and help the user recover a badly
written mnemonic, I don't think it is necessary to standardize such a method
directly in the BIP. It may or may not be implemented by developers; it is
just a nice-to-have feature.

Example:

The user types "ear", but it isn't in the wordlist.

According to the mapping,
E is similar to A, C, F, O
A is similar to E, C, O
R is similar to B, P, H
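
The per-letter lists above can be derived mechanically from the pair table quoted further down. This is only a minimal sketch: it assumes similarity is symmetric, and `build_neighbors` is a hypothetical helper name, not something taken from the BIP39 repository.

```python
from collections import defaultdict

# Pair table copied from the quoted message below.
SIMILAR_PAIRS = (
    ('a', 'c'), ('a', 'e'), ('a', 'o'),
    ('b', 'd'), ('b', 'h'), ('b', 'p'), ('b', 'q'), ('b', 'r'),
    ('c', 'e'), ('c', 'g'), ('c', 'n'), ('c', 'o'), ('c', 'q'), ('c', 'u'),
    ('d', 'g'), ('d', 'h'), ('d', 'o'), ('d', 'p'), ('d', 'q'),
    ('e', 'f'), ('e', 'o'),
    ('f', 'i'), ('f', 'j'), ('f', 'l'), ('f', 'p'), ('f', 't'),
    ('g', 'j'), ('g', 'o'), ('g', 'p'), ('g', 'q'), ('g', 'y'),
    ('h', 'k'), ('h', 'l'), ('h', 'm'), ('h', 'n'), ('h', 'r'),
    ('i', 'j'), ('i', 'l'), ('i', 't'), ('i', 'y'),
    ('j', 'l'), ('j', 'p'), ('j', 'q'), ('j', 'y'),
    ('k', 'x'),
    ('l', 't'),
    ('m', 'n'), ('m', 'w'),
    ('n', 'u'), ('n', 'z'),
    ('o', 'p'), ('o', 'q'), ('o', 'u'), ('o', 'v'),
    ('p', 'q'), ('p', 'r'),
    ('q', 'y'),
    ('s', 'z'),
    ('u', 'v'), ('u', 'w'), ('u', 'y'),
    ('v', 'w'), ('v', 'y'),
)


def build_neighbors(pairs):
    """Map each letter to the sorted list of letters it can be confused with."""
    neighbors = defaultdict(set)
    for x, y in pairs:
        # Assumption: if x looks like y, then y looks like x.
        neighbors[x].add(y)
        neighbors[y].add(x)
    return {letter: sorted(others) for letter, others in neighbors.items()}


NEIGHBORS = build_neighbors(SIMILAR_PAIRS)
print(NEIGHBORS['e'])  # ['a', 'c', 'f', 'o']
print(NEIGHBORS['a'])  # ['c', 'e', 'o']
print(NEIGHBORS['r'])  # ['b', 'h', 'p']
```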

So the application can calculate combinations of possible characters:

a) when the app assumes that the user mistyped only one character
AAR, CAR, FAR, OAR
EER, ECR, EOR
EAB, EAP, EAH

b) when the app assumes that the user may have mistyped more characters, it
may build the full combination matrix
AEB,  ACB, AOB,  ... OEH, OCH, OOH

and then ask the user to choose among only those combinations which are
actually present in the wordlist. In this particular case that may be only
CAR or FAR (both cannot be in the wordlist because of the similarity rules).
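
A rough sketch of how an application might do this, not a proposed standard. The function names and `TOY_WORDLIST` are hypothetical (the toy list is not the real BIP39 wordlist), and the similarity table holds only the three letters from the example. For case b) the sketch lets every position either keep its letter or take a similar one, which is a superset of the combinations listed above.

```python
from itertools import product

# Confusable letters for the example word, in the order given above.
SIMILAR = {'e': 'acfo', 'a': 'eco', 'r': 'bph'}

# Hypothetical stand-in for the real wordlist, for illustration only.
TOY_WORDLIST = {'car', 'tree', 'zoo'}


def single_substitutions(word, similar):
    """Case a): replace exactly one character with a similar-looking one."""
    candidates = []
    for i, ch in enumerate(word):
        for alt in similar.get(ch, ''):
            candidates.append(word[:i] + alt + word[i + 1:])
    return candidates


def all_substitutions(word, similar):
    """Case b): each position keeps its letter or takes a similar one."""
    choices = [ch + similar.get(ch, '') for ch in word]
    combos = (''.join(letters) for letters in product(*choices))
    return [c for c in combos if c != word]


def suggest(word, wordlist, similar, one_error_only=True):
    """Return the candidates that really are in the wordlist, sorted."""
    generate = single_substitutions if one_error_only else all_substitutions
    return sorted({c for c in generate(word, similar) if c in wordlist})


print(single_substitutions('ear', SIMILAR))
# ['aar', 'car', 'far', 'oar', 'eer', 'ecr', 'eor', 'eab', 'eap', 'eah']
print(suggest('ear', TOY_WORDLIST, SIMILAR))  # ['car']
```

If `suggest` returns more than one word, the application would show the list and let the user pick. An empty result falls back to the plain "not a valid word" error.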

Marek


On Fri, Nov 1, 2013 at 9:14 PM, Brooks Boyd <boydb@midnightdesign.ws> wrote:

> I was inspired to join the mailing list to comment on some of these
> discussions about BIP39, which I think will have great use in the Bitcoin
> community and outside it as a way to transcribe binary data.
>
> The one thought I had as the discussions about similar characters are
> resulting in culling words from the list, is that it only helps to validate
> input, not help the user if it is incorrect.
>
> For example, if both "cat" and "eat" were in the word list, and someone
> wrote down "eat", but later mis-translated it and put "cat" back into
> translator, the result would be a checksum error; "cat" is a different
> number, so the checksum would fail.
>
> As it currently stands, "cat" would not be a valid word ("eat" is the real
> word, and no other number is "cat"), so the translator can throw a
> different error which is more helpful (i.e. "'cat' isn't a valid word
> choice"), but still doesn't get the user to the proper translation.
>
> What about if the wordlist included those "words that are so similar to
> each other that we only kept one of them" and had them all refer to the
> same number? I propose the wordlist have the possibility of multiple words
> on a single line, with the first word on the line being the "primary" or
> "real" word to be used, with the other similar words included so that a
> translation program if it wanted to assist the user could fix their input
> for them (verbosely or not), along the lines of "'cat' isn't a valid word
> choice; assuming you meant 'eat', which is valid". You might still hit a
> checksum error if that similar word is still the wrong word, but as it
> stands now, I know you culled a bunch of words from the wordlist as "too
> similar", but if I want to try and help the user fix a bad input, I need to
> write a translation program with a full English dictionary alongside the
> BIP39 dictionary.
>
> I'd be willing to create a pull request for such an update, but before I
> delve into that, does this sound like a good idea? I could see it devolving
> into a slippery slope if every number in the 2048 set had a dozen word
> variations (misspellings, similar words, slang terms for the real word,
> etc.) which could get confusing of how similar is similar enough to be
> added as an alternate, and the standard would need to be clear that when
> translating binary to words, you only use the "main" word for that row, not
> any of the variations.
>
> MidnightLightning
>
>
> > I've just pushed updated wordlist which is filtered to similar
> > characters taken from this matrix.
> > BIP39 now consider following character pairs as similar:
> >         similar = (
> >             ('a', 'c'), ('a', 'e'), ('a', 'o'),
> >             ('b', 'd'), ('b', 'h'), ('b', 'p'), ('b', 'q'), ('b', 'r'),
> >             ('c', 'e'), ('c', 'g'), ('c', 'n'), ('c', 'o'), ('c', 'q'),
> >             ('c', 'u'),
> >             ('d', 'g'), ('d', 'h'), ('d', 'o'), ('d', 'p'), ('d', 'q'),
> >             ('e', 'f'), ('e', 'o'),
> >             ('f', 'i'), ('f', 'j'), ('f', 'l'), ('f', 'p'), ('f', 't'),
> >             ('g', 'j'), ('g', 'o'), ('g', 'p'), ('g', 'q'), ('g', 'y'),
> >             ('h', 'k'), ('h', 'l'), ('h', 'm'), ('h', 'n'), ('h', 'r'),
> >             ('i', 'j'), ('i', 'l'), ('i', 't'), ('i', 'y'),
> >             ('j', 'l'), ('j', 'p'), ('j', 'q'), ('j', 'y'),
> >             ('k', 'x'),
> >             ('l', 't'),
> >             ('m', 'n'), ('m', 'w'),
> >             ('n', 'u'), ('n', 'z'),
> >             ('o', 'p'), ('o', 'q'), ('o', 'u'), ('o', 'v'),
> >             ('p', 'q'), ('p', 'r'),
> >             ('q', 'y'),
> >             ('s', 'z'),
> >             ('u', 'v'), ('u', 'w'), ('u', 'y'),
> >             ('v', 'w'), ('v', 'y')
> >         )
> > Feel free to review and comment current wordlist, but I think we're
> > slowly moving forward final list.
> > slush
>

--047d7b6721c4fa97ba04ea266d7f--