Received: from sog-mx-4.v43.ch3.sourceforge.com ([172.29.43.194]
	helo=mx.sourceforge.net)
	by sfs-ml-1.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
	(envelope-from <boydb@midnightdesign.ws>) id 1VcNjD-0007UO-0M
	for bitcoin-development@lists.sourceforge.net;
	Fri, 01 Nov 2013 23:02:07 +0000
Received-SPF: pass (sog-mx-4.v43.ch3.sourceforge.com: domain of
	midnightdesign.ws designates 50.87.144.70 as permitted sender)
	client-ip=50.87.144.70; envelope-from=boydb@midnightdesign.ws;
	helo=gator3054.hostgator.com; 
Received: from gator3054.hostgator.com ([50.87.144.70])
	by sog-mx-4.v43.ch3.sourceforge.com with esmtps (TLSv1:AES256-SHA:256)
	(Exim 4.76) id 1VcNjB-0003t9-Qm
	for bitcoin-development@lists.sourceforge.net;
	Fri, 01 Nov 2013 23:02:06 +0000
Received: from [74.125.82.53] (port=39212 helo=mail-wg0-f53.google.com)
	by gator3054.hostgator.com with esmtpsa (TLSv1:RC4-SHA:128)
	(Exim 4.80) (envelope-from <boydb@midnightdesign.ws>)
	id 1VcL7G-0001Zn-MT for bitcoin-development@lists.sourceforge.net;
	Fri, 01 Nov 2013 15:14:46 -0500
Received: by mail-wg0-f53.google.com with SMTP id y10so9830wgg.8
	for <bitcoin-development@lists.sourceforge.net>;
	Fri, 01 Nov 2013 13:14:44 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20130820;
	h=x-gm-message-state:mime-version:date:message-id:subject:from:to
	:content-type;
	bh=nzAwZ7DEseXzXFm1XvsGTTmUh3SL4qUSMq0fZxExcB4=;
	b=ih4+opS1jBUyori+ZKoM3tm85BrYadxQZoi9WTeD3REXhwTwNKsjr0q3nozk3aTUgY
	xITiZpJabmSuQihlHdG0GEx3wBle4jO30P2Hvko9RQjoFGuLcYM1K6BjcVQK0aA3r6sA
	muVYS0SYrQDRqkipwssW/bqZPCF1J5ESPaw7kVnbqju4TT6fS2yvvIRznk7mVqVDXovf
	+ExZAHEs9G3vNkkfy8vAZrflZtFu5o8d5x5gzasmc/Ddp+HSh7dxN1enP+49gyQcym0t
	WXVUaxtRv42iz3BE9Mnr3SLvcoknqDgq2yhcXsYq0GcKnspCZ33YZxe/K4AJMZcA7A8z
	S4SA==
X-Gm-Message-State: ALoCoQmCjbFXnMdIYfLYUssEaaXO75ND1ja16555EMV+BiYioXLQhRYlF6cm+H0IsgeHMFwH+pZ+
MIME-Version: 1.0
X-Received: by 10.180.108.131 with SMTP id hk3mr3764538wib.10.1383336884910;
	Fri, 01 Nov 2013 13:14:44 -0700 (PDT)
Received: by 10.227.60.6 with HTTP; Fri, 1 Nov 2013 13:14:44 -0700 (PDT)
Date: Fri, 1 Nov 2013 15:14:44 -0500
Message-ID: <CANg-TZC2NHfGR3mfm4VuuZMbwxkJzP69OmWhLvOD2Zq8GWejnw@mail.gmail.com>
From: Brooks Boyd <boydb@midnightdesign.ws>
To: bitcoin-development@lists.sourceforge.net
Content-Type: multipart/alternative; boundary=e89a8f3ba34ba1530b04ea233704
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - gator3054.hostgator.com
X-AntiAbuse: Original Domain - lists.sourceforge.net
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - midnightdesign.ws
X-BWhitelist: no
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Source-Sender: (mail-wg0-f53.google.com) [74.125.82.53]:39212
X-Source-Auth: midnight
X-Email-Count: 0
X-Source-Cap: bWlkbmlnaHQ7bWlkbmlnaHQ7Z2F0b3IzMDU0Lmhvc3RnYXRvci5jb20=
X-Spam-Score: -0.5 (/)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
	See http://spamassassin.org/tag/ for more details.
	-1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
	sender-domain
	-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
	-0.0 SPF_PASS               SPF: sender matches SPF record
	1.0 HTML_MESSAGE           BODY: HTML included in message
X-Headers-End: 1VcNjB-0003t9-Qm
Subject: Re: [Bitcoin-development] BIP39 word list
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Fri, 01 Nov 2013 23:02:07 -0000

--e89a8f3ba34ba1530b04ea233704
Content-Type: text/plain; charset=ISO-8859-1

I was inspired to join the mailing list to comment on some of these
discussions about BIP39, which I think will have great use in the Bitcoin
community and outside it as a way to transcribe binary data.

One thought I had: the discussions about similar characters are resulting
in words being culled from the list, but culling only helps to validate
input; it does not help the user when the input is incorrect.

For example, if both "cat" and "eat" were in the word list, and someone
wrote down "eat" but later misread it and typed "cat" back into the
translator, the result would be a checksum error: "cat" maps to a different
number, so the checksum would fail.
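
To make that failure mode concrete, here is a minimal sketch of a checksum
check on word indices. It assumes the scheme of the published BIP39
specification (the first ENT/32 bits of SHA-256 over the entropy, appended
before splitting into 11-bit groups), which may differ from the draft under
discussion. The function names are hypothetical, and the code works on
indices so that no particular wordlist is assumed.

```python
import hashlib


def entropy_to_indices(entropy):
    """Append the checksum bits and split the result into 11-bit indices."""
    if len(entropy) not in (16, 20, 24, 28, 32):
        raise ValueError("entropy must be 128 to 256 bits, in 32-bit steps")
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # 4 to 8 checksum bits
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    total = ent_bits + cs_bits
    return [(value >> (total - 11 * (i + 1))) & 0x7FF for i in range(total // 11)]


def indices_are_valid(indices):
    """Recompute the checksum from the indices and compare it."""
    total = len(indices) * 11
    cs_bits = total // 33
    ent_bits = total - cs_bits
    value = 0
    for index in indices:
        value = (value << 11) | index
    entropy = (value >> cs_bits).to_bytes(ent_bits // 8, "big")
    checksum = value & ((1 << cs_bits) - 1)
    return checksum == hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)


indices = entropy_to_indices(bytes(range(16)))
print(indices_are_valid(indices))  # True: an unmodified phrase round-trips

# Replace the first word with every other possible index and count how
# many of the wrong phrases still slip past the checksum.
wrong = [[w] + indices[1:] for w in range(2048) if w != indices[0]]
print(sum(indices_are_valid(p) for p in wrong), "of", len(wrong), "pass")
```

Under this assumed scheme a 12-word phrase carries only 4 checksum bits, so
a substituted word is usually caught but can occasionally pass by chance.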

As it currently stands, "cat" would not be a valid word ("eat" is the real
word, and no number maps to "cat"), so the translator can throw a
different, more helpful error (e.g. "'cat' isn't a valid word choice"), but
that still doesn't get the user to the proper translation.

What if the wordlist included those "words that are so similar to each
other that we only kept one of them" and had them all refer to the same
number? I propose that the wordlist allow multiple words on a single line:
the first word on the line is the "primary" or "real" word, and the other,
similar words are included so that a translation program that wants to
assist the user can fix their input for them (verbosely or not), along the
lines of "'cat' isn't a valid word choice; assuming you meant 'eat', which
is valid". You might still hit a checksum error if the substituted word is
still the wrong word. As it stands now, though, a bunch of words were
culled from the wordlist as "too similar", so if I want to help the user
fix a bad input, I need to write a translation program that carries a full
English dictionary alongside the BIP39 dictionary.
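
A minimal sketch of how a translator could consume such a file, assuming a
hypothetical format of one row per number with whitespace-separated words
and the primary word first. The function names and the two sample rows are
made up for illustration and are not part of any BIP39 draft.

```python
def load_wordlist(lines):
    """Return (primary, lookup) for a wordlist with optional alternates.

    primary[i] is the word to emit for number i; lookup maps every
    accepted spelling, primary or alternate, to its number.
    """
    rows = [line.split() for line in lines if line.strip()]
    primary = []
    lookup = {}
    for index, words in enumerate(rows):
        primary.append(words[0])
        for word in words:
            if word in lookup:
                raise ValueError("%r appears on more than one row" % word)
            lookup[word] = index
    return primary, lookup


def resolve(word, primary, lookup):
    """Map user input to (number, note); note is None for a primary word."""
    if word not in lookup:
        raise KeyError("%r isn't a valid word choice" % word)
    index = lookup[word]
    real = primary[index]
    if word == real:
        return index, None
    note = "%r isn't a valid word choice; assuming you meant %r" % (word, real)
    return index, note


# Hypothetical two-row list: "cat" is an alternate for the primary "eat".
primary, lookup = load_wordlist(["eat cat", "dog"])
print(resolve("eat", primary, lookup))  # (0, None)
print(resolve("cat", primary, lookup))
```

Encoding binary data to words would read only `primary`, so alternates can
never appear in generated output; they are accepted on input only.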

I'd be willing to create a pull request for such an update, but before I
delve into that, does this sound like a good idea? I could see it becoming
a slippery slope if every number in the 2048 set had a dozen word
variations (misspellings, similar words, slang terms for the real word,
etc.): it could get confusing to decide how similar is similar enough to be
added as an alternate. The standard would also need to be clear that when
translating binary to words, you only use the "main" word for that row, not
any of the variations.

MidnightLightning


> I've just pushed updated wordlist which is filtered to similar characters
> taken from this matrix.
> BIP39 now consider following character pairs as similar:
>         similar = (
>             ('a', 'c'), ('a', 'e'), ('a', 'o'),
>             ('b', 'd'), ('b', 'h'), ('b', 'p'), ('b', 'q'), ('b', 'r'),
>             ('c', 'e'), ('c', 'g'), ('c', 'n'), ('c', 'o'), ('c', 'q'),
>             ('c', 'u'),
>             ('d', 'g'), ('d', 'h'), ('d', 'o'), ('d', 'p'), ('d', 'q'),
>             ('e', 'f'), ('e', 'o'),
>             ('f', 'i'), ('f', 'j'), ('f', 'l'), ('f', 'p'), ('f', 't'),
>             ('g', 'j'), ('g', 'o'), ('g', 'p'), ('g', 'q'), ('g', 'y'),
>             ('h', 'k'), ('h', 'l'), ('h', 'm'), ('h', 'n'), ('h', 'r'),
>             ('i', 'j'), ('i', 'l'), ('i', 't'), ('i', 'y'),
>             ('j', 'l'), ('j', 'p'), ('j', 'q'), ('j', 'y'),
>             ('k', 'x'),
>             ('l', 't'),
>             ('m', 'n'), ('m', 'w'),
>             ('n', 'u'), ('n', 'z'),
>             ('o', 'p'), ('o', 'q'), ('o', 'u'), ('o', 'v'),
>             ('p', 'q'), ('p', 'r'),
>             ('q', 'y'),
>             ('s', 'z'),
>             ('u', 'v'), ('u', 'w'), ('u', 'y'),
>             ('v', 'w'), ('v', 'y')
>         )
> Feel free to review and comment current wordlist, but I think we're
> slowly moving forward final list.
> slush
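
The quoted message does not spell out how the matrix is applied to the
wordlist. The sketch below assumes one plausible rule: two words collide
when they have the same length and differ in exactly one position, and the
two differing characters form a pair from the matrix. The rule and the
function name are assumptions; the pairs themselves are copied from the
matrix above in compact form.

```python
# The 'similar' pairs from the quoted matrix, written as two-letter strings.
PAIRS = (
    "ac ae ao bd bh bp bq br ce cg cn co cq cu dg dh do dp dq ef eo "
    "fi fj fl fp ft gj go gp gq gy hk hl hm hn hr ij il it iy jl jp jq jy "
    "kx lt mn mw nu nz op oq ou ov pq pr qy sz uv uw uy vw vy"
).split()
SIMILAR = {frozenset(pair) for pair in PAIRS}


def too_similar(a, b):
    """Assumed rule: same length, one differing position, pair in SIMILAR."""
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return len(diffs) == 1 and frozenset(diffs[0]) in SIMILAR


print(too_similar("cat", "eat"))  # True: ('c', 'e') is in the matrix
print(too_similar("cat", "hat"))  # False: ('c', 'h') is not listed
```

Under this assumed rule, "cat" and "eat" cannot both stay in the list,
which matches the example used earlier in this message.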

--e89a8f3ba34ba1530b04ea233704--