Received: from sog-mx-2.v43.ch3.sourceforge.com ([172.29.43.192]
	helo=mx.sourceforge.net)
	by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
	(envelope-from <allen.piscitello@gmail.com>) id 1VcOLp-0007Wq-ID
	for bitcoin-development@lists.sourceforge.net;
	Fri, 01 Nov 2013 23:42:01 +0000
Received-SPF: pass (sog-mx-2.v43.ch3.sourceforge.com: domain of gmail.com
	designates 209.85.212.171 as permitted sender)
	client-ip=209.85.212.171;
	envelope-from=allen.piscitello@gmail.com;
	helo=mail-wi0-f171.google.com; 
Received: from mail-wi0-f171.google.com ([209.85.212.171])
	by sog-mx-2.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128)
	(Exim 4.76) id 1VcOLo-0006fV-5j
	for bitcoin-development@lists.sourceforge.net;
	Fri, 01 Nov 2013 23:42:01 +0000
Received: by mail-wi0-f171.google.com with SMTP id f4so1739953wiw.16
	for <bitcoin-development@lists.sourceforge.net>;
	Fri, 01 Nov 2013 16:41:54 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.180.87.69 with SMTP id v5mr3959294wiz.45.1383349313919; Fri,
	01 Nov 2013 16:41:53 -0700 (PDT)
Received: by 10.194.85.112 with HTTP; Fri, 1 Nov 2013 16:41:53 -0700 (PDT)
In-Reply-To: <CANg-TZC2NHfGR3mfm4VuuZMbwxkJzP69OmWhLvOD2Zq8GWejnw@mail.gmail.com>
References: <CANg-TZC2NHfGR3mfm4VuuZMbwxkJzP69OmWhLvOD2Zq8GWejnw@mail.gmail.com>
Date: Fri, 1 Nov 2013 18:41:53 -0500
Message-ID: <CAJfRnm6mjm5Oy5YFM9vqC487AjtVG2NNzNg+GXaB1p2j7JtcGA@mail.gmail.com>
From: Allen Piscitello <allen.piscitello@gmail.com>
To: Brooks Boyd <boydb@midnightdesign.ws>
Content-Type: multipart/alternative; boundary=f46d044402a274e2de04ea261c61
X-Spam-Score: -0.6 (/)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
	See http://spamassassin.org/tag/ for more details.
	0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked.
	See
	http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
	for more information. [URIs: doubleclick.net]
	-1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
	sender-domain
	0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
	(allen.piscitello[at]gmail.com)
	-0.0 SPF_PASS               SPF: sender matches SPF record
	1.0 HTML_MESSAGE           BODY: HTML included in message
	-0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
	author's domain
	0.1 DKIM_SIGNED            Message has a DKIM or DK signature,
	not necessarily valid
	-0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
X-Headers-End: 1VcOLo-0006fV-5j
Cc: Bitcoin Development <bitcoin-development@lists.sourceforge.net>
Subject: Re: [Bitcoin-development] BIP39 word list
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Fri, 01 Nov 2013 23:42:01 -0000

--f46d044402a274e2de04ea261c61
Content-Type: text/plain; charset=ISO-8859-1

The problem with this is that word A might be similar to B while B is also
similar to C.  If we scrub B from the list and someone enters B, we have no
way to know whether they meant A or C.  Ensuring that every error is
correctable would require a much more complicated scheme.

Scrubbing A, B, and C is preferable, since it leads to no ambiguity and
there is no need to try to correct an error.
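
A minimal sketch of the ambiguity, in Python.  It assumes that two words
count as "similar" when they have the same length and differ in exactly one
position by one of the similar character pairs quoted below; that rule, the
helper names, and the words "cot" and "dog" are illustrative assumptions,
not the actual BIP39 tooling or word list.

```python
from itertools import combinations

# Subset of the similar-character pairs quoted later in this message.
SIMILAR = {frozenset(p) for p in (
    ("a", "c"), ("a", "e"), ("a", "o"),
    ("c", "e"), ("c", "o"), ("e", "o"),
)}


def is_similar(w1, w2):
    """Assumed rule: same length, exactly one differing position, and the
    two differing letters form one of the similar pairs."""
    if len(w1) != len(w2):
        return False
    diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
    return len(diffs) == 1 and frozenset(diffs[0]) in SIMILAR


def candidates(entered, wordlist):
    """Words in the list that an invalid entry could be 'corrected' to."""
    return sorted(w for w in wordlist if is_similar(entered, w))


def scrub_all_similar(words):
    """Drop every word that is similar to any other word (A, B and C)."""
    bad = set()
    for w1, w2 in combinations(words, 2):
        if is_similar(w1, w2):
            bad.update((w1, w2))
    return [w for w in words if w not in bad]


# A = "eat", B = "cat", C = "cot": B is similar to both A and C,
# but A and C are not similar to each other.
only_b_scrubbed = ["eat", "cot", "dog"]
print(candidates("cat", only_b_scrubbed))   # two candidates: ambiguous

all_scrubbed = scrub_all_similar(["eat", "cat", "cot", "dog"])
print(candidates("cat", all_scrubbed))      # nothing to guess at
```

With only B scrubbed, "cat" has two equally plausible corrections.  With A,
B, and C all scrubbed, "cat" is simply rejected and nothing needs guessing.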


On Fri, Nov 1, 2013 at 3:14 PM, Brooks Boyd <boydb@midnightdesign.ws> wrote:

> I was inspired to join the mailing list to comment on some of these
> discussions about BIP39, which I think will have great use in the Bitcoin
> community and outside it as a way to transcribe binary data.
>
> The one thought I had, as the discussions about similar characters are
> resulting in words being culled from the list, is that culling only helps
> to validate input; it does not help the user if the input is incorrect.
>
> For example, if both "cat" and "eat" were in the word list, and someone
> wrote down "eat", but later mis-translated it and put "cat" back into the
> translator, the result would be a checksum error; "cat" is a different
> number, so the checksum would fail.
>
> As it currently stands, "cat" would not be a valid word ("eat" is the real
> word, and no number maps to "cat"), so the translator can throw a
> different, more helpful error (e.g. "'cat' isn't a valid word choice"),
> but that still doesn't get the user to the proper translation.
>
> What if the wordlist included those "words that are so similar to each
> other that we only kept one of them" and had them all refer to the same
> number? I propose that the wordlist allow multiple words on a single line,
> with the first word on the line being the "primary" or "real" word to be
> used, and the other similar words included so that a translation program
> could, if it wanted to assist the user, fix their input for them (verbosely
> or not), along the lines of "'cat' isn't a valid word choice; assuming you
> meant 'eat', which is valid". You might still hit a checksum error if that
> similar word is still the wrong word. As it stands now, I know you culled a
> bunch of words from the wordlist as "too similar", but if I want to try to
> help the user fix a bad input, I need to write a translation program with a
> full English dictionary alongside the BIP39 dictionary.
>
> I'd be willing to create a pull request for such an update, but before I
> delve into that, does this sound like a good idea? I could see it becoming
> a slippery slope if every number in the 2048 set had a dozen word
> variations (misspellings, similar words, slang terms for the real word,
> etc.), since it could get confusing how similar is similar enough to be
> added as an alternate. The standard would also need to be clear that when
> translating binary to words, you only use the "main" word for that row, not
> any of the variations.
>
> MidnightLightning
>
>
> > I've just pushed an updated wordlist, filtered for similar characters
> > taken from this matrix.
> > BIP39 now considers the following character pairs as similar:
> >         similar = (
> >             ('a', 'c'), ('a', 'e'), ('a', 'o'),
> >             ('b', 'd'), ('b', 'h'), ('b', 'p'), ('b', 'q'), ('b', 'r'),
> >             ('c', 'e'), ('c', 'g'), ('c', 'n'), ('c', 'o'), ('c', 'q'), ('c', 'u'),
> >             ('d', 'g'), ('d', 'h'), ('d', 'o'), ('d', 'p'), ('d', 'q'),
> >             ('e', 'f'), ('e', 'o'),
> >             ('f', 'i'), ('f', 'j'), ('f', 'l'), ('f', 'p'), ('f', 't'),
> >             ('g', 'j'), ('g', 'o'), ('g', 'p'), ('g', 'q'), ('g', 'y'),
> >             ('h', 'k'), ('h', 'l'), ('h', 'm'), ('h', 'n'), ('h', 'r'),
> >             ('i', 'j'), ('i', 'l'), ('i', 't'), ('i', 'y'),
> >             ('j', 'l'), ('j', 'p'), ('j', 'q'), ('j', 'y'),
> >             ('k', 'x'),
> >             ('l', 't'),
> >             ('m', 'n'), ('m', 'w'),
> >             ('n', 'u'), ('n', 'z'),
> >             ('o', 'p'), ('o', 'q'), ('o', 'u'), ('o', 'v'),
> >             ('p', 'q'), ('p', 'r'),
> >             ('q', 'y'),
> >             ('s', 'z'),
> >             ('u', 'v'), ('u', 'w'), ('u', 'y'),
> >             ('v', 'w'), ('v', 'y')
> >         )
> > Feel free to review and comment on the current wordlist, but I think
> > we're slowly moving toward the final list.
> > slush

--f46d044402a274e2de04ea261c61--