Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 4A4CEE7E for ; Tue, 9 Jan 2018 16:20:42 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-it0-f44.google.com (mail-it0-f44.google.com [209.85.214.44]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 7806314D for ; Tue, 9 Jan 2018 16:20:41 +0000 (UTC) Received: by mail-it0-f44.google.com with SMTP id b5so12238839itc.3 for ; Tue, 09 Jan 2018 08:20:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=blockstream.io; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=IFPRJWnljgX1BPqouHJTdbUqFF8ieIShkAxz0GxtF+s=; b=UBLjZASHYeb063avy+u9iTJhE8Bgna+OybNFAVAjii1kXdU8FtnlhTRMj2aDgAaROC E18e6AUpLDBsEogNog3DE6aCSTj0chxx+/SMH7ODllcsEqGTHtKcRs3MLWT01M6aNx9l 62IJRl3UE6XHCQ92nVlSblKJwpCkGvSjFBxjI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=IFPRJWnljgX1BPqouHJTdbUqFF8ieIShkAxz0GxtF+s=; b=hARIl4A5oOZ9CDqUN9EjcN/Q/LF1OiikWm1vLxggdJwnx3V/T91bO95ZCr3xTGwYvU QscbZDUvq4rkctWeiEJoRMGkJKtBE924+8iLDu9xY3xmldqf0aIc1ak5qHAWLF70/F6D qMWzCMQHsl4gjgwqbR1flUunQW91+9C3aDwYwyDkJB0nEJ8j2jOzGlV65bpgjthY73C6 BF+ZYvc4XX4QnTpiyKoL7hEwsaSo9/bh0ZeDSEXbYNJEM67PUaE2eHj6N16TjG2iR8ER ag7R5chQLymzpvTH+Ry8GUsNROKNj2tcU3n58fiiE32MxMHePAZ68tX/PrBhZsMsIZmi bcgA== X-Gm-Message-State: AKGB3mLWuOySoMp7ShSmJKc61BOwIfLPlxFcgRju04q9HFThMOZzGwfS 6SK/+GHmpgH7YcsLrmimIBHJcEwjCUDiKmFcnJfL2vOm2Mg= X-Google-Smtp-Source: ACJfBosWYcFvkhtQ4LvkgQV+wQyqN+i8XPbTsJsAc63kUYKSEcbfgrG0kDGX7l8CBZUZZZACOwBH988pjoxBxDB9ntg= X-Received: by 10.36.145.143 with SMTP id i137mr15952334ite.6.1515514840634; Tue, 09 Jan 2018 08:20:40 -0800 (PST) MIME-Version: 1.0 Received: by 10.2.160.194 with HTTP; Tue, 9 Jan 2018 08:20:20 -0800 (PST) In-Reply-To: References: From: "Russell O'Connor" Date: Tue, 9 Jan 2018 11:20:20 -0500 Message-ID: To: Pavol Rusnak , Bitcoin Protocol Discussion Content-Type: multipart/alternative; boundary="94eb2c0ef16eba70b505625a4bc5" X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HTML_MESSAGE, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: Re: [bitcoin-dev] Satoshilabs secret shared private key scheme X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Jan 2018 16:20:42 -0000 --94eb2c0ef16eba70b505625a4bc5 Content-Type: text/plain; charset="UTF-8" On Mon, Jan 8, 2018 at 7:39 AM, Pavol Rusnak via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > On 08/01/18 05:22, Gregory Maxwell wrote: > >> https://github.com/satoshilabs/slips/blob/master/slip-0039.md > > > > The 16-bit "checksum" based on sha2 seems pretty poor since basing > > small checksums on a cryptographic hash results in a fairly poor > > checksum that is surprisingly likely to accept an errored string. Your > > wordlist is 10 bits and you have much less than 1023*10 bits of input, > > so you could easily have a 20 bit code (two words) which guaranteed > > that up to two errored words would always be detected, and probably > > could choose one which catches three words much more often 1:2^20 > > (sipa's crc tools can help find codes like this). > > Originally, we wanted to use 16-bit of CRC32 for checksum, but after the > discussion with Daan Sprenkels we were suggested to change this for > cryptographically strong function. The argument was that CRC32 contains > less entropy and mixing high-entropy data (secret) with low-entropy data > (checksum) is not a good idea. > This entropy argument seems confused. Ignoring constant factors, the entropy of a checksum is the sum over all possible checksums, i, of -n_i*log(n_i), where n_i is the number of times the ith checksum occurs over the space of all possible data being checksummed. In this application the checksum is being applied to a fixed m-bit blob of uniformly random data. The entropy is maximized when every possible checksum occurs equally as frequently, that is we achieve maximum entropy when all the n_i values are equal to each other. Any error correcting code worth it's salt will try to achieve this property because the designers want every checksum value to have as much error correcting power as every other checksum value. I'm almost certain that the algebraic properties of your typical error correcting codes allow you to prove that maximum entropy is perfectly achieved whenever the data-blob size is at least as large as the checksum size. Meanwhile the truncated value of a cryptographic hash function is expected to be slightly under the maximum entropy value, under the assumption that the hash function it behaves like a random function. The main properties of a "strong cryptographic hash function" is that it is infeasible to find collisions and preimages. However these properties are lost when you truncate the hash down to 16-bits. At this point is it entirely feasible to find collisions and preimages. So using a truncated cryptographic hash function doesn't provide you with more entropy (and, in fact, probably a sliver less entropy), and doesn't provide you with any of the befits of strong cryptographic hash function. > Also, there is an argument between a checksum and ECC. We discussed that > ECC might not be a good idea, because it helps the attacker to compute > missing information, while we only want to check for integrity. Also the > word mnemonic is itself a ECC, because if you see the word "acadornic" > it is probably the word "academic". > Every checksum is error correcting. Given an failed checksum, all you have to do is search around the space of edits to find the smallest set edits that yield a valid checksum. With a 2^16 bit checksum one will expect to find a nearby checksum within 2^16 trails, even when using a truncated hash function. What an error-correcting codes gives you isn't the ability to correct errors, which we have seen is something that all short checksums provide, rather they provide *guarantees* about the ability to detect (and correct) certain common classes of errors. For example we can have an ECC that guarantees to find the error where are word is accidentally written down twice (see https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-January/015506.html ). The advice you have been given will only result in losing any guarantees about detecting common classes or errors; it won't stop attackers from recovering missing information, and it won't provide a cryptographically strong function. --94eb2c0ef16eba70b505625a4bc5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

= On Mon, Jan 8, 2018 at 7:39 AM, Pavol Rusnak via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
On 08/01/18 05:22, Grego= ry Maxwell wrote:
>> https://github.com/satoshilabs= /slips/blob/master/slip-0039.md


> The 16-bit "checksum" based on sha2 seems pretty poor since = basing
> small checksums on a cryptographic hash results in a fairly poor
> checksum that is surprisingly likely to accept an errored string. Your=
> wordlist is 10 bits and you have much less than 1023*10 bits of input,=
> so you could easily have a 20 bit code (two words) which guaranteed > that up to two errored words would always be detected, and probably > could choose one which catches three words much more often 1:2^20
> (sipa's crc tools can help find codes like this).

Originally, we wanted to use 16-bit of CRC32 for checksum, but after= the
discussion with Daan Sprenkels we were suggested to change this for
cryptographically strong function. The argument was that CRC32 contains
less entropy and mixing high-entropy data (secret) with low-entropy data (checksum) is not a good idea.

This ent= ropy argument seems confused.=C2=A0 Ignoring constant factors, the entropy = of a checksum is the sum over all possible checksums, i, of -n_i*log(n_i), = where n_i is the number of times the ith checksum occurs over the space of = all possible data being checksummed.=C2=A0 In this application the checksum= is being applied to a fixed m-bit blob of uniformly random data.

The entropy is maximized when every possible checksum occur= s equally as frequently, that is we achieve maximum entropy when all the n_= i values are equal to each other.=C2=A0 Any error correcting code worth it&= #39;s salt will try to achieve this property because the designers want eve= ry checksum value to have as much error correcting power as every other che= cksum value.=C2=A0 I'm almost certain that the algebraic properties of = your typical error correcting codes allow you to prove that maximum entropy= is perfectly achieved whenever the data-blob size is at least as large as = the checksum size.

Meanwhile the truncated value o= f a cryptographic hash function is expected to be slightly under the maximu= m entropy value, under the assumption that the hash function it behaves lik= e a random function.

The main properties of a &quo= t;strong cryptographic hash function" is that it is infeasible to find= collisions and preimages.=C2=A0 However these properties are lost when you= truncate the hash down to 16-bits.=C2=A0 At this point is it entirely feas= ible to find collisions and preimages.

So using a = truncated cryptographic hash function doesn't provide you with more ent= ropy (and, in fact, probably a sliver less entropy), and doesn't provid= e you with any of the befits of strong cryptographic hash function.
=C2=A0
Also, there is an argument between a checksum and ECC. We discussed that ECC might not be a good idea, because it helps the attacker to compute
missing information, while we only want to check for integrity. Also the word mnemonic is itself a ECC, because if you see the word "acadornic&= quot;
it is probably the word "academic".

Every checksum is error correcting.=C2=A0 Given an failed checksum, = all you have to do is search around the space of edits to find the smallest= set edits that yield a valid checksum.=C2=A0 With a 2^16 bit checksum one = will expect to find a nearby checksum within 2^16 trails, even when using a= truncated hash function.

What an error-correcting= codes gives you isn't the ability to correct errors, which we have see= n is something that all short checksums provide, rather they provide *guara= ntees* about the ability to detect (and correct) certain common classes of = errors.=C2=A0 For example we can have an ECC that guarantees to find the er= ror where are word is accidentally written down twice (see