Date: Fri, 7 Jun 2024 19:40:42 -0700 (PDT)
From: Aneesh Karve <aneesh.karve@gmail.com>
To: Bitcoin Development Mailing List
Subject: [bitcoindev] BIP-? : Free seed mnemonics for steganography and attack-resistance

By adding a BIP-39 language called "Free" we can have nice things from PBKDF2(), including offline and steganographic seeds, in a way that's fully compatible with BIP-39. For instance users can store strong seeds in an easy-to-conceal deck of cards.

Proposal inline below, with link to a reference implementation in Python. Online spec here: https://github.com/akarve/bip-keychain/blob/main/pre-bips/steganographic-seeds.md

I could use your help on a proper seed complexity measure.

Let me know your thoughts.

# Free seed mnemonics for steganography and attack-resistance

> "Anyone who considers arithmetical methods of producing random digits is,
> of course, in a state of sin." -- John von Neumann

# Abstract

Bitcoin seed mnemonics face attackers ranging from casual thieves to state actors.
We propose backward-compatible, non-breaking changes to BIP-39 master seed derivation
to unlock broader mnemonic options from PBKDF2().
Users of this BIP can generate and store seeds offline with common physical objects
that are plausibly deniable: playing cards, chess boards, and paper napkins to name
a few. As a result seed mnemonics gain greater portability, memorability, entropy,
and steganography as compared to BIP-39 mnemonics.


# Motivation

BIP-39 mnemonic seed phrases have the following shortcomings:

1. BIP-39 seed words, if found by an attacker, are instantly recognizable
as a Bitcoin seed phrase to be scrutinized or seized outright.

1. BIP-39 "mnemonics" are not particularly easy to remember.

1. Computing BIP-39's checksum bits necessitates a computer, making offline
seeds impossible.

1. Some hardware vendors support independent sources of entropy such as die rolls
but, unfortunately for the security, convenience, and trust of the users, vary
in how they convert user entropy to the proper binary seed.

    1. Users are required to run their own vendor-specific code to verify that the
    vendor has actually used their provided entropy. Said verification often _still_
    requires blind trust (how does the user know that entropy produced the right
    binary seed?) and is prohibitively technical for many users.

    1. It is cumbersome to manually enter the results of 100 six-sided die rolls,
    the minimum number of rolls to surpass 256 bits of entropy.

        1. Die rolls are poor for storing secrets since there are usually fewer dice
        than rolls and since dice are easily mixed up.

        > Compare the effort and portability of these 100 rolls
        > to the far easier and more portable shuffled deck of cards.

The above weaknesses cause all Bitcoin wallets to lose,
**due to no fundamental limitation whatsoever**,
the following benefits:

1. Portability in physical media.

1. Portability and memorability in the human brain.

1. Repudiation in high-risk situations where individuals or the Bitcoin protocol
are under attack.

1. The ability to generate, with no electronics, a cryptographically
strong seed that will later work with many hardware wallets.

1. Moore's-law-resistant mnemonics that encode far more than 256 bits of entropy.

    > Although more than 256 bits of entropy exceeds today's ECDSA private keys,
    > it rightfully leaves the door open for stronger keys in the future
    > and further permits today's mnemonics to be repurposed, without change,
    > for tomorrow's larger keys.

**As a result, Bitcoin users seeking to evade oppressive governments and other attackers
have fewer defenses than the protocol naturally affords.**

Importantly, the above weaknesses can be remedied with
_little to no change to the BIP-39 seed phrase implementations_ already ubiquitous
in hardware and software wallets.


# Risks, remedies, and alternatives

## Risks

1. Giving users more sources and options for mnemonics increases the
risk that these users provide weak inputs that contain too little entropy or weak
entropy (e.g. common phrases).

    > Implementers must mitigate this risk with an easy-to-implement entropy
    > measure and message to the user.

1. BIP-39 includes checksum bits in the final word, offering some protection
against erroneous entry. The present proposal eliminates both the advantages (integrity)
and disadvantages (cannot be computed by hand) of the BIP-39 checksum bits
in favor of a much broader set of steganographic mnemonics that can be stored,
generated, and carried in situations of urgency and scarcity.

    > BIP-39 checksum validation _shall remain in place_ (unchanged) for BIP-39
    > mnemonics.

    > Advanced users might choose to implement their own checksums or error-correcting
    > codes.


# Specification

**The present spec is fully backwards-compatible with BIP-39 mnemonics**.
We introduce a new input "language" `Free` that admits a superset of BIP-39 mnemonics.
Wallets can and should continue to validate BIP-39 mnemonics as in the past.
`Free` should be treated as a new input language, similar to English, French, or
any of the BIP-39 languages.
`Free` should allow at a minimum the ASCII printable characters, minus capital
letters.


BIP-39 derives binary seeds by applying `PBKDF2()` to a mnemonic and passphrase
as follows:

```
PBKDF2(
    password=bin(nfkd(mnemonic)),
    salt=bin(nfkd("mnemonic" || passphrase)),
    hash_name="HMAC-SHA512",
    iterations=2048,
)
```

`nfkd()` converts the input string to Unicode normalization form KD.
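
For readers who want to try the derivation above, here is a minimal Python sketch using only the standard library. The function names `nfkd` and `bip39_seed` are ours, chosen for illustration, and `bin()` is assumed to mean UTF-8 encoding. This is a sketch of the formula as written, not the reference implementation.

```python
import hashlib
import unicodedata


def nfkd(text: str) -> str:
    """Return `text` in Unicode normalization form KD."""
    return unicodedata.normalize("NFKD", text)


def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte binary seed from a mnemonic and optional passphrase."""
    password = nfkd(mnemonic).encode("utf-8")
    salt = nfkd("mnemonic" + passphrase).encode("utf-8")
    # PBKDF2 with HMAC-SHA512 and 2048 iterations; the derived key length
    # defaults to the SHA-512 digest size (64 bytes).
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)


if __name__ == "__main__":
    seed = bip39_seed("example mnemonic words go here", passphrase="optional")
    print(len(seed), seed.hex())
```

Note that checksum validation of BIP-39 mnemonics is a separate step and is deliberately not shown here.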

Fortunately `PBKDF2()` does not limit the domain of either the `password` or `salt`
argument. Existing implementations are therefore easy to update.
Applications must _not change_ passphrase entry, validation, normalization, or application.

Applications should regard the _mnemonic_ as a raw string to permit the widest
possible array of input characters. (See the following section for details.)

In the interest of backward compatibility we propose that existing BIP-39 implementations
treat `Free` as the tenth input "language" with the following differences from the 9
BIP-39 languages:

1. If they do not already, lower the case of all input letters.
Strangely BIP-39 is silent on the subject of case.

1. If they do not already, `strip` the `Free` mnemonic of surrounding whitespace
and split it on the regular expression `\s+`.

1. Apply `nfkd()` to the `Free` mnemonic.


```
if language == "Free":
    norm_mnemonic = lower(nfkd(split("\s+", mnemonic)))
    validate(norm_mnemonic)
    PBKDF2(
        password=bin(norm_mnemonic),
        salt=bin(nfkd("mnemonic" || passphrase)),
        hash_name="HMAC-SHA512",
        iterations=2048,
    )
```


The output of PBKDF2 is converted to a master seed as in BIP-39.
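
The `Free` branch can be sketched in Python as follows. Two points are assumptions on our part, since the pseudocode leaves them open: the normalized tokens are rejoined with a single ASCII space before hashing, and `bin()` means UTF-8 encoding. The names `normalize_free` and `free_seed` are hypothetical.

```python
import hashlib
import re
import unicodedata


def normalize_free(mnemonic: str) -> str:
    """Strip, split on whitespace, NFKD-normalize and lowercase a Free mnemonic."""
    tokens = re.split(r"\s+", mnemonic.strip())
    tokens = [unicodedata.normalize("NFKD", token).lower() for token in tokens]
    # Assumption: tokens are rejoined with one ASCII space before hashing.
    return " ".join(tokens)


def free_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive a 64-byte seed from a Free mnemonic; the passphrase path is unchanged."""
    norm_mnemonic = normalize_free(mnemonic)
    # A wallet would run validate(norm_mnemonic) here before deriving the seed.
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase)
    return hashlib.pbkdf2_hmac(
        "sha512", norm_mnemonic.encode("utf-8"), salt.encode("utf-8"), 2048
    )


if __name__ == "__main__":
    print(normalize_free("  2S  10S\tkC 10H 5S "))
    print(free_seed("2S 10S kC 10H 5S").hex())
```

Under these assumptions, inputs that differ only in letter case or in the amount of whitespace between tokens produce the same seed.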


## `validate()`

`validate()` must at a minimum estimate the complexity `complexity()` of
the user-proposed mnemonic and must refuse the mnemonic if the estimated complexity is less
than a threshold (we recommend a threshold of 0.5).

Implementations must know the cardinality `C` of the mnemonic character set.
Applications must support at a bare minimum an input cardinality of **74**
(the printable ASCII characters shown below, including six whitespace characters
and excluding the 26 capital letters) but higher values for `C` are both
permissible and recommended. As suggested below, the higher the cardinality of
the input set, the greater the steganographic potential.

    0123456789abcdefghijklmnopqrstuvwxyz!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c
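
The 74-character set above matches Python's `string.printable` with the capital letters removed, which gives a quick way to build and check it. The names `UNIVERSE` and `outside_universe` are ours and only illustrative.

```python
import string

# string.printable holds 100 characters: digits, letters, punctuation, whitespace.
# Dropping the 26 capital letters leaves the minimum Free character set.
UNIVERSE = [c for c in string.printable if c not in string.ascii_uppercase]
C = len(UNIVERSE)


def outside_universe(mnemonic: str) -> set:
    """Return the characters of `mnemonic` that are not in the minimum set."""
    allowed = set(UNIVERSE)
    return {c for c in mnemonic if c not in allowed}


if __name__ == "__main__":
    print(C)
    print(outside_universe("2s 10s kC"))
```

An application that supports a larger universe would extend `UNIVERSE` and use the larger `C` in the measures that follow.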

The Shannon Entropy `SE` of a string is as follows:

$$ SE(X) := - \sum_{i} p(x_i) \log_2 p(x_i) $$

As an optimization for fixed `X` with all unique entries and cardinality `C`:

$$ SE(X) = \log_2(C) $$

> Intuitively the above is the (fractional) number of bits needed to represent each
> character in the `universe`.

The relative entropy is simply the following:

$$ re(mnemonic\_tokens, universe) := SE(mnemonic\_tokens) / SE(universe) $$

`universe` is the list of all possible input tokens. `mnemonic_tokens` is a tokenized
list of the inputs. Tokens may vary in length per the universe and application,
though applications can start with one token per ASCII printable character.

`re()` ranges from 0 to 1 when `mnemonic_tokens` are all unique. An `re()` of 0.5
reflects that the user has provided as much information as one instance each of
about $\sqrt{C}$ distinct input tokens (roughly 9 tokens when `C` is 74),
since $\log_2(\sqrt{C}) = \log_2(C) / 2$.
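
The two definitions above can be sketched in Python as follows, assuming one token per character and the 74-character minimum universe. The function names are ours; the entropy is computed from the empirical token frequencies of the mnemonic.

```python
import math
import string
from collections import Counter


def shannon_entropy(tokens) -> float:
    """Shannon entropy, in bits per token, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # sum of p * log2(1/p) is the same quantity as -sum of p * log2(p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def relative_entropy(mnemonic_tokens, universe) -> float:
    """re(): entropy of the mnemonic divided by log2 of the universe cardinality."""
    return shannon_entropy(mnemonic_tokens) / math.log2(len(set(universe)))


if __name__ == "__main__":
    universe = [c for c in string.printable if c not in string.ascii_uppercase]
    for sample in ("aaaaaaaa", "abcdefghi", "".join(universe)):
        print(repr(sample[:12]), round(relative_entropy(sample, universe), 3))
```

Because `re()` depends only on how often each token occurs, any reordering of the same tokens gets the same score, which is the gap the order-sensitive measure below is meant to close.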

`re()` alone is not a complete measure of password complexity since it does
not take order into account. For instance the string `"abc...xyz"` and its
reverse both have high relative entropy but are highly predictable.

To correct for this we can use the Hamming Distance, `hd()`, which counts the
number of characters that are not in sorted order:

$$
\text{Hamming Distance} = \sum_{i=1}^{n} [s_i \neq t_i]
$$

Here $s$ is the mnemonic, $t$ is $s$ in sorted order, and each term counts 1 when
$s_i$ and $t_i$ differ.

Since undesirable order might be forwards or backwards we take the Relative
Absolute Hamming Distance `rahd()`:

```
rahd() := min(hd(norm_mnemonic), hd(norm_mnemonic.reverse())) / len(norm_mnemonic)
```

As with `re()`, `rahd()` ranges from 0 to 1.
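
A short Python sketch of `hd()` and `rahd()` as defined above, assuming one token per character and the default ordering of Python strings as the sort order (the sort order is not fixed by the text, so this is an assumption).

```python
def hd(tokens) -> int:
    """Count the positions where `tokens` differs from its sorted copy."""
    tokens = list(tokens)
    return sum(s != t for s, t in zip(tokens, sorted(tokens)))


def rahd(tokens) -> float:
    """Smaller of the forward and reversed distances, divided by the length."""
    tokens = list(tokens)
    if not tokens:
        return 0.0  # assumption: an empty mnemonic scores zero
    return min(hd(tokens), hd(tokens[::-1])) / len(tokens)


if __name__ == "__main__":
    for sample in ("abcdefgh", "hgfedcba", "badcfehg"):
        print(sample, rahd(sample))
```

Fully ascending and fully descending inputs both score 0, which is the behavior the `min()` over the two directions is meant to guarantee.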

```
validate := F(re(norm_mnemonic) + rahd(norm_mnemonic)) / 2
```

If `validate()` returns a complexity less than a given threshold (TBD) the wallet
should warn the user.

### TODO: examples of `validate()` for representative inputs

* [ ] Dice, cards, chess, bad + good text passwords
* [ ] Show how this leads to standard dice verification and fingerprinting
across all hardware vendors (phew)


# Reference implementation

* https://github.com/akarve/bipsea

## Example

```sh
bipsea validate -f free -m "$(cat input.txt)" | bipsea xprv
```


# Example steganographic mnemonics in `Free`


## Playing cards

A common deck of 52 cards encodes approximately 225 bits of entropy
($\log_2(52!) \approx 225.6$), more entropy than 21 BIP-39 words.
Such decks can be carried on one's person without raising an eyebrow.

Users might enter cards in deck order as follows:

```
2S 10S kC 10H 5S ...
```
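
The entropy figure for a shuffled deck can be checked with a few lines of Python. The comparison uses the BIP-39 rule that every 3 words carry 32 bits of entropy (the rest of the 11 bits per word is checksum). The function name `bip39_entropy_bits` is ours.

```python
import math

# A uniformly shuffled deck is one of 52! equally likely orderings.
DECK_BITS = math.log2(math.factorial(52))


def bip39_entropy_bits(words: int) -> int:
    """Entropy bits in a BIP-39 mnemonic of 12, 15, 18, 21 or 24 words."""
    return 32 * words // 3


if __name__ == "__main__":
    print(round(DECK_BITS, 2))
    for words in (12, 15, 18, 21, 24):
        print(words, bip39_entropy_bits(words), DECK_BITS > bip39_entropy_bits(words))
```

The figure assumes the deck is shuffled uniformly at random; a poorly shuffled deck holds less entropy than this upper bound.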


## Chess, three different ways


### Fictional move order

A chess board contains 64 totally ordered squares, each of which can be addressed
in algebraic notation of the form `{a-h}{1-8}`. A move specifies one of five piece
types (`R, N, B, Q, K`) followed by a square.
`Nf1` is an example of a single knight move.
A series of 42 chess moves written on an easy-to-repudiate, easy-to-obfuscate
piece of paper encodes at least as much entropy as 24 BIP-39 seed words.

$$ \log_2(69^{42}) \approx 256 $$

Ensuring that such moves comprise a valid chess game (and thus greater steganography)
is a hard problem and is neither required nor recommended in the context of this BIP.
It is not recommended since it constrains the potential entropy in unclear and
hard-to-reckon ways.
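
The arithmetic behind the 42-move figure can be reproduced as below. The formula above uses 69 symbols per move (64 squares plus 5 piece letters). As an alternative reading, which is our assumption and not something the proposal states, each piece-and-square pair can be counted as one token, giving 5 x 64 = 320 possibilities per move. Both counts are shown so the bound can be compared.

```python
import math


def bits(symbols_per_move: int, moves: int) -> float:
    """Entropy in bits of `moves` independent, uniformly random moves."""
    return moves * math.log2(symbols_per_move)


def moves_needed(symbols_per_move: int, target_bits: int = 256) -> int:
    """Smallest number of moves whose entropy reaches `target_bits`."""
    return math.ceil(target_bits / math.log2(symbols_per_move))


if __name__ == "__main__":
    print(round(bits(69, 42), 2), moves_needed(69))        # count used in the formula
    print(round(bits(5 * 64, 42), 2), moves_needed(5 * 64))  # piece-square pairs
```

Under either count, 42 uniformly random moves reach the 256 bits of a 24-word BIP-39 mnemonic.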

### PGN files

Nevertheless, high steganography in a chess game can be achieved with file formats
that support comments, such as the common PGN format. Observe the following snippet
from a PGN file and note the opportunities for arbitrary comments.


```
[Event "Third Rosenwald Trophy"]
[Site "New York, NY USA"]
[Date "1956.10.17"]
[EventDate "1956.10.07"]
[Round "8"]
[Result "0-1"]
[White "Donald Byrne"]
[Black "Robert James Fischer"]
[ECO "D92"]
[WhiteElo "?"]
[BlackElo "?"]
[PlyCount "82"]

1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 O-O 5. Bf4 d5 6. Qb3 dxc4
7. Qxc4 c6 8. e4 Nbd7 9. Rd1 Nb6 10. Qc5 Bg4 11. Bg5 {11. Be2
followed by 12. O-O would have been more prudent. The bishop
move played allows a sudden crescendo of tactical points to be
uncovered by Fischer. -- Wade} Na4 {!}
```


### Marked game boards

Alternatively a user might choose to subtly mark the 64 squares of two chess boards
to represent a 1 or 0 in each of 128 unique positions, storing 128 bits of entropy
(equivalent to 12 BIP-39 seed words). Random bits can be generated with a coin.


## Any board game
One can imagine steganographic secrets similar to chess for Monopoly, Go, or any
board game.


## Dice, but different

We noted above that if the user were to roll and then store 100 dice it would
be impractical to retain the original order.
We observe that there are 21 small writing surfaces (the solid dots on each face)
on a six-sided die. If the user were to inscribe a single random digit into each dot
they would obtain approximately 70 bits of ordered entropy. Three such dice would be
easy to retain and order and provide greater entropy than 18 BIP-39 seed words.
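
The dice figures can be checked with the short calculation below. It assumes each inscribed digit is drawn uniformly from 0 to 9 and each roll is a fair six-sided roll; the constant names are ours.

```python
import math

PIPS_PER_DIE = 1 + 2 + 3 + 4 + 5 + 6           # 21 dots on a six-sided die
BITS_PER_DIE = PIPS_PER_DIE * math.log2(10)    # one random decimal digit per dot
BITS_THREE_DICE = 3 * BITS_PER_DIE
BITS_18_WORDS = 32 * 18 // 3                   # BIP-39: 32 entropy bits per 3 words

# Rolls of a fair die needed to surpass 256 bits, as cited in the Motivation.
ROLLS_FOR_256 = math.floor(256 / math.log2(6)) + 1

if __name__ == "__main__":
    print(round(BITS_PER_DIE, 2), round(BITS_THREE_DICE, 2), BITS_18_WORDS)
    print(ROLLS_FOR_256, round(ROLLS_FOR_256 * math.log2(6), 2))
```

The per-die figure depends on the reading order of the dots being fixed in advance, for example face by face and left to right, which the user would need to settle on before inscribing.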


## A paper napkin

In addition to a literal napkin sketch (with phone numbers, measurements,
or harmless notes) users without access to coins, dice, game boards or electronics
could generate "poor man's entropy" by dropping a stone onto a napkin divided into
equal-sized quadrants.

Said "poor man's entropy" is not recommended but nevertheless illustrates
the vast expansion in capability and steganography that obtains from this BIP.

--
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bitcoindev/de66201a-b281-4a0c-a483-dc2ffd6978b2n%40googlegroups.com.