Return-Path: Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 06EBEC002D for ; Wed, 24 Aug 2022 19:10:23 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id D0144400EC for ; Wed, 24 Aug 2022 19:10:22 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org D0144400EC Authentication-Results: smtp2.osuosl.org; dkim=pass (2048-bit key, unprotected) header.d=notatether.com header.i=@notatether.com header.a=rsa-sha256 header.s=protonmail header.b=GKxZoill X-Virus-Scanned: amavisd-new at osuosl.org X-Spam-Flag: NO X-Spam-Score: -1.6 X-Spam-Level: X-Spam-Status: No, score=-1.6 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, PDS_BTC_ID=0.499, PDS_BTC_MSGID=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=no autolearn_force=no Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5IL-jpzSxxqJ for ; Wed, 24 Aug 2022 19:10:20 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 1082D40017 Received: from mail-4018.proton.ch (mail-4018.proton.ch [185.70.40.18]) by smtp2.osuosl.org (Postfix) with ESMTPS id 1082D40017 for ; Wed, 24 Aug 2022 19:10:19 +0000 (UTC) Date: Wed, 24 Aug 2022 19:10:04 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=notatether.com; s=protonmail; t=1661368217; x=1661627417; bh=rDx+fAJJeSD2oDOZqfgkVSh3KkzElOSXacEjpa7UinM=; h=Date:To:From:Cc:Reply-To:Subject:Message-ID:In-Reply-To: References:Feedback-ID:From:To:Cc:Date:Subject:Reply-To: Feedback-ID:Message-ID; b=GKxZoillYSWxI2A75D+jpSMldMOwjtYxxy0f+O/NJSVNvA3eOShX+eb+qR5Vyt1o/ dy9Mot1I0douBO31qF4nQXYOMl9aYLLnVV98KSvdH7kFfOCwWOa2e6DoN+lUovzye9 qgd47GyT2okzWLGmCYnZpZfPCJ2NY1/Gv7TkP28v/VquWSAr5VjxaWVz0OJrfHXyHT /x2Qniwz0wwj+T9zxBHsBl2Gs2pCe2k/XF1y+/TKh4nAUpMwY55soXHIoUtb6p7+6O hOlNmVGHdk+m2eUCQ49PSybyWy2P+zLEQ15i6qq7wU30H3nq2z6O2BrgzSmMdUkuOa 1zrBfnaWgCpvA== To: craigraw@gmail.com From: Ali Sherief Reply-To: Ali Sherief Message-ID: <20220824190958.gklg3riadci3ttgm@artanis> In-Reply-To: References: Feedback-ID: 34210769:user:proton MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Mailman-Approved-At: Wed, 24 Aug 2022 19:43:52 +0000 Cc: bitcoin-dev@lists.linuxfoundation.org Subject: Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Aug 2022 19:10:23 -0000 Hi Craig, This a really good proposal. I studied your BIP and I have some feedback on= some parts of it. > The first line in the file is a header, and should be ignored on import. From past experience and lessons, most notably BIP39, it is important that = a version byte is defined somewhere in case someone wants to extend it in t= he future, currently there is no version byte which someone can increment i= f somebody wants to extend it. In the unique case of CSV files, you should = make the header line mandatory (I see you have already implied this, but yo= u should make it explicit in the BIP), but instead of a line with columns i= n it, I suggest instead of Reference,Label, you make the format like this: BIP-wallet-labels, Since there are two columns per record, this works out nicely. The first co= lumn can be the name of the BIP - BIPxxxx where the x's are numbers, and th= e second column can be an unsigned 32-bit integer (most significant 8 bits = reserved for version, the remaining for flags, or perhaps the entirety for = version - but I recommend leaving at least some bits for flags, even if the= y all end up being just "reserved"). You should make importing fail if the header line is not exactly as specifi= ed - or appropriate, should you decide a different format for the header. > Files exported should use the .csv file extension. Don't mandate the file extension (read below for why): > In order to reduce file size while retaining wide accessibility, the CSV > file may be compressed using the ZIP file format, using the .zip > file extension. I see three problems with this. The first is more important than the later = two because it makes them moot points, but I'll mention them anyway so you = get a background of the situation: - The BIP is trying to specify in what file format the export format can be= written in onto the filesystem. There is no way to enforce this on a BIP l= evel (besides, Unix operating systems don't even consider the file extensio= n, they use its mimetype). Also specifying this in the BIP will prevent mod= ular "Layer 2" protocols and schemes from encoding the Export labels into a= nother format - for example Base64 or with their own compression algorithm. Now for the two "moot problems": - ZIP does not have good performance or compression ratio, there are better= algorithms out there like gzip (which also happens to be more ubiquitous; = nearly all websites are serving HTML compressed with gzip compression). - ZIP is an archiving format, that happens to have its own compression form= at. Archiving format parsers can have serious vulnerabilities in their impl= ementation that can allow malware to swipe private keys and passwords, sinc= e the primary target for this BIP is wallets. For example, there was Zip Sl= ip[1] in 2018, which allows for remote code execution. So the malware can e= ven hide in memory until private keys or passwords are written to memory, t= hen send them accros the network. Assuming it's targeting a specific wallet= software it's not hard to carry out at all. There's two solutions for all this: 1. The duck-tape solution: Use some compression algorithm like gzip instead= of ZIP archive format. 2. The "throw it out and buy a new one" solution: Get rid of the optional c= ompression specs altogether, because users are responsible for supplying th= e export labels in the first place, so all the compression stuff is redunda= nt and should be left up to the user use if they desire to. I prefer the second solution because it hits the nail at the problem direct= ly instead of putting duck tape on it like the first one. > This .zip file may optionally be encrypted using either AES-128 = or > AES-256 encryption, which is supported by numerous applications including > Winzip and 7-zip. > The textual representation of the wallet's extended public key (as define= d > by BIP32, with an xpub header) should be used as the password. Not specific to AES, but I don't see the benefit of encrypting addresses an= d labels together. Can you please elaborate why this would be desireable? Like I said though, it's better to leave it up to users to decide how to st= ore their exports, since BIPs can't enforce that anyway (additionally, the = password you propose is insecure - anybody with access to the wallet can un= lock it, which is not desireable to some users who want their own security)= . > * Transaction ID (txid) > * Address > * Input (rendered as txid) > * Output (rendered as txid>index or txid:index) Why the need for input and output formats? There is no difference between t= hem on the wallet level, because they are always identified with a txid and= output index. To distinguish between them and hence write them with the co= rrect format would require a UTXO set and thus access to a full node, other= wise the CSV cannot be verified to be completely well-formed. Another important point is that practically nobody labels inputs or outputs= because most people do not know that those things even exist, and the rest= don't bother to label them. But the biggest downside to including them is related to the problem of inf= ormation leaking which you make reference to here: > In both cases, care must be taken when spending to avoid undesirable leak= s > of private information. A CSV dump that has inputs/outputs and addresses mixed together can infer t= he owner of all those items. In fact, A CVS label dump is basically a perso= nal information store so everything in it can be correlated as coming from = the same wallet, so it's important that unnecessary types are kept out of t= he format. People are known to leave files lying around on their computer t= hat they don't need anymore, so these files can find their way via telemetr= y to surveillence entities. While we can't specify what users can do with t= heir exports, we can control the information leak by preventing certain typ= es of items that we know most users will never use from being exported in t= he first place. > The order in which these records appear is not defined. Again, since the primary use case for this BIP is wallets, which likely use= heirarchical derivation schemes like BIP44, there is a net benefit for the= addresses to be exported in ascending order of their `address_type`. It me= ans that wallets can import them in O(n) time as opposed to O(n^2) time spe= nt serially checking in which index the address appears at. Of course, this= implies that all addresses up to a certain index have to be exported into = the CSV as well, but most wallets I know of like Core, Electrum already sto= re addresses like that. Also if you do this, you will need to group all the transaction records bef= ore the address records or vice versa - you can use lexigraphical sorting i= f you want (ie. Addresses before Transactions). The benefit of this separat= ion of parts is that wallets can split the imported address records from th= e transaction records internally, and feed them to separate functions which= set these labels internally. If you decide on doing it this way, then you need a 3rd column to identify = the item type, and also you should quote the label (see below). I strongly = recommend using numbers for identification as opposed to character strings,= so you don't have to worry about localization or character case issues. Th= ere is always one unique number, but there could be multiple strings that r= eference the same type. This will complicate importing functions. If you insist on include Input and Output types then they can both be speci= fied as : if you do this change. They won't be used to determi= ne the type anyway. > The fields may be quoted, but this is unnecessary, as the first comma in > the line will always be the delimiter. Don't implement it like that, because that will break CSV parsers which exp= ect a fixed amount of rows in each record (2 in the header, and some rows h= ave >2 rows). It's better to mandate that they should always be double-quot= ed, since only wallets will generate label exports anyway. If you plan to u= se headers then the 3rd column can be blank for it (or you can split the ve= rsion and flags from each other). > =3D=3DImporting=3D=3D > > When importing, a naive algorithm may simply match against any reference, > but it is possible to disambiguate between transactions, addresses, input= s > and outputs. > For example in the following pseudocode: >
>   if reference length < 64
>     Set address label
>   else if reference length =3D=3D 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> 
The importing code is too naive and in its current form will prevent the BI= P from getting a number. It is perhaps the single most important part of a = BIP. When implementing an importer, it should utilize a dedicate item type = field that unambiguously identifies the item. So the naive importer is not = good, you need use a 3rd column for that like I explained above, so that th= e importer becomes robust. In summary (exclamation marks indicate severity - one means low, two means = medium, and three means high): 1. Convert the header into a version line with optional flags, otherwise no= body can extend this format without compatibility issues (!) 2. Get rid of the specs related to file compression (!!!) 3. Add a 3rd column for item type (address, transaction etc.) preferably as= numeric constants and grouping items of one type after items of another ty= pe, or if you insist on strings, then only recognize their Titlecase ASCII = versions (!!) 4. Require double quotes around the label (or single quotes if you prefer, = as long as spreadsheet software doesn't choke on them) (!!) 5. Require sorting the records according to the order they are stored in th= e wallet implementation. (!) 6. Consider getting rid of Input and Output item types. (!) 7. And last and most importantly, please write a more robust importer algor= ithm in the example given by the BIP, because code in BIPs are frequently u= sed as references for software. (!!!) I hope you will consider these points in future revisions of your BIP. - Ali [1] https://github.com/snyk/zip-slip-vulnerability On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote: > Hi all, > > I would like to propose a BIP that specifies a format for the export and > import of labels from a wallet. While transferring access to funds across > wallet applications has been made simple through standards such as BIP39, > wallet labels remain siloed and difficult to extract despite their value, > particularly in a privacy context. > > The proposed format is a simple two column CSV file, with the reference t= o > a transaction, address, input or output in the first column, and the labe= l > in the second column. CSV was chosen for its wide accessibility, especial= ly > to users without specific technical expertise. Similarly, the CSV file ma= y > be compressed using the ZIP format, and optionally encrypted using AES. > > The full text of the BIP can be found at > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki > and also copied below. > > Feedback is appreciated. > > Thanks, > Craig Raw > > --- > >
>   BIP: wallet-labels
>   Layer: Applications
>   Title: Wallet Labels Export Format
>   Author: Craig Raw 
>   Comments-Summary: No comments yet.
>   Comments-URI:
> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>   Status: Draft
>   Type: Informational
>   Created: 2022-08-23
>   License: BSD-2-Clause
> 
> > =3D=3DAbstract=3D=3D > > This document specifies a format for the export of labels that may be > attached to the transactions, addresses, input and outputs in a wallet. > > =3D=3DCopyright=3D=3D > > This BIP is licensed under the BSD 2-clause license. > > =3D=3DMotivation=3D=3D > > The export and import of funds across different Bitcoin wallet applicatio= ns > is well defined through standards such as BIP39, BIP32, BIP44 etc. > These standards are well supported and allow users to move easily between > different wallets. > There is, however, no defined standard to transfer any labels the user ma= y > have applied to the transactions, addresses, inputs or outputs in their > wallet. > The UTXO model that Bitcoin uses makes these labels particularly valuable > as they may indicate the source of funds, whether received externally or = as > a result of change from a prior transaction. > In both cases, care must be taken when spending to avoid undesirable leak= s > of private information. > Labels provide valuable guidance in this regard, and have even become > mandatory when spending in several Bitcoin wallets. > Allowing users to export their labels in a standardized way ensures that > they do not experience lock-in to a particular wallet application. > In addition, by using common formats, this BIP seeks to make manual or bu= lk > management of labels accessible to users without specific technical > expertise. > > =3D=3DSpecification=3D=3D > > In order to make the import and export of labels as widely accessible as > possible, this BIP uses the comma separated values (CSV) format, which is > widely supported by consumer, business, and scientific applications. > Although the technical specification of CSV in RFC4180 is not always > followed, the application of the format in this BIP is simple enough that > compatibility should not present a problem. > Moreover, the simplicity and forgiving nature of CSV (over for example > JSON) lends itself well to bulk label editing using spreadsheet and text > editing tools. > > A CSV export of labels from a wallet must be a UTF-8 encoded text file, > containing one record per line, with records containing two fields > delimited by a comma. > The fields may be quoted, but this is unnecessary, as the first comma in > the line will always be the delimiter. > The first line in the file is a header, and should be ignored on import. > Thereafter, each line represents a record that refers to a label applied = in > the wallet. > The order in which these records appear is not defined. > > The first field in the record contains a reference to the transaction, > address, input or output in the wallet. > This is specified as one of the following: > * Transaction ID (txid) > * Address > * Input (rendered as txid) > * Output (rendered as txid>index or txid:index) > > The second field contains the label applied to the reference. > Exporting applications may omit records with no labels or labels of zero > length. > Files exported should use the .csv file extension. > > In order to reduce file size while retaining wide accessibility, the CSV > file may be compressed using the ZIP file format, using the .zip > file extension. > This .zip file may optionally be encrypted using either AES-128 = or > AES-256 encryption, which is supported by numerous applications including > Winzip and 7-zip. > In order to ensure that weak encryption does not proliferate, importers > following this standard must refuse to import .zip files encrypt= ed > with the weaker Zip 2.0 standard. > The textual representation of the wallet's extended public key (as define= d > by BIP32, with an xpub header) should be used as the password. > > =3D=3DImporting=3D=3D > > When importing, a naive algorithm may simply match against any reference, > but it is possible to disambiguate between transactions, addresses, input= s > and outputs. > For example in the following pseudocode: >
>   if reference length < 64
>     Set address label
>   else if reference length =3D=3D 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> 
> > Importing applications may truncate labels if necessary. > > =3D=3DTest Vectors=3D=3D > > The following fragment represents a wallet label export: >
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b?,Transac=
tion
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b?<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b?>0,Outpu=
t
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b?:0,Outpu=
t
> (alternative)
> 
> > =3D=3DReference Implementation=3D=3D > > TBD