Date: Sat, 27 Aug 2022 21:26:58 +0000
From: Ali Sherief
To: billy.tetrud@gmail.com
Cc: bitcoin-dev@lists.linuxfoundation.org
Subject: Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
Message-ID: <20220827212652.nup7pscunyxnsnwv@artanis>

> This seems to run contrary with your point about letting users be in
> control of how they store this. Given that you can always connect together
> an output and its address or find the outputs at any address, it doesn't
> seem like it would actually leak any more information than just including
> addresses. Am I missing something?

That's actually true, and coming back to it now it feels more like a
security-through-obscurity suggestion. It's still valid that the export
files will be valuable telemetry, but now I'm starting to feel more
concerned about how inputs and outputs would be represented in the first
place.

Some folks have suggested writing them as descriptors for that purpose[1].
But I see problems with that approach; there are only descriptors for
things like addresses, outputs, derivation paths and so on. I know of no
descriptors for transaction IDs or inputs.

I am actually starting to contemplate whether it's wise to merge Inputs and
Outputs into one classification conveniently called just "Outputs", because
it's impossible to distinguish between them by looking at them (any input
is also an output, but not vice versa). Wise, because I do not know of any
wallet software that labels outputs.

- Ali

[1]: https://bitcointalk.org/index.php?topic=5411159.0

On Sat, 27 Aug 2022 16:03:01 -0500, billy.tetrud@gmail.com wrote:
> @Ali That's some good, well thought through and well articulated
> feedback. I have one point of contention
>
> > it's important that unnecessary types are kept out of the format.
> > People are known to leave files lying around on their computer that
> > they don't need anymore, so these files can find their way via
> > telemetry to surveillance entities. While we can't specify what users
> > can do with their exports, we can control the information leak by
> > preventing certain types of items that we know most users will never
> > use from being exported in the first place.
>
> This seems to run contrary with your point about letting users be in
> control of how they store this. Given that you can always connect together
> an output and its address or find the outputs at any address, it doesn't
> seem like it would actually leak any more information than just including
> addresses. Am I missing something?
>
> On Wed, Aug 24, 2022, 14:44 Ali Sherief via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> > Hi Craig,
> >
> > This is a really good proposal. I studied your BIP and I have some
> > feedback on some parts of it.
> >
> > > The first line in the file is a header, and should be ignored on
> > > import.
> >
> > From past experience and lessons, most notably BIP39, it is important
> > that a version byte is defined somewhere in case someone wants to
> > extend the format in the future. Currently there is no version byte
> > that can be incremented when the format is extended.
> > In the unique case of CSV files, you should make the header line
> > mandatory (I see you have already implied this, but you should make it
> > explicit in the BIP). However, rather than a line of column names
> > (Reference,Label), I suggest you make the format like this:
> >
> > BIP-wallet-labels,
> >
> > Since there are two columns per record, this works out nicely. The first
> > column can be the name of the BIP - BIPxxxx where the x's are numbers,
> > and the second column can be an unsigned 32-bit integer (most
> > significant 8 bits reserved for version, the remaining for flags, or
> > perhaps the entirety for version - but I recommend leaving at least some
> > bits for flags, even if they all end up being just "reserved").
> >
> > You should make importing fail if the header line is not exactly as
> > specified - or as appropriate, should you decide on a different format
> > for the header.
> >
> > > Files exported should use the .csv file extension.
> > Don't mandate the file extension (read below for why):
> >
> > > In order to reduce file size while retaining wide accessibility, the
> > > CSV file may be compressed using the ZIP file format, using the .zip
> > > file extension.
> > I see three problems with this. The first is more important than the
> > latter two because it makes them moot points, but I'll mention them
> > anyway so you get a background of the situation:
> > - The BIP is trying to specify the file format in which the export is
> > written to the filesystem. There is no way to enforce this on a BIP
> > level (besides, Unix-like systems don't rely on the file extension;
> > tools there identify a file by its content or MIME type). Also,
> > specifying this in the BIP will prevent modular "Layer 2" protocols and
> > schemes from encoding the exported labels into another format - for
> > example Base64 or with their own compression algorithm.
> >
> > Now for the two "moot problems":
> > - ZIP does not have good performance or compression ratio, there are
> > better algorithms out there like gzip (which also happens to be more
> > ubiquitous; nearly all websites are serving HTML compressed with gzip
> > compression).
> > - ZIP is an archiving format that happens to have its own compression
> > format. Archiving format parsers can have serious vulnerabilities in
> > their implementation that can allow malware to swipe private keys and
> > passwords, since the primary target for this BIP is wallets. For
> > example, there was Zip Slip[1] in 2018, which allows for remote code
> > execution. So the malware can even hide in memory until private keys or
> > passwords are written to memory, then send them across the network.
> > Assuming it's targeting a specific wallet software it's not hard to
> > carry out at all.
> >
> > There are two solutions for all this:
> > 1. The duct-tape solution: Use some compression algorithm like gzip
> > instead of the ZIP archive format.
> > 2. The "throw it out and buy a new one" solution: Get rid of the
> > optional compression specs altogether, because users are responsible
> > for supplying the export labels in the first place, so all the
> > compression stuff is redundant and should be left up to the user to use
> > if they desire to.
> >
> > I prefer the second solution because it hits the nail at the problem
> > directly instead of putting duct tape on it like the first one.
> >
> > > This .zip file may optionally be encrypted using either AES-128 or
> > > AES-256 encryption, which is supported by numerous applications
> > > including Winzip and 7-zip.
> > > The textual representation of the wallet's extended public key (as
> > > defined by BIP32, with an xpub header) should be used as the password.
> > Not specific to AES, but I don't see the benefit of encrypting addresses
> > and labels together. Can you please elaborate why this would be
> > desirable?
> >
> > Like I said though, it's better to leave it up to users to decide how to
> > store their exports, since BIPs can't enforce that anyway (additionally,
> > the password you propose is insecure - anybody with access to the wallet
> > can unlock it, which is not desirable to some users who want their own
> > security).
> >
> > > * Transaction ID (txid)
> > > * Address
> > > * Input (rendered as txid<index)
> > > * Output (rendered as txid>index or txid:index)
> > Why the need for input and output formats? There is no difference
> > between them on the wallet level, because they are always identified
> > with a txid and output index. To distinguish between them and hence
> > write them with the correct format would require a UTXO set and thus
> > access to a full node, otherwise the CSV cannot be verified to be
> > completely well-formed.
> >
> > Another important point is that practically nobody labels inputs or
> > outputs because most people do not know that those things even exist,
> > and the rest don't bother to label them.
> >
> > But the biggest downside to including them is related to the problem of
> > information leaking which you make reference to here:
> > > In both cases, care must be taken when spending to avoid undesirable
> > > leaks of private information.
> > A CSV dump that has inputs/outputs and addresses mixed together can be
> > used to infer the owner of all those items. In fact, a CSV label dump is
> > basically a personal information store so everything in it can be
> > correlated as coming from the same wallet, so it's important that
> > unnecessary types are kept out of the format. People are known to leave
> > files lying around on their computer that they don't need anymore, so
> > these files can find their way via telemetry to surveillance entities.
> > While we can't specify what users can do with their exports, we can
> > control the information leak by preventing certain types of items that
> > we know most users will never use from being exported in the first
> > place.
> >
> > > The order in which these records appear is not defined.
> > Again, since the primary use case for this BIP is wallets, which likely
> > use hierarchical derivation schemes like BIP44, there is a net benefit
> > for the addresses to be exported in ascending order of their
> > `address_index`. It means that wallets can import them in O(n) time as
> > opposed to O(n^2) time spent serially checking at which index each
> > address appears. Of course, this implies that all addresses up to a
> > certain index have to be exported into the CSV as well, but most wallets
> > I know of, like Core and Electrum, already store addresses like that.
> >
> > Also if you do this, you will need to group all the transaction records
> > before the address records or vice versa - you can use lexicographical
> > sorting if you want (i.e. Addresses before Transactions). The benefit of
> > this separation of parts is that wallets can split the imported address
> > records from the transaction records internally, and feed them to
> > separate functions which set these labels internally.
> >
> > If you decide on doing it this way, then you need a 3rd column to
> > identify the item type, and also you should quote the label (see below).
> > I strongly recommend using numbers for identification as opposed to
> > character strings, so you don't have to worry about localization or
> > character case issues. There is always one unique number, but there
> > could be multiple strings that reference the same type, which would
> > complicate importing functions.
> >
> > If you insist on including Input and Output types then they can both be
> > specified as txid:index if you make this change. They won't be used to
> > determine the type anyway.
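As a rough illustration of the three-column layout suggested above, here is a minimal Python sketch of an exporter. The numeric type codes, the function name `export_labels` and the column order (reference, label, type) are assumptions made for this example; the thread does not fix any of them.

```python
import csv
import io

# Hypothetical numeric type codes; the thread does not assign any values.
TYPE_ADDRESS = 0
TYPE_TRANSACTION = 1
TYPE_OUTPUT = 2


def export_labels(records):
    """Serialize (reference, label, type_code) tuples as fully quoted CSV.

    Records are grouped by type code. Python's sort is stable, so the
    caller's order (for example ascending address index) is preserved
    within each group.
    """
    buf = io.StringIO()
    # QUOTE_ALL double-quotes every field, so a comma inside a label
    # never changes the number of fields a parser sees.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for reference, label, type_code in sorted(records, key=lambda r: r[2]):
        writer.writerow([reference, label, type_code])
    return buf.getvalue()
```

Because every field is quoted, a label such as `Rent, August` survives a round trip through a standard CSV parser without being split into an extra field.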
> >
> > > The fields may be quoted, but this is unnecessary, as the first comma
> > > in the line will always be the delimiter.
> > Don't implement it like that, because that will break CSV parsers which
> > expect a fixed number of fields in each record (2 in the header, while
> > records whose label contains a comma would appear to have more than 2).
> > It's better to mandate that fields should always be double-quoted, since
> > only wallets will generate label exports anyway. If you plan to use
> > headers then the 3rd column can be blank for it (or you can split the
> > version and flags from each other).
> >
> > > ==Importing==
> > >
> > > When importing, a naive algorithm may simply match against any
> > > reference, but it is possible to disambiguate between transactions,
> > > addresses, inputs and outputs.
> > > For example in the following pseudocode:
> > > <pre>
> > >   if reference length < 64
> > >     Set address label
> > >   else if reference length == 64
> > >     Set transaction label
> > >   else if reference contains '<'
> > >     Set input label
> > >   else
> > >     Set output label
> > > 
> > The importing code is too naive and in its current form will prevent the
> > BIP from getting a number. It is perhaps the single most important part
> > of a BIP. When implementing an importer, it should utilize a dedicated
> > item type field that unambiguously identifies the item. So the naive
> > importer is not good; you need to use a 3rd column for that like I
> > explained above, so that the importer becomes robust.
> >
> > In summary (exclamation marks indicate severity - one means low, two
> > means medium, and three means high):
> >
> > 1. Convert the header into a version line with optional flags, otherwise
> > nobody can extend this format without compatibility issues (!)
> > 2. Get rid of the specs related to file compression (!!!)
> > 3. Add a 3rd column for item type (address, transaction etc.),
> > preferably as numeric constants, and group items of one type after
> > items of another type, or if you insist on strings, then only recognize
> > their Titlecase ASCII versions <... the words> (!!)
> > 4. Require double quotes around the label (or single quotes if you
> > prefer, as long as spreadsheet software doesn't choke on them) (!!)
> > 5. Require sorting the records according to the order they are stored in
> > the wallet implementation. (!)
> > 6. Consider getting rid of Input and Output item types. (!)
> > 7. And last and most importantly, please write a more robust importer
> > algorithm in the example given by the BIP, because code in BIPs is
> > frequently used as a reference for software. (!!!)
> >
> > I hope you will consider these points in future revisions of your BIP.
> >
> > - Ali
> >
> > [1] https://github.com/snyk/zip-slip-vulnerability
> >
> > On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
> > > Hi all,
> > >
> > > I would like to propose a BIP that specifies a format for the export
> > > and import of labels from a wallet.
> > > While transferring access to funds across wallet applications has
> > > been made simple through standards such as BIP39, wallet labels remain
> > > siloed and difficult to extract despite their value, particularly in a
> > > privacy context.
> > >
> > > The proposed format is a simple two column CSV file, with the
> > > reference to a transaction, address, input or output in the first
> > > column, and the label in the second column. CSV was chosen for its
> > > wide accessibility, especially to users without specific technical
> > > expertise. Similarly, the CSV file may be compressed using the ZIP
> > > format, and optionally encrypted using AES.
> > >
> > > The full text of the BIP can be found at
> > > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> > > and also copied below.
> > >
> > > Feedback is appreciated.
> > >
> > > Thanks,
> > > Craig Raw
> > >
> > > ---
> > >
> > > <pre>
> > >   BIP: wallet-labels
> > >   Layer: Applications
> > >   Title: Wallet Labels Export Format
> > >   Author: Craig Raw 
> > >   Comments-Summary: No comments yet.
> > >   Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> > >   Status: Draft
> > >   Type: Informational
> > >   Created: 2022-08-23
> > >   License: BSD-2-Clause
> > > 
> > >
> > > ==Abstract==
> > >
> > > This document specifies a format for the export of labels that may be
> > > attached to the transactions, addresses, inputs and outputs in a
> > > wallet.
> > >
> > > ==Copyright==
> > >
> > > This BIP is licensed under the BSD 2-clause license.
> > >
> > > ==Motivation==
> > >
> > > The export and import of funds across different Bitcoin wallet
> > > applications is well defined through standards such as BIP39, BIP32,
> > > BIP44 etc.
> > > These standards are well supported and allow users to move easily
> > > between different wallets.
> > > There is, however, no defined standard to transfer any labels the user
> > > may have applied to the transactions, addresses, inputs or outputs in
> > > their wallet.
> > > The UTXO model that Bitcoin uses makes these labels particularly
> > > valuable as they may indicate the source of funds, whether received
> > > externally or as a result of change from a prior transaction.
> > > In both cases, care must be taken when spending to avoid undesirable
> > > leaks of private information.
> > > Labels provide valuable guidance in this regard, and have even become
> > > mandatory when spending in several Bitcoin wallets.
> > > Allowing users to export their labels in a standardized way ensures
> > > that they do not experience lock-in to a particular wallet application.
> > > In addition, by using common formats, this BIP seeks to make manual or
> > > bulk management of labels accessible to users without specific
> > > technical expertise.
> > >
> > > ==Specification==
> > >
> > > In order to make the import and export of labels as widely accessible
> > > as possible, this BIP uses the comma separated values (CSV) format,
> > > which is widely supported by consumer, business, and scientific
> > > applications.
> > > Although the technical specification of CSV in RFC4180 is not always
> > > followed, the application of the format in this BIP is simple enough
> > > that compatibility should not present a problem.
> > > Moreover, the simplicity and forgiving nature of CSV (over for example
> > > JSON) lends itself well to bulk label editing using spreadsheet and
> > > text editing tools.
> > >
> > > A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> > > containing one record per line, with records containing two fields
> > > delimited by a comma.
> > > The fields may be quoted, but this is unnecessary, as the first comma
> > > in the line will always be the delimiter.
> > > The first line in the file is a header, and should be ignored on
> > > import.
> > > Thereafter, each line represents a record that refers to a label
> > > applied in the wallet.
> > > The order in which these records appear is not defined.
> > >
> > > The first field in the record contains a reference to the transaction,
> > > address, input or output in the wallet.
> > > This is specified as one of the following:
> > > * Transaction ID (txid)
> > > * Address
> > > * Input (rendered as txid<index)
> > > * Output (rendered as txid>index or txid:index)
> > >
> > > The second field contains the label applied to the reference.
> > > Exporting applications may omit records with no labels or labels of
> > > zero length.
> > > Files exported should use the .csv file extension.
> > >
> > > In order to reduce file size while retaining wide accessibility, the
> > > CSV file may be compressed using the ZIP file format, using the .zip
> > > file extension.
> > > This .zip file may optionally be encrypted using either AES-128 or
> > > AES-256 encryption, which is supported by numerous applications
> > > including Winzip and 7-zip.
> > > In order to ensure that weak encryption does not proliferate,
> > > importers following this standard must refuse to import .zip files
> > > encrypted with the weaker Zip 2.0 standard.
> > > The textual representation of the wallet's extended public key (as
> > > defined by BIP32, with an xpub header) should be used as the password.
> > >
> > > ==Importing==
> > >
> > > When importing, a naive algorithm may simply match against any
> > > reference, but it is possible to disambiguate between transactions,
> > > addresses, inputs and outputs.
> > > For example in the following pseudocode:
> > > <pre>
> > >   if reference length < 64
> > >     Set address label
> > >   else if reference length == 64
> > >     Set transaction label
> > >   else if reference contains '<'
> > >     Set input label
> > >   else
> > >     Set output label
> > > 
> > >
> > > Importing applications may truncate labels if necessary.
> > >
> > > ==Test Vectors==
> > >
> > > The following fragment represents a wallet label export:
> > > <pre>
> > > Reference,Label
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> > > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> > > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> > > 
> > >
> > > ==Reference Implementation==
> > >
> > > TBD
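
To make the review points in this thread concrete (a strict version header, a third column for the item type, quoted fields), here is one possible importer sketched in Python. It is not the BIP's reference implementation. The header name comes from the suggestion earlier in the thread, while the version number, the type codes and the function names are hypothetical.

```python
import csv
import io

HEADER_NAME = "BIP-wallet-labels"  # header text suggested in the thread
SUPPORTED_VERSION = 1              # hypothetical; no version is defined yet
# Hypothetical numeric type codes; the thread does not assign any values.
ITEM_TYPES = {0: "address", 1: "transaction", 2: "output"}


def parse_header(row):
    """Split the 32-bit header value into (version, flags).

    The top 8 bits are read as the version and the low 24 bits as flags.
    """
    if len(row) < 2 or row[0] != HEADER_NAME or any(row[2:]):
        raise ValueError("unrecognized header line")
    value = int(row[1])
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("header value is not an unsigned 32-bit integer")
    return value >> 24, value & 0xFFFFFF


def import_labels(text):
    """Return {item type name: {reference: label}} or raise ValueError."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows, None)
    if header is None:
        raise ValueError("empty file")
    version, _flags = parse_header(header)
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported version {version}")
    labels = {name: {} for name in ITEM_TYPES.values()}
    for line_no, row in enumerate(rows, start=2):
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields")
        reference, label, type_code = row
        if not type_code.isdigit() or int(type_code) not in ITEM_TYPES:
            raise ValueError(f"line {line_no}: unknown item type")
        # The type column decides where the label goes; the reference
        # text is never inspected to guess the type.
        labels[ITEM_TYPES[int(type_code)]][reference] = label
    return labels
```

With these assumptions, a file for version 1 with no flags would start with the line `BIP-wallet-labels,16777216` (that is, 1 shifted left by 24 bits), and a file that still begins with `Reference,Label` would be rejected rather than silently imported.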