From: Hugo Nguyen
Date: Tue, 20 Sep 2022 23:07:12 -0700
To: Craig Raw, Bitcoin Protocol Discussion
Cc: Ali Sherief
Subject: Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
References: <20220824190958.gklg3riadci3ttgm@artanis>
List-Id: Bitcoin Protocol Discussion
Hello Craig,

Thank you for putting this proposal together. It is indeed another big
missing piece of the puzzle.

I would like to echo some of the comments already made by others (and by
you yourself) on this thread: this proposal seems to have some inherent
conflicts between the 2 goals it tries to achieve.

> *Allowing users to import and export their labels in a standardized
> way ensures that they do not experience lock-in to a particular wallet
> application. As a secondary goal, by using common formats this BIP
> seeks to make manual or bulk management of labels accessible to users
> outside of wallet applications and without specific technical
> expertise.*

IMHO, these conflicts exist because the first goal is an engineering
requirement, while the second is a UX / product requirement.

Engineering requirements typically prioritize data integrity,
reliability/robustness and performance. Do we want some sort of error
detection / correction codes? What data format would be the most robust
and least error-prone? Is CSV a good fit or not for this purpose? etc.

UX requirements, on the other hand, typically prioritize convenience and
ease of use.

When we don't separate these concerns it can backfire, and we might end
up with a Frankenstein standard that is the worst of both worlds. That
is: not quite robust in engineering terms, but not quite user-friendly
in product terms either.

SLIP-132 is one such example. It tries to solve what are inherently
engineering challenges (how to manage the complexities that arose due to
the evolution of keys and scripts) by sadly offloading those
complexities onto the end users. The end result is user confusion (what
kind of [?]PUB do I need here?) and a nightmare for engineers to
maintain (the complexities are better managed via a high level language
such as Output Descriptors).

Keeping this in mind, I also think having 2 separate BIPs for this is
better.

Cheers,
Hugo

On Mon, Aug 29, 2022 at 4:26 AM Craig Raw via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:

> Thanks for your feedback @Ali.
>
> I am attempting to achieve two goals with this proposal, primarily for
> the benefit of wallet users:
>
> Goal #1. Transfer labels between different wallet implementations
> Goal #2. Manage labels in applications outside of Bitcoin wallets (such
> as Excel)
>
> Much of the feedback so far has indicated the tension between these two
> goals - it may be that it is too difficult to achieve both, in which
> case Goal #1 is the most important. That said, I think further
> exploration is still necessary before abandoning Goal #2, because
> removing it would significantly reduce the value of this proposal and
> mean users need to rely on application-specific workarounds.
>
> > it is important that a version byte is defined
> If Goal #2 is to be achieved it's difficult to mandate this,
> particularly if one requires bit flags to be set. Should an importing
> wallet fail to import if the version byte is not present, even if all
> the data is otherwise correct? Although it is difficult to know in
> advance how a format may be extended, it is certainly possible to
> extend this format with additional types where the nature of hashes
> serves as unique identifiers (more on this below).
>
> > Don't mandate the file extension... There is no way to enforce this
> > on a BIP level.
> I'm not quite sure what you mean here - for example BIP174, which is
> widely used, states "Binary PSBT files should use the .psbt file
> extension." Also, this contradicts Goal #2 - Excel and Numbers register
> as handlers for .csv, and so make it clear that the file is editable
> outside of a wallet.
>
> > ZIP does not have good performance or compression ratio
> Indeed, but it is very widely available. That said, gzip is supported
> widely too these days.
> Unfortunately, gzip does not offer encryption (see next answer).
>
> > ZIP is an archiving format, that happens to have its own compression
> > format.
> I agree this is not ideal. My main reason for choosing ZIP was that it
> supports encryption. It seems to me that without considering
> encryption, an application must create label export files that allow
> privacy-sensitive wallet information to be readable in plain text.
> Being able to transfer labels without risking privacy is IMO valuable.
> I considered other encryption formats such as PGP, but they are much
> more niche and so again contradict Goal #2.
>
> > I don't see the benefit of encrypting addresses and labels
> > together... additionally, the password you propose is insecure -
> > anybody with access to the wallet can unlock it
> I'm not sure I understand your question, but both wallet addresses and
> wallet labels contain privacy-sensitive information that should be
> protected. Wrt the password, there is actually a more fundamental
> problem with using the wallet xpub - there is no equivalent for
> multisig wallets. For this reason I'll remove that requirement in
> future iterations.
>
> > Why the need for input and output formats? There is no difference
> > between them on the wallet level, because they are always identified
> > with a txid and output index.
> The input refers to the txid and the input index (in the set of vin),
> so the difference is the context in which they are displayed. A wallet
> will not necessarily store the spent outputs for a funding transaction
> containing a UTXO coming into the wallet, but it will contain
> references to the inputs as part of that transaction.
>
> > Another important point is that practically nobody labels inputs or
> > outputs
> To the contrary, UTXOs are very frequently labelled, as they link and
> reveal information when spent. Inputs are much less frequently
> labelled, but there is no particular reason to exclude them.
>
> > there is a net benefit for the addresses to be exported in ascending
> > order
> Indeed, but it makes achieving Goal #2 much more difficult for marginal
> benefit.
>
> > It's better to mandate that they should always be double-quoted,
> > since only wallets will generate label exports anyway.
> Rather I think it's better to mandate RFC4180 is followed, as per
> recommendations in other feedback.
>
> > The importing code is too naive... it should utilize a dedicated item
> > type field that unambiguously identifies the item
> It's unclear to me what you mean here. As I've indicated it is
> currently possible to disambiguate between addresses/transactions/etc
> without the need for a 3rd column, but in any case the hash functions
> used ensure that labels will not be associated incorrectly. Even in the
> unlikely event of some future address type being indistinguishable from
> a txid, it will simply not match any txids in the wallet.
>
> Craig
>
> On Wed, Aug 24, 2022 at 9:10 PM Ali Sherief wrote:
>
>> Hi Craig,
>>
>> This is a really good proposal. I studied your BIP and I have some
>> feedback on some parts of it.
>>
>> > The first line in the file is a header, and should be ignored on
>> > import.
>>
>> From past experience and lessons, most notably BIP39, it is important
>> that a version byte is defined somewhere in case someone wants to
>> extend it in the future; currently there is no version byte which
>> someone can increment if somebody wants to extend it. In the unique
>> case of CSV files, you should make the header line mandatory (I see you
>> have already implied this, but you should make it explicit in the BIP),
>> but instead of a line with columns in it, I suggest instead of
>> Reference,Label, you make the format like this:
>>
>> BIP-wallet-labels,
>>
>> Since there are two columns per record, this works out nicely.
>> The first column can be the name of the BIP - BIPxxxx where the x's are
>> numbers, and the second column can be an unsigned 32-bit integer (most
>> significant 8 bits reserved for version, the remaining for flags, or
>> perhaps the entirety for version - but I recommend leaving at least
>> some bits for flags, even if they all end up being just "reserved").
>>
>> You should make importing fail if the header line is not exactly as
>> specified - or as appropriate, should you decide on a different format
>> for the header.
>>
>> > Files exported should use the .csv file extension.
>> Don't mandate the file extension (read below for why):
>>
>> > In order to reduce file size while retaining wide accessibility, the
>> > CSV file may be compressed using the ZIP file format, using the .zip
>> > file extension.
>> I see three problems with this. The first is more important than the
>> latter two because it makes them moot points, but I'll mention them
>> anyway so you get a background of the situation:
>> - The BIP is trying to specify in what file format the export can be
>> written onto the filesystem. There is no way to enforce this on a BIP
>> level (besides, Unix operating systems don't even consider the file
>> extension, they use its mimetype). Also specifying this in the BIP will
>> prevent modular "Layer 2" protocols and schemes from encoding the
>> Export labels into another format - for example Base64 or with their
>> own compression algorithm.
>>
>> Now for the two "moot problems":
>> - ZIP does not have good performance or compression ratio, there are
>> better algorithms out there like gzip (which also happens to be more
>> ubiquitous; nearly all websites are serving HTML compressed with gzip
>> compression).
>> - ZIP is an archiving format, that happens to have its own compression
>> format.
>> Archiving format parsers can have serious vulnerabilities in their
>> implementation that can allow malware to swipe private keys and
>> passwords, since the primary target for this BIP is wallets. For
>> example, there was Zip Slip[1] in 2018, which allows for remote code
>> execution. So the malware can even hide in memory until private keys or
>> passwords are written to memory, then send them across the network.
>> Assuming it's targeting a specific wallet software it's not hard to
>> carry out at all.
>>
>> There are two solutions for all this:
>> 1. The duct-tape solution: Use some compression algorithm like gzip
>> instead of ZIP archive format.
>> 2. The "throw it out and buy a new one" solution: Get rid of the
>> optional compression specs altogether, because users are responsible
>> for supplying the export labels in the first place, so all the
>> compression stuff is redundant and should be left up to the user to use
>> if they desire to.
>>
>> I prefer the second solution because it hits the nail at the problem
>> directly instead of putting duct tape on it like the first one.
>>
>> > This .zip file may optionally be encrypted using either AES-128 or
>> > AES-256 encryption, which is supported by numerous applications
>> > including Winzip and 7-zip.
>> > The textual representation of the wallet's extended public key (as
>> > defined by BIP32, with an xpub header) should be used as the password.
>> Not specific to AES, but I don't see the benefit of encrypting
>> addresses and labels together. Can you please elaborate why this would
>> be desirable?
>>
>> Like I said though, it's better to leave it up to users to decide how
>> to store their exports, since BIPs can't enforce that anyway
>> (additionally, the password you propose is insecure - anybody with
>> access to the wallet can unlock it, which is not desirable to some
>> users who want their own security).
>>
>> > * Transaction ID (txid)
>> > * Address
>> > * Input (rendered as txid<index)
>> > * Output (rendered as txid>index or txid:index)
>> Why the need for input and output formats? There is no difference
>> between them on the wallet level, because they are always identified
>> with a txid and output index. To distinguish between them and hence
>> write them with the correct format would require a UTXO set and thus
>> access to a full node, otherwise the CSV cannot be verified to be
>> completely well-formed.
>>
>> Another important point is that practically nobody labels inputs or
>> outputs because most people do not know that those things even exist,
>> and the rest don't bother to label them.
>>
>> But the biggest downside to including them is related to the problem of
>> information leaking which you make reference to here:
>> > In both cases, care must be taken when spending to avoid undesirable
>> > leaks of private information.
>> A CSV dump that has inputs/outputs and addresses mixed together can be
>> used to infer the owner of all those items. In fact, a CSV label dump
>> is basically a personal information store so everything in it can be
>> correlated as coming from the same wallet, so it's important that
>> unnecessary types are kept out of the format. People are known to leave
>> files lying around on their computer that they don't need anymore, so
>> these files can find their way via telemetry to surveillance entities.
>> While we can't specify what users can do with their exports, we can
>> control the information leak by preventing certain types of items that
>> we know most users will never use from being exported in the first
>> place.
>>
>> > The order in which these records appear is not defined.
>> Again, since the primary use case for this BIP is wallets, which likely
>> use hierarchical derivation schemes like BIP44, there is a net benefit
>> for the addresses to be exported in ascending order of their
>> `address_type`. It means that wallets can import them in O(n) time as
>> opposed to O(n^2) time spent serially checking at which index the
>> address appears. Of course, this implies that all addresses up to a
>> certain index have to be exported into the CSV as well, but most
>> wallets I know of like Core, Electrum already store addresses like
>> that.
>>
>> Also if you do this, you will need to group all the transaction records
>> before the address records or vice versa - you can use lexicographical
>> sorting if you want (i.e. Addresses before Transactions). The benefit
>> of this separation of parts is that wallets can split the imported
>> address records from the transaction records internally, and feed them
>> to separate functions which set these labels internally.
>>
>> If you decide on doing it this way, then you need a 3rd column to
>> identify the item type, and also you should quote the label (see
>> below). I strongly recommend using numbers for identification as
>> opposed to character strings, so you don't have to worry about
>> localization or character case issues. There is always one unique
>> number, but there could be multiple strings that reference the same
>> type. This will complicate importing functions.
>>
>> If you insist on including Input and Output types then they can both be
>> specified as : if you do this change. They won't be used to determine
>> the type anyway.
>>
>> > The fields may be quoted, but this is unnecessary, as the first comma
>> > in the line will always be the delimiter.
>> Don't implement it like that, because that will break CSV parsers which
>> expect a fixed number of fields in each record (2 in the header, and
>> some rows have >2 fields).
>> It's better to mandate that they should always be double-quoted, since
>> only wallets will generate label exports anyway. If you plan to use
>> headers then the 3rd column can be blank for it (or you can split the
>> version and flags from each other).
>>
>> > ==Importing==
>> >
>> > When importing, a naive algorithm may simply match against any
>> > reference, but it is possible to disambiguate between transactions,
>> > addresses, inputs and outputs.
>> > For example in the following pseudocode:
>> > <pre>
>> >   if reference length < 64
>> >     Set address label
>> >   else if reference length == 64
>> >     Set transaction label
>> >   else if reference contains '<'
>> >     Set input label
>> >   else
>> >     Set output label
>> > 
>> The importing code is too naive and in its current form will prevent
>> the BIP from getting a number. It is perhaps the single most important
>> part of a BIP. When implementing an importer, it should utilize a
>> dedicated item type field that unambiguously identifies the item. So
>> the naive importer is not good, you need to use a 3rd column for that
>> like I explained above, so that the importer becomes robust.
>>
>> In summary (exclamation marks indicate severity - one means low, two
>> means medium, and three means high):
>>
>> 1. Convert the header into a version line with optional flags,
>> otherwise nobody can extend this format without compatibility issues
>> (!)
>> 2. Get rid of the specs related to file compression (!!!)
>> 3. Add a 3rd column for item type (address, transaction etc.)
>> preferably as numeric constants and grouping items of one type after
>> items of another type, or if you insist on strings, then only recognize
>> their Titlecase ASCII versions > the words> (!!)
>> 4. Require double quotes around the label (or single quotes if you
>> prefer, as long as spreadsheet software doesn't choke on them) (!!)
>> 5. Require sorting the records according to the order they are stored
>> in the wallet implementation. (!)
>> 6. Consider getting rid of Input and Output item types. (!)
>> 7. And last and most importantly, please write a more robust importer
>> algorithm in the example given by the BIP, because code in BIPs is
>> frequently used as a reference for software. (!!!)
>>
>> I hope you will consider these points in future revisions of your BIP.
>>
>> - Ali
>>
>> [1] https://github.com/snyk/zip-slip-vulnerability
>>
>> On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
>> > Hi all,
>> >
>> > I would like to propose a BIP that specifies a format for the export
>> > and import of labels from a wallet.
>> > While transferring access to funds across wallet applications has
>> > been made simple through standards such as BIP39, wallet labels
>> > remain siloed and difficult to extract despite their value,
>> > particularly in a privacy context.
>> >
>> > The proposed format is a simple two column CSV file, with the
>> > reference to a transaction, address, input or output in the first
>> > column, and the label in the second column. CSV was chosen for its
>> > wide accessibility, especially to users without specific technical
>> > expertise. Similarly, the CSV file may be compressed using the ZIP
>> > format, and optionally encrypted using AES.
>> >
>> > The full text of the BIP can be found at
>> > https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
>> > and also copied below.
>> >
>> > Feedback is appreciated.
>> >
>> > Thanks,
>> > Craig Raw
>> >
>> > ---
>> >
>> > <pre>
>> >   BIP: wallet-labels
>> >   Layer: Applications
>> >   Title: Wallet Labels Export Format
>> >   Author: Craig Raw 
>> >   Comments-Summary: No comments yet.
>> >   Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>> >   Status: Draft
>> >   Type: Informational
>> >   Created: 2022-08-23
>> >   License: BSD-2-Clause
>> > 
>> >
>> > ==Abstract==
>> >
>> > This document specifies a format for the export of labels that may be
>> > attached to the transactions, addresses, inputs and outputs in a
>> > wallet.
>> >
>> > ==Copyright==
>> >
>> > This BIP is licensed under the BSD 2-clause license.
>> >
>> > ==Motivation==
>> >
>> > The export and import of funds across different Bitcoin wallet
>> > applications is well defined through standards such as BIP39, BIP32,
>> > BIP44 etc.
>> > These standards are well supported and allow users to move easily
>> > between different wallets.
>> > There is, however, no defined standard to transfer any labels the user
>> > may have applied to the transactions, addresses, inputs or outputs in
>> > their wallet.
>> > The UTXO model that Bitcoin uses makes these labels particularly
>> > valuable as they may indicate the source of funds, whether received
>> > externally or as a result of change from a prior transaction.
>> > In both cases, care must be taken when spending to avoid undesirable
>> > leaks of private information.
>> > Labels provide valuable guidance in this regard, and have even become
>> > mandatory when spending in several Bitcoin wallets.
>> > Allowing users to export their labels in a standardized way ensures
>> > that they do not experience lock-in to a particular wallet
>> > application.
>> > In addition, by using common formats, this BIP seeks to make manual or
>> > bulk management of labels accessible to users without specific
>> > technical expertise.
>> >
>> > ==Specification==
>> >
>> > In order to make the import and export of labels as widely accessible
>> > as possible, this BIP uses the comma separated values (CSV) format,
>> > which is widely supported by consumer, business, and scientific
>> > applications.
>> > Although the technical specification of CSV in RFC4180 is not always
>> > followed, the application of the format in this BIP is simple enough
>> > that compatibility should not present a problem.
>> > Moreover, the simplicity and forgiving nature of CSV (over for example
>> > JSON) lends itself well to bulk label editing using spreadsheet and
>> > text editing tools.
>> >
>> > A CSV export of labels from a wallet must be a UTF-8 encoded text
>> > file, containing one record per line, with records containing two
>> > fields delimited by a comma.
>> > The fields may be quoted, but this is unnecessary, as the first comma
>> > in the line will always be the delimiter.
>> > The first line in the file is a header, and should be ignored on
>> > import.
>> > Thereafter, each line represents a record that refers to a label
>> > applied in the wallet.
>> > The order in which these records appear is not defined.
>> >
>> > The first field in the record contains a reference to the
>> > transaction, address, input or output in the wallet.
>> > This is specified as one of the following:
>> > * Transaction ID (txid)
>> > * Address
>> > * Input (rendered as txid<index)
>> > * Output (rendered as txid>index or txid:index)
>> >
>> > The second field contains the label applied to the reference.
>> > Exporting applications may omit records with no labels or labels of
>> > zero length.
>> > Files exported should use the .csv file extension.
>> >
>> > In order to reduce file size while retaining wide accessibility, the
>> > CSV file may be compressed using the ZIP file format, using the .zip
>> > file extension.
>> > This .zip file may optionally be encrypted using either AES-128 or
>> > AES-256 encryption, which is supported by numerous applications
>> > including Winzip and 7-zip.
>> > In order to ensure that weak encryption does not proliferate,
>> > importers following this standard must refuse to import .zip files
>> > encrypted with the weaker Zip 2.0 standard.
>> > The textual representation of the wallet's extended public key (as
>> > defined by BIP32, with an xpub header) should be used as the password.
>> >
>> > ==Importing==
>> >
>> > When importing, a naive algorithm may simply match against any
>> > reference, but it is possible to disambiguate between transactions,
>> > addresses, inputs and outputs.
>> > For example in the following pseudocode:
>> > <pre>
>> >   if reference length < 64
>> >     Set address label
>> >   else if reference length == 64
>> >     Set transaction label
>> >   else if reference contains '<'
>> >     Set input label
>> >   else
>> >     Set output label
>> > 
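
Editorial sketch: the draft's pseudocode above translates almost line for line into Python. The function name `classify_reference` and the returned strings are illustrative assumptions, not part of the draft; the sample references are the ones from the draft's test vectors. It only mirrors the naive length-and-separator check that the thread is debating, and does no validation of its own.

```python
def classify_reference(reference: str) -> str:
    """Mirror the draft's pseudocode: guess what a reference points to.

    Hypothetical helper for illustration; the return values are labels
    chosen here, not identifiers defined by the draft.
    """
    if len(reference) < 64:
        return "address"      # anything shorter than a hex txid
    if len(reference) == 64:
        return "transaction"  # a bare 64-character txid
    if "<" in reference:
        return "input"        # txid<index
    return "output"           # txid>index or txid:index


TXID = "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"

for ref in (
    TXID,
    "1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg",
    TXID + "<0",
    TXID + ">0",
    TXID + ":0",
):
    print(ref, "->", classify_reference(ref))
```

Note that the check never confirms that a reference is well formed (for example, that a 64-character string is actually hex), which is the weakness the proposed type column is meant to address.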
>> >
>> > Importing applications may truncate labels if necessary.
>> >
>> > ==Test Vectors==
>> >
>> > The following fragment represents a wallet label export:
>> > <pre>
>> > Reference,Label
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
>> > 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
>> > c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
>> > 
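
Editorial sketch: the quoting question raised in the thread (always double-quote versus simply following RFC4180) can be tried out with Python's standard `csv` module. The helpers `export_labels` and `import_labels` are hypothetical names, and the label containing a comma and quotes is made up for illustration; only the two-column layout, the ignored header line, and the option to omit empty labels come from the draft.

```python
import csv
import io


def export_labels(labels: dict[str, str]) -> str:
    """Write a two-column export with every field double-quoted."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerow(["Reference", "Label"])  # header line, ignored on import
    for reference, label in labels.items():
        if label:  # the draft allows omitting records with empty labels
            writer.writerow([reference, label])
    return buf.getvalue()


def import_labels(text: str) -> dict[str, str]:
    """Read an export back, skipping the header line."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader, None)  # discard the header
    return {row[0]: row[1] for row in reader if len(row) >= 2}


TXID = "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"
labels = {
    TXID: 'Exchange withdrawal, "cold" storage',  # made-up label with , and "
    "1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg": "Address",
}
exported = export_labels(labels)
print(exported)
print(import_labels(exported) == labels)
```

A parser that instead splits on the first comma, as the draft text permits, would also recover an unquoted label with embedded commas, but would need its own handling for the quoted form produced here.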
>> >
>> > ==Reference Implementation==
>> >
>> > TBD
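
Editorial sketch: Ali's header suggestion quoted earlier in this message (an unsigned 32-bit integer whose most significant 8 bits hold a version and whose remaining 24 bits hold flags) could be packed and checked as follows. Everything concrete here is an assumption made for illustration: the header name `BIP-wallet-labels`, the decimal text encoding of the integer, and the function names. The thread does not settle any of these.

```python
HEADER_NAME = "BIP-wallet-labels"  # assumed placeholder; no BIP number exists yet


def pack_header_word(version: int, flags: int = 0) -> int:
    """Combine an 8-bit version and 24-bit flags into one 32-bit word."""
    if not 0 <= version <= 0xFF:
        raise ValueError("version must fit in 8 bits")
    if not 0 <= flags <= 0xFFFFFF:
        raise ValueError("flags must fit in 24 bits")
    return (version << 24) | flags


def parse_header(line: str) -> tuple[int, int]:
    """Return (version, flags) or raise ValueError for any other header."""
    name, _, word_text = line.strip().partition(",")
    if name != HEADER_NAME:
        raise ValueError("unexpected header name")
    word = int(word_text)  # assumes a decimal integer; raises if missing
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("header word must fit in 32 bits")
    return word >> 24, word & 0xFFFFFF


header = f"{HEADER_NAME},{pack_header_word(1, 0b101)}"
print(header)
print(parse_header(header))
```

Under this sketch a plain `Reference,Label` header is rejected outright, which illustrates Craig's objection: a file edited or created in a spreadsheet without the version word would fail to import even if every record were valid.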
Hello Craig,
Thank you for putting this proposal togeth= er. It is indeed another big missing piece of the puzzle.

I would li= ke to echo some of the comments already made by others (and you yourself) o= n this thread, that this proposal seems to have some inherent conflicts bet= ween the 2 goals it tries to achieve.

> Allowing users to impo= rt and export their labels in a standardized way ensures that they do not e= xperience lock-in to a particular wallet application. As a secondary goal, = by using common formats this BIP seeks to make manual or bulk management of= labels accessible to users outside of wallet applications and without spec= ific technical expertise.


IMHO, the reason these conflicts exist= is because the first one is an engineering requirement, while the second o= ne is a UX / product requirement.

Engineering requirements typically= prioritize data=C2=A0integrity, reliability/robustness and performance. Do we w= ant some sort of error detection / correction codes? What data format would= be the most robust and least error-prone? Is CSV a good fit or not for thi= s purpose? etc.

UX requirements, on the other hand, typically= prioritize=C2=A0convenience and ease of use.

When we don=E2=80=99t separate= these concerns it can backfire and we might end up with a Frankenstein sta= ndard that is the worst of both worlds. That is: not quite robust in engine= ering terms, but also not quite user-friendly in product terms either.
<= br>

SLIP-132 is one such examp= le. It tries to solve what are inherently engineering challenges =E2=80=94 = how to manage the complexities that arose due to the evolution of keys and = scripts =E2=80=94 by sadly offloading those complexities onto the end users= . The end result is user confusion (what kind of [?]PUB do I need here?) an= d a nightmare for engineers to maintain (the complexities are better manage= d via a high level language such as Output Descriptors).

Keeping in this mind, I al= so think=C2=A0having 2 separate BIPs for this is better.=C2=A0

C= heers,
Hugo




On Mon, Aug 29, 2022 at 4:26 AM Craig Raw via= bitcoin-dev <b= itcoin-dev@lists.linuxfoundation.org> wrote:
Thanks for your feedba= ck @Ali.

I am attempting to achieve=C2=A0two goals with = this proposal, primarily for the benefit of wallet users:

Goal #1. Transfer labels between different wallet implementations
Goal #2. Manage labels in applications outside=C2=A0of Bitcoin wal= lets (such as Excel)

Much of the feedback so far h= as indicated the tension between these two goals - it may be that it is too= difficult to achieve both, in which case Goal #1 is the most important. Th= at said, I think further exploration is still necessary before abandoning G= oal #2, because removing it would significantly reduce the value of this pr= oposal and mean users need to rely on application-specific workarounds.

> it is important that a version byte is defined
If Goal #2 is to be achieved it's difficult to mandate this, p= articularly if one requires bit flags to be set. Should an importing wallet= fail to import if the version byte is not present, even if all the data is= otherwise correct? Although it is difficult to know in advance how a forma= t may be extended, it is certainly possible to extend this format with addi= tional types where the nature of hashes serve as unique identifiers (more o= n this below).

> Don't mandate the file extension... There is no way to enforce this on a BIP level.
I'm not quite sure what you mean here - for example BIP174, which is widely used, states "Binary PSBT files should use the .psbt file extension." Also, this contradicts Goal #2 - Excel and Numbers register as handlers for .csv, and so make it clear that the file is editable outside of a wallet.

> ZIP does not have good performance or compression ratio
Indeed, but it is very widely available. That said, gzip is supported widely too these days. Unfortunately, gzip does not offer encryption (see next answer).

> ZIP is an archiving format, that happens to have its own compression format.
I agree this is not ideal. My main reason for choosing ZIP was that it supports encryption. It seems to me that without considering encryption, an application must create label export files that allow privacy-sensitive wallet information to be readable in plain text. Being able to transfer labels without risking privacy is IMO valuable. I considered other encryption formats such as PGP, but they are much more niche and so again contradict Goal #2.

> I don't see the benefit of encrypting addresses and labels together... additionally, the password you propose is insecure - anybody with access to the wallet can unlock it
I'm not sure I understand your question, but both wallet addresses and wallet labels contain privacy-sensitive information that should be protected. Wrt the password, there is actually a more fundamental problem with using the wallet xpub - there is no equivalent for multisig wallets. For this reason I'll remove that requirement in future iterations.

> Why the need for input and output formats? There is no difference between them on the wallet level, because they are always identified with a txid and output index.
The input refers to the txid and the input index (in the set of vin), so the difference is the context in which they are displayed. A wallet will not necessarily store the spent outputs for a funding transaction containing a UTXO coming into the wallet, but it will contain references to the inputs as part of that transaction.

> Another important point is that practically nobody labels inputs or outputs
To the contrary, UTXOs are very frequently labelled, as they link and reveal information when spent. Inputs are much less frequently labelled, but there is no particular reason to exclude them.

> there is a net benefit for the addresses to be exported in ascending order
Indeed, but it makes achieving Goal #2 much more difficult for marginal benefit.

> It's better to mandate that they should always be double-quoted, since only wallets will generate label exports anyway.
Rather I think it's better to mandate RFC4180 is followed, as per recommendations in other feedback.

> The importing code is too naive... it should utilize a dedicate item type field that unambiguously identifies the item
It's unclear to me what you mean here. As I've indicated it is currently possible to disambiguate between addresses/transactions/etc without the need for a 3rd column, but in any case the hash functions used ensure that labels will not be associated incorrectly. Even in the unlikely event of some future address type being indistinguishable from a txid, it will simply not match any txids in the wallet.
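
For illustration, the length-based disambiguation from the draft's pseudocode could be written in Python roughly as follows. This is only a sketch: the function name `classify_reference` and the returned strings are assumptions made for the example and are not part of the BIP.

```python
def classify_reference(reference: str) -> str:
    """Classify a reference using the rules from the draft's pseudocode."""
    if len(reference) < 64:
        return "address"      # the draft treats anything shorter than a txid as an address
    if len(reference) == 64:
        return "transaction"  # a txid is 64 hex characters
    if "<" in reference:
        return "input"        # rendered as txid<index
    return "output"           # rendered as txid>index or txid:index


# References taken from the draft's test vectors.
txid = "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"
for ref in (txid, "1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg", txid + "<0", txid + ">0", txid + ":0"):
    print(classify_reference(ref), ref)
```

A reference that is misclassified this way would still have to match an item actually present in the wallet before any label is applied.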

Craig



On Wed, Aug 24, 2022 at 9:10 PM Ali Sherief <ali@notatether.com> wrote:
Hi Craig,

This is a really good proposal. I studied your BIP and I have some feedback on some parts of it.

> The first line in the file is a header, and should be ignored on import.

From past experience and lessons, most notably BIP39, it is important that a version byte is defined somewhere in case someone wants to extend it in the future. Currently there is no version byte that can be incremented if somebody wants to extend the format. In the unique case of CSV files, you should make the header line mandatory (I see you have already implied this, but you should make it explicit in the BIP), but instead of a line with column names in it (Reference,Label), I suggest you make the format like this:
BIP-wallet-labels,<version>

Since there are two columns per record, this works out nicely. The first column can be the name of the BIP - BIPxxxx where the x's are numbers, and the second column can be an unsigned 32-bit integer (most significant 8 bits reserved for version, the remaining for flags, or perhaps the entirety for version - but I recommend leaving at least some bits for flags, even if they all end up being just "reserved").
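
As a rough sketch of the suggested layout (8 most significant bits for the version, the remaining 24 bits for flags), the header value could be packed and parsed in Python as shown here. The function names and the `BIP-wallet-labels` placeholder are assumptions for illustration only; no flag bits are defined anywhere yet.

```python
HEADER_NAME = "BIP-wallet-labels"  # placeholder name; the real BIP number is not assigned


def pack_version_field(version: int, flags: int = 0) -> int:
    """Combine an 8-bit version and 24-bit flags into one unsigned 32-bit value."""
    if not 0 <= version <= 0xFF:
        raise ValueError("version must fit in 8 bits")
    if not 0 <= flags <= 0xFFFFFF:
        raise ValueError("flags must fit in 24 bits")
    return (version << 24) | flags


def unpack_version_field(value: int) -> tuple[int, int]:
    """Split an unsigned 32-bit value into (version, flags)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return value >> 24, value & 0xFFFFFF


def parse_header(line: str) -> tuple[int, int]:
    """Parse a header line such as 'BIP-wallet-labels,16777216' or fail loudly."""
    name, sep, raw = line.strip().partition(",")
    if name != HEADER_NAME or not sep or not (raw.isascii() and raw.isdigit()):
        raise ValueError("unexpected header line")
    return unpack_version_field(int(raw))


print(parse_header(f"{HEADER_NAME},{pack_version_field(1)}"))
```

Writing the value as a decimal integer keeps the header a plain two-column CSV record, so spreadsheet software can still open the file.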

You should make importing fail if the header line is not exactly as specified - or as appropriate, should you decide on a different format for the header.
> Files exported should use the <tt>.csv</tt> file extension.
Don't mandate the file extension (read below for why):

> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
I see three problems with this. The first is more important than the latter two because it makes them moot points, but I'll mention them anyway so you get a background of the situation:
- The BIP is trying to specify in what file format the export can be written onto the filesystem. There is no way to enforce this on a BIP level (besides, Unix operating systems don't even consider the file extension, they use its mimetype). Also specifying this in the BIP will prevent modular "Layer 2" protocols and schemes from encoding the export labels into another format - for example Base64 or with their own compression algorithm.

Now for the two "moot problems":
- ZIP does not have good performance or compression ratio, there are better algorithms out there like gzip (which also happens to be more ubiquitous; nearly all websites are serving HTML compressed with gzip compression).
- ZIP is an archiving format, that happens to have its own compression format. Archiving format parsers can have serious vulnerabilities in their implementation that can allow malware to swipe private keys and passwords, since the primary target for this BIP is wallets. For example, there was Zip Slip[1] in 2018, which allows for remote code execution. So the malware can even hide in memory until private keys or passwords are written to memory, then send them across the network. Assuming it's targeting a specific wallet software it's not hard to carry out at all.

There are two solutions for all this:
1. The duct-tape solution: Use some compression algorithm like gzip instead of the ZIP archive format.
2. The "throw it out and buy a new one" solution: Get rid of the optional compression specs altogether, because users are responsible for supplying the export labels in the first place, so all the compression stuff is redundant and should be left up to the user to use if they desire to.

I prefer the second solution because it addresses the problem directly instead of putting duct tape on it like the first one.

> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
> AES-256 encryption, which is supported by numerous applications including
> Winzip and 7-zip.
> The textual representation of the wallet's extended public key (as defined
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
Not specific to AES, but I don't see the benefit of encrypting addresses and labels together. Can you please elaborate why this would be desirable?

Like I said though, it's better to leave it up to users to decide how to store their exports, since BIPs can't enforce that anyway (additionally, the password you propose is insecure - anybody with access to the wallet can unlock it, which is not desirable to some users who want their own security).

> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
Why the need for input and output formats? There is no difference between them on the wallet level, because they are always identified with a txid and output index. To distinguish between them and hence write them with the correct format would require a UTXO set and thus access to a full node, otherwise the CSV cannot be verified to be completely well-formed.

Another important point is that practically nobody labels inputs or outputs because most people do not know that those things even exist, and the rest don't bother to label them.

But the biggest downside to including them is related to the problem of information leaking which you make reference to here:
> In both cases, care must be taken when spending to avoid undesirable leaks
> of private information.
A CSV dump that has inputs/outputs and addresses mixed together can be used to infer the owner of all those items. In fact, a CSV label dump is basically a personal information store, so everything in it can be correlated as coming from the same wallet, so it's important that unnecessary types are kept out of the format. People are known to leave files lying around on their computer that they don't need anymore, so these files can find their way via telemetry to surveillance entities. While we can't specify what users can do with their exports, we can control the information leak by preventing certain types of items that we know most users will never use from being exported in the first place.

> The order in which these records appear is not defined.
Again, since the primary use case for this BIP is wallets, which likely use hierarchical derivation schemes like BIP44, there is a net benefit for the addresses to be exported in ascending order of their `address_index`. It means that wallets can import them in O(n) time as opposed to O(n^2) time spent serially checking at which index the address appears. Of course, this implies that all addresses up to a certain index have to be exported into the CSV as well, but most wallets I know of, like Core and Electrum, already store addresses like that.

Also if you do this, you will need to group all the transaction records before the address records or vice versa - you can use lexicographical sorting if you want (i.e. Addresses before Transactions). The benefit of this separation of parts is that wallets can split the imported address records from the transaction records internally, and feed them to separate functions which set these labels internally.

If you decide on doing it this way, then you need a 3rd column to identify the item type, and also you should quote the label (see below). I strongly recommend using numbers for identification as opposed to character strings, so you don't have to worry about localization or character case issues. There is always one unique number, but there could be multiple strings that reference the same type. This will complicate importing functions.

If you insist on including Input and Output types then they can both be specified as <txid>:<index> if you make this change. They won't be used to determine the type anyway.

> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
Don't implement it like that, because that will break CSV parsers which expect a fixed number of fields in each record (2 in the header, while a record whose label contains a comma would have more than 2). It's better to mandate that they should always be double-quoted, since only wallets will generate label exports anyway. If you plan to use headers then the 3rd column can be blank for it (or you can split the version and flags from each other).
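
To illustrate why always-quoted fields keep the field count fixed, here is a minimal sketch using Python's standard `csv` module. The function names and the sample label are made up for the example; the `Reference,Label` header is the one from the current draft.

```python
import csv
import io


def export_labels(records):
    """Write (reference, label) pairs with every field double-quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerow(["Reference", "Label"])
    writer.writerows(records)
    return buf.getvalue()


def import_labels(text):
    """Read the export back, insisting on exactly two fields per record."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader)  # skip the header line
    labels = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"line {reader.line_num}: expected 2 fields, got {len(row)}")
        labels.append((row[0], row[1]))
    return labels


# A label containing both a comma and double quotes survives the round trip.
records = [("1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg", 'Rent, "March"')]
print(import_labels(export_labels(records)) == records)
```

Embedded double quotes are escaped by doubling them, which is the convention RFC4180 describes and the `csv` module's default.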

> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, inputs
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
The importing code is too naive and in its current form will prevent the BIP from getting a number. It is perhaps the single most important part of a BIP. When implementing an importer, it should utilize a dedicated item type field that unambiguously identifies the item. So the naive importer is not good; you need to use a 3rd column for that like I explained above, so that the importer becomes robust.
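
A sketch of what such a type-driven importer might look like in Python follows. The numeric constants in `ItemType`, the column order (reference, label, type) and the `Reference,Label,Type` header are all hypothetical choices for illustration; the BIP would have to define them.

```python
import csv
import io
from enum import IntEnum


class ItemType(IntEnum):
    # Hypothetical numbering; nothing here is defined by the draft.
    ADDRESS = 0
    TRANSACTION = 1
    INPUT = 2
    OUTPUT = 3


def import_typed_labels(text):
    """Group labels by the item type given in the 3rd column."""
    labels = {item: {} for item in ItemType}
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader, None)  # skip the header line
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        if len(row) != 3:
            raise ValueError(f"line {reader.line_num}: expected 3 fields")
        reference, label, raw_type = row
        try:
            item = ItemType(int(raw_type))
        except ValueError:
            raise ValueError(f"line {reader.line_num}: unknown item type {raw_type!r}") from None
        labels[item][reference] = label
    return labels


sample = 'Reference,Label,Type\r\n"1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg","Savings","0"\r\n'
print(import_typed_labels(sample)[ItemType.ADDRESS])
```

Because the type comes from its own field, the importer never has to guess from the length or punctuation of the reference, and an unknown type is rejected rather than silently treated as an output.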

In summary (exclamation marks indicate severity - one means low, two means medium, and three means high):

1. Convert the header into a version line with optional flags, otherwise nobody can extend this format without compatibility issues (!)
2. Get rid of the specs related to file compression (!!!)
3. Add a 3rd column for item type (address, transaction etc.), preferably as numeric constants, and group items of one type after items of another type, or if you insist on strings, then only recognize their Titlecase ASCII versions (spreadsheet software like Excel always tries to titlecase the words) (!!)
4. Require double quotes around the label (or single quotes if you prefer, as long as spreadsheet software doesn't choke on them) (!!)
5. Require sorting the records according to the order they are stored in the wallet implementation. (!)
6. Consider getting rid of Input and Output item types. (!)
7. And last and most importantly, please write a more robust importer algorithm in the example given by the BIP, because code in BIPs is frequently used as a reference for software. (!!!)

I hope you will consider these points in future revisions of your BIP.

- Ali

[1] https://github.com/snyk/zip-slip-vulnerability

On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and
> import of labels from a wallet. While transferring access to funds across
> wallet applications has been made simple through standards such as BIP39,
> wallet labels remain siloed and difficult to extract despite their value,
> particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference to
> a transaction, address, input or output in the first column, and the label
> in the second column. CSV was chosen for its wide accessibility, especially
> to users without specific technical expertise. Similarly, the CSV file may
> be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at
> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
>   BIP: wallet-labels
>   Layer: Applications
>   Title: Wallet Labels Export Format
>   Author: Craig Raw <craig@sparrowwallet.com>
>   Comments-Summary: No comments yet.
>   Comments-URI:
> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>   Status: Draft
>   Type: Informational
>   Created: 2022-08-23
>   License: BSD-2-Clause
> </pre>
>
> ==Abstract==
>
> This document specifies a format for the export of labels that may be
> attached to the transactions, addresses, input and outputs in a wallet.
>
> ==Copyright==
>
> This BIP is licensed under the BSD 2-clause license.
>
> ==Motivation==
>
> The export and import of funds across different Bitcoin wallet applications
> is well defined through standards such as BIP39, BIP32, BIP44 etc.
> These standards are well supported and allow users to move easily between
> different wallets.
> There is, however, no defined standard to transfer any labels the user may
> have applied to the transactions, addresses, inputs or outputs in their
> wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable
> as they may indicate the source of funds, whether received externally or as
> a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leaks
> of private information.
> Labels provide valuable guidance in this regard, and have even become
> mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that
> they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or bulk
> management of labels accessible to users without specific technical
> expertise.
>
> ==Specification==
>
> In order to make the import and export of labels as widely accessible as
> possible, this BIP uses the comma separated values (CSV) format, which is
> widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always
> followed, the application of the format in this BIP is simple enough that
> compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example
> JSON) lends itself well to bulk label editing using spreadsheet and text
> editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> containing one record per line, with records containing two fields
> delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied in
> the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction,
> address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero
> length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 or
> AES-256 encryption, which is supported by numerous applications including
> Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers
> following this standard must refuse to import <tt>.zip</tt> files encrypted
> with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as defined
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> ==Importing==
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, inputs
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length == 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> ==Test Vectors==
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> </pre>
>
> ==Reference Implementation==
>
> TBD

--000000000000443a5805e929c2dc--