summaryrefslogtreecommitdiff
path: root/54/5e6410bb98004a2d2d4f1271ebc7c4376b3802
blob: 5880147ba4865a79c9cae0738441f5475019745c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
Return-Path: <ali@notatether.com>
Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133])
 by lists.linuxfoundation.org (Postfix) with ESMTP id 06EBEC002D
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Wed, 24 Aug 2022 19:10:23 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by smtp2.osuosl.org (Postfix) with ESMTP id D0144400EC
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Wed, 24 Aug 2022 19:10:22 +0000 (UTC)
DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org D0144400EC
Authentication-Results: smtp2.osuosl.org; dkim=pass (2048-bit key,
 unprotected) header.d=notatether.com header.i=@notatether.com
 header.a=rsa-sha256 header.s=protonmail header.b=GKxZoill
X-Virus-Scanned: amavisd-new at osuosl.org
X-Spam-Flag: NO
X-Spam-Score: -1.6
X-Spam-Level: 
X-Spam-Status: No, score=-1.6 tagged_above=-999 required=5
 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
 DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, PDS_BTC_ID=0.499,
 PDS_BTC_MSGID=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001]
 autolearn=no autolearn_force=no
Received: from smtp2.osuosl.org ([127.0.0.1])
 by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 5IL-jpzSxxqJ
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Wed, 24 Aug 2022 19:10:20 +0000 (UTC)
X-Greylist: from auto-whitelisted by SQLgrey-1.8.0
DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 1082D40017
Received: from mail-4018.proton.ch (mail-4018.proton.ch [185.70.40.18])
 by smtp2.osuosl.org (Postfix) with ESMTPS id 1082D40017
 for <bitcoin-dev@lists.linuxfoundation.org>;
 Wed, 24 Aug 2022 19:10:19 +0000 (UTC)
Date: Wed, 24 Aug 2022 19:10:04 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=notatether.com;
 s=protonmail; t=1661368217; x=1661627417;
 bh=rDx+fAJJeSD2oDOZqfgkVSh3KkzElOSXacEjpa7UinM=;
 h=Date:To:From:Cc:Reply-To:Subject:Message-ID:In-Reply-To:
 References:Feedback-ID:From:To:Cc:Date:Subject:Reply-To:
 Feedback-ID:Message-ID;
 b=GKxZoillYSWxI2A75D+jpSMldMOwjtYxxy0f+O/NJSVNvA3eOShX+eb+qR5Vyt1o/
 dy9Mot1I0douBO31qF4nQXYOMl9aYLLnVV98KSvdH7kFfOCwWOa2e6DoN+lUovzye9
 qgd47GyT2okzWLGmCYnZpZfPCJ2NY1/Gv7TkP28v/VquWSAr5VjxaWVz0OJrfHXyHT
 /x2Qniwz0wwj+T9zxBHsBl2Gs2pCe2k/XF1y+/TKh4nAUpMwY55soXHIoUtb6p7+6O
 hOlNmVGHdk+m2eUCQ49PSybyWy2P+zLEQ15i6qq7wU30H3nq2z6O2BrgzSmMdUkuOa
 1zrBfnaWgCpvA==
To: craigraw@gmail.com
From: Ali Sherief <ali@notatether.com>
Reply-To: Ali Sherief <ali@notatether.com>
Message-ID: <20220824190958.gklg3riadci3ttgm@artanis>
In-Reply-To: <mailman.9.1661342403.3868.bitcoin-dev@lists.linuxfoundation.org>
References: <mailman.9.1661342403.3868.bitcoin-dev@lists.linuxfoundation.org>
Feedback-ID: 34210769:user:proton
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Mailman-Approved-At: Wed, 24 Aug 2022 19:43:52 +0000
Cc: bitcoin-dev@lists.linuxfoundation.org
Subject: Re: [bitcoin-dev] BIP Proposal: Wallet Labels Export Format
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>, 
 <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Wed, 24 Aug 2022 19:10:23 -0000

Hi Craig,

This a really good proposal. I studied your BIP and I have some feedback on=
 some parts of it.

> The first line in the file is a header, and should be ignored on import.

From past experience and lessons, most notably BIP39, it is important that =
a version byte is defined somewhere in case someone wants to extend it in t=
he future, currently there is no version byte which someone can increment i=
f somebody wants to extend it. In the unique case of CSV files, you should =
make the header line mandatory (I see you have already implied this, but yo=
u should make it explicit in the BIP), but instead of a line with columns i=
n it, I suggest instead of Reference,Label, you make the format like this:

BIP-wallet-labels,<version>

Since there are two columns per record, this works out nicely. The first co=
lumn can be the name of the BIP - BIPxxxx where the x's are numbers, and th=
e second column can be an unsigned 32-bit integer (most significant 8 bits =
reserved for version, the remaining for flags, or perhaps the entirety for =
version - but I recommend leaving at least some bits for flags, even if the=
y all end up being just "reserved").

You should make importing fail if the header line is not exactly as specifi=
ed - or appropriate, should you decide a different format for the header.

> Files exported should use the <tt>.csv</tt> file extension.
Don't mandate the file extension (read below for why):

> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
I see three problems with this. The first is more important than the later =
two because it makes them moot points, but I'll mention them anyway so you =
get a background of the situation:
- The BIP is trying to specify in what file format the export format can be=
 written in onto the filesystem. There is no way to enforce this on a BIP l=
evel (besides, Unix operating systems don't even consider the file extensio=
n, they use its mimetype). Also specifying this in the BIP will prevent mod=
ular "Layer 2" protocols and schemes from encoding the Export labels into a=
nother format - for example Base64 or with their own compression algorithm.

Now for the two "moot problems":
- ZIP does not have good performance or compression ratio, there are better=
 algorithms out there like gzip (which also happens to be more ubiquitous; =
nearly all websites are serving HTML compressed with gzip compression).
- ZIP is an archiving format, that happens to have its own compression form=
at. Archiving format parsers can have serious vulnerabilities in their impl=
ementation that can allow malware to swipe private keys and passwords, sinc=
e the primary target for this BIP is wallets. For example, there was Zip Sl=
ip[1] in 2018, which allows for remote code execution. So the malware can e=
ven hide in memory until private keys or passwords are written to memory, t=
hen send them accros the network. Assuming it's targeting a specific wallet=
 software it's not hard to carry out at all.

There's two solutions for all this:
1. The duck-tape solution: Use some compression algorithm like gzip instead=
 of ZIP archive format.
2. The "throw it out and buy a new one" solution: Get rid of the optional c=
ompression specs altogether, because users are responsible for supplying th=
e export labels in the first place, so all the compression stuff is redunda=
nt and should be left up to the user use if they desire to.

I prefer the second solution because it hits the nail at the problem direct=
ly instead of putting duck tape on it like the first one.

> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 =
or
> AES-256 encryption, which is supported by numerous applications including
> Winzip and 7-zip.
> The textual representation of the wallet's extended public key (as define=
d
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
Not specific to AES, but I don't see the benefit of encrypting addresses an=
d labels together. Can you please elaborate why this would be desireable?

Like I said though, it's better to leave it up to users to decide how to st=
ore their exports, since BIPs can't enforce that anyway (additionally, the =
password you propose is insecure - anybody with access to the wallet can un=
lock it, which is not desireable to some users who want their own security)=
.

> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
Why the need for input and output formats? There is no difference between t=
hem on the wallet level, because they are always identified with a txid and=
 output index. To distinguish between them and hence write them with the co=
rrect format would require a UTXO set and thus access to a full node, other=
wise the CSV cannot be verified to be completely well-formed.

Another important point is that practically nobody labels inputs or outputs=
 because most people do not know that those things even exist, and the rest=
 don't bother to label them.

But the biggest downside to including them is related to the problem of inf=
ormation leaking which you make reference to here:
> In both cases, care must be taken when spending to avoid undesirable leak=
s
> of private information.
A CSV dump that has inputs/outputs and addresses mixed together can infer t=
he owner of all those items. In fact, A CVS label dump is basically a perso=
nal information store so everything in it can be correlated as coming from =
the same wallet, so it's important that unnecessary types are kept out of t=
he format. People are known to leave files lying around on their computer t=
hat they don't need anymore, so these files can find their way via telemetr=
y to surveillence entities. While we can't specify what users can do with t=
heir exports, we can control the information leak by preventing certain typ=
es of items that we know most users will never use from being exported in t=
he first place.

> The order in which these records appear is not defined.
Again, since the primary use case for this BIP is wallets, which likely use=
 heirarchical derivation schemes like BIP44, there is a net benefit for the=
 addresses to be exported in ascending order of their `address_type`. It me=
ans that wallets can import them in O(n) time as opposed to O(n^2) time spe=
nt serially checking in which index the address appears at. Of course, this=
 implies that all addresses up to a certain index have to be exported into =
the CSV as well, but most wallets I know of like Core, Electrum already sto=
re addresses like that.

Also if you do this, you will need to group all the transaction records bef=
ore the address records or vice versa - you can use lexigraphical sorting i=
f you want (ie. Addresses before Transactions). The benefit of this separat=
ion of parts is that wallets can split the imported address records from th=
e transaction records internally, and feed them to separate functions which=
 set these labels internally.

If you decide on doing it this way, then you need a 3rd column to identify =
the item type, and also you should quote the label (see below). I strongly =
recommend using numbers for identification as opposed to character strings,=
 so you don't have to worry about localization or character case issues. Th=
ere is always one unique number, but there could be multiple strings that r=
eference the same type. This will complicate importing functions.

If you insist on include Input and Output types then they can both be speci=
fied as <txid>:<index> if you do this change. They won't be used to determi=
ne the type anyway.

> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
Don't implement it like that, because that will break CSV parsers which exp=
ect a fixed amount of rows in each record (2 in the header, and some rows h=
ave >2 rows). It's better to mandate that they should always be double-quot=
ed, since only wallets will generate label exports anyway. If you plan to u=
se headers then the 3rd column can be blank for it (or you can split the ve=
rsion and flags from each other).

> =3D=3DImporting=3D=3D
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, input=
s
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length =3D=3D 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
The importing code is too naive and in its current form will prevent the BI=
P from getting a number. It is perhaps the single most important part of a =
BIP. When implementing an importer, it should utilize a dedicate item type =
field that unambiguously identifies the item. So the naive importer is not =
good, you need use a 3rd column for that like I explained above, so that th=
e importer becomes robust.

In summary (exclamation marks indicate severity - one means low, two means =
medium, and three means high):

1. Convert the header into a version line with optional flags, otherwise no=
body can extend this format without compatibility issues (!)
2. Get rid of the specs related to file compression (!!!)
3. Add a 3rd column for item type (address, transaction etc.) preferably as=
 numeric constants and grouping items of one type after items of another ty=
pe, or if you insist on strings, then only recognize their Titlecase ASCII =
versions <spreadsheet software like Excel always tries to titlecase the wor=
ds> (!!)
4. Require double quotes around the label (or single quotes if you prefer, =
as long as spreadsheet software doesn't choke on them) (!!)
5. Require sorting the records according to the order they are stored in th=
e wallet implementation. (!)
6. Consider getting rid of Input and Output item types. (!)
7. And last and most importantly, please write a more robust importer algor=
ithm in the example given by the BIP, because code in BIPs are frequently u=
sed as references for software. (!!!)

I hope you will consider these points in future revisions of your BIP.

- Ali

[1] https://github.com/snyk/zip-slip-vulnerability

On Wed, 24 Aug 2022 11:18:43 +0200, craigraw@gmail.com wrote:
> Hi all,
>
> I would like to propose a BIP that specifies a format for the export and
> import of labels from a wallet. While transferring access to funds across
> wallet applications has been made simple through standards such as BIP39,
> wallet labels remain siloed and difficult to extract despite their value,
> particularly in a privacy context.
>
> The proposed format is a simple two column CSV file, with the reference t=
o
> a transaction, address, input or output in the first column, and the labe=
l
> in the second column. CSV was chosen for its wide accessibility, especial=
ly
> to users without specific technical expertise. Similarly, the CSV file ma=
y
> be compressed using the ZIP format, and optionally encrypted using AES.
>
> The full text of the BIP can be found at
> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> and also copied below.
>
> Feedback is appreciated.
>
> Thanks,
> Craig Raw
>
> ---
>
> <pre>
>   BIP: wallet-labels
>   Layer: Applications
>   Title: Wallet Labels Export Format
>   Author: Craig Raw <craig@sparrowwallet.com>
>   Comments-Summary: No comments yet.
>   Comments-URI:
> https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
>   Status: Draft
>   Type: Informational
>   Created: 2022-08-23
>   License: BSD-2-Clause
> </pre>
>
> =3D=3DAbstract=3D=3D
>
> This document specifies a format for the export of labels that may be
> attached to the transactions, addresses, input and outputs in a wallet.
>
> =3D=3DCopyright=3D=3D
>
> This BIP is licensed under the BSD 2-clause license.
>
> =3D=3DMotivation=3D=3D
>
> The export and import of funds across different Bitcoin wallet applicatio=
ns
> is well defined through standards such as BIP39, BIP32, BIP44 etc.
> These standards are well supported and allow users to move easily between
> different wallets.
> There is, however, no defined standard to transfer any labels the user ma=
y
> have applied to the transactions, addresses, inputs or outputs in their
> wallet.
> The UTXO model that Bitcoin uses makes these labels particularly valuable
> as they may indicate the source of funds, whether received externally or =
as
> a result of change from a prior transaction.
> In both cases, care must be taken when spending to avoid undesirable leak=
s
> of private information.
> Labels provide valuable guidance in this regard, and have even become
> mandatory when spending in several Bitcoin wallets.
> Allowing users to export their labels in a standardized way ensures that
> they do not experience lock-in to a particular wallet application.
> In addition, by using common formats, this BIP seeks to make manual or bu=
lk
> management of labels accessible to users without specific technical
> expertise.
>
> =3D=3DSpecification=3D=3D
>
> In order to make the import and export of labels as widely accessible as
> possible, this BIP uses the comma separated values (CSV) format, which is
> widely supported by consumer, business, and scientific applications.
> Although the technical specification of CSV in RFC4180 is not always
> followed, the application of the format in this BIP is simple enough that
> compatibility should not present a problem.
> Moreover, the simplicity and forgiving nature of CSV (over for example
> JSON) lends itself well to bulk label editing using spreadsheet and text
> editing tools.
>
> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> containing one record per line, with records containing two fields
> delimited by a comma.
> The fields may be quoted, but this is unnecessary, as the first comma in
> the line will always be the delimiter.
> The first line in the file is a header, and should be ignored on import.
> Thereafter, each line represents a record that refers to a label applied =
in
> the wallet.
> The order in which these records appear is not defined.
>
> The first field in the record contains a reference to the transaction,
> address, input or output in the wallet.
> This is specified as one of the following:
> * Transaction ID (<tt>txid</tt>)
> * Address
> * Input (rendered as <tt>txid<index</tt>)
> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
>
> The second field contains the label applied to the reference.
> Exporting applications may omit records with no labels or labels of zero
> length.
> Files exported should use the <tt>.csv</tt> file extension.
>
> In order to reduce file size while retaining wide accessibility, the CSV
> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> file extension.
> This <tt>.zip</tt> file may optionally be encrypted using either AES-128 =
or
> AES-256 encryption, which is supported by numerous applications including
> Winzip and 7-zip.
> In order to ensure that weak encryption does not proliferate, importers
> following this standard must refuse to import <tt>.zip</tt> files encrypt=
ed
> with the weaker Zip 2.0 standard.
> The textual representation of the wallet's extended public key (as define=
d
> by BIP32, with an <tt>xpub</tt> header) should be used as the password.
>
> =3D=3DImporting=3D=3D
>
> When importing, a naive algorithm may simply match against any reference,
> but it is possible to disambiguate between transactions, addresses, input=
s
> and outputs.
> For example in the following pseudocode:
> <pre>
>   if reference length < 64
>     Set address label
>   else if reference length =3D=3D 64
>     Set transaction label
>   else if reference contains '<'
>     Set input label
>   else
>     Set output label
> </pre>
>
> Importing applications may truncate labels if necessary.
>
> =3D=3DTest Vectors=3D=3D
>
> The following fragment represents a wallet label export:
> <pre>
> Reference,Label
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b?,Transac=
tion
> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b?<0,Input
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b?>0,Outpu=
t
> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b?:0,Outpu=
t
> (alternative)
> </pre>
>
> =3D=3DReference Implementation=3D=3D
>
> TBD