summaryrefslogtreecommitdiff
path: root/16/77c5a2e28ea1fc521774a580eac0b547eed28f
blob: 1d35177b32832d71375c82546629132c950cfcdd (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
Return-Path: <roconnor@blockstream.io>
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id C347C92B
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 22 May 2017 07:06:13 +0000 (UTC)
X-Greylist: whitelisted by SQLgrey-1.7.6
Received: from mail-vk0-f54.google.com (mail-vk0-f54.google.com
	[209.85.213.54])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 2CC36AD
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 22 May 2017 07:06:11 +0000 (UTC)
Received: by mail-vk0-f54.google.com with SMTP id y190so36008277vkc.1
	for <bitcoin-dev@lists.linuxfoundation.org>;
	Mon, 22 May 2017 00:06:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=blockstream-io.20150623.gappssmtp.com; s=20150623;
	h=mime-version:from:date:message-id:subject:to;
	bh=WdAKGZh14K850D6Mm6InMTzHe/BTCo13i9XDPyZLSik=;
	b=M8PxhY/7e01FQqTZrxxnm3gcLb83uPrBf6SOtyYYrgLxOW/UBGUYqx2tj88OmTXqXa
	6jaCCUeOjeWv1wYulTWH5MMZx5jqRC9hi1i/ppF2f5ss0+kzhneZeRkbnXtUskhHA2Tc
	S52DsPMy18YoN2L8MVeNBh6TGuqKfuZlKeiTnwtCqh1jU832/hUvcpCQm11O8KCyGgmQ
	OUugqPMt4hNhWW/70uBcmFq+4dcXXrad/6TiuEDGN5SoSoMrIZVl9H8h+RaykxlAQ2Xs
	BbbZGRIfjqmcE5UpXsxZ2tsmRfDeWDa/8J3JGvf9TaYpWrFzGVkATPMdKtdh9SSQxJBb
	tAig==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
	bh=WdAKGZh14K850D6Mm6InMTzHe/BTCo13i9XDPyZLSik=;
	b=D+gq+UkLuUzIFR8FvKXTpdxGpgBjmSUj9RMrTsVP3Q/1kjUHcZqezZJPhkkThVBT1h
	TKevmTjm96h0oKpHbG/q40VkFd1uD2fiRSrMo7/KUKCXEylnFjwEEUT9E2V0/AjPeqF3
	c1Y9B4sHWNaEO3pTrK1YmMmqGDLk0QYcmVOg8+s3ZVEreApAHvVFrFWKjUn5mOCJPWX7
	qHwGQAvqiYAVK7gZe2ZcAljWGJ6W+LJDxAe9DAwWRTdJ3K773o7qM91XJdP7jOvGYv3S
	UQUyUM3ri6rJ7cgrBqyO+ugQnsEpJtb3kxiEffIjB5GrODLGon3WAbp61vHevbEHaMRY
	97aA==
X-Gm-Message-State: AODbwcC7zstcpCqzfuHNqHukJMfPlY+Reg1OMulvy7DVf6Cr4SZ9BQWa
	SYkvcs9nxPKqme+PTBzqa/Qbab5D6eyEcyVmNA==
X-Received: by 10.31.135.18 with SMTP id j18mr7624003vkd.134.1495436769658;
	Mon, 22 May 2017 00:06:09 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.176.95.90 with HTTP; Mon, 22 May 2017 00:05:49 -0700 (PDT)
From: "Russell O'Connor" <roconnor@blockstream.io>
Date: Mon, 22 May 2017 03:05:49 -0400
Message-ID: <CAMZUoK=f3hXHkqJBDfiLGSrgXi_ppgyH6+XWD9W54EYFWLm1+Q@mail.gmail.com>
To: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Content-Type: multipart/alternative; boundary="001a1145853470a2a60550178117"
X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, HTML_MESSAGE, RCVD_IN_DNSWL_NONE,
	RCVD_IN_SORBS_SPAM autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Subject: [bitcoin-dev] A Method for Computing Merkle Roots of Annotated
	Binary Trees
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion <bitcoin-dev.lists.linuxfoundation.org>
List-Unsubscribe: <https://lists.linuxfoundation.org/mailman/options/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=unsubscribe>
List-Archive: <http://lists.linuxfoundation.org/pipermail/bitcoin-dev/>
List-Post: <mailto:bitcoin-dev@lists.linuxfoundation.org>
List-Help: <mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=help>
List-Subscribe: <https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev>,
	<mailto:bitcoin-dev-request@lists.linuxfoundation.org?subject=subscribe>
X-List-Received-Date: Mon, 22 May 2017 07:06:13 -0000

--001a1145853470a2a60550178117
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

## Introduction
This document aims to specify and justify a method for computing Merkle
roots for tree structures whose nodes are annotated with other data.  While
this proposal could be used to replace Bitcoin's Merkle root calculation,
it is primarily aimed at new applications such as MAST, (U)TXO commitments
or other Merklized structures.

## Background
Bitcoin uses a Merkle tree construction to build a commitment to a sequence
of transactions for a Bitcoin block.  The main operation for computing a
Merkle tree is one that takes the recursively computed Merkle roots of two
branches and combines them to compute the Merkle root of the tree with
those two branches.  In Bitcoin, a Merkle root is 256 bits and the
construction is

    MerkleRoot :=3D SHA256(SHA256(LeftRoot =E2=8B=85 RightRoot))

One problem with this construction is that it is unnecessarily expensive.
While the concatenation of the LeftRoot and the RightRoot fits in 512 bits,
the size of a SHA256 chunk, a second chunk is needed to fit SHA256's
padding.  This means that inner SHA256 call invokes the SHA256 compression
function twice, once for each chunk.  The outer SHA256 call invokes the
SHA256 compression function a third time.  Bitcoin's Merkle root procedure
calls the SHA256 compression function a total of three times per node.

The main purpose of composing two calls to the SHA256 hash function is to
protect against length extension attacks.  A length extension attack is
where an attacker has access to the hash of a message `HASH(msg)` and,
without knowing `msg`, is able to construct the hash of a new message
`HASH(msg =E2=8B=85 attackerMsg)`.  This is attack is usually used against =
poorly
constructed MACs (Message Authentication Codes).  In the case of Bitcoin
there are no secret messages that are hashed, so the outer call to SHA256
is unnecessary.

As mentioned above, the inner call to SHA256 requires two chunks because of
SHA256's padding.  For the MerkleRoot construction added padding value is
always


0x8000000000000000000000000000000000000000000000000000000000000000000000000=
0000000000000000000
000000000000000000000000000000000200

because the input is of a fixed length.  SHA256's padding function is
designed

1. to add bits to the message so that the resulting message size is a
multiple of the chunk size (which is 512-bits in the case of SHA256).
2. to satisfy the Merkle--Damg=C3=A5rd property which says that if we can f=
ind a
hash collision in SHA256, which operates on variable length messages, then
we can find a hash collision in SHA256's compression function, which
operates on fixed length messages.

We could remove the second call to the SHA256 compression function and use
SHA256 without padding.  If we did, we would lose the Merkle--Damg=C3=A5rd
property.  While this may be acceptable for some use cases, we are still
left with no room to add annotation data held at a node.  For Bitcoin's
Merkle tree this is not a problem, because its trees are not annotated.
However, for other applications, we would need to add another chunk to hold
the annotation, which would mean adding back the second call to the SHA256
compression function per node of the tree.

## A New Merkle Root Algorithm
Below is an algorithm for computing Merkle roots that directly uses the
SHA256 compression function.  This algorithm operates on binary trees and
supports per node annotations.  Furthermore this proposal satisfies the
Merkle--Damg=C3=A5rd property and more.

#### Remark 1
Any finitely branching tree can be represented by a binary tree, so this
construction also applies to arbitrary finitely branching trees.

The SHA256 compression function takes two inputs:

1. A 256-bit value for the previous chunk in a chain, or an initial vector
in the case of the first chunk.
2. A 512-bit chunk of data.

    sha256Compress : Word256 =C3=97 Word512 -> Word256

In total, the SHA256 compression function compresses 768-bits into
256-bits.  The Merkle roots of two branches occupy 512 bits, and this
leaves another 256-bits of space available for tags.

Let us define a recursive type for binary trees annotated with tags.

    type Tree tag where
      Leaf   : tag                       -> Tree tag
      Unary  : tag =C3=97 Tree tag            -> Tree tag
      Binary : tag =C3=97 Tree tag =C3=97 Tree tag -> Tree tag

We define a recursive algorithm for trees whose tags are 256-bit Words (or
whose tags can be represented by 256-bit words).

    merkleRoot : Tree Word256 -> Word256
    merkleRoot (Leaf (t))                :=3D
sha256Compress(sha256(ApplicationName),
0b0^256 =E2=8B=85 t)
    merkleRoot (Unary (t, child))        :=3D sha256Compress(merkleRoot(chi=
ld),
0b0^256 =E2=8B=85 t)
    merkleRoot (Binary (t, left, right)) :=3D sha256Compress(merkleRoot(lef=
t),
merkleRoot(right) =E2=8B=85 t)

We need some initial value for the `Leaf` case.  This could be taken as the
initial vector for SHA256.  However, I suggest using the hash of an
application specific name.

    ApplicationName : BitString

We further require that the tags dictate the kind of node it is attached
to.  For example, if a given tag is used for Binary nodes, then it can only
appear in other Binary nodes.  One way this can be accomplished is by
requiring the first two bits of a tag follow a particular scheme (the
scheme below ensures that one of the first two bits is set, which will be
useful later when we define safe tags) such as

- 0b11 when the node is a Leaf node.
- 0b10 when the node is a Unary node.
- 0b01 when the node is a Binary node.

This condition suffices to provide the Merkle--Damg=C3=A5rd property:

#### Theorem 2
Suppose `t_1` and `t_2` are two different `Tree Word256` values such that
any tag occurring in the two trees only occurs in one kind of node.  If
`merkleRoot(t_1) =3D merkleRoot(t_2)` then we can effectively find a
collision in the SHA256 compression function.

Proof.
We proceed by structural induction on `t_1`.  Assume `t_1` and `t_2` are
two different `Tree Word256` values such that `merkleRoot(t_1) =3D
merkleRoot(t_2)`.  Define a `tag` function to extract the tag name from the
root of a tree.

    tag : Tree Word256 -> Word256
    tag (Leaf (t))         :=3D t
    tag (Unary (t, _))     :=3D t
    tag (Binary (t, _, _)) :=3D t

Case 1: Suppose `tag(t_1) =3D/=3D tag(t_2)`.
This means that the inputs to `sha256Compress` that produced
`merkleRoot(t_1)` and `merkleRoot(t_2)` are different and we have found a
collision in the SHA256 compression function.

Case 2: Suppose `tag(t_1) =3D tag(t_2)`.
Let `tg` be this tag.  By our requirement on tags, this means that `t_1`
and `t_2` are the same kind of node.  Suppose `t_1 =3D Binary (tg, t_(1l),
t_(1r))` and `t_2 =3D Binary (tg, t_(2l), t_(2r))`.  Since `t_1` and `t_2`
are different, then either `t_(1l) =3D/=3D t_(2l)` or `t_(1r) =3D/=3D t_(2r=
)`.
Without loss of generality, assume `t_(1l) =3D/=3D t_(2l)`.

Case 2a: Suppose `merkleRoot(t_(1l)) =3D/=3D merkleRoot(t_(2l))`.
This means that the inputs to `sha256Compress` that produced
`merkleRoot(t_1)` and `merkleRoot(t_2)` are different and we have found a
collision in the SHA256 compression function.

Case 2b: Suppose `merkleRoot(t_(1l)) =3D merkleRoot(t_(2l))`.
Then by induction we can find a collision in the SHA256 compression
function.

The cases where `t_1` and `t_2` are both `Unary` or `Leaf` nodes are
handled in a similar way.  The reader can verify the algorithm implied by
the above proof provides an effective method of finding a collision in the
SHA256 compression function.  QED

#### Remark 3
We have filled half of the chunk with `0b0^256` in for the `Unary` and
`Leaf `cases in the definition of `merkleRoot`.  The proof of Theorem 2
does not depend on this, and if we have extra ancillary data for these
types of nodes it can replace the `0b0^256` in this first half of the
chunk.  This is particularly useful for the `Leaf `case because `Leaf`
nodes often hold extra data.  We also remark that if this data is too large
to be represented by a `Word256`, it can instead be hashed with SHA256 and
the hash included instead.

Not all of the inputs to the SHA256 compression function are created
equal.  Only the second argument, the chunk data, is applied to the SHA256
expander.  `merkleRoot` is designed to ensure that the first argument of
the SHA256 compression function is only fed some output of the SHA256
compression function.  In fact, we can prove that the output of the
`merkleRoot` function is always the midstate of some SHA256 hash.  To see
this, let us explicitly separate the `sha256` function into the padding
step, `sha256Pad`, and the recursive hashing step, `unpaddedSha256`.

    sha256Pad : BitString -> List Word512

    sha256IV : Word256

    sha256Loop : Word256 =C3=97 List Word512 -> Word256
    sha256Loop (prev, Nil)                :=3D prev
    sha256Loop (prev, Cons (chunk, rest)) :=3D sha256Loop(sha256Compress(pr=
ev,
chunk), rest)

    unpaddedSha256 : List Word512 -> Word256
    unpaddedSha256 (chunks) :=3D sha256Loop(sha256IV, chunks)

    sha256 : BitString -> Word256
    sha256 (s) :=3D unpaddedSha256(sha256Pad(s))

Now we can state what we mean when we say that the output of the
`merkleRoot` function is always the midstate of some SHA256 hash.

#### Theorem 4
For all `t : Tree Word256` there exists some `l : List Word512` such that
`merkleRoot(t) =3D unpaddedSha256(l)`.

Proof.
We construct a function to transform a tree into the required list of
chunks.

    merkleChain : Tree Word256 -> List Word512
    merkleChain (Leaf (t))                :=3D sha256Pad(ApplicationName) +
[0b0^256 =E2=8B=85 t]
    merkleChain (Unary (t, child))        :=3D merkleChain(child) + [0b0^25=
6
=E2=8B=85 t]
    merkleChain (Binary (t, left, right)) :=3D merkleChain(left) +
[merkleRoot(right) =E2=8B=85 t]

The reader can verify by induction that

    forall t : Tree Word256. merkleRoot(t) =3D unpaddedSha256(merkleChain(t=
)).

QED

### Large Tag Space

If one's application cannot directly represent the space of tags with a
`Word256`, then one can hash the tags and still maintain the
Merkle--Damg=C3=A5rd property given by Theorem 2, as long as we still have =
the
property that each tag uniquely determines the kind of node it applies to.

We first note that `Tree` is a functor.

    treeMap : (a -> b) -> Tree a -> Tree b
    treeMap f (Leaf (t))                :=3D Leaf (f(t))
    treeMap f (Unary (t, child))        :=3D Unary (f(t), child)
    treeMap f (Binary (t, left, right)) :=3D Binary (f(t), left, right)

We can hash the tags of a tree.

    hash256Tags : Tree BitString -> Tree Word256
    hash256Tags (t) :=3D treeMap sha256 t

Now we can compute the `merkleRoot` of the result of `hash256Tags`.

#### Theorem 5
Suppose `t_1` and `t_2` are two different `Tree BitString` values such that
any tag occurring in the two trees only occurs in one kind of node.  If
`merkleRoot(hash256Tags(t_1)) =3D merkleRoot(hash256Tags(t_2))`, then we ca=
n
effectively find a collision in the SHA256 compression function.

Proof.
Assume `t_1` and `t_2` are two different `Tree BitString` values such that
`merkleRoot(hash256Tags(t_1)) =3D merkleRoot(hash256Tags(t_2))`.

Case 1: Suppose `hash256Tags(t_1) =3D hash256Tags(t_2)`.
This means there must be a collision in the `sha256` function.  By the
Merkle--Damg=C3=A5rd property of SHA256, this means we can effectively find=
 a
collision in the SHA256 compression function.

Case 2: Suppose `hash256Tags(t_1) =3D/=3D hash256Tags(t_2)`.
If the SHA256 hashes of each tag in the two trees uniquely determines the
kinds of their nodes, then we can apply Theorem 2 to conclude that we can
effectively find a collision in the SHA256 compression function.  If the
SHA256 hashes of each tag in the two trees does not uniquely determine the
kinds of their nodes then there are two different tags among the two trees,
`tg_1` and `tg_2`, such that `sha256(tg_1) =3D sha256(tg_2)`.  Hence there =
is
a collision in the `sha256` function, and therefore we can effectively find
a collision in the SHA256 compression function.  QED

## Avoiding collisions between merkleRoot and sha256
For the moment, let us assume there is no effective way to find a collision
in the SHA256 compression function.  By the Merkle--Damg=C3=A5rd property o=
f
SHA256, we cannot effectively find a collision within the sha256 function.
We have shown, by Theorem 2, that merkleRoot has the same Merkle--Damg=C3=
=A5rd
property and hence we cannot effectively find a collision within the
merkleRoot function.  However, we may have collisions between the `sha256`
and the `merkleRoot` functions.  That is, we may be able to effectively
find values `s : BitString` and `t : Tree Word256` such that `sha256(s) =3D
merkleRoot(t)`.

While this may not be a problem for most applications, it turns out that we
can place restrictions on the format of tags to ensure that there are never
collisions between `sha256` and `merkleRoot`.  Define a safe tag of type
`Word256` to be a a value such that one the first 192 bits is set and the
9th last bit is cleared.

#### Theorem 6
Let `t : Tree Word256` be a tree in which every tag is a safe tag.  Suppose
we have some `s : BitString` such that `sha256(s) =3D merkleRoot(t)`.  Then
we can effectively find a collision in the SHA256 compression function.

Proof.
The last chunk of `sha256Padding(s)` is a padding chunk as defined by the
SHA256 standard.  If we look at the second half of this last chunk, then
according to the padding rules, it can never be the case that one of the
first 192 bits is set and the 9^th last bit is cleared.  Because `t` only
contains safe tags, the inputs to their last compression function in the
computation of `sha256(s)` and `merkleRoot(t)` must be different.
Therefore if `sha256(s) =3D merkleRoot(t)` then we must have encountered a
collision in the final compression function of the two computations.  QED

#### Remark 7
If we use safe tags we can consider a modified `merkleRoot'` function.

    merkleRoot' : Tree Word256 -> Word256
    merkleRoot' (Leaf (t))                :=3D
sha256Compress(sha256(ApplicationName),
sha256("") =E2=8B=85 t)
    merkleRoot' (Unary (t, child))        :=3D sha256Compress(merkleRoot'(c=
hild),
sha256("") =E2=8B=85 t)
    merkleRoot' (Binary (t, left, right)) :=3D sha256Compress(merkleRoot'(l=
eft),
merkleRoot'(right) =E2=8B=85 t)

If we use `merkleRoot'` we no longer require that tags uniquely define
their node types in order to ensure the Merkle--Damg=C3=A5rd property.  Ins=
tead
we can rely on the safe tags to ensure that the use of `sha256(s)` will
never collide with `merkleRoot'(t)`, and therefore nodes of different types
will never collide with each other the `merkleRoot'` computation.  However,
I don't feel this is a particularly good approach, so I will not elaborate
on it further.

Unfortunately, there is no guarantee that the result of `hash256Tags` will
consist of only safe tags, so we cannot use it to avoid collisions between
`sha256` and `merkleRoot =E2=8B=85 hash256Tags`.  To remedy this we define =
an
alternative to `hash256Tags`.

    hash224Tag : BitString -> Word256
    hash224Tag (s) :=3D 0xFFFF =E2=8B=85 sha224(s) =E2=8B=85 0x0000

    hash224Tags : Tree BitString -> Tree Word256
    hash224Tags (t) :=3D treeMap hash224Tag t

We see that by appropriately padding the result of `sha224` we guarantee
that `hash224Tag(s)` is a safe tag.

#### Theorem 8
Suppose `t_1` and `t_2` are two different `Tree BitString` values such that
any tag occurring in the two trees only occurs in one kind of node.  If
`merkleRoot(hash224Tags(t_1)) =3D merkleRoot(hash224Tags(t_2))`, then we ca=
n
effectively find either a collision in the SHA256 compression function, or
a collision in SHA224.

#### Remark 9
We can safely replace the `0xFFFF` prefix in `hash224Tag` with one where
the first two bits determine the kind of node in accordance with our
previously defined scheme.

#### Remark 10
Rather than using SHA224 we could use SHA256 and tweak it afterwards to set
the first bit and clear the 9th last bit. This comes at the cost of
effectively using a non-standard hash function.

#### Remark 11
Most of this development not depend specifically on SHA256.  It all works
similarly for SHA512 and other hash functions that compress in a similar
manner.  SHA256 is used as the primary example because one can find
hardware support for it on commodity hardware (for example, see the Intel
SHA extensions).

## Conclusion
We have defined a merkleRoot function for computing Merkle roots of trees
that include annotations per node.  The resulting function uses one SHA256
compression function call per node and supports arbitrary 256-bit
annotations per node.  Arbitrary annotations can be supported at the cost
of more hashing.  We have also shown that by using safe tags we can
additionally get a property that the `merkleRoot` will not effectively
collide with `sha256` function.

## Further Reading
The article [Characterizing Padding Rules of MD Hash Functions Preserving
Collision Security](https://eprint.iacr.org/2009/325.pdf) by Mridul Nandi
provides a nice overview of various options for padding rules.

--001a1145853470a2a60550178117
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">## Introduction<br>This document aims to specify and justi=
fy a method for computing Merkle roots for tree structures whose nodes are =
annotated with other data.=C2=A0 While this proposal could be used to repla=
ce Bitcoin&#39;s Merkle root calculation, it is primarily aimed at new appl=
ications such as MAST, (U)TXO commitments or other Merklized structures.<br=
><br>## Background<br>Bitcoin uses a Merkle tree construction to build a co=
mmitment to a sequence of transactions for a Bitcoin block.=C2=A0 The main =
operation for computing a Merkle tree is one that takes the recursively com=
puted Merkle roots of two branches and combines them to compute the Merkle =
root of the tree with those two branches.=C2=A0 In Bitcoin, a Merkle root i=
s 256 bits and the construction is<br><br><span style=3D"font-family:monosp=
ace,monospace">=C2=A0=C2=A0=C2=A0 MerkleRoot :=3D SHA256(SHA256(LeftRoot =
=E2=8B=85 RightRoot))<br></span><br>One problem with this construction is t=
hat it is unnecessarily expensive.=C2=A0 While the concatenation of the Lef=
tRoot and the RightRoot fits in 512 bits, the size of a SHA256 chunk, a sec=
ond chunk is needed to fit SHA256&#39;s padding.=C2=A0 This means that inne=
r SHA256 call invokes the SHA256 compression function twice, once for each =
chunk.=C2=A0 The outer SHA256 call invokes the SHA256 compression function =
a third time.=C2=A0 Bitcoin&#39;s Merkle root procedure calls the SHA256 co=
mpression function a total of three times per node.<br><br>The main purpose=
 of composing two calls to the SHA256 hash function is to protect against l=
ength extension attacks.=C2=A0 A length extension attack is where an attack=
er has access to the hash of a message `HASH(msg)` and, without knowing `ms=
g`, is able to construct the hash of a new message `HASH(msg =E2=8B=85 atta=
ckerMsg)`.=C2=A0 This is attack is usually used against poorly constructed =
MACs (Message Authentication Codes).=C2=A0 In the case of Bitcoin there are=
 no secret messages that are hashed, so the outer call to SHA256 is unneces=
sary.<br><br>As mentioned above, the inner call to SHA256 requires two chun=
ks because of SHA256&#39;s padding.=C2=A0 For the MerkleRoot construction a=
dded padding value is always<br><br><span style=3D"font-family:monospace,mo=
nospace">=C2=A0=C2=A0=C2=A0 0x800000000000000000000000000000000000000000000=
00000000000000000000000000000000000000000000000<wbr>00000000000000000000000=
0000000<wbr>000200<br><br></span>because the input is of a fixed length.=C2=
=A0 SHA256&#39;s padding function is designed<br><br>1. to add bits to the =
message so that the resulting message size is a multiple of the chunk size =
(which is 512-bits in the case of SHA256).<br>2. to satisfy the Merkle--Dam=
g=C3=A5rd property which says that if we can find a hash collision in SHA25=
6, which operates on variable length messages, then we can find a hash coll=
ision in SHA256&#39;s compression function, which operates on fixed length =
messages.<br><br>We could remove the second call to the SHA256 compression =
function and use SHA256 without padding.=C2=A0 If we did, we would lose the=
 Merkle--Damg=C3=A5rd property.=C2=A0 While this may be acceptable for some=
 use cases, we are still left with no room to add annotation data held at a=
 node.=C2=A0 For Bitcoin&#39;s Merkle tree this is not a problem, because i=
ts trees are not annotated.=C2=A0 However, for other applications, we would=
 need to add another chunk to hold the annotation, which would mean adding =
back the second call to the SHA256 compression function per node of the tre=
e.<br><br>## A New Merkle Root Algorithm<br>Below is an algorithm for compu=
ting Merkle roots that directly uses the SHA256 compression function.=C2=A0=
 This algorithm operates on binary trees and supports per node annotations.=
=C2=A0 Furthermore this proposal satisfies the Merkle--Damg=C3=A5rd propert=
y and more.<br><br>#### Remark 1<br>Any finitely branching tree can be repr=
esented by a binary tree, so this construction also applies to arbitrary fi=
nitely branching trees.<br><br>The SHA256 compression function takes two in=
puts:<br><br>1. A 256-bit value for the previous chunk in a chain, or an in=
itial vector in the case of the first chunk.<br>2. A 512-bit chunk of data.=
<br><br><span style=3D"font-family:monospace,monospace">=C2=A0=C2=A0=C2=A0 =
sha256Compress : Word256 =C3=97 Word512 -&gt; Word256<br></span><br>In tota=
l, the SHA256 compression function compresses 768-bits into 256-bits.=C2=A0=
 The Merkle roots of two branches occupy 512 bits, and this leaves another =
256-bits of space available for tags.<br><br>Let us define a recursive type=
 for binary trees annotated with tags.<br><br><span style=3D"font-family:mo=
nospace,monospace">=C2=A0=C2=A0=C2=A0 type Tree tag where<br>=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0 Leaf=C2=A0=C2=A0 : tag=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0 -&gt; Tree tag<br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 U=
nary=C2=A0 : tag =C3=97 Tree tag=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0 -&gt; Tree tag<br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 B=
inary : tag =C3=97 Tree tag =C3=97 Tree tag -&gt; Tree tag<br></span><br>We=
 define a recursive algorithm for trees whose tags are 256-bit Words (or wh=
ose tags can be represented by 256-bit words).<br><br><span style=3D"font-f=
amily:monospace,monospace">=C2=A0=C2=A0=C2=A0 merkleRoot : Tree Word256 -&g=
t; Word256<br>=C2=A0=C2=A0=C2=A0 merkleRoot (Leaf (t))=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 :=3D =
sha256Compress(sha256(Applicat<wbr>ionName), 0b0^256 =E2=8B=85 t)<br>=C2=A0=
=C2=A0=C2=A0 merkleRoot (Unary (t, child))=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0 :=3D sha256Compress(merkleRoot(chil<wbr>d), 0b0^256 =E2=8B=85 t)<=
br>=C2=A0=C2=A0=C2=A0 merkleRoot (Binary (t, left, right)) :=3D sha256Compr=
ess(merkleRoot(left<wbr>), merkleRoot(right) =E2=8B=85 t)<br></span><br>We =
need some initial value for the `Leaf` case.=C2=A0 This could be taken as t=
he initial vector for SHA256.=C2=A0 However, I suggest using the hash of an=
 application specific name.<br><br><span style=3D"font-family:monospace,mon=
ospace">=C2=A0=C2=A0=C2=A0 ApplicationName : BitString<br></span><br>We fur=
ther require that the tags dictate the kind of node it is attached to.=C2=
=A0 For example, if a given tag is used for Binary nodes, then it can only =
appear in other Binary nodes.=C2=A0 One way this can be accomplished is by =
requiring the first two bits of a tag follow a particular scheme (the schem=
e below ensures that one of the first two bits is set, which will be useful=
 later when we define safe tags) such as<br><br>- 0b11 when the node is a L=
eaf node.<br>- 0b10 when the node is a Unary node.<br>- 0b01 when the node =
is a Binary node.<br><br>This condition suffices to provide the Merkle--Dam=
g=C3=A5rd property:<br><br>#### Theorem 2<br>Suppose `t_1` and `t_2` are tw=
o different `Tree Word256` values such that any tag occurring in the two tr=
ees only occurs in one kind of node.=C2=A0 If `merkleRoot(t_1) =3D merkleRo=
ot(t_2)` then we can effectively find a collision in the SHA256 compression=
 function.<br><br>Proof.<br>We proceed by structural induction on `t_1`.=C2=
=A0 Assume `t_1` and `t_2` are two different `Tree Word256` values such tha=
t `merkleRoot(t_1) =3D merkleRoot(t_2)`.=C2=A0 Define a `tag` function to e=
xtract the tag name from the root of a tree.<br><br><span style=3D"font-fam=
ily:monospace,monospace">=C2=A0=C2=A0=C2=A0 tag : Tree Word256 -&gt; Word25=
6<br>=C2=A0=C2=A0=C2=A0 tag (Leaf (t))=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0 :=3D t<br>=C2=A0=C2=A0=C2=A0 tag (Unary (t, _))=C2=A0=C2=A0=C2=
=A0=C2=A0 :=3D t<br>=C2=A0=C2=A0=C2=A0 tag (Binary (t, _, _)) :=3D t<br></s=
pan><br>Case 1: Suppose `tag(t_1) =3D/=3D tag(t_2)`.<br>This means that the=
 inputs to `sha256Compress` that produced `merkleRoot(t_1)` and `merkleRoot=
(t_2)` are different and we have found a collision in the SHA256 compressio=
n function.<br><br>Case 2: Suppose `tag(t_1) =3D tag(t_2)`.<br>Let `tg` be =
this tag.=C2=A0 By our requirement on tags, this means that `t_1` and `t_2`=
 are the same kind of node.=C2=A0 Suppose `t_1 =3D Binary (tg, t_(1l), t_(1=
r))` and `t_2 =3D Binary (tg, t_(2l), t_(2r))`.=C2=A0 Since `t_1` and `t_2`=
 are different, then either `t_(1l) =3D/=3D t_(2l)` or `t_(1r) =3D/=3D t_(2=
r)`.=C2=A0 Without loss of generality, assume `t_(1l) =3D/=3D t_(2l)`.<br><=
br>Case 2a: Suppose `merkleRoot(t_(1l)) =3D/=3D merkleRoot(t_(2l))`.<br>Thi=
s means that the inputs to `sha256Compress` that produced `merkleRoot(t_1)`=
 and `merkleRoot(t_2)` are different and we have found a collision in the S=
HA256 compression function.<br><br>Case 2b: Suppose `merkleRoot(t_(1l)) =3D=
 merkleRoot(t_(2l))`.<br>Then by induction we can find a collision in the S=
HA256 compression function.<br><br>The cases where `t_1` and `t_2` are both=
 `Unary` or `Leaf` nodes are handled in a similar way.=C2=A0 The reader can=
 verify the algorithm implied by the above proof provides an effective meth=
od of finding a collision in the SHA256 compression function.=C2=A0 QED<br>=
<br>#### Remark 3<br>We have filled half of the chunk with `0b0^256` in for=
 the `Unary` and `Leaf `cases in the definition of `merkleRoot`.=C2=A0 The =
proof of Theorem 2 does not depend on this, and if we have extra ancillary =
data for these types of nodes it can replace the `0b0^256` in this first ha=
lf of the chunk.=C2=A0 This is particularly useful for the `Leaf `case beca=
use `Leaf` nodes often hold extra data.=C2=A0 We also remark that if this d=
ata is too large to be represented by a `Word256`, it can instead be hashed=
 with SHA256 and the hash included instead.<br><br>Not all of the inputs to=
 the SHA256 compression function are created equal.=C2=A0 Only the second a=
rgument, the chunk data, is applied to the SHA256 expander.=C2=A0 `merkleRo=
ot` is designed to ensure that the first argument of the SHA256 compression=
 function is only fed some output of the SHA256 compression function.=C2=A0=
 In fact, we can prove that the output of the `merkleRoot` function is alwa=
ys the midstate of some SHA256 hash.=C2=A0 To see this, let us explicitly s=
eparate the `sha256` function into the padding step, `sha256Pad`, and the r=
ecursive hashing step, `unpaddedSha256`.<br><br><span style=3D"font-family:=
monospace,monospace">=C2=A0=C2=A0=C2=A0 sha256Pad : BitString -&gt; List Wo=
rd512<br></span><br><span style=3D"font-family:monospace,monospace">=C2=A0=
=C2=A0=C2=A0 sha256IV : Word256<br><br>=C2=A0=C2=A0=C2=A0 sha256Loop : Word=
256 =C3=97 List Word512 -&gt; Word256<br>=C2=A0=C2=A0=C2=A0 sha256Loop (pre=
v, Nil)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0 :=3D prev<br>=C2=A0=C2=A0=C2=A0 sha256Loop (prev, =
Cons (chunk, rest)) :=3D sha256Loop(sha256Compress(prev<wbr>, chunk), rest)=
<br></span><br><span style=3D"font-family:monospace,monospace">=C2=A0=C2=A0=
=C2=A0 unpaddedSha256 : List Word512 -&gt; Word256<br>=C2=A0=C2=A0=C2=A0 un=
paddedSha256 (chunks) :=3D sha256Loop(sha256IV, chunks)<br></span><br><span=
 style=3D"font-family:monospace,monospace">=C2=A0=C2=A0=C2=A0 sha256 : BitS=
tring -&gt; Word256<br>=C2=A0=C2=A0=C2=A0 sha256 (s) :=3D unpaddedSha256(sh=
a256Pad(s))<br></span><br>Now we can state what we mean when we say that th=
e output of the `merkleRoot` function is always the midstate of some SHA256=
 hash.<br><br>#### Theorem 4<br>For all `t : Tree Word256` there exists som=
e `l : List Word512` such that `merkleRoot(t) =3D unpaddedSha256(l)`.<br><b=
r>Proof.<br>We construct a function to transform a tree into the required l=
ist of chunks.<br><br><span style=3D"font-family:monospace,monospace">=C2=
=A0=C2=A0=C2=A0 merkleChain : Tree Word256 -&gt; List Word512<br>=C2=A0=C2=
=A0=C2=A0 merkleChain (Leaf (t))=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 :=3D sha256Pad(Application=
Name) + [0b0^256 =E2=8B=85 t]<br>=C2=A0=C2=A0=C2=A0 merkleChain (Unary (t, =
child))=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 :=3D merkleChain(child) +=
 [0b0^256 =E2=8B=85 t]<br>=C2=A0=C2=A0=C2=A0 merkleChain (Binary (t, left, =
right)) :=3D merkleChain(left) + [merkleRoot(right) =E2=8B=85 t]<br></span>=
<br>The reader can verify by induction that<br><br><span style=3D"font-fami=
ly:monospace,monospace">=C2=A0=C2=A0=C2=A0 forall t : Tree Word256. merkleR=
oot(t) =3D unpaddedSha256(merkleChain(t))<wbr>.<br></span><br>QED<br><br>##=
# Large Tag Space<br><br>If one&#39;s application cannot directly represent=
 the space of tags with a `Word256`, then one can hash the tags and still m=
aintain the Merkle--Damg=C3=A5rd property given by Theorem 2, as long as we=
 still have the property that each tag uniquely determines the kind of node=
 it applies to.<br><br>We first note that `Tree` is a functor.<br><br><span=
 style=3D"font-family:monospace,monospace">=C2=A0=C2=A0=C2=A0 treeMap : (a =
-&gt; b) -&gt; Tree a -&gt; Tree b<br>=C2=A0=C2=A0=C2=A0 treeMap f (Leaf (t=
))=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0 :=3D Leaf (f(t))<br>=C2=A0=C2=A0=C2=A0 treeMap f (Unary =
(t, child))=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 :=3D Unary (f(t), chi=
ld)<br>=C2=A0=C2=A0=C2=A0 treeMap f (Binary (t, left, right)) :=3D Binary (=
f(t), left, right)<br></span><br>We can hash the tags of a tree.<br><br><sp=
an style=3D"font-family:monospace,monospace">=C2=A0=C2=A0=C2=A0 hash256Tags=
 : Tree BitString -&gt; Tree Word256<br>=C2=A0=C2=A0=C2=A0 hash256Tags (t) =
:=3D treeMap sha256 t<br></span><br>Now we can compute the `merkleRoot` of =
the result of `hash256Tags`.<br><br>#### Theorem 5<br>Suppose `t_1` and `t_=
2` are two different `Tree BitString` values such that any tag occurring in=
 the two trees only occurs in one kind of node.=C2=A0 If `merkleRoot(hash25=
6Tags(t_1)) =3D merkleRoot(hash256Tags(t_2))`, then we can effectively find=
 a collision in the SHA256 compression function.<br><br>Proof.<br>Assume `t=
_1` and `t_2` are two different `Tree BitString` values such that `merkleRo=
ot(hash256Tags(t_1)) =3D merkleRoot(hash256Tags(t_2))`.<br><br>Case 1: Supp=
ose `hash256Tags(t_1) =3D hash256Tags(t_2)`.<br>This means there must be a =
collision in the `sha256` function.=C2=A0 By the Merkle--Damg=C3=A5rd prope=
rty of SHA256, this means we can effectively find a collision in the SHA256=
 compression function.<br><br>Case 2: Suppose `hash256Tags(t_1) =3D/=3D has=
h256Tags(t_2)`.<br>If the SHA256 hashes of each tag in the two trees unique=
ly determines the kinds of their nodes, then we can apply Theorem 2 to conc=
lude that we can effectively find a collision in the SHA256 compression fun=
ction.=C2=A0 If the SHA256 hashes of each tag in the two trees does not uni=
quely determine the kinds of their nodes then there are two different tags =
among the two trees, `tg_1` and `tg_2`, such that `sha256(tg_1) =3D sha256(=
tg_2)`.=C2=A0 Hence there is a collision in the `sha256` function, and ther=
efore we can effectively find a collision in the SHA256 compression functio=
n.=C2=A0 QED<br><br>## Avoiding collisions between merkleRoot and sha256<br=
>For the moment, let us assume there is no effective way to find a collisio=
n in the SHA256 compression function.=C2=A0 By the Merkle--Damg=C3=A5rd pro=
perty of SHA256, we cannot effectively find a collision within the sha256 f=
unction.=C2=A0 We have shown, by Theorem 2, that merkleRoot has the same Me=
rkle--Damg=C3=A5rd property and hence we cannot effectively find a collisio=
n within the merkleRoot function.=C2=A0 However, we may have collisions bet=
ween the `sha256` and the `merkleRoot` functions.=C2=A0 That is, we may be =
able to effectively find values `s : BitString` and `t : Tree Word256` such=
 that `sha256(s) =3D merkleRoot(t)`.<br><br>While this may not be a problem=
 for most applications, it turns out that we can place restrictions on the =
format of tags to ensure that there are never collisions between `sha256` a=
nd `merkleRoot`.=C2=A0 Define a safe tag of type `Word256` to be a a value =
such that one the first 192 bits is set and the 9th last bit is cleared.<br=
><br>#### Theorem 6<br>Let `t : Tree Word256` be a tree in which every tag =
is a safe tag.=C2=A0 Suppose we have some `s : BitString` such that `sha256=
(s) =3D merkleRoot(t)`.=C2=A0 Then we can effectively find a collision in t=
he SHA256 compression function.<br><br>Proof.<br>The last chunk of `sha256P=
adding(s)` is a padding chunk as defined by the SHA256 standard.=C2=A0 If w=
e look at the second half of this last chunk, then according to the padding=
 rules, it can never be the case that one of the first 192 bits is set and =
the 9^th last bit is cleared.=C2=A0 Because `t` only contains safe tags, th=
e inputs to their last compression function in the computation of `sha256(s=
)` and `merkleRoot(t)` must be different.=C2=A0 Therefore if `sha256(s) =3D=
 merkleRoot(t)` then we must have encountered a collision in the final comp=
ression function of the two computations.=C2=A0 QED<br><br>#### Remark 7<br=
>If we use safe tags we can consider a modified `merkleRoot&#39;` function.=
<br><br><span style=3D"font-family:monospace,monospace">=C2=A0=C2=A0=C2=A0 =
merkleRoot&#39; : Tree Word256 -&gt; Word256<br>=C2=A0=C2=A0=C2=A0 merkleRo=
ot&#39; (Leaf (t))=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 :=3D sha256Compress(sha256(Applicat<wbr>i=
onName), sha256(&quot;&quot;) =E2=8B=85 t)<br>=C2=A0=C2=A0=C2=A0 merkleRoot=
&#39; (Unary (t, child))=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 :=3D sha=
256Compress(merkleRoot&#39;(chi<wbr>ld), sha256(&quot;&quot;) =E2=8B=85 t)<=
br>=C2=A0=C2=A0=C2=A0 merkleRoot&#39; (Binary (t, left, right)) :=3D sha256=
Compress(merkleRoot&#39;(lef<wbr>t), merkleRoot&#39;(right) =E2=8B=85 t)<br=
></span><br>If we use `merkleRoot&#39;` we no longer require that tags uniq=
uely define their node types in order to ensure the Merkle--Damg=C3=A5rd pr=
operty.=C2=A0 Instead we can rely on the safe tags to ensure that the use o=
f `sha256(s)` will never collide with `merkleRoot&#39;(t)`, and therefore n=
odes of different types will never collide with each other the `merkleRoot&=
#39;` computation.=C2=A0 However, I don&#39;t feel this is a particularly g=
ood approach, so I will not elaborate on it further.<br><br>Unfortunately, =
there is no guarantee that the result of `hash256Tags` will consist of only=
 safe tags, so we cannot use it to avoid collisions between `sha256` and `m=
erkleRoot =E2=8B=85 hash256Tags`.=C2=A0 To remedy this we define an alterna=
tive to `hash256Tags`.<br><br><span style=3D"font-family:monospace,monospac=
e">=C2=A0=C2=A0=C2=A0 hash224Tag : BitString -&gt; Word256<br>=C2=A0=C2=A0=
=C2=A0 hash224Tag (s) :=3D 0xFFFF =E2=8B=85 sha224(s) =E2=8B=85 0x0000<br><=
/span><br><span style=3D"font-family:monospace,monospace">=C2=A0=C2=A0=C2=
=A0 hash224Tags : Tree BitString -&gt; Tree Word256<br>=C2=A0=C2=A0=C2=A0 h=
ash224Tags (t) :=3D treeMap hash224Tag t<br></span><br>We see that by appro=
priately padding the result of `sha224` we guarantee that `hash224Tag(s)` i=
s a safe tag.<br><br>#### Theorem 8<br>Suppose `t_1` and `t_2` are two diff=
erent `Tree BitString` values such that any tag occurring in the two trees =
only occurs in one kind of node.=C2=A0 If `merkleRoot(hash224Tags(t_1)) =3D=
 merkleRoot(hash224Tags(t_2))`, then we can effectively find either a colli=
sion in the SHA256 compression function, or a collision in SHA224.<br><br>#=
### Remark 9<br>We can safely replace the `0xFFFF` prefix in `hash224Tag` w=
ith one where the first two bits determine the kind of node in accordance w=
ith our previously defined scheme.<br><br>#### Remark 10<br>Rather than usi=
ng SHA224 we could use SHA256 and tweak it afterwards to set the first bit =
and clear the 9th last bit. This comes at the cost of effectively using a n=
on-standard hash function.<br><br>#### Remark 11<br>Most of this developmen=
t not depend specifically on SHA256.=C2=A0 It all works similarly for SHA51=
2 and other hash functions that compress in a similar manner.=C2=A0 SHA256 =
is used as the primary example because one can find hardware support for it=
 on commodity hardware (for example, see the Intel SHA extensions).<br><br>=
## Conclusion<br>We have defined a merkleRoot function for computing Merkle=
 roots of trees that include annotations per node.=C2=A0 The resulting func=
tion uses one SHA256 compression function call per node and supports arbitr=
ary 256-bit annotations per node.=C2=A0 Arbitrary annotations can be suppor=
ted at the cost of more hashing.=C2=A0 We have also shown that by using saf=
e tags we can additionally get a property that the `merkleRoot` will not ef=
fectively collide with `sha256` function.<br><br>## Further Reading<br>The =
article [Characterizing Padding Rules of MD Hash Functions Preserving Colli=
sion Security](<a href=3D"https://eprint.iacr.org/2009/325.pdf" target=3D"_=
blank">https://eprint.iacr.<wbr>org/2009/325.pdf</a>) by Mridul Nandi provi=
des a nice overview of various options for padding rules.<br><br></div>

--001a1145853470a2a60550178117--