From zooko at zooko.com Fri Nov 1 06:24:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding Message-ID: I've written a rationale for my base-32 encoding scheme. The "DESIGN" document is visible via viewcvs here: http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD&content-type=text/vnd.viewcvs-markup In order to minimize the chance that you ignore this message, I will now include the contents of the file in this message. The version number of the DESIGN file is v0.9 so that I can incorporate feedback from p2p-hackers (and any other sources) before naming it "v1.0". ------- begin appended file "DESIGN" Zooko O'Whielacronx November 2002 human-oriented base-32 encoding INTRO The base-32 encoding implemented in this library differs from that described in draft-josefsson-base-encoding-04.txt [1], and as a result is incompatible with that encoding. This document describes why we made that choice. This encoding is implemented in a project named libbase32 [2]. This is version 0.9 of this document. The latest version should always be available at: http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD RATIONALE Basically, the rationale for base-32 encoding in [1] is as written therein: "The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.". The rationale for libbase32 is different -- it is to represent arbitrary sequences of octets in a form that is as convenient as possible for human users to manipulate. In particular, libbase32 was created in order to serve the Mnet project [3], where 40-octet cryptographic values are encoded into URIs for humans to manipulate. Anticipated uses of these URIs include cut-and-paste, text editing (e.g. 
in HTML files), manual transcription via a keyboard, manual transcription via pen-and-paper, vocal transcription over phone or radio, etc.

The desiderata for such an encoding are:

* minimizing transcription errors -- e.g. the well-known problem of confusing `0' with `O'
* encoding into other structures -- e.g. search engines, structured or marked-up text, file systems, command shells
* brevity -- Shorter URLs are better than longer ones.
* ergonomics -- Human users (especially non-technical ones) should find the URIs as easy and pleasant as possible. The uglier the URI looks, the worse.

DESIGN

Base

The first decision we made was to use base-32 instead of base-64. An earlier version of this project used base-64, but a discussion on the p2p-hackers mailing list [4] convinced us that the added length of base-32 encoding was worth the added clarity provided by: case-insensitivity, the absence of non-alphanumeric characters, and the ability to omit a few of the most troublesome alphanumeric characters.

In particular, we realized that it would probably be faster and more comfortable to vocally transcribe a base-32 encoded 40-octet string (64 characters, case-insensitive, no non-alphanumeric characters) than a base-64 encoded one (54 characters, case-sensitive, plus two non-alphanumeric characters).

Alphabet

There are 26 alphabet characters and 10 digits, for a total of 36 characters available. We need only 32 characters for our base-32 alphabet, so we can choose four characters to exclude. This is where we part company with traditional base-32 encodings. For example [1] eliminates `0', `1', `8', and `9'. This choice eliminates two characters that are unambiguous (`8' and `9') while retaining others that are potentially confusing. Others have suggested eliminating `0', `1', `O', and `L', which is likewise suboptimal.

Our choice of confusing characters to eliminate is: `0', `l', `v', and `2'.
Our reasoning is that `0' is potentially mistaken for `o', that `l' is potentially mistaken for `1' or `i', that `v' is potentially mistaken for `u' or `r' (especially in handwriting) and that `2' is potentially mistaken for `z' (especially in handwriting).

Note that we choose to focus on typed and written transcription errors instead of vocal, since humans already have a well-established system of disambiguating spoken alphanumerics (such as the United States military's "Alpha Bravo Charlie Delta" and telephone operators' "Is that 'd' as in 'dog'?").

Sub-Octet Data

Suppose you have 10 bits of data to transmit, and the recipient (the decoder) is expecting 10 bits of data. All previous base-32 encoding schemes assume that the binary data to be encoded is in 8-bit octets, so you would have to pad the data out to 2 octets and encode it in base-32, resulting in a string 4 characters long. The decoder will decode that into 2 octets (16 bits) and then ignore the least significant 6 bits.

In the base-32 encoding described here, if the encoder and decoder both know the exact length of the data in bits (modulo 40), then they can use this shared information to optimize the size of the transmitted (encoded) string. In the example above, where you have 10 bits of data to transmit, libbase32 allows you to transmit the optimal encoded string: two characters.

If the length in bits is always a multiple of 8, or if both sides are not sure of the length in bits modulo 40, or if this encoding is being used in a setting where shaving one or two characters off the encoded string isn't worth the potential confusion, you can always use this encoding the same way you would use other encodings -- with an "input is in 8-bit octets" assumption.

Padding

Honestly, I don't understand why all the base-32 and base-64 encodings require trailing padding. Maybe I'm missing something, and when I publish this document people will point it out, and then I'll hastily erase this paragraph.
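[Editorial note: the sub-octet optimization described above can be sketched in a few lines. This is an illustrative implementation, not the actual libbase32 API; the alphabet ordering shown is arbitrary, since the document deliberately leaves the ordering unspecified at this point.]

```python
# Sketch of length-aware base-32 encoding (NOT the real libbase32 interface).
# Alphabet: the 26 letters minus `l' and `v', the 10 digits minus `0' and `2';
# the ordering here is arbitrary, chosen only for illustration.
ALPHABET = "abcdefghijkmnopqrstuwxyz13456789"
assert len(ALPHABET) == 32

def encode_bits(value, nbits):
    """Encode an nbits-wide integer into ceil(nbits/5) base-32 characters."""
    nchars = -(-nbits // 5)          # ceil(nbits / 5)
    value <<= nchars * 5 - nbits     # left-align the data into 5-bit groups
    return "".join(ALPHABET[(value >> (5 * (nchars - 1 - i))) & 31]
                   for i in range(nchars))

def decode_bits(s, nbits):
    """Recover the nbits-wide integer; both sides must agree on nbits."""
    value = 0
    for ch in s:
        value = (value << 5) | ALPHABET.index(ch)
    return value >> (len(s) * 5 - nbits)

# 10 bits of data fit in 2 characters; an octet-padded scheme needs 4.
assert len(encode_bits(0b1010110011, 10)) == 2
assert decode_bits(encode_bits(0b1010110011, 10), 10) == 0b1010110011
assert len(encode_bits(0, 16)) == 4   # the 2-octet, 4-character case
```

Because the bit length is shared out of band, no padding characters are needed to make the round trip unambiguous.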
[1] http://www.ietf.org/internet-drafts/draft-josefsson-base-encoding-04.txt
[2] http://sf.net/projects/libbase32
[3] http://mnet.sf.net/
[4] http://zgp.org/pipermail/p2p-hackers/2001-October/

From Raphael_Manfredi at pobox.com Fri Nov 1 06:59:02 2002
From: Raphael_Manfredi at pobox.com (Raphael Manfredi)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
In-Reply-To: 
References: 
Message-ID: 

Quoting p2p-hackers@zgp.org from ml.p2p.hackers:
:There are 26 alphabet characters and 10 digits, for a total of 36 characters
:available. We need only 32 characters for our base-32 alphabet, so we can
:choose four characters to exclude. This is where we part company with
:traditional base-32 encodings. For example [1] eliminates `0', `1', `8', and
:`9'. This choice eliminates two characters that are unambiguous (`8' and `9')
:while retaining others that are potentially confusing. Others have suggested
:eliminating `0', `1', `O', and `L', which is likewise suboptimal.
:
:Our choice of confusing characters to eliminate is: `0', `l', `v', and `2'. Our
:reasoning is that `0' is potentially mistaken for `o', that `l' is potentially
:mistaken for `1' or `i', that `v' is potentially mistaken for `u' or `r'
:(especially in handwriting) and that `2' is potentially mistaken for `z'
:(especially in handwriting).

Your choice is as arbitrary as the others, despite the care with which you chose your letters. Indeed, "9" can be mistaken for "g", especially in handwriting. And I've seen people spell out "8" as "B", so...

:Honestly, I don't understand why all the base-32 and base-64 encodings require
:trailing padding.

To know that it has not been truncated if you are in the middle of a sequence?

Do we really need yet another incompatible base32 encoding? You might not know it, but Gnutella already standardized on using "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" as the base32 alphabet.
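[Editorial note: that canonical digit set is the one standard libraries ship. For example, Python's `base64.b32encode` can be made to print its alphabet in order; the bit-packing below is only an illustration trick.]

```python
import base64

# Pack the 5-bit values 0..31 into 20 octets, so that the i-th 5-bit group
# encodes to the i-th character of whatever alphabet b32encode uses.
v = 0
for i in range(32):
    v = (v << 5) | i
alphabet = base64.b32encode(v.to_bytes(20, "big")).decode("ascii")
assert alphabet == "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
```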
Raphael From zooko at zooko.com Fri Nov 1 07:44:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: human-oriented base-32 encoding In-Reply-To: Message from Raphael_Manfredi@pobox.com (Raphael Manfredi) of "01 Nov 2002 14:58:06 GMT." References: Message-ID: > :Our choice of confusing characters to eliminate is: `0', `l', `v', and `2'. Our > :reasoning is that `0' is potentially mistaken for `o', that `l' is potentially > :mistaken for `1' or `i', that `v' is potentially mistaken for `u' or `r' > :(especially in handwriting) and that `2' is potentially mistaken for `z' > :(especially in handwriting). > > Your choice is as arbitrary as the others, despite the care with which you > chose your letters. Indeed, "9" can be mistaken with "g", especially in > handwriting. And I've seen people spell out "8" as "B", so... These are good points, but I still think that `l' is more likely to be confused with `i' or `I' than `1' is, and that `v' and `2' are more troublesome than `8' and `9'. This stuff isn't "arbitrary" but we are unfortunately compelled to base our decisions on our own intuitions, as I haven't seen any quantitative, scientific analysis of this kind of transcription error. If anyone knows of some I would love to see it. > :Honestly, I don't understand why all the base-32 and base-64 encodings require > :trailing padding. > > To know that it has not been truncated if you are in the middle of a sequence? So it's a very weak kind of error detection? (Namely, it doesn't detect truncation between sequences, nor several other kinds of error.) Thanks for the clue. I'll update my doc to say that this is the motivation, but it is one that doesn't motivate me. > Do we really need yet another incompatible base32 encoding? You might not > know it, but Gnutella already standardized on using > "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" as the base32 alphabet. 
This is the same alphabet used in the Internet Draft, which I refer to in my document. I think my alphabet has better resistance to transcription errors, and I think that this feature is more valuable to me than sharing an encoding with Gnutella is.

Note: I'm actually quite interested in *interoperating* with Gnutella, and in fact I occasionally hack on a Gnutella implementation [1]. But I don't think having an identical base-32 encoding is more important than optimizing the alphabet for human use.

Oh damn, I just realized that my DESIGN doc doesn't talk about the order of the alphabet...

Regards,

Zooko

[1] http://twistedmatrix.com/users/jh.twistd/viewcvs/cgi/viewcvs.cgi/twisted/protocols/gnutella.py?rev=HEAD&content-type=text/vnd.viewcvs-markup&cvsroot=Twisted

From Raphael_Manfredi at pobox.com Fri Nov 1 08:01:01 2002
From: Raphael_Manfredi at pobox.com (Raphael Manfredi)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
In-Reply-To: 
References: 
Message-ID: 

Quoting p2p-hackers@zgp.org from ml.p2p.hackers:
:This is the same alphabet used in the Internet Draft, which I refer to in my
:document. I think my alphabet has better resistance to transcription errors,
:and I think that this feature is more valuable to me than sharing an encoding
:with Gnutella is.

Really? What is the value of this base32 encoding then:

ABCDEFGHIJKLMNPQRSTUWXYZ345678ABC

When you see:

/uri-res/N2R?urn:sha1:ABCDEFGHIJKLMNPQRSTUWXYZ345678ABC

you had better know the alphabet used to be able to determine the SHA1 digest correctly. And all the people producing those URLs had better use the SAME alphabet. What can be more important than that?

Raphael

From lgonze at panix.com Fri Nov 1 08:03:01 2002
From: lgonze at panix.com (Lucas Gonze)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
In-Reply-To: 
Message-ID: 

> > Do we really need yet another incompatible base32 encoding?
> > You might not know it, but Gnutella already standardized on using
> > "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" as the base32 alphabet.
>
> This is the same alphabet used in the Internet Draft, which I refer to in my
> document. I think my alphabet has better resistance to transcription errors,
> and I think that this feature is more valuable to me than sharing an encoding
> with Gnutella is.

This difference would create a different *kind* of transcription error, which is that a base32 identifier in either alphabet would have to specify which alphabet it was. A fix for this is to make sure that there is a disambiguating character in any string in your alphabet. That is not guaranteed in the wild, so you have to put it there. For example, you could prepend a (case-sensitive) lowercase "z" to every identifier.

If that character was always there, the human cost to remember it would be as low as the human cost to remember the "www." and ".com" parts of "www.{foo}.com".

- Lucas

From zooko at zooko.com Fri Nov 1 08:28:01 2002
From: zooko at zooko.com (Zooko)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
In-Reply-To: Message from Lucas Gonze of "Fri, 01 Nov 2002 10:59:55 EST."
References: 
Message-ID: 

Lucas Gonze wrote:
>
> This difference would create a different *kind* of transcription error,
> which is that a base32 identifier in either alphabet would have to specify
> which alphabet it was.
...
> If that character was always there, the human cost to remember it would be
> as low as the human cost to remember the "www." and ".com" parts of
> "www.{foo}.com".

That is a good point. I should specify some assumptions/requirements about the base32 encoded strings, namely that they are error-corrected (outside of the base32 encoding), and that the kind of base32 encoding used -- indeed, the fact that they *are* base32 encoded at all -- is likewise transmitted out of band.
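[Editorial note: the alphabet really is part of that out-of-band metadata, because the same character string decodes to different octets under different base32 alphabets. A tiny sketch; the ordering of the second alphabet is hypothetical, since the DESIGN doc does not fix one.]

```python
def b32_value(s, alphabet):
    """Interpret s as a big-endian base-32 number under the given alphabet."""
    v = 0
    for ch in s:
        v = (v << 5) | alphabet.index(ch)
    return v

RFC_ALPHABET   = "abcdefghijklmnopqrstuvwxyz234567"  # Internet Draft / Gnutella, lowercased
ZOOKO_ALPHABET = "abcdefghijkmnopqrstuwxyz13456789"  # Zooko's character set; ordering hypothetical

s = "mnop345"  # a string that is legal in both alphabets
assert b32_value(s, RFC_ALPHABET) != b32_value(s, ZOOKO_ALPHABET)
```

Without out-of-band agreement on which alphabet was used, a decoder cannot tell which of the two values was meant.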
(I already mention that the length of the binary data in bits can be, but doesn't have to be, transmitted out of band.)

In particular I envision URIs that denote the overall scheme in the leading part, like:

mnet://xpqff6aurk9jtpfahkkrcp3n684ci6x3hwirophsr5h1i1hp8swabhmh7tho

The error correction is done by the transport mechanism (in this case e-mail) and the specification of encoding (as well as specification of everything else) is done by the leading part of the URI.

If Mnet were to generate URIs that were intended to be used by other systems such as Gnutella, like:

/uri-res/N2R?urn:sha1:ABCDEFGHIJKLMNPQRSTUWXYZ345678ABC

then it would emit the kind of encoding that is denoted by the URI specification.

The issue of in-band vs. out-of-band transmission of this kind of, uh, "meta-data" is important, and I thank Raphael Manfredi and Lucas Gonze for bringing it up. I'm working on a revision of the DESIGN doc that goes into these issues, as well as addresses the issue of alphabet order, which I accidentally omitted from the first revision.

Regards,

Zooko

From justin at chapweske.com Fri Nov 1 08:43:02 2002
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
References: 
Message-ID: <3DC2AEF1.6080004@chapweske.com>

I agree with Lucas and Raphael. I think the damage caused by using two incompatible base32 encodings far outweighs any benefits of zooko's new encoding. And note, it's not just Gnutella using canonical base32; we also use it for the Content-Addressable Web and THEX specifications. Also realize that there are more p2p apps implementing the Gnutella protocol and thus supporting canonical base32 than the rest of the p2p apps combined.

> > This difference would create a different *kind* of transcription error,
> which is that a base32 identifier in either alphabet would have to specify
> which alphabet it was.
A fix for this is to make sure that there is a > disambiguating character in any string in your alphabet. That is not > guaranteed in the wild, so you have to put it there. For example, you > could prepend a (case-sensitive) lowercase "z" to every identifier. > > If that character was always there, the human cost to remember it would be > as low as the human cost to remember the "www." and ".com" parts of > "www.{foo}.com". > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From zooko at zooko.com Fri Nov 1 09:34:02 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: human-oriented base-32 encoding In-Reply-To: Message from Justin Chapweske of "Fri, 01 Nov 2002 10:42:25 CST." <3DC2AEF1.6080004@chapweske.com> References: <3DC2AEF1.6080004@chapweske.com> Message-ID: Justin Chapweske wrote: > > I agree with Lucas and Rapheal. I think the damage caused by using two > incompatible base32 encodings far outweighs any benefits of zooko's new > encoding. I think there are two possible values to be gained from using the same base32 encoding: 1. Interoperation without the need for out-of-band signalling to tell which encoding is being used. 2. Re-use of source code/specs/mental effort. For the first value, consider the example that a user could retrieve a base32 encoded identifier from a web page, with no clarifying name to indicate what it identifies, and paste it into either Mnet or a Gnutella implementation and get the same result. This is currently impossible, not because Mnet and Gnutella use different ASCII encodings, but because they encode different things (hashes of files in Gnutella (?), hashes of dinodes in Mnet). Likewise, if someone gives you a Tiger tree hash of the contents and you need the SHA1 flat hash of the contents, then the ASCII encoding is irrelevant -- you can't use the data. 
Now in the future Mnet might do something with SHA1 hashes of files, or Gnutella might do something with Mnet dinodes (;-)), in which case a user might reasonably give the Mnet application a naked base32 encoded identifier and expect the Mnet application to do something with it. Now even if Mnet *did* use the same base32 encoding as Gnutella, it would still be faced with the mystery of whether the thing in question is a SHA1 hash of a file or an mnetId (the id of a dinode).

There are three possible ways to disambiguate:

a. Out-of-band signalling, such as a leading "mnet://" or "SHA1:" or such. This is certainly best from the point of view of the application (and the programmer), but the user might not play along.

b. In-band signalling, such as Lucas's clever suggestion of including an unnecessary signalling character, or (my preference) using a different (slightly shorter) length for mnetIds.

c. Suck it and see. Attempt to download the file both ways, and then whichever one works, cryptographically verify its correctness. It's comforting to know that this can be done in a pinch, but it hardly seems the first choice.

In short, using a different base32 encoding doesn't make interoperation any harder, as far as I can see, and in fact it offers an extra possibility for making it *easier* by using Lucas's trick.

(Hm. In fact, since each character has a 3 in 32 chance of being a Gnutella-base32-illegal character (`1', `8', or `9'), a 64-character string has only about a 0.2% chance of being both a Gnutella-base32-legal string and an Mnet-base32-legal string. I can make this even less likely by making `8' and `9' be common trailing characters... Hm... Heh heh. I can also use Lucas's trick by appending a useless "8" character only in the rare case that one has not already occurred naturally...)

Anyway, if Mnet ever emits SHA1 hashes of the contents of files, then the argument about interoperation applies.
As long as Mnet is emitting a semantically different object, then the argument applies in the opposite direction! As to the second value of re-using source code and so forth, I've already written my base32 implementation and I enjoyed it. You can have the ANSI C and Python implementations under a permissive BSD-style license. http://sf.net/projects/libbase32 You can also set the "alphabet" string to "abcdefghijklmnopqrstuvwxyz234567" in order to make it identical to the Gnutella encoding. Regards, Zooko From justin at chapweske.com Fri Nov 1 09:44:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: human-oriented base-32 encoding References: <3DC2AEF1.6080004@chapweske.com> Message-ID: <3DC2BD38.40300@chapweske.com> Can you please call it something other than just 'libbase32' as there is already enough confusion between developers about the various encodings. Perhaps call it 'mnet32' encoding or some-such. > > As to the second value of re-using source code and so forth, I've already > written my base32 implementation and I enjoyed it. You can have the ANSI C and > Python implementations under a permissive BSD-style license. > > http://sf.net/projects/libbase32 > > You can also set the "alphabet" string to "abcdefghijklmnopqrstuvwxyz234567" in > order to make it identical to the Gnutella encoding. > > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From zooko at zooko.com Fri Nov 1 09:50:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: human-oriented base-32 encoding In-Reply-To: Message from Justin Chapweske of "Fri, 01 Nov 2002 11:43:20 CST." <3DC2BD38.40300@chapweske.com> References: <3DC2AEF1.6080004@chapweske.com> <3DC2BD38.40300@chapweske.com> Message-ID: Justin Chapweske wrote: > > Can you please call it something other than just 'libbase32' as there is > already enough confusion between developers about the various encodings. 
> Perhaps call it 'mnet32' encoding or some-such.

I'll make the specification of the alphabet a visible part of the interface (i.e. the programmer who uses libbase32 has to choose an alphabet, and one of the options is "abcdefghijklmnopqrstuvwxyz234567").

I *do* need a name for my encoding.

--Z

From me at aaronsw.com Fri Nov 1 14:36:01 2002
From: me at aaronsw.com (Aaron Swartz)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] human-oriented base-32 encoding
In-Reply-To: 
Message-ID: <42B62FAC-EDEA-11D6-8DD8-003065F376B6@aaronsw.com>

why did you choose to limit yourself to numbers and letters? it seems a more human-friendly scheme (while remaining url-compatible) would be:

abcdefghikmopsuwxyz345678$@^-+;~

(there's probably some room for improvement) am i missing something?

-- Aaron Swartz [http://www.aaronsw.com] "Curb your consumption," he said.

From barnesjf at vuse.vanderbilt.edu Fri Nov 1 15:35:02 2002
From: barnesjf at vuse.vanderbilt.edu (J. Fritz Barnes)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] human-oriented base-32 encoding
In-Reply-To: <42B62FAC-EDEA-11D6-8DD8-003065F376B6@aaronsw.com>; from me@aaronsw.com on Fri, Nov 01, 2002 at 04:35:45PM -0600
References: <42B62FAC-EDEA-11D6-8DD8-003065F376B6@aaronsw.com>
Message-ID: <20021101172920.F1017@vuse.vanderbilt.edu>

On Fri, Nov 01, 2002 at 04:35:45PM -0600, Aaron Swartz wrote:
:) why did you choose to limit yourself to numbers and letters? it seems a
:) more human-friendly scheme (while remaining url-compatible) would be:
:)
:) abcdefghikmopsuwxyz345678$@^-+;~
:)
:) (there's probably some room for improvement) am i missing something?

The above choice of characters has a small user-interface disadvantage. If I double-click on the string above, it will only select the characters to the left of the dollar-sign. (It is therefore slightly less user-friendly for cut-and-paste applications.)
Fritz

From lgonze at panix.com Fri Nov 1 15:48:01 2002
From: lgonze at panix.com (Lucas Gonze)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] human-oriented base-32 encoding
In-Reply-To: <20021101172920.F1017@vuse.vanderbilt.edu>
Message-ID: 

But while we're on the subject, '-' and '_' would make these identifiers more chunkable. For example, 18005551212 is harder to transcribe than 1-800-555-1212.

It's pure cognitive trickery, there's no extra meaning at all, but it works.

On Fri, 1 Nov 2002, J. Fritz Barnes wrote:
> On Fri, Nov 01, 2002 at 04:35:45PM -0600, Aaron Swartz wrote:
> :) why did you choose to limit yourself to numbers and letters? it seems a
> :) more human-friendly scheme (while remaining url-compatible) would be:
> :)
> :) abcdefghikmopsuwxyz345678$@^-+;~
> :)
> :) (there's probably some room for improvement) am i missing something?
>
> The above choice of strings has a small user-interface
> disadvantage. If I double-click on the string above, it will
> only select the characters to the left of the dollar-sign.
> (Therefore, is slightly less user-friendly for cut-and-paste
> applications.)
>
> Fritz

From gojomo at usa.net Fri Nov 1 16:01:01 2002
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] human-oriented base-32 encoding
References: 
Message-ID: <008001c28202$d46c0670$640a000a@golden>

I agree that ease of person-to-person communication, in spoken and handwritten forms, is an important consideration (especially between encoding-digit candidates which are equally beneficial in other respects, since computers don't care).

However, I think you're undervaluing standardization, and overestimating the importance of handling a few unrepresentative "sloppy handwriting" problems.
I actually preferred the Base32 alphabet that was proposed in one of the (since discarded) international-DNS proposals, which left out both the letters 'o' and 'L' and the numbers '0' and '1' -- leaving no possibility of misunderstanding either way.

But the Base32 alphabet which goes...

ABCDEFGHIJKLMNOPQRSTUVWXYZ234567

...was already in use by another IETF effort (something related to SASL GSSAPI), and already well-documented in an Internet-Draft (Josefsson's) that is apparently headed for status as a referenceable, numbered IETF RFC.

The clarity of being able to say simply, "Base32", and without further discussion have one clear digit-set be implied, is very useful. It will aid in the reuse of encoding routines, and avoid silly bugs due to people working from different assumptions and incomplete docs. So it benefits us all to christen one "Base32" as "canonical" or "internet standard" -- and having "one" is more important than having "the best" (for any given, and probably idiosyncratic and arguable, definition of "best").

Also, while the 'l' <-> '1' and '0' <-> 'O' isomorphisms are very troublesome -- in some screen/print fonts they are identical, or only distinguishable with careful font-specific pixel-level observation -- the other transcription risks you've chosen to protect against are:

- only problematic in "sloppy" handwriting or artsy fonts; no one's
- only a few among a gigantic potential set of miswritten, misread, or misheard glyphs

What about '7' and 'T'? '2' and '7'? 'I' and '1' (one)? '5' and 'S'? '4' and '9'?

I recently gave my phone number to someone, and I watched them write it down, apparently correctly. However, that same person, later confirming the information via email, converted:

987 5907 (the correct number, in their own handwriting)

to

472 2402 !!!!

Only the '0' came across right, though each number was visually 'close' to what it should have been.

What about 'h' and 'n'?
(Apparently, people often misread my handwritten last name "Mohr" as "Monr".) And if spoken identifiers are important, what about those classes of letters which sound alike over crummy phone connections? ('B' 'V', 'M' 'N', 'Z' 'C', etc. My sister tells me these are "fricatives", and I've never understood why the telephone-dependent airline-reservations industry doesn't just rule these characters out of their confirmation codes.) So you've addressed a few of the problems that have given the most problems in your anecdotal experience, but others' experience will surely differ, there's no reliable overall data, and the cost is deviation from the "Base32" alphabet which coders are most likely to see in other applications or find in preexisting docs and library code. At the very least, if you go with a custom alphabet, please call it "Mnet32" or some such, so that confusion in minimized, and generations of future hapless searchers don't come across your definition first, and think they've got *the* "Base32". - Gojomo ----- Original Message ----- From: "Zooko" To: Sent: Friday, November 01, 2002 6:19 AM Subject: [p2p-hackers] human-oriented base-32 encoding > > I've written a rationale for my base-32 encoding scheme. The "DESIGN" > document is visible via viewcvs here: > > http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD&content-type=text/vnd.viewcvs-markup > > In order to minimize the chance that you ignore this message, I will now include > the contents of the file in this message. The version number of the DESIGN file > is v0.9 so that I can incorporate feedback from p2p-hackers (and any other > sources) before naming it "v1.0". > > ------- begin appended file "DESIGN" > Zooko O'Whielacronx > November 2002 > > > > human-oriented base-32 encoding > > INTRO > > The base-32 encoding implemented in this library differs from that described in > draft-josefsson-base-encoding-04.txt [1], and as a result is incompatible with > that encoding. 
This document describes why we made that choice. > > This encoding is implemented in a project named libbase32 [2]. > > This is version 0.9 of this document. The latest version should always be > available at: > > http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD > > RATIONALE > > Basically, the rationale for base-32 encoding in [1] is as written therein: "The > Base 32 encoding is designed to represent arbitrary sequences of octets in a > form that needs to be case insensitive but need not be humanly readable.". > > The rationale for libbase32 is different -- it is to represent arbitrary > sequences of octets in a form that is as convenient as possible for human users > to manipulate. In particular, libbase32 was created in order to serve the Mnet > project [3], where 40-octet cryptographic values are encoded into URIs for > humans to manipulate. Anticipated uses of these URIs include cut-and-paste, > text editing (e.g. in HTML files), manual transcription via a keyboard, manual > transcription via pen-and-paper, vocal transcription over phone or radio, etc. > > The desiderata for such an encoding are: > > * minimizing transcription errors -- e.g. the well-known problem of confusing > `0' with `O' > * encoding into other structures -- e.g. search engines, structured or marked- > up text, file systems, command shells > * brevity -- Shorter URLs are better than longer ones. > * ergonomics -- Human users (especially non-technical ones) should find the > URIs as easy and pleasant as possible. The uglier the URI looks, the worse. > > DESIGN > > Base > > The first decision we made was to use base-32 instead of base-64. 
An earlier > version of this project used base-64, but a discussion on the p2p-hackers > mailing list [4] convinced us that the added length of base-32 encoding was > worth the added clarity provided by: case-insensitivity, the absence of non- > alphanumeric characters, and the ability to omit a few of the most troublesome > alphanumeric characters. > > In particular, we realized that it would probably be faster and more comfortable > to vocally transcribe a base-32 encoded 40-octet string (64 characters, case- > insensitive, no non-alphanumeric characters) than a base-64 encoded one > (54 characters, case-sensitive, plus two non-alphanumeric characters). > > Alphabet > > There are 26 alphabet characters and 10 digits, for a total of 36 characters > available. We need only 32 characters for our base-32 alphabet, so we can > choose four characters to exclude. This is where we part company with > traditional base-32 encodings. For example [1] eliminates `0', `1', `8', and > `9'. This choice eliminates two characters that are unambiguous (`8' and `9') > while retaining others that are potentially confusing. Others have suggested > eliminating `0', `1', `O', and `L', which is likewise suboptimal. > > Our choice of confusing characters to eliminate is: `0', `l', `v', and `2'. Our > reasoning is that `0' is potentially mistaken for `o', that `l' is potentially > mistaken for `1' or `i', that `v' is potentially mistaken for `u' or `r' > (especially in handwriting) and that `2' is potentially mistaken for `z' > (especially in handwriting). > > Note that we choose to focus on typed and written transcription errors instead > of vocal, since humans already have a well-established system of disambiguating > spoken alphanumerics (such as the United States military's "Alpha Bravo Charlie > Delta" and telephone operators' "Is that 'd' as in 'dog'?"). 
> > Sub-Octet Data > > Suppose you have 10 bits of data to transmit, and the recipient (the decoder) is > expecting 10 bits of data. All previous base-32 encoding schemes assume that > the binary data to be encoded is in 8-bit octets, so you would have to pad the > data out to 2 octets and encode it in base-32, resulting in a string > 4 characters long. The decoder will decode that into 2 octets (16 bits) and > then ignore the least significant 6 bits. > > In the base-32 encoding described here, if the encoder and decoder both know the > exact length of the data in bits (modulo 40), then they can use this shared > information to optimize the size of the transmitted (encoded) string. In the > example above, where you have 10 bits of data to transmit, libbase32 allows you to > transmit the optimal encoded string: two characters. > > If the length in bits is always a multiple of 8, or if both sides are not sure > of the length in bits modulo 40, or if this encoding is being used in a way where > optimizing one or two characters out of the encoded string isn't worth the > potential confusion, you can always use this encoding the same way you would use > other encodings -- with an "input is in 8-bit octets" assumption. > > Padding > > Honestly, I don't understand why all the base-32 and base-64 encodings require > trailing padding. Maybe I'm missing something, and when I publish this document > people will point it out, and then I'll hastily erase this paragraph.
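The size optimization above is just ceiling division by five bits per character; the function below is a sketch of that bookkeeping, not libbase32's real interface.

```python
def encoded_length(num_bits: int) -> int:
    # Each base-32 character carries 5 bits, so two sides that agree on
    # the exact bit length also agree on the encoded string length.
    return -(-num_bits // 5)  # ceiling division

# With the octet assumption, 10 bits must first be padded out to 2 octets,
# costing encoded_length(16) == 4 characters. Knowing the true bit length
# gives the optimal encoded_length(10) == 2 characters.
```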
> > [1] http://www.ietf.org/internet-drafts/draft-josefsson-base-encoding-04.txt > [2] http://sf.net/projects/libbase32 > [3] http://mnet.sf.net/ > [4] http://zgp.org/pipermail/p2p-hackers/2001-October/ > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From gojomo at usa.net Fri Nov 1 16:09:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding References: Message-ID: <008801c28203$f30e2b20$640a000a@golden> > > On Fri, Nov 01, 2002 at 04:35:45PM -0600, Aaron Swartz wrote: > > :) why did you choose to limit yourself to numbers and letters? it seems a > > :) more human-friendly scheme (while remaining url-compatible) would be: > > :) > > :) abcdefghikmopsuwxyz345678$@^-+;~ > > :) > > :) (there's probably some room for improvement) am i missing something? Some of those characters are difficult to use in filenames, having special meanings or being otherwise disallowed. They are also considered 'stop' characters by legacy indexing practices (including Google), so a token featuring non-alphanumeric characters may not be searchable at all, or not as easily searchable as a single item, the way a purely alphanumeric token is. For example, try searching Google for: F77THX24CRGILULQ637OHA7E4HE7QDQ2 If this identifier, and many others, were broken up by stop characters, you might get false hits or have other problems. Lucas Gonze writes: > But while we're on the subject, '-' and '_' would make these identifiers > more chunkable. for example 18005551212 is harder to transcribe than > 1-800-555-1212. > > It's pure cognitive trickery, there's no extra meaning at all, but it > works. And for the indexing purposes mentioned above, I think such chunking would be a problem, rather than a benefit.
- Gojomo From lgonze at panix.com Fri Nov 1 16:55:01 2002 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: <008801c28203$f30e2b20$640a000a@golden> Message-ID: Gordon Mohr wrote: > Some of those characters are difficult to use in filenames, > having special meanings or being otherwise disallowed. Hm. It's Zooko's project. I'll leave the fine points of what he wants to do to him. No point in kibitzing more than I already have. From zooko at zooko.com Fri Nov 1 17:12:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: Message from Lucas Gonze of "Fri, 01 Nov 2002 19:51:33 EST." References: Message-ID: For those watching at home (err.. I guess that's everyone? Except those watching from work.) I've updated the doc to address some of the issues that have been discussed. http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD&content-type=text/vnd.viewcvs-markup In particular, I am very interested in the surprisingly strong emotions that people seem to have at the very thought of an "incompatible" base-32 encoding. When I started this sub-project, I thought that I was sacrificing some non- specific compatibility in exchange for better human-usability, but after this discussion and thinking more carefully about the issue, it seems to me that using mnet-base-32 encoding for mnetIds actually *enhances* compatibility with other systems more than using standard-base-32 encoding for mnetIds would. Here is the relevant section from v0.9.4.1 of the DESIGN file: A NOTE ON COMPATIBILITY AND INTEROPERATION If your application could possibly interoperate with another application, then you should consider the risk of precluding such interoperation by encoding semantically identical objects into syntactically different representations. 
For example, many current systems include the SHA-1 hash of the contents of a file, and this hash value can be represented for user or programmatic sharing in base-32 encoded form [5, 6, 7, 8]. These four systems all use traditional base-32 encoding as described in [1]. If your system will expose the SHA-1 hash of the contents of a file, then you should consider the benefits of having such hash values be exchangeable with those systems by using the same encoding including base, alphabet, permutation of alphabet, length-encoding, padding, treatment of illegal characters and line-breaks. If, however, the semantic meaning of the objects that you are exposing is not something that can be used by another system, due to semantic differences, then you gain nothing with regard to interoperation by using the same ASCII encoding, and in fact by doing so you may incur *worse* interoperation problems by making it impossible for the applications to use syntactic features (namely, by recognizing the encoding scheme) to disambiguate between semantic features. Lucas Gonze has suggested [9] that different schemes could in fact *deliberately* add characters which would be illegal in another scheme in order to enable syntactic differentiation. (This would be morally similar to the "check digit" included in most credit card numbers.) The author has also suggested [10] encoding schematic compatibility in the lengths. For example, mnetIds will probably be 48 characters in base-32 encoded form (encoding 30 octets of data). If it turns out that other strings of that length and form occur in the wild, then the mnetIds could be redefined to be 47 or 49 characters in order to make them recognizable. Clearly the best semantic differentiation is an unambiguous one that is transmitted out-of-band (outside of the ASCII encoding, that is), such as URI scheme names (e.g.: SHA1:blahblahblah or mnet://blahblahblah). However, users might not always preserve those. 
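The syntactic-differentiation idea Lucas suggests can be sketched as a small classifier that exploits characters legal in one alphabet but not the other. The two alphabets below are stand-ins for illustration: the "traditional" one follows [1] (the letters plus `2'-`7'), and the mnet-style one drops `0', `l', `v', and `2' as described earlier.

```python
# Hypothetical alphabets for the two schemes being distinguished.
RFC_STYLE = set("abcdefghijklmnopqrstuvwxyz234567")   # [1]: drops 0, 1, 8, 9
MNET_STYLE = set("13456789abcdefghijkmnopqrstuwxyz")  # drops 0, l, v, 2

def classify(token: str) -> str:
    """Guess which encoding produced a token from its character set alone."""
    chars = set(token.lower())
    in_rfc, in_mnet = chars <= RFC_STYLE, chars <= MNET_STYLE
    if in_rfc and not in_mnet:
        return "rfc-style"
    if in_mnet and not in_rfc:
        return "mnet-style"
    return "ambiguous"  # length, or out-of-band context, must decide
```

Many tokens will of course remain ambiguous, which is why the length-based and out-of-band mechanisms discussed above still matter.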
REFERENCES [1] http://www.ietf.org/internet-drafts/draft-josefsson-base-encoding-04.txt [2] http://sf.net/projects/libbase32 [3] http://mnet.sf.net/ [4] http://zgp.org/pipermail/p2p-hackers/2001-October/ [5] Gnutella [need URL for SHA1 and base-32 encoding stuff] [6] Bitzi [need URL for specification stuff] [7] CAW [need URL] [8] THEX [need URL] [9] http://zgp.org/pipermail/p2p-hackers/2002-November/000924.html [10] http://zgp.org/pipermail/p2p-hackers/2002-November/000927.html From pfh at mail.csse.monash.edu.au Mon Nov 4 16:55:01 2002 From: pfh at mail.csse.monash.edu.au (Paul Harrison) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: Message-ID: On Fri, 1 Nov 2002, Lucas Gonze wrote: > > But while we're on the subject, '-' and '_' would make these identifiers > more chunkable. for example 18005551212 is harder to transcribe than > 1-800-555-1212. > > It's pure cognitive trickery, there's no extra meaning at all, but it > works. > To have an even more human-friendly coding, how about nonsense words :-) This Python code snippet encodes each byte as a syllable...

consonants = ['b','c','d','f','g','h','j','k','l','m','n','p','r',
              's','t','v','w','x','z','th','ch','sh','st','ts']
vowels = ['a','e','i','o','u','oo','ee','io','au','ie','ai']

def human_name(s):
    # one consonant-vowel syllable per byte
    result = ''
    for letter in s:
        number = ord(letter)
        result = result + consonants[number % len(consonants)] \
                        + vowels[number // len(consonants)]
    return result.capitalize()

(this used to be used in Circle for public keys.
It now uses a graphical squiggle, the idea being a public key is recognizable to people (as opposed to transcribable) so that they can search for a person then check that they aren't an imposter) cheers, Paul Email: pfh@csse.monash.edu.au one ring, no rulers, http://www.csse.monash.edu.au/~pfh/circle/ From bram at gawth.com Mon Nov 4 19:11:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] upcoming meeting sunday Message-ID: Yesterday was the first sunday of the month, so arithmetic being what it is, this next week will be the monthly p2p-hackers meeting 3pm, SONY metreon, san francisco, food court area. Sunday, November 10th. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From lgonze at panix.com Tue Nov 5 08:44:02 2002 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: Message-ID: > (this used to be used in Circle for public keys. It now uses a graphical > squiggle, the idea being a public key is recognizable to people (as > opposed to transcribable) so that they can search for a person then check > that they aren't an imposter) > > cheers, > Paul That little squiggle is a great idea, Paul. original! From zooko at zooko.com Tue Nov 5 10:34:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented little squiggle encoding (was: human-oriented base-32 encoding) In-Reply-To: Message from Lucas Gonze of "Tue, 05 Nov 2002 11:39:41 EST." References: Message-ID: Lucas Gonze wrote: > > That little squiggle is a great idea, Paul. Yes! > original! Not entirely. There've been many ideas posted to the Net throughout the years to represent cryptographic values graphically. I'm afraid I don't have any references at the moment, but I recall butterflies, fractals, human faces, and other such graphical objects being proposed. 
I even recall that an implementation was announced. I *don't* recall "squiggle" as the form, and that sounds like a good idea to me. Kudos to Paul for devising and implementing such sophisticated schemes as Circle and the squiggle. (And the nonsense words, which by the way I recall were a part of the stillborn PGP Phone, first version, back in 1995.) Regards, Zooko From painlord2k at libero.it Fri Nov 8 07:10:02 2002 From: painlord2k at libero.it (Mirco Romanato) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding References: Message-ID: <005a01c28738$a8d588c0$c967fea9@painlord2k> ----- Original Message ----- From: "Lucas Gonze" > That little squiggle is a great idea, Paul. original! Sorry, but it could work in the US and maybe in other English-speaking countries; for users of non-English Latin alphabets it doesn't work. I'm Italian, so for me half of this scheme is unusable. And note that in Italian we always spell words exactly as they are written; exceptions are very, very rare. Mirco From clausen at gnu.org Fri Nov 8 13:31:01 2002 From: clausen at gnu.org (Andrew Clausen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: <005a01c28738$a8d588c0$c967fea9@painlord2k> References: <005a01c28738$a8d588c0$c967fea9@painlord2k> Message-ID: <20021108203729.GC1119@gnu.org> On Fri, Nov 08, 2002 at 04:08:12PM +0100, Mirco Romanato wrote: > Sorry, but it could work in the US and maybe in other English-speaking countries; > for users of non-English Latin alphabets it doesn't work. > > I'm Italian, so for me half of this scheme is unusable. > And note that in Italian we always spell words exactly as they are written; > exceptions are very, very rare. Could you have a different system for each locale? E.g.: Italian would probably have extra vowel sounds like "io" and "ia", and not things like "th" and "gh". It's a way of representing arbitrary data in human-readable form... it's just a user interface issue.
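The per-locale idea amounts to swapping in different consonant and vowel tables. The inventories below are invented for illustration (a real locale table would need a native speaker's judgment); each pair is sized so the product of the two list lengths covers all 256 byte values.

```python
# Illustrative per-locale syllable inventories. len(consonants) * len(vowels)
# must be >= 256 so that every byte value gets a distinct syllable.
LOCALES = {
    "en": (['b','c','d','f','g','h','j','k','l','m','n','p','r',
            's','t','v','w','x','z','th','ch','sh','st','ts'],    # 24
           ['a','e','i','o','u','oo','ee','io','au','ie','ai']),  # 11
    "it": (['b','c','d','f','g','l','m','n','p','r','s','t',
            'v','z','gl','ch'],                                   # 16
           ['a','e','i','o','u','ai','ei','ia','ie','io','iu',
            'ua','ue','uo','au','eu']),                           # 16
}

def human_name(data: bytes, locale: str = "en") -> str:
    consonants, vowels = LOCALES[locale]
    assert len(consonants) * len(vowels) >= 256
    return "".join(consonants[b % len(consonants)]
                   + vowels[b // len(consonants)]
                   for b in data).capitalize()
```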
Cheers, Andrew From bram at gawth.com Sat Nov 9 16:49:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] reminder: p2p-hackers meeting tomorrow Message-ID: Remember, there will be a p2p-hackers meeting tomorrow, sunday, at 3pm in the metreon. I've got some really interesting exporting logins stuff to talk about, as well as some new BitTorrent developments. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bradneuberg at yahoo.com Sat Nov 9 17:29:01 2002 From: bradneuberg at yahoo.com (Brad Neuberg) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Adaptive P2P Network Library? Message-ID: <20021110012840.2391.qmail@web14107.mail.yahoo.com> Is anyone aware of any libraries or work that allow for adaptive P2P bandwidth usage? For example, imagine a "smart" networking library that does the following: *It allows you to state that your p2p network "service" (which could be using many different ports over time, TCP or UDP, etc.) will only use 40% of the available bandwidth on your network connection. *It allows you to designate your p2p network "service" as less important than other applications communicating on the network, such as web browsing, etc.; the smart networking library then informs you when there is low network traffic being communicated, allowing you to start performing p2p communications. As soon as other applications start communicating your app would then throttle down. This would prevent a p2p app from slowing down your other network applications. *It somehow "watches" the amount of traffic on your local network as a whole, and allows you to set a percentage of the total amount of traffic the p2p network service can generate on the local network as a whole during high and low bandwidth usage. Here's an example. Imagine a standard p2p file sharing application that is using this smart networking library. 
It is configured to use up to 50% of the local PC's bandwidth during peak local usage (i.e. while other apps on the local PC are also using bandwidth). When other apps on the local PC are not using any bandwidth, it is given permission to exhaust the local bandwidth on the PC to serve files to other clients. It is also configured to be a "friendly" citizen on the local network, and is therefore aware of high and low usages of bandwidth on the local area network as a whole. Is this possible? Thanks, Brad Neuberg bkn3@columbia.edu From wesley at felter.org Sun Nov 10 12:34:01 2002 From: wesley at felter.org (Wes Felter) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Adaptive P2P Network Library? In-Reply-To: <20021110012840.2391.qmail@web14107.mail.yahoo.com> Message-ID: on 11/9/02 7:28 PM, Brad Neuberg at bradneuberg@yahoo.com wrote: > Is anyone aware of any libraries or work that allow > for adaptive P2P bandwidth usage? For example, > imagine a "smart" networking library that does the > following: > > *It allows you to state that your p2p network > "service" (which could be using many different ports > over time, TCP or UDP, etc.) will only use 40% of the > available bandwidth on your network connection. The obvious way to do this is to use a QoS-enabled network stack, which most computers don't have. To do it with no OS help you would first need to figure out how much bandwidth is available, which is tricky in the general case. You can probably get a good enough estimate by writing a bandwidth monitor and calculating the maximum observed transfer rate over a rolling n-second window. > *It allows you to designate your p2p network "service" > as less important than other applications > communicating on the network, such as web browsing, > etc.; the smart networking library then informs you > when there is low network traffic being communicated, > allowing you to start performing p2p communications. 
> As soon as other applications start communicating your > app would then throttle down. This would prevent a > p2p app from slowing down your other network > applications. Use QoS or write a bandwidth monitor. > *It somehow "watches" the amount of traffic on your > local network as a whole, and allows you to set a > percentage of the total amount of traffic the p2p > network service can generate on the local network as a > whole during high and low bandwidth usage. You could put the NIC in promiscuous mode and use a bandwidth monitor, but this will increase CPU utilization. And that won't even work for anything but a traditional Ethernet (which is becoming increasingly rare in a world where for 39 cents more you can value-size your router to also include a switch). Wes Felter - wesley@felter.org - http://felter.org/wesley/ From pfh at mail.csse.monash.edu.au Sun Nov 10 15:07:01 2002 From: pfh at mail.csse.monash.edu.au (Paul F Harrison) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Adaptive P2P Network Library? In-Reply-To: <20021110012840.2391.qmail@web14107.mail.yahoo.com> Message-ID: On Sat, 9 Nov 2002, Brad Neuberg wrote: > Is anyone aware of any libraries or work that allow > for adaptive P2P bandwidth usage? For example, > imagine a "smart" networking library that does the > following: > > *It allows you to state that your p2p network > "service" (which could be using many different ports > over time, TCP or UDP, etc.) will only use 40% of the > available bandwidth on your network connection. > > *It allows you to designate your p2p network "service" > as less important than other applications > communicating on the network, such as web browsing, > etc.; the smart networking library then informs you > when there is low network traffic being communicated, > allowing you to start performing p2p communications. > As soon as other applications start communicating your > app would then throttle down. 
This would prevent a > p2p app from slowing down your other network > applications. > I'm not sure how to allocate a % of bandwidth to your app with IP, however there is a way to make it lower priority than other network use. IP indicates network congestion by dropping packets. TCP responds to this by sending less packets at a time, or by increasing the time between packets sent. With UDP however, this sort of thing is up to you. It's possible to completely lock up a network by blasting it with UDP packets (from personal experience ;-) ). Any application that makes serious use of UDP has to implement its own back off algorithm. So to make your network usage low priority, you could use UDP and make sure you back off *lots* when packets get dropped. Libraries that use UDP might allow you to set this as a parameter... cheers, Paul Harrison Email: pfh@yoyo.cc.monash.edu.au Web: http://yoyo.cc.monash.edu.au/~pfh/ From greg at electricrain.com Sun Nov 10 23:10:01 2002 From: greg at electricrain.com (Gregory P. Smith) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: References: Message-ID: <20021111070906.GA14258@zot.electricrain.com> > The rationale for libbase32 is different -- it is to represent arbitrary > sequences of octets in a form that is as convenient as possible for human users > to manipulate. In particular, libbase32 was created in order to serve the Mnet > project [3], where 40-octet cryptographic values are encoded into URIs for > humans to manipulate. Anticipated uses of these URIs include cut-and-paste, > text editing (e.g. in HTML files), manual transcription via a keyboard, manual > transcription via pen-and-paper, vocal transcription over phone or radio, etc. Another idea: take the approach that ISBN numbers and other existing common "not quite easy to send through a human accurately" codes use. Include some additional ECC characters in the sequence to catch & correct for mistyped digits. 
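For concreteness, the ISBN-10 check mentioned here can be computed as below. Note it only *detects* a single mistyped digit or an adjacent transposition; actually *correcting* errors would require more redundancy than one check character.

```python
def isbn10_check_digit(digits9: str) -> str:
    # ISBN-10: weight the nine payload digits by position 1..9 and take
    # the sum mod 11; the eleventh value, 10, is written as the extra
    # symbol 'X' (hence the "base11 [0-9X]" digit).
    total = sum((i + 1) * int(d) for i, d in enumerate(digits9))
    check = total % 11
    return "X" if check == 10 else str(check)

# e.g. the payload "030640615" yields check digit "2",
# giving the full ISBN 0-306-40615-2.
```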
(useless trivia: ISBN numbers use 0-9 for the main digits and a base-11 [0-9X] digit for the final ECC checksum) From zooko at zooko.com Mon Nov 11 05:10:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: Message from "Gregory P. Smith" of "Sun, 10 Nov 2002 23:09:06 PST." <20021111070906.GA14258@zot.electricrain.com> References: <20021111070906.GA14258@zot.electricrain.com> Message-ID: Greg Smith wrote: > > Another idea: take the approach that ISBN numbers and other existing > common "not quite easy to send through a human accurately" codes use. > Include some additional ECC characters in the sequence to catch & > correct for mistyped digits. > > (useless trivia: ISBN numbers use 0-9 for the main digits and a base-11 > [0-9X] digit for the final ECC checksum) Greg: that's a good idea! *So* good, in fact, that I've already added a "TODO" to my DESIGN doc [1] about it. (See the first item at the end under "NEEDED TO ADD".) Great minds think alike! And so do ours! Regards, Zooko [1] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD&content-type=text/vnd.viewcvs-markup From jsr at dit.upm.es Mon Nov 11 05:28:01 2002 From: jsr at dit.upm.es (Joaquin Salvachua) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Adaptive P2P Network Library? Message-ID: Hello, I have developed, with the help of some students, a pseudo-TCP-over-UDP socket library with bandwidth control. It is at: https://sourceforge.net/projects/progtcp/ Regards, Joaquin -- ----------------------------------------------------------- Joaquin Salvachua tel: +34 91 549 57 00 x.367 Associated Professor +34 91 549 57 62 x.367 dpt. Telematica E.T.S.I.
Telecomunicacion Ciudad Universitaria S/N fax: +34 91 336 73 33 E-28040 MADRID SPAIN mailto: jsalvachua@dit.upm.es // http://www.dit.upm.es/~jsr ------------------------------------------------------------ From hishigh at 163.com Tue Nov 12 04:45:02 2002 From: hishigh at 163.com (hishigh) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] questions on p2p QoS Message-ID: <3DCF1A33.00002D.14435@bj222.163.com> I heard the term "p2p QoS" recently, but my colleagues and I argue a lot over this matter. I think p2p QoS focuses on providing DiffServ during the transfer procedure, since there are many kinds of terminals, from 33k/56k modems to ADSL, and we can provide different services when transferring. My colleagues argue that p2p QoS is instead focused on scheduling in the servlet nodes for the different kinds of service requests, for the reason that there is little room for improvement in the transfer link. What do you think about it? Thanks a lot. Yunfei Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20021112/5206a61f/attachment.htm From wege at acm.org Wed Nov 13 13:43:01 2002 From: wege at acm.org (Chris Wege) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Wanted: Talk about P2P in Stuttgart, Germany Message-ID: <3DD2BB8D.2010105@acm.org> Hi Folks, I am looking for someone in the Stuttgart, Germany area who would like to give a talk about P2P on the 28th of November at the Java User Group Stuttgart (www.jugs.org). Preferably in German. Anyone? Best regards, Christian Wege -- wege@acm.org http://www.purl.org/net/wege From zooko at zooko.com Thu Nov 14 12:23:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 Message-ID: Why do Bitzi and THEX use Tiger instead of SHA-1 as the basis for their tree hashes?
Much as I admire Tiger's inventors Ross Anderson and Eli Biham (and I do -- a lot!), all the benchmarks I have seen [1, 2] say that Tiger is about half as fast as SHA-1. (It isn't clear what size Tiger output has been benchmarked.) SHA-1 also has the advantages of being a U.S. federal standard and the de facto standard cryptographic hash among cryptographers. (MD5 remains the de facto standard cryptographic hash among non-cryptographers, presumably because of the command-line implementation named "md5sum".) Thanks in advance for your replies. I may also post queries along these lines to crypto groups, in which case I'll summarize what I learn to the p2p-hackers list. Regards, Zooko [1] http://www.eskimo.com/~weidai/benchmarks.html [2] http://botan.randombit.net/bmarks.php From gojomo at usa.net Thu Nov 14 13:19:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <00be01c28c23$52680d10$640a000a@golden> Bitzi was already taking the SHA1 hash of the full-file, and then wanted a second hash for (1) robustness of our catalog against the (hypothetical future) discovery of problems in SHA1 (2) incremental and subrange verifications. To have used SHA1 again, as the tree basis, would have made both hashes dependent on the same algorithm, and thus potentially fall to the same theoretical breakthroughs. Anderson and Biham emphasize in their paper that Tiger's derivation is different than that of the MD4/MD5/SHA1 family of hash functions -- of which both MD4 and MD5 have already been compromised, to some degree. They also suggest that Tiger will calculate much more efficiently on 64-bit processors, and that it is already competitive with MD5. (To that last claim, I can only presume they were using a reference, completely unoptimized MD5 implementation.) They also note that their reference code has not been fully optimized for 32-bit machines. 
I suspect that the 2x performance gap between SHA1 and Tiger in the comparisons you cite is mostly due to the fact that the highly-used SHA1 code has been highly tuned, and if/when the Tiger code is similarly tuned, it will do much better. If Anderson & Biham's comments about 64-bit processors are correct, then we might expect Tiger to outperform SHA1, when both are equivalently optimized, on future 64-bit processors. So for Bitzi, using SHA1 for the full-file hash was an easy choice -- for immediate accessibility to the widest audience -- while using Tiger for the secondary, more exotic tree hash gained (1) algorithm diversity; (2) an extra 32 bits of hash (192 vs 160); (3) a potential speed *improvement* over SHA1, if/when Tiger is equivalently optimized or 64-bit processors become the norm. In THEX, any algorithm may be specified to construct the tree, but I think the existing examples and work is biased towards Tiger, in order to (1) preserve potential interoperability with the existing Bitzi code and catalog; (2) enjoy the long-term benefits in the someday optimized/64-bit world. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! ----- Original Message ----- From: "Zooko" To: Sent: Thursday, November 14, 2002 12:18 PM Subject: [p2p-hackers] Tiger vs. SHA-1 > > Why do Bitzi and THEX use Tiger instead of SHA-1 as the basis for their tree > hashes? > > Much as I admire Tiger's inventors Ross Anderson and Eli Biham (and I do -- a > lot!), all the benchmarks I have seen [1, 2] say that Tiger is about half as > fast as SHA-1. (It isn't clear what size Tiger output has been benchmarked.) > > SHA-1 also has the advantages of being a U.S. federal standard and the de facto > standard cryptographic hash among cryptographers. 
(MD5 remains the de facto > standard cryptographic hash among non-cryptographers, presumably because of the > command-line implementation named "md5sum".) > > Thanks in advance for your replies. I may also post queries along these lines > to crypto groups, in which case I'll summarize what I learn to the p2p-hackers > list. > > Regards, > > Zooko > > [1] http://www.eskimo.com/~weidai/benchmarks.html > [2] http://botan.randombit.net/bmarks.php > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From justin at chapweske.com Thu Nov 14 13:24:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: <00be01c28c23$52680d10$640a000a@golden> Message-ID: <3DD41457.8080809@chapweske.com> Ditto. Gordon Mohr wrote: > Bitzi was already taking the SHA1 hash of the full-file, and then > wanted a second hash for (1) robustness of our catalog against > the (hypothetical future) discovery of problems in SHA1 (2) > incremental and subrange verifications. > > To have used SHA1 again, as the tree basis, would have made both > hashes dependent on the same algorithm, and thus potentially > fall to the same theoretical breakthroughs. > > Anderson and Biham emphasize in their paper that Tiger's derivation > is different than that of the MD4/MD5/SHA1 family of hash functions -- > of which both MD4 and MD5 have already been compromised, to some > degree. > > They also suggest that Tiger will calculate much more efficiently > on 64-bit processors, and that it is already competitive with MD5. > (To that last claim, I can only presume they were using a reference, > completely unoptimized MD5 implementation.) They also note that > their reference code has not been fully optimized for 32-bit > machines. 
> > I suspect that the 2x performance gap between SHA1 and Tiger > in the comparisons you cite is mostly due to the fact that > the highly-used SHA1 code has been highly tuned, and if/when > the Tiger code is similarly tuned, it will do much better. > > If Anderson & Biham's comments about 64-bit processors are > correct, then we might expect Tiger to outperform SHA1, when > both are equivalently optimized, on future 64-bit processors. > > So for Bitzi, using SHA1 for the full-file hash was an easy > choice -- for immediate accessibility to the widest audience -- > while using Tiger for the secondary, more exotic tree hash gained > (1) algorithm diversity; (2) an extra 32 bits of hash (192 vs > 160); (3) a potential speed *improvement* over SHA1, if/when > Tiger is equivalently optimized or 64-bit processors become the > norm. > > In THEX, any algorithm may be specified to construct the tree, > but I think the existing examples and work is biased towards > Tiger, in order to (1) preserve potential interoperability with > the existing Bitzi code and catalog; (2) enjoy the long-term > benefits in the someday optimized/64-bit world. > > - Gojomo > ____________________ > Gordon Mohr bitzi.com> Bitzi CTO . . . describe and discover files of every kind. > _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! > > > ----- Original Message ----- > From: "Zooko" > To: > Sent: Thursday, November 14, 2002 12:18 PM > Subject: [p2p-hackers] Tiger vs. SHA-1 > > > >>Why do Bitzi and THEX use Tiger instead of SHA-1 as the basis for their tree >>hashes? >> >>Much as I admire Tiger's inventors Ross Anderson and Eli Biham (and I do -- a >>lot!), all the benchmarks I have seen [1, 2] say that Tiger is about half as >>fast as SHA-1. (It isn't clear what size Tiger output has been benchmarked.) >> >>SHA-1 also has the advantages of being a U.S. federal standard and the de facto >>standard cryptographic hash among cryptographers. 
(MD5 remains the de facto >>standard cryptographic hash among non-cryptographers, presumably because of the >>command-line implementation named "md5sum".) >> >>Thanks in advance for your replies. I may also post queries along these lines >>to crypto groups, in which case I'll summarize what I learn to the p2p-hackers >>list. >> >>Regards, >> >>Zooko >> >>[1] http://www.eskimo.com/~weidai/benchmarks.html >>[2] http://botan.randombit.net/bmarks.php >> >>_______________________________________________ >>p2p-hackers mailing list >>p2p-hackers@zgp.org >>http://zgp.org/mailman/listinfo/p2p-hackers > > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From mujtaba at asu.edu Fri Nov 15 05:27:01 2002 From: mujtaba at asu.edu (Mujtaba Khambatti) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Need some comments Message-ID: Hi all.. I have published a paper titled: "Peer-to-Peer Communities: Formation and Discovery" http://www.public.asu.edu/%7emujtaba/Articles%20and%20Papers/pdcs-iasted-02. pdf Peer-to-Peer Communities are like interest groups, modeled after human communities and can overlap. They can also exist without anyone knowing about their existence. Communities are created, implicitly when one or more entities claim an interest in the same topic. Our work focuses on efficient methods to discover the formation of these self-configuring communities. We investigate the behavior of randomly created communities and model the complexity of discovery algorithms. Please send me your comments - they will be greatly appreciated. thanks, Mujtaba =========================================== Mujtaba Khambatti http://www.public.asu.edu/~mujtaba Work Address: ASU, Tempe, AZ 85287-5406. 
Work Number : (480) 965-2737 Home Number : (480) 967-6568 =========================================== From bram at gawth.com Fri Nov 15 08:34:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <00be01c28c23$52680d10$640a000a@golden> Message-ID: Gordon Mohr wrote: > Bitzi was already taking the SHA1 hash of the full-file, and then > wanted a second hash for (1) robustness of our catalog against > the (hypothetical future) discovery of problems in SHA1 Well geeze, why not do both sha1 and tiger and xor them if you really care that much? Trying to design crypto protocols with the assumption that you don't trust your primitives quickly gets completely ridiculous. > I suspect that the 2x performance gap between SHA1 and Tiger > in the comparisons you cite is mostly due to the fact that > the highly-used SHA1 code has been highly tuned, and if/when > the Tiger code is similarly tuned, it will do much better. I find that dubious. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at usa.net Fri Nov 15 09:33:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <00ed01c28ccc$e8ae0190$640a000a@golden> Bram Cohen writes: > Gordon Mohr wrote: > > > Bitzi was already taking the SHA1 hash of the full-file, and then > > wanted a second hash for (1) robustness of our catalog against > > the (hypothetical future) discovery of problems in SHA1 > > Well geeze, why not do both sha1 and tiger and xor them if you really care > that much? We really want to track them independently, as a guard against the worrisome threat: that one or the other is discovered to be weak or otherwise manipulable. Further, XORing them would make it impossible to use the TigerTree for subrange verification without also calculating the SHA1. 
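Bram's "xor them" suggestion is mechanical enough to sketch. This is an illustrative sketch only, not anything Bitzi ran: Tiger is not in Python's standard hashlib, so SHA-256 truncated to SHA-1's length stands in here for the second algorithm.

```python
import hashlib

def xor_digests(data):
    # Sketch of the "xor them" idea: hash the data with two independent
    # algorithms and XOR the results into one combined value.  Tiger is
    # not available in hashlib, so SHA-256 truncated to SHA-1's 20 bytes
    # stands in as the second algorithm for illustration.
    a = hashlib.sha1(data).digest()           # 20 bytes
    b = hashlib.sha256(data).digest()[:20]    # stand-in second hash
    return bytes(x ^ y for x, y in zip(a, b))
```

Gordon's two objections are visible in the shape of the function: the component digests are no longer tracked independently, and the combined value cannot be checked without computing both hashes.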
> Trying to design crypto protocols with the assumption that you don't trust > your primitives quickly gets completely ridiculous. Historically, hash algorithms get broken. The two most immediate antecedents of SHA1 -- MD4 and MD5 -- have been shown to be weaker than they were designed to be. (MD4, so weak that anyone with a desktop machine can create desired preimages in a short amount of time; MD5, weak enough to be unsuitable for some applications and generally frowned upon for new work.) Ross Anderson and Eli Biham thought this suggested consideration of new hash functions was advisable. From their Tiger paper (apparently written in 1995 or early 1996): These attacks cast doubt on the security of the other members of these families. One may only speculate at how long each function will remain unbroken; however it seems prudent to start work now on replacements. (Their paper does not even refer to the most troublesome attacks on MD5.) So, I defer to Professors Anderson and Biham on the issue of whether it is "completely ridiculous" to consider the possibility your cryptographic hash algorithm will someday be untrustworthy. Would you design a "crypto protocol" with no facility for changing hash functions, if ever necessary? That seems a minority viewpoint in security design. Further, your statement about "trying to design crypto protocols" is nonsensical; Bitzi's application is not really a "crypto protocol". It is a long-lived, shared reference catalog. Ideally, the Bitzi datadumps will be useful for decades if not centuries. Its "primary keys" should thus be as robust as possible against theoretical breakthroughs far beyond your imagination. We are not at the end of science and mathematics, with your domain expertise the pinnacle of learning.
With two primary keys, each an independent strong hash, as long as any breakthrough only compromises one hash at a time, there will be a window of opportunity (before the second hash is broken) with which to cross-reference old data to new stronger hashes, with secure timestamps. All catalogued data may not make the transition, but the situation is better than relying on a single hash. > > I suspect that the 2x performance gap between SHA1 and Tiger > > in the comparisons you cite is mostly due to the fact that > > the highly-used SHA1 code has been highly tuned, and if/when > > the Tiger code is similarly tuned, it will do much better. > > I find that dubious. Based on what? In working with different freely-available SHA1 implementations in C/C++, we saw differences, on the order of 2x, in their hashing speed. Anderson and Biham, no slouches, compared their (lightly optimized) Tiger reference implementation against MD5, and found Tiger faster. (!) And yet the sites Zooko referenced suggested MD5 was 4x faster than Tiger. What was the difference between MD5 implementations? The level of code optimization (by either hand tuning or compiler logic). Bram, what is more alarming than the fact that your knowledge is limited, is that you don't even realize how limited it is. The universe does not end at the walls which encircle your current level of understanding. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From bram at gawth.com Fri Nov 15 15:53:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <00ed01c28ccc$e8ae0190$640a000a@golden> Message-ID: Gordon Mohr wrote: > Would you design a "crypto protocol" with no facility for > changing hash functions, if ever necessary? That seems a > minority viewpoint in security design. 
Primitives can be changed with an increment in the major version number, the micromanaging of algorithms in pgp and ssl has proven to do nothing but make implementation more difficult and cause compatibility headaches. > Further, your statement about "trying to design crypto protocols" > is nonsensical; Bitzi's application is not really a "crypto > protocol". It uses crypto, ergo it is a crypto protocol. > > > I suspect that the 2x performance gap between SHA1 and Tiger > > > in the comparisons you cite is mostly due to the fact that > > > the highly-used SHA1 code has been highly tuned, and if/when > > > the Tiger code is similarly tuned, it will do much better. > > > > I find that dubious. > > Based on what? There is only one measure of time, and that's minutes and seconds. You're making performance claims about something you've already deployed based on sheer guesswork. > Bram, what is more alarming than the fact that your knowledge > is limited, is that you don't even realize how limited it is. > The universe does not end at the walls which encircle your current > level of understanding. I have no other ways of saying this. Fuck off. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at usa.net Fri Nov 15 17:26:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <022201c28d0f$109f5ae0$640a000a@golden> Bram Cohen writes: > Gordon Mohr wrote: > > Would you design a "crypto protocol" with no facility for > > changing hash functions, if ever necessary? That seems a > > minority viewpoint in security design. > > Primitives can be changed with an increment in the major version number, > the micromanaging of algorithms in pgp and ssl has proven to do nothing > but make implementation more difficult and cause compatibility headaches. An interesting, but decidedly minority, opinion. 
People who press the envelope with protocols appreciate that flexibility to adopt custom or stronger swap-ins. People who professionally design protocols, for use by multiple projects and organizations, now regularly include parameterizable algorithms. Do you expect your idiosyncratic opinion, on the superiority of version numbers as a means of constraining algorithm choices, is going to be reflected in most (or indeed any) of the protocols that our fellow p2p-hackers will be implementing? Which ones? Going back to Zooko's question -- why did Bitzi utilize Tiger -- and our strategy of having a backup algorithm for SHA1, of what use is your preference for version numbers? If we simply trusted SHA1, and in 2010, SHA1 becomes trivially manipulable due to a theoretical breakthrough, major version numbers would just tell us what data we have to throw out. It wouldn't save valuable assertions, as having a fallback algorithm does. > > Further, your statement about "trying to design crypto protocols" > > is nonsensical; Bitzi's application is not really a "crypto > > protocol". > > It uses crypto, ergo it is a crypto protocol. Except that Bitzi's not a "protocol", it's a database. > > > > I suspect that the 2x performance gap between SHA1 and Tiger > > > > in the comparisons you cite is mostly due to the fact that > > > > the highly-used SHA1 code has been highly tuned, and if/when > > > > the Tiger code is similarly tuned, it will do much better. > > > > > > I find that dubious. > > > > Based on what? > > There is only one measure of time, and that's minutes and seconds. You're > making performance claims about something you've already deployed based on > sheer guesswork. No, I've seen 2x to 4x differences in hashing code, same algorithm, based on how much effort has been devoted to optimizing that particular code. There's no guesswork involved there. Haven't you seen such differences in your experience?
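Measured numbers settle this kind of argument faster than dueling conjectures. A minimal throughput harness, as a sketch: it assumes Python's hashlib (which exposes the underlying library's tuned implementations), and Tiger is typically not among the algorithms a stock build offers.

```python
import hashlib
import time

def throughput_mb_s(algorithm, payload=b"x" * (1 << 20), rounds=20):
    # Hash `rounds` copies of a 1 MiB payload and report MB/s, so two
    # implementations (or algorithms) can be compared on equal terms
    # instead of via speculation about how tuned each one is.
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return (rounds * len(payload)) / elapsed / 1e6

# e.g. compare: throughput_mb_s("sha1") vs. throughput_mb_s("md5")
```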
The authors of the Tiger code say that little effort has been devoted to optimizing their code, especially for 32-bit processors. Do you not believe them? Meanwhile, highly-used -- and thus highly-optimized -- SHA1 code is common. People promulgating libraries care very much about how fast their SHA1 implementations are, as they are highly likely to be used and benchmarked, while they typically only care that their Tiger code gives correct results, at least until its use becomes more prevalent. In almost any benchmark now available, the SHA1 code is likely to be near optimal, while the Tiger code is far from it. Haven't you noticed this relationship between code age/prevalence and level of optimization before? Have you been coding under a rock? > > Bram, what is more alarming than the fact that your knowledge > > is limited, is that you don't even realize how limited it is. > > The universe does not end at the walls which encircle your current > > level of understanding. > > I have no other ways of saying this. Fuck off. Interesting. You can mock another person's judgement as "ridiculous", without any support beyond the brashness with which you speak, but when it is suggested that your own vision and experience is limited, you have no response but profanity. That's poor social protocol design. - Gojomo From bram at gawth.com Fri Nov 15 20:02:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <022201c28d0f$109f5ae0$640a000a@golden> Message-ID: Gordon Mohr wrote: > Bram Cohen writes: > > > > Primitives can be changed with an increment in the major version number, > > the micromanaging of algorithms in pgp and ssl has proven to do nothing > > but make implementation more difficult and cause compatibility headaches. > > An interesting, but decidedly minority, opinion. People who > press the envelope with protocols appreciate that flexibility > to adopt custom or stronger swap-ins.
Yeah, well, there's a decided lack of competence in the field. Can anyone here come up with a *single* instance in which a parameterizable algorithm saved someone's ass? Even MD5 is to this day unbroken, and 3DES has remained completely solid. There are multitudes of cases of errors in protocol design, which of course are made more likely by parameterizability, and yet even more in implementation, which parameterizability exacerbates even more, but not a single break in algorithms, which is the only thing parameterizability helps with. > Do you expect your idiosyncratic opinion, on the superiority of version > numbers as a means of constraining algorithm choices, is going to be > reflected in most (or indeed any) of the protocols that our fellow > p2p-hackers will be implementing? Which ones? No, I expect incompetence to continue to reign supreme. But my 'idiosyncratic' opinion has been followed in my own protocol, BitTorrent, which, unlike all but a handful of other p2p protocols, is widely deployed. That correlation is not coincidental. > > There is only one measure of time, and that's minutes and seconds. You're > > making performance claims about something you've already deployed based on > > sheer guesswork. > > No, I've seen 2x to 4x differences in hashing code, same > algorithm, based on how much effort has been devoted to > optimizing that particular code. There's no guesswork > involved there. Haven't you seen such differences in your > experience? Some implementations vary by that large a factor, but you're using that as a reason why the absolute best one might be that much better than the current best one, which by your own admission is rank speculation. If you didn't consider performance when selecting tiger, or thought other criteria are more important, then say so, but don't engage in wild speculation as if it's fact. > > > Bram, what is more alarming than the fact that your knowledge > > > is limited, is that you don't even realize how limited it is.
> > > The universe does not end at the walls which encircle your current > > > level of understanding. > > > > I have no other ways of saying this. Fuck off. > > Interesting. You can mock another person's judgement as > "ridiculous", without any support beyond the brashness > with which you speak, but when it is suggested that your own > vision and experience is limited, you have no response > but profanity. > > That's poor social protocol design. And you can take your pretension and shove it, too. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From kevin at atkinson.dhs.org Sat Nov 16 06:44:01 2002 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? Message-ID: I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power of 2. Is there any harm in dropping the last 4 bytes to make it 16 other than increasing the chance of collision which will still be too small to worry about? Thanks in advance. -- http://kevin.atkinson.dhs.org From zooko at zooko.com Sat Nov 16 07:14:02 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? In-Reply-To: Message from Kevin Atkinson of "Sat, 16 Nov 2002 09:43:17 EST." References: Message-ID: > I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power > of 2. > > Is there any harm in dropping the last 4 bytes to make it 16 other than > increasing the chance of collision which will still be too small to worry > about? The "Birthday Paradox" says that if you want to generate a collision by randomly tossing balls into X buckets, you will have to toss approximately sqrt(X) balls before you have the first collision. So if a hash function has a 128-bit output, it takes only about 2^64 work to generate a collision.
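The sqrt(X) rule is easy to check on a toy scale by truncating a hash until collisions become findable. A sketch of the balls-into-buckets experiment, with SHA-1 truncated to 16 bits so the first collision arrives quickly (the helper name is invented for illustration):

```python
import hashlib

def tosses_until_collision(n_bits=16, max_tries=1 << 17):
    # Toss "balls" (counter values) into 2**n_bits buckets, where each
    # bucket is a truncated SHA-1 digest.  The birthday paradox predicts
    # the first collision after roughly sqrt(2**n_bits) tosses.
    seen = set()
    for i in range(max_tries):
        bucket = hashlib.sha1(str(i).encode()).digest()[: n_bits // 8]
        if bucket in seen:
            return i + 1  # number of tosses at the first collision
        seen.add(bucket)
    return None

# With n_bits=16 there are 65536 buckets, so a collision typically
# appears after a few hundred tosses, not tens of thousands.
```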
Indeed, I'm laying plans to take the advice of people like Ross Anderson (in "Security Engineering") who say that 160 bits is too small for the long run, and you should start moving to even larger sizes. It seems ridiculous at first glance that an attacker might do 2^80 work, but when you try to weigh in uncertain factors like the following, 160 bits doesn't seem so untouchable. * faster computers (including special purpose hardware), * the proliferation of hash users (many uses of hashes "share" the hash space so that all computers, all networks, all protocols on the planet are vulnerable to collisions with one another), and most uncertainly of all * theoretical advances that weaken the hash function Regards, Zooko From agl at imperialviolet.org Sat Nov 16 08:59:01 2002 From: agl at imperialviolet.org (Adam Langley) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? In-Reply-To: References: Message-ID: <20021116160036.GA3857@imperialviolet.org> On Sat, Nov 16, 2002 at 10:09:35AM -0500, Zooko wrote: > > > I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power > > of 2. > > > > Is there any harm in dropping the last 4 bytes to make it 16 other than > > increasing the chance of collision which will still be too small to worry > > about? > > Indeed, I'm laying plans to take the advice of people like Ross Anderson (in > "Security Engineering") who say that 160 bits is too small for the > long run, and you should start moving to even larger sizes. Of course, SHA isn't just 160 bits long. There are 256, 384 and 512 bit versions: http://csrc.nist.gov/cryptval/shs.html (though SHA-384 is a truncated version of SHA-512) -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org (+44) (0)7986 296753 PGP: 9113 256A CC0F 71A6 4C84 5087 CDA5 52DF 2CB6 3D60 From ingo at fargonauten.de Sat Nov 16 10:47:02 2002 From: ingo at fargonauten.de (ingo@fargonauten.de) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs.
SHA-1 In-Reply-To: <022201c28d0f$109f5ae0$640a000a@golden> References: <022201c28d0f$109f5ae0$640a000a@golden> Message-ID: <20021116175656.GA6506@fargonauten.de> On Fri, Nov 15, 2002 at 05:25:34PM -0800, Gordon Mohr wrote: > An interesting, but decidedly minority, opinion. People who > press the envelope with protocols appreciate that flexibility > to adopt custom or stronger swap-ins. Errm, sorry, but IMHO it's not as clear-cut as you make it appear. I might have gotten him wrong, of course, so any errors in this claim are mine, but I'm pretty sure that at HAL2001, Phil Zimmermann claimed that allowing the choice between SHA1 and RIPEMD160 in OpenPGP was unfortunate and actually weakens security. Subsequently, there have been several discussions on the OpenPGP IETF mailing-list where respected developers disputed the value of having so many algorithm choices in the OpenPGP protocol. Please refer to that list's archive for detailed information. So, there seems to be some disagreement on the value of choice in crypto protocols amongst practitioners. bye -- http://fargonauten.de/ingo PGP: 3187 4DEC 47E6 1B1E 6F4F 57D4 CD90 C164 34AD CE5B From bert at akamail.com Sat Nov 16 12:37:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: <022201c28d0f$109f5ae0$640a000a@golden> <20021116175656.GA6506@fargonauten.de> Message-ID: <3DD6ADDA.6020904@akamail.com> ingo@fargonauten.de wrote: >On Fri, Nov 15, 2002 at 05:25:34PM -0800, Gordon Mohr wrote: > > >>An interesting, but decidedly minority, opinion. People who >>press the envelope with protocols appreciate that flexibility >>to adopt custom or stronger swap-ins. >> >> > >Errm, sorry, but IMHO it's not as clear-cut as you make it appear.
> >I might have gotten him wrong, of course, so any errors in this claim >are mine, but I'm pretty sure that at HAL2001, Phil Zimmermann claimed >that allowing the choice between SHA1 and RIPEMD160 in OpenPGP was >unfortunate and actually weakens security. > There's a difference between allowing choice, and allowing some perhaps unfortunate choices. I'd be surprised if any security person with a clue, and any knowledge of the field's history, would suggest that support of multiple encryption and hashing schemes in SSL and related protocols/apps is a bad practice. Then again I'm sure there are a few out there, but as Gordon states -- it's a minority viewpoint. > Subsequently, there have >been several discussions on the OpenPGP IETF mailing-list where >respected developers disputed the value of having so many algorithm >choices in the OpenPGP protocol. Please refer to that lists archive >for detailed information. > Well, developers are always looking for excuses to be lazy, and maybe they did get a bit carried away in OpenPGP...but again, complaining about "too many algorithms" is not the same as denying the utility of choice. I would nevertheless be interested in reading the discussions, but had no luck digging them out of the archives. Can you be more specific? From ingo at fargonauten.de Sat Nov 16 13:39:01 2002 From: ingo at fargonauten.de (ingo@fargonauten.de) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <3DD6ADDA.6020904@akamail.com> References: <022201c28d0f$109f5ae0$640a000a@golden> <20021116175656.GA6506@fargonauten.de> <3DD6ADDA.6020904@akamail.com> Message-ID: <20021116213703.GA7346@fargonauten.de> On Sat, Nov 16, 2002 at 12:43:06PM -0800, Bert wrote: > There's a difference between allowing choice, and allowing some perhaps > unfortunate choices. 
I'd be surprised if any security person with a > clue, and any knowledge of the field's history, would suggest that > support of multiple encryption and hashing schemes in SSL and related > protocols/apps is a bad practice. Then again I'm sure there are a few > out there, but as Gordon states -- it's a minority viewpoint. Quoting from memory, he differentiated a bit on encryption and message digests. For encryption algorithms the reasoning is easy and was mostly motivated by keeping implementations simple, not really security-wise. There are differences between long-lived (e.g., OpenPGP encrypted documents and probably the kind of application we're talking about here) and short-lived applications. In the long-lived situation once you introduced an algorithm, you have to support it forever, to be able to decrypt your old documents. In SSL, if you don't support it anymore, don't offer it on negotiation. So, what might be an advantage with SSL, could be a serious problem in practical implementation with other applications. However, that's just for completeness' sake; I don't think it applies to the current situation. For message digests, the situation is different. Offering two digest algorithms might not improve security at all, depending on the rest of the protocol. In OpenPGP it doesn't, because the information which algorithm to use is protected by that same algorithm. So, if an attacker breaks the digest and wants to change the data, he might just as well change the "digest-type" info as well. So, if SHA1 is broken, it doesn't matter if the data was hashed with TIGER -- don't just change the data, change the metadata as well and you can still make a forgery, even though TIGER wasn't broken! His recommendation to introduce new hashing algorithms was to upgrade the version number of the protocol packets. So, OpenPGPv5 could use SHA-512 by default or something like that.
Presumably, a recipient could infer that for data generated after some date or by some sender, a version4 packet would be illegitimate. Maybe there was also some other assurance involved, I don't remember and can't think of any at the moment. However, it seems remarkably similar to Bram's suggestion. > I would nevertheless be interested in reading the discussions, but > had no luck digging them out of the archives. Can you be more > specific? Hmm, got me. The only interesting thing I can dig up again is the ElGamal type 20 issue and some discussion regarding the introduction of an MDC packet (which outlines some of the issues on when choice might be appropriate, but not the real punchline). Sorry :-( Anyway, the above is what I recall about the discussion. It still makes sense to me, so I hope it's correct ;-) bye -- http://fargonauten.de/ingo PGP: 3187 4DEC 47E6 1B1E 6F4F 57D4 CD90 C164 34AD CE5B From bert at akamail.com Sun Nov 17 09:48:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: <022201c28d0f$109f5ae0$640a000a@golden> <20021116175656.GA6506@fargonauten.de> <3DD6ADDA.6020904@akamail.com> <20021116213703.GA7346@fargonauten.de> Message-ID: <3DD7D7DF.4090706@akamail.com> ingo@fargonauten.de wrote: >For message digests, the situation is different. Offering two digest >algorithms might not improve security at all, depending on the rest of >the protocol. In OpenPGP it doesn't, because the information which >algorithm to use is protected by that same algorithm. So, if an >attacker breaks the digest and wants to change the data, he might just >as well change the "digest-type" info as well. So, if SHA1 is broken, >it doesn't matter if the data was hashed with TIGER -- don't just >change the data, change the metadata as well and you can still make a >forgery, even though TIGER wasn't broken! > Thanks for the clarification.
Yes, if the goal is a long-lived signature, requiring a signature that is valid according to one of many available schemes only improves the odds of a protocol break. But this is exactly why Bitzi requires valid digests from both hashes, not just one or the other. >His recommendation to introduce new hashing algorithms was to upgrade >the version number of the protocol packets. So, OpenPGPv5 could use >SHA-512 by default or something like that. > Hope I'm not going beyond your recollection here, but is the recommendation just to make one method a "default", or is it to make one method exclusive? These are quite different things...If exclusive, then a break in that protocol requires an upgrade that completely pisses away backwards compatibility. We sometimes forget that upgrading is rarely so simple (think Y2K). At least in the current design, if one hash is broken, newer implementations can simply exclude it and any new signatures remain interpretable by older versions. Granted, applications that verify digests with the older version shouldn't be considered "secure" any longer (unless authenticity is further verified), but at least they continue to work until upgrade or replacement is possible. I think version numbers and expiration are fine for many applications, probably including Bram's though I'm not so familiar with it. But protocols that are intended to be embedded and ubiquitous should offer considerably more graceful means for dealing with their (inevitable) need to evolve. From gojomo at usa.net Sun Nov 17 10:53:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? References: Message-ID: <004601c28e6a$8704c6f0$640a000a@golden> Kevin Atkinson writes: > I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power > of 2. > > Is there any harm in dropping the last 4 bytes to make it 16 other than > increasing the chance of collision which will still be too small to worry > about?
Each bit of a cryptographically strong hash is as secure as any other, so any truncation just creates a smaller strong hash. Attempts to, for example, find collisions on the truncated version will take less time, but still be no more efficient than a brute-force search. The shorter hash is "weaker" in one sense but not "weak" definitionally. So you could do what you suggest, though you would then be deviating from a well-known standard, and in so doing throw out extra security which, after you've calculated the whole 20-byte value, is essentially "free". Why are the 4 bytes so important in your application? - Gojomo From kevin at atkinson.dhs.org Sun Nov 17 12:08:01 2002 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? In-Reply-To: <004601c28e6a$8704c6f0$640a000a@golden> Message-ID: On Sun, 17 Nov 2002, Gordon Mohr wrote: > Kevin Atkinson writes: > > I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power > > of 2. > > > > Is there any harm in dropping the last 4 bytes to make it 16 other than > > increasing the chance of collision which will still be too small to worry > > about? > > Each bit of a cryptographically strong hash is as secure as any other, > so any truncation just creates a smaller strong hash. Attempts to, for > example, find collisions on the truncated version will take less time, > but still be no more efficient than a brute-force search. The shorter > hash is "weaker" in one sense but not "weak" definitionally. > > So you could do what you suggest, though you would then be deviating > from a well-known standard, and in so doing throw out extra security which, > after you've calculated the whole 20-byte value, is essentially "free". > > Why are the 4 bytes so important in your application? They're not. I will probably keep all 20 bytes. I just wanted to know if there was anything special about 20 bytes which apparently is not the case.
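Gordon's answer amounts to one slicing operation. A sketch of the 16-byte variant Kevin asked about, using Python's hashlib (the function name is invented for illustration):

```python
import hashlib

def sha1_truncated(data, length=16):
    # Compute the standard 20-byte SHA-1, then keep only the first
    # `length` bytes.  Every bit of a strong hash is equally strong,
    # so this is simply a shorter strong hash -- but brute-force
    # collision search drops from ~2^80 to ~2^64 for a 16-byte output.
    return hashlib.sha1(data).digest()[:length]
```

The cost Gordon mentions is visible too: the full 20-byte value has to be computed before slicing, so the discarded 4 bytes of security were effectively free.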
-- http://kevin.atkinson.dhs.org From gojomo at usa.net Sun Nov 17 12:30:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <005e01c28e78$06c4b690$640a000a@golden> Bram Cohen writes: > Gordon Mohr wrote: > > Bram Cohen writes: > > > > > > Primitives can be changed with an increment in the major version number, > > > the micromanaging of algorithms in pgp and ssl has proven to do nothing > > > but make implementation more difficult and cause compatibility headaches. > > > > An interesting, but decidedly minority, opinion. People who > > press the envelope with protocols appreciate that flexibility > > to adopt custom or stronger swap-ins. > > Yeah, well, there's a decided lack of competence in the field. OK, Bram versus the world, got it. And your claim to unique competence is based on what reasoning or experience? Are there multiple interoperable implementations of protocols you've designed? Have they stood up to scrutiny and attack by third parties? Achieved many years of reliable use? > Can anyone > here come up with a *single* instance in which a parameterizable algorithm > saved someone's ass? The practice of open, interoperable security conventions is still rather young, maybe 10-20 years. The "trusted algorithm goes completely bad" threat is one that is only expected to arise infrequently. That said, you can find examples of systems making profitable use of swappable algorithms. SSL users in uniquely security-conscious environments enable only the algorithms which meet their local standards. DNSSEC work has introduced new algorithms over time, and as of RFC3110, now recommends against the use of the original MD5 algorithm. 
I don't think even you would find fault with the parameterizable key-strengths in programs like PGP; that's certainly helped since the key-lengths that were once suggested as OK for casual security (384 bits) have not just been theoretically questioned, but actually brute-force discovered by small teams. With hash algorithms, you don't have a slidable "strength" parameter to pass in. (SHA256 is not SHA1 with a different input option.) So if you want to be able to improve/patch security over time, you've got to be open to using new algorithms. > Even MD5 is to this day unbroken, Depends on your definition of "broken". It's not nearly as strong as it was designed or predicted to be, and is no longer advisable for many of the applications it was once recommended for. You should be familiar with these results: - Even mildly resourceful organizations with suitable motivation should be able to create MD5 collisions over a matter of days, not years. From: http://www.rsasecurity.com/rsalabs/faq/3-6-6.html "Van Oorschot and Wiener [VW94] have considered a brute-force search for collisions (see Question 2.1.6) in hash functions, and they estimate a collision search machine designed specifically for MD5 (costing $10 million in 1994) could find a collision for MD5 in 24 days on average." We have 8 more years of advances in computing power, and price drops, and tools for network parallelism, available to us today. - RSALabs was recommending as early as 1996 that "applications which rely on the collision-resistance of a hash function should be upgraded away from MD2 and MD5 when practical and convenient." (See: http://citeseer.nj.nec.com/robshaw96recent.html) > and 3DES has > remained completely solid. And yet, why was 3DES needed as a drop-in replacement for *DES*? Your own examples make the case for a healthy suspicion about algorithms. > No, I expect incompetence to continue to reign supreme.
But my > 'idiosyncratic' opinion has been followed in my own protocol, BitTorrent, > which, unlike all but a handful of other p2p protocols, is widely > deployed. That correlation is not coincidental. Congratulations! What threshold did you pass that put you into the elite "widely deployed" category? You should issue a press release. The "incompetents" who keep putting parameterizable algorithms in their internet-infrastructure formats and protocols also have a lot of real-world deployment and experience. That's not a standard for evaluating competing ideas that you have any chance of winning, so why bring it up? > > No, I've seen 2x to 4x differences in hashing code, same > > algorithm, based on how much effort has been devoted to > > optimizing that particular code. There's no guesswork > > involved there. Haven't you seen such differences in your > > experience? > > Some implementations vary by that large a factor, but you're using that > as a reason why the absolute best one might be that much better than the > current best one, which by your own admission is rank speculation. If you > didn't consider performance when selecting tiger or thought other criteria > were more important, then say so, but don't engage in wild speculation as if > it's fact. I reported the conjectures of Anderson and Biham, that their Tiger code has at least some further room for optimization (likely to reduce the 2x difference with SHA1 seen in some libraries), and that Tiger is likely to outperform functions designed for 32-bit processors when run on 64-bit processors. That's not "wild speculation", that's reasoned expert speculation, as was explained from the beginning. - Gojomo From bram at gawth.com Mon Nov 18 03:53:02 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <005e01c28e78$06c4b690$640a000a@golden> Message-ID: Gordon Mohr wrote: > > and 3DES has > > remained completely solid.
> > And yet, why was 3DES needed as a drop-in replacement for *DES*? You should learn about key lengths. Gordon, you've now succeeded in making me really, really not like you. Your profound lack of technical cluefulness is hardly unique, but your bullheaded lack of awareness of it and personal insults to people who point out when you're wrong are offensive. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Mon Nov 18 04:15:02 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <3DD7D7DF.4090706@akamail.com> Message-ID: Bert wrote: > I think version numbers and expiration are fine for many applications, > probably including Bram's though I'm not so familiar with it. But > protocols that are intended to be embedded and ubiquitous should offer > considerably more graceful means for dealing with their (inevitable) > need to evolve. There are a few things done in BitTorrent to try to make extensibility clean - The metainfo files are in a format which is an encoding of a dictionary, and unrecognized keys are ignored, so new ones can be added later. The peer protocol contains some reserved bytes which are, well, reserved, and can be used in case of new functionality which needs to be supported on both ends. Finally, there's a protocol identifier at the beginning of the peer protocol, and a specific mimetype used to launch it, and either or both of those may be changed in the case of a protocol change which isn't backwardly compatible. Those are about the best you can do. Try to support new functionality without it interfering with old functionality, and leave in a hook for simply declaring a new change non-backwards-compatible. Even so, I put off declaring the protocol final until after a very excruciating extended release process, involving many backwards-incompatible changes before the end. 
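[Archive note: the metainfo extensibility Bram describes — a dictionary encoding in which unrecognized keys are ignored — can be sketched in Python, BitTorrent's own implementation language. The decoder below is a minimal illustration of bencoding written for this note, not BitTorrent's actual code; the point is that a dictionary can carry keys an old client has never heard of, and the client simply leaves them alone.]

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return (value, next_i).

    Minimal sketch of the metainfo encoding: integers i42e, strings 4:spam,
    lists l...e, dictionaries d...e.
    """
    c = data[i:i + 1]
    if c == b"i":                          # integer
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                          # list
        i, items = i + 1, []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary
        i, d = i + 1, {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            d[key], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b":", i)            # length-prefixed string
    n = int(data[i:colon])
    return data[colon + 1:colon + 1 + n], colon + 1 + n

# A metainfo-style dict carrying a key this client doesn't recognize:
meta, _ = bdecode(b"d8:announce19:http://tracker/blah7:new-keyi42ee")

# An old client reads only the keys it knows and ignores the rest,
# so 'new-key' can be introduced later without breaking anything.
announce = meta[b"announce"]
```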
Regardless of how many and how good your extensibility hooks are, protocol changes are always exceedingly painful. And please, if you make a change which isn't backwards-compatible, admit it to yourself and make a clean break. Carrying along cruft for the sake of pride is inevitably a disaster. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From steve_bryan at mac.com Mon Nov 18 09:21:01 2002 From: steve_bryan at mac.com (Steve Bryan) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: Message-ID: <03816FD5-FB1A-11D6-82BC-003065B4EAAE@mac.com> Pot, kettle, black. On Monday, November 18, 2002, at 05:52 am, Bram Cohen wrote: > Gordon Mohr wrote: > >>> and 3DES has >>> remained completely solid. >> >> And yet, why was 3DES needed as a drop-in replacement for *DES*? > > You should learn about key lengths. > > Gordon, you've now succeeded in making me really, really not like you. > Your profound lack of technical cluefulness is hardly unique, but your > bullheaded lack of awareness of it and personal insults to people who > point out when you're wrong are offensive. > > -Bram Cohen > > "Markets can remain irrational longer than you can remain solvent" > -- John Maynard Keynes > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From gojomo at usa.net Mon Nov 18 13:53:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <00ed01c28f4c$d708f490$640a000a@golden> You're insulting a straw man. "Drop-in" does not have to mean "same key length". I can see how it might be interpreted that way, so perhaps it was a poor choice of words on my part. 
So if it helps better make my point -- which was that no algorithm deserves blind trust -- simply drop out the words "drop-in": "And yet, why was 3DES needed as a replacement for *DES*?" Even if you dislike me, I like you. That's why I bother calling "bs!" on your cocksure -- but unsupportable -- pontificating. Look out for the people who trust you simply because you're brash. - Gojomo ----- Original Message ----- From: "Bram Cohen" To: Sent: Monday, November 18, 2002 3:52 AM Subject: Re: [p2p-hackers] Tiger vs. SHA-1 > Gordon Mohr wrote: > > > > and 3DES has > > > remained completely solid. > > > > And yet, why was 3DES needed as a drop-in replacement for *DES*? > > You should learn about key lengths. > > Gordon, you've now succeeded in making me really, really not like you. > Your profound lack of technical cluefulness is hardly unique, but your > bullheaded lack of awareness of it and personal insults to people who > point out when you're wrong are offensive. > > -Bram Cohen > > "Markets can remain irrational longer than you can remain solvent" > -- John Maynard Keynes > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From oskar at freenetproject.org Mon Nov 18 18:21:01 2002 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <03816FD5-FB1A-11D6-82BC-003065B4EAAE@mac.com> References: <03816FD5-FB1A-11D6-82BC-003065B4EAAE@mac.com> Message-ID: <20021119021957.GC351@sporty.spiceworld> Gordon can come play with me instead. I'm not allowed to play with Bram either. On Mon, Nov 18, 2002 at 11:20:20AM -0600, Steve Bryan wrote: > Pot, kettle, black. > > On Monday, November 18, 2002, at 05:52 am, Bram Cohen wrote: > > >Gordon Mohr wrote: > > > >>>and 3DES has > >>>remained completely solid. > >> > >>And yet, why was 3DES needed as a drop-in replacement for *DES*? 
> > > >You should learn about key lengths. > > > >Gordon, you've now succeeded in making me really, really not like you. > >Your profound lack of technical cluefulness is hardly unique, but your > >bullheaded lack of awareness of it and personal insults to people who > >point out when you're wrong are offensive. > > > >-Bram Cohen > > > >"Markets can remain irrational longer than you can remain solvent" > > -- John Maynard Keynes > > > >_______________________________________________ > >p2p-hackers mailing list > >p2p-hackers@zgp.org > >http://zgp.org/mailman/listinfo/p2p-hackers > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Oskar Sandberg oskar@freenetproject.org From sean at lynch.tv Mon Nov 18 19:43:01 2002 From: sean at lynch.tv (Sean R. Lynch) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] protocol design, hashes, parameterizable protocols, etc. Message-ID: <1037676615.4446.47.camel@makoto.chaosring.org> It seems like the overriding reason protocols are the way they are is because that's the way the designer decided to make them. It seems to me that ultimately the goal is not to make protocols "perfect," but to make them work. I think this is why protocols usually aren't (successfully) designed by committees. The coder or coordinator gets to make the decisions, and whoever doesn't like it can put up with it or leave, fork, whatever. I think (hope) that this group has bigger fish to fry than which hash to use or whether to use feature negotiation or version numbers to decide which hash algorithm or whatever to use, who's more technically competent, etc. Please correct me if I'm wrong and I'll go find another list to lurk on. -- Sean R. Lynch -------------- next part -------------- A non-text attachment was scrubbed... 
From zooko at zooko.com Tue Nov 19 05:32:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) Message-ID: Folks: I've been reading the discussion with interest, as I'm working on defining a new distributed filestore format for Mnet, and have to make decisions about crypto algorithms, future-compatibility, etc. One interesting detail that I've noticed is that there is actually a "future-compatibility" benefit to replacing SHA-1 with a truncated, wider hash, such as SHA-512, with the output truncated down to 160 bits! The reason for this is that in Mnet blocks of data are distributed among block servers based on the hash of the block itself. (This is the "consistent hashing" technique, first published by Karger [1], and now used in all of the DHTs.) If we use a 160-bit hash now, and at some future point we have to switch to a wider hash, then if the wider hash is equal to the smaller hash in some subset of its bits, and if those are the high-order bits in our consistent hashing scheme, then we do not have to move any blocks from one block server to another when making the transition. The cost of such a design is significant, though. According to the Crypto++ benchmarks [2], the SHA algorithms can chew through this many megabytes per second:

SHA-1    48.462
SHA-256  24.746
SHA-512   8.246

Even the slowest of these would not make Mnet become CPU-bound instead of network-bound (except for Mnet-on-LAN), but it would increase the CPU load that Mnet imposes while it works. At the moment, I'm leaning toward choosing (SHA-256 % 2^160) as the hash function that Mnet uses to distribute and identify encrypted blocks.
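[Archive note: the placement-stability argument can be made concrete with a toy sketch. The server-assignment function below is hypothetical, written for this note, not Mnet's actual code; SHA-512 stands in for whatever wider hash is chosen. Because truncation keeps the high-order bytes of the wider digest, widening the identifier later moves no blocks.]

```python
import hashlib

NUM_SERVERS = 16  # toy cluster size (hypothetical)

def server_for(block_id: bytes) -> int:
    """Consistent-hashing sketch: place a block by the high-order byte
    of its identifier, per the high-order-bits scheme described above."""
    return block_id[0] * NUM_SERVERS // 256

block = b"an encrypted Mnet block"

# Today's 160-bit identifier: the wider hash, truncated.
narrow_id = hashlib.sha512(block).digest()[:20]
# A future upgrade to the full 512-bit hash.
wide_id = hashlib.sha512(block).digest()

# Truncation preserves the high-order bits, so no block changes servers:
assert wide_id[:20] == narrow_id
assert server_for(narrow_id) == server_for(wide_id)
```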
Regards, Zooko [1] http://citeseer.nj.nec.com/karger97consistent.html [2] http://www.eskimo.com/~weidai/benchmarks.html From cefn.hoile at bt.com Tue Nov 19 05:55:01 2002 From: cefn.hoile at bt.com (cefn.hoile@bt.com) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 Message-ID: Respect to the Bram stokers out there. Cefn -----Original Message----- From: Oskar Sandberg [mailto:oskar@freenetproject.org] Sent: 19 November 2002 02:20 To: p2p-hackers@zgp.org Subject: Re: [p2p-hackers] Tiger vs. SHA-1 Gordon can come play with me instead. I'm not allowed to play with Bram either. On Mon, Nov 18, 2002 at 11:20:20AM -0600, Steve Bryan wrote: > Pot, kettle, black. From mccoy at mad-scientist.com Tue Nov 19 13:11:01 2002 From: mccoy at mad-scientist.com (Jim McCoy) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message-ID: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> On Tuesday, November 19, 2002, at 05:27 AM, Zooko wrote: [...] > One interesting detail that I've noticed is that there is actually a "future- > compatibility" benefit to replacing SHA-1 with a truncated, wider hash, such as > SHA-512, with the output truncated down to 160 bits! > The cost of such a design is significant, though. That is putting it mildly, isn't it? One problem you face is that you are hit with this cost all over the place given the architecture that MNet is derived from -- data write, data read (for verification), data sharing (for verification), and a couple of other places -- and you also increase the storage requirements for the metadata that is used to describe what is stored.

Truncation

At this point it seems worthwhile to ask what your threat model is here. Are you trying to protect against a break in the algorithm or a progression in available attacker power (faster computers), or some combination of the two?
If an algorithm break is required to make the system "weak" and within the reach of a computationally-strong attacker in the next decade, then why not just use two cheap hashes that use different base assumptions (a la Bitzi) and default to the cheapest of these two choices? If a computational miracle is required then maybe it's worth asking if this is really a problem that will need to get solved at that time or if it will render the system itself irrelevant for a variety of other reasons. Are you trying to protect yourself from a "perfect storm"? Some sort of computational voodoo not yet predicted (causing a radical speedup in "Moore's Law") _in addition to_ "big" protocol breaks that put some of these hashes within the reach of some of these "100 years before their time" computers [and yes, quantum computers are already on this predicted timeline]. If that is the case then it seems to make sense to have the larger hashes, but perhaps with an eye towards the fact that before this happens it is likely that the cost of manipulating blobs of data using SHA-512 hashes will get cheaper. You can compute a parallel set of blobs for a published file using SHA-512 and include these blob references with the current SHA1 list; when the time comes to make the switchover, the datasets can be upgraded by the clients (and the peers storing the data) in a transparent fashion as long as the correct metadata was included during the initial publication. Let someone who wants to imagine that a file they publish today will still be around in 50 years go through the trouble of heating their CPU during publication rather than punishing everyone else for the next 50 years just to suit this particular user's vanity; everyone else will dial it up a notch when it seems prudent and cost effective.
Given most predictions by those "in the know" at the moment, the logical path seems to lead back to a first step of supporting multiple hashes and letting the person doing the data publication decide what hashes make sense for their purposes. Initially two cheap and different hashes will make sense, and the truly paranoid can add a third SHA-512 or OtherHash-1024 in preparation for future support of an OtherHash-1024 dataset to eventually migrate the blocks to; the only requirement for any sort of future-compatibility path is selecting the hashes to support during initial publication. An additional point in favor of this is that your "truncated SHA-XXX" strategy only buys you a win in supporting two co-mingled datasets (one of SHA1 and one of SHA-XXX) living in the same blob storage structure. It's not much harder to support overlay datasets of any other hash with a simple upward migration path: when SHA1 seems threatened, the clients can start keeping a SHA-XXX dataset and upgrade what they are currently storing by re-publishing it to the correct SHA-XXX blob store. I am wondering what threat model supports prematurely converging on a "best of breed" and then just making it bigger. Truncation should simply be eliminated out of hand -- why chop the big hash down for a single benefit, given all of the external costs? Jim From bram at gawth.com Tue Nov 19 13:15:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] protocol design, hashes, parameterizable protocols, etc. In-Reply-To: <1037676615.4446.47.camel@makoto.chaosring.org> Message-ID: Sean R. Lynch wrote: > I think (hope) that this group has bigger fish to fry than which hash to > use or whether to use feature negotiation or version numbers to decide > which hash algorithm or whatever to use, who's more technically > competent, etc. Please correct me if I'm wrong and I'll go find another > list to lurk on.
Preparing for backwards compatibility is a huge pain, and largely unrewarding on its own, but such minutiae occupy most of your time working on real software development. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Tue Nov 19 13:15:03 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: Message-ID: cefn.hoile@bt.com wrote: > Respect to the Bram stokers out there. Bram is the name I was born with. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Tue Nov 19 13:27:02 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message-ID: Zooko wrote: > SHA-1 48.462 > SHA-256 24.746 > SHA-512 8.246 sha-512 is just ludicrous. Birthday attacks don't apply to all applications, and even sha-256 requires 2 ** 128 power to mount a birthday attack, and that's either 2 ** 128 memory or 2 ** 128 non-parallelized. Even sha-256's necessity is dubious for quite a ways out. 80 bits is still very safe even against DES-cracker style super-parallel machines. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From zooko at zooko.com Tue Nov 19 13:46:04 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message from Bram Cohen of "Tue, 19 Nov 2002 13:26:15 PST." References: Message-ID: Bram wrote: > > sha-512 is just ludicrous. Birthday attacks don't apply to all > applications, and even sha-256 requires 2 ** 128 power to mount a birthday > attack, and that's either 2 ** 128 memory or 2 ** 128 non-parallelized. Indeed -- the only reason why I would consider SHA-512 over SHA-256 is the possibility of better-than-brute-force results which make the latter unsafe.
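[Archive note: the memory/parallelism trade-off in birthday attacks that Bram refers to can be demonstrated at toy scale. The sketch below is illustrative code written for this note: it collides a hash truncated to 16 bits (so it finishes instantly) using Floyd's cycle-finding algorithm in O(1) memory, the memoryless alternative to storing a giant table of digests.]

```python
import hashlib

def f(x: bytes) -> bytes:
    """Toy 16-bit hash: SHA-1 truncated to 2 bytes (demonstration only)."""
    return hashlib.sha1(x).digest()[:2]

def floyd_collision(seed: bytes):
    """Find x != y with f(x) == f(y) using constant memory.

    Phase 1 detects a cycle in the sequence seed, f(seed), f(f(seed)), ...;
    phase 2 walks two pointers to the cycle entry, whose two distinct
    predecessors collide.  Returns None if the seed was already periodic.
    """
    tortoise, hare = f(seed), f(f(seed))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    tortoise, prev_t, prev_h = seed, None, None
    while tortoise != hare:
        prev_t, prev_h = tortoise, hare
        tortoise, hare = f(tortoise), f(hare)
    if prev_t is None or prev_t == prev_h:
        return None
    return prev_t, prev_h

pair, n = None, 0
while pair is None:        # retry with a fresh seed in the rare periodic case
    pair = floyd_collision(n.to_bytes(4, "big"))
    n += 1

x, y = pair
assert x != y and f(x) == f(y)
```

Against a full-width hash the same walk needs on the order of the square root of the output space in sequential steps, which is why the cycle-finding route trades memory for non-parallelized computation.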
(By the way, birthday attacks can be implemented without significant memory requirements, using Floyd's cycle-finding algorithm. But your point about the infeasibility of brute force against SHA-256 is a good one.) Regards, Zooko From zooko at zooko.com Tue Nov 19 14:06:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message from Jim McCoy of "Tue, 19 Nov 2002 13:10:03 PST." <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> Message-ID: Jim: Thanks for the comments. In general, your questions about the practical value of "forward-compatibility" techniques are good questions that need to be asked. I still haven't settled on how much I am willing to pay (in CPU cycles, storage, and design complexity) for the uncertain possibility of smoothly upgrading crypto algorithms. (Note: after writing this message, I changed my mind about truncated-SHA-512.) Jim McCoy wrote: > > > The cost of such a design is significant, though. > > That is putting it mildy, isn't it? This depends on what kind of CPU and network will be doing this operation. If it is a 1.5 GHz Athlon on 1 Mbps consumer broadband, then I'm not sure there's any noticeable difference between SHA-512 and SHA-1. If it is a 206 MHz StrongARM on 11 Mbps WLAN, then there is a huge difference. (By the way, I have heard that SHA-512 gains speed-ups on 64 bit architectures just as Tiger does. But I don't really care about 64-bit architectures. As Bruce Schneier wrote during the AES process, *everything* is fast on a 64-bit Alpha. The more interesting target to me is the 32-bit chips that never go away, but instead proliferate as they reduce their requirements for size, power, heat, and dollars. The architectures that I care about most are 32-bit x86, 32-bit PowerPC, and 32-bit ARM. 
Of course, this preference also depends on the fact that my overall architecture is decentralized, and there is no expensive, beefy server in the center who does all the hash verification that the rest of the nodes take on faith.) > One problem you face is that you > are hit with this cost all over the place given the architecture that > MNet is derived from -- data write, data read (for verification), data > sharing (for verification), and a couple of other places -- True, true, true and true! > and you > also increase the storage requirements for the metadata that is used to > describe what is stored. Not true -- with the truncation trick we use the same amount of storage as with SHA-1. > not just use two cheap hashes that use different base assumptions (a la > Bitzi) and default to the cheapest of these two choices. This doesn't provide the future-compatibility feature of "we're upgrading the hash strength but keeping all the blocks on the same blockservers". In addition, I'm not comfortable with having hashes present but unchecked. If there are two hashes and you only verify one of them, then there is the possibility that you aren't looking at the same thing that everyone else is looking at. Consider that Alice is using an older version of Mnet and Bob is using a newer version, and Mallet gives them an mnetId that points to a different file depending on which version of Mnet they are running. That is a possibility which my design will seek to minimize. [elided: Jim's suggestion to have two separate hashes during the upgrade phase] After reading your message and writing this response, I'm now leaning *away* from the "truncate SHA-512" strategy. My motivation is that I really don't want to exclude low-CPU devices like PDAs from being able to decode files which are encoded in the Mnet filestore format. I don't mind having to shuffle all the blocks around among the blockservers in order to upgrade -- block storage should be fluid anyway.
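[Archive note: the "verify every hash that is present" policy can be sketched as follows. The filemap layout, registry, and helper names here are hypothetical, written for this note: a client checks all digests it recognizes and rejects the file if any fails, so Mallet cannot show old and new clients different files as long as the two versions share at least one algorithm.]

```python
import hashlib

# Algorithms this client version understands (hypothetical registry).
KNOWN = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}

def verify(data: bytes, declared: dict) -> bool:
    """Check every declared digest we recognize; all of them must match.

    Unknown algorithms are skipped (a future client may check them),
    but a file with no recognizable digest at all is rejected.
    """
    checked = False
    for name, expected in declared.items():
        algo = KNOWN.get(name)
        if algo is None:
            continue
        if algo(data).digest() != expected:
            return False
        checked = True
    return checked

data = b"block contents"
filemap = {
    "sha1": hashlib.sha1(data).digest(),
    "tiger-tree": b"\x00" * 24,   # not understood by this version: skipped
}
assert verify(data, filemap)
assert not verify(b"tampered", filemap)
```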
My *current* thinking is to make the authentication system in the first version be simply a Merkle hash tree based on SHA-1, with block size equal to the block size of the erasure code. (Sorry folks: no Bitzi-compatible 1KB TigerTree in the first version. I'll reconsider if I can see a clear interop story.) If the Mnet filestore format is still in use by the time people want to move away from SHA-1, then in version 2 we can introduce a new authentication scheme, and the old SHA-1 Merkle hash tree will also be checked, if present. Thanks very much to all posters for contributing to this discussion. Regards, Zooko http://zooko.com/ http://mnet.sf.net/ From justin at chapweske.com Tue Nov 19 14:24:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> Message-ID: <3DDAB9F8.1080604@chapweske.com> If you don't consider interop until there is a clear story, then it's too late and you'll already be locked in. If you're not going to do Tiger for the hash tree, then at least support variable segment sizes and follow the orphan hash promotion semantics in THEX (http://open-content.net/specs/draft-jchapweske-thex-01.html). This will allow others to create a SHA-1 hash tree whose segment size is equal to the file size, and that will match exactly a normal full-file SHA-1 hash. So while we wouldn't have interop for fine-grained integrity checking, we'd at least be able to share *some* data. FYI, the Open Content Network currently generates full file MD5, SHA-1, and 1k Tiger hash trees.
Here is an example of our current headers:

200 OK
Date: Tue, 19 Nov 2002 22:14:19 GMT
Accept-Ranges: bytes
Server: Apache/1.3.26 (Unix) (Red-Hat/Linux)
Content-Encoding: x-gzip
Content-Length: 5843677
Content-MD5: 5pPPYDvG3LWXC+6DxbRk3w==
Content-Type: application/x-gzip
ETag: "3200e7-592add-31ba1400"
Last-Modified: Sun, 09 Jun 1996 00:00:00 GMT
X-Content-URN: urn:md5:42J46YB3Y3OLLFYL52B4LNDE34
X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF
X-Content-URN: urn:tree:tiger:FSIUWJUUSPLMMDUQZOWX32R6AEOT7NCCBX6AGBI
X-Thex-URI: http://open-content.net:8080/gateway/thex?uri=http://www.kernel.org/pub/linux/kernel/v2.0/linux-2.0.tar.gz;FSIUWJUUSPLMMDUQZOWX32R6AEOT7NCCBX6AGBI

> > My *current* thinking is to make the authentication system in the first version > be simply a Merkle hash tree based on SHA-1, with block size equal to the block > size of the erasure code. (Sorry folks: no Bitzi-compatible 1KB TigerTree in > the first version. I'll reconsider if I can see a clear interop story.) > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From bram at gawth.com Tue Nov 19 14:59:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message-ID: Zooko wrote: > (By the way, birthday attacks can be implemented without significant memory > requirements, using Floyd's cycle-finding algorithm. But your point about the > infeasibility of brute force against SHA-256 is a good one.) Floyd's cycle-finding algorithm can't be parallelized very well, that's why I said it requires 2 ** 128 memory *or* 2 ** 128 non-parallelized computational power. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From hal at finney.org Tue Nov 19 15:06:01 2002 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs.
(SHA-512 % 2^160) Message-ID: <200211192304.gAJN4Ym15999@finney.org> I agree that it makes sense to be prepared to move beyond SHA-1. Its 160 bit size makes a collision attack take 2^80 work. In many applications a collision attack may be almost as bad as an inversion attack. 2^80 isn't as big as it used to be, and if Moore's Law continues to hold for another decade or two, 2^80 attacks will be feasible for well funded attackers. If quantum computers become practical, an inversion attack will be reduced from 2^160 to 2^80 via Grover's algorithm [1]. I think this algorithm would resist parallelizing (square root of N speedup with N machines) but still in ~20 years it could be a serious problem. I believe, based on one paper I found [2], that Grover's algorithm can also speed up hash collision searches to the cube root of the search space, or 2^53.3 for SHA-1. That paper suggests that it is not yet known if this is the best possible speedup. Based on this, in a new design I would suggest either augmenting SHA-1 with a second hash to make it bigger, or using SHA-256. It's worth mentioning that despite the similar names, SHA-256 is really nothing like SHA-1 and so the many years of cryptanalysis of SHA-1 does not necessarily carry over to SHA-256. Hal Finney [1] http://alumni.imsa.edu/~matth/quant/473/473proj/ [2] http://www.cs.berkeley.edu/~aaronson/aaronson-47057.ps From hal at finney.org Tue Nov 19 15:42:01 2002 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) Message-ID: <200211192340.gAJNeHo16135@finney.org> Bram Cohen wrote: > Floyd's cycle-finding algorithm can't be parallelized very well, that's > why I said it requires 2 ** 128 memory *or* 2 ** 128 non-parallelized > computational power. van Oorschot and Wiener's paper on parallel collision search [1] looks parallelizable to me. It uses distinguished points, which are for example hash values which have the lower T bits all zeros. 
Based on their equation 7, using my notation and ignoring constants, a SHA-256 collision search with M processors would take time on the order of: 2^128 / M + 2^T. T can be chosen such that 2^T is a couple of orders of magnitude smaller than the first term, and you get essentially linear speedup based on the number of processors. For M of say 2^40, T would be around 80, and the total memory would be of order 2^48 (2^128 points times 1/2^80 probability of being distinguished). Hal Finney [1] http://www.scs.carleton.ca/~paulv/papers/JoC97.pdf From zooko at zooko.com Tue Nov 19 15:42:03 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] hash interop (was: SHA-1 vs. (SHA-512 % 2^160)) In-Reply-To: Message from Justin Chapweske of "Tue, 19 Nov 2002 16:23:52 CST." <3DDAB9F8.1080604@chapweske.com> References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> Message-ID: Justin: Thank you for the post. It has raised several issues which will probably be worth discussing, and I've decided to reply to it multiple times in order to address each issue separately. Justin Chapweske wrote: > > If you don't consider interop until there is a clear story, then its too > late and you'll already be locked in. I don't believe that's true. The same smooth upgrade path that I've designed to change crypto algorithms should suffice to adopt an interop technique, like this: In version 1, Mnet filestore format has only a single authentication technique: SHA-1 Merkle hash trees with block size equal to the block size of the Mnet erasure code. Then someone comes up with an interop opportunity. This interop requires that Mnet filestores be verifiable with TigerTree hashes. In version 2, Mnet filestore format has both the SHA-1 block-size Merkle trees and 1KB TigerTree hashes. While downloading, clients are REQUIRED to verify all hashes that are present. 
Hopefully over time the new one is present more and more, and the old one less and less. In version 3, we consider Mnet filemaps with old SHA-1 hashes to be ill-formed and reject them. Okay, that's issue number one. Stay tuned... Regards, Zooko From zooko at zooko.com Tue Nov 19 15:46:02 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) In-Reply-To: Message from Justin Chapweske of "Tue, 19 Nov 2002 16:23:52 CST." <3DDAB9F8.1080604@chapweske.com> References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> Message-ID: Justin Chapweske wrote: > > If you're not going to do Tiger for the hash tree, then at least support > variable segment sizes and follow the orphan hash promotion semantics > in THEX (http://open-content.net/specs/draft-jchapweske-thex-01.html). Hm. Looking at this just now, I wonder if there isn't an authentication flaw, wherein two different files have the same THEX X-Content-URN. Let's say there is a function THEXID() which takes a string of bytes and a blocksize as its two arguments. I think there are two strings, F1 and F2, such that THEXID(F1, 1024) == THEXID(F2, 48). (Assuming that the underlying hash outputs 24-byte digests.) Let F1 be a string 2048 bytes long, composed of two equal length substrings s1 and s2. Let F2 be a string 48 bytes long, composed of two equal length substrings s3 and s4. Now let s3 equal H(s1) and s4 equal H(s2). Unless I'm missing something, the result of THEXID(F1, 1024) is the same as that of THEXID(F2, 48). The way I have been intending to fix this problem in the Mnet design is to prepend the block length to each block (of actual data) before hashing. You could alternatively fix it by including the blocklength in your HTTP headers. (And I could alternatively fix it by including the blocklength in my Mnet filemap.)
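[Archive note: the construction can be checked mechanically. The sketch below is illustrative code written for this note implementing the naive tree as described in the message — leaves are H(block), internal nodes are H(left || right), with nothing binding the block size into the root. SHA-256's 32-byte digest stands in for the 24-byte hash of the example, so F2 becomes 64 bytes with block size 64. A deployed implementation can avoid the ambiguity by binding the block size, as discussed in the thread.]

```python
import hashlib

H = lambda b: hashlib.sha256(b).digest()   # 32-byte digest stands in for Tiger's 24

def naive_root(data: bytes, block_size: int) -> bytes:
    """Merkle root with no block-size or leaf/internal distinction.
    (Assumes the number of blocks is a power of two; enough for this demo.)"""
    level = [H(data[i:i + block_size]) for i in range(0, len(data), block_size)]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

s1, s2 = b"A" * 1024, b"B" * 1024
F1 = s1 + s2            # 2048 bytes: two 1024-byte blocks
F2 = H(s1) + H(s2)      # 64 bytes: the two leaf digests themselves

# Different files, different block sizes, identical root:
assert naive_root(F1, 1024) == naive_root(F2, 64)
assert F1 != F2
```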
Regards, Zooko From justin at chapweske.com Tue Nov 19 16:09:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> Message-ID: <3DDAD290.7050708@chapweske.com> You are correct in noticing that the block length, as well as the hash algorithm, are important in the communication of hash tree roots. This is why THEX requires that tree hash URNs explicitly communicate their block size if it is different from the default '1024'. So the output of your two functions would be: THEXID(F1, 1024) = urn:tree:tiger:ABCDAER234LK2J3498273ASFLKJ THEXID(F2, 48) = urn:tree:tiger/48:ABCDAER234LK2J3498273ASFLKJ However, I could theoretically see a case where a developer gets their block sizes "confused" by blindly using the block size specified in an untrusted THEX file rather than the one specified in the URN. Thank you much for making this example explicit Zooko. I will update the THEX specification with clear warnings about gotchas like this. I may even suggest that developers only support variable block sizes if they fully understand the implications of doing so. > > Hm. Looking at this just now, I wonder if there isn't an authentication flaw, > wherein two different files have the same THEX X-Content-URN. > > Let's say there is a function THEXID() which takes a string of bytes and a > blocksize as its two arguments. I think there are two strings, F1 and F2, such > that THEXID(F1, 1024) == THEXID(F2, 48). (Assuming that the underlying hash > outputs 24 bytes digests.) > > Let F1 be a string 2048 bytes long, composed of two equal length substrings s1 > and s2. Let F2 be a string 48 bytes long, composed of two equal length > substrings s3 and s4. Now let s3 equal H(s1) and S4 equal H(s2). 
> > Unless I'm missing something, the result of THEXID(F1, 1024) is the same as that > of THEXID(F2, 48). > > The way I have been intending to fix this problem in the Mnet design is to > prepend the block length to each block (of actual data) before hashing. > > You could alternatively fix it by including the blocklength in your HTTP > headers. (And I could alternatively fix it by including the blocklength in my > Mnet filemap.) > > > Regards, > > Zooko > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From bert at akamail.com Tue Nov 19 16:36:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Some new YouServ related papers Message-ID: <3DDAD9B4.1010503@akamail.com> In our (futile?!) quest to make p2p the preferred method of information sharing (and by that I don't mean porn and MP3's :-) within the corporate enterprise, we've added some new features to the YouServ p2p webhosting system and described them in a pair of papers. (1) One describes experiences in developing and deploying a (hybrid) p2p search method that doesn't suck. (2) Another describes a system for sharing web applications as opposed to static content, allowing easy development & propagation of meta p2p apps atop the base infrastructure. They can be downloaded from here, assuming our external website works: http://www.almaden.ibm.com/cs/people/bayardo/userv/ Though the server is typically quite reliable, it seems to be suffering from sporadic outages today, just when I actually need to use it.
Here are some alternate download locations in case you have any problems: (1) http://www-db.stanford.edu/~bawa/Pub/usearch.pdf (2) http://bayardo-userv.userv.web.cmu.edu/secret/adina/plugin.html From gojomo at usa.net Tue Nov 19 16:48:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> Message-ID: <007b01c2902e$7125e740$640a000a@golden> Zooko writes: > Justin Chapweske wrote: > > > > If you're not going to do Tiger for the hash tree, then at least support > > variable segment sizes and follow the orphan hash promotion semantics > > in THEX (http://open-content.net/specs/draft-jchapweske-thex-01.html). > > Hm. Looking at this just now, I wonder if there isn't an authentication flaw, > wherein two different files have the same THEX X-Content-URN. > > Let's say there is a function THEXID() which takes a string of bytes and a > blocksize as its two arguments. I think there are two strings, F1 and F2, such > that THEXID(F1, 1024) == THEXID(F2, 48). (Assuming that the underlying hash > outputs 24 bytes digests.) > > Let F1 be a string 2048 bytes long, composed of two equal length substrings s1 > and s2. Let F2 be a string 48 bytes long, composed of two equal length > substrings s3 and s4. Now let s3 equal H(s1) and S4 equal H(s2). > > Unless I'm missing something, the result of THEXID(F1, 1024) is the same as that > of THEXID(F2, 48). Hm. I think you have identified a problem, but it's not *exactly* that THEXID(F1,1024) == THEXID(F2,48). That could be disambiguated by the specification of blocksize in the identifier. 
For example, in the URN scheme I've proposed for tree hashes -- which is still under discussion -- these two values might be identified as: F1 urn:tree:tiger:ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDA F2 urn:tree:tiger/48:ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDA Even though the value-parts are identical, a program would know they aren't any more comparable than a SHA1 value and a RIPEMD160 value. However, there does seem to be a real problem in that THEXID(F1,1024) == THEXID(F2,1024), in the special case where (length(F2) == 2*hashsize) && (length(F2) < blocksize). > The way I have been intending to fix this problem in the Mnet design is to > prepend the block length to each block (of actual data) before hashing. I don't think prepending blocklength would necessarily solve the problem where...
length(F2) = 2*hashsize
length(F2) > blocksize
THEXID(F1,1024) == THEXID(F2,1024)
But somehow mixing in *actual* data lengths (or block counts) would work, because then the THEXID of a 48-byte file would not be the same as a 48-byte-wide interim generation. Considering implications & potential workarounds... - Gojomo From bram at gawth.com Tue Nov 19 18:14:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) In-Reply-To: <3DDAD290.7050708@chapweske.com> Message-ID: Justin Chapweske wrote: > You are correct in noticing that the block length, as well as the hash > algorithm, are important in the communication of hash tree roots. For what it's worth, BitTorrent always includes the block length, with no default, and doesn't really use trees, since they're always exactly two levels; it's really just a hash of a list of hashes and other metainfo.
-Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From zooko at zooko.com Tue Nov 19 18:40:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) In-Reply-To: Message from "Gordon Mohr" of "Tue, 19 Nov 2002 16:47:31 PST." <007b01c2902e$7125e740$640a000a@golden> References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> <007b01c2902e$7125e740$640a000a@golden> Message-ID: Just for clarification: in the following, F1 is composed of two equal-length strings F1a and F1b, F2 is composed of two equal-length strings F2a and F2b, and F2a = H(F1a) and F2b = H(F1b). "Gordon Mohr" writes: > > However, there does seem to be a real problem in that THEXID(F1,1024) == > THEXID(F2,1024), in the special case where (length(F2) == 2*hashsize) && > (length(F2) < blocksize). > > The way I have been intending to fix this problem in the Mnet design is to > > prepend the block length to each block (of actual data) before hashing. > > I don't think prepending blocklength would necessarily solve the problem > where... > > length(F2)=2*hashsize > > length(F2)>blocksize > > THEXID(F1,1024) == THEXID(F2,1024) Hm. You mean where blocksize <= 2*hashsize? You're right that this would also be a problem. > But somehow mixing in *actual* data lengths (or block counts) would > work, because then the THEXID of a 48-byte file would not be the > same as a 48-byte-wide interim generation. Indeed. For Mnet, I already have the file size included separately in the filemap, so I don't have to worry about this problem in any case. (I guess this means I didn't need to prepend any block lengths in my hash. I hadn't really worked out the details of the hash tree yet, as you can see.)
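The fix being floated -- mixing the *actual* data length into the hash -- can be sketched the same way. This is a toy model, not Mnet's or THEX's actual format; SHA-256 truncated to 24 bytes stands in for the hash, and strengthening is applied only at the root:

```python
import hashlib
import struct

def h(data: bytes) -> bytes:
    # Hypothetical 24-byte digest standing in for Tiger.
    return hashlib.sha256(data).digest()[:24]

def naive_root(data: bytes, blocksize: int) -> bytes:
    # Un-strengthened Merkle root: collides for crafted inputs.
    level = [h(data[i:i + blocksize]) for i in range(0, len(data), blocksize)]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

def strengthened_root(data: bytes, blocksize: int) -> bytes:
    # Mix the real file length into the root -- length strengthening
    # applied only at the top, one of the options discussed here.
    return h(naive_root(data, blocksize) + struct.pack(">Q", len(data)))

s1, s2 = b"A" * 1024, b"B" * 1024
f1 = s1 + s2           # 2048-byte file
f2 = h(s1) + h(s2)     # 48-byte file built from f1's top inner nodes

# The un-strengthened roots collide; the strengthened ones do not,
# because the two files have different lengths.
assert naive_root(f1, 1024) == naive_root(f2, 1024)
assert strengthened_root(f1, 1024) != strengthened_root(f2, 1024)
```

The same separation happens if the length is instead carried out-of-band (HTTP header, filemap); baking it into the hash just makes it impossible to omit.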
I suppose for purposes of Mnet <-> Bitzi/THEX integration (about which I have a longish e-mail in the wings for tomorrow) we ought to document what kinds of data are required for crypto integrity purposes to be included "out of band" along with the hash value(s). Again, I prefer to include such information "in-band", by mixing it into the hash. For example, prepending the length of the whole file, and the block size just for good measure, at the beginning of the hash of the first block. (I get this preference from certain "crypto design heuristics" papers such as "The Chosen Protocol Attack" [1]. One way to look at it is that implementors are more likely to accidentally leave this important metadata out if it is transmitted in a separate HTTP header, or in an optional suffix as per your block size, than if it is baked into the crypto protocol.) But in the interests of interop, I might consider leaving that stuff out of my hash tree spec (and including it only in my filemap) if you're leaving it out of yours. Regards, Zooko [1] http://citeseer.nj.nec.com/kelsey97protocol.html From bram at gawth.com Tue Nov 19 19:37:02 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: <200211192340.gAJNeHo16135@finney.org> Message-ID: Hal Finney wrote: > Bram Cohen wrote: > > Floyd's cycle-finding algorithm can't be parallelized very well, that's > > why I said it requires 2 ** 128 memory *or* 2 ** 128 non-parallelized > > computational power. > > van Oorschot and Wiener's paper on parallel collision search [1] looks > parallelizable to me. > > [1] http://www.scs.carleton.ca/~paulv/papers/JoC97.pdf Very clever. I take back my comment about non-parallelizability. 
-Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at usa.net Tue Nov 19 23:25:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> <007b01c2902e$7125e740$640a000a@golden> Message-ID: <002f01c29065$e71fc1f0$640a000a@golden> Zooko writes: > "Gordon Mohr" writes: > > However, there does seem to be a real problem in that THEXID(F1,1024) == > > THEXID(F2,1024), in the special case where (length(F2) == 2*hashsize) && > > (length(F2) < blocksize). > Ah yes -- for any file of any length, I can take its two top-most inner nodes > (that is, the X and Y such that the root = H(X+Y)), and define a new file whose > contents are X+Y. This new file will have the same root ID as the previous > file. Yep. It's a very undesirable quality. I don't think it opens up much of a malicious attack -- because the collision is between a file, and an artificially-created file of hash data from the original file. Such a created file is completely determined by the original, and of recognizable size. Also, if other channels let people know the size of the file they're seeking, they can disambiguate the two. However, my inclination is still to patch this somehow, and restore the (intended) property of THEX-style tree hashes that no easily discoverable files have the same root value (given identical algorithm/block-size decisions). > As far as I can see, the only files that I can create which collide have length > 2*hashsize. Upon more thought, I see there are other similar cases having to do with the 'only child'-promotion rule. For example, consider: File F1: 4096 bytes long, four 1024-byte blocks; call them A, B, C, D.
Then the tree hash is: H( H( H(A)+H(B) ) + H( H(C)+H(D) ) ) // '+' is concatenation File F2: 2048+48 bytes long, two 1024-byte blocks A and B, then one 48-byte block which just "happens to be" H(C)+H(D). Then again, the tree hash is: H( H( H(A)+H(B) ) + H( H(C)+H(D) ) ) The same caveats as above apply, but this suggests that for one long file, there may be multiple creatable smaller artificial collision files. Changing the rule for handling 'only child' nodes might resolve this issue, but not the 2*hashsize file issue. > > > The way I have been intending to fix this problem in the Mnet design is to > > > prepend the block length to each block (of actual data) before hashing. > > > > I don't think prepending blocklength would necessarily solve the problem > > where... > > > > length(F2)=2*hashsize > > length(F2)>blocksize > > THEXID(F1,1024) == THEXID(F2,1024) > > Hm. You mean where blocksize <= 2*hashsize? You're right that this would also > be a problem. Oops, I meant length(F2) < blocksize. If length(F2) > blocksize, then THEXID(F2,blocksize) will *not* be the same as THEXID(F1,blocksize), even if F2 is exactly the same as the two interim tree values just under the root. So really small blocks would be one (inefficient!) way of avoiding this kind of problem. > > But somehow mixing in *actual* data lengths (or block counts) would > > work, because then the THEXID of a 48-byte file would not be the > > same as a 48-byte-wide interim generation. > > Indeed. For Mnet, I already have the file size included separately in the > filemap, so I don't have to worry about this problem in any case. (I guess > this means I didn't need to prepend any block lengths in my hash. I hadn't > really worked out the details of the hash tree yet, as you can see.)
> I suppose for purposes of Mnet <-> Bitzi/THEX integration (about which I have a longish e-mail in the wings for tomorrow) we ought to document what kinds of data are required for crypto integrity purposes to be included "out of band" along with the hash value(s). > > Again, I prefer to include such information "in-band", by mixing it into the hash. For example, prepending the length of the whole file, and the block size just for good measure, at the beginning of the hash of the first block. I agree, and my first inclination for addressing this tree-hash issue is to redefine the THEXish root value calculation as hashing the currently defined root value with the resource length (either in bytes or blocks). But the full implications of such a change, on the format and practice of tree hashes, require some more consideration before I'd be ready to formally propose it. For now: thanks for your perceptive catch of this issue. I'm surprised it's slipped by people's notice for so long. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From justin at chapweske.com Wed Nov 20 06:57:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] [Fwd: Re: Merkle Hash Tree Weakness - Request for Advice] Message-ID: <3DDBA2A3.7070601@chapweske.com> Here is a proposed fix that matches the elegance of the hash tree, and thus seems like the "right" fix. It also has the added benefit of being constructible from an existing "broken" hash tree and the file size, so this can be done w/o invalidating Bitzi's database. However, it will take at least a few weeks of considering various options before we can declare the appropriate fix. We don't want to be too hasty on this.
-------- Original Message -------- Subject: Re: Merkle Hash Tree Weakness - Request for Advice Date: Wed, 20 Nov 2002 05:42:13 -0800 From: Richard Parker CC: Newsgroups: sci.crypt References: <3d4e7bc5.0211191729.4f59437e@posting.google.com> Yes, as described your hash tree appears to be vulnerable to attacks on its 2nd-preimage resistance. The problem appears to be that your hash is not adequately "capturing" the size of substrings. Fortunately there is a simple fix: when computing the hash for each vertex, append to the hash input string the total length of the substrings rooted by that vertex (after padding to the size of the hash function's block size, i.e. 512 bits for SHA-1 and MD5). Appending the size to the input of a hash is called "MD-strengthening" after Merkle and Damgard, who independently proposed this idea. For example, ignoring optimizations, such a hash tree might look as follows:

             V_0
            /   \
           /     \
        V_1       V_2
        / \       / \
       /   \     /   \
     V_3   V_4 V_5   V_6

  H()   hash function
  l()   length
  ||    concatenation
  V_i   vertex i
  S_j   substring j
  r     block size of H
  n     output size of H
  p_i   number of bits needed to pad to multiple of r
  w     number of bits used to represent substring length

  V_0 = H(V_1 || V_2 || 0^p_0 || 0^(r-w) || l(S_1)+l(S_2)+l(S_3)+l(S_4))
  V_1 = H(V_3 || V_4 || 0^p_1 || 0^(r-w) || l(S_1)+l(S_2))
  V_2 = H(V_5 || V_6 || 0^p_2 || 0^(r-w) || l(S_3)+l(S_4))
  V_3 = H(S_1 || l(S_1))
  V_4 = H(S_2 || l(S_2))
  V_5 = H(S_3 || l(S_3))
  V_6 = H(S_4 || l(S_4))

Depending on your application requirements, it might well be sufficient to just apply MD-strengthening to the root hash. -Richard -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From lgonze at panix.com Thu Nov 21 11:23:01 2002 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: [decentralization] Some new YouServ related papers In-Reply-To: <3DDAD9B4.1010503@akamail.com> Message-ID: On Tue, 19 Nov 2002, Bert wrote: > In our (futile?!)
quest to make p2p the preferred method of information > sharing (and by that I don't mean porn and MP3's :-) within the > corporate enterprise, we've added some new features to the YouServ p2p > webhosting system and described them in a pair of papers. A thing about yooserv that's a little funny is the use of dynamic DNS for identity, because it implies an administrative bottleneck. But maybe that's just a kneejerk reaction. Can you say how this has worked out in practice, Bert? - Lucas From bert at akamail.com Thu Nov 21 23:02:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: [decentralization] Some new YouServ related papers References: Message-ID: <3DDDD801.30901@akamail.com> Lucas Gonze wrote: >On Tue, 19 Nov 2002, Bert wrote: > > >>In our (futile?!) quest to make p2p the preferred method of information >>sharing (and by that I don't mean porn and MP3's :-) within the >>corporate enterprise, we've added some new features to the YouServ p2p >>webhosting system and described them in a pair of papers. >> >> > >A thing about yooserv that's a little funny is the use of dynamic DNS for >identity, because it implies an administrative bottleneck. But maybe that's >just a kneejerk reaction. Can you say how this has worked out in practice, >Bert? > I'm not sure I understand the question -- could you clarify? By administrative do you mean "involving human administration"? There is certainly no such bottleneck, as the domains are assigned and registered automatically using an injective mapping from the user's (authenticated) e-mail address. Securing the base domain (userv.ibm.com in the case of the IBM deployment) certainly involves some administrative involvement, but it only has to be done once (+ whatever is required for keeping it from expiring). The project has precious few resources so I try to keep the system such that it basically runs itself.
From zooko at zooko.com Sat Nov 23 10:35:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of security primitive: kinds of failure Message-ID: My wife said something at dinner the other night as we were talking about choice of hash algorithms. She said she was of two minds: on the one hand you don't want to pay a price now for an uncertain gain in the future, and on the other hand it's *really* important not to use broken crypto. This made me realize that the reason I've been internally flip-flopping on which hash algorithm to use is that there are two different failure scenarios in my head. The first scenario is one where every year or so an academic paper comes out that says "Here is a slightly better certificational attack on hash algorithm X.". As these certificational (meaning "purely theoretical") attacks on algorithm X get better and better, and closer and closer to being more than purely theoretical, I decide to switch the Mnet file format over to algorithm Y, and commence doing so, in a series of nice backwards-compatible steps. When I'm thinking along those lines, it's obvious that I should use whatever algorithm is efficient and cryptographically approved today, and not worry about the future. But then there's the other scenario, where I suddenly find out that crackers in Malaysia have been using a weakness in algorithm X to steal money and gain control of innocent people's computers. Innocent people who used the Mnet file format and trusted its cryptographic guarantees. I further realize that these crackers in Malaysia didn't cryptanalyze it themselves, but that they learned the trick from someone else. I don't know who, or how many people know how to do this, or for how long they've known it. When I'm thinking along those lines, it's obvious that I should use the strongest possible algorithm today, and efficiency be damned (within limits).
Another realization that I had is that my desire to have the Mnet file format used on handhelds is *definitely* a future scenario of the first kind. I can start using a slow, expensive algorithm today, and if an opportunity opens up to use the Mnet file format on handhelds, we can switch over to a more efficient algorithm in nice backwards-compatible steps. Regards, Zooko From zooko at zooko.com Sat Nov 23 12:46:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of hash algorithm: some facts Message-ID: Here are a few things I've learned about hash algorithms: * NIST has standardized SHA-1, SHA-256, SHA-384, and SHA-512. SHA-1 is a member of the MD4/MD5 family. The others (collectively known as SHA-2) are not. * NESSIE [1] is considering the following algorithms for standardization: all of the NIST ones, plus Whirlpool [2]. Notably, Tiger wasn't proposed for NESSIE, even though Biham (one of the two authors of Tiger) is participating in NESSIE. I don't know why Tiger wasn't proposed for standardization. * Whirlpool is based on Rijndael and one of the designers of Whirlpool is one of the designers of Rijndael. The NESSIE project measures Whirlpool as being a little faster than SHA-2/512 (36 cycles/byte for Whirlpool, 40 cycles/byte for SHA-512) [3]. * Ross Anderson (the other author of Tiger) gives a high-level overview of hash algorithms in his book "Security Engineering". He describes MD4, MD5, SHA-1, SHA-256, SHA-512. He calls these latter two "versions of SHA". He says to use more than 160-bit wide hash functions, and to avoid the "MD series". He doesn't mention that SHA-1 is genetically related to the MD series. * I ran the Crypto++ v5 benchmark on my machine.
It shows that my 1.4 GHz Athlon XP is about twice as fast as Wei Dai's Celeron 850 MHz [4], and otherwise shows approximately the same relation between speeds of hash functions:

hash algorithm   MB/s
--------------   ----
CRC-32            253
Adler-32          232
MD5               129
HAVAL (pass=3)     86
SHA-1              84
HAVAL (pass=4)     62
RIPE-MD160         51
HAVAL (pass=5)     50
Tiger              47
SHA-256            41
SHA-512            17
MD2                 2

Regards, Zooko [1] http://cryptonessie.org/ [2] http://planeta.terra.com.br/informatica/paulobarreto/WhirlpoolPage.html [3] "Performance of Optimized Implementations of the NESSIE Primitives, version 1.0" http://www.cosic.esat.kuleuven.ac.be/nessie/deliverables/D21-v1.pdf [4] http://www.eskimo.com/~weidai/benchmarks.html From arma at mit.edu Mon Nov 25 22:34:01 2002 From: arma at mit.edu (Roger Dingledine) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] CfP: Privacy Enhancing Technologies 2003 Message-ID: <20021126013307.R17342@moria.seul.org> We've extended the submission deadline to Dec 6 (firm deadline). Also note that we will have a few (or maybe quite a few, depending on further sponsors) stipends available for students, unemployed researchers, etc. (Please forward widely.)

CALL FOR PAPERS
WORKSHOP ON PRIVACY ENHANCING TECHNOLOGIES 2003
Mar 26-28 2003
Dresden, Germany
Hotel Elbflorenz Dresden
http://www.petworkshop.org/

Privacy and anonymity are increasingly important in the online world. Corporations and governments are starting to realize their power to track users and their behavior, and restrict the ability to publish or retrieve documents. Approaches to protecting individuals, groups, and even companies and governments from such profiling and censorship have included decentralization, encryption, and distributed trust.
Building on the success of the first anonymity and unobservability workshop (held in Berkeley in July 2000) and the second workshop (held in San Francisco in April 2002), this third workshop addresses the design and realization of such privacy and anti-censorship services for the Internet and other communication networks. These workshops bring together anonymity and privacy experts from around the world to discuss recent advances and new perspectives. The workshop seeks submissions from academia and industry presenting novel research on all theoretical and practical aspects of privacy technologies, as well as experimental studies of fielded systems. We encourage submissions from other communities such as law and business that present their perspectives on technological issues. As in past years, we will publish LNCS proceedings after the workshop. Suggested topics include but are not restricted to:

* Efficient (technically or economically) realization of privacy services
* Techniques for censorship resistance
* Anonymous communication systems (theory or practice)
* Anonymous publishing systems (theory or practice)
* Attacks on anonymity systems (eg traffic analysis)
* New concepts in anonymity systems
* Protocols that preserve anonymity/privacy
* Models for anonymity and unobservability
* Models for threats to privacy
* Novel relations of payment mechanisms and anonymity
* Privacy-preserving/protecting access control
* Privacy-enhanced data authentication/certification
* Profiling, data mining, and data protection technologies
* Reliability, robustness, and attack resistance in privacy systems
* Providing/funding privacy infrastructures (eg volunteer vs business)
* Pseudonyms, identity, linkability, and trust
* Privacy, anonymity, and peer-to-peer
* Usability issues and user interfaces for PETs
* Policy, law, and human rights -- anonymous systems in practice
* Incentive-compatible solutions to privacy protection
* Economics of privacy systems
* Fielded systems and techniques for enhancing privacy in existing systems

IMPORTANT DATES
Submission deadline (extended, firm)   December 6, 2002 23:59 EST
Acceptance notification                February 7, 2003
Camera-ready copy for preproceedings   March 7, 2003
Camera-ready copy for proceedings      April 28, 2003

CHAIRS
Roger Dingledine, The Free Haven Project, USA
Andreas Pfitzmann, Dresden University of Technology, Germany

PROGRAM COMMITTEE
Alessandro Acquisti, SIMS, UC Berkeley, USA
Stefan Brands, Credentica, Canada
Jean Camp, Kennedy School, Harvard University, USA
David Chaum, USA
Richard Clayton, University of Cambridge, England
Lorrie Cranor, AT&T Labs - Research, USA
Roger Dingledine, The Free Haven Project, USA (program chair)
Hannes Federrath, Freie Universitaet Berlin, Germany
Ian Goldberg, Zero Knowledge Systems, Canada
Marit Hansen, Independent Centre for Privacy Protection Schleswig-Holstein, Germany
Markus Jakobsson, RSA Laboratories, USA
Brian Levine, University of Massachusetts at Amherst, USA
David Martin, University of Massachusetts at Lowell, USA
Andreas Pfitzmann, Dresden University of Technology, Germany
Matthias Schunter, IBM Zurich Research Lab, Switzerland
Andrei Serjantov, University of Cambridge, England
Adam Shostack, Canada
Paul Syverson, Naval Research Lab, USA

PAPER SUBMISSIONS
Submitted papers must not substantially overlap with papers that have been published or that are simultaneously submitted to a journal or a conference with proceedings. Papers should be at most 15 pages excluding the bibliography and well-marked appendices (using 11-point font and reasonable margins), and at most 20 pages total. Authors are encouraged to follow Springer LNCS format in preparing their submissions. Committee members are not required to read the appendices, and the paper should be intelligible without them. The paper should start with the title, names of authors, and an abstract.
The introduction should give some background and summarize the contributions of the paper at a level appropriate for a non-specialist reader. During the workshop, preproceedings will be made available. Final versions are not due until after the workshop, giving the authors the opportunity to revise their papers based on discussions during the meeting. Submissions must be made in Postscript or PDF format. To submit a paper, send a plain ASCII text email to containing the title and abstract of the paper, the authors' names, email and postal addresses, phone and fax numbers, and identification of the contact author. To the same message, attach your submission (as a MIME attachment). Papers must be received by December 6, 2002. Notification of acceptance or rejection will be sent to authors no later than February 7, 2003, and authors will have the opportunity to revise for the preproceedings version by March 7, 2003. Submission implies that, if accepted, the author(s) agree to publish in the proceedings and to sign a standard Springer copyright release, and also that an author of the paper will present it at the workshop. Final versions (due after the workshop) need to comply with the instructions for authors made available by Springer. From bram at gawth.com Wed Nov 27 02:08:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] CodeCon CFP reminder, deadline December 15th Message-ID: The CodeCon submissions deadline is December 15th; you can see the info here: http://codecon.info/ Also, the date and place have been set - February 22-24, Club NV, San Francisco.
-Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From nikolajn at ascio.com Wed Nov 27 11:44:01 2002 From: nikolajn at ascio.com (Nikolaj Nyholm) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] RE: [decentralization] Some new YouServ related papers Message-ID: <2F15A97500CFA0469C9BACC2041F8AC702EE0432@aries.dk.speednames.com> > A thing about yooserv that's a little funny is the use of > dynamic DNS for > identity, because it implies an admistrative bottleneck. But The idea is not so far out. Dynamic DNS will in essence run identity and discovery for 3G (layer 2/link layer mobility). On a smaller note, ENUM delegations (use telephone number in DNS as identity) are finally starting to happen, using SIP for presence and message/session initiation. All current efforts are, however, on a purely experimental basis. On an even smaller note, we've done work on building extended identity functions on top of any of the above two 'identity layers'. /n From halfinney at tmail.com Wed Nov 27 11:44:04 2002 From: halfinney at tmail.com (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of security primitive: kinds of failure In-Reply-To: References: Message-ID: <1038078311.253FE56B@w5.dngr.org> It's hard for me to believe that some hacker consortium will be capable of the advanced mathematical reasoning necessary to break a hash algorithm. Academic researchers are the ones who will break these algorithms, and they'll publish their results in the open literature. It takes a different kind of expertise to do a theoretical analysis and break of a crypto algorithm than has been demonstrated in the hacker community. So I would not put too much weight on that concern. Hal On Sat, 23 Nov 2002 10:41AM -0800, Zooko wrote: > > My wife said something at dinner the other night as we were talking > about choice > of hash algorithms.
She said she was of two minds: on the one hand you > don't > want to pay a price now for an uncertain gain in the future, and on the > other > hand it's *really* important not to use broken crypto. > > This made me realize that the reason I've been internally flip-flopping > on which > hash algorithm to use is that there are two different failure scenarios > in my > head. > > The first scenario is a scenario where every year or so an academic paper > comes out > that says "Here is a slightly better certificational attack on hash > algorithm X.". As these certificational (meaning "purely theoretical") > attacks > on algorithm X get better and better, and closer and closer to being > more than > purely theoretical, I decide to switch the Mnet file format over to > algorithm Y, > and commence doing so, in a series of nice backwards-compatible steps. > > When I'm thinking along those lines, it's obvious that I should use > whatever > algorithm is efficient and cryptographically approved today, and not > worry about > the future. > > But then there's the other scenario, where I suddenly find out that > crackers in > Malaysia have been using a weakness in algorithm X to steal money and > gain > control of innocent people's computers. Innocent people who used the > Mnet file > format and trusted its cryptographic guarantees. I further realize > that these > crackers in Malaysia didn't cryptanalyze it themselves, but that they > learned > the trick from someone else. I don't know who, or how many people know > how to > do this, or for how long they've known it. > > When I'm thinking along those lines, it's obvious that I should use > the > strongest possible algorithm today, and efficiency be damned (within > limits). > > Another realization that I had is that my desire to have the Mnet file > format > used on handhelds is *definitely* a future scenario of the first kind.
> I can > start using a slow, expensive algorithm today, and if there opens up > an > opportunity to use the Mnet file format on handhelds we can switch over > to a > more efficient algorithm in nice backwards-compatible steps. > > Regards, > > Zooko > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers --hal From agl at imperialviolet.org Wed Nov 27 11:44:07 2002 From: agl at imperialviolet.org (Adam Langley) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of hash algorithm: some facts In-Reply-To: References: Message-ID: <20021124003005.GA30787@imperialviolet.org> On Sat, Nov 23, 2002 at 03:41:52PM -0500, Zooko wrote: > hash algorithm MB/s > -------------- ---- > CRC-32 253 > Adler-32 232 > MD5 129 > HAVAL (pass=3) 86 > SHA-1 84 > HAVAL (pass=4) 62 > RIPE-MD160 51 > HAVAL (pass=5) 50 > Tiger 47 > SHA-256 41 > SHA-512 17 > MD2 2 Any timings for Whirlpool? -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org (+44) (0)7986 296753 PGP: 9113 256A CC0F 71A6 4C84 5087 CDA5 52DF 2CB6 3D60 From zooko at zooko.com Wed Nov 27 13:36:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of security primitive: kinds of failure In-Reply-To: Message from "Hal Finney" of "Sat, 23 Nov 2002 11:03:29 PST." <1038078311.253FE56B@w5.dngr.org> References: <1038078311.253FE56B@w5.dngr.org> Message-ID: Hal Finney wrote: > > It's hard for me to believe that some hacker consortium will be capable > of the advanced mathematical reasoning necessary to break a hash > algorithm. Oh, I agree. Perhaps I should have left the identities of the analyzers blank in the story. (In my head, I was imagining that it was the national security apparatus of a major state, such as Russia. Another scenario could be a brilliant and lucky academic who sells out instead of publishing.) 
I hate to sound paranoid, but I do want to think about how likely I consider the "surprising, already exploited" scenario. Since I have very little information with which to evaluate the likelihood of such a scenario, I continue to flip-flop between the efficient choices (SHA-1 and Rijndael), and the conservative ones (SHA-512 (??) and DES-EDE3). Suggestions would be appreciated. Perhaps SHA-1 is both the most efficient and the most conservative choice of a hash function. Regards, Zooko Algorithm MB/s --------- ---- SHA-1 85.333 SHA-256 41.558 SHA-512 17.778 DES-XEX3 13.333 DES-EDE3 6.061 Rijndael (128-bit key) 31.068 Rijndael (192-bit key) 27.119 Rijndael (256-bit key) 23.881 Rijndael (128) CTR 27.350 Rijndael (128) OFB 29.358 Rijndael (128) CFB 23.188 Rijndael (128) CBC 27.119 From hemppah at cc.jyu.fi Fri Nov 29 00:49:02 2002 From: hemppah at cc.jyu.fi (hemppah@cc.jyu.fi) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and signature management Message-ID: <1038559300.3de728446fc72@tammi2.cc.jyu.fi> Hi, Currently I'm doing my Master's thesis, which focuses on some specific issues related to p2p systems. I have a few questions concerning these issues: a) Is the DHT-based routing model (O(log(n))) the best effort for finding (data) blocks in a p2p network? b) What are the assumptions for the best effort? See question a (whatever it is). c) Is there any documentation/implementation of how one should manage key revocation in a PKI? d) How should digital signatures be managed (PKI) in a p2p environment? e) How do I know, when I have searched for data, that the results are accurate (not fake blocks etc.)? Any help would be appreciated. Thanks, Hermanni Hyytiälä
------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From arachnid at notdot.net Fri Nov 29 02:10:02 2002 From: arachnid at notdot.net (Nick Johnson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] The 'MetaWeb' and the Slashdot effect Message-ID: <3DE73D10.6080608@notdot.net> Hi folks, I was just thinking some random thoughts (you know the type), and a fairly simple idea occurred to me for mitigating Slashdot-type effects, increasing reliability and uptime, etc. The basic idea is this: Instead of attempting to fetch a page directly, the user attempts to fetch it via a script. If the script has a cached copy of the page in its database (the script would follow standard caching directives as a proxy would), it simply returns the page. If not, it has a number of choices: - It can open a query to another copy of the script it knows of, elsewhere on the network, and ask for the same page, then cache it and return it to the user - If it determines its traffic to be what it decides is 'too high', it can simply return an HTTP redirect to the user, redirecting them to another script it knows of - It can fetch the page itself and return it to the user. I am aware this idea bears a number of similarities to existing efforts, including such things as PeekABooty and Anonymizer.com, but I believe it could have several advantages in certain situations: - It requires no software on the part of the user, though a proxy server that uses this system could be implemented. - It requires no software or modifications on the part of the content author. - If organisations such as Slashdot either link to a script or provide their own, Slashdot effects can be mitigated or eliminated. Not to mention that it could be interesting to write ;). Obviously, there are unanswered implementation questions, such as how a script decides which action to take if it does not have a copy of the page, how it should deal with 'no cache' pages, etc.
I'm posting this because I'd like to see if anyone is interested, and whether anyone has done something similar already. From bert at akamail.com Fri Nov 29 08:46:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] The 'MetaWeb' and the Slashdot effect In-Reply-To: <3DE73D10.6080608@notdot.net> References: <3DE73D10.6080608@notdot.net> Message-ID: <3DE79B69.5000807@akamail.com> Nick Johnson wrote: > Hi folks, > > I was just thinking some random thoughts (you know the type), and a > fairly simple idea occurred to me for mitigation of Slashdot type > effects, increasing reliability and uptime, etc. > The basic idea is this: Instead of attempting to fetch a page > directly, the user attempts to fetch it via a script. If the script > has a cached copy of the page in its database (the script would > follow standard caching directives as a proxy would), it simply > returns the page. If not, it has a number of choices: [snip] Below is a paper that explores this idea, though it's more a combination of this idea + BitTorrent in that the peers who download are the ones doing the caching. The idea here is that clients requesting the content provide a special pragma field that indicates their willingness to serve that content to subsequent requestors. The server can then issue a redirect to any such peers instead of serving the content itself. "The Case for Cooperative Networking", Venkata N. Padmanabhan and Kunwadee Sripanidkulchai. IPTPS '02. http://detache.cmcl.cs.cmu.edu/~kunwadee/research/papers/coopnetiptps.pdf I've also planned on adding a similar function to YouServ, as described in section 3.4 of http://www.almaden.ibm.com/cs/people/bayardo/userv/plugin/plugin.html. Here though we want to use DNS to do the "load balancing" (a la Akamai) instead of requiring the originating server to always either send redirects or serve the content itself.
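[Editor's note] The cache-or-redirect behavior discussed in the messages above can be sketched as a toy Python class. This is an illustrative model only, not any existing implementation: the class name, the peer list, and the requests-per-second threshold are all assumptions, and a real script would also honor HTTP caching directives (Cache-Control, Expires) as Nick notes.

```python
import time
import urllib.request

class CachingScript:
    """Toy model of the cache-or-redirect script idea. All names and
    thresholds here are illustrative, not from any real implementation."""

    def __init__(self, peers=None, max_requests_per_sec=10.0):
        self.cache = {}            # url -> cached page body
        self.peers = peers or []   # other known copies of the script
        self.max_rps = max_requests_per_sec
        self._hits = []            # timestamps of recent requests

    def _overloaded(self):
        # Crude load estimate: requests seen in the last second.
        now = time.time()
        self._hits = [t for t in self._hits if now - t < 1.0]
        return len(self._hits) > self.max_rps

    def handle(self, url, fetch=None):
        """Return ('page', body) or ('redirect', peer_url)."""
        self._hits.append(time.time())
        if url in self.cache:
            # Serve the cached copy.
            return ('page', self.cache[url])
        if self._overloaded() and self.peers:
            # Shed load: HTTP-redirect the client to another script copy.
            return ('redirect', self.peers[0])
        # Otherwise fetch (from the origin, or from a peer via `fetch`),
        # cache, and serve.
        fetch = fetch or (lambda u: urllib.request.urlopen(u).read())
        body = fetch(url)
        self.cache[url] = body
        return ('page', body)
```

A front end (CGI script, proxy, etc.) would translate the ('redirect', ...) result into a 302 response; the injectable `fetch` argument also stands in for "ask another copy of the script for the page".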
From gojomo at usa.net Fri Nov 29 09:48:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] The 'MetaWeb' and the Slashdot effect References: <3DE73D10.6080608@notdot.net> <3DE79B69.5000807@akamail.com> Message-ID: <002f01c297cf$2a98b2f0$4d30af18@gojovaio> > The Case for Cooperative Networking*, Venkata N. Padmanabhan and > Kunwadee Sripanidkulchai. IPTPS '02 > . > http://detache.cmcl.cs.cmu.edu/~kunwadee/research/papers/coopnetiptps.pdf Another paper exploring similar ideas, presented at the same conference: Peer-to-Peer Caching Schemes to Address Flash Crowds by Tyron Stading, Petros Maniatis, and Mary Baker http://mosquitonet.stanford.edu/publications/Backslash-IPTPS2002.pdf - Gojomo From agm at SIMS.Berkeley.EDU Fri Nov 29 13:21:01 2002 From: agm at SIMS.Berkeley.EDU (Antonio Garcia) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and signature management In-Reply-To: <1038559300.3de728446fc72@tammi2.cc.jyu.fi> Message-ID: > > a) Is DHT-based routing model (O(log(n)) the best effort for finding (data) > blocks in p2p network ? Depends on the P2P network. In Gnutella, there are no bounds on searches. In CHORD and Tapestry, it is log n, in CAN it is n^(1/d), where d is the dimension of the logical space. > > b) What are assumptions for the best effort ? Look question a (what ever it is). > Rather complicated question! I would recommend reading the papers... > > d) How digital signatures should be managed (PKI) in p2p environment ? That's unresolved, and would be an excellent paper if you figured it out. > > e) How do I know that if I have searched data, results are accurate (not fake > blocks etc.) Well, if you are searching by file hash, then it's pretty likely to be what you are looking for. If you are searching by meta-data, then who knows what you're getting... A. 
~~~~~~~~~~~~~~~~~~~~~~~ Antonio Garcia-Martinez cryptologia.com From seth.johnson at RealMeasures.dyndns.org Fri Nov 29 13:40:02 2002 From: seth.johnson at RealMeasures.dyndns.org (Seth Johnson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] DEADLINE Thursday: Stop the FCC "Broadcast Flag" Proposal! Message-ID: <3DE7DCEC.3D60F61E@RealMeasures.dyndns.org> New Yorkers for Fair Use Action Alert: -------------------------------------- Please send a comment opposing the "Broadcast Flag" Proposal to the FCC by this Thursday, December 5, 2002. Tell the FCC to Serve the Public, Not Hollywood! Okay, you folks understand this issue -- it's very important to send word to the FCC by the public comments deadline, this Thursday, December 5, that you OPPOSE the Notice of Proposed Rulemaking #02-230. This rule would make it illegal for ordinary citizens to own fully functional digital television devices. We've made it easy; just follow the links below. 1) Please send in your comments to the FCC using the form provided below. Tell them that the movie industry should not have a special privilege to own fully-functional digital television devices. Read the alert below for details. 2) Please forward this alert to any other interested parties that you know of, who would understand and see the importance of this issue. 3) Volunteer to help us with this and other alerts related to your rights to flexible information technology in the future. Two roles you can take up are to become a Press Outreach Campaigner or a Commentator. Simply reply to this email to show your interest. New Yorkers for Fair Use Action Alert: -------------------------------------- Tell the FCC to Serve the Public, Not Hollywood! Public Comments Needed to Stop the "Broadcast Flag" Proposal at the FCC Please follow this link and use the form on the Center for Democracy and Technology's site to let the FCC know that the public's rights are at stake: http://www.nyfairuse.org/action/fcc.flag.xhtml. 
What's Going On: The FCC is considering a proposal that digital televisions be required to work only according to the rules set by Hollywood, through the use of a "broadcast flag" assigned to digital TV broadcasts. Through the deliberations of a group called the Broadcast Protection Discussion Group which assiduously discounted the public's rights to use flexible information technology, Hollywood and leading technology players have devised a plan that would only allow "professionals" to have fully-functional devices for processing digital broadcast materials. Hollywood and content producers must not be allowed to determine the rights of the public to use flexible information technology. The idea of the broadcast flag is to implement universal content control and abolish the right of free citizens to own effective tools for employing digital content in useful ways. The broadcast flag is theft. In the ongoing fight with old world content industries, the most essential rights and interests in a free society are those of the public. Free citizens are not mere consumers; they are not a separate group from so-called "professionals." The stakeholders in a truly just information policy in a free society are the public, not those who would reserve special rights to control public uses of information technology. Please go to the Center for Democracy and Technology's Broadcast Flag Action Page and use their form to let the FCC know that the public's rights are at stake: http://www.cdt.org/action/copyright/. ---- Some background links: http://bpdg.blogs.eff.org/archives/one-page.pdf http://www.eff.org/effector/HTML/effect15.22.html#III http://www.cdt.org/press/020807press.shtml http://scriban.com/movabletype_archives/000334.shtml http://scriban.com/movabletype_archives/000331.shtml The following link is the FCC's "Notice of Proposed Rulemaking" for the broadcast flag. 
http://hraunfoss.fcc.gov/edocs_public/attachmatch/FCC-02-231A1.pdf From coderman at mindspring.com Fri Nov 29 17:45:01 2002 From: coderman at mindspring.com (coderman) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and signature management References: <1038559300.3de728446fc72@tammi2.cc.jyu.fi> Message-ID: <3DE81836.30306@mindspring.com> hemppah@cc.jyu.fi wrote: > Hi, > > Currently I'm doing my Master's thesis, which focuses on some specific issues related > to p2p systems. I have a few questions concerning these issues: > > a) Is the DHT-based routing model (O(log(n))) the best effort for finding (data) > blocks in a p2p network? I have a biased description of search/discovery methods for large peer networks (out of date - I will revise one of these days...) which may be helpful: http://cubicmetercrystal.com/alpine/discovery.html > Any help would be appreciated. > > Thanks, > Hermanni Hyytiälä > > > ------------------------------------------------- > This mail sent through IMP: http://horde.org/imp/ > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > -- _____________________________________________________________________ coderman@mindspring.com http://cubicmetercrystal.com/ key fingerprint: 9C00 C63E A71D D488 AF17 F406 56FB 71D9 E17D E793 ( see html source for public key ) --------------------------------------------------------------------- From coderman at mindspring.com Fri Nov 29 17:47:01 2002 From: coderman at mindspring.com (coderman) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and signature management References: Message-ID: <3DE81898.20202@mindspring.com> Antonio Garcia wrote: >>a) Is the DHT-based routing model (O(log(n))) the best effort for finding (data) >>blocks in a p2p network? > > > Depends on the P2P network. In Gnutella, there are no bounds on searches.
> In CHORD and Tapestry, it is log n, in CAN it is n^(1/d), where d is the > dimension of the logical space. Another important factor to consider is the type of search provided. For example, DHTs require key-value associations, which are difficult to use for arbitrary keyword or metadata searches (where partial criteria must be considered). Gnutella is more flexible in this regard; however, such flexibility comes at the price of reduced efficiency. -- _____________________________________________________________________ coderman@mindspring.com http://cubicmetercrystal.com/ key fingerprint: 9C00 C63E A71D D488 AF17 F406 56FB 71D9 E17D E793 ( see html source for public key ) --------------------------------------------------------------------- From hal at finney.org Fri Nov 29 18:20:01 2002 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] DEADLINE Thursday: Stop the FCC "Broadcast Flag" Proposal! Message-ID: <200211300218.gAU2IdM02270@finney.org> > Please follow this link and use the form on the Center for > Democracy and Technology's site to let the FCC know that the > public's rights are at stake: > http://www.nyfairuse.org/action/fcc.flag.xhtml. This is probably not the right forum to discuss this, but I found the questions presented for comment at the CDT site somewhat baffling: Will the broadcast flag interfere with consumers' ability to make copies of DTV content for their personal use, either on personal video recorders or removable media? Would the digital flag interfere with consumers' ability to send DTV content across networks, such as home digital networks connecting digital set top boxes, digital recorders, digital servers and digital display devices?
Would the broadcast flag requirement limit consumers' ability to use their existing electronic equipment (equipment not built to look for the flag) or make it difficult to use older components with new equipment that is compliant with the broadcast flag standard? Would a broadcast flag requirement limit the development of future equipment providing consumers with new options? What will be the cost impact, if any, that a broadcast flag requirement would have on consumer electronics equipment? It doesn't seem to me that the average consumer would be in a position to answer virtually any of these questions. They all require some knowledge of the details of the broadcast flag, and some depend on policy decisions which are yet to be made! How can a consumer judge what the cost impact of a broadcast flag will be? We don't know anything about designing digital video devices, how the BF would affect the parts count and the overall cost. How can we know if the BF will interfere with our ability to make copies for personal use, or to send information across home networks - doesn't that depend on what restrictions get implemented? How can we know what degree of backwards compatibility for existing equipment will be possible, or how the BF will affect future designs? I am baffled why the CDT and other online groups would be encouraging consumers to make what will certainly be completely uninformed and inexpert judgements on questions that they are totally unqualified to answer! What the government *should* ask, if they care about the public's opinions, are policy questions, like: should the government aim to make it technically difficult for people to share digital TV broadcasts over the Internet? This is an issue where everyone has an opinion, and mine is as good as yours. This is an issue where consumers could give meaningful input.
Anyway, while I was happy to have a chance to register my opinion on the Proposed Rule Making, I was disappointed to learn that they were asking me all these technical questions where I would have to learn a lot more about digital video to feel qualified to answer. Hal Finney From seth.johnson at RealMeasures.dyndns.org Fri Nov 29 18:38:02 2002 From: seth.johnson at RealMeasures.dyndns.org (Seth Johnson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] DEADLINE Thursday: Stop the FCC "Broadcast Flag" Proposal! References: <200211300218.gAU2IdM02270@finney.org> Message-ID: <3DE8246E.479443F4@RealMeasures.dyndns.org> Hello Hal, Yes, we have only used the CDT's website because they have an easy to use form that generates the metadata-coded email that the FCC's comments system uses. Those questions reflect the questions in the NPRM. We think it unfortunate that both the FCC and the CDT are asking for "consumer" analyses, as if one can actually draw such a distinction for the Internet (which you can't, for reasons that P2P Hackers understand well). And simply reflecting the FCC's questions doesn't encourage people to speak freely and in terms of how they see the issue. However, there is a general comments field at the bottom of the form. And one does *not* have to accept the premises loaded into the questions. And it is certainly possible to speak according to principles without having to get into the technical aspects too much. I find it most helpful to understand that the whole deal, no matter the technical details, sets up some people as calling the shots and making the rest into mere consumers, rather than equal citizens online. Hope that helps. Now back to P2P Hacking . . . Seth Johnson Hal Finney wrote: > > > Please follow this link and use the form on the Center for > > Democracy and Technology's site to let the FCC know that the > > public's rights are at stake: > > http://www.nyfairuse.org/action/fcc.flag.xhtml. 
> > This is probably not the right forum to discuss this, but I found the > questions presented for comment at the CDT site somewhat baffling: > > Will the broadcast flag interfere with consumers' ability to make > copies of DTV content for their personal use, either on personal > video recorders or removable media? > > Would the digital flag interfere with consumers' ability to send DTV > content across networks, such as home digital networks connecting > digital set top boxes, digital recorders, digital servers and digital > display devices? > > Would the broadcast flag requirement limit consumers' ability to > use their existing electronic equipment (equipment not built to look > for the flag) or make it difficult to use older components with new > equipment that is compliant with the broadcast flag standard? > > Would a broadcast flag requirement limit the development of future > equipment providing consumers with new options? > > What will be the cost impact, if any, that a broadcast flag requirement > would have on consumer electronics equipment? > > It doesn't seem to me that the average consumer would be in position to > answer virtually any of these questions. They all require some knowledge > of the details of the broadcast flag, and some depend on policy decisions > which are yet to be made! > > How can a consumer judge what the cost impact of a broadcast flag will be? > We don't know anything about designing digital video devices, how the > BF would affect the parts count and the overall cost. How can we know > if the BF will interfere with our ability to make copies for personal > use, or to send information across home networks - doesn't that depend > on what restrictions get implemented? How can we know what degree of > backwards compatibility for existing equipment will be possible, or how > the BF will affect future designs? 
> > I am baffled why the CDT and other online groups would be encouraging > consumers to make what will certainly be completely uninformed and > inexpert judgements on questions that they are totally unqualified > to answer! > > What the government *should* ask, if they care about the public's > opinions, are policy questions, like: should the government aim to make > it technically difficult for people to share digital TV broadcasts > over the Internet? This is an issue where everyone has an opinion, > and mine is as good as yours. This is an issue where consumers could > give meaningful input. > > Anyway, while I was happy to have a chance to register my opinion on the > Proposed Rule Making, I was disappointed to learn that they were asking > me all these technical questions where I would have to learn a lot more > about digital video to feel qualified to answer. > > Hal Finney > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- DRM is Theft! We are the Stakeholders! New Yorkers for Fair Use http://www.nyfairuse.org [CC] Counter-copyright: http://cyber.law.harvard.edu/cc/cc.html I reserve no rights restricting copying, modification or distribution of this incidentally recorded communication. Original authorship should be attributed reasonably, but only so far as such an expectation might hold for usual practice in ordinary social discourse to which one holds no claim of exclusive rights. From hemppah at cc.jyu.fi Sat Nov 30 04:28:01 2002 From: hemppah at cc.jyu.fi (hemppah@cc.jyu.fi) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and Message-ID: <1038658842.3de8ad1a64bfd@tammi2.cc.jyu.fi> Hi, Thanks for your answer. Please see my answers below.
>Date: Fri, 29 Nov 2002 09:22:10 -0800 (PST) >From: Antonio Garcia >To: p2p-hackers@zgp.org >Subject: Re: [p2p-hackers] About search methods, key revokation in PKI and > signature management >Reply-To: p2p-hackers@zgp.org > >> a) Is DHT-based routing model (O(log(n)) the best effort for finding (data) >> blocks in p2p network ? >Depends on the P2P network. In Gnutella, there are no bounds on searches. >In CHORD and Tapestry, it is log n, in CAN it is n^(1/d), where d is the >dimension of the logical space. Yes, that's true. The point I was trying to make was: is O(log n) really the best effort, and is it implemented only by DHTs? So are there really no other "log n" structures besides DHTs (e.g. trees or tries)? >> >> b) What are assumptions for the best effort ? Look question a (what ever it >>is). >> >Rather complicated question! I would recommend reading the papers... Yes, rather unbounded question ;). Basically, I just meant that, for example, in DHTs the basic assumption is that you can't share your resources from your *own* computer; DHTs map keys to values in an m-bit virtual address space. And in Gnutella, the basic assumption is the opposite. You share (well, you don't *have* to, but..) your resources from your own computer, etc. >> >> d) How digital signatures should be managed (PKI) in p2p environment ? >That's unresolved, and would be an excellent paper if you figured it out. Hmm..I think Groove has a quite clever signature management system, but I'm not sure how well it can scale (e.g. 1,000,000,000s of users). >> >> e) How do I know that if I have searched data, results are accurate (not >>fake >> blocks etc.) >Well, if you are searching by file hash, then it's pretty likely to be >what you are looking for. If you are searching by meta-data, then who >knows what you're getting... Hmm..this *might* be a weird question, but can the Tree Hash EXchange format (THEX) combine file hashes and metadata somehow ?
Thanks, Hermanni ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From hemppah at cc.jyu.fi Sat Nov 30 04:48:01 2002 From: hemppah at cc.jyu.fi (hemppah@cc.jyu.fi) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and Message-ID: <1038660062.3de8b1dea30a6@tammi2.cc.jyu.fi> >>>a) Is DHT-based routing model (O(log(n)) the best effort for finding (data) >>>blocks in p2p network ? >> >> >> Depends on the P2P network. In Gnutella, there are no bounds on searches. >> In CHORD and Tapestry, it is log n, in CAN it is n^(1/d), where d is the >> dimension of the logical space. >Another important factor to consider is the type of search provided. For >example, DHT's require key - value associations which is difficult to use >for arbitrary keyword or metadata searches (where partial criteria must be >considered). Gnutella is more flexible in this regard, however, such >flexibility comes at the price of increased inefficiency. Does anyone know if there are any other projects whose aim is to combine DHTs and Gnutella-like systems besides YAPPERS (see http://dbpubs.stanford.edu:8090/pub/2002-24) ? As far as I know, there has been some discussion in the academic world also (see e.g. http://nile.usc.edu/research.htm). Thanks, Hermanni Hyytiälä ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From barnesjf at vuse.vanderbilt.edu Sat Nov 30 12:38:01 2002 From: barnesjf at vuse.vanderbilt.edu (J. Fritz Barnes) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and In-Reply-To: <1038658842.3de8ad1a64bfd@tammi2.cc.jyu.fi>; from hemppah@cc.jyu.fi on Sat, Nov 30, 2002 at 02:20:42PM +0200 References: <1038658842.3de8ad1a64bfd@tammi2.cc.jyu.fi> Message-ID: <20021130134801.B10885@vuse.vanderbilt.edu> On Sat, Nov 30, 2002 at 02:20:42PM +0200, hemppah@cc.jyu.fi wrote: :) :) > :) >> a) Is DHT-based routing model (O(log(n)) the best effort for finding (data) :) >> blocks in p2p network ? :) :) Yes, that's true. The point I was trying to say was that is O(log n) really the :) best effort and only implemented by DHTs. So there really is no other "log n"s :) than DHTs (e.g. trees or tries) ? :) There are two measures by which to evaluate the routing: the amount of space required and the number of hops required to find a specific piece of information. The DHT systems (Chord, Pastry, Tapestry) tend to be O(log(n)) for both space and number of hops. If you were willing to store additional information, you could have O(n) space and O(1) lookup. This is a spectrum: at the other extreme, each host might keep only the next peer, in which case it would be O(1) space and O(n) lookup. Fritz
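[Editor's note] The space-versus-hops spectrum Fritz describes can be made concrete with a toy, fully populated Chord-style ring. This is an idealized sketch, not any particular implementation: with n = 2**m nodes, keeping m "fingers" at power-of-two distances gives lookups in at most m = log2(n) hops, while keeping only a successor pointer forces up to n-1 hops.

```python
def chord_hops(m_bits, start, key):
    """Greedy lookup on a full ring of 2**m_bits nodes, where each node
    keeps m_bits fingers (successors at distances 1, 2, 4, ...):
    O(log n) state, at most m_bits hops."""
    n = 2 ** m_bits
    node, hops = start, 0
    while node != key:
        dist = (key - node) % n
        # Jump via the largest power-of-two finger that does not overshoot.
        node = (node + (1 << (dist.bit_length() - 1))) % n
        hops += 1
    return hops

def successor_hops(m_bits, start, key):
    """Each node keeps only its successor: O(1) state, O(n) hops."""
    return (key - start) % (2 ** m_bits)

# At the remaining extreme, a full O(n) routing table gives 1-hop lookups.
```

For a 1024-node ring (m_bits = 10), any lookup takes at most 10 finger hops, versus up to 1023 successor-only hops.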