From zooko at zooko.com Fri Nov 1 06:24:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding Message-ID: I've written a rationale for my base-32 encoding scheme. The "DESIGN" document is visible via viewcvs here: http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD&content-type=text/vnd.viewcvs-markup In order to minimize the chance that you ignore this message, I will now include the contents of the file in this message. The version number of the DESIGN file is v0.9 so that I can incorporate feedback from p2p-hackers (and any other sources) before naming it "v1.0". ------- begin appended file "DESIGN" Zooko O'Whielacronx November 2002 human-oriented base-32 encoding INTRO The base-32 encoding implemented in this library differs from that described in draft-josefsson-base-encoding-04.txt [1], and as a result is incompatible with that encoding. This document describes why we made that choice. This encoding is implemented in a project named libbase32 [2]. This is version 0.9 of this document. The latest version should always be available at: http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD RATIONALE Basically, the rationale for base-32 encoding in [1] is as written therein: "The Base 32 encoding is designed to represent arbitrary sequences of octets in a form that needs to be case insensitive but need not be humanly readable.". The rationale for libbase32 is different -- it is to represent arbitrary sequences of octets in a form that is as convenient as possible for human users to manipulate. In particular, libbase32 was created in order to serve the Mnet project [3], where 40-octet cryptographic values are encoded into URIs for humans to manipulate. Anticipated uses of these URIs include cut-and-paste, text editing (e.g. 
in HTML files), manual transcription via a keyboard, manual transcription via pen-and-paper, vocal transcription over phone or radio, etc.

The desiderata for such an encoding are:

* minimizing transcription errors -- e.g. the well-known problem of confusing `0' with `O'
* encoding into other structures -- e.g. search engines, structured or marked-up text, file systems, command shells
* brevity -- Shorter URLs are better than longer ones.
* ergonomics -- Human users (especially non-technical ones) should find the URIs as easy and pleasant as possible. The uglier the URI looks, the worse.

DESIGN

Base

The first decision we made was to use base-32 instead of base-64. An earlier version of this project used base-64, but a discussion on the p2p-hackers mailing list [4] convinced us that the added length of base-32 encoding was worth the added clarity provided by: case-insensitivity, the absence of non-alphanumeric characters, and the ability to omit a few of the most troublesome alphanumeric characters.

In particular, we realized that it would probably be faster and more comfortable to vocally transcribe a base-32 encoded 40-octet string (64 characters, case-insensitive, no non-alphanumeric characters) than a base-64 encoded one (54 characters, case-sensitive, plus two non-alphanumeric characters).

Alphabet

There are 26 alphabet characters and 10 digits, for a total of 36 characters available. We need only 32 characters for our base-32 alphabet, so we can choose four characters to exclude. This is where we part company with traditional base-32 encodings. For example [1] eliminates `0', `1', `8', and `9'. This choice eliminates two characters that are unambiguous (`8' and `9') while retaining others that are potentially confusing. Others have suggested eliminating `0', `1', `O', and `L', which is likewise suboptimal.

Our choice of confusing characters to eliminate is: `0', `l', `v', and `2'.
Our reasoning is that `0' is potentially mistaken for `o', that `l' is potentially mistaken for `1' or `i', that `v' is potentially mistaken for `u' or `r' (especially in handwriting) and that `2' is potentially mistaken for `z' (especially in handwriting).

Note that we choose to focus on typed and written transcription errors instead of vocal, since humans already have a well-established system of disambiguating spoken alphanumerics (such as the United States military's "Alpha Bravo Charlie Delta" and telephone operators' "Is that 'd' as in 'dog'?").

Sub-Octet Data

Suppose you have 10 bits of data to transmit, and the recipient (the decoder) is expecting 10 bits of data. All previous base-32 encoding schemes assume that the binary data to be encoded is in 8-bit octets, so you would have to pad the data out to 2 octets and encode it in base-32, resulting in a string 4 characters long. The decoder will decode that into 2 octets (16 bits) and then ignore the least significant 6 bits.

In the base-32 encoding described here, if the encoder and decoder both know the exact length of the data in bits (modulo 40), then they can use this shared information to optimize the size of the transmitted (encoded) string. In the example above, where you have 10 bits of data to transmit, libbase32 allows you to transmit the optimal encoded string: two characters.

If the length in bits is always a multiple of 8, or if both sides are not sure of the length in bits modulo 40, or if this encoding is being used in a setting where shaving one or two characters off the encoded string isn't worth the potential confusion, you can always use this encoding the same way you would use other encodings -- with an "input is in 8-bit octets" assumption.

Padding

Honestly, I don't understand why all the base-32 and base-64 encodings require trailing padding. Maybe I'm missing something, and when I publish this document people will point it out, and then I'll hastily erase this paragraph.
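[Editorial note: the sub-octet optimization described above can be sketched in a few lines. This is an illustrative implementation, not the actual libbase32 API; the alphabet ordering shown is arbitrary, since the document deliberately leaves the ordering unspecified at this point.]

```python
# Sketch of length-aware base-32 encoding (NOT the real libbase32 interface).
# Alphabet: the 26 letters minus `l' and `v', the 10 digits minus `0' and `2';
# the ordering here is arbitrary, chosen only for illustration.
ALPHABET = "abcdefghijkmnopqrstuwxyz13456789"
assert len(ALPHABET) == 32

def encode_bits(value, nbits):
    """Encode an nbits-wide integer into ceil(nbits/5) base-32 characters."""
    nchars = -(-nbits // 5)          # ceil(nbits / 5)
    value <<= nchars * 5 - nbits     # left-align the data into 5-bit groups
    return "".join(ALPHABET[(value >> (5 * (nchars - 1 - i))) & 31]
                   for i in range(nchars))

def decode_bits(s, nbits):
    """Recover the nbits-wide integer; both sides must agree on nbits."""
    value = 0
    for ch in s:
        value = (value << 5) | ALPHABET.index(ch)
    return value >> (len(s) * 5 - nbits)

# 10 bits of data fit in 2 characters; an octet-padded scheme needs 4.
assert len(encode_bits(0b1010110011, 10)) == 2
assert decode_bits(encode_bits(0b1010110011, 10), 10) == 0b1010110011
assert len(encode_bits(0, 16)) == 4   # the 2-octet, 4-character case
```

Because the bit length is shared out of band, no padding characters are needed to make the round trip unambiguous.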
[1] http://www.ietf.org/internet-drafts/draft-josefsson-base-encoding-04.txt
[2] http://sf.net/projects/libbase32
[3] http://mnet.sf.net/
[4] http://zgp.org/pipermail/p2p-hackers/2001-October/

From Raphael_Manfredi at pobox.com Fri Nov 1 06:59:02 2002
From: Raphael_Manfredi at pobox.com (Raphael Manfredi)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
In-Reply-To: 
References: 
Message-ID: 

Quoting p2p-hackers@zgp.org from ml.p2p.hackers:
:There are 26 alphabet characters and 10 digits, for a total of 36 characters
:available. We need only 32 characters for our base-32 alphabet, so we can
:choose four characters to exclude. This is where we part company with
:traditional base-32 encodings. For example [1] eliminates `0', `1', `8', and
:`9'. This choice eliminates two characters that are unambiguous (`8' and `9')
:while retaining others that are potentially confusing. Others have suggested
:eliminating `0', `1', `O', and `L', which is likewise suboptimal.
:
:Our choice of confusing characters to eliminate is: `0', `l', `v', and `2'. Our
:reasoning is that `0' is potentially mistaken for `o', that `l' is potentially
:mistaken for `1' or `i', that `v' is potentially mistaken for `u' or `r'
:(especially in handwriting) and that `2' is potentially mistaken for `z'
:(especially in handwriting).

Your choice is as arbitrary as the others, despite the care with which you chose your letters. Indeed, "9" can be mistaken for "g", especially in handwriting. And I've seen people spell out "8" as "B", so...

:Honestly, I don't understand why all the base-32 and base-64 encodings require
:trailing padding.

To know that it has not been truncated if you are in the middle of a sequence?

Do we really need yet another incompatible base32 encoding? You might not know it, but Gnutella already standardized on using "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" as the base32 alphabet.
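[Editorial note: that canonical digit set is the one standard libraries ship. For example, Python's `base64.b32encode` can be made to print its alphabet in order; the bit-packing below is only an illustration trick.]

```python
import base64

# Pack the 5-bit values 0..31 into 20 octets, so that the i-th 5-bit group
# encodes to the i-th character of whatever alphabet b32encode uses.
v = 0
for i in range(32):
    v = (v << 5) | i
alphabet = base64.b32encode(v.to_bytes(20, "big")).decode("ascii")
assert alphabet == "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
```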
Raphael From zooko at zooko.com Fri Nov 1 07:44:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: human-oriented base-32 encoding In-Reply-To: Message from Raphael_Manfredi@pobox.com (Raphael Manfredi) of "01 Nov 2002 14:58:06 GMT." References: Message-ID: > :Our choice of confusing characters to eliminate is: `0', `l', `v', and `2'. Our > :reasoning is that `0' is potentially mistaken for `o', that `l' is potentially > :mistaken for `1' or `i', that `v' is potentially mistaken for `u' or `r' > :(especially in handwriting) and that `2' is potentially mistaken for `z' > :(especially in handwriting). > > Your choice is as arbitrary as the others, despite the care with which you > chose your letters. Indeed, "9" can be mistaken with "g", especially in > handwriting. And I've seen people spell out "8" as "B", so... These are good points, but I still think that `l' is more likely to be confused with `i' or `I' than `1' is, and that `v' and `2' are more troublesome than `8' and `9'. This stuff isn't "arbitrary" but we are unfortunately compelled to base our decisions on our own intuitions, as I haven't seen any quantitative, scientific analysis of this kind of transcription error. If anyone knows of some I would love to see it. > :Honestly, I don't understand why all the base-32 and base-64 encodings require > :trailing padding. > > To know that it has not been truncated if you are in the middle of a sequence? So it's a very weak kind of error detection? (Namely, it doesn't detect truncation between sequences, nor several other kinds of error.) Thanks for the clue. I'll update my doc to say that this is the motivation, but it is one that doesn't motivate me. > Do we really need yet another incompatible base32 encoding? You might not > know it, but Gnutella already standardized on using > "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" as the base32 alphabet. 
This is the same alphabet used in the Internet Draft, which I refer to in my document. I think my alphabet has better resistance to transcription errors, and I think that this feature is more valuable to me than sharing an encoding with Gnutella is.

Note: I'm actually quite interested in *interoperating* with Gnutella, and in fact I occasionally hack on a Gnutella implementation [1]. But I don't think having an identical base-32 encoding is more important than optimizing the alphabet for human use.

Oh damn, I just realized that my DESIGN doc doesn't talk about the order of the alphabet...

Regards,

Zooko

[1] http://twistedmatrix.com/users/jh.twistd/viewcvs/cgi/viewcvs.cgi/twisted/protocols/gnutella.py?rev=HEAD&content-type=text/vnd.viewcvs-markup&cvsroot=Twisted

From Raphael_Manfredi at pobox.com Fri Nov 1 08:01:01 2002
From: Raphael_Manfredi at pobox.com (Raphael Manfredi)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
In-Reply-To: 
References: 
Message-ID: 

Quoting p2p-hackers@zgp.org from ml.p2p.hackers:
:This is the same alphabet used in the Internet Draft, which I refer to in my
:document. I think my alphabet has better resistance to transcription errors,
:and I think that this feature is more valuable to me than sharing an encoding
:with Gnutella is.

Really? What is the value of this base32 encoding then:

ABCDEFGHIJKLMNPQRSTUWXYZ345678ABC

When you see:

/uri-res/N2R?urn:sha1:ABCDEFGHIJKLMNPQRSTUWXYZ345678ABC

you had better know the alphabet used to be able to determine the SHA1 digest correctly. And all the people producing those URLs had better use the SAME alphabet. What can be more important than that?

Raphael

From lgonze at panix.com Fri Nov 1 08:03:01 2002
From: lgonze at panix.com (Lucas Gonze)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
In-Reply-To: 
Message-ID: 

> > Do we really need yet another incompatible base32 encoding?
> > You might not know it, but Gnutella already standardized on using
> > "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567" as the base32 alphabet.
>
> This is the same alphabet used in the Internet Draft, which I refer to in my
> document. I think my alphabet has better resistance to transcription errors,
> and I think that this feature is more valuable to me than sharing an encoding
> with Gnutella is.

This difference would create a different *kind* of transcription error, which is that a base32 identifier in either alphabet would have to specify which alphabet it was. A fix for this is to make sure that there is a disambiguating character in any string in your alphabet. That is not guaranteed in the wild, so you have to put it there. For example, you could prepend a (case-sensitive) lowercase "z" to every identifier.

If that character was always there, the human cost to remember it would be as low as the human cost to remember the "www." and ".com" parts of "www.{foo}.com".

- Lucas

From zooko at zooko.com Fri Nov 1 08:28:01 2002
From: zooko at zooko.com (Zooko)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
In-Reply-To: Message from Lucas Gonze of "Fri, 01 Nov 2002 10:59:55 EST."
References: 
Message-ID: 

Lucas Gonze wrote:
>
> This difference would create a different *kind* of transcription error,
> which is that a base32 identifier in either alphabet would have to specify
> which alphabet it was.
...
> If that character was always there, the human cost to remember it would be
> as low as the human cost to remember the "www." and ".com" parts of
> "www.{foo}.com".

That is a good point. I should specify some assumptions/requirements about the base32 encoded strings, namely that they are error-corrected (outside of the base32 encoding), and that the kind of base32 encoding used -- indeed, the fact that they *are* base32 encoded at all -- is likewise transmitted out of band.
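[Editorial note: the alphabet really is part of that out-of-band metadata, because the same character string decodes to different octets under different base32 alphabets. A tiny sketch; the ordering of the second alphabet is hypothetical, since the DESIGN doc does not fix one.]

```python
def b32_value(s, alphabet):
    """Interpret s as a big-endian base-32 number under the given alphabet."""
    v = 0
    for ch in s:
        v = (v << 5) | alphabet.index(ch)
    return v

RFC_ALPHABET   = "abcdefghijklmnopqrstuvwxyz234567"  # Internet Draft / Gnutella, lowercased
ZOOKO_ALPHABET = "abcdefghijkmnopqrstuwxyz13456789"  # Zooko's character set; ordering hypothetical

s = "mnop345"  # a string that is legal in both alphabets
assert b32_value(s, RFC_ALPHABET) != b32_value(s, ZOOKO_ALPHABET)
```

Without out-of-band agreement on which alphabet was used, a decoder cannot tell which of the two values was meant.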
(I already mention that the length of the binary data in bits can be, but doesn't have to be, transmitted out of band.)

In particular I envision URIs that denote the overall scheme in the leading part, like:

mnet://xpqff6aurk9jtpfahkkrcp3n684ci6x3hwirophsr5h1i1hp8swabhmh7tho

The error correction is done by the transport mechanism (in this case e-mail) and the specification of encoding (as well as specification of everything else) is done by the leading part of the URI.

If Mnet were to generate URIs that were intended to be used by other systems such as Gnutella, like:

/uri-res/N2R?urn:sha1:ABCDEFGHIJKLMNPQRSTUWXYZ345678ABC

then it would emit the kind of encoding that is denoted by the URI specification.

The issue of in-band vs. out-of-band transmission of this kind of, uh, "meta-data" is important, and I thank Raphael Manfredi and Lucas Gonze for bringing it up. I'm working on a revision of the DESIGN doc that goes into these issues, as well as addresses the issue of alphabet order, which I accidentally omitted from the first revision.

Regards,

Zooko

From justin at chapweske.com Fri Nov 1 08:43:02 2002
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] Re: human-oriented base-32 encoding
References: 
Message-ID: <3DC2AEF1.6080004@chapweske.com>

I agree with Lucas and Raphael. I think the damage caused by using two incompatible base32 encodings far outweighs any benefits of zooko's new encoding. And note, it's not just Gnutella using canonical base32; we also use it for the Content-Addressable Web and THEX specifications. Also realize that there are more p2p apps implementing the Gnutella protocol and thus supporting canonical base32 than the rest of the p2p apps combined.

> > This difference would create a different *kind* of transcription error,
> which is that a base32 identifier in either alphabet would have to specify
> which alphabet it was.
A fix for this is to make sure that there is a > disambiguating character in any string in your alphabet. That is not > guaranteed in the wild, so you have to put it there. For example, you > could prepend a (case-sensitive) lowercase "z" to every identifier. > > If that character was always there, the human cost to remember it would be > as low as the human cost to remember the "www." and ".com" parts of > "www.{foo}.com". > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From zooko at zooko.com Fri Nov 1 09:34:02 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: human-oriented base-32 encoding In-Reply-To: Message from Justin Chapweske of "Fri, 01 Nov 2002 10:42:25 CST." <3DC2AEF1.6080004@chapweske.com> References: <3DC2AEF1.6080004@chapweske.com> Message-ID: Justin Chapweske wrote: > > I agree with Lucas and Rapheal. I think the damage caused by using two > incompatible base32 encodings far outweighs any benefits of zooko's new > encoding. I think there are two possible values to be gained from using the same base32 encoding: 1. Interoperation without the need for out-of-band signalling to tell which encoding is being used. 2. Re-use of source code/specs/mental effort. For the first value, consider the example that a user could retrieve a base32 encoded identifier from a web page, with no clarifying name to indicate what it identifies, and paste it into either Mnet or a Gnutella implementation and get the same result. This is currently impossible, not because Mnet and Gnutella use different ASCII encodings, but because they encode different things (hashes of files in Gnutella (?), hashes of dinodes in Mnet). Likewise, if someone gives you a Tiger tree hash of the contents and you need the SHA1 flat hash of the contents, then the ASCII encoding is irrelevant -- you can't use the data. 
Now in the future Mnet might do something with SHA1 hashes of files, or Gnutella might do something with Mnet dinodes (;-)), in which case a user might reasonably give the Mnet application a naked base32 encoded identifier and expect the Mnet application to do something with it. Now even if Mnet *did* use the same base32 encoding as Gnutella, it would still be faced with the mystery of whether the thing in question is a SHA1 hash of a file or an mnetId (the id of a dinode).

There are three possible ways to disambiguate:

a. Out-of-band signalling, such as a leading "mnet://" or "SHA1:" or such. This is certainly best from the point of view of the application (and the programmer), but the user might not play along.

b. In-band signalling, such as Lucas's clever suggestion of including an unnecessary signalling character, or (my preference) using a different (slightly shorter) length for mnetIds.

c. Suck it and see. Attempt to download the file both ways, and then whichever one works, cryptographically verify its correctness. It's comforting to know that this can be done in a pinch, but it hardly seems the first choice.

In short, using a different base32 encoding doesn't make interoperation any harder, as far as I can see, and in fact it offers an extra possibility for making it *easier* by using Lucas's trick.

(Hm. In fact, since each character has a 3 in 32 chance of being a Gnutella-base32-illegal character (`1', `8', or `9'), a 64-character string has only about a 0.2% chance of being both a Gnutella-base32-legal string and an Mnet-base32-legal string. I can make this even less likely by making `8' and `9' be common trailing characters... Hm... Heh heh. I can also use Lucas's trick by appending a useless "8" character only in the rare case that one has not already occurred naturally...)

Anyway, if Mnet ever emits SHA1 hashes of the contents of files, then the argument about interoperation applies.
As long as Mnet is emitting a semantically different object, then the argument applies in the opposite direction! As to the second value of re-using source code and so forth, I've already written my base32 implementation and I enjoyed it. You can have the ANSI C and Python implementations under a permissive BSD-style license. http://sf.net/projects/libbase32 You can also set the "alphabet" string to "abcdefghijklmnopqrstuvwxyz234567" in order to make it identical to the Gnutella encoding. Regards, Zooko From justin at chapweske.com Fri Nov 1 09:44:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: human-oriented base-32 encoding References: <3DC2AEF1.6080004@chapweske.com> Message-ID: <3DC2BD38.40300@chapweske.com> Can you please call it something other than just 'libbase32' as there is already enough confusion between developers about the various encodings. Perhaps call it 'mnet32' encoding or some-such. > > As to the second value of re-using source code and so forth, I've already > written my base32 implementation and I enjoyed it. You can have the ANSI C and > Python implementations under a permissive BSD-style license. > > http://sf.net/projects/libbase32 > > You can also set the "alphabet" string to "abcdefghijklmnopqrstuvwxyz234567" in > order to make it identical to the Gnutella encoding. > > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From zooko at zooko.com Fri Nov 1 09:50:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: human-oriented base-32 encoding In-Reply-To: Message from Justin Chapweske of "Fri, 01 Nov 2002 11:43:20 CST." <3DC2BD38.40300@chapweske.com> References: <3DC2AEF1.6080004@chapweske.com> <3DC2BD38.40300@chapweske.com> Message-ID: Justin Chapweske wrote: > > Can you please call it something other than just 'libbase32' as there is > already enough confusion between developers about the various encodings. 
> Perhaps call it 'mnet32' encoding or some-such.

I'll make the specification of the alphabet a visible part of the interface (i.e. the programmer who uses libbase32 has to choose an alphabet, and one of the options is "abcdefghijklmnopqrstuvwxyz234567").

I *do* need a name for my encoding.

--Z

From me at aaronsw.com Fri Nov 1 14:36:01 2002
From: me at aaronsw.com (Aaron Swartz)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] human-oriented base-32 encoding
In-Reply-To: 
Message-ID: <42B62FAC-EDEA-11D6-8DD8-003065F376B6@aaronsw.com>

why did you choose to limit yourself to numbers and letters? it seems a more human-friendly scheme (while remaining url-compatible) would be:

abcdefghikmopsuwxyz345678$@^-+;~

(there's probably some room for improvement) am i missing something?

-- Aaron Swartz [http://www.aaronsw.com] "Curb your consumption," he said.

From barnesjf at vuse.vanderbilt.edu Fri Nov 1 15:35:02 2002
From: barnesjf at vuse.vanderbilt.edu (J. Fritz Barnes)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] human-oriented base-32 encoding
In-Reply-To: <42B62FAC-EDEA-11D6-8DD8-003065F376B6@aaronsw.com>; from me@aaronsw.com on Fri, Nov 01, 2002 at 04:35:45PM -0600
References: <42B62FAC-EDEA-11D6-8DD8-003065F376B6@aaronsw.com>
Message-ID: <20021101172920.F1017@vuse.vanderbilt.edu>

On Fri, Nov 01, 2002 at 04:35:45PM -0600, Aaron Swartz wrote:
:) why did you choose to limit yourself to numbers and letters? it seems a
:) more human-friendly scheme (while remaining url-compatible) would be:
:)
:) abcdefghikmopsuwxyz345678$@^-+;~
:)
:) (there's probably some room for improvement) am i missing something?

The above choice of characters has a small user-interface disadvantage. If I double-click on the string above, it will only select the characters to the left of the dollar-sign. (It is therefore slightly less user-friendly for cut-and-paste applications.)
Fritz

From lgonze at panix.com Fri Nov 1 15:48:01 2002
From: lgonze at panix.com (Lucas Gonze)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] human-oriented base-32 encoding
In-Reply-To: <20021101172920.F1017@vuse.vanderbilt.edu>
Message-ID: 

But while we're on the subject, '-' and '_' would make these identifiers more chunkable. For example, 18005551212 is harder to transcribe than 1-800-555-1212.

It's pure cognitive trickery, there's no extra meaning at all, but it works.

On Fri, 1 Nov 2002, J. Fritz Barnes wrote:
> On Fri, Nov 01, 2002 at 04:35:45PM -0600, Aaron Swartz wrote:
> :) why did you choose to limit yourself to numbers and letters? it seems a
> :) more human-friendly scheme (while remaining url-compatible) would be:
> :)
> :) abcdefghikmopsuwxyz345678$@^-+;~
> :)
> :) (there's probably some room for improvement) am i missing something?
>
> The above choice of strings has a small user-interface
> disadvantage. If I double-click on the string above, it will
> only select the characters to the left of the dollar-sign.
> (Therefore, is slightly less user-friendly for cut-and-paste
> applications.)
>
> Fritz

From gojomo at usa.net Fri Nov 1 16:01:01 2002
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec 9 22:12:04 2006
Subject: [p2p-hackers] human-oriented base-32 encoding
References: 
Message-ID: <008001c28202$d46c0670$640a000a@golden>

I agree that ease of person-to-person communication, in spoken and handwritten forms, is an important consideration (especially between encoding-digit candidates which are equally beneficial in other respects, since computers don't care).

However, I think you're undervaluing standardization, and overestimating the importance of handling a few unrepresentative "sloppy handwriting" problems.
I actually preferred the Base32 alphabet that was proposed in one of the (since discarded) international-DNS proposals, which left out both the letters 'o' and 'L' and the numbers '0' and '1' -- leaving no possibility of misunderstanding either way.

But the Base32 alphabet which goes...

ABCDEFGHIJKLMNOPQRSTUVWXYZ234567

...was already in use by another IETF effort (something related to SASL GSSAPI), and already well-documented in an Internet-Draft (Josefsson's) that is apparently headed for status as a referenceable, numbered IETF RFC.

The clarity of being able to say simply, "Base32", and without further discussion have one clear digit-set be implied, is very useful. It will aid in the reuse of encoding routines, and avoid silly bugs due to people working from different assumptions and incomplete docs. So it benefits us all to christen one "Base32" as "canonical" or "internet standard" -- and having "one" is more important than having "the best" (for any given, and probably idiosyncratic and arguable, definition of "best").

Also, while the 'l' <-> '1' and '0' <-> 'O' isomorphisms are very troublesome -- in some screen/print fonts they are identical, or only distinguishable with careful font-specific pixel-level observation -- the other transcription risks you've chosen to protect against are:

- only problematic in "sloppy" handwriting or artsy fonts; no one's
- only a few among a gigantic potential set of miswritten, misread, or misheard glyphs

What about '7' and 'T'? '2' and '7'? 'I' and '1' (one)? '5' and 'S'? '4' and '9'?

I recently gave my phone number to someone, and I watched them write it down, apparently correctly. However, that same person, later confirming the information via email, converted:

987 5907 (the correct number, in their own handwriting)

to

472 2402 !!!!

Only the '0' came across right, though each number was visually 'close' to what it should have been.

What about 'h' and 'n'?
(Apparently, people often misread my handwritten last name "Mohr" as "Monr".) And if spoken identifiers are important, what about those classes of letters which sound alike over crummy phone connections? ('B' 'V', 'M' 'N', 'Z' 'C', etc. My sister tells me these are "fricatives", and I've never understood why the telephone-dependent airline-reservations industry doesn't just rule these characters out of their confirmation codes.) So you've addressed a few of the problems that have given the most problems in your anecdotal experience, but others' experience will surely differ, there's no reliable overall data, and the cost is deviation from the "Base32" alphabet which coders are most likely to see in other applications or find in preexisting docs and library code. At the very least, if you go with a custom alphabet, please call it "Mnet32" or some such, so that confusion in minimized, and generations of future hapless searchers don't come across your definition first, and think they've got *the* "Base32". - Gojomo ----- Original Message ----- From: "Zooko" To: Sent: Friday, November 01, 2002 6:19 AM Subject: [p2p-hackers] human-oriented base-32 encoding > > I've written a rationale for my base-32 encoding scheme. The "DESIGN" > document is visible via viewcvs here: > > http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD&content-type=text/vnd.viewcvs-markup > > In order to minimize the chance that you ignore this message, I will now include > the contents of the file in this message. The version number of the DESIGN file > is v0.9 so that I can incorporate feedback from p2p-hackers (and any other > sources) before naming it "v1.0". > > ------- begin appended file "DESIGN" > Zooko O'Whielacronx > November 2002 > > > > human-oriented base-32 encoding > > INTRO > > The base-32 encoding implemented in this library differs from that described in > draft-josefsson-base-encoding-04.txt [1], and as a result is incompatible with > that encoding. 
This document describes why we made that choice. > > This encoding is implemented in a project named libbase32 [2]. > > This is version 0.9 of this document. The latest version should always be > available at: > > http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD > > RATIONALE > > Basically, the rationale for base-32 encoding in [1] is as written therein: "The > Base 32 encoding is designed to represent arbitrary sequences of octets in a > form that needs to be case insensitive but need not be humanly readable.". > > The rationale for libbase32 is different -- it is to represent arbitrary > sequences of octets in a form that is as convenient as possible for human users > to manipulate. In particular, libbase32 was created in order to serve the Mnet > project [3], where 40-octet cryptographic values are encoded into URIs for > humans to manipulate. Anticipated uses of these URIs include cut-and-paste, > text editing (e.g. in HTML files), manual transcription via a keyboard, manual > transcription via pen-and-paper, vocal transcription over phone or radio, etc. > > The desiderata for such an encoding are: > > * minimizing transcription errors -- e.g. the well-known problem of confusing > `0' with `O' > * encoding into other structures -- e.g. search engines, structured or marked- > up text, file systems, command shells > * brevity -- Shorter URLs are better than longer ones. > * ergonomics -- Human users (especially non-technical ones) should find the > URIs as easy and pleasant as possible. The uglier the URI looks, the worse. > > DESIGN > > Base > > The first decision we made was to use base-32 instead of base-64. 
An earlier > version of this project used base-64, but a discussion on the p2p-hackers > mailing list [4] convinced us that the added length of base-32 encoding was > worth the added clarity provided by: case-insensitivity, the absence of non- > alphanumeric characters, and the ability to omit a few of the most troublesome > alphanumeric characters. > > In particular, we realized that it would probably be faster and more comfortable > to vocally transcribe a base-32 encoded 40-octet string (64 characters, case- > insensitive, no non-alphanumeric characters) than a base-64 encoded one > (54 characters, case-sensitive, plus two non-alphanumeric characters). > > Alphabet > > There are 26 alphabet characters and 10 digits, for a total of 36 characters > available. We need only 32 characters for our base-32 alphabet, so we can > choose four characters to exclude. This is where we part company with > traditional base-32 encodings. For example [1] eliminates `0', `1', `8', and > `9'. This choice eliminates two characters that are unambiguous (`8' and `9') > while retaining others that are potentially confusing. Others have suggested > eliminating `0', `1', `O', and `L', which is likewise suboptimal. > > Our choice of confusing characters to eliminate is: `0', `l', `v', and `2'. Our > reasoning is that `0' is potentially mistaken for `o', that `l' is potentially > mistaken for `1' or `i', that `v' is potentially mistaken for `u' or `r' > (especially in handwriting) and that `2' is potentially mistaken for `z' > (especially in handwriting). > > Note that we choose to focus on typed and written transcription errors instead > of vocal, since humans already have a well-established system of disambiguating > spoken alphanumerics (such as the United States military's "Alpha Bravo Charlie > Delta" and telephone operators' "Is that 'd' as in 'dog'?"). 
> > Sub-Octet Data > > Suppose you have 10 bits of data to transmit, and the recipient (the decoder) is > expecting 10 bits of data. All previous base-32 encoding schemes assume that > the binary data to be encoded is in 8-bit octets, so you would have to pad the > data out to 2 octets and encode it in base-32, resulting in a string > 4 characters long. The decoder will decode that into 2 octets (16 bits) and > then ignore the least significant 6 bits. > > In the base-32 encoding described here, if the encoder and decoder both know the > exact length of the data in bits (modulo 40), then they can use this shared > information to optimize the size of the transmitted (encoded) string. In the > example above, where you have 10 bits of data to transmit, libbase32 allows you to > transmit the optimal encoded string: two characters. > > If the length in bits is always a multiple of 8, or if both sides are not sure > of the length in bits modulo 40, or if this encoding is being used in a way where > optimizing one or two characters out of the encoded string isn't worth the > potential confusion, you can always use this encoding the same way you would use > other encodings -- with an "input is in 8-bit octets" assumption. > > Padding > > Honestly, I don't understand why all the base-32 and base-64 encodings require > trailing padding. Maybe I'm missing something, and when I publish this document > people will point it out, and then I'll hastily erase this paragraph.
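The size optimization above is just ceiling division by five bits per character; the function below is a sketch of that bookkeeping, not libbase32's real interface.

```python
def encoded_length(num_bits: int) -> int:
    # Each base-32 character carries 5 bits, so two sides that agree on
    # the exact bit length also agree on the encoded string length.
    return -(-num_bits // 5)  # ceiling division

# With the octet assumption, 10 bits must first be padded out to 2 octets,
# costing encoded_length(16) == 4 characters. Knowing the true bit length
# gives the optimal encoded_length(10) == 2 characters.
```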
> > [1] http://www.ietf.org/internet-drafts/draft-josefsson-base-encoding-04.txt > [2] http://sf.net/projects/libbase32 > [3] http://mnet.sf.net/ > [4] http://zgp.org/pipermail/p2p-hackers/2001-October/ > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From gojomo at usa.net Fri Nov 1 16:09:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding References: Message-ID: <008801c28203$f30e2b20$640a000a@golden> > > On Fri, Nov 01, 2002 at 04:35:45PM -0600, Aaron Swartz wrote: > > :) why did you choose to limit yourself to numbers and letters? it seems a > > :) more human-friendly scheme (while remaining url-compatible) would be: > > :) > > :) abcdefghikmopsuwxyz345678$@^-+;~ > > :) > > :) (there's probably some room for improvement) am i missing something? Some of those characters are difficult to use in filenames, having special meanings or being otherwise disallowed. They are also considered 'stop' characters by legacy indexing practices (including Google), so a token featuring non-alphanumeric characters may not be searchable at all, or not as easily searchable as a single item, the way a purely alphanumeric token is. For example, try searching Google for: F77THX24CRGILULQ637OHA7E4HE7QDQ2 If this identifier, and many others, were broken up by stop characters, you might get false hits or have other problems. Lucas Gonze writes: > But while we're on the subject, '-' and '_' would make these identifiers > more chunkable. for example 18005551212 is harder to transcribe than > 1-800-555-1212. > > It's pure cognitive trickery, there's no extra meaning at all, but it > works. And for the indexing purposes mentioned above, I think such chunking would be a problem, rather than a benefit.
- Gojomo From lgonze at panix.com Fri Nov 1 16:55:01 2002 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: <008801c28203$f30e2b20$640a000a@golden> Message-ID: Gordon Mohr wrote: > Some of those characters are difficult to use in filenames, > having special meanings or being otherwise disallowed. Hm. It's Zooko's project. I'll leave the fine points of what he wants to do to him. No point in kibitzing more than I already have. From zooko at zooko.com Fri Nov 1 17:12:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: Message from Lucas Gonze of "Fri, 01 Nov 2002 19:51:33 EST." References: Message-ID: For those watching at home (err.. I guess that's everyone? Except those watching from work.) I've updated the doc to address some of the issues that have been discussed. http://cvs.sf.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD&content-type=text/vnd.viewcvs-markup In particular, I am very interested in the surprisingly strong emotions that people seem to have at the very thought of an "incompatible" base-32 encoding. When I started this sub-project, I thought that I was sacrificing some non- specific compatibility in exchange for better human-usability, but after this discussion and thinking more carefully about the issue, it seems to me that using mnet-base-32 encoding for mnetIds actually *enhances* compatibility with other systems more than using standard-base-32 encoding for mnetIds would. Here is the relevant section from v0.9.4.1 of the DESIGN file: A NOTE ON COMPATIBILITY AND INTEROPERATION If your application could possibly interoperate with another application, then you should consider the risk of precluding such interoperation by encoding semantically identical objects into syntactically different representations. 
For example, many current systems include the SHA-1 hash of the contents of a file, and this hash value can be represented for user or programmatic sharing in base-32 encoded form [5, 6, 7, 8]. These four systems all use traditional base-32 encoding as described in [1]. If your system will expose the SHA-1 hash of the contents of a file, then you should consider the benefits of having such hash values be exchangeable with those systems by using the same encoding including base, alphabet, permutation of alphabet, length-encoding, padding, treatment of illegal characters and line-breaks. If, however, the semantic meaning of the objects that you are exposing is not something that can be used by another system, due to semantic differences, then you gain nothing with regard to interoperation by using the same ASCII encoding, and in fact by doing so you may incur *worse* interoperation problems by making it impossible for the applications to use syntactic features (namely, by recognizing the encoding scheme) to disambiguate between semantic features. Lucas Gonze has suggested [9] that different schemes could in fact *deliberately* add characters which would be illegal in another scheme in order to enable syntactic differentiation. (This would be morally similar to the "check digit" included in most credit card numbers.) The author has also suggested [10] encoding schematic compatibility in the lengths. For example, mnetIds will probably be 48 characters in base-32 encoded form (encoding 30 octets of data). If it turns out that other strings of that length and form occur in the wild, then the mnetIds could be redefined to be 47 or 49 characters in order to make them recognizable. Clearly the best semantic differentiation is an unambiguous one that is transmitted out-of-band (outside of the ASCII encoding, that is), such as URI scheme names (e.g.: SHA1:blahblahblah or mnet://blahblahblah). However, users might not always preserve those. 
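The syntactic-differentiation idea Lucas suggests can be sketched as a small classifier that exploits characters legal in one alphabet but not the other. The two alphabets below are stand-ins for illustration: the "traditional" one follows [1] (the letters plus `2'-`7'), and the mnet-style one drops `0', `l', `v', and `2' as described earlier.

```python
# Hypothetical alphabets for the two schemes being distinguished.
RFC_STYLE = set("abcdefghijklmnopqrstuvwxyz234567")   # [1]: drops 0, 1, 8, 9
MNET_STYLE = set("13456789abcdefghijkmnopqrstuwxyz")  # drops 0, l, v, 2

def classify(token: str) -> str:
    """Guess which encoding produced a token from its character set alone."""
    chars = set(token.lower())
    in_rfc, in_mnet = chars <= RFC_STYLE, chars <= MNET_STYLE
    if in_rfc and not in_mnet:
        return "rfc-style"
    if in_mnet and not in_rfc:
        return "mnet-style"
    return "ambiguous"  # length, or out-of-band context, must decide
```

Many tokens will of course remain ambiguous, which is why the length-based and out-of-band mechanisms discussed above still matter.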
REFERENCES [1] http://www.ietf.org/internet-drafts/draft-josefsson-base-encoding-04.txt [2] http://sf.net/projects/libbase32 [3] http://mnet.sf.net/ [4] http://zgp.org/pipermail/p2p-hackers/2001-October/ [5] Gnutella [need URL for SHA1 and base-32 encoding stuff] [6] Bitzi [need URL for specification stuff] [7] CAW [need URL] [8] THEX [need URL] [9] http://zgp.org/pipermail/p2p-hackers/2002-November/000924.html [10] http://zgp.org/pipermail/p2p-hackers/2002-November/000927.html From pfh at mail.csse.monash.edu.au Mon Nov 4 16:55:01 2002 From: pfh at mail.csse.monash.edu.au (Paul Harrison) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: Message-ID: On Fri, 1 Nov 2002, Lucas Gonze wrote: > > But while we're on the subject, '-' and '_' would make these identifiers > more chunkable. for example 18005551212 is harder to transcribe than > 1-800-555-1212. > > It's pure cognitive trickery, there's no extra meaning at all, but it > works. > To have an even more human-friendly coding, how about nonsense words :-) This Python code snippet encodes each byte as a syllable...

consonants = ['b','c','d','f','g','h','j','k','l','m','n','p','r',
              's','t','v','w','x','z','th','ch','sh','st','ts']
vowels = ['a','e','i','o','u','oo','ee','io','au','ie','ai']

def human_name(s):
    # one consonant-vowel syllable per byte
    result = ''
    for letter in s:
        number = ord(letter)
        result = result + consonants[number % len(consonants)] \
                        + vowels[number // len(consonants)]
    return result.capitalize()

(this used to be used in Circle for public keys.
It now uses a graphical squiggle, the idea being a public key is recognizable to people (as opposed to transcribable) so that they can search for a person then check that they aren't an imposter) cheers, Paul Email: pfh@csse.monash.edu.au one ring, no rulers, http://www.csse.monash.edu.au/~pfh/circle/ From bram at gawth.com Mon Nov 4 19:11:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] upcoming meeting sunday Message-ID: Yesterday was the first sunday of the month, so arithmetic being what it is, this next week will be the monthly p2p-hackers meeting 3pm, SONY metreon, san francisco, food court area. Sunday, November 10th. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From lgonze at panix.com Tue Nov 5 08:44:02 2002 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: Message-ID: > (this used to be used in Circle for public keys. It now uses a graphical > squiggle, the idea being a public key is recognizable to people (as > opposed to transcribable) so that they can search for a person then check > that they aren't an imposter) > > cheers, > Paul That little squiggle is a great idea, Paul. original! From zooko at zooko.com Tue Nov 5 10:34:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented little squiggle encoding (was: human-oriented base-32 encoding) In-Reply-To: Message from Lucas Gonze of "Tue, 05 Nov 2002 11:39:41 EST." References: Message-ID: Lucas Gonze wrote: > > That little squiggle is a great idea, Paul. Yes! > original! Not entirely. There've been many ideas posted to the Net throughout the years to represent cryptographic values graphically. I'm afraid I don't have any references at the moment, but I recall butterflies, fractals, human faces, and other such graphical objects being proposed. 
I even recall that an implementation was announced. I *don't* recall "squiggle" as the form, and that sounds like a good idea to me. Kudos to Paul for devising and implementing such sophisticated schemes as Circle and the squiggle. (And the nonsense words, which by the way I recall were a part of the stillborn PGP Phone, first version, back in 1995.) Regards, Zooko From painlord2k at libero.it Fri Nov 8 07:10:02 2002 From: painlord2k at libero.it (Mirco Romanato) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding References: Message-ID: <005a01c28738$a8d588c0$c967fea9@painlord2k> ----- Original Message ----- From: "Lucas Gonze" > That little squiggle is a great idea, Paul. original! Sorry, but it could work in the US and maybe in other English-speaking countries; for users of non-English Latin alphabets it doesn't work. I'm Italian, so for me half of this scheme is unusable. And note that in Italian we always spell words exactly as they are written; exceptions are very, very rare. Mirco From clausen at gnu.org Fri Nov 8 13:31:01 2002 From: clausen at gnu.org (Andrew Clausen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: <005a01c28738$a8d588c0$c967fea9@painlord2k> References: <005a01c28738$a8d588c0$c967fea9@painlord2k> Message-ID: <20021108203729.GC1119@gnu.org> On Fri, Nov 08, 2002 at 04:08:12PM +0100, Mirco Romanato wrote: > Sorry, but it could work in the US and maybe in other English-speaking countries; > for users of non-English Latin alphabets it doesn't work. > > I'm Italian, so for me half of this scheme is unusable. > And note that in Italian we always spell words exactly as they are written; > exceptions are very, very rare. Could you have a different system for each locale? E.g.: Italian would probably have extra vowel sounds like "io" and "ia", and not things like "th" and "gh". It's a way of representing arbitrary data in human-readable form... it's just a user interface issue.
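The per-locale idea amounts to swapping in different consonant and vowel tables. The inventories below are invented for illustration (a real locale table would need a native speaker's judgment); each pair is sized so the product of the two list lengths covers all 256 byte values.

```python
# Illustrative per-locale syllable inventories. len(consonants) * len(vowels)
# must be >= 256 so that every byte value gets a distinct syllable.
LOCALES = {
    "en": (['b','c','d','f','g','h','j','k','l','m','n','p','r',
            's','t','v','w','x','z','th','ch','sh','st','ts'],    # 24
           ['a','e','i','o','u','oo','ee','io','au','ie','ai']),  # 11
    "it": (['b','c','d','f','g','l','m','n','p','r','s','t',
            'v','z','gl','ch'],                                   # 16
           ['a','e','i','o','u','ai','ei','ia','ie','io','iu',
            'ua','ue','uo','au','eu']),                           # 16
}

def human_name(data: bytes, locale: str = "en") -> str:
    consonants, vowels = LOCALES[locale]
    assert len(consonants) * len(vowels) >= 256
    return "".join(consonants[b % len(consonants)]
                   + vowels[b // len(consonants)]
                   for b in data).capitalize()
```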
Cheers, Andrew From bram at gawth.com Sat Nov 9 16:49:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] reminder: p2p-hackers meeting tomorrow Message-ID: Remember, there will be a p2p-hackers meeting tomorrow, sunday, at 3pm in the metreon. I've got some really interesting exporting logins stuff to talk about, as well as some new BitTorrent developments. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bradneuberg at yahoo.com Sat Nov 9 17:29:01 2002 From: bradneuberg at yahoo.com (Brad Neuberg) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Adaptive P2P Network Library? Message-ID: <20021110012840.2391.qmail@web14107.mail.yahoo.com> Is anyone aware of any libraries or work that allow for adaptive P2P bandwidth usage? For example, imagine a "smart" networking library that does the following: *It allows you to state that your p2p network "service" (which could be using many different ports over time, TCP or UDP, etc.) will only use 40% of the available bandwidth on your network connection. *It allows you to designate your p2p network "service" as less important than other applications communicating on the network, such as web browsing, etc.; the smart networking library then informs you when there is low network traffic being communicated, allowing you to start performing p2p communications. As soon as other applications start communicating your app would then throttle down. This would prevent a p2p app from slowing down your other network applications. *It somehow "watches" the amount of traffic on your local network as a whole, and allows you to set a percentage of the total amount of traffic the p2p network service can generate on the local network as a whole during high and low bandwidth usage. Here's an example. Imagine a standard p2p file sharing application that is using this smart networking library. 
It is configured to use up to 50% of the local PC's bandwidth during peak local usage (i.e. while other apps on the local PC are also using bandwidth). When other apps on the local PC are not using any bandwidth, it is given permission to exhaust the local bandwidth on the PC to serve files to other clients. It is also configured to be a "friendly" citizen on the local network, and is therefore aware of high and low usages of bandwidth on the local area network as a whole. Is this possible? Thanks, Brad Neuberg bkn3@columbia.edu From wesley at felter.org Sun Nov 10 12:34:01 2002 From: wesley at felter.org (Wes Felter) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Adaptive P2P Network Library? In-Reply-To: <20021110012840.2391.qmail@web14107.mail.yahoo.com> Message-ID: on 11/9/02 7:28 PM, Brad Neuberg at bradneuberg@yahoo.com wrote: > Is anyone aware of any libraries or work that allow > for adaptive P2P bandwidth usage? For example, > imagine a "smart" networking library that does the > following: > > *It allows you to state that your p2p network > "service" (which could be using many different ports > over time, TCP or UDP, etc.) will only use 40% of the > available bandwidth on your network connection. The obvious way to do this is to use a QoS-enabled network stack, which most computers don't have. To do it with no OS help you would first need to figure out how much bandwidth is available, which is tricky in the general case. You can probably get a good enough estimate by writing a bandwidth monitor and calculating the maximum observed transfer rate over a rolling n-second window. > *It allows you to designate your p2p network "service" > as less important than other applications > communicating on the network, such as web browsing, > etc.; the smart networking library then informs you > when there is low network traffic being communicated, > allowing you to start performing p2p communications. 
> As soon as other applications start communicating your > app would then throttle down. This would prevent a > p2p app from slowing down your other network > applications. Use QoS or write a bandwidth monitor. > *It somehow "watches" the amount of traffic on your > local network as a whole, and allows you to set a > percentage of the total amount of traffic the p2p > network service can generate on the local network as a > whole during high and low bandwidth usage. You could put the NIC in promiscuous mode and use a bandwidth monitor, but this will increase CPU utilization. And that won't even work for anything but a traditional Ethernet (which is becoming increasingly rare in a world where for 39 cents more you can value-size your router to also include a switch). Wes Felter - wesley@felter.org - http://felter.org/wesley/ From pfh at mail.csse.monash.edu.au Sun Nov 10 15:07:01 2002 From: pfh at mail.csse.monash.edu.au (Paul F Harrison) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Adaptive P2P Network Library? In-Reply-To: <20021110012840.2391.qmail@web14107.mail.yahoo.com> Message-ID: On Sat, 9 Nov 2002, Brad Neuberg wrote: > Is anyone aware of any libraries or work that allow > for adaptive P2P bandwidth usage? For example, > imagine a "smart" networking library that does the > following: > > *It allows you to state that your p2p network > "service" (which could be using many different ports > over time, TCP or UDP, etc.) will only use 40% of the > available bandwidth on your network connection. > > *It allows you to designate your p2p network "service" > as less important than other applications > communicating on the network, such as web browsing, > etc.; the smart networking library then informs you > when there is low network traffic being communicated, > allowing you to start performing p2p communications. > As soon as other applications start communicating your > app would then throttle down. 
This would prevent a > p2p app from slowing down your other network > applications. > I'm not sure how to allocate a % of bandwidth to your app with IP, however there is a way to make it lower priority than other network use. IP indicates network congestion by dropping packets. TCP responds to this by sending less packets at a time, or by increasing the time between packets sent. With UDP however, this sort of thing is up to you. It's possible to completely lock up a network by blasting it with UDP packets (from personal experience ;-) ). Any application that makes serious use of UDP has to implement its own back off algorithm. So to make your network usage low priority, you could use UDP and make sure you back off *lots* when packets get dropped. Libraries that use UDP might allow you to set this as a parameter... cheers, Paul Harrison Email: pfh@yoyo.cc.monash.edu.au Web: http://yoyo.cc.monash.edu.au/~pfh/ From greg at electricrain.com Sun Nov 10 23:10:01 2002 From: greg at electricrain.com (Gregory P. Smith) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: References: Message-ID: <20021111070906.GA14258@zot.electricrain.com> > The rationale for libbase32 is different -- it is to represent arbitrary > sequences of octets in a form that is as convenient as possible for human users > to manipulate. In particular, libbase32 was created in order to serve the Mnet > project [3], where 40-octet cryptographic values are encoded into URIs for > humans to manipulate. Anticipated uses of these URIs include cut-and-paste, > text editing (e.g. in HTML files), manual transcription via a keyboard, manual > transcription via pen-and-paper, vocal transcription over phone or radio, etc. Another idea: take the approach that ISBN numbers and other existing common "not quite easy to send through a human accurately" codes use. Include some additional ECC characters in the sequence to catch & correct for mistyped digits. 
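For concreteness, the ISBN-10 check mentioned here can be computed as below. Note it only *detects* a single mistyped digit or an adjacent transposition; actually *correcting* errors would require more redundancy than one check character.

```python
def isbn10_check_digit(digits9: str) -> str:
    # ISBN-10: weight the nine payload digits by position 1..9 and take
    # the sum mod 11; the eleventh value, 10, is written as the extra
    # symbol 'X' (hence the "base11 [0-9X]" digit).
    total = sum((i + 1) * int(d) for i, d in enumerate(digits9))
    check = total % 11
    return "X" if check == 10 else str(check)

# e.g. the payload "030640615" yields check digit "2",
# giving the full ISBN 0-306-40615-2.
```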
(useless trivia: ISBN numbers use 0-9 for the main digits and a base-11 [0-9X] digit for the final ECC checksum) From zooko at zooko.com Mon Nov 11 05:10:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] human-oriented base-32 encoding In-Reply-To: Message from "Gregory P. Smith" of "Sun, 10 Nov 2002 23:09:06 PST." <20021111070906.GA14258@zot.electricrain.com> References: <20021111070906.GA14258@zot.electricrain.com> Message-ID: Greg Smith wrote: > > Another idea: take the approach that ISBN numbers and other existing > common "not quite easy to send through a human accurately" codes use. > Include some additional ECC characters in the sequence to catch & > correct for mistyped digits. > > (useless trivia: ISBN numbers use 0-9 for the main digits and a base-11 > [0-9X] digit for the final ECC checksum) Greg: that's a good idea! *So* good, in fact, that I've already added a "TODO" to my DESIGN doc [1] about it. (See the first item at the end under "NEEDED TO ADD".) Great minds think alike! And so do ours! Regards, Zooko [1] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/libbase32/libbase32/DESIGN?rev=HEAD&content-type=text/vnd.viewcvs-markup From jsr at dit.upm.es Mon Nov 11 05:28:01 2002 From: jsr at dit.upm.es (Joaquin Salvachua) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Adaptive P2P Network Library? Message-ID: Hello, I have developed, with the help of some students, a pseudo-TCP-over-UDP socket library with bandwidth control. It is at: https://sourceforge.net/projects/progtcp/ Regards, Joaquin -- ----------------------------------------------------------- Joaquin Salvachua tel: +34 91 549 57 00 x.367 Associated Professor +34 91 549 57 62 x.367 dpt. Telematica E.T.S.I.
Telecomunicacion Ciudad Universitaria S/N fax: +34 91 336 73 33 E-28040 MADRID SPAIN mailto: jsalvachua@dit.upm.es // http://www.dit.upm.es/~jsr ------------------------------------------------------------ From hishigh at 163.com Tue Nov 12 04:45:02 2002 From: hishigh at 163.com (hishigh) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] questions on p2p QoS Message-ID: <3DCF1A33.00002D.14435@bj222.163.com> I heard the term "p2p QoS" recently, but my colleagues and I argue a lot over this matter. I think p2p QoS focuses on providing DiffServ during the transfer procedure, since there are many kinds of terminals, from 33k/56k modems to ADSL, and we can provide different services when transferring. My colleagues argue that p2p QoS is instead focused on scheduling in the servlet nodes for the different kinds of service requests, for the reason that there is little room for improvement in the transfer link. What do you think about it? Thanks a lot. Yunfei Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20021112/5206a61f/attachment.htm From wege at acm.org Wed Nov 13 13:43:01 2002 From: wege at acm.org (Chris Wege) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Wanted: Talk about P2P in Stuttgart, Germany Message-ID: <3DD2BB8D.2010105@acm.org> Hi Folks, I am looking for someone in the Stuttgart, Germany area who would like to give a talk about P2P on the 28th of November at the Java User Group Stuttgart (www.jugs.org). Preferably in German. Anyone? Best regards, Christian Wege -- wege@acm.org http://www.purl.org/net/wege From zooko at zooko.com Thu Nov 14 12:23:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 Message-ID: Why do Bitzi and THEX use Tiger instead of SHA-1 as the basis for their tree hashes?
Much as I admire Tiger's inventors Ross Anderson and Eli Biham (and I do -- a lot!), all the benchmarks I have seen [1, 2] say that Tiger is about half as fast as SHA-1. (It isn't clear what size Tiger output has been benchmarked.) SHA-1 also has the advantages of being a U.S. federal standard and the de facto standard cryptographic hash among cryptographers. (MD5 remains the de facto standard cryptographic hash among non-cryptographers, presumably because of the command-line implementation named "md5sum".) Thanks in advance for your replies. I may also post queries along these lines to crypto groups, in which case I'll summarize what I learn to the p2p-hackers list. Regards, Zooko [1] http://www.eskimo.com/~weidai/benchmarks.html [2] http://botan.randombit.net/bmarks.php From gojomo at usa.net Thu Nov 14 13:19:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <00be01c28c23$52680d10$640a000a@golden> Bitzi was already taking the SHA1 hash of the full-file, and then wanted a second hash for (1) robustness of our catalog against the (hypothetical future) discovery of problems in SHA1 (2) incremental and subrange verifications. To have used SHA1 again, as the tree basis, would have made both hashes dependent on the same algorithm, and thus potentially fall to the same theoretical breakthroughs. Anderson and Biham emphasize in their paper that Tiger's derivation is different than that of the MD4/MD5/SHA1 family of hash functions -- of which both MD4 and MD5 have already been compromised, to some degree. They also suggest that Tiger will calculate much more efficiently on 64-bit processors, and that it is already competitive with MD5. (To that last claim, I can only presume they were using a reference, completely unoptimized MD5 implementation.) They also note that their reference code has not been fully optimized for 32-bit machines. 
I suspect that the 2x performance gap between SHA1 and Tiger in the comparisons you cite is mostly due to the fact that the highly-used SHA1 code has been highly tuned, and if/when the Tiger code is similarly tuned, it will do much better. If Anderson & Biham's comments about 64-bit processors are correct, then we might expect Tiger to outperform SHA1, when both are equivalently optimized, on future 64-bit processors. So for Bitzi, using SHA1 for the full-file hash was an easy choice -- for immediate accessibility to the widest audience -- while using Tiger for the secondary, more exotic tree hash gained (1) algorithm diversity; (2) an extra 32 bits of hash (192 vs 160); (3) a potential speed *improvement* over SHA1, if/when Tiger is equivalently optimized or 64-bit processors become the norm. In THEX, any algorithm may be specified to construct the tree, but I think the existing examples and work is biased towards Tiger, in order to (1) preserve potential interoperability with the existing Bitzi code and catalog; (2) enjoy the long-term benefits in the someday optimized/64-bit world. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! ----- Original Message ----- From: "Zooko" To: Sent: Thursday, November 14, 2002 12:18 PM Subject: [p2p-hackers] Tiger vs. SHA-1 > > Why do Bitzi and THEX use Tiger instead of SHA-1 as the basis for their tree > hashes? > > Much as I admire Tiger's inventors Ross Anderson and Eli Biham (and I do -- a > lot!), all the benchmarks I have seen [1, 2] say that Tiger is about half as > fast as SHA-1. (It isn't clear what size Tiger output has been benchmarked.) > > SHA-1 also has the advantages of being a U.S. federal standard and the de facto > standard cryptographic hash among cryptographers. 
(MD5 remains the de facto > standard cryptographic hash among non-cryptographers, presumably because of the > command-line implementation named "md5sum".) > > Thanks in advance for your replies. I may also post queries along these lines > to crypto groups, in which case I'll summarize what I learn to the p2p-hackers > list. > > Regards, > > Zooko > > [1] http://www.eskimo.com/~weidai/benchmarks.html > [2] http://botan.randombit.net/bmarks.php > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From justin at chapweske.com Thu Nov 14 13:24:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: <00be01c28c23$52680d10$640a000a@golden> Message-ID: <3DD41457.8080809@chapweske.com> Ditto. Gordon Mohr wrote: > Bitzi was already taking the SHA1 hash of the full-file, and then > wanted a second hash for (1) robustness of our catalog against > the (hypothetical future) discovery of problems in SHA1 (2) > incremental and subrange verifications. > > To have used SHA1 again, as the tree basis, would have made both > hashes dependent on the same algorithm, and thus potentially > fall to the same theoretical breakthroughs. > > Anderson and Biham emphasize in their paper that Tiger's derivation > is different than that of the MD4/MD5/SHA1 family of hash functions -- > of which both MD4 and MD5 have already been compromised, to some > degree. > > They also suggest that Tiger will calculate much more efficiently > on 64-bit processors, and that it is already competitive with MD5. > (To that last claim, I can only presume they were using a reference, > completely unoptimized MD5 implementation.) They also note that > their reference code has not been fully optimized for 32-bit > machines. 
> > I suspect that the 2x performance gap between SHA1 and Tiger > in the comparisons you cite is mostly due to the fact that > the highly-used SHA1 code has been highly tuned, and if/when > the Tiger code is similarly tuned, it will do much better. > > If Anderson & Biham's comments about 64-bit processors are > correct, then we might expect Tiger to outperform SHA1, when > both are equivalently optimized, on future 64-bit processors. > > So for Bitzi, using SHA1 for the full-file hash was an easy > choice -- for immediate accessibility to the widest audience -- > while using Tiger for the secondary, more exotic tree hash gained > (1) algorithm diversity; (2) an extra 32 bits of hash (192 vs > 160); (3) a potential speed *improvement* over SHA1, if/when > Tiger is equivalently optimized or 64-bit processors become the > norm. > > In THEX, any algorithm may be specified to construct the tree, > but I think the existing examples and work is biased towards > Tiger, in order to (1) preserve potential interoperability with > the existing Bitzi code and catalog; (2) enjoy the long-term > benefits in the someday optimized/64-bit world. > > - Gojomo > ____________________ > Gordon Mohr bitzi.com> Bitzi CTO . . . describe and discover files of every kind. > _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! > > > ----- Original Message ----- > From: "Zooko" > To: > Sent: Thursday, November 14, 2002 12:18 PM > Subject: [p2p-hackers] Tiger vs. SHA-1 > > > >>Why do Bitzi and THEX use Tiger instead of SHA-1 as the basis for their tree >>hashes? >> >>Much as I admire Tiger's inventors Ross Anderson and Eli Biham (and I do -- a >>lot!), all the benchmarks I have seen [1, 2] say that Tiger is about half as >>fast as SHA-1. (It isn't clear what size Tiger output has been benchmarked.) >> >>SHA-1 also has the advantages of being a U.S. federal standard and the de facto >>standard cryptographic hash among cryptographers. 
(MD5 remains the de facto >>standard cryptographic hash among non-cryptographers, presumably because of the >>command-line implementation named "md5sum".) >> >>Thanks in advance for your replies. I may also post queries along these lines >>to crypto groups, in which case I'll summarize what I learn to the p2p-hackers >>list. >> >>Regards, >> >>Zooko >> >>[1] http://www.eskimo.com/~weidai/benchmarks.html >>[2] http://botan.randombit.net/bmarks.php >> >>_______________________________________________ >>p2p-hackers mailing list >>p2p-hackers@zgp.org >>http://zgp.org/mailman/listinfo/p2p-hackers > > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From mujtaba at asu.edu Fri Nov 15 05:27:01 2002 From: mujtaba at asu.edu (Mujtaba Khambatti) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Need some comments Message-ID: Hi all.. I have published a paper titled: "Peer-to-Peer Communities: Formation and Discovery" http://www.public.asu.edu/%7emujtaba/Articles%20and%20Papers/pdcs-iasted-02. pdf Peer-to-Peer Communities are like interest groups, modeled after human communities and can overlap. They can also exist without anyone knowing about their existence. Communities are created, implicitly when one or more entities claim an interest in the same topic. Our work focuses on efficient methods to discover the formation of these self-configuring communities. We investigate the behavior of randomly created communities and model the complexity of discovery algorithms. Please send me your comments - they will be greatly appreciated. thanks, Mujtaba =========================================== Mujtaba Khambatti http://www.public.asu.edu/~mujtaba Work Address: ASU, Tempe, AZ 85287-5406. 
Work Number : (480) 965-2737 Home Number : (480) 967-6568 =========================================== From bram at gawth.com Fri Nov 15 08:34:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <00be01c28c23$52680d10$640a000a@golden> Message-ID: Gordon Mohr wrote: > Bitzi was already taking the SHA1 hash of the full-file, and then > wanted a second hash for (1) robustness of our catalog against > the (hypothetical future) discovery of problems in SHA1 Well geeze, why not do both sha1 and tiger and xor them if you really care that much? Trying to design crypto protocols with the assumption that you don't trust your primitives quickly gets completely ridiculous. > I suspect that the 2x performance gap between SHA1 and Tiger > in the comparisons you cite is mostly due to the fact that > the highly-used SHA1 code has been highly tuned, and if/when > the Tiger code is similarly tuned, it will do much better. I find that dubious. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at usa.net Fri Nov 15 09:33:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <00ed01c28ccc$e8ae0190$640a000a@golden> Bram Cohen writes: > Gordon Mohr wrote: > > > Bitzi was already taking the SHA1 hash of the full-file, and then > > wanted a second hash for (1) robustness of our catalog against > > the (hypothetical future) discovery of problems in SHA1 > > Well geeze, why not do both sha1 and tiger and xor them if you really care > that much? We really want to track them independently, as a guard against the worrisome threat: that one or the other is discovered to be weak or otherwise manipulable. Further, XORing them would make it impossible to use the TigerTree for subrange verification without also calculating the SHA1. 
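Bram's "xor them" suggestion is mechanical enough to sketch. This is an illustrative sketch only, not anything Bitzi ran: Tiger is not in Python's standard hashlib, so SHA-256 truncated to SHA-1's length stands in here for the second algorithm.

```python
import hashlib

def xor_digests(data):
    # Sketch of the "xor them" idea: hash the data with two independent
    # algorithms and XOR the results into one combined value.  Tiger is
    # not available in hashlib, so SHA-256 truncated to SHA-1's 20 bytes
    # stands in as the second algorithm for illustration.
    a = hashlib.sha1(data).digest()           # 20 bytes
    b = hashlib.sha256(data).digest()[:20]    # stand-in second hash
    return bytes(x ^ y for x, y in zip(a, b))
```

Gordon's two objections are visible in the shape of the function: the component digests are no longer tracked independently, and the combined value cannot be checked without computing both hashes.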
> Trying to design crypto protocols with the assumption that you don't trust > your primitives quickly gets completely ridiculous. Historically, hash algorithms get broken. The two most immediate antecedents of SHA1 -- MD4 and MD5 -- have been shown to be weaker than they were designed to be. (MD4, so weak that anyone with a desktop machine can create desired preimages in a short amount of time; MD5, weak enough to be unsuitable for some applications and generally frowned upon for new work.) Ross Anderson and Eli Biham thought this suggested consideration of new hash functions was advisable. From their Tiger paper (apparently written in 1995 or early 1996): These attacks cast doubt on the security of the other members of these families. One may only speculate at how long each function will remain unbroken; however it seems prudent to start work now on replacements. (Their paper does not even refer to the most troublesome attacks on MD5.) So, I defer to Professors Anderson and Biham on the issue of whether it is "completely ridiculous" to consider the possibility your cryptographic hash algorithm will someday be untrustworthy. Would you design a "crypto protocol" with no facility for changing hash functions, if ever necessary? That seems a minority viewpoint in security design. Further, your statement about "trying to design crypto protocols" is nonsensical; Bitzi's application is not really a "crypto protocol". It is a long-lived, shared reference catalog. Ideally, the Bitzi datadumps will be useful for decades if not centuries. Its "primary keys" should thus be as robust as possible against theoretical breakthroughs far beyond your imagination. We are not at the end of science and mathematics, with your domain expertise the pinnacle of learning.
With two primary keys, each an independent strong hash, as long as any breakthrough only compromises one hash at a time, there will be a window of opportunity (before the second hash is broken) with which to cross-reference old data to new stronger hashes, with secure timestamps. All catalogued data may not make the transition, but the situation is better than relying on a single hash. > > I suspect that the 2x performance gap between SHA1 and Tiger > > in the comparisons you cite is mostly due to the fact that > > the highly-used SHA1 code has been highly tuned, and if/when > > the Tiger code is similarly tuned, it will do much better. > > I find that dubious. Based on what? In working with different freely-available SHA1 implementations in C/C++, we saw differences, on the order of 2x, in their hashing speed. Anderson and Biham, no slouches, compared their (lightly optimized) Tiger reference implementation against MD5, and found Tiger faster. (!) And yet the sites Zooko referenced suggested MD5 was 4x faster than Tiger. What was the difference between MD5 implementations? The level of code optimization (by either hand tuning or compiler logic). Bram, what is more alarming than the fact that your knowledge is limited, is that you don't even realize how limited it is. The universe does not end at the walls which encircle your current level of understanding. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From bram at gawth.com Fri Nov 15 15:53:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <00ed01c28ccc$e8ae0190$640a000a@golden> Message-ID: Gordon Mohr wrote: > Would you design a "crypto protocol" with no facility for > changing hash functions, if ever necessary? That seems a > minority viewpoint in security design. 
Primitives can be changed with an increment in the major version number, the micromanaging of algorithms in pgp and ssl has proven to do nothing but make implementation more difficult and cause compatibility headaches. > Further, your statement about "trying to design crypto protocols" > is nonsensical; Bitzi's application is not really a "crypto > protocol". It uses crypto, ergo it is a crypto protocol. > > > I suspect that the 2x performance gap between SHA1 and Tiger > > > in the comparisons you cite is mostly due to the fact that > > > the highly-used SHA1 code has been highly tuned, and if/when > > > the Tiger code is similarly tuned, it will do much better. > > > > I find that dubious. > > Based on what? There is only one measure of time, and that's minutes and seconds. You're making performance claims about something you've already deployed based on sheer guesswork. > Bram, what is more alarming than the fact that your knowledge > is limited, is that you don't even realize how limited it is. > The universe does not end at the walls which encircle your current > level of understanding. I have no other ways of saying this. Fuck off. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at usa.net Fri Nov 15 17:26:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <022201c28d0f$109f5ae0$640a000a@golden> Bram Cohen writes: > Gordon Mohr wrote: > > Would you design a "crypto protocol" with no facility for > > changing hash functions, if ever necessary? That seems a > > minority viewpoint in security design. > > Primitives can be changed with an increment in the major version number, > the micromanaging of algorithms in pgp and ssl has proven to do nothing > but make implementation more difficult and cause compatibility headaches. An interesting, but decidedly minority, opinion. 
People who press the envelope with protocols appreciate that flexibility to adopt custom or stronger swap-ins. People who professionally design protocols, for use by multiple projects and organizations, now regularly include parameterizable algorithms. Do you expect your idiosyncratic opinion, on the superiority of version numbers as a means of constraining algorithm choices, is going to be reflected in most (or indeed any) of the protocols that our fellow p2p-hackers will be implementing? Which ones? Going back to Zooko's question -- why did Bitzi utilize Tiger -- and our strategy of having a backup algorithm for SHA1, of what use is your preference for version numbers? If we simply trusted SHA1, and in 2010, SHA1 becomes trivially manipulable due to a theoretical breakthrough, major version numbers would just tell us what data we have to throw out. It wouldn't save valuable assertions, as having a fallback algorithm does. > > Further, your statement about "trying to design crypto protocols" > > is nonsensical; Bitzi's application is not really a "crypto > > protocol". > > It uses crypto, ergo it is a crypto protocol. Except that Bitzi's not a "protocol", it's a database. > > > > I suspect that the 2x performance gap between SHA1 and Tiger > > > > in the comparisons you cite is mostly due to the fact that > > > > the highly-used SHA1 code has been highly tuned, and if/when > > > > the Tiger code is similarly tuned, it will do much better. > > > > > > I find that dubious. > > > > Based on what? > > There is only one measure of time, and that's minutes and seconds. You're > making performance claims about something you've already deployed based on > sheer guesswork. No, I've seen 2x to 4x differences in hashing code, same algorithm, based on how much effort has been devoted to optimizing that particular code. There's no guesswork involved there. Haven't you seen such differences in your experience?
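Measured numbers settle this kind of argument faster than dueling conjectures. A minimal throughput harness, as a sketch: it assumes Python's hashlib (which exposes the underlying library's tuned implementations), and Tiger is typically not among the algorithms a stock build offers.

```python
import hashlib
import time

def throughput_mb_s(algorithm, payload=b"x" * (1 << 20), rounds=20):
    # Hash `rounds` copies of a 1 MiB payload and report MB/s, so two
    # implementations (or algorithms) can be compared on equal terms
    # instead of via speculation about how tuned each one is.
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return (rounds * len(payload)) / elapsed / 1e6

# e.g. compare: throughput_mb_s("sha1") vs. throughput_mb_s("md5")
```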
The authors of the Tiger code say that little effort has been devoted to optimizing their code, especially for 32-bit processors. Do you not believe them? Meanwhile, highly-used -- and thus highly-optimized -- SHA1 code is common. People promulgating libraries care very much about how fast their SHA1 implementations are, as they are highly likely to be used and benchmarked, while they typically only care that their Tiger code gives correct results, at least until its use becomes more prevalent. In almost any benchmark now available, the SHA1 code is likely to be near optimal, while the Tiger code is far from it. Haven't you noticed this relationship between code age/prevalence and level of optimization before? Have you been coding under a rock? > > Bram, what is more alarming than the fact that your knowledge > > is limited, is that you don't even realize how limited it is. > > The universe does not end at the walls which encircle your current > > level of understanding. > > I have no other ways of saying this. Fuck off. Interesting. You can mock another person's judgement as "ridiculous", without any support beyond the brashness with which you speak, but when it is suggested that your own vision and experience is limited, you have no response but profanity. That's poor social protocol design. - Gojomo From bram at gawth.com Fri Nov 15 20:02:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <022201c28d0f$109f5ae0$640a000a@golden> Message-ID: Gordon Mohr wrote: > Bram Cohen writes: > > > > Primitives can be changed with an increment in the major version number, > > the micromanaging of algorithms in pgp and ssl has proven to do nothing > > but make implementation more difficult and cause compatibility headaches. > > An interesting, but decidedly minority, opinion. People who > press the envelope with protocols appreciate that flexibility > to adopt custom or stronger swap-ins.
Yeah, well, there's a decided lack of competence in the field. Can anyone here come up with a *single* instance in which a parameterizable algorithm saved someone's ass? Even MD5 is to this day unbroken, and 3DES has remained completely solid. There are multitudes of cases of errors in protocol design, which of course are made more likely by parameterizability, and yet even more in implementation, which parameterizability exacerbates even more, but not a single break in algorithms, which is the only thing parameterizability helps with. > Do you expect your idiosyncratic opinion, on the superiority of version > numbers as a means of constraining algorithm choices, is going to be > reflected in most (or indeed any) of the protocols that our fellow > p2p-hackers will be implementing? Which ones? No, I expect incompetence to continue to reign supreme. But my 'idiosyncratic' opinion has been followed in my own protocol, BitTorrent, which, unlike all but a handful of other p2p protocols, is widely deployed. That correlation is not coincidental. > > There is only one measure of time, and that's minutes and seconds. You're > > making performance claims about something you've already deployed based on > > sheer guesswork. > > No, I've seen 2x to 4x differences in hashing code, same > algorithm, based on how much effort has been devoted to > optimizing that particular code. There's no guesswork > involved there. Haven't you seen such differences in your > experience? Some implementations vary by that large a factor, but you're using that as a reason why the absolute best one might be that much better than the current best one, which by your own admission is rank speculation. If you didn't consider performance when selecting tiger, or thought other criteria are more important, then say so, but don't engage in wild speculation as if it's fact. > > > Bram, what is more alarming than the fact that your knowledge > > > is limited, is that you don't even realize how limited it is.
> > > The universe does not end at the walls which encircle your current > > > level of understanding. > > > > I have no other ways of saying this. Fuck off. > > Interesting. You can mock another person's judgement as > "ridiculous", without any support beyond the brashness > with which you speak, but when it is suggested that your own > vision and experience is limited, you have no response > but profanity. > > That's poor social protocol design. And you can take your pretension and shove it, too. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From kevin at atkinson.dhs.org Sat Nov 16 06:44:01 2002 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? Message-ID: I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power of 2. Is there any harm in dropping the last 4 bytes to make it 16 other than increasing the chance of collision which will still be too small to worry about? Thanks in advance. -- http://kevin.atkinson.dhs.org From zooko at zooko.com Sat Nov 16 07:14:02 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? In-Reply-To: Message from Kevin Atkinson of "Sat, 16 Nov 2002 09:43:17 EST." References: Message-ID: > I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power > of 2. > > Is there any harm in dropping the last 4 bytes to make it 16 other than > increasing the chance of collision which will still be too small to worry > about? The "Birthday Paradox" says that if you want to generate a collision by randomly tossing balls into X buckets, you will have to toss approximately sqrt(X) balls before you have the first collision. So if a hash function has a 128-bit output, it takes only about 2^64 work to generate a collision.
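The sqrt(X) rule is easy to check on a toy scale by truncating a hash until collisions become findable. A sketch of the balls-into-buckets experiment, with SHA-1 truncated to 16 bits so the first collision arrives quickly (the helper name is invented for illustration):

```python
import hashlib

def tosses_until_collision(n_bits=16, max_tries=1 << 17):
    # Toss "balls" (counter values) into 2**n_bits buckets, where each
    # bucket is a truncated SHA-1 digest.  The birthday paradox predicts
    # the first collision after roughly sqrt(2**n_bits) tosses.
    seen = set()
    for i in range(max_tries):
        bucket = hashlib.sha1(str(i).encode()).digest()[: n_bits // 8]
        if bucket in seen:
            return i + 1  # number of tosses at the first collision
        seen.add(bucket)
    return None

# With n_bits=16 there are 65536 buckets, so a collision typically
# appears after a few hundred tosses, not tens of thousands.
```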
Indeed, I'm laying plans to take the advice of people like Ross Anderson (in "Security Engineering") who say that 160 bits is too small for the long run, and you should start moving to even larger sizes. It seems ridiculous at first glance that an attacker might do 2^80 work, but when you try to weigh in uncertain factors like the following, 160 bits doesn't seem so untouchable. * faster computers (including special purpose hardware), * the proliferation of hash users (many uses of hashes "share" the hash space so that all computers, all networks, all protocols on the planet are vulnerable to collisions with one another), and most uncertainly of all * theoretical advances that weaken the hash function Regards, Zooko From agl at imperialviolet.org Sat Nov 16 08:59:01 2002 From: agl at imperialviolet.org (Adam Langley) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? In-Reply-To: References: Message-ID: <20021116160036.GA3857@imperialviolet.org> On Sat, Nov 16, 2002 at 10:09:35AM -0500, Zooko wrote: > > > I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power > > of 2. > > > > Is there any harm in dropping the last 4 bytes to make it 16 other than > > increasing the chance of collision which will still be too small to worry > > about? > > Indeed, I'm laying plans to take the advice of people like Ross Anderson (in > "Security Engineering") who say that 160 bits is too small for the > long run, and you should start moving to even larger sizes. Of course, SHA isn't just 160 bits long. There are 256, 384 and 512 bit versions: http://csrc.nist.gov/cryptval/shs.html (though SHA-384 is a truncated version of SHA-512) -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org (+44) (0)7986 296753 PGP: 9113 256A CC0F 71A6 4C84 5087 CDA5 52DF 2CB6 3D60 From ingo at fargonauten.de Sat Nov 16 10:47:02 2002 From: ingo at fargonauten.de (ingo@fargonauten.de) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs.
SHA-1 In-Reply-To: <022201c28d0f$109f5ae0$640a000a@golden> References: <022201c28d0f$109f5ae0$640a000a@golden> Message-ID: <20021116175656.GA6506@fargonauten.de> On Fri, Nov 15, 2002 at 05:25:34PM -0800, Gordon Mohr wrote: > An interesting, but decidedly minority, opinion. People who > press the envelope with protocols appreciate that flexibility > to adopt custom or stronger swap-ins. Errm, sorry, but IMHO it's not as clear-cut as you make it appear. I might have gotten him wrong, of course, so any errors in this claim are mine, but I'm pretty sure that at HAL2001, Phil Zimmermann claimed that allowing the choice between SHA1 and RIPEMD160 in OpenPGP was unfortunate and actually weakens security. Subsequently, there have been several discussions on the OpenPGP IETF mailing-list where respected developers disputed the value of having so many algorithm choices in the OpenPGP protocol. Please refer to that list's archive for detailed information. So, there seems to be some disagreement on the value of choice in crypto protocols amongst practitioners. bye -- http://fargonauten.de/ingo PGP: 3187 4DEC 47E6 1B1E 6F4F 57D4 CD90 C164 34AD CE5B From bert at akamail.com Sat Nov 16 12:37:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: <022201c28d0f$109f5ae0$640a000a@golden> <20021116175656.GA6506@fargonauten.de> Message-ID: <3DD6ADDA.6020904@akamail.com> ingo@fargonauten.de wrote: >On Fri, Nov 15, 2002 at 05:25:34PM -0800, Gordon Mohr wrote: > > >>An interesting, but decidedly minority, opinion. People who >>press the envelope with protocols appreciate that flexibility >>to adopt custom or stronger swap-ins. >> >> > >Errm, sorry, but IMHO it's not as clear-cut as you make it appear.
> >I might have gotten him wrong, of course, so any errors in this claim >are mine, but I'm pretty sure that at HAL2001, Phil Zimmermann claimed >that allowing the choice between SHA1 and RIPEMD160 in OpenPGP was >unfortunate and actually weakens security. > There's a difference between allowing choice, and allowing some perhaps unfortunate choices. I'd be surprised if any security person with a clue, and any knowledge of the field's history, would suggest that support of multiple encryption and hashing schemes in SSL and related protocols/apps is a bad practice. Then again I'm sure there are a few out there, but as Gordon states -- it's a minority viewpoint. > Subsequently, there have >been several discussions on the OpenPGP IETF mailing-list where >respected developers disputed the value of having so many algorithm >choices in the OpenPGP protocol. Please refer to that lists archive >for detailed information. > Well, developers are always looking for excuses to be lazy, and maybe they did get a bit carried away in OpenPGP...but again, complaining about "too many algorithms" is not the same as denying the utility of choice. I would nevertheless be interested in reading the discussions, but had no luck digging them out of the archives. Can you be more specific? From ingo at fargonauten.de Sat Nov 16 13:39:01 2002 From: ingo at fargonauten.de (ingo@fargonauten.de) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <3DD6ADDA.6020904@akamail.com> References: <022201c28d0f$109f5ae0$640a000a@golden> <20021116175656.GA6506@fargonauten.de> <3DD6ADDA.6020904@akamail.com> Message-ID: <20021116213703.GA7346@fargonauten.de> On Sat, Nov 16, 2002 at 12:43:06PM -0800, Bert wrote: > There's a difference between allowing choice, and allowing some perhaps > unfortunate choices. 
I'd be surprised if any security person with a > clue, and any knowledge of the field's history, would suggest that > support of multiple encryption and hashing schemes in SSL and related > protocols/apps is a bad practice. Then again I'm sure there are a few > out there, but as Gordon states -- it's a minority viewpoint. Quoting from memory, he differentiated a bit on encryption and message digests. For encryption algorithms the reasoning is easy and was mostly motivated by keeping implementations simple, not really security-wise. There are differences between long-lived (e.g., OpenPGP encrypted documents and probably the kind of application we're talking about here) and short-lived applications. In the long-lived situation once you introduced an algorithm, you have to support it forever, to be able to decrypt your old documents. In SSL, if you don't support it anymore, don't offer it on negotiation. So, what might be an advantage with SSL, could be a serious problem in practical implementation with other applications. However, that's just for completeness' sake; I don't think it applies to the current situation. For message digests, the situation is different. Offering two digest algorithms might not improve security at all, depending on the rest of the protocol. In OpenPGP it doesn't, because the information which algorithm to use is protected by that same algorithm. So, if an attacker breaks the digest and wants to change the data, he might just as well change the "digest-type" info as well. So, if SHA1 is broken, it doesn't matter if the data was hashed with TIGER -- don't just change the data, change the metadata as well and you can still make a forgery, even though TIGER wasn't broken! His recommendation to introduce new hashing algorithms was to upgrade the version number of the protocol packets. So, OpenPGPv5 could use SHA-512 by default or something like that.
Presumably, a recipient could infer that for data generated after some date or by some sender, a version4 packet would be illegitimate. Maybe there was also some other assurance involved, I don't remember and can't think of any at the moment. However, it seems remarkably similar to Bram's suggestion. > I would nevertheless be interested in reading the discussions, but > had no luck digging them out of the archives. Can you be more > specific? Hmm, got me. The only interesting thing I can dig up again is the ElGamal type 20 issue and some discussion regarding the introduction of an MDC packet (which outlines some of the issues on when choice might be appropriate, but not the real punchline). Sorry :-( Anyway, the above is what I recall about the discussion. It still makes sense to me, so I hope it's correct ;-) bye -- http://fargonauten.de/ingo PGP: 3187 4DEC 47E6 1B1E 6F4F 57D4 CD90 C164 34AD CE5B From bert at akamail.com Sun Nov 17 09:48:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: <022201c28d0f$109f5ae0$640a000a@golden> <20021116175656.GA6506@fargonauten.de> <3DD6ADDA.6020904@akamail.com> <20021116213703.GA7346@fargonauten.de> Message-ID: <3DD7D7DF.4090706@akamail.com> ingo@fargonauten.de wrote: >For message digests, the situation is different. Offering two digest >algorithms might not improve security at all, depending on the rest of >the protocol. In OpenPGP it doesn't, because the information which >algorithm to use is protected by that same algorithm. So, if an >attacker breaks the digest and wants to change the data, he might just >as well change the "digest-type" info as well. So, if SHA1 is broken, >it doesn't matter if the data was hashed with TIGER -- don't just >change the data, change the metadata as well and you can still make a >forgery, even though TIGER wasn't broken! > Thanks for the clarification.
Yes, if the goal is a long-lived signature, requiring a signature that is valid according to one of many available schemes only improves the odds of a protocol break. But this is exactly why Bitzi requires valid digests from both hashes, not just one or the other. >His recommendation to introduce new hashing algorithms was to upgrade >the version number of the protocol packets. So, OpenPGPv5 could use >SHA-512 by default or something like that. > Hope I'm not going beyond your recollection here, but is the recommendation just to make one method a "default", or is it to make one method exclusive? These are quite different things...If exclusive, then a break in that protocol requires an upgrade that completely pisses away backwards compatibility. We sometimes forget that upgrading is rarely so simple (think Y2K). At least in the current design, if one hash is broken, newer implementations can simply exclude it and any new signatures remain interpretable by older versions. Granted, applications that verify digests with the older version shouldn't be considered "secure" any longer (unless authenticity is further verified), but at least they continue to work until upgrade or replacement is possible. I think version numbers and expiration are fine for many applications, probably including Bram's though I'm not so familiar with it. But protocols that are intended to be embedded and ubiquitous should offer considerably more graceful means for dealing with their (inevitable) need to evolve. From gojomo at usa.net Sun Nov 17 10:53:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? References: Message-ID: <004601c28e6a$8704c6f0$640a000a@golden> Kevin Atkinson writes: > I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power > of 2. > > Is there any harm in dropping the last 4 bytes to make it 16 other than > increasing the chance of collision which will still be too small to worry > about?
Each bit of a cryptographically strong hash is as secure as any other, so any truncation just creates a smaller strong hash. Attempts to, for example, find collisions on the truncated version will take less time, but still be no more efficient than a brute-force search. The shorter hash is "weaker" in one sense but not "weak" definitionally. So you could do what you suggest, though you would then be deviating from a well-known standard, and in so doing throw out extra security which, after you've calculated the whole 20-byte value, is essentially "free". Why are the 4 bytes so important in your application? - Gojomo From kevin at atkinson.dhs.org Sun Nov 17 12:08:01 2002 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Why is SHA1 20 bytes? In-Reply-To: <004601c28e6a$8704c6f0$640a000a@golden> Message-ID: On Sun, 17 Nov 2002, Gordon Mohr wrote: > Kevin Atkinson writes: > > I was wondering why SHA1 is 20 bytes instead of 16 which is a nice power > > of 2. > > > > Is there any harm in dropping the last 4 bytes to make it 16 other than > > increasing the chance of collision which will still be too small to worry > > about? > > Each bit of a cryptographically strong hash is as secure as any other, > so any truncation just creates a smaller strong hash. Attempts to, for > example, find collisions on the truncated version will take less time, > but still be no more efficient than a brute-force search. The shorter > hash is "weaker" in one sense but not "weak" definitionally. > > So you could do what you suggest, though you would then be deviating > from a well-known standard, and in so doing throw out extra security which, > after you've calculated the whole 20-byte value, is essentially "free". > > Why are the 4 bytes so important in your application? They're not. I will probably keep all 20 bytes. I just wanted to know if there was anything special about 20 bytes which apparently is not the case.
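Gordon's answer amounts to one slicing operation. A sketch of the 16-byte variant Kevin asked about, using Python's hashlib (the function name is invented for illustration):

```python
import hashlib

def sha1_truncated(data, length=16):
    # Compute the standard 20-byte SHA-1, then keep only the first
    # `length` bytes.  Every bit of a strong hash is equally strong,
    # so this is simply a shorter strong hash -- but brute-force
    # collision search drops from ~2^80 to ~2^64 for a 16-byte output.
    return hashlib.sha1(data).digest()[:length]
```

The cost Gordon mentions is visible too: the full 20-byte value has to be computed before slicing, so the discarded 4 bytes of security were effectively free.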
-- http://kevin.atkinson.dhs.org From gojomo at usa.net Sun Nov 17 12:30:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <005e01c28e78$06c4b690$640a000a@golden> Bram Cohen writes: > Gordon Mohr wrote: > > Bram Cohen writes: > > > > > > Primitives can be changed with an increment in the major version number, > > > the micromanaging of algorithms in pgp and ssl has proven to do nothing > > > but make implementation more difficult and cause compatibility headaches. > > > > An interesting, but decidedly minority, opinion. People who > > press the envelope with protocols appreciate that flexibility > > to adopt custom or stronger swap-ins. > > Yeah, well, there's a decided lack of competence in the field. OK, Bram versus the world, got it. And your claim to unique competence is based on what reasoning or experience? Are there multiple interoperable implementations of protocols you've designed? Have they stood up to scrutiny and attack by third parties? Achieved many years of reliable use? > Can anyone > here come up with a *single* instance in which a parameterizable algorithm > saved someone's ass? The practice of open, interoperable security conventions is still rather young, maybe 10-20 years. The "trusted algorithm goes completely bad" threat is one that is only expected to arise infrequently. That said, you can find examples of systems making profitable use of swappable algorithms. SSL users in uniquely security-conscious environments enable only the algorithms which meet their local standards. DNSSEC work has introduced new algorithms over time, and as of RFC3110, now recommends against the use of the original MD5 algorithm. 
I don't think even you would find fault with the parameterizable key-strengths in programs like PGP; that's certainly helped since the key-lengths that were once suggested as OK for casual security (384 bits) have not just been theoretically questioned, but actually brute-force discovered by small teams. With hash algorithms, you don't have a slidable "strength" parameter to pass in. (SHA256 is not SHA1 with a different input option.) So if you want to be able to improve/patch security over time, you've got to be open to using new algorithms. > Even MD5 is to this day unbroken, Depends on your definition of "broken". It's not nearly as strong as it was designed or predicted to be, and is no longer advisable for many of the applications it was once recommended for. You should be familiar with these results: - Even mildly resourceful organizations with suitable motivation should be able to create MD5 collisions over a matter of days, not years. From: http://www.rsasecurity.com/rsalabs/faq/3-6-6.html "Van Oorschot and Wiener [VW94] have considered a brute-force search for collisions (see Question 2.1.6) in hash functions, and they estimate a collision search machine designed specifically for MD5 (costing $10 million in 1994) could find a collision for MD5 in 24 days on average." We have 8 more years of advances in computing power, and price drops, and tools for network parallelism, available to us today. - RSALabs was recommending as early as 1996 that "applications which rely on the collision-resistance of a hash function should be upgraded away from MD2 and MD5 when practical and convenient." (See: http://citeseer.nj.nec.com/robshaw96recent.html) > and 3DES has > remained completely solid. And yet, why was 3DES needed as a drop-in replacement for *DES*? Your own examples make the case for a healthy suspicion about algorithms. > No, I expect incompetence to continue to reign supreme.
But my > 'idiosyncratic' opinion has been followed in my own protocol, BitTorrent, > which, unlike all but a handful of other p2p protocols, is widely > deployed. That correlation is not coincidental. Congratulations! What threshold did you pass that put you into the elite "widely deployed" category? You should issue a press release. The "incompetents" who keep putting parameterizable algorithms in their internet-infrastructure formats and protocols also have a lot of real-world deployment and experience. That's not a standard for evaluating competing ideas that you have any chance of winning, so why bring it up? > > No, I've seen 2x to 4x differences in hashing code, same > > algorithm, based on how much effort has been devoted to > > optimizing that particular code. There's no guesswork > > involved there. Haven't you seen such differences in your > > experience? > > Some implementations vary by that large a factor, but you're using that > as a reason why the absolute best one might be that much better than the > current best one, which by your own admission is rank speculation. If you > didn't consider performance when selecting tiger or thought other criteria > were more important, then say so, but don't engage in wild speculation as if > it's fact. I reported the conjectures of Anderson and Biham, that their Tiger code has at least some further room for optimization (likely to reduce the 2x difference with SHA1 seen in some libraries), and that Tiger is likely to outperform functions designed for 32-bit processors when run on 64-bit processors. That's not "wild speculation", that's reasoned expert speculation, as was explained from the beginning. - Gojomo From bram at gawth.com Mon Nov 18 03:53:02 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <005e01c28e78$06c4b690$640a000a@golden> Message-ID: Gordon Mohr wrote: > > and 3DES has > > remained completely solid.
> > And yet, why was 3DES needed as a drop-in replacement for *DES*? You should learn about key lengths. Gordon, you've now succeeded in making me really, really not like you. Your profound lack of technical cluefulness is hardly unique, but your bullheaded lack of awareness of it and personal insults to people who point out when you're wrong are offensive. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Mon Nov 18 04:15:02 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <3DD7D7DF.4090706@akamail.com> Message-ID: Bert wrote: > I think version numbers and expiration are fine for many applications, > probably including Bram's though I'm not so familiar with it. But > protocols that are intended to be embedded and ubiquitous should offer > considerably more graceful means for dealing with their (inevitable) > need to evolve. There are a few things done in BitTorrent to try to make extensibility clean - The metainfo files are in a format which is an encoding of a dictionary, and unrecognized keys are ignored, so new ones can be added later. The peer protocol contains some reserved bytes which are, well, reserved, and can be used in case of new functionality which needs to be supported on both ends. Finally, there's a protocol identifier at the beginning of the peer protocol, and a specific mimetype used to launch it, and either or both of those may be changed in the case of a protocol change which isn't backwardly compatible. Those are about the best you can do. Try to support new functionality without it interfering with old functionality, and leave in a hook for simply declaring a new change non-backwards-compatible. Even so, I put off declaring the protocol final until after a very excruciating extended release process, involving many backwards-incompatible changes before the end. 
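[Archive note: the metainfo extensibility Bram describes — a dictionary encoding in which unrecognized keys are ignored — can be sketched in Python, BitTorrent's own implementation language. The decoder below is a minimal illustration of bencoding written for this note, not BitTorrent's actual code; the point is that a dictionary can carry keys an old client has never heard of, and the client simply leaves them alone.]

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at offset i; return (value, next_i).

    Minimal sketch of the metainfo encoding: integers i42e, strings 4:spam,
    lists l...e, dictionaries d...e.
    """
    c = data[i:i + 1]
    if c == b"i":                          # integer
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                          # list
        i, items = i + 1, []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary
        i, d = i + 1, {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            d[key], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b":", i)            # length-prefixed string
    n = int(data[i:colon])
    return data[colon + 1:colon + 1 + n], colon + 1 + n

# A metainfo-style dict carrying a key this client doesn't recognize:
meta, _ = bdecode(b"d8:announce19:http://tracker/blah7:new-keyi42ee")

# An old client reads only the keys it knows and ignores the rest,
# so 'new-key' can be introduced later without breaking anything.
announce = meta[b"announce"]
```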
Regardless of how many and how good your extensibility hooks are, protocol changes are always exceedingly painful. And please, if you make a change which isn't backwards-compatible, admit it to yourself and make a clean break. Carrying along cruft for the sake of pride is inevitably a disaster. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From steve_bryan at mac.com Mon Nov 18 09:21:01 2002 From: steve_bryan at mac.com (Steve Bryan) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: Message-ID: <03816FD5-FB1A-11D6-82BC-003065B4EAAE@mac.com> Pot, kettle, black. On Monday, November 18, 2002, at 05:52 am, Bram Cohen wrote: > Gordon Mohr wrote: > >>> and 3DES has >>> remained completely solid. >> >> And yet, why was 3DES needed as a drop-in replacement for *DES*? > > You should learn about key lengths. > > Gordon, you've now succeeded in making me really, really not like you. > Your profound lack of technical cluefulness is hardly unique, but your > bullheaded lack of awareness of it and personal insults to people who > point out when you're wrong are offensive. > > -Bram Cohen > > "Markets can remain irrational longer than you can remain solvent" > -- John Maynard Keynes > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From gojomo at usa.net Mon Nov 18 13:53:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 References: Message-ID: <00ed01c28f4c$d708f490$640a000a@golden> You're insulting a straw man. "Drop-in" does not have to mean "same key length". I can see how it might be interpreted that way, so perhaps it was a poor choice of words on my part. 
So if it helps better make my point -- which was that no algorithm deserves blind trust -- simply drop out the words "drop-in": "And yet, why was 3DES needed as a replacement for *DES*?" Even if you dislike me, I like you. That's why I bother calling "bs!" on your cocksure -- but unsupportable -- pontificating. Look out for the people who trust you simply because you're brash. - Gojomo ----- Original Message ----- From: "Bram Cohen" To: Sent: Monday, November 18, 2002 3:52 AM Subject: Re: [p2p-hackers] Tiger vs. SHA-1 > Gordon Mohr wrote: > > > > and 3DES has > > > remained completely solid. > > > > And yet, why was 3DES needed as a drop-in replacement for *DES*? > > You should learn about key lengths. > > Gordon, you've now succeeded in making me really, really not like you. > Your profound lack of technical cluefulness is hardly unique, but your > bullheaded lack of awareness of it and personal insults to people who > point out when you're wrong are offensive. > > -Bram Cohen > > "Markets can remain irrational longer than you can remain solvent" > -- John Maynard Keynes > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From oskar at freenetproject.org Mon Nov 18 18:21:01 2002 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: <03816FD5-FB1A-11D6-82BC-003065B4EAAE@mac.com> References: <03816FD5-FB1A-11D6-82BC-003065B4EAAE@mac.com> Message-ID: <20021119021957.GC351@sporty.spiceworld> Gordon can come play with me instead. I'm not allowed to play with Bram either. On Mon, Nov 18, 2002 at 11:20:20AM -0600, Steve Bryan wrote: > Pot, kettle, black. > > On Monday, November 18, 2002, at 05:52 am, Bram Cohen wrote: > > >Gordon Mohr wrote: > > > >>>and 3DES has > >>>remained completely solid. > >> > >>And yet, why was 3DES needed as a drop-in replacement for *DES*? 
> > > >You should learn about key lengths. > > > >Gordon, you've now succeeded in making me really, really not like you. > >Your profound lack of technical cluefulness is hardly unique, but your > >bullheaded lack of awareness of it and personal insults to people who > >point out when you're wrong are offensive. > > > >-Bram Cohen > > > >"Markets can remain irrational longer than you can remain solvent" > > -- John Maynard Keynes > > > >_______________________________________________ > >p2p-hackers mailing list > >p2p-hackers@zgp.org > >http://zgp.org/mailman/listinfo/p2p-hackers > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Oskar Sandberg oskar@freenetproject.org From sean at lynch.tv Mon Nov 18 19:43:01 2002 From: sean at lynch.tv (Sean R. Lynch) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] protocol design, hashes, parameterizable protocols, etc. Message-ID: <1037676615.4446.47.camel@makoto.chaosring.org> It seems like the overriding reason protocols are the way they are is because that's the way the designer decided to make them. It seems to me that ultimately the goal is not to make protocols "perfect," but to make them work. I think this is why protocols usually aren't (successfully) designed by committees. The coder or coordinator gets to make the decisions, and whoever doesn't like it can put up with it or leave, fork, whatever. I think (hope) that this group has bigger fish to fry than which hash to use or whether to use feature negotiation or version numbers to decide which hash algorithm or whatever to use, who's more technically competent, etc. Please correct me if I'm wrong and I'll go find another list to lurk on. -- Sean R. Lynch -------------- next part -------------- A non-text attachment was scrubbed... 
From zooko at zooko.com Tue Nov 19 05:32:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) Message-ID: Folks: I've been reading the discussion with interest, as I'm working on defining a new distributed filestore format for Mnet, and have to make decisions about crypto algorithms, future-compatibility, etc. One interesting detail that I've noticed is that there is actually a "future-compatibility" benefit to replacing SHA-1 with a truncated, wider hash, such as SHA-512, with the output truncated down to 160 bits! The reason for this is that in Mnet blocks of data are distributed among block servers based on the hash of the block itself. (This is the "consistent hashing" technique, first published by Karger [1], and now used in all of the DHTs.) If we use a 160-bit hash now, and at some future point we have to switch to a wider hash, then if the wider hash is equal to the smaller hash in some subset of its bits, and if those are the high-order bits in our consistent hashing scheme, then we do not have to move any blocks from one block server to another when making the transition. The cost of such a design is significant, though. According to the Crypto++ benchmarks [2], the SHA algorithms can chew through this many megabytes per second:

SHA-1    48.462
SHA-256  24.746
SHA-512   8.246

Even the slowest of these would not make Mnet become CPU-bound instead of network-bound (except for Mnet-on-LAN), but it would increase the CPU load that Mnet imposes while it works. At the moment, I'm leaning toward choosing (SHA-256 % 2^160) as the hash function that Mnet uses to distribute and identify encrypted blocks.
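[Archive note: the placement-stability argument can be made concrete with a toy sketch. The server-assignment function below is hypothetical, written for this note, not Mnet's actual code; SHA-512 stands in for whatever wider hash is chosen. Because truncation keeps the high-order bytes of the wider digest, widening the identifier later moves no blocks.]

```python
import hashlib

NUM_SERVERS = 16  # toy cluster size (hypothetical)

def server_for(block_id: bytes) -> int:
    """Consistent-hashing sketch: place a block by the high-order byte
    of its identifier, per the high-order-bits scheme described above."""
    return block_id[0] * NUM_SERVERS // 256

block = b"an encrypted Mnet block"

# Today's 160-bit identifier: the wider hash, truncated.
narrow_id = hashlib.sha512(block).digest()[:20]
# A future upgrade to the full 512-bit hash.
wide_id = hashlib.sha512(block).digest()

# Truncation preserves the high-order bits, so no block changes servers:
assert wide_id[:20] == narrow_id
assert server_for(narrow_id) == server_for(wide_id)
```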
Regards, Zooko [1] http://citeseer.nj.nec.com/karger97consistent.html [2] http://www.eskimo.com/~weidai/benchmarks.html From cefn.hoile at bt.com Tue Nov 19 05:55:01 2002 From: cefn.hoile at bt.com (cefn.hoile@bt.com) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 Message-ID: Respect to the Bram stokers out there. Cefn -----Original Message----- From: Oskar Sandberg [mailto:oskar@freenetproject.org] Sent: 19 November 2002 02:20 To: p2p-hackers@zgp.org Subject: Re: [p2p-hackers] Tiger vs. SHA-1 Gordon can come play with me instead. I'm not allowed to play with Bram either. On Mon, Nov 18, 2002 at 11:20:20AM -0600, Steve Bryan wrote: > Pot, kettle, black. From mccoy at mad-scientist.com Tue Nov 19 13:11:01 2002 From: mccoy at mad-scientist.com (Jim McCoy) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message-ID: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> On Tuesday, November 19, 2002, at 05:27 AM, Zooko wrote: [...] > One interesting detail that I've noticed is that there is actually a "future- > compatibility" benefit to replacing SHA-1 with a truncated, wider hash, such as > SHA-512, with the output truncated down to 160 bits! > The cost of such a design is significant, though. That is putting it mildly, isn't it? One problem you face is that you are hit with this cost all over the place given the architecture that MNet is derived from -- data write, data read (for verification), data sharing (for verification), and a couple of other places -- and you also increase the storage requirements for the metadata that is used to describe what is stored.

Truncation

At this point it seems worthwhile to ask what your threat model is here. Are you trying to protect against a break in the algorithm or a progression in available attacker power (faster computers), or some combination of the two?
If an algorithm break is required to make the system "weak" and within the reach of a computationally-strong attacker in the next decade, then why not just use two cheap hashes that use different base assumptions (a la Bitzi) and default to the cheapest of these two choices? If a computational miracle is required then maybe it's worth asking if this is really a problem that will need to get solved at that time or if it will render the system itself irrelevant for a variety of other reasons. Are you trying to protect yourself from a "perfect storm"? Some sort of computational voodoo not yet predicted (causing a radical speedup in "Moore's Law") _in addition to_ "big" protocol breaks that put some of these hashes within the reach of some of these "100 years before their time" computers [and yes, quantum computers are already on this predicted timeline]. If that is the case then it seems to make sense to have the larger hashes, but perhaps with an eye towards the fact that before this happens it is likely that the cost of manipulating blobs of data using SHA-512 hashes will get cheaper. You can compute a parallel set of blobs for a published file using SHA-512 and include these blob references with the current SHA1 list; when the time comes to make the switchover, the datasets can be upgraded by the clients (and the peers storing the data) in a transparent fashion as long as the correct metadata was included during the initial publication. Let someone who wants to imagine that a file they publish today will still be around in 50 years go through the trouble of heating their CPU during publication rather than punishing everyone else for the next 50 years just to suit this particular user's vanity; everyone else will dial it up a notch when it seems prudent and cost effective.
Given most predictions by those "in the know" at the moment, the logical path seems to lead back to a first step of supporting multiple hashes and letting the person doing the data publication decide what hashes make sense for their purposes. Initially two cheap and different hashes will make sense, and the truly paranoid can add a third SHA-512 or OtherHash-1024 in preparation for future support of an OtherHash-1024 dataset to eventually migrate the blocks to; the only requirement for any sort of future-compatibility path is selecting the hashes to support during initial publication. An additional point in favor of this is that your "truncated SHA-XXX" strategy only buys you a win in supporting two co-mingled datasets (one of SHA1 and one of SHA-XXX) living in the same blob storage structure. It's not much harder to support overlay datasets of any other hash with a simple upward migration path: when SHA1 seems threatened, the clients can start keeping a SHA-XXX dataset and upgrade what they are currently storing by re-publishing it to the correct SHA-XXX blob store. I am wondering what threat model supports prematurely converging on a "best of breed" and then just making it bigger. Truncation should simply be eliminated out of hand -- why chop the big hash down for a single benefit, given all of the external costs? Jim From bram at gawth.com Tue Nov 19 13:15:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] protocol design, hashes, parameterizable protocols, etc. In-Reply-To: <1037676615.4446.47.camel@makoto.chaosring.org> Message-ID: Sean R. Lynch wrote: > I think (hope) that this group has bigger fish to fry than which hash to > use or whether to use feature negotiation or version numbers to decide > which hash algorithm or whatever to use, who's more technically > competent, etc. Please correct me if I'm wrong and I'll go find another > list to lurk on.
Preparing for backwards compatibility is a huge pain, and largely unrewarding on its own, but such minutiae occupy most of your time working on real software development. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Tue Nov 19 13:15:03 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Tiger vs. SHA-1 In-Reply-To: Message-ID: cefn.hoile@bt.com wrote: > Respect to the Bram stokers out there. Bram is the name I was born with. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Tue Nov 19 13:27:02 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message-ID: Zooko wrote: > SHA-1 48.462 > SHA-256 24.746 > SHA-512 8.246 sha-512 is just ludicrous. Birthday attacks don't apply to all applications, and even sha-256 requires 2 ** 128 power to mount a birthday attack, and that's either 2 ** 128 memory or 2 ** 128 non-parallelized. Even sha-256's necessity is dubious for quite a ways out. 80 bits is still very safe even against DES-cracker style super-parallel machines. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From zooko at zooko.com Tue Nov 19 13:46:04 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message from Bram Cohen of "Tue, 19 Nov 2002 13:26:15 PST." References: Message-ID: Bram wrote: > > sha-512 is just ludicrous. Birthday attacks don't apply to all > applications, and even sha-256 requires 2 ** 128 power to mount a birthday > attack, and that's either 2 ** 128 memory or 2 ** 128 non-parallelized. Indeed -- the only reason why I would consider SHA-512 over SHA-256 is the possibility of better-than-brute-force results which make the latter unsafe.
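[Archive note: the memory/parallelism trade-off in birthday attacks that Bram refers to can be demonstrated at toy scale. The sketch below is illustrative code written for this note: it collides a hash truncated to 16 bits (so it finishes instantly) using Floyd's cycle-finding algorithm in O(1) memory, the memoryless alternative to storing a giant table of digests.]

```python
import hashlib

def f(x: bytes) -> bytes:
    """Toy 16-bit hash: SHA-1 truncated to 2 bytes (demonstration only)."""
    return hashlib.sha1(x).digest()[:2]

def floyd_collision(seed: bytes):
    """Find x != y with f(x) == f(y) using constant memory.

    Phase 1 detects a cycle in the sequence seed, f(seed), f(f(seed)), ...;
    phase 2 walks two pointers to the cycle entry, whose two distinct
    predecessors collide.  Returns None if the seed was already periodic.
    """
    tortoise, hare = f(seed), f(f(seed))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    tortoise, prev_t, prev_h = seed, None, None
    while tortoise != hare:
        prev_t, prev_h = tortoise, hare
        tortoise, hare = f(tortoise), f(hare)
    if prev_t is None or prev_t == prev_h:
        return None
    return prev_t, prev_h

pair, n = None, 0
while pair is None:        # retry with a fresh seed in the rare periodic case
    pair = floyd_collision(n.to_bytes(4, "big"))
    n += 1

x, y = pair
assert x != y and f(x) == f(y)
```

Against a full-width hash the same walk needs on the order of the square root of the output space in sequential steps, which is why the cycle-finding route trades memory for non-parallelized computation.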
(By the way, birthday attacks can be implemented without significant memory requirements, using Floyd's cycle-finding algorithm. But your point about the infeasibility of brute force against SHA-256 is a good one.) Regards, Zooko From zooko at zooko.com Tue Nov 19 14:06:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message from Jim McCoy of "Tue, 19 Nov 2002 13:10:03 PST." <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> Message-ID: Jim: Thanks for the comments. In general, your questions about the practical value of "forward-compatibility" techniques are good questions that need to be asked. I still haven't settled on how much I am willing to pay (in CPU cycles, storage, and design complexity) for the uncertain possibility of smoothly upgrading crypto algorithms. (Note: after writing this message, I changed my mind about truncated-SHA-512.) Jim McCoy wrote: > > > The cost of such a design is significant, though. > > That is putting it mildy, isn't it? This depends on what kind of CPU and network will be doing this operation. If it is a 1.5 GHz Athlon on 1 Mbps consumer broadband, then I'm not sure there's any noticeable difference between SHA-512 and SHA-1. If it is a 206 MHz StrongARM on 11 Mbps WLAN, then there is a huge difference. (By the way, I have heard that SHA-512 gains speed-ups on 64 bit architectures just as Tiger does. But I don't really care about 64-bit architectures. As Bruce Schneier wrote during the AES process, *everything* is fast on a 64-bit Alpha. The more interesting target to me is the 32-bit chips that never go away, but instead proliferate as they reduce their requirements for size, power, heat, and dollars. The architectures that I care about most are 32-bit x86, 32-bit PowerPC, and 32-bit ARM. 
Of course, this preference also depends on the fact that my overall architecture is decentralized, and there is no expensive, beefy server in the center who does all the hash verification that the rest of the nodes take on faith.) > One problem you face is that you > are hit with this cost all over the place given the architecture that > MNet is derived from -- data write, data read (for verification), data > sharing (for verification), and a couple of other places -- True, true, true and true! > and you > also increase the storage requirements for the metadata that is used to > describe what is stored. Not true -- with the truncation trick we use the same amount of storage as with SHA-1. > not just use two cheap hashes that use different base assumptions (a la > Bitzi) and default to the cheapest of these two choices. This doesn't provide the future-compatibility feature of "we're upgrading the hash strength but keeping all the blocks on the same blockservers". In addition, I'm not comfortable with having hashes present but unchecked. If there are two hashes and you only verify one of them, then there is the possibility that you aren't looking at the same thing that everyone else is looking at. Consider that Alice is using an older version of Mnet and Bob is using a newer version, and Mallet gives them an mnetId that points to a different file depending on which version of Mnet they are running. That is a possibility which my design will seek to minimize. [elided: Jim's suggestion to have two separate hashes during the upgrade phase] After reading your message and writing this response, I'm now leaning *away* from the "truncate SHA-512" strategy. My motivation is that I really don't want to exclude low-CPU devices like PDAs from being able to decode files which are encoded in the Mnet filestore format. I don't mind having to shuffle all the blocks around among the blockservers in order to upgrade -- block storage should be fluid anyway.
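[Archive note: the "verify every hash that is present" policy can be sketched as follows. The filemap layout, registry, and helper names here are hypothetical, written for this note: a client checks all digests it recognizes and rejects the file if any fails, so Mallet cannot show old and new clients different files as long as the two versions share at least one algorithm.]

```python
import hashlib

# Algorithms this client version understands (hypothetical registry).
KNOWN = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}

def verify(data: bytes, declared: dict) -> bool:
    """Check every declared digest we recognize; all of them must match.

    Unknown algorithms are skipped (a future client may check them),
    but a file with no recognizable digest at all is rejected.
    """
    checked = False
    for name, expected in declared.items():
        algo = KNOWN.get(name)
        if algo is None:
            continue
        if algo(data).digest() != expected:
            return False
        checked = True
    return checked

data = b"block contents"
filemap = {
    "sha1": hashlib.sha1(data).digest(),
    "tiger-tree": b"\x00" * 24,   # not understood by this version: skipped
}
assert verify(data, filemap)
assert not verify(b"tampered", filemap)
```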
My *current* thinking is to make the authentication system in the first version be simply a Merkle hash tree based on SHA-1, with block size equal to the block size of the erasure code. (Sorry folks: no Bitzi-compatible 1KB TigerTree in the first version. I'll reconsider if I can see a clear interop story.) If the Mnet filestore format is still in use by the time people want to move away from SHA-1, then in version 2 we can introduce a new authentication scheme, and the old SHA-1 Merkle hash tree will also be checked, if present. Thanks very much to all posters for contributing to this discussion. Regards, Zooko http://zooko.com/ http://mnet.sf.net/ From justin at chapweske.com Tue Nov 19 14:24:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> Message-ID: <3DDAB9F8.1080604@chapweske.com> If you don't consider interop until there is a clear story, then it's too late and you'll already be locked in. If you're not going to do Tiger for the hash tree, then at least support variable segment sizes and follow the orphan hash promotion semantics in THEX (http://open-content.net/specs/draft-jchapweske-thex-01.html). This will allow others to create a SHA-1 hash tree whose segment size is equal to the file size, and that will match exactly a normal full-file SHA-1 hash. So while we wouldn't have interop for fine-grained integrity checking, we'd at least be able to share *some* data. FYI, the Open Content Network currently generates full file MD5, SHA-1, and 1k Tiger hash trees.
Here is an example of our current headers:

200 OK
Date: Tue, 19 Nov 2002 22:14:19 GMT
Accept-Ranges: bytes
Server: Apache/1.3.26 (Unix) (Red-Hat/Linux)
Content-Encoding: x-gzip
Content-Length: 5843677
Content-MD5: 5pPPYDvG3LWXC+6DxbRk3w==
Content-Type: application/x-gzip
ETag: "3200e7-592add-31ba1400"
Last-Modified: Sun, 09 Jun 1996 00:00:00 GMT
X-Content-URN: urn:md5:42J46YB3Y3OLLFYL52B4LNDE34
X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF
X-Content-URN: urn:tree:tiger:FSIUWJUUSPLMMDUQZOWX32R6AEOT7NCCBX6AGBI
X-Thex-URI: http://open-content.net:8080/gateway/thex?uri=http://www.kernel.org/pub/linux/kernel/v2.0/linux-2.0.tar.gz;FSIUWJUUSPLMMDUQZOWX32R6AEOT7NCCBX6AGBI

> > My *current* thinking is to make the authentication system in the first version > be simply a Merkle hash tree based on SHA-1, with block size equal to the block > size of the erasure code. (Sorry folks: no Bitzi-compatible 1KB TigerTree in > the first version. I'll reconsider if I can see a clear interop story.) > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From bram at gawth.com Tue Nov 19 14:59:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: Message-ID: Zooko wrote: > (By the way, birthday attacks can be implemented without significant memory > requirements, using Floyd's cycle-finding algorithm. But your point about the > infeasibility of brute force against SHA-256 is a good one.) Floyd's cycle-finding algorithm can't be parallelized very well, that's why I said it requires 2 ** 128 memory *or* 2 ** 128 non-parallelized computational power. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From hal at finney.org Tue Nov 19 15:06:01 2002 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs.
(SHA-512 % 2^160) Message-ID: <200211192304.gAJN4Ym15999@finney.org> I agree that it makes sense to be prepared to move beyond SHA-1. Its 160 bit size makes a collision attack take 2^80 work. In many applications a collision attack may be almost as bad as an inversion attack. 2^80 isn't as big as it used to be, and if Moore's Law continues to hold for another decade or two, 2^80 attacks will be feasible for well funded attackers. If quantum computers become practical, an inversion attack will be reduced from 2^160 to 2^80 via Grover's algorithm [1]. I think this algorithm would resist parallelizing (square root of N speedup with N machines) but still in ~20 years it could be a serious problem. I believe, based on one paper I found [2], that Grover's algorithm can also speed up hash collision searches to the cube root of the search space, or 2^53.3 for SHA-1. That paper suggests that it is not yet known if this is the best possible speedup. Based on this, in a new design I would suggest either augmenting SHA-1 with a second hash to make it bigger, or using SHA-256. It's worth mentioning that despite the similar names, SHA-256 is really nothing like SHA-1 and so the many years of cryptanalysis of SHA-1 does not necessarily carry over to SHA-256. Hal Finney [1] http://alumni.imsa.edu/~matth/quant/473/473proj/ [2] http://www.cs.berkeley.edu/~aaronson/aaronson-47057.ps From hal at finney.org Tue Nov 19 15:42:01 2002 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) Message-ID: <200211192340.gAJNeHo16135@finney.org> Bram Cohen wrote: > Floyd's cycle-finding algorithm can't be parallelized very well, that's > why I said it requires 2 ** 128 memory *or* 2 ** 128 non-parallelized > computational power. van Oorschot and Wiener's paper on parallel collision search [1] looks parallelizable to me. It uses distinguished points, which are for example hash values which have the lower T bits all zeros. 
Based on their equation 7, using my notation and ignoring constants, a SHA-256 collision search with M processors would take time on the order of: 2^128 / M + 2^T. T can be chosen such that 2^T is a couple of orders of magnitude smaller than the first term, and you get essentially linear speedup based on the number of processors. For M of say 2^40, T would be around 80, and the total memory would be of order 2^48 (2^128 points times 1/2^80 probability of being distinguished). Hal Finney [1] http://www.scs.carleton.ca/~paulv/papers/JoC97.pdf From zooko at zooko.com Tue Nov 19 15:42:03 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] hash interop (was: SHA-1 vs. (SHA-512 % 2^160)) In-Reply-To: Message from Justin Chapweske of "Tue, 19 Nov 2002 16:23:52 CST." <3DDAB9F8.1080604@chapweske.com> References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> Message-ID: Justin: Thank you for the post. It has raised several issues which will probably be worth discussing, and I've decided to reply to it multiple times in order to address each issue separately. Justin Chapweske wrote: > > If you don't consider interop until there is a clear story, then its too > late and you'll already be locked in. I don't believe that's true. The same smooth upgrade path that I've designed to change crypto algorithms should suffice to adopt an interop technique, like this: In version 1, Mnet filestore format has only a single authentication technique: SHA-1 Merkle hash trees with block size equal to the block size of the Mnet erasure code. Then someone comes up with an interop opportunity. This interop requires that Mnet filestores be verifiable with TigerTree hashes. In version 2, Mnet filestore format has both the SHA-1 block-size Merkle trees and 1KB TigerTree hashes. While downloading, clients are REQUIRED to verify all hashes that are present. 
Hopefully over time the new one is present more and more, and the old one less and less. In version 3, we consider Mnet filemaps with old SHA-1 hashes to be ill-formed and reject them. Okay, that's issue number one. Stay tuned... Regards, Zooko From zooko at zooko.com Tue Nov 19 15:46:02 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) In-Reply-To: Message from Justin Chapweske of "Tue, 19 Nov 2002 16:23:52 CST." <3DDAB9F8.1080604@chapweske.com> References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> Message-ID: Justin Chapweske wrote: > > If you're not going to do Tiger for the hash tree, then at least support > variable segment sizes and follow the orphan hash promotion semantics > in THEX (http://open-content.net/specs/draft-jchapweske-thex-01.html). Hm. Looking at this just now, I wonder if there isn't an authentication flaw, wherein two different files have the same THEX X-Content-URN. Let's say there is a function THEXID() which takes a string of bytes and a blocksize as its two arguments. I think there are two strings, F1 and F2, such that THEXID(F1, 1024) == THEXID(F2, 48). (Assuming that the underlying hash outputs 24-byte digests.) Let F1 be a string 2048 bytes long, composed of two equal length substrings s1 and s2. Let F2 be a string 48 bytes long, composed of two equal length substrings s3 and s4. Now let s3 equal H(s1) and s4 equal H(s2). Unless I'm missing something, the result of THEXID(F1, 1024) is the same as that of THEXID(F2, 48). The way I have been intending to fix this problem in the Mnet design is to prepend the block length to each block (of actual data) before hashing. You could alternatively fix it by including the blocklength in your HTTP headers. (And I could alternatively fix it by including the blocklength in my Mnet filemap.)
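[Archive note: the construction can be checked mechanically. The sketch below is illustrative code written for this note implementing the naive tree as described in the message — leaves are H(block), internal nodes are H(left || right), with nothing binding the block size into the root. SHA-256's 32-byte digest stands in for the 24-byte hash of the example, so F2 becomes 64 bytes with block size 64. A deployed implementation can avoid the ambiguity by binding the block size, as discussed in the thread.]

```python
import hashlib

H = lambda b: hashlib.sha256(b).digest()   # 32-byte digest stands in for Tiger's 24

def naive_root(data: bytes, block_size: int) -> bytes:
    """Merkle root with no block-size or leaf/internal distinction.
    (Assumes the number of blocks is a power of two; enough for this demo.)"""
    level = [H(data[i:i + block_size]) for i in range(0, len(data), block_size)]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

s1, s2 = b"A" * 1024, b"B" * 1024
F1 = s1 + s2            # 2048 bytes: two 1024-byte blocks
F2 = H(s1) + H(s2)      # 64 bytes: the two leaf digests themselves

# Different files, different block sizes, identical root:
assert naive_root(F1, 1024) == naive_root(F2, 64)
assert F1 != F2
```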
Regards, Zooko From justin at chapweske.com Tue Nov 19 16:09:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> Message-ID: <3DDAD290.7050708@chapweske.com> You are correct in noticing that the block length, as well as the hash algorithm, are important in the communication of hash tree roots. This is why THEX requires that tree hash URNs explicitly communicate their block size if it is different from the default '1024'. So the output of your two functions would be: THEXID(F1, 1024) = urn:tree:tiger:ABCDAER234LK2J3498273ASFLKJ THEXID(F2, 48) = urn:tree:tiger/48:ABCDAER234LK2J3498273ASFLKJ However, I could theoretically see a case where a developer gets their block sizes "confused" by blindly using the block size specified in an untrusted THEX file rather than the one specified in the URN. Thank you much for making this example explicit Zooko. I will update the THEX specification with clear warnings about gotchas like this. I may even suggest that developers only support variable block sizes if they fully understand the implications of doing so. > > Hm. Looking at this just now, I wonder if there isn't an authentication flaw, > wherein two different files have the same THEX X-Content-URN. > > Let's say there is a function THEXID() which takes a string of bytes and a > blocksize as its two arguments. I think there are two strings, F1 and F2, such > that THEXID(F1, 1024) == THEXID(F2, 48). (Assuming that the underlying hash > outputs 24 bytes digests.) > > Let F1 be a string 2048 bytes long, composed of two equal length substrings s1 > and s2. Let F2 be a string 48 bytes long, composed of two equal length > substrings s3 and s4. Now let s3 equal H(s1) and S4 equal H(s2). 
> > Unless I'm missing something, the result of THEXID(F1, 1024) is the same as that > of THEXID(F2, 48). > > The way I have been intending to fix this problem in the Mnet design is to > prepend the block length to each block (of actual data) before hashing. > > You could alternatively fix it by including the blocklength in your HTTP > headers. (And I could alternatively fix it by including the blocklength in my > Mnet filemap.) > > > Regards, > > Zooko > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From bert at akamail.com Tue Nov 19 16:36:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Some new YouServ related papers Message-ID: <3DDAD9B4.1010503@akamail.com> In our (futile?!) quest to make p2p the preferred method of information sharing (and by that I don't mean porn and MP3's :-) within the corporate enterprise, we've added some new features to the YouServ p2p webhosting system and described them in a pair of papers. (1) One describes experiences in developing and deploying a (hybrid) p2p search method that doesn't suck. (2) Another describes a system for sharing web applications as opposed to static content, allowing easy development & propagation of meta p2p apps atop the base infrastructure. They can be downloaded from here, assuming our external website works: http://www.almaden.ibm.com/cs/people/bayardo/userv/ Though the server is typically quite reliable, it seems to be suffering from sporadic outages today, just when I actually need to use it.
Here are some alternate download locations in case you have any problems: (1) http://www-db.stanford.edu/~bawa/Pub/usearch.pdf (2) http://bayardo-userv.userv.web.cmu.edu/secret/adina/plugin.html From gojomo at usa.net Tue Nov 19 16:48:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> Message-ID: <007b01c2902e$7125e740$640a000a@golden> Zooko writes: > Justin Chapweske wrote: > > > > If you're not going to do Tiger for the hash tree, then at least support > > variable segment sizes and follow the orphan hash promotion semantics > > in THEX (http://open-content.net/specs/draft-jchapweske-thex-01.html). > > Hm. Looking at this just now, I wonder if there isn't an authentication flaw, > wherein two different files have the same THEX X-Content-URN. > > Let's say there is a function THEXID() which takes a string of bytes and a > blocksize as its two arguments. I think there are two strings, F1 and F2, such > that THEXID(F1, 1024) == THEXID(F2, 48). (Assuming that the underlying hash > outputs 24 bytes digests.) > > Let F1 be a string 2048 bytes long, composed of two equal length substrings s1 > and s2. Let F2 be a string 48 bytes long, composed of two equal length > substrings s3 and s4. Now let s3 equal H(s1) and S4 equal H(s2). > > Unless I'm missing something, the result of THEXID(F1, 1024) is the same as that > of THEXID(F2, 48). Hm. I think you have identified a problem, but it's not *exactly* that THEXID(F1,1024) == THEXID(F2,48). That could be disambiguated by the specification of blocksize in the identifier. 
For example, in the URN scheme I've proposed for tree hashes -- which is still under discussion -- these two values might be identified as: F1 urn:tree:tiger:ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDA F2 urn:tree:tiger/48:ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDA Even though the value-parts are identical, a program would know they aren't any more comparable than a SHA1 value and a RIPEMD160 value. However, there does seem to be a real problem in that THEXID(F1,1024) == THEXID(F2,1024), in the special case where (length(F2) == 2*hashsize) && (length(F2) < blocksize). > The way I have been intending to fix this problem in the Mnet design is to > prepend the block length to each block (of actual data) before hashing. I don't think prepending blocklength would necessarily solve the problem where...
length(F2) = 2*hashsize
length(F2) > blocksize
THEXID(F1,1024) == THEXID(F2,1024)
But somehow mixing in *actual* data lengths (or block counts) would work, because then the THEXID of a 48-byte file would not be the same as a 48-byte-wide interim generation. Considering implications & potential workarounds... - Gojomo From bram at gawth.com Tue Nov 19 18:14:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) In-Reply-To: <3DDAD290.7050708@chapweske.com> Message-ID: Justin Chapweske wrote: > You are correct in noticing that the block length, as well as the hash > algorithm, are important in the communication of hash tree roots. For what it's worth, BitTorrent always includes the block length, with no default, and doesn't really use trees, since they're always exactly two levels; it's really just a hash of a list of hashes and other metainfo.
-Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From zooko at zooko.com Tue Nov 19 18:40:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) In-Reply-To: Message from "Gordon Mohr" of "Tue, 19 Nov 2002 16:47:31 PST." <007b01c2902e$7125e740$640a000a@golden> References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> <007b01c2902e$7125e740$640a000a@golden> Message-ID: Just for clarification: in the following, F1 is composed of two equal-length strings F1a and F1b, F2 is composed of two equal-length strings F2a and F2b, and F2a = H(F1a) and F2b = H(F1b). "Gordon Mohr" writes: > > However, there does seem to be a real problem in that THEXID(F1,1024) == > THEXID(F2,1024), in the special case where (length(F2) == 2*hashsize) && > (length(F2) < blocksize). > > The way I have been intending to fix this problem in the Mnet design is to > > prepend the block length to each block (of actual data) before hashing. > > I don't think prepending blocklength would necessarily solve the problem > where... > > length(F2)=2*hashsize > > length(F2)>blocksize > > THEXID(F1,1024) == THEXID(F2,1024) Hm. You mean where blocksize <= 2*hashsize? You're right that this would also be a problem. > But somehow mixing in *actual* data lengths (or block counts) would > work, because then the THEXID of a 48-byte file would not be the > same as a 48-byte-wide interim generation. Indeed. For Mnet, I already have the file size included separately in the filemap, so I don't have to worry about this problem in any case. (I guess this means I didn't need to prepend any block lengths in my hash. I hadn't really worked out the details of the hash tree yet, as you can see.)
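The fix being floated -- mixing the *actual* data length into the hash -- can be sketched the same way. This is a toy model, not Mnet's or THEX's actual format; SHA-256 truncated to 24 bytes stands in for the hash, and strengthening is applied only at the root:

```python
import hashlib
import struct

def h(data: bytes) -> bytes:
    # Hypothetical 24-byte digest standing in for Tiger.
    return hashlib.sha256(data).digest()[:24]

def naive_root(data: bytes, blocksize: int) -> bytes:
    # Un-strengthened Merkle root: collides for crafted inputs.
    level = [h(data[i:i + blocksize]) for i in range(0, len(data), blocksize)]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

def strengthened_root(data: bytes, blocksize: int) -> bytes:
    # Mix the real file length into the root -- length strengthening
    # applied only at the top, one of the options discussed here.
    return h(naive_root(data, blocksize) + struct.pack(">Q", len(data)))

s1, s2 = b"A" * 1024, b"B" * 1024
f1 = s1 + s2           # 2048-byte file
f2 = h(s1) + h(s2)     # 48-byte file built from f1's top inner nodes

# The un-strengthened roots collide; the strengthened ones do not,
# because the two files have different lengths.
assert naive_root(f1, 1024) == naive_root(f2, 1024)
assert strengthened_root(f1, 1024) != strengthened_root(f2, 1024)
```

The same separation happens if the length is instead carried out-of-band (HTTP header, filemap); baking it into the hash just makes it impossible to omit.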
I suppose for purposes of Mnet <-> Bitzi/THEX integration (about which I have a longish e-mail in the wings for tomorrow) we ought to document what kinds of data are required for crypto integrity purposes to be included "out of band" along with the hash value(s). Again, I prefer to include such information "in-band", by mixing it into the hash. For example, prepending the length of the whole file, and the block size just for good measure, at the beginning of the hash of the first block. (I get this preference from certain "crypto design heuristics" papers such as "The Chosen Protocol Attack" [1]. One way to look at it is that implementors are more likely to accidentally leave this important metadata out if it is transmitted in a separate HTTP header, or in an optional suffix as per your block size, than if it is baked into the crypto protocol.) But in the interests of interop, I might consider leaving that stuff out of my hash tree spec (and including it only in my filemap) if you're leaving it out of yours. Regards, Zooko [1] http://citeseer.nj.nec.com/kelsey97protocol.html From bram at gawth.com Tue Nov 19 19:37:02 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] SHA-1 vs. (SHA-512 % 2^160) In-Reply-To: <200211192340.gAJNeHo16135@finney.org> Message-ID: Hal Finney wrote: > Bram Cohen wrote: > > Floyd's cycle-finding algorithm can't be parallelized very well, that's > > why I said it requires 2 ** 128 memory *or* 2 ** 128 non-parallelized > > computational power. > > van Oorschot and Wiener's paper on parallel collision search [1] looks > parallelizable to me. > > [1] http://www.scs.carleton.ca/~paulv/papers/JoC97.pdf Very clever. I take back my comment about non-parallelizability. 
-Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at usa.net Tue Nov 19 23:25:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] security of variable block-length Merkle hash trees (SHA-1 vs. (SHA-512 % 2^160)) References: <4584A22A-FC03-11D6-911C-000393071F50@mad-scientist.com> <3DDAB9F8.1080604@chapweske.com> <007b01c2902e$7125e740$640a000a@golden> Message-ID: <002f01c29065$e71fc1f0$640a000a@golden> Zooko writes: > "Gordon Mohr" writes: > > However, there does seem to be a real problem in that THEXID(F1,1024) == > > THEXID(F2,1024), in the special case where (length(F2) == 2*hashsize) && > > (length(F2) < blocksize). > Ah yes -- for any file of any length, I can take its two top-most inner nodes > (that is, the X and Y such that the root = H(X+Y)), and define a new file whose > contents are X+Y. This new file will have the same root ID as the previous > file. Yep. It's a very undesirable quality. I don't think it opens up much of a malicious attack -- because the collision is between a file, and an artificially-created file of hash data from the original file. Such a created file is completely determined by the original, and of recognizable size. Also, if other channels let people know the size of the file they're seeking, they can disambiguate the two. However, my inclination is still to patch this somehow, and restore the (intended) property of THEX-style tree hashes that no easily discoverable files have the same root value (given identical algorithm/block-size decisions). > As far as I can see, the only files that I can create which collide have length > 2*hashsize. Upon more thought, I see there are other similar cases having to do with the 'only child'-promotion rule. For example, consider: File F1: 4096 bytes long, four 1024-byte blocks; call them A, B, C, D.
Then the tree hash is: H( H( H(A)+H(B) ) + H( H(C)+H(D) ) ) // '+' is concatenation File F2: 2048+48 bytes long, two 1024-byte blocks A and B, then one 48-byte block which just "happens to be" H(C)+H(D). Then again, the tree hash is: H( H( H(A)+H(B) ) + H( H(C)+H(D) ) ) The same caveats as above apply, but this suggests that for one long file, there may be multiple creatable smaller artificial collision files. Changing the rule for handling 'only child' nodes might resolve this issue, but not the 2*hashsize file issue. > > > The way I have been intending to fix this problem in the Mnet design is to > > > prepend the block length to each block (of actual data) before hashing. > > > > I don't think prepending blocklength would necessarily solve the problem > > where... > > > > length(F2)=2*hashsize > > length(F2)>blocksize > > THEXID(F1,1024) == THEXID(F2,1024) > > Hm. You mean where blocksize <= 2*hashsize? You're right that this would also > be a problem. Oops, I meant length(F2) < blocksize. If length(F2) > blocksize, then THEXID(F2,blocksize) will *not* be the same as THEXID(F1,blocksize), even if F2 is exactly the same as the two interim tree values just under the root. So really small blocks would be one (inefficient!) way of avoiding this kind of problem. > > But somehow mixing in *actual* data lengths (or block counts) would > > work, because then the THEXID of a 48-byte file would not be the > > same as a 48-byte-wide interim generation. > > Indeed. For Mnet, I already have the file size included separately in the > filemap, so I don't have to worry about this problem in any case. (I guess > this means I didn't need to prepend any block lengths in my hash. I hadn't > really worked out the details of the hash tree yet, as you can see.)
> I suppose for purposes of Mnet <-> Bitzi/THEX integration (about which I have a longish e-mail in the wings for tomorrow) we ought to document what kinds of data are required for crypto integrity purposes to be included "out of band" along with the hash value(s). > > Again, I prefer to include such information "in-band", by mixing it into the hash. For example, prepending the length of the whole file, and the block size just for good measure, at the beginning of the hash of the first block. I agree, and my first inclination for addressing this tree-hash issue is to redefine the THEXish root value calculation as hashing the currently defined root value with the resource length (either in bytes or blocks). But the full implications of such a change, on the format and practice of tree hashes, require some more consideration before I'd be ready to formally propose it. For now: thanks for your perceptive catch of this issue. I'm surprised it's slipped by people's notice for so long. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From justin at chapweske.com Wed Nov 20 06:57:01 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] [Fwd: Re: Merkle Hash Tree Weakness - Request for Advice] Message-ID: <3DDBA2A3.7070601@chapweske.com> Here is a proposed fix that matches the elegance of the hash tree, and thus seems like the "right" fix. It also has the added benefit of being constructible from an existing "broken" hash tree and the file size, so this can be done w/o invalidating Bitzi's database. However, it will take at least a few weeks of considering various options before we can declare the appropriate fix. We don't want to be too hasty on this.
-------- Original Message -------- Subject: Re: Merkle Hash Tree Weakness - Request for Advice Date: Wed, 20 Nov 2002 05:42:13 -0800 From: Richard Parker CC: Newsgroups: sci.crypt References: <3d4e7bc5.0211191729.4f59437e@posting.google.com> Yes, as described your hash tree appears to be vulnerable to attacks on its 2nd-preimage resistance. The problem appears to be that your hash is not adequately "capturing" the size of substrings. Fortunately there is a simple fix: when computing the hash for each vertex, append to the hash input string the total length of the substrings rooted by that vertex (after padding to the size of the hash function's block size, i.e. 512 bits for SHA-1 and MD5). Appending the size to the input of a hash is called "MD-strengthening" after Merkle and Damgard, who independently proposed this idea. For example, ignoring optimizations, such a hash tree might look as follows:

             V_0
            /   \
           /     \
        V_1       V_2
        / \       / \
       /   \     /   \
     V_3   V_4 V_5   V_6

  H()   hash function
  l()   length
  ||    concatenation
  V_i   vertex i
  S_j   substring j
  r     block size of H
  n     output size of H
  p_i   number of bits needed to pad to multiple of r
  w     number of bits used to represent substring length

  V_0 = H(V_1 || V_2 || 0^p_0 || 0^(r-w) || l(S_1)+l(S_2)+l(S_3)+l(S_4))
  V_1 = H(V_3 || V_4 || 0^p_1 || 0^(r-w) || l(S_1)+l(S_2))
  V_2 = H(V_5 || V_6 || 0^p_2 || 0^(r-w) || l(S_3)+l(S_4))
  V_3 = H(S_1 || l(S_1))
  V_4 = H(S_2 || l(S_2))
  V_5 = H(S_3 || l(S_3))
  V_6 = H(S_4 || l(S_4))

Depending on your application requirements, it might well be sufficient to just apply MD-strengthening to the root hash. -Richard -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From lgonze at panix.com Thu Nov 21 11:23:01 2002 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: [decentralization] Some new YouServ related papers In-Reply-To: <3DDAD9B4.1010503@akamail.com> Message-ID: On Tue, 19 Nov 2002, Bert wrote: > In our (futile?!)
quest to make p2p the preferred method of information > sharing (and by that I don't mean porn and MP3's :-) within the > corporate enterprise, we've added some new features to the YouServ p2p > webhosting system and described them in a pair of papers. A thing about yooserv that's a little funny is the use of dynamic DNS for identity, because it implies an administrative bottleneck. But maybe that's just a kneejerk reaction. Can you say how this has worked out in practice, Bert? - Lucas From bert at akamail.com Thu Nov 21 23:02:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] Re: [decentralization] Some new YouServ related papers References: Message-ID: <3DDDD801.30901@akamail.com> Lucas Gonze wrote: >On Tue, 19 Nov 2002, Bert wrote: > > >>In our (futile?!) quest to make p2p the preferred method of information >>sharing (and by that I don't mean porn and MP3's :-) within the >>corporate enterprise, we've added some new features to the YouServ p2p >>webhosting system and described them in a pair of papers. >> >> > >A thing about yooserv that's a little funny is the use of dynamic DNS for >identity, because it implies an administrative bottleneck. But maybe that's >just a kneejerk reaction. Can you say how this has worked out in practice, >Bert? > I'm not sure I understand the question -- could you clarify? By administrative do you mean "involving human administration"? There is certainly no such bottleneck, as the domains are assigned and registered automatically using an injective mapping from the user's (authenticated) e-mail address. Securing the base domain (userv.ibm.com in the case of the IBM deployment) certainly involves some administrative involvement, but it only has to be done once (+ whatever is required for keeping it from expiring). The project has precious few resources so I try to keep the system such that it basically runs itself.
From zooko at zooko.com Sat Nov 23 10:35:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of security primitive: kinds of failure Message-ID: My wife said something at dinner the other night as we were talking about choice of hash algorithms. She said she was of two minds: on the one hand you don't want to pay a price now for an uncertain gain in the future, and on the other hand it's *really* important not to use broken crypto. This made me realize that the reason I've been internally flip-flopping on which hash algorithm to use is that there are two different failure scenarios in my head. The first scenario is one where every year or so an academic paper comes out that says "Here is a slightly better certificational attack on hash algorithm X.". As these certificational (meaning "purely theoretical") attacks on algorithm X get better and better, and closer and closer to being more than purely theoretical, I decide to switch the Mnet file format over to algorithm Y, and commence doing so, in a series of nice backwards-compatible steps. When I'm thinking along those lines, it's obvious that I should use whatever algorithm is efficient and cryptographically approved today, and not worry about the future. But then there's the other scenario, where I suddenly find out that crackers in Malaysia have been using a weakness in algorithm X to steal money and gain control of innocent people's computers. Innocent people who used the Mnet file format and trusted its cryptographic guarantees. I further realize that these crackers in Malaysia didn't cryptanalyze it themselves, but that they learned the trick from someone else. I don't know who, or how many people know how to do this, or for how long they've known it. When I'm thinking along those lines, it's obvious that I should use the strongest possible algorithm today, and efficiency be damned (within limits).
Another realization that I had is that my desire to have the Mnet file format used on handhelds is *definitely* a future scenario of the first kind. I can start using a slow, expensive algorithm today, and if an opportunity opens up to use the Mnet file format on handhelds, we can switch over to a more efficient algorithm in nice backwards-compatible steps. Regards, Zooko From zooko at zooko.com Sat Nov 23 12:46:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of hash algorithm: some facts Message-ID: Here are a few things I've learned about hash algorithms: * NIST has standardized SHA-1, SHA-256, SHA-384, and SHA-512. SHA-1 is a member of the MD4/MD5 family. The others (collectively known as SHA-2) are not. * NESSIE [1] is considering the following algorithms for standardization: all of the NIST ones, plus Whirlpool [2]. Notably, Tiger wasn't proposed for NESSIE, even though Biham (one of the two authors of Tiger) is participating in NESSIE. I don't know why Tiger wasn't proposed for standardization. * Whirlpool is based on Rijndael and one of the designers of Whirlpool is one of the designers of Rijndael. The NESSIE project measures Whirlpool as being a little faster than SHA-2/512 (36 cycles/byte for Whirlpool, 40 cycles/byte for SHA-512) [3]. * Ross Anderson (the other author of Tiger) gives a high-level overview of hash algorithms in his book "Security Engineering". He describes MD4, MD5, SHA-1, SHA-256, SHA-512. He calls these latter two "versions of SHA". He says to use more than 160-bit wide hash functions, and to avoid the "MD series". He doesn't mention that SHA-1 is genetically related to the MD series. * I ran the Crypto++ v5 benchmark on my machine.
It shows that my 1.4 GHz Athlon XP is about twice as fast as Wei Dai's Celeron 850 MHz [4], and otherwise shows approximately the same relation between speeds of hash functions:

hash algorithm   MB/s
--------------   ----
CRC-32            253
Adler-32          232
MD5               129
HAVAL (pass=3)     86
SHA-1              84
HAVAL (pass=4)     62
RIPE-MD160         51
HAVAL (pass=5)     50
Tiger              47
SHA-256            41
SHA-512            17
MD2                 2

Regards, Zooko [1] http://cryptonessie.org/ [2] http://planeta.terra.com.br/informatica/paulobarreto/WhirlpoolPage.html [3] "Performance of Optimized Implementations of the NESSIE Primitives, version 1.0" http://www.cosic.esat.kuleuven.ac.be/nessie/deliverables/D21-v1.pdf [4] http://www.eskimo.com/~weidai/benchmarks.html From arma at mit.edu Mon Nov 25 22:34:01 2002 From: arma at mit.edu (Roger Dingledine) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] CfP: Privacy Enhancing Technologies 2003 Message-ID: <20021126013307.R17342@moria.seul.org> We've extended the submission deadline to Dec 6 (firm deadline). Also note that we will have a few (or maybe quite a few, depending on further sponsors) stipends available for students, unemployed researchers, etc. (Please forward widely.)

CALL FOR PAPERS
WORKSHOP ON PRIVACY ENHANCING TECHNOLOGIES 2003
Mar 26-28 2003
Dresden, Germany
Hotel Elbflorenz Dresden
http://www.petworkshop.org/

Privacy and anonymity are increasingly important in the online world. Corporations and governments are starting to realize their power to track users and their behavior, and restrict the ability to publish or retrieve documents. Approaches to protecting individuals, groups, and even companies and governments from such profiling and censorship have included decentralization, encryption, and distributed trust.
Building on the success of the first anonymity and unobservability workshop (held in Berkeley in July 2000) and the second workshop (held in San Francisco in April 2002), this third workshop addresses the design and realization of such privacy and anti-censorship services for the Internet and other communication networks. These workshops bring together anonymity and privacy experts from around the world to discuss recent advances and new perspectives. The workshop seeks submissions from academia and industry presenting novel research on all theoretical and practical aspects of privacy technologies, as well as experimental studies of fielded systems. We encourage submissions from other communities such as law and business that present their perspectives on technological issues. As in past years, we will publish LNCS proceedings after the workshop. Suggested topics include but are not restricted to:

* Efficient (technically or economically) realization of privacy services
* Techniques for censorship resistance
* Anonymous communication systems (theory or practice)
* Anonymous publishing systems (theory or practice)
* Attacks on anonymity systems (eg traffic analysis)
* New concepts in anonymity systems
* Protocols that preserve anonymity/privacy
* Models for anonymity and unobservability
* Models for threats to privacy
* Novel relations of payment mechanisms and anonymity
* Privacy-preserving/protecting access control
* Privacy-enhanced data authentication/certification
* Profiling, data mining, and data protection technologies
* Reliability, robustness, and attack resistance in privacy systems
* Providing/funding privacy infrastructures (eg volunteer vs business)
* Pseudonyms, identity, linkability, and trust
* Privacy, anonymity, and peer-to-peer
* Usability issues and user interfaces for PETs
* Policy, law, and human rights -- anonymous systems in practice
* Incentive-compatible solutions to privacy protection
* Economics of privacy systems
* Fielded systems and techniques for enhancing privacy in existing systems

IMPORTANT DATES
Submission deadline (extended, firm)   December 6, 2002 23:59 EST
Acceptance notification                February 7, 2003
Camera-ready copy for preproceedings   March 7, 2003
Camera-ready copy for proceedings      April 28, 2003

CHAIRS
Roger Dingledine, The Free Haven Project, USA
Andreas Pfitzmann, Dresden University of Technology, Germany

PROGRAM COMMITTEE
Alessandro Acquisti, SIMS, UC Berkeley, USA
Stefan Brands, Credentica, Canada
Jean Camp, Kennedy School, Harvard University, USA
David Chaum, USA
Richard Clayton, University of Cambridge, England
Lorrie Cranor, AT&T Labs - Research, USA
Roger Dingledine, The Free Haven Project, USA (program chair)
Hannes Federrath, Freie Universitaet Berlin, Germany
Ian Goldberg, Zero Knowledge Systems, Canada
Marit Hansen, Independent Centre for Privacy Protection Schleswig-Holstein, Germany
Markus Jakobsson, RSA Laboratories, USA
Brian Levine, University of Massachusetts at Amherst, USA
David Martin, University of Massachusetts at Lowell, USA
Andreas Pfitzmann, Dresden University of Technology, Germany
Matthias Schunter, IBM Zurich Research Lab, Switzerland
Andrei Serjantov, University of Cambridge, England
Adam Shostack, Canada
Paul Syverson, Naval Research Lab, USA

PAPER SUBMISSIONS
Submitted papers must not substantially overlap with papers that have been published or that are simultaneously submitted to a journal or a conference with proceedings. Papers should be at most 15 pages excluding the bibliography and well-marked appendices (using 11-point font and reasonable margins), and at most 20 pages total. Authors are encouraged to follow Springer LNCS format in preparing their submissions. Committee members are not required to read the appendices, and the paper should be intelligible without them. The paper should start with the title, names of authors, and an abstract.
The introduction should give some background and summarize the contributions of the paper at a level appropriate for a non-specialist reader. During the workshop, preproceedings will be made available. Final versions are not due until after the workshop, giving the authors the opportunity to revise their papers based on discussions during the meeting. Submissions must be made in Postscript or PDF format. To submit a paper, send a plain ASCII text email to containing the title and abstract of the paper, the authors' names, email and postal addresses, phone and fax numbers, and identification of the contact author. To the same message, attach your submission (as a MIME attachment). Papers must be received by December 6, 2002. Notification of acceptance or rejection will be sent to authors no later than February 7, 2003, and authors will have the opportunity to revise for the preproceedings version by March 7, 2003. Submission implies that, if accepted, the author(s) agree to publish in the proceedings and to sign a standard Springer copyright release, and also that an author of the paper will present it at the workshop. Final versions (due after the workshop) need to comply with the instructions for authors made available by Springer. From bram at gawth.com Wed Nov 27 02:08:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] CodeCon CFP reminder, deadline December 15th Message-ID: The CodeCon submissions deadline is December 15th; you can see the info here: http://codecon.info/ Also, the date and place have been set - February 22-24, Club NV, San Francisco.
-Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From nikolajn at ascio.com Wed Nov 27 11:44:01 2002 From: nikolajn at ascio.com (Nikolaj Nyholm) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] RE: [decentralization] Some new YouServ related papers Message-ID: <2F15A97500CFA0469C9BACC2041F8AC702EE0432@aries.dk.speednames.com> > A thing about yooserv that's a little funny is the use of > dynamic DNS for > identity, because it implies an admistrative bottleneck. But The idea is not so far out. Dynamic DNS will in essence run identity and discovery for 3G (layer 2/link layer mobility). On a smaller note, ENUM delegations (use telephone number in DNS as identity) are finally starting to happen, using SIP for presence and message/session initiation. All current efforts are, however, on a purely experimental basis. On an even smaller note, we've done work on building extended identity functions on top of any of the above two 'identity layers'. /n From halfinney at tmail.com Wed Nov 27 11:44:04 2002 From: halfinney at tmail.com (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of security primitive: kinds of failure In-Reply-To: References: Message-ID: <1038078311.253FE56B@w5.dngr.org> It's hard for me to believe that some hacker consortium will be capable of the advanced mathematical reasoning necessary to break a hash algorithm. Academic researchers are the ones who will break these algorithms, and they'll publish their results in the open literature. It takes a different kind of expertise to do a theoretical analysis and break of a crypto algorithm than has been demonstrated in the hacker community. So I would not put too much weight on that concern. Hal On Sat, 23 Nov 2002 10:41AM -0800, Zooko wrote: > > My wife said something at dinner the other night as we were talking > about choice > of hash algorithms.
She said she was of two minds: on the one hand you > don't > want to pay a price now for an uncertain gain in the future, and on the > other > hand it's *really* important not to use broken crypto. > > This made me realize that the reason I've been internally flip-flopping > on which > hash algorithm to use is that there are two different failure scenarios > in my > head. > > The first scenario is a scenario where every year or so an academic paper > comes out > that says "Here is a slightly better certificational attack on hash > algorithm X.". As these certificational (meaning "purely theoretical") > attacks > on algorithm X get better and better, and closer and closer to being > more than > purely theoretical, I decide to switch the Mnet file format over to > algorithm Y, > and commence doing so, in a series of nice backwards-compatible steps. > > When I'm thinking along those lines, it's obvious that I should use > whatever > algorithm is efficient and cryptographically approved today, and not > worry about > the future. > > But then there's the other scenario, where I suddenly find out that > crackers in > Malaysia have been using a weakness in algorithm X to steal money and > gain > control of innocent people's computers. Innocent people who used the > Mnet file > format and trusted its cryptographic guarantees. I further realize > that these > crackers in Malaysia didn't cryptanalyze it themselves, but that they > learned > the trick from someone else. I don't know who, or how many people know > how to > do this, or for how long they've known it. > > When I'm thinking along those lines, it's obvious that I should use > the > strongest possible algorithm today, and efficiency be damned (within > limits). > > Another realization that I had is that my desire to have the Mnet file > format > used on handhelds is *definitely* a future scenario of the first kind.
> I can > start using a slow, expensive algorithm today, and if there opens up > an > opportunity to use the Mnet file format on handhelds we can switch over > to a > more efficient algorithm in nice backwards-compatible steps. > > Regards, > > Zooko > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers --hal From agl at imperialviolet.org Wed Nov 27 11:44:07 2002 From: agl at imperialviolet.org (Adam Langley) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of hash algorithm: some facts In-Reply-To: References: Message-ID: <20021124003005.GA30787@imperialviolet.org> On Sat, Nov 23, 2002 at 03:41:52PM -0500, Zooko wrote: > hash algorithm MB/s > -------------- ---- > CRC-32 253 > Adler-32 232 > MD5 129 > HAVAL (pass=3) 86 > SHA-1 84 > HAVAL (pass=4) 62 > RIPE-MD160 51 > HAVAL (pass=5) 50 > Tiger 47 > SHA-256 41 > SHA-512 17 > MD2 2 Any timings for Whirlpool? -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org (+44) (0)7986 296753 PGP: 9113 256A CC0F 71A6 4C84 5087 CDA5 52DF 2CB6 3D60 From zooko at zooko.com Wed Nov 27 13:36:01 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] choice of security primitive: kinds of failure In-Reply-To: Message from "Hal Finney" of "Sat, 23 Nov 2002 11:03:29 PST." <1038078311.253FE56B@w5.dngr.org> References: <1038078311.253FE56B@w5.dngr.org> Message-ID: Hal Finney wrote: > > It's hard for me to believe that some hacker consortium will be capable > of the advanced mathematical reasoning necessary to break a hash > algorithm. Oh, I agree. Perhaps I should have left the identities of the analyzers blank in the story. (In my head, I was imagining that it was the national security apparatus of a major state, such as Russia. Another scenario could be a brilliant and lucky academic who sells out instead of publishing.) 
I hate to sound paranoid, but I do want to think about how likely I consider the "surprising, already exploited" scenario. Since I have very little information with which to evaluate the likelihood of such a scenario, I continue to flip-flop between the efficient choices (SHA-1 and Rijndael), and the conservative ones (SHA-512 (??) and DES-EDE3). Suggestions would be appreciated. Perhaps SHA-1 is both the most efficient and the most conservative choice of a hash function. Regards, Zooko Algorithm MB/s --------- ---- SHA-1 85.333 SHA-256 41.558 SHA-512 17.778 DES-XEX3 13.333 DES-EDE3 6.061 Rijndael (128-bit key) 31.068 Rijndael (192-bit key) 27.119 Rijndael (256-bit key) 23.881 Rijndael (128) CTR 27.350 Rijndael (128) OFB 29.358 Rijndael (128) CFB 23.188 Rijndael (128) CBC 27.119 From hemppah at cc.jyu.fi Fri Nov 29 00:49:02 2002 From: hemppah at cc.jyu.fi (hemppah@cc.jyu.fi) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and signature management Message-ID: <1038559300.3de728446fc72@tammi2.cc.jyu.fi> Hi, Currently I'm doing my Master's thesis, which focuses on some specific issues related to p2p systems. I have a few questions concerning these issues: a) Is the DHT-based routing model (O(log(n))) the best effort for finding (data) blocks in a p2p network? b) What are the assumptions for the best effort? See question a (whatever it is). c) Is there any documentation/implementation of how one should manage key revocation in a PKI? d) How should digital signatures be managed (PKI) in a p2p environment? e) How do I know, when I have searched for data, that the results are accurate (not fake blocks etc.)? Any help would be appreciated. Thanks, Hermanni Hyytiälä
------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From arachnid at notdot.net Fri Nov 29 02:10:02 2002 From: arachnid at notdot.net (Nick Johnson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] The 'MetaWeb' and the Slashdot effect Message-ID: <3DE73D10.6080608@notdot.net> Hi folks, I was just thinking some random thoughts (you know the type), and a fairly simple idea occurred to me for mitigating Slashdot-type effects, increasing reliability and uptime, etc. The basic idea is this: Instead of attempting to fetch a page directly, the user attempts to fetch it via a script. If the script has a cached copy of the page in its database (the script would follow standard caching directives as a proxy would), it simply returns the page. If not, it has a number of choices: - It can open a query to another copy of the script it knows of, elsewhere on the network, and ask for the same page, then cache it and return it to the user - If it determines its traffic to be what it decides is 'too high', it can simply return an HTTP redirect to the user, redirecting them to another script it knows of - It can fetch the page itself and return it to the user. I am aware this idea bears a number of similarities to existing efforts, including such things as PeekABooty and Anonymizer.com, but I believe it could have several advantages in certain situations: - It requires no software on the part of the user, though a proxy server that uses this system could be implemented. - It requires no software or modifications on the part of the content author. - If organisations such as Slashdot either link to a script or provide their own, Slashdot effects can be mitigated or eliminated. Not to mention that it could be interesting to write ;). Obviously, there are unanswered implementation questions, such as how a script decides which action to take if it does not have a copy of the page, how it should deal with 'no cache' pages, etc.
I'm posting this because I'd like to see if anyone is interested, and whether anyone has done something similar already. From bert at akamail.com Fri Nov 29 08:46:01 2002 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] The 'MetaWeb' and the Slashdot effect In-Reply-To: <3DE73D10.6080608@notdot.net> References: <3DE73D10.6080608@notdot.net> Message-ID: <3DE79B69.5000807@akamail.com> Nick Johnson wrote: > Hi folks, > > I was just thinking some random thoughts (you know the type), and a > fairly simple idea occurred to me for mitigation of Slashdot type > effects, increasing reliability and uptime, etc. > The basic idea is this: Instead of attempting to fetch a page > directly, the user attempts to fetch it via a script. If the script > has a cached copy of the page in its database (the script would > follow standard caching directives as a proxy would), it simply > returns the page. If not, it has a number of choices: [snip] Below is a paper that explores this idea, though it's more a combination of this idea + BitTorrent in that the peers who download are the ones doing the caching. The idea here is that clients requesting the content provide a special pragma field that indicates their willingness to serve that content to subsequent requestors. The server can then issue a redirect to any such peers instead of serving the content itself. "The Case for Cooperative Networking", Venkata N. Padmanabhan and Kunwadee Sripanidkulchai. IPTPS '02. http://detache.cmcl.cs.cmu.edu/~kunwadee/research/papers/coopnetiptps.pdf I've also planned on adding a similar function to YouServ, as described in section 3.4 of http://www.almaden.ibm.com/cs/people/bayardo/userv/plugin/plugin.html. Here though we want to use DNS to do the "load balancing" (a la Akamai) instead of requiring the originating server to always either send redirects or serve the content itself.
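[Editor's note] The cache-or-redirect behavior discussed in the messages above can be sketched as a toy Python class. This is an illustrative model only, not any existing implementation: the class name, the peer list, and the requests-per-second threshold are all assumptions, and a real script would also honor HTTP caching directives (Cache-Control, Expires) as Nick notes.

```python
import time
import urllib.request

class CachingScript:
    """Toy model of the cache-or-redirect script idea. All names and
    thresholds here are illustrative, not from any real implementation."""

    def __init__(self, peers=None, max_requests_per_sec=10.0):
        self.cache = {}            # url -> cached page body
        self.peers = peers or []   # other known copies of the script
        self.max_rps = max_requests_per_sec
        self._hits = []            # timestamps of recent requests

    def _overloaded(self):
        # Crude load estimate: requests seen in the last second.
        now = time.time()
        self._hits = [t for t in self._hits if now - t < 1.0]
        return len(self._hits) > self.max_rps

    def handle(self, url, fetch=None):
        """Return ('page', body) or ('redirect', peer_url)."""
        self._hits.append(time.time())
        if url in self.cache:
            # Serve the cached copy.
            return ('page', self.cache[url])
        if self._overloaded() and self.peers:
            # Shed load: HTTP-redirect the client to another script copy.
            return ('redirect', self.peers[0])
        # Otherwise fetch (from the origin, or from a peer via `fetch`),
        # cache, and serve.
        fetch = fetch or (lambda u: urllib.request.urlopen(u).read())
        body = fetch(url)
        self.cache[url] = body
        return ('page', body)
```

A front end (CGI script, proxy, etc.) would translate the ('redirect', ...) result into a 302 response; the injectable `fetch` argument also stands in for "ask another copy of the script for the page".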
From gojomo at usa.net Fri Nov 29 09:48:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] The 'MetaWeb' and the Slashdot effect References: <3DE73D10.6080608@notdot.net> <3DE79B69.5000807@akamail.com> Message-ID: <002f01c297cf$2a98b2f0$4d30af18@gojovaio> > The Case for Cooperative Networking*, Venkata N. Padmanabhan and > Kunwadee Sripanidkulchai. IPTPS '02 > . > http://detache.cmcl.cs.cmu.edu/~kunwadee/research/papers/coopnetiptps.pdf Another paper exploring similar ideas, presented at the same conference: Peer-to-Peer Caching Schemes to Address Flash Crowds by Tyron Stading, Petros Maniatis, and Mary Baker http://mosquitonet.stanford.edu/publications/Backslash-IPTPS2002.pdf - Gojomo From agm at SIMS.Berkeley.EDU Fri Nov 29 13:21:01 2002 From: agm at SIMS.Berkeley.EDU (Antonio Garcia) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and signature management In-Reply-To: <1038559300.3de728446fc72@tammi2.cc.jyu.fi> Message-ID: > > a) Is DHT-based routing model (O(log(n)) the best effort for finding (data) > blocks in p2p network ? Depends on the P2P network. In Gnutella, there are no bounds on searches. In CHORD and Tapestry, it is log n, in CAN it is n^(1/d), where d is the dimension of the logical space. > > b) What are assumptions for the best effort ? Look question a (what ever it is). > Rather complicated question! I would recommend reading the papers... > > d) How digital signatures should be managed (PKI) in p2p environment ? That's unresolved, and would be an excellent paper if you figured it out. > > e) How do I know that if I have searched data, results are accurate (not fake > blocks etc.) Well, if you are searching by file hash, then it's pretty likely to be what you are looking for. If you are searching by meta-data, then who knows what you're getting... A. 
~~~~~~~~~~~~~~~~~~~~~~~ Antonio Garcia-Martinez cryptologia.com From seth.johnson at RealMeasures.dyndns.org Fri Nov 29 13:40:02 2002 From: seth.johnson at RealMeasures.dyndns.org (Seth Johnson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] DEADLINE Thursday: Stop the FCC "Broadcast Flag" Proposal! Message-ID: <3DE7DCEC.3D60F61E@RealMeasures.dyndns.org> New Yorkers for Fair Use Action Alert: -------------------------------------- Please send a comment opposing the "Broadcast Flag" Proposal to the FCC by this Thursday, December 5, 2002. Tell the FCC to Serve the Public, Not Hollywood! Okay, you folks understand this issue -- it's very important to send word to the FCC by the public comments deadline, this Thursday, December 5, that you OPPOSE the Notice of Proposed Rulemaking #02-230. This rule would make it illegal for ordinary citizens to own fully functional digital television devices. We've made it easy; just follow the links below. 1) Please send in your comments to the FCC using the form provided below. Tell them that the movie industry should not have a special privilege to own fully-functional digital television devices. Read the alert below for details. 2) Please forward this alert to any other interested parties that you know of, who would understand and see the importance of this issue. 3) Volunteer to help us with this and other alerts related to your rights to flexible information technology in the future. Two roles you can take up are to become a Press Outreach Campaigner or a Commentator. Simply reply to this email to show your interest. New Yorkers for Fair Use Action Alert: -------------------------------------- Tell the FCC to Serve the Public, Not Hollywood! Public Comments Needed to Stop the "Broadcast Flag" Proposal at the FCC Please follow this link and use the form on the Center for Democracy and Technology's site to let the FCC know that the public's rights are at stake: http://www.nyfairuse.org/action/fcc.flag.xhtml. 
What's Going On: The FCC is considering a proposal that digital televisions be required to work only according to the rules set by Hollywood, through the use of a "broadcast flag" assigned to digital TV broadcasts. Through the deliberations of a group called the Broadcast Protection Discussion Group which assiduously discounted the public's rights to use flexible information technology, Hollywood and leading technology players have devised a plan that would only allow "professionals" to have fully-functional devices for processing digital broadcast materials. Hollywood and content producers must not be allowed to determine the rights of the public to use flexible information technology. The idea of the broadcast flag is to implement universal content control and abolish the right of free citizens to own effective tools for employing digital content in useful ways. The broadcast flag is theft. In the ongoing fight with old world content industries, the most essential rights and interests in a free society are those of the public. Free citizens are not mere consumers; they are not a separate group from so-called "professionals." The stakeholders in a truly just information policy in a free society are the public, not those who would reserve special rights to control public uses of information technology. Please go to the Center for Democracy and Technology's Broadcast Flag Action Page and use their form to let the FCC know that the public's rights are at stake: http://www.cdt.org/action/copyright/. ---- Some background links: http://bpdg.blogs.eff.org/archives/one-page.pdf http://www.eff.org/effector/HTML/effect15.22.html#III http://www.cdt.org/press/020807press.shtml http://scriban.com/movabletype_archives/000334.shtml http://scriban.com/movabletype_archives/000331.shtml The following link is the FCC's "Notice of Proposed Rulemaking" for the broadcast flag. 
http://hraunfoss.fcc.gov/edocs_public/attachmatch/FCC-02-231A1.pdf From coderman at mindspring.com Fri Nov 29 17:45:01 2002 From: coderman at mindspring.com (coderman) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and signature management References: <1038559300.3de728446fc72@tammi2.cc.jyu.fi> Message-ID: <3DE81836.30306@mindspring.com> hemppah@cc.jyu.fi wrote: > Hi, > > Currently I'm doing my Master's thesis, which focuses on some specific issues related > to p2p systems. I have a few questions concerning these issues: > > a) Is the DHT-based routing model (O(log(n))) the best effort for finding (data) > blocks in a p2p network? I have a biased description of search/discovery methods for large peer networks (out of date - I will revise one of these days...) which may be helpful: http://cubicmetercrystal.com/alpine/discovery.html > Any help would be appreciated. > > Thanks, > Hermanni Hyytiälä > > > ------------------------------------------------- > This mail sent through IMP: http://horde.org/imp/ > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > -- _____________________________________________________________________ coderman@mindspring.com http://cubicmetercrystal.com/ key fingerprint: 9C00 C63E A71D D488 AF17 F406 56FB 71D9 E17D E793 ( see html source for public key ) --------------------------------------------------------------------- From coderman at mindspring.com Fri Nov 29 17:47:01 2002 From: coderman at mindspring.com (coderman) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and signature management References: Message-ID: <3DE81898.20202@mindspring.com> Antonio Garcia wrote: >>a) Is the DHT-based routing model (O(log(n))) the best effort for finding (data) >>blocks in a p2p network? > > > Depends on the P2P network. In Gnutella, there are no bounds on searches.
> In CHORD and Tapestry, it is log n, in CAN it is n^(1/d), where d is the > dimension of the logical space. Another important factor to consider is the type of search provided. For example, DHTs require key-value associations, which are difficult to use for arbitrary keyword or metadata searches (where partial criteria must be considered). Gnutella is more flexible in this regard; however, such flexibility comes at the price of reduced efficiency. -- _____________________________________________________________________ coderman@mindspring.com http://cubicmetercrystal.com/ key fingerprint: 9C00 C63E A71D D488 AF17 F406 56FB 71D9 E17D E793 ( see html source for public key ) --------------------------------------------------------------------- From hal at finney.org Fri Nov 29 18:20:01 2002 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] DEADLINE Thursday: Stop the FCC "Broadcast Flag" Proposal! Message-ID: <200211300218.gAU2IdM02270@finney.org> > Please follow this link and use the form on the Center for > Democracy and Technology's site to let the FCC know that the > public's rights are at stake: > http://www.nyfairuse.org/action/fcc.flag.xhtml. This is probably not the right forum to discuss this, but I found the questions presented for comment at the CDT site somewhat baffling: Will the broadcast flag interfere with consumers' ability to make copies of DTV content for their personal use, either on personal video recorders or removable media? Would the digital flag interfere with consumers' ability to send DTV content across networks, such as home digital networks connecting digital set top boxes, digital recorders, digital servers and digital display devices?
Would the broadcast flag requirement limit consumers' ability to use their existing electronic equipment (equipment not built to look for the flag) or make it difficult to use older components with new equipment that is compliant with the broadcast flag standard? Would a broadcast flag requirement limit the development of future equipment providing consumers with new options? What will be the cost impact, if any, that a broadcast flag requirement would have on consumer electronics equipment? It doesn't seem to me that the average consumer would be in a position to answer virtually any of these questions. They all require some knowledge of the details of the broadcast flag, and some depend on policy decisions which are yet to be made! How can a consumer judge what the cost impact of a broadcast flag will be? We don't know anything about designing digital video devices, how the BF would affect the parts count and the overall cost. How can we know if the BF will interfere with our ability to make copies for personal use, or to send information across home networks - doesn't that depend on what restrictions get implemented? How can we know what degree of backwards compatibility for existing equipment will be possible, or how the BF will affect future designs? I am baffled why the CDT and other online groups would be encouraging consumers to make what will certainly be completely uninformed and inexpert judgements on questions that they are totally unqualified to answer! What the government *should* ask, if they care about the public's opinions, are policy questions, like: should the government aim to make it technically difficult for people to share digital TV broadcasts over the Internet? This is an issue where everyone has an opinion, and mine is as good as yours. This is an issue where consumers could give meaningful input.
Anyway, while I was happy to have a chance to register my opinion on the Proposed Rule Making, I was disappointed to learn that they were asking me all these technical questions where I would have to learn a lot more about digital video to feel qualified to answer. Hal Finney From seth.johnson at RealMeasures.dyndns.org Fri Nov 29 18:38:02 2002 From: seth.johnson at RealMeasures.dyndns.org (Seth Johnson) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] DEADLINE Thursday: Stop the FCC "Broadcast Flag" Proposal! References: <200211300218.gAU2IdM02270@finney.org> Message-ID: <3DE8246E.479443F4@RealMeasures.dyndns.org> Hello Hal, Yes, we have only used the CDT's website because they have an easy to use form that generates the metadata-coded email that the FCC's comments system uses. Those questions reflect the questions in the NPRM. We think it unfortunate that both the FCC and the CDT are asking for "consumer" analyses, as if one can actually draw such a distinction for the Internet (which you can't, for reasons that P2P Hackers understand well). And simply reflecting the FCC's questions doesn't encourage people to speak freely and in terms of how they see the issue. However, there is a general comments field at the bottom of the form. And one does *not* have to accept the premises loaded into the questions. And it is certainly possible to speak according to principles without having to get into the technical aspects too much. I find it most helpful to understand that the whole deal, no matter the technical details, sets up some people as calling the shots and making the rest into mere consumers, rather than equal citizens online. Hope that helps. Now back to P2P Hacking . . . Seth Johnson Hal Finney wrote: > > > Please follow this link and use the form on the Center for > > Democracy and Technology's site to let the FCC know that the > > public's rights are at stake: > > http://www.nyfairuse.org/action/fcc.flag.xhtml. 
> > This is probably not the right forum to discuss this, but I found the > questions presented for comment at the CDT site somewhat baffling: > > Will the broadcast flag interfere with consumers' ability to make > copies of DTV content for their personal use, either on personal > video recorders or removable media? > > Would the digital flag interfere with consumers' ability to send DTV > content across networks, such as home digital networks connecting > digital set top boxes, digital recorders, digital servers and digital > display devices? > > Would the broadcast flag requirement limit consumers' ability to > use their existing electronic equipment (equipment not built to look > for the flag) or make it difficult to use older components with new > equipment that is compliant with the broadcast flag standard? > > Would a broadcast flag requirement limit the development of future > equipment providing consumers with new options? > > What will be the cost impact, if any, that a broadcast flag requirement > would have on consumer electronics equipment? > > It doesn't seem to me that the average consumer would be in position to > answer virtually any of these questions. They all require some knowledge > of the details of the broadcast flag, and some depend on policy decisions > which are yet to be made! > > How can a consumer judge what the cost impact of a broadcast flag will be? > We don't know anything about designing digital video devices, how the > BF would affect the parts count and the overall cost. How can we know > if the BF will interfere with our ability to make copies for personal > use, or to send information across home networks - doesn't that depend > on what restrictions get implemented? How can we know what degree of > backwards compatibility for existing equipment will be possible, or how > the BF will affect future designs? 
> > I am baffled why the CDT and other online groups would be encouraging > consumers to make what will certainly be completely uninformed and > inexpert judgements on questions that they are totally unqualified > to answer! > > What the government *should* ask, if they care about the public's > opinions, are policy questions, like: should the government aim to make > it technically difficult for people to share digital TV broadcasts > over the Internet? This is an issue where everyone has an opinion, > and mine is as good as yours. This is an issue where consumers could > give meaningful input. > > Anyway, while I was happy to have a chance to register my opinion on the > Proposed Rule Making, I was disappointed to learn that they were asking > me all these technical questions where I would have to learn a lot more > about digital video to feel qualified to answer. > > Hal Finney > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- DRM is Theft! We are the Stakeholders! New Yorkers for Fair Use http://www.nyfairuse.org [CC] Counter-copyright: http://cyber.law.harvard.edu/cc/cc.html I reserve no rights restricting copying, modification or distribution of this incidentally recorded communication. Original authorship should be attributed reasonably, but only so far as such an expectation might hold for usual practice in ordinary social discourse to which one holds no claim of exclusive rights. From hemppah at cc.jyu.fi Sat Nov 30 04:28:01 2002 From: hemppah at cc.jyu.fi (hemppah@cc.jyu.fi) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and Message-ID: <1038658842.3de8ad1a64bfd@tammi2.cc.jyu.fi> Hi, Thanks for your answer. Please see my answers below.
>Date: Fri, 29 Nov 2002 09:22:10 -0800 (PST) >From: Antonio Garcia >To: p2p-hackers@zgp.org >Subject: Re: [p2p-hackers] About search methods, key revokation in PKI and > signature management >Reply-To: p2p-hackers@zgp.org > >> a) Is DHT-based routing model (O(log(n)) the best effort for finding (data) >> blocks in p2p network ? >Depends on the P2P network. In Gnutella, there are no bounds on searches. >In CHORD and Tapestry, it is log n, in CAN it is n^(1/d), where d is the >dimension of the logical space. Yes, that's true. The point I was trying to make was: is O(log n) really the best effort, and is it implemented only by DHTs? So are there really no other "log n" structures besides DHTs (e.g. trees or tries)? >> >> b) What are assumptions for the best effort ? Look question a (what ever it >>is). >> >Rather complicated question! I would recommend reading the papers... Yes, rather unbounded question ;). Basically, I just meant that, for example, in DHTs the basic assumption is that you can't share your resources from your *own* computer; DHTs map keys to values in an m-bit virtual address space. And in Gnutella, the basic assumption is the opposite. You share (well, you don't *have* to, but..) your resources from your own computer, etc. >> >> d) How digital signatures should be managed (PKI) in p2p environment ? >That's unresolved, and would be an excellent paper if you figured it out. Hmm..I think Groove has a quite clever signature management system, but I'm not sure how well it can scale (e.g. 1,000,000,000s of users). >> >> e) How do I know that if I have searched data, results are accurate (not >>fake >> blocks etc.) >Well, if you are searching by file hash, then it's pretty likely to be >what you are looking for. If you are searching by meta-data, then who >knows what you're getting... Hmm..this *might* be a weird question, but can the Tree Hash EXchange format (THEX) combine file hashes and metadata somehow ?
Thanks, Hermanni ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From hemppah at cc.jyu.fi Sat Nov 30 04:48:01 2002 From: hemppah at cc.jyu.fi (hemppah@cc.jyu.fi) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and Message-ID: <1038660062.3de8b1dea30a6@tammi2.cc.jyu.fi> >>>a) Is DHT-based routing model (O(log(n)) the best effort for finding (data) >>>blocks in p2p network ? >> >> >> Depends on the P2P network. In Gnutella, there are no bounds on searches. >> In CHORD and Tapestry, it is log n, in CAN it is n^(1/d), where d is the >> dimension of the logical space. >Another important factor to consider is the type of search provided. For >example, DHT's require key - value associations which is difficult to use >for arbitrary keyword or metadata searches (where partial criteria must be >considered). Gnutella is more flexible in this regard, however, such >flexibility comes at the price of increased inefficiency. Does anyone know if there are any other projects whose aim is to combine DHTs and Gnutella-like systems besides YAPPERS (see http://dbpubs.stanford.edu:8090/pub/2002-24) ? As far as I know, there has been some discussion in the academic world also (see e.g. http://nile.usc.edu/research.htm). Thanks, Hermanni Hyytiälä ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From barnesjf at vuse.vanderbilt.edu Sat Nov 30 12:38:01 2002 From: barnesjf at vuse.vanderbilt.edu (J. Fritz Barnes) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] About search methods, key revokation in PKI and In-Reply-To: <1038658842.3de8ad1a64bfd@tammi2.cc.jyu.fi>; from hemppah@cc.jyu.fi on Sat, Nov 30, 2002 at 02:20:42PM +0200 References: <1038658842.3de8ad1a64bfd@tammi2.cc.jyu.fi> Message-ID: <20021130134801.B10885@vuse.vanderbilt.edu> On Sat, Nov 30, 2002 at 02:20:42PM +0200, hemppah@cc.jyu.fi wrote: :) :) > :) >> a) Is DHT-based routing model (O(log(n)) the best effort for finding (data) :) >> blocks in p2p network ? :) :) Yes, that's true. The point I was trying to say was that is O(log n) really the :) best effort and only implemented by DHTs. So there really is no other "log n"s :) than DHTs (e.g. trees or tries) ? :) There are two measures by which to evaluate the routing: the amount of space required and the number of hops required to find a specific piece of information. The DHT systems (Chord, Pastry, Tapestry) tend to be O(log(n)) for both space and number of hops. If you were willing to store additional information, you could have O(n) space and O(1) lookup. This is a spectrum: at the other extreme, each host might keep only the next peer, in which case it would be O(1) space and O(n) lookup. Fritz
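[Editor's note] The space-versus-hops spectrum Fritz describes can be made concrete with a toy, fully populated Chord-style ring. This is an idealized sketch, not any particular implementation: with n = 2**m nodes, keeping m "fingers" at power-of-two distances gives lookups in at most m = log2(n) hops, while keeping only a successor pointer forces up to n-1 hops.

```python
def chord_hops(m_bits, start, key):
    """Greedy lookup on a full ring of 2**m_bits nodes, where each node
    keeps m_bits fingers (successors at distances 1, 2, 4, ...):
    O(log n) state, at most m_bits hops."""
    n = 2 ** m_bits
    node, hops = start, 0
    while node != key:
        dist = (key - node) % n
        # Jump via the largest power-of-two finger that does not overshoot.
        node = (node + (1 << (dist.bit_length() - 1))) % n
        hops += 1
    return hops

def successor_hops(m_bits, start, key):
    """Each node keeps only its successor: O(1) state, O(n) hops."""
    return (key - start) % (2 ** m_bits)

# At the remaining extreme, a full O(n) routing table gives 1-hop lookups.
```

For a 1024-node ring (m_bits = 10), any lookup takes at most 10 finger hops, versus up to 1023 successor-only hops.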