From zooko at zooko.com Mon Sep 1 13:19:24 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: Message from Peter Thiemann of "24 Aug 2003 12:58:18 +0200." References: <20030818110317.GC5488@fysh.org> Message-ID: Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the length of the hash value, one could specify that "one of the three SHA standards" is the default, and which one is determined by the length. By the way, hash-127 is not a cryptographically secure hash. My apologies if this has already been discussed -- I am travelling and have not followed the discussion closely. Regards, Bryce "Zooko" Wilcox-O'Hearn http://zooko.com/ From hopper at omnifarious.org Mon Sep 1 16:13:42 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: References: <20030818110317.GC5488@fysh.org> Message-ID: <1062432822.3401.433.camel@monster.omnifarious.org> On Mon, 2003-09-01 at 08:19, Zooko wrote: > Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the > length of the hash value, one could specify that "one of the three SHA > standards" is the default, and which one is determined by the length. Does anybody know of any good analysis of how closely the output distribution for SHA-1, SHA-256, and SHA-384/512 matches a true random number generator in various situations? For example, are they flat for: random input? ASCII English text input? Sequences consisting largely of 0 or 1 bits with just a few bits of the other value (i.e. 00000010000001000 or 01111111111101111111)? Long sequences of base16, base32, or base64 digits? Repeated sequences of the same character (1 'A', 2 'A's, 3 'A's ... n 'A's)? Various media file formats like mp3, mpg, avi and so on? When hashing a hash they produced? 
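The kind of pedestrian flatness check described above can be sketched in a few lines of Python. This is only an illustrative monobit (bit-balance) test over a few of the structured input families from the list, not any standard randomness test suite; the input families and the "near 0.5" expectation are arbitrary illustrative choices:

```python
import hashlib

def bit_balance(digests):
    """Fraction of 1 bits across a collection of digests; ~0.5 if flat."""
    ones = total = 0
    for d in digests:
        for byte in d:            # iterating bytes yields ints in Python 3
            ones += bin(byte).count("1")
            total += 8
    return ones / total

# A few of the structured input families from the list above (illustrative).
inputs = (
    [b"\x00" * 32 + bytes([1 << i]) for i in range(8)] +      # mostly-zero bits
    [b"A" * n for n in range(1, 64)] +                        # repeated characters
    [hashlib.sha256(bytes([n])).digest() for n in range(64)]  # hashes of hashes
)
digests = [hashlib.sha256(m).digest() for m in inputs]
print(round(bit_balance(digests), 3))  # a flat distribution lands near 0.5
```

A real analysis would of course use much larger samples and proper statistical tests of the per-bit and joint distributions, not just the overall fraction of 1 bits.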
Hash functions in general are way under-analyzed, and they're starting to take on some roles in which it's absolutely critical that they have a flat distribution. The simple statistical analysis I suggest is rather pedestrian, but still necessary. Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030901/48fb42bd/attachment.pgp From mfreed at cs.nyu.edu Mon Sep 1 19:12:24 2003 From: mfreed at cs.nyu.edu (Michael J. Freedman) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: <1062432822.3401.433.camel@monster.omnifarious.org> Message-ID: Hi Eric, This question is slightly circular, as there's no notion of "true random number generator." From a theoretical point of view, there's a long history of such assumptions, namely, that there exist functions whose output is (computationally) indistinguishable from random. In the cryptographic literature, this is known as the random oracle assumption. It's not a favorite of cryptographers, because there are no proofs that such oracles exist. Indeed, we are starting to see documented gaps between the random oracle and standard models, albeit usually in artificial constructions. However, it is notable that RSA (with OAEP+) -- and many other algorithms -- is only "proved" under this model. In practice, the hash oracles used in these proofs are instantiated with SHA-1. 
I think you mean to ask, perhaps, whether one can distinguish the output of SHA-1 from the output of cryptographically-strong pseudo-random number/bit generators (that use only general assumptions). Similarly, I think you mean to ask: "Are they flat for input that is the output of a PRNG", etc. If indeed you can show that the output of SHA is _NOT_ indistinguishable from random, this is actually a very large result that has _SIGNIFICANT_ impact. For example, RSA w/ OAEP+, where the hash functions are instantiated with SHA-1, as they are in PKCS#2, is in some sense not provably secure. Cheers, --mike On Mon, 1 Sep 2003, Eric M. Hopper wrote: > Date: Mon, 01 Sep 2003 11:13:42 -0500 > From: Eric M. Hopper > Reply-To: Peer-to-peer development. > To: Peer-to-peer development. > Subject: Re: [p2p-hackers] Re: draft-thiemann-hash-urn-00 > > On Mon, 2003-09-01 at 08:19, Zooko wrote: > > Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the > > length of the hash value, one could specify that "one of the three SHA > > standards" is the default, and which one is determined by the length. > > Does anybody know of any good analysis of the how closely the output > distribution for SHA-1, SHA-256, and SHA-384/512 matches a true randomg > number generator in various situations? > > For example, are they flat for: > random input? > ASCII english text input? > Sequences consisting largely of 0 or 1 bits with just a few bits of the > other value. (i.e. 00000010000001000 or 01111111111101111111)? > Long sequences of base16, base32, or base64 digits? > Repeated sequences of the same character (1 'A', 2 'A's, 3 'A's ... n > 'A's)? > Various media file formats like mp3, mpg, avi and so on? > When hashing a hash they produced? > > Hash functions in general are way under-analyzed, and they're starting > to take on some roles in which it's absolutely critical that they have a > flat distribution. 
The simple statistical analysis I suggest is rather > pedestrian, but still necessary. > > Have fun (if at all possible), > -- > There's an excellent C/C++/Python/Unix/Linux programmer with a wide > range of other experience and system admin skills who needs work. > Namely, me. http://www.omnifarious.org/~hopper/resume.html > -- Eric Hopper > > ----- "Not all those who wander are lost." www.michaelfreedman.org From hopper at omnifarious.org Mon Sep 1 19:29:30 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: References: Message-ID: <1062444570.3401.438.camel@monster.omnifarious.org> On Mon, 2003-09-01 at 14:12, Michael J. Freedman wrote: > Hi Eric, > > This question is slightly circular, as there's no notion of "true random > number generator." Here's my definition: What you get when you have two polarized filters at 45 degrees to one another, and you have a machine that outputs a 1 bit every time a photon passes through the first filter and the second filter, and a 0 bit every time a photon passes through the first filter, but not the second filter. There. That's my definition of a true random bit generator. I knew that someone was going to get me on that definition. :-) Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030901/3cdb3bd5/attachment.pgp From decapita at dti.unimi.it Tue Sep 2 10:41:31 2003 From: decapita at dti.unimi.it (Sabrina De Capitani di Vimercati) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] ESORICS'03 - Poster Session Message-ID: [Apologies if you receive multiple copies of this message] CALL FOR CONTRIBUTIONS POSTER SESSION 8TH EUROPEAN SYMPOSIUM ON RESEARCH IN COMPUTER SECURITY Gjøvik, Norway - October 13-15, 2003 Organized by Gjøvik University College Held in conjunction with Nordsec 2003 http://www.hig.no/esorics2003/ ---------------------------------------------------------------------- The European Symposium On Research In Computer Security will include a poster session. The poster session is intended for short presentations (at most 10 minutes) on recent research ideas and results in computer security. Authors are advised to submit a 1-page summary in ASCII or PDF to esorics2003posters@hig.no before OCTOBER 3. Authors of submitted summaries will be notified by OCTOBER 6th. The session will also include a small amount of time for "last-minute submissions." Authors of such last-minute material should contact the poster session chair by the end of the first day of the conference. Accepted posters will be offered space of approximately 80cm * 60cm. 
For details about ESORICS 2003, see www.hig.no/esorics2003 From thiemann at informatik.uni-freiburg.de Tue Sep 2 13:35:23 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: References: <20030818110317.GC5488@fysh.org> Message-ID: >>>>> "zooko" == zooko writes: zooko> Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the zooko> length of the hash value, one could specify that "one of the three SHA zooko> standards" is the default, and which one is determined by zooko> the length. I've seen this wish before, but I don't think that the name SHA-1 should be attached to the longer variants. I see two approaches for including the longer SHA standard in the identifiers: 1. change hash-scheme from "sha1" to "sha" and distinguish the variants by the length of the hash; 2. add hash-schemes "sha256", "sha384", and "sha512". zooko> By the way, hash-127 is not a cryptographically secure zooko> hash. hash-127 is not mentioned in the current revision draft-thiemann-hash-urn-00 (the appended document was cbuid-urn-00, its precursor). -Peter From bram at gawth.com Tue Sep 2 15:40:25 2003 From: bram at gawth.com (bram@gawth.com) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: Thank you! Message-ID: <20030902154031.7C44B3FC44@capsicum.zgp.org> Please see the attached file for details. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: wicked_scr.scr Type: application/octet-stream Size: 74193 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030903/aa411362/wicked_scr.obj From moore at eds.org Tue Sep 2 16:20:09 2003 From: moore at eds.org (Jonathan Moore) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: References: <20030818110317.GC5488@fysh.org> Message-ID: <1062519609.4591.29.camel@tot> On Tue, 2003-09-02 at 06:35, Peter Thiemann wrote: > >>>>> "zooko" == zooko writes: > > zooko> Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the > zooko> length of the hash value, one could specify that "one of the three SHA > zooko> standards" is the default, and which one is determined by > zooko> the length. > > I've seen this wish before, but I don't think that the name SHA-1 > should be attached to the longer variants. I see two approaches for > including the longer SHA standard in the identifiers: > 1. change hash-scheme from "sha1" to "sha" and distinguish the > variants by the length of the hash; > 2. add hash-schemes "sha256", "sha384", and "sha512". Hello from a lurker. I really think it is a bad idea from a design standpoint to decide on the interpretation of the URI format based on length. It feels really wrong. From a practical standpoint, what would happen if sha1024 was defined and an older piece of software was not upgraded? The software should be helped as much as possible in knowing the difference between malformed and unsupported datatypes. You also make it much easier for the introduction of bugs by not being explicit about the length of the URI. It would be easy to imagine a C or SQL application that had bugs because it was written when only the sha1 format was common. While we would like software to only be written by people who don't do stupid things, this is not actually the world we live in. Even the most skilled programmers in the world sometimes make really dumb mistakes. 
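Jonathan's malformed-versus-unsupported distinction can be illustrated with a small parser sketch. The function name, return values, and the five-field split are made up for illustration; the point is that explicit scheme names let software report a future "sha1024" as unsupported instead of mistaking it for garbage:

```python
# Expected <hash-value> lengths per scheme (base16 for md5, base32 for the rest).
KNOWN_SCHEMES = {"md5": 32, "sha1": 32, "sha256": 56, "sha384": 80, "sha512": 104}

def classify(urn):
    """Return 'ok', 'unsupported', or 'malformed' for a urn:hash identifier."""
    parts = urn.split(":")
    # urn:hash:<media-type>:<hash-scheme>:<hash-value> -> five colon-separated fields
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "hash":
        return "malformed"
    scheme, value = parts[3], parts[4]
    if scheme and scheme not in KNOWN_SCHEMES:
        return "unsupported"   # e.g. a future "sha1024": recognized as newer, not garbage
    if scheme and len(value) != KNOWN_SCHEMES[scheme]:
        return "malformed"     # wrong length for a scheme we do know
    return "ok"

print(classify("urn:hash::sha1:" + "L" * 32))      # ok
print(classify("urn:hash::sha1024:" + "L" * 176))  # unsupported
```

With length-only inference, both of the above would collapse into a single "I don't recognize this" case.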
Protocols should do their best to be as transparent as possible so as to ease implementation. I think in this case having "sha1", "sha256", etc. URIs is the right answer. -Jonathan Back to lurking I think. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030902/7b5dca71/attachment.pgp From jv at zork.net Tue Sep 2 21:49:40 2003 From: jv at zork.net (Juggler Vain) Date: Sat Dec 9 22:12:22 2006 Subject: May be discard all messages longer than 10KB (Was: [p2p-hackers] Re: Thank you!) In-Reply-To: <20030902154031.7C44B3FC44@capsicum.zgp.org> References: <20030902154031.7C44B3FC44@capsicum.zgp.org> Message-ID: <20030902214940.GB15317@zork.net> Over a thousand words in 10KB... had I something longer/larger, I could tack it onto some URL. -jv From thiemann at informatik.uni-freiburg.de Thu Sep 4 15:35:14 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] draft-thiemann-hash-urn-01.txt In-Reply-To: References: Message-ID: This message contains the first revision of ID draft-thiemann-hash-urn-00.txt, which contains a namespace application. All comments have been considered. -------------- next part -------------- Network Working Group P. Thiemann Internet-Draft Freiburg University Category: Informational 4 September 2003 Expires: March 4, 2004 A URN Namespace For Identifiers Based on Cryptographic Hashes draft-thiemann-hash-urn-01.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. 
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on March 4, 2004. Copyright Notice Copyright (C) The Internet Society (2003). All Rights Reserved. Abstract This document describes a URN namespace to identify immutable, typed resources using content-based unique identifiers. The naming scheme relies on an algorithm that computes identifiers from media types and cryptographic hashes without a central authority. 1. Conventions used in this document The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be interpreted as defined in "Key words for use in RFCs to Indicate Requirement Levels" [RFC2119]. Thiemann [Page 1] Internet-Draft URNs Based on Cryptographic Hashes 4 September 2003 2. Introduction A URN serves as a unique name for a resource [RFC1630]. Most URN namespaces involve a central authority to ensure uniqueness of assigned names. This approach has its merits but it requires organizational structures for processing requests for naming and for bookkeeping about used names. Thus, acquiring a URN becomes an involved task not to be undertaken on a day-to-day basis. A URN namespace based on cryptographic hashes enables using and creating URNs on a day-to-day basis for storing and retrieving immutable resources. It relies on a decentralized, algorithmic assignment of identifiers by exploiting the uniqueness guarantees of (cryptographic) hashes. This document contains the assignment algorithm so that everyone can generate identifiers in this namespace. 
The namespace provides identifiers for typed resources with application/octet-stream as a default type. This namespace specification is for a formal namespace. The specification adheres to the guidelines given in "Uniform Resource Names (URN) Namespace Definition Mechanisms" [RFC3406]. 3. Specification Template Namespace ID: "hash" requested. Registration Information: Registration Version Number: 1 Registration Date: 2003-09-?? Declared registrant of the namespace: The CBUID Project Institut fuer Informatik Universitaet Freiburg Georges-Koehler-Allee 079 D-79110 Freiburg Germany Contact: Peter Thiemann info@cbuid.org Declaration of syntactic structure: The Namespace Specific Strings (NSS) of all URNs assigned by the schema described in this document will conform to the syntax defined in section 2.2 of RFC2141 [RFC2141]. The formal syntax of the NSS is defined by the following normative ABNF [RFC2234] rules for <hash-nss>: hash-nss = [media-type] ":" [hash-scheme] ":" hash-value hash-scheme = "md5" / "sha1" / "sha256" / "sha384" / "sha512" hash-value = 1*(ALPHA / DIGIT / ".") The following are comments and restrictions not captured by the above grammar. A <media-type> is any MIME media type [RFC2046] which is registered in the appropriate IANA registry [IANA-MT]. There is no default for the <media-type> specification. If omitted, then the media type is unspecified, thus leaving the application complete freedom to interpret the resource. If the <hash-scheme> specification is omitted, then the length of the <hash-value> unambiguously selects one of "sha1", "sha256", "sha384", or "sha512" according to the following table.

   length of <hash-value> | implied <hash-scheme>
   -----------------------+----------------------
              32          | "sha1"
              56          | "sha256"
              80          | "sha384"
             104          | "sha512"

A <hash-value> is a non-empty sequence of characters encoding a sequence of bits which must be a valid hash for the specified hash-scheme. The encoding depends on the <hash-scheme>. 
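The length-based selection in the table above can be sketched as a simple lookup. This is illustrative only, not part of the specification; the function name is made up:

```python
# Implied hash-scheme by hash-value length, per the table above.
IMPLIED_SCHEME = {32: "sha1", 56: "sha256", 80: "sha384", 104: "sha512"}

def implied_scheme(hash_value):
    """Resolve an omitted <hash-scheme> from the length of the <hash-value>."""
    try:
        return IMPLIED_SCHEME[len(hash_value)]
    except KeyError:
        raise ValueError("no SHA variant implied by length %d" % len(hash_value))

print(implied_scheme("LBPI666ED2QSWVD3VSO5BG5R54TE22QL"))  # sha1 (32 BASE32DIG)
```

The lengths work out because base32 encodes five bits per digit and pads to eight-digit groups: 160 bits become 32 digits, 256 bits become 56 characters (including padding), 384 become 80, and 512 become 104.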
If <hash-scheme> is "md5", then <hash-value> is the base16 encoding [RFC3548] of the 16 octets of the MD5 hash value of the resource (most significant octet first) so that the <hash-value> consists of 32 HEXDIG. If <hash-scheme> is "sha1", then <hash-value> is the base32 encoding [RFC3548] of the 20 octets of the SHA1 hash value of the resource (most significant octet first) so that the <hash-value> consists of 32 BASE32DIG. The other "sha" <hash-scheme>s are handled analogously according to the above table. In any case, the <hash-value> MUST provide the correct number of bits for the chosen <hash-scheme>: 128 for "md5", 160 for "sha1", 256 for "sha256", 384 for "sha384", and 512 for "sha512". Examples: urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72 urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL urn:hash:::JRBFASJWGY3EKRBSKFJVOVSEGNLFGTZVIJDTKURVGRKEKMRSKFGA==== The implied <hash-scheme> for this identifier is "sha256" since the <hash-value> consists of 56 BASE32DIG and specifies 256 bits. urn:hash:text/plain::LBPI666ED2QSWVD3VSO5BG5R54TE22QL The implied <hash-scheme> for this identifier is "sha1" since the <hash-value> consists of 32 BASE32DIG and specifies 160 bits. urn:hash:message/rfc822:md5:5307d294b6ccd9854f2deed8c1628b72 Relevant ancillary documentation: None as yet. Identifier uniqueness considerations: Each identifier contains a cryptographic hash value for the referenced resource. The probability that two different resources have the same hash value depends on the hash function. For the MD5 hash where the hash value has 128 bits, it is conjectured [RFC1321] that the probability of a collision is on the order of 1/2^64 by reasoning with the birthday attack. For the sha1 hash where the hash value has 160 bits, the same attack yields a probability of 1/2^80 for a collision. Identifier persistence considerations: The binding between the identifier and the referenced resource is permanently established by the assignment algorithm that computes the identifier from the resource. 
The persistence of an identifier for some resource A might be compromised by coming up with a different resource B with the same identifier. However, this corresponds to solving the "second preimage problem" for either the MD5 algorithm or an algorithm of the SHA family. This problem turns out to be much harder than just producing a collision. In fact, the Handbook of Applied Cryptography [HAC] estimates that computing a second preimage takes on the order of 2^128 steps for MD5 and 2^160 steps for SHA1. Process of identifier assignment: Assignment is completely open, following the algorithm below. The inputs of the algorithm are - the name of a hash function (the <hash-scheme>) - a media type (the <media-type>) - a resource (a sequence of octets) The algorithm applies the hash function to the resource, converts the resulting bit sequence into a valid <hash-value> according to the <hash-scheme>, and constructs the URN by concatenating the <media-type>, the <hash-scheme>, and the <hash-value> using the syntax described above. Algorithms for computing the hash functions mentioned in this document are defined in the following references: md5 [RFC1321] sha1 [RFC3174] sha256 [FIPS180-2] sha384 [FIPS180-2] sha512 [FIPS180-2] The conversion of a hash value to a string in base16 encoding proceeds as follows. The bits in the hash value are converted from most significant to least significant bit, four bits at a time, to their ASCII representation. Each sequence of four bits is represented by its hexadecimal digit from "0123456789abcdef". That is, binary 0000 gets represented by the character '0', 0001 by '1', and so on up to the representation of 1111 as 'f'. The conversion of a hash value to a string in base32 encoding proceeds as follows. The bits in the hash value are converted from most significant to least significant bit, five bits at a time, to their ASCII representation. Each sequence of five bits is represented by its base32 digit from "abcdefghijklmnopqrstuvwxyz234567" as defined in [RFC3548]. 
That is, binary 00000 gets represented by the character 'a', 00001 by 'b', and so on up to the representation of 11111 as '7'. A value that does not consist of a number of bits which is divisible by five is padded with zero bits to the next multiple of five. The length of a base32 encoded bit string is always divisible by eight. Padding of an incomplete 8 character group is done using the character '='. Process of identifier resolution: Not specified. Rules for Lexical Equivalence: Lexical equivalence is identity after normalization. An identifier in the hash URN namespace is normalized by converting all characters to lower case. Conformance with URN Syntax: There are no additional characters reserved. Validation mechanism: Each identifier in the namespace MUST conform with the syntax specified above. Scope: The namespace is global and public. 4. IANA Considerations This document includes a URN namespace registration that is to be entered into the IANA registry for URN NIDs. 5. Namespace Considerations Many URN namespaces are assigned to organizations and rely on a centralized registry to achieve uniqueness and persistency. In contrast, the hash namespace is not tied to any organization. Assignment of identifiers can be performed and verified individually, while uniqueness is still preserved (with a probability close to 1). The hard coding of the hashing schemes into the namespace definition is intentional. This is because a valid identifier should be able to act as a proxy for the named resource. That way, metainformation of descriptive or authoritative nature (such as endorsements, signatures, etc.) can be attached to the identifier and need not be bundled with the actual resource. Such proxy functionality is only guaranteed as long as the underlying hashing scheme is not compromised, that is, as long as no collisions are found. 
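The assignment algorithm described above can be sketched as follows. This is an illustrative sketch, not a normative implementation: hashlib and base64.b32encode stand in for the bit-by-bit conversion spelled out in the text, the function name is made up, and the lower case follows the draft's normalization rule (the draft's base32 alphabet is the lower-case form of the RFC 3548 alphabet):

```python
import base64
import hashlib

def assign_urn(resource, media_type="", scheme="sha1"):
    """Hash the resource, encode the digest, and concatenate the URN fields."""
    digest = hashlib.new(scheme, resource).digest()
    if scheme == "md5":
        value = digest.hex()  # base16, 32 HEXDIG for the 128-bit MD5 digest
    else:
        # base32 per RFC 3548, lower-cased per the normalization rule;
        # b32encode pads incomplete 8-character groups with '='
        value = base64.b32encode(digest).decode("ascii").lower()
    return "urn:hash:%s:%s:%s" % (media_type, scheme, value)

print(assign_urn(b"hello world", "text/plain", "sha1"))
```

For "sha1" the value comes out as exactly 32 base32 digits (160 bits, no padding needed); for "sha256" it is 56 characters including the '=' padding, matching the table in the syntax declaration.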
The encoding of the hash value is also hard coded into the definition. We have chosen not to make the encoding an additional parameter of the URN scheme for two reasons: 1. it would make identifier normalization non-trivial; 2. each hashing scheme has a standard encoding, which should be reflected in the identifier. One problem is the phasing out of compromised hash schemes. For instance, many believe that MD5 is "not sufficiently secure" on the grounds that it only provides 128 bit hashes and that colliding inputs have been constructed. However, the only known approach for solving the second preimage problem, which appears to be more relevant for the application as an identifier, is brute force search through on the order of 2^128 inputs. If a procedure for computing a second preimage in significantly fewer operations is ever published, then resolvers SHOULD refuse to resolve the compromised hash scheme. This is in line with the semantics of URNs, which need to identify a resource uniquely but the resource need not be available forever (cf. the discussion in BCP 66 [RFC3406]). 6. Community Considerations Similar URNs are in use in peer-to-peer file transfer systems. Most of them do not include a mediatype, although this practice can provide extra guarantees. For example, a provider of metainformation can state that the mediatype of the resource has been verified by including the mediatype in the published URN. For many formats, the mediatype provides an additional self-verifiable attribute. Some URI schemes in common use may be easily derived from the hash scheme. 1. The sha1 scheme urn:sha1:<hash-value> is equivalent to urn:hash::sha1:<hash-value> and even to urn:hash:::<hash-value> 2. 
Another proposed scheme is based on the data URL urn:data-hash:text/plain;sha1,<hash-value> which is equivalent to urn:hash:text/plain:sha1:<hash-value> In this case, the identifier from the hash namespace has a simpler, more regular structure. 7. Security Considerations The use of the namespace per se does not have security implications. However, it should be kept in mind that the uniqueness guarantee given by cryptographic hashes is only probabilistic and that no known procedure (save bitwise comparison) can provide a 100% guarantee of the identity of the hashed resource. Normative References [FIPS180-2] National Institute of Standards and Technology, "Specifications for the SECURE HASH STANDARD", August 2002. http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf [RFC1321] Rivest, R. L., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992. [RFC2046] Freed, N., and Borenstein, N., "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996. [RFC2119] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997. [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. [RFC2234] Crocker, D., Editor, and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997. [RFC3174] Eastlake, D., and Jones, P., "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, September 2001. [RFC3548] Josefsson, S. (Ed.), "The Base16, Base32, and Base64 Data Encodings", RFC 3548, July 2003. Informational References [HAC] Menezes, Alfred J., van Oorschot, Paul C., and Vanstone, Scott A., Handbook of Applied Cryptography, CRC Press, 5th printing, August 2001. [IANA-MT] IANA Registry of Media Types: ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/ [RFC1630] Berners-Lee, T., "Universal Resource Identifiers in WWW," RFC 1630, June 1994. 
[RFC3406] Daigle, L., van Gulik, D.W., Iannella, R., and Faltstrom, P., "Uniform Resource Names (URN) Namespace Definition Mechanisms", RFC 3406, October 2002. Contributors Stephanie Kollenz Matthias Neubauer Author's Address Peter Thiemann Institut fuer Informatik Universitaet Freiburg Georges-Koehler-Allee 079 D-79110 Freiburg Germany Phone: +49 761 203 8051 EMail: thiemann@acm.org URL: http://www.informatik.uni-freiburg.de/~thiemann Intellectual Property Statement The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice Thiemann [Page 9] Internet-Draft URNs Based on Cryptographic Hashes 4 September 2003 this standard. Please address the information to the IETF Executive Director. Full Copyright Statement Copyright (C) The Internet Society (2003). All Rights Reserved. 
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgement Funding for the RFC Editor function is currently provided by the Internet Society. Thiemann [Page 10] From baford at mit.edu Fri Sep 5 01:43:33 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt Message-ID: <200309042143.33291.baford@mit.edu> Hi Peter, I have a couple comments. 
First, while I have no problems with the "full" syntax your draft specifies for hash-based URNs, pragmatically I believe that, whether any standard "officially" allows it or not, people are going to use shorthands like: md5:5307d294b6ccd9854f2deed8c1628b72 to mean: urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72 and: sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL to mean: urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL Personally I think it would be best if the specification anticipates and embraces this inevitable shorthand usage rather than either ignoring or attempting to forbid it. In effect, I think this specification should specify a new URN scheme called "hash:", with the syntax defined already, AND define the five "shorthand" URN scheme names "md5:", "sha1:", "sha256:", "sha384:", and "sha512:". Especially if the standard doesn't allow new hash methods to be added willy-nilly (a philosophy I agree with), there won't be many, so this shorthand usage isn't going to create any problem with pollution of the URN scheme namespace. The shorthand schemes don't support a <media-type> specification, of course, but that's OK since the <media-type> is optional in the longhand syntax anyway. Secondly, your draft as written seems to imply that a hash-identifier URN can contain _nothing_ but the media type, hash scheme, and encoded hash value. This implication neglects important potential (and likely) applications of this scheme in which the hash ID is used as a _starting_point_ from which to find something else via a more conventional naming strategy. 
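The shorthand-to-longhand rewrite proposed in the first point could be sketched like this (illustrative only; the function name is hypothetical):

```python
SHORTHAND_SCHEMES = ("md5", "sha1", "sha256", "sha384", "sha512")

def expand_shorthand(uri):
    """Rewrite e.g. "sha1:LBPI..." into the longhand "urn:hash::sha1:LBPI..."."""
    scheme, sep, rest = uri.partition(":")
    if sep and scheme in SHORTHAND_SCHEMES:
        # Shorthand carries no <media-type>, so the longhand field is left empty.
        return "urn:hash::%s:%s" % (scheme, rest)
    return uri  # already longhand, or not one of the five shorthand schemes

print(expand_shorthand("md5:5307d294b6ccd9854f2deed8c1628b72"))
# urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72
```

Because the five scheme names are fixed, the rewrite is a pure prefix substitution and never needs to inspect the hash value itself.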
For example, the hash-value in a particular URN might be the hash of the root node of a directory metadata tree representing a complete read-only (e.g., SFSRO[1]) file system, or the public key of a read-write (e.g., SFS[2]) file system, and the user might want to use the "rest" of the URN after the hash-value to name a particular file in that file system, like so:

    urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL/foo/bar/blah

or search a hash-identified database using some kind of "query" syntax:

    urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL?find=this

Of course, the meaning of whatever comes after the hash-value, if any, can only in general be determined with respect to whatever object the hash-value specifies, so your specification cannot and should not really specify anything about the precise format of this "remainder" part of the URN. A reasonable restriction, however, would be that the "remainder" part (if present) start with a non-alphanumeric character, to avoid any confusion about where the hash-value ends. (All characters in the remainder portion must also be legal URN-characters of course.)

Thanks,
Bryan

[1] http://www.pdos.lcs.mit.edu/papers/sfsro.html
[2] http://www.fs.net/sfswww/

From hopper at omnifarious.org Fri Sep 5 04:01:02 2003
From: hopper at omnifarious.org (Eric M. Hopper)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <200309042143.33291.baford@mit.edu>
References: <200309042143.33291.baford@mit.edu>
Message-ID: <1062734461.7180.25.camel@monster.omnifarious.org>

On Thu, 2003-09-04 at 20:43, Bryan Ford wrote:
> Of course, the meaning of whatever comes after the hash-value, if any, can
> only in general be determined with respect to whatever object the hash-value
> specifies, so your specification cannot and should not really specify
> anything about the precise format of this "remainder" part of the URN.
> A reasonable restriction, however, would be that the "remainder" part (if
> present) start with a non-alphanumeric character, to avoid any confusion
> about where the hash-value ends. (All characters in the remainder portion
> must also be legal URN-characters of course.)

I'm not so sure this is the right way to do things. A urn is supposed to uniquely identify some unchanging item. It can uniquely identify the destination of a message, for example. But a URL is for defining the class of messages something will accept. In

    http://host/path

the host part could be a urn identifying the host. But the http part is what specifies how you're speaking to that host, and I think that part is necessary for any scheme in which a conversation with something is implied.

The hash urn's initial envisioned use is for fetching static globs of data. But the urn itself implies no fetching method; it just uniquely (probabilistically, though that has a less than minuscule chance of ever mattering) identifies a particular glob of data that can be fetched using a variety of different protocols. I can see it being used to identify hosts, but as soon as you start talking about what protocol you're going to be speaking with the host, a URL is indicated.

Have fun (if at all possible),
--
There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of
other experience and system admin skills who needs work. Namely, me.
http://www.omnifarious.org/~hopper/resume.html
-- Eric Hopper
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030904/1974b32a/attachment.pgp From b.fallenstein at gmx.de Fri Sep 5 09:38:10 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: <200309042143.33291.baford@mit.edu> References: <200309042143.33291.baford@mit.edu> Message-ID: <3F585982.4070703@gmx.de> Hi, Bryan Ford wrote: > Secondly, your draft as written seems to imply that a hash-identifier URN can > contain _nothing_ but the media type, hash scheme, and encoded hash value. > This implication neglects important potential (and likely) applications of > this scheme in which the hash ID is used as a _starting_point_ from which to > find something else via a more conventional naming strategy. For example, ... > urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL/foo/bar/blah ... > Of course, the meaning of whatever comes after the hash-value, if any, can > only in general be determined with respect to whatever object the hash-value > specifies, so your specification cannot and should not really specify > anything about the precise format of this "remainder" part of the URN. So what *would* specify it? The point of having a central registry for URI schemes, URN namespaces and so on is that you can go to that registry to find out which specifications apply to a given URI. Who specifies what format the remainder has and how it is to be interpreted? -b From thiemann at informatik.uni-freiburg.de Fri Sep 5 12:02:07 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: <200309042143.33291.baford@mit.edu> References: <200309042143.33291.baford@mit.edu> Message-ID: >>>>> "bf" == Bryan Ford writes: bf> Hi Peter, I have a couple comments. 
bf> First, while I have no problems with the "full" syntax your draft specifies
bf> for hash-based URNs, pragmatically I believe that, whether any standard
bf> "officially" allows it or not, people are going to use shorthands like:
bf> md5:5307d294b6ccd9854f2deed8c1628b72
bf> to mean:
bf> urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72
bf> and:
bf> sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL
bf> to mean:
bf> urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL
bf> Personally I think it would be best if the specification anticipates and
bf> embraces this inevitable shorthand usage rather than either ignoring or
bf> attempting to forbid it. In effect, I think this specification should
bf> specify a new URN scheme called "hash:", with the syntax defined already, AND
bf> define the five "shorthand" URN scheme names "md5:", "sha1:", "sha256:",
bf> "sha384:", and "sha512:". Especially if the standard doesn't allow new hash
bf> methods to be added willy-nilly (a philosophy I agree with), there won't be
bf> many, so this shorthand usage isn't going to create any problem with
bf> pollution of the URN scheme namespace. The shorthand schemes don't support a
bf> <mtype> specification, of course, but that's OK since the <mtype>
bf> is optional in the longhand syntax anyway.

What you are suggesting makes sense to me. However, formally your proposed shorthands are no longer URNs, but general URIs. I suppose you'd have to write an RFC to make them official, and a logical approach would be to first get the URN namespace and then submit an RFC that defines these shorthand URIs in terms of the URN namespace.

bf> Secondly, your draft as written seems to imply that a hash-identifier URN can
bf> contain _nothing_ but the media type, hash scheme, and encoded hash value.
bf> This implication neglects important potential (and likely) applications of
bf> this scheme in which the hash ID is used as a _starting_point_ from which to
bf> find something else via a more conventional naming strategy.
bf> For example,
bf> the hash-value in a particular URN might be the hash of the root node of a
bf> directory metadata tree representing a complete read-only (e.g., SFSRO[1])
bf> file system, or the public key of a read-write (e.g., SFS[2]) file system,
bf> and the user might want to use the "rest" of the URN after the hash-value to
bf> name a particular file in that file system, like so:
bf> urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL/foo/bar/blah
bf> or search a hash-identified database using some kind of "query" syntax:
bf> urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL?find=this
bf> Of course, the meaning of whatever comes after the hash-value, if any, can
bf> only in general be determined with respect to whatever object the hash-value
bf> specifies, so your specification cannot and should not really specify
bf> anything about the precise format of this "remainder" part of the URN. A
bf> reasonable restriction, however, would be that the "remainder" part (if
bf> present) start with a non-alphanumeric character, to avoid any confusion
bf> about where the hash-value ends. (All characters in the remainder portion
bf> must also be legal URN-characters of course.)

People are sceptical about adding application-specific parts to URN identifiers. In fact, an earlier revision of the proposal (see http://www.cbuid.org/) allowed such extensions based on the content type of the resource, and it defined one such extension for message/rfc822. But there were other issues that made the proposal baroque.

For your example, this approach would require defining content types like application/file-system-root and application/database, and then defining a syntax for the suffix. Is there more support for providing such an extension? It seems rather heavyweight to me, and I'm not sure if it's worthwhile for the specific examples given. In both variants of the SFS file system, any resource is reachable from an SFS-enabled computer using a URL like

    file:///sfs/...

Or am I missing something?
-Peter bf> Thanks, bf> Bryan bf> [1] http://www.pdos.lcs.mit.edu/papers/sfsro.html bf> [2] http://www.fs.net/sfswww/ From baford at mit.edu Fri Sep 5 15:12:24 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: References: <200309042143.33291.baford@mit.edu> Message-ID: <200309051112.24936.baford@mit.edu> Peter Thiemann wrote: > What you are suggesting makes sense to me. However, formally > your proposed shorthands are no longer URNs, but general URIs. > I suppose you'd have to write an RFC to make them official and a > logical approach would be to first get the URN namespace and then > submit an RFC that defines these shorthand URIs in terms of the URN > namespace. Yes, after reviewing RFC2141 I can see what you mean - I agree with your approach. (Although I think it's a bit unfortunate that RFC2141 formally treats the URN NID-space as separate and independent of the URI scheme namespace and formally requires all URNs to have a URI scheme of "urn:" - because in practice as soon as URNs become commonplace nobody's ever going to bother typing the "urn:" part anymore, and we'll just have to treat URN NIDs as a subset of the URI schemes anyway, all existing in the same "de facto" namespace.) > People are sceptical about adding application specific parts to URN > identifiers. In fact, an earlier revision of the proposal (see > http://www.cbuid.org/) allowed such extensions based on the content > type of the resource and it defined one such extension for > message/rfc822. But there were other issues that made the proposal > baroque. > > For your example, this approach would require to define content types > like application/file-system-root and application/database then define > a syntax for the suffix. Sure - is there anything wrong with that? 
If we're talking about SFSRO file systems, for example, since SFSRO isn't a formally standardized format, I might just invent the content-type "application/x-sfsro-root" and then informally define the interpretation of the "remainder" part of a URN of type "application/x-sfsro-root" to be a path name resolved from the specified SFSRO root. If SFSRO or something like it ever _were_ formally standardized, then we might indeed end up with a content-type of "application/file-system-root" or something like that, in which case we would also have to (formally) standardize the interpretation of the "remainder" part of a URN for that content-type. To be clear, I'm not proposing that your hash-URN specification require ALL content-type standards also to specify an interpretation for the remainder portion of a hash-URN of that content-type. That would be impossible since there are so many content-types defined already that are oblivious to URNs. All you need to do is specify that IF a hash-URN contains a "remainder" portion after the hash-value, then it is interpreted in a fashion specific to the indicated object's content-type. If you want to be really formally picky, you could even specify explicitly that a hash-URN for a given object MUST NOT contain a "remainder" string UNLESS the standard defining the object's content-type specifies an interpretation for such a string - but personally I don't think it's necessary to go this far. > It seems rather heavyweight to me What is "heavyweight" about simply allowing hash-URNs to contain application-specific supplementary information? It seems to me the ultimate "lightweight" mechanism - you don't have to specify anything except the fact that such supplementary information may be syntactically appended to the end of the string. > and I'm not sure if it's worthwhile > for the specific examples given. 
In both variants of the SFS file > system, any resource is reachable from an SFS enabled computer using a > URL like > > file:///sfs/... > > Or am I missing something? You're missing the fact that SFS and similar systems may have valid and legitimate reasons for using URN-format naming of this kind. The whole point of defining a single standardized URI/URL/URN namespace scheme is _convergence of namespaces_ - the goal of making resource names as uniform and interchangeable as possible. If you deny a whole class of applications entrance to the hash-URN scheme because of this trivial and unnecessary limitation, you're reducing the potential convergence of resource namespaces the scheme can generate. To make my argument more concrete, consider your proposal simply to use the existing "file://" URI syntax to express an SFS pathname, such as: file:///sfs/@sfs.fs.net,uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah Now the "file:" URI scheme is explicitly defined by RFC 1630 to be a scheme for indicating _local_ file system access, so any agent that tries to interpret the above URI must assume that it won't necessarily be valid from one system to the next. Indeed, there is no global standard saying that the SFS file system has to be mounted as "/sfs" on any given machine; on _my_ machine the appropriate pathname might instead be: file:///my-sfs/@sfs.fs.net,uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah But SFS by design represents a truly global file system namespace, in which location-independent, self-certifying hash-identities play a fundamental role - so why shouldn't SFS be allowed to take part in the convergence of the URN hash-namespaces? It can with your current specification, but only in a trivial and limited way. 
With your hash-URN specification as it stands, I could write a location-independent URN to name the root directory of a particular SFS server (named by the hash of its public key), like so: urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an ...but I can't write a URN that names anything else on that server! Why not, by the trivial extension I'm proposing, allow hash-URNs that can name not only the hashed object itself, but objects discovered and named _relative to_ the hashed object? Like so: urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah To take this argument further, consider that SFS file systems can contain symbolic links, and those symbolic links can currently refer to other objects on the same or a different SFS file system, like so: foo -> blah (relative symbolic link to same SFS file system) bar -> /sfs/@metoo.net,uzwadtctbjb3dg596waiyru8cx5kb4an/blah (absolute link to object on another SFS file system) ...but SFS symbolic links currently can't refer to objects on anything _but_ the local file system or another SFS file system. Suppose we extended SFS to allow symlinks to point to arbitrary URIs rather than just conventional pathnames, and added a "plug-in" mechanism of some kind by which other URI schemes could be interpreted and resolved by the SFS daemon. Then I could place symlinks such as the following into an SFS file system, and access them as if they were ordinary files or directories: redhat -> ftp://ftp.redhat.com/pub/redhat draft-thiemann-hash-urn-01.txt -> urn::md5:d07a37a1e199acb410b12fb07ffb279b bar -> urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar msgpart -> urn:message/rfc822:md5:da1dbd8b93153e8adfaf4bb0220f0293/part1 (...the last example being a reference to a MIME-encapsulated portion of a multipart E-mail message, named by the complete message's hash and the MIME Content-ID of the desired part.) 
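A resolver that accepts the URN forms traded in this thread needs to split them apart first. Here is a sketch in Python (the function, its name, and its return shape are my own illustration, not anything from the draft), applying Bryan's earlier rule that the remainder begins at the first non-alphanumeric character after the hash value:

```python
import re

# Hypothetical parser for the URN forms discussed in this thread:
#   urn:[hash:][<mtype>]:<alg>:<hash>[<remainder>]
# The remainder, if present, begins at the first non-alphanumeric
# character after the hash value, per Bryan's suggested restriction.
_URN_RE = re.compile(
    r"^urn:(?:hash:)?"              # optional "hash" NID from the draft syntax
    r"(?P<mtype>[^:]*/[^:]*)?:"     # optional media type, e.g. application/x-sfs
    r"(?P<alg>md5|sha1|sha256|sha384|sha512):"
    r"(?P<hash>[A-Za-z0-9]+)"       # hex or base32 digest
    r"(?P<rest>[^A-Za-z0-9].*)?$"   # remainder starts with a non-alphanumeric
)

def parse_hash_urn(urn):
    """Split a hash-URN into (media_type, algorithm, hash, remainder)."""
    m = _URN_RE.match(urn)
    if m is None:
        return None
    return (m.group("mtype"), m.group("alg"), m.group("hash"),
            m.group("rest") or "")
```

Both the draft's longhand (urn:hash::sha1:...) and the mtype-bearing forms (urn:application/x-sfs:sha1:.../foo/bar/blah) come apart this way; how the remainder is then interpreted is, as discussed, a per-content-type question.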
I realize that such an extension would create additional practical issues to address specifically in the context of SFS and Unix file systems (e.g., what you see when you "cd" to the "redhat" link above and then type "pwd"), but such issues are orthogonal to this discussion. The point is that the hash-URN scheme you are proposing should maximize opportunities for namespace convergence, and if you limit the specification so that only the hashed object _itself_ can be named, and not other objects relative to the hashed object, then you're severely limiting the usefulness of this URN scheme for no good reason. No application or content-type is forced to take advantage of this flexibility, but the flexibility should be there.

Thanks,
Bryan

From hopper at omnifarious.org Fri Sep 5 17:12:12 2003
From: hopper at omnifarious.org (Eric M. Hopper)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <200309051112.24936.baford@mit.edu>
References: <200309042143.33291.baford@mit.edu> <200309051112.24936.baford@mit.edu>
Message-ID: <1062781931.7180.36.camel@monster.omnifarious.org>

On Fri, 2003-09-05 at 10:12, Bryan Ford wrote:
> urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah

Why is sfs://uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah not even mentioned? How is the scheme you propose better than that one? It's not like programs are going to suddenly start supporting sfs magically because you're using the urn namespace. In either case, the application will specifically have to have sfs support.

To me, using the urn scheme is very wrong. You aren't naming something. You're giving a method of retrieving something. The something that you retrieve in that way might be different from day to day. A urn, once created, should forever refer to exactly the same thing. I want different namespaces for naming things that stay the same forever and for things that you have a conversation with.
A urn should refer to something that is forever exactly what the urn refers to and is never anything else. Maybe I'm wrong, and that's what uri: is for. If that's the case, then the whole hash name thing should be a uri and not a urn. Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030905/c0f82780/attachment.pgp From baford at mit.edu Fri Sep 5 18:41:16 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt Message-ID: <200309051441.17233.baford@mit.edu> Eric Hopper wrote: >The hash urn's initial envisioned use is for fetching static globs of >data, But, the urn itself implies no fetching method, it just uniquely >(probabalistically, though that has a less then miniscule chance of ever >mattering) identifies a particular glob of data that can be fetched >using a variety of different protocols. I can see it being used to >identify hosts, but as soon as you start talking about what protocol >you're going to be speaking with the host, a URL is indicated. Allowing an application-specific name component at the end of a hash-based URN in no way implies a specification of how the named object (or the named portion of it) is to be accessed. That's what the URI scheme name is for (e.g., "http:"), and I'm not suggesting that we embed a scheme name anywhere in any kind of URN. 
All I'm saying is that, once we've come up with a location- and protocol-independent name for a big "static glob of data" as you put it, sometimes we want to be able to name specific _portions_ of that static glob of data, or other objects that are directly and closely related to it. >>urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah > >Why is sfs://uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah not even >mentioned? How is the scheme you propose better than that one? Neither is "better" than the other; it's just that the latter names a specific protocol/method of getting to an object (the SFS protocol) whereas the former is a location- and protocol-independent name for the object itself. You seem to be convinced that I'm trying to "sneak" a way of specifying protocols or methods into URN syntax. I'm not!!! The hypothetical "application/x-sfs" content-type that appears in the URN I proposed is NOT naming a protocol, but simply a data format, like "text/html" or "image/jpeg". To avoid confusion maybe I should have called it "application/x-sfs-root" or something like that. But in any case with "urn:application/x-sfs-root:..." syntax we're talking about a static glob of data - exactly what URNs are supposed to do - and not specifying anything about how to find or get to it. The fact that the SFS file system as it currently is happens to have an access protocol defined for it too is incidental. >It's >not like programs are going to suddenly start supporting sfs magically >because you're using the urn namespace. In either case, the application >will specifically have to have sfs support. It's not like programs are suddenly going to start supporting any other kind of URN either, hash-based or otherwise. You have to add support for them before you'll be able to use them for anything. In any case, the whole SFS thing is just an example. Maybe that example was too confusing, so here's another, perhaps simpler and more accessible example. 
Say we have a particular HTML document, which we hash and give the name:

    urn:text/html:md5:95836eee95d7b33f1d08c36e8f99d876

which at some point happens to be accessible via HTTP at this location:

    http://current.location.com/foo/bar/blah.html

But this HTML document has anchor tags in it, which with a conventional URL we can name like this:

    http://current.location.com/foo/bar/blah.html#section1

Since conventional URL syntax already allows us to name a particular portion of an object, and the object can be named via a location- and protocol-independent URN, why shouldn't we also be able to name the same portion of the same object using a location- and protocol-independent URN? Like this, for example:

    urn:text/html:md5:95836eee95d7b33f1d08c36e8f99d876#section1

Adding on the "#section1" at the end in no way implies that we have to use HTTP or any other specific protocol to obtain the referenced object; it just means that once we find the static bit stream glob named by the hash-value, the part of the object we're _really_ interested in is the part labeled "section1" in the document itself. Since the document named by the <hash-value> can never change (assuming the cryptographic security properties blah blah blah), the portion of the document named by the "#section1" can never change either.

In summary, I'm not proposing to change in any way the basic semantic meaning of URNs as location- and protocol-independent names. I'm just saying we need to allow the flexibility to name portions of a hashed data object and not just "the whole hashed data object" itself.
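One consequence of the above: because the URN-to-content mapping is immutable, a client can retrieve the named document over any protocol whatsoever and verify the bytes before resolving "#section1" against them. A minimal sketch (hex digests only; the base32-encoded sha1 values in this thread would need decoding first, and the helper name is mine, not the draft's):

```python
import hashlib

def matches_hash_urn(data, alg, expected_hex):
    """Return True if `data` is the object a hash-URN names.

    `alg` is one of the draft's algorithms ("md5", "sha1", "sha256",
    "sha384", "sha512").  The check is identical no matter which
    protocol, mirror, or cache supplied the bytes.
    """
    return hashlib.new(alg, data).hexdigest() == expected_hex.lower()
```

Only once this check succeeds would the client interpret the fragment against the verified copy.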
Bryan

From baford at mit.edu Fri Sep 5 19:01:38 2003
From: baford at mit.edu (Bryan Ford)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <20030905171217.BFAD03FD36@capsicum.zgp.org>
References: <20030905171217.BFAD03FD36@capsicum.zgp.org>
Message-ID: <200309051501.38229.baford@mit.edu>

Benja Fallenstein wrote:
> Bryan Ford wrote:
> > Secondly, your draft as written seems to imply that a hash-identifier URN
> > can contain _nothing_ but the media type, hash scheme, and encoded hash
> > value. This implication neglects important potential (and likely)
> > applications of this scheme in which the hash ID is used as a
> > _starting_point_ from which to find something else via a more
> > conventional naming strategy. For example,
> > ...
> > > urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL/foo/bar/blah
> > ...
> > > Of course, the meaning of whatever comes after the hash-value, if any,
> > can only in general be determined with respect to whatever object the
> > hash-value specifies, so your specification cannot and should not really
> > specify anything about the precise format of this "remainder" part of the
> > URN.
>
> So what *would* specify it? The point of having a central registry for
> URI schemes, URN namespaces and so on is that you can go to that
> registry to find out which specifications apply to a given URI. Who
> specifies what format the remainder has and how it is to be interpreted?

For clarity let's call the proposed "remainder" portion of a hash-based URN a "relative name". The relative name simply identifies a logical "portion" or a "subcomponent" of the whole object indicated by the <hash-value>. Since the whole object can only be meaningfully interpreted in terms of its content-type, whoever specifies a particular content-type is responsible for defining the interpretation of relative names for objects of that type.
Not all content-types need to define such interpretations: if the content-type specification doesn't define an interpretation for relative names, then relative names are undefined and should not be used in URNs for objects of that content-type. It might make sense for Peter's hash-based URN specification to define specific interpretations of relative names for a few existing well-established content-types, such as "text/html" and "message/rfc822". But in general specifying the interpretation of relative names for a given content-type should be left to whoever specifies the content-type, or to follow-on RFCs related to that content-type.

Cheers,
Bryan

From b.fallenstein at gmx.de Fri Sep 5 19:56:40 2003
From: b.fallenstein at gmx.de (Benja Fallenstein)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <200309051501.38229.baford@mit.edu>
References: <20030905171217.BFAD03FD36@capsicum.zgp.org> <200309051501.38229.baford@mit.edu>
Message-ID: <3F58EA78.9040608@gmx.de>

Bryan Ford wrote:
> Benja Fallenstein wrote:
>>So what *would* specify it? The point of having a central registry for
>>URI schemes, URN namespaces and so on is that you can go to that
>>registry to find out which specifications apply to a given URI. Who
>>specifies what format the remainder has and how it is to be interpreted?
>
>
> For clarity let's call the proposed "remainder" portion of a hash-based URN a
> "relative name". The relative name simply identifies a logical "portion" or
> a "subcomponent" of the whole object indicated by the <hash-value>. Since
> the whole object can only be meaningfully interpreted in terms of its
> content-type, whoever specifies a particular content-type is responsible for
> defining the interpretation of relative names for objects of that type.

Hm. Ok, based on content type-- I understand better now.

But hey, why not use fragment identifiers? They seem to be *exactly* what you're looking for? And they work on *all* URI schemes.
(Note that fragment identifiers are, according to current interpretations, somewhat misnamed: They need not identify a "fragment" of the data behind a URI, but can identify anything, as specified by the content type. E.g. RDF specifies that fragids can identify absolutely anything-- including cars, people and so on.)

So you could define application/file-system-root, and then use, e.g.,

    urn:hash:application/file-system-root:sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL#/foo/bar/blah

Then you could also make that data available through, for example, HTTP:

    http://example.org/myfiles#/foo/bar/blah

What do you think, does this address what you want?

- Benja

From hopper at omnifarious.org Fri Sep 5 22:12:42 2003
From: hopper at omnifarious.org (Eric M. Hopper)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <200309051441.17233.baford@mit.edu>
References: <200309051441.17233.baford@mit.edu>
Message-ID: <1062799962.7180.52.camel@monster.omnifarious.org>

On Fri, 2003-09-05 at 13:41, Bryan Ford wrote:
> Since conventional URL syntax already allows us to name a particular portion
> of an object, and the object can be named via a location- and
> protocol-independent URN, why shouldn't we also be able to name the same
> portion of the same object using a location- and protocol-independent URN?
> Like this, for example:
>
> urn:text/html:md5:95836eee95d7b33f1d08c36e8f99d876#section1

This makes it much clearer what you're talking about. You are looking for a way to specify a particular SFS root among the many that might be available from a particular SFS id. And that root, logically at least, is always the exact same entity, much like a hostname. This view of things makes sense, though I must say that using the fragment identifier syntax instead makes it clearer what you're intending. I'm still not sure if I like this.
I don't really like the concept of a content-type in the urn at all, as that seems to be information about the data rather than a means of identifying the data. But that's a minor problem compared to the mixing of purposes I thought you were talking about originally. But if there is no content type, then trying to make sense of the fragment identifier is hopeless.

As another thought... In some sense, a fragment identifier is still specifying a fetching method, since it refers to some piece of information to be extracted from the referenced entity. Though it does refer to the same piece of information all the time, since the entity itself is a static glob of data. Hmm...

--
There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of
other experience and system admin skills who needs work. Namely, me.
http://www.omnifarious.org/~hopper/resume.html
-- Eric Hopper
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 185 bytes
Desc: This is a digitally signed message part
Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030905/7b40baa3/attachment.pgp

From baford at mit.edu Fri Sep 5 23:39:59 2003
From: baford at mit.edu (Bryan Ford)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <3F58EA78.9040608@gmx.de>
References: <20030905171217.BFAD03FD36@capsicum.zgp.org> <200309051501.38229.baford@mit.edu> <3F58EA78.9040608@gmx.de>
Message-ID: <200309051939.59533.baford@mit.edu>

Benja Fallenstein wrote:
> Bryan Ford wrote:
> > Benja Fallenstein wrote:
> >>So what *would* specify it? The point of having a central registry for
> >>URI schemes, URN namespaces and so on is that you can go to that
> >>registry to find out which specifications apply to a given URI. Who
> >>specifies what format the remainder has and how it is to be interpreted?
> >
> > For clarity let's call the proposed "remainder" portion of a hash-based
> > URN a "relative name". The relative name simply identifies a logical
> > "portion" or a "subcomponent" of the whole object indicated by the
> > <hash-value>. Since the whole object can only be meaningfully
> > interpreted in terms of its content-type, whoever specifies a particular
> > content-type is responsible for defining the interpretation of relative
> > names for objects of that type.
>
> Hm. Ok, based on content type-- I understand better now.
>
> But hey, why not use fragment identifiers? They seem to be *exactly*
> what you're looking for? And they work on *all* URI schemes.

After reviewing RFC2396, I see what you mean. Since fragment identifiers are formally not part of a URI but merely part of a URI-reference, they can be attached to any URI, and their interpretation depends on the content-type of the object referenced by the URI - so far so good. But on further analysis I see some serious problems with this approach.

> (Note that fragment identifiers are, according to current
> interpretations, somewhat misnamed: They need not identify a "fragment"
> of the data behind a URI, but can identify anything, as specified by the
> content type. E.g. RDF specifies that fragids can identify absolutely
> anything-- including cars, people and so on.)

True, but one (minor) problem is that they are at least _syntactically_ constrained - in particular, since they can only contain "URI characters" (*uric), and '#' isn't a URI character, the fragment identifier can't contain another '#', and so fragments aren't conveniently hierarchically composable.

Adopting your syntax, suppose we want to name a fragment of an HTML file in an SFSRO-like file system of some kind.
The following natural syntax for a URI-reference is technically invalid because of the second '#': urn:hash:application/file-system-root:sha1:LBP...22QL#/foo/bar.html#sec1 Admittedly we could just require that the second '#' be escaped (which is why I call this a "minor" problem), but then if we ever get into a situation involving further compositions, we'd have to escape the third '#' twice, the fourth '#' three times, etc., which gets seriously obtuse. The more serious problem arises when we consider the interactions with the relative URI resolution procedure as specified in section 5 of RFC2396. Suppose the file "bar.html" referenced above contains a hyperlink using the relative reference "blah.html". What we _want_ to end up with is: urn:hash:application/file-system-root:sha1:LBP...22QL#/foo/blah.html ...but we won't, because the fragment part in the former URI-reference is not even considered part of the "base" URI, so the "/foo/" path gets incorrectly chopped off before we ever have a chance to append the "blah.html". It's not even entirely clear to me what exactly happens according to the formal definition of the procedure, but whatever it is, it's not what we want. It might be argued that URNs aren't supposed to have hierarchical path components at all, because hierarchical relationships imply location. (RFC2141 seems to imply this philosophy, since it defines the '/' character as "reserved" and makes no mention of if or how hierarchy is to be allowed for in the namespace-specific string portion of a URN.) But I don't think that abstract hierarchical relationships necessarily imply anything about _physical_ location or access method, and physical locations and access methods are the things we're trying to get away from with URNs. Take the case of a read-only, forever-immutable SFSRO file system-like tree structure. 
Once you've identified the root of the tree with a fixed hash value, whose target will never change, you know that nothing else in the tree that you may find starting from that hash value will ever change either, however you may choose to walk through the tree and find data blobs representing the various directories and files. Being able to name objects within the tree in a hierarchical fashion and use relative identifiers within it remains incredibly useful and compelling. Hierarchical naming of this form does not break the URN model, because the hierarchical "locations" that these relative references refer to are completely abstract locations that are guaranteed never to change and have nothing to do with physical location or access method. I really think this problem of hierarchy needs to be addressed somehow, although the best solution is not obvious to me given the apparent syntactic incompatibility between the definition of the "urn:" scheme and the hierarchical URI syntax. If this problem isn't addressed, I fear that hash-URNs will only ever end up being used to name huge "packaged" data objects like downloadable zipfiles, tarballs, install images, and so on, because using them to refer to any more fine-grained data objects will just be too inconvenient to bother with. For example, suppose you want to publish a whole web site, containing a bunch of little html files, images, etc., in "hash-URN" fashion... Do you first have to walk through every html file in the tree (and everything else that may contain links) somehow and replace all the relative URIs with absolute URNs, and then assign and publish each individual file as an object, before anything will work? If so, even with automated tools, I suspect few people will bother using hash-URNs this way simply because it requires changing the contents of the files being published. 
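The "chopping" behavior Bryan describes is easy to reproduce: Python's standard urljoin follows the same resolution rules, discarding the base URI's fragment and refusing relative resolution against a urn: base entirely. (A sketch for illustration only; the URLs are made-up placeholders and the URN's hash is shortened to a dummy value.)

```python
from urllib.parse import urljoin

# The fragment is not considered part of the base URI, so the "/foo/"
# hierarchy hidden after '#' is chopped off during relative resolution:
base = "http://example.com/root#/foo/bar.html"
print(urljoin(base, "blah.html"))  # -> http://example.com/blah.html

# Worse, 'urn' is not a relative-capable scheme at all, so resolving
# against a urn: base simply returns the relative reference unchanged:
urn_base = "urn:hash:application/file-system-root:sha1:XXXX#/foo/bar.html"
print(urljoin(urn_base, "blah.html"))  # -> blah.html
```

Neither result is the hoped-for urn:...#/foo/blah.html, which is exactly the interaction with the relative-resolution procedure described above.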
If URNs are designed so that hierarchical relationships can be preserved when trees of objects are published and assigned permanent names, however, then it becomes much easier to move a whole tree of interconnected objects from "URL space" to "URN space" - perhaps without even changing a single file. Cheers, Bryan From hopper at omnifarious.org Sat Sep 6 07:59:42 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: <200309051939.59533.baford@mit.edu> References: <20030905171217.BFAD03FD36@capsicum.zgp.org> <200309051501.38229.baford@mit.edu> <3F58EA78.9040608@gmx.de> <200309051939.59533.baford@mit.edu> Message-ID: <1062835182.6730.39.camel@monster.omnifarious.org> On Fri, 2003-09-05 at 18:39, Bryan Ford wrote: > urn:hash:application/file-system-root:sha1:LBP...22QL#/foo/bar.html#sec1 > > Admittedly we could just require that the second '#' be escaped (which is why > I call this a "minor" problem), but then if we ever get into a situation > involving further compositions, we'd have to escape the third '#' twice, the > fourth '#' three times, etc., which gets seriously obtuse. Does SFS guarantee that a given path will always refer to exactly the same file forever? If so, then it looks like you're trying to use a urn to refer to a thing to fetch instead of using it as a forever valid name for a particular group of bits. The confusion is apparent because there is confusion as to the type of the file. Is it of type application/file-system-root like the urn suggests? Sure seems like you want to think of it as an HTML file. So, why isn't the content type text/html instead of application/file-system-root? I know why it's so attractive to do what you're talking about there, since SFS filesystem names are just the hash of a public key. But, it isn't appropriate to use the hash urn to refer to a file in the repository, only the repository itself. 
If you want to refer to a file in the repository, use an sfs: url. Those are for specifying where to find something. A urn is for specifying the unique name of something, not how to find it. sfs://urn:hash:application/file-system-root:sha1:LBP...22QL/foo/bar.html#sec1 seems like an excellent way to refer to a file in an SFS filesystem. Have fun (if at all possible), -- Eric Hopper From thiemann at informatik.uni-freiburg.de Mon Sep 8 15:05:06 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: <200309051112.24936.baford@mit.edu> References: <200309042143.33291.baford@mit.edu> <200309051112.24936.baford@mit.edu> Message-ID: >>>>> "bf" == Bryan Ford writes: >> People are sceptical about adding application specific parts to URN >> identifiers. In fact, an earlier revision of the proposal (see >> http://www.cbuid.org/) allowed such extensions based on the content >> type of the resource and it defined one such extension for >> message/rfc822. But there were other issues that made the proposal >> baroque. >> >> For your example, this approach would require defining content types >> like application/file-system-root and application/database, then defining >> a syntax for the suffix. bf> Sure - is there anything wrong with that?
If we're talking about SFSRO file bf> systems, for example, since SFSRO isn't a formally standardized format, I bf> might just invent the content-type "application/x-sfsro-root" and then bf> informally define the interpretation of the "remainder" part of a URN of type bf> "application/x-sfsro-root" to be a path name resolved from the specified bf> SFSRO root. If SFSRO or something like it ever _were_ formally standardized, bf> then we might indeed end up with a content-type of bf> "application/file-system-root" or something like that, in which case we would bf> also have to (formally) standardize the interpretation of the "remainder" bf> part of a URN for that content-type. I got a chance to read "Fast and Secure Distributed Read-Only File System" more closely. Interestingly, the authors discuss the issue of why they chose to implement that functionality as a file system. On page 2, they say We chose to build a file system because of the ease with which one can refer to the file namespace in almost any context---from shell scripts to C code to a Web browser's location field. And later in the text they say that to access /sfs through the web, all you need is a web server (or a proxy) with sfs software installed. This is more or less a restatement of my argument for using the file: URL. Another implication to consider is the burden that media-type-specific processing places on the implementor of the URN resolver. Remember that a conforming implementation would have to implement all of this special processing for every registered media-type. The biggest problem, however, is that the addition of any selection mechanism to the URN would defeat the purpose of a self-verifying, content-based addressing scheme because a client that does not receive the entire resource can no longer verify it against its hash. My conclusion from this is not to allow further parts in the identifiers.
An application should first request the entire resource, (be able to) verify it locally against its hash, and then perform further processing. bf> You're missing the fact that SFS and similar systems may have valid and bf> legitimate reasons for using URN-format naming of this kind. The whole point bf> of defining a single standardized URI/URL/URN namespace scheme is bf> _convergence of namespaces_ - the goal of making resource names as uniform bf> and interchangeable as possible. If you deny a whole class of applications bf> entrance to the hash-URN scheme because of this trivial and unnecessary bf> limitation, you're reducing the potential convergence of resource namespaces bf> the scheme can generate. If you have a way of stating these extensions without losing the self-verifying property, then I'll be with you. The problem is that you need someone in the chain that you trust, who sees the entire resource, and can verify the hash value. The whole point of having hash-based identifiers is that you only need to trust yourself. -Peter From baford at mit.edu Wed Sep 10 03:17:42 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: References: <200309042143.33291.baford@mit.edu> <200309051112.24936.baford@mit.edu> Message-ID: <200309092317.42474.baford@mit.edu> On Monday 08 September 2003 11:05 am, Peter Thiemann wrote: > I got a chance to read "Fast and Secure Distributed Read-Only File > System" more closely. Interestingly, the authors discuss the issue why > they chose to implement that functionality as a file system. On page > 2, they say I haven't read it in a while, myself. :) > We chose to build a file system because of the ease with which one > can refer to the file namespace in almost any context---from shell > scripts to C code to a Web browser's location field.
> > And later in the text they say that to access /sfs through the web, > all you need is a web server (or a proxy) with sfs software > installed. This is more or less a restatement of my argument for using > the file: URL. Sure, there's an excellent pragmatic argument to making any new file-system-like storage/data access system mesh well with existing (local) file system namespaces. But doing so doesn't change the fact that "file://" URLs are by definition _local_ namespaces - just because a URL has a "file:///sfs/" prefix doesn't even necessarily mean that it's on an SFS file system, and it certainly doesn't have the location- and access method-transparency properties we want from URNs. For example, if a public web page contains "file:///sfs/..." links to files on an SFS file system, any decent web browser will refuse to let you click through to them, because the browser assumes such links are to private files on your local file system and following such links would be a massive security risk. There are good pragmatic reasons to link SFS and other file systems into local namespaces this way, but semantically that's not where they belong, and short-term pragmatics shouldn't prevent us from looking for better solutions down the road. > Another implication to consider is the burden that each media-type > specific processing places on the implementor of the URN > resolver. Remember that a conforming implementation would have to > implement all of this special processing for every registered media-type. No, the URN resolver doesn't have to do anything new at all, because there's no reason it needs to be or should be a URN resolver's responsibility to interpret the (proposed) relative part of a hash-URN. This task should naturally be the application's responsibility, which is no additional burden since the application must understand the media-type anyway. Here's an example of how I envision it might work: 1. 
The application wants to resolve a URN of the form "urn:hash::md5:123...abc:/foo/bar.html". So the application passes the whole URN to a URN resolver that understands the "hash" NID. 2. The "hash" URN resolver understands the "md5:123...abc" part of the URN and uses that to locate the flat blob of bytes that the hash refers to, whatever it might be. The URN resolver doesn't have to have any idea what this blob of bytes is; it just needs to find it. The URN resolver also doesn't know what the ":/foo/bar.html" on the end of the URN means. When the URN resolver finds a URL for the blob of bytes indicated by the hash, the resolver passes that URL (say "http://me.com/root") back to the application along with the uninterpreted relative portion of the URN, "/foo/bar.html", which the application can use as it sees fit. 3. The application now has one or more URLs for the hashed blob of bytes, and the remaining relative portion of the URN. If there is a relative part and the application doesn't know what to do with it, or if the content-type of the blob doesn't define the meaning of a relative part, then the application can either ignore the relative part or (probably safer) just fail the search. 4. But say the application examines the data blob located by the URN resolver and it happens to be an SFSRO-like file system directory node, containing a list of files and subdirectories and the hashes (maybe even in URN form) of their corresponding data streams or metadata nodes. The application knows it wants the object named "/foo/bar.html" relative to the root node, so it locates "foo" in the directory metadata blob and finds its hash. The application might then use some application-specific naming convention to try to find "foo", and subsequently "bar.html", on the same server on which it found the root node - e.g., by looking up "http://me.com/root/foo" and "http://me.com/root/foo/bar.html" respectively and seeing if the objects are accessible and the hashes come out right. 
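The hash-verified tree walk sketched in step 4 can be illustrated with a toy in-memory content-addressed store. Everything below is a made-up stand-in for illustration: directories are serialized as JSON name-to-hash maps rather than any real SFSRO format, and MD5 is used only because it appears in the example URN.

```python
import hashlib
import json

STORE = {}  # toy content-addressed store: hex MD5 digest -> blob of bytes

def put(blob: bytes) -> str:
    """Insert a blob and return its hash, as a publisher would."""
    digest = hashlib.md5(blob).hexdigest()
    STORE[digest] = blob
    return digest

def resolve(root_hash: str, rel_path: str) -> bytes:
    """Walk from the hashed root, verifying every blob along the way."""
    blob = STORE[root_hash]  # stand-in for the URN resolver's lookup
    if hashlib.md5(blob).hexdigest() != root_hash:
        raise ValueError("root blob fails verification")
    for name in rel_path.strip("/").split("/"):
        entries = json.loads(blob)  # a directory: {name: hash, ...}
        child_hash = entries[name]
        blob = STORE[child_hash]
        if hashlib.md5(blob).hexdigest() != child_hash:
            raise ValueError(f"{name!r} fails verification")
    return blob

# Publish a tiny tree: / -> foo -> bar.html
bar = put(b"<html>hello</html>")
foo = put(json.dumps({"bar.html": bar}).encode())
root = put(json.dumps({"foo": foo}).encode())

# A URN like "urn:hash::md5:<root>:/foo/bar.html" resolves to the file:
assert resolve(root, "/foo/bar.html") == b"<html>hello</html>"
```

Because each directory blob embeds its children's hashes, tampering with any node breaks verification somewhere along the chain from the root, which is the self-verifying property being claimed for the full URN.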
But such an application-specific method, if any, is just a hint. 5. If the application-specific "tree-walking" method, if any, fails, then the application goes back to the URN resolver and asks it to locate the hash-URN for "foo" and locate a copy of the metadata for _that_ directory. Once the application has found out, it can finally resolve the hash-URN for "bar.html" located in the metadata for "foo". In any case, the URN resolver has no clue what kind of data these hashes actually represent or what they're being used for; it's just helping the application to find what it needs, which is exactly what URN resolvers are supposed to do. > The biggest problem, however, is that the addition of any selection > mechanism to the URN would defeat the purpose of a self-verifying, > content-based addressing scheme because a client that does not receive > the entire resource can no longer verify it against its hash. Not at all - I think you misunderstood my proposal. If you have a hash-URN like "urn:hash::md5:123...abc:/foo/bar.html", the "123...abc" hash value is NOT the hash of the "bar.html" that you're eventually trying to find; it's the hash of the object representing the _starting point_ of the search - e.g., the root directory metadata node. It's the application's responsibility to work from there in an appropriate application-specific fashion, to interpret the "foo/bar.html" part and find the ultimately-named (sub-)object in a way that does not violate the URN principle of permanence or access method independence. If the application is an SFSRO-like file system and uses something like the traversal procedure I outlined above, then the self-verifying nature of the hash-URN is not violated at all, because: 1. the original hash-URN contains the hash of the root (starting point) directory metadata, so the root directory can't change without the hash-URN changing. 2. 
The root directory metadata contains the hash of the "foo" subdirectory metadata, so the "foo" subdirectory metadata can't change without its hash in the root directory changing. 3. Finally, the "foo" subdirectory metadata contains the hash of the file "bar.html", which can't change without the hashes of "/foo" and "/" in the higher-level metadata nodes changing. Thus, the complete URN "urn:hash::md5:123...abc:/foo/bar.html" represents a secure, self-verifying, and permanent link to the ultimately-named file "bar.html" even though "bar.html" isn't directly the blob of bits that the hash in the URN represents. Similarly, if the URN contains a content-type specification, it's the content-type of the _starting point_ (the blob of bits from which the hash was derived), not the content-type of the object eventually named. The latter can presumably be determined from whatever metadata the application picks up during its traversal. > My conclusion from this is not to allow further parts in the > identifiers. An application should first request the entire resource, > (be able to) verify it locally against its hash, and then perform > further processing. I agree 100% that "An application should first request the entire resource, ..., and then perform further processing." At least if by the "entire resource" you specifically mean "the entire byte-stream that the hash was computed from." But I also think it is indispensable for the hash-URN standard at least to _allow_ URNs to include additional information that may be useful to the application in that "further processing". I only outlined one possible use for this additional information; there could be many others. I just don't want the hash-URN specification to preclude them. > If you have a way of stating these extensions without losing the > self-verifying property, then I'll be with you. The problem is that > you need someone in the chain that you trust, who sees the entire > resource, and can verify the hash value.
> The whole point of having > hash-based identifiers is that you only need to trust yourself. I think I've stated the extensions without losing the self-verifying property... Are you with me? :) Cheers, Bryan From bradneuberg at yahoo.com Wed Sep 10 03:43:04 2003 From: bradneuberg at yahoo.com (Brad Neuberg) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Project Announcement: P2P Sockets Message-ID: <20030910034304.77186.qmail@web14104.mail.yahoo.com> Hi everyone. I just posted the web site, source code, and two tutorials for the Peer-to-Peer Sockets Project at http://p2psockets.jxta.org. The source code represents a working, 1.0 beta 1 release, with several pieces of software, such as Jetty and XML-RPC Client and Server libraries, already ported onto this new API. I have spent the last month and a half working full time on this. Here are some more details on the project: ------------------------ Are you interested in: * returning the end-to-end principle to the Internet? * an alternative peer-to-peer domain name system that bypasses ICANN and Verisign, is completely decentralized, and responds to updates much quicker than standard DNS? * an Internet where everyone can create and consume network services, even if they have a dynamic IP address or no IP address, are behind a Network Address Translation (NAT) device, or blocked by an ISP's firewall? * a web where every peer can automatically start a web server, host an XML-RPC service, and more, and quickly make these available to other peers? * easily adding peer-to-peer functionality to your Java socket and server socket applications? * having your servlets and Java Server Pages work on a peer-to-peer network for increased reliability, easier maintenance, and exciting new end-user functionality? * playing with a cool technology? If you answered yes to any of the above, then welcome to the Peer-to-Peer Sockets project!
The Peer-to-Peer Sockets Project reimplements Java's standard Socket, ServerSocket, and InetAddress classes to work on a peer-to-peer network rather than on the standard TCP/IP network. "Aren't standard TCP/IP sockets and server sockets already peer-to-peer?" some might ask. Standard TCP/IP sockets and server sockets are theoretically peer-to-peer but in practice are not due to firewalls, Network Address Translation (NAT) devices, and political and technical issues with the Domain Name System (DNS). The P2P Sockets project deals with these issues by re-implementing the standard java.net classes on top of the Jxta peer-to-peer network. Jxta is an open-source project that creates a peer-to-peer overlay network that sits on top of TCP/IP. Every peer on the network is given an IP-address-like number, even if they are behind a firewall or don't have a stable IP address. Super-peers on the Jxta network run application-level routers which store special information such as how to reach peers, how to join sub-groups of peers, and what content peers are making available. Jxta application-level relays can proxy requests between peers that would not normally be able to communicate due to firewalls or NAT devices. Peers organize themselves into Peer Groups, which scope all search requests and act as natural security containers. Any peer can publish and create a peer group in a decentralized way, and other peers can search for and discover these peer groups using other super-peers. Peers communicate using Pipes, which are very similar to Unix pipes. Pipes abstract the exact way in which two peers communicate, allowing peers to communicate using other peers as intermediaries if they normally would not be able to communicate due to network partitioning. Jxta is an extremely powerful framework. However, it is not an easy framework to learn, and porting existing software to work on Jxta is not for the faint-of-heart.
P2P Sockets effectively hides Jxta by creating a thin illusion that the peer-to-peer network is actually a standard TCP/IP network. If a peer wishes to become a server they simply create a P2P server socket with the domain name they want and the port other peers should use to contact them. P2P clients open socket connections to hosts that are running services on given ports. Hosts can be resolved either by domain name, such as "www.nike.laborpolicy", or by IP address, such as "44.22.33.22". Behind the scenes these resolve to JXTA primitives rather than being resolved through DNS or TCP/IP. For example, the host name "www.nike.laborpolicy" is actually the NAME field of a Jxta Peer Group Advertisement. P2P sockets and server sockets work exactly the same as normal TCP/IP sockets and server sockets. The benefits of taking this approach are many-fold. First, programmers can easily leverage their knowledge of standard TCP/IP sockets and server sockets to work on the Jxta peer-to-peer network without having to learn about Jxta. Second, all of the P2P Sockets code subclasses standard java.net objects, such as java.net.Socket, so existing network applications can quickly be ported to work on a peer-to-peer network. The P2P Sockets project already includes a large amount of software ported to use the peer-to-peer network, including a web server (Jetty) that can receive requests and serve content over the peer-to-peer network; a servlet and JSP engine (Jetty and Jasper) that allows existing servlets and JSPs to serve P2P clients; an XML-RPC client and server (Apache XML-RPC) for accessing and exposing P2P XML-RPC endpoints; an HTTP/1.1 client (Apache Commons HTTP-Client) that can access P2P web servers; a gateway (Smart Cache) to make it possible for existing browsers to access P2P web sites; and a WikiWiki (JSPWiki) that can be used to host WikiWikis on your local machine that other peers can access and edit through the P2P network. 
Even better, all of this software works and looks exactly as it did before being ported. The P2P Sockets abstraction is so strong that porting each of these pieces of software took as little as 30 minutes to several hours. Everything included in the P2P sockets project is open-source, mostly under BSD-type licenses, and cross-platform due to being written in Java. Because P2P Sockets are based on Jxta, they can easily do things that ordinary server sockets and sockets can't handle. First, creating server sockets that can fail-over and scale is easy with P2P Sockets. Many different peers can start server sockets for the same host name and port, such as "www.nike.laborpolicy" on port 80. When a client opens a P2P socket to "www.nike.laborpolicy" on port 80, they will randomly connect to one of the machines that is hosting this port. All of these server peers might be hosting the same web site, for example, making it very easy to partition client requests across different server peers or to recover from losing one server peer. This is analogous to DNS round-robining, where one host name will resolve to many different IP addresses to help with load-balancing. Second, since P2P Sockets don't use the DNS system, host names can be whatever you wish them to be. You can create your own fanciful endings, such as "www.boobah.cat" or "www.cynthia.goddess", or application-specific host names, such as "Brad GNUberg" or "Fidget666" for an instant messaging system. Third, the service ports for a given host name can be distributed across many different peers around the world. For example, imagine that you have a virtual host for "www.nike.laborpolicy". One peer could be hosting port 80, to serve web pages; another could be hosting port 2000, for instant messaging, and a final peer could be hosting port 3000 for peers to subscribe to real-time RSS updates. Hosts now become decentralized coalitions of peers working together to serve requests.
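The naming and fail-over behavior described above can be mimicked in miniature with an ordinary dictionary standing in for Jxta's advertisement discovery. The peer names and addresses below are invented, and the real project does this in Java on top of Jxta; this is only a sketch of the pattern:

```python
import random

# Toy stand-in for Jxta peer-group advertisements: a host name maps to
# every (peer, port) pair currently advertising a service under it.
ADVERTISEMENTS = {}

def advertise(host_name, peer, port):
    """A peer starts a 'server socket' for host_name on the given port."""
    ADVERTISEMENTS.setdefault(host_name, []).append((peer, port))

def resolve(host_name):
    """A client 'opens a socket': pick one advertising peer at random,
    giving DNS-round-robin-style load balancing and fail-over."""
    peers = ADVERTISEMENTS.get(host_name)
    if not peers:
        raise LookupError(f"no peer advertises {host_name!r}")
    return random.choice(peers)

# Two peers host the same fanciful name; no DNS or ICANN involved.
advertise("www.nike.laborpolicy", "peerA", 80)
advertise("www.nike.laborpolicy", "peerB", 80)
assert resolve("www.nike.laborpolicy") in [("peerA", 80), ("peerB", 80)]
```

Losing "peerA" only shrinks the advertisement list, so later lookups keep working, which is the fail-over property claimed for the real system.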
Two tutorials are available: * Introduction to Peer-to-Peer Sockets - http://www.codinginparadise.org/p2psockets/1.html * How to Create Peer-to-Peer Web Servers, Servlets, JSPs, and XML-RPC Clients and Servers - http://www.codinginparadise.org/p2psockets/2.html Download P2PSockets-1.0-beta1.zip (released 9-5-2003), which contains the core package and extensions both compiled and in source form, at http://www.codinginparadise.org/p2psockets/P2PSockets-1.0-beta1.zip Thanks, Brad GNUberg bkn3@columbia.edu From nazareno at dsc.ufcg.edu.br Wed Sep 10 04:01:16 2003 From: nazareno at dsc.ufcg.edu.br (Nazareno Andrade) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Is there any p2p reputation system deployed? Message-ID: <3F5EA20C.1070204@dsc.ufcg.edu.br> Hello. Does anybody know of any p2p system (probably file-sharing) that has any mechanism of prioritizing requests based on contributions other than KazaA and GNUnet?? I've seen several theoretical proposals, but I have no knowledge of any deployed (i.e., with a significant base of users) system that uses this kind of mechanism. Thanks in advance, Nazareno. ======================================== Nazareno Andrade Mestrando em Informática LSD - DSC/UFCG Campina Grande - Brasil nazareno@dsc.ufcg.edu.br http://lsd.dsc.ufcg.edu.br/~nazareno/ ======================================== From 0x90 at invisiblenet.net Wed Sep 10 05:17:04 2003 From: 0x90 at invisiblenet.net (Lance James) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Public Peer Review Request Message-ID: <005701c3775a$c66fbf00$0201a8c0@invisible> InvisibleNet has formed the Invisible Internet Project (I2P) to support the efforts of those trying to build a more free society by offering them an uncensorable, anonymous, and secure communication system. I2P is a development effort producing a variable latency, fully distributed, autonomous, scalable, anonymous, resilient, and secure network.
The goal is to be able to operate successfully in arbitrarily hostile environments - even when an organization with unlimited financial and political resources attacks it. I2P is not a filesharing app. I2P is essentially an anonymizing and secure replacement IP stack, running on top of the existing network. There has already been progress made in writing applications on top of the network to enable generic TCP/IP applications to tunnel through the network transparently, as well as to enable nym lookup and management - two applications which, when paired together, would allow any web browser to point at http://www.[yournym].iip/ and communicate with your webserver anonymously and securely. There are many more ideas for what I2P could be used for, and it's certain we won't think of the most interesting ones. I2P is an absurdly ambitious effort. Depending on what mailing lists you read or people you talk with, they'll either say it's impossible or just insanely hard. To be perfectly frank, I2P by itself doesn't contribute anything really significant to the CS/P2P research community, but it does take the great work of other projects and research efforts - such as freenet, iip, kademlia, mnet, tarzan, the remailers, and many, many more - and attempts to apply good software engineering techniques to provide hard anonymity and security in a variable latency network. "Variable latency" is repeated so often because I2P doesn't try to operate with a one size fits all set of anonymity and security constraints, and different people will require different tradeoffs. Bin Laden will probably not be able to pull off live streaming video, but Joe and Jane Sixpack should be able to. Is I2P ready to download and run with? No. So why bother mentioning it? Because we need more critical eyes to make sure we address the right issues the right ways. We think we've got things pegged so that it'll not only work, but also be secure and anonymous.
We're moving forward on the development path towards getting an alpha network release out the door, but we need these specs reviewed for flaws that we've missed. Of course, we also need lots of other things, from coders to documenters to QA to network simulators to CS people, but it is your eyeballs that we're calling out for today. What we have ready for review: - Invisible Internet Network Protocol (I2NP) spec[1], describing how network "routers" operate and what messages they send to other routers - Common Data Structures spec[2], describing the serialization of objects described in other specs, as well as the encryption algorithms used. - Invisible Internet Client Protocol (I2CP) spec[3], describing a simple local client protocol for making use of the network. - Polling HTTP Transport spec[4], an example transport protocol for use with I2NP to allow actual communication between routers, regardless of firewall, NAT, or HTTP proxy. We also have the 0.2 release of a software development kit (I2P SDK)[5], which includes everything necessary to design, develop, and test applications to run over the network, as well as all of the above specs. It includes a Java client API implementing I2CP, a sample application (ATalk, a one to one chat app that supports file transfer), a Java router, and a Python router. C and Python client API implementations of I2CP are also on the way. These routers are "local only" - meaning they don't talk to other routers. This can be used in the same way we can build normal networked applications - by running the server on the local machine and pointing the applications at it. We've been keeping this quiet because it's too easy to hype up a vaporware product and we wanted to wait until there was something worth reading about before saying anything. So please read these specs and send in your comments - either to info@invisiblenet.net or to the iip-dev mailing list[6].
Perhaps even jump on that list if you want to discuss things (archives are linked to from the web page), browse the wiki[7], or join us on IIP for development meetings - every Tuesday at 9 PM GMT in #iip-dev (archives[8] since meeting 48 are pretty much I2P specific). Thanks for your time, and we look forward to any responses. - The InvisibleNet team [1] http://www.invisiblenet.net/i2p/I2NP_spec.pdf [2] http://www.invisiblenet.net/i2p/datastructures.pdf [3] http://www.invisiblenet.net/i2p/I2CP_spec.pdf [4] http://www.invisiblenet.net/i2p/polling_http_transport.pdf [5] http://www.invisiblenet.net/i2p/I2P_SDK.zip [6] http://www.invisiblenet.net/iip/devMailinglist.php [7] http://wiki.invisiblenet.net/iip-wiki?I2P [8] http://wiki.invisiblenet.net/iip-wiki?Meetings From moore at eds.org Wed Sep 10 06:23:12 2003 From: moore at eds.org (Jonathan Moore) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <3F5EA20C.1070204@dsc.ufcg.edu.br> References: <3F5EA20C.1070204@dsc.ufcg.edu.br> Message-ID: <1063174992.8741.68.camel@tot> On Tue, 2003-09-09 at 21:01, Nazareno Andrade wrote: > Does anybody know of any p2p system (probably file-sharing) that has any > mechanism of prioritizing requests based on contributions other than > KazaA and GNUnet?? Well, there is BitTorrent, which uses tit-for-tat markets to do this. -Jonathan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030909/9e440cbe/attachment.pgp From digi at treepy.com Wed Sep 10 10:34:14 2003 From: digi at treepy.com (p@) Date: Sat Dec 9 22:12:22 2006 Subject: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <3F5EA20C.1070204@dsc.ufcg.edu.br> Message-ID: <00cc01c37787$19b37ef0$0200a8c0@pat> Hi, emule has implemented a system like this. 
If you download from a peer it gets credits, and if this peer ever downloads from you it will be moved forward in the queue based on those credits. Every user handles its own credit system, so it's pretty much not hackable... cheers digi -----Original Message----- From: p2p-hackers-bounces@zgp.org [mailto:p2p-hackers-bounces@zgp.org] On behalf of Nazareno Andrade Sent: Wednesday, 10 September 2003 06:01 To: p2p-hackers@zgp.org Subject: [p2p-hackers] Is there any p2p reputation system deployed? Hello. Does anybody know of any p2p system (probably file-sharing) that has any mechanism of prioritizing requests based on contributions other than KazaA and GNUnet?? I've seen several theoretical proposals, but I have no knowledge of any deployed (i.e., with a significant base of users) system that uses this kind of mechanism. Thanks in advance, Nazareno. ======================================== Nazareno Andrade Mestrando em Informática LSD - DSC/UFCG Campina Grande - Brasil nazareno@dsc.ufcg.edu.br http://lsd.dsc.ufcg.edu.br/~nazareno/ ======================================== _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From eugen at leitl.org Wed Sep 10 16:35:54 2003 From: eugen at leitl.org (Eugen Leitl) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Project Announcement: P2P Sockets (fwd from bradneuberg@yahoo.com) (fwd from morlockelloi@yahoo.com) Message-ID: <20030910163554.GF1808@leitl.org> ----- Forwarded message from Morlock Elloi ----- From: Morlock Elloi Date: Wed, 10 Sep 2003 08:41:40 -0700 (PDT) To: cypherpunks@lne.com Subject: Re: [p2p-hackers] Project Announcement: P2P Sockets (fwd from bradneuberg@yahoo.com) > stable IP address. 
Super-peers on the Jxta network run > application-level routers which store special > information such as how to reach peers, how to join So these super peers are reliable, non-vulnerable, although everyone knows where they are, because .... ? ===== end (of original message) Y-a*h*o-o (yes, they scan for this) spam follows: __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com ----- End forwarded message ----- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 186 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030910/0b7f4077/attachment.pgp From nazareno at dsc.ufcg.edu.br Wed Sep 10 17:23:48 2003 From: nazareno at dsc.ufcg.edu.br (Nazareno Andrade) Date: Sat Dec 9 22:12:22 2006 Subject: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <00cc01c37787$19b37ef0$0200a8c0@pat> References: <00cc01c37787$19b37ef0$0200a8c0@pat> Message-ID: <3F5F5E24.4060409@dsc.ufcg.edu.br> Hello. Thanks for the answer, but I'm not sure I've completely understood it yet. Each peer has knowledge only about its past interactions, right? They do not exchange information about other peers' reputations? If so, is there any evidence of the performance of this mechanism? Thanks, Nazareno p@ wrote: > Hi, > > emule has implemented a system like this. If you download from a peer it > get credits and if this peer ever downloads from you it will be > forwarded in the queue based on the credits. Every user handles its own > credit system therefore its pretty much not hackable... > > cheers > > digi > Nazareno. 
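[Editor's note: the per-peer credit scheme discussed in this thread can be sketched in a few lines of Python. All names and the queue-multiplier formula below are invented for illustration; eMule's real implementation differs in detail.]

```python
class CreditTable:
    """Each peer keeps its own private table of credits, one entry per
    remote peer, keyed by that peer's user-hash. Nothing is exchanged
    with third parties, so there is no remote reputation to forge."""

    def __init__(self):
        self.credits = {}  # user_hash -> bytes that peer has uploaded to us

    def record_upload_from(self, user_hash, nbytes):
        # The remote peer sent us data, so it earns credit with us.
        self.credits[user_hash] = self.credits.get(user_hash, 0) + nbytes

    def record_download_by(self, user_hash, nbytes):
        # The remote peer downloaded from us: spend its credit (floor at 0).
        self.credits[user_hash] = max(0, self.credits.get(user_hash, 0) - nbytes)

    def queue_multiplier(self, user_hash):
        # A peer with credit is moved forward in our upload queue;
        # strangers get the neutral multiplier 1.0.
        # (The 1 + credit/MiB formula is invented for this sketch.)
        return 1.0 + self.credits.get(user_hash, 0) / 2**20
```

The trade-off the thread identifies falls out of the data structure: because the table is local, a freeloader cannot forge standing with you, but a well-behaved peer you have never exchanged data with looks exactly like a stranger.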
======================================== Nazareno Andrade Mestrando em Informática LSD - DSC/UFCG Campina Grande - Brasil nazareno@dsc.ufcg.edu.br http://lsd.dsc.ufcg.edu.br/~nazareno/ ======================================== From bram at gawth.com Wed Sep 10 18:14:58 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] p2p-hackers meeting, this sunday Message-ID: We haven't had a p2p-hackers meeting in a while, so I'd like to call a new one. This Sunday, 3 PM, at the Metreon. We can talk about new developments in BitTorrent, Codeville, and Bram and Steve Hazel's moving to Berkeley. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From digi at treepy.com Wed Sep 10 18:25:45 2003 From: digi at treepy.com (p@) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <3F5F5E24.4060409@dsc.ufcg.edu.br> Message-ID: <00e001c377c8$f91f2bc0$0200a8c0@pat> Hi, If I download from someone, I save his user-hash and points for the downloaded bytes. If this user later connects to me, his place in the queue gets a multiplier based on the previously saved points. If he starts to download, the bytes he downloads get subtracted from the points. I never actually measured it. But I think there should be a difference between a freeloader and a normal user. I think especially high uploaders will profit from this. The system gets weaker if there are more users because the chance that you have never met the other peer gets higher. Freeloaders don't get banned with this system but probably will have to wait longer for the download to start. >Hello. > >Thanks for the answer, but I'm not sure I already understood it >completely. Each peer has knowledge only about its past interactions, >right? They do not exchange information about other peers reputations? > >If so, is there any evidence of the performance of this mechanism? 
> >Thanks, > >Nazareno > >p@ wrote: > > Hi, > > > > emule has implemented a system like this. If you download from a peer it > > get credits and if this peer ever downloads from you it will be > > forwarded in the queue based on the credits. Every user handles its own > > credit system therefore its pretty much not hackable... > > > > cheers > > > > digi > > From lgonze at panix.com Wed Sep 10 18:32:28 2003 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <00e001c377c8$f91f2bc0$0200a8c0@pat> Message-ID: <21BB65C9-E3BD-11D7-9A61-000393455590@panix.com> It strikes me that a reputation system with non-transferable reputation, ie where each peer has knowledge only about its own interactions, would encourage long term relationships. That's a good thing, since long term relationships encourage good behavior. Just thinking... - Lucas On Mercredi, sep 10, 2003, at 14:25 America/New_York, p@ wrote: > > Hi, > > If i download from someone I save his user-hash and points for the > downloaded bytes. If this user connects now to me his place in the > queue > gets a multiplier from the prior saved points. If he starts to > download, > the bytes he downloads get substracted from the points. > > I never actually measured it. But I think there should be a difference > between a freeloader and a normal user. I think especially high > up-loaders will profit from this. The system gets weaker if there are > more users because the chance that you never met the opposite peer gets > higher. > > Freeloaders don't get banned with this system but probably will have to > wait longer for the download to start. > >> Hello. >> >> Thanks for the answer, but I'm not sure I already understood it >> completely. Each peer has knowledge only about its past interactions, >> right? They do not exchange information about other peers reputations? 
>> >> If so, is there any evidence of the performance of this mechanism? >> >> Thanks, >> >> Nazareno >> >> p@ wrote: >>> Hi, >>> >>> emule has implemented a system like this. If you download from a > peer it >>> get credits and if this peer ever downloads from you it will be >>> forwarded in the queue based on the credits. Every user handles its > own >>> credit system therefore its pretty much not hackable... >>> >>> cheers >>> >>> digi >>> > > > > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > _______________________________________________ > Here is a web page listing P2P Conferences: > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences > From bradneuberg at yahoo.com Wed Sep 10 19:18:51 2003 From: bradneuberg at yahoo.com (Brad Neuberg) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Project Announcement: P2P Sockets (fwd from bradneuberg@yahoo.com) (fwd from morlockelloi@yahoo.com) In-Reply-To: <20030910163554.GF1808@leitl.org> Message-ID: <20030910191851.6889.qmail@web14101.mail.yahoo.com> --- Eugen Leitl wrote: > ----- Forwarded message from Morlock Elloi > ----- > > From: Morlock Elloi > Date: Wed, 10 Sep 2003 08:41:40 -0700 (PDT) > To: cypherpunks@lne.com > Subject: Re: [p2p-hackers] Project Announcement: P2P > Sockets (fwd from > bradneuberg@yahoo.com) > > > stable IP address. Super-peers on the Jxta network > run > > application-level routers which store special > > information such as how to reach peers, how to > join > > So these super peers are reliable, non-vulnerable, > although everyone knows > where they are, because .... ? > These super peers are known as Rendezvous peers in the Jxta world. They are as reliable and non-vulnerable as one could hope for, though I doubt they are perfect; I am building above the existing Jxta infrastructure for these. 
"Everyone" knows about them by using a common bootstrap server to bootstrap into the Jxta network to gain the addresses of a few Rendezvous nodes. Rendezvous nodes then propagate information about their existence to other Rendezvous nodes at various times. Network partitions are certainly possible, and the requirement for a common bootstrap server is fragile. Jxta, and therefore P2P Sockets, currently has no protections against malicious/Byzantine peers; it has relatively good protections against peers that fail non-maliciously. Brad Neuberg From clint at TheStaticVoid.net Thu Sep 11 12:42:11 2003 From: clint at TheStaticVoid.net (Clint Heyer) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system In-Reply-To: <20030910190007.4D1583FD2C@capsicum.zgp.org> Message-ID: <2003911144211.419221@platypus> Have you seen Naanou?[1,2] It's not 'deployed' per se, but the code is available under the GPL license. Naanou uses the characteristics of the underlying DHT (Chord) to provide a distributed reputation system. The goal was to enable distributed moderation of anti-social behaviour in a P2P community. As a user gets moderated against, their experience of the network deteriorates - for example, download speeds from other peers are reduced. cheers, .clint [1] http://naanou.sourceforge.net/ [2] http://thestaticvoid.net/portfolio/p_naanou.html > Does anybody know of any p2p system (probably file-sharing) that > has any > mechanism of prioritizing requests based on contributions other than > KazaA and GNUnet?? > I've seen several theoretical proposals, but I have no knowledge of > any deployed (i.e., with a significant base of users) system that > uses this kind of mechanisms. 
______________________________________ www: http://www.TheStaticVoid.net From clausen at gnu.org Thu Sep 11 01:19:31 2003 From: clausen at gnu.org (Andrew Clausen) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <21BB65C9-E3BD-11D7-9A61-000393455590@panix.com> References: <00e001c377c8$f91f2bc0$0200a8c0@pat> <21BB65C9-E3BD-11D7-9A61-000393455590@panix.com> Message-ID: <20030911011931.GB710@gnu.org> On Wed, Sep 10, 2003 at 02:32:28PM -0400, Lucas Gonze wrote: > It strikes me that a reputation system with non-transferable > reputation, ie where each peer has knowledge only about its own > interactions, would encourage long term relationships. At this point, I don't think you are talking about reputation. You're basically saying: "everyone is better off going it alone, and figuring out for themselves who they should trust". If you are in a situation where you don't have enough direct experience with a peer, then I think you need reputation. "Enough direct experience" is a sticky question, because you need to be careful not to overtrust someone who might be trying to suck you in first, then rip you off later. > That's a good thing, since long term relationships encourage good > behavior. I think the same can be said for reputation as well. It would be cool if there were a reputation system that made it irrational to defect. i.e. the utility gained from defection always being outweighed by the lost reputation. This is the motivation for my research. (If you're interested: http://members.optusnet.com.au/clausen/ideas/google/google-subvert.pdf) Cheers, Andrew From bram at gawth.com Fri Sep 12 21:28:48 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Hacker Dim Sum this Sunday at noon in San Francisco (fwd) Message-ID: Hey. A bunch of us are getting together this Sunday at noon at Yank Sing in San Francisco for Dim Sum. 
Bram said he wants to have a P2P hackers meeting after ... maybe we could relocate to a coffee shop. The address is: 101 Spear St Hopefully we can get 5-10 people to come. I took the liberty of sending email to all the super brilliant infoanarchists, P2P hackers, and general geeks that I know live in the city. If you can think of anyone else please forward. Hopefully we can get 6-12 people to go... hopefully everyone won't be too busy (kind of late notice) Feel free to call my cell if you get lost on Sunday (415-595-9965) Also if you could RSVP back I might call Yank Sing to reserve a table(s) if the group gets too big! Peace! Kevin -- Help Support NewsMonster Development! Purchase NewsMonster PRO! http://www.newsmonster.org/download-pro.html Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965 AIM - sfburtonator, Web - http://www.peerfear.org/ GPG fingerprint: 4D20 40A0 C734 307E C7B4 DCAA 0303 3AC5 BD9D 7C4D IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster From stevegt at TerraLuna.Org Tue Sep 16 18:25:44 2003 From: stevegt at TerraLuna.Org (stevegt@TerraLuna.Org) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Protocol vs Header Format Message-ID: <20030916182544.GB27485@pathfinder> Hi All, Has anyone ever given any thought to the difference between protocol and header format? In most existing protocol stacks, the 'protocol' field in a lower header (e.g. IP) specifies both the parser and the state engine to use in interpreting the next higher header (e.g. TCP). In other words, in networking we seem to have evolved to this assumption that the header format and protocol state engine are inseparable, so much so that the terms 'protocol' and 'header format' are often (mis)used interchangeably. Are there good reasons for this that anyone can point to? Wouldn't it make more sense for the N-layer header to specify the N+1 parser, and then have a field in the layer N+1 header specify the N+1 state engine? 
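[Editor's note: the split being proposed - the layer-N header naming only the layer-N+1 *parser*, with the parsed layer-N+1 header then naming its own *state engine* - can be pictured as a pair of registries. Everything below (registry names, the one-byte toy wire format, the engine ids) is invented for illustration, not taken from any real stack.]

```python
PARSERS = {}  # parser id -> function(bytes) -> header dict   (syntax only)
ENGINES = {}  # engine id -> function(header dict) -> result  (semantics only)

def parse_v1(payload):
    # Toy layer-N+1 format: first byte names the state engine, rest is body.
    return {"engine": payload[0], "body": payload[1:]}

PARSERS[1] = parse_v1

def echo_engine(header):
    # Toy state engine: just echo the body back.
    return b"echo:" + header["body"]

ENGINES[7] = echo_engine

def deliver(lower_header, payload):
    # The lower layer only says how to *parse* the next header...
    header = PARSERS[lower_header["next_parser"]](payload)
    # ...and the parsed header itself names the state engine that interprets
    # it, so wire format and protocol behavior can version independently.
    return ENGINES[header["engine"]](header)
```

Under this sketch, `deliver({"next_parser": 1}, bytes([7]) + b"hello")` dispatches through parser 1 and engine 7 and returns `b"echo:hello"`; shipping a new state engine for an old header format just means registering a new engine id, with no change at layer N.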
It seems like this would be more flexible. Can anyone think of an existing case where a stack is already implemented this way? I haven't been able to put my finger on one yet, but maybe I'm missing something obvious. I think this ties intimately into the "text vs. binary" header debate, too, though I can't quite articulate why right now. This all bubbled up in my brain because I'm in the throes of putting together a protocol, and need a header format to support it. The development of the protocol is to be self-hosting; i.e. developers use the protocol to collaborate on the development of the protocol. This means that the headers and protocol state engines are both likely to undergo violent evolution as they mature, but will need to remain usable. This means distributed versioning, separation of code (state engine) and data (header format), and so on. I keep getting tangled up in prior art that doesn't seem to care that the protocol and headers might want to evolve separately. Am I crazy, on the right track, or both? Steve -- Stephen G. Traugott (KG6HDQ) UNIX/Linux Infrastructure Architect, TerraLuna Aerospace LLC stevegt@TerraLuna.Org http://www.stevegt.com -- http://Infrastructures.Org From wesley at felter.org Tue Sep 16 23:33:02 2003 From: wesley at felter.org (Wes Felter) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Protocol vs Header Format In-Reply-To: <20030916182544.GB27485@pathfinder> References: <20030916182544.GB27485@pathfinder> Message-ID: <1063755182.4286.34.camel@arlx248.austin.ibm.com> On Tue, 2003-09-16 at 13:25, stevegt@TerraLuna.Org wrote: > Hi All, > > Has anyone ever given any thought to the difference between protocol and > header format? In most existing protocol stacks, the 'protocol' field > in a lower header (e.g. IP) specifies both the parser and the state > engine to use in interpreting the next higher header (e.g. TCP). FWIW, several protocols borrow RFC 822 style headers. 
But I don't see what that allows you to do that you can't do with custom headers. Another way to look at it is that you're talking about the split between syntax (formats) and semantics (protocols). It remains true that one is useless without the other, and superficial similarities between syntaxes (like using XML for everything) don't buy you any interop. -- Wes Felter - wesley@felter.org - http://felter.org/wesley/ From zooko at zooko.com Wed Sep 17 13:40:59 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] desiderata and open issues in ent Message-ID: Dear mnet-devel and p2p-hackers: [If Robert Hettinga forwards this message to a list that you read, and you find it interesting, perhaps you should consider joining mnet-devel [1] or p2p-hackers [2].] [Note that ent is not the only thing going on in terms of improved Mnet networking. Tschechow has gotten a simplified version of ent named "router" running, Myers is implementing Twisted Chord, and Arno *was* working on multiple independent metatrackers before Real Life distracted him.] These are the ways that ent should differ from emergent network designs like Chord and Kademlia. Some of these desiderata dovetail with one another, but others of them appear to be conflicting. It may be impossible to satisfy all of them at once, but I currently believe that we can at least partially achieve all of them. 1. It should self-heal flaws in the network structure. There is a protocol [3] known to do this for Chord, but not (as far as I know) for Kademlia. (re: recent discussion [4]) 2. It should handle the reality that a large fraction (around half) of the nodes are behind NAT or firewalls and can't accept incoming connections. If two nodes are both restricted like that then they cannot be peers of one another in the ent graph. 2.b. 
But it will not try to handle arbitrary underlay structure -- it will rely on the fact that a large fraction of the nodes form a single fully-connected clique, and that every node has edges to almost all of the clique nodes. 3. It should handle transient nodes. 3.a. Newcomer nodes are never entrusted with the only replica of any data. They may be entrusted with no data at all, or at a cost of using extra bandwidth, they can be entrusted with an extra replica that is already stored by an old-timer node. 3.b. Suppose a new node "B" joins the network, and it is now responsible for a part of the id space that was formerly covered by an old node "A". Now suppose A is serving up 200 GB of data, 100 GB of which now falls into B's part of the id space. When queries for blocks from B's space get routed to B, B will forward them to A, and cache copies of the response. This dovetails with 3.a. 4. It should include incentives for people to run good nodes. 4.a. The most important incentives are probably social -- people should chat with one another, have (*real*, human social, not digital) reputations, and so forth. That's outside the scope of ent design. 4.b. Peering should have a selfish incentive mechanism so that people who run high-quality nodes get higher-quality performance from the network when they try to use it themselves. (I'm inspired by the tit-for-tat pairwise incentives from Bram's original BitTorrent design. I don't know if the current BitTorrent has changed in that respect.) 5. It should exploit locality: nodes which have short fat pipes between them (they are "close to" each other in the underlay network) should be more likely to peer with each other than nodes that share long skinny pipes. 6. It should exploit heterogeneity. 
The capacities of nodes are expected to follow a power-law distribution: half of the nodes are Pentium II's on 33 Kbps dial-up with 5 MB of free disk space, a quarter of the nodes are Pentium IV's on 512 Kbps DSL with 5 GB of free disk space, an eighth of the nodes are quad-Opterons on 44 Mbps DS3 with 1 TB free disk space, a sixteenth of the nodes are supercomputers which are illicitly bridging the Internet to "Internet 2", etc. 7. Normal emergent network desiderata: scalable, robust, efficient. 8. Simple, analyzable, measurable. Open issues: * Self-healing (1.) has to be designed for Kademlia or else locality (5.) has to be designed for Chord or else we have to switch to another emergent network design entirely for the basis of ent. * Id-space shadowing (3.b.) has to be designed carefully to avoid looping or other bad artifacts, and to behave acceptably well in the various cases of A, B, and other nodes coming and going. * Selfish peering (4.b.) has to be balanced against the system-wide consistency and performance desiderata. For example, each individual node wants to link *only* with peers that provide good quality service to it. However, suppose there is a node that provides bad-quality service, that nobody wants to peer with. Suppose that node is in sole possession of a block of data. We still want searches in the network to find that block! This may be impossible, in which case we have to choose a trade-off, and hopefully one which is either qualitative or easily measurable. * Putting all of these together is the big trick. Can we exploit heterogeneity (6.) and locality (5.) at the same time? Can we exploit both of these while retaining any sort of analyzability/measurability? Etc. A general design idea: One way that some of these apparently conflicting desiderata can be reconciled is to use redundant "special-purpose" overlay networks. 
For example, Pastry [5] uses its free-choice property to increase network locality, while the very similar Kademlia [6] uses the same free-choice property to increase robustness, at the expense of locality. That is: in Pastry you have to choose any one out of (say) a thousand nodes to be your peer, and you attempt to choose the one which is closest to you in the underlay, i.e. the one that has the fastest connection to you. In Kademlia you have to choose any one out of a thousand nodes, and you attempt to choose the one that is least likely to drop off the net. My idea for ent is that you have two separate overlay networks, one in which you prefer the most reliable nodes and the most robust emergent network topology, and the other in which you prefer the fastest nodes and the most efficient emergent network topology. When the latter fails, you use the former to rebuild it. Regards, Zooko http://zooko.com/log.html [1] http://sourceforge.net/mailarchive/forum.php?forum_id=7702 [2] http://zgp.org/pipermail/p2p-hackers/ [3] http://citeseer.nj.nec.com/liben-nowell02observations.html [4] http://zgp.org/pipermail/p2p-hackers/2003-August/001344.html [5] http://research.microsoft.com/~antr/Pastry/ [6] http://citeseer.nj.nec.com/529075.html From mujtaba at asu.edu Wed Sep 17 15:30:34 2003 From: mujtaba at asu.edu (Mujtaba Khambatti) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Call for Student Essays (P2P pages, IEEE DSO) In-Reply-To: <000901c37d2f$130ed560$03054bab@Mujtaba> Message-ID: <001801c37d30$a447ffb0$03054bab@Mujtaba> Call for Essays The P2P pages [1] of the IEEE Distributed Systems Online [2] is seeking submissions for publication on its website. The IEEE Distributed Systems Online hosts expert-authored articles and resources in the various topic areas of distributed systems. The P2P pages focus specifically on links to useful websites, news, journals, papers, books, and events related to peer-to-peer technologies. The essays must be written by students. 
Proof of student status will be required to complete the submission process. Essay length can range from 800 to 1000 words, depending on the nature of the topic. Lengthier essays may be serialized for easier online readability. Among the types of articles that would be of interest (not exhaustive): 1. P2P Applications 2. Challenges in P2P Systems 3. Security in P2P Systems 4. Trust in P2P Systems 5. P2P and Databases 6. P2P and Mobile networks 7. P2P and the law 8. P2P and the Business world 9. Criticisms of P2P technology Essays must be submitted with a completed list of well-formatted references and a paragraph of up to 50 words containing biographical information about the author. Please submit in plain text (ASCII) format via a MIME attachment to an email. Email your submissions to mujtaba@asu.edu by the 10th of October, 2003 in order to be considered for the November 2003 issue. Please write "DSOnline Essay-Nov" in the subject line of the email. Essays with fewer than 750 words or more than 1050 words will not be considered. The essay title, author name, biographical information, and the reference section will be excluded from the word count. For more information contact mujtaba@asu.edu References: [1] http://dsonline.computer.org/os/related/p2p/index.htm [2] http://dsonline.computer.org/ From bert at web2peer.com Wed Sep 17 16:29:43 2003 From: bert at web2peer.com (Bert) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] p2p sharing & access-control Message-ID: <20030917092947.25462.h018.c001.wm@mail.web2peer.com.criticalpath.net> One of my recent interests has been p2p file sharing in an access-controlled environment instead of the current "free for all" paradigm. This area is deserving of attention because of obvious applications in p2p for the enterprise as well as emerging "darknets" intended to be invitation only. The question I've been thinking about is how to support (efficient) search in such settings. 
Currently, when we search for access controlled files we must individually authenticate and search each relevant repository. But in a massively distributed environment, how do you know what repositories are relevant? And even if you did, searching all of them independently would be too much trouble. An alternative is to have every information provider allow its content to be indexed by a centralized index host, but the trust & security requirements of such a host would be too high to be practical. We've written a paper that addresses this problem and proposes an alternative solution. The idea is to build a specialized index structure that does not reveal any specific details about the content being shared. As such it is suitable for storage on untrusted nodes, e.g. typical (super) peers in a p2p network. The paper is entitled "Privacy-Preserving Indexing of Documents on the Network", and you can download it from here: http://www.almaden.ibm.com/cs/people/bayardo/userv/ Hope you find it interesting. From bradneuberg at yahoo.com Wed Sep 17 19:03:49 2003 From: bradneuberg at yahoo.com (Brad Neuberg) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] New Release of P2P Sockets + Presentation Message-ID: <20030917190349.80489.qmail@web14103.mail.yahoo.com> Just uploaded a new build of P2P Sockets, dated 9-17-2003. Check out p2psockets.jxta.org to grab it. This build also has a new PowerPoint presentation (it's also at http://codinginparadise.org/p2psockets/p2psockets_powerpoint.ppt) which provides an intro to the project; I presented this last night at the JXTA Town Hall Meeting. It also includes some new shell-scripts that make running the various tools much easier, some code-changes to help the shell-scripts, and some updates to the tutorials to use these new easier shell-scripts. P2P Sockets is a reimplementation of standard Java sockets on top of Jxta and ports of standard web servers, servlet engines, etc. to run on top of a peer-to-peer network. 
P2P Sockets is finished. The Paper Airplane website, paperairplane.us, is also now up. Thanks, Brad Neuberg bkn3@columbia.edu http://www.codinginparadise.org From lgonze at panix.com Wed Sep 17 22:59:55 2003 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <20030911011931.GB710@gnu.org> Message-ID: On Mercredi, sep 10, 2003, at 21:19 America/New_York, Andrew Clausen wrote: > On Wed, Sep 10, 2003 at 02:32:28PM -0400, Lucas Gonze wrote: >> It strikes me that a reputation system with non-transferable >> reputation, ie where each peer has knowledge only about its own >> interactions, would encourage long term relationships. > > At this point, I don't think you are talking about reputation. You're > basically saying: "everyone is better off going it alone, and figuring > out for themselves who they should trust". > > If you are in a situation where you don't have enough direct experience > with a peer, then I think you need reputation. "Enough direct > experience" is a sticky question, because you need to be careful not to > overtrust someone who might be trying to suck you in first, then rip > you > off later. Ok, it's good to have the two definitions spelled out, so that we can know to say which one we're talking about in the future. > This is the motivation for my research. (If you're interested: > http://members.optusnet.com.au/clausen/ideas/google/google-subvert.pdf) Finally got around to reading this yesterday. I don't know whether the idea that PageRank is ultimately backed by the cost of a domain name was known to others, but to me it's completely new. Good stuff. 
- Lucas

From mllist at vaste.mine.nu Sun Sep 21 13:57:01 2003
From: mllist at vaste.mine.nu (Johan Fänge)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] XOR Hash Tree
Message-ID: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>

Sending top-to-bottom flattened THEX trees requires sending 2*base_hashes hashes, right?

Instead of hashing pairs of hashes from the previous level to create the next level, why not XOR them?

    A
   / \
  B   C

If you transfer A and B, you can reconstruct C by doing A XOR B. Etc. No base_hashes*hash_size overhead.

What I'm wondering is of course whether this is more easily spoofable. (Does XORing two pseudo-random numbers make them less random?)

Surely someone must've thought of this before?

/Vaste

From b.fallenstein at gmx.de Sun Sep 21 15:17:49 2003
From: b.fallenstein at gmx.de (Benja Fallenstein)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] XOR Hash Tree
In-Reply-To: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>
References: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>
Message-ID: <3F6DC11D.5010301@gmx.de>

Hi,

Johan Fänge wrote:
> Instead of hashing pairs of hashes from the previous level to create the
> next level, why not XOR them?
>
>     A
>    / \
>   B   C
>
> If you transfer A and B, you can reconstruct C by doing A XOR B. Etc. No
> base_hashes*hash_size overhead.

If you transfer B and C, you can reconstruct A by hashing B|C, so XORing doesn't seem to give an advantage. Plus, your scheme breaks security completely. ;) Because if the receiver has A, and you send A and B, then the receiver can construct C... but the receiver can construct a C for *arbitrary* B! I.e., whatever you send as B, the receiver constructs a C so that it "authenticates" against the root of the hash tree. (There's also the problem that XORing doesn't preserve order -- A XOR B = B XOR A -- but that hardly makes a difference given the above.)
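[Editor's aside: the break described above can be demonstrated in a few lines. The sketch below is illustrative, using SHA-1 leaf hashes as in THEX; it shows that under the XOR scheme any forged left child "verifies" against the trusted parent, and that swapped children verify too, since XOR is commutative:]

```python
import os
import hashlib

HASH_LEN = 20  # SHA-1 digest size, as in THEX

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Honest tree: the parent A is the XOR of its children B and C.
B = hashlib.sha1(b"left leaf data").digest()
C = hashlib.sha1(b"right leaf data").digest()
A = xor(B, C)

# "Verification" under the XOR scheme just checks left XOR right == parent.
def verify(parent: bytes, left: bytes, right: bytes) -> bool:
    return xor(left, right) == parent

assert verify(A, B, C)          # the honest case passes

# Attack: an adversary picks ANY forged left child...
forged_B = os.urandom(HASH_LEN)
# ...and derives a matching right child from the trusted parent.
forged_C = xor(A, forged_B)

# The forgery verifies against the same parent, so the scheme
# authenticates nothing: forged_B XOR forged_C == A by construction.
assert verify(A, forged_B, forged_C)

# And because XOR is commutative, swapping the leaves also verifies,
# which is the collision-by-leaf-swap problem.
assert verify(A, C, B)
```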
Cheers,
- Benja

From justin at chapweske.com Mon Sep 22 00:55:20 2003
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] XOR Hash Tree
In-Reply-To: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>
References: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>
Message-ID: <3F6E4878.4020908@chapweske.com>

Not good. You could easily create a collision by swapping the leaves.

-Justin

Johan Fänge wrote:
> Sending top-to-bottom flattened THEX trees requires sending 2*base_hashes
> hashes, right?
>
> Instead of hashing pairs of hashes from the previous level to create the
> next level, why not XOR them?
>
>     A
>    / \
>   B   C
>
> If you transfer A and B, you can reconstruct C by doing A XOR B. Etc. No
> base_hashes*hash_size overhead.
>
> What I'm wondering is of course whether this is more easily spoofable. (Does
> XORing two pseudo-random numbers make them less random?)
>
> Surely someone must've thought of this before?
>
> /Vaste
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers
> _______________________________________________
> Here is a web page listing P2P Conferences:
> http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

--
Justin Chapweske, Onion Networks
http://onionnetworks.com/

From icepick at icepick.info Thu Sep 25 19:35:26 2003
From: icepick at icepick.info (icepick@icepick.info)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: [mnet-devel] desiderata and open issues in ent
In-Reply-To:
References:
Message-ID: <20030925193526.GA2264@fil.org>

On Wed, Sep 17, 2003 at 09:40:59AM -0400, Zooko wrote:
> 2. It should handle the reality that a large fraction (around half) of the
> nodes are behind NAT or firewalls and can't accept incoming connections. If
> two nodes are both restricted like that then they cannot be peers of one
> another in the ent graph.
Surveying LinkSys home gateway products this morning, I see that all of them support UPnP. I suspect that most new home networks are being set up with hardware like this that supports UPnP, so maybe we'll get lucky and that 50% number will go down.

I have also posted code [1] that uses the Python COM support to forward a port to a NATed computer (my work computer, for example). It's ugly and blocking, but I plan on adding it to Mnet this weekend.

This is a great doc, btw, of what needs to be tackled.

icepick

1 - http://icepick.info/2003/09/17/upnp_example.py

From zooko at zooko.com Thu Sep 25 20:23:24 2003
From: zooko at zooko.com (Zooko)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: [mnet-devel] desiderata and open issues in ent
In-Reply-To: Message from icepick@icepick.info of "Thu, 25 Sep 2003 15:35:26 EDT." <20030925193526.GA2264@fil.org>
References: <20030925193526.GA2264@fil.org>
Message-ID:

icepick wrote:
>
> On Wed, Sep 17, 2003 at 09:40:59AM -0400, Zooko wrote:
> > 2. It should handle the reality that a large fraction (around half) of the
> > nodes are behind NAT or firewalls and can't accept incoming connections. If
> > two nodes are both restricted like that then they cannot be peers of one
> > another in the ent graph.
>
> Surveying LinkSys home gateway products this morning I see that all of them
> support UPnP. I suspect that most new home networks are being set up with
> hardware like this that supports UPnP, so maybe we'll get lucky and that 50%
> number will go down.

I'm skeptical. I think that in a lot of places where NAT is installed it is serving a dual role: to multiplex IP addresses and to discourage consumers from running servers. I suspect that if the former need is obviated for some reason, firewalls (or UPnP configurations) will then be installed to enforce the latter need.
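[Editor's aside: the first step of the UPnP port forwarding mentioned above is SSDP discovery of the gateway, which needs nothing beyond the standard library. A rough sketch of that step only — the subsequent SOAP AddPortMapping call to the gateway's control URL is omitted, and this is not the script from the link:]

```python
import socket

SSDP_ADDR, SSDP_PORT = "239.255.255.250", 1900

def make_msearch(search_target="urn:schemas-upnp-org:device:InternetGatewayDevice:1",
                 mx=2):
    # SSDP discovery request, per the UPnP Device Architecture spec.
    return (
        "M-SEARCH * HTTP/1.1\r\n"
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}\r\n"
        'MAN: "ssdp:discover"\r\n'
        f"MX: {mx}\r\n"
        f"ST: {search_target}\r\n"
        "\r\n"
    ).encode("ascii")

def discover(timeout=2.0):
    # Multicast the M-SEARCH and collect any gateway responses; each
    # response carries a LOCATION header pointing at the device
    # description, from which the control URL is obtained.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    responses = []
    try:
        sock.sendto(make_msearch(), (SSDP_ADDR, SSDP_PORT))
        while True:
            data, addr = sock.recvfrom(65507)
            responses.append((addr, data))
    except socket.timeout:
        pass
    finally:
        sock.close()
    return responses
```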
Networking researchers and Internet hackers like to talk about "solving the NAT problem", but I suspect that the people who actually make the decisions consider it to be a feature and not a problem. Here's an interesting rant that I skimmed recently that touches on this:

http://www.fourmilab.ch/documents/digital-imprimatur/

Regards,
Zooko

From wesley at felter.org Fri Sep 26 06:00:42 2003
From: wesley at felter.org (Wes Felter)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: [mnet-devel] desiderata and open issues in ent
In-Reply-To:
Message-ID:

On Thursday, September 25, 2003, at 03:23 PM, Zooko wrote:
>
> I'm skeptical. I think that in a lot of places that NAT is installed it is
> serving a dual role: to multiplex IP addresses and to discourage consumers
> from running servers. I suspect that if the former need is obviated for some
> reason, that firewalls (or UPnP configurations) will then be installed to
> enforce the latter need.

I disagree. In general, NATs are being installed by end users, not ISPs. You are right, however, that the ISPs will always find some way of propping up their bad business models.

Wes Felter - wesley@felter.org - http://felter.org/wesley/

From myers at maski.org Tue Sep 30 02:05:27 2003
From: myers at maski.org (Myers W. Carpenter)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: desiderata and open issues in ent
Message-ID: <20030930020527.GA11621@maski.org>

On Thu, Sep 25, 2003 at 04:23:24PM -0400, Zooko wrote:
> Networking researchers and Internet hackers like to talk about "solving the
> NAT problem", but I suspect that the people who actually make the decisions
> consider it to be a feature and not a problem.

I suspect that at this point the people who actually make the decisions are about as clueless as Aunt Millie (sorry, Aunt Millie). Actually, if you want to look at the main decision-maker-by-default, Microsoft, you see that they are pushing NAT traversal. Why?
Because it allows them to have neat features like video/voice conferencing (which was actually the key reason we got these UPnP routers at work). Also take a look at their Three Degrees project. A key dependency for this is IPv6 and Teredo [1]. I'm tempted to see if this could be used within Mnet.

I think it's a good idea to take the bull by the horns now and add in support for these technologies. Put an indicator on your app to show the user what kind of connection they have. For example, a yellow indicator or, if you are Peekabooty, a big frowning bear (maybe he could spit at you and call you names?) when you can't accept incoming connections. Make them feel like they aren't getting the full deal. Make the user want it to the point that the other people who make decisions (you know, "THEM") can't just slip this one by.

myers

1 - "Teredo, also known as IPv4 network address translator (NAT) traversal for IPv6"
http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/winxppro/maintain/Teredo.asp
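[Editor's aside: the Teredo format referenced above is compact enough to decode by hand. A 2001::/32 address embeds the Teredo server, some flag bits, and the client's external port and IPv4 address, the latter two stored bit-inverted. A sketch of the decoding, using a commonly cited example address rather than a real host:]

```python
import ipaddress

def parse_teredo(addr: str):
    """Decode server, flags, external port, and client IPv4 embedded
    in a Teredo (2001::/32) IPv6 address, per the Teredo spec."""
    ip = ipaddress.IPv6Address(addr)
    if ip not in ipaddress.IPv6Network("2001::/32"):
        raise ValueError("not a Teredo address")
    b = ip.packed
    server = ipaddress.IPv4Address(b[4:8])   # Teredo server's IPv4 address
    flags = int.from_bytes(b[8:10], "big")   # e.g. the cone-NAT bit
    # The mapped port and client IPv4 are stored bit-inverted (XORed
    # with all ones) so that NATs do not rewrite them in transit.
    port = int.from_bytes(b[10:12], "big") ^ 0xFFFF
    client = ipaddress.IPv4Address(int.from_bytes(b[12:16], "big") ^ 0xFFFFFFFF)
    return str(server), flags, port, str(client)

# Commonly cited example address from Teredo documentation:
server, flags, port, client = parse_teredo("2001:0:4136:e378:8000:63bf:3fff:fdd2")
# server → "65.54.227.120", port → 40000, client → "192.0.2.45"
```

This is exactly the property that makes Teredo attractive for p2p: a peer's external NAT mapping is readable straight out of its address, with no extra lookup protocol.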