From zooko at zooko.com Mon Sep 1 13:19:24 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: Message from Peter Thiemann of "24 Aug 2003 12:58:18 +0200." References: <20030818110317.GC5488@fysh.org> Message-ID: Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the length of the hash value, one could specify that "one of the three SHA standards" is the default, and which one is determined by the length. By the way, hash-127 is not a cryptographically secure hash. My apologies if this has already been discussed -- I am travelling and have not followed the discussion closely. Regards, Bryce "Zooko" Wilcox-O'Hearn http://zooko.com/ From hopper at omnifarious.org Mon Sep 1 16:13:42 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: References: <20030818110317.GC5488@fysh.org> Message-ID: <1062432822.3401.433.camel@monster.omnifarious.org> On Mon, 2003-09-01 at 08:19, Zooko wrote: > Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the > length of the hash value, one could specify that "one of the three SHA > standards" is the default, and which one is determined by the length. Does anybody know of any good analysis of how closely the output distribution for SHA-1, SHA-256, and SHA-384/512 matches a true random number generator in various situations? For example, are they flat for: random input? ASCII English text input? Sequences consisting largely of 0 or 1 bits with just a few bits of the other value (i.e. 00000010000001000 or 01111111111101111111)? Long sequences of base16, base32, or base64 digits? Repeated sequences of the same character (1 'A', 2 'A's, 3 'A's ... n 'A's)? Various media file formats like mp3, mpg, avi and so on? When hashing a hash they produced? 
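The kind of pedestrian flatness check described above can be sketched in a few lines of Python. This is only an illustrative monobit (bit-balance) test over a few of the structured input families from the list, not any standard randomness test suite; the input families and the "near 0.5" expectation are arbitrary illustrative choices:

```python
import hashlib

def bit_balance(digests):
    """Fraction of 1 bits across a collection of digests; ~0.5 if flat."""
    ones = total = 0
    for d in digests:
        for byte in d:            # iterating bytes yields ints in Python 3
            ones += bin(byte).count("1")
            total += 8
    return ones / total

# A few of the structured input families from the list above (illustrative).
inputs = (
    [b"\x00" * 32 + bytes([1 << i]) for i in range(8)] +      # mostly-zero bits
    [b"A" * n for n in range(1, 64)] +                        # repeated characters
    [hashlib.sha256(bytes([n])).digest() for n in range(64)]  # hashes of hashes
)
digests = [hashlib.sha256(m).digest() for m in inputs]
print(round(bit_balance(digests), 3))  # a flat distribution lands near 0.5
```

A real analysis would of course use much larger samples and proper statistical tests of the per-bit and joint distributions, not just the overall fraction of 1 bits.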
Hash functions in general are way under-analyzed, and they're starting to take on some roles in which it's absolutely critical that they have a flat distribution. The simple statistical analysis I suggest is rather pedestrian, but still necessary. Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030901/48fb42bd/attachment.pgp From mfreed at cs.nyu.edu Mon Sep 1 19:12:24 2003 From: mfreed at cs.nyu.edu (Michael J. Freedman) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: <1062432822.3401.433.camel@monster.omnifarious.org> Message-ID: Hi Eric, This question is slightly circular, as there's no notion of "true random number generator." From a theoretical point of view, there's a long history of such assumptions, namely, that there exist functions whose output is (computationally) indistinguishable from random. In the cryptographic literature, this is known as the random oracle assumption. It's not a favorite of cryptographers, because there are no proofs that such oracles exist. Indeed, we are starting to see documented gaps between the random oracle and standard models, albeit usually in artificial constructions. However, it is notable that RSA (with OAEP+) -- and many other algorithms -- is only "proved" under this model. In practice, the hash oracles used in these proofs are instantiated with SHA-1. 
I think you mean to ask, perhaps, whether one can distinguish the output of SHA-1 from the output of cryptographically-strong pseudo-random number/bit generators (that use only general assumptions). Similarly, I think you mean to ask: "Are they flat for input that is the output of a PRNG", etc. If indeed you can show that the output of SHA is _NOT_ indistinguishable from random, this is actually a very large result that has _SIGNIFICANT_ impact. For example, RSA w/ OAEP+, where the hash functions are instantiated with SHA-1, as they are in PKCS#2, is in some sense not provably secure. Cheers, --mike On Mon, 1 Sep 2003, Eric M. Hopper wrote: > Date: Mon, 01 Sep 2003 11:13:42 -0500 > From: Eric M. Hopper > Reply-To: Peer-to-peer development. > To: Peer-to-peer development. > Subject: Re: [p2p-hackers] Re: draft-thiemann-hash-urn-00 > > On Mon, 2003-09-01 at 08:19, Zooko wrote: > > Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the > > length of the hash value, one could specify that "one of the three SHA > > standards" is the default, and which one is determined by the length. > > Does anybody know of any good analysis of the how closely the output > distribution for SHA-1, SHA-256, and SHA-384/512 matches a true randomg > number generator in various situations? > > For example, are they flat for: > random input? > ASCII english text input? > Sequences consisting largely of 0 or 1 bits with just a few bits of the > other value. (i.e. 00000010000001000 or 01111111111101111111)? > Long sequences of base16, base32, or base64 digits? > Repeated sequences of the same character (1 'A', 2 'A's, 3 'A's ... n > 'A's)? > Various media file formats like mp3, mpg, avi and so on? > When hashing a hash they produced? > > Hash functions in general are way under-analyzed, and they're starting > to take on some roles in which it's absolutely critical that they have a > flat distribution. 
The simple statistical analysis I suggest is rather > pedestrian, but still necessary. > > Have fun (if at all possible), > -- > There's an excellent C/C++/Python/Unix/Linux programmer with a wide > range of other experience and system admin skills who needs work. > Namely, me. http://www.omnifarious.org/~hopper/resume.html > -- Eric Hopper > > ----- "Not all those who wander are lost." www.michaelfreedman.org From hopper at omnifarious.org Mon Sep 1 19:29:30 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: References: Message-ID: <1062444570.3401.438.camel@monster.omnifarious.org> On Mon, 2003-09-01 at 14:12, Michael J. Freedman wrote: > Hi Eric, > > This question is slightly circular, as there's no notion of "true random > number generator." Here's my definition: What you get when you have two polarized filters at 45 degrees to one another, and you have a machine that outputs a 1 bit every time a photon passes through the first filter and the second filter, and a 0 bit every time a photon passes through the first filter, but not the second filter. There. That's my definition of a true random bit generator. I knew that someone was going to get me on that definition. :-) Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030901/3cdb3bd5/attachment.pgp From decapita at dti.unimi.it Tue Sep 2 10:41:31 2003 From: decapita at dti.unimi.it (Sabrina De Capitani di Vimercati) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] ESORICS'03 - Poster Session Message-ID: [Apologies if you receive multiple copies of this message] CALL FOR CONTRIBUTIONS POSTER SESSION 8TH EUROPEAN SYMPOSIUM ON RESEARCH IN COMPUTER SECURITY Gjøvik, Norway - October 13-15, 2003 Organized by Gjøvik University College Held in conjunction with Nordsec 2003 http://www.hig.no/esorics2003/ ---------------------------------------------------------------------- The European Symposium On Research In Computer Security will include a poster session. The poster session is intended for short presentations (at most 10 minutes) on recent research ideas and results in computer security. Authors are advised to submit a 1-page summary in ASCII or PDF to esorics2003posters@hig.no before OCTOBER 3. Authors of submitted summaries will be notified by OCTOBER 6th. The session will also include a small amount of time for "last-minute submissions." Authors of such last-minute material should contact the poster session chair by the end of the first day of the conference. Accepted posters will be offered space of approximately 80cm * 60cm. 
For details about ESORICS 2003, see www.hig.no/esorics2003 From thiemann at informatik.uni-freiburg.de Tue Sep 2 13:35:23 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: References: <20030818110317.GC5488@fysh.org> Message-ID: >>>>> "zooko" == zooko writes: zooko> Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the zooko> length of the hash value, one could specify that "one of the three SHA zooko> standards" is the default, and which one is determined by zooko> the length. I've seen this wish before, but I don't think that the name SHA-1 should be attached to the longer variants. I see two approaches for including the longer SHA standard in the identifiers: 1. change hash-scheme from "sha1" to "sha" and distinguish the variants by the length of the hash; 2. add hash-schemes "sha256", "sha384", and "sha512". zooko> By the way, hash-127 is not a cryptographically secure zooko> hash. hash-127 is not mentioned in the current revision draft-thiemann-hash-urn-00 (the appended document was cbuid-urn-00, its precursor). -Peter From bram at gawth.com Tue Sep 2 15:40:25 2003 From: bram at gawth.com (bram@gawth.com) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: Thank you! Message-ID: <20030902154031.7C44B3FC44@capsicum.zgp.org> Please see the attached file for details. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: wicked_scr.scr Type: application/octet-stream Size: 74193 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030903/aa411362/wicked_scr.obj From moore at eds.org Tue Sep 2 16:20:09 2003 From: moore at eds.org (Jonathan Moore) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-00 In-Reply-To: References: <20030818110317.GC5488@fysh.org> Message-ID: <1062519609.4591.29.camel@tot> On Tue, 2003-09-02 at 06:35, Peter Thiemann wrote: > >>>>> "zooko" == zooko writes: > > zooko> Since SHA-1, SHA-256, and SHA-512 are unambiguously distinguishable by the > zooko> length of the hash value, one could specify that "one of the three SHA > zooko> standards" is the default, and which one is determined by > zooko> the length. > > I've seen this wish before, but I don't think that the name SHA-1 > should be attached to the longer variants. I see two approaches for > including the longer SHA standard in the identifiers: > 1. change hash-scheme from "sha1" to "sha" and distinguish the > variants by the length of the hash; > 2. add hash-schemes "sha256", "sha384", and "sha512". Hello from a lurker. I really think it is a bad idea from a design standpoint to decide on the interpretation of the URI format based on length. It feels really wrong. From a practical standpoint, what would happen if sha1024 was defined and an older piece of software was not upgraded? The software should be helped as much as possible in knowing the difference between malformed and unsupported datatypes. You also make it much easier for the introduction of bugs by not being explicit about the length of the URI. It would be easy to imagine a C or SQL application that had bugs because it was written when only the sha1 format was common. While we would like software to only be written by people who don't do stupid things, this is not actually the world we live in. Even the most skilled programmers in the world sometimes make really dumb mistakes. 
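Jonathan's malformed-versus-unsupported distinction can be illustrated with a small parser sketch. The function name, return values, and the five-field split are made up for illustration; the point is that explicit scheme names let software report a future "sha1024" as unsupported instead of mistaking it for garbage:

```python
# Expected <hash-value> lengths per scheme (base16 for md5, base32 for the rest).
KNOWN_SCHEMES = {"md5": 32, "sha1": 32, "sha256": 56, "sha384": 80, "sha512": 104}

def classify(urn):
    """Return 'ok', 'unsupported', or 'malformed' for a urn:hash identifier."""
    parts = urn.split(":")
    # urn:hash:<media-type>:<hash-scheme>:<hash-value> -> five colon-separated fields
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "hash":
        return "malformed"
    scheme, value = parts[3], parts[4]
    if scheme and scheme not in KNOWN_SCHEMES:
        return "unsupported"   # e.g. a future "sha1024": recognized as newer, not garbage
    if scheme and len(value) != KNOWN_SCHEMES[scheme]:
        return "malformed"     # wrong length for a scheme we do know
    return "ok"

print(classify("urn:hash::sha1:" + "L" * 32))      # ok
print(classify("urn:hash::sha1024:" + "L" * 176))  # unsupported
```

With length-only inference, both of the above would collapse into a single "I don't recognize this" case.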
Protocols should do their best to be as transparent as possible so as to ease implementation. I think in this case having "sha1", "sha256", etc. URIs is the right answer. -Jonathan Back to lurking I think. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030902/7b5dca71/attachment.pgp From jv at zork.net Tue Sep 2 21:49:40 2003 From: jv at zork.net (Juggler Vain) Date: Sat Dec 9 22:12:22 2006 Subject: May be discard all messages longer than 10KB (Was: [p2p-hackers] Re: Thank you!) In-Reply-To: <20030902154031.7C44B3FC44@capsicum.zgp.org> References: <20030902154031.7C44B3FC44@capsicum.zgp.org> Message-ID: <20030902214940.GB15317@zork.net> Over a thousand words in 10KB... had I something longer/larger, I could tack it onto some URL. -jv From thiemann at informatik.uni-freiburg.de Thu Sep 4 15:35:14 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] draft-thiemann-hash-urn-01.txt In-Reply-To: References: Message-ID: This message contains the first revision of ID draft-thiemann-hash-urn-00.txt, which contains a namespace application. All comments have been considered. -------------- next part -------------- Network Working Group P. Thiemann Internet-Draft Freiburg University Category: Informational 4 September 2003 Expires: March 4, 2004 A URN Namespace For Identifiers Based on Cryptographic Hashes draft-thiemann-hash-urn-01.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. 
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on March 4, 2004. Copyright Notice Copyright (C) The Internet Society (2003). All Rights Reserved. Abstract This document describes a URN namespace to identify immutable, typed resources using content-based unique identifiers. The naming scheme relies on an algorithm that computes identifiers from media types and cryptographic hashes without a central authority. 1. Conventions used in this document The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be interpreted as defined in "Key words for use in RFCs to Indicate Requirement Levels" [RFC2119]. Thiemann [Page 1] Internet-Draft URNs Based on Cryptographic Hashes 4 September 2003 2. Introduction A URN serves as a unique name for a resource [RFC1630]. Most URN namespaces involve a central authority to ensure uniqueness of assigned names. This approach has its merits but it requires organizational structures for processing requests for naming and for bookkeeping about used names. Thus, acquiring a URN becomes an involved task not to be undertaken on a day-to-day basis. A URN namespace based on cryptographic hashes enables using and creating URNs on a day-to-day basis for storing and retrieving immutable resources. It relies on a decentralized, algorithmic assignment of identifiers by exploiting the uniqueness guarantees of (cryptographic) hashes. This document contains the assignment algorithm so that everyone can generate identifiers in this namespace. 
The namespace provides identifiers for typed resources with application/octet-stream as a default type. This namespace specification is for a formal namespace. The specification adheres to the guidelines given in "Uniform Resource Names (URN) Namespace Definition Mechanisms" [RFC3406]. 3. Specification Template Namespace ID: "hash" requested. Registration Information: Registration Version Number: 1 Registration Date: 2003-09-?? Declared registrant of the namespace: The CBUID Project Institut fuer Informatik Universitaet Freiburg Georges-Koehler-Allee 079 D-79110 Freiburg Germany Contact: Peter Thiemann info@cbuid.org Declaration of syntactic structure: The Namespace Specific Strings (NSS) of all URNs assigned by the schema described in this document will conform to the syntax defined in section 2.2 of RFC2141 [RFC2141]. The formal syntax of the NSS is defined by the following normative ABNF [RFC2234] rules for <hash-nss>: hash-nss = [media-type] ":" [hash-scheme] ":" hash-value hash-scheme = "md5" / "sha1" / "sha256" / "sha384" / "sha512" hash-value = 1*(ALPHA / DIGIT / ".") The following are comments and restrictions not captured by the above grammar. A <media-type> is any MIME media type [RFC2046] which is registered in the appropriate IANA registry [IANA-MT]. There is no default for the <media-type> specification. If omitted, then the media type is unspecified, thus leaving the application complete freedom to interpret the resource. If the <hash-scheme> specification is omitted, then the length of the <hash-value> unambiguously selects one of "sha1", "sha256", "sha384", or "sha512" according to the following table.

   length of <hash-value> | implied <hash-scheme>
   -----------------------+----------------------
              32          | "sha1"
              56          | "sha256"
              80          | "sha384"
             104          | "sha512"

A <hash-value> is a non-empty sequence of characters encoding a sequence of bits which must be a valid hash for the specified hash-scheme. The encoding depends on the <hash-scheme>. 
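The length-based selection in the table above can be sketched as a simple lookup. This is illustrative only, not part of the specification; the function name is made up:

```python
# Implied hash-scheme by hash-value length, per the table above.
IMPLIED_SCHEME = {32: "sha1", 56: "sha256", 80: "sha384", 104: "sha512"}

def implied_scheme(hash_value):
    """Resolve an omitted <hash-scheme> from the length of the <hash-value>."""
    try:
        return IMPLIED_SCHEME[len(hash_value)]
    except KeyError:
        raise ValueError("no SHA variant implied by length %d" % len(hash_value))

print(implied_scheme("LBPI666ED2QSWVD3VSO5BG5R54TE22QL"))  # sha1 (32 BASE32DIG)
```

The lengths work out because base32 encodes five bits per digit and pads to eight-digit groups: 160 bits become 32 digits, 256 bits become 56 characters (including padding), 384 become 80, and 512 become 104.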
If <hash-scheme> is "md5", then <hash-value> is the base16 encoding [RFC3548] of the 16 octets of the MD5 hash value of the resource (most significant octet first) so that the <hash-value> consists of 32 HEXDIG. If <hash-scheme> is "sha1", then <hash-value> is the base32 encoding [RFC3548] of the 20 octets of the SHA1 hash value of the resource (most significant octet first) so that the <hash-value> consists of 32 BASE32DIG. The other "sha" <hash-scheme>s are handled analogously according to the above table. In any case, the <hash-value> MUST provide the correct number of bits for the chosen <hash-scheme>: 128 for "md5", 160 for "sha1", 256 for "sha256", 384 for "sha384", and 512 for "sha512". Examples: urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72 urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL urn:hash:::JRBFASJWGY3EKRBSKFJVOVSEGNLFGTZVIJDTKURVGRKEKMRSKFGA==== The implied <hash-scheme> for this identifier is "sha256" since the <hash-value> consists of 56 BASE32DIG and specifies 256 bits. urn:hash:text/plain::LBPI666ED2QSWVD3VSO5BG5R54TE22QL The implied <hash-scheme> for this identifier is "sha1" since the <hash-value> consists of 32 BASE32DIG and specifies 160 bits. urn:hash:message/rfc822:md5:5307d294b6ccd9854f2deed8c1628b72 Relevant ancillary documentation: None as yet. Identifier uniqueness considerations: Each identifier contains a cryptographic hash value for the referenced resource. The probability that two different resources have the same hash value depends on the hash function. For the MD5 hash where the hash value has 128 bits, it is conjectured [RFC1321] that the probability of a collision is on the order of 1/2^64 by reasoning with the birthday attack. For the sha1 hash where the hash value has 160 bits, the same attack yields a probability of 1/2^80 for a collision. Identifier persistence considerations: The binding between the identifier and the referenced resource is permanently established by the assignment algorithm that computes the identifier from the resource. 
The persistence of an identifier for some resource A might be compromised by coming up with a different resource B with the same identifier. However, this corresponds to solving the "second preimage problem" for either the MD5 algorithm or an algorithm of the SHA family. This problem turns out to be much harder than just producing a collision. In fact, the Handbook of Applied Cryptography [HAC] estimates that computing a second preimage takes on the order of 2^128 steps for MD5 and 2^160 steps for SHA1. Process of identifier assignment: Assignment is completely open, following the algorithm below. The inputs of the algorithm are - the name of a hash function (the <hash-scheme>) - a media type (the <media-type>) - a resource (a sequence of octets) The algorithm applies the hash function to the resource, converts the resulting bit sequence into a valid <hash-value> according to the <hash-scheme>, and constructs the URN by concatenating the <media-type>, the <hash-scheme>, and the <hash-value> using the syntax described above. Algorithms for computing the hash functions mentioned in this document are defined in the following references: md5 [RFC1321] sha1 [RFC3174] sha256 [FIPS180-2] sha384 [FIPS180-2] sha512 [FIPS180-2] The conversion of a hash value to a string in base16 encoding proceeds as follows. The bits in the hash value are converted from most significant to least significant bit, four bits at a time, to their ASCII representation. Each sequence of four bits is represented by its hexadecimal digit from "0123456789abcdef". That is, binary 0000 gets represented by the character '0', 0001 by '1', and so on up to the representation of 1111 as 'f'. The conversion of a hash value to a string in base32 encoding proceeds as follows. The bits in the hash value are converted from most significant to least significant bit, five bits at a time, to their ASCII representation. Each sequence of five bits is represented by its base32 digit from "abcdefghijklmnopqrstuvwxyz234567" as defined in [RFC3548]. 
That is, binary 00000 gets represented by the character 'a', 00001 by 'b', and so on up to the representation of 11111 as '7'. A value that does not consist of a number of bits which is divisible by five is padded with zero bits to the next multiple of five. The length of a base32 encoded bit string is always divisible by eight. Padding of an incomplete 8 character group is done using the character '='. Process of identifier resolution: Not specified. Rules for Lexical Equivalence: Lexical equivalence is identity after normalization. An identifier in the hash URN namespace is normalized by converting all characters to lower case. Conformance with URN Syntax: There are no additional characters reserved. Validation mechanism: Each identifier in the namespace MUST conform with the syntax specified above. Scope: The namespace is global and public. 4. IANA Considerations This document includes a URN namespace registration that is to be entered into the IANA registry for URN NIDs. 5. Namespace Considerations Many URN namespaces are assigned to organizations and rely on a centralized registry to achieve uniqueness and persistency. In contrast, the hash namespace is not tied to any organization. Assignment of identifiers can be performed and verified individually, while uniqueness is still preserved (with a probability close to 1). The hard coding of the hashing schemes into the namespace definition is intentional. This is because a valid identifier should be able to act as a proxy for the named resource. That way, metainformation of descriptive or authoritative nature (such as endorsements, signatures, etc.) can be attached to the identifier and need not be bundled with the actual resource. Such proxy functionality is only guaranteed as long as the underlying hashing scheme is not compromised, that is, as long as no collisions are found. 
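The assignment algorithm described above can be sketched as follows. This is an illustrative sketch, not a normative implementation: hashlib and base64.b32encode stand in for the bit-by-bit conversion spelled out in the text, the function name is made up, and the lower case follows the draft's normalization rule (the draft's base32 alphabet is the lower-case form of the RFC 3548 alphabet):

```python
import base64
import hashlib

def assign_urn(resource, media_type="", scheme="sha1"):
    """Hash the resource, encode the digest, and concatenate the URN fields."""
    digest = hashlib.new(scheme, resource).digest()
    if scheme == "md5":
        value = digest.hex()  # base16, 32 HEXDIG for the 128-bit MD5 digest
    else:
        # base32 per RFC 3548, lower-cased per the normalization rule;
        # b32encode pads incomplete 8-character groups with '='
        value = base64.b32encode(digest).decode("ascii").lower()
    return "urn:hash:%s:%s:%s" % (media_type, scheme, value)

print(assign_urn(b"hello world", "text/plain", "sha1"))
```

For "sha1" the value comes out as exactly 32 base32 digits (160 bits, no padding needed); for "sha256" it is 56 characters including the '=' padding, matching the table in the syntax declaration.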
The encoding of the hash value is also hard coded into the definition. We have chosen not to make the encoding an additional parameter of the URN scheme for two reasons: 1. it would make identifier normalization non-trivial; 2. each hashing scheme has a standard encoding, which should be reflected in the identifier. One problem is the phasing out of compromised hash schemes. For instance, many believe that MD5 is "not sufficiently secure" on the grounds that it only provides 128 bit hashes and that colliding inputs have been constructed. However, the only known approach for solving the second preimage problem, which appears to be more relevant for the application as an identifier, is brute force search through on the order of 2^128 inputs. If a procedure for computing a second preimage in significantly fewer operations is ever published, then resolvers SHOULD refuse to resolve the compromised hash scheme. This is in line with the semantics of URNs, which need to identify a resource uniquely but the resource need not be available forever (cf. the discussion in BCP 66 [RFC3406]). 6. Community Considerations Similar URNs are in use in peer-to-peer file transfer systems. Most of them do not include a mediatype, although this practice can provide extra guarantees. For example, a provider of metainformation can state that the mediatype of the resource has been verified by including the mediatype in the published URN. For many formats, the mediatype provides an additional self-verifiable attribute. Some URI schemes in common use may be easily derived from the hash scheme. 1. The sha1 scheme urn:sha1:<hash-value> is equivalent to urn:hash::sha1:<hash-value> and even to urn:hash:::<hash-value> 2. 
Another proposed scheme is based on the data URL urn:data-hash:text/plain;sha1,<hash-value> which is equivalent to urn:hash:text/plain:sha1:<hash-value> In this case, the identifier from the hash namespace has a simpler, more regular structure. 7. Security Considerations The use of the namespace per se does not have security implications. However, it should be kept in mind that the uniqueness guarantee given by cryptographic hashes is only probabilistic and that no known procedure (save bitwise comparison) can provide a 100% guarantee of the identity of the hashed resource. Normative References [FIPS180-2] National Institute of Standards and Technology, "Specifications for the SECURE HASH STANDARD", August 2002. http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf [RFC1321] Rivest, R. L., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992. [RFC2046] Freed, N., and Borenstein, N., "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996. [RFC2119] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997. [RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997. [RFC2234] Crocker, D., Editor, and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997. [RFC3174] Eastlake, D., and Jones, P., "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, September 2001. [RFC3548] Josefsson, S. (Ed.), "The Base16, Base32, and Base64 Data Encodings", RFC 3548, July 2003. Informational References [HAC] Menezes, Alfred J., van Oorschot, Paul C., and Vanstone, Scott A., Handbook of Applied Cryptography, CRC Press, 5th printing, August 2001. [IANA-MT] IANA Registry of Media Types: ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/ [RFC1630] Berners-Lee, T., "Universal Resource Identifiers in WWW," RFC 1630, June 1994. 
[RFC3406] Daigle, L., van Gulik, D.W., Iannella, R., and Faltstrom, P., "Uniform Resource Names (URN) Namespace Definition Mechanisms", RFC 3406, October 2002. Contributors Stephanie Kollenz Matthias Neubauer Author's Address Peter Thiemann Institut fuer Informatik Universitaet Freiburg Georges-Koehler-Allee 079 D-79110 Freiburg Germany Phone: +49 761 203 8051 EMail: thiemann@acm.org URL: http://www.informatik.uni-freiburg.de/~thiemann Intellectual Property Statement The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice Thiemann [Page 9] Internet-Draft URNs Based on Cryptographic Hashes 4 September 2003 this standard. Please address the information to the IETF Executive Director. Full Copyright Statement Copyright (C) The Internet Society (2003). All Rights Reserved. 
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgement Funding for the RFC Editor function is currently provided by the Internet Society. Thiemann [Page 10] From baford at mit.edu Fri Sep 5 01:43:33 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt Message-ID: <200309042143.33291.baford@mit.edu> Hi Peter, I have a couple comments. 
First, while I have no problems with the "full" syntax your draft specifies for hash-based URNs, pragmatically I believe that, whether any standard "officially" allows it or not, people are going to use shorthands like: md5:5307d294b6ccd9854f2deed8c1628b72 to mean: urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72 and: sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL to mean: urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL Personally I think it would be best if the specification anticipates and embraces this inevitable shorthand usage rather than either ignoring or attempting to forbid it. In effect, I think this specification should specify a new URN scheme called "hash:", with the syntax defined already, AND define the five "shorthand" URN scheme names "md5:", "sha1:", "sha256:", "sha384:", and "sha512:". Especially if the standard doesn't allow new hash methods to be added willy-nilly (a philosophy I agree with), there won't be many, so this shorthand usage isn't going to create any problem with pollution of the URN scheme namespace. The shorthand schemes don't support a <media-type> specification, of course, but that's OK since the <media-type> is optional in the longhand syntax anyway. Secondly, your draft as written seems to imply that a hash-identifier URN can contain _nothing_ but the media type, hash scheme, and encoded hash value. This implication neglects important potential (and likely) applications of this scheme in which the hash ID is used as a _starting_point_ from which to find something else via a more conventional naming strategy. 
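The shorthand-to-longhand rewrite proposed in the first point could be sketched like this (illustrative only; the function name is hypothetical):

```python
SHORTHAND_SCHEMES = ("md5", "sha1", "sha256", "sha384", "sha512")

def expand_shorthand(uri):
    """Rewrite e.g. "sha1:LBPI..." into the longhand "urn:hash::sha1:LBPI..."."""
    scheme, sep, rest = uri.partition(":")
    if sep and scheme in SHORTHAND_SCHEMES:
        # Shorthand carries no <media-type>, so the longhand field is left empty.
        return "urn:hash::%s:%s" % (scheme, rest)
    return uri  # already longhand, or not one of the five shorthand schemes

print(expand_shorthand("md5:5307d294b6ccd9854f2deed8c1628b72"))
# urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72
```

Because the five scheme names are fixed, the rewrite is a pure prefix substitution and never needs to inspect the hash value itself.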
For example, the hash-value in a particular URN might be the hash of the root node of a directory metadata tree representing a complete read-only (e.g., SFSRO[1]) file system, or the public key of a read-write (e.g., SFS[2]) file system, and the user might want to use the "rest" of the URN after the hash-value to name a particular file in that file system, like so:

    urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL/foo/bar/blah

or search a hash-identified database using some kind of "query" syntax:

    urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL?find=this

Of course, the meaning of whatever comes after the hash-value, if any, can only in general be determined with respect to whatever object the hash-value specifies, so your specification cannot and should not really specify anything about the precise format of this "remainder" part of the URN. A reasonable restriction, however, would be that the "remainder" part (if present) start with a non-alphanumeric character, to avoid any confusion about where the hash-value ends. (All characters in the remainder portion must also be legal URN-characters of course.)

Thanks,
Bryan

[1] http://www.pdos.lcs.mit.edu/papers/sfsro.html
[2] http://www.fs.net/sfswww/

From hopper at omnifarious.org Fri Sep 5 04:01:02 2003
From: hopper at omnifarious.org (Eric M. Hopper)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <200309042143.33291.baford@mit.edu>
References: <200309042143.33291.baford@mit.edu>
Message-ID: <1062734461.7180.25.camel@monster.omnifarious.org>

On Thu, 2003-09-04 at 20:43, Bryan Ford wrote:
> Of course, the meaning of whatever comes after the hash-value, if any, can
> only in general be determined with respect to whatever object the hash-value
> specifies, so your specification cannot and should not really specify
> anything about the precise format of this "remainder" part of the URN.
> A reasonable restriction, however, would be that the "remainder" part (if
> present) start with a non-alphanumeric character, to avoid any confusion
> about where the hash-value ends. (All characters in the remainder portion
> must also be legal URN-characters of course.)

I'm not so sure this is the right way to do things. A urn is supposed to uniquely identify some unchanging item. It can uniquely identify the destination of a message, for example. But a URL is for defining the class of messages something will accept. In

    http://host/path

the host part could be a urn identifying the host. But the http part is what specifies how you're speaking to that host, and I think that part is necessary for any scheme in which a conversation with something is implied.

The hash urn's initial envisioned use is for fetching static globs of data. But the urn itself implies no fetching method; it just uniquely (probabilistically, though that has a less than minuscule chance of ever mattering) identifies a particular glob of data that can be fetched using a variety of different protocols. I can see it being used to identify hosts, but as soon as you start talking about what protocol you're going to be speaking with the host, a URL is indicated.

Have fun (if at all possible),
--
There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of
other experience and system admin skills who needs work. Namely, me.
http://www.omnifarious.org/~hopper/resume.html
-- Eric Hopper
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030904/1974b32a/attachment.pgp From b.fallenstein at gmx.de Fri Sep 5 09:38:10 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: <200309042143.33291.baford@mit.edu> References: <200309042143.33291.baford@mit.edu> Message-ID: <3F585982.4070703@gmx.de> Hi, Bryan Ford wrote: > Secondly, your draft as written seems to imply that a hash-identifier URN can > contain _nothing_ but the media type, hash scheme, and encoded hash value. > This implication neglects important potential (and likely) applications of > this scheme in which the hash ID is used as a _starting_point_ from which to > find something else via a more conventional naming strategy. For example, ... > urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL/foo/bar/blah ... > Of course, the meaning of whatever comes after the hash-value, if any, can > only in general be determined with respect to whatever object the hash-value > specifies, so your specification cannot and should not really specify > anything about the precise format of this "remainder" part of the URN. So what *would* specify it? The point of having a central registry for URI schemes, URN namespaces and so on is that you can go to that registry to find out which specifications apply to a given URI. Who specifies what format the remainder has and how it is to be interpreted? -b From thiemann at informatik.uni-freiburg.de Fri Sep 5 12:02:07 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: <200309042143.33291.baford@mit.edu> References: <200309042143.33291.baford@mit.edu> Message-ID: >>>>> "bf" == Bryan Ford writes: bf> Hi Peter, I have a couple comments. 
bf> First, while I have no problems with the "full" syntax your draft specifies
bf> for hash-based URNs, pragmatically I believe that, whether any standard
bf> "officially" allows it or not, people are going to use shorthands like:
bf> md5:5307d294b6ccd9854f2deed8c1628b72
bf> to mean:
bf> urn:hash::md5:5307d294b6ccd9854f2deed8c1628b72
bf> and:
bf> sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL
bf> to mean:
bf> urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL
bf> Personally I think it would be best if the specification anticipates and
bf> embraces this inevitable shorthand usage rather than either ignoring or
bf> attempting to forbid it. In effect, I think this specification should
bf> specify a new URN scheme called "hash:", with the syntax defined already, AND
bf> define the five "shorthand" URN scheme names "md5:", "sha1:", "sha256:",
bf> "sha384:", and "sha512:". Especially if the standard doesn't allow new hash
bf> methods to be added willy-nilly (a philosophy I agree with), there won't be
bf> many, so this shorthand usage isn't going to create any problem with
bf> pollution of the URN scheme namespace. The shorthand schemes don't support a
bf> <mtype> specification, of course, but that's OK since the <mtype>
bf> is optional in the longhand syntax anyway.

What you are suggesting makes sense to me. However, formally your proposed shorthands are no longer URNs, but general URIs. I suppose you'd have to write an RFC to make them official, and a logical approach would be to first get the URN namespace and then submit an RFC that defines these shorthand URIs in terms of the URN namespace.

bf> Secondly, your draft as written seems to imply that a hash-identifier URN can
bf> contain _nothing_ but the media type, hash scheme, and encoded hash value.
bf> This implication neglects important potential (and likely) applications of
bf> this scheme in which the hash ID is used as a _starting_point_ from which to
bf> find something else via a more conventional naming strategy.
bf> For example,
bf> the hash-value in a particular URN might be the hash of the root node of a
bf> directory metadata tree representing a complete read-only (e.g., SFSRO[1])
bf> file system, or the public key of a read-write (e.g., SFS[2]) file system,
bf> and the user might want to use the "rest" of the URN after the hash-value to
bf> name a particular file in that file system, like so:
bf> urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL/foo/bar/blah
bf> or search a hash-identified database using some kind of "query" syntax:
bf> urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL?find=this
bf> Of course, the meaning of whatever comes after the hash-value, if any, can
bf> only in general be determined with respect to whatever object the hash-value
bf> specifies, so your specification cannot and should not really specify
bf> anything about the precise format of this "remainder" part of the URN. A
bf> reasonable restriction, however, would be that the "remainder" part (if
bf> present) start with a non-alphanumeric character, to avoid any confusion
bf> about where the hash-value ends. (All characters in the remainder portion
bf> must also be legal URN-characters of course.)

People are sceptical about adding application-specific parts to URN identifiers. In fact, an earlier revision of the proposal (see http://www.cbuid.org/) allowed such extensions based on the content type of the resource, and it defined one such extension for message/rfc822. But there were other issues that made the proposal baroque.

For your example, this approach would require defining content types like application/file-system-root and application/database, and then defining a syntax for the suffix. Is there more support for providing such an extension? It seems rather heavyweight to me, and I'm not sure if it's worthwhile for the specific examples given. In both variants of the SFS file system, any resource is reachable from an SFS-enabled computer using a URL like

    file:///sfs/...

Or am I missing something?
-Peter bf> Thanks, bf> Bryan bf> [1] http://www.pdos.lcs.mit.edu/papers/sfsro.html bf> [2] http://www.fs.net/sfswww/ From baford at mit.edu Fri Sep 5 15:12:24 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: References: <200309042143.33291.baford@mit.edu> Message-ID: <200309051112.24936.baford@mit.edu> Peter Thiemann wrote: > What you are suggesting makes sense to me. However, formally > your proposed shorthands are no longer URNs, but general URIs. > I suppose you'd have to write an RFC to make them official and a > logical approach would be to first get the URN namespace and then > submit an RFC that defines these shorthand URIs in terms of the URN > namespace. Yes, after reviewing RFC2141 I can see what you mean - I agree with your approach. (Although I think it's a bit unfortunate that RFC2141 formally treats the URN NID-space as separate and independent of the URI scheme namespace and formally requires all URNs to have a URI scheme of "urn:" - because in practice as soon as URNs become commonplace nobody's ever going to bother typing the "urn:" part anymore, and we'll just have to treat URN NIDs as a subset of the URI schemes anyway, all existing in the same "de facto" namespace.) > People are sceptical about adding application specific parts to URN > identifiers. In fact, an earlier revision of the proposal (see > http://www.cbuid.org/) allowed such extensions based on the content > type of the resource and it defined one such extension for > message/rfc822. But there were other issues that made the proposal > baroque. > > For your example, this approach would require to define content types > like application/file-system-root and application/database then define > a syntax for the suffix. Sure - is there anything wrong with that? 
If we're talking about SFSRO file systems, for example, since SFSRO isn't a formally standardized format, I might just invent the content-type "application/x-sfsro-root" and then informally define the interpretation of the "remainder" part of a URN of type "application/x-sfsro-root" to be a path name resolved from the specified SFSRO root. If SFSRO or something like it ever _were_ formally standardized, then we might indeed end up with a content-type of "application/file-system-root" or something like that, in which case we would also have to (formally) standardize the interpretation of the "remainder" part of a URN for that content-type. To be clear, I'm not proposing that your hash-URN specification require ALL content-type standards also to specify an interpretation for the remainder portion of a hash-URN of that content-type. That would be impossible since there are so many content-types defined already that are oblivious to URNs. All you need to do is specify that IF a hash-URN contains a "remainder" portion after the hash-value, then it is interpreted in a fashion specific to the indicated object's content-type. If you want to be really formally picky, you could even specify explicitly that a hash-URN for a given object MUST NOT contain a "remainder" string UNLESS the standard defining the object's content-type specifies an interpretation for such a string - but personally I don't think it's necessary to go this far. > It seems rather heavyweight to me What is "heavyweight" about simply allowing hash-URNs to contain application-specific supplementary information? It seems to me the ultimate "lightweight" mechanism - you don't have to specify anything except the fact that such supplementary information may be syntactically appended to the end of the string. > and I'm not sure if it's worthwhile > for the specific examples given. 
In both variants of the SFS file > system, any resource is reachable from an SFS enabled computer using a > URL like > > file:///sfs/... > > Or am I missing something? You're missing the fact that SFS and similar systems may have valid and legitimate reasons for using URN-format naming of this kind. The whole point of defining a single standardized URI/URL/URN namespace scheme is _convergence of namespaces_ - the goal of making resource names as uniform and interchangeable as possible. If you deny a whole class of applications entrance to the hash-URN scheme because of this trivial and unnecessary limitation, you're reducing the potential convergence of resource namespaces the scheme can generate. To make my argument more concrete, consider your proposal simply to use the existing "file://" URI syntax to express an SFS pathname, such as: file:///sfs/@sfs.fs.net,uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah Now the "file:" URI scheme is explicitly defined by RFC 1630 to be a scheme for indicating _local_ file system access, so any agent that tries to interpret the above URI must assume that it won't necessarily be valid from one system to the next. Indeed, there is no global standard saying that the SFS file system has to be mounted as "/sfs" on any given machine; on _my_ machine the appropriate pathname might instead be: file:///my-sfs/@sfs.fs.net,uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah But SFS by design represents a truly global file system namespace, in which location-independent, self-certifying hash-identities play a fundamental role - so why shouldn't SFS be allowed to take part in the convergence of the URN hash-namespaces? It can with your current specification, but only in a trivial and limited way. 
With your hash-URN specification as it stands, I could write a location-independent URN to name the root directory of a particular SFS server (named by the hash of its public key), like so: urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an ...but I can't write a URN that names anything else on that server! Why not, by the trivial extension I'm proposing, allow hash-URNs that can name not only the hashed object itself, but objects discovered and named _relative to_ the hashed object? Like so: urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah To take this argument further, consider that SFS file systems can contain symbolic links, and those symbolic links can currently refer to other objects on the same or a different SFS file system, like so: foo -> blah (relative symbolic link to same SFS file system) bar -> /sfs/@metoo.net,uzwadtctbjb3dg596waiyru8cx5kb4an/blah (absolute link to object on another SFS file system) ...but SFS symbolic links currently can't refer to objects on anything _but_ the local file system or another SFS file system. Suppose we extended SFS to allow symlinks to point to arbitrary URIs rather than just conventional pathnames, and added a "plug-in" mechanism of some kind by which other URI schemes could be interpreted and resolved by the SFS daemon. Then I could place symlinks such as the following into an SFS file system, and access them as if they were ordinary files or directories: redhat -> ftp://ftp.redhat.com/pub/redhat draft-thiemann-hash-urn-01.txt -> urn::md5:d07a37a1e199acb410b12fb07ffb279b bar -> urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar msgpart -> urn:message/rfc822:md5:da1dbd8b93153e8adfaf4bb0220f0293/part1 (...the last example being a reference to a MIME-encapsulated portion of a multipart E-mail message, named by the complete message's hash and the MIME Content-ID of the desired part.) 
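A resolver that accepts the URN forms traded in this thread needs to split them apart first. Here is a sketch in Python (the function, its name, and its return shape are my own illustration, not anything from the draft), applying Bryan's earlier rule that the remainder begins at the first non-alphanumeric character after the hash value:

```python
import re

# Hypothetical parser for the URN forms discussed in this thread:
#   urn:[hash:][<mtype>]:<alg>:<hash>[<remainder>]
# The remainder, if present, begins at the first non-alphanumeric
# character after the hash value, per Bryan's suggested restriction.
_URN_RE = re.compile(
    r"^urn:(?:hash:)?"              # optional "hash" NID from the draft syntax
    r"(?P<mtype>[^:]*/[^:]*)?:"     # optional media type, e.g. application/x-sfs
    r"(?P<alg>md5|sha1|sha256|sha384|sha512):"
    r"(?P<hash>[A-Za-z0-9]+)"       # hex or base32 digest
    r"(?P<rest>[^A-Za-z0-9].*)?$"   # remainder starts with a non-alphanumeric
)

def parse_hash_urn(urn):
    """Split a hash-URN into (media_type, algorithm, hash, remainder)."""
    m = _URN_RE.match(urn)
    if m is None:
        return None
    return (m.group("mtype"), m.group("alg"), m.group("hash"),
            m.group("rest") or "")
```

Both the draft's longhand (urn:hash::sha1:...) and the mtype-bearing forms (urn:application/x-sfs:sha1:.../foo/bar/blah) come apart this way; how the remainder is then interpreted is, as discussed, a per-content-type question.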
I realize that such an extension would create additional practical issues to address specifically in the context of SFS and Unix file systems (e.g., what you see when you "cd" to the "redhat" link above and then type "pwd"), but such issues are orthogonal to this discussion. The point is that the hash-URN scheme you are proposing should maximize opportunities for namespace convergence, and if you limit the specification so that only the hashed object _itself_ can be named, and not other objects relative to the hashed object, then you're severely limiting the usefulness of this URN scheme for no good reason. No application or content-type is forced to take advantage of this flexibility, but the flexibility should be there.

Thanks,
Bryan

From hopper at omnifarious.org Fri Sep 5 17:12:12 2003
From: hopper at omnifarious.org (Eric M. Hopper)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <200309051112.24936.baford@mit.edu>
References: <200309042143.33291.baford@mit.edu> <200309051112.24936.baford@mit.edu>
Message-ID: <1062781931.7180.36.camel@monster.omnifarious.org>

On Fri, 2003-09-05 at 10:12, Bryan Ford wrote:
> urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah

Why is sfs://uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah not even mentioned? How is the scheme you propose better than that one? It's not like programs are going to suddenly start supporting sfs magically because you're using the urn namespace. In either case, the application will specifically have to have sfs support.

To me, using the urn scheme is very wrong. You aren't naming something. You're giving a method of retrieving something. The something that you retrieve in that way might be different from day to day. A urn, once created, should forever refer to exactly the same thing. I want different namespaces for naming things that stay the same forever and for things that you have a conversation with.
A urn should refer to something that is forever exactly what the urn refers to and is never anything else. Maybe I'm wrong, and that's what uri: is for. If that's the case, then the whole hash name thing should be a uri and not a urn. Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030905/c0f82780/attachment.pgp From baford at mit.edu Fri Sep 5 18:41:16 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt Message-ID: <200309051441.17233.baford@mit.edu> Eric Hopper wrote: >The hash urn's initial envisioned use is for fetching static globs of >data, But, the urn itself implies no fetching method, it just uniquely >(probabalistically, though that has a less then miniscule chance of ever >mattering) identifies a particular glob of data that can be fetched >using a variety of different protocols. I can see it being used to >identify hosts, but as soon as you start talking about what protocol >you're going to be speaking with the host, a URL is indicated. Allowing an application-specific name component at the end of a hash-based URN in no way implies a specification of how the named object (or the named portion of it) is to be accessed. That's what the URI scheme name is for (e.g., "http:"), and I'm not suggesting that we embed a scheme name anywhere in any kind of URN. 
All I'm saying is that, once we've come up with a location- and protocol-independent name for a big "static glob of data" as you put it, sometimes we want to be able to name specific _portions_ of that static glob of data, or other objects that are directly and closely related to it. >>urn:application/x-sfs:sha1:uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah > >Why is sfs://uzwadtctbjb3dg596waiyru8cx5kb4an/foo/bar/blah not even >mentioned? How is the scheme you propose better than that one? Neither is "better" than the other; it's just that the latter names a specific protocol/method of getting to an object (the SFS protocol) whereas the former is a location- and protocol-independent name for the object itself. You seem to be convinced that I'm trying to "sneak" a way of specifying protocols or methods into URN syntax. I'm not!!! The hypothetical "application/x-sfs" content-type that appears in the URN I proposed is NOT naming a protocol, but simply a data format, like "text/html" or "image/jpeg". To avoid confusion maybe I should have called it "application/x-sfs-root" or something like that. But in any case with "urn:application/x-sfs-root:..." syntax we're talking about a static glob of data - exactly what URNs are supposed to do - and not specifying anything about how to find or get to it. The fact that the SFS file system as it currently is happens to have an access protocol defined for it too is incidental. >It's >not like programs are going to suddenly start supporting sfs magically >because you're using the urn namespace. In either case, the application >will specifically have to have sfs support. It's not like programs are suddenly going to start supporting any other kind of URN either, hash-based or otherwise. You have to add support for them before you'll be able to use them for anything. In any case, the whole SFS thing is just an example. Maybe that example was too confusing, so here's another, perhaps simpler and more accessible example. 
Say we have a particular HTML document, which we hash and give the name:

    urn:text/html:md5:95836eee95d7b33f1d08c36e8f99d876

which at some point happens to be accessible via HTTP at this location:

    http://current.location.com/foo/bar/blah.html

But this HTML document has anchor tags in it, which with a conventional URL we can name like this:

    http://current.location.com/foo/bar/blah.html#section1

Since conventional URL syntax already allows us to name a particular portion of an object, and the object can be named via a location- and protocol-independent URN, why shouldn't we also be able to name the same portion of the same object using a location- and protocol-independent URN? Like this, for example:

    urn:text/html:md5:95836eee95d7b33f1d08c36e8f99d876#section1

Adding on the "#section1" at the end in no way implies that we have to use HTTP or any other specific protocol to obtain the referenced object; it just means that once we find the static bit stream glob named by the hash-value, the part of the object we're _really_ interested in is the part labeled "section1" in the document itself. Since the document named by the <hash-value> can never change (assuming the cryptographic security properties blah blah blah), the portion of the document named by the "#section1" can never change either.

In summary, I'm not proposing to change in any way the basic semantic meaning of URNs as location- and protocol-independent names. I'm just saying we need to allow the flexibility to name portions of a hashed data object and not just "the whole hashed data object" itself.
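One consequence of the above: because the URN-to-content mapping is immutable, a client can retrieve the named document over any protocol whatsoever and verify the bytes before resolving "#section1" against them. A minimal sketch (hex digests only; the base32-encoded sha1 values in this thread would need decoding first, and the helper name is mine, not the draft's):

```python
import hashlib

def matches_hash_urn(data, alg, expected_hex):
    """Return True if `data` is the object a hash-URN names.

    `alg` is one of the draft's algorithms ("md5", "sha1", "sha256",
    "sha384", "sha512").  The check is identical no matter which
    protocol, mirror, or cache supplied the bytes.
    """
    return hashlib.new(alg, data).hexdigest() == expected_hex.lower()
```

Only once this check succeeds would the client interpret the fragment against the verified copy.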
Bryan

From baford at mit.edu Fri Sep 5 19:01:38 2003
From: baford at mit.edu (Bryan Ford)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <20030905171217.BFAD03FD36@capsicum.zgp.org>
References: <20030905171217.BFAD03FD36@capsicum.zgp.org>
Message-ID: <200309051501.38229.baford@mit.edu>

Benja Fallenstein wrote:
> Bryan Ford wrote:
> > Secondly, your draft as written seems to imply that a hash-identifier URN
> > can contain _nothing_ but the media type, hash scheme, and encoded hash
> > value. This implication neglects important potential (and likely)
> > applications of this scheme in which the hash ID is used as a
> > _starting_point_ from which to find something else via a more
> > conventional naming strategy. For example,
> > ...
> > > urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL/foo/bar/blah
> > ...
> > > Of course, the meaning of whatever comes after the hash-value, if any,
> > can only in general be determined with respect to whatever object the
> > hash-value specifies, so your specification cannot and should not really
> > specify anything about the precise format of this "remainder" part of the
> > URN.
>
> So what *would* specify it? The point of having a central registry for
> URI schemes, URN namespaces and so on is that you can go to that
> registry to find out which specifications apply to a given URI. Who
> specifies what format the remainder has and how it is to be interpreted?

For clarity let's call the proposed "remainder" portion of a hash-based URN a "relative name". The relative name simply identifies a logical "portion" or a "subcomponent" of the whole object indicated by the <hash-value>. Since the whole object can only be meaningfully interpreted in terms of its content-type, whoever specifies a particular content-type is responsible for defining the interpretation of relative names for objects of that type.
Not all content-types need to define such interpretations: if the content-type specification doesn't define an interpretation for relative names, then relative names are undefined and should not be used in URNs for objects of that content-type. It might make sense for Peter's hash-based URN specification to define specific interpretations of relative names for a few existing well-established content-types, such as "text/html" and "message/rfc822". But in general specifying the interpretation of relative names for a given content-type should be left to whoever specifies the content-type, or to follow-on RFCs related to that content-type.

Cheers,
Bryan

From b.fallenstein at gmx.de Fri Sep 5 19:56:40 2003
From: b.fallenstein at gmx.de (Benja Fallenstein)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <200309051501.38229.baford@mit.edu>
References: <20030905171217.BFAD03FD36@capsicum.zgp.org> <200309051501.38229.baford@mit.edu>
Message-ID: <3F58EA78.9040608@gmx.de>

Bryan Ford wrote:
> Benja Fallenstein wrote:
>>So what *would* specify it? The point of having a central registry for
>>URI schemes, URN namespaces and so on is that you can go to that
>>registry to find out which specifications apply to a given URI. Who
>>specifies what format the remainder has and how it is to be interpreted?
>
>
> For clarity let's call the proposed "remainder" portion of a hash-based URN a
> "relative name". The relative name simply identifies a logical "portion" or
> a "subcomponent" of the whole object indicated by the <hash-value>. Since
> the whole object can only be meaningfully interpreted in terms of its
> content-type, whoever specifies a particular content-type is responsible for
> defining the interpretation of relative names for objects of that type.

Hm. Ok, based on content type-- I understand better now.

But hey, why not use fragment identifiers? They seem to be *exactly* what you're looking for? And they work on *all* URI schemes.
(Note that fragment identifiers are, according to current interpretations, somewhat misnamed: They need not identify a "fragment" of the data behind a URI, but can identify anything, as specified by the content type. E.g. RDF specifies that fragids can identify absolutely anything-- including cars, people and so on.)

So you could define application/file-system-root, and then use, e.g.,

    urn:hash:application/file-system-root:sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL#/foo/bar/blah

Then you could also make that data available through, for example, HTTP:

    http://example.org/myfiles#/foo/bar/blah

What do you think, does this address what you want?

- Benja

From hopper at omnifarious.org Fri Sep 5 22:12:42 2003
From: hopper at omnifarious.org (Eric M. Hopper)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <200309051441.17233.baford@mit.edu>
References: <200309051441.17233.baford@mit.edu>
Message-ID: <1062799962.7180.52.camel@monster.omnifarious.org>

On Fri, 2003-09-05 at 13:41, Bryan Ford wrote:
> Since conventional URL syntax already allows us to name a particular portion
> of an object, and the object can be named via a location- and
> protocol-independent URN, why shouldn't we also be able to name the same
> portion of the same object using a location- and protocol-independent URN?
> Like this, for example:
>
> urn:text/html:md5:95836eee95d7b33f1d08c36e8f99d876#section1

This makes it much clearer what you're talking about. You are looking for a way to specify a particular SFS root among the many that might be available from a particular SFS id. And that root, logically at least, is always the exact same entity, much like a hostname. This view of things makes sense, though I must say that using the fragment identifier syntax instead makes it clearer what you're intending. I'm still not sure if I like this.
I don't really like the concept of a content-type in the urn at all, as that seems to be information about the data rather than a means of identifying the data. But that's a minor problem compared to the mixing of purposes I thought you were talking about originally. But if there is no content type, then trying to make sense of the fragment identifier is hopeless.

As another thought... In some sense, a fragment identifier is still specifying a fetching method, since it refers to some piece of information to be extracted from the referenced entity. Though it does refer to the same piece of information all the time, since the entity itself is a static glob of data. Hmm...

--
There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of
other experience and system admin skills who needs work. Namely, me.
http://www.omnifarious.org/~hopper/resume.html
-- Eric Hopper
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 185 bytes
Desc: This is a digitally signed message part
Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030905/7b40baa3/attachment.pgp

From baford at mit.edu Fri Sep 5 23:39:59 2003
From: baford at mit.edu (Bryan Ford)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt
In-Reply-To: <3F58EA78.9040608@gmx.de>
References: <20030905171217.BFAD03FD36@capsicum.zgp.org> <200309051501.38229.baford@mit.edu> <3F58EA78.9040608@gmx.de>
Message-ID: <200309051939.59533.baford@mit.edu>

Benja Fallenstein wrote:
> Bryan Ford wrote:
> > Benja Fallenstein wrote:
> >>So what *would* specify it? The point of having a central registry for
> >>URI schemes, URN namespaces and so on is that you can go to that
> >>registry to find out which specifications apply to a given URI. Who
> >>specifies what format the remainder has and how it is to be interpreted?
> >
> > For clarity let's call the proposed "remainder" portion of a hash-based
> > URN a "relative name". The relative name simply identifies a logical
> > "portion" or a "subcomponent" of the whole object indicated by the
> > <hash-value>. Since the whole object can only be meaningfully
> > interpreted in terms of its content-type, whoever specifies a particular
> > content-type is responsible for defining the interpretation of relative
> > names for objects of that type.
>
> Hm. Ok, based on content type-- I understand better now.
>
> But hey, why not use fragment identifiers? They seem to be *exactly*
> what you're looking for? And they work on *all* URI schemes.

After reviewing RFC2396, I see what you mean. Since fragment identifiers are formally not part of a URI but merely part of a URI-reference, they can be attached to any URI, and their interpretation depends on the content-type of the object referenced by the URI - so far so good. But on further analysis I see some serious problems with this approach.

> (Note that fragment identifiers are, according to current
> interpretations, somewhat misnamed: They need not identify a "fragment"
> of the data behind a URI, but can identify anything, as specified by the
> content type. E.g. RDF specifies that fragids can identify absolutely
> anything-- including cars, people and so on.)

True, but one (minor) problem is that they are at least _syntactically_ constrained - in particular, since they can only contain "URI characters" (*uric), and '#' isn't a URI character, the fragment identifier can't contain another '#', and so fragments aren't conveniently hierarchically composable.

Adopting your syntax, suppose we want to name a fragment of an HTML file in an SFSRO-like file system of some kind.
The following natural syntax for a URI-reference is technically invalid because of the second '#': urn:hash:application/file-system-root:sha1:LBP...22QL#/foo/bar.html#sec1 Admittedly we could just require that the second '#' be escaped (which is why I call this a "minor" problem), but then if we ever get into a situation involving further compositions, we'd have to escape the third '#' twice, the fourth '#' three times, etc., which gets seriously obtuse. The more serious problem arises when we consider the interactions with the relative URI resolution procedure as specified in section 5 of RFC2396. Suppose the file "bar.html" referenced above contains a hyperlink using the relative reference "blah.html". What we _want_ to end up with is: urn:hash:application/file-system-root:sha1:LBP...22QL#/foo/blah.html ...but we won't, because the fragment part in the former URI-reference is not even considered part of the "base" URI, so the "/foo/" path gets incorrectly chopped off before we ever have a chance to append the "blah.html". It's not even entirely clear to me what exactly happens according to the formal definition of the procedure, but whatever it is, it's not what we want. It might be argued that URNs aren't supposed to have hierarchical path components at all, because hierarchical relationships imply location. (RFC2141 seems to imply this philosophy, since it defines the '/' character as "reserved" and makes no mention of if or how hierarchy is to be allowed for in the namespace-specific string portion of a URN.) But I don't think that abstract hierarchical relationships necessarily imply anything about _physical_ location or access method, and physical locations and access methods are the things we're trying to get away from with URNs. Take the case of a read-only, forever-immutable SFSRO file system-like tree structure. 
Once you've identified the root of the tree with a fixed hash value, whose target will never change, you know that nothing else in the tree that you may find starting from that hash value will ever change either, however you may choose to walk through the tree and find data blobs representing the various directories and files. Being able to name objects within the tree in a hierarchical fashion and use relative identifiers within it remains incredibly useful and compelling. Hierarchical naming of this form does not break the URN model, because the hierarchical "locations" that these relative references refer to are completely abstract locations that are guaranteed never to change and have nothing to do with physical location or access method. I really think this problem of hierarchy needs to be addressed somehow, although the best solution is not obvious to me given the apparent syntactic incompatibility between the definition of the "urn:" scheme and the hierarchical URI syntax. If this problem isn't addressed, I fear that hash-URNs will only ever end up being used to name huge "packaged" data objects like downloadable zipfiles, tarballs, install images, and so on, because using them to refer to any more fine-grained data objects will just be too inconvenient to bother with. For example, suppose you want to publish a whole web site, containing a bunch of little html files, images, etc., in "hash-URN" fashion... Do you first have to walk through every html file in the tree (and everything else that may contain links) somehow and replace all the relative URIs with absolute URNs, and then assign and publish each individual file as an object, before anything will work? If so, even with automated tools, I suspect few people will bother using hash-URNs this way simply because it requires changing the contents of the files being published. 
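The "chopping" behavior Bryan describes is easy to reproduce: Python's standard urljoin follows the same resolution rules, discarding the base URI's fragment and refusing relative resolution against a urn: base entirely. (A sketch for illustration only; the URLs are made-up placeholders and the URN's hash is shortened to a dummy value.)

```python
from urllib.parse import urljoin

# The fragment is not considered part of the base URI, so the "/foo/"
# hierarchy hidden after '#' is chopped off during relative resolution:
base = "http://example.com/root#/foo/bar.html"
print(urljoin(base, "blah.html"))  # -> http://example.com/blah.html

# Worse, 'urn' is not a relative-capable scheme at all, so resolving
# against a urn: base simply returns the relative reference unchanged:
urn_base = "urn:hash:application/file-system-root:sha1:XXXX#/foo/bar.html"
print(urljoin(urn_base, "blah.html"))  # -> blah.html
```

Neither result is the hoped-for urn:...#/foo/blah.html, which is exactly the interaction with the relative-resolution procedure described above.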
If URNs are designed so that hierarchical relationships can be preserved when trees of objects are published and assigned permanent names, however, then it becomes much easier to move a whole tree of interconnected objects from "URL space" to "URN space" - perhaps without even changing a single file. Cheers, Bryan From hopper at omnifarious.org Sat Sep 6 07:59:42 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: <200309051939.59533.baford@mit.edu> References: <20030905171217.BFAD03FD36@capsicum.zgp.org> <200309051501.38229.baford@mit.edu> <3F58EA78.9040608@gmx.de> <200309051939.59533.baford@mit.edu> Message-ID: <1062835182.6730.39.camel@monster.omnifarious.org> On Fri, 2003-09-05 at 18:39, Bryan Ford wrote: > urn:hash:application/file-system-root:sha1:LBP...22QL#/foo/bar.html#sec1 > > Admittedly we could just require that the second '#' be escaped (which is why > I call this a "minor" problem), but then if we ever get into a situation > involving further compositions, we'd have to escape the third '#' twice, the > fourth '#' three times, etc., which gets seriously obtuse. Does SFS guarantee that a given path will always refer to exactly the same file forever? If so, then it looks like you're trying to use a urn to refer to a thing to fetch instead of using it as a forever valid name for a particular group of bits. The confusion is apparent because there is confusion as to the type of the file. Is it of type application/file-system-root like the urn suggests? Sure seems like you want to think of it as an HTML file. So, why isn't the content type text/html instead of application/file-system-root? I know why it's so attractive to do what you're talking about there, since SFS filesystem names are just the hash of a public key. But, it isn't appropriate to use the hash urn to refer to a file in the repository, only the repository itself. 
If you want to refer to a file in the repository, use an sfs: url. Those are for specifying where to find something. A urn is for specifying the unique name of something, not how to find it. sfs://urn:hash:application/file-system-root:sha1:LBP...22QL/foo/bar.html#sec1 seems like an excellent way to refer to a file in an SFS filesystem. Have fun (if at all possible), -- Eric Hopper From thiemann at informatik.uni-freiburg.de Mon Sep 8 15:05:06 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: <200309051112.24936.baford@mit.edu> References: <200309042143.33291.baford@mit.edu> <200309051112.24936.baford@mit.edu> Message-ID: >>>>> "bf" == Bryan Ford writes: >> People are sceptical about adding application specific parts to URN >> identifiers. In fact, an earlier revision of the proposal (see >> http://www.cbuid.org/) allowed such extensions based on the content >> type of the resource and it defined one such extension for >> message/rfc822. But there were other issues that made the proposal >> baroque. >> >> For your example, this approach would require defining content types >> like application/file-system-root and application/database, then defining >> a syntax for the suffix. bf> Sure - is there anything wrong with that?
If we're talking about SFSRO file bf> systems, for example, since SFSRO isn't a formally standardized format, I bf> might just invent the content-type "application/x-sfsro-root" and then bf> informally define the interpretation of the "remainder" part of a URN of type bf> "application/x-sfsro-root" to be a path name resolved from the specified bf> SFSRO root. If SFSRO or something like it ever _were_ formally standardized, bf> then we might indeed end up with a content-type of bf> "application/file-system-root" or something like that, in which case we would bf> also have to (formally) standardize the interpretation of the "remainder" bf> part of a URN for that content-type. I got a chance to read "Fast and Secure Distributed Read-Only File System" more closely. Interestingly, the authors discuss the issue of why they chose to implement that functionality as a file system. On page 2, they say We chose to build a file system because of the ease with which one can refer to the file namespace in almost any context---from shell scripts to C code to a Web browser's location field. And later in the text they say that to access /sfs through the web, all you need is a web server (or a proxy) with sfs software installed. This is more or less a restatement of my argument for using the file: URL. Another implication to consider is the burden that media-type-specific processing places on the implementor of the URN resolver. Remember that a conforming implementation would have to implement all of this special processing for every registered media-type. The biggest problem, however, is that the addition of any selection mechanism to the URN would defeat the purpose of a self-verifying, content-based addressing scheme because a client that does not receive the entire resource can no longer verify it against its hash. My conclusion from this is not to allow further parts in the identifiers.
An application should first request the entire resource, (be able to) verify it locally against its hash, and then perform further processing. bf> You're missing the fact that SFS and similar systems may have valid and bf> legitimate reasons for using URN-format naming of this kind. The whole point bf> of defining a single standardized URI/URL/URN namespace scheme is bf> _convergence of namespaces_ - the goal of making resource names as uniform bf> and interchangeable as possible. If you deny a whole class of applications bf> entrance to the hash-URN scheme because of this trivial and unnecessary bf> limitation, you're reducing the potential convergence of resource namespaces bf> the scheme can generate. If you have a way of stating these extensions without losing the self-verifying property, then I'll be with you. The problem is that you need someone in the chain that you trust, who sees the entire resource, and can verify the hash value. The whole point of having hash-based identifiers is that you only need to trust yourself. -Peter From baford at mit.edu Wed Sep 10 03:17:42 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Re: draft-thiemann-hash-urn-01.txt In-Reply-To: References: <200309042143.33291.baford@mit.edu> <200309051112.24936.baford@mit.edu> Message-ID: <200309092317.42474.baford@mit.edu> On Monday 08 September 2003 11:05 am, Peter Thiemann wrote: > I got a chance to read "Fast and Secure Distributed Read-Only File > System" more closely. Interestingly, the authors discuss the issue why > they chose to implement that functionality as a file system. On page > 2, they say I haven't read it in a while, myself. :) > We chose to build a file system because of the ease with which one > can refer to the file namespace in almost any context---from shell > scripts to C code to a Web browser's location field.
> > And later in the text they say that to access /sfs through the web, > all you need is a web server (or a proxy) with sfs software > installed. This is more or less a restatement of my argument for using > the file: URL. Sure, there's an excellent pragmatic argument to making any new file-system-like storage/data access system mesh well with existing (local) file system namespaces. But doing so doesn't change the fact that "file://" URLs are by definition _local_ namespaces - just because a URL has a "file:///sfs/" prefix doesn't even necessarily mean that it's on an SFS file system, and it certainly doesn't have the location- and access method-transparency properties we want from URNs. For example, if a public web page contains "file:///sfs/..." links to files on an SFS file system, any decent web browser will refuse to let you click through to them, because the browser assumes such links are to private files on your local file system and following such links would be a massive security risk. There are good pragmatic reasons to link SFS and other file systems into local namespaces this way, but semantically that's not where they belong, and short-term pragmatics shouldn't prevent us from looking for better solutions down the road. > Another implication to consider is the burden that each media-type > specific processing places on the implementor of the URN > resolver. Remember that a conforming implementation would have to > implement all of this special processing for every registered media-type. No, the URN resolver doesn't have to do anything new at all, because there's no reason it needs to be or should be a URN resolver's responsibility to interpret the (proposed) relative part of a hash-URN. This task should naturally be the application's responsibility, which is no additional burden since the application must understand the media-type anyway. Here's an example of how I envision it might work: 1. 
The application wants to resolve a URN of the form "urn:hash::md5:123...abc:/foo/bar.html". So the application passes the whole URN to a URN resolver that understands the "hash" NID. 2. The "hash" URN resolver understands the "md5:123...abc" part of the URN and uses that to locate the flat blob of bytes that the hash refers to, whatever it might be. The URN resolver doesn't have to have any idea what this blob of bytes is; it just needs to find it. The URN resolver also doesn't know what the ":/foo/bar.html" on the end of the URN means. When the URN resolver finds a URL for the blob of bytes indicated by the hash, the resolver passes that URL (say "http://me.com/root") back to the application along with the uninterpreted relative portion of the URN, "/foo/bar.html", which the application can use as it sees fit. 3. The application now has one or more URLs for the hashed blob of bytes, and the remaining relative portion of the URN. If there is a relative part and the application doesn't know what to do with it, or if the content-type of the blob doesn't define the meaning of a relative part, then the application can either ignore the relative part or (probably safer) just fail the search. 4. But say the application examines the data blob located by the URN resolver and it happens to be an SFSRO-like file system directory node, containing a list of files and subdirectories and the hashes (maybe even in URN form) of their corresponding data streams or metadata nodes. The application knows it wants the object named "/foo/bar.html" relative to the root node, so it locates "foo" in the directory metadata blob and finds its hash. The application might then use some application-specific naming convention to try to find "foo", and subsequently "bar.html", on the same server on which it found the root node - e.g., by looking up "http://me.com/root/foo" and "http://me.com/root/foo/bar.html" respectively and seeing if the objects are accessible and the hashes come out right. 
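The hash-verified tree walk sketched in step 4 can be illustrated with a toy in-memory content-addressed store. Everything below is a made-up stand-in for illustration: directories are serialized as JSON name-to-hash maps rather than any real SFSRO format, and MD5 is used only because it appears in the example URN.

```python
import hashlib
import json

STORE = {}  # toy content-addressed store: hex MD5 digest -> blob of bytes

def put(blob: bytes) -> str:
    """Insert a blob and return its hash, as a publisher would."""
    digest = hashlib.md5(blob).hexdigest()
    STORE[digest] = blob
    return digest

def resolve(root_hash: str, rel_path: str) -> bytes:
    """Walk from the hashed root, verifying every blob along the way."""
    blob = STORE[root_hash]  # stand-in for the URN resolver's lookup
    if hashlib.md5(blob).hexdigest() != root_hash:
        raise ValueError("root blob fails verification")
    for name in rel_path.strip("/").split("/"):
        entries = json.loads(blob)  # a directory: {name: hash, ...}
        child_hash = entries[name]
        blob = STORE[child_hash]
        if hashlib.md5(blob).hexdigest() != child_hash:
            raise ValueError(f"{name!r} fails verification")
    return blob

# Publish a tiny tree: / -> foo -> bar.html
bar = put(b"<html>hello</html>")
foo = put(json.dumps({"bar.html": bar}).encode())
root = put(json.dumps({"foo": foo}).encode())

# A URN like "urn:hash::md5:<root>:/foo/bar.html" resolves to the file:
assert resolve(root, "/foo/bar.html") == b"<html>hello</html>"
```

Because each directory blob embeds its children's hashes, tampering with any node breaks verification somewhere along the chain from the root, which is the self-verifying property being claimed for the full URN.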
But such an application-specific method, if any, is just a hint. 5. If the application-specific "tree-walking" method, if any, fails, then the application goes back to the URN resolver and asks it to locate the hash-URN for "foo" and locate a copy of the metadata for _that_ directory. Once the application has found out, it can finally resolve the hash-URN for "bar.html" located in the metadata for "foo". In any case, the URN resolver has no clue what kind of data these hashes actually represent or what they're being used for; it's just helping the application to find what it needs, which is exactly what URN resolvers are supposed to do. > The biggest problem, however, is that the addition of any selection > mechanism to the URN would defeat the purpose of a self-verifying, > content-based addressing scheme because a client that does not receive > the entire resource can no longer verify it against its hash. Not at all - I think you misunderstood my proposal. If you have a hash-URN like "urn:hash::md5:123...abc:/foo/bar.html", the "123...abc" hash value is NOT the hash of the "bar.html" that you're eventually trying to find; it's the hash of the object representing the _starting point_ of the search - e.g., the root directory metadata node. It's the application's responsibility to work from there in an appropriate application-specific fashion, to interpret the "foo/bar.html" part and find the ultimately-named (sub-)object in a way that does not violate the URN principle of permanence or access method independence. If the application is an SFSRO-like file system and uses something like the traversal procedure I outlined above, then the self-verifying nature of the hash-URN is not violated at all, because: 1. the original hash-URN contains the hash of the root (starting point) directory metadata, so the root directory can't change without the hash-URN changing. 2. 
The root directory metadata contains the hash of the "foo" subdirectory metadata, so the "foo" subdirectory metadata can't change without its hash in the root directory changing. 3. Finally, the "foo" subdirectory metadata contains the hash of the file "bar.html", which can't change without the hashes of "/foo" and "/" in the higher-level metadata nodes changing. Thus, the complete URN "urn:hash::md5:123...abc:/foo/bar.html" represents a secure, self-verifying, and permanent link to the ultimately-named file "bar.html" even though "bar.html" isn't directly the blob of bits that the hash in the URN represents. Similarly, if the URN contains a content-type specification, it's the content-type of the _starting point_ (the blob of bits from which the hash was derived), not the content-type of the object eventually named. The latter can presumably be determined from whatever metadata the application picks up during its traversal. > My conclusion from this is not to allow further parts in the > identifiers. An application should first request the entire resource, > (be able to) verify it locally against its hash, and then perform > further processing. I agree 100% that "An application should first request the entire resource, ..., and then perform further processing." At least if by the "entire resource" you specifically mean "the entire byte-stream that the hash was computed from." But I also think it is indispensable for the hash-URN standard at least to _allow_ URNs to include additional information that may be useful to the application in that "further processing". I only outlined one possible use for this additional information; there could be many others. I just don't want the hash-URN specification to preclude them. > If you have a way of stating these extensions without losing the > self-verifying property, then I'll be with you. The problem is that > you need someone in the chain that you trust, who sees the entire > resource, and can verify the hash value.
> The whole point of having > hash-based identifiers is that you only need to trust yourself. I think I've stated the extensions without losing the self-verifying property... Are you with me? :) Cheers, Bryan From bradneuberg at yahoo.com Wed Sep 10 03:43:04 2003 From: bradneuberg at yahoo.com (Brad Neuberg) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Project Announcement: P2P Sockets Message-ID: <20030910034304.77186.qmail@web14104.mail.yahoo.com> Hi everyone. I just posted the web site, source code, and two tutorials for the Peer-to-Peer Sockets Project at http://p2psockets.jxta.org. The source code represents a working, 1.0 beta 1 release, with several pieces of software, such as Jetty and XML-RPC Client and Server libraries, already ported onto this new API. I have spent the last month and a half working full time on this. Here are some more details on the project: ------------------------ Are you interested in: * returning the end-to-end principle to the Internet? * an alternative peer-to-peer domain name system that bypasses ICANN and Verisign, is completely decentralized, and responds to updates much quicker than standard DNS? * an Internet where everyone can create and consume network services, even if they have a dynamic IP address or no IP address, are behind a Network Address Translation (NAT) device, or blocked by an ISP's firewall? * a web where every peer can automatically start a web server, host an XML-RPC service, and more, and quickly make these available to other peers? * easily adding peer-to-peer functionality to your Java socket and server socket applications? * having your servlets and Java Server Pages work on a peer-to-peer network for increased reliability, easier maintenance, and exciting new end-user functionality? * playing with a cool technology? If you answered yes to any of the above, then welcome to the Peer-to-Peer Sockets project!
The Peer-to-Peer Sockets Project reimplements Java's standard Socket, ServerSocket, and InetAddress classes to work on a peer-to-peer network rather than on the standard TCP/IP network. "Aren't standard TCP/IP sockets and server sockets already peer-to-peer?" some might ask. Standard TCP/IP sockets and server sockets are theoretically peer-to-peer but in practice are not due to firewalls, Network Address Translation (NAT) devices, and political and technical issues with the Domain Name System (DNS). The P2P Sockets project deals with these issues by re-implementing the standard java.net classes on top of the Jxta peer-to-peer network. Jxta is an open-source project that creates a peer-to-peer overlay network that sits on top of TCP/IP. Every peer on the network is given an IP-address-like number, even if they are behind a firewall or don't have a stable IP address. Super-peers on the Jxta network run application-level routers which store special information such as how to reach peers, how to join sub-groups of peers, and what content peers are making available. Jxta application-level relays can proxy requests between peers that would not normally be able to communicate due to firewalls or NAT devices. Peers organize themselves into Peer Groups, which scope all search requests and act as natural security containers. Any peer can publish and create a peer group in a decentralized way, and other peers can search for and discover these peer groups using other super-peers. Peers communicate using Pipes, which are very similar to Unix pipes. Pipes abstract the exact way in which two peers communicate, allowing peers to communicate using other peers as intermediaries if they normally would not be able to communicate due to network partitioning. Jxta is an extremely powerful framework. However, it is not an easy framework to learn, and porting existing software to work on Jxta is not for the faint-of-heart.
P2P Sockets effectively hides Jxta by creating a thin illusion that the peer-to-peer network is actually a standard TCP/IP network. If a peer wishes to become a server they simply create a P2P server socket with the domain name they want and the port other peers should use to contact them. P2P clients open socket connections to hosts that are running services on given ports. Hosts can be resolved either by domain name, such as "www.nike.laborpolicy", or by IP address, such as "44.22.33.22". Behind the scenes these resolve to JXTA primitives rather than being resolved through DNS or TCP/IP. For example, the host name "www.nike.laborpolicy" is actually the NAME field of a Jxta Peer Group Advertisement. P2P sockets and server sockets work exactly the same as normal TCP/IP sockets and server sockets. The benefits of taking this approach are many-fold. First, programmers can easily leverage their knowledge of standard TCP/IP sockets and server sockets to work on the Jxta peer-to-peer network without having to learn about Jxta. Second, all of the P2P Sockets code subclasses standard java.net objects, such as java.net.Socket, so existing network applications can quickly be ported to work on a peer-to-peer network. The P2P Sockets project already includes a large amount of software ported to use the peer-to-peer network, including a web server (Jetty) that can receive requests and serve content over the peer-to-peer network; a servlet and JSP engine (Jetty and Jasper) that allows existing servlets and JSPs to serve P2P clients; an XML-RPC client and server (Apache XML-RPC) for accessing and exposing P2P XML-RPC endpoints; an HTTP/1.1 client (Apache Commons HTTP-Client) that can access P2P web servers; a gateway (Smart Cache) to make it possible for existing browsers to access P2P web sites; and a WikiWiki (JSPWiki) that can be used to host WikiWikis on your local machine that other peers can access and edit through the P2P network. 
Even better, all of this software works and looks exactly as it did before being ported. The P2P Sockets abstraction is so strong that porting each of these pieces of software took as little as 30 minutes to several hours. Everything included in the P2P sockets project is open-source, mostly under BSD-type licenses, and cross-platform due to being written in Java. Because P2P Sockets are based on Jxta, they can easily do things that ordinary server sockets and sockets can't handle. First, creating server sockets that can fail-over and scale is easy with P2P Sockets. Many different peers can start server sockets for the same host name and port, such as "www.nike.laborpolicy" on port 80. When a client opens a P2P socket to "www.nike.laborpolicy" on port 80, they will randomly connect to one of the machines that is hosting this port. All of these server peers might be hosting the same web site, for example, making it very easy to partition client requests across different server peers or to recover from losing one server peer. This is analogous to DNS round-robining, where one host name will resolve to many different IP addresses to help with load-balancing. Second, since P2P Sockets don't use the DNS system, host names can be whatever you wish them to be. You can create your own fanciful endings, such as "www.boobah.cat" or "www.cynthia.goddess", or application-specific host names, such as "Brad GNUberg" or "Fidget666" for an instant messaging system. Third, the service ports for a given host name can be distributed across many different peers around the world. For example, imagine that you have a virtual host for "www.nike.laborpolicy". One peer could be hosting port 80, to serve web pages; another could be hosting port 2000, for instant messaging, and a final peer could be hosting port 3000 for peers to subscribe to real-time RSS updates. Hosts now become decentralized coalitions of peers working together to serve requests.
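The naming and fail-over behavior described above can be mimicked in miniature with an ordinary dictionary standing in for Jxta's advertisement discovery. The peer names and addresses below are invented, and the real project does this in Java on top of Jxta; this is only a sketch of the pattern:

```python
import random

# Toy stand-in for Jxta peer-group advertisements: a host name maps to
# every (peer, port) pair currently advertising a service under it.
ADVERTISEMENTS = {}

def advertise(host_name, peer, port):
    """A peer starts a 'server socket' for host_name on the given port."""
    ADVERTISEMENTS.setdefault(host_name, []).append((peer, port))

def resolve(host_name):
    """A client 'opens a socket': pick one advertising peer at random,
    giving DNS-round-robin-style load balancing and fail-over."""
    peers = ADVERTISEMENTS.get(host_name)
    if not peers:
        raise LookupError(f"no peer advertises {host_name!r}")
    return random.choice(peers)

# Two peers host the same fanciful name; no DNS or ICANN involved.
advertise("www.nike.laborpolicy", "peerA", 80)
advertise("www.nike.laborpolicy", "peerB", 80)
assert resolve("www.nike.laborpolicy") in [("peerA", 80), ("peerB", 80)]
```

Losing "peerA" only shrinks the advertisement list, so later lookups keep working, which is the fail-over property claimed for the real system.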
Two tutorials are available: * Introduction to Peer-to-Peer Sockets - http://www.codinginparadise.org/p2psockets/1.html * How to Create Peer-to-Peer Web Servers, Servlets, JSPs, and XML-RPC Clients and Servers - http://www.codinginparadise.org/p2psockets/2.html Download P2PSockets-1.0-beta1.zip (released 9-5-2003), which contains the core package and extensions both compiled and in source form, at http://www.codinginparadise.org/p2psockets/P2PSockets-1.0-beta1.zip Thanks, Brad GNUberg bkn3@columbia.edu From nazareno at dsc.ufcg.edu.br Wed Sep 10 04:01:16 2003 From: nazareno at dsc.ufcg.edu.br (Nazareno Andrade) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Is there any p2p reputation system deployed? Message-ID: <3F5EA20C.1070204@dsc.ufcg.edu.br> Hello. Does anybody know of any p2p system (probably file-sharing) that has any mechanism of prioritizing requests based on contributions other than KazaA and GNUnet?? I've seen several theoretical proposals, but I have no knowledge of any deployed (i.e., with a significant base of users) system that uses this kind of mechanism. Thanks in advance, Nazareno. ======================================== Nazareno Andrade Mestrando em Informática LSD - DSC/UFCG Campina Grande - Brasil nazareno@dsc.ufcg.edu.br http://lsd.dsc.ufcg.edu.br/~nazareno/ ======================================== From 0x90 at invisiblenet.net Wed Sep 10 05:17:04 2003 From: 0x90 at invisiblenet.net (Lance James) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Public Peer Review Request Message-ID: <005701c3775a$c66fbf00$0201a8c0@invisible> InvisibleNet has formed the Invisible Internet Project (I2P) to support the efforts of those trying to build a more free society by offering them an uncensorable, anonymous, and secure communication system. I2P is a development effort producing a variable latency, fully distributed, autonomous, scalable, anonymous, resilient, and secure network.
The goal is to be able to operate successfully in arbitrarily hostile environments - even when an organization with unlimited financial and political resources attacks it. I2P is not a filesharing app. I2P is essentially an anonymizing and secure replacement IP stack, running on top of the existing network. There has already been progress made in writing applications on top of the network to enable generic TCP/IP applications to tunnel through the network transparently, as well as to enable nym lookup and management - two applications which, when paired together, would allow any web browser to point at http://www.[yournym].iip/ and communicate with your webserver anonymously and securely. There are many more ideas for what I2P could be used for, and it's certain we won't think of the most interesting ones. I2P is an absurdly ambitious effort. Depending on what mailing lists you read or people you talk with, they'll either say it's impossible or just insanely hard. To be perfectly frank, I2P by itself doesn't contribute anything really significant to the CS/P2P research community, but it does take the great work of other projects and research efforts - such as freenet, iip, kademlia, mnet, tarzan, the remailers, and many, many more - and attempts to apply good software engineering techniques to provide hard anonymity and security in a variable latency network. "Variable latency" is repeated so often because I2P doesn't try to operate with a one size fits all set of anonymity and security constraints, and different people will require different tradeoffs. Bin Laden will probably not be able to pull off live streaming video, but Joe and Jane Sixpack should be able to. Is I2P ready to download and run with? No. So why bother mentioning it? Because we need more critical eyes to make sure we address the right issues the right ways. We think we've got things pegged so that it'll not only work, but also be secure and anonymous.
We're moving forward on the development path towards getting an alpha network release out the door, but we need these specs reviewed for flaws that we've missed. Of course, we also need lots of other things, from coders to documenters to QA to network simulators to CS people, but it is your eyeballs that we're calling out for today. What we have ready for review: - Invisible Internet Network Protocol (I2NP) spec[1], describing how network "routers" operate and what messages they send to other routers - Common Data Structures spec[2], describing the serialization of objects described in other specs, as well as the encryption algorithms used. - Invisible Internet Client Protocol (I2CP) spec[3], describing a simple local client protocol for making use of the network. - Polling HTTP Transport spec[4], an example transport protocol for use with I2NP to allow actual communication between routers, regardless of firewall, NAT, or HTTP proxy. We also have the 0.2 release of a software development kit (I2P SDK)[5], which includes everything necessary to design, develop, and test applications to run over the network, as well as all of the above specs. It includes a Java client API implementing I2CP, a sample application (ATalk, a one to one chat app that supports file transfer), a Java router, and a Python router. C and Python client API implementations of I2CP are also on the way. These routers are "local only" - meaning they don't talk to other routers. This can be used in the same way we can build normal networked applications - by running the server on the local machine and pointing the applications at it. We've been keeping this quiet because it's too easy to hype up a vaporware product and we wanted to wait until there was something worth reading about before saying anything. So please read these specs and send in your comments - either to info@invisiblenet.net or to the iip-dev mailing list[6].
Perhaps even jump on that list if you want to discuss things (archives are linked to from the web page), browse the wiki[7], or join us on IIP for development meetings - every Tuesday at 9 PM GMT in #iip-dev (archives[8] since meeting 48 are pretty much I2P specific). Thanks for your time, and we look forward to any responses. - The InvisibleNet team [1] http://www.invisiblenet.net/i2p/I2NP_spec.pdf [2] http://www.invisiblenet.net/i2p/datastructures.pdf [3] http://www.invisiblenet.net/i2p/I2CP_spec.pdf [4] http://www.invisiblenet.net/i2p/polling_http_transport.pdf [5] http://www.invisiblenet.net/i2p/I2P_SDK.zip [6] http://www.invisiblenet.net/iip/devMailinglist.php [7] http://wiki.invisiblenet.net/iip-wiki?I2P [8] http://wiki.invisiblenet.net/iip-wiki?Meetings From moore at eds.org Wed Sep 10 06:23:12 2003 From: moore at eds.org (Jonathan Moore) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <3F5EA20C.1070204@dsc.ufcg.edu.br> References: <3F5EA20C.1070204@dsc.ufcg.edu.br> Message-ID: <1063174992.8741.68.camel@tot> On Tue, 2003-09-09 at 21:01, Nazareno Andrade wrote: > Does anybody know of any p2p system (probably file-sharing) that has any > mechanism of prioritizing requests based on contributions other than > KazaA and GNUnet?? Well, there is BitTorrent, which uses tit-for-tat markets to do this. -Jonathan -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030909/9e440cbe/attachment.pgp From digi at treepy.com Wed Sep 10 10:34:14 2003 From: digi at treepy.com (p@) Date: Sat Dec 9 22:12:22 2006 Subject: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <3F5EA20C.1070204@dsc.ufcg.edu.br> Message-ID: <00cc01c37787$19b37ef0$0200a8c0@pat> Hi, emule has implemented a system like this. 
If you download from a peer it gets credits, and if this peer ever downloads from you it will be moved forward in the queue based on those credits. Every user handles its own credit system, so it's pretty much not hackable... cheers digi -----Original Message----- From: p2p-hackers-bounces@zgp.org [mailto:p2p-hackers-bounces@zgp.org] On behalf of Nazareno Andrade Sent: Wednesday, 10 September 2003 06:01 To: p2p-hackers@zgp.org Subject: [p2p-hackers] Is there any p2p reputation system deployed? Hello. Does anybody know of any p2p system (probably file-sharing) that has any mechanism of prioritizing requests based on contributions other than KazaA and GNUnet?? I've seen several theoretical proposals, but I have no knowledge of any deployed (i.e., with a significant base of users) system that uses this kind of mechanism. Thanks in advance, Nazareno. ======================================== Nazareno Andrade Mestrando em Informática LSD - DSC/UFCG Campina Grande - Brasil nazareno@dsc.ufcg.edu.br http://lsd.dsc.ufcg.edu.br/~nazareno/ ======================================== _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From eugen at leitl.org Wed Sep 10 16:35:54 2003 From: eugen at leitl.org (Eugen Leitl) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Project Announcement: P2P Sockets (fwd from bradneuberg@yahoo.com) (fwd from morlockelloi@yahoo.com) Message-ID: <20030910163554.GF1808@leitl.org> ----- Forwarded message from Morlock Elloi ----- From: Morlock Elloi Date: Wed, 10 Sep 2003 08:41:40 -0700 (PDT) To: cypherpunks@lne.com Subject: Re: [p2p-hackers] Project Announcement: P2P Sockets (fwd from bradneuberg@yahoo.com) > stable IP address. 
Super-peers on the Jxta network run > application-level routers which store special > information such as how to reach peers, how to join So these super peers are reliable, non-vulnerable, although everyone knows where they are, because .... ? ===== end (of original message) Y-a*h*o-o (yes, they scan for this) spam follows: __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com ----- End forwarded message ----- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 186 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030910/0b7f4077/attachment.pgp From nazareno at dsc.ufcg.edu.br Wed Sep 10 17:23:48 2003 From: nazareno at dsc.ufcg.edu.br (Nazareno Andrade) Date: Sat Dec 9 22:12:22 2006 Subject: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <00cc01c37787$19b37ef0$0200a8c0@pat> References: <00cc01c37787$19b37ef0$0200a8c0@pat> Message-ID: <3F5F5E24.4060409@dsc.ufcg.edu.br> Hello. Thanks for the answer, but I'm not sure I've completely understood it yet. Each peer has knowledge only about its past interactions, right? They do not exchange information about other peers' reputations? If so, is there any evidence of the performance of this mechanism? Thanks, Nazareno p@ wrote: > Hi, > > emule has implemented a system like this. If you download from a peer it > get credits and if this peer ever downloads from you it will be > forwarded in the queue based on the credits. Every user handles its own > credit system therefore its pretty much not hackable... > > cheers > > digi > Nazareno. 
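[Editor's note: the per-peer credit scheme discussed in this thread can be sketched in a few lines of Python. All names and the queue-multiplier formula below are invented for illustration; eMule's real implementation differs in detail.]

```python
class CreditTable:
    """Each peer keeps its own private table of credits, one entry per
    remote peer, keyed by that peer's user-hash. Nothing is exchanged
    with third parties, so there is no remote reputation to forge."""

    def __init__(self):
        self.credits = {}  # user_hash -> bytes that peer has uploaded to us

    def record_upload_from(self, user_hash, nbytes):
        # The remote peer sent us data, so it earns credit with us.
        self.credits[user_hash] = self.credits.get(user_hash, 0) + nbytes

    def record_download_by(self, user_hash, nbytes):
        # The remote peer downloaded from us: spend its credit (floor at 0).
        self.credits[user_hash] = max(0, self.credits.get(user_hash, 0) - nbytes)

    def queue_multiplier(self, user_hash):
        # A peer with credit is moved forward in our upload queue;
        # strangers get the neutral multiplier 1.0.
        # (The 1 + credit/MiB formula is invented for this sketch.)
        return 1.0 + self.credits.get(user_hash, 0) / 2**20
```

The trade-off the thread identifies falls out of the data structure: because the table is local, a freeloader cannot forge standing with you, but a well-behaved peer you have never exchanged data with looks exactly like a stranger.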
======================================== Nazareno Andrade Mestrando em Informática LSD - DSC/UFCG Campina Grande - Brasil nazareno@dsc.ufcg.edu.br http://lsd.dsc.ufcg.edu.br/~nazareno/ ======================================== From bram at gawth.com Wed Sep 10 18:14:58 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] p2p-hackers meeting, this sunday Message-ID: We haven't had a p2p-hackers meeting in a while, so I'd like to call a new one. This Sunday, 3 PM, at the Metreon. We can talk about new developments in BitTorrent, Codeville, and Bram and Steve Hazel's moving to Berkeley. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From digi at treepy.com Wed Sep 10 18:25:45 2003 From: digi at treepy.com (p@) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <3F5F5E24.4060409@dsc.ufcg.edu.br> Message-ID: <00e001c377c8$f91f2bc0$0200a8c0@pat> Hi, If I download from someone, I save his user-hash and points for the downloaded bytes. If this user later connects to me, his place in the queue gets a multiplier based on the previously saved points. If he starts to download, the bytes he downloads get subtracted from the points. I never actually measured it. But I think there should be a difference between a freeloader and a normal user. I think especially high uploaders will profit from this. The system gets weaker if there are more users because the chance that you have never met the other peer gets higher. Freeloaders don't get banned with this system but probably will have to wait longer for the download to start. >Hello. > >Thanks for the answer, but I'm not sure I already understood it >completely. Each peer has knowledge only about its past interactions, >right? They do not exchange information about other peers reputations? > >If so, is there any evidence of the performance of this mechanism? 
> >Thanks, > >Nazareno > >p@ wrote: > > Hi, > > > > emule has implemented a system like this. If you download from a peer it > > get credits and if this peer ever downloads from you it will be > > forwarded in the queue based on the credits. Every user handles its own > > credit system therefore its pretty much not hackable... > > > > cheers > > > > digi > > From lgonze at panix.com Wed Sep 10 18:32:28 2003 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <00e001c377c8$f91f2bc0$0200a8c0@pat> Message-ID: <21BB65C9-E3BD-11D7-9A61-000393455590@panix.com> It strikes me that a reputation system with non-transferable reputation, ie where each peer has knowledge only about its own interactions, would encourage long term relationships. That's a good thing, since long term relationships encourage good behavior. Just thinking... - Lucas On Mercredi, sep 10, 2003, at 14:25 America/New_York, p@ wrote: > > Hi, > > If i download from someone I save his user-hash and points for the > downloaded bytes. If this user connects now to me his place in the > queue > gets a multiplier from the prior saved points. If he starts to > download, > the bytes he downloads get substracted from the points. > > I never actually measured it. But I think there should be a difference > between a freeloader and a normal user. I think especially high > up-loaders will profit from this. The system gets weaker if there are > more users because the chance that you never met the opposite peer gets > higher. > > Freeloaders don't get banned with this system but probably will have to > wait longer for the download to start. > >> Hello. >> >> Thanks for the answer, but I'm not sure I already understood it >> completely. Each peer has knowledge only about its past interactions, >> right? They do not exchange information about other peers reputations? 
>> >> If so, is there any evidence of the performance of this mechanism? >> >> Thanks, >> >> Nazareno >> >> p@ wrote: >>> Hi, >>> >>> emule has implemented a system like this. If you download from a > peer it >>> get credits and if this peer ever downloads from you it will be >>> forwarded in the queue based on the credits. Every user handles its > own >>> credit system therefore its pretty much not hackable... >>> >>> cheers >>> >>> digi >>> > > > > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > _______________________________________________ > Here is a web page listing P2P Conferences: > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences > From bradneuberg at yahoo.com Wed Sep 10 19:18:51 2003 From: bradneuberg at yahoo.com (Brad Neuberg) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Project Announcement: P2P Sockets (fwd from bradneuberg@yahoo.com) (fwd from morlockelloi@yahoo.com) In-Reply-To: <20030910163554.GF1808@leitl.org> Message-ID: <20030910191851.6889.qmail@web14101.mail.yahoo.com> --- Eugen Leitl wrote: > ----- Forwarded message from Morlock Elloi > ----- > > From: Morlock Elloi > Date: Wed, 10 Sep 2003 08:41:40 -0700 (PDT) > To: cypherpunks@lne.com > Subject: Re: [p2p-hackers] Project Announcement: P2P > Sockets (fwd from > bradneuberg@yahoo.com) > > > stable IP address. Super-peers on the Jxta network > run > > application-level routers which store special > > information such as how to reach peers, how to > join > > So these super peers are reliable, non-vulnerable, > although everyone knows > where they are, because .... ? > These super peers are known as Rendezvous peers in the Jxta world. They are as reliable and non-vulnerable as one could hope for, though I doubt they are perfect; I am building above the existing Jxta infrastructure for these. 
"Everyone" knows about them by using a common bootstrap server to bootstrap into the Jxta network to gain the addresses of a few Rendezvous nodes. Rendezvous nodes then propagate information about their existence to other Rendezvous nodes at various times. Network partitions are certainly possible, and the requirement for a common bootstrap server is fragile. Jxta, and therefore P2P Sockets, currently has no protections against malicious/Byzantine peers; it has relatively good protections against peers that fail non-maliciously. Brad Neuberg From clint at TheStaticVoid.net Thu Sep 11 12:42:11 2003 From: clint at TheStaticVoid.net (Clint Heyer) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system In-Reply-To: <20030910190007.4D1583FD2C@capsicum.zgp.org> Message-ID: <2003911144211.419221@platypus> Have you seen Naanou?[1,2] It's not 'deployed' per se, but the code is available under the GPL license. Naanou uses the characteristics of the underlying DHT (Chord) to provide a distributed reputation system. The goal was to enable distributed moderation of anti-social behaviour in a P2P community. As a user gets moderated against, their experience of the network deteriorates - for example, download speeds from other peers are reduced. cheers, .clint [1] http://naanou.sourceforge.net/ [2] http://thestaticvoid.net/portfolio/p_naanou.html > Does anybody know of any p2p system (probably file-sharing) that > has any > mechanism of prioritizing requests based on contributions other than > KazaA and GNUnet?? > I've seen several theoretical proposals, but I have no knowledge of > any deployed (i.e., with a significant base of users) system that > uses this kind of mechanisms. 
______________________________________ www: http://www.TheStaticVoid.net From clausen at gnu.org Thu Sep 11 01:19:31 2003 From: clausen at gnu.org (Andrew Clausen) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <21BB65C9-E3BD-11D7-9A61-000393455590@panix.com> References: <00e001c377c8$f91f2bc0$0200a8c0@pat> <21BB65C9-E3BD-11D7-9A61-000393455590@panix.com> Message-ID: <20030911011931.GB710@gnu.org> On Wed, Sep 10, 2003 at 02:32:28PM -0400, Lucas Gonze wrote: > It strikes me that a reputation system with non-transferable > reputation, ie where each peer has knowledge only about its own > interactions, would encourage long term relationships. At this point, I don't think you are talking about reputation. You're basically saying: "everyone is better off going it alone, and figuring out for themselves who they should trust". If you are in a situation where you don't have enough direct experience with a peer, then I think you need reputation. "Enough direct experience" is a sticky question, because you need to be careful not to overtrust someone who might be trying to suck you in first, then rip you off later. > That's a good thing, since long term relationships encourage good > behavior. I think the same can be said for reputation as well. It would be cool if there were a reputation system that made it irrational to defect. i.e. the utility gained from defection always being outweighed by the lost reputation. This is the motivation for my research. (If you're interested: http://members.optusnet.com.au/clausen/ideas/google/google-subvert.pdf) Cheers, Andrew From bram at gawth.com Fri Sep 12 21:28:48 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Hacker Dim Sum this Sunday at noon in San Francisco (fwd) Message-ID: Hey. A bunch of us are getting together this Sunday at noon at Yank Sing in San Francisco for Dim Sum. 
Bram said he wants to have a P2P hackers meeting after ... maybe we could relocate to a coffee shop. The address is: 101 Spear St Hopefully we can get 5-10 people to come. I took the liberty of sending email to all the super brilliant infoanarchists, P2P hackers, and general geeks that I know live in the city. If you can think of anyone else please forward. Hopefully we can get 6-12 people to go... hopefully everyone won't be too busy (kind of late notice) Feel free to call my cell if you get lost on Sunday (415-595-9965) Also if you could RSVP back I might call Yank Sing to reserve a table(s) if the group gets too big! Peace! Kevin -- Help Support NewsMonster Development! Purchase NewsMonster PRO! http://www.newsmonster.org/download-pro.html Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965 AIM - sfburtonator, Web - http://www.peerfear.org/ GPG fingerprint: 4D20 40A0 C734 307E C7B4 DCAA 0303 3AC5 BD9D 7C4D IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster From stevegt at TerraLuna.Org Tue Sep 16 18:25:44 2003 From: stevegt at TerraLuna.Org (stevegt@TerraLuna.Org) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Protocol vs Header Format Message-ID: <20030916182544.GB27485@pathfinder> Hi All, Has anyone ever given any thought to the difference between protocol and header format? In most existing protocol stacks, the 'protocol' field in a lower header (e.g. IP) specifies both the parser and the state engine to use in interpreting the next higher header (e.g. TCP). In other words, in networking we seem to have evolved to this assumption that the header format and protocol state engine are inseparable, so much so that the terms 'protocol' and 'header format' are often (mis)used interchangeably. Are there good reasons for this that anyone can point to? Wouldn't it make more sense for the N-layer header to specify the N+1 parser, and then have a field in the layer N+1 header specify the N+1 state engine? 
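[Editor's note: the split being proposed - the layer-N header naming only the layer-N+1 *parser*, with the parsed layer-N+1 header then naming its own *state engine* - can be pictured as a pair of registries. Everything below (registry names, the one-byte toy wire format, the engine ids) is invented for illustration, not taken from any real stack.]

```python
PARSERS = {}  # parser id -> function(bytes) -> header dict   (syntax only)
ENGINES = {}  # engine id -> function(header dict) -> result  (semantics only)

def parse_v1(payload):
    # Toy layer-N+1 format: first byte names the state engine, rest is body.
    return {"engine": payload[0], "body": payload[1:]}

PARSERS[1] = parse_v1

def echo_engine(header):
    # Toy state engine: just echo the body back.
    return b"echo:" + header["body"]

ENGINES[7] = echo_engine

def deliver(lower_header, payload):
    # The lower layer only says how to *parse* the next header...
    header = PARSERS[lower_header["next_parser"]](payload)
    # ...and the parsed header itself names the state engine that interprets
    # it, so wire format and protocol behavior can version independently.
    return ENGINES[header["engine"]](header)
```

Under this sketch, `deliver({"next_parser": 1}, bytes([7]) + b"hello")` dispatches through parser 1 and engine 7 and returns `b"echo:hello"`; shipping a new state engine for an old header format just means registering a new engine id, with no change at layer N.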
It seems like this would be more flexible. Can anyone think of an existing case where a stack is already implemented this way? I haven't been able to put my finger on one yet, but maybe I'm missing something obvious. I think this ties intimately into the "text vs. binary" header debate, too, though I can't quite articulate why right now. This all bubbled up in my brain because I'm in the throes of putting together a protocol, and need a header format to support it. The development of the protocol is to be self-hosting; i.e. developers use the protocol to collaborate on the development of the protocol. This means that the headers and protocol state engines are both likely to undergo violent evolution as they mature, but will need to remain usable. This means distributed versioning, separation of code (state engine) and data (header format), and so on. I keep getting tangled up in prior art that doesn't seem to care that the protocol and headers might want to evolve separately. Am I crazy, on the right track, or both? Steve -- Stephen G. Traugott (KG6HDQ) UNIX/Linux Infrastructure Architect, TerraLuna Aerospace LLC stevegt@TerraLuna.Org http://www.stevegt.com -- http://Infrastructures.Org From wesley at felter.org Tue Sep 16 23:33:02 2003 From: wesley at felter.org (Wes Felter) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Protocol vs Header Format In-Reply-To: <20030916182544.GB27485@pathfinder> References: <20030916182544.GB27485@pathfinder> Message-ID: <1063755182.4286.34.camel@arlx248.austin.ibm.com> On Tue, 2003-09-16 at 13:25, stevegt@TerraLuna.Org wrote: > Hi All, > > Has anyone ever given any thought to the difference between protocol and > header format? In most existing protocol stacks, the 'protocol' field > in a lower header (e.g. IP) specifies both the parser and the state > engine to use in interpreting the next higher header (e.g. TCP). FWIW, several protocols borrow RFC 822 style headers. 
But I don't see what that allows you to do that you can't do with custom headers. Another way to look at it is that you're talking about the split between syntax (formats) and semantics (protocols). It remains true that one is useless without the other, and superficial similarities between syntaxes (like using XML for everything) don't buy you any interop. -- Wes Felter - wesley@felter.org - http://felter.org/wesley/ From zooko at zooko.com Wed Sep 17 13:40:59 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] desiderata and open issues in ent Message-ID: Dear mnet-devel and p2p-hackers: [If Robert Hettinga forwards this message to a list that you read, and you find it interesting, perhaps you should consider joining mnet-devel [1] or p2p-hackers [2].] [Note that ent is not the only thing going on in terms of improved Mnet networking. Tschechow has gotten a simplified version of ent named "router" running, Myers is implementing Twisted Chord, and Arno *was* working on multiple independent metatrackers before Real Life distracted him.] These are the ways that ent should differ from emergent network designs like Chord and Kademlia. Some of these desiderata dovetail with one another, but others of them appear to be conflicting. It may be impossible to satisfy all of them at once, but I currently believe that we can at least partially achieve all of them. 1. It should self-heal flaws in the network structure. There is a protocol [3] known to do this for Chord, but not (as far as I know) for Kademlia. (re: recent discussion [4]) 2. It should handle the reality that a large fraction (around half) of the nodes are behind NAT or firewalls and can't accept incoming connections. If two nodes are both restricted like that then they cannot be peers of one another in the ent graph. 2.b. 
But it will not try to handle arbitrary underlay structure -- it will rely on the fact that a large fraction of the nodes form a single fully-connected clique, and that every node has edges to almost all of the clique nodes. 3. It should handle transient nodes. 3.a. Newcomer nodes are never entrusted with the only replica of any data. They may be entrusted with no data at all, or at a cost of using extra bandwidth, they can be entrusted with an extra replica that is already stored by an old-timer node. 3.b. Suppose a new node "B" joins the network, and it is now responsible for a part of the id space that was formerly covered by an old node "A". Now suppose A is serving up 200 GB of data, 100 GB of which now falls into B's part of the id space. When queries for blocks from B's space get routed to B, B will forward them to A, and cache copies of the response. This dovetails with 3.a. 4. It should include incentives for people to run good nodes. 4.a. The most important incentives are probably social -- people should chat with one another, have (*real*, human social, not digital) reputations, and so forth. That's outside the scope of ent design. 4.b. Peering should have a selfish incentive mechanism so that people who run high-quality nodes get higher-quality performance from the network when they try to use it themselves. (I'm inspired by the tit-for-tat pairwise incentives from Bram's original BitTorrent design. I don't know if the current BitTorrent has changed in that respect.) 5. It should exploit locality: nodes which have short fat pipes between them (they are "close to" each other in the underlay network) should be more likely to peer with each other than nodes that share long skinny pipes. 6. It should exploit heterogeneity. 
The capacities of nodes are expected to follow a power-law distribution: half of the nodes are Pentium II's on 33 Kbps dial-up with 5 MB of free disk space, a quarter of the nodes are Pentium IV's on 512 Kbps DSL with 5 GB of free disk space, an eighth of the nodes are quad-Opterons on 44 Mbps DS3 with 1 TB free disk space, a sixteenth of the nodes are supercomputers which are illicitly bridging the Internet to "Internet 2", etc. 7. Normal emergent network desiderata: scalable, robust, efficient. 8. Simple, analyzable, measurable. Open issues: * Self-healing (1.) has to be designed for Kademlia or else locality (5.) has to be designed for Chord or else we have to switch to another emergent network design entirely for the basis of ent. * Id-space shadowing (3.b.) has to be designed carefully to avoid looping or other bad artifacts, and to behave acceptably well in the various cases of A, B, and other nodes coming and going. * Selfish peering (4.b.) has to be balanced against the system-wide consistency and performance desiderata. For example, each individual node wants to link *only* with peers that provide good quality service to it. However, suppose there is a node that provides bad-quality service, that nobody wants to peer with. Suppose that node is in sole possession of a block of data. We still want searches in the network to find that block! This may be impossible, in which case we have to choose a trade-off, and hopefully one which is either qualitative or easily measurable. * Putting all of these together is the big trick. Can we exploit heterogeneity (6.) and locality (5.) at the same time? Can we exploit both of these while retaining any sort of analyzability/measurability? Etc. A general design idea: One way that some of these apparently conflicting desiderata can be reconciled is to use redundant "special-purpose" overlay networks. 
For example, Pastry [5] uses its free-choice property to increase network locality, while the very similar Kademlia [6] uses the same free-choice property to increase robustness, at the expense of locality. That is: in Pastry you have to choose any one out of (say) a thousand nodes to be your peer, and you attempt to choose the one which is closest to you in the underlay, i.e. the one that has the fastest connection to you. In Kademlia you have to choose any one out of a thousand nodes, and you attempt to choose the one that is least likely to drop off the net. My idea for ent is that you have two separate overlay networks, one in which you prefer the most reliable nodes and the most robust emergent network topology, and the other in which you prefer the fastest nodes and the most efficient emergent network topology. When the latter fails, you use the former to rebuild it. Regards, Zooko http://zooko.com/log.html [1] http://sourceforge.net/mailarchive/forum.php?forum_id=7702 [2] http://zgp.org/pipermail/p2p-hackers/ [3] http://citeseer.nj.nec.com/liben-nowell02observations.html [4] http://zgp.org/pipermail/p2p-hackers/2003-August/001344.html [5] http://research.microsoft.com/~antr/Pastry/ [6] http://citeseer.nj.nec.com/529075.html From mujtaba at asu.edu Wed Sep 17 15:30:34 2003 From: mujtaba at asu.edu (Mujtaba Khambatti) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] Call for Student Essays (P2P pages, IEEE DSO) In-Reply-To: <000901c37d2f$130ed560$03054bab@Mujtaba> Message-ID: <001801c37d30$a447ffb0$03054bab@Mujtaba> Call for Essays The P2P pages [1] of the IEEE Distributed Systems Online [2] is seeking submissions for publication on its website. The IEEE Distributed Systems Online hosts expert-authored articles and resources in the various topic areas of distributed systems. The P2P pages focus specifically on links to useful websites, news, journals, papers, books, and events related to peer-to-peer technologies. The essays must be written by students. 
Proof of student status will be required to complete the submission process. Essay length can range from 800 to 1000 words, depending on the nature of the topic. Lengthier essays may be serialized for easier online readability. Among the types of articles that would be of interest (not exhaustive): 1. P2P Applications 2. Challenges in P2P Systems 3. Security in P2P Systems 4. Trust in P2P Systems 5. P2P and Databases 6. P2P and Mobile networks 7. P2P and the law 8. P2P and the Business world 9. Criticisms of P2P technology Essays must be submitted with a completed list of well-formatted references and a paragraph of up to 50 words containing biographical information about the author. Please submit in plain text (ASCII) format via a MIME attachment to an email. Email your submissions to mujtaba@asu.edu by the 10th of October, 2003 in order to be considered for the November 2003 issue. Please write "DSOnline Essay-Nov" in the subject line of the email. Essays with fewer than 750 words or more than 1050 words will not be considered. The essay title, author name, biographical information, and the reference section will be excluded from the word count. For more information contact mujtaba@asu.edu References: [1] http://dsonline.computer.org/os/related/p2p/index.htm [2] http://dsonline.computer.org/ From bert at web2peer.com Wed Sep 17 16:29:43 2003 From: bert at web2peer.com (Bert) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] p2p sharing & access-control Message-ID: <20030917092947.25462.h018.c001.wm@mail.web2peer.com.criticalpath.net> One of my recent interests has been p2p file sharing in an access-controlled environment instead of the current "free for all" paradigm. This area is deserving of attention because of obvious applications in p2p for the enterprise as well as emerging "darknets" intended to be invitation only. The question I've been thinking about is how to support (efficient) search in such settings. 
Currently, when we search for access controlled files we must individually authenticate and search each relevant repository. But in a massively distributed environment, how do you know what repositories are relevant? And even if you did, searching all of them independently would be too much trouble. An alternative is to have every information provider allow its content to be indexed by a centralized index host, but the trust & security requirements of such a host would be too high to be practical. We've written a paper that addresses this problem and proposes an alternative solution. The idea is to build a specialized index structure that does not reveal any specific details about the content being shared. As such it is suitable for storage on untrusted nodes, e.g. typical (super) peers in a p2p network. The paper is entitled "Privacy-Preserving Indexing of Documents on the Network", and you can download it from here: http://www.almaden.ibm.com/cs/people/bayardo/userv/ Hope you find it interesting. From bradneuberg at yahoo.com Wed Sep 17 19:03:49 2003 From: bradneuberg at yahoo.com (Brad Neuberg) Date: Sat Dec 9 22:12:22 2006 Subject: [p2p-hackers] New Release of P2P Sockets + Presentation Message-ID: <20030917190349.80489.qmail@web14103.mail.yahoo.com> Just uploaded a new build of P2P Sockets, dated 9-17-2003. Check out p2psockets.jxta.org to grab it. This build also has a new PowerPoint presentation (it's also at http://codinginparadise.org/p2psockets/p2psockets_powerpoint.ppt) which provides an intro to the project; I presented this last night at the JXTA Town Hall Meeting. It also includes some new shell-scripts that make running the various tools much easier, some code-changes to help the shell-scripts, and some updates to the tutorials to use these new easier shell-scripts. P2P Sockets is a reimplementation of standard Java sockets on top of Jxta and ports of standard web servers, servlet engines, etc. to run on top of a peer-to-peer network. 
P2P Sockets is finished. The Paper Airplane website, paperairplane.us, is also now up. Thanks, Brad Neuberg bkn3@columbia.edu http://www.codinginparadise.org From lgonze at panix.com Wed Sep 17 22:59:55 2003 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:22 2006 Subject: AW: AW: [p2p-hackers] Is there any p2p reputation system deployed? In-Reply-To: <20030911011931.GB710@gnu.org> Message-ID: On Mercredi, sep 10, 2003, at 21:19 America/New_York, Andrew Clausen wrote: > On Wed, Sep 10, 2003 at 02:32:28PM -0400, Lucas Gonze wrote: >> It strikes me that a reputation system with non-transferable >> reputation, ie where each peer has knowledge only about its own >> interactions, would encourage long term relationships. > > At this point, I don't think you are talking about reputation. You're > basically saying: "everyone is better off going it alone, and figuring > out for themselves who they should trust". > > If you are in a situation where you don't have enough direct experience > with a peer, then I think you need reputation. "Enough direct > experience" is a sticky question, because you need to be careful not to > overtrust someone who might be trying to suck you in first, then rip > you > off later. Ok, it's good to have the two definitions spelled out, so that we can know to say which one we're talking about in the future. > This is the motivation for my research. (If you're interested: > http://members.optusnet.com.au/clausen/ideas/google/google-subvert.pdf) Finally got around to reading this yesterday. I don't know whether the idea that PageRank is ultimately backed by the cost of a domain name was known to others, but to me it's completely new. Good stuff. 
- Lucas

From mllist at vaste.mine.nu Sun Sep 21 13:57:01 2003
From: mllist at vaste.mine.nu (Johan Fänge)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] XOR Hash Tree
Message-ID: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>

Sending top-to-bottom flattened THEX trees requires sending 2*base_hashes hashes, right?

Instead of hashing pairs of hashes from the previous level to create the next level, why not XOR them?

    A
   / \
  B   C

If you transfer A and B, you can reconstruct C by doing A XOR B. Etc. No base_hashes*hash_size overhead.

What I'm wondering is of course whether this is more easily spoofable. (Does XORing two pseudo-random numbers make them less random?)

Surely someone must've thought of this before?

/Vaste

From b.fallenstein at gmx.de Sun Sep 21 15:17:49 2003
From: b.fallenstein at gmx.de (Benja Fallenstein)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] XOR Hash Tree
In-Reply-To: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>
References: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>
Message-ID: <3F6DC11D.5010301@gmx.de>

Hi,

Johan Fänge wrote:
> Instead of hashing pairs of hashes from the previous level to create the
> next level, why not XOR them?
>
>     A
>    / \
>   B   C
>
> If you transfer A and B, you can reconstruct C by doing A XOR B. Etc. No
> base_hashes*hash_size overhead.

If you transfer B and C, you can reconstruct A by hashing B|C, so XORing doesn't seem to give an advantage. Plus, your scheme breaks security completely. ;) Because if the receiver has A, and you send A and B, then the receiver can construct C... but the receiver can construct a C for *arbitrary* B! I.e., whatever you send as B, the receiver constructs a C so that it "authenticates" against the root of the hash tree. (There's also the problem that XORing doesn't preserve order -- A XOR B = B XOR A -- but that hardly makes a difference given the above.)
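[Editor's aside: the break described above can be demonstrated in a few lines. The sketch below is illustrative, using SHA-1 leaf hashes as in THEX; it shows that under the XOR scheme any forged left child "verifies" against the trusted parent, and that swapped children verify too, since XOR is commutative:]

```python
import os
import hashlib

HASH_LEN = 20  # SHA-1 digest size, as in THEX

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Honest tree: the parent A is the XOR of its children B and C.
B = hashlib.sha1(b"left leaf data").digest()
C = hashlib.sha1(b"right leaf data").digest()
A = xor(B, C)

# "Verification" under the XOR scheme just checks left XOR right == parent.
def verify(parent: bytes, left: bytes, right: bytes) -> bool:
    return xor(left, right) == parent

assert verify(A, B, C)          # the honest case passes

# Attack: an adversary picks ANY forged left child...
forged_B = os.urandom(HASH_LEN)
# ...and derives a matching right child from the trusted parent.
forged_C = xor(A, forged_B)

# The forgery verifies against the same parent, so the scheme
# authenticates nothing: forged_B XOR forged_C == A by construction.
assert verify(A, forged_B, forged_C)

# And because XOR is commutative, swapping the leaves also verifies,
# which is the collision-by-leaf-swap problem.
assert verify(A, C, B)
```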
Cheers,
- Benja

From justin at chapweske.com Mon Sep 22 00:55:20 2003
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] XOR Hash Tree
In-Reply-To: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>
References: <1639.192.168.1.204.1064152621.squirrel@Vaste_lp3.wired>
Message-ID: <3F6E4878.4020908@chapweske.com>

Not good. You could easily create a collision by swapping the leaves.

-Justin

Johan Fänge wrote:
> Sending top-to-bottom flattened THEX trees requires sending 2*base_hashes
> hashes, right?
>
> Instead of hashing pairs of hashes from the previous level to create the
> next level, why not XOR them?
>
>     A
>    / \
>   B   C
>
> If you transfer A and B, you can reconstruct C by doing A XOR B. Etc. No
> base_hashes*hash_size overhead.
>
> What I'm wondering is of course whether this is more easily spoofable. (Does
> XORing two pseudo-random numbers make them less random?)
>
> Surely someone must've thought of this before?
>
> /Vaste
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers
> _______________________________________________
> Here is a web page listing P2P Conferences:
> http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

--
Justin Chapweske, Onion Networks
http://onionnetworks.com/

From icepick at icepick.info Thu Sep 25 19:35:26 2003
From: icepick at icepick.info (icepick@icepick.info)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: [mnet-devel] desiderata and open issues in ent
In-Reply-To:
References:
Message-ID: <20030925193526.GA2264@fil.org>

On Wed, Sep 17, 2003 at 09:40:59AM -0400, Zooko wrote:
> 2. It should handle the reality that a large fraction (around half) of the
> nodes are behind NAT or firewalls and can't accept incoming connections. If
> two nodes are both restricted like that then they cannot be peers of one
> another in the ent graph.
Surveying LinkSys home gateway products this morning, I see that all of them support UPnP. I suspect that most new home networks are being set up with hardware like this that supports UPnP, so maybe we'll get lucky and that 50% number will go down.

I have also posted code [1] that uses the Python COM support to forward a port to a NATed computer (my work computer, for example). It's ugly and blocking, but I plan on adding it to Mnet this weekend.

This is a great doc, btw, of what needs to be tackled.

icepick

1 - http://icepick.info/2003/09/17/upnp_example.py

From zooko at zooko.com Thu Sep 25 20:23:24 2003
From: zooko at zooko.com (Zooko)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: [mnet-devel] desiderata and open issues in ent
In-Reply-To: Message from icepick@icepick.info of "Thu, 25 Sep 2003 15:35:26 EDT." <20030925193526.GA2264@fil.org>
References: <20030925193526.GA2264@fil.org>
Message-ID:

icepick wrote:
>
> On Wed, Sep 17, 2003 at 09:40:59AM -0400, Zooko wrote:
> > 2. It should handle the reality that a large fraction (around half) of the
> > nodes are behind NAT or firewalls and can't accept incoming connections. If
> > two nodes are both restricted like that then they cannot be peers of one
> > another in the ent graph.
>
> Surveying LinkSys home gateway products this morning I see that all of them
> support UPnP. I suspect that most new home networks are being set up with
> hardware like this that supports UPnP, so maybe we'll get lucky and that 50%
> number will go down.

I'm skeptical. I think that in a lot of places where NAT is installed it is serving a dual role: to multiplex IP addresses and to discourage consumers from running servers. I suspect that if the former need is obviated for some reason, firewalls (or UPnP configurations) will then be installed to enforce the latter need.
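[Editor's aside: the first step of the UPnP port forwarding mentioned above is SSDP discovery of the gateway, which needs nothing beyond the standard library. A rough sketch of that step only — the subsequent SOAP AddPortMapping call to the gateway's control URL is omitted, and this is not the script from the link:]

```python
import socket

SSDP_ADDR, SSDP_PORT = "239.255.255.250", 1900

def make_msearch(search_target="urn:schemas-upnp-org:device:InternetGatewayDevice:1",
                 mx=2):
    # SSDP discovery request, per the UPnP Device Architecture spec.
    return (
        "M-SEARCH * HTTP/1.1\r\n"
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}\r\n"
        'MAN: "ssdp:discover"\r\n'
        f"MX: {mx}\r\n"
        f"ST: {search_target}\r\n"
        "\r\n"
    ).encode("ascii")

def discover(timeout=2.0):
    # Multicast the M-SEARCH and collect any gateway responses; each
    # response carries a LOCATION header pointing at the device
    # description, from which the control URL is obtained.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    responses = []
    try:
        sock.sendto(make_msearch(), (SSDP_ADDR, SSDP_PORT))
        while True:
            data, addr = sock.recvfrom(65507)
            responses.append((addr, data))
    except socket.timeout:
        pass
    finally:
        sock.close()
    return responses
```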
Networking researchers and Internet hackers like to talk about "solving the NAT problem", but I suspect that the people who actually make the decisions consider it to be a feature and not a problem. Here's an interesting rant that I skimmed recently that touches on this:

http://www.fourmilab.ch/documents/digital-imprimatur/

Regards,
Zooko

From wesley at felter.org Fri Sep 26 06:00:42 2003
From: wesley at felter.org (Wes Felter)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: [mnet-devel] desiderata and open issues in ent
In-Reply-To:
Message-ID:

On Thursday, September 25, 2003, at 03:23 PM, Zooko wrote:
>
> I'm skeptical. I think that in a lot of places that NAT is installed it is
> serving a dual role: to multiplex IP addresses and to discourage consumers
> from running servers. I suspect that if the former need is obviated for some
> reason, that firewalls (or UPnP configurations) will then be installed to
> enforce the latter need.

I disagree. In general, NATs are being installed by end users, not ISPs. You are right, however, that the ISPs will always find some way of propping up their bad business models.

Wes Felter - wesley@felter.org - http://felter.org/wesley/

From myers at maski.org Tue Sep 30 02:05:27 2003
From: myers at maski.org (Myers W. Carpenter)
Date: Sat Dec 9 22:12:22 2006
Subject: [p2p-hackers] Re: desiderata and open issues in ent
Message-ID: <20030930020527.GA11621@maski.org>

On Thu, Sep 25, 2003 at 04:23:24PM -0400, Zooko wrote:
> Networking researchers and Internet hackers like to talk about "solving the
> NAT problem", but I suspect that the people who actually make the decisions
> consider it to be a feature and not a problem.

I suspect that at this point the people who actually make the decisions are about as clueless as Aunt Millie (sorry, Aunt Millie). Actually, if you want to look at the main decision-maker-by-default, Microsoft, you see that they are pushing NAT traversal. Why?
Because it allows them to have neat features like video/voice conferencing (which was actually the key reason we got these UPnP routers at work). Also take a look at their Three Degrees project. A key dependency for this is IPv6 and Teredo [1]. I'm tempted to see if this could be used within Mnet.

I think it's a good idea to take the bull by the horns now and add in support for these technologies. Put an indicator on your app to show the user what kind of connection they have. For example, a yellow indicator or, if you are Peekabooty, a big frowning bear (maybe he could spit at you and call you names?) when you can't accept incoming connections. Make them feel like they aren't getting the full deal. Make the user want it to the point that the other people who make decisions (you know, "THEM") can't just slip this one by.

myers

1 - "Teredo, also known as IPv4 network address translator (NAT) traversal for IPv6"
http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/winxppro/maintain/Teredo.asp
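[Editor's aside: the Teredo format referenced above is compact enough to decode by hand. A 2001::/32 address embeds the Teredo server, some flag bits, and the client's external port and IPv4 address, the latter two stored bit-inverted. A sketch of the decoding, using a commonly cited example address rather than a real host:]

```python
import ipaddress

def parse_teredo(addr: str):
    """Decode server, flags, external port, and client IPv4 embedded
    in a Teredo (2001::/32) IPv6 address, per the Teredo spec."""
    ip = ipaddress.IPv6Address(addr)
    if ip not in ipaddress.IPv6Network("2001::/32"):
        raise ValueError("not a Teredo address")
    b = ip.packed
    server = ipaddress.IPv4Address(b[4:8])   # Teredo server's IPv4 address
    flags = int.from_bytes(b[8:10], "big")   # e.g. the cone-NAT bit
    # The mapped port and client IPv4 are stored bit-inverted (XORed
    # with all ones) so that NATs do not rewrite them in transit.
    port = int.from_bytes(b[10:12], "big") ^ 0xFFFF
    client = ipaddress.IPv4Address(int.from_bytes(b[12:16], "big") ^ 0xFFFFFFFF)
    return str(server), flags, port, str(client)

# Commonly cited example address from Teredo documentation:
server, flags, port, client = parse_teredo("2001:0:4136:e378:8000:63bf:3fff:fdd2")
# server → "65.54.227.120", port → 40000, client → "192.0.2.45"
```

This is exactly the property that makes Teredo attractive for p2p: a peer's external NAT mapping is readable straight out of its address, with no extra lookup protocol.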