From zooko at zooko.com Sun Mar 2 13:07:02 2003
From: zooko at zooko.com (Zooko)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Announcing Mnet v0.6.1
Message-ID:

The Mnet Development Team [1] is pleased to announce the release of Mnet v0.6.1.

Mnet is a "universal file space" -- a global space in which you can store and retrieve files. The contents of the universal file space are independent of any particular server. It comes with a GUI file browser that looks a bit like a classic file-sharing tool such as Napster. The code is published under the GNU Lesser General Public License.

The major improvements of v0.6.1 over v0.6 are GUI improvements, faster searches, reduced RAM usage, and reduced CPU usage. This is the last release planned from this branch of the Mnet source code. The next planned release will include a completely new, scalable lookup mechanism, among other substantial changes.

See the Mnet weather report [2] for the current size of the network and measurements of the number, kind, and size of files available.

Please view the ChangeLog for more details:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/mnet/mnet/ChangeLog?rev=HEAD&content-type=text/vnd.viewcvs-markup

Please visit the download page for precompiled packages for Windows, Mac OS X, Linux, FreeBSD, and Solaris. Also available from the download page are instructions for compiling the software from source.
http://mnet.sf.net/download.php

Please use Mnet and report bugs via e-mail to , or by using the SourceForge bug tracker:
http://sourceforge.net/tracker/index.php?group_id=43482&atid=436453

The availability and persistence of files is strongly influenced by how stable the servers are. If you run a stable Mnet server it will help.
More information is available on the project web page:
http://mnet.sf.net/

Regards,

Zooko
Developer, Mnet Project

[1] The Mnet Development Team is a loosely-organized band of hackers from around the planet who work on the project as a volunteer, non-profit operation in the public interest. Each hacker is either single or else associated with a very supportive romantic partner.

[2] The Mnet Weather Report is a series of e-mails to the mnet-devel mailing list with the mysterious From: address "Carnivore".
http://sourceforge.net/mailarchive/forum.php?forum_id=7702

From agm at SIMS.Berkeley.EDU Mon Mar 3 14:43:02 2003
From: agm at SIMS.Berkeley.EDU (Antonio Garcia)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Crawling supernodes.
In-Reply-To:
Message-ID:

Hello,

I am trying to get my client to build a topological map of the supernodes that I am connected to in Gnutella. Considering the various ways supernodes pick nodes to return in pong messages, can I still assume that any hosts returned by a pinged host are in fact connected to it? If so, what is the best way to get overall connectivity information, other than pinging a given host several times until it reveals all its neighbors?

Thanks,
A.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Antonio Garcia-Martinez
cryptologia.com

From justin at chapweske.com Tue Mar 4 14:57:02 2003
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] New THEX Draft
Message-ID: <3E652F28.90305@chapweske.com>

Hello All,

Please find the new THEX draft at (http://open-content.net/specs/draft-jchapweske-thex-02.html). Changes in this version include:

o A *backwards incompatible* change to the THEX hash tree functions that avoids collisions between the leaf nodes and the internal nodes. Many thanks to Zooko for pointing out this problem and many thanks to the IRTF Cryptographic Forum Research Group for suggesting the appropriate fix.
o Test vectors for hash trees based on the Tiger algorithm provided by Gordon Mohr.

--
Justin Chapweske, Onion Networks
http://onionnetworks.com/

From bram at gawth.com Thu Mar 6 12:51:01 2003
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] p2p-hackers meeting, this sunday
Message-ID:

In keeping with our usual schedule, the monthly p2p-hackers meeting will be this Sunday, the 9th, at 3pm at the Metreon. This month at least me and my brother will be there talking about Codeville.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes

From vab at cryptnet.net Thu Mar 6 15:22:02 2003
From: vab at cryptnet.net (V Alex Brennen)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Miami p2p-hackers meeting ping
In-Reply-To:
Message-ID:

Is there enough geographic p2p hacker density in Miami (or South Florida) to get a meeting going? Let me know if you're out there and you'd like to participate.

- VAB (@Miami)

From levine at vinecorp.com Thu Mar 6 23:20:01 2003
From: levine at vinecorp.com (James D. Levine)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] [Silicon Valley] PeerPunks meeting next Tuesday
Message-ID:

You know the drill by now...

Where: Dana Street Roasting Company
       744 W Dana St, Mountain View, CA 94041
       Phone: (650) 390-9638
       This is just 1/2 block off Castro St.

When: 7:00 pm onward

From eryou at ifs.com.ky Fri Mar 7 03:30:03 2003
From: eryou at ifs.com.ky (eryou@ifs.com.ky)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Vacation Message
Message-ID: <20030307112820.19079.qmail@lnxpwapp01.ifs.com.ky>

*** Automated Response ***

I am currently on vacation. Please contact Cam McKeown (cam.mckeown@ifs.com.ky) for any items.

--
regards,
Robert Eryou
Internet Financial Services Ltd. [IFS]
George Town, Grand Cayman, Cayman Islands
345.945.8607
www.ifs.com.ky
PGP: https://www.ifs.com.ky/pgp/Robert_Eryou.asc

I must create a system, or be enslaved by another man's.
~William Blake

From levine at vinecorp.com Mon Mar 10 13:55:03 2003
From: levine at vinecorp.com (James D. Levine)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] [Silicon Valley] PeerPunks meeting tomorrow Tuesday 3/11
Message-ID:

Come hear about what you missed at CodeCon :)

See you there...

James

----------------

Where: Dana Street Roasting Company
       744 W Dana St, Mountain View, CA 94041
       Phone: (650) 390-9638
       This is just 1/2 block off Castro St.

When: 7:00 pm onward

From levine at vinecorp.com Tue Mar 11 11:32:02 2003
From: levine at vinecorp.com (James D. Levine)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] [Silicon Valley] PeerPunks TONIGHT in Mountain View
Message-ID:

Last pester...

James

----------------

Where: Dana Street Roasting Company
       744 W Dana St, Mountain View, CA 94041
       Phone: (650) 390-9638
       This is just 1/2 block off Castro St.

When: 7:00 pm onward

--

From melc at fashionvictims.com Mon Mar 17 08:35:02 2003
From: melc at fashionvictims.com (melc)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Announcing version 0.2 of CP2PC
Message-ID:

After continued development on the CP2PC uberclient file-sharing application, we are proud to announce release 0.2 of the code. As part of the CP2PC project we have developed a minimal programming interface (API) to peer-to-peer file-sharing systems. This release contains improvements to the 0.1 implementation, but more importantly it contains a new GUI client. This client allows seamless access to multiple file-sharing networks through a single GUI application.

The code for this release can be found at:
http://sourceforge.net/projects/cp2pc

The current release contains:

* GUI. The GUI provides access to multiple file-sharing networks through a single interface. The interface is similar to existing file-sharing applications except that it allows one to choose which networks to search for files, or which networks local files should be published on.
Our GUI also provides a more advanced search interface than most file-sharing applications, allowing users to specify RDF queries to include in their search request as well as simple keyword or attribute-based search criteria.

* Core CP2PC code. This provides a default implementation of the CP2PC API, an implementation of core facilities (that is, facilities that may be used by file-sharing network components, including a local RDF triple database which implements the Tristero search interfaces, a simple CP2PC shell interface, download and upload monitors, etc.), and a skeleton component that can be extended to build new file-sharing network backends.

* Gnutella component. This is a file-sharing network backend component that can be used to connect to and use the Gnutella network. Currently, the component provides a simple shell-like interface to the network (the shell implements CP2PC API calls). The component can also be used by other programs as a library. This component uses Limewire [1] code to do the actual Gnutella-specific work - it forms a bridge between the CP2PC API and the Limewire code.

* GDN component. This is a file-sharing network backend component that can be used to connect to and use the GDN [2] network. Currently, the component provides a simple shell-like interface to the network (the shell implements CP2PC API calls). The component can also be used by other programs as a library. The component acts as a GDN client, creating, binding to, accessing, and destroying objects as necessary.

The code is written in Java and can be downloaded as a tarball or from CVS. Required libraries (i.e., jar files) can also be downloaded from our SourceForge site. Both the tarball and CVS contain the CP2PC documentation, including a description of the API and of the mapping of the API to various file-sharing networks. Note that the code is still alpha quality. The code is released under the LGPL.
Future work for the CP2PC project will include expansion of the GUI frontend, an XML-RPC interface for components, and the creation of more backend components.

Ihor.

[1] http://www.limewire.org
[2] http://www.cs.vu.nl/globe

melc.
--
melc@fashionvictims.com

From kevin at atkinson.dhs.org Mon Mar 17 10:36:02 2003
From: kevin at atkinson.dhs.org (Kevin Atkinson)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Announcing a new version of DistribNet
Message-ID:

I just released a new snapshot of DistribNet. DistribNet is a global peer-to-peer Internet file system which anyone can tap into or add content to. You can find it at http://distribnet.sourceforge.net/.

Compared to the last version I released over a year ago, this version is a functional network. However, it is in no way ready to be used for anything but testing. I would appreciate it if others would take a look and give feedback. I would especially be interested if someone with access to a large cluster could test it out, as I have not actually tested whether it functions over the network (all my testing was done by launching multiple nodes on the same computer). Expect more improvements to come.

Release Notes

So far DistribNet has only been tested on my computer, which runs RedHat 8.0 (kernel 2.4). Other Linux systems should work. Other POSIX systems might work. Forget Win32 for now.

DistribNet requires:
  GNU libc (I use a few thread-safe extensions)
  Gcc 3.1 or better (Gcc 2.95 may work)
  POSIX Threads
  Berkeley DB version 4 or better
  OpenSSL (I use version 0.9.6b)
  Perl (for the scripts)
  A Web Browser

Status and Future

The routing table is pretty much done. Future versions will use a better selection algorithm for determining which nodes to use.

Data keys are implemented. However, no caching is done. Other key types are not implemented.

The protocol is subject to change. Absolutely no guarantee is made that different versions of DistribNet will be compatible. The protocol will stabilize once I get the basics done.

Absolutely no security.
Once I get all the basics done I will work on adding security, which includes encrypting all communication.

Integers are not properly converted to network order. This will be done at the same time the protocol is being stabilized.

Basically, until I get the basics done, don't expect to be able to use DistribNet for anything other than testing. However, please do test it.

Since DistribNet is my Master's thesis I will be working nearly full time on it for the next couple of months. However, I would definitely appreciate help with the implementation. General ideas, of course, are also welcome. Please post them to distribnet-devel at lists.sourceforge.net and not to me directly so others can benefit from our discussion.

Attached is a text copy of the DistribNet design document.

--
http://kevin.atkinson.dhs.org

-------------- next part --------------

DistribNet

Kevin Atkinson (kevin at atkinson dhs org)

Project Page: http://distribnet.sourceforge.net/
Mailing list: http://lists.sourceforge.net/lists/listinfo/distribnet-devel

Abstract: A global peer-to-peer Internet file system which anyone can tap into or add content to.

1 Overview

NOTE: This paper was initially written in response to my dislike of Freenet and similar networks that focus on complete anonymity. Thus a great deal of the comments are directed towards that community. My ultimate goal is to design a general-purpose p2p network and not just a replacement for Freenet. In fact, in some ways DistribNet won't replace Freenet, due to the anonymity issue. I plan on eventually modifying this paper accordingly to address the general p2p (and web) community. For now, please keep the intended audience in mind when reading this section.

1.1 Main Goal

* To allow anyone, possibly anonymously, to publish web sites without having to pay a commercial provider for bandwidth or having to put up with the increasingly ad-ridden free web sites.
The only thing the author of the web site should have to worry about is the contents of the web site itself.

1.2 (Possibly Impossible) Goals

* Really fast lookup to find data. The worst case should be O(log(n)) and the average case should be O(1) or very close to it.

* Actually retrieving the data should also be really fast. Popular data should be sitting on the same subnet. On average it should be as fast or faster than a typical web site (such as Slashdot, Google, etc.). It should make effective use of the topology of the Internet to minimize network load and maximize performance.

* General searching based on keywords will be built into the protocol from the beginning. The search facility will be designed in such a way as to make message boards trivial to implement.

* Ability to update data while keeping old revisions around, so data never disappears until it is truly unwanted. No one person will have the power to delete data once it spreads throughout the network.

* Will try very hard to keep all but the most unpopular content from falling off the network. Basically, before deleting a locally unpopular key, a node will first check whether other nodes are storing the key and how popular they find it. If not enough nodes are storing the key and there is any indication that the data may be useful at a later date, the node will not delete it unless it absolutely has to. And if it does delete it, it will first try uploading it to other nodes with more disk space available.

* Ability to store data indefinitely if someone is willing to provide the space for it (and being able to find that data in log(n) time).

* Extremely robust, so that the only way to kill the network is to disable almost all of the nodes. The network should still function even if, say, 90% of it goes down.

* Extremely efficient CPU-wise, so that a fully functional node can run in the background and only take 1-2% of the CPU.
1.3 Applications

I would like the protocol to be able to efficiently support (i.e. without the ugly hacks that many of the applications for Freenet use):

* Efficient Web-like sites (with an HTTP gateway to make browsing easy)
* Efficient sharing of files large and small.
* Public message forums (with an IMAP gateway to make reading easy)
* Private Email (the messages will of course be encrypted so only the intended recipient can read them, again with an IMAP gateway)

And maybe:

* Streaming Media
* Online Chat (with a possible IRC or similar gateway)

1.4 Anti-Goals

Also see the philosophy section for why I don't find these issues that important.

* Complete anonymity for the browser. I want to focus first on performance rather than on anonymity. In fact, I plan to use extensive logging in the development versions so that I can track network performance and quickly catch performance bugs. As DistribNet stabilizes, anonymity will be improved at the expense of logging. The initial version will only use cryptography when absolutely necessary (for example, key signing). Most communications will be done in the clear. After DistribNet stabilizes, encryption will slowly be added. Please note that I still wish to allow for anonymous posting of content. However, without encryption, it probably won't be as anonymous as Freenet or GNUNet.

* Data in the cache will be stored in a straightforward manner. No attempt will be made to prevent the node operator from knowing what is in his own cache. Also, by default, very little attempt will be made to prevent others from knowing what is in a particular node's cache.

1.5 Philosophy

* I have nothing against complete anonymity; it is just that I am afraid that both Freenet and GNUNet are designed more around the anonymity and privacy issues than they are around the performance and scalability issues.

* For most types of content, the level of anonymity that Freenet and GNUNet offer is simply not needed.
Even for copyrighted and censored material there is, in general, little risk in actually viewing the information, because it is simply impractical to go after every single person who accesses forbidden information. Almost all of the time, lawsuits and the like go after the original distributors of the information and not the viewers. Therefore DistribNet will aim to provide anonymity for distributing information, but not for actually viewing it. However, since there is some information where even viewing it is extremely risky, DistribNet will eventually be able to provide the same level of anonymity that Freenet or GNUNet offers, but it will be completely optional.

* I also believe that knowing what is in one's own datastore, and being able to block certain types of material from one's own node, is not that big of a deal. Unless almost everyone blocks a certain type of information, the availability of blocked information will not be harmed. This is because even if 90% of the nodes block, say, kiddie porn, the information will still be available on the other 10% of the nodes, which, if the network is designed correctly, should be more than enough for anyone to get at blocked information. Furthermore, since the source code for DistribNet will be protected under the GPL or a similar license, it will be completely impractical for others to force a significant number of nodes to block information. Due to the dynamic nature of the cache, I find it legally difficult to hold anyone responsible for the contents of their cache, as it is constantly changing.

2 DistribNet Architecture

There are two types of keys: map and data keys. Map keys are hashed based on their identification and can be updated; data keys are hashed based on their content and consequently cannot be updated. There will be three types of storage of keys: permanent, cache, and pointers.
Permanent keys will be used to ensure the availability of content, the cache will be used exactly like a typical cache, and pointers will be used to find content. Map keys will be routed based on the SHA-1 hash of the identification using a Pastry[4]-like system. Data keys are not routed and will be stored based on where they are retrieved. Map keys will be used to find data keys.

2.1 Key Types

There will essentially be two types of keys: map keys and data keys. Map keys will be uniquely identified in a similar manner to Freenet SSK keys. Data keys will be identified in a similar manner to Freenet's CHK keys.

Map keys will contain the following information:

* Short Description
* Public Namespace Key
* Timestamped Index pointers
* Timestamped Data pointers

At any given point in time each map key will only be associated with one index pointer and one data pointer. Map keys can be updated by appending a new index or data pointer to the existing list. By default, when a map key is queried only the most recent pointer will be returned. However, older pointers are still there and may be retrieved by specifying a specific date. Thus, map keys may be updated, but information is never lost or overwritten.

Data keys will be very much like Freenet's CHK keys except that they will not be encrypted. Since they are not encrypted, delta compression may be used to save space.

There will not be anything like Freenet's KSK keys, as those proved to be completely insecure. Instead, map keys may be requested without a signature. If there is more than one map key by that name, then a list of keys is presented, sorted by popularity. To make such a list meaningful, every public key in DistribNet will have a descriptive string associated with it.

2.1.1 Data Key Details

Data keys will be stored in maximum-size blocks of just under 32K.
If an object is larger than 32K it will be broken down into smaller chunks, and an index block, also with a maximum size of about 32K, will be created so that the final object can be reassembled. If an object is too big to be indexed by one index block, the index blocks themselves will be split up. This can be done as many times as necessary, therefore providing the ability to store files of arbitrary size. DistribNet will use 64-bit integers to store the file size, therefore supporting file sizes up to 2^64 - 1 bytes.

Data keys will be retrieved by blocks rather than all at once. When a client first requests a data key that is too large to fit in one block, an index block will be returned. It is then up to the client to figure out how to retrieve the individual blocks. Please note that even though blocks are retrieved individually, they are not treated as truly independent keys by the nodes. For example, a node can be asked which blocks it has based on a given index block, rather than having to ask for each and every data block. Also, nodes maintain persistent connections so that blocks can be retrieved one after another without having to re-establish the connection each time.

Data and index blocks will be indexed based on the SHA-1 hash of their contents.
The exact numbers are as follows:

+-----------------------------------------+--------------------+
| Data Block Size:                        | 2^15 - 128 = 32640 |
|-----------------------------------------+--------------------|
| Index block header size:                | 40                 |
|-----------------------------------------+--------------------|
| Maximum number of keys per index block: | 1630               |
|-----------------------------------------+--------------------|
| Key Size:                               | 20                 |
+-----------------------------------------+--------------------+

Maximum object sizes:

  direct   => 2^14.99 bytes, about 31.9K
  1 level  => 2^25.66 bytes, about 50.7 megs
  2 levels => 2^36.34 bytes, about 80.8 gigs
  3 levels => 2^47.01 bytes, about 129 tera
  4 levels => 2^57.68 bytes
  5 levels => 2^68.35 bytes (but limited to 2^64 - 1)

Why 32640? A block size of just under 32K was chosen because I wanted a size which would allow most text files to fit in one block, most other files with one level of indexing, and just about anything anybody would think of transferring on a public network in two levels, and 32K worked out perfectly. Also, files around 32K are rather rare, therefore preventing a lot of unnecessary splitting of files that don't quite make it. 32640 rather than exactly 32K was chosen to allow some additional information to be transferred with the block without pushing the total size over 32K. 32640 can also be stored nicely in a 16-bit integer without having to worry about whether it's signed or unsigned. However, the exact block size is not fixed in stone. If, at a later date, a different block size is deemed to be more appropriate, then this number can be changed....

2.2 Storage

Permanent keys will be distributed essentially randomly. However, to ensure availability the network will ensure that at least N nodes contain the data. Nodes which are responsible for maintaining a permanent key will know about all the other nodes on the network which are also responsible for that key.
From time to time a node will check up on the other nodes to make sure they are still live, and if fewer than N-1 other nodes are live it will pick another node to ask to maintain a copy of the key. It will first try nodes which already have the key in their cache. If they all refuse, it will choose a random node to ask and will keep trying until some node accepts or one of the original nodes becomes live again.

Cached keys will be distributed based on where they will do the most good performance-wise. How cached keys will be managed is still undecided. For the first implementation they will likely be stored on the nodes which have previously requested the key.

Pointer keys will basically be distributed based on the numeric distance of the hash of the key from the hash of the node's identification. Since pointer keys contain very little data there will be an extremely large amount of redundancy. Pointer keys will contain two types of pointers: pointers to permanent keys and pointers to cached keys. Pointer keys on different nodes will all contain the same permanent pointers but will only contain pointers to cached keys on nodes which are nearby. There will be an upper limit to the number of pointers within a pointer key any one node will have.

3 DistribNet Routing

Map keys will be routed based on the SHA-1 hash of their identification in a similar manner as done in Pastry[4]. This section will assume the reader is familiar with how Pastry works and will focus on how DistribNet differs from Pastry.

Each node on DistribNet is uniquely identified by the 160-bit SHA-1 hash of its public key. Since SHA-1 hashes are used, the nodes will be evenly distributed. Keys in DistribNet are stored based on bitwise closeness. Bitwise closeness is based on the number of common bits two keys have. Unlike Pastry, numerical closeness is generally not used.

The routing table contains 8 rows, with each row containing 2^4 = 16 entries.
In general, DistribNet tries to maintain at least two nodes for each entry. The number of rows does not need to be fixed, and it can change based on the network size. It may also be possible that the number of entries per row does not necessarily have to be fixed. However, this idea has not been explored.

4 was chosen as the base size for several reasons: 1) it is a power of two, 2) when keys are thought of as a sequence of digits a base size of 4 means that the digits will be hexadecimal, and 3) the Pastry paper hinted that 4 would be a good choice. The number of rows was chosen to be large enough so that there is no possibility that the last row will be used when dealing with a moderate-size network during testing.

Unlike Pastry, there is no leaf set. Instead the ``leaf set'' consists of all rows which are not ``full''. A full row is a row which contains 15 full entries, the extra empty entry being the one which represents the common digit with the node's key, and thus will never be used. Not having a true ``leaf set'' simplifies the implementation, since a separate list does not need to be maintained. This also means that all the nodes in the leaf set will maintain the same set. I have not determined if this is a good or bad thing.

A row is considered full in DistribNet if 15 of the 16 entries are full in the current node AND other nodes also have 15 of the 16 entries full (clarify...).

For each full row DistribNet will try to maintain at least two nodes for each entry. This way, if one node goes down, the other one can be used without affecting performance. When a node is determined to be down (as opposed to being momentarily offline) DistribNet will try to replace it with another node that is up. With this arrangement it is extremely likely that at least one of the two nodes will be available. A full row can become a leaf row if the entry count drops below 15.
For each non-full row (i.e. in the leaf set) DistribNet will attempt to maintain as many nodes as are available for that entry, so that every other node in the leaf set is accounted for. From time to time DistribNet will contact another node in the leaf set and synchronize its leaf set with it. This is possible because all nodes in the leaf set will have the same set. Down nodes in the leaf set will be removed, but the criteria for when a node is down are stricter for a leaf set than for a full row. If a leaf row becomes a full row, then excess nodes will be removed.

DistribNet also maintains an accurate estimate of the number of nodes that are on the network. This is possible because, unlike with networks such as Freenet, all nodes are accounted for.

To store a map key, the 3 bitwise-closest nodes will get it. When looking for a key, the 8 closest nodes will be tried.

The routing table is implemented in the files routeing_*.?pp in the src/ directory of the distribution.

4 Retrieval of Data Keys

Each key request is coupled with a hops-to-try (HTT) number. This number controls the number of additional nodes that can be contacted to retrieve the data. If the HTT number is 0, then the request will fail unless the contacted node has the key. This number is only for actually retrieving the data, not finding it.

When a node A wants to retrieve a key K, one of two things will happen. If it has good reason to believe that a nearby node has the key, it will attempt to retrieve it from that node; otherwise it will send a request to get other nodes which have the key. If a nearby node has the key, it will ask that node for the key. If it doesn't, it will ask some nearby node to do so on its behalf.

To find a key ... To do this it will contact node B, which will in turn contact C, etc., until an answer is found, which for the sake of argument will be at node E. Node E will then send a list of possible nodes L which contain key K directly to node A.
Node E will then send the result to node D, which will send it to C, etc. Node E will also add node A to list L with a probability of, say, 10%; node D will do the same but with a probability of, say, 25%; etc. This avoids the problem of the list L becoming extremely large for popular data, but allows nodes close to A to discover that A has the data, since nodes close to A will likely contact the same nodes that A tried. Since A requested the location of key K, it is assumed that A will likely download the data. If this assumption is false, then node A will simply be removed from the list later on.

Once A retrieves the list, it will pick a node from the list L based on some evaluation function; let's say it picks node X. Node X will then return the key to node A. The evaluation function will take several factors into account, including distance, download speed, past reputation, and whether node A even knows anything about the new node. If node X does not send the key back to node A for whatever reason, A will remove node X from the list and try again. It will also send this information to node B so it can consider removing node X from its list; B will then in turn notify node C of the information, etc.

If the key is an index block, node X will also send information about which parts of the complete key it has. If the key is not an index block, then node A is done. If the key is an index block, then node A will start downloading the sub-blocks of key K that node X has. At the same time, if the key is large or node X does not contain all the sub-blocks of K, node A will choose another node from the list to contact, and possibly other nodes depending on the size of the file. It will then download other parts of the file from the other nodes. Which blocks are downloaded from which nodes will change based on the download speed of the nodes, so that more blocks are downloaded from faster nodes and fewer from slower ones, thus allowing the data to be transferred in the least amount of time.
If after contacting a certain number of nodes there are still parts of the key that are not available on any of those nodes, node A will perform a separate query for the individual blocks. However, I imagine that in practice this will rarely be necessary. 4.1 Distance determination One very coarse estimate of node distance would be to take the numerical distance between two nodes' IP addresses, since nodes closer to each other numerically are likely to share the same gateways, and nodes really close are likely to be on the same subnet. Another way to estimate node distance relies on the fact that node distance, for the most part, obeys the triangle inequality. For each node X in the list of candidate nodes, some information about the estimated distance between X and the node storing the list, node E, is maintained by some means. For node A to estimate the distance between itself and a node X on the list, all it has to do is combine the distance between itself and E with the distance between E and X. The combination function will depend on the aspect of distance that is being measured: for the number of hops it will simply add them; for download speed it will take the maximum; etc. 5 Limitations Because there is no indirection when retrieving data, most of the data on any particular node will be data that the local node's user requested at some point in time. This means that it is fairly easy to tell which keys a particular user requested. Although complete anonymity for the browser is one of my anti-goals, this is going a bit too far. One solution for this is to do something similar to what GNUnet does, which is described in [3]. It is also blatantly obvious which nodes have which keys. Although I do not see this as a major problem, especially if a solution for the first problem is found, it is something to consider. I will be more than happy to entertain solutions to this problem, provided that they don't affect efficiency that much. 
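The combination step can be sketched as follows. This is a minimal illustration: the metric names are made up, and "slowness" stands in for whatever bottleneck-style measure the maximum rule is applied to.

```python
def combine(metric, d_a_to_e, d_e_to_x):
    """Estimate A's distance to X through intermediary E, exploiting the
    (approximate) triangle inequality: hop counts accumulate along the path,
    while for a bottleneck-style metric the worse of the two legs dominates."""
    if metric == "hops":
        return d_a_to_e + d_e_to_x        # path length adds up
    if metric == "slowness":
        return max(d_a_to_e, d_e_to_x)    # the slower leg is the bottleneck
    raise ValueError("unknown metric: " + metric)

print(combine("hops", 3, 4))          # 7
print(combine("slowness", 0.2, 0.5))  # 0.5
```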
6 Implementation Details An implementation of DistribNet is available at http://distribnet.sourceforge.net/. 6.1 Physical Storage Blocks are currently stored in one of three ways: 1. blocks smaller than a fixed threshold (currently 1k) are stored using Berkeley DB (version 3.3 or better). 2. blocks larger than the threshold are stored as files. The primary reason for doing this is to avoid limiting the size of the data store by the maximum size of a file, which is often 2 or 4 GB on most 32-bit systems. 3. blocks are not stored at all; instead they are linked to an external file outside of the data store, much like a symbolic link links to a file outside of the current directory. However, since blocks often represent only part of the file, the offset is also stored as part of the link. These links are stored in the same database that small blocks are stored in. Since the external file can easily be changed by the user, the SHA-1 hashes will be recomputed when the file modification date changes. If the SHA-1 hash of the block differs, all the links to the file will be thrown out and the file will be relinked. (This part is not implemented yet.) Most of the code for the data keys can be found in data_key.cpp. 6.2 Language DistribNet is/will be written in fairly modern C++. It will use several external libraries; however, it will not use any C++-specific libraries. In particular I have no plan to use any sort of abstraction library for POSIX functionality. Instead, thin wrapper classes will be used, which I have complete control over and which will serve mainly to make the process of using POSIX functions less tedious rather than to abstract away the details of using them. Bibliography 1 GNUnet. http://www.ovmj.org/GNUnet/ and http://www.gnu.org/software/GNUnet/ 2 Freenet. http://freenet.sourceforge.net/ 3 Krista Bennett and Christian Grothoff. ``GNUnet - anonymity for free''. http://gecko.cs.purdue.edu/GNUnet/papers.php3 4 Antony Rowstron and Peter Druschel. 
``Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems''. http://research.microsoft.com/antr/Pastry/pubs.htm About this document ... This document was generated using the LaTeX2HTML translator Version 2002 (1.62). The translation was initiated by Kevin Atkinson on 2003-03-17. From zooko at zooko.com Sun Mar 30 12:53:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord Message-ID: A fundamental advantage of Pastry [1] and Kademlia [2] over Chord [3] would seem to be that in the former two, there is a "free choice" in which peers a node links to, whereas in Chord each peer is uniquely determined. The Pastry papers describe this feature in terms of an arbitrary "proximity metric". The Kademlia paper says: "Worse yet, asymmetry leads to rigid routing tables. Each entry in a Chord node's finger table must store the precise node preceding some interval in the ID space. Any node actually in the interval would be too far from nodes preceding it in the same interval. Kademlia, in contrast, can send a query to any node within an interval, ..." This feature seems very important to me, not because the "free choice" part can be used to select peers with low latency or few network hops, but because it can be used to select on arbitrary other criteria, such as avoiding peers that are unreachable (due to an incompletely connected underlying network), avoiding peers that are untrusted, or other criteria. 
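The asymmetry the Kademlia paper describes can be made concrete with a toy comparison of Chord's clockwise distance and Kademlia's XOR distance. This is an illustrative sketch only: the 4-bit ID space and the example IDs are made up.

```python
M = 2 ** 4  # a tiny 4-bit ID space, purely for illustration

def chord_dist(a, b):
    """Chord's clockwise distance from a to b: asymmetric."""
    return (b - a) % M

def xor_dist(a, b):
    """Kademlia's XOR distance: symmetric by construction."""
    return a ^ b

print(chord_dist(3, 9), chord_dist(9, 3))  # 6 10 -- direction matters
print(xor_dist(3, 9), xor_dist(9, 3))      # 10 10 -- direction doesn't
```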
But is this rigidity really a consequence of Chord's asymmetric distance metric? Brandon Wiley suggested to me that one could have a "Sloppy Chord", using the same, asymmetric, distance metric, but changing the requirement of the k'th finger table entry from "the predecessor of selfId + 2^k" to "a node in the interval 2^(k-1) through 2^k". After thinking about it for a few minutes, there isn't any obvious reason why this wouldn't have the same asymptotic performance guarantees as proper MIT Chord. Regards, Zooko [1] http://citeseer.nj.nec.com/rowstron01pastry.html [2] http://citeseer.nj.nec.com/529075.html [3] http://citeseer.nj.nec.com/stoica01chord.html http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From gojomo at usa.net Sun Mar 30 18:56:02 2003 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord References: Message-ID: <01ee01c2f730$fb4e0380$0a0a000a@golden> You've probably seen the IPTPS03 paper, "Sloppy Hashing and Self-Organized Clusters", by David Mazieres and Michael Freedman: http://iptps03.cs.berkeley.edu/final-papers/coral.pdf I suspect many of the strict Chord (and other DHT) assumptions can be loosened while still achieving excellent, nearly equivalent performance -- though it then becomes harder to rigorously prove such performance. - Gordon ----- Original Message ----- From: "Zooko" To: Sent: Sunday, March 30, 2003 12:51 PM Subject: [p2p-hackers] Sloppy Chord > > A fundamental advantage of Pastry [1] and Kademlia [2] over Chord [3] would seem > to be that in the former two, there is a "free choice" in which peers a node > links to, whereas in Chord each peer is uniquely determined. > > The Pastry papers describe this feature in terms of an arbitrary "proximity > metric". The Kademlia paper says: > > "Worse yet, asymmetry leads to rigid routing tables. Each entry in a Chord > node's finger table must store the precise node preceding some interval in the > ID space. 
Any node actually in the interval would be too far from nodes > preceding it in the same interval. Kademlia, in contast, can send a query to > any node within an interval, ..." > > This feature seems very important to me, not because the "free choice" part can > be used to select peers with low latency or few network hops, but because it can > be used to select on arbitrary other criteria, such as avoiding peers that are > unreachable (due to an incompletely connected underlying network), avoiding > peers that are untrusted, or other criteria. > > But is this rigidity really a consequence of Chord's asymmetric distance metric? > > Brandon Wiley suggested to me that one could have a "Sloppy Chord", using the > same, asymmetric, distance metric, but changing the requirement of the k'th > finger table entry from "the precessor of selfId + 2^k" to "a node in the > interval 2^(k-1) through 2^k". > > After thinking about it for a few minutes, there isn't any obvious reason why > this wouldn't have the same asymptotic performance guarantees as proper MIT > Chord. > > Regards, > > Zooko > > [1] http://citeseer.nj.nec.com/rowstron01pastry.html > [2] http://citeseer.nj.nec.com/529075.html > [3] http://citeseer.nj.nec.com/stoica01chord.html > > http://zooko.com/ > ^-- under re-construction: some new stuff, some broken links > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From kevin at atkinson.dhs.org Mon Mar 31 00:31:02 2003 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: <01ee01c2f730$fb4e0380$0a0a000a@golden> Message-ID: What advantages does Chord[1] offer over Pastry[2]? I chose Pastry over Chord because of its simplicity, and because of the so-called "free choice" property. 
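The "sloppy" finger rule quoted above can be sketched like this. Everything here is illustrative: the ring size, the node IDs, and the successor-style convention used for the strict finger (Chord conventions differ slightly on predecessor vs. successor of the target point).

```python
M = 2 ** 16  # illustrative ring size

def strict_finger(self_id, k, nodes):
    """MIT Chord: the k'th finger is uniquely determined -- here taken as the
    first node at or after self_id + 2^k on the ring (one common convention)."""
    target = (self_id + 2 ** k) % M
    return min(nodes, key=lambda n: (n - target) % M)

def sloppy_candidates(self_id, k, nodes):
    """Sloppy Chord: ANY node whose clockwise offset from self_id lies in
    [2^(k-1), 2^k) will do, so the peer can be picked by latency, trust, etc."""
    lo, hi = 2 ** (k - 1), 2 ** k
    return [n for n in nodes if lo <= (n - self_id) % M < hi]

nodes = [10, 100, 600, 700, 1000, 5000]
print(sloppy_candidates(0, 10, nodes))  # [600, 700, 1000]: offsets in [512, 1024)
print(strict_finger(0, 10, nodes))      # 5000: the unique node at/after 1024
```

The strict rule leaves no choice at all; the sloppy rule hands the implementation a whole interval of acceptable peers.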
In fact my network, DistribNet[3], essentially implements the ideas of Coral[4], but on top of Pastry rather than Chord. In particular, nodes can freely join and leave the network without requiring any movement of keys. The only thing a node has to do when it joins is to synchronize its routing table with the network. Nothing has to be done for a node to leave the network. So my question is: what advantages does Chord offer performance-wise over Pastry? [1] Chord: http://www.pdos.lcs.mit.edu/chord/ [2] Pastry: http://research.microsoft.com/antr/Pastry/pubs.htm [3] DistribNet: http://distribnet.sourceforge.net/ [4] Coral: http://www.scs.cs.nyu.edu/coral/ --- http://kevin.atkinson.dhs.org From kevin at atkinson.dhs.org Mon Mar 31 01:03:02 2003 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: On Mon, 31 Mar 2003, Kevin Atkinson wrote: [Broken link corrected below] > What advantages does Chord[1] offer over Pastry[2]? I chose Pastry over > chord because of it simplicity, and the because of the so called "free > choice" property. > > In fact my network DistribNet[3], is essentially implementing the ideas of > Coral[4] but on top of Pastry rather than chord. In particular nodes can > freely join and leave the network without requiring any movement of keys. > The only thing a network has to do when it joins is to synchronize its > routing table with the network. Nothing has to be done for a node to > leave the network. > > So my question is: What advantages does Chord offer performance wise over > Pastry? > > [1] Chord: http://www.pdos.lcs.mit.edu/chord/ > [2] Pastry: http://research.microsoft.com/antr/Pastry/pubs.htm Make that "http://research.microsoft.com/~antr/Pastry/" sorry. 
> [3] DistribNet: http://distribnet.sourceforge.net/ > [4] Coral: http://www.scs.cs.nyu.edu/coral/ -- http://kevin.atkinson.dhs.org From zooko at zooko.com Mon Mar 31 06:43:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message from Kevin Atkinson of "Mon, 31 Mar 2003 03:30:27 EST." References: Message-ID: Kevin Atkinson writes: > > What advantages does Chord[1] offer over Pastry[2]? I chose Pastry over > chord because of it simplicity, and the because of the so called "free > choice" property. Heh heh. Simplicity is funny. I believe that Brandon Wiley prefers Chord over Pastry because of its simplicity. ;-) I prefer Pastry-or-Kademlia over Chord because of the free choice and, to a lesser extent, because of the symmetric metric. I prefer Kademlia over Pastry because of the simplicity of exposition and argument in the original Kademlia paper. Also Kademlia has a simplification over Pastry (no "leaf sets"), but I suspect that this simplification might come with a cost in reduced robustness. > In fact my network DistribNet[3], is essentially implementing the ideas of > Coral[4] but on top of Pastry rather than chord. Wow! I need to read about Coral now. I haven't read the iptps03 proceedings yet, so I am obviously behind the times now. The Coral page says it will do primarily Kademlia, and also Chord. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From kevin at atkinson.dhs.org Mon Mar 31 07:18:01 2003 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: On Mon, 31 Mar 2003, Zooko wrote: > Kevin Atkinson writes: > > > > What advantages does Chord[1] offer over Pastry[2]? I chose Pastry over > > chord because of it simplicity, and the because of the so called "free > > choice" property. > > Heh heh. Simplicity is funny. 
I believe that Brandon Wiley prefers > Chord over Pastry because of its simplicity. ;-) Well, I guess it is how you look at it. I thought Pastry was simpler than Chord, at least the idea. My implementation of Pastry makes several simplifications: 1) No Neighborhood set. 2) No "real" leaf set. Instead any row which does not have at least one node in each entry is a leaf row. 3) Distance is strictly bitwise, not numerical (bitwise meaning based on the number of leading bits in common; e.g., out of 4 bits, 1101 and 1111 share a 2-bit prefix and so have a distance of 2). 2 and 3 mean that all nodes in the "leaf set" have identical leaf sets, which simplifies the implementation. > I prefer Pastry-or-Kademlia over Chord because of the free choice and, to a > lesser extent, because of the symmetric metric. > > I prefer Kademlia over Pastry because of the simplicity of exposition and > argument in the original Kademlia paper. Also Kademlia has a simplification > over Pastry (no "leaf sets"), but I suspect that this simplification might come > with a cost in reduced robustness. I will have to look at Kademlia. It may be very similar to what I did with Pastry. > > In fact my network DistribNet[3], is essentially implementing the ideas of > > Coral[4] but on top of Pastry rather than chord. > > Wow! I need to read about Coral now. I haven't read the iptps03 proceedings > yet, so I am obviously behind the times now. Well, I had not heard of Kademlia or Coral until yesterday :-|. --- http://kevin.atkinson.dhs.org From blanu at bozonics.com Mon Mar 31 12:06:02 2003 From: blanu at bozonics.com (Brandon Wiley) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: > Heh heh. Simplicity is funny. I believe that Brandon Wiley prefers Chord over > Pastry because of its simplicity. ;-) I prefer Chord over Kademlia because of simplicity and ease of explanation to people who might want to implement it. I don't have much of an opinion on Pastry currently. 
From blanu at bozonics.com Mon Mar 31 12:55:03 2003 From: blanu at bozonics.com (Brandon Wiley) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: > This feature seems very important to me, not because the "free choice" part can > be used to select peers with low latency or few network hops, but because it can > be used to select on arbitrary other criteria, such as avoiding peers that are > unreachable (due to an incompletely connected underlying network), avoiding > peers that are untrusted, or other criteria. Some of this you could do with MIT Chord. If a node has a globally known property (unreachability, untrustedness according to a central authority) then you can just not let "bad" nodes join. You only need "sloppy" Chord if you want nodes to make a routing choice which is based on subjective criteria or information. One of my censorship resistance schemes for China works like the former. The global criterion for joining is that the node is not in China (according to an IP->location service). This forms a supernode-like network of "good" (not in China) nodes with the "bad" (in China) nodes attached to the edges of the network. Unlike other supernode-based networks like Kazaa, however, the supernodes are not organized haphazardly, but into a nice, neat Chord ring. I also do this for the Alluvium mirror-finding network by having the criterion be "not behind NAT". > Brandon Wiley suggested to me that one could have a "Sloppy Chord", using the > same, asymmetric, distance metric, but changing the requirement of the k'th > finger table entry from "the precessor of selfId + 2^k" to "a node in the > interval 2^(k-1) through 2^k". One important thing to remember about Chord is that for the lookups to succeed, you *only* need the next pointers to be correct and to always follow pointers forward. You don't need finger tables. 
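A minimal sketch of that point — successor ("next") pointers alone already give correct, if O(N), lookups. The ring size and node IDs here are made up for illustration.

```python
M = 2 ** 8                      # toy 8-bit ring, for illustration
ring = [2, 40, 90, 200]         # node IDs, sorted clockwise
succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}

def in_half_open(x, a, b):
    """True if x lies in the clockwise interval (a, b] on the ring."""
    return x != a and (x - a) % M <= (b - a) % M

def lookup(key, start):
    """Follow next (successor) pointers only: always reaches the node
    responsible for key, in O(N) hops, with no finger table at all."""
    node = start
    while not in_half_open(key, node, succ[node]):
        node = succ[node]
    return succ[node]

print(lookup(100, 2))  # 200: it owns keys in (90, 200]
print(lookup(250, 2))  # 2: the lookup wraps around the top of the ring
```

Finger tables, or any other references that move strictly forward, only shortcut this walk; they are never needed for correctness.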
You can have incorrect finger tables as long as you never follow a bad finger backwards, and this is something you can easily check for yourself. MIT Chord sets you up with a finger table which you can follow for the nice O(log N). However, you should feel perfectly free to keep around references to nodes that don't fit in your finger table. And you should feel free to follow such references if they are closer to the key than either 1) you or 2) the last reference you followed, and 3) you like them better than the other references you have by some criteria. I can't guarantee that if you follow references other than your finger table you'll get there in O(log n). However, you can't optimize for two things at once. If you want to get there fast, the finger table is your roadmap to guaranteed speed. Of course some reference you have might actually be closer to the key than your finger table's reference. Then you should follow that. The O is the same, but the actual time to the key might be less. If you don't care about getting there as fast as possible, use something based on criteria other than being in your finger table. You'll still get there for sure. > After thinking about it for a few minutes, there isn't any obvious reason why > this wouldn't have the same asymptotic performance guarantees as proper MIT > Chord. I wish there was some analysis on this. I can only say that I don't see a problem here, but I don't feel like trying to prove it. Of course, this is one of the non-technical problems with MIT Chord. It sticks very closely to the provable whereas for real applications less provable systems with better actual properties are desirable. From nramabha at cs.ucsd.edu Mon Mar 31 12:59:03 2003 From: nramabha at cs.ucsd.edu (Narayanan S RAMABHADRAN) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: Hi This would probably work. 
However, to achieve theoretical guarantees, it may be necessary to add a restriction, such as choosing a node with uniform probability from that interval. If you choose based on a metric like latency, you may not be able to theoretically show that it has the same performance as Chord. In practice, it would probably do well. I have done some as yet unpublished work that shows that choosing based on uniform probability does indeed work well. There is also some work from Stanford called Symphony, published in USITS 2003, that does something similar. They extend Chord by allowing a node to choose neighbors at random but governed by a certain probability distribution over the key space that ensures efficient routing. While this is not exactly "free choice", it does loosen up some of the rigidity in Chord. It has the disadvantage that you need to estimate N, the total number of nodes in the system. Regards, Sriram Narayanan Sriram Ramabhadran Graduate student Dept. of Computer Science & Engg. University of California San Diego On Sun, 30 Mar 2003, Zooko wrote: > > A fundamental advantage of Pastry [1] and Kademlia [2] over Chord [3] would seem > to be that in the former two, there is a "free choice" in which peers a node > links to, whereas in Chord each peer is uniquely determined. > > The Pastry papers describe this feature in terms of an arbitrary "proximity > metric". The Kademlia paper says: > > "Worse yet, asymmetry leads to rigid routing tables. Each entry in a Chord > node's finger table must store the precise node preceding some interval in the > ID space. Any node actually in the interval would be too far from nodes > preceding it in the same interval. Kademlia, in contast, can send a query to > any node within an interval, ..." 
> > This feature seems very important to me, not because the "free choice" part can > be used to select peers with low latency or few network hops, but because it can > be used to select on arbitrary other criteria, such as avoiding peers that are > unreachable (due to an incompletely connected underlying network), avoiding > peers that are untrusted, or other criteria. > > But is this rigidity really a consequence of Chord's asymmetric distance metric? > > Brandon Wiley suggested to me that one could have a "Sloppy Chord", using the > same, asymmetric, distance metric, but changing the requirement of the k'th > finger table entry from "the precessor of selfId + 2^k" to "a node in the > interval 2^(k-1) through 2^k". > > After thinking about it for a few minutes, there isn't any obvious reason why > this wouldn't have the same asymptotic performance guarantees as proper MIT > Chord. > > Regards, > > Zooko > > [1] http://citeseer.nj.nec.com/rowstron01pastry.html > [2] http://citeseer.nj.nec.com/529075.html > [3] http://citeseer.nj.nec.com/stoica01chord.html > > http://zooko.com/ > ^-- under re-construction: some new stuff, some broken links > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > From bert at akamail.com Mon Mar 31 13:26:02 2003 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: References: Message-ID: <3E88B2CB.6030103@akamail.com> Simplicity is indeed underappreciated in these sorts of DHT schemes (and in "academic" p2p research in general). Why else would Gnutella be so popular? Here's a little-known DHT approach which is straightforward, (relatively) simple to implement, and has provable O(log n) performance (with very high probability). Symphony: Distributed Hashing in a Small World Gurmeet Manku, Mayank Bawa and Prabhakar Raghavan. 
USITS, 2003 http://www-db.stanford.edu/~bawa/Pub/symphony.pdf In general I think randomized approaches such as this make a LOT more sense than any of Chord/Kademlia/Pastry. They can offer greater flexibility and robustness due to significantly less rigid distribution and routing rules. Bert
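The randomized link choice Symphony uses can be sketched concretely. The paper draws each long link's clockwise distance, on a unit-circumference ring, from the harmonic density p(x) = 1/(x ln n) on [1/n, 1]; inverting the CDF gives a one-line sampler. This sketch follows that description, with n standing for the (estimated) node count.

```python
import random

def symphony_link_distance(n_estimate):
    """Sample a long-link distance from the harmonic pdf p(x) = 1/(x ln n)
    on [1/n, 1]: inverse-CDF sampling yields x = n**(u - 1) for uniform u."""
    u = random.random()           # uniform in [0, 1)
    return n_estimate ** (u - 1.0)

# every sample lands in [1/n, 1), favoring short distances harmonically
samples = [symphony_link_distance(64) for _ in range(5)]
```

The only global knowledge required is the estimate n, which is exactly the disadvantage noted earlier in the thread.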