From zooko at zooko.com Sun Mar 2 13:07:02 2003
From: zooko at zooko.com (Zooko)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Announcing Mnet v0.6.1
Message-ID:

The Mnet Development Team [1] is pleased to announce the release of Mnet v0.6.1.

Mnet is a "universal file space" -- a global space in which you can store and retrieve files. The contents of the universal file space are independent of any particular server. It comes with a GUI file browser that looks a bit like a classic file-sharing tool such as Napster. The code is published under the GNU Lesser General Public License.

The major improvements of v0.6.1 over v0.6 are GUI improvements, faster searches, reduced RAM usage, and reduced CPU usage. This is the last release planned from this branch of the Mnet source code. The next planned release will include a completely new, scalable lookup mechanism, among other substantial changes.

See the Mnet weather report [2] for the current size of the network and measurements of the number, kind, and size of files available.

Please view the ChangeLog for more details:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/mnet/mnet/ChangeLog?rev=HEAD&content-type=text/vnd.viewcvs-markup

Please visit the download page for precompiled packages for Windows, Mac OS X, Linux, FreeBSD, and Solaris. Also available from the download page are instructions for compiling the software from source.
http://mnet.sf.net/download.php

Please use Mnet and report bugs via e-mail to , or by using the SourceForge bug tracker:
http://sourceforge.net/tracker/index.php?group_id=43482&atid=436453

The availability and persistence of files is strongly influenced by how stable the servers are. If you run a stable Mnet server it will help.
More information is available on the project web page:
http://mnet.sf.net/

Regards,

Zooko
Developer, Mnet Project

[1] The Mnet Development Team is a loosely-organized band of hackers from around the planet who work on the project as a volunteer, non-profit operation in the public interest. Each hacker is either single or else associated with a very supportive romantic partner.

[2] The Mnet Weather Report is a series of e-mails to the mnet-devel mailing list with the mysterious From: address "Carnivore".
http://sourceforge.net/mailarchive/forum.php?forum_id=7702

From agm at SIMS.Berkeley.EDU Mon Mar 3 14:43:02 2003
From: agm at SIMS.Berkeley.EDU (Antonio Garcia)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Crawling supernodes.
In-Reply-To:
Message-ID:

Hello,

I am trying to get my client to build a topological map of the supernodes that I am connected to in Gnutella. Considering the various ways supernodes pick nodes to return in pong messages, can I still assume that any hosts returned by a pinged host are in fact connected to it? If so, what is the best way to get overall connectivity information, other than pinging a given host several times until it reveals all its neighbors?

Thanks,
A.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Antonio Garcia-Martinez
cryptologia.com

From justin at chapweske.com Tue Mar 4 14:57:02 2003
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] New THEX Draft
Message-ID: <3E652F28.90305@chapweske.com>

Hello All,

Please find the new THEX draft at (http://open-content.net/specs/draft-jchapweske-thex-02.html). Changes in this version include:

o A *backwards incompatible* change to the THEX hash tree functions that avoids collisions between the leaf nodes and the internal nodes. Many thanks to Zooko for pointing out this problem and many thanks to the IRTF Cryptographic Forum Research Group for suggesting the appropriate fix.
o Test vectors for hash trees based on the Tiger algorithm provided by Gordon Mohr.

--
Justin Chapweske, Onion Networks
http://onionnetworks.com/

From bram at gawth.com Thu Mar 6 12:51:01 2003
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] p2p-hackers meeting, this sunday
Message-ID:

In keeping with our usual schedule, the monthly p2p-hackers meeting will be this Sunday, the 9th, at 3pm at the Metreon. This month at least me and my brother will be there talking about Codeville.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes

From vab at cryptnet.net Thu Mar 6 15:22:02 2003
From: vab at cryptnet.net (V Alex Brennen)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Miami p2p-hackers meeting ping
In-Reply-To:
Message-ID:

Is there enough geographic p2p hacker density in Miami (or South Florida) to get a meeting going? Let me know if you're out there and you'd like to participate.

- VAB (@Miami)

From levine at vinecorp.com Thu Mar 6 23:20:01 2003
From: levine at vinecorp.com (James D. Levine)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] [Silicon Valley] PeerPunks meeting next Tuesday
Message-ID:

You know the drill by now...

Where: Dana Street Roasting Company
       744 W Dana St, Mountain View, CA 94041
       Phone: (650) 390-9638
       This is just 1/2 block off Castro St.

When: 7:00 pm onward

From eryou at ifs.com.ky Fri Mar 7 03:30:03 2003
From: eryou at ifs.com.ky (eryou@ifs.com.ky)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Vacation Message
Message-ID: <20030307112820.19079.qmail@lnxpwapp01.ifs.com.ky>

*** Automated Response ***

I am currently on vacation. Please contact Cam McKeown (cam.mckeown@ifs.com.ky) for any items.

--
regards,
Robert Eryou
Internet Financial Services Ltd. [IFS]
George Town, Grand Cayman, Cayman Islands
345.945.8607
www.ifs.com.ky
PGP: https://www.ifs.com.ky/pgp/Robert_Eryou.asc

I must create a system, or be enslaved by another man's.
~William Blake

From levine at vinecorp.com Mon Mar 10 13:55:03 2003
From: levine at vinecorp.com (James D. Levine)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] [Silicon Valley] PeerPunks meeting tomorrow Tuesday 3/11
Message-ID:

Come hear about what you missed at CodeCon :)

See you there...

James

----------------

Where: Dana Street Roasting Company
       744 W Dana St, Mountain View, CA 94041
       Phone: (650) 390-9638
       This is just 1/2 block off Castro St.

When: 7:00 pm onward

From levine at vinecorp.com Tue Mar 11 11:32:02 2003
From: levine at vinecorp.com (James D. Levine)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] [Silicon Valley] PeerPunks TONIGHT in Mountain View
Message-ID:

Last pester...

James

----------------

Where: Dana Street Roasting Company
       744 W Dana St, Mountain View, CA 94041
       Phone: (650) 390-9638
       This is just 1/2 block off Castro St.

When: 7:00 pm onward

--

From melc at fashionvictims.com Mon Mar 17 08:35:02 2003
From: melc at fashionvictims.com (melc)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Announcing version 0.2 of CP2PC
Message-ID:

After continued development on the CP2PC uberclient file-sharing application, we are proud to announce release 0.2 of the code. As part of the CP2PC project we have developed a minimal programming interface (API) to peer-to-peer file-sharing systems. This release contains improvements to the 0.1 implementation, but more importantly it contains a new GUI client. This client allows seamless access to multiple file-sharing networks through a single GUI application.

The code for this release can be found at:
http://sourceforge.net/projects/cp2pc

The current release contains:

* GUI. The GUI provides access to multiple file-sharing networks through a single interface. The interface is similar to existing file-sharing applications except that it allows one to choose which networks to search for files, or which networks local files should be published on.
Our GUI also provides a more advanced search interface than most file-sharing applications, allowing users to specify RDF queries to include in their search request as well as simple keyword or attribute-based search criteria.

* Core CP2PC code. This provides a default implementation of the CP2PC API, an implementation of core facilities (that is, facilities that may be used by file-sharing network components, including a local RDF triple database which implements the Tristero search interfaces, a simple CP2PC shell interface, download and upload monitors, etc.), and a skeleton component that can be extended to build new file-sharing network backends.

* Gnutella component. This is a file-sharing network backend component that can be used to connect to and use the Gnutella network. Currently, the component provides a simple shell-like interface to the network (the shell implements CP2PC API calls). The component can also be used by other programs as a library. This component uses Limewire [1] code to do the actual Gnutella-specific work - it forms a bridge between the CP2PC API and the Limewire code.

* GDN component. This is a file-sharing network backend component that can be used to connect to and use the GDN [2] network. Currently, the component provides a simple shell-like interface to the network (the shell implements CP2PC API calls). The component can also be used by other programs as a library. The component acts as a GDN client, creating, binding to, accessing, and destroying objects as necessary.

The code is written in Java and can be downloaded as a tarball or from CVS. Required libraries (i.e., jar files) can also be downloaded from our SourceForge site. Both the tarball and CVS contain the CP2PC documentation, including a description of the API and of the mapping of the API to various file-sharing networks. Note that the code is still alpha quality. The code is released under the LGPL.
Future work for the CP2PC project will include expansion of the GUI frontend, an XML-RPC interface for components, and the creation of more backend components.

Ihor.

[1] http://www.limewire.org
[2] http://www.cs.vu.nl/globe

melc.
--
melc@fashionvictims.com

From kevin at atkinson.dhs.org Mon Mar 17 10:36:02 2003
From: kevin at atkinson.dhs.org (Kevin Atkinson)
Date: Sat Dec 9 22:12:20 2006
Subject: [p2p-hackers] Announcing a new version of DistribNet
Message-ID:

I just released a new snapshot of DistribNet. DistribNet is a global peer-to-peer Internet file system which anyone can tap into or add content to. You can find it at http://distribnet.sourceforge.net/.

Compared to the last version I released over a year ago, this version is a functional network. However, it is in no way ready to be used for anything but testing. I would appreciate it if others would take a look and give feedback. I would especially be interested if someone with access to a large cluster could test it out, as I have not actually tested whether it functions over the network (all my testing was done by launching multiple nodes on the same computer). Expect more improvements to come.

Release Notes

So far DistribNet has only been tested on my computer, which runs RedHat 8.0 (kernel 2.4). Other Linux systems should work. Other POSIX systems might work. Forget Win32 for now.

DistribNet requires:
  GNU libc (I use a few thread-safe extensions)
  Gcc 3.1 or better (Gcc 2.95 may work)
  POSIX Threads
  Berkeley DB version 4 or better
  OpenSSL (I use version 0.9.6b)
  Perl (for the scripts)
  A Web Browser

Status and Future

The routing table is pretty much done. Future versions will use a better selection algorithm for determining which nodes to use.

Data keys are implemented. However, no caching is done. Other key types are not implemented.

The protocol is subject to change. Absolutely no guarantee is made that different versions of DistribNet will be compatible. The protocol will stabilize once I get the basics done.

Absolutely no security.
Once I get all the basics done I will work on adding security, which includes encrypting all communication.

Integers are not properly converted to network order. This will be done at the same time the protocol is being stabilized.

Basically, until I get the basics done, don't expect to be able to use DistribNet for anything other than testing. However, please do test it.

Since DistribNet is my Master's thesis I will be working nearly full time on it for the next couple of months. However, I would definitely appreciate help with the implementation. General ideas, of course, are also welcome. Please post them to distribnet-devel at lists.sourceforge.net and not to me directly so others can benefit from our discussion.

Attached is a text copy of the DistribNet design document.

--
http://kevin.atkinson.dhs.org

-------------- next part --------------

DistribNet

Kevin Atkinson (kevin at atkinson dhs org)

Project Page: http://distribnet.sourceforge.net/
Mailing list: http://lists.sourceforge.net/lists/listinfo/distribnet-devel

Abstract: A global peer-to-peer Internet file system which anyone can tap into or add content to.

1 Overview

NOTE: This paper was initially written in response to my dislike of Freenet and similar networks that focus on complete anonymity. Thus a great deal of the comments are directed towards that community. My ultimate goal is to design a general-purpose p2p network and not just a replacement for Freenet. In fact, in some ways DistribNet won't replace Freenet, due to the anonymity issue. I plan on eventually modifying this paper accordingly to address the general p2p (and web) community. For now, please keep the intended audience in mind when reading this section.

1.1 Main Goal

* To allow anyone, possibly anonymously, to publish web sites without having to pay a commercial provider for bandwidth or having to put up with the increasingly ad-ridden free web sites.
The only thing the author of the web site should have to worry about is the contents of the web site itself.

1.2 (Possibly Impossible) Goals

* Really fast lookup to find data. The worst case should be O(log(n)) and the average case should be O(1) or very close to it.

* Actually retrieving the data should also be really fast. Popular data should be sitting on the same subnet. On average it should be as fast or faster than a typical web site (such as Slashdot, Google, etc.). It should make effective use of the topology of the Internet to minimize network load and maximize performance.

* General searching based on keywords will be built into the protocol from the beginning. The search facility will be designed in such a way as to make message boards trivial to implement.

* Ability to update data while keeping old revisions around, so data never disappears until it is truly unwanted. No one person will have the power to delete data once it spreads throughout the network.

* Will try very hard to keep all but the most unpopular content from falling off the network. Basically, before deleting a locally unpopular key, a node will first check whether other nodes are storing the key and how popular they find it. If not enough nodes are storing the key and there is any indication that the data may be useful at a later date, the node will not delete it unless it absolutely has to. And if it does delete it, it will first try uploading it to other nodes with more disk space available.

* Ability to store data indefinitely if someone is willing to provide the space for it (and being able to find that data in log(n) time).

* Extremely robust, so that the only way to kill the network is to disable almost all of the nodes. The network should still function even if, say, 90% of it goes down.

* Extremely efficient CPU-wise, so that a fully functional node can run in the background and only take 1-2% of the CPU.
1.3 Applications

I would like the protocol to be able to efficiently support (i.e. without the ugly hacks that many of the applications for Freenet use):

* Efficient Web-like sites (with an HTTP gateway to make browsing easy)
* Efficient sharing of files large and small.
* Public message forums (with an IMAP gateway to make reading easy)
* Private Email (the messages will of course be encrypted so only the intended recipient can read them, again with an IMAP gateway)

And maybe:

* Streaming Media
* Online Chat (with a possible IRC or similar gateway)

1.4 Anti-Goals

Also see the philosophy section for why I don't find these issues that important.

* Complete anonymity for the browser. I want to focus first on performance rather than on anonymity. In fact, I plan to use extensive logging in the development versions so that I can track network performance and quickly catch performance bugs. As DistribNet stabilizes, anonymity will be improved at the expense of logging. The initial version will only use cryptography when absolutely necessary (for example, key signing). Most communications will be done in the clear. After DistribNet stabilizes, encryption will slowly be added. Please note that I still wish to allow for anonymous posting of content. However, without encryption, it probably won't be as anonymous as Freenet or GNUNet.

* Data in the cache will be stored in a straightforward manner. No attempt will be made to prevent the node operator from knowing what is in his own cache. Also, by default, very little attempt will be made to prevent others from knowing what is in a particular node's cache.

1.5 Philosophy

* I have nothing against complete anonymity; it is just that I am afraid that both Freenet and GNUNet are designed more around the anonymity and privacy issues than they are around the performance and scalability issues.

* For most types of content, the level of anonymity that Freenet and GNUNet offer is simply not needed.
Even for copyrighted and censored material there is, in general, little risk in actually viewing the information, because it is simply impractical to go after every single person who accesses forbidden information. Almost all of the time, lawsuits and the like go after the original distributors of the information and not the viewers. Therefore DistribNet will aim to provide anonymity for distributing information, but not for actually viewing it. However, since there is some information where even viewing it is extremely risky, DistribNet will eventually be able to provide the same level of anonymity that Freenet or GNUNet offers, but it will be completely optional.

* I also believe that knowing what is in one's own datastore, and being able to block certain types of material from one's own node, is not that big of a deal. Unless almost everyone blocks a certain type of information, the availability of blocked information will not be harmed. This is because even if 90% of the nodes block, say, kiddie porn, the information will still be available on the other 10% of the nodes, which, if the network is designed correctly, should be more than enough for anyone to get at blocked information. Furthermore, since the source code for DistribNet will be protected under the GPL or a similar license, it will be completely impractical for others to force a significant number of nodes to block information. Due to the dynamic nature of the cache, I find it legally difficult to hold anyone responsible for the contents of their cache, as it is constantly changing.

2 DistribNet Architecture

There are two types of keys: map and data keys. Map keys are hashed based on their identification and can be updated; data keys are hashed based on their content and consequently cannot be updated. There will be three types of storage of keys: permanent, cache, and pointers.
Permanent keys will be used to ensure the availability of content, the cache will be used exactly like a typical cache, and pointers will be used to find content. Map keys will be routed based on the SHA-1 hash of the identification using a Pastry[4]-like system. Data keys are not routed and will be stored based on where they are retrieved. Map keys will be used to find data keys.

2.1 Key Types

There will essentially be two types of keys: map keys and data keys. Map keys will be uniquely identified in a similar manner to Freenet SSK keys. Data keys will be identified in a similar manner to Freenet's CHK keys.

Map keys will contain the following information:

* Short Description
* Public Namespace Key
* Timestamped Index pointers
* Timestamped Data pointers

At any given point in time each map key will only be associated with one index pointer and one data pointer. Map keys can be updated by appending a new index or data pointer to the existing list. By default, when a map key is queried only the most recent pointer will be returned. However, older pointers are still there and may be retrieved by specifying a specific date. Thus, map keys may be updated, but information is never lost or overwritten.

Data keys will be very much like Freenet's CHK keys except that they will not be encrypted. Since they are not encrypted, delta compression may be used to save space.

There will not be anything like Freenet's KSK keys, as those proved to be completely insecure. Instead, map keys may be requested without a signature. If there is more than one map key by that name, then a list of keys is presented, sorted by popularity. To make such a list meaningful, every public key in DistribNet will have a descriptive string associated with it.

2.1.1 Data Key Details

Data keys will be stored in maximum-size blocks of just under 32K.
If an object is larger than 32K it will be broken down into smaller chunks, and an index block, also with a maximum size of about 32K, will be created so that the final object can be reassembled. If an object is too big to be indexed by one index block, the index blocks themselves will be split up. This can be done as many times as necessary, therefore providing the ability to store files of arbitrary size. DistribNet will use 64-bit integers to store the file size, therefore supporting file sizes up to 2^64 - 1 bytes.

Data keys will be retrieved by blocks rather than all at once. When a client first requests a data key that is too large to fit in one block, an index block will be returned. It is then up to the client to figure out how to retrieve the individual blocks. Please note that even though blocks are retrieved individually, they are not treated as truly independent keys by the nodes. For example, a node can be asked which blocks it has based on a given index block, rather than having to ask for each and every data block. Also, nodes maintain persistent connections so that blocks can be retrieved one after another without having to re-establish the connection each time.

Data and index blocks will be indexed based on the SHA-1 hash of their contents.
The exact numbers are as follows:

+-----------------------------------------+--------------------+
| Data Block Size:                        | 2^15 - 128 = 32640 |
|-----------------------------------------+--------------------|
| Index block header size:                | 40                 |
|-----------------------------------------+--------------------|
| Maximum number of keys per index block: | 1630               |
|-----------------------------------------+--------------------|
| Key Size:                               | 20                 |
+-----------------------------------------+--------------------+

Maximum object sizes:

  direct   => 2^14.99 bytes, about 31.9K
  1 level  => 2^25.66 bytes, about 50.7 megs
  2 levels => 2^36.34 bytes, about 80.8 gigs
  3 levels => 2^47.01 bytes, about 129 tera
  4 levels => 2^57.68 bytes
  5 levels => 2^68.35 bytes (but limited to 2^64 - 1)

Why 32640? A block size of just under 32K was chosen because I wanted a size which would allow most text files to fit in one block, most other files with one level of indexing, and just about anything anybody would think of transferring on a public network in two levels, and 32K worked out perfectly. Also, files around 32K are rather rare, therefore preventing a lot of unnecessary splitting of files that don't quite make it. 32640 rather than exactly 32K was chosen to allow some additional information to be transferred with the block without pushing the total size over 32K. 32640 can also be stored nicely in a 16-bit integer without having to worry about whether it's signed or unsigned. However, the exact block size is not fixed in stone. If, at a later date, a different block size is deemed to be more appropriate, then this number can be changed....

2.2 Storage

Permanent keys will be distributed essentially randomly. However, to ensure availability the network will ensure that at least N nodes contain the data. Nodes which are responsible for maintaining a permanent key will know about all the other nodes on the network which are also responsible for that key.
From time to time a node will check up on the other nodes to make sure they are still live, and if fewer than N-1 other nodes are live it will pick another node to ask to maintain a copy of the key. It will first try nodes which already have the key in their cache. If they all refuse, it will choose a random node to ask and will keep trying until some node accepts or one of the original nodes becomes live again.

Cached keys will be distributed based on where they will do the most good performance-wise. How cached keys will be managed is still undecided. For the first implementation they will likely be stored on the nodes which have previously requested the key.

Pointer keys will basically be distributed based on the numeric distance of the hash of the key from the hash of the node's identification. Since pointer keys contain very little data there will be an extremely large amount of redundancy. Pointer keys will contain two types of pointers: pointers to permanent keys and pointers to cached keys. Pointer keys on different nodes will all contain the same permanent pointers but will only contain pointers to cached keys on nodes which are nearby. There will be an upper limit to the number of pointers within a pointer key any one node will have.

3 DistribNet Routing

Map keys will be routed based on the SHA-1 hash of their identification in a similar manner as done in Pastry[4]. This section will assume the reader is familiar with how Pastry works and will focus on how DistribNet differs from Pastry.

Each node on DistribNet is uniquely identified by the 160-bit SHA-1 hash of its public key. Since SHA-1 hashes are used, the nodes will be evenly distributed. Keys in DistribNet are stored based on bitwise closeness. Bitwise closeness is based on the number of common bits two keys have. Unlike Pastry, numerical closeness is generally not used.

The routing table contains 8 rows, with each row containing 2^4 = 16 entries.
In general, DistribNet tries to maintain at least two nodes for each entry. The number of rows does not need to be fixed, and it can change based on the network size. It may also be possible that the number of entries per row does not necessarily have to be fixed. However, this idea has not been explored.

4 was chosen as the base size for several reasons: 1) it is a power of two, 2) when keys are thought of as a sequence of digits a base size of 4 means that the digits will be hexadecimal, and 3) the Pastry paper hinted that 4 would be a good choice. The number of rows was chosen to be large enough so that there is no possibility that the last row will be used when dealing with a moderate-size network during testing.

Unlike Pastry, there is no leaf set. Instead the ``leaf set'' consists of all rows which are not ``full''. A full row is a row which contains 15 full entries, the extra empty entry being the one which represents the common digit with the node's key, and thus will never be used. Not having a true ``leaf set'' simplifies the implementation, since a separate list does not need to be maintained. This also means that all the nodes in the leaf set will maintain the same set. I have not determined if this is a good or bad thing.

A row is considered full in DistribNet if 15 of the 16 entries are full in the current node AND other nodes also have 15 of the 16 entries full (clarify...).

For each full row DistribNet will try to maintain at least two nodes for each entry. This way, if one node goes down, the other one can be used without affecting performance. When a node is determined to be down (as opposed to being momentarily offline) DistribNet will try to replace it with another node that is up. With this arrangement it is extremely likely that at least one of the two nodes will be available. A full row can become a leaf row if the entry count drops below 15.
For each non-full row (i.e. in the leaf set) DistribNet will attempt to maintain as many nodes as are available for that entry, so that every other node in the leaf set is accounted for. From time to time DistribNet will contact another node in the leaf set and synchronize its leaf set with it. This is possible because all nodes in the leaf set will have the same set. Down nodes in the leaf set will be removed, but the criteria for when a node is down are stricter for a leaf set than for a full row. If a leaf row becomes a full row, then excess nodes will be removed.

DistribNet also maintains an accurate estimate of the number of nodes that are on the network. This is possible because, unlike with networks such as Freenet, all nodes are accounted for.

To store a map key, the 3 bitwise-closest nodes will get it. When looking for a key, the 8 closest nodes will be tried.

The routing table is implemented in the files routeing_*.?pp in the src/ directory of the distribution.

4 Retrieval of Data Keys

Each key request is coupled with a hops-to-try (HTT) number. This number controls the number of additional nodes that can be contacted to retrieve the data. If the HTT number is 0, then the request will fail unless the contacted node has the key. This number is only for actually retrieving the data, not finding it.

When a node A wants to retrieve a key K, one of two things will happen. If it has good reason to believe that a nearby node has the key, it will attempt to retrieve it from that node; otherwise it will send a request to get other nodes which have the key. If a nearby node has the key, it will ask that node for the key. If it doesn't, it will ask some nearby node to do so on its behalf.

To find a key ... To do this it will contact node B, which will in turn contact C, etc., until an answer is found, which for the sake of argument will be at node E. Node E will then send a list of possible nodes L which contain key K directly to node A.
Node E will then send the result to node D, which will send it to C, etc. Node E will also add node A to list L with a probability of, say, 10%; node D will do the same but with a probability of, say, 25%; etc. This avoids the problem of the list L becoming extremely large for popular data, but allows nodes close to A to discover that A has the data, since nodes close to A will likely contact the same nodes that A tried. Since A requested the location of key K, it is assumed that A will likely download the data. If this assumption is false, then node A will simply be removed from the list later on.

Once A retrieves the list, it will pick a node from the list L based on some evaluation function; let's say it picks node X. Node X will then return the key to node A. The evaluation function will take several factors into account, including distance, download speed, past reputation, and whether node A even knows anything about the new node. If node X does not send the key back to node A for whatever reason, A will remove node X from the list and try again. It will also send this information to node B so it can consider removing node X from its list; B will then in turn notify node C of the information, etc.

If the key is an index block, node X will also send information about which parts of the complete key it has. If the key is not an index block, then node A is done. If the key is an index block, then node A will start downloading the sub-blocks of key K that node X has. At the same time, if the key is large or node X does not contain all the sub-blocks of K, node A will choose another node from the list to contact, and possibly other nodes depending on the size of the file. It will then download other parts of the file from the other nodes. Which blocks are downloaded from which nodes will change based on the download speed of the nodes, so that more blocks are downloaded from faster nodes and fewer from slower ones, thus allowing the data to be transferred in the least amount of time.
If after contacting a certain number of nodes there are still parts of the key that are not available on any of those nodes, node A will perform a separate query for the individual blocks. However, I imagine that in practice this will rarely be necessary. 4.1 Distance determination One very coarse estimate of node distance would be to take the numerical distance between two nodes' IP addresses, since nodes closer to each other numerically are likely to share the same gateways, and nodes really close are likely to be on the same subnet. Another way to estimate node distance relies on the fact that node distance, for the most part, obeys the triangle inequality. For each node X in the list of candidate nodes, some information about the estimated distance between X and the node storing the list, node E, is maintained by some means. For node A to estimate the distance between itself and a node X on the list, all it has to do is combine the distance between itself and E with the distance between E and X. The combination function will depend on the aspect of distance that is being measured: for the number of hops it will simply add them; for download speed it will take the maximum; etc. 5 Limitations Because there is no indirection when retrieving data, most of the data on any particular node will be data that the local node's user requested at some point in time. This means that it is fairly easy to tell which keys a particular user requested. Although complete anonymity for the browser is one of my anti-goals, this is going a bit too far. One solution for this is to do something similar to what GNUnet does, which is described in [3]. It is also blatantly obvious which nodes have which keys. Although I do not see this as a major problem, especially if a solution for the first problem is found, it is something to consider. I will be more than happy to entertain solutions to this problem, provided that they don't affect efficiency that much. 
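The combination step can be sketched as follows. This is a minimal illustration: the metric names are made up, and "slowness" stands in for whatever bottleneck-style measure the maximum rule is applied to.

```python
def combine(metric, d_a_to_e, d_e_to_x):
    """Estimate A's distance to X through intermediary E, exploiting the
    (approximate) triangle inequality: hop counts accumulate along the path,
    while for a bottleneck-style metric the worse of the two legs dominates."""
    if metric == "hops":
        return d_a_to_e + d_e_to_x        # path length adds up
    if metric == "slowness":
        return max(d_a_to_e, d_e_to_x)    # the slower leg is the bottleneck
    raise ValueError("unknown metric: " + metric)

print(combine("hops", 3, 4))          # 7
print(combine("slowness", 0.2, 0.5))  # 0.5
```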
6 Implementation Details An implementation of DistribNet is available at http://distribnet.sourceforge.net/. 6.1 Physical Storage Blocks are currently stored in one of three ways: 1. blocks smaller than a fixed threshold (currently 1k) are stored using Berkeley DB (version 3.3 or better). 2. blocks larger than the threshold are stored as files. The primary reason for doing this is to avoid limiting the size of the data store by the maximum size of a file, which is often 2 or 4 GB on most 32-bit systems. 3. blocks are not stored at all; instead they are linked to an external file outside of the data store, much like a symbolic link links to a file outside of the current directory. However, since blocks often represent only part of the file, the offset is also stored as part of the link. These links are stored in the same database that small blocks are stored in. Since the external file can easily be changed by the user, the SHA-1 hashes will be recomputed when the file modification date changes. If the SHA-1 hash of the block differs, all the links to the file will be thrown out and the file will be relinked. (This part is not implemented yet.) Most of the code for the data keys can be found in data_key.cpp. 6.2 Language DistribNet is/will be written in fairly modern C++. It will use several external libraries; however, it will not use any C++-specific libraries. In particular I have no plan to use any sort of abstraction library for POSIX functionality. Instead, thin wrapper classes will be used, which I have complete control over and which will serve mainly to make the process of using POSIX functions less tedious rather than to abstract away the details of using them. Bibliography 1 GNUnet. http://www.ovmj.org/GNUnet/ and http://www.gnu.org/software/GNUnet/ 2 Freenet. http://freenet.sourceforge.net/ 3 Krista Bennett and Christian Grothoff. ``GNUnet - anonymity for free''. http://gecko.cs.purdue.edu/GNUnet/papers.php3 4 Antony Rowstron and Peter Druschel. 
``Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems''. http://research.microsoft.com/antr/Pastry/pubs.htm About this document ... This document was generated using the LaTeX2HTML translator Version 2002 (1.62). The translation was initiated by Kevin Atkinson on 2003-03-17. From zooko at zooko.com Sun Mar 30 12:53:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord Message-ID: A fundamental advantage of Pastry [1] and Kademlia [2] over Chord [3] would seem to be that in the former two, there is a "free choice" in which peers a node links to, whereas in Chord each peer is uniquely determined. The Pastry papers describe this feature in terms of an arbitrary "proximity metric". The Kademlia paper says: "Worse yet, asymmetry leads to rigid routing tables. Each entry in a Chord node's finger table must store the precise node preceding some interval in the ID space. Any node actually in the interval would be too far from nodes preceding it in the same interval. Kademlia, in contrast, can send a query to any node within an interval, ..." This feature seems very important to me, not because the "free choice" part can be used to select peers with low latency or few network hops, but because it can be used to select on arbitrary other criteria, such as avoiding peers that are unreachable (due to an incompletely connected underlying network), avoiding peers that are untrusted, or other criteria. 
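The asymmetry the Kademlia paper describes can be made concrete with a toy comparison of Chord's clockwise distance and Kademlia's XOR distance. This is an illustrative sketch only: the 4-bit ID space and the example IDs are made up.

```python
M = 2 ** 4  # a tiny 4-bit ID space, purely for illustration

def chord_dist(a, b):
    """Chord's clockwise distance from a to b: asymmetric."""
    return (b - a) % M

def xor_dist(a, b):
    """Kademlia's XOR distance: symmetric by construction."""
    return a ^ b

print(chord_dist(3, 9), chord_dist(9, 3))  # 6 10 -- direction matters
print(xor_dist(3, 9), xor_dist(9, 3))      # 10 10 -- direction doesn't
```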
But is this rigidity really a consequence of Chord's asymmetric distance metric? Brandon Wiley suggested to me that one could have a "Sloppy Chord", using the same, asymmetric, distance metric, but changing the requirement of the k'th finger table entry from "the predecessor of selfId + 2^k" to "a node in the interval 2^(k-1) through 2^k". After thinking about it for a few minutes, there isn't any obvious reason why this wouldn't have the same asymptotic performance guarantees as proper MIT Chord. Regards, Zooko [1] http://citeseer.nj.nec.com/rowstron01pastry.html [2] http://citeseer.nj.nec.com/529075.html [3] http://citeseer.nj.nec.com/stoica01chord.html http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From gojomo at usa.net Sun Mar 30 18:56:02 2003 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord References: Message-ID: <01ee01c2f730$fb4e0380$0a0a000a@golden> You've probably seen the IPTPS03 paper, "Sloppy Hashing and Self-Organized Clusters", by David Mazieres and Michael Freedman: http://iptps03.cs.berkeley.edu/final-papers/coral.pdf I suspect many of the strict Chord (and other DHT) assumptions can be loosened while still achieving excellent, nearly equivalent performance -- though it then becomes harder to rigorously prove such performance. - Gordon ----- Original Message ----- From: "Zooko" To: Sent: Sunday, March 30, 2003 12:51 PM Subject: [p2p-hackers] Sloppy Chord > > A fundamental advantage of Pastry [1] and Kademlia [2] over Chord [3] would seem > to be that in the former two, there is a "free choice" in which peers a node > links to, whereas in Chord each peer is uniquely determined. > > The Pastry papers describe this feature in terms of an arbitrary "proximity > metric". The Kademlia paper says: > > "Worse yet, asymmetry leads to rigid routing tables. Each entry in a Chord > node's finger table must store the precise node preceding some interval in the > ID space. 
Any node actually in the interval would be too far from nodes > preceding it in the same interval. Kademlia, in contast, can send a query to > any node within an interval, ..." > > This feature seems very important to me, not because the "free choice" part can > be used to select peers with low latency or few network hops, but because it can > be used to select on arbitrary other criteria, such as avoiding peers that are > unreachable (due to an incompletely connected underlying network), avoiding > peers that are untrusted, or other criteria. > > But is this rigidity really a consequence of Chord's asymmetric distance metric? > > Brandon Wiley suggested to me that one could have a "Sloppy Chord", using the > same, asymmetric, distance metric, but changing the requirement of the k'th > finger table entry from "the precessor of selfId + 2^k" to "a node in the > interval 2^(k-1) through 2^k". > > After thinking about it for a few minutes, there isn't any obvious reason why > this wouldn't have the same asymptotic performance guarantees as proper MIT > Chord. > > Regards, > > Zooko > > [1] http://citeseer.nj.nec.com/rowstron01pastry.html > [2] http://citeseer.nj.nec.com/529075.html > [3] http://citeseer.nj.nec.com/stoica01chord.html > > http://zooko.com/ > ^-- under re-construction: some new stuff, some broken links > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From kevin at atkinson.dhs.org Mon Mar 31 00:31:02 2003 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: <01ee01c2f730$fb4e0380$0a0a000a@golden> Message-ID: What advantages does Chord[1] offer over Pastry[2]? I chose Pastry over Chord because of its simplicity, and because of the so-called "free choice" property. 
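The "sloppy" finger rule quoted above can be sketched like this. Everything here is illustrative: the ring size, the node IDs, and the successor-style convention used for the strict finger (Chord conventions differ slightly on predecessor vs. successor of the target point).

```python
M = 2 ** 16  # illustrative ring size

def strict_finger(self_id, k, nodes):
    """MIT Chord: the k'th finger is uniquely determined -- here taken as the
    first node at or after self_id + 2^k on the ring (one common convention)."""
    target = (self_id + 2 ** k) % M
    return min(nodes, key=lambda n: (n - target) % M)

def sloppy_candidates(self_id, k, nodes):
    """Sloppy Chord: ANY node whose clockwise offset from self_id lies in
    [2^(k-1), 2^k) will do, so the peer can be picked by latency, trust, etc."""
    lo, hi = 2 ** (k - 1), 2 ** k
    return [n for n in nodes if lo <= (n - self_id) % M < hi]

nodes = [10, 100, 600, 700, 1000, 5000]
print(sloppy_candidates(0, 10, nodes))  # [600, 700, 1000]: offsets in [512, 1024)
print(strict_finger(0, 10, nodes))      # 5000: the unique node at/after 1024
```

The strict rule leaves no choice at all; the sloppy rule hands the implementation a whole interval of acceptable peers.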
In fact my network, DistribNet[3], essentially implements the ideas of Coral[4], but on top of Pastry rather than Chord. In particular, nodes can freely join and leave the network without requiring any movement of keys. The only thing a node has to do when it joins is to synchronize its routing table with the network. Nothing has to be done for a node to leave the network. So my question is: what advantages does Chord offer performance-wise over Pastry? [1] Chord: http://www.pdos.lcs.mit.edu/chord/ [2] Pastry: http://research.microsoft.com/antr/Pastry/pubs.htm [3] DistribNet: http://distribnet.sourceforge.net/ [4] Coral: http://www.scs.cs.nyu.edu/coral/ --- http://kevin.atkinson.dhs.org From kevin at atkinson.dhs.org Mon Mar 31 01:03:02 2003 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: On Mon, 31 Mar 2003, Kevin Atkinson wrote: [Broken link corrected below] > What advantages does Chord[1] offer over Pastry[2]? I chose Pastry over > chord because of it simplicity, and the because of the so called "free > choice" property. > > In fact my network DistribNet[3], is essentially implementing the ideas of > Coral[4] but on top of Pastry rather than chord. In particular nodes can > freely join and leave the network without requiring any movement of keys. > The only thing a network has to do when it joins is to synchronize its > routing table with the network. Nothing has to be done for a node to > leave the network. > > So my question is: What advantages does Chord offer performance wise over > Pastry? > > [1] Chord: http://www.pdos.lcs.mit.edu/chord/ > [2] Pastry: http://research.microsoft.com/antr/Pastry/pubs.htm Make that "http://research.microsoft.com/~antr/Pastry/" sorry. 
> [3] DistribNet: http://distribnet.sourceforge.net/ > [4] Coral: http://www.scs.cs.nyu.edu/coral/ -- http://kevin.atkinson.dhs.org From zooko at zooko.com Mon Mar 31 06:43:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message from Kevin Atkinson of "Mon, 31 Mar 2003 03:30:27 EST." References: Message-ID: Kevin Atkinson writes: > > What advantages does Chord[1] offer over Pastry[2]? I chose Pastry over > chord because of it simplicity, and the because of the so called "free > choice" property. Heh heh. Simplicity is funny. I believe that Brandon Wiley prefers Chord over Pastry because of its simplicity. ;-) I prefer Pastry-or-Kademlia over Chord because of the free choice and, to a lesser extent, because of the symmetric metric. I prefer Kademlia over Pastry because of the simplicity of exposition and argument in the original Kademlia paper. Also Kademlia has a simplification over Pastry (no "leaf sets"), but I suspect that this simplification might come with a cost in reduced robustness. > In fact my network DistribNet[3], is essentially implementing the ideas of > Coral[4] but on top of Pastry rather than chord. Wow! I need to read about Coral now. I haven't read the iptps03 proceedings yet, so I am obviously behind the times now. The Coral page says it will do primarily Kademlia, and also Chord. Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From kevin at atkinson.dhs.org Mon Mar 31 07:18:01 2003 From: kevin at atkinson.dhs.org (Kevin Atkinson) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: On Mon, 31 Mar 2003, Zooko wrote: > Kevin Atkinson writes: > > > > What advantages does Chord[1] offer over Pastry[2]? I chose Pastry over > > chord because of it simplicity, and the because of the so called "free > > choice" property. > > Heh heh. Simplicity is funny. 
I believe that Brandon Wiley prefers > Chord over Pastry because of its simplicity. ;-) Well, I guess it is how you look at it. I thought Pastry was simpler than Chord, at least the idea. My implementation of Pastry makes several simplifications: 1) No Neighborhood set. 2) No "real" leaf set. Instead any row which does not have at least one node in each entry is a leaf row. 3) Distance is strictly bitwise, not numerical (bitwise meaning based on the number of leading bits in common; e.g., out of 4 bits, 1101 and 1111 share a 2-bit prefix and so have a distance of 2). 2 and 3 mean that all nodes in the "leaf set" have identical leaf sets, which simplifies the implementation. > I prefer Pastry-or-Kademlia over Chord because of the free choice and, to a > lesser extent, because of the symmetric metric. > > I prefer Kademlia over Pastry because of the simplicity of exposition and > argument in the original Kademlia paper. Also Kademlia has a simplification > over Pastry (no "leaf sets"), but I suspect that this simplification might come > with a cost in reduced robustness. I will have to look at Kademlia. It may be very similar to what I did with Pastry. > > In fact my network DistribNet[3], is essentially implementing the ideas of > > Coral[4] but on top of Pastry rather than chord. > > Wow! I need to read about Coral now. I haven't read the iptps03 proceedings > yet, so I am obviously behind the times now. Well, I had not heard of Kademlia or Coral until yesterday :-|. --- http://kevin.atkinson.dhs.org From blanu at bozonics.com Mon Mar 31 12:06:02 2003 From: blanu at bozonics.com (Brandon Wiley) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: > Heh heh. Simplicity is funny. I believe that Brandon Wiley prefers Chord over > Pastry because of its simplicity. ;-) I prefer Chord over Kademlia because of simplicity and ease of explanation to people who might want to implement it. I don't have much of an opinion on Pastry currently. 
From blanu at bozonics.com Mon Mar 31 12:55:03 2003 From: blanu at bozonics.com (Brandon Wiley) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: > This feature seems very important to me, not because the "free choice" part can > be used to select peers with low latency or few network hops, but because it can > be used to select on arbitrary other criteria, such as avoiding peers that are > unreachable (due to an incompletely connected underlying network), avoiding > peers that are untrusted, or other criteria. Some of this you could do with MIT Chord. If a node has a globally known property (unreachability, untrustedness according to a central authority) then you can just not let "bad" nodes join. You only need "sloppy" Chord if you want nodes to make a routing choice which is based on subjective criteria or information. One of my censorship resistance schemes for China works like the former. The global criterion for joining is that the node is not in China (according to an IP->location service). This forms a supernode-like network of "good" (not in China) nodes with the "bad" (in China) nodes attached to the edges of the network. Unlike other supernode-based networks like Kazaa, however, the supernodes are not organized haphazardly, but into a nice, neat Chord ring. I also do this for the Alluvium mirror-finding network by having the criterion be "not behind NAT". > Brandon Wiley suggested to me that one could have a "Sloppy Chord", using the > same, asymmetric, distance metric, but changing the requirement of the k'th > finger table entry from "the precessor of selfId + 2^k" to "a node in the > interval 2^(k-1) through 2^k". One important thing to remember about Chord is that for the lookups to succeed, you *only* need the next pointers to be correct and to always follow pointers forward. You don't need finger tables. 
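A minimal sketch of that point — successor ("next") pointers alone already give correct, if O(N), lookups. The ring size and node IDs here are made up for illustration.

```python
M = 2 ** 8                      # toy 8-bit ring, for illustration
ring = [2, 40, 90, 200]         # node IDs, sorted clockwise
succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}

def in_half_open(x, a, b):
    """True if x lies in the clockwise interval (a, b] on the ring."""
    return x != a and (x - a) % M <= (b - a) % M

def lookup(key, start):
    """Follow next (successor) pointers only: always reaches the node
    responsible for key, in O(N) hops, with no finger table at all."""
    node = start
    while not in_half_open(key, node, succ[node]):
        node = succ[node]
    return succ[node]

print(lookup(100, 2))  # 200: it owns keys in (90, 200]
print(lookup(250, 2))  # 2: the lookup wraps around the top of the ring
```

Finger tables, or any other references that move strictly forward, only shortcut this walk; they are never needed for correctness.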
You can have incorrect finger tables as long as you never follow a bad finger backwards, and this is something you can easily check for yourself. MIT Chord sets you up with a finger table which you can follow for the nice O(log N). However, you should feel perfectly free to keep around references to nodes that don't fit in your finger table. And you should feel free to follow such references if they are closer to the key than either 1) you or 2) the last reference you followed, and 3) you like them better than the other references you have by some criteria. I can't guarantee that if you follow references other than your finger table you'll get there in O(log n). However, you can't optimize for two things at once. If you want to get there fast, the finger table is your roadmap to guaranteed speed. Of course some reference you have might actually be closer to the key than your finger table's reference. Then you should follow that. The O is the same, but the actual time to the key might be less. If you don't care about getting there as fast as possible, use something based on criteria other than being in your finger table. You'll still get there for sure. > After thinking about it for a few minutes, there isn't any obvious reason why > this wouldn't have the same asymptotic performance guarantees as proper MIT > Chord. I wish there was some analysis on this. I can only say that I don't see a problem here, but I don't feel like trying to prove it. Of course, this is one of the non-technical problems with MIT Chord. It sticks very closely to the provable whereas for real applications less provable systems with better actual properties are desirable. From nramabha at cs.ucsd.edu Mon Mar 31 12:59:03 2003 From: nramabha at cs.ucsd.edu (Narayanan S RAMABHADRAN) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: Message-ID: Hi This would probably work. 
However, to achieve theoretical guarantees, it may be necessary to add a restriction, such as choosing a node with uniform probability from that interval. If you choose based on a metric like latency, you may not be able to theoretically show that it has the same performance as Chord. In practice, it would probably do well. I have done some as yet unpublished work that shows that choosing based on uniform probability does indeed work well. There is also some work from Stanford called Symphony, published in USITS 2003, that does something similar. They extend Chord by allowing a node to choose neighbors at random but governed by a certain probability distribution over the key space that ensures efficient routing. While this is not exactly "free choice", it does loosen up some of the rigidity in Chord. It has the disadvantage that you need to estimate N, the total number of nodes in the system. Regards, Sriram Narayanan Sriram Ramabhadran Graduate student Dept. of Computer Science & Engg. University of California San Diego On Sun, 30 Mar 2003, Zooko wrote: > > A fundamental advantage of Pastry [1] and Kademlia [2] over Chord [3] would seem > to be that in the former two, there is a "free choice" in which peers a node > links to, whereas in Chord each peer is uniquely determined. > > The Pastry papers describe this feature in terms of an arbitrary "proximity > metric". The Kademlia paper says: > > "Worse yet, asymmetry leads to rigid routing tables. Each entry in a Chord > node's finger table must store the precise node preceding some interval in the > ID space. Any node actually in the interval would be too far from nodes > preceding it in the same interval. Kademlia, in contast, can send a query to > any node within an interval, ..." 
> > This feature seems very important to me, not because the "free choice" part can > be used to select peers with low latency or few network hops, but because it can > be used to select on arbitrary other criteria, such as avoiding peers that are > unreachable (due to an incompletely connected underlying network), avoiding > peers that are untrusted, or other criteria. > > But is this rigidity really a consequence of Chord's asymmetric distance metric? > > Brandon Wiley suggested to me that one could have a "Sloppy Chord", using the > same, asymmetric, distance metric, but changing the requirement of the k'th > finger table entry from "the precessor of selfId + 2^k" to "a node in the > interval 2^(k-1) through 2^k". > > After thinking about it for a few minutes, there isn't any obvious reason why > this wouldn't have the same asymptotic performance guarantees as proper MIT > Chord. > > Regards, > > Zooko > > [1] http://citeseer.nj.nec.com/rowstron01pastry.html > [2] http://citeseer.nj.nec.com/529075.html > [3] http://citeseer.nj.nec.com/stoica01chord.html > > http://zooko.com/ > ^-- under re-construction: some new stuff, some broken links > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > From bert at akamail.com Mon Mar 31 13:26:02 2003 From: bert at akamail.com (Bert) Date: Sat Dec 9 22:12:20 2006 Subject: [p2p-hackers] Sloppy Chord In-Reply-To: References: Message-ID: <3E88B2CB.6030103@akamail.com> Simplicity is indeed underappreciated in these sorts of DHT schemes (and in "academic" p2p research in general). Why else would Gnutella be so popular? Here's a little-known DHT approach which is straightforward, (relatively) simple to implement, and has provable O(log n) performance (with very high probability). Symphony: Distributed Hashing in a Small World Gurmeet Manku, Mayank Bawa and Prabhakar Raghavan. 
USITS, 2003 http://www-db.stanford.edu/~bawa/Pub/symphony.pdf In general I think randomized approaches such as this make a LOT more sense than any of Chord/Kademlia/Pastry. They can offer greater flexibility and robustness due to significantly less rigid distribution and routing rules. Bert
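The randomized link choice Symphony uses can be sketched concretely. The paper draws each long link's clockwise distance, on a unit-circumference ring, from the harmonic density p(x) = 1/(x ln n) on [1/n, 1]; inverting the CDF gives a one-line sampler. This sketch follows that description, with n standing for the (estimated) node count.

```python
import random

def symphony_link_distance(n_estimate):
    """Sample a long-link distance from the harmonic pdf p(x) = 1/(x ln n)
    on [1/n, 1]: inverse-CDF sampling yields x = n**(u - 1) for uniform u."""
    u = random.random()           # uniform in [0, 1)
    return n_estimate ** (u - 1.0)

# every sample lands in [1/n, 1), favoring short distances harmonically
samples = [symphony_link_distance(64) for _ in range(5)]
```

The only global knowledge required is the estimate n, which is exactly the disadvantage noted earlier in the thread.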