From gojomo at usa.net Tue Oct 1 00:30:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency References: <3D969A65.6020902@notdot.net> Message-ID: <00b701c2691c$4d5a9ce0$640a000a@golden> I'm coming into the discussion late, but: Interesting idea. I believe Sergeui Osokine has suggested something similar for Gnutella: instead of upload slot quotas, take all comers, just slice the bandwidth more thinly. My fear is that you might tend toward a state where everyone is making progress toward completing every file download, but at an ever-slowing rate, so that few downloads ever complete: Zeno's paradox for P2P. Further, once people realize such a system is in effect, they might perceive it in their narrow self-interest to open an unbounded number of download connections... where really, you just want the smallest number that can saturate your local link and average-out the TCP sawtooth backoffs. I think LimeWire now has a mixed slot-allocation policy, where rather than a hard number of download slots, it watches throughput over time, and allows new simultaneous downloads when they appear to improve total outflow. If instead, adding one more connectios seems to just slice the same amount of outflow more thinly, or hurt outflow, then no new download connections are granted. Some tweaked version of that policy is probably well into the "better than good enough" territory that would only need to be further optimized as an academic exercise. If you had a global viewpoint, you might want to bias downloads in favor of people who already have most of the file: delivering the 10k that actually completes someone's file, rather than any random 10k, makes people happier in total, and perhaps accelerates other valuable system processes (the resharing of the file, its evaluation and selection-in or -out of the file pool, etc.). OTOH, if people "grab and run", only sharing out while they are in the process of downloading, a system which keeps more people hanging on completion might paradoxically satisfy more people over the long term. Maybe. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! ----- Original Message ----- From: "Nick Johnson" To: Sent: Saturday, September 28, 2002 11:15 PM Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency > I was pondering the setup used by many (most?) current P2P networks, and > the following occurred to me: > Most current P2P clients work off the assumption that the number of > uploads should be relatively small, so as to allow a reasonable > bandwidth for each downloader, similar to the same tactic used by > traffic-heavy HTTP and FTP servers. This means that a P2P servent often > has to wait long times and multiple retries before it gets it's chance > to download from a specific host. For many networks, getting to download > is essentially random - who gets on straight after a slot becomes free? > This problem is less for files that are very popular and have spread > between many hosts, as there are more sources to download from. However, > the popularity again ensures that getting a free slot will be difficult. > Many networks support multiple & segmented downloading, and AFAIK many > of these support a client providing uploads of a segment of a file even > when it is not entirely downloaded. 
> It seems to me that a better approach would be for servents to have a > very high cap on the number of concurrent uploads, which should cause > more clients to be able to get the file at a slow speed immeditately, > instead of fewer being able to get it at a faster speed. While the two > may seem the same, a greater number of clients beginning a download > means that all these clients have segments of the file, however small, > and are themselves potential servers of that file. Hence, a popular file > can quickly spread in segments from one overloaded servent to many > interested servents. Each servent is likely to have a different part of > the file, so they can all start downloading the parts they do not have > from other downloaders. Hopefully this should result in faster overall > downloads and a more efficient network, spreading the load better. > Combined with clients that start downloads at random positions (to > prevent all servents having the beginning and few having the end), and a > system that provides the addresses of other recent downloaders (and > parent nodes that provided the file) to speed resource discovery, I > belive this system could prove significantly more efficient than current > limited-availablility systems. > > So, you made it to the end of all this. Does anyone have any comments on > the practicality of this idea? It seems good to me, but n heads are > better than 1 ;). > > Nick Johnson > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From dehora at eircom.net Tue Oct 1 02:01:01 2002 From: dehora at eircom.net (=?iso-8859-1?Q?Bill_de_h=D3ra?=) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: Message-ID: <002701c26928$93411700$1fc8c8c8@mitchum> > [mailto:p2p-hackers-admin@zgp.org] On Behalf Of Bram Cohen > > > Any given protocol gains no interoperability > whatsoever from using XML, since its semantics will be unique > regardless of what data format it uses. Well if you have to use a data format... > The obsession with using XML for everything is just plain > stupid. Similar sentiments have been expressed about TCP and HTTP. But XML has a social value that currently outweighs its limitations for use in protocols. It's good not to suspend technical judgement on XML, But your .sig says it all really - the market's chosen XML. Bill de h?ra -- Propylon www.propylon.com From mccoy at mad-scientist.com Tue Oct 1 03:11:01 2002 From: mccoy at mad-scientist.com (Jim McCoy) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: <3D96B782.70607@notdot.net> Message-ID: <25034AB2-D510-11D6-88DB-0003931095E0@mad-scientist.com> On Sunday, September 29, 2002, at 01:19 AM, Nick Johnson wrote: [...] > Incidentally, why the use of 'bencoding', as opposed to, say, XML? Bencoding is an improvement on mencoding (the 'b' is either for Bram or BitTorrent, the 'm' is for Mojo) and both are ruthlessly efficient and easy to parse/validate when compared to XML. Jim From decoy at iki.fi Tue Oct 1 03:51:01 2002 From: decoy at iki.fi (Sampo Syreeni) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: Message-ID: On 2002-09-30, Bram Cohen uttered to p2p-hackers@zgp.org: >BitTorrent uses four, that seems to be plenty. I'm aware of that. I too consider BT elegant. 
>That's dependant on the amount of time it takes to get a single block >from one machine to another, which smaller pieces makes better, but more >connections makes worse. True. I should've commented on that. What we'd likely want, in the optimum, is more or less constant size blocks but a number of connections scaling based on node uplink bandwidth. This way overhead is kept constant, and no negotiation is needed between nodes of different speeds. The price is latency with slower nodes, but that's just as well. If faster nodes are available, it's better to handle replication through them anyway. Based on pure intuition, an approach like this would seem to buy us optimally fast replication of heavily requested blocks without harming the performance for those less so. TCP congestion control then only becomes a problem with higher capacity nodes. Also, Gordon's comment applies. The configurable connection limit needs to be hard so that infinite slowdown is avoided. It comes to me that adapting the number based on a running latency/loading statistic would be nice. >Smaller pieces increase overhead a lot. I picked values for BitTorrent >which keep it miniscule in all cases. Yes. Engineering is the science of tradeoffs. >BitTorrent uses TCP as a black box and it works great. It does. However, I'm worrying about the generic peering case, where there's no such thing as a centralized tracker, and where the number of clients is somewhat larger than would be expected of a current BitTorrent network with its vast majority of highly transient nodes. (I also tend to think in terms of distributed file systems and databases more than blob dissemination.) In this case we do not expect clients to know whom to contact unless we disseminate load metrics. Just trying to connect is inelegant and dangerous, since the first messages in a connection setup obviously cannot be congestion controlled using TCP style backward adaptation -- there is a risk of tremendous SYN loading if one advertises a particularly hot piece of software. If bandwidth advertising is not done, on the other hand, higher speed nodes will be underutilized and connection churn on the lower speed ones cannot be controlled. Also I'm thinking largely in terms of highspeed nodes, not transient ones connected via modem. Under the above reasoning, faster nodes could very well have optimum numbers of outbound links nearing a hundred. In this case, TCP's quirks will start to show unless we do something about them. (Transient nodes are a big thing in P2P, of course, and mustn't be forgotten. The question of highspeed transience is particularly funky, considering that there's considerable potential in such nodes which isn't easy to advertise. Still, transience is something I'm not too familiar with. It's better to leave that side of the equation to others on the list.) 
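A rough sketch of the policy described above -- constant-size blocks, a hard connection cap, and a fanout scaled to uplink bandwidth and nudged by a running latency statistic -- might look like the following. Every constant and name here is an illustrative assumption, not a measured or standardized value:

    # Sketch only: constant-size blocks, a hard connection cap, and a fanout
    # scaled to uplink bandwidth, backed off via a running latency statistic.
    # All constants are assumptions chosen purely for illustration.

    BLOCK_SIZE    = 256 * 1024   # bytes; "more or less constant size blocks"
    PER_CONN_KBPS = 16           # assumed useful rate per upload connection
    HARD_CAP      = 100          # hard limit so slowdown stays bounded
    TARGET_SECS   = 30.0         # intended time to move one block (assumed)

    class FanoutGovernor:
        def __init__(self, uplink_kbps):
            self.uplink_kbps = uplink_kbps
            self.avg_block_secs = TARGET_SECS   # EWMA of observed block latency

        def record_block(self, seconds):
            # Running loading/latency statistic: exponentially weighted average.
            self.avg_block_secs = 0.8 * self.avg_block_secs + 0.2 * seconds

        def target_connections(self):
            base = self.uplink_kbps // PER_CONN_KBPS    # scale with uplink
            if self.avg_block_secs > 2 * TARGET_SECS:   # overloaded: back off
                base //= 2
            return int(max(1, min(HARD_CAP, base)))

Under these made-up numbers a 1 Mbit/s uplink would serve around 64 peers, while a 100 Mbit/s node stops at the hard cap instead of opening an unbounded number of connections.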
-- Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111 student/math+cs/helsinki university, http://www.iki.fi/~decoy/front openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2 From arachnid at notdot.net Tue Oct 1 04:09:01 2002 From: arachnid at notdot.net (Nick Johnson) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency References: Message-ID: <3D998284.3050700@notdot.net> Bram Cohen wrote: > Well, we have firm confirmation that opening up too many TCP connections > will cause them to not back off properly, and that 16 is so ludicrously > bad that it gets poor performance despite your bandwidth bullyness. It seems to me that such a blanket statement cannot be at all accurate. For example, are you indicating that because the southern-cross cable carries more than 16 concurrent TCP connections, none of them will back off properly, and hence TCP across the Southern Cross cable cannot function? I would tend to argue that the number of concurrent TCP connections that can be reliably maintained for best performance in a particular situation (be it client-server as with FTP or peer-to-peer with multiple downloaders) will depend amongst other things on the total bandwidth of the pipe, the number of connections, and the maximum bandwidth each connection would use. Which is exactly why I posed the question - has anyone run some simulations to establish even how to work out the optimal number of connections a servent should allow in a P2P network given a particular size pipe, or even what sort of network can make best use of these strategies? Nick Johnson From arachnid at notdot.net Tue Oct 1 04:23:01 2002 From: arachnid at notdot.net (Nick Johnson) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency References: Message-ID: <3D9985B3.8040409@notdot.net> Bram Cohen wrote: > Bencode parsers are vastly easier to write and a small fraction the size > of XML parsers. However, since XML parsers already exist for essentially every commonly used language, negating the need to write a parser at all. XML also holds the advantage of being easier on the eyes. If a good compressed-XML standard could be decided on, it would also be compact enough for transmission where overhead is particularaly important. I'm not claiming it's the be-all and end-all for data interchange, just that it has its uses. From reading the description of bencoding, I would have to agree that it seems to be pretty easy to parse, but I can still see relatively few situations where one couldn't use a pre-built XML parser for the same task... > The obsession with using XML for everything is just plain stupid. Any > given protocol gains no interoperability whatsoever from using XML, since > its semantics will be unique regardless of what data format it uses. It's > like painting your house puke green because you already happen to have a > few cans of paint of that color. However, if you know the general format of XML, there is far less overhead learning a new XML based data format than learning a completely new data format. Also, as I said earlier, XSLT can translate most XML data into another variant already readable by your program. Incidentally, I should probably point out that this is getting rather off-topic... 
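Since bencoding keeps coming up, the format itself is small enough to sketch: integers are i<digits>e, strings are <length>:<bytes>, lists are l...e, and dictionaries are d...e with sorted keys. The few lines of Python below illustrate the format only -- they are not BitTorrent's own code, and for simplicity they handle text strings rather than raw bytes:

    def bencode(obj):
        if isinstance(obj, int):
            return 'i%de' % obj
        if isinstance(obj, str):
            return '%d:%s' % (len(obj), obj)
        if isinstance(obj, list):
            return 'l' + ''.join([bencode(x) for x in obj]) + 'e'
        if isinstance(obj, dict):
            pairs = sorted(obj.items())        # keys must be sorted strings
            return 'd' + ''.join([bencode(k) + bencode(v) for k, v in pairs]) + 'e'
        raise TypeError('cannot bencode %r' % type(obj))

    def bdecode(data, pos=0):
        """Decode one value starting at pos; returns (value, next_pos)."""
        c = data[pos]
        if c == 'i':                           # integer: i42e
            end = data.index('e', pos)
            return int(data[pos + 1:end]), end + 1
        if c == 'l':                           # list: l ... e
            pos, out = pos + 1, []
            while data[pos] != 'e':
                value, pos = bdecode(data, pos)
                out.append(value)
            return out, pos + 1
        if c == 'd':                           # dictionary: d ... e
            pos, out = pos + 1, {}
            while data[pos] != 'e':
                key, pos = bdecode(data, pos)
                value, pos = bdecode(data, pos)
                out[key] = value
            return out, pos + 1
        colon = data.index(':', pos)           # string: 4:spam
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length

Round-tripping a small structure, bdecode(bencode({'spam': ['a', 1]}))[0] gives back {'spam': ['a', 1]}; the whole serializer and parser fit in roughly thirty lines, which is the point being argued about parser size.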
Nick Johnson From bram at gawth.com Wed Oct 2 14:44:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: <3D994485.5010005@chapweske.com> Message-ID: Justin Chapweske wrote: > Swarmcast has been effectively abandoned, but is still very interesting > in that it is the first and only system to approach the theoretical > channel capacity of a multi-path network. EDonkey and the new follow-on project Overnet do something like that, although I'm not sure exactly what you mean. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Wed Oct 2 14:55:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: Message-ID: Sampo Syreeni wrote: > >BitTorrent uses TCP as a black box and it works great. > > It does. However, I'm worrying about the generic peering case, where > there's no such thing as a centralized tracker, All the BitTorrent tracker does is help nodes find each other, it has no concept of transfer rates. > and where the number of clients is somewhat larger than would be > expected of a current BitTorrent network with its vast majority of > highly transient nodes. BitTorrent has scaled to 100 simultaneous downloaders without breaking a sweat. It can probably scale up a couple more orders of magnitude before more problems start to happen. I still haven't seen a decent reason for a file transfer application to not use TCP as a black box. TCP's weaknesses mostly have to do with producing low latency or handling networks which have a high packet loss even when they aren't congested. > Also I'm thinking largely in terms of highspeed nodes, not transient ones > connected via modem. Under the above reasoning, faster nodes could very > well have optimum numbers of outbound links nearing a hundred. BitTorrent deployments are typically about a gigabyte, and mostly downloaded by people on DSL lines. People with high upload capacity wind up sending to (trading with, really) people with high download capacity. There's no need to increase the number of uploads a particular machine does. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Wed Oct 2 16:04:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: <3D998284.3050700@notdot.net> Message-ID: Nick Johnson wrote: > Bram Cohen wrote: > > Well, we have firm confirmation that opening up too many TCP connections > > will cause them to not back off properly, and that 16 is so ludicrously > > bad that it gets poor performance despite your bandwidth bullyness. > > It seems to me that such a blanket statement cannot be at all accurate. Sure it can. If a bunch of TCP connections are all downloading from the same congested server, they'll all get about the same download rate. If you open two, you'll get more than your fair share. If everybody did that, of course, it would make everything worse, which is why talking about it being a 'performance improvement' is ludicrous. 
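To put rough numbers on that fairness argument -- assuming, purely for illustration, that the congested link really does split its bandwidth about evenly per TCP connection:

    # Toy model: one downloader opens k TCP connections to a congested link
    # while the other n-1 downloaders open one connection each.  With roughly
    # per-connection sharing, the greedy downloader gets k/(n-1+k) of the link.

    def greedy_share(k, n, capacity_kbps=1000.0):
        return capacity_kbps * k / ((n - 1) + k)

    for k in (1, 2, 16):
        print('%2d connections -> %.1f kbps' % (k, greedy_share(k, n=10)))
    #  1 connection  -> 100.0 kbps (the fair 1/10 share)
    #  2 connections -> 181.8 kbps
    # 16 connections -> 640.0 kbps
    # ...but if all ten downloaders open 16 connections, everyone is back to
    # roughly 100 kbps, now with 160 flows fighting over the same queue.
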
-Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at usa.net Wed Oct 2 16:30:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency References: Message-ID: <01d101c26a6b$908d75f0$640a000a@golden> Bram Cohen writes: > Nick Johnson wrote: > > Bram Cohen wrote: > > > Well, we have firm confirmation that opening up too many TCP connections > > > will cause them to not back off properly, and that 16 is so ludicrously > > > bad that it gets poor performance despite your bandwidth bullyness. > > > > It seems to me that such a blanket statement cannot be at all accurate. > > Sure it can. If a bunch of TCP connections are all downloading from the > same congested server, they'll all get about the same download rate. If > you open two, you'll get more than your fair share. > > If everybody did that, of course, it would make everything worse, which is > why talking about it being a 'performance improvement' is ludicrous. Depending on the ways in which packet loss is distributed, it's also possible that opening a large number of TCP connections to the same host could aggravate, rather than ameliorate, the way TCP's backoff strategy prevents optimal bandwidth usage. Here's the theory -- though I'm not sure how often this happens in practice: With each TCP connection, packet transfer rates crawl up by small steps, then drop off by one large step when packet loss is detected. As a result, the actual rate achieved by any one connection follows a sawtooth pattern, always below the 'full utilization' rate, and the area above the sawtooth approximates capacity unused. To some extent, having multiple TCP connections, each of which backs off at a different time, better fills the gaps -- because multiple connections are crawling up while any one is falling back. However, if the packet loss happens in a pattern where all connections detect losses at the same time, they'll all back off in sync, leaving throughput no better and perhaps worse than the single connection case. I don't know which effect predominates. - Gojomo From wesley at felter.org Wed Oct 2 16:48:01 2002 From: wesley at felter.org (Wes Felter) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: <01d101c26a6b$908d75f0$640a000a@golden> Message-ID: on 10/2/02 6:29 PM, Gordon Mohr at gojomo@usa.net wrote: > However, if the packet loss happens in a pattern where all > connections detect losses at the same time, they'll all > back off in sync, leaving throughput no better and perhaps worse > than the single connection case. IIRC Floyd and Van Jacobson proposed RED to solve TCP self-synchronization. So if the backbones are using RED, you'd expect multiple connections to improve throughput. (Unless the endpoints are using TCP Vegas, which nobody does.) Wes Felter - wesley@felter.org - http://felter.org/wesley/ From mfreed at cs.nyu.edu Wed Oct 2 17:02:01 2002 From: mfreed at cs.nyu.edu (Michael J. Freedman) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: Message-ID: Given where this conversation has gone, I thought to point out a very related paper: Robert Morris, TCP Behavior with Many Flows, IEEE International Conference on Network Protocols, October 1997, Atlanta, Georgia.
http://www.pdos.lcs.mit.edu/~rtm/papers/icnp97-web.pdf .ps Robert's thesis also describes improved methods for dropping packets... --mike On Wed, 2 Oct 2002, Wes Felter wrote: > Date: Wed, 02 Oct 2002 18:49:18 -0500 > From: Wes Felter > Reply-To: p2p-hackers@zgp.org > To: p2p-hackers@zgp.org > Subject: Re: [p2p-hackers] Restrictions on number of downloads and > network efficiency > > on 10/2/02 6:29 PM, Gordon Mohr at gojomo@usa.net wrote: > > > However, if the packet loss happens in a pattern where all > > connections detect losses at the same time, they they'll all > > back off in sync, leaving throughput no better and perhaps worse > > than the single connection case. > > IIRC Floyd and van Jacobsen proposed RED to solve TCP self-synchronization. > So if the backbones are using RED, you'd expect multiple connections to > improve throughput. (Unless the endpoints are using TCP Vegas, which nobody > does.) > > Wes Felter - wesley@felter.org - http://felter.org/wesley/ > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > ----- "Not all those who wander are lost." www.michaelfreedman.org From decoy at iki.fi Wed Oct 2 18:27:01 2002 From: decoy at iki.fi (Sampo Syreeni) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: Message-ID: On 2002-10-02, Bram Cohen uttered to p2p-hackers@zgp.org: >All the BitTorrent tracker does is help nodes find each other, it has no >concept of transfer rates. Quite. Now scale BT to something like 10000+ downloaders and/or highly volatile data, as in the case of transaction processors and online databases. Not only will the tracker become a bottleneck, but also the fact that those attempting to connect mostly shouldn't unless the capacity is known to be there. Hence, dissemination of load metrics and a need for an architecture without a centralized tracker. The latter is also a single point of failure for a given resource. We'd like to avoid this if censorship is an issue. (Freenet I consider inelegant in this regard. I don't see how it could ever really scale or provide true anonymity.) >BitTorrent has scaled to 100 simultaneous downloaders without breaking a >sweat. It can probably scale up a couple more orders of magnitude before >more problems start to happen. I doubt that, but I'll grant BT the benefit of a doubt. Still, the question remains: does it in fact scale efficiently? Could it handle the kinds of high interaction scenarios I've been talking about? Does it actually optimally replicate the data? I do not think so. (At this point it's more about elegance and theoretical merit than practical reasons, I'll admit.) >I still haven't seen a decent reason for a file transfer application to >not use TCP as a black box. TCP's weaknesses mostly have to do with >producing low latency or handling networks which have a high packet loss >even when they aren't congested. The reason is simply that you'll want to be a good net.denizen, which I don't think TCP guarantees with sufficient numbers of simultaneous connections. I also think those connections might be needed in certain apps. (E.g. if the data is highly volatile, one would likely end up with something approximating realtime application level multicast. If nothing less suffices, here at least we would want high fanouts wherever possible. 
Then TCP's connection control might prove a problem, something evidenced by the fact that a unified congestion manager is on its way to becoming an Internet standard.) >People with high upload capacity wind up sending to (trading with, >really) people with high download capacity. There's no need to increase >the number of uploads a particular machine does. Yes, I understand the tit-for-tat rationale utilized in BT. However, I tend to think that download speeds should be roughly equal for all the nodes regardless of upload ones, and that timely replication might well count in certain P2P apps of wide appeal. The latter might work out with something approximating BT's current heuristic. The first wouldn't. Hence, proportionally higher fanouts for higher speed nodes. (I believe such a setup will be the only one giving lower speed nodes an equal incentive to join the network. That's important in that most of the unused resources out there probably reside with the lower end nodes, as do the human resources necessary to get actual content into a network.) -- Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111 student/math+cs/helsinki university, http://www.iki.fi/~decoy/front openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2 From bram at gawth.com Wed Oct 2 19:16:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: Message-ID: Sampo Syreeni wrote: > On 2002-10-02, Bram Cohen uttered to p2p-hackers@zgp.org: > > >All the BitTorrent tracker does is help nodes find each other, it has no > >concept of transfer rates. > > Quite. Now scale BT to something like 10000+ downloaders and/or highly > volatile data Ten thousand simultaneous downloaders would be about six hits a second, no big deal at all. A hundred thousand might start to be a bit much for one machine, but that's a truly colossal amount of load. I'm not sure what you mean by 'highly volatile data', but bulk data transfers by their nature generally don't make much sense for highly temporal data, and for non-bulk transfers congestion control isn't an issue, because there's no congestion to speak of. > >BitTorrent has scaled to 100 simultaneous downloaders without breaking a > >sweat. It can probably scale up a couple more orders of magnitude before > >more problems start to happen. > > I doubt that, but I'll grant BT the benefit of a doubt. Still, the > question remains: does it in fact scale efficiently? Could it handle the > kinds of high interaction scenarios I've been talking about? Does it > actually optimally replicate the data? Yes, yes, and yes. > I do not think so. (At this point it's more about elegance and > theoretical merit than practical reasons, I'll admit.) No, actually, there are plenty of humongous deployments which need ridiculous levels of scaling, although none of them have been done using BitTorrent yet. > >I still haven't seen a decent reason for a file transfer application to > >not use TCP as a black box. TCP's weaknesses mostly have to do with > >producing low latency or handling networks which have a high packet loss > >even when they aren't congested. > > The reason is simply that you'll want to be a good net.denizen, which I > don't think TCP guarantees with sufficient numbers of simultaneous > connections. I also think those connections might be needed in certain > apps.
I am still exceedingly skeptical, since every concrete proposal of an app I've seen I know doesn't. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From greg at electricrain.com Thu Oct 3 18:41:01 2002 From: greg at electricrain.com (Gregory P. Smith) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Restrictions on number of downloads and network efficiency In-Reply-To: <002701c26928$93411700$1fc8c8c8@mitchum> References: <002701c26928$93411700$1fc8c8c8@mitchum> Message-ID: <20021004014038.GA9009@zot.electricrain.com> > > Any given protocol gains no interoperability > > whatsoever from using XML, since its semantics will be unique > > regardless of what data format it uses. > > Well if you have to use a data format... Use one that is computationally efficient to parse. That also means it's easy to implement and less prone to all sorts of nasty bugs. > > The obsession with using XML for everything is just plain > > stupid. > > Similar sentiments have been expressed about TCP and HTTP. But XML has a > social value that currently outweighs its limitations for use in > protocols. It's good not to suspend technical judgement on XML, But your > .sig says it all really - the market's chosen XML. The market has not chosen XML as the underlying wire format for protocols. The reasons have already been stated: it doesn't canonicalize easily and it can't represent bulk/binary data without significant computational inefficiency and space overhead. It's fine to use it for higher level meta info and things that might actually effect what gets presented to the user but it doesn't belong at the lowest level. (xhtml is a good example of using it; it's used as presentation markup with references to other URIs for binary data rather than attempting to that directly within the document.) -g From bram at gawth.com Mon Oct 7 17:07:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] meeting sunday Message-ID: remember, there's a p2p-hackers meeting this sunday, october 13th, in the metreon, in san francisco, starting at 3pm. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From adam at cypherspace.org Tue Oct 8 07:49:01 2002 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] (no subject) In-Reply-To: <1033163712.1157.12.camel@arlx031.austin.ibm.com>; from wesley@felter.org on Fri, Sep 27, 2002 at 04:55:12PM -0500 References: <5.1.0.14.0.20020926164449.00ba4228@pop3.student.unsw.edu.au> <1033163712.1157.12.camel@arlx031.austin.ibm.com> Message-ID: <20021008154759.A2057279@exeter.ac.uk> I believe giFT only supported the peer protocol, and could not act as a super-node. But I think Fast Track dynamically elects super-nodes. Adam -- http://www.cypherspace.net/ On Fri, Sep 27, 2002 at 04:55:12PM -0500, Wes Felter wrote: > On Thu, 2002-09-26 at 01:48, Iram Mahboob wrote: > > Hi, > > Are the supernodes in Fast Track elected dynamically or are they fixed > > (like in Napster). Can someone please send me a pointer to a document that > > explains the protocol. I am trying to figure out how Fast Track works. > > I think they're elected dynamically. You should find a copy of the > source code for an old version of giFT, which supported the FastTrack > protocol. 
From bram at gawth.com Sun Oct 13 12:54:01 2002 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] meeting today Message-ID: Remember, meeting today, the 13th, 3pm, the metreon. I'm about to head out myself. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bdarla at KOM.tu-darmstadt.de Mon Oct 14 09:47:01 2002 From: bdarla at KOM.tu-darmstadt.de (Vasilios Darlagiannis) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] PIK Journal, Special Issue on P2P, extended deadline Message-ID: <3DAAAD92.10407@kom.tu-darmstadt.de> FYI, After receiving many requests, it was decided to extend the submission deadline by one week. This means that the new deadline is on the 22nd of October, 2002, which is going to be final. Best regards, Vasilios Darlagiannis ---- Special Issue on Peer-to-Peer Systems Call for Papers organized by the Journal PIK Peer-to-Peer has emerged as a promising new paradigm for distributed computing, with symmetric communication, as opposed to the Client-Server paradigm. Peer-to-Peer systems are characterized as being decentralized, self-organizing distributed systems, which received a special attention lately. The popularity of Peer-to-Peer networks, which can already be observed in IP-traffic measurements, stems certainly from Internet file sharing domain. However the Peer-to-Peer mindset can be employed beyond these content distribution applications, as in the area of bidirectional communication and collaboration, distributed computing and messaging. Thus the new perspective of Peer-to-Peer networking holds new challenges, e.g. concerning the traffic management in mobile and fixed networks hosted by Internet Service Providers, as well as in enterprise networks. On the other hand an increased flexibility and a higher fault tolerance of the overall system, and an increased network value are only a few possible advantages connected to the idea of Peer-to-Peer networking. The goal of the Special Issue is to present novel peer-to-peer technologies, protocols, applications and systems, and also to identify key research issues and challenges that lie ahead. Topics of interest include, but are not limited to: ? Novel Peer-to-Peer applications and systems ? Peer-to-Peer service development ? Peer-to-Peer infrastructure and overlay networks ? Protocols for resource management/discovery/reservation/scheduling ? Security and anti-censorship in Peer-to-Peer systems ? Performance and measurements issues of Peer-to-Peer systems ? Fault tolerance, scalability, availability, accessibility in Peer-to-Peer networks ? Quality of Service and billing/accounting issues in Peer-to-Peer networks Guidelines for Submission: Submissions are expected to consist of not more than 6 pages, with at most 6000 characters. The text can be written either in English or in German. Authors are invited to submit by email a single electronic version of their paper to the address p2p@lkn.ei.tum.de . Both Postscript and PDF formats are the only acceptable formats. Important Dates: Papers due: October 15, 2002 Notification to authors: November 15, 2002 Camera ready papers due: December 15, 2002 Guest Editors: J?rg Ebersp?cher, LKN, Technische Universit?t M?nchen Ralf Steinmetz, KOM, Technische Universit?t Darmstadt -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://zgp.org/pipermail/p2p-hackers/attachments/20021014/26daea88/attachment.html From arachnid at notdot.net Tue Oct 15 03:21:01 2002 From: arachnid at notdot.net (Nick Johnson) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Peer based chat application Message-ID: <3DABEC46.5000205@notdot.net> I've been experimenting with a peer based chat application recently. The basic idea was originally to devise a way to create impromptu 'chat-rooms', without requiring access to a centralised server. I've written some proof-of-concept code, which implements a very basic version of the protocol, along with a simple curses-based GUI. The current version of the code behaves (in brief) as follows: When executed, it opens a listening socket on port 7777. The user can use the command /connect servername to establish a connection to another server, or another server can establish one by connecting on port 7777. These servers then exchange messages using a simple protocol, currently consisting of: nick message_id message \n Where nick is the nickname of the sender, message_id is a long integer in hexadecimal notation, and message is the text message. Whenever a servent receives a message over a connection, it first checks a cache of recent message IDs for the ID attached to the message. If it already exists in the cache, the message is silently dropped, thus preventing infinite loops if the graph of connections has cycles. If the message ID is not in the cache, it notifies the client code of the message, then forwards it on to all its other connections. The code (including makefile) can be found at http://notdot.net/peerchat.tar.gz - be warned, it is purely proof of concept, and makes heavy (and possibly bad ;) use of threads. A rewrite is high up my list. Extensions to this code could involve: Presence notification - determination of online users, even given unexpected disconnections from the network Forgery prevention - to prevent other users from imitating someone with a nickname that is 'taken', this could work by refusing to acknowledge messages that come in over a connection other than the expected one for that nickname Multiple channels & channel routing - An obvious extension, this would add the complexity of selectively routing all channel messages to only those clients that are interested in receiving them Selective connection dropping/blocking - If a servent has multiple connections to the same 'network', it can measure the quantity of duplicate messages it receives over each connection, and use this to either ask a peer to stop sending messages based on certain criteria, or drop that connection entirely. This, combined with the next item, could yield a network that automatically organises itself for most efficient message passing (given a finite maximum number of links each servent maintains) Discovery - implementation of a way for a servent to establish the IP of other peers on the network so it can connect to them to increase both reliability and efficiency. As you can see, if I plan on extending this code as far as I believe it could go, I've got a fair bit of work ahead of me ;). I believe, however, that a network structure and broadcast system similar to this could be very efficient at broadcasting information of whatever type to a multitude of changing users. I would greatly appreciate any feedback anyone is able to give - the idea seized me somewhat, and I hacked out the code to try it out, but I've had little opportunity for feedback.
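The duplicate-suppressing flood relay described above fits in a few lines. The sketch below follows the stated wire format and forwarding rule; the class name, cache size, and connection objects (anything with a send() method) are invented for illustration and are not the actual peerchat code:

    import collections

    class ChatRelay:
        """Flood each message to all other peers, dropping IDs already seen."""
        def __init__(self, max_cache=1024):
            self.seen = collections.deque(maxlen=max_cache)  # recent message IDs
            self.connections = []                            # peer connection handles

        def handle_line(self, line, source):
            # Wire format: "nick message_id message\n", message_id in hex.
            nick, msg_id, text = line.rstrip('\n').split(' ', 2)
            if msg_id in self.seen:
                return                       # duplicate: drop to break cycles
            self.seen.append(msg_id)
            self.notify_client(nick, text)   # hand the message to the UI layer
            for conn in self.connections:
                if conn is not source:       # forward on all *other* connections
                    conn.send(line)

        def notify_client(self, nick, text):
            print('<%s> %s' % (nick, text))

The bounded cache keeps memory constant while still breaking cycles in practice; the extensions listed above (forgery prevention, channels, selective dropping) would layer on top of this core.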
Thanks for reading this far, anyway ;) Nick Johnson From clint at TheStaticVoid.net Tue Oct 15 05:13:01 2002 From: clint at TheStaticVoid.net (Clint Heyer) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] Ann: Naanou Message-ID: <200210151212.g9FCCJa00054@trinity.cc.uq.edu.au> Hi All, A while back I posted a questionnaire asking some P2P-related questions, and later, the results. I now have an beta-level implementation available at: http://thestaticvoid.net/naanou/index.shtml. Points of interest: - Based on Chord protocol - Uses MP3, AVI, JPEG (and a few other image types) metadata for searching - Multi-source adaptive download system - Once you begin to download a file, your node makes it available to others - When downloading from the `bandwidth disadvantaged' sizes of blocks fetched is reduced. For those on beefy connections, block sizes scale up to reduce overhead - `Clean-room' implementation in C# (a .NET language) The implementation was done as a vehicle for tests involving P2P moderation (not present in this public version). Oh, there is a quick questionnaire at http://thestaticvoid.net/naanou/questionnaire.shtml if anyone feels so inclined :) cheers, .clint ----- Clint Heyer (clinth@acm.org) Web: http://TheStaticVoid.net - Mobile: 0421011224 ----- From afanous at EE.UManitoba.CA Thu Oct 17 05:47:02 2002 From: afanous at EE.UManitoba.CA (Amgad Fanous) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] trasport protocol Message-ID: On the Jxta project's site, it says that Jxta is protocol independent. Does this mean it can run on protocols other than TCP? Such as UDP or even a new protocol? Does anyone know if this is true? And is there is any benifit of running Jxta on top of a different transport protocol? Thank you, Amgad From agl at imperialviolet.org Thu Oct 17 09:04:01 2002 From: agl at imperialviolet.org (Adam Langley) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] trasport protocol In-Reply-To: References: Message-ID: <20021017160141.GA15324@linuxpower.org> On Thu, Oct 17, 2002 at 01:29:46AM -0500, Amgad Fanous wrote: > Does anyone know if this is true? And is there is any benifit of running > Jxta on top of a different transport protocol? Nothing in the spec that I've seen says anything about needing TCP. JXTA (as with most protocols) just requires a reliable, in-order stream so if you can build one, JXTA will run on it. Thus UDP would need a layer on top to be usable for JXTA (like Airhook). That still, of course, leaves the question of why you would want to use JXTA in the first place. -- Adam Langley agl@imperialviolet.org http://www.imperialviolet.org (+44) (0)7986 296753 PGP: 9113 256A CC0F 71A6 4C84 5087 CDA5 52DF 2CB6 3D60 From wesley at felter.org Thu Oct 17 09:44:01 2002 From: wesley at felter.org (Wes Felter) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] trasport protocol In-Reply-To: References: Message-ID: <1034872630.1158.3.camel@arlx002.austin.ibm.com> On Thu, 2002-10-17 at 01:29, Amgad Fanous wrote: > > On the Jxta project's site, it says that Jxta is protocol > independent. Does this mean it can run on protocols other than TCP? Such > as UDP or even a new protocol? The JXTA lists are a better place to ask this. AFAIK, they are also planning to run over Bluetooth. 
-- Wes Felter - wesley@felter.org - http://felter.org/wesley/ From wesley at felter.org Thu Oct 17 14:43:01 2002 From: wesley at felter.org (Wes Felter) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] OceanStore "Pond" prototype source code released Message-ID: <1034890543.6617.20.camel@arlx002.austin.ibm.com> OceanStore is "a global persistent data store designed to scale to billions of users. It provides a consistent, highly-available, and durable storage utility atop an infrastructure comprised of untrusted servers." It's also another example that P2P != file sharing. http://oceanstore.sourceforge.net/pond-sf.html -- Wes Felter - wesley@felter.org - http://felter.org/wesley/ From gojomo at usa.net Thu Oct 17 16:05:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] OceanStore "Pond" prototype source code released References: <1034890543.6617.20.camel@arlx002.austin.ibm.com> Message-ID: <031101c27631$96297690$640a000a@golden> Wes Felter writes: > OceanStore is "a global persistent data store designed to scale to > billions of users. It provides a consistent, highly-available, and > durable storage utility atop an infrastructure comprised of untrusted > servers." It's also another example that P2P != file sharing. > > http://oceanstore.sourceforge.net/pond-sf.html > By simply publicizing the name/keys of the files that you've pushed into the Ocean, so that others may retrieve those files as easily as you, it seems to me that OceanStore would be a very capable platform for file-sharing. But I won't tell the enemies of file sharing, if you won't. Shhhh! - Gojomo From wesley at felter.org Thu Oct 17 17:14:01 2002 From: wesley at felter.org (Wes Felter) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] OceanStore "Pond" prototype source code released In-Reply-To: <031101c27631$96297690$640a000a@golden> Message-ID: on 10/17/02 6:04 PM, Gordon Mohr at gojomo@usa.net wrote: > Wes Felter writes: >> OceanStore is "a global persistent data store designed to scale to >> billions of users. It provides a consistent, highly-available, and >> durable storage utility atop an infrastructure comprised of untrusted >> servers." It's also another example that P2P != file sharing. >> >> http://oceanstore.sourceforge.net/pond-sf.html >> > > By simply publicizing the name/keys of the files that you've pushed > into the Ocean, so that others may retrieve those files as easily as > you, it seems to me that OceanStore would be a very capable platform > for file-sharing. Until you get the bill and the letter from the RIAA. > But I won't tell the enemies of file sharing, if you won't. Shhhh! It's a deal. Wes Felter - wesley@felter.org - http://felter.org/wesley/ From gojomo at usa.net Thu Oct 17 17:45:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] OceanStore "Pond" prototype source code released References: Message-ID: <03b001c2763f$7e7a59c0$640a000a@golden> Wes Felter writes: > on 10/17/02 6:04 PM, Gordon Mohr at gojomo@usa.net wrote: > > > Wes Felter writes: > >> OceanStore is "a global persistent data store designed to scale to > >> billions of users. It provides a consistent, highly-available, and > >> durable storage utility atop an infrastructure comprised of untrusted > >> servers." It's also another example that P2P != file sharing. 
> >> > >> http://oceanstore.sourceforge.net/pond-sf.html > >> > > > > By simply publicizing the name/keys of the files that you've pushed > > into the Ocean, so that others may retrieve those files as easily as > > you, it seems to me that OceanStore would be a very capable platform > > for file-sharing. > > Until you get the bill and the letter from the RIAA. Which is different from Napster, Gnutella, FastTrack -- how? Also from the OceanStore overview: # We must assume that any server in the infrastructure may # crash, leak information, or become compromised. Promiscuous # caching therefore requires redundancy and cryptographic # techniques to protect the data from the servers upon which # it resides. ...and... # A version-based archival storage system provides durability # which exceeds today's best by orders of magnitude. OceanStore # stores each version of a data object in a permanent, read-only # form, which is encoded with an erasure code and spread over # hundreds or thousands of servers. A small subset of the encoded # fragments are sufficient to reconstruct the archived object; only # a global-scale disaster could disable enough machines to destroy # the archived object. A nastygram from any legal entity is not a "global-scale disaster"; if OceanStore works as advertised it will outdo Freenet in censorship-resistance -- even if local access points may be censorable. So I would say OceanStore is another example that advanced P2P tends to subsume unfettered file-sharing as a trivial application. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From Bernard.Traversat at Sun.Com Fri Oct 18 08:28:01 2002 From: Bernard.Traversat at Sun.Com (Bernard Traversat) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] trasport protocol References: <20021017160141.GA15324@linuxpower.org> Message-ID: <3DB027BF.1000204@Sun.Com> Adam Langley wrote: >On Thu, Oct 17, 2002 at 01:29:46AM -0500, Amgad Fanous wrote: > > >>Does anyone know if this is true? And is there is any benifit of running >>Jxta on top of a different transport protocol? >> >> > >Nothing in the spec that I've seen says anything about needing TCP. JXTA >(as with most protocols) just requires a reliable, in-order stream so if >you can build one, JXTA will run on it. Thus UDP would need a layer on >top to be usable for JXTA (like Airhook). > The JXTA protocol requirements are closer to UDP in the sense that only an unreliable and uni-directional transport is in fact required. Most implementations will provide as Adam mentioned a reliable, in-order streaming pipe service when build on top of TCP/IP. This is the case of the current implementations. > >That still, of course, leaves the question of why you would want to use >JXTA in the first place. > Virtual addressing based on Peer ID rather than physical IP addresses (a peer can have multiple IP addresses DHCP) Uniform addressability independently of peer physical network locations (NAT, Firewall, etc) Peergroup multicast across multiple physical and independently of the underlying multicast domains. B. > > > From istoica at cs.berkeley.edu Sat Oct 19 15:26:02 2002 From: istoica at cs.berkeley.edu (Ion Stoica) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] IPTPS'03 - Extended Submission Deadline: November 1 Message-ID: <3DB1C911.E6A78B40@cs.berkeley.edu> The paper submission deadline for IPTPS'03 has been extended by one week. 
Please accept my apology if you receive multiple copies of this message. Ion ---- The 2nd International Workshop on Peer-to-Peer Systems (IPTPS'03) February 20-21, 2003 Claremont Hotel, Berkeley, CA, USA. (http://iptps03.cs.berkeley.edu) Important Dates: * November 1, 2002 : Submission of position papers * December 20, 2002 : Notification of Acceptance * January 15, 2003 : Camera-ready copies * February 20-21, 2002 : IPTPS'03 The 2nd International Workshop on Peer-to-Peer Systems (IPTPS'03) aims to provide a forum for researchers active in peer-to-peer computing to discuss the state-of-the-art and to identify key research challenges in peer-to-peer computing. IPTPS'03 hopes to continue and build on the success of the first workshop, IPTPS'02. The goal of the workshop is to examine peer-to-peer technologies, applications and systems, and also to identify key research issues and challenges that lie ahead. In the context of this workshop, peer-to-peer systems are characterized as being decentralized, self-organizing distributed systems, in which all or most communication is symmetric. Topics of interest include, but are not limited to: * peer-to-peer applications and services * peer-to-peer systems and infrastructures * peer-to-peer algorithms * security in peer-to-peer systems * robustness in peer-to-peer systems * anonymity and anti-censorship * performance of peer-to-peer systems * workload characterization for peer-to-peer systems The workshops aims to bring together researchers and practitioners in the fields of systems, networking, and theory. The program of the workshop will be a combination of invited talks, presentations of position papers, and discussions. To ensure a productive workshop environment, attendance will be limited to about 50 participants who are active in the field. Each potential participant should submit a position paper of 5 pages or less that exposes a new problem, advocates a specific solution, or reports on actual experience. Participants will be invited based on the originality, technical merit and topical relevance of their submissions, as well as the likelihood that the ideas expressed in their submissions will lead to insightful technical discussions at the workshop. Please do not submit abbreviated versions of journal or conference papers. Organizers: Program Committee: Miguel Castro, Microsoft Research Joe Hellerstein, UC Berkeley Richard Karp, UC Berkeley Frans Kaashoek, MIT (co-chair) Nancy Lynch, MIT David Mazieres, New York University Robert Morris, MIT Ion Stoica, UC Berkeley (co-chair) Marvin Theimer, Microsoft Research Amin Vahdat, Duke University Geoffrey Voelker, UC San Diego Ellen Zegura, Georgia Tech Hui Zhang, CMU Steering Committee: Druschel, Rice University Frans Kaashoek, MIT Antony Rowstron, Microsoft Research Scott Shenker, ICIR, Berkeley Ion Stoica, UC Berkeley Administrative Assistant: Bob Miller, UC Berkeley From me at aaronsw.com Fri Oct 25 10:02:01 2002 From: me at aaronsw.com (Aaron Swartz) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading Message-ID: <612CA43B-E83B-11D6-A567-003065F376B6@aaronsw.com> Hi Petar! I hope you don't mind me sending this letter to you personally. All the Kademlia discussion forums seem to be not working. I'm ccing the p2p-hackers list[1], in case they have any bright ideas. I'm working on a project very similar to Varvar[2], except using Python. 
I'm integrating Khashmir[3], a Python Kademlia implementation and BitTorrent[4], a Python multi-source downloader which I'm told has better attack-resistance properties than ED2K. I was thinking about the issues related to multi-source downloading. My first plan was simply to have nodes near x store the locations of nodes that had the file that hashed to x. However, I believe this is an inefficient solution, since what you really want is a k-bucket to store these nodes in (so as to get the benefits of the eviction policy) and why duplicate work that the network is already doing? Khashmir assumes that it can store the files on the nearby nodes, but for a multi-source file sharing app this seems unfeasible. Nodes may not want to store those files or may not have the disk space or bandwidth to. It seems better to place nodes nearby the files. But I suspect this may introduce issues that would undermine some of Kademlia's guarantees. Since you also seem to be doing multi-source downloading, I was wondering if you could outline the approach you chose. Thanks! [1] http://zgp.org/mailman/listinfo/p2p-hackers [2] http://kademlia.scs.cs.nyu.edu/varvar.html [3] http://sourceforge.net/projects/khashmir/ [4] http://bitconjurer.org/BitTorrent/ -- Aaron Swartz [http://www.aaronsw.com] "Curb your consumption," he said. From zooko at zooko.com Fri Oct 25 10:08:02 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading In-Reply-To: Message from Aaron Swartz of "Fri, 25 Oct 2002 12:01:18 CDT." <612CA43B-E83B-11D6-A567-003065F376B6@aaronsw.com> References: <612CA43B-E83B-11D6-A567-003065F376B6@aaronsw.com> Message-ID: AaronSw wrote: > > I'm working on a project very similar to Varvar[2], except using > Python. I'm integrating Khashmir[3], a Python Kademlia implementation > and BitTorrent[4], a Python multi-source downloader which I'm told has > better attack-resistance properties than ED2K. You should also be aware that mldonkey v2 came out recently, and it says it is compatible with Overnet, and it says that Overnet does Kademlia and multi- source downloading. http://www.nongnu.org/mldonkey/ Regards, Zooko From gojomo at usa.net Fri Oct 25 14:12:02 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading References: <612CA43B-E83B-11D6-A567-003065F376B6@aaronsw.com> Message-ID: <007701c27c6b$1f8a56e0$640a000a@golden> Aaron writes: > It seems better to place nodes nearby the files. But I > suspect this may introduce issues that would undermine some of > Kademlia's guarantees. I have a hunch things would still work OK if: (1) every participating machine creates one "virtual node" centered on each file it can locally share. That is, choose to be an expert on those files you happen to have, and those 'near' what you already have. (2) there is a quick/easy way to give nodes "pop quizzes" on their claimed offerings, so that bogus nodes drop out, just like any other "nonresponsive" node in Kademlia Using a tree hash as the main content-identifier could work well for (2); if someone claims to have the full-file identified by a particular tree hash root value, you can pick any random tiny range of that file, ask them for that range and the proof that range fits into the whole, and assume that if they reliably answer these random probes successfully, they're more-or-less an honest provider of that content. 
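For concreteness, here is a much-simplified sketch of such a probe check: the verifier holds only the advertised root hash, the probed node returns one randomly chosen segment plus the sibling hashes on its path, and the verifier recomputes the root. (A real scheme such as THEX fixes the segment size and prefixes leaf and internal hashes before hashing; the plain SHA-1 construction and 1 KB segments below are assumptions made purely for illustration.)

    import hashlib

    SEGMENT = 1024                              # probe granularity (assumed)

    def _h(data):
        return hashlib.sha1(data).digest()

    def build_levels(blob):
        """All levels of the hash tree, leaves first, root last."""
        level = [_h(blob[i:i + SEGMENT]) for i in range(0, len(blob), SEGMENT)] or [_h(b'')]
        levels = [level]
        while len(level) > 1:
            # Pair adjacent hashes; a lone odd node is promoted unchanged.
            level = [_h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
                     for i in range(0, len(level), 2)]
            levels.append(level)
        return levels

    def prove(blob, index):
        """Prover side: return the probed segment and its sibling-hash path."""
        levels, path, i = build_levels(blob), [], index
        for level in levels[:-1]:
            sibling = i ^ 1
            path.append(level[sibling] if sibling < len(level) else None)
            i //= 2
        return blob[index * SEGMENT:(index + 1) * SEGMENT], path

    def verify(root, index, segment, path):
        """Verifier side: needs nothing but the advertised root hash."""
        node, i = _h(segment), index
        for sibling in path:
            if sibling is not None:
                node = _h(node + sibling) if i % 2 == 0 else _h(sibling + node)
            i //= 2
        return node == root

    content = b'x' * 5000                       # stand-in for a shared file
    root = build_levels(content)[-1][0]
    segment, path = prove(content, 3)           # "pop quiz" on segment 3
    assert verify(root, 3, segment, path)

The verifier's state is just one 20-byte digest per advertised file, so firing a handful of such random probes at a newly announced node is cheap.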
If they fail for any reason, drop them like any other unresponsive node. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From justin at chapweske.com Fri Oct 25 15:39:02 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading References: <612CA43B-E83B-11D6-A567-003065F376B6@aaronsw.com> <007701c27c6b$1f8a56e0$640a000a@golden> Message-ID: <3DB9C7D9.3000108@chapweske.com> While I would like to use tree hashes for cleaning my house and walking my dog, it's not quite appropriate for those tasks. The node can simply retrieve the small bit of data on-demand that it is proving that it has available. The only thing that the tree hash buys you is that you don't need to store the content itself on verifying nodes... Two branches of thought: o There is probably a zero-knowledge way to do this that blinds the root that you are requesting content for. Let's race to see who can come up with the solution first... o Gordon's idea *could* be used to vastly improve mp3.com's old my.mp3.com service where they verified that you had the CD in your drive by requesting random hashes of data. IMO the mp3.com system had to store all of the hashes of every file in their database. Gordon's suggestion would allow them to only store the root hash for each CD and place the burden of proof on the clients. /me watches as all of the lurkers start writing their patent applications... > > Using a tree hash as the main content-identifier could work > well for (2); if someone claims to have the full-file identified > by a particular tree hash root value, you can pick any random tiny > range of that file, ask them for that range and the proof that range > fits into the whole, and assume that if they reliably answer these > random probes successfully, they're more-or-less an honest > provider of that content. If they fail for any reason, drop them > like any other unresponsive node. > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From gojomo at usa.net Fri Oct 25 16:29:02 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading References: <612CA43B-E83B-11D6-A567-003065F376B6@aaronsw.com> <007701c27c6b$1f8a56e0$640a000a@golden> <3DB9C7D9.3000108@chapweske.com> Message-ID: <009501c27c7e$37e14970$640a000a@golden> Justin writes: > While I would like to use tree hashes for cleaning my house and walking > my dog, it's not quite appropriate for those tasks. > > The node can simply retrieve the small bit of data on-demand that it is > proving that it has available. That's fine! The threat is: a censor creates many nodes, each claiming that they are making available specific content that in fact they want to censor. They saturate all the pointers to that area of the content-space. Yet when people ask for that content, they play dumb. Some DHT-based systems attempt to offset this threat by limiting the ability of nodes to choose their own neighborhoods -- for example, by making neighborhood-of-responsibility some deterministic function of IP address, which cannot be arbitrarily changed cheaply. The solution I've proposed instead says, sure, pick your own neighborhoods to match exactly the content you want to be responsible for. But in order for other people to weave you in, you have to pass some tests proving your reliability.
If you pass these tests, you get wound deeper into the Kademlia mesh for your chosen neighborhood; if you fail, you start dropping out. If content is named via, say, SHA1, there is no quick-and-simple probe to make sure a node is providing what it says it is. You have to get the whole content, or settle for some sort of quorum-of-available-sources approach. If content is named via a treehash, then you can fire small random tests of a node's ability to provide what it says it is providing. If in fact it responds to some/most/all probes successfully, then you don't care what its motivations are, or if it is only grabbing those ranges on-demand -- it's delivering what it promised it would deliver, so you're happy to have it in the mesh. - Gojomo From petar at scs.cs.nyu.edu Fri Oct 25 17:31:02 2002 From: petar at scs.cs.nyu.edu (Petar Maymounkov) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading In-Reply-To: <200210252148.g9PLmUK12515@amsterdam.lcs.mit.edu> Message-ID: Hi All, You are discussing the search-related aspects of multiple-source downloads (BTW I was working on the bandwidth and availability aspect of multi-source downloads). Here are my thoughts on your questions: I am not sure what your exact setting is, b/c you are talking about storing files on "nearby" nodes?? which hints to me that you are trying to achieve some other more ambitious goals, like fixed guarantees that actual file content is in the network always??? In any case: With regards to the idea for virtual nodes, keep in mind that each virtual node needs to have a valid routing table. If you have 1000 files, this already means you are keeping track of 20*1000*log n contacts. Do you really want this? Petar On Fri, 25 Oct 2002, Gordon Mohr wrote: > Aaron writes: > > It seems better to place nodes nearby the files. But I > > suspect this may introduce issues that would undermine some of > > Kademlia's guarantees. > > I have a hunch things would still work OK if: > > (1) every participating machine creates one "virtual node" > centered on each file it can locally share. That is, > choose to be an expert on those files you happen to > have, and those 'near' what you already have. > > (2) there is a quick/easy way to give nodes "pop quizzes" > on their claimed offerings, so that bogus nodes drop > out, just like any other "nonresponsive" node in > Kademlia > > Using a tree hash as the main content-identifier could work > well for (2); if someone claims to have the full-file identified > by a particular tree hash root value, you can pick any random tiny > range of that file, ask them for that range and the proof that range > fits into the whole, and assume that if they reliably answer these > random probes successfully, they're more-or-less an honest > provider of that content. If they fail for any reason, drop them > like any other unresponsive node. > > - Gojomo > ____________________ > Gordon Mohr > bitzi.com> Bitzi CTO . . . describe and discover files of every kind. > _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! > > > > > From gojomo at usa.net Fri Oct 25 18:31:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:03 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading References: Message-ID: <00f301c27c8f$4681bb70$640a000a@golden> Petar writes: > With regards to the idea for virtual nodes, keep in mind that > each virtual node needs to have a valid routing table.
If you have > 1000 files, this already means you are keeping track of > 20*1000*log n contacts. Do you really want this? Why not? It's only RAM. And you could probably create a data structure which essentially shares contacts between virtual nodes' contact-buckets. So in the 1000 virtual-node case, you have 1000 very-distinct routing tables for the regions very close to the node-IDs, but for the more distant k-buckets, most/all of the contacts are shared amongst multiple virtual nodes. - Gojomo ----- Original Message ----- From: "Petar Maymounkov" To: "Gordon Mohr" Cc: ; ; Sent: Friday, October 25, 2002 5:25 PM Subject: Re: [p2p-hackers] kandemlia and multi-source downloading > Hi All, > > You are discussing the search-related aspects of multiple-source > downloads (BTW I was working on the bandwidth and availability > aspect of multi-source downloads). Here are my thoughts on your questions: > > I am not sure what is your exact setting, b/c you are talking about > storing files on "nearby" nodes?? which hints to me that you are trying > to achieve some other more ambitious goals, like fixed cguarantees that > actual file content is in the network always??? In any case: > > With regards to the idea for virtual nodes, keep in mind that > each virtual node needs to have a valid routing table. If you have > 1000 files, this already means you are keeping track of > 20*1000*log n contacts. Do you really want this? > > Petar > > > On Fri, 25 Oct 2002, Gordon Mohr wrote: > > > Aaron writes: > > > It seems better to place nodes nearby the files. But I > > > suspect this may introduce issues that would undermine some of > > > Kademlia's guarantees. > > > > I have a hunch things would still work OK if: > > > > (1) every participating machine creates one "virtual node" > > centered on each file it can locally share. That is, > > choose to be an expert on those files you happen to > > have, and those 'near' what you already have. > > > > (2) there is a quick/easy way to give nodes "pop quizzes" > > on their claimed offerings, so that bogus nodes drop > > out, just like any other "nonresponsive" node in > > Kademlia > > > > Using a tree hash as the main content-identifier could work > > well for (2); if someone claims to have the full-file identified > > by a particular tree hash root value, you can pick any random tiny > > range of that file, ask them for that range and the proof that range > > fits into the whole, and assume that if they reliably answer these > > random probes successfully, they're more-or-less an honest > > provider of that content. If they fail for any reason, drop them > > like any other unresponsive ndoe. > > > > - Gojomo > > ____________________ > > Gordon Mohr > bitzi.com> Bitzi CTO . . . describe and discover files of every kind. > > _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! > > > > > > > > > > > > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers From petar at scs.cs.nyu.edu Sat Oct 26 07:22:01 2002 From: petar at scs.cs.nyu.edu (Petar Maymounkov) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading In-Reply-To: <00f301c27c8f$4681bb70$640a000a@golden> Message-ID: On Fri, 25 Oct 2002, Gordon Mohr wrote: > Petar writes: > > With regards to the idea for virtual nodes, keep in mind that > > each virtual node needs to have a valid routing table. 
If you have > > 1000 files, this already means you are keeping track of > > 20*1000*log n contacts. Do you really want this? > > Why not? It's only RAM. And you could probably create a > data structure which essentially shares contacts between > virtual nodes' contact-buckets. You certainly need to have a common data structure, and still it seems to be a lot, not in terms of memory but in terms of the bandwidth that it takes to keep it up. However, as I say, "it seems", because we don't know for sure yet, since there is no really working implementation of Kademlia. Petar > > So in the 1000 virtual-node case, you have 1000 very-distinct > routing tables for the regions very close to the node-IDs, > but for the more distant k-buckets, most/all of the contacts > are shared amongst multiple virtual nodes. > > - Gojomo > > ----- Original Message ----- > From: "Petar Maymounkov" > To: "Gordon Mohr" > Cc: ; ; > Sent: Friday, October 25, 2002 5:25 PM > Subject: Re: [p2p-hackers] kandemlia and multi-source downloading > > > > Hi All, > > > > You are discussing the search-related aspects of multiple-source > > downloads (BTW I was working on the bandwidth and availability > > aspect of multi-source downloads). Here are my thoughts on your questions: > > > > I am not sure what your exact setting is, b/c you are talking about > > storing files on "nearby" nodes?? which hints to me that you are trying > > to achieve some other more ambitious goals, like fixed guarantees that > > actual file content is in the network always??? In any case: > > > > With regards to the idea for virtual nodes, keep in mind that > > each virtual node needs to have a valid routing table. If you have > > 1000 files, this already means you are keeping track of > > 20*1000*log n contacts. Do you really want this? > > > > Petar > > > > > > On Fri, 25 Oct 2002, Gordon Mohr wrote: > > > > > Aaron writes: > > > > It seems better to place nodes nearby the files. But I > > > > suspect this may introduce issues that would undermine some of > > > > Kademlia's guarantees. > > > > > > I have a hunch things would still work OK if: > > > > > > (1) every participating machine creates one "virtual node" > > > centered on each file it can locally share. That is, > > > choose to be an expert on those files you happen to > > > have, and those 'near' what you already have. > > > > > > (2) there is a quick/easy way to give nodes "pop quizzes" > > > on their claimed offerings, so that bogus nodes drop > > > out, just like any other "nonresponsive" node in > > > Kademlia > > > > > > Using a tree hash as the main content-identifier could work > > > well for (2); if someone claims to have the full-file identified > > > by a particular tree hash root value, you can pick any random tiny > > > range of that file, ask them for that range and the proof that range > > > fits into the whole, and assume that if they reliably answer these > > > random probes successfully, they're more-or-less an honest > > > provider of that content. If they fail for any reason, drop them > > > like any other unresponsive node. > > > > > > - Gojomo > > > ____________________ > > > Gordon Mohr > > bitzi.com> Bitzi CTO . . . describe and discover files of every kind. > > > _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it!
> > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > p2p-hackers mailing list > > p2p-hackers@zgp.org > > http://zgp.org/mailman/listinfo/p2p-hackers > > From justin at chapweske.com Sat Oct 26 10:59:02 2002 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading References: <612CA43B-E83B-11D6-A567-003065F376B6@aaronsw.com> <007701c27c6b$1f8a56e0$640a000a@golden> <3DB9C7D9.3000108@chapweske.com> <009501c27c7e$37e14970$640a000a@golden> Message-ID: <3DBAD7BF.9060101@chapweske.com> > > Some DHT-based systems attempt to offset this threat by limiting > the ability of nodes to choose their own neighborhoods -- for example, > by making neighborhood-of-responsibility some deterministic function > of IP address, which cannot be arbitrarily changed cheaply. > > The solution I've proposed instead says, sure, pick your own > neighborhoods to match exactly the content you want to be responsible > for. But in order for other people to weave you in, you have to pass > some tests proving your reliabilty. If you pass these tests, you get > wound deeper into the Kademlia mesh for your chosen neighborhood; if > you fail, you start dropping out. > You may wish to check out Wei Dai's recent postings to Blue Sky about a hash-cash-ish approach to node id's. Btw, with the weak traffic flow on p2p-hackers and blue sky, would it perhaps make sense to consolidate the lists? -Justin From zooko at zooko.com Sat Oct 26 11:26:02 2002 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] merging bluesky and p2p-hackers In-Reply-To: Message from Justin Chapweske of "Sat, 26 Oct 2002 12:58:23 CDT." <3DBAD7BF.9060101@chapweske.com> References: <612CA43B-E83B-11D6-A567-003065F376B6@aaronsw.com> <007701c27c6b$1f8a56e0$640a000a@golden> <3DB9C7D9.3000108@chapweske.com> <009501c27c7e$37e14970$640a000a@golden> <3DBAD7BF.9060101@chapweske.com> Message-ID: Justin Chapweske wrote: > > Btw, with the weak traffic flow on p2p-hackers and blue sky, would it > perhaps make sense to consolidate the lists? It's fine by me. Historically the reason for the existence of two separate lists is that the bluesky list was originally intended to be exclusive. So exclusive, in fact, that Bram Cohen and I weren't aware of it when we started p2p-hackers. (Unless I misremember, and we were aware of it, but considered ourselves and the other p2p-hackers to be uninvited.) Since then I think the bluesky list admins have changed their minds and sought general distribution. There are currently 331 subscribers to p2p-hackers. Regards, Zooko From lgonze at panix.com Sat Oct 26 23:01:02 2002 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading In-Reply-To: <00f301c27c8f$4681bb70$640a000a@golden> Message-ID: > Petar writes: > > With regards to the idea for virtual nodes, keep in mind that > > each virtual node needs to have a valid routing table. If you have > > 1000 files, this already means you are keeping track of > > 20*1000*log n contacts. Do you really want this? > > Why not? It's only RAM. And you could probably create a > data structure which essentially shares contacts between > virtual nodes' contact-buckets. 
> > So in the 1000 virtual-node case, you have 1000 very-distinct > routing tables for the regions very close to the node-IDs, > but for the more distant k-buckets, most/all of the contacts > are shared amongst multiple virtual nodes. > > - Gojomo It's not so easy to maintain an accurate routing table if the space isn't evenly subdivided. Everything about existing designs for DHTs is based on the assumption that distance in ID-space is the same as distance in node-space. That's the point of consistent hashing: to use local operations on IDs to predict network topology. If you have an entry in your finger table for the node at the opposite side of ID space, but there are more actual nodes on one side of the target than the other, then jumping from your current position to the position of that opposite node doesn't actually traverse 1/2 the nodes. The guarantee of log(n) routing is out the window. Maybe it's a better design to subdivide ID space unevenly, according to files or nodes rather than the number of identifiers. I don't mean that facetiously, what I mean is that it's a whole new project. - Lucas From gojomo at usa.net Sun Oct 27 16:46:01 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading References: Message-ID: <007301c27e1b$43fdf9d0$640a000a@golden> Lucas Gonze writes: > It's not so easy to maintain an accurate routing table if the space isn't > evenly subdivided. Everything about existing designs for DHTs is based on > the assumption that distance in ID-space is the same as distance in > node-space. That's the point of consistent hashing: to use local > operations on IDs to predict network topology. > > If you have an entry in your finger table for the node at the opposite > side of ID space, but there are more actual nodes on one side of the > target than the other, then jumping from your current position to the > position of that opposite node doesn't actually traverse 1/2 the nodes. > The guarantee of log(n) routing is out the window. Actually, this is an area where Kademlia's XOR distance metric and area "buckets" may outperform Chord's arithmetic difference distance metric and exact "fingers". In Kademlia, you just want to know about 'k' long-lived nodes within each distant target region. Provable behavior doesn't depend on those nodes being closest to some exact desired ID-space location. So in fact there are many equally-sufficient routing tables possible, from any particular location. -- Coming back to a concrete example: Say nodes and resources have 160-bit IDs. The size of each "bucket", k, is set to 20. Then each virtual node maintains 160 buckets, each bucket with 20 nodes in the matching range: 3,200 unique routing entries in all. Now extend to the case where you want to run 1,000 virtual nodes at chosen spots. A naive implementation might require 1,000x3,200 -- 3.2 million -- routing entries. But in fact, any one known remote node is useful -- albeit in different buckets -- for every virtual node. Even keeping the "classic" Kademlia node-replacement heuristics, you could probably keep all 160,000 buckets full by reusing a number of unique routing entries that is only few multiples of the single-node 3,200 case. 
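To make that sharing concrete, here is a rough Python sketch -- not Khashmir's actual API, just invented names, with the bucket-eviction policy left out -- of one shared contact pool backing the per-ID k-buckets of many virtual nodes:

    from collections import defaultdict

    K = 20          # bucket size, matching the k=20 example above
    ID_BITS = 160   # 160-bit IDs give at most 160 buckets per virtual node

    def bucket_index(own_id, contact_id):
        # Bucket number = position of the highest differing bit of the XOR distance.
        return (own_id ^ contact_id).bit_length() - 1

    class SharedRoutingState:
        def __init__(self, virtual_node_ids):
            # One contact record per known peer, shared by every virtual node.
            self.contacts = {}
            # Per-virtual-node buckets hold only IDs pointing into that shared pool.
            self.buckets = {vid: defaultdict(list) for vid in virtual_node_ids}

        def add_contact(self, contact_id, addr):
            self.contacts[contact_id] = addr
            for vid, table in self.buckets.items():
                if contact_id == vid:
                    continue
                bucket = table[bucket_index(vid, contact_id)]
                # Classic Kademlia would ping the oldest entry before discarding a
                # newcomer; this sketch just ignores newcomers when a bucket is full.
                if contact_id not in bucket and len(bucket) < K:
                    bucket.append(contact_id)

        def unique_entries(self):
            return len(self.contacts)

With 1,000 virtual-node IDs, unique_entries() grows with the number of distinct peers actually seen rather than with 1,000 x 3,200; the distant buckets of different virtual nodes end up holding largely the same contacts, stored once.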
As the network gets large enough, or with a loosening of the rules for which nodes to keep (albeit still within the provable-behavior parameters), it may even be possible to keep all 160,000 routing buckets full with a number of individual entries which closely approaches 3,200. So the marginal memory cost of adding virtual nodes could be almost nil. -- Similarly, adding virtual nodes need not mean more message traffic: Kademlia nodes can "dial-down" their inbound traffic simply by choosing to be intermittently unresponsive -- and thus dropping out of some, but not all, remote buckets. So given an ability to handle a fixed 'm' flow of messages, you can be 1 node or 1000, you just need to give the right amount of flow-control pushback in either case. -- Coming back to Aaron's speculation which triggered my comments: "It seems better to place nodes nearby the files. But I suspect this may introduce issues that would undermine some of Kademlia's guarantees." I think a Kademlia-based network, where node-IDs are purposefully chosen to exactly match locally-available resources, could work really well. Kademlia is virtual-node friendly, and by choosing a resource identifier which allows immediate, cheap probabilistic verifications that a node is providing what it claims to provide -- namely, a tree hash -- you can protect against censors squatting on the node-IDs of material they want to suppress. Only by actually providing the named content, whenever asked, can you set up shop at that node-ID. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From hal at finney.org Mon Oct 28 09:00:02 2002 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading Message-ID: <200210280942.g9S9gdd01973@finney.org> Gordon Mohr writes: > Kademlia is virtual-node friendly, and by choosing a resource > identifier which allows immediate, cheap probabilistic verifications > that a node is providing what it claims to provide -- namely, > a tree hash -- you can protect against censors squatting on the > node-IDs of material they want to suppress. Only by actually > providing the named content, whenever asked, can you set up shop > at that node-ID. I'm not familiar with the details of Kademlia, but is there a problem that these "test probe" queries would generally be smaller than the chunks used for actual message fetching? With large content in the megabytes or gigabytes, an actual message fetch chunk might well be pretty big. (I don't know how many chunks a typical file would be broken into.) Whereas to avoid swamping the network with test probes, those might tend to be relatively small. This could allow a censor node to respond to test probes, but to stop providing data when the query size became large. Or, if the chunk size is the same, it might still be able to tell the difference because test probes would be rare and haphazard, whereas actual fetches would be more intensive and structured. Does the Kademlia architecture prevent censor nodes from distinguishing test probes from actual fetches? 
Hal Finney From gojomo at usa.net Mon Oct 28 11:18:02 2002 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:12:04 2006 Subject: [p2p-hackers] kandemlia and multi-source downloading References: <200210280942.g9S9gdd01973@finney.org> Message-ID: <005d01c27eb6$9f0be620$640a000a@golden> Hal Finney writes: > Gordon Mohr writes: > > Kademlia is virtual-node friendly, and by choosing a resource > > identifier which allows immediate, cheap probabilistic verifications > > that a node is providing what it claims to provide -- namely, > > a tree hash -- you can protect against censors squatting on the > > node-IDs of material they want to suppress. Only by actually > > providing the named content, whenever asked, can you set up shop > > at that node-ID. > > I'm not familiar with the details of Kademlia, but is there a problem > that these "test probe" queries would generally be smaller than the > chunks used for actual message fetching? To be clear: these "test probes", as well as the notion that nodes choose their own IDs to exactly match content they want to make available, are both speculative ideas for extending a Kademlia-like network. They're not part of the original Kademlia proposals, which can be viewed at: http://kademlia.scs.cs.nyu.edu/ -- That said: yes, I think that you've identified a real threat, that censor nodes would be reliable up to a point, fooling peers enough to become woven-in at their target region, but choose to be unreliable beyond a certain point. (Another variant of this kind of threat: the censor node responds to probes over a specific 80% of the file properly, but for 20% either doesn't respond or responds with junk.) I think these threats are manageable via a number of techniques. One start, which you mention, would be to use the same chunk sizes, or randomize the chunk sizes, for both true-fetches and test-probes, so that censor nodes cannot be certain of the difference between tests and fetches. Ideally, though, probes and fetches would be truly interchangeable, where each provides equal benefit to the rest of the network, so there's no way to make oneself look more reliable than you really are. (Every time you behave well, you become more deeply linked into the network, every time you behave badly, you begin to be cut out.) One way to achieve this, in the context of a Kademlia mesh, would be for nodes to also proxy/cache the content of the nodes they've successfully "probed". Then, when a "censor node" has managed to prove itself for some set of ranges to peers, those peers can provide those ranges themselves. When they refer new seekers to the "censor", they also offer up the ranges they already have. If the seeker encounters a problem from the "censor" (or perhaps even with nonzero probability on all fetches), they ask for the ranges *through* the referring node (who had "vouched" for the source). That referrer then tries the "censor", who then either (1) provides the material, helping it be replicated and delivered to the seeker; or (2) flakes out, ruining its reputation with the peer it had previously fooled into sending people its way. In any case, you only get to squat on a certain ID if you reliably provide ranges of any reasonable size of content at (near) that ID. As soon as you stop delivering the content on request, you're progressively evicted from the mesh. Meanwhile, any sites honestly and reliably providing that content will keep being promoted throughout peers' routing tables. - Gojomo ____________________ Gordon Mohr Bitzi CTO . . . 
describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it!
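For concreteness, here is a rough Python sketch of the random-range "pop quiz" discussed in this thread: the seeker picks a chunk at random, asks the peer for that chunk plus the sibling hashes up to the advertised tree-hash root, and verifies the proof locally, issuing the probe in the same shape as an ordinary range fetch so a squatting censor cannot easily tell tests from real downloads. The tree layout here is a simplified binary Merkle tree over SHA1, not the exact THEX/TigerTree format, and get_range_with_proof() is a hypothetical peer call, not any existing client's API:

    import hashlib
    import random

    def h(data):
        return hashlib.sha1(data).digest()

    def verify_proof(root, chunk_index, chunk_data, siblings):
        # siblings[i] is the sibling hash at tree level i, leaf level first,
        # assuming a simple power-of-two binary Merkle tree over the chunks.
        node = h(chunk_data)
        idx = chunk_index
        for sibling in siblings:
            if idx % 2 == 0:          # our node is a left child
                node = h(node + sibling)
            else:                     # our node is a right child
                node = h(sibling + node)
            idx //= 2
        return node == root

    def spot_check(peer, root, num_chunks):
        # Request a range the same size as an ordinary fetch chunk, so probes
        # are not distinguishable from real downloads by size alone.
        i = random.randrange(num_chunks)
        chunk_data, siblings = peer.get_range_with_proof(i)   # hypothetical peer call
        return verify_proof(root, i, chunk_data, siblings)

A censor squatting on a content ID either serves the real bytes whenever spot_check() succeeds, which defeats the censorship, or fails the proof and is progressively evicted from the mesh like any other unresponsive node.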