From blanu at uts.cc.utexas.edu Fri Jun 1 00:10:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" In-Reply-To: <3B173A7F.970737BA@neurogrid.com> Message-ID: > I guess I'm talking about reputation systems in general, or trust in general rather than > how it applies to Freenet. I'm talking about a reputation system for Freenet, in response to the question, "Can you imagine a reputation system for Freenet?" Also, the general trust system you're talking about does not apply to Freenet in a meaningful way. > I guess as far as Freenet goes, your concern was that if we punish nodes for serving up > data that doesn't match the hash that you requested, then you are potentially punishing > the innocent nodes that are just forwarding this data. How about having the intermediate > nodes check that the file they are sending on actually matches the original request? My concern is that punishing nodes based on the user's perception of the goodness of a file, as opposed to whether the node returned the requested file, will destroy the routing of the network, as the network is organized to get a given key efficiently, not to get a given psychological result efficiently. So if you take a system which is optimized to produce from a given hash a file that has that hash and you penalize nodes based on whether the produced file is a picture of a kitten, you will end up with a network which is not very good at finding files or pictures of kittens. > My understanding of Freenet is limited, but to sum up what I have worked out, it seems > that you search freenet for specific documents via their hashes, and inefficiencies arise > when nodes return bad data. If you can work out which nodes are returning bad data then > you can avoid asking them and return good data more efficiently, which is what you might > want a trust system for. 
I want a trust system for keeping my node from talking to spies and keeping my node from talking to nodes which will attempt to slip me bogus results when I asked for a file which hasn't been somehow cryptographically secured (by putting its hash or signature in the key). > Personally I'd like to be searching via keywords and learning which other users in the > system have similar data labelling approaches as me, so that I can get in touch with them > to find out new stuff, and I think that this framework requires reputation management, but > this is all beyond Freenet's scope because it searches via hashes, right? I'm not going > to be able to say I want some information on XYZ, what've you got? Or if I am then I am > going to have to query a Freenet key index, not Freenet itself, right? This is a layer on top of the base Freenet architecture. It can all be done totally over Freenet. However, the basic architecture just gets a file given the file's key. A key can be arbitrarily assigned, based on the file's hash, or based on a signature from the publisher. Keyword searching is implemented entirely on top of this system. How this is done is somewhat complicated and tangential and the subject of my talk at the P2P conference in September. But the point is that nodes know about files with attached keys. But when you search via keywords or various other kinds of metadata such as recommendations, published rankings of files, etc., you're using a system in a different layer, which knows nothing about nodes, only about publishers and "sites" (groups of documents). Also, the node layer knows nothing about publishers and sites, but only about keys and files. So you can't meaningfully mix penalizations between publishers and nodes. No node tells you that the keyword "kitten" is matched by the file "1234". A site tells you that. But you don't fetch the file "1234" from a site, you fetch it from a node. It's kind of like the difference between a website and a hard drive. 
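The self-certifying property of hash-based keys that this exchange keeps returning to can be sketched in a few lines. This is a minimal illustration, not Freenet's actual key format: SHA-1 is chosen purely for the example, and the function names are invented.

```python
import hashlib

def content_key(data: bytes) -> str:
    # Derive a content-hash key from the file bytes (SHA-1 here purely
    # for illustration; the real Freenet key scheme is not shown).
    return hashlib.sha1(data).hexdigest()

def verify_reply(requested_key: str, data: bytes) -> bool:
    # A hash-keyed request is self-certifying: recompute the hash of
    # whatever came back and compare it to the key we asked for.
    return content_key(data) == requested_key

kitten = b"bytes of a kitten picture"
key = content_key(kitten)
assert verify_reply(key, kitten)            # the requested file verifies
assert not verify_reply(key, b"tampered")   # any substitution is caught
```

Note that this check says nothing about whether the content is what the publisher claimed it was, which is exactly the publisher-vs-network trust split discussed later in the thread.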
From sam at neurogrid.com Fri Jun 1 00:48:01 2001 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" References: Message-ID: <3B174836.29F633B9@neurogrid.com> Brandon wrote: > > I guess I'm talking about reputation systems in general, or trust in general rather than > > how it applies to Freenet. > > I'm talking about a reputation system for Freenet, in response to the > question, "Can you imagine a reputation system for Freenet?" Also, the > general trust system you're talking about does not apply to Freenet in a > meaningful way. Yes, sorry for muddying the water. > > I guess as far as Freenet goes, your concern was that if we punish nodes for serving up > > data that doesn't match the hash that you requested, then you are potentially punishing > > the innocent nodes that are just forwarding this data. How about having the intermediate > > nodes check that the file they are sending on actually matches the original request? > > My concern is that punishing nodes based on the user's perception of the > goodness of a file as opposed to whether the node returned the requested > file will destroy the routing of the network as the network is organized > to get a given key efficiently, not to get a given psychological result > efficiently. So if you take a system which is optimized to produce from a > given hash a file that has that hash and you penalize nodes based on > whether the produced file is a picture of a kitten, you will end up with a > network which is not very good at finding files or pictures of kittens. Yeah, I got that. I'm not talking about assessing whether the system is returning a good picture of a set of kittens now. I'm talking about having nodes return a document that matches the hash that you used to request the document in the first place. 
My understanding of what you called bad data is when I ask for data via the hash 1234 and I get something back that doesn't match the hash 1234. And I was then asking if you couldn't have all of the nodes in the chain of nodes passing the file back checking if the hash of the file matched the hash of the request. Sure it would take time, but the bad data would be caught before it got passed all the way across the network, and you could preferentially ask for data from nodes that had a reputation for returning the correct responses (in terms of the hash), and help others not make the same mistakes by making that trust information available. I have taken your point about the kittens, and I am now talking just about making sure files match the hashes that we are using to search for them. > > My understanding of Freenet is limited, but to sum up what I have worked out, it seems > > that you search freenet for specific documents via their hashes, and inefficiencies arise > > when nodes return bad data. If you can work out which nodes are returning bad data then > > you can avoid asking them and return good data more efficiently, which is what you might > > want a trust system for. > > I want a trust system for keeping my node from talking to spies Can you define a spy? > and > keeping my node from talking to nodes which will attempt to slip me bogus > results when I asked for a file which hasn't been somehow > cryptographically secured (by putting its hash or signature in the key). I can read the above in two ways. Do you mean that any time anybody sends me an insecure file, that this is a bogus result? That we want to trust nodes that give us bogus results less? If so then the probability metrics I was talking about could be used, right? Like if you gave me 1 bogus file out of 100 I requested, I'd trust you 99%. Would we have to demand that all nodes checked that they weren't forwarding bogus results in order to make sure this was fair? Could this even be done? 
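The proportional metric described here ("1 bogus file out of 100, I'd trust you 99%") can be sketched as a simple per-node counter. This is only an illustration of the idea; the class and method names are invented, and the neutral prior for unseen nodes is an added assumption.

```python
class NodeTrust:
    """Per-node record of how often replies matched the requested hash.
    A sketch of the proportional metric discussed above; all names here
    are invented for illustration."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def record(self, hash_matched: bool) -> None:
        self.total += 1
        if hash_matched:
            self.correct += 1

    def trust(self) -> float:
        # With no history yet, stay neutral rather than presume either
        # honesty or guilt (this prior is an assumption, not from the thread).
        return 0.5 if self.total == 0 else self.correct / self.total

t = NodeTrust()
for _ in range(99):
    t.record(True)
t.record(False)      # "1 bogus file out of 100 I requested"
print(t.trust())     # 0.99 -- "I'd trust you 99%"
```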
> > Personally I'd like to be searching via keywords and learning which other users in the > > system have similar data labelling approaches as me, so that I can get in touch with them > > to find out new stuff, and I think that this framework requires reputation management, but > > this is all beyond Freenet's scope because it searches via hashes, right? I'm not going > > to be able to say I want some information on XYZ, what've you got? Or if I am then I am > > going to have to query a Freenet key index, not Freenet itself, right? > > This is a layer on top of the base Freenet architecture. It can all be > done totally over Freenet. However, the basic architecture just gets a > file given the file's key. A key can be arbitrarily assigned, based on the > file's hash, or based on a signature from the publisher. Keyword searching > is implemented entirely on top of this system. How this is done is > somewhat complicated and tangential and the subject of my talk at the P2P > conference in September. But the point is that nodes know about files with > attached keys. But when you search via keywords or various other kinds of > metadata such as recommendations, published rankings of files, etc., > you're using a system in a different layer, which knows nothing about > nodes, only about publishers and "sites" (groups of documents). Also, the > node layer knows nothing about publishers and sites, but only about keys > and files. So you can't meaningfully mix penalizations between publishers > and nodes. I look forward to hearing your talk. 
I suspect I will ask a question like "couldn't we gain more efficient file transfer by building the content awareness in at the network level", but let us leave the ensuing debate till September ;-) CHEERS> SAM From oskar at freenetproject.org Fri Jun 1 02:39:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence In-Reply-To: ; from lucas@gonze.com on Thu, May 31, 2001 at 04:45:14PM -0400 References: Message-ID: <20010531235714.R1075@hobbex.localdomain> On Thu, May 31, 2001 at 04:45:14PM -0400, Lucas Gonze wrote: > A different question: can you conceive of a reputation system for freenet? The > reason this interests me is that it pushes the limits of anonymous pseudonyms. There are at least two levels of reputation systems conceivable on this type of network. The first is a reputation system between nodes, where nodes that behave well gain reputation with their peers; the second is a semantic reputation system of people vouching for the validity of data (particularly metadata). A reputation system between nodes doesn't reflect anything on the users really, so it is not an anonymity issue. The issue with that falls squarely inside the limited connectivity problem - making the routing work when each node can only be connected to a limited number of others (which is useful for many other things, physical lack of connectivity being the most obvious). Freenet routing has not tackled the problem of limited connectivity, because, honestly, we still need to make it work better in a full connectivity situation. The hypercube-like routing systems (Plaxton, Tapestry, Pastry, Chord etc) don't deal with limited connectivity either though, in fact I know of nothing that does (except IP routing of course). 
The second sort of system does have the issues of revealing identity that you describe, though I would say you are describing a subset of a larger problem within pseudonymity - that the more information somebody generates the easier it is to match different nyms, and the only solution to that is to avoid producing any information bound to your meatspace self (the Unabomber problem, I guess). But it is not a freenet issue since such a system could be implemented completely in metadata. > Lets say you have a public key that is not explicitly associated with your > meatspace identity. The more reputation data gathered on that key, the more > stuff that can point to your meatspace identity. EG, you use a key for a few > years. At some point you buy a plane ticket and have it mailed to your house. > At that point all the other data can be correlated back to your address. A few > years later you buy a book using that key. The cops come to that address and > find that book in your shelf. > > What this points to is a need to either sacrifice anonymity or churn identities, > and churning puts an upward limit on the quality of reputation data. > > So a highly anonymous design like Freenet's might be able to use reputation > attached to persistent pseudonyms, but would have to churn the pseudonyms fairly > often. > > - Lucas > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. 
Oskar Sandberg oskar@freenetproject.org From oskar at freenetproject.org Fri Jun 1 02:39:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence In-Reply-To: <20010531173727.I27652@belegost.mit.edu>; from arma@mit.edu on Thu, May 31, 2001 at 05:37:27PM -0400 References: <20010531173727.I27652@belegost.mit.edu> Message-ID: <20010601111327.B658@hobbex.localdomain> On Thu, May 31, 2001 at 05:37:27PM -0400, Roger Dingledine wrote: > On Thu, May 31, 2001 at 04:45:14PM -0400, Lucas Gonze wrote: > > A different question: can you conceive of a reputation system for freenet? The > > reason this interests me is that it pushes the limits of anonymous pseudonyms. > > For better phrasing and terminology (anonymous pseudonyms is a messy > phrase), take a look at > http://www.cert.org/IHW2001/terminology_proposal.pdf > > It should help you get a handle on the different types of anonymity, > pseudonymity, nymity, etc. Note that it's still a document in draft form, > and it's still changing. Ah, that should be very helpful. > Anyway, there's a whole lot to be covered here. Since pseudonyms and > reputations (and indeed anonymity) are a tricky thing to analyze in a > dynamic and distributed environment, I would recommend starting with > a simpler model than Freenet -- Freenet's haphazard design makes it > extremely difficult to prove or analyze any complex properties. Your somewhat predictable endless jabs notwithstanding, I would have to agree that there is no reason to bring Freenet into an analysis of high level personal reputation systems. For our contexts, any balloon-and-honey-pot model (http://www.machaon.ru/pooh/chap6.html) should be enough of a base for observing the linkability of pseudonyms placed on data. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. 
Oskar Sandberg oskar@freenetproject.org From lucas at gonze.com Fri Jun 1 09:01:15 2001 From: lucas at gonze.com (Lucas Gonze) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence Message-ID: per Roger: > > For better phrasing and terminology (anonymous pseudonyms is a messy > > phrase), take a look at > > http://www.cert.org/IHW2001/terminology_proposal.pdf Definitely useful. Makes me realize that the phrase 'anonymous pseudonyms' is about as articulate as 'fooey whatchamacallit'. Still, they agree with my original point that pseudonym churn is inevitable when anonymity matters. So this brings me to a thought about an attack on Freenet. It's a reputation attack based on the idea of behavioral signatures. First, I'm going to find a way to identify nodes without having explicit pseudonyms. An attacker modifies their node so that anytime it gets a piece of behavioral data on a node it publishes that data. Modem speed, chunks found, time of lookup, IP address, anything it can find. At the same time, whenever a compromised node connects to a new one it looks up published recordings of behavioral data that match the observed characteristics. The more data you get the more likely it is that nodes can be identified. In other words any persistent behaviors can be enough to establish linkability. Second, the attacker records all data passed to the observed node and publishes these recordings. If that node's behavioral signature is long-lived enough then there is effectively a long-lived pseudonym. As noted in that terminology paper, the longer an identity persists the more data there is for an intersection attack. So if there do exist behavioral signatures that can stand in for pseudonyms then intersection attacks are possible. The only way to defeat this is to churn _behaviors_. Obviously this depends on the idea of behavioral signatures. 
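The intersection step of this attack can be illustrated with a toy sketch: each observation of the target is a set of behavioral attributes, and the surviving candidates are the published records consistent with every observation. All attribute names, nyms, and values below are invented purely for illustration.

```python
# Published behavioral records, keyed by pseudonym (all invented).
published = {
    "nym_A": {"modem_speed": "56k", "lookup": "fast", "chunks_found": "many"},
    "nym_B": {"modem_speed": "33k", "lookup": "slow", "chunks_found": "few"},
    "nym_C": {"modem_speed": "56k", "lookup": "slow", "chunks_found": "many"},
}

# Successive observations of one node; each one narrows the field.
observations = [
    {"modem_speed": "56k"},                      # first sighting: two candidates
    {"lookup": "fast", "chunks_found": "many"},  # second sighting: just one
]

def consistent(record, obs):
    # A record survives if it agrees with every attribute we observed.
    return all(record.get(k) == v for k, v in obs.items())

candidates = {nym for nym, rec in published.items()
              if all(consistent(rec, obs) for obs in observations)}
print(candidates)  # {'nym_A'} -- more observations, fewer candidates
```

The sketch shows why churning behaviors, not just names, is the only defense: the linkage never uses an explicit pseudonym at all.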
I would say that there needs to be a lot of data and a lot of CPU for finding these, but that there does exist some quantity of data and CPU that could accomplish it. So the question to answer is whether that amount is within any feasible budget. - Lucas From hal at finney.org Fri Jun 1 10:43:01 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence Message-ID: <200106011735.KAA01997@finney.org> Lucas Gonze writes: > So this brings me to a thought about an attack on Freenet. It's a reputation > attack based on the idea of behavioral signatures. I think you need to clarify what aspect you are attacking. In other words, you need to say, Freenet (or whatever network) claims to achieve security property X, and here is an attack on X. The property you seem to be attacking is that a node can prevent anyone from knowing that they are connecting to the same node over time. As far as I know this is not a property sought or claimed by Freenet. Hal From blanu at uts.cc.utexas.edu Fri Jun 1 13:52:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence In-Reply-To: Message-ID: > If that node's behavioral signature is long-lived enough then there is > effectively a long-lived pseudonym. As noted in that terminology paper, the > longer an identity persists the more data there is for an intersection attack. > So if there do exist behavioral signatures that can stand in for pseudonyms then > intersection attacks are possible. The only way to defeat this is to churn > _behaviors_. I don't see the point of this attack. In 0.4 all nodes will be identified by a persistent public key anyway. If nodes churned their public keys, this attack would allow you to discover the probable identity of a node across multiple public keys. 
However, nodes won't churn public keys, I don't think, as I don't see any reason for a node to change its identity. In fact, it would be better for routing if nodes always kept the same identity. From Verbatim3D at aol.com Fri Jun 1 23:23:02 2001 From: Verbatim3D at aol.com (Verbatim3D@aol.com) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence Message-ID: True but all content that is shown on the net might as well be called illegal if the " GOVERNMENT : Wants to get Rid of all NAPSTERS ! Wake up call to the US There is no stoping us !! ===== - Verb From oskar at freenetproject.org Sat Jun 2 03:11:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: ; from Verbatim3D@aol.com on Sat, Jun 02, 2001 at 02:22:53AM -0400 References: Message-ID: <20010602121228.A639@hobbex.localdomain> Go away. On Sat, Jun 02, 2001 at 02:22:53AM -0400, Verbatim3D@aol.com wrote: > True but all content that is shown on the net might as well be called illegal if the " GOVERNMENT : Wants to get Rid of all NAPSTERS ! Wake up call to the US There is no stoping us !! > > > ===== - Verb > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From lucas at gonze.com Sat Jun 2 10:57:01 2001 From: lucas at gonze.com (Lucas Gonze) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokensforpersistence In-Reply-To: Message-ID: > In 0.4 all nodes will be identified > by a persistent public key anyway. Ah. It will be moot. Questions: what are the public keys used for? How much activity can be linked to a node via the key? 
If malicious nodes shared their knowledge of actions taken by a node, using the public key to coordinate, how much compromising data would they have to share? From bram at gawth.com Sat Jun 2 11:23:01 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: <20010602121228.A639@hobbex.localdomain> Message-ID: On Sat, 2 Jun 2001, Oskar Sandberg wrote: > Go away. > Oskar, you're an asshole. While normally I'd view this as your problem, your constant unwelcoming attitude in anything you can post to does serious damage. I wouldn't accept your help with any project I'm involved in regardless of your technical skill because I know that long term (and probably even short term) your presence would do more harm than good. Your actually getting paid to work on Freenet is a testament to its lack of cultural leadership. You might want to change your attitude, for the sake of your own career. I for one actively forewarn people about you, and frankly, your childish inability to admit you could ever be wrong leads to poor design decisions. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From oskar at freenetproject.org Sat Jun 2 12:33:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: ; from bram@gawth.com on Sat, Jun 02, 2001 at 11:23:20AM -0700 References: <20010602121228.A639@hobbex.localdomain> Message-ID: <20010602213415.B838@hobbex.localdomain> Umm, is this not the list that was going to be known as "p2p-elitists"? I seem to remember that the process for scaring away lamers was established even before the list was, as a compromise so that it didn't need to be invite-only. This list will descend into pointlessness in a second if we don't tell off people who post like that. 
My attitude toward lamers is not an attribute of my character but a behavioral adaptation learned through necessity. Of course, the same thing goes for personal feuds, so I'll accept your criticism. Like anything else there is probably a grain of truth in it. I hope your projects prosper without me. On Sat, Jun 02, 2001 at 11:23:20AM -0700, Bram Cohen wrote: > On Sat, 2 Jun 2001, Oskar Sandberg wrote: > > > Go away. > > > > Oskar, you're an asshole. > > While normally I'd view this as your problem, your constant unwelcoming > attitude in anything you can post to does serious damage. I wouldn't > accept your help with any project I'm involved in regardless of your > technical skill because I know that long term (and probably even short > term) your presence would do more harm than good. Your actually getting > paid to work on Freenet is a testament to its lack of cultural > leadership. > > You might want to change your attitude, for the sake of your own career. I > for one actively forewarn people about you, and frankly, your childish > inability to admit you could ever be wrong leads to poor design decisions. > > -Bram Cohen > > "Markets can remain irrational longer than you can remain solvent" > -- John Maynard Keynes > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. 
Oskar Sandberg oskar@freenetproject.org From oskar at freenetproject.org Sat Jun 2 12:42:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokensforpersistence In-Reply-To: ; from lucas@gonze.com on Sat, Jun 02, 2001 at 01:54:35PM -0400 References: Message-ID: <20010602214253.C838@hobbex.localdomain> On Sat, Jun 02, 2001 at 01:54:35PM -0400, Lucas Gonze wrote: > > In 0.4 all nodes will be identified > > by a persistent public key anyway. > > Ah. It will be moot. > > Questions: what are the public keys used for? Identifying nodes. If a node gains a reference for a key, you want to know that it is the same node you end up connecting to. > How much activity can be linked > to a node via the key? All activity. > If malicious nodes shared their knowledge of actions > taken by a node, using the public key to coordinate, how much compromising data > would they have to share? If all nodes share everything about the node, then everything will be known. If less than all nodes share everything, then the probability of guilt will be increased accordingly. It is definitely a weakness, but that's the model we have, and the collaboration weakness, as well as the TA weakness, has not been denied. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From blanu at uts.cc.utexas.edu Sat Jun 2 14:47:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" In-Reply-To: <3B174836.29F633B9@neurogrid.com> Message-ID: > Yeah, I got that. I'm not talking about assessing whether the system is returning a good > picture of a set of kittens now. I'm talking about having nodes return a document that matches > the hash that you used to request the document in the first place. 
That is currently taken care of by the Freenet architecture. If after a file is transferred the hash doesn't match then the reference to the node which supplied it isn't stored. So references to that hash will drift away from that node. What I think we need a reputation system for is things which cannot be determined automatically by a node. As far as I can tell, that's spies and nodes which will subvert files which are not cryptographically secured. > Can you define a spy? Certainly. A spy is a node which is there in order to gather data about network activity. An IP harvester is one kind of spy. While I want as many good nodes as possible to find out my IP (for increased routing efficiency), I don't want any nodes which are working as IP harvesters for the enemy to find out my IP. If such a node were to contact me, I would want to play dumb and say that I was a simple web server. There is unfortunately no way that my node can automatically tell by talking to another node whether it is a friendly node trying to uphold the right to free expression, or an evil node attempting to collect node IPs so that we can all be shot. This requires a reputation system by which I can tell you which nodes I trust and you can determine how much you believe me, based on how much you trust me. > > keeping my node from talking to nodes which will attempt to slip me bogus > > results when I asked for a file which hasn't been somehow > > cryptographically secured (by putting its hash or signature in the key). > > I can read the above in two ways. Do you mean that any time anybody sends me an insecure file, > that this is a bogus result? No, not at all. If I insert a file into the network called "kitten.jpg" and e-mail you the key and assure you with all my heart that this is a picture of a kitten and you download it and it's a picture of George Washington, then one of two things needs to occur. 
Either you lessen your trust for me because I'm obviously some kind of crazed pathological liar, or you lessen your trust for the node that served up the file, as someone is obviously trying to pull a fast one in the network. However, if it is indeed a picture of a kitten then you can be (fairly) sure that no one in the network did anything funny, as it would be really pointless to replace one picture of a kitten with another. With secured files, there's no need for a reputation system. Either the hash matches or it doesn't. With insecure files, it's rather more difficult to tell what's going on because there's this blurred issue of trust of the publisher vs. trust of the network. > That we want to trust nodes that give us bogus results less? If ... > so then the probability metrics I was talking about could be used, right? Like if you gave me 1 > bogus file out of 100 I requested, I'd trust you 99%. Yes. The problem is that integrating routing by trust at the network level might screw up the efficient finding of information using routing by closeness. The effects are unclear at the moment. > I look forward to hearing your talk. I suspect I will ask a question like "couldn't we gain > more efficient file transfer by building the content awareness in at the network level", but let > us leave the ensuing debate till September ;-) That's very sporting of you, giving me the hard question well in advance. Thanks. :-) From bram at gawth.com Sat Jun 2 15:29:01 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: <20010602213415.B838@hobbex.localdomain> Message-ID: On Sat, 2 Jun 2001, Oskar Sandberg wrote: > Umm, is this not the list that was going to be known as "p2p-elitists"? I > seem to remember that the process for scaring away lamers was established > even before the list was, as a compromise so that it didn't need to be > invite-only. 
You've been the same on this list, on freenet's dev list, on infoanarchy and in real life. I actually intended to create a list for p2p developers and make it invite-only to exclude you quite specifically, but I mentioned the idea of a developer's mailing list to zooko of the big mouth and he went and started an open list and invited everybody. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From agl at linuxpower.org Sat Jun 2 15:45:01 2001 From: agl at linuxpower.org (Adam Langley) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: Message-ID: <20010603000149.A3138@linuxpower.org> On Sat, Jun 02, 2001 at 03:28:41PM -0700, Bram Cohen wrote: > You've been the same on this list, on freenet's dev list, on infoanarchy > and in real life. Oskar took up the recognised position of keeping newbies off the freenet-devl list (not the chat list - which is free) and did it very well. Oskar is certainly forthright - and politically correct in the same way as the Atlantic Ocean is nice and dry. Which can be damn refreshing at times. And having spent about 25 hrs flying sat next to him, and about 10 days when we were (quite literally) within 20 meters for 99% of the time - I can tell you he's a damn nice person really. After that I could have axe murdered most people to get away. You don't have to listen to him and can ignore him when he disagrees with you. But if you disagree you should think damn hard about it because Oskar's a very smart guy. > I actually intended to create a list for p2p developers and make it > invite-only to exclude you quite specifically, Which says more about you than Oskar. AGL -- Don't believe everything you hear or anything you say. 
From oskar at freenetproject.org Sat Jun 2 16:19:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: ; from bram@gawth.com on Sat, Jun 02, 2001 at 03:28:41PM -0700 References: <20010602213415.B838@hobbex.localdomain> Message-ID: <20010603012024.A1008@hobbex.localdomain> On Sat, Jun 02, 2001 at 03:28:41PM -0700, Bram Cohen wrote: > I actually intended to create a list for p2p developers and make it > invite-only to exclude you quite specifically, but I mentioned the idea of > a developer's mailing list to zooko of the big mouth and he went and > started an open list and invited everybody. Oooh, I know this, "The No Oskars Club". Anyways, if there is any particular thing that I have done to upset you, then I would be glad to discuss it with you and hopefully straighten things up, though this is hardly the place. If there isn't, and disliking me just makes you feel good, then that is ok too. I know you think I'm being sanctimonious, but I don't feel like fighting or bearing any ill will against you regarding this. Good night. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From oskar at freenetproject.org Sat Jun 2 16:25:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: <20010603000149.A3138@linuxpower.org>; from agl@linuxpower.org on Sun, Jun 03, 2001 at 12:01:49AM +0100 References: <20010603000149.A3138@linuxpower.org> Message-ID: <20010603012546.B1008@hobbex.localdomain> For the record, Adam is nice too. 
He's the sort of guy who'll run around an entire hotel tracking down some aspirin to help aid one's self-inflicted deadly headache. On Sun, Jun 03, 2001 at 12:01:49AM +0100, Adam Langley wrote: > On Sat, Jun 02, 2001 at 03:28:41PM -0700, Bram Cohen wrote: > > You've been the same on this list, on freenet's dev list, on infoanarchy > > and in real life. > > Oskar took up the recognised position of keeping newbies off the > freenet-devl list (not the chat list - which is free) and did it very > well. > > Oskar is certainly forthright - and politically correct in the same > way as the Atlantic Ocean is nice and dry. Which can be damn > refreshing at times. > > And having spent about 25 hrs flying sat next to him, and about 10 > days when we were (quite literally) within 20 meters for 99% of the > time - I can tell you he's a damn nice person really. After that I > could have axe murdered most people to get away. > > You don't have to listen to him and can ignore him when he disagrees > with you. But if you disagree you should think damn hard about it > because Oskar's a very smart guy. > > > I actually intended to create a list for p2p developers and make it > > invite-only to exclude you quite specifically, > > Which says more about you than Oskar. > > AGL > > -- > Don't believe everything you hear or anything you say. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From blanu at uts.cc.utexas.edu Sat Jun 2 21:44:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: Message-ID: > > In 0.4 all nodes will be identified > > by a persistent public key anyway. > > Ah. It will be moot. > > Questions: what are the public keys used for? It makes node harvesting more challenging. You have to know a node's public key in order to communicate with it.
This eliminates the Media Enforcer attack, which is to port scan huge ranges of IP addresses looking for Freenet nodes, request contraband material from them, and then send a letter to the ISP saying that the node served up contraband material. Now you have to know a node's public key (by getting it from another node) in order to talk to it. So you have to actually run an active node in order to find other nodes. You can change your public key whenever you like, but it will uproot your presence in the network as you will now appear to everyone concerned to be a new node. From sam at neurogrid.com Sat Jun 2 22:50:01 2001 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" References: Message-ID: <3B19CFB2.A5E4DBE7@neurogrid.com> Hi Brandon, Brandon wrote: > > Yeah, I got that. I'm not talking about assessing whether the system is returning a good > > picture of a set of kittens now. I'm talking about having nodes return a document that matches > > the hash that you used to request the document in the first place. > > That is currently taken care of by the Freenet architecture. If after a > file is transferred the hash doesn't match then the reference to the node > which supplied it isn't stored. So references to that hash will drift away > from that node. Okay, thanks for explaining that. You don't think there would be any advantage in recording that information? I guess it is stored to the extent that a node that supplies the wrong hash doesn't get referenced as much. But that doesn't stop the node being referenced for other things. I'm just thinking that when a node is deciding where to forward a query to, it could be sending the queries preferentially to those nodes that have been consistent in providing correct hashes. Of course I have no idea how frequently that occurs in Freenet, and so it may just not be worth the effort.
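Brandon's mechanism above (if a transferred file fails its hash check, the reference to the supplying node isn't stored, so routing drifts away from it) can be sketched roughly as follows. This is an illustrative toy, not Freenet's actual code: the function name and the `node_refs` bookkeeping are invented for the example, and SHA-1 stands in for whatever hash the key scheme uses.

```python
import hashlib

def verify_and_record(requested_key: bytes, data: bytes,
                      node_refs: dict, node_id: str) -> bool:
    """Check that fetched data hashes to the requested key.

    On success, credit the supplying node (hypothetical bookkeeping);
    on mismatch, drop the reference so routing drifts away from it.
    """
    actual = hashlib.sha1(data).digest()
    if actual == requested_key:
        # Good data: remember this node as a source for keys like this one.
        node_refs[node_id] = node_refs.get(node_id, 0) + 1
        return True
    # Bad data: don't store a reference to the supplying node.
    node_refs.pop(node_id, None)
    return False
```

Sam's suggestion of preferring consistently honest nodes would amount to routing on the counters this sketch accumulates, rather than only dropping references on failure.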
It depends if you think that nodes that are supplying incorrect data for one hash are also doing so for other things, and so one should preferentially forward queries to those nodes that consistently supply accurate data, since Freenet is depth-first and greedy, right? You might wait a while before finding out that the node was returning inaccurate data, which slows search down. But maybe it would not be so beneficial to implement, given that the node won't be referenced for that hash subsequently ... > > Can you define a spy? > > Certainly. A spy is a node which is there in order to gather data about > network activity. An IP harvester is one kind of spy. While I want as many > good nodes to find out my IP (for increased routing efficiency), I don't > want any nodes which are working as IP harvesters for the enemy to find > out my IP. If such a node were to contact me, I would want to play dumb > and say that I was a simple web server. There is unfortunately no way that > my node can automatically tell by talking to another node whether it is a > friendly node trying to upload the right to free expression, or an evil > node attempting to collect node IPs so that we can all be shot. This > requires a reputation system by which I can tell you which nodes I trust > and you can determine how much you believe me, based on how much you trust > me. So you're saying that there is no way to determine from the pattern of interactions at the network level whether a node is a spy or not? And there's no feedback to assess potential judgement calls. The reputation system I was talking about applies equally well if the assessment of "spyness" is generated by human users based on their suspicions. Would be nice to quantify what the human users were basing their suspicions on. Like do they have access to some data that the nodes don't? Emails from people saying "I used node X to gather data about the network and now I'm going to attack you".
How does the human user recognise a spy? The transitive trust thing that Zooko talked about (which is also part of NeuroGrid) applies equally; the difficulty comes in whether there is any feedback process that allows you to assess the validity of the claims being made ... > > > > keeping my node from talking to nodes which will attempt to slip me bogus > > > results when I asked for a file which hasn't been somehow > > > cryptographically secured (by putting its hash or signature in the key). > > > > I can read the above two ways. Do you mean that any time anybody sends me an insecure file, > > that this is a bogus result? > > No, not at all. If I insert a file into the network called "kitten.jpg" > and e-mail you the key and assure you with all my heart that this > is a picture of a kitten and you download it and it's a picture of George > Washington, then one of two things needs to occur. Either you lessen your > trust for me because I'm obviously some kind of crazed pathological liar, > or you lessen your trust for the node that served up the file, as someone > is obviously trying to pull a fast one in the network. However, if it is > indeed a picture of a kitten then you can be (fairly) sure that no one in > the network did anything funny, as it would be really pointless to replace > one picture of a kitten with another. > > With secured files, there's no need for a reputation system. Either the > hash matches or it doesn't. So maybe I'm starting to get the distinction. The fact that I can decrypt the file with the key you've given me certifies that I'm receiving what you intended to send me. Given the proviso that your private key is secure, any discrepancy between what you told me the file was, and what I think the file is, is due to differences in opinion between me and you, and not related to what might or might not have happened in the network.
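The transitive trust idea mentioned above (believing a claim in proportion to how much you trust the claimant) could be sketched like this. The discounting rule, the depth cutoff, and all the names here are illustrative assumptions, not Zooko's or NeuroGrid's actual scheme:

```python
def transitive_trust(direct: dict, source: str, target: str,
                     depth: int = 3) -> float:
    """Estimate trust in `target` by chaining direct trust ratings.

    `direct` maps rater -> {rated: score in [0, 1]}. A friend's rating
    of a node is discounted by how much we trust the friend; the depth
    limit keeps the recursion finite (hypothetical scheme).
    """
    if target in direct.get(source, {}):
        return direct[source][target]
    if depth == 0:
        return 0.0
    best = 0.0
    for friend, score in direct.get(source, {}).items():
        # Discount the friend's (possibly indirect) opinion by our
        # trust in the friend, and keep the strongest chain.
        best = max(best, score * transitive_trust(direct, friend, target, depth - 1))
    return best
```

Taking the maximum over chains is just one possible combination rule; whether such scores can ever be validated is exactly the feedback problem Sam raises above.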
> With insecure files, it's rather more > difficult to tell what's going on because there's this blurred issue of > trust of the publisher vs. trust of the network. Sure. That's true. But the extent to which it is a problem depends on the degree to which you think that the sources of misunderstandings and failures are the consequence of attempts to subvert the system, or just due to different ways of looking at the world. Freenet is clearly focused on creating secure, anonymous channels of communication, because of the fear that those channels will be subverted. Whereas NeuroGrid is focused on trying to remove the failings and misunderstandings introduced as a consequence of people's different perceptions, rather than active attempts at subterfuge. Ultimately which is more important is a matter of perspective, and of course the type of world in which you live. That said, it is becoming clear that NeuroGrid could gain a lot from using hashes, and some kind of security where users encrypted not the files, but the relation of hashes and keywords - so that there was less chance that someone would try and forge someone else's opinions. It depends if people's opinions get attached to particular nodes, which themselves can be secured ... > > That we want to trust nodes that give us bogus results less? If > ... > > so then the probability metrics I was talking about could be used, right? Like if you gave me 1 > > bogus file out of 100 I requested, I'd trust you 99%. > > Yes. The problem is that integrating routing by trust in the network level > might screw up the efficient finding of information using routing by > closeness. The effects are unclear at the moment. I guess so, although I don't think that you have any guarantees of efficiency in finding information through routing by closeness, other than it appears to work. We don't have any math to show this, do we? What kind of efficiency are we talking about anyway? > > > I look forward to hearing your talk.
I suspect I will ask a question like "couldn't we gain > > more efficient file transfer by building the content awareness in at the network level", but let > > us leave the ensuing debate till September ;-) > > That's very sporting of you, giving me the hard question well in advance. > Thanks. :-) My pleasure. I thought you'd need a little time, since I'm hoping to have simulations, equations, and an implementation by that point to make the question all the harder :-) CHEERS, SAM From oskar at freenetproject.org Sun Jun 3 06:07:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" In-Reply-To: <3B19CFB2.A5E4DBE7@neurogrid.com>; from sam@neurogrid.com on Sun, Jun 03, 2001 at 02:48:34PM +0900 References: <3B19CFB2.A5E4DBE7@neurogrid.com> Message-ID: <20010603150745.A914@hobbex.localdomain> On Sun, Jun 03, 2001 at 02:48:34PM +0900, Sam Joseph wrote: > Hi Brandon, > > Brandon wrote: > > > > Yeah, I got that. I'm not talking about assessing whether the system is returning a good > > > picture of a set of kittens now. I'm talking about having nodes return a document that matches > > > the hash that you used to request the document in the first place. > > > > That is currently taken care of by the Freenet architecture. If after a > > file is transferred the hash doesn't match then the reference to the node > > which supplied it isn't stored. So references to that hash will drift away > > from that node. > > Okay, thanks for explaining that. > > You don't think there would be any advantage in recording that information? I guess it is stored to > the extent that a node that supplies the wrong hash doesn't get referenced as much. But that > doesn't stop the node being referenced for other things.
I'm just thinking that when a node is > deciding where to forward a query to, it could be sending the queries preferentially to those nodes > that have been consistent in providing correct hashes. In general, punishment is a good deterrent for behavior that carries some probability of getting caught. If the probability of discovery is low, then you can try to increase the punishment to shift the cost analysis. In this case, since all data on Freenet is hashed and signed, the chance of being caught when trying to supply bad data is 100%, so punishments are hardly necessary. Future versions of Freenet will keep some track of the behavior of neighbor nodes, and use them less if they work badly, but returning bad data is not the biggest motivation for this (simple no-reply attacks are really much more annoying). > Of course I have no idea how frequently that occurs in Freenet, and so it may just not be worth the > effort. It depends if you think that nodes that are supplying incorrect data for one hash are also > doing so for other things, and so one should preferentially forward queries to those nodes that > consistently supply accurate data, since Freenet is depth-first and greedy, right? You might wait a > while before finding out that the node was returning inaccurate data, which slows search down. > > But maybe it would not be so beneficial to implement, given that the node won't be referenced for that > hash subsequently ... The way I see it, nodes returning bad data should be considered broken, and treated as such, not as attackers. There is certainly no reason to be overly paranoid toward a node trying such a pitifully futile attack. <> > So you're saying that there is no way to determine from the pattern of interactions at the network > level whether a node is a spy or not? And there's no feedback to assess potential judgement calls. There certainly does not have to be.
A normal Freenet node runs into many others during the course of operation, so it can be used as a harvester for node identities quite well. Even to the extent that nodes only have a limited number of contacts, an attacker can always start an arbitrary number of nodes to try to meet as many others as possible (and no, hacks like limiting based on IPs simply don't work in anything but a short-term, low-level perspective). <> > > With insecure files, it's rather more > > difficult to tell what's going on because there's this blurred issue of > > trust of the publisher vs. trust of the network. > > Sure. That's true. But the extent to which it is a problem depends on the degree to which you think > that the sources of misunderstandings and failures are the consequence of attempts to subvert the > system, or just due to different ways of looking at the world. Freenet is clearly focused on creating > secure, anonymous channels of communication, because of the fear that those channels will be > subverted. Whereas NeuroGrid is focused on trying to remove the failings and misunderstandings > introduced as a consequence of people's different perceptions, rather than active attempts at > subterfuge. I think it is becoming pretty clear that this sort of thinking is a little too akin to a "let's all be nice to one another" society. Human hostility and attacks are just as fundamental to human interaction as the participants' different ways of viewing the world, and systems must be designed with that in mind. < > > > Yes. The problem is that integrating routing by trust in the network level > > might screw up the efficient finding of information using routing by > > closeness. The effects are unclear at the moment. > > I guess so, although I don't think that you have any guarantees of efficiency in finding > information through routing by closeness, other than it appears to work. We don't have any math to show > this, do we? > > What kind of efficiency are we talking about anyway?
We are just guessing, but there is a lot to indicate that connectivity limits (especially very strict ones, such as such a system would have to have, to matter) could be harmful. We still have plenty of issues with the full-connectivity routing where we believe we can do better, so maybe limited connectivity will come up once we have nailed that. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From lucas at gonze.com Sun Jun 3 09:42:02 2001 From: lucas at gonze.com (Lucas Gonze) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: Message-ID: > It makes node harvesting more challenging. You have to know a node's > public key in order to communicate with it. This eliminates the Media > Enforcer attack, which is to port scan huge ranges of IP addresses looking > for Freenet nodes, request contraband material from them, and then send a > letter to the ISP saying that the node served up contraband material. Now > you have to know a node's public key (by getting it from another node) in > order to talk to it. So you have to actually run an active node in order to > find other nodes. My initial reaction is that this is a strong strategy because it forces attackers to buy in, and at that point other self-protecting mechanisms can be used. What I don't like about it is that it enables strong linkability. It looks like freenet hackers are aware of that and have decided to move ahead anyway in the absence of better ideas. Yes? > You can change your public key whenever you like, but it will uproot your > presence in the network as you will now appear to everyone concerned to be > a new node. What problems come up for a new node? What are the drawbacks? - Lucas From greg at electricrain.com Mon Jun 4 21:59:01 2001 From: greg at electricrain.com (Gregory P.
Smith) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Kittens of Trust" In-Reply-To: ; from blanu@uts.cc.utexas.edu on Sat, Jun 02, 2001 at 04:46:38PM -0500 References: <3B174836.29F633B9@neurogrid.com> Message-ID: <20010604215847.A28157@zot.electricrain.com> > No, not at all. If I insert a file into the network called "kitten.jpg" > and e-mail you the key and assure you with all my heart that this > is a picture of a kitten and you download it and it's a picture of George > Washington, then one of two things needs to occur. Either you lessen your > trust for me because I'm obviously some kind of crazed pathological liar, > or you lessen your trust for the node that served up the file, as someone > is obviously trying to pull a fast one in the network. However, if it is > indeed a picture of a kitten then you can be (fairly) sure that no one in > the network did anything funny, as it would be really pointless to replace > one picture of a kitten with another. Unless you are storing a kitten photo lineup on the network and having witnesses identify who peed on your rug by giving them non-hash-based links to the images in the network. Then the urine-happy kitten's litter has lots to gain by replacing pictures of kittens with others at random intervals to disrupt the lineup... anyways, preaching to the choir here, we all know that hashing kittens is the only way to know you've got the right kitten in this world. digressingly, Greg From jim at at.org Thu Jun 7 11:27:01 2001 From: jim at at.org (Jim Carrico) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" In-Reply-To: References: <3B174836.29F633B9@neurogrid.com> Message-ID: Brandon Wiley, responding to Sam Joseph, said: >> Can you define a spy? > >Certainly. A spy is a node which is there in order to gather data about >network activity. An IP harvester is one kind of spy.
While I want as many >good nodes to find out my IP (for increased routing efficiency), I don't >want any nodes which are working as IP harvesters for the enemy to find >out my IP. If such a node were to contact me, I would want to play dumb >and say that I was a simple web server. There is unfortunately no way that >my node can automatically tell by talking to another node whether it is a >friendly node trying to upload the right to free expression, or an evil >node attempting to collect node IPs so that we can all be shot. This >requires a reputation system by which I can tell you which nodes I trust >and you can determine how much you believe me, based on how much you trust >me. Correct me if I'm missing something, but this suggests that the problem of rating the "spyness" of a node is equivalent to determining the "spyness" of the node operator, or rather the aggregate spyness of everyone with physical access to the hardware on which the node is running. (multiplied by a 'weighting factor' assessing the host machine's vulnerability to subversion by skilled attackers. e.g. even if I've known Bob since kindergarten, and am certain he's not working for Dr. Evil, I may not rate his node very highly if I have low confidence in his ability to secure his machine against evil agents, etc.) To make such a rating with confidence suggests a pretty intimate and detailed real-world knowledge of the node operator(s), hardware, location, etc. If this is so, then it seems that a "white list" of trusted nodes will itself provide a lot of information about real-world relationships between node operators, a rich data source for intersection attacks against both node and publisher anonymity.
Even if such lists aren't associated directly with nodes - i.e. they are signed by pseudonymous publishers as in Steven Hazel's combined trust proposal, if enough of these lists were captured by the bad guys, it would seriously compromise the publisher/node "unlinkability" that is obviously critical to the success of freenet's stated mission. Over and above the problems with routing that Brandon mentioned in an earlier post, this seems like good reason to abandon the white list concept, and with it perhaps any notion of a node-based reputation system for Freenet. However, in a scenario in which freenet's anonymity may actually be *necessary* - e.g. Chinese dissidents whose lives really are in danger - it's likely that simply running a freenet node will be enough to implicate one as a dissident, regardless of what's on it or who it's connected to. In other words, lack of a node trust system seriously compromises freenet's effectiveness in precisely the area in which it is most needed. There are no doubt refinements to the white list concept, and further layers of obfuscation which may help to make nodes difficult to discover, but intuition suggests this will always tend to compromise the routing efficiency and general usefulness of the network. An alternative approach is to try to develop freenet into a general-purpose tool with many benign and banal purposes, that *also happens to guarantee free speech*. This would be the 'hide in plain sight' approach, related to the notion that encrypted communications are only really effective if a significant proportion of the total communications are encrypted. The reason I'm thinking about all this is in consideration of an area in which "anonymous speech" is ubiquitous, familiar and universally accepted: the concept of the secret ballot, one of the principal underpinnings of modern democratic political systems.
The point of a secret ballot is so that one can announce one's opinion without that opinion being "linkable" to one's real identity. Concepts surrounding the transparency, integrity, and fairness of elections are dependent on this principle - and so if it's a good idea to have this ability once every four years (or so), why should it not be a good idea to have it all the time? Extrapolating some current trends of digital culture, we seem to be heading toward a world in which everything we do, see, hear, or buy will be *trackable* and linkable to our real identities. Microsoft in particular wants to "own" this data, and one can only presume that the US.gov's apparent "softening" on the anti-trust breakup threat is linked to their salivating over the unprecedented level of surveillance - at arm's length - this would provide. The only way to thwart such a dystopian scenario will be to build systems which anonymize *everything* by default. If one wishes to announce any information publicly, of course anyone will always be free to do that. But if we can establish this principle, that everyone has a right to *not be observed* by marketers, spammers, or spooks - then I think it's a good bet that systems which guarantee this right will find their way into the mainstream. This seems like the most effective way of providing "cover" for political dissidence and other overtly threatened forms of speech. Zooko's "why am i not pseudonymous yet" (http://www.inet-one.com/cypherpunks/dir.98.12.21-98.12.27/msg00018.html) suggests that any truly dedicated agency will be able to match pseudonymous identities (i.e. publishers) with real identities (i.e. nodes). But as with encrypted traffic, the more ubiquitous and *normal* it is to use anonymity or pseudonymity on a daily basis, the more difficult the attacker's job becomes. Am I missing something important?
From zooko at zooko.com Thu Jun 7 12:24:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: there is no security without a threat model (was: Re: [p2p-hackers] Reputation System: "Dimensions of Trust") In-Reply-To: Message from Jim Carrico of "Thu, 07 Jun 2001 11:25:58 PDT." References: <3B174836.29F633B9@neurogrid.com> Message-ID: Jim Carrico wrote: > > Zooko's "why am i not pseudonymous yet" > (http://www.inet-one.com/cypherpunks/dir.98.12.21-98.12.27/msg00018.html) > suggests that any truly dedicated agency will be able to match pseudonymous > identities (ie. publishers) with real identities (ie. nodes). But as with > encrypted traffic, the more ubiquitous and *normal* it is to use anonymity > or pseudonymity on a daily basis, the more difficult the attackers job > becomes. I agree with your thesis that "Anonymity Loves Company", as the saying goes. If you read "why i am not truly pseudonymous yet", please also read the recent addendum [1]. One thing that can be said for certain is that attempts to provide security of any kind desperately need to have the threat model made explicit. Explicit threat models will enable actual engineering rather than "shots in the dark". Explicit threat models can also enable "good enough but not perfect" solutions which can be deployed earlier and to a wider user base. Finally, an explicit threat model is absolutely required if an end user is going to be able to make an informed decision about the risks he or she chooses. The current common practice of assuming an *implicit* threat model (which typically has rigorous mathematical constraints but unexamined real-world constraints) is reprehensible -- akin to building a bridge with a theoretically perfect design but no actual testing and advertising it as "proven safe". Until Freenet, or any other system, has an explicit threat model against which its security guarantees are measured, it is not suitable to be trusted with real world risks. 
As a mea culpa, Mojo Nation was briefly guilty of this sort of irresponsible behaviour last year when a "FAQ" question implied that Mojo Nation offered anonymity. Regards, Zooko [1] http://zooko.com/memory_lane.html P.S. To give credit where it is due, Bram Cohen is partially responsible for influencing my thinking on this issue. I remember him saying that the hardest part of crypto engineering is deciding what threat model you are addressing. From blanu at uts.cc.utexas.edu Thu Jun 7 16:41:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Re: there is no security without a threat model (was: Re: [p2p-hackers] Reputation System: "Dimensions of Trust") In-Reply-To: Message-ID: > Until Freenet, or any other system, has an explicit threat model against which > its security guarantees are measured, it is not suitable to be trusted with > real world risks. You said it better than I could have. Preach on, brother! From dmarti at zgp.org Wed Jun 13 18:59:01 2001 From: dmarti at zgp.org (Don Marti) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Chess over Freenet Message-ID: <20010613185833.A22647@zgp.org> Brandon Wiley's article on "Applications over Freenet: a Decentralized, Anonymous Gaming API" is up. http://www2.linuxjournal.com/articles/culture/0027.html (I wonder why they put it under "culture") -- Don Marti "I've never sent or received a GIF in my life." http://zgp.org/~dmarti -- Bruce Schneier, Secrets and Lies, p. 246. dmarti@zgp.org Free the Web, burn all GIFs: http://burnallgifs.org/ From arma at mit.edu Sun Jun 17 21:01:01 2001 From: arma at mit.edu (Roger Dingledine) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? Message-ID: <20010618000023.W968@belegost.mit.edu> Let's say I have a Chord (or any consistent hashing) service for which I can do hash lookups -- that is, I can fetch a document in O(lg n) hops once I know what it's called. 
[1] Has anybody found a good way of integrating keyword searching into this framework? I don't want to get into anything complicated -- I just want the user to be able to put in English words and get back keys to use in the Chord network. I can see separating the two: having a separate network or computer which "knows" everything in the Chord service, and can answer search queries -- perhaps publishers inform the service, or perhaps it crawls the nodes looking for available files. But I'm hoping for something more integrated. I can imagine systems where publishers provide a description along with the document, hash each keyword of the description, and then register those hashes with a Chord service, so Chord will allow you to do the actual searching. But those seem rather kludgy and potentially lopsided (eg, whoever lives at H("mp3") in the keyspace is going to be having a bad year -- but perhaps enough caching and replication can resolve that). Can you point me in some likely directions? Or am I crazy to think of getting searching out of a Chord-style architecture, and I should be looking at the "separate search engine" approach? How do non-broadcast file sharing architectures (intend to) do searching? I'd been completely ignoring the issue of searching before, but I've finally realized I can't afford to ignore it. Thanks, --Roger [1] http://web.mit.edu/6.033/www/handouts/dp2-chord.html , http://pdos.lcs.mit.edu/~kaashoek/chord.ps From tcole at espnow.com Sun Jun 17 21:19:01 2001 From: tcole at espnow.com (Tavin Cole) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? 
In-Reply-To: <20010618000023.W968@belegost.mit.edu>; from arma@mit.edu on Mon, Jun 18, 2001 at 12:00:24AM -0400 References: <20010618000023.W968@belegost.mit.edu> Message-ID: <20010618001757.K8476@niss> On Mon, Jun 18, 2001 at 12:00:24AM -0400, Roger Dingledine wrote: > I'd been completely ignoring the issue of searching before, but I've > finally realized I can't afford to ignore it. There are several peer-to-peer information distribution applications which could all benefit from a simple keyword searching service, or perhaps something more complex. A very general peer-to-peer indexing and/or spidering network that was capable of searching for content on multiple other networks would be the best approach, in my opinion. Maybe this is something we can all collaborate on. Aside from the niceties of having a unified search network for all peer-to-peer systems, I take this position because the network is basically the topology, and you want to optimize the topology of each network for the functions it performs. -- # tavin cole # # "Technology is a way of organizing the universe so that # man doesn't have to experience it." # # - Max Frisch From gojomo at usa.net Sun Jun 17 21:34:01 2001 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? References: <20010618000023.W968@belegost.mit.edu> Message-ID: <008301c0f7af$d8d43fe0$e8c7a540@tron> I bet the 1998 Google paper would engender some good ideas: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm Another shoot-from-the-hip idea: Let's say your search is something like "Beatles AND Revolution AND mp3". Which word's "expert node" should you query first, 'Beatles', 'Revolution', or 'mp3'? I suspect that there should be a globally-shared ordering of words -- so that searchers and indexers proceed through multi-word queries in exploitably predictable ways.
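A minimal sketch of that globally-shared ordering, assuming a shared word-frequency rank table (the `GLOBAL_RANK` table, the rarest-first query order, and both function names are hypothetical choices for illustration): every searcher sorts query terms the same way, and each index only keeps concordance links toward more-common terms.

```python
# Hypothetical shared ranking: lower rank = rarer word. In a real system
# this table would be derived from shared word- or search-frequency data.
GLOBAL_RANK = {"revolution": 0, "beatles": 1, "mp3": 2}

def query_order(words):
    """Order query terms rarest-first so all searchers walk the same path;
    unknown words sort last."""
    return sorted(words, key=lambda w: GLOBAL_RANK.get(w.lower(), float("inf")))

def forward_links(word, cooccurring):
    """An index node for `word` remembers concordances only in the
    more-common direction, roughly halving index sizes on average."""
    rank = GLOBAL_RANK.get(word.lower(), float("inf"))
    return [w for w in cooccurring
            if GLOBAL_RANK.get(w.lower(), float("inf")) > rank]
```

Rarest-first is one plausible reading of "which expert node to query first"; the key property is only that the ordering is global, so indexers and searchers agree on it.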
I also suspect an ordering based on either word-frequency or search-frequency might work best. On average, then, certain indexes only need to remember word-concordances in the "more-common" direction. That is, the 'mp3' node doesn't need to remember everywhere it appears with 'Revolution' or 'Beatles', but those have to remember "forward" to the more-common 'mp3' term. I think this would have the effect of halving, on average, index sizes and might serve to counterbalance the popular-word effect you mention. - Gojomo ----- Original Message ----- From: "Roger Dingledine" To: Sent: Sunday, June 17, 2001 9:00 PM Subject: [p2p-hackers] keyword searching + consistent hashing? > Let's say I have a Chord (or any consistent hashing) service for which > I can do hash lookups -- that is, I can fetch a document in O(lg n) > hops once I know what it's called. [1] > > Has anybody found a good way of integrating keyword searching into this > framework? I don't want to get into anything complicated -- I just want > the user to be able to put in English words and get back keys to use > in the Chord network. I can see separating the two: having a separate > network or computer which "knows" everything in the Chord service, and > can answer search queries -- perhaps publishers inform the service, or > perhaps it crawls the nodes looking for available files. But I'm hoping > for something more integrated. > > I can imagine systems where publishers provide a description along with > the document, hash each keyword of the description, and then register > those hashes with a Chord service, so Chord will allow you to do the > actual searching. But those seem rather kludgy and potentially lopsided > (eg, whoever lives at H("mp3") in the keyspace is going to be having a > bad year -- but perhaps enough caching and replication can resolve that). > > Can you point me in some likely directions? 
Or am I crazy to think of > getting searching out of a Chord-style architecture, and I should be > looking at the "separate search engine" approach? How do non-broadcast > file sharing architectures (intend to) do searching? > > I'd been completely ignoring the issue of searching before, but I've > finally realized I can't afford to ignore it. > > Thanks, > --Roger > > [1] http://web.mit.edu/6.033/www/handouts/dp2-chord.html , > http://pdos.lcs.mit.edu/~kaashoek/chord.ps > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > From hal at finney.org Sun Jun 17 21:34:02 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? Message-ID: <200106180426.VAA23379@finney.org> There have been a lot of discussions in the Freenet developers list about searching over the past year. Generally there are two camps: (a) searching will never work, and (b) it will too. One of the ideas, from Ian Clarke, inventor of Freenet, is to exploit Freenet's normal routing algorithm. This is basically a steepest-descent "search" with backtracking, where each node vectors the request to the neighbor node which has the closest key to that being requested. To do searching, instead of running this algorithm with hashes as it is normally done, it would be run with strings of keywords. So you might have the location key for a document be "MP3 Grateful Dead Sugar Magnolia". If someone looks for this exact string, the Freenet routing algorithm will find the document just as easily as if it were looking up by a hash. However the routing algorithm would then be changed to do fuzzy matches, which might be smart and involve rearranging words and checking for close spellings, etc. 
So if someone searched for "Grateful Dead MP3" that might be a close match to the string above, and the search request would find its way through the network and might find this document. As I said there is skepticism that this idea will work but perhaps at some point it will be tried if no better ideas come along. A different suggestion that was made was to add pointers to a document at the locations associated with all possible subsets of its keywords. So with the keywords above you'd have pointers from "Grateful Dead" and "MP3 Sugar Magnolia", etc. (This would use the original hash-based lookup concept.) To avoid the single-word-overload problem you could either only do it for sets of 3 or more words, or else you'd just put a hard limit on the number of entries stored for any set of keywords, and so if someone searched for "MP3" they'd only find the 100 most recent entries that used that word, making popular single word lookups relatively useless but not harmful. More specific searches would be necessary to get better results. Hal From sah at thalassocracy.org Mon Jun 18 03:50:01 2001 From: sah at thalassocracy.org (Steven Hazel) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? In-Reply-To: hal@finney.org's message of "Sun, 17 Jun 2001 21:26:01 -0700" References: <200106180426.VAA23379@finney.org> Message-ID: <871yoi9i54.fsf@azrael.dyn.cheapnet.net> On Sun, 17 Jun 2001, hal@finney.org wrote: > However the routing algorithm would then be changed to do fuzzy > matches, which might be smart and involve rearranging words and > checking for close spellings, etc. So if someone searched for > "Grateful Dead MP3" that might be a close match to the string above, > and the search request would find its way through the network and > might find this document. > > As I said there is skepticism that this idea will work but perhaps > at some point it will be tried if no better ideas come along. 
I am convinced that "fuzzy" might as well mean "magic" for the purposes of this discussion. For some background, here is the relevant text from Ian Clarke's 2 Jun 2001 post to the Freenet development list: > Currently a key in Freenet consists, essentially, of a string of > text. We define a "closeness" function between two keys, so that > given three keys, A, B, and C, we can determine which of A or B is > closer to C. To do this we have chosen a lexicographic comparison, so > that "aardvark" is closer to "apple" than "zebra" is. It was > apparent, even when writing my original paper describing the Freenet > architecture, that much more complex keys could be used with > correspondingly more sophisticated closeness operations. > > So how does this help us with fuzzy searching? Well, consider that we > defined a new key-type, called a MetadataKey, which, rather than just > a single string of text, consisted of a number of key-data pairs, > such as: > > "artist" => "tori amos" > "album" => "little earthquakes" > "song" => "winter" > "year" => "1988" > > Now, let's say that we wanted to search for an mp3 which was stored > under such a key. We could define a search like this: > > ("artist" string= "tori amos") AND > ("album" contains "litt") AND > ("song" contains "w") AND > ("year" lessThan "2000") > > A node receives this search, and compares it to each of the > MetadataKeys in its datastore. It uses fuzzy logic to come up with > a value between 0 and 1 for how closely each MetadataKey matches > this search (I have thought this out in more detail but for brevity > won't explain here; do a web search for "fuzzy logic" and > "Levenshtein distance"), a perfect match being 1. The search > request is then forwarded to the reference associated with the > closest match. Once the HTL runs out, a SearchResponse message is > sent which contains, at any given time, the top, say, 10 matches for > the query, along with the CHK of the data they refer to.
Each node updates this as they pass the search request back to the requester. The requester can then choose which of these they wish to request using conventional Freenet messages. (For those of you who aren't familiar with the Levenshtein distance -- that's pronounced "edit distance" -- it's the number of deletions, additions, and substitutions which must be performed to transform one string into another.) The problem with this is that Freenet routing, or any key-based routing, for that matter, depends on a node being assigned a portion of the keyspace for which it is responsible. With ordinary keys, we do this by taking the hash of some data, and assigning each node a range of numbers for which it is responsible. There hasn't yet been a convincing description of an algorithm for meaningfully dividing up the metadata keyspace. What Ian partially describes is an interesting way to order metadata with respect to a given search query, but that's not the problem that needs to be solved. We can now compare A and B against C, and learn that A is meaningfully closer to C than B is, based on some (hopefully useful) definition of "meaningfully closer". But that comparison is relative to C. Knowing that B < A with respect to C doesn't tell us anything about the relationship of A to C with respect to B, or of B to C with respect to A. For Ian's closeness relationship to work, the nodes would have to re-arrange the distribution of metadata in the network before each search. Here's a concrete example, using edit distance as closeness, and the values "Toby", "Tony", and "Tory": each pair of those values differs by a single substitution, so every pairwise edit distance is 1. So, globally, in what order do those values occur? All three are mutually equidistant. But on a line -- or under any global ordering -- three mutually equidistant points cannot exist: whichever value you place in the middle ends up closer to each of its neighbors than those neighbors are to each other. There is no global order.
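The relativity of "closeness" is easy to check directly. The sketch below (an illustration added here, not code from the thread) is a minimal Levenshtein implementation plus a second trio of words, showing that the ranking of candidates depends entirely on which string you measure from:

```python
# Minimal Levenshtein (edit) distance: the number of deletions,
# insertions, and substitutions needed to turn string a into string b.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

words = ["cat", "cot", "dog"]

def nearest(ref):
    """Rank the other words by closeness to ref."""
    return sorted((w for w in words if w != ref),
                  key=lambda w: levenshtein(ref, w))

# d(cat,cot)=1, d(cot,dog)=2, d(cat,dog)=3: each reference string
# induces its own ranking, and none of them implies a single global
# ordering that a keyspace could be partitioned by.
print(nearest("cat"))   # ranked from cat's point of view
print(nearest("dog"))   # ranked from dog's point of view
```

Knowing the ranking from one reference point tells a routing node nothing about the ranking from any other, which is exactly why pairwise closeness cannot be dealt out as keyspace ranges.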
We can't meaningfully map that closeness relationship onto a global ordering and deal out subsets to nodes. So there's no way that a node can know which other node to forward a request to without the data in the network being organized specially for a particular query. And therefore there's no such thing as "fuzzy searching". -S From alk at pobox.com Mon Jun 18 11:05:01 2001 From: alk at pobox.com (Tony Kimball) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? References: <20010618000023.W968@belegost.mit.edu> Message-ID: <15150.17062.143746.123137@spanky.love.edu> Quoth Roger Dingledine on Monday, 18 June: : : (eg, whoever lives at H("mp3") in the keyspace is going to be having a : bad year -- but perhaps enough caching and replication can resolve that). "mp3" is a UI use-case reductio-ad-absurdum: Users should not normally be presented with the option of searching for a .mp3 file extension, nor should indexing typically index on that term in that position. A more realistic example is "Spears" or "Eminem" or such like, which are actually useful discriminators, but still represent problematic hotspots in the hash space. Dexter's approach to this problem is two-fold: (1) Flatten the distribution of hashes by combining terms. Hotspots are less hot and more diffuse for combined terms than for single terms. A search for 'kronos quartet purple haze' can result in lookups on combined terms such as 'kronos quartet' and 'purple haze', the results of which are combined in the client. There are a number of tricks applied to make this more generally applicable, for example: pre-normalizing the keys by metaphones; filtering prepositions, articles, and copulae; and pruning the term permutation trees heuristically. (2) Distribute the load by virtualizing the node. The hash node need not be a single host. Query load can be distributed over a group of hosts by round-robining, and storage load can be distributed by WAN RAID technique. : How do non-broadcast : file sharing architectures (intend to) do searching?
Most of what I've seen has been content-routed using Bloom filters to represent routing tables (a lossy compression technique). I personally consider the storage requirements, the update bandwidth requirements, and the practical lookup latencies in such networks to be too great, at Internet scales, for today's typical network nodes and links (say P2-400/64M/5G/56k), but it is still an interesting approach for fat nodes in broadband networks. I don't know of any viable project developing a content-routing search network. Which is strange, really, since it is such a good way of doing intranet content discovery. There's also a camp based on exploiting the power-law 'supernode'. This approach has value for approximate searches on weighted graphs, but trades off between discovery guarantees and bandwidth scaling. If you are comfortable with this tradeoff, you should look into systems such as OpenCola and Neurogrid. From zooko at zooko.com Mon Jun 18 20:37:02 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Roger Dingledine of "Mon, 18 Jun 2001 00:00:24 EDT." <20010618000023.W968@belegost.mit.edu> References: <20010618000023.W968@belegost.mit.edu> Message-ID: Roger Dingledine wrote: > > Let's say I have a Chord (or any consistent hashing) service for which > I can do hash lookups -- that is, I can fetch a document in O(lg n) > hops once I know what it's called. I think of this as two separate problems: immutable namespaces and mutable namespaces. Keyword searching must be implemented atop a mutable namespace (because immutable namespaces have keys that are not human-readable). I'm not really going to address your questions about searching (although note that there is an unused implementation of search-by-consistent-hashing in Mojo Nation: [1] -- it sounds sort of like Dexter's approach).
Instead I'm going to babble about the larger picture and how we're all fondling different parts of the elephant while Stratton Sclavos laughs all the way to the bank. (Oops -- I just changed metaphors in midstream...)

Self-Authenticating, Immutable, Mutable, Distributed

Namespaces can be distributed as long as the objects are *self-authenticating*. That is: when you get an object you can easily verify that it matches the key that you started with. (Note: there is still a DoS attack by spreading bogus objects that the searcher has to reject, but that seems very weak.) Two examples of self-authenticating objects are any object where the key is its collision-free hash, and any object where the key includes a public key and the object comes with a signature from the corresponding private key. (In Mojo Nation, the former is MojoIds, and the latter is not implemented. In Freenet, the former is Content Hash Keys and the latter is Sub-Space Keys. In the Self-Certifying File System[2] the latter is used for remote directories.) The hash-name approach creates an immutable namespace, and one with non-human-readable keys. The signed-object approach also allows for mutable namespaces, but it cannot be perfectly distributed in the same sense that immutable namespaces can. This is because *someone* has to be able to change what value a key maps to, and there isn't any way to have universal agreement on what the new value should be. (Unlike an immutable namespace, where we can all universally agree that the value for a key should be the thing that is SHA1'ed to create that key.)

Who Controls The Mutants?

So the first decision you have to face when designing a mutable namespace is "who controls the mappings?". The two most obvious answers are: a. a centralized committee[3] does (DNS), or b. anyone who creates a new part of the namespace by creating a public key controls it by using the corresponding private key (SDSI[4]).
Now we all see the problems with the first solution (although, as an aside, the vast majority of people do *not* see those problems, and we may very well be stuck with it for the next couple of decades, whether we like it or not). Okay, so now we agree that for there to be a good distributed mutable namespace, all the objects have to be self-authenticating and control over mapping must be local to the holder of the relevant secret key. What does this have to do with searching?

What Does This Have To Do With Searching?

Well... Actually I'm not sure, but I feel like there must be some important connection here. Hopefully someone else can tell me what it is, or convince me that there isn't one. ;-) Regards, Zooko [1] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/mojonation/evil/common/ContentHandicappers.py?content-type=text/plain [2] http://fs.net/ [3] "The Man Who Bought The Internet" http://www.fortune.com/indexw.jhtml?channel=artcol.jhtml&doc_id=202984 [4] http://theory.lcs.mit.edu/~cis/sdsi.html From blanu at uts.cc.utexas.edu Tue Jun 19 00:05:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? In-Reply-To: <20010618000023.W968@belegost.mit.edu> Message-ID: > Has anybody found a good way of integrating keyword searching into this > framework? Ah, glad you asked! Now you can skip my O'Reilly P2P talk. :-) > I can imagine systems where publishers provide a description along with > the document, hash each keyword of the description, and then register > those hashes with a Chord service, so Chord will allow you to do the > actual searching. But those seem rather kludgy and potentially lopsided > (eg, whoever lives at H("mp3") in the keyspace is going to be having a > bad year). So the system I'm advocating starts with this idea. It's for searching Freenet, so it uses Freenet hashing and routing.
As everyone knows, Freenet is two hops away from Chord, so it should work with Chord as well. In order to avoid flooding the most popular keyword, which I choose to call "britney" for absolutely no reason, you give each entry a sequence number x. So the first item with that keyword is filed under "britney-0". The second is "britney-1", etc. With a good hashing algorithm the various entries should then be spread more or less evenly around the entire keyspace. This also avoids collisions in Freenet, which can only have one item called "britney". This approach has some problems. First of all, it makes inserts slow, as you have to find the highest value of x. There are some obvious techniques for speeding this up so that it scales. Additionally, x will eventually get really large and you will have to store it in a bignum, which is really obnoxious. Also, in Freenet things fall out, which makes it difficult to determine the highest value of x. The more sparse the collection of items in the keyword index is, the less likely you are to determine the correct value for x. The solution we're currently using for this is to have date-based indices. Each key has a timestamp for the day that it was inserted. So the key format is prefix-timestamp-number. The advantage to this is that the numbering starts over every day. Inserting doesn't take as long, you don't run out of numbers, and the probability of getting the right value for x is high. Each item still has a unique identifier so that the items are spread evenly over the keyspace. Personally, I don't really like keyword searching. I'm much more fond of metadata searching based on fields with semantic content, such as Title, Author, Publication Date, etc. So the searching system which I'm going to be presenting is for doing that rather than keyword searching. However, it's really the same idea, just slightly modified.
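The prefix-timestamp-number scheme can be sketched in a few lines. This is an illustration under stated assumptions, not Freenet code: a plain dict stands in for the network's key-value store, insertion probes for the lowest free number, and lookup walks the numbers until the first miss (which is exactly why dropped entries make the probe unreliable in a real Freenet):

```python
import hashlib
from datetime import date

store = {}  # stand-in for the network: hashed key -> value

def _key(keyword, day, n):
    # "prefix-timestamp-number", then hashed so that entries scatter
    # evenly over the keyspace instead of piling up at H("britney").
    raw = f"{keyword}-{day.isoformat()}-{n}"
    return hashlib.sha1(raw.encode()).hexdigest()

def insert(keyword, value, day=None):
    """File value under the lowest free number for (keyword, day)."""
    day = day or date.today()
    n = 0
    while _key(keyword, day, n) in store:  # find the highest-used x + 1;
        n += 1                             # numbering restarts each day
    store[_key(keyword, day, n)] = value
    return n

def lookup(keyword, day):
    """Fetch every entry filed under keyword on the given day."""
    results, n = [], 0
    while _key(keyword, day, n) in store:
        results.append(store[_key(keyword, day, n)])
        n += 1
    return results
```

For example, two same-day inserts under "britney" land at numbers 0 and 1, and a lookup on a different day finds nothing, since each day's index starts empty. In a network where entries can fall out, a gap at some n would hide everything filed after it, which is the sparseness problem described above.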
Also, in a Chord system timestamps aren't strictly necessary as data doesn't fall out. In such a system I think you could do some neat things with assigning numbers to the items so as to balance lookup and insert times. From blanu at uts.cc.utexas.edu Tue Jun 19 00:21:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > Okay, so now we agree that for there to be a good distributed mutable > namespace, all the objects have to be self-authenticating and control over > mapping must be local to the holder of the relevant secret key. What does this > have to do with searching? I do indeed agree. I can relate this quite easily to searching using a Freenet example. The problem with the namespace system which I just posted is that it uses human-readable keys, which are of course not self-verifying. Unfortunately, there is no way to allow for public submission of information which can be verified, since verification requires that you have some information (hash or public key) and the whole idea of public submission is that you have neither of these. So anything publicly submitted can be corrupted by evil nodes in the network. Therefore, it is very useful to have trusted individuals sort through the submissions and make signed copies of them. My conception of the searching system which I'm advocating for Freenet is that there will be a place for insecure public submissions. Several "search engines" will provide a service of scanning these submissions, verifying, sorting, and ranking them automatically or manually, and then placing them in new indices which are signed. If you trust one of these engines then you can just search their indices. If you don't trust anyone or fear that they are censoring content when they verify it, then the raw, insecure submissions are still there for you to look at.
Though I'm working on a searching system for Freenet, these issues are pretty much the same for any system. From zooko at zooko.com Tue Jun 19 08:07:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Brandon of "Tue, 19 Jun 2001 02:20:33 CDT." References: Message-ID: Brandon wrote: > > I do indeed agree. I can relate this quite easily to searching using a > Freenet example. The problem with the namespace system which I just > posted is that it uses human-readable keys, which are of course not > self-verifying. > > Unfortunately, there is no way to allow for public submission of > information which can be verified since verification requires you have > some information (hash or public key) and the whole idea of public > submission is that you have neither of these. Hm. I don't think I understand what "public submission" is. (At least not in the context of distributed namespaces...) Why not require the client to generate an RSA key pair in order to submit the information? The major drawback that I can see is that it makes the key non-human-writeable/memorable. Is that what you mean when you say that public submission requires that you don't have a public key? By the way, I'm surprised that I forgot to reference the Pet Names Markup Language[1] in my previous "overview of namespaces" letter. The Pet Names Markup Language is a way to map human-memorable names onto the kind of distributed mutable namespace that I am imagining: self-authenticating, each mapping controlled by its private key, and a third quality that I didn't mention yet: that namespaces can be transitively linked. This kind of namespace is probably best described in the papers on SDSI -- the Simple Distributed Security Infrastructure by Ron Rivest (the "R" in "RSA") [2, 3].
When reading the PNML web page, you can obviously envision different user interfaces and encodings and so forth, but the basic structure of pet names, suggested pet names, translation on introduction, etc. is an excellent and elegant solution to the problem. I can't recommend the SDSI/PNML concepts highly enough. In my opinion, if you design a distributed mutable namespace, you ought to either use SDSI/PNML, or have a good reason why you chose not to. I'm still not entirely clear on the relation between these namespace issues and distributed keyword search schemes, so please write back! Regards, Zooko [1] "Lambda for Humans -- The Pet Name Markup Language" Mark Miller http://www.erights.org/elib/capability/pnml.html [2] "SDSI -- A Simple Distributed Security Infrastructure" 1996 Ronald L. Rivest, Butler Lampson http://citeseer.nj.nec.com/rivest96sdsi.html [3] "On SDSI's Linked Local Name Spaces" 1998 Martín Abadi http://citeseer.nj.nec.com/abadi98sdsis.html From wesley at felter.org Tue Jun 19 12:01:02 2001 From: wesley at felter.org (Wesley Felter) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: On Mon, 18 Jun 2001 zooko@zooko.com wrote: > Roger Dingledine wrote: > > > > Let's say I have a Chord (or any consistent hashing) service for which > > I can do hash lookups -- that is, I can fetch a document in O(lg n) > > hops once I know what it's called. > > I think of this as two separate problems: immutable namespaces and mutable > namespaces. Keyword searching must be implemented atop a mutable namespace > (because immutable namespaces have keys that are not human-readable.) Hey, back up a second there. What exactly is the difference between mutable and immutable namespaces? 
If you mean the mutability of the mappings themselves (so in an immutable namespace, a name would always map to the same thing forever (where "the same thing" is left undefined for the moment)), I don't see why keys in an immutable namespace must be non-human-readable. Wesley Felter - wesley@felter.org - http://felter.org/wesley/ From zooko at zooko.com Tue Jun 19 12:40:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Wesley Felter of "Tue, 19 Jun 2001 14:11:26 CDT." References: Message-ID: Wes Felter wrote: > > > I think of this as two separate problems: immutable namespaces and mutable > > namespaces. Keyword searching must be implemented atop a mutable namespace > > (because immutable namespaces have keys that are not human-readable.) > > Hey, back up a second there. What exactly is the difference between > mutable and immutable namespaces? If you mean the mutability of the > mappings themselves (so in an immutable namespace, a name would always map > to the same thing forever (where "the same thing" is left undefined for > the moment)), I don't see why keys in an immutable namespace > must be non-human-readable. Good question. The answer is that there was an implicit extra requirement in there: that the namespace be perfectly distributed (i.e. nobody ever disagrees about what value a given key should map to, nor does anyone find themselves hindered from entering a new value into the namespace). The only way, AFAIK, to achieve a *perfectly distributed* namespace, is for it to be an immutable namespace, and for it to be based on collision-free hashes and hence have non-human-memorable keys. 
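A hash-based immutable namespace of this kind can be sketched in a few lines. This is an illustrative toy (SHA1 chosen because the thread mentions it; all names are hypothetical), showing why everyone agrees on the mapping and why the keys are not human-memorable:

```python
import hashlib

def key_for(value: bytes) -> str:
    """In an immutable namespace the key *is* the collision-free hash,
    so every peer independently computes the same key for the same value."""
    return hashlib.sha1(value).hexdigest()

def verify(key: str, value: bytes) -> bool:
    """Self-authentication: any peer can check a value against its key
    without trusting the node that served it."""
    return key_for(value) == key

doc = b"some document"
k = key_for(doc)
assert verify(k, doc)                 # the genuine object checks out
assert not verify(k, b"tampered")     # a node serving bogus data is caught
# ...but k is a 40-hex-digit string, which nobody is going to memorize.
```

There is nothing for anyone to disagree about here, since the key-to-value relation is fixed by the hash function itself; the price, as noted above, is that the keys carry no human meaning.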
Furthermore, there *can't* be any namespace that satisfies my idea of "perfectly distributed" and has human-memorable keys, because there would immediately be conflicting preferences about what the key "mcdonalds.com" mapped to in that namespace, violating one of my criteria for "perfectly distributed". (As a side note, the only reason we are having this problem is that human brains are built to remember collections of only seven plus or minus two arbitrary symbols, and ASCII only offers about 7 bits per symbol, for a grand total of about 63 bits per memorable word. I sometimes wonder what would happen if we all used a so-called "ideographic" script like Chinese, and had maybe 20 reliably distinguishable bits per symbol, and could encode a whole 160 bit unique id into 8 symbols. In real life, I doubt that readers of Chinese can reliably distinguish anywhere near that many symbols...) Regards, Zooko From blanu at uts.cc.utexas.edu Tue Jun 19 12:47:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > > Unfortunately, there is no way to allow for public submission of > > information which can be verified since verification requires you have > > some information (hash or public key) and the whole idea of public > > submission is that you have neither of these. > > Hm. I don't think I understand what "public submission" is. (At least not in > the context of distributed namespaces...) Why not require the client to > generate an RSA key pair in order to submit the information? A public key doesn't mean anything if you don't have it *before* the submission. The search system I'm advocating requires keys which are not necessarily human-readable, but rather *guessable* so that they can be automatically retrieved by a client.
If you have everyone's public keys beforehand then you can search these various keyspaces for items. If you don't have someone's public key then you can't find any of their items at all. So you require either out-of-band public key distribution, or an insecure public space for the submission of public keys. So if you want people that you don't already know to submit entries that you can read, then you need an insecure public space at some point in the process. From wesley at felter.org Tue Jun 19 12:48:01 2001 From: wesley at felter.org (Wesley Felter) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: On Tue, 19 Jun 2001 zooko@zooko.com wrote: > Good question. The answer is that there was an implicit extra requirement in > there: that the namespace be perfectly distributed (i.e. nobody ever disagrees > about what value a given key should map to, nor does anyone find themselves > hindered from entering a new value into the namespace). Didn't Raph Levien write a paper about how to build a distributed namespace with arbitrary keys (given out FCFS)? Wesley Felter - wesley@felter.org - http://felter.org/wesley/ From zooko at zooko.com Tue Jun 19 12:56:02 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Brandon of "Tue, 19 Jun 2001 14:46:17 CDT." References: Message-ID: Brandon wrote: > > > Hm. I don't think I understand what "public submission" is. (At least not in > > the context of distributed namespaces...) Why not require the client to > > generate an RSA key pair in order to submit the information? > > A public key doesn't mean anything if you don't have it *before* the > submission. 
The search system I'm advocating requires keys which are not > necessarily human-readable, but rather *guessable* so that they can be > automatically retrieved by a client. Ahh. Very interesting. Hm. But however the submitted data travels to you, or however your query is transmitted to it, one could do transitive introductions along that same path. Does that solve the problem? Regards, Zooko From zooko at zooko.com Tue Jun 19 13:09:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Wesley Felter of "Tue, 19 Jun 2001 14:59:05 CDT." References: Message-ID: Wes wrote: > > > Good question. The answer is that there was an implicit extra requirement in > > there: that the namespace be perfectly distributed (i.e. nobody ever disagrees > > about what value a given key should map to, nor does anyone find themselves > > hindered from entering a new value into the namespace). > > Didn't Raph Levien write a paper about how to build a distributed > namespace with arbitrary keys (given out FCFS)? I don't know about this. I would love to get a URL to it! Note that the consistent-hashing based distributed search hack[1] that we implemented but do not use for Mojo Nation was originally suggested by Raph in a post to advogato.org. Regards, Zooko [1] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/mojonation/evil/common/ContentHandicappers.py?content-type=text/plain From alk at pobox.com Tue Jun 19 14:40:01 2001 From: alk at pobox.com (Tony Kimball) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) References: Message-ID: <15151.50838.590836.759462@spanky.love.edu> Quoth zooko@zooko.com on Tuesday, 19 June: : > > Keyword searching must be implemented atop a mutable namespace : > > (because immutable namespaces have keys that are not human-readable.) 
: Good question. The answer is that there was an implicit extra requirement in : there: that the namespace be perfectly distributed (i.e. nobody ever disagrees : about what value a given key should map to, nor does anyone find themselves : hindered from entering a new value into the namespace). : : The only way, AFAIK, to achieve a *perfectly distributed* namespace, is for it : to be an immutable namespace, and for it to be based on collision-free hashes : and hence have non-human-memorable keys. Your discussion appears to confuse human memorability and human generability. Whether I can remember a key without an external memory aid is operationally insignificant, because I do in fact possess external memory aids. : ... there *can't* be any namespace that satisfies my idea of : "perfectly distributed" and has human-memorable keys, because there would : immediately be conflicting preferences about what the key "mcdonalds.com" : mapped to in that namespace... One way to solve this problem is to make the namespace bigger, using enough keys in combination to ensure the injectivity of the relation. But it seems to me that making the keys random 160-bit integers won't reduce the probability of conflicting preferences, because it is the semantic value of the mapping that is the issue in contention, not the uninterpreted association of random integers. As long as the map has a semantic value, and is operationally significant in the context of some application, such conflicts may arise. From blanu at uts.cc.utexas.edu Tue Jun 19 19:11:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > Hey, back up a second there. What exactly is the difference between > mutable and immutable namespaces?
If you mean the mutability of the > mappings themselves (so in an immutable namespace, a name would always map > to the same thing forever (where "the same thing" is left undefined for > the moment)), I don't see why keys in an immutable namespace > must be non-human-readable. In a decentralized system, each node in the network has the ability to mess with the data. Therefore, all items must be self-verifying in order to guarantee that they are immutable. If some node in the network modifies the data then obviously it's not immutable. I suppose you could also come up with a design where you only talked to nodes that you somehow trusted would honor the contract to not modify immutable data. So I guess the answer is that you have to either trust the network or the data has to be self-verifying. From blanu at uts.cc.utexas.edu Tue Jun 19 19:13:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > Didn't Raph Levien write a paper about how to build a distributed > namespace with arbitrary keys (given out FCFS)? Wow! I believe that to be theoretically impossible. So I'd love to read that paper. Maybe he was using a different definition of distributed than I am. From blanu at uts.cc.utexas.edu Tue Jun 19 19:26:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > Hm. But however the submitted data travels to you, or however your query is > transmitted to it, one could do transitive introductions along that same path. > > Does that solve the problem? Unfortunately, no. In order for this to work, the public key would have to be broadcast to the entire network. The basic problem is that the only way that a producer and a consumer can communicate is by inserting things into the network.
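[Editor's note: the "self-verifying" property Brandon keeps invoking reduces to a content-hash key, where the key *is* the hash of the data. This is a minimal sketch of that idea only; real Freenet keys are more involved, and the names below are invented for illustration.]

```python
import hashlib

# Minimal sketch of "self-verifying" data: the key is the hash of the
# content (a CHK-style scheme; real Freenet keys are more involved).

def make_key(content):
    return hashlib.sha1(content).hexdigest()

def verify(key, content):
    # Any recipient re-hashes what it received; a node that tampered
    # with the data cannot also forge a matching key.
    return make_key(content) == key

original = b"picture of a kitten"
key = make_key(original)
assert verify(key, original)           # untouched data checks out
assert not verify(key, b"bogus data")  # modification is detected
```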
A very minuscule amount of OOB communication is required as well: the name of an index ("britney" in my example). The name of the index could be automatically determined by the client, in which case the OOB information is the algorithm encoded in the client. Anyway, they have to have some way to agree on some keys beforehand. The only way for them to exchange information is for the producer to insert information at the predetermined keys and for the consumer to look for it there. In the particular case of searching based on keywords, random people that don't know each other are inserting and requesting things from these globally agreed upon keys using the keyword->key algorithm shared by all of the clients. In order for this to be secure, the public key of the producer must be known to the consumer before the search. This is because in order for the content at a particular key to not be modifiable by intervening nodes, there must be a relationship between the contents of the file and the key itself. There's no way that the client can automatically generate keys based on some algorithm so that the keys generated are related to the producer's public key if the client doesn't know the producer's public key. I hope that made sense. :-) If not, I can try again. ;-) From jeff at platypus.ro Tue Jun 19 19:29:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) References: Message-ID: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> > In a decentralized system, each node in the network has the ability to > mess with the data. Being decentralized does not necessarily imply that *any* node can modify *any* data. That is merely one form of decentralization, and for the very reasons we're discussing it might not be the ideal form.
Partitioning, hierarchy and delegation can and often do have their place in decentralized systems, so long as globally necessary roles are not permanently bound to particular nodes. From Verbatim3D at aol.com Tue Jun 19 19:31:01 2001 From: Verbatim3D at aol.com (Verbatim3D@aol.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent ... Message-ID: <29.1684638f.286164a9@aol.com> hello how may i reach you ? From Verbatim3D at aol.com Tue Jun 19 19:35:01 2001 From: Verbatim3D at aol.com (Verbatim3D@aol.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent ... Message-ID: <115.899f11.286165ac@aol.com> I ment like Aol , Aim , Icq .... Ect From blanu at uts.cc.utexas.edu Tue Jun 19 19:48:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> Message-ID: > > In a decentralized system, each node in the network has the ability to > > mess with the data. > > Being decentralized does not necessarily imply that *any* node can modify > *any* data. That is merely one form of decentralization, and for the very > reasons we're discussing it might not be the ideal form. Partitioning, > hierarchy and delegation can and often do have their place in decentralized > systems, so long as globally necessary roles are not permanently bound to > particular nodes. I was actually thinking just that when I wrote that, but it didn't quite come out right. I meant to say something more like each node has the ability to modify the data if it gets the opportunity. Any non-self-verifying data can be modified by any node which it passes through. Various checks may detect, repair, or prevent this. Centralization, for instance, is an architecture with a naive method for dealing with evil nodes.
You just have one node and hope that it's not evil. Totally decentralized systems are not so naive as to assume that you trust *all* nodes in the network. So there are various schemes to get the data from nodes you trust, not giving evil nodes the opportunity to modify the data (as opposed to the innate ability to, which they still have). Or, as with Freenet, you can opt to use mostly self-verifying data and not care about where you get the data from. So, to sum up, self-verifying data can't be modified by any node. Non-self-verifying data can be modified by any node it passes through. Various architectures route the data differently and have different ways of dealing with evil nodes. From blanu at uts.cc.utexas.edu Tue Jun 19 19:48:03 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent ... In-Reply-To: <29.1684638f.286164a9@aol.com> Message-ID: > hello how may i reach you ? Who are you asking? From zooko at zooko.com Tue Jun 19 19:55:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] no more In-Reply-To: Message from Brandon of "Tue, 19 Jun 2001 21:47:39 CDT." References: Message-ID: I removed him from the p2p-hackers subscribers and sent him a polite private e-mail suggesting that this wasn't the forum he was looking for. Regards, Zooko (No, in case someone was going to ask, I will not do the same for Oskar...) From oskar at freenetproject.org Wed Jun 20 03:33:03 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?)
In-Reply-To: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Tue, Jun 19, 2001 at 10:28:27PM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> Message-ID: <20010620123459.B525@sandbergs.org> On Tue, Jun 19, 2001 at 10:28:27PM -0400, Jeff Darcy wrote: > > In a decentralized system, each node in the network has the ability to > > mess with the data. > > Being decentralized does not necessarily imply that *any* node can modify > *any* data. That is merely one form of decentralization, and for the very > reasons we're discussing it might not be the ideal form. Partitioning, > hierarchy and delegation can and often do have their place in decentralized > systems, so long as globally necessary roles are not permanently bound to > particular nodes. I guess this is semantics, but aren't hierarchical and decentralized as close to antonymous design processes as one can come? It is true that a hierarchy doesn't have to have one top, but a center does not have to contain one peer either. It would seem to me a hierarchical system could be distributed (DNS), but hardly decentralized (and that the center is mobile doesn't really seem to matter). Somebody linked a paper that defined the terms regarding anonymity and untraceability a couple of weeks ago, which should mercifully spare us the semantic bantering regarding that - anybody know of a similar paper regarding descriptions of system topologies? -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From jeff at platypus.ro Wed Jun 20 07:05:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?)
References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> Message-ID: <087501c0f991$ebb58590$367b9fa8@lss.emc.com> Oskar Sandberg: > I guess this is semantics, but aren't hierarchical and decentralized as > close to antonymous design processes as one can come? Not really. A truly centralized system is topologically a star. A hierarchical system is topologically a tree, which is not at all the same thing as a star and can therefore be considered decentralized. If the root of the tree only handles one request out of a thousand because the rest have been fully delegated elsewhere, then that can be pretty scalable and that's all the benefit many people hope to derive from decentralizing, so it's a pretty useful distinction. > It is true that a > hierarchy doesn't have to have one top, but a center does not have to > contain one peer either. The topology most people probably think of when they're thinking about decentralization is a mesh. What I think many "flat-earth" P2P folks miss is that meshes, trees, and DAGs are very closely related. Consider IP routing, for example. All of the routes through the network might form a mesh, but the routes to one destination at a particular point in time (modulo a few update-propagation issues) form a tree or DAG. When one considers multiply-rooted trees the similarities become even stronger. When you get right into it, superficially tree-structured and superficially mesh-structured systems can be devilishly difficult to tell apart. It's like those images where if you look at it one way you see a vase but then you look again you see two people about to kiss. As I alluded to in my last message, the important thing in decentralization is not that every last node be topologically or functionally equal to every other, but that roles not be permanently assigned to particular nodes.
Temporary assignment of roles is fine, which is why so many P2P systems are sprouting "supernodes" and "brokers" and such all over the place. As long as the roles can be (re)assigned automagically when *the system* (not an administrator) detects that it is good or necessary to do so, the system is decentralized in pretty much all of the ways that matter. > It would seem to me a hierarchical system could > be distributed (DNS), but hardly decentralized The distinction I would make, based on having worked in this space for over a decade, is that the superset of distributed systems still allows the sort of permanent role assignment I was just talking about, while the subset of decentralized systems does not. Many people use the term "fully distributed" to mean "decentralized" but the distinction they're making is usually the same. > (and that the center is > mobile doesn't really seem to matter). The mobility of the center really *does* matter. The problem with having a center is that if it fails or slows down or is compromised then the entire system fails or slows down or is compromised. If the system is designed so that centers can be moved or created as needed, and/or so that failures of various kinds are contained so that they do not affect everyone, that's the most important single distinction to be made between centralized vs. decentralized systems. (Yes, that means a cluster can be a decentralized system. The cluster infrastructure I've worked on is, within itself, as "pure P2P" as anything I've ever seen, with all of the resource location/migration and coordination issues that such headlessness implies.) There's even a danger in being too extremist about levels of decentralization. Some of the "flat-earth" systems don't really solve the problems associated with centralization, and are just as vulnerable to bottlenecks or catastrophic cascading failures as any centralized system ever was. Look at what happened to Gnutella before reflectors. 
All that some of these systems do is make it harder to identify or remedy the source of such problems, while also adding a whole new class of problems and failures related to all those useless extra levels of indirection. IMO that much focus on ideology is poison. My own system is probably not "fully decentralized" enough for some people, but they can just take a flying leap at a rolling doughnut. I could make it more fully decentralized if I wanted, but not just for the sake of an ideal. My goal is to meet certain requirements - efficiency, robustness, security (but not anonymity) - and if further decentralization does not help me meet that goal then I'm not interested. Nobody should ever assume that the solution to their problem is the solution to everyone else's. > Somebody linked a paper that defined the terms regarding anonymity and > untracability a couple of weeks ago, which should mercifully spare us > the semantic bantering regarding that - anybody know of a similiar paper > regarding descriptions of system topologies? There are a lot of people out there offering definitions. Unfortunately, I don't think there's any one that could be considered authoritative. Of course, achieving universal agreement is a known problem in decentralized systems so perhaps we should just deal with it. ;-) From raph at levien.com Wed Jun 20 08:17:01 2001 From: raph at levien.com (Raph Levien) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Secure namespaces in p2p networks Message-ID: <20010620010334.D15523@levien.com> Hi p2p-hackers, Zooko clued me into this thread. Yes, I have a paper on how to build secure distributed namespaces using p2p networks. Get your copy here: http://www.levien.com/fc.ps This was submitted to FC '00, but in the infinite wisdom of the reviewers, rejected. 
Interestingly enough, this design includes the Advogato trust metric as a technique for defeating a particular kind of attack - in particular, trying to flood the network with tons of servers, most likely virtual. I believe there are many more interesting applications of the trust metric ideas to p2p networks, including content selection and spam-resistant e-mail. I'm beginning work again on my thesis. Best place to look for updates on that is my Advogato diary: http://www.advogato.org/person/raph/ Raph From zooko at zooko.com Wed Jun 20 08:27:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] names of kinds of topology In-Reply-To: Message from "Jeff Darcy" of "Wed, 20 Jun 2001 10:04:28 EDT." <087501c0f991$ebb58590$367b9fa8@lss.emc.com> References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> Message-ID: Here are two common "design patterns" in network topology that I frequently encounter and I wish I had a specific name for. (For these topologies, "decentralized" is neither specific nor unambiguous, but then neither is "centralized".) 1. There are many servers. Service happens bilaterally between client and server. For clients to interact with each other, they must use the *same* server. Clients can use multiple servers, aggregating results from multiple servers and dynamically choosing which servers to use. Mojo Nation's "content tracking" architecture is currently of this model. 2. There are many servers. Service happens bilaterally between client and server. For clients to interact with each other, they must use the *same* server. Servers aggregate results from other servers. Clients can only use one server, and they choose which server to use. IRC is of this model. 
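[Editor's note: the two topologies above can be rendered as a toy model. The server names and data below are invented for illustration; no real protocol is modeled.]

```python
# A toy model of Zooko's two topologies.

servers = {                      # each server holds only its own entries
    "s1": {"apple", "banana"},
    "s2": {"banana", "cherry"},
    "s3": {"date"},
}

# Model 1 (Mojo Nation-style content tracking): the *client* picks
# several servers and merges their results itself.
def client_aggregate(chosen):
    results = set()
    for s in chosen:
        results |= servers[s]
    return results

# Model 2 (IRC-like): the client talks to a *single* server, and the
# servers pool results among themselves on its behalf.
def server_aggregate(entry_server):
    results = set(servers[entry_server])
    for other in servers:
        if other != entry_server:
            results |= servers[other]   # server-to-server exchange
    return results

assert client_aggregate(["s1", "s2"]) == {"apple", "banana", "cherry"}
assert server_aggregate("s3") == {"apple", "banana", "cherry", "date"}
```

The structural difference is simply where the merge loop runs: in the client (model 1) or in the server (model 2).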
Jukka Santala had a patch that allowed Mojo Nation "Content Trackers" to suck information out of each other (acting as clients in that transaction), which would change the topology of Mojo Nation content tracking to include servers aggregating information from one another as well as clients aggregating information from multiple servers. So anyway, what are the names for these two topologies? Regards, Zooko From oskar at freenetproject.org Wed Jun 20 08:27:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: <087501c0f991$ebb58590$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Wed, Jun 20, 2001 at 10:04:28AM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> Message-ID: <20010620172900.E525@sandbergs.org> On Wed, Jun 20, 2001 at 10:04:28AM -0400, Jeff Darcy wrote: > Oskar Sandberg: <> > > It is true that a > > hierarchy doesn't have to have one top, but a center does not have to > > contain one peer either. > > The topology most people probably think of when they're thinking about > decentralization is a mesh. What I think many "flat-earth" P2P folks miss > is that meshes, trees, and DAGs are very closely related. Consider IP > routing, for example. All of the routes through the network might form a > mesh, but the routes to one destination at a particular point in time > (modulo a few update-propagation issues) is a tree or DAG. When one > considers multiply-rooted trees the similarities become even stronger. When > you get right into it, superficially tree-structured and superficially > mesh-structured systems can be devilishly difficult to tell apart. It's > like those images where if you look at it one way you see a vase but then > you look again you see two people about to kiss.
Yes, a good example of this is the Plaxton et al. [1] system, which the paper describes as a set of trees, but which can also be viewed as a slightly relaxed hyperdimensional mesh. I would think that any system which allows a searching algorithm must be describable as a set of trees (just follow the routes). However, there is an important difference. In the sort of systems that I think of as truly decentralized, like Plaxton, Chord [2], or (assuming for a moment that it actually works) Freenet, the roots to the trees are spread over all the nodes, whereas a system with a hierarchy would have the tree roots concentrated in a subset of the peers. If such a system is still considered decentralized, then I guess we need a new term for the other type - the only synonym that my Roget's lists for decentralized is deconcentrated, so I'll propose that. > As I alluded to in my last message, the important thing in decentralization > is not that every last node be topologically or functionally equal to every > other, but that roles not be permanently assigned to particular nodes. > Temporary assignment of roles is fine, which is why so many P2P systems are > sprouting "supernodes" and "brokers" and such all over the place. As long > as the roles can be (re)assigned automagically when *the system* (not an > administrator) detects that it is good or necessary to do so, the system is > decentralized in pretty much all of the ways that matter. For lack of a better term I have been referring to the networks which use "supernodes", "brokers", "reflectors", etc. as square-root networks, because by making sqrt(N) of the nodes "supernodes" you have O(g=sqrt(N)) growth in network traffic (per node) and O(h=sqrt(N)) growth in the tables of each of them (obviously you can shift those to functions, but you'll need to keep g*h = N).
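[Editor's note: Oskar's g*h = N constraint can be tabulated at its three natural design points. The g/h naming follows his post; the function below is mine and purely illustrative.]

```python
import math

# A toy rendering of the g*h = N tradeoff for square-root networks:
# g ~ per-node traffic growth, h ~ per-supernode table size.

def designs(n):
    root = math.isqrt(n)
    return {
        "broadcast (g=N, h=1)": (n, 1),
        "centralized (g=1, h=N)": (1, n),
        "square-root (g=h=sqrt(N))": (root, root),
    }

N = 10_000
for name, (g, h) in designs(N).items():
    assert g * h == N  # the invariant: traffic growth * table size covers N
    print(f"{name}: traffic ~{g}, table ~{h}")
```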
Personally I have been frowning at these designs and have certainly not considered them decentralized (from now on I will frown at these designs and not consider them deconcentrated :-) ). (btw, what is interesting about the square-root networks to me is that they seem to be approached from two directions, both from the systems that were previously broadcast (that is g=N, h=1) but that sank under the load, and from systems that were previously completely centralized (that is g=1, h=N) but sank under the load on the central server or attacks on it.) Of course, I can't say that deconcentration is a necessary characteristic for every set of goals, or for sure that having the root subset be dynamic is not enough even when strict survivability is called for. I can't even say that deconcentration is good, just that it appeals to me. <> > > (and that the center is > > mobile doesn't really seem to matter). > > The mobility of the center really *does* matter. The problem with having a > center is that if it fails or slows down or is compromised then the entire > system fails or slows down or is compromised. If the system is designed so > that centers can be moved or created as needed, and/or so that failures of > various kinds are contained so that they do not affect everyone, that's the > most important single distinction to be made between centralized vs. > decentralized systems. (Yes, that means a cluster can be a decentralized > system. The cluster infrastructure I've worked on is, within itself, as > "pure P2P" as anything I've ever seen, with all of the resource > location/migration and coordination issues that such headlessness implies.) OK, but once you have a center/proper root subset (PRS) you open up a whole can of worms regarding making sure that this cannot be abused, both by people trying to use the limited set of nodes as an Achilles heel of the network, and by the peers in the PRS itself.
Certainly having the PRS be dynamic and mobile is one of the methods by which this can be combated, but, while probably necessary, it is not the only such measure, which is why I wanted to separate these networks, and the means necessary to secure them, from the deconcentrated ones. > There's even a danger in being too extremist about levels of > decentralization. Some of the "flat-earth" systems don't really solve the > problems associated with centralization, and are just as vulnerable to > bottlenecks or catastrophic cascading failures as any centralized system > ever was. Look at what happened to Gnutella before reflectors. All that > some of these systems do is make it harder to identify or remedy the source > of such problems, while also adding a whole new class of problems and > failures related to all those useless extra levels of indirection. IMO that > much focus on ideology is poison. My own system is probably not "fully > decentralized" enough for some people, but they can just take a flying leap > at a rolling doughnut. I could make it more fully decentralized if I > wanted, but not just for the sake of an ideal. My goal is to meet certain > requirements - efficiency, robustness, security (but not anonymity) - and if > further decentralization does not help me meet that goal then I'm not > interested. Nobody should ever assume that the solution to their problem is > the solution to everyone else's. I think the problem is mostly semantic differences perceived as dogma. If my definition of decentralized were what I have now decided is deconcentrated, then me saying that your network was not decentralized need not be taken as an insult or attack on your work. Same thing with terms like scalable (comparatively limited scalability is a perfectly rational design tradeoff).
<> > > Somebody linked a paper that defined the terms regarding anonymity and > > untraceability a couple of weeks ago, which should mercifully spare us > > the semantic bantering regarding that - anybody know of a similar paper > > regarding descriptions of system topologies? > > There are a lot of people out there offering definitions. Unfortunately, I > don't think there's any one that could be considered authoritative. Of > course, achieving universal agreement is a known problem in decentralized > systems so perhaps we should just deal with it. ;-) Well, I think one of the main reasons this list was started was that when a lot of us met in SF, we found that we had all developed a different set of vocabulary for the same things, so setting down some authoritative definitions, at least between one another, would certainly be using this list to good effect. We couldn't honestly refer to ourselves as p2p-_hackers_ if we just sat around complaining about what we do not have. Anybody feel like starting a p2p-hackers-dictionary somewhere on the web? -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From oskar at freenetproject.org Wed Jun 20 08:34:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?)
In-Reply-To: <20010620172900.E525@sandbergs.org>; from oskar@freenetproject.org on Wed, Jun 20, 2001 at 05:29:00PM +0200 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> Message-ID: <20010620173604.F525@sandbergs.org> On Wed, Jun 20, 2001 at 05:29:00PM +0200, Oskar Sandberg wrote: > On Wed, Jun 20, 2001 at 10:04:28AM -0400, Jeff Darcy wrote: > > Oskar Sandberg: I forgot my references (maybe the p2p-hackers dictionary should also have a reference list): [1] C. Plaxton, R. Rajaraman, A. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proc. of ACM SPAA, June 1997. [2] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. Submission to ACM SIGCOMM, 2001. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From hal at finney.org Wed Jun 20 09:14:01 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] names of kinds of topology Message-ID: <200106201605.JAA32133@finney.org> Zooko writes: > 2. There are many servers. Service happens bilaterally between client and > server. For clients to interact with each other, they must use the *same* > server. Servers aggregate results from other servers. Clients can only use > one server, and they choose which server to use. > > IRC is of this model. I thought IRC allowed users connected to different servers to talk, as long as the servers were connected together? That's what netsplits were, when servers would get disconnected and you'd lose access to the users on the other servers. Has it changed?
Hal From hal at finney.org Wed Jun 20 09:21:01 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) Message-ID: <200106201612.JAA32168@finney.org> Jeff writes: > Not really. A truly centralized system is topologically a star. A > hierarchical system is topologically a tree, which is not at all the same > thing as a star and can therefore be considered decentralized. A star is a kind of tree, one where everyone connects to the same root, a tree of depth 1. So you can have various degrees of centralization in a tree. Some trees are more equal than others. Hal From zooko at zooko.com Wed Jun 20 09:26:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] names of kinds of topology In-Reply-To: Message from hal@finney.org of "Wed, 20 Jun 2001 09:05:36 PDT." <200106201605.JAA32133@finney.org> References: <200106201605.JAA32133@finney.org> Message-ID: Hal Finney wrote: > > Zooko writes: > > 2. There are many servers. Service happens bilaterally between client and > > server. For clients to interact with each other, they must use the *same* > > server. Servers aggregate results from other servers. Clients can only use > > one server, and they choose which server to use. > > > > IRC is of this model. > > I thought IRC allowed users connected to different servers to talk, > as long as the servers were connected together? That's what netsplits > were, when servers would get disconnected and you'd lose access to the > users on the other servers. Has it changed? My fault. You are right that IRC users can talk to each other through intermediate servers without having to use the same server. Regards, Zooko From jeff at platypus.ro Wed Jun 20 09:35:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
References: <200106201612.JAA32168@finney.org> Message-ID: <08ac01c0f9a6$e0d1d830$367b9fa8@lss.emc.com> > A star is a kind of tree, one where everyone connects to the same root, > a tree of depth 1. That's true, but not useful. Yes, a star is a special kind of tree. It's also a special kind of mesh. That doesn't make stars and trees and meshes equivalent in any practical kind of way. It's like saying a doughnut is the same as a coffee cup because they're both genus one, but if you try and pour your coffee into the wrong one you'll learn the difference real fast. From jeff at platypus.ro Wed Jun 20 12:14:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> Message-ID: <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> > However, there is an important difference. In the sort of the systems > that I think about as truly decentralized like Plaxton, Chord [2], or > (assuming for a moment that it actually works) Freenet, the roots to the > trees are spread over all the nodes, where as a system with a hierarchy > would have the tree roots concentrated to a subset of the peers. I think we need to consider time as part of the taxonomy. Permanent concentration is not the same as transient concentration, which in turn is not the same as no concentration at all. > If such > a system is still considered decentralized, then I guess we need a new > term for the other type - the only synonym that my Roget's lists for > decentralized is deconcentrated, so I'll propose that. FWIW, I call them "flat" decentralized systems. 
> For lack of a better term I have been refering to the networks which use > "supernodes","brokers", "reflectors" etc as square-root networks, > because by making sqrt(N) of the nodes "supernodes" you have > O(g=sqrt(N)) growth in network traffic (per node) and O(h=sqrt(N)) > growth in the tables of each of them (obviously you can shift those to > functions, but you'll need to keep g*h = N). That formula simply doesn't work. Total traffic remaining constant, traffic per node is proportional to mean path length - the measure most often used by people who actually study routing. Let's look at a few topologies and see how table size and path length are affected. (1) For a fully-connected network, path length is always 1 and table size is always N. There's our starting point. (2) For a star, path length is always 2 and table size is always 1, except for the hub where they're 1 and N respectively. The averages both work out to 2, for a constant value of 4 regardless of network size. Hmmm. (3) For a two-level hierarchy (sqrt(N) pools of sqrt(N) nodes, with one "gateway" per pool), the mean path length asymptotically increases toward 3 as N increases, and the mean table size likewise decreases toward sqrt(N). For large N, therefore, our product would be 3*sqrt(N). I'm not doing this just to pick nits, either. It's important. The relationship you state between table size and path length (or traffic per node) only applies for fully-connected or broadcast networks. The search behavior of several well-known networks does fit this model and makes it relevant, but for many other purposes or structures it's a total red herring. It *is* possible to maintain short path lengths and small table sizes concurrently in many situations. 
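Jeff's three worked examples can be checked numerically. The following is a minimal sketch (mine, not from the thread; the function names and the closed-form counting are my own) that computes mean path length and mean routing-table size for the star and for the two-level hierarchy, under the same assumptions as above (pools fully meshed internally, one gateway per pool meshed with the other gateways):

```python
import math

def star_metrics(n):
    """Star of n nodes: one hub, n-1 leaves, each leaf connected only to the hub."""
    # Ordered pairs: hub<->leaf at distance 1, leaf->leaf at distance 2.
    pairs = n * (n - 1)
    total_path = 2 * (n - 1) * 1 + (n - 1) * (n - 2) * 2
    mean_path = total_path / pairs
    # Hub knows all n-1 others; each leaf knows only the hub.
    mean_table = ((n - 1) + (n - 1) * 1) / n
    return mean_path, mean_table

def hierarchy_metrics(m):
    """Two-level hierarchy: m pools of m nodes (N = m*m), fully meshed inside
    a pool, with one gateway per pool meshed with the other gateways."""
    n = m * m
    same_pool = n * (m - 1)          # ordered same-pool pairs: 1 hop
    cross_pool = n * (n - m)         # ordered cross-pool pairs: ~3 hops
    # Treating every cross-pool path as 3 hops slightly overstates the paths
    # that start or end at a gateway, but the error vanishes as N grows.
    mean_path = (same_pool * 1 + cross_pool * 3) / (same_pool + cross_pool)
    # Members know their m-1 pool-mates; gateways also know the m-1 gateways.
    mean_table = ((n - m) * (m - 1) + m * 2 * (m - 1)) / n
    return mean_path, mean_table

for n in (100, 10_000, 1_000_000):
    sp, st = star_metrics(n)
    hp, ht = hierarchy_metrics(math.isqrt(n))
    print(f"N={n}: star {sp:.2f}*{st:.2f}={sp * st:.1f}  "
          f"hierarchy {hp:.2f}*{ht:.1f}={hp * ht:.1f}  3*sqrt(N)={3 * math.sqrt(n):.0f}")
```

As N grows the star's product settles at 4 and the hierarchy's approaches 3*sqrt(N), matching the figures given above.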
> (btw, What is interesting about the square-root networks to me is that > they seem to be approached from two directions, both from the systems that > were previously broadcast (that is g=N, h=1) but that sank under the > load, and from systems that were previously completely centralized (that is > g=1, h=N) but sank under the load at the central server or attacks to it.) That's an important observation, and it's precisely the reason that I don't believe in "flat-earth" decentralization as a practical approach for most problems. Extremes are rarely optimal in the real world. Time after time I've seen flat decentralized systems fail when N gets large because some part somewhere had a communications complexity of O(N) or worse, and I hate seeing systems fail. Hierarchical approaches avoid that trap, and as long as the hierarchy is dynamic and adaptive they avoid the traps inherent in full centralization as well. > Of course, I can't say that deconcentration is a necessary characteristic > for every set of goals, or for sure that having the root subset be > dynamic is not enough even when strict survivability is called for. I > can't even say that deconcentration is good, just that it appeals to me. Yes, it is very appealing aesthetically, and it's necessary and/or appropriate for some situations. > OK, but once you have a center/proper root subset (PRS) you open up a > whole can of worms regarding making sure that this cannot be abused, > both by people trying to use the limited set of nodes as an Achilles > heel of the network, and by the peers in the PRS itself. Certainly > having the PRS be dynamic and mobile is one of the methods by which > this can be combated, but, while probably necessary, it is not the only > such measure, which is why I wanted to separate these networks, and the > means necessary to secure them, from the deconcentrated ones. That's reasonable enough. Just remember that we all end up eating worms of one kind or another. 
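The g*h = N tradeoff behind the "square-root networks" quoted above is easy to make concrete. Here is a toy model (my own sketch with hypothetical numbers, not from the thread): in an unsorted supernode network a query must be broadcast to all X supernodes, and the N index entries are split evenly among them, so the product of per-query traffic and per-supernode table size is constant:

```python
def supernode_costs(n_entries, n_super):
    """Unsorted supernode network: a query reaches every one of n_super
    supernodes (traffic g = X), and the n_entries index entries are split
    evenly among them (table h = N/X), so g*h = N regardless of X."""
    g = n_super                  # parties each user must contact per query
    h = n_entries / n_super      # index entries held per supernode
    return g, h

N = 1_000_000
for x, label in [(1, "Napster-like"), (1_000, "sqrt(N) supernodes"),
                 (N, "Gnutella-like")]:
    g, h = supernode_costs(N, x)
    print(f"{label:>18}: g={g:>9g}  h={h:>9g}  g*h={g * h:g}")
```

Every row yields g*h = N: broadcast (g=N, h=1) and full centralization (g=1, h=N) are just the two endpoints of the same curve, with sqrt(N) supernodes sitting at the balance point.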
> Well, I think one of the main reasons this list was started was that > when a lot of us were in SF we found that we had all developed a different > set of vocabulary for the same thing, so setting down some authoritative > definitions, at least between one another, would certainly be using this > list to good effect. Where the hell was I during these talks? Probably too busy answering all the people who were asking why I - as an employee of a large corporation not known for participating in the open exchange of ideas - was there at all. Oh well. Going back to taxonomy, I'll take a stab at it. Here are some example structures: (1) Fully centralized. All operations involve a single permanent root node. (2) Partitioned. There are multiple roots, separated by role, location, or other criteria (e.g. partitioned namespace). Loss of a root is catastrophic wrt its own responsibilities, irrelevant wrt others. (3) Decentralized. There are multiple roots and/or intermediate nodes capable of operating independently and/or taking over for one another, such that loss of any N is disruptive but not fatal (unless it causes a partition of the entire network/system). (4) Flat ("deconcentrated"). There are no roots at all, even dynamically assigned or role-specific. Operations depend only on endpoints and the existence of some path between them. From oskar at freenetproject.org Wed Jun 20 14:45:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
In-Reply-To: <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Wed, Jun 20, 2001 at 03:12:54PM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> Message-ID: <20010620234727.K525@sandbergs.org> On Wed, Jun 20, 2001 at 03:12:54PM -0400, Jeff Darcy wrote: > > However, there is an important difference. In the sort of the systems > > that I think about as truly decentralized like Plaxton, Chord [2], or > > (assuming for a moment that it actually works) Freenet, the roots to the > > trees are spread over all the nodes, where as a system with a hierarchy > > would have the tree roots concentrated to a subset of the peers. > > I think we need to consider time as part of the taxonomy. Permanent > concentration is not the same as transient concentration, which in turn is > not the same as no concentration at all. However, the mobility of the tree roots is quite orthogonal to their concentration. A deconcentrated network can also have roots that are static or mobile over time (in fact, the combination of deconcentration and mobility is what keeps me working on Freenet in spite of my doubts regarding the system.) > > If such > > a system is still considered decentralized, then I guess we need a new > > term for the other type - the only synonym that my Roget's lists for > > decentralized is deconcentrated, so I'll propose that. > > FWIW, I call them "flat" decentralized systems. The way you used flat in your last mail, and below, gave me the impression that you were referring to original Gnutella type networks, where you have no information where to go (it all looks the same). Networks like Plaxton, while certainly deconcentrated, don't feel flat... 
> > For lack of a better term I have been refering to the networks which use > > "supernodes","brokers", "reflectors" etc as square-root networks, > > because by making sqrt(N) of the nodes "supernodes" you have > > O(g=sqrt(N)) growth in network traffic (per node) and O(h=sqrt(N)) > > growth in the tables of each of them (obviously you can shift those to > > functions, but you'll need to keep g*h = N). > > That formula simply doesn't work. Total traffic remaining constant, traffic > per node is proportional to mean path length - the measure most often used > by people who actually study routing. Let's look at a few topologies and > see how table size and path length are affected. Because you brought up the specific example of using "supernodes" like the second generation file swapping services (I think examples are KaZaa, EDonkey, and Gnutella with reflectors), I drifted off into a discussion of those systems, meaning not only that there are supernodes, but there is no sorting of the data between them. Thus, to query for something in a global namespace, Alice would need to contact every "supernode", and each supernode needs to hold all the entries for a partition of the entire peer population. Hence if there are X supernodes, the amount of traffic will be of order X for every user Alice, and the supernode will need to hold in the order of N/X entries. Thus my examples: with a single supernode (Napster), each user only needs to contact 1 party, but the supernode must hold all N entries. If every node is a supernode (Gnutella), the number of entries per node is constant, but each user must contact N parties. Another way of putting it is: if you are searching for data in a global namespace, the network contains no sorting between the nodes, T is the total network traffic, P is the portion of the nodes that contain information, and N is the total number of nodes, then: T/P = k*N 
(for some constant k) This isn't very profound when you think about it, but I think it is important to note. > (1) For a fully-connected network, path length is always 1 and table size is > always N. There's our starting point. > > (2) For a star, path length is always 2 and table size is always 1, except > for the hub where they're 1 and N respectively. The averages both work out > to 2, for a constant value of 4 regardless of network size. Hmmm. > > (3) For a two-level hierarchy (sqrt(N) pools of sqrt(N) nodes, with one > "gateway" per pool), the mean path length asymptotically increases toward 3 > as N increases, and the mean table size likewise decreases toward sqrt(N). > For large N, therefore, our product would be 3*sqrt(N). > > I'm not doing this just to pick nits, either. It's important. The > relationship you state between table size and path length (or traffic per > node) only applies for fully-connected or broadcast networks. The search > behavior of several well-known networks does fit this model and makes it > relevant, but for many other purposes or structures it's a total red > herring. It *is* possible to maintain short path lengths and small table > sizes concurrently in many situations. I'm not going to argue that it is not, but rather that for that to be so it is necessary to sort entries in some manner between the nodes (remember the problem is finding an entry in a global namespace). Some concentration may make that easier, but certainly not right off. <> > > Well, I think one of the main reasons this list was started was that > > when a lot of us were in SF we found that we had all developed a different > > set of vocabulary for the same thing, so setting down some authoritative > > definitions, at least between one another, would certainly be using this > > list to good effect. > > Where the hell was I during these talks? 
Probably too busy answering all > the people who were asking why I - as an employee of a large corporation not > known for participating in the open exchange of ideas - was there at all. > Oh well. > > Going back to taxonomy, I'll take a stab at it. Here are some example > structures: > > (1) Fully centralized. All operations involve a single permanent root node. Ok. > (2) Partitioned. There are multiple roots, separated by role, location, or > other criteria (e.g. partitioned namespace). Loss of a root is catastrophic > wrt its own responsibilities, irrelevant wrt others. Ok. > (3) Decentralized. There are multiple roots and/or intermediate nodes > capable of operating independently and/or taking over for one another, such > that loss of any N is disruptive but not fatal (unless it causes a partition > of the entire network/system). Ok (it will take some adaptation on my part, but people have often responded emotionally to claims that their designs were centralized, so it's probably for the better.) > (4) Flat ("deconcentrated"). There are no roots at all, even dynamically > assigned or role-specific. Operations depend only on endpoints and the > existence of some path between them. I'm not sure about this. What I mean by deconcentrated is that there are roots, just that every node is (or could be, there may be fewer trees than nodes) also a root of some tree. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From blanu at uts.cc.utexas.edu Wed Jun 20 15:13:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> Message-ID: > That's an important observation, and it's precisely the reason that I don't > believe in "flat-earth" decentralization as a practical approach for most > problems. 
Extremes are rarely optimal in the real world. This statement is too general to be practically useful in conversation. You can't really meaningfully talk about "most problems". Perhaps in your sphere of interest fully decentralized systems aren't appropriate. However, in anonymous systems I find any system that is not fully decentralized to be unacceptable as it creates weak points for attack and subversion. Of course, as someone pointed out earlier, you can create a supernode-like node in a fully decentralized system by creating a lot of virtual nodes. But at least the architecture isn't helping you out. From jeff at platypus.ro Wed Jun 20 15:16:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) References: Message-ID: <08f601c0f9d6$85b6c840$367b9fa8@lss.emc.com> > This statement is too general to be practically useful in conversation. > You can't really meaningfully talk about "most problems". Perhaps in your > sphere of interest fully decentralized systems aren't appropriate. > However, in anonymous systems I find any system that is not fully > decentralized to be unacceptable as it creates weak points for attack and > subversion. Sorry to be the one to point this out, but anonymous systems are a tiny little niche within the overall space of applications. If there's anyone who lacks perspective, it's those who live inside that bubble, not outside. From jeff at platypus.ro Wed Jun 20 15:37:02 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> Message-ID: <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> > However the mobility of the tree roots is quite orthoganol to their > concentration. A deconcentrated network can also have roots that are > static or mobile over time (in fact, the combination of deconcentration > and mobility is what keeps me working on Freenet in spite of my doubts > regarding the system.) OK, I admit I'm confused. Earlier you claimed that hierarchy and decentralization were antithetical, but now you're talking about roots in the context of a "deconcentrated" network. What are these roots of which you speak? What is their role? Must other nodes contact these roots to access resources? When I use "root" I refer to a node that is distinguished from its neighbors by an asymmetric/hierarchical relationship in which it is allowed to complete requests or perform actions without contacting them while they are not similarly free to do so without contacting it. I'm not sure what a "root" is when there's no "up" or "down" - as I believe would be the case in a "deconcentrated" network. > The way you used flat in your last mail, and below, gave me the > impression that you refering to original Gnutella type networks, where > you have no information where to go (it all looks the same). Networks > like Plaxton, while certainly deconcentrated, don't feel flat... I think we need to clarify whether we're talking about the topology of the system itself or of the namespace that system uses. They're two different things; I've been focusing on the former, but I'm beginning to get the impression that you're talking about the latter. 
> Because you brought up the specific example of using "supernodes" like > the second generation file swapping services (I think examples are > KaZaa, EDonkey, and Gnutella with reflectors), I drifted off into a > discussion of those systems, meaning not only that there are supernodes, > but there is no sorting of the data between them. Thus, to query for > something in a global namespace, Alice would need to contact every > "supernode", and each supernode needs to hold all the entries for a > partition of the entire peer population. In other words, a broadcast network (among the supernodes). In that very particular case, your formula does indeed apply, but to be honest I consider it a fairly degenerate case. > > It *is* possible to maintain short path lengths and small table > > sizes concurrently in many situations. > > I'm not going to argue that it is not, but rather that for that to be so > it is necessary to sort entries in some manner between the nodes Yes, in the particular case of searching, that is true. That's why I'm not a big fan of searching, and prefer proactive indexing instead. > > (4) Flat ("deconcentrated"). There are no roots at all, even dynamically > > assigned or role-specific. Operations depend only on endpoints and the > > existence of some path between them. > > I'm not sure about this. What I mean by deconcentrated is that there are > roots, just that every node is (or could be, there may be less trees > then nodes) also a root of some tree. Again, I think we need to be clear whether we're talking about the network or the namespace. From oskar at freenetproject.org Wed Jun 20 17:35:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
In-Reply-To: <090201c0f9d9$670193f0$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Wed, Jun 20, 2001 at 06:35:37PM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> Message-ID: <20010621023741.M525@sandbergs.org> On Wed, Jun 20, 2001 at 06:35:37PM -0400, Jeff Darcy wrote: > > The way you used flat in your last mail, and below, gave me the > > impression that you were referring to original Gnutella type networks, where > > you have no information where to go (it all looks the same). Networks > > like Plaxton, while certainly deconcentrated, don't feel flat... > > I think we need to clarify whether we're talking about the topology of the > system itself or of the namespace that system uses. They're two different > things; I've been focusing on the former, but I'm beginning to get the > impression that you're talking about the latter. The general problem that we are dealing with here, as far as I know, is publishing and finding data in distributed networks. Finding entries in a global namespace is a relaxation of this problem (depending on implementation it may be one phase of it). I don't really see the point in discussing the topology in regard to something other than the problem at hand. It might be worth considering the problem from the perspective of being forced into a certain topology between the peers, but that makes the problem much harder, which I hesitate to do until the easier case is solved. I'll gladly admit my complete naiveté regarding classic approaches to these problems, however, so I could just be misunderstanding something. > > However, the mobility of the tree roots is quite orthogonal to their > > concentration. 
A deconcentrated network can also have roots that are > > static or mobile over time (in fact, the combination of deconcentration > > and mobility is what keeps me working on Freenet in spite of my doubts > > regarding the system.) > > OK, I admit I'm confused. Earlier you claimed that hierarchy and > decentralization were antithetical, but now you're talking about roots in > the context of a "deconcentrated" network. What are these roots of which > you speak? What is their role? Must other nodes contact these roots to > access resources? When I use "root" I refer to a node that is distinguished > from its neighbors by an asymmetric/hierarchical relationship in which it is > allowed to complete requests or perform actions without contacting them > while they are not similarly free to do so without contacting it. I'm not > sure what a "root" is when there's no "up" or "down" - as I believe would be > the case in a "deconcentrated" network. For any individual query the network contains a tree that leads to a root (or root set). Plaxton and co's paper, which I referenced earlier, contains a very good example of such a system. There are trees, but every node takes part at different levels in many of these trees (and as a leaf in every single one). <> > > Because you brought up the specific example of using "supernodes" like > > the second generation file swapping services (I think examples are > > KaZaa, EDonkey, and Gnutella with reflectors), I drifted off into a > > discussion of those systems, meaning not only that there are supernodes, > > but there is no sorting of the data between them. Thus, to query for > > something in a global namespace, Alice would need to contact every > > "supernode", and each supernode needs to hold all the entries for a > > partition of the entire peer population. > > In other words, a broadcast network (among the supernodes). 
In that very > particular case, your formula does indeed apply, but to be honest I consider > it a fairly degenerate case. I must admit I don't understand what application you are aiming at. Could you give me an example of the sort of network you are discussing? > > > It *is* possible to maintain short path lengths and small table > > > sizes concurrently in many situations. > > > > I'm not going to argue that it is not, but rather that for that to be so > > it is necessary to sort entries in some manner between the nodes > > Yes, in the particular case of searching, that is true. That's why I'm not > a big fan of searching, and prefer proactive indexing instead. Trying to index every entry at every node is not a particularly scalable solution either. Searching usually makes people think of keyword searches (where the thread this sprang from started), but my discussion is generalized to any sort of lookup (which usually defaults to binary identifiers). > > > (4) Flat ("deconcentrated"). There are no roots at all, even dynamically > > > assigned or role-specific. Operations depend only on endpoints and the > > > existence of some path between them. > > > > I'm not sure about this. What I mean by deconcentrated is that there are > > roots, just that every node is (or could be, there may be less trees > > then nodes) also a root of some tree. > > Again, I think we need to be clear whether we're talking about the network > or the namespace. Especially if you are not interested in anonymity (as I gather you are not), what role does the network have besides serving as a namespace for lookups? -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From jeff at platypus.ro Wed Jun 20 21:04:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> <20010621023741.M525@sandbergs.org> Message-ID: <093001c0fa07$20029890$367b9fa8@lss.emc.com> From: "Oskar Sandberg" > The general problem that we are dealing with here, as far as I know, is > publishing and finding data in distributed networks. Finding entries in > a global namespace is a relaxation of this problem (depending on > implementation it may be one phase of it). My apologies, then. I apparently interpreted questions such as "aren't hierarchical and decentralized as close to antonymous design processes as one can come" and "anybody know of a similiar paper regarding descriptions of system topologies" as invitations to a broader discussion when in fact they were not. > I don't really see the point in discussing the topology in regard to > something else then the problem at hand. Anything not directly related to the problem at hand is "pointless"? I'm sorry, but I cannot agree. Focus is good, but myopia is not good focus. Would you try to shut down a discussion of security until it became the problem at hand, or might you occasionally be interested in thinking and talking about it at leisure so you don't have to scramble to catch up when the shit hits the fan? > For any individual query the network contains a tree that leeds to a > root (or root set). Plaxton and co's paper, which I referenced earlier > contains a very good example of such a system. There are trees, but > every node takes part at different levels in many of these trees (and as > a leaf in every single one). This seems to be another example of the previously-mentioned relationships between trees and meshes, very similar to the example I gave of a route tree superimposed on a mesh. 
In fact, Plaxton et al. explicitly mention the similarity to a routing protocol. IMO the "roots" in their scheme are like destinations in a route tree, and not really roots in the sense that the term would usually be understood. Certainly they're not roots in the sense that *I* have been using the term. > > In other words, a broadcast network (among the supernodes). In that very > > particular case, your formula does indeed apply, but to be honest I consider > > it a fairly degenerate case. > > I must admit I don't understand what application you are aiming at. > Could you give me an example of the sort of network you are discussing? Pretty much anything other than the various functional clones of Napster. Very few other distributed applications rely in any significant way on broadcast. From IRC and the web to DNS and IP routing itself, unicast communication predominates by a huge margin. Perhaps "degenerate" is too strong a word, but broadcast is certainly a special case and rules that apply only to special cases should not be stated as though they were general. > > Yes, in the particular case of searching, that is true. That's why I'm not > > a big fan of searching, and prefer proactive indexing instead. > > Trying to index every entry at every node is not a particularly scalable > solution either. Then it's a good thing nobody suggested that. > Especially if you are not interested in anonymity (as I gather you are > not), what role does the network have besides serving as a namespace for > lookups? That's an amazing question, especially from a Freenet guy. Let's see, what can we use a network for, besides searching? Hey, I've got it, maybe we can use the network for transferring the *files* as well! Sound familiar? I do believe Freenet sometimes uses the network that way, on the rare occasions that a user is actually able to find a file and wishes to retrieve its contents. OK, sorry for being so sarcastic. 
It seems that managing the metadata has become such a big pain in the ass for some systems that their designers have forgotten about what the users really want/need - the data. The sort of lookups you're talking about are just *one way* to get at the data. It's a pretty problematic way at that, which is why the vast majority of applications avoid it. I don't need to broadcast a query to read a web page or log in to a remote system or play online chess, but somehow I still find the information I need to do those things. In my own project I never use such methods, and yet users can still find data from anywhere using highly familiar and intuitive means. From blanu at uts.cc.utexas.edu Wed Jun 20 21:07:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: <08f601c0f9d6$85b6c840$367b9fa8@lss.emc.com> Message-ID: > Sorry to be the one to point this out, but anonymous systems are a tiny > little niche within the overall space of applications. If there's anyone > who lacks perspective, it's those who live inside that bubble, not outside. Your first statement is quite true. I'm just pointing out that it is overgeneralizing to simply say fully decentralized systems are not useful when there is at least one type of application for which *only* fully decentralized systems are suitable. Your second statement is another overly general philosophical statement that doesn't apply to the discussion in a useful way. From oskar at freenetproject.org Thu Jun 21 06:46:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
In-Reply-To: <093001c0fa07$20029890$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Thu, Jun 21, 2001 at 12:03:26AM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> <20010621023741.M525@sandbergs.org> <093001c0fa07$20029890$367b9fa8@lss.emc.com> Message-ID: <20010621154823.A629@sandbergs.org> On Thu, Jun 21, 2001 at 12:03:26AM -0400, Jeff Darcy wrote: > From: "Oskar Sandberg" <> > > I don't really see the point in discussing the topology in regard to > > something else then the problem at hand. > > Anything not directly related to the problem at hand is "pointless"? I'm > sorry, but I cannot agree. Focus is good, but myopia is not good focus. > Would you try to shut down a discussion of security until it became the > problem at hand, or might you occasionally be interested in thinking and > talking about it at leisure so you don't have to scramble to catch up when > the shit hits the fan? In a way, yes. I would argue that discussing security is pretty meaningless outside the context of what you are trying to secure. A discussion on the value of moats isn't particularly pointful if you are trying to secure an email message, and a comparison between RSA and ElGamal when trying to secure a medieval castle is most certainly a waste of time. > > For any individual query the network contains a tree that leads to a > > root (or root set). Plaxton and co's paper, which I referenced earlier > > contains a very good example of such a system. There are trees, but > > every node takes part at different levels in many of these trees (and as > > a leaf in every single one). 
> > This seems to be another example of the previously-mentioned relationships > between trees and meshes, very similar to the example I gave of a route tree > superimposed on a mesh. In fact, Plaxton et al. explicitly mention the > similarity to a routing protocol. IMO the "roots" in their scheme are like > destinations in a route tree, and not really roots in the sense that the > term would usually be understood. Certainly they're not roots in the sense > that *I* have been using the term. I have been using root as in "root of a route tree". By a Proper Root Subset I meant that the set of peers that can be roots to the lookup route trees is a proper subset of the set of peers in the network, and by deconcentrated that all nodes in the network are also roots of route trees. <> > > I must admit I don't understand what application you are aiming at. > > Could you give me an example of the sort of network you are discussing? > > Pretty much anything other than the various functional clones of Napster. > Very few other distributed applications rely in any significant way on > broadcast. From IRC and the web to DNS and IP routing itself, unicast > communication predominates by a huge margin. Perhaps "degenerate" is too > strong a word, but broadcast is certainly a special case and rules that > apply only to special cases should not be stated as though they were > general. I'm not a fan of broadcasting either; my question was more to the effect of what application it is you wish to build. I gather you are looking to create a general point-to-point routing system for an overlay network? What motivates this? > > > Yes, in the particular case of searching, that is true. That's why I'm > not > > > a big fan of searching, and prefer proactive indexing instead. > > > > Trying to index every entry at every node is not a particularly scalable > > solution either. > > Then it's a good thing nobody suggested that. Do you have a reference for what you mean by proactive indexing? 
Google wasn't very helpful, and I thought I remembered a discussion between you and the creator of the "blocks" system on Infoanarchy.org where you referred to what he is doing (which, AFAIK, is indexing every entry at every node) by that name.

> > Especially if you are not interested in anonymity (as I gather you are
> > not), what role does the network have besides serving as a namespace for
> > lookups?
>
> That's an amazing question, especially from a Freenet guy. Let's see, what
> can we use a network for, besides searching? Hey, I've got it, maybe we can
> use the network for transferring the *files* as well! Sound familiar? I do
> believe Freenet sometimes uses the network that way, on the rare occasions
> that a user is actually able to find a file and wishes to retrieve its
> contents.

Well, besides caching for performance, the only other reason I see not to simply transfer the data directly between the publisher and the requestor is to disassociate them from one another (i.e. striving toward anonymity).

> OK, sorry for being so sarcastic. It seems that managing the metadata has
> become such a big pain in the ass for some systems that their designers have
> forgotten about what the users really want/need - the data. The sort of
> lookups you're talking about are just *one way* to get at the data. It's a
> pretty problematic way at that, which is why the vast majority of
> applications avoid it. I don't need to broadcast a query to read a web page
> or log in to a remote system or play online chess, but somehow I still find
> the information I need to do those things. In my own project I never use
> such methods, and yet users can still find data from anywhere using highly
> familiar and intuitive means.

Yes, but the web already works; I don't see any reason to reimplement it.
What I am interested in is disassociating data from any physical location (my motivations for which are mostly political) - and like I said, being able to look things up in a global namespace is a relaxation of that problem. Plaxton's proposed system works by having the namespace lookup result in pointers to the actual location of the data objects, whereas our network attempts to return the data directly in answer to the namespace lookup, but the lookup is still there.

The related problem of creating a file sharing system like Napster and Gnutella that isn't centralized like the former or limited in size like the latter is, while not directly what I am doing ATM, certainly interesting and useful.

I would be very interested to hear more about "your project", however. You have a reference to a paper or implementation?

--
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
oskar@freenetproject.org

From jeff at platypus.ro Thu Jun 21 08:29:01 2001
From: jeff at platypus.ro (Jeff Darcy)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] Re: topology, goals, etc. (was Re: shared namespaces)
References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> <20010621023741.M525@sandbergs.org> <093001c0fa07$20029890$367b9fa8@lss.emc.com> <20010621154823.A629@sandbergs.org>
Message-ID: <096c01c0fa66$caee31b0$367b9fa8@lss.emc.com>

From: "Oskar Sandberg"
> A discussion of the value of moats isn't particularly useful if you are
> trying to secure an email message, and a comparison between RSA and ElGamal
> when trying to secure a medieval castle is most certainly a waste of
> time.

I still don't agree.
If you're trying to secure a medieval castle, then comparisons between RSA and ElGamal should obviously have a lower priority than discussions of moats and vats of boiling oil, but to say that it's a "waste of time" implies that it's not worthy of discussion *at all*. I'm not under the impression that people's time spent reading or posting to this list is such a rare or valuable commodity that we must reject as "pointless" anything not related to some arbitrarily (and sometimes selfishly) chosen "problem at hand".

> I'm not a fan of broadcasting either; my question was more to the effect
> of what application it is you wish to build. I gather you are looking to
> create a general point-to-point routing system for an overlay network?

That's actually only a small part of what I'm working on. I'll describe the project shortly.

> Do you have a reference for what you mean by proactive indexing?
> Google wasn't very helpful, and I thought I remembered a discussion
> between you and the creator of the "blocks" system on Infoanarchy.org
> where you referred to what he is doing (which, AFAIK, is indexing every
> entry at every node) by that name.

By proactive indexing I simply mean that (meta-)information about data locations is distributed in advance of requests for that information. That does not, however, mean that all meta-information is distributed to every node. It's quite reasonable to distribute and cache partial information proactively, and then fall back to searching or an authoritative catalog when nothing is found in the cache. If a catalog is used, it in turn could use any of the levels of decentralization we've mentioned, and searching would still be unnecessary. BTW, there are real systems that work on pretty much this basis, though they might not be the sorts of systems people here like to study.
For example, InterLibrary Loan uses just this combination of authoritative catalogs and distributed partial information, as do quite a few criminal and medical databases. They handle more traffic than any of the filesharing programs more typically considered as examples, and they work pretty well with this model.

> Well, besides caching for performance, the only other reason I see not
> to simply transfer the data directly between the publisher and the
> requestor is to disassociate them from one another (i.e. striving toward
> anonymity).

First, that's a helluva big "besides". Caching for performance is no small matter. Secondly, if you don't see other reasons besides these two, then perhaps you should think about it some more. I've seen such overlay networks used many times, usually when the implementor thinks they can provide security/QoS guarantees, load distribution, or fault-tolerant routing better than the underlying network does. Just about every CDN fits that description, as might Swarmcast or even Uprizer.

> Yes, but the web already works; I don't see any reason to reimplement
> it. What I am interested in is disassociating data from any physical
> location (my motivations for which are mostly political)

That's an entirely valid and even laudable interest, and if it guides your choices then so be it. However, it's also a rather uncommon interest in the overall scheme of things, which usually means that those choices don't generalize very well to other projects/applications with different goals. It's absolutely fine for you to view the world through Freenet-colored glasses because that's your project, but you should be prepared to accept that others will probably see things a little differently.

> The related problem of creating a file sharing system like Napster and
> Gnutella that isn't centralized like the former or limited in size like
> the latter is, while not directly what I am doing ATM, certainly
> interesting and useful.
>
> I would be very interested to hear more about "your project", however.
> You have a reference to a paper or implementation?

As I said earlier, my employer is not known for participating in the open exchange of ideas, so I'm somewhat constrained in how much detail I can provide. Even mentioning it could conceivably get me in trouble. Sometimes I regret staying here to do this instead of going off on my own to do it, because my personal inclination is to keep things open, but there's something to be said for having the resources of such a behemoth at one's disposal, and that's the tradeoff I chose to make. In any case, all of my options are underwater, so maybe I don't care. ;-)

The easiest starting point is probably my "Beyond FTP" article at infoAnarchy (http://www.infoanarchy.org/?op=displaystory;sid=2001/5/16/153927/189). My goal is to satisfy all of the requirements mentioned; quite bluntly, I believe that anything less is a lazy cop-out that addresses programmer egos more than user needs. Beyond these "surface" requirements, my primary goal is enhanced performance and scalability. There are also intrinsic fault-recovery and security (but not anonymity) requirements.

Structurally, the system starts with a recognition that all nodes are not created equal. There are therefore N root nodes (N is likely to be small), each independently capable of anchoring the entire system, so that N-1 concurrent root failures can be tolerated. The remaining nodes use routing methods to form interlocking trees leading to these roots, in a fashion that is actually quite reminiscent of Plaxton. Data is aggressively cached everywhere to enhance performance and scalability, while maintaining full consistency (sequential consistency, for those who care). The service provided is block-level, appearing to the system as a disk drive, so there are no significant naming issues to be dealt with.
To make this useful, one needs a shared-storage filesystem, which is a pretty exotic kind of creature, but it just so happens that I recently co-designed one (http://www.emc.com/products/software/highroad.jsp). For various reasons that I won't go into, GFS might be even better suited to this environment, but I haven't had a chance to pursue that just yet.

In case anyone is thinking these are just dreams, I should mention that I already have a working version. There are a few pieces still missing - most notably sophisticated routing and some security stuff that I'm not going to talk about - and some implementation shortcuts have been taken (some stuff is still in user space that doesn't truly belong there), but the core functionality is there and it's already good enough for internal demonstrations. Very limited performance analysis has so far yielded results exceeding expectations for this stage, and only non-technical aspects of the project are holding back further progress. No, you won't be seeing the source any time soon. Sorry.

So now you know where I'm coming from. I hope that my description will help you realize why I tend to look at things in a somewhat non-canonical way. It might also explain some of my frustration with the people I see on mailing list after mailing list, website after website, conference after conference, preannouncing and reannouncing their projects multiple times, reinforcing each others' fundamental biases even as they argue over arcane details. Sometimes it seems like everyone's attacking that medieval castle through the heavily fortified north gate while the east, west and south gates remain lightly defended.
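[Editor's note: the "proactive indexing" scheme described in this exchange -- partial location metadata pushed to nodes ahead of requests, with an authoritative catalog as a fallback on cache misses -- can be sketched roughly as below. All names (`Catalog`, `Node`, `receive_index_update`, `lookup`) are invented for illustration; this reflects neither Jeff's closed-source system nor any real implementation.]

```python
class Catalog:
    """Authoritative name -> location mapping (could itself be distributed)."""
    def __init__(self):
        self._entries = {}

    def register(self, name, location):
        self._entries[name] = location

    def resolve(self, name):
        return self._entries.get(name)


class Node:
    """A peer holding a proactively distributed *partial* index."""
    def __init__(self, catalog):
        self._catalog = catalog
        self._cache = {}  # partial metadata, pushed before anyone asks

    def receive_index_update(self, name, location):
        # Publishers push (name, location) pairs in advance of requests.
        self._cache[name] = location

    def lookup(self, name):
        # Fast path: answer from the proactively distributed cache.
        if name in self._cache:
            return self._cache[name]
        # Slow path: fall back to the authoritative catalog, then cache.
        location = self._catalog.resolve(name)
        if location is not None:
            self._cache[name] = location
        return location
```

The point of the sketch is that no broadcast search is ever needed: either the metadata arrived ahead of the request, or a single authoritative (possibly decentralized) catalog query fills the gap.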
From lucas at gonze.com Sun Jun 24 11:27:01 2001
From: lucas at gonze.com (Lucas Gonze)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] names of kinds of topology
In-Reply-To:
Message-ID:

Zooko -

In trying to parse your idea I run up against a problem with this language from #2: "For clients to interact with each other, they must use the *same* server." Did you mean to say "For clients to interact with each other, they do not have to use the *same* server"?

The reason I have a problem with it is that I believe the core difference is where aggregation happens -- at the client/endpoint or at the server/nearest_intermediary. If not for that funny language, I would say that the difference between #1 and #2 is that #1 doesn't have agents and #2 does. The reason I bring up agents is that servers that aggregate results from other servers are taking initiative on behalf of the client/endpoint.

- Lucas

> Here are two common "design patterns" in network topology that I frequently
> encounter and I wish I had a specific name for. (For these topologies,
> "decentralized" is neither specific nor unambiguous, but then neither is
> "centralized".)
>
> 1. There are many servers. Service happens bilaterally between client and
> server. For clients to interact with each other, they must use the *same*
> server. Clients can use multiple servers, aggregating results from multiple
> servers and dynamically choosing which servers to use.
>
> Mojo Nation's "content tracking" architecture is currently of this model.
>
> 2. There are many servers. Service happens bilaterally between client and
> server. For clients to interact with each other, they must use the *same*
> server. Servers aggregate results from other servers. Clients can only use
> one server, and they choose which server to use.
>
> IRC is of this model.
>
> Jukka Santala had a patch that allowed Mojo Nation "Content Trackers" to suck
> information out of each other (acting as clients in that transaction), which
> would change the topology of Mojo Nation content tracking to include servers
> aggregating information from one another as well as clients aggregating
> information from multiple servers.
>
> So anyway, what are the names for these two topologies?
>
> Regards,
>
> Zooko
>
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers

From zooko at zooko.com Mon Jun 25 09:02:01 2001
From: zooko at zooko.com (zooko@zooko.com)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] names of kinds of topology
In-Reply-To: Message from "Lucas Gonze" of "Sun, 24 Jun 2001 14:24:44 EDT."
References:
Message-ID:

Lucas Gonze wrote:
>
> In trying to parse your idea I run up against a problem with this language from
> #2 "For clients to interact with each other, they must use the *same* server."
> Did you mean to say "For clients to interact with each other, they do not have
> to use the *same* server."?

Yes! This was a cutnpast-o. I'm very sorry -- cutnpast-o's are especially confusing when you are trying to establish terminology. :-/

> The reason I have a problem with it is that I believe the core difference is
> where aggregation happens -- at the client/endpoint or at the
> server/nearest_intermediary. If not for that funny language, I would say that
> the difference between #1 and #2 is that #1 doesn't have agents and #2 does.
> The reason I bring up agents is that servers that aggregate results from other
> servers are taking initiative on behalf of the client/endpoint.

This makes sense to me. Although I am leery of the "agent" buzzword, I agree that an important distinction between these two models is that in #2, data aggregation on the client's behalf is happening remotely from the client.
Regards,

Zooko

From lucas at gonze.com Tue Jun 26 08:55:01 2001
From: lucas at gonze.com (Lucas Gonze)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] names of kinds of topology
In-Reply-To:
Message-ID:

per Zooko:
> This makes sense to me. Although I am leery of the "agent" buzzword, I agree
> that an important distinction between these two models is that in #2, data
> aggregation on the client's behalf is happening remotely from the client.

Can't say I love 'agent' myself -- it is hopelessly fuzzy terminology.

- Lucas

From bram at gawth.com Thu Jun 28 03:14:02 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] BitTorrent is out
Message-ID:

My new P2P app, BitTorrent, is out; you can get it here -

http://bitconjurer.org/BitTorrent/

In a nutshell, it gets people to upload by bartering for bytes, and has some very sophisticated and robust algorithms for doing load balancing and dealing with low uptime.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent"
-- John Maynard Keynes

From bram at bitconjurer.org Fri Jun 29 17:29:01 2001
From: bram at bitconjurer.org (Bram Cohen)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] New release of BitTorrent out
Message-ID:

I've put up a new release; you can get it here -

http://bitconjurer.org/BitTorrent/BitTorrent-01-00-01.tar.gz

It's also now the one linked off the BitTorrent pages.

This is a bug fix release - it no longer hoses the CPU after a connection fails, and doesn't throw an exception when an upload connection hasn't gotten its status set yet.

The next version will include copying of comments from blobs_want to blobs_have, instructions when you execute with inappropriate parameters, and (hopefully) crypto in C.
That last one I could use some help with - if anyone throws together a public domain distutils-ified C implementation of StreamEncrypter.py, I'll include it immediately (it should be fairly straightforward - it's just Rijndael in counter mode). Otherwise it's dependent on me successfully bugging my younger brother to write one.

-Bram Cohen
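[Editor's note: the counter-mode construction Bram refers to is simple enough to sketch. Counter mode turns any keyed pseudorandom function of a block number into a stream cipher: encrypt successive counter values and XOR the result into the data, so encryption and decryption are the same operation. The sketch below uses a keyed SHA-256 as a stand-in for the Rijndael block cipher purely to show the structure -- it is not Rijndael, and the function names are invented, not StreamEncrypter.py's actual interface.]

```python
import hashlib

BLOCK_SIZE = 32  # sha256 digest size; Rijndael would use 16-byte blocks


def _keystream_block(key: bytes, counter: int) -> bytes:
    # Stand-in keyed PRF. In real counter mode this would be
    # cipher.encrypt(counter_block) with Rijndael keyed by `key`.
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()


def ctr_transform(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; encrypting and decrypting are identical."""
    out = bytearray()
    for i in range(0, len(data), BLOCK_SIZE):
        block = _keystream_block(key, i // BLOCK_SIZE)
        chunk = data[i : i + BLOCK_SIZE]
        # zip truncates the keystream block to the final (short) chunk.
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)
```

Because the keystream depends only on the key and the block index, applying `ctr_transform` twice with the same key round-trips the data, which is why a single routine serves for both directions.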