From blanu at uts.cc.utexas.edu Fri Jun 1 00:10:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" In-Reply-To: <3B173A7F.970737BA@neurogrid.com> Message-ID: > I guess I'm talking about reputation systems in general, or trust in general rather than > how it applies to Freenet. I'm talking about a reputation system for Freenet, in response to the question, "Can you imagine a reputation system for Freenet?" Also, the general trust system you're talking about does not apply to Freenet in a meaningful way. > I guess as far as Freenet goes, your concern was that if we punish nodes for serving up > data that doesn't match the hash that you requested, then you are potentially punishing > the innocent nodes that are just forwarding this data. How about having the intermediate > nodes check that the file they are sending on actually matches the original request? My concern is that punishing nodes based on the user's perception of the goodness of a file, as opposed to whether the node returned the requested file, will destroy the routing of the network, as the network is organized to get a given key efficiently, not to get a given psychological result efficiently. So if you take a system which is optimized to produce from a given hash a file that has that hash and you penalize nodes based on whether the produced file is a picture of a kitten, you will end up with a network which is not very good at finding files or pictures of kittens. > My understanding of Freenet is limited, but to sum up what I have worked out, it seems > that you search freenet for specific documents via their hashes, and inefficiencies arise > when nodes return bad data. If you can work out which nodes are returning bad data then > you can avoid asking them and return good data more efficiently, which is what you might > want a trust system for. 
I want a trust system for keeping my node from talking to spies and keeping my node from talking to nodes which will attempt to slip me bogus results when I asked for a file which hasn't been somehow cryptographically secured (by putting its hash or signature in the key). > Personally I'd like to be searching via keywords and learning which other users in the > system have similar data labelling approaches as me, so that I can get in touch with them > to find out new stuff, and I think that this framework requires reputation management, but > this is all beyond Freenet's scope because it searches via hashes, right? I'm not going > to be able to say I want some information on XYZ, what've you got? Or if I am then I am > going to have to query a Freenet key index, not Freenet itself, right? This is a layer on top of the base Freenet architecture. It can all be done totally over Freenet. However, the basic architecture just gets a file given the file's key. A key can be arbitrarily assigned, based on the file's hash, or based on a signature from the publisher. Keyword searching is implemented entirely on top of this system. How this is done is somewhat complicated and tangential and the subject of my talk at the P2P conference in September. But the point is that nodes know about files with attached keys. But when you search via keywords or various other kinds of metadata such as recommendations, published rankings of files, etc., you're using a system in a different layer, which knows nothing about nodes, only about publishers and "sites" (groups of documents). Also, the node layer knows nothing about publishers and sites, but only about keys and files. So you can't meaningfully mix penalizations between publishers and nodes. No node tells you that the keyword "kitten" is matched by the file "1234". A site tells you that. But you don't fetch the file "1234" from a site, you fetch it from a node. It's kind of like the difference between a website and a hard drive. 
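The self-certifying property of hash-based keys that this exchange keeps returning to can be sketched in a few lines. This is a minimal illustration, not Freenet's actual key format: SHA-1 is chosen purely for the example, and the function names are invented.

```python
import hashlib

def content_key(data: bytes) -> str:
    # Derive a content-hash key from the file bytes (SHA-1 here purely
    # for illustration; the real Freenet key scheme is not shown).
    return hashlib.sha1(data).hexdigest()

def verify_reply(requested_key: str, data: bytes) -> bool:
    # A hash-keyed request is self-certifying: recompute the hash of
    # whatever came back and compare it to the key we asked for.
    return content_key(data) == requested_key

kitten = b"bytes of a kitten picture"
key = content_key(kitten)
assert verify_reply(key, kitten)            # the requested file verifies
assert not verify_reply(key, b"tampered")   # any substitution is caught
```

Note that this check says nothing about whether the content is what the publisher claimed it was, which is exactly the publisher-vs-network trust split discussed later in the thread.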
From sam at neurogrid.com Fri Jun 1 00:48:01 2001 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" References: Message-ID: <3B174836.29F633B9@neurogrid.com> Brandon wrote: > > I guess I'm talking about reputation systems in general, or trust in general rather than > > how it applies to Freenet. > > I'm talking about a reputation system for Freenet, in response to the > question, "Can you imagine a reputation system for Freenet?" Also, the > general trust system you're talking about does not apply to Freenet in a > meaningful way. Yes, sorry for muddying the water. > > I guess as far as Freenet goes, your concern was that if we punish nodes for serving up > > data that doesn't match the hash that you requested, then you are potentially punishing > > the innocent nodes that are just forwarding this data. How about having the intermediate > > nodes check that the file they are sending on actually matches the original request? > > My concern is that punishing nodes based on the user's perception of the > goodness of a file as opposed to whether the node returned the requested > file will destroy the routing of the network as the network is organized > to get a given key efficiently, not to get a given psychological result > efficiently. So if you take a system which is optimized to produce from a > given hash a file that has that hash and you penalize nodes based on > whether the produced file is a picture of a kitten, you will end up with a > network which is not very good at finding files or pictures of kittens. Yeah, I got that. I'm not talking about assessing whether the system is returning a good picture of a set of kittens now. I'm talking about having nodes return a document that matches the hash that you used to request the document in the first place. 
My understanding of what you called bad data is when I ask for data via the hash 1234 and I get something back that doesn't match the hash 1234. And I was then asking if you couldn't have all of the nodes in the chain of nodes passing the file back checking if the hash of the file matched the hash of the request. Sure it would take time, but the bad data would be caught before it got passed all the way across the network, and you could preferentially ask for data from nodes that had a reputation for returning the correct responses (in terms of the hash), and help others not make the same mistakes by making that trust information available. I have taken your point about the kittens, and I am now talking just about making sure files match the hashes that we are using to search for them. > > My understanding of Freenet is limited, but to sum up what I have worked out, it seems > > that you search freenet for specific documents via their hashes, and inefficiencies arise > > when nodes return bad data. If you can work out which nodes are returning bad data then > > you can avoid asking them and return good data more efficiently, which is what you might > > want a trust system for. > > I want a trust system for keeping my node from talking to spies Can you define a spy? > and > keeping my node from talking to nodes which will attempt to slip me bogus > results when I asked for a file which hasn't been somehow > cryptographically secured (by putting its hash or signature in the key). I can read the above in two ways. Do you mean that any time anybody sends me an insecure file, that this is a bogus result? That we want to trust nodes that give us bogus results less? If so then the probability metrics I was talking about could be used, right? Like if you gave me 1 bogus file out of 100 I requested, I'd trust you 99%. Would we have to demand that all nodes checked that they weren't forwarding bogus results in order to make sure this was fair? Could this even be done? 
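The proportional metric described here ("1 bogus file out of 100, I'd trust you 99%") can be sketched as a simple per-node counter. This is only an illustration of the idea; the class and method names are invented, and the neutral prior for unseen nodes is an added assumption.

```python
class NodeTrust:
    """Per-node record of how often replies matched the requested hash.
    A sketch of the proportional metric discussed above; all names here
    are invented for illustration."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def record(self, hash_matched: bool) -> None:
        self.total += 1
        if hash_matched:
            self.correct += 1

    def trust(self) -> float:
        # With no history yet, stay neutral rather than presume either
        # honesty or guilt (this prior is an assumption, not from the thread).
        return 0.5 if self.total == 0 else self.correct / self.total

t = NodeTrust()
for _ in range(99):
    t.record(True)
t.record(False)      # "1 bogus file out of 100 I requested"
print(t.trust())     # 0.99 -- "I'd trust you 99%"
```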
> > Personally I'd like to be searching via keywords and learning which other users in the > > system have similar data labelling approaches as me, so that I can get in touch with them > > to find out new stuff, and I think that this framework requires reputation management, but > > this is all beyond Freenet's scope because it searches via hashes, right? I'm not going > > to be able to say I want some information on XYZ, what've you got? Or if I am then I am > > going to have to query a Freenet key index, not Freenet itself, right? > > This is a layer on top of the base Freenet architecture. It can all be > done totally over Freenet. However, the basic architecture just gets a > file given the file's key. A key can be arbitrarily assigned, based on the > file's hash, or based on a signature from the publisher. Keyword searching > is implemented entirely on top of this system. How this is done is > somewhat complicated and tangential and the subject of my talk at the P2P > conference in September. But the point is that nodes know about files with > attached keys. But when you search via keywords or various other kinds of > metadata such as recommendations, published rankings of files, etc., > you're using a system in a different layer, which knows nothing about > nodes, only about publishers and "sites" (groups of documents). Also, the > node layer knows nothing about publishers and sites, but only about keys > and files. So you can't meaningfully mix penalizations between publishers > and nodes. I look forward to hearing your talk. 
I suspect I will ask a question like "couldn't we gain more efficient file transfer by building the content awareness in at the network level", but let us leave the ensuing debate till September ;-) CHEERS> SAM From oskar at freenetproject.org Fri Jun 1 02:39:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence In-Reply-To: ; from lucas@gonze.com on Thu, May 31, 2001 at 04:45:14PM -0400 References: Message-ID: <20010531235714.R1075@hobbex.localdomain> On Thu, May 31, 2001 at 04:45:14PM -0400, Lucas Gonze wrote: > A different question: can you conceive of a reputation system for freenet? The > reason this interests me is that it pushes the limits of anonymous pseudonyms. There are at least two levels of reputation systems conceivable on this type of network. The first is a reputation system between nodes, where nodes that behave well gain reputation with their peers; the second is a semantic reputation system of people vouching for the validity of data (particularly metadata). A reputation system between nodes doesn't reflect anything on the users really, so it is not an anonymity issue. The issue with that falls squarely inside the limited connectivity problem - making the routing work when each node can only be connected to a limited number of others (which is useful for many other things, physical lack of connectivity being the most obvious). Freenet routing has not tackled the problem of limited connectivity, because, honestly, we still need to make it work better in a full connectivity situation. The hypercube-like routing systems (Plaxton, Tapestry, Pastry, Chord etc) don't deal with limited connectivity either though, in fact I know of nothing that does (except IP routing of course). 
The second sort of system does have the issues of revealing identity that you describe, though I would say you are describing a subset of a larger problem within pseudonymity - that the more information somebody generates the easier it is to match different nyms, and the only solution to that is to avoid producing any information bound to your meatspace self (the Unabomber problem, I guess). But it is not a freenet issue since such a system could be implemented completely in metadata. > Lets say you have a public key that is not explicitly associated with your > meatspace identity. The more reputation data gathered on that key, the more > stuff that can point to your meatspace identity. EG, you use a key for a few > years. At some point you buy a plane ticket and have it mailed to your house. > At that point all the other data can be correlated back to your address. A few > years later you buy a book using that key. The cops come to that address and > find that book in your shelf. > > What this points to is a need to either sacrifice anonymity or churn identities, > and churning puts an upward limit on the quality of reputation data. > > So a highly anonymous design like Freenet's might be able to use reputation > attached to persistent pseudonyms, but would have to churn the pseudonyms fairly > often. > > - Lucas > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. 
Oskar Sandberg oskar@freenetproject.org From oskar at freenetproject.org Fri Jun 1 02:39:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence In-Reply-To: <20010531173727.I27652@belegost.mit.edu>; from arma@mit.edu on Thu, May 31, 2001 at 05:37:27PM -0400 References: <20010531173727.I27652@belegost.mit.edu> Message-ID: <20010601111327.B658@hobbex.localdomain> On Thu, May 31, 2001 at 05:37:27PM -0400, Roger Dingledine wrote: > On Thu, May 31, 2001 at 04:45:14PM -0400, Lucas Gonze wrote: > > A different question: can you conceive of a reputation system for freenet? The > > reason this interests me is that it pushes the limits of anonymous pseudonyms. > > For better phrasing and terminology (anonymous pseudonyms is a messy > phrase), take a look at > http://www.cert.org/IHW2001/terminology_proposal.pdf > > It should help you get a handle on the different types of anonymity, > pseudonymity, nymity, etc. Note that it's still a document in draft form, > and it's still changing. Ah, that should be very helpful. > Anyway, there's a whole lot to be covered here. Since pseudonyms and > reputations (and indeed anonymity) are a tricky thing to analyze in a > dynamic and distributed environment, I would recommend starting with > a simpler model than Freenet -- Freenet's haphazard design makes it > extremely difficult to prove or analyze any complex properties. Your somewhat predictable endless jabs notwithstanding, I would have to agree that there is no reason to bring Freenet into an analysis of high level personal reputation systems. For our contexts, any balloon-and-honey-pot model (http://www.machaon.ru/pooh/chap6.html) should be enough of a base for observing the linkability of pseudonyms placed on data. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. 
Oskar Sandberg oskar@freenetproject.org From lucas at gonze.com Fri Jun 1 09:01:15 2001 From: lucas at gonze.com (Lucas Gonze) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence Message-ID: per Roger: > > For better phrasing and terminology (anonymous pseudonyms is a messy > > phrase), take a look at > > http://www.cert.org/IHW2001/terminology_proposal.pdf Definitely useful. Makes me realize that the phrase 'anonymous pseudonyms' is about as articulate as 'fooey whatchamacallit'. Still, they agree with my original point that pseudonym churn is inevitable when anonymity matters. So this brings me to a thought about an attack on Freenet. It's a reputation attack based on the idea of behavioral signatures. First, I'm going to find a way to identify nodes without having explicit pseudonyms. An attacker modifies their node so that anytime it gets a piece of behavioral data on a node it publishes that data. Modem speed, chunks found, time of lookup, IP address, anything it can find. At the same time, whenever a compromised node connects to a new one it looks up published recordings of behavioral data that match the observed characteristics. The more data you get the more likely it is that nodes can be identified. In other words any persistent behaviors can be enough to establish linkability. Second, the attacker records all data passed to the observed node and publishes these recordings. If that node's behavioral signature is long-lived enough then there is effectively a long-lived pseudonym. As noted in that terminology paper, the longer an identity persists the more data there is for an intersection attack. So if there do exist behavioral signatures that can stand in for pseudonyms then intersection attacks are possible. The only way to defeat this is to churn _behaviors_. Obviously this depends on the idea of behavioral signatures. 
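The intersection step of this attack can be illustrated with a toy sketch: each observation of the target is a set of behavioral attributes, and the surviving candidates are the published records consistent with every observation. All attribute names, nyms, and values below are invented purely for illustration.

```python
# Published behavioral records, keyed by pseudonym (all invented).
published = {
    "nym_A": {"modem_speed": "56k", "lookup": "fast", "chunks_found": "many"},
    "nym_B": {"modem_speed": "33k", "lookup": "slow", "chunks_found": "few"},
    "nym_C": {"modem_speed": "56k", "lookup": "slow", "chunks_found": "many"},
}

# Successive observations of one node; each one narrows the field.
observations = [
    {"modem_speed": "56k"},                      # first sighting: two candidates
    {"lookup": "fast", "chunks_found": "many"},  # second sighting: just one
]

def consistent(record, obs):
    # A record survives if it agrees with every attribute we observed.
    return all(record.get(k) == v for k, v in obs.items())

candidates = {nym for nym, rec in published.items()
              if all(consistent(rec, obs) for obs in observations)}
print(candidates)  # {'nym_A'} -- more observations, fewer candidates
```

The sketch shows why churning behaviors, not just names, is the only defense: the linkage never uses an explicit pseudonym at all.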
I would say that there needs to be a lot of data and a lot of CPU for finding these, but that there does exist some quantity of data and CPU that could accomplish it. So the question to answer is whether that amount is within any feasible budget. - Lucas From hal at finney.org Fri Jun 1 10:43:01 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence Message-ID: <200106011735.KAA01997@finney.org> Lucas Gonze writes: > So this brings me to a thought about an attack on Freenet. It's a reputation > attack based on the idea of behavioral signatures. I think you need to clarify what aspect you are attacking. In other words, you need to say, Freenet (or whatever network) claims to achieve security property X, and here is an attack on X. The property you seem to be attacking is that a node can prevent anyone from knowing that they are connecting to the same node over time. As far as I know this is not a property sought or claimed by Freenet. Hal From blanu at uts.cc.utexas.edu Fri Jun 1 13:52:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens forpersistence In-Reply-To: Message-ID: > If that node's behavioral signature is long-lived enough then there is > effectively a long-lived pseudonym. As noted in that terminology paper, the > longer an identity persists the more data there is for an intersection attack. > So if there do exist behavioral signatures that can stand in for pseudonyms then > intersection attacks are possible. The only way to defeat this is to churn > _behaviors_. I don't see the point of this attack. In 0.4 all nodes will be identified by a persistent public key anyway. If nodes churned their public keys, this attack would allow you to discover the probable identity of a node across multiple public keys. 
However, nodes won't churn public keys, I don't think, as I don't see any reason for a node to change its identity. In fact, it would be better for routing if nodes always kept the same identity. From Verbatim3D at aol.com Fri Jun 1 23:23:02 2001 From: Verbatim3D at aol.com (Verbatim3D@aol.com) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence Message-ID: True but all content that is shown on the net might as well be called illegal if the " GOVERNMENT : Wants to get Rid of all NAPSTERS ! Wake up call to the US There is no stoping us !! ===== - Verb From oskar at freenetproject.org Sat Jun 2 03:11:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: ; from Verbatim3D@aol.com on Sat, Jun 02, 2001 at 02:22:53AM -0400 References: Message-ID: <20010602121228.A639@hobbex.localdomain> Go away. On Sat, Jun 02, 2001 at 02:22:53AM -0400, Verbatim3D@aol.com wrote: > True but all content that is shown on the net might as well be called illegal if the " GOVERNMENT : Wants to get Rid of all NAPSTERS ! Wake up call to the US There is no stoping us !! > > > ===== - Verb > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From lucas at gonze.com Sat Jun 2 10:57:01 2001 From: lucas at gonze.com (Lucas Gonze) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokensforpersistence In-Reply-To: Message-ID: > In 0.4 all nodes will be identified > by a persistent public key anyway. Ah. It will be moot. Questions: what are the public keys used for? How much activity can be linked to a node via the key? 
If malicious nodes shared their knowledge of actions taken by a node, using the public key to coordinate, how much compromising data would they have to share? From bram at gawth.com Sat Jun 2 11:23:01 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: <20010602121228.A639@hobbex.localdomain> Message-ID: On Sat, 2 Jun 2001, Oskar Sandberg wrote: > Go away. > Oskar, you're an asshole. While normally I'd view this as your problem, your constant unwelcoming attitude in anything you can post to does serious damage. I wouldn't accept your help with any project I'm involved in regardless of your technical skill because I know that long term (and probably even short term) your presence would do more harm than good. Your actually getting paid to work on Freenet is a testament to its lack of cultural leadership. You might want to change your attitude, for the sake of your own career. I for one actively forewarn people about you, and frankly, your childish inability to admit you could ever be wrong leads to poor design decisions. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From oskar at freenetproject.org Sat Jun 2 12:33:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: ; from bram@gawth.com on Sat, Jun 02, 2001 at 11:23:20AM -0700 References: <20010602121228.A639@hobbex.localdomain> Message-ID: <20010602213415.B838@hobbex.localdomain> Umm, is this not the list that was going to be known as "p2p-elitists"? I seem to remember that the process for scaring away lamers was established even before the list was, as a compromise so that it didn't need to be invite-only. This list will descend into pointlessness in a second if we don't tell off people who post like that. 
My attitude toward lamers is not an attribute of my character but a behavioral adaptation learned through necessity. Of course, the same thing goes for personal feuds, so I'll accept your criticism. Like anything else there is probably a grain of truth in it. I hope your projects prosper without me. On Sat, Jun 02, 2001 at 11:23:20AM -0700, Bram Cohen wrote: > On Sat, 2 Jun 2001, Oskar Sandberg wrote: > > > Go away. > > > > Oskar, you're an asshole. > > While normally I'd view this as your problem, your constant unwelcoming > attitude in anything you can post to does serious damage. I wouldn't > accept your help with any project I'm involved in regardless of your > technical skill because I know that long term (and probably even short > term) your presence would do more harm than good. Your actually getting > paid to work on Freenet is a testament to its lack of cultural > leadership. > > You might want to change your attitude, for the sake of your own career. I > for one actively forewarn people about you, and frankly, your childish > inability to admit you could ever be wrong leads to poor design decisions. > > -Bram Cohen > > "Markets can remain irrational longer than you can remain solvent" > -- John Maynard Keynes > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. 
Oskar Sandberg oskar@freenetproject.org From oskar at freenetproject.org Sat Jun 2 12:42:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokensforpersistence In-Reply-To: ; from lucas@gonze.com on Sat, Jun 02, 2001 at 01:54:35PM -0400 References: Message-ID: <20010602214253.C838@hobbex.localdomain> On Sat, Jun 02, 2001 at 01:54:35PM -0400, Lucas Gonze wrote: > > In 0.4 all nodes will be identified > > by a persistent public key anyway. > > Ah. It will be moot. > > Questions: what are the public keys used for? Identifying nodes. If a node gains a reference for a key, you want to know that it is the same node you end up connecting to. > How much activity can be linked > to a node via the key? All activity. > If malicious nodes shared their knowledge of actions > taken by a node, using the public key to coordinate, how much compromising data > would they have to share? If all nodes share everything about the node, then everything will be known. If less than all nodes share everything, then the probability of guilt will be increased accordingly. It is definitely a weakness, but that's the model we have, and the collaboration weakness, as well as the TA weakness, has not been denied. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From blanu at uts.cc.utexas.edu Sat Jun 2 14:47:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" In-Reply-To: <3B174836.29F633B9@neurogrid.com> Message-ID: > Yeah, I got that. I'm not talking about assessing whether the system is returning a good > picture of a set of kittens now. I'm talking about having nodes return a document that matches > the hash that you used to request the document in the first place. 
That is currently taken care of by the Freenet architecture. If after a file is transferred the hash doesn't match then the reference to the node which supplied it isn't stored. So references to that hash will drift away from that node. What I think we need a reputation system for is things which cannot be determined automatically by a node. As far as I can tell, that's spies and nodes which will subvert files which are not cryptographically secured. > Can you define a spy? Certainly. A spy is a node which is there in order to gather data about network activity. An IP harvester is one kind of spy. While I want as many good nodes as possible to find out my IP (for increased routing efficiency), I don't want any nodes which are working as IP harvesters for the enemy to find out my IP. If such a node were to contact me, I would want to play dumb and say that I was a simple web server. There is unfortunately no way that my node can automatically tell by talking to another node whether it is a friendly node trying to uphold the right to free expression, or an evil node attempting to collect node IPs so that we can all be shot. This requires a reputation system by which I can tell you which nodes I trust and you can determine how much you believe me, based on how much you trust me. > > keeping my node from talking to nodes which will attempt to slip me bogus > > results when I asked for a file which hasn't been somehow > > cryptographically secured (by putting its hash or signature in the key). > > I can read the above in two ways. Do you mean that any time anybody sends me an insecure file, > that this is a bogus result? No, not at all. If I insert a file into the network called "kitten.jpg" and e-mail you the key and assure you with all my heart that this is a picture of a kitten and you download it and it's a picture of George Washington, then one of two things needs to occur. 
Either you lessen your trust for me because I'm obviously some kind of crazed pathological liar, or you lessen your trust for the node that served up the file, as someone is obviously trying to pull a fast one in the network. However, if it is indeed a picture of a kitten then you can be (fairly) sure that no one in the network did anything funny, as it would be really pointless to replace one picture of a kitten with another. With secured files, there's no need for a reputation system. Either the hash matches or it doesn't. With insecure files, it's rather more difficult to tell what's going on because there's this blurred issue of trust of the publisher vs. trust of the network. > That we want to trust nodes that give us bogus results less? If ... > so then the probability metrics I was talking about could be used, right? Like if you gave me 1 > bogus file out of 100 I requested, I'd trust you 99%. Yes. The problem is that integrating routing by trust at the network level might screw up the efficient finding of information using routing by closeness. The effects are unclear at the moment. > I look forward to hearing your talk. I suspect I will ask a question like "couldn't we gain > more efficient file transfer by building the content awareness in at the network level", but let > us leave the ensuing debate till September ;-) That's very sporting of you, giving me the hard question well in advance. Thanks. :-) From bram at gawth.com Sat Jun 2 15:29:01 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: <20010602213415.B838@hobbex.localdomain> Message-ID: On Sat, 2 Jun 2001, Oskar Sandberg wrote: > Umm, is this not the list that was going to be known as "p2p-elitists"? I > seem to remember that the process for scaring away lamers was established > even before the list was, as a compromise so that it didn't need to be > invite-only. 
You've been the same on this list, on freenet's dev list, on infoanarchy and in real life. I actually intended to create a list for p2p developers and make it invite-only to exclude you quite specifically, but I mentioned the idea of a developer's mailing list to zooko of the big mouth and he went and started an open list and invited everybody. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From agl at linuxpower.org Sat Jun 2 15:45:01 2001 From: agl at linuxpower.org (Adam Langley) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: Message-ID: <20010603000149.A3138@linuxpower.org> On Sat, Jun 02, 2001 at 03:28:41PM -0700, Bram Cohen wrote: > You've been the same on this list, on freenet's dev list, on infoanarchy > and in real life. Oskar took up the recognised position of keeping newbies off the freenet-devl list (not the chat list - which is free) and did it very well. Oskar is certainly forthright - and politically correct in the same way as the Atlantic Ocean is nice and dry. Which can be damn refreshing at times. And having spent about 25 hrs flying sat next to him, and about 10 days when we were (quite literally) within 20 meters for 99% of the time - I can tell you he's a damn nice person really. After that I could have axe murdered most people to get away. You don't have to listen to him and can ignore him when he disagrees with you. But if you disagree you should think damn hard about it because Oskar's a very smart guy. > I actually intended to create a list for p2p developers and make it > invite-only to exclude you quite specifically, Which says more about you than Oskar. AGL -- Don't believe everything you hear or anything you say. 
From oskar at freenetproject.org Sat Jun 2 16:19:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: ; from bram@gawth.com on Sat, Jun 02, 2001 at 03:28:41PM -0700 References: <20010602213415.B838@hobbex.localdomain> Message-ID: <20010603012024.A1008@hobbex.localdomain> On Sat, Jun 02, 2001 at 03:28:41PM -0700, Bram Cohen wrote: > I actually intended to create a list for p2p developers and make it > invite-only to exclude you quite specifically, but I mentioned the idea of > a developer's mailing list to zooko of the big mouth and he went and > started an open list and invited everybody. Oooh, I know this, "The No Oskars Club". Anyways, if there is any particular thing that I have done to upset you, then I would be glad to discuss it with you and hopefully straighten things up, though this is hardly the place. If there isn't, and disliking me just makes you feel good, then that is ok too. I know you think I'm being sanctimonious, but I don't feel like fighting or bearing any ill will against you regarding this. Good night. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From oskar at freenetproject.org Sat Jun 2 16:25:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: <20010603000149.A3138@linuxpower.org>; from agl@linuxpower.org on Sun, Jun 03, 2001 at 12:01:49AM +0100 References: <20010603000149.A3138@linuxpower.org> Message-ID: <20010603012546.B1008@hobbex.localdomain> For the record, Adam is nice too. 
He's the sort of guy who'll run around an entire hotel tracking down some aspirin to help aid one's self-inflicted deadly headache. On Sun, Jun 03, 2001 at 12:01:49AM +0100, Adam Langley wrote: > On Sat, Jun 02, 2001 at 03:28:41PM -0700, Bram Cohen wrote: > > You've been the same on this list, on freenet's dev list, on infoanarchy > > and in real life. > > Oskar took up the recognised position of keeping newbies off the > freenet-devl list (not the chat list - which is free) and did it very > well. > > Oskar is certainly forthright - and politically correct in the same > way as the Atlantic Ocean is nice and dry. Which can be damn > refreshing at times. > > And having spent about 25 hrs flying sat next to him, and about 10 > days when we were (quite literally) within 20 meters for 99% of the > time - I can tell you he's a damn nice person really. After that I > could have axe murdered most people to get away. > > You don't have to listen to him and can ignore him when he disagrees > with you. But if you disagree you should think damn hard about it > because Oskar's a very smart guy. > > > I actually intended to create a list for p2p developers and make it > > invite-only to exclude you quite specifically, > > Which says more about you than Oskar. > > AGL > > -- > Don't believe everything you hear or anything you say. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From blanu at uts.cc.utexas.edu Sat Jun 2 21:44:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: Message-ID: > > In 0.4 all nodes will be identified > > by a persistent public key anyway. > > Ah. It will be moot. > > Questions: what are the public keys used for? It makes node harvesting more challenging. You have to know a node's public key in order to communicate with it.
This eliminates the Media Enforcer attack, which is to port scan huge ranges of IP addresses looking for Freenet nodes, request contraband material from them, and then send a letter to the ISP saying that the node served up contraband material. Now you have to know a node's public key (by getting it from another node) in order to talk to it. So you have to actually run an active node in order to find other nodes. You can change your public key whenever you like, but it will uproot your presence in the network as you will now appear to everyone concerned to be a new node. From sam at neurogrid.com Sat Jun 2 22:50:01 2001 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" References: Message-ID: <3B19CFB2.A5E4DBE7@neurogrid.com> Hi Brandon, Brandon wrote: > > Yeah, I got that. I'm not talking about assessing whether the system is returning a good > > picture of a set of kittens now. I'm talking about having nodes return a document that matches > > the hash that you used to request the document in the first place. > > That is currently taken care of by the Freenet architecture. If after a > file is transferred the hash doesn't match then the reference to the node > which supplied it isn't stored. So references to that hash will drift away > from that node. Okay, thanks for explaining that. You don't think there would be any advantage in recording that information? I guess it is stored to the extent that a node that supplies the wrong hash doesn't get referenced as much. But that doesn't stop the node being referenced for other things. I'm just thinking that when a node is deciding where to forward a query to, it could be sending the queries preferentially to those nodes that have been consistent in providing correct hashes. Of course I have no idea how frequently that occurs in Freenet, and so it may just not be worth the effort.
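Brandon's mechanism above (if a transferred file fails its hash check, the reference to the supplying node isn't stored, so routing drifts away from it) can be sketched roughly as follows. This is an illustrative toy, not Freenet's actual code: the function name and the `node_refs` bookkeeping are invented for the example, and SHA-1 stands in for whatever hash the key scheme uses.

```python
import hashlib

def verify_and_record(requested_key: bytes, data: bytes,
                      node_refs: dict, node_id: str) -> bool:
    """Check that fetched data hashes to the requested key.

    On success, credit the supplying node (hypothetical bookkeeping);
    on mismatch, drop the reference so routing drifts away from it.
    """
    actual = hashlib.sha1(data).digest()
    if actual == requested_key:
        # Good data: remember this node as a source for keys like this one.
        node_refs[node_id] = node_refs.get(node_id, 0) + 1
        return True
    # Bad data: don't store a reference to the supplying node.
    node_refs.pop(node_id, None)
    return False
```

Sam's suggestion of preferring consistently honest nodes would amount to routing on the counters this sketch accumulates, rather than only dropping references on failure.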
It depends if you think that nodes that are supplying incorrect data for one hash are also doing so for other things, and so one should preferentially forward queries to those nodes that consistently supply accurate data, since Freenet is depth-first and greedy, right? You might wait a while before finding out that the node was returning inaccurate data, which slows search down. But maybe it would not be so beneficial to implement, given that the node won't be referenced for that hash subsequently ... > > Can you define a spy? > > Certainly. A spy is a node which is there in order to gather data about > network activity. An IP harvester is one kind of spy. While I want as many > good nodes to find out my IP (for increased routing efficiency), I don't > want any nodes which are working as IP harvesters for the enemy to find > out my IP. If such a node were to contact me, I would want to play dumb > and say that I was a simple web server. There is unfortunately no way that > my node can automatically tell by talking to another node whether it is a > friendly node trying to upload the right to free expression, or an evil > node attempting to collect node IPs so that we can all be shot. This > requires a reputation system by which I can tell you which nodes I trust > and you can determine how much you believe me, based on how much you trust > me. So you're saying that there is no way to determine from the pattern of interactions at the network level whether a node is a spy or not? And there's no feedback to assess potential judgement calls. The reputation system I was talking about applies equally well if the assessment of "spyness" is generated by human users based on their suspicions. Would be nice to quantify what the human users were basing their suspicions on. Like do they have access to some data that the nodes don't? Emails from people saying "I used node X to gather data about the network and now I'm going to attack you".
How does the human user recognise a spy? The transitive trust thing that Zooko talked about (which is also part of NeuroGrid) applies equally; the difficulty comes in whether there is any feedback process that allows you to assess the validity of the claims being made ... > > > > keeping my node from talking to nodes which will attempt to slip me bogus > > > results when I asked for a file which hasn't been somehow > > > cryptographically secured (by putting its hash or signature in the key). > > > > I can read the above two ways. Do you mean that any time anybody sends me an insecure file, > > that this is a bogus result? > > No, not at all. If I insert a file into the network called "kitten.jpg" > and e-mail you the key and assure you with all my heart that this > is a picture of a kitten and you download it and it's a picture of George > Washington, then one of two things needs to occur. Either you lessen your > trust for me because I'm obviously some kind of crazed pathological liar, > or you lessen your trust for the node that served up the file, as someone > is obviously trying to pull a fast one in the network. However, if it is > indeed a picture of a kitten then you can be (fairly) sure that no one in > the network did anything funny, as it would be really pointless to replace > one picture of a kitten with another. > > With secured files, there's no need for a reputation system. Either the > hash matches or it doesn't. So maybe I'm starting to get the distinction. The fact that I can decrypt the file with the key you've given me certifies that I'm receiving what you intended to send me. Given the proviso that your private key is secure, any discrepancy between what you told me the file was, and what I think the file is, is due to differences in opinion between me and you, and not related to what might or might not have happened in the network.
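The transitive trust idea mentioned above (believing a claim in proportion to how much you trust the claimant) could be sketched like this. The discounting rule, the depth cutoff, and all the names here are illustrative assumptions, not Zooko's or NeuroGrid's actual scheme:

```python
def transitive_trust(direct: dict, source: str, target: str,
                     depth: int = 3) -> float:
    """Estimate trust in `target` by chaining direct trust ratings.

    `direct` maps rater -> {rated: score in [0, 1]}. A friend's rating
    of a node is discounted by how much we trust the friend; the depth
    limit keeps the recursion finite (hypothetical scheme).
    """
    if target in direct.get(source, {}):
        return direct[source][target]
    if depth == 0:
        return 0.0
    best = 0.0
    for friend, score in direct.get(source, {}).items():
        # Discount the friend's (possibly indirect) opinion by our
        # trust in the friend, and keep the strongest chain.
        best = max(best, score * transitive_trust(direct, friend, target, depth - 1))
    return best
```

Taking the maximum over chains is just one possible combination rule; whether such scores can ever be validated is exactly the feedback problem Sam raises above.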
> With insecure files, it's rather more > difficult to tell what's going on because there's this blurred issue of > trust of the publisher vs. trust of the network. Sure. That's true. But the extent to which it is a problem depends on the degree to which you think that the sources of misunderstandings and failures are the consequence of attempts to subvert the system, or just due to different ways of looking at the world. Freenet is clearly focused on creating secure, anonymous channels of communication, because of the fear that those channels will be subverted. Whereas NeuroGrid is focused on trying to remove the failings and misunderstandings introduced as a consequence of people's different perceptions, rather than active attempts at subterfuge. Ultimately which is more important is a matter of perspective, and of course the type of world in which you live. That said, it is becoming clear that NeuroGrid could gain a lot from using hashes, and some kind of security where users encrypted not the files, but the relation of hashes and keywords - so that there was less chance that someone would try and forge someone else's opinions. It depends if people's opinions get attached to particular nodes, which themselves can be secured ... > > That we want to trust nodes that give us bogus results less? If > ... > > so then the probability metrics I was talking about could be used, right? Like if you gave me 1 > > bogus file out of 100 I requested, I'd trust you 99%. > > Yes. The problem is that integrating routing by trust in the network level > might screw up the efficient finding of information using routing by > closeness. The effects are unclear at the moment. I guess so, although I don't think that you have any guarantees of efficiency in finding information through routing by closeness, other than it appears to work. We don't have any math to show this, do we? What kind of efficiency are we talking about anyway? > > > I look forward to hearing your talk.
I suspect I will ask a question like "couldn't we gain > > more efficient file transfer by building the content awareness in at the network level", but let > > us leave the ensuing debate till September ;-) > > That's very sporting of you, giving me the hard question well in advance. > Thanks. :-) My pleasure. I thought you'd need a little time, since I'm hoping to have simulations, equations, and an implementation by that point to make the question all the harder :-) CHEERS, SAM From oskar at freenetproject.org Sun Jun 3 06:07:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" In-Reply-To: <3B19CFB2.A5E4DBE7@neurogrid.com>; from sam@neurogrid.com on Sun, Jun 03, 2001 at 02:48:34PM +0900 References: <3B19CFB2.A5E4DBE7@neurogrid.com> Message-ID: <20010603150745.A914@hobbex.localdomain> On Sun, Jun 03, 2001 at 02:48:34PM +0900, Sam Joseph wrote: > Hi Brandon, > > Brandon wrote: > > > > Yeah, I got that. I'm not talking about assessing whether the system is returning a good > > > picture of a set of kittens now. I'm talking about having nodes return a document that matches > > > the hash that you used to request the document in the first place. > > > > That is currently taken care of by the Freenet architecture. If after a > > file is transferred the hash doesn't match then the reference to the node > > which supplied it isn't stored. So references to that hash will drift away > > from that node. > > Okay, thanks for explaining that. > > You don't think there would be any advantage in recording that information? I guess it is stored to > the extent that a node that supplies the wrong hash doesn't get referenced as much. But that > doesn't stop the node being referenced for other things.
I'm just thinking that when a node is > deciding where to forward a query to, it could be sending the queries preferentially to those nodes > that have been consistent in providing correct hashes. In general, punishment is a good deterrent for behavior that carries some probability of getting caught. If the probability of discovery is low, then you can try to increase the punishment to shift the cost analysis. In this case, since all data on Freenet is hashed and signed, the chance of being caught when trying to supply bad data is 100%, so punishments are hardly necessary. Future versions of Freenet will keep some track of the behavior of neighbor nodes, and use them less if they work badly, but returning bad data is not the biggest motivation for this (simple no-reply attacks are really much more annoying). > Of course I have no idea how frequently that occurs in Freenet, and so it may just not be worth the > effort. It depends if you think that nodes that are supplying incorrect data for one hash are also > doing so for other things, and so one should preferentially forward queries to those nodes that > consistently supply accurate data, since Freenet is depth-first and greedy, right? You might wait a > while before finding out that the node was returning inaccurate data, which slows search down. > > But maybe it would not be so beneficial to implement, given that the node won't be referenced for that > hash subsequently ... The way I see it, nodes returning bad data should be considered broken, and treated as such, not as attackers. There is certainly no reason to be overly paranoid toward a node trying such a pitifully futile attack. <> > So you're saying that there is no way to determine from the pattern of interactions at the network > level whether a node is a spy or not? And there's no feedback to assess potential judgement calls. There certainly does not have to be.
A normal Freenet node runs into many others during the course of operation, so it can be used as a harvester for node identities quite well. Even to the extent that nodes only have a limited number of contacts, an attacker can always start an arbitrary number of nodes to try to meet as many others as possible (and no, hacks like limiting based on IPs simply don't work in anything but a short-term, low-level perspective). <> > > With insecure files, it's rather more > > difficult to tell what's going on because there's this blurred issue of > > trust of the publisher vs. trust of the network. > > Sure. That's true. But the extent to which it is a problem depends on the degree to which you think > that the sources of misunderstandings and failures are the consequence of attempts to subvert the > system, or just due to different ways of looking at the world. Freenet is clearly focused on creating > secure, anonymous channels of communication, because of the fear that those channels will be > subverted. Whereas NeuroGrid is focused on trying to remove the failings and misunderstandings > introduced as a consequence of people's different perceptions, rather than active attempts at > subterfuge. I think it is becoming pretty clear that this sort of thinking is a little too akin to a "let's all be nice to one another" society. Human hostility and attacks are just as fundamental to human interaction as the participants' different ways of viewing the world, and systems must be designed with that in mind. < > > > Yes. The problem is that integrating routing by trust in the network level > > might screw up the efficient finding of information using routing by > > closeness. The effects are unclear at the moment. > > I guess so, although I don't think that you have any guarantees of efficiency in finding > information through routing by closeness, other than it appears to work. We don't have any math to show > this, do we? > > What kind of efficiency are we talking about anyway?
We are just guessing, but there is a lot to indicate that connectivity limits (especially very strict ones, such as such a system would have to have, to matter) could be harmful. We still have plenty of issues with the full-connectivity routing where we believe we can do better, so maybe limited connectivity will come up once we have nailed that. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From lucas at gonze.com Sun Jun 3 09:42:02 2001 From: lucas at gonze.com (Lucas Gonze) Date: Sat Dec 9 22:11:42 2006 Subject: FW: [p2p-hackers] persistence in Freenet, exchangeable tokens for persistence In-Reply-To: Message-ID: > It makes node harvesting more challenging. You have to know a node's > public key in order to communicate with it. This eliminates the Media > Enforcer attack, which is to port scan huge ranges of IP addresses looking > for Freenet nodes, request contraband material from them, and then send a > letter to the ISP saying that the node served up contraband material. Now > you have to know a node's public key (by getting it from another node) in > order to talk to it. So you have to actually run an active node in order to > find other nodes. My initial reaction is that this is a strong strategy because it forces attackers to buy in, and at that point other self-protecting mechanisms can be used. What I don't like about it is that it enables strong linkability. It looks like freenet hackers are aware of that and have decided to move ahead anyway in the absence of better ideas. Yes? > You can change your public key whenever you like, but it will uproot your > presence in the network as you will now appear to everyone concerned to be > a new node. What problems come up for a new node? What are the drawbacks? - Lucas From greg at electricrain.com Mon Jun 4 21:59:01 2001 From: greg at electricrain.com (Gregory P.
Smith) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Kittens of Trust" In-Reply-To: ; from blanu@uts.cc.utexas.edu on Sat, Jun 02, 2001 at 04:46:38PM -0500 References: <3B174836.29F633B9@neurogrid.com> Message-ID: <20010604215847.A28157@zot.electricrain.com> > No, not at all. If I insert a file into the network called "kitten.jpg" > and e-mail you the key and assure you with all my heart that this > is a picture of a kitten and you download it and it's a picture of George > Washington, then one of two things needs to occur. Either you lessen your > trust for me because I'm obviously some kind of crazed pathological liar, > or you lessen your trust for the node that served up the file, as someone > is obviously trying to pull a fast one in the network. However, if it is > indeed a picture of a kitten then you can be (fairly) sure that no one in > the network did anything funny, as it would be really pointless to replace > one picture of a kitten with another. Unless you are storing a kitten photo lineup on the network and having witnesses identify who peed on your rug by giving them non-hash-based links to the images in the network. Then the urine-happy kitten's litter has lots to gain by replacing pictures of kittens with others at random intervals to disrupt the lineup... anyways, preaching to the choir here, we all know that hashing kittens is the only way to know you've got the right kitten in this world. digressingly, Greg From jim at at.org Thu Jun 7 11:27:01 2001 From: jim at at.org (Jim Carrico) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Reputation System: "Dimensions of Trust" In-Reply-To: References: <3B174836.29F633B9@neurogrid.com> Message-ID: Brandon Wiley, responding to Sam Joseph, said: >> Can you define a spy? > >Certainly. A spy is a node which is there in order to gather data about >network activity. An IP harvester is one kind of spy.
While I want as many >good nodes to find out my IP (for increased routing efficiency), I don't >want any nodes which are working as IP harvesters for the enemy to find >out my IP. If such a node were to contact me, I would want to play dumb >and say that I was a simple web server. There is unfortunately no way that >my node can automatically tell by talking to another node whether it is a >friendly node trying to upload the right to free expression, or an evil >node attempting to collect node IPs so that we can all be shot. This >requires a reputation system by which I can tell you which nodes I trust >and you can determine how much you believe me, based on how much you trust >me. Correct me if I'm missing something, but this suggests that the problem of rating the "spyness" of a node is equivalent to determining the "spyness" of the node operator, or rather the aggregate spyness of everyone with physical access to the hardware on which the node is running. (multiplied by a 'weighting factor' assessing the host machine's vulnerability to subversion by skilled attackers. e.g. even if I've known Bob since kindergarten, and am certain he's not working for Dr. Evil, I may not rate his node very highly if I have low confidence in his ability to secure his machine against evil agents, etc.) To make such a rating with confidence suggests a pretty intimate and detailed real-world knowledge of the node operator(s), hardware, location, etc. If this is so, then it seems that a "white list" of trusted nodes will itself provide a lot of information about real-world relationships between node operators, a rich data source for intersection attacks against both node and publisher anonymity.
Even if such lists aren't associated directly with nodes - i.e. they are signed by pseudonymous publishers as in Steven Hazel's combined trust proposal, if enough of these lists were captured by the bad guys, it would seriously compromise the publisher/node "unlinkability" that is obviously critical to the success of freenet's stated mission. Over and above the problems with routing that Brandon mentioned in an earlier post, this seems like good reason to abandon the white list concept, and with it perhaps any notion of a node-based reputation system for Freenet. However, in a scenario in which freenet's anonymity may actually be *necessary* - e.g. Chinese dissidents whose lives really are in danger - it's likely that simply running a freenet node will be enough to implicate one as a dissident, regardless of what's on it or who it's connected to. In other words, lack of a node trust system seriously compromises freenet's effectiveness in precisely the area in which it is most needed. There are no doubt refinements to the white list concept, and further layers of obfuscation which may help to make nodes difficult to discover, but intuition suggests this will always tend to compromise the routing efficiency and general usefulness of the network. An alternative approach is to try to develop freenet into a general-purpose tool with many benign and banal purposes, that *also happens to guarantee free speech*. This would be the 'hide in plain sight' approach, related to the notion that encrypted communications are only really effective if a significant proportion of the total communications are encrypted. The reason I'm thinking about all this is in consideration of an area in which "anonymous speech" is ubiquitous, familiar and universally accepted: the concept of the secret ballot, one of the principal underpinnings of modern democratic political systems.
The point of a secret ballot is so that one can announce one's opinion without that opinion being "linkable" to one's real identity. Concepts surrounding the transparency, integrity, and fairness of elections are dependent on this principle - and so if it's a good idea to have this ability once every four years (or so), why should it not be a good idea to have it all the time? Extrapolating some current trends of digital culture, we seem to be heading toward a world in which everything we do, see, hear, or buy will be *trackable* and linkable to our real identities. Microsoft in particular wants to "own" this data, and one can only presume that the US.gov's apparent "softening" on the anti-trust breakup threat is linked to their salivating over the unprecedented level of surveillance - at arm's length - this would provide. The only way to thwart such a dystopian scenario will be to build systems which anonymize *everything* by default. If one wishes to announce any information publicly, of course anyone will always be free to do that. But if we can establish this principle, that everyone has a right to *not be observed* by marketers, spammers, or spooks - then I think it's a good bet that systems which guarantee this right will find their way into the mainstream. This seems like the most effective way of providing "cover" for political dissidence and other overtly threatened forms of speech. Zooko's "why am i not pseudonymous yet" (http://www.inet-one.com/cypherpunks/dir.98.12.21-98.12.27/msg00018.html) suggests that any truly dedicated agency will be able to match pseudonymous identities (i.e. publishers) with real identities (i.e. nodes). But as with encrypted traffic, the more ubiquitous and *normal* it is to use anonymity or pseudonymity on a daily basis, the more difficult the attacker's job becomes. Am I missing something important?
From zooko at zooko.com Thu Jun 7 12:24:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: there is no security without a threat model (was: Re: [p2p-hackers] Reputation System: "Dimensions of Trust") In-Reply-To: Message from Jim Carrico of "Thu, 07 Jun 2001 11:25:58 PDT." References: <3B174836.29F633B9@neurogrid.com> Message-ID: Jim Carrico wrote: > > Zooko's "why am i not pseudonymous yet" > (http://www.inet-one.com/cypherpunks/dir.98.12.21-98.12.27/msg00018.html) > suggests that any truly dedicated agency will be able to match pseudonymous > identities (ie. publishers) with real identities (ie. nodes). But as with > encrypted traffic, the more ubiquitous and *normal* it is to use anonymity > or pseudonymity on a daily basis, the more difficult the attackers job > becomes. I agree with your thesis that "Anonymity Loves Company", as the saying goes. If you read "why i am not truly pseudonymous yet", please also read the recent addendum [1]. One thing that can be said for certain is that attempts to provide security of any kind desperately need to have the threat model made explicit. Explicit threat models will enable actual engineering rather than "shots in the dark". Explicit threat models can also enable "good enough but not perfect" solutions which can be deployed earlier and to a wider user base. Finally, an explicit threat model is absolutely required if an end user is going to be able to make an informed decision about the risks he or she chooses. The current common practice of assuming an *implicit* threat model (which typically has rigorous mathematical constraints but unexamined real-world constraints) is reprehensible -- akin to building a bridge with a theoretically perfect design but no actual testing and advertising it as "proven safe". Until Freenet, or any other system, has an explicit threat model against which its security guarantees are measured, it is not suitable to be trusted with real world risks. 
As a mea culpa, Mojo Nation was briefly guilty of this sort of irresponsible behaviour last year when a "FAQ" question implied that Mojo Nation offered anonymity. Regards, Zooko [1] http://zooko.com/memory_lane.html P.S. To give credit where it is due, Bram Cohen is partially responsible for influencing my thinking on this issue. I remember him saying that the hardest part of crypto engineering is deciding what threat model you are addressing. From blanu at uts.cc.utexas.edu Thu Jun 7 16:41:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Re: there is no security without a threat model (was: Re: [p2p-hackers] Reputation System: "Dimensions of Trust") In-Reply-To: Message-ID: > Until Freenet, or any other system, has an explicit threat model against which > its security guarantees are measured, it is not suitable to be trusted with > real world risks. You said it better than I could have. Preach on, brother! From dmarti at zgp.org Wed Jun 13 18:59:01 2001 From: dmarti at zgp.org (Don Marti) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Chess over Freenet Message-ID: <20010613185833.A22647@zgp.org> Brandon Wiley's article on "Applications over Freenet: a Decentralized, Anonymous Gaming API" is up. http://www2.linuxjournal.com/articles/culture/0027.html (I wonder why they put it under "culture") -- Don Marti "I've never sent or received a GIF in my life." http://zgp.org/~dmarti -- Bruce Schneier, Secrets and Lies, p. 246. dmarti@zgp.org Free the Web, burn all GIFs: http://burnallgifs.org/ From arma at mit.edu Sun Jun 17 21:01:01 2001 From: arma at mit.edu (Roger Dingledine) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? Message-ID: <20010618000023.W968@belegost.mit.edu> Let's say I have a Chord (or any consistent hashing) service for which I can do hash lookups -- that is, I can fetch a document in O(lg n) hops once I know what it's called. 
[1] Has anybody found a good way of integrating keyword searching into this framework? I don't want to get into anything complicated -- I just want the user to be able to put in English words and get back keys to use in the Chord network. I can see separating the two: having a separate network or computer which "knows" everything in the Chord service, and can answer search queries -- perhaps publishers inform the service, or perhaps it crawls the nodes looking for available files. But I'm hoping for something more integrated. I can imagine systems where publishers provide a description along with the document, hash each keyword of the description, and then register those hashes with a Chord service, so Chord will allow you to do the actual searching. But those seem rather kludgy and potentially lopsided (eg, whoever lives at H("mp3") in the keyspace is going to be having a bad year -- but perhaps enough caching and replication can resolve that). Can you point me in some likely directions? Or am I crazy to think of getting searching out of a Chord-style architecture, and I should be looking at the "separate search engine" approach? How do non-broadcast file sharing architectures (intend to) do searching? I'd been completely ignoring the issue of searching before, but I've finally realized I can't afford to ignore it. Thanks, --Roger [1] http://web.mit.edu/6.033/www/handouts/dp2-chord.html , http://pdos.lcs.mit.edu/~kaashoek/chord.ps From tcole at espnow.com Sun Jun 17 21:19:01 2001 From: tcole at espnow.com (Tavin Cole) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? 
In-Reply-To: <20010618000023.W968@belegost.mit.edu>; from arma@mit.edu on Mon, Jun 18, 2001 at 12:00:24AM -0400 References: <20010618000023.W968@belegost.mit.edu> Message-ID: <20010618001757.K8476@niss> On Mon, Jun 18, 2001 at 12:00:24AM -0400, Roger Dingledine wrote: > I'd been completely ignoring the issue of searching before, but I've > finally realized I can't afford to ignore it. There are several peer-to-peer information distribution applications which could all benefit from a simple keyword searching service, or perhaps something more complex. A very general peer-to-peer indexing and/or spidering network that was capable of searching for content on multiple other networks would be the best approach, in my opinion. Maybe this is something we can all collaborate on. Aside from the niceties of having a unified search network for all peer-to-peer systems, I take this position because the network is basically the topology, and you want to optimize the topology of each network for the functions it performs. -- # tavin cole # # "Technology is a way of organizing the universe so that # man doesn't have to experience it." # # - Max Frisch From gojomo at usa.net Sun Jun 17 21:34:01 2001 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? References: <20010618000023.W968@belegost.mit.edu> Message-ID: <008301c0f7af$d8d43fe0$e8c7a540@tron> I bet the 1998 Google paper would engender some good ideas: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm Another shoot-from-the-hip idea: Let's say your search is something like "Beatles AND Revolution AND mp3". Which word's "expert node" should you query first, 'Beatles', 'Revolution', or 'mp3'? I suspect that there should be a globally-shared ordering of words -- so that searchers and indexers proceed through multi-word queries in exploitably predictable ways.
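A minimal sketch of that globally-shared ordering, assuming a shared word-frequency rank table (the `GLOBAL_RANK` table, the rarest-first query order, and both function names are hypothetical choices for illustration): every searcher sorts query terms the same way, and each index only keeps concordance links toward more-common terms.

```python
# Hypothetical shared ranking: lower rank = rarer word. In a real system
# this table would be derived from shared word- or search-frequency data.
GLOBAL_RANK = {"revolution": 0, "beatles": 1, "mp3": 2}

def query_order(words):
    """Order query terms rarest-first so all searchers walk the same path;
    unknown words sort last."""
    return sorted(words, key=lambda w: GLOBAL_RANK.get(w.lower(), float("inf")))

def forward_links(word, cooccurring):
    """An index node for `word` remembers concordances only in the
    more-common direction, roughly halving index sizes on average."""
    rank = GLOBAL_RANK.get(word.lower(), float("inf"))
    return [w for w in cooccurring
            if GLOBAL_RANK.get(w.lower(), float("inf")) > rank]
```

Rarest-first is one plausible reading of "which expert node to query first"; the key property is only that the ordering is global, so indexers and searchers agree on it.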
I also suspect an ordering based on either word-frequency or search-frequency might work best. On average, then, certain indexes only need to remember word-concordances in the "more-common" direction. That is, the 'mp3' node doesn't need to remember everywhere it appears with 'Revolution' or 'Beatles', but those have to remember "forward" to the more-common 'mp3' term. I think this would have the effect of halving, on average, index sizes and might serve to counterbalance the popular-word effect you mention. - Gojomo ----- Original Message ----- From: "Roger Dingledine" To: Sent: Sunday, June 17, 2001 9:00 PM Subject: [p2p-hackers] keyword searching + consistent hashing? > Let's say I have a Chord (or any consistent hashing) service for which > I can do hash lookups -- that is, I can fetch a document in O(lg n) > hops once I know what it's called. [1] > > Has anybody found a good way of integrating keyword searching into this > framework? I don't want to get into anything complicated -- I just want > the user to be able to put in English words and get back keys to use > in the Chord network. I can see separating the two: having a separate > network or computer which "knows" everything in the Chord service, and > can answer search queries -- perhaps publishers inform the service, or > perhaps it crawls the nodes looking for available files. But I'm hoping > for something more integrated. > > I can imagine systems where publishers provide a description along with > the document, hash each keyword of the description, and then register > those hashes with a Chord service, so Chord will allow you to do the > actual searching. But those seem rather kludgy and potentially lopsided > (eg, whoever lives at H("mp3") in the keyspace is going to be having a > bad year -- but perhaps enough caching and replication can resolve that). > > Can you point me in some likely directions? 
Or am I crazy to think of > getting searching out of a Chord-style architecture, and I should be > looking at the "separate search engine" approach? How do non-broadcast > file sharing architectures (intend to) do searching? > > I'd been completely ignoring the issue of searching before, but I've > finally realized I can't afford to ignore it. > > Thanks, > --Roger > > [1] http://web.mit.edu/6.033/www/handouts/dp2-chord.html , > http://pdos.lcs.mit.edu/~kaashoek/chord.ps > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > From hal at finney.org Sun Jun 17 21:34:02 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? Message-ID: <200106180426.VAA23379@finney.org> There have been a lot of discussions in the Freenet developers list about searching over the past year. Generally there are two camps: (a) searching will never work, and (b) it will too. One of the ideas, from Ian Clarke, inventor of Freenet, is to exploit Freenet's normal routing algorithm. This is basically a steepest-descent "search" with backtracking, where each node vectors the request to the neighbor node which has the closest key to that being requested. To do searching, instead of running this algorithm with hashes as it is normally done, it would be run with strings of keywords. So you might have the location key for a document be "MP3 Grateful Dead Sugar Magnolia". If someone looks for this exact string, the Freenet routing algorithm will find the document just as easily as if it were looking up by a hash. However the routing algorithm would then be changed to do fuzzy matches, which might be smart and involve rearranging words and checking for close spellings, etc. 
So if someone searched for "Grateful Dead MP3" that might be a close match to the string above, and the search request would find its way through the network and might find this document. As I said there is skepticism that this idea will work but perhaps at some point it will be tried if no better ideas come along. A different suggestion that was made was to add pointers to a document at the locations associated with all possible subsets of its keywords. So with the keywords above you'd have pointers from "Grateful Dead" and "MP3 Sugar Magnolia", etc. (This would use the original hash-based lookup concept.) To avoid the single-word-overload problem you could either only do it for sets of 3 or more words, or else you'd just put a hard limit on the number of entries stored for any set of keywords, and so if someone searched for "MP3" they'd only find the 100 most recent entries that used that word, making popular single word lookups relatively useless but not harmful. More specific searches would be necessary to get better results. Hal From sah at thalassocracy.org Mon Jun 18 03:50:01 2001 From: sah at thalassocracy.org (Steven Hazel) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? In-Reply-To: hal@finney.org's message of "Sun, 17 Jun 2001 21:26:01 -0700" References: <200106180426.VAA23379@finney.org> Message-ID: <871yoi9i54.fsf@azrael.dyn.cheapnet.net> On Sun, 17 Jun 2001, hal@finney.org wrote: > However the routing algorithm would then be changed to do fuzzy > matches, which might be smart and involve rearranging words and > checking for close spellings, etc. So if someone searched for > "Grateful Dead MP3" that might be a close match to the string above, > and the search request would find its way through the network and > might find this document. > > As I said there is skepticism that this idea will work but perhaps > at some point it will be tried if no better ideas come along. 
I am convinced that "fuzzy" might as well mean "magic" for the purposes of this discussion. For some background, here is the relevant text from Ian Clarke's 2 Jun 2001 post to the Freenet development list: > Currently a key in Freenet consists, essentially, of a string of > text. We define a "closeness" function between two keys, so that > given three keys, A, B, and C, we can determine which of A or B is > closer to C. To do this we have chosen a lexicographic comparison, so > that "aardvark" is closer to "apple" than "zebra" is. It was > apparent, even when writing my original paper describing the Freenet > architecture, that much more complex keys could be used with > correspondingly more sophisticated closeness operations. > > So how does this help us with fuzzy searching? Well, consider that we > defined a new key-type, called a MetadataKey, which, rather than just > a single string of text, consisted of a number of key-data pairs, > such as: > > "artist" => "tori amos" > "album" => "little earthquakes" > "song" => "winter" > "year" => "1988" > > Now, let's say that we wanted to search for an mp3 which was stored > under such a key. We could define a search like this: > > ("artist" string= "tori amos") AND > ("album" contains "litt") AND > ("song" contains "w") AND > ("year" lessThan "2000") > > A node receives this search, and compares it to each of the > MetadataKeys in its datastore. It uses fuzzy logic to come up with > a value between 0 and 1 for how closely each MetadataKey matches > this search (I have thought this out in more detail but for brevity > won't explain here; do a web search for "fuzzy logic" and > "Levenshtein distance"), a perfect match being 1. The search > request is then forwarded to the reference associated with the > closest match. Once the HTL runs out, a SearchResponse message is > sent which contains, at any given time, the top, say, 10 matches for > the query, along with the CHK of the data they refer to.
Each node updates this as they pass the search request back to the requester. The requester can then choose which of these they wish to request using conventional Freenet messages. (For those of you who aren't familiar with the Levenshtein distance -- that's pronounced "edit distance" -- it's the number of deletions, additions, and substitutions which must be performed to transform one string into another.) The problem with this is that Freenet routing, or any key-based routing, for that matter, depends on a node being assigned a portion of the keyspace for which it is responsible. With ordinary keys, we do this by taking the hash of some data, and assigning each node a range of numbers for which it is responsible. There hasn't yet been a convincing description of an algorithm for meaningfully dividing up the metadata keyspace. What Ian partially describes is an interesting way to order metadata with respect to a given search query, but that's not the problem that needs to be solved. We can now compare A and B against C, and learn that A is meaningfully closer to C than B is, based on some (hopefully useful) definition of "meaningfully closer". But that comparison is relative to C. Knowing that B < A with respect to C doesn't tell us anything about the relationship of A to C with respect to B, or of B to C with respect to A. For Ian's closeness relationship to work, the nodes would have to re-arrange the distribution of metadata in the network before each search. Here's a concrete example, using edit distance as closeness, and the values "Toby", "Tony", and "Tory": each pair of those values differs by a single substitution, so every pairwise edit distance is 1. So, globally, in what order do those values occur? All three are mutually equidistant. But on a line -- or under any global ordering -- three mutually equidistant points cannot exist: whichever value you place in the middle ends up closer to each of its neighbors than those neighbors are to each other. There is no global order.
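The relativity of "closeness" is easy to check directly. The sketch below (an illustration added here, not code from the thread) is a minimal Levenshtein implementation plus a second trio of words, showing that the ranking of candidates depends entirely on which string you measure from:

```python
# Minimal Levenshtein (edit) distance: the number of deletions,
# insertions, and substitutions needed to turn string a into string b.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

words = ["cat", "cot", "dog"]

def nearest(ref):
    """Rank the other words by closeness to ref."""
    return sorted((w for w in words if w != ref),
                  key=lambda w: levenshtein(ref, w))

# d(cat,cot)=1, d(cot,dog)=2, d(cat,dog)=3: each reference string
# induces its own ranking, and none of them implies a single global
# ordering that a keyspace could be partitioned by.
print(nearest("cat"))   # ranked from cat's point of view
print(nearest("dog"))   # ranked from dog's point of view
```

Knowing the ranking from one reference point tells a routing node nothing about the ranking from any other, which is exactly why pairwise closeness cannot be dealt out as keyspace ranges.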
We can't meaningfully map that closeness relationship onto a global ordering and deal out subsets to nodes. So there's no way that a node can know which other node to forward a request to without the data in the network being organized specially for a particular query. And therefore there's no such thing as "fuzzy searching". -S From alk at pobox.com Mon Jun 18 11:05:01 2001 From: alk at pobox.com (Tony Kimball) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? References: <20010618000023.W968@belegost.mit.edu> Message-ID: <15150.17062.143746.123137@spanky.love.edu> Quoth Roger Dingledine on Monday, 18 June: : : (eg, whoever lives at H("mp3") in the keyspace is going to be having a : bad year -- but perhaps enough caching and replication can resolve that). "mp3" is a UI use-case reductio-ad-absurdum: Users should not normally be presented with the option of searching for a .mp3 file extension, nor should indexing typically index on that term in that position. A more realistic example is "Spears" or "Eminem" or such like, which are actually useful discriminators, but still represent problematic hotspots in the hash space. Dexter's approach to this problem is two-fold: (1) Flatten the distribution of hashes by combining terms. Hotspots are less hot and more diffuse for combined terms than for single terms. A search for 'kronos quartet purple haze' can result in lookups on combined terms such as 'kronos quartet' and 'purple haze', the results of which are combined in the client. There are a number of tricks applied to make this more generally applicable, for example: pre-normalizing the keys by metaphones; filtering prepositions, articles, and copulae; and pruning the term permutation trees heuristically. (2) Distribute the load by virtualizing the node. The hash node need not be a single host. Query load can be distributed over a group of hosts by round-robining, and storage load can be distributed by WAN RAID technique. : How do non-broadcast : file sharing architectures (intend to) do searching?
Most of what I've seen has been content-routed using Bloom filters to represent routing tables (a lossy compression technique). I personally consider the storage requirements, the update bandwidth requirements, and the practical lookup latencies in such networks to be too great, at Internet scales, for today's typical network nodes and links (say P2-400/64M/5G/56k), but it is still an interesting approach for fat nodes in broadband networks. I don't know of any viable project developing a content-routing search network. Which is strange, really, since it is such a good way of doing intranet content discovery. There's also a camp based on exploiting the power-law 'supernode'. This approach has value for approximate searches on weighted graphs, but trades off between discovery guarantees and bandwidth scaling. If you are comfortable with this tradeoff, you should look into systems such as OpenCola and Neurogrid. From zooko at zooko.com Mon Jun 18 20:37:02 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Roger Dingledine of "Mon, 18 Jun 2001 00:00:24 EDT." <20010618000023.W968@belegost.mit.edu> References: <20010618000023.W968@belegost.mit.edu> Message-ID: Roger Dingledine wrote: > > Let's say I have a Chord (or any consistent hashing) service for which > I can do hash lookups -- that is, I can fetch a document in O(lg n) > hops once I know what it's called. I think of this as two separate problems: immutable namespaces and mutable namespaces. Keyword searching must be implemented atop a mutable namespace (because immutable namespaces have keys that are not human-readable). I'm not really going to address your questions about searching (although note that there is an unused implementation of search-by-consistent-hashing in Mojo Nation: [1] -- it sounds sort of like Dexter's approach).
Instead I'm going to babble about the larger picture and how we're all fondling different parts of the elephant while Stratton Sclavos laughs all the way to the bank. (Oops -- I just changed metaphors in midstream...)

Self-Authenticating, Immutable, Mutable, Distributed

Namespaces can be distributed as long as the objects are *self-authenticating*. That is: when you get an object you can easily verify that it matches the key that you started with. (Note: there is still a DoS attack by spreading bogus objects that the searcher has to reject, but that seems very weak.) Two examples of self-authenticating objects are any object where the key is its collision-free hash, and any object where the key includes a public key and the object comes with a signature from the corresponding private key. (In Mojo Nation, the former is MojoIds, and the latter is not implemented. In Freenet, the former is Content Hash Keys and the latter is Sub-Space Keys. In the Self-Certifying File System[2] the latter is used for remote directories.) The hash-name approach creates an immutable namespace, and one with non-human-readable keys. The signed-object approach also allows for mutable namespaces, but it cannot be perfectly distributed in the same sense that immutable namespaces can. This is because *someone* has to be able to change what value a key maps to, and there isn't any way to have universal agreement on what the new value should be. (Unlike an immutable namespace, where we can all universally agree that the value for a key should be the thing that is SHA1'ed to create that key.)

Who Controls The Mutants?

So the first decision you have to face when designing a mutable namespace is "who controls the mappings?". The two most obvious answers are: a. a centralized committee[3] does (DNS), or b. anyone who creates a new part of the namespace by creating a public key controls it by using the corresponding private key (SDSI[4]).
Now we all see the problems with the first solution (although, as an aside, the vast majority of people do *not* see those problems, and we may very well be stuck with it for the next couple of decades, whether we like it or not). Okay, so now we agree that for there to be a good distributed mutable namespace, all the objects have to be self-authenticating and control over mapping must be local to the holder of the relevant secret key. What does this have to do with searching?

What Does This Have To Do With Searching?

Well... Actually I'm not sure, but I feel like there must be some important connection here. Hopefully someone else can tell me what it is, or convince me that there isn't one. ;-) Regards, Zooko [1] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/mojonation/evil/common/ContentHandicappers.py?content-type=text/plain [2] http://fs.net/ [3] "The Man Who Bought The Internet" http://www.fortune.com/indexw.jhtml?channel=artcol.jhtml&doc_id=202984 [4] http://theory.lcs.mit.edu/~cis/sdsi.html From blanu at uts.cc.utexas.edu Tue Jun 19 00:05:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] keyword searching + consistent hashing? In-Reply-To: <20010618000023.W968@belegost.mit.edu> Message-ID: > Has anybody found a good way of integrating keyword searching into this > framework? Ah, glad you asked! Now you can skip my O'Reilly P2P talk. :-) > I can imagine systems where publishers provide a description along with > the document, hash each keyword of the description, and then register > those hashes with a Chord service, so Chord will allow you to do the > actual searching. But those seem rather kludgy and potentially lopsided > (eg, whoever lives at H("mp3") in the keyspace is going to be having a > bad year). So the system I'm advocating starts with this idea. It's for searching Freenet, so it uses Freenet hashing and routing.
As everyone knows, Freenet is two hops away from Chord, so it should work with Chord as well. In order to avoid flooding the most popular keyword, which I choose to call "britney" for absolutely no reason, you give each entry a sequence number x. So the first item with that keyword is filed under "britney-0". The second is "britney-1", etc. With a good hashing algorithm the various entries should then be spread more or less evenly around the entire keyspace. This also avoids collisions in Freenet, which can only have one item called "britney". This approach has some problems. First of all, it makes inserts slow, as you have to find the highest value of x. There are some obvious techniques for speeding this up so that it scales. Additionally, x will eventually get really large and you will have to store it in a bignum, which is really obnoxious. Also, in Freenet things fall out, which makes it difficult to determine the highest value of x. The more sparse the collection of items in the keyword index is, the less likely you are to determine the correct value for x. The solution we're currently using for this is to have date-based indices. Each key has a timestamp for the day that it was inserted. So the key format is prefix-timestamp-number. The advantage to this is that the numbering starts over every day. Inserting doesn't take as long, you don't run out of numbers, and the probability of getting the right value for x is high. Each item still has a unique identifier so that the items are spread evenly over the keyspace. Personally, I don't really like keyword searching. I'm much more fond of metadata searching based on fields with semantic content, such as Title, Author, Publication Date, etc. So the searching system which I'm going to be presenting is for doing that rather than keyword searching. However, it's really the same idea, just slightly modified.
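The prefix-timestamp-number scheme can be sketched in a few lines. This is an illustration under stated assumptions, not Freenet code: a plain dict stands in for the network's key-value store, insertion probes for the lowest free number, and lookup walks the numbers until the first miss (which is exactly why dropped entries make the probe unreliable in a real Freenet):

```python
import hashlib
from datetime import date

store = {}  # stand-in for the network: hashed key -> value

def _key(keyword, day, n):
    # "prefix-timestamp-number", then hashed so that entries scatter
    # evenly over the keyspace instead of piling up at H("britney").
    raw = f"{keyword}-{day.isoformat()}-{n}"
    return hashlib.sha1(raw.encode()).hexdigest()

def insert(keyword, value, day=None):
    """File value under the lowest free number for (keyword, day)."""
    day = day or date.today()
    n = 0
    while _key(keyword, day, n) in store:  # find the highest-used x + 1;
        n += 1                             # numbering restarts each day
    store[_key(keyword, day, n)] = value
    return n

def lookup(keyword, day):
    """Fetch every entry filed under keyword on the given day."""
    results, n = [], 0
    while _key(keyword, day, n) in store:
        results.append(store[_key(keyword, day, n)])
        n += 1
    return results
```

For example, two same-day inserts under "britney" land at numbers 0 and 1, and a lookup on a different day finds nothing, since each day's index starts empty. In a network where entries can fall out, a gap at some n would hide everything filed after it, which is the sparseness problem described above.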
Also, in a Chord system timestamps aren't strictly necessary as data doesn't fall out. In such a system I think you could do some neat things with assigning numbers to the items so as to balance lookup and insert times. From blanu at uts.cc.utexas.edu Tue Jun 19 00:21:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > Okay, so now we agree that for there to be a good distributed mutable > namespace, all the objects have to be self-authenticating and control over > mapping must be local to the holder of the relevant secret key. What does this > have to do with searching? I do indeed agree. I can relate this quite easily to searching using a Freenet example. The problem with the namespace system which I just posted is that it uses human-readable keys, which are of course not self-verifying. Unfortunately, there is no way to allow for public submission of information which can be verified, since verification requires that you have some information (hash or public key) and the whole idea of public submission is that you have neither of these. So anything publicly submitted can be corrupted by evil nodes in the network. Therefore, it is very useful to have trusted individuals sort through the submissions and make signed copies of them. My conception of the searching system which I'm advocating for Freenet is that there will be a place for insecure public submissions. Several "search engines" will provide a service of scanning these submissions, verifying, sorting, and ranking them automatically or manually, and then placing them in new indices which are signed. If you trust one of these engines then you can just search their indices. If you don't trust anyone or fear that they are censoring content when they verify it, then the raw, insecure submissions are still there for you to look at.
Though I'm working on a searching system for Freenet, these issues are pretty much the same for any system. From zooko at zooko.com Tue Jun 19 08:07:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Brandon of "Tue, 19 Jun 2001 02:20:33 CDT." References: Message-ID: Brandon wrote: > > I do indeed agree. I can relate this quite easily to searching using a > Freenet example. The problem with the namespace system which I just > posted is that it uses human-readable keys, which are of course not > self-verifying. > > Unfortunately, there is no way to allow for public submission of > information which can be verified since verification requires you have > some information (hash or public key) and the whole idea of public > submission is that you have neither of these. Hm. I don't think I understand what "public submission" is. (At least not in the context of distributed namespaces...) Why not require the client to generate an RSA key pair in order to submit the information? The major drawback that I can see is that it makes the key non-human-writeable/memorable. Is that what you mean when you say that public submission requires that you don't have a public key? By the way, I'm surprised that I forgot to reference the Pet Names Markup Language[1] in my previous "overview of namespaces" letter. The Pet Names Markup Language is a way to map human-memorable names onto the kind of distributed mutable namespace that I am imagining: self-authenticating, each mapping controlled by its private key, and a third quality that I didn't mention yet: that namespaces can be transitively linked. This kind of namespace is probably best described in the papers on SDSI -- the Simple Distributed Security Infrastructure by Ron Rivest (the "R" in "RSA") [2, 3].
When reading the PNML web page, you can obviously envision different user interfaces and encodings and so forth, but the basic structure of pet names, suggested pet names, translation on introduction, etc. is an excellent and elegant solution to the problem. I can't recommend the SDSI/PNML concepts highly enough. In my opinion, if you design a distributed mutable namespace, you ought to either use SDSI/PNML, or have a good reason why you chose not to. I'm still not entirely clear on the relation between these namespace issues and distributed keyword search schemes, so please write back! Regards, Zooko [1] "Lambda for Humans -- The Pet Name Markup Language" Mark Miller http://www.erights.org/elib/capability/pnml.html [2] "SDSI -- A Simple Distributed Security Infrastructure" 1996 Ronald L. Rivest, Butler Lampson http://citeseer.nj.nec.com/rivest96sdsi.html [3] "On SDSI's Linked Local Name Spaces" 1998 Martín Abadi http://citeseer.nj.nec.com/abadi98sdsis.html From wesley at felter.org Tue Jun 19 12:01:02 2001 From: wesley at felter.org (Wesley Felter) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: On Mon, 18 Jun 2001 zooko@zooko.com wrote: > Roger Dingledine wrote: > > > > Let's say I have a Chord (or any consistent hashing) service for which > > I can do hash lookups -- that is, I can fetch a document in O(lg n) > > hops once I know what it's called. > > I think of this as two separate problems: immutable namespaces and mutable > namespaces. Keyword searching must be implemented atop a mutable namespace > (because immutable namespaces have keys that are not human-readable.) Hey, back up a second there. What exactly is the difference between mutable and immutable namespaces? 
If you mean the mutability of the mappings themselves (so in an immutable namespace, a name would always map to the same thing forever (where "the same thing" is left undefined for the moment)), I don't see why keys in an immutable namespace must be non-human-readable. Wesley Felter - wesley@felter.org - http://felter.org/wesley/ From zooko at zooko.com Tue Jun 19 12:40:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Wesley Felter of "Tue, 19 Jun 2001 14:11:26 CDT." References: Message-ID: Wes Felter wrote: > > > I think of this as two separate problems: immutable namespaces and mutable > > namespaces. Keyword searching must be implemented atop a mutable namespace > > (because immutable namespaces have keys that are not human-readable.) > > Hey, back up a second there. What exactly is the difference between > mutable and immutable namespaces? If you mean the mutability of the > mappings themselves (so in an immutable namespace, a name would always map > to the same thing forever (where "the same thing" is left undefined for > the moment)), I don't see why keys in an immutable namespace > must be non-human-readable. Good question. The answer is that there was an implicit extra requirement in there: that the namespace be perfectly distributed (i.e. nobody ever disagrees about what value a given key should map to, nor does anyone find themselves hindered from entering a new value into the namespace). The only way, AFAIK, to achieve a *perfectly distributed* namespace, is for it to be an immutable namespace, and for it to be based on collision-free hashes and hence have non-human-memorable keys. 
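A hash-based immutable namespace of this kind can be sketched in a few lines. This is an illustrative toy (SHA1 chosen because the thread mentions it; all names are hypothetical), showing why everyone agrees on the mapping and why the keys are not human-memorable:

```python
import hashlib

def key_for(value: bytes) -> str:
    """In an immutable namespace the key *is* the collision-free hash,
    so every peer independently computes the same key for the same value."""
    return hashlib.sha1(value).hexdigest()

def verify(key: str, value: bytes) -> bool:
    """Self-authentication: any peer can check a value against its key
    without trusting the node that served it."""
    return key_for(value) == key

doc = b"some document"
k = key_for(doc)
assert verify(k, doc)                 # the genuine object checks out
assert not verify(k, b"tampered")     # a node serving bogus data is caught
# ...but k is a 40-hex-digit string, which nobody is going to memorize.
```

There is nothing for anyone to disagree about here, since the key-to-value relation is fixed by the hash function itself; the price, as noted above, is that the keys carry no human meaning.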
Furthermore, there *can't* be any namespace that satisfies my idea of "perfectly distributed" and has human-memorable keys, because there would immediately be conflicting preferences about what the key "mcdonalds.com" mapped to in that namespace, violating one of my criteria for "perfectly distributed". (As a side note, the only reason we are having this problem is that human brains are built to remember collections of only seven plus or minus two arbitrary symbols, and ASCII only offers about 7 bits per symbol, for a grand total of about 63 bits per memorable word. I sometimes wonder what would happen if we all used a so-called "ideographic" script like Chinese, and had maybe 20 reliably distinguishable bits per symbol, and could encode a whole 160 bit unique id into 8 symbols. In real life, I doubt that readers of Chinese can reliably distinguish anywhere near that many symbols...) Regards, Zooko From blanu at uts.cc.utexas.edu Tue Jun 19 12:47:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > > Unfortunately, there is no way to allow for public submission of > > information which can be verified since verification requires you have > > some information (hash or public key) and the whole idea of public > > submission is that you have neither of these. > > Hm. I don't think I understand what "public submission" is. (At least not in > the context of distributed namespaces...) Why not require the client to > generate an RSA key pair in order to submit the information? A public key doesn't mean anything if you don't have it *before* the submission. The search system I'm advocating requires keys which are not necessarily human-readable, but rather *guessable* so that they can be automatically retrieved by a client.
If you have everyone's public keys beforehand then you can search these various keyspaces for items. If you don't have someone's public key then you can't find any of their items at all. So you require either out-of-band public key distribution, or an insecure public space for the submission of public keys. So if you want people that you don't already know to submit entries that you can read, then you need an insecure public space at some point in the process. From wesley at felter.org Tue Jun 19 12:48:01 2001 From: wesley at felter.org (Wesley Felter) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: On Tue, 19 Jun 2001 zooko@zooko.com wrote: > Good question. The answer is that there was an implicit extra requirement in > there: that the namespace be perfectly distributed (i.e. nobody ever disagrees > about what value a given key should map to, nor does anyone find themselves > hindered from entering a new value into the namespace). Didn't Raph Levien write a paper about how to build a distributed namespace with arbitrary keys (given out FCFS)? Wesley Felter - wesley@felter.org - http://felter.org/wesley/ From zooko at zooko.com Tue Jun 19 12:56:02 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Brandon of "Tue, 19 Jun 2001 14:46:17 CDT." References: Message-ID: Brandon wrote: > > > Hm. I don't think I understand what "public submission" is. (At least not in > > the context of distributed namespaces...) Why not require the client to > > generate an RSA key pair in order to submit the information? > > A public key doesn't mean anything if you don't have it *before* the > submission. 
The search system I'm advocating requires keys which are not > necessarily human-readable, but rather *guessable* so that they can be > automatically retrieved by a client. Ahh. Very interesting. Hm. But however the submitted data travels to you, or however your query is transmitted to it, one could do transitive introductions along that same path. Does that solve the problem? Regards, Zooko From zooko at zooko.com Tue Jun 19 13:09:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message from Wesley Felter of "Tue, 19 Jun 2001 14:59:05 CDT." References: Message-ID: Wes wrote: > > > Good question. The answer is that there was an implicit extra requirement in > > there: that the namespace be perfectly distributed (i.e. nobody ever disagrees > > about what value a given key should map to, nor does anyone find themselves > > hindered from entering a new value into the namespace). > > Didn't Raph Levien write a paper about how to build a distributed > namespace with arbitrary keys (given out FCFS)? I don't know about this. I would love to get a URL to it! Note that the consistent-hashing based distributed search hack[1] that we implemented but do not use for Mojo Nation was originally suggested by Raph in a post to advogato.org. Regards, Zooko [1] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/mojonation/evil/common/ContentHandicappers.py?content-type=text/plain From alk at pobox.com Tue Jun 19 14:40:01 2001 From: alk at pobox.com (Tony Kimball) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) References: Message-ID: <15151.50838.590836.759462@spanky.love.edu> Quoth zooko@zooko.com on Tuesday, 19 June: : > > Keyword searching must be implemented atop a mutable namespace : > > (because immutable namespaces have keys that are not human-readable.) 
: Good question. The answer is that there was an implicit extra requirement in : there: that the namespace be perfectly distributed (i.e. nobody ever disagrees : about what value a given key should map to, nor does anyone find themselves : hindered from entering a new value into the namespace). : : The only way, AFAIK, to achieve a *perfectly distributed* namespace, is for it : to be an immutable namespace, and for it to be based on collision-free hashes : and hence have non-human-memorable keys. Your discussion appears to confuse human memorability and human generability. Whether I can remember a key without an external memory aid is operationally insignificant, because I do in fact possess external memory aids. : ... there *can't* be any namespace that satisfies my idea of : "perfectly distributed" and has human-memorable keys, because there would : immediately be conflicting preferences about what the key "mcdonalds.com" : mapped to in that namespace... One way to solve this problem is to make the namespace bigger, using enough keys in combination to ensure the injectivity of the relation. But it seems to me that making the keys random 160-bit integers won't reduce the probability of conflicting preferences, because it is the semantic value of the mapping that is the issue in contention, not the uninterpreted association of random integers. As long as the map has a semantic value, and is operationally significant in the context of some application, such conflicts may arise. From blanu at uts.cc.utexas.edu Tue Jun 19 19:11:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > Hey, back up a second there. What exactly is the difference between > mutable and immutable namespaces?
If you mean the mutability of the > mappings themselves (so in an immutable namespace, a name would always map > to the same thing forever (where "the same thing" is left undefined for > the moment)), I don't see why keys in an immutable namespace > must be non-human-readable. In a decentralized system, each node in the network has the ability to mess with the data. Therefore, all items must be self-verifying in order to guarantee that they are immutable. If some node in the network modifies the data then obviously it's not immutable. I suppose you could also come up with a design where you only talked to nodes that you somehow trusted would honor the contract to not modify immutable data. So I guess the answer is that you have to either trust the network or the data has to be self-verifying. From blanu at uts.cc.utexas.edu Tue Jun 19 19:13:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > Didn't Raph Levien write a paper about how to build a distributed > namespace with arbitrary keys (given out FCFS)? Wow! I believe that to be theoretically impossible. So I'd love to read that paper. Maybe he was using a different definition of distributed than I am. From blanu at uts.cc.utexas.edu Tue Jun 19 19:26:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: Message-ID: > Hm. But however the submitted data travels to you, or however your query is > transmitted to it, one could do transitive introductions along that same path. > > Does that solve the problem? Unfortunately, no. In order for this to work, the public key would have to be broadcast to the entire network. The basic problem is that the only way that a producer and a consumer can communicate is by inserting things into the network.
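[Editor's note: the "self-verifying" property Brandon keeps invoking reduces to a content-hash key, where the key *is* the hash of the data. This is a minimal sketch of that idea only; real Freenet keys are more involved, and the names below are invented for illustration.]

```python
import hashlib

# Minimal sketch of "self-verifying" data: the key is the hash of the
# content (a CHK-style scheme; real Freenet keys are more involved).

def make_key(content):
    return hashlib.sha1(content).hexdigest()

def verify(key, content):
    # Any recipient re-hashes what it received; a node that tampered
    # with the data cannot also forge a matching key.
    return make_key(content) == key

original = b"picture of a kitten"
key = make_key(original)
assert verify(key, original)           # untouched data checks out
assert not verify(key, b"bogus data")  # modification is detected
```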
A very minuscule amount of OOB communication is required as well: the name of an index ("britney" in my example). The name of the index could be automatically determined by the client, in which case the OOB information is the algorithm encoded in the client. Anyway, they have to have some way to agree on some keys beforehand. The only way for them to exchange information is for the producer to insert information at the predetermined keys and for the consumer to look for it there. In the particular case of searching based on keywords, random people that don't know each other are inserting and requesting things from these globally agreed upon keys using the keyword->key algorithm shared by all of the clients. In order for this to be secure, the public key of the producer must be known to the consumer before the search. This is because in order for the content at a particular key to not be modifiable by intervening nodes, there must be a relationship between the contents of the file and the key itself. There's no way that the client can automatically generate keys based on some algorithm so that the keys generated are related to the producer's public key if the client doesn't know the producer's public key. I hope that made sense. :-) If not, I can try again. ;-) From jeff at platypus.ro Tue Jun 19 19:29:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) References: Message-ID: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> > In a decentralized system, each node in the network has the ability to > mess with the data. Being decentralized does not necessarily imply that *any* node can modify *any* data. That is merely one form of decentralization, and for the very reasons we're discussing it might not be the ideal form.
Partitioning, hierarchy and delegation can and often do have their place in decentralized systems, so long as globally necessary roles are not permanently bound to particular nodes. From Verbatim3D at aol.com Tue Jun 19 19:31:01 2001 From: Verbatim3D at aol.com (Verbatim3D@aol.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent ... Message-ID: <29.1684638f.286164a9@aol.com> hello how may i reach you ? From Verbatim3D at aol.com Tue Jun 19 19:35:01 2001 From: Verbatim3D at aol.com (Verbatim3D@aol.com) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent ... Message-ID: <115.899f11.286165ac@aol.com> I ment like Aol , Aim , Icq .... Ect From blanu at uts.cc.utexas.edu Tue Jun 19 19:48:02 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> Message-ID: > > In a decentralized system, each node in the network has the ability to > > mess with the data. > > Being decentralized does not necessarily imply that *any* node can modify > *any* data. That is merely one form of decentralization, and for the very > reasons we're discussing it might not be the ideal form. Partitioning, > hierarchy and delegation can and often do have their place in decentralized > systems, so long as globally necessary roles are not permanently bound to > particular nodes. I was actually thinking just that when I wrote that, but it didn't quite come out right. I meant to say something more like each node has the ability to modify the data if it gets the opportunity. Any non-self-verifying data can be modified by any node which it passes through. Various checks may detect, repair, or prevent this. Centralization, for instance, is an architecture with a naive method for dealing with evil nodes.
You just have one node and hope that it's not evil. Totally decentralized systems are not so naive as to assume that you trust *all* nodes in the network. So there are various schemes to get the data from nodes you trust, not giving evil nodes the opportunity to modify the data (as opposed to the innate ability to, which they still have). Or, as with Freenet, you can opt to use mostly self-verifying data and not care about where you get the data from. So, to sum up, self-verifying data can't be modified by any node. Non-self-verifying data can be modified by any node it passes through. Various architectures route the data differently and have different ways of dealing with evil nodes. From blanu at uts.cc.utexas.edu Tue Jun 19 19:48:03 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent ... In-Reply-To: <29.1684638f.286164a9@aol.com> Message-ID: > hello how may i reach you ? Who are you asking? From zooko at zooko.com Tue Jun 19 19:55:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] no more In-Reply-To: Message from Brandon of "Tue, 19 Jun 2001 21:47:39 CDT." References: Message-ID: I removed him from the p2p-hackers subscribers and sent him a polite private e-mail suggesting that this wasn't the forum he was looking for. Regards, Zooko (No, in case someone was going to ask, I will not do the same for Oskar...) From oskar at freenetproject.org Wed Jun 20 03:33:03 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?)
In-Reply-To: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Tue, Jun 19, 2001 at 10:28:27PM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> Message-ID: <20010620123459.B525@sandbergs.org> On Tue, Jun 19, 2001 at 10:28:27PM -0400, Jeff Darcy wrote: > > In a decentralized system, each node in the network has the ability to > > mess with the data. > > Being decentralized does not necessarily imply that *any* node can modify > *any* data. That is merely one form of decentralization, and for the very > reasons we're discussing it might not be the ideal form. Partitioning, > hierarchy and delegation can and often do have their place in decentralized > systems, so long as globally necessary roles are not permanently bound to > particular nodes. I guess this is semantics, but aren't hierarchical and decentralized as close to antonymous design processes as one can come? It is true that a hierarchy doesn't have to have one top, but a center does not have to contain one peer either. It would seem to me a hierarchical system could be distributed (DNS), but hardly decentralized (and that the center is mobile doesn't really seem to matter). Somebody linked a paper that defined the terms regarding anonymity and untraceability a couple of weeks ago, which should mercifully spare us the semantic bantering regarding that - anybody know of a similar paper regarding descriptions of system topologies? -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From jeff at platypus.ro Wed Jun 20 07:05:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?)
References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> Message-ID: <087501c0f991$ebb58590$367b9fa8@lss.emc.com> Oskar Sandberg: > I guess this is semantics, but aren't hierarchical and decentralized as > close to antonymous design processes as one can come? Not really. A truly centralized system is topologically a star. A hierarchical system is topologically a tree, which is not at all the same thing as a star and can therefore be considered decentralized. If the root of the tree only handles one request out of a thousand because the rest have been fully delegated elsewhere, then that can be pretty scalable and that's all the benefit many people hope to derive from decentralizing, so it's a pretty useful distinction. > It is true that a > hierarchy doesn't have to have one top, but a center does not have to > contain one peer either. The topology most people probably think of when they're thinking about decentralization is a mesh. What I think many "flat-earth" P2P folks miss is that meshes, trees, and DAGs are very closely related. Consider IP routing, for example. All of the routes through the network might form a mesh, but the routes to one destination at a particular point in time (modulo a few update-propagation issues) form a tree or DAG. When one considers multiply-rooted trees the similarities become even stronger. When you get right into it, superficially tree-structured and superficially mesh-structured systems can be devilishly difficult to tell apart. It's like those images where if you look at it one way you see a vase but then you look again you see two people about to kiss. As I alluded to in my last message, the important thing in decentralization is not that every last node be topologically or functionally equal to every other, but that roles not be permanently assigned to particular nodes.
Temporary assignment of roles is fine, which is why so many P2P systems are sprouting "supernodes" and "brokers" and such all over the place. As long as the roles can be (re)assigned automagically when *the system* (not an administrator) detects that it is good or necessary to do so, the system is decentralized in pretty much all of the ways that matter. > It would seem to me a hierarchical system could > be distributed (DNS), but hardly decentralized The distinction I would make, based on having worked in this space for over a decade, is that the superset of distributed systems still allows the sort of permanent role assignment I was just talking about, while the subset of decentralized systems does not. Many people use the term "fully distributed" to mean "decentralized" but the distinction they're making is usually the same. > (and that the center is > mobile doesn't really seem to matter). The mobility of the center really *does* matter. The problem with having a center is that if it fails or slows down or is compromised then the entire system fails or slows down or is compromised. If the system is designed so that centers can be moved or created as needed, and/or so that failures of various kinds are contained so that they do not affect everyone, that's the most important single distinction to be made between centralized vs. decentralized systems. (Yes, that means a cluster can be a decentralized system. The cluster infrastructure I've worked on is, within itself, as "pure P2P" as anything I've ever seen, with all of the resource location/migration and coordination issues that such headlessness implies.) There's even a danger in being too extremist about levels of decentralization. Some of the "flat-earth" systems don't really solve the problems associated with centralization, and are just as vulnerable to bottlenecks or catastrophic cascading failures as any centralized system ever was. Look at what happened to Gnutella before reflectors. 
All that some of these systems do is make it harder to identify or remedy the source of such problems, while also adding a whole new class of problems and failures related to all those useless extra levels of indirection. IMO that much focus on ideology is poison. My own system is probably not "fully decentralized" enough for some people, but they can just take a flying leap at a rolling doughnut. I could make it more fully decentralized if I wanted, but not just for the sake of an ideal. My goal is to meet certain requirements - efficiency, robustness, security (but not anonymity) - and if further decentralization does not help me meet that goal then I'm not interested. Nobody should ever assume that the solution to their problem is the solution to everyone else's. > Somebody linked a paper that defined the terms regarding anonymity and > untracability a couple of weeks ago, which should mercifully spare us > the semantic bantering regarding that - anybody know of a similiar paper > regarding descriptions of system topologies? There are a lot of people out there offering definitions. Unfortunately, I don't think there's any one that could be considered authoritative. Of course, achieving universal agreement is a known problem in decentralized systems so perhaps we should just deal with it. ;-) From raph at levien.com Wed Jun 20 08:17:01 2001 From: raph at levien.com (Raph Levien) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] Secure namespaces in p2p networks Message-ID: <20010620010334.D15523@levien.com> Hi p2p-hackers, Zooko clued me into this thread. Yes, I have a paper on how to build secure distributed namespaces using p2p networks. Get your copy here: http://www.levien.com/fc.ps This was submitted to FC '00, but in the infinite wisdom of the reviewers, rejected. 
Interestingly enough, this design includes the Advogato trust metric as a technique for defeating a particular kind of attack - in particular, trying to flood the network with tons of servers, most likely virtual. I believe there are many more interesting applications of the trust metric ideas to p2p networks, including content selection and spam-resistant e-mail. I'm beginning work again on my thesis. Best place to look for updates on that is my Advogato diary: http://www.advogato.org/person/raph/ Raph From zooko at zooko.com Wed Jun 20 08:27:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] names of kinds of topology In-Reply-To: Message from "Jeff Darcy" of "Wed, 20 Jun 2001 10:04:28 EDT." <087501c0f991$ebb58590$367b9fa8@lss.emc.com> References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> Message-ID: Here are two common "design patterns" in network topology that I frequently encounter and I wish I had a specific name for. (For these topologies, "decentralized" is neither specific nor unambiguous, but then neither is "centralized".) 1. There are many servers. Service happens bilaterally between client and server. For clients to interact with each other, they must use the *same* server. Clients can use multiple servers, aggregating results from multiple servers and dynamically choosing which servers to use. Mojo Nation's "content tracking" architecture is currently of this model. 2. There are many servers. Service happens bilaterally between client and server. For clients to interact with each other, they must use the *same* server. Servers aggregate results from other servers. Clients can only use one server, and they choose which server to use. IRC is of this model. 
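[Editor's note: the two topologies above can be rendered as a toy model. The server names and data below are invented for illustration; no real protocol is modeled.]

```python
# A toy model of Zooko's two topologies.

servers = {                      # each server holds only its own entries
    "s1": {"apple", "banana"},
    "s2": {"banana", "cherry"},
    "s3": {"date"},
}

# Model 1 (Mojo Nation-style content tracking): the *client* picks
# several servers and merges their results itself.
def client_aggregate(chosen):
    results = set()
    for s in chosen:
        results |= servers[s]
    return results

# Model 2 (IRC-like): the client talks to a *single* server, and the
# servers pool results among themselves on its behalf.
def server_aggregate(entry_server):
    results = set(servers[entry_server])
    for other in servers:
        if other != entry_server:
            results |= servers[other]   # server-to-server exchange
    return results

assert client_aggregate(["s1", "s2"]) == {"apple", "banana", "cherry"}
assert server_aggregate("s3") == {"apple", "banana", "cherry", "date"}
```

The structural difference is simply where the merge loop runs: in the client (model 1) or in the server (model 2).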
Jukka Santala had a patch that allowed Mojo Nation "Content Trackers" to suck information out of each other (acting as clients in that transaction), which would change the topology of Mojo Nation content tracking to include servers aggregating information from one another as well as clients aggregating information from multiple servers. So anyway, what are the names for these two topologies? Regards, Zooko From oskar at freenetproject.org Wed Jun 20 08:27:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: <087501c0f991$ebb58590$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Wed, Jun 20, 2001 at 10:04:28AM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> Message-ID: <20010620172900.E525@sandbergs.org> On Wed, Jun 20, 2001 at 10:04:28AM -0400, Jeff Darcy wrote: > Oskar Sandberg: <> > > It is true that a > > hierarchy doesn't have to have one top, but a center does not have to > > contain one peer either. > > The topology most people probably think of when they're thinking about > decentralization is a mesh. What I think many "flat-earth" P2P folks miss > is that meshes, trees, and DAGs are very closely related. Consider IP > routing, for example. All of the routes through the network might form a > mesh, but the routes to one destination at a particular point in time > (modulo a few update-propagation issues) is a tree or DAG. When one > considers multiply-rooted trees the similarities become even stronger. When > you get right into it, superficially tree-structured and superficially > mesh-structured systems can be devilishly difficult to tell apart. It's > like those images where if you look at it one way you see a vase but then > you look again you see two people about to kiss.
Yes, a good example of this is the Plaxton et al. [1] system, which the paper describes as a set of trees, but which can also be viewed as a slightly relaxed hyperdimensional mesh. I would think that any system which allows a searching algorithm must be describable as a set of trees (just follow the routes). However, there is an important difference. In the sort of systems that I think of as truly decentralized, like Plaxton, Chord [2], or (assuming for a moment that it actually works) Freenet, the roots to the trees are spread over all the nodes, whereas a system with a hierarchy would have the tree roots concentrated in a subset of the peers. If such a system is still considered decentralized, then I guess we need a new term for the other type - the only synonym that my Roget's lists for decentralized is deconcentrated, so I'll propose that. > As I alluded to in my last message, the important thing in decentralization > is not that every last node be topologically or functionally equal to every > other, but that roles not be permanently assigned to particular nodes. > Temporary assignment of roles is fine, which is why so many P2P systems are > sprouting "supernodes" and "brokers" and such all over the place. As long > as the roles can be (re)assigned automagically when *the system* (not an > administrator) detects that it is good or necessary to do so, the system is > decentralized in pretty much all of the ways that matter. For lack of a better term I have been referring to the networks which use "supernodes", "brokers", "reflectors", etc. as square-root networks, because by making sqrt(N) of the nodes "supernodes" you have O(g=sqrt(N)) growth in network traffic (per node) and O(h=sqrt(N)) growth in the tables of each of them (obviously you can shift those to functions, but you'll need to keep g*h = N).
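[Editor's note: Oskar's g*h = N constraint can be tabulated at its three natural design points. The g/h naming follows his post; the function below is mine and purely illustrative.]

```python
import math

# A toy rendering of the g*h = N tradeoff for square-root networks:
# g ~ per-node traffic growth, h ~ per-supernode table size.

def designs(n):
    root = math.isqrt(n)
    return {
        "broadcast (g=N, h=1)": (n, 1),
        "centralized (g=1, h=N)": (1, n),
        "square-root (g=h=sqrt(N))": (root, root),
    }

N = 10_000
for name, (g, h) in designs(N).items():
    assert g * h == N  # the invariant: traffic growth * table size covers N
    print(f"{name}: traffic ~{g}, table ~{h}")
```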
Personally I have been frowning at these designs and have certainly not considered them decentralized (from now on I will frown at these designs and not consider them deconcentrated :-) ). (btw, what is interesting about the square-root networks to me is that they seem to be approached from two directions, both from the systems that were previously broadcast (that is g=N, h=1) but that sank under the load, and from systems that were previously completely centralized (that is g=1, h=N) but sank under the load on the central server or attacks on it.) Of course, I can't say that deconcentration is a necessary characteristic for every set of goals, or for sure that having the root subset be dynamic is not enough even when strict survivability is called for. I can't even say that deconcentration is good, just that it appeals to me. <> > > (and that the center is > > mobile doesn't really seem to matter). > > The mobility of the center really *does* matter. The problem with having a > center is that if it fails or slows down or is compromised then the entire > system fails or slows down or is compromised. If the system is designed so > that centers can be moved or created as needed, and/or so that failures of > various kinds are contained so that they do not affect everyone, that's the > most important single distinction to be made between centralized vs. > decentralized systems. (Yes, that means a cluster can be a decentralized > system. The cluster infrastructure I've worked on is, within itself, as > "pure P2P" as anything I've ever seen, with all of the resource > location/migration and coordination issues that such headlessness implies.) OK, but once you have a center/proper root subset (PRS) you open up a whole can of worms regarding making sure that this cannot be abused, both by people trying to use the limited set of nodes as an Achilles heel of the network, and by the peers in the PRS itself.
Certainly having the PRS be dynamic and mobile is one of the methods by which this can be combated, but, while probably necessary, it is not the only such measure, which is why I wanted to separate these networks, and the means necessary to secure them, from the deconcentrated ones. > There's even a danger in being too extremist about levels of > decentralization. Some of the "flat-earth" systems don't really solve the > problems associated with centralization, and are just as vulnerable to > bottlenecks or catastrophic cascading failures as any centralized system > ever was. Look at what happened to Gnutella before reflectors. All that > some of these systems do is make it harder to identify or remedy the source > of such problems, while also adding a whole new class of problems and > failures related to all those useless extra levels of indirection. IMO that > much focus on ideology is poison. My own system is probably not "fully > decentralized" enough for some people, but they can just take a flying leap > at a rolling doughnut. I could make it more fully decentralized if I > wanted, but not just for the sake of an ideal. My goal is to meet certain > requirements - efficiency, robustness, security (but not anonymity) - and if > further decentralization does not help me meet that goal then I'm not > interested. Nobody should ever assume that the solution to their problem is > the solution to everyone else's. I think the problem is mostly semantic differences perceived as dogma. If my definition of decentralized were what I have now decided is deconcentrated, then me saying that your network was not decentralized need not be taken as an insult or attack on your work. Same thing with terms like scalable (comparatively limited scalability is a perfectly rational design tradeoff).
<> > > Somebody linked a paper that defined the terms regarding anonymity and > > untraceability a couple of weeks ago, which should mercifully spare us > > the semantic bantering regarding that - anybody know of a similar paper > > regarding descriptions of system topologies? > > There are a lot of people out there offering definitions. Unfortunately, I > don't think there's any one that could be considered authoritative. Of > course, achieving universal agreement is a known problem in decentralized > systems so perhaps we should just deal with it. ;-) Well, I think one of the main reasons this list was started was that when a lot of us met in SF, we found that we had all developed a different set of vocabulary for the same things, so setting down some authoritative definitions, at least between one another, would certainly be using this list to good effect. We couldn't honestly refer to ourselves as p2p-_hackers_ if we just sat around complaining about what we do not have. Anybody feel like starting a p2p-hackers-dictionary somewhere on the web? -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From oskar at freenetproject.org Wed Jun 20 08:34:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?)
In-Reply-To: <20010620172900.E525@sandbergs.org>; from oskar@freenetproject.org on Wed, Jun 20, 2001 at 05:29:00PM +0200 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> Message-ID: <20010620173604.F525@sandbergs.org> On Wed, Jun 20, 2001 at 05:29:00PM +0200, Oskar Sandberg wrote: > On Wed, Jun 20, 2001 at 10:04:28AM -0400, Jeff Darcy wrote: > > Oskar Sandberg: I forgot my references (maybe the p2p-hackers dictionary should also have a reference list): [1] C. Plaxton, R. Rajaraman, A. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proc. of ACM SPAA, June 1997. [2] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. Submission to ACM SIGCOMM, 2001. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From hal at finney.org Wed Jun 20 09:14:01 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] names of kinds of topology Message-ID: <200106201605.JAA32133@finney.org> Zooko writes: > 2. There are many servers. Service happens bilaterally between client and > server. For clients to interact with each other, they must use the *same* > server. Servers aggregate results from other servers. Clients can only use > one server, and they choose which server to use. > > IRC is of this model. I thought IRC allowed users connected to different servers to talk, as long as the servers were connected together? That's what netsplits were, when servers would get disconnected and you'd lose access to the users on the other servers. Has it changed?
Hal From hal at finney.org Wed Jun 20 09:21:01 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) Message-ID: <200106201612.JAA32168@finney.org> Jeff writes: > Not really. A truly centralized system is topologically a star. A > hierarchical system is topologically a tree, which is not at all the same > thing as a star and can therefore be considered decentralized. A star is a kind of tree, one where everyone connects to the same root, a tree of depth 1. So you can have various degrees of centralization in a tree. Some trees are more equal than others. Hal From zooko at zooko.com Wed Jun 20 09:26:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:42 2006 Subject: [p2p-hackers] names of kinds of topology In-Reply-To: Message from hal@finney.org of "Wed, 20 Jun 2001 09:05:36 PDT." <200106201605.JAA32133@finney.org> References: <200106201605.JAA32133@finney.org> Message-ID: Hal Finney wrote: > > Zooko writes: > > 2. There are many servers. Service happens bilaterally between client and > > server. For clients to interact with each other, they must use the *same* > > server. Servers aggregate results from other servers. Clients can only use > > one server, and they choose which server to use. > > > > IRC is of this model. > > I thought IRC allowed users connected to different servers to talk, > as long as the servers were connected together? That's what netsplits > were, when servers would get disconnected and you'd lose access to the > users on the other servers. Has it changed? My fault. You are right that IRC users can talk to each other through intermediate servers without having to use the same server. Regards, Zooko From jeff at platypus.ro Wed Jun 20 09:35:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
References: <200106201612.JAA32168@finney.org> Message-ID: <08ac01c0f9a6$e0d1d830$367b9fa8@lss.emc.com> > A star is a kind of tree, one where everyone connects to the same root, > a tree of depth 1. That's true, but not useful. Yes, a star is a special kind of tree. It's also a special kind of mesh. That doesn't make stars and trees and meshes equivalent in any practical kind of way. It's like saying a doughnut is the same as a coffee cup because they're both genus one, but if you try and pour your coffee into the wrong one you'll learn the difference real fast. From jeff at platypus.ro Wed Jun 20 12:14:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> Message-ID: <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> > However, there is an important difference. In the sort of the systems > that I think about as truly decentralized like Plaxton, Chord [2], or > (assuming for a moment that it actually works) Freenet, the roots to the > trees are spread over all the nodes, where as a system with a hierarchy > would have the tree roots concentrated to a subset of the peers. I think we need to consider time as part of the taxonomy. Permanent concentration is not the same as transient concentration, which in turn is not the same as no concentration at all. > If such > a system is still considered decentralized, then I guess we need a new > term for the other type - the only synonym that my Roget's lists for > decentralized is deconcentrated, so I'll propose that. FWIW, I call them "flat" decentralized systems. 
> For lack of a better term I have been refering to the networks which use > "supernodes","brokers", "reflectors" etc as square-root networks, > because by making sqrt(N) of the nodes "supernodes" you have > O(g=sqrt(N)) growth in network traffic (per node) and O(h=sqrt(N)) > growth in the tables of each of them (obviously you can shift those to > functions, but you'll need to keep g*h = N). That formula simply doesn't work. Total traffic remaining constant, traffic per node is proportional to mean path length - the measure most often used by people who actually study routing. Let's look at a few topologies and see how table size and path length are affected. (1) For a fully-connected network, path length is always 1 and table size is always N. There's our starting point. (2) For a star, path length is always 2 and table size is always 1, except for the hub where they're 1 and N respectively. The averages both work out to 2, for a constant value of 4 regardless of network size. Hmmm. (3) For a two-level hierarchy (sqrt(N) pools of sqrt(N) nodes, with one "gateway" per pool), the mean path length asymptotically increases toward 3 as N increases, and the mean table size likewise decreases toward sqrt(N). For large N, therefore, our product would be 3*sqrt(N). I'm not doing this just to pick nits, either. It's important. The relationship you state between table size and path length (or traffic per node) only applies for fully-connected or broadcast networks. The search behavior of several well-known networks does fit this model and makes it relevant, but for many other purposes or structures it's a total red herring. It *is* possible to maintain short path lengths and small table sizes concurrently in many situations. 
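Jeff's three worked examples can be checked numerically. The following is a minimal sketch (mine, not from the thread; the function names and the closed-form counting are my own) that computes mean path length and mean routing-table size for the star and for the two-level hierarchy, under the same assumptions as above (pools fully meshed internally, one gateway per pool meshed with the other gateways):

```python
import math

def star_metrics(n):
    """Star of n nodes: one hub, n-1 leaves, each leaf connected only to the hub."""
    # Ordered pairs: hub<->leaf at distance 1, leaf->leaf at distance 2.
    pairs = n * (n - 1)
    total_path = 2 * (n - 1) * 1 + (n - 1) * (n - 2) * 2
    mean_path = total_path / pairs
    # Hub knows all n-1 others; each leaf knows only the hub.
    mean_table = ((n - 1) + (n - 1) * 1) / n
    return mean_path, mean_table

def hierarchy_metrics(m):
    """Two-level hierarchy: m pools of m nodes (N = m*m), fully meshed inside
    a pool, with one gateway per pool meshed with the other gateways."""
    n = m * m
    same_pool = n * (m - 1)          # ordered same-pool pairs: 1 hop
    cross_pool = n * (n - m)         # ordered cross-pool pairs: ~3 hops
    # Treating every cross-pool path as 3 hops slightly overstates the paths
    # that start or end at a gateway, but the error vanishes as N grows.
    mean_path = (same_pool * 1 + cross_pool * 3) / (same_pool + cross_pool)
    # Members know their m-1 pool-mates; gateways also know the m-1 gateways.
    mean_table = ((n - m) * (m - 1) + m * 2 * (m - 1)) / n
    return mean_path, mean_table

for n in (100, 10_000, 1_000_000):
    sp, st = star_metrics(n)
    hp, ht = hierarchy_metrics(math.isqrt(n))
    print(f"N={n}: star {sp:.2f}*{st:.2f}={sp * st:.1f}  "
          f"hierarchy {hp:.2f}*{ht:.1f}={hp * ht:.1f}  3*sqrt(N)={3 * math.sqrt(n):.0f}")
```

As N grows the star's product settles at 4 and the hierarchy's approaches 3*sqrt(N), matching the figures given above.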
> (btw, What is interesting about the square-root networks to me is that > they seem to be approached from two directions, both from the systems that > were previously broadcast (that is g=N, h=1) but that sank under the > load, and from systems that were previously completely centralized (that is > g=1, h=N) but sank under the load at the central server or attacks to it.) That's an important observation, and it's precisely the reason that I don't believe in "flat-earth" decentralization as a practical approach for most problems. Extremes are rarely optimal in the real world. Time after time I've seen flat decentralized systems fail when N gets large because some part somewhere had a communications complexity of O(N) or worse, and I hate seeing systems fail. Hierarchical approaches avoid that trap, and as long as the hierarchy is dynamic and adaptive they avoid the traps inherent in full centralization as well. > Of course, I can't say that deconcentration is a necessary characteristic > for every set of goals, or for sure that having the root subset be > dynamic is not enough even when strict survivability is called for. I > can't even say that deconcentration is good, just that it appeals to me. Yes, it is very appealing aesthetically, and it's necessary and/or appropriate for some situations. > OK, but once you have a center/proper root subset (PRS) you open up a > whole can of worms regarding making sure that this cannot be abused, > both by people trying to use the limited set of nodes as an Achilles > heel of the network, and by the peers in the PRS itself. Certainly > having the PRS be dynamic and mobile is one of the methods by which > this can be combated, but, while probably necessary, it is not the only > such measure, which is why I wanted to separate these networks, and the > means necessary to secure them, from the deconcentrated ones. That's reasonable enough. Just remember that we all end up eating worms of one kind or another. 
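The g*h = N tradeoff behind the "square-root networks" quoted above is easy to make concrete. Here is a toy model (my own sketch with hypothetical numbers, not from the thread): in an unsorted supernode network a query must be broadcast to all X supernodes, and the N index entries are split evenly among them, so the product of per-query traffic and per-supernode table size is constant:

```python
def supernode_costs(n_entries, n_super):
    """Unsorted supernode network: a query reaches every one of n_super
    supernodes (traffic g = X), and the n_entries index entries are split
    evenly among them (table h = N/X), so g*h = N regardless of X."""
    g = n_super                  # parties each user must contact per query
    h = n_entries / n_super      # index entries held per supernode
    return g, h

N = 1_000_000
for x, label in [(1, "Napster-like"), (1_000, "sqrt(N) supernodes"),
                 (N, "Gnutella-like")]:
    g, h = supernode_costs(N, x)
    print(f"{label:>18}: g={g:>9g}  h={h:>9g}  g*h={g * h:g}")
```

Every row yields g*h = N: broadcast (g=N, h=1) and full centralization (g=1, h=N) are just the two endpoints of the same curve, with sqrt(N) supernodes sitting at the balance point.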
> Well, I think one of the main reasons this list was started was that > when a lot of us were in SF we found that we had all developed a different > set of vocabulary for the same thing, so setting down some authoritative > definitions, at least between one another, would certainly be using this > list to good effect. Where the hell was I during these talks? Probably too busy answering all the people who were asking why I - as an employee of a large corporation not known for participating in the open exchange of ideas - was there at all. Oh well. Going back to taxonomy, I'll take a stab at it. Here are some example structures: (1) Fully centralized. All operations involve a single permanent root node. (2) Partitioned. There are multiple roots, separated by role, location, or other criteria (e.g. partitioned namespace). Loss of a root is catastrophic wrt its own responsibilities, irrelevant wrt others. (3) Decentralized. There are multiple roots and/or intermediate nodes capable of operating independently and/or taking over for one another, such that loss of any N is disruptive but not fatal (unless it causes a partition of the entire network/system). (4) Flat ("deconcentrated"). There are no roots at all, even dynamically assigned or role-specific. Operations depend only on endpoints and the existence of some path between them. From oskar at freenetproject.org Wed Jun 20 14:45:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
In-Reply-To: <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Wed, Jun 20, 2001 at 03:12:54PM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> Message-ID: <20010620234727.K525@sandbergs.org> On Wed, Jun 20, 2001 at 03:12:54PM -0400, Jeff Darcy wrote: > > However, there is an important difference. In the sort of the systems > > that I think about as truly decentralized like Plaxton, Chord [2], or > > (assuming for a moment that it actually works) Freenet, the roots to the > > trees are spread over all the nodes, where as a system with a hierarchy > > would have the tree roots concentrated to a subset of the peers. > > I think we need to consider time as part of the taxonomy. Permanent > concentration is not the same as transient concentration, which in turn is > not the same as no concentration at all. However, the mobility of the tree roots is quite orthogonal to their concentration. A deconcentrated network can also have roots that are static or mobile over time (in fact, the combination of deconcentration and mobility is what keeps me working on Freenet in spite of my doubts regarding the system.) > > If such > > a system is still considered decentralized, then I guess we need a new > > term for the other type - the only synonym that my Roget's lists for > > decentralized is deconcentrated, so I'll propose that. > > FWIW, I call them "flat" decentralized systems. The way you used flat in your last mail, and below, gave me the impression that you were referring to original Gnutella type networks, where you have no information where to go (it all looks the same). Networks like Plaxton, while certainly deconcentrated, don't feel flat... 
> > For lack of a better term I have been refering to the networks which use > > "supernodes","brokers", "reflectors" etc as square-root networks, > > because by making sqrt(N) of the nodes "supernodes" you have > > O(g=sqrt(N)) growth in network traffic (per node) and O(h=sqrt(N)) > > growth in the tables of each of them (obviously you can shift those to > > functions, but you'll need to keep g*h = N). > > That formula simply doesn't work. Total traffic remaining constant, traffic > per node is proportional to mean path length - the measure most often used > by people who actually study routing. Let's look at a few topologies and > see how table size and path length are affected. Because you brought up the specific example of using "supernodes" like the second generation file swapping services (I think examples are KaZaa, EDonkey, and Gnutella with reflectors), I drifted off into a discussion of those systems, meaning not only that there are supernodes, but there is no sorting of the data between them. Thus, to query for something in a global namespace, Alice would need to contact every "supernode", and each supernode needs to hold all the entries for a partition of the entire peer population. Hence if there are X supernodes, the amount of traffic will be of order X for every user Alice, and the supernode will need to hold in the order of N/X entries. Thus my examples: with a single supernode (Napster), each user only needs to contact 1 party, but the supernode must hold all N entries. If every node is a supernode (Gnutella), the number of entries per node is constant, but each user must contact N parties. Another way of putting it is: if you are searching for data in a global namespace, the network contains no sorting between the nodes, T is the total network traffic, P is the portion of the nodes that contain information, and N is the total number of nodes, then: T/P = k*N 
(for some constant k) This isn't very profound when you think about it, but I think it is important to note. > (1) For a fully-connected network, path length is always 1 and table size is > always N. There's our starting point. > > (2) For a star, path length is always 2 and table size is always 1, except > for the hub where they're 1 and N respectively. The averages both work out > to 2, for a constant value of 4 regardless of network size. Hmmm. > > (3) For a two-level hierarchy (sqrt(N) pools of sqrt(N) nodes, with one > "gateway" per pool), the mean path length asymptotically increases toward 3 > as N increases, and the mean table size likewise decreases toward sqrt(N). > For large N, therefore, our product would be 3*sqrt(N). > > I'm not doing this just to pick nits, either. It's important. The > relationship you state between table size and path length (or traffic per > node) only applies for fully-connected or broadcast networks. The search > behavior of several well-known networks does fit this model and makes it > relevant, but for many other purposes or structures it's a total red > herring. It *is* possible to maintain short path lengths and small table > sizes concurrently in many situations. I'm not going to argue that it is not, but rather that for that to be so it is necessary to sort entries in some manner between the nodes (remember the problem is finding an entry in a global namespace). Some concentration may make that easier, but certainly not right off. <> > > Well, I think one of the main reasons this list was started was that > > when a lot of us were in SF we found that we had all developed a different > > set of vocabulary for the same thing, so setting down some authoritative > > definitions, at least between one another, would certainly be using this > > list to good effect. > > Where the hell was I during these talks? 
Probably too busy answering all > the people who were asking why I - as an employee of a large corporation not > known for participating in the open exchange of ideas - was there at all. > Oh well. > > Going back to taxonomy, I'll take a stab at it. Here are some example > structures: > > (1) Fully centralized. All operations involve a single permanent root node. Ok. > (2) Partitioned. There are multiple roots, separated by role, location, or > other criteria (e.g. partitioned namespace). Loss of a root is catastrophic > wrt its own responsibilities, irrelevant wrt others. Ok. > (3) Decentralized. There are multiple roots and/or intermediate nodes > capable of operating independently and/or taking over for one another, such > that loss of any N is disruptive but not fatal (unless it causes a partition > of the entire network/system). Ok (it will take some adaptation on my part, but people have often responded emotionally to claims that their designs were centralized, so it's probably for the better.) > (4) Flat ("deconcentrated"). There are no roots at all, even dynamically > assigned or role-specific. Operations depend only on endpoints and the > existence of some path between them. I'm not sure about this. What I mean by deconcentrated is that there are roots, just that every node is (or could be, there may be fewer trees than nodes) also a root of some tree. -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From blanu at uts.cc.utexas.edu Wed Jun 20 15:13:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> Message-ID: > That's an important observation, and it's precisely the reason that I don't > believe in "flat-earth" decentralization as a practical approach for most > problems. 
Extremes are rarely optimal in the real world. This statement is too general to be practically useful in conversation. You can't really meaningfully talk about "most problems". Perhaps in your sphere of interest fully decentralized systems aren't appropriate. However, in anonymous systems I find any system that is not fully decentralized to be unacceptable as it creates weak points for attack and subversion. Of course, as someone pointed out earlier, you can create a supernode-like node in a fully decentralized system by creating a lot of virtual nodes. But at least the architecture isn't helping you out. From jeff at platypus.ro Wed Jun 20 15:16:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) References: Message-ID: <08f601c0f9d6$85b6c840$367b9fa8@lss.emc.com> > This statement is too general to be practically useful in conversation. > You can't really meaningfully talk about "most problems". Perhaps in your > sphere of interest fully decentralized systems aren't appropriate. > However, in anonymous systems I find any system that is not fully > decentralized to be unacceptable as it creates weak points for attack and > subversion. Sorry to be the one to point this out, but anonymous systems are a tiny little niche within the overall space of applications. If there's anyone who lacks perspective, it's those who live inside that bubble, not outside. From jeff at platypus.ro Wed Jun 20 15:37:02 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> Message-ID: <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> > However the mobility of the tree roots is quite orthoganol to their > concentration. A deconcentrated network can also have roots that are > static or mobile over time (in fact, the combination of deconcentration > and mobility is what keeps me working on Freenet in spite of my doubts > regarding the system.) OK, I admit I'm confused. Earlier you claimed that hierarchy and decentralization were antithetical, but now you're talking about roots in the context of a "deconcentrated" network. What are these roots of which you speak? What is their role? Must other nodes contact these roots to access resources? When I use "root" I refer to a node that is distinguished from its neighbors by an asymmetric/hierarchical relationship in which it is allowed to complete requests or perform actions without contacting them while they are not similarly free to do so without contacting it. I'm not sure what a "root" is when there's no "up" or "down" - as I believe would be the case in a "deconcentrated" network. > The way you used flat in your last mail, and below, gave me the > impression that you refering to original Gnutella type networks, where > you have no information where to go (it all looks the same). Networks > like Plaxton, while certainly deconcentrated, don't feel flat... I think we need to clarify whether we're talking about the topology of the system itself or of the namespace that system uses. They're two different things; I've been focusing on the former, but I'm beginning to get the impression that you're talking about the latter. 
> Because you brought up the specific example of using "supernodes" like > the second generation file swapping services (I think examples are > KaZaa, EDonkey, and Gnutella with reflectors), I drifted off into a > discussion of those systems, meaning not only that there are supernodes, > but there is no sorting of the data between them. Thus, to query for > something in a global namespace, Alice would need to contact every > "supernode", and each supernode needs to hold all the entries for a > partition of the entire peer population. In other words, a broadcast network (among the supernodes). In that very particular case, your formula does indeed apply, but to be honest I consider it a fairly degenerate case. > > It *is* possible to maintain short path lengths and small table > > sizes concurrently in many situations. > > I'm not going to argue that it is not, but rather that for that to be so > it is necessary to sort entries in some manner between the nodes Yes, in the particular case of searching, that is true. That's why I'm not a big fan of searching, and prefer proactive indexing instead. > > (4) Flat ("deconcentrated"). There are no roots at all, even dynamically > > assigned or role-specific. Operations depend only on endpoints and the > > existence of some path between them. > > I'm not sure about this. What I mean by deconcentrated is that there are > roots, just that every node is (or could be, there may be less trees > then nodes) also a root of some tree. Again, I think we need to be clear whether we're talking about the network or the namespace. From oskar at freenetproject.org Wed Jun 20 17:35:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
In-Reply-To: <090201c0f9d9$670193f0$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Wed, Jun 20, 2001 at 06:35:37PM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> Message-ID: <20010621023741.M525@sandbergs.org> On Wed, Jun 20, 2001 at 06:35:37PM -0400, Jeff Darcy wrote: > > The way you used flat in your last mail, and below, gave me the > > impression that you were referring to original Gnutella type networks, where > > you have no information where to go (it all looks the same). Networks > > like Plaxton, while certainly deconcentrated, don't feel flat... > > I think we need to clarify whether we're talking about the topology of the > system itself or of the namespace that system uses. They're two different > things; I've been focusing on the former, but I'm beginning to get the > impression that you're talking about the latter. The general problem that we are dealing with here, as far as I know, is publishing and finding data in distributed networks. Finding entries in a global namespace is a relaxation of this problem (depending on implementation it may be one phase of it). I don't really see the point in discussing the topology in regard to something other than the problem at hand. It might be worth considering the problem from the perspective of being forced into a certain topology between the peers, but that makes the problem much harder, which I hesitate to do until the easier case is solved. I'll gladly admit my complete naiveté regarding classic approaches to these problems, however, so I could just be misunderstanding something. > > However, the mobility of the tree roots is quite orthogonal to their > > concentration. 
A deconcentrated network can also have roots that are > > static or mobile over time (in fact, the combination of deconcentration > > and mobility is what keeps me working on Freenet in spite of my doubts > > regarding the system.) > > OK, I admit I'm confused. Earlier you claimed that hierarchy and > decentralization were antithetical, but now you're talking about roots in > the context of a "deconcentrated" network. What are these roots of which > you speak? What is their role? Must other nodes contact these roots to > access resources? When I use "root" I refer to a node that is distinguished > from its neighbors by an asymmetric/hierarchical relationship in which it is > allowed to complete requests or perform actions without contacting them > while they are not similarly free to do so without contacting it. I'm not > sure what a "root" is when there's no "up" or "down" - as I believe would be > the case in a "deconcentrated" network. For any individual query the network contains a tree that leads to a root (or root set). Plaxton and co's paper, which I referenced earlier, contains a very good example of such a system. There are trees, but every node takes part at different levels in many of these trees (and as a leaf in every single one). <> > > Because you brought up the specific example of using "supernodes" like > > the second generation file swapping services (I think examples are > > KaZaa, EDonkey, and Gnutella with reflectors), I drifted off into a > > discussion of those systems, meaning not only that there are supernodes, > > but there is no sorting of the data between them. Thus, to query for > > something in a global namespace, Alice would need to contact every > > "supernode", and each supernode needs to hold all the entries for a > > partition of the entire peer population. > > In other words, a broadcast network (among the supernodes). 
In that very > particular case, your formula does indeed apply, but to be honest I consider > it a fairly degenerate case. I must admit I don't understand what application you are aiming at. Could you give me an example of the sort of network you are discussing? > > > It *is* possible to maintain short path lengths and small table > > > sizes concurrently in many situations. > > > > I'm not going to argue that it is not, but rather that for that to be so > > it is necessary to sort entries in some manner between the nodes > > Yes, in the particular case of searching, that is true. That's why I'm not > a big fan of searching, and prefer proactive indexing instead. Trying to index every entry at every node is not a particularly scalable solution either. Searching usually makes people think of keyword searches (where the thread this sprang from started), but my discussion is generalized to any sort of lookup (which usually defaults to binary identifiers). > > > (4) Flat ("deconcentrated"). There are no roots at all, even dynamically > > > assigned or role-specific. Operations depend only on endpoints and the > > > existence of some path between them. > > > > I'm not sure about this. What I mean by deconcentrated is that there are > > roots, just that every node is (or could be, there may be less trees > > then nodes) also a root of some tree. > > Again, I think we need to be clear whether we're talking about the network > or the namespace. Especially if you are not interested in anonymity (as I gather you are not), what role does the network have besides serving as a namespace for lookups? -- 'DeCSS would be fine. Where is it?' 'Here,' Montag touched his head. 'Ah,' Granger smiled and nodded. Oskar Sandberg oskar@freenetproject.org From jeff at platypus.ro Wed Jun 20 21:04:01 2001 From: jeff at platypus.ro (Jeff Darcy) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> <20010621023741.M525@sandbergs.org> Message-ID: <093001c0fa07$20029890$367b9fa8@lss.emc.com> From: "Oskar Sandberg" > The general problem that we are dealing with here, as far as I know, is > publishing and finding data in distributed networks. Finding entries in > a global namespace is a relaxation of this problem (depending on > implementation it may be one phase of it). My apologies, then. I apparently interpreted questions such as "aren't hierarchical and decentralized as close to antonymous design processes as one can come" and "anybody know of a similiar paper regarding descriptions of system topologies" as invitations to a broader discussion when in fact they were not. > I don't really see the point in discussing the topology in regard to > something else then the problem at hand. Anything not directly related to the problem at hand is "pointless"? I'm sorry, but I cannot agree. Focus is good, but myopia is not good focus. Would you try to shut down a discussion of security until it became the problem at hand, or might you occasionally be interested in thinking and talking about it at leisure so you don't have to scramble to catch up when the shit hits the fan? > For any individual query the network contains a tree that leeds to a > root (or root set). Plaxton and co's paper, which I referenced earlier > contains a very good example of such a system. There are trees, but > every node takes part at different levels in many of these trees (and as > a leaf in every single one). This seems to be another example of the previously-mentioned relationships between trees and meshes, very similar to the example I gave of a route tree superimposed on a mesh. 
In fact, Plaxton et al. explicitly mention the similarity to a routing protocol. IMO the "roots" in their scheme are like destinations in a route tree, and not really roots in the sense that the term would usually be understood. Certainly they're not roots in the sense that *I* have been using the term. > > In other words, a broadcast network (among the supernodes). In that very > > particular case, your formula does indeed apply, but to be honest I consider > > it a fairly degenerate case. > > I must admit I don't understand what application you are aiming at. > Could you give me an example of the sort of network you are discussing? Pretty much anything other than the various functional clones of Napster. Very few other distributed applications rely in any significant way on broadcast. From IRC and the web to DNS and IP routing itself, unicast communication predominates by a huge margin. Perhaps "degenerate" is too strong a word, but broadcast is certainly a special case and rules that apply only to special cases should not be stated as though they were general. > > Yes, in the particular case of searching, that is true. That's why I'm not > > a big fan of searching, and prefer proactive indexing instead. > > Trying to index every entry at every node is not a particularly scalable > solution either. Then it's a good thing nobody suggested that. > Especially if you are not interested in anonymity (as I gather you are > not), what role does the network have besides serving as a namespace for > lookups? That's an amazing question, especially from a Freenet guy. Let's see, what can we use a network for, besides searching? Hey, I've got it, maybe we can use the network for transferring the *files* as well! Sound familiar? I do believe Freenet sometimes uses the network that way, on the rare occasions that a user is actually able to find a file and wishes to retrieve its contents. OK, sorry for being so sarcastic. 
It seems that managing the metadata has become such a big pain in the ass for some systems that their designers have forgotten about what the users really want/need - the data. The sort of lookups you're talking about are just *one way* to get at the data. It's a pretty problematic way at that, which is why the vast majority of applications avoid it. I don't need to broadcast a query to read a web page or log in to a remote system or play online chess, but somehow I still find the information I need to do those things. In my own project I never use such methods, and yet users can still find data from anywhere using highly familiar and intuitive means. From blanu at uts.cc.utexas.edu Wed Jun 20 21:07:01 2001 From: blanu at uts.cc.utexas.edu (Brandon) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) In-Reply-To: <08f601c0f9d6$85b6c840$367b9fa8@lss.emc.com> Message-ID: > Sorry to be the one to point this out, but anonymous systems are a tiny > little niche within the overall space of applications. If there's anyone > who lacks perspective, it's those who live inside that bubble, not outside. Your first statement is quite true. I'm just pointing out that it is overgeneralizing to simply say fully decentralized systems are not useful when there is at least one type of application for which *only* fully decentralized systems are suitable. Your second statement is another overly general philosophical statement that doesn't apply to the discussion in a useful way. From oskar at freenetproject.org Thu Jun 21 06:46:01 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:42 2006 Subject: shared namespaces (was: Re: [p2p-hackers] keyword searching + consistent hashing?) 
In-Reply-To: <093001c0fa07$20029890$367b9fa8@lss.emc.com>; from jeff@platypus.ro on Thu, Jun 21, 2001 at 12:03:26AM -0400 References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> <20010621023741.M525@sandbergs.org> <093001c0fa07$20029890$367b9fa8@lss.emc.com> Message-ID: <20010621154823.A629@sandbergs.org> On Thu, Jun 21, 2001 at 12:03:26AM -0400, Jeff Darcy wrote: > From: "Oskar Sandberg" <> > > I don't really see the point in discussing the topology in regard to > > something else then the problem at hand. > > Anything not directly related to the problem at hand is "pointless"? I'm > sorry, but I cannot agree. Focus is good, but myopia is not good focus. > Would you try to shut down a discussion of security until it became the > problem at hand, or might you occasionally be interested in thinking and > talking about it at leisure so you don't have to scramble to catch up when > the shit hits the fan? In a way, yes. I would argue that discussing security is pretty meaningless outside the context of what you are trying to secure. A discussion on the value of moats isn't particularly pointful if you are trying to secure an email message, and a comparison between RSA and ElGamal when trying to secure a medieval castle is most certainly a waste of time. > > For any individual query the network contains a tree that leads to a > > root (or root set). Plaxton and co's paper, which I referenced earlier > > contains a very good example of such a system. There are trees, but > > every node takes part at different levels in many of these trees (and as > > a leaf in every single one). 
> > This seems to be another example of the previously-mentioned relationships > between trees and meshes, very similar to the example I gave of a route tree > superimposed on a mesh. In fact, Plaxton et al. explicitly mention the > similarity to a routing protocol. IMO the "roots" in their scheme are like > destinations in a route tree, and not really roots in the sense that the > term would usually be understood. Certainly they're not roots in the sense > that *I* have been using the term. I have been using root as in "root of a route tree". By a Proper Root Subset I meant that the set of peers that can be roots to the lookup route trees is a proper subset of the set of peers in the network, and by deconcentrated that all nodes in the network are also roots of route trees. <> > > I must admit I don't understand what application you are aiming at. > > Could you give me an example of the sort of network you are discussing? > > Pretty much anything other than the various functional clones of Napster. > Very few other distributed applications rely in any significant way on > broadcast. From IRC and the web to DNS and IP routing itself, unicast > communication predominates by a huge margin. Perhaps "degenerate" is too > strong a word, but broadcast is certainly a special case and rules that > apply only to special cases should not be stated as though they were > general. I'm not a fan of broadcasting either; my question was more to the effect of what application it is you wish to build. I gather you are looking to create a general point-to-point routing system for an overlay network? What motivates this? > > > Yes, in the particular case of searching, that is true. That's why I'm > not > > > a big fan of searching, and prefer proactive indexing instead. > > > > Trying to index every entry at every node is not a particularly scalable > > solution either. > > Then it's a good thing nobody suggested that. Do you have a reference for what you mean by proactive indexing? 
Google wasn't very helpful, and I thought I remembered a discussion between you and the creator of the "blocks" system on Infoanarchy.org where you referred to what he is doing (which, AFAIK, is indexing every entry at every node) by that name.

> > Especially if you are not interested in anonymity (as I gather you are
> > not), what role does the network have besides serving as a namespace for
> > lookups?
>
> That's an amazing question, especially from a Freenet guy. Let's see, what
> can we use a network for, besides searching? Hey, I've got it, maybe we can
> use the network for transferring the *files* as well! Sound familiar? I do
> believe Freenet sometimes uses the network that way, on the rare occasions
> that a user is actually able to find a file and wishes to retrieve its
> contents.

Well, besides caching for performance, the only other reason I see not to simply transfer the data directly between the publisher and the requestor is to disassociate them from one another (i.e. striving toward anonymity).

> OK, sorry for being so sarcastic. It seems that managing the metadata has
> become such a big pain in the ass for some systems that their designers have
> forgotten about what the users really want/need - the data. The sort of
> lookups you're talking about are just *one way* to get at the data. It's a
> pretty problematic way at that, which is why the vast majority of
> applications avoid it. I don't need to broadcast a query to read a web page
> or log in to a remote system or play online chess, but somehow I still find
> the information I need to do those things. In my own project I never use
> such methods, and yet users can still find data from anywhere using highly
> familiar and intuitive means.

Yes, but the web already works; I don't see any reason to reimplement it.
What I am interested in is disassociating data from any physical location (my motivations for which are mostly political) - and like I said, being able to look things up in a global namespace is a relaxation of that problem. Plaxton's proposed system works by having the namespace lookup result in pointers to the actual location of the data objects, whereas our network attempts to return the data directly in answer to the namespace lookup, but the lookup is still there.

The related problem of creating a file sharing system like Napster and Gnutella that isn't centralized like the former or limited in size like the latter is, while not directly what I am doing ATM, certainly interesting and useful.

I would be very interested to hear more about "your project", however. You have a reference to a paper or implementation?

--
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
oskar@freenetproject.org

From jeff at platypus.ro Thu Jun 21 08:29:01 2001
From: jeff at platypus.ro (Jeff Darcy)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] Re: topology, goals, etc. (was Re: shared namespaces)
References: <085b01c0f930$b05d12d0$367b9fa8@lss.emc.com> <20010620123459.B525@sandbergs.org> <087501c0f991$ebb58590$367b9fa8@lss.emc.com> <20010620172900.E525@sandbergs.org> <08b701c0f9bd$146655c0$367b9fa8@lss.emc.com> <20010620234727.K525@sandbergs.org> <090201c0f9d9$670193f0$367b9fa8@lss.emc.com> <20010621023741.M525@sandbergs.org> <093001c0fa07$20029890$367b9fa8@lss.emc.com> <20010621154823.A629@sandbergs.org>
Message-ID: <096c01c0fa66$caee31b0$367b9fa8@lss.emc.com>

From: "Oskar Sandberg"
> A discussion of the value of moats isn't particularly useful if you are
> trying to secure an email message, and a comparison between RSA and ElGamal
> when trying to secure a medieval castle is most certainly a waste of
> time.

I still don't agree.
If you're trying to secure a medieval castle, then comparisons between RSA and ElGamal should obviously have a lower priority than discussions of moats and vats of boiling oil, but to say that it's a "waste of time" implies that it's not worthy of discussion *at all*. I'm not under the impression that people's time spent reading or posting to this list is such a rare or valuable commodity that we must reject as "pointless" anything not related to some arbitrarily (and sometimes selfishly) chosen "problem at hand".

> I'm not a fan of broadcasting either; my question was more to the effect
> of what application it is you wish to build. I gather you are looking to
> create a general point-to-point routing system for an overlay network?

That's actually only a small part of what I'm working on. I'll describe the project shortly.

> Do you have a reference for what you mean by proactive indexing?
> Google wasn't very helpful, and I thought I remembered a discussion
> between you and the creator of the "blocks" system on Infoanarchy.org
> where you referred to what he is doing (which, AFAIK, is indexing every
> entry at every node) by that name.

By proactive indexing I simply mean that (meta-)information about data locations is distributed in advance of requests for that information. That does not, however, mean that all meta-information is distributed to every node. It's quite reasonable to distribute and cache partial information proactively, and then fall back to searching or an authoritative catalog when nothing is found in the cache. If a catalog is used, it in turn could use any of the levels of decentralization we've mentioned, and searching would still be unnecessary. BTW, there are real systems that work on pretty much this basis, though they might not be the sorts of systems people here like to study.
For example, InterLibrary Loan uses just this combination of authoritative catalogs and distributed partial information, as do quite a few criminal and medical databases. They handle more traffic than any of the filesharing programs more typically considered as examples, and they work pretty well with this model.

> Well, besides caching for performance, the only other reason I see not
> to simply transfer the data directly between the publisher and the
> requestor is to disassociate them from one another (i.e. striving toward
> anonymity).

First, that's a helluva big "besides". Caching for performance is no small matter. Secondly, if you don't see other reasons besides these two, then perhaps you should think about it some more. I've seen such overlay networks used many times, usually when the implementor thinks they can provide security/QoS guarantees, load distribution, or fault-tolerant routing better than the underlying network does. Just about every CDN fits that description, as might Swarmcast or even Uprizer.

> Yes, but the web already works; I don't see any reason to reimplement
> it. What I am interested in is disassociating data from any physical
> location (my motivations for which are mostly political)

That's an entirely valid and even laudable interest, and if it guides your choices then so be it. However, it's also a rather uncommon interest in the overall scheme of things, which usually means that those choices don't generalize very well to other projects/applications with different goals. It's absolutely fine for you to view the world through Freenet-colored glasses because that's your project, but you should be prepared to accept that others will probably see things a little differently.

> The related problem of creating a file sharing system like Napster and
> Gnutella that isn't centralized like the former or limited in size like
> the latter is, while not directly what I am doing ATM, certainly
> interesting and useful.
>
> I would be very interested to hear more about "your project", however.
> You have a reference to a paper or implementation?

As I said earlier, my employer is not known for participating in the open exchange of ideas, so I'm somewhat constrained in how much detail I can provide. Even mentioning it could conceivably get me in trouble. Sometimes I regret staying here to do this instead of going off on my own to do it, because my personal inclination is to keep things open, but there's something to be said for having the resources of such a behemoth at one's disposal, and that's the tradeoff I chose to make. In any case, all of my options are underwater, so maybe I don't care. ;-)

The easiest starting point is probably my "Beyond FTP" article at infoAnarchy (http://www.infoanarchy.org/?op=displaystory;sid=2001/5/16/153927/189). My goal is to satisfy all of the requirements mentioned; quite bluntly, I believe that anything less is a lazy cop-out that addresses programmer egos more than user needs. Beyond these "surface" requirements, my primary goal is enhanced performance and scalability. There are also intrinsic fault-recovery and security (but not anonymity) requirements.

Structurally, the system starts with a recognition that all nodes are not created equal. There are therefore N root nodes (N is likely to be small), each independently capable of anchoring the entire system, so that N-1 concurrent root failures can be tolerated. The remaining nodes use routing methods to form interlocking trees leading to these roots, in a fashion that is actually quite reminiscent of Plaxton. Data is aggressively cached everywhere to enhance performance and scalability, while maintaining full consistency (sequential consistency, for those who care). The service provided is block-level, appearing to the system as a disk drive, so there are no significant naming issues to be dealt with.
To make this useful, one needs a shared-storage filesystem, which is a pretty exotic kind of creature, but it just so happens that I recently co-designed one (http://www.emc.com/products/software/highroad.jsp). For various reasons that I won't go into, GFS might be even better suited to this environment, but I haven't had a chance to pursue that just yet.

In case anyone is thinking these are just dreams, I should mention that I already have a working version. There are a few pieces still missing - most notably sophisticated routing and some security stuff that I'm not going to talk about - and some implementation shortcuts have been taken (some stuff is still in user space that doesn't truly belong there), but the core functionality is there and it's already good enough for internal demonstrations. Very limited performance analysis has so far yielded results exceeding expectations for this stage, and only non-technical aspects of the project are holding back further progress. No, you won't be seeing the source any time soon. Sorry.

So now you know where I'm coming from. I hope that my description will help you realize why I tend to look at things in a somewhat non-canonical way. It might also explain some of my frustration with the people I see on mailing list after mailing list, website after website, conference after conference, preannouncing and reannouncing their projects multiple times, reinforcing each others' fundamental biases even as they argue over arcane details. Sometimes it seems like everyone's attacking that medieval castle through the heavily fortified north gate while the east, west and south gates remain lightly defended.
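[Editor's note: the "proactive indexing" scheme described in this exchange -- partial location metadata pushed to nodes ahead of requests, with an authoritative catalog as a fallback on cache misses -- can be sketched roughly as below. All names (`Catalog`, `Node`, `receive_index_update`, `lookup`) are invented for illustration; this reflects neither Jeff's closed-source system nor any real implementation.]

```python
class Catalog:
    """Authoritative name -> location mapping (could itself be distributed)."""
    def __init__(self):
        self._entries = {}

    def register(self, name, location):
        self._entries[name] = location

    def resolve(self, name):
        return self._entries.get(name)


class Node:
    """A peer holding a proactively distributed *partial* index."""
    def __init__(self, catalog):
        self._catalog = catalog
        self._cache = {}  # partial metadata, pushed before anyone asks

    def receive_index_update(self, name, location):
        # Publishers push (name, location) pairs in advance of requests.
        self._cache[name] = location

    def lookup(self, name):
        # Fast path: answer from the proactively distributed cache.
        if name in self._cache:
            return self._cache[name]
        # Slow path: fall back to the authoritative catalog, then cache.
        location = self._catalog.resolve(name)
        if location is not None:
            self._cache[name] = location
        return location
```

The point of the sketch is that no broadcast search is ever needed: either the metadata arrived ahead of the request, or a single authoritative (possibly decentralized) catalog query fills the gap.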
From lucas at gonze.com Sun Jun 24 11:27:01 2001
From: lucas at gonze.com (Lucas Gonze)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] names of kinds of topology
In-Reply-To:
Message-ID:

Zooko -

In trying to parse your idea I run up against a problem with this language from #2: "For clients to interact with each other, they must use the *same* server." Did you mean to say "For clients to interact with each other, they do not have to use the *same* server"?

The reason I have a problem with it is that I believe the core difference is where aggregation happens -- at the client/endpoint or at the server/nearest_intermediary. If not for that funny language, I would say that the difference between #1 and #2 is that #1 doesn't have agents and #2 does. The reason I bring up agents is that servers that aggregate results from other servers are taking initiative on behalf of the client/endpoint.

- Lucas

> Here are two common "design patterns" in network topology that I frequently
> encounter and I wish I had a specific name for. (For these topologies,
> "decentralized" is neither specific nor unambiguous, but then neither is
> "centralized".)
>
> 1. There are many servers. Service happens bilaterally between client and
> server. For clients to interact with each other, they must use the *same*
> server. Clients can use multiple servers, aggregating results from multiple
> servers and dynamically choosing which servers to use.
>
> Mojo Nation's "content tracking" architecture is currently of this model.
>
> 2. There are many servers. Service happens bilaterally between client and
> server. For clients to interact with each other, they must use the *same*
> server. Servers aggregate results from other servers. Clients can only use
> one server, and they choose which server to use.
>
> IRC is of this model.
>
> Jukka Santala had a patch that allowed Mojo Nation "Content Trackers" to suck
> information out of each other (acting as clients in that transaction), which
> would change the topology of Mojo Nation content tracking to include servers
> aggregating information from one another as well as clients aggregating
> information from multiple servers.
>
> So anyway, what are the names for these two topologies?
>
> Regards,
>
> Zooko
>
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers

From zooko at zooko.com Mon Jun 25 09:02:01 2001
From: zooko at zooko.com (zooko@zooko.com)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] names of kinds of topology
In-Reply-To: Message from "Lucas Gonze" of "Sun, 24 Jun 2001 14:24:44 EDT."
References:
Message-ID:

Lucas Gonze wrote:
>
> In trying to parse your idea I run up against a problem with this language from
> #2 "For clients to interact with each other, they must use the *same* server."
> Did you mean to say "For clients to interact with each other, they do not have
> to use the *same* server."?

Yes! This was a cutnpast-o. I'm very sorry -- cutnpast-o's are especially confusing when you are trying to establish terminology. :-/

> The reason I have a problem with it is that I believe the core difference is
> where aggregation happens -- at the client/endpoint or at the
> server/nearest_intermediary. If not for that funny language, I would say that
> the difference between #1 and #2 is that #1 doesn't have agents and #2 does.
> The reason I bring up agents is that servers that aggregate results from other
> servers are taking initiative on behalf of the client/endpoint.

This makes sense to me. Although I am leery of the "agent" buzzword, I agree that an important distinction between these two models is that in #2, data aggregation on the client's behalf is happening remotely from the client.
Regards,

Zooko

From lucas at gonze.com Tue Jun 26 08:55:01 2001
From: lucas at gonze.com (Lucas Gonze)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] names of kinds of topology
In-Reply-To:
Message-ID:

per Zooko:
> This makes sense to me. Although I am leery of the "agent" buzzword, I agree
> that an important distinction between these two models is that in #2, data
> aggregation on the client's behalf is happening remotely from the client.

Can't say I love 'agent' myself -- it is hopelessly fuzzy terminology.

- Lucas

From bram at gawth.com Thu Jun 28 03:14:02 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] BitTorrent is out
Message-ID:

My new P2P app, BitTorrent, is out; you can get it here -

http://bitconjurer.org/BitTorrent/

In a nutshell, it gets people to upload by bartering for bytes, and has some very sophisticated and robust algorithms for doing load balancing and dealing with low uptime.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent"
-- John Maynard Keynes

From bram at bitconjurer.org Fri Jun 29 17:29:01 2001
From: bram at bitconjurer.org (Bram Cohen)
Date: Sat Dec 9 22:11:42 2006
Subject: [p2p-hackers] New release of BitTorrent out
Message-ID:

I've put up a new release; you can get it here -

http://bitconjurer.org/BitTorrent/BitTorrent-01-00-01.tar.gz

It's also now the one linked off the BitTorrent pages.

This is a bug fix release - it no longer hoses the CPU after a connection fails, and doesn't throw an exception when an upload connection hasn't gotten its status set yet.

The next version will include copying of comments from blobs_want to blobs_have, instructions when you execute with inappropriate parameters, and (hopefully) crypto in C.
That last one I could use some help with - if anyone throws together a public domain distutils-ified C implementation of StreamEncrypter.py, I'll include it immediately (it should be fairly straightforward - it's just Rijndael in counter mode). Otherwise it's dependent on me successfully bugging my younger brother to write one.

-Bram Cohen
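[Editor's note: the counter-mode construction Bram refers to is simple enough to sketch. Counter mode turns any keyed pseudorandom function of a block number into a stream cipher: encrypt successive counter values and XOR the result into the data, so encryption and decryption are the same operation. The sketch below uses a keyed SHA-256 as a stand-in for the Rijndael block cipher purely to show the structure -- it is not Rijndael, and the function names are invented, not StreamEncrypter.py's actual interface.]

```python
import hashlib

BLOCK_SIZE = 32  # sha256 digest size; Rijndael would use 16-byte blocks


def _keystream_block(key: bytes, counter: int) -> bytes:
    # Stand-in keyed PRF. In real counter mode this would be
    # cipher.encrypt(counter_block) with Rijndael keyed by `key`.
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()


def ctr_transform(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; encrypting and decrypting are identical."""
    out = bytearray()
    for i in range(0, len(data), BLOCK_SIZE):
        block = _keystream_block(key, i // BLOCK_SIZE)
        chunk = data[i : i + BLOCK_SIZE]
        # zip truncates the keystream block to the final (short) chunk.
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)
```

Because the keystream depends only on the key and the block index, applying `ctr_transform` twice with the same key round-trips the data, which is why a single routine serves for both directions.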