From david at pjort.com Sun Feb 1 09:19:00 2004 From: david at pjort.com (David =?iso-8859-1?Q?G=F6thberg?=) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Re: P2P journal copyright In-Reply-To: <401B22BA.8010005@neurogrid.com> References: <20040131030211.GF20611@lycopodium> <401AF6B8.6050902@neurogrid.com> <20040131030211.GF20611@lycopodium> Message-ID: <5.0.2.1.1.20040131100728.00a3f650@pop.home.se> Sam Joseph, P2P journal wrote: >We are trying to adjust our copyright policy to make everyone happy. >We are in the process of trying to improve the wording of the policy, and >I don't understand why you don't want to help us. >In fact here is the latest version of the copyright policy >"P2PJ wants to find a balance between author's copyright and ensuring >articles in P2PJ are unique. P2PJ prefers original, quality articles. >Quality is more important than quantity. > >By submitting the article to P2PJ, the author grants P2PJ and its >affiliated organizations and entities, perpetual, royalty-free, worldwide >licenses for both electronic and print formats. Authors are granted rights >to reproduce their articles provided that they include a prominent >statement: 'The piece originally appeared in P2P Journal'. A clearly >visible link to http://p2pjournal.com must also be made. Hi everyone! Since it was I who started out this discussion I feel obliged to comment. Don Marti: Thanks for supporting my views. :) Sam Joseph, p2pjournal.com: Your new version of your copyright agreement is much more agreeable. However I think it does create a whole set of legal problems and uncertainties for both parties. So I have some suggestions. Stating that the p2pjournal gets a "license" and then that "Authors are granted rights to reproduce" makes it very unclear who owns what. The wording you have chosen for instance might make it illegal for any of the parties to resell the text! 
That is, your wording gives both parties the right to reproduce the text, but not to sell copies of it or resell the rights to it. I think it is better and easier to give both parties one complete copyright and ownership of the text. That is, to copy the copyright! :) Here's a rough translation (from memory) and adaptation of the copyright agreement we used for the paintings my mother bought for the book she wrote. Lawyers in Sweden thought this was a very nice idea and they could see no legal problem with it: ************************** "The author grants P2P Journal a "shared" or "copied" copyright to the document. This means that each party has the complete set of rights to the document as if they were two different documents. That means that both parties can do anything they want with their copy of the document. Both parties can reproduce, redistribute, license or even sell their rights to the document to other parties. This also means that both parties can make changes to the document and reuse any part of it as they see fit. That is, both parties fully own their copy of the document." "Both parties understand that if they sell their right to the document to some other party they should inform the buyer about the fact that this document is subject to a shared or copied copyright." ************************** A funny consequence of a copied copyright is that each party actually can sell (or give away) copied copyrights of the document! That is, theoretically in the long run there might be many owners each owning a copy of the document. :)) We had one additional "backup copy" paragraph that you probably don't want to use and don't need. But it was good since one usually doesn't have many backup copies of paintings... ************************** "If any of the parties loses his/her originals they have the right to request full size copies of the paintings from the other party at cost price. 
(Price of making a good copy in a print shop and postage and packaging.)" ************************** We didn't bother to add anything about how this "backup copy" paragraph should be handled if one party sold their right to the document to some other party. So it is unclear if a new owner that has not signed the original contract would be bound by it. So adding some obligation or rights between the two original parties can become very messy if one party wants to sell, reuse or change their copy. Your demand that the author must include the statement "The piece originally appeared in P2P Journal" causes a similar problem. What if the author greatly reworks the document? Or only reuses some small part of the document inside another document? And what if the author sells his right to the document? The easy way to avoid such problems is simply to not have any obligations between the two parties. I am aware that you wanted to prevent the authors from publishing their document in other journals etc. But to accomplish that you must be the sole owner of the document. But to OWN the document you should PAY the author for doing WORK for you. And note that much of the content in the articles might have been the result of work done while the author was paid by some other source. And you didn't want to pay anything anyway. But you could add the following paragraph: ************************** If / when the author reuses the published article, P2P Journal would like the following statement to be included: 'This piece originally appeared in P2P Journal' However this is only a request, not an obligation. ************************** I believe that most authors will be perfectly happy to "brag" in their paper that the paper has been published in a journal. But only making it a request gives the author full freedom. And this also solves the problem of what to do when only reusing a small part of the document (like a figure or so). 
I understand that you need to have the right to print and resell your journal and to reuse parts of your journal in other ways. But the authors also need to reuse the documents they write about their research. It would be silly if the authors would have to redraw the figures describing their p2p network structure. A "copied" copyright solves this problem completely, giving both parties full freedom. And most p2p researchers really like freedom a lot! Freedom of speech/expression/publication is one of the main reasons many of us do p2p research. Well, this was my five cents (my view) on the "problem". Greetings from snowy Gothenburg, Sweden, Northern Europe, .../David ----------------------------------------------------------- David Göthberg Email: david@pjort.com http://www.david.pjort.com ----------------------------------------------------------- From sam at neurogrid.com Mon Feb 2 01:04:18 2004 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] P2PJ Call for Referees Message-ID: <401DA212.8080009@neurogrid.com> Hi all, I realise that this may not be the best timing to post this call for *referees*, since we are in the middle of a debate about copyright in P2PJ. However I am in the process of moving house, and so can't gather all the threads of the copyright thing together right now. I want to thank everyone who has mailed me and the list with input, and assure everyone that as soon as I have finished my house move I will work my hardest to get P2PJ set up with a copyright policy that addresses everyone's concerns. In the meantime I'm just mailing this call for referees because we really do need help reviewing papers, and I have already taken far too long to get it sorted out. 
Sincerely, Sam Joseph ------------------------------------------------------------------- CALL FOR REVIEWERS Peer-to-Peer Journal (http://p2pjournal.com) ------------------------------------------------------------------- Apologies if you receive this announcement more than once. The Peer-to-Peer Journal (P2PJ) is an electronic, refereed journal devoted to comprehensive coverage of Peer-to-Peer and Parallel computing topics. It is freely available, with no subscription, and is located at http://www.p2pjournal.com/. All articles submitted to P2PJ are evaluated by referees who comment on the article and make recommendations to the Editor. P2PJ is looking to expand the number of referees and would very much welcome volunteers from those interested in P2P and with specializations in relevant fields. When P2PJ receives articles, the editorial board selects referees to approach according to the overlap between the article itself and the expertise of the potential reviewers. An invitation email is then sent to the selected referees, with the anonymised article attached and a checklist of points to consider. P2PJ hopes to receive reviews within 10 to 14 days. The authors then receive an anonymised version of the referees' comments and suggestions for improvement of the article. A referee would not normally receive more than three papers a year to review, and most referees are asked to review only occasionally, when a paper in their particular field is submitted. While there are no direct benefits available to referees, reviewing does allow you to see the current state of the art, is of immense benefit to the peer-to-peer community generally, and is beneficial to the authors of the submission specifically. 
If you would like to become a referee for P2PJ, please send this email to sam@p2pjournal.com after completing the form below. ----------------------------------------------- Title: (e.g. Dr, Prof.) First name: Family Name: Affiliation/institution: Email address: Keywords describing your area of expertise and interest (please be as specific as possible, e.g. 'distributed hashtables', not 'p2p'). List up to 5 keywords or phrases, one per line: Do you have a doctorate (PhD)? If so, in what field: Number of papers you have published in refereed journals: Thank you. Your reply will be acknowledged. Invitations to referee will then arrive when appropriate papers are submitted to P2PJ. If you need to update your details, please notify us. Raymond F. Gao, Editor-in-Chief; Daniel Brookshier, Editor; Sam Joseph, Editor From b.fallenstein at gmx.de Tue Feb 3 14:12:33 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? Message-ID: <401FAC51.6040000@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, does anybody know references for using cryptographic hashes as unique identifiers for files in very large repositories (think all of the Web)? The references I've found (e.g. Handbook of Applied Cryptography) don't talk explicitly about that, but only about applications in message authentication, and attacks related to that; of course that's related, but it would be nice to know whether there are references from cryptology talking explicitly about hashes as unique identifiers in very large collections of messages. 
Thanks, - - Benja -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAH6xQUvR5J6wSKPMRAhRLAKCZUml8RrCWLTSuFR69uCLEvaLIYQCgqQP1 ZOpKsNoWrTxfuN/xu2PB5Dw= =Vy04 -----END PGP SIGNATURE----- From bert at web2peer.com Tue Feb 3 15:27:42 2004 From: bert at web2peer.com (Bert) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? Message-ID: <20040203072743.1242.h004.c001.wm@mail.web2peer.com.criticalpath.net> Maybe I'm not understanding the question, but isn't this exactly the idea of distributed hash tables? E.g. Chord, CAN, Pastry, etc... On Tue, 03 Feb 2004 16:12:33 +0200, Benja Fallenstein wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi, > > does anybody know references for using cryptographic hashes as unique > identifiers for files in very large repositories (think all of the > Web)? > The references I've found (e.g. Handbook of Applied Cryptography) don't > talk explicitly about that, but only about applications in message > authentication, and attacks related to that; of course that's related, > but it would be nice to know whether there are references from > cryptology talking explicitly about hashes as unique identifiers in > very > large collections of messages. 
> > Thanks, > - - Benja > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.4 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > > iD8DBQFAH6xQUvR5J6wSKPMRAhRLAKCZUml8RrCWLTSuFR69uCLEvaLIYQCgqQP1 > ZOpKsNoWrTxfuN/xu2PB5Dw= > =Vy04 > -----END PGP SIGNATURE----- > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > _______________________________________________ > Here is a web page listing P2P Conferences: > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From jdd at dixons.org Tue Feb 3 16:13:35 2004 From: jdd at dixons.org (Jim Dixon) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <401FAC51.6040000@gmx.de> Message-ID: <20040203160155.G78403-100000@localhost> On Tue, 3 Feb 2004, Benja Fallenstein wrote: > does anybody know references for using cryptographic hashes as unique > identifiers for files in very large repositories (think all of the Web)? > The references I've found (e.g. Handbook of Applied Cryptography) don't > talk explicitly about that, but only about applications in message > authentication, and attacks related to that; of course that's related, > but it would be nice to know whether there are references from > cryptology talking explicitly about hashes as unique identifiers in very > large collections of messages. No references really needed. You can use SHA digests to generate 160 bit/20 byte keys which can be used as unique identifiers. While it is theoretically possible that one message or other document could hash to the same digest as another, chances are approximately 10^16 against it, so we needn't worry in our lifetimes. As someone else has pointed out, the various DHT networks (Chord, Pastry, etc) as well as Freenet assume that SHA digests are unique identifiers. 
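[The scheme Jim describes, an SHA-1 digest used directly as a 160-bit content key, can be sketched in a few lines of Python; the `store` dict below is a hypothetical stand-in for a DHT or repository, purely for illustration:]

```python
import hashlib

def content_key(data: bytes) -> bytes:
    """Return the 20-byte (160-bit) SHA-1 digest of the data,
    used here as its unique identifier."""
    return hashlib.sha1(data).digest()

# Hypothetical stand-in for a DHT or document repository.
store = {}

doc = b"an example document"
store[content_key(doc)] = doc

# The key is deterministic: hashing the same content always
# yields the same 160-bit identifier, so lookups need no index.
assert len(content_key(doc)) == 20
assert store[content_key(doc)] == doc
```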
-- Jim Dixon jdd@dixons.org tel +44 117 982 0786 mobile +44 797 373 7881 http://jxcl.sourceforge.net Java unit test coverage http://xlattice.sourceforge.net p2p communications infrastructure From b.fallenstein at gmx.de Tue Feb 3 16:15:55 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <20040203072743.1242.h004.c001.wm@mail.web2peer.com.criticalpath.net> References: <20040203072743.1242.h004.c001.wm@mail.web2peer.com.criticalpath.net> Message-ID: <401FC93B.6000508@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Bert, The question was whether there are any references in the cryptology literature that say that the idea is ok. I know it's true, I just wanted a reference so that I don't have to do the math to show it myself. I found none so far, so I'll have to do it myself ;) (Luckily, a more math-inclined friend showed me how that's not all that hard.) Thanks - - Benja Bert wrote: | Maybe I'm not understanding the question, but isn't this exactly the | idea of distributed hash tables? E.g. Chord, CAN, Pastry, etc... | | | | On Tue, 03 Feb 2004 16:12:33 +0200, Benja Fallenstein wrote: | | | Hi, | | does anybody know references for using cryptographic hashes as unique | identifiers for files in very large repositories (think all of the | Web)? | The references I've found (e.g. Handbook of Applied Cryptography) | |> don't | | talk explicitly about that, but only about applications in message | authentication, and attacks related to that; of course that's | |> related, | | but it would be nice to know whether there are references from | cryptology talking explicitly about hashes as unique identifiers in | very | large collections of messages. 
| | Thanks, | - Benja _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences | _______________________________________________ | p2p-hackers mailing list | p2p-hackers@zgp.org | http://zgp.org/mailman/listinfo/p2p-hackers | _______________________________________________ | Here is a web page listing P2P Conferences: | http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAH8k6UvR5J6wSKPMRAsaXAKCewyF+NSbPpE2jDwv0P6W08gKrjwCdHYsG /7u2dWKVA4NOp+NK5IYm+Yc= =Ll48 -----END PGP SIGNATURE----- From zooko at zooko.com Tue Feb 3 16:22:42 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: Message from Jim Dixon of "Tue, 03 Feb 2004 16:13:35 GMT." <20040203160155.G78403-100000@localhost> References: <20040203160155.G78403-100000@localhost> Message-ID: Jim Dixon wrote: > > While it is > theoretically possible that one messages or other document could hash to > the same digest as another, changes are approximately 10^16 against it, so > we needn't worry in our lifetimes. You're right about the conclusion, Jim, but 2^10 ~= 10^3, so 2^160 ~= 10^48. This is 10^16: 10,000,000,000,000,000 This is 10^48: 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 Regards, Zooko From hannes.tschofenig at siemens.com Tue Feb 3 16:46:19 2004 From: hannes.tschofenig at siemens.com (Tschofenig Hannes) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? 
Message-ID: <2A8DB02E3018D411901B009027FD3A3F03BC067B@mchp905a.mch.sbs.de> hi all, you would call it statistically unique: from the hip draft: The birthday paradox sets a bound for the expectation of collisions. It is based on the square root of the number of values. A 64-bit hash, then, would put the chances of a collision at 50-50 with 2^32 hosts (4 billion). A 1% chance of collision would occur in a population of 640M and a .001% collision chance in a 20M population. A 128 bit hash will have the same .001% collision chance in a 9x10^16 population. ciao hannes > -----Original Message----- > From: Jim Dixon [mailto:jdd@dixons.org] > Sent: Tuesday, February 03, 2004 5:14 PM > To: Peer-to-peer development. > Subject: Re: [p2p-hackers] References for using hashes as unique > identifiers? > > > On Tue, 3 Feb 2004, Benja Fallenstein wrote: > > > does anybody know references for using cryptographic hashes > as unique > > identifiers for files in very large repositories (think all > of the Web)? > > The references I've found (e.g. Handbook of Applied > Cryptography) don't > > talk explicitly about that, but only about applications in message > > authentication, and attacks related to that; of course > that's related, > > but it would be nice to know whether there are references from > > cryptology talking explicitly about hashes as unique > identifiers in very > > large collections of messages. > > No references really needed. You can use SHA digests to generate 160 > bit/20 byte keys which can be used as unique identifiers. While it is > theoretically possible that one messages or other document > could hash to > the same digest as another, changes are approximately 10^16 > against it, so > we needn't worry in our lifetimes. > > As someone else has pointed out, the various DHT networks > (Chord, Pastry, > etc) as well as Freenet assume that SHA digests are unique > identifiers. 
> > -- > Jim Dixon jdd@dixons.org tel +44 117 982 0786 mobile +44 > 797 373 7881 > http://jxcl.sourceforge.net Java unit > test coverage > http://xlattice.sourceforge.net p2p communications > infrastructure > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > _______________________________________________ > Here is a web page listing P2P Conferences: > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences > From b.fallenstein at gmx.de Tue Feb 3 17:03:34 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: References: <20040203160155.G78403-100000@localhost> Message-ID: <401FD466.80101@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, Zooko O'Whielacronx wrote: | Jim Dixon wrote: | |>While it is |>theoretically possible that one messages or other document could hash to |>the same digest as another, changes are approximately 10^16 against it, so |>we needn't worry in our lifetimes. | | You're right about the conclusion, Jim, but 2^10 ~= 10^3, so 2^160 ~= 10^48. It's certainly not 2^160 against -- that would be the case if we had only two documents. If we have n documents, the probability of a collision is < n^2/2^160 though, which for n = 2^60 still gives 2^(-40) or about one trillion against. - - Benja -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAH9RlUvR5J6wSKPMRAmSFAJ9kz8z7fqrEzW+AtEzMtRwQhQ2cAgCfeesw mBPUNb//x8d9i88y/kEMuZA= =mYPs -----END PGP SIGNATURE----- From jdd at dixons.org Tue Feb 3 17:13:23 2004 From: jdd at dixons.org (Jim Dixon) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? 
In-Reply-To: Message-ID: <20040203171223.B78403-100000@localhost> On 3 Feb 2004, Zooko O'Whielacronx wrote: > > While it is > > theoretically possible that one messages or other document could hash to > > the same digest as another, changes are approximately 10^16 against it, so > > we needn't worry in our lifetimes. > > You're right about the conclusion, Jim, but 2^10 ~= 10^3, so 2^160 ~= 10^48. Yep. Should have stayed in bed. > This is 10^16: > > 10,000,000,000,000,000 > > This is 10^48: > > 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 Which makes the odds just a bit better. ;-) -- Jim Dixon jdd@dixons.org tel +44 117 982 0786 mobile +44 797 373 7881 http://jxcl.sourceforge.net Java unit test coverage http://xlattice.sourceforge.net p2p communications infrastructure From zooko at zooko.com Tue Feb 3 17:21:48 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: Message from Jim Dixon of "Tue, 03 Feb 2004 17:13:23 GMT." <20040203171223.B78403-100000@localhost> References: <20040203171223.B78403-100000@localhost> Message-ID: > > You're right about the conclusion, Jim, but 2^10 ~= 10^3, so 2^160 ~= 10^48. > > Yep. Should have stayed in bed. Me too, because I didn't mention the birthday surprise. But we don't have to worry about the birthday surprise for a specific document. The chance that anyone can come up with a second pre-image that matches this hash is 2^-160: 820550664cf296792b38d1647a4d8c0e1966af57 Regards, Zooko P.S. The hash above is hex encoded. Here it is base32 encoded in my own base32 alphabet: oeniy31c6km81k3a4f18wuccbacspm4z From gojomo at bitzi.com Tue Feb 3 17:33:08 2004 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? 
In-Reply-To: References: <20040203171223.B78403-100000@localhost> Message-ID: <401FDB54.7040406@bitzi.com> Zooko O'Whielacronx wrote: > But we don't have to worry about the birthday surprise for a specific document. > The chance that anyone can come up with a second pre-image that matches this > hash is 2^-160: > > 820550664cf296792b38d1647a4d8c0e1966af57 > > Regards, > > Zooko > > P.S. The hash above is hex encoded. Here it is base32 encoded in my own > base32 alphabet: oeniy31c6km81k3a4f18wuccbacspm4z And using custom alphabets decreases the chances of inadvertent collisions even further: the exact same preimage and hash function will give different output! :) - Gordon From hal at finney.org Tue Feb 3 19:08:31 2004 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? Message-ID: <200402031908.i13J8VT10926@finney.org> Hannes writes: > > The birthday paradox sets a bound for the expectation of collisions. It is > based on the square root of the number of values. A 64-bit hash, then, would > put the chances of a collision at 50-50 with 2^32 hosts (4 billion). A 1% > chance of collision would occur in a population of 640M and a .001% > collision chance in a 20M population. A 128 bit hash will have the same > .001% collision chance in a 9x10^16 population. So if the world population stabilizes at 10 billion, or 10^10, then each person can have 9 million addressable objects before the chance of a collision rises to .001%. At about a billion objects per person the chances hit 50-50 and the system breaks down. I don't think this is good enough. 9 million objects per person is rather limited, especially in a future 100 years from now which will probably be much more information-rich than today. I would suggest before we standardize on this that we use a larger hash than 160 bits. Newer hashes have sizes of 256, 384 and 512 bits. 
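[The birthday-bound figures quoted by Hannes and Hal can be checked with a short sketch; the approximation p ~ k^2 / 2^(n+1) is assumed here, which is standard but not spelled out in the thread:]

```python
import math

def population_for(p: float, bits: int) -> float:
    """Approximate number of random values k at which an n-bit hash
    reaches collision probability p, by the birthday approximation
    p ~ k^2 / 2^(n+1), i.e. k ~ sqrt(2 * p * 2^n)."""
    return math.sqrt(2 * p * 2.0 ** bits)

print(f"{population_for(0.5, 64):.2e}")    # 64-bit hash, 50-50 chance: ~4.3e9 ("4 billion")
print(f"{population_for(0.01, 64):.2e}")   # 1% chance: ~6.1e8 ("640M")
print(f"{population_for(1e-5, 64):.2e}")   # .001% chance: ~1.9e7 ("20M")
print(f"{population_for(1e-5, 128):.2e}")  # 128-bit, .001% chance: ~8.2e16 ("9x10^16")
```

[The approximation is accurate for small p; for p = 0.5 the exact 50-50 point is a constant factor (about 1.18) higher, which is why the thread's round numbers differ slightly from these.]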
Otherwise we'll be facing a Y2K-like problem 50 to 100 years from now. Hal From b.fallenstein at gmx.de Tue Feb 3 20:14:50 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <200402031908.i13J8VT10926@finney.org> References: <200402031908.i13J8VT10926@finney.org> Message-ID: <4020013A.8050106@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Hal, Hal Finney wrote: | I don't think this is good enough. 9 million objects per person is | rather limited, especially in a future 100 years from now which will | probably be much more information-rich than today. I think it's a safe assumption that one hundred years from now, computers will be able to find second preimages for 160-bit hashes anyway -- and it's not too unlikely that the analysis of hashes will have advanced enough that SHA-1 or other hashes of today are broken. There was a paper that extrapolated trends in cryptography into the future, among other things predicting when breaking an n-bit hash would cost how much. I don't remember enough about the paper to find it right now, but maybe someone else remembers. Cheers, - - Benja -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAH/+mUvR5J6wSKPMRAu8fAKDQb/VMdmJtxRAzjlhbjcl7GGxkQwCfXX4d kwpe9bkhJ/I226NvwL//11o= =gurT -----END PGP SIGNATURE----- From mllist at vaste.mine.nu Tue Feb 3 22:50:23 2004 From: mllist at vaste.mine.nu (Johan =?iso-8859-1?Q?F=E4nge?=) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <200402031908.i13J8VT10926@finney.org> References: <200402031908.i13J8VT10926@finney.org> Message-ID: <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> Has there been any research done on this, and how this can be handled? 
I realize that this is extremely unlikely, but it nags me that it's a problem. To fully compare two hashed pieces of data on two peers, one would need to transfer all of the data, correct? It is also not possible to detect "possible" collisions, without increasing the standard transfer rate. (E.g. the output of the hash-function.) However, this increase of transfer-rate is in some instances already taking place. E.g. when the collision is a top-level hash of a merkle-tree, but the lower levels are different. As soon as the lower parts of the tree are transferred (e.g. in a filesharing system where files are being transferred) the collision might be detected. Of course this only reduces the possibilities and there might still be undetectable collisions. A human might also detect that it's incorrect. The question then is whether it, except when the hash-function is cracked, really matters. If we really are unable to detect something wrong (and again, that it isn't intentional) then does this really matter? Won't what we get be good enough? Once a collision is detected, it's not a hard problem to do something about it. In a DHT, simply note on the node storing the hash with collisions that there are two versions, and the necessary data to distinguish them. After deciding which piece of data it is, then push one or both pieces of data onto some other node (by adding some data, e.g. "1", to the data hashed, and repeat until there's no longer a collision). I believe that for any serious candidate for an all-encompassing namespace, there should be some procedure as to what to do if a collision happens. /Vaste > Hannes writes: > > So if the world population stabilizes at 10 billion, or 10^10, then each > person can have 9 million addressable objects before the chance of a > collision rises to .001%. At about a billion objects per person the > chances > hit 50-50 and the system breaks down. > > I don't think this is good enough. 
9 million objects per person is > rather limited, especially in a future 100 years from now which will > probably be much more information-rich than today. > > I would suggest before we standardize on this that we use a larger > hash than 160 bits. Newer hashes have sizes of 256, 384 and 512 bits. > Otherwise we'll be facing a Y2K-like problem 50 to 100 years from now. > > Hal From justin at chapweske.com Tue Feb 3 23:08:52 2004 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> Message-ID: <1075849732.3607.3240.camel@bog> I agree that this is the appropriate fix. If you are concerned about collisions, don't worry about using larger (and much slower) hash functions. Simply transfer enough levels of your Merkle tree (encoded using THEX of course :) to provide the level of robustness that you are looking for. > However, this increase of transfer-rate is in some instances already > taking place. E.g. when the collision is a top-level hash of a > merkle-tree, but the lower levels are different. As soon as the lower > parts of the tree are transferred (e.g. in a filesharing system where files > are being transferred) the collision might be detected. Of course this > only reduces the possibilities and there might still be undetectable > collisions. A human might also detect that it's incorrect. > From opoli at comcast.net Wed Feb 4 02:49:34 2004 From: opoli at comcast.net (Opoli) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? 
In-Reply-To: <4020013A.8050106@gmx.de> Message-ID: <000401c3eac9$8801c7d0$ae00a943@Willard> There was a paper that extrapolated trends in cryptography into the future, among other things predicting when breaking an n-bit hash would cost how much. I don't remember enough about the paper to find it right now, but maybe someone else remembers. -- The RSA site has links to a number of papers dealing with this - http://www.rsasecurity.com/rsalabs/technotes/bernstein.html The most recent paper is 2002 - there is probably something out there more up to date but this is what I had at hand. Glad to be a part of the list, Chris Camp Cheers, - - Benja -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAH/+mUvR5J6wSKPMRAu8fAKDQb/VMdmJtxRAzjlhbjcl7GGxkQwCfXX4d kwpe9bkhJ/I226NvwL//11o= =gurT -----END PGP SIGNATURE----- _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From p2p at kingprimate.com Wed Feb 4 03:48:10 2004 From: p2p at kingprimate.com (Jeremiah Rogers) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> Message-ID: <40206B7A.5070109@kingprimate.com> Johan Fänge wrote: > To fully compare two hashed pieces of data on two peers, one would need to > transfer all of the data, correct? It is also not possible to detect > "possible" collisions, without increasing the standard transfer rate. > (E.g. the output of the hash-function.) You could compare smaller segments of the data. 
The likelihood that both halves of two messages, as well as the entire messages, all have matching hashes would be extremely small (on the order of 1 in 2^(3n), though I'm not sure). I guess it really depends on how sure you want to be that the messages aren't the same. > Once a collision is detected it's not a hard problem to do something > about it. In a DHT, simply note on the node storing the hash with > collisions that there are two versions, and the necessary data to > distinguish them. After deciding which piece of data it is, then push one > or both pieces of data onto some other node (by adding some data, e.g. > "1", to the data hashed, and repeat until there's no longer a collision). > > I believe that for any serious candidate for an all-encompassing namespace, > there should be some procedure as to what to do if a collision happens. Being able to handle a collision without completely failing is important, but if you're going to have assertions made about database hashes then you need to have unique hashes to prevent refutable or misinterpreted statements. There are also zero-knowledge situations where (for example) the DHT stores merely the hash H, and although both A and B hash to H, you won't be able to tell that. There was an interesting solution proposed earlier on this list [1] that would allow for increased hash length in the future if collisions occurred while keeping current length (and network usage) lower. It involves taking a larger hash and shrinking it.
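The truncation idea referenced in [1] can be sketched in a few lines (an illustrative sketch only; the function names and parameters are mine, not from the thread): compute a wide hash once, publish only a truncated prefix of it as today's DHT key, and derive a longer key later without rehashing the content.

```python
import hashlib

def full_hash(data: bytes) -> bytes:
    """Compute the wide (512-bit) hash once and keep it around."""
    return hashlib.sha512(data).digest()

def dht_key(data: bytes, key_bits: int = 160) -> bytes:
    """Publish only a prefix of the wide hash as the current DHT key."""
    return full_hash(data)[: key_bits // 8]

data = b"example object"
key_now = dht_key(data)          # 20 bytes on the wire today
key_later = dht_key(data, 256)   # 32 bytes if the network raises the key length
# Old keys remain prefixes of new ones, so nothing has to be rehashed.
assert key_later.startswith(key_now)
```

Since every longer key is an extension of the shorter one, a network can grow its key length incrementally instead of flag-day rehashing everything.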
-jr [1] http://zgp.org/pipermail/p2p-hackers/2002-November/000977.html From bkn3 at columbia.edu Wed Feb 4 05:31:54 2004 From: bkn3 at columbia.edu (Brad Neuberg) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Distributed Hashtables - Low Latency/Low Membership Overhead In-Reply-To: <40206B7A.5070109@kingprimate.com> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> Message-ID: <6.0.1.1.2.20040203212724.01e31b38@pop.mail.yahoo.com> I'm looking for a DHT that has low latency (as few hops as possible) coupled with a design that can handle high churn (such as the average length of time for a node being 30 minutes). The membership/maintenance protocol should have low-overhead (i.e. the background traffic for establishing, maintaining, and breaking membership in the DHT should be a low-percentage of the total network traffic). Other factors can be "pessimized" to achieve these two goals, such as maintaining extra large routing tables in memory. I am aware of work establishing low latency for DHTs with Kelips and a modified Chord, but am not aware of work that also combines low-overhead for the membership protocol. Can folks point me to further work in this area? Thanks, Brad Neuberg bkn3@columbia.edu From paul at soniq.net Wed Feb 4 06:05:09 2004 From: paul at soniq.net (Paul Boehm) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <200402031908.i13J8VT10926@finney.org> References: <200402031908.i13J8VT10926@finney.org> Message-ID: <20040204060509.GA5303@soniq.net> On Tue, Feb 03, 2004 at 11:08:31AM -0800, Hal Finney wrote: > I would suggest before we standardize on this that we use a larger > hash than 160 bits. Newer hashes have sizes of 256, 384 and 512 bits. > Otherwise we'll be facing a Y2K-like problem 50 to 100 years from now.
i've been wondering if unique identifiers and data authentication really are the same problem. i think the unique identifier problem could be solved elegantly with universal hash functions, that allow for N to S mappings with arbitrary length for both N and S, and a provably non-biased distribution. for authentication of the data, the question remains whether trusting a cryptographic hash really is better than asking a trusted peer (assuming there won't be non-social filesharing in the future) that has the data, to MAC it for us. regards, paul From b.fallenstein at gmx.de Wed Feb 4 10:54:05 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Distributed Hashtables - Low Latency/Low Membership Overhead In-Reply-To: <6.0.1.1.2.20040203212724.01e31b38@pop.mail.yahoo.com> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <6.0.1.1.2.20040203212724.01e31b38@pop.mail.yahoo.com> Message-ID: <4020CF4D.8090004@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, For handling high churn at manageable network traffic, see Bamboo (http://bamboo-dht.org/). It's unfortunately not optimized for low latency, but maybe ideas from there can help. - - Benja Brad Neuberg wrote: | I'm looking for a DHT that has low latency (as few hops as possible) | coupled with a design that can handle high churn (such as the average | length of time for a node being 30 minutes). The membership/maintenance | protocol should have low-overhead (i.e. the background traffic for | establishing, maintaining, and breaking membership in the DHT should be | a low-percentage of the total network traffic). Other factors can be | "pessimized" to achieve these two goals, such as maintaining extra large | routing tables in memory.
I am aware of work establishing low latency | for DHTs with Kelips and a modified Chord, but am not aware of work that | also combines low-overhead for the membership protocol. Can folks point | me to further work in this area? | | Thanks, | Brad Neuberg | bkn3@columbia.edu | | _______________________________________________ | p2p-hackers mailing list | p2p-hackers@zgp.org | http://zgp.org/mailman/listinfo/p2p-hackers | _______________________________________________ | Here is a web page listing P2P Conferences: | http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences | | -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAIM9CUvR5J6wSKPMRAhKlAJ92QE5oZRtXJ6uSGxWL64heuqrP7gCcDSf7 ZVt1Fq/XeStyWIqerjO8hyk= =YDM/ -----END PGP SIGNATURE----- From aloeser at cs.tu-berlin.de Wed Feb 4 11:16:35 2004 From: aloeser at cs.tu-berlin.de (Alexander =?iso-8859-1?Q?L=F6ser?=) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocol Message-ID: <4020D493.2420AE9C@cs.tu-berlin.de> Chord uses a consistent hash function which equally distributes keys to nodes. For a large number of keys, queries are somewhat loadbalanced. However, some keys are more popular than other keys, thus some nodes receive more queries, more messages are routed to those nodes, and more bandwidth is used. An interesting approach would be to create cache entries of keys and objects along the lookup paths of successful queries. Does anybody know of experiences with this approach? Which approaches for query loadbalancing are known to you? Any links to papers or working systems are welcome. Alex -- ___________________________________________________________ M.Sc., Dipl. Wi.-Inf.
Alexander Löser Technische Universitaet Berlin Fakultaet IV - CIS bmb+f-Projekt: "New Economy, Neue Medien in der Bildung" hp: http://cis.cs.tu-berlin.de/~aloeser/ office: +49- 30-314-25551 fax : +49- 30-314-21601 ___________________________________________________________ From bkn3 at columbia.edu Wed Feb 4 11:19:03 2004 From: bkn3 at columbia.edu (Brad Neuberg) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocol In-Reply-To: <4020D493.2420AE9C@cs.tu-berlin.de> References: <4020D493.2420AE9C@cs.tu-berlin.de> Message-ID: <6.0.1.1.2.20040204031846.01e4feb0@pop.mail.yahoo.com> Check out Coral at http://www.scs.cs.nyu.edu/coral/. At 03:16 AM 2/4/2004, Alexander Löser wrote: >Chord uses a consistent hash function which equally distributes keys to >nodes. For a large number of keys, queries are somewhat loadbalanced. >However, some keys are more popular than other keys, thus some nodes >receive more queries, more messages are routed to those nodes, and more >bandwidth is used. An interesting approach would be to create cache >entries of keys and objects along the lookup paths of successful >queries. > >Does anybody know of experiences with this approach? >Which approaches for query loadbalancing are known to you? > >Any links to papers or working systems are welcome. > >Alex >-- >___________________________________________________________ > > M.Sc., Dipl. Wi.-Inf.
Alexander Löser > Technische Universitaet Berlin Fakultaet IV - CIS > bmb+f-Projekt: "New Economy, Neue Medien in der Bildung" > hp: http://cis.cs.tu-berlin.de/~aloeser/ > office: +49- 30-314-25551 > fax : +49- 30-314-21601 >___________________________________________________________ > > >_______________________________________________ >p2p-hackers mailing list >p2p-hackers@zgp.org >http://zgp.org/mailman/listinfo/p2p-hackers >_______________________________________________ >Here is a web page listing P2P Conferences: >http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From eugen at leitl.org Wed Feb 4 11:26:44 2004 From: eugen at leitl.org (Eugen Leitl) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] is muted horn good for you? Message-ID: <20040204112644.GI13816@leitl.org> Can someone knowledgeable comment on Mute's architecture? http://mute-net.sourceforge.net/index.shtml -- Eugen* Leitl leitl ______________________________________________________________ ICBM: 48.07078, 11.61144 http://www.leitl.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE http://moleculardevices.org http://nanomachines.net -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040204/debc73d7/attachment.pgp From coderman at peertech.org Wed Feb 4 11:34:51 2004 From: coderman at peertech.org (coderman) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocol In-Reply-To: <6.0.1.1.2.20040204031846.01e4feb0@pop.mail.yahoo.com> References: <4020D493.2420AE9C@cs.tu-berlin.de> <6.0.1.1.2.20040204031846.01e4feb0@pop.mail.yahoo.com> Message-ID: <4020D8DB.4000102@peertech.org> Brad Neuberg wrote: > Check out Coral at http://www.scs.cs.nyu.edu/coral/.
From: http://www.scs.cs.nyu.edu/coral/download.html " Currently, only R/W access to Coral is available. Note that getting read/write access requires an account on the SCS file servers, so this route is probably open only to SCS members and affiliates." Any way to get a snapshot (nightly tarballs?) or read access? This sounds interesting but I'd like to dig a little deeper... Regards, From mllist at vaste.mine.nu Wed Feb 4 13:19:23 2004 From: mllist at vaste.mine.nu (Johan =?iso-8859-1?Q?F=E4nge?=) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <40206B7A.5070109@kingprimate.com> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> Message-ID: <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> Jeremiah Rogers wrote: > Johan Fänge wrote: > > > To fully compare two hashed pieces of data on two peers, one > > would need to transfer all of the data, correct? It is also > > not possible to detect "possible" collisions, without > > increasing the standard transfer rate. (E.g. the output of > > the hash-function.) > > You could compare smaller segments of the data. The likelihood that both > halves of two messages, as well as the entire messages, all have > matching hashes would be extremely small (on the order of 1 in 2^(3n), > though I'm not sure). I guess it really depends on how sure you want to be that > the messages aren't the same. Unless you already compare smaller segments this _is_ increasing the standard transfer rate. In filesharing this is sometimes already done to get finer grained hashes, so there it is not a problem. > > > Once a collision is detected it's not a hard problem to do > > something about it. In a DHT, simply note on the node storing > > the hash with collisions that there are two versions, and the > > necessary data to distinguish them.
After deciding which piece > > of data it is, then push one or both pieces of data onto some > > other node (by adding some data, e.g. "1", to the data hashed, > > and repeat until there's no longer a collision). > > > > I believe that for any serious candidate for an all-encompassing > > namespace, there should be some procedure as to what to do if a > > collision happens. > > Being able to handle a collision without completely failing is > important, but if you're going to have assertions made about database > hashes then you need to have unique hashes to prevent refutable or > misinterpreted statements. I'm not sure I understand what you mean here. The extra information is collected and used only when a collision is detected. It doesn't affect the rest of the system. The extra information means that the peer doing the hash must do a more thorough investigation, and this shouldn't present much of a problem. This extra information may be an additional hash. (data has hash x; data + "1" has hash y) > > There are also zero-knowledge situations where (for example) the DHT > stores merely the hash H, and although both A and B hash to H, you won't > be able to tell that. My earlier question was that if you really are _completely_ unable to detect the error, does it then matter? After all, there is _nothing_ that says something is wrong. It's a fairly hypothetical question. > > There was an interesting solution proposed earlier on this list [1] that > would allow for increased hash length in the future if collisions occurred > while keeping current length (and network usage) lower. It involves > taking a larger hash and shrinking it. Yes, that is one kind of additional information. /Vaste From mllist at vaste.mine.nu Wed Feb 4 13:31:45 2004 From: mllist at vaste.mine.nu (Johan =?iso-8859-1?Q?F=E4nge?=) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] is muted horn good for you?
In-Reply-To: <20040204112644.GI13816@leitl.org> References: <20040204112644.GI13816@leitl.org> Message-ID: <13035.81.227.49.57.1075901505.squirrel@Vaste_lp3.wired> > > Can someone knowledgeable comment on Mute's architecture? > > http://mute-net.sourceforge.net/index.shtml http://mute-net.sourceforge.net/howAnts.shtml Throwing a fairly quick look at this page, I'd say it's awful with regard to scaling. I must say that the ant-stuff is a pretty cool way to explain gnutella, and as far as I can tell that is just what it is, regarding structure. (But so far without any optimizations like supernodes.) It is different though, in that it also transfers data along the overlay (even slower, but better anonymity). /Vaste From justin at chapweske.com Wed Feb 4 15:06:26 2004 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> Message-ID: <1075907185.3607.3259.camel@bog> Practically speaking, increasing the size of the hash is not a bandwidth issue. It is a CPU issue. So truncating a larger hash simply provides less useful information at the cost of higher CPU utilization. The rule of thumb that I go by is that your content verification should take no more than 5% of the CPU when the file is being verified at a rate equal to the maximum expected download rate. This is usually quite easy to attain for broadband connections, but starts getting difficult for high-speed networks - 45 Mbps+.
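Justin's 5% rule of thumb is easy to check on a given machine. The sketch below (illustrative only; the function name and constants are mine, not from the original post) times a hash over a test buffer and derives the largest download rate that could be verified while keeping hashing within the CPU budget:

```python
import hashlib
import time

def max_verifiable_mbps(hash_name: str = "sha1", cpu_budget: float = 0.05,
                        probe_mb: int = 8) -> float:
    """Largest download rate (MB/s) verifiable within the given CPU budget."""
    buf = b"\x00" * (probe_mb * 1024 * 1024)
    h = hashlib.new(hash_name)
    start = time.perf_counter()
    h.update(buf)                     # hash the probe buffer once
    elapsed = time.perf_counter() - start
    throughput = probe_mb / elapsed   # MB/s of pure hashing at 100% CPU
    return cpu_budget * throughput    # what fits inside the budget

if __name__ == "__main__":
    for name in ("sha1", "sha256", "sha512"):
        print(f"{name}: ~{max_verifiable_mbps(name):.0f} MB/s within 5% CPU")
```

On 2004-era hardware, the 45 Mbps+ region Justin mentions is roughly where SHA-1 at a 5% budget starts to pinch; on modern CPUs the numbers come out far higher.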
The nice thing about the approach of comparing the top N values of the hash tree is that, in order to generate the root hash, you have to generate those intermediate hashes anyway, so it causes no additional CPU load to verify as many hash bits as you like. -Justin > > > > There was an interesting solution proposed earlier on this list [1] that > > would allow for increased hash length in the future if collisions occurred > > while keeping current length (and network usage) lower. It involves > > taking a larger hash and shrinking it. > > Yes, that is one kind of additional information. From coderman at peertech.org Wed Feb 4 15:23:18 2004 From: coderman at peertech.org (coderman) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <1075907185.3607.3259.camel@bog> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> <1075907185.3607.3259.camel@bog> Message-ID: <40210E66.8020806@peertech.org> Justin Chapweske wrote: >Practically speaking, increasing the size of the hash is not a bandwidth >issue. It is a CPU issue. So truncating a larger hash simply provides >less useful information at the cost of higher CPU utilization. > >The rule of thumb that I go by is that your content verification should >take no more than 5% of the CPU when the file is being verified at a >rate equal to the maximum expected download rate. This is usually quite >easy to attain for broadband connections, but starts getting >difficult for high-speed networks - 45 Mbps+. > > Another factor is bringing large amounts of existing content onto the network. I have a few 160G drives loaded with content, archives, software, etc, and building hash identifiers for that much data takes a while. This is one reason I have been a big fan of VIA's efforts to put crypto features on their processors.
The C5XL shipped with a high quality random number generator and the C5P with two RNGs and AES as well. The next revision will contain SHA on core, which will be directly useful for the types of high volume hashing needed for large archives or high bandwidth links. My hope is that the market reacts positively to these features, and other chip makers start to follow suit. [ Info about the C5XL rng: http://peertech.org/hardware/viarng/ developer guide: /www.via.com.tw/en/images/Products/eden/pdf/PadLock_RNG_prog_guide.pdf ] / From agthorr at barsoom.org Wed Feb 4 16:23:07 2004 From: agthorr at barsoom.org (Agthorr) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Distributed Hashtables - Low Latency/Low Membership Overhead In-Reply-To: <6.0.1.1.2.20040203212724.01e31b38@pop.mail.yahoo.com> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <6.0.1.1.2.20040203212724.01e31b38@pop.mail.yahoo.com> Message-ID: <20040204162306.GB22096@barsoom.org> On Tue, Feb 03, 2004 at 09:31:54PM -0800, Brad Neuberg wrote: > The membership/maintenance protocol should have low-overhead > (i.e. the background traffic for establishing, maintaining, and > breaking membership in the DHT should be a low-percentage of the > total network traffic). Well, how much will your total network traffic be? How many nodes do you expect to have, at most? > Other factors can be "pessimized" to achieve these two goals, such > as maintaining extra large routing tables in memory. I think you have conflicting goals there. You can't have large routing tables *and* low churn overhead.
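The tension Agthorr points out can be made concrete with a back-of-envelope estimate (all constants below are illustrative assumptions, not numbers from the thread): with roughly 30-minute sessions, every routing-table entry has to be probed several times per expected neighbor lifetime to notice departures, so maintenance traffic grows linearly with table size.

```python
def maintenance_bytes_per_sec(table_size: int, probe_bytes: int = 100,
                              session_secs: float = 30 * 60,
                              probes_per_session: int = 10) -> float:
    """Rough per-node maintenance traffic: probe each routing-table entry
    often enough (here, every ~3 minutes) to notice departed neighbors."""
    probe_interval = session_secs / probes_per_session
    return table_size * probe_bytes / probe_interval

# A Chord-sized table vs. a one-hop "know everyone" table:
small = maintenance_bytes_per_sec(32)      # ~18 B/s -- negligible
large = maintenance_bytes_per_sec(50000)   # ~28 KB/s -- no longer negligible
```

Under these assumptions the overhead is strictly linear in table size, which is why "pessimizing" toward huge routing tables works directly against cheap churn handling.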
-- Agthorr From eugen at leitl.org Wed Feb 4 16:36:26 2004 From: eugen at leitl.org (Eugen Leitl) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Distributed Hashtables - Low Latency/Low Membership Overhead In-Reply-To: <20040204162306.GB22096@barsoom.org> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <6.0.1.1.2.20040203212724.01e31b38@pop.mail.yahoo.com> <20040204162306.GB22096@barsoom.org> Message-ID: <20040204163626.GK24465@leitl.org> On Wed, Feb 04, 2004 at 08:23:07AM -0800, Agthorr wrote: > I think you have conflicting goals there. You can't have large > routing tables *and* low churn overhead. If the node address space is densely populated (see plenty of p2p connection attempts on dynamically assigned DSL addresses) you can just assume it's a noisy high-dimensional grid, and just store diffs. The routing table is local, and gets refreshed when a delivery fails. -- Eugen* Leitl leitl ______________________________________________________________ ICBM: 48.07078, 11.61144 http://www.leitl.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE http://moleculardevices.org http://nanomachines.net -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040204/13ccb57d/attachment.pgp From agthorr at barsoom.org Wed Feb 4 16:54:23 2004 From: agthorr at barsoom.org (Agthorr) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocol In-Reply-To: <4020D493.2420AE9C@cs.tu-berlin.de> References: <4020D493.2420AE9C@cs.tu-berlin.de> Message-ID: <20040204165423.GC22265@barsoom.org> On Wed, Feb 04, 2004 at 12:16:35PM +0100, Alexander Löser wrote: > Chord uses a consistent hash function which equally distributes keys to > nodes.
For a large number of keys, queries are somewhat loadbalanced. > However, some keys are more popular than other keys, thus some nodes > receive more queries, more messages are routed to those nodes, and more > bandwidth is used. An interesting approach would be to create cache > entries of keys and objects along the lookup paths of successful > queries. See the Caching and Load Balance sections in this paper: http://www.pdos.lcs.mit.edu/papers/cfs:sosp01/cfs_sosp.pdf -- Agthorr From zooko at zooko.com Wed Feb 4 17:28:01 2004 From: zooko at zooko.com (Bryce Wilcox-O'Hearn) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: Message from Jeremiah Rogers of "Tue, 03 Feb 2004 22:48:10 EST." <40206B7A.5070109@kingprimate.com> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> Message-ID: Jeremiah Rogers wrote: > > There was an interesting solution proposed earlier on this list [1] that > would allow for increased hash length in the future if collisions occurred > while keeping current length (and network usage) lower. It involves > taking a larger hash and shrinking it. ... > [1] http://zgp.org/pipermail/p2p-hackers/2002-November/000977.html Thank you for saying that my idea was interesting. If you read [1], then you might also want to read the other messages in those threads, such as [2] in which I change my mind and decide to use normal old SHA-1. By the way, the final spec that resulted from that design process is [3], and it is now implemented [4] (modulo an erratum and an optional feature).
Regards, Zooko [2] http://zgp.org/pipermail/p2p-hackers/2002-November/000984.html [3] http://mnet.sourceforge.net/new_filesystem.html [4] http://cvs.sourceforge.net/viewcvs.py/mnet/mnet_new/mnetlib/filesystem/znff.py?view=markup From p2p at kingprimate.com Wed Feb 4 17:51:56 2004 From: p2p at kingprimate.com (Jeremiah Rogers) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <1075907185.3607.3259.camel@bog> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> <1075907185.3607.3259.camel@bog> Message-ID: <4021313B.3040703@kingprimate.com> Justin Chapweske wrote: > Practically speaking, increasing the size of the hash is not a bandwidth > issue. It is a CPU issue. So truncating a larger hash simply provides > less useful information at the cost of higher CPU utilization. Sorry, I should have made it more clear that in the network I'm designing in my head (low latency zero-knowledge triple store) hash size has a fairly significant impact on the network. With larger files there is a smaller hash/actual data ratio so CPU usage is probably more important. -jr From b.fallenstein at gmx.de Wed Feb 4 18:02:34 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers?
In-Reply-To: <4021313B.3040703@kingprimate.com> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> <1075907185.3607.3259.camel@bog> <4021313B.3040703@kingprimate.com> Message-ID: <402133BA.60006@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I'm not following -- if you truncate a 512 bit hash to 160 bits, you only transfer 160 bits (that's the point of truncation), so where's the additional bandwidth compared to using a 160 bit hash in the first place? - - Benja Jeremiah Rogers wrote: | Justin Chapweske wrote: | |> Practically speaking, increasing the size of the hash is not a bandwidth |> issue. It is a CPU issue. So truncating a larger hash simply provides |> less useful information at the cost of higher CPU utilization. | | | Sorry, I should have made it more clear that in the network I'm | designing in my head (low latency zero-knowledge triple store) hash size | has a fairly significant impact on the network. With larger files there | is a smaller hash/actual data ratio so CPU usage is probably more | important. | | -jr | _______________________________________________ | p2p-hackers mailing list | p2p-hackers@zgp.org | http://zgp.org/mailman/listinfo/p2p-hackers | _______________________________________________ | Here is a web page listing P2P Conferences: | http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences | | -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAITO5UvR5J6wSKPMRAlxAAJ9Nc4l0g8Puopq+FSR5LNNxcpCiAwCg0WY2 VLtgckFoCJzrG+kKfrWsd7w= =fuKR -----END PGP SIGNATURE----- From p2p at kingprimate.com Wed Feb 4 18:09:17 2004 From: p2p at kingprimate.com (Jeremiah Rogers) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers?
In-Reply-To: <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> Message-ID: <4021354D.9090009@kingprimate.com> Johan Fänge wrote: >> > To fully compare two hashed pieces of data on two peers, one >> > would need to transfer all of the data, correct? It is also >> > not possible to detect "possible" collisions, without >> > increasing the standard transfer rate. (E.g. the output of >> > the hash-function.) >> >>You could compare smaller segments of the data. The likelihood that both >> halves of two messages, as well as the entire messages, all have >>matching hashes would be extremely small (on the order of 1 in 2^(3n), >>though I'm not sure). I guess it really depends on how sure you want to be that >>the messages aren't the same. > > Unless you already compare smaller segments this _is_ increasing the > standard transfer rate. In filesharing this is sometimes already done to > get finer grained hashes, so there it is not a problem. Sorry, I was answering your first question about how to compare files during suspected collisions. If we collect additional information upfront the chances of a collision drop from close to never to closer to never, which (you're right) doesn't really justify increasing the standard transfer rate by too far. >> > Once a collision is detected it's not a hard problem to do >> > something about it. In a DHT, simply note on the node storing >> > the hash with collisions that there are two versions, and the >> > necessary data to distinguish them. After deciding which piece >> > of data it is, then push one or both pieces of data onto some >> > other node (by adding some data, e.g. "1", to the data hashed, >> > and repeat until there's no longer a collision).
>> > >> > I believe that for any serious candidate for an all-encompassing >> > namespace, there should be some procedure as to what to do if a >> > collision happens. >> >>Being able to handle a collision without completely failing is >>important, but if you're going to have assertions made about database >>hashes then you need to have unique hashes to prevent refutable or >>misinterpreted statements. > > > I'm not sure I understand what you mean here. The extra information is > collected and used only when a collision is detected. It doesn't affect > the rest of the system. The extra information means that the peer doing > the hash must do a more thorough investigation, and this shouldn't present > much of a problem. This extra information may be an additional hash. (data > has hash x; data + "1" has hash y) Sorry, see comment above. I think I'm getting elements of the network I want to design confused with the (filesharing?) networks others seem to be talking about. Still, if you want to make (cryptographically signed) assertions about given hashes within the network you can't easily handle collisions without some weird protocols for re-establishing non-refutable signatures. But since the collision will never happen (*crosses fingers*) these protocols could indeed be strange and slow. >>There are also zero-knowledge situations where (for example) the DHT >>stores merely the hash H, and although both A and B hash to H, you won't >>be able to tell that. > > > My earlier question was that if you really are _completely_ unable to > detect the error, does it then matter? After all, there is _nothing_ that > says something is wrong. It's a fairly hypothetical question. If you can't detect it then it probably doesn't matter. It could lead to strange occurrences though, and if it happens that two very popular things hash to the same value the network may fall apart. So the important question for choosing hash length is what the problems are when you have a collision.
In a filesharing network it's a pain in the ass, but there are copies of the plaintexts available to compare; in a zero-knowledge network a collision probably goes undetected and can severely screw up the network.

-jr

From p2p at kingprimate.com  Wed Feb  4 18:26:15 2004
From: p2p at kingprimate.com (Jeremiah Rogers)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] References for using hashes as unique identifiers?
In-Reply-To: <402133BA.60006@gmx.de>
References: <200402031908.i13J8VT10926@finney.org> <12327.81.227.49.57.1075848623.squirrel@Vaste_lp3.wired> <40206B7A.5070109@kingprimate.com> <13027.81.227.49.57.1075900763.squirrel@Vaste_lp3.wired> <1075907185.3607.3259.camel@bog> <4021313B.3040703@kingprimate.com> <402133BA.60006@gmx.de>
Message-ID: <40213947.8090908@kingprimate.com>

What I'm saying is that transferring a larger hash in this network is a bandwidth problem -- with this network design basically all you transfer is hashes (since it's zk). But if you want collision resistance for the future by being able to up the DHT-key hash length later, you could compute the 512 bit hash and then only use its 160-bit subset as the DHT-key. Later, if the network gets too close to collisions, you could use a larger subset as the key while not having to throw away all 160 bit keys.

Benja Fallenstein wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I'm not following -- if you truncate a 512 bit hash to 160 bits, you
> only transfer 160 bits (that's the point of truncation), so where's the
> additional bandwidth compared to using a 160 bit hash in the first place?
>
> - - Benja
>
> Jeremiah Rogers wrote:
> | Justin Chapweske wrote:
> |
> |> Practically speaking, increasing the size of the hash is not a bandwidth
> |> issue. It is a CPU issue. So truncating a larger hash simply provides
> |> less useful information at the cost of higher CPU utilization.
> |
> | Sorry, I should have made it more clear that in the network I'm
> | designing in my head (low latency zero-knowledge triple store) hash size
> | has a fairly significant impact on the network. With larger files there
> | is a smaller hash/actual data ratio so CPU usage is probably more
> | important.
> |
> | -jr
> | _______________________________________________
> | p2p-hackers mailing list
> | p2p-hackers@zgp.org
> | http://zgp.org/mailman/listinfo/p2p-hackers
> | _______________________________________________
> | Here is a web page listing P2P Conferences:
> | http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences
> |
> |
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFAITO5UvR5J6wSKPMRAlxAAJ9Nc4l0g8Puopq+FSR5LNNxcpCiAwCg0WY2
> VLtgckFoCJzrG+kKfrWsd7w=
> =fuKR
> -----END PGP SIGNATURE-----
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers
> _______________________________________________
> Here is a web page listing P2P Conferences:
> http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences
>

From Paul.Harrison at infotech.monash.edu.au  Thu Feb  5 04:20:52 2004
From: Paul.Harrison at infotech.monash.edu.au (Paul Harrison)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll
In-Reply-To: <4020D493.2420AE9C@cs.tu-berlin.de>
Message-ID:

On Wed, 4 Feb 2004, Alexander Löser wrote:

> Chord uses a consistent hash function which equally distributes keys to
> nodes. For a large number of keys, queries are somewhat load balanced.
> However, some keys are more popular than other keys, thus some nodes
> receive more queries, more messages are routed to those nodes, more
> bandwidth is used.
> An interesting approach would be to create cache
> entries of keys and objects along the lookup paths of successful
> queries.
>
> Does anybody know of experiences with this approach?
> Which approaches for query load balancing are known to you?

With a little work, you can probabilistically load balance common keys in a DHT. This is how Circle does it (Circle is a chord implementation, thecircle.org.au):

Each key in Circle consists of a 16 byte MD5 and an additional 4 random bytes. Thus to do a look-up you are not looking for a key so much as a very short span of the key space, i.e. [md5]00000000 through to [md5]ffffffff. This is no harder than a single-key look-up. Actual entries for a particular md5 will be scattered evenly along this span. This gives you the potential to split single keys across multiple nodes.

To take advantage of this, Circle chooses as node id some point in the middle of the span of one of the keys it published last session. So if a key is common, it is likely that several nodes will choose node ids in the middle of its span, thus splitting it across those nodes.

For each look-up you pick a random key within the span you want and work up and down the hashtable from there until you have enough results. This means each node within that span gets its fair share of the load.

Unlike caching schemes, this is fully scalable. A caching scheme still requires that at least one node knows of all the instances of a particular key.

Once you have this you can safely start adding entries for general concepts into your DHT, which is quite useful: "I am a person, you can talk to me", "I can cache data", "I'm willing to do distributed computing stuff", "I am part of some other specialized P2P network", etc.
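[Editor's note: the span-based scheme described above is easy to sketch. The following is a rough Python illustration of the idea only, not Circle's actual code: a published key is the 16-byte MD5 of the name plus 4 random bytes, and a lookup targets the whole [md5]00000000..[md5]ffffffff span.]

```python
import hashlib
import os

def publish_key(name):
    # Published key: the 16-byte MD5 of the name plus 4 random bytes,
    # so entries for the same name scatter evenly along a short span.
    return hashlib.md5(name.encode()).digest() + os.urandom(4)

def lookup_span(name):
    # A lookup asks for the whole span [md5]00000000 .. [md5]ffffffff
    # rather than a single point in the key space.
    md5 = hashlib.md5(name.encode()).digest()
    return md5 + b"\x00" * 4, md5 + b"\xff" * 4

lo, hi = lookup_span("britney")
# Every key ever published for "britney" falls inside the lookup span,
# whatever random trailing bytes each publisher chose.
assert all(lo <= publish_key("britney") <= hi for _ in range(100))
```

A node that wants to help carry a hot key would then pick its node id somewhere inside such a span, exactly as described above.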
cheers, Paul pfh@logarithmic.net | http://www.logarithmic.net/pfh Current cost to save one life: AU$300 / US$200 www.unicef.org www.oxfam.org From coderman at peertech.org Thu Feb 5 03:55:54 2004 From: coderman at peertech.org (coderman) Date: Sat Dec 9 22:12:37 2006 Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll In-Reply-To: References: Message-ID: <4021BECA.3040005@peertech.org> Paul Harrison wrote: >... >For each look-up you pick a random key within the span you want and work >up and down the hashtable from there until you have enough results. This >means each node within that span gets its fair share of the load. > >Unlike caching schemes, this is fully scalable. A caching scheme still >requires that at least one node knows of all the instances of a particular >key. > Maybe we are thinking of different kinds of loaded keys, but querying multiple nodes for a given entry would be worse than just querying the right node directly the first time. Very loaded keys (the search for the "britney" key as the canonical example) will overwhelm any individual node. It would be required to distribute the load across multiple nodes _but_ require only one hit to match. Traversing to multiple nodes in this situation only compounds the problem. Caching helps with the "only one hit required" portion of this requirement, but does suffer from the cache coherence problem you mention: you pay for management overhead associated with the caching. I'm not sure how this particular problem could be resolved in a DHT, although there are certainly techniques to minimize this as much as possible. 
From gcarreno at gcarreno.org  Thu Feb  5 04:31:54 2004
From: gcarreno at gcarreno.org (Gustavo Carreno)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll
In-Reply-To: <4021BECA.3040005@peertech.org>
References: <4021BECA.3040005@peertech.org>
Message-ID: <82109911694.20040205043154@gcarreno.org>

Hello coderman,

Thursday, February 5, 2004, 3:55:54 AM, you wrote:

c> Paul Harrison wrote:
>>Unlike caching schemes, this is fully scalable. A caching scheme still
>>requires that at least one node knows of all the instances of a particular
>>key.
>>
c> Maybe we are thinking of different kinds of loaded keys, but querying
c> multiple nodes for a given entry would be worse than just querying the
c> right node directly the first time.

I'm just a bystander with not much knowledge on DHTs nor on advanced P2P schemes, but if someone tells me that Circle is doing a similar search as the Fast Sort implementation and the cached scheme is more like a bubble sort kind, I'm picking the Fast Sort implementation.

From the little that I've understood till now of any chord implementation, it's quite "node dropping" resistant, and a cached scheme will have the need for _THAT_ node to exist or to pass a _HUGE_ amount of data to the next node just before getting out, assuming that it's not getting bumped due to a network outage.

So, "node dropping" resistance and a similarity to Fast Search, splitting the search "universe" in halves: is it not better than one central cache?

Please keep in mind that I'm the least knowledgeable person in matters of hashing, P2P and the likes, so if I'm just tripping on my tongue, you have the obligation to correct me and slap me on the face :)

Gustavo Carreno
-=[ "When you know Slackware you know Linux.
When you know Red Hat, all you know is Red Hat" ]=-

From Paul.Harrison at infotech.monash.edu.au  Thu Feb  5 13:57:50 2004
From: Paul.Harrison at infotech.monash.edu.au (Paul Harrison)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll
In-Reply-To: <4021BECA.3040005@peertech.org>
Message-ID:

On Wed, 4 Feb 2004, coderman wrote:

> Paul Harrison wrote:
>
> >...
> >For each look-up you pick a random key within the span you want and work
> >up and down the hashtable from there until you have enough results. This
> >means each node within that span gets its fair share of the load.

Gustavo, you compared this to bubble sort... reading it over, it is kind of unclear, I'm going to have to do a little braindump before I explain.

In a chord, each node has a random node id. Each node is responsible for the chunk of hashtable between that node id and the smallest node id greater than its node id on the network. So the hash-table is split across all the nodes in the network.

I'm assuming that each node knows how to contact its "neighbours". That is, the node with the largest node id less than its own node id, and the node with the smallest node id larger than its own node id. Nodes in a chord need to know their neighbours in order to maintain the network's structure.

With these links, it doesn't cost much to traverse the nodes in order. So once you've found one node that lies in the span you're looking for, it's cheap to explore the whole span by walking up and down it (like a linked list).

> >
> >Unlike caching schemes, this is fully scalable. A caching scheme still
> >requires that at least one node knows of all the instances of a particular
> >key.
> >
> Maybe we are thinking of different kinds of loaded keys, but querying
> multiple nodes for a given entry would be worse than just querying the
> right node directly the first time.
>

What I had in mind is a very common key, maybe even one that nearly everyone in the network publishes. So it's impractical for a single node to store every instance of it. But also, when you go to look it up, you might only need a few hundred hits to find what you wanted. If you're looking for Britney's latest hit, any one of the 1 billion available copies will do.

The fundamental concept is that a key need not be a single point. It can be made into a (very small) segment which is divisible into pieces. So there's no longer a single right node.

With the scheme I have in mind, you would hit one randomly chosen point in the span you want. For all but the most common keys a single node would be responsible for the whole span and things work exactly as in a normal chord. For common keys, it would still almost always give you the results you need, but if you want more you have the option of walking up and down the table to get the rest.

> Very loaded keys (the search for the "britney" key as the canonical
> example) will overwhelm any individual node. It would be required to
> distribute the load across multiple nodes _but_ require only one hit
> to match. Traversing to multiple nodes in this situation only
> compounds the problem.
>

I think we do have the same thing in mind.

> Caching helps with the "only one hit required" portion of this
> requirement, but does suffer from the cache coherence problem you
> mention: you pay for management overhead associated with the caching.
>
> I'm not sure how this particular problem could be resolved in a DHT,
> although there are certainly techniques to minimize this as much as
> possible.

You need some sort of redundancy in a DHT even without common keys; nodes can fail at any time. Publishing two versions of each key would be a start (maybe the md5 and the md5 with the top bit flipped).
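[Editor's note: the closing suggestion, publishing each entry under the md5 and under the md5 with its top bit flipped, amounts to placing two replicas half the key space apart on nodes that fail independently. A minimal sketch of that idea:]

```python
import hashlib

def replica_keys(name):
    # Two DHT keys for the same entry: the MD5 of the name, and the
    # same MD5 with its top bit flipped. The second key lands half the
    # key space away from the first, on an unrelated node.
    md5 = hashlib.md5(name.encode()).digest()
    flipped = bytes([md5[0] ^ 0x80]) + md5[1:]
    return md5, flipped

primary, backup = replica_keys("britney")
# The two keys agree everywhere except the single flipped bit.
assert primary[1:] == backup[1:]
assert primary[0] ^ backup[0] == 0x80
```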
cheers,
Paul

pfh@logarithmic.net | http://www.logarithmic.net/pfh
Current cost to save one life: AU$300 / US$200
www.unicef.org www.oxfam.org

From gcarreno at gcarreno.org  Thu Feb  5 16:55:46 2004
From: gcarreno at gcarreno.org (Gustavo Carreno)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll
In-Reply-To:
References:
Message-ID: <24154544463.20040205165546@gcarreno.org>

Hello Paul,

Thursday, February 5, 2004, 1:57:50 PM, you wrote:

PH> Gustavo, you compared this to bubble sort... reading it over, it is kind
PH> of unclear, I'm going to have to do a little braindump before I explain.

Well, I did say that I'm completely ignorant on the matter, but from the reading on this thread that I've done, that was the feeling that I had:

- Chord could be compared to a FastSort.
- The cached system could be compared with a BubbleSort.

Of course this is a VERY simplistic way of looking at it.

Gustavo Carreno
-=[ "When you know Slackware you know Linux.
When you know Red Hat, all you know is Red Hat" ]=-

From nl at essential.com.au  Fri Feb  6 04:40:59 2004
From: nl at essential.com.au (Nick Lothian)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] Re: P2P journal copyright
Message-ID:

> > Sam Joseph, P2P journal wrote:
> >We are trying to adjust our copyright policy to make everyone happy.
> >We are in the process of trying to improve the wording of the policy, and
> >I don't understand why you don't want to help us.
> >In fact here is the latest version of the copyright policy
> >"P2PJ wants to find a balance between author's copyright and ensuring
> >articles in P2PJ are unique. P2PJ prefers original, quality articles.
> >Quality is more important than quantity.
> >
> >By submitting the article to P2PJ, the author grants P2PJ and its
> >affiliated organizations and entities, perpetual, royalty-free, worldwide
> >licenses for both electronic and print formats.
> >Authors are granted rights
> >to reproduce their articles provided that they include a prominent
> >statement: 'The piece originally appeared in P2P Journal'. A clearly
> >visible link to http://p2pjournal.com must also be made.
>

A couple of interesting references on journal copyright (more directly related to problems with Elsevier publishing, but relevant nonetheless):

1) Donald Knuth: http://www-cs-faculty.stanford.edu/~knuth/joalet.pdf
2) Scientific Publishing: A Mathematician's Viewpoint: http://www.ams.org/notices/200007/forum-birman.pdf

Nick

From coderman at peertech.org  Fri Feb  6 06:08:53 2004
From: coderman at peertech.org (coderman)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll
In-Reply-To:
References:
Message-ID: <40232F75.6070504@peertech.org>

Paul Harrison wrote:

>What I had in mind is a very common key, maybe even one that nearly
>everyone in the network publishes.
...
>
>The fundamental concept is that a key need not be a single point. It can
>be made into a (very small) segment which is divisible into pieces. So
>there's no longer a single right node.
...
>
>I think we do have the same thing in mind.
>

Yes, this handles higher loads nicely. And it brings up another good question: how do the various DHTs handle malicious nodes trying to "spam" or flood a given key?

The easier it is for various nodes to assist or join in caching or handling a specific section of the keyspace, the easier it is to spam or flood popular keys with potentially bogus information. If you are addressing content that is self-certifying this is somewhat avoided, but for other types of lookup this would be problematic.

Any ideas?
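[Editor's note: the "self certifying" escape hatch coderman mentions is worth spelling out. When the key is the hash of the content itself, a flooder cannot plant bogus data under someone else's key, because the check below fails on retrieval. A sketch, assuming SHA-1 content addressing:]

```python
import hashlib

def is_self_certifying(key, data):
    # A fetched block is only accepted if it hashes back to the key it
    # was stored under; spam stored under a popular key fails this test.
    return hashlib.sha1(data).digest() == key

data = b"some published content"
key = hashlib.sha1(data).digest()
assert is_self_certifying(key, data)
assert not is_self_certifying(key, b"bogus flood data")
```

For keyword-style lookups, where the key is not derived from the value, no such check exists, which is exactly the problematic case coderman points at.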
From jdd at dixons.org  Fri Feb  6 14:44:58 2004
From: jdd at dixons.org (Jim Dixon)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll
In-Reply-To: <40232F75.6070504@peertech.org>
Message-ID: <20040206144123.F94769-100000@localhost>

On Thu, 5 Feb 2004, coderman wrote:

> How do the various DHT's handle malicious nodes trying to "spam" or flood
> a given key?

Most or all of the DHTs have well-defined user communities in which the spammer is readily identifiable. They can then be dealt with through administrative processes.

Typically the user has a digital certificate of some sort.

> The easier it is for various nodes to assist or join in caching or
> handling a specific section of the keyspace, the easier it is to spam
> or flood popular keys with potentially bogus information. If you are
> addressing content that is self certifying this is somewhat avoided,
> but for other types of lookup this would be problematic.

--
Jim Dixon  jdd@dixons.org  tel +44 117 982 0786  mobile +44 797 373 7881
http://jxcl.sourceforge.net       Java unit test coverage
http://xlattice.sourceforge.net   p2p communications infrastructure

From aloeser at cs.tu-berlin.de  Fri Feb  6 15:56:49 2004
From: aloeser at cs.tu-berlin.de (Alexander =?iso-8859-1?Q?L=F6ser?=)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll
References: <20040206144123.F94769-100000@localhost>
Message-ID: <4023B941.2C7D1C7D@cs.tu-berlin.de>

Hi all,

thank you so far for your feedback on query load balancing strategies in CHORD. I'd like to point you to an additional issue in this area: what kind of load balancing strategies exist for conjunctive queries in distributed hash tables?
Consider the following two terms hashed in the Chord index:

$AB1234=SHA-1("Style:Bossa Nova")
$DF2388=SHA-1("Composer:Gilberto")

$AB1234 value: URI http://"Girl from Ipanema"
$DF2388 value: URI http://"Girl from Ipanema"

and a lookup for songs from Gilberto in Bossa Nova Style ->

Lookup ($DF2388 AND $AB1234)

I know only one approach from Datta/Aberer, where popular query combinations are indexed temporarily in the Chord table with a new key. Does anybody know other approaches?

Alex

Jim Dixon wrote:

> On Thu, 5 Feb 2004, coderman wrote:
>
> > How do the various DHT's handle malicious nodes trying to "spam" or flood
> > a given key?
>
> Most or all of the DHTs have well-defined user communities in which the
> spammer is readily identifiable. They can then be dealt with through
> administrative processes.
>
> Typically the user has a digital certificate of some sort.
>
> > The easier it is for various nodes to assist or join in caching or
> > handling a specific section of the keyspace, the easier it is to spam
> > or flood popular keys with potentially bogus information. If you are
> > addressing content that is self certifying this is somewhat avoided,
> > but for other types of lookup this would be problematic.
>
> --
> Jim Dixon jdd@dixons.org tel +44 117 982 0786 mobile +44 797 373 7881
> http://jxcl.sourceforge.net Java unit test coverage
> http://xlattice.sourceforge.net p2p communications infrastructure
>
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers
> _______________________________________________
> Here is a web page listing P2P Conferences:
> http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

--
___________________________________________________________
M.Sc., Dipl. Wi.-Inf.
Alexander Löser
Technische Universitaet Berlin
Fakultaet IV - CIS
bmb+f-Projekt: "New Economy, Neue Medien in der Bildung"
hp: http://cis.cs.tu-berlin.de/~aloeser/
office: +49-30-314-25551
fax: +49-30-314-21601
___________________________________________________________

From anwitaman at hotmail.com  Mon Feb  9 11:30:46 2004
From: anwitaman at hotmail.com (Anwitaman Datta)
Date: Sat Dec 9 22:12:37 2006
Subject: [p2p-hackers] load-balancing/query-adaptive indexing/DHTs
Message-ID:

Hi Alex (and all),

Just to clarify one detail: our work was evaluated not for Chord, but for p-grid (www.p-grid.org); however, most of the ideas ought to be applicable to a wide variety of DHTs including Chord. Here are the two papers which Alex must be referring to, where we tackle some of the issues.

http://www.p-grid.org/Papers/TR-IC-2003-32.pdf
http://www.p-grid.org/Papers/TR-IC-2003-69.pdf

rgds,
A.

Message: 4
Date: Fri, 06 Feb 2004 16:56:49 +0100
From: Alexander Löser
Subject: Re: [p2p-hackers] Request for query load balancing approaches for the Chord protocoll
To: "Peer-to-peer development."
Message-ID: <4023B941.2C7D1C7D@cs.tu-berlin.de>
Content-Type: text/plain; charset=iso-8859-1

Hi all,

thank you so far for your feedback on query load balancing strategies in CHORD. I'd like to point you to an additional issue in this area: what kind of load balancing strategies exist for conjunctive queries in distributed hash tables?

Consider the following two terms hashed in the Chord index:

$AB1234=SHA-1("Style:Bossa Nova")
$DF2388=SHA-1("Composer:Gilberto")

$AB1234 value: URI http://"Girl from Ipanema"
$DF2388 value: URI http://"Girl from Ipanema"

and a lookup for songs from Gilberto in Bossa Nova Style ->

Lookup ($DF2388 AND $AB1234)

I know only one approach from Datta/Aberer, where popular query combinations are indexed temporarily in the Chord table with a new key. Does anybody know other approaches?
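[Editor's note: one way to read the Datta/Aberer idea cited above is that a popular conjunction gets its own derived index key. A hypothetical Python sketch; the sorting step, which makes the key independent of the order in which the terms are queried, is an assumption of this illustration, not something taken from the paper:]

```python
import hashlib

def term_key(term):
    # Ordinary single-term Chord key, e.g. SHA-1("Style:Bossa Nova").
    return hashlib.sha1(term.encode()).hexdigest()

def conjunction_key(*terms):
    # Temporary index key for a popular AND-combination: hash the sorted
    # single-term keys so that ($A AND $B) and ($B AND $A) collide.
    combined = "".join(sorted(term_key(t) for t in terms))
    return hashlib.sha1(combined.encode()).hexdigest()

# Both orderings of the Bossa Nova / Gilberto lookup resolve to one key:
k1 = conjunction_key("Style:Bossa Nova", "Composer:Gilberto")
k2 = conjunction_key("Composer:Gilberto", "Style:Bossa Nova")
assert k1 == k2
```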
Alex

From lgonze at panix.com  Wed Feb 11 18:38:54 2004
From: lgonze at panix.com (Lucas Gonze)
Date: Sat Dec 9 22:12:38 2006
Subject: [p2p-hackers] IFL
Message-ID: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com>

Dear Zooko,

The Google "I feel lucky" button is a counter-example to your conjecture that names cannot be all three of secure, decentralized and memorable.

Because "I feel lucky" is very accurate about the target object (for some names), it is secure (for some names). In real world usage of a web browser, the name "Dave Winer" most probably does identify the web site "http://scripting.com", and "I feel lucky" gets the answer to this question right.

Because "I feel lucky" does not require a centralized registry, it is decentralized. The centralized entity Google decides on name assertions like "Dave Winer"->http://scripting.com based on decentralized information sources. If other search engines also operated "I feel lucky" services, they would also probably come to the same conclusion about this name, so both the information sources and name resolution services are decentralized.

Because "I feel lucky" allows short bit strings to be used as names, it is memorable.

In this counter-example, I have added one concept to the three in your original essay; the idea of limiting which memorable names can be resolved in a way that is secure and decentralized. I hope you will agree that this is within the bounds of the problem.
best, sincerely, yours truly, etc,
Lucas

From sam at neurogrid.com  Wed Feb 11 20:57:40 2004
From: sam at neurogrid.com (Sam Joseph)
Date: Sat Dec 9 22:12:38 2006
Subject: [p2p-hackers] P2P workshops in Hawaii
Message-ID: <402A9744.3090005@neurogrid.com>

Hi All,

Pardon the cross-posting, but I just found some new P2P workshops in Hawaii with abstract deadlines coming up soon.

http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

CHEERS> SAM

From mccoy at mad-scientist.com  Wed Feb 11 21:57:27 2004
From: mccoy at mad-scientist.com (Jim McCoy)
Date: Sat Dec 9 22:12:38 2006
Subject: [p2p-hackers] IFL
In-Reply-To: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com>
References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com>
Message-ID: <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com>

On Feb 11, 2004, at 10:38 AM, Lucas Gonze wrote:
>
> The Google "I feel lucky" button is a counter-example to [Zooko's]
> conjecture that names cannot be all three of secure, decentralized and
> memorable.

I think that your liberal use of "(for some names)" betrays the weakness of this example. Google-bombing is a simple counter-example to your suggestion. I would propose that the "consensus opinion" nature of the google page ranking mechanism weakens any suggestion that it might be secure. It is not secure, but it is decentralized (google is effectively the world's largest online reputation system) and it can support memorable identifier tags.

If I use the google IFL button today for a query it will return a particular result. In a secure system I should be able to state with complete confidence that a similar query made one year hence will return the exact same result (or a result that the current "owner" of that name has delegated to.) Only if I was feeling very, very lucky would I make such a claim. To be secure I need to be able to use a "name" as a reference that I can hand to someone else and know that they will get the same answer.
The top-rank in a google search is a very ephemeral condition and it is subject to change without notice at seemingly random intervals. This hardly qualifies as secure.

Jim McCoy

From lgonze at panix.com  Wed Feb 11 22:39:40 2004
From: lgonze at panix.com (Lucas Gonze)
Date: Sat Dec 9 22:12:38 2006
Subject: [p2p-hackers] IFL
In-Reply-To: <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com>
References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com>
Message-ID:

On Wed, 11 Feb 2004, Jim McCoy wrote:

> I think that your liberal use of "(for some names)" betrays the
> weakness of this example.

"(for some names)" isn't a crutch, it's the meat of my idea.

> Google-bombing is a simple counter-example to your suggestion. I would
> propose that the "consensus opinion" nature of the google page ranking
> mechanism weakens any suggestion that it might be secure.

When Google-bombing is used to add a name for something, which is the usual situation, it doesn't necessarily destroy an old name, so doesn't affect security.

When it's used to switch a name from one target to another, the original target of the name had to have too little consensus behind it. You would have a really hard time convincing Google, the Internet Archive, Overture, and the MSN search engine that the name "Google" refers to "http://nytimes.com", for example.

The problem isn't with consensus in general, it is with which names you choose to use. This is up to the judgement of the user. If you're not so clever you'll use a bad consensus name like "John", if you are you'll use a good consensus name like "Blosxom."

> If I use the google IFL button today for a query it will return a
> particular result. In a secure system I should be able to state with
> complete confidence that a similar query made one year hence will
> return the exact same result (or a result that the current "owner" of
> that name has delegated to.)
> Only if I was feeling very, very lucky
> would I make such a claim. To be secure I need to be able to use a
> "name" as a reference that I can hand to someone else and know that
> they will get the same answer. The top-rank in a google search is a
> very ephemeral condition and it is subject to change without notice at
> seemingly random intervals. This hardly qualifies as secure.

Ok, so assume that you can only say with high certainty that a consensus name is true for today. Your name is then scoped to a specific date and can be looked up as the Google name for the day you used it.

That said, one-day-only names are not useful in the first place. You just shouldn't use them. But there are a lot of names likely to hold their place for a few years, long enough for a name to be dereferenced and converted to a long number.

What matters is how strong the consensus is. For some names there isn't enough, for others there is.

- Lucas

From bkn3 at columbia.edu  Wed Feb 11 23:21:14 2004
From: bkn3 at columbia.edu (Brad Neuberg)
Date: Sat Dec 9 22:12:38 2006
Subject: [p2p-hackers] IFL
In-Reply-To:
References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com>
Message-ID: <6.0.1.1.2.20040211151942.01e63438@pop.mail.yahoo.com>

Remember, though, that names don't just identify web sites. They can also identify email addresses ("joeschmoe@somewhere.com"), instant messaging endpoints, and phone numbers. While your example of using Google to skirt around Zooko's Triangle for identifying web sites and web pages is interesting, it would break down if you are using it to identify messaging endpoints for a particular user or group.

Brad Neuberg
bkn3@columbia.edu

At 02:39 PM 2/11/2004, you wrote:

>On Wed, 11 Feb 2004, Jim McCoy wrote:
> > I think that your liberal use of "(for some names)" betrays the
> > weakness of this example.
>
>"(for some names)" isn't a crutch, it's the meat of my idea.
> > > Google-bombing is a simple counter-example to your suggestion. I would > > propose that the "consensus opinion" nature of the google page ranking > > mechanism weakens any suggestion that it might be secure. > >When Google-bombing is used to add a name for something, which is the >usual situation, it doesn't necessarily destroy an old name, so doesn't >affect security. > >When it's used to switch a name from one target to another, the original >target of the name had to have too little consensus behind it. You would >have a really hard time convincing Google, the Internet Archive, Overture, >and the MSN search engine that the name "Google" refers to >"http://nytimes.com", for example. > >The problem isn't with consensus in general, it is with which names you >choose to use. This is up to the judgement of the user. If you're not so >clever you'll use a bad consensus name like "John", if you are you'll use >a good consensus name like "Blosxom." > > > If I use the google IFL button today for a query it will returns a > > particular result. In a secure system I should be able to state with > > complete confidence that a similar query made in one year hence will > > return the exact same result (or a result that the current "owner" of > > that name has delegated to.) Only if I was feeling very, very lucky > > would I make such a claim. To be secure I need to be able to use a > > "name" as a reference that I can hand to someone else and know that > > they will get the same answer. The top-rank in a google search is a > > very ephemeral condition and it is subject to change without notice at > > seemingly random intervals. This hardly qualifies as secure. > >Ok, so assume that you can only say with high certainty that a consensus >name is true for today. Your name is then scoped to a specific date and >can be looked up as the Google name for the day you used it. > >That said, one-day only names are not useful in the first place. You just >shouldn't use them. 
But there are a lot of names likely to hold their >place for a few years, long enough for a name to be dereferenced and >converted to a long number. > >What matters is how strong the consensus is. For some names there isn't >enough, for others there is. > >- Lucas > > > >_______________________________________________ >p2p-hackers mailing list >p2p-hackers@zgp.org >http://zgp.org/mailman/listinfo/p2p-hackers >_______________________________________________ >Here is a web page listing P2P Conferences: >http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From lgonze at panix.com Thu Feb 12 00:32:47 2004 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <6.0.1.1.2.20040211151942.01e63438@pop.mail.yahoo.com> References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com> <6.0.1.1.2.20040211151942.01e63438@pop.mail.yahoo.com> Message-ID: On Wed, 11 Feb 2004, Brad Neuberg wrote: > Remember, though, that names don't just identify web-sites. They can also > identify email addresses ("joeschmoe@somewhere.com"), instant messaging > endpoints, and phone-numbers. While your example of using Google to skirt > around Zookos Triangle for identifying web sites and web pages is > interesting, it would break down if you are using it to identify messaging > endpoints to a particular user or group. Can you articulate why it would break down, Brad? The thing about "I feel lucky" sorts of names is not so much that they are only useful for web pages, or at least that's not my intention. My thought is that IFL is a simple and concrete counter example of a class of names that acts differently from either ICANN names or self-authenticating names. This class of names will have about the same properties as verbal name systems. They'll be probabilistic. Speakers will have the burden of choosing a less ambiguous name if they want to address within a larger scope. 
The scope of a name will be flexible. The context in which a name is used will affect the likelihood that one object rather than another is the target. Names will change hands over time. "I feel lucky" itself is not exactly a perfect name resolver. Google attempts to answer the question "what web page is this person thinking of." It might also attempt to answer other questions like "what SIP listener?" or "what MX server?" It might learn to take context into consideration, ending up with a hybrid between pet names and consensus names. - Lucas > > Brad Neuberg > bkn3@columbia.edu > > At 02:39 PM 2/11/2004, you wrote: > > >On Wed, 11 Feb 2004, Jim McCoy wrote: > > > I think that your liberal use of "(for some names)" betrays the > > > weakness of this example. > > > >"(for some names)" isn't a crutch, it's meat of my idea. > > > > > Google-bombing is a simple counter-example to your suggestion. I would > > > propose that the "consensus opinion" nature of the google page ranking > > > mechanism weakens any suggestion that it might be secure. > > > >When Google-bombing is used to add a name for something, which is the > >usual situation, it doesn't necessarily destroy an old name, so doesn't > >affect security. > > > >When it's used to switch a name from one target to another, the original > >target of the name had to have too little consensus behind it. You would > >have a really hard time convincing Google, the Internet Archive, Overture, > >and the MSN search engine that the name "Google" refers to > >"http://nytimes.com", for example. > > > >The problem isn't with consensus in general, it is with which names you > >choose to use. This is up to the judgement of the user. If you're not so > >clever you'll use a bad consensus name like "John", if you are you'll use > >a good consensus name like "Blosxom." > > > > > If I use the google IFL button today for a query it will returns a > > > particular result.
In a secure system I should be able to state with > > > complete confidence that a similar query made in one year hence will > > > return the exact same result (or a result that the current "owner" of > > > that name has delegated to.) Only if I was feeling very, very lucky > > > would I make such a claim. To be secure I need to be able to use a > > > "name" as a reference that I can hand to someone else and know that > > > they will get the same answer. The top-rank in a google search is a > > > very ephemeral condition and it is subject to change without notice at > > > seemingly random intervals. This hardly qualifies as secure. > > > >Ok, so assume that you can only say with high certainty that a consensus > >name is true for today. Your name is then scoped to a specific date and > >can be looked up as the Google name for the day you used it. > > > >That said, one-day only names are not useful in the first place. You just > >shouldn't use them. But there are a lot of names likely to hold their > >place for a few years, long enough for a name to be dereferenced and > >converted to a long number. > > > >What matters is how strong the consensus is. For some names there isn't > >enough, for others there is. 
> > > >- Lucas From mccoy at mad-scientist.com Thu Feb 12 01:11:14 2004 From: mccoy at mad-scientist.com (Jim McCoy) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com> <6.0.1.1.2.20040211151942.01e63438@pop.mail.yahoo.com> Message-ID: <5A42F906-5CF8-11D8-90BE-000393071F50@mad-scientist.com> I guess I am having a hard time seeing how this differs from a collective "pet names" system (I am sure someone will fill in the ref to markm's original paper) with SDSI-like deferrals except for the fact that it introduces really horrible trust problems. The problem with a (SDSI-like) Google's IFL's "Dave Winer" is that the middle stage in this name chain is not very stable and so it is hard to place any real value on what is returned for the last key in the chain. While it is true that for some names this will end up being a semi-stable reference, for most others it fails to convey any context and leads to ambiguous answers. If we place any importance on this ephemeral link-weighting then it will simply raise the value of attacks upon it. How hard do you really think it would be to get the IFL result for Dave Winer to point to http://www.gonze.com?
Another problem with relying upon this sort of a power law system is that it degrades very quickly and dramatically once you get away from the big connectors. The more keywords/context we need to add to this "name" to disambiguate it, the less useful it becomes compared to using a real pet name. > "I feel lucky" itself is not exactly a perfect name resolver. Google > attempts to answer the question "what web page is this person thinking > of." No, Google is trying to answer the question "what web page has the highest rank in our system when linked to these specific keywords." Don't try to read any sort of AI-like intention into the PageRank algorithm; it just doesn't work that way. Want a simple example? Try this little IFL lookup: http://www.google.com/search?q=pet+names&btnI Jim From list at waterken.net Thu Feb 12 16:24:03 2004 From: list at waterken.net (Tyler Close) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> Message-ID: <200402120824.03646.list@waterken.net> On Wed February 11 2004 10:38 am, Lucas Gonze wrote: > The Google "I feel lucky" button is a counter-example to your > conjecture that names cannot be all three of secure, decentralized and > memorable. So if Google IFLs are used as identifiers, how does Google securely identify the result of an IFL lookup, without losing the decentralized property? Tyler -- The union of REST and capability-based security.
http://www.waterken.com/dev/Web/ From lgonze at panix.com Thu Feb 12 18:10:19 2004 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <5A42F906-5CF8-11D8-90BE-000393071F50@mad-scientist.com> Message-ID: On Wed, 11 Feb 2004, Jim McCoy wrote: > I guess I am having a hard time seeing how this differs from a > collective "pet names" system (I am sure someone will fill in the ref > to markm's original paper) with SDSI-like deferrals except for the fact > that it introduces really horrible trust problems. The problem with a > (SDSI-like) Google's IFL's "Dave Winer" is that the middle stage in > this name chain is not very stable and so it is hard to place any real > value on what is returned for the last key in the chain. That's an unusually broad way of interpreting pet names. A collective pet names system with SDSI-like deferrals would normally not be considered pet names at all. That said, SDSI is a good point of reference, along with PageRank, the PGP web of trust, and FOAF. The PGP web of trust is a pretty good place to look for RMS' public key, for example, and for most applications I would be happy to address his public key using a URI like pgpwot:"Richard M. Stallman". And this doesn't strike me as a horrible trust problem in any way. > While it is true that for some names this will end up being a > semi-stable reference, for most others it fails to convey any context > and leads to ambiguous answers. Summarizing the conversation at this point, the question is not whether consensus names are decentralized, secure and memorable, but whether they are "complex, limited, and risky." Complexity: PageRank and related algorithms are very well-known ground at this point. Risky: the most stable consensus names might have a lifetime of ten years. (E.g., "ietf" will probably continue to map to "http://ietf.org" for another ten years). The least stable secure hashes might have a lifetime of ten years.
Limited: given that the number of consensus names with acceptable stability is low, how can all the other objects that need names get them? Use members of the decentralized, secure, and memorable namespace as tree roots. Whatever the number of root nodes, the namespace would be less centralized than namespaces administered by ICANN or Verisign. > If we place any importance on this ephemeral link-weighting then it > will simply raise the value of attacks upon it. How hard do you really > think it would be to get the IFL result for Dave Winer to point to > http://www.gonze.com? Pretty hard, and that's assuming mappings as informal as (Dave Winer, http://scripting.com) even need to be used. Google rankings are under heavy attack by well-funded entities every day; despite this, top ranked items like the New York Times have been stable for a long time. ... A meta comment: I wrote my original note around Google "I feel lucky" because it's a simple way of expressing the idea. If you read that in an overly literal way you will miss the point. - Lucas Gonze From lgonze at panix.com Thu Feb 12 18:16:54 2004 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <200402120824.03646.list@waterken.net> Message-ID: On Thursday, Feb 12, 2004, at 11:24 America/New_York, Tyler Close wrote: > So if Google IFLs are used as identifiers, how does Google > securely identify the result of an IFL lookup, without losing the > decentralized property? You shouldn't use an IFL name that isn't consistent across search engines you trust. Any conscientious effort to use PageRank on the same well known set of crawl results (e.g. a snapshot of the web taken on May 5, 2003) should give the same result. If there is a problem with one name, let that name go. - Lucas From zooko at zooko.com Thu Feb 12 19:36:54 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? 
(was: IFL) In-Reply-To: Message from Lucas Gonze of "Wed, 11 Feb 2004 13:38:54 EST." <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> Message-ID: Wednesday morning I was thinking of writing an essay criticizing a widespread and sloppy use of the word "secure". Wednesday evening I saw, from Lucas Gonze's message to p2p-hackers, that I am eminently guilty of this sloppy usage. Sometimes people ask me "Is it possible to have a secure such-and-such?". The answer is always "It depends on what you mean by 'secure'.". It depends on what policy you wish to have securely enforced in this system. Two different people might ask for a "secure such-and-such", but have different and even mutually incompatible ideas of what policy will be enforced. It isn't surprising that people have different ideas about what is desirable in a system, but it is a little surprising when they assume that security implies their own desiderata. The word "security" is ripe with multiple possibilities. People sloppily use it to mean "the qualities that I desire are still present even if the system is under attack". They forget that other people desire other qualities. My essay "Names: Decentralized, Secure, Human-Meaningful: Choose Two" [1] blunders right into this confusion. In that essay I make a sweeping assertion about the realms of possibility without explaining clearly what I mean by "secure". In my defense, buried down in the fiddly bits at the bottom of the essay is a comment: """ The first step will be to specify what it *means* for the decentralized human-memorizable names to be secure, which is the same as specifying what universal policy should govern ownership of names. (For example, in the case of CHKs, to be secure means that you can't have a collision such that one CHK identifies two different bitstrings which, conveniently enough, is part of the definition of security for cryptographic hashes. 
In the case of SSKs, to be secure means that only the holder of the private key can change the mapping from a given SSK to its object, which is conveniently similar to the traditional notion of security for digital signatures.) """ However, I never defined what I meant by "secure" in general in that essay. I will now attempt to do so. By "a secure naming scheme" I mean that the scheme has referential integrity. A person, Alice, sends a message to another person, Bob. That message contains a name. Bob uses the naming system to de-reference that name, resulting in an object. "Referential integrity" means that nobody can cause the resulting object to be other than what Alice intended. There are a lot of implications of this which I understand only partly at this point. Anyway, I'll stop for now and send out this message. Regards, Zooko [1] http://zooko.com/distnames.html From zooko at zooko.com Thu Feb 12 20:21:20 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message from "Zooko O'Whielacronx" of "12 Feb 2004 14:36:54 EST." References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> Message-ID: [following up to my own post] Jack Lloyd wrote to me privately to point out that my definition of "referential integrity" might inadvertently include availability. > "Referential integrity" means that nobody can cause the resulting object to be > other than what Alice intended. I suggest that we follow the tradition of computer security and separate "violation of referential integrity" -- substituting a bogus object in place of the object that Alice meant -- from "denial of service", i.e. preventing Bob from getting any object. Regards, Zooko
(was: IFL) In-Reply-To: Message-ID: <17952616-5D9A-11D8-8D19-000393455590@panix.com> On Thursday, Feb 12, 2004, at 14:36 America/New_York, Zooko O'Whielacronx wrote: > Wednesday morning I was thinking of writing an essay criticizing a > widespread > and sloppy use of the word "secure". Wednesday evening I saw, from > Lucas > Gonze's message to p2p-hackers, that I am eminently guilty of this > sloppy usage. Can you articulate how that relates to my message, Zooko? Thanks. - Lucas From bkn3 at columbia.edu Thu Feb 12 20:29:59 2004 From: bkn3 at columbia.edu (Brad Neuberg) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <5A42F906-5CF8-11D8-90BE-000393071F50@mad-scientist.com> References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com> <6.0.1.1.2.20040211151942.01e63438@pop.mail.yahoo.com> <5A42F906-5CF8-11D8-90BE-000393071F50@mad-scientist.com> Message-ID: <6.0.1.1.2.20040212122926.01e5c4c8@pop.mail.yahoo.com> What I don't understand is that this scheme was offered as a way to resolve Zooko's Triangle. How is this Google-based system secure? Brad From bkn3 at columbia.edu Thu Feb 12 20:31:21 2004 From: bkn3 at columbia.edu (Brad Neuberg) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <5A42F906-5CF8-11D8-90BE-000393071F50@mad-scientist.com> References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com> <6.0.1.1.2.20040211151942.01e63438@pop.mail.yahoo.com> <5A42F906-5CF8-11D8-90BE-000393071F50@mad-scientist.com> Message-ID: <6.0.1.1.2.20040212123017.01e80250@pop.mail.yahoo.com> I think this is different than a pet-names system because it is a global name-space.
Pet names suffer from the fact that I can't open a browser on anyone's machine and enter www.cnn.com and be taken to the same place; it depends on what you've labeled with that particular pet name, or what someone else who you trust has labeled. The original Google scheme seems to be a global namespace created from everyone's links. Brad From bkn3 at columbia.edu Thu Feb 12 20:51:37 2004 From: bkn3 at columbia.edu (Brad Neuberg) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Zooko's Spectrum, not Zooko's Law Message-ID: <6.0.1.1.2.20040212123254.01ea7c78@pop.mail.yahoo.com> Zooko's Spectrum, not Zooko's Law --------------------------------------------------- A problem I have always had with Zooko's Law (decentralization, security, human-memorizable, pick two) is that he doesn't define what he means by those three terms. What does he mean by security? Security to me is a spectrum, from completely open to completely militarily locked down. What does he mean by human-memorizable? That goes all the way from extremely human friendly, such as "Brad Neuberg" to "brad@neuberg.com" to short identifiers like Compuserve used to have such as "234323432@compuserve.com" all the way to 128-bit hashes. That sure looks like a spectrum to me. Decentralization is also itself a spectrum. Systems such as Napster and BitTorrent are hybrid decentralized, while systems such as Gnutella are much more decentralized. Systems are a complex collection of pieces; some pieces can be centralized, while the rest are decentralized, as Napster and BitTorrent have shown. BitTorrent's trackers are relatively centralized, while the content streaming is decentralized. The goal is not to be religious on whether to centralize or decentralize, but to identify what your political, social, and business goals are in order to decentralize the bits that achieve these goals.
I agree that at their extreme, you can't have all three qualities, but that is an extreme statement. If each of these three qualities, decentralization, security, and human-friendly names, is a spectrum, then perhaps we can have all three if we slightly relax them. Call it Zooko's Spectrum, not Zooko's Law. You don't have to throw out all three, you just have to slightly relax one of them. So you can have human-friendly names and security, but you have to slightly relax the degree of decentralization in your system (but not throw it completely out). Or perhaps you can demand extreme decentralization and extreme security without throwing out human-friendliness, but slightly relax the human-friendly part (by having names that are short numerical GUIDs the length of phone-numbers but not the 128-bit GUIDs of FreeNet). The end result is you can have your cake and eat it too, if you decide to use carrot cake instead of flour. Decentralization, Security, Human-Friendly Names: a nuanced spectrum of choices that can't all be had 100% but can mostly be had if you slightly relax one of them. I've also posted this on my blog at http://www.codinginparadise.org if you have a response to this you want to post on your blog. Brad Neuberg bkn3@columbia.edu From jdd at dixons.org Thu Feb 12 20:58:46 2004 From: jdd at dixons.org (Jim Dixon) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message-ID: <20040212204923.E29957-100000@localhost> On 12 Feb 2004, Zooko O'Whielacronx wrote: > > "Referential integrity" means that nobody can cause the resulting object to be > > other than what Alice intended. > > I suggest that we follow the tradition of computer security and separate > "violation of referential integrity" -- substituting a bogus object in place of > the object that Alice meant -- from "denial of service", i.e. preventing Bob > from getting any object. "Any object" is a bit strong.
This wording implies that if Bob is prevented from 'getting' _any_ object, you do not have referential integrity. I think that what you mean to say is "referential integrity obtains if a reference can be resolved only in one way", that is, all objects obtained by resolving the reference in any manner are identical. This does not necessarily imply that the reference can be resolved. -- Jim Dixon jdd@dixons.org tel +44 117 982 0786 mobile +44 797 373 7881 http://jxcl.sourceforge.net Java unit test coverage http://xlattice.sourceforge.net p2p communications infrastructure From bkn3 at columbia.edu Thu Feb 12 20:57:00 2004 From: bkn3 at columbia.edu (Brad Neuberg) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Zooko's Spectrum, not Zooko's Law In-Reply-To: <6.0.1.1.2.20040212123254.01ea7c78@pop.mail.yahoo.com> References: <6.0.1.1.2.20040212123254.01ea7c78@pop.mail.yahoo.com> Message-ID: <6.0.1.1.2.20040212125633.01e5a590@pop.mail.yahoo.com> Posted this and realized that Zooko had described what he meant by security today. :) From lgonze at panix.com Thu Feb 12 21:05:59 2004 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <6.0.1.1.2.20040212122926.01e5c4c8@pop.mail.yahoo.com> Message-ID: <42359519-5D9F-11D8-8D19-000393455590@panix.com> On Thursday, Feb 12, 2004, at 15:29 America/New_York, Brad Neuberg wrote: > What I don't understand is that this scheme was offered as a way to > resolve Zooko's Triangle. How is this Google-based system secure? You have to say what you mean by secure for me to answer well. Some terminology: there is a name, a target being named, a referrer supplying the name, and a dereferencer consuming the name. The question of whether an attacker can substitute a bogus object when the name is dereferenced depends on which name and which time span. Over a hundred-year time span probably all names have to be qualified by the date of reference. Over a single day there are quite a few names which don't have to be qualified. Whether there are enough names with enough stability over the required lifespan of a name depends on the application. It is trivial to think of applications that these names work for and applications that they don't work for. The question of denial of service is different. This seems unlikely, since all trustworthy resolvers would have to be taken down. (What portion of resolvers? That is quantifiable but would take real research.) Lastly, I want to make a much broader point not related at all to whether my scheme resolves Zooko's triangle.
The interesting stuff here is what the hell names are, how they work, and how we can write programs to support better name systems in the digital world. When I say names I mean real names, the things that we intuitively understand because they evolved along with knowledge and self consciousness. That type of naming is very decentralized, secure enough given endless kludges, and has optimal memorability. All the factors are balanced. To create an optimal name system in the digital world means to study and leverage the ways in which people already use names. I believe this is a doable job in the relatively short term. That is the substance of what I am proposing. - Lucas From zooko at zooko.com Thu Feb 12 21:09:08 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message from Jim Dixon of "Thu, 12 Feb 2004 20:58:46 GMT." <20040212204923.E29957-100000@localhost> References: <20040212204923.E29957-100000@localhost> Message-ID: Jim, thank you for your comments. Jim Dixon wrote: > > > > "Referential integrity" means that nobody can cause the resulting object to be > > > other than what Alice intended. > > > > I suggest that we follow the tradition of computer security and separate > > "violation of referential integrity" -- substituting a bogus object in place of > > the object that Alice meant -- from "denial of service", i.e. preventing Bob > > from getting any object. > > "Any object" is a bit strong. This wording implies that if Bob is > prevented from 'getting' _any_ object, you do not have referential > integrity. I'm sorry -- I don't understand the objection. What I meant to say was simply that availability of the name service could be considered separately from correctness of the name service, where by correctness I mean that the resulting object is an object that Alice intended. 
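The CHK case from Zooko's essay excerpt makes this split between correctness and availability concrete: when the name is a cryptographic hash of the content, a substituted object is always detectable, while a withheld object is merely a denial of service. A minimal Python sketch of that check (the data and function names here are illustrative, not from any deployed system):

```python
import hashlib

def chk_name(content: bytes) -> str:
    # A CHK-style name: the name is a cryptographic hash of the content.
    return hashlib.sha256(content).hexdigest()

def dereference(name: str, fetched: bytes) -> bytes:
    # Bob accepts a fetched object only if it matches the name. An attacker
    # can withhold the object (denial of service) but cannot substitute a
    # bogus one without finding a hash collision.
    if hashlib.sha256(fetched).hexdigest() != name:
        raise ValueError("referential integrity violated")
    return fetched

doc = b"hello, p2p-hackers"
name = chk_name(doc)
assert dereference(name, doc) == doc  # the intended object resolves
```

A bogus object raises an error at dereference time, so violation of referential integrity and denial of service stay cleanly separated, as suggested above.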
> I think that what you mean to say is "referential integrity obtains if a > reference can be resolved only in one way", that is, all objects obtained > by resolving the reference in any manner are identical. This does not > necessarily imply that the reference can be resolved. Actually, that's too restrictive! Alice might want the name to resolve to a set of objects, where any one from that set is okay. For example, if a SIP URL resolves to a set of proxies, Bob should use whichever SIP proxy is currently available. However, Bob should *not* use a proxy inserted into the result by someone other than Alice. Also Alice might want the name to denote something that changes over time, so that if Bob resolves it once he gets one object, and if he resolves it again he might get another object. That can be tricky, because then denial-of-service can extend to "rollback attacks" where Bob is denied the new object and thus the name resolves to the old object. However in general a mapping from name to object which is time-variant or varies in other ways, or which is a one-to-many mapping, doesn't violate the principle of referential integrity. Regards, Zooko From bkn3 at columbia.edu Thu Feb 12 21:20:51 2004 From: bkn3 at columbia.edu (Brad Neuberg) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <42359519-5D9F-11D8-8D19-000393455590@panix.com> References: <6.0.1.1.2.20040212122926.01e5c4c8@pop.mail.yahoo.com> <42359519-5D9F-11D8-8D19-000393455590@panix.com> Message-ID: <6.0.1.1.2.20040212131846.01ec13d0@pop.mail.yahoo.com> > >Lastly, I want to make a much broader point not related at all to whether >my scheme resolves Zooko's triangle. The interesting stuff here is what >the hell names are, how they work, and how we can write programs to >support better name systems in the digital world. When I say names I mean >real names, the things that we intuitively understand because they evolved >along with knowledge and self consciousness.
That type of naming is very >decentralized, secure enough given endless kludges, and has optimal >memorability. All the factors are balanced. > >To create an optimal name system in the digital world means to study and >leverage the ways in which people already use names. I believe this is a >doable job in the relatively short term. That is the substance of what I >am proposing. I think I see where you're going with this. The "real-world" has converged on a system like you've described, and you're saying we can use this as an inspiration to solve these naming issues. Did you get to see the post on Zooko's Spectrum? The real world has solved these naming issues by slightly "relaxing" some of the security constraints (i.e. two people can have the same name) and by slightly relaxing the human friendliness (our names don't really mean anything, compared to some possible names which might be descriptive such as "The guy with red hair"). Brad >- Lucas > >_______________________________________________ >p2p-hackers mailing list >p2p-hackers@zgp.org >http://zgp.org/mailman/listinfo/p2p-hackers >_______________________________________________ >Here is a web page listing P2P Conferences: >http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From lgonze at panix.com Thu Feb 12 21:32:55 2004 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <6.0.1.1.2.20040212131846.01ec13d0@pop.mail.yahoo.com> Message-ID: <05425DE5-5DA3-11D8-8D19-000393455590@panix.com> On Thursday, Feb 12, 2004, at 16:20 America/New_York, Brad Neuberg wrote: > I think I see where you're going with this. The "real-world" has > converged on a system like you've described, and you're saying we can > use this as an inspiration to solve these naming issues. Exactly! > Did you get to see the post on Zooko's Spectrum? The real world has > solved these naming issues by slightly "relaxing" some of the security > constraints (i.e. 
two people can have the same name) and by slightly > relaxing the human friendliness (our names don't really mean anything, > compared to some possible names which might be descriptive such as > "The guy with red hair"). Yup! :-) (I always considered it "Zooko's Koan" myself, because the truth of it is less interesting than all the meaty issues under the surface.) From lloyd at randombit.net Thu Feb 12 21:38:02 2004 From: lloyd at randombit.net (Jack Lloyd) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message-ID: On 12 Feb 2004, Zooko O'Whielacronx wrote: > > I think that what you mean to say is "referential integrity obtains if a > > reference can be resolved only in one way", that is, all objects obtained > > by resolving the reference in any manner are identical. This does not > > necessarily imply that the reference can be resolved. > > Actually, that's too restrictive! Alice might want the name to resolve to a set > of objects, where any one from that set is okay. For example if a SIP URL > resolves to a set of proxies, and Bob should use whichever SIP proxy is > currently available. However, Bob should *not* use a proxy inserted into the > result by someone other than Alice. I think you could view the set as being a single object which is resolved, and let Bob pick whichever one from the set that he likes. Unless there is a situation where Bob should get any one element from the set but not any of the others, which I cannot think of offhand. In some cases it may make more sense for it to resolve to a single 'random' element for simplicity, of course. > Also Alice might want the name to denote something that changes over time, so > that if Bob resolves it once he gets one object, and if he resolves it again he > might get another object. 
That can be tricky, because then denial-of-service > can extend to "rollback attacks" where Bob is denied the new object and thus the > name resolves to the old object. However in general a mapping from name to > object which is time-variant or varies in other ways, or which is a one-to-many > mapping, doesn't violate the principle of referential integrity. Are there any systems that allow for this? I have been thinking of some cases where a (semi-)persistent name that points to time-dependent data would be useful. The only thing that comes to mind is generating a long random string, but that is not self-authenticating unless you include a signature or similar, unlike the more usual system of key = hash(content). And it allows for creating deliberate collisions, unlike the hash-based names which require you to break the hash first. I figure there has got to be a more elegant method. -J From zooko at zooko.com Thu Feb 12 21:44:13 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: Message from Lucas Gonze of "Thu, 12 Feb 2004 16:05:59 EST." <42359519-5D9F-11D8-8D19-000393455590@panix.com> References: <42359519-5D9F-11D8-8D19-000393455590@panix.com> Message-ID: Lucas Gonze wrote: > > Lastly, I want to make a much broader point not related at all to > whether my scheme resolves Zooko's triangle. The interesting stuff > here is what the hell names are, how they work, and how we can write > programs to support better name systems in the digital world. When I > say names I mean real names, the things that we intuitively understand > because they evolved along with knowledge and self consciousness. That > type of naming is very decentralized, secure enough given endless > kludges, and has optimal memorability. All the factors are balanced. Lucas: I still owe you a response to your original "IFL" message, but I wanted to make a comment about this much broader point. 
As Carl Ellison has argued, the kind of names that we used as our brains were evolving don't scale. Agriculture and urbanization arose about 10,000 years ago. Language probably evolved between 100,000 and 5,000,000 years ago. So for almost all of our evolutionary history we needed to remember only a handful of names for people. But I do strongly agree with the sentiment that a computer scientist attempting to devise a new tool should pay close attention to how the existing natural analog succeeds at its job. Mark Miller's "Pet Names" [1], and SDSI's "linked local namespaces" [2] are a beautiful hack to make a naming scheme that scales while allowing the use of names to be natural inasmuch as each individual user retains full control of his or her own local namespace. That is: I can say "Check out Dave's new movie!" and you can say "Check out Dave's new movie!" and we can mean different Daves and thus different movies. DNS and Google's "I Feel Lucky" both seem unnatural to me -- I have to say "Check out dsmithson954@aol.co.us's new movie!" ? Regards, Zooko [1] http://www.erights.org/elib/capability/pnml.html [2] http://citeseer.nj.nec.com/2379.html From zooko at zooko.com Thu Feb 12 21:52:40 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message from Jack Lloyd of "Thu, 12 Feb 2004 16:38:02 EST." References: Message-ID: Jack Lloyd wrote: > > > Also Alice might want the name to denote something that changes over time, so > > that if Bob resolves it once he gets one object, and if he resolves it again he > > might get another object. ... > Are there any systems that allow for this? Yes. The Self-Certifying File System [1] and Freenet [2]. Probably others! Not Mnet [3] yet, alas. 
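[Editor's illustration] A rough sketch of how a mutable-yet-self-certifying name can work, loosely in the spirit of SFS/Freenet but not the actual design of either. Real systems bind the name to a public key and sign each version; to keep this sketch stdlib-only, an HMAC stands in for the signature (which means verification here needs the secret, unlike a real public-key scheme -- a deliberate simplification). The sequence number illustrates one defense against the rollback attack mentioned earlier in the thread:

```python
import hashlib, hmac

# Stand-in for a keypair; in a real system the name would embed a hash of
# a *public* key and updates would carry a public-key signature.
signing_key = b"alice-secret-key-material"
name = hashlib.sha256(signing_key).hexdigest()  # the name itself never changes

def publish(content: bytes, seq: int) -> dict:
    # Alice authenticates each new version under the same stable name.
    tag = hmac.new(signing_key, str(seq).encode() + content, hashlib.sha256)
    return {"name": name, "seq": seq, "content": content, "sig": tag.hexdigest()}

def verify(record: dict, last_seen_seq: int) -> bytes:
    tag = hmac.new(signing_key,
                   str(record["seq"]).encode() + record["content"],
                   hashlib.sha256).hexdigest()
    if record["name"] != name or not hmac.compare_digest(record["sig"], tag):
        raise ValueError("integrity violation: not a valid version of this name")
    if record["seq"] < last_seen_seq:
        # Sequence numbers let Bob notice a rollback: being fed an old
        # version of the name's referent as if it were current.
        raise ValueError("rollback: older version than one already seen")
    return record["content"]

v1 = publish(b"version 1", seq=1)
v2 = publish(b"version 2", seq=2)  # the same name now denotes a new object
```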
Regards, Zooko [1] http://www.fs.net/sfswww/ [2] http://freenet.sourceforge.net/ [3] http://mnet.sf.net/ From zooko at zooko.com Thu Feb 12 22:05:57 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: Message from Brad Neuberg of "Thu, 12 Feb 2004 12:31:21 PST." <6.0.1.1.2.20040212123017.01e80250@pop.mail.yahoo.com> References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com> <6.0.1.1.2.20040211151942.01e63438@pop.mail.yahoo.com> <5A42F906-5CF8-11D8-90BE-000393071F50@mad-scientist.com> <6.0.1.1.2.20040212123017.01e80250@pop.mail.yahoo.com> Message-ID: Brad Neuberg wrote: > > I think this is different from a pet-names system because it is a global > name-space. Pet names suffer from the fact that I can't open a browser on > anyone's machine and enter www.cnn.com and be taken to the same place; it > depends on what you've labeled with that particular pet name, or what someone > else who you trust has labeled. I honestly consider that quality to be a benefit rather than a drawback. I know that most people disagree with me about this. I really wish that if a person wants to, he could make "cnn" map to: http://www.ce.unipr.it/pardis/CNN/cnn.html on his computer. But anyway, if you borrow his computer then maybe you can import your own namespace before you start browsing? Regards, Zooko From b.fallenstein at gmx.de Thu Feb 12 23:08:07 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: References: Message-ID: <402C0757.8000507@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Zooko O'Whielacronx wrote: | Jack Lloyd wrote: |>>Also Alice might want the name to denote something that changes over time, so |>>that if Bob resolves it once he gets one object, and if he resolves it again he |>>might get another object. | | ... 
| |>Are there any systems that allow for this? | | Yes. The Self-Certifying File System [1] and Freenet [2]. Probably others! | Not Mnet [3] yet, alas. I'm developing a replacement for HTTP-based addressing which resolves addresses through a P2P network. (We don't have a webpage up for it, shame on us.) In Storm, our system, we have something called a 'pointer,' which can have different 'current' versions over time, but old versions stay available as long as any peer keeps a copy. I keep my homepage in Storm; you can browse it at http://himalia.it.jyu.fi/~benja/, which will redirect you to an HTTP gateway to our system. The HTTP gateway automatically inserts a "History" link (as well as "This page is linked from...") in the upper-right corner of the page; the history is computed by retrieving all known versions of the pointer from the network. So that's kind of an example too -- the set of available versions can grow, and the 'current' version can change. Cheers, - - Benja -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFALAdWUvR5J6wSKPMRAkitAJ9krkeVMF+SqdfSRzCBBVQmOH3D+gCdGRa5 eHsvlgrmFKU3OO6/nB02nBI= =vLwJ -----END PGP SIGNATURE----- From zooko at zooko.com Thu Feb 12 23:23:59 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message from "Zooko O'Whielacronx" of "12 Feb 2004 16:52:40 EST." References: Message-ID: [following up to my own post] > > > Also Alice might want the name to denote something that changes over time, so > > > that if Bob resolves it once he gets one object, and if he resolves it again he > > > might get another object. > ... > > Are there any systems that allow for this? > > Yes. The Self-Certifying File System [1] and Freenet [2]. Probably others! > Not Mnet [3] yet, alas. Oh, and I'm embarrassed that I forgot the new YURL scheme [4]. 
YURL is designed to fit into the World-Wide Web. It has the same integrity guarantees, based on the same sorts of cryptography, that Freenet and SFS provide. Regards, Zooko > [1] http://www.fs.net/sfswww/ > [2] http://freenet.sourceforge.net/ > [3] http://mnet.sf.net/ [4] http://www.waterken.com/dev/YURL/ From clausen at gnu.org Thu Feb 12 23:51:55 2004 From: clausen at gnu.org (Andrew Clausen) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <6.0.1.1.2.20040212122926.01e5c4c8@pop.mail.yahoo.com> References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <47D50CA7-5CDD-11D8-90BE-000393071F50@mad-scientist.com> <6.0.1.1.2.20040211151942.01e63438@pop.mail.yahoo.com> <5A42F906-5CF8-11D8-90BE-000393071F50@mad-scientist.com> <6.0.1.1.2.20040212122926.01e5c4c8@pop.mail.yahoo.com> Message-ID: <20040212235155.GA534@gnu.org> On Thu, Feb 12, 2004 at 12:29:59PM -0800, Brad Neuberg wrote: > What I don't understand is that this scheme was offered as a way to resolve > Zooko's Triangle. How is this Google-based system secure? My answer to this is: it has a high cost of attack. That is, you have to either purchase many domain names or convince many high PageRank people (who in turn purchased many domain names - i.e. recursion ;) to link to you. See my thesis: http://members.optusnet.com.au/clausen/reputation/rep-cost-attack.pdf Cheers, Andrew From clausen at gnu.org Fri Feb 13 00:11:16 2004 From: clausen at gnu.org (Andrew Clausen) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <05425DE5-5DA3-11D8-8D19-000393455590@panix.com> References: <6.0.1.1.2.20040212131846.01ec13d0@pop.mail.yahoo.com> <05425DE5-5DA3-11D8-8D19-000393455590@panix.com> Message-ID: <20040213001115.GB534@gnu.org> On Thu, Feb 12, 2004 at 04:32:55PM -0500, Lucas Gonze wrote: > On Thursday, Feb 12, 2004, at 16:20 America/New_York, Brad Neuberg > wrote: > >I think I see where you're going with this. 
The "real-world" has > >converged on a system like you've described, and you're saying we can > >use this as an inspiration to solve these naming issues. > > Exactly! The real-world is very centralized, and isn't so inspiring, IMHO. When you trust Google's "I'm feeling lucky", you are trusting the domain name system. That is, that all domain names were purchased properly. Otherwise, you can Sybil-attack PageRank with a flood of "false" domain names at zero cost. An insider in Verisign might do this, say. So, I agree that it might be possible to use reputation to securely find objects with easy-to-remember names, but you need to bootstrap off something else you trust (eg: your friends' public keys). Cheers, Andrew From aloeser at cs.tu-berlin.de Fri Feb 13 14:30:40 2004 From: aloeser at cs.tu-berlin.de (Alexander =?iso-8859-1?Q?L=F6ser?=) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Distributed Inverted List References: <6.0.1.1.2.20040212131846.01ec13d0@pop.mail.yahoo.com> <05425DE5-5DA3-11D8-8D19-000393455590@panix.com> <20040213001115.GB534@gnu.org> Message-ID: <402CDF90.F75628AD@cs.tu-berlin.de> Hi all, I am searching for approaches to storing large inverted lists. Consider the case where many object IDs exist for a hash value. The object values are sorted. E.g.: HASH Object values $1234 | 123, 456, 678, 1020, 100002 $4566 | 123, 8755, 78899, 10000276 I'd like to store this hashtable in the form of a distributed hashtable. Unfortunately each node can only store two object values, but not the whole list of objects for a hash key. What approaches exist to store such an inverted list in a distributed manner in a DHT? Alex PS: Just as background: In my approach I will perform a join between the object values of the keys $1234 AND $4566. In the example the result of the join would be 123. -- ___________________________________________________________ M.Sc., Dipl. Wi.-Inf. 
Alexander Löser Technische Universitaet Berlin Fakultaet IV - CIS bmb+f-Projekt: "New Economy, Neue Medien in der Bildung" hp: http://cis.cs.tu-berlin.de/~aloeser/ office: +49- 30-314-25551 fax : +49- 30-314-21601 ___________________________________________________________ From b.fallenstein at gmx.de Fri Feb 13 14:53:27 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Distributed Inverted List In-Reply-To: <402CDF90.F75628AD@cs.tu-berlin.de> References: <6.0.1.1.2.20040212131846.01ec13d0@pop.mail.yahoo.com> <05425DE5-5DA3-11D8-8D19-000393455590@panix.com> <20040213001115.GB534@gnu.org> <402CDF90.F75628AD@cs.tu-berlin.de> Message-ID: <402CE4E7.8060204@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Alex, I don't really understand the question yet: why can the whole list not be the object value? Alternatively, why can't the individual items be the object values (you say there can only be two per key, why?) and the client does the sorting? I guess that you have a reason, I just don't understand it yet. Cheers, - - Benja Alexander Löser wrote: | Hi all, | I am searching for approaches to storing large inverted lists. Consider the case where | many object IDs exist for a hash value. The object values are sorted. E.g.: | | | HASH Object values | $1234 | 123, 456, 678, 1020, 100002 | $4566 | 123, 8755, 78899, 10000276 | | I'd like to store this hashtable in the form of a distributed hashtable. | Unfortunately each node can only store two object values, but not the whole | list of objects for a hash key. What approaches exist to store such an | inverted list in a distributed manner in a DHT? | | Alex | | PS: Just as background: In my approach I will perform a join between the | object values of the keys $1234 AND $4566. In the example the result of | the join would be 123. | | -- | ___________________________________________________________ | | M.Sc., Dipl. Wi.-Inf. 
Alexander Löser | Technische Universitaet Berlin Fakultaet IV - CIS | bmb+f-Projekt: "New Economy, Neue Medien in der Bildung" | hp: http://cis.cs.tu-berlin.de/~aloeser/ | office: +49- 30-314-25551 | fax : +49- 30-314-21601 | ___________________________________________________________ | | | _______________________________________________ | p2p-hackers mailing list | p2p-hackers@zgp.org | http://zgp.org/mailman/listinfo/p2p-hackers | _______________________________________________ | Here is a web page listing P2P Conferences: | http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences | | -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFALOTnUvR5J6wSKPMRAkflAKCJUgYKZwrMKEhvgSEzirxz/SgKbgCeLP1c YfuxE5Z94u+rVESwJtonfnA= =SPiL -----END PGP SIGNATURE----- From aloeser at cs.tu-berlin.de Fri Feb 13 15:24:37 2004 From: aloeser at cs.tu-berlin.de (Alexander =?iso-8859-1?Q?L=F6ser?=) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Distributed Inverted List References: <6.0.1.1.2.20040212131846.01ec13d0@pop.mail.yahoo.com> <05425DE5-5DA3-11D8-8D19-000393455590@panix.com> <20040213001115.GB534@gnu.org> <402CDF90.F75628AD@cs.tu-berlin.de> <402CE4E7.8060204@gmx.de> Message-ID: <402CEC35.37AB9983@cs.tu-berlin.de> Hi Benja, hi all, consider a number of documents, each document having a unique DocID. Further, each document is annotated with several metadata terms. I try to build a global index where you can look up each term and retrieve the matching documents. One example: DocID| Terms 123 | Bossa Nova 456 | Bossa Nova, Jobim 678 | Jobim 1020 | Bossa Nova 8755 | Jobim In my approach I hash each document term ("Bossa Nova" = $1234, "Jobim"=$4566 ) and store it in a DHT. 
Further, I invert the list above so I can look up terms and get the corresponding document IDs (by the way, this is very common in information retrieval and known as an inverted index or inverted list): HASHID | DocID $1234 | 123, 456, 1020 $4566 | 456, 678, 8755 Now I can query for documents containing either "Bossa Nova" (DocID= 123, 456, 1020) OR "Jobim" (DocID=456, 678, 8755) or, more interestingly, "Bossa Nova" AND "Jobim" (DocID=456). This works fine if each hash key and all DocIDs can be stored in a DHT on one physical node. However, if I try to store all 100 million documents related to Britney Spears, I can't possibly store all DocIDs at one physical node. My question is: What approaches are used to store a large number of DocIDs, belonging to the same hash key, on several physical nodes? Alex Benja Fallenstein wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi Alex, > > I don't really understand the question yet: why can the whole list not > be the object value? Alternatively, why can't the individual items be > the object values (you say there can only be two per key, why?) and the > client does the sorting? I guess that you have a reason, I just don't > understand it yet. > > Cheers, > - - Benja > > Alexander Löser wrote: > | Hi all, > | I am searching for approaches to storing large inverted lists. Consider the > case where > | many object IDs exist for a hash value. The object values are sorted. E.g.: > | > | > | HASH Object values > | $1234 | 123, 456, 678, 1020, 100002 > | $4566 | 123, 8755, 78899, 10000276 > | > | I'd like to store this hashtable in the form of a distributed hashtable. > | Unfortunately each node can only store two object values, but not the > whole > | list of objects for a hash key. What approaches exist to store such an > | inverted list in a distributed manner in a DHT? > | > | Alex > | > | PS: Just as background: In my approach I will perform a join > between the > | object values of the keys $1234 AND $4566. 
In the example the result of > | the join would be 123. > | > | -- > | ___________________________________________________________ > | > | M.Sc., Dipl. Wi.-Inf. Alexander Löser > | Technische Universitaet Berlin Fakultaet IV - CIS > | bmb+f-Projekt: "New Economy, Neue Medien in der Bildung" > | hp: http://cis.cs.tu-berlin.de/~aloeser/ > | office: +49- 30-314-25551 > | fax : +49- 30-314-21601 > | ___________________________________________________________ > | > | > | _______________________________________________ > | p2p-hackers mailing list > | p2p-hackers@zgp.org > | http://zgp.org/mailman/listinfo/p2p-hackers > | _______________________________________________ > | Here is a web page listing P2P Conferences: > | http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences > | > | > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.4 (GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > > iD8DBQFALOTnUvR5J6wSKPMRAkflAKCJUgYKZwrMKEhvgSEzirxz/SgKbgCeLP1c > YfuxE5Z94u+rVESwJtonfnA= > =SPiL > -----END PGP SIGNATURE----- > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > _______________________________________________ > Here is a web page listing P2P Conferences: > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences -- ___________________________________________________________ M.Sc., Dipl. Wi.-Inf. Alexander Löser Technische Universitaet Berlin Fakultaet IV - CIS bmb+f-Projekt: "New Economy, Neue Medien in der Bildung" hp: http://cis.cs.tu-berlin.de/~aloeser/ office: +49- 30-314-25551 fax : +49- 30-314-21601 ___________________________________________________________ From list at waterken.net Fri Feb 13 15:35:36 2004 From: list at waterken.net (Tyler Close) Date: Sat Dec 9 22:12:38 2006 Subject: The y-property (Was: [p2p-hackers] what did I mean by "secure"? 
(was: IFL)) In-Reply-To: References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> Message-ID: <200402130735.36230.list@waterken.net> I think what you are trying to get at is what I've called the y-property. See: http://www.waterken.com/dev/YURL/Definition/#The_y-property "" Briefly stated, the y-property is: "The introducer determines the message target." The y-property means that only the introducer has the privilege of determining the recipient of a message sent to the introduced site and the processor of the sent message. The introducer is the site authorized to write to a communication channel read by a client site. The introducer uses the communication channel to provide a URL to the client site. The URL identifies the introduced site. The introduced site is a site selected by the introducer. The client site is the site that receives the URL and uses it to send a message to the introduced site. Receiving a message means having access to the plaintext of the message. Processing a message means producing a response message which the client site will accept as an authentic response to the sent message. The y-property is the result of applying the principle of least privilege to the fact that the introducer decides which site to introduce. "" Tyler On Thu February 12 2004 11:36 am, Zooko O'Whielacronx wrote: > However, I never defined what I meant by "secure" in general in that essay. > I will now attempt to do so. > > By "a secure naming scheme" I mean that the scheme has referential > integrity. > > A person, Alice, sends a message to another person, Bob. That message > contains a name. Bob uses the naming system to de-reference that name, > resulting in an object. > > "Referential integrity" means that nobody can cause the resulting object to > be other than what Alice intended. > > There are a lot of implications of this which I understand only partly at > this point. Anyway, I'll stop for now and send out this message. 
> > Regards, > > Zooko > > [1] http://zooko.com/distnames.html > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > _______________________________________________ > Here is a web page listing P2P Conferences: > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences -- The union of REST and capability-based security. http://www.waterken.com/dev/Web/ From b.fallenstein at gmx.de Fri Feb 13 15:48:33 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Distributed Inverted List In-Reply-To: <402CEC35.37AB9983@cs.tu-berlin.de> References: <6.0.1.1.2.20040212131846.01ec13d0@pop.mail.yahoo.com> <05425DE5-5DA3-11D8-8D19-000393455590@panix.com> <20040213001115.GB534@gnu.org> <402CDF90.F75628AD@cs.tu-berlin.de> <402CE4E7.8060204@gmx.de> <402CEC35.37AB9983@cs.tu-berlin.de> Message-ID: <402CF1D1.3070400@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alex-- ok, I understand the problem better now. I don't have an answer for the scaling problem besides what has already been discussed on the list wrt storing large numbers of values for the same key. However, do remember that if you take that approach, you need to retrieve millions of values from the network to do your intersection and get the result list! "On the Feasibility of Peer-to-Peer Web Indexing and Search," by Jinyang Li, Boon Thau Loo, Joe Hellerstein, Frans Kaashoek, David R. Karger and Robert Morris, has some ideas about how to keep the bandwidth usage for intersection within bounds by using Bloom filters, but maybe you knew that already. Sorry for not being able to help with your actual question. 
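[Editor's illustration] The Bloom-filter idea can be sketched as follows. This is a toy illustration of the general technique, not the protocol or parameters from the Li et al. paper: one node summarizes its posting list in a small filter, the other node returns only the DocIDs that pass the filter, and a final exact check removes any false positives.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: an m-bit bitmap with k hash functions."""

    def __init__(self, m: int = 1024, k: int = 4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: int):
        # Derive k bit positions for an item from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: int):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: int) -> bool:
        # No false negatives; false positives possible.
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# Node A holds the posting list for "Bossa Nova", node B the one for "Jobim".
bossa_nova = [123, 456, 1020]
jobim = [456, 678, 8755]

# A sends B a compact filter instead of the whole posting list.
f = BloomFilter()
for doc_id in bossa_nova:
    f.add(doc_id)

# B replies with only the candidates that pass the filter: the true
# intersection plus possible false positives.
candidates = [d for d in jobim if f.might_contain(d)]

# A removes false positives with a final exact check.
result = [d for d in candidates if d in bossa_nova]
print(result)  # [456]
```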
- - Benja -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFALPHRUvR5J6wSKPMRAtyrAJ9K/4Z+TVsiuIjZ20xCM2bzxEg4NgCgyERX fE3slifAjb0NLMdeSGzsQL0= =hC3l -----END PGP SIGNATURE----- From Wolfgang.Mueller2 at uni-bayreuth.de Fri Feb 13 15:48:58 2004 From: Wolfgang.Mueller2 at uni-bayreuth.de (Wolfgang =?iso-8859-1?q?M=FCller?=) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Distributed Inverted List In-Reply-To: <402CDF90.F75628AD@cs.tu-berlin.de> References: <6.0.1.1.2.20040212131846.01ec13d0@pop.mail.yahoo.com> <20040213001115.GB534@gnu.org> <402CDF90.F75628AD@cs.tu-berlin.de> Message-ID: <200402131648.58275.wolfgang.mueller2@uni-bayreuth.de> Hi, Alex, There is a cute paper by Li et al. about the scalability of such approaches. http://citeseer.nj.nec.com/li03feasibility.html There are some others about replacing hot spots in "chords" by multiple nodes, but I do not have the references off the top of my head. Others in this list probably have. Cheers, Wolfgang From list at waterken.net Fri Feb 13 16:04:41 2004 From: list at waterken.net (Tyler Close) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: References: Message-ID: <200402130804.41436.list@waterken.net> That's not what I was getting at. I'll try to be clearer. The Google IFL system maps a name to a URL. The client then uses that URL to contact the target site. Currently, the returned URL is an http URL that is neither secure nor decentralized. The IFL query might also return an https URL, which may be considered secure, but is not decentralized. The only secure and decentralized identifier you propose is the IFL name itself, but you can't return that in response to an IFL query, or we get infinite recursion. The output of an IFL lookup must be an identifier that is itself secure and decentralized for the IFL system to be considered secure and decentralized. 
IFL requires a secure and decentralized identifier scheme in order to be a secure and decentralized naming system. Given a secure and decentralized identifier scheme, the need to make yet another one is less compelling. For economy of expression, let's use the term YURL for a secure and decentralized URI. I've defined this term at: http://www.waterken.com/dev/YURL/Definition/ Given the existence of a YURL scheme, the IFL system equates to what I've been calling a keyword service. A keyword service maps a human-memorable name to a YURL. The IFL system has the special property that the mapping it provides is not decided unilaterally, but by consensus. This property makes the IFL a useful form of keyword service. I propose that the particular use-cases you have been thinking about, eg: establishing car brands, are completely solved by keyword services and that we need not try to extend the IFL keyword service into a new kind of fragile YURL scheme. We could work through some example scenarios if you like. Tyler On Thu February 12 2004 10:16 am, Lucas Gonze wrote: > On Thursday, Feb 12, 2004, at 11:24 America/New_York, Tyler Close wrote: > > So if Google IFLs are used as identifiers, how does Google > > securely identify the result of an IFL lookup, without losing the > > decentralized property? > > You shouldn't use an IFL name that isn't consistent across search > engines you trust. Any conscientious effort to use PageRank on the > same well known set of crawl results (e.g. a snapshot of the web taken > on May 5, 2003) should give the same result. If there is a problem > with one name, let that name go. 
> > - Lucas > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > _______________________________________________ > Here is a web page listing P2P Conferences: > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences -- The union of REST and capability-based security. http://www.waterken.com/dev/Web/ From lgonze at panix.com Fri Feb 13 16:22:58 2004 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL In-Reply-To: <20040213001115.GB534@gnu.org> Message-ID: On Thursday, Feb 12, 2004, at 19:11 America/New_York, Andrew Clausen wrote: > The real-world is very centralized, and isn't so inspiring, IMHO. A not totally irrelevant digression: I'm not interested in real world naming only as an inspiration -- memorable program-generated names have to literally model real world naming. A computer model for handling names is exactly as good as it is similar to our cognitive processes. PicHunter*, for example, also works exactly as well as it matches human cognition. * http://www.pnylab.com/pny/papers/phj/main.html > When you trust Google's "I'm feeling lucky", you are trusting the > domain > name system. That is, that all domain names were purchased properly. > Otherwise, you can Sybil-attack PageRank with a flood of "false" domain > names at zero cost. An insider in Verisign might do this, say. > > So, I agree that it might be possible to use reputation to securely > find objects with easy-to-remember names, but you need to bootstrap > off something else you trust (eg: your friends' public keys). I'm going to think about this for a while before I have a comment. The first thing I'm going to think about is whether it's the same problem as Zooko's triangle or a different one. 
- Lucas From list at waterken.net Fri Feb 13 16:18:53 2004 From: list at waterken.net (Tyler Close) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: References: Message-ID: <200402130818.53552.list@waterken.net> On Thu February 12 2004 03:23 pm, Zooko O'Whielacronx wrote: > [following up to my own post] > > > > > Also Alice might want the name to denote something that changes over > > > > time, so that if Bob resolves it once he gets one object, and if he > > > > resolves it again he might get another object. > > > > ... > > > > > Are there any systems that allow for this? > > > > Yes. The Self-Certifying File System [1] and Freenet [2]. Probably > > others! Not Mnet [3] yet, alas. > > Oh, and I'm embarrassed that I forgot the new YURL scheme [4]. YURL is > designed to fit into the World-Wide Web. It has the same integrity > guarantees, based on the same sorts of cryptography, that Freenet and SFS > provide. YURL is actually the term I coined to define the property provided by all of these URL schemes, the y-property. See: http://www.waterken.com/dev/YURL/Definition/ There is a list of all known YURL schemes at: http://www.waterken.com/dev/YURL/#YURL_schemes (I don't have Freenet there yet, because I can't find a link to the URL specifications. Anyone got one?) The YURL scheme I created for the WWW is httpsy. See: http://www.waterken.com/dev/YURL/httpsy/ The YURL concept was derived from the cap YURL scheme used by the E language. AFAICT, the cap YURL scheme predates all others and is the originator of this field. Tyler -- The union of REST and capability-based security. http://www.waterken.com/dev/Web/ From zooko at zooko.com Fri Feb 13 17:26:29 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message from Tyler Close of "Fri, 13 Feb 2004 08:18:53 PST."
<200402130818.53552.list@waterken.net> References: <200402130818.53552.list@waterken.net> Message-ID: Tyler Close wrote: > > The YURL concept was derived from the cap YURL scheme used by the > E language. AFAICT, the cap YURL scheme predates all others and is > the originator of this field. Okay, now I'm even *more* embarrassed: in addition to not mentioning httpsy, I also didn't mention E: http://erights.org/ E is an object-oriented, garbage-collected programming language. References in E can refer to remote objects living on other computers across the network as well as to objects in the local virtual machine. E cryptographically enforces the rule of referential integrity: If Alice gives you a reference, then the resulting object is an object that was acceptable to Alice as the target of that reference. Like the other systems we've mentioned in this thread (except Mnet), E allows mutable content. The author of E is Mark Miller, who is also the author of the Pet Names Markup Language proposal, and who is also responsible for teaching me about these issues a few years ago. E references can be serialized in order to be stored, passed via e-mail, etc. The serialized form of an E reference is a non-human-meaningful string. (It is derived, like all of the systems mentioned, from the secure hash of a public key.) Regards, Zooko From lgonze at panix.com Fri Feb 13 18:10:39 2004 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL: memorability as overlay In-Reply-To: <200402130804.41436.list@waterken.net> Message-ID: Assume a (+secure +decentralized -memorable) namespace based on self-authenticating hashes, and also a decentralized body of hypertext documents with edges labeled according to the whim of the labeler.
In that case memorability can be added as an overlay without introducing centralization or insecurity, because PageRank applies equally to secure name schemes not backed by an authority: I sure do love the New York Times. A Sybil attack on PageRank by ICANN or Verisign would succeed in breaking names in their portion of the namespace, but names in the self-authenticating portion would continue to work. That addresses Andrew Clausen's point without introducing a second reputation network. If IFL names can resolve to (+secure +decentralized -memorable) names as well as (+secure -decentralized +memorable) names, that resolves Tyler Close's point without requiring IFL names to be recursive. I can now summarize my solution to Zooko's triangle: PageRank sometimes allows memorability to be added as an overlay on secure and decentralized but not memorable namespaces. When there is hypertext, it is possible for names to be all three of memorable, decentralized and secure. - Lucas From lgonze at panix.com Fri Feb 13 18:25:58 2004 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL: memorability as overlay In-Reply-To: Message-ID: <11711810-5E52-11D8-9EC6-000393455590@panix.com> > because PageRank applies equally to secure name schemes not backed by > an authority: Per Clausen's thesis, the integrity of PageRank is backed by the cash price of a name, so self-authenticating names are out. From lintao.liu at asu.edu Fri Feb 13 18:57:24 2004 From: lintao.liu at asu.edu (Lintao Liu) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Distributed Inverted List In-Reply-To: <200402131648.58275.wolfgang.mueller2@uni-bayreuth.de> Message-ID: <000001c3f263$388043c0$c3b2a995@LintaoLiu> Hi, I have thought about this problem and came up with a partial answer, which will appear in GP2PC. http://www.public.asu.edu/~kryu1/papers/KF-GP2PC-CR2.pdf The limitation of this method is: It is not designed for full text search.
The average number of keywords associated with each file shouldn't be too large, i.e. not more than 20. If each file has fewer than 10 keywords, this mechanism works great, based on our experiment. Here is some overhead analysis which is not included in the paper but should help to understand this limitation: For a single file f, K(f) is its keyword set. Let m = |K(f)| and n = |K(f) & Dictionary|. We assume that all synthetic keywords consist of only two prime keywords. (Based on the experiment, more than 90% of synthetic keywords follow this assumption.) The total number of replicas for this file is: Without dictionary: m (1) With dictionary: m - n + n(n-1)/2 = m + n(n-3)/2 (2) At first sight of (2), we can see that n cannot be too large. If we consider another interesting metric, n/m, we will find that m also plays an important role. Basically, n/m represents what percentage of total keyword occurrences would be removed. In other words, it shows how many generic keywords would benefit from our design. For example, suppose there are 100000 keyword occurrences in total and the top 10 keywords appear 30000 times. Putting these 10 keywords in the dictionary will make n/m = 30000/100000 = 0.3 (not exactly 0.3, but approximately). A larger n/m will help more generic keywords but generate more synthetic keywords. To some extent, n/m shows how much keyword fusion helps the system solve the imbalance problem. To keep the same ratio n/m, a larger m will mean a larger n, which is where the limitation comes from. Other overhead includes network traffic. Consider a stable system where a dictionary is already built and doesn't change much: the number of messages for each file is almost the same as the number of replicas. We can use the same mechanism for query processing, just replacing files with queries, and disk storage imbalance with network traffic imbalance.
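As a sanity check, formulas (1) and (2) can be sketched in a few lines of Python; the m and n values below are made-up examples for illustration, not numbers from our experiment:

```python
def replicas_without_dictionary(m):
    # Formula (1): one replica per keyword.
    return m

def replicas_with_dictionary(m, n):
    # Formula (2): the m - n non-dictionary keywords stay as-is, and the
    # n dictionary ("prime") keywords are fused into n(n-1)/2 synthetic
    # keywords, assuming each synthetic keyword pairs two primes.
    return m - n + n * (n - 1) // 2

# Small m and n keep the overhead modest...
print(replicas_with_dictionary(10, 3))   # 10 - 3 + 3 = 10
# ...but the replica count grows quadratically in n:
print(replicas_with_dictionary(25, 10))  # 15 + 45 = 60
```

Note that for n <= 3 formula (2) is no worse than formula (1), which matches the observation that n cannot be too large.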
We believe we can achieve the same goal but cause less overhead (because the popular keywords in queries may not be generic in files, which will generate fewer new replicas). Any comments are welcome. And we are doing new experiments to test our design. If you happen to have access to some query logs, we would really appreciate it if you could provide them. Best, Lintao -----Original Message----- From: p2p-hackers-bounces@zgp.org [mailto:p2p-hackers-bounces@zgp.org] On Behalf Of Wolfgang Müller Sent: Friday, February 13, 2004 8:49 AM To: Peer-to-peer development. Subject: Re: [p2p-hackers] Distributed Inverted List Hi, Alex, There is a cute paper by Li et al. about the scalability of such approaches. http://citeseer.nj.nec.com/li03feasibility.html There are some others about replacing hot spots in "chords" by multiple nodes, but I do not have the references off the top of my head. Others in this list probably have. Cheers, Wolfgang From jdd at dixons.org Fri Feb 13 19:01:59 2004 From: jdd at dixons.org (Jim Dixon) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message-ID: <20040213182808.M29957-100000@localhost> On 12 Feb 2004, Zooko O'Whielacronx wrote: > > > > "Referential integrity" means that nobody can cause the resulting object to be > > > > other than what Alice intended. > > > > > > I suggest that we follow the tradition of computer security and separate > > > "violation of referential integrity" -- substituting a bogus object in place of > > > the object that Alice meant -- from "denial of service", i.e. preventing Bob > > > from getting any object. > > > > "Any object" is a bit strong.
This wording implies that if Bob is > > prevented from 'getting' _any_ object, you do not have referential > > integrity. > > I'm sorry -- I don't understand the objection. What I meant to say was simply > that availability of the name service could be considered separately from > correctness of the name service, where by correctness I mean that the resulting > object is an object that Alice intended. Well, that's better, but I believe that Alice's intentions should not be considered. > > I think that what you mean to say is "referential integrity obtains if a > > reference can be resolved only in one way", that is, all objects obtained > > by resolving the reference in any manner are identical. This does not > > necessarily imply that the reference can be resolved. > > Actually, that's too restrictive! Alice might want the name to resolve to a set > of objects, where any one from that set is okay. For example, a SIP URL might > resolve to a set of proxies, and Bob should use whichever SIP proxy is > currently available. However, Bob should *not* use a proxy inserted into the > result by someone other than Alice. Reformulation: "referential integrity obtains if a reference can be resolved only in the manner determined by the algorithm under consideration", that is, any object obtained at any given time by resolving the reference in any manner belongs to a set of objects whose membership is determined by the algorithm under consideration and variables selected by that algorithm. The BBC used to (and probably still do) operate their domain name services in such a way that certain symbols would resolve differently depending upon when and where the question was being asked. If you were in Kansas, they wanted you to use their New York server farm, but if you were in Coventry, they directed you to London.
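That reformulation can be sketched as follows; the server addresses and regions here are invented, and the point is only that "correctness" is membership in the set the algorithm permits, not equality with any single intended value:

```python
# A resolver whose answers vary by client location, in the spirit of
# the BBC example; the addresses below are invented documentation IPs.
ALLOWED = {"bbc.example": {"198.51.100.1",    # New York server farm
                           "203.0.113.1"}}    # London server farm

def resolve(name, client_region):
    # The algorithm picks a farm based on where the question was asked.
    if name not in ALLOWED:
        return None
    return "198.51.100.1" if client_region == "US" else "203.0.113.1"

def has_referential_integrity(name, answer):
    # Integrity here is membership in the algorithm-determined set.
    return answer in ALLOWED.get(name, set())

# Kansas and Coventry get different answers, yet both have integrity.
assert has_referential_integrity("bbc.example", resolve("bbc.example", "US"))
assert has_referential_integrity("bbc.example", resolve("bbc.example", "UK"))
```

An answer outside the set (say, a proxy inserted by Mallory) fails the membership test even though it "resolves".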
> Also Alice might want the name to denote something that changes over time, so > that if Bob resolves it once he gets one object, and if he resolves it again he > might get another object. That can be tricky, because then denial-of-service > can extend to "rollback attacks" where Bob is denied the new object and thus the > name resolves to the old object. However in general a mapping from name to > object which is time-variant or varies in other ways, or which is a one-to-many > mapping, doesn't violate the principle of referential integrity. Consider a symbol x(t). Alice intends for this to be 1 during odd hours and 0 during even hours. Mallory blocks access to the symbol during even hours. Bob has no trouble resolving the symbol during odd hours, but during even hours he has to fall back on the last value he could resolve to. In other words, to Bob, the value of x is always 1. Your wording seems to imply that x(t) has referential integrity even though Bob's understanding of the symbol is wrong half the time. Alice has too much to drink one evening and alters the software on her server. In consequence x(t) is always 1, although she intended for it to be 1 during odd hours and 0 during even hours. In this case your wording implies that there is a lack of referential integrity, because the symbol doesn't resolve to what she intended. I submit that there isn't. I think that (a) on closer examination the notions of referential integrity and availability are quite hard to disentangle and (b) certainly the intentions of the designer should not be relevant to considerations of referential integrity. -- Jim Dixon jdd@dixons.org tel +44 117 982 0786 mobile +44 797 373 7881 http://jxcl.sourceforge.net Java unit test coverage http://xlattice.sourceforge.net p2p communications infrastructure From zooko at zooko.com Fri Feb 13 19:27:57 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? 
(was: IFL) In-Reply-To: Message from Jim Dixon of "Fri, 13 Feb 2004 19:01:59 GMT." <20040213182808.M29957-100000@localhost> References: <20040213182808.M29957-100000@localhost> Message-ID: Jim Dixon wrote: > > Your wording seems to imply that x(t) has referential integrity even > though Bob's understanding of the symbol is wrong half the time. You're right. My current definition of referential integrity does not enable Alice to require that the object changes at specific times, even though it does enable Alice to change the object. This may be surprising, but I chose it because it is not (as far as I know) possible to securely enforce the former, but it is possible to securely enforce the latter, modulo roll-back attacks. > I think that (a) on closer examination the notions of referential > integrity and availability are quite hard to disentangle and (b) certainly > the intentions of the designer should not be relevant to considerations of > referential integrity. As to (a), I agree. As to (b), I didn't mean the designer of the system, I meant the speaker -- the one who utters the name. The notion of referential integrity that I am promulgating states that this person has the sole authority to determine what object results from the dereferencing of the name. Regards, Zooko From jdd at dixons.org Sat Feb 14 05:35:34 2004 From: jdd at dixons.org (Jim Dixon) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: Message-ID: <20040214044913.F29957-100000@localhost> On 13 Feb 2004, Zooko O'Whielacronx wrote: > > Your wording seems to imply that x(t) has referential integrity even > > though Bob's understanding of the symbol is wrong half the time. > > You're right. My current definition of referential integrity does not enable > Alice to require that the object changes at specific times, even though it does > enable Alice to change the object. 
This may be surprising, but I chose it > because it is not (as far as I know) possible to securely enforce the former, > but it is possible to securely enforce the latter, modulo roll-back attacks. Earlier in this thread you introduced the notions of time and caching. I added location to the mix. In the domain name system each of these -- time, caching, location -- is a factor. Bob connects to the Internet through a communications channel. He asks that a string (a.xyz.org) be resolved to an IP address. There may be an arbitrary number of xyz.org servers located at various points on the Internet. One of these will be authoritative. There will also be a number of caching name servers between Bob and the xyz.org servers. None of these is authoritative. When the string a.xyz.org is presented for resolution to the authoritative name server, it has two options: it can reply NULL (the string does not resolve), or it can reply with an IP address. Either might depend upon the time, the location of the query, any other factor known to the resolver (temperature, pressure, etc.), or be random to a degree. The resolver's reply includes suggestions as to how the resolution (the information returned) should be used, specifically time-to-live. This complex context should be used to test your idea of referential integrity. Certainly any resolver needs to return not just the value that the symbol resolves to, but also whether the value was cached and if so when. It also needs to say whether or not it is authoritative, although precisely what "authoritative" means needs some further examination. > > I think that (a) on closer examination the notions of referential > > integrity and availability are quite hard to disentangle and (b) certainly > > the intentions of the designer should not be relevant to considerations of > > referential integrity. > > As to (a), I agree. As to (b), I didn't mean the designer of the system, > I meant the speaker -- the one who utters the name.
The notion of referential > integrity that I am promulgating states that this person has the sole authority > to determine what object results from the dereferencing of the name. What person? What is the significance of "authority"? How is authority relevant to referential integrity? In my world, which is not at all unusual, I work behind a firewall. I control several domains. Names resolve differently depending upon which side of the firewall you are on. Certain names are undefined on one side of the firewall but defined on the other. Other names resolve to different IP addresses depending upon which side of the firewall you are on. The resolvers certainly resolve names differently at different times. When I configure my name servers incorrectly, those name servers are behaving correctly (they have 'referential integrity') when the object resulting from their dereferencing of a name is not what I intended. This is because it is the name server which is authoritative, not me. Any other interpretation of the situation leads only to confusion. -- Jim Dixon jdd@dixons.org tel +44 117 982 0786 mobile +44 797 373 7881 http://jxcl.sourceforge.net Java unit test coverage http://xlattice.sourceforge.net p2p communications infrastructure From list at waterken.net Sat Feb 14 06:48:55 2004 From: list at waterken.net (Tyler Close) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL: memorability as overlay In-Reply-To: <11711810-5E52-11D8-9EC6-000393455590@panix.com> References: <11711810-5E52-11D8-9EC6-000393455590@panix.com> Message-ID: <200402132248.55559.list@waterken.net> On Fri February 13 2004 10:25 am, Lucas Gonze wrote: > > because PageRank applies equally to secure name schemes not backed by > > an authority: > > Per Clausen's thesis, the integrity of PageRank is backed by the cash > price of a name, so self-authenticating names are out. Then so is your contradiction of Zooko's Triangle.
In your proof, PageRank got its decentralized property from the underlying self-authenticating pointers. Take away the self-authenticating pointers, and you take away decentralization. Although not itself centralized, PageRank depends upon a centralized infrastructure. In particular, PageRank depends upon the artificial scarcity, and resultant cost, of identity (ie: having a hostname). Tyler -- The union of REST and capability-based security. http://www.waterken.com/dev/Web/ From hopper at omnifarious.org Mon Feb 16 14:23:54 2004 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] An Analysis of Compare by Hash Message-ID: <1076941433.26007.6.camel@monster.omnifarious.org> Many people here might already be aware of this interesting paper, but I thought I'd post it anyway, since I hadn't seen any mention of it. http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/hash.html This is directly applicable to many of the projects people talk about here. Have fun (if at all possible), -- The best we can hope for concerning the people at large is that they be properly armed. -- Alexander Hamilton -- Eric Hopper (hopper@omnifarious.org http://www.omnifarious.org/~hopper) -- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040216/8740bdda/attachment.pgp From zooko at zooko.com Mon Feb 16 15:18:47 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] An Analysis of Compare by Hash In-Reply-To: Message from "Eric M. Hopper" of "Mon, 16 Feb 2004 06:23:54 PST." 
<1076941433.26007.6.camel@monster.omnifarious.org> References: <1076941433.26007.6.camel@monster.omnifarious.org> Message-ID: > http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/hash.html The paper is wrong on a factual count, plus I disagree with the engineering intuitions. I'll just point out the factual part here. "Users of compare-by-hash argue that this assumption is warranted because the chance of a hash collision between any two randomly generated blocks is estimated to be many orders of magnitude smaller than the chance of many kinds of hardware errors." There is no recourse to "randomly generated blocks" in the design and analysis of cryptographic hashes like SHA-1. The hypothetical adversary who seeks to cause collisions is allowed to generate the pre-images however he likes, including making them related, adaptively computing them, doing birthday-collision attacks, ad infinitum. SHA-1 and the other crypto hashes have been explicitly designed and evaluated under that assumption, *not* under some kind of "random inputs" assumption [1]. Henson's suggestion that perhaps one could find collisions in SHA-1 by using non-random inputs is pure speculation, and appears to have been written in ignorance of the relevant research. Graydon Hoare has written a more detailed response to Henson: http://www.venge.net/monotone/docs/Hash-Integrity.html#Hash%20Integrity Hopefully Graydon's note will be cited wherever Henson's paper is.
Regards, Zooko [1] As an example of how cryptographers have not allowed their work to rest on this extremely strong assumption about random distribution of inputs, consider that they have previously used a weaker assumption called "hash function balance", and that they have recently questioned even this assumption: "Hash Function Balance and its Impact on Birthday Attacks" (2002) Mihir Bellare, Tadayoshi Kohno http://citeseer.nj.nec.com/bellare02hash.html From hopper at omnifarious.org Mon Feb 16 21:36:22 2004 From: hopper at omnifarious.org (Eric Mathew Hopper) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] An Analysis of Compare by Hash In-Reply-To: References: <1076941433.26007.6.camel@monster.omnifarious.org> Message-ID: <20040216213622.GA21307@omnifarious.org> On Mon, Feb 16, 2004 at 10:18:47AM -0500, Zooko O'Whielacronx wrote: > > > http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/hash.html > > The paper is wrong on a factual count, plus I disagree with the > engineering intuitions. I'll just point out the factual part here. > > "Users of compare-by-hash argue that this assumption is warranted > because the chance of a hash collision between any two randomly > generated blocks is estimated to be many orders of magnitude smaller > than the chance of many kinds of hardware errors." > > There is no recourse to "randomly generated blocks" in the design and > analysis of cryptographic hashes like SHA-1. This, I totally agreed with, and it's a major flaw of that paper. It confused a lot of people in the forum that was talking about Monotone. I also agree that his engineering intuition is wrong. He seems to appreciate what exponents really mean, but then goes on to demonstrate that he doesn't. Though, my own back-of-the-envelope calculations, where I assume one document for every electron in the ocean, indicate that SHA-1 is possibly inadequate for the needs of a globally distributed filesystem for the indefinite future.
But they also indicate that SHA-256 is more than adequate (though it's also significantly less well proven). :-) > Graydon Hoare has written a more detailed response to Henson: > > http://www.venge.net/monotone/docs/Hash-Integrity.html#Hash%20Integrity > > Hopefully Graydon's note will be cited whereever Henson's paper is. It should be. Thanks for pointing it out. One thing that I think the first paper usefully points out though is that some systems are dependent in a very fundamental way on the evenness of the distribution of SHA-1 output. And it is good to keep in mind this dependency when designing them. Have fun (if at all possible), -- "It does me no injury for my neighbor to say there are twenty gods or no God. It neither picks my pocket nor breaks my leg." --- Thomas Jefferson "Go to Heaven for the climate, Hell for the company." -- Mark Twain -- Eric Hopper (hopper@omnifarious.org http://www.omnifarious.org/~hopper) -- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040216/2b429d8d/attachment.pgp From hopper at omnifarious.org Tue Feb 17 02:30:27 2004 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:38 2006 Subject: The y-property (Was: [p2p-hackers] what did I mean by "secure"? (was: IFL)) In-Reply-To: <200402130735.36230.list@waterken.net> References: <8B690778-5CC1-11D8-BA3F-000393455590@panix.com> <200402130735.36230.list@waterken.net> Message-ID: <1076985026.26007.21.camel@monster.omnifarious.org> On Fri, 2004-02-13 at 07:35, Tyler Close wrote: > I think what you are trying to get at is what I've called the > y-property. See: > > http://www.waterken.com/dev/YURL/Definition/#The_y-property > > "" > Briefly stated, the y-property is: "The introducer determines the message > target." Yes, Zooko's statement and this one are equivalent. -- Eric M. 
Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040216/7523969d/attachment.pgp From hopper at omnifarious.org Tue Feb 17 02:44:48 2004 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] what did I mean by "secure"? (was: IFL) In-Reply-To: <20040213182808.M29957-100000@localhost> References: <20040213182808.M29957-100000@localhost> Message-ID: <1076985887.26007.27.camel@monster.omnifarious.org> On Fri, 2004-02-13 at 11:01, Jim Dixon wrote: > I think that (a) on closer examination the notions of referential > integrity and availability are quite hard to disentangle and (b) certainly > the intentions of the designer should not be relevant to considerations of > referential integrity. It's pretty easy for Bob to determine whether or not his information is 'live'. Perhaps that should be added as a phrase: "Any request for an object that returns a live value returns what Alice intended." I submit that determining whether or not Alice's intentions are 'real' (i.e. she's drunk or under coercion) is beyond the scope of the definition. Have fun (if at all possible), -- The best we can hope for concerning the people at large is that they be properly armed. -- Alexander Hamilton -- Eric Hopper (hopper@omnifarious.org http://www.omnifarious.org/~hopper) -- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040216/d434a499/attachment.pgp From hopper at omnifarious.org Tue Feb 17 02:51:14 2004 From: hopper at omnifarious.org (Eric M.
Hopper) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL: memorability as overlay In-Reply-To: <200402132248.55559.list@waterken.net> References: <11711810-5E52-11D8-9EC6-000393455590@panix.com> <200402132248.55559.list@waterken.net> Message-ID: <1076986274.26007.31.camel@monster.omnifarious.org> On Fri, 2004-02-13 at 22:48, Tyler Close wrote: > Although not itself centralized, PageRank depends upon a > centralized infrastructure. In particular, PageRank depends upon > the artificial scarcity, and resultant cost, of identity (ie: > having a hostname). No hostname is required, just an IP. From what I know, PageRank would still work if everybody used IP addresses instead of hostnames in URLs. Though, that would break half the websites out there, as many use virtual hosting. Though, I suppose you could say that having an IP that doesn't change over time is costly and therefore a scarce commodity. -- The best we can hope for concerning the people at large is that they be properly armed. -- Alexander Hamilton -- Eric Hopper (hopper@omnifarious.org http://www.omnifarious.org/~hopper) -- -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040216/c7bf4aef/attachment.pgp From sam at neurogrid.com Tue Feb 17 03:03:05 2004 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Re: P2P journal copyright In-Reply-To: <5.0.2.1.1.20040131100728.00a3f650@pop.home.se> References: <20040131030211.GF20611@lycopodium> <401AF6B8.6050902@neurogrid.com> <20040131030211.GF20611@lycopodium> <5.0.2.1.1.20040131100728.00a3f650@pop.home.se> Message-ID: <40318469.1090508@neurogrid.com> Hi David, I've finally got things together to read all the mails and related documents - apologies for the delay.
David Göthberg wrote: > Sam Joseph, p2pjournal.com: Your new version of your copyright > agreement is much more agreeable. However I think it does create a > whole set of legal problems and uncertainties for both parties. So I > have some suggestions. > > Stating that the p2pjournal gets a "license" and then that "Authors > are granted rights to reproduce" makes it very unclear who owns what. > The wording you have chosen for instance might make it illegal for any > of the parties to resell the text! That is, your wording gives both > parties the right to reproduce the text, but not to sell copies of it > or resell the rights to it. > > I think it is better and easier to give both parties one complete > copyright and ownership of the text. That is, to copy the copyright! :) > > Here's a rough translation (from memory) and adaptation of the > copyright agreement we used for the paintings my mother bought for the > book she wrote. Lawyers in Sweden thought this was a very nice idea > and they could see no legal problem with it: Thanks for your input here David - although I have to say that since I am not a lawyer I don't really know whether our new wording or your new suggestion would lead to more or less legal complications ... I also had an email from Johan Fange suggesting that perhaps the copyright should be of limited duration, and of course we also had input to the list from Nick Lothian with links to some other journals that have had to deal with these issues. Many thanks to Nick for those links. Actually I think that the debate Nick is linking us to is more about the cost of journals than the copyright, and since P2PJournal is free, perhaps that debate is not so relevant. However through those links I did find the copyright approach of the American Mathematical Society http://www.ams.org/authors/ctp.html which seems to be the kind of agreement that Knuth and co were advocating.
Anyways, I think we have to update P2PJ's current pages on copyright in some fashion to stop Don flaming my next CFP, so I'm going to suggest we go with Ray's wording for the moment, and try and get some legal consultation about what would really achieve the two goals of satisfying all the authors' and the journal's requirements, because otherwise this is all a bit meaningless. Thanks to everyone for their input on this. CHEERS> SAM From clausen at gnu.org Tue Feb 17 03:55:12 2004 From: clausen at gnu.org (Andrew Clausen) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] IFL: memorability as overlay In-Reply-To: <1076986274.26007.31.camel@monster.omnifarious.org> References: <11711810-5E52-11D8-9EC6-000393455590@panix.com> <200402132248.55559.list@waterken.net> <1076986274.26007.31.camel@monster.omnifarious.org> Message-ID: <20040217035512.GC580@gnu.org> On Mon, Feb 16, 2004 at 06:51:14PM -0800, Eric M. Hopper wrote: > On Fri, 2004-02-13 at 22:48, Tyler Close wrote: > > Although not itself centralized, PageRank depends upon a > > centralized infrastructure. In particular, PageRank depends upon > > the artificial scarcity, and resultant cost, of identity (ie: > > having a hostname). > > No hostname is required, just an IP. From what I know, PageRank would > still work if everybody used IP addresses instead of hostnames in URLs. PageRank requires you to allocate "initial karma" to some subset of pages. You could use the domain system, IP addresses, the Open Directory, or any combination of these to seed this. I am not aware of any evidence of what Google actually does use. > Though, that would break half the websites out there as many use virtual > hosting. It wouldn't "break" them, it would just mean they wouldn't get as much "initial karma". > Though, I suppose you could say that having an IP that doesn't change > over time is costly and therefore a scarce commodity. Yeah, probably scarcer than domain names. But, IP address allocation is also centralized. 
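[Editorial sketch] The "initial karma" idea above can be illustrated with a personalized PageRank iteration, where teleport mass goes only to a trusted seed set. Everything here is an assumption for illustration: the toy link graph, the damping factor of 0.85, and the dangling-node policy are not claims about what Google does.

```python
# Hedged sketch of personalized PageRank: "initial karma" (teleport mass)
# is restricted to a seed set, so pages unreachable from the seeds get none.

def pagerank(links, seeds, damping=0.85, iters=50):
    """links: {node: [outgoing nodes]}; seeds: nodes receiving initial karma."""
    nodes = list(links)
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(seed_mass)  # start with all karma on the seed set
    for _ in range(iters):
        nxt = {n: (1 - damping) * seed_mass[n] for n in nodes}  # teleport step
        for n in nodes:
            out = links[n] or nodes  # dangling node: spread evenly (one choice)
            share = damping * rank[n] / len(out)
            for m in out:
                nxt[m] += share
        rank = nxt
    return rank

# A cycle of three seeded pages plus a self-linking page nobody links to:
links = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": ["spam"]}
r = pagerank(links, seeds={"a", "b", "c"})
# the self-linking "spam" node accumulates no rank, since no karma is ever
# teleported to it and nothing in the seeded component links to it
```

The point of the sketch is Andrew's: whoever controls the seed set controls where karma can flow, so the scarcity lives in the seeding mechanism, not in hostnames per se.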
Cheers, Andrew From hopper at omnifarious.org Tue Feb 17 08:20:22 2004 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <20040203160155.G78403-100000@localhost> References: <20040203160155.G78403-100000@localhost> Message-ID: <1077006022.26007.76.camel@monster.omnifarious.org> On Tue, 2004-02-03 at 08:13, Jim Dixon wrote: > On Tue, 3 Feb 2004, Benja Fallenstein wrote: > > > does anybody know references for using cryptographic hashes as unique > > identifiers for files in very large repositories (think all of the Web)? > > The references I've found (e.g. Handbook of Applied Cryptography) don't > > talk explicitly about that, but only about applications in message > > authentication, and attacks related to that; of course that's related, > > but it would be nice to know whether there are references from > > cryptology talking explicitly about hashes as unique identifiers in very > > large collections of messages. > > No references really needed. You can use SHA digests to generate 160 > bit/20 byte keys which can be used as unique identifiers. While it is > theoretically possible that one message or other document could hash to > the same digest as another, chances are approximately 10^16 against it, so > we needn't worry in our lifetimes. Actually let's redo this estimate and come up with a better number... Let's say there are approximately 275 billion documents on the web. That's close to 2^38. This means there are 2^76 relations between two different documents, each of which is a potential pair of documents with the same hash. Assuming SHA1 has a flat distribution or is 'balanced', there's an even chance for all 2^160 possible values. 2^160 / 2^76 = 2^84 This is a chance of about 1 in 2^84 (or 10^25) of two documents currently existing that have the same hash. This is extremely tiny. 
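[Editorial sketch] The scheme Jim describes and the odds Eric computes can be written down in a few lines. This is a minimal illustration using Python's standard hashlib; the document count and the balance assumption are the thread's own back-of-the-envelope figures, not measured facts.

```python
import hashlib
from math import log2

# Content-addressing sketch: a document's identifier is its SHA-1 digest,
# a 160-bit / 20-byte key, as Jim suggests.
def doc_id(document: bytes) -> str:
    return hashlib.sha1(document).hexdigest()

print(doc_id(b"hello p2p"))  # 40 hex characters = 160 bits

# Eric's estimate: ~275 billion documents is close to 2^38, giving about
# (2^38)^2 = 2^76 ordered pairs, each a chance at a collision out of
# 2^160 equally likely digests (assuming the hash is 'balanced').
n_docs = 2 ** 38
pairs = n_docs ** 2
odds = pairs / 2 ** 160
print(log2(odds))  # -84.0, i.e. about 1 in 2^84, roughly 1 in 10^25
```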
The chances of us being wiped out by an asteroid in the next decade are probably at least a trillion times higher. For every factor of 2 increase in the number of documents, there is a factor of 4 increase in the chance of two of those documents having the same hash. It all depends on the chances you're willing to accept. And, how willing you are to accept current assumptions about the flatness in SHA1's output. IMHO, cryptographic hash functions receive too little analysis given their importance. The cryptographic community doesn't seem to feel that it's as fun to break a hash function as it is to break a block cipher. I think cryptographic hash functions are actually more important than block ciphers. And I think the way people are starting to use them as identifiers makes assurances as to the flatness of their output even more important than the uses they've previously been put to in cryptography. I've only seen one paper so far [1] (thanks to Zooko for pointing it out to me) that even lays out an approach for analyzing the flatness (or balance) of cryptographic hash functions. I've seen at least 2-3 papers analyzing Rijndael in great detail. But, these are just random opinions I hold. :-) Have fun (if at all possible), -- The best we can hope for concerning the people at large is that they be properly armed. -- Alexander Hamilton -- Eric Hopper (hopper@omnifarious.org http://www.omnifarious.org/~hopper) -- [1] "Hash Function Balance and its Impact on Birthday Attacks" (2002) Mihir Bellare, Tadayoshi Kohno http://citeseer.nj.nec.com/bellare02hash.html -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040217/11ed482d/attachment.pgp From b.fallenstein at gmx.de Tue Feb 17 10:51:43 2004 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: <1077006022.26007.76.camel@monster.omnifarious.org> References: <20040203160155.G78403-100000@localhost> <1077006022.26007.76.camel@monster.omnifarious.org> Message-ID: <4031F23F.6070503@gmx.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Eric M. Hopper wrote: | Let's say there are approximately 275 billion documents on the web. | | That's close to 2^38. This means there are 2^76 relations between two | different documents, each of which has the potential for two documents | with the same hash. | | Assuming SHA1 has a flat distribution or is 'balanced', there's an even | chance for all 2^160 possible values. | | 2^160 / 2^76 = 2^84 | | This is a chance of about 1 in 2^84 (or 10^25) of two documents | currently existing that have the same hash. I got confused by your way of putting it, although on closer inspection you are correct. The formula that I have in mind (which gives an upper bound to the probability of a hash collision) is ~ (collision probability) < (number of docs)^2 / (possible hashes) I.e., in the example, ~ probability < (2^38)^2 / 2^160 = 2^(-84) or one in 2^84. | This is extremely tiny. | The chances of us being wiped out by an asteroid in the next decade are | probably at least a trillion times higher. This is an example I have been using too! ;-) I also find comparing this to the risks we think acceptable when designing nuclear power plants a good point. A problem this line of argument suffers from is that for neither of the two points can I come up with an article that actually puts numbers on these risks. 
It would be more convincing if you could show how large these probabilities are compared to that of a hash collision. | For every factor of 2 increase in the number of documents, there is a | factor of 4 increase in the chance of two of those documents having the | same hash. (As an upper bound that is close enough to the actual value for the difference not to be interesting.) | IMHO, cryptographic hash functions receive too little analysis given | their importance. The cryptographic community doesn't seem to feel that | it's as fun to break a hash function as it is to break a block cipher. | I think cryptographic hash functions are actually more important than | block ciphers. And I think the way people are starting to use them as | identifiers makes assurances as to the flatness of their output even | more important than the uses they've previously been put to in | cryptography. | | I've only seen one paper so far [1] (thanks to Zooko for pointing it out | to me) that even lays out an approach for analyzing the flatness (or | balance) of cryptographic hash functions. I've seen at least 2-3 papers | analyzing Rijndael in great detail. I agree very much. I have not seen any literature about the assumptions behind the design of hash functions, either; papers just say, "Let there be a hash function, and the algorithm shall be as follows," but I have not yet seen any discussion of what combinations of steps make us believe that it is hard to break a function, and why. 
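[Editorial sketch] The quadratic scaling under discussion (double the documents, roughly quadruple the collision chance) can be checked empirically by weakening the hash until collisions become visible. Everything here is an arbitrary assumption for illustration: the 16-bit truncation of SHA-1, the trial count, and the fixed seed.

```python
import hashlib
import random

# Empirical check of the (number of docs)^2 / (possible hashes) birthday
# bound, using a deliberately tiny 16-bit hash (the first two bytes of
# SHA-1) so that collisions actually occur at observable rates.

def tiny_hash(doc: bytes) -> int:
    return int.from_bytes(hashlib.sha1(doc).digest()[:2], "big")  # 2^16 values

def collision_rate(n_docs: int, trials: int = 2000) -> float:
    """Fraction of trials in which n_docs random documents collide."""
    rng = random.Random(42)  # fixed seed so the experiment is repeatable
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(n_docs):
            h = tiny_hash(rng.randbytes(16))
            if h in seen:
                hits += 1
                break
            seen.add(h)
    return hits / trials

p_64 = collision_rate(64)    # upper bound: 64**2 / 2**16 ~ 0.0625
p_128 = collision_rate(128)  # upper bound: 128**2 / 2**16 = 0.25
# doubling the number of documents roughly quadruples the observed
# collision rate, and both rates stay below the quadratic upper bound
```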
It would be cool if a cryptography PhD student read this list and took your mail as an invitation ;-) - - Benja -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAMfI/UvR5J6wSKPMRAi5DAJ0cHkX6eLHggj2wTcE+8Aesna43ewCfdxWK mDyT+OG6bRqtPcKNdXXl6pA= =H+Lo -----END PGP SIGNATURE----- From zooko at zooko.com Tue Feb 17 12:11:21 2004 From: zooko at zooko.com (Zooko O'Whielacronx) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] References for using hashes as unique identifiers? In-Reply-To: Message from Benja Fallenstein of "Tue, 17 Feb 2004 12:51:43 +0200." <4031F23F.6070503@gmx.de> References: <20040203160155.G78403-100000@localhost> <1077006022.26007.76.camel@monster.omnifarious.org> <4031F23F.6070503@gmx.de> Message-ID: Once upon a time there was a discussion about the risk of hash collision in Mojo Nation on the Mojo Nation devel mailing list. Hal Finney said that such extreme probabilities were beyond our ability to engineer. I agreed and said that there was a higher chance that our hash-collision-handling code would be buggy than that there would be a hash collision. Greg Smith wasn't satisfied with this and went ahead and implemented a simple routine to detect and clean up after a hash collision. Shortly thereafter we discovered that there was a sporadic bug in this routine which caused it to trigger sometimes even in the absence of a collision. Rather than fixing the bug, Greg removed the routine. Regards, Zooko From yo0ga at yahoo.com Tue Feb 17 17:48:40 2004 From: yo0ga at yahoo.com (yoga avidia sudarma) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] need a password Message-ID: <20040217174840.95803.qmail@web9405.mail.yahoo.com> 1. IDEALWINA@YAHOO.COM 2. CARENPM@YAHOO.COM --------------------------------- Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://zgp.org/pipermail/p2p-hackers/attachments/20040217/4f7b26f7/attachment.html From yo0ga at yahoo.com Tue Feb 17 20:04:32 2004 From: yo0ga at yahoo.com (yoga avidia sudarma) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Re: p2p-hackers Digest, Vol 7, Issue 19 In-Reply-To: <20040217200007.AD2603FD71@capsicum.zgp.org> Message-ID: <20040217200432.10743.qmail@web9404.mail.yahoo.com> need help....... email me the password of: 1. idealwina@yahoo.com 2. carenpm@yahoo.com --------------------------------- Do you Yahoo!? Yahoo! Finance: Get your refund fast by filing online -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20040217/ce33a7c5/attachment.htm From eugen at leitl.org Thu Feb 19 16:35:31 2004 From: eugen at leitl.org (Eugen Leitl) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] [Twisted-Python] Twisted and P2P (re: p2p, gnutella) (fwd from rdrb@123.cl) Message-ID: <20040219163531.GQ26194@leitl.org> ----- Forwarded message from RITA Y/O RODRIGO DIAZ Y/O BENENSON ----- From: RITA Y/O RODRIGO DIAZ Y/O BENENSON Date: Wed, 18 Feb 2004 21:51:51 -0300 To: twisted-python@twistedmatrix.com Subject: [Twisted-Python] Twisted and P2P (re: p2p, gnutella) X-Mailer: iPlanet Messenger Express 5.2 HotFix 1.21 (built Sep 8 2003) Reply-To: twisted-python@twistedmatrix.com Hi, I'm working on the Twistification of http://thecircle.org.au/ and on repairing the broken http://khashmir.sf.net if you would like to help, contact me at the email myname at elo dot utfsm dot cl knowing that myname is rodrigob (I hate spam) The TheCircle work is just at its beginning (design is done, code is starting, slowly); the khashmir repair is going fine (but no estimate can be given for the debugging, because the number of critical bugs is unknown). rodrigob. 
> From: stephan > Subject: [Twisted-Python] p2p, gnutella > Reply-To: twisted-python@twistedmatrix.com > > > > I am looking to add p2p functionality to my app. Does > anybody know > what the current status of twisted's gnutella implementation is? > > There seems to have been a semi-finished implementation in > 1.0.2alpha4 which > I can't find in 1.1.1 anymore. > > I might also be willing to implement missing parts but I would need > to get > a short briefing on how things are. > > thanks, > > _stephan > _______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python ----- End forwarded message ----- -- Eugen* Leitl leitl ______________________________________________________________ ICBM: 48.07078, 11.61144 http://www.leitl.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE http://moleculardevices.org http://nanomachines.net -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040219/706940d7/attachment.pgp From Ashish_Vashishta at baylor.edu Tue Feb 24 03:54:41 2004 From: Ashish_Vashishta at baylor.edu (Vashishta, Ashish) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] P2P in NS Message-ID: Hi all I have to implement the YOID multicast protocol in NS. I am not sure how to begin with this P2P stuff in NS. There are several issues related to P2P and NS in general that I am not able to resolve. Can anyone provide some pointers to where I can go and look for P2P stuff for NS? The NS mailing list archive is not quite helpful for P2P simulations. Thanks in advance Ashish -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://zgp.org/pipermail/p2p-hackers/attachments/20040223/964653da/attachment.htm From sam at neurogrid.com Tue Feb 24 04:05:22 2004 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] P2P in NS In-Reply-To: References: Message-ID: <403ACD82.7040809@neurogrid.com> Hi Ashish, I think the thing you want to check out is the PacketLevel P2P simulator, which can run on top of NS. I reviewed it as part of my paper on P2P simulations: http://www.p2pjournal.com/issues/November03.pdf The PLP2P simulator can be found at the following link: http://www.cc.gatech.edu/computing/compass/gnutella/ CHEERS> SAM Vashishta, Ashish wrote: > Hi all > > I have to implement YOID multicast protocol in NS. I am not sure of > how to begin with this P2P stuff in NS. There are several issues > related to P2P and NS in general that I am not able to resolve. > > Can anyone provide some pointers where I can go and look for P2P stuff > for NS. The NS mailing list archive is not quite helpful for P2P > simulations. 
> > > > Thanks in advance > > Ashish > > > >------------------------------------------------------------------------ > >_______________________________________________ >p2p-hackers mailing list >p2p-hackers@zgp.org >http://zgp.org/mailman/listinfo/p2p-hackers >_______________________________________________ >Here is a web page listing P2P Conferences: >http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences > > From sam at neurogrid.com Tue Feb 24 08:51:00 2004 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] CFP: AP2PC 2004 Message-ID: <403B1074.5080605@neurogrid.com> *** our apologies if you receive multiple copies of this e-mail *** Preliminary call for papers Third International Workshop on Agents and Peer-to-Peer Computing (AP2PC 2004) http://p2p.ingce.unibo.it/ to be held at AAMAS 2004 Third International Joint Conference on Autonomous Agents and MultiAgent Systems New York City, USA July 19 or 20, 2004 ---------- Overview ---------- Peer-to-peer (P2P) computing is attracting enormous media attention, spurred by the popularity of file sharing systems such as Napster, Gnutella, and Morpheus. The peers are autonomous, or as some call them, first-class citizens. P2P networks are emerging as a new distributed computing paradigm for their potential to harness the computing power of the hosts composing the network and make their under-utilized resources available to others. This possibility has generated a lot of interest in many industrial organizations which have already launched important projects. In P2P systems, peer and web services in the role of resources become shared and combined to enable new capabilities greater than the sum of the parts. This means that services can be developed and treated as pools of methods that can be composed dynamically. 
The decentralized nature of P2P computing makes it also ideal for economic environments that foster knowledge sharing and collaboration as well as cooperative and non-cooperative behaviors in sharing resources. Business models are being developed, which rely on incentive mechanisms to supply contributions to the system and methods for controlling free riding. Clearly, the growth and the management of P2P networks must be regulated to ensure adequate compensation of content and/or service providers. At the same time, there is also a need to ensure equitable distribution of content and services. Although researchers working on distributed computing, MultiAgent Systems, databases and networks have been using similar concepts for a long time, it is only recently that papers motivated by the current P2P paradigm have started appearing in high quality conferences and workshops. Research in agent systems in particular appears to be most relevant because, since their inception, MultiAgent Systems have always been thought of as networks of peers. The MultiAgent paradigm can thus be superimposed on the P2P architecture, where agents embody the description of the task environments, the decision-support capabilities, the collective behavior, and the interaction protocols of each peer. The emphasis in this context on decentralization, user autonomy, ease and speed of growth that gives P2P its advantages, also leads to significant potential problems. Most prominent among these problems are coordination: the ability of an agent to make decisions on its own actions in the context of activities of other agents, and scalability: the value of the P2P systems lies in how well they scale along several dimensions, including complexity, heterogeneity of peers, robustness, traffic redistribution, and so on. It is important to scale up coordination strategies along multiple dimensions to enhance their tractability and viability, and thereby to widen the application domains. 
These two problems are common to many large-scale applications. Without coordination, agents may waste their efforts, squander resources, and fail to achieve their objectives in situations requiring collective effort. This workshop will bring together researchers working on agent systems and P2P computing with the intention of strengthening this connection. Researchers from other related areas such as distributed systems, networks and database systems will also be welcome (and, in our opinion, have a lot to contribute). We seek high-quality and original contributions on the general theme of "Agents and P2P Computing". The following is a non-exhaustive list of topics of special interest: * Intelligent agent techniques for P2P computing * P2P computing techniques for multi-agent systems * The Semantic Web, Semantic Coordination Mechanisms and P2P systems * Scalability, coordination, robustness and adaptability in P2P systems * Self-organization and emergent behavior in P2P networks * E-commerce and P2P computing * Participation and Contract Incentive Mechanisms in P2P Systems * Computational Models of Trust and Reputation * Community of interest building and regulation, and behavioral norms * Intellectual property rights in P2P systems * P2P architectures * Scalable Data Structures for P2P systems * Services in P2P systems (service definition languages, service discovery, filtering and composition etc.) * Knowledge Discovery and P2P Data Mining Agents * P2P data management * Information ecosystems and P2P systems * Security issues in P2P networks * Pervasive computing based on P2P architectures (ad-hoc networks, wireless communication devices and mobile systems) * Grid computing solutions based on agents and P2P paradigms * Legal issues in P2P networks ------- Panel ------- The theme of the panel will be Conducting Business via P2P. 
P2P computing has had some visible successes in applications such as file sharing, but many of these applications have had a consumer or hobbyist focus. This panel will discuss emerging "mission-critical" applications of P2P and the challenges that P2P technologies must surmount in order to best support such applications. These challenges include security, trust and reputation, representing business protocols, checking compliance, bootstrapping systems, and performance. The panel will involve 10-minute presentations by four panelists followed by a discussion session involving the audience. ------------------ Important dates ------------------ Abstract: 1st April 2004 (see submission instructions below) Paper submission: 6th April 2004 Acceptance notification: 1st May 2004 Workshop: 19th or 20th July 2004 Camera-ready for Post-proceedings: 31st August 2004 --------------- Registration --------------- Accommodation and workshop registration will be handled by the AAMAS 2004 organization along with the main conference registration. --------------------------- Submission instructions --------------------------- Unpublished papers should be formatted according to the LNCS/LNAI author instructions for proceedings and they should not be longer than 12 pages (about 5000 words including figures, tables, references, etc.). The abstract and then the paper should be submitted electronically HERE according to the deadlines mentioned above. In case of problems, submit the abstract and paper (pdf), according to the deadlines mentioned above, to submission@ingce.unibo.it by specifying in both emails: paper's author(s), title, contact author and at most 5 keywords/topics. ------------- Publication ------------- Accepted papers will be distributed to the workshop participants as workshop notes. 
Post-proceedings of the revised papers (namely accepted papers presented at the workshop) will be published by Springer - Lecture Notes in Computer Science series (LNCS) Here are the volumes of revised and invited papers of preceding editions: LNCS volume no. 2530 for AP2PC'2002 LNCS volume no. 2872 for AP2PC'2003 (publication in progress) ------------- Organizers ------------- Program Co-chairs Karl Aberer École Polytechnique Fédérale de Lausanne (EPFL) CH-1015 Lausanne, Switzerland Tel. +41-21-693 4679 - Fax +41-21-693 8115 E-mail: karl.aberer@epfl.ch Sonia Bergamaschi Dept. of Science Engineering, University of Modena and Reggio-Emilia, Italy via Vignolese, 905 - 41100 Modena Italy Tel. +39 059 2056132 - Fax +39 059 2056126 E-mail: bergamaschi.sonia@unimo.it Gianluca Moro (main contact) Dept. of Electronics, Computer Science and Systems, University of Bologna, Italy Via Venezia, 52 - I-47023 Cesena (FC) Tel. +39 0547 339237 - Fax +39 0547 339208 E-mail: gmoro@deis.unibo.it ------------- Panel Chair ------------- Munindar P. Singh Dept. of Computer Science, North Carolina State University, USA Venture I, Suite 110 / Box 7535 - Raleigh, NC 27695-7535 Tel. +1 919 515.5677 - Fax +1 919 515.7896 E-mail: mpsingh@eos.ncsu.edu ---------------------- Steering Committee ---------------------- Karl Aberer, EPFL, Lausanne, Switzerland Sonia Bergamaschi, University of Modena and Reggio-Emilia, Italy Manolis Koubarakis, Technical University of Crete Paul Marrow, Intelligent Systems Laboratory, BTexact Technologies, UK Gianluca Moro, University of Bologna, Cesena, Italy Aris M. Ouksel, University of Illinois at Chicago, USA Claudio Sartori, University of Bologna, Italy Munindar P. 
Singh, North Carolina State University, USA ---------------------------------- Web Master of Review System ---------------------------------- Sam Joseph Laboratory for Interactive Learning Technology (LILT), University of Hawaii E-mail: srjoseph@hawaii.edu --------------- Sponsorships --------------- Khaled Nagi Computer Science Dept., Alexandria University, E-mail: khaledn@acm.org ---------------------- Program committee ---------------------- Karl Aberer, EPFL, Lausanne, Switzerland Sonia Bergamaschi, University of Modena and Reggio-Emilia, Italy Jon Bing, University of Oslo, Norway M. Brian Blake, Georgetown University, USA Rajkumar Buyya, University of Melbourne, Australia Ooi Beng Chin, National University of Singapore, Singapore Paolo Ciancarini, University of Bologna, Italy Costas Courcoubetis, Athens University of Economics and Business, Greece Yogesh Deshpande, University of Western Sydney, Australia Asuman Dogac, Middle East Technical University, Turkey Boi V. Faltings, EPFL, Lausanne, Switzerland Maria Gini, University of Minnesota, USA Dina Q. Goldin, University of Connecticut, USA Chihab Hanachi, University of Toulouse, France Mark Klein, Massachusetts Institute of Technology, USA Matthias Klusch, DFKI, Saarbrucken, Germany Yannis Labrou, PowerMarket Inc., USA Tan Kian Lee, National University of Singapore, Singapore Dejan Milojicic, Hewlett Packard Labs, USA Alberto Montresor, University of Bologna, Italy Luc Moreau, University of Southampton, UK Jean-Henry Morin, University of Geneve, Switzerland John Mylopoulos, University of Toronto, Canada Andrea Omicini, University of Bologna, Italy Maria Orlowska, University of Queensland, Australia Aris. M. Ouksel, University of Illinois at Chicago, USA Mike Papazoglou, Tilburg University, Netherlands Terry R. 
Payne, University of Southampton, UK Paolo Petta, Austrian Research Institute for AI, Austria, Jeremy Pitt, Imperial College, UK Dimitris Plexousakis, Institute of Computer Science, FORTH, Greece Martin Purvis, University of Otago, New Zealand Omer F. Rana, Cardiff University, UK Katia Sycara, Robotics Institute, Carnegie Mellon University, USA Douglas S. Reeves, North Carolina State University, USA Thomas Risse, Fraunhofer IPSI, Darmstadt, Germany Pierangela Samarati, University of Milan, Italy Giovanni Sartor, CIRSFID, University of Bologna, Italy, Christophe Silbertin-Blanc, University of Toulouse, France Maarten van Steen, Vrije Universiteit, Netherlands Markus Stumptner, University of South Australia, Australia Peter Triantafillou, Technical University of Crete, Greece Anand Tripathi, University of Minnesota, USA Vijay K. Vaishnavi, Georgia State University, USA Francisco Valverde-Albacete, Universidad Carlos III de Madrid, Spain Maurizio Vincini, University of Modena and Reggio-Emilia, Italy Fang Wang, BTexact Technologies, UK Gerhard Weiss, Technische Universitaet, Germany Bin Yu, North Carolina State University, USA Franco Zambonelli, University of Modena and Reggio-Emilia, Italy From p2p at garethwestern.com Tue Feb 24 11:08:35 2004 From: p2p at garethwestern.com (p2p@garethwestern.com) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] Web Service Interface for P2P File Storage Message-ID: <1077620915.403b30b3e5818@www.garethwestern.com> Hi, I'm developing a web service interface for a p2p storage system. Is anyone aware of any existing work in this area? The XSpace implementation on xmethods.net seems to be similar to what I am looking for, only I am trying to use a distributed p2p storage space, such as PAST, instead. Also, which storage networks do you recommend? I am currently starting to experiment with PAST and Bamboo, with plans to also try an implementation of Chord as soon as the first ones are developed. Thanks for your advice! 
Cheers, Gareth From eugen at leitl.org Thu Feb 26 11:00:18 2004 From: eugen at leitl.org (Eugen Leitl) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] FYI: [Twisted-Python] ANN: Twisted 1.2.0 (fwd from itamar@itamarst.org) Message-ID: <20040226110017.GI32608@leitl.org> In the (admittedly unlikely) case you were unaware of this excellent P2P framework. ----- Forwarded message from Itamar Shtull-Trauring ----- From: Itamar Shtull-Trauring Date: Wed, 25 Feb 2004 23:12:38 -0500 To: twisted-python@twistedmatrix.com Subject: [Twisted-Python] ANN: Twisted 1.2.0 Organization: http://itamarst.org X-Mailer: Ximian Evolution 1.4.5 Reply-To: twisted-python@twistedmatrix.com Twisted is an event-driven networking framework for server and client applications. For more information, visit http://www.twistedmatrix.com, join the list http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python or visit us on #twisted at irc.freenode.net. The Twisted from Scratch tutorial is a good starting point for learning Twisted: http://twistedmatrix.com/documents/howto/tutorial What's New in 1.2.0 =================== - SFTP server implementation for the SSH server. - Improved wxPython support. - IMAPv4 enhancements and bug fixes. - Allow disabling display of tracebacks in error web pages. - ident protocol implementation (client and server). - Support mapping arbitrary child FDs when running processes on POSIX. - Initial SOAP client support (using SOAPpy). - Partial download support for FTP client. - Web framework now supports different handlers for different methods (e.g. GET or POST). - Coverage support in the trial testing framework. - Bug fixes and documentation and feature enhancements. 
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python ----- End forwarded message ----- -- Eugen* Leitl leitl ______________________________________________________________ ICBM: 48.07078, 11.61144 http://www.leitl.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE http://moleculardevices.org http://nanomachines.net -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20040226/9e1b35d2/attachment.pgp From rohit_bhalla2002 at yahoo.com Sat Feb 28 06:12:43 2004 From: rohit_bhalla2002 at yahoo.com (Rohit Bhalla) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] NS and P2P Message-ID: <20040228061243.8410.qmail@web60207.mail.yahoo.com> Hello, I found in the ns-users archives some messages about implementing P2P stuff in NS. Has anyone already implemented one of these types of agents (Narada, Yoid, Gnutella) successfully? If yes, I would appreciate it very much if someone could send the code to me... Thanks for your help! Rohit __________________________________ Do you Yahoo!? Get better spam protection with Yahoo! Mail. http://antispam.yahoo.com/tools From ncbgroups at yahoo.com.br Sat Feb 28 06:15:28 2004 From: ncbgroups at yahoo.com.br (=?iso-8859-1?q?Nilton=20Braga?=) Date: Sat Dec 9 22:12:38 2006 Subject: [p2p-hackers] NS and P2P In-Reply-To: <20040228061243.8410.qmail@web60207.mail.yahoo.com> Message-ID: <20040228061528.77654.qmail@web40017.mail.yahoo.com> Hi! I'm also interested in this code, if someone knows where to find it. Thank you. 
--- Rohit Bhalla wrote:
> I found some messages in the ns-users archives about implementing P2P
> protocols in NS. Has anyone successfully implemented one of these types
> of agents (Narada, Yoid, Gnutella)? [...]

From bradneuberg at yahoo.com Sun Feb 29 04:30:50 2004
From: bradneuberg at yahoo.com (Brad Neuberg)
Date: Sat Dec 9 22:12:38 2006
Subject: [p2p-hackers] NS By P2P
In-Reply-To: <20040228061528.77654.qmail@web40017.mail.yahoo.com>
Message-ID: <20040229043050.84108.qmail@web60703.mail.yahoo.com>

What do you mean by NS? Do you mean Netscape and Mozilla browser integration? If that's what you are interested in, I can provide lots of tips on Mozilla-P2P integration (and a little about IE-P2P integration).

Brad

--- Nilton Braga wrote:
> I'm also interested in this code, if someone knows where to find it. [...]

From ncbgroups at yahoo.com.br Sun Feb 29 05:10:26 2004
From: ncbgroups at yahoo.com.br (Nilton Braga)
Date: Sat Dec 9 22:12:38 2006
Subject: [p2p-hackers] NS By P2P
In-Reply-To: <20040229043050.84108.qmail@web60703.mail.yahoo.com>
Message-ID: <20040229051026.2294.qmail@web40017.mail.yahoo.com>

Well, in fact, when I said NS I was referring to the ns-2 software (Network Simulator - www.isi.edu/nsnam/ns/). But if you have tips about P2P-browser integration, I'm really interested. Could you send those tips?

Thank you.

> What do you mean by NS? Do you mean Netscape and Mozilla browser
> integration? If that's what you are interested in, I can provide lots
> of tips on Mozilla-P2P integration (and a little about IE-P2P
> integration). [...]
From lujianming at software.ict.ac.cn Sun Feb 29 06:37:43 2004
From: lujianming at software.ict.ac.cn (Jimmy)
Date: Sat Dec 9 22:12:38 2006
Subject: [p2p-hackers] any opensource application or simulation based on CAN?
Message-ID: <20040229064030.5F41D3FC37@capsicum.zgp.org>

Hi everybody,

I am working on text retrieval over P2P and am going to use CAN as the underlying P2P overlay network. It seems hard to find an open-source application based on CAN, or a good open-source CAN simulator. Does anybody know of any open-source application or simulation based on CAN?

Good luck ^_^.

Jimmy.Lud.
lujianming@software.ict.ac.cn
2004-02-29
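[Editor's note: for readers who, like Jimmy, want to experiment before locating a full implementation: CAN (the Content-Addressable Network of Ratnasamy et al.) maps keys to points in a d-dimensional torus partitioned into zones, and routes greedily to the neighbor whose zone is closest to the key's point. Below is a toy sketch of that routing step over a static 2-d grid of zones; the class and function names are invented here, and real CAN splits zones dynamically as nodes join rather than using a fixed grid.]

```python
import math

class CanNode:
    """One zone in the CAN coordinate space, identified by its center.
    (Sketch only: real CAN splits zones dynamically as nodes join.)"""
    def __init__(self, center):
        self.center = center      # point in the unit 2-torus [0,1) x [0,1)
        self.neighbors = []       # nodes owning adjoining zones

def torus_dist(a, b):
    """Euclidean distance on the unit torus (each axis wraps around)."""
    return math.sqrt(sum(min(abs(x - y), 1.0 - abs(x - y)) ** 2
                         for x, y in zip(a, b)))

def route(start, key_point, max_hops=64):
    """Greedy CAN routing: hop to the neighbor closest to key_point
    until no neighbor is closer than the current node."""
    node, path = start, [start]
    for _ in range(max_hops):
        best = min([node] + node.neighbors,
                   key=lambda n: torus_dist(n.center, key_point))
        if best is node:          # our own zone is closest: the key lives here
            return path
        node = best
        path.append(node)
    raise RuntimeError("routing did not converge")

def make_grid(n):
    """A static n x n partition of the torus, each zone linked to its
    four grid neighbors (a stand-in for CAN's join protocol)."""
    grid = {(i, j): CanNode(((i + 0.5) / n, (j + 0.5) / n))
            for i in range(n) for j in range(n)}
    for (i, j), node in grid.items():
        node.neighbors = [grid[((i + di) % n, (j + dj) % n)]
                          for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    return grid
```

On an n x n grid each hop strictly reduces the torus distance to the key, giving O(n) = O(sqrt(N)) hops for N zones, which matches CAN's O(d * N^(1/d)) routing bound at d = 2.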