From bram at gawth.com Sun Sep 2 00:12:01 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] BitTorrent 2.2 is out
Message-ID:

I just pushed out BitTorrent 2.2, the protocols are now frozen.

This release includes supporting multiple publishers who have the same
file, and separate downloader query and announcement, so it can report
what port it's listening on and downloaders which don't start downloading
aren't included in lists of peers.

Smaller new features include compilation under BSD, changing the mimetype
to all low caps, and proper quoting in urls, to support characters like
space and equals in filenames.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent"
                                        -- John Maynard Keynes

From arma at mit.edu Mon Sep 3 15:29:01 2001
From: arma at mit.edu (Roger Dingledine)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
In-Reply-To: ; from cyb@azrael.dyn.cheapnet.net on Mon, Jul 30, 2001 at 03:59:32PM -0500
References: <20010727033914.G7892@belegost.mit.edu>
Message-ID: <20010903182823.T14872@moria.mit.edu>

On Mon, Jul 30, 2001 at 03:59:32PM -0500, Brandon K. Wiley wrote:
> So if you have 65k addresses and the network has 65k nodes, then your odds
> of being closest for a given key are 50%. Your odds for winning on two
> keys are 25%, etc.. So if you have the entire IP resources of a
> university, you can *almost* be assured of censoring one file, but if you
> want to, say, censor 65k files (one file per node), then you get .5 ^ 65k,
> which is really small.

I've been thinking about this off and on for a while, and I think you're
right. This is an excellent point -- Chord may be vulnerable to a large
adversary gunning for a specific file, but that adversary has to do a
separate attack for each file he wants to censor. "Getting control of
one file does not get him significantly closer to controlling any other."

> > b) Determine which IPs would be useful to you, and break into machines
> > on their subnets.
>
> Right now I'm just trying to defend against attackers that use legal
> measures such as scanning IPs. If you can take over arbitrary machines at
> will then I don't know of any currently existing system which will help
> you.

No, my attack was very different from compromising arbitrary machines.
I don't have to compromise *your* machine -- I just have to compromise
any machine out there that has (or has access to) an IP better than yours.
I think this is a much more feasible attack.

> > But when a node is "fixing" the Chord network due to
> > loss/addition of a node, how do you know that he's fixing it right? Nodes
> > can flat-out give you a new routing table which is made up entirely of
> > dead nodes, thus cutting you out of the ring. If you verify that new
> > routes you get work, then the adversary can transition you onto his fake
> > ring, and then drop you all at once (or just watch all your queries).
>
> An adversary cannot give you an entirely new routing table. Well, he can,
> but you will only use those parts of the new routing table which are
> actually better than what you already have. The adversary does not know
> what your routing table currently contains, so cannot know what keys it
> controls are better than what you already have. However, since the keys
> which you desire are relative to your own key, your key is based on your
> IP, and the adversary has your IP, the adversary can generate a routing
> table for you which contains the best keys under its control for the
> various slots in your routing table. The odds of the adversary taking over
> your entire routing table are, as before, (evil / (evil+good)) ^ keys. So
> given a 65k network and 65k addresses under the control of the adversary,
> the odds of taking over your entire routing table are .5 ^ 160 = 6.8e-49.
> If any good entries are left in your routing table then when the evil node
> drops you, you can rebuild your routing table from the good node. Chances
> are that the evil node isn't the best node for one of the slots in your
> routing table, so there is a possibility that it will be replaced by
> another node and then you won't get bad references anymore.
>
> http://www.pdos.lcs.mit.edu/papers/chord:sigcomm01/

Ok, I take back my earlier claim. You're right -- since the finger tables
try to remember specific locations in the ring (rather than "somebody
from that segment of the ring" as I'd originally thought), you need to
get IPs which are better than every entry in the target's finger table.

I still think a strong adversary can win against a given individual
target by identifying which IPs would be useful and getting them. (Your
math above is fishy, but that's a topic for a later post.)

The neat feature which Chord provides is separation of targets: if I'm
attacking a given target in the Chord ring, then my attack (with high
probability) is not useful against any other target. Said a different way,
I can't "pre-attack" a target without first knowing his IP. (Or
equivalently, I can't "pre-attack" a file without first knowing its
contents (hash)).

Anybody else buy this?

--Roger

From arma at mit.edu Mon Sep 3 16:20:01 2001
From: arma at mit.edu (Roger Dingledine)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
In-Reply-To: ; from cyb@azrael.dyn.cheapnet.net on Mon, Jul 30, 2001 at 03:59:32PM -0500
References: <20010727033914.G7892@belegost.mit.edu>
Message-ID: <20010903191920.U14872@moria.mit.edu>

On Mon, Jul 30, 2001 at 03:59:32PM -0500, Brandon K. Wiley wrote:
> So given a fixed size keyspace (all IP4 addresses), you can do a
> precomputed dictionary attack. If you have a lot of IPs (a.k.a.
> money) then you can very quickly determine which of your IPs is best
> (since in Chord "best" is absolute within a given set of IPs and a given
> key) and then set up a node there. So the question of how useful this
> attack is depends on the distribution of keyspaces over the network.
>
> My math is probably naive or just wrong, but my attempt at probability
> says that your probability of having the closest key is
> (evil / (good + evil)) ^ keys.

Consider that you have M honest machines around the ring already, and
you have N darts that you can throw into the ring. Your goal is to get
at least one dart into specific bins that you're trying to attack (either
so you can be the one responsible for a file you're attacking, or so you
can be the one that is closest to a finger for a given victim node).

Visualize it as a clock face (ie M=12), and you're trying to land at
least one of your darts (each dart thrown independently and uniformly
randomly) between the 12 and the 1. (I'm making the assumption here that
anywhere between the 12 and the 1 will do; I think that's an ok
assumption.)

For more complex attacks, you need to collect a number of specific
bins (either to own all the fingers of a particular victim node, or to
own all the replicas of a particular victim file). Picture those as the
area between 1 o'clock and 2 o'clock, etc. The goal is to get at least
one dart into each target bin -- it doesn't matter which dart, or how
many more you get, etc.

So the chance of getting at least one dart in each of P specific bins
(places), given M honest machines already in the ring and given N darts
(new IPs) that you can try, is:

  sum over i from 0 to P of
    (-1)^i * (P Choose i) * ((M-i)/M)^N

I've included a quick perl script at the end of this post so you can
play with it; beware of overruns...factorials are messy things, and this
is a fragile script. But first, some sample situations:

Chance of winning 1 places amid 10000 machines with 1000 new IPs is 0.0951671064414438.
Chance of winning 1 places amid 10000 machines with 5000 new IPs is 0.393484504375222.
Chance of winning 14 places amid 10000 machines with 5000 new IPs is 2.10996882146536e-06.
(~14 fingers in 10K nodes)

Chance of winning 1 places amid 10000 machines with 50000 new IPs is 0.993263737389397.
Chance of winning 14 places amid 10000 machines with 50000 new IPs is 0.909710516660863.
Chance of winning 100 places amid 10000 machines with 50000 new IPs is 0.508637738432524.
(so a file replicated into 100 different places is still vulnerable)
Chance of winning 200 places amid 10000 machines with 50000 new IPs is 0.258652809279435.

> So if you have 65k addresses and the network has 65k nodes, then your odds
> of being closest for a given key are 50%. Your odds for winning on two
> keys are 25%, etc.. So if you have the entire IP resources of a
> university, you can *almost* be assured of censoring one file, but if you
> want to, say, censor 65k files (one file per node), then you get .5 ^ 65k,
> which is really small.

Chance of winning 1 places amid 65000 machines with 65000 new IPs is 0.632123388689125.
Chance of winning 2 places amid 65000 machines with 65000 new IPs is 0.399577896430526.
Chance of winning 3 places amid 65000 machines with 65000 new IPs is 0.252579901639995.
Chance of winning 4 places amid 65000 machines with 65000 new IPs is 0.159659167435841.
Chance of winning 5 places amid 65000 machines with 65000 new IPs is 0.100922190340699.

It's really a matter of beating the current number of active machines.

Chance of winning 1000 places amid 5000 machines with 50000 new IPs is 0.955655650413358.
Winning all 5000 places is .797.

Hope this helps. Note that this analysis completely ignores the more
realistic attack, which is to calculate which IPs on the net would be
better for you, and break into them or some machine on their subnet. It
also doesn't consider whether it becomes harder to "reuse" the same
address space on future attacks.

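The sum described in this post can also be evaluated with exact integer
arithmetic, which sidesteps the worry about overruns in the alternating
terms. What follows is only a minimal sketch in Python, not the perl
script from this post: the function name chance_of_winning is made up
for illustration, and it assumes P <= M and Python 3.8 or later (for
math.comb). Its output should agree with the sample figures above to
within floating-point rounding.

```python
from math import comb


def chance_of_winning(p, m, n):
    """Chance that n uniform darts hit each of p specific bins out of m.

    Evaluates sum_{i=0}^{p} (-1)^i * C(p, i) * ((m - i) / m)^n over the
    common denominator m**n, so the alternating sum is exact and only
    the final division is rounded.
    """
    if not 0 <= p <= m:
        raise ValueError("need 0 <= p <= m")
    numerator = sum((-1) ** i * comb(p, i) * (m - i) ** n for i in range(p + 1))
    return numerator / m ** n


if __name__ == "__main__":
    # Same (P, M, N) settings as some of the sample situations in the post.
    for p, m, n in [(1, 10000, 1000), (14, 10000, 5000), (2, 65000, 65000)]:
        print(f"Chance of winning {p} places amid {m} machines "
              f"with {n} new IPs is {chance_of_winning(p, m, n)}.")
```

The integers involved get large (m**n has on the order of a million
bits for the 65000-machine cases), so this trades speed for exactness.
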
--Roger

(Thanks to Eddy Karat for arguing the math with me, and giving me such
a clean summation. Do let me know if you fix it or improve on it, or if
all of my calculations here included overruns so the numbers are all
wrong.)

#!/usr/bin/perl

# Calculate \Sigma_{i=0}^P
#   (-1)^i * (P Choose i) * ((M-i)/M)^N

# swiped and pruned from http://code.anapraxis.net/Math/Combinatorics.pm
# get a better Choose function if you want better accuracy
sub Choose {
    my $n = shift;
    die "N ($n) must be a positive integer" if $n < 1;
    my $r = shift;
    die "R must be 0 < R < N ($n) if there is no repetition"
        if ($r < 0 || $r > $n);
    my $c = 1;
    if ($r > $n/2) { $r = $n - $r } # Take advantage of 2)
    for (1..$r) {
        $c *= $n--; # n! / (n-r)!
        $c /= $_;   # c / r!
    }
    return $c;
}

##########################

$n = 50000; # number of adversary machines
$m = 5000;  # number of honest machines
$p = 200;   # number of simultaneous attacks i must make

$sum = 0;
for($i=0;$i<=$p;$i++) {
    $sum += ( ((-1)**$i) * (Choose($p, $i)) * ((($m-$i)/$m)**$n) );
    print "sum at i=$i is $sum.\n";
}

print "Chance of winning $p places amid $m machines with $n new IPs is $sum.\n";

From arma at mit.edu Mon Sep 3 16:49:02 2001
From: arma at mit.edu (Roger Dingledine)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] CfP: Workshop on Privacy Enhancing Technologies 2002
Message-ID: <20010903194759.X14872@moria.mit.edu>

We're hoping to get some good submissions from the P2P community for
this workshop. This is the sequel to the workshop last summer where
Freenet and Free Haven were presented.

------------------------------------------------------------------------

                         CALL FOR PAPERS

          WORKSHOP ON PRIVACY ENHANCING TECHNOLOGIES 2002

                         Apr 14-15, 2002
                      San Francisco, CA, USA

           Workshop web site: http://www.pet2002.org/

Privacy and anonymity are increasingly important in the online world.
Corporations and governments are starting to realize their power to
track users and their behavior, and restrict the ability to publish
or retrieve documents.
Approaches to protecting individuals, groups, and even
companies and governments from such profiling and censorship have
included decentralization, encryption, and distributed trust.

Building on the success of the first anonymity and unobservability
workshop (LNCS 2009, held in Berkeley in July 2000), this second
workshop addresses the design and realization of such anonymity and
anti-censorship services for the Internet and other communication
networks. We are holding this workshop adjacent to the Twelfth
Conference on Computers, Freedom, and Privacy (CFP2002) for convenience,
but we are not affiliated with that conference.

The workshop seeks submissions from academia and industry presenting
novel research on all theoretical and practical aspects of privacy
technologies, as well as experimental studies of fielded systems. We
encourage submissions from other communities such as law and business
that present these communities' perspectives on technological issues. We
will publish accepted papers in proceedings in the Springer Lecture
Notes in Computer Science (LNCS) series.

Suggested topics include but are not restricted to:

* Efficient realization of privacy services
* Techniques for and against traffic analysis
* Attacks on anonymity systems
* New concepts for anonymity systems
* Novel relations of payment mechanisms and anonymity
* Models for anonymity and unobservability
* Models for threats to privacy
* Techniques for censorship resistance
* Resource management in anonymous systems
* Pseudonyms, linkability, and trust
* Policy and human rights -- anonymous systems in practice
* Fielded systems and privacy enhancement techniques for existing systems
* Frameworks for new systems developers

IMPORTANT DATES

Submission deadline                     December 10, 2001
Acceptance notification                 February 11, 2002
Camera-ready copy for preproceedings    March 11, 2002
Camera-ready copy for proceedings       May 15, 2002

GENERAL CHAIR

Adam Shostack, Zero Knowledge Systems (adam@zeroknowledge.com)

PROGRAM COMMITTEE

John Borking, Dutch Data Protection Authority
Lance Cottrell, Anonymizer.com
Roger Dingledine, Reputation Technologies (co-chair, arma@mit.edu)
Hannes Federrath, Freie Universitaet Berlin, Germany
Markus Jakobsson, RSA Laboratories
Marit Koehntopp, Independent Centre for Privacy Protection, SH, Germany
Andreas Pfitzmann, Dresden University of Technology, Germany
Avi Rubin, AT&T Labs - Research
Paul Syverson, Naval Research Lab (co-chair, syverson@itd.nrl.navy.mil)
Michael Waidner, IBM Zurich Research Lab

PAPER SUBMISSIONS

Submitted papers must not substantially overlap with papers that have
been published or that are simultaneously submitted to a journal or a
conference with proceedings. Papers should be at most 15 pages excluding
the bibliography and well-marked appendices (using 11-point font and
reasonable margins), and at most 20 pages total. Authors are encouraged
to follow Springer LNCS format in preparing their submissions. Committee
members are not required to read the appendices and the paper should be
intelligible without them.
The paper should start with the title, names of authors and an abstract.
The introduction should give some background and summarize the
contributions of the paper at a level appropriate for a non-specialist
reader.

We will publish accepted papers in proceedings in the Springer Lecture
Notes in Computer Science (LNCS) series after the workshop. During the
workshop preproceedings will be made available. Final versions are not
due until after the workshop, giving the authors the opportunity to
revise their papers based on discussions during the meeting.

Submissions can be made in Postscript or PDF format. To submit a paper,
send a plain ASCII text email to the program chairs (emails:
arma@mit.edu, syverson@itd.nrl.navy.mil) containing the title and
abstract of the paper, the authors' names, email and postal addresses,
phone and fax numbers, and identification of the contact author. To the
same message, attach your submission (as a MIME attachment). Papers must
be received by December 10, 2001. Notification of acceptance or
rejection will be sent to authors no later than February 11, 2002, and
authors will have the opportunity to revise for the preproceedings
version by March 11, 2002.

Submission implies that, if accepted, the author(s) agree to publish in
the proceedings and to sign a standard Springer copyright release, and
also that an author of the paper will present it at the workshop. Final
versions (due after the workshop) need to comply with the instructions
for authors made available by Springer.

From bosley at hcs.harvard.edu Mon Sep 3 17:46:01 2001
From: bosley at hcs.harvard.edu (Carl Bosley)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
In-Reply-To: <20010903191920.U14872@moria.mit.edu>
Message-ID:

> > My math is probably naive or just wrong, but my attempt at probability
> > says that your probability of having the closest key is
> > (evil / (good + evil)) ^ keys.
> So the chance of getting at least one dart in each of P specific bins
> (places), given M honest machines already in the ring and given N darts
> (new IPs) that you can try, is:
>
>   sum over i from 0 to P of
>     (-1)^i * (P Choose i) * ((M-i)/M)^N

indeed. if you'd like something easier to approximate ...

note for P = 1 this is

  1 - ((M-1)/M)^N ~= 1 - e^{-N/M}

For P > 1, for N >> 1 the overlap is negligible enough that

  (1 - e^{-N/M})^P

is a good approximation.

--Carl

From arma at mit.edu Mon Sep 3 17:54:01 2001
From: arma at mit.edu (Roger Dingledine)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
In-Reply-To: ; from bosley@hcs.harvard.edu on Mon, Sep 03, 2001 at 08:45:32PM -0400
References: <20010903191920.U14872@moria.mit.edu>
Message-ID: <20010903205353.Y14872@moria.mit.edu>

On Mon, Sep 03, 2001 at 08:45:32PM -0400, Carl Bosley wrote:
> For P > 1, for N >> 1 the overlap is negligible enough that
>
>   (1 - e^{-N/M})^P
>
> is a good approximation.

Excellent. This is what I was looking for.

I think the overall story is that if you have significantly more possible
IPs at your disposal than the current number of machines in the network,
then you're going to win -- but only against a few targets, not against
everybody.

Thanks,
--Roger

From arma at MIT.EDU Tue Sep 4 10:06:01 2001
From: arma at MIT.EDU (Roger Dingledine)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
Message-ID: <20010904130555.M14872@moria.mit.edu>

----- Forwarded message from "Edwin R. Karat" -----

From: "Edwin R. Karat"
Date: Tue, 04 Sep 2001 12:41:07 -0400
To: Roger Dingledine
Subject: Re: [p2p-hackers] Chord comments

In message <20010904121040.G14872@moria.mit.edu>, Roger Dingledine writes:
>On Tue, Sep 04, 2001 at 01:06:14AM -0400, Edwin R.
>Karat wrote:
>> In message <20010903205353.Y14872@moria.mit.edu>, Roger Dingledine writes:
>> >On Mon, Sep 03, 2001 at 08:45:32PM -0400, Carl Bosley wrote:
>> >> For P > 1, for N >> 1 the overlap is negligible enough that
>> >>
>> >>   (1 - e^{-N/M})^P
>> >>
>> >> is a good approximation.
>> >
>> >Excellent. This is what I was looking for.
>> >
>> >I think the overall story is that if you have significantly more possible
>> >IPs at your disposal than the current number of machines in the network,
>> >then you're going to win -- but only against a few targets, not against
>> >everybody.
>>
>> Whoops. Actually, if you have a good chance of winning against 1, then
>> you have a good chance of winning against everybody else too, don't you?
>
>With the same set of darts? (Meaning they land in the same places.)

Yes. With M >> P, then saying that you've compromised one particular
computer only eats up P darts, you still have N-P darts to take over
other computers with. So, the problem is largely the same with N-P darts,
at which point you have nearly the same probability of taking over another
specified computer.

Analysis 1: Probability of taking over 2 *specified* computers (in the
worst case with none of the bins in common) is equivalent to doubling P
(you are trying to go for 2P bins, instead of P bins). In our approximate
sum, (1 - e^{-N/M})^P, this just squares the probability, which is
equivalent to 2 independent trials. Of course, they are not independent,
but the point is that the dependence is a wash in the errors of the M >> P
approximation.

Seen from a different point of view, taking over a computer takes up P
darts, leaving you with N-P darts. But to be likely to take over a
specified computer, N is on the order of M, which is >> P.

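The approximation discussed in this thread is easy to check against the
figures Roger posted. This is a rough sketch in Python rather than
anything posted to the list: the name approx_chance and the REPORTED
table are introduced here for illustration, and the table only copies
numbers quoted earlier in the thread. It also shows the point of
"Analysis 1", that doubling P squares the approximate probability.

```python
import math


def approx_chance(p, m, n):
    """Carl Bosley's approximation (1 - e^{-N/M})^P from this thread."""
    return (1.0 - math.exp(-n / m)) ** p


# Figures quoted earlier in the thread, keyed by (P, M, N).
REPORTED = {
    (1, 65000, 65000): 0.632123388689125,
    (2, 65000, 65000): 0.399577896430526,
    (14, 10000, 50000): 0.909710516660863,
    (100, 10000, 50000): 0.508637738432524,
}

if __name__ == "__main__":
    for (p, m, n), exact in REPORTED.items():
        approx = approx_chance(p, m, n)
        print(f"P={p} M={m} N={n}: reported {exact:.6f}, approx {approx:.6f}")

    # Two specified targets with no bins in common means 2P bins, which
    # squares the approximate probability for P bins.
    p, m, n = 14, 10000, 50000
    print(approx_chance(2 * p, m, n), approx_chance(p, m, n) ** 2)
```

The approximation treats the bins as independent, so it drifts from the
exact sum as P grows relative to M; for the cases above the gap is small.
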
From gojomo at usa.net Tue Sep 4 11:47:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec 9 22:11:43 2006
Subject: Various identifier choices Re: [p2p-hackers] Morpheus, Freenet, MojoNation (was Semantic Routing BOF)
References: <009101c130bf$49122fe0$0ea7fea9@golden> <87bskyfyf1.fsf@azrael.dyn.cheapnet.net> <00b301c130c7$61a960c0$0ea7fea9@golden> <20010829235342.J383@sandbergs.org>
Message-ID: <014b01c13571$d6c07c00$0ea7fea9@golden>

Oskar Sandberg writes:
> I considered if the way of calculating UID values from data might
> actually be an area where we should be trying to "interoperate" between
> the different networks, but I figure that even there the emphasis is
> too different for it to be worth it.

With Bitzi, we don't mind a proliferation of identifiers, because
we aim to provide, as metadata, alternate names/IDs for catalogued
files.

So for example you'll eventually be able to pull from the Bitzi
catalog a record like...

  Bitprint (SHA1+TigerTree)
    \_ Freenet-CHK
    \_ MojoID
    \_ MD5
    \_ etc. (KazaaID? SHA256?)

...whenever users have contributed such associations.

(Of course, users could lie, but since all these identifiers are
calculable from the content itself, the worst that can happen is
the inconvenience of fetching the wrong file once, at which point
the problem can be detected and reported back to Bitzi for
correction.)

And in a separate message:

> On Wed, Aug 29, 2001 at 01:15:43PM -0700, Gordon Mohr wrote:
> > Yes. I prefer a tree hash for that purpose, as it allows
> > out-of-order subsegment verification, but the progressive
> > hash works too.
>
> There is obviously no need for out-of-order verification on a stream that
> needs to be tunneled.

I would say that it is obviously beneficial to avoid designing-in
a permanent assumption of streamed, in-order, complete delivery.
Might it not be nice, someday, to tunnel different segments from
different places, simultaneously or spread across time periods,
for lots of reasons -- not least of which being performance and
resistance to traffic analysis?

- Gojomo

From oskar at freenetproject.org Tue Sep 4 14:05:02 2001
From: oskar at freenetproject.org (Oskar Sandberg)
Date: Sat Dec 9 22:11:43 2006
Subject: Various identifier choices Re: [p2p-hackers] Morpheus, Freenet, MojoNation (was Semantic Routing BOF)
In-Reply-To: <014b01c13571$d6c07c00$0ea7fea9@golden>; from gojomo@usa.net on Tue, Sep 04, 2001 at 11:45:50AM -0700
References: <009101c130bf$49122fe0$0ea7fea9@golden> <87bskyfyf1.fsf@azrael.dyn.cheapnet.net> <00b301c130c7$61a960c0$0ea7fea9@golden> <20010829235342.J383@sandbergs.org> <014b01c13571$d6c07c00$0ea7fea9@golden>
Message-ID: <20010904230943.A547@sandbergs.org>

On Tue, Sep 04, 2001 at 11:45:50AM -0700, Gordon Mohr wrote:
> Oskar Sandberg writes:
> > I considered if the way of calculating UID values from data might
> > actually be an area where we should be trying to "interoperate" between
> > the different networks, but I figure that even there the emphasis is
> > too different for it to be worth it.
>
> With Bitzi, we don't mind a proliferation of identifiers, because
> we aim to provide, as metadata, alternate names/IDs for catalogued
> files.

Of course, you can always make a universal identifier by concatenating
(/including) all the possible options, but a global unique data
identifier standard would certainly be pretty nice. It is more difficult
than just choosing a hash algorithm though, considering things like
prehash encryption and metadata. Just trying to create a document format
that always coalesces for identical data is difficult enough since the
trend elsewhere has been text formats that are not strict (not character
sensitive, allow for comments, have spacing issues, etc, etc).

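The coalescing problem described here can be illustrated with standard
hashes. The following is only a sketch in Python under stated
assumptions: the toy "key: value" format with '#' comments and the
canonicalize helper are invented for the example, and only SHA1 and
SHA256 are computed because the other identifiers named in this thread
(TigerTree, Freenet-CHK, MojoID) are not in the Python standard library.

```python
import hashlib


def identifiers(data: bytes) -> dict:
    """Compute several content-derived identifiers for the same bytes."""
    return {
        "SHA1": hashlib.sha1(data).hexdigest(),
        "SHA256": hashlib.sha256(data).hexdigest(),
    }


def canonicalize(text: str) -> bytes:
    """Hypothetical strict form: collapse whitespace runs and drop
    blank lines and lines that start with '#'."""
    lines = []
    for line in text.splitlines():
        stripped = " ".join(line.split())
        if stripped and not stripped.startswith("#"):
            lines.append(stripped)
    return "\n".join(lines).encode("utf-8")


# Two documents a lenient parser would treat as the same content.
doc_a = "title:  Example\n# a comment\nbody: hello\n"
doc_b = "title: Example\nbody:   hello\n"

# The raw bytes differ, so the identifiers differ.
print(identifiers(doc_a.encode()) == identifiers(doc_b.encode()))
# After canonicalization both documents map to the same identifiers.
print(identifiers(canonicalize(doc_a)) == identifiers(canonicalize(doc_b)))
```

Any real format would need its own agreed canonical form; the point is
only that hashing happens on bytes, not on meaning.
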
<>

> Might it not be nice, someday, to tunnel different segments from
> different places, simultaneously or spread across time periods,
> for lots of reasons -- not least of which being performance and
> resistance to traffic analysis?

No, this hash format is used to verify streams as they pass through
nodes, within atomic pieces of data. The verification segments are not
separately addressable - and making them so (which makes no sense
because we have another format for that) would mean changing the data
format anyways.

--
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
oskar@freenetproject.org

From cyb at azrael.dyn.cheapnet.net Tue Sep 4 18:17:01 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <014b01c13571$d6c07c00$0ea7fea9@golden>
Message-ID:

> With Bitzi, we don't mind a proliferation of identifiers, because
> we aim to provide, as metadata, alternate names/IDs for catalogued
> files.

This reminds me that I have a question about Bitzi and this seems like
the right place to ask it. Is there a programmatic interface for talking
to Bitzi? Can I write a client? What format does the information come in?
I think it would be very wonderful if I could get an RDF serialization
of the Bitzi catalog via CGI or XML-RPC.

From gojomo at usa.net Tue Sep 4 20:43:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References:
Message-ID: <002b01c135bd$6df13560$1a1ffea9@gojovaio>

Brandon Wiley writes:
> This reminds me that I have a question about Bitzi and this seems like the
> right place to ask it. Is there a programmatic interface for talking to
> Bitzi? Can I write a client? What format does the information come in?
> I think it would be very wonderful if I could get an RDF serialization of
> the Bitzi catalog via CGI or XML-RPC.

We intend to offer several interfaces. The only one up and running right
now is the plain HTTP POST that does a submit/lookup -- but its return
info is HTML intended for human eyes.

The next priority is an HTTP GET or POST which dumps summary info
about one or more given hashes (by SHA1 or full Bitprint). The
summary info will definitely be XML, it might be proper RDF.
Suggestions and commentary from interested parties can definitely
shape what is offered; wishlists and preferences, here or at
our website discussion forums would be appreciated.

After that, other interfaces will be developed as outside demand
and our capacity-to-service-them grow. For example, a possibility
that often comes up is an XML-RPC interface for performing the
same set of basic contribution/rating/search services
available through the web interface.

There will also be periodic full dumps of the contributed info
at various levels of detail, for mirroring and the creation of
derivative/experimental works. DMOZ is the model we'll emulate,
but here, too, requests and suggestions are welcome.

- Gojomo

From cyb at azrael.dyn.cheapnet.net Tue Sep 4 23:46:01 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <002b01c135bd$6df13560$1a1ffea9@gojovaio>
Message-ID:

> The next priority is an HTTP GET or POST which dumps summary info
> about one or more given hashes (by SHA1 or full Bitprint). The
> summary info will definitely be XML, it might be proper RDF.
> Suggestions and commentary from interested parties can definitely
> shape what is offered; wishlists and preferences, here or at
> our website discussion forums would be appreciated.

My preference is certainly for an RDF dump via HTTP. I'm going to be
demoing at O'Reilly my P2P searching technology.
It works by gathering RDF databases of metadata. It was originally
developed for Freenet, but I'm working on integration with MojoNation
and BitTorrent. I'd like to add Bitzi as another system which can be
searched in order to find URLs to content in various networks. If Bitzi
provides a straight RDF dump of its whole database then it can act
simply as a catalog source and searching can be done in my application.
If Bitzi provides an implementation of the searching API either via the
CGI version of the API or the XML-RPC version then it can actually be
used as a drop-in replacement for the search engine part of the
application.

I'm very interested in the interoperability of systems and I think it
would be great to have a centralized metadata catalog as well as the
decentralized catalogs which can exist in MN and BT. Centralized
catalogs have distinct advantages at times. Bitzi is in a fine position
to serve that role. All that really needs to be done to get things
started is to provide an RDF serialization of the database via HTTP.

From gojomo at usa.net Wed Sep 5 01:12:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References:
Message-ID: <01e801c135e2$5d03dda0$0ea7fea9@golden>

Brandon Wiley writes:
> My preference is certainly for an RDF dump via HTTP. I'm going to be
> demoing at O'Reilly my P2P searching technology. It works by gathering RDF
> databases of metadata. It was originally developed for Freenet, but I'm
> working on integration with MojoNation and BitTorrent. I'd like to add
> Bitzi as another system which can be searched in order to find URLs to
> content in various networks. If Bitzi provides a straight RDF dump of its
> whole database then it can act simply as a catalog source and searching
> can be done in my application.

The dump will grow quite large; how often would you expect to
schedule full-fetches (or delta-fetches)?

Do you have any example RDF dumps which would demonstrate the
fields and format conventions you'd find most useful?

(We won't invent if there are already good precedents to mimic,
and we could crank out an initial dump in very short order if
it'd help give you something better to demo at O'R-P2P.)

> If Bitzi provides an implementation of the
> searching API either via the CGI version of the API or the XML-RPC version
> then it can actually be used as a drop-in replacement for the search
> engine part of the application.

What is your dominant search model? Free text across all credible
metadata? Field-specific with things like scalar value comparisons
(e.g. "128 <= bitrate <= 196")? Both?

> I'm very interested in the interoperability of systems and I think it
> would be great to have a centralized metadata catalog as well as the
> decentralized catalogs which can exist in MN and BT. Centralized catalogs
> have distinct advantages at times. Bitzi is in a fine position to serve
> that role. All that really needs to be done to get things started is to
> provide an RDF serialization of the database via HTTP.

And that's exactly the role we'd like to play -- being the
steward for cataloguing tasks which are easiest to do with a
shared, central reference point, while letting the metadata
itself travel whatever chaotic paths make the most sense to
system developers and users.

- Gojomo

From coderman at mindspring.com Wed Sep 5 17:07:02 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec 9 22:11:43 2006
Subject: Various identifier choices Re: [p2p-hackers] Morpheus, Freenet,MojoNation (was Semantic Routing BOF)
References: <009101c130bf$49122fe0$0ea7fea9@golden> <87bskyfyf1.fsf@azrael.dyn.cheapnet.net> <00b301c130c7$61a960c0$0ea7fea9@golden> <20010829235342.J383@sandbergs.org> <014b01c13571$d6c07c00$0ea7fea9@golden>
Message-ID: <3B96DC2A.5ED52BEF@mindspring.com>

Gordon Mohr wrote:
>
> ...
>
> With Bitzi, we don't mind a proliferation of identifiers, because
> we aim to provide, as metadata, alternate names/IDs for catalogued
> files.
>
> So for example you'll eventually be able to pull from the Bitzi
> catalog a record like...
>
>   Bitprint (SHA1+TigerTree)
>     \_ Freenet-CHK
>     \_ MojoID
>     \_ MD5
>     \_ etc. (KazaaID? SHA256?)
>
> ...whenever users have contributed such associations.
>

Regarding access to this data, would it be possible to implement some
kind of lookup information to obtain all known metadata for a given
SHA-1 key (or TigerTree, or MD5, etc)?

This would be very useful, and would place a much smaller load than
retrieving significant portions of the index...

From gojomo at usa.net Thu Sep 6 01:20:02 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec 9 22:11:43 2006
Subject: Various identifier choices Re: [p2p-hackers] Morpheus, Freenet,MojoNation (was Semantic Routing BOF)
References: <009101c130bf$49122fe0$0ea7fea9@golden> <87bskyfyf1.fsf@azrael.dyn.cheapnet.net> <00b301c130c7$61a960c0$0ea7fea9@golden> <20010829235342.J383@sandbergs.org> <014b01c13571$d6c07c00$0ea7fea9@golden> <3B96DC2A.5ED52BEF@mindspring.com>
Message-ID: <016201c136ac$a4582740$0ea7fea9@golden>

Coderman writes:
> Gordon Mohr wrote:
> > With Bitzi, we don't mind a proliferation of identifiers, because
> > we aim to provide, as metadata, alternate names/IDs for catalogued
> > files.
> >
> > So for example you'll eventually be able to pull from the Bitzi
> > catalog a record like...
> >
> >   Bitprint (SHA1+TigerTree)
> >     \_ Freenet-CHK
> >     \_ MojoID
> >     \_ MD5
> >     \_ etc. (KazaaID? SHA256?)
> >
> > ...whenever users have contributed such associations.
> Regarding access to this data, would it be possible to implement some
> kind of lookup information to obtain all known metadata for a given
> SHA-1 key (or TigerTree, or MD5, etc)?
>
> This would be very useful, and would place a much smaller load than
> retrieving significant portions of the index...

That's exactly the first stable interface we'll offer. It'll functionally be something like:

  getTicket(SHA1 or Bitprint) -> returns XML "ticket" of best known
      contributed info about file with that hash
      (redundant/down-rated tags will not be included)

- Gojomo

From cyb at azrael.dyn.cheapnet.net Fri Sep 7 02:06:02 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <01e801c135e2$5d03dda0$0ea7fea9@golden>
Message-ID:

> The dump will grow quite large; how often would you expect to
> schedule full-fetches (or delta-fetches)?

It would be best if Bitzi implemented the searching API directly so that clients could talk directly to Bitzi without having to download the entire dump. If not then it would probably be best to have a centralized service which occasionally fetches a dump from Bitzi and then implements the searching API so as to free normal nodes from having to fetch anything massive. So if I end up implementing a Bitzi searching service then fetches will be scheduled whenever it is convenient for Bitzi for fetches to be scheduled.

> Do you have any example RDF dumps which would demonstrate the
> fields and format conventions you'd find most useful?

Yes I do. My search engine can be configured to handle any schema. However I've been using Dublin Core because it's a standard schema for talking about files and generally people want to search for files. I've attached an example database. It doesn't use all of the Dublin Core fields, just the ones that I felt like filling in.

On a side note, I replaced the DC schema one day with one I made up and turned my search engine into a personal contact information database and let my friends add themselves. So it's not limited to file searching.
> (We won't invent if there's already good precedents to mimic,
> and we could crank out an initial dump in very short order if
> it'd help give you something better to demo at O'R-P2P.)

That would be great! I could give a great demo with a fat database. If you decide to include fields that aren't in Dublin Core then just give me a list of the names of the fields and I'll configure it to use that schema instead.

> What is your dominant search model? Free text across all credible
> metadata? Field-specific with things like scalar value comparisons
> (e.g. "128 <= bitrate <= 196")? Both?

Currently the API only supports substring matches on a field-by-field basis for a set of fields defined by a particular schema. So if you're using DC, for instance, you can search for "ala" in the "Creator" field and "Wo" in the "Title" field, things like that. I'd like to add more complex searching to the API but I think that some discussion needs to occur regarding a good API for searching metadata before the API can be extended past its most basic and obvious initial form.

> And that's exactly the role we'd like to play -- being the
> steward for cataloguing tasks which are easiest to do with a
> shared, central reference point, while letting the metadata
> itself travel whatever chaotic paths make the most sense to
> system developers and users.

Whee! This sounds like fun.

From cyb at azrael.dyn.cheapnet.net Fri Sep 7 02:09:01 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <01e801c135e2$5d03dda0$0ea7fea9@golden>
Message-ID:

Here's that example RDF file that I was supposed to attach.

-------------- next part --------------
Noir-03.avi Noir Episode 3 Unknown If I knew that, then I could kill you. Bakamx Fansubs video/x-msvideo Fansub
bush.html Britney Spears Gene Weingarten What does the Britney Spears chatroom have to say about the new Bush administration?
Washington Post text/html Copyright 2001 The Washington Post Company Noir-02.avi Noir Episode 2 Unknown Who am I? And why don't I feel regret? Bakamx Fansubs video/x-msvideo Fansub Noir-01.avi Noir Episode 1 Unknown Sexy women committing horribly violent acts for no apparent reason Bakamx Fansubs video/x-msvideo Fansub Life, liberty, and the pursuit of happiness human/female Rose Kennedy Her mother A lovely young lady from Chicago, whose hobbies include computer security and punk rock. N/A rose_kndy 1 gold bar, please e-cash burning man admission fee u.s. gubment greenbacks u.s. treasury admission-fee f e a b c d cam.jpg Brandon's Webcam Brandon Wiley A picture of me, updated once every 60 seconds. Brandon Wiley image/jpeg BSD (with attribution clause) GPL image/jpeg My Sexxy LCD Monitor V.N.V.Tech My monitor is flat and very light. Does it make you randy? Brandon Wiley lcd.jpg tux-vs-clippy.mov Tux vs. Clippy Viktorie Navratilova An epic battle between two forces of nature. Brandon Wiley video/quicktime GPL yatta.mpeg Yatta! Unknown Wacky Japanese Boy Band Music Video Unknown video/mpeg GPL bomberman.nes Bomber Man Unknown ROM for the classic Nintendo game where you blow stuff up Unknown application/nestra-rom You have no right to this. edith.mp3 She Thinks She's Edith Head They Might Be Giants A song about a confused young lady Unknown audio/mp3 It's an MP3. @.txt @ Brandon Wiley The script for a comic book I wrote. Brandon Wiley text/plain Heck if I know From bram at gawth.com Mon Sep 10 16:20:01 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] BitTorrent 2.3 out - now with user feedback! Message-ID: I just pushed out the latest release of BitTorrent - it now has great user feedback, and I fixed a bug where I forgot to make remotely initated sockets non-blocking (oops). You can get it here - http://bitconjurer.org/BitTorrent/download.html The next release will be the much anticipated web-integrated one for Windows. 
-Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Wed Sep 12 11:06:02 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] 2.3.1 out - BitTorrent has download resuming Message-ID: BitTorrent now has download resuming, get it here - http://bitconjurer.org/BitTorrent/download.html To use download resuming, simply save using the same local file name you used for a partial download, and it will pick up where it left off. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Sat Sep 15 16:46:01 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] BitTorrent now integrates into Internet Explorer! Message-ID: The much-anticipated web-integrated version of BitTorrent is now out - http://bitconjurer.org/BitTorrent/ Next will come some serious polishing - clean shutdown, no more text window, and storing of hash values in the publisher. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From zooko at zooko.com Sun Sep 16 09:05:01 2001 From: zooko at zooko.com (zooko@zooko.com) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks Message-ID: About two weeks ago I posted a message entitled "proxying and introduction: the two fundamental operations of emergent networks": http://zgp.org/pipermail/p2p-hackers/2001-August/000258.html an anonymous fan wrote: > > This is great stuff, Zooko. It has really got me thinking. ...too bad that the > following thread was about MN more than how the resulting emergent network > behaves. Thanks! Yes, I was sort of hoping that Oskar or someone would pick up the implicit challenge. 
See, AFAICT, Freenet and most other networks have focussed exclusively on proxying and neglected introduction. There is a sound theoretical reason for concentrating on proxying, as Oskar can lucidly explain: that with introduction alone, and no proxying, the size of your horizon is proportional to the size of your local state (i.e. you have to remember everyone's id and address, or whatever, in order to use them). Therefore introduction is "non-scalable" in Oskar's opinion.

The counterargument to that is that introduction is *required*. A new node is created and it is not connected to anyone in the network. How does it get connected? That is the problem of Original Introduction.

People who neglect introduction end up with some kind of kludge to do Original Introduction (e.g. node-lists on HTTP, the MN Meta Trackers, or manually configuring your node to connect to other nodes), and with no transitive introduction at all. But that gives a central point of failure (e.g. you can take out the HTTP server that newcomers depend on in order to connect to the network), or at least passes the buck for having a robust, emergent introduction service off to the HTTP or the manual configuration or whatever.

So a really good emergent network needs both: proxying *and* introduction.

IMO a really good emergent network is going to have:

1. good Transitive Introduction
2. good Original Introduction, which should utilize the features of the Transitive Introduction -- instead of being a wholly separate behavior with different properties
 2.a. easy-for-users Original Introduction such as "type in the DNS or IP of your friend who is already connected to the Emergent Net", or "scan local net / local wireless area for Emergent Net nodes"
 2.b. an easily accessible default original introduction service i.e. a set of redundant original introducer nodes like the MN Meta Trackers
3. Proxying for superlinear effective horizon

But in the immediate term, I don't really care about number 3: proxying of operations (although I do care about relaying of messages) until the number of nodes on my network times the amount of local space it takes to carry on a relationship is approaching the amount of local space that I am willing to allocate on each node.

Quick back-of-the-envelope calculations go like this:

128 bytes for addressing information (could do with less, but let's be a bit pessimistic), 128 bytes for crypto information (could do with less, again), and let's just say 256 bytes for local reputation (i.e. what you know about that other node, how it performs, etc.) for a nice round number of 512 bytes per counterparty. (Note: if you really wanted to squeeze I believe you could get this down to 20 bytes for crypto and addressing, and maybe 20 bytes for local reputation for a total of 40 bytes per counterparty, but let's be pessimistic while writing on the back of this particular envelope...)

So if users are willing to allocate up to 128 MB of local persistent storage for maintaining their relationships in the network, then each node can have direct relationships with at least 2^18 == 262,144 other nodes.

By the time this 2^18 limit is actually hurting my network, Freenet v0.5 will be out, and I can learn from all of the applied research that Freenet has done in effective proxying techniques and then add those proxying techniques into my network.

Of course, it's also possible that harddrives will be bigger by then and users will be willing to allocate more than 128 MB just for peer relationships.

Now my purpose here is definitely not to criticize proxying as such! Proxying is very important for a lot of reasons.
For one thing, if you don't have proxying then local network usage is *also* proportional to the effective network horizon of a given operation (although there are several things that can ameliorate that problem including multicast) and for another thing, smaller devices with tighter storage constraints are more likely to need proxying. Another reason is that there may be some theoretical or engineering benefits to combining message-relay with higher-layer operations (as Freenet does) as opposed to making them separate abstraction layers (as Mojo Nation does). Finally, as Lucas Gonze pointed out in private e-mail, proxying allows for more complex relationships, for example maybe you don't *want* to introduce your two friends to each other, because you don't want them to be able to exchange information without giving you access to it.

So my purpose here is *not* to denigrate proxying, but to draw attention to the importance of introduction. Here is a quick recap of the reasons why introduction is vitally important to emergent networks:

1. Introduction is required, in order to have a network at all in the first place.
2. You can do non-transitive Original Introduction, but then it must be centralized and/or manually managed by humans. (Which is okay for a lot of applications.)
3. If you are going to implement transitive, automatic Introduction in order to have robust, automatic joining of the network, then why not use it? e.g.:
 a. You already use it to go from 1 neighbors to K neighbors (where K is the minimum number of neighbors that you need to be part of the network), then why not use it to go from K neighbors to M neighbors, where M is a higher number for greater efficiency in some cases.
 b. Use automatic transitive introduction to dynamically heal and optimize the network.
4. Introduction may make for more efficient networks than proxying in some cases (those cases where higher degree of connectivity is better).
 a. One such case seems to be when the total number of nodes on the entire network is sufficiently small, which is the current state of Mojo Nation.

Regards,

Zooko

P.S. Thanks again, for those who didn't read my earlier message, to Oskar, Adam Langley, and Bram for getting me thinking about this last year, and to Mark Miller for teaching me the ways of Granovetterism (== Introductionism).

P.P.S. I don't know that much about Freenet, and it probably already has some transitive introduction features, but the Freenet people have not discussed it in terms of "Proxying and Introduction: The Two Fundamental Network Operations" before now as far as I know.

From zooko at zooko.com Sun Sep 16 12:08:01 2001
From: zooko at zooko.com (Zooko)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
In-Reply-To: Message from zooko@zooko.com of "Sun, 16 Sep 2001 08:55:26 PDT."
References:
Message-ID:

following up to my own post:

I, Zooko, wrote:
>
> P.P.S. I don't know that much about Freenet, and it probably already has some
> transitive introduction features, but the Freenet people have not discussed it
> in terms of "Proxying and Introduction: The Two Fundamental Network Operations"
> before now as far as I know.

I got a nice description from Adam Langley on IRC (irc.openprojects.net, channel #infoanarchy, founded by infoanarchy.org), of Freenet's transitive introduction mechanism. It sounds typically elegant, and I'm sure we are all eager to know how it behaves emergently.

Right now I'm worrying about Original Introduction -- how do you get new nodes added to the network for the first time, if your centralized node list, CGI script, or Meta Tracker, is unavailable?

The best answer I can come up with is that they talk to one of their friends who is already in the network and then they type in the IP address and port number of that friend's node.
Regards,

Zooko

From hal at finney.org Sun Sep 16 12:15:02 2001
From: hal at finney.org (hal@finney.org)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
Message-ID: <200109161914.MAA07207@finney.org>

Zooko writes:
> Right now I'm worrying about Original Introduction -- how do you get new nodes
> added to the network for the first time, if your centralized node list, CGI
> script, or Meta Tracker, is unavailable?

If the only problem is finding another node, and if nodes listen on well known ports, you can just try IP addresses at random until you find one. This method is being used by the Linux Morpheus clone at http://sourceforge.net/projects/gift/.

However it is not a good long term solution for many P2P systems for two reasons: first it may not be a good idea to use a well known port if it turns out to be controversial; and second, IPv6 will provide too many addresses to ping them at random.

> The best answer I can come up with is that they talk to one of their friends
> who is already in the network and then they type in the IP address and port
> number of that friend's node.

Maybe a browser plug-in could save the need for typing the info; people could put it on a web page in some special format and the plug-in could read the node address and launch the P2P client.

Hal

From coderman at mindspring.com Sun Sep 16 12:31:01 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
References:
Message-ID: <3BA51BB0.9D29344@mindspring.com>

zooko@zooko.com wrote:
>
> ...
>
> 128 bytes for addressing information (could do with less, but let's be a bit
> pessimistic), 128 bytes for crypto information (could do with less, again), and
> let's just say 256 bytes for local reputation (i.e. what you know about that
> other node, how it performs, etc.) for a nice round number of 512 bytes per
> counterparty. (Note: if you really wanted to squeeze I believe you could get
> this down to 20 bytes for crypto and addressing, and maybe 20 bytes for local
> reputation for a total of 40 bytes per counterparty, but let's be pessimistic
> while writing on the back of this particular envelope...)
>
> ...
>
> Here is a quick recap of the reasons why
> introduction is vitally important to emergent networks:
>
> ...
>
> 3. If you are going to implement transitive, automatic Introduction in order to
> have robust, automatic joining of the network, then why not use it? e.g.:
>  a. You already use it to go from 1 neighbors to K neighbors (where K is the
>  minimum number of neighbors that you need to be part of the network), then
>  why not use it to go from K neighbors to M neighbors, where M is a higher
>  number for greater efficiency in some cases.
>  b. Use automatic transitive introduction to dynamically heal and optimize the
>  network.
> 4. Introduction may make for more efficient networks than proxying in some
> cases (those cases where higher degree of connectivity is better).
>  a. One such case seems to be when the total number of nodes on the entire
>  network is sufficiently small, which is the current state of Mojo Nation.
>

A few thoughts:

I am building a system that uses no proxying (well, proxying is not a required feature of the network, although it can be used at a single level of indirection) and relies exclusively on transitive introduction as you call it, and reputation to organize peers within the network.

This has the effect that a) like you mention, you are continually healing and optimizing the organization/topology of the network, and b) This network uses an extremely high level of direct connectivity, with the exact amount of connectedness determined by each peer on an individual basis based on factors such as bandwidth, memory, user preference, etc.
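As a rough check of the storage arithmetic quoted above, here is a minimal sketch. The 512-byte and 40-byte per-counterparty figures and the 128 MB budget come from Zooko's post; the function name `max_counterparties` is made up for illustration and is not part of any system discussed in this thread.

```python
def max_counterparties(budget_bytes: int, bytes_per_counterparty: int) -> int:
    """Return how many whole per-counterparty records fit in the storage budget."""
    return budget_bytes // bytes_per_counterparty


# Figures taken from the post; everything else here is illustrative.
BUDGET = 128 * 2**20           # 128 MB of local persistent storage
PESSIMISTIC = 128 + 128 + 256  # addressing + crypto + local reputation = 512 bytes
SQUEEZED = 20 + 20             # crypto/addressing + local reputation = 40 bytes

pessimistic_peers = max_counterparties(BUDGET, PESSIMISTIC)
squeezed_peers = max_counterparties(BUDGET, SQUEEZED)

print(pessimistic_peers)  # 2**27 / 2**9 = 2**18 = 262144
print(squeezed_peers)
```

With the pessimistic record size this reproduces the 2^18 figure; with the squeezed 40-byte record the same budget would cover a few million direct relationships, so storage alone is unlikely to be the first limit a peer runs into.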
Now, regarding Original Introduction: I am still working on a good way to do this, and it is rather hard to come up with something general and robust.

I have decided to make this bootstrapping available in a few specific ways:

1. By default, there is a default page (or pages) that can be queried to grab a few initial hosts. Once these are obtained, the transitive introduction process is primed, and this method should never be required again.
2. Manually enter a friend's node address. Again, this kicks off the transitive introduction, so this should only be required once.
3. Scanning subnets for nodes. I don't really like this method, so I will probably avoid this at all costs, unless someone impresses upon me the dire need for it.

Transitive introduction in my case consists of querying the highest quality peers for a set of their higher quality peers. In gnutella, this transitive introduction consisted of watching host addresses fly through the network, and connecting to any number of them as desired.

Are there additional ways of transitive introduction that have not been widely discussed yet? In particular I am curious how FastTrack implements this.

Obviously central servers should be avoided, but the decentralized options appear rather limited off hand.

From lucas at gonze.com Sun Sep 16 12:46:01 2001
From: lucas at gonze.com (Lucas Gonze)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
In-Reply-To: <200109161914.MAA07207@finney.org>
Message-ID:

I have been thinking about a seeding method that works by spreading notifications across a bunch of public -- non-dedicated and unrelated -- forums.

There are a zillion archived and searchable publication nodes. A node could ship with knowledge of a Google search string, an Altavista search string, a list of discussion group search URLs, etc. Just a big list of URLs and some prior knowledge of what to do with each.
These don't have to be on the web, they just have to be reachable in some way. Newsgroups, ftp sites, DNS records, online classifieds, IRC... anything that can accept and publish data.

Some nodes would drop their contact data in these locations, others would pick it up.

This is a flavor of LIPP, the Lossy Inefficient Paranoid Protocol:
http://groups.yahoo.com/group/decentralization/message/3287

Gordon Mohr pointed out the relation of LIPP to:

The Dining Cryptographers Problem: Unconditional Sender and Recipient Untraceability
David Chaum
J. Cryptology (1988)
http://komarios.net/crypt/diningcr.htm
(Also http://komarios.net/crypt/dc.htm & http://komarios.net/crypt/dc-demo.htm)

Chaffing and Winnowing: Confidentiality without Encryption
Ronald L. Rivest
March 18, 1998 (rev. April 24, 1998)
http://theory.lcs.mit.edu/~rivest/chaffing.txt

I noticed the other day that dogs communicating via pee is LIPP-like in that it is lossy and inefficient but not paranoid.

- Lucas

From coderman at mindspring.com Sun Sep 16 13:05:02 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
References:
Message-ID: <3BA52394.DEBE0941@mindspring.com>

Lucas Gonze wrote:
>
> ...
>
> There are a zillion archived and searchable publication nodes. A node could
> ship with knowledge of a Google search string, an Altavista search string, a
> list of discussion group search URLs, etc. Just a big list of URLs and some
> prior knowledge of what to do with each. These don't have to be on the web, they
> just have to be reachable in some way. Newsgroups, ftp sites, DNS records,
> online classifieds, IRC... anything that can accept and publish data.
>

I had actually thought of using this method. I put up a page with a well defined string 'ALPINE_BOOTSTRAP_HOSTS' and let google get around to crawling it.
Once indexed, anyone could put up a page with 'ALPINE_BOOTSTRAP_HOSTS' and if google got to it, it would eventually be locatable by a client for use with bootstrapping some peers. This would then kick off the transitive introduction, and so it should only be needed once.

Example:
http://www.google.com/search?q=cache:FAVsZne1Lao:cubicmetercrystal.com/alpine/bootstrap_hosts.html+ALPINE+BOOTSTRAP&hl=en

I took this page down, and decided against this method because the relevance metrics used by search engines can be abused. I don't know how likely it would be, but a malicious party could place a highly linked page up that only lists rogue endpoints. (perhaps the RIAA wants peers connecting to their snooping servers?)

It is an interesting idea, and perhaps coupled with some kind of asymmetric/public key encryption could be implemented in a secure fashion. I.e. All pages need to be authorized and signed by something like a certificate authority for your network (maybe the maintainers, who knows). If the page located by google has a valid signature, then it could be used. If it does not, then search for the next page until you find one that does have a valid signature.

You could apply this same mechanism to a mailing list or news group as well...

From alk at pobox.com Sun Sep 16 18:00:01 2001
From: alk at pobox.com (Tony Kimball)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
References:
Message-ID: <15269.19176.105147.503740@gargle.gargle.HOWL>

Quoth Zooko on Sunday, 16 September:
: Right now I'm worrying about Original Introduction -- how do you get new nodes
: added to the network for the first time, if your centralized node list, CGI
: script, or Meta Tracker, is unavailable?

Peer-wise horizontal diversification is just one form of decentralization.
Another is vertical decentralization, in which a variety of modalities are used to obtain operational information, with rolling failover between them. It is more time-consuming to implement a multimodal introduction mechanism, but it can make your system quite robust. Pick a few:

- IM services
    Messenger
    Yahoo
    AOL
    etc.
- Mailboxes
    Hotmail
    Yahoo
    etc.
- NNTP servers/groups
- IRC networks/channels
- GNUTELLA-based discovery (requires a significant number of agents listening on the gnet, due to limited horizon)
- Freenet-based discovery
    Publish connection-point lists
- DNS-based discovery
    dyndns.org
    eyep.net
    etc.
- Supernodes
- FastTrack-based discovery
- Opennap/napigator-based discovery

Some are very quick to get working because there is plenty of solid, re-usable code available.

Myself, I'd like to see the peer-system community rally to establish a reliable, persistent open DNS space for discovery services. Perhaps it's a field-of-dreams situation: If you build it, they will come. But perhaps not: So much NIH.

From lucas at gonze.com Mon Sep 17 11:45:02 2001
From: lucas at gonze.com (Lucas Gonze)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
In-Reply-To: <15269.19176.105147.503740@gargle.gargle.HOWL>
Message-ID:

In the early gnutella days you got an introduction by going to #gnutella on irc and looking up IPs of people in the room, who were likely to be running the software. This way of bootstrapping a network based on personal connections was primitive and slow but extremely robust.

> Peer-wise horizontal diversification is just one form of
> decentralization. Another is vertical decentralization, in which
> a variety of modalities are used to obtain operational information,
> with rolling failover between them. It is more time-consuming
> to implement a multimodal introduction mechanism, but it can
> make your system quite robust.
Pick a few: From ml at gondwanaland.com Tue Sep 18 17:50:02 2001 From: ml at gondwanaland.com (Mike Linksvayer) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Bitzi (was Various identifier choices) In-Reply-To: ; from cyb@azrael.dyn.cheapnet.net on Fri, Sep 07, 2001 at 04:05:14AM -0500 References: <01e801c135e2$5d03dda0$0ea7fea9@golden> Message-ID: <20010918204913.A80700@or.pair.com> On Fri, Sep 07, 2001 at 04:05:14AM -0500, Brandon Wiley wrote: > > (We won't invent if there's already good precedents to mimic, > > and we could crank out an initial dump in very short order if > > it'd help give you something better to demo at O'R-P2P.) > > That would be great! I could give a great demo with a fat database. If you > decide to include fields that aren't in Dublic Core then just give me a > list of the names of the fields and I'll configure it to use that schema > instead. An experimental dump featuring basic "best data" on 261,190 discrete files is now available at http://preview.openbits.org. Here's a one record example: 4128768 4944330300000000170B47454F42000005900000 Brazzaville - Brazzaville 2002 - 05 - Ocean (With Joe Frank).mp3 http://www.emusic.com/albums/19514/ Ocean (With Joe Frank) Brazzaville 2002 Brazzaville 5 1999 258440 128 44100 y DQ6TJEH2V39CVT3JM2SPCHBVH3SDWX7W Joe_Frank EWGAR2KGANV9QMI9B4529TD496 Bitprint detail may be accessed (html only right now) at http://bitzi.com/lookup/, i.e., http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VDNK5UNR8ZPQ5MFASNGVB5MISV7ESUSB2MN5R3IY2 for the example above. 
The sha1 component of the bitprint may also be used alone, like http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD

You'll encounter the following non-Dublin Core fields:

bz:length        File size
bz:first20       First 20 bytes of file
bz:subjective    Subjective comment
bz:url           Related URL
bz:album         Album name
mm:trackNum      Album track number
mm:duration      Track duration (ms)
bz:bitrate       (kilobits/second)
bz:samplerate    Hz
bz:stereo        y|n
bz:encoder       Audio encoder
bz:audio_sha1    sha1 of audio data
bz:width         Image width
bz:height        Image height
bz:bpp           Image bits/pixel
bz:samplesize    .wav specific
bz:channels      Channels in a .ogg (stereo=2)
bz:broadcaster   Original broadcaster
bz:series        Series name
bz:medium        Broadcast medium
mm:trmId         Relatable audio fingerprint
bz:society       "BitSociety" interest group
bz:md5           MD5 full file hash

We also used description, title, creator and date from Dublin Core. Obviously you won't find all of the above in a single record.

This experimental dump is intentionally simple and flat. Future dumps may be more structured and contain more Bitzi "community" data, e.g., contributor attributions and content rating.

Criticism desired!

--
Mike Linksvayer
http://gondwanaland.com/ml/

From coderman at mindspring.com Tue Sep 18 18:22:01 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com>
Message-ID: <3BA7F46F.892E3989@mindspring.com>

Mike Linksvayer wrote:
>
> ...
>
> for the example above. The sha1 component of the bitprint may also
> be used alone, like
>
> http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD
>

I have a quick question. How hard would it be to support SHA-1 in hex format? I.e.

http://bitzi.com/lookup/4A110667BE591E02DAE39A199390D6699E8D2959

Anyone know what the most popular type of representation is for SHA-1 hashes?
From ml at gondwanaland.com Tue Sep 18 18:34:01 2001
From: ml at gondwanaland.com (Mike Linksvayer)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <3BA7F46F.892E3989@mindspring.com>; from coderman@mindspring.com on Tue, Sep 18, 2001 at 06:27:11PM -0700
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>
Message-ID: <20010918213352.A8771@or.pair.com>

On Tue, Sep 18, 2001 at 06:27:11PM -0700, coderman wrote:
> Mike Linksvayer wrote:
> > for the example above. The sha1 component of the bitprint may also
> > be used alone, like
> >
> > http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD
>
> I have a quick question. How hard would it be to support SHA-1
> in hex format? I.e.
>
> http://bitzi.com/lookup/4A110667BE591E02DAE39A199390D6699E8D2959

We already do support hex-encoded sha1 lookups. The URL you cite redirects to

http://bitzi.com/lookup/JIISN378MERAFYZDVIN3HEGYPGRI4KK3

The hex version of the URL I gave as an example is

http://bitzi.com/lookup/CA9174243CD55B960AA02691075E39730512F663

--
Mike Linksvayer
http://gondwanaland.com/ml/

From coderman at mindspring.com Tue Sep 18 18:46:01 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com>
Message-ID: <3BA7F9F3.61DBF5CE@mindspring.com>

Mike Linksvayer wrote:
>
> ...
>
> We already do support hex-encoded sha1 lookups. The URL you cite
> redirects to
>

Excellent, thanks. I read in some past discussion archives that hex was to be supported for backwards compatibility for an indefinite period of time. Is this still the plan? Is there a date when you might move away from hex?
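The hex and Base32 lookup URLs above are two renderings of the same 160-bit SHA-1 value. Below is a minimal sketch of the conversion. The 32-symbol alphabet is an assumption inferred from the example URLs in this thread (they contain the digit 8, so they cannot use the RFC 4648 alphabet); it is not taken from any Bitzi documentation, and only the symbols that actually occur in those examples are confirmed. `hex_to_base32` is a hypothetical helper name.

```python
# ASSUMPTION: alphabet inferred from the example lookup URLs in this thread
# (A-Z without L and O, then 2-9). It differs from the RFC 4648 alphabet.
ALPHABET = "ABCDEFGHIJKMNPQRSTUVWXYZ23456789"


def hex_to_base32(hex_digest: str, alphabet: str = ALPHABET) -> str:
    """Re-encode a hex digest as Base32, five bits per output character."""
    raw = bytes.fromhex(hex_digest)
    nbits = len(raw) * 8
    if nbits % 5 != 0:
        # A 160-bit SHA-1 divides evenly; padding is out of scope for this sketch.
        raise ValueError("digest length must be a multiple of 5 bits")
    value = int.from_bytes(raw, "big")
    # Walk from the most significant 5-bit group down to the least significant.
    return "".join(
        alphabet[(value >> shift) & 0x1F] for shift in range(nbits - 5, -1, -5)
    )


# The hex lookup quoted above and the URL it was reported to redirect to.
print(hex_to_base32("4A110667BE591E02DAE39A199390D6699E8D2959"))
# -> JIISN378MERAFYZDVIN3HEGYPGRI4KK3
```

Hex needs 40 characters for a SHA-1 and Base32 needs 32, which is one practical reason to prefer Base32 as the displayed form while still accepting hex on input.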
From justin at chapweske.com Tue Sep 18 18:48:01 2001 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Bitzi (was Various identifier choices) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> Message-ID: <3BA7F8D4.2090108@chapweske.com> hex and Base64 are the most popular formats. coderman wrote: >Mike Linksvayer wrote: > >>... >> >>for the example above. The sha1 component of the bitprint may also >>be used alone, like >> >>http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD >> > > >I have a quick question. How hard would it be to support SHA-1 >in hex format? I.e. > >http://bitzi.com/lookup/4A110667BE591E02DAE39A199390D6699E8D2959 > > >Anyone know what the most popular type of representation is for >SHA-1 hashes? >_______________________________________________ >p2p-hackers mailing list >p2p-hackers@zgp.org >http://zgp.org/mailman/listinfo/p2p-hackers > From gojomo at usa.net Tue Sep 18 18:57:01 2001 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Bitzi (was Various identifier choices) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com> Message-ID: <011501c140af$0bda60c0$6601a8c0@gojovaio> Base64 SHA1 lookups -- with the Freenet character substitutions, to be more URL-friendly -- should work too. (The code is there but was only tested on random format conformant input, which of course never returns real catalog details. For example: http://bitzi.com/lookup/JI2sN378M~a2FYzeVIn3HegY-GR ) So please try it out with some Base64 SHA1 values for files you know to be in the catalog, and let me know if it works! 
- Gordon ____________________ Gordon Mohr, gojomo@ bitzi.com, Bitzi CTO From gojomo at usa.net Tue Sep 18 19:21:01 2001 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Bitzi (was Various identifier choices) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com> <3BA7F9F3.61DBF5CE@mindspring.com> Message-ID: <013901c140b2$80708a60$6601a8c0@gojovaio> coderman writes: > Excellent, thanks. I read in some past discussion archives that hex > was to be supported for backwards compatibility for an indefinite > period of time. Is this still the plan? Is there a date when > you might move away from hex? No -- beyond backward compatibility with our prior use of hex, we want these lookups to be convenient/easy for the widest possible audience. So while we prefer Base32 as our canonical/displayed format, we'll accept other formats for lookups indefinitely. - Gordon From justin at chapweske.com Tue Sep 18 20:23:02 2001 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Bitzi (was Various identifier choices) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com> <011501c140af$0bda60c0$6601a8c0@gojovaio> Message-ID: <3BA80F7F.50805@chapweske.com> With so many accepted input formats, wouldn't it make more sense to be explicit about which format you are using in the URL? Gordon Mohr wrote: >Base64 SHA1 lookups -- with the Freenet character substitutions, >to be more URL-friendly -- should work too. > >(The code is there but was only tested on random format >conformant input, which of course never returns real >catalog details.
For example: > http://bitzi.com/lookup/JI2sN378M~a2FYzeVIn3HegY-GR ) > >So please try it out with some Base64 SHA1 values for >files you know to be in the catalog, and let me know if >it works! > >- Gordon >____________________ >Gordon Mohr, gojomo@ >bitzi.com, Bitzi CTO > > > >_______________________________________________ >p2p-hackers mailing list >p2p-hackers@zgp.org >http://zgp.org/mailman/listinfo/p2p-hackers > From cyb at azrael.dyn.cheapnet.net Tue Sep 18 22:58:01 2001 From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Bitzi (was Various identifier choices) In-Reply-To: <20010918204913.A80700@or.pair.com> Message-ID: > An experimental dump featuring basic "best data" on 261,190 discrete > files is now available at http://preview.openbits.org. I'm so excited! I'll use this database in my demo at O'Reilly. I'll start working on integrating the new schema elements tomorrow. From zooko at zooko.com Wed Sep 19 07:24:01 2001 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) In-Reply-To: Message from Justin Chapweske of "Tue, 18 Sep 2001 20:45:56 CDT." <3BA7F8D4.2090108@chapweske.com> References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> Message-ID: Having URLs which are short enough to cut and paste is important. Encoding six bits per character (base 64) is that much better than encoding five bits per character. 
A mojoid in base-32 would look like this: http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1 The same mojoid in base-64 would look like this: http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDhXvLWxCZdzV6F8Q That can make a significant difference in terms of usability, due to line-wrapping in SMTP gateways and in GUIs, the awkwardness of layout when representing this mojoid e.g. in HTML, and the general user experience. The bigger and uglier the URL, the less a user likes to deal with it. By the way, we might try to squeeze mojoids. I think we can get down to 30 bytes from 40 (by convincing ourselves that an 80-bit symmetric key has the same attack work factor as a 160-bit hash id), so then it would look like: http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e1 or http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDh We might also have an unencrypted mojoid, which would be 20 bytes, like this: http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da or http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8No Regards, Zooko From justin at chapweske.com Wed Sep 19 10:03:01 2001 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> Message-ID: <3BA8CFB3.1020509@chapweske.com> Are you planning on using the Freenet modification of Base64 to deal with some of the chars that don't go well in URLs? -Justin Zooko wrote: >Having URLs which are short enough to cut and paste is important. Encoding six >bits per character (base 64) is that much better than encoding five bits per >character. 
> >A mojoid in base-32 would look like this: > >http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1 > >The same mojoid in base-64 would look like this: > >http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDhXvLWxCZdzV6F8Q > >That can make a significant difference in terms of usability, due to >line-wrapping in SMTP gateways and in GUIs, the awkwardness of layout when >representing this mojoid e.g. in HTML, and the general user experience. The >bigger and uglier the URL, the less a user likes to deal with it. > >By the way, we might try to squeeze mojoids. I think we can get down to 30 >bytes from 40 (by convincing ourselves that an 80-bit symmetric key has the >same attack work factor as a 160-bit hash id), so then it would look like: > >http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e1 > >or > >http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDh > >We might also have an unencrypted mojoid, which would be 20 bytes, like this: > >http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da > >or > >http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8No > >Regards, > >Zooko > >_______________________________________________ >p2p-hackers mailing list >p2p-hackers@zgp.org >http://zgp.org/mailman/listinfo/p2p-hackers > From bram at gawth.com Wed Sep 19 10:19:02 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Bitzi (was Various identifier choices) In-Reply-To: <3BA7F8D4.2090108@chapweske.com> Message-ID: On Tue, 18 Sep 2001, Justin Chapweske wrote: > hex and Base64 are the most popular formats. 
Don't forget Base256 :-) -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From zooko at zooko.com Wed Sep 19 10:35:02 2001 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) In-Reply-To: Message from Justin Chapweske of "Wed, 19 Sep 2001 12:02:43 CDT." <3BA8CFB3.1020509@chapweske.com> References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <3BA8CFB3.1020509@chapweske.com> Message-ID: > Are you planning on using the Freenet modification of Base64 to deal > with some of the chars that don't go well in URLs? Mojo Nation has always used `-' in place of `+' and `_' in place of `/'. This also allows us to use mojoids as filenames in most file systems. I'm sure we'd be willing to change that if there were a good reason. Regards, Zooko From gojomo at usa.net Wed Sep 19 10:39:01 2001 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> Message-ID: <005601c14131$cbdc4d20$0ea7fea9@golden> Zooko writes: > Having URLs which are short enough to cut and paste is important. I agree. > Encoding six > bits per character (base 64) is that much better than encoding five bits per > character. Yes, Base64 is 17% more compact than Base32, which is 20% more compact than Hexadecimal. But Base64 introduces case-sensitivity. Especially if you ever use identifier fragments as a shorthand, this introduces situations where they "bleed together" -- in human perception, in filesystems, in search-routines. 
Also, Base64 introduces 2 characters that can present problems in URLs and filenames: '/' and '+'. These also serve as 'break' characters to many text-index and text-search routines. You could use Freenet's patched Base64, which uses the characters '~' and '-' instead, but then you've deviated slightly from a long-standing standard, and still have the 'break' characters problem. In contrast, Base32 is robust across case isomorphisms, safe for URLs and filesystems, and results in full-length and fragment identifiers which are typically recognized as unbroken units by legacy text-search mechanisms. > A mojoid in base-32 would look like this: > > http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1 That looks like Hexadecimal to me; the chance that a 70-digit Base32 number would contain no letters G-Z is infinitesimal. > The same mojoid in base-64 would look like this: > > http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDhXvLWxCZdzV6F8Q If you're lucky enough not to get any '/' or '+' characters! > That can make a significant difference in terms of usability, due to > line-wrapping in SMTP gateways and in GUIs, the awkwardness of layout when > representing this mojoid e.g. in HTML, and the general user experience. The > bigger and uglier the URL, the less a user likes to deal with it. I again agree. However, for the foreseeable future, SHA1 will be a sufficient casual "mailable" key into Bitzi, and SHA1 in Base32 is already a manageable 32-characters. I can see with your longer MojoIDs you have a problem; there is no need for all identifiers to use the same ASCII-compatible-encoding, so perhaps Base64 is the right choice for MojoNation. If Bitzi was to track and display MojoIDs, associated with Bitprints, we would display the MojoIDs in whatever fashion is typical for MojoNation users. > By the way, we might try to squeeze mojoids. 
I think we can get down to 30 > bytes from 40 (by convincing ourselves that an 80-bit symmetric key has the > same attack work factor as a 160-bit hash id), so then it would look like: > > http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e1 > > or > > http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDh > > We might also have an unencrypted mojoid, which would be 20 bytes, like this: > > http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da > > or > > http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8No Sure, knock yourselves out. My only request would be that you document how they are created somewhere (besides the code itself) and freeze the definition at some point. :) - Gojomo From gojomo at usa.net Wed Sep 19 10:53:01 2001 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Bitzi (was Various identifier choices) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com> <011501c140af$0bda60c0$6601a8c0@gojovaio> <3BA80F7F.50805@chapweske.com> Message-ID: <008001c14133$e925f140$0ea7fea9@golden> Justin writes: > With so many accepted input formats, wouldn't it make more sense to be > explicit about which format you are using in the URL? I suppose you mean something like: http://bitzi.com/lookup?sha1=3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD That's a possibility if there's any more proliferation of acceptable identifiers. Right now, though, what comes after the /lookup/ is always a SHA1 identifier, and then an optional TigerTree. The different encodings are a superficial detail, and always distinguishable by their differing lengths. Further, as mentioned elsewhere, URL compactness is a benefit -- so we don't want to add characters which only clarify what is already invariant.
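The remark that the encodings are "always distinguishable by their differing lengths" can be illustrated with a small lookup table. This is a hypothetical sketch, not Bitzi's code: the helper name is invented, it assumes a bare 160-bit SHA-1 value with no padding characters, and it ignores the optional TigerTree part mentioned above.

```python
# Hypothetical helper: classify a /lookup/ identifier by length alone.
# The lengths assume a bare 160-bit SHA-1 value with no padding characters.
SHA1_LENGTHS = {
    40: "hex",     # 4 bits per character
    32: "base32",  # 5 bits per character
    27: "base64",  # 6 bits per character, rounded up
}

def guess_sha1_encoding(identifier: str) -> str:
    """Return the encoding name implied by the identifier's length."""
    try:
        return SHA1_LENGTHS[len(identifier)]
    except KeyError:
        raise ValueError(f"unexpected identifier length: {len(identifier)}") from None

# The three sample identifiers are copied from earlier messages in this thread.
for example in (
    "CA9174243CD55B960AA02691075E39730512F663",
    "3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD",
    "JI2sN378M~a2FYzeVIn3HegY-GR",
):
    print(len(example), guess_sha1_encoding(example))
```

Because 40, 32 and 27 are all different, no format tag is needed in the URL as long as only one hash size is accepted; a second hash size could break that property, which is the proliferation case raised above.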
- Gojomo ____________________ Gordon Mohr, gojomo@ bitzi.com, Bitzi CTO From zooko at zooko.com Wed Sep 19 11:01:02 2001 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) In-Reply-To: Message from Gordon Mohr of "Wed, 19 Sep 2001 10:37:46 PDT." <005601c14131$cbdc4d20$0ea7fea9@golden> References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <005601c14131$cbdc4d20$0ea7fea9@golden> Message-ID: > But Base64 introduces case-sensitivity. Especially if > you ever use identifier fragments as a shorthand, this > introduces situations where they "bleed together" -- > in human perception, in filesystems, in search-routines. Hm. Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if you base64 encode them and then merge all the upper/lowercase! (That isn't obvious, but once upon a time I convinced myself of it. I can find the message in old mojonation-devel archives if you like.) Hm. I'm pretty sure that using fragments as shorthand opens the door to collisions all by itself, and that the upper/lowercase issue doesn't contribute significantly to the risk of spoofing. Can you give me an example of this "bleed together" problem, excluding using fragments? > Also, Base64 introduces 2 characters that can present > problems in URLs and filenames: '/' and '+'. I should have specified that we translate `+' and `/' to `-' and `_' respectively. > In contrast, Base32 is robust across case isomorphisms, > safe for URLs and filesystems, and results in full-length > and fragment identifiers which are typically recognized > as unbroken units by legacy text-search mechanisms. I guess we just differ in our value judgements here. I value shorter ids for cut-and-paste purposes more than I value absence of "break" characters. 
Indeed, I can't really think of a motivating example for caring about "break" characters. Could you please suggest one? > > A mojoid in base-32 would look like this: > > > > http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1 > > That looks like Hexadecimal to me; the chance that a 70-digit Base32 > number would contain no letters G-Z is infinitesimal. Ahem. Okay, that was hexadecimal. The standard Python libraries offer hex and base-64, and in my haste I mistook hex for base-32. Hm. I can't find a base-32 encoder in Python. Could someone who favors base-32, and thus presumably has an encoder handy, show the base-32 version of 40-byte, 30-byte, and 20-byte strings? Thanks! Regards, Zooko From hal at finney.org Wed Sep 19 11:11:02 2001 From: hal at finney.org (hal@finney.org) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) Message-ID: <200109191810.LAA20604@finney.org> Zooko writes: > Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if > you base64 encode them and then merge all the upper/lowercase! (That isn't > obvious, but once upon a time I convinced myself of it. I can find the message > in old mojonation-devel archives if you like.) Base64 encoding a 160 bit hash would take 26-27 characters. Compressing case throws away 1 bit per character that happens to be alphabetic, and 52/64 of the characters will be alphabetic. So you'll end up discarding around 21 bits, giving the hash an effective strength of 139 bits. That should be amply strong for these purposes.
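The estimate above can be checked with a few lines of arithmetic. This is a sketch of the expected-value calculation only, assuming every Base64 character is equally likely; the variable names are illustrative. Carried out without rounding, it lands slightly above 21 bits lost, which agrees with the "around 21 bits" and "139 bits" figures to within a bit.

```python
import math

HASH_BITS = 160        # SHA-1 output size
BITS_PER_CHAR = 6      # Base64 carries 6 bits per character
ALPHABETIC = 52 / 64   # fraction of the Base64 alphabet that is a letter

chars = HASH_BITS / BITS_PER_CHAR   # about 26.7 characters
bits_lost = chars * ALPHABETIC      # one bit per alphabetic character
effective = HASH_BITS - bits_lost   # what survives case folding

print(f"characters: {chars:.1f} (rounds up to {math.ceil(chars)})")
print(f"bits lost to case folding: {bits_lost:.1f}")
print(f"effective strength: {effective:.1f} bits")
```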
Hal From alk at pobox.com Wed Sep 19 11:15:02 2001 From: alk at pobox.com (Tony Kimball) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <005601c14131$cbdc4d20$0ea7fea9@golden> Message-ID: <15272.57453.804563.955204@gargle.gargle.HOWL> Quoth Zooko on Wednesday, 19 September: : : I guess we just differ in our value judgements here. I value shorter ids for : cut-and-paste purposes more than I value absence of "break" characters. : Indeed, I can't really think of a motivating example for caring about "break" : characters. Could you please suggest one? command shells. From zooko at zooko.com Wed Sep 19 11:33:01 2001 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) In-Reply-To: Message from Tony Kimball of "Wed, 19 Sep 2001 13:14:05 CDT." <15272.57453.804563.955204@gargle.gargle.HOWL> References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <005601c14131$cbdc4d20$0ea7fea9@golden> <15272.57453.804563.955204@gargle.gargle.HOWL> Message-ID: > : I guess we just differ in our value judgements here. I value shorter ids for > : cut-and-paste purposes more than I value absence of "break" characters. > : Indeed, I can't really think of a motivating example for caring about "break" > : characters. Could you please suggest one? > > command shells. I've been working with mojoids as part of a full time job and as an obsessive hobby for two years now. 
I've cut-and-pasted with X windows (and regretted that double-click doesn't highlight the entire mojoid if it contains a `-' character), I've used tab-completion to choose files whose names were mojoids, I've written Python code and bash scripts to manipulate mojoids as strings, as filenames, and as URLs. I've received and transmitted mojoids via IRC, e-mail, HTTP, and ssh. I've also read thousands of feedback e-mails from users of my software, watched newbies try to use the software for the first time in an "I'll watch but I won't help" user test on four separate occasions, convinced my mom to use it by telling her that it was the only way to download home movies of her infant grandchild, and used the software myself as an end user in order to publish, share, and download files. And in all that I've never noticed break characters or upper/lowercase to be an issue. But I *have* had problems with overly long mojoids being mangled by e-mail agents and *really* long ones (> 256 characters) being rejected by MSIE. Maybe if the binary object is only 20, 30 or 40 bytes long then expansion like hex or base-32 is okay, but among the three issues at hand: 1. upper-lower, 2. break chars, 3. length, I am concerned about length because of my experiences and because I think that the user experience is most affected by that one. But I admit that this is just an intuition of mine. Maybe users are fine with slightly longer URLs, and maybe they prefer all-lowercase over mixed case URLs. I haven't really heard a definitive statement on that from any users. 
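For the 20-, 30- and 40-byte identifiers discussed in this thread, the lengths at stake can be tabulated directly. A minimal sketch, assuming unpadded output in every encoding; the function and dictionary names are invented for illustration.

```python
ENCODINGS = {"hex": 4, "base32": 5, "base64": 6}  # bits per character

def encoded_length(num_bytes: int, bits_per_char: int) -> int:
    """Characters needed for num_bytes of data, without padding."""
    total_bits = num_bytes * 8
    # Integer ceiling division, so no floating point is involved.
    return (total_bits + bits_per_char - 1) // bits_per_char

for size in (20, 30, 40):
    row = {name: encoded_length(size, bits) for name, bits in ENCODINGS.items()}
    print(size, row)
```

The ratios behind these numbers are 5/6 for Base64 against Base32 and 4/5 for Base32 against hex, which is where the "17% more compact" and "20% more compact" figures quoted earlier in the thread come from.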
Regards, Zooko From alk at pobox.com Wed Sep 19 11:39:01 2001 From: alk at pobox.com (Tony Kimball) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <005601c14131$cbdc4d20$0ea7fea9@golden> <15272.57453.804563.955204@gargle.gargle.HOWL> Message-ID: <15272.58907.924394.802255@gargle.gargle.HOWL> Quoth Zooko on Wednesday, 19 September: : : >: I guess we just differ in our value judgements here. I value shorter ids for : >: cut-and-paste purposes more than I value absence of "break" characters. : >: Indeed, I can't really think of a motivating example for caring about "break" : >: characters. Could you please suggest one? : > : > command shells. : : ... in all that I've never noticed break characters or upper/lowercase to be an : issue. But I *have* had problems with overly long mojoids being mangled by : e-mail agents and *really* long ones (> 256 characters) being rejected by MSIE. okay. non-starter. how about search engines? From gojomo at usa.net Wed Sep 19 11:55:01 2001 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <005601c14131$cbdc4d20$0ea7fea9@golden> Message-ID: <00ca01c1413c$a7539340$0ea7fea9@golden> Zooko writes: > > But Base64 introduces case-sensitivity. Especially if > > you ever use identifier fragments as a shorthand, this > > introduces situations where they "bleed together" -- > > in human perception, in filesystems, in search-routines. > > Hm. 
> > Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if > you base64 encode them and then merge all the upper/lowercase! (That isn't > obvious, but once upon a time I convinced myself of it. I can find the message > in old mojonation-devel archives if you like.) I'd prefer a pointer to the MojoID definition document! > Hm. I'm pretty sure that using fragments as shorthand opens the door to > collisions all by itself, and that the upper/lowercase issue doesn't contribute > significantly to the risk of spoofing. > > Can you give me an example of this "bleed together" problem, excluding using > fragments? With a filesystem or file-management program which ignores or normalizes casing, Base64 names can suffer damage. This might then cause you to not find real matches (on a case-sensitive basis). Or, it might tempt you to use case-insensitive searches, and then you've lost ~21 bits from your secure hash (as Hal Finney mentioned). Alternatively, if you wanted to rely on legacy case-insensitive full-text search to find file identifiers, you'd be introducing a step where your identifiers are 21 bits weaker. (Try, for example, Googling for SMF2Y24TI7Y3CVER8NJKT7CAFGR9FS7Z. Googling for a Base64 identifier would introduce) These problems are further aggravated if you ever find it useful to use fragments. We already have 34 versions of 'Uptown Girl' in the Bitzi database. Given human perception, I think it's easier for people to say or think things like: "The 'PYME' version is complete, the '43N6' version is truncated." ...than to say... "The 'bB/e' version is complete, the 'B+vb' version is truncated." > > Also, Base64 introduces 2 characters that can present > > problems in URLs and filenames: '/' and '+'. > > I should have specified that we translate `+' and `/' to `-' and `_' > respectively. MojoNation and Freenet should get together and use the same "Base64v2".
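The `+' to `-' and `/' to `_' translation described above is the same substitution that Python's standard library calls the URL-safe Base64 alphabet, so it can be sketched with base64.urlsafe_b64encode. Stripping the trailing "=" padding is an assumption inferred from the unpadded examples in this thread, and the function names are invented for illustration; this is not Mojo Nation's actual code.

```python
import base64

def to_urlsafe_base64(hex_id: str) -> str:
    """Encode hex as Base64 with '-' for '+' and '_' for '/', padding removed."""
    raw = bytes.fromhex(hex_id)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def from_urlsafe_base64(text: str) -> str:
    """Decode the unpadded URL-safe form back to lowercase hex."""
    padded = text + "=" * (-len(text) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(padded).hex()

print(to_urlsafe_base64("1b17864eeb6c68294c9b2db0324a2b773401f0da"))
# -> GxeGTutsaClMmy2wMkordzQB8No
```

The 20-byte example happens to contain neither substituted character. A two-byte input such as "fbff" exercises both: it encodes to "+/8=" in standard Base64 and to "-_8" here.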
> > In contrast, Base32 is robust across case isomorphisms, > > safe for URLs and filesystems, and results in full-length > > and fragment identifiers which are typically recognized > > as unbroken units by legacy text-search mechanisms. > > I guess we just differ in our value judgements here. I value shorter ids for > cut-and-paste purposes more than I value absence of "break" characters. > Indeed, I can't really think of a motivating example for caring about "break" > characters. Could you please suggest one? Again, Googling for identifiers. Other full-text searches for fragments. Searching for the Base32 fragment 'B6THNJ' is always a single word; searching for the Base64 fragment 'aS+w/e' might be interpreted as 'as w e' and perhaps ignored completely. > Hm. I can't find a base-32 encoder in Python. Could someone who favors > base-32, and thus presumably has an encoder handy, show the base-32 version of > 40-byte, 30-byte, and 20-byte strings? Thanks! 20b -> 32 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD 30b -> 48 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD8EJ2KEDCV3WQMMPF 40b -> 64 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD8EJ2KEDCV3WQMMPFWFJW6DCVPKXMZQIZ - Gojomo From gojomo at usa.net Wed Sep 19 12:06:01 2001 From: gojomo at usa.net (Gordon Mohr) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <005601c14131$cbdc4d20$0ea7fea9@golden> <00ca01c1413c$a7539340$0ea7fea9@golden> Message-ID: <000701c1413e$16517720$0ea7fea9@golden> I left a thought unfinished: > (Try, for example, Googling for SMF2Y24TI7Y3CVER8NJKT7CAFGR9FS7Z. 
> Googling for a Base64 identifier would introduce) "Googling for a Base64 identifier would introduce not just case-indifference, but also potentially fragment isomorphisms -- if, as I suspect, non-alphanumeric Base64 characters are treated as break characters." From oskar at freenetproject.org Wed Sep 19 14:57:02 2001 From: oskar at freenetproject.org (Oskar Sandberg) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) In-Reply-To: <200109191810.LAA20604@finney.org>; from hal@finney.org on Wed, Sep 19, 2001 at 11:10:07AM -0700 References: <200109191810.LAA20604@finney.org> Message-ID: <20010919235618.B402@sandbergs.org> On Wed, Sep 19, 2001 at 11:10:07AM -0700, hal@finney.org wrote: > Zooko writes: > > Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if > > you base64 encode them and then merge all the upper/lowercase! (That isn't > > obvious, but once upon a time I convinced myself of it. I can find the message > > in old mojonation-devel archives if you like.) > > Base64 encoding a 160 bit hash would take 26-27 characters. Compressing > case throws away 1 bit per character that happens to be alphabetic, and > 52/64 of the characters will be alphabetic. So you'll end up discarding > around 21 bits, giving the hash an effective strength of 139 bits. > That should be amply strong for these purposes. It makes a lot more sense to generate a shorter key to begin with and base32 encode it, than to generate a full length key and then downgrade it based on the base64 encoding...
> > Hal > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Though here at journey's end I lie In darkness buried deep, above all shadows rides the Sun beyond all towers strong and high, and the Stars forever dwell: beyond all mountains steep, I will not say the Day is done, nor bid the Stars farewell. (JRRT) Oskar Sandberg oskar@freenetproject.org From zooko at zooko.com Thu Sep 20 12:40:01 2001 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:11:43 2006 Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) In-Reply-To: Message from Tony Kimball of "Wed, 19 Sep 2001 13:38:19 CDT." <15272.58907.924394.802255@gargle.gargle.HOWL> References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <005601c14131$cbdc4d20$0ea7fea9@golden> <15272.57453.804563.955204@gargle.gargle.HOWL> <15272.58907.924394.802255@gargle.gargle.HOWL> Message-ID: > : > command shells. > : > : ... in all that I've never noticed break characters or upper/lowercase to be an > : issue. But I *have* had problems with overly long mojoids being mangled by > : e-mail agents and *really* long ones (> 256 characters) being rejected by MSIE. > > okay. non-starter. how about search engines? Hm. Like I might have "ABCDE-FGHIJKLMNOPQRSTUVXWYZ" on my home page, but you can't find it through a search engine because when you ask for "A-B" it gives you "A and not B" or something? That seems like a good example (although I'm not sure about the details of how search engines work). Regards, Zooko From bram at gawth.com Tue Sep 25 21:45:01 2001 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:11:43 2006 Subject: [p2p-hackers] Call for Presentations: CodeCon 2002 Message-ID: CALL FOR PRESENTATIONS: CODECON 2002 http://www.codecon.org/ Please forward wherever applicable. 
CodeCon 2002, scheduled for February 15, 16, and 17 in San Francisco, California, is the premier event in 2002 for the P2P, cypherpunk, and network/security application developer community. It is a workshop for developers of real-world applications that support individual liberties. During the first two days, our policy is "bring your own code"; while those not demonstrating software are welcome to attend, the focus is primarily on developer discussion. The final day of the workshop is intended to be more inclusive, consisting of public and press demonstrations, interviews, panels and a public session allowing a larger number of presenters to demonstrate their projects in a more informal setting. All presentations must be accompanied by functional applications, ideally open source. Presenters must be one of the active developers of the code in question. CodeCon strongly encourages presenters from non-commercial and academic backgrounds to attend for the purposes of collaboration and the sharing of knowledge by providing free registration to workshop presenters and highly-discounted registration to full-time students. Public session presenters and approved members of the press will receive free registration for the public session on Sunday. 
IMPORTANT DATES

Submissions open: 1 October 2001
Final submission deadline: 1 January 2002
Final notification of acceptance: 15 January 2002
Conference begins: 15 February 2002
Public session and public demonstrations: 17 February 2002
Post-conference web-based proceedings: 15 March 2002

SUGGESTED TOPICS

The focus of CodeCon is on running applications which:

* use one or more of: cryptography, steganography, distributed network architectures, peer to peer communications, anonymity or pseudonymity
* enhance individual power and liberty
* can be discussed freely, either by virtue of being open source or having a published protocol, and preferably free of intellectual property restrictions
* are generally useful, either directly to a large number of users, or as an example of technology applicable to a larger audience

Examples of excellent presentations include Mixmaster remailers and extensions, OpenNap, Swarmcast, Mojo Nation, Magic Money, and OpenPGP applications. Novelty in technical approaches, security assumptions, and end-user functionality are excellent properties.

Presentations about basic technologies, such as a new cipher or hash, non-interesting vulnerabilities in existing applications, or discussions of unimplemented protocols are better suited for other conferences.

The guidelines for the CodeCon public session on Sunday are less stringent than the main workshop; presentations which are more tangential to CodeCon's focus may be accepted.

FORMAT OF PRESENTATIONS (main workshop)

Paper and Q&A
-------------

For those most comfortable with a traditional conference format, we will accept papers up to 25 pages. We encourage HTML or plain ASCII submissions, but can accept PostScript, PDF, or LaTeX. We will distribute papers in advance of the conference, and will provide 30 or 60 minutes for discussion and Q&A, at the presenter's discretion.
In exceptional cases, we will accept anonymous papers and conduct either a non-directed discussion or a Q&A session directed by proxy.

All papers should be accompanied by source code or an application. When possible, we would prefer that the application be available for interactive use during the workshop, either on a presenter-provided demonstration machine or one of the conference kiosks. Additionally, during the paper presentation, some use of this demo must be made; it may be relatively brief, but a demonstration of the running application is essential.

Interactive demo
----------------

In addition to the traditional conference paper format, we encourage highly interactive presentations. Throughout the event, we will have several kiosks and local servers available for demonstration purposes. We also strongly encourage presenters to bring their own hardware.

Application demos can be up to 20 minutes, followed by a period of up to 40 minutes for Q&A, which can include demonstration of additional features of the application not covered in the main presentation.

If desired by the presenter, we can distribute URLs of applications several days before the workshop to allow attendees to familiarize themselves with the basics of applications prior to the workshop sessions.

Panel
-----

In areas where multiple projects fall roughly in the same domain, the most efficient presentation may be a panel with one or more developers from each team. These developers may then individually demonstrate their applications, followed by discussion among the panel and Q&A with the other attendees as to differences in design goals, implementation, and other aspects of the systems.

If we receive multiple submissions from related projects for papers or demos, we may suggest to the presenters that they combine into a panel. Additionally, presenters are free to submit jointly as a pre-selected panel.
There is some flexibility in requirements and formats for presentations; please enquire if you would like to use an alternate form.

FORMAT OF PRESENTATIONS (public session)

On the afternoon of Sunday 17 February, we will set aside a substantial amount of time for public session project presentations of five minutes or less. Other events on this day, including panels and main presentations, will be targeted at members of the press and public, so brief presentations on Sunday will reach a wide audience. Presenters from the first two days who wish to make an additional public session presentation may do so.

SUBMISSION DETAILS

Presentations must be performed by one of the active developers on the project. That's the rule -- no code, no mike.

Multiple people may be involved in a presentation. You do get in free if you're part of a presentation even if you don't speak during it, so creativity (within reason) is encouraged.

The workshop language is English, for both presentations and papers.

Ideally, demonstrations should be usable by attendees with 802.11b connected devices either via a web interface, or locally on Windows, UNIX-like, or MacOS platforms. Cross-platform applications are most desirable.

Our venue may be 21+. If you are submitting and are under 21, please advise the program committee; we may consider alternate venues for one or more days of the event.

If you have a specific day on which you would prefer to present, please advise us.
Main workshop submissions should include in the plain-text body of email to submissions@codecon.org the following information:

- Name of presenter
- Name of others involved in project attending conference
- Title of presentation
- Brief summary of topic
- URL or attachment of example code (must be received by the final submission deadline)
- Brief project history
- Brief summary of demo, or abstract of paper
- Any other details considered relevant

Public session submissions should include in the plain-text body of email to submissions@codecon.org the following information:

- Name of presenter
- Title of presentation
- Brief summary of topic
- URL or attachment with example code
- Any other details

PROGRAM COMMITTEE

Bram Cohen, BitTorrent
Dan Egnor, ofb.net
Jered Floyd, Permabit
Ian Grigg, Systemics
Ryan Lackey, HavenCo
Don Marti, LinuxJournal
Guido Sanchez, New Hack City
Bill Stewart, AT&T
Brandon Wiley, Freenet
Jamie Zawinski, DNA Lounge

COSTS

Recognizing that many of the developers of the most interesting cypherpunk applications are unable to afford accommodations and other expenses in San Francisco, CodeCon will attempt to locate housing and otherwise assist with issues for presenters on a case-by-case basis. Please contact codecon-admin@codecon.org if your submission is accepted but you require assistance to attend.

SPONSORSHIP

If your organization is interested in sponsoring CodeCon, we would love to hear from you. In particular, we are looking for sponsors for social meals and parties on any of the three days of the conference, as well as sponsors of the conference as a whole, prizes or awards for quality presentations, and assistance with transportation or accommodation for presenters with limited resources. If you might be interested in sponsoring any of these aspects, please contact the conference organizers at codecon-admin@codecon.org.
QUESTIONS

If you have questions about CodeCon, or would like to contact the organizers, please mail codecon-admin@codecon.org. Please note this address is only for questions and administrative requests, and not for workshop presentation submissions.

From greg at electricrain.com Thu Sep 27 14:33:01 2001
From: greg at electricrain.com (Gregory P. Smith)
Date: Sat Dec 9 22:11:43 2006
Subject: [p2p-hackers] CIPHERim?
Message-ID: <20010927143220.C19954@zot.electricrain.com>

anyone know any details about the "im" product mentioned here?

http://dailynews.yahoo.com/h/zd/20010927/tc/developer_encrypts_corporate_im_1.html

It doesn't give a good description, but sounds a lot like what im clients have needed for a while (basically use EGTP from mojonation to send/receive your messages instead of a central server).

-g