From bram at gawth.com  Sun Sep  2 00:12:01 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] BitTorrent 2.2 is out
Message-ID: <Pine.LNX.4.21.0109020006160.19110-100000@ultra.gawth.com>

I just pushed out BitTorrent 2.2; the protocols are now frozen.

This release adds support for multiple publishers who have the same
file, and separates downloader query from announcement, so a downloader
can report what port it's listening on and downloaders which don't start
downloading aren't included in lists of peers.

Smaller new features include compilation under BSD, changing the mimetype
to all lowercase, and proper quoting in urls, to support characters like
space and equals in filenames.
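
To illustrate the url quoting point, here is a minimal sketch using the
modern Python standard library. It is not the code shipped in BitTorrent
2.2; the helper name quote_filename and the sample filename are made up
for the example.

```python
from urllib.parse import quote, unquote

def quote_filename(name: str) -> str:
    """Percent-encode a filename so it can sit inside a URL query value."""
    # safe="" escapes "/" as well; only unreserved characters pass through.
    return quote(name, safe="")

encoded = quote_filename("my file=v2.txt")
print(encoded)            # my%20file%3Dv2.txt
print(unquote(encoded))   # my file=v2.txt
```

Without the quoting, the "=" in the name would be read as a key/value
separator and the space would end the url early.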

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent"
                                        -- John Maynard Keynes


From arma at mit.edu  Mon Sep  3 15:29:01 2001
From: arma at mit.edu (Roger Dingledine)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
In-Reply-To: <Pine.LNX.3.96.1010730054441.11613A-100000@azrael.dyn.cheapnet.net>; from cyb@azrael.dyn.cheapnet.net on Mon, Jul 30, 2001 at 03:59:32PM -0500
References: <20010727033914.G7892@belegost.mit.edu> <Pine.LNX.3.96.1010730054441.11613A-100000@azrael.dyn.cheapnet.net>
Message-ID: <20010903182823.T14872@moria.mit.edu>

On Mon, Jul 30, 2001 at 03:59:32PM -0500, Brandon K. Wiley wrote:
> So if you have 65k addresses and the network has 65k nodes, then your odds
> of being closest for a given key are 50%. Your odds for winning on two
> keys are 25%, etc.. So if you have the entire IP resources of a
> university, you can *almost* be assured of censoring one file, but if you
> want to, say, censor 65k files (one file per node), then you get .5 ^ 65k,
> which is really small.
 
I've been thinking about this off and on for a while, and I think you're
right. This is an excellent point -- Chord may be vulnerable to a large
adversary gunning for a specific file, but that adversary has to do a
separate attack for each file he wants to censor. "Getting control of
one file does not get him significantly closer to controlling any other."

> > b) Determine which IPs would be useful to you, and break into machines
> >    on their subnets.
> 
> Right now I'm just trying to defend against attackers that use legal
> measures such as scanning IPs. If you can take over arbitrary machines at
> will then I don't know of any currently existing system which will help
> you.
 
No, my attack was very different from compromising arbitrary machines.
I don't have to compromise *your* machine -- I just have to compromise
any machine out there that has (or has access to) an IP better than
yours. I think this is a much more feasible attack.

> > But when a node is "fixing" the Chord network due to
> > loss/addition of a node, how do you know that he's fixing it right? Nodes
> > can flat-out give you a new routing table which is made up entirely of
> > dead nodes, thus cutting you out of the ring. If you verify that new
> > routes you get work, then the adversary can transition you onto his fake
> > ring, and then drop you all at once (or just watch all your queries).
> 
> An adversary cannot give you an entirely new routing table. Well, he can,
> but you will only use those parts of the new routing table which are
> actually better than what you already have. The adversary does not know
> what your routing table currently contains, so cannot know what keys it
> controls are better than what you already have. However, since the keys
> which you desire are relative to your own key, your key is based on your
> IP, and the adversary has your IP, the adversary can generate a routing
> table for you which contains the best keys under its control for the
> various slots in your routing table. The odds of the adversary taking over
> your entire routing table are, as before, (evil / (evil+good)) ^ keys. So
> given a 65k network and 65k addresses under the control of the adversary,
> the odds of taking over your entire routing table are .5 ^ 160 = 6.8e-49.
> If any good entries are left in your routing table then when the evil node
> drops you, you can rebuild your routing table from the good node. Chances
> are that the evil node isn't the best node for one of the slots in your
> routing table, so there is a possibility that it will be replaced by
> another node and then you won't get bad references anymore.
> 
> http://www.pdos.lcs.mit.edu/papers/chord:sigcomm01/
 
Ok, I take back my earlier claim. You're right -- since the finger tables
try to remember specific locations in the ring (rather than "somebody
from that segment of the ring" as I'd originally thought), you need to
get IPs which are better than every entry in the target's finger table.
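
A small sketch of what "specific locations in the ring" means, assuming
the scheme from the Chord paper: a node's identifier is the SHA-1 hash of
its IP address, and finger i aims at the point n + 2^i on a 160-bit ring.
The exact string that gets hashed and the sample address are assumptions
made for illustration.

```python
import hashlib

M_BITS = 160  # SHA-1 output size, the identifier length in the Chord paper

def node_id(ip: str) -> int:
    """Map an IP address string to a point on the ring by hashing it."""
    return int.from_bytes(hashlib.sha1(ip.encode("ascii")).digest(), "big")

def finger_targets(n: int, bits: int = M_BITS) -> list[int]:
    """Ring positions n + 2^i (mod 2^bits) that the fingers of node n aim at."""
    return [(n + (1 << i)) % (1 << bits) for i in range(bits)]

victim = node_id("192.0.2.17")   # documentation-range address, illustrative only
targets = finger_targets(victim)
print(len(targets))              # 160
```

Each target is fixed once the victim's IP is known, which is why an
attacker has to hold an IP that hashes closer to each of those points
than the node currently sitting there.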

I still think a strong adversary can win against a given individual
target by identifying which IPs would be useful and getting them.
(Your math above is fishy, but that's a topic for a later post.)

The neat feature which Chord provides is separation of targets: if I'm
attacking a given target in the Chord ring, then my attack (with high
probability) is not useful against any other target. Said a different
way, I can't "pre-attack" a target without first knowing his IP. (Or
equivalently, I can't "pre-attack" a file without first knowing its
contents (hash)).

Anybody else buy this?
--Roger


From arma at mit.edu  Mon Sep  3 16:20:01 2001
From: arma at mit.edu (Roger Dingledine)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
In-Reply-To: <Pine.LNX.3.96.1010730054441.11613A-100000@azrael.dyn.cheapnet.net>; from cyb@azrael.dyn.cheapnet.net on Mon, Jul 30, 2001 at 03:59:32PM -0500
References: <20010727033914.G7892@belegost.mit.edu> <Pine.LNX.3.96.1010730054441.11613A-100000@azrael.dyn.cheapnet.net>
Message-ID: <20010903191920.U14872@moria.mit.edu>

On Mon, Jul 30, 2001 at 03:59:32PM -0500, Brandon K. Wiley wrote:
> So given a fixed size keyspace (all IP4 addresses), you can do a
> precomputed dictionary attack. If you have a lot of IPs (a.k.a. money)
> then you can very quickly determine which of your IPs is best (since in
> Chord "best" is absolute within a given set of IPs and a given key) and
> then set up a node there. So the question of how useful this attack is
> depends on the distribution of keyspaces over the network.
> 
> My math is probably naive or just wrong, but my attempt at probability
> says that your probability of having the closest key is
> (evil / (good + evil)) ^ keys.
 
Consider that you have M honest machines around the ring already,
and you have N darts that you can throw into the ring. Your goal is to
get at least one dart into specific bins that you're trying to attack
(either so you can be the one responsible for a file you're attacking, or
so you can be the one that is closest to a finger for a given victim node).

Visualize it as a clock face (ie M=12), and you're trying to land at least
one of your darts (each dart thrown independently and uniformly randomly)
between the 12 and the 1. (I'm making the assumption here that anywhere
between the 12 and the 1 will do; I think that's an ok assumption.)

For more complex attacks, you need to collect a number of specific bins
(either to own all the fingers of a particular victim node, or to own
all the replicas of a particular victim file). Picture those as the area
between 1 o'clock and 2 o'clock, etc. The goal is to get at least one
dart into each target bin -- it doesn't matter which dart, or how many
more you get, etc.

So the chance of getting at least one dart in each of P specific bins
(places), given M honest machines already in the ring and given N darts
(new IPs) that you can try, is:

 sum over i from 0 to P of
 (-1)^i * (P Choose i) * ((M-i)/M)^N
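
One way to sanity-check the summation is to throw the darts for real. The
sketch below is a Monte Carlo simulation of the same model, under the same
simplifying assumption that all M bins are the same size; the function
name simulate, the seed, and the trial count are arbitrary choices for
the example.

```python
import random
from math import comb

def simulate(p: int, m: int, n: int, trials: int, seed: int = 2001) -> float:
    """Estimate the chance that n uniform darts hit every one of bins 0..p-1."""
    rng = random.Random(seed)      # fixed seed so runs are repeatable
    wins = 0
    for _ in range(trials):
        hit = set()
        for _ in range(n):
            b = rng.randrange(m)   # each dart lands in a uniformly random bin
            if b < p:              # bins 0..p-1 stand in for the targeted bins
                hit.add(b)
        if len(hit) == p:          # every targeted bin got at least one dart
            wins += 1
    return wins / trials

# Clock face from the text: 12 bins, two target bins, 20 darts.
p, m, n = 2, 12, 20
estimate = simulate(p, m, n, trials=20000)
formula = sum((-1) ** i * comb(p, i) * ((m - i) / m) ** n for i in range(p + 1))
print(f"simulated {estimate:.3f}, summation {formula:.3f}")
```

The two numbers should agree to within sampling noise, which is about
what you would hope for if the summation is right.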

I've included a quick perl script at the end of this post so you can play
with it; beware of overruns...factorials are messy things, and this is
a fragile script. But first, some sample situations:

Chance of winning 1 places amid 10000 machines with 1000 new IPs is 0.0951671064414438.
Chance of winning 1 places amid 10000 machines with 5000 new IPs is 0.393484504375222.
Chance of winning 14 places amid 10000 machines with 5000 new IPs is 2.10996882146536e-06.
(~14 fingers in 10K nodes)

Chance of winning 1 places amid 10000 machines with 50000 new IPs is 0.993263737389397.
Chance of winning 14 places amid 10000 machines with 50000 new IPs is 0.909710516660863.

Chance of winning 100 places amid 10000 machines with 50000 new IPs is 0.508637738432524.
(so a file replicated into 100 different places is still vulnerable)
Chance of winning 200 places amid 10000 machines with 50000 new IPs is 0.258652809279435.

> So if you have 65k addresses and the network has 65k nodes, then your odds
> of being closest for a given key are 50%. Your odds for winning on two
> keys are 25%, etc.. So if you have the entire IP resources of a
> university, you can *almost* be assured of censoring one file, but if you
> want to, say, censor 65k files (one file per node), then you get .5 ^ 65k,
> which is really small.

Chance of winning 1 places amid 65000 machines with 65000 new IPs is 0.632123388689125.
Chance of winning 2 places amid 65000 machines with 65000 new IPs is 0.399577896430526.
Chance of winning 3 places amid 65000 machines with 65000 new IPs is 0.252579901639995.
Chance of winning 4 places amid 65000 machines with 65000 new IPs is 0.159659167435841.
Chance of winning 5 places amid 65000 machines with 65000 new IPs is 0.100922190340699.

It's really a matter of beating the current number of active machines.

Chance of winning 1000 places amid 5000 machines with 50000 new IPs is 0.955655650413358.

Winning all 5000 places is .797.
 
Hope this helps. Note that this analysis completely ignores the more
realistic attack, which is to calculate which IPs on the net would be better
for you, and break into them or some machine on their subnet. It also
doesn't consider whether it becomes harder to "reuse" the same address
space on future attacks.
--Roger
(Thanks to Eddy Karat for arguing the math with me, and giving me such a
clean summation. Do let me know if you fix it or improve on it, or if all
of my calculations here included overruns so the numbers are all wrong.)


#!/usr/bin/perl
use strict;
use warnings;

# Calculate \Sigma_{i=0}^P
#   (-1)^i * (P Choose i) * ((M-i)/M)^N

# swiped and pruned from http://code.anapraxis.net/Math/Combinatorics.pm
#  get a better Choose function if you want better accuracy
sub Choose {
  my $n = shift;  die "N ($n) must be a positive integer" if $n < 1;
  my $r = shift;

  die "R ($r) must satisfy 0 <= R <= N ($n) if there is no repetition"
      if ($r < 0 || $r > $n);
  my $c = 1;
  if ($r > $n/2) { $r = $n - $r }   # symmetry: C(n, r) == C(n, n-r)
  for (1..$r) {
    $c *= $n--;                     # running numerator: n * (n-1) * ...
    $c /= $_;                       # divide by r! one factor at a time
  }
  return $c;
}

##########################

my $n = 50000; # number of adversary machines
my $m = 5000;  # number of honest machines
my $p = 200;   # number of simultaneous attacks I must make

my $sum = 0;

for my $i (0..$p) {

  # one inclusion-exclusion term: (-1)^i * (P Choose i) * ((M-i)/M)^N
  $sum += ( ((-1)**$i) * (Choose($p, $i)) * ((($m-$i)/$m)**$n) );

  print "sum at i=$i is $sum.\n";

}

print "Chance of winning $p places amid $m machines with $n new IPs is $sum.\n";


From arma at mit.edu  Mon Sep  3 16:49:02 2001
From: arma at mit.edu (Roger Dingledine)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] CfP: Workshop on Privacy Enhancing Technologies 2002
Message-ID: <20010903194759.X14872@moria.mit.edu>

We're hoping to get some good submissions from the P2P community for
this workshop. This is the sequel to the workshop last summer where
Freenet and Free Haven were presented.

------------------------------------------------------------------------

                                CALL FOR PAPERS

                WORKSHOP ON PRIVACY ENHANCING TECHNOLOGIES 2002

                               Apr 14-15, 2002
                           San Francisco, CA, USA

                  Workshop web site: http://www.pet2002.org/


Privacy and anonymity are increasingly important in the online world.
Corporations and governments are starting to realize their power to
track users and their behavior, and restrict the ability to publish
or retrieve documents. Approaches to protecting individuals, groups,
and even companies and governments from such profiling and censorship
have included decentralization, encryption, and distributed trust.

Building on the success of the first anonymity and unobservability
workshop (LNCS 2009, held in Berkeley in July 2000), this second workshop
addresses the design and realization of such anonymity and anti-censorship
services for the Internet and other communication networks. We are holding
this workshop adjacent to the Twelfth Conference on Computers, Freedom,
and Privacy (CFP2002) for convenience, but we are not affiliated with
that conference.

The workshop seeks submissions from academia and industry presenting
novel research on all theoretical and practical aspects of privacy
technologies, as well as experimental studies of fielded systems.
We encourage submissions from other communities such as law and business
that present these communities' perspectives on technological issues. We
will publish accepted papers in proceedings in the Springer Lecture
Notes in Computer Science (LNCS) series.

Suggested topics include but are not restricted to:

* Efficient realization of privacy services
* Techniques for and against traffic analysis
* Attacks on anonymity systems
* New concepts for anonymity systems
* Novel relations of payment mechanisms and anonymity
* Models for anonymity and unobservability
* Models for threats to privacy
* Techniques for censorship resistance
* Resource management in anonymous systems
* Pseudonyms, linkability, and trust
* Policy and human rights -- anonymous systems in practice
* Fielded systems and privacy enhancement techniques for existing systems
* Frameworks for new systems developers


                           IMPORTANT DATES

Submission deadline                                 December 10, 2001
Acceptance notification                             February 11, 2002
Camera-ready copy for preproceedings                   March 11, 2002
Camera-ready copy for proceedings                        May 15, 2002


                            GENERAL CHAIR

Adam Shostack, Zero Knowledge Systems (adam@zeroknowledge.com)

                          PROGRAM COMMITTEE

John Borking, Dutch Data Protection Authority
Lance Cottrell, Anonymizer.com
Roger Dingledine, Reputation Technologies (co-chair, arma@mit.edu)
Hannes Federrath, Freie Universitaet Berlin, Germany
Markus Jakobsson, RSA Laboratories
Marit Koehntopp, Independent Centre for Privacy Protection, SH, Germany
Andreas Pfitzmann, Dresden University of Technology, Germany
Avi Rubin, AT&T Labs - Research
Paul Syverson, Naval Research Lab (co-chair, syverson@itd.nrl.navy.mil)
Michael Waidner, IBM Zurich Research Lab

                          PAPER SUBMISSIONS

Submitted papers must not substantially overlap with papers that have
been published or that are simultaneously submitted to a journal
or a conference with proceedings.  Papers should be at most 15
pages excluding the bibliography and well-marked appendices (using
11-point font and reasonable margins), and at most 20 pages total.
Authors are encouraged to follow Springer LNCS format in preparing their
submissions. <http://www.springer.de/comp/lncs/authors.html> Committee
members are not required to read the appendices and the paper should
be intelligible without them.  The paper should start with the title,
names of authors and an abstract.  The introduction should give some
background and summarize the contributions of the paper at a level
appropriate for a non-specialist reader.  We will publish accepted
papers in proceedings in the Springer Lecture Notes in Computer Science
(LNCS) series after the workshop.  During the workshop preproceedings
will be made available.  Final versions are not due until after the
workshop, giving the authors the opportunity to revise their papers
based on discussions during the meeting.

Submissions can be made in Postscript or PDF format.  To submit a paper,
send a plain ASCII text email to the program chairs (emails: arma@mit.edu,
syverson@itd.nrl.navy.mil) containing the title and abstract of the
paper, the authors' names, email and postal addresses, phone and fax
numbers, and identification of the contact author.  To the same message,
attach your submission (as a MIME attachment). Papers must be received by
December 10, 2001.  Notification of acceptance or rejection will be sent
to authors no later than February 11, 2002, and authors will have the
opportunity to revise for the preproceedings version by March 11, 2002.
Submission implies that, if accepted, the author(s) agree to publish
in the proceedings and to sign a standard Springer copyright release,
and also that an author of the paper will present it at the workshop.
Final versions (due after the workshop) need to comply with the
instructions for authors made available by Springer.


From bosley at hcs.harvard.edu  Mon Sep  3 17:46:01 2001
From: bosley at hcs.harvard.edu (Carl Bosley)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
In-Reply-To: <20010903191920.U14872@moria.mit.edu>
Message-ID: <Pine.OSF.4.33.0109032038450.2671-100000@hcs.harvard.edu>

> > My math is probably naive or just wrong, but my attempt at probability
> > says that your probability of having the closest key is
> > (evil / (good + evil)) ^ keys.

> So the chance of getting at least one dart in each of P specific bins
> (places), given M honest machines already in the ring and given N darts
> (new IPs) that you can try, is:
>
>  sum over i from 0 to P of
>  (-1)^i * (P Choose i) * ((M-i)/M)^N

indeed.

if you'd like something easier to approximate ... note for P = 1 this is
1 - ((M-1)/M)^N
~= 1 - e^{-N/M}

For P > 1, for N >> 1 the overlap is negligible enough that

(1 - e^{-N/M})^P

is a good approximation.
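
For anyone who wants to see how close the approximation is, here is a
short sketch that evaluates both expressions side by side. The function
names exact and approx are just labels for this example, and plain
floating point is assumed to be accurate enough for small P (for large P
the alternating sum needs more care).

```python
from math import comb, exp

def exact(p: int, m: int, n: int) -> float:
    """Inclusion-exclusion sum quoted above, evaluated in floating point."""
    return sum((-1) ** i * comb(p, i) * ((m - i) / m) ** n for i in range(p + 1))

def approx(p: int, m: int, n: int) -> float:
    """Closed form (1 - e^{-N/M})^P, treating the P bins as independent."""
    return (1.0 - exp(-n / m)) ** p

cases = [(1, 10000, 1000), (14, 10000, 5000), (14, 10000, 50000), (5, 65000, 65000)]
for p, m, n in cases:
    print(f"P={p:2d} M={m} N={n}: exact={exact(p, m, n):.6g} approx={approx(p, m, n):.6g}")
```

The (P, M, N) triples are taken from the sample situations earlier in the
thread, so the exact column can be compared against those figures.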

--Carl


From arma at mit.edu  Mon Sep  3 17:54:01 2001
From: arma at mit.edu (Roger Dingledine)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
In-Reply-To: <Pine.OSF.4.33.0109032038450.2671-100000@hcs.harvard.edu>; from bosley@hcs.harvard.edu on Mon, Sep 03, 2001 at 08:45:32PM -0400
References: <20010903191920.U14872@moria.mit.edu> <Pine.OSF.4.33.0109032038450.2671-100000@hcs.harvard.edu>
Message-ID: <20010903205353.Y14872@moria.mit.edu>

On Mon, Sep 03, 2001 at 08:45:32PM -0400, Carl Bosley wrote:
> For P > 1, for N >> 1 the overlap is negligible enough that
> 
> (1 - e^{-N/M})^P
> 
> is a good approximation.
 
Excellent. This is what I was looking for.

I think the overall story is that if you have significantly more possible
IPs at your disposal than the current number of machines in the network,
then you're going to win -- but only against a few targets, not against
everybody.

Thanks,
--Roger


From arma at MIT.EDU  Tue Sep  4 10:06:01 2001
From: arma at MIT.EDU (Roger Dingledine)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Chord comments
Message-ID: <20010904130555.M14872@moria.mit.edu>

----- Forwarded message from "Edwin R. Karat" <karat@MIT.EDU> -----

From: "Edwin R. Karat" <karat@MIT.EDU>
Date: Tue, 04 Sep 2001 12:41:07 -0400
To: Roger Dingledine <arma@MIT.EDU>
Subject: Re: [p2p-hackers] Chord comments

In message <20010904121040.G14872@moria.mit.edu>, Roger Dingledine writes:
>On Tue, Sep 04, 2001 at 01:06:14AM -0400, Edwin R. Karat wrote:
>> In message <20010903205353.Y14872@moria.mit.edu>, Roger Dingledine writes:
>> >On Mon, Sep 03, 2001 at 08:45:32PM -0400, Carl Bosley wrote:
>> >> For P > 1, for N >> 1 the overlap is negligible enough that
>> >> 
>> >> (1 - e^{-N/M})^P
>> >> 
>> >> is a good approximation.
>> > 
>> >Excellent. This is what I was looking for.
>> >
>> >I think the overall story is that if you have significantly more possible
>> >IPs at your disposal than the current number of machines in the network,
>> >then you're going to win -- but only against a few targets, not against
>> >everybody.
>> 
>> Whoops.  Actually, if you have a good chance of winning against 1, then
>> you have a good chance of winning against everybody else too, don't you?
>
>With the same set of darts? (Meaning they land in the same places.)

Yes.  With M >> P, compromising one particular computer only eats up
P darts; you still have N-P darts to take over other computers with.
So, the problem is largely the same with N-P darts, at which point you
have nearly the same probability of taking over another specified
computer.

Analysis 1:

Probability of taking over 2 *specified* computers (in the worst case
with none of the bins in common) is equivalent to doubling P (you are
trying to go for 2P bins, instead of P bins).  In our approximate sum,
(1 - e^{-N/M})^P, this just squares the probability, which is
equivalent to 2 independent trials.  Of course, they are not
independent, but the point is that the dependence is a wash in the
errors of the M >> P approximation.

Seen from a different point of view, taking over a computer takes up P
darts, leaving you with N-P darts.  But to be likely to take over a
specified computer, N is on the order of M, which is >> P.

From gojomo at usa.net  Tue Sep  4 11:47:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: Various identifier choices Re: [p2p-hackers] Morpheus, Freenet,
 MojoNation (was Semantic Routing BOF)
References: <Pine.LNX.3.96.1010829140126.29233A-100000@azrael.dyn.cheapnet.net>
 <009101c130bf$49122fe0$0ea7fea9@golden>
 <87bskyfyf1.fsf@azrael.dyn.cheapnet.net>
 <00b301c130c7$61a960c0$0ea7fea9@golden> <20010829235342.J383@sandbergs.org>
Message-ID: <014b01c13571$d6c07c00$0ea7fea9@golden>

Oskar Sandberg writes:
> I considered if the way of calculating UID values from data might
> actually be an area where we should be trying to "interoperate" between
> the different networks, but I figure that even there the emphasis is
> too different for it to be worth it.

With Bitzi, we don't mind a proliferation of identifiers, because
we aim to provide, as metadata, alternate names/IDs for catalogued
files.

So for example you'll eventually be able to pull from the Bitzi
catalog a record like...

  Bitprint (SHA1+TigerTree)
  \_ Freenet-CHK
  \_ MojoID
  \_ MD5
  \_ etc. (KazaaID? SHA256?)

...whenever users have contributed such associations.

(Of course, users could lie, but since all these identifiers are
calculable from the content itself, the worst that can happen
is the inconvenience of fetching the wrong file once, at which 
point the problem can be detected and reported back to Bitzi
for correction.)
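
A minimal sketch of that detection step, assuming a record is a simple
mapping from identifier name to hex digest. This is not Bitzi's actual
record format or API; only hashes available in Python's hashlib are
recomputed, and the record layout and function names are invented for
the example.

```python
import hashlib

def content_ids(data: bytes) -> dict[str, str]:
    """Recompute every content-derived identifier this sketch knows about."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def mismatches(claimed: dict[str, str], data: bytes) -> list[str]:
    """Names of claimed identifiers that the fetched bytes do not match."""
    actual = content_ids(data)
    return [k for k, v in claimed.items() if k in actual and actual[k] != v.lower()]

data = b"hello, p2p-hackers\n"
record = content_ids(data)        # an honest record for this content
record["md5"] = "0" * 32          # a contributor lied about (or mistyped) the MD5
print(mismatches(record, data))   # ['md5']
```

A non-empty result is the signal that the association is wrong and can
be reported back for correction.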

And in a separate message:
> On Wed, Aug 29, 2001 at 01:15:43PM -0700, Gordon Mohr wrote:
> > Yes. I prefer a tree hash for that purpose, as it allows 
> > out-of-order subsegment verification, but the progressive 
> > hash works too.
> 
> There is obviously no need for out-of-order verification on a stream that
> needs to be tunneled.

I would say that it is obviously beneficial to avoid designing-in 
a permanent assumption of streamed, in-order, complete delivery. 

Might it not be nice, someday, to tunnel different segments from 
different places, simultaneously or spread across time periods,
for lots of reasons -- not least of which being performance and
resistance to traffic analysis?
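
To make the out-of-order case concrete, here is a rough sketch of a tree
hash over fixed segments. It uses SHA-256 in place of Tiger (hashlib has
no Tiger), and the leaf/interior prefixes and the rule of promoting an
unpaired node are design choices for this example, not a description of
the Bitprint or Freenet formats.

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build_levels(segments: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first; an unpaired node is promoted unchanged."""
    level = [h(b"\x00" + s) for s in segments]      # 0x00 marks a leaf
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(b"\x01" + level[i] + level[i + 1]))  # 0x01 marks interior
            else:
                nxt.append(level[i])
        level = nxt
        levels.append(level)
    return levels

def proof(levels: list[list[bytes]], index: int) -> list[tuple[bool, bytes]]:
    """Sibling hashes (with a sibling-is-left flag) from one leaf up to the root."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        if sib < len(level):
            path.append((sib < index, level[sib]))
        index //= 2
    return path

def verify(segment: bytes, path: list[tuple[bool, bytes]], root: bytes) -> bool:
    node = h(b"\x00" + segment)
    for sibling_is_left, sib in path:
        node = h(b"\x01" + sib + node) if sibling_is_left else h(b"\x01" + node + sib)
    return node == root

segments = [b"seg-%d" % i for i in range(5)]
levels = build_levels(segments)
root = levels[-1][0]
for i in (3, 0, 4):                                  # arrival order is arbitrary
    print(i, verify(segments[i], proof(levels, i), root))
```

Each segment is checked against the root using only its own sibling
hashes, so segments can come from different places and at different
times.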

- Gojomo


From oskar at freenetproject.org  Tue Sep  4 14:05:02 2001
From: oskar at freenetproject.org (Oskar Sandberg)
Date: Sat Dec  9 22:11:43 2006
Subject: Various identifier choices Re: [p2p-hackers] Morpheus, Freenet, MojoNation (was Semantic Routing BOF)
In-Reply-To: <014b01c13571$d6c07c00$0ea7fea9@golden>; from gojomo@usa.net on Tue, Sep 04, 2001 at 11:45:50AM -0700
References: <Pine.LNX.3.96.1010829140126.29233A-100000@azrael.dyn.cheapnet.net> <009101c130bf$49122fe0$0ea7fea9@golden> <87bskyfyf1.fsf@azrael.dyn.cheapnet.net> <00b301c130c7$61a960c0$0ea7fea9@golden> <20010829235342.J383@sandbergs.org> <014b01c13571$d6c07c00$0ea7fea9@golden>
Message-ID: <20010904230943.A547@sandbergs.org>

On Tue, Sep 04, 2001 at 11:45:50AM -0700, Gordon Mohr wrote:
> Oskar Sandberg writes:
> > I considered if the way of calculating UID values from data might
> > actually be an area where we should be trying to "interoperate" between
> > the different networks, but I figure that even there the emphasis is
> > too different for it to be worth it.
> 
> With Bitzi, we don't mind a proliferation of identifiers, because
> we aim to provide, as metadata, alternate names/IDs for catalogued
> files.

Of course, you can always make a universal identifier by concatenating
(/including) all the possible options, but a global unique data
identifier standard would certainly be pretty nice. It is more difficult
than just choosing a hash algorithm though, considering things like
prehash encryption and metadata. Just trying to create a document format
that always coalesces for identical data is difficult enough since
the trend elsewhere has been text formats that are not strict (not
character sensitive, allow for comments, have spacing issues, etc, etc).

<>
> Might it not be nice, someday, to tunnel different segments from 
> different places, simultaneously or spread across time periods, 
> for lots of reasons -- not least of which being performance and 
> resistance to traffic analysis? 

No, this hash format is used to verify streams as they pass through
nodes, within atomic pieces of data. The verification segments are not
separately addressable - and making them so (which makes no sense
because we have another format for that) would mean changing the data
format anyways.


-- 
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
oskar@freenetproject.org

From cyb at azrael.dyn.cheapnet.net  Tue Sep  4 18:17:01 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <014b01c13571$d6c07c00$0ea7fea9@golden>
Message-ID: <Pine.LNX.4.21.0109042013510.29940-100000@azrael.dyn.cheapnet.net>

> With Bitzi, we don't mind a proliferation of identifiers, because
> we aim to provide, as metadata, alternate names/IDs for catalogued
> files.

This reminds me that I have a question about Bitzi and this seems like the
right place to ask it. Is there a programmatic interface for talking to
Bitzi? Can I write a client? What format does the information come in? I
think it would be very wonderful if I could get an RDF serialization of
the Bitzi catalog via CGI or XML-RPC.


From gojomo at usa.net  Tue Sep  4 20:43:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <Pine.LNX.4.21.0109042013510.29940-100000@azrael.dyn.cheapnet.net>
Message-ID: <002b01c135bd$6df13560$1a1ffea9@gojovaio>

Brandon Wiley writes:
> This reminds me that I have a question about Bitzi and this seems like the
> right place to ask it. Is there a programmatic interface for talking to
> Bitzi? Can I write a client? What format does the information come in? I
> think it would be very wonderful if I could get an RDF serialization of
> the Bitzi catalog via CGI or XML-RPC.

We intend to offer several interfaces. The only one up and running 
right now is the plain HTTP POST that does a submit/lookup -- but 
its return info is HTML intended for human eyes.

The next priority is an HTTP GET or POST which dumps summary info 
about one or more given hashes (by SHA1 or full Bitprint). The
summary info will definitely be XML, it might be proper RDF.
Suggestions and commentary from interested parties can definitely
shape what is offered; wishlists and preferences, here or at
our website discussion forums would be appreciated.

After that, other interfaces will be developed as outside
demand and our capacity-to-service-them grow. For example,
a possibility that often comes up is an XML-RPC interface 
for performing the same set of basic contribution/rating/
search services available through the web interface.

There will also be periodic full dumps of the contributed info 
at various levels of detail, for mirroring and the creation of
derivative/experimental works. DMOZ is the model we'll emulate,
but here, too, requests and suggestions are welcome.

- Gojomo


From cyb at azrael.dyn.cheapnet.net  Tue Sep  4 23:46:01 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <002b01c135bd$6df13560$1a1ffea9@gojovaio>
Message-ID: <Pine.LNX.4.21.0109050133340.3159-100000@azrael.dyn.cheapnet.net>

> The next priority is an HTTP GET or POST which dumps summary info 
> about one or more given hashes (by SHA1 or full Bitprint). The
> summary info will definitely be XML, it might be proper RDF.
> Suggestions and commentary from interested parties can definitely
> shape what is offered; wishlists and preferences, here or at
> our website discussion forums would be appreciated.

My preference is certainly for an RDF dump via HTTP. I'm going to be
demoing at O'Reilly my P2P searching technology. It works by gathering RDF
databases of metadata. It was originally developed for Freenet, but I'm
working on integration with MojoNation and BitTorrent. I'd like to add
Bitzi as another system which can be searched in order to find URLs to
content in various networks. If Bitzi provides a straight RDF dump of its
whole database then it can act simply as a catalog source and searching
can be done in my application. If Bitzi provides an implementation of the
searching API either via the CGI version of the API or the XML-RPC version
then it can actually be used as a drop-in replacement for the search
engine part of the application.

I'm very interested in the interoperability of systems and I think it
would be great to have a centralized metadata catalog as well as the
decentralized catalogs which can exist in MN and BT. Centralized catalogs
have distinct advantages at times. Bitzi is in a fine position to serve
that role. All that really needs to be done to get things started is to
provide an RDF serialization of the database via HTTP.


From gojomo at usa.net  Wed Sep  5 01:12:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <Pine.LNX.4.21.0109050133340.3159-100000@azrael.dyn.cheapnet.net>
Message-ID: <01e801c135e2$5d03dda0$0ea7fea9@golden>

Brandon Wiley writes:
> My preference is certainly for an RDF dump via HTTP. I'm going to be
> demoing at O'Reilly my P2P searching technology. It works by gathering RDF
> databases of metadata. It was originally developed for Freenet, but I'm
> working on integration with MojoNation and BitTorrent. I'd like to add
> Bitzi as another system which can be searched in order to find URLs to
> content in various networks. If Bitzi provides a straight RDF dump of its
> whole database then it can act simply as a catalog source and searching
> can be done in my application. 

The dump will grow quite large; how often would you expect to 
schedule full-fetches (or delta-fetches)?

Do you have any example RDF dumps which would demonstrate the
fields and format conventions you'd find most useful? 

(We won't invent if there are already good precedents to mimic,
and we could crank out an initial dump in very short order if
it'd help give you something better to demo at O'R-P2P.)

> If Bitzi provides an implementation of the
> searching API either via the CGI version of the API or the XML-RPC version
> then it can actually be used as a drop-in replacement for the search
> engine part of the application.

What is your dominant search model? Free text across all credible 
metadata? Field-specific with things like scalar value comparisons 
(e.g. "128 <= bitrate <= 196")? Both?

> I've very interested in the interoperability of systems and I think it
> would be great to have a centralized metadata catalog as well as the
> decentralized catalogs which can exist in MN and BT. Centralized catalogs
> have distinct advantages at times. Bitzi is in a fine position to serve
> that role. All that really needs to be done to get things started is to
> provider an RDF serialization of the database via HTTP.

And that's exactly the role we'd like to play -- being the 
steward for cataloguing tasks which are easiest to do with a 
shared, central reference point, while letting the metadata
itself travel whatever chaotic paths make the most sense to
system developers and users.

- Gojomo


From coderman at mindspring.com  Wed Sep  5 17:07:02 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec  9 22:11:43 2006
Subject: Various identifier choices Re: [p2p-hackers] Morpheus, 
 Freenet,MojoNation (was Semantic Routing BOF)
References: <Pine.LNX.3.96.1010829140126.29233A-100000@azrael.dyn.cheapnet.net>
	 <009101c130bf$49122fe0$0ea7fea9@golden>
	 <87bskyfyf1.fsf@azrael.dyn.cheapnet.net>
	 <00b301c130c7$61a960c0$0ea7fea9@golden> <20010829235342.J383@sandbergs.org> <014b01c13571$d6c07c00$0ea7fea9@golden>
Message-ID: <3B96DC2A.5ED52BEF@mindspring.com>

Gordon Mohr wrote:
> 
> ...
>
> With Bitzi, we don't mind a proliferation of identifiers, because
> we aim to provide, as metadata, alternate names/IDs for catalogued
> files.
> 
> So for example you'll eventually be able to pull from the Bitzi
> catalog a record like...
> 
>   Bitprint (SHA1+TigerTree)
>   \_ Freenet-CHK
>   \_ MojoID
>   \_ MD5
>   \_ etc. (KazaaID? SHA256?)
> 
> ...whenever users have contriibuted such associations.
> 


Regarding access to this data, would it be possible to implement some
kind of lookup information to obtain all known metadata for a given
SHA-1 key (or TigerTree, or MD5, etc)?

This would be very useful, and would place a much smaller load than
retrieving significant portions of the index...

From gojomo at usa.net  Thu Sep  6 01:20:02 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: Various identifier choices Re: [p2p-hackers] Morpheus,
 Freenet,MojoNation (was Semantic Routing BOF)
References: <Pine.LNX.3.96.1010829140126.29233A-100000@azrael.dyn.cheapnet.net>
 <009101c130bf$49122fe0$0ea7fea9@golden>
 <87bskyfyf1.fsf@azrael.dyn.cheapnet.net>
 <00b301c130c7$61a960c0$0ea7fea9@golden> <20010829235342.J383@sandbergs.org>
 <014b01c13571$d6c07c00$0ea7fea9@golden> <3B96DC2A.5ED52BEF@mindspring.com>
Message-ID: <016201c136ac$a4582740$0ea7fea9@golden>

Coderman writes:
> Gordon Mohr wrote:

> > With Bitzi, we don't mind a proliferation of identifiers, because
> > we aim to provide, as metadata, alternate names/IDs for catalogued
> > files.
> > 
> > So for example you'll eventually be able to pull from the Bitzi
> > catalog a record like...
> > 
> >   Bitprint (SHA1+TigerTree)
> >   \_ Freenet-CHK
> >   \_ MojoID
> >   \_ MD5
> >   \_ etc. (KazaaID? SHA256?)
> > 
> > ...whenever users have contriibuted such associations.

> Regarding access to this data, would it be possible to implement some
> kind of lookup information to obtain all known metadata for a given
> SHA-1 key (or TigerTree, or MD5, etc)?
> 
> This would be very usefull, and would place a much smaller load than
> retreiving significant portions of the index...

That's exactly the first stable interface we'll offer. It'll 
functionally be something like:

   getTicket(SHA1 or Bitprint) 
    -> returns XML "ticket" of best known contributed info about 
       file with that hash (redundant/down-rated tags will not 
       be included)
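
A rough sketch of how such a lookup might behave, assuming an in-memory
catalog keyed by SHA-1 hex digest. The names CATALOG, contribute and
get_ticket, the rating scheme, and the XML element names are all
hypothetical; the real ticket format is not described in this thread.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical catalog: SHA-1 hex digest -> list of (tag, value, rating).
CATALOG = {}

def contribute(data, tag, value, rating=1):
    """Record a user-contributed tag for the file with this content."""
    key = hashlib.sha1(data).hexdigest()
    CATALOG.setdefault(key, []).append((tag, value, rating))
    return key

def get_ticket(sha1_hex):
    """Return an XML string holding the best-rated value for each tag.

    Down-rated entries (rating <= 0) and lower-rated duplicates of a tag
    are left out, mirroring "redundant/down-rated tags will not be included".
    """
    best = {}
    for tag, value, rating in CATALOG.get(sha1_hex, []):
        if rating > 0 and (tag not in best or rating > best[tag][1]):
            best[tag] = (value, rating)
    ticket = ET.Element("ticket", {"sha1": sha1_hex})
    for tag in sorted(best):
        ET.SubElement(ticket, tag).text = best[tag][0]
    return ET.tostring(ticket, encoding="unicode")

content = b"example file contents"
key = contribute(content, "title", "Example", rating=3)
contribute(content, "title", "Exmaple", rating=1)    # redundant, lower rated
contribute(content, "format", "text/plain")
contribute(content, "comment", "spam", rating=-2)    # down-rated
print(get_ticket(key))
```

Keying on the content hash means any client that can hash a file can ask
for its ticket without first fetching any part of the index.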

- Gojomo


From cyb at azrael.dyn.cheapnet.net  Fri Sep  7 02:06:02 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <01e801c135e2$5d03dda0$0ea7fea9@golden>
Message-ID: <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>

> The dump will grow quite large; how often would you expect to 
> schedule full-fetches (or delta-fetches)?

It would be best if Bitzi implemented the searching API directly so that
clients could talk directly to Bitzi without having to download the entire
dump. If not then it would probably be best to have a centralized service
which occasionally fetches a dump from Bitzi and then implements the
searching API so as to free normal nodes from having to fetch anything
massive. So if I end up implementing a Bitzi searching service, then
fetches will be scheduled whenever it is convenient for Bitzi.

> Do you have any example RDF dumps which would demonstrate the
> fields and format conventions you'd find most useful? 

Yes I do. My search engine can be configured to handle any schema. However
I've been using Dublin Core because it's a standard schema for talking
about files and generally people want to search for files. I've attached
an example database. It doesn't use all of the Dublin Core fields, just
the ones that I felt like filling in. On a side note, I replaced the DC
schema one day with one I made up and turned my search engine into a
personal contact information database and let my friends add
themselves. So it's not limited to file searching.

> (We won't invent if there's already good precedents to mimic,
> and we could crank out an initial dump in very short order if
> it'd help give you something better to demo at O'R-P2P.)

That would be great! I could give a great demo with a fat database. If you
decide to include fields that aren't in Dublin Core then just give me a
list of the names of the fields and I'll configure it to use that schema
instead.

> What is your dominant search model? Free text across all credible 
> metadata? Field-specific with things like scalar value comparisons 
> (e.g. "128 <= bitrate <= 196")? Both?

Currently the API only supports substring matches on a field-by-field
basis for a set of fields defined by a particular schema. So if you're
using DC, for instance, you can search for "ala" in the "Creator" field
and "Wo" in the Title field, things like that. I'd like to add more
complex searching to the API but I think that some discussion needs to
occur regarding a good API for searching metadata before the API can be
extended past its most basic and obvious initial form.

> And that's exactly the role we'd like to play -- being the 
> steward for cataloguing tasks which are easiest to do with a 
> shared, central reference point, while letting the metadata
> itself travel whatever chaotic paths make the most sense to
> system developers and users.

Whee! This sounds like fun.


From cyb at azrael.dyn.cheapnet.net  Fri Sep  7 02:09:01 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <01e801c135e2$5d03dda0$0ea7fea9@golden>
Message-ID: <Pine.LNX.4.21.0109070407040.24019-200000@azrael.dyn.cheapnet.net>


Here's that example RDF file that I was supposed to attach.
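
For anyone who wants to load a file shaped like the attachment, the
following is a minimal sketch using Python's standard XML parser. The
function name load_records and the dict-of-dicts result are assumptions;
the inline sample is a trimmed copy of one record from the attachment.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

def load_records(rdf_text):
    """Map each rdf:about value to a dict of its dc:* fields."""
    root = ET.fromstring(rdf_text)
    records = {}
    for desc in root.findall("{%s}Description" % RDF):
        about = desc.get("{%s}about" % RDF)
        fields = {}
        for child in desc:
            # Keep only elements in the Dublin Core namespace.
            if child.tag.startswith("{%s}" % DC):
                fields[child.tag.split("}", 1)[1]] = child.text or ""
        records[about] = fields
    return records

sample = """<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:dc='http://purl.org/dc/elements/1.1/' >
  <rdf:Description rdf:about='cam.jpg'>
    <dc:Identifier>cam.jpg</dc:Identifier>
    <dc:Title>Brandon&apos;s Webcam</dc:Title>
    <dc:Format>image/jpeg</dc:Format>
  </rdf:Description>
</rdf:RDF>"""

records = load_records(sample)
print(records["cam.jpg"]["Title"])
```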


-------------- next part --------------
<rdf:RDF

  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'

  xmlns:dc='http://purl.org/dc/elements/1.1/' >

  <rdf:Description rdf:about='Noir-03.avi'>

    <dc:Identifier>Noir-03.avi</dc:Identifier>

    <dc:Title>Noir Episode 3</dc:Title>

    <dc:Creator>Unknown</dc:Creator>

    <dc:Description>If I knew that, then I could kill you.</dc:Description>

    <dc:Publisher>Bakamx Fansubs</dc:Publisher>

    <dc:Format>video/x-msvideo</dc:Format>

    <dc:Rights>Fansub</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='bush.html'>

    <dc:Identifier>bush.html</dc:Identifier>

    <dc:Title>Britney Spears</dc:Title>

    <dc:Creator>Gene Weingarten</dc:Creator>

    <dc:Description>What does the Britney Spears chatroom have to say about the new Bush administration?</dc:Description>

    <dc:Publisher>Washington Post</dc:Publisher>

    <dc:Format>text/html</dc:Format>

    <dc:Rights>Copyright 2001 The Washington Post Company</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='Noir-02.avi'>

    <dc:Identifier>Noir-02.avi</dc:Identifier>

    <dc:Title>Noir Episode 2</dc:Title>

    <dc:Creator>Unknown</dc:Creator>

    <dc:Description>Who am I? And why don&apos;t I feel regret?</dc:Description>

    <dc:Publisher>Bakamx Fansubs</dc:Publisher>

    <dc:Format>video/x-msvideo</dc:Format>

    <dc:Rights>Fansub</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='Noir-01.avi'>

    <dc:Identifier>Noir-01.avi</dc:Identifier>

    <dc:Title>Noir Episode 1</dc:Title>

    <dc:Creator>Unknown</dc:Creator>

    <dc:Description>Sexy women committing horribly violent acts for no apparent reason</dc:Description>

    <dc:Publisher>Bakamx Fansubs</dc:Publisher>

    <dc:Format>video/x-msvideo</dc:Format>

    <dc:Rights>Fansub</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='rose_kndy'>

    <dc:Rights>Life, liberty, and the pursuit of happiness</dc:Rights>

    <dc:Format>human/female</dc:Format>

    <dc:Title>Rose Kennedy</dc:Title>

    <dc:Creator>Her mother</dc:Creator>

    <dc:Description>A lovely young lady from Chicago, whose hobbies include computer security and punk rock.</dc:Description>

    <dc:Publisher>N/A</dc:Publisher>

    <dc:Identifier>rose_kndy</dc:Identifier>

  </rdf:Description>

  <rdf:Description rdf:about='admission-fee'>

    <dc:Rights>1 gold bar, please</dc:Rights>

    <dc:Format>e-cash</dc:Format>

    <dc:Title>burning man admission fee</dc:Title>

    <dc:Creator>u.s. gubment</dc:Creator>

    <dc:Description>greenbacks</dc:Description>

    <dc:Publisher>u.s. treasury</dc:Publisher>

    <dc:Identifier>admission-fee</dc:Identifier>

  </rdf:Description>

  <rdf:Description rdf:about='test'>

    <dc:Rights>f</dc:Rights>

    <dc:Format>e</dc:Format>

    <dc:Title>a</dc:Title>

    <dc:Creator>b</dc:Creator>

    <dc:Description>c</dc:Description>

    <dc:Publisher>d</dc:Publisher>

  </rdf:Description>

  <rdf:Description rdf:about='cam.jpg'>

    <dc:Identifier>cam.jpg</dc:Identifier>

    <dc:Title>Brandon&apos;s Webcam</dc:Title>

    <dc:Creator>Brandon Wiley</dc:Creator>

    <dc:Description>A picture of me, updated once every 60 seconds.</dc:Description>

    <dc:Publisher>Brandon Wiley</dc:Publisher>

    <dc:Format>image/jpeg</dc:Format>

    <dc:Rights>BSD (with attribution clause)</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='lcd.jpg'>

    <dc:Rights>GPL</dc:Rights>

    <dc:Format>image/jpeg</dc:Format>

    <dc:Title>My Sexxy LCD Monitor</dc:Title>

    <dc:Creator>V.N.V.Tech</dc:Creator>

    <dc:Description>My monitor is flat and very light. Does it make you randy?</dc:Description>

    <dc:Publisher>Brandon Wiley</dc:Publisher>

    <dc:Identifier>lcd.jpg</dc:Identifier>

  </rdf:Description>

  <rdf:Description rdf:about='tux-vs-clippy.mov'>

    <dc:Identifier>tux-vs-clippy.mov</dc:Identifier>

    <dc:Title>Tux vs. Clippy</dc:Title>

    <dc:Creator>Viktorie Navratilova</dc:Creator>

    <dc:Description>An epic battle between two forces of nature.</dc:Description>

    <dc:Publisher>Brandon Wiley</dc:Publisher>

    <dc:Format>video/quicktime</dc:Format>

    <dc:Rights>GPL</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='yatta.mpeg'>

    <dc:Identifier>yatta.mpeg</dc:Identifier>

    <dc:Title>Yatta!</dc:Title>

    <dc:Creator>Unknown</dc:Creator>

    <dc:Description>Wacky Japanese Boy Band Music Video</dc:Description>

    <dc:Publisher>Unknown</dc:Publisher>

    <dc:Format>video/mpeg</dc:Format>

    <dc:Rights>GPL</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='bomberman.nes'>

    <dc:Identifier>bomberman.nes</dc:Identifier>

    <dc:Title>Bomber Man</dc:Title>

    <dc:Creator>Unknown</dc:Creator>

    <dc:Description>ROM for the classic Nintendo game where you blow stuff up</dc:Description>

    <dc:Publisher>Unknown</dc:Publisher>

    <dc:Format>application/nestra-rom</dc:Format>

    <dc:Rights>You have no right to this.</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='edith.mp3'>

    <dc:Identifier>edith.mp3</dc:Identifier>

    <dc:Title>She Thinks She&apos;s Edith Head</dc:Title>

    <dc:Creator>They Might Be Giants</dc:Creator>

    <dc:Description>A song about a confused young lady</dc:Description>

    <dc:Publisher>Unknown</dc:Publisher>

    <dc:Format>audio/mp3</dc:Format>

    <dc:Rights>It&apos;s an MP3.</dc:Rights>

  </rdf:Description>

  <rdf:Description rdf:about='@.txt'>

    <dc:Identifier>@.txt</dc:Identifier>

    <dc:Title>@</dc:Title>

    <dc:Creator>Brandon Wiley</dc:Creator>

    <dc:Description>The script for a comic book I wrote.</dc:Description>

    <dc:Publisher>Brandon Wiley</dc:Publisher>

    <dc:Format>text/plain</dc:Format>

    <dc:Rights>Heck if I know</dc:Rights>

  </rdf:Description>

</rdf:RDF>

From bram at gawth.com  Mon Sep 10 16:20:01 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] BitTorrent 2.3 out - now with user feedback!
Message-ID: <Pine.LNX.4.21.0109101616390.544-100000@ultra.gawth.com>

I just pushed out the latest release of BitTorrent - it now has great user
feedback, and I fixed a bug where I forgot to make remotely initiated
sockets non-blocking (oops).

You can get it here -

http://bitconjurer.org/BitTorrent/download.html

The next release will be the much anticipated web-integrated one for
Windows.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent"
                                        -- John Maynard Keynes


From bram at gawth.com  Wed Sep 12 11:06:02 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] 2.3.1 out - BitTorrent has download resuming
Message-ID: <Pine.LNX.4.21.0109121102410.3552-100000@ultra.gawth.com>

BitTorrent now has download resuming, get it here -

http://bitconjurer.org/BitTorrent/download.html

To use download resuming, simply save using the same local file name you
used for a partial download, and it will pick up where it left off.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent"
                                        -- John Maynard Keynes


From bram at gawth.com  Sat Sep 15 16:46:01 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] BitTorrent now integrates into Internet Explorer!
Message-ID: <Pine.LNX.4.21.0109151641460.16257-100000@ultra.gawth.com>

The much-anticipated web-integrated version of BitTorrent is now out -

http://bitconjurer.org/BitTorrent/

Next will come some serious polishing - clean shutdown, no more text
window, and storing of hash values in the publisher.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent"
                                        -- John Maynard Keynes


From zooko at zooko.com  Sun Sep 16 09:05:01 2001
From: zooko at zooko.com (zooko@zooko.com)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
Message-ID: <E15ieGs-00060l-00@imp>

About two weeks ago I posted a message entitled "proxying and introduction: the
two fundamental operations of emergent networks":

http://zgp.org/pipermail/p2p-hackers/2001-August/000258.html


 an anonymous fan wrote:
>
> This is great stuff, Zooko.  It has really got me thinking.  ...too bad that the
> following thread was about MN more than how the resulting emergent network
> behaves.

Thanks!

Yes, I was sort of hoping that Oskar or someone would pick up the implicit
challenge.  See, AFAICT, Freenet and most other networks have focussed
exclusively on proxying and neglected introduction.

There is a sound theoretical reason for concentrating on proxying, as Oskar can
lucidly explain: that with introduction alone, and no proxying, the size of
your horizon is proportional to the size of your local state (i.e. you have to
remember everyone's id and address, or whatever, in order to use them).
Therefore introduction is "non-scalable" in Oskar's opinion.

The counterargument to that is that introduction is *required*.  A new node is
created and it is not connected to anyone in the network.  How does it get
connected?  That is the problem of Original Introduction.  People who neglect
introduction end up with some kind of kludge to do Original Introduction (e.g.
node-lists on HTTP, the MN Meta Trackers, or manually configuring your node to
connect to other nodes), and with no transitive introduction at all.

But that gives a central point of failure (e.g. you can take out the HTTP
server that newcomers depend on in order to connect to the network), or at
least passes the buck for having a robust, emergent introduction service off to
the HTTP or the manual configuration or whatever.

So a really good emergent network needs both: proxying *and* introduction.


IMO a really good emergent network is going to have:

1. good Transitive Introduction

2. good Original Introduction, which should utilize the features of the
    Transitive Introduction -- instead of being a wholly separate behavior with
    different properties
 2.a. easy-for-users Original Introduction such as "type in the DNS or IP of
    your friend who is already connected to the Emergent Net", or "scan local
    net / local wireless area for Emergent Net nodes"
 2.b. an easily accessible default original introduction service i.e. a set of
    redundant original introducer nodes like the MN Meta Trackers

3. Proxying for superlinear effective horizon


But in the immediate term, I don't really care about number 3: proxying of
operations (although I do care about relaying of messages) until the number of
nodes on my network times the amount of local space it takes to carry on a
relationship is approaching the amount of local space that I am willing to
allocate on each node.  Quick back-of-the-envelope calculations go like this:

128 bytes for addressing information (could do with less, but let's be a bit
pessimistic), 128 bytes for crypto information (could do with less, again), and
let's just say 256 bytes for local reputation (i.e. what you know about that
other node, how it performs, etc.) for a nice round number of 512 bytes per
counterparty.  (Note: if you really wanted to squeeze I believe you could get
this down to 20 bytes for crypto and addressing, and maybe 20 bytes for local
reputation for a total of 40 bytes per counterparty, but let's be pessimistic
while writing on the back of this particular envelope...)

So if users are willing to allocate up to 128 MB of local persistent storage
for maintaining their relationships in the network, then each node can have
direct relationships with at least 2^18 == 262,144 other nodes.
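
The envelope arithmetic can be checked in a few lines. The function name
is made up for illustration; the byte counts are the ones given above.

```python
def max_counterparties(storage_bytes, bytes_per_counterparty):
    """How many peer records fit in the storage a user is willing to give."""
    return storage_bytes // bytes_per_counterparty

MB = 2 ** 20

pessimistic = 128 + 128 + 256   # addressing + crypto + local reputation
squeezed = 20 + 20              # crypto and addressing + local reputation

print(max_counterparties(128 * MB, pessimistic))  # 2**27 / 2**9 = 2**18
print(max_counterparties(128 * MB, squeezed))
```

With the squeezed 40-byte record the same 128 MB holds a bit over three
million counterparties, so the 2^18 figure really is the pessimistic end.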

By the time this 2^18 limit is actually hurting my network, Freenet v0.5 will
be out, and I can learn from all of the applied research that Freenet has done
in effective proxying techniques and then add those proxying techniques into my
network.  Of course, it's also possible that hard drives will be bigger by then
and users will be willing to allocate more than 128 MB just for peer
relationships.


Now my purpose here is definitely not to criticize proxying as such!  Proxying
is very important for a lot of reasons.  For one thing, if you don't have
proxying then local network usage is *also* proportional to the effective
network horizon of a given operation (although there are several things that can
ameliorate that problem including multicast) and for another thing, smaller
devices with tighter storage constraints are more likely to need proxying.
Another reason is that there may be some theoretical or engineering benefits
to combining message-relay with higher-layer operations (as Freenet does) as
opposed to making them separate abstraction layers (as Mojo Nation does).
Finally, as Lucas Gonze pointed out in private e-mail, proxying allows for more
complex relationships, for example maybe you don't *want* to introduce your two
friends to each other, because you don't want them to be able to exchange
information without giving you access to it.


So my purpose here is *not* to denigrate proxying, but to draw attention to the
importance of introduction.  Here is a quick recap of the reasons why
introduction is vitally important to emergent networks:

1. Introduction is required, in order to have a network at all in the first
   place.
2. You can do non-transitive Original Introduction, but then it must be
   centralized and/or manually managed by humans.  (Which is okay for a lot of
   applications.)
3. If you are going to implement transitive, automatic Introduction in order to
   have robust, automatic joining of the network, then why not use it?  e.g.:
  a. You already use it to go from 1 neighbor to K neighbors (where K is the
   minimum number of neighbors that you need to be part of the network), then
   why not use it to go from K neighbors to M neighbors, where M is a higher
   number for greater efficiency in some cases.
  b. Use automatic transitive introduction to dynamically heal and optimize the
   network.
4. Introduction may make for more efficient networks than proxying in some
   cases (those cases where higher degree of connectivity is better).
  a. One such case seems to be when the total number of nodes on the entire
   network is sufficiently small, which is the current state of Mojo Nation.


Regards,

Zooko

P.S.  Thanks again, for those who didn't read my earlier message, to Oskar,
Adam Langley, and Bram for getting me thinking about this last year, and to
Mark Miller for teaching me the ways of Granovetterism (== Introductionism).

P.P.S.  I don't know that much about Freenet, and it probably already has some
transitive introduction features, but the Freenet people have not discussed it
in terms of "Proxying and Introduction: The Two Fundamental Network Operations"
before now as far as I know.


From zooko at zooko.com  Sun Sep 16 12:08:01 2001
From: zooko at zooko.com (Zooko)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks 
In-Reply-To: Message from zooko@zooko.com 
   of "Sun, 16 Sep 2001 08:55:26 PDT." <E15ieGs-00060l-00@imp> 
References: <E15ieGs-00060l-00@imp> 
Message-ID: <E15ih7b-0008Va-00@imp>

following up to my own post:

 I, Zooko, wrote:
>
> P.P.S.  I don't know that much about Freenet, and it probably already has some
> transitive introduction features, but the Freenet people have not discussed it
> in terms of "Proxying and Introduction: The Two Fundamental Network Operations"
> before now as far as I know.


I got a nice description from Adam Langley on IRC (irc.openprojects.net,
channel #infoanarchy, founded by infoanarchy.org), of Freenet's transitive
introduction mechanism.  It sounds typically elegant, and I'm sure we are all
eager to know how it behaves emergently.


Right now I'm worrying about Original Introduction -- how do you get new nodes
added to the network for the first time, if your centralized node list, CGI
script, or Meta Tracker, is unavailable?

The best answer I can come up with is that they talk to one of their friends
who is already in the network and then they type in the IP address and port
number of that friend's node.

Regards,

Zooko


From hal at finney.org  Sun Sep 16 12:15:02 2001
From: hal at finney.org (hal@finney.org)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
Message-ID: <200109161914.MAA07207@finney.org>

Zooko writes:
> Right now I'm worrying about Original Introduction -- how do you get new nodes
> added to the network for the first time, if your centralized node list, CGI
> script, or Meta Tracker, is unavailable?

If the only problem is finding another node, and if nodes listen on well
known ports, you can just try IP addresses at random until you find one.
This method is being used by the Linux Morpheus clone at
http://sourceforge.net/projects/gift/.

However it is not a good long term solution for many P2P systems for two
reasons: first it may not be a good idea to use a well known port if it
turns out to be controversial; and second, IPV6 will provide too many
addresses to ping them at random.
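
To put a rough number on the second point: if probes are drawn uniformly
at random from the whole address space (a simplification that ignores
reserved ranges), the number of probes before the first hit is geometric
with mean 1/p. The network size of 10,000 nodes below is an assumption
chosen purely for illustration.

```python
from fractions import Fraction

def expected_probes(address_bits, live_nodes):
    """Mean number of uniformly random probes before the first hit.

    Each probe hits with probability p = live_nodes / 2**address_bits,
    so the probe count is geometric with mean 1/p.
    """
    return Fraction(2 ** address_bits, live_nodes)

nodes = 10_000  # assumed network size, illustrative only

print(float(expected_probes(32, nodes)))    # IPv4: a few hundred thousand
print(float(expected_probes(128, nodes)))   # IPv6: on the order of 10**34
```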

> The best answer I can come up with is that they talk to one of their friends
> who is already in the network and then they type in the IP address and port
> number of that friend's node.

Maybe a browser plug-in could save the need for typing the info; people could
put it on a web page in some special format and the plug-in could read the
node address and launch the P2P client.

Hal

From coderman at mindspring.com  Sun Sep 16 12:31:01 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental 
 operations of emergent networks
References: <E15ieGs-00060l-00@imp>
Message-ID: <3BA51BB0.9D29344@mindspring.com>

zooko@zooko.com wrote:
> 
> ...
>
> 128 bytes for addressing information (could do with less, but let's be a bit
> pessimistic), 128 bytes for crypto information (could do with less, again), and
> let's just say 256 bytes for local reputation (i.e. what you know about that
> other node, how it performs, etc.) for a nice round number of 512 bytes per
> counterparty.  (Note: if you really wanted to squeeze I believe you could get
> this down to 20 bytes for crypto and addressing, and maybe 20 bytes for local
> reputation for a total of 40 bytes per counterparty, but let's be pessimistic
> while writing on the back of this particular envelope...)
> 
> ...
>
> Here is a quick recap of the reasons why
> introduction is vitally important to emergent networks:
> 
> ...
>
> 3. If you are going to implement transitive, automatic Introduction in order to
>    have robust, automatic joining of the network, then why not use it?  e.g.:
>   a. You already use it to go from 1 neighbors to K neighbors (where K is the
>    minimum number of neighbors that you need to be part of the network), then
>    why not use it to go from K neighbors to M neighbors, where M is a higher
>    number for greater efficiency in some cases.
>   b. Use automatic transitive introduction to dynamically heal and optimize the
>    network.
> 4. Introduction may make for more efficient networks than proxying in some
>    cases (those cases where higher degree of connectivity is better).
>   a. One such case seems to be when the total number of nodes on the entire
>    network is sufficiently small, which the current state of Mojo Nation.
> 


A few thoughts:  I am building a system that uses no proxying (well, proxying is
not a required feature of the network, although it can be used at a single level
of indirection) and relies exclusively on transitive introduction as you call it,
and reputation to organize peers within the network.

This has the effect that a) as you mention, you are continually healing and 
optimizing the organization/topology of the network, and b) this network uses
an extremely high level of direct connectivity, with the exact amount of
connectedness determined by each peer on an individual basis based on factors
such as bandwidth, memory, user preference, etc.

Now, regarding Original Introduction:  I am still working on a good way to do
this, and it is rather hard to come up with something general and robust.  I
have decided to make this bootstrapping available in a few specific ways:

1. By default, there is a page (or pages) that can be queried to grab
   a few initial hosts.  Once these are obtained, the transitive introduction
   process is primed, and this method should never be required again.

2. Manually enter a friend's node address.  Again, this kicks off the transitive
   introduction, so this should only be required once.

3. Scanning subnets for nodes.  I don't really like this method, so I will
   probably avoid this at all costs, unless someone impresses upon me the 
   dire need for it.

Transitive introduction in my case consists of querying the highest quality
peers for a set of their higher quality peers.
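
One round of that process might look roughly like the sketch below. The
function names, the fanout and want parameters, and the 0-to-1 quality
scores are all hypothetical; ask_peer stands in for a real network round
trip.

```python
def introduce(known, ask_peer, fanout=2, want=3):
    """Grow a peer table by asking the best-rated peers for their best peers.

    known:    dict of peer address -> local quality score (higher is better)
    ask_peer: callable(address, want) -> list of (address, score) pairs
    Newly learned peers start with the score the introducer reported,
    which is a simplification: a real node would rate them itself.
    Returns the sorted list of addresses learned in this round.
    """
    best = sorted(known, key=known.get, reverse=True)[:fanout]
    learned = {}
    for addr in best:
        for peer, score in ask_peer(addr, want):
            if peer not in known and peer not in learned:
                learned[peer] = score
    known.update(learned)
    return sorted(learned)

# Toy network: what each peer would answer when asked.
TABLES = {
    "a:1": [("c:3", 0.9), ("d:4", 0.7)],
    "b:2": [("d:4", 0.8), ("e:5", 0.6)],
    "z:9": [("evil:0", 1.0)],
}

def ask_peer(addr, want):
    return TABLES.get(addr, [])[:want]

known = {"a:1": 0.9, "b:2": 0.8, "z:9": 0.1}
print(introduce(known, ask_peer))
```

Only the two best-rated peers are consulted, so the low-rated "z:9" never
gets the chance to recommend anyone.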

In gnutella, this transitive introduction consisted of watching host addresses
fly through the network, and connecting to any number of them as desired.

Are there additional ways of transitive introduction that have not been widely
discussed yet?  In particular I am curious how FastTrack implements this.

Obviously central servers should be avoided, but the decentralized options appear
rather limited off hand.

From lucas at gonze.com  Sun Sep 16 12:46:01 2001
From: lucas at gonze.com (Lucas Gonze)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks
In-Reply-To: <200109161914.MAA07207@finney.org>
Message-ID: <NEBBJIHMMLKHEOPNOGHDOEBAEJAA.lucas@gonze.com>

I have been thinking about a seeding method that works by spreading
notifications across a bunch of public -- non-dedicated and unrelated -- forums.

There are a zillion archived and searchable publication nodes.  A node could
ship with knowledge of a Google search string, an Altavista search string, a
list of discussion group search URLs, etc.  Just a big list of URLs and some
prior knowledge of what to do with each.  These don't have to be on the web, they
just have to be reachable in some way.  Newsgroups, ftp sites, DNS records,
online classifieds, IRC...   anything that can accept and publish data.

Some nodes would drop their contact data in these locations, others would pick
it up.

This is a flavor of LIPP, the Lossy Inefficient Paranoid Protocol:
http://groups.yahoo.com/group/decentralization/message/3287

Gordon Mohr pointed out the relation of LIPP to:
   The Dining Cryptographers Problem: Unconditional Sender and Recipient
Untraceability
   David Chaum
   J. Cryptology (1988)
   http://komarios.net/crypt/diningcr.htm
   (Also http://komarios.net/crypt/dc.htm &
http://komarios.net/crypt/dc-demo.htm)

   Chaffing and Winnowing: Confidentiality without Encryption
   Ronald L. Rivest
   March 18, 1998 (rev. April 24, 1998)
   http://theory.lcs.mit.edu/~rivest/chaffing.txt

I noticed the other day that dogs communicating via pee is LIPP-like in that it
is lossy and inefficient but not paranoid.

- Lucas


From coderman at mindspring.com  Sun Sep 16 13:05:02 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental 
 operations of emergent networks
References: <NEBBJIHMMLKHEOPNOGHDOEBAEJAA.lucas@gonze.com>
Message-ID: <3BA52394.DEBE0941@mindspring.com>

Lucas Gonze wrote:
>
> ...
> 
> There are a zillion archived and searchable publication nodes.  A node could
> ship with knowledge of a Google search string, an Altavista search string, a
> list of discussion group search URLs, etc.  Just a big list of URLs and some
> prior knowledge of what to do with each.  These don't have to on the web, they
> just have to be reachable in some way.  Newsgroups, ftp sites, DNS records,
> online classifieds, IRC...   anything that can accept and publish data.
> 

I had actually thought of using this method.  I put up a page with a well
defined string 'ALPINE_BOOTSTRAP_HOSTS' and let google get around to crawling it.

Once indexed, anyone could put up a page with 'ALPINE_BOOTSTRAP_HOSTS' and if
google got to it, it would eventually be locatable by a client for use with
bootstrapping some peers.  This would then kick off the transitive introduction,
and so it should only be needed once.

Example:
http://www.google.com/search?q=cache:FAVsZne1Lao:cubicmetercrystal.com/alpine/bootstrap_hosts.html+ALPINE+BOOTSTRAP&hl=en

I took this page down, and decided against this method because the relevance
metrics used by search engines can be abused.  I don't know how likely it would
be, but a malicious party could place a highly linked page up that only lists
rogue endpoints.  (perhaps the RIAA wants peers connecting to their snooping
servers?)

It is an interesting idea, and perhaps coupled with some kind of asymmetric/
public key encryption could be implemented in a secure fashion.

I.e. All pages need to be authorized and signed by something like a certificate
authority for your network (maybe the maintainers, who knows).  If the page
located by google has a valid signature, then it could be used.  If it does
not, then search for the next page until you find one that does have a valid
signature.
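
The signed-page idea above could be sketched roughly as follows.  This is
only an illustration: MARKER reuses the string from this message, while
first_trusted_page and the is_signed callback are hypothetical names.  The
callback stands in for real public-key signature verification, which the
Python standard library does not provide.

```python
from typing import Callable, Iterable, Optional

# Marker string taken from the message above.
MARKER = "ALPINE_BOOTSTRAP_HOSTS"


def first_trusted_page(
    pages: Iterable[str],
    is_signed: Callable[[str], bool],
) -> Optional[str]:
    """Return the first search hit that carries the marker and verifies.

    `pages` is assumed to be page bodies in search-engine ranking order.
    `is_signed` is a hypothetical verifier supplied by the caller; a real
    one would check a signature made by the network's signing authority.
    """
    for page in pages:
        if MARKER in page and is_signed(page):
            return page
        # Unsigned or badly signed page: skip it and try the next hit.
    return None
```

A highly ranked rogue page then costs the client a wasted fetch, but it
is never used for bootstrapping.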

You could apply this same mechanism to a mailing list or news group as well...

From alk at pobox.com  Sun Sep 16 18:00:01 2001
From: alk at pobox.com (Tony Kimball)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks 
References: <E15ieGs-00060l-00@imp>
	<E15ih7b-0008Va-00@imp>
Message-ID: <15269.19176.105147.503740@gargle.gargle.HOWL>

Quoth Zooko on Sunday, 16 September:

: Right now I'm worrying about Original Introduction -- how do you get new nodes
: added to the network for the first time, if your centralized node list, CGI
: script, or Meta Tracker, is unavailable?

Peer-wise horizontal diversification is just one form of
decentralization.  Another is vertical decentralization, in which 
a variety of modalities are used to obtain operational information,
with rolling failover between them.  It is more time-consuming
to implement a multimodal introduction mechanism, but it can 
make your system quite robust.  Pick a few:

- IM services
  Messenger
  Yahoo
  AOL
  etc.

- Mailboxes
  Hotmail
  Yahoo
  etc.

- NNTP servers/groups
  
- IRC networks/channels
  
- GNUTELLA-based discovery
  (requires a significant number of agents listening on the gnet, due
  to limited horizon) 

- Freenet-based discovery
  Publish connection-point lists

- DNS-based discovery
  dyndns.org
  eyep.net
  etc.

- Supernodes

- FastTrack-based discovery

- Opennap/napigator-based discovery

Some are very quick to get working because there is plenty of solid,
re-usable code available.  
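
A minimal sketch of the rolling failover described above, assuming each
modality can be wrapped as a function that returns a list of peer
addresses.  The name discover_peers and the shape of the methods list
are illustrative assumptions, not part of any existing system.

```python
from typing import Callable, List, Sequence, Tuple

# One discovery modality: a label plus a function returning peer addresses.
Method = Tuple[str, Callable[[], List[str]]]


def discover_peers(methods: Sequence[Method]) -> Tuple[str, List[str]]:
    """Try each modality in order and return the first that yields peers.

    A modality that raises or returns nothing is treated as unavailable,
    and the next one in the list is tried.
    """
    for name, method in methods:
        try:
            peers = method()
        except Exception:
            continue  # this modality is down; fail over to the next
        if peers:
            return name, peers
    return "", []  # every modality failed
```

Ordering the list from cheapest to most expensive keeps the common case
fast while the slower modalities remain available as fallbacks.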

Myself, I'd like to see the peer-system community rally to establish a
reliable, persistent open DNS space for discovery services.  Perhaps
it's a field-of-dreams situation:  If you build it, they will come.
But perhaps not: So much NIH.


From lucas at gonze.com  Mon Sep 17 11:45:02 2001
From: lucas at gonze.com (Lucas Gonze)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] part 2: proxying and introduction: the two fundamental operations of emergent networks 
In-Reply-To: <15269.19176.105147.503740@gargle.gargle.HOWL>
Message-ID: <NEBBJIHMMLKHEOPNOGHDAECNEJAA.lucas@gonze.com>

In the early gnutella days you got an introduction by going to #gnutella on irc
and looking up IPs of people in the room, who were likely to be running the
software.  This way of bootstrapping a network based on personal connections was
primitive and slow but extremely robust.

> Peer-wise horizontal diversification is just one form of
> decentralization.  Another is vertical decentralization, in which
> a variety of modalities are used to obtain operational information,
> with rolling failover between them.  It is more time-consuming
> to implement a multimodal introduction mechanism, but it can
> make your system quite robust.  Pick a few:


From ml at gondwanaland.com  Tue Sep 18 17:50:02 2001
From: ml at gondwanaland.com (Mike Linksvayer)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>; from cyb@azrael.dyn.cheapnet.net on Fri, Sep 07, 2001 at 04:05:14AM -0500
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>
Message-ID: <20010918204913.A80700@or.pair.com>

On Fri, Sep 07, 2001 at 04:05:14AM -0500, Brandon Wiley wrote:
> > (We won't invent if there's already good precedents to mimic,
> > and we could crank out an initial dump in very short order if
> > it'd help give you something better to demo at O'R-P2P.)
> 
> That would be great! I could give a great demo with a fat database. If you
> decide to include fields that aren't in Dublin Core then just give me a
> list of the names of the fields and I'll configure it to use that schema
> instead.

An experimental dump featuring basic "best data" on 261,190 discrete
files is now available at http://preview.openbits.org.  Here's a
one record example:

<rdf:RDF xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc   = "http://purl.org/dc/elements/1.1/"
         xmlns:bz   = "http://bitzi.com/xmlns/2001/09/10/experimental#"
         xmlns:mm   = "http://musicbrainz.org/mm/mm-2.0#">
<!-- (C) 2001 Bitzi; see http://bitzi.com/openbits for license to use in whole or part-->
  <rdf:Description rdf:about='bitprint:3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VDNK5UNR8ZPQ5MFASNGVB5MISV7ESUSB2MN5R3IY2'>
    <bz:length>4128768</bz:length>
    <bz:first20>4944330300000000170B47454F42000005900000</bz:first20>
    <bz:filename>Brazzaville - Brazzaville 2002 - 05 - Ocean (With Joe Frank).mp3</bz:filename>
    <bz:url>http://www.emusic.com/albums/19514/</bz:url>
    <dc:title>Ocean (With Joe Frank)</dc:title>
    <bz:album>Brazzaville 2002</bz:album>
    <dc:creator>Brazzaville</dc:creator>
    <mm:trackNum>5</mm:trackNum>
    <dc:date>1999</dc:date>
    <mm:duration>258440</mm:duration>
    <bz:bitrate>128</bz:bitrate>
    <bz:samplerate>44100</bz:samplerate>
    <bz:stereo>y</bz:stereo>
    <bz:audio_sha1>DQ6TJEH2V39CVT3JM2SPCHBVH3SDWX7W</bz:audio_sha1>
    <bz:society>Joe_Frank</bz:society>
    <bz:md5>EWGAR2KGANV9QMI9B4529TD496</bz:md5>
  </rdf:Description>
</rdf:RDF>

Bitprint detail may be accessed (html only right now) at
http://bitzi.com/lookup/<bitprint>, i.e.,

http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VDNK5UNR8ZPQ5MFASNGVB5MISV7ESUSB2MN5R3IY2

for the example above.  The sha1 component of the bitprint may also
be used alone, like

http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD


You'll encounter the following non-Dublin Core fields:

bz:length	File size
bz:first20	First 20 bytes of file
bz:subjective	Subjective comment
bz:url		Related URL
bz:album	Album name
mm:trackNum	Album track number
mm:duration	Track duration (ms)
bz:bitrate	(kilobits/second)
bz:samplerate	Hz
bz:stereo	y|n
bz:encoder	Audio encoder
bz:audio_sha1	sha1 of audio data
bz:width	Image width
bz:height	Image height
bz:bpp		Image bits/pixel
bz:samplesize	.wav specific
bz:channels	Channels in a .ogg (stereo=2)
bz:broadcaster	Original broadcaster
bz:series	Series name
bz:medium	Broadcast medium
mm:trmId	Relatable audio fingerprint
bz:society	"BitSociety" interest group
bz:md5		MD5 full file hash

We also used description, title, creator and date from Dublin Core.

Obviously you won't find all of the above in a single record.

This experimental dump is intentionally simple and flat.  Future
dumps may be more structured and contain more Bitzi "community"
data, e.g., contributor attributions and content rating.

Criticism desired!

-- 
  Mike Linksvayer
  http://gondwanaland.com/ml/

From coderman at mindspring.com  Tue Sep 18 18:22:01 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com>
Message-ID: <3BA7F46F.892E3989@mindspring.com>

Mike Linksvayer wrote:
>
> ...
> 
> for the example above.  The sha1 component of the bitprint may also
> be used alone, like
> 
> http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD
> 


I have a quick question.  How hard would it be to support SHA-1
in hex format?  I.e. 

http://bitzi.com/lookup/4A110667BE591E02DAE39A199390D6699E8D2959


Anyone know what the most popular type of representation is for
SHA-1 hashes?

From ml at gondwanaland.com  Tue Sep 18 18:34:01 2001
From: ml at gondwanaland.com (Mike Linksvayer)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <3BA7F46F.892E3989@mindspring.com>; from coderman@mindspring.com on Tue, Sep 18, 2001 at 06:27:11PM -0700
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>
Message-ID: <20010918213352.A8771@or.pair.com>

On Tue, Sep 18, 2001 at 06:27:11PM -0700, coderman wrote:
> Mike Linksvayer wrote:
> > for the example above.  The sha1 component of the bitprint may also
> > be used alone, like
> > 
> > http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD
> 
> I have a quick question.  How hard would it be to support SHA-1
> in hex format?  I.e. 
> 
> http://bitzi.com/lookup/4A110667BE591E02DAE39A199390D6699E8D2959

We already do support hex-encoded sha1 lookups.  The URL you cite
redirects to

http://bitzi.com/lookup/JIISN378MERAFYZDVIN3HEGYPGRI4KK3

The hex version of the URL I gave as an example is

http://bitzi.com/lookup/CA9174243CD55B960AA02691075E39730512F663
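
For anyone converting between the two forms locally, here is a rough
sketch.  The Base32 alphabet in it is an assumption: it was inferred only
from the two hex/Base32 pairs in this thread (letters without L and O,
digits 2-9) and not taken from any Bitzi documentation.  The function
names are illustrative.

```python
import base64

# Standard Base32 alphabet, as produced by Python's base64.b32encode.
RFC_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
# ASSUMPTION: alphabet inferred from the example identifiers in this thread.
INFERRED_ALPHABET = b"ABCDEFGHIJKMNPQRSTUVWXYZ23456789"

TO_INFERRED = bytes.maketrans(RFC_ALPHABET, INFERRED_ALPHABET)
TO_RFC = bytes.maketrans(INFERRED_ALPHABET, RFC_ALPHABET)


def hex_to_base32(hex_digest: str) -> str:
    """Re-encode a hex SHA1 digest using the inferred Base32 alphabet."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).translate(TO_INFERRED).decode("ascii")


def base32_to_hex(identifier: str) -> str:
    """Inverse of hex_to_base32 for a 32-character SHA1 identifier."""
    raw = base64.b32decode(identifier.encode("ascii").translate(TO_RFC))
    return raw.hex().upper()
```

With that alphabet, both pairs of URLs quoted in this message round-trip.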

-- 
  Mike Linksvayer
  http://gondwanaland.com/ml/

From coderman at mindspring.com  Tue Sep 18 18:46:01 2001
From: coderman at mindspring.com (coderman)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com>
Message-ID: <3BA7F9F3.61DBF5CE@mindspring.com>

Mike Linksvayer wrote:
> 
> ...
>
> We already do support hex-encoded sha1 lookups.  The URL you cite
> redirects to
> 


Excellent, thanks.  I read in some past discussion archives that hex
was to be supported for backwards compatibility for an indefinite
period of time.  Is this still the plan?  Is there a date when
you might move away from hex?

From justin at chapweske.com  Tue Sep 18 18:48:01 2001
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>
Message-ID: <3BA7F8D4.2090108@chapweske.com>

hex and Base64 are the most popular formats.

coderman wrote:

>Mike Linksvayer wrote:
>
>>...
>>
>>for the example above.  The sha1 component of the bitprint may also
>>be used alone, like
>>
>>http://bitzi.com/lookup/3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD
>>
>
>
>I have a quick question.  How hard would it be to support SHA-1
>in hex format?  I.e. 
>
>http://bitzi.com/lookup/4A110667BE591E02DAE39A199390D6699E8D2959
>
>
>Anyone know what the most popular type of representation is for
>SHA-1 hashes?
>_______________________________________________
>p2p-hackers mailing list
>p2p-hackers@zgp.org
>http://zgp.org/mailman/listinfo/p2p-hackers
>


From gojomo at usa.net  Tue Sep 18 18:57:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com>
Message-ID: <011501c140af$0bda60c0$6601a8c0@gojovaio>

Base64 SHA1 lookups -- with the Freenet character substitutions, 
to be more URL-friendly -- should work too. 

(The code is there but was only tested on random format 
conformant input, which of course never returns real 
catalog details. For example:
   http://bitzi.com/lookup/JI2sN378M~a2FYzeVIn3HegY-GR )

So please try it out with some Base64 SHA1 values for 
files you know to be in the catalog, and let me know if 
it works!

- Gordon
____________________
Gordon Mohr, gojomo@
bitzi.com, Bitzi CTO


From gojomo at usa.net  Tue Sep 18 19:21:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com> <3BA7F9F3.61DBF5CE@mindspring.com>
Message-ID: <013901c140b2$80708a60$6601a8c0@gojovaio>

coderman writes:
> Excellent, thanks.  I read in some past discussion archives that hex
> was to be supported for backwards compatibility for an indefinite
> period of time.  Is this still the plan?  Is there a date when
> you might move away from hex?

No -- beyond backward compatibility with our prior use of
hex, we want these lookups to be convenient/easy for
the widest possible audience. 

So while we prefer Base32 as our canonical/displayed format, 
we'll accept other formats for lookups indefinitely.

- Gordon


From justin at chapweske.com  Tue Sep 18 20:23:02 2001
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <20010918213352.A8771@or.pair.com> <011501c140af$0bda60c0$6601a8c0@gojovaio>
Message-ID: <3BA80F7F.50805@chapweske.com>

With so many accepted input formats, wouldn't it make more sense to be 
explicit about which format you are using in the URL?

Gordon Mohr wrote:

>Base64 SHA1 lookups -- with the Freenet character substitutions, 
>to be more URL-friendly -- should work too. 
>
>(The code is there but was only tested on random format 
>conformant input, which of course never returns real 
>catalog details. For example:
>   http://bitzi.com/lookup/JI2sN378M~a2FYzeVIn3HegY-GR )
>
>So please try it out with some Base64 SHA1 values for 
>files you know to be in the catalog, and let me know if 
>it works!
>
>- Gordon
>____________________
>Gordon Mohr, gojomo@
>bitzi.com, Bitzi CTO
>
>
>
>


From cyb at azrael.dyn.cheapnet.net  Tue Sep 18 22:58:01 2001
From: cyb at azrael.dyn.cheapnet.net (Brandon Wiley)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <20010918204913.A80700@or.pair.com>
Message-ID: <Pine.LNX.4.21.0109182358420.32448-100000@azrael.dyn.cheapnet.net>

> An experimental dump featuring basic "best data" on 261,190 discrete
> files is now available at http://preview.openbits.org.

I'm so excited! I'll use this database in my demo at O'Reilly. I'll start
working on integrating the new schema elements tomorrow.


From zooko at zooko.com  Wed Sep 19 07:24:01 2001
From: zooko at zooko.com (Zooko)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices))
In-Reply-To: Message from Justin Chapweske <justin@chapweske.com> 
   of "Tue, 18 Sep 2001 20:45:56 CDT." <3BA7F8D4.2090108@chapweske.com> 
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>  <3BA7F8D4.2090108@chapweske.com> 
Message-ID: <E15ji7k-0006aI-00@imp>

Having URLs which are short enough to cut and paste is important.  Encoding six
bits per character (base 64) is that much better than encoding five bits per
character.

A mojoid in base-32 would look like this:

http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1

The same mojoid in base-64 would look like this:

http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDhXvLWxCZdzV6F8Q

That can make a significant difference in terms of usability, due to
line-wrapping in SMTP gateways and in GUIs, the awkwardness of layout when
representing this mojoid e.g. in HTML, and the general user experience.  The
bigger and uglier the URL, the less a user likes to deal with it.

By the way, we might try to squeeze mojoids.  I think we can get down to 30
bytes from 40 (by convincing ourselves that an 80-bit symmetric key has the
same attack work factor as a 160-bit hash id), so then it would look like:

http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e1

or

http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDh

We might also have an unencrypted mojoid, which would be 20 bytes, like this:

http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da

or

http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8No

Regards,

Zooko


From justin at chapweske.com  Wed Sep 19 10:03:01 2001
From: justin at chapweske.com (Justin Chapweske)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices))
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>  <3BA7F8D4.2090108@chapweske.com> <E15ji7k-0006aI-00@imp>
Message-ID: <3BA8CFB3.1020509@chapweske.com>

Are you planning on using the Freenet modification of Base64 to deal 
with some of the chars that don't go well in URLs?

-Justin

Zooko wrote:

>Having URLs which are short enough to cut and paste is important.  Encoding six
>bits per character (base 64) is that much better than encoding five bits per
>character.
>
>A mojoid in base-32 would look like this:
>
>http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1
>
>The same mojoid in base-64 would look like this:
>
>http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDhXvLWxCZdzV6F8Q
>
>That can make a significant difference in terms of usability, due to
>line-wrapping in SMTP gateways and in GUIs, the awkwardness of layout when
>representing this mojoid e.g. in HTML, and the general user experience.  The
>bigger and uglier the URL, the less a user likes to deal with it.
>
>By the way, we might try to squeeze mojoids.  I think we can get down to 30
>bytes from 40 (by convincing ourselves that an 80-bit symmetric key has the
>same attack work factor as a 160-bit hash id), so then it would look like:
>
>http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e1
>
>or
>
>http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDh
>
>We might also have an unencrypted mojoid, which would be 20 bytes, like this:
>
>http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da
>
>or
>
>http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8No
>
>Regards,
>
>Zooko
>
>


From bram at gawth.com  Wed Sep 19 10:19:02 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
In-Reply-To: <3BA7F8D4.2090108@chapweske.com>
Message-ID: <Pine.LNX.4.21.0109191018160.19149-100000@ultra.gawth.com>

On Tue, 18 Sep 2001, Justin Chapweske wrote:

> hex and Base64 are the most popular formats.

Don't forget Base256 :-)

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent"
                                        -- John Maynard Keynes


From zooko at zooko.com  Wed Sep 19 10:35:02 2001
From: zooko at zooko.com (Zooko)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) 
In-Reply-To: Message from Justin Chapweske <justin@chapweske.com> 
   of "Wed, 19 Sep 2001 12:02:43 CDT." <3BA8CFB3.1020509@chapweske.com> 
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <E15ji7k-0006aI-00@imp>  <3BA8CFB3.1020509@chapweske.com> 
Message-ID: <E15jl6K-0007UQ-00@imp>

> Are you planning on using the Freenet modification of Base64 to deal 
> with some of the chars that don't go well in URLs?

Mojo Nation has always used `-' in place of `+' and `_' in place of `/'.  This
also allows us to use mojoids as filenames in most file systems.
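
A small sketch of that substitution in Python, using the standard
library's base64.urlsafe_b64encode, which swaps in exactly those two
characters.  Dropping the trailing `=' padding is an assumption based on
the example mojoids earlier in the thread; mojo_style_b64 is a made-up
name.

```python
import base64


def mojo_style_b64(data: bytes) -> str:
    """Base64 with `-' for `+' and `_' for `/', trailing padding removed."""
    encoded = base64.urlsafe_b64encode(data)
    # ASSUMPTION: padding is stripped, as in the example URLs in this thread.
    return encoded.rstrip(b"=").decode("ascii")


# The 20-byte example id from earlier in the thread, given there in hex.
example = bytes.fromhex("1b17864eeb6c68294c9b2db0324a2b773401f0da")
print(mojo_style_b64(example))   # GxeGTutsaClMmy2wMkordzQB8No
# Standard Base64 would encode these two bytes as "+/8=".
print(mojo_style_b64(b"\xfb\xff"))   # -_8
```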

I'm sure we'd be willing to change that if there were a good reason.

Regards,

Zooko


From gojomo at usa.net  Wed Sep 19 10:39:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi
 (was Various identifier choices))
References: <01e801c135e2$5d03dda0$0ea7fea9@golden>
 <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>
 <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>
 <3BA7F8D4.2090108@chapweske.com> <E15ji7k-0006aI-00@imp>
Message-ID: <005601c14131$cbdc4d20$0ea7fea9@golden>

Zooko writes:
> Having URLs which are short enough to cut and paste is important. 

I agree.

> Encoding six
> bits per character (base 64) is that much better than encoding five bits per
> character.

Yes, Base64 is 17% more compact than Base32, which is 
20% more compact than Hexadecimal.
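
The arithmetic behind those percentages, as a quick sketch (the function
names are only illustrative; padding characters are ignored):

```python
from math import ceil


def encoded_length(n_bytes: int, bits_per_char: int) -> int:
    """Characters needed to write n_bytes at bits_per_char, unpadded."""
    return ceil(n_bytes * 8 / bits_per_char)


def saving(dense_bits: int, sparse_bits: int) -> float:
    """Fraction of characters saved by the denser of two encodings."""
    return 1 - sparse_bits / dense_bits


for name, bits in (("hex", 4), ("base32", 5), ("base64", 6)):
    # A 20-byte SHA1 comes out at 40, 32 and 27 characters respectively.
    print(name, encoded_length(20, bits))

print(round(saving(6, 5) * 100))   # Base64 vs Base32: 17
print(round(saving(5, 4) * 100))   # Base32 vs hex: 20
```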

But Base64 introduces case-sensitivity. Especially if
you ever use identifier fragments as a shorthand, this
introduces situations where they "bleed together" --
in human perception, in filesystems, in search-routines.

Also, Base64 introduces 2 characters that can present
problems in URLs and filenames: '/' and '+'. These
also serve as 'break' characters to many text-index
and text-search routines. 

You could use Freenet's patched Base64, which uses the
characters '~' and '-' instead, but then you've deviated
slightly from a long-standing standard, and still have
the 'break' characters problem.

In contrast, Base32 is robust across case isomorphisms,
safe for URLs and filesystems, and results in full-length
and fragment identifiers which are typically recognized
as unbroken units by legacy text-search mechanisms.

> A mojoid in base-32 would look like this:
> 
> http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1

That looks like Hexadecimal to me; the chance that a 70-digit Base32
number would contain no letters G-Z is infinitesimal.

> The same mojoid in base-64 would look like this:
> 
> http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDhXvLWxCZdzV6F8Q

If you're lucky enough not to get any '/' or '+' characters!

> That can make a significant difference in terms of usability, due to
> line-wrapping in SMTP gateways and in GUIs, the awkwardness of layout when
> representing this mojoid e.g. in HTML, and the general user experience.  The
> bigger and uglier the URL, the less a user likes to deal with it.

I again agree. However, for the foreseeable future, SHA1 
will be a sufficient casual "mailable" key into Bitzi, 
and SHA1 in Base32 is already a manageable 32-characters. 

I can see with your longer MojoIDs you have a problem; 
there is no need for all identifiers to use the same 
ASCII-compatible-encoding, so perhaps Base64 is the right 
choice for MojoNation. 

If Bitzi was to track and display MojoIDs, associated 
with Bitprints, we would display the MojoIDs in whatever 
fashion is typical for MojoNation users.

> By the way, we might try to squeeze mojoids.  I think we can get down to 30
> bytes from 40 (by convincing ourselves that an 80-bit symmetric key has the
> same attack work factor as a 160-bit hash id), so then it would look like:
> 
> http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e1
> 
> or
> 
> http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8NoFN9gmJsJKeFDh
> 
> We might also have an unencrypted mojoid, which would be 20 bytes, like this:
> 
> http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da
> 
> or
> 
> http://localhost:4004/id/GxeGTutsaClMmy2wMkordzQB8No

Sure, knock yourselves out. My only request would be that 
you document how they are created somewhere (besides the 
code itself) and freeze the definition at some point. :)

- Gojomo


From gojomo at usa.net  Wed Sep 19 10:53:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Bitzi (was Various identifier choices)
References: <01e801c135e2$5d03dda0$0ea7fea9@golden>
 <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>
 <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>
 <20010918213352.A8771@or.pair.com> <011501c140af$0bda60c0$6601a8c0@gojovaio>
 <3BA80F7F.50805@chapweske.com>
Message-ID: <008001c14133$e925f140$0ea7fea9@golden>

Justin writes:
> With so many accepted input formats, wouldn't it make more sense to be 
> explicit about which format you are using in the URL?

I suppose you mean something like:

   http://bitzi.com/lookup?sha1=3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD

That's a possibility if there's any more proliferation of acceptable
identifiers. 

Right now, though, what comes after the /lookup/ is always a SHA1 
identifier, and then an optional TigerTree. The different encodings 
are a superficial detail, and always distinguishable by their 
differing lengths. 
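
As a rough illustration of telling the encodings apart by length alone,
for a bare SHA1 with no TigerTree part (this is a sketch, not our actual
lookup code, and guess_encoding is a hypothetical name):

```python
# Length in characters of a 160-bit SHA1 under each accepted encoding.
SHA1_LENGTHS = {
    40: "hex",      # 4 bits per character
    32: "base32",   # 5 bits per character
    27: "base64",   # 6 bits per character, padding dropped
}


def guess_encoding(identifier: str) -> str:
    """Classify a bare SHA1 identifier purely by its length."""
    return SHA1_LENGTHS.get(len(identifier), "unknown")
```

Because the three lengths never coincide, no extra tag in the URL is
needed to say which encoding was used.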

Further, as mentioned elsewhere, URL compactness is a benefit --
so we don't want to add characters which only clarify what is
already invariant.

- Gojomo
____________________
Gordon Mohr, gojomo@
bitzi.com, Bitzi CTO


From zooko at zooko.com  Wed Sep 19 11:01:02 2001
From: zooko at zooko.com (Zooko)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) 
In-Reply-To: Message from Gordon Mohr <gojomo@usa.net> 
   of "Wed, 19 Sep 2001 10:37:46 PDT." <005601c14131$cbdc4d20$0ea7fea9@golden> 
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <E15ji7k-0006aI-00@imp>  <005601c14131$cbdc4d20$0ea7fea9@golden> 
Message-ID: <E15jlVI-0007dQ-00@imp>

> But Base64 introduces case-sensitivity. Especially if
> you ever use identifier fragments as a shorthand, this
> introduces situations where they "bleed together" --
> in human perception, in filesystems, in search-routines.

Hm.

Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if
you base64 encode them and then merge all the upper/lowercase!  (That isn't
obvious, but once upon a time I convinced myself of it.  I can find the message
in old mojonation-devel archives if you like.)

Hm.  I'm pretty sure that using fragments as shorthand opens the door to
collisions all by itself, and that the upper/lowercase issue doesn't contribute
significantly to the risk of spoofing.

Can you give me an example of this "bleed together" problem, excluding using
fragments?


> Also, Base64 introduces 2 characters that can present
> problems in URLs and filenames: '/' and '+'.

I should have specified that we translate `+' and `/' to `-' and `_'
respectively.


> In contrast, Base32 is robust across case isomorphisms,
> safe for URLs and filesystems, and results in full-length
> and fragment identifiers which are typically recognized
> as unbroken units by legacy text-search mechanisms.

I guess we just differ in our value judgements here.  I value shorter ids for
cut-and-paste purposes more than I value absence of "break" characters.
Indeed, I can't really think of a motivating example for caring about "break"
characters.  Could you please suggest one?


> > A mojoid in base-32 would look like this:
> > 
> > http://localhost:4004/id/1b17864eeb6c68294c9b2db0324a2b773401f0da0537d82626c24a7850e15ef2d6c4265dcd5e85f1
> 
> That looks like Hexadecimal to me; the chance that a 70-digit Base32
> number would contain no letters G-Z is infinitesimal.

Ahem.  <blush>

Okay, that was hexadecimal.  The standard Python libraries offer hex and
base-64, and in my haste I mistook hex for base-32.

Hm.  I can't find a base-32 encoder in Python.  Could someone who favors
base-32, and thus presumably has an encoder handy, show the base-32 version of
40-byte, 30-byte, and 20-byte strings?  Thanks!
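
For reference, a sketch of how those three strings could be produced,
assuming a Python whose base64 module provides b32encode.  It uses the
standard A-Z, 2-7 alphabet, which is not necessarily the alphabet any
particular project has settled on; base32_of is an illustrative name.

```python
import base64

# The 40-byte example id from earlier in this thread, in hex.
MOJOID_HEX = (
    "1b17864eeb6c68294c9b2db0324a2b773401f0da"
    "0537d82626c24a7850e15ef2d6c4265dcd5e85f1"
)


def base32_of(hex_text: str, n_bytes: int) -> str:
    """Base32 (standard alphabet) of the first n_bytes of a hex string."""
    raw = bytes.fromhex(hex_text)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


for size in (40, 30, 20):
    text = base32_of(MOJOID_HEX, size)
    # Lengths come out at 64, 48 and 32 characters.
    print(size, len(text), text)
```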


Regards,

Zooko


From hal at finney.org  Wed Sep 19 11:11:02 2001
From: hal at finney.org (hal@finney.org)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices))
Message-ID: <200109191810.LAA20604@finney.org>

Zooko writes:
> Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if
> you base64 encode them and then merge all the upper/lowercase!  (That isn't
> obvious, but once upon a time I convinced myself of it.  I can find the message
> in old mojonation-devel archives if you like.)

Base64 encoding a 160 bit hash would take 26-27 characters.  Compressing
case throws away 1 bit per character that happens to be alphabetic, and
52/64 of the characters will be alphabetic.  So you'll end up discarding
around 21 bits, giving the hash an effective strength of 139 bits.
That should be amply strong for these purposes.
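
The same estimate as a few lines of Python, treating every character as
uniformly distributed over the 64 symbols (a simplification, since the
last character carries fewer bits).  The names are illustrative only.

```python
HASH_BITS = 160
BITS_PER_CHAR = 6     # Base64
LETTERS = 52          # A-Z and a-z
ALPHABET_SIZE = 64


def case_folding_loss(hash_bits: int = HASH_BITS):
    """Return (characters, expected bits lost, effective bits)."""
    chars = hash_bits / BITS_PER_CHAR
    # Folding case costs one bit for each character that is a letter.
    lost = chars * LETTERS / ALPHABET_SIZE
    return chars, lost, hash_bits - lost


chars, lost, effective = case_folding_loss()
print(round(chars, 1), round(lost, 1), round(effective, 1))
```

This gives a little under 22 bits lost and about 138 bits remaining, in
line with the rough figures above.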

Hal

From alk at pobox.com  Wed Sep 19 11:15:02 2001
From: alk at pobox.com (Tony Kimball)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) 
References: <01e801c135e2$5d03dda0$0ea7fea9@golden>
	<Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>
	<20010918204913.A80700@or.pair.com>
	<3BA7F46F.892E3989@mindspring.com>
	<3BA7F8D4.2090108@chapweske.com>
	<E15ji7k-0006aI-00@imp>
	<005601c14131$cbdc4d20$0ea7fea9@golden>
	<E15jlVI-0007dQ-00@imp>
Message-ID: <15272.57453.804563.955204@gargle.gargle.HOWL>

Quoth Zooko on Wednesday, 19 September:
: 
: I guess we just differ in our value judgements here.  I value shorter ids for
: cut-and-paste purposes more than I value absence of "break" characters.
: Indeed, I can't really think of a motivating example for caring about "break"
: characters.  Could you please suggest one?

command shells.  


From zooko at zooko.com  Wed Sep 19 11:33:01 2001
From: zooko at zooko.com (Zooko)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) 
In-Reply-To: Message from Tony Kimball <alk@pobox.com> 
   of "Wed, 19 Sep 2001 13:14:05 CDT." <15272.57453.804563.955204@gargle.gargle.HOWL> 
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <E15ji7k-0006aI-00@imp> <005601c14131$cbdc4d20$0ea7fea9@golden> <E15jlVI-0007dQ-00@imp>  <15272.57453.804563.955204@gargle.gargle.HOWL> 
Message-ID: <E15jm0K-000810-00@imp>

> : I guess we just differ in our value judgements here.  I value shorter ids for
> : cut-and-paste purposes more than I value absence of "break" characters.
> : Indeed, I can't really think of a motivating example for caring about "break"
> : characters.  Could you please suggest one?
> 
> command shells.  

I've been working with mojoids as part of a full time job and as an obsessive
hobby for two years now.  I've cut-and-pasted with X windows (and regretted
that double-click doesn't highlight the entire mojoid if it contains a `-'
character), I've used tab-completion to choose files whose names were mojoids,
I've written Python code and bash scripts to manipulate mojoids as strings, as
filenames, and as URLs.  I've received and transmitted mojoids via IRC, e-mail,
HTTP, and ssh.

I've also read thousands of feedback e-mails from users of my software, watched
newbies try to use the software for the first time in an "I'll watch but 
I won't help" user test on four separate occasions, convinced my mom to use it
by telling her that it was the only way to download home movies of her infant
grandchild, and used the software myself as an end user in order to publish,
share, and download files.

And in all that I've never noticed break characters or upper/lowercase to be an
issue.  But I *have* had problems with overly long mojoids being mangled by
e-mail agents and *really* long ones (> 256 characters) being rejected by MSIE.

Maybe if the binary object is only 20, 30 or 40 bytes long then expansion like
hex or base-32 is okay, but among the three issues at hand: 1.  upper-lower, 2.
break chars, 3. length, I am concerned about length because of my experiences
and because I think that the user experience is most affected by that one.

But I admit that this is just an intuition of mine.  Maybe users are fine with
slightly longer URLs, and maybe they prefer all-lowercase over mixed case URLs.
I haven't really heard a definitive statement on that from any users.

Regards,

Zooko


From alk at pobox.com  Wed Sep 19 11:39:01 2001
From: alk at pobox.com (Tony Kimball)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) 
References: <01e801c135e2$5d03dda0$0ea7fea9@golden>
	<Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>
	<20010918204913.A80700@or.pair.com>
	<3BA7F46F.892E3989@mindspring.com>
	<3BA7F8D4.2090108@chapweske.com>
	<E15ji7k-0006aI-00@imp>
	<005601c14131$cbdc4d20$0ea7fea9@golden>
	<E15jlVI-0007dQ-00@imp>
	<15272.57453.804563.955204@gargle.gargle.HOWL>
	<E15jm0K-000810-00@imp>
Message-ID: <15272.58907.924394.802255@gargle.gargle.HOWL>

Quoth Zooko on Wednesday, 19 September:
: 
: >: I guess we just differ in our value judgements here.  I value shorter ids for
: >: cut-and-paste purposes more than I value absence of "break" characters.
: >: Indeed, I can't really think of a motivating example for caring about "break"
: >: characters.  Could you please suggest one?
: > 
: > command shells.  
: 
: ... in all that I've never noticed break characters or upper/lowercase to be an
: issue.  But I *have* had problems with overly long mojoids being mangled by
: e-mail agents and *really* long ones (> 256 characters) being rejected by MSIE.

okay.  non-starter.  how about search engines?  


From gojomo at usa.net  Wed Sep 19 11:55:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi
 (was Various identifier choices))
References: <01e801c135e2$5d03dda0$0ea7fea9@golden>
 <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>
 <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>
 <3BA7F8D4.2090108@chapweske.com> <E15ji7k-0006aI-00@imp>
 <005601c14131$cbdc4d20$0ea7fea9@golden> <E15jlVI-0007dQ-00@imp>
Message-ID: <00ca01c1413c$a7539340$0ea7fea9@golden>

Zooko writes:
> > But Base64 introduces case-sensitivity. Especially if
> > you ever use identifier fragments as a shorthand, this
> > introduces situations where they "bleed together" --
> > in human perception, in filesystems, in search-routines.
> 
> Hm.
> 
> Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if
> you base64 encode them and then merge all the upper/lowercase!  (That isn't
> obvious, but once upon a time I convinced myself of it.  I can find the message
> in old mojonation-devel archives if you like.)

I'd prefer a pointer to the MojoID definition document!

> Hm.  I'm pretty sure that using fragments as shorthand opens the door to
> collisions all by itself, and that the upper/lowercase issue doesn't contribute
> significantly to the risk of spoofing.
> 
> Can you give me an example of this "bleed together" problem, excluding using
> fragments?

With a filesystem or file-management program which ignores
or normalizes casing, Base64 names can suffer damage. 

This might then cause you to not find real matches (on a
case-sensitive basis). Or, it might tempt you to use 
case-insensitive searches, and then you've lost ~21 bits from 
your secure hash (as Hal Finney mentioned). 

Alternatively, if you wanted to rely on legacy case-insensitive
full-text search to find file identifiers, you'd be 
introducing a step where your identifiers are 21 bits weaker.
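
The ~21-bit figure can be reproduced with a short Python sketch. It
assumes, as Hal's estimate does, that each alphabetic Base64 character
loses exactly one bit when case is merged and that 52 of the 64 symbols
are letters; the function name is made up for illustration.

```python
HASH_BITS = 160        # SHA-1 output size in bits
BITS_PER_CHAR = 6      # each Base64 character carries 6 bits
ALPHABETIC = 52 / 64   # fraction of Base64 symbols that are letters


def case_folding_loss(hash_bits=HASH_BITS):
    """Estimate the bits lost when a Base64 string is case-folded."""
    chars = hash_bits / BITS_PER_CHAR   # about 26.7 characters
    lost = chars * ALPHABETIC           # one bit per alphabetic character
    return chars, lost, hash_bits - lost


chars, lost, remaining = case_folding_loss()
print(f"{chars:.1f} chars, ~{lost:.1f} bits lost, ~{remaining:.1f} bits left")
```

This is an expected-value estimate, so the loss for any particular
identifier depends on how many of its characters happen to be letters.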

(Try, for example, Googling for SMF2Y24TI7Y3CVER8NJKT7CAFGR9FS7Z.
Googling for a Base64 identifier would introduce)

These problems are further aggravated if you ever find it 
useful to use fragments.  We already have 34 versions of
'Uptown Girl' in the Bitzi database. Given human perception,
I think it's easier for people to say or think things like:

  "The 'PYME' version is complete, the '43N6' version is
   truncated."

...than to say...

  "The 'bB/e' version is complete, the 'B+vb' version is
   truncated."

> > Also, Base64 introduces 2 characters that can present
> > problems in URLs and filenames: '/' and '+'.
> 
> I should have specified that we translate `+' and `/' to `-' and `_'
> respectively.

MojoNation and Freenet should get together and use the same
"Base64v2".
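
For illustration, the same `+' to `-' and `/' to `_' substitution is what
Python's base64.urlsafe_b64encode performs. The sketch below only shows
that mapping; whether it matches the MojoNation or Freenet encoders in
other details (padding, for instance) is not stated in this thread, and
the input bytes are made up.

```python
import base64
import hashlib

# A 20-byte SHA-1 digest of some made-up content.
digest = hashlib.sha1(b"example file contents").digest()

standard = base64.b64encode(digest).decode("ascii")
url_safe = base64.urlsafe_b64encode(digest).decode("ascii")

# The URL-safe form differs from the standard one only in these two symbols.
assert url_safe == standard.replace("+", "-").replace("/", "_")

print(standard)
print(url_safe.rstrip("="))  # 27 characters once the '=' padding is dropped
```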

> > In contrast, Base32 is robust across case isomorphisms,
> > safe for URLs and filesystems, and results in full-length
> > and fragment identifiers which are typically recognized
> > as unbroken units by legacy text-search mechanisms.
> 
> I guess we just differ in our value judgements here.  I value shorter ids for
> cut-and-paste purposes more than I value absence of "break" characters.
> Indeed, I can't really think of a motivating example for caring about "break"
> characters.  Could you please suggest one?

Again, Googling for identifiers. Other full-text searches for
fragments. Searching for the Base32 fragment 'B6THNJ' is always
a single word; searching for the Base64 fragment 'aS+w/e' might
be interpreted as 'as w e' and perhaps ignored completely.
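
A toy tokenizer shows the effect. It is a sketch that assumes a search
engine splits on every non-alphanumeric character and lowercases the
result; real engines may behave differently, and naive_tokens is a
hypothetical helper, not any engine's API.

```python
import re


def naive_tokens(query):
    """Split on anything that is not a letter or digit, then lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", query) if t]


print(naive_tokens("B6THNJ"))  # ['b6thnj']
print(naive_tokens("aS+w/e"))  # ['as', 'w', 'e']
```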

> Hm.  I can't find a base-32 encoder in Python.  Could someone who favors
> base-32, and thus presumably has an encoder handy, show the base-32 version of
> 40-byte, 30-byte, and 20-byte strings?  Thanks!

20b -> 32 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD
30b -> 48 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD8EJ2KEDCV3WQMMPF
40b -> 64 chars: 3KIZIJB64XP3NCXAE4ISQZT3QNCTF7VD8EJ2KEDCV3WQMMPFWFJW6DCVPKXMZQIZ
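
Current Python ships a Base32 encoder as base64.b32encode, so the
lengths above can be checked with the sketch below. It uses the RFC 4648
alphabet (A-Z and 2-7) and random input bytes; the sample strings above
contain 8 and 9, so they presumably come from a different alphabet and
only the lengths are comparable.

```python
import base64
import os

lengths = {}
for n in (20, 30, 40):
    raw = os.urandom(n)  # stand-in for an n-byte binary identifier
    encoded = base64.b32encode(raw).decode("ascii")
    lengths[n] = len(encoded)
    print(f"{n}b -> {len(encoded)} chars: {encoded}")
```

Because 20, 30, and 40 are multiples of 5 bytes, no '=' padding is
produced and each length is exactly 8/5 of the byte count.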

- Gojomo


From gojomo at usa.net  Wed Sep 19 12:06:01 2001
From: gojomo at usa.net (Gordon Mohr)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi
 (was Various identifier choices))
References: <01e801c135e2$5d03dda0$0ea7fea9@golden>
 <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net>
 <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com>
 <3BA7F8D4.2090108@chapweske.com> <E15ji7k-0006aI-00@imp>
 <005601c14131$cbdc4d20$0ea7fea9@golden> <E15jlVI-0007dQ-00@imp>
 <00ca01c1413c$a7539340$0ea7fea9@golden>
Message-ID: <000701c1413e$16517720$0ea7fea9@golden>

I left a thought unfinished:
> (Try, for example, Googling for SMF2Y24TI7Y3CVER8NJKT7CAFGR9FS7Z.
> Googling for a Base64 identifier would introduce)

"Googling for a Base64 identifier would introduce not just
case-indifference, but also potentially fragment
isomorphisms -- if, as I suspect, non-alphanumeric Base64
characters are treated as break characters."


From oskar at freenetproject.org  Wed Sep 19 14:57:02 2001
From: oskar at freenetproject.org (Oskar Sandberg)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices))
In-Reply-To: <200109191810.LAA20604@finney.org>; from hal@finney.org on Wed, Sep 19, 2001 at 11:10:07AM -0700
References: <200109191810.LAA20604@finney.org>
Message-ID: <20010919235618.B402@sandbergs.org>

On Wed, Sep 19, 2001 at 11:10:07AM -0700, hal@finney.org wrote:
> Zooko writes:
> > Since mojoids include 160-bit SHA1 hashes, they are collision free *even* if
> > you base64 encode them and then merge all the upper/lowercase!  (That isn't
> > obvious, but once upon a time I convinced myself of it.  I can find the message
> > in old mojonation-devel archives if you like.)
> 
> Base64 encoding a 160 bit hash would take 26-27 characters.  Compressing
> case throws away 1 bit per character that happens to be alphabetic, and
> 52/64 of the characters will be alphabetic.  So you'll end up discarding
> around 21 bits, giving the hash an effective strength of 139 bits.
> That should be amply strong for these purposes.

It makes a lot more sense to generate a shorter key to begin with and
base32 encode it, than to generate a full length key and then downgrade
it based on the base64 encoding...
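
A rough length comparison, as a Python sketch. It assumes the shorter
key is about 139 bits, matching Hal's estimate of the effective strength
quoted above; encoded_length is a hypothetical helper and ignores
padding.

```python
import math


def encoded_length(bits, bits_per_char):
    """Characters needed to encode `bits` at `bits_per_char` per character."""
    return math.ceil(bits / bits_per_char)


print(encoded_length(160, 6))  # Base64 of a full 160-bit hash: 27
print(encoded_length(160, 5))  # Base32 of a full 160-bit hash: 32
print(encoded_length(139, 5))  # Base32 of a 139-bit key: 28
```

Under that assumption the shortened Base32 form costs one character more
than Base64 of the full hash.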

> 
> Hal
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers

-- 
Though here at journey's end I lie 
  In darkness buried deep,          above all shadows rides the Sun
beyond all towers strong and high,    and the Stars forever dwell:
  beyond all mountains steep,       I will not say the Day is done,
                                      nor bid the Stars farewell.
(JRRT)

Oskar Sandberg
oskar@freenetproject.org

From zooko at zooko.com  Thu Sep 20 12:40:01 2001
From: zooko at zooko.com (Zooko)
Date: Sat Dec  9 22:11:43 2006
Subject: please prefer base 64 over base 32 (was: Re: [p2p-hackers] Bitzi (was Various identifier choices)) 
In-Reply-To: Message from Tony Kimball <alk@pobox.com> 
   of "Wed, 19 Sep 2001 13:38:19 CDT." <15272.58907.924394.802255@gargle.gargle.HOWL> 
References: <01e801c135e2$5d03dda0$0ea7fea9@golden> <Pine.LNX.4.21.0109070343480.24019-100000@azrael.dyn.cheapnet.net> <20010918204913.A80700@or.pair.com> <3BA7F46F.892E3989@mindspring.com> <3BA7F8D4.2090108@chapweske.com> <E15ji7k-0006aI-00@imp> <005601c14131$cbdc4d20$0ea7fea9@golden> <E15jlVI-0007dQ-00@imp> <15272.57453.804563.955204@gargle.gargle.HOWL> <E15jm0K-000810-00@imp>  <15272.58907.924394.802255@gargle.gargle.HOWL> 
Message-ID: <E15k9X2-0002Uw-00@imp>

> : > command shells.  
> : 
> : ... in all that I've never noticed break characters or upper/lowercase to be an
> : issue.  But I *have* had problems with overly long mojoids being mangled by
> : e-mail agents and *really* long ones (> 256 characters) being rejected by MSIE.
> 
> okay.  non-starter.  how about search engines?  

Hm.  Like I might have "ABCDE-FGHIJKLMNOPQRSTUVXWYZ" on my home page, but you
can't find it through a search engine because when you ask for "A-B" it gives
you "A and not B" or something?

That seems like a good example (although I'm not sure about the details of how
search engines work).

Regards,

Zooko


From bram at gawth.com  Tue Sep 25 21:45:01 2001
From: bram at gawth.com (Bram Cohen)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] Call for Presentations: CodeCon 2002
Message-ID: <Pine.LNX.4.21.0109252058250.10559-100000@ultra.gawth.com>

CALL FOR PRESENTATIONS: CODECON 2002
http://www.codecon.org/

Please forward wherever applicable.

CodeCon 2002, scheduled for February 15, 16, and 17 in San Francisco,
California, is the premier event in 2002 for the P2P, cypherpunk, and
network/security application developer community. It is a workshop for
developers of real-world applications that support individual liberties.

During the first two days, our policy is "bring your own code"; while
those not demonstrating software are welcome to attend, the focus is
primarily on developer discussion. The final day of the workshop is
intended to be more inclusive, consisting of public and press
demonstrations, interviews, panels and a public session allowing a larger
number of presenters to demonstrate their projects in a more informal
setting. All presentations must be accompanied by functional applications,
ideally open source. Presenters must be one of the active developers of
the code in question.

CodeCon strongly encourages presenters from non-commercial and academic
backgrounds to attend for the purposes of collaboration and the sharing of
knowledge by providing free registration to workshop presenters and
highly-discounted registration to full-time students. Public session
presenters and approved members of the press will receive free
registration for the public session on Sunday.

IMPORTANT DATES

Submissions open:                               1 October 2001
Final submission deadline:                      1 January 2002
Final notification of acceptance:              15 January 2002
Conference begins:                            15 February 2002
Public session and public demonstrations:     17 February 2002
Post-conference web-based proceedings:           15 March 2002

SUGGESTED TOPICS

The focus of CodeCon is on running applications which:

*  use one or more of: cryptography, steganography, distributed 
   network architectures, peer to peer communications, anonymity 
   or pseudonymity
	      
*  enhance individual power and liberty 

*  can be discussed freely, either by virtue of being open source or
   having a published protocol, and preferably free of intellectual
   property restrictions
	       
*  are generally useful, either directly to a large number of users, or 
   as an example of technology applicable to a larger audience


Examples of excellent presentations include Mixmaster remailers and
extensions, OpenNap, Swarmcast, Mojo Nation, Magic Money, and OpenPGP
applications. Novelty in technical approaches, security assumptions, and
end-user functionality are excellent properties.

Presentations about basic technologies, such as a new cipher or hash,
uninteresting vulnerabilities in existing applications, or discussions
of unimplemented protocols are better suited for other conferences. The
guidelines for the CodeCon public session on Sunday are less stringent
than the main workshop; presentations which are more tangential to
CodeCon's focus may be accepted.

FORMAT OF PRESENTATIONS (main workshop)

Paper and Q&A
-------------
For those most comfortable with a traditional conference format, we will
accept papers up to 25 pages. We encourage HTML or plain ASCII
submissions, but can accept PostScript, PDF, or LaTeX. We will distribute
papers in advance of the conference, and will provide 30 or 60 minutes for
discussion and Q&A, at the presenter's discretion. In exceptional cases,
we will accept anonymous papers and conduct either a non-directed
discussion or a Q&A session directed by proxy. All papers should be
accompanied by source code or an application. When possible, we would
prefer that the application be available for interactive use during the
workshop, either on a presenter-provided demonstration machine or one of
the conference kiosks. Additionally, during the paper presentation, some
use of this demo must be made; it may be relatively brief, but a
demonstration of the running application is essential.

Interactive demo
----------------
In addition to the traditional conference paper format, we encourage
highly interactive presentations. Throughout the event, we will have
several kiosks and local servers available for demonstration purposes. We
also strongly encourage presenters to bring their own hardware.
Application demos can be up to 20 minutes, followed by a period of up to
40 minutes for Q&A, which can include demonstration of additional features
of the application not covered in the main presentation. If desired by the
presenter, we can distribute URLs of applications several days before the
workshop to allow attendees to familiarize themselves with the basics of
applications prior to the workshop sessions.

Panel
-----
In areas where multiple projects fall roughly in the same domain, the most
efficient presentation may be a panel with one or more developers from
each team. These developers may then individually demonstrate their
applications, followed by discussion among the panel and Q&A with the
other attendees as to differences in design goals, implementation, and
other aspects of the systems. If we receive multiple submissions from
related projects for papers or demos, we may suggest to the presenters
that they combine into a panel. Additionally, presenters are free to
submit jointly as a pre-selected panel.

There is some flexibility in requirements and formats for presentations;  
please enquire if you would like to use an alternate form.

FORMAT OF PRESENTATIONS (public session)

On the afternoon of Sunday 17 February, we will set aside a substantial
amount of time for public session presentations of 5 minutes or less.
Other events on this day, including panels and main presentations, will be
targeted at members of the press and public, so brief presentations on
Sunday will reach a wide audience. Presenters from the first two days who
wish to make an additional public session presentation may do so.

SUBMISSION DETAILS

Presentations must be performed by one of the active developers on the
project. That's the rule -- no code, no mike. Multiple people may be
involved in a presentation. You do get in free if you're part of a
presentation even if you don't speak during it, so creativity (within
reason) is encouraged.

The workshop language is English, for both presentations and papers.

Ideally, demonstrations should be usable by attendees with 802.11b
connected devices either via a web interface, or locally on Windows,
UNIX-like, or MacOS platforms. Cross-platform applications are most
desirable.

Our venue may be 21+. If you are submitting and are under 21, please
advise the program committee; we may consider alternate venues for one or
more days of the event. If you have a specific day on which you would
prefer to present, please advise us.

Main workshop submissions should include in the plain-text body of email
to submissions@codecon.org the following information:

                 - Name of presenter
                 - Name of others involved in project attending conference
                 - Title of presentation
                 - Brief summary of topic
                 - URL or attachment of example code
                   (must be received by the final submission deadline)
                 - Brief project history
                 - Brief summary of demo, or abstract of paper
                 - Any other details considered relevant

Public session submissions should include in the plain-text body of email
to submissions@codecon.org the following information:

                   - Name of presenter
                   - Title of presentation
                   - Brief summary of topic
                   - URL or attachment with example code
                   - Any other details

PROGRAM COMMITTEE

                       Bram Cohen, BitTorrent
                       Dan Egnor, ofb.net
                       Jered Floyd, Permabit
                       Ian Grigg, Systemics
                       Ryan Lackey, HavenCo
                       Don Marti, LinuxJournal
                       Guido Sanchez, New Hack City
                       Bill Stewart, AT&T
                       Brandon Wiley, Freenet
                       Jamie Zawinski, DNA Lounge

COSTS

Recognizing that many of the developers of the most interesting cypherpunk
applications are unable to afford accommodations and other expenses in San
Francisco, CodeCon will attempt to locate housing and otherwise assist
with issues for presenters on a case-by-case basis. Please contact
codecon-admin@codecon.org if your submission is accepted but you require
assistance to attend.

SPONSORSHIP

If your organization is interested in sponsoring CodeCon, we would love to
hear from you. In particular, we are looking for sponsors for social meals
and parties on any of the three days of the conference, as well as
sponsors of the conference as a whole, prizes or awards for quality
presentations, and assistance with transportation or accommodation for
presenters with limited resources. If you might be interested in
sponsoring any of these aspects, please contact the conference organizers
at codecon-admin@codecon.org.

QUESTIONS

If you have questions about CodeCon, or would like to contact the
organizers, please mail codecon-admin@codecon.org. Please note this
address is only for questions and administrative requests, and not for
workshop presentation submissions.


From greg at electricrain.com  Thu Sep 27 14:33:01 2001
From: greg at electricrain.com (Gregory P. Smith)
Date: Sat Dec  9 22:11:43 2006
Subject: [p2p-hackers] CIPHERim?
Message-ID: <20010927143220.C19954@zot.electricrain.com>

anyone know any details about the "im" product mentioned here?

http://dailynews.yahoo.com/h/zd/20010927/tc/developer_encrypts_corporate_im_1.html

It doesn't give a good description, but sounds a lot like what im
clients have needed for a while (basically use EGTP from mojonation to
send/receive your messages instead of a central server).

-g