From bram at gawth.com Thu Jul 3 08:58:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] IRTF P2P Working Group Formed In-Reply-To: <3EFB48D2.3080602@chapweske.com> Message-ID: Justin Chapweske wrote: > A new IRTF research group, P2PRG (Peer-to-Peer Research Group), has begun, > with the appended charter. Use p2prg-request@ietf.org to subscribe to the > mailing list. > > - Vern Paxson (IRTF chair) I predict that this group will be just as important as Intel's p2p standards group. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From justin at chapweske.com Thu Jul 3 10:03:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] IRTF P2P Working Group Formed In-Reply-To: References: Message-ID: <3F0461B4.8090401@chapweske.com> I disagree. Starting with a research group is a good first step in defining taxonomy and requirements in this space. I'm sure it will be a number of years before an IETF working group is formed and standards are created, but this is a good first step. Bram Cohen wrote: > Justin Chapweske wrote: > > >> A new IRTF research group, P2PRG (Peer-to-Peer Research Group), has begun, >> with the appended charter. Use p2prg-request@ietf.org to subscribe to the >> mailing list. >> >> - Vern Paxson (IRTF chair) > > > I predict that this group will be just as important as Intel's p2p > standards group. > > -Bram Cohen > > "Markets can remain irrational longer than you can remain solvent" > -- John Maynard Keynes -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From adam at cypherspace.org Thu Jul 3 21:48:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030704054745.B13145352@exeter.ac.uk> So one common p2p download approach is to download a file in parts in parallel (and out of order) from multiple servers. (Particularly to achieve reasonable download rates from multiple asynchronous links of varying link speeds.) A common idiom is also that there is a compact authenticator for a file (such as its hash) which people will supply as a document-id. Then we have the issue of people actively jamming p2p networks, so it has become practically interesting to achieve per-chunk authentication in these multiple-server downloads, while retaining a single compact file authenticator. I read the THEX (Tree Hash EXchange format) internet-draft proposed by Justin Chapweske and Gordon Mohr, and I'm taking from that document that it attempts to deal with this problem. However it seems somewhat inefficient (or if used efficiently not to robustly achieve per-chunk anti-jamming for moderate-to-large sized files). They propose using a Merkle Hash Tree (MHT) on the document with base chunks of 1KB. One of their claims is that the MHT itself can be downloaded from different nodes to combat needing to trust the tree server (and presumably for scalability). They appear to propose that the recipient would download log(n) chunks from a tree server (by asking for offsets and lengths in the tree file, presumably using HTTP keep-alive and the offset/length features of HTTP). However this has significant request overhead when the file hashes are 16-20 bytes (the request would be larger than the hash); it also involves at least two connections: one for authentication info, another for downloading chunks.
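For orientation, here is a minimal sketch of the kind of per-chunk Merkle-tree verification being discussed. The 1KB base chunk size follows the draft, but the use of SHA-1, the odd-node convention, and all function names are assumptions made only for illustration; this is not the THEX draft's own format or serialization.

import hashlib

CHUNK = 1024  # 1KB base chunks, as in the draft

def h(data):
    return hashlib.sha1(data).digest()

def build_tree(blob):
    # Level 0 is the leaf hashes (one per 1KB chunk); each higher level pairs
    # adjacent nodes, promoting an unpaired last node unchanged (one common
    # convention); the final level is the single root hash.
    levels = [[h(blob[i:i + CHUNK]) for i in range(0, len(blob), CHUNK)]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) if i + 1 < len(prev) else prev[i]
                       for i in range(0, len(prev), 2)])
    return levels

def audit_path(levels, index):
    # The log2(n) sibling hashes needed to check one leaf against the root.
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            path.append((sibling < index, level[sibling]))  # (sibling on left?, hash)
        index //= 2
    return path

def verify_chunk(chunk, path, root):
    node = h(chunk)
    for sibling_on_left, sibling in path:
        node = h(sibling + node) if sibling_on_left else h(node + sibling)
    return node == root

# e.g.: levels = build_tree(data); root = levels[-1][0]
#       assert verify_chunk(data[:CHUNK], audit_path(levels, 0), root)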
They also mention their format is suitable for serial download (as well as the random-access download I described in the above paragraph). Here I presume (though it is not stated) that the user would be expected to download either the entire set of leaf nodes (1/2 the full tree size), or some subset of the leaf nodes plus enough other nodes to verify that the leaf nodes were correct. (To avoid being jammed during download from the tree server.) Again none of this is explicitly stated but would be minimally necessary to avoid jamming. A simpler and more efficient approach is as follows (I presume a 128-bit (16-byte) output hash function such as MD5, or truncated SHA1; I also presume each node has the whole file): if the file is <= 1KB, download the file and compare to the master hash. If the file is > 1KB and <= 64KB, hash separately each of the 1KB chunks of the file; call the concatenation of those hashes the 2nd level hash set, and call the hash of the 2nd level hash set the master hash. To download, first download the 2nd level hash set (a 1KB file) and check that it hashes to the master hash. Then download each 1KB chunk of the file (in random order from multiple servers) and check each 1KB chunk matches the corresponding chunk in the 2nd level hash set. If the file is > 64KB and <= 4MB, hash separately each of the 1KB chunks of the file; call the concatenation of those hashes the 2nd level hash set. The 2nd level hash set will be up to 64KB in size. Hash separately each of the up to 64 1KB chunks of the 2nd level hash set; call the concatenation of those hashes the 3rd level hash set. Call the hash of the 3rd level hash set the master hash. Download and verification are an obvious extension of the 2-level case. Repeat for as many levels as necessary to match the file size. Bandwidth efficiency is optimal: there is a single compact file authenticator (the master hash: the hash of the 2nd level hash set), and immediate authentication is provided on each 1KB file chunk. To avoid the slow-start problem (can't download and verify from multiple servers until the 2nd level hash set has been downloaded), the 2nd level hash set chunk could have its download started from multiple servers (to discover the fastest one), and/or speculative download of 3rd level chunks or content chunks could be started and verification deferred until the 2nd level hash set chunk is complete. Adam From gojomo at bitzi.com Fri Jul 4 00:07:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: <20030704054745.B13145352@exeter.ac.uk> Message-ID: <014d01c341fa$c0a58bd0$660a000a@golden> Adam Back writes: > I read the THEX (Tree Hash EXchange format) internet-draft proposed by > Justin Chapweske and Gordon Mohr, and I'm taking from that document > that it attempts to deal with this problem. > > However it seems somewhat inefficient (or if used efficiently not to > robustly achieve per-chunk anti-jamming for moderate-to-large sized > files). They propose using a Merkle Hash Tree (MHT) on the document > with base chunks of 1KB. Yes. > One of their claims is that the MHT itself > can be downloaded from different nodes to combat needing to trust the > tree server (and presumably for scalability). I would not put it that way; rather, for some reason, you already trust the root value of the tree. It's been recommended to you by a trusted source, commonly accepted in public forums as a good root, whatever.
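For concreteness, a rough sketch in Python of the layered hash-set construction Adam describes above. The 1KB chunk size and 16-byte hash (truncated SHA1 standing in for "MD5 or truncated SHA1") follow his stated assumptions; the function names and structure are illustrative only and do not come from the thread or from the THEX draft.

import hashlib

CHUNK = 1024      # 1KB chunks, per the message above
HASH_LEN = 16     # 128-bit hashes, per the message above

def h16(data):
    return hashlib.sha1(data).digest()[:HASH_LEN]

def chunks(data):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def hash_levels(filedata):
    # levels[0] is the file itself; each later level is the concatenation of
    # the 16-byte hashes of the 1KB chunks of the level before it (the "2nd
    # level hash set", "3rd level hash set", ...).  The last level fits in 1KB.
    levels = [filedata]
    while len(levels[-1]) > CHUNK:
        levels.append(b"".join(h16(c) for c in chunks(levels[-1])))
    return levels

def master_hash(filedata):
    # The single compact file authenticator: the hash of the top hash set.
    return h16(hash_levels(filedata)[-1])

def chunk_matches(chunk, index, verified_parent_level):
    # Any 1KB chunk (of the file or of a hash set) is checked against the
    # corresponding 16-byte slot of the already-verified level above it.
    want = verified_parent_level[index * HASH_LEN:(index + 1) * HASH_LEN]
    return h16(chunk) == want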
From adam at cypherspace.org Fri Jul 4 02:51:01 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <014d01c341fa$c0a58bd0$660a000a@golden>; from gojomo@bitzi.com on Fri, Jul 04, 2003 at 12:06:11AM -0700 References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> Message-ID: <20030704105003.A13087560@exeter.ac.uk> > The THEX data format is really best for grabbing a whole subset of > the internal tree values, from the top/root on down, in one > gulp. Yes, that top-down format includes redundant info, Well you can halve the downloaded tree size as you only need the leaves (at your desired resolution) (given processor speeds, link speeds and the efficiency of hash algorithms, I think you can categorically state that you _will_ be able to repopulate the rest faster than you can download it). > and you could grab the data from lots of different people, but it's > so small compared to the content you're getting, why not just get > the whole thing from any one arbitrary peer who has it handy? Because the arbitrary peer may be jamming you. I'm presuming a byzantine network, where some significant proportion of nodes are hostile and working for a well-funded adversary. (Actually the current p2p network looks a lot like this thanks to the RIAA funding p2p jamming-ops). > For example, the full tree to verify a 1GB file at a resolution of > 64KB chunks is only (1G/64K)*2*24(Tiger)=768K, or less than 1/10th > of 1% of the total data being verified. So I'd say, just get it from > anyone who offers it, verify it's consistent with the desired root, > and keep it around -- nothing fancy. And so if the peer was jamming you, you have to download the whole tree again; repeat until you get a tree which matches the master hash. All I'm saying is your jamming resistance is unevenly balanced. You will detect jamming (in your example) after each 64KB chunk for normal downloads. But for the tree you are accepting a lower jamming resistance, namely you download the whole tree before you notice. So in your case the jamming resistance is 12x less effective. (Actually that's 6x because you are downloading redundant data for half the tree so you can skip that; similarly it need only be 5x if you also use a more reasonably sized mainstream hash like SHA1). So THEX has two problems: A) The rational jammer will always jam your tree downloads because they are 4-5x more vulnerable (he may jam content chunks also, but he gets best value for his investment by jamming your tree downloads). B) If you download your tree in parts in parallel to speed it up, your tree download becomes even more vulnerable because when the tree fails to verify you won't know which chunks were jammed, and your overall jamming risk (for tree download) will be higher: it will be (1-p)^k (where p is the proportion of nodes which are jamming and k is the number of nodes downloaded from). This factor will be multiplied by the jamming multiplier coming from A). (If you think this is not a problem you have never tried downloading 700KB files over a period of days from a small but periodically changing set of dialup users (changing because of competition for the download and because of drop-off-and-on) who happen to be the only nodes with the file you want.) Note also in p2p networks links vary a lot.
"Just grabbing" 768kB from the first host that comes to mind may not work out too well if it is a 56kb modem that is sharing its link with 4 other downloads. (Determining link speed ahead of time is also a separate problem, so you can't rely on picking fast links: the jammer will advertise he has many T1 links each performing in reality like 56kb modems; he can do this from his single T1 or cable modem.) > > Repeat for as many levels as necessary to match the file size. > > Bandwidth efficiency is optimal: there is a single compact file > > authenticator (the master hash: the hash of the 2nd level hash set), > > and immediate authentication is provided on each 1KB file chunk. > > On what criteria is this more simple or efficient? Efficient in that the space overhead is negligibly higher, but the jamming resistance of the tree download portion is the same as the file portion. > exactly analogous to the degree-64 tree: it includes segment summary > values covering the exact same amount of source data.) Let's phrase the recursive download another way which would make it directly compatible with THEX. I argue that A) and B) are undesirable properties that can be fixed by the following algorithm: Download the 1KB sized 6th generation chunk of the THEX MHT (which can be done due to the serialization format). Then download 1st of 64 1KB 12th generation chunks (and it can be verified against the 1st 16 byte hash in the 6th generation chunk), then download the next etc. (They can be downloaded out of order from different hosts.) Note that in this case the chunk size is selectable. If the hash size is 16 bytes (a nice size to work with), and the desired chunk size is X bytes, then the generations to download to fill that chunk are multiples of log2(X/16). That algorithm could be input for a THEX draft 3. (I'd also argue you should note that only the leaf nodes are needed due to the fast repopulation operation). Adam From bert at web2peer.com Fri Jul 4 09:03:02 2003 From: bert at web2peer.com (bert@web2peer.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> On Fri, 4 Jul 2003 10:50:03 +0100, Adam Back wrote: > > one arbitrary peer who has it handy? > > Because the arbitrary peer may be jamming you. I'm presuming a > byzantine network, where some significant proportion of nodes are > hostile and working for a well-funded adversary. (Actually the > current p2p network looks a lot like this thanks to the RIAA funding > p2p jamming-ops.) If more than half the nodes are adversarial, the network is going to be prone to all kinds of attacks that you'll never be able to surmount. So let's be generous and say the probability a selected node is adversarial is .5, and the THEX tree size is 1MByte. If you download the tree as a whole, on average you'll be downloading 2Mbyte of summary data per file transfer. Your scheme might allow you to reduce this to (say) 1.1Mbyte on average by detecting malicious nodes a bit earlier, for a savings of .9Mbytes per file (and again that's being very generous in the assumptions). Given that the typical application of these techniques is multi-megabyte file downloads, a .9Mbyte savings is nothing to write home about, particularly given the added complexity your scheme introduces. Perhaps you have some other applications in mind?
From adam at cypherspace.org Fri Jul 4 10:02:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net>; from bert@web2peer.com on Fri, Jul 04, 2003 at 09:02:29AM -0700 References: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> Message-ID: <20030704180124.A12837776@exeter.ac.uk> On Fri, Jul 04, 2003 at 09:02:29AM -0700, bert@web2peer.com wrote: > If more than half the nodes are adversarial, the network is going to > be prone to all kinds of attacks that you'll never be able to > surmount. I think p2p download can work (albeit less efficiently) into quite high jamming levels. Your overhead is a startup and node-maintenance one: you try to first obtain and then maintain as many non-jamming peers as you can to stream at the desired rate. You try to stay with nodes that have proven non-jamming while they let you. (Individual peers often have fairness policies such that they let other users download; and also nodes drop off and re-join, so there are reasons you can not stay with them for the duration of the download). Hence the desire for immediate and comprehensive (tree blocks as well as content blocks) jamming detection. > So let's be generous and say the probability a > selected node is adversarial is .5, and the THEX tree size is 1MByte. > If you download the tree as a whole, on average you'll be downloading > 2Mbyte of summary data per file transfer. Your scheme might allow you > to reduce this to (say) 1.1Mbyte on average by detecting malicious > nodes a bit earlier, for a savings of .9Mbytes per file (and again > that's being very generous in the assumptions). Given that the > typical application of these techniques is multi-megabyte file > downloads, a .9Mbyte savings is nothing to write home about, > particularly given the added complexity your scheme introduces. It has another advantage: that you can safely download the tree in parallel, without admitting higher jamming ratios. I don't think, in conclusion, it adds any significant complexity as it can be expressed as just an approach to downloading the THEX tree. (Which the THEX document anyway supposes you could download in chunks: I just specify how to download those chunks in parallel in a size equal to your preferred data chunk size in a way which avoids jamming to the same degree as THEX plans for document chunks). The chunk size is really an expression of your tolerance for jamming; how much bandwidth you're willing to expend to discover whether a node is jamming or not (if it is jamming you've wasted that chunk; if it's not you've got useful content/tree data.) The jamming tolerance chunk size need not be the same as the optimal request chunk size given your link characteristics. (The request chunk size would typically be larger; some multiple (probably a power of 2 multiple) of the jamming tolerance chunk size chosen to give TCP a chance to get streaming at full speed for some useful period of time). Adam From gojomo at bitzi.com Fri Jul 4 13:28:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload.
and alternate approach References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> Message-ID: <00b601c3426a$9ff6e3f0$660a000a@golden> Adam Back writes: > > The THEX data format is really best for grabbing a whole subset of > > the internal tree values, from the top/root on down, in one > > gulp. Yes, that top-down format includes redundant info, > > Well you can halve the downloaded tree size as you only need the > leaves (at your desired resolution) (given processor speeds, link > speeds and the efficiency of hash algorithms, I think you can > categorically state that you _will_ be able to repopulate the rest faster > than you can download it). I agree, but: (1) That's a fairly easy thing to do via a range-request, or perhaps a maximum of two requests: one to get the tree header, so you know where the raw data begins, and then one to get exactly the one minimum generation you want. (2) The savings is still tiny overall, compared to either the total size of data transferred or (if you're assuming a high level of malicious mischief) the amount of data you'll be dropping on the floor each time you discover a bad peer. So any random-access or minimum-needed optimizations can be deferred. However, some Gnutella engineers have proposed that "desired generation" be included on the URI-line for THEX tree requests, so that they can get just the resolution they want from cooperating peers. This could take the form of another THEX serialization-type -- say, "single generation". > > and you could grab the data from lots of different people, but it's > > so small compared to the content you're getting, why not just get > > the whole thing from any one arbitrary peer who has it handy? > > Because the arbitrary peer may be jamming you. I'm presuming a > byzantine network, where some significant proportion of nodes are > hostile and working for a well-funded adversary. (Actually the > current p2p network looks a lot like this thanks to the RIAA funding > p2p jamming-ops). But the best you can do, in a network where many peers are malicious, is to identify them as soon as possible, and then ignore them, preferring those who are non-malicious. So you still begin with an untrusted arbitrary peer to provide your verification data, just like you'll have arbitrary untrusted peers to provide the content, and there's an unavoidable probability related to the total number of malicious peers that you'll have to throw out the data you receive when it doesn't check out with the trusted root value. Spreading your requests of the verification data out over many peers only increases the chance that you'll have mixed data, some good, some bad. But as soon as you've found one good source, why not just stick with it (for the full tree or full generation)? > > For example, the full tree to verify a 1GB file at a resolution of > > 64KB chunks is only (1G/64K)*2*24(Tiger)=768K, or less than 1/10th > > of 1% of the total data being verified. So I'd say, just get it from > > anyone who offers it, verify it's consistent with the desired root, > > and keep it around -- nothing fancy. > > And so if the peer was jamming you, you have to download the whole > tree again; repeat until you get a tree which matches the master hash. Yes, but the whole tree is so small, why care?
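The size figure quoted above is easy to check directly. The helper below is only an approximation (a full binary tree carries roughly two hashes per leaf chunk), with the chunk and hash sizes as parameters; the 16-byte/1KB case echoes the figures used elsewhere in this thread.

def tree_size(file_size, chunk_size, hash_size):
    # Approximate serialized size of a full binary hash tree:
    # one hash per leaf chunk, plus roughly as many internal-node hashes again.
    leaves = file_size // chunk_size
    return 2 * leaves * hash_size

GB, MB, KB = 1 << 30, 1 << 20, 1 << 10
print(tree_size(1 * GB, 64 * KB, 24) // KB)  # 768 (KB): the Tiger/64KB figure above
print(tree_size(1 * GB, 1 * KB, 16) // MB)   # 32 (MB): ~3% of the file at 1KB/16-byte resolution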
And since you have to discover nodes are bad before you can ignore them on subsequent transactions, even in this bad-case scenario you can now ignore the malicious node when later getting the file-content. And under your alternative proposal, you need roughly the same amount of data -- or sometimes more -- to judge any node or subregion of the tree. (More specifically: if you were to specify a THEX serialization type where 5 out of every 6 generations were omitted, each remaining "level" of that format would be exactly equivalent to the 64-degree case.) In any of these cases, the main benefit of tree-based verification against malicious nodes remains: they now have to expend roughly as much effort to interfere with transfers as acquirers are expending to receive transfers, and they are discovered almost instantly. Malicious nodes can no longer inject tiny amounts of bad data into large downloads to impose an asymmetric cost on acquirers, who discover that their full-file is bad but don't know which region/peer was responsible. > So THEX has two problems: > > A) The rational jammer will always jam your tree downloads because > they are 4-5x more vulnerable (he may jam content chunks also, but he > gets best value for his investment by jamming your tree downloads). This doesn't follow; you can be verifying the breadth-first serialization tree download as it happens, and as soon as a single inconsistent byte appears -- which you will always be able to detect within 2*hash_size bytes -- you can dump that peer. And you can still keep all the data the bad peer sent you that did check out. With, say, SHA1, and disregarding the header overhead, that means you can't be fed more than 40 bad bytes before the problem is evident, because the data doesn't match up with what you've previously verified. Compared to the 1K minimum resolution size on content downloads, that's 25x LESS vulnerable. Even so, these differences are all negligible at this level. > B) If you download your tree in parts in parallel to speed it up, your > tree download becomes even more vulnerable because when the tree fails > to verify you won't know which chunks were jammed, and your overall > jamming risk (for tree download) will be higher: it will be (1-p)^k > (where p is the proportion of nodes which are jamming and k is the > number of nodes downloaded from). This factor will be multiplied by > the jamming multiplier coming from A). This doesn't follow either: some large portions of the tree segments you've grabbed will be self-consistent; some will further be consistent with the desired root. You can keep all those. Only the segment(s) which are self-inconsistent or inconsistent with the root value need be discarded. (This is actually a good reason for the redundant top-down tree format.) > > On what criteria is this more simple or efficient? > > Efficient in that the space overhead is negligibly higher, but the > jamming resistance of the tree download portion is the same as the > file portion. As I've noted above, the jamming resistance of the top-down binary tree is potentially to the resolution of every 2*hash_size byte range, much higher resolution than the 1K file portions. > > exactly analogous to the degree-64 tree: it includes segment summary > > values covering the exact same amount of source data.) > > Let's phrase the recursive download another way which would make it > directly compatible with THEX.
I argue that A) and B) are undesirable > properties that can be fixed by the following algorithm: > > Download the 1KB sized 6th generation chunk of the THEX MHT (which can > be done due to the serialization format). Then download 1st of 64 1KB > 12th generation chunks (and it can be verified against the 1st 16 byte > hash in the 6th generation chunk), then download the next etc. (They > can be downloaded out of order from different hosts.) > > Note that in this case the chunk size is selectable. If the hash size > is 16 bytes (a nice size to work with), and the desired chunk size is > X bytes, then the generations to download to fill that chunk are > multiples of log2(X/16). > > That algorithm could be input for a THEX draft 3. Sure, people could do this with the existing calculation (and serialization) method. It just doesn't offer any tangible benefits to justify its added complexity. You still need the data all the way up to the root before you can make any judgements about the veracity of the lowest nodes. > (I'd also argue you > should note that only the leaf nodes are needed due to the fast > repopulation operation). I agree it would be good to explicitly remind people that any one generation is enough to recalculate the rest of the tree, and if there's real-world demand, specify a single-generation serialization format. - Gordon @ Bitzi ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From gojomo at bitzi.com Fri Jul 4 13:33:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> <20030704180124.A12837776@exeter.ac.uk> Message-ID: <00be01c3426b$5be67670$660a000a@golden> Adam Back writes: > On Fri, Jul 04, 2003 at 09:02:29AM -0700, bert@web2peer.com wrote: > > If more than half the nodes are adversarial, the network is going to > > be prone to all kinds of attacks that you'll never be able to > > surmount. > > I think p2p download can work (albeit less efficiently) into quite > high jamming levels. I agree with Adam here. No matter how many adversarial nodes there are, if they are discovered as soon as they emit 1K of bad data, and thereafter ignored, even a tiny percentage of honest nodes can quickly find each other and bootstrap a useful network. Math rather than majorities will carry the day here. - Gordon From dtburton75 at buckeye-express.com Fri Jul 4 15:39:01 2003 From: dtburton75 at buckeye-express.com (Doug Burton) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] unsubscribe me Please Message-ID: <002e01c3427c$ffd9af20$6713a23f@douglasjjp7bcx> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20030704/b5c8196a/attachment.html From bert at web2peer.com Fri Jul 4 17:17:02 2003 From: bert at web2peer.com (bert@web2peer.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030704171611.27874.h022.c001.wm@mail.web2peer.com.criticalpath.net> On Fri, 4 Jul 2003 13:32:15 -0700, "Gordon Mohr" wrote: > I agree with Adam here. 
No matter how many adversarial nodes there > are, if they are discovered as soon as they emit 1K of bad data, > and thereafter ignored, even a tiny percentage of honest nodes can > quickly find each other and bootstrap a useful network. Wishful thinking, I think ;-) Adversarial nodes do more than simply emit bad data during downloads. They are free to hose/spoof/DOS the search & discovery protocols as well. They may act in collusion with many other nodes, and not necessarily act adversarially in any consistent manner, making detection extremely difficult. So the point is, you may never be able to find *any* honest nodes, unless those happen to be pre-programmed or discovered via out-of-band means, or you rely on some kind of central trusted authority. This is admittedly speculation, but it's not blind. In theoretical models of fully distributed computations in the presence of adversaries (e.g. analysis of reputation or anonymizing systems) it's highly unusual to guarantee any interesting properties when the level of malicious (colluding) peers exceeds half the network. From zooko at zooko.com Fri Jul 4 17:44:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: Message from bert@web2peer.com of "Fri, 04 Jul 2003 17:16:09 PDT." <20030704171611.27874.h022.c001.wm@mail.web2peer.com.criticalpath.net> References: <20030704171611.27874.h022.c001.wm@mail.web2peer.com.criticalpath.net> Message-ID: bert@web2peer.com wrote: > > This is admittedly speculation, but it's not blind. In theoretical > models of fully distributed computations in the presence of > adversaries (e.g. analysis of reputation or anonymizing systems) it's > highly unusual to guarantee any interesting properties when the level > of malicious (colluding) peers exceeds half the network. *Any* interesting properties?
The results that I am familiar with say things sort of like "You can't get the network to perform a general computation (i.e., execute an arbitrary program) correctly in the presence of more than X% misbehaving processors." Those "Byzantine" results are certainly valuable research, but they might easily be misapplied to peer-to-peer systems, where there are lots of properties that we might consider interesting that do *not* require general multiparty computation. For example, in *general* you can't reliably ask a bunch of computers to tell you the contents of the 'sillynet://secrets.txt' file if you don't know how many of the computers you are talking to are malicious liars. However, if you already have the SHA1 hash of the file, it is perfectly feasible to ask a bunch of computers to tell you the contents of the 'mnet:38ppp56jbb8b64zrh8reoadzgn1zpdxc76enkmqduwtf4tug' file. Now perhaps in some contexts finding the contents of a file when you already knew the hash of it doesn't count as an "interesting" property, but I'm certainly interested in doing that! ;-) Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From blanu at bozonics.com Fri Jul 4 19:44:02 2003 From: blanu at bozonics.com (Brandon Wiley) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704171611.27874.h022.c001.wm@mail.web2peer.com.criticalpath.net> Message-ID: > highly unusual to guarantee any interesting properties when the level > of malicious (colluding) peers exceeds half the network. This reminds me of a paper I'm fond of, "Dynamically Fault-Tolerant Content Addressable Networks" by Jared Saia et al., from IPTPS'02. "after the removal of 2/3 of the peers by an omniscient adversary who can choose which to destroy, 99% of the rest can access 99% of the remaining data" This is of course about node deletion, not adding evil nodes, but it's such an awesome paper I had to mention it for those who have not read it yet! From baford at mit.edu Sat Jul 5 05:51:02 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <200307050850.14503.baford@mit.edu> This discussion seems strangely reminiscent of a debate between database engineers on whether to use binary trees or B-trees for indexes. The question is simply whether you use the minimal branching arity necessary to form a tree (namely 2) or whether you increase the arity so as to make index blocks the same size as data blocks. For databases the answer typically depends on where the index is stored. Use a binary tree if the index can be stored entirely in main memory, because the traversal and update code is very quick and simple. But use a B-tree if it's stored on disk, where you can only access data efficiently in blocks: if you access part of a disk block you might as well access the rest, so you want to make the most of each block access and minimize the number of blocks you have to traverse from root to leaf. As far as what's appropriate here, I have to put myself behind Adam's proposal.
Requesting data from other nodes in a P2P network is certainly much more like disk access than memory access - you need to request (and verify) data in decent-size chunks for efficiency, but you also want a reasonable upper bound on the size of each chunk (e.g., to fit into a 1.5K Ethernet packet, or perhaps a 64K UDP datagram if you're feeling generous). THEX minimizes the amount of data that goes into computing each intermediate node (namely two sub-node/block hashes), at the cost of adding a _lot_ more intermediate levels to the tree (e.g., 6X more in the case of 16-byte hashes with 1K blocks) and increasing the total size of the metadata by almost a factor of two. Although THEX theoretically allows one P2P host to request THEX tree data from another in random access fashion and verify it incrementally one (e.g., 32-byte or 40-byte) intermediate node at a time, 40-byte requests are impractically small for network efficiency. Instead the THEX proposal seems to be for hosts always to exchange complete metadata trees - but as already pointed out in this discussion, doing so merely delays the problem because, for large files, the complete metadata tree itself becomes unwieldy (at least for users with low-bandwidth connections) and needs to be broken up. But breaking up large pieces of data into manageable fixed-size pieces is the purpose of using data blocks in the first place. If we have to break up metadata as well as "plain" data, why should we use a different strategy or a different block size to break up the metadata as we used for the data? Given the practical necessity of being able to break up and incrementally verify tree metadata _somehow_, I think that Adam's proposal is conceptually cleaner and simpler; it is likely to be easier to use in protocols and P2P applications because they only have to implement one data blocking mechanism rather than two, and it is certainly more efficient in terms of the total amount of metadata that must be stored and transferred for a given complete file. Ultimately I think this debate will prove to be just like the binary tree vs B-tree debate in the database world, and if history is anything to go by, for any high-latency block/chunk/packet-based "storage medium" the B-trees are going to win hands-down. Cheers, Bryan From bert at web2peer.com Sat Jul 5 08:48:02 2003 From: bert at web2peer.com (bert@web2peer.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030705084700.27408.h022.c001.wm@mail.web2peer.com.criticalpath.net> On Fri, 4 Jul 2003 21:24:59 -0500 (CDT), Brandon Wiley wrote: > > highly unusual to guarantee any interesting properties when the level > > of malicious (colluding) peers exceeds half the network. > > This reminds me of a paper I'm fond of, "Dynamically Fault-Tolerant > Content Addressable Networks" by Jared Saia et al., from IPTPS'02. > > "after the removal of 2/3 of the peers by an omniscient adversary who > can > choose which to destroy, 99% of the rest can access 99% of the > remaining > data" > > This is of course about node deletion, not adding evil nodes, but it's > such an awesome paper I had to mention it for those who have not read > it > yet! Definitely an interesting result, though it applies only to a very specific adversarial model.
You'll also find this paper in IPTPS-2000 which considers a much more diverse set of attacks: Emil Sit and Robert Morris, Security Considerations for Peer-to-Peer Distributed Hash Tables ...unfortunately it doesn't provide any particularly strong results on how to deal with them. There's a lot of interesting work remaining to do in this area, that's for sure. From bert at web2peer.com Sat Jul 5 09:03:02 2003 From: bert at web2peer.com (bert@web2peer.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030705090238.27408.h022.c001.wm@mail.web2peer.com.criticalpath.net> >You'll also find this paper in IPTPS-2000 Sorry, that should have been IPTPS-*2002* From adam at cypherspace.org Sat Jul 5 12:05:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <00b601c3426a$9ff6e3f0$660a000a@golden>; from gojomo@bitzi.com on Fri, Jul 04, 2003 at 01:26:59PM -0700 References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> <00b601c3426a$9ff6e3f0$660a000a@golden> Message-ID: <20030705200440.A12656853@exeter.ac.uk> On Fri, Jul 04, 2003 at 01:26:59PM -0700, Gordon Mohr wrote: > Spreading your requests of the verification data out over many > peers only increases the chance that you'll have mixed data, > some good, some bad. But as soon as you've found one good source, > why not just stick with it (for the full tree or full generation)? > > [..] the whole tree is so small, why care? Because the first node you find may not be a performant one, and you cannot determine performance except empirically. So if you used this algorithm and found an overloaded 56k modem, and you downloaded the entire tree from that single node serially, your tree download (even though it is only 3% of the file size) could easily take longer than the rest of the file. (Similarly this likely still holds for smaller percentages corresponding to larger chunks).
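To put rough numbers on that point: the figures below are hypothetical (not taken from any message in this thread), but they show how a serial tree fetch from one overloaded shared link can take longer than fetching the file body from a reasonably fast parallel swarm.

# Hypothetical illustration only: serial tree fetch from one slow, shared
# source vs. the file body from parallel peers.
tree_bytes = int(0.03 * 700 * 2**20)   # a tree ~3% of a 700MB file
slow_bps   = 56_000 / 5                # a 56k modem shared with 4 other downloads
swarm_Bps  = 125_000                   # ~1Mbit/s aggregate from parallel peers
file_bytes = 700 * 2**20

print(tree_bytes * 8 / slow_bps / 3600)    # serial tree fetch: ~4.4 hours
print(file_bytes / swarm_Bps / 3600)       # parallel file fetch: ~1.6 hours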
> And under your alternative proposal, you need roughly the > same amount of data -- or sometimes more -- to judge any node or > subregion of the tree. I can verify downloads in parallel because the intermediate tree auth and leaf tree chunks are downloaded in a pattern to optimize that. In my approach, the tree auth stuff (everything but the leaves) is redundant, but facilitates parallel download of just the leaf nodes. With your approach a full 50% of the download is redundant and the main value of the non-leaf nodes (given the speed of repopulation) is for authentication only. This is a highly inefficient authentication approach compared to mine. My authentication overhead is about 0.025% of file size and yours is about 1.56% of file size; a 63x higher overhead. (For 1KB chunks; the same comparative efficiency ratio of the two approaches holds for arbitrary chunk sizes). > > A) The rational jammer will always jam your tree downloads because > > they are 4-5x more vulnerable (he may jam content chunks also, but he > > gets best value for his investment by jamming your tree downloads). > > This doesn't follow; you can be verifying the breadth-first > serialization tree download as it happens If you download the full tree this is true. But you have doubled the tree data size to be able to verify as you download sequentially from one node. If however you download the leaf nodes only, with your method you can verify _nothing_ until you've downloaded the _entire_ tree. (Downloading just the leaves is something you said people were asking to be able to do for efficiency reasons). > With, say, SHA1, and disregarding the header overhead, that means you can't > be fed more than 40 bad bytes before the problem is evident, because the > data doesn't match up with what you've previously verified. Compared to > the 1K minimum resolution size on content downloads, that's 25x LESS > vulnerable. I was taking the 1K block size to match the MTU. That is, you can't download a smaller chunk than that (or you don't want to for efficiency). You could in theory probe nodes' good behavior with smaller chunks to build confidence in a node, upping the chunk size over time I suppose. And if that was important to you, you might define a variable chunk size. (But still then my approach allows parallel downloads of just the leaf nodes, just use log2(X/16) as the number of generations where X is that desired smaller chunk size; if the chunk size is 16 bytes then the approaches are equivalent). > > B) If you download your tree in parts in parallel to speed it up, your > > tree download becomes even more vulnerable because when the tree fails > > to verify you won't know which chunks were jammed, and your overall > > jamming risk (for tree download) will be higher: it will be (1-p)^k > > (where p is the proportion of nodes which are jamming and k is the > > number of nodes downloaded from). This factor will be multiplied by > > the jamming multiplier coming from A). > > This doesn't follow either: some large portions of the tree segments > you've grabbed will be self-consistent; some will further be > consistent with the desired root. You can keep all those. You are presuming downloading the entire tree (rather than just the leaf nodes). Anyway, to follow this approach through (presuming you will download the full tree): - All chunks will be self-consistent (presuming a rational adversary). - The problem will be that some will not reach the root.
My approach was about how to download in parallel if you want to download just half the tree (the leaf nodes) and repopulate the rest. A simple variant of my approach applies if you want to download the full tree. In this case similarly: you restrict your parallel fetches to looking no more than 6 generations ahead of what you have fully populated (presuming 16 byte hash). So if you advocate downloading the full tree it might be worth describing this variant of my algorithm to admit parallel downloads of the full tree with no uncertainty. > > > On what criteria is this more simple or efficient? > > > > Efficient in that the space overhead is negligibly higher, but the > > jamming resistance of the tree download portion is the same as the > > file portion. > > As I've noted above, the jamming resistance of the top-down binary tree > is potentially to the resolution of every 2*hash_size byte range, much > higher resolution than the 1K file portions. At a cost of close to 50% transferred tree data expansion over my approach (63x more tree auth data). > > Note that in this case the chunk size is selectable. If the hash size > > is 16 bytes (a nice size to work with), and the desired chunk size is > > X bytes, then the generations to download to fill that chunk are > > multiples of log2(X/16). > > > > That algorithm could be input for a THEX draft 3. > > Sure, people could do this with the existing calculation (and serialization) > method. It just doesn't offer any tangible benefits to justify its added > complexity. You still need the data all the way up to the root before you > can make any judgements about the veracity of the lowest nodes. This is why you download first the top 1KB to allow you to authenticate the next 64KB (which you could download 1KB chunks of in parallel); similarly the 64KB (once downloaded) allows you to verify the next 4MB downloaded in parallel etc. So I'm not saying arbitrary sequence, but still a sequence that admits ample parallelism early in the download of the tree data. > > (I'd also argue you > > should note that only the leaf nodes are needed due to the fast > > repopulation operation). > > I agree it would be good to explicitly remind people that any one > generation is enough to recalculate the rest of the tree, and if there's > real-world demand, specify a single-generation serialization format. The optimal single-generation serialization format is the one I have been specifying. If you download a single generation without using that approach, you have to re-download the entire tree if anything goes wrong, even when downloading from a single node. In parallel from multiple nodes you won't even know which parts of it are wrong, and then the bad probability of failure (1-p)^k that I mentioned comes into play. (Where p is proportion of bad nodes, k number of nodes downloaded from in parallel.) Adam From gojomo at bitzi.com Sat Jul 5 19:47:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> <00b601c3426a$9ff6e3f0$660a000a@golden> <20030705200440.A12656853@exeter.ac.uk> Message-ID: <00b201c34368$c6732300$660a000a@golden> Adam, I believe your approach to be a premature optimization, against a theoretical attack which would not be rational for an adversary to attempt.
Indeed, to consider the larger real-world context, adopting any of these approaches means that injecting a small bad segment into a larger download can no longer corrupt a much larger file in such a way that the jammer is costly to trace. Jammers who supply bad file blocks or bad verification data can be discovered within 1KB (or other choosable threshold) of their first bad data. I would suggest that this fact makes this sort of jamming -- malicious nodes claiming they'll supply something exact but then not supplying it -- sufficiently costly that it would rarely be tried. Attackers might as well just try a DOS of nonsense traffic floods. So cutting the (already-tiny) verification overhead by another 40% or so in data transferred seems to me like a negligible benefit compared to just getting something deployed, and there's nothing more simple than a top-down full-tree immediately-self-verifying transmission. If anyone wanted to do an extreme optimization, or in fact react to some real threat that someday emerges, the full-tree format does allow -- via subrange-requests -- random access to just the levels and portions of levels they want. They could use your ignore-5-out-of-every-6 generations specialization, or they could ignore 3-out-of-4 levels, or 255-out-of-256 levels, whatever strikes their fancy. Using a binary tree at the core allows any of these other approaches to work. I doubt any of these would ever be pursued, though. Seeing how real P2P systems have been programmed, deployed, and thrived (or not), people only need a "good-enough" solution at each step, not an optimal one. The good-enough approach here is to ask a peer for the full tree (or full generation of interest). Then check to see if it's an honest tree (or honest up to some point). If not, discard the dishonest part, mark the peer as bad, and try another peer at random. Good enough, and since every approach requires finding at least one honest peer, this approach eventually succeeds in every case where success is at all possible. There's just one numerical point I'd like to address: Adam Back writes: > > And under your alternative proposal, you need roughly the > > same amount of data -- or sometimes more -- to judge any node or > > subregion of the tree. > > I can verify downloads in parallel because the intermediate tree auth > and leaf tree chunks are downloaded in a pattern to optimize that. > > In my approach, the tree auth stuff (everything but the leaves) is > redundant, but facilitates parallel download of just the leaf nodes. > > With your approach a full 50% of the download is redundant and the > main value of the non-leaf nodes (given the speed of repopulation) is > for authentication only. This is a highly inefficient authentication > approach compared to mine. > > My authentication overhead is about 0.025% of file size and yours is > about 1.56% of file size; a 63x higher overhead. (For 1KB chunks; the > same comparative efficiency ratio of the two approaches holds for > arbitrary chunk sizes). That's a deceptive comparison, disregarding the leaves as overhead. And if leaving out 5/6 of the levels is good, why not leave out 7/8? 15/16? It's just a tradeoff between bandwidth, granularity of resolution, expected level of malicious action, and code/doc complexity. I happen to think bandwidth is cheap -- and getting cheaper. Code/doc complexity is expensive -- and can even be fatal to adoption.
Deploying any of these systems will cause the expected level of jamming attacks to plummet, because they'll no longer offer a big bang for the buck. And it's a pleasant fringe benefit that the most simple interchange format -- a top-down fully-filled-in binary tree -- also offers the best possible progressive verification granularity. Your tradeoffs may be different, but as noted above, your pattern of interactions can be implemented as an optional specialization of the more simple and general approach if ever needed. - Gordon From justin at chapweske.com Sun Jul 6 19:12:03 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [P2Prg] Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704180124.A12837776@exeter.ac.uk> References: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> <20030704180124.A12837776@exeter.ac.uk> Message-ID: <3F08D6D1.6020006@chapweske.com> > > I don't think, in conclusion, it adds any significant complexity as > it can be expressed as just an approach to downloading the THEX tree. > (Which the THEX document anyway supposes you could download in chunks: > I just specify how to download those chunks in parallel in a size equal > to your preferred data chunk size in a way which avoids jamming to the > same degree as THEX plans for document chunks). > So, am I correct in understanding that the current THEX specification affords the flexibility necessary for you to implement your idea? We intentionally do not specify how to use THEX in the draft. We merely provide a flexible serialization format and let everyone adapt it to their own needs. I believe this is as it should be, because each network will have different requirements. In the next version of THEX we will be introducing a new optional serialization type, "rootleaves", for applications that only care about having a single row of hashes, and not the intermediate nodes. If you combine this with a service to dynamically generate THEX trees at a specified depth, you now gain the ability to randomly request any single row of the hash tree w/o doing fancy byte range requests. At some point the P2PRG should probably do a taxonomy of the major approaches to integrity verification. Off the top of my head I can think of: o full file hash o block hashes o hashed/signed/mac'd block hashes (tree hash with 2 levels) o chained block hashes o tree hashes Any others worth mentioning? -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From justin at chapweske.com Sun Jul 6 20:12:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <200307050850.14503.baford@mit.edu> References: <200307050850.14503.baford@mit.edu> Message-ID: <3F08E4EE.8070500@chapweske.com> > Although THEX theoretically allows one P2P host to request THEX tree data from > another in random access fashion and verify it incrementally one (e.g., > 32-byte or 40-byte) intermediate node at a time, 40-byte requests are > impractically small for network efficiency. I do not advocate any one way of implementing THEX. But if you desire to access the tree in a fully random fashion, then simply pipeline the requests to maintain reasonable network efficiency. Again, in practice the amount of THEX data used is a small fraction of the total file size, so even w/o pipelining, the performance impact will be negligible.
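To make the first few entries of the taxonomy above concrete, here is a small sketch over the same data; SHA1, the 1KB block size, and the function names are arbitrary choices for illustration, not anything from the THEX draft. Note that hashing (or signing/MACing) the block-hash list is exactly the "tree hash with 2 levels" entry.

import hashlib

BLOCK = 1024

def sha1(data):
    return hashlib.sha1(data).digest()

def full_file_hash(data):
    return sha1(data)

def block_hashes(data):
    return [sha1(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def two_level_root(data):
    # Hashing (or signing/MACing) the concatenated block hashes yields a
    # single compact authenticator: a depth-2 hash tree.
    return sha1(b"".join(block_hashes(data)))

def chained_hash(data):
    # "Chained block hashes": each value covers its block plus the hash of
    # everything after it, so the first value authenticates the whole file
    # and blocks can be checked one at a time, in order.
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    acc = b""
    for block in reversed(blocks):
        acc = sha1(block + acc)
    return acc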
> Instead the THEX proposal seems > to be for hosts always to exchange complete metadata trees - but as already > pointed out in this discussion, doing so merely delays the problem because, > for large files, the complete metadata tree itself becomes unwieldy (at least > for users with low-bandwidth connections) and needs to be broken up. The default breadthfirst serialization allows hosts to retrieve as little or as much data as they desire. Downloading more data simply increases the verification resolution. You can also certainly fix the size of your THEX trees to a couple hundred kilobytes, as many of the Gnutella developers have done. This way your verification resolution decreases as the file size increases, though the THEX file size stays constant. And that's exactly what we try to accomplish with THEX. We want people to slice and dice it as they see fit for their application. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From justin at chapweske.com Sun Jul 6 20:40:01 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030705200440.A12656853@exeter.ac.uk> References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> <00b601c3426a$9ff6e3f0$660a000a@golden> <20030705200440.A12656853@exeter.ac.uk> Message-ID: <3F08EB71.2040103@chapweske.com> > > I can verify downloads in parallel because the intermediate tree auth > and leaf tree chunks are downloaded in a pattern to optimize that. > > In my approach, the tree auth stuff (everything but the leaves) is > redundant, but facilitates parallel download of just the leaf nodes. > > With your approach a full 50% of the download is redundant and the > main value of the non-leaf nodes (given the speed of repopulation) is > for authentication only. This is a highly inefficient authentication > approach compared to mine. > We are fully aware that for any given download/verification strategy, there will be an optimal THEX serialization that will be 50% smaller than the breadthfirst serialization. However, that custom serialization format will not be as flexible as breadthfirst and may not meet the needs of networks operating under a different set of requirements. Please note that the THEX serialization type is specified as a URI, which allows it to be assigned in a decentralized fashion. So if you feel strongly about the advantages of a custom serialization type over the generic breadthfirst serialization for your specific application, then feel free to specify a new serialization type URI, such as (http://cypherspace.org/spec/thex/adambackserialization). -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From seth.johnson at RealMeasures.dyndns.org Mon Jul 7 09:36:02 2003 From: seth.johnson at RealMeasures.dyndns.org (Seth Johnson) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Call for WIPO DG on Open and Collaborative Public Goods Message-ID: <3F09A01B.EDA15121@RealMeasures.dyndns.org> (Looks like we're about to call WIPO out on the carpet. Information is the one indisputable public good, whatever its form of organization. Please see the APPENDIX below for an overview of categories of public goods being suggested.
-- Seth) -------- Original Message -------- Subject: [Random-bits] WIPO DG asked to convene meeting on open and collaborativeprojects to create public goods Date: Mon, 7 Jul 2003 11:51:21 -0400 (EDT) From: Jay Sulzberger To: fairuse-discuss@nyfairuse.org CC: Jay Sulzberger ---------- Forwarded message ---------- Date: Mon, 07 Jul 2003 11:40:57 -0400 From: James Love To: random-bits@lists.essential.org, ecommerce Subject: [Random-bits] WIPO DG asked to convene meeting on open and collaborative projects to create public goods > http://www.cptech.org/ip/wipo/kamil-idris-7july2003.pdf 7 July 2003 Director General Dr. Kamil Idris, Director General World Intellectual Property Organization Geneva, Switzerland Dear Dr. Idris: In recent years there has been an explosion of open and collaborative projects to create public goods. These projects are extremely important, and they raise profound questions regarding appropriate intellectual property policies. They also provide evidence that one can achieve a high level of innovation in some areas of the modern economy without intellectual property protection, and indeed excessive, unbalanced, or poorly designed intellectual property protections may be counter-productive. We ask that the World Intellectual Property Organization convene a meeting in calendar year 2004 to examine these new open collaborative development models, and to discuss their relevance for public policy. (See Appendix following signatures for examples of open collaborative projects to create public goods). Sincerely, (in alphabetical order) Alan Asher Consumers Association London, UK Dr. K. Balasubramaniam Co-ordinator of Health Action International, Asia Pacific Columbo, Sri Lanka Konrad Becker, Director Institute for New Culture Technologies /t0 Vienna, Austria Yochai Benkler Professor of Law Yale Law School New Haven, CT USA Jonathan Berger Law and Treatment Access Unit AIDS Law Project University of the Witwatersrand South Africa James Boyle Professor of Law Duke Law School Durham, NC USA Diane Cabell Director, Clinical Programs, Berkman Center for Internet & Society Harvard Law School Cambridge, MA, USA Darius Cuplinskas Director, Information Program Open Society Institute Budapest, Hungary Marie de Cenival Charg?e de mission ETAPSUD Agence Nationale de Recherches sur le Sida (A.N.R.S.) INSERM 379 "Epid?miologie et Sciences Sociales appliqu?es ? l'innovation m?dicale" Marseille, France Felix Cohen CEO, Consumentenbond The Hague, the Netherlands Benjamin Coriat Professor of Economics, University of Paris 13 Director of CEPN-IIDE, CNRS Paris, France Carlos Correa Center for Interdisciplinary Studies on Industrial Property and Economics University of Buenos Aires Buenos Aires, Argentina Paul A. David Professor of Economics, Stanford University & Senior Fellow, Stanford Institute for Economic Policy Research Stanford, California, USA Emeritus Fellow, All Souls College, Oxford & Senior Fellow, Oxford Internet Institute Oxford, UK Kristin Dawkins Vice President for International Programs Institute for Agriculture and Trade Policy Minneapolis, MN USA Peter T. DiMauro Center for Technology Assessment Washington, DC USA Rochelle Cooper Dreyfuss Pauline Newman Professor of Law New York University School of Law NY, NY USA Peter Eckersley, Department of Computer Science, and IP Research Institute of Australia, The University of Melbourne Australia Michael B. 
Eisen Public Library of Science San Francisco, CA, and Lawrence Berkeley National Lab Berkeley, CA USA Nathan Geffen Treatment Action Campaign Cape Town, South Africa Gwen Hinze Staff Lawyer Electronic Frontier Foundation San Francisco, CA USA Ellen F.M. 't Hoen LL.M. Medecins sans Frontieres Access to Essential Medicines Campaign Paris, France Jeanette Hofmann Nexus & Social Science Research Center Berlin, Germany Aidan Hollis Associate Professor, Department of Economics, University of Calgary, and TD MacDonald Chair in Industrial Economics Competition Bureau, Industry Canada Gatineau, Quebec Canada Dr Tim Hubbard Head of Human Genome Analysis Wellcome Trust Sanger Institute Cambridge, UK Nobuo Ikeda Senior Fellow, Research Institute of Economy, Trade and Industry Tokyo, Japan Professor Wilmot James Chair, Africa Genome Initiative Social Cohesion & Integration Research Programme Human Sciences Research Council Cape Town, South Africa Niyada Kiatying-Angsulee, Ph.D. Drug Study Group Thailand Philippa Lawson Senior Counsel, Public Interest Advocacy Centre Ottawa, Canada Lawrence Lessig Professor at Law and Executive Director of the Center for Internet and Society Stanford Law School Stanford, CA USA James A. Lewis Director, Technology and Public Policy Program Center for Strategic and International Studies Washington, DC USA Jiraporn Limpananont, Ph.D. Pharmaceutical Patent Project, Social Pharmacy Research Unit (SPR), Faculty of Pharmaceutical Sciences Chulalongkorn University Bangkok, Thailand. James Love Director, Consumer Project on Technology Co-Chair, Trans Atlantic Consumer Dialogue (TACD) Committee on Intellectual Property Washington, DC USA Jason M. Mahler Vice President and General Counsel Computer and Communications Industry Association Washington, DC USA Eric S. Maskin A.O. Hirschman Professor of Social Science Institute for Advanced Study Princeton, NJ USA Professor Keith Maskus Chair, Department of Economics University of Colorado at Boulder. Boulder, CO USA Ken McEldowney Executive Director Consumer Action California USA William McGreevey Director, Development Economics Futures Group Washington, DC USA Professor Jon Merz Center for Bioethics University of Pennsylvania Philadelphia, PA USA Jean Paul Moatti Director, INSERM 379 Facult? de Sciences Economiques Universit? de la M?diterran?e Marseille, France Eben Moglen Professor of Law & Legal History Columbia University General Counsel, Free Software Foundation NY, NY USA Ralph Nader Consumer Advocate Washington, DC USA Hee-Seob Nam, Patent Attorney Intellectual Property Left Korea Progressive Network JINBONET Korea James Orbinski MD Associate Professor Centre for International Health University of Toronto, Canada Bruce Perens Director, Software in the Public Interest Inc. Co-Founder, Open Source Initiative, Linux Standard Base USA Greg Pomerantz, Fellow, Information Law Institute, New York University New York, NY USA Laurie Racine President, Center for the Public Domain Durham, NC USA Eric S. Raymond President, Open Source Initiative USA Juan Rovira Senior Health Economist The World Bank Frederic M. Scherer Emeritus, John F. Kennedy School, Harvard University Cambridge, MA USA Mark Silbergeld Consumer Federation of America Washington, DC USA Richard Stallman Launched the development of the GNU operating system, whose GNU/Linux variant is the principal competitor for Microsoft Windows. 
Cambridge, MA USA Anthony Stanco Center of Open Source & Government George Washington University Washington, DC USA Joseph Stiglitz Professor of Economics and Finance Columbia University Former Chief Economist World Bank Chairman of the White House Council of Economic Advisers from 1995 to 1997 Received Nobel Prize for Economics in 2001 New York, NY USA Peter Suber Research Professor of Philosophy, Earlham College Open Access Project Director, Public Knowledge Senior Researcher, SPARC Brooksville, ME, USA Sir John Sulton Winner of 2002 Nobel Prize for Physiology or Medicine Former Director of the Wellcome Trust Sanger Institute Cambridge, UK Harsha Thirumurthy Yale University, CT USA Alexander C. Tsai, MD Case Western Reserve University Cleveland, OH USA Pia Valota ACU Associazione Consumatori Utenti ONLUS AEC Association of European Consumers socially and environmentally aware Milano, Italy Professor Hal Varian Dean, School of Information and Management Systems University of California at Berkeley. Berkeley, CA USA Machiel van der Velde Co-Chair, Trans-Atlantic Consumer Dialogue (TACD) Committee on intellectual property The Hague, the Netherlands Victoria Villamar le Bureau Europ?en des Unions de Consommateurs/ European Consumers' Organisation Brussels, Belgium Robert Weissman Essential Action Washington, DC USA Professor Jonathan Zittrain Co-Director, Berkman Center for Internet & Society Harvard Law School Cambridge MA USA APPENDIX Open collaborative projects to create public goods These are some of the projects that could be discussed: 1. The IETF and Open Network Protocols. The Internet Engineering Task Force has worked for years to develop the public domain protocols that are essential for the operation of the Internet, an open network that has replaced a number of proprietary alternatives. It is important that WIPO acknowledge the success and importance of the Internet, and appreciate and understand the way the IETF functions. The IETF is currently struggling with problems setting open standards. When the IETF seeks to adopt a standard, there is uncertainty if anyone will later claim the standard infringes a patent. One suggestion to address this problem is to create a system whereby a standards organization could announce an intention to adopt a standard, and after a reasonable period for disclosure, prevent parties from later enforcing non-disclosed infringement claims. 2. Development of Free and Open Software This movement is highly decentralized, competitive, entrepreneurial, heterogeneous, and devoted to the publishing of software that is freely distributed and open. It includes projects that embrace the GNU General Public License (GPL), which uses copyright licenses to require that modified versions also be free software, and projects such as FreeBSD, which use minimal licensing restrictions and permit anyone to make non-free modified versions, as well as projects such as MySQL, which release the code under the GNU GPL but sell licenses to make non-free modified versions, as well as many other approaches. The new Apple operating system runs on top of FreeBSD, and big corporate players like Oracle and IBM run databases and server software on the mostly-GPL'd GNU/Linux operating system. Apache is the leading web page server software. WIPO provides frequent forums where firms that embrace closed and proprietary development models express their views, but very little is heard from those who have embraced open and collaborative development models for free software. 
The astonishing success of this movement should be recognized by WIPO, and policy development should be open to new ways of thinking. These various actors have a variety of values and objectives. Richard Stallman of the Free Software Foundation says "the freedom to change and redistribute software is a human right." Others see this as primarily an issue of how to most efficiently develop and distribute software. The proponents of open collaborative free software projects note that there are powerful reasons why software code should be open and freely copied. Not only is it efficient to copy existing code in new programs, but the transparency of the code allows a large community to find flaws and suggest improvements (Linus Torvalds' observation, popularized by Eric Raymond, that "with enough eyeballs, all bugs are shallow"). The free software movement is very important to the success and the future of the Internet, and it is also quite important in countering Microsoft's massive monopoly power, particularly given the number of commercial competitors to Microsoft that have disappeared. In recent years many governments have begun to embrace open collaborative free software projects. Free software developers are concerned about a number of policies that WIPO is involved in, including whether to allow patents on computational ideas, the future development of digital rights management schemes, and the enforceability of "shrink wrapped" or click-on contracts that contain anticompetitive provisions. 3. The World Wide Web. If measured by the rate at which it has transformed the world, the World Wide Web is the most important publishing success ever. The web was built on public domain protocols, and on documents that were, from the beginning, transparent and open at the level of source code. Long before anyone even knew how copyright would apply to the Internet, millions of documents were being created for free distribution on the Internet. Governments are now routinely publishing documents and data on the web so that they can be freely available, as do multilateral institutions like WIPO. The entire future of the Web will depend upon the extent to which new digital copyright regimes permit such practices as hypertext linking, the use of materials in search engines such as Google, and liberal views toward fair use. 4. The Human Genome Project (HGP). On April 14, 2003, the heads of state of France, the US, the UK, Germany, Japan and China issued a statement, which noted that: "Scientists from six countries have completed the essential sequence of three billion base pairs of DNA of the human genome, the molecular instruction book of human life. . . This information is now freely available to the world without constraints via public databases on the World Wide Web." If Presidents Jacques Chirac and George Bush, Prime Ministers Tony Blair and Junichiro Koizumi, Chancellor Gerhard Schroeder and Premier WEN Jiabao can collaborate on a statement to herald efforts to create a public domain database, free from intellectual property claims, it is time for the World Intellectual Property Organization to better appreciate why these governments did not want the Human Genome patented. 5. The SNP Consortium A different example of a project to create a public domain database involves single nucleotide polymorphisms (SNPs), which are thought to have great significance in biomedical research. In 1999, the SNP Consortium was organized as a non-profit foundation to provide public data on SNPs.
The SNP Consortium is composed of the Wellcome Trust and 11 pharmaceutical and technological companies including Amersham Biosciences, AstraZeneca, Aventis, Bayer, Bristol-Myers Squibb Company, Hoffmann-LaRoche, GSK, IBM, Motorola, Novartis, Pfizer and Searle. The work was performed by the Stanford Human Genome Center, Washington University School of Medicine (St. Louis), the Sanger Centre and the Whitehead Institute for Biomedical Research. The mission of the SNP Consortium was to develop up to 300,000 SNPs distributed evenly throughout the human genome and to make the information related to these SNPs available to the public without intellectual property restrictions. By 2001 it had exceeded expectations, and more than 1.5 million SNPs were discovered and made available to researchers worldwide. The SNP Consortium, the HGP and other similar projects represent different notions regarding the intellectual property rules for databases, and more information about these projects would be useful in evaluating assumptions and informing debates in the WIPO Standing Committee on Copyright as it considers current proposals to convene a diplomatic conference to adopt a treaty on new sui generis intellectual property rules for databases. 6. Open Academic and Scientific Journals The development of the Internet and the World Wide Web has fueled interest in new models for publishing academic and scientific journals. The prices for traditional journals have been sharply rising for years, worsening the gap between those who can afford access to information and those who cannot. In the past several years there has been a proliferation of projects to create open academic and scientific journals. The Public Library of Science was founded by Nobel Prize winner Dr. Harold Varmus and fellow researchers Patrick Brown and Michael Eisen. The Free Online Scholarship (FOS) movement, the creation of the widely read (for profit) BioMed Central to provide "immediate free access to peer-reviewed biomedical research," the Budapest Open Access Initiative (which has been endorsed by 210 organizations), and other similar projects seek to promote new business models for publishing that allow academic and scientific information to be more widely available to the research community. Other efforts to provide reduced price or free access to researchers in developing countries include the Health InterNetwork, which was introduced by the United Nations' Secretary General Kofi Annan at the UN Millennium Summit in the year 2000, a number of projects sponsored by the International Network for the Availability of Scientific Publications, eIFL.Net (Electronic Information for Libraries), a foundation that "strives to lead, negotiate, support and advocate for the wide availability of electronic resources by library users in transition and developing countries," and a new effort by the Creative Commons to create a license for free access to copyrighted materials in developing countries. Recently US Congressman Martin Sabo introduced legislation to require all US funded research to enter the public domain, and others are calling for international cooperation to similarly enhance the scientific commons. 7. The Global Positioning System. This is not an example of a collaborative development model, but it does illustrate the benefits of providing a free information good, in terms of stimulating the development of an entire generation of new applications.
If lighthouses are considered a textbook example of a public good, the modern equivalent might be the Global Positioning System (GPS), which provides the entire world highly accurate positioning and timing data via satellites. GPS signals are used for air, road, rail, and marine navigation, precision agriculture and mining, oil exploration, environmental research and management, telecommunications, electronic data transfer, construction, recreation and emergency response. There are an estimated 4 million GPS users worldwide. The services are offered without charge. Following the Korean Airline disaster, President Reagan offered GPS free to promote increased safety for civil aviation, and more recently President Clinton eliminated the intentional degrading of the system for civilian use. NASA reports that "many years ago we evaluated charging for the civil signal. The more we looked at it, the more convinced we became that by providing the signal free of direct user fees we would encourage technological development and industrial growth. The benefits from that, the new jobs created, and the increased safety and efficiency for services more than outweighed the money we would get from charging -- especially when you consider the additional bureaucracy that would be needed to manage cost recovery. We think that judgement has proven valid, as the world-wide market for GPS applications and services now exceeds $8 billion annually." -- James Love, Director, Consumer Project on Technology http://www.cptech.org, mailto:james.love@cptech.org tel. +1.202.387.8030, mobile +1.202.361.3040 _______________________________________________ Random-bits mailing list Random-bits@lists.essential.org http://lists.essential.org/mailman/listinfo/random-bits From painlord2k at libero.it Mon Jul 7 12:45:01 2003 From: painlord2k at libero.it (Mirco Romanato) Date: Sat Dec 9 22:12:21 2006 Subject: [P2Prg] Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> <20030704180124.A12837776@exeter.ac.uk> <3F08D6D1.6020006@chapweske.com> Message-ID: <000101c344c0$269f75a0$99b5fea9@painlordcave1> ----- Original Message ----- From: "Justin Chapweske" To: "Adam Back" Cc: ; Sent: Monday, July 07, 2003 4:11 AM Subject: Re: [P2Prg] Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach I'll write a few examples; correct me as you feel the need. > At some point the P2PRG should probably do a taxonomy of the major > approaches to integrity verification. Off the top of my head I can > think of: > o full file hash Like SHA-1 in gnutella > o block hashes xBin? verify a file chunk against the root hash value? Mnet? > o hashed/signed/mac'd block hashes (tree hash with 2 levels) like MD4 used in the eDonkey network? > o chained block hashes like in chained block cryptography? This needs the previous block/hash in the series to verify the current block against its hash? > o tree hashes like TigerTree in Gnutella > Any others worth mentioning? Mirco From lgonze at panix.com Mon Jul 7 12:50:02 2003 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:21 2006 Subject: [P2Prg] Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <000101c344c0$269f75a0$99b5fea9@painlordcave1> Message-ID: <2DC9873D-B0B4-11D7-AA05-000393455590@panix.com> >> Any others worth mentioning?
>> One other: downloading overlapping segments from different servers and comparing the overlaps. That's insecure, obviously. - Lucas From varx32 at umkc.edu Tue Jul 8 09:07:02 2003 From: varx32 at umkc.edu (Rajegowda, Vikas Aralaguppe (UMKC-Student)) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] unsubscribe Message-ID: <051D9E794E394F4B8A03A2FC4F234D78452B39@KC-MAIL4.kc.umkc.edu> -----Original Message----- From: bert@web2peer.com [mailto:bert@web2peer.com] Sent: Fri 7/4/2003 7:16 PM To: p2p-hackers@zgp.org Cc: p2p-hackers@zgp.org; p2prg@ietf.org; adam@cypherspace.org Subject: Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/ms-tnef Size: 3674 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030708/9a1116c8/attachment.bin From bram at gawth.com Tue Jul 8 17:18:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704054745.B13145352@exeter.ac.uk> Message-ID: For what it's worth, BitTorrent does one dumber than Adam's approach and always has exactly two levels it its 'tree' - the list of all hashes, then everything else. It's easy to specify, easy to implement, seems to work fine, and unlike THEX, is widely deployed. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at bitzi.com Wed Jul 9 01:02:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: Message-ID: <01be01c345f0$50244e10$660a000a@golden> Bram Cohen writes: > For what it's worth, BitTorrent does one dumber than Adam's approach and > always has exactly two levels it its 'tree' - the list of all hashes, then > everything else. EDonkey uses a similar approach, though its bottom level blocks are a fixed 9.5MBs in size. > It's easy to specify, easy to implement, seems to work fine, and unlike > THEX, is widely deployed. Yes, and that's further evidence that "simple and good enough" is often all that's needed. Even though the verification trees must be fetched in their entirety before they can be confirmed, that's not much of a problem or vulnerability in practice, at least when considering the class of people seeking many-megabyte files. The chief advantage I see in the deep-tree (THEX) approach is that it scales to any desired block size and verification resolution, while retaining the same root values. With deep trees, a paranoid darknet which does all sharing via origin-spoofed multiply-forwarded self-verifying 1KB unreliable- transport packets can use the same file-identity root values as a mainstream CDN using long-lived TCP connections to share 1MB blocks at a time. Further, if the level of malicious jamming rises on a network, cooperating nodes may then opt to verify incoming data at a finer resolution, minimizing the amount of wasteful activity any dishonest node can trigger. To vary the verification resolution of BitTorrent identifiers, a new block size must be chosen, which then alters the top-level identifier. 
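A rough sketch of that property, assuming SHA-1 over 1KB base blocks rather than the actual Tiger/THEX parameters: building the full tree once gives every verification resolution at the same time, and the root never changes.

    import hashlib

    def _h(prefix, data):
        return hashlib.sha1(prefix + data).digest()

    def build_rows(data, base=1024):
        # return the tree as rows: rows[0] == [root], rows[-1] == per-block leaf hashes
        row = [_h(b'\x00', data[i:i + base]) for i in range(0, len(data), base)] or [_h(b'\x00', b'')]
        rows = [row]
        while len(row) > 1:
            # pair up nodes; an odd node at the end is promoted unchanged
            row = [_h(b'\x01', row[i] + row[i + 1]) if i + 1 < len(row) else row[i]
                   for i in range(0, len(row), 2)]
            rows.insert(0, row)
        return rows

    rows = build_rows(b'x' * (1024 * 16))
    # rows[0][0] is the file's identifier; a casual client might fetch only
    # rows[2] (4 coarse hashes), a paranoid one fetches rows[-1] (16 leaf hashes),
    # and both verify against the very same root.

A flat two-level list, by contrast, fixes that choice at publish time.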
- Gordon @ Bitzi From magnus at bodin.org Wed Jul 9 03:58:08 2003 From: magnus at bodin.org (Magnus Bodin) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] RFC3548 is out Message-ID: <20030709032024.GC15571@bodin.org> Just for curiosity; RFC3548 is finally out and it has a nice little reference to this list: [8] Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", World Wide Web http://zgp.org/pipermail/p2p-hackers/2001- September/000315.html, September 2001. i.e. I have a copy of the RFC here for laziness: /magnus -- http://x42.com From hopper at omnifarious.org Wed Jul 9 11:39:02 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704054745.B13145352@exeter.ac.uk> References: <20030704054745.B13145352@exeter.ac.uk> Message-ID: <1057775910.13939.896.camel@monster.omnifarious.org> Skipped content of type multipart/related-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030709/ab0d3732/attachment.pgp From justin at chapweske.com Wed Jul 9 12:31:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <1057775910.13939.896.camel@monster.omnifarious.org> References: <20030704054745.B13145352@exeter.ac.uk> <1057775910.13939.896.camel@monster.omnifarious.org> Message-ID: <3F0C6D68.4060106@chapweske.com> +1 Eric M. Hopper wrote: > On Thu, 2003-07-03 at 23:47, Adam Back wrote: > >> /So one common p2p download approach is to download a file in parts >> in/ /parallel (and out of order) from multiple servers. (Particularly >> to/ /achieve reasonable download rates from multiple asynchronous >> links of/ /varying link speeds). A common idiom is also that there is >> a compact/ /authenticator for a file (such as it's hash) which people >> will supply/ /as a document-id./ > > > There is an important thing you can do with an order 2 authentication > tree that is much harder to do with a tree of an order larger than 2. > That's download the authentication data needed to verify each packet > against the root along with the packet itself. > > For example, if you have a 2-way THEX tree for 2^3*blocksize data, it > will look something like this: > > > > If someone transmits the data for node A, in order to verify node A > completely, the hashes for B, J, and N need to be transmitted. No other > hashes are needed since they are already known, as in the case for the > root node, or can be calculated. > > If someone then transmits the data for node B, no hashes need to be > transmitted since the reciever already has all the needed hashes. For > C, only D is needed. > > If you have an 8-way THEX tree, you end up with a diagram like this: > > > > If someone recieves node A, they will have to also get the hashes for > nodes B-H in order to verify node A. This is MUCH more information than > with a 2-way THEX tree, and as the depth of both trees grows, the 2-way > tree is favored more and more. > > I think one useful measure is how much data is needed to verify a given > block as compared to the size of a block. 
If you have a block size of > 64KB, and a hash data size of 32 bytes (I think SHA-1 is just a little > too weak, and prefer SHA2-256), then you can deal with a 16MB file and > still ensure that the maximum amount of data needed to verify any given > block is less than the size of a block. If you only use a 16 byte hash, > then you can send 4GB file and still keep that property. If you > maintain a 32 byte hash, but go to a 128KB block, you can transmit an > 8GB file and maintain that property. > > This is also highly resistant to jamming. If you get the verification > data from the same node that sent you the data block in the first place, > that node will be unable to spoof the verification data to make a bad > block look like a good one. If you get the verification data from a > different node than sent you the data block, that node will be unable to > spoof the hashes in order to make a good block look like a bad one. So > errors in either the verification hashes, or the block are easily and > quickly detectable. > > Lastly, it is easy to specify which subset of verification data you > need. Any data block you recieve may need > log2(number_of_blocks_in_file) hashes worth of verification data. You > can simply send a bitstring with one bit for each hash you might > potentially need saying whether or not you actually need it or not. > > Sorry this is in HTML. I just didn't want to have to use ASCII art for > the diagrams because it'd be a huge and annoying pain. > > Have fun (if at all possible), > > -- > There's an excellent C/C++/Python/Unix/Linux programmer with a wide > range of other experience and system admin skills who needs work. > Namely, me. _http://www.omnifarious.org/~hopper/resume.html_ > -- Eric Hopper > > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From darkelf at arabia.com Wed Jul 9 13:39:02 2003 From: darkelf at arabia.com (Oscar Cisneros) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] RFC3548 is out Message-ID: <388601c34659$9e6070e0$6ac9010a@mail2world.com> Can you elaborate, please? What is this RFC all about? -Oscar http://emote.net <-----Original Message-----> From: Magnus Bodin Sent: 9/7/2003 5:24:24 AM To: p2p-hackers@zgp.org Subject: Re: [p2p-hackers] RFC3548 is out Just for curiosity; RFC3548 is finally out and it has a nice little reference to this list: [8] Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", World Wide Web http://zgp.org/pipermail/p2p-hackers/2001- September/000315.html, September 2001. i.e. I have a copy of the RFC here for laziness: /magnus -- http://x42.com _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20030709/bde26f14/attachment.html From magnus at bodin.org Wed Jul 9 14:01:01 2003 From: magnus at bodin.org (Magnus Bodin) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] RFC3548 is out In-Reply-To: <388601c34659$9e6070e0$6ac9010a@mail2world.com> References: <388601c34659$9e6070e0$6ac9010a@mail2world.com> Message-ID: <20030709210018.GL7976@bodin.org> On Wed, Jul 09, 2003 at 01:35:21PM -0700, Oscar Cisneros wrote: > Can you elaborate, please? What is this RFC all about? It's finally an RFC that covers _only_ base64, base32 and base16 so other standards may refer to that one instead of some embedded stuff. It's nothing fancy at all. 
Just found it funny that it referenced this list for just a minor comment about URL-safe chars. /magnus -- http://x42.com From jlevine at bayarea.net Wed Jul 9 14:11:02 2003 From: jlevine at bayarea.net (jlevine@bayarea.net) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] P2punks meeting next Monday evening July 14 7:30pm Message-ID: You know the routine... Also I have a few new Glyphguy t-shirts for anyone whose wardrobe is getting thin. See you there. -------------- Where: Dana Street Roasting Company 744 Dana St., Mountain View Phone: (650) 390-9638 1/2 block off Castro St. When: 7:30pm onward Website: http://www.bitbin.org/p2punks From adam at cypherspace.org Thu Jul 10 01:00:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <1057775910.13939.896.camel@monster.omnifarious.org>; from hopper@omnifarious.org on Wed, Jul 09, 2003 at 01:38:30PM -0500 References: <20030704054745.B13145352@exeter.ac.uk> <1057775910.13939.896.camel@monster.omnifarious.org> Message-ID: <20030710085938.A13047545@exeter.ac.uk> On Wed, Jul 09, 2003 at 01:38:30PM -0500, Eric M. Hopper wrote: > There is an important thing you can do with an order 2 > authentication tree that is much harder to do with a tree of an > order larger than 2. That's download the authentication data needed > to verify each packet against the root along with the packet itself. Yes, Gordon made this same observation. At the end of that thread I think we reached the conclusion that it would be more flexible to not compute the 8-way THEX the way you show in the 8THEX diagram (actually I proposed 64-way, but that's less convenient to draw). Instead the 8-way would just be the 8 leaf nodes from diagram 2THEX. This allows you to mix requirements. If you want the possibility to do what you describe (download just the auth nodes necessary to authenticate a given chunk), you can do that. If you want the possibility to optimally efficiently download the tree itself in parallel and be able to verify chunks at some resolution immediately, you do what I described (download stripes through the tree at 6 generational gaps in sequence -- i.e. download first the 64 nodes at generation 6 in parallel; once you've done that you can download the 64^2 nodes at generation 12 etc; once you've got the full leaf node set you can then download the entire file in parallel with immediate verification of bad chunks). And if you don't care about that extra space efficiency of downloading the leaves only, but you do care about downloading the tree in parallel, you can do the variant of what I described where you download the first chunk's worth of the full tree. Then you download in parallel from that offset the next section of the full tree, again no further than 6 generations ahead of what you have fully populated. (Where the generation gap is given by log2(chunk-size/hash-size)). > I think one useful measure is how much data is needed to verify a > given block as compared to the size of a block. If you have a block > size of 64KB, and a hash data size of 32 bytes [...] then you can > deal with a 16MB file and still ensure that the maximum amount of > data needed to verify any given block is less than the size of a > block. [...] And that would be a simplifying argument? (I.e. if the auth data is at most the chunk size, then you can by definition download it serially). Or just an interesting statistic?
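For concreteness, the arithmetic behind that statistic works out as follows. This is a back-of-envelope sketch assuming a full binary tree with no rounding; note the 16MB figure in the quote counts the hash size in bits, and with byte sizes the numbers come out much larger.

    import math

    def largest_file(block_size, hash_size):
        # one sibling hash per tree level verifies a block, so the auth data is
        # about log2(num_blocks) * hash_size bytes; it fits inside a single block
        # as long as log2(num_blocks) <= block_size / hash_size
        levels = block_size // hash_size
        return (2 ** levels) * block_size          # bytes

    print(math.log2(1024 // 16))        # 6.0 -- the 6-generation gap for 1KB chunks, 16-byte hashes
    print(largest_file(1024, 32))       # about 4.4e12 bytes (~4TB) for 1KB blocks, 32-byte hashes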
> Lastly, it is easy to specify which subset of verification data you > need. Any data block you receive may need > log2(number_of_blocks_in_file) hashes worth of verification data. > You can simply send a bitstring with one bit for each hash you might > potentially need saying whether or not you actually need it or not. That might be an interesting little protocol for THEX input. Note however it only works where each node _has_ the whole file. For systems which do swarmcasting (bittorrent, edonkey?), they don't. However I suppose if one wanted this, they could retain the log2 hash path that presumably they got when they fetched that chunk. (You could coalesce chunk log2 hash paths as they are downloaded if desired to save local storage space, and still be able to recreate the paths). gnunet supports downloads of files in minimally UDP suitable chunk sizes (if I recall Chris Grothoff said 1KB chunks in his PET03 presentation or chatting afterwards). Adam From adam at cypherspace.org Thu Jul 10 01:07:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: bittorrent tree mechanism (Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach) In-Reply-To: ; from bram@gawth.com on Tue, Jul 08, 2003 at 05:17:50PM -0700 References: <20030704054745.B13145352@exeter.ac.uk> Message-ID: <20030710090630.B13047545@exeter.ac.uk> On Tue, Jul 08, 2003 at 05:17:50PM -0700, Bram Cohen wrote: > For what it's worth, BitTorrent does one dumber than Adam's approach and > always has exactly two levels in its 'tree' - the list of all hashes, then > everything else. > > It's easy to specify, easy to implement, seems to work fine, and unlike > THEX, is widely deployed. I'd guess that the limit of what you can draw from the empirical evidence is that it works fine from a functional perspective rather than necessarily an adversarial one. That is to say, as far as I know we are at this point ahead of the arms-race with the jammers, who are satisfying themselves with exploiting the weakness of the rating systems to discover pre-rated known-good hashes - and just publishing mislabelled files, empty files and files full of taunts. Do you swarm-cast the tree? Or is the tree downloaded from the bittorrent index server? Or is it downloaded from a random node? Do you check the consistency of the 2nd level with respect to the master hash prior to swarmcasting content? What would the bittorrent client do if the tree failed? Fail with an error message or repeat until success? Adam From adam at cypherspace.org Thu Jul 10 01:12:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: gnunet transport (Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach) In-Reply-To: <01be01c345f0$50244e10$660a000a@golden>; from gojomo@bitzi.com on Wed, Jul 09, 2003 at 01:01:32AM -0700 References: <01be01c345f0$50244e10$660a000a@golden> Message-ID: <20030710091117.C13047545@exeter.ac.uk> On Wed, Jul 09, 2003 at 01:01:32AM -0700, Gordon Mohr wrote: > With deep trees, a paranoid darknet which does all sharing via > origin-spoofed multiply-forwarded self-verifying 1KB unreliable- > transport packets btw that transport, chunk size, and forwarding for server and client anonymity objectives describe gnunet, I believe (or at least one of its transports). Not sure if there are any gnunet people on this list.
Adam From adam at cypherspace.org Thu Jul 10 01:34:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: p2p network assumptions for download auth problem (Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach) In-Reply-To: <00b601c3426a$9ff6e3f0$660a000a@golden>; from gojomo@bitzi.com on Fri, Jul 04, 2003 at 01:26:59PM -0700 References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> <00b601c3426a$9ff6e3f0$660a000a@golden> Message-ID: <20030710093327.A13197394@exeter.ac.uk> On Fri, Jul 04, 2003 at 01:26:59PM -0700, Gordon Mohr wrote: > But the best you can do, in a network where many peers are malicious, > is to identify them as soon as possible, and then ignore them, > preferring those who are non-malicious. Actually I think this assumption may not be necessarily true in the general case. This goes back to what assumptions about what the p2p network one makes. The current kazaa jamming is not by peers, but by publishers. The publishers try to make their content look otherwise attractive (by giving rave review self-proclaimed meta data) to encourage others to download, and then magnify the effect, as people tend to download from fast sources, which means many sources; and for it to reach many sources it has to compete with good content which people will tend to delete less quickly. Of course they may also be running a number of peers holding their jammed data to firstly publish, but secondly to inflate it's apparent popularity. My interest is in server and user privacy, so I tend to think in terms of the p2p network features that enable them. So to get back to the assumptions, if we take the full general case, which is I think has aspects of gnunet (UDP transport option, small packets, forwarded so you don't necessarily know which node is serving, and add swarmcasting to that mix, and server deniability wrt to what it is serving, then ignoring nodes that serve bad content might not even be optimal as firstly you don't know which node served it, and serial tree downloads from a single node may not be available due to fragmentation and server deniability. So I'd say in the general case I think you need the parallel but authenticatable per packet approach I gave. Freenet also does the chunking and redirection with cacheing for server privacy, user privacy and dynamic load balancing. In specific simpler cases where you don't care about tree efficienciy, server deniability isn't there, forwarding and privacy isn't there then practically you can do just not bother for simplicity (as perhaps bittorrent may be doing). Adam From hopper at omnifarious.org Thu Jul 10 08:59:02 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030710085938.A13047545@exeter.ac.uk> References: <20030704054745.B13145352@exeter.ac.uk> <1057775910.13939.896.camel@monster.omnifarious.org> <20030710085938.A13047545@exeter.ac.uk> Message-ID: <1057852681.13939.924.camel@monster.omnifarious.org> On Thu, 2003-07-10 at 02:59, Adam Back wrote: > And that would be a simplifying argument? (Ie if the auth data is the > auth chunk size, then you can by definition downlaod it serially). > > Or just an interesting statistic? Interesting statistic. 
Since a blocksize is chosen mainly because of network and protocol stack properties, it should be fixed for the lifetime of a protocol run. It's nice to see then what you can accomplish with a full block of data. > > Lastly, it is easy to specify which subset of verification data you > > need. Any data block you recieve may need > > log2(number_of_blocks_in_file) hashes worth of verification data. > > You can simply send a bitstring with one bit for each hash you might > > potentially need saying whether or not you actually need it or not. > > That might be an interesting little protocol for THEX input. > > Note however it only works where each node _has_ the whole file. For > systems which do swarmcasting (bittorrent, edonkey?) then they don't. > > However I suppose if one wanted this, they could retain the log2 hash > path that presumably they got when they fetched that chunk. (You > could coalesce chunk lo2 hash paths as they are downloaded if desired > to save local storage space, and still be able to recreate the paths). This is what I was imagining would be done. Every node should have the hash values to be able to verify all the blocks it has recieved, and therefor could validly send. > gnunet supports downloads of files in minimally UDP suitable chunk > sizes (if I recall Chris Grothoff said 1KB chunks in his PET03 > presentation or chatting afterwards). I also miscalculated badly as I used the size of the hash in bits rather than bytes. Using correct math... With 1KB blocks, you could have a 4TB file and still have all the verification data for a block fit into a block. This is even with a 256 bit hash. If you were willing to expand the verification blocks to 1280 bytes, you could have this property for a 1PB file. From what I understand, Hollywood digital masters are in the low terabyte range, so a 1280 byte verification block should be good until everybody's adopted gigabit ethernet and larger block sizes make a lot of sense. :-) Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030710/802b4e2e/attachment.pgp From b.fallenstein at gmx.de Thu Jul 10 12:56:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Comments on draft-thiemann-cbuid-urn-00 Message-ID: <3F0DC458.6090401@gmx.de> [cc:ed to p2p-hackers and urn-nid] Dear Peter, I just saw your recent I-D in the archives of the urn-nid mailing list. (I'm not subscribed, thus the delay in reaction.) http://lists.research.netsol.com/pipermail/urn-nid/2003-June/000348.html I'm working on a system (Storm, ) having very similar addressing needs; in fact, I've been in the process of preparing a registration for an informal URN namespace using MIME type + cryptographic hash to identify typed octet stream resources. We are building a p2p-based extension to the Web, based on this. I think there are also plans to register a formal namespace for 'bitprints'-- http://bitzi.com/developer/bitprint which are combinations of a SHA-1 hash and a Merkle hash tree root using Tiger hashes. This namespace would not include content types. 
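As a rough illustration of what such an identifier looks like: this is only a sketch, the exact bitprint format is the one defined at the Bitzi URL above, and Python's hashlib has no Tiger, so only the SHA-1 half is computed.

    import base64, hashlib

    def sha1_base32(data):
        # 20-byte SHA-1 digest -> 32-character base32 string (160 bits / 5 bits per char)
        return base64.b32encode(hashlib.sha1(data).digest()).decode('ascii')

    # a bitprint pairs this with a base32 Tiger tree root, roughly
    # "<sha1-base32>.<tigertree-base32>"; only the first half is shown here
    print("urn:sha1:" + sha1_base32(b"example file contents"))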
Bitprints are used by Bitzi and Onion Networks . Our project also uses bitprints for compatibility. (The only thing setting our namespace apart is that we include MIME types.) I hope that we can archieve some collaboration here. It would be good if we could reach agreement on a single namespace, so that URNs in this namespace can be resolved with either of our systems. Below are some high-level comments regarding the approach of your I-D. (I'll save low-level comments for a later stage of the discussion. :) ) - It seems obvious that a shared namespace would be a good thing. If someone made a link between two files in one of your repositories, for example, and someone else published the linked-to file using our system, then a browser understanding both systems could follow the link from your repository by downloading the file through our system. - Your proposal attempts to be general by allowing different hash functions to be used. I am wondering about that: It seems good practice to keep protocol parameters extensible, but it also means that there will be different ids for the same resource-- impractical when you try to look it up under one id, but it's stored under another! On the other hand, if you say "it's *this* hash" there will be people who'll want to use another, and they'd have to create their own namespace. Since this is not a standard-- it's an informational RFC-- there is no reason why they wouldn't. I think the way to go would be to provide a general namespace for hash-based URNs, but to specify one 'prefered' way to use it, noting that there are several systems implementing this way, already. But I'm not sure. - All your identifiers use only a single hash function. Especially for URNs, which are supposed to be long-lived, there is a good reason to use hashes generated by more than one function: If one of the functions is broken (but the other isn't simultaneously), the ids don't become useless; you can use timestamping to extend the lifetime of your ids indefinitely (as long as the two hash functions you happen to use at a given time are never broken simultaneously). [unforch the only reference I can find on this right now is the US patent on it :-(, # 5,373,561 .] I would suggest adding the ability to use more than one hash function in the same URN. The syntax could look for example like this: urn:cbuid:md5.sha1:. - Our project picked bitprint to be compatible-- because there were already at least two other independent projects using it, and it was explicitly promoted in the interest of compatibility. So I would suggest bitprints as the 'recommended' set of hash functions for this system. Using the above example syntax, they'd read like this: urn:cbuid:sha1.tigertree:. - The emerging 'industry standard' at least in the p2p community seems to encode hashes in base32; it provides shorter ids than base16 (hex), yet is also case-insensitive and uses only alphanumerics. Using the same hash, but different ASCII representations seems really icky: You *are* using the same technology, yet your ids don't resemble each other at all. I suggest that you use base32 in your namespace; see RFC 3548 if you need a definition. - Your mechanism for hashing parts of an email separately is quite application-specific; that's a pity, given that the namespace is quite generally useful otherwise. 
Different applications may very easily have different needs for breaking up an entity into parts; for example, I could easily imagine that somebody would like to hash each body part of a multipart message independently. It seems like solving your need through some mechanism outside this namespace registration would make the namespace simpler and more generally useful. What exactly do you need it for, anyway? - Two syntax considerations: Firstly, it would seem like a good idea to choose a syntax similar to that of RFC 2397 (The "data" URL scheme) since it also represents the MIME types of the resources it identifies in the URI. So you'd have something like, urn:cbuid:sha1:text/plain, In the flavor of URI that doesn't include content types, you could simply leave that part off. urn:cbuid:sha1: (Unambiguous because hashes cannot contain commas.) This would probably make the scheme more attractive to folks who don't want to include content types, since there would be no 'artefacts' (":*:") related to them in the content-type-less syntax. Secondly and finally, cbuid is hard to remember and easy to misspell, IMHO. If it's going to be a general namespace for cryptographic hash-based id, I'd propose simply calling it 'hash'-- urn:hash:sha1: :-) Hoping to spark some discussion, - Benja From medinajoe1 at msn.com Fri Jul 11 10:37:02 2003 From: medinajoe1 at msn.com (JOSEPH MEDINA) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] REMOVE ME Message-ID: An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20030711/0603ee25/attachment.html From zooko at zooko.com Fri Jul 11 11:32:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] list admin trivia (was: REMOVE ME) In-Reply-To: Message from "JOSEPH MEDINA" of "Fri, 11 Jul 2003 10:36:06 PDT." References: Message-ID: > REMOVE ME Your quiet, attentive listadmins will deal with this, as always. FYI, I also filter out about half-a-dozen spams a day (or, more precisely, I check all the stored spam and filter *in* the occasional good post from a non-subscriber) and I deal with the people who show up every couple of weeks looking for someone to go into business with them manufacturing phony phone cards for use in Europe. Let me know if you want me to forward that last kind to you. Just kidding. But let this be a lesson to you: don't give your mailing list a name that includes the string "hack". --Z From bram at gawth.com Sat Jul 12 07:24:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030710085938.A13047545@exeter.ac.uk> Message-ID: Adam Back wrote: > Note however it only works where each node _has_ the whole file. For > systems which do swarmcasting (bittorrent, edonkey?) then they don't. eDonkey does do swarming. It's actually quite sophisticated, supporting putting downloaders in queues and rarest first piece downloads. But it doesn't have any tit for tat leech resistance properties. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Sat Jul 12 07:31:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: bittorrent tree mechanism (Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. 
and alternate approach) In-Reply-To: <20030710090630.B13047545@exeter.ac.uk> Message-ID: Adam Back wrote: > That is to say as far as I know we are at this point ahead of the > arms-race with the jammers who are satisfying themselves with > exploiting the weakness of the rating systems to discover pre-rated > known-good hashes - and just publishing mislabelled files, empty files > an dfiles full of taunts. That's an issue of file discovery. The hashing information can be included with the basic metainfo for a file - it isn't all that much bigger than the hash tree root is, at least compared to the file as a whole. BitTorrent completely skips out on the whole discovery issue by making it be launched by clicking on a hyperlink. That does a good job of getting rid of all the fake file spam. > Do you swarm-cast the tree? Or is the tree downloaded from the > bittorrent index server? Or is it downloaded from a random node? It's downloaded from the original web site, frequently but not necessarily the same machine as the tracker. > Do you check the consistency of the 2nd level with respect to the > master hash prior to swarmcasting content? There is no master hash sent. > What would the bittorrent client do if the tree failed? Fail with > error message or repeat until success? Repeat until success. Piece failing happens once in a while even with nothing bad going on in the system. I don't know why. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From tpm101 at gmx.net Sun Jul 13 19:32:02 2003 From: tpm101 at gmx.net (Tim Muller) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: References: Message-ID: <200307140331.45299.tpm101@gmx.net> On Saturday 12 July 2003 15:23, Bram Cohen wrote: > eDonkey does do swarming. It's actually quite sophisticated, supporting > putting downloaders in queues and rarest first piece downloads. But it > doesn't have any tit for tat leech resistance properties. From http://www.edonkey2000.com and http://www.overnet.com (20 June 2003): ----- "eDonkey2000 now with the Horde!!! 6.20.03 - There is a new version on the download page. It includes a new download system called Horde. Horde makes downloading even faster. You will join the Horde for a file you are downloading. You will then find other users that are also in the Horde and partner with them. This means that you will each send parts of the file to each other until it is complete. This way you work closely with other people that are also downloading the file to complete it together. The Horde will work together to ensure that the file is downloaded as fast as possible to everyone. Horde is the leech killer. When you download in a Horde you are always seeking partners that give you the best speeds. Since everyone is doing this, those that have the highest upload speeds will also get the highest download speeds. If you don't upload then you wont find people to partner with you so your downloads will be sluggish. With Horde the more you give the more you will receive." ----- ;-) -Tim From jlevine at bayarea.net Sun Jul 13 22:17:02 2003 From: jlevine at bayarea.net (jlevine@bayarea.net) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] P2punks meeting Monday evening July 14 7:30pm Message-ID: Reminder... ---------- Forwarded message ---------- You know the routine... Also I have a few new Glyphguy t-shirts for anyone whose wardrobe is getting thin. See you there. 
-------------- Where: Dana Street Roasting Company 744 Dana St., Mountain View Phone: (650) 390-9638 1/2 block off Castro St. When: 7:30pm onward Website: http://www.bitbin.org/p2punks From tor.klingberg at gmx.net Mon Jul 14 10:26:02 2003 From: tor.klingberg at gmx.net (Tor Klingberg) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] RFC3548 is out References: <388601c34659$9e6070e0$6ac9010a@mail2world.com> <20030709210018.GL7976@bodin.org> Message-ID: <00f701c34a2c$ef6f96b0$722c43d5@Scaleo> From: "Magnus Bodin" > On Wed, Jul 09, 2003 at 01:35:21PM -0700, Oscar Cisneros wrote: > > Can you elaborate, please? What is this RFC all about? > > It's finally an RFC that covers _only_ base64, base32 and base16 so > other standards may refer to that one instead of some embedded stuff. I hope the base32 spec matches that used by Gnutella and Gordon More's Bitzi, found at http://groups.yahoo.com/group/the_gdf/files/Proposals/Working%20Proposals/HU GE/draft-gdf-huge-0_94.txt (section 2.3) I suppose it does. Just want to check. /Tor From Raphael_Manfredi at pobox.com Mon Jul 14 11:13:02 2003 From: Raphael_Manfredi at pobox.com (Raphael Manfredi) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: RFC3548 is out In-Reply-To: <00f701c34a2c$ef6f96b0$722c43d5@Scaleo> References: <388601c34659$9e6070e0$6ac9010a@mail2world.com> <20030709210018.GL7976@bodin.org> <00f701c34a2c$ef6f96b0$722c43d5@Scaleo> Message-ID: Quoting p2p-hackers@zgp.org from ml.p2p.hackers: :I hope the base32 spec matches that used by Gnutella and Gordon More's :Bitzi, found at :http://groups.yahoo.com/group/the_gdf/files/Proposals/Working%20Proposals/HU :GE/draft-gdf-huge-0_94.txt (section 2.3) It does match. Raphael From jlevine at bayarea.net Mon Jul 14 13:56:02 2003 From: jlevine at bayarea.net (jlevine@bayarea.net) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] P2punks meeting TONITE Monday evening July 14 7:30pm Message-ID: Last pester...see you all there! James -------------- Where: Dana Street Roasting Company 744 Dana St., Mountain View Phone: (650) 390-9638 1/2 block off Castro St. When: 7:30pm onward Website: http://www.bitbin.org/p2punks From thiemann at informatik.uni-freiburg.de Tue Jul 15 08:24:03 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: <3F0DC458.6090401@gmx.de> References: <3F0DC458.6090401@gmx.de> Message-ID: Dear Benja, BF> [cc:ed to p2p-hackers and urn-nid] BF> I just saw your recent I-D in the archives of the urn-nid mailing BF> list. (I'm not subscribed, thus the delay in reaction.) BF> http://lists.research.netsol.com/pipermail/urn-nid/2003-June/000348.html [ Nice to see that somebody is listening to this list, since nobody is really subscribed to it. I only received a message that my message awaits moderator approval. ] BF> I'm working on a system (Storm, BF> ) The project description link on that page is stale. BF> having very similar addressing needs; in fact, I've been in the BF> process of preparing a registration for an informal URN namespace BF> using MIME type + cryptographic hash to identify typed octet stream BF> resources. We are building a p2p-based extension to the Web, based on BF> this. This is not quite what we are after. We are interested in using p2p techniques for implementing a global sharable mail store. 
The main application is to use the ids as a replacement for IMAP's concept of message uids to allow for easy synchronization of client caches and replicas of the mail server. Hence the bias towards the message/rfc822 type in the draft. However, the intention was to design a scheme which is usable for other information storage and retrieval. Perhaps, your storm project can provide the storage layer that we need? BF> 'bitprints'-- ... Thanks for the pointer! I was not aware of those efforts. BF> I hope that we can archieve some collaboration here. It would be good BF> if we could reach agreement on a single namespace, so that URNs in BF> this namespace can be resolved with either of our systems. Same here. BF> Below are some high-level comments regarding the approach of your BF> I-D. (I'll save low-level comments for a later stage of the BF> discussion. :) ) BF> - It seems obvious that a shared namespace would be a good thing. If BF> someone made a link between two files in one of your repositories, BF> for example, and someone else published the linked-to file using our BF> system, then a browser understanding both systems could follow the BF> link from your repository by downloading the file through our BF> system. I agree completely. That's why we made some effort in keeping the scheme simple and extensible. This kind of support also makes the namespace application stronger. BF> - Your proposal attempts to be general by allowing different hash BF> functions to be used. I am wondering about that: It seems good BF> practice to keep protocol parameters extensible, but it also means BF> that there will be different ids for the same resource-- impractical BF> when you try to look it up under one id, but it's stored under BF> another! My general feeling is that this is not a huge problem as long as you stay inside one application framework. For example, in our application, you perform search queries against a database of meta information and then you get the results in terms of these URNs. Next, you try to retrieve (some of) these URNs basically from known hosts. So our application relies more on the uniqueness properties to achieve distribution and replication rather than on the naming function. BF> On the other hand, if you say "it's *this* hash" there will be BF> people who'll want to use another, and they'd have to create BF> their own namespace. Since this is not a standard-- it's an BF> informational RFC-- BF> there is no reason why they wouldn't. BF> I think the way to go would be to provide a general namespace for BF> hash-based URNs, but to specify one 'prefered' way to use it, noting BF> that there are several systems implementing this way, already. But I'm BF> not sure. Well, I'm pretty sure that the URN scheme should *not* fix one particular hash function. Instead, it should be extensible so that it does not become obsolete just because a hash function is broken or somebody discovers a new super-safe or super-efficient hash function. BF> - All your identifiers use only a single hash function. Especially for BF> URNs, which are supposed to be long-lived, there is a good reason to BF> use hashes generated by more than one function: If one of the BF> functions is broken (but the other isn't simultaneously), the ids BF> don't become useless; you can use timestamping to extend the BF> lifetime of your ids indefinitely (as long as the two hash functions BF> you happen to use at a given time are never broken BF> simultaneously). 
[unforch the only reference I can find on this BF> right now is the US patent on it :-(, # 5,373,561 BF> .] This is interesting but not directly relevant (I think) because we are not dealing with certificates here. Rather you want to increase the confidence (if a server supports more than one hash function) and robustness (if a server supports just one of a selection of hashes) of a data access. I really don't see why timestamping should be required because each hash value lives indefinitely long. BF> I would suggest adding the ability to use more than one hash function BF> in the same URN. BF> The syntax could look for example like this: BF> urn:cbuid:md5.sha1:. That sounds like a good proposal to me, it gives you increased confidence and robustness virtually for free. I'll put a concrete syntax proposal at the end of this message. BF> - Our project picked bitprint to be compatible-- because there were BF> already at least two other independent projects using it, and it was BF> explicitly promoted in the interest of compatibility. So I would BF> suggest bitprints as the 'recommended' set of hash functions for BF> this system. Using the above example syntax, they'd read like this: BF> urn:cbuid:sha1.tigertree:. I don't think this recommendation should be a part of a namespace application. This choice is really application dependent, so it is part of the application's description. I would not mind, though, to use it in an example (like: if you transform the id like this, then you get a valid bitprint id), I just would not want to make a normative statement. BF> - The emerging 'industry standard' at least in the p2p community seems BF> to encode hashes in base32; it provides shorter ids than base16 BF> (hex), yet is also case-insensitive and uses only BF> alphanumerics. Using the same hash, but different ASCII BF> representations seems really icky: You *are* using the same BF> technology, yet your ids don't resemble each other at all. I suggest BF> that you use base32 in your namespace; see RFC 3548 if you need a BF> definition. This is a hairy issue. I understand your reasoning. After studying the RFC I tend *not* to commit to any particular coding, but rather make the encoding a parameter with some reasonable default. Then the identifier equivalence section should state explicitly which representation is considered the normalized one. I'm not sure about the reasonable default. Can you give me a reason why p2p folks stick to base32? I don't really see an advantage over base64 for ids that are never handled by humans. BF> - Your mechanism for hashing parts of an email separately is quite BF> application-specific; that's a pity, given that the namespace is BF> quite generally useful otherwise. Different applications may very BF> easily have different needs for breaking up an entity into parts; BF> for example, I could easily imagine that somebody would like to hash BF> each body part of a multipart message independently. This is a misunderstanding of the intention, so this requires clarification in an update of the draft. The point is that many data formats contain header or other meta information (emails, images, mp3). The mode parameter signals that this meta information is ignored and only the raw contents are hashed. Since the hash only determines the raw contents, the specification needs to define how to complete the contents to a valid instance document of the specified type. In the case of an email message, this means to add the required fields from the RFC2822 specification. 
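A rough sketch of what a mode-1 id might look like for message/rfc822 (purely illustrative: it assumes Python's email package and SHA-1, the helper name is made up, and the draft itself defines no code):

    # Sketch: a "mode 1" id hashes only the raw message body, ignoring headers,
    # so two copies of the same mail delivered with different Message-ID: or
    # Received: headers still get the same identifier. Not taken from the draft.
    import email, hashlib

    def mode1_sha1(rfc822_bytes):
        msg = email.message_from_bytes(rfc822_bytes)
        body = msg.get_payload(decode=True) or b""   # raw contents only
        return hashlib.sha1(body).hexdigest()        # encoding choice is still open

    msg_a = b"Message-ID: <1@example.org>\r\nSubject: hi\r\n\r\nsame body\r\n"
    msg_b = b"Received: by relay\r\nMessage-ID: <2@example.org>\r\nSubject: hi\r\n\r\nsame body\r\n"
    print(mode1_sha1(msg_a) == mode1_sha1(msg_b))    # True: headers differ, contents agree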
For other formats, other things have to be done. And for formats that only consist of raw contents, it will not make sense to define a mode 1 id. BF> It seems like solving your need through some mechanism outside this BF> namespace registration would make the namespace simpler and more BF> generally useful. What exactly do you need it for, anyway? see the top of this mesage. BF> - Two syntax considerations: Firstly, it would seem like a good idea BF> to choose a syntax similar to that of RFC 2397 (The "data" URL BF> scheme) since it also represents the MIME types of the resources it BF> identifies in the URI. So you'd have something like, BF> urn:cbuid:sha1:text/plain, I don't think it is a good idea to put the mediatype at this point because it separates the sha1 and the fields which belong together, logically. BF> In the flavor of URI that doesn't include content types, you could BF> simply leave that part off. BF> urn:cbuid:sha1: BF> (Unambiguous because hashes cannot contain commas.) This would BF> probably make the scheme more attractive to folks who don't want to BF> include content types, since there would be no 'artefacts' (":*:") BF> related to them in the content-type-less syntax. I don't see the point of doing so. Fact is that the ids are to be processed by machines and only occasionally will they pop up "in the human eye". Reading and writing them should be straightforward to implement. Hence, my preference is to have a fixed number of fields separated by ":" and simply leave unspecified slots empty (admittedly, the "*" is artificial). That is urn:cbuid::sha1.tigertree::. [the empty slot after the sha1.tigertree is reserved for specifying the encoding of the following hashes, see the grammar below] BF> Secondly and finally, cbuid is hard to remember and easy to misspell, BF> IMHO. If it's going to be a general namespace for cryptographic BF> hash-based id, I'd propose simply calling it 'hash'-- BF> urn:hash:sha1: Well, I could be talked into that. The name 'cbuid' stems from the idea that those URNs are going to generalize IMAP's uids as mentioned above. However, given that these URNs are not for human consumption (or are they), I see no convincing technical argument in favor of any particular name. Do you have one? Proposed Grammar: cbuid-nss = type-spec ":" hash *1(":" type-specific-extension) type-spec = [media-type *parameter] parameter = ";" "mode" "=" 1*DIGIT / ";" token "=" token token = 1*(ALPHA / DIGIT) hash = hash-scheme ":" hash-enc ":" hash-values *("/" hash-values) hash-scheme = hash-item *("." hash-item) hash-item = "md5" / "sha1" / "hash127" / hash-token hash-token = token hash-enc = ["base16" / "base32" / "base64"] hash-values = hash-value *("." hash-value) hash-value = token Changes: - typespec can be empty - more than one hash function - hash-enc[oding] added Questions: * does there have to be a registry for names of hash functions? Clearly, the namespace definition cannot be updated for each new hash function. * which value of should be the default (and probably be the result of the normalization process)? I can guess your answer... Best wishes -Peter From gojomo at bitzi.com Tue Jul 15 08:52:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 References: <3F0DC458.6090401@gmx.de> Message-ID: <008401c34ae8$ef44fdd0$660a000a@golden> Peter Thiemann writes: > Can you give me a reason why p2p folks stick to base32? 
I don't really > see an advantage over base64 for ids that are never handled by humans. Content identifiers are often sent via email; they are also (very occasionally) put on paper. Content identifiers also appear inside other URIs (such as HTTP) or as filenames. Base64 characters aren't always legal in filenames, and may need to be escaped in other URIs. Content identifiers also occasionally get usefully catalogued by full-text indexers, but Base64 characters are usually considered word-boundaries or other punctuation by such tools. Thus Base32 identifiers can be sought as atomic words, while Base64 identifiers cannot. For example, try: http://www.google.com/search?q=EPH3MTGDELUYJU7UDWSRA6B3PAYVEILO Sometimes it may be handy to use a truncation of the full identifier -- say the first 4-6 characters -- as a nickname of the full identifier to distinguish (in a non-secure way) between variants of otherwise similar files. Such use only heightens the other problems with Base64: your short identifier could be "/a+A/4" instead of something like "AZB3A4". - Gojomo @ Bitzi ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From justin at chapweske.com Tue Jul 15 09:12:01 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: References: <3F0DC458.6090401@gmx.de> Message-ID: <3F1427B7.6010402@chapweske.com> > > Well, I'm pretty sure that the URN scheme should *not* fix one > particular hash function. Instead, it should be extensible so that it > does not become obsolete just because a hash function is broken or > somebody discovers a new super-safe or super-efficient hash function. > I actually disagree. In order to avoid chosen-algorithm attacks, the set of standardized hashes should be kept very small and directly under the control of the IETF and at the guidance of the CFRG. By having it as a top level URN scheme such as urn:sha1, implementors will tend to focus on the algorithm itself rather than the notion of generic hash-based naming. We must avoid having a generic hash name space that allows developers to add new hashes willy-nilly. I believe that many non crypto-savvy developers would tend to support every algorithm specified in the name space without understanding that by supporting multiple agorithms you introduce a weakest-link condition. Also, for reasons of interoperability, it would be useful if the number of different URNs be kept small. A large number of the P2P developers have agreed upon SHA1 for the time being, which could potentially enable very simple interoperability between these systems. I also think some mechanism should be introduced to deprecate hash schemes over time. So while urn:md5 would be a decent name space to add for compatibility with existing MD5-based applications, such a scheme should be immediately denoted as being deprecated to avoid having new applications adopt MD5. > > BF> I would suggest adding the ability to use more than one hash function > BF> in the same URN. > > BF> The syntax could look for example like this: > > BF> urn:cbuid:md5.sha1:. > > That sounds like a good proposal to me, it gives you increased > confidence and robustness virtually for free. I'll put a concrete > syntax proposal at the end of this message. > I don't believe this is a good idea from a security perspective. 
I fear that most implementors would only verify one of the hashes, and unless they make a judicious choice about the hash to verify, they are again opening themselves to a chosen-algorithm attack. Otherwise, implementors w/o the proper context to decide which hash is the strongest are forced to verify both hashes, which will cut their hashing performance roughly in half. Zooko, do you have any thoughts on this? These seem like the types of attacks that you've spent a lot of time thinking about. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From zooko at zooko.com Tue Jul 15 09:24:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: Message from Peter Thiemann of "15 Jul 2003 17:14:37 +0200." References: <3F0DC458.6090401@gmx.de> Message-ID: > Can you give me a reason why p2p folks stick to base32? I don't really > see an advantage over base64 for ids that are never handled by humans. This was discussed on the p2p-hackers mailing list: http://zgp.org/pipermail/p2p-hackers/2001-September/date.html Look for messages with the subject "please prefer base 64 over base 32". I named that thread, because I started it by arguing in favor of base 64. In the course of the discussion Gordon Mohr and others convinced me to change my mind, and I adopted base-32 afterward. Regards, Zooko http://zooko.com/ From zooko at zooko.com Tue Jul 15 09:46:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: Message from Justin Chapweske of "Tue, 15 Jul 2003 11:11:35 CDT." <3F1427B7.6010402@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> Message-ID: Justin Chapweske wrote: > > > BF> I would suggest adding the ability to use more than one hash function > > BF> in the same URN. > > > > BF> The syntax could look for example like this: > > > > BF> urn:cbuid:md5.sha1:. [...] > I don't believe this is a good idea from a security perspective. [...] [...] > Zooko, do you have any thoughts on this? These seem like the types of > attacks that you've spent a lot of time thinking about. I *have* thought about this issue. The short of it is: I'm not sure how to do it right, so I'm not going to do it. For example, what should the required/allowed behavior be when one of the hashes fails to match and the matches? Are implementors required to check all hashes that are included? I suspect that there might be a way to do secure, smooth, backwards- compatible upgrade past certain kinds of hash algorithm breakages. However, I haven't seen a complete description of how it would work. (The algorithm-negotiation features of SSL/TLS might be considered an example of what *not* to do, and in any case they cannot be carried over to this application since they are interactive.) So, lacking a clear understanding of how to do secure and useful algorithm upgrade, I have satisfied myself with simply hardcoding SHA-1. I accept that the identifiers thus generated might become unreliable in as little as ten years' time. When I *do* decide to change algorithms, I intend to do so unambiguously by either changing the namespace or relying on the length of the identifier. Mnet URI's currently look like this: mnet:38ppp56jbb8b64zrh8reoadzgn1zpdxc76enkmqduwtf4tug (They use SHA-1, exclusively and non-optionally.) 
If I were to change to SHA-256 or something, I would probably make them look like one of these: mnet:7nbcku4ijbk848kgzakqs316hdnbb6magsqag5hybrw1gmqf5b46xwenwz9um1qqhw8o (Uses the new, longer length to indicate that it uses the new algorithm.) or: mneu:7nbcku4ijbk848kgzakqs316hdnbb6magsqag5hybrw1gmqf5b46xwenwz9um1qqhw8o znet:7nbcku4ijbk848kgzakqs316hdnbb6magsqag5hybrw1gmqf5b46xwenwz9um1qqhw8o mnet2:7nbcku4ijbk848kgzakqs316hdnbb6magsqag5hybrw1gmqf5b46xwenwz9um1qqhw8o (Uses a different namespace.) I'm not at all concerned about using more namespace identifiers. There will be tens of thousands of different namespace identifiers used during the next ten years, nearly all of which will immediately die and become corpse namespaces. If Mnet still has any relevance at all in 2013, then it will be deserving of another namespace identifier. Regards, Zooko http://zooko.com/ From b.fallenstein at gmx.de Tue Jul 15 15:59:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs (was: Comments on draft-thiemann-cbuid-urn-00) In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> Message-ID: <3F1486C8.3000706@gmx.de> Hi all, This is a discussion about the registration of a URN namespace based on cryptographic hashes, i.e., a namespace of octet streams as identified by their hashes. Some security-related issues have (unsurprisingly) turned up, so I'm now cross-posting to three mailing lists-- - urn-nid (technical discussion of URN namespace registrations) - p2p-hackers (because one important user base would be the p2p community) - the IRTF crypto forum (to pick their minds about the security issues). I think it's appropriate in all three, but please stay strictly on topic :-) Justin Chapweske wrote: > [Peter Thiemann wrote: -bf] >> Well, I'm pretty sure that the URN scheme should *not* fix one >> particular hash function. Instead, it should be extensible so that it >> does not become obsolete just because a hash function is broken or >> somebody discovers a new super-safe or super-efficient hash function. [Later in the mail, Peter even suggested a registry for hash functions.] > I actually disagree. In order to avoid chosen-algorithm attacks, > the set of standardized hashes should be kept very small and directly > under the control of the IETF and at the guidance of the CFRG. I agree; if we have a urn:hash: namespace or similar, the list of allowable hash functions should be fixed in the namespace registration, and only be changeable by going through the RFC process. Regarding chosen-algorithm attacks, we have to keep in mind what our adversary model is. It should be noted that the hash function is picked by the person publishing a URN, not by the person a file is downloaded from. An attack using a hash collision would still be possible, but it would go more like this: - Find a hash collision, H = h(x) = h(y), x != y. - Use x to obtain some form of a "good rating" on H; this could include something like having an independent, widely trusted entity publish H as the hash of some good data, or even obtaining a signature on H. - Make y available for download with H as its id. If the URN is published by someone else than the adversary, the adversary doesn't get to choose the algorithm. > By having it as a top level URN scheme such as urn:sha1, implementors > will tend to focus on the algorithm itself rather than the notion > of generic hash-based naming. Maybe you're right. 
OTOH this doesn't generalize to using combinations of hash functions (more on this below). > We must avoid having a generic hash name space that allows developers > to add new hashes willy-nilly. I believe that many non crypto-savvy > developers would tend to support every algorithm specified in the > name space without understanding that by supporting multiple agorithms > you introduce a weakest-link condition. These are separate issues. Having developers add non-standard hash functions seems like a really bad idea. Supporting every function in the namespace registration may be a bad idea, but do you trust non-crypto-savvy developers to make a sensible *choice* of which algorithms to support (assuming that there *are* namespaces for different functions, including maybe md5 for backward compatibility). It should also be noted that the spec can try to educate those people that do read it. Of course there are always those that never read the spec, just use what the others use. > Also, for reasons of interoperability, it would be useful if the number > of different URNs be kept small. I agree, but what about developers who don't want to use e.g. SHA1 for technical reasons? (In particular hash size; possibly also backwards compatibility, especially if they have a running system they want to augment with URNs.) Well, maybe such developers aren't plentiful; anybody know any? (Not sure how on-topic this is for CFRG tho.) > A large number of the P2P developers have agreed upon SHA1 for the time > being, which could potentially enable very simple interoperability > between these systems. True, but there are also the two technical points against using SHA1 alone: - Many systems today use hash trees for multisource downloading. (Of course you can use a SHA1 hash tree, but that obviously doesn't give you the interoperability with plain SHA1 systems.) - Using just one function makes repair after the function is broken much more difficult (again, more below). > I also think some mechanism should be introduced to deprecate hash schemes > over time. So while urn:md5 would be a decent name space to add for > compatibility with existing MD5-based applications, such a scheme should be > immediately denoted as being deprecated to avoid having new applications > adopt MD5. Agreed. >> BF> I would suggest adding the ability to use more than one hash function >> BF> in the same URN. >> >> BF> The syntax could look for example like this: >> >> BF> urn:cbuid:md5.sha1:. >> >> That sounds like a good proposal to me, it gives you increased >> confidence and robustness virtually for free. I'll put a concrete >> syntax proposal at the end of this message. > > I don't believe this is a good idea from a security perspective. I fear that > most implementors would only verify one of the hashes, and unless they make a > judicious choice about the hash to verify, they are again opening themselves to > a chosen-algorithm attack. Otherwise, implementors w/o the proper context to > decide which hash is the strongest are forced to verify both hashes, which will > cut their hashing performance roughly in half. > > Zooko, do you have any thoughts on this? These seem like the types of attacks > that you've spent a lot of time thinking about. Zooko replied: > I *have* thought about this issue. The short of it is: I'm not sure how to do > it right, so I'm not going to do it. 
(I understand this as: "I'm not going to use more than a single hash function in the id.") > For example, what should the required/allowed behavior be when one of the > hashes fails to match and the matches? Are implementors required to check all > hashes that are included? My take on this is: Generally, yes. If you don't, you open yourself to a very similar attack to the one I described above regarding chosen-message attacks. Assume that two hash functions, g(.) and h(.) are in use. - Create a URN containing g(x), h(y) for x != y. - Using x, obtain an endorsement for the URN from someone who verifies only g(.). - Use the endorsement to "sell" y to someone who verifies only h(.). Yes, verifying only one of the two hashes takes longer. Well, verifying one hash takes longer than verifying zero hashes; let's specify a system without this kind of security holes. I think that using a URN which includes two hashes should be read as a statement by the URN's creator that they think verifying two hashes is reasonable. If you don't agree with them, don't resolve their URN in the first place... (I don't see why Justin thinks that verifying only one hash would introduce chosen-algorithm attacks, though. Assume that a verifier is capable of verifying both g(.) and h(.), but verifies only g(.), which happens to be insecure. I think it is reasonable to think that the verifier would also accept a URN including *only* a g(.) hash; thus, the double hash isn't necessary for the attack. And if the developer of the verifier knew that g(.) was insecure, surely they wouldn't make their program verify only g(.) in the first place. Maybe someone can enlighten me on chosen-algorithm attacks made possible by using two functions.) The one exception I would make to having to verify both hashes is when the verifier doesn't have an implementation of both hash functions. In these cases, I would say that an implementation MAY reject the URN, but if it does not, it SHOULD provide an indication to the user that the verifier cannot guarantee that everybody else will see the same file behind this URN as given. The verifier could also provide an alternative URN where this *can* be guaranteed (i.e., one containing only one hash). > I suspect that there might be a way to do secure, smooth, backwards- > compatible upgrade past certain kinds of hash algorithm breakages. However, > I haven't seen a complete description of how it would work. Ok. I assume that you mean, a way to continue to use identifiers even after the hash functions used in these identifiers are all broken. For the following, you have to have a secure timestamping service. Assume that your identifiers include two hashes, using g(.) and h(.). Assume that g(.) is broken, but h(.) still works. Assume that you have decided upon a new hash function, i(.), which you will in the future use together with h(.). For every file x where you have the contents, not just the hash, you can: - Compute h(x) and i(x). - Timestamp the statement: "H = h(x) and I = i(x) are hashes of the same file." Now suppose that h(.) is broken, but i(.) still works. Assume that you have an old-style identifier containing g(x) and h(x). Assume that you have downloaded y, an alleged copy of x; you want to verify it against the identifier containing G = g(x) and H2 = h(x). You have the timestamp certificate of the above statement. You now verify that g(y) = G; h(y) = H = H2; and i(y) = I. If this holds, you can be sure that y = x. Proof: Assume that y != x. We know that h(y) = h(x). 
That's not something big; at the time of verification, we know ways of computing collisions for h(.). However, the timestamped statement contains I = i(y). At the time of verification, it is believed that nobody can obtain hash collisions on i(.). Therefore, at the time of verification, the adversary must have used y to compute i(y), or, alternatively, created a timestamp for every possible value of i(.). Since timestamping involves hashing, creating a timestamp for every possible value of i(.) requires at least as much computing power as mounting a birthday attack on the function used, which is held to be impossible. Thus, someone has intentionally associated h(x) with i(y), and thus y. To know that it makes sense to associate h(x) with y, the person must have known, at the time, that h(x) = h(y). Therefore, if y != x, then someone must have known that h(x) = h(y), **at a time where no way to compute a hash collision on h(.) was known, yet**. This is held to be impossible; thus, y = x, q.e.d. Does this satisfy you? Note that you cannot use the above in a system using only one hash function-- because once your hash function breaks, your timestamps, using that hash function, break as well! At every time, you need at least one hash function that remains unbroken. I should also note that the above idea is patented in the US and probably elsewhere. However, patents expire (this one in the 2010s, I think), and this is long-term thinking. > (The algorithm-negotiation features of SSL/TLS might be considered an example > of what *not* to do, and in any case they cannot be carried over to this > application since they are interactive.) Yup and yup. (Some protocols may use hash URNs as return values and may include algorithm negotiation, but that would be a security issue with those protocols, not with the hash URN spec.) We can of course use a different URN namespace identifier for each hash function, i.e., urn:sha1:... A point in favor of this would be that people use it already. I'm not sure how this interacts with using different hashes in a single URN, though. I think there should be a method to do so, in order to enable the lifetime extension as described above, and in order to combine 'industry standard' SHA-1 hashes with a tree hash that enables multisource downloading. I think that the specification should probably say that an implementation must check all hashes, except at explicit user discretion. I think that if an an application does check all hashes, there are no security considerations with using more than one hash. OTOH there are security benefits-- if only that birthday attacks become harder to mount because of the additional bits. The question then is, if we allow e.g. {sha1,sha256,tiger} each by itself, do we also allow every possible combination of the above? Or do we limit ourselves to a specified set of combinations, e.g. sha1+tiger but not sha1+sha256? I believe that there is no security gain from allowing the former but disallowing the latter. OTOH I could easily imagine that if sha1+tiger is popular, some people would like to use sha1+tiger+sha256 for added security. There is the issue of having less URNs per resource, of course, which is outside the security domain I think. I'm not sure how much a concern this is... opinions? If we allow all possible subsets, using URN namespace names becomes problematic; we'd have to register urn:sha1:... urn:sha256:... urn:tiger:... urn:sha1+sha256:... urn:sha1+tiger:... urn:sha256+tiger:... urn:sha1+sha256+tiger:... (urks). 
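Whichever namespace layout wins out, the check-all-hashes rule above might look roughly like this; the urn:sha1+sha256 syntax and the helper below are hypothetical, with SHA-256 merely standing in for whatever second digest gets chosen:

    # Illustrative only: verify *every* hash in a combined URN and reject the data
    # if any of them fails; checking just one re-opens the substitution attack
    # described above. The urn:sha1+sha256:<h1>.<h2> syntax is made up.
    import base64, hashlib

    def b32(digest):
        return base64.b32encode(digest).decode().rstrip("=")

    def verify_combined(urn, data):
        _, nid, nss = urn.split(":", 2)
        algs = nid.split("+")               # e.g. ["sha1", "sha256"]
        hashes = nss.split(".")
        if len(algs) != len(hashes):
            return False
        return all(b32(hashlib.new(a, data).digest()) == h
                   for a, h in zip(algs, hashes))

    urn = ("urn:sha1+sha256:" + b32(hashlib.sha1(b"example").digest())
           + "." + b32(hashlib.sha256(b"example").digest()))
    print(verify_combined(urn, b"example"))    # True
    print(verify_combined(urn, b"tampered"))   # False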
Or, we'd have to do something like, use urn:sha1:... for just one hash but urn:hashes:sha1+sha256:... for more than one hash. Then, we'd only have to register urn:sha1:... urn:sha256:... urn:tiger:... urn:hashes:... Opinions? Thanks, - Benja From justin at chapweske.com Tue Jul 15 17:14:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs In-Reply-To: <3F1486C8.3000706@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> Message-ID: <3F14989C.2020501@chapweske.com> > > A large number of the P2P developers have agreed upon SHA1 for the time > > being, which could potentially enable very simple interoperability > >> between these systems. > > > True, but there are also the two technical points against using SHA1 alone: > > - Many systems today use hash trees for multisource downloading. (Of > course you can use a SHA1 hash tree, but that obviously doesn't give you > the interoperability with plain SHA1 systems.) > - Using just one function makes repair after the function is broken much > more difficult (again, more below). > As they are defined today (http://open-content.net/specs/draft-jchapweske-thex-02.html), the tree hashes are parameterizable to allow different file segment (leaf) sizes to be specified. AFAIK other traditional digest algorithms are set in stone, allowing no parameterization(?). If appropriate, I would be interested in certain hash tree forms being validated and standardized as normal digest functions. This would allow hash tree forms to be incorporated into existing standards that rely on a normal digest function. In regards to using multiple functions, most current systems that output multiple functions simply output them as multiple separate pieces of metadata. For instance, a system could output something like the following: X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF X-Content-URN: urn:tree:tiger:3FOBMWPE2JED5DUN2VA6J7DGSNJNJILE4HRF6SQ I believe that in many systems the hashes will simply be treated as pieces of meta-data that are used to verify the integrity of a file. Just because they contain 'urn' in them doesn't mean that they have to be used as identifiers. So, if you look at it from the meta-data perspective, I think it's natural to use multiple independent hashes rather than glomming them all together. > (I don't see why Justin thinks that verifying only one hash would > introduce chosen-algorithm attacks, though. Assume that a verifier is > capable of verifying both g(.) and h(.), but verifies only g(.), which > happens to be insecure. I think it is reasonable to think that the > verifier would also accept a URN including *only* a g(.) hash; thus, the > double hash isn't necessary for the attack. And if the developer of the > verifier knew that g(.) was insecure, surely they wouldn't make their > program verify only g(.) in the first place. Maybe someone can enlighten > me on chosen-algorithm attacks made possible by using two functions.) My point is subtle and perhaps irrelevant, but let me try to clarify: When I see something like: urn:hash:sha1.md5: This implies to me that I am obligated (not sure to whom) to verify both hashes. I believe these semantics are reasonable; however, you will find that very few developers will be willing to follow these semantics and verify both hashes.
However, when I see something like: X-Content-URN: urn:md5:42J46YB3Y3OLLFYL52B4LNDE34 X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF X-Content-URN: urn:tree:tiger:3FOBMWPE2JED5DUN2VA6J7DGSNJNJILE4HRF6SQ X-Content-URN: urn:crc32: This implies to me that I should apply some sort of ranking between the algorithms and not use any that are below my standards. If I am not confident about any single hash, I am free to verify multiple of them. Obviously from a technical perspective, both approaches are the same, it just seems to me that the first approach invites developers to defy the semantics, while the second approach is likely to be a healthier approach. I am not a developer psychologist, so I'm very open to other viewpoints on this. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From b.fallenstein at gmx.de Tue Jul 15 18:04:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs In-Reply-To: <3F14989C.2020501@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> <3F14989C.2020501@chapweske.com> Message-ID: <3F14A421.5010502@gmx.de> Hi, Justin Chapweske wrote: > If appropriate, I would be interested in certain hash tree forms being > validated and standardized as normal digest functions. This would allow > hash tree forms to be incorporated into existing standards that rely on > a normal digest function. Seems appropriate to me. > In regards to using multiple functions, most current systems that output > multiple functions simply output them as multiple seperate pieces of > metadata. For instance, a system could output something like the > following: > > X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF > X-Content-URN: urn:tree:tiger:3FOBMWPE2JED5DUN2VA6J7DGSNJNJILE4HRF6SQ > > I believe that in many systems the hashes will simply be treated as > pieces of meta-data that are used to verify the integrity of a file. > Just because they contain 'urn' in them doesn't mean that they have to > be used as identifiers. So, if you look at it from the meta-data > perspective, I think its natural to use multiple independant hashes > rather them glomming them all together. Well, I use them as context-less names that resolve to something; and you simply cannot give two different URNs in one . If you use them as metadata about a different resource, then I agree you don't need to put different hashes into a single URN. (OTOH, if you go the other way around, and want to give RDF metadata about a resource identified by a hash-URN-- e.g., ratings or locations--, there are good reasons for using a URN with more than one hash again, so that the resource name you store information about doesn't need to become invalid if one hash function is broken.) >> (I don't see why Justin thinks that verifying only one hash would >> introduce chosen-algorithm attacks, though. Assume that a verifier is >> capable of verifying both g(.) and h(.), but verifies only g(.), which >> happens to be insecure. I think it is reasonable to think that the >> verifier would also accept a URN including *only* a g(.) hash; thus, >> the double hash isn't necessary for the attack. And if the developer >> of the verifier knew that g(.) was insecure, surely they wouldn't make >> their program verify only g(.) in the first place. Maybe someone can >> enlighten me on chosen-algorithm attacks made possible by using two >> functions.) 
> > My point is subtle and perhaps irrelevant, but let me try to clarify: > > When I see something like: > > urn:hash:sha1.md5: > > This implies to me that I am obligated (not sure to whom) to verify both > hashes. I believe these semantics are reasonable, however you will find > very few developers will be willing to follow these semantics and verify > both hashes. I dunno... if the spec explains how this is a security leak, and the developers still do it, I think there's little I can do... > However, when I see something like: > > X-Content-URN: urn:md5:42J46YB3Y3OLLFYL52B4LNDE34 > X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF > X-Content-URN: urn:tree:tiger:3FOBMWPE2JED5DUN2VA6J7DGSNJNJILE4HRF6SQ > X-Content-URN: urn:crc32: > > This implies to me that I should apply some sort of ranking between the > algorithms and not use any that are below my standards. If I am not > confident about any single hash, I am free to verify multiple of them. I would actually agree with your assessments of the meaning of both statements -- the first implies you have to verify both, the second implies you have to verify as many as you deem necessary. So your point is: Because many developers will-- maybe-- use the second kind of semantics always, having the form aiming for the first kind of semantics is futile. I need the first kind of semantics, though, if I want to use hash URNs as reliable, trusted, *context-less* identifiers. (I.e., without being able to give more than one URN in an .) I think having a way to convey these semantics would be good, even if some people will implement it in an insecure way. Seems to me like this is generally useful where people use URNs for identifiers. I mean, when I post a URN in a forum, giving two URNs seems less practical and posting a number of URNs and saying "Download as many of these as you need to meet your security requirements, and verify that they are all equal" seems weird. :-) You should click on it and your software should handle the rest. My current thinking is that if the spec explains in which way verifying only one hash is a security leak, and if developers still do it, then it's an issue between the developers and their users-- if the users and developers are willing to accept the security leak of having somebody recommend a URN they haven't fully checked, then that's their problem, I'd think. - Benja From justin at chapweske.com Tue Jul 15 18:13:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs In-Reply-To: <3F14A421.5010502@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> <3F14989C.2020501@chapweske.com> <3F14A421.5010502@gmx.de> Message-ID: <3F14A687.9000008@chapweske.com> I won't both the CFRG with this post. > > I need the first kind of semantics, though, if I want to use hash URNs > as reliable, trusted, *context-less* identifiers. (I.e., without being > able to give more than one URN in an .) I think having a way to > convey these semantics would be good, even if some people will implement > it in an insecure way. > I see how your requirements are different. Perhaps the solution is to define a generic URI scheme that allows composition of multiple URIs to identify the content. It could be done in a fashion similar to the DURI draft (http://www.watersprings.org/pub/id/draft-masinter-dated-uri-03.txt) and could be made independant of hashing if you wished. 
So, it could be similar in spirit to the urn:hashes that you mentioned in a previous post. I'm actually rather surprised that I've never seen a URI format defined that simply implies equivalence between a set of sub-URIs. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From gojomo at bitzi.com Tue Jul 15 20:55:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> <3F14989C.2020501@chapweske.com> <3F14A421.5010502@gmx.de> <3F14A687.9000008@chapweske.com> Message-ID: <012c01c34b4d$ffb96150$660a000a@golden> Justin writes: > > I need the first kind of semantics, though, if I want to use hash URNs > > as reliable, trusted, *context-less* identifiers. (I.e., without being > > able to give more than one URN in an .) I think having a way to > > convey these semantics would be good, even if some people will implement > > it in an insecure way. > > > > I see how your requirements are different. Perhaps the solution is to > define a generic URI scheme that allows composition of multiple URIs to > identify the content. It could be done in a fashion similar to the DURI > draft > (http://www.watersprings.org/pub/id/draft-masinter-dated-uri-03.txt) and > could be made independent of hashing if you wished. I was going to suggest something similar. > So, it could be similar in spirit to the urn:hashes that you mentioned > in a previous post. > > I'm actually rather surprised that I've never seen a URI format defined > that simply implies equivalence between a set of sub-URIs. There's a facility in the "magnet:" URI format, as practiced, for other URIs to be referenced as "exact substitutes" or "acceptable substitutes" of the main "topic" of the "magnet:" link. (The topic itself is also a URI, so "magnet" links are really just activator/envelopes for one or more other related URIs.) See: http://groups.yahoo.com/group/magnet-uri/message/9 So while declaring such equivalence is not the purpose of "magnet:" URIs, there was a desire for such a facility, and so a practice has developed. (I'd be the first to admit that practice is far from elegant, but perhaps that's unavoidable in this domain.) - Gordon @ Bitzi ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From b.fallenstein at gmx.de Wed Jul 16 06:50:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs In-Reply-To: <012c01c34b4d$ffb96150$660a000a@golden> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> <3F14989C.2020501@chapweske.com> <3F14A421.5010502@gmx.de> <3F14A687.9000008@chapweske.com> <012c01c34b4d$ffb96150$660a000a@golden> Message-ID: <3F15579E.80308@gmx.de> Hi Gordon, hi Justin-- Gordon Mohr wrote: > Justin writes: >>I see how your requirements are different. Perhaps the solution is to >>define a generic URI scheme that allows composition of multiple URIs to >>identify the content. It could be done in a fashion similar to the DURI >>draft >>(http://www.watersprings.org/pub/id/draft-masinter-dated-uri-03.txt) and >>could be made independent of hashing if you wished. > > I was going to suggest something similar.
At first I thought this a good idea, but after sleeping over and mulling about it, my feeling is that this may be overgeneralization. I think now that we should probably only standardize those hash functions and combinations thereof that someone actually uses, in the interest of interoperability, wide review, and having a smaller number of different names for a resource. If someone really needs to use something different, I think it would be acceptable to have them go through the registration and public review process. So, I would suggest that we specify-- urn:sha1: urn:tigertree: urn:sha1-tigertree:. Is there anybody whose requirements wouldn't be met by such an approach? If not, I think a more general approach would be overgeneralization (and would probably introduce unnecessary syntactic ugliness). Opinions? - Benja From b.fallenstein at gmx.de Wed Jul 16 07:22:01 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: References: <3F0DC458.6090401@gmx.de> Message-ID: <3F155F28.6070005@gmx.de> Peter Thiemann wrote: > BF> I just saw your recent I-D in the archives of the urn-nid mailing > BF> list. (I'm not subscribed, thus the delay in reaction.) > [ > Nice to see that somebody is listening to this list, since nobody is > really subscribed to it. I only received a message that my message > awaits moderator approval. > ] I'm subscribed now :-) > BF> I'm working on a system (Storm, > BF> ) > > The project description link on that page is stale. Sorry, I've neglected putting together a homepage for far too long now. I'll put something there this week. Till then, there's of course the readme; the current version is at: http://savannah.nongnu.org/cgi-bin/viewcvs/storm/storm/README?rev=1.12&content-type=text/vnd.viewcvs-markup Basically, it's a storage system which can perform similar functions as both the Web and the local file system (unifying their namespaces), based on cryptographic technology. Storing and downloading data identified through a hash works, but there are some thorny research issues with updateable resources, which is why we haven't released anything yet (since we cannot guarantee upwards compatibility for updateable resources, and thus cannot recomment using Storm yet :( ) > BF> having very similar addressing needs; in fact, I've been in the > BF> process of preparing a registration for an informal URN namespace > BF> using MIME type + cryptographic hash to identify typed octet stream > BF> resources. We are building a p2p-based extension to the Web, based on > BF> this. > > This is not quite what we are after. We are interested in using p2p > techniques for implementing a global sharable mail store. The main > application is to use the ids as a replacement for IMAP's concept of > message uids to allow for easy synchronization of client caches and > replicas of the mail server. Hence the bias towards the message/rfc822 > type in the draft. However, the intention was to design a scheme which > is usable for other information storage and retrieval. Perhaps, your > storm project can provide the storage layer that we need? Since you won't need updateable resources, maybe so, if you can live with a system written in Java (that can be interacted with through simple HTTP, though). I'd be really happy to see our system used in this way, I've wanted to store e-mail in it for a long time. I'll want to use it :-) > BF> - All your identifiers use only a single hash function. 
Especially for > BF> URNs, which are supposed to be long-lived, there is a good reason to > BF> use hashes generated by more than one function: If one of the > BF> functions is broken (but the other isn't simultaneously), the ids > BF> don't become useless; you can use timestamping to extend the > BF> lifetime of your ids indefinitely (as long as the two hash functions > BF> you happen to use at a given time are never broken > BF> simultaneously). [unforch the only reference I can find on this > BF> right now is the US patent on it :-(, # 5,373,561 > BF> .] > > This is interesting but not directly relevant (I think) because we are not > dealing with certificates here. Rather you want to increase the > confidence (if a server supports more than one hash function) and > robustness (if a server supports just one of a selection of hashes) of > a data access. I really don't see why timestamping should be required > because each hash value lives indefinitely long. It lives only as long as the hash function isn't broken-- but with timestamping, you can *continue* to use the identifiers *securely*, *after* all hash functions in the identifier have already been broken! > This is a hairy issue. I understand your reasoning. After studying the > RFC I tend *not* to commit to any particular coding, but rather make > the encoding a parameter with some reasonable default. Then the > identifier equivalence section should state explicitly which > representation is considered the normalized one. Why do you think a choice of encoding is needed? > BF> - Your mechanism for hashing parts of an email separately is quite > BF> application-specific; that's a pity, given that the namespace is > BF> quite generally useful otherwise. Different applications may very > BF> easily have different needs for breaking up an entity into parts; > BF> for example, I could easily imagine that somebody would like to hash > BF> each body part of a multipart message independently. > > This is a misunderstanding of the intention, so this requires > clarification in an update of the draft. The point is that many data > formats contain header or other meta information (emails, images, > mp3). The mode parameter signals that this meta information is > ignored and only the raw contents are hashed. Hmm. I can see how this could, in principle, be useful for many applications, but I'm still wondering how many people would actually implement it. Anybody on p2p-hackers who would like to use this in their system? > Since the hash only > determines the raw contents, the specification needs to define how to > complete the contents to a valid instance document of the specified > type. In the case of an email message, this means to add the required > fields from the RFC2822 specification. Sorry, I'm not following here. Could you give an example? > For other formats, other things > have to be done. And for formats that only consist of raw contents, it > will not make sense to define a mode 1 id. > > BF> It seems like solving your need through some mechanism outside this > BF> namespace registration would make the namespace simpler and more > BF> generally useful. What exactly do you need it for, anyway? > > see the top of this mesage. Ok, but why is it important that you hash the body separately from the header? I'm simply not sure why this is important for you. > BF> In the flavor of URI that doesn't include content types, you could > BF> simply leave that part off. 
> > BF> urn:cbuid:sha1: > > BF> (Unambiguous because hashes cannot contain commas.) This would > BF> probably make the scheme more attractive to folks who don't want to > BF> include content types, since there would be no 'artefacts' (":*:") > BF> related to them in the content-type-less syntax. > > I don't see the point of doing so. Fact is that the ids are to be > processed by machines and only occasionally will they pop up "in the > human eye". Reading and writing them should be straightforward > to implement. Hence, my preference is to have a fixed number of fields > separated by ":" and simply leave unspecified slots empty (admittedly, > the "*" is artificial). That is > > urn:cbuid::sha1.tigertree::. > > [the empty slot after the sha1.tigertree is reserved for specifying the > encoding of the following hashes, see the grammar below] My concern was mostly about the developers who don't feel they need content types and may see this as syntactical baggage. If we move in the direction of using namespace ids for hash functions, i.e. urn:sha1: there's also the problem of backwards compatibility; many people use them without a content type, today. I think it would be good if we could allow for a content type in the syntax, but would have the content-type-less syntax as above. Cheers, - Benja From b.fallenstein at gmx.de Wed Jul 16 08:38:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> Message-ID: <3F1570D6.8080601@gmx.de> Hi Carl, Carl Ellison wrote: > I apparently don't know the rules for URN formation. I thought it > was completely free after the "urn:". Is that not true? No. After the "urn:" comes a "namespace identifier," a colon, and a "namespace-specific string," i.e. urn:: Assignment of namespace ids is a manged process, i.e., you cannot just create your own, you have to register it with the IETF. Normally this requires an RFC. The namespace registration explains how the namespace-specific string is interpreted. For more info, see RFCs 2141 and 3406. The official registry of namespace ids is at: http://www.iana.org/assignments/urn-namespaces > Is urn:sha1:dRDPBgZzTFq7Jl2Q2N/YNghcfj8= not now legal? No, since there is no registered 'sha1' namespace. > At 12:57 AM 7/16/2003 +0200, Benja Fallenstein wrote: >> urn:sha1:... >> urn:sha256:... >> urn:tiger:... >> urn:sha1+sha256:... >> urn:sha1+tiger:... >> urn:sha256+tiger:... >> urn:sha1+sha256+tiger:... [...] >>Opinions? > > When you combine hash functions, you need to specify the combining > function also. I assume you were assuming mere concatenation, here. > It's often more than that. So, the concatenation function would have > to be listed also. Hm, can you explain what else would be needed? Do we really need other combinations than concatenation in URNs-- what would be the applications? (I'm not familiar with other ways in which you would want to combine hash functions.) > My personal preference is for anyone who wants to do this to declare > a new hash function name and define it as the particular combination > function of the particular other hash functions, but use that new > hash function name in things like the URN construct. 
We've seen a > couple of these, but not very many and I don't see a reason to > encourage people to do this kind of concatenation. Hm. Maybe you're right. My one point against this would be that if you use urn:sha1.tigertree:... it's easier for a developer who supports only SHA-1 to realize that they can provide at least partial verification than it is if you use urn:bitprint:... and define 'bitprint' as, 'SHA-1 concatenated with a Tiger tree-hash.' But I think I'd be fine with the latter, if it is consensus. - Benja From justin at chapweske.com Wed Jul 16 10:13:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> Message-ID: <3F158787.6020902@chapweske.com> > I'm skipping most of this conversation, to comment on a single point > at the end.. > > Is urn:sha1:dRDPBgZzTFq7Jl2Q2N/YNghcfj8= not now legal? > The sha1 urn scheme is not yet registered, but its de facto usage utilizes a Base32 encoding, not Base64. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From justin at chapweske.com Wed Jul 16 10:26:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: <3F155F28.6070005@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> Message-ID: <3F158A90.2060503@chapweske.com> Perhaps I'm misinterpreting cbuid, but it appears to suggest that the hash URIs include the content-type of the content in the URI. I think this notion is a good idea from a security perspective. It is important to retrieve the content-type from a trusted source. While probably not practical, one could envision a content-type attack where something is perfectly harmless when interpreted as a JPEG, but turns into a virus when interpreted as an executable. The problem is, I don't think this belongs in a hash-based URI, because the content-type is not self-verifiable, while the hash itself is. If you want to associate trusted meta-data with a piece of content, I would suggest adding a layer of indirection. Use a hash-based URN such as "urn:sha1" to identify the meta-data and not the file itself. I think the old Mojonation system used to do this if I'm not mistaken. An alternative to hashing the meta-data is to use a facility like the HTML <link> tag or HTTP Link header to describe extra meta-data about a URI, such as the content type. This approach is used in RSS-autodiscovery as follows: > > My concern was mostly about the developers who don't feel they need > content types and may see this as syntactical baggage. If we move in the > direction of using namespace ids for hash functions, i.e. > > urn:sha1: > > there's also the problem of backwards compatibility; many people use > them without a content type, today. I think it would be good if we could > allow for a content type in the syntax, but would have the > content-type-less syntax as above. > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From zooko at zooko.com Wed Jul 16 10:28:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: Message from Justin Chapweske of "Wed, 16 Jul 2003 12:12:39 CDT."
<3F158787.6020902@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> <3F158787.6020902@chapweske.com> Message-ID: Coincidentally, there is an active (and contentious) discussion of cryptography-based naming at the "cryptography" mailing list. Three different concrete proposals, including one already deployed and one newly announced, have been mentioned. majordomo@wasabisystems.com From b.fallenstein at gmx.de Wed Jul 16 10:35:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> <3F158787.6020902@chapweske.com> Message-ID: <3F158C33.4070604@gmx.de> Hi Zooko, Zooko wrote: > Coincidentally, there is an active (and contentious) discussion of > cryptography-based naming at the "cryptography" mailing list. > > Three different concrete proposals, including one already deployed and one > newly announced, have been mentioned. Is there an archive? > majordomo@wasabisystems.com "subscribe cryptography" gives: **** subscribe: unknown list 'cryptography'. - Benja From zooko at zooko.com Wed Jul 16 10:54:03 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: Message from Justin Chapweske of "Wed, 16 Jul 2003 12:25:36 CDT." <3F158A90.2060503@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> Message-ID: Justin Chapweske wrote: > > I think this notion is a good idea from a security perspective. It is > important to retrieve the content-type from a trusted source. While > probably not practical, one could envision a content-type attack where > something is perfectly harmless when interpretted as a JPEG, but turns > into a virus when interpretted as an executable. I agree that this is important for security. > The problem is, I don't think this belongs in a hash-based URI, because > the content-type is not self-verifiable, while the hash itself is. I agree that this is a problem, but I wouldn't say that the file has a "real" content-type and the problem is making sure that the real content-type is included. Rather I would say that different people might honestly ascribe different type to the same file. I think it's quite reasonable to include some type information in the crypto-id of the file, but we haven't yet decided to do that in Mnet. Here is the design document for the erasure coding, encryption, and identification of files in Mnet: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/mnet/mnet_new/doc/new_filesystem.html The doc is short and sweet, and possibly of interest if you are following this thread. In particular the section on metadata details several alternatives that we considered: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/mnet/mnet_new/doc/new_filesystem.html#metadata > If you want to associate trusted meta-data with a piece of content, I > would suggest adding a layer of indirection. Use a hash-based URN such > as "urn:sha1" to identify the meta-data and not the file itself. I > think the old Mojonation system used to do this if I'm not mistaken. Yes, adding another layer of indirection in order to handle metadata is one of the ideas, actually three of the ideas, suggested in our design doc. 
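To make the indirection concrete, here is a minimal Python sketch: the metadata document carries the content type plus the hash of the raw bytes, and the identifier that gets passed around names the metadata rather than the file. The field layout and the unregistered urn:sha1 form are illustrative assumptions, not Mnet's or Mojo Nation's actual formats.

    import hashlib

    def sha1_hex(data):
        return hashlib.sha1(data).hexdigest()

    content = b"...the file bytes..."

    # The metadata names the content type and the content's own hash;
    # anything else you want to vouch for (file name, license) goes here too.
    metadata = ("content-type: text/plain\n"
                "content-sha1: " + sha1_hex(content) + "\n").encode("ascii")

    # Publish a URN for the metadata, not for the file. Fetching and hashing
    # the metadata verifies it, and the metadata in turn verifies the file.
    metadata_urn = "urn:sha1:" + sha1_hex(metadata)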
By the way, you are mistaken -- the hash-based ids in Mojo Nation identified just the file contents (plus a file name, but not a file type other than the extension of the file name). The metadata in Mojo Nation included the crypto-id of the file contents. The metadata was unsigned XML that you trusted because you got it directly from a server that you trusted. (Although actually you had no good reason to trust the servers, so this was an open hole.) Regards, Zooko http://zooko.com/ From smb at research.att.com Wed Jul 16 10:57:02 2003 From: smb at research.att.com (Steven M. Bellovin) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs Message-ID: <20030716175034.23A027B4D@berkshire.research.att.com> In message <3F158C33.4070604@gmx.de>, Benja Fallenstein writes: > >Hi Zooko, > >Zooko wrote: >> Coincidentally, there is an active (and contentious) discussion of >> cryptography-based naming at the "cryptography" mailing list. >> >> Three different concrete proposals, including one already deployed and one >> newly announced, have been mentioned. > >Is there an archive? > >> majordomo@wasabisystems.com > >"subscribe cryptography" gives: >**** subscribe: unknown list 'cryptography'. It's now at metzdowd.com --Steve Bellovin, http://www.research.att.com/~smb (me) http://www.wilyhacker.com (2nd edition of "Firewalls" book) From b.fallenstein at gmx.de Wed Jul 16 11:25:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: <3F158A90.2060503@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> Message-ID: <3F159813.4070903@gmx.de> Hi Justin, Justin Chapweske wrote: > Perhaps I'm misinterpretting cbuid, but it appears to suggest that the > hash URIs include the content-type of the content in the URI. > > I think this notion is a good idea from a security perspective. It is > important to retrieve the content-type from a trusted source. While > probably not practical, one could envision a content-type attack where > something is perfectly harmless when interpretted as a JPEG, but turns > into a virus when interpretted as an executable. Yes, or a file that is a diagram when interpreted as JPEG, but a pornographic image when interpreted as some other image format. (Given the container architecture of many formats, I don't think it's necessarily impossible to find such a 'pun' where two different formats interpret the header of a file differently, and thus have two different interpretations for the body.) > The problem is, I don't think this belongs in a hash-based URI, because > the content-type is not self-verifiable, while the hash itself is. In my mind, the entity that we're identifying is a pair: (content type, octet stream) The question is, given such a pair, and given a hash-based URI, can we authenticate that the URI identifies exactly this pair and no other? If so, I would call the URI self-verifying. Now, given urn:sha1:text/plain,<hash> and a pair ("text/plain", "foobar") we can verify the <hash> against "foobar", and compare the "text/plain" from the URI to the "text/plain" in the pair, so the URI clearly maps to only one such pair (as long as finding a hash collision is impossible). Thus, sha1 identifiers of this form *would* be self-verifying. Do you identify "self-verifying" somehow differently?
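As a concrete illustration of the pair-verification described above, here is a minimal Python sketch. The urn:sha1:<type>,<hash> shape is the one under discussion, not a registered scheme, and the base32 digest encoding is an assumption (following the remark earlier in the thread that de facto sha1 URNs use Base32).

    import base64
    import hashlib

    def typed_sha1_urn(content_type, data):
        # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters.
        digest = base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")
        return "urn:sha1:%s,%s" % (content_type, digest)

    def verify(urn, content_type, data):
        # The (content type, octet stream) pair matches the URN only if both
        # the declared type and the recomputed hash agree.
        return urn == typed_sha1_urn(content_type, data)

    urn = typed_sha1_urn("text/plain", b"foobar")
    print(verify(urn, "text/plain", b"foobar"))   # True
    print(verify(urn, "image/jpeg", b"foobar"))   # False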
- Benja From b.fallenstein at gmx.de Wed Jul 16 11:45:01 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: Hash URIs and metadata (was: Re: [p2p-hackers] content-types in URIs) In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> Message-ID: <3F159C99.6070905@gmx.de> Hi, Zooko wrote: >>If you want to associate trusted meta-data with a piece of content, I >>would suggest adding a layer of indirection. Use a hash-based URN such >>as "urn:sha1" to identify the meta-data and not the file itself. I >>think the old Mojonation system used to do this if I'm not mistaken. > > Yes, adding another layer of indirection in order to handle metadata is one of > the ideas, actually three of the ideas, suggested in our design doc. I'm planning to implement a system that works like this, for providing HTTP content negotiation features as well as arbitrary metadata, on top of hash-based identification. You would create a "resource specification", e.g. like this:

<> spec:hasRepresentation <urn-of-English-version> .
<urn-of-English-version> dc:language "en" .
<> spec:hasRepresentation <urn-of-French-version> .
<urn-of-French-version> dc:language "fr" .
<> cc:license <urn-of-license> .
<> dc:author _:x .
_:x foaf:mailbox <author-mailbox-uri> .

So "this resource" would have two representations in two different languages, plus associated author and license information. Then you would refer to this resource through a special URN, like urn:urn-x:sha1:FOO where FOO is the hash of the above RDF graph. Entering the above URI into a browser would bring up the English or French version of the resource, depending on your preferences. To download the defining RDF graph itself, you'd use urn:sha1:FOO If anybody is doing something similar, it would be good to hear about it and possibly collaborate. - Benja From seth.johnson at RealMeasures.dyndns.org Thu Jul 17 10:05:02 2003 From: seth.johnson at RealMeasures.dyndns.org (Seth Johnson) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Nature: WIPO to Address Free Public Goods Message-ID: <3F16D506.8FD0C47C@RealMeasures.dyndns.org> (Forwarded from CNI Copyright list) ---------- Forwarded message ---------- From: James Love To: CNI-COPYRIGHT -- Copyright & Intellectual Property Sent: 7/10/03 7:03 PM Subject: Nature: Drive for patent-free innovation gathers pace - Kamil Idris is being asked to assess the merits of an open approach to intellectual property Nature reports that WIPO has agreed to organize the meeting on open development models... jamie * Francis Gurry, an assistant director-general at the WIPO, said that the organization welcomed the idea. "The use of open and collaborative development models for research and innovation is a very important and interesting development," he said in a statement. "The director-general looks forward with enthusiasm to taking up the invitation to organize a conference to explore the scope and application of these models." in html > http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v424/n6945/full/424118a_fs.html or in pdf > http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v424/n6945/full/424118a_fs.html&content_filetype=PDF 118 NATURE|VOL 424 | 10 JULY 2003 |www.nature.com/nature *** Drive for patent-free innovation gathers pace Kamil Idris is being asked to assess the merits of an open approach to intellectual property. Declan Butler, Paris A group of top scientists and economists are asking the World Intellectual Property Organization (WIPO) in Geneva to promote open models of innovation that don't rely on patents.
The group believes that innovation based on freely available knowledge can be effective not just in areas where it has established a foothold -- such as genome sequence data -- but also in sectors where patent protection is entirely dominant, such as drug development (see Nature 424, 10-11; 2003). In a 7 July letter to Kamil Idris, director general of the WIPO, 59 scientists and economists call attention to the "explosion of open and collaborative projects to create public goods" in recent years, including the Human Genome Project, the open-source software movement, and Internet standards. Such projects show that "one can achieve a high level of innovation in some areas of the modern economy without intellectual property protection," says the letter, arguing that "excessive, unbalanced or poorly designed intellectual property protections may be counterproductive." It calls on the WIPO to hold a major conference on these models during 2004. The signatories include Joseph Stiglitz of Columbia University in New York, who received the 2001 Nobel prize for economics; John Sulston of the Wellcome Trust Sanger Institute near Cambridge, UK, winner of the 2002 Nobel prize for medicine; James Orbinski, former president of Médecins Sans Frontières; and Richard Stallman, a computer scientist regarded by many as the "father" of the open-source software movement. Francis Gurry, an assistant director-general at the WIPO, said that the organization welcomed the idea. "The use of open and collaborative development models for research and innovation is a very important and interesting development," he said in a statement. "The director-general looks forward with enthusiasm to taking up the invitation to organize a conference to explore the scope and application of these models." Advocates of open-source innovation want the WIPO and other public agencies to rethink how innovation works, says James Love, director of the Washington-based Consumer Project on Technology and a signatory to the letter. Open research for drug development is one of the initiative's main targets, he says. Some of the authors are also pursuing the idea of an international treaty to encourage governments to fund drug research and put the results directly into the public domain. Love argues that research results should ultimately become a freely available commodity, with drug companies competing to market generics of any drugs developed. The current system, in which drug research and development is carried out by drug companies that keep patent rights for up to 20 years, is grossly inefficient and results in excessive prices so that those who need the drugs most cannot afford them, argues Love. Yet to be fleshed out are details of how such a model would work, and how competitive forces could be maintained within it. But in May, the general assembly of the World Health Organization instructed agency officials to draft terms of reference during 2004 for a new evaluation of intellectual property, innovation and public health. Consideration of open-science models is expected to be part of this exercise. "The success of the Internet and of open-source software has driven home just how far open and collaborative projects can go," says Hal Varian, an economist at the University of California, Berkeley, who has also signed the 7 July letter.
Another signatory, Paul David, an economist at Stanford University, argues that systems such as free and open-source software are not at odds with intellectual property rights protection, but rather a choice by creators and society as to the benefits they want to obtain. -- James Love, Director, Consumer Project on Technology http://www.cptech.org, mailto:james.love@cptech.org tel. +1.202.387.8030, mobile +1.202.361.3040 *** -- DRM is Theft! We are the Stakeholders! New Yorkers for Fair Use http://www.nyfairuse.org [CC] Counter-copyright: http://cyber.law.harvard.edu/cc/cc.html I reserve no rights restricting copying, modification or distribution of this incidentally recorded communication. Original authorship should be attributed reasonably, but only so far as such an expectation might hold for usual practice in ordinary social discourse to which one holds no claim of exclusive rights. From gojomo at bitzi.com Thu Jul 17 22:32:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: crypto naming Re: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> <3F158787.6020902@chapweske.com> <3F158C33.4070604@gmx.de> Message-ID: <018f01c34ced$e9fdf280$660a000a@golden> Benja writes: > Hi Zooko, > > Zooko wrote: > > Coincidentally, there is an active (and contentious) discussion of > > cryptography-based naming at the "cryptography" mailing list. > > > > Three different concrete proposals, including one already deployed and one > > newly announced, have been mentioned. > > Is there an archive? I'd also be interested if there was an archive. Or, if the concrete proposals are available in draft form anywhere. - Gordon From hal at finney.org Thu Jul 17 23:12:02 2003 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:21 2006 Subject: crypto naming Re: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs Message-ID: <200307180610.h6I6AKs06961@finney.org> Benja writes: > Hi Zooko, > > Zooko wrote: > > Coincidentally, there is an active (and contentious) discussion of > > cryptography-based naming at the "cryptography" mailing list. > > > > Three different concrete proposals, including one already deployed and one > > newly announced, have been mentioned. > > Is there an archive? The cryptography list is archived (somewhat imperfectly) at http://www.mail-archive.com/cryptography%40metzdowd.com/. You can read the discussion of the naming issues under the thread "Announcing httpsy://, a YURL scheme", originating with message http://www.mail-archive.com/cryptography%40metzdowd.com/msg00481.html. The YURL scheme itself is described at http://www.waterken.com/dev/YURL/ and related pages. Hal From zooko at zooko.com Fri Jul 18 06:58:01 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: crypto naming Re: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: Message from "Gordon Mohr" of "Thu, 17 Jul 2003 22:32:00 PDT." <018f01c34ced$e9fdf280$660a000a@golden> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> <3F158787.6020902@chapweske.com> <3F158C33.4070604@gmx.de> <018f01c34ced$e9fdf280$660a000a@golden> Message-ID: Gordon Mohr wrote: > > I'd also be interested if there was an archive.
Or, if the concrete > proposals are available in draft form anywhere. The proposals I meant were: * The Eternal Resource Locator Anderson R. J., Matyas V., Jr. and Petitcolas F. A. P. "The Eternal Resource Locator: An Alternative Means of Establishing Trust on the World Wide Web", in 3rd USENIX workshop on Electronic Commerce, 1998, Boston, Massachusetts, USA, http://citeseer.nj.nec.com/365389.html * The Self-Certifying File System http://fs.net/ * YURL https://www.waterken.com/dev/YURL/ From mfreed at cs.nyu.edu Fri Jul 18 13:20:02 2003 From: mfreed at cs.nyu.edu (Michael J. Freedman) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: <1057775910.13939.896.camel@monster.omnifarious.org> Message-ID: Hi all, Given the recent long discussions about THEX and hash trees for multi-source download (i.e., swarming), I thought this alternate approach may be of direct interest: On-the-Fly Verification of Erasure-Encoded File Transfers (Extended Abstract) To appear in 1st IRIS Student Workshop on Peer-to-Peer Systems http://www.scs.cs.nyu.edu/~mfreed/docs/authcodes-isw03.ps .pdf (Please note that this is not a full paper and is still in draft form.) Basically, erasure codes such as LT Codes (Luby) and Online codes (Maymounkov) can be very useful for building more efficient multi-source download algorithms (see, for instance, "Rateless Codes and Big Downloads", at http://www.scs.cs.nyu.edu/~petar/msdlncs.ps) Unfortunately, traditional hash trees are not useful in this environment, as the "checkblocks" generated by erasure coding at mirrors are randomized -- the initial file publisher cannot verify these. This paper describes a technique, based on "homomorphic hashing", that allows the downloader to verify checkblocks on-the-fly. --mike On 9 Jul 2003, Eric M. Hopper wrote: > Date: 09 Jul 2003 13:38:30 -0500 > From: Eric M. Hopper > Reply-To: p2p-hackers@zgp.org > To: p2p-hackers@zgp.org > Subject: Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. > and alternate approach > > On Thu, 2003-07-03 at 23:47, Adam Back wrote: > > > So one common p2p download approach is to download a file in parts in > > parallel (and out of order) from multiple servers. (Particularly to > > achieve reasonable download rates from multiple asynchronous links of > > varying link speeds). A common idiom is also that there is a compact > > authenticator for a file (such as it's hash) which people will supply > > as a document-id. > > > There is an important thing you can do with an order 2 authentication > tree that is much harder to do with a tree of an order larger than 2. > That's download the authentication data needed to verify each packet > against the root along with the packet itself. > > For example, if you have a 2-way THEX tree for 2^3*blocksize data, it > will look something like this: > > Diagram of 2-way THEX tree > > If someone transmits the data for node A, in order to verify node A > completely, the hashes for B, J, and N need to be transmitted. No other > hashes are needed since they are already known, as in the case for the > root node, or can be calculated. > > If someone then transmits the data for node B, no hashes need to be > transmitted since the reciever already has all the needed hashes. For > C, only D is needed. 
> > If you have an 8-way THEX tree, you end up with a diagram like this: > > Node diagram for an 8-way tree > > If someone recieves node A, they will have to also get the hashes for > nodes B-H in order to verify node A. This is MUCH more information than > with a 2-way THEX tree, and as the depth of both trees grows, the 2-way > tree is favored more and more. > > I think one useful measure is how much data is needed to verify a given > block as compared to the size of a block. If you have a block size of > 64KB, and a hash data size of 32 bytes (I think SHA-1 is just a little > too weak, and prefer SHA2-256), then you can deal with a 16MB file and > still ensure that the maximum amount of data needed to verify any given > block is less than the size of a block. If you only use a 16 byte hash, > then you can send 4GB file and still keep that property. If you > maintain a 32 byte hash, but go to a 128KB block, you can transmit an > 8GB file and maintain that property. > > This is also highly resistant to jamming. If you get the verification > data from the same node that sent you the data block in the first place, > that node will be unable to spoof the verification data to make a bad > block look like a good one. If you get the verification data from a > different node than sent you the data block, that node will be unable to > spoof the hashes in order to make a good block look like a bad one. So > errors in either the verification hashes, or the block are easily and > quickly detectable. > > Lastly, it is easy to specify which subset of verification data you > need. Any data block you recieve may need > log2(number_of_blocks_in_file) hashes worth of verification data. You > can simply send a bitstring with one bit for each hash you might > potentially need saying whether or not you actually need it or not. > > Sorry this is in HTML. I just didn't want to have to use ASCII art for > the diagrams because it'd be a huge and annoying pain. > > Have fun (if at all possible), > > -- > There's an excellent C/C++/Python/Unix/Linux programmer with a wide > range of other experience and system admin skills who needs work. > Namely, me. http://www.omnifarious.org/~hopper/resume.html > -- Eric Hopper > > ----- "Not all those who wander are lost." www.michaelfreedman.org From justin at chapweske.com Fri Jul 18 15:06:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: References: Message-ID: <3F186F24.4010405@chapweske.com> Michael, Interesting results. Lots of great work coming out of you NYU guys :) Kademlia is a very nice system. I'd like to point out though that hash trees still work quite well when small expansion factors are used. With the original Swarmcast system back in 2000, we used an expansion factor of 8 (k=32, n=256) though I'm sure we could have gotten away with an even smaller expansion factor by adding a bit more scheduling to the protocol. Either way, it is quite manageable to build a hash tree across the entire set of encoded data, though it requires a bit of preprocessing. If you don't wish to make the hash tree dependant on the entire encoded set, you can create a number of hash trees, one for each expansion factor. So with a systematic code such as the Vandermonde codes, your first hash tree is equivilent to a normal hash tree over the vanilla file. Are Online Codes systematic? 
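As a side note for readers following the 2-way tree discussion quoted above, here is a minimal Python sketch of the mechanics: building a binary hash tree over fixed-size blocks and collecting the log2(n) sibling hashes needed to verify one block against the root. This is not the THEX serialization (THEX specifies Tiger and its own leaf/internal-node conventions); the SHA-256 hash, the block size, and the odd-node handling are assumptions for illustration only.

    import hashlib

    def h(data):
        return hashlib.sha256(data).digest()

    def build_levels(blocks):
        # levels[0] holds the leaf hashes, levels[-1] holds the single root.
        levels = [[h(b) for b in blocks]]
        while len(levels[-1]) > 1:
            prev, nxt = levels[-1], []
            for i in range(0, len(prev), 2):
                if i + 1 < len(prev):
                    nxt.append(h(prev[i] + prev[i + 1]))
                else:
                    nxt.append(prev[i])          # odd node promoted unchanged
            levels.append(nxt)
        return levels

    def proof(levels, index):
        # Sibling hash per level: (sibling-is-on-the-left?, sibling hash).
        path = []
        for level in levels[:-1]:
            sib = index ^ 1
            if sib < len(level):
                path.append((sib < index, level[sib]))
            index //= 2
        return path

    def verify(block, path, root):
        node = h(block)
        for on_left, sib in path:
            node = h(sib + node) if on_left else h(node + sib)
        return node == root

    blocks = [bytes([i]) * 1024 for i in range(8)]    # eight 1KB blocks
    levels = build_levels(blocks)
    root = levels[-1][0]
    print(verify(blocks[3], proof(levels, 3), root))  # True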
Obviously, your work is important if you decide to forgo the Reed-Solomon codes and use LT or Online codes which lend themselves to huge expansion factors. All in all, you guys do a great job of creating some very elegant systems. By the way, what is the patent status of the Online Codes? Thanks, -Justin Michael J. Freedman wrote: > Hi all, > > Given the recent long discussions about THEX and hash trees for > multi-source download (i.e., swarming), I thought this alternate approach > may be of direct interest: > > On-the-Fly Verification of Erasure-Encoded File Transfers (Extended Abstract) > To appear in 1st IRIS Student Workshop on Peer-to-Peer Systems > http://www.scs.cs.nyu.edu/~mfreed/docs/authcodes-isw03.ps .pdf > > (Please note that this is not a full paper and is still in draft form.) > > Basically, erasure codes such as LT Codes (Luby) and Online codes > (Maymounkov) can be very useful for building more efficient multi-source > download algorithms (see, for instance, "Rateless Codes and Big > Downloads", at http://www.scs.cs.nyu.edu/~petar/msdlncs.ps) > > Unfortunately, traditional hash trees are not useful in this environment, > as the "checkblocks" generated by erasure coding at mirrors are > randomized -- the initial file publisher cannot verify these. This paper > describes a technique, based on "homomorphic hashing", that allows the > downloader to verify checkblocks on-the-fly. > > --mike > > > On 9 Jul 2003, Eric M. Hopper wrote: > > >>Date: 09 Jul 2003 13:38:30 -0500 >>From: Eric M. Hopper >>Reply-To: p2p-hackers@zgp.org >>To: p2p-hackers@zgp.org >>Subject: Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. >> and alternate approach >> >>On Thu, 2003-07-03 at 23:47, Adam Back wrote: >> >> >>>So one common p2p download approach is to download a file in parts in >>>parallel (and out of order) from multiple servers. (Particularly to >>>achieve reasonable download rates from multiple asynchronous links of >>>varying link speeds). A common idiom is also that there is a compact >>>authenticator for a file (such as it's hash) which people will supply >>>as a document-id. >> >> >>There is an important thing you can do with an order 2 authentication >>tree that is much harder to do with a tree of an order larger than 2. >>That's download the authentication data needed to verify each packet >>against the root along with the packet itself. >> >>For example, if you have a 2-way THEX tree for 2^3*blocksize data, it >>will look something like this: >> >>Diagram of 2-way THEX tree >> >>If someone transmits the data for node A, in order to verify node A >>completely, the hashes for B, J, and N need to be transmitted. No other >>hashes are needed since they are already known, as in the case for the >>root node, or can be calculated. >> >>If someone then transmits the data for node B, no hashes need to be >>transmitted since the reciever already has all the needed hashes. For >>C, only D is needed. >> >>If you have an 8-way THEX tree, you end up with a diagram like this: >> >>Node diagram for an 8-way tree >> >>If someone recieves node A, they will have to also get the hashes for >>nodes B-H in order to verify node A. This is MUCH more information than >>with a 2-way THEX tree, and as the depth of both trees grows, the 2-way >>tree is favored more and more. >> >>I think one useful measure is how much data is needed to verify a given >>block as compared to the size of a block. 
If you have a block size of >>64KB, and a hash data size of 32 bytes (I think SHA-1 is just a little >>too weak, and prefer SHA2-256), then you can deal with a 16MB file and >>still ensure that the maximum amount of data needed to verify any given >>block is less than the size of a block. If you only use a 16 byte hash, >>then you can send 4GB file and still keep that property. If you >>maintain a 32 byte hash, but go to a 128KB block, you can transmit an >>8GB file and maintain that property. >> >>This is also highly resistant to jamming. If you get the verification >>data from the same node that sent you the data block in the first place, >>that node will be unable to spoof the verification data to make a bad >>block look like a good one. If you get the verification data from a >>different node than sent you the data block, that node will be unable to >>spoof the hashes in order to make a good block look like a bad one. So >>errors in either the verification hashes, or the block are easily and >>quickly detectable. >> >>Lastly, it is easy to specify which subset of verification data you >>need. Any data block you recieve may need >>log2(number_of_blocks_in_file) hashes worth of verification data. You >>can simply send a bitstring with one bit for each hash you might >>potentially need saying whether or not you actually need it or not. >> >>Sorry this is in HTML. I just didn't want to have to use ASCII art for >>the diagrams because it'd be a huge and annoying pain. >> >>Have fun (if at all possible), >> >>-- >>There's an excellent C/C++/Python/Unix/Linux programmer with a wide >>range of other experience and system admin skills who needs work. >>Namely, me. http://www.omnifarious.org/~hopper/resume.html >>-- Eric Hopper >> >> > > ----- > "Not all those who wander are lost." www.michaelfreedman.org > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From petar at scs.cs.nyu.edu Fri Jul 18 16:41:02 2003 From: petar at scs.cs.nyu.edu (Petar Maymounkov) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: <3F186F24.4010405@chapweske.com> Message-ID: Hi Guys, So, we haven't thought about using trees for Online Codes because they are rateless, which means that practically they have an infinite expansion, so a tree won't be feasible. (Unlike, Reed-Solomon and other codes, which have a pre-specified expansion). As for the patent, it is held by DigitalFountain, more specifically Michael Luby and Amin Shokrollahi. Petar On Fri, 18 Jul 2003, Justin Chapweske wrote: > Michael, > > Interesting results. Lots of great work coming out of you NYU guys :) > Kademlia is a very nice system. > > I'd like to point out though that hash trees still work quite well when > small expansion factors are used. > > With the original Swarmcast system back in 2000, we used an expansion > factor of 8 (k=32, n=256) though I'm sure we could have gotten away with > an even smaller expansion factor by adding a bit more scheduling to the > protocol. Either way, it is quite manageable to build a hash tree > across the entire set of encoded data, though it requires a bit of > preprocessing. > > If you don't wish to make the hash tree dependant on the entire encoded > set, you can create a number of hash trees, one for each expansion > factor. 
So with a systematic code such as the Vandermonde codes, your > first hash tree is equivilent to a normal hash tree over the vanilla > file. Are Online Codes systematic? > > Obviously, your work is important if you decide to forgo the > Reed-Solomon codes and use LT or Online codes which lend themselves to > huge expansion factors. All in all, you guys do a great job of creating > some very elegant systems. By the way, what is the patent status of the > Online Codes? > > Thanks, > > -Justin > > Michael J. Freedman wrote: > > Hi all, > > > > Given the recent long discussions about THEX and hash trees for > > multi-source download (i.e., swarming), I thought this alternate approach > > may be of direct interest: > > > > On-the-Fly Verification of Erasure-Encoded File Transfers (Extended Abstract) > > To appear in 1st IRIS Student Workshop on Peer-to-Peer Systems > > http://www.scs.cs.nyu.edu/~mfreed/docs/authcodes-isw03.ps .pdf > > > > (Please note that this is not a full paper and is still in draft form.) > > > > Basically, erasure codes such as LT Codes (Luby) and Online codes > > (Maymounkov) can be very useful for building more efficient multi-source > > download algorithms (see, for instance, "Rateless Codes and Big > > Downloads", at http://www.scs.cs.nyu.edu/~petar/msdlncs.ps) > > > > Unfortunately, traditional hash trees are not useful in this environment, > > as the "checkblocks" generated by erasure coding at mirrors are > > randomized -- the initial file publisher cannot verify these. This paper > > describes a technique, based on "homomorphic hashing", that allows the > > downloader to verify checkblocks on-the-fly. > > > > --mike > > > > > > On 9 Jul 2003, Eric M. Hopper wrote: > > > > > >>Date: 09 Jul 2003 13:38:30 -0500 > >>From: Eric M. Hopper > >>Reply-To: p2p-hackers@zgp.org > >>To: p2p-hackers@zgp.org > >>Subject: Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. > >> and alternate approach > >> > >>On Thu, 2003-07-03 at 23:47, Adam Back wrote: > >> > >> > >>>So one common p2p download approach is to download a file in parts in > >>>parallel (and out of order) from multiple servers. (Particularly to > >>>achieve reasonable download rates from multiple asynchronous links of > >>>varying link speeds). A common idiom is also that there is a compact > >>>authenticator for a file (such as it's hash) which people will supply > >>>as a document-id. > >> > >> > >>There is an important thing you can do with an order 2 authentication > >>tree that is much harder to do with a tree of an order larger than 2. > >>That's download the authentication data needed to verify each packet > >>against the root along with the packet itself. > >> > >>For example, if you have a 2-way THEX tree for 2^3*blocksize data, it > >>will look something like this: > >> > >>Diagram of 2-way THEX tree > >> > >>If someone transmits the data for node A, in order to verify node A > >>completely, the hashes for B, J, and N need to be transmitted. No other > >>hashes are needed since they are already known, as in the case for the > >>root node, or can be calculated. > >> > >>If someone then transmits the data for node B, no hashes need to be > >>transmitted since the reciever already has all the needed hashes. For > >>C, only D is needed. > >> > >>If you have an 8-way THEX tree, you end up with a diagram like this: > >> > >>Node diagram for an 8-way tree > >> > >>If someone recieves node A, they will have to also get the hashes for > >>nodes B-H in order to verify node A. 
This is MUCH more information than > >>with a 2-way THEX tree, and as the depth of both trees grows, the 2-way > >>tree is favored more and more. > >> > >>I think one useful measure is how much data is needed to verify a given > >>block as compared to the size of a block. If you have a block size of > >>64KB, and a hash data size of 32 bytes (I think SHA-1 is just a little > >>too weak, and prefer SHA2-256), then you can deal with a 16MB file and > >>still ensure that the maximum amount of data needed to verify any given > >>block is less than the size of a block. If you only use a 16 byte hash, > >>then you can send 4GB file and still keep that property. If you > >>maintain a 32 byte hash, but go to a 128KB block, you can transmit an > >>8GB file and maintain that property. > >> > >>This is also highly resistant to jamming. If you get the verification > >>data from the same node that sent you the data block in the first place, > >>that node will be unable to spoof the verification data to make a bad > >>block look like a good one. If you get the verification data from a > >>different node than sent you the data block, that node will be unable to > >>spoof the hashes in order to make a good block look like a bad one. So > >>errors in either the verification hashes, or the block are easily and > >>quickly detectable. > >> > >>Lastly, it is easy to specify which subset of verification data you > >>need. Any data block you recieve may need > >>log2(number_of_blocks_in_file) hashes worth of verification data. You > >>can simply send a bitstring with one bit for each hash you might > >>potentially need saying whether or not you actually need it or not. > >> > >>Sorry this is in HTML. I just didn't want to have to use ASCII art for > >>the diagrams because it'd be a huge and annoying pain. > >> > >>Have fun (if at all possible), > >> > >>-- > >>There's an excellent C/C++/Python/Unix/Linux programmer with a wide > >>range of other experience and system admin skills who needs work. > >>Namely, me. http://www.omnifarious.org/~hopper/resume.html > >>-- Eric Hopper > >> > >> > > > > ----- > > "Not all those who wander are lost." www.michaelfreedman.org > > > > _______________________________________________ > > p2p-hackers mailing list > > p2p-hackers@zgp.org > > http://zgp.org/mailman/listinfo/p2p-hackers > > > -- > Justin Chapweske, Onion Networks > http://onionnetworks.com/ > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > From decoy at iki.fi Sat Jul 19 04:45:02 2003 From: decoy at iki.fi (Sampo Syreeni) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: <3F186F24.4010405@chapweske.com> References: <3F186F24.4010405@chapweske.com> Message-ID: On 2003-07-18, Justin Chapweske uttered: >By the way, what is the patent status of the Online Codes? I'd also check whether there are patents on the overall concept of using sparse codes for swarmcast -- I think using them in conventional multicast is in fact patented. 
-- Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111 student/math+cs/helsinki university, http://www.iki.fi/~decoy/front openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2 From sam at neurogrid.com Sat Jul 19 23:58:02 2003 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] New Wiki List of P2P Conferences Message-ID: <3F1A3D2B.3080804@neurogrid.com> Hi All, Just to let you know that I've set up a list of p2p conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences Most of these are ones that have already happened, but the top most 3 are yet to come, and the top most one is still open for submissions. I'm hoping we can get more conferences in there that have upcoming submission deadlines so we can have a better chance of getting any work we write up to these conferences. Please feel free to add conferences to the wiki, or mail me with upcoming p2p related conferences and I'll try and make sure they get added to the list ... CHEERS> SAM From b.fallenstein at gmx.de Sun Jul 20 12:55:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] New Wiki List of P2P Conferences In-Reply-To: <3F1A3D2B.3080804@neurogrid.com> References: <3F1A3D2B.3080804@neurogrid.com> Message-ID: <3F1AF326.7010804@gmx.de> Hi Sam, I'm unable to register for the Wiki (I get an error message that email could not be delivered). I suggest that you add USENIX NSDI'04 to the list-- the CFP mentions P2P systems as one area of interest. Deadline is September 15th. http://www.usenix.org/events/nsdi04/cfp/ Thanks for the effort, the Wiki looks like a good idea. - Benja Sam Joseph wrote: > Hi All, > > Just to let you know that I've set up a list of p2p conferences: > > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences > > Most of these are ones that have already happened, but the top most 3 > are yet to come, and the top most one is still open for submissions. I'm > hoping we can get more conferences in there that have upcoming > submission deadlines so we can have a better chance of getting any work > we write up to these conferences. > > Please feel free to add conferences to the wiki, or mail me with > upcoming p2p related conferences and I'll try and make sure they get > added to the list ... > > CHEERS> SAM > > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > > From b.fallenstein at gmx.de Sun Jul 20 13:37:01 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> <3F159813.4070903@gmx.de> Message-ID: <3F1AFD07.2030509@gmx.de> Peter Thiemann wrote: > Benja> Now, given > > Benja> urn:sha1:text/plain, > > Benja> and a pair > > Benja> ("text/plain", "foobar") > > Benja> we can verify the against "foobar", and compare the > Benja> "text/plain" from the URI to the "text/plain" in the pair, so the URI > Benja> clearly maps to only one such pair (as long as finding a hash > Benja> collision is impossible). Thus, sha1 identifiers of this form *would* > Benja> be self-verifying. > > There is too much good will and protocol in your proposal. Sorry, I don't understand what you mean. 
> Here is a > much simpler way of getting self-verifying and endorsable hashes: > > Let's say you have this resource > > ("text/plain", "fubar") > > the trick is to take the hash not just from "fubar" but from the > concatenation of the mediatype and the contents: > > H = sha1 ("text/plainfubar") This is a classic fallacy when working with hashes btw: There is no way to know which of the following is meant: ("text/plai", "nfubar") ("text/plain", "fubar") ("text/plainf", "ubar") Of course, that's easily fixed by separating the two by a character that cannot occur in the media type, probably the space-- H = sha1 ("text/plain fubar") > With this setup, if there is a registry which endorses > > urn:hash:text/plain:sha1:H > > then that means the registry has checked > 1. that the "fubar" content is ok and > 2. that its mediatype is "text/plain" What's the cost? That it doesn't interoperate with today's systems, which only hash the content; that all sha1 hashes in ciculation on the Web today are useless. What's the benefit? That you can use H alone to identify the (content type, content) pair. Would this really be useful? I don't see how. > In addition, the value H is *less* prone to forgery because the hashed > value is not completely arbitrary: it *must* start with "text/plain". I fail to see what you mean by "less prone to forgery." > If this works, then it seems to be a quite strong argument for > including content types. Sorry, but while I understand what you propose, I seriously don't have the faintest clue why you propose it... :-) I neither understand what you mean by "good will and protocol" nor what you mean by "less prone to forgery." What kind of forgery do you mean? We're miscommunicating somewhere, I think. - Benja From b.fallenstein at gmx.de Sun Jul 20 15:31:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> <3F159813.4070903@gmx.de> <3F1AFD07.2030509@gmx.de> Message-ID: <3F1B17C1.8030700@gmx.de> Hi Peter, (quoting out-of-order) Peter Thiemann wrote: > My proposal aims at adding internal structure artificially just during > the computation of the hash. For the fubar example, this means > > H = sha1 ("text/plainfubar") > > Suppose some adversary has found a string x so that > > sha1 (x) = H > > Too bad! But, what is the probability that x starts with > "text/plain"? Ok, I'm starting to see what you want, here. You have a specific model of attack in mind: Given a hash, the attacker can generate a (relatively) random bit string which has that hash. The attacker has little choice in the shape of this bit string. In this model, of course, it's quite unlikely that 'x' happens to start with 'text/plain.' But of course the attacker will try to devise an algorithm that finds an 'x' such that sha1 ("text/plain" + x) = H or sha1 (x repetitive-xor "text/plain") = H Basically, what you're doing is creating a family of hash functions, where the "text/plain" parameter chooses one function from that family. Your family would be defined as, f_str (x) = sha1 (x + str) where str is the parameter. Now, do you seriously think that once sha1 is broken, a skilled attacker will not be able to break f_str? 
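To make the point about parameterized families concrete, here is a tiny Python sketch following the sha1("text/plain fubar") form above; the space separator is the one suggested earlier in the thread, and nothing here is a registered scheme.

    import hashlib

    def f(media_type, content):
        # One family member per media type: f_str(x) = sha1(str + " " + x).
        # The media type only selects the member; the strength still rests on sha1.
        return hashlib.sha1(media_type.encode("ascii") + b" " + content).hexdigest()

    print(f("text/plain", b"fubar"))   # member chosen by "text/plain"
    print(f("image/jpeg", b"fubar"))   # different member, different digest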
I'm not an expert in this field either, but it seems like if you can break a hash function consisting of multiple rounds of addition, circular left shift, complement, and AND/OR/XOR in a complex pattern, that plainly prepending/xoring a bitstring won't add a lot of security. OTOH, the community already considers a function *really, really* broken if you can find any x,y so that hash(x) = hash(y). So it's possible that no "get x for given H with hash(x)=H" attack will ever be devised. This is no protection against a skilled black-hat cryptoanalyst, of course. If you think that f_str is a strong new family of hash functions, stronger than sha1, you should publish about it and subject it to peer review ;) ;) >>>>>>"Benja" == Benja Fallenstein writes: > >> There is too much good will and protocol in your proposal. > > Benja> Sorry, I don't understand what you mean. > > It means you need to trust somebody else who maintains this pair. > If you got the urn and the file, then you can only verify it if you > can verify the content type. I'm wondering if that's always possible. This doesn't seem to be the same argument as above (lengthening the durability of the hash function beyond breakages). I don't quite see the point of this paragraph. Either you have someone who's approved of the URN (maybe simply sent it to you). Then this person has vouched for the content type as well as the hash. Or you assume that someone may have tampered with the URN, and changed the content type to something that suits that adversary. If so, I don't see how the adversary couldn't also have changed the hash. Or maybe you assume that someone has vouched for the hash, but not the content type (i.e., not for the full URN). In that case, I would not conclude that the person has vouched for urn:sha1:text/plain,hash -- I would only conclude they have vouched for urn:sha1:hash ... So, you seem to say two things: 1. Putting the content type into the hashed data makes the hash usable beyond breakages of the hash function. I think this is most likely incorrect, assuming an attack by a skilled cryptoanalyst (rather than a script kiddie). 2. Something else, which I still don't understand ;-) > I hope this is becoming clearer :-) Clearer-- but not yet clear :-) - Benja From sam at neurogrid.com Sun Jul 20 19:25:02 2003 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] New Wiki List of P2P Conferences References: <3F1A3D2B.3080804@neurogrid.com> <3F1AF326.7010804@gmx.de> Message-ID: <3F1B4EC7.5030304@neurogrid.com> Hi Benja Benja Fallenstein wrote: > I'm unable to register for the Wiki (I get an error message that email > could not be delivered). I suggest that you add USENIX NSDI'04 to the > list-- the CFP mentions P2P systems as one area of interest. Deadline > is September 15th. > > http://www.usenix.org/events/nsdi04/cfp/ > > Thanks for the effort, the Wiki looks like a good idea. No worries - have added the USENIX conference. Sorry about you not being able to log in - Actually you did successfully create your account and could have logged in - it was just the email notification that failed - I'm working with my ISP to try and fix that problem ... You're in the system as BenjaFallenstein - I believe that you should now be able to log in with the password you created previously - if you can't let me know and I'll reset your account. 
CHEERS> SAM From dirkx at webweaving.org Mon Jul 21 05:35:03 2003 From: dirkx at webweaving.org (Dirk-Willem van Gulik) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs (was: Comments on draft-thiemann-cbuid-urn-00) In-Reply-To: <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> Message-ID: <20030720204545.T1449-100000@foem> On Wed, 16 Jul 2003, Carl Ellison wrote: > Is urn:sha1:dRDPBgZzTFq7Jl2Q2N/YNghcfj8= not now legal? Until there is a registered URN namespace/authority called 'sha1' - nope. You could use x-sha1 - but that is kind of frowned upon. > > urn:sha1:... > > urn:sha256:... > > urn:tiger:... > > urn:sha1+sha256:... > > urn:sha1+tiger:... > > urn:sha256+tiger:... > > urn:sha1+sha256+tiger:... Or regigster urn:assortedhashes:.... or urn:hash:.... and write an RFC which would document that it has to have the shape: urn:hash:[a-z\+]+: And then in that RFC either define all of the above; or devise a scheme by which additional A+B+C's can be added to that block. Obviously in the above replace s/hash/by-whatever/ you want. Dw. From thiemann at informatik.uni-freiburg.de Mon Jul 21 05:35:06 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: <3F159813.4070903@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> <3F159813.4070903@gmx.de> Message-ID: >>>>> "Benja" == Benja Fallenstein writes: Benja> Hi Justin, Benja> Justin Chapweske wrote: >> Perhaps I'm misinterpretting cbuid, but it appears to suggest that >> the hash URIs include the content-type of the content in the URI. >> I think this notion is a good idea from a security perspective. It >> is important to retrieve the content-type from a trusted source. >> While probably not practical, one could envision a content-type >> attack where something is perfectly harmless when interpretted as a >> JPEG, but turns into a virus when interpretted as an >> executable. Well, my feelings towards this are similar to Benja's. Benja> In my mind, the entity that we're identifying is a pair: Benja> (content type, octet stream) Benja> The question is, given such a pair, and given a hash-based URI, can we Benja> authenticate that the URI identifies exactly this pair and no other? Benja> If so, I would call the URI self-verifying. Benja> Now, given Benja> urn:sha1:text/plain, Benja> and a pair Benja> ("text/plain", "foobar") Benja> we can verify the against "foobar", and compare the Benja> "text/plain" from the URI to the "text/plain" in the pair, so the URI Benja> clearly maps to only one such pair (as long as finding a hash Benja> collision is impossible). Thus, sha1 identifiers of this form *would* Benja> be self-verifying. There is too much good will and protocol in your proposal. Here is a much simpler way of getting self-verifying and endorsable hashes: Let's say you have this resource ("text/plain", "fubar") the trick is to take the hash not just from "fubar" but from the concatenation of the mediatype and the contents: H = sha1 ("text/plainfubar") With this setup, if there is a registry which endorses urn:hash:text/plain:sha1:H then that means the registry has checked 1. that the "fubar" content is ok and 2. that its mediatype is "text/plain" In addition, the value H is *less* prone to forgery because the hashed value is not completely arbitrary: it *must* start with "text/plain". 
[I don't know enough about hash functions to judge if this introduces enough regularity. If more regularity is required, then compute the hash of the resource with the mediatype interleaved in some fixed way, for example, at the start of each 1024 byte data block. Alternatively, the hash might be taken from the resource by exor-ing cyclically with the mediatype. I don't know which alternative would be better.] If this works, then it seems to be a quite strong argument for including content types. -Peter From thiemann at informatik.uni-freiburg.de Mon Jul 21 05:35:09 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: <3F1AFD07.2030509@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> <3F159813.4070903@gmx.de> <3F1AFD07.2030509@gmx.de> Message-ID: >>>>> "Benja" == Benja Fallenstein writes: Benja> Peter Thiemann wrote: Benja> Now, given Benja> urn:sha1:text/plain, Benja> and a pair Benja> ("text/plain", "foobar") Benja> we can verify the against "foobar", and compare Benja> the Benja> "text/plain" from the URI to the "text/plain" in the pair, so the URI Benja> clearly maps to only one such pair (as long as finding a hash Benja> collision is impossible). Thus, sha1 identifiers of this form *would* Benja> be self-verifying. >> There is too much good will and protocol in your proposal. Benja> Sorry, I don't understand what you mean. It means you need to trust somebody else who maintains this pair. If you got the urn and the file, then you can only verify it if you can verify the content type. I'm wondering if that's always possible. >> Here is a >> much simpler way of getting self-verifying and endorsable hashes: >> Let's say you have this resource >> ("text/plain", "fubar") >> the trick is to take the hash not just from "fubar" but from the >> concatenation of the mediatype and the contents: >> H = sha1 ("text/plainfubar") Benja> This is a classic fallacy when working with hashes btw: There is no Benja> way to know which of the following is meant: Benja> ("text/plai", "nfubar") Benja> ("text/plain", "fubar") Benja> ("text/plainf", "ubar") Hmm, I'd say it's trivial because the urn (see below) states the content type as a string. So the boundary is obvious and implicit. >> With this setup, if there is a registry which endorses >> urn:hash:text/plain:sha1:H >> then that means the registry has checked >> 1. that the "fubar" content is ok and >> 2. that its mediatype is "text/plain" Benja> What's the cost? That it doesn't interoperate with today's systems, Benja> which only hash the content; that all sha1 hashes in ciculation on the Benja> Web today are useless. No! 1. They are not useless. If the urn doesn't state the content type, then the hash is applied just to the resource. 2. Interoperation is given in the same way, but you could migrate towards the other scheme. Benja> What's the benefit? That you can use H alone to identify the (content Benja> type, content) pair. Would this really be useful? I don't see how. Well, this is the explanation: >> In addition, the value H is *less* prone to forgery because the hashed >> value is not completely arbitrary: it *must* start with "text/plain". Benja> I fail to see what you mean by "less prone to forgery." What I mean to say is this: if you are hashing a file with some internal structure, then it's much harder to construct a collision. 
This is because the adversary would have to come up with an offending file that has a. the same hash and b. the same internal structure. My proposal aims at adding internal structure artificially just during the computation of the hash. For the fubar example, this means H = sha1 ("text/plainfubar") Suppose some adversary has found a string x so that sha1 (x) = H Too bad! But, what is the probability that x starts with "text/plain"? Very small, but I'm not enough of an expert to judge how much smaller. If the probability is not small enough, then create more structure (see my suggestions in the other message) to make the probability smaller. This way you can outlive broken hash functions, which seems to be an advantage. Benja> We're miscommunicating somewhere, I think. I hope this is becoming clearer :-) -Peter From michael at neonym.net Mon Jul 21 05:35:11 2003 From: michael at neonym.net (Michael Mealling) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs (was: Comments on draft-thiemann-cbuid-urn-00) In-Reply-To: <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> Message-ID: <1058740510.3214.1.camel@blackdell.neonym.net> On Wed, 2003-07-16 at 11:06, Carl Ellison wrote: > I apparently don't know the rules for URN formation. I thought it > was completely free after the "urn:". Is that not true? Essentially, yes. There are syntactic restrictions that exist for all URIs but that's fairly generic. URNs still have to be persistent so you can't put things like domain-names in there unless you time/date stamp them.... -Michael Mealling From bram at gawth.com Mon Jul 21 18:09:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: Message-ID: Sampo Syreeni wrote: > I'd also check whether there are patents on the overall concept of using > sparse codes for swarmcast -- I think using them in conventional multicast > is in fact patented. Good thing online codes are completely unnecessary for swarming. I've said this before, but it's worth repeating: BitTorrent doesn't use online codes because they would increase overhead and produce at best dubious improvements in download rates. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From sherrysemails at yahoo.com Mon Jul 21 21:49:02 2003 From: sherrysemails at yahoo.com (Carolyn Tracy) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] New Wiki List of P2P Conferences In-Reply-To: <3F1B4EC7.5030304@neurogrid.com> Message-ID: <20030722044809.58759.qmail@web20910.mail.yahoo.com> please tell me how i can be removed from this ! I have no interest in getting anymore emails from anyone from here anymore. Thank you ... Sam Joseph wrote: Hi Benja Benja Fallenstein wrote: > I'm unable to register for the Wiki (I get an error message that email > could not be delivered). I suggest that you add USENIX NSDI'04 to the > list-- the CFP mentions P2P systems as one area of interest. Deadline > is September 15th. > > http://www.usenix.org/events/nsdi04/cfp/ > > Thanks for the effort, the Wiki looks like a good idea. No worries - have added the USENIX conference.
Sorry about you not being able to log in - Actually you did successfully create your account and could have logged in - it was just the email notification that failed - I'm working with my ISP to try and fix that problem ... You're in the system as BenjaFallenstein - I believe that you should now be able to log in with the password you created previously - if you can't let me know and I'll reset your account. CHEERS> SAM _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers --------------------------------- Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20030721/d184581c/attachment.htm From conspiracytheory1 at hotmail.com Tue Jul 22 05:42:02 2003 From: conspiracytheory1 at hotmail.com (sam  ) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] unsubscribe me Please Message-ID: >From: "Doug Burton" >Reply-To: p2p-hackers@zgp.org >To: >Subject: [p2p-hackers] unsubscribe me Please >Date: Fri, 4 Jul 2003 18:38:33 -0400 > _________________________________________________________________ Use MSN Messenger to send music and pics to your friends http://www.msn.co.uk/messenger From xmaj at hotmail.com Wed Jul 23 06:03:02 2003 From: xmaj at hotmail.com (reza majidi) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] unsubscribe me pls. Message-ID: _________________________________________________________________ MSN 8 with e-mail virus protection service: 2 months FREE* http://join.msn.com/?page=features/virus From hopper at omnifarious.org Wed Jul 23 23:23:02 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: References: Message-ID: <1059027736.11698.374.camel@monster.omnifarious.org> On Mon, 2003-07-21 at 20:08, Bram Cohen wrote: > Sampo Syreeni wrote: > > > I'd also check whether there are patents on the overall concept of using > > sparse codes for swarmcast -- I think using them in conventional multicast > > is in fact patented. > > Good thing online codes are completely unnecessary for swarming. > > I've said this before, but it's worth repeating: BitTorrent doesn't use > online codes because they would increase overhead and produce at best > dubious improvements in download rates. The only thing I can see them buying you is a little more robustness against seeds disappearing. Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030723/d2ac4e16/attachment.pgp From tyler at waterken.com Thu Jul 24 10:53:02 2003 From: tyler at waterken.com (Tyler Close) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Listing P2P URL schemes Message-ID: I am building a list of P2P URL schemes at: http://www.waterken.com/dev/YURL/#YURL_schemes The list is divided into two sections: URLs that locate active computing agents and URLs that locate files. 
I think many of the participants on this mailing list have URL schemes that should be listed in the second section of the list. Please send me links for other URL schemes that belong on the list. I'll try to update the list in real-time as I receive new links. Thank you, Tyler -- The union of REST and capability-based security: http://www.waterken.com/dev/Web/

From digi at treepy.com Thu Jul 24 11:43:02 2003 From: digi at treepy.com (p@) Date: Sat Dec 9 22:12:21 2006 Subject: AW: [p2p-hackers] Listing P2P URL schemes In-Reply-To: Message-ID: <004501c35213$5306d9c0$0200a8c0@pat>

Here are the treepy URL definitions... the reserved URLs are not complete yet.

-----Original Message----- From: p2p-hackers-admin@zgp.org [mailto:p2p-hackers-admin@zgp.org] On behalf of Tyler Close Sent: Thursday, 24 July 2003 19:31 To: p2p-hackers@zgp.org Subject: [p2p-hackers] Listing P2P URL schemes

I am building a list of P2P URL schemes at: http://www.waterken.com/dev/YURL/#YURL_schemes The list is divided into two sections: URLs that locate active computing agents and URLs that locate files. I think many of the participants on this mailing list have URL schemes that should be listed in the second section of the list. Please send me links for other URL schemes that belong on the list. I'll try to update the list in real-time as I receive new links. Thank you, Tyler -- The union of REST and capability-based security: http://www.waterken.com/dev/Web/

_______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

-------------- next part --------------
========================================================================
Treepy URL Definitions
2003.03.04, Patrick Lauber - Initial document release
========================================================================

The structure of treepy URLs:

abstract: 3pi://directory1.directory2/path1/path2?argument&subargument
virtual:  3pi://entertainment.cinema.movieDB/action/terminator2.avi?name&changed
internal: 3pi://entertainment.cinema.movieDB/52?name&changed

3pi://                       = schema
entertainment.cinema.movieDB = location... normally directory clusters
/action/terminator2.avi      = virtual path to a cluster... this address is defined by the directory cluster (in this example movieDB; there is an internal address too... something like /_17); if the cluster is a directory cluster the path is always '/'
?name                        = argument or subobject... see the list of reserved arguments later
&changed                     = sub-argument or sub-sub-object... see the list of reserved sub-arguments later

The virtual URLs are only used for display in GUIs. The GUI is advised to use internal ones for communication with the core... and virtual ones for communication with users. The reason we have to use internal URLs is that if a directory-cluster admin chooses to move a cluster in the tree, the URL would change... this would be a big security problem, because only the cluster-creator should be allowed to choose the parent (or the URL, which is the same thing).
=======================================================================
Possible Arguments

core/load arguments (GET) (see treepyhttp.txt for more infos on core/load)
-------------------------
Only one argument can be provided at one time.
example: http://localhost:3141/core/load?url="3pi://entertainment.cinema/?_arglist"

Reserved arguments:
Reserved arguments are either write-protected or only writeable by the cluster-creator or moderators.

X = write protected (only core can write)
K = write only by cluster owner
M = write by all mods and cluster owner
C = needs a computation before output
l = is a list
t = is a table
s = string
n = number
b = binary (1 or 0)
c = not accessible via tcp-COM-Load

_*               Joker argument... used for clusters with sources... means download-all
_argcreatorlist  Xt    returns a list of argument creators and their total size of data and other infos
_argdelete       MCb
_argdublicate    MCb
_arglist         Xl    returns a list of all used arguments or subobjects
_argnr           XCn c returns the number of arguments or subobjects
_argmaxsize      Mn    returns the maximum size of one argument
_argtreesigned   MXt   returns the signed argument tree
_argtreeunsigned MXt   returns the unsigned argument tree ... new uploads go here
_argtreeadd
_argtreemove     MCb
_auth            XCb   returns if there is authentication required to access this cluster (white-list)
_blacklist       Kt    returns IPs and keys of banned people
_changed         Xs    returns the date of last change
_createdate      CXs   returns the date of creation
_edit            XCb c returns if you have the right to change content of this cluster
_hasargtree            returns if the cluster uses argument-trees
_info            XCt c returns a text file with some infos about the cluster
_iscreator       XCb c returns if you have a private key
_ismember        Xb  c returns if you are a member of this cluster
_ismod           XCb c returns if you have a private key
_keepsources     Xb    returns if you should register the arguments you have after coming online again
_lastaccess            returns the date of last access to this file
_memberlist      Xl    returns a list of nodes that are members of this cluster
_membernr        XCn c returns the number of cluster members
_minmembers      Cn    returns the minimal number of cluster-members
_modkeys         Kt    returns the keys of the moderators that are allowed to change some content
_move            XCb c returns if you have the right to move this cluster (are you admin of the parent-directory-cluster?)
_name            Xs  c returns the name of the cluster in the tree (used for GUI, defined by directory-parent)
_pathlist        Xl    returns a list of all argument paths (directory only)
_parent          Ks    returns the parent-directory-cluster-url of this url
_parentarg       Xs    returns under which argument the parent has saved you
_ping            XCn c returns the round-trip time of a packet through the cluster
_plugin          Ks    returns the plugin name, plugin version, plugin URL
_pubargs         Kb    returns if the public can create sub-objects (arguments)
_pubargquota     Mb    returns the maximum size in bytes of data saved by one client
_pubargretime    Mn    returns many infos on how many times a minute/hour/day someone can post
_pubkey          Ks    returns the public key of the creator
_updatecache     Mn    returns how big the updates list is (default: 1024)
_upload          XCb c returns if you are allowed to attach other clusters to this cluster
_treehash              returns a TIGER treehash of all arguments
_ranmember       XCs c returns one random member ip:port
_referrers       Xl    returns all referrers to this cluster (links)
_size            XCn c returns the size of all data saved in this cluster
_sources         Xt    returns a list of more clients with what arguments they have
_whitelist       Kt    returns a list of usernames/passwords
_whitelistkeys   Kt    returns pubkeys of users (enhanced security)

No argument returns _info, or, if plugin=directory, returns _argtree and _pathlist.
Other arguments can be used freely if signed by _privkey or if pubargs is true.

Sub-arguments or reserved sub-sub-objects
-----------------------------------------
Multiple sub-arguments are possible (handled like one), like this: ?argument&subargument

_changed         Xs    returns the date this argument has changed
_pubkey          Xs    returns the public key of the subobject creator
_haskey          XCs   returns the private key of the subobject creator
_sig             Xs    returns the signature of the parent-argument signed by the creator
_hash            X     if there is no sig... the hash is saved instead
_encrypted       Xb    returns if the data was encrypted
_enryptpubkey    Xs    returns the pub key with which the data was encrypted (plain/text)

=====================================================================================
core/save arguments (POST)
--------------------------
reserved arguments:
xml = true     upload a xml file that is like infoxml to change infos
create = true  create a new cluster (provide a xml file with details)
all reserved arguments from load apply here too
example: http://localhost:3141/core/save?url="3pi://entertainment.cinema/?_arglist"&data="

From tyler at waterken.com Fri Jul 25 2003 From: tyler at waterken.com (Tyler Close) Date: Sat Dec 9 22:12:21 2006 Subject: Re: AW: [p2p-hackers] Listing P2P URL schemes References: <004501c35213$5306d9c0$0200a8c0@pat> Message-ID:

On Thursday 24 July 2003 14:42, digi@treepy.com wrote: > Here are the treepy url definitions... reserved URL's are not complete > yet. Do any of these URLs provide security guarantees like those provided by Mnet? I should have made more clear that the list I am building is of URL schemes that provide P2P security. I am trying to list other P2P network protocols like Mnet. For example, any P2P file sharing program that identifies and authenticates a file based solely on a cryptographic hash of the file qualifies for the list. Thanks, Tyler -- The union of REST and capability-based security: http://www.waterken.com/dev/Web/
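As a reading aid for the treepy URL definitions above, the following short Python sketch splits a 3pi:// URL into the parts named there (schema, location, path, argument, sub-arguments). The splitting rules are inferred from the examples in the attachment only; the function and field names are illustrative and are not part of treepy itself.

    # Minimal sketch of splitting a 3pi:// URL into the parts named in the
    # treepy notes above. Inferred from the examples; not treepy's code.

    def parse_3pi(url):
        assert url.startswith("3pi://"), "only the 3pi schema is handled here"
        rest = url[len("3pi://"):]

        # Split off "?argument&subargument" first, if present.
        args = []
        if "?" in rest:
            rest, argpart = rest.split("?", 1)
            args = argpart.split("&")        # e.g. ["name", "changed"]

        # Location is the dotted cluster part, path is everything after "/".
        if "/" in rest:
            location, path = rest.split("/", 1)
            path = "/" + path
        else:
            location, path = rest, "/"       # directory clusters use "/"

        return {"location": location, "path": path,
                "argument": args[0] if args else None,
                "subarguments": args[1:]}

    print(parse_3pi("3pi://entertainment.cinema.movieDB/action/terminator2.avi?name&changed"))
    # -> location 'entertainment.cinema.movieDB', path '/action/terminator2.avi',
    #    argument 'name', subarguments ['changed']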
Cheers p@ -----Urspr?ngliche Nachricht----- Von: p2p-hackers-admin@zgp.org [mailto:p2p-hackers-admin@zgp.org] Im Auftrag von Tyler Close Gesendet: Freitag, 25. Juli 2003 17:39 An: p2p-hackers@zgp.org Betreff: Re: AW: [p2p-hackers] Listing P2P URL schemes On Thursday 24 July 2003 14:42, digi@treepy.com wrote: > Here are the treepy url definitions... reserved URL's are not complete > yet. Do any of these URLs provide security guarantees like those provided by Mnet? I should have made more clear that the list I am building is of URL schemes that provide P2P security. I am trying to list other P2P network protocols like Mnet. For example, any P2P file sharing program that identifies and authenticates a file based solely on a cryptographic hash of the file, qualifies for the list. Thanks, Tyler -- The union of REST and capability-based security: http://www.waterken.com/dev/Web/ _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From bram at gawth.com Fri Jul 25 10:18:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: <1059027736.11698.374.camel@monster.omnifarious.org> Message-ID: Eric M. Hopper wrote: > > I've said this before, but it's worth repeating: BitTorrent doesn't use > > online codes because they would increase overhead and produce at best > > dubious improvements in download rates. > > The only thing I can see them buying you is a little more robustness > against seeds disappearing. That's handled by carefully selecting which pieces to download first (basically you start with the rarest ones, but there's some tweaking and implementing it efficiently is nontrivial). The times I still get problems are when everyone who's done downloading drops off leaving no complete copies around, but that runs into information theoretic limits. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Fri Jul 25 10:19:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Listing P2P URL schemes In-Reply-To: Message-ID: Tyler Close wrote: > Please send me links for other URL schemes that belong on the > list. I'll try to update the list in real-time as I receive new > links. BitTorrent hacks in using a mimetype, so it uses regular-looking urls. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From cefn.hoile at bt.com Wed Jul 30 07:39:03 2003 From: cefn.hoile at bt.com (cefn.hoile@bt.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] MMAPPS Day Message-ID: <21DA6754A9238B48B92F39637EF307FD0204B50B@i2km41-ukdy.nat.bt.com> Some of my colleagues here at BT Exact www.btexact.com are involved in a project called MMAPPS (Market Management of Peer-to-Peer Services) which I expect to be of interest to p2p-hackers, p2prg and decentralization subscribers. (Apologies for crosspost if you don't agree). For general project information see: www.mmapps.org For further information on the MMAPPS Project Day see: www.mmapps.org/events A summary of the project is shown inline below for reference. 
Please feel free to contact me for further information and I will pass on your enquiry to the relevant member of the team. Cefn http://www.cefn.com >>>>>>>>>>>>>>>>>>> MMAPPS (Market Management of Peer-to-Peer Services) is creating generic middleware to support a new class of P2P applications that give peers appropriate incentives to co-operate for the good of the whole peer community. These applications will encourage peers to contribute and will then efficiently allocate that contribution amongst the members of the community. The middleware is generic and supports the definition of a wide variety of incentive schemes which may be based on rewarding good behaviour, or on punishing bad. The schemes can involve payments for contribution but can also be based on rules that enforce a minimum contribution through community sanctions. A key aspect is how peers contribution is accounted for; the middleware provides support for a very wide variety of specific accounting and management schemes, since we have come to the firm conclusion that different P2P applications can require very different trade-offs in terms of the necessary security/scalability/anonymity/robustness of such schemes. The MMAPPS middleware framework has the potential to underlie a very wide range of future P2P applications, and allows easy re-use of many different independently-developed accounting schemes (including traditional micropayment schemes, reputation-based schemes, and many more novel, lighter-weight 'record-based' schemes). The project is now approaching its mid-point (and is in its engineering phase) and so has plenty of results to report. If you would like to be kept informed of the project's progress we recommend you do so either via the MMAPPS Newsletter (issued quarterly) or more directly through participation in the FREE 1-day MMAPPS Project Day we will be holding (as part of NGC/ICQT'03). To receive further information on either the Newsletter or Project Day, please send a mail to cefn.hoile@bt.com, or register yourself directly at the MMAPPS web-site www.mmapps.org. MMAPPS Project partners are: AUEB, BT, ETHZ, Mysterian, TA, TUD and ULANC. >>>>>>>>>>>>>>>>>>> Disclaimer: This post represents the views of the author and does not necessarily accurately represent the views of BT.
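Returning to Bram Cohen's remark earlier in this digest that BitTorrent handles robustness by carefully selecting which pieces to download first (starting with the rarest ones), here is a deliberately simplified Python sketch of rarest-first selection. It ignores the tweaking and efficiency concerns Bram mentions, so it should be read as an illustration of the basic idea, not as BitTorrent's implementation.

    # Simplified rarest-first piece selection, in the spirit of Bram Cohen's
    # remark above. Real clients add tweaks and a more efficient data
    # structure; this only shows the basic idea.
    import random

    def pick_next_piece(have, peer_bitfields):
        """have: set of piece indices we already hold.
        peer_bitfields: dict peer_id -> set of piece indices that peer holds.
        Returns the index of a rarest piece we still need, or None."""
        counts = {}
        for pieces in peer_bitfields.values():
            for idx in pieces:
                if idx not in have:
                    counts[idx] = counts.get(idx, 0) + 1
        if not counts:
            return None                      # nothing downloadable right now
        rarest = min(counts.values())
        candidates = [idx for idx, c in counts.items() if c == rarest]
        return random.choice(candidates)     # random tie-break spreads load

    peers = {"a": {0, 1, 2}, "b": {1, 2}, "c": {2}}
    print(pick_next_piece(have={2}, peer_bitfields=peers))  # -> 0 (only one copy)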