From bram at gawth.com Thu Jul 3 08:58:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] IRTF P2P Working Group Formed In-Reply-To: <3EFB48D2.3080602@chapweske.com> Message-ID: Justin Chapweske wrote: > A new IRTF research group, P2PRG (Peer-to-Peer Research Group), has begun, > with the appended charter. Use p2prg-request@ietf.org to subscribe to the > mailing list. > > - Vern Paxson (IRTF chair) I predict that this group will be just as important as Intel's p2p standards group. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From justin at chapweske.com Thu Jul 3 10:03:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] IRTF P2P Working Group Formed In-Reply-To: References: Message-ID: <3F0461B4.8090401@chapweske.com> I disagree. Starting with a research group is a good first step in defining taxonomy and requirements in this space. I'm sure it will be a number of years before an IETF working group is formed and standards are created, but this is a good first step. Bram Cohen wrote: > Justin Chapweske wrote: > > >> A new IRTF research group, P2PRG (Peer-to-Peer Research Group), has begun, >> with the appended charter. Use p2prg-request@ietf.org to subscribe to the >> mailing list. >> >> - Vern Paxson (IRTF chair) > > > I predict that this group will be just as important as Intel's p2p > standards group. > > -Bram Cohen > > "Markets can remain irrational longer than you can remain solvent" > -- John Maynard Keynes -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From adam at cypherspace.org Thu Jul 3 21:48:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030704054745.B13145352@exeter.ac.uk> So one common p2p download approach is to download a file in parts in parallel (and out of order) from multiple servers. (Particularly to achieve reasonable download rates from multiple asynchronous links of varying link speeds.) A common idiom is also that there is a compact authenticator for a file (such as its hash) which people will supply as a document-id. Then we have the issue of people actively jamming p2p networks, so it has become practically interesting to achieve per-chunk authentication in these multiple-server downloads, while retaining a single compact file authenticator. I read the THEX (Tree Hash EXchange format) internet-draft proposed by Justin Chapweske and Gordon Mohr, and I'm taking from that document that it attempts to deal with this problem. However it seems somewhat inefficient (or if used efficiently not to robustly achieve per-chunk anti-jamming for moderate-to-large sized files). They propose using a Merkle Hash Tree (MHT) on the document with base chunks of 1KB. One of their claims is that the MHT itself can be downloaded from different nodes to combat needing to trust the tree server (and presumably for scalability). They appear to propose that the recipient would download log(n) chunks from a tree server (by asking for offsets and lengths in the tree file, presumably using HTTP keep-alive and the offset/length features of HTTP). However this has significant request overhead when the file hashes are 16-20 bytes (the request would be larger than the hash); it also involves at least two connections: one for authentication info, another for downloading chunks.
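For orientation, here is a minimal sketch of the kind of per-chunk Merkle-tree verification being discussed. The 1KB base chunk size follows the draft, but the use of SHA-1, the odd-node convention, and all function names are assumptions made only for illustration; this is not the THEX draft's own format or serialization.

import hashlib

CHUNK = 1024  # 1KB base chunks, as in the draft

def h(data):
    return hashlib.sha1(data).digest()

def build_tree(blob):
    # Level 0 is the leaf hashes (one per 1KB chunk); each higher level pairs
    # adjacent nodes, promoting an unpaired last node unchanged (one common
    # convention); the final level is the single root hash.
    levels = [[h(blob[i:i + CHUNK]) for i in range(0, len(blob), CHUNK)]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) if i + 1 < len(prev) else prev[i]
                       for i in range(0, len(prev), 2)])
    return levels

def audit_path(levels, index):
    # The log2(n) sibling hashes needed to check one leaf against the root.
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            path.append((sibling < index, level[sibling]))  # (sibling on left?, hash)
        index //= 2
    return path

def verify_chunk(chunk, path, root):
    node = h(chunk)
    for sibling_on_left, sibling in path:
        node = h(sibling + node) if sibling_on_left else h(node + sibling)
    return node == root

# e.g.: levels = build_tree(data); root = levels[-1][0]
#       assert verify_chunk(data[:CHUNK], audit_path(levels, 0), root)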
They also mention their format is suitable for serial download (as well as the random-access download I described in the above paragraph). Here I presume (though it is not stated) that the user would be expected to download either the entire set of leaf nodes (1/2 the full tree size), or some subset of the leaf nodes plus enough other nodes to verify that the leaf nodes were correct. (To avoid being jammed during download from the tree server.) Again none of this is explicitly stated but would be minimally necessary to avoid jamming. A simpler and more efficient approach is as follows (I presume a 128-bit (16-byte) output hash function such as MD5, or truncated SHA1; I also presume each node has the whole file): if the file is <= 1KB, download the file and compare to the master hash. If the file is > 1KB and <= 64KB, hash separately each of the 1KB chunks of the file; call the concatenation of those hashes the 2nd level hash set, and call the hash of the 2nd level hash set the master hash. To download, first download the 2nd level hash set (a 1KB file) and check that it hashes to the master hash. Then download each 1KB chunk of the file (in random order from multiple servers) and check each 1KB chunk matches the corresponding chunk in the 2nd level hash set. If the file is > 64KB and <= 4MB, hash separately each of the 1KB chunks of the file; call the concatenation of those hashes the 2nd level hash set. The 2nd level hash set will be up to 64KB in size. Hash separately each of the up to 64 1KB chunks of the 2nd level hash set; call the concatenation of those hashes the 3rd level hash set. Call the hash of the 3rd level hash set the master hash. Download and verification are an obvious extension of the 2-level case. Repeat for as many levels as necessary to match the file size. Bandwidth efficiency is optimal: there is a single compact file authenticator (the master hash: the hash of the 2nd level hash set), and immediate authentication is provided on each 1KB file chunk. To avoid the slow-start problem (can't download and verify from multiple servers until the 2nd level hash set has been downloaded), the 2nd level hash set chunk could have its download started from multiple servers (to discover the fastest one), and/or speculative download of 3rd level chunks or content chunks could be started and verification deferred until the 2nd level hash set chunk is complete. Adam From gojomo at bitzi.com Fri Jul 4 00:07:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: <20030704054745.B13145352@exeter.ac.uk> Message-ID: <014d01c341fa$c0a58bd0$660a000a@golden> Adam Back writes: > I read the THEX (Tree Hash EXchange format) internet-draft proposed by > Justin Chapweske and Gordon Mohr, and I'm taking from that document > that it attempts to deal with this problem. > > However it seems somewhat inefficient (or if used efficiently not to > robustly achieve per-chunk anti-jamming for moderate-to-large sized > files). They propose using a Merkle Hash Tree (MHT) on the document > with base chunks of 1KB. Yes. > One of their claims is that the MHT itself > can be downloaded from different nodes to combat needing to trust the > tree server (and presumably for scalability). I would not put it that way; rather, for some reason, you already trust the root value of the tree. It's been recommended to you by a trusted source, commonly accepted in public forums as a good root, whatever.
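For concreteness, a rough sketch in Python of the layered hash-set construction Adam describes above. The 1KB chunk size and 16-byte hash (truncated SHA1 standing in for "MD5 or truncated SHA1") follow his stated assumptions; the function names and structure are illustrative only and do not come from the thread or from the THEX draft.

import hashlib

CHUNK = 1024      # 1KB chunks, per the message above
HASH_LEN = 16     # 128-bit hashes, per the message above

def h16(data):
    return hashlib.sha1(data).digest()[:HASH_LEN]

def chunks(data):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

def hash_levels(filedata):
    # levels[0] is the file itself; each later level is the concatenation of
    # the 16-byte hashes of the 1KB chunks of the level before it (the "2nd
    # level hash set", "3rd level hash set", ...).  The last level fits in 1KB.
    levels = [filedata]
    while len(levels[-1]) > CHUNK:
        levels.append(b"".join(h16(c) for c in chunks(levels[-1])))
    return levels

def master_hash(filedata):
    # The single compact file authenticator: the hash of the top hash set.
    return h16(hash_levels(filedata)[-1])

def chunk_matches(chunk, index, verified_parent_level):
    # Any 1KB chunk (of the file or of a hash set) is checked against the
    # corresponding 16-byte slot of the already-verified level above it.
    want = verified_parent_level[index * HASH_LEN:(index + 1) * HASH_LEN]
    return h16(chunk) == want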
From adam at cypherspace.org Fri Jul 4 02:51:01 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <014d01c341fa$c0a58bd0$660a000a@golden>; from gojomo@bitzi.com on Fri, Jul 04, 2003 at 12:06:11AM -0700 References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> Message-ID: <20030704105003.A13087560@exeter.ac.uk> > The THEX data format is really best for grabbing a whole subset of > the internal tree values, from the top/root on down, in one > gulp. Yes, that top-down format includes redundant info, Well you can halve the downloaded tree size as you only need the leaves (at your desired resolution) (given processor speeds, link speeds and the efficiency of hash algorithms, I think you can categorically state that you _will_ be able to repopulate the rest faster than you can download it). > and you could grab the data from lots of different people, but it's > so small compared to the content you're getting, why not just get > the whole thing from any one arbitrary peer who has it handy? Because the arbitrary peer may be jamming you. I'm presuming a byzantine network, where some significant proportion of nodes are hostile and working for a well-funded adversary. (Actually the current p2p network looks a lot like this thanks to the RIAA funding p2p jamming-ops). > For example, the full tree to verify a 1GB file at a resolution of > 64KB chunks is only (1G/64K)*2*24(Tiger)=768K, or less than 1/10th > of 1% of the total data being verified. So I'd say, just get it from > anyone who offers it, verify it's consistent with the desired root, > and keep it around -- nothing fancy. And so if the peer was jamming you, you have to download the whole tree again; repeat until you get a tree which matches the master hash. All I'm saying is your jamming resistance is unevenly balanced. You will detect jamming (in your example) after each 64KB chunk for normal downloads. But for the tree you are accepting a lower jamming resistance, namely you download the whole tree before you notice. So in your case the jamming resistance is 12x less effective. (Actually that's 6x because you are downloading redundant data for half the tree so you can skip that; similarly it need only be 5x if you also use a more reasonably sized mainstream hash like SHA1). So THEX has two problems: A) The rational jammer will always jam your tree downloads because they are 4-5x more vulnerable (he may jam content chunks also, but he gets best value for his investment by jamming your tree downloads). B) If you download your tree in parts in parallel to speed it up, your tree download becomes even more vulnerable because when the tree fails to verify you won't know which chunks were jammed, and your overall jamming risk (for tree download) will be higher: it will be (1-p)^k (where p is the proportion of nodes which are jamming and k is the number of nodes downloaded from). This factor will be multiplied by the jamming multiplier coming from A). (If you think this is not a problem you have never tried downloading 700KB files over a period of days from a small but periodically changing set of dialup users (changing because of competition for the download and because of drop-off-and-on) who happen to be the only nodes with the file you want.) Note also in p2p networks links vary a lot.
"Just grabbing" 768kB from the first host that comes to mind may not work out too well if it is a 56kb modem that is sharing its link with 4 other downloads. (Determining link speed ahead of time is also a separate problem, so you can't rely on picking fast links: the jammer will advertise he has many T1 links each performing in reality like 56kb modems; he can do this from his single T1 or cable modem.) > > Repeat for as many levels as necessary to match the file size. > > Bandwidth efficiency is optimal: there is a single compact file > > authenticator (the master hash: the hash of the 2nd level hash set), > > and immediate authentication is provided on each 1KB file chunk. > > On what criteria is this more simple or efficient? Efficient in that the space overhead is negligibly higher, but the jamming resistance of the tree download portion is the same as the file portion. > exactly analogous to the degree-64 tree: it includes segment summary > values covering the exact same amount of source data.) Let's phrase the recursive download another way which would make it directly compatible with THEX. I argue that A) and B) are undesirable properties that can be fixed by the following algorithm: Download the 1KB sized 6th generation chunk of the THEX MHT (which can be done due to the serialization format). Then download 1st of 64 1KB 12th generation chunks (and it can be verified against the 1st 16 byte hash in the 6th generation chunk), then download the next etc. (They can be downloaded out of order from different hosts.) Note that in this case the chunk size is selectable. If the hash size is 16 bytes (a nice size to work with), and the desired chunk size is X bytes, then the generations to download to fill that chunk are multiples of log2(X/16). That algorithm could be input for a THEX draft 3. (I'd also argue you should note that only the leaf nodes are needed due to the fast repopulation operation). Adam From bert at web2peer.com Fri Jul 4 09:03:02 2003 From: bert at web2peer.com (bert@web2peer.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> On Fri, 4 Jul 2003 10:50:03 +0100, Adam Back wrote: > > one arbitrary peer who has it handy? > > Because the arbitrary peer may be jamming you. I'm presuming a > byzantine network, where some significant proportion of nodes are > hostile and working for a well-funded adversary. (Actually the > current p2p network looks a lot like this thanks to the RIAA funding > p2p jamming-ops.) If more than half the nodes are adversarial, the network is going to be prone to all kinds of attacks that you'll never be able to surmount. So let's be generous and say the probability a selected node is adversarial is .5, and the THEX tree size is 1MByte. If you download the tree as a whole, on average you'll be downloading 2Mbyte of summary data per file transfer. Your scheme might allow you to reduce this to (say) 1.1Mbyte on average by detecting malicious nodes a bit earlier, for a savings of .9Mbytes per file (and again that's being very generous in the assumptions). Given that the typical application of these techniques is multi-megabyte file downloads, a .9Mbyte savings is nothing to write home about, particularly given the added complexity your scheme introduces. Perhaps you have some other applications in mind?
From adam at cypherspace.org Fri Jul 4 10:02:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net>; from bert@web2peer.com on Fri, Jul 04, 2003 at 09:02:29AM -0700 References: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> Message-ID: <20030704180124.A12837776@exeter.ac.uk> On Fri, Jul 04, 2003 at 09:02:29AM -0700, bert@web2peer.com wrote: > If more than half the nodes are adversarial, the network is going to > be prone to all kinds of attacks that you'll never be able to > surmount. I think p2p download can work (albeit less efficiently) into quite high jamming levels. Your overhead is a startup and node-maintenance one: you try to first obtain and then maintain as many non-jamming peers as you can to stream at the desired rate. You try to stay with nodes that have proven non-jamming while they let you. (Individual peers often have fairness policies such that they let other users download; and also nodes drop off and re-join, so there are reasons you can not stay with them for the duration of the download). Hence the desire for immediate and comprehensive (tree blocks as well as content blocks) jamming detection. > So let's be generous and say the probability a > selected node is adversarial is .5, and the THEX tree size is 1MByte. > If you download the tree as a whole, on average you'll be downloading > 2Mbyte of summary data per file transfer. Your scheme might allow you > to reduce this to (say) 1.1Mbyte on average by detecting malicious > nodes a bit earlier, for a savings of .9Mbytes per file (and again > that's being very generous in the assumptions). Given that the > typical application of these techniques is multi-megabyte file > downloads, a .9Mbyte savings is nothing to write home about, > particularly given the added complexity your scheme introduces. It has another advantage: that you can safely download the tree in parallel, without admitting higher jamming ratios. I don't think, in conclusion, it adds any significant complexity as it can be expressed as just an approach to downloading the THEX tree. (Which the THEX document anyway supposes you could download in chunks: I just specify how to download those chunks in parallel in a size equal to your preferred data chunk size in a way which avoids jamming to the same degree as THEX plans for document chunks). The chunk size is really an expression of your tolerance for jamming; how much bandwidth you're willing to expend to discover whether a node is jamming or not (if it is jamming you've wasted that chunk; if it's not you've got useful content/tree data.) The jamming tolerance chunk size need not be the same as the optimal request chunk size given your link characteristics. (The request chunk size would typically be larger; some multiple (probably a power of 2 multiple) of the jamming tolerance chunk size chosen to give TCP a chance to get streaming at full speed for some useful period of time). Adam From gojomo at bitzi.com Fri Jul 4 13:28:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload.
and alternate approach References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> Message-ID: <00b601c3426a$9ff6e3f0$660a000a@golden> Adam Back writes: > > The THEX data format is really best for grabbing a whole subset of > > the internal tree values, from the top/root on down, in one > > gulp. Yes, that top-down format includes redundant info, > > Well you can halve the downloaded tree size as you only need the > leaves (at your desired resolution) (given processor speeds, link > speeds and the efficiency of hash algorithms, I think you can > categorically state that you _will_ be able to repopulate the rest faster > than you can download it). I agree, but: (1) That's a fairly easy thing to do via a range-request, or perhaps a maximum of two requests: one to get the tree header, so you know where the raw data begins, and then one to get exactly the one minimum generation you want. (2) The savings is still tiny overall, compared to either the total size of data transferred or (if you're assuming a high level of malicious mischief) the amount of data you'll be dropping on the floor each time you discover a bad peer. So any random-access or minimum-needed optimizations can be deferred. However, some Gnutella engineers have proposed that "desired generation" be included on the URI-line for THEX tree requests, so that they can get just the resolution they want from cooperating peers. This could take the form of another THEX serialization-type -- say, "single generation". > > and you could grab the data from lots of different people, but it's > > so small compared to the content you're getting, why not just get > > the whole thing from any one arbitrary peer who has it handy? > > Because the arbitrary peer may be jamming you. I'm presuming a > byzantine network, where some significant proportion of nodes are > hostile and working for a well-funded adversary. (Actually the > current p2p network looks a lot like this thanks to the RIAA funding > p2p jamming-ops). But the best you can do, in a network where many peers are malicious, is to identify them as soon as possible, and then ignore them, preferring those who are non-malicious. So you still begin with an untrusted arbitrary peer to provide your verification data, just like you'll have arbitrary untrusted peers to provide the content, and there's an unavoidable probability related to the total number of malicious peers that you'll have to throw out the data you receive when it doesn't check out with the trusted root value. Spreading your requests of the verification data out over many peers only increases the chance that you'll have mixed data, some good, some bad. But as soon as you've found one good source, why not just stick with it (for the full tree or full generation)? > > For example, the full tree to verify a 1GB file at a resolution of > > 64KB chunks is only (1G/64K)*2*24(Tiger)=768K, or less than 1/10th > > of 1% of the total data being verified. So I'd say, just get it from > > anyone who offers it, verify it's consistent with the desired root, > > and keep it around -- nothing fancy. > > And so if the peer was jamming you, you have to download the whole > tree again; repeat until you get a tree which matches the master hash. Yes, but the whole tree is so small, why care?
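The size figure quoted above is easy to check directly. The helper below is only an approximation (a full binary tree carries roughly two hashes per leaf chunk), with the chunk and hash sizes as parameters; the 16-byte/1KB case echoes the figures used elsewhere in this thread.

def tree_size(file_size, chunk_size, hash_size):
    # Approximate serialized size of a full binary hash tree:
    # one hash per leaf chunk, plus roughly as many internal-node hashes again.
    leaves = file_size // chunk_size
    return 2 * leaves * hash_size

GB, MB, KB = 1 << 30, 1 << 20, 1 << 10
print(tree_size(1 * GB, 64 * KB, 24) // KB)  # 768 (KB): the Tiger/64KB figure above
print(tree_size(1 * GB, 1 * KB, 16) // MB)   # 32 (MB): ~3% of the file at 1KB/16-byte resolution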
And since you have to discover nodes are bad before you can ignore them on subsequent transactions, even in this bad-case scenario you can now ignore the malicious node when later getting the file-content. And under your alternative proposal, you need roughly the same amount of data -- or sometimes more -- to judge any node or subregion of the tree. (More specifically: if you were to specify a THEX serialization type where 5 out of every 6 generations were omitted, each remaining "level" of that format would be exactly equivalent to the 64-degree case.) In any of these cases, the main benefit of tree-based verification against malicious nodes remains: they now have to expend roughly as much effort to interfere with transfers as acquirers are expending to receive transfers, and they are discovered almost instantly. Malicious nodes can no longer inject tiny amounts of bad data into large downloads to impose an asymmetric cost on acquirers, who discover that their full-file is bad but don't know which region/peer was responsible. > So THEX has two problems: > > A) The rational jammer will always jam your tree downloads because > they are 4-5x more vulnerable (he may jam content chunks also, but he > gets best value for his investment by jamming your tree downloads). This doesn't follow; you can be verifying the breadth-first serialization tree download as it happens, and as soon as a single inconsistent byte appears -- which you will always be able to detect within 2*hash_size bytes -- you can dump that peer. And you can still keep all the data the bad peer sent you that did check out. With, say, SHA1, and disregarding the header overhead, that means you can't be fed more than 40 bad bytes before the problem is evident, because the data doesn't match up with what you've previously verified. Compared to the 1K minimum resolution size on content downloads, that's 25x LESS vulnerable. Even so, these differences are all negligible at this level. > B) If you download your tree in parts in parallel to speed it up, your > tree download becomes even more vulnerable because when the tree fails > to verify you won't know which chunks were jammed, and your overall > jamming risk (for tree download) will be higher: it will be (1-p)^k > (where p is the proportion of nodes which are jamming and k is the > number of nodes downloaded from). This factor will be multiplied by > the jamming multiplier coming from A). This doesn't follow either: some large portions of the tree segments you've grabbed will be self-consistent; some will further be consistent with the desired root. You can keep all those. Only the segment(s) which are self-inconsistent or inconsistent with the root value need be discarded. (This is actually a good reason for the redundant top-down tree format.) > > On what criteria is this more simple or efficient? > > Efficient in that the space overhead is negligibly higher, but the > jamming resistance of the tree download portion is the same as the > file portion. As I've noted above, the jamming resistance of the top-down binary tree is potentially to the resolution of every 2*hash_size byte range, much higher resolution than the 1K file portions. > > exactly analogous to the degree-64 tree: it includes segment summary > > values covering the exact same amount of source data.) > > Let's phrase the recursive download another way which would make it > directly compatible with THEX.
I argue that A) and B) are undesirable > properties that can be fixed by the following algorithm: > > Download the 1KB sized 6th generation chunk of the THEX MHT (which can > be done due to the serialization format). Then download 1st of 64 1KB > 12th generation chunks (and it can be verified against the 1st 16 byte > hash in the 6th generation chunk), then download the next etc. (They > can be downloaded out of order from different hosts.) > > Note that in this case the chunk size is selectable. If the hash size > is 16 bytes (a nice size to work with), and the desired chunk size is > X bytes, then the generations to download to fill that chunk are > multiples of log2(X/16). > > That algorithm could be input for a THEX draft 3. Sure, people could do this with the existing calculation (and serialization) method. It just doesn't offer any tangible benefits to justify its added complexity. You still need the data all the way up to the root before you can make any judgements about the veracity of the lowest nodes. > (I'd also argue you > should note that only the leaf nodes are needed due to the fast > repopulation operation). I agree it would be good to explicitly remind people that any one generation is enough to recalculate the rest of the tree, and if there's real-world demand, specify a single-generation serialization format. - Gordon @ Bitzi ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From gojomo at bitzi.com Fri Jul 4 13:33:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> <20030704180124.A12837776@exeter.ac.uk> Message-ID: <00be01c3426b$5be67670$660a000a@golden> Adam Back writes: > On Fri, Jul 04, 2003 at 09:02:29AM -0700, bert@web2peer.com wrote: > > If more than half the nodes are adversarial, the network is going to > > be prone to all kinds of attacks that you'll never be able to > > surmount. > > I think p2p download can work (albeit less efficiently) into quite > high jamming levels. I agree with Adam here. No matter how many adversarial nodes there are, if they are discovered as soon as they emit 1K of bad data, and thereafter ignored, even a tiny percentage of honest nodes can quickly find each other and bootstrap a useful network. Math rather than majorities will carry the day here. - Gordon From dtburton75 at buckeye-express.com Fri Jul 4 15:39:01 2003 From: dtburton75 at buckeye-express.com (Doug Burton) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] unsubscribe me Please Message-ID: <002e01c3427c$ffd9af20$6713a23f@douglasjjp7bcx> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20030704/b5c8196a/attachment.html From bert at web2peer.com Fri Jul 4 17:17:02 2003 From: bert at web2peer.com (bert@web2peer.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030704171611.27874.h022.c001.wm@mail.web2peer.com.criticalpath.net> On Fri, 4 Jul 2003 13:32:15 -0700, "Gordon Mohr" wrote: > I agree with Adam here. 
No matter how many adversarial nodes there > are, if they are discovered as soon as they emit 1K of bad data, > and thereafter ignored, even a tiny percentage of honest nodes can > quickly find each other and bootstrap a useful network. Wishful thinking, I think ;-) Adversarial nodes do more than simply emit bad data during downloads. They are free to hose/spoof/DOS the search & discovery protocols as well. They may act in collusion with many other nodes, and not necessarily act adversarially in any consistent manner, making detection extremely difficult. So the point is, you may never be able to find *any* honest nodes, unless those happen to be pre-programmed or discovered via out-of-band means, or you rely on some kind of central trusted authority. This is admittedly speculation, but it's not blind. In theoretical models of fully distributed computations in the presence of adversaries (e.g. analysis of reputation or anonymizing systems) it's highly unusual to guarantee any interesting properties when the level of malicious (colluding) peers exceeds half the network. From zooko at zooko.com Fri Jul 4 17:44:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: Message from bert@web2peer.com of "Fri, 04 Jul 2003 17:16:09 PDT." <20030704171611.27874.h022.c001.wm@mail.web2peer.com.criticalpath.net> References: <20030704171611.27874.h022.c001.wm@mail.web2peer.com.criticalpath.net> Message-ID: bert@web2peer.com wrote: > > This is admittedly speculation, but it's not blind. In theoretical > models of fully distributed computations in the presence of > adversaries (e.g. analysis of reputation or anonymizing systems) it's > highly unusual to guarantee any interesting properties when the level > of malicious (colluding) peers exceeds half the network. *Any* interesting properties?
The results that I am familiar with say things sort of like "You can't get the network to perform a general computation (i.e., execute an arbitrary program) correctly in the presence of more than X% misbehaving processors." Those "Byzantine" results are certainly valuable research, but they might easily be misapplied to peer-to-peer systems, where there are lots of properties that we might consider interesting that do *not* require general multiparty computation. For example, in *general* you can't reliably ask a bunch of computers to tell you the contents of the 'sillynet://secrets.txt' file if you don't know how many of the computers you are talking to are malicious liars. However, if you already have the SHA1 hash of the file, it is perfectly feasible to ask a bunch of computers to tell you the contents of the 'mnet:38ppp56jbb8b64zrh8reoadzgn1zpdxc76enkmqduwtf4tug' file. Now perhaps in some contexts finding the contents of a file when you already knew the hash of it doesn't count as an "interesting" property, but I'm certainly interested in doing that! ;-) Regards, Zooko http://zooko.com/ ^-- under re-construction: some new stuff, some broken links From blanu at bozonics.com Fri Jul 4 19:44:02 2003 From: blanu at bozonics.com (Brandon Wiley) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704171611.27874.h022.c001.wm@mail.web2peer.com.criticalpath.net> Message-ID: > highly unusual to guarantee any interesting properties when the level > of malicious (colluding) peers exceeds half the network. This reminds me of a paper I'm fond of, "Dynamically Fault-Tolerant Content Addressable Networks" by Jared Saia et al., from IPTPS'02. "after the removal of 2/3 of the peers by an omniscient adversary who can choose which to destroy, 99% of the rest can access 99% of the remaining data" This is of course about node deletion, not adding evil nodes, but it's such an awesome paper I had to mention it for those who have not read it yet! From baford at mit.edu Sat Jul 5 05:51:02 2003 From: baford at mit.edu (Bryan Ford) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <200307050850.14503.baford@mit.edu> This discussion seems strangely reminiscent of a debate between database engineers on whether to use binary trees or B-trees for indexes. The question is simply whether you use the minimal branching arity necessary to form a tree (namely 2) or whether you increase the arity so as to make index blocks the same size as data blocks. For databases the answer typically depends on where the index is stored. Use a binary tree if the index can be stored entirely in main memory, because the traversal and update code is very quick and simple. But use a B-tree if it's stored on disk, where you can only access data efficiently in blocks: if you access part of a disk block you might as well access the rest, so you want to make the most of each block access and minimize the number of blocks you have to traverse from root to leaf. As far as what's appropriate here, I have to put myself behind Adam's proposal.
Requesting data from other nodes in a P2P network is certainly much more like disk access than memory access - you need to request (and verify) data in decent-size chunks for efficiency, but you also want a reasonable upper bound on the size of each chunk (e.g., to fit into a 1.5K Ethernet packet, or perhaps a 64K UDP datagram if you're feeling generous). THEX minimizes the amount of data that goes into computing each intermediate node (namely two sub-node/block hashes), at the cost of adding a _lot_ more intermediate levels to the tree (e.g., 6X more in the case of 16-byte hashes with 1K blocks) and increasing the total size of the metadata by almost a factor of two. Although THEX theoretically allows one P2P host to request THEX tree data from another in random access fashion and verify it incrementally one (e.g., 32-byte or 40-byte) intermediate node at a time, 40-byte requests are impractically small for network efficiency. Instead the THEX proposal seems to be for hosts always to exchange complete metadata trees - but as already pointed out in this discussion, doing so merely delays the problem because, for large files, the complete metadata tree itself becomes unwieldy (at least for users with low-bandwidth connections) and needs to be broken up. But breaking up large pieces of data into manageable fixed-size pieces is the purpose of using data blocks in the first place. If we have to break up metadata as well as "plain" data, why should we use a different strategy or a different block size to break up the metadata as we used for the data? Given the practical necessity of being able to break up and incrementally verify tree metadata _somehow_, I think that Adam's proposal is conceptually cleaner and simpler; it is likely to be easier to use in protocols and P2P applications because they only have to implement one data blocking mechanism rather than two, and it is certainly more efficient in terms of the total amount of metadata that must be stored and transferred for a given complete file. Ultimately I think this debate will prove to be just like the binary tree vs B-tree debate in the database world, and if history is anything to go by, for any high-latency block/chunk/packet-based "storage medium" the B-trees are going to win hands-down. Cheers, Bryan From bert at web2peer.com Sat Jul 5 08:48:02 2003 From: bert at web2peer.com (bert@web2peer.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030705084700.27408.h022.c001.wm@mail.web2peer.com.criticalpath.net> On Fri, 4 Jul 2003 21:24:59 -0500 (CDT), Brandon Wiley wrote: > > highly unusual to guarantee any interesting properties when the level > > of malicious (colluding) peers exceeds half the network. > > This reminds me of a paper I'm fond of, "Dynamically Fault-Tolerant > Content Addressable Networks" by Jared Saia et al., from IPTPS'02. > > "after the removal of 2/3 of the peers by an omniscient adversary who > can > choose which to destroy, 99% of the rest can access 99% of the > remaining > data" > > This is of course about node deletion, not adding evil nodes, but it's > such an awesome paper I had to mention it for those who have not read > it > yet! Definitely an interesting result, though it applies only to a very specific adversarial model.
You'll also find this paper in IPTPS-2000 which considers a much more diverse set of attacks: Emil Sit and Robert Morris, Security Considerations for Peer-to-Peer Distributed Hash Tables ...unfortunately it doesn't provide any particularly strong results on how to deal with them. There's a lot of interesting work remaining to do in this area, that's for sure. From bert at web2peer.com Sat Jul 5 09:03:02 2003 From: bert at web2peer.com (bert@web2peer.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach Message-ID: <20030705090238.27408.h022.c001.wm@mail.web2peer.com.criticalpath.net> >You'll also find this paper in IPTPS-2000 Sorry, that should have been IPTPS-*2002* From adam at cypherspace.org Sat Jul 5 12:05:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <00b601c3426a$9ff6e3f0$660a000a@golden>; from gojomo@bitzi.com on Fri, Jul 04, 2003 at 01:26:59PM -0700 References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> <00b601c3426a$9ff6e3f0$660a000a@golden> Message-ID: <20030705200440.A12656853@exeter.ac.uk> On Fri, Jul 04, 2003 at 01:26:59PM -0700, Gordon Mohr wrote: > Spreading your requests of the verification data out over many > peers only increases the chance that you'll have mixed data, > some good, some bad. But as soon as you've found one good source, > why not just stick with it (for the full tree or full generation)? > > [..] the whole tree is so small, why care? Because the first node you find may not be a performant one, and you cannot determine performance except empirically. So if you used this algorithm and found an overloaded 56k modem, and you downloaded the entire tree from that single node serially, your tree download (even though it is only 3% of the file size) could easily take longer than the rest of the file. (Similarly this likely still holds for smaller percentages corresponding to larger chunks).
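To put rough numbers on that point: the figures below are hypothetical (not taken from any message in this thread), but they show how a serial tree fetch from one overloaded shared link can take longer than fetching the file body from a reasonably fast parallel swarm.

# Hypothetical illustration only: serial tree fetch from one slow, shared
# source vs. the file body from parallel peers.
tree_bytes = int(0.03 * 700 * 2**20)   # a tree ~3% of a 700MB file
slow_bps   = 56_000 / 5                # a 56k modem shared with 4 other downloads
swarm_Bps  = 125_000                   # ~1Mbit/s aggregate from parallel peers
file_bytes = 700 * 2**20

print(tree_bytes * 8 / slow_bps / 3600)    # serial tree fetch: ~4.4 hours
print(file_bytes / swarm_Bps / 3600)       # parallel file fetch: ~1.6 hours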
> And under your alternative proposal, you need roughly the > same amount of data -- or sometimes more -- to judge any node or > subregion of the tree. I can verify downloads in parallel because the intermediate tree auth and leaf tree chunks are downloaded in a pattern to optimize that. In my approach, the tree auth stuff (everything but the leaves) is redundant, but facilitates parallel download of just the leaf nodes. With your approach a full 50% of the download is redundant and the main value of the non-leaf nodes (given the speed of repopulation) is for authentication only. This is a highly inefficient authentication approach compared to mine. My authentication overhead is about 0.025% of file size and yours is about 1.56% of file size; a 63x higher overhead. (For 1KB chunks; the same comparative efficiency ratio of the two approaches holds for arbitrary chunk sizes). > > A) The rational jammer will always jam your tree downloads because > > they are 4-5x more vulnerable (he may jam content chunks also, but he > > gets best value for his investment by jamming your tree downloads). > > This doesn't follow; you can be verifying the breadth-first > serialization tree download as it happens If you download the full tree this is true. But you have doubled the tree data size to be able to verify as you download sequentially from one node. If however you download the leaf nodes only, with your method you can verify _nothing_ until you've downloaded the _entire_ tree. (Downloading just the leaves is something you said people were asking to be able to do for efficiency reasons). > With, say, SHA1, and disregarding the header overhead, that means you can't > be fed more than 40 bad bytes before the problem is evident, because the > data doesn't match up with what you've previously verified. Compared to > the 1K minimum resolution size on content downloads, that's 25x LESS > vulnerable. I was taking the 1K block size to match the MTU. That is, you can't download a smaller chunk than that (or you don't want to for efficiency). You could in theory probe nodes' good behavior with smaller chunks to build confidence in a node, upping the chunk size over time I suppose. And if that was important to you, you might define a variable chunk size. (But still then my approach allows parallel downloads of just the leaf nodes, just use log2(X/16) as the number of generations where X is that desired smaller chunk size; if the chunk size is 16 bytes then the approaches are equivalent). > > B) If you download your tree in parts in parallel to speed it up, your > > tree download becomes even more vulnerable because when the tree fails > > to verify you won't know which chunks were jammed, and your overall > > jamming risk (for tree download) will be higher: it will be (1-p)^k > > (where p is the proportion of nodes which are jamming and k is the > > number of nodes downloaded from). This factor will be multiplied by > > the jamming multiplier coming from A). > > This doesn't follow either: some large portions of the tree segments > you've grabbed will be self-consistent; some will further be > consistent with the desired root. You can keep all those. You are presuming downloading the entire tree (rather than just the leaf nodes). Anyway, to follow this approach through (presuming you will download the full tree): - All chunks will be self-consistent (presuming a rational adversary). - The problem will be that some will not reach the root.
My approach was about how to download in parallel if you want to download just half the tree (the leaf nodes) and repopulate the rest. A simple variant of my approach applies if you want to download the full tree. In this case similarly: you restrict your parallel fetches to looking no more than 6 generations ahead of what you have fully populated (presuming 16 byte hash). So if you advocate downloading the full tree it might be worth describing this variant of my algorithm to admit parallel downloads of the full tree with no uncertainty. > > > On what criteria is this more simple or efficient? > > > > Efficient in that the space overhead is negligibly higher, but the > > jamming resistance of the tree download portion is the same as the > > file portion. > > As I've noted above, the jamming resistance of the top-down binary tree > is potentially to the resolution of every 2*hash_size byte range, much > higher resolution than the 1K file portions. At a cost of close to 50% transferred tree data expansion over my approach (63x more tree auth data). > > Note that in this case the chunk size is selectable. If the hash size > > is 16 bytes (a nice size to work with), and the desired chunk size is > > X bytes, then the generations to download to fill that chunk are > > multiples of log2(X/16). > > > > That algorithm could be input for a THEX draft 3. > > Sure, people could do this with the existing calculation (and serialization) > method. It just doesn't offer any tangible benefits to justify its added > complexity. You still need the data all the way up to the root before you > can make any judgements about the veracity of the lowest nodes. This is why you download first the top 1KB to allow you to authenticate the next 64KB (which you could download 1KB chunks of in parallel); similarly the 64KB (once downloaded) allows you to verify the next 4MB downloaded in parallel etc. So I'm not saying arbitrary sequence, but still a sequence that admits ample parallelism early in the download of the tree data. > > (I'd also argue you > > should note that only the leaf nodes are needed due to the fast > > repopulation operation). > > I agree it would be good to explicitly remind people that any one > generation is enough to recalculate the rest of the tree, and if there's > real-world demand, specify a single-generation serialization format. The optimal single-generation serialization format is the one I have been specifying. If you download a single generation without using that approach, you have to re-download the entire tree if anything goes wrong, even when downloading from a single node. In parallel from multiple nodes you won't even know which parts of it are wrong, and then the bad probability of failure (1-p)^k that I mentioned comes into play. (Where p is proportion of bad nodes, k number of nodes downloaded from in parallel.) Adam From gojomo at bitzi.com Sat Jul 5 19:47:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> <00b601c3426a$9ff6e3f0$660a000a@golden> <20030705200440.A12656853@exeter.ac.uk> Message-ID: <00b201c34368$c6732300$660a000a@golden> Adam, I believe your approach to be a premature optimization, against a theoretical attack which would not be rational for an adversary to attempt.
Indeed, to consider the larger real-world context, adopting any of these approaches means that injecting a small bad segment into a larger download can no longer corrupt a much larger file in such a way that the jammer is costly to trace. Jammers who supply bad file blocks or bad verification data can be discovered within 1KB (or other choosable threshold) of their first bad data. I would suggest that this fact makes this sort of jamming -- malicious nodes claiming they'll supply something exact but then not supplying it -- sufficiently costly that it would rarely be tried. Attackers might as well just try a DOS of nonsense traffic floods. So cutting the (already-tiny) verification overhead by another 40% or so in data transferred seems to me like a negligible benefit compared to just getting something deployed, and there's nothing more simple than a top-down full-tree immediately-self-verifying transmission. If anyone wanted to do an extreme optimization, or in fact react to some real threat that someday emerges, the full-tree format does allow -- via subrange-requests -- random access to just the levels and portions of levels they want. They could use your ignore-5-out-of-every-6 generations specialization, or they could ignore 3-out-of-4 levels, or 255-out-of-256 levels, whatever strikes their fancy. Using a binary tree at the core allows any of these other approaches to work. I doubt any of these would ever be pursued, though. Seeing how real P2P systems have been programmed, deployed, and thrived (or not), people only need a "good-enough" solution at each step, not an optimal one. The good-enough approach here is to ask a peer for the full tree (or full generation of interest). Then check to see if it's an honest tree (or honest up to some point). If not, discard the dishonest part, mark the peer as bad, and try another peer at random. Good enough, and since every approach requires finding at least one honest peer, this approach eventually succeeds in every case where success is at all possible. There's just one numerical point I'd like to address: Adam Back writes: > > And under your alternative proposal, you need roughly the > > same amount of data -- or sometimes more -- to judge any node or > > subregion of the tree. > > I can verify downloads in parallel because the intermediate tree auth > and leaf tree chunks are downloaded in a pattern to optimize that. > > In my approach, the tree auth stuff (everything but the leaves) is > redundant, but facilitates parallel download of just the leaf nodes. > > With your approach a full 50% of the download is redundant and the > main value of the non-leaf nodes (given the speed of repopulation) is > for authentication only. This is a highly inefficient authentication > approach compared to mine. > > My authentication overhead is about 0.025% of file size and yours is > about 1.56% of file size; a 63x higher overhead. (For 1KB chunks; the > same comparative efficiency ratio of the two approaches holds for > arbitrary chunk sizes). That's a deceptive comparison, disregarding the leaves as overhead. And if leaving out 5/6 of the levels is good, why not leave out 7/8? 15/16? It's just a tradeoff between bandwidth, granularity of resolution, expected level of malicious action, and code/doc complexity. I happen to think bandwidth is cheap -- and getting cheaper. Code/doc complexity is expensive -- and can even be fatal to adoption.
Deploying any of these systems will cause the expected level of jamming attacks to plummet, because they'll no longer offer a big bang for the buck. And it's a pleasant fringe benefit that the most simple interchange format -- a top-down fully-filled-in binary tree -- also offers the best possible progressive verification granularity. Your tradeoffs may be different, but as noted above, your pattern of interactions can be implemented as an optional specialization of the more simple and general approach if ever needed. - Gordon From justin at chapweske.com Sun Jul 6 19:12:03 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [P2Prg] Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704180124.A12837776@exeter.ac.uk> References: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> <20030704180124.A12837776@exeter.ac.uk> Message-ID: <3F08D6D1.6020006@chapweske.com> > > I don't think, in conclusion, it adds any significant complexity as > it can be expressed as just an approach to downloading the THEX tree. > (Which the THEX document anyway supposes you could download in chunks: > I just specify how to download those chunks in parallel in a size equal > to your preferred data chunk size in a way which avoids jamming to the > same degree as THEX plans for document chunks). > So, am I correct in understanding that the current THEX specification affords the flexibility necessary for you to implement your idea? We intentionally do not specify how to use THEX in the draft. We merely provide a flexible serialization format and let everyone adapt it to their own needs. I believe this is as it should be, because each network will have different requirements. In the next version of THEX we will be introducing a new optional serialization type, "rootleaves", for applications that only care about having a single row of hashes, and not the intermediate nodes. If you combine this with a service to dynamically generate THEX trees at a specified depth, you now gain the ability to randomly request any single row of the hash tree w/o doing fancy byte range requests. At some point the P2PRG should probably do a taxonomy of the major approaches to integrity verification. Off the top of my head I can think of: o full file hash o block hashes o hashed/signed/mac'd block hashes (tree hash with 2 levels) o chained block hashes o tree hashes Any others worth mentioning? -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From justin at chapweske.com Sun Jul 6 20:12:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <200307050850.14503.baford@mit.edu> References: <200307050850.14503.baford@mit.edu> Message-ID: <3F08E4EE.8070500@chapweske.com> > Although THEX theoretically allows one P2P host to request THEX tree data from > another in random access fashion and verify it incrementally one (e.g., > 32-byte or 40-byte) intermediate node at a time, 40-byte requests are > impractically small for network efficiency. I do not advocate any one way of implementing THEX. But if you desire to access the tree in a fully random fashion, then simply pipeline the requests to maintain reasonable network efficiency. Again, in practice the amount of THEX data used is a small fraction of the total file size, so even w/o pipelining, the performance impact will be negligible.
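To make the first few entries of the taxonomy above concrete, here is a small sketch over the same data; SHA1, the 1KB block size, and the function names are arbitrary choices for illustration, not anything from the THEX draft. Note that hashing (or signing/MACing) the block-hash list is exactly the "tree hash with 2 levels" entry.

import hashlib

BLOCK = 1024

def sha1(data):
    return hashlib.sha1(data).digest()

def full_file_hash(data):
    return sha1(data)

def block_hashes(data):
    return [sha1(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def two_level_root(data):
    # Hashing (or signing/MACing) the concatenated block hashes yields a
    # single compact authenticator: a depth-2 hash tree.
    return sha1(b"".join(block_hashes(data)))

def chained_hash(data):
    # "Chained block hashes": each value covers its block plus the hash of
    # everything after it, so the first value authenticates the whole file
    # and blocks can be checked one at a time, in order.
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    acc = b""
    for block in reversed(blocks):
        acc = sha1(block + acc)
    return acc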
> Instead the THEX proposal seems > to be for hosts always to exchange complete metadata trees - but as already > pointed out in this discussion, doing so merely delays the problem because, > for large files, the complete metadata tree itself becomes unwieldy (at least > for users with low-bandwidth connections) and needs to be broken up. The default breadthfirst serialization allows hosts to retrieve as little or as much data as they desire. Downloading more data simply increases the verification resolution. You can also certainly fix the size of your THEX trees to a couple hundred kilobytes, as many of the Gnutella developers have done. This way your verification resolution decreases as the file size increases, though the THEX file size stays constant. And that's exactly what we try to accomplish with THEX. We want people to slice and dice it as they see fit for their application. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From justin at chapweske.com Sun Jul 6 20:40:01 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030705200440.A12656853@exeter.ac.uk> References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> <00b601c3426a$9ff6e3f0$660a000a@golden> <20030705200440.A12656853@exeter.ac.uk> Message-ID: <3F08EB71.2040103@chapweske.com> > > I can verify downloads in parallel because the intermediate tree auth > and leaf tree chunks are downloaded in a pattern to optimize that. > > In my approach, the tree auth stuff (everything but the leaves) is > redundant, but facilitates parallel download of just the leaf nodes. > > With your approach a full 50% of the download is redundant and the > main value of the non-leaf nodes (given the speed of repopulation) is > for authentication only. This is a highly inefficient authentication > approach compared to mine. > We are fully aware that for any given download/verification strategy, there will be an optimal THEX serialization that will be 50% smaller than the breadthfirst serialization. However, that custom serialization format will not be as flexible as breadthfirst and may not meet the needs of networks operating under a different set of requirements. Please note that the THEX serialization type is specified as a URI, which allows it to be assigned in a decentralized fashion. So if you feel strongly about the advantages of a custom serialization type over the generic breadthfirst serialization for your specific application, then feel free to specify a new serialization type URI, such as (http://cypherspace.org/spec/thex/adambackserialization). -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From seth.johnson at RealMeasures.dyndns.org Mon Jul 7 09:36:02 2003 From: seth.johnson at RealMeasures.dyndns.org (Seth Johnson) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Call for WIPO DG on Open and Collaborative Public Goods Message-ID: <3F09A01B.EDA15121@RealMeasures.dyndns.org> (Looks like we're about to call WIPO out on the carpet. Information is the one indisputable public good, whatever its form of organization. Please see the APPENDIX below for an overview of categories of public goods being suggested.
-- Seth) -------- Original Message -------- Subject: [Random-bits] WIPO DG asked to convene meeting on open and collaborativeprojects to create public goods Date: Mon, 7 Jul 2003 11:51:21 -0400 (EDT) From: Jay Sulzberger To: fairuse-discuss@nyfairuse.org CC: Jay Sulzberger ---------- Forwarded message ---------- Date: Mon, 07 Jul 2003 11:40:57 -0400 From: James Love To: random-bits@lists.essential.org, ecommerce Subject: [Random-bits] WIPO DG asked to convene meeting on open and collaborative projects to create public goods > http://www.cptech.org/ip/wipo/kamil-idris-7july2003.pdf 7 July 2003 Director General Dr. Kamil Idris, Director General World Intellectual Property Organization Geneva, Switzerland Dear Dr. Idris: In recent years there has been an explosion of open and collaborative projects to create public goods. These projects are extremely important, and they raise profound questions regarding appropriate intellectual property policies. They also provide evidence that one can achieve a high level of innovation in some areas of the modern economy without intellectual property protection, and indeed excessive, unbalanced, or poorly designed intellectual property protections may be counter-productive. We ask that the World Intellectual Property Organization convene a meeting in calendar year 2004 to examine these new open collaborative development models, and to discuss their relevance for public policy. (See Appendix following signatures for examples of open collaborative projects to create public goods). Sincerely, (in alphabetical order) Alan Asher Consumers Association London, UK Dr. K. Balasubramaniam Co-ordinator of Health Action International, Asia Pacific Columbo, Sri Lanka Konrad Becker, Director Institute for New Culture Technologies /t0 Vienna, Austria Yochai Benkler Professor of Law Yale Law School New Haven, CT USA Jonathan Berger Law and Treatment Access Unit AIDS Law Project University of the Witwatersrand South Africa James Boyle Professor of Law Duke Law School Durham, NC USA Diane Cabell Director, Clinical Programs, Berkman Center for Internet & Society Harvard Law School Cambridge, MA, USA Darius Cuplinskas Director, Information Program Open Society Institute Budapest, Hungary Marie de Cenival Charg?e de mission ETAPSUD Agence Nationale de Recherches sur le Sida (A.N.R.S.) INSERM 379 "Epid?miologie et Sciences Sociales appliqu?es ? l'innovation m?dicale" Marseille, France Felix Cohen CEO, Consumentenbond The Hague, the Netherlands Benjamin Coriat Professor of Economics, University of Paris 13 Director of CEPN-IIDE, CNRS Paris, France Carlos Correa Center for Interdisciplinary Studies on Industrial Property and Economics University of Buenos Aires Buenos Aires, Argentina Paul A. David Professor of Economics, Stanford University & Senior Fellow, Stanford Institute for Economic Policy Research Stanford, California, USA Emeritus Fellow, All Souls College, Oxford & Senior Fellow, Oxford Internet Institute Oxford, UK Kristin Dawkins Vice President for International Programs Institute for Agriculture and Trade Policy Minneapolis, MN USA Peter T. DiMauro Center for Technology Assessment Washington, DC USA Rochelle Cooper Dreyfuss Pauline Newman Professor of Law New York University School of Law NY, NY USA Peter Eckersley, Department of Computer Science, and IP Research Institute of Australia, The University of Melbourne Australia Michael B. 
Eisen Public Library of Science San Francisco, CA, and Lawrence Berkeley National Lab Berkeley, CA USA Nathan Geffen Treatment Action Campaign Cape Town, South Africa Gwen Hinze Staff Lawyer Electronic Frontier Foundation San Francisco, CA USA Ellen F.M. 't Hoen LL.M. Medecins sans Frontieres Access to Essential Medicines Campaign Paris, France Jeanette Hofmann Nexus & Social Science Research Center Berlin, Germany Aidan Hollis Associate Professor, Department of Economics, University of Calgary, and TD MacDonald Chair in Industrial Economics Competition Bureau, Industry Canada Gatineau, Quebec Canada Dr Tim Hubbard Head of Human Genome Analysis Wellcome Trust Sanger Institute Cambridge, UK Nobuo Ikeda Senior Fellow, Research Institute of Economy, Trade and Industry Tokyo, Japan Professor Wilmot James Chair, Africa Genome Initiative Social Cohesion & Integration Research Programme Human Sciences Research Council Cape Town, South Africa Niyada Kiatying-Angsulee, Ph.D. Drug Study Group Thailand Philippa Lawson Senior Counsel, Public Interest Advocacy Centre Ottawa, Canada Lawrence Lessig Professor at Law and Executive Director of the Center for Internet and Society Stanford Law School Stanford, CA USA James A. Lewis Director, Technology and Public Policy Program Center for Strategic and International Studies Washington, DC USA Jiraporn Limpananont, Ph.D. Pharmaceutical Patent Project, Social Pharmacy Research Unit (SPR), Faculty of Pharmaceutical Sciences Chulalongkorn University Bangkok, Thailand. James Love Director, Consumer Project on Technology Co-Chair, Trans Atlantic Consumer Dialogue (TACD) Committee on Intellectual Property Washington, DC USA Jason M. Mahler Vice President and General Counsel Computer and Communications Industry Association Washington, DC USA Eric S. Maskin A.O. Hirschman Professor of Social Science Institute for Advanced Study Princeton, NJ USA Professor Keith Maskus Chair, Department of Economics University of Colorado at Boulder. Boulder, CO USA Ken McEldowney Executive Director Consumer Action California USA William McGreevey Director, Development Economics Futures Group Washington, DC USA Professor Jon Merz Center for Bioethics University of Pennsylvania Philadelphia, PA USA Jean Paul Moatti Director, INSERM 379 Facult? de Sciences Economiques Universit? de la M?diterran?e Marseille, France Eben Moglen Professor of Law & Legal History Columbia University General Counsel, Free Software Foundation NY, NY USA Ralph Nader Consumer Advocate Washington, DC USA Hee-Seob Nam, Patent Attorney Intellectual Property Left Korea Progressive Network JINBONET Korea James Orbinski MD Associate Professor Centre for International Health University of Toronto, Canada Bruce Perens Director, Software in the Public Interest Inc. Co-Founder, Open Source Initiative, Linux Standard Base USA Greg Pomerantz, Fellow, Information Law Institute, New York University New York, NY USA Laurie Racine President, Center for the Public Domain Durham, NC USA Eric S. Raymond President, Open Source Initiative USA Juan Rovira Senior Health Economist The World Bank Frederic M. Scherer Emeritus, John F. Kennedy School, Harvard University Cambridge, MA USA Mark Silbergeld Consumer Federation of America Washington, DC USA Richard Stallman Launched the development of the GNU operating system, whose GNU/Linux variant is the principal competitor for Microsoft Windows. 
Cambridge, MA USA Anthony Stanco Center of Open Source & Government George Washington University Washington, DC USA Joseph Stiglitz Professor of Economics and Finance Columbia University Former Chief Economist World Bank Chairman of the White House Council of Economic Advisers from 1995 to 1997 Received Nobel Prize for Economics in 2001 New York, NY USA Peter Suber Research Professor of Philosophy, Earlham College Open Access Project Director, Public Knowledge Senior Researcher, SPARC Brooksville, ME, USA Sir John Sulton Winner of 2002 Nobel Prize for Physiology or Medicine Former Director of the Wellcome Trust Sanger Institute Cambridge, UK Harsha Thirumurthy Yale University, CT USA Alexander C. Tsai, MD Case Western Reserve University Cleveland, OH USA Pia Valota ACU Associazione Consumatori Utenti ONLUS AEC Association of European Consumers socially and environmentally aware Milano, Italy Professor Hal Varian Dean, School of Information and Management Systems University of California at Berkeley. Berkeley, CA USA Machiel van der Velde Co-Chair, Trans-Atlantic Consumer Dialogue (TACD) Committee on intellectual property The Hague, the Netherlands Victoria Villamar le Bureau Europ?en des Unions de Consommateurs/ European Consumers' Organisation Brussels, Belgium Robert Weissman Essential Action Washington, DC USA Professor Jonathan Zittrain Co-Director, Berkman Center for Internet & Society Harvard Law School Cambridge MA USA APPENDIX Open collaborative projects to create public goods These are some of the projects that could be discussed: 1. The IETF and Open Network Protocols. The Internet Engineering Task Force has worked for years to develop the public domain protocols that are essential for the operation of the Internet, an open network that has replaced a number of proprietary alternatives. It is important that WIPO acknowledge the success and importance of the Internet, and appreciate and understand the way the IETF functions. The IETF is currently struggling with problems setting open standards. When the IETF seeks to adopt a standard, there is uncertainty if anyone will later claim the standard infringes a patent. One suggestion to address this problem is to create a system whereby a standards organization could announce an intention to adopt a standard, and after a reasonable period for disclosure, prevent parties from later enforcing non-disclosed infringement claims. 2. Development of Free and Open Software This movement is highly decentralized, competitive, entrepreneurial, heterogeneous, and devoted to the publishing of software that is freely distributed and open. It includes projects that embrace the GNU General Public License (GPL), which uses copyright licenses to require that modified versions also be free software, and projects such as FreeBSD, which use minimal licensing restrictions and permit anyone to make non-free modified versions, as well as projects such as MySQL, which release the code under the GNU GPL but sell licenses to make non-free modified versions, as well as many other approaches. The new Apple operating system runs on top of FreeBSD, and big corporate players like Oracle and IBM run databases and server software on the mostly-GPL'd GNU/Linux operating system. Apache is the leading web page server software. WIPO provides frequent forums where firms that embrace closed and proprietary development models express their views, but very little is heard from those who have embraced open and collaborative development models for free software. 
The astonishing success of this movement should be recognized by WIPO, and policy development should be open to new ways of thinking. These various actors have a variety of values and objectives. Richard Stallman of the Free Software Foundation says "the freedom to change and redistribute software is a human right." Others see this as primarily an issue of how to most efficiently develop and distribute software. The proponents of open collaborative free software projects note that there are powerful reasons why software code should be open and freely copied. Not only is it efficient to copy existing code in new programs, but the transparency of the code allows a large community to find flaws and suggest improvements (Linus Torvalds' observation, popularized by Eric Raymond, that "with enough eyeballs, all bugs are shallow"). The free software movement is very important to the success and the future of the Internet, and it is also quite important in countering Microsoft's massive monopoly power, particularly given the number of commercial competitors to Microsoft that have disappeared. In recent years many governments have begun to embrace open collaborative free software projects. Free software developers are concerned about a number of policies that WIPO is involved in, including whether to allow patents on computational ideas, the future development of digital rights management schemes, and the enforceability of "shrink wrapped" or click-on contracts that contain anticompetitive provisions. 3. The World Wide Web. If measured by the rate at which it has transformed the world, the World Wide Web is the most important publishing success ever. The web was built on public domain protocols, and on documents that were, from the beginning, transparent and open at the level of source code. Long before anyone even knew how copyright would apply to the Internet, millions of documents were being created for free distribution on the Internet. Governments are now routinely publishing documents and data on the web so that they can be freely available, as do multilateral institutions like WIPO. The entire future of the Web will depend upon the extent to which new digital copyright regimes permit such practices as hypertext linking, the use of materials in search engines such as Google, and liberal views toward fair use. 4. The Human Genome Project (HGP). On April 14, 2003, the heads of state of France, the US, the UK, Germany, Japan and China issued a statement, which noted that: "Scientists from six countries have completed the essential sequence of three billion base pairs of DNA of the human genome, the molecular instruction book of human life. . . This information is now freely available to the world without constraints via public databases on the World Wide Web." If Presidents Jacques Chirac and George Bush, Prime Ministers Tony Blair and Junichiro Koizumi, Chancellor Gerhard Schroeder and Premier WEN Jiabao can collaborate on a statement to herald efforts to create a public domain database, free from intellectual property claims, it is time for the World Intellectual Property Organization to better appreciate why these governments did not want the Human Genome patented. 5. The SNP Consortium A different example of a project to create a public domain database involves single nucleotide polymorphisms (SNPs), which are thought to have great significance in biomedical research. In 1999, the SNP Consortium was organized as a non-profit foundation to provide public data on SNPs.
The SNP Consortium is composed of the Wellcome Trust and 11 pharmaceutical and technological companies including Amersham Biosciences, AstraZeneca, Aventis, Bayer, Bristol-Myers Squibb Company, Hoffmann-LaRoche, GSK, IBM, Motorola, Novartis, Pfizer and Searle. The work was performed by the Stanford Human Genome Center, Washington University School of Medicine (St. Louis), the Sanger Centre and the Whitehead Institute for Biomedical Research. The mission of the SNP Consortium was to develop up to 300,000 SNPs distributed evenly throughout the human genome and to make the information related to these SNPs available to the public without intellectual property restrictions. By 2001 it had exceeded expectations, and more than 1.5 million SNPs were discovered and made available to researchers worldwide. The SNP Consortium, the HGP and other similar projects represent different notions regarding the intellectual property rules for databases, and more information about these projects would be useful in evaluating assumptions and informing debates in the WIPO Standing Committee on Copyright as it considers current proposals to convene a diplomatic conference to adopt a treaty on new sui generis intellectual property rules for databases. 6. Open Academic and Scientific Journals The development of the Internet and the World Wide Web has fueled interest in new models for publishing academic and scientific journals. The prices for traditional journals have been sharply rising for years, worsening the gap between those who can afford access to information and those who cannot. In the past several years there has been a proliferation of projects to create open academic and scientific journals. The Public Library of Science was founded by Nobel Prize winner Dr. Harold Varmus and fellow researchers Patrick Brown and Michael Eisen. The Free Online Scholarship (FOS) movement, the creation of the widely read (for profit) BioMed Central to provide "immediate free access to peer-reviewed biomedical research," the Budapest Open Access Initiative (which has been endorsed by 210 organizations), and other similar projects seek to promote new business models for publishing that allow academic and scientific information to be more widely available to the research community. Other efforts to provide reduced price or free access to researchers in developing countries include the Health InterNetwork, which was introduced by the United Nations' Secretary General Kofi Annan at the UN Millennium Summit in the year 2000, a number of projects sponsored by the International Network for the Availability of Scientific Publications, eIFL.Net (Electronic Information for Libraries), a foundation that "strives to lead, negotiate, support and advocate for the wide availability of electronic resources by library users in transition and developing countries," and a new effort by the Creative Commons to create a license for free access to copyrighted materials in developing countries. Recently US Congressman Martin Sabo introduced legislation to require all US funded research to enter the public domain, and others are calling for international cooperation to similarly enhance the scientific commons. 7. The Global Positioning System. This is not an example of a collaborative development model, but it does illustrate the benefits of providing a free information good, in terms of stimulating the development of an entire generation of new applications.
If lighthouses are considered a textbook example of a public good, the modern equivalent might be the Global Positioning System (GPS), which provides the entire world highly accurate positioning and timing data via satellites. GPS signals are used for air, road, rail, and marine navigation, precision agriculture and mining, oil exploration, environmental research and management, telecommunications, electronic data transfer, construction, recreation and emergency response. There are an estimated 4 million GPS users worldwide. The services are offered without charge. Following the Korean Airline disaster, President Reagan offered GPS free to promote increased safety for civil aviation, and more recently President Clinton eliminated the intentional degrading of the system for civilian use. NASA reports that "many years ago we evaluated charging for the civil signal. The more we looked at it, the more convinced we became that by providing the signal free of direct user fees we would encourage technological development and industrial growth. The benefits from that, the new jobs created, and the increased safety and efficiency for services more than outweighed the money we would get from charging -- especially when you consider the additional bureaucracy that would be needed to manage cost recovery. We think that judgement has proven valid, as the world-wide market for GPS applications and services now exceeds $8 billion annually." -- James Love, Director, Consumer Project on Technology http://www.cptech.org, mailto:james.love@cptech.org tel. +1.202.387.8030, mobile +1.202.361.3040 _______________________________________________ Random-bits mailing list Random-bits@lists.essential.org http://lists.essential.org/mailman/listinfo/random-bits From painlord2k at libero.it Mon Jul 7 12:45:01 2003 From: painlord2k at libero.it (Mirco Romanato) Date: Sat Dec 9 22:12:21 2006 Subject: [P2Prg] Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: <20030704090230.13966.h003.c001.wm@mail.web2peer.com.criticalpath.net> <20030704180124.A12837776@exeter.ac.uk> <3F08D6D1.6020006@chapweske.com> Message-ID: <000101c344c0$269f75a0$99b5fea9@painlordcave1> ----- Original Message ----- From: "Justin Chapweske" To: "Adam Back" Cc: ; Sent: Monday, July 07, 2003 4:11 AM Subject: Re: [P2Prg] Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach I'll write a few examples; correct me as you feel the need. > At some point the P2PRG should probably do a taxonomy of the major > approaches to integrity verification. Off the top of my head I can > think of: > o full file hash Like SHA-1 in gnutella > o block hashes xBin? verify a file chunk against the root hash value? Mnet? > o hashed/signed/mac'd block hashes (tree hash with 2 levels) like MD4 used in the eDonkey network? > o chained block hashes like in chained block cryptography? This needs the previous block/hash in the series to verify the current block against its hash? > o tree hashes like TigerTree in Gnutella > Any others worth mentioning? Mirco From lgonze at panix.com Mon Jul 7 12:50:02 2003 From: lgonze at panix.com (Lucas Gonze) Date: Sat Dec 9 22:12:21 2006 Subject: [P2Prg] Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <000101c344c0$269f75a0$99b5fea9@painlordcave1> Message-ID: <2DC9873D-B0B4-11D7-AA05-000393455590@panix.com> >> Any others worth mentioning?
>> One other: downloading overlapping segments from different servers and comparing the overlaps. That's insecure, obviously. - Lucas From varx32 at umkc.edu Tue Jul 8 09:07:02 2003 From: varx32 at umkc.edu (Rajegowda, Vikas Aralaguppe (UMKC-Student)) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] unsubscribe Message-ID: <051D9E794E394F4B8A03A2FC4F234D78452B39@KC-MAIL4.kc.umkc.edu> -----Original Message----- From: bert@web2peer.com [mailto:bert@web2peer.com] Sent: Fri 7/4/2003 7:16 PM To: p2p-hackers@zgp.org Cc: p2p-hackers@zgp.org; p2prg@ietf.org; adam@cypherspace.org Subject: Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/ms-tnef Size: 3674 bytes Desc: not available Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030708/9a1116c8/attachment.bin From bram at gawth.com Tue Jul 8 17:18:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704054745.B13145352@exeter.ac.uk> Message-ID: For what it's worth, BitTorrent does one dumber than Adam's approach and always has exactly two levels it its 'tree' - the list of all hashes, then everything else. It's easy to specify, easy to implement, seems to work fine, and unlike THEX, is widely deployed. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From gojomo at bitzi.com Wed Jul 9 01:02:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach References: Message-ID: <01be01c345f0$50244e10$660a000a@golden> Bram Cohen writes: > For what it's worth, BitTorrent does one dumber than Adam's approach and > always has exactly two levels it its 'tree' - the list of all hashes, then > everything else. EDonkey uses a similar approach, though its bottom level blocks are a fixed 9.5MBs in size. > It's easy to specify, easy to implement, seems to work fine, and unlike > THEX, is widely deployed. Yes, and that's further evidence that "simple and good enough" is often all that's needed. Even though the verification trees must be fetched in their entirety before they can be confirmed, that's not much of a problem or vulnerability in practice, at least when considering the class of people seeking many-megabyte files. The chief advantage I see in the deep-tree (THEX) approach is that it scales to any desired block size and verification resolution, while retaining the same root values. With deep trees, a paranoid darknet which does all sharing via origin-spoofed multiply-forwarded self-verifying 1KB unreliable- transport packets can use the same file-identity root values as a mainstream CDN using long-lived TCP connections to share 1MB blocks at a time. Further, if the level of malicious jamming rises on a network, cooperating nodes may then opt to verify incoming data at a finer resolution, minimizing the amount of wasteful activity any dishonest node can trigger. To vary the verification resolution of BitTorrent identifiers, a new block size must be chosen, which then alters the top-level identifier. 
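A rough sketch of that property, assuming SHA-1 over 1KB base blocks rather than the actual Tiger/THEX parameters: building the full tree once gives every verification resolution at the same time, and the root never changes.

    import hashlib

    def _h(prefix, data):
        return hashlib.sha1(prefix + data).digest()

    def build_rows(data, base=1024):
        # return the tree as rows: rows[0] == [root], rows[-1] == per-block leaf hashes
        row = [_h(b'\x00', data[i:i + base]) for i in range(0, len(data), base)] or [_h(b'\x00', b'')]
        rows = [row]
        while len(row) > 1:
            # pair up nodes; an odd node at the end is promoted unchanged
            row = [_h(b'\x01', row[i] + row[i + 1]) if i + 1 < len(row) else row[i]
                   for i in range(0, len(row), 2)]
            rows.insert(0, row)
        return rows

    rows = build_rows(b'x' * (1024 * 16))
    # rows[0][0] is the file's identifier; a casual client might fetch only
    # rows[2] (4 coarse hashes), a paranoid one fetches rows[-1] (16 leaf hashes),
    # and both verify against the very same root.

A flat two-level list, by contrast, fixes that choice at publish time.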
- Gordon @ Bitzi From magnus at bodin.org Wed Jul 9 03:58:08 2003 From: magnus at bodin.org (Magnus Bodin) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] RFC3548 is out Message-ID: <20030709032024.GC15571@bodin.org> Just for curiosity; RFC3548 is finally out and it has a nice little reference to this list: [8] Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", World Wide Web http://zgp.org/pipermail/p2p-hackers/2001- September/000315.html, September 2001. i.e. I have a copy of the RFC here for laziness: /magnus -- http://x42.com From hopper at omnifarious.org Wed Jul 9 11:39:02 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030704054745.B13145352@exeter.ac.uk> References: <20030704054745.B13145352@exeter.ac.uk> Message-ID: <1057775910.13939.896.camel@monster.omnifarious.org> Skipped content of type multipart/related-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030709/ab0d3732/attachment.pgp From justin at chapweske.com Wed Jul 9 12:31:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <1057775910.13939.896.camel@monster.omnifarious.org> References: <20030704054745.B13145352@exeter.ac.uk> <1057775910.13939.896.camel@monster.omnifarious.org> Message-ID: <3F0C6D68.4060106@chapweske.com> +1 Eric M. Hopper wrote: > On Thu, 2003-07-03 at 23:47, Adam Back wrote: > >> /So one common p2p download approach is to download a file in parts >> in/ /parallel (and out of order) from multiple servers. (Particularly >> to/ /achieve reasonable download rates from multiple asynchronous >> links of/ /varying link speeds). A common idiom is also that there is >> a compact/ /authenticator for a file (such as it's hash) which people >> will supply/ /as a document-id./ > > > There is an important thing you can do with an order 2 authentication > tree that is much harder to do with a tree of an order larger than 2. > That's download the authentication data needed to verify each packet > against the root along with the packet itself. > > For example, if you have a 2-way THEX tree for 2^3*blocksize data, it > will look something like this: > > > > If someone transmits the data for node A, in order to verify node A > completely, the hashes for B, J, and N need to be transmitted. No other > hashes are needed since they are already known, as in the case for the > root node, or can be calculated. > > If someone then transmits the data for node B, no hashes need to be > transmitted since the reciever already has all the needed hashes. For > C, only D is needed. > > If you have an 8-way THEX tree, you end up with a diagram like this: > > > > If someone recieves node A, they will have to also get the hashes for > nodes B-H in order to verify node A. This is MUCH more information than > with a 2-way THEX tree, and as the depth of both trees grows, the 2-way > tree is favored more and more. > > I think one useful measure is how much data is needed to verify a given > block as compared to the size of a block. 
If you have a block size of > 64KB, and a hash data size of 32 bytes (I think SHA-1 is just a little > too weak, and prefer SHA2-256), then you can deal with a 16MB file and > still ensure that the maximum amount of data needed to verify any given > block is less than the size of a block. If you only use a 16 byte hash, > then you can send 4GB file and still keep that property. If you > maintain a 32 byte hash, but go to a 128KB block, you can transmit an > 8GB file and maintain that property. > > This is also highly resistant to jamming. If you get the verification > data from the same node that sent you the data block in the first place, > that node will be unable to spoof the verification data to make a bad > block look like a good one. If you get the verification data from a > different node than sent you the data block, that node will be unable to > spoof the hashes in order to make a good block look like a bad one. So > errors in either the verification hashes, or the block are easily and > quickly detectable. > > Lastly, it is easy to specify which subset of verification data you > need. Any data block you recieve may need > log2(number_of_blocks_in_file) hashes worth of verification data. You > can simply send a bitstring with one bit for each hash you might > potentially need saying whether or not you actually need it or not. > > Sorry this is in HTML. I just didn't want to have to use ASCII art for > the diagrams because it'd be a huge and annoying pain. > > Have fun (if at all possible), > > -- > There's an excellent C/C++/Python/Unix/Linux programmer with a wide > range of other experience and system admin skills who needs work. > Namely, me. _http://www.omnifarious.org/~hopper/resume.html_ > -- Eric Hopper > > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From darkelf at arabia.com Wed Jul 9 13:39:02 2003 From: darkelf at arabia.com (Oscar Cisneros) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] RFC3548 is out Message-ID: <388601c34659$9e6070e0$6ac9010a@mail2world.com> Can you elaborate, please? What is this RFC all about? -Oscar http://emote.net <-----Original Message-----> From: Magnus Bodin Sent: 9/7/2003 5:24:24 AM To: p2p-hackers@zgp.org Subject: Re: [p2p-hackers] RFC3548 is out Just for curiosity; RFC3548 is finally out and it has a nice little reference to this list: [8] Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list", World Wide Web http://zgp.org/pipermail/p2p-hackers/2001- September/000315.html, September 2001. i.e. I have a copy of the RFC here for laziness: /magnus -- http://x42.com _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20030709/bde26f14/attachment.html From magnus at bodin.org Wed Jul 9 14:01:01 2003 From: magnus at bodin.org (Magnus Bodin) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] RFC3548 is out In-Reply-To: <388601c34659$9e6070e0$6ac9010a@mail2world.com> References: <388601c34659$9e6070e0$6ac9010a@mail2world.com> Message-ID: <20030709210018.GL7976@bodin.org> On Wed, Jul 09, 2003 at 01:35:21PM -0700, Oscar Cisneros wrote: > Can you elaborate, please? What is this RFC all about? It's finally an RFC that covers _only_ base64, base32 and base16 so other standards may refer to that one instead of some embedded stuff. It's nothing fancy at all. 
Just found it funny that it referenced this list for just a minor comment about URL-safe chars. /magnus -- http://x42.com From jlevine at bayarea.net Wed Jul 9 14:11:02 2003 From: jlevine at bayarea.net (jlevine@bayarea.net) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] P2punks meeting next Monday evening July 14 7:30pm Message-ID: You know the routine... Also I have a few new Glyphguy t-shirts for anyone whose wardrobe is getting thin. See you there. -------------- Where: Dana Street Roasting Company 744 Dana St., Mountain View Phone: (650) 390-9638 1/2 block off Castro St. When: 7:30pm onward Website: http://www.bitbin.org/p2punks From adam at cypherspace.org Thu Jul 10 01:00:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <1057775910.13939.896.camel@monster.omnifarious.org>; from hopper@omnifarious.org on Wed, Jul 09, 2003 at 01:38:30PM -0500 References: <20030704054745.B13145352@exeter.ac.uk> <1057775910.13939.896.camel@monster.omnifarious.org> Message-ID: <20030710085938.A13047545@exeter.ac.uk> On Wed, Jul 09, 2003 at 01:38:30PM -0500, Eric M. Hopper wrote: > There is an important thing you can do with an order 2 > authentication tree that is much harder to do with a tree of an > order larger than 2. That's download the authentication data needed > to verify each packet against the root along with the packet itself. Yes, Gordon made this same observation. At the end of that thread I think we reached the conclusion that it would be more flexible to not compute the 8-way THEX the way you show in the 8THEX diagram (actually I proposed 64-way, but that's less convenient to draw). Instead the 8-way would just be the 8 leaf nodes from diagram 2THEX. This allows you to mix requirements. If you want the possibility to do what you describe (download just the auth nodes necessary to authenticate a given chunk), you can do that. If you want the possibility to optimally efficiently download the tree itself in parallel and be able to verify chunks at some resolution immediately, you do what I described (download stripes through the tree at 6 generational gaps in sequence -- i.e. download first the 64 nodes at generation 6 in parallel; once you've done that you can download the 64^2 nodes at generation 12 etc; once you've got the full leaf node set you can then download the entire file in parallel with immediate verification of bad chunks). And if you don't care about that extra space efficiency of downloading the leaves only, but you do care about downloading the tree in parallel, you can do the variant of what I described where you download the first chunk's worth of the full tree. Then you download in parallel from that offset the next section of the full tree, again no further than 6 generations ahead of what you have fully populated. (Where the generation gap is given by log2(chunk-size/hash-size)). > I think one useful measure is how much data is needed to verify a > given block as compared to the size of a block. If you have a block > size of 64KB, and a hash data size of 32 bytes [...] then you can > deal with a 16MB file and still ensure that the maximum amount of > data needed to verify any given block is less than the size of a > block. [...] And that would be a simplifying argument? (I.e. if the auth data is at most the chunk size, then you can by definition download it serially). Or just an interesting statistic?
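For concreteness, the arithmetic behind that statistic works out as follows. This is a back-of-envelope sketch assuming a full binary tree with no rounding; note the 16MB figure in the quote counts the hash size in bits, and with byte sizes the numbers come out much larger.

    import math

    def largest_file(block_size, hash_size):
        # one sibling hash per tree level verifies a block, so the auth data is
        # about log2(num_blocks) * hash_size bytes; it fits inside a single block
        # as long as log2(num_blocks) <= block_size / hash_size
        levels = block_size // hash_size
        return (2 ** levels) * block_size          # bytes

    print(math.log2(1024 // 16))        # 6.0 -- the 6-generation gap for 1KB chunks, 16-byte hashes
    print(largest_file(1024, 32))       # about 4.4e12 bytes (~4TB) for 1KB blocks, 32-byte hashes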
> Lastly, it is easy to specify which subset of verification data you > need. Any data block you receive may need > log2(number_of_blocks_in_file) hashes worth of verification data. > You can simply send a bitstring with one bit for each hash you might > potentially need saying whether or not you actually need it or not. That might be an interesting little protocol for THEX input. Note however it only works where each node _has_ the whole file. For systems which do swarmcasting (bittorrent, edonkey?), they don't. However I suppose if one wanted this, they could retain the log2 hash path that presumably they got when they fetched that chunk. (You could coalesce chunk log2 hash paths as they are downloaded if desired to save local storage space, and still be able to recreate the paths). gnunet supports downloads of files in minimally UDP suitable chunk sizes (if I recall Chris Grothoff said 1KB chunks in his PET03 presentation or chatting afterwards). Adam From adam at cypherspace.org Thu Jul 10 01:07:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: bittorrent tree mechanism (Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach) In-Reply-To: ; from bram@gawth.com on Tue, Jul 08, 2003 at 05:17:50PM -0700 References: <20030704054745.B13145352@exeter.ac.uk> Message-ID: <20030710090630.B13047545@exeter.ac.uk> On Tue, Jul 08, 2003 at 05:17:50PM -0700, Bram Cohen wrote: > For what it's worth, BitTorrent does one dumber than Adam's approach and > always has exactly two levels in its 'tree' - the list of all hashes, then > everything else. > > It's easy to specify, easy to implement, seems to work fine, and unlike > THEX, is widely deployed. I'd guess that the limit of what you can draw from the empirical evidence is that it works fine from a functional perspective rather than necessarily an adversarial one. That is to say, as far as I know we are at this point ahead of the arms-race with the jammers, who are satisfying themselves with exploiting the weakness of the rating systems to discover pre-rated known-good hashes - and just publishing mislabelled files, empty files and files full of taunts. Do you swarm-cast the tree? Or is the tree downloaded from the bittorrent index server? Or is it downloaded from a random node? Do you check the consistency of the 2nd level with respect to the master hash prior to swarmcasting content? What would the bittorrent client do if the tree failed? Fail with an error message or repeat until success? Adam From adam at cypherspace.org Thu Jul 10 01:12:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: gnunet transport (Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach) In-Reply-To: <01be01c345f0$50244e10$660a000a@golden>; from gojomo@bitzi.com on Wed, Jul 09, 2003 at 01:01:32AM -0700 References: <01be01c345f0$50244e10$660a000a@golden> Message-ID: <20030710091117.C13047545@exeter.ac.uk> On Wed, Jul 09, 2003 at 01:01:32AM -0700, Gordon Mohr wrote: > With deep trees, a paranoid darknet which does all sharing via > origin-spoofed multiply-forwarded self-verifying 1KB unreliable- > transport packets btw that transport, chunk size, and forwarding for server and client anonymity objectives describe gnunet, I believe (or at least one of its transports). Not sure if there are any gnunet people on this list.
Adam From adam at cypherspace.org Thu Jul 10 01:34:02 2003 From: adam at cypherspace.org (Adam Back) Date: Sat Dec 9 22:12:21 2006 Subject: p2p network assumptions for download auth problem (Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach) In-Reply-To: <00b601c3426a$9ff6e3f0$660a000a@golden>; from gojomo@bitzi.com on Fri, Jul 04, 2003 at 01:26:59PM -0700 References: <20030704054745.B13145352@exeter.ac.uk> <014d01c341fa$c0a58bd0$660a000a@golden> <20030704105003.A13087560@exeter.ac.uk> <00b601c3426a$9ff6e3f0$660a000a@golden> Message-ID: <20030710093327.A13197394@exeter.ac.uk> On Fri, Jul 04, 2003 at 01:26:59PM -0700, Gordon Mohr wrote: > But the best you can do, in a network where many peers are malicious, > is to identify them as soon as possible, and then ignore them, > preferring those who are non-malicious. Actually I think this assumption may not be necessarily true in the general case. This goes back to what assumptions about what the p2p network one makes. The current kazaa jamming is not by peers, but by publishers. The publishers try to make their content look otherwise attractive (by giving rave review self-proclaimed meta data) to encourage others to download, and then magnify the effect, as people tend to download from fast sources, which means many sources; and for it to reach many sources it has to compete with good content which people will tend to delete less quickly. Of course they may also be running a number of peers holding their jammed data to firstly publish, but secondly to inflate it's apparent popularity. My interest is in server and user privacy, so I tend to think in terms of the p2p network features that enable them. So to get back to the assumptions, if we take the full general case, which is I think has aspects of gnunet (UDP transport option, small packets, forwarded so you don't necessarily know which node is serving, and add swarmcasting to that mix, and server deniability wrt to what it is serving, then ignoring nodes that serve bad content might not even be optimal as firstly you don't know which node served it, and serial tree downloads from a single node may not be available due to fragmentation and server deniability. So I'd say in the general case I think you need the parallel but authenticatable per packet approach I gave. Freenet also does the chunking and redirection with cacheing for server privacy, user privacy and dynamic load balancing. In specific simpler cases where you don't care about tree efficienciy, server deniability isn't there, forwarding and privacy isn't there then practically you can do just not bother for simplicity (as perhaps bittorrent may be doing). Adam From hopper at omnifarious.org Thu Jul 10 08:59:02 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030710085938.A13047545@exeter.ac.uk> References: <20030704054745.B13145352@exeter.ac.uk> <1057775910.13939.896.camel@monster.omnifarious.org> <20030710085938.A13047545@exeter.ac.uk> Message-ID: <1057852681.13939.924.camel@monster.omnifarious.org> On Thu, 2003-07-10 at 02:59, Adam Back wrote: > And that would be a simplifying argument? (Ie if the auth data is the > auth chunk size, then you can by definition downlaod it serially). > > Or just an interesting statistic? Interesting statistic. 
Since a blocksize is chosen mainly because of network and protocol stack properties, it should be fixed for the lifetime of a protocol run. It's nice to see then what you can accomplish with a full block of data. > > Lastly, it is easy to specify which subset of verification data you > > need. Any data block you recieve may need > > log2(number_of_blocks_in_file) hashes worth of verification data. > > You can simply send a bitstring with one bit for each hash you might > > potentially need saying whether or not you actually need it or not. > > That might be an interesting little protocol for THEX input. > > Note however it only works where each node _has_ the whole file. For > systems which do swarmcasting (bittorrent, edonkey?) then they don't. > > However I suppose if one wanted this, they could retain the log2 hash > path that presumably they got when they fetched that chunk. (You > could coalesce chunk lo2 hash paths as they are downloaded if desired > to save local storage space, and still be able to recreate the paths). This is what I was imagining would be done. Every node should have the hash values to be able to verify all the blocks it has recieved, and therefor could validly send. > gnunet supports downloads of files in minimally UDP suitable chunk > sizes (if I recall Chris Grothoff said 1KB chunks in his PET03 > presentation or chatting afterwards). I also miscalculated badly as I used the size of the hash in bits rather than bytes. Using correct math... With 1KB blocks, you could have a 4TB file and still have all the verification data for a block fit into a block. This is even with a 256 bit hash. If you were willing to expand the verification blocks to 1280 bytes, you could have this property for a 1PB file. From what I understand, Hollywood digital masters are in the low terabyte range, so a 1280 byte verification block should be good until everybody's adopted gigabit ethernet and larger block sizes make a lot of sense. :-) Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030710/802b4e2e/attachment.pgp From b.fallenstein at gmx.de Thu Jul 10 12:56:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Comments on draft-thiemann-cbuid-urn-00 Message-ID: <3F0DC458.6090401@gmx.de> [cc:ed to p2p-hackers and urn-nid] Dear Peter, I just saw your recent I-D in the archives of the urn-nid mailing list. (I'm not subscribed, thus the delay in reaction.) http://lists.research.netsol.com/pipermail/urn-nid/2003-June/000348.html I'm working on a system (Storm, ) having very similar addressing needs; in fact, I've been in the process of preparing a registration for an informal URN namespace using MIME type + cryptographic hash to identify typed octet stream resources. We are building a p2p-based extension to the Web, based on this. I think there are also plans to register a formal namespace for 'bitprints'-- http://bitzi.com/developer/bitprint which are combinations of a SHA-1 hash and a Merkle hash tree root using Tiger hashes. This namespace would not include content types. 
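As a rough illustration of what such an identifier looks like: this is only a sketch, the exact bitprint format is the one defined at the Bitzi URL above, and Python's hashlib has no Tiger, so only the SHA-1 half is computed.

    import base64, hashlib

    def sha1_base32(data):
        # 20-byte SHA-1 digest -> 32-character base32 string (160 bits / 5 bits per char)
        return base64.b32encode(hashlib.sha1(data).digest()).decode('ascii')

    # a bitprint pairs this with a base32 Tiger tree root, roughly
    # "<sha1-base32>.<tigertree-base32>"; only the first half is shown here
    print("urn:sha1:" + sha1_base32(b"example file contents"))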
Bitprints are used by Bitzi and Onion Networks . Our project also uses bitprints for compatibility. (The only thing setting our namespace apart is that we include MIME types.) I hope that we can archieve some collaboration here. It would be good if we could reach agreement on a single namespace, so that URNs in this namespace can be resolved with either of our systems. Below are some high-level comments regarding the approach of your I-D. (I'll save low-level comments for a later stage of the discussion. :) ) - It seems obvious that a shared namespace would be a good thing. If someone made a link between two files in one of your repositories, for example, and someone else published the linked-to file using our system, then a browser understanding both systems could follow the link from your repository by downloading the file through our system. - Your proposal attempts to be general by allowing different hash functions to be used. I am wondering about that: It seems good practice to keep protocol parameters extensible, but it also means that there will be different ids for the same resource-- impractical when you try to look it up under one id, but it's stored under another! On the other hand, if you say "it's *this* hash" there will be people who'll want to use another, and they'd have to create their own namespace. Since this is not a standard-- it's an informational RFC-- there is no reason why they wouldn't. I think the way to go would be to provide a general namespace for hash-based URNs, but to specify one 'prefered' way to use it, noting that there are several systems implementing this way, already. But I'm not sure. - All your identifiers use only a single hash function. Especially for URNs, which are supposed to be long-lived, there is a good reason to use hashes generated by more than one function: If one of the functions is broken (but the other isn't simultaneously), the ids don't become useless; you can use timestamping to extend the lifetime of your ids indefinitely (as long as the two hash functions you happen to use at a given time are never broken simultaneously). [unforch the only reference I can find on this right now is the US patent on it :-(, # 5,373,561 .] I would suggest adding the ability to use more than one hash function in the same URN. The syntax could look for example like this: urn:cbuid:md5.sha1:. - Our project picked bitprint to be compatible-- because there were already at least two other independent projects using it, and it was explicitly promoted in the interest of compatibility. So I would suggest bitprints as the 'recommended' set of hash functions for this system. Using the above example syntax, they'd read like this: urn:cbuid:sha1.tigertree:. - The emerging 'industry standard' at least in the p2p community seems to encode hashes in base32; it provides shorter ids than base16 (hex), yet is also case-insensitive and uses only alphanumerics. Using the same hash, but different ASCII representations seems really icky: You *are* using the same technology, yet your ids don't resemble each other at all. I suggest that you use base32 in your namespace; see RFC 3548 if you need a definition. - Your mechanism for hashing parts of an email separately is quite application-specific; that's a pity, given that the namespace is quite generally useful otherwise. 
Different applications may very easily have different needs for breaking up an entity into parts; for example, I could easily imagine that somebody would like to hash each body part of a multipart message independently. It seems like solving your need through some mechanism outside this namespace registration would make the namespace simpler and more generally useful. What exactly do you need it for, anyway? - Two syntax considerations: Firstly, it would seem like a good idea to choose a syntax similar to that of RFC 2397 (The "data" URL scheme) since it also represents the MIME types of the resources it identifies in the URI. So you'd have something like, urn:cbuid:sha1:text/plain, In the flavor of URI that doesn't include content types, you could simply leave that part off. urn:cbuid:sha1: (Unambiguous because hashes cannot contain commas.) This would probably make the scheme more attractive to folks who don't want to include content types, since there would be no 'artefacts' (":*:") related to them in the content-type-less syntax. Secondly and finally, cbuid is hard to remember and easy to misspell, IMHO. If it's going to be a general namespace for cryptographic hash-based id, I'd propose simply calling it 'hash'-- urn:hash:sha1: :-) Hoping to spark some discussion, - Benja From medinajoe1 at msn.com Fri Jul 11 10:37:02 2003 From: medinajoe1 at msn.com (JOSEPH MEDINA) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] REMOVE ME Message-ID: An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20030711/0603ee25/attachment.html From zooko at zooko.com Fri Jul 11 11:32:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] list admin trivia (was: REMOVE ME) In-Reply-To: Message from "JOSEPH MEDINA" of "Fri, 11 Jul 2003 10:36:06 PDT." References: Message-ID: > REMOVE ME Your quiet, attentive listadmins will deal with this, as always. FYI, I also filter out about half-a-dozen spams a day (or, more precisely, I check all the stored spam and filter *in* the occasional good post from a non-subscriber) and I deal with the people who show up every couple of weeks looking for someone to go into business with them manufacturing phony phone cards for use in Europe. Let me know if you want me to forward that last kind to you. Just kidding. But let this be a lesson to you: don't give your mailing list a name that includes the string "hack". --Z From bram at gawth.com Sat Jul 12 07:24:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: <20030710085938.A13047545@exeter.ac.uk> Message-ID: Adam Back wrote: > Note however it only works where each node _has_ the whole file. For > systems which do swarmcasting (bittorrent, edonkey?) then they don't. eDonkey does do swarming. It's actually quite sophisticated, supporting putting downloaders in queues and rarest first piece downloads. But it doesn't have any tit for tat leech resistance properties. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Sat Jul 12 07:31:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: bittorrent tree mechanism (Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. 
and alternate approach) In-Reply-To: <20030710090630.B13047545@exeter.ac.uk> Message-ID: Adam Back wrote: > That is to say as far as I know we are at this point ahead of the > arms-race with the jammers who are satisfying themselves with > exploiting the weakness of the rating systems to discover pre-rated > known-good hashes - and just publishing mislabelled files, empty files > an dfiles full of taunts. That's an issue of file discovery. The hashing information can be included with the basic metainfo for a file - it isn't all that much bigger than the hash tree root is, at least compared to the file as a whole. BitTorrent completely skips out on the whole discovery issue by making it be launched by clicking on a hyperlink. That does a good job of getting rid of all the fake file spam. > Do you swarm-cast the tree? Or is the tree downloaded from the > bittorrent index server? Or is it downloaded from a random node? It's downloaded from the original web site, frequently but not necessarily the same machine as the tracker. > Do you check the consistency of the 2nd level with respect to the > master hash prior to swarmcasting content? There is no master hash sent. > What would the bittorrent client do if the tree failed? Fail with > error message or repeat until success? Repeat until success. Piece failing happens once in a while even with nothing bad going on in the system. I don't know why. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From tpm101 at gmx.net Sun Jul 13 19:32:02 2003 From: tpm101 at gmx.net (Tim Muller) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] THEX efficiency for authenticated p2p dload. and alternate approach In-Reply-To: References: Message-ID: <200307140331.45299.tpm101@gmx.net> On Saturday 12 July 2003 15:23, Bram Cohen wrote: > eDonkey does do swarming. It's actually quite sophisticated, supporting > putting downloaders in queues and rarest first piece downloads. But it > doesn't have any tit for tat leech resistance properties. From http://www.edonkey2000.com and http://www.overnet.com (20 June 2003): ----- "eDonkey2000 now with the Horde!!! 6.20.03 - There is a new version on the download page. It includes a new download system called Horde. Horde makes downloading even faster. You will join the Horde for a file you are downloading. You will then find other users that are also in the Horde and partner with them. This means that you will each send parts of the file to each other until it is complete. This way you work closely with other people that are also downloading the file to complete it together. The Horde will work together to ensure that the file is downloaded as fast as possible to everyone. Horde is the leech killer. When you download in a Horde you are always seeking partners that give you the best speeds. Since everyone is doing this, those that have the highest upload speeds will also get the highest download speeds. If you don't upload then you wont find people to partner with you so your downloads will be sluggish. With Horde the more you give the more you will receive." ----- ;-) -Tim From jlevine at bayarea.net Sun Jul 13 22:17:02 2003 From: jlevine at bayarea.net (jlevine@bayarea.net) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] P2punks meeting Monday evening July 14 7:30pm Message-ID: Reminder... ---------- Forwarded message ---------- You know the routine... Also I have a few new Glyphguy t-shirts for anyone whose wardrobe is getting thin. See you there. 
-------------- Where: Dana Street Roasting Company 744 Dana St., Mountain View Phone: (650) 390-9638 1/2 block off Castro St. When: 7:30pm onward Website: http://www.bitbin.org/p2punks From tor.klingberg at gmx.net Mon Jul 14 10:26:02 2003 From: tor.klingberg at gmx.net (Tor Klingberg) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] RFC3548 is out References: <388601c34659$9e6070e0$6ac9010a@mail2world.com> <20030709210018.GL7976@bodin.org> Message-ID: <00f701c34a2c$ef6f96b0$722c43d5@Scaleo> From: "Magnus Bodin" > On Wed, Jul 09, 2003 at 01:35:21PM -0700, Oscar Cisneros wrote: > > Can you elaborate, please? What is this RFC all about? > > It's finally an RFC that covers _only_ base64, base32 and base16 so > other standards may refer to that one instead of some embedded stuff. I hope the base32 spec matches that used by Gnutella and Gordon More's Bitzi, found at http://groups.yahoo.com/group/the_gdf/files/Proposals/Working%20Proposals/HU GE/draft-gdf-huge-0_94.txt (section 2.3) I suppose it does. Just want to check. /Tor From Raphael_Manfredi at pobox.com Mon Jul 14 11:13:02 2003 From: Raphael_Manfredi at pobox.com (Raphael Manfredi) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: RFC3548 is out In-Reply-To: <00f701c34a2c$ef6f96b0$722c43d5@Scaleo> References: <388601c34659$9e6070e0$6ac9010a@mail2world.com> <20030709210018.GL7976@bodin.org> <00f701c34a2c$ef6f96b0$722c43d5@Scaleo> Message-ID: Quoting p2p-hackers@zgp.org from ml.p2p.hackers: :I hope the base32 spec matches that used by Gnutella and Gordon More's :Bitzi, found at :http://groups.yahoo.com/group/the_gdf/files/Proposals/Working%20Proposals/HU :GE/draft-gdf-huge-0_94.txt (section 2.3) It does match. Raphael From jlevine at bayarea.net Mon Jul 14 13:56:02 2003 From: jlevine at bayarea.net (jlevine@bayarea.net) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] P2punks meeting TONITE Monday evening July 14 7:30pm Message-ID: Last pester...see you all there! James -------------- Where: Dana Street Roasting Company 744 Dana St., Mountain View Phone: (650) 390-9638 1/2 block off Castro St. When: 7:30pm onward Website: http://www.bitbin.org/p2punks From thiemann at informatik.uni-freiburg.de Tue Jul 15 08:24:03 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: <3F0DC458.6090401@gmx.de> References: <3F0DC458.6090401@gmx.de> Message-ID: Dear Benja, BF> [cc:ed to p2p-hackers and urn-nid] BF> I just saw your recent I-D in the archives of the urn-nid mailing BF> list. (I'm not subscribed, thus the delay in reaction.) BF> http://lists.research.netsol.com/pipermail/urn-nid/2003-June/000348.html [ Nice to see that somebody is listening to this list, since nobody is really subscribed to it. I only received a message that my message awaits moderator approval. ] BF> I'm working on a system (Storm, BF> ) The project description link on that page is stale. BF> having very similar addressing needs; in fact, I've been in the BF> process of preparing a registration for an informal URN namespace BF> using MIME type + cryptographic hash to identify typed octet stream BF> resources. We are building a p2p-based extension to the Web, based on BF> this. This is not quite what we are after. We are interested in using p2p techniques for implementing a global sharable mail store. 
The main application is to use the ids as a replacement for IMAP's concept of message uids to allow for easy synchronization of client caches and replicas of the mail server. Hence the bias towards the message/rfc822 type in the draft. However, the intention was to design a scheme which is usable for other information storage and retrieval. Perhaps, your storm project can provide the storage layer that we need? BF> 'bitprints'-- ... Thanks for the pointer! I was not aware of those efforts. BF> I hope that we can archieve some collaboration here. It would be good BF> if we could reach agreement on a single namespace, so that URNs in BF> this namespace can be resolved with either of our systems. Same here. BF> Below are some high-level comments regarding the approach of your BF> I-D. (I'll save low-level comments for a later stage of the BF> discussion. :) ) BF> - It seems obvious that a shared namespace would be a good thing. If BF> someone made a link between two files in one of your repositories, BF> for example, and someone else published the linked-to file using our BF> system, then a browser understanding both systems could follow the BF> link from your repository by downloading the file through our BF> system. I agree completely. That's why we made some effort in keeping the scheme simple and extensible. This kind of support also makes the namespace application stronger. BF> - Your proposal attempts to be general by allowing different hash BF> functions to be used. I am wondering about that: It seems good BF> practice to keep protocol parameters extensible, but it also means BF> that there will be different ids for the same resource-- impractical BF> when you try to look it up under one id, but it's stored under BF> another! My general feeling is that this is not a huge problem as long as you stay inside one application framework. For example, in our application, you perform search queries against a database of meta information and then you get the results in terms of these URNs. Next, you try to retrieve (some of) these URNs basically from known hosts. So our application relies more on the uniqueness properties to achieve distribution and replication rather than on the naming function. BF> On the other hand, if you say "it's *this* hash" there will be BF> people who'll want to use another, and they'd have to create BF> their own namespace. Since this is not a standard-- it's an BF> informational RFC-- BF> there is no reason why they wouldn't. BF> I think the way to go would be to provide a general namespace for BF> hash-based URNs, but to specify one 'prefered' way to use it, noting BF> that there are several systems implementing this way, already. But I'm BF> not sure. Well, I'm pretty sure that the URN scheme should *not* fix one particular hash function. Instead, it should be extensible so that it does not become obsolete just because a hash function is broken or somebody discovers a new super-safe or super-efficient hash function. BF> - All your identifiers use only a single hash function. Especially for BF> URNs, which are supposed to be long-lived, there is a good reason to BF> use hashes generated by more than one function: If one of the BF> functions is broken (but the other isn't simultaneously), the ids BF> don't become useless; you can use timestamping to extend the BF> lifetime of your ids indefinitely (as long as the two hash functions BF> you happen to use at a given time are never broken BF> simultaneously). 
[unforch the only reference I can find on this BF> right now is the US patent on it :-(, # 5,373,561 BF> .] This is interesting but not directly relevant (I think) because we are not dealing with certificates here. Rather you want to increase the confidence (if a server supports more than one hash function) and robustness (if a server supports just one of a selection of hashes) of a data access. I really don't see why timestamping should be required because each hash value lives indefinitely long. BF> I would suggest adding the ability to use more than one hash function BF> in the same URN. BF> The syntax could look for example like this: BF> urn:cbuid:md5.sha1:. That sounds like a good proposal to me, it gives you increased confidence and robustness virtually for free. I'll put a concrete syntax proposal at the end of this message. BF> - Our project picked bitprint to be compatible-- because there were BF> already at least two other independent projects using it, and it was BF> explicitly promoted in the interest of compatibility. So I would BF> suggest bitprints as the 'recommended' set of hash functions for BF> this system. Using the above example syntax, they'd read like this: BF> urn:cbuid:sha1.tigertree:. I don't think this recommendation should be a part of a namespace application. This choice is really application dependent, so it is part of the application's description. I would not mind, though, to use it in an example (like: if you transform the id like this, then you get a valid bitprint id), I just would not want to make a normative statement. BF> - The emerging 'industry standard' at least in the p2p community seems BF> to encode hashes in base32; it provides shorter ids than base16 BF> (hex), yet is also case-insensitive and uses only BF> alphanumerics. Using the same hash, but different ASCII BF> representations seems really icky: You *are* using the same BF> technology, yet your ids don't resemble each other at all. I suggest BF> that you use base32 in your namespace; see RFC 3548 if you need a BF> definition. This is a hairy issue. I understand your reasoning. After studying the RFC I tend *not* to commit to any particular coding, but rather make the encoding a parameter with some reasonable default. Then the identifier equivalence section should state explicitly which representation is considered the normalized one. I'm not sure about the reasonable default. Can you give me a reason why p2p folks stick to base32? I don't really see an advantage over base64 for ids that are never handled by humans. BF> - Your mechanism for hashing parts of an email separately is quite BF> application-specific; that's a pity, given that the namespace is BF> quite generally useful otherwise. Different applications may very BF> easily have different needs for breaking up an entity into parts; BF> for example, I could easily imagine that somebody would like to hash BF> each body part of a multipart message independently. This is a misunderstanding of the intention, so this requires clarification in an update of the draft. The point is that many data formats contain header or other meta information (emails, images, mp3). The mode parameter signals that this meta information is ignored and only the raw contents are hashed. Since the hash only determines the raw contents, the specification needs to define how to complete the contents to a valid instance document of the specified type. In the case of an email message, this means to add the required fields from the RFC2822 specification. 
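A rough sketch of what a mode-1 id might look like for message/rfc822 (purely illustrative: it assumes Python's email package and SHA-1, the helper name is made up, and the draft itself defines no code):

    # Sketch: a "mode 1" id hashes only the raw message body, ignoring headers,
    # so two copies of the same mail delivered with different Message-ID: or
    # Received: headers still get the same identifier. Not taken from the draft.
    import email, hashlib

    def mode1_sha1(rfc822_bytes):
        msg = email.message_from_bytes(rfc822_bytes)
        body = msg.get_payload(decode=True) or b""   # raw contents only
        return hashlib.sha1(body).hexdigest()        # encoding choice is still open

    msg_a = b"Message-ID: <1@example.org>\r\nSubject: hi\r\n\r\nsame body\r\n"
    msg_b = b"Received: by relay\r\nMessage-ID: <2@example.org>\r\nSubject: hi\r\n\r\nsame body\r\n"
    print(mode1_sha1(msg_a) == mode1_sha1(msg_b))    # True: headers differ, contents agree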
For other formats, other things have to be done. And for formats that only consist of raw contents, it will not make sense to define a mode 1 id. BF> It seems like solving your need through some mechanism outside this BF> namespace registration would make the namespace simpler and more BF> generally useful. What exactly do you need it for, anyway? see the top of this mesage. BF> - Two syntax considerations: Firstly, it would seem like a good idea BF> to choose a syntax similar to that of RFC 2397 (The "data" URL BF> scheme) since it also represents the MIME types of the resources it BF> identifies in the URI. So you'd have something like, BF> urn:cbuid:sha1:text/plain, I don't think it is a good idea to put the mediatype at this point because it separates the sha1 and the fields which belong together, logically. BF> In the flavor of URI that doesn't include content types, you could BF> simply leave that part off. BF> urn:cbuid:sha1: BF> (Unambiguous because hashes cannot contain commas.) This would BF> probably make the scheme more attractive to folks who don't want to BF> include content types, since there would be no 'artefacts' (":*:") BF> related to them in the content-type-less syntax. I don't see the point of doing so. Fact is that the ids are to be processed by machines and only occasionally will they pop up "in the human eye". Reading and writing them should be straightforward to implement. Hence, my preference is to have a fixed number of fields separated by ":" and simply leave unspecified slots empty (admittedly, the "*" is artificial). That is urn:cbuid::sha1.tigertree::. [the empty slot after the sha1.tigertree is reserved for specifying the encoding of the following hashes, see the grammar below] BF> Secondly and finally, cbuid is hard to remember and easy to misspell, BF> IMHO. If it's going to be a general namespace for cryptographic BF> hash-based id, I'd propose simply calling it 'hash'-- BF> urn:hash:sha1: Well, I could be talked into that. The name 'cbuid' stems from the idea that those URNs are going to generalize IMAP's uids as mentioned above. However, given that these URNs are not for human consumption (or are they), I see no convincing technical argument in favor of any particular name. Do you have one? Proposed Grammar: cbuid-nss = type-spec ":" hash *1(":" type-specific-extension) type-spec = [media-type *parameter] parameter = ";" "mode" "=" 1*DIGIT / ";" token "=" token token = 1*(ALPHA / DIGIT) hash = hash-scheme ":" hash-enc ":" hash-values *("/" hash-values) hash-scheme = hash-item *("." hash-item) hash-item = "md5" / "sha1" / "hash127" / hash-token hash-token = token hash-enc = ["base16" / "base32" / "base64"] hash-values = hash-value *("." hash-value) hash-value = token Changes: - typespec can be empty - more than one hash function - hash-enc[oding] added Questions: * does there have to be a registry for names of hash functions? Clearly, the namespace definition cannot be updated for each new hash function. * which value of should be the default (and probably be the result of the normalization process)? I can guess your answer... Best wishes -Peter From gojomo at bitzi.com Tue Jul 15 08:52:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 References: <3F0DC458.6090401@gmx.de> Message-ID: <008401c34ae8$ef44fdd0$660a000a@golden> Peter Thiemann writes: > Can you give me a reason why p2p folks stick to base32? 
I don't really > see an advantage over base64 for ids that are never handled by humans. Content identifiers are often sent via email; they are also (very occasionally) put on paper. Content identifiers also appear inside other URIs (such as HTTP) or as filenames. Base64 characters aren't always legal in filenames, and may need to be escaped in other URIs. Content identifiers also occasionally get usefully catalogued by full-text indexers, but Base64 characters are usually considered word-boundaries or other punctuation by such tools. Thus Base32 identifiers can be sought as atomic words, while Base64 identifiers cannot. For example, try: http://www.google.com/search?q=EPH3MTGDELUYJU7UDWSRA6B3PAYVEILO Sometimes it may be handy to use a truncation of the full identifier -- say the first 4-6 characters -- as a nickname of the full identifier to distinguish (in a non-secure way) between variants of otherwise similar files. Such use only heightens the other problems with Base64: your short identifier could be "/a+A/4" instead of something like "AZB3A4". - Gojomo @ Bitzi ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From justin at chapweske.com Tue Jul 15 09:12:01 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: References: <3F0DC458.6090401@gmx.de> Message-ID: <3F1427B7.6010402@chapweske.com> > > Well, I'm pretty sure that the URN scheme should *not* fix one > particular hash function. Instead, it should be extensible so that it > does not become obsolete just because a hash function is broken or > somebody discovers a new super-safe or super-efficient hash function. > I actually disagree. In order to avoid chosen-algorithm attacks, the set of standardized hashes should be kept very small and directly under the control of the IETF and at the guidance of the CFRG. By having it as a top level URN scheme such as urn:sha1, implementors will tend to focus on the algorithm itself rather than the notion of generic hash-based naming. We must avoid having a generic hash name space that allows developers to add new hashes willy-nilly. I believe that many non crypto-savvy developers would tend to support every algorithm specified in the name space without understanding that by supporting multiple agorithms you introduce a weakest-link condition. Also, for reasons of interoperability, it would be useful if the number of different URNs be kept small. A large number of the P2P developers have agreed upon SHA1 for the time being, which could potentially enable very simple interoperability between these systems. I also think some mechanism should be introduced to deprecate hash schemes over time. So while urn:md5 would be a decent name space to add for compatibility with existing MD5-based applications, such a scheme should be immediately denoted as being deprecated to avoid having new applications adopt MD5. > > BF> I would suggest adding the ability to use more than one hash function > BF> in the same URN. > > BF> The syntax could look for example like this: > > BF> urn:cbuid:md5.sha1:. > > That sounds like a good proposal to me, it gives you increased > confidence and robustness virtually for free. I'll put a concrete > syntax proposal at the end of this message. > I don't believe this is a good idea from a security perspective. 
I fear that most implementors would only verify one of the hashes, and unless they make a judicious choice about the hash to verify, they are again opening themselves to a chosen-algorithm attack. Otherwise, implementors w/o the proper context to decide which hash is the strongest are forced to verify both hashes, which will cut their hashing performance roughly in half. Zooko, do you have any thoughts on this? These seem like the types of attacks that you've spent a lot of time thinking about. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From zooko at zooko.com Tue Jul 15 09:24:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: Message from Peter Thiemann of "15 Jul 2003 17:14:37 +0200." References: <3F0DC458.6090401@gmx.de> Message-ID: > Can you give me a reason why p2p folks stick to base32? I don't really > see an advantage over base64 for ids that are never handled by humans. This was discussed on the p2p-hackers mailing list: http://zgp.org/pipermail/p2p-hackers/2001-September/date.html Look for messages with the subject "please prefer base 64 over base 32". I named that thread, because I started it by arguing in favor of base 64. In the course of the discussion Gordon Mohr and others convinced me to change my mind, and I adopted base-32 afterward. Regards, Zooko http://zooko.com/ From zooko at zooko.com Tue Jul 15 09:46:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: Message from Justin Chapweske of "Tue, 15 Jul 2003 11:11:35 CDT." <3F1427B7.6010402@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> Message-ID: Justin Chapweske wrote: > > > BF> I would suggest adding the ability to use more than one hash function > > BF> in the same URN. > > > > BF> The syntax could look for example like this: > > > > BF> urn:cbuid:md5.sha1:. [...] > I don't believe this is a good idea from a security perspective. [...] [...] > Zooko, do you have any thoughts on this? These seem like the types of > attacks that you've spent a lot of time thinking about. I *have* thought about this issue. The short of it is: I'm not sure how to do it right, so I'm not going to do it. For example, what should the required/allowed behavior be when one of the hashes fails to match and the matches? Are implementors required to check all hashes that are included? I suspect that there might be a way to do secure, smooth, backwards- compatible upgrade past certain kinds of hash algorithm breakages. However, I haven't seen a complete description of how it would work. (The algorithm-negotiation features of SSL/TLS might be considered an example of what *not* to do, and in any case they cannot be carried over to this application since they are interactive.) So, lacking a clear understanding of how to do secure and useful algorithm upgrade, I have satisfied myself with simply hardcoding SHA-1. I accept that the identifiers thus generated might become unreliable in as little as ten years' time. When I *do* decide to change algorithms, I intend to do so unambiguously by either changing the namespace or relying on the length of the identifier. Mnet URI's currently look like this: mnet:38ppp56jbb8b64zrh8reoadzgn1zpdxc76enkmqduwtf4tug (They use SHA-1, exclusively and non-optionally.) 
If I were to change to SHA-256 or something, I would probably make them look like one of these: mnet:7nbcku4ijbk848kgzakqs316hdnbb6magsqag5hybrw1gmqf5b46xwenwz9um1qqhw8o (Uses the new, longer length to indicate that it uses the new algorithm.) or: mneu:7nbcku4ijbk848kgzakqs316hdnbb6magsqag5hybrw1gmqf5b46xwenwz9um1qqhw8o znet:7nbcku4ijbk848kgzakqs316hdnbb6magsqag5hybrw1gmqf5b46xwenwz9um1qqhw8o mnet2:7nbcku4ijbk848kgzakqs316hdnbb6magsqag5hybrw1gmqf5b46xwenwz9um1qqhw8o (Uses a different namespace.) I'm not at all concerned about using more namespace identifiers. There will be tens of thousands of different namespace identifiers used during the next ten years, nearly all of which will immediately die and become corpse namespaces. If Mnet still has any relevance at all in 2013, then it will be deserving of another namespace identifier. Regards, Zooko http://zooko.com/ From b.fallenstein at gmx.de Tue Jul 15 15:59:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs (was: Comments on draft-thiemann-cbuid-urn-00) In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> Message-ID: <3F1486C8.3000706@gmx.de> Hi all, This is a discussion about the registration of a URN namespace based on cryptographic hashes, i.e., a namespace of octet streams as identified by their hashes. Some security-related issues have (unsurprisingly) turned up, so I'm now cross-posting to three mailing lists-- - urn-nid (technical discussion of URN namespace registrations) - p2p-hackers (because one important user base would be the p2p community) - the IRTF crypto forum (to pick their minds about the security issues). I think it's appropriate in all three, but please stay strictly on topic :-) Justin Chapweske wrote: > [Peter Thiemann wrote: -bf] >> Well, I'm pretty sure that the URN scheme should *not* fix one >> particular hash function. Instead, it should be extensible so that it >> does not become obsolete just because a hash function is broken or >> somebody discovers a new super-safe or super-efficient hash function. [Later in the mail, Peter even suggested a registry for hash functions.] > I actually disagree. In order to avoid chosen-algorithm attacks, > the set of standardized hashes should be kept very small and directly > under the control of the IETF and at the guidance of the CFRG. I agree; if we have a urn:hash: namespace or similar, the list of allowable hash functions should be fixed in the namespace registration, and only be changeable by going through the RFC process. Regarding chosen-algorithm attacks, we have to keep in mind what our adversary model is. It should be noted that the hash function is picked by the person publishing a URN, not by the person a file is downloaded from. An attack using a hash collision would still be possible, but it would go more like this: - Find a hash collision, H = h(x) = h(y), x != y. - Use x to obtain some form of a "good rating" on H; this could include something like having an independent, widely trusted entity publish H as the hash of some good data, or even obtaining a signature on H. - Make y available for download with H as its id. If the URN is published by someone else than the adversary, the adversary doesn't get to choose the algorithm. > By having it as a top level URN scheme such as urn:sha1, implementors > will tend to focus on the algorithm itself rather than the notion > of generic hash-based naming. Maybe you're right. 
OTOH this doesn't generalize to using combinations of hash functions (more on this below). > We must avoid having a generic hash name space that allows developers > to add new hashes willy-nilly. I believe that many non crypto-savvy > developers would tend to support every algorithm specified in the > name space without understanding that by supporting multiple agorithms > you introduce a weakest-link condition. These are separate issues. Having developers add non-standard hash functions seems like a really bad idea. Supporting every function in the namespace registration may be a bad idea, but do you trust non-crypto-savvy developers to make a sensible *choice* of which algorithms to support (assuming that there *are* namespaces for different functions, including maybe md5 for backward compatibility). It should also be noted that the spec can try to educate those people that do read it. Of course there are always those that never read the spec, just use what the others use. > Also, for reasons of interoperability, it would be useful if the number > of different URNs be kept small. I agree, but what about developers who don't want to use e.g. SHA1 for technical reasons? (In particular hash size; possibly also backwards compatibility, especially if they have a running system they want to augment with URNs.) Well, maybe such developers aren't plentiful; anybody know any? (Not sure how on-topic this is for CFRG tho.) > A large number of the P2P developers have agreed upon SHA1 for the time > being, which could potentially enable very simple interoperability > between these systems. True, but there are also the two technical points against using SHA1 alone: - Many systems today use hash trees for multisource downloading. (Of course you can use a SHA1 hash tree, but that obviously doesn't give you the interoperability with plain SHA1 systems.) - Using just one function makes repair after the function is broken much more difficult (again, more below). > I also think some mechanism should be introduced to deprecate hash schemes > over time. So while urn:md5 would be a decent name space to add for > compatibility with existing MD5-based applications, such a scheme should be > immediately denoted as being deprecated to avoid having new applications > adopt MD5. Agreed. >> BF> I would suggest adding the ability to use more than one hash function >> BF> in the same URN. >> >> BF> The syntax could look for example like this: >> >> BF> urn:cbuid:md5.sha1:. >> >> That sounds like a good proposal to me, it gives you increased >> confidence and robustness virtually for free. I'll put a concrete >> syntax proposal at the end of this message. > > I don't believe this is a good idea from a security perspective. I fear that > most implementors would only verify one of the hashes, and unless they make a > judicious choice about the hash to verify, they are again opening themselves to > a chosen-algorithm attack. Otherwise, implementors w/o the proper context to > decide which hash is the strongest are forced to verify both hashes, which will > cut their hashing performance roughly in half. > > Zooko, do you have any thoughts on this? These seem like the types of attacks > that you've spent a lot of time thinking about. Zooko replied: > I *have* thought about this issue. The short of it is: I'm not sure how to do > it right, so I'm not going to do it. 
(I understand this as: "I'm not going to use more than a single hash function in the id.") > For example, what should the required/allowed behavior be when one of the > hashes fails to match and the matches? Are implementors required to check all > hashes that are included? My take on this is: Generally, yes. If you don't, you open yourself to a very similar attack to the one I described above regarding chosen-message attacks. Assume that two hash functions, g(.) and h(.) are in use. - Create a URN containing g(x), h(y) for x != y. - Using x, obtain an endorsement for the URN from someone who verifies only g(.). - Use the endorsement to "sell" y to someone who verifies only h(.). Yes, verifying only one of the two hashes takes longer. Well, verifying one hash takes longer than verifying zero hashes; let's specify a system without this kind of security holes. I think that using a URN which includes two hashes should be read as a statement by the URN's creator that they think verifying two hashes is reasonable. If you don't agree with them, don't resolve their URN in the first place... (I don't see why Justin thinks that verifying only one hash would introduce chosen-algorithm attacks, though. Assume that a verifier is capable of verifying both g(.) and h(.), but verifies only g(.), which happens to be insecure. I think it is reasonable to think that the verifier would also accept a URN including *only* a g(.) hash; thus, the double hash isn't necessary for the attack. And if the developer of the verifier knew that g(.) was insecure, surely they wouldn't make their program verify only g(.) in the first place. Maybe someone can enlighten me on chosen-algorithm attacks made possible by using two functions.) The one exception I would make to having to verify both hashes is when the verifier doesn't have an implementation of both hash functions. In these cases, I would say that an implementation MAY reject the URN, but if it does not, it SHOULD provide an indication to the user that the verifier cannot guarantee that everybody else will see the same file behind this URN as given. The verifier could also provide an alternative URN where this *can* be guaranteed (i.e., one containing only one hash). > I suspect that there might be a way to do secure, smooth, backwards- > compatible upgrade past certain kinds of hash algorithm breakages. However, > I haven't seen a complete description of how it would work. Ok. I assume that you mean, a way to continue to use identifiers even after the hash functions used in these identifiers are all broken. For the following, you have to have a secure timestamping service. Assume that your identifiers include two hashes, using g(.) and h(.). Assume that g(.) is broken, but h(.) still works. Assume that you have decided upon a new hash function, i(.), which you will in the future use together with h(.). For every file x where you have the contents, not just the hash, you can: - Compute h(x) and i(x). - Timestamp the statement: "H = h(x) and I = i(x) are hashes of the same file." Now suppose that h(.) is broken, but i(.) still works. Assume that you have an old-style identifier containing g(x) and h(x). Assume that you have downloaded y, an alleged copy of x; you want to verify it against the identifier containing G = g(x) and H2 = h(x). You have the timestamp certificate of the above statement. You now verify that g(y) = G; h(y) = H = H2; and i(y) = I. If this holds, you can be sure that y = x. Proof: Assume that y != x. We know that h(y) = h(x). 
That's not something big; at the time of verification, we know ways of computing collisions for h(.). However, the timestamped statement contains I = i(y). At the time of verification, it is believed that nobody can obtain hash collisions on i(.). Therefore, at the time of verification, the adversary must have used y to compute i(y), or, alternatively, created a timestamp for every possible value of i(.). Since timestamping involves hashing, creating a timestamp for every possible value of i(.) requires at least as much computing power as mounting a birthday attack on the function used, which is held to be impossible. Thus, someone has intentionally associated h(x) with i(y), and thus y. To know that it makes sense to associate h(x) with y, the person must have known, at the time, that h(x) = h(y). Therefore, if y != x, then someone must have known that h(x) = h(y), **at a time where no way to compute a hash collision on h(.) was known, yet**. This is held to be impossible; thus, y = x, q.e.d. Does this satisfy you? Note that you cannot use the above in a system using only one hash function-- because once your hash function breaks, your timestamps, using that hash function, break as well! At every time, you need at least one hash function that remains unbroken. I should also note that the above idea is patented in the US and probably elsewhere. However, patents expire (this one in the 2010s, I think), and this is long-term thinking. > (The algorithm-negotiation features of SSL/TLS might be considered an example > of what *not* to do, and in any case they cannot be carried over to this > application since they are interactive.) Yup and yup. (Some protocols may use hash URNs as return values and may include algorithm negotiation, but that would be a security issue with those protocols, not with the hash URN spec.) We can of course use a different URN namespace identifier for each hash function, i.e., urn:sha1:... A point in favor of this would be that people use it already. I'm not sure how this interacts with using different hashes in a single URN, though. I think there should be a method to do so, in order to enable the lifetime extension as described above, and in order to combine 'industry standard' SHA-1 hashes with a tree hash that enables multisource downloading. I think that the specification should probably say that an implementation must check all hashes, except at explicit user discretion. I think that if an an application does check all hashes, there are no security considerations with using more than one hash. OTOH there are security benefits-- if only that birthday attacks become harder to mount because of the additional bits. The question then is, if we allow e.g. {sha1,sha256,tiger} each by itself, do we also allow every possible combination of the above? Or do we limit ourselves to a specified set of combinations, e.g. sha1+tiger but not sha1+sha256? I believe that there is no security gain from allowing the former but disallowing the latter. OTOH I could easily imagine that if sha1+tiger is popular, some people would like to use sha1+tiger+sha256 for added security. There is the issue of having less URNs per resource, of course, which is outside the security domain I think. I'm not sure how much a concern this is... opinions? If we allow all possible subsets, using URN namespace names becomes problematic; we'd have to register urn:sha1:... urn:sha256:... urn:tiger:... urn:sha1+sha256:... urn:sha1+tiger:... urn:sha256+tiger:... urn:sha1+sha256+tiger:... (urks). 
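Whichever namespace layout wins out, the check-all-hashes rule above might look roughly like this; the urn:sha1+sha256 syntax and the helper below are hypothetical, with SHA-256 merely standing in for whatever second digest gets chosen:

    # Illustrative only: verify *every* hash in a combined URN and reject the data
    # if any of them fails; checking just one re-opens the substitution attack
    # described above. The urn:sha1+sha256:<h1>.<h2> syntax is made up.
    import base64, hashlib

    def b32(digest):
        return base64.b32encode(digest).decode().rstrip("=")

    def verify_combined(urn, data):
        _, nid, nss = urn.split(":", 2)
        algs = nid.split("+")               # e.g. ["sha1", "sha256"]
        hashes = nss.split(".")
        if len(algs) != len(hashes):
            return False
        return all(b32(hashlib.new(a, data).digest()) == h
                   for a, h in zip(algs, hashes))

    urn = ("urn:sha1+sha256:" + b32(hashlib.sha1(b"example").digest())
           + "." + b32(hashlib.sha256(b"example").digest()))
    print(verify_combined(urn, b"example"))    # True
    print(verify_combined(urn, b"tampered"))   # False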
Or, we'd have to do something like, use urn:sha1:... for just one hash but urn:hashes:sha1+sha256:... for more than one hash. Then, we'd only have to register urn:sha1:... urn:sha256:... urn:tiger:... urn:hashes:... Opinions? Thanks, - Benja From justin at chapweske.com Tue Jul 15 17:14:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs In-Reply-To: <3F1486C8.3000706@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> Message-ID: <3F14989C.2020501@chapweske.com> > > A large number of the P2P developers have agreed upon SHA1 for the time > > being, which could potentially enable very simple interoperability > >> between these systems. > > > True, but there are also the two technical points against using SHA1 alone: > > - Many systems today use hash trees for multisource downloading. (Of > course you can use a SHA1 hash tree, but that obviously doesn't give you > the interoperability with plain SHA1 systems.) > - Using just one function makes repair after the function is broken much > more difficult (again, more below). > As they are defined today (http://open-content.net/specs/draft-jchapweske-thex-02.html), the tree hashes are parameterizable to allow different file segment (leaf) sizes to be specified. AFAIK other traditional digest algorithms are set in stone, allowing no parameterization(?). If appropriate, I would be interested in certain hash tree forms being validated and standardized as normal digest functions. This would allow hash tree forms to be incorporated into existing standards that rely on a normal digest function. In regards to using multiple functions, most current systems that output multiple functions simply output them as multiple separate pieces of metadata. For instance, a system could output something like the following: X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF X-Content-URN: urn:tree:tiger:3FOBMWPE2JED5DUN2VA6J7DGSNJNJILE4HRF6SQ I believe that in many systems the hashes will simply be treated as pieces of meta-data that are used to verify the integrity of a file. Just because they contain 'urn' in them doesn't mean that they have to be used as identifiers. So, if you look at it from the meta-data perspective, I think it's natural to use multiple independent hashes rather than glomming them all together. > (I don't see why Justin thinks that verifying only one hash would > introduce chosen-algorithm attacks, though. Assume that a verifier is > capable of verifying both g(.) and h(.), but verifies only g(.), which > happens to be insecure. I think it is reasonable to think that the > verifier would also accept a URN including *only* a g(.) hash; thus, the > double hash isn't necessary for the attack. And if the developer of the > verifier knew that g(.) was insecure, surely they wouldn't make their > program verify only g(.) in the first place. Maybe someone can enlighten > me on chosen-algorithm attacks made possible by using two functions.) My point is subtle and perhaps irrelevant, but let me try to clarify: When I see something like: urn:hash:sha1.md5: This implies to me that I am obligated (not sure to whom) to verify both hashes. I believe these semantics are reasonable; however, you will find that very few developers will be willing to follow these semantics and verify both hashes.
However, when I see something like: X-Content-URN: urn:md5:42J46YB3Y3OLLFYL52B4LNDE34 X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF X-Content-URN: urn:tree:tiger:3FOBMWPE2JED5DUN2VA6J7DGSNJNJILE4HRF6SQ X-Content-URN: urn:crc32: This implies to me that I should apply some sort of ranking between the algorithms and not use any that are below my standards. If I am not confident about any single hash, I am free to verify multiple of them. Obviously from a technical perspective, both approaches are the same, it just seems to me that the first approach invites developers to defy the semantics, while the second approach is likely to be a healthier approach. I am not a developer psychologist, so I'm very open to other viewpoints on this. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From b.fallenstein at gmx.de Tue Jul 15 18:04:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs In-Reply-To: <3F14989C.2020501@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> <3F14989C.2020501@chapweske.com> Message-ID: <3F14A421.5010502@gmx.de> Hi, Justin Chapweske wrote: > If appropriate, I would be interested in certain hash tree forms being > validated and standardized as normal digest functions. This would allow > hash tree forms to be incorporated into existing standards that rely on > a normal digest function. Seems appropriate to me. > In regards to using multiple functions, most current systems that output > multiple functions simply output them as multiple seperate pieces of > metadata. For instance, a system could output something like the > following: > > X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF > X-Content-URN: urn:tree:tiger:3FOBMWPE2JED5DUN2VA6J7DGSNJNJILE4HRF6SQ > > I believe that in many systems the hashes will simply be treated as > pieces of meta-data that are used to verify the integrity of a file. > Just because they contain 'urn' in them doesn't mean that they have to > be used as identifiers. So, if you look at it from the meta-data > perspective, I think its natural to use multiple independant hashes > rather them glomming them all together. Well, I use them as context-less names that resolve to something; and you simply cannot give two different URNs in one . If you use them as metadata about a different resource, then I agree you don't need to put different hashes into a single URN. (OTOH, if you go the other way around, and want to give RDF metadata about a resource identified by a hash-URN-- e.g., ratings or locations--, there are good reasons for using a URN with more than one hash again, so that the resource name you store information about doesn't need to become invalid if one hash function is broken.) >> (I don't see why Justin thinks that verifying only one hash would >> introduce chosen-algorithm attacks, though. Assume that a verifier is >> capable of verifying both g(.) and h(.), but verifies only g(.), which >> happens to be insecure. I think it is reasonable to think that the >> verifier would also accept a URN including *only* a g(.) hash; thus, >> the double hash isn't necessary for the attack. And if the developer >> of the verifier knew that g(.) was insecure, surely they wouldn't make >> their program verify only g(.) in the first place. Maybe someone can >> enlighten me on chosen-algorithm attacks made possible by using two >> functions.) 
> > My point is subtle and perhaps irrelevant, but let me try to clarify: > > When I see something like: > > urn:hash:sha1.md5: > > This implies to me that I am obligated (not sure to whom) to verify both > hashes. I believe these semantics are reasonable, however you will find > very few developers will be willing to follow these semantics and verify > both hashes. I dunno... if the spec explains how this is a security leak, and the developers still do it, I think there's little I can do... > However, when I see something like: > > X-Content-URN: urn:md5:42J46YB3Y3OLLFYL52B4LNDE34 > X-Content-URN: urn:sha1:FAB6CX2GZSWOOWCPXXBYSFBUSN4LIGTF > X-Content-URN: urn:tree:tiger:3FOBMWPE2JED5DUN2VA6J7DGSNJNJILE4HRF6SQ > X-Content-URN: urn:crc32: > > This implies to me that I should apply some sort of ranking between the > algorithms and not use any that are below my standards. If I am not > confident about any single hash, I am free to verify multiple of them. I would actually agree with your assessments of the meaning of both statements -- the first implies you have to verify both, the second implies you have to verify as many as you deem necessary. So your point is: Because many developers will-- maybe-- use the second kind of semantics always, having the form aiming for the first kind of semantics is futile. I need the first kind of semantics, though, if I want to use hash URNs as reliable, trusted, *context-less* identifiers. (I.e., without being able to give more than one URN in an .) I think having a way to convey these semantics would be good, even if some people will implement it in an insecure way. Seems to me like this is generally useful where people use URNs for identifiers. I mean, when I post a URN in a forum, giving two URNs seems less practical and posting a number of URNs and saying "Download as many of these as you need to meet your security requirements, and verify that they are all equal" seems weird. :-) You should click on it and your software should handle the rest. My current thinking is that if the spec explains in which way verifying only one hash is a security leak, and if developers still do it, then it's an issue between the developers and their users-- if the users and developers are willing to accept the security leak of having somebody recommend a URN they haven't fully checked, then that's their problem, I'd think. - Benja From justin at chapweske.com Tue Jul 15 18:13:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs In-Reply-To: <3F14A421.5010502@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> <3F14989C.2020501@chapweske.com> <3F14A421.5010502@gmx.de> Message-ID: <3F14A687.9000008@chapweske.com> I won't both the CFRG with this post. > > I need the first kind of semantics, though, if I want to use hash URNs > as reliable, trusted, *context-less* identifiers. (I.e., without being > able to give more than one URN in an .) I think having a way to > convey these semantics would be good, even if some people will implement > it in an insecure way. > I see how your requirements are different. Perhaps the solution is to define a generic URI scheme that allows composition of multiple URIs to identify the content. It could be done in a fashion similar to the DURI draft (http://www.watersprings.org/pub/id/draft-masinter-dated-uri-03.txt) and could be made independant of hashing if you wished. 
So, it could be similar in spirit to the urn:hashes that you mentioned in a previous post. I'm actually rather surprised that I've never seen a URI format defined that simply implies equivalence between a set of sub-URIs. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From gojomo at bitzi.com Tue Jul 15 20:55:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> <3F14989C.2020501@chapweske.com> <3F14A421.5010502@gmx.de> <3F14A687.9000008@chapweske.com> Message-ID: <012c01c34b4d$ffb96150$660a000a@golden> Justin writes: > > I need the first kind of semantics, though, if I want to use hash URNs > > as reliable, trusted, *context-less* identifiers. (I.e., without being > > able to give more than one URN in an .) I think having a way to > > convey these semantics would be good, even if some people will implement > > it in an insecure way. > > > > I see how your requirements are different. Perhaps the solution is to > define a generic URI scheme that allows composition of multiple URIs to > identify the content. It could be done in a fashion similar to the DURI > draft > (http://www.watersprings.org/pub/id/draft-masinter-dated-uri-03.txt) and > could be made independent of hashing if you wished. I was going to suggest something similar. > So, it could be similar in spirit to the urn:hashes that you mentioned > in a previous post. > > I'm actually rather surprised that I've never seen a URI format defined > that simply implies equivalence between a set of sub-URIs. There's a facility in the "magnet:" URI format, as practiced, for other URIs to be referenced as "exact substitutes" or "acceptable substitutes" of the main "topic" of the "magnet:" link. (The topic itself is also a URI, so "magnet" links are really just activator/envelopes for one or more other related URIs.) See: http://groups.yahoo.com/group/magnet-uri/message/9 So while declaring such equivalence is not the purpose of "magnet:" URIs, there was a desire for such a facility, and so a practice has developed. (I'd be the first to admit that practice is far from elegant, but perhaps that's unavoidable in this domain.) - Gordon @ Bitzi ____________________ Gordon Mohr Bitzi CTO . . . describe and discover files of every kind. _ http://bitzi.com _ . . . Bitzi knows bits -- because you teach it! From b.fallenstein at gmx.de Wed Jul 16 06:50:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Cryptographic hashes in URNs In-Reply-To: <012c01c34b4d$ffb96150$660a000a@golden> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3F1486C8.3000706@gmx.de> <3F14989C.2020501@chapweske.com> <3F14A421.5010502@gmx.de> <3F14A687.9000008@chapweske.com> <012c01c34b4d$ffb96150$660a000a@golden> Message-ID: <3F15579E.80308@gmx.de> Hi Gordon, hi Justin-- Gordon Mohr wrote: > Justin writes: >>I see how your requirements are different. Perhaps the solution is to >>define a generic URI scheme that allows composition of multiple URIs to >>identify the content. It could be done in a fashion similar to the DURI >>draft >>(http://www.watersprings.org/pub/id/draft-masinter-dated-uri-03.txt) and >>could be made independent of hashing if you wished. > > I was going to suggest something similar.
At first I thought this a good idea, but after sleeping over and mulling about it, my feeling is that this may be overgeneralization. I think now that we should probably only standardize those hash functions and combinations thereof that someone actually uses, in the interest of interoperability, wide review, and having a smaller number of different names for a resource. If someone really needs to use something different, I think it would be acceptable to have them go through the registration and public review process. So, I would suggest that we specify-- urn:sha1: urn:tigertree: urn:sha1-tigertree:. Is there anybody whose requirements wouldn't be met by such an approach? If not, I think a more general approach would be overgeneralization (and would probably introduce unnecessary syntactic ugliness). Opinions? - Benja From b.fallenstein at gmx.de Wed Jul 16 07:22:01 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: Comments on draft-thiemann-cbuid-urn-00 In-Reply-To: References: <3F0DC458.6090401@gmx.de> Message-ID: <3F155F28.6070005@gmx.de> Peter Thiemann wrote: > BF> I just saw your recent I-D in the archives of the urn-nid mailing > BF> list. (I'm not subscribed, thus the delay in reaction.) > [ > Nice to see that somebody is listening to this list, since nobody is > really subscribed to it. I only received a message that my message > awaits moderator approval. > ] I'm subscribed now :-) > BF> I'm working on a system (Storm, > BF> ) > > The project description link on that page is stale. Sorry, I've neglected putting together a homepage for far too long now. I'll put something there this week. Till then, there's of course the readme; the current version is at: http://savannah.nongnu.org/cgi-bin/viewcvs/storm/storm/README?rev=1.12&content-type=text/vnd.viewcvs-markup Basically, it's a storage system which can perform similar functions as both the Web and the local file system (unifying their namespaces), based on cryptographic technology. Storing and downloading data identified through a hash works, but there are some thorny research issues with updateable resources, which is why we haven't released anything yet (since we cannot guarantee upwards compatibility for updateable resources, and thus cannot recomment using Storm yet :( ) > BF> having very similar addressing needs; in fact, I've been in the > BF> process of preparing a registration for an informal URN namespace > BF> using MIME type + cryptographic hash to identify typed octet stream > BF> resources. We are building a p2p-based extension to the Web, based on > BF> this. > > This is not quite what we are after. We are interested in using p2p > techniques for implementing a global sharable mail store. The main > application is to use the ids as a replacement for IMAP's concept of > message uids to allow for easy synchronization of client caches and > replicas of the mail server. Hence the bias towards the message/rfc822 > type in the draft. However, the intention was to design a scheme which > is usable for other information storage and retrieval. Perhaps, your > storm project can provide the storage layer that we need? Since you won't need updateable resources, maybe so, if you can live with a system written in Java (that can be interacted with through simple HTTP, though). I'd be really happy to see our system used in this way, I've wanted to store e-mail in it for a long time. I'll want to use it :-) > BF> - All your identifiers use only a single hash function. 
Especially for > BF> URNs, which are supposed to be long-lived, there is a good reason to > BF> use hashes generated by more than one function: If one of the > BF> functions is broken (but the other isn't simultaneously), the ids > BF> don't become useless; you can use timestamping to extend the > BF> lifetime of your ids indefinitely (as long as the two hash functions > BF> you happen to use at a given time are never broken > BF> simultaneously). [unforch the only reference I can find on this > BF> right now is the US patent on it :-(, # 5,373,561 > BF> .] > > This is interesting but not directly relevant (I think) because we are not > dealing with certificates here. Rather you want to increase the > confidence (if a server supports more than one hash function) and > robustness (if a server supports just one of a selection of hashes) of > a data access. I really don't see why timestamping should be required > because each hash value lives indefinitely long. It lives only as long as the hash function isn't broken-- but with timestamping, you can *continue* to use the identifiers *securely*, *after* all hash functions in the identifier have already been broken! > This is a hairy issue. I understand your reasoning. After studying the > RFC I tend *not* to commit to any particular coding, but rather make > the encoding a parameter with some reasonable default. Then the > identifier equivalence section should state explicitly which > representation is considered the normalized one. Why do you think a choice of encoding is needed? > BF> - Your mechanism for hashing parts of an email separately is quite > BF> application-specific; that's a pity, given that the namespace is > BF> quite generally useful otherwise. Different applications may very > BF> easily have different needs for breaking up an entity into parts; > BF> for example, I could easily imagine that somebody would like to hash > BF> each body part of a multipart message independently. > > This is a misunderstanding of the intention, so this requires > clarification in an update of the draft. The point is that many data > formats contain header or other meta information (emails, images, > mp3). The mode parameter signals that this meta information is > ignored and only the raw contents are hashed. Hmm. I can see how this could, in principle, be useful for many applications, but I'm still wondering how many people would actually implement it. Anybody on p2p-hackers who would like to use this in their system? > Since the hash only > determines the raw contents, the specification needs to define how to > complete the contents to a valid instance document of the specified > type. In the case of an email message, this means to add the required > fields from the RFC2822 specification. Sorry, I'm not following here. Could you give an example? > For other formats, other things > have to be done. And for formats that only consist of raw contents, it > will not make sense to define a mode 1 id. > > BF> It seems like solving your need through some mechanism outside this > BF> namespace registration would make the namespace simpler and more > BF> generally useful. What exactly do you need it for, anyway? > > see the top of this mesage. Ok, but why is it important that you hash the body separately from the header? I'm simply not sure why this is important for you. > BF> In the flavor of URI that doesn't include content types, you could > BF> simply leave that part off. 
> > BF> urn:cbuid:sha1: > > BF> (Unambiguous because hashes cannot contain commas.) This would > BF> probably make the scheme more attractive to folks who don't want to > BF> include content types, since there would be no 'artefacts' (":*:") > BF> related to them in the content-type-less syntax. > > I don't see the point of doing so. Fact is that the ids are to be > processed by machines and only occasionally will they pop up "in the > human eye". Reading and writing them should be straightforward > to implement. Hence, my preference is to have a fixed number of fields > separated by ":" and simply leave unspecified slots empty (admittedly, > the "*" is artificial). That is > > urn:cbuid::sha1.tigertree::. > > [the empty slot after the sha1.tigertree is reserved for specifying the > encoding of the following hashes, see the grammar below] My concern was mostly about the developers who don't feel they need content types and may see this as syntactical baggage. If we move in the direction of using namespace ids for hash functions, i.e. urn:sha1: there's also the problem of backwards compatibility; many people use them without a content type, today. I think it would be good if we could allow for a content type in the syntax, but would have the content-type-less syntax as above. Cheers, - Benja From b.fallenstein at gmx.de Wed Jul 16 08:38:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> Message-ID: <3F1570D6.8080601@gmx.de> Hi Carl, Carl Ellison wrote: > I apparently don't know the rules for URN formation. I thought it > was completely free after the "urn:". Is that not true? No. After the "urn:" comes a "namespace identifier," a colon, and a "namespace-specific string," i.e. urn:: Assignment of namespace ids is a manged process, i.e., you cannot just create your own, you have to register it with the IETF. Normally this requires an RFC. The namespace registration explains how the namespace-specific string is interpreted. For more info, see RFCs 2141 and 3406. The official registry of namespace ids is at: http://www.iana.org/assignments/urn-namespaces > Is urn:sha1:dRDPBgZzTFq7Jl2Q2N/YNghcfj8= not now legal? No, since there is no registered 'sha1' namespace. > At 12:57 AM 7/16/2003 +0200, Benja Fallenstein wrote: >> urn:sha1:... >> urn:sha256:... >> urn:tiger:... >> urn:sha1+sha256:... >> urn:sha1+tiger:... >> urn:sha256+tiger:... >> urn:sha1+sha256+tiger:... [...] >>Opinions? > > When you combine hash functions, you need to specify the combining > function also. I assume you were assuming mere concatenation, here. > It's often more than that. So, the concatenation function would have > to be listed also. Hm, can you explain what else would be needed? Do we really need other combinations than concatenation in URNs-- what would be the applications? (I'm not familiar with other ways in which you would want to combine hash functions.) > My personal preference is for anyone who wants to do this to declare > a new hash function name and define it as the particular combination > function of the particular other hash functions, but use that new > hash function name in things like the URN construct. 
We've seen a > couple of these, but not very many and I don't see a reason to > encourage people to do this kind of concatenation. Hm. Maybe you're right. My one point against this would be that if you use urn:sha1.tigertree:... it's easier for a developer who supports only SHA-1 to realize that they can provide at least partial verification than it is if you use urn:bitprint:... and define 'bitprint' as, 'SHA-1 concatenated with a Tiger tree-hash.' But I think I'd be fine with the latter, if it is consensus. - Benja From justin at chapweske.com Wed Jul 16 10:13:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> Message-ID: <3F158787.6020902@chapweske.com> > I'm skipping most of this conversation, to comment on a single point > at the end.. > > Is urn:sha1:dRDPBgZzTFq7Jl2Q2N/YNghcfj8= not now legal? > The sha1 urn scheme is not yet registered, but its de facto usage utilizes a Base32 encoding, not Base64. -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From justin at chapweske.com Wed Jul 16 10:26:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: <3F155F28.6070005@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> Message-ID: <3F158A90.2060503@chapweske.com> Perhaps I'm misinterpreting cbuid, but it appears to suggest that the hash URIs include the content-type of the content in the URI. I think this notion is a good idea from a security perspective. It is important to retrieve the content-type from a trusted source. While probably not practical, one could envision a content-type attack where something is perfectly harmless when interpreted as a JPEG, but turns into a virus when interpreted as an executable. The problem is, I don't think this belongs in a hash-based URI, because the content-type is not self-verifiable, while the hash itself is. If you want to associate trusted meta-data with a piece of content, I would suggest adding a layer of indirection. Use a hash-based URN such as "urn:sha1" to identify the meta-data and not the file itself. I think the old Mojonation system used to do this if I'm not mistaken. An alternative to hashing the meta-data is to use a facility like the HTML <link> tag or HTTP Link header to describe extra meta-data about a URI, such as the content type. This approach is used in RSS-autodiscovery as follows: > > My concern was mostly about the developers who don't feel they need > content types and may see this as syntactical baggage. If we move in the > direction of using namespace ids for hash functions, i.e. > > urn:sha1: > > there's also the problem of backwards compatibility; many people use > them without a content type, today. I think it would be good if we could > allow for a content type in the syntax, but would have the > content-type-less syntax as above. > -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From zooko at zooko.com Wed Jul 16 10:28:02 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: Message from Justin Chapweske of "Wed, 16 Jul 2003 12:12:39 CDT."
<3F158787.6020902@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> <3F158787.6020902@chapweske.com> Message-ID: Coincidentally, there is an active (and contentious) discussion of cryptography-based naming at the "cryptography" mailing list. Three different concrete proposals, including one already deployed and one newly announced, have been mentioned. majordomo@wasabisystems.com From b.fallenstein at gmx.de Wed Jul 16 10:35:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> <3F158787.6020902@chapweske.com> Message-ID: <3F158C33.4070604@gmx.de> Hi Zooko, Zooko wrote: > Coincidentally, there is an active (and contentious) discussion of > cryptography-based naming at the "cryptography" mailing list. > > Three different concrete proposals, including one already deployed and one > newly announced, have been mentioned. Is there an archive? > majordomo@wasabisystems.com "subscribe cryptography" gives: **** subscribe: unknown list 'cryptography'. - Benja From zooko at zooko.com Wed Jul 16 10:54:03 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: Message from Justin Chapweske of "Wed, 16 Jul 2003 12:25:36 CDT." <3F158A90.2060503@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> Message-ID: Justin Chapweske wrote: > > I think this notion is a good idea from a security perspective. It is > important to retrieve the content-type from a trusted source. While > probably not practical, one could envision a content-type attack where > something is perfectly harmless when interpretted as a JPEG, but turns > into a virus when interpretted as an executable. I agree that this is important for security. > The problem is, I don't think this belongs in a hash-based URI, because > the content-type is not self-verifiable, while the hash itself is. I agree that this is a problem, but I wouldn't say that the file has a "real" content-type and the problem is making sure that the real content-type is included. Rather I would say that different people might honestly ascribe different type to the same file. I think it's quite reasonable to include some type information in the crypto-id of the file, but we haven't yet decided to do that in Mnet. Here is the design document for the erasure coding, encryption, and identification of files in Mnet: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/mnet/mnet_new/doc/new_filesystem.html The doc is short and sweet, and possibly of interest if you are following this thread. In particular the section on metadata details several alternatives that we considered: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/mnet/mnet_new/doc/new_filesystem.html#metadata > If you want to associate trusted meta-data with a piece of content, I > would suggest adding a layer of indirection. Use a hash-based URN such > as "urn:sha1" to identify the meta-data and not the file itself. I > think the old Mojonation system used to do this if I'm not mistaken. Yes, adding another layer of indirection in order to handle metadata is one of the ideas, actually three of the ideas, suggested in our design doc. 
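To make the indirection concrete, here is a minimal Python sketch: the metadata document carries the content type plus the hash of the raw bytes, and the identifier that gets passed around names the metadata rather than the file. The field layout and the unregistered urn:sha1 form are illustrative assumptions, not Mnet's or Mojo Nation's actual formats.

    import hashlib

    def sha1_hex(data):
        return hashlib.sha1(data).hexdigest()

    content = b"...the file bytes..."

    # The metadata names the content type and the content's own hash;
    # anything else you want to vouch for (file name, license) goes here too.
    metadata = ("content-type: text/plain\n"
                "content-sha1: " + sha1_hex(content) + "\n").encode("ascii")

    # Publish a URN for the metadata, not for the file. Fetching and hashing
    # the metadata verifies it, and the metadata in turn verifies the file.
    metadata_urn = "urn:sha1:" + sha1_hex(metadata)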
By the way, you are mistaken -- the hash-based ids in Mojo Nation identified just the file contents (plus a file name, but not a file type other than the extension of the file name). The metadata in Mojo Nation included the crypto-id of the file contents. The metadata was unsigned XML that you trusted because you got it directly from a server that you trusted. (Although actually you had no good reason to trust the servers, so this was an open hole.) Regards, Zooko http://zooko.com/ From smb at research.att.com Wed Jul 16 10:57:02 2003 From: smb at research.att.com (Steven M. Bellovin) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs Message-ID: <20030716175034.23A027B4D@berkshire.research.att.com> In message <3F158C33.4070604@gmx.de>, Benja Fallenstein writes: > >Hi Zooko, > >Zooko wrote: >> Coincidentally, there is an active (and contentious) discussion of >> cryptography-based naming at the "cryptography" mailing list. >> >> Three different concrete proposals, including one already deployed and one >> newly announced, have been mentioned. > >Is there an archive? > >> majordomo@wasabisystems.com > >"subscribe cryptography" gives: >**** subscribe: unknown list 'cryptography'. It's now at metzdowd.com --Steve Bellovin, http://www.research.att.com/~smb (me) http://www.wilyhacker.com (2nd edition of "Firewalls" book) From b.fallenstein at gmx.de Wed Jul 16 11:25:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: <3F158A90.2060503@chapweske.com> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> Message-ID: <3F159813.4070903@gmx.de> Hi Justin, Justin Chapweske wrote: > Perhaps I'm misinterpretting cbuid, but it appears to suggest that the > hash URIs include the content-type of the content in the URI. > > I think this notion is a good idea from a security perspective. It is > important to retrieve the content-type from a trusted source. While > probably not practical, one could envision a content-type attack where > something is perfectly harmless when interpretted as a JPEG, but turns > into a virus when interpretted as an executable. Yes, or a file that is a diagram when interpreted as JPEG, but a pornographic image when interpreted as some other image format. (Given the container architecture of many formats, I don't think it's necessarily impossible to find such a 'pun' where two different formats interpret the header of a file differently, and thus have two different interpretations for the body.) > The problem is, I don't think this belongs in a hash-based URI, because > the content-type is not self-verifiable, while the hash itself is. In my mind, the entity that we're identifying is a pair: (content type, octet stream) The question is, given such a pair, and given a hash-based URI, can we authenticate that the URI identifies exactly this pair and no other? If so, I would call the URI self-verifying. Now, given urn:sha1:text/plain,<hash> and a pair ("text/plain", "foobar") we can verify the <hash> against "foobar", and compare the "text/plain" from the URI to the "text/plain" in the pair, so the URI clearly maps to only one such pair (as long as finding a hash collision is impossible). Thus, sha1 identifiers of this form *would* be self-verifying. Do you identify "self-verifying" somehow differently?
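As a concrete illustration of the pair-verification described above, here is a minimal Python sketch. The urn:sha1:<type>,<hash> shape is the one under discussion, not a registered scheme, and the base32 digest encoding is an assumption (following the remark earlier in the thread that de facto sha1 URNs use Base32).

    import base64
    import hashlib

    def typed_sha1_urn(content_type, data):
        # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters.
        digest = base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")
        return "urn:sha1:%s,%s" % (content_type, digest)

    def verify(urn, content_type, data):
        # The (content type, octet stream) pair matches the URN only if both
        # the declared type and the recomputed hash agree.
        return urn == typed_sha1_urn(content_type, data)

    urn = typed_sha1_urn("text/plain", b"foobar")
    print(verify(urn, "text/plain", b"foobar"))   # True
    print(verify(urn, "image/jpeg", b"foobar"))   # False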
- Benja From b.fallenstein at gmx.de Wed Jul 16 11:45:01 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: Hash URIs and metadata (was: Re: [p2p-hackers] content-types in URIs) In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> Message-ID: <3F159C99.6070905@gmx.de> Hi, Zooko wrote: >>If you want to associate trusted meta-data with a piece of content, I >>would suggest adding a layer of indirection. Use a hash-based URN such >>as "urn:sha1" to identify the meta-data and not the file itself. I >>think the old Mojonation system used to do this if I'm not mistaken. > > Yes, adding another layer of indirection in order to handle metadata is one of > the ideas, actually three of the ideas, suggested in our design doc. I'm planning to implement a system that works like this, for providing HTTP content negotiation features as well as arbitrary metadata, on top of hash-based identification. You would create a "resource specification", e.g. like this:

<> spec:hasRepresentation <urn-of-English-version> .
<urn-of-English-version> dc:language "en" .
<> spec:hasRepresentation <urn-of-French-version> .
<urn-of-French-version> dc:language "fr" .
<> cc:license <urn-of-license> .
<> dc:author _:x .
_:x foaf:mailbox <author-mailbox-uri> .

So "this resource" would have two representations in two different languages, plus associated author and license information. Then you would refer to this resource through a special URN, like urn:urn-x:sha1:FOO where FOO is the hash of the above RDF graph. Entering the above URI into a browser would bring up the English or French version of the resource, depending on your preferences. To download the defining RDF graph itself, you'd use urn:sha1:FOO If anybody is doing something similar, it would be good to hear about it and possibly collaborate. - Benja From seth.johnson at RealMeasures.dyndns.org Thu Jul 17 10:05:02 2003 From: seth.johnson at RealMeasures.dyndns.org (Seth Johnson) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Nature: WIPO to Address Free Public Goods Message-ID: <3F16D506.8FD0C47C@RealMeasures.dyndns.org> (Forwarded from CNI Copyright list) ---------- Forwarded message ---------- From: James Love To: CNI-COPYRIGHT -- Copyright & Intellectual Property Sent: 7/10/03 7:03 PM Subject: Nature: Drive for patent-free innovation gathers pace - Kamil Idris is being asked to assess the merits of an open approach to intellectual property Nature reports that WIPO has agreed to organize the meeting on open development models... jamie * Francis Gurry, an assistant director-general at the WIPO, said that the organization welcomed the idea. "The use of open and collaborative development models for research and innovation is a very important and interesting development," he said in a statement. "The director-general looks forward with enthusiasm to taking up the invitation to organize a conference to explore the scope and application of these models." in html > http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v424/n6945/full/424118a_fs.html or in pdf > http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v424/n6945/full/424118a_fs.html&content_filetype=PDF 118 NATURE|VOL 424 | 10 JULY 2003 |www.nature.com/nature *** Drive for patent-free innovation gathers pace Kamil Idris is being asked to assess the merits of an open approach to intellectual property. Declan Butler, Paris A group of top scientists and economists are asking the World Intellectual Property Organization (WIPO) in Geneva to promote open models of innovation that don't rely on patents.
The group believes that innovation based on freely available knowledge can be effective not just in areas where it has established a foothold -- such as genome sequence data -- but also in sectors where patent protection is entirely dominant, such as drug development (see Nature 424, 10-11; 2003). In a 7 July letter to Kamil Idris, director general of the WIPO, 59 scientists and economists call attention to the "explosion of open and collaborative projects to create public goods" in recent years, including the Human Genome Project, the open-source software movement, and Internet standards. Such projects show that "one can achieve a high level of innovation in some areas of the modern economy without intellectual property protection," says the letter, arguing that "excessive, unbalanced or poorly designed intellectual property protections may be counterproductive." It calls on the WIPO to hold a major conference on these models during 2004. The signatories include Joseph Stiglitz of Columbia University in New York, who received the 2001 Nobel prize for economics; John Sulston of the Wellcome Trust Sanger Institute near Cambridge, UK, winner of the 2002 Nobel prize for medicine; James Orbinski, former president of Médecins Sans Frontières; and Richard Stallman, a computer scientist regarded by many as the "father" of the open-source software movement. Francis Gurry, an assistant director-general at the WIPO, said that the organization welcomed the idea. "The use of open and collaborative development models for research and innovation is a very important and interesting development," he said in a statement. "The director-general looks forward with enthusiasm to taking up the invitation to organize a conference to explore the scope and application of these models." Advocates of open-source innovation want the WIPO and other public agencies to rethink how innovation works, says James Love, director of the Washington-based Consumer Project on Technology and a signatory to the letter. Open research for drug development is one of the initiative's main targets, he says. Some of the authors are also pursuing the idea of an international treaty to encourage governments to fund drug research and put the results directly into the public domain. Love argues that research results should ultimately become a freely available commodity, with drug companies competing to market generics of any drugs developed. The current system, in which drug research and development is carried out by drug companies that keep patent rights for up to 20 years, is grossly inefficient and results in excessive prices so that those who need the drugs most cannot afford them, argues Love. Yet to be fleshed out are details of how such a model would work, and how competitive forces could be maintained within it. But in May, the general assembly of the World Health Organization instructed agency officials to draft terms of reference during 2004 for a new evaluation of intellectual property, innovation and public health. Consideration of open-science models is expected to be part of this exercise. "The success of the Internet and of open-source software has driven home just how far open and collaborative projects can go," says Hal Varian, an economist at the University of California, Berkeley, who has also signed the 7 July letter.
Another signatory, Paul David, an economist at Stanford University, argues that systems such as free and open-source software are not at odds with intellectual property rights protection, but rather a choice by creators and society as to the benefits they want to obtain. -- James Love, Director, Consumer Project on Technology http://www.cptech.org, mailto:james.love@cptech.org tel. +1.202.387.8030, mobile +1.202.361.3040 *** -- DRM is Theft! We are the Stakeholders! New Yorkers for Fair Use http://www.nyfairuse.org [CC] Counter-copyright: http://cyber.law.harvard.edu/cc/cc.html I reserve no rights restricting copying, modification or distribution of this incidentally recorded communication. Original authorship should be attributed reasonably, but only so far as such an expectation might hold for usual practice in ordinary social discourse to which one holds no claim of exclusive rights. From gojomo at bitzi.com Thu Jul 17 22:32:02 2003 From: gojomo at bitzi.com (Gordon Mohr) Date: Sat Dec 9 22:12:21 2006 Subject: crypto naming Re: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> <3F158787.6020902@chapweske.com> <3F158C33.4070604@gmx.de> Message-ID: <018f01c34ced$e9fdf280$660a000a@golden> Benja writes: > Hi Zooko, > > Zooko wrote: > > Coincidentally, there is an active (and contentious) discussion of > > cryptography-based naming at the "cryptography" mailing list. > > > > Three different concrete proposals, including one already deployed and one > > newly announced, have been mentioned. > > Is there an archive? I'd also be interested if there was an archive. Or, if the concrete proposals are available in draft form anywhere. - Gordon From hal at finney.org Thu Jul 17 23:12:02 2003 From: hal at finney.org (Hal Finney) Date: Sat Dec 9 22:12:21 2006 Subject: crypto naming Re: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs Message-ID: <200307180610.h6I6AKs06961@finney.org> Benja writes: > Hi Zooko, > > Zooko wrote: > > Coincidentally, there is an active (and contentious) discussion of > > cryptography-based naming at the "cryptography" mailing list. > > > > Three different concrete proposals, including one already deployed and one > > newly announced, have been mentioned. > > Is there an archive? The cryptography list is archived (somewhat imperfectly) at http://www.mail-archive.com/cryptography%40metzdowd.com/. You can read the discussion of the naming issues under the thread "Announcing httpsy://, a YURL scheme", originating with message http://www.mail-archive.com/cryptography%40metzdowd.com/msg00481.html. The YURL scheme itself is described at http://www.waterken.com/dev/YURL/ and related pages. Hal From zooko at zooko.com Fri Jul 18 06:58:01 2003 From: zooko at zooko.com (Zooko) Date: Sat Dec 9 22:12:21 2006 Subject: crypto naming Re: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs In-Reply-To: Message from "Gordon Mohr" of "Thu, 17 Jul 2003 22:32:00 PDT." <018f01c34ced$e9fdf280$660a000a@golden> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> <3F158787.6020902@chapweske.com> <3F158C33.4070604@gmx.de> <018f01c34ced$e9fdf280$660a000a@golden> Message-ID: Gordon Mohr wrote: > > I'd also be interested if there was an archive.
Or, if the concrete > proposals are available in draft form anywhere. The proposals I meant were: * The Eternal Resource Locator Anderson R. J., Matyas V., Jr. and Petitcolas F. A. P. "The Eternal Resource Locator: An Alternative Means of Establishing Trust on the World Wide Web", in 3rd USENIX workshop on Electronic Commerce, 1998, Boston, Massachusetts, USA, http://citeseer.nj.nec.com/365389.html * The Self-Certifying File System http://fs.net/ * YURL https://www.waterken.com/dev/YURL/ From mfreed at cs.nyu.edu Fri Jul 18 13:20:02 2003 From: mfreed at cs.nyu.edu (Michael J. Freedman) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: <1057775910.13939.896.camel@monster.omnifarious.org> Message-ID: Hi all, Given the recent long discussions about THEX and hash trees for multi-source download (i.e., swarming), I thought this alternate approach may be of direct interest: On-the-Fly Verification of Erasure-Encoded File Transfers (Extended Abstract) To appear in 1st IRIS Student Workshop on Peer-to-Peer Systems http://www.scs.cs.nyu.edu/~mfreed/docs/authcodes-isw03.ps .pdf (Please note that this is not a full paper and is still in draft form.) Basically, erasure codes such as LT Codes (Luby) and Online codes (Maymounkov) can be very useful for building more efficient multi-source download algorithms (see, for instance, "Rateless Codes and Big Downloads", at http://www.scs.cs.nyu.edu/~petar/msdlncs.ps) Unfortunately, traditional hash trees are not useful in this environment, as the "checkblocks" generated by erasure coding at mirrors are randomized -- the initial file publisher cannot verify these. This paper describes a technique, based on "homomorphic hashing", that allows the downloader to verify checkblocks on-the-fly. --mike On 9 Jul 2003, Eric M. Hopper wrote: > Date: 09 Jul 2003 13:38:30 -0500 > From: Eric M. Hopper > Reply-To: p2p-hackers@zgp.org > To: p2p-hackers@zgp.org > Subject: Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. > and alternate approach > > On Thu, 2003-07-03 at 23:47, Adam Back wrote: > > > So one common p2p download approach is to download a file in parts in > > parallel (and out of order) from multiple servers. (Particularly to > > achieve reasonable download rates from multiple asynchronous links of > > varying link speeds). A common idiom is also that there is a compact > > authenticator for a file (such as it's hash) which people will supply > > as a document-id. > > > There is an important thing you can do with an order 2 authentication > tree that is much harder to do with a tree of an order larger than 2. > That's download the authentication data needed to verify each packet > against the root along with the packet itself. > > For example, if you have a 2-way THEX tree for 2^3*blocksize data, it > will look something like this: > > Diagram of 2-way THEX tree > > If someone transmits the data for node A, in order to verify node A > completely, the hashes for B, J, and N need to be transmitted. No other > hashes are needed since they are already known, as in the case for the > root node, or can be calculated. > > If someone then transmits the data for node B, no hashes need to be > transmitted since the reciever already has all the needed hashes. For > C, only D is needed. 
> > If you have an 8-way THEX tree, you end up with a diagram like this: > > Node diagram for an 8-way tree > > If someone recieves node A, they will have to also get the hashes for > nodes B-H in order to verify node A. This is MUCH more information than > with a 2-way THEX tree, and as the depth of both trees grows, the 2-way > tree is favored more and more. > > I think one useful measure is how much data is needed to verify a given > block as compared to the size of a block. If you have a block size of > 64KB, and a hash data size of 32 bytes (I think SHA-1 is just a little > too weak, and prefer SHA2-256), then you can deal with a 16MB file and > still ensure that the maximum amount of data needed to verify any given > block is less than the size of a block. If you only use a 16 byte hash, > then you can send 4GB file and still keep that property. If you > maintain a 32 byte hash, but go to a 128KB block, you can transmit an > 8GB file and maintain that property. > > This is also highly resistant to jamming. If you get the verification > data from the same node that sent you the data block in the first place, > that node will be unable to spoof the verification data to make a bad > block look like a good one. If you get the verification data from a > different node than sent you the data block, that node will be unable to > spoof the hashes in order to make a good block look like a bad one. So > errors in either the verification hashes, or the block are easily and > quickly detectable. > > Lastly, it is easy to specify which subset of verification data you > need. Any data block you recieve may need > log2(number_of_blocks_in_file) hashes worth of verification data. You > can simply send a bitstring with one bit for each hash you might > potentially need saying whether or not you actually need it or not. > > Sorry this is in HTML. I just didn't want to have to use ASCII art for > the diagrams because it'd be a huge and annoying pain. > > Have fun (if at all possible), > > -- > There's an excellent C/C++/Python/Unix/Linux programmer with a wide > range of other experience and system admin skills who needs work. > Namely, me. http://www.omnifarious.org/~hopper/resume.html > -- Eric Hopper > > ----- "Not all those who wander are lost." www.michaelfreedman.org From justin at chapweske.com Fri Jul 18 15:06:02 2003 From: justin at chapweske.com (Justin Chapweske) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: References: Message-ID: <3F186F24.4010405@chapweske.com> Michael, Interesting results. Lots of great work coming out of you NYU guys :) Kademlia is a very nice system. I'd like to point out though that hash trees still work quite well when small expansion factors are used. With the original Swarmcast system back in 2000, we used an expansion factor of 8 (k=32, n=256) though I'm sure we could have gotten away with an even smaller expansion factor by adding a bit more scheduling to the protocol. Either way, it is quite manageable to build a hash tree across the entire set of encoded data, though it requires a bit of preprocessing. If you don't wish to make the hash tree dependant on the entire encoded set, you can create a number of hash trees, one for each expansion factor. So with a systematic code such as the Vandermonde codes, your first hash tree is equivilent to a normal hash tree over the vanilla file. Are Online Codes systematic? 
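As a side note for readers following the 2-way tree discussion quoted above, here is a minimal Python sketch of the mechanics: building a binary hash tree over fixed-size blocks and collecting the log2(n) sibling hashes needed to verify one block against the root. This is not the THEX serialization (THEX specifies Tiger and its own leaf/internal-node conventions); the SHA-256 hash, the block size, and the odd-node handling are assumptions for illustration only.

    import hashlib

    def h(data):
        return hashlib.sha256(data).digest()

    def build_levels(blocks):
        # levels[0] holds the leaf hashes, levels[-1] holds the single root.
        levels = [[h(b) for b in blocks]]
        while len(levels[-1]) > 1:
            prev, nxt = levels[-1], []
            for i in range(0, len(prev), 2):
                if i + 1 < len(prev):
                    nxt.append(h(prev[i] + prev[i + 1]))
                else:
                    nxt.append(prev[i])          # odd node promoted unchanged
            levels.append(nxt)
        return levels

    def proof(levels, index):
        # Sibling hash per level: (sibling-is-on-the-left?, sibling hash).
        path = []
        for level in levels[:-1]:
            sib = index ^ 1
            if sib < len(level):
                path.append((sib < index, level[sib]))
            index //= 2
        return path

    def verify(block, path, root):
        node = h(block)
        for on_left, sib in path:
            node = h(sib + node) if on_left else h(node + sib)
        return node == root

    blocks = [bytes([i]) * 1024 for i in range(8)]    # eight 1KB blocks
    levels = build_levels(blocks)
    root = levels[-1][0]
    print(verify(blocks[3], proof(levels, 3), root))  # True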
Obviously, your work is important if you decide to forgo the Reed-Solomon codes and use LT or Online codes which lend themselves to huge expansion factors. All in all, you guys do a great job of creating some very elegant systems. By the way, what is the patent status of the Online Codes? Thanks, -Justin Michael J. Freedman wrote: > Hi all, > > Given the recent long discussions about THEX and hash trees for > multi-source download (i.e., swarming), I thought this alternate approach > may be of direct interest: > > On-the-Fly Verification of Erasure-Encoded File Transfers (Extended Abstract) > To appear in 1st IRIS Student Workshop on Peer-to-Peer Systems > http://www.scs.cs.nyu.edu/~mfreed/docs/authcodes-isw03.ps .pdf > > (Please note that this is not a full paper and is still in draft form.) > > Basically, erasure codes such as LT Codes (Luby) and Online codes > (Maymounkov) can be very useful for building more efficient multi-source > download algorithms (see, for instance, "Rateless Codes and Big > Downloads", at http://www.scs.cs.nyu.edu/~petar/msdlncs.ps) > > Unfortunately, traditional hash trees are not useful in this environment, > as the "checkblocks" generated by erasure coding at mirrors are > randomized -- the initial file publisher cannot verify these. This paper > describes a technique, based on "homomorphic hashing", that allows the > downloader to verify checkblocks on-the-fly. > > --mike > > > On 9 Jul 2003, Eric M. Hopper wrote: > > >>Date: 09 Jul 2003 13:38:30 -0500 >>From: Eric M. Hopper >>Reply-To: p2p-hackers@zgp.org >>To: p2p-hackers@zgp.org >>Subject: Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. >> and alternate approach >> >>On Thu, 2003-07-03 at 23:47, Adam Back wrote: >> >> >>>So one common p2p download approach is to download a file in parts in >>>parallel (and out of order) from multiple servers. (Particularly to >>>achieve reasonable download rates from multiple asynchronous links of >>>varying link speeds). A common idiom is also that there is a compact >>>authenticator for a file (such as it's hash) which people will supply >>>as a document-id. >> >> >>There is an important thing you can do with an order 2 authentication >>tree that is much harder to do with a tree of an order larger than 2. >>That's download the authentication data needed to verify each packet >>against the root along with the packet itself. >> >>For example, if you have a 2-way THEX tree for 2^3*blocksize data, it >>will look something like this: >> >>Diagram of 2-way THEX tree >> >>If someone transmits the data for node A, in order to verify node A >>completely, the hashes for B, J, and N need to be transmitted. No other >>hashes are needed since they are already known, as in the case for the >>root node, or can be calculated. >> >>If someone then transmits the data for node B, no hashes need to be >>transmitted since the reciever already has all the needed hashes. For >>C, only D is needed. >> >>If you have an 8-way THEX tree, you end up with a diagram like this: >> >>Node diagram for an 8-way tree >> >>If someone recieves node A, they will have to also get the hashes for >>nodes B-H in order to verify node A. This is MUCH more information than >>with a 2-way THEX tree, and as the depth of both trees grows, the 2-way >>tree is favored more and more. >> >>I think one useful measure is how much data is needed to verify a given >>block as compared to the size of a block. 
If you have a block size of >>64KB, and a hash data size of 32 bytes (I think SHA-1 is just a little >>too weak, and prefer SHA2-256), then you can deal with a 16MB file and >>still ensure that the maximum amount of data needed to verify any given >>block is less than the size of a block. If you only use a 16 byte hash, >>then you can send 4GB file and still keep that property. If you >>maintain a 32 byte hash, but go to a 128KB block, you can transmit an >>8GB file and maintain that property. >> >>This is also highly resistant to jamming. If you get the verification >>data from the same node that sent you the data block in the first place, >>that node will be unable to spoof the verification data to make a bad >>block look like a good one. If you get the verification data from a >>different node than sent you the data block, that node will be unable to >>spoof the hashes in order to make a good block look like a bad one. So >>errors in either the verification hashes, or the block are easily and >>quickly detectable. >> >>Lastly, it is easy to specify which subset of verification data you >>need. Any data block you recieve may need >>log2(number_of_blocks_in_file) hashes worth of verification data. You >>can simply send a bitstring with one bit for each hash you might >>potentially need saying whether or not you actually need it or not. >> >>Sorry this is in HTML. I just didn't want to have to use ASCII art for >>the diagrams because it'd be a huge and annoying pain. >> >>Have fun (if at all possible), >> >>-- >>There's an excellent C/C++/Python/Unix/Linux programmer with a wide >>range of other experience and system admin skills who needs work. >>Namely, me. http://www.omnifarious.org/~hopper/resume.html >>-- Eric Hopper >> >> > > ----- > "Not all those who wander are lost." www.michaelfreedman.org > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers -- Justin Chapweske, Onion Networks http://onionnetworks.com/ From petar at scs.cs.nyu.edu Fri Jul 18 16:41:02 2003 From: petar at scs.cs.nyu.edu (Petar Maymounkov) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: <3F186F24.4010405@chapweske.com> Message-ID: Hi Guys, So, we haven't thought about using trees for Online Codes because they are rateless, which means that practically they have an infinite expansion, so a tree won't be feasible. (Unlike, Reed-Solomon and other codes, which have a pre-specified expansion). As for the patent, it is held by DigitalFountain, more specifically Michael Luby and Amin Shokrollahi. Petar On Fri, 18 Jul 2003, Justin Chapweske wrote: > Michael, > > Interesting results. Lots of great work coming out of you NYU guys :) > Kademlia is a very nice system. > > I'd like to point out though that hash trees still work quite well when > small expansion factors are used. > > With the original Swarmcast system back in 2000, we used an expansion > factor of 8 (k=32, n=256) though I'm sure we could have gotten away with > an even smaller expansion factor by adding a bit more scheduling to the > protocol. Either way, it is quite manageable to build a hash tree > across the entire set of encoded data, though it requires a bit of > preprocessing. > > If you don't wish to make the hash tree dependant on the entire encoded > set, you can create a number of hash trees, one for each expansion > factor. 
So with a systematic code such as the Vandermonde codes, your > first hash tree is equivilent to a normal hash tree over the vanilla > file. Are Online Codes systematic? > > Obviously, your work is important if you decide to forgo the > Reed-Solomon codes and use LT or Online codes which lend themselves to > huge expansion factors. All in all, you guys do a great job of creating > some very elegant systems. By the way, what is the patent status of the > Online Codes? > > Thanks, > > -Justin > > Michael J. Freedman wrote: > > Hi all, > > > > Given the recent long discussions about THEX and hash trees for > > multi-source download (i.e., swarming), I thought this alternate approach > > may be of direct interest: > > > > On-the-Fly Verification of Erasure-Encoded File Transfers (Extended Abstract) > > To appear in 1st IRIS Student Workshop on Peer-to-Peer Systems > > http://www.scs.cs.nyu.edu/~mfreed/docs/authcodes-isw03.ps .pdf > > > > (Please note that this is not a full paper and is still in draft form.) > > > > Basically, erasure codes such as LT Codes (Luby) and Online codes > > (Maymounkov) can be very useful for building more efficient multi-source > > download algorithms (see, for instance, "Rateless Codes and Big > > Downloads", at http://www.scs.cs.nyu.edu/~petar/msdlncs.ps) > > > > Unfortunately, traditional hash trees are not useful in this environment, > > as the "checkblocks" generated by erasure coding at mirrors are > > randomized -- the initial file publisher cannot verify these. This paper > > describes a technique, based on "homomorphic hashing", that allows the > > downloader to verify checkblocks on-the-fly. > > > > --mike > > > > > > On 9 Jul 2003, Eric M. Hopper wrote: > > > > > >>Date: 09 Jul 2003 13:38:30 -0500 > >>From: Eric M. Hopper > >>Reply-To: p2p-hackers@zgp.org > >>To: p2p-hackers@zgp.org > >>Subject: Re: [p2p-hackers] THEX efficiency for authenticated p2p dload. > >> and alternate approach > >> > >>On Thu, 2003-07-03 at 23:47, Adam Back wrote: > >> > >> > >>>So one common p2p download approach is to download a file in parts in > >>>parallel (and out of order) from multiple servers. (Particularly to > >>>achieve reasonable download rates from multiple asynchronous links of > >>>varying link speeds). A common idiom is also that there is a compact > >>>authenticator for a file (such as it's hash) which people will supply > >>>as a document-id. > >> > >> > >>There is an important thing you can do with an order 2 authentication > >>tree that is much harder to do with a tree of an order larger than 2. > >>That's download the authentication data needed to verify each packet > >>against the root along with the packet itself. > >> > >>For example, if you have a 2-way THEX tree for 2^3*blocksize data, it > >>will look something like this: > >> > >>Diagram of 2-way THEX tree > >> > >>If someone transmits the data for node A, in order to verify node A > >>completely, the hashes for B, J, and N need to be transmitted. No other > >>hashes are needed since they are already known, as in the case for the > >>root node, or can be calculated. > >> > >>If someone then transmits the data for node B, no hashes need to be > >>transmitted since the reciever already has all the needed hashes. For > >>C, only D is needed. > >> > >>If you have an 8-way THEX tree, you end up with a diagram like this: > >> > >>Node diagram for an 8-way tree > >> > >>If someone recieves node A, they will have to also get the hashes for > >>nodes B-H in order to verify node A. 
This is MUCH more information than > >>with a 2-way THEX tree, and as the depth of both trees grows, the 2-way > >>tree is favored more and more. > >> > >>I think one useful measure is how much data is needed to verify a given > >>block as compared to the size of a block. If you have a block size of > >>64KB, and a hash data size of 32 bytes (I think SHA-1 is just a little > >>too weak, and prefer SHA2-256), then you can deal with a 16MB file and > >>still ensure that the maximum amount of data needed to verify any given > >>block is less than the size of a block. If you only use a 16 byte hash, > >>then you can send 4GB file and still keep that property. If you > >>maintain a 32 byte hash, but go to a 128KB block, you can transmit an > >>8GB file and maintain that property. > >> > >>This is also highly resistant to jamming. If you get the verification > >>data from the same node that sent you the data block in the first place, > >>that node will be unable to spoof the verification data to make a bad > >>block look like a good one. If you get the verification data from a > >>different node than sent you the data block, that node will be unable to > >>spoof the hashes in order to make a good block look like a bad one. So > >>errors in either the verification hashes, or the block are easily and > >>quickly detectable. > >> > >>Lastly, it is easy to specify which subset of verification data you > >>need. Any data block you recieve may need > >>log2(number_of_blocks_in_file) hashes worth of verification data. You > >>can simply send a bitstring with one bit for each hash you might > >>potentially need saying whether or not you actually need it or not. > >> > >>Sorry this is in HTML. I just didn't want to have to use ASCII art for > >>the diagrams because it'd be a huge and annoying pain. > >> > >>Have fun (if at all possible), > >> > >>-- > >>There's an excellent C/C++/Python/Unix/Linux programmer with a wide > >>range of other experience and system admin skills who needs work. > >>Namely, me. http://www.omnifarious.org/~hopper/resume.html > >>-- Eric Hopper > >> > >> > > > > ----- > > "Not all those who wander are lost." www.michaelfreedman.org > > > > _______________________________________________ > > p2p-hackers mailing list > > p2p-hackers@zgp.org > > http://zgp.org/mailman/listinfo/p2p-hackers > > > -- > Justin Chapweske, Onion Networks > http://onionnetworks.com/ > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > From decoy at iki.fi Sat Jul 19 04:45:02 2003 From: decoy at iki.fi (Sampo Syreeni) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: <3F186F24.4010405@chapweske.com> References: <3F186F24.4010405@chapweske.com> Message-ID: On 2003-07-18, Justin Chapweske uttered: >By the way, what is the patent status of the Online Codes? I'd also check whether there are patents on the overall concept of using sparse codes for swarmcast -- I think using them in conventional multicast is in fact patented. 
-- Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111 student/math+cs/helsinki university, http://www.iki.fi/~decoy/front openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2 From sam at neurogrid.com Sat Jul 19 23:58:02 2003 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] New Wiki List of P2P Conferences Message-ID: <3F1A3D2B.3080804@neurogrid.com> Hi All, Just to let you know that I've set up a list of p2p conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences Most of these are ones that have already happened, but the top most 3 are yet to come, and the top most one is still open for submissions. I'm hoping we can get more conferences in there that have upcoming submission deadlines so we can have a better chance of getting any work we write up to these conferences. Please feel free to add conferences to the wiki, or mail me with upcoming p2p related conferences and I'll try and make sure they get added to the list ... CHEERS> SAM From b.fallenstein at gmx.de Sun Jul 20 12:55:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] New Wiki List of P2P Conferences In-Reply-To: <3F1A3D2B.3080804@neurogrid.com> References: <3F1A3D2B.3080804@neurogrid.com> Message-ID: <3F1AF326.7010804@gmx.de> Hi Sam, I'm unable to register for the Wiki (I get an error message that email could not be delivered). I suggest that you add USENIX NSDI'04 to the list-- the CFP mentions P2P systems as one area of interest. Deadline is September 15th. http://www.usenix.org/events/nsdi04/cfp/ Thanks for the effort, the Wiki looks like a good idea. - Benja Sam Joseph wrote: > Hi All, > > Just to let you know that I've set up a list of p2p conferences: > > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences > > Most of these are ones that have already happened, but the top most 3 > are yet to come, and the top most one is still open for submissions. I'm > hoping we can get more conferences in there that have upcoming > submission deadlines so we can have a better chance of getting any work > we write up to these conferences. > > Please feel free to add conferences to the wiki, or mail me with > upcoming p2p related conferences and I'll try and make sure they get > added to the list ... > > CHEERS> SAM > > > _______________________________________________ > p2p-hackers mailing list > p2p-hackers@zgp.org > http://zgp.org/mailman/listinfo/p2p-hackers > > From b.fallenstein at gmx.de Sun Jul 20 13:37:01 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> <3F159813.4070903@gmx.de> Message-ID: <3F1AFD07.2030509@gmx.de> Peter Thiemann wrote: > Benja> Now, given > > Benja> urn:sha1:text/plain, > > Benja> and a pair > > Benja> ("text/plain", "foobar") > > Benja> we can verify the against "foobar", and compare the > Benja> "text/plain" from the URI to the "text/plain" in the pair, so the URI > Benja> clearly maps to only one such pair (as long as finding a hash > Benja> collision is impossible). Thus, sha1 identifiers of this form *would* > Benja> be self-verifying. > > There is too much good will and protocol in your proposal. Sorry, I don't understand what you mean. 
> Here is a > much simpler way of getting self-verifying and endorsable hashes: > > Let's say you have this resource > > ("text/plain", "fubar") > > the trick is to take the hash not just from "fubar" but from the > concatenation of the mediatype and the contents: > > H = sha1 ("text/plainfubar") This is a classic fallacy when working with hashes btw: There is no way to know which of the following is meant: ("text/plai", "nfubar") ("text/plain", "fubar") ("text/plainf", "ubar") Of course, that's easily fixed by separating the two by a character that cannot occur in the media type, probably the space-- H = sha1 ("text/plain fubar") > With this setup, if there is a registry which endorses > > urn:hash:text/plain:sha1:H > > then that means the registry has checked > 1. that the "fubar" content is ok and > 2. that its mediatype is "text/plain" What's the cost? That it doesn't interoperate with today's systems, which only hash the content; that all sha1 hashes in ciculation on the Web today are useless. What's the benefit? That you can use H alone to identify the (content type, content) pair. Would this really be useful? I don't see how. > In addition, the value H is *less* prone to forgery because the hashed > value is not completely arbitrary: it *must* start with "text/plain". I fail to see what you mean by "less prone to forgery." > If this works, then it seems to be a quite strong argument for > including content types. Sorry, but while I understand what you propose, I seriously don't have the faintest clue why you propose it... :-) I neither understand what you mean by "good will and protocol" nor what you mean by "less prone to forgery." What kind of forgery do you mean? We're miscommunicating somewhere, I think. - Benja From b.fallenstein at gmx.de Sun Jul 20 15:31:02 2003 From: b.fallenstein at gmx.de (Benja Fallenstein) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> <3F159813.4070903@gmx.de> <3F1AFD07.2030509@gmx.de> Message-ID: <3F1B17C1.8030700@gmx.de> Hi Peter, (quoting out-of-order) Peter Thiemann wrote: > My proposal aims at adding internal structure artificially just during > the computation of the hash. For the fubar example, this means > > H = sha1 ("text/plainfubar") > > Suppose some adversary has found a string x so that > > sha1 (x) = H > > Too bad! But, what is the probability that x starts with > "text/plain"? Ok, I'm starting to see what you want, here. You have a specific model of attack in mind: Given a hash, the attacker can generate a (relatively) random bit string which has that hash. The attacker has little choice in the shape of this bit string. In this model, of course, it's quite unlikely that 'x' happens to start with 'text/plain.' But of course the attacker will try to devise an algorithm that finds an 'x' such that sha1 ("text/plain" + x) = H or sha1 (x repetitive-xor "text/plain") = H Basically, what you're doing is creating a family of hash functions, where the "text/plain" parameter chooses one function from that family. Your family would be defined as, f_str (x) = sha1 (x + str) where str is the parameter. Now, do you seriously think that once sha1 is broken, a skilled attacker will not be able to break f_str? 
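To make the point about parameterized families concrete, here is a tiny Python sketch following the sha1("text/plain fubar") form above; the space separator is the one suggested earlier in the thread, and nothing here is a registered scheme.

    import hashlib

    def f(media_type, content):
        # One family member per media type: f_str(x) = sha1(str + " " + x).
        # The media type only selects the member; the strength still rests on sha1.
        return hashlib.sha1(media_type.encode("ascii") + b" " + content).hexdigest()

    print(f("text/plain", b"fubar"))   # member chosen by "text/plain"
    print(f("image/jpeg", b"fubar"))   # different member, different digest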
I'm not an expert in this field either, but it seems like if you can break a hash function consisting of multiple rounds of addition, circular left shift, complement, and AND/OR/XOR in a complex pattern, that plainly prepending/xoring a bitstring won't add a lot of security. OTOH, the community already considers a function *really, really* broken if you can find any x,y so that hash(x) = hash(y). So it's possible that no "get x for given H with hash(x)=H" attack will ever be devised. This is no protection against a skilled black-hat cryptoanalyst, of course. If you think that f_str is a strong new family of hash functions, stronger than sha1, you should publish about it and subject it to peer review ;) ;) >>>>>>"Benja" == Benja Fallenstein writes: > >> There is too much good will and protocol in your proposal. > > Benja> Sorry, I don't understand what you mean. > > It means you need to trust somebody else who maintains this pair. > If you got the urn and the file, then you can only verify it if you > can verify the content type. I'm wondering if that's always possible. This doesn't seem to be the same argument as above (lengthening the durability of the hash function beyond breakages). I don't quite see the point of this paragraph. Either you have someone who's approved of the URN (maybe simply sent it to you). Then this person has vouched for the content type as well as the hash. Or you assume that someone may have tampered with the URN, and changed the content type to something that suits that adversary. If so, I don't see how the adversary couldn't also have changed the hash. Or maybe you assume that someone has vouched for the hash, but not the content type (i.e., not for the full URN). In that case, I would not conclude that the person has vouched for urn:sha1:text/plain,hash -- I would only conclude they have vouched for urn:sha1:hash ... So, you seem to say two things: 1. Putting the content type into the hashed data makes the hash usable beyond breakages of the hash function. I think this is most likely incorrect, assuming an attack by a skilled cryptoanalyst (rather than a script kiddie). 2. Something else, which I still don't understand ;-) > I hope this is becoming clearer :-) Clearer-- but not yet clear :-) - Benja From sam at neurogrid.com Sun Jul 20 19:25:02 2003 From: sam at neurogrid.com (Sam Joseph) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] New Wiki List of P2P Conferences References: <3F1A3D2B.3080804@neurogrid.com> <3F1AF326.7010804@gmx.de> Message-ID: <3F1B4EC7.5030304@neurogrid.com> Hi Benja Benja Fallenstein wrote: > I'm unable to register for the Wiki (I get an error message that email > could not be delivered). I suggest that you add USENIX NSDI'04 to the > list-- the CFP mentions P2P systems as one area of interest. Deadline > is September 15th. > > http://www.usenix.org/events/nsdi04/cfp/ > > Thanks for the effort, the Wiki looks like a good idea. No worries - have added the USENIX conference. Sorry about you not being able to log in - Actually you did successfully create your account and could have logged in - it was just the email notification that failed - I'm working with my ISP to try and fix that problem ... You're in the system as BenjaFallenstein - I believe that you should now be able to log in with the password you created previously - if you can't let me know and I'll reset your account. 
CHEERS> SAM From dirkx at webweaving.org Mon Jul 21 05:35:03 2003 From: dirkx at webweaving.org (Dirk-Willem van Gulik) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs (was: Comments on draft-thiemann-cbuid-urn-00) In-Reply-To: <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> Message-ID: <20030720204545.T1449-100000@foem> On Wed, 16 Jul 2003, Carl Ellison wrote: > Is urn:sha1:dRDPBgZzTFq7Jl2Q2N/YNghcfj8= not now legal? Until there is a registered URN namespace/authority called 'sha1' - nope. You could use x-sha1 - but that is kind of frowned upon. > > urn:sha1:... > > urn:sha256:... > > urn:tiger:... > > urn:sha1+sha256:... > > urn:sha1+tiger:... > > urn:sha256+tiger:... > > urn:sha1+sha256+tiger:... Or regigster urn:assortedhashes:.... or urn:hash:.... and write an RFC which would document that it has to have the shape: urn:hash:[a-z\+]+: And then in that RFC either define all of the above; or devise a scheme by which additional A+B+C's can be added to that block. Obviously in the above replace s/hash/by-whatever/ you want. Dw. From thiemann at informatik.uni-freiburg.de Mon Jul 21 05:35:06 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: <3F159813.4070903@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> <3F159813.4070903@gmx.de> Message-ID: >>>>> "Benja" == Benja Fallenstein writes: Benja> Hi Justin, Benja> Justin Chapweske wrote: >> Perhaps I'm misinterpretting cbuid, but it appears to suggest that >> the hash URIs include the content-type of the content in the URI. >> I think this notion is a good idea from a security perspective. It >> is important to retrieve the content-type from a trusted source. >> While probably not practical, one could envision a content-type >> attack where something is perfectly harmless when interpretted as a >> JPEG, but turns into a virus when interpretted as an >> executable. Well, my feelings towards this are similar to Benja's. Benja> In my mind, the entity that we're identifying is a pair: Benja> (content type, octet stream) Benja> The question is, given such a pair, and given a hash-based URI, can we Benja> authenticate that the URI identifies exactly this pair and no other? Benja> If so, I would call the URI self-verifying. Benja> Now, given Benja> urn:sha1:text/plain, Benja> and a pair Benja> ("text/plain", "foobar") Benja> we can verify the against "foobar", and compare the Benja> "text/plain" from the URI to the "text/plain" in the pair, so the URI Benja> clearly maps to only one such pair (as long as finding a hash Benja> collision is impossible). Thus, sha1 identifiers of this form *would* Benja> be self-verifying. There is too much good will and protocol in your proposal. Here is a much simpler way of getting self-verifying and endorsable hashes: Let's say you have this resource ("text/plain", "fubar") the trick is to take the hash not just from "fubar" but from the concatenation of the mediatype and the contents: H = sha1 ("text/plainfubar") With this setup, if there is a registry which endorses urn:hash:text/plain:sha1:H then that means the registry has checked 1. that the "fubar" content is ok and 2. that its mediatype is "text/plain" In addition, the value H is *less* prone to forgery because the hashed value is not completely arbitrary: it *must* start with "text/plain". 
[I don't know enough about hash functions to judge if this introduces enough regularity. If more regularity is required, then compute the hash of the resource with the mediatype interleaved in some fixed way, for example, at the start of each 1024 byte data block. Alternatively, the hash might be taken from the resource by exor-ing cyclically with the mediatype. I don't know which alternative would be better.] If this works, then it seems to be a quite strong argument for including content types. -Peter From thiemann at informatik.uni-freiburg.de Mon Jul 21 05:35:09 2003 From: thiemann at informatik.uni-freiburg.de (Peter Thiemann) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] content-types in URIs In-Reply-To: <3F1AFD07.2030509@gmx.de> References: <3F0DC458.6090401@gmx.de> <3F155F28.6070005@gmx.de> <3F158A90.2060503@chapweske.com> <3F159813.4070903@gmx.de> <3F1AFD07.2030509@gmx.de> Message-ID: >>>>> "Benja" == Benja Fallenstein writes: Benja> Peter Thiemann wrote: Benja> Now, given Benja> urn:sha1:text/plain, Benja> and a pair Benja> ("text/plain", "foobar") Benja> we can verify the against "foobar", and compare Benja> the Benja> "text/plain" from the URI to the "text/plain" in the pair, so the URI Benja> clearly maps to only one such pair (as long as finding a hash Benja> collision is impossible). Thus, sha1 identifiers of this form *would* Benja> be self-verifying. >> There is too much good will and protocol in your proposal. Benja> Sorry, I don't understand what you mean. It means you need to trust somebody else who maintains this pair. If you got the urn and the file, then you can only verify it if you can verify the content type. I'm wondering if that's always possible. >> Here is a >> much simpler way of getting self-verifying and endorsable hashes: >> Let's say you have this resource >> ("text/plain", "fubar") >> the trick is to take the hash not just from "fubar" but from the >> concatenation of the mediatype and the contents: >> H = sha1 ("text/plainfubar") Benja> This is a classic fallacy when working with hashes btw: There is no Benja> way to know which of the following is meant: Benja> ("text/plai", "nfubar") Benja> ("text/plain", "fubar") Benja> ("text/plainf", "ubar") Hmm, I'd say it's trivial because the urn (see below) states the content type as a string. So the boundary is obvious and implicit. >> With this setup, if there is a registry which endorses >> urn:hash:text/plain:sha1:H >> then that means the registry has checked >> 1. that the "fubar" content is ok and >> 2. that its mediatype is "text/plain" Benja> What's the cost? That it doesn't interoperate with today's systems, Benja> which only hash the content; that all sha1 hashes in ciculation on the Benja> Web today are useless. No! 1. They are not useless. If the urn doesn't state the content type, then the hash is applied just to the resource. 2. Interoperation is given in the same way, but you could migrate towards the other scheme. Benja> What's the benefit? That you can use H alone to identify the (content Benja> type, content) pair. Would this really be useful? I don't see how. Well, this is the explanation: >> In addition, the value H is *less* prone to forgery because the hashed >> value is not completely arbitrary: it *must* start with "text/plain". Benja> I fail to see what you mean by "less prone to forgery." What I mean to say is this: if you are hashing a file with some internal structure, then it's much harder to construct a collision. 
This is because the adversary would have to come up with an offending file that has a. the same hash and b. the same internal structure. My proposal aims at adding internal structure artificially just during the computation of the hash. For the fubar example, this means H = sha1 ("text/plainfubar") Suppose some adversary has found a string x so that sha1 (x) = H Too bad! But, what is the probability that x starts with "text/plain"? Very small, but I'm not enough of an expert to judge how much smaller. If the probability is not small enough, then create more structure (see my suggestions in the other message) to make the probability smaller. This way you can outlive broken hash functions, which seems to be an advantage. Benja> We're miscommunicating somewhere, I think. I hope this is becoming clearer :-) -Peter From michael at neonym.net Mon Jul 21 05:35:11 2003 From: michael at neonym.net (Michael Mealling) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Re: [Cfrg] Cryptographic hashes in URNs (was: Comments on draft-thiemann-cbuid-urn-00) In-Reply-To: <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> References: <3F0DC458.6090401@gmx.de> <3F1427B7.6010402@chapweske.com> <3.0.5.32.20030716080607.018df0c8@mailbox.jf.intel.com> Message-ID: <1058740510.3214.1.camel@blackdell.neonym.net> On Wed, 2003-07-16 at 11:06, Carl Ellison wrote: > I apparently don't know the rules for URN formation. I thought it > was completely free after the "urn:". Is that not true? Essentially, yes. There are syntactic restrictions that exist for all URIs but that's fairly generic. URNs still have to be persistent so you can't put things like domain-names in there unless you time/date stamp them.... -Michael Mealling From bram at gawth.com Mon Jul 21 18:09:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: Message-ID: Sampo Syreeni wrote: > I'd also check whether there are patents on the overall concept of using > sparse codes for swarmcast -- I think using them in conventional multicast > is in fact patented. Good thing online codes are completely unnecessary for swarming. I've said this before, but it's worth repeating: BitTorrent doesn't use online codes because they would increase overhead and produce at best dubious improvements in download rates. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From sherrysemails at yahoo.com Mon Jul 21 21:49:02 2003 From: sherrysemails at yahoo.com (Carolyn Tracy) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] New Wiki List of P2P Conferences In-Reply-To: <3F1B4EC7.5030304@neurogrid.com> Message-ID: <20030722044809.58759.qmail@web20910.mail.yahoo.com> please tell me how i can be removed from this ! I have no interest in getting anymore emails from anyone from here anymore. Thank you ... Sam Joseph wrote: Hi Benja Benja Fallenstein wrote: > I'm unable to register for the Wiki (I get an error message that email > could not be delivered). I suggest that you add USENIX NSDI'04 to the > list-- the CFP mentions P2P systems as one area of interest. Deadline > is September 15th. > > http://www.usenix.org/events/nsdi04/cfp/ > > Thanks for the effort, the Wiki looks like a good idea. No worries - have added the USENIX conference.
Sorry about you not being able to log in - Actually you did successfully create your account and could have logged in - it was just the email notification that failed - I'm working with my ISP to try and fix that problem ... You're in the system as BenjaFallenstein - I believe that you should now be able to log in with the password you created previously - if you can't let me know and I'll reset your account. CHEERS> SAM _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers --------------------------------- Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software -------------- next part -------------- An HTML attachment was scrubbed... URL: http://zgp.org/pipermail/p2p-hackers/attachments/20030721/d184581c/attachment.htm From conspiracytheory1 at hotmail.com Tue Jul 22 05:42:02 2003 From: conspiracytheory1 at hotmail.com (sam  ) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] unsubscribe me Please Message-ID: >From: "Doug Burton" >Reply-To: p2p-hackers@zgp.org >To: >Subject: [p2p-hackers] unsubscribe me Please >Date: Fri, 4 Jul 2003 18:38:33 -0400 > _________________________________________________________________ Use MSN Messenger to send music and pics to your friends http://www.msn.co.uk/messenger From xmaj at hotmail.com Wed Jul 23 06:03:02 2003 From: xmaj at hotmail.com (reza majidi) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] unsubscribe me pls. Message-ID: _________________________________________________________________ MSN 8 with e-mail virus protection service: 2 months FREE* http://join.msn.com/?page=features/virus From hopper at omnifarious.org Wed Jul 23 23:23:02 2003 From: hopper at omnifarious.org (Eric M. Hopper) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: References: Message-ID: <1059027736.11698.374.camel@monster.omnifarious.org> On Mon, 2003-07-21 at 20:08, Bram Cohen wrote: > Sampo Syreeni wrote: > > > I'd also check whether there are patents on the overall concept of using > > sparse codes for swarmcast -- I think using them in conventional multicast > > is in fact patented. > > Good thing online codes are completely unnecessary for swarming. > > I've said this before, but it's worth repeating: BitTorrent doesn't use > online codes because they would increase overhead and produce at best > dubious improvements in download rates. The only thing I can see them buying you is a little more robustness against seeds disappearing. Have fun (if at all possible), -- There's an excellent C/C++/Python/Unix/Linux programmer with a wide range of other experience and system admin skills who needs work. Namely, me. http://www.omnifarious.org/~hopper/resume.html -- Eric Hopper -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: This is a digitally signed message part Url : http://zgp.org/pipermail/p2p-hackers/attachments/20030723/d2ac4e16/attachment.pgp From tyler at waterken.com Thu Jul 24 10:53:02 2003 From: tyler at waterken.com (Tyler Close) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Listing P2P URL schemes Message-ID: I am building a list of P2P URL schemes at: http://www.waterken.com/dev/YURL/#YURL_schemes The list is divided into two sections: URLs that locate active computing agents and URLs that locate files. 
I think many of the participants on this mailing list have URL schemes that should be listed in the second section of the list. Please send me links for other URL schemes that belong on the list. I'll try to update the list in real-time as I receive new links. Thank you, Tyler -- The union of REST and capability-based security: http://www.waterken.com/dev/Web/

From digi at treepy.com Thu Jul 24 11:43:02 2003 From: digi at treepy.com (p@) Date: Sat Dec 9 22:12:21 2006 Subject: AW: [p2p-hackers] Listing P2P URL schemes In-Reply-To: Message-ID: <004501c35213$5306d9c0$0200a8c0@pat>

Here are the treepy URL definitions... the reserved URLs are not complete yet.

-----Original Message----- From: p2p-hackers-admin@zgp.org [mailto:p2p-hackers-admin@zgp.org] On behalf of Tyler Close Sent: Thursday, 24 July 2003 19:31 To: p2p-hackers@zgp.org Subject: [p2p-hackers] Listing P2P URL schemes

I am building a list of P2P URL schemes at: http://www.waterken.com/dev/YURL/#YURL_schemes The list is divided into two sections: URLs that locate active computing agents and URLs that locate files. I think many of the participants on this mailing list have URL schemes that should be listed in the second section of the list. Please send me links for other URL schemes that belong on the list. I'll try to update the list in real-time as I receive new links. Thank you, Tyler -- The union of REST and capability-based security: http://www.waterken.com/dev/Web/

_______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

-------------- next part --------------
========================================================================
Treepy URL Definitions
2003.03.04, Patrick Lauber - Initial document release
========================================================================

The structure of treepy URLs:

abstract: 3pi://directory1.directory2/path1/path2?argument&subargument
virtual:  3pi://entertainment.cinema.movieDB/action/terminator2.avi?name&changed
internal: 3pi://entertainment.cinema.movieDB/52?name&changed

3pi://                       = schema
entertainment.cinema.movieDB = location... normally directory clusters
/action/terminator2.avi      = virtual path to a cluster... this address is defined by the directory cluster (in this example movieDB; there is an internal address too... something like /_17); if the cluster is a directory cluster the path is always '/'
?name                        = argument or subobject... see the list of reserved arguments later
&changed                     = sub-argument or sub-sub-object... see the list of reserved sub-arguments later

The virtual URLs are only used for display in GUIs. The GUI is advised to use internal ones for communication with the core... and virtual ones for communication with users. The reason we have to use internal URLs is that if a directory-cluster admin chooses to move a cluster in the tree, the URL would change... this would be a big security problem, because only the cluster-creator should be allowed to choose the parent (or the URL, which is the same thing).
=======================================================================
Possible Arguments

core/load arguments (GET) (see treepyhttp.txt for more infos on core/load)
-------------------------
Only one argument can be provided at one time.
example: http://localhost:3141/core/load?url="3pi://entertainment.cinema/?_arglist"

Reserved arguments:
Reserved arguments are either write-protected or only writeable by the cluster-creator or moderators.

X = write protected (only core can write)
K = write only by cluster owner
M = write by all mods and cluster owner
C = needs a computation before output
l = is a list
t = is a table
s = string
n = number
b = binary (1 or 0)
c = not accessible via tcp-COM-Load

_*               Joker argument... used for clusters with sources... means download-all
_argcreatorlist  Xt    returns a list of argument creators and their total size of data and other infos
_argdelete       MCb
_argdublicate    MCb
_arglist         Xl    returns a list of all used arguments or subobjects
_argnr           XCn c returns the number of arguments or subobjects
_argmaxsize      Mn    returns the maximum size of one argument
_argtreesigned   MXt   returns the signed argument tree
_argtreeunsigned MXt   returns the unsigned argument tree ... new uploads go here
_argtreeadd
_argtreemove     MCb
_auth            XCb   returns if there is authentication required to access this cluster (white-list)
_blacklist       Kt    returns IPs and keys of banned people
_changed         Xs    returns the date of last change
_createdate      CXs   returns the date of creation
_edit            XCb c returns if you have the right to change content of this cluster
_hasargtree            returns if the cluster uses argument-trees
_info            XCt c returns a text file with some infos about the cluster
_iscreator       XCb c returns if you have a private key
_ismember        Xb  c returns if you are a member of this cluster
_ismod           XCb c returns if you have a private key
_keepsources     Xb    returns if you should register the arguments you have after coming online again
_lastaccess            returns the date of last access to this file
_memberlist      Xl    returns a list of nodes that are members of this cluster
_membernr        XCn c returns the number of cluster members
_minmembers      Cn    returns the minimal number of cluster-members
_modkeys         Kt    returns the keys of the moderators that are allowed to change some content
_move            XCb c returns if you have the right to move this cluster (are you admin of the parent-directory-cluster?)
_name            Xs  c returns the name of the cluster in the tree (used for GUI, defined by directory-parent)
_pathlist        Xl    returns a list of all argument paths (directory only)
_parent          Ks    returns the parent-directory-cluster-url of this url
_parentarg       Xs    returns under which argument the parent has saved you
_ping            XCn c returns the round-trip time of a packet through the cluster
_plugin          Ks    returns the plugin name, plugin version, plugin URL
_pubargs         Kb    returns if the public can create sub-objects (arguments)
_pubargquota     Mb    returns the maximum size in bytes of data saved by one client
_pubargretime    Mn    returns many infos on how many times a minute/hour/day someone can post
_pubkey          Ks    returns the public key of the creator
_updatecache     Mn    returns how big the updates list is (default: 1024)
_upload          XCb c returns if you are allowed to attach other clusters to this cluster
_treehash              returns a TIGER treehash of all arguments
_ranmember       XCs c returns one random member ip:port
_referrers       Xl    returns all referrers to this cluster (links)
_size            XCn c returns the size of all data saved in this cluster
_sources         Xt    returns a list of more clients with what arguments they have
_whitelist       Kt    returns a list of usernames/passwords
_whitelistkeys   Kt    returns pubkeys of users (enhanced security)

No argument returns _info, or, if plugin=directory, returns _argtree and _pathlist.
Other arguments can be used freely if signed by _privkey or if pubargs is true.

Sub-arguments or reserved sub-sub-objects
-----------------------------------------
Multiple sub-arguments are possible (handled like one), like this: ?argument&subargument

_changed         Xs    returns the date this argument has changed
_pubkey          Xs    returns the public key of the subobject creator
_haskey          XCs   returns the private key of the subobject creator
_sig             Xs    returns the signature of the parent-argument signed by the creator
_hash            X     if there is no sig... the hash is saved instead
_encrypted       Xb    returns if the data was encrypted
_enryptpubkey    Xs    returns the pub key with which the data was encrypted (plain/text)

=====================================================================================
core/save arguments (POST)
--------------------------
reserved arguments:
xml = true     upload a xml file that is like infoxml to change infos
create = true  create a new cluster (provide a xml file with details)
all reserved arguments from load apply here too
example: http://localhost:3141/core/save?url="3pi://entertainment.cinema/?_arglist"&data="

From tyler at waterken.com Fri Jul 25 2003 From: tyler at waterken.com (Tyler Close) Date: Sat Dec 9 22:12:21 2006 Subject: Re: AW: [p2p-hackers] Listing P2P URL schemes References: <004501c35213$5306d9c0$0200a8c0@pat> Message-ID:

On Thursday 24 July 2003 14:42, digi@treepy.com wrote: > Here are the treepy url definitions... reserved URL's are not complete > yet. Do any of these URLs provide security guarantees like those provided by Mnet? I should have made more clear that the list I am building is of URL schemes that provide P2P security. I am trying to list other P2P network protocols like Mnet. For example, any P2P file sharing program that identifies and authenticates a file based solely on a cryptographic hash of the file qualifies for the list. Thanks, Tyler -- The union of REST and capability-based security: http://www.waterken.com/dev/Web/
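As a reading aid for the treepy URL definitions above, the following short Python sketch splits a 3pi:// URL into the parts named there (schema, location, path, argument, sub-arguments). The splitting rules are inferred from the examples in the attachment only; the function and field names are illustrative and are not part of treepy itself.

    # Minimal sketch of splitting a 3pi:// URL into the parts named in the
    # treepy notes above. Inferred from the examples; not treepy's code.

    def parse_3pi(url):
        assert url.startswith("3pi://"), "only the 3pi schema is handled here"
        rest = url[len("3pi://"):]

        # Split off "?argument&subargument" first, if present.
        args = []
        if "?" in rest:
            rest, argpart = rest.split("?", 1)
            args = argpart.split("&")        # e.g. ["name", "changed"]

        # Location is the dotted cluster part, path is everything after "/".
        if "/" in rest:
            location, path = rest.split("/", 1)
            path = "/" + path
        else:
            location, path = rest, "/"       # directory clusters use "/"

        return {"location": location, "path": path,
                "argument": args[0] if args else None,
                "subarguments": args[1:]}

    print(parse_3pi("3pi://entertainment.cinema.movieDB/action/terminator2.avi?name&changed"))
    # -> location 'entertainment.cinema.movieDB', path '/action/terminator2.avi',
    #    argument 'name', subarguments ['changed']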
Cheers p@ -----Urspr?ngliche Nachricht----- Von: p2p-hackers-admin@zgp.org [mailto:p2p-hackers-admin@zgp.org] Im Auftrag von Tyler Close Gesendet: Freitag, 25. Juli 2003 17:39 An: p2p-hackers@zgp.org Betreff: Re: AW: [p2p-hackers] Listing P2P URL schemes On Thursday 24 July 2003 14:42, digi@treepy.com wrote: > Here are the treepy url definitions... reserved URL's are not complete > yet. Do any of these URLs provide security guarantees like those provided by Mnet? I should have made more clear that the list I am building is of URL schemes that provide P2P security. I am trying to list other P2P network protocols like Mnet. For example, any P2P file sharing program that identifies and authenticates a file based solely on a cryptographic hash of the file, qualifies for the list. Thanks, Tyler -- The union of REST and capability-based security: http://www.waterken.com/dev/Web/ _______________________________________________ p2p-hackers mailing list p2p-hackers@zgp.org http://zgp.org/mailman/listinfo/p2p-hackers _______________________________________________ Here is a web page listing P2P Conferences: http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences From bram at gawth.com Fri Jul 25 10:18:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Verification for erasure-encoded multi-source dload In-Reply-To: <1059027736.11698.374.camel@monster.omnifarious.org> Message-ID: Eric M. Hopper wrote: > > I've said this before, but it's worth repeating: BitTorrent doesn't use > > online codes because they would increase overhead and produce at best > > dubious improvements in download rates. > > The only thing I can see them buying you is a little more robustness > against seeds disappearing. That's handled by carefully selecting which pieces to download first (basically you start with the rarest ones, but there's some tweaking and implementing it efficiently is nontrivial). The times I still get problems are when everyone who's done downloading drops off leaving no complete copies around, but that runs into information theoretic limits. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From bram at gawth.com Fri Jul 25 10:19:02 2003 From: bram at gawth.com (Bram Cohen) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] Listing P2P URL schemes In-Reply-To: Message-ID: Tyler Close wrote: > Please send me links for other URL schemes that belong on the > list. I'll try to update the list in real-time as I receive new > links. BitTorrent hacks in using a mimetype, so it uses regular-looking urls. -Bram Cohen "Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes From cefn.hoile at bt.com Wed Jul 30 07:39:03 2003 From: cefn.hoile at bt.com (cefn.hoile@bt.com) Date: Sat Dec 9 22:12:21 2006 Subject: [p2p-hackers] MMAPPS Day Message-ID: <21DA6754A9238B48B92F39637EF307FD0204B50B@i2km41-ukdy.nat.bt.com> Some of my colleagues here at BT Exact www.btexact.com are involved in a project called MMAPPS (Market Management of Peer-to-Peer Services) which I expect to be of interest to p2p-hackers, p2prg and decentralization subscribers. (Apologies for crosspost if you don't agree). For general project information see: www.mmapps.org For further information on the MMAPPS Project Day see: www.mmapps.org/events A summary of the project is shown inline below for reference. 
Please feel free to contact me for further information and I will pass on your enquiry to the relevant member of the team. Cefn http://www.cefn.com >>>>>>>>>>>>>>>>>>> MMAPPS (Market Management of Peer-to-Peer Services) is creating generic middleware to support a new class of P2P applications that give peers appropriate incentives to co-operate for the good of the whole peer community. These applications will encourage peers to contribute and will then efficiently allocate that contribution amongst the members of the community. The middleware is generic and supports the definition of a wide variety of incentive schemes which may be based on rewarding good behaviour, or on punishing bad. The schemes can involve payments for contribution but can also be based on rules that enforce a minimum contribution through community sanctions. A key aspect is how peers contribution is accounted for; the middleware provides support for a very wide variety of specific accounting and management schemes, since we have come to the firm conclusion that different P2P applications can require very different trade-offs in terms of the necessary security/scalability/anonymity/robustness of such schemes. The MMAPPS middleware framework has the potential to underlie a very wide range of future P2P applications, and allows easy re-use of many different independently-developed accounting schemes (including traditional micropayment schemes, reputation-based schemes, and many more novel, lighter-weight 'record-based' schemes). The project is now approaching its mid-point (and is in its engineering phase) and so has plenty of results to report. If you would like to be kept informed of the project's progress we recommend you do so either via the MMAPPS Newsletter (issued quarterly) or more directly through participation in the FREE 1-day MMAPPS Project Day we will be holding (as part of NGC/ICQT'03). To receive further information on either the Newsletter or Project Day, please send a mail to cefn.hoile@bt.com, or register yourself directly at the MMAPPS web-site www.mmapps.org. MMAPPS Project partners are: AUEB, BT, ETHZ, Mysterian, TA, TUD and ULANC. >>>>>>>>>>>>>>>>>>> Disclaimer: This post represents the views of the author and does not necessarily accurately represent the views of BT.
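Returning to Bram Cohen's remark earlier in this digest that BitTorrent handles robustness by carefully selecting which pieces to download first (starting with the rarest ones), here is a deliberately simplified Python sketch of rarest-first selection. It ignores the tweaking and efficiency concerns Bram mentions, so it should be read as an illustration of the basic idea, not as BitTorrent's implementation.

    # Simplified rarest-first piece selection, in the spirit of Bram Cohen's
    # remark above. Real clients add tweaks and a more efficient data
    # structure; this only shows the basic idea.
    import random

    def pick_next_piece(have, peer_bitfields):
        """have: set of piece indices we already hold.
        peer_bitfields: dict peer_id -> set of piece indices that peer holds.
        Returns the index of a rarest piece we still need, or None."""
        counts = {}
        for pieces in peer_bitfields.values():
            for idx in pieces:
                if idx not in have:
                    counts[idx] = counts.get(idx, 0) + 1
        if not counts:
            return None                      # nothing downloadable right now
        rarest = min(counts.values())
        candidates = [idx for idx, c in counts.items() if c == rarest]
        return random.choice(candidates)     # random tie-break spreads load

    peers = {"a": {0, 1, 2}, "b": {1, 2}, "c": {2}}
    print(pick_next_piece(have={2}, peer_bitfields=peers))  # -> 0 (only one copy)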