From: Mike Hearn
To: Jeff Garzik
Cc: Bitcoin Development
Date: Tue, 9 Apr 2013 12:42:12 +0200
Subject: Re: [Bitcoin-development] On-going data spam
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8ff1cdf0c52ed604d9eb3452

--e89a8ff1cdf0c52ed604d9eb3452
Content-Type: text/plain; charset=UTF-8

OK, as the start of that conversation is now on the list, I might as well post the other thoughts we had. Or at least that I had :)

It's tempting to see this kind of abuse through the lens of fees, because we only have a few hammers and so everything looks like a kind of nail. The problem is that the moment you try to define "abuse" economically you end up excluding legitimate and beneficial uses as well. Maybe Peter's patch for uneconomical outputs is different because of how it works. But mostly it's true. In this case, fees would never work - Peter said the guy who uploaded Wikileaks paid something like $500 to do it. I guess by now it's more like $600-$700. It's hard for regular end users to compete with that kind of wild-eyed dedication to "the cause".

The root problem here is that people believe the block chain is a data structure that will live forever and be served by everyone for free, in perpetuity, and is thus the perfect place for "uncensorable" stuff. That's a reasonable assumption given how Bitcoin works today. But there's no reason it will be true in the long run (I know this can be an unpopular viewpoint).

Firstly, legal issues - I think it's very unlikely any sane court would care about illegal stuff in the block chain given you need special tools to extract it (mens rea). Besides, I guess most end users will end up on SPV clients as they mature. So these users already don't have a copy of the entire block chain. I don't worry too much about this.

Secondly, the need to host blocks forever. In future, many (most?) full nodes will be pruning, and won't actually store old blocks at all. They'll just have the UTXO database, some undo blocks and some number of old blocks for serving, probably whatever fits in the amount of disk space the user is willing to allocate. But very old blocks will have been deleted.

This leads to the question of what incentives people have to not prune. The obvious incentive is money - charge for access to older parts of the chain. The fewer people that host it, the more you can charge. In the worst case scenario where, you know, only 10 different organizations store a copy of the chain, it might mean that bootstrapping a new node in a trustless manner is expensive. But I really doubt it'd ever get so few. Serving large static datasets just isn't that expensive.

Also, you don't actually need to replay from the genesis block to bring up a new node, you can copy the UTXO database from somewhere else. By comparing the databases of lots of different nodes together, the chances of you being in a Matrix-like sybil world can be reduced to "beyond reasonable doubt". Maybe nodes would charge for copies of their database too, but ideally there are lots of nodes and so the charge for that should be so close to zero as makes no odds - you can trivially undercut someone by buying access to the dataset and then reselling it for a bit less, so the price should converge on the actual cost of providing the service. Which will be very cheap.

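A minimal sketch of the database comparison described above, in Python. Everything named here is an assumption made for illustration: the `{(txid, index): (value, script)}` layout, the `snapshot_digest` and `agreed_snapshot` helpers, and the thresholds are hypothetical and do not correspond to any existing Bitcoin client API. It also assumes all snapshots were taken at the same block height, so honest nodes should hold identical sets.

```python
import hashlib
from collections import Counter


def snapshot_digest(utxos):
    """Hash a UTXO set in a canonical (sorted) order.

    `utxos` is a hypothetical layout: {(txid_hex, output_index): (value, script_bytes)}.
    """
    h = hashlib.sha256()
    for (txid, index), (value, script) in sorted(utxos.items()):
        h.update(f"{txid}:{index}:{value}:{script.hex()}\n".encode())
    return h.hexdigest()


def agreed_snapshot(peer_snapshots, min_peers=8, min_agreement=1.0):
    """Return the digest enough peers agree on, or None if they do not.

    `min_peers` and `min_agreement` are illustrative knobs, not protocol values.
    """
    if len(peer_snapshots) < min_peers:
        return None
    counts = Counter(snapshot_digest(s) for s in peer_snapshots.values())
    digest, votes = counts.most_common(1)[0]
    if votes / len(peer_snapshots) >= min_agreement:
        return digest
    return None
```

With the default `min_agreement=1.0`, a single dissenting peer makes the check fail, at which point a cautious node could fall back to replaying the chain from the genesis block. The more independently chosen peers are sampled, the harder it is for an attacker to control all of them.
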
There was one last thought I had, which is that if there's a shorter-term need to discourage this kind of thing we can use a network/bandwidth-related hack by changing the protocol. Nodes can serve up blocks encrypted under a random key. You only get the key when you finish the download. A blacklist can apply to Bloom filtering such that transactions which are known to be "abusive" require you to fully download the block rather than select the transactions with a filter. This means that people can still access the data in the chain, but the older it gets the slower and more bandwidth-intensive it becomes. Stuffing Wikileaks into the chain sounds good when a 20-line Python script can extract it "instantly". If someone who wants the files has to download gigabytes of padding around it first, suddenly hosting it on a Tor hidden service becomes more attractive.

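A rough sketch of the "encrypted under a random key" idea, in Python. This is an illustration under stated assumptions, not a protocol proposal: the message framing (`"data"` / `"key"` tuples), the `serve_block` and `download_block` names, and the chunk size are all hypothetical, and the SHA-256 counter-mode keystream is a stand-in chosen only to keep the example standard-library-only. A real design would use a vetted cipher.

```python
import hashlib
import os


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream (illustrative, not vetted crypto)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def serve_block(raw_block: bytes, chunk_size: int = 1024):
    """Yield encrypted chunks first, and the random key only as the last message."""
    key = os.urandom(32)  # fresh key per download
    ciphertext = keystream_xor(key, raw_block)
    for i in range(0, len(ciphertext), chunk_size):
        yield ("data", ciphertext[i:i + chunk_size])
    yield ("key", key)


def download_block(messages) -> bytes:
    """Collect every chunk, then decrypt once the key has arrived."""
    chunks, key = [], None
    for kind, payload in messages:
        if kind == "data":
            chunks.append(payload)
        else:
            key = payload
    if key is None:
        raise ValueError("download ended before the key was sent")
    return keystream_xor(key, b"".join(chunks))
```

Because the key arrives last, a client that aborts partway through holds only ciphertext, so it cannot cherry-pick one transaction out of a block without paying the bandwidth cost of the whole block.
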
--e89a8ff1cdf0c52ed604d9eb3452--