Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 47906B2B for ; Mon, 17 Apr 2017 10:14:28 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-wr0-f171.google.com (mail-wr0-f171.google.com [209.85.128.171]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 2FA8710A for ; Mon, 17 Apr 2017 10:14:26 +0000 (UTC) Received: by mail-wr0-f171.google.com with SMTP id o21so80965833wrb.2 for ; Mon, 17 Apr 2017 03:14:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to; bh=Sk7ECr3J84dWyq9tfe58DIH4Fl8WW8T2KipWzJ8xLHM=; b=efE1Di24kJLrpje1PSBaM1pLPcbzz0eTTvRDH5FjN4jJGcZYBF5Lt3go1bn7fvdcVh B9y82n/t9lpn6uz38kbmr+dE2BRyswnCZAF/zBhgdZdiAWXJjLmEaDBqS6X4EQcpjbNm hlAkYpBGCzOHktS+4aFCg/CvhsFH2qa5ilvjCeHHKK5Z4TiURGJCZf6KsgzSU3IAPu0q tIAYpFXt1k73yS0kwgp7Yc0MEor1AcR4wdYXq2dUDTvbK1c0JEB1oC48baV9FYjZi9T5 FsQTZpWjntptkx9yYKZjwnAy65Ubb8PcwSgq7+eM2jUQaKF09Z2gG2J/55snpNsC9hJV JUsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to; bh=Sk7ECr3J84dWyq9tfe58DIH4Fl8WW8T2KipWzJ8xLHM=; b=Uuhj3i3w73rEEmoBhvssLnp7VHGHpy8/F7FFO7GPSQyIlwwOOzPp8D3t0WZCE3iPB+ KreywTib66FQRCflXEt4NWLWVSPH1L5owplhaN2qeQgPInW/0uMzXN26Ii+SAxDVJpRf /PGoOqwEQ7UPPDc3Q8M9uQMe8ghxW8ASSOZws27b2QyyEWeLARhhIIM5PEDkk+36FNb9 cKxS3Rn9rcdBwqd039VM+FsW49MOs/9XVkdt4obZYJE8ddAfAakuYqj41urUEGDhdtQw LAP8x0g4uKapJ4DEesNisuFTQZvYDTGgNcfB5jVG8Wgpskkr40tLTuUv+LGKPSCU7pRP Yy8Q== X-Gm-Message-State: AN3rC/4Z574OKC3NZ6D1T9+g5Oc1CbWqCha/VSv0ED4kG++ScLrZCbAX IR0elash3W9bWxu2 X-Received: by 10.223.162.147 with SMTP id s19mr16839411wra.142.1492424064575; Mon, 17 Apr 2017 03:14:24 -0700 (PDT) Received: from [192.168.1.10] (ANice-654-1-153-20.w86-205.abo.wanadoo.fr. [86.205.80.20]) by smtp.googlemail.com with ESMTPSA id o71sm13726616wrb.47.2017.04.17.03.14.23 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 17 Apr 2017 03:14:24 -0700 (PDT) To: bitcoin-dev@lists.linuxfoundation.org References: From: Aymeric Vitte Message-ID: <19dbfef2-3791-8fe7-1c00-c4052c3d6c45@gmail.com> Date: Mon, 17 Apr 2017 12:14:25 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.3; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/alternative; boundary="------------74800D2B13771D9DD9D7A344" X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Apr 2017 10:14:28 -0000 This is a multi-part message in MIME format. --------------74800D2B13771D9DD9D7A344 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit While I fully agree with the intent (increasing full nodes so a big miner waking up in a bad mood can't threaten the world any longer every day as it is now) I am not sure to get the interest of this proposal, because: - it's probably not a good idea to encourage the home users to run full nodes, there are many people running servers far from their capacity that could easily run efficient full nodes - if someone can't allocate 100 GB today to run a full node, then we can't expect him to allocate more in the future - the download time is a real concern - this proposal is a kind of reinventing torrents, while limiting the number of connections to something not efficient at all, I don't see why something that is proven to be super efficient (torrents) would be needed to be reinvented, I am not saying that it should be used as the bittorrent network is doing but the concepts can be reused - I don't get at all the concept of "archival" nodes since it's another useless step toward centralization I think the only way to increase full nodes it to design an incentive for people to run them Le 17/04/2017 à 08:54, David Vorick via bitcoin-dev a écrit : > *Rationale:* > > A node that stores the full blockchain (I will use the term archival > node) requires over 100GB of disk space, which I believe is one of the > most significant barriers to more people running full nodes. And I > believe the ecosystem would benefit substantially if more users were > running full nodes. > > The best alternative today to storing the full blockchain is to run a > pruned node, which keeps only the UTXO set and throws away already > verified blocks. The operator of the pruned node is able to enjoy the > full security benefits of a full node, but is essentially leeching the > network, as they performed a large download likely without > contributing anything back. > > This puts more pressure on the archival nodes, as the archival nodes > need to pick up the slack and help new nodes bootstrap to the network. > As the pressure on archival nodes grows, fewer people will be able to > actually run archival nodes, and the situation will degrade. The > situation would likely become problematic quickly if bitcoin-core were > to ship with the defaults set to a pruned node. > > Even further, the people most likely to care about saving 100GB of > disk space are also the people least likely to care about some extra > bandwidth usage. For datacenter nodes, and for nodes doing lots of > bandwidth, the bandwidth is usually the biggest cost of running the > node. For home users however, as long as they stay under their > bandwidth cap, the bandwidth is actually free. Ideally, new nodes > would be able to bootstrap from nodes that do not have to pay for > their bandwidth, instead of needing to rely on a decreasing percentage > of heavy-duty archival nodes. > > I have (perhaps incorrectly) identified disk space consumption as the > most significant factor in your average user choosing to run a pruned > node or a lite client instead of a full node. The average user is not > typically too worried about bandwidth, and is also not typically too > worried about initial blockchain download time. But the 100GB hit to > your disk space can be a huge psychological factor, especially if your > hard drive only has 500GB available in the first place, and 250+ GB is > already consumed by other files you have. > > I believe that improving the disk usage situation would greatly > benefit decentralization, especially if it could be done without > putting pressure on archival nodes. > > *Small Nodes Proposal:* > > I propose an alternative to the pruned node that does not put undue > pressure on archival nodes, and would be acceptable and non-risky to > ship as a default in bitcoin-core. For lack of a better name, I'll > call this new type of node a 'small node'. The intention is that > bitcoin-core would eventually ship 'small nodes' by default, such that > the expected amount of disk consumption drops from today's 100+ GB to > less than 30 GB. > > My alternative proposal has the following properties: > > + Full nodes only need to store ~20% of the blockchain > + With very high probability, a new node will be able to recover the > entire blockchain by connecting to 6 random small node peers. > + An attacker that can eliminate a chosen+ 95% of the full nodes > running today will be unable to prevent new nodes from downloading the > full blockchain, even if the attacker is also able to eliminate all > archival nodes. (assuming all nodes today were small nodes instead of > archival nodes) > > Method: > > A small node will pick an index [5, 256). This index is that node's > permanent index. When storing a block, instead of storing the full > block, the node will use Reed-Solomon coding to erasure code the block > using a 5-of-256 scheme. The result will be 256 pieces that are 20% of > the size of the block each. The node picks the piece that corresponds > to its index, and stores that instead. (Indexes 0-4 are reserved for > archival nodes - explained later) > > The node is now storing a fragment of every block. Alone, this > fragment cannot be used to recover any piece of the blockchain. > However, when paired with any 5 unique fragments (fragments of the > same index will not be unique), the full block can be recovered. > > Nodes can optionally store more than 1 fragment each. At 5 fragments, > the node becomes a full archival node, and the chosen indexes should > be 0-4. This is advantageous for the archival node as the encoded data > for the first 5 indexes will actually be identical to the block itself > - there is no computational overhead for selecting the first indexes. > There is also no need to choose random indexes, because the full block > can be recovered no matter which indexes are chosen. > > When connecting to new peers, the indexes of each peer needs to be > known. Once peers totaling 5 unique indexes are discovered, blockchain > download can begin. Connecting to just 5 small node peers provides a > >95% chance of getting 5 uniques, with exponentially improving odds of > success as you connect to more peers. Connecting to a single archive > node guarantees that any gaps can be filled. > > A good encoder should be able to turn a block into a 5-of-256 piece > set in under 10 milliseconds using a single core on a standard > consumer desktop. This should not slow down initial blockchain > download substantially, though the overhead is more than a rounding error. > > *DoS Prevention:* > > A malicious node may provide garbage data instead of the actual piece. > Given just the garbage data and 4 other correct pieces, it is > impossible (best I know anyway) to tell which piece is the garbage piece. > > One option in this case would be to seek out an archival node that > could verify the correctness of the pieces, and identify the malicious > node. > > Another option would be to have the small nodes store a cryptographic > checksum of each piece. Obtaining the cryptographic checksum for all > 256 pieces would incur a nontrivial amount of hashing (post segwit, as > much as 100MB of extra hashing per block), and would require an > additional ~4kb of storage per block. The hashing overhead here may be > prohibitive. > > Another solution would be to find additional pieces and brute-force > combinations of 5 until a working combination was discovered. Though > this sounds nasty, it should take less than five seconds of > computation to find the working combination given 5 correct pieces and > 2 incorrect pieces. This computation only needs to be performed once > to identify the malicious peers. > > I also believe that alternative erasure coding schemes exist which > actually are able to identify the bad pieces given sufficient good > pieces, however I don't know if they have the same computational > performance as the best Reed-Solomon coding implementations. > > *Deployment:* > > Small nodes are completely useless unless the critical mass of 5 > pieces can be obtained. The first version that supports small node > block downloads should default everyone to an archival node (meaning > indexes 0-4 are used) > > Once there are enough small-node-enabled archive nodes, the default > can be switched so that nodes only have a single index by default. In > the first few days, when there are only a few small nodes, the > previously-deployed archival nodes can help fill in the gaps, and the > small nodes can be useful for blockchain download right away. > > ---------------------------------- > > This represents a non-trivial amount of code, but I believe that the > result would be a non-trivial increase in the percentage of users > running full nodes, and a healthier overall network. > > > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev -- Zcash wallets made simple: https://github.com/Ayms/zcash-wallets Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets Get the torrent dynamic blocklist: http://peersm.com/getblocklist Check the 10 M passwords list: http://peersm.com/findmyass Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org Peersm : http://www.peersm.com torrent-live: https://github.com/Ayms/torrent-live node-Tor : https://www.github.com/Ayms/node-Tor GitHub : https://www.github.com/Ayms --------------74800D2B13771D9DD9D7A344 Content-Type: text/html; charset=windows-1252 Content-Transfer-Encoding: 8bit

While I fully agree with the intent (increasing full nodes so a big miner waking up in a bad mood can't threaten the world any longer every day as it is now) I am not sure to get the interest of this proposal, because:

- it's probably not a good idea to encourage the home users to run full nodes, there are many people running servers far from their capacity that could easily run efficient full nodes

- if someone can't allocate 100 GB today to run a full node, then we can't expect him to allocate more in the future

- the download time is a real concern

- this proposal is a kind of reinventing torrents, while limiting the number of connections to something not efficient at all, I don't see why something that is proven to be super efficient (torrents) would be needed to be reinvented, I am not saying that it should be used as the bittorrent network is doing but the concepts can be reused

- I don't get at all the concept of "archival" nodes since it's another useless step toward centralization

I think the only way to increase full nodes it to design an incentive for people to run them


Le 17/04/2017 à 08:54, David Vorick via bitcoin-dev a écrit :
Rationale:

A node that stores the full blockchain (I will use the term archival node) requires over 100GB of disk space, which I believe is one of the most significant barriers to more people running full nodes. And I believe the ecosystem would benefit substantially if more users were running full nodes.

The best alternative today to storing the full blockchain is to run a pruned node, which keeps only the UTXO set and throws away already verified blocks. The operator of the pruned node is able to enjoy the full security benefits of a full node, but is essentially leeching the network, as they performed a large download likely without contributing anything back.

This puts more pressure on the archival nodes, as the archival nodes need to pick up the slack and help new nodes bootstrap to the network. As the pressure on archival nodes grows, fewer people will be able to actually run archival nodes, and the situation will degrade. The situation would likely become problematic quickly if bitcoin-core were to ship with the defaults set to a pruned node.

Even further, the people most likely to care about saving 100GB of disk space are also the people least likely to care about some extra bandwidth usage. For datacenter nodes, and for nodes doing lots of bandwidth, the bandwidth is usually the biggest cost of running the node. For home users however, as long as they stay under their bandwidth cap, the bandwidth is actually free. Ideally, new nodes would be able to bootstrap from nodes that do not have to pay for their bandwidth, instead of needing to rely on a decreasing percentage of heavy-duty archival nodes.

I have (perhaps incorrectly) identified disk space consumption as the most significant factor in your average user choosing to run a pruned node or a lite client instead of a full node. The average user is not typically too worried about bandwidth, and is also not typically too worried about initial blockchain download time. But the 100GB hit to your disk space can be a huge psychological factor, especially if your hard drive only has 500GB available in the first place, and 250+ GB is already consumed by other files you have.

I believe that improving the disk usage situation would greatly benefit decentralization, especially if it could be done without putting pressure on archival nodes.

Small Nodes Proposal:

I propose an alternative to the pruned node that does not put undue pressure on archival nodes, and would be acceptable and non-risky to ship as a default in bitcoin-core. For lack of a better name, I'll call this new type of node a 'small node'. The intention is that bitcoin-core would eventually ship 'small nodes' by default, such that the expected amount of disk consumption drops from today's 100+ GB to less than 30 GB.

My alternative proposal has the following properties:

+ Full nodes only need to store ~20% of the blockchain
+ With very high probability, a new node will be able to recover the entire blockchain by connecting to 6 random small node peers.
+ An attacker that can eliminate a chosen+ 95% of the full nodes running today will be unable to prevent new nodes from downloading the full blockchain, even if the attacker is also able to eliminate all archival nodes. (assuming all nodes today were small nodes instead of archival nodes)

Method:

A small node will pick an index [5, 256). This index is that node's permanent index. When storing a block, instead of storing the full block, the node will use Reed-Solomon coding to erasure code the block using a 5-of-256 scheme. The result will be 256 pieces that are 20% of the size of the block each. The node picks the piece that corresponds to its index, and stores that instead. (Indexes 0-4 are reserved for archival nodes - explained later)

The node is now storing a fragment of every block. Alone, this fragment cannot be used to recover any piece of the blockchain. However, when paired with any 5 unique fragments (fragments of the same index will not be unique), the full block can be recovered.

Nodes can optionally store more than 1 fragment each. At 5 fragments, the node becomes a full archival node, and the chosen indexes should be 0-4. This is advantageous for the archival node as the encoded data for the first 5 indexes will actually be identical to the block itself - there is no computational overhead for selecting the first indexes. There is also no need to choose random indexes, because the full block can be recovered no matter which indexes are chosen.

When connecting to new peers, the indexes of each peer needs to be known. Once peers totaling 5 unique indexes are discovered, blockchain download can begin. Connecting to just 5 small node peers provides a >95% chance of getting 5 uniques, with exponentially improving odds of success as you connect to more peers. Connecting to a single archive node guarantees that any gaps can be filled.

A good encoder should be able to turn a block into a 5-of-256 piece set in under 10 milliseconds using a single core on a standard consumer desktop. This should not slow down initial blockchain download substantially, though the overhead is more than a rounding error.

DoS Prevention:

A malicious node may provide garbage data instead of the actual piece. Given just the garbage data and 4 other correct pieces, it is impossible (best I know anyway) to tell which piece is the garbage piece.

One option in this case would be to seek out an archival node that could verify the correctness of the pieces, and identify the malicious node.

Another option would be to have the small nodes store a cryptographic checksum of each piece. Obtaining the cryptographic checksum for all 256 pieces would incur a nontrivial amount of hashing (post segwit, as much as 100MB of extra hashing per block), and would require an additional ~4kb of storage per block. The hashing overhead here may be prohibitive.

Another solution would be to find additional pieces and brute-force combinations of 5 until a working combination was discovered. Though this sounds nasty, it should take less than five seconds of computation to find the working combination given 5 correct pieces and 2 incorrect pieces. This computation only needs to be performed once to identify the malicious peers.

I also believe that alternative erasure coding schemes exist which actually are able to identify the bad pieces given sufficient good pieces, however I don't know if they have the same computational performance as the best Reed-Solomon coding implementations.

Deployment:

Small nodes are completely useless unless the critical mass of 5 pieces can be obtained. The first version that supports small node block downloads should default everyone to an archival node (meaning indexes 0-4 are used)

Once there are enough small-node-enabled archive nodes, the default can be switched so that nodes only have a single index by default. In the first few days, when there are only a few small nodes, the previously-deployed archival nodes can help fill in the gaps, and the small nodes can be useful for blockchain download right away.

----------------------------------

This represents a non-trivial amount of code, but I believe that the result would be a non-trivial increase in the percentage of users running full nodes, and a healthier overall network.


_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

-- 
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
--------------74800D2B13771D9DD9D7A344--