From: Gregory Maxwell <greg@xiph.org>
Date: Fri, 21 Apr 2017 20:38:36 +0000
To: David Vorick <david.vorick@gmail.com>
Cc: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
Message-ID: <CAAS2fgT5pJh68xufv_81+N8K0asxH16WdX7PLLXGjRPmJOkYFQ@mail.gmail.com>
In-Reply-To: <CAFVRnypbQQ-vsSLqv48cYaqTCty4R1DmFRqfAvxe4mAqyQNXxQ@mail.gmail.com>
On Mon, Apr 17, 2017 at 6:54 AM, David Vorick via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> Rationale:
>
> A node that stores the full blockchain (I will use the term archival node)
> requires over 100GB of disk space, which I believe is one of the most
> significant barriers to more people running full nodes. And I believe the
> ecosystem would benefit substantially if more users were running full nodes.

Hi David,

I've been thinking about this subject for a long time (Tier Nolan linked some of the threads; see also the coloration part of https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding), and about using FEC with the history for the last year or so. (We use FEC in practice in FIBRE for relay at the tip now, and that has a design history going back to 2013.)

As Jonas points out, we're all set to also offer the 'recent blocks' separately, so that is obviously going to happen and will help. The free design parameter is the definition of recent, but we now have good measurements for setting that. But about history...

I think I might have designed myself into a corner, and perhaps you've shown a way out -- I think there are some serious limits in your proposal, but perhaps we can fix them. Let me give you what I had been thinking about, then hand you at least a couple of improvements to your thinking:

As you hopefully now know (per Tier Nolan's post): I'd previously been thinking about nodes keeping a deterministic random, independently selected subset which is self-leveling based on a small seed (a toy sketch of what I mean appears below). The connection overhead can be mitigated by working with chunks of blocks rather than single blocks.

But as you've observed, the failure probabilities are rather high, especially if an active attacker targets nodes carrying less commonly available blocks. So I thought: okay, we can use FEC to improve the recovery odds.

So I considered using a sparse random code, e.g. an LDPC erasure code like in RFC 5170. The advantage of these codes is that they are very fast to decode and support enormous codewords, so you can have a high probability of every peer having a unique ID -- there would effectively never need to be any duplication.

But then I ran into the problem that I had no reasonable way to recover from bad data, short of pulling a single group from an archive and then banning whatever peers gave you bad chunks.

So that is kind of where I got stuck.

I didn't even consider the advantage of needing only a few peers total, as I still assumed it would be doing the random-groups thing as well. That's a great point.

So, on your design:

Having a lower bound of 20% seems like a serious limitation to me: it would be fine today, but the chain will quite possibly be twice its current size in a year... and in four years we're back to where we are now. What I'd been thinking about would be able to scale arbitrarily low.
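Here's the kind of seed-based selection I mean -- a minimal sketch only; the hash construction, chunk indexing, and keep-fraction parameter are all made up for illustration:

    import hashlib

    def keeps_chunk(seed: bytes, chunk_index: int, keep_fraction: float) -> bool:
        # Hash the node's seed together with the chunk index and treat the
        # result as a uniform draw in [0, 1).  Each node's selection is
        # deterministic (peers can recompute it from the advertised seed),
        # independent of every other node, and self-leveling: as the chain
        # grows, each new chunk is kept by ~keep_fraction of the network.
        h = hashlib.sha256(seed + chunk_index.to_bytes(8, "little")).digest()
        return int.from_bytes(h[:8], "little") / 2**64 < keep_fraction

    # A node advertising this seed with fraction 0.10 keeps ~10% of chunks:
    kept = [i for i in range(1000) if keeps_chunk(b"example-seed", i, 0.10)]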
20% is small, but is it small enough? Today that would get us back into common SSDs being able to hold the whole chain, but that property will probably be lost again in a couple of years. I think with any fixed fraction we'll constantly be fighting against the opportunity cost of upgrading storage, if not the cost of the storage itself. (And I agree this is the vast majority of the response from actual users; sync time and ongoing bandwidth usage seem to tie for second. The latter can be mitigated in other ways, but for the former see my remarks at the end.)

The fact that needing only a few blocks lets you brute-force the decode is a great point. I hadn't considered that, and the schemes I'd been considering are not communications-efficient when only a few blocks are required: they sometimes need a couple of extra blocks to decode successfully, which is a lot of overhead if you're only splitting 5 ways.

> I also believe that alternative erasure coding schemes exist which actually
> are able to identify the bad pieces given sufficient good pieces, however I
> don't know if they have the same computational performance as the best
> Reed-Solomon coding implementations.

Actually, RS codes are _the_ codes you want for decoding with errors, but the 'computationally efficient' error-correcting decode (which is itself heinously slow) requires 2*errors + data blocks to decode. Not so useful if you're only looking at 5 parts.

RS decoding is pretty slow generally, especially compared to binary sparse codes. We were unable to make RS coding make sense for fast block relay for this reason -- decoding time bottlenecked reconstruction. The most highly optimized RS implementations make a special optimization which is not useful for your proposal: they're much faster when decoding from the first couple of correction blocks. Without it, speeds drop from 1GB/s to more like 100MB/s or worse. Though I suppose with assumevalid, initial sync is now pretty CPU-idle on most hardware. (If 100MB/s sounds fast, keep in mind that time spent decoding is time that can't be spent hashing/verifying/etc., and on a lot of hardware with fast broadband we're already CPU-bound when signature validation is enabled.)

> Another option would be to have the small nodes store a cryptographic
> checksum of each piece. Obtaining the cryptographic checksum for all 256
> pieces would incur a nontrivial amount of hashing (post segwit, as much as
> 100MB of extra hashing per block), and would require an additional ~4kb of
> storage per block. The hashing overhead here may be prohibitive.

This was a complete non-starter for the sparse codes I was considering, where the number of possible blocks is effectively unlimited.

Your 4kb assumption isn't correct, though. The way to do it is that after downloading a block you compute the 255 fragments (as an aside: you say 256, but the most you can get from an 8-bit RS code is 255), then compute a hash tree over them. Every node remembers the root and the membership proof for its own chunk(s) -- about 256 bytes of extra storage per block.

When you decode, you use a majority among your peers to decide which root you are trying to decode. If it fails to produce valid blocks, you blacklist that root, ban all the peers that served it, and try the next. The worst-case cost is proportional to the number of invalid roots rather than to (peers choose 5).
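A minimal sketch of that fragment hash tree (illustrative only; how the 255 fragments are produced from the block is assumed, and the tree layout is just one reasonable choice):

    import hashlib

    def h(b: bytes) -> bytes:
        return hashlib.sha256(b).digest()

    def root_and_proof(fragments: list, index: int):
        # Hash tree over the 255 RS fragments of one block.  A node keeps
        # only the 32-byte root plus the sibling path for its own fragment:
        # 8 hashes for 255 leaves padded to 256, i.e. 256 bytes.
        level = [h(f) for f in fragments]
        while len(level) & (len(level) - 1):   # pad to a power of two
            level.append(level[-1])
        proof, idx = [], index
        while len(level) > 1:
            proof.append(level[idx ^ 1])       # sibling at this level
            level = [h(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
            idx //= 2
        return level[0], proof

    def verify(root: bytes, fragment: bytes, index: int, proof: list) -> bool:
        # A peer serving `fragment` also serves `proof`; the receiver checks
        # it against the majority root before using the fragment to decode.
        acc = h(fragment)
        for sib in proof:
            acc = h(acc + sib) if index % 2 == 0 else h(sib + acc)
            index //= 2
        return acc == root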
I'm not sure whether the combinatorial decode or the above would be easier to implement.

> This represents a non-trivial amount of code, but I believe that the result
> would be a non-trivial increase in the percentage of users running full
> nodes, and a healthier overall network.

The RS coder stuff is small; even doing the fancy decode-with-errors it isn't huge. I expect the complexity will mostly show up in the garbage-input handling.

A couple of other thoughts:

The coded blocks are not useful for things like bloom scanning or other lookups. With the committed filter proposals you could still keep the committed filters (even shared 5 ways themselves...), so perhaps that's not much of a concern.

Coding the blocks will make their encoding normative. The current P2P encoding we use is by no means the most efficient: Pieter and I have an encoding that works on a transaction-by-transaction basis and decreases the size of the whole chain by ~28%, and block-wide entropy coding could reduce the size even further. We'd hoped to at least start using this per-transaction coding locally, converting on the fly to the original serialization where needed (the idea being to eventually support the compacted serialization on the wire too). Maybe the answer there is just to get in whatever improvements we think are reasonable before doing the coded blocks.

I think the 20% floor is still too high, and too many nodes will be forced to just do the recent-blocks thing -- perhaps not today, but eventually. I suppose we could assume that the block is split 10 ways, with a default of two indexes but the flexibility to go down to one in the future at the cost of needing more peers. Someone could run the numbers on the decode failure odds for the 10-of-255 case (a quick simulation is sketched below). But that would only improve it to 10%. I suppose the proposal could be combined with sparse chain storage like I was thinking about years ago, but much less sparsity would be needed, so the downsides would be less severe.

E.g. if you want to store <10% of the chain, you'd keep your 10% shares for only some random groups of blocks. Such a feature could be introduced later, when keeping less than 10 or 20 percent mattered more.
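Here's a quick way to run those numbers -- a toy Monte Carlo model only: it assumes each peer holds one uniformly random fragment index out of 255 and that any k distinct fragments decode a block:

    import random

    def failure_rate(k: int, n_peers: int, n_ids: int = 255,
                     trials: int = 100_000) -> float:
        # Odds that n_peers, each holding one uniformly random fragment
        # index, fail to cover k distinct indices (an RS erasure code
        # recovers a block from any k distinct fragments).
        fails = 0
        for _ in range(trials):
            if len({random.randrange(n_ids) for _ in range(n_peers)}) < k:
                fails += 1
        return fails / trials

    # The 10-of-255 case: how much do a few spare peers help?
    for n in (10, 11, 12, 15):
        print(n, failure_rate(10, n))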
Recovery odds could also be improved with a second level of coding. E.g. if your ID were not a constant but a seed into PRNG(height)%250+5, and you also had a fraction of nodes store the XOR of adjacent pairs, then you could fill in unrecoverable groups. (The limit of this line of thinking is exactly the sparse random-code ECC schemes.) Probably not worth the complexity, but just a thing to keep in mind...

Probably the next step is to do some more focused benchmarks. I have some code I can recommend offline.

I can't help but feel that this might be a bit of a waste of time. I think the long-term survival of the system is ultimately going to depend on an assumevalid-like gated sync of a UTXO set. Ethereum already does something far more reckless (they let miners pick the state and blindly trust it, with almost no depth required) and no one cares. If that is done, then all these concerns are greatly reduced, along with the 100(+++?)GB/history-year transfer costs. You'd still want to have archives, but do you care about 20% archives?

Replying to some other comments in the thread:

> A user may not want to run a node at home, but rather on a digital ocean or AWS server

This is almost, if not quite entirely, pointless. There are already many nodes on AWS and DO; adding more does not improve your security (you must trust DO/AWS), does not improve your reliability (DO/AWS), does not improve the network's capacity, etc. About the only arguable benefits I can see, beyond making more money for these hosts, are feel-good points for (incorrectly) thinking you're helping the network, and negligibly reducing the density of spy nodes (but wait: AWS/DO may well just be spying on your connections too...). And it isn't as if fast storage on these services is cheap.

> Perhaps it is not, but I would think that it would be pretty straightforward to configure a global bandwidth limit within Bitcoin.

We have outbound transmission limits, though not by default. Setting them sensibly is something of a black art. Obviously these will improve over time... and they are more reasons that I agree with your relative cost assumptions, except for the sync-time part.

> Fingerprinting?

Unavoidable, but it should be made inconsequential by making transaction announcement more private independently of any of this. There are already _MANY_ ways to fingerprint nodes; please don't think that existing software has any real immunity here. We avoid adding new fingerprinting where we can... but nodes are very fingerprintable already. Transaction announcement privacy MUST be fixed. I assume that if any of this is worth doing, it will also be worth the increase in fingerprinting. And at least this proposal would 'only' create 255 node classes.

> SPV?

As I pointed out above, Vorick's idea is totally compatible with committed filters, and the filters can even be shared the same way the blocks are.

> [People who fail at math]

The SNR on this list really sucks. If you aren't spending a couple of hours thinking about your responses, or at least citing research that took days of effort or more, then you are probably wasting people's time. Please respect the other users of the list.

> Could there be a slider?

Absolutely. I assume that if Vorick's proposal were implemented, nodes would have the following options: pruned [UTXO + recent two weeks of blocks], 20%, 40%, 60%, 80%, 100% (archive).
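To put rough numbers on those slider settings under the 5-way split discussed above (toy arithmetic only; the ~100GB chain size is the figure from David's post, and coding overheads are ignored):

    import math

    def slider_setting(indices_kept: int, k: int = 5,
                       chain_gb: float = 100.0) -> dict:
        # Blocks are split into k data shards and coded into 255 RS shards,
        # any k of which reconstruct a block.  A node keeping `indices_kept`
        # shard indices stores indices_kept/k of the chain, and a syncing
        # peer needs shards from at least ceil(k / indices_kept) such nodes
        # (more in practice, since peers' indices can collide).
        return {
            "fraction": indices_kept / k,
            "disk_gb": indices_kept * chain_gb / k,
            "min_peers": math.ceil(k / indices_kept),
        }

    for i in range(1, 6):   # the 20% ... 100% slider positions
        print(slider_setting(i))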