From: Keagan McClelland
Date: Fri, 26 Feb 2021 11:40:35 -0700
To: Bitcoin Protocol Discussion
Subject: [bitcoin-dev] A design for Probabilistic Partial Pruning

Hi all,

I've been thinking for quite some time about the problem of pruned nodes and ongoing storage costs for full nodes. One of the things that strikes me as odd is that we only really have two settings.

A. Prune everything except the most recent blocks, down to the cache size
B. Keep everything since genesis

From my observations and conversations, various folks in the community would like to be able to run a "partially" pruned node to help bear the load of bootstrapping other nodes and to help with data redundancy in the network, but would prefer not to dedicate hundreds of gigabytes of storage space to the cause.

This led me to the idea that a node could randomly prune some of the blocks from history, pruning a block only if it passed some predicate. A rough sketch of this would look as follows.

1. At startup, the node would generate a random seed. The seed would be unique to the node, but it need not be cryptographically secure.
2. The node configuration would also carry a "threshold", expressed as the percentage of blocks the node wants to keep.
3. As IBD occurs, the node would decide whether to prune or keep each block based on the threshold, the block hash, and the node's unique seed. The uniqueness of each node's seed should ensure that no block is systematically overrepresented in the set of nodes choosing this storage scheme.
4. Once its IBD is complete, the node would advertise this as a peer service, including its seed and threshold, so that other nodes could deterministically deduce which of their peers have which blocks.
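
As a rough illustration of steps 3 and 4, here is a minimal Python sketch. The proposal does not specify the predicate, so everything concrete here is an assumption: the use of SHA-256 over the seed concatenated with the block hash, the 16-byte seed, and the hypothetical names `keeps_block` and `blocks_held_by_peer`. The block hashes are placeholders, not real chain data.

```python
import hashlib
import secrets


def keeps_block(seed: bytes, block_hash: bytes, threshold_percent: float) -> bool:
    """Return True if a node with this seed would keep the block.

    Assumed predicate (not from the proposal): hash seed || block_hash,
    read the first 8 bytes as an integer in [0, 2**64), and keep the block
    when that integer falls in the lowest threshold_percent of the range.
    """
    digest = hashlib.sha256(seed + block_hash).digest()
    value = int.from_bytes(digest[:8], "big")
    # Equivalent to value / 2**64 < threshold_percent / 100, without
    # dividing a large integer down to a float first.
    return value * 100 < threshold_percent * 2**64


def blocks_held_by_peer(seed: bytes, threshold_percent: float, block_hashes):
    """What a remote node could compute from a peer's advertised seed and threshold."""
    return [h for h in block_hashes if keeps_block(seed, h, threshold_percent)]


# Step 1: a per-node random seed (16 bytes is an arbitrary choice).
node_seed = secrets.token_bytes(16)

# Placeholder block hashes for illustration only.
block_hashes = [hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(1000)]

# Steps 2 and 3: keep roughly 25% of blocks.
kept = blocks_held_by_peer(node_seed, 25.0, block_hashes)
print(f"kept {len(kept)} of {len(block_hashes)} blocks")
```

Because the decision depends only on the seed, the threshold, and the block hash, a peer that learns the seed and threshold can run the same function and get the same answer without asking the node about each block.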

The goals are to increase data redundancy in a way that more uniformly shares the load across nodes, alleviating some of the pressure of the IBD problem on full archive nodes. I am working on a draft BIP for this proposal but figured I would submit it as a high-level idea in case anyone had any feedback on the initial design before I go into specification-level detail.

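To give a feel for the redundancy goal, here is a small back-of-the-envelope sketch in Python. It assumes that every participating node uses the same threshold and that keep decisions behave like independent coin flips per node and block; neither assumption comes from the proposal, and the function names are hypothetical.

```python
def expected_copies(num_nodes: int, threshold_percent: float) -> float:
    """Expected number of partially pruned nodes holding any given block."""
    return num_nodes * threshold_percent / 100.0


def prob_block_unavailable(num_nodes: int, threshold_percent: float) -> float:
    """Probability that none of the nodes keeps a given block.

    Assumes independent keep decisions with probability threshold_percent / 100.
    """
    keep_probability = threshold_percent / 100.0
    return (1.0 - keep_probability) ** num_nodes


# Illustrative node counts at a 25% threshold.
for n in (10, 50, 100):
    print(n, expected_copies(n, 25.0), prob_block_unavailable(n, 25.0))
```

Under these assumptions the chance that a block is missing from every partially pruned node shrinks geometrically with the number of nodes, while each node stores only its chosen fraction of history.
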
If you have thoughts on

A. The protocol design itself
B. The barriers to putting this kind of functionality into Core

I would love to hear from you.

Cheers,
Keagan