MIME-Version: 1.0
In-Reply-To: <CAAS2fgRzGkcJbWbJmFN2-NSJGUcLdPKp0q7FjM0x7WDvHoRq=g@mail.gmail.com>
References: <CANJO25J1WRHtfQLVXUB2s_sjj39pTPWmixAcXNJ3t-5os8RPmQ@mail.gmail.com>
	<CANJO25JTtfmfsOQYOzJeksJn3CoKE3W8iLGsRko-_xd4XhB3ZA@mail.gmail.com>
	<CAJHLa0O5OxaX5g3u=dnCY6Lz_gK3QZgQEPNcWNVRD4JziwAmvg@mail.gmail.com>
	<20150512171640.GA32606@savin.petertodd.org>
	<CAE-z3OV3VdSoiTSfASwYHr1CjZSqio303sqGq_1Y9yaYgov2sw@mail.gmail.com>
	<CAAS2fgRzGkcJbWbJmFN2-NSJGUcLdPKp0q7FjM0x7WDvHoRq=g@mail.gmail.com>
Date: Tue, 12 May 2015 23:00:33 +0100
Message-ID: <CAE-z3OWR72Og78RLuXEPjzRR8gCEjAuFk2nq-JzDtt_2pKSmHQ@mail.gmail.com>
From: Tier Nolan <tier.nolan@gmail.com>
Cc: Bitcoin Dev <bitcoin-development@lists.sourceforge.net>
Content-Type: multipart/alternative; boundary=001a11475ddaa566850515e99f27
Subject: Re: [Bitcoin-development] Proposed additional options for pruned
	nodes
Precedence: list

--001a11475ddaa566850515e99f27
Content-Type: text/plain; charset=UTF-8

On Tue, May 12, 2015 at 8:03 PM, Gregory Maxwell <gmaxwell@gmail.com> wrote:

>
> (0) Block coverage should have locality; historical blocks are
> (almost) always needed in contiguous ranges.  Having random peers
> with totally random blocks would be horrific for performance, as you'd
> have to hunt down a working peer and make a connection for each block
> with high probability.
>
> (1) Block storage on nodes with a fraction of the history should not
> depend on believing random peers, because listening to peers can
> easily create attacks (e.g. someone could break the network by
> convincing nodes to become unbalanced) and isn't useful: it's not like
> the blockchain is substantially different for anyone; if you've reached the
> point of needing to know coverage to fill, then something is wrong.
> Gaps would be handled by archive nodes, so there is no reason to
> increase vulnerability by doing anything but behaving uniformly.
>
> (2) The decision to contact a node should need O(1) communications,
> not just because of the delay of chasing around to find who has
> what, but because that chasing process usually makes the process
> _highly_ sybil vulnerable.
>
> (3) The expression of what blocks a node has should be compact (e.g.
> not a dense list of blocks) so it can be rumored efficiently.
>
> (4) Figuring out which blocks (or block ranges) a peer has, given that
> expression, should be computationally efficient.
>
> (5) The communication about what blocks a node has should be compact.
>
> (6) The coverage created by the network should be uniform, and should
> remain uniform as the blockchain grows; ideally you shouldn't need
> to update your state to know what blocks a peer will store in the
> future, assuming that it doesn't change the amount of data it's
> planning to use. (What Tier Nolan proposes sounds like it fails this
> point.)
>
> (7) Growth of the blockchain shouldn't cause much (or any) need to
> refetch old blocks.
>

M = 1,000,000
N = number of "starts"

S(0) = hash(seed) mod M
...
S(n) = hash(S(n-1)) mod M, for n = 1 .. N-1

This generates a sequence of start points.  If a start point is less than
the current block height, then it counts as a hit.

For each hit, the node stores the 50MB of data starting at the block at height S(n).
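The start-point sequence above can be sketched as follows. This is an illustration under assumptions, not a specified design: the post leaves "hash" open, so SHA-256 over an 8-byte big-endian encoding (first 8 digest bytes read as an integer) is an arbitrary stand-in, and the names `h`, `start_points` and `hit_starts` are hypothetical.

```python
import hashlib

M = 1_000_000  # fixed modulus from the post


def h(value: int) -> int:
    # Assumption: "hash" is unspecified in the post; SHA-256 over an
    # 8-byte big-endian encoding is used here purely for illustration.
    digest = hashlib.sha256(value.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def start_points(seed: int, n: int, m: int = M) -> list[int]:
    """Return S(0) .. S(n-1): S(0) = h(seed) mod m, S(k) = h(S(k-1)) mod m."""
    starts = []
    s = h(seed) % m
    for _ in range(n):
        starts.append(s)
        s = h(s) % m
    return starts


def hit_starts(seed: int, n: int, height: int, m: int = M) -> list[int]:
    """Starts below the current block height; each one begins a stored run."""
    return [s for s in start_points(seed, n, m) if s < height]


if __name__ == "__main__":
    # Anyone who knows a peer's (seed, N) can recompute the same list.
    print(hit_starts(seed=0x1234, n=10, height=360_000))
```

Because each S(n) depends only on S(n-1), the first k starts for a given seed are the same whatever N is, which is what lets N shrink later without disturbing the remaining runs.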

As the blockchain grows, more of the starts fall below the block height and
become hits.  This means some other runs would have to be deleted to keep
the node within its storage cap.
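One way to keep a node under its cap as new starts become hits is to shorten N from the end of the sequence, one start at a time, until the number of hits fits. The sketch below assumes the cap is expressed as a whole number of runs (`max_runs`) and reuses the same hypothetical SHA-256 stand-in for "hash"; `trim_n` is an illustrative name, not part of the proposal.

```python
import hashlib

M = 1_000_000


def h(value: int) -> int:
    # Assumed hash (unspecified in the post): SHA-256, first 8 bytes as an int.
    digest = hashlib.sha256(value.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def start_points(seed: int, n: int, m: int = M) -> list[int]:
    starts, s = [], h(seed) % m
    for _ in range(n):
        starts.append(s)
        s = h(s) % m
    return starts


def trim_n(seed: int, n: int, height: int, max_runs: int, m: int = M) -> int:
    """Drop starts from the end of the sequence, one at a time, until at
    most max_runs of the remaining starts are below height (i.e. are hits)."""
    starts = start_points(seed, n, m)
    hits = sum(1 for s in starts if s < height)
    while hits > max_runs:
        n -= 1
        if starts[n] < height:  # the dropped start was a stored run
            hits -= 1
    return n


if __name__ == "__main__":
    cap_runs = 500_000_000 // 50_000_000  # e.g. a 500MB cap with 50MB runs
    n = 40
    for height in (100_000, 200_000, 300_000, 400_000):
        n = trim_n(seed=0x1234, n=n, height=height, max_runs=cap_runs)
        print(height, n)
```

Since the surviving starts are always a prefix of the sequence, a peer only has to learn the new value of N to know what was deleted.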

A weakness is that the start points are random with regard to block heights
rather than block sizes.  Tiny blocks have the same priority as larger blocks.

0) Blocks are local, in 50MB runs
1) Agreed, nodes should download headers-first (or use some other compact way
of finding the highest POW chain).
2) M could be fixed, so N and the seed are all that is required.  The seed
doesn't have to be that large.  If 1% of the blockchain is stored, then 16
bits should be sufficient so that every block is covered by some seed.
3) N is likely to fit in less than 2 bytes, and the seed can be 2 bytes.
4) A 1% cover of 50GB of blockchain (500MB) would have 10 starts @ 50MB per
run.  That is 10 hashes.  They don't even necessarily need to be
cryptographic hashes.
5) Isn't this the same as 3?
6) Every block has the same odds of being included.  There inherently needs
to be an update when a node deletes some info due to exceeding its cap.  N
can be dropped one run at a time.
7) When new starts drop below the tip height, N can be decremented and one
run deleted, keeping the node within its cap.
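Points 2 and 4 can be illustrated by a lookup that decides, from a peer's advertised (seed, N) alone, whether it should hold a given block, plus the item 4 arithmetic. This sketch assumes the caller already knows every block's size (e.g. from its own copy of the chain; Bitcoin block headers do not record block size), reads "50MB" as 50 * 10**6 bytes, and assumes a run stops before the block that would push it past 50MB. All function names are hypothetical, and "hash" is again a SHA-256 stand-in.

```python
import bisect
import hashlib
from itertools import accumulate

M = 1_000_000
RUN_BYTES = 50_000_000  # "50MB" read as 50 * 10**6 bytes (assumption)


def h(value: int) -> int:
    # Assumed hash (unspecified in the post): SHA-256, first 8 bytes as an int.
    digest = hashlib.sha256(value.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def start_points(seed: int, n: int, m: int = M) -> list[int]:
    starts, s = [], h(seed) % m
    for _ in range(n):
        starts.append(s)
        s = h(s) % m
    return starts


def offsets_from_sizes(sizes: list[int]) -> list[int]:
    """offsets[i] = total bytes of blocks 0 .. i-1, so offsets[0] == 0."""
    return [0, *accumulate(sizes)]


def run_end(offsets: list[int], start: int, run_bytes: int = RUN_BYTES) -> int:
    """First height NOT in the run beginning at start. Assumption: the run
    stops before the block that would exceed run_bytes (keeps at least one)."""
    end = bisect.bisect_right(offsets, offsets[start] + run_bytes) - 1
    return max(end, start + 1)


def peer_has_block(seed, n, offsets, query, run_bytes=RUN_BYTES, m=M) -> bool:
    """Decide from a peer's (seed, N) alone whether it should hold block query."""
    chain_len = len(offsets) - 1  # number of blocks whose sizes we know
    return any(
        s < chain_len and s <= query < run_end(offsets, s, run_bytes)
        for s in start_points(seed, n, m)
    )


def runs_for_fraction(chain_bytes: int, percent: int, run_bytes: int = RUN_BYTES) -> int:
    """Number of whole runs needed to store percent% of the chain (item 4)."""
    return chain_bytes * percent // (100 * run_bytes)


if __name__ == "__main__":
    print(runs_for_fraction(50_000_000_000, 1))  # prints 10
```

The cost of a lookup is N hashes plus one binary search per start, with no extra round trips to the peer.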

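There would need to be a special rule to ensure the low height blocks are
covered: a block can only be reached by runs that start at or below its
height, so blocks near height 0 have very few possible starts.  Nodes
should keep the first 50MB of blocks with some probability (10%?).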
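A hypothetical way to make the low-height rule compatible with point 1 (no trust in peers) is to derive the "keep the first 50MB" decision from the seed itself, so any peer can recompute it from the same 2-byte seed. The 10% figure is the post's tentative value; the `b"first-run"` tag, SHA-256 and the name `keeps_first_run` are assumptions for illustration only.

```python
import hashlib


def keeps_first_run(seed: int, percent: int = 10) -> bool:
    """Hypothetical rule: derive the 'keep the first 50MB' decision from the
    2-byte seed so any peer can recompute it without extra messages."""
    # The b"first-run" tag separates this draw from the start-point hashes.
    digest = hashlib.sha256(b"first-run" + seed.to_bytes(2, "big")).digest()
    return int.from_bytes(digest[:2], "big") % 100 < percent


if __name__ == "__main__":
    kept = sum(keeps_first_run(s) for s in range(1 << 16))
    print(f"{kept} of 65536 possible seeds keep the first run")
```

The decision adds no bytes to what a node advertises, since it is a pure function of the seed.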

--001a11475ddaa566850515e99f27--