MIME-Version: 1.0
References: <CAO3Pvs8ccTkgrecJG6KFbBW+9moHF-FTU+4qNfayeE3hM9uRrg@mail.gmail.com>
	<CAAS2fgRVTfsfXAyHBoBaCqAXpK+=QCFy-Lx3zH=d3tPteu7GcA@mail.gmail.com>
	<CAO3Pvs-_0scKqFnT-fE7aDd7gkw+epJkTcQ-jAYZwENsWt8b=A@mail.gmail.com>
In-Reply-To: <CAO3Pvs-_0scKqFnT-fE7aDd7gkw+epJkTcQ-jAYZwENsWt8b=A@mail.gmail.com>
From: Olaoluwa Osuntokun <laolu32@gmail.com>
Date: Fri, 09 Jun 2017 04:47:19 +0000
Message-ID: <CAO3Pvs-P2WcNRKGfH_FA7DFNRZOi5H+szO58wBLKA8gVh7mxZQ@mail.gmail.com>
To: Gregory Maxwell <greg@xiph.org>
Content-Type: multipart/alternative; boundary="94eb2c0b927cb69e8405517faa23"
Cc: Arnoud Kouwenhoven - Pukaki Corp via bitcoin-dev
	<bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for
 Light Clients
Precedence: list

--94eb2c0b927cb69e8405517faa23
Content-Type: text/plain; charset="UTF-8"

> Correct me if I'm wrong, but from my interpretation we can't use that
> method as described as we need to output 64-bit integers rather than
> 32-bit integers.

Had a chat with gmax off-list and came to the realization that the method
_should_ indeed generalize to our case of outputting 64-bit integers.
We'll need to do a bit of bit twiddling to make it work properly. I'll
modify our implementation and report back with some basic benchmarks.
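For reference, one way the 64-bit generalization could look, offered as a sketch rather than what our implementation will end up doing. The idea is to keep the top 64 bits of the 128-bit product of the hash and the range size. The function name `fast_range64` is hypothetical. In Python the full product is exact. A Go version could take the `hi` word returned by `math/bits.Mul64`, and a C version could use the GCC/Clang `unsigned __int128` extension.

```python
MASK64 = (1 << 64) - 1


def fast_range64(h: int, n: int) -> int:
    """Map a 64-bit hash h into [0, n) without a division.

    Returns the high 64 bits of the 128-bit product h * n, i.e.
    floor(h * n / 2**64). Python ints are arbitrary precision, so the
    product is exact here; C or Go needs a 64x64->128 multiply instead.
    """
    if not 0 <= h <= MASK64:
        raise ValueError("h must be a 64-bit unsigned integer")
    if not 0 < n <= MASK64:
        raise ValueError("n must be in [1, 2**64)")
    return (h * n) >> 64


# Hypothetical parameters: 5 indexed items and P = 20, so the target
# range is [0, 5 * 2**20).
n_range = 5 << 20
print(fast_range64(0x0123456789ABCDEF, n_range))
```

As with the 32-bit version, the bias is the same as plain `h % n`. The difference is that only a multiply and a shift are needed.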

-- Laolu


On Thu, Jun 8, 2017 at 8:42 PM Olaoluwa Osuntokun <laolu32@gmail.com> wrote:

> Gregory wrote:
> > I see the inner loop of construction and lookup are free of
> > non-constant divmod. This will result in implementations being
> > needlessly slow
>
> Ahh, sipa brought this up the other day, but I thought he was referring to the
> coding loop (which uses a power of 2 divisor/modulus), not the
> siphash-then-reduce loop.
>
> > I believe this can be fixed by using this approach
> > http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
> > which has the same non-uniformity as mod but needs only a multiply and
> > shift.
>
> Very cool, I wasn't aware of the existence of such a mapping.
>
> Correct me if I'm wrong, but from my interpretation we can't use that
> method as described as we need to output 64-bit integers rather than
> 32-bit integers. A 32-bit range would constrain the number of items
> we could encode to ~4096 to ensure that we don't overflow with fp
> values such as 20 (which we currently use in our code).
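The arithmetic behind the ~4096 figure works out as follows. This assumes that items are hashed into the range [0, N * 2^P), as in the draft, and that "fp values such as 20" means P = 20:

```latex
N \cdot 2^{P} \le 2^{32}
\;\Longrightarrow\;
N \le 2^{32-P} = 2^{32-20} = 2^{12} = 4096
```

Under the same assumption, a 64-bit range relaxes the bound to N <= 2^44.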
>
> If filter commitments are to be considered for a soft-fork in the future,
> then we should definitely optimize the construction of the filters as much
> as possible! I'll look into that paper you referenced to get a feel for
> just how complex the optimization would be.
>
> > Shouldn't all cases in your spec where you have N=transactions be
> > n=indexed-outputs? Otherwise, I think your Golomb parameter and false
> > positive rate are wrong.
>
> Yep! Nice catch. Our code is correct, but the mistake in the spec was an
> oversight on my part. I've pushed a commit[1] to the bip repo referenced
> in the OP to fix this error.
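Here is a small sketch of why N has to count indexed items rather than transactions. It models a block, hypothetically, as a list of transactions where each transaction is a list of output scripts. Exactly which data gets indexed is defined by the BIP, not by this snippet. The names `count_indexed_items`, `filter_range`, and the constant `P` are illustrative only.

```python
from typing import Sequence

P = 20  # hypothetical: log2 of the inverse false-positive rate


def count_indexed_items(block: Sequence[Sequence[bytes]]) -> int:
    """N = number of indexed items (here, output scripts), not transactions."""
    return sum(len(outputs) for outputs in block)


def filter_range(n_items: int, p: int = P) -> int:
    """Size F = N * 2**p of the range that items are hashed into."""
    return n_items << p


# Two transactions with three and two outputs respectively.
block = [[b"script-a", b"script-b", b"script-c"], [b"script-d", b"script-e"]]
n = count_indexed_items(block)  # 5 indexed items, not 2 transactions
print(n, filter_range(n))
```

Suppose the transaction count (2) were used in place of the item count. The range would shrink to 2 * 2^20 while still holding 5 items. The false positive rate would then be roughly 2.5 times the 2^-P target, and the average delta would no longer be about 2^P, which mis-tunes the Golomb parameter.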
>
> I've also pushed another commit to explicitly take advantage of the fact
> that P is a power-of-two within the coding loop [2].
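A power-of-two divisor keeps the coding loop cheap: with divisor 2^P, Golomb-Rice coding needs only a shift for the quotient and a mask for the remainder. The sketch below illustrates this. In the filter, the values being coded would be the deltas between the sorted hashed items. Several details are assumptions for illustration and are not taken from the spec: the function names, the bit-list output (used instead of a packed bit stream), and the unary convention of q one-bits followed by a zero.

```python
def golomb_rice_encode(values, p):
    """Golomb-Rice encode non-negative ints with divisor 2**p.

    The quotient is a right shift and the remainder is a mask, so no
    division instruction is needed. Returns a list of bits for clarity.
    """
    mask = (1 << p) - 1
    bits = []
    for x in values:
        q = x >> p                # same as x // 2**p
        r = x & mask              # same as x % 2**p
        bits.extend([1] * q)      # quotient in unary...
        bits.append(0)            # ...terminated by a zero bit
        # remainder as exactly p bits, most significant bit first
        bits.extend((r >> i) & 1 for i in range(p - 1, -1, -1))
    return bits


def golomb_rice_decode(bits, p, count):
    """Inverse of golomb_rice_encode for `count` values."""
    values = []
    pos = 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:     # count unary one-bits
            q += 1
            pos += 1
        pos += 1                  # skip the terminating zero
        r = 0
        for _ in range(p):        # read the p-bit remainder
            r = (r << 1) | bits[pos]
            pos += 1
        values.append((q << p) | r)
    return values
```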
>
> -- Laolu
>
> [1]: https://github.com/Roasbeef/bips/commit/bc5c6d6797f3df1c4a44213963ba12e72122163d
> [2]: https://github.com/Roasbeef/bips/commit/578a4e3aa8ec04524c83bfc5d14be1b2660e7f7a
>
>
> On Wed, Jun 7, 2017 at 2:41 PM Gregory Maxwell <greg@xiph.org> wrote:
>
>> On Thu, Jun 1, 2017 at 7:01 PM, Olaoluwa Osuntokun via bitcoin-dev
>> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>> > Hi y'all,
>> >
>> > Alex Akselrod and I would like to propose a new light client BIP for
>> > consideration:
>> >    * https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki
>>
>> I see the inner loop of construction and lookup are free of
>> non-constant divmod. This will result in implementations being
>> needlessly slow (especially on ARM, but even on modern x86_64 a
>> division is a 90 cycle-ish affair.)
>>
>> I believe this can be fixed by using this approach
>> http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
>> which has the same non-uniformity as mod but needs only a multiply
>> and shift.
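The following is a minimal sketch of the multiply-and-shift reduction from the linked post, shown in its original 32-bit form. The function name `fast_range32` is hypothetical. The reduction computes floor(x * n / 2^32) in place of `x % n`.

```python
def fast_range32(x: int, n: int) -> int:
    """Map a 32-bit value x into [0, n) with one multiply and one shift.

    Computes floor(x * n / 2**32). Like x % n it is slightly biased when
    n does not divide 2**32, but it avoids a hardware division.
    """
    if not 0 <= x < (1 << 32):
        raise ValueError("x must be a 32-bit unsigned integer")
    if not 0 < n < (1 << 32):
        raise ValueError("n must be in [1, 2**32)")
    return (x * n) >> 32


# Both reductions land in [0, n), but they generally pick different
# outputs for the same x, so a spec must fix one of them for everybody.
for x in (0, 7, 123456789, 0xFFFFFFFF):
    print(x, x % 1000, fast_range32(x, 1000))
```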
>>
>> Otherwise, fast implementations will have to implement bit-twiddling
>> exact-division code, which is kind of complicated (e.g. via the
>> technique in "{N}-bit Unsigned Division via {N}-bit Multiply-Add" by
>> Arch D. Robison).
>>
>> Shouldn't all cases in your spec where you have N=transactions be
>> n=indexed-outputs? Otherwise, I think your Golomb parameter and false
>> positive rate are wrong.
>>
>


--94eb2c0b927cb69e8405517faa23--