summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMike Hearn <mike@plan99.net>2014-10-28 09:55:00 +0000
committerbitcoindev <bitcoindev@gnusha.org>2014-10-28 09:55:08 +0000
commit701ad0e25ce2170dce072931b0a471de26a910af (patch)
tree13264d34589faac3f441cbbf28f7b8138a23cef0
parentf99739df1dbc13cf30138a496916127099344306 (diff)
downloadpi-bitcoindev-701ad0e25ce2170dce072931b0a471de26a910af.tar.gz
pi-bitcoindev-701ad0e25ce2170dce072931b0a471de26a910af.zip
Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates)
-rw-r--r--dc/42582b6212969f3b7db0ea900600f9d0964aed326
1 files changed, 326 insertions, 0 deletions
diff --git a/dc/42582b6212969f3b7db0ea900600f9d0964aed b/dc/42582b6212969f3b7db0ea900600f9d0964aed
new file mode 100644
index 000000000..f24d8ba08
--- /dev/null
+++ b/dc/42582b6212969f3b7db0ea900600f9d0964aed
@@ -0,0 +1,326 @@
+Received: from sog-mx-4.v43.ch3.sourceforge.com ([172.29.43.194]
+ helo=mx.sourceforge.net)
+ by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
+ (envelope-from <mh.in.england@gmail.com>) id 1Xj3Ua-0004d4-Gf
+ for bitcoin-development@lists.sourceforge.net;
+ Tue, 28 Oct 2014 09:55:08 +0000
+Received-SPF: pass (sog-mx-4.v43.ch3.sourceforge.com: domain of gmail.com
+ designates 209.85.214.179 as permitted sender)
+ client-ip=209.85.214.179; envelope-from=mh.in.england@gmail.com;
+ helo=mail-ob0-f179.google.com;
+Received: from mail-ob0-f179.google.com ([209.85.214.179])
+ by sog-mx-4.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128)
+ (Exim 4.76) id 1Xj3UX-00011y-P4
+ for bitcoin-development@lists.sourceforge.net;
+ Tue, 28 Oct 2014 09:55:08 +0000
+Received: by mail-ob0-f179.google.com with SMTP id m8so206007obr.38
+ for <bitcoin-development@lists.sourceforge.net>;
+ Tue, 28 Oct 2014 02:55:00 -0700 (PDT)
+MIME-Version: 1.0
+X-Received: by 10.182.213.2 with SMTP id no2mr204777obc.76.1414490100240; Tue,
+ 28 Oct 2014 02:55:00 -0700 (PDT)
+Sender: mh.in.england@gmail.com
+Received: by 10.76.6.7 with HTTP; Tue, 28 Oct 2014 02:55:00 -0700 (PDT)
+Received: by 10.76.6.7 with HTTP; Tue, 28 Oct 2014 02:55:00 -0700 (PDT)
+In-Reply-To: <CAPWm=eXxs=AfFhaT2EeGFsR+2r96WcaOeWL_Z59-6LixH+=4AQ@mail.gmail.com>
+References: <CAPWm=eXxs=AfFhaT2EeGFsR+2r96WcaOeWL_Z59-6LixH+=4AQ@mail.gmail.com>
+Date: Tue, 28 Oct 2014 09:55:00 +0000
+X-Google-Sender-Auth: DVEKo3s_GBo8W4lKZ6PofjnJK2Y
+Message-ID: <CANEZrP0tpnES7=SuaVHOfWdf=CDkzhx2V=4mqRGDpW3ELsGkKA@mail.gmail.com>
+From: Mike Hearn <mike@plan99.net>
+To: Alex Morcos <morcos@gmail.com>
+Content-Type: multipart/alternative; boundary=001a11c2de64f69ad2050678a37e
+X-Spam-Score: -0.5 (/)
+X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
+ See http://spamassassin.org/tag/ for more details.
+ -1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
+ sender-domain
+ 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
+ (mh.in.england[at]gmail.com)
+ -0.0 SPF_PASS SPF: sender matches SPF record
+ 1.0 HTML_MESSAGE BODY: HTML included in message
+ 0.1 DKIM_SIGNED Message has a DKIM or DK signature,
+ not necessarily valid
+ -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
+X-Headers-End: 1Xj3UX-00011y-P4
+Cc: Bitcoin Dev <bitcoin-development@lists.sourceforge.net>
+Subject: Re: [Bitcoin-development] Reworking the policy estimation code (fee
+ estimates)
+X-BeenThere: bitcoin-development@lists.sourceforge.net
+X-Mailman-Version: 2.1.9
+Precedence: list
+List-Id: <bitcoin-development.lists.sourceforge.net>
+List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
+ <mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
+List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
+List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
+List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
+List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
+ <mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
+X-List-Received-Date: Tue, 28 Oct 2014 09:55:08 -0000
+
+--001a11c2de64f69ad2050678a37e
+Content-Type: text/plain; charset=UTF-8
+
+Could you explain a little further why you think the current approach is
+statistically incorrect? There's no doubt that the existing estimates the
+system produces are garbage, but that's because it assumes players in the
+fee market are rational and they are not.
+
+Fwiw bitcoinj 0.12.1 applies the January fee drop and will attach fee of
+only 1000 satoshis per kB by default. I also have a program that measures
+confirmation time for a given fee level (with fresh coins so there's no
+priority) and it aligns with your findings, most txns confirm within a
+couple of blocks.
+
+Ultimately there isn't any easy method to stop people throwing money away.
+Bitcoinj will probably continue to use hard coded fee values for now to try
+and contribute to market sanity in the hope it makes smartfees smarter.
+On 27 Oct 2014 19:34, "Alex Morcos" <morcos@gmail.com> wrote:
+
+> I've been playing around with the code for estimating fees and found a few
+> issues with the existing code. I think this will address several
+> observations that the estimates returned by the existing code appear to be
+> too high. For instance see @cozz in Issue 4866
+> <https://github.com/bitcoin/bitcoin/issues/4866>.
+>
+> Here's what I found:
+>
+> 1) We're trying to answer the question of what fee X you need in order to
+> be confirmed within Y blocks. The existing code tries to do that by
+> calculating the median fee for each possible Y instead of gathering
+> statistics for each possible X. That approach is statistically incorrect.
+> In fact since certain X's appear so frequently, they tend to dominate the
+> statistics at all possible Y's (a fee rate of about 40k satoshis)
+>
+> 2) The existing code then sorts all of the data points in all of the
+> buckets together by fee rate and then reassigns buckets before calculating
+> the medians for each confirmation bucket. The sorting forces a
+> relationship where there might not be one. Imagine some other variable,
+> such as first 2 bytes of the transaction hash. If we sorted these and then
+> used them to give estimates, we'd see a clear but false relationship where
+> transactions with low starting bytes in their hashes took longer to confirm.
+>
+> 3) Transactions which don't have all their inputs available (because they
+> depend on other transactions in the mempool) aren't excluded from the
+> calculations. This skews the results.
+>
+> I rewrote the code to follow a different approach. I divided all possible
+> fee rates up into fee rate buckets (I spaced these logarithmically). For
+> each transaction that was confirmed, I updated the appropriate fee rate
+> bucket with how many blocks it took to confirm that transaction.
+>
+> The hardest part of doing this fee estimation is to decide what the
+> question really is that we're trying to answer. I took the approach that
+> if you are asking what fee rate I need to be confirmed within Y blocks,
+> then what you would like to know is the lowest fee rate such that a
+> relatively high percentage of transactions of that fee rate are confirmed
+> within Y blocks. Since even the highest fee transactions are confirmed
+> within the first block only 90-93% of the time, I decided to use 80% as my
+> cutoff. So now to answer "estimatefee Y", I scan through all of the fee
+> buckets from the most expensive down until I find the last bucket with >80%
+> of the transactions confirmed within Y blocks.
+>
+> Unfortunately we still have the problem of not having enough data points
+> for non-typical fee rates, and so it requires gathering a lot of data to
+> give reasonable answers. To keep all of these data points in a circular
+> buffer and then sort them for every analysis (or after every new block) is
+> expensive. So instead I adopted the approach of keeping an exponentially
+> decaying moving average for each bucket. I used a decay of .998 which
+> represents a half life of 374 blocks or about 2.5 days. Also if a bucket
+> doesn't have very many transactions, I combine it with the next bucket.
+>
+> Here is a link <https://github.com/morcos/bitcoin/pull/3> to the code. I
+> can create an actual pull request if there is consensus that it makes sense
+> to do so.
+>
+> I've attached a graph comparing the estimates produced for 1-3
+> confirmations by the new code and the old code. I did apply the patch to
+> fix issue 3 above to the old code first. The new code is in green and the
+> fixed code is in purple. The Y axis is a log scale of feerate in satoshis
+> per KB and the X axis is chain height. The new code produces the same
+> estimates for 2 and 3 confirmations (the answers are effectively quantized
+> by bucket).
+>
+> I've also completely reworked smartfees.py. It turns out to require many
+> many more transactions are put through in order to have statistically
+> significant results, so the test is quite slow to run (about 3 mins on my
+> machine).
+>
+> I've also been running a real world test, sending transactions of various
+> fee rates and seeing how long they took to get confirmed. After almost 200
+> tx's at each fee rate, here are the results so far:
+>
+> Fee rate 1100 Avg blocks to confirm 2.30 NumBlocks:% confirmed 1: 0.528
+> 2: 0.751 3: 0.870
+> Fee rate 2500 Avg blocks to confirm 2.22 NumBlocks:% confirmed 1: 0.528
+> 2: 0.766 3: 0.880
+> Fee rate 5000 Avg blocks to confirm 1.93 NumBlocks:% confirmed 1: 0.528
+> 2: 0.782 3: 0.891
+> Fee rate 10000 Avg blocks to confirm 1.67 NumBlocks:% confirmed 1: 0.569
+> 2: 0.844 3: 0.943
+> Fee rate 20000 Avg blocks to confirm 1.33 NumBlocks:% confirmed 1: 0.715
+> 2: 0.963 3: 0.989
+> Fee rate 30000 Avg blocks to confirm 1.27 NumBlocks:% confirmed 1: 0.751
+> 2: 0.974 3: 1.0
+> Fee rate 40000 Avg blocks to confirm 1.25 NumBlocks:% confirmed 1: 0.792
+> 2: 0.953 3: 0.994
+> Fee rate 60000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.875
+> 2: 1.0 3: 1.0
+> Fee rate 100000 Avg blocks to confirm 1.09 NumBlocks:% confirmed 1: 0.901
+> 2: 1.0 3: 1.0
+> Fee rate 300000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.886
+> 2: 0.989 3: 1.0
+>
+>
+> Alex
+>
+>
+> ------------------------------------------------------------------------------
+>
+> _______________________________________________
+> Bitcoin-development mailing list
+> Bitcoin-development@lists.sourceforge.net
+> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
+>
+>
+
+--001a11c2de64f69ad2050678a37e
+Content-Type: text/html; charset=UTF-8
+Content-Transfer-Encoding: quoted-printable
+
+<p dir=3D"ltr">Could you explain a little further why you think the current=
+ approach is statistically incorrect? There&#39;s no doubt that the existin=
+g estimates the system produces are garbage, but that&#39;s because it assu=
+mes players in the fee market are rational and they are not.</p>
+<p dir=3D"ltr">Fwiw bitcoinj 0.12.1 applies the January fee drop and will a=
+ttach fee of only 1000 satoshis per kB by default. I also have a program th=
+at measures confirmation time for a given fee level (with fresh coins so th=
+ere&#39;s no priority) and it aligns with your findings, most txns confirm =
+within a couple of blocks.</p>
+<p dir=3D"ltr">Ultimately there isn&#39;t any easy method to stop people th=
+rowing money away. Bitcoinj will probably continue to use hard coded fee va=
+lues for now to try and contribute to market sanity in the hope it makes sm=
+artfees smarter.</p>
+<div class=3D"gmail_quote">On 27 Oct 2014 19:34, &quot;Alex Morcos&quot; &l=
+t;<a href=3D"mailto:morcos@gmail.com">morcos@gmail.com</a>&gt; wrote:<br ty=
+pe=3D"attribution"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 =
+.8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr">I&#39;ve=
+ been playing around with the code for estimating fees and found a few issu=
+es with the existing code. =C2=A0 I think this will address several observa=
+tions that the estimates returned by the existing code appear to be too hig=
+h.=C2=A0 For instance see @cozz in <a href=3D"https://github.com/bitcoin/bi=
+tcoin/issues/4866" target=3D"_blank">Issue 4866</a>.<div><br></div><div>Her=
+e&#39;s what I found:</div><div><br></div><div>1)=C2=A0<font face=3D"arial,=
+ sans-serif">We&#39;re trying to answer the question of what fee X you need=
+ in order to be confirmed within Y blocks. =C2=A0 The existing code tries t=
+o do that by calculating the median fee for each possible Y instead of gath=
+ering statistics for each possible X.=C2=A0 That approach is=C2=A0statistic=
+ally=C2=A0incorrect.=C2=A0 In fact since certain X&#39;s appear so frequent=
+ly, they tend to dominate the statistics at all possible Y&#39;s (a fee rat=
+e of about 40k satoshis)</font></div><div><span style=3D"font-family:arial,=
+sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-family=
+:arial,sans-serif;font-size:13px">2) The existing code then sorts all of th=
+e data points in all of the buckets together by fee rate and then reassigns=
+ buckets before calculating the medians for each confirmation bucket. =C2=
+=A0</span><span style=3D"font-family:arial,sans-serif;font-size:13px">The s=
+orting forces a relationship where there might not be one.=C2=A0 Imagine so=
+me other variable, such as first 2 bytes of the transaction hash.=C2=A0 If =
+we sorted these and then used them to give estimates, we&#39;d see a clear =
+but false relationship where transactions with low starting bytes in their =
+hashes took longer to confirm.</span></div><div><span style=3D"font-family:=
+arial,sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-=
+family:arial,sans-serif;font-size:13px">3) Transactions which don&#39;t hav=
+e all their inputs available (because they depend on other transactions in =
+the mempool) aren&#39;t excluded from the calculations.=C2=A0 This skews th=
+e results.</span></div><div><br></div><div><font face=3D"arial, sans-serif"=
+>I rewrote the code to follow a different approach.=C2=A0 I divided all pos=
+sible fee rates up into fee rate buckets (I spaced these logarithmically).=
+=C2=A0 For each transaction that was confirmed, I updated the appropriate f=
+ee rate bucket=C2=A0with how many blocks it took to confirm that transactio=
+n. =C2=A0</font></div><div><span style=3D"font-family:arial,sans-serif;font=
+-size:13px"><br></span></div><div><span style=3D"font-family:arial,sans-ser=
+if;font-size:13px">The hardest part of doing this fee estimation is to deci=
+de what the question really is that we&#39;re trying to answer.=C2=A0 I too=
+k the approach that if you are asking what fee rate I need to be confirmed =
+within Y blocks, then what you would like to know is the lowest fee rate su=
+ch that a relatively high percentage of transactions of that fee rate are c=
+onfirmed within Y blocks. Since even the highest fee transactions are confi=
+rmed within the first block only 90-93% of the time, I decided to use 80% a=
+s my cutoff.=C2=A0 So now to answer &quot;estimatefee Y&quot;, I scan throu=
+gh all of the fee buckets from the most expensive down until I find the las=
+t bucket with &gt;80% of the transactions confirmed within Y blocks.</span>=
+</div><div><br></div><div><span style=3D"font-family:arial,sans-serif;font-=
+size:13px">Unfortunately we still have the problem of not having enough dat=
+a points for non-typical fee rates, and so it requires gathering a lot of d=
+ata to give reasonable answers. To keep all of these data points in a circu=
+lar buffer and then sort them for every analysis (or after every new block)=
+ is expensive.=C2=A0 So instead I adopted the approach of keeping an expone=
+ntially decaying moving average for each bucket. =C2=A0</span><span style=
+=3D"font-family:arial,sans-serif;font-size:13px">I used a decay of .998 whi=
+ch represents a half life of 374 blocks or about 2.5 days. =C2=A0</span><sp=
+an style=3D"font-family:arial,sans-serif;font-size:13px">Also if a bucket d=
+oesn&#39;t have very many transactions, I combine it with the next bucket.<=
+/span><span style=3D"font-family:arial,sans-serif;font-size:13px"><br></spa=
+n></div><div><span style=3D"font-family:arial,sans-serif;font-size:13px"><b=
+r></span></div><div><span style=3D"font-family:arial,sans-serif;font-size:1=
+3px">Here is a <a href=3D"https://github.com/morcos/bitcoin/pull/3" target=
+=3D"_blank">link</a> to the code.=C2=A0 I can create an actual pull request=
+ if there is consensus that it makes sense to do so.</span></div><div><span=
+ style=3D"font-family:arial,sans-serif;font-size:13px"><br></span></div><di=
+v><span style=3D"font-family:arial,sans-serif;font-size:13px">I&#39;ve atta=
+ched a graph comparing the estimates produced for 1-3 confirmations by the =
+new code and the old code.=C2=A0 I did apply the patch to fix issue 3 above=
+ to the old code first.=C2=A0 The new code is in green and the fixed code i=
+s in purple.=C2=A0 The Y axis is a log scale of feerate in satoshis per KB =
+and the X axis is chain height.=C2=A0 The new code produces the same estima=
+tes for 2 and 3 confirmations (the answers are effectively quantized by buc=
+ket).</span></div><div><span style=3D"font-family:arial,sans-serif;font-siz=
+e:13px"><br></span></div><div><span style=3D"font-family:arial,sans-serif;f=
+ont-size:13px">I&#39;ve also completely reworked smartfees.py.=C2=A0 It tur=
+ns out to require many many more transactions are put through in order to h=
+ave statistically significant results, so the test is quite slow to run (ab=
+out 3 mins on my machine).</span></div><div><span style=3D"font-family:aria=
+l,sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-fami=
+ly:arial,sans-serif;font-size:13px">I&#39;ve also been running a real world=
+ test, sending transactions of various fee rates and seeing how long they t=
+ook to get confirmed.=C2=A0 After almost 200 tx&#39;s at each fee rate, her=
+e are the results so far:</span></div><div><br></div><div><div><font face=
+=3D"courier new, monospace">Fee rate 1100 =C2=A0 Avg blocks to confirm 2.30=
+ NumBlocks:% confirmed 1: 0.528 2: 0.751 3: 0.870</font></div><div><font fa=
+ce=3D"courier new, monospace">Fee rate 2500 =C2=A0 Avg blocks to confirm 2.=
+22 NumBlocks:% confirmed 1: 0.528 2: 0.766 3: 0.880</font></div><div><font =
+face=3D"courier new, monospace">Fee rate 5000 =C2=A0 Avg blocks to confirm =
+1.93 NumBlocks:% confirmed 1: 0.528 2: 0.782 3: 0.891</font></div><div><fon=
+t face=3D"courier new, monospace">Fee rate 10000 =C2=A0Avg blocks to confir=
+m 1.67 NumBlocks:% confirmed 1: 0.569 2: 0.844 3: 0.943</font></div><div><f=
+ont face=3D"courier new, monospace">Fee rate 20000 =C2=A0Avg blocks to conf=
+irm 1.33 NumBlocks:% confirmed 1: 0.715 2: 0.963 3: 0.989</font></div><div>=
+<font face=3D"courier new, monospace">Fee rate 30000 =C2=A0Avg blocks to co=
+nfirm 1.27 NumBlocks:% confirmed 1: 0.751 2: 0.974 3: 1.0</font></div><div>=
+<font face=3D"courier new, monospace">Fee rate 40000 =C2=A0Avg blocks to co=
+nfirm 1.25 NumBlocks:% confirmed 1: 0.792 2: 0.953 3: 0.994</font></div><di=
+v><font face=3D"courier new, monospace">Fee rate 60000 =C2=A0Avg blocks to =
+confirm 1.12 NumBlocks:% confirmed 1: 0.875 2: 1.0 =C2=A0 3: 1.0</font></di=
+v><div><font face=3D"courier new, monospace">Fee rate 100000 Avg blocks to =
+confirm 1.09 NumBlocks:% confirmed 1: 0.901 2: 1.0 =C2=A0 3: 1.0</font></di=
+v><div><font face=3D"courier new, monospace">Fee rate 300000 Avg blocks to =
+confirm 1.12 NumBlocks:% confirmed 1: 0.886 2: 0.989 3: 1.0</font></div><di=
+v style=3D"font-family:arial,sans-serif;font-size:13px"><br></div></div><di=
+v><span style=3D"font-family:arial,sans-serif;font-size:13px"><br></span></=
+div><div><span style=3D"font-family:arial,sans-serif;font-size:13px">Alex</=
+span></div></div>
+<br>-----------------------------------------------------------------------=
+-------<br>
+<br>_______________________________________________<br>
+Bitcoin-development mailing list<br>
+<a href=3D"mailto:Bitcoin-development@lists.sourceforge.net">Bitcoin-develo=
+pment@lists.sourceforge.net</a><br>
+<a href=3D"https://lists.sourceforge.net/lists/listinfo/bitcoin-development=
+" target=3D"_blank">https://lists.sourceforge.net/lists/listinfo/bitcoin-de=
+velopment</a><br>
+<br></blockquote></div>
+
+--001a11c2de64f69ad2050678a37e--
+
+