diff options
author | Mike Hearn <mike@plan99.net> | 2014-10-28 09:55:00 +0000 |
---|---|---|
committer | bitcoindev <bitcoindev@gnusha.org> | 2014-10-28 09:55:08 +0000 |
commit | 701ad0e25ce2170dce072931b0a471de26a910af (patch) | |
tree | 13264d34589faac3f441cbbf28f7b8138a23cef0 | |
parent | f99739df1dbc13cf30138a496916127099344306 (diff) | |
download | pi-bitcoindev-701ad0e25ce2170dce072931b0a471de26a910af.tar.gz pi-bitcoindev-701ad0e25ce2170dce072931b0a471de26a910af.zip |
Re: [Bitcoin-development] Reworking the policy estimation code (fee estimates)
-rw-r--r-- | dc/42582b6212969f3b7db0ea900600f9d0964aed | 326 |
1 files changed, 326 insertions, 0 deletions
diff --git a/dc/42582b6212969f3b7db0ea900600f9d0964aed b/dc/42582b6212969f3b7db0ea900600f9d0964aed new file mode 100644 index 000000000..f24d8ba08 --- /dev/null +++ b/dc/42582b6212969f3b7db0ea900600f9d0964aed @@ -0,0 +1,326 @@ +Received: from sog-mx-4.v43.ch3.sourceforge.com ([172.29.43.194] + helo=mx.sourceforge.net) + by sfs-ml-2.v29.ch3.sourceforge.com with esmtp (Exim 4.76) + (envelope-from <mh.in.england@gmail.com>) id 1Xj3Ua-0004d4-Gf + for bitcoin-development@lists.sourceforge.net; + Tue, 28 Oct 2014 09:55:08 +0000 +Received-SPF: pass (sog-mx-4.v43.ch3.sourceforge.com: domain of gmail.com + designates 209.85.214.179 as permitted sender) + client-ip=209.85.214.179; envelope-from=mh.in.england@gmail.com; + helo=mail-ob0-f179.google.com; +Received: from mail-ob0-f179.google.com ([209.85.214.179]) + by sog-mx-4.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128) + (Exim 4.76) id 1Xj3UX-00011y-P4 + for bitcoin-development@lists.sourceforge.net; + Tue, 28 Oct 2014 09:55:08 +0000 +Received: by mail-ob0-f179.google.com with SMTP id m8so206007obr.38 + for <bitcoin-development@lists.sourceforge.net>; + Tue, 28 Oct 2014 02:55:00 -0700 (PDT) +MIME-Version: 1.0 +X-Received: by 10.182.213.2 with SMTP id no2mr204777obc.76.1414490100240; Tue, + 28 Oct 2014 02:55:00 -0700 (PDT) +Sender: mh.in.england@gmail.com +Received: by 10.76.6.7 with HTTP; Tue, 28 Oct 2014 02:55:00 -0700 (PDT) +Received: by 10.76.6.7 with HTTP; Tue, 28 Oct 2014 02:55:00 -0700 (PDT) +In-Reply-To: <CAPWm=eXxs=AfFhaT2EeGFsR+2r96WcaOeWL_Z59-6LixH+=4AQ@mail.gmail.com> +References: <CAPWm=eXxs=AfFhaT2EeGFsR+2r96WcaOeWL_Z59-6LixH+=4AQ@mail.gmail.com> +Date: Tue, 28 Oct 2014 09:55:00 +0000 +X-Google-Sender-Auth: DVEKo3s_GBo8W4lKZ6PofjnJK2Y +Message-ID: <CANEZrP0tpnES7=SuaVHOfWdf=CDkzhx2V=4mqRGDpW3ELsGkKA@mail.gmail.com> +From: Mike Hearn <mike@plan99.net> +To: Alex Morcos <morcos@gmail.com> +Content-Type: multipart/alternative; boundary=001a11c2de64f69ad2050678a37e +X-Spam-Score: -0.5 (/) +X-Spam-Report: Spam Filtering performed by mx.sourceforge.net. + See http://spamassassin.org/tag/ for more details. + -1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for + sender-domain + 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider + (mh.in.england[at]gmail.com) + -0.0 SPF_PASS SPF: sender matches SPF record + 1.0 HTML_MESSAGE BODY: HTML included in message + 0.1 DKIM_SIGNED Message has a DKIM or DK signature, + not necessarily valid + -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature +X-Headers-End: 1Xj3UX-00011y-P4 +Cc: Bitcoin Dev <bitcoin-development@lists.sourceforge.net> +Subject: Re: [Bitcoin-development] Reworking the policy estimation code (fee + estimates) +X-BeenThere: bitcoin-development@lists.sourceforge.net +X-Mailman-Version: 2.1.9 +Precedence: list +List-Id: <bitcoin-development.lists.sourceforge.net> +List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>, + <mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe> +List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development> +List-Post: <mailto:bitcoin-development@lists.sourceforge.net> +List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help> +List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>, + <mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe> +X-List-Received-Date: Tue, 28 Oct 2014 09:55:08 -0000 + +--001a11c2de64f69ad2050678a37e +Content-Type: text/plain; charset=UTF-8 + +Could you explain a little further why you think the current approach is +statistically incorrect? There's no doubt that the existing estimates the +system produces are garbage, but that's because it assumes players in the +fee market are rational and they are not. + +Fwiw bitcoinj 0.12.1 applies the January fee drop and will attach fee of +only 1000 satoshis per kB by default. I also have a program that measures +confirmation time for a given fee level (with fresh coins so there's no +priority) and it aligns with your findings, most txns confirm within a +couple of blocks. + +Ultimately there isn't any easy method to stop people throwing money away. +Bitcoinj will probably continue to use hard coded fee values for now to try +and contribute to market sanity in the hope it makes smartfees smarter. +On 27 Oct 2014 19:34, "Alex Morcos" <morcos@gmail.com> wrote: + +> I've been playing around with the code for estimating fees and found a few +> issues with the existing code. I think this will address several +> observations that the estimates returned by the existing code appear to be +> too high. For instance see @cozz in Issue 4866 +> <https://github.com/bitcoin/bitcoin/issues/4866>. +> +> Here's what I found: +> +> 1) We're trying to answer the question of what fee X you need in order to +> be confirmed within Y blocks. The existing code tries to do that by +> calculating the median fee for each possible Y instead of gathering +> statistics for each possible X. That approach is statistically incorrect. +> In fact since certain X's appear so frequently, they tend to dominate the +> statistics at all possible Y's (a fee rate of about 40k satoshis) +> +> 2) The existing code then sorts all of the data points in all of the +> buckets together by fee rate and then reassigns buckets before calculating +> the medians for each confirmation bucket. The sorting forces a +> relationship where there might not be one. Imagine some other variable, +> such as first 2 bytes of the transaction hash. If we sorted these and then +> used them to give estimates, we'd see a clear but false relationship where +> transactions with low starting bytes in their hashes took longer to confirm. +> +> 3) Transactions which don't have all their inputs available (because they +> depend on other transactions in the mempool) aren't excluded from the +> calculations. This skews the results. +> +> I rewrote the code to follow a different approach. I divided all possible +> fee rates up into fee rate buckets (I spaced these logarithmically). For +> each transaction that was confirmed, I updated the appropriate fee rate +> bucket with how many blocks it took to confirm that transaction. +> +> The hardest part of doing this fee estimation is to decide what the +> question really is that we're trying to answer. I took the approach that +> if you are asking what fee rate I need to be confirmed within Y blocks, +> then what you would like to know is the lowest fee rate such that a +> relatively high percentage of transactions of that fee rate are confirmed +> within Y blocks. Since even the highest fee transactions are confirmed +> within the first block only 90-93% of the time, I decided to use 80% as my +> cutoff. So now to answer "estimatefee Y", I scan through all of the fee +> buckets from the most expensive down until I find the last bucket with >80% +> of the transactions confirmed within Y blocks. +> +> Unfortunately we still have the problem of not having enough data points +> for non-typical fee rates, and so it requires gathering a lot of data to +> give reasonable answers. To keep all of these data points in a circular +> buffer and then sort them for every analysis (or after every new block) is +> expensive. So instead I adopted the approach of keeping an exponentially +> decaying moving average for each bucket. I used a decay of .998 which +> represents a half life of 374 blocks or about 2.5 days. Also if a bucket +> doesn't have very many transactions, I combine it with the next bucket. +> +> Here is a link <https://github.com/morcos/bitcoin/pull/3> to the code. I +> can create an actual pull request if there is consensus that it makes sense +> to do so. +> +> I've attached a graph comparing the estimates produced for 1-3 +> confirmations by the new code and the old code. I did apply the patch to +> fix issue 3 above to the old code first. The new code is in green and the +> fixed code is in purple. The Y axis is a log scale of feerate in satoshis +> per KB and the X axis is chain height. The new code produces the same +> estimates for 2 and 3 confirmations (the answers are effectively quantized +> by bucket). +> +> I've also completely reworked smartfees.py. It turns out to require many +> many more transactions are put through in order to have statistically +> significant results, so the test is quite slow to run (about 3 mins on my +> machine). +> +> I've also been running a real world test, sending transactions of various +> fee rates and seeing how long they took to get confirmed. After almost 200 +> tx's at each fee rate, here are the results so far: +> +> Fee rate 1100 Avg blocks to confirm 2.30 NumBlocks:% confirmed 1: 0.528 +> 2: 0.751 3: 0.870 +> Fee rate 2500 Avg blocks to confirm 2.22 NumBlocks:% confirmed 1: 0.528 +> 2: 0.766 3: 0.880 +> Fee rate 5000 Avg blocks to confirm 1.93 NumBlocks:% confirmed 1: 0.528 +> 2: 0.782 3: 0.891 +> Fee rate 10000 Avg blocks to confirm 1.67 NumBlocks:% confirmed 1: 0.569 +> 2: 0.844 3: 0.943 +> Fee rate 20000 Avg blocks to confirm 1.33 NumBlocks:% confirmed 1: 0.715 +> 2: 0.963 3: 0.989 +> Fee rate 30000 Avg blocks to confirm 1.27 NumBlocks:% confirmed 1: 0.751 +> 2: 0.974 3: 1.0 +> Fee rate 40000 Avg blocks to confirm 1.25 NumBlocks:% confirmed 1: 0.792 +> 2: 0.953 3: 0.994 +> Fee rate 60000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.875 +> 2: 1.0 3: 1.0 +> Fee rate 100000 Avg blocks to confirm 1.09 NumBlocks:% confirmed 1: 0.901 +> 2: 1.0 3: 1.0 +> Fee rate 300000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.886 +> 2: 0.989 3: 1.0 +> +> +> Alex +> +> +> ------------------------------------------------------------------------------ +> +> _______________________________________________ +> Bitcoin-development mailing list +> Bitcoin-development@lists.sourceforge.net +> https://lists.sourceforge.net/lists/listinfo/bitcoin-development +> +> + +--001a11c2de64f69ad2050678a37e +Content-Type: text/html; charset=UTF-8 +Content-Transfer-Encoding: quoted-printable + +<p dir=3D"ltr">Could you explain a little further why you think the current= + approach is statistically incorrect? There's no doubt that the existin= +g estimates the system produces are garbage, but that's because it assu= +mes players in the fee market are rational and they are not.</p> +<p dir=3D"ltr">Fwiw bitcoinj 0.12.1 applies the January fee drop and will a= +ttach fee of only 1000 satoshis per kB by default. I also have a program th= +at measures confirmation time for a given fee level (with fresh coins so th= +ere's no priority) and it aligns with your findings, most txns confirm = +within a couple of blocks.</p> +<p dir=3D"ltr">Ultimately there isn't any easy method to stop people th= +rowing money away. Bitcoinj will probably continue to use hard coded fee va= +lues for now to try and contribute to market sanity in the hope it makes sm= +artfees smarter.</p> +<div class=3D"gmail_quote">On 27 Oct 2014 19:34, "Alex Morcos" &l= +t;<a href=3D"mailto:morcos@gmail.com">morcos@gmail.com</a>> wrote:<br ty= +pe=3D"attribution"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 = +.8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr">I've= + been playing around with the code for estimating fees and found a few issu= +es with the existing code. =C2=A0 I think this will address several observa= +tions that the estimates returned by the existing code appear to be too hig= +h.=C2=A0 For instance see @cozz in <a href=3D"https://github.com/bitcoin/bi= +tcoin/issues/4866" target=3D"_blank">Issue 4866</a>.<div><br></div><div>Her= +e's what I found:</div><div><br></div><div>1)=C2=A0<font face=3D"arial,= + sans-serif">We're trying to answer the question of what fee X you need= + in order to be confirmed within Y blocks. =C2=A0 The existing code tries t= +o do that by calculating the median fee for each possible Y instead of gath= +ering statistics for each possible X.=C2=A0 That approach is=C2=A0statistic= +ally=C2=A0incorrect.=C2=A0 In fact since certain X's appear so frequent= +ly, they tend to dominate the statistics at all possible Y's (a fee rat= +e of about 40k satoshis)</font></div><div><span style=3D"font-family:arial,= +sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-family= +:arial,sans-serif;font-size:13px">2) The existing code then sorts all of th= +e data points in all of the buckets together by fee rate and then reassigns= + buckets before calculating the medians for each confirmation bucket. =C2= +=A0</span><span style=3D"font-family:arial,sans-serif;font-size:13px">The s= +orting forces a relationship where there might not be one.=C2=A0 Imagine so= +me other variable, such as first 2 bytes of the transaction hash.=C2=A0 If = +we sorted these and then used them to give estimates, we'd see a clear = +but false relationship where transactions with low starting bytes in their = +hashes took longer to confirm.</span></div><div><span style=3D"font-family:= +arial,sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-= +family:arial,sans-serif;font-size:13px">3) Transactions which don't hav= +e all their inputs available (because they depend on other transactions in = +the mempool) aren't excluded from the calculations.=C2=A0 This skews th= +e results.</span></div><div><br></div><div><font face=3D"arial, sans-serif"= +>I rewrote the code to follow a different approach.=C2=A0 I divided all pos= +sible fee rates up into fee rate buckets (I spaced these logarithmically).= +=C2=A0 For each transaction that was confirmed, I updated the appropriate f= +ee rate bucket=C2=A0with how many blocks it took to confirm that transactio= +n. =C2=A0</font></div><div><span style=3D"font-family:arial,sans-serif;font= +-size:13px"><br></span></div><div><span style=3D"font-family:arial,sans-ser= +if;font-size:13px">The hardest part of doing this fee estimation is to deci= +de what the question really is that we're trying to answer.=C2=A0 I too= +k the approach that if you are asking what fee rate I need to be confirmed = +within Y blocks, then what you would like to know is the lowest fee rate su= +ch that a relatively high percentage of transactions of that fee rate are c= +onfirmed within Y blocks. Since even the highest fee transactions are confi= +rmed within the first block only 90-93% of the time, I decided to use 80% a= +s my cutoff.=C2=A0 So now to answer "estimatefee Y", I scan throu= +gh all of the fee buckets from the most expensive down until I find the las= +t bucket with >80% of the transactions confirmed within Y blocks.</span>= +</div><div><br></div><div><span style=3D"font-family:arial,sans-serif;font-= +size:13px">Unfortunately we still have the problem of not having enough dat= +a points for non-typical fee rates, and so it requires gathering a lot of d= +ata to give reasonable answers. To keep all of these data points in a circu= +lar buffer and then sort them for every analysis (or after every new block)= + is expensive.=C2=A0 So instead I adopted the approach of keeping an expone= +ntially decaying moving average for each bucket. =C2=A0</span><span style= +=3D"font-family:arial,sans-serif;font-size:13px">I used a decay of .998 whi= +ch represents a half life of 374 blocks or about 2.5 days. =C2=A0</span><sp= +an style=3D"font-family:arial,sans-serif;font-size:13px">Also if a bucket d= +oesn't have very many transactions, I combine it with the next bucket.<= +/span><span style=3D"font-family:arial,sans-serif;font-size:13px"><br></spa= +n></div><div><span style=3D"font-family:arial,sans-serif;font-size:13px"><b= +r></span></div><div><span style=3D"font-family:arial,sans-serif;font-size:1= +3px">Here is a <a href=3D"https://github.com/morcos/bitcoin/pull/3" target= +=3D"_blank">link</a> to the code.=C2=A0 I can create an actual pull request= + if there is consensus that it makes sense to do so.</span></div><div><span= + style=3D"font-family:arial,sans-serif;font-size:13px"><br></span></div><di= +v><span style=3D"font-family:arial,sans-serif;font-size:13px">I've atta= +ched a graph comparing the estimates produced for 1-3 confirmations by the = +new code and the old code.=C2=A0 I did apply the patch to fix issue 3 above= + to the old code first.=C2=A0 The new code is in green and the fixed code i= +s in purple.=C2=A0 The Y axis is a log scale of feerate in satoshis per KB = +and the X axis is chain height.=C2=A0 The new code produces the same estima= +tes for 2 and 3 confirmations (the answers are effectively quantized by buc= +ket).</span></div><div><span style=3D"font-family:arial,sans-serif;font-siz= +e:13px"><br></span></div><div><span style=3D"font-family:arial,sans-serif;f= +ont-size:13px">I've also completely reworked smartfees.py.=C2=A0 It tur= +ns out to require many many more transactions are put through in order to h= +ave statistically significant results, so the test is quite slow to run (ab= +out 3 mins on my machine).</span></div><div><span style=3D"font-family:aria= +l,sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-fami= +ly:arial,sans-serif;font-size:13px">I've also been running a real world= + test, sending transactions of various fee rates and seeing how long they t= +ook to get confirmed.=C2=A0 After almost 200 tx's at each fee rate, her= +e are the results so far:</span></div><div><br></div><div><div><font face= +=3D"courier new, monospace">Fee rate 1100 =C2=A0 Avg blocks to confirm 2.30= + NumBlocks:% confirmed 1: 0.528 2: 0.751 3: 0.870</font></div><div><font fa= +ce=3D"courier new, monospace">Fee rate 2500 =C2=A0 Avg blocks to confirm 2.= +22 NumBlocks:% confirmed 1: 0.528 2: 0.766 3: 0.880</font></div><div><font = +face=3D"courier new, monospace">Fee rate 5000 =C2=A0 Avg blocks to confirm = +1.93 NumBlocks:% confirmed 1: 0.528 2: 0.782 3: 0.891</font></div><div><fon= +t face=3D"courier new, monospace">Fee rate 10000 =C2=A0Avg blocks to confir= +m 1.67 NumBlocks:% confirmed 1: 0.569 2: 0.844 3: 0.943</font></div><div><f= +ont face=3D"courier new, monospace">Fee rate 20000 =C2=A0Avg blocks to conf= +irm 1.33 NumBlocks:% confirmed 1: 0.715 2: 0.963 3: 0.989</font></div><div>= +<font face=3D"courier new, monospace">Fee rate 30000 =C2=A0Avg blocks to co= +nfirm 1.27 NumBlocks:% confirmed 1: 0.751 2: 0.974 3: 1.0</font></div><div>= +<font face=3D"courier new, monospace">Fee rate 40000 =C2=A0Avg blocks to co= +nfirm 1.25 NumBlocks:% confirmed 1: 0.792 2: 0.953 3: 0.994</font></div><di= +v><font face=3D"courier new, monospace">Fee rate 60000 =C2=A0Avg blocks to = +confirm 1.12 NumBlocks:% confirmed 1: 0.875 2: 1.0 =C2=A0 3: 1.0</font></di= +v><div><font face=3D"courier new, monospace">Fee rate 100000 Avg blocks to = +confirm 1.09 NumBlocks:% confirmed 1: 0.901 2: 1.0 =C2=A0 3: 1.0</font></di= +v><div><font face=3D"courier new, monospace">Fee rate 300000 Avg blocks to = +confirm 1.12 NumBlocks:% confirmed 1: 0.886 2: 0.989 3: 1.0</font></div><di= +v style=3D"font-family:arial,sans-serif;font-size:13px"><br></div></div><di= +v><span style=3D"font-family:arial,sans-serif;font-size:13px"><br></span></= +div><div><span style=3D"font-family:arial,sans-serif;font-size:13px">Alex</= +span></div></div> +<br>-----------------------------------------------------------------------= +-------<br> +<br>_______________________________________________<br> +Bitcoin-development mailing list<br> +<a href=3D"mailto:Bitcoin-development@lists.sourceforge.net">Bitcoin-develo= +pment@lists.sourceforge.net</a><br> +<a href=3D"https://lists.sourceforge.net/lists/listinfo/bitcoin-development= +" target=3D"_blank">https://lists.sourceforge.net/lists/listinfo/bitcoin-de= +velopment</a><br> +<br></blockquote></div> + +--001a11c2de64f69ad2050678a37e-- + + |