summaryrefslogtreecommitdiff
path: root/c9/dbbf71dd48a0cdcfa6fc019602174fc35c0fd9
blob: a2ff4dc1703adbc1284bd90c7e29730da21d8584 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
Received: from sog-mx-4.v43.ch3.sourceforge.com ([172.29.43.194]
	helo=mx.sourceforge.net)
	by sfs-ml-3.v29.ch3.sourceforge.com with esmtp (Exim 4.76)
	(envelope-from <morcos@gmail.com>) id 1Xj5e5-0003Iv-7f
	for bitcoin-development@lists.sourceforge.net;
	Tue, 28 Oct 2014 12:13:05 +0000
Received-SPF: pass (sog-mx-4.v43.ch3.sourceforge.com: domain of gmail.com
	designates 209.85.223.176 as permitted sender)
	client-ip=209.85.223.176; envelope-from=morcos@gmail.com;
	helo=mail-ie0-f176.google.com; 
Received: from mail-ie0-f176.google.com ([209.85.223.176])
	by sog-mx-4.v43.ch3.sourceforge.com with esmtps (TLSv1:RC4-SHA:128)
	(Exim 4.76) id 1Xj5e3-0005Pf-5T
	for bitcoin-development@lists.sourceforge.net;
	Tue, 28 Oct 2014 12:13:05 +0000
Received: by mail-ie0-f176.google.com with SMTP id rd18so470557iec.21
	for <bitcoin-development@lists.sourceforge.net>;
	Tue, 28 Oct 2014 05:12:57 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.107.135.146 with SMTP id r18mr2154419ioi.62.1414498377797;
	Tue, 28 Oct 2014 05:12:57 -0700 (PDT)
Received: by 10.50.223.146 with HTTP; Tue, 28 Oct 2014 05:12:57 -0700 (PDT)
In-Reply-To: <CANEZrP0tpnES7=SuaVHOfWdf=CDkzhx2V=4mqRGDpW3ELsGkKA@mail.gmail.com>
References: <CAPWm=eXxs=AfFhaT2EeGFsR+2r96WcaOeWL_Z59-6LixH+=4AQ@mail.gmail.com>
	<CANEZrP0tpnES7=SuaVHOfWdf=CDkzhx2V=4mqRGDpW3ELsGkKA@mail.gmail.com>
Date: Tue, 28 Oct 2014 08:12:57 -0400
Message-ID: <CAPWm=eXBUNd2WeCwnTtocuofiEq+Ccn0-hZTrMQCQiuHRvVZEQ@mail.gmail.com>
From: Alex Morcos <morcos@gmail.com>
To: Mike Hearn <mike@plan99.net>
Content-Type: multipart/alternative; boundary=001a113f966258189505067a91e1
X-Spam-Score: -0.6 (/)
X-Spam-Report: Spam Filtering performed by mx.sourceforge.net.
	See http://spamassassin.org/tag/ for more details.
	-1.5 SPF_CHECK_PASS SPF reports sender host as permitted sender for
	sender-domain
	0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
	(morcos[at]gmail.com)
	-0.0 SPF_PASS               SPF: sender matches SPF record
	1.0 HTML_MESSAGE           BODY: HTML included in message
	-0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
	author's domain
	0.1 DKIM_SIGNED            Message has a DKIM or DK signature,
	not necessarily valid
	-0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
X-Headers-End: 1Xj5e3-0005Pf-5T
Cc: Bitcoin Dev <bitcoin-development@lists.sourceforge.net>
Subject: Re: [Bitcoin-development] Reworking the policy estimation code (fee
	estimates)
X-BeenThere: bitcoin-development@lists.sourceforge.net
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <bitcoin-development.lists.sourceforge.net>
List-Unsubscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchive/forum.php?forum_name=bitcoin-development>
List-Post: <mailto:bitcoin-development@lists.sourceforge.net>
List-Help: <mailto:bitcoin-development-request@lists.sourceforge.net?subject=help>
List-Subscribe: <https://lists.sourceforge.net/lists/listinfo/bitcoin-development>,
	<mailto:bitcoin-development-request@lists.sourceforge.net?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 12:13:05 -0000

--001a113f966258189505067a91e1
Content-Type: text/plain; charset=UTF-8

Yeah, so to explain points 1 and 2 a bit more

1)  It's about what question you are trying to answer.  The existing code
tries to answer the question of what is the median fee of a transaction
that gets confirmed in Y blocks.  It turns out that is not a very good
proxy for the question we really want to know which is what is the fee that
is necessary such that we are likely to be confirmed within Y blocks.
What happens is that there are so many transactions of the 40k satoshis/kB
feerate that they turn out to be the dominant data points of transactions
that are confirmed after 2 blocks, 3 blocks, etc. and not only 1 block.

So for example.   A hypothetical sample of 20 txs might find 2 of your 1k
sat/kB txs and 18 of the 40k sat/kB txs.  Perhaps 15 of the 40k txs are
confirmed in 1 block and the other 3 in 2 blocks, and 1 of the 1k txs in 1
block and the other in 2 blocks.  So if you analyze the data by
confirmation time, you find that 15/16 1-conf txs are 40k and 3/4 2-conf
txs are 40k, so the median feerate is 40k for both 1 and 2 confirmations.
Instead, the correct thing to do is analyze the data by feerate.  Doing
that, we find that 15/18 (83%) of 40k txs are confirmed in 1 block and 1/2
(50%) 1k txs are.  But 100% of both are confirmed within two blocks.  This
leads you to say, you need 40k feerate if you want to get confirmed in 1
block but 1k is sufficient if you want to be confirmed in 2 blocks.

Put another way, Let's imagine you wanted to know how tall you have to be
no longer fit in the coach seats on an airplane.   If you looked at the
median height of all people in coach and all people in first class, you
would see that they were about the same, and you would get a confusing
answer.  Instead you have to bin by height, and look at the percentage of
people of each height that fly first-class vs coach, and I'd guess that by
the time you got up to say 6'8" you were finding greater than 50% of the
people flying first class.

2) The code also presupposes that higher fee rate transactions must be
confirmed quicker.  And so in addition to binning all transactions by
confirmation time, the code then sorts all of the transactions and re-bins
them such that the highest fee transactions are all in the 1-confirmation
bin and the lowest fee transactions are all in the 25-confirmation bin.  If
we'd been trying to predict whether the first 2 bytes of transaction hash
influenced our confirmation time, we would have started by having a random
distribution of hashes in each confirmation bin, but then after doing the
sorting, we'd of course have found that the "median hash" of the
1-confirmation transactions was higher, because we sorted it to make that
the case.

In the airplane example this would have been equivalent to taking the
median height of the 20 tallest people on the plane (assuming first class
is 20 seats) and saying that was the height for first class and the median
height of the remaining people for coach.  This will appear to give a
slightly better answer than the first approach, but is still wrong.



There are still a lot of additional improvements that can be made to fee
estimation.  One problem my proposed code has is there really just aren't
enough data points of low feerate transactions to give meaningful answers
about how likely those are to be confirmed, so its answers are still a bit
conservative.  This will improve though as the actual distribution of
transactions spreads out.    The other major needed improvement is to not
just state some description of what has happened in the past, but to
actually make a prediction about what is going to happen in the future.
For instance looking at the feerates of unconfirmed transactions currently
in the mempool could tell you that if you want to be confirmed immediately
you'll need to be high enough in that priority queue.








On Tue, Oct 28, 2014 at 5:55 AM, Mike Hearn <mike@plan99.net> wrote:

> Could you explain a little further why you think the current approach is
> statistically incorrect? There's no doubt that the existing estimates the
> system produces are garbage, but that's because it assumes players in the
> fee market are rational and they are not.
>
> Fwiw bitcoinj 0.12.1 applies the January fee drop and will attach fee of
> only 1000 satoshis per kB by default. I also have a program that measures
> confirmation time for a given fee level (with fresh coins so there's no
> priority) and it aligns with your findings, most txns confirm within a
> couple of blocks.
>
> Ultimately there isn't any easy method to stop people throwing money away.
> Bitcoinj will probably continue to use hard coded fee values for now to try
> and contribute to market sanity in the hope it makes smartfees smarter.
> On 27 Oct 2014 19:34, "Alex Morcos" <morcos@gmail.com> wrote:
>
>> I've been playing around with the code for estimating fees and found a
>> few issues with the existing code.   I think this will address several
>> observations that the estimates returned by the existing code appear to be
>> too high.  For instance see @cozz in Issue 4866
>> <https://github.com/bitcoin/bitcoin/issues/4866>.
>>
>> Here's what I found:
>>
>> 1) We're trying to answer the question of what fee X you need in order
>> to be confirmed within Y blocks.   The existing code tries to do that by
>> calculating the median fee for each possible Y instead of gathering
>> statistics for each possible X.  That approach is statistically incorrect.
>> In fact since certain X's appear so frequently, they tend to dominate the
>> statistics at all possible Y's (a fee rate of about 40k satoshis)
>>
>> 2) The existing code then sorts all of the data points in all of the
>> buckets together by fee rate and then reassigns buckets before calculating
>> the medians for each confirmation bucket.  The sorting forces a
>> relationship where there might not be one.  Imagine some other variable,
>> such as first 2 bytes of the transaction hash.  If we sorted these and then
>> used them to give estimates, we'd see a clear but false relationship where
>> transactions with low starting bytes in their hashes took longer to confirm.
>>
>> 3) Transactions which don't have all their inputs available (because they
>> depend on other transactions in the mempool) aren't excluded from the
>> calculations.  This skews the results.
>>
>> I rewrote the code to follow a different approach.  I divided all
>> possible fee rates up into fee rate buckets (I spaced these
>> logarithmically).  For each transaction that was confirmed, I updated the
>> appropriate fee rate bucket with how many blocks it took to confirm that
>> transaction.
>>
>> The hardest part of doing this fee estimation is to decide what the
>> question really is that we're trying to answer.  I took the approach that
>> if you are asking what fee rate I need to be confirmed within Y blocks,
>> then what you would like to know is the lowest fee rate such that a
>> relatively high percentage of transactions of that fee rate are confirmed
>> within Y blocks. Since even the highest fee transactions are confirmed
>> within the first block only 90-93% of the time, I decided to use 80% as my
>> cutoff.  So now to answer "estimatefee Y", I scan through all of the fee
>> buckets from the most expensive down until I find the last bucket with >80%
>> of the transactions confirmed within Y blocks.
>>
>> Unfortunately we still have the problem of not having enough data points
>> for non-typical fee rates, and so it requires gathering a lot of data to
>> give reasonable answers. To keep all of these data points in a circular
>> buffer and then sort them for every analysis (or after every new block) is
>> expensive.  So instead I adopted the approach of keeping an exponentially
>> decaying moving average for each bucket.  I used a decay of .998 which
>> represents a half life of 374 blocks or about 2.5 days.  Also if a
>> bucket doesn't have very many transactions, I combine it with the next
>> bucket.
>>
>> Here is a link <https://github.com/morcos/bitcoin/pull/3> to the code.
>> I can create an actual pull request if there is consensus that it makes
>> sense to do so.
>>
>> I've attached a graph comparing the estimates produced for 1-3
>> confirmations by the new code and the old code.  I did apply the patch to
>> fix issue 3 above to the old code first.  The new code is in green and the
>> fixed code is in purple.  The Y axis is a log scale of feerate in satoshis
>> per KB and the X axis is chain height.  The new code produces the same
>> estimates for 2 and 3 confirmations (the answers are effectively quantized
>> by bucket).
>>
>> I've also completely reworked smartfees.py.  It turns out to require many
>> many more transactions are put through in order to have statistically
>> significant results, so the test is quite slow to run (about 3 mins on my
>> machine).
>>
>> I've also been running a real world test, sending transactions of various
>> fee rates and seeing how long they took to get confirmed.  After almost 200
>> tx's at each fee rate, here are the results so far:
>>
>> Fee rate 1100   Avg blocks to confirm 2.30 NumBlocks:% confirmed 1: 0.528
>> 2: 0.751 3: 0.870
>> Fee rate 2500   Avg blocks to confirm 2.22 NumBlocks:% confirmed 1: 0.528
>> 2: 0.766 3: 0.880
>> Fee rate 5000   Avg blocks to confirm 1.93 NumBlocks:% confirmed 1: 0.528
>> 2: 0.782 3: 0.891
>> Fee rate 10000  Avg blocks to confirm 1.67 NumBlocks:% confirmed 1: 0.569
>> 2: 0.844 3: 0.943
>> Fee rate 20000  Avg blocks to confirm 1.33 NumBlocks:% confirmed 1: 0.715
>> 2: 0.963 3: 0.989
>> Fee rate 30000  Avg blocks to confirm 1.27 NumBlocks:% confirmed 1: 0.751
>> 2: 0.974 3: 1.0
>> Fee rate 40000  Avg blocks to confirm 1.25 NumBlocks:% confirmed 1: 0.792
>> 2: 0.953 3: 0.994
>> Fee rate 60000  Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.875
>> 2: 1.0   3: 1.0
>> Fee rate 100000 Avg blocks to confirm 1.09 NumBlocks:% confirmed 1: 0.901
>> 2: 1.0   3: 1.0
>> Fee rate 300000 Avg blocks to confirm 1.12 NumBlocks:% confirmed 1: 0.886
>> 2: 0.989 3: 1.0
>>
>>
>> Alex
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Bitcoin-development mailing list
>> Bitcoin-development@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>>
>>

--001a113f966258189505067a91e1
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Yeah, so to explain points 1 and 2 a bit more<div><br></di=
v><div>1) =C2=A0It&#39;s about what question you are trying to answer.=C2=
=A0 The existing code tries to answer the question of what is the median fe=
e of a transaction that gets confirmed in Y blocks.=C2=A0 It turns out that=
 is not a very good proxy for the question we really want to know which is =
what is the fee that is necessary such that we are likely to be confirmed w=
ithin Y blocks. =C2=A0 What happens is that there are so many transactions =
of the 40k satoshis/kB feerate that they turn out to be the dominant data p=
oints of transactions that are confirmed after 2 blocks, 3 blocks, etc. and=
 not only 1 block. =C2=A0=C2=A0</div><div><br></div><div>So for example. =
=C2=A0 A hypothetical sample of 20 txs might find 2 of your 1k sat/kB txs a=
nd 18 of the 40k sat/kB txs.=C2=A0 Perhaps 15 of the 40k txs are confirmed =
in 1 block and the other 3 in 2 blocks, and 1 of the 1k txs in 1 block and =
the other in 2 blocks.=C2=A0 So if you analyze the data by confirmation tim=
e, you find that 15/16 1-conf txs are 40k and 3/4 2-conf txs are 40k, so th=
e median feerate is 40k for both 1 and 2 confirmations. =C2=A0 Instead, the=
 correct thing to do is analyze the data by feerate.=C2=A0 Doing that, we f=
ind that 15/18 (83%) of 40k txs are confirmed in 1 block and 1/2 (50%) 1k t=
xs are.=C2=A0 But 100% of both are confirmed within two blocks.=C2=A0 This =
leads you to say, you need 40k feerate if you want to get confirmed in 1 bl=
ock but 1k is sufficient if you want to be confirmed in 2 blocks.</div><div=
><br></div><div>Put another way, Let&#39;s imagine you wanted to know how t=
all you have to be no longer fit in the coach seats on an airplane. =C2=A0 =
If you looked at the median height of all people in coach and all people in=
 first class, you would see that they were about the same, and you would ge=
t a confusing answer.=C2=A0 Instead you have to bin by height, and look at =
the percentage of people of each height that fly first-class vs coach, and =
I&#39;d guess that by the time you got up to say 6&#39;8&quot; you were fin=
ding greater than 50% of the people flying first class.</div><div><br></div=
><div>2) The code also presupposes that higher fee rate transactions must b=
e confirmed quicker.=C2=A0 And so in addition to binning all transactions b=
y confirmation time, the code then sorts all of the transactions and re-bin=
s them such that the highest fee transactions are all in the 1-confirmation=
 bin and the lowest fee transactions are all in the 25-confirmation bin.=C2=
=A0 If we&#39;d been trying to predict whether the first 2 bytes of transac=
tion hash influenced our confirmation time, we would have started by having=
 a random distribution of hashes in each confirmation bin, but then after d=
oing the sorting, we&#39;d of course have found that the &quot;median hash&=
quot; of the 1-confirmation transactions was higher, because we sorted it t=
o make that the case. =C2=A0=C2=A0</div><div><br></div><div>In the airplane=
 example this would have been equivalent to taking the median height of the=
 20 tallest people on the plane (assuming first class is 20 seats) and sayi=
ng that was the height for first class and the median height of the remaini=
ng people for coach.=C2=A0 This will appear to give a slightly better answe=
r than the first approach, but is still wrong.</div><div><br></div><div><br=
></div><div><br></div><div>There are still a lot of additional improvements=
 that can be made to fee estimation.=C2=A0 One problem my proposed code has=
 is there really just aren&#39;t enough data points of low feerate transact=
ions to give meaningful answers about how likely those are to be confirmed,=
 so its answers are still a bit conservative.=C2=A0 This will improve thoug=
h as the actual distribution of transactions spreads out. =C2=A0 =C2=A0The =
other major needed improvement is to not just state some description of wha=
t has happened in the past, but to actually make a prediction about what is=
 going to happen in the future.=C2=A0 For instance looking at the feerates =
of unconfirmed transactions currently in the mempool could tell you that if=
 you want to be confirmed immediately you&#39;ll need to be high enough in =
that priority queue.</div><div><br></div><div><br></div><div><br></div><div=
><br></div><div><br></div><div><br></div><div><br></div></div><div class=3D=
"gmail_extra"><br><div class=3D"gmail_quote">On Tue, Oct 28, 2014 at 5:55 A=
M, Mike Hearn <span dir=3D"ltr">&lt;<a href=3D"mailto:mike@plan99.net" targ=
et=3D"_blank">mike@plan99.net</a>&gt;</span> wrote:<br><blockquote class=3D=
"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding=
-left:1ex"><p dir=3D"ltr">Could you explain a little further why you think =
the current approach is statistically incorrect? There&#39;s no doubt that =
the existing estimates the system produces are garbage, but that&#39;s beca=
use it assumes players in the fee market are rational and they are not.</p>
<p dir=3D"ltr">Fwiw bitcoinj 0.12.1 applies the January fee drop and will a=
ttach fee of only 1000 satoshis per kB by default. I also have a program th=
at measures confirmation time for a given fee level (with fresh coins so th=
ere&#39;s no priority) and it aligns with your findings, most txns confirm =
within a couple of blocks.</p>
<p dir=3D"ltr">Ultimately there isn&#39;t any easy method to stop people th=
rowing money away. Bitcoinj will probably continue to use hard coded fee va=
lues for now to try and contribute to market sanity in the hope it makes sm=
artfees smarter.</p>
<div class=3D"gmail_quote"><div><div class=3D"h5">On 27 Oct 2014 19:34, &qu=
ot;Alex Morcos&quot; &lt;<a href=3D"mailto:morcos@gmail.com" target=3D"_bla=
nk">morcos@gmail.com</a>&gt; wrote:<br type=3D"attribution"></div></div><bl=
ockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #=
ccc solid;padding-left:1ex"><div><div class=3D"h5"><div dir=3D"ltr">I&#39;v=
e been playing around with the code for estimating fees and found a few iss=
ues with the existing code. =C2=A0 I think this will address several observ=
ations that the estimates returned by the existing code appear to be too hi=
gh.=C2=A0 For instance see @cozz in <a href=3D"https://github.com/bitcoin/b=
itcoin/issues/4866" target=3D"_blank">Issue 4866</a>.<div><br></div><div>He=
re&#39;s what I found:</div><div><br></div><div>1)=C2=A0<font face=3D"arial=
, sans-serif">We&#39;re trying to answer the question of what fee X you nee=
d in order to be confirmed within Y blocks. =C2=A0 The existing code tries =
to do that by calculating the median fee for each possible Y instead of gat=
hering statistics for each possible X.=C2=A0 That approach is=C2=A0statisti=
cally=C2=A0incorrect.=C2=A0 In fact since certain X&#39;s appear so frequen=
tly, they tend to dominate the statistics at all possible Y&#39;s (a fee ra=
te of about 40k satoshis)</font></div><div><span style=3D"font-family:arial=
,sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-famil=
y:arial,sans-serif;font-size:13px">2) The existing code then sorts all of t=
he data points in all of the buckets together by fee rate and then reassign=
s buckets before calculating the medians for each confirmation bucket. =C2=
=A0</span><span style=3D"font-family:arial,sans-serif;font-size:13px">The s=
orting forces a relationship where there might not be one.=C2=A0 Imagine so=
me other variable, such as first 2 bytes of the transaction hash.=C2=A0 If =
we sorted these and then used them to give estimates, we&#39;d see a clear =
but false relationship where transactions with low starting bytes in their =
hashes took longer to confirm.</span></div><div><span style=3D"font-family:=
arial,sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-=
family:arial,sans-serif;font-size:13px">3) Transactions which don&#39;t hav=
e all their inputs available (because they depend on other transactions in =
the mempool) aren&#39;t excluded from the calculations.=C2=A0 This skews th=
e results.</span></div><div><br></div><div><font face=3D"arial, sans-serif"=
>I rewrote the code to follow a different approach.=C2=A0 I divided all pos=
sible fee rates up into fee rate buckets (I spaced these logarithmically).=
=C2=A0 For each transaction that was confirmed, I updated the appropriate f=
ee rate bucket=C2=A0with how many blocks it took to confirm that transactio=
n. =C2=A0</font></div><div><span style=3D"font-family:arial,sans-serif;font=
-size:13px"><br></span></div><div><span style=3D"font-family:arial,sans-ser=
if;font-size:13px">The hardest part of doing this fee estimation is to deci=
de what the question really is that we&#39;re trying to answer.=C2=A0 I too=
k the approach that if you are asking what fee rate I need to be confirmed =
within Y blocks, then what you would like to know is the lowest fee rate su=
ch that a relatively high percentage of transactions of that fee rate are c=
onfirmed within Y blocks. Since even the highest fee transactions are confi=
rmed within the first block only 90-93% of the time, I decided to use 80% a=
s my cutoff.=C2=A0 So now to answer &quot;estimatefee Y&quot;, I scan throu=
gh all of the fee buckets from the most expensive down until I find the las=
t bucket with &gt;80% of the transactions confirmed within Y blocks.</span>=
</div><div><br></div><div><span style=3D"font-family:arial,sans-serif;font-=
size:13px">Unfortunately we still have the problem of not having enough dat=
a points for non-typical fee rates, and so it requires gathering a lot of d=
ata to give reasonable answers. To keep all of these data points in a circu=
lar buffer and then sort them for every analysis (or after every new block)=
 is expensive.=C2=A0 So instead I adopted the approach of keeping an expone=
ntially decaying moving average for each bucket. =C2=A0</span><span style=
=3D"font-family:arial,sans-serif;font-size:13px">I used a decay of .998 whi=
ch represents a half life of 374 blocks or about 2.5 days. =C2=A0</span><sp=
an style=3D"font-family:arial,sans-serif;font-size:13px">Also if a bucket d=
oesn&#39;t have very many transactions, I combine it with the next bucket.<=
/span><span style=3D"font-family:arial,sans-serif;font-size:13px"><br></spa=
n></div><div><span style=3D"font-family:arial,sans-serif;font-size:13px"><b=
r></span></div><div><span style=3D"font-family:arial,sans-serif;font-size:1=
3px">Here is a <a href=3D"https://github.com/morcos/bitcoin/pull/3" target=
=3D"_blank">link</a> to the code.=C2=A0 I can create an actual pull request=
 if there is consensus that it makes sense to do so.</span></div><div><span=
 style=3D"font-family:arial,sans-serif;font-size:13px"><br></span></div><di=
v><span style=3D"font-family:arial,sans-serif;font-size:13px">I&#39;ve atta=
ched a graph comparing the estimates produced for 1-3 confirmations by the =
new code and the old code.=C2=A0 I did apply the patch to fix issue 3 above=
 to the old code first.=C2=A0 The new code is in green and the fixed code i=
s in purple.=C2=A0 The Y axis is a log scale of feerate in satoshis per KB =
and the X axis is chain height.=C2=A0 The new code produces the same estima=
tes for 2 and 3 confirmations (the answers are effectively quantized by buc=
ket).</span></div><div><span style=3D"font-family:arial,sans-serif;font-siz=
e:13px"><br></span></div><div><span style=3D"font-family:arial,sans-serif;f=
ont-size:13px">I&#39;ve also completely reworked smartfees.py.=C2=A0 It tur=
ns out to require many many more transactions are put through in order to h=
ave statistically significant results, so the test is quite slow to run (ab=
out 3 mins on my machine).</span></div><div><span style=3D"font-family:aria=
l,sans-serif;font-size:13px"><br></span></div><div><span style=3D"font-fami=
ly:arial,sans-serif;font-size:13px">I&#39;ve also been running a real world=
 test, sending transactions of various fee rates and seeing how long they t=
ook to get confirmed.=C2=A0 After almost 200 tx&#39;s at each fee rate, her=
e are the results so far:</span></div><div><br></div><div><div><font face=
=3D"courier new, monospace">Fee rate 1100 =C2=A0 Avg blocks to confirm 2.30=
 NumBlocks:% confirmed 1: 0.528 2: 0.751 3: 0.870</font></div><div><font fa=
ce=3D"courier new, monospace">Fee rate 2500 =C2=A0 Avg blocks to confirm 2.=
22 NumBlocks:% confirmed 1: 0.528 2: 0.766 3: 0.880</font></div><div><font =
face=3D"courier new, monospace">Fee rate 5000 =C2=A0 Avg blocks to confirm =
1.93 NumBlocks:% confirmed 1: 0.528 2: 0.782 3: 0.891</font></div><div><fon=
t face=3D"courier new, monospace">Fee rate 10000 =C2=A0Avg blocks to confir=
m 1.67 NumBlocks:% confirmed 1: 0.569 2: 0.844 3: 0.943</font></div><div><f=
ont face=3D"courier new, monospace">Fee rate 20000 =C2=A0Avg blocks to conf=
irm 1.33 NumBlocks:% confirmed 1: 0.715 2: 0.963 3: 0.989</font></div><div>=
<font face=3D"courier new, monospace">Fee rate 30000 =C2=A0Avg blocks to co=
nfirm 1.27 NumBlocks:% confirmed 1: 0.751 2: 0.974 3: 1.0</font></div><div>=
<font face=3D"courier new, monospace">Fee rate 40000 =C2=A0Avg blocks to co=
nfirm 1.25 NumBlocks:% confirmed 1: 0.792 2: 0.953 3: 0.994</font></div><di=
v><font face=3D"courier new, monospace">Fee rate 60000 =C2=A0Avg blocks to =
confirm 1.12 NumBlocks:% confirmed 1: 0.875 2: 1.0 =C2=A0 3: 1.0</font></di=
v><div><font face=3D"courier new, monospace">Fee rate 100000 Avg blocks to =
confirm 1.09 NumBlocks:% confirmed 1: 0.901 2: 1.0 =C2=A0 3: 1.0</font></di=
v><div><font face=3D"courier new, monospace">Fee rate 300000 Avg blocks to =
confirm 1.12 NumBlocks:% confirmed 1: 0.886 2: 0.989 3: 1.0</font></div><di=
v style=3D"font-family:arial,sans-serif;font-size:13px"><br></div></div><di=
v><span style=3D"font-family:arial,sans-serif;font-size:13px"><br></span></=
div><div><span style=3D"font-family:arial,sans-serif;font-size:13px">Alex</=
span></div></div>
<br></div></div><span class=3D"">------------------------------------------=
------------------------------------<br>
<br>_______________________________________________<br>
Bitcoin-development mailing list<br>
<a href=3D"mailto:Bitcoin-development@lists.sourceforge.net" target=3D"_bla=
nk">Bitcoin-development@lists.sourceforge.net</a><br>
<a href=3D"https://lists.sourceforge.net/lists/listinfo/bitcoin-development=
" target=3D"_blank">https://lists.sourceforge.net/lists/listinfo/bitcoin-de=
velopment</a><br>
<br></span></blockquote></div>
</blockquote></div><br></div>

--001a113f966258189505067a91e1--