From: Anthony Towns
To: Jeremy, Bitcoin Protocol Discussion
Date: Wed, 24 Feb 2021 17:18:32 +1000
Subject: Re: [bitcoin-dev] Yesterday's Taproot activation meeting on lockinontimeout (LOT)

On Mon, Feb 22, 2021 at 06:10:34PM -0800, Jeremy via bitcoin-dev wrote:
> Not responding to anyone in particular, but it strikes me that one can think
> about the case where a small minority (let's say H = 20%?) of nodes

I don't think that's a good way to try to look at things -- the number of nodes has some impacts, but they're relatively minor (pun deflected). I think the things to look at are (from most to least important):

 (1) what the price indicates / what people buying/selling BTC want
 (2) what hashpower does
 (3) what nodes do

Here's a concrete example to help justify that ordering. Suppose that for whatever reason nobody is particularly interested in running lockinontimeout=true -- only 0.1% of nodes are doing it and they're not the "economic majority" in any way. In addition, 15% of hashpower have spent almost the entire signalling period not bothering to upgrade, and thus haven't been signalling and have been blocking activation.

Suppose further that there are futures/prediction markets set up so that people can bet on taproot activation (eg the bitfinex chain split tokens, or some sort of DeFi contracts), and the result is that there are some decent profits to be made if it does activate, enough to tempt >55% of hashpower into running with lockinontimeout=true. That way those miners can be confident it will activate, take up contracts in the futures/prediction markets, and be confident they'll win and get a big payday.
(Note that this means the people on the other side of those contracts are betting that taproot *doesn't* activate.)

Once a majority of hashpower is running lockinontimeout=true, it then makes sense for the remaining hashpower to both signal for activation and also run lockinontimeout=true -- otherwise they risk their blocks being orphaned if too many blocks don't signal and they happen to build on top of one of those non-signalling blocks. Figuring out that a majority of hashpower is/will be running lockinontimeout=true can be done either via a coinbase message or via bip91-style signalling.

In that scenario, you end up with >90% of hashpower running with lockinontimeout=true, even if only a token number of nodes in the wild are doing the same.

It's possible to estimate what happens if a majority of miners are using lockinontimeout=true, and the numbers end up pretty wild:

 - With 90% of miners signalling and running lockinontimeout=true, if the remaining 10% don't signal, they can expect to lose around 3% of revenue ($2M) due to blocks getting orphaned.

 - With 85% running lockinontimeout=true and 15% not signalling, the non-signallers can expect to lose about 37% of revenue ($38M) during the retarget period prior to timeout. If 60% of miners are doing spy-mining for up to 90s, they would expect to lose 0.9% of their spy-mining revenue ($2.5M).

 - With 60% of hashpower running lockinontimeout=true while 40% don't signal, the non-signallers will forgo ~83% of revenue ($320M) due to their blocks being orphaned, and if 60% of miners spy-mine for 90s, they should expect to lose 5% of revenue ($10M) over the same period.

Dollar figures are based on 6.25 BTC/block at $50k per BTC.
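For a rough sense of scale behind those dollar figures, the sketch below computes block-subsidy revenue over one 2016-block retarget period at the stated 6.25 BTC/block and $50k/BTC, and what a given orphan rate costs a given share of hashpower. It ignores fees and assumes the losses are measured over a single retarget period. The helper names are made up for illustration, and the orphan-rate percentages themselves come from the simulation linked below, not from this arithmetic.

```python
BLOCKS_PER_PERIOD = 2016   # one difficulty retarget period
SUBSIDY_BTC = 6.25         # block subsidy; fees are ignored (assumption)
PRICE_USD = 50_000         # $50k per BTC, as stated in the text


def period_revenue_usd(hashpower_share):
    """Expected subsidy revenue (USD) for a group with the given share of
    hashpower over one retarget period, if none of its blocks are orphaned."""
    return BLOCKS_PER_PERIOD * SUBSIDY_BTC * PRICE_USD * hashpower_share


def orphan_loss_usd(hashpower_share, fraction_lost):
    """Revenue forgone if `fraction_lost` of that group's blocks get orphaned."""
    return period_revenue_usd(hashpower_share) * fraction_lost


if __name__ == "__main__":
    print(f"whole network: ${period_revenue_usd(1.0) / 1e6:.0f}M per period")
    # the 10% non-signalling case, losing ~3% of its blocks
    print(f"10% losing 3%: ${orphan_loss_usd(0.10, 0.03) / 1e6:.1f}M")
```

Under those assumptions the whole network earns $630M per period, and 10% of hashpower losing 3% of its blocks forgoes about $1.9M, in line with the ~$2M quoted above.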
https://gist.github.com/ajtowns/fbcf30ed9d0e1708fdc98a876a04ff69#file-forced_signalling_chaos_cost_sim-py

Note that if miners simply accept those losses and don't take any action to prevent them, very long reorgs are to be expected -- in the 15% non-signalling scenario, you'd expect to see a 5-block reorg; in the 40% non-signalling scenario, you'd get reorgs of 60+ blocks. (Only people not running lockinontimeout=true would see the blocks being reorged out, of course.)

So I think focussing on how many nodes have a particular lockinontimeout setting can be pretty misleading.

> # 80% on LOT=false, 20% LOT=True
> - Case 1: Activates ahead of time anyways

That's the case where >90% of hashpower is signalling, and everything works fine.

> - Case 2: Fails to Activate before timeout...
> 20% *may* fork off with LOT=true.

Anyone running with lockinontimeout=true will refuse to follow a chain where lockin hasn't been reached by the timeout height. So if the most-work chain meets that condition, lockinontimeout=true nodes will refuse to follow it, and will either get stuck with no confirmations at all, or follow a lower-work chain that does (or can) reach lockin by the timeout height.

> Bitcoin hashrate reduced, chance of multi
> block reorgs at time of fork relatively high, especially if network does not
> partition.

If the most-work chain fails to activate, and only a minority of hashrate is running lockinontimeout=true, the chance of multiblock reorgs is actually pretty low.
The way it would play out in detail, with say 20% of hashpower not signalling and 40% of hashpower running lockinontimeout=true:

 * the chain reaches the last retarget period; lockinontimeout=false nodes stay in STARTED, lockinontimeout=true nodes switch to MUST_SIGNAL

 * for the first ~1009 blocks, everyone stays in sync, but block ~1010 becomes the 202nd non-signalling block, meaning that the 60% of hashpower on lockinontimeout=false is now one block ahead of the 40% of hashpower on lockinontimeout=true

 * it's possible that the 40% have a lucky run and get ahead of the 60% chain, causing a reorg. But in that case, within about 5 blocks another non-signalling block will be mined and the 60% will be ahead again.

So the 40% of lockinontimeout=true hashpower has to keep up with miners that have 150% of their hashrate for ~1000 blocks in order for everyone to end up on a locked-in chain, which is vanishingly unlikely. Even if you set the percentage not signalling to 11% and the percentage of hashpower running lockinontimeout=true to 48%, by my count you only get about a 27% chance of ending up reaching lockin on the most-work chain. With the 40%/20% figures above it's a flat 0.0%.

https://gist.github.com/ajtowns/fbcf30ed9d0e1708fdc98a876a04ff69#file-test_disaster-py

It's possible that the 60% will take some action to prevent their blocks being reorged out if the 40% do get lucky. One option would be for them to set lockinontimeout=true -- then we quickly get back to the "almost all hashpower ends up running lockinontimeout=true" scenario and activation is certain. But they could just as easily decide that once getblockchaininfo reports a softfork isn't possible, they won't reorg to a chain where it is possible unless it's 2 or 4 or 6 or whatever blocks longer.

> # 80% on LOT=true, 20% LOT=False
> - Case 1: Activates ahead of time Anyways
> No issues.

This is the same case where there's plenty of signalling and it's irrelevant what the setting for lockinontimeout is...
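The 40%/20% race can be roughly reproduced with a small Monte Carlo sketch. It is not the linked test_disaster.py and makes simplifying assumptions of its own: a 1815-of-2016 signalling threshold (consistent with the 202nd non-signalling block being the first one rejected), no propagation delay or spy-mining, lockinontimeout=false nodes switching only to a strictly longer chain, and the outcome being decided as soon as one branch completes the period. The function names are hypothetical, and its output should not be expected to match the quoted percentages exactly.

```python
import random

PERIOD = 2016                       # blocks in the MUST_SIGNAL retarget period
MAX_NON_SIGNALLING = 2016 - 1815    # 201, assuming a 90% threshold


def lockin_on_most_work_chain(non_signalling, lot_true, rng):
    """Simulate one MUST_SIGNAL period. Return True if the period completes
    with the lockinontimeout=true chain also being the most-work chain.

    non_signalling: hashpower share mining non-signalling blocks (LOT=false)
    lot_true:       hashpower share running lockinontimeout=true (signals)
    The remaining hashpower runs LOT=false and signals.
    """
    h_true = h_false = ns_true = 0  # branch heights; non-signalling blocks on LOT=true branch
    split = False                   # do the two groups currently follow different tips?
    while True:
        r = rng.random()
        if not split:
            if r < non_signalling and ns_true >= MAX_NON_SIGNALLING:
                # the 202nd non-signalling block: LOT=true nodes reject it,
                # LOT=false nodes accept it, so the chains diverge here
                split = True
                h_false = h_true + 1
            else:
                if r < non_signalling:
                    ns_true += 1
                h_true = h_false = h_true + 1
        elif r < lot_true:
            h_true += 1             # LOT=true miners extend their own branch
        else:
            h_false += 1            # everyone else extends the LOT=false branch
        if split and h_true > h_false:
            split = False           # LOT=false nodes reorg onto the longer, valid chain
            h_false = h_true
        if not split and h_true >= PERIOD:
            return True
        if split and h_false >= PERIOD:
            return False            # the non-activating branch finished the period first


def success_rate(non_signalling, lot_true, trials=500, seed=1):
    """Fraction of simulated periods that end in lockin on the most-work chain."""
    rng = random.Random(seed)
    wins = sum(lockin_on_most_work_chain(non_signalling, lot_true, rng)
               for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for ns, lot in ((0.20, 0.40), (0.11, 0.48)):
        print(f"{ns:.0%} not signalling, {lot:.0%} LOT=true: "
              f"{success_rate(ns, lot):.1%} lockin on the most-work chain")
```

With 20% not signalling and 40% on lockinontimeout=true, the lockinontimeout=true branch has to win a fresh race every few blocks for roughly a thousand blocks, so in this model too the success rate collapses towards zero.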
> - Case 2: Fails to Activate before timeout...

I'm not sure what you mean by "before timeout" here -- if you mean it reaches the MUST_SIGNAL phase, with 80% of hashpower running lockinontimeout=true, then things work out okay: even assuming that all 20% not running lockinontimeout=true are also not signalling, the miners who don't signal will lose up to 56% of their revenue for the MUST_SIGNAL period (~$80M), and if some of the lockinontimeout=true miners do spy-mining and build on top of non-signalling blocks, they may lose something like 1.7% of their revenue as well. In addition, we might see reorgs of up to ~10 blocks as this resolves itself. That's a significant loss for the miners who are out of consensus, and the likelihood of large reorgs will make doing business with bitcoin harder, but at least it can all be coped with.

But if you mean the most-work chain reaches the timeout height without achieving the locked-in state, because the majority of miners aren't running lockinontimeout=true, then the 80% of nodes running lockinontimeout=true will be stalled, and unable to process transactions, until they downgrade. If that ever occurred, it would be an astounding disaster, and I hope the first thing people would do is decide never to run any software by whoever proposed, ACKed or merged the PR that resulted in 80% of nodes running with lockinontimeout=true.

*Because* it would be such a disaster to effectively run a denial-of-service attack on 80% of nodes, it's plausible that price signals would indicate to miners that it will be much more profitable to run lockinontimeout=true, preventing that from occurring. But people can make profits out of disasters too -- it might be that people will figure "oh, the price will crash if this happens, so it'll be a chance to get some cheap bitcoins, and maybe put competing miners out of business so I can buy their ASICs off them for cheap too!"
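Both outcomes above come down to what each kind of node does when the last retarget period before the timeout arrives. The sketch below is a heavily simplified model of that: lockinontimeout=false nodes stay in STARTED and fall to FAILED at the timeout, while lockinontimeout=true nodes enter MUST_SIGNAL and reject the non-signalling block that would make lockin impossible. It assumes a 1815-of-2016 threshold, takes a hypothetical `last_period` index in place of a real timeout height, omits the DEFINED and ACTIVE states, and is not BIP8's exact pseudocode or Bitcoin Core's implementation.

```python
from enum import Enum

PERIOD = 2016                            # blocks per retarget period
THRESHOLD = 1815                         # signalling blocks needed (90%), assumed
MAX_NON_SIGNALLING = PERIOD - THRESHOLD  # 201


class State(Enum):
    STARTED = "STARTED"
    MUST_SIGNAL = "MUST_SIGNAL"
    LOCKED_IN = "LOCKED_IN"
    FAILED = "FAILED"


def next_state(state, signalling_count, next_period, last_period, lockinontimeout):
    """Return the state for period `next_period`, given the state of the period
    that just ended and how many of its blocks signalled. `last_period` is the
    index of the final period before the timeout (hypothetical input)."""
    if state in (State.LOCKED_IN, State.FAILED):
        return state                     # treated as terminal in this sketch
    if state is State.MUST_SIGNAL:
        # invalid non-signalling blocks were rejected, so the threshold was met
        return State.LOCKED_IN
    if signalling_count >= THRESHOLD:
        return State.LOCKED_IN
    if next_period > last_period:
        return State.FAILED              # timed out without reaching lockin
    if next_period == last_period and lockinontimeout:
        return State.MUST_SIGNAL
    return State.STARTED


def block_is_valid(state, non_signalling_so_far, signals):
    """During MUST_SIGNAL, a non-signalling block is invalid if it would be
    the (MAX_NON_SIGNALLING + 1)th in the period, i.e. the 202nd."""
    if state is State.MUST_SIGNAL and not signals:
        return non_signalling_so_far < MAX_NON_SIGNALLING
    return True


if __name__ == "__main__":
    # same signalling shortfall going into the last period:
    # STARTED for lockinontimeout=false, MUST_SIGNAL for lockinontimeout=true
    for lot in (False, True):
        print(lot, next_state(State.STARTED, 1700, next_period=5,
                              last_period=5, lockinontimeout=lot))
```

A lockinontimeout=true node whose best chain contains a block that `block_is_valid` rejects has nothing valid to follow on that chain, which is the stall described above.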
> My overall summary is thus:
> 1) People care what Core releases because we assume the majority will likely
> run it. If core were a minority project, we wouldn't really care what core
> released.

That seems very backwards to me. I'd put it as: people run core because it makes good, conservative decisions on what features to add. If "choose your own consensus rules" were what the market wanted, then Bitcoin Unlimited or similar would be what everyone was running.

If core were to change that policy and push risky changes, I'd hope that users would be able to recognise this, and would switch to an implementation that continues to emphasise safe, conservative policies.

> 2) People are upset with LOT=true being suggested as release parameters because
> of the narrative that it puts devs in control.

If users will just run whatever core devs release, even if it involves contentious changes to consensus rules, then the core devs are in control.

> 3) LOT=true having a sizeable minority running it presents major issues to
> majority LOT=false in terms of lost blocks during the final period and in terms
> of a longer term fork.

As above, I think this scenario would be easy to avoid if it were to eventuate.

> 4) Majority LOT=true has no long term instability on consensus (majority LOT=
> true means the final period always activates, any instability is short lived +
> irrational).

The instability occurs if the lockinontimeout=true chain stalls or is overtaken by a more-work non-activating chain: users running nodes with that parameter set will then stop their nodes, and reinstall/reconfigure them to set lockinontimeout=false.

> 5) On the balance, the safer parameter to release *seems* to be LOT=true. But
> because devs are sensitive to control narrative, LOT=false is preferred by
> devs.
I think that conclusion is based on a few shaky assumptions; particularly that people won't downgrade/reinstall back to lockinontimeout=false, and that miners will be pretty naive about allowing their blocks to be orphaned.

> 6) Almost paradoxically, choosing a less safe option for a narrative reason is
> more of a show of dev control than choosing a more safe option despite
> appearances.

Going all-in on a bluff can be a good bet 9 times out of 10, while still being a net negative because of the 1 time out of 10 when you lose. In the examples above, "80% of nodes running the default client can no longer follow the blockchain without manual intervention" is the "lose it all" scenario, even if taproot is probably one of the 9/10 cases, not the 1/10 case.

> 7) This all comes down to if we think that a reasonable number of important
> nodes will run LOT=true.

What nodes run (as compared to hashpower, or as compared to what people want to buy/sell) is the least important factor in working out what's going to happen.

> As a plan of action, I think that means that either:
> A) Core should release LOT=true, as a less disruptive option given stated
> community intentions to do LOT=true
> B) Core community should vehemently anti-advocate running LOT=true to ensure
> the % is as small as possible
> C) Do nothing
> D) Core community should release LOT=false and vehemently advocate manually
> changing to LOT=true to ensure the % is supermajority, but leaving it as a user
> choice.

I think these are all a bit terrible as plans of action -- "core should release X, then advocate Y" is really not playing to core's strengths. Far better for devs to focus on writing/debugging code, analysing the way things work, making tests, and adding mitigations for risks.
Better for bloggers and podcasters and the twitterati to do the advocacy, and for core to stick to working on code and saying "no, there are significant technical risks to doing that that we don't yet have mitigations for" when people advocate for risky things.

My view is more along the lines of:

 - the setting for lockinontimeout will not matter until around July 2022 (though maybe as early as May 2022 if blocks come really fast), either technically or even as a game-theory incentive

 - lockinontimeout=true has consensus implications, and depending on the response by miners can cause network interruptions like long chains of reorgs. At best, it hasn't had the same level of review as taproot, and some experienced developers aren't comfortable with it as it stands. Those seem like pretty good reasons not to deploy it immediately, IMO.

 - the lockinontimeout=true code we've got doesn't do (at least) two things that the bip148 client did that help avoid bad cases:

    - ensure preferential connections to other nodes setting lockinontimeout=true, to prevent network splits if the non-activating chain is longer during/after the MUST_SIGNAL phase

    - cope with rewinding the chain to the best lockinontimeout=true valid block, in the event a node is upgraded to lockinontimeout=true from either lockinontimeout=false or a version of bitcoind that doesn't have activation parameters set at all

I think it makes more sense to:

 1) release lockinontimeout=false code with a view to reconsidering it in about 6 months (so prior to the 23.0 release)

 2) do more review of lockinontimeout=true code to ensure everyone understands what behaviours are likely

 3) add support for the features from the bip148 client, along with any other mitigations we think of, assuming we can do so in a way that's safe and sane

 4) work with miners and mining pools to ensure that if lockinontimeout=true does get used they know how to minimise disruption and losses due to orphaning, etc.
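To illustrate what the bip148-client rewinding feature mentioned above involves, here is a hypothetical helper (not Bitcoin Core code) that scans the signalling bits of the blocks in the MUST_SIGNAL period of a node's current chain and reports how many of them remain valid under lockinontimeout=true rules, ie the point a newly-upgraded node would need to rewind to. It assumes the same 1815-of-2016 threshold as the rest of this thread's figures; a real implementation would also have to mark the offending block invalid and find a better valid chain, which this sketch does not attempt.

```python
PERIOD = 2016
MAX_NON_SIGNALLING = 2016 - 1815   # 201, assuming a 90% threshold


def valid_prefix_length(signal_bits):
    """Hypothetical helper: given the signalling flags of consecutive blocks in
    the MUST_SIGNAL period of the current chain (index 0 = first block of the
    period), return how many of them are valid under lockinontimeout=true
    rules. A newly-upgraded node would rewind to the last of those blocks."""
    non_signalling = 0
    for i, signals in enumerate(signal_bits):
        if not signals:
            if non_signalling == MAX_NON_SIGNALLING:
                # this is the 202nd non-signalling block: it and every
                # descendant are invalid, so only blocks 0..i-1 survive
                return i
            non_signalling += 1
    return len(signal_bits)


if __name__ == "__main__":
    # one block in five fails to signal, as in the 20% scenario earlier
    bits = [i % 5 != 4 for i in range(PERIOD)]
    keep = valid_prefix_length(bits)
    print(f"keep {keep} blocks; block #{keep + 1} of the period is the first one rejected")
```

With one block in five failing to signal it reports 1009 valid blocks, so the 1010th block of the period is the first one a lockinontimeout=true node rejects, matching the ~1010 figure in the 20% scenario.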
That gives us about 6 months of work on (2) and (3), and probably 9-12 months to work on (4), and it's all technical rather than advocacy and popularity contests.

Six, nine or twelve months should be plenty of time to get pretty clear indications of both what the market in general thinks about things, and what miners are thinking.

I think if lockinontimeout=true weren't new code, and devs, miners and users widely understood its potential behaviours and risks, and we didn't have safety features that were still on the todo list, then there'd be a good argument for doing lockinontimeout=true from day 1. I could see that being the case for, eg, the next soft-fork, assuming it gets a similar amount of review prior to deployment as taproot has had. But, to me, taking a more cautious approach seems more sensible today.

> If I had to summarize the emotional dynamic among developers [...]

(Fortunately, you don't have to do that...)

Cheers,
aj