UTXO accumulators, UTXO commitments, and utreexo
2018-10-08
Tadge Dryja
https://twitter.com/kanzure/status/1049112390413897728
If people saw Benedikt's talk two days ago, this is related. It's a different construction but the same goal. The basic idea is, and I think Cory kind of started to talk about this a few months ago on the mailing list... instead of storing all UTXOs in leveldb, store the hash of each UTXO, and then it's half the size, and you could almost recreate it from the hash of the input, it's only like 10 bytes more. Instead of storing the hashes of every UTXO, what about storing some compact representation of that and providing proofs?
In Benedikt's talk, there were RSA-based accumulators and possibly this "class group" thing which is totally unproven. I don't know. The thing I'm working on is a hash-based accumulator. The basic properties of an accumulator are... there are different constructions with different features or operations you can do. Generally you'll have some generator to make the accumulator, some kind of add operation (add an element), and then a prove function. The generator creates and returns the accumulator, the add operation takes an accumulator and an element and spits out a modified accumulator, and then there's a prove and a verify. The prove function is: I want to prove that this element exists in the accumulator, and it results in some kind of proof. The verify function takes a proof against the accumulator and you get a boolean. Every accumulator construction will have at least these functions; you need to be able to add, prove and verify. Prove generates the proof, verify checks the validity of the proof against the accumulator. Alice makes a proof, and Bob verifies, or something. This is the most basic construction. You could also have a prove-non-inclusion function (prove not exist) and get a different kind of proof, that the thing is not in the accumulator. You can sometimes have a delete operation: delete an element from the accumulator and get back a modified accumulator.
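As a rough sketch, the operations described above might look something like this in Go; the names are illustrative, not the actual utreexo API:

```go
package accumulator

// Hash stands in for whatever element type the accumulator holds,
// e.g. the hash of a UTXO.
type Hash [32]byte

// Proof is whatever the prove operation emits: a merkle branch for the
// hash-based scheme, a group element for RSA.
type Proof []byte

// Accumulator captures the basic operations every construction needs.
// A generator/constructor returns an empty accumulator; delete and
// non-inclusion proofs are extras that only some constructions have.
type Accumulator interface {
	Add(e Hash)                  // add an element, yielding a modified accumulator
	Delete(e Hash, p Proof)      // optional: remove a previously added element
	Prove(e Hash) Proof          // Alice: build an inclusion proof for e
	Verify(e Hash, p Proof) bool // Bob: check a proof against the accumulator
}
```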
The first papers about these had one-way accumulators where you could keep adding things into the accumulator, never remove, but it's all in there. Others have the ability to delete and prove non-inclusion. The one I'm talking about does not have non-inclusion... there might be ways to do it, but it would probably be ugly. For a UTXO set, you might not need it.
There are a number of different reasons why you might want this. The goal is to replace leveldb. Instead of storing the whole UTXO set, you're storing this accumulator. Every time a transaction comes in, you add and delete from your accumulator. For every input in every transaction, there's a proof and you verify that proof. That's the model I'm thinking of. There are other possible use cases for this in bitcoin, but I think this one would be cool and be faster.
Long-term, it's nice because then the unbounded growth of the utxo set gets much less scary. Generally these accumulators or proofs are either constant-sized or logarithmic sized. The RSA ones are constant-sized, these are log sized. 10 billion UTXOs, no problem. It shifts the burden to the right place. If you're an exchange with 20 million UTXOs, everyone has to store them in their chainstate folder, but instead this would push the effort to the exchange to keep their own proofs and provide those to people.
Even if that's not what happens, even if what happens is that the burden of creating these proofs is moved to nodes that do so for everyone- like nodes that serve the entire blockchain for everyone- it is still an advantage because you have removed the burden of UTXO management from the validators. Miners don't need it anymore, full nodes don't need it anymore, and these services (whether yourself or other people) can produce the proofs and it's not latency-critical, it can be slow. Every block would have a proof, collected from the individual transactions.
This can be done without consensus changes, it could be p2p.
For RSA, if we could do it without trusted setup, even though it's slow, is it still practical for bitcoin? We're not sure. Probably not. The class groups? Probably not. We don't have numbers, but it doesn't seem likely.
If you have an accumulator... let's say you start from scratch and there's not 500k blocks of history. Start a new blockchain and this ignores the bridge node problem... the bridge node problem is that if you're the first person to use this new software, and you don't have the UTXO set, and you need proofs for every input-- 0.17 will not be able to provide you a proof, so you're stuck. You need some way to bootstrap. Bridge nodes are a huge thing here.
If you have these operations, and everyone starts out with it, and every wallet now keeps proofs for every UTXO that they own, there's also an update-proof operation... these are sort of combined with the add and delete operations, in that when you modify the accumulator you might need to modify the proofs. The proofs are with respect to the accumulator. The proofs change every time you change the accumulator, and that can be expensive. If you have one UTXO, you're going to do 5000 operations per block to keep it up to date. If you have a million UTXOs, then you're doing like a billion operations... you need to update every proof for every change to the accumulator.
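A hypothetical sketch of why that cost scales the way it does: every proof the wallet holds has to be touched for every add and delete applied to the accumulator, so the per-block work is roughly (proofs held) times (accumulator updates per block). The helper names here are made up, not real utreexo functions:

```go
package wallet

type Hash [32]byte
type Proof []byte

// updateForAdd and updateForDelete are stand-ins for whatever work the
// construction needs to keep an existing proof valid after the accumulator
// changes; they are hypothetical placeholders.
func updateForAdd(p Proof, added Hash) Proof      { return p }
func updateForDelete(p Proof, deleted Hash) Proof { return p }

// processBlock keeps every stored proof current. With roughly 5000 adds and
// deletes per block, holding one UTXO means ~5000 proof updates per block;
// holding a million means on the order of billions.
func processBlock(proofs []Proof, adds, dels []Hash) {
	for i := range proofs {
		for _, d := range dels {
			proofs[i] = updateForDelete(proofs[i], d)
		}
		for _, a := range adds {
			proofs[i] = updateForAdd(proofs[i], a)
		}
	}
}
```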
You could use this today-- you don't have to touch disk, everything is in RAM. This is not for trillions of UTXOs. Taking a day to sync the blockchain is kind of blah. Right now bitcoin works well with the current settings and you don't need a fast computer. But this accumulator technique would let you do it on phones pretty viably.
Could full nodes throw away proofs after verification? Yes. Same as pruning. You could create the proofs as you are syncing from the beginning. The data in the proofs is all computable from the blockchain. It's just an optimization.
So, getting to the bridge nodes. There's a problem. The idea of a bridge node is that it's like a wallet with every UTXO: it's your wallet's job to keep your proofs up to date, and a bridge node has a proof for everything. It bridges, like, v0.17 nodes on one side, and then you have a bridge, and then you've got these utreexo clients... they need proofs, the old nodes have no idea what these proofs are, so for all the tx messages going through here, the bridge node sticks proofs on the inputs, broadcasts it and propagates it. This is doable, but it depends. This bridge node has to maintain 60 million proofs; depending on how big the proofs are, this could be annoying... Well, maybe they can re-calculate a proof on the fly when an input comes in, so how much do they store? With the RSA one, it seems like a dealbreaker in that the bridge node-- it takes like 100 milliseconds to calculate each proof. The bridge node allows it to enter the mempool. It has to be everything; you can try to predict it. So it has to store all these proofs and recompute them constantly. For RSA, this is a dealbreaker-- it's years per block of proofs. You could shard it up among a server farm. It's perfectly shardable... but it might be a big data center. RSA proofs here are very fast to verify, though.
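A heavily simplified sketch of the bridge node's job, under the assumption that it keeps enough state to produce a proof for any UTXO on demand; every name here is illustrative:

```go
package bridge

type Hash [32]byte
type Proof []byte

// Tx is a stripped-down transaction: just the hashes of the UTXOs it spends.
type Tx struct{ InputUTXOs []Hash }

// ProvenTx is what gets relayed to utreexo peers: the transaction plus one
// inclusion proof per input.
type ProvenTx struct {
	Tx     Tx
	Proofs []Proof
}

// Prover is whatever the bridge node keeps around to build proofs on demand:
// the whole merkle forest for the hash-based scheme, or a huge table of
// precomputed proofs for the RSA scheme.
type Prover interface {
	Prove(utxo Hash) Proof
}

// relay attaches proofs to incoming transactions and forwards them.
func relay(incoming <-chan Tx, prover Prover, outgoing chan<- ProvenTx) {
	for tx := range incoming {
		proofs := make([]Proof, len(tx.InputUTXOs))
		for i, utxo := range tx.InputUTXOs {
			// cheap for the hash forest; ~100 ms each for RSA
			proofs[i] = prover.Prove(utxo)
		}
		outgoing <- ProvenTx{Tx: tx, Proofs: proofs}
	}
}
```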
There is a new development announced at Scaling Bitcoin. There are some that are fast, small, a few kilobytes for the whole block or whole blockchain. It's really cool size-wise, but speed-wise it's not clear yet. So the RSA one is, like, tiny, but it's slow to prove... My method is utreexo... For RSA and class groups, the update-proof operation is slow. For utreexo (hash-based), it's fast. utreexo proofs are aggregatable because they all overlap in the tree. Verify for RSA/class groups is fast, it's nanoseconds, maybe 1 microsecond let's say. In utreexo, verification is fast but not super fast. Proof size for RSA/class groups is O(1), in practice about 1.5 kilobytes, and the accumulator size is also O(1), about 1.5 kilobytes, and it's also aggregatable... The update operation is both add + delete. With RSA accumulators, you can prove a million things in constant size. In utreexo, it's O(log(n)), and for a batch it's sort of z log(n). You can get a couple hundred kilobytes per block for utreexo, but there are a lot of optimizations you can do. I want to get the proof size for the hash-based accumulator down; there are a lot of tricks. For RSA, the accumulator size is constant-sized; utreexo's is O(log(n)), but in practice it's like 800 bytes. The other tradeoff is trust, or the assumptions. That's a huge one. The huge assumption for RSA is the RSA assumption but also some kind of modulus N = PQ whose factorization nobody knows, and class groups I don't even know, so I can't say anything about that. The utreexo one needs collision-resistant hashes, which is an assumption that bitcoin already makes, like SHA. Quantumy, who knows. For RSA, you need a modulus that nobody knows the factorization for, which is not fun for bitcoin in general. But you can do this without a fork. I am running a test node and test bridge and nobody knows and it's fine... you can have an electrum-style thing where you have an address index, and you can just build that for a client that needs those proofs, and you have a server somewhere that provides those proofs. But this is a nicer model where it's built into clients and they can propagate the proofs p2p, but you can start without that and test it out, which is kind of cool.
The main ways to improve proof size for the hash-based accumulator... the idea of the hash-based accumulator is that you have merkle trees, you put UTXOs in the merkle trees, and you can delete arbitrary elements and then shuffle around to maintain perfect trees. I'll go into that later. In general, each thing is going to need a proof the height of the tree. If you're proving two cousins, once their branches intersect, you don't need overlapping proof data. Also, over time, the merkle tree doesn't completely change every few blocks; only small segments of the tree change.
When you're doing initial block download, the full node you're talking to knows the future, in that they know what's coming next. If they help you out, they can speed you up a lot. The full node can say: this UTXO you just added gets deleted next block, so just keep that in RAM, just remember it, you don't have to cache it or whatever. And other parts of the tree, it stays there for 5 years, so eject it from RAM and I'll prove it later when you're done with initial block download. It can also help with cache ejection for leveldb stuff. This can bring down the proof size.
Where does the 800 byte accumulator size come from? You store the roots of the subtrees. Let me talk about the construction of this hash-based accumulator. You have a merkle forest- each tree is a perfect tree, always a power of 2 in size. There's some fun bitshifty stuff where you write the total number of UTXOs in binary, so for 15 items it's 0b1111. If I were to delete two things, I would end up turning this bit into a 0 and that subtree would disappear, so it's kind of cool. Adding is easy in that you just put things on the side and recompute your trees. If I were to add something at the end, I don't know these uncircled numbers because I'm just storing the roots. But I can compute the parent of the last one and the new one, and I can keep going down and get one hash for the entire state.
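A small sketch of that add operation, assuming only the roots are stored and that a tree of 2^h leaves exists exactly when bit h of the leaf count is set; this follows the description above but is not the actual utreexo code:

```go
package forest

import "crypto/sha256"

type Hash [32]byte

// parent hashes two children into their parent node.
func parent(l, r Hash) Hash {
	return sha256.Sum256(append(l[:], r[:]...))
}

// Forest keeps only the roots: roots[h] is the root of a perfect subtree
// with 2^h leaves, present exactly when bit h of numLeaves is set
// (15 leaves = 0b1111 = one tree each of 8, 4, 2 and 1 leaves).
type Forest struct {
	numLeaves uint64
	roots     map[uint]Hash
}

func NewForest() *Forest { return &Forest{roots: make(map[uint]Hash)} }

// Add appends one leaf, recomputing roots like binary addition with carries:
// while a tree of the current height already exists, merge it with the new
// node (old stuff on the left, new stuff on the right) and carry up a row.
func (f *Forest) Add(leaf Hash) {
	node, height := leaf, uint(0)
	for f.numLeaves>>height&1 == 1 {
		node = parent(f.roots[height], node)
		delete(f.roots, height)
		height++
	}
	f.roots[height] = node
	f.numLeaves++
}
```

Merging two trees of the same height produces one tree of the next height up, which may merge again, and so on, exactly like a carry propagating through the binary count of leaves.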
If you delete something, its neighbor becomes an "only child". It gets ejected to the end of the tree. There's a bunch of stuff that shifts to the end of the tree, and you chop it up into subtrees like you did before. If you're deleting an element, the proof is going to be related to its neighbor and the subtrees around it. Those hashes in the proof become the next roots for the subtrees once you've deleted it. You can aggregate this, so if you're deleting many things at once... this is in progress but it works... the first thing you do is find places where both siblings were deleted: you have a list of deletions, you find two siblings being deleted, then you delete the parent instead. If you have single deletions, leaving only children, you can swap those siblings into other subtrees. Keep old stuff on the left, keep new stuff on the right, because new stuff is more likely to get spent so the proofs should be smaller for that.
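A heavily simplified sketch of that first pass, assuming the deletions in one row are given as sorted positions; the real utreexo batching also handles the swaps across subtrees described above:

```go
package forest

// collapseRow is the pairing pass for one row of a tree. dels is a sorted
// list of positions being deleted in that row. Where both siblings are
// deleted, the pair becomes a single deletion of their parent one row up;
// where only one of a pair is deleted, the surviving sibling is an
// "only child" that later gets swapped into another subtree.
func collapseRow(dels []uint64) (parentDels, onlyChildren []uint64) {
	for i := 0; i < len(dels); {
		// siblings differ only in the lowest bit of their position
		if i+1 < len(dels) && dels[i]|1 == dels[i+1] {
			parentDels = append(parentDels, dels[i]>>1)
			i += 2
		} else {
			onlyChildren = append(onlyChildren, dels[i]^1)
			i++
		}
	}
	return parentDels, onlyChildren
}
```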
For non-bridge nodes, it's super fast and easy. You have these partial branches and you just hash a lot. The main knob you have, the main tradeoff, is RAM vs download. If you have a number of blocks to look ahead -- assuming the IBD node you're connecting to will tell you when each UTXO gets spent -- you might store everything for the next 4000 blocks or so, so you keep everything in RAM for UTXOs that are spent soon... if you remember everything, then you never need to download any proofs, so the total size of all the proofs is zero. If you have no memory whatsoever and you always need to download new proofs for every block, the total ends up being 160 gigabytes, which is in addition to the 200 gigabytes of blockchain data. I think the curve for the tradeoff is really steep. I haven't figured it out yet, but intuitively, if you store everything then you never need to download. I think the curve is steep since there are a lot of UTXOs getting spent quickly... if you have a few hundred megs of RAM, then you store a bunch of proofs temporarily during IBD. Space-wise, the RSA thing completely obliterates this and it's 1.5 kilobytes for everything.
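As a sketch of that knob, assuming the serving node sends a hypothetical hint saying how soon each new output gets spent: outputs dying inside the lookahead window are kept in RAM so their proofs never need to be downloaded, and everything else is forgotten and proven later:

```go
package ibd

type Hash [32]byte

// lookahead is the knob: a bigger window means more RAM but fewer proof
// bytes downloaded. Zero means every spend needs a downloaded proof
// (~160 GB in total); keeping everything means downloading nothing.
const lookahead = 4000 // blocks

type cache map[Hash]struct{}

// addOutput uses the (hypothetical) hint from the serving node to decide
// whether a freshly created output is worth holding in RAM during IBD.
func (c cache) addOutput(utxo Hash, spentInBlocks uint32) {
	if spentInBlocks <= lookahead {
		c[utxo] = struct{}{} // spent soon: remember it, skip its proof later
	}
	// otherwise forget it and download a proof when it is finally spent
}

// needProof reports whether a proof must be downloaded to verify this spend.
func (c cache) needProof(utxo Hash) bool {
	if _, ok := c[utxo]; ok {
		delete(c, utxo) // it is being spent now, free the memory
		return false
	}
	return true
}
```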
What about making this into a soft-fork since we like soft-forks and they are so easy to do? Let's stick the roots of this accumulator or this RSA accumulator itself into an OP_RETURN in the coinbase transaction, and then nobody has to validate everything anymore because then they can trust the miners to verify everything. Maybe we take all the proofs for the block and put that into an OP_RETURN, and you can explain why that's important.... It's denial of service resistance. Right now whenever I do a block validation, I first do proof-of-work check which is super cheap and if the PoW doesn't work then I am not going to bother to do the full validation. Due to the fact that all the data that goes into my validation equation is committed by the PoW, if someone on the fly on the p2p protocol invalidates a block by modifying a signature or whatever, then the PoW gets invalidated and I'm not going to bother. The flip side is that whenever I do pass the PoW check, and a block is still invalid, I can mark it as permanently invalid- as it was the block that was invalid, not some auxiliary data around it. I'm never going to check the same block twice. If you're in a world where blocks have auxiliary data that needs to be valid, you can't distinguish between the block being invalid or the auxiliary data being invalid. This gives a DoS vector. You get a block, with modified/attacked proof data, and then oops you can't tell at all- you can ban the node but you don't know if the block is really invalid (or you might think it is, but you've been attacked now). If you're not able to soft-fork in a commitment, then in practice, most of the stuff in the block is going to be stuff ........
Committing the actual accumulator state? No, because that encourages people to not validate anything. The UTXO commitment stuff that people have been talking about for 1,000 years... if you commit to the UTXO set in the block, then it helps people know they have the right UTXO set, or they can download it. If you add this, then that will happen. You can have a super powerful version of assumevalid and you can just hardcode: at height 500000, here are the hashes of the UTXO set, skip there and fully validate after that. That's a double-edged sword: it's cool, and there are use cases for that, like syncing on my desktop, validating everything, taking a snapshot on my phone of the QR code that the client provides me, and then my phone has the state. But someone might just go to blockchain.info and download the accumulator state.
Say we figure out a way to make class group stuff super efficient. Unfortunately it's a completely novel cryptographic assumption. People have assumed this is a hard problem for a very long time, like 200 years, but it has never been used in any actual protocol. So yeah, we're not going to make that a consensus rule. But if you want to trust that and set up a p2p network in which it works, then why not. Or even if it's like, here's a cool hash-based model that is pretty efficient, and then a year or two later someone makes a better one and it's 10x more efficient, then a soft-fork would mean we're stuck with the old one. I guess we could do a soft-fork with an inactivation date, but it's still too early to look at those aspects. I don't know, it's a cool way to maybe speed things up. Clients can pick their tradeoffs, like in the non-committed model for this. But you want to remove the need for the miners to maintain UTXO sets. Full validation would become cheaper.
If you use a model where the proofs get outdated, which is not true for all of them but most of them, it means you need to see the full blocks to be able to update your proofs, which completely rules out lite clients. Lite clients need to rely on a service to update the proofs. I think it's unreasonable to say that we can have an ecosystem where there are no services to provide that proof. So I think bridge nodes will be required.
Electrum has an address index and provides SPV proofs, and in this model it would also be a bridge node. In the hash-based model, being a bridge node is smaller and easier than having an address index. In the RSA model, it's small but computationally infeasible.
If there were no bridge nodes, then with SPV you could still choose not to validate, but you're still doing a lot of work to keep your proofs valid, and it's almost no extra overhead to actually validate. It also increases costs for people who have bazillions of UTXOs. But you're making it slow and annoying for the people we might want to discourage anyway, like SPV stuff and having bazillions of UTXOs.
In the hash-based model, bridge nodes don't do any extra CPU work because you're performing all those hashes anyway... in the hash-based bridge node, you just store the whole tree instead of only the roots. You're doing the same hash operations, but you're keeping the whole thing on disk. It's more i/o, whereas the non-bridge node is storing less data overall. A bridge node in the hash-based model is not doing any extra CPU work other than the disk i/o.
For the bridge node in the hash-based model, it should take on the order of one second. All of my software is single-core and unoptimized and it's still pretty fast. It's written in golang. I don't have the p2p layer stuff yet; it's mostly to get numbers. It does all the hashing and all the stuff, but it doesn't have the node p2p messages. It feels like there's no disk utilization, which is nice. It's 100% cpu as long as you can keep up the bandwidth. Right now it's 10% cpu. I don't know how niche that is. What do people run full nodes on? I don't know.
Proving the existence of UTXO in small hardware environments like hardware security modules or HSMs, that would be really useful. This is a place where you can do it completely privately. Your host is the bridge node, and it just hands it proofs.
The other maybe-feature is libconsensus-- the idea of a function that takes the whole state of the blockchain plus a block and spits out whether it's valid or not. It becomes more feasible. The whole state is like a megabyte; the arguments are like: here's a block, here's the state, and it returns a boolean and a new state. This is something that people have been working towards, and it seems like it helps you get there. It might help.
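Something like the following hypothetical signature becomes realistic once the state is just the accumulator roots rather than a multi-gigabyte UTXO set; the names are illustrative, not a real libconsensus API:

```go
package consensus

// State is everything a validator carries between blocks: with an
// accumulator it is roughly a kilobyte of roots, not gigabytes of UTXO set.
type State struct {
	AccumulatorRoots [][32]byte
	Height           uint32
}

// Block is a serialized block plus the inclusion proofs for every input.
type Block struct {
	Raw    []byte
	Proofs [][]byte
}

// ConnectBlock is the libconsensus-style function: previous state and a
// block in, validity and the new state out, with no database handle and no
// global context. The body is omitted; this only illustrates the shape.
func ConnectBlock(prev State, blk Block) (valid bool, next State) {
	// full validation against the accumulator roots would go here
	return false, prev
}
```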
I am writing a paper, trying to make this more legitimate. I have written code in golang. It seems fun. It seems like long-term it's a nice model.
It might be more elegant to not have giant states being carried around for validation. What if you want bigger blocks? utreexo probably helps that. I know the bcash people don't like me because lightning, but maybe they will like utreexo ((laughter)).
You can't have an ordered-insertion thing for monero because you need to prove non-inclusion of the spend in the set of spends. You can have a spent transaction output set... ah but you don't know the spents there. This wouldn't work for monero.
Future work
http://diyhpl.us/wiki/transcripts/mit-bitcoin-expo-2019/utreexo/
https://github.com/mit-dci/utreexo