Andreas Antonopoulos

Bitcoin Q&A: Initial blockchain download

Video: https://www.youtube.com/watch?v=OrYDehC-8TU

Initial blockchain download

Becca asks: why does it take so long to download the blockchain? I have a fast internet connection and I could download 200GB in less than an hour.

What Becca is talking about is called the initial blockchain download, or IBD, which is the first synchronization of a Bitcoin node (or any kind of blockchain node) to its blockchain. The answer is that while the amount of data you need to download to get the full blockchain is about 200GB, you’re not simply downloading that and storing it on disk. One of the fundamental functions of a Bitcoin node is to validate all of the rules of consensus. Your node does that. It does that even if you’re not doing a full sync of the blockchain. Every node validates every rule.

When you start from the genesis block, you download block 0, then block 1, then block 2 and so on, building up the blockchain until you reach today’s complete blockchain and are fully synced with the rest of the network. For every block you download, you download all of the transactions in that block, and then your node goes through and validates everything: all of the signatures, all of the spends, all of the amounts, all of the coinbase rewards, all of the fees. It reconstructs every soft fork, upgrade and change in the code, replicating the entire history from January 3rd 2009. It behaves like a node in 2009 for the first period of downloading the blockchain. Then, as the rules change, it counts the votes in the soft fork, changes the rules in real time and evaluates the next block based on the new rules. It recalculates the difficulty and checks whether the miners were meeting the target for blocks that were mined in 2010. It evaluates every rule as if it were downloading each block for the first time at that time. It simulates living in 2009, then in 2010 and so on, all the way up to today. Every bug, every fork, every change.

That takes more than just bandwidth. It takes CPU. It also takes a large amount of disk indexing.
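To make one of those per-block checks concrete, here is a minimal sketch in Python of the proof-of-work test applied to a single block header, using the genesis block as input. The function names are illustrative and not taken from any node implementation, and the target decoding is simplified (it assumes an exponent of at least 3 and ignores the sign bit of the compact format). A real node runs this alongside many other checks for every block in the chain.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Bitcoin hashes block headers with SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bits_to_target(bits: int) -> int:
    """Decode the compact 'bits' field into the full target.

    Simplified: assumes exponent >= 3 and no sign bit.
    """
    exponent = bits >> 24
    mantissa = bits & 0x00FFFFFF
    return mantissa << (8 * (exponent - 3))


def serialize_header(version, prev_hash_hex, merkle_root_hex,
                     timestamp, bits, nonce) -> bytes:
    """Build the 80-byte header; hashes are displayed big-endian
    but serialized little-endian, hence the [::-1] reversals."""
    return (
        struct.pack("<I", version)
        + bytes.fromhex(prev_hash_hex)[::-1]
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )


def check_proof_of_work(header: bytes, bits: int) -> bool:
    """The header hash, read as a little-endian integer, must not exceed the target."""
    hash_as_int = int.from_bytes(double_sha256(header), "little")
    return hash_as_int <= bits_to_target(bits)


# Header fields of the Bitcoin genesis block (block 0).
GENESIS_BITS = 0x1D00FFFF
genesis_header = serialize_header(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=GENESIS_BITS,
    nonce=2083236893,
)
genesis_hash = double_sha256(genesis_header)[::-1].hex()
genesis_pow_ok = check_proof_of_work(genesis_header, GENESIS_BITS)

print(genesis_hash)
print(genesis_pow_ok)
```

A single check like this is cheap. The cost during the initial blockchain download comes from repeating it, together with signature, amount and rule checks, for every block and transaction since 2009.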

If you think about it, in order to validate that a transaction isn’t double spending something and that it is properly spent, the node has to keep a UTXO set (the set of unspent transaction outputs) in memory. It uses this UTXO set to validate that the amount was available for spending. It has to index all of the UTXOs and transaction IDs, because when a transaction refers to a previous transaction, the node has to look it up by hash. It has to reconstruct the Merkle roots of all of the blocks and keep track of the previous block hash listed in each block. That’s a lot of database indexing. That’s what is happening with your node.

I would guess that your real problem here is not bandwidth on the network but bandwidth to the hard drive: the throughput to the drive, the performance of the drive, as well as available memory. A recommended minimum configuration involves 4GB of RAM, and that’s only if you have a relatively fast solid state disk (an SSD), because of all of the indexing, reading and writing from the database on the disk that will be happening. If you don’t have a solid state disk then you need to do a lot more caching in RAM to compensate for the performance of an old mechanical hard drive. In that case you might need 8 or 16GB of RAM. I would guess that your bottleneck is disk I/O, perhaps CPU, although that is less likely. If you’re running it on a modern 4-core processor it shouldn’t be a problem. If you’re doing all of this on a Raspberry Pi with only 2GB of RAM then I can see what your problem is. The bottlenecks are going to be within your system rather than in your bandwidth.
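The UTXO bookkeeping described above can be sketched as a toy model in Python. Everything here is a hypothetical simplification: the class names `TxInput`, `Tx` and `UtxoSet` are made up for illustration, transaction IDs are plain strings rather than hashes, a transaction with no inputs stands in for a coinbase, and signatures and scripts are not checked at all. It only shows why every input triggers a lookup and every transaction triggers writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxInput:
    prev_txid: str    # ID of the transaction being spent from
    prev_index: int   # which output of that transaction


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple     # tuple of TxInput; empty for a toy "coinbase"
    outputs: tuple    # output amounts, in satoshis


class UtxoSet:
    """Toy in-memory UTXO set: (txid, output index) -> amount."""

    def __init__(self):
        self.unspent = {}

    def apply(self, tx: Tx) -> int:
        """Validate tx against the set, update the set, return the fee."""
        spent_keys = set()
        total_in = 0
        for txin in tx.inputs:
            key = (txin.prev_txid, txin.prev_index)
            # One lookup per input: the output must exist and be unspent.
            if key in spent_keys or key not in self.unspent:
                raise ValueError(f"missing or already spent output: {key}")
            spent_keys.add(key)
            total_in += self.unspent[key]

        total_out = sum(tx.outputs)
        if tx.inputs and total_out > total_in:
            raise ValueError("outputs exceed inputs")

        # Remove what was spent, add the newly created outputs.
        for key in spent_keys:
            del self.unspent[key]
        for index, amount in enumerate(tx.outputs):
            self.unspent[(tx.txid, index)] = amount
        return total_in - total_out if tx.inputs else 0


utxos = UtxoSet()
utxos.apply(Tx("coinbase-1", (), (50,)))
fee = utxos.apply(Tx("spend-1", (TxInput("coinbase-1", 0),), (30, 19)))

try:
    # Spending the same output a second time must be rejected.
    utxos.apply(Tx("spend-2", (TxInput("coinbase-1", 0),), (50,)))
    double_spend_rejected = False
except ValueError:
    double_spend_rejected = True

print(fee, double_spend_rejected, utxos.unspent)
```

In this toy the whole set fits in a Python dictionary. A real node's set holds vastly more entries, and whatever does not fit in the RAM cache has to be read from and written to the database on disk, which is why drive speed and memory matter so much during the sync.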