Bloom Filters and SPV nodes within the bitcoin blockchain

Simplified Payment Verification (SPV) is a method employed by some thin clients within the bitcoin network in order to verify transactions without the requirement to keep an entire copy of the blockchain. As such these thin nodes use bloom filters to specify only the transactions they are interested in receiving updates for.

Let’s unpack that a little….

Nodes on the bitcoin network are computers who run the bitcoin core software. If they hold a full copy of the bitcoin blockchain then they are referred to as full nodes and if not they are referred to a thin nodes. In order to be a miner (and race to group transactions into blocks, solve cryptographic puzzles and hopefully earn some newly mined bitcoins) you must be running a full node. However, there are many full nodes on the network who don’t mine but instead contribute towards the security and decentralisation of the network, simply by validating transactions and holding a full copy of the blockchain.

Then there are thin nodes who hold an incomplete copy of the blockchain. This could be for a number of reasons but the one which we’ll discuss today is for Simplified Payment Verification. As such, the thin node will broadcast out a list of transactions which it is interesting in receiving updates for. The full nodes on the network will then selectively share information to the thin nodes which match they set of transactions they have requested updates for. This enables the thin client to verify transaction information without requiring a full copy of the blockchain.

It is important to note that SPV nodes don’t necessarily list every transaction they could be interested in, as this could be thousands, but instead calculate a bloom filter for the set of transactions and broadcast this out to the full nodes instead. So, what is a bloom filter?

Bloom filters were invented c50 years ago by Burton Howard Bloom and are particularly good for answering questions such as “has this been seen before?’ and “is this item in the set?”. They sort information into buckets by placing it through a hash function and returning an identifier. These identifiers correspond to the buckets and due to the use of hash functions, a bloom filter requires very little space relatively to the size of the items which would need to stored and checked. A fantastic write up of bloom filters is described in this medium article by Jamie Talbot (https://blog.medium.com/what-are-bloom-filters-1ec2a50c68ff) and I will use this as inspiration for the example below.

Let’s take 10 buckets and 3 hash functions. As such when any information is passed through the bloom filter it will return 3 identifiers within the range of 1–10. This is evidently a much simplified version as we could have n number of buckets and m number of hash functions.

In the example below we want to find out what cryptocurrencies Alice owns…

We first put bitcoin (BTC) through the bloom filter and it returns the following identifiers:

Next, we put Ether (ETH) through and see the following identifiers:

Finally, we put Litecoin (LTC) through and receive the following identifiers:

The bloom filter is therefore complete with:

We can then ask the bloom filter whether Alice has any Ether (ETH), and since we know the identifiers for Ether (ETH) are 2,5 and 9, we simply check whether any of these buckets within the bloom filter are complete. It is key to recall at this point that any information passed through a hash function numerous times will always result in the same hash. As such, each time ‘bitcoin’ is parsed through the hash function the same identifiers will be seen. As such, since buckets 2,5 and 9 are complete then we can surmise that Alice has Ether.

However, we then ask the bloom filter whether Alice has Ether Classic (ETC) which has identifiers 6,8,9. Therefore since bucket 9 is complete the bloom filter would provide a false positive of yes. It is therefore key to understand that whilst a bloom filter does not mitigate against the risk of false positives it does however mitigate against the more problematic false negatives. As such, if asked whether Alice owns bitcoin (BTC), with identifiers 1,5 and 7, the bloom filter would never report no since buckets 1,5 and 7 are complete.

In order to lessen the risk of false positives a larger bloom filter with many more buckets would be suitable.

Within SPV nodes, the bloom filters look not for whether Alice has a certain cryptocurrency but whether a certain transaction is related to her. As such the transactions would be parsed through the bloom filter and identifiers received. When the question ‘Does Alice want to know about transaction x?’ is asked, then the identifiers for transaction x are compared with the filled buckets in her bloom filter. If a yes is received, then the node can receive the information and verify the transaction.

However, there is an argument against SPVs in a blockchain network which states that the network would be most secure if every participant ran a full copy of the blockchain and thus ran a full node only. This would ensure that no node or participant relied on another for confirmation about the full history of all transactions. Whilst theoretically this argument may hold, the bitcoin blockchain has grown in size to almost 185,000 MB which is an increase of 100MB since mid-2016 alone. Therefore, as usage continues to grow the size of the blockchain will also and it becomes more computational heavy for to run a full node. As such, thin nodes on the network allow for greater participation and as long as the number of users running full nodes does not dramatically reduce, security on the network should remain.

There is also an argument which is sometimes incorrectly purported and states that SPVs are against the original intention of the bitcoin network since they do not require a user to hold a complete copy of the blockchain themselves. However this is easily dismissed as Satoshi actually discusses them in section 8 of the seminal whitepaper.

It is therefore clear that the use of bloom filters in SPVs allows easier payment verification without the requirement to run a full node on the network, thus improving the ability for users to join and participate in the network.

For more #BitcoinExplained, follow me on LinkedIn (https://www.linkedin.com/in/tarannison)