What is a blockchain node?
This is a very important term within web3 parlance (although it didn’t make it into my Bitcoin ABC Book — being pipped to the ’N’ post by Non-Fungible Tokens) and it can best be thought of as the actual ‘computers’ running the blockchain.
But let’s get more technical than this to understand more about what they are and where they are…
To do this, we first need to consider what a blockchain itself is. The short answer is that it’s just lots of lines of code (for Bitcoin it’s around 700,000 lines of code!). This code sets the rules for the chain: how to create transactions, process them into blocks and cryptographically secure it all in a decentralised way. One common misconception is that each blockchain has just one version of its code. Actually, as long as the code you’re running follows the foundational rules of the chain, you can run it with tweaks and improvements as needed or wanted. If you change it beyond those foundational rules (e.g. creating a total supply of more than 21m bitcoins) then you’d create a brand new blockchain. This is actually how many of the early bitcoin forks and EVM compatible chains were made: they’re just modified versions of their inspiration-chain!
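To make the 21m rule concrete, here’s a minimal Python sketch of Bitcoin’s issuance schedule: the block reward started at 50 BTC and halves every 210,000 blocks. This is my own illustration rather than code from Bitcoin Core, and `respects_supply_cap` is a hypothetical helper showing the kind of foundational rule a modified client would still have to respect.

```python
# Illustrative sketch of Bitcoin's issuance schedule (not Bitcoin Core code).
SATS_PER_BTC = 100_000_000
INITIAL_SUBSIDY_SATS = 50 * SATS_PER_BTC    # 50 BTC per block at launch
HALVING_INTERVAL = 210_000                  # blocks between each halving
MAX_SUPPLY_SATS = 21_000_000 * SATS_PER_BTC


def total_supply_sats() -> int:
    """Sum the block rewards across every halving era, in satoshis."""
    total = 0
    subsidy = INITIAL_SUBSIDY_SATS
    while subsidy > 0:
        total += subsidy * HALVING_INTERVAL
        subsidy //= 2  # integer halving, so the reward eventually reaches zero
    return total


def respects_supply_cap(proposed_total_sats: int) -> bool:
    """Hypothetical check: does a proposed issuance stay within the 21m cap?"""
    return proposed_total_sats <= MAX_SUPPLY_SATS


if __name__ == "__main__":
    print(total_supply_sats() / SATS_PER_BTC)  # just under 21 million BTC
```

The schedule adds up to just under 21 million BTC. A client that paid out more than this would be breaking a foundational rule, so it would effectively be running a different chain.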
For Bitcoin the most popular version of the code is Bitcoin Core (https://bitcoin.org/en/bitcoin-core/), which is run by 86% of the network.
Ethereum is a bit more complex as it has two ‘layers’ to the chain: the execution layer (which processes the transactions and activity) and the consensus layer (which keeps everyone in agreement about what’s happening on the chain). The code for each of these layers is referred to as a ‘client’. The most popular consensus client is Lighthouse and the most popular execution client is Geth (closely followed by Nethermind).
The code for these Ethereum clients and the Bitcoin software (and all other chains) is run on computers and it’s these computers which we refer to as ‘nodes’.
Importantly, these computers must not only run the code for the chain but also be connected to other nodes in the network, since a notable aspect of a blockchain is the decentralised decision making and information sharing. This network of nodes therefore:
- Creates the transactions for the chain
- Processes the transactions into blocks
- Comes to consensus about the state of the network and who’s doing what
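To give a feel for what ‘processing transactions into blocks’ and ‘cryptographically securing’ them means, here’s a toy Python sketch where each block stores the hash of the block before it. It’s a deliberately simplified illustration: the block layout and function names are my own, there is no mining, staking or signature checking, and real Bitcoin and Ethereum blocks are structured differently.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def make_block(prev_hash: str, transactions: list) -> dict:
    """Bundle transactions into a block that points at its predecessor."""
    return {"prev_hash": prev_hash, "transactions": transactions}


def chain_is_valid(chain: list) -> bool:
    """Check that every block references the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


if __name__ == "__main__":
    genesis = make_block("0" * 64, ["alice pays bob 1"])
    second = make_block(block_hash(genesis), ["bob pays carol 1"])
    chain = [genesis, second]
    print(chain_is_valid(chain))  # the links match
    genesis["transactions"] = ["alice pays mallory 100"]
    print(chain_is_valid(chain))  # rewriting history breaks the link
```

Because each block commits to the one before it, any node can spot rewritten history by recomputing the hashes itself, which is a big part of how nodes stay in agreement without trusting each other.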
However not every node has the same role within the network.
Node Types
Although different blockchains use different naming conventions the below hold true for Bitcoin and Ethereum and can be applied to the vast majority of other chains.
Full Node
This node type is responsible for verifying and validating transactions and blocks. It does this by keeping a full copy of the blockchain’s history (at least for Bitcoin full nodes) so that it can check that the address sending the funds has a sufficient amount, and it checks that the spending behaviour is valid based on the protocol rules (as defined in the Bitcoin/Ethereum code). Invalid transactions are rejected by the node and so are not passed on to the other nodes it’s connected to.
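The ‘does the sender have enough funds?’ check can be sketched in a few lines of Python. This is a simplified, account-style illustration with made-up names (`is_valid_transfer`, `relay_if_valid`). Real nodes also check signatures, fees and much more, and Bitcoin tracks unspent outputs (UTXOs) rather than account balances.

```python
def is_valid_transfer(balances: dict, sender: str, amount: int) -> bool:
    """Toy rule: the amount must be positive and covered by the sender's balance."""
    return amount > 0 and balances.get(sender, 0) >= amount


def relay_if_valid(balances: dict, tx: dict, peers: list) -> bool:
    """Apply a valid transfer and pass it on to peers; drop an invalid one."""
    if not is_valid_transfer(balances, tx["from"], tx["amount"]):
        return False  # rejected, so never shared with connected nodes
    balances[tx["from"]] -= tx["amount"]
    balances[tx["to"]] = balances.get(tx["to"], 0) + tx["amount"]
    for peer_inbox in peers:
        peer_inbox.append(tx)  # stand-in for gossiping to another node
    return True


if __name__ == "__main__":
    balances = {"alice": 5}
    peer_inbox = []
    print(relay_if_valid(balances, {"from": "alice", "to": "bob", "amount": 3}, [peer_inbox]))
    print(relay_if_valid(balances, {"from": "alice", "to": "bob", "amount": 10}, [peer_inbox]))
```

The second transfer tries to spend more than Alice has left, so it is rejected and never reaches the peer.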
When you spin up a Bitcoin full node it can take a few days, as the node needs to sync the full history of the chain. That’s 669 GB of data! For Ethereum, some clients have a faster sync process (such as Geth’s snap sync) which takes the latest blocks and state data rather than going from the genesis block, and in all cases only the last 128 blocks of state data are kept, with the rest pruned to save space. We’ll talk about what nodes that keep the full Ethereum history are called in just a moment.
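The pruning idea (keep a recent window, drop the rest) can be sketched with a fixed-size queue. This is only an illustration of the concept: `RecentStateStore` is a hypothetical name and real clients prune in a far more involved way.

```python
from collections import deque

RETAINED_BLOCKS = 128  # size of the recent window described above


class RecentStateStore:
    """Toy store that remembers state for only the most recent blocks."""

    def __init__(self, retained: int = RETAINED_BLOCKS):
        # A deque with maxlen silently discards the oldest entry when full.
        self._recent = deque(maxlen=retained)

    def add_block_state(self, block_number: int, state: dict) -> None:
        self._recent.append((block_number, state))

    def oldest_available(self) -> int:
        return self._recent[0][0]

    def __len__(self) -> int:
        return len(self._recent)


if __name__ == "__main__":
    store = RecentStateStore()
    for number in range(200):
        store.add_block_state(number, {"example_balance": number})
    print(len(store), store.oldest_available())
```

After 200 blocks the store holds only the latest 128, so the state for anything older is simply gone. That gap is what the next node type fills.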
Running a full node is an important responsibility as this node type maintains the network’s security and decentralisation. Full nodes are the record keepers for the chain. However, running a full node is not a paid role. Many blockchain users still do so because it lets them broadcast their own transactions in a censorship resistant way. If you’re running a crypto based service you may want your own node to quickly collect onchain data, or running one could simply be good stewardship of the chain that adds to decentralisation.
However, there’s a sub-type of full node where you do get paid: the miner/validator. This is a full node which also takes on additional consensus responsibilities to actually process the blocks of transactions. It adheres to the consensus model of the chain (e.g. Proof-of-work for Bitcoin and Proof-of-stake for Ethereum) and is paid for the service it provides of ‘running’ the chain.
So all miners/validators are full nodes, but not all full nodes are miners/validators.
Archive Node
This is a node type on Ethereum (and many other big data chains) but not Bitcoin (although a full node on Bitcoin is really an archive node too). It’s a node which stores the complete history of the chain rather than the pruned version that the full Ethereum node contains. For Ethereum this is a LOT of data: the full node, pruned to the last 128 blocks, is a whopping 1.34 TB of data, and an archive node is a huge ~12–13 TB. To put that into context, that’s the size of 2.6 million songs, or 520 4k movies, or 10% of the data storage of the human brain! An archive node is useful if you want to be able to ‘look back in history’ and see what the balance of an account was at a certain block height, and this node type is often used by block explorers, chain analytics firms and advanced wallet providers. It’s not for your average node runner though, as that is a LOT of data to store and it takes around 4–8 weeks of syncing time!
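Here’s roughly what ‘looking back in history’ looks like in practice. Ethereum nodes expose a JSON-RPC API, and its `eth_getBalance` method takes an address plus a block number. The Python sketch below only builds the request body and doesn’t send it anywhere. The zero address and block 1,000,000 are placeholders I’ve picked for illustration.

```python
import json


def balance_at_block_request(address: str, block_number: int, request_id: int = 1) -> dict:
    """Build a JSON-RPC body asking for an account's balance at a past block."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        # The block is passed as a hex string; tags like "latest" also work.
        "params": [address, hex(block_number)],
        "id": request_id,
    }


if __name__ == "__main__":
    placeholder_address = "0x" + "00" * 20  # zero address, purely illustrative
    body = balance_at_block_request(placeholder_address, 1_000_000)
    print(json.dumps(body))
```

An archive node can answer this for any block height. A pruned full node can generally only answer for blocks whose state it still holds.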
Light Node
In Bitcoin this is often referred to as an SPV (simplified payment verification) node and, in comparison to a Bitcoin full node or an Ethereum archive node, this node type only holds a portion of the chain’s data. A consequence of this is that it must rely on the full/archive nodes in order to verify transactions, since it doesn’t have the complete history/state of the chain and all its activity. However, it can still broadcast transactions to the network. This node type is most useful for wallet providers, who only want information about their own addresses. It’s worth noting that this is a huge development area for Ethereum, whose aim is for light nodes to be able to run on mobile phones and even one day participate in consensus!
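So how can a light node check a transaction without holding the whole chain? In Bitcoin’s SPV approach, a full node supplies a Merkle proof: a short list of hashes that links one transaction to the Merkle root stored in a block header. Below is a minimal Python sketch of that idea using double SHA-256. It skips Bitcoin’s byte-order details and header parsing, and the function names are my own.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_and_proof(leaves: list, index: int):
    """Return (root, proof) for leaves[index]; built by a node with all the data."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        sibling = index ^ 1          # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_merkle_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    """What the light node does: rebuild the root from one leaf and its siblings."""
    current = leaf
    for sibling, sibling_is_left in proof:
        pair = sibling + current if sibling_is_left else current + sibling
        current = double_sha256(pair)
    return current == root


if __name__ == "__main__":
    tx_hashes = [double_sha256(f"tx{i}".encode()) for i in range(5)]
    root, proof = merkle_root_and_proof(tx_hashes, 3)
    print(verify_merkle_proof(tx_hashes[3], proof, root))
```

The light node only needs the block header’s root and a handful of hashes, not every transaction in the block, which is why this works on low-powered devices.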
These node types cover the Bitcoin and Ethereum space and are often seen in other blockchains but you’ll come across a plethora of other node responsibilities for other chains: Repair Node (Solana), Chunk-only Producer Node (Near), Indexing Node (Aptos), Subnet Node (Avalanche), Fisherman (Polkadot) and so many more!
So far we’ve covered that the clients/code are the rules for the protocol itself and this is run by computers which we call nodes. They can take different responsibilities within the network depending on whether they want to keep a full copy of the chain, help to process the blocks, or just keep a limited view for their own activity.
But the next question is where are these nodes being run? Is it in basements across the world, or in more commercial situations? And how many of them are there?
Node Locations and Scale
There are a lot of nodes for each chain, scattered all across the world. #Decentralisation!
For Bitcoin there are currently over 22,000 nodes, with the vast majority in unknown locations. This is because the node is deliberately obscuring its location, most often by using a VPN or connecting via location-hiding services like Tor. Whilst your initial reaction may be that something dodgy is afoot, this could be for many legitimate reasons, the main one being privacy. If you’re running a bitcoin miner (and successfully so) then you could be earning notable amounts of bitcoin, which puts you at potential risk given the increasing value per sat! Some node operators choose to keep their IP address hidden to avoid their activity being linked to it using chain analytics. This could be for normal privacy reasons, or it could be used by illicit actors who are seeking to avoid detection because they’re evading tax, broadcasting transactions linked to illicit activity, or running in locations which do not allow crypto activity or are seen on the world stage as high risk.
Of the known locations for nodes, there’s a concentration in the US and Germany, and specifically in data centres used by cloud providers like AWS and GCP.
Back in 2016 there were only around 5,500 nodes, so as the Bitcoin industry and bitcoin use have grown, so has interest in contributing to the decentralised nature of the network.
Ethereum has fewer nodes, and in general the location is known. So why are Bitcoin node operators more privacy conscious? Firstly, it’s more a matter of Bitcoin culture, which was born out of privacy preserving money, vs the decentralised world computer of Ethereum. Secondly, at the technical level a lot of Ethereum nodes are run by RPC providers like Infura, Alchemy, Quicknode etc., who don’t seek to hide locations.
One reason that nodes on any blockchain may be looking to hide their location is because they’re being run within a sanctioned country. Of the countries on most global sanctions lists:
- Russia
- Iran
- North Korea
- Syria
- Cuba
- Venezuela
- Crimea / Donetsk / Luhansk (regions)
- Myanmar
- Belarus
- Sudan
- Afghanistan
We can see that there is some Bitcoin and Ethereum node action happening:
These are most definitely a lower bound as node operators in these jurisdictions will, in general, be obscuring their location. So these are just the nodes operating in the open!
And in 2021 it was estimated that Iran was secretly mining as much as 3.1%-4.5% of the global bitcoin supply! I’d be surprised if North Korea didn’t have some hidden mining farms too although they seem to be getting a steady supply from their other efforts. 😬
So who’s running all these nodes across the world? Is it a nerd in a basement?
Alas, mostly not any more! In the early days of bitcoin that was definitely the case.
However, with the value that can be earned from mining/validation, and with professional services now needing to use their own (or others’) nodes, many blockchain nodes are now run in very professional settings.
Think giant data centres: huge warehouses full of computers, all running the protocol’s code and connecting to the thousands of other nodes across the world.
Due to the heat produced by all these computers, many of these nodes are operated in cold weather jurisdictions, and some operators get creative and make use of this excess heat to grow strawberries or keep their chickens or homes warm!
All these computers also need a huge amount of energy and this strays into the often surfaced debate of whether this is a good or bad use of energy and the classic “Bitcoin is using as much energy as X small country”, but that’s a topic for another article ….
Originally published at https://www.linkedin.com.
