Lesson 08: Blockchain Data Structures — From Linked Lists to Merkle Trees

In this chapter, we'll dive deep into the two most fundamental and important data structures in Bitcoin: the blockchain and the Merkle tree. These seemingly simple structures, empowered by cryptography, become a crucial pillar of Bitcoin's security.

Recommended exchange for buying BTC / ETH / USDT: Binance Registration Link

Introduction: How Bitcoin Makes Data Structures Tamper-Proof
Hash Pointers: Guardians of Data Structures
Blockchain: A Tamper-Proof Linked List
Merkle Trees: Efficient Data Verification
Practical Applications: SPV and Light Nodes
FAQ
Conclusion

Introduction: How Bitcoin Makes Data Structures Tamper-Proof

Bitcoin's genius lies in the fact that it didn't invent entirely new data structures — instead, it cleverly added cryptographic elements to ordinary data structures.

Core insight:

Linked list + Hash pointers = Blockchain (tamper-proof linked list)
Binary tree + Hash pointers = Merkle tree (verifiable binary tree)

In traditional centralized systems, data integrity is guaranteed by a trusted central authority. But in decentralized systems like Bitcoin, we need a method that allows every participant to:

Independently verify data integrity
Quickly detect data tampering
Efficiently synchronize and verify transactions

Solution: Add "hash pointers" — a cryptographic tool — to ordinary data structures.

Hash Pointers: Guardians of Data Structures

Starting with Real-World Pointers

Imagine you're looking up a book in a library:

Regular index card: Only tells you "the book is on shelf A-12-34"
Anti-counterfeit index card: Tells you the location AND includes the book's "fingerprint"

If someone secretly replaces the book:

The regular index card wouldn't notice
The anti-counterfeit index card can detect it immediately

The Evolution of Pointers in Computer Science

In computer science, a regular pointer stores the memory address of a data structure:

Regular pointer p -> Memory address of struct A -> Contents of struct A

A hash pointer stores not only the address but also the hash of the structure's contents:

Hash pointer h -> {
  Address: Location of struct A
  Hash: SHA256 of struct A's contents
}

This way we can not only find the data's location but also detect whether it has been tampered with.

Technical Implementation

python

python
class HashPointer:
    def __init__(self, data, address):
        self.address = address
        self.hash = hashlib.sha256(data.encode()).hexdigest()

    def verify_data(self, data):
        return self.hash == hashlib.sha256(data.encode()).hexdigest()

Properties of Hash Pointers

Property	Regular Pointer	Hash Pointer
Locate data	Yes	Yes
Detect tampering	No	Yes
Storage overhead	Small	Larger
Verification speed	Not needed	Fast

Blockchain: A Tamper-Proof Linked List

The "Domino Effect" Tamper-Proof Mechanism

The fundamental difference between a blockchain and a regular linked list is its tamper-proof property:

Regular linked list:

Modifying a node only affects that node
Other nodes remain unchanged
Cannot detect tampering

Blockchain:

Modifying any block triggers a "domino effect"
Chain reaction affects all subsequent blocks
Tampering is immediately detectable

Technical Implementation

python

python
class Block:
    def __init__(self, data, prev_hash):
        self.data = data
        self.prev_hash = prev_hash
        self.hash = self.calculate_hash()

    def calculate_hash(self):
        combined = str(self.data) + str(self.prev_hash)
        return hashlib.sha256(combined.encode()).hexdigest()

class Blockchain:
    def __init__(self):
        self.genesis = Block("Genesis", "0" * 64)
        self.blocks = {self.genesis.hash: self.genesis}
        self.head = self.genesis

    def add_block(self, data):
        new_block = Block(data, self.head.hash)
        self.blocks[new_block.hash] = new_block
        self.head = new_block
        return new_block

    def verify_chain(self):
        # For detailed implementation, see: data_structure_examples.py
        return self._verify_chain_integrity()

The Chain Reaction of Tamper Detection

Let's see what happens if someone tries to tamper with the blockchain:

Genesis Block <- Block 1 <- Block 2 <- Block 3 <- Block 4 (latest)

Tamper scenario: An attacker modifies the contents of Block 2

First-level impact: Block 2's hash value changes.
Second-level impact: The "previous hash" stored in Block 3 no longer matches Block 2's new hash.
Repair attempt: The attacker must also modify Block 3's previous hash.
Chain reaction: Block 3 is modified, so its hash changes too, and Block 4 no longer matches.
Endless repairs: Must keep modifying all the way to the latest block.

Achieving Tamper-Evident Properties

The blockchain achieves two important security properties:

Tamper-Evident: Able to detect tampering.
Tamper-Resistant: Makes tampering computationally infeasible.

Key insight: Just by storing the latest block's hash, you can detect any tampering of the entire chain!

python

python
# Verify the integrity of the entire chain
def verify_chain_integrity(latest_block_hash):
    """Only one hash is needed to verify the entire chain"""
    if latest_block_hash != stored_hash:
        return "Chain has been tampered with!"
    return "Chain is intact"

Blockchain Tamper-Proof Properties

Hash linking: Each block is linked to the previous one via a hash pointer.
Cascade effect: Modifying any block breaks the hash of all subsequent blocks.
Simple verification: Only need to store the latest block's hash to verify the entire chain.
Storage optimization: Nodes can keep only the most recent several thousand blocks; earlier blocks can be verified and obtained from other nodes.

Merkle Trees: Efficient Data Verification

Understanding Merkle Trees Through "Package Inventory"

Imagine you're a warehouse manager for a delivery company:

Traditional approach:

Finding a single package:
1. Check the entire warehouse
2. Time complexity: O(n)

Merkle tree approach:

Zone-based warehouse management:
1. Each zone has an inventory list
2. Zone lists form a master list
3. Time complexity: O(log n)

Technical Implementation

python

python
class MerkleNode:
    def __init__(self, data=None, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right
        self.hash = self.calculate_hash()

    def calculate_hash(self):
        if self.data:  # Leaf node
            return hashlib.sha256(self.data.encode()).hexdigest()
        else:  # Internal node
            combined = self.left.hash + self.right.hash
            return hashlib.sha256(combined.encode()).hexdigest()

class MerkleTree:
    def __init__(self, transactions):
        self.leaves = [MerkleNode(tx) for tx in transactions]
        self.root = self.build_tree(self.leaves[:])

    def build_tree(self, nodes):
        # For complete implementation, see: data_structure_examples.py
        # Core logic shown here
        if len(nodes) == 1:
            return nodes[0]
        # ... recursively build the tree structure

Properties of Merkle Trees

Property	Description	Complexity
Build tree	Bottom-up construction	O(n)
Verify transaction	Provide Merkle path	O(log n)
Update transaction	Recalculate path	O(log n)
Space usage	Additional hash storage	O(n)

Practical Applications: SPV and Light Nodes

Bitcoin Node Classification

In the Bitcoin network, nodes fall into two categories:

Full Node:

Stores complete block content: block header + block body.
Has detailed information for all transactions.
Can independently verify all transactions and blocks.
Typically runs on servers or PCs.

Light Node:

Stores only block header information.
Does not have the complete transaction list.
Relies on full nodes to provide Merkle proofs.
Suitable for mobile wallet applications.

Block Structure Explained

Each Bitcoin block consists of two parts:

Block = Block Header + Block Body

Block Header contains:

Hash of the previous block
Merkle root hash
Timestamp, difficulty, and other metadata

Block Body contains:

The actual transaction list
Organized as a Merkle tree

SPV (Simplified Payment Verification) Background

Real-world challenge:

A full node needs to store the entire blockchain (hundreds of GB).
A mobile wallet cannot store that much data.
But users need to verify transaction authenticity.

Core question: How can you verify a transaction without downloading all the data?

SPV Solution:

python

python
class SPVNode:
    def __init__(self):
        self.block_headers = {}  # Store block headers

    def verify_transaction(self, tx_hash, block_hash, merkle_proof):
        """Verify whether a transaction is in a specified block"""
        if block_hash not in self.block_headers:
            return False
        block_header = self.block_headers[block_hash]
        return self.verify_merkle_proof(tx_hash, merkle_proof,
                                       block_header['merkle_root'])

Light Node Verification Flow

Core verification flow:

python

python
# 1. Get block headers
def sync_block_headers(self, start_height, count):
    # Sync block header information from full nodes
    pass

# 2. Verify Merkle proof
def verify_merkle_proof(self, tx_hash, proof, root_hash):
    current_hash = hashlib.sha256(tx_hash.encode()).hexdigest()
    for step in proof:
        # Recompute hash based on position information
        if step['is_left']:
            combined = current_hash + step['hash']
        else:
            combined = step['hash'] + current_hash
        current_hash = hashlib.sha256(combined.encode()).hexdigest()
    return current_hash == root_hash

### Merkle Proof Verification Scenario

Let's simulate a real-world scenario:

**Scenario**: Alice transfers to Bob using her mobile wallet, and Bob needs to verify whether this transaction has actually been written to the blockchain.

**Participants**:
- Bob: Light node user (mobile wallet)
- Full node: Server providing the Merkle proof

**Verification process**:

1. **Bob's request**: "Please prove that the Alice -> Bob transaction is in Block #12345"

2. **Full node's response**: Provides the Merkle proof path

Merkle Proof = [ { hash: "0x789...", is_left: false }, # Sibling node 1 { hash: "0xabc...", is_left: true }, # Sibling node 2 { hash: "0xdef...", is_left: false } # Sibling node 3 ]


3. **Bob's verification process**:
```python
# Bob only needs to verify this path
current_hash = hash("Alice -> Bob transaction")
for step in merkle_proof:
    if step.is_left:
        current_hash = hash(step.hash + current_hash)
    else:
        current_hash = hash(current_hash + step.hash)

# The final result must equal the Merkle root in the block header
assert current_hash == block_header.merkle_root

Verification result:
- If the final hash matches -> The transaction is indeed in the block
- If the final hash doesn't match -> The transaction doesn't exist or data was tampered with

Key advantages:

Bob doesn't need to download the entire block (which may contain thousands of transactions).
Only a few hash values are needed to complete verification.
Verification complexity: O(log n) instead of O(n).

For detailed implementation, see: data_structure_examples.py


## Bitcoin's Merkle Tree Implementation

### Bitcoin Uses a Standard (Unsorted) Merkle Tree

**Important clarification**: Bitcoin uses a **standard Merkle tree**. Transaction order is the order in which they appear in the block — **no sorting is performed**.

```python
# How transactions are organized in a Bitcoin block
class BitcoinBlock:
    def __init__(self, transactions):
        # Transactions are in their block order, not sorted
        self.transactions = transactions
        self.merkle_tree = MerkleTree(transactions)  # Maintain original order
        self.merkle_root = self.merkle_tree.root.hash

Why Doesn't Bitcoin Need Sorting?

SPV verification: Light nodes need to verify that a transaction is indeed in a block.
Block verification: Full nodes need to verify that a block's Merkle root is correct.
Simple and fast: No additional sorting overhead when building blocks.

Not needed:

Proving a transaction is "not in" a block (rarely needed in Bitcoin).
Complex membership queries.
Data audit functionality.

How Bitcoin's Merkle Tree Works in Practice

Transaction order:

Transaction order in a block:
1. Coinbase transaction (mining reward, always first)
2. User transaction 1
3. User transaction 2
4. ...
5. User transaction N

The Merkle tree is built in this order — no sorting!

Construction process:

python

python
def build_bitcoin_merkle_tree(transactions):
    """Bitcoin Merkle tree construction method"""
    # 1. Maintain original transaction order
    leaves = [hash(tx) for tx in transactions]

    # 2. If transaction count is odd, duplicate the last one
    if len(leaves) % 2 == 1:
        leaves.append(leaves[-1])

    # 3. Pair up and build upward
    while len(leaves) > 1:
        next_level = []
        for i in range(0, len(leaves), 2):
            combined = leaves[i] + leaves[i+1]
            next_level.append(double_sha256(combined))
        leaves = next_level

    return leaves[0]  # Merkle root

Real Example: Bitcoin Block #100000

Let's look at a real Bitcoin block example:

Block information:

Height: 100,000
Transaction count: 4
Merkle root: f3e94742aca4b5ef85488dc37c06c3282295ffec960994b2c0d5ac2a25a95766

Transaction list (in original order):

Coinbase: 8c14f0db3df150123e6f3dbbf30f8b955a8249b62ac1d1ff16284aefa3d06d87
User TX: fff2525b8931402dd09222c50775608f75787bd2b87e56995a7bdd30f79702c4
User TX: 6359f0868171b1d194cbee1af2f16ea598ae8fad666d9b012c8ed2b79a236ec4
User TX: e9a66845e05d5abc0ad04ec80f774a7e585c6e8db975962d069a522137b80c1d

Merkle tree construction:

Level 4:                     [Merkle Root]
                              /      \
Level 3:                [Hash1_2]      [Hash3_4]
                      /      \        /      \
Level 2:        [Hash1] [Hash2] [Hash3] [Hash4]
             (Coinbase) (TX1)   (TX2)   (TX3)

This is exactly how Bitcoin works in practice!

Applicability Boundaries of Hash Pointers

Based on the above concepts, hash pointers have one important limitation:

Data structures that CAN use hash pointers:

Linked lists (blockchain)
Tree structures (Merkle trees)
Directed acyclic graphs (DAGs)

Data structures that CANNOT use hash pointers:

Cyclic structures (circular linked lists, graphs with cycles)

Why don't cycles work?

Circular linked list example:
A -> B -> C -> A  (if using hash pointers)

Problem: Circular dependency
- A's hash depends on C's content
- C's hash depends on B's content
- B's hash depends on A's content
- A's hash depends on C's content <- Forms a cycle!

Result: Cannot determine the hash value of any node

FAQ

Why not use a regular linked list?

Technical reasons:

Integrity verification: Regular linked lists cannot detect data tampering.
Distributed consensus: All nodes need to agree on the data state.
Light node support: Regular linked lists don't support SPV verification.

Security considerations:

Tamper resistance: Hash linking ensures history cannot be changed.
Verification efficiency: No need to download all data.
Fork detection: Quickly identify chain forks.

What are the limitations of Merkle trees?

Technical limitations:

Storage overhead: Requires storing additional hash values.
Update cost: Modifying a leaf node requires recalculating the entire path.
Unsorted nature: Standard Merkle trees don't support efficient non-existence proofs.

Solutions:

Storage optimization: Use compression techniques.
Batch updates: Merge multiple update operations.
Sorted variants: Use sorted Merkle trees.

How is tree imbalance handled?

Problem description: When the number of transactions is not a power of 2, the tree will be unbalanced.

Handling method:

python

python
# When the number of nodes is odd, duplicate the last node to balance the tree
if len(nodes) % 2 == 1:
    nodes.append(nodes[-1])

# For the complete tree balancing algorithm, see: data_structure_examples.py

Conclusion

Bitcoin's data structure design represents the combined wisdom of cryptography and computer science:

Design Philosophy

Simplicity: Uses basic data structures.
Verifiability: Anyone can verify.
Efficiency: Supports light node verification.
Security: Guaranteed by cryptography.

Technical Highlights

Hash pointers: Add tamper-proof properties to data structures.
Blockchain: Achieves immutable historical records.
Merkle trees: Support efficient data verification.
SPV technology: Makes light node verification possible.

Practical Value

Mastering these data structures enables you to:

Understand the foundation of Bitcoin's security.
Develop light nodes and SPV wallets.
Design efficient blockchain applications.
Optimize data verification and synchronization strategies.

Code Practice

This chapter provides complete, runnable code examples:

File	Description	Usage
`data_structure_examples.py`	Complete data structure implementations	`python data_structure_examples.py`
`test_examples.py`	Functional tests and verification	`python test_examples.py`

Learning Checkpoints:

Can you run the code and understand the output?
Can you modify parameters and observe different results?
Can you explain the Merkle proof verification process?

Hands-On Practice: It's recommended to first run the demo code, then read the implementation details, and finally try modifying code parameters.

Home | Follow the Author | Join the Community

第08讲区块链数据结构

Lesson 08: Blockchain Data Structures — From Linked Lists to Merkle Trees

Table of Contents

Introduction: How Bitcoin Makes Data Structures Tamper-Proof

Hash Pointers: Guardians of Data Structures

Starting with Real-World Pointers

The Evolution of Pointers in Computer Science

Technical Implementation

Properties of Hash Pointers

Blockchain: A Tamper-Proof Linked List

The "Domino Effect" Tamper-Proof Mechanism

Technical Implementation

The Chain Reaction of Tamper Detection

Achieving Tamper-Evident Properties

Blockchain Tamper-Proof Properties

Merkle Trees: Efficient Data Verification

Understanding Merkle Trees Through "Package Inventory"

Technical Implementation

Properties of Merkle Trees

Practical Applications: SPV and Light Nodes

Bitcoin Node Classification

Block Structure Explained

SPV (Simplified Payment Verification) Background

Light Node Verification Flow

For detailed implementation, see: data_structure_examples.py

Why Doesn't Bitcoin Need Sorting?

How Bitcoin's Merkle Tree Works in Practice

Real Example: Bitcoin Block #100000

Applicability Boundaries of Hash Pointers

FAQ

Why not use a regular linked list?

What are the limitations of Merkle trees?

How is tree imbalance handled?

Conclusion

Design Philosophy

Technical Highlights

Practical Value

Code Practice

Challenge Quiz

第08讲 区块链数据结构

Lesson 08: Blockchain Data Structures — From Linked Lists to Merkle Trees

Table of Contents

Introduction: How Bitcoin Makes Data Structures Tamper-Proof

Hash Pointers: Guardians of Data Structures

Starting with Real-World Pointers

The Evolution of Pointers in Computer Science

Technical Implementation

Properties of Hash Pointers

Blockchain: A Tamper-Proof Linked List

The "Domino Effect" Tamper-Proof Mechanism

Technical Implementation

The Chain Reaction of Tamper Detection

Achieving Tamper-Evident Properties

Blockchain Tamper-Proof Properties

Merkle Trees: Efficient Data Verification

Understanding Merkle Trees Through "Package Inventory"

Technical Implementation

Properties of Merkle Trees

Practical Applications: SPV and Light Nodes

Bitcoin Node Classification

Block Structure Explained

SPV (Simplified Payment Verification) Background

Light Node Verification Flow

For detailed implementation, see: data_structure_examples.py

Why Doesn't Bitcoin Need Sorting?

How Bitcoin's Merkle Tree Works in Practice

Real Example: Bitcoin Block #100000

Applicability Boundaries of Hash Pointers

FAQ

Why not use a regular linked list?

What are the limitations of Merkle trees?

How is tree imbalance handled?

Conclusion

Design Philosophy

Technical Highlights

Practical Value

Code Practice

Challenge Quiz

第08讲区块链数据结构