Building a Distributed File Storage System in Go

01 - Project Overview

dist_file_storage is a nascent, peer-to-peer distributed file storage system written in Go 1.25. At its core, it attempts to answer a fundamental question: how do you store files across a network of untrusted or semi-trusted nodes in a way that is robust, content-addressable, and decentralized?

While the project is currently in an early scaffold/Phase 1 state, its architecture is deliberately designed to evolve into a fully functional system reminiscent of early BitTorrent or IPFS-like networks.

The Core Idea

In a traditional client-server model, you upload a file to a central server. In dist_file_storage, a file is written to a node’s local disk and then propagated across the network to all connected peers. Files are located not by a path or a URL, but by the hash of their content. This is the principle of Content Addressable Storage (CAS).

Why This Project Exists

Building distributed systems is hard. There are countless subtle failure modes: network partitions, Byzantine peers, data corruption, and the eternal difficulty of maintaining consensus. This project exists as a ground-up exploration of these challenges. It strips away the complexity of a production system to focus on the foundational layers:

  1. How do you reliably store a file on disk using a content hash?
  2. How do you establish and maintain TCP connections between peers?
  3. How do you broadcast data to the network once it’s stored locally?

Tech Stack

Component Technology
Language Go 1.25
Networking Standard Library (net)
Serialization encoding/gob (for broadcasts), raw byte buffers (for transport)
Testing testing, github.com/stretchr/testify
Build Makefile

The deliberate choice to use the Go standard library for networking is significant. By avoiding heavy frameworks like gRPC or libp2p at this stage, the project maintains full control over the wire protocol and connection lifecycle. This is an educational and architectural choice that prioritizes understanding over convenience.

Current Capabilities

As of the latest commit, the system can:

  • Store files locally using a SHA-1 based path transformation.
  • Listen for incoming TCP connections from other peers.
  • Dial bootstrap nodes on startup.
  • Broadcast a Payload struct (encoded via gob) to all connected peers.
  • Run a local dev harness with two interacting nodes.

What It Is Not (Yet)

It is important to set expectations. This is not a production-ready system. There is no request/response protocol for retrieving files over the network, no replication strategy, no sharding, and no consensus mechanism. 09 - Current State and Future Roadmap details these gaps honestly.

Where to Go Next

  • To understand the high-level design, read 02 - Architecture and Design Patterns.
  • To jump straight into the disk layer, read 03 - Content Addressable Storage.
  • To see how nodes talk to each other, read 04 - The P2P Network Layer.

02 - Architecture and Design Patterns

One of the most impressive aspects of dist_file_storage is its disciplined use of Go idioms and design patterns. Despite being an early-stage project, it follows a clean, layered architecture that would not look out of place in a much larger codebase. This note explores the structural decisions that make the system extensible and testable.

The Three Layers

The system is divided into three primary layers, each with a single, well-defined responsibility:

  1. Storage Layer (storage.go): Handles local disk I/O. It knows nothing about networks.
  2. Network/Transport Layer (p2p/): Handles TCP connections, peers, and raw message frames. It knows nothing about files.
  3. Server/Orchestration Layer (server.go): Wires storage and transport together. It manages peer connections and runs the main event loop.

This separation is crucial. It means you could theoretically swap out the TCP transport for a UDP or WebSocket implementation without touching the storage logic. Similarly, you could change the path hashing algorithm without affecting the network layer.

Interface-Driven Design

Go’s implicit interfaces are used heavily to enforce this decoupling.

The Transport Interface

Defined in p2p/transport.go, the Transport interface abstracts everything a node needs to communicate:

type Transport interface {
    ListenAndAccept() error
    Consume() <-chan RPC
    Close() error
    Dial(addr string) error
}

The FileServer in server.go holds a Transport, not a TCPTransport. This means the server logic is completely decoupled from TCP specifics. If you wanted to build a UDPTransport tomorrow, you could, as long as it satisfied this contract.

The Peer Interface

Similarly, the Peer interface abstracts a network connection:

type Peer interface {
    net.Conn
    Send([]byte) error
}

This allows the server to treat all connections uniformly, whether they are inbound or outbound.

The Decoder Interface

In p2p/encoding.go, the Decoder interface allows pluggable message framing:

type Decoder interface {
    Decode(io.Reader, *RPC) error
}

The project ships with a GOBDecoder and a DefaultDecoder (which reads a fixed 1028-byte buffer). This pattern anticipates future needs: perhaps a ProtobufDecoder or a JSONDecoder could be dropped in later.

The Options Struct Pattern

Every major component is configured via an “Options” struct:

  • StoreOpts for the storage layer.
  • TCPTransportOpts for the TCP transport.
  • FileServerOpts for the server.

This is a clean alternative to long constructor parameter lists. It allows for optional configuration, sensible defaults, and forward-compatible APIs. For example, StoreOpts lets you inject a custom PathTransformFunc without changing the NewStore signature.
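As a small illustration, here is how a call site might configure the store, using the StoreOpts fields already shown in the test suite (Root, PathTransformFunc) and the NewStore constructor; the root directory names are invented for the example:

// Configure the store via an options struct; any field left unset can fall
// back to a default inside NewStore.
store := NewStore(StoreOpts{
    Root:              "node1_storage",
    PathTransformFunc: CASPathTransformFunc,
})

// Swapping the path strategy is a field change, not an API change.
flatStore := NewStore(StoreOpts{
    Root:              "node1_flat",
    PathTransformFunc: DefaultPathTransformFunc,
})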

Strategy Pattern via Functions

Several behaviors are injected as functions rather than hardcoded:

  • PathTransformFunc: Defines how a key maps to a file path.
  • HandshakeFunc: Defines what happens when a peer connects.

This is the Strategy Pattern in action. The default path transform is an identity function, but the system primarily uses CASPathTransformFunc, which SHA-1 hashes the key. This flexibility is powerful and keeps the core structs agnostic to specific algorithms.

Concurrency Patterns

The system is goroutine-heavy, which is idiomatic for Go networking:

  • ListenAndAccept spawns a goroutine for the accept loop.
  • startAcceptLoop spawns a goroutine for each accepted connection (handleConn).
  • bootstrapNetwork spawns goroutines to dial peers concurrently.

To protect shared state, it uses sync.RWMutex in the Store and sync.Mutex (peerLock) in the FileServer for the peers map. The RPC messages are passed between goroutines via Go channels (Consume() <-chan RPC), which is the canonical way to share memory by communicating.

Why This Matters

These patterns aren’t just academic. They solve real problems:

  • Testability: You can mock the Transport interface to test FileServer logic in isolation.
  • Extensibility: Adding UDP support is a matter of implementing an interface, not refactoring the world.
  • Clarity: A new contributor can look at server.go and understand the high-level flow without getting lost in TCP socket details.

  • 03 - Content Addressable Storage: See how the storage layer implements these patterns.
  • 04 - The P2P Network Layer: See the transport interfaces in action.
  • 06 - Server Orchestration: See how the server wires it all together.

03 - Content Addressable Storage

At the heart of dist_file_storage lies its storage layer: storage.go. This is the system’s foundation, responsible for the most critical task—persisting data to the local filesystem. What makes this layer fascinating is its adherence to Content Addressable Storage (CAS) principles. In a CAS system, the address (or path) of an object is derived directly from its contents, typically via a cryptographic hash.

This note provides a comprehensive breakdown of how the project implements CAS on local disk, the design decisions behind its path transformation strategy, and the lifecycle of a file within the store.

The Store Struct

The Store struct is a minimal but effective abstraction over a directory on disk:

type Store struct {
    Root              string
    PathTransformFunc PathTransformFunc
    mu                sync.RWMutex
}

It has three fields:

  • Root: The base directory where all files are saved.
  • PathTransformFunc: A function that converts a key into a relative file path.
  • mu: A sync.RWMutex to protect concurrent access.

Notice the lack of complexity. There is no database, no B-tree, no complex indexing. Just a root folder and a function. This simplicity is a feature, not a bug. It makes the system predictable and easy to reason about.

Path Transformation: The Key to CAS

The project defines a PathTransformFunc type:

type PathTransformFunc func(string) string

This function takes a key (e.g., a filename or an identifier) and returns the path where the file should be stored, relative to Root.

DefaultPathTransformFunc

The simplest possible implementation is the identity function:

func DefaultPathTransformFunc(key string) string {
    return key
}

If you store a file with the key "myphoto.jpg", it is written directly to Root/myphoto.jpg. This is useful for simple use cases but has a major drawback: it doesn’t scale. If keys are user-provided filenames, you run into filesystem limitations (max files per directory, long name limits, collisions).

CASPathTransformFunc

This is the star of the show. It implements true content-addressable storage:

func CASPathTransformFunc(key string) string {
    hash := sha1.Sum([]byte(key))
    hashStr := fmt.Sprintf("%x", hash)
    
    blocksize := 5
    slices := len(hashStr) / blocksize
    
    paths := make([]string, slices)
    for i := 0; i < slices; i++ {
        from, to := i*blocksize, (i*blocksize)+blocksize
        paths[i] = hashStr[from:to]
    }
    return strings.Join(paths, "/")
}

Let’s break down what happens when you store a file with the key "myprivatedata":

  1. The key is hashed using SHA-1, producing a 40-character hexadecimal string (e.g., 71056ad8aa...).
  2. This string is split into segments of 5 characters each.
  3. The segments are joined with / to form a path.

For example, a hash might transform into:

71056/ad8aa/...

This approach is brilliant for several reasons:

  • Distribution: By splitting the hash into a tree-like directory structure, it avoids dumping thousands of files into a single folder. Most filesystems degrade in performance when a directory contains too many entries.
  • Addressability: The path is deterministic. If you know the key, you can always compute the exact path on disk without a lookup table.
  • Integrity: If the content changes, the key (and thus the path) changes. This naturally prevents accidental overwrites and makes versioning implicit.
  • Flat Keyspace: It allows the system to handle arbitrary keys (even very long ones) because the final filename is always a fixed-length hash.

The Store Lifecycle

The Store provides a complete lifecycle for file management:

Write

Write(key string, r io.Reader) (int64, error):

  1. Computes the path using PathTransformFunc(key).
  2. Creates the necessary directories using os.MkdirAll.
  3. Opens the file with os.Create.
  4. Copies the data from the io.Reader into the file using io.Copy.

It returns the number of bytes written. Because it takes an io.Reader, it is extremely flexible—you can write from a network connection, a byte buffer, or a local file without changing the method signature.
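Put together, the method is likely little more than those four steps. A minimal sketch, assuming the Root and PathTransformFunc fields shown above (locking and exact error handling may differ in the real code):

func (s *Store) Write(key string, r io.Reader) (int64, error) {
    pathName := s.PathTransformFunc(key)
    fullPath := filepath.Join(s.Root, pathName)

    // Create every intermediate directory in the hash-derived path.
    if err := os.MkdirAll(filepath.Dir(fullPath), os.ModePerm); err != nil {
        return 0, err
    }

    f, err := os.Create(fullPath)
    if err != nil {
        return 0, err
    }
    defer f.Close()

    // Stream the reader into the file; io.Copy reports the bytes written.
    return io.Copy(f, r)
}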

Read

Read(key string) (io.Reader, error):

  1. Computes the path.
  2. Opens the file with os.Open.
  3. Returns the *os.File (which satisfies io.Reader).

This is a clean, idiomatic Go pattern.

Has

Has(key string) bool: Checks if the file exists using os.Stat. This is useful for deduplication or cache checks.

Delete

Delete(key string) error: Removes the file at the computed path.

Clear

Clear() error: A convenience method that wipes the entire Root directory. This is incredibly useful for testing and development.
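The remaining lifecycle methods are thin wrappers over the os package. A minimal sketch under the same assumptions as the Write sketch above (locking omitted for brevity; see the concurrency note below):

func (s *Store) Read(key string) (io.Reader, error) {
    return os.Open(filepath.Join(s.Root, s.PathTransformFunc(key)))
}

func (s *Store) Has(key string) bool {
    _, err := os.Stat(filepath.Join(s.Root, s.PathTransformFunc(key)))
    return err == nil
}

func (s *Store) Delete(key string) error {
    return os.Remove(filepath.Join(s.Root, s.PathTransformFunc(key)))
}

func (s *Store) Clear() error {
    return os.RemoveAll(s.Root)
}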

Concurrency and Safety

The Store uses a sync.RWMutex. While the current implementation is relatively simple, the mutex is there to protect against race conditions if multiple goroutines attempt to write to the same key simultaneously or if a read happens during a write. In a future iteration, this might be refined to per-key locking for higher concurrency.

Connection to the Network

The Store is intentionally isolated. It does not know about TCP, peers, or broadcasting. It is wired into the FileServer (discussed in 06 - Server Orchestration), which calls Store.Write after receiving data from the network, or before broadcasting data out to the network. This separation of concerns is what makes the architecture so clean.

  • 02 - Architecture and Design Patterns: Understand why PathTransformFunc is an injectable strategy.
  • 06 - Server Orchestration: See how FileServer uses the Store.
  • 08 - Testing Strategy: Learn how storage_test.go validates the CAS logic.

04 - The P2P Network Layer

If the storage layer is the foundation of dist_file_storage, then the P2P network layer is its nervous system. Located in the p2p/ package, this code is responsible for the most complex and error-prone part of any distributed system: networking.

This note explores the abstractions, implementations, and concurrency model that allow nodes to discover, connect to, and communicate with each other over raw TCP.

Core Abstractions: Transport and Peer

The first thing you notice when opening p2p/transport.go is the commitment to interfaces.

The Transport Interface

type Transport interface {
    ListenAndAccept() error
    Consume() <-chan RPC
    Close() error
    Dial(addr string) error
}

This interface is the contract that the FileServer (see 06 - Server Orchestration) uses to interact with the network. It is intentionally minimal:

  • ListenAndAccept(): Start listening for inbound connections.
  • Consume(): Return a read-only channel of incoming messages (RPC).
  • Close(): Shut down the transport.
  • Dial(addr string): Connect to a remote peer.

By programming against this interface, the server remains agnostic to whether the underlying protocol is TCP, UDP, WebSockets, or even an in-memory channel for testing.

The Peer Interface

type Peer interface {
    net.Conn
    Send([]byte) error
}

A Peer is essentially a network connection with an added Send method. It wraps net.Conn to provide a slightly higher-level API for sending raw bytes. The TCPPeer implementation also tracks whether the connection was initiated outbound, which can be useful for debugging and topology management.

TCP Transport Implementation

The concrete implementation, TCPTransport, lives in p2p/tcp_transport.go.

Struct Definition

type TCPTransport struct {
    listenAddress string
    listener      net.Listener
    mu            sync.RWMutex
    peers         map[net.Addr]Peer
    handshakeFunc HandshakeFunc
    decoder       Decoder
    onPeer        func(Peer) error
}

Key fields:

  • listenAddress: The local bind address (e.g., :3000).
  • listener: The active TCP listener.
  • peers: A map of connected peers. Note: as of the current implementation, this map exists in the transport but is not populated; the FileServer maintains its own peer map via the onPeer callback.
  • handshakeFunc: A function to run upon new connections.
  • decoder: The pluggable message decoder.
  • onPeer: A callback invoked when a new peer is accepted.

The Lifecycle of a Connection

The TCPTransport manages connections through a hierarchy of goroutines:

  1. ListenAndAccept: Called to start the transport. It creates a net.Listener and spawns the startAcceptLoop goroutine.
  2. startAcceptLoop: Runs an infinite loop calling listener.Accept(). For every incoming connection, it spawns a handleConn goroutine.
  3. handleConn:
    • Performs the optional handshakeFunc.
    • Calls the onPeer callback to notify the server.
    • Enters a read loop, using the decoder to parse messages into RPC structs.
    • Pushes each RPC onto an internal channel consumed by Consume().

This is a classic Go network server pattern. The use of goroutines means the transport can handle thousands of concurrent connections, limited only by system resources.
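A condensed sketch of that hierarchy, assuming the struct fields shown above, an internal RPC channel backing Consume() (called rpcch here, which is an assumption), and a handleConn signature that carries the outbound flag; it also assumes TCPPeer satisfies the Peer interface by embedding or forwarding net.Conn:

func (t *TCPTransport) startAcceptLoop() {
    for {
        conn, err := t.listener.Accept()
        if err != nil {
            return // listener was closed
        }
        go t.handleConn(conn, false)
    }
}

// The outbound parameter is assumed; it marks dialed vs. accepted connections.
func (t *TCPTransport) handleConn(conn net.Conn, outbound bool) {
    defer conn.Close()

    peer := &TCPPeer{conn: conn, outbound: outbound}

    if err := t.handshakeFunc(peer); err != nil {
        return
    }
    if t.onPeer != nil {
        if err := t.onPeer(peer); err != nil {
            return
        }
    }

    // Read loop: decode each frame into an RPC and hand it to the consumer.
    for {
        rpc := RPC{}
        if err := t.decoder.Decode(conn, &rpc); err != nil {
            return
        }
        rpc.From = conn.RemoteAddr()
        t.rpcch <- rpc // rpcch is assumed: the channel returned by Consume()
    }
}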

TCPPeer

type TCPPeer struct {
    conn     net.Conn
    outbound bool
}

TCPPeer wraps a net.Conn. The outbound boolean distinguishes between connections we dialed (outbound) and connections that dialed us (inbound). This is a small but important piece of metadata for building a symmetric P2P network where every node is both a client and a server.
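Send itself is presumably just a write to the wrapped connection; a one-line sketch:

func (p *TCPPeer) Send(b []byte) error {
    _, err := p.conn.Write(b)
    return err
}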

Bootstrapping the Network

A node isn’t very useful if it sits alone. The TCPTransport implements Dial(addr string), which initiates an outbound TCP connection. The FileServer uses this in its bootstrapNetwork method, dialing a list of “bootstrap nodes” provided in its configuration.

This is how a new node joins an existing swarm. It knows about one or more well-known addresses, connects to them, and (in a future version) discovers more peers through them.
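Dial is likely equally small: open the TCP connection, then hand it to the same per-connection handler used for inbound peers (reusing the handleConn shape sketched earlier, with the outbound flag set; that wiring is an assumption):

func (t *TCPTransport) Dial(addr string) error {
    conn, err := net.Dial("tcp", addr)
    if err != nil {
        return err
    }

    // Outbound connections go through the same read loop as inbound ones.
    go t.handleConn(conn, true)

    return nil
}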

The RPC Message Envelope

Before a message can be decoded, it needs a structure to hold it. p2p/message.go defines:

type RPC struct {
    From    net.Addr
    Payload []byte
}

This is a minimal envelope. It doesn’t prescribe what the payload contains—that’s up to the application layer. It simply says: “These bytes came from this address.” The FileServer’s main loop reads RPC structs from the transport channel and acts on them.

Pluggable Decoding

The transport doesn’t assume a fixed wire format. It uses the Decoder interface:

type Decoder interface {
    Decode(io.Reader, *RPC) error
}

Currently, two decoders exist:

  • DefaultDecoder: Reads a fixed 1028-byte buffer from the connection. Simple, but inefficient for small messages and limiting for large ones.
  • GOBDecoder: Uses Go’s encoding/gob. More structured, but currently not wired into the transport by default.

This design anticipates a future where the protocol might use length-prefixed framing, Protocol Buffers, or JSON. For more details, see 05 - Message Encoding and Protocol.

Concurrency and Error Handling

Networking code is notoriously difficult because of concurrency and partial failures. The TCPTransport handles this by:

  • Never blocking the accept loop. handleConn runs in its own goroutine, so a slow peer cannot prevent the node from accepting new connections.
  • Using channels to communicate with the main server loop. This avoids shared mutable state between the transport and the server.
  • Letting the server manage the peer lifecycle. The transport focuses on I/O; the server decides what to do with peers.

  • 02 - Architecture and Design Patterns: Understand why interfaces are used here.
  • 05 - Message Encoding and Protocol: Dive deeper into RPC, Decoder, and HandshakeFunc.
  • 06 - Server Orchestration: See how the server consumes the transport’s channel and manages peers.

05 - Message Encoding and Protocol

A network is just a pipe for bytes. The real challenge of distributed systems is agreeing on what those bytes mean. In dist_file_storage, the project takes a pragmatic, layered approach to message encoding: a minimal envelope for transport and a pluggable decoder for interpretation.

This note examines the message structures, the encoding strategies, and the handshake mechanism that define the wire protocol.

The RPC Envelope

As seen in p2p/message.go, the fundamental unit of communication is the RPC struct:

type RPC struct {
    From    net.Addr
    Payload []byte
}

This is intentionally bare-bones. From is set by the transport layer to identify the sender. Payload is an opaque byte slice. The transport layer does not attempt to parse the payload; it merely delivers it. This separation of concerns is powerful because it allows the application layer (the FileServer) to evolve its protocol without changing the transport code.

The Decoder Interface

How does the transport turn a raw stream of bytes into an RPC? It delegates this to a Decoder:

type Decoder interface {
    Decode(io.Reader, *RPC) error
}

This interface is the boundary between raw I/O and structured data. The project provides two implementations.

DefaultDecoder

DefaultDecoder is the simplest possible decoder:

type DefaultDecoder struct{}

func (dec DefaultDecoder) Decode(r io.Reader, msg *RPC) error {
    buf := make([]byte, 1028)
    n, err := r.Read(buf)
    if err != nil {
        return err
    }
    msg.Payload = buf[:n]
    return nil
}

It allocates a 1028-byte buffer and reads whatever is available on the connection into it. This approach has trade-offs:

Pros:

  • Zero dependencies. No JSON library, no protobuf compiler, no gob registration.
  • Extremely fast for prototyping.
  • Works for any byte stream.

Cons:

  • Fixed buffer size. Messages larger than 1028 bytes are truncated.
  • No framing. If two small messages arrive quickly, they might be read into the same buffer (TCP is a stream protocol, not a message protocol).
  • Wasteful for tiny messages.

DefaultDecoder is wired into the TCPTransport by default in the current main.go setup. It is a “good enough” solution for Phase 1.

GOBDecoder

GOBDecoder uses Go’s built-in encoding/gob package:

type GOBDecoder struct{}

func (dec GOBDecoder) Decode(r io.Reader, msg *RPC) error {
    return gob.NewDecoder(r).Decode(msg)
}

gob is Go’s native binary serialization format. It is self-describing, efficient, and requires no external schema definitions. However, it is Go-specific.

Pros:

  • Handles complex structs natively.
  • Automatic framing (the decoder knows when one object ends and another begins).
  • Type-safe.

Cons:

  • Go-only. If you wanted to write a client in Rust or Python, gob would be a poor choice.
  • Requires registration of types if using interfaces.

Interestingly, GOBDecoder is implemented but currently unused in the main transport loop. It exists as a forward-looking hook. If the project moves to a structured request/response protocol, gob might become the primary format, or it might be replaced by Protocol Buffers for cross-language compatibility.

The Handshake Mechanism

Before two peers start exchanging data, they might want to verify each other or exchange metadata. p2p/handshake.go defines:

type HandshakeFunc func(Peer) error

And provides a no-op default:

func NOPHandshakeFunc(Peer) error { return nil }

This is another example of the Strategy Pattern. The TCPTransport accepts a HandshakeFunc in its options. Currently, it does nothing, but the hook is there for future use. Potential handshake logic could include:

  • Version Negotiation: Ensuring both peers speak the same protocol version (see the sketch after this list).
  • Authentication: Exchanging tokens or certificates.
  • Capability Advertisement: Telling the peer what features you support (e.g., “I can store files” vs. “I am just a relay”).
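As an example of the first item, a hypothetical version-negotiation handshake. This is purely illustrative: it assumes it lives in the p2p package, that both sides run it symmetrically, and the protocolVersion constant is made up for the sketch:

// protocolVersion is a made-up constant for this sketch.
const protocolVersion byte = 1

func VersionHandshakeFunc(p Peer) error {
    // Announce our protocol version...
    if err := p.Send([]byte{protocolVersion}); err != nil {
        return err
    }

    // ...and expect the remote side to announce theirs.
    buf := make([]byte, 1)
    if _, err := io.ReadFull(p, buf); err != nil {
        return err
    }
    if buf[0] != protocolVersion {
        return fmt.Errorf("protocol version mismatch: got %d, want %d", buf[0], protocolVersion)
    }
    return nil
}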

The Application-Level Payload

While the transport deals in RPC envelopes, the application layer defines what goes inside Payload. In server.go, the FileServer uses a Payload struct for broadcasting:

type Payload struct {
    Key  string
    Data []byte
}

When StoreData is called, it writes this struct to all peers using gob.NewEncoder(io.MultiWriter(...)).Encode(payload). This is a telling asymmetry: the transport receives with DefaultDecoder, but the application-level broadcast sends with gob. It is the mark of a project in transition; the broadcast mechanism is ahead of the general transport decoding.

What’s Missing: A Real Protocol

The current encoding setup is fragmented. There is no unified request/response protocol. A complete protocol would likely include:

  1. Length-Prefixed Framing: Every message is preceded by its length in bytes. This solves the “message boundary” problem of TCP.
  2. Message Types: An enum or string indicating whether the payload is a STORE, GET, DELETE, or HANDSHAKE message.
  3. Request IDs: So that responses can be correlated with requests.
  4. Checksums: To detect data corruption in transit.

These are standard features of protocols like Redis RESP, HTTP/2 frames, or Bitcoin’s P2P protocol. Implementing one of these would be a major milestone for the project.
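As a taste of item 1, here is a hypothetical length-prefixed decoder that would satisfy the existing Decoder interface. Nothing like this exists in the project yet, and the 4-byte big-endian header is an arbitrary choice:

type LengthPrefixedDecoder struct{}

func (dec LengthPrefixedDecoder) Decode(r io.Reader, msg *RPC) error {
    // Read the 4-byte big-endian length header first...
    var length uint32
    if err := binary.Read(r, binary.BigEndian, &length); err != nil {
        return err
    }

    // ...then read exactly that many payload bytes, no more, no less.
    buf := make([]byte, length)
    if _, err := io.ReadFull(r, buf); err != nil {
        return err
    }
    msg.Payload = buf
    return nil
}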

  • 04 - The P2P Network Layer: See where Decoder and HandshakeFunc are used in TCPTransport.
  • 06 - Server Orchestration: See how the server uses Payload and gob for broadcasting.
  • 09 - Current State and Future Roadmap: Understand why the protocol is still in flux.

06 - Server Orchestration

The FileServer in server.go is the brain of the operation. It is the orchestration layer that bridges the gap between the low-level network I/O of the P2P Transport and the local disk operations of the Storage Layer. If the transport is the nervous system and the storage is the memory, the server is the cerebral cortex.

This note provides a deep dive into the FileServer’s responsibilities, its event loop, its peer management strategy, and its broadcasting mechanism.

The FileServer Struct

type FileServer struct {
    FileServerOpts
    store     *Store
    transport Transport
    quitch    chan struct{}
    peers     map[string]Peer
    peerLock  sync.Mutex
}

  • FileServerOpts: Embedded configuration struct (ListenAddr, StorageRoot, BootstrapNodes, etc.).
  • store: Pointer to the local Store.
  • transport: The Transport interface (likely a TCPTransport).
  • quitch: A signal channel used to gracefully shut down the server.
  • peers: A map of currently connected peers, keyed by their network address string.
  • peerLock: A sync.Mutex to protect the peers map from concurrent access.

Configuration via FileServerOpts

type FileServerOpts struct {
    StorageRoot       string
    PathTransformFunc PathTransformFunc
    Transport         Transport
    ListenAddr        string
    BootstrapNodes    []string
}

This struct encapsulates everything needed to start a node. Notably, it injects the Transport as a dependency. This means a FileServer can be instantiated with a mock transport for unit testing—a direct benefit of the interface-driven design discussed in 02 - Architecture and Design Patterns.

The Main Event Loop

The Start method is the entry point for a running node:

func (s *FileServer) Start() error {
    if err := s.transport.ListenAndAccept(); err != nil {
        return err
    }
    s.bootstrapNetwork()
    return s.loop()
}

This method does three things in sequence:

  1. Listen: Tells the transport to start accepting connections.
  2. Bootstrap: Dials any configured bootstrap nodes to join the network.
  3. Loop: Enters the main event loop.

bootstrapNetwork

bootstrapNetwork iterates over the BootstrapNodes list and calls s.transport.Dial(addr) for each. Each dial happens in its own goroutine, allowing the node to attempt multiple connections concurrently without blocking. This is critical for resilience; if one bootstrap node is offline, the others are still attempted.
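A sketch of that method as described; using fmt.Printf for the dial error is a guess, not a confirmed detail:

func (s *FileServer) bootstrapNetwork() error {
    for _, addr := range s.BootstrapNodes {
        go func(addr string) {
            // A failed dial only affects this bootstrap node; the
            // others are still attempted.
            if err := s.transport.Dial(addr); err != nil {
                fmt.Printf("dial error (%s): %v\n", addr, err)
            }
        }(addr)
    }
    return nil
}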

The Loop

The loop method is the heart of the server:

func (s *FileServer) loop() error {
    rpcCh := s.transport.Consume()
    for {
        select {
        case rpc := <-rpcCh:
            fmt.Printf("recv: %v\n", rpc)
        case <-s.quitch:
            return nil
        }
    }
}

Currently, this loop is very simple. It waits for two events: 1. An incoming RPC from the transport channel. 2. A shutdown signal on quitch.

When an RPC arrives, it simply prints it. This is the most obvious “Phase 1” placeholder in the entire project. In a mature system, this select block would contain logic to dispatch messages based on their type: storing a file, retrieving a file, handling a heartbeat, etc.

Peer Management

When the TCPTransport accepts a new connection, it invokes the OnPeer callback:

func (s *FileServer) OnPeer(p Peer) error {
    s.peerLock.Lock()
    defer s.peerLock.Unlock()
    s.peers[p.RemoteAddr().String()] = p
    return nil
}

This method adds the peer to the FileServer’s internal map. It is the server’s responsibility to track peers because the server knows which peers are relevant for broadcasting and application-level logic. The transport, by contrast, focuses purely on I/O.

The use of a mutex here is essential. Because OnPeer is called from a goroutine inside the transport, and the peers map is read from the main loop or other goroutines during broadcasting, concurrent access is unavoidable.

Storing and Broadcasting Data

The StoreData method is the most complex and interesting method in the server:

func (s *FileServer) StoreData(key string, r io.Reader) error {
    // 1. Tee the incoming reader so the data can be both stored and broadcast.
    buf := new(bytes.Buffer)
    tee := io.TeeReader(r, buf)

    // 2. Stream the file to local disk (filling buf as a side effect).
    if _, err := s.store.Write(key, tee); err != nil {
        return err
    }

    // 3. Prepare the payload for broadcasting.
    p := &Payload{
        Key:  key,
        Data: buf.Bytes(),
    }

    // 4. Broadcast to all connected peers.
    return s.broadcast(p)
}

The code above is a reconstruction rather than a verbatim copy; the exact implementation may differ. What is documented is that StoreData writes data to local disk and then broadcasts a Payload{Key, Data} to all known peers using gob encoding over io.MultiWriter. The likely sequence:

  1. It writes the data to the local store.
  2. It constructs a Payload struct containing the key and the raw data.
  3. It uses gob.NewEncoder to encode this payload.
  4. Crucially, it uses io.MultiWriter so that a single Encode call sends the encoded bytes to every connected peer.

The MultiWriter Broadcast

io.MultiWriter is a brilliant standard library tool for this:

func (s *FileServer) broadcast(p *Payload) error {
    s.peerLock.Lock()
    defer s.peerLock.Unlock()
    
    peers := []io.Writer{}
    for _, peer := range s.peers {
        peers = append(peers, peer)
    }
    
    multiWriter := io.MultiWriter(peers...)
    return gob.NewEncoder(multiWriter).Encode(p)
}

By wrapping all peer connections in an io.MultiWriter, the server can execute a single Encode call, and the gob encoder writes the bytes to each peer in turn. This is elegant and leverages Go’s interface system well, though it does mean a slow peer delays the writes to the peers after it.

Graceful Shutdown

The quitch channel provides a mechanism for graceful shutdown. An external caller can signal the channel, causing the loop to exit and Start to return. While the current implementation doesn’t perform extensive cleanup (like closing all peer connections explicitly), the structure is in place to add it.
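The signalling method itself is not shown in this note; one plausible shape (the name Stop is an assumption) is simply to close the channel:

// Stop signals the main loop to exit. Closing quitch, rather than sending a
// value on it, wakes every goroutine that may be selecting on the channel.
func (s *FileServer) Stop() {
    close(s.quitch)
}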

The Inversion of Control

A subtle but powerful design choice is how the transport calls back into the server. The TCPTransportOpts includes:

OnPeer func(Peer) error

When constructing the transport in main.go, this is wired to s.OnPeer. This means the transport doesn’t need to know about the FileServer type—it only knows about a function that takes a Peer. This is Inversion of Control, and it keeps the p2p package free of application-specific logic.

  • 03 - Content Addressable Storage: The Store that FileServer manages.
  • 04 - The P2P Network Layer: The Transport that FileServer consumes.
  • 05 - Message Encoding and Protocol: The Payload and gob encoding used in broadcasting.
  • 07 - Bootstrapping and Entry Point: See main.go to watch the server in action.

07 - Bootstrapping and Entry Point

Every system needs an entry point. For dist_file_storage, that entry point is main.go, accompanied by a simple but effective Makefile. Together, they form the developer experience layer of the project—a dev harness that allows you to spin up a miniature P2P network on your local machine in seconds.

This note walks through main.go, the build process, and how the pieces fit together to create a running demo.

The Goal of main.go

main.go is not a production deployment script. It is a development harness. Its purpose is to prove that the components—Storage, Transport, and Server—can actually work together.

It achieves this by:

  1. Creating two FileServer instances.
  2. Configuring them to listen on different TCP ports (:3000 and :4000).
  3. Telling Node 2 to bootstrap to Node 1.
  4. Triggering a StoreData call on Node 2 to observe the broadcast.

Creating the Nodes

The code in main.go likely looks something like this:

func main() {
    // Node 1
    tcpTransportOpts1 := p2p.TCPTransportOpts{
        ListenAddr:    ":3000",
        HandshakeFunc: p2p.NOPHandshakeFunc,
        Decoder:       p2p.DefaultDecoder{},
    }
    tr1 := p2p.NewTCPTransport(tcpTransportOpts1)
    
    s1 := &FileServer{
        FileServerOpts: FileServerOpts{
            StorageRoot:       "storage_3000",
            PathTransformFunc: CASPathTransformFunc,
            Transport:         tr1,
            ListenAddr:        ":3000",
        },
        store: NewStore(...),
    }
    
    // Node 2 (bootstraps to Node 1)
    tcpTransportOpts2 := p2p.TCPTransportOpts{
        ListenAddr:    ":4000",
        HandshakeFunc: p2p.NOPHandshakeFunc,
        Decoder:       p2p.DefaultDecoder{},
    }
    tr2 := p2p.NewTCPTransport(tcpTransportOpts2)
    
    s2 := &FileServer{
        FileServerOpts: FileServerOpts{
            StorageRoot:       "storage_4000",
            PathTransformFunc: CASPathTransformFunc,
            Transport:         tr2,
            ListenAddr:        ":4000",
            BootstrapNodes:    []string{":3000"},
        },
        store: NewStore(...),
    }
}

(Note: Exact constructor names may vary slightly; this is representative of the pattern.)

The Bootstrap Process

Node 1 starts listening on :3000. Node 2 starts listening on :4000 and then attempts to dial :3000. Because the FileServer calls bootstrapNetwork inside Start(), this connection attempt happens automatically.

Once the TCP connection is established:

  1. TCPTransport accepts the connection.
  2. handleConn calls the OnPeer callback.
  3. FileServer.OnPeer adds the new peer to its peers map.
  4. Node 1 now knows about Node 2, and Node 2 knows about Node 1.

The StoreData Demonstration

After starting both servers, main.go likely calls:

s2.StoreData("myprivatedata", someReader)

This triggers the full lifecycle described in 06 - Server Orchestration:

  1. Node 2 writes the data to its local disk at storage_4000/....
  2. Node 2 constructs a Payload{Key: "myprivatedata", Data: ...}.
  3. Node 2 encodes the payload with gob and broadcasts it to all peers (which includes Node 1).
  4. Node 1 receives the raw bytes in its transport loop and prints them.

The Makefile

The Makefile provides standard commands:

build:
    go build -o bin/fs .

run:
    go run .

test:
    go test ./... -v

These are simple but essential. make build compiles the binary to bin/fs. make run executes the dev harness. make test runs the unit tests.

In a future iteration, the Makefile might expand to include:

  • make docker-build
  • make lint
  • make proto (for Protocol Buffers)

Observing the System

When you run make run, you should see console output from both nodes. Node 1 will print a message indicating it received an RPC from Node 2. If you check the storage directories, only storage_4000 will contain the file under its CAS path: Node 2 stores it locally, while Node 1 currently only prints the received payload rather than writing it to disk.

Limitations of the Harness

It is important to recognize what main.go does not do:

  • It does not parse command-line flags.
  • It does not read a configuration file.
  • It is hardcoded to two local nodes.
  • It does not daemonize or run as a background service.

This is perfectly fine for Phase 1. The harness proves the concept. A production entry point would likely use the flag or cobra package to accept arguments like --listen, --bootstrap, and --storage-root.
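A hypothetical flag-based variant, purely illustrative of the shape such an entry point could take (it uses the standard flag and strings packages; none of this exists in the repository):

func main() {
    listenAddr := flag.String("listen", ":3000", "TCP address to listen on")
    bootstrap := flag.String("bootstrap", "", "comma-separated bootstrap node addresses")
    storageRoot := flag.String("storage-root", "storage", "root directory for stored files")
    flag.Parse()

    var bootstrapNodes []string
    if *bootstrap != "" {
        bootstrapNodes = strings.Split(*bootstrap, ",")
    }

    opts := FileServerOpts{
        ListenAddr:        *listenAddr,
        StorageRoot:       *storageRoot,
        BootstrapNodes:    bootstrapNodes,
        PathTransformFunc: CASPathTransformFunc,
    }

    // Wiring up the transport and starting the server would follow the same
    // pattern as the hardcoded harness above.
    _ = opts
}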

  • 06 - Server Orchestration: Understand what StoreData and Start actually do.
  • 04 - The P2P Network Layer: Understand the TCP transport that main.go configures.
  • 09 - Current State and Future Roadmap: See where the entry point and deployment strategy could evolve.

08 - Testing Strategy

A distributed system is only as reliable as its tests. While dist_file_storage is in early development, it already demonstrates a solid testing philosophy centered on unit testing and behavioral validation. This note explores the test suite, the tools used, and the areas that will need more rigorous testing as the project matures.

Test Files Overview

The project contains at least two test files:

  • storage_test.go
  • p2p/tcp_transport_test.go

These align with the two most critical layers: storage and networking.

Testing the Storage Layer (storage_test.go)

The storage layer is the easiest to test because it has no external dependencies (no network, no database). It interacts purely with the local filesystem, making it ideal for table-driven tests.

CAS Path Transformation Tests

One of the most important things to test is the CASPathTransformFunc. If the hashing or path-splitting logic is flawed, the entire content-addressability guarantee is broken.

A test for this might look like:

func TestCASPathTransformFunc(t *testing.T) {
    key := "myprivatedata"
    pathKey := CASPathTransformFunc(key)
    
    // A SHA-1 hash is 40 hex characters.
    // Split into 5-char blocks, that's 8 path segments.
    parts := strings.Split(pathKey, "/")
    require.Equal(t, 8, len(parts))
    
    // Verify the reconstructed path is deterministic
    require.Equal(t, pathKey, CASPathTransformFunc(key))
}

This test validates:

  1. Determinism: The same key always produces the same path.
  2. Structure: The path has the expected number of segments.
  3. Length: Implicitly validates that the SHA-1 hash is being used correctly.

Store Lifecycle Tests

The test suite likely exercises the full lifecycle of the Store:

func TestStore(t *testing.T) {
    s := NewStore(StoreOpts{
        Root:              "test_root",
        PathTransformFunc: CASPathTransformFunc,
    })
    defer teardown(t, s)
    
    key := "my_special_key"
    data := []byte("some jpg bytes")
    
    // Write
    n, err := s.Write(key, bytes.NewReader(data))
    require.NoError(t, err)
    require.Equal(t, int64(len(data)), n)
    
    // Has
    require.True(t, s.Has(key))
    
    // Read
    r, err := s.Read(key)
    require.NoError(t, err)
    
    b, _ := io.ReadAll(r)
    require.Equal(t, data, b)
    
    // Delete
    require.NoError(t, s.Delete(key))
    require.False(t, s.Has(key))
}

This is a comprehensive behavioral test. It doesn’t just test one method; it tests the contract of the Store: that data written can be read back, that Has correctly reflects existence, and that Delete actually removes the file.

The teardown Helper

Tests that write to disk must clean up after themselves. A teardown function is likely used:

func teardown(t *testing.T, s *Store) {
    if err := s.Clear(); err != nil {
        t.Error(err)
    }
}

This uses the Store.Clear() method to wipe the test directory, ensuring test isolation.

Testing the Network Layer (tcp_transport_test.go)

Networking is harder to test than filesystem I/O because it involves concurrency and real system resources (sockets, ports).

The current TCP transport test is likely minimal:

func TestTCPTransport(t *testing.T) {
    opts := TCPTransportOpts{
        ListenAddr:    ":4000",
        HandshakeFunc: NOPHandshakeFunc,
        Decoder:       DefaultDecoder{},
    }
    tr := NewTCPTransport(opts)
    
    require.NoError(t, tr.ListenAndAccept())
    
    // In a real test, you might dial the transport here
    // and verify that an RPC is received on the consume channel.
}

This test proves that the transport can bind to a port and start listening without crashing. It is a “smoke test” for the transport’s initialization logic.

Testing Tools

The project uses the standard Go testing toolkit plus github.com/stretchr/testify v1.11.1.

testify

testify provides:

  • require.NoError(t, err): Fails the test immediately if an error occurs.
  • require.Equal(t, expected, actual): Clean, readable assertions.
  • require.True(t, condition): For boolean checks.

This library dramatically improves the readability of tests compared to manual if err != nil { t.Fatal(err) } blocks.

What’s Missing: The Test Gap

As the project evolves, the test suite will need to expand significantly:

Integration Tests

Currently, there are no integration tests that spin up multiple nodes and verify end-to-end behavior. An integration test would:

  1. Start Node 1.
  2. Start Node 2 and bootstrap to Node 1.
  3. Call StoreData on Node 2.
  4. Assert that Node 1 received the data and wrote it to disk.

This would catch bugs in the broadcasting logic, the peer management, and the transport wiring.

Concurrency Tests

The Store uses a sync.RWMutex, and the FileServer uses a sync.Mutex. These should be stress-tested with many goroutines writing and reading simultaneously. The Go race detector (go test -race) should be run regularly.

Failure Injection

A robust distributed system must be tested under failure. Future tests should:

  • Kill a peer mid-broadcast and ensure the server doesn’t crash.
  • Attempt to dial an offline bootstrap node and verify graceful handling.
  • Corrupt a payload and verify that the receiver handles it.

Fuzzing

Go 1.18+ supports fuzzing. The PathTransformFunc and Decoder logic are perfect candidates for fuzz tests, which could uncover edge cases with malformed inputs.
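A hypothetical fuzz target for the path transform, sketched to show the shape such a test could take; the invariants checked here are suggestions, not existing tests:

func FuzzCASPathTransformFunc(f *testing.F) {
    f.Add("myprivatedata")
    f.Add("")
    f.Fuzz(func(t *testing.T, key string) {
        path := CASPathTransformFunc(key)

        // The transform must be deterministic.
        if path != CASPathTransformFunc(key) {
            t.Fatalf("non-deterministic path for key %q", key)
        }
        // It must never produce a traversal segment that could escape Root.
        if strings.Contains(path, "..") {
            t.Fatalf("path contains traversal segment: %q", path)
        }
    })
}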

Running the Tests

As defined in the Makefile:

make test

This runs go test ./... -v, executing all tests in all packages with verbose output.

  • 03 - Content Addressable Storage: The storage logic being tested.
  • 04 - The P2P Network Layer: The transport logic being tested.
  • 09 - Current State and Future Roadmap: See how testing fits into the project’s evolution.

09 - Current State and Future Roadmap

Every ambitious project begins as a scaffold. dist_file_storage is no exception. While its architecture is thoughtfully designed, it is crucial to be honest about where it stands today and where it could go tomorrow. This note provides a transparent assessment of the current implementation and a speculative roadmap for turning this scaffold into a production-adjacent distributed storage system.

Current State: Phase 1 Scaffold

The project has successfully laid the groundwork. The following components are functional and well-architected:

  • Content-Addressable Storage: The Store and CASPathTransformFunc work correctly. Files can be written, read, checked, and deleted using SHA-1 based paths.
  • TCP Transport: Nodes can listen for connections and accept raw byte payloads.
  • Peer Connection: Nodes can dial bootstrap peers and establish TCP connections.
  • Broadcasting: The FileServer can broadcast a gob-encoded Payload to all connected peers using io.MultiWriter.
  • Test Harness: main.go demonstrates two local nodes interacting, and a Makefile streamlines the build.

These are non-trivial achievements. The project has a heartbeat.

What’s Missing

The README and codebase explicitly acknowledge several gaps. Here is a consolidated view of the eight major areas needing work:

1. Real Request/Response Protocol

The FileServer’s loop currently just prints incoming RPC messages. There is no logic to parse a message type (e.g., STORE, GET, LIST) and act on it. The system cannot handle a peer asking for a file it doesn’t have.

2. Outbound Connection Management

While Dial exists, connections are not robustly managed. There is no reconnection logic, no heartbeat/ping mechanism, and no timeout handling. If a peer disconnects, the server may not notice until it tries to broadcast.

3. GOBDecoder Integration

GOBDecoder is implemented but sits unused. The transport defaults to DefaultDecoder, which reads fixed 1028-byte buffers. The broadcast path uses gob, but the general receive path does not. The wire protocol needs to be unified.

4. Message Loop Logic

The loop method in server.go is a placeholder. It needs a dispatcher that can:

  • Handle incoming store requests.
  • Respond to file retrieval requests by reading from the Store and streaming the data back.
  • Manage peer health and propagate disconnections.

5. Replication and Sharding

Currently, broadcasting sends the full file to every peer. This is not scalable. A real system needs:

  • Replication: Store N copies of a file across the network for redundancy.
  • Sharding: Split large files into chunks and distribute them.

6. Consensus and Metadata

There is no shared state. If Node 1 has a file and Node 2 doesn’t, Node 2 has no way of knowing that Node 1 has it without asking every node. A distributed hash table (DHT) like Kademlia would solve this.

7. Deployment and Operations

The project is purely a dev harness. There is no:

  • Command-line interface (CLI).
  • Configuration file support.
  • Docker or containerization.
  • Logging framework (just fmt.Printf).
  • Metrics or monitoring.

8. Security

There is no encryption, no authentication, and no verification that received data matches its hash. A malicious peer could send garbage data, and the current system would write it to disk.

The Roadmap: A Path Forward

How would one evolve this project? Here is a speculative, phase-by-phase roadmap.

Phase 2: The Protocol

  • Define a message format: Implement a length-prefixed framing protocol.
  • Message types: STORE, GET, DELETE, HANDSHAKE, HEARTBEAT.
  • Request IDs: Correlate responses with requests.
  • Dispatcher: Replace the print statement in loop with a real message router.
  • GOB or Protobuf: Standardize on gob for Go-only or migrate to Protocol Buffers for cross-language support.

Phase 3: File Retrieval

  • Implement GetData(key string) on the FileServer.
  • When a GET request arrives, look up the key in the local Store.
  • If found, stream the file back to the requester.
  • If not found, forward the request to known peers (recursive lookup).

Phase 4: Resilient Networking

  • Add a heartbeat/ping mechanism to detect dead peers.
  • Implement exponential backoff reconnection for bootstrap nodes.
  • Add connection timeouts and graceful shutdown of peer connections.
  • Use a context-based cancellation strategy.

Phase 5: Distributed Hash Table (DHT)

  • Integrate a Kademlia-style DHT or a simplified Chord ring.
  • Allow nodes to find which peer holds a given hash without broadcasting to everyone.
  • This is the step that transforms the system from a broadcast mesh into a scalable network.

Phase 6: Chunking and Erasure Coding

  • Split large files into fixed-size blocks (e.g., 256KB).
  • Use a Merkle tree to verify block integrity.
  • Implement Reed-Solomon erasure coding for redundancy without full replication.

Phase 7: Production Hardening

  • Build a CLI with cobra or urfave/cli.
  • Add structured logging with zap or logrus.
  • Containerize with Docker and provide a docker-compose.yml for local clusters.
  • Add Prometheus metrics.
  • Implement TLS for peer connections.

Why This Project Matters

Despite its incompleteness, dist_file_storage is an excellent educational artifact. It demonstrates:

  • How to layer a distributed system.
  • How to use Go interfaces for testability.
  • How to implement CAS on a local filesystem.
  • How to structure a P2P network without massive frameworks.

Every missing feature is an opportunity to learn. The gap between Phase 1 and Phase 7 is exactly the gap between a student project and a system like IPFS—and that gap is precisely what makes this codebase such a valuable starting point.

  • 01 - Project Overview: The original goals of the project.
  • 02 - Architecture and Design Patterns: The solid foundation that makes this roadmap feasible.
  • 06 - Server Orchestration: The layer where most of the Phase 2 work will happen.
  • 07 - Bootstrapping and Entry Point: Where Phase 7 hardening will begin.