Lock-Free SPSC Ring Buffer in C++

Written by Shivam Kachhadiya

Designing ultra-low latency systems means eliminating locks, syscalls, and unpredictable pauses. In high-frequency trading or real-time engines, even microseconds matter. This article explains how to build a Lock-Free Single Producer Single Consumer (SPSC) Ring Buffer using C++ atomics and memory ordering.

🚀 Problem Statement

We need fast communication between two threads without mutex overhead. Mutex causes:

Solution → Lock-free ring buffer

🧠 Architecture

Producer  --->  [ Ring Buffer ]  --->  Consumer

Only:

This is called the Single Writer Principle → zero write contention → naturally lock-free.

📦 Data Structure


template<typename T, size_t N>
struct RingBuffer {
    T buffer[N];

    std::atomic<size_t> head;
    std::atomic<size_t> tail;
};

Explain each:

⚙️ Producer (push)

Producer steps:

Why? Before updating tail → data must be fully written. Release ensures consumer sees correct memory.


bool push(const T& item) {
    size_t t = tail.load(std::memory_order_relaxed);
    size_t next = (t + 1) % N;

    if (next == head.load(std::memory_order_acquire))
        return false; // full

    buffer[t] = item;

    tail.store(next, std::memory_order_release);
    return true;
}

⚙️ Consumer (pop)

Consumer steps:


bool pop(T& item) {
    size_t h = head.load(std::memory_order_relaxed);

    if (h == tail.load(std::memory_order_acquire))
        return false; // empty

    item = buffer[h];

    head.store((h + 1) % N, std::memory_order_release);
    return true;
}

🧩 Why Acquire–Release?

Release → Ensures all writes before store become visible
Acquire → Ensures all reads after load see latest data

Creates happens-before relationship between producer and consumer.

🔥 Why No Mutex?

Mutex → kernel involvement, context switch, 1000ns+, unpredictable
Atomics → CPU instruction only, few nanoseconds
Lock-free gives deterministic latency, high throughput → ideal for HFT

🔥 Why Fast?

🧠 Common Follow-ups

📝 Perfect Interview Answer

I implemented a lock-free SPSC ring buffer to achieve ultra-low latency inter-thread communication. Since only one thread writes head and one writes tail, there’s no write contention. I used std::atomic with acquire-release ordering to guarantee visibility without locks. The buffer is preallocated and cache-aligned to avoid heap allocation and false sharing. This eliminates syscalls and context switching, allowing us to achieve 10–20M messages/sec with sub-microsecond latency.

🔥 Problem Setup (Example)

Assume:

Initial state:

buffer: [ _  _  _  _ ]
head = 0
tail = 0

Meaning: head == tail → EMPTY

🎯 Scenario Simulation

  1. Producer pushes A → tail moves to 1 → buffer: [ A _ _ _ ]
  2. Producer pushes B → tail = 2 → buffer: [ A B _ _ ]
  3. Producer pushes C → tail = 3 → buffer: [ A B C _ ]
  4. Consumer pops first item → head = 1 → reads A → buffer stays [ A B C _ ]
  5. Consumer pops second item → head = 2 → reads B → buffer: [ A B C _ ]
  6. Producer pushes D → tail wraps to 0 → buffer: [ A B C D ]
  7. Producer pushes E → tail = 1 → buffer: [ E B C D ]
  8. Consumer pops all → head = 1 → reads C,D,E → final state head=tail=1 → EMPTY

Works without locks because producer only modifies tail and consumer only modifies head. Only reads other's pointer → safe.

Lock-Free C++ Atomics Concurrency Low Latency