Lock-Free SPSC Ring Buffer in C++

Written by Shivam Kachhadiya • Last Updated: Feb 16, 2026

Designing ultra-low latency systems means eliminating locks, syscalls, and unpredictable pauses. In high-frequency trading or real-time engines, even microseconds matter. This article explains how to build a Lock-Free Single Producer Single Consumer (SPSC) Ring Buffer using C++ atomics and memory ordering.

🚀 Problem Statement

We need fast communication between two threads without mutex overhead. Mutex causes:

context switching
kernel involvement
latency spikes

Solution → Lock-free ring buffer

🧠 Architecture

Producer  --->  [ Ring Buffer ]  --->  Consumer

Only:

Producer writes tail
Consumer writes head

This is called the Single Writer Principle → zero write contention → naturally lock-free.

📦 Data Structure


template<typename T, size_t N>
struct RingBuffer {
    T buffer[N];

    std::atomic<size_t> head;
    std::atomic<size_t> tail;
};

Explain each:

buffer → Stores elements
head → Index to read
tail → Index to write

⚙️ Producer (push)

Producer steps:

Load head (acquire)
Check if buffer full
Write element
Update tail (release)

Why? Before updating tail → data must be fully written. Release ensures consumer sees correct memory.


bool push(const T& item) {
    size_t t = tail.load(std::memory_order_relaxed);
    size_t next = (t + 1) % N;

    if (next == head.load(std::memory_order_acquire))
        return false; // full

    buffer[t] = item;

    tail.store(next, std::memory_order_release);
    return true;
}

⚙️ Consumer (pop)

Consumer steps:

Load tail (acquire)
If empty → return
Read data
Update head (release)


bool pop(T& item) {
    size_t h = head.load(std::memory_order_relaxed);

    if (h == tail.load(std::memory_order_acquire))
        return false; // empty

    item = buffer[h];

    head.store((h + 1) % N, std::memory_order_release);
    return true;
}

🧩 Why Acquire–Release?

Release → Ensures all writes before store become visible
Acquire → Ensures all reads after load see latest data

Creates happens-before relationship between producer and consumer.

🔥 Why No Mutex?

Mutex → kernel involvement, context switch, 1000ns+, unpredictable
Atomics → CPU instruction only, few nanoseconds
Lock-free gives deterministic latency, high throughput → ideal for HFT

🔥 Why Fast?

No locks → no syscalls
Contiguous memory → cache friendly
No heap → preallocated
No false sharing → alignas(64)

🧠 Common Follow-ups

Q: Why only SPSC?
A: MPMC needs CAS → slower + ABA issues
Q: Why modulo slow?
A: Use power-of-two + bitmask → index & (N-1)
Q: What if producer faster?
A: Buffer full → drop or spin
Q: What about false sharing?
A: Pad head and tail

📝 Perfect Interview Answer

I implemented a lock-free SPSC ring buffer to achieve ultra-low latency inter-thread communication. Since only one thread writes head and one writes tail, there’s no write contention. I used std::atomic with acquire-release ordering to guarantee visibility without locks. The buffer is preallocated and cache-aligned to avoid heap allocation and false sharing. This eliminates syscalls and context switching, allowing us to achieve 10–20M messages/sec with sub-microsecond latency.

🔥 Problem Setup (Example)

Assume:

Buffer size = 4
Indexes: 0 1 2 3

Initial state:

buffer: [ _  _  _  _ ]
head = 0
tail = 0

Meaning: head == tail → EMPTY

🎯 Scenario Simulation

Producer pushes A → tail moves to 1 → buffer: [ A _ _ _ ]
Producer pushes B → tail = 2 → buffer: [ A B _ _ ]
Producer pushes C → tail = 3 → buffer: [ A B C _ ]
Consumer pops first item → head = 1 → reads A → buffer stays [ A B C _ ]
Consumer pops second item → head = 2 → reads B → buffer: [ A B C _ ]
Producer pushes D → tail wraps to 0 → buffer: [ A B C D ]
Producer pushes E → tail = 1 → buffer: [ E B C D ]
Consumer pops all → head = 1 → reads C,D,E → final state head=tail=1 → EMPTY

Works without locks because producer only modifies tail and consumer only modifies head. Only reads other's pointer → safe.

Lock-Free C++ Atomics Concurrency Low Latency