Multithreading: Thundering herd, Cache stampede and Lock convoy problems

What is Concurrency?

Concurrency is when multiple sequences of operations are run in overlapping periods of time.

What is Multithreading?

Multithreading is the ability of a CPU (or a single core in a multi-core processor) to provide multiple threads of execution concurrently, supported by the operating system.

Thundering herd problem

Once a lock (on a shared object) is released, or an I/O event completes, all the threads blocked waiting on it are notified and moved back to the runnable state.

Even though a large number of threads are woken, only one of them will be able to handle the I/O event or acquire the lock.

When the threads wake up, they will each try to handle the event, but only one will win. All of them compete for resources, possibly freezing the machine, until the herd calms down again.

The thundering herd problem occurs when a large number of threads are woken by a single lock release or I/O completion event, yet only one of them can make progress. This hurts the performance of the system.
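
A minimal, self-contained sketch of the herd (Python, illustrative only): a single notify_all wakes every waiter, yet only one of them ends up handling the event.

```python
import threading

cond = threading.Condition()
ready = False
winners = []          # threads that actually got to handle the event

def worker(i):
    with cond:
        while not ready:
            cond.wait()            # every worker blocks here
        # all woken threads re-contend for the same lock; only the
        # first to find the "event" unhandled gets to process it
        if not winners:
            winners.append(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()

with cond:
    ready = True
    cond.notify_all()              # one event wakes the whole herd

for t in threads:
    t.join()
print(f"woken: 8, handled the event: {len(winners)}")
```

The seven losing threads did nothing useful, but the scheduler still paid to wake, run, and re-park each of them.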

Most Unix/Linux kernels overcome the thundering herd problem by serialising the wake-up: only one thread is woken when more than one is blocked on accept against a single open file descriptor, or on a lock.

In Linux-based kernels this is achieved with the EPOLLEXCLUSIVE flag (available since kernel 4.5).

The EPOLLEXCLUSIVE flag wakes up only one of the threads blocking on accept against the same file descriptor. A companion EPOLLROUNDROBIN flag was proposed to be used together with EPOLLEXCLUSIVE to distribute the wake-ups evenly across the waiting threads, though unlike EPOLLEXCLUSIVE it never landed in the mainline kernel.

Beyond the definition above, the following scenarios can also be considered variants of the thundering herd problem:

  • In a client-server scenario:
    A burst of requests hits the server, overwhelming the number of threads available to process them (or the computational power of the system).
  • In a proxy server to legacy backend scenario:
    The backend goes down, and multiple worker threads in the proxy server keep retrying the connection to it, overwhelming the computational resources of the system.
  • Cache stampede problem:
    Consider a web server with response caching: once a cached page expires, multiple concurrent threads try to fetch the data and recompute the cache value, overwhelming the computational resources.

Mitigating the thundering herd problem

OS-level herding occurs when multiple threads are woken by a single lock release or I/O completion event.

Mitigation technique:

  • Introduce specialised kernel flags (such as EPOLLEXCLUSIVE) to serialise the wake-ups for accept calls, lock releases, and I/O completion events.

Client-server herding occurs when the number of requests overwhelms the server's processing power.

Mitigation technique:

  • Introduce rate limiting, throttling, and burst-control mechanisms to prevent the thundering herd problem.
  • Introduce a cache layer for frequently served content in order to reduce the hits on the server.
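
As a sketch of burst control (the class and numbers are ours, not any specific library's API), a token bucket admits a bounded burst and then throttles arrivals to a steady rate:

```python
import time

class TokenBucket:
    # illustrative token bucket: `capacity` bounds the burst,
    # `rate` is the steady refill in tokens per second
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill proportionally to the elapsed time, up to the cap
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False           # shed the request instead of queueing it

bucket = TokenBucket(rate=1.0, capacity=5)
results = [bucket.allow() for _ in range(10)]
print(results.count(True))     # 5: the burst is capped, the rest are shed
```

Rejected requests return immediately instead of piling up behind the server's worker threads, which is what keeps a burst from turning into a herd.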

Proxy-server-to-backend herding occurs when multiple threads repeatedly retry connections to an unavailable backend.

Mitigation technique:

  • Implement exponential backoff algorithms so that clients retry after an increasing delay.
  • Introduce jitter to break the synchronisation across the clients and avoid collisions.
  • Introduce a proxy cache for frequently served content in order to reduce the hits on the backend, thus avoiding the herding.
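
The first two ideas combine naturally. A small sketch (function name and constants are ours) of capped exponential backoff with full jitter:

```python
import random

def backoff_with_jitter(attempt, base=0.1, cap=30.0):
    # exponential backoff capped at `cap`, with "full" jitter:
    # sleeping a random amount in [0, backoff] de-synchronises the
    # retrying threads so they do not hit the backend in lockstep
    backoff = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, backoff)

delays = [backoff_with_jitter(a) for a in range(8)]
print(delays)
```

Without the jitter, every thread that failed at the same moment would also retry at the same moment, recreating the herd on every backoff step.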

Cache stampede problem

It is a variant of the thundering herd problem that affects caches in massively concurrent systems.

In the section above we discussed introducing a cache layer as a way to mitigate the thundering herd problem in client-server systems. But caches themselves are also susceptible to herding under highly concurrent load.

Consider a client-server scenario with a cache layer. Under heavy load, when the cached version of a page expires, multiple threads of execution will all attempt to re-render the content of that page simultaneously.

Multiple threads trying to re-render and re-cache the same page at the same time will exhaust shared resources, because none of the concurrent threads knows that the others are attempting the same work.

When the load is very high, the congestion for shared resources can even prevent the page from ever being completely re-rendered and re-cached, as every attempt to do so times out. This can turn into a cascading failure, with a zero cache-hit rate and a permanently congested system.

Cache stampede mitigation

  • Locking based on Cache key
  • External re-computation
  • Probabilistic early expiration

Locking based on Cache key

Upon a cache miss, each thread attempts to acquire a lock for the missing cache key. Only the thread that wins the lock recomputes the value; the rest either wait for the fresh value, serve a stale value in the meantime, or fail over, so the expensive recomputation happens only once per key.

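A minimal in-process sketch of the idea (all names are ours, illustrative only):

```python
import threading

cache = {}
key_locks = {}                     # one lock per cache key
locks_guard = threading.Lock()     # guards the lock table itself
recomputes = []                    # tracks how often we actually recompute

def get(key, compute):
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:                     # only one thread recomputes this key
        if key not in cache:       # double-check once the lock is held
            recomputes.append(key)
            cache[key] = compute()
    return cache[key]

threads = [threading.Thread(target=get, args=("page", lambda: "<html>"))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(recomputes))             # 1: eight concurrent misses, one recompute
```

In a distributed cache the per-key lock would typically live in the cache itself (e.g. an atomic add on a sentinel key) rather than in process memory.
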
External re-computation

The recomputation of the external process can be triggered in different ways:

  • When the cache value approaches its expiration
  • Periodically
  • When a process needing the value encounters a cache miss
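
As an illustrative sketch (names and intervals are ours), a background thread can keep the value warm so that request threads never perform the recomputation themselves:

```python
import threading
import time

cache = {"value": None}

def recompute():
    # stand-in for the expensive recomputation
    cache["value"] = time.monotonic()

def refresher(interval, stop):
    # the external refresher: this loop refreshes the value on a
    # schedule; it could equally be triggered as the value nears
    # expiry, or by a request thread signalling a cache miss
    while not stop.wait(interval):
        recompute()

recompute()                          # prime the cache
first = cache["value"]
stop = threading.Event()
t = threading.Thread(target=refresher, args=(0.05, stop))
t.start()
time.sleep(0.25)                     # let a few refreshes happen
stop.set()
t.join()
print(cache["value"] > first)        # True: refreshed in the background
```

Because only the refresher writes the value, request threads never stampede; the trade-off is that the value is recomputed even when nobody is asking for it.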

Probabilistic early expiration

Each process may decide, with a probability that grows as the expiry approaches, to recompute the cache value ahead of its actual expiration. Since the probabilistic decision is made independently by each process, the effect of the stampede is mitigated, as fewer processes expire the same key at the same time.
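
A common formulation refreshes with exponentially increasing probability as the expiry nears; a sketch, with parameter names ours:

```python
import math
import random
import time

def should_refresh_early(expiry, delta, beta=1.0, now=None):
    # delta: roughly how long the recomputation takes;
    # beta > 1 favours earlier refreshes, beta < 1 later ones.
    # The probability of returning True grows exponentially as `now`
    # approaches `expiry`, so independent processes rarely fire together.
    now = time.monotonic() if now is None else now
    r = max(random.random(), 1e-16)          # guard against log(0)
    return now - delta * beta * math.log(r) >= expiry

# at the expiry instant a refresh is certain; far from it, vanishingly rare
print(should_refresh_early(expiry=100.0, delta=2.0, now=100.0))   # True
```

A process that draws True recomputes the value and resets the expiry; the others keep serving the still-valid cached copy.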

Apart from these three main approaches, a multi-level cache (L1, L2 and L3) can also help mitigate a cache stampede, as it reduces the probability of cache misses.

Lock convoy problem

Lock convoys hurt performance through repeated context switching and under-utilisation of each thread's scheduling quota.

Each time a thread tries to acquire the lock and fails, it is pre-empted (under pre-emptive scheduling), moved to the waiting state, and the CPU context-switches to another thread.

When multiple threads are repeatedly context-switched out after failing to acquire the lock, the overhead adds up. Unlike in deadlock and livelock situations, the threads in a lock convoy do make progress, just slowly.

Mitigating the lock convoy problem

There are a few techniques we can use to mitigate lock convoys:

  • Use non-preemptive scheduling (in contrast to pre-emptive scheduling), which prevents repeated context switching to a new thread until the running thread completes its task.
  • Alter the relative priorities of the contending threads.
    Since threads at the same priority level contend over the same lock, altering their priorities can prevent the contention and, in turn, the lock convoys.
  • Implement the logic using non-locking alternatives such as lock-free algorithms.
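
One sketch of the non-blocking route (illustrative only; whether it beats a plain blocking lock depends heavily on the runtime and the contention level): a thread can try the lock without parking itself in the lock's wait queue, avoiding the convoy-style FIFO handoff.

```python
import threading
import time

counter = 0
lock = threading.Lock()

def worker(n):
    # non-blocking acquire: on failure the thread yields and retries
    # instead of being parked in the lock's wait queue and paying a
    # full context switch for every failed acquisition
    global counter
    for _ in range(n):
        while not lock.acquire(blocking=False):
            time.sleep(0)          # yield the CPU, then retry
        try:
            counter += 1           # keep the critical section short
        finally:
            lock.release()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

Keeping the critical section as small as possible, as in the increment above, is itself a convoy mitigation: the shorter the hold time, the less often waiters pile up behind the lock.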


Thank you for Reading!

Originally published at on December 21, 2020.

Senior Software Engineer @WSO2, B.Sc.(Hons).Computer Engineering