Multithreading: Thundering herd, Cache stampede and Lock convoy problems


What is Concurrency?

Carrying out more than one task at a time, in a pseudo-parallel fashion.

What is Multithreading?

According to Wikipedia, multithreading is the ability of a central processing unit (CPU), or a single core in a multi-core processor, to provide multiple threads of execution concurrently, supported by the operating system.

Thundering herd problem

Consider a scenario where multiple threads or processes (with the same scheduling priority) are waiting for a particular lock or an I/O event. When the event occurs, all of them are woken up, but only one can make progress; the rest waste CPU cycles contending and going back to sleep. Two common examples:

  • In a proxy server — legacy backend scenario:
    When the backend is down, multiple worker threads in the proxy server repeatedly retry the connection to the backend, overwhelming the computational resources of the system.
  • Cache stampede problem:
    Consider a web server with response caching; once the cached page expires, multiple concurrent threads try to fetch the data and recompute the cache value, overwhelming the computational resources.

Mitigating thundering herd problem

Depending on the nature of the thundering herd problem, there are several techniques that can be used to mitigate it.

  • Introduce a cache layer to cache frequently served content in order to reduce the hits on the server.
  • Introduce jitter to break the synchronisation across the clients and avoid collisions.
  • Introduce a proxy cache to cache frequently served content in order to reduce the hits on the backend, thus avoiding the herding.
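As a minimal sketch of the jitter idea (the function name and parameters are illustrative): a randomised exponential backoff spreads clients' retries over time, instead of letting every client wake up and retry at the same instant.

```python
import random

def backoff_with_jitter(attempt, base=0.1, cap=5.0):
    """Full jitter: sleep a random time in [0, min(cap, base * 2**attempt)].

    Because each client picks its own random delay, retries are spread
    out instead of arriving at the server in synchronised waves.
    """
    return random.uniform(0.0, min(cap, base * 2 ** attempt))

# Example: delays a single client would sleep on its first five retries.
delays = [backoff_with_jitter(i) for i in range(5)]
```

A caller would `time.sleep(backoff_with_jitter(attempt))` between retries; the cap keeps the worst-case delay bounded.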

Cache stampede problem

Cache stampede, or cache miss storm, is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling.

Cache stampede mitigation

There are three main categories of approaches proposed to mitigate cache stampedes:

  • Locking based on cache key
  • External re-computation
  • Probabilistic early expiration

Locking based on Cache key

To prevent multiple simultaneous re-computations of the same value, upon a cache miss, a process will attempt to acquire the lock for that “cache key” and recompute the cache value only if it acquires the lock.
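A minimal in-process sketch of per-key locking (all names are illustrative; a distributed cache would acquire the lock in the cache itself, e.g. via an atomic set-if-not-exists operation):

```python
import threading

_cache = {}
_key_locks = {}                       # cache key -> its dedicated lock
_key_locks_guard = threading.Lock()   # protects the _key_locks dict itself

def _lock_for(key):
    # Create (or fetch) the lock dedicated to this cache key.
    with _key_locks_guard:
        return _key_locks.setdefault(key, threading.Lock())

def get_or_compute(key, compute):
    """Return the cached value; on a miss, only one thread recomputes."""
    value = _cache.get(key)
    if value is not None:
        return value
    with _lock_for(key):
        # Re-check: another thread may have filled the cache while we waited.
        value = _cache.get(key)
        if value is None:
            value = compute()
            _cache[key] = value
        return value
```

The double check inside the lock is what prevents the stampede: threads that lose the race to the lock find the value already recomputed and return it without calling `compute` again.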

External re-computation

In this approach, the computation of the cache value is moved to an external process, independent of the processes that need the value. The re-computation can be triggered:

  • Periodically
  • When a process needing the value encounters a cache miss
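A sketch of the periodic variant, assuming a background thread owns the refresh (names and the shared-dict cache are illustrative): readers only ever see a warm cache and never recompute the value themselves.

```python
import threading

cache = {}

def start_refresher(key, compute, interval, stop_event):
    """Background worker: recompute the cached value every `interval` seconds.

    The foreground request threads just read `cache[key]`; the expensive
    computation never runs on their path, so no stampede can form.
    """
    def loop():
        cache[key] = compute()          # warm the cache immediately
        while not stop_event.wait(interval):
            cache[key] = compute()      # periodic refresh
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```

In a real deployment the "external process" would typically be a separate worker or cron job writing into a shared cache such as Redis or memcached, not a thread in the same process.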

Probabilistic early expiration

In this approach, each process may recompute the cache value before its expiration by making an independent probabilistic decision, where the probability of performing the early re-computation increases as the value approaches expiration.
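This decision can be sketched along the lines of the XFetch rule from the probabilistic early expiration literature (function name and parameters here are illustrative): recompute when `now - delta * beta * log(u) >= expiry`, with `u` uniform in (0, 1], so the chance of early recomputation grows smoothly as the expiry time nears.

```python
import math
import random
import time

def should_recompute_early(expiry, delta, beta=1.0, now=None):
    """Decide whether to recompute the cache value ahead of `expiry`.

    delta -- roughly how long the recomputation takes (seconds)
    beta  -- tuning knob: > 1 favours earlier recomputation, < 1 later

    Far from expiry the condition almost never holds; near expiry it
    almost always does, so recomputations are staggered across processes.
    """
    now = time.time() if now is None else now
    u = 1.0 - random.random()            # uniform in (0, 1], avoids log(0)
    return now - delta * beta * math.log(u) >= expiry
```

Because every process draws its own random `u`, typically only one of them crosses the threshold first and refreshes the value, while the others keep serving the still-valid cached copy.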

Lock convoy problem

Similar to the thundering herd problem, the lock convoy problem is also a performance problem, which occurs when multiple threads of equal priority contend repeatedly for the same lock.

Mitigating the lock convoy problem

The main cause of the lock convoy problem is a lock used to serialise access to a commonly used resource, such as a memory heap or a thread pool. Possible mitigations include:

  • Altering the relative priorities of the contending threads.
    Since threads at the same priority level contend over the same lock, altering their priorities can prevent the contention and, in turn, the lock convoys.
  • Implement the logic using non-locking alternatives such as lock-free algorithms.
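Lock-free structures are normally built on atomic compare-and-swap instructions, which pure Python does not expose; a related contention-avoidance technique that can be sketched in Python is sharding state per thread, so the hot lock disappears from the fast path. All names below are illustrative:

```python
import threading

class ShardedCounter:
    """Avoid a convoy on one hot lock by giving each thread its own shard.

    Increments touch only the calling thread's shard (single writer, so no
    lost updates); the rare read aggregates across shards. The only lock is
    taken once per thread, at registration, not on every increment.
    """
    def __init__(self):
        self._local = threading.local()
        self._shards = []
        self._register = threading.Lock()

    def increment(self):
        shard = getattr(self._local, "shard", None)
        if shard is None:
            shard = self._local.shard = [0]
            with self._register:
                self._shards.append(shard)
        shard[0] += 1                    # no contention: single writer per shard

    def value(self):
        return sum(s[0] for s in self._shards)
```

The same idea underlies per-thread memory-heap arenas: threads stop queueing on one shared lock because each works on its own partition of the resource.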

References:

https://en.wikipedia.org/wiki/Concurrency_(computer_science)
https://en.wikipedia.org/wiki/Multithreading_(computer_architecture)
https://lwn.net/Articles/632590/
https://en.wikipedia.org/wiki/Thundering_herd_problem
http://neopatel.blogspot.com/2015/05/rate-limiting-apis-and-java-services.html
https://www.ehcache.org/documentation/2.8/recipes/thunderingherd.html
https://en.wikipedia.org/wiki/Cache_stampede
https://en.wikipedia.org/wiki/Lock_convoy
https://en.wikipedia.org/wiki/Non-blocking_algorithm

Senior Software Engineer @WSO2, B.Sc.(Hons).Computer Engineering
