Cache Coherence Problem and Approaches
What is the Cache Coherence Problem?
Cache coherence is a concern in multi-core systems with distributed L1 and L2 caches. Each core has its own L1 and L2 caches, and these caches need to stay in sync with each other so that every core sees the most up-to-date version of the data.
The Cache Coherence Problem is the challenge of keeping multiple local caches synchronized when one processor updates its local copy of data that is shared among multiple caches.
Imagine a scenario where multiple copies of the same data exist in different caches simultaneously. If the processors are allowed to update their own copies freely, an inconsistent view of memory can result.
For example, imagine a dual-core processor where each core has brought a block of memory into its private cache, and then one core writes a value to a location in that block. When the second core attempts to read that value from its cache, it will not see the most recent version unless its cache entry is invalidated, causing a cache miss. The miss forces the second core to fetch the updated block. To trigger that invalidation we need a cache coherence policy. Without one, stale data would be read and invalid results produced, possibly crashing the program.
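The stale-read scenario above can be sketched in a few lines of Python, with plain dictionaries standing in for the private caches and main memory (the address and values are made up for illustration):

```python
# Two cores with private caches and NO coherence protocol (illustrative sketch).
memory = {0x100: 1}

cache_a = {}  # core A's private cache
cache_b = {}  # core B's private cache

# Both cores read address 0x100, pulling the block into their caches.
cache_a[0x100] = memory[0x100]
cache_b[0x100] = memory[0x100]

# Core A writes a new value; even if the write reaches memory,
# nothing invalidates core B's copy.
cache_a[0x100] = 42
memory[0x100] = 42

# Core B still hits in its own cache and reads the stale value.
stale = cache_b[0x100]
print(stale)  # 1, not 42 -- the inconsistent view of memory described above
```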
Cache Write Policies
There are two main cache write policies.
- Write back : Write operations are usually made only to the cache. Main memory is only updated when the corresponding cache line is flushed from the cache.
- Write through : All write operations are made to main memory as well as to the cache, ensuring that main memory is always valid.
From the above description it is clear that the write-back policy can result in inconsistency: if two caches contain the same line, and the line is updated in one cache, the other cache unknowingly holds a stale value. Subsequent reads of that stale line produce invalid results.
But if we think deeper, even the write-through policy has consistency issues. Even though memory is updated, inconsistency can occur unless the other caches monitor the memory traffic or receive some direct notification of the update.
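A minimal sketch of the two write policies side by side (the class names and addresses are invented for illustration, and plain dictionaries stand in for main memory):

```python
class WriteThroughCache:
    """Every store goes to both the cache line and main memory."""
    def __init__(self, memory):
        self.memory = memory       # plain dict standing in for main memory
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value  # memory is always valid


class WriteBackCache:
    """Stores update only the cache; memory is written when the line is flushed."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)       # memory is now stale for this line

    def flush(self, addr):
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)


wt_mem, wb_mem = {0x10: 0}, {0x10: 0}
wt, wb = WriteThroughCache(wt_mem), WriteBackCache(wb_mem)
wt.write(0x10, 7)
wb.write(0x10, 7)
print(wt_mem[0x10])  # 7 -- memory updated immediately
print(wb_mem[0x10])  # 0 -- memory stays stale until the line is flushed
wb.flush(0x10)
print(wb_mem[0x10])  # 7
```

Note that neither policy by itself notifies other caches, which is exactly why a coherence protocol is needed on top of either one.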
In order to solve these issues we introduce cache coherency protocols. The objective of any cache coherency protocol is to load recently used local variables into the appropriate caches and keep them there through numerous reads and writes, while maintaining consistency of shared variables that might be in multiple caches at the same time.
Before we jump into the full solutions to the cache coherence problem, let’s first see the common ways of writing into a cache.
Write Through (WT) Protocol
There are two fundamental implementations of the WT protocol.
- Write through with update protocol
- Write through with invalidation of copies
Write through with update protocol
When a processor writes a new value into its cache, the new value is also written into the memory module that holds the cache block being changed. Copies of this block may exist in other caches, and these copies must be updated to reflect the change caused by the write operation.
We update the other cached copies by broadcasting the updated data to all processor modules in the system. When a processor module receives the broadcast data, it updates the contents of the affected cache block if that block is present in its cache.
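The update broadcast can be sketched as follows (the function name and addresses are illustrative, not from any real API; the loop stands in for the hardware broadcast):

```python
def write_with_update(writer, caches, memory, addr, value):
    """Write-through with update: the writer stores to its cache and to
    memory, then broadcasts the new data; every other cache that holds
    the block refreshes its copy in place."""
    writer[addr] = value
    memory[addr] = value                 # write-through to memory
    for cache in caches:
        if cache is not writer and addr in cache:
            cache[addr] = value          # update, don't invalidate

c0, c1, c2 = {0x20: 5}, {0x20: 5}, {}    # c2 never cached the block
mem = {0x20: 5}
write_with_update(c0, [c0, c1, c2], mem, 0x20, 9)
print(c1[0x20])    # 9 -- the sharer sees the new value immediately
print(0x20 in c2)  # False -- caches without the block are untouched
```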
Write through with invalidation of copies
When a processor writes a new value into its cache, the value is written into memory and all copies in other caches are invalidated. This is also done by broadcasting an invalidation request through the system. Each cache that receives the request and holds a copy of the block invalidates its cache line; the next read of that line then misses and fetches the up-to-date value.
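The invalidation variant differs from the update variant only in what the broadcast does to the other copies (again, the names here are invented for illustration):

```python
def write_with_invalidate(writer, caches, memory, addr, value):
    """Write-through with invalidation: the writer stores to its cache and
    to memory; every other cached copy of the block is invalidated, so the
    next read in those caches misses and fetches the fresh value."""
    writer[addr] = value
    memory[addr] = value
    for cache in caches:
        if cache is not writer and addr in cache:
            del cache[addr]              # invalidate the stale copy

c0, c1 = {0x30: 5}, {0x30: 5}
mem = {0x30: 5}
write_with_invalidate(c0, [c0, c1], mem, 0x30, 9)
print(0x30 in c1)                # False -- the copy was invalidated
print(c1.get(0x30, mem[0x30]))   # 9 -- a re-read misses and goes to memory
```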
Write Back (WB) Protocol
In the WB protocol, multiple copies of a cache block may exist if different processors have loaded (read) the block into their caches.
In this approach, if some processor wants to change the block, it must first become the exclusive owner of the block.
Ownership is granted by the memory module that is the home location of the block. When it is granted, all other copies, including the one in the memory module, are invalidated.
The owner of the block may then change the contents of the block.
When another processor wishes to read the block, the data are sent to that processor by the current owner. The data are also sent to the home memory module, which reacquires ownership and updates the block to contain the latest value.
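The ownership flow described above can be sketched roughly like this (the Block class and helper names are invented for illustration; the memory copy is only conceptually invalidated here):

```python
class Block:
    """Home-memory view of one cache block under the WB protocol sketch."""
    def __init__(self, value):
        self.value = value     # the home memory module's copy
        self.owner = None      # processor holding exclusive ownership, if any
        self.sharers = set()   # processors holding read copies

def request_ownership(block, proc, caches):
    """Grant exclusive ownership: invalidate all other copies
    (including, conceptually, the home memory copy)."""
    for p in list(block.sharers):
        if p != proc:
            caches[p].pop(block, None)
    block.sharers = {proc}
    block.owner = proc

def read(block, proc, caches):
    """Read the block; if another processor owns it, the owner supplies
    the data and home memory reacquires ownership with the latest value."""
    if block.owner is not None and block.owner != proc:
        block.value = caches[block.owner][block]  # owner forwards the data
        block.owner = None                        # home memory owns it again
    caches[proc][block] = block.value
    block.sharers.add(proc)
    return caches[proc][block]

caches = {0: {}, 1: {}}
b = Block(5)
read(b, 0, caches); read(b, 1, caches)  # both processors share the block
request_ownership(b, 0, caches)         # P0 becomes exclusive owner
caches[0][b] = 99                       # owner writes locally (write-back)
print(read(b, 1, caches))               # 99 -- supplied by the owner
print(b.value)                          # 99 -- home memory updated too
```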
There are software-level and hardware-level solutions to the cache coherence problem.
Software Level Solution — Compiler-based cache coherence mechanism
In the software approach, we try to detect the potential code segments which might cause cache coherence issues and treat them separately.
The potential cache coherence problem is transferred from run time to compile time, and the design complexity is transferred from hardware to software.
The downside of this approach is that, at compile time, software approaches generally must make conservative decisions, which leads to inefficient cache utilization.
Compiler-based cache coherence mechanisms perform an analysis of the code to determine which data items may become unsafe for caching, and they mark those items accordingly.
The operating system or hardware then prevents those non-cacheable items from being cached. The simplest such approach is to prevent any shared data variable from being cached.
But this is not an optimal approach; it is too conservative, because a shared data structure may be exclusively used during some periods and may be effectively read-only during others. It is only during the exclusively accessed periods that a cache coherence issue can occur.
More efficient approaches analyze the code to determine safe periods for shared variables. The compiler then inserts instructions into the generated code to enforce cache coherence during the critical periods.
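As a rough illustration of the difference between the conservative marking and the finer analysis (the variable classification below is entirely hypothetical; real compilers work on much richer program representations):

```python
# Conservative compile-time marking: any variable that is shared at all
# is marked non-cacheable, even if it is read-only for long stretches.
def mark_noncacheable_conservative(variables):
    return {name for name, info in variables.items() if info["shared"]}

# A finer analysis marks only variables that are shared AND written in the
# analyzed region -- read-only sharing is safe to cache.
def mark_noncacheable_analyzed(variables):
    return {name for name, info in variables.items()
            if info["shared"] and info["written"]}

prog_vars = {
    "lock":   {"shared": True,  "written": True},
    "config": {"shared": True,  "written": False},  # effectively read-only
    "local":  {"shared": False, "written": True},   # private, always cacheable
}
print(sorted(mark_noncacheable_conservative(prog_vars)))  # ['config', 'lock']
print(sorted(mark_noncacheable_analyzed(prog_vars)))      # ['lock']
```

The gap between the two result sets ("config" is needlessly excluded from caching in the first) is exactly the inefficiency the text attributes to conservative decisions.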
Hardware Level Solutions
Hardware-level solutions are generally better than software-level solutions because they provide dynamic recognition, at run time, of potential inconsistency conditions. Because the problem is dealt with only when it actually arises, caches are used more effectively, leading to improved performance over a software approach.
Hardware schemes can be divided into two categories:
- Directory Protocols
- Snoopy Protocols
Directory Protocols
In this approach we collect and maintain information about where copies of lines reside in each cache.
A typical implementation has a centralized controller that is part of the main memory controller, plus a directory stored in main memory that contains global state information about the contents of the various local caches.
When an individual cache controller makes a request, the centralized controller checks the directory and issues the necessary commands for data transfer between memory and caches or between the caches themselves.
It is also responsible for keeping the state information up to date; therefore, every local action that can affect the global state of a line must be reported to the central controller. The controller maintains information about which processors have a copy of which lines.
Before a processor can write to a local copy of a line, it must request exclusive access to the line from the controller.
Let’s see the flow when an individual cache controller tries to update its local copy of a cache line.
- The local cache controller requests exclusive access to the line from the centralized controller.
- Before granting exclusive access, the controller sends a message to all processors with a cached copy of this line, forcing each processor to invalidate its copy.
- The centralized controller receives an acknowledgement back from each processor.
- Controller grants exclusive access to the requesting processor.
- When another processor tries to read a line that has been granted exclusively to some other processor, it experiences a cache miss and sends a miss notification to the controller.
- Upon receiving the cache miss notification, the controller issues a command to the processor holding exclusive access to write the line back to main memory.
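The steps above can be sketched as a toy directory controller (class and method names are invented for illustration, and acknowledgements are treated as implicit):

```python
class DirectoryController:
    """Centralized directory sketch: tracks, per line, which processors
    hold a copy and which processor (if any) holds it exclusively."""
    def __init__(self, memory, caches):
        self.memory = memory
        self.caches = caches   # processor id -> private cache (a dict)
        self.sharers = {}      # addr -> set of processor ids with a copy
        self.exclusive = {}    # addr -> processor id with exclusive access

    def request_exclusive(self, proc, addr):
        # Invalidate every other cached copy (acks assumed), then grant.
        for p in self.sharers.get(addr, set()) - {proc}:
            self.caches[p].pop(addr, None)
        self.sharers[addr] = {proc}
        self.exclusive[addr] = proc

    def read_miss(self, proc, addr):
        owner = self.exclusive.pop(addr, None)
        if owner is not None:
            # Force the exclusive holder to write the line back first.
            self.memory[addr] = self.caches[owner][addr]
        self.caches[proc][addr] = self.memory[addr]
        self.sharers.setdefault(addr, set()).add(proc)

caches = {0: {0x40: 3}, 1: {0x40: 3}}
ctrl = DirectoryController({0x40: 3}, caches)
ctrl.sharers[0x40] = {0, 1}          # both processors already share the line
ctrl.request_exclusive(0, 0x40)      # P1's copy gets invalidated
caches[0][0x40] = 8                  # P0 writes under exclusive access
ctrl.read_miss(1, 0x40)              # P1 misses; controller forces write-back
print(caches[1][0x40])               # 8
```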
Drawbacks of the directory scheme:
- Central cache controller bottleneck
- Communication overhead between the various cache controllers and the central controller.
Snoopy Protocols
Snoopy protocols distribute the responsibility for maintaining cache coherence among all of the cache controllers in a multiprocessor system.
A cache must recognize when a line that it holds is shared with other caches. When an update action is performed on a shared cache line, it must be announced to all other caches by a broadcast mechanism.
Each cache controller is able to “snoop” on the network to observe these broadcast notifications and react accordingly.
Snoopy protocols are ideally suited to a bus-based multiprocessor, because the shared bus provides a simple means for broadcasting and snooping.
Two basic approaches to snoopy protocols have been explored:
- Write-invalidate snoopy protocol
- Write-update (write-broadcast) snoopy protocol
Write-invalidate snoopy protocol
There can be multiple readers but only one writer at a time.
Initially, a line may be shared among several caches for reading purposes.
When one of the caches wants to perform a write to the line, it first issues a notice that invalidates that line in the other caches, making the line exclusive to the writing cache.
Once the line is exclusive, the owning processor can make local writes until some other processor requires the same line.
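A toy model of the snooping itself, with a shared bus object carrying the invalidation broadcasts (all names are illustrative):

```python
class Bus:
    """Shared bus: every attached cache snoops each broadcast."""
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, writer, addr):
        for cache in self.caches:
            if cache is not writer:
                cache.snoop(addr)

class SnoopyCache:
    """Cache controller that snoops invalidations on the bus."""
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def snoop(self, addr):
        # Another cache announced a write: drop our stale copy.
        self.lines.pop(addr, None)

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)  # make the line exclusive
        self.lines[addr] = value                   # then write locally

bus = Bus()
a, b = SnoopyCache(bus), SnoopyCache(bus)
a.lines[0x50] = b.lines[0x50] = 1   # the line starts out shared
a.write(0x50, 2)
print(0x50 in b.lines)              # False -- b snooped the invalidation
print(a.lines[0x50])                # 2 -- a now holds the line exclusively
```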
Write-update (write-broadcast) snoopy protocol
There can be multiple writers as well as multiple readers.
When a processor wishes to update a shared line, the word to be updated is distributed to all other caches, and caches containing that line can update it.