Alasir Enterprises
Main Page >  Articles >  Functional Principles of Cache Memory  

Main Page
About Us
Functional Principles of Cache Memory

Paul V. Bolotoff
Release date: 20th of April 2007
Last modify date: 20th of April 2007

in Russian

Line Condition Identifiers

It would be quite logical to suppose that any cache line should be given with some service information describing its status. For instance, how to find out whether a particular cache line is valid (i. e. contains some useful information) or invalid (i. e. full of junk)? Of course, a special policy must be developed to address such a common situation. Say, to consider a line invalid if all tag and data bits are set high, i. e. to all ones. However, this approach would turn life of any cache controller into a neverending nightmare. When a read or write request arrives, the controller activates all lines carrying an appropriate index and starts to check their tag and data bits on per line basis (a consequent logical multiplication, AND). If the result is equal to one, such a cache line should be considered as invalid. To throw an example in, if a cache memory of 256Kb is 16-way set associative with 64-byte line size and operational over 4Gb memory range, it takes 288 tag bits and 8192 data bits to be read and compared against validity on every transaction at the worst! The situation gets even worse if to involve data integrity algorithms. Of course, this approach is of no practical use. Even if to simplify it by paying no attention to data bits and checking tag bits only, thus leaving the last memory segment uncacheable, it's viable for limited use only. For that matter, B-caches of 486 and Pentium times were designed this way sometimes, but bear in mind that they were direct-mapped and their tag size was small relatively. So, this approach is rather an exception than a rule. However, it becomes a completely different matter if to add an extra bit per line to describe its validity. Let's suppose that a line is considered valid if this bit is set high. Life gets simpler: cache controller doesn't need to check anything else to decide whether a particular line is junky or not.
While an excellent idea itself, validity bit may lead to serious reliability issues if gets inverted accidentally. In real life, it's highly unlikely though still possible for a junk line to become valid this way. There are several ways to prevent this, but the simplest one is to establish another accompanying bit to carry an inverted value of the actual validity bit. If both bits happen to be of the same value, a respective cache line must be considered invalid. It isn't really possible to change values of two neighborous bits to their opposites because any kind of electromagnetic interference known on this planet is unipolar.
There is modify bit also known as dirty bit. Its purpose is to indicate whether contents of a particular cache line are different to what is stored in operating memory. If there is no such a bit per line implemented, then all lines must be kept unmodified/clean, what means that any line change must be forwarded to memory immediately. Such a cache is called memory coherent. Well, it isn't attractive very much because of significant memory bandwidth waste. On the other hand, it allows for faster cache writes because any unmodified line may be overwritten with another one easily, no need to take any additional actions. Unlike that, no modified line may be overwritten without prior eviction to a lower cache or memory because of apparent data loss. In other words, cache memory controller should keep an eye on these lines somehow, and that's what modify bit is exactly for. It should also be mentioned that D-caches and U-caches may or may not feature support for modified lines while I-caches are memory coherent always. Although it is possible to store dirty lines in cache memory with no modify bits implemented, this approach isn't popular. It simply treats all cache lines as dirty right from the start, no matter whether they actually are. However, this trick has a side effect of very large write traffic to a lower cache memory or operating memory. Every line loaded into such a cache must be unloaded back inescapably at some moment in the future. Some system logic sets of the past, such as VIA Apollo VP2/97 (also known as AMD-640) for Pentium-class processors, supported this mode as an option for B-cache aiming larger cacheability region of operating memory, but performance shown in the real life was rather disappointing.
The third important cache line condition identifier is shared bit. Its purpose is to indicate whether a particular line is contained in other caches including those which belong to other processors. For instance, if one processor updates its local copy, other processors should be aware of this event and treat their copies accordingly. There are two methods for that purpose, Write Update (also known as Write Broadcast) and Write Invalidate. The first one obligates the modifying processor to broadcast changes made to other processors, so they can update their copies. The second one differs from the previous one in means that no data transferred but a special message, and the other processors have got to invalidate their copies in accordance with it. In general, Write Update produces significantly more system bus traffic than Write Invalidate, so this approach works well for systems with star-based processor topology (also known as point to point topology) and especially well for those ones with some kind of direct interprocessor connections. On the other hand, Write Invalidate does the best for systems with bus-based topology.
Given the information above, it's the right time to start with logical condition and coherence protocols which form basis of almost all physical implementations. The most popular one is MESI (Modified, Exclusive, Shared, Invalid):
Condition bit 0 bit 1
Modified 1 0
Exclusive 0 1
Shared 1 1
Invalid 0 0

This protocol requires physical availability of two condition bits per line to describe all four possible states. Obviously, Invalid means that a particular line contains no useful information, so it may be considered as empty. Exclusive implies that contents of this line are identical absolutely to memory, so it may be called a clean line, also there are no copies of it in other caches, i. e. this one is exclusive literally. Modified states that contents of this line are different to memory, so it may be called a dirty line, also there are still no copies of it in other caches. Shared defines a clean line which may have copies in other caches (they were for sure at the moment of condition writing, but could be invalidated later).
The MESI protocol isn't perfect though simple fairly and well useful for uniprocessor configurations with one or two levels of cache memory. Here is a simple example to illustrate its behaviour. For a good start, a fresh line has been loaded into D-cache from the operating memory, so it becomes Exclusive. A while later, another line gets loaded into the same place, thus forcing the original line for eviction to S-cache where it remains Exclusive. Upon a cache hit, this line gets loaded back from S-cache to D-cache, and both lines become Shared. If a clone located in D-cache has to get dirty, there are two reasonable solutions possible: a) the changes will be issued to both S-cache and the operating memory, so both clones will stay Shared; b) the changes will not be issued anywhere, a clone in S-cache will be labelled as Invalid to allow that one in D-cache for Modified. Any other arcane approach is not to be reviewed here.
There are several relatives of the MESI protocol such as MSI (Modified, Shared, Invalid), MOSI (Modified, Owned, Shared, Invalid), MOESI (Modified, Owned, Exclusive, Shared, Invalid) and MOWESI (Modified, Owned, Written, Exclusive, Shared, Invalid). The last two are the most advanced, so about them in detail. MOESI requires 3 physical condition bits per line to be implemented, and it's primary purpose is to eliminate that drawback of MESI which doesn't allow to keep dirty lines in more than one cache. A new state has been developed to address this issue, Owned, which has something to do with ownership rights to a particular cache line, all clones of it, and a respective memory line as well. These rights belong to system logic by default, so it must keep track of all system agents (nota bene: these aren't processors only, but also all bus master devices) to receive valid information at any moment. The first violation attempt of these rights was the condition of Modified which made system logic's life more complicated considerably. Every time a system agent asks for any information from a cacheable memory space, the system logic has to terminate the request and issue an inquiry to all processors whether any of them has an appropriate line dirty in caches of its own. On a positive response, the processor holding the line is obligated to write it back to the operating memory and change the line status to Exclusive. Upon completion, the system logic restarts the original request and delivers the information. If there were no condition of Modified, all caches of all processors would be memory coherent, so the system logic would have no need in performing any further investigation and could just deliver what has been requested directly from the operating memory. Let's get back to Owned. By granting a line with this condition, a respective processor takes all rights and duties regarding it from the system logic. If any system agent requests this line, the owning processor must respond and supply it with a valid copy. No operating memory is updated on this transaction even if the system logic gets involved. In a matter of fact, while the state of Modified may be elaborated as Exclusive Modified, the state of Owned is nothing else but Shared Modified. If any line becomes Owned, all copies of it must remain Shared including those in caches of other processors.
However, the MOESI protocol isn't perfect as well. In a matter of fact, the condition of Owned gives all power to master processor, but places other processors in read only position. So, if any of them needs to perform a write to its Shared line, it has to be done to operating memory as well, thus terminating the master processor's state of Owned. That's all right for most multiprocessor environments featuring only one active writer and several readers usually, but it's a different story for multithreaded environments where several processors or processor cores may execute different threads of the same process concurrently. An additional condition, Written, has been introduced in order to improve performance in this area, and it has resulted in creation of the MOWESI protocol. When a write to a Shared line occurs, it changes the status to Written, all other processors either invalidate or update their obsolete copies, after that it needs to change the line status to Modified or Owned respectively. An operating memory update has been avoided successfully either way. This protocol has been introduced in the 2-core IBM POWER5 processor.
There are numerous low-level condition & coherence protocols documented which feature different combinations of condition identifiers, write methods, bus signaling asf. To name a few the oldest: Illinois, Write Once, Berkeley, DEC Firefly, Xerox Dragon. For instance, the protocol implemented in most DEC Alpha systems is MOESI-based with Write Invalidate support. It takes three physical condition bits per line to operate: tagCtlV (Valid), tagCtlS (Shared) and tagCtlD (Dirty). An additional bit, tagCtlP, is designed to handle even parity on them. The following table illustrates all combinations of the condition bits and their actual meaning, where "x" character means that a respective bit value is of no difference (nota bene: there is also a simpler protocol supported on uniprocessor systems only, which doesn't utilise tagCtlS, but it isn't to be described here).
Condition tagCtlV tagCtlS tagCtlD
Invalid 0 x x
Clean 1 0 0
Dirty 1 0 1
Clean Shared 1 1 0
Dirty Shared 1 1 1

The conditions of Clean and Dirty are fully equivalent to Exclusive and Modified respectively from the MOESI protocol as well as Invalid. However, Clean Shared and Dirty Shared behave not exactly as Shared and Owned respectively. Operating memory gets updated always on a write into such a line while other processors invalidate their obsolete copies, and the line written gets Clean. In other words, any line labelled as Clean or Dirty may become Clean Shared or Dirty Shared respectively, but Clean Shared may never be turned into Dirty Shared or vice versa.
<< Previous page Next page >>

Copyright (c) Paul V. Bolotoff, 2007. All rights reserved.
A full or partial reprint without a permission received from the author is prohibited.
Designed and maintained by Alasir Enterprises, 1999-2007
rhett from, walter from