Functional Principles of Cache Memory

Paul V. Bolotoff
 
Release date: 20th of April 2007
Last modified: 20th of April 2007


 
Associativity

Associativity is a characteristic of cache memory related directly to its logical segmentation: there are as many segments as there are ways defined. In other words, N-way set associative cache memory means that information stored at some address in operating memory may be placed (cached) in any of N locations (lines) of this cache memory. The basic principle of logical segmentation says that within any particular segment, there is only one line capable of caching information located at a given memory address. Thereby, once the address is known, only one line per segment needs to be investigated. When a cache controller receives a read or write request, it queries the appropriate lines of all cache segments for information availability (cache hit, i. e. valid data) or absence (cache miss, i. e. data either invalid or valid but related to another memory address). Either all segments return misses, or all but one do while that one returns a hit indeed. If there is a hit, it just has to be serviced accordingly, with no need for any further investigation. If all segments miss on a read request, the controller reissues the original request to a lower-level cache controller, and so forth. If all of them fail to complete the operation, it must be routed to the memory controller. On a write request, the cache controller has to choose a segment to post the record to. In order to do so, it checks the condition and other identifiers of the candidate lines and arrives at a decision according to one of the following popular algorithms:
  • LRU (Least Recently Used) — a line containing information with the oldest last hit shall be chosen for a replacement;
     
  • LRR (Least Recently Replaced), also known as FIFO (First In, First Out) — a line containing information with the oldest record shall be chosen for a replacement;
     
  • LFU (Least Frequently Used) — a line containing information which has been requested least often shall be chosen for a replacement;
     
  • Random — an arbitrary line shall be chosen for a replacement.

As a matter of fact, the algorithms more advanced than Random are also more effective, but require more complex logic designs to service them. LFU supposes every cache line to be backed by a special counter which is incremented on every hit. LRU and LRR imply some kind of a time stamp counter instead of a hit counter, though LRU also refreshes the stamp on every hit. Random has no need for any per-line counters and can be implemented well enough with a single pseudo-random number generator or something alike.
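To make this bookkeeping more tangible, here is a minimal C sketch of the lookup described above, paired with LRU replacement. All names, sizes and the global tick counter are illustrative assumptions rather than a description of any particular controller, and real hardware probes all segments in parallel instead of looping over them.

    #include <stdint.h>
    #include <stdbool.h>

    #define LINE_SIZE 32    /* bytes per cache line (assumed) */
    #define NUM_WAYS  4     /* segments (ways)                */
    #define SET_LINES 128   /* lines per segment              */

    typedef struct {
        bool     valid;
        uint32_t tag;       /* upper address bits identifying the cached line */
        uint64_t stamp;     /* LRU time stamp, refreshed on every hit         */
    } cache_line_t;

    static cache_line_t cache[NUM_WAYS][SET_LINES];
    static uint64_t tick;   /* a crude global pseudo-time counter */

    /* Service one access: probe the single candidate line of every segment;
       on a miss, choose the least recently used line as the victim. */
    int cache_access(uint32_t address)
    {
        uint32_t block = address / LINE_SIZE;
        uint32_t set   = block % SET_LINES;
        uint32_t tag   = block / SET_LINES;

        ++tick;

        for (int way = 0; way < NUM_WAYS; ++way) {
            if (cache[way][set].valid && cache[way][set].tag == tag) {
                cache[way][set].stamp = tick;   /* LRU: refresh on hit */
                return way;                     /* cache hit */
            }
        }

        /* Cache miss: pick the LRU victim (never-used lines win automatically
           because their stamps are still zero); at this point a real controller
           would reissue the request to a lower-level cache or the memory
           controller and refill the victim line. */
        int victim = 0;
        for (int way = 1; way < NUM_WAYS; ++way)
            if (cache[way][set].stamp < cache[victim][set].stamp)
                victim = way;

        cache[victim][set].valid = true;
        cache[victim][set].tag   = tag;
        cache[victim][set].stamp = tick;
        return victim;
    }

An LRR (FIFO) controller would update the stamp only when a line is replaced rather than on every hit, and an LFU one would increment a per-line counter instead of stamping time.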
 
There are two particular representatives of cache associativity policy which are in fact opposites. If information located at some address in operating memory can be placed in any cache line, such cache memory is called fully associative. On the other hand, if this information can be placed in only one cache line, such cache memory is called direct mapped. To elaborate on these statements a little: in case of full associativity there are as many cache segments as cache lines available, and in case of direct mapping there is only one cache segment present. Hence, direct mapped cache memory may be referred to as 1-way set associative, and the number of ways possessed by a fully associative one equals the number of cache lines available. Direct mapped cache is apparently faster than a fully associative one because it is an easy task for the controller to decide on a line for writing or reading once a memory address is known. However, this approach is the least efficient: the controller has no choice when writing because there is only one line to choose from, so significant performance degradation occurs if two or more memory addresses compete for a single cache line. On the contrary, fully associative cache is the slowest yet the most efficient one. Nevertheless, other variants of cache associativity policy do exist between the two described above, because there is plenty of room for trade-offs between performance and efficiency. For instance, 2-, 4- and 8-way set associative caches tend to be the most popular.
 
The following drawing illustrates how caches with different implementations of associativity policy do their job. Consider an abstract model with 8 cache lines and a physical memory space equivalent in size to 64 cache lines. The 10th so-called memory line needs to be stored in this cache (nota bene: all lines are numbered from 0 and up, so this line carries number 9). In case of a direct-mapped cache, this memory line may be written to only one place: 9 mod 8 = 1 (the remainder on division), i. e. cache line number 1 of the only 8-line cache segment. A 2-way set associative organisation offers two possible places: 9 mod 4 = 1, i. e. cache line number 1 of either 4-line cache segment. The fully associative approach delivers the maximum of options: 9 mod 1 = 0, i. e. cache line number 0 of any 1-line cache segment, basically any cache line available. Actions taken on a read request are related in a similar fashion. In case of a direct-mapped cache, the only line that needs to be probed is number 1. The 2-way set associative mode implies that lines number 1 and 5 are queried, and the fully associative mode requires all lines to be investigated.
 
[Drawing: cache associativities explained]
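The arithmetic of this example can be reproduced in a few lines of C. The sketch below is an illustration only; it assumes the same abstract 8-line cache with segments laid out contiguously (lines 0 to 3 forming the first segment of a 2-way cache and lines 4 to 7 the second), so its output matches the numbers above: line 1 for direct mapping, lines 1 and 5 for the 2-way case, and all eight lines for full associativity.

    #include <stdio.h>

    #define CACHE_LINES 8

    /* For an N-way cache of CACHE_LINES lines, every segment holds
       CACHE_LINES / N lines, and memory line M may be cached at offset
       (M mod segment size) within any of the N segments. */
    static void show_placement(int memory_line, int ways)
    {
        int segment_size = CACHE_LINES / ways;
        int offset = memory_line % segment_size;

        printf("%d-way: candidate cache lines:", ways);
        for (int segment = 0; segment < ways; ++segment)
            printf(" %d", segment * segment_size + offset);
        printf("\n");
    }

    int main(void)
    {
        show_placement(9, 1);   /* direct mapped     -> 1           */
        show_placement(9, 2);   /* 2-way             -> 1 5         */
        show_placement(9, 8);   /* fully associative -> 0 1 ... 7   */
        return 0;
    }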

Although rare and small in size, fully associative cache memory may be found in real-life processors. For instance, those of the Cyrix 6x86 family contain 256 bytes of such cache called the instruction line buffer. This instruction-only "zero level" cache is backed up by a dual-ported U-cache of either 16Kb (6x86 and 6x86L) or 64Kb (6x86MX and 6x86MII). The primary purpose of this tiny cache is to reduce the load imposed on the U-cache and to minimise execution stalls possible when the U-cache is overloaded with data. The fully associative approach is also popular when designing TLBs (they will be discussed in depth later), branch target caches, input/output buffers and so on. As a matter of fact, I-cache and D-cache feature low associativities (2 or 4 ways usually) because, if increased, they would affect access latencies and lead to a serious performance decrease. To address this issue, associativities of the S-cache are set higher (8 or 16 ways usually) because access latencies of this cache memory aren't of primary importance. However, B-cache doesn't feature high associativity as a rule: 1-way for caches located on mainboards, 2- or 4-way for those located on processor modules. It is just too complicated to build high-associativity caches out of standard discrete components.
 

Copyright (c) Paul V. Bolotoff, 2007. All rights reserved.
A full or partial reprint without permission from the author is prohibited.
 