Alasir Enterprises

Functional Principles of Cache Memory

Paul V. Bolotoff
Release date: 20th of April 2007
Last modified: 20th of April 2007


Line, Tag and Index Size

As mentioned in the previous chapter, cache memory of any kind is subdivided into so-called lines. It would be very unreasonable to provide every single byte in cache with an address field indicating its location in operating memory, because the manufacturing costs would be prohibitive. Given a 32-bit linear physical address space, which allows for up to 4Gb of operating memory, every cache byte would have to be accompanied by 4 bytes of addressing information! Such a cache would also perform very poorly. It is therefore much wiser to address sets of adjacent bytes, which form cache lines. In real life, these lines are usually sized at 32 or 64 bytes each, though lines as large as 1024 bytes can be found. Of course, any line size must be a positive integer power of two.

Even when the size requirement is met, not every set of adjacent bytes may be cached, because of an additional limitation known as address alignment. In other words, a set of adjacent bytes may be placed into a cache line if and only if its starting address is aligned on the line size boundary. For example, a 32-byte cache line may only be filled with information from operating memory located at hexadecimal (decimal) addresses of 00-1F (00-31), 20-3F (32-63), 40-5F (64-95) and so on. Among other benefits, this simple rule reduces the number of address bits required per line: by 5 in the example above, i.e. log2(32) = 5. Finally, a cache line must either be loaded fully with valid information or be completely empty, which is the same as being loaded with invalid information; it cannot carry partial content. There is only one exception to this rule: if two cache lines share one address field, they may be considered sublines of one larger line, and they are independent of each other to some extent.
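The alignment rule above can be sketched in a few lines of Python; the 32-byte line size is the example figure from the text, and the helper names are invented for illustration:

```python
LINE_SIZE = 32  # example line size from the text; must be a power of two

def line_base(addr: int) -> int:
    """Round an address down to the start of its cache line."""
    return addr & ~(LINE_SIZE - 1)

def is_line_aligned(addr: int) -> bool:
    """True if addr sits exactly on a line size boundary."""
    return addr & (LINE_SIZE - 1) == 0

# 0x2B falls into the line spanning 0x20-0x3F:
print(hex(line_base(0x2B)))  # 0x20
```

Because the line size is a power of two, rounding down is a single AND operation, which is exactly why the power-of-two requirement exists.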
Every address field consists of two primary parts: a dynamic one (tag), which contains the higher address bits, and a static one (index), which contains the lower address bits (nota bene: do not confuse these with dynamic and static memory cells, which are something different). The first may be modified at run time while the second is fixed. To explain what these parts are for, logical memory segmentation needs to be introduced. Imagine a physical memory space organised as M memory segments of equal size, and suppose that these memory segments and the cache segments are exactly the same size. Every memory segment contains N memory lines, also of equal size; once again, suppose that these memory lines and the cache lines are exactly the same size. Therefore, to obtain the address of a memory line, one determines first the number of its memory segment and then the number of the memory line inside that segment, and merges the two numbers. Substitute the segment number with the tag and the line number with the index, and the general idea should be clear.
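To make the tag/index decomposition concrete, here is a minimal Python sketch; the 32-byte line and 14-bit index are only example figures, and the function name is invented for illustration:

```python
OFFSET_BITS = 5    # log2(32-byte line): byte position inside the line
INDEX_BITS  = 14   # enumerates lines within a segment

def split_address(addr: int):
    """Split a physical address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index  = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
# Concatenating the three fields reconstructs the original address.
```

The tag is whatever remains above the index and offset, which is why its width depends on the cacheable range rather than on the cache alone.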
(Figure: cache memory to operating memory mapping)

Therefore, a cache line's tag size depends on three factors:
  • size of cache memory;
  • associativity of cache memory;
  • cacheable range of operating memory.

It can be calculated using the following formula:

   Stag = log2(Smemory / (Scache / A))

where:

   Stag — size of cache tag, in bits;
   Smemory — cacheable range of operating memory, in bytes;
   Scache — size of cache memory, in bytes;
   A — associativity of cache memory, in ways.
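As a sketch, the formula can be evaluated directly in Python; the function name is invented for illustration:

```python
from math import log2

def tag_bits(s_memory: int, s_cache: int, assoc: int) -> int:
    """Stag = log2(Smemory / (Scache / A)), all sizes in bytes."""
    segment = s_cache // assoc          # size of one cache segment
    return int(log2(s_memory // segment))

# 1Gb cacheable range, 1Mb 2-way set associative cache:
print(tag_bits(1 << 30, 1 << 20, 2))  # 11
```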
Consider an abstract machine with a 1Gb maximum of operating memory and 1Mb of cache memory (of whatever level) with a 2-way set associative policy; it requires exactly 11 bits per tag. In other words, there are 1Gb / 512Kb = 2048 memory segments to be addressed by any particular tag, hence log2(2048) = 11 bits. To be precise, it takes 11 bits per tag to maintain cacheability of any memory location over the full 1Gb range with such a cache design. If fewer tag bits are available, the cache is still effective, though with a reduced operating range. For example, if only 8 bits per tag are implemented, they allow for 2^8 = 256 memory segments of 512Kb each, hence delivering only 128Mb of cacheable range. Anything above this boundary will never be cached. Although undesirable, such a situation was common in the times of 486 and Pentium processors. Their system logic sets supported large amounts of operating memory for those days, typically from 256Mb to 1Gb for Pentium-class systems, but their B-caches were usually capable of dealing with just 64Mb. Many of those systems were built around a regular 8-bit SRAM chip used for the tag array of the B-cache and eight 8x32Kbit SRAM chips forming a direct-mapped data bank of 256Kb. These chips were often installed in sockets and thus could be removed for upgrade without much trouble. Later they disappeared in favour of less expensive derivatives with higher pin counts, which were soldered directly onto mainboards: it's much cheaper to produce and install two 32x32Kbit SRAM chips instead of eight 8x32Kbit ones plus eight sockets. The largest B-cache to appear on a Pentium-class mainboard is 2Mb.
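The reduced-range case works the same way in reverse; a minimal sketch, with an invented helper name:

```python
def cacheable_range(tag_width: int, s_cache: int, assoc: int) -> int:
    """Memory covered by a tag of tag_width bits:
    2**tag_width segments of (Scache / A) bytes each."""
    return (1 << tag_width) * (s_cache // assoc)

# 8-bit tag with a 1Mb 2-way cache: 256 segments of 512Kb = 128Mb
print(cacheable_range(8, 1 << 20, 2) // (1 << 20))  # 128
```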
The idea of such discrete caches was abandoned in the following generations of mainstream x86 hardware, with the exception of a few processor families which featured cache chips soldered onto processor modules: Intel Pentium II (Klamath, Deschutes) and Pentium II Xeon (Drake), Intel Pentium III (Katmai) and Pentium III Xeon (Tanner), and AMD Athlon (K7, K75).
Index size depends only on cache segment size and line size: it must be big enough to enumerate all lines within any particular segment. For instance, with a 512Kb cache segment and 32-byte lines, the index size is log2(512Kb / 32b) = 14 bits. As a matter of fact, every cache line within a particular segment has a dedicated number assigned and hardwired; it cannot be changed, and there is no need to. If there are N cache segments, there are always N cache lines carrying the same index. With segment size kept intact, a line size increase implies an index size decrease, as well as a reduction in the total number of index fields, because of the reduction in the total number of lines involved. However, larger lines suffer higher load/store latencies when transferring from/to other caches or operating memory, since the bandwidth of any bus interface is finite. Additionally, larger lines decrease caching efficiency because of a higher level of pollution with unnecessary information.
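The index calculation above can be sketched the same way; the function name is invented for illustration:

```python
from math import log2

def index_bits(segment_size: int, line_size: int) -> int:
    """Index must enumerate every line within one cache segment."""
    return int(log2(segment_size // line_size))

# 512Kb segment with 32-byte lines:
print(index_bits(512 * 1024, 32))  # 14
```

Doubling the line size to 64 bytes halves the number of lines per segment and shortens the index by one bit.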
Finally, every cache line is usually provided with some redundant information for proper error detection. Data fields are usually protected either with parity checking at the byte level or with one of the error checking and correcting (ECC) protocols at the field level, most of which are based upon the Hamming algorithm. Tags may be protected with single-bit parity checking, though it isn't nearly as mandatory as in the case of data fields. Whatever the particular implementation of error detection, any fault protecting logic adds some complexity to the original design and inevitably affects transaction latencies. That is, a checksum of the data field has to be recalculated and compared to the stored one on every read. The worst case is a partial write into a valid line, because the checksum has to be calculated twice: before and after the line change.
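Byte-level parity checking as described above can be modelled as follows; real hardware computes this with a tree of XOR gates, and this Python sketch is only an illustration:

```python
def even_parity(byte: int) -> int:
    """Parity bit stored alongside a data byte (even parity scheme)."""
    p = 0
    for i in range(8):
        p ^= (byte >> i) & 1  # XOR of all eight data bits
    return p

# On every read the parity is recomputed and compared with the stored
# bit; a mismatch signals a single-bit error (detection only, no
# correction, unlike the Hamming-based ECC schemes).
stored = even_parity(0b10110100)
print(stored)  # 0
```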

Copyright (c) Paul V. Bolotoff, 2007. All rights reserved.
A full or partial reprint without permission from the author is prohibited.
Designed and maintained by Alasir Enterprises, 1999-2007