Alasir Enterprises
Functional Principles of Cache Memory

Paul V. Bolotoff
 
Release date: 20th of April 2007
Last modified: 7th of May 2007


 
TLB and Virtual Memory

First of all, the basics of physical and virtual memory addressing need to be introduced. Physical memory space refers to the actual amount of operating memory installed plus the PCI address range, while virtual memory space is an imaginary space available to software tasks. As a matter of fact, virtual memory space is larger than or equal to physical memory space. Every running task is allocated some virtual memory which is mapped onto physical memory in some way, so several virtual addresses may refer to the same physical address. Both virtual and physical memory spaces are addressed in pages. Although both kinds of pages are supposed to be of the same size, the actual physical memory page size depends on the memory row size, which is hardware dependent; however, this makes no real difference as far as both address spaces are concerned. Virtual pages of 4 KB and 8 KB are used most often, though larger sizes such as 2 MB or 4 MB may also be supported. Although the technique of page-based virtual addressing was developed a long time ago, the first x86 processor to support it was the Intel 386DX, introduced in 1985, though basic capabilities of virtual addressing (protected mode) had been implemented in the Intel 80286 about 3 years earlier. For that matter, Intel's 8086 and 8088 processors, which gave life to the whole architecture back in 1978-79, support only segment-based addressing (real mode). In general, this mode is much more limited and inconvenient from a programmer's point of view.
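To make the page idea concrete, here is a minimal Python sketch, assuming 4 KB pages and a hypothetical per-task mapping table (page_map). A virtual address splits into a virtual page number and an offset within the page, and only the page number is translated. Two different virtual pages are deliberately mapped onto the same physical page to show how several virtual addresses may refer to one physical address.

```python
# Minimal sketch, assuming 4 KB pages; page_map is a hypothetical
# per-task table from virtual page numbers to physical page numbers.
PAGE_SIZE = 4096                           # 4 KB = 2**12 bytes
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits

def split(vaddr: int) -> tuple[int, int]:
    """Return (virtual page number, offset within the page)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

# Two different virtual pages deliberately map to physical page 0x1A2.
page_map = {0x00400: 0x1A2, 0x7FF00: 0x1A2, 0x00401: 0x0F3}

def translate(vaddr: int) -> int:
    """Translate only the page number; the offset passes through unchanged."""
    vpn, offset = split(vaddr)
    ppn = page_map[vpn]                    # KeyError stands in for "unmapped"
    return (ppn << OFFSET_BITS) | offset

print(hex(translate(0x00400ABC)), hex(translate(0x7FF00ABC)))  # both 0x1a2abc
```

Real systems keep such a mapping per task in page tables rather than in a flat dictionary; the flat table only isolates the page/offset split.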
 
So, there are virtual and physical memory pages. Processor functional units operate with virtual addresses, but cache and operating memory controllers have to deal with physical addresses. Conversion between the two kinds of addresses is done in the A-box, though the actual virtual address calculations may be delegated to the E-box. The operating system manages virtual memory and defines the ways it is used, though hardware requirements and limitations apply as well. Common practice is to divide the complete virtual memory space into two parts: one for the operating system kernel and another for user tasks. There are also page tables, which serve to find the physical page address corresponding to a given virtual one. A 2-level scheme seems to be the most popular, so there is a root page table and many user page tables. The root page table typically fits within one memory page. Every user page table usually occupies one memory page as well, and the maximal number of these tables is limited by the maximal number of entries in the root page table. Therefore, every root page table entry addresses one user page table, and every user page table entry addresses one physical memory page. A virtual memory address contains only offsets (table indices plus an offset within the page) and is completely useless without page tables. The process of obtaining a physical page address through these tables is known as a page table walk. All x86 processors starting with the Intel 386DX follow such a 2-level scheme for 4 KB pages; in Intel's terminology, the root page table is called the page directory. However, most x86 processors starting with the Intel Pentium, introduced in 1993, support 4 MB pages, which eliminate the need for user page tables because all memory pages can be addressed through the root page table alone. Moreover, 4 KB and 4 MB pages may be used simultaneously; every root page table entry has a page size bit for that purpose.
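The walk described above can be sketched for the 32-bit x86 layout without PAE: bits 31..22 of a virtual address index the root page table, bits 21..12 index a user page table, and bits 11..0 are the offset within a 4 KB page; for a 4 MB page the low 22 bits form the offset. This is a simplified model, not a description of real hardware: physical memory is assumed to be a Python dict of 32-bit words, the names walk and PageFault are hypothetical, and the page size bit is assumed to be honoured (on real processors 4 MB pages must also be enabled through the PSE flag in CR4).

```python
# Simplified 2-level x86 page table walk (32-bit, no PAE). Physical memory
# is modelled as a dict {physical address: 32-bit word}; names are hypothetical.
PRESENT = 1 << 0          # "Valid" bit of an entry
PAGE_SIZE_BIT = 1 << 7    # PS bit of a root page table entry

class PageFault(Exception):
    pass

def walk(memory: dict, root_base: int, vaddr: int) -> int:
    root_index = (vaddr >> 22) & 0x3FF              # bits 31..22
    root_entry = memory.get(root_base + root_index * 4, 0)
    if not root_entry & PRESENT:
        raise PageFault(hex(vaddr))
    if root_entry & PAGE_SIZE_BIT:
        # 4 MB page: frame in bits 31..22, 22-bit offset, no user page table
        return (root_entry & 0xFFC00000) | (vaddr & 0x3FFFFF)
    table_base = root_entry & 0xFFFFF000            # 4 KB-aligned user page table
    table_index = (vaddr >> 12) & 0x3FF             # bits 21..12
    user_entry = memory.get(table_base + table_index * 4, 0)
    if not user_entry & PRESENT:
        raise PageFault(hex(vaddr))
    return (user_entry & 0xFFFFF000) | (vaddr & 0xFFF)  # 12-bit offset

# Example: root table at 0x1000; entry 1 points to a user table at 0x2000,
# entry 2 maps a 4 MB page at 0x0C000000 directly.
memory = {
    0x1000 + 1 * 4: 0x2000 | PRESENT,
    0x2000 + 3 * 4: 0x0ABCD000 | PRESENT,
    0x1000 + 2 * 4: 0x0C000000 | PAGE_SIZE_BIT | PRESENT,
}
print(hex(walk(memory, 0x1000, 0x00403123)))   # 0xabcd123
print(hex(walk(memory, 0x1000, 0x00812345)))   # 0xc012345
```

The 4 MB case returns after a single memory access, which is exactly why such pages need no user page table.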
 
Figure: x86 virtual addressing for 4 KB pages
Figure: x86 root page table entry for 4 KB pages
Figure: x86 user page table entry for 4 KB pages
Figure: x86 virtual addressing for 4 MB pages
Figure: x86 root page table entry for 4 MB pages

Legend for the page table entry fields:
R: Reserved
AVL: Available to Software
PS: Page Size
A: Accessed
PCD: Page Cache Disable
PWT: Page Write-Through
U/S: User / Supervisor
W/R: Write / Read
V: Valid
D: Dirty
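The page table entry fields listed above can be matched against bit positions. The following sketch assumes the standard 32-bit (non-PAE) x86 entry layout documented by Intel: Valid (called Present by Intel) in bit 0, Write/Read in bit 1, User/Supervisor in bit 2, PWT in bit 3, PCD in bit 4, Accessed in bit 5, Dirty in bit 6, Page Size in bit 7 of root table entries, three AVL bits in 11..9 and the page address in the upper bits. Reserved bits and later additions such as the global bit are ignored, and decode is a hypothetical helper.

```python
# Decoding the legend's fields from a 32-bit x86 page table entry, assuming
# the standard non-PAE layout; "decode" is a hypothetical helper.
FLAG_BITS = {
    "V": 0,     # Valid (Intel calls it Present)
    "W/R": 1,   # Write / Read
    "U/S": 2,   # User / Supervisor
    "PWT": 3,   # Page Write-Through
    "PCD": 4,   # Page Cache Disable
    "A": 5,     # Accessed
    "D": 6,     # Dirty (meaningful in entries that map a page)
    "PS": 7,    # Page Size (root page table entries only)
}

def decode(entry: int) -> dict:
    fields = {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["AVL"] = (entry >> 9) & 0x7      # bits 11..9, available to software
    # Bits 31..12; for a 4 MB page only bits 31..22 form the page address.
    fields["base"] = entry & 0xFFFFF000
    return fields

# A present, writable, user-accessible, accessed and dirty 4 KB page entry
print(decode(0x0ABCD067))
```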

However, it is unwise to perform a page table walk every time a physical address is needed. Some kind of cache is required to hold page table entries, and that is exactly what TLBs (Translation Look-aside Buffers) are for. By analogy with regular caches, TLBs may be split or unified (I-TLB and D-TLB vs. U-TLB): the I-TLB contains entries for instruction pages and the D-TLB entries for data pages. There may be more than one TLB level, and the second one (S-TLB) is usually unified. Different TLBs may feature different set associativities, numbers of read/write ports, entry replacement policies and so forth.
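A TLB behaves much like a small set-associative cache keyed by virtual page number. The sketch below is illustrative only: the set and way counts, the use of the low page number bits as the set index and the LRU replacement policy are assumptions, since real TLBs differ in all of these.

```python
from collections import OrderedDict

class TLB:
    """Minimal set-associative TLB caching virtual -> physical page numbers.
    Sizes and the LRU replacement policy are illustrative assumptions."""

    def __init__(self, sets: int = 16, ways: int = 4):
        self.sets = sets
        self.ways = ways
        self.table = [OrderedDict() for _ in range(sets)]
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int):
        s = self.table[vpn % self.sets]      # low page number bits select the set
        if vpn in s:
            s.move_to_end(vpn)               # mark as most recently used
            self.hits += 1
            return s[vpn]
        self.misses += 1                     # caller must walk the page tables
        return None

    def fill(self, vpn: int, ppn: int) -> None:
        s = self.table[vpn % self.sets]
        if vpn not in s and len(s) >= self.ways:
            s.popitem(last=False)            # evict the least recently used entry
        s[vpn] = ppn
        s.move_to_end(vpn)

# Split first-level TLBs: one for instruction pages, one for data pages.
itlb, dtlb = TLB(sets=8, ways=4), TLB(sets=16, ways=4)
```

A unified second-level S-TLB could be modelled as another, larger instance consulted on a first-level miss.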
 
There are hardware-managed and software-managed TLBs. Hardware-managed TLBs require additional processor logic to perform page table walks and fill the TLBs. This approach allows these operations to run in the background, i.e. without interrupting the task. Software-managed TLBs depend on the operating system. Every time a TLB miss is detected, the processor generates a special exception. It is caught by the operating system, which saves the task state, performs a page table walk, fills the TLB and restores the task state. Some may say that software-managed TLBs are slower, but this is not really an issue: page table walk code is very small and well cacheable, so its impact on overall performance is almost unnoticeable. On the other hand, this approach is more flexible because it gives more power to operating system designers. By the way, a page table walk is not necessarily successful every time. If a virtual page has no corresponding physical page, a page fault exception is generated; the operating system then either brings the page into memory (for example, from a swap file) or, if the access is invalid, terminates the task.
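The software-managed flow can be sketched as below. It is a simplified model under stated assumptions: the page table is flattened into a single Python dict from virtual to physical page numbers (instead of the 2-level scheme), the exception classes and function names are hypothetical, and the saving and restoring of task state is only implied by the retry after the handler returns.

```python
# Simplified model of a software-managed TLB. The page table is flattened
# into one dict {virtual page: physical page}; all names are hypothetical.
class TLBMiss(Exception):
    """Raised by the simulated processor when the TLB has no matching entry."""

class PageFault(Exception):
    """Raised when the page table has no mapping for the virtual page."""

def cpu_translate(tlb: dict, vpn: int) -> int:
    # No hardware walker: a miss is reported to software as an exception.
    if vpn not in tlb:
        raise TLBMiss(vpn)
    return tlb[vpn]

def os_tlb_miss_handler(tlb: dict, page_table: dict, vpn: int) -> None:
    # The operating system walks the page table and fills the TLB itself.
    if vpn not in page_table:
        raise PageFault(vpn)
    tlb[vpn] = page_table[vpn]

def access(tlb: dict, page_table: dict, vpn: int) -> int:
    try:
        return cpu_translate(tlb, vpn)
    except TLBMiss:
        # Task state save/restore is implied here, not modelled.
        os_tlb_miss_handler(tlb, page_table, vpn)
        return cpu_translate(tlb, vpn)   # re-execute the faulting access
```

Because the handler is ordinary code, an operating system is free to use any page table organisation it likes, which is the flexibility mentioned above.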
 
It should be mentioned that most caches are physically tagged, but some may be virtually tagged. Virtual tagging improves performance because TLB look-ups are not needed for cache hits, but it introduces a serious hazard of line doubling. If two tasks share a single physical line which appears under different addresses in their virtual memory spaces, and one task modifies its copy, the other task may not be aware of it and continue using an obsolete copy. The easiest way to avoid this issue is to flush the cache on every task switch, though this is hardly attractive in terms of performance. Another way is to supply every line with an auxiliary field known as ASID (Address Space IDentifier) or RID (Region IDentifier). There are other ways as well. In general, virtual tagging is popular only for I-caches.
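The line doubling hazard, and the flush-on-task-switch cure, can be reproduced with a toy model. The sketch assumes a write-back cache whose lines are tagged by virtual address alone (no ASID), one shared physical line visible to task A at 0x1000 and to task B at 0x5000, and page maps at line granularity; all names and addresses are hypothetical.

```python
class VirtuallyTaggedCache:
    """Toy write-back cache tagged by virtual address only (no ASID)."""

    def __init__(self, memory: dict):
        self.memory = memory      # physical address -> value
        self.lines = {}           # virtual address -> [value, dirty, paddr]

    def read(self, vaddr: int, page_map: dict) -> int:
        if vaddr not in self.lines:                 # miss: fetch from memory
            paddr = page_map[vaddr]
            self.lines[vaddr] = [self.memory.get(paddr, 0), False, paddr]
        return self.lines[vaddr][0]

    def write(self, vaddr: int, value: int, page_map: dict) -> None:
        self.read(vaddr, page_map)                  # allocate on a write miss
        line = self.lines[vaddr]
        line[0], line[1] = value, True              # memory is not updated yet

    def flush(self) -> None:
        # Write dirty lines back to physical memory, then invalidate all lines.
        for value, dirty, paddr in self.lines.values():
            if dirty:
                self.memory[paddr] = value
        self.lines.clear()

def task_switch_demo(flush_on_switch: bool) -> int:
    memory = {0x9000: 1}                  # one shared physical line
    map_a = {0x1000: 0x9000}              # task A sees it at 0x1000
    map_b = {0x5000: 0x9000}              # task B sees it at 0x5000
    cache = VirtuallyTaggedCache(memory)
    cache.write(0x1000, 2, map_a)         # task A modifies its copy
    if flush_on_switch:
        cache.flush()                     # task switch with a cache flush
    return cache.read(0x5000, map_b)      # value observed by task B
```

Without the flush, task_switch_demo(False) returns the old value 1, because task A's modified copy still sits in the cache under a different virtual tag; with the flush, the dirty line is written back first and task_switch_demo(True) returns 2.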
 
Figure: Virtual and physical tagging

Copyright (c) Paul V. Bolotoff, 2007. All rights reserved.
A full or partial reprint without permission from the author is prohibited.
 