First of all, the basics of physical and virtual memory addressing need to be
introduced. The physical memory space refers to the actual amount of operating
memory installed plus the PCI address range, while the virtual one is an
imaginary space available to software tasks. As a matter of fact, the virtual
memory space is larger than or equal to the physical one. Every running task is
allocated some virtual memory which is mapped onto physical memory in some way,
so several virtual addresses may refer to the same physical address. Both
virtual and physical memory spaces use pages for addressing needs. Although
both kinds of pages are supposed to be of the same size, the actual physical
memory page size depends on memory row size, which is hardware dependent, but
this makes no real difference for either address space. Virtual pages of 4 KB
and 8 KB are used most often, though larger sizes such as 2 MB or 4 MB may also
be supported. Although the technique of page based virtual addressing was
developed a long time ago, the first x86 processor to support it was the Intel
386DX introduced in 1985, though basic capabilities of virtual addressing had
been implemented in the Intel 80286 about 3 years earlier (protected mode). For
that matter, the Intel 8086 and 8088 processors, which gave life to the whole
architecture back in 1978-79, support segment based addressing only (real
mode). In general, that mode is much more limited and inconvenient from a
programmer's point of view.
So, there are virtual and physical memory pages. Processor functional units
operate with virtual addresses, but the cache and operating memory controllers
have to deal with physical addresses. Conversion between the two kinds of
addresses is done in the A-box, though actual virtual address calculations may
be delegated to the E-box. The operating system manages virtual memory and
defines the ways of its use, though hardware requirements and limitations apply
as well. Common practice is to divide the complete virtual memory space into
two parts: one for the operating system's kernel and another for user tasks.
There are also page tables which serve the purpose of finding a physical page
address given a virtual one. A 2-level scheme seems to be the most popular:
there is a root page table and many user page tables. The root page table
typically fits within one memory page. Every user page table usually occupies
one memory page as well, and the maximal number of these tables is limited by
the maximal number of entries in the root page table. Thus, every record of the
root page table addresses one user page table, and every record of a user page
table addresses one physical memory page. A virtual memory address contains
offsets only and is completely useless without page tables. The process of
obtaining a physical page address through these tables is known as a page table
walk. All x86 processors starting with the Intel 386DX follow such a 2-level
scheme for 4 KB pages. However, most x86 processors starting with the Intel
Pentium introduced in 1993 also support 4 MB pages, which eliminates the need
for user page tables because all memory pages can be addressed through the root
page table alone. Moreover, both 4 KB and 4 MB pages may be used
simultaneously; there is a page size bit in every entry of the root page table
for that purpose.
The control bits of a page table entry are as follows:

  R   | Reserved
  AVL | Available to Software
  PS  | Page Size
  A   | Accessed
  PCD | Page Cache Disable
  PWT | Page Write-Through
  U/S | User / Supervisor
  W/R | Write / Read
  V   | Valid
  D   | Dirty
However, it's unwise to perform a page table walk every time a physical
address is needed. Some kind of cache must be employed to hold those page
table entries. That's exactly what TLBs (Translation Look-aside Buffers) are
for. By analogy with regular caches, they may be separate or unified (I-TLB
and D-TLB vs. U-TLB), so that an I-TLB contains entries related to instruction
pages and a D-TLB to data pages. There may be more than one TLB level, and the
second one (S-TLB) is usually unified. Different TLBs may feature different
set associativities, numbers of read/write ports, entry replacement policies,
and so forth.
There are hardware managed and software managed TLBs. The former require
additional processor logic to perform page table walks and fill the TLBs. This
approach allows for background operation, i. e. without interrupting the task.
The latter depend on the operating system. Every time a TLB miss is detected,
a special exception is generated by the processor. It's caught by the
operating system, which saves the task state, performs a page table walk,
fills the TLB, and restores the task state. Some may say that the software
managed approach is slower, but it isn't really an issue. Page table walking
code is very small and well cacheable, so it causes an almost unnoticeable
impact on overall performance. On the other hand, this approach is more
flexible as it gives more power to operating system designers. By the way, a
page table walk doesn't have to be successful every time. If a virtual page
has no corresponding physical page, a page fault exception is generated with
forthcoming task termination.
It should be mentioned that most caches are tagged physically, but some may be
tagged virtually. This solution improves performance because there is no need
for TLB look-ups, but it introduces a serious hazard of line doubling. If two
tasks share a single physical line which appears at different addresses in
their virtual memory spaces, and one task modifies its copy, the other task
may not be aware of it and continue using an obsolete copy. The easiest way to
avoid this issue is to flush the cache on every task switch, though that isn't
very attractive in terms of performance. Another way is to supply every line
with an auxiliary field known as an ASID (Address Space IDentifier) or RID
(Region IDentifier). There are other ways as well. In general, virtual tagging
is popular for I-caches only.