Alasir Enterprises

Alpha: The History in Facts and Comments
Dig my grave both long and narrow
Make my coffin neat and strong 

(from an old American song)

Paul V. Bolotoff
Release date: 14th of April 2005
Last modify date: 22nd of April 2007


Alpha 21164 (EV5, EV56) and 21164PC (PCA56, PCA57)

DEC disclosed the first details of the 2nd generation Alpha processor at the Hot Chips conference in Palo Alto (California, USA), which opened on the 14th of August 1994, although the 21164 (EV5) was officially announced only in a DEC press release dated the 7th of September 1994. The processor was based on the EV45 core and was an evolution of it rather than a revolutionary new design. Compared to EV4 and EV45, the number of both integer and floating-point pipelines was doubled. In addition, the floating-point pipelines were shortened from 10 stages to 9. The two integer pipelines were not identical, however: both could perform basic arithmetic and logical operations, but only the 1st could multiply and shift, and only the 2nd could process conditional and unconditional branches. Both could calculate virtual addresses for load instructions, but only the 1st could do so for stores. The floating-point pipelines differed as well: the 1st could execute any floating-point instruction except multiplies, which were the only instructions the 2nd could process. The I-box could fetch and decode up to 4 instructions per cycle to keep the execution units busy. The chip was manufactured with the same proprietary 4-layer 0.5µ CMOS5 process as EV45 and therefore required the same 3.3V power supply. It consisted of 9.3 mln. transistors (7.8 mln. of them in the integrated caches) on a 299mm² die, close to the limits of the manufacturing process. Core frequencies of 21164 ranged from 266MHz to 333MHz (TDP from 46W to 56W). Form-factor: IPGA-499 (Interstitial Pin Grid Array).
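The issue restrictions described above can be treated as a small matching problem: each instruction in a group needs its own pipeline that supports it. The sketch below is a simplified, hypothetical model; the pipe names E0/E1/FA/FM and the instruction-class labels are illustrative assumptions, and the model ignores data dependencies and the real EV5 slotting rules, checking only pipeline capabilities.

```python
from itertools import permutations

# Hypothetical model of the 4 EV5 execution pipelines and what each accepts.
# Pipe names and instruction-class labels are illustrative assumptions.
PIPES = {
    "E0": {"int_alu", "int_mul", "shift", "load", "store"},  # 1st integer pipe
    "E1": {"int_alu", "branch", "load"},                      # 2nd integer pipe
    "FA": {"fp_nonmul"},  # 1st FP pipe: any FP instruction except multiply
    "FM": {"fp_mul"},     # 2nd FP pipe: FP multiply only
}


def can_issue_together(group):
    """Return True if every instruction class in `group` can be assigned a
    distinct pipe that supports it (at most 4 instructions per cycle).

    Only pipe capabilities are checked; data dependencies, resource stalls
    and the real slotting rules are ignored.
    """
    if len(group) > len(PIPES):
        return False
    # Brute force is fine: at most 4! = 24 assignments to try.
    for pipes in permutations(PIPES, len(group)):
        if all(cls in PIPES[p] for cls, p in zip(group, pipes)):
            return True
    return False
```

For example, a multiply, a branch and one floating-point instruction of each kind fit into a single group of 4, whereas 2 stores or 2 branches cannot go together because only one integer pipeline handles each of them.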
The I-cache and D-cache were sized and organised as in EV4, i. e. 8Kb each, with the D-cache write-through. However, the D-cache was made dual-ported, i. e. it could deliver data for 2 load instructions per cycle. Trading transistors for performance, the D-cache was physically built from 2 identical 8Kb copies, so data could be read from either copy but had to be written to both. The processor also integrated 96Kb of L2 cache (S-cache, secondary cache), write-back and 3-way set associative, which the C-box accessed through a dedicated 128-bit data bus. An external B-cache built from cache SRAMs was supported as well, though it remained optional; it could be as large as 64Mb but was usually 1Mb to 4Mb. Its 128-bit data bus was still shared with the system data bus. EV5 thus supported 3 cache levels and was the first processor to feature such a hierarchy.
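The duplicated D-cache described above can be illustrated with a toy model. This is a hypothetical sketch, not DEC's design: the class and method names are invented for illustration, addresses are plain byte offsets into the 8Kb array, and tags, misses and the write-through path are left out. It only shows why 2 reads can proceed in the same cycle while every write has to update both copies.

```python
class DuplicatedDCache:
    """Toy model of a dual-ported cache built from 2 identical copies.

    Each read port is wired to its own copy, so 2 loads can be served in
    the same cycle; every store must update both copies to keep them equal.
    Tags, misses and write-through traffic are deliberately left out.
    """

    def __init__(self, size_bytes=8 * 1024):
        # Two physically separate arrays holding the same data.
        self.copies = [bytearray(size_bytes), bytearray(size_bytes)]

    def load_pair(self, addr_a, addr_b):
        # Port 0 reads copy 0 and port 1 reads copy 1, so they never conflict.
        return self.copies[0][addr_a], self.copies[1][addr_b]

    def store(self, addr, value):
        # The price of duplication: a single store writes both copies.
        for copy in self.copies:
            copy[addr] = value
```

The design trades twice the storage for a second read port without the area and timing cost of a true dual-ported array.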
The S-cache was accessed through a 4-stage pipeline: 2 cycles for tag lookup and bank activation plus 2 cycles for data access and delivery (16 bytes per cycle), with an extra cycle needed for the data to travel across the chip from the C-box to the D-cache and to either the E-box or the F-box. The EV5 designers considered performing tag lookup and data access in parallel, so that all 3 banks would deliver data and the right one would be selected on arrival. This would have shortened the pipeline by 1 stage but increased processor power consumption considerably (by an estimated 40%). The D-cache did operate this way, but it had only 1 bank of 8Kb rather than 3 banks of 32Kb each. Moreover, the D-cache read latency was reduced from 3 to 2 cycles. Each S-cache line was 64 bytes wide with one tag per line, but it could be addressed as two 32-byte sublines, since the I-cache and D-cache used 32-byte lines. The S-cache was inclusive of the D-cache. In turn, the B-cache was inclusive of the S-cache despite the latter's write-back policy and the difference in associativity. The I-TLB held 48 entries and the D-TLB 64 entries, both for pages sized from 8Kb to 4Mb; the D-TLB was dual-ported for loads in the same manner as the D-cache. The system data bus had a fixed width of 128 bits plus 16 bits for ECC protection and was still shared with the B-cache data path, though it was more efficient thanks to a new split-transaction protocol. The system address bus was 40 bits wide and the system control bus 10 bits wide.
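The S-cache figures above fix its geometry. As a sketch, assuming straightforward power-of-two indexing by the low address bits (the real EV5 indexing details are not modelled), 96Kb divided into 3 ways of 64-byte lines gives 512 sets, so each way holds 512 × 64 bytes = 32Kb, which matches the 3 banks of 32Kb each mentioned above. The helper name `split_address` is hypothetical.

```python
# S-cache parameters as given in the text.
SIZE_BYTES = 96 * 1024   # 96Kb
WAYS = 3                 # 3-way set associative
LINE_BYTES = 64          # one tag per 64-byte line

SETS = SIZE_BYTES // (WAYS * LINE_BYTES)   # 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # log2(64) = 6
INDEX_BITS = SETS.bit_length() - 1         # log2(512) = 9


def split_address(paddr):
    """Split a physical address into (tag, set index, subline, byte offset).

    Assumes simple power-of-two indexing by the low address bits.
    `subline` (0 or 1) selects the 32-byte half used by the 32-byte-line
    I-cache and D-cache.
    """
    offset = paddr & (LINE_BYTES - 1)
    index = (paddr >> OFFSET_BITS) & (SETS - 1)
    tag = paddr >> (OFFSET_BITS + INDEX_BITS)
    subline = offset >> 5
    return tag, index, subline, offset
```

With this split, bit 5 of the address alone decides which 32-byte subline of a 64-byte S-cache line an L1 refill needs.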
Micrograph and floor-plan of Alpha 21164 (EV5)

The 21164A (EV56) was introduced at the Microprocessor Forum in October 1995. It was EV5 redesigned for the proprietary 4-layer 0.35µ CMOS6 process and was manufactured at the same semiconductor plant in Hudson (Massachusetts, USA), which DEC had modernised beforehand at a cost of about 450 mln. USD. The most important architectural difference was BWX (Byte-Word Extension), a set of 6 additional instructions: 4 to load and store data in 8- or 16-bit quantities (LDBU, LDWU, STB, STW) and 2 to sign-extend bytes and words held in registers (SEXTB, SEXTW). From the start, the Alpha architecture could load and store data only in 32- or 64-bit quantities, which caused difficulties when porting or emulating code written for other processor architectures such as i386 or MIPS. Richard Sites submitted a request to implement BWX in hardware in June 1994, and it was approved in June 1995. However, BWX transfers also had to be supported by the system logic. Core frequencies of 21164A (EV56) ranged from 366MHz to 666MHz (TDP from 31W to 55W), and production started in the summer of 1996. The chip was also produced by Samsung under a licence agreement signed in June 1996; the 666MHz version was available only from Samsung. It consisted of 9.66 mln. transistors on a 209mm² die and required a dual-voltage power supply (2.5V for the core and 3.3V for input/output circuits).
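To illustrate what BWX replaced, the sketch below models how a byte was loaded and stored with the original instruction set, using the unaligned quadword instructions LDQ_U/STQ_U and the byte-manipulation instructions EXTBL, INSBL and MSKBL. It is a simplified, assumption-based model: memory is a little-endian `bytearray`, each function mimics only the effect of the instruction it is named after, and the helper names (`load_byte_pre_bwx` and so on) are hypothetical. With BWX, each of the two sequences collapses into a single LDBU or STB.

```python
MASK64 = (1 << 64) - 1


def ldq_u(mem, addr):
    """LDQ_U: load the aligned quadword containing addr (low 3 bits ignored)."""
    base = addr & ~7
    return int.from_bytes(mem[base:base + 8], "little")


def stq_u(mem, addr, quad):
    """STQ_U: store a quadword to the aligned address containing addr."""
    base = addr & ~7
    mem[base:base + 8] = quad.to_bytes(8, "little")


def extbl(quad, addr):
    """EXTBL: extract the byte lane selected by addr's low 3 bits."""
    return (quad >> ((addr & 7) * 8)) & 0xFF


def insbl(value, addr):
    """INSBL: move the low byte of value into the lane selected by addr."""
    return ((value & 0xFF) << ((addr & 7) * 8)) & MASK64


def mskbl(quad, addr):
    """MSKBL: clear the byte lane selected by addr."""
    return quad & ~(0xFF << ((addr & 7) * 8)) & MASK64


def load_byte_pre_bwx(mem, addr):
    # LDQ_U + EXTBL; with BWX a single LDBU does the same job.
    return extbl(ldq_u(mem, addr), addr)


def store_byte_pre_bwx(mem, addr, value):
    # Read-modify-write of the whole quadword; with BWX a single STB does it.
    quad = ldq_u(mem, addr)
    quad = mskbl(quad, addr) | insbl(value, addr)
    stq_u(mem, addr, quad)


def sextb(value):
    """SEXTB: sign-extend the low byte to a signed integer."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value
```

The pre-BWX store rewrites the 7 neighbouring bytes with the values it read a moment earlier, which is awkward for memory-mapped I/O and for data updated concurrently by another processor.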
Micrograph of Alpha 21164A (EV56)
DEC Alpha 21164A (EV56), front and back views

The 21164PC (PCA56) was introduced on the 17th of March 1997. It was a low-cost version of EV56 designed jointly by DEC and Mitsubishi. The S-cache and its accompanying logic were removed, but the I-cache was enlarged from 8Kb to 16Kb. It was manufactured with the same CMOS6 process as EV56 and required a 2.5V/3.3V power supply. It consisted of 3.5 mln. transistors on a 141mm² die. Core frequencies of 21164PC (PCA56) ranged from 400MHz to 533MHz (TDP from 26W to 35W). The package was changed to IPGA-413. There was also a 0.28µ 21164PC (PCA57), manufactured by Samsung. Its I-cache and D-cache were doubled in size, and the I-cache was made 2-way set associative. The transistor count rose to 5.7 mln., while the die size shrank to 101mm². It required lower voltages than PCA56 (2.0V for the core and 2.5V for input/output circuits). Core frequencies of 21164PC (PCA57) ranged from 533MHz to 666MHz (TDP from 18W to 23W).
In addition to BWX, inherited from EV56, PCA56 and PCA57 supported a new set of 13 SIMD (Single Instruction, Multiple Data) instructions called MVI (Motion Video Instructions): packing (PKWB, PKLB), unpacking (UNPKBW, UNPKBL), minimum (MINUB8, MINSB8, MINUW4, MINSW4), maximum (MAXUB8, MAXSB8, MAXUW4, MAXSW4) and motion estimation (PERR). The most interesting was the last one, PERR (Pixel ERRor), which summed the absolute differences of 8 pairs of byte-sized pixels at once. Unlike the MMX instruction set for i386 processors, which kept its data in the floating-point registers, MVI used the integer registers.
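The behaviour of a few MVI instructions can be sketched on 64-bit integers treated as packed lanes. This is a behavioural model under stated assumptions, not DEC's reference definition: lanes are numbered from the least significant end, and the underscore-prefixed helpers are hypothetical. PERR returns the sum of absolute differences of 8 unsigned byte pairs, the core operation of block matching in motion estimation.

```python
def _lanes(x, width):
    """Split a 64-bit integer into lanes of `width` bits, lowest lane first."""
    mask = (1 << width) - 1
    return [(x >> (i * width)) & mask for i in range(64 // width)]


def _join(lanes, width):
    """Pack lanes (lowest first) back into a 64-bit integer."""
    mask = (1 << width) - 1
    out = 0
    for i, v in enumerate(lanes):
        out |= (v & mask) << (i * width)
    return out


def _signed(v, width):
    """Interpret a `width`-bit lane as a two's complement signed value."""
    return v - (1 << width) if v >> (width - 1) else v


def perr(a, b):
    """PERR: sum of absolute differences of the 8 unsigned bytes in a and b."""
    return sum(abs(x - y) for x, y in zip(_lanes(a, 8), _lanes(b, 8)))


def minub8(a, b):
    """MINUB8: per-byte unsigned minimum."""
    return _join([min(x, y) for x, y in zip(_lanes(a, 8), _lanes(b, 8))], 8)


def maxsw4(a, b):
    """MAXSW4: per-16-bit-word signed maximum."""
    pairs = zip(_lanes(a, 16), _lanes(b, 16))
    return _join([x if _signed(x, 16) >= _signed(y, 16) else y
                  for x, y in pairs], 16)
```

For instance, comparing an 8-pixel row of a reference block with a candidate row yields a small PERR value when the rows are similar; a block-matching search keeps the candidate position with the smallest accumulated sum.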
The first standard system logic set developed for EV5 was DEC Alcor (21171). It supported a 33MHz system bus, up to 16Mb of B-cache, up to 8Gb of FPM or EDO DRAM with ECC through a 256-bit data path, and a single 64-bit PCI bus at 33MHz. ISA or EISA support could be added through a standard bridge such as i82378IB (ISA) or i82375EB (EISA). There was no built-in IDE controller; one could be added separately using third-party hardware. Physically, DEC Alcor consisted of 5 chips: 1 universal controller with PCI bus support (CIA, Control, I/O and Address) and 4 data switches (DSW). A revision of this system logic set for 21164A, i. e. with BWX support added, was called DEC Alcor 2 (21172). It was soon followed by DEC Pyxis (21174), a single-chip solution supporting a 66MHz system bus, up to 1.5Gb of 66MHz SDRAM with ECC or parity through a 128-bit data path, and a single 64-bit PCI bus at 33MHz. There was also VLSI Polaris, developed for systems based on 21164PC (PCA57).

Copyright (c) Paul V. Bolotoff, 2005-07. All rights reserved.
A full or partial reprint without a permission received from the author is prohibited.
Designed and maintained by Alasir Enterprises, 1999-2007