Alasir Enterprises

Alpha: The History in Facts and Comments
Dig my grave both long and narrow
Make my coffin neat and strong 

(from an old American song)

Paul V. Bolotoff
Release date: 14th of April 2005
Last modify date: 22nd of April 2007


Alpha 21064 (EV4, EV45) and 21066 (LCA4, LCA45)

The first processor of the Alpha family was called 21064 ("21" implied that Alpha was an architecture of the 21st century, "0" stood for the processor generation and "64" for the computational capability in bits), also code-named EV4 ("EV" was [supposedly] an abbreviation of "Extended VAX", and "4" denoted the generation of the manufacturing process, CMOS4; CMOS in turn stood for Complementary Metal Oxide Semiconductor). A prototype of EV4 was ready in 1991; it was made with the coarser CMOS3 process and therefore had smaller caches and no floating-point unit. Nevertheless, it was an important milestone for tuning and polishing the architecture and software. EV4 was officially introduced in November of 1992 at COMDEX in Las Vegas (Nevada, the USA). It was manufactured with a proprietary 3-layer 0.75µ process (later migrated to 0.675µ CMOS4S, an optical shrink of CMOS4). It consisted of 1.68 million transistors, had a die size of 233mm² and required a 3.3V power supply. Core frequencies of 21064 ranged from 150MHz to 200MHz (TDP from 21W to 27W). Multiprocessing was supported as one of the architecture's key features. Form-factor: PGA-431 (Pin Grid Array).
The L1 cache was integrated: 8Kb for instructions (I-cache, instruction cache), direct-mapped, and 8Kb for data (D-cache, data cache), direct-mapped and write-through. The read latency of the D-cache was 3 cycles. Every I-cache line consisted of 32 bytes of instructions, a 21-bit tag, an 8-bit branch history field and several auxiliary fields. Every D-cache line consisted of 32 bytes of data and a 21-bit tag. The L2 cache (B-cache, back-up cache) was an optional but recommended component built from external synchronous or asynchronous SRAM chips; it was direct-mapped, write-back and write-ahead, and could be sized up to 16Mb (usually from 512Kb to 2Mb). Every line consisted of 32 bytes of data or instructions with a 1-bit long-word parity or 7-bit long-word ECC field, a tag of up to 17 bits protected by an additional parity bit, and a 3-bit condition flag with its own parity bit. B-cache read and write timings were programmable in processor cycles. The system data bus was either 64-bit or 128-bit wide (programmable, with a 1-bit long-word parity or 7-bit long-word ECC field) and was multiplexed with the B-cache data bus, so the physical bus lines were switched between these two logical buses as needed. The system address bus was 34-bit wide. The B-cache was inclusive of the D-cache, i. e. it contained a full copy of the latter. Only the processor could read and write the B-cache, although the system logic was allowed to read B-tags (tags of the B-cache) because this was convenient for cache coherence mechanisms. In other words, the system logic could perform so-called snoop operations on the B-cache without involving the processor.
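The tag widths quoted above follow from the cache geometry. In a direct-mapped cache, the low bits of an address select the byte within a line (offset), the next bits select the line (index), and whatever remains must be stored as the tag. The Python sketch below assumes that the 34-bit physical address of the system address bus is what gets split and that "Kb" here means kilobytes; the function names are hypothetical and not taken from DEC documentation. Under these assumptions an 8Kb cache with 32-byte lines needs exactly a 21-bit tag, and the 17-bit maximum B-tag would correspond to a 128Kb B-cache, the smallest size for which a 17-bit tag is needed (the text itself does not state a minimum B-cache size).

```python
from __future__ import annotations

# Assumption: the cache is indexed by the 34-bit physical address (system address bus width).
PHYS_ADDR_BITS = 34


def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two, raising ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def direct_mapped_fields(cache_bytes: int, line_bytes: int,
                         addr_bits: int = PHYS_ADDR_BITS) -> tuple[int, int, int]:
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache."""
    offset_bits = log2_exact(line_bytes)                # byte within a line
    index_bits = log2_exact(cache_bytes // line_bytes)  # which line
    tag_bits = addr_bits - index_bits - offset_bits     # everything left over
    return tag_bits, index_bits, offset_bits


def split_address(addr: int, cache_bytes: int, line_bytes: int,
                  addr_bits: int = PHYS_ADDR_BITS) -> tuple[int, int, int]:
    """Split a physical address into (tag, index, offset)."""
    if not 0 <= addr < (1 << addr_bits):
        raise ValueError("address does not fit in the physical address width")
    _, index_bits, offset_bits = direct_mapped_fields(cache_bytes, line_bytes, addr_bits)
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    print(direct_mapped_fields(8 * 1024, 32))          # 8Kb D-cache: (21, 8, 5)
    print(direct_mapped_fields(128 * 1024, 32))        # 128Kb B-cache: (17, 12, 5)
    print(direct_mapped_fields(16 * 1024 * 1024, 32))  # 16Mb B-cache: (10, 19, 5)
```

Larger B-caches use more index bits and therefore need fewer tag bits, which is why the text speaks of a 17-bit *maximum* tag.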
EV4 had one integer pipeline (E-box, 7 stages) and one floating-point pipeline (F-box, 10 stages). The instruction decoder and scheduler (I-box) could issue up to 2 instructions per clock, in order, to the functional units, namely E-box, F-box and the load/store unit (A-box). The cache memory and system bus controller (C-box) worked together with A-box and managed the integrated I-cache and D-cache as well as the external B-cache. Virtual addresses were calculated by E-box. The branch prediction unit maintained a 4096-entry branch prediction table with 2 bits per entry. There was an I-TLB (Instruction TLB) of 8 entries for 8Kb pages and 4 entries for 4Mb pages, and a D-TLB (Data TLB) of 32 entries for pages from 8Kb to 4Mb. Both I-TLB and D-TLB were fully associative.
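The text does not say how the 2 bits of each branch prediction table entry were used. A common interpretation of such a table is a set of 2-bit saturating counters, and the Python sketch below models that, assuming the table is indexed by the low bits of the branch's instruction address (Alpha instructions are 4 bytes long, so the two lowest bits are dropped). The class name, the initial counter value and the indexing scheme are hypothetical illustrations, not a description of the actual EV4 logic.

```python
from __future__ import annotations


class TwoBitPredictor:
    """Hypothetical table of 2-bit saturating counters indexed by branch address.

    Counter values 0 and 1 predict "not taken", values 2 and 3 predict "taken".
    """

    def __init__(self, entries: int = 4096) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.table = [1] * entries  # assumption: every entry starts as "weakly not taken"

    def _index(self, pc: int) -> int:
        # Alpha instructions are 4 bytes long, so the two lowest PC bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)  # saturate at "strongly taken"
        else:
            self.table[i] = max(self.table[i] - 1, 0)  # saturate at "strongly not taken"


if __name__ == "__main__":
    bp = TwoBitPredictor()
    loop_branch = 0x120001000        # hypothetical address of a loop-closing branch
    outcomes = [True] * 9 + [False]  # a loop body executed 10 times
    hits = 0
    for taken in outcomes:
        hits += bp.predict(loop_branch) == taken
        bp.update(loop_branch, taken)
    print(f"{hits}/{len(outcomes)} predicted correctly")  # prints 8/10 predicted correctly
```

The point of using two bits instead of one is hysteresis: a single loop exit moves the counter from "strongly taken" only to "weakly taken", so the branch is still predicted correctly on the next entry into the loop.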
Micrograph of Alpha 21064 (EV4); floor-plan of Alpha 21064 (EV4)
DEC Alpha 21064 (EV4): front view; back view

The first Alpha workstation, DEC 3000 Model 500 AXP (code-named Flamingo), was introduced in November of 1992. It carried a 150MHz 21064, 512Kb of B-cache, 32Mb of main memory, an integrated 8-bit video controller with 2Mb of VRAM, a 1Gb SCSI HDD, a SCSI CD-ROM, a built-in 10Mbit Ethernet controller (thick coaxial and twisted pair), built-in sound and ISDN controllers, and a 19" monitor (1280x1024 at 72Hz). All peripherals were served by the proprietary TURBOchannel bus. The price was impressive: 39,000 USD. Although a less expensive workstation existed, DEC 3000 Model 400 AXP with a 133MHz 21064, a far more affordable machine was still needed.
DEC had been trying since February of 1991 to design a 21064-powered personal computer supporting the ISA or EISA peripheral bus. The Beta project produced 35 working systems, each using a 100MHz EV4 prototype, an Intel 82380 ISA system logic set and other proprietary and third-party hardware. The subsequent Theta project, however, largely failed because of engineering mistakes in its mainboard, which was based on the Intel 82350DT EISA system logic set. Nevertheless, two design teams located in Maynard (Massachusetts, the USA) and Ayr (Scotland, the UK) worked around all the issues and had DECpc AXP 150 (code-named Jensen) ready by August of 1992. It contained a 150MHz EV4, 512Kb of B-cache, an AT form-factor mainboard, industry-standard 72-pin FPM parity SIMMs and EISA peripherals. Although this machine ran DEC OSF/1 and OpenVMS, its future was tied to Windows NT. Accordingly, DECpc AXP 150 was introduced on the 28th of October 1992 in New York (New York, the USA) at the Windows on Wall Street presentation, where Bill Gates demonstrated this OS for the first time.
There were also three 21064-powered server families: the 2-processor DEC 4000 and the 6-processor DEC 7000 (with 182MHz processors) and DEC 10000 (with 200MHz processors). DEC 7000 and DEC 10000 were modular designs; they featured 4Mb of B-cache per processor and could accommodate up to 14Gb of main memory (with 7 memory modules of 2Gb each installed). While DEC 4000 was designed for the FutureBus+ peripheral bus, DEC 7000 and DEC 10000 could also be configured for the XMI peripheral bus with one or more appropriate modules. DEC 7000 and DEC 10000 could also be fitted with NVAX+ processors, in which case they were called VAX 7000 and VAX 10000 (reconfiguration was possible simply by replacing the processor modules).
For all its excellent performance, 21064 was too expensive for most potential customers, so a low-cost sibling, 21066 (LCA4 or LCA4S), was released in September of 1993. It was based on the EV4 core but additionally integrated the main memory and PCI controllers as well as several secondary functional units. On the other hand, the system data bus width was reduced to 64 bits, which hurt performance. LCA4 was manufactured using a 0.675µ CMOS4S process, resulting in a die even smaller than that of the original EV4 (209mm² compared to 233mm²). However, its clock frequencies were lowered to a range of 100MHz to 166MHz, presumably to avoid overheating in the poorly ventilated desktop cases of those days and to avoid creating another competitor to EV4. It consisted of 1.75 million transistors and required a 3.3V power supply. The design of this processor was licensed to Mitsubishi, which manufactured LCA4 as well, including a 200MHz version.
21064A (EV45) was announced at the Microprocessor Forum in October of 1993. It was a modified EV4 manufactured with a proprietary 4-layer 0.5µ CMOS5 process. 21066A (LCA45) was presented at COMDEX in November of 1994. It was modified in almost exactly the same way as EV4 had been for EV45, in terms of both the manufacturing process and core internals. Incidentally, DEC's marketing people made a habit of adding a letter to a processor's model name after a redesign for a more advanced process. The cores of EV45 and LCA45 changed only modestly: the I-cache and D-cache of EV45 were doubled in size (16Kb I-cache + 16Kb D-cache) and their data and tag fields gained a parity bit each, the branch history fields of the I-cache were expanded to 16 bits, the D-cache became 2-way set associative, and a 1-bit byte parity mode was added to the existing integrity modes of the system data bus. In addition, both EV45 and LCA45 received a modified F-box with faster division: EV4 executed a floating-point division instruction in 34 cycles for single-precision operands and in 63 cycles for double-precision operands regardless of operand values, while EV45 needed 19 to 34 cycles for single-precision operands and 29 to 63 cycles for double-precision operands, depending on operand values. LCA45 was also manufactured by Mitsubishi. The dies shrank to 164mm² for EV45 and to 161mm² for LCA45. The transistor count increased to 2.85 million for EV45 and remained at 1.75 million for LCA45. Finally, power consumption per cycle decreased for both processors, though the power supply voltage stayed at 3.3V. Core frequencies of 21064A ranged from 200MHz to 300MHz (TDP from 24W to 36W), and those of 21066A from 166MHz to 233MHz.
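To put the division latencies above into absolute terms, the cycle counts can be converted to nanoseconds at each chip's top clock frequency quoted in this article (200MHz for 21064, 300MHz for 21064A). The Python sketch below does only this arithmetic; it ignores pipelining, overlap with other instructions and memory effects, and the dictionary and function names are illustrative, not from any DEC source.

```python
from __future__ import annotations

# Floating-point division latencies in cycles, (best case, worst case), as quoted in the text.
LATENCY = {
    "EV4":  {"single": (34, 34), "double": (63, 63)},
    "EV45": {"single": (19, 34), "double": (29, 63)},
}


def cycles_to_ns(cycles: int, mhz: float) -> float:
    """Convert a cycle count to nanoseconds at the given core clock in MHz."""
    return cycles * 1000.0 / mhz


if __name__ == "__main__":
    for core, mhz in (("EV4", 200), ("EV45", 300)):
        for precision, (best, worst) in LATENCY[core].items():
            print(f"{core} @ {mhz}MHz, {precision}: "
                  f"{cycles_to_ns(best, mhz):.1f} to {cycles_to_ns(worst, mhz):.1f} ns")
```

For example, a double-precision division took 315ns on a 200MHz 21064 but between roughly 97ns and 210ns on a 300MHz 21064A, so the gain came both from the higher clock and from the value-dependent algorithm.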
21066 and 21066A were used mostly in DEC UDB (Universal Desktop Box; code-named Multia) personal computers, though there were also UDBs powered by Intel Pentium processors. 21066 and 21066A were also used in the Tadpole ALPHAbook 1, the first (and perhaps the only) Alpha-based notebook.
DEC developed equipment for the United States Department of Defense as well, and in 1994 it introduced 21068 at 66MHz and 21068A at 100MHz for this market. They were derived from LCA4 and LCA45 respectively, with the improvements necessary to meet military needs (passive cooling, resistance to vibration and extreme temperatures, etc.). In particular, military systems could be expected to run over a temperature range of -54 °C to +70 °C with a 30-minute tolerance of +85 °C (taken from MIL-E-5400T, General Specification for Electronic Equipment, Aerospace, dated May 1990). Back in 1986, DEC had signed an agreement with Raytheon Company to jointly develop computer systems for the Department of Defense, although these were based on the VAX architecture at the time (a conversion of VAX 6200). Eventually, the DEC AXPvme 64 design was ready by 1994. It carried a 66MHz 21068, the VME64 peripheral bus (a must for military equipment) and some PCI and ISA peripherals, including a DEC 21040 Ethernet controller. This module was sold as Raytheon Model 910.
The first two 21064A-based workstations were announced in July of 1994: DEC 3000 Model 900 AXP and Model 700 AXP (code-named Flamingo45 and Sandpiper45 respectively). They were powered by a 275MHz and a 225MHz 21064A respectively, and accommodated 2Mb of B-cache, 128Mb of main memory, a ZLX family 24-bit video card, FastSCSI peripherals, and the same networking, sound and ISDN hardware as Model 500 AXP. The first workstation was offered for 43,400 USD, the second for 27,700 USD.
The first system logic sets for the Alpha architecture supported the TURBOchannel, FutureBus+ and XMI peripheral buses. In particular, an unnamed one for DEC 3000 workstations consisted of 6 chips: 1 ADDR ASIC, 1 TC (TURBOchannel) ASIC and 4 SLICE ASICs, which served a 128-bit system data path and a 256-bit memory data path as well as a 32-bit data path to the TC ASIC. Although these and other peripheral bus implementations were fast for their day (about 100MB/s per bus), they didn't gain any significant market share, so only a very limited set of peripherals was available for them. DEC therefore turned its attention to PCI, a new and promising peripheral bus. A new system logic set called DEC Apecs was introduced in 1994 in two editions: for a 64-bit system data bus (21071) and for a 128-bit one (21072). The difference was that 21071 consisted of 4 chips (1 universal controller, COMANCHE; 2 data slices, DECADE; 1 PCI bus controller, EPIC), while 21072 consisted of 6 (2 additional DECADE chips). Apecs supported a 33MHz system bus frequency, up to 16Mb of B-cache, up to 4Gb of FPM parity memory with access times from 100 to 50ns (8 banks) and up to 16Mb of dual-ported VRAM for an optional video frame-buffer (1 bank). Support for the ISA or EISA bus could be implemented through a standard bridge such as Intel 82378IB (ISA) or 82378EB (EISA). Apecs was implemented in the EB64+ and AlphaPC 64 (code-named Cabriolet) mainboard designs.
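The practical gap between the two Apecs editions can be estimated with simple arithmetic: theoretical peak bandwidth equals bus width in bytes times clock frequency. The Python sketch below assumes one full-width transfer on every cycle of the nominal 33MHz system bus, which is a simplification (real bus transactions took several cycles, and the text gives no sustained figures); the function name is illustrative.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth in MB/s (10**6 bytes per second).

    Assumes one transfer of the full bus width on every clock cycle,
    which real bus transactions did not achieve.
    """
    return width_bits / 8 * clock_mhz


if __name__ == "__main__":
    for name, width in (("21071 (64-bit)", 64), ("21072 (128-bit)", 128)):
        # prints 264 MB/s for 21071 and 528 MB/s for 21072
        print(f"{name}: {peak_bandwidth_mb_s(width, 33):.0f} MB/s peak at 33MHz")
```

Under this assumption the 128-bit 21072 doubles the peak system bus bandwidth of the 64-bit 21071, the same trade-off that 21066 made when its system data bus was narrowed to 64 bits.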
Drawing of DEC Apecs

Copyright (c) Paul V. Bolotoff, 2005-07. All rights reserved.
A full or partial reprint without a permission received from the author is prohibited.
Designed and maintained by Alasir Enterprises, 1999-2007