Alasir Enterprises
x86 CPU Reference (part 2)
 
(continued)

32 BITS: SUPERSCALAR

4.1. Intel Pentium (P5)

Since Intel was unable to legally protect the "5x86" name from numerous competitors, the "Pentium" name appeared. A superscalar CPU that featured two independent 5-stage integer pipelines (the 1st pipeline could process all integer instructions and supported the barrel shifter, the 2nd -- only the most commonly used ones), two 8-stage floating-point pipelines (multiplexed with the integer pipelines; the 1st pipeline could process all floating-point instructions, the 2nd -- only FXCH), two address generators (one per pipeline), and a branch target buffer with 1-level branch prediction logic. However, both pipelines had to execute either integer or floating-point code at any given moment. The first x86-family CPU to work in dual-processor configurations. The on-chip 16Kb cache was split equally between data and instructions (Harvard architecture); the data cache was triple-ported to support two simultaneous data transfers and one inquiry within a clock tick; the instruction cache was triple-ported as well. Operated without clock multiplication. Form-factor: Socket 4.

Due to the large core, numerous overheating issues were reported. Moreover, in October of 1994 Dr. Thomas R. Nicely, Professor of Mathematics at Lynchburg College (Lynchburg, VA), reported a bug present in the FPUs of all Pentium CPUs: the double-precision part of the mantissa was not computed correctly when dividing in some areas of the mantissa space of the divisor. This "FDIV bug" was fixed after November of 1994.

On integer tasks Pentium performed about 2 times faster than i486DX2, but on floating-point -- over 3 times:

CPU          SPECint92  SPECfp92
i486DX2-66   39.6       18.8
Pentium-66   78.0       63.6
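
The ratios quoted above can be checked directly from the two table rows. A minimal sketch; the `scores` dictionary and the `speedup` helper are illustrative names, not part of the original reference:

```python
# SPEC92 scores as listed in the table: (SPECint92, SPECfp92).
scores = {
    "i486DX2-66": (39.6, 18.8),
    "Pentium-66": (78.0, 63.6),
}

def speedup(new, old):
    """Return (integer, floating-point) speedup of `new` over `old`."""
    new_int, new_fp = scores[new]
    old_int, old_fp = scores[old]
    return new_int / old_int, new_fp / old_fp

int_ratio, fp_ratio = speedup("Pentium-66", "i486DX2-66")
print(f"integer: {int_ratio:.2f}x, floating-point: {fp_ratio:.2f}x")
# integer: 1.97x, floating-point: 3.38x
```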

Later, numerous companies built different Pentium-class CPUs that often performed better on integer tasks, but always worse on floating-point.

Core clock speeds: 60 or 66MHz.

4.2. Intel Pentium (P54C)

A P5 built with a 0.6µ technology, which thus required a lower voltage. No significant differences. Form-factor: Socket 5.

Core clock speeds: 75, 90 or 100MHz.
FSB clock speeds: 50, 60 and 66MHz, respectively.

4.2. NexGen Nx586

NexGen (from Milpitas, CA) released the Nx586 processor in March of 1994, featuring a superscalar RISC architecture. Two 7-stage integer pipelines, one address generator, one data transfer unit (universal load & store), the 32Kb L1 cache, the L2 cache controller, no FPU (could be installed separately, Nx587). Out-of-order execution, register renaming, data forwarding, and a 2-level branch prediction logic. The L2 cache (256Kb, independent 64-bit bus) was located on the motherboard, but operated at the full core clock speed. Had to be installed into NxVL-based motherboards with a proprietary PGA socket.

Since NexGen had no manufacturing facilities of its own, all CPUs were produced by IBM, but unfortunately not in a sufficient volume to gain a significant market share.

On integer tasks performed about 10% faster than equally-clocked Pentium.

Core clock speeds: 70, 75, 84 or 93MHz.
FSB clock speeds: 35, 37, 42 or 46MHz, respectively.

Note: initial samples were running at 60 or 66MHz, but commercially available chips started from 70MHz, in September of 1994.

4.2. Intel Pentium (P54CQS)

120MHz P54C built after a technology shrink (to 0.35µ), with no multiprocessor support. FSB clock speed: 60MHz. Form-factor: Socket 5.

4.3. Cyrix 6x86 (M1, M1R)

A superscalar RISC CPU: two 7-stage integer pipelines, one non-pipelined FPU, two data transfer units (1 load, 1 store). Out-of-order execution, branch prediction (512-entry branch target buffer), return stack, register renaming, data forwarding. The 16Kb unified L1 cache and the 256-byte direct-mapped instruction subcache. Multiprocessor support.

80 and 100MHz versions were manufactured by SGS-Thomson, 100MHz and up -- by IBM; they also sold 6x86 under their brands. Form-factor: Socket 5.

Core clock speeds: 80, 100, 110, 120, 133 or 150MHz.
FSB clock speeds: 40, 50, 55, 60 or 66MHz.

Note: supported 2x and 3x multipliers only.
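
With only 2x and 3x multipliers available, each listed core clock should be a product of a listed FSB clock and one of those multipliers. The sketch below takes the two lists at face value and searches for matching pairs; the tolerance constant and the `settings_for` helper are assumptions introduced for illustration (the tolerance absorbs rounding such as 2 x 66 = 132 against the quoted 133MHz).

```python
# Values as listed for the 6x86, in MHz.
CORE_MHZ = [80, 100, 110, 120, 133, 150]
FSB_MHZ = [40, 50, 55, 60, 66]
MULTIPLIERS = [2, 3]
TOLERANCE_MHZ = 2  # assumption: allowance for rounded figures

def settings_for(core):
    """Return every (fsb, multiplier) pair that yields `core` within tolerance."""
    return [
        (fsb, mult)
        for fsb in FSB_MHZ
        for mult in MULTIPLIERS
        if abs(fsb * mult - core) <= TOLERANCE_MHZ
    ]

for core in CORE_MHZ:
    print(core, settings_for(core))
```

Under these listed values, 150MHz is reachable only as 3 x 50MHz, and 120MHz either as 2 x 60MHz or 3 x 40MHz.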

4.4. NexGen Nx686

A result of Nx586's evolution. Two integer pipelines, one non-pipelined FPU, two data transfer units (1 load, 1 store), one multimedia unit (on the 1st integer pipeline), and one branch prediction unit.

The L2 cache controller was integrated into the core, so off-chip L2 cache memory ran at the full core clock speed (like in Nx586), and could be as much as 2Mb. Was designed for use with NxPCI chipset, and was expected to cost significantly less than upcoming Intel's Pentium Pro while offering comparable or even better performance on integer tasks.

Was not available commercially. NexGen was acquired by AMD in the beginning of 1996, and after some modifications to meet the requirements of the Socket 7 form-factor AMD released Nx686 as K6 (AMD's original K6 core was scrapped; AMD engineers were assigned to develop K7 (aka Athlon), while K6 and derivatives were supported mostly by former NexGen employees).

Core clock speed: 180MHz.
FSB clock speed: 60MHz.

4.5. Intel Pentium Pro (P6)

The next generation CPU compared to Intel Pentium. Superscalar superpipelined RISC core, large L2 cache with ECC support operating at the full core clock speed (it was not integrated into the core, but located within the chip, and connected to the core using Dual Independent Bus). The L2 cache controller was built into the core, and utilised a 64-bit bus. Two 12-stage integer pipelines, one floating-point pipeline (the length wasn't documented, should be about 25 stages; shared an I/O port with the 1st integer pipeline), a jump execution unit (shared an I/O port with the 2nd integer pipeline), two address generators (1 load, 1 store), and two data transfer units (1 load, 1 store). Out-of-order execution, multiple branch prediction, and instruction pool. Three instruction decoders worked in parallel, and each could process a command per clock tick. One instruction retire unit was capable of converting three internal commands into regular x86 ones, per clock tick. Form-factor: Socket 8.

Due to its RISC core, the fast FPU and L2 cache, Pentium Pro outperformed Pentium significantly. However, the core was optimised for 32-bit integer code, thus performance increase with 16-bit was not so tangible.

By the way, the suffix "Pro" meant not "professional", but "Precision RISC Organisation".

Core clock speeds: 150MHz (256K L2 cache), 166MHz (512K L2 cache), 180MHz (256K L2 cache), 200MHz (256, 512 or 1024K L2 cache).
FSB clock speeds: 60 or 66MHz.
[Figure: Intel P6 architecture]

4.6. NexGen Nx586FP

After moving to a 0.44µ technical process, it was possible to increase the core clock speed. Also, because of the smaller die size, it became possible to build the math coprocessor into the core. Hence, the new CPU was called Nx586FP, and was designed for use with NxPCI-based motherboards (with 1Mb L2 cache).

On integer tasks performed up to 20% faster than equally-clocked Pentium.

Core clock speeds: 120 or 133MHz.
FSB clock speeds: 60 or 66MHz, respectively.

4.7. Intel Pentium (P54CS)

A modification of P54C, of the same technical process and voltage as P54CQS. Form-factor: Socket 5.

Core clock speeds: 133, 150, 166 or 200MHz.
FSB clock speeds: 60 or 66MHz.

4.8. AMD K5 (Krypton-5)

A CPU designed by AMD itself to feature a RISC architecture. Originally code-named Kryptonite-5 (after a fictional element that could destroy Superman), later changed to Krypton-5 to avoid possible infringement claims from DC Comics.

Three 5-stage integer pipelines, one non-pipelined FPU, one data transfer unit. Out-of-order execution and branch prediction. The on-chip cache was separated not equally: 8K for data (dual-ported) and 16K for instructions. Form-factor: Socket 5.

There were 2 versions of K5: SSA/5 (Model 0), a pre-release, with the higher internal wait states and the branch prediction disabled, and 5k86 (Models 1, 2 and 3), a fully-featured version. SSA/5 performed on integer tasks like equally-clocked Pentium, but 5k86 -- up to 30% faster. Could be a good alternative to the latter, but appeared too late and could only compete on the low-end market.

Core clock speeds: 75, 90 or 100MHz (SSA/5); 90, 100, 105, 116 and 133MHz (5k86).
FSB clock speeds: 50, 60 and 66MHz.

Note: K5 could operate at 1.75x of the FSB clock speed (105 and 116MHz versions). Motherboard settings of 1.5x and 2x were interpreted as 1.5x; 2.5x as 1.75x; 3x as 2x. K5 Model 0 supported 1.5x and 2x multipliers; Model 1 -- 1.5x only; Model 2 -- 1.5x and 1.75x; Model 3 -- 1.5x, 1.75x and 2x.
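
The jumper remapping described in the note can be written as a small lookup table. This is a sketch only: it ignores the per-model restrictions listed above, and the names `K5_MULTIPLIER` and `k5_core_mhz` are hypothetical. The nominal 66MHz bus is taken as 200/3 MHz in the second call, which is an assumption about the exact bus frequency.

```python
# Motherboard jumper setting -> multiplier the K5 actually applied,
# following the note above (per-model restrictions are not modelled).
K5_MULTIPLIER = {1.5: 1.5, 2.0: 1.5, 2.5: 1.75, 3.0: 2.0}

def k5_core_mhz(fsb_mhz, jumper):
    """Core clock for a given FSB clock and motherboard jumper setting."""
    return fsb_mhz * K5_MULTIPLIER[jumper]

print(k5_core_mhz(60, 2.5))                  # 105.0
print(round(k5_core_mhz(200 / 3, 2.5), 1))   # 116.7
```

The two results correspond to the 105 and 116MHz versions listed among the 5k86 core clock speeds.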

4.9. Intel Pentium MMX (P55C)

Another modified version of P54C. Included the MMX (Multi Media eXtension, or Matrix Math eXclusive) unit supporting 57 new multi-media instructions to improve audio/video processing speed. Also the larger 16Kb + 16Kb L1 cache, and the improved branch prediction mechanism (2-level, taken from Pentium Pro). Required dual voltage. Form-factor: Socket 7.

Core clock speeds: 166, 200 or 233MHz (66MHz FSB).

Note: 233MHz P55C interpreted 1.5x multiplier as 3.5x because motherboards lacked 3.5x jumper setting.

4.10. Cyrix 6x86L (M1L)

A low-power modification of 6x86, that was manufactured by IBM with a 0.35µ technology. Required dual voltage. No functional differences. Form-factor: Socket 7.

Core clock speeds: 100, 110, 120 or 150MHz.
FSB clock speeds: 50, 60 or 66MHz.

4.11. Cyrix MediaGX

An all-in-one chip, targeted at the market of extremely low-cost PCs. Included the memory, video, and PCI bus controllers. The video frame buffer was kept in main memory (UMA, Unified Memory Architecture). Had to be used with the Cx5510 core logic set. Was designed for a proprietary BGA socket.

Core clock speeds: 120, 133, 150, 180, 200, 233 or 266MHz.
FSB clock speeds: 60 or 66MHz.
[Figure: Cyrix MediaGX diagram]

4.12. Intel Pentium II (Klamath)

A P6-class CPU. The slightly modified core from Pentium Pro, plus two MMX execution units (1 on the integer pipeline, 1 on the floating-point one). The 512Kb L2 cache running at half the core speed was located outside the core, but inside the CPU cartridge; communicated with the core using the same Dual Independent Bus. The first Intel CPU to feature a fixed clock multiplier. Could be used in 2-way SMP systems. Form-factor: Slot 1.

The CPUs manufactured since July of 1997 had the L2 cache with ECC. Regardless of the 32-bit address bus, only 512Mb were cacheable by the L2 cache.

Core clock speeds: 233, 266 or 300MHz (66MHz FSB).

4.13. AMD K6

Actually, AMD K6 is a modified version of NexGen Nx686 (the original K6 core was scrapped soon after the acquisition of NexGen in the beginning of 1996). A RISC core with two execution pipelines. The 1st pipeline handled the 1st integer ALU, one integer divide & multiply unit, one integer shifter & rotator, several auxiliary integer units, one MMX ALU, one MMX shifter & packer & unpacker and one MMX multiplier. The 2nd pipeline handled the 2nd integer ALU only (operated with basic integer instructions). The core also included a non-pipelined FPU, two data transfer units (1 load, 1 store), and a branch prediction unit. The L1 cache was split equally, 32Kb for data and 32Kb for instructions. The on-board L2 cache controller of Nx686 was abandoned. Four decoders could process either two short (up to 7 bytes long), or one long (up to 11 bytes long), or one complex (up to 15 bytes long) instruction per clock tick.

The branch logic operated with the 2-level prediction scheme using a 8192-entry branch history table, a 16-entry by 16-byte branch target cache, a 16-entry return stack and a branch prediction unit; generally, it was able to deliver a successful prediction rate of about 95%. Form-factor: Socket 7.
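
To give a feel for what a 2-level prediction scheme does, here is a generic textbook-style sketch: a short register of recent branch outcomes is combined with the branch address to select a 2-bit saturating counter. It is not the documented K6 design; only the table size of 8192 entries is taken from the text, while the 4-bit history length, the indexing function and all names are assumptions made for illustration.

```python
class TwoLevelPredictor:
    """Generic two-level predictor: history register + table of 2-bit counters."""

    def __init__(self, history_bits=4, table_entries=8192):
        self.history_bits = history_bits        # assumption: not from the text
        self.table_entries = table_entries      # 8192 entries, as in the text
        self.history = 0                        # recent outcomes, newest in bit 0
        self.counters = [1] * table_entries     # 0..3, start at "weakly not taken"

    def _index(self, pc):
        # Hypothetical indexing: branch address bits combined with the history.
        return ((pc << self.history_bits) | self.history) % self.table_entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, pc, outcomes, warmup):
    """Fraction of correct predictions after the first `warmup` branches."""
    hits = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup and predictor.predict(pc) == taken:
            hits += 1
        predictor.update(pc, taken)
    return hits / (len(outcomes) - warmup)


# A branch that alternates taken / not taken: a single counter would keep
# mispredicting it, while the history makes the pattern fully predictable.
pattern = [i % 2 == 0 for i in range(200)]
print(accuracy(TwoLevelPredictor(), pc=0x40, outcomes=pattern, warmup=100))  # 1.0
```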

Integer performance of K6 was comparable to that of Pentium II, but floating-point performance was even worse than that of Pentium MMX (by about 15%).

Core clock speeds: 166, 200 or 233MHz (Model 6), 200, 233, 266 or 300MHz (Model 7), all 66MHz FSB.

4.14. Cyrix 6x86MX (M2)

A descendant of 6x86L. Two MMX execution units were added, the L1 cache size was increased to 64Kb (still unified), the FPU and the memory management unit were modified slightly. Was manufactured by IBM. Form-factor: Socket 7.

Due to the increased cache size and the well-optimised RISC architecture, 6x86MX significantly outperformed equally-clocked Pentium MMX and K6 on integer tasks (by about 50% and 20%, respectively), but handled floating-point tasks significantly slower than even K6 (by about 30%, or 40% when compared to Pentium MMX).

Core clock speeds: 133, 150, 166, 188, 208, 225, 233, 250 or 263MHz.
FSB clock speeds: 60, 66, 75 or 83MHz.

4.15. IDT WinChip (C6)

IDT manufactured CPUs designed by their subsidiary, Centaur Technology. These chips were closer to 486 than to 586: the single integer pipeline, one non-pipelined FPU, one MMX pipeline. No out-of-order execution, register renaming, data forwarding, even no branch prediction logic! Though the integer and MMX units could operate in parallel, achieving some level of superscalarity. Anyway, the performance was quite low. But, power consumption was low too, due to the small core size, and because of a single voltage nature C6 could be used successfully for upgrading purposes. Form-factor: Socket 7.

Core clock speeds: 180 or 240MHz (60MHz FSB), 200MHz (66MHz FSB), 225MHz (75MHz FSB).

4.16. Intel Pentium II (Deschutes)

The next (and the last) release of Pentium II. After moving to a 0.25µ process it was possible to increase core clock speeds, but architecture remained almost the same. Supported the L2 cache ECC checking, and the cacheable range was increased to 4Gb. Among other improvements, the bus interface unit was enhanced towards a better performance.

Models designed for 66MHz FSB overclocked well, so there was a high probability of getting them to run stably at 100MHz FSB.

Core clock speeds: 266, 300 or 333MHz (66MHz FSB), 350, 400 or 450MHz (100MHz FSB).

4.17. Intel Celeron (Covington)

In the beginning of 1998 Pentium II was a mainstream CPU, while Pentium Pro was still successfully used in mid-range and high-end servers because of the native 4-way SMP support. However, there was significant demand for a low-end product, a successor of Pentium MMX. Hence, a stripped-down version of Pentium II (Deschutes) was introduced, without the L2 cache and running at 66MHz FSB, with a fixed multiplier. Form-factor: Slot 1.

According to numerous benchmarks, the performance decrease was significant enough to fall behind PII-233. Moreover, it handled integer tasks even slower than P-200MMX. On the other hand, because of the L2 cache's absence and thus the compact core, this CPU was reported to overclock well: it could almost always run at 83MHz FSB, though less reliably at 100MHz.

Intel marketed this CPU for uniprocessor systems only, but with a little pinout "modification" it could be used in 2-way SMP systems.

Core clock speeds: 266 or 300MHz (66MHz FSB).

4.18. Cyrix 6x86MII (M2)

Nothing but 6x86MX built after technology shrinks. Was manufactured by National Semiconductor, a new owner of Cyrix. Form-factor: Socket 7.

Core clock speeds: 200, 225, 233, 250, 285, or 300MHz.
FSB clock speeds: 66, 75, 83, 95 or 100MHz.

4.19. AMD K6-2 (Chambers)

A K6 with several significant enhancements: the instruction pre-decode cache buffer of 20K was implemented; the 2nd MMX ALU was added to the 2nd integer pipeline; the MMX shifter, the MMX & 3DNow! multiplier and the 3DNow! ALU were shared between both pipelines. The 3DNow! execution units operated with 21 new instructions, for improved floating-point performance (one instruction was intended to improve MPEG decoding). Every 3DNow! unit was pipelined (able to execute one instruction per two clock ticks, and to work in parallel). Form-factor: Socket 7.

On integer tasks, K6-2 performed close to Pentium II or Celeron, but about two times slower when it came to true floating-point calculations, just like K6. However, running 3DNow!-optimised applications it was able even to outperform Pentium II.

Core clock speeds: 233, 266, 300, 333 or 366MHz (66MHz FSB), 380 or 475MHz (95MHz FSB), 533MHz (97MHz FSB), 350, 400, 450, 500 or 550MHz (100MHz FSB).
[Figure: AMD K6-2 architecture]

4.20. Intel Pentium II Xeon (Drake)

In the beginning of 1998, because of the 2-way SMP limitation of Pentium II, the obsolete Pentium Pro was still used successfully in mid-range and high-end servers thanks to its native 4-way SMP support, not to mention 8-way (or even more) capabilities if used with clustering technologies. So, it was the right time to introduce Pentium II Xeon, aka "Pentium II on steroids". Had many features of Pentium Pro: the native 4-way SMP support, the off-core L2 cache running at the full core clock speed (with ECC checking), the 36-bit address bus. Beyond them, the core was almost identical to that of Pentium II (Deschutes). Form-factor: Slot 2.

Though the L2 cache was running at the full core speed, it appeared to be slower than that of PPro (supposedly, 3-2-2-2 when writing and 3-1-1-1 when reading, against 3-1-1-1 and 3-1-1-1, respectively).
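
Burst timings of this kind give the clock ticks for the first transfer followed by each subsequent one, so the total cost of a four-transfer burst is simply their sum. A small sketch of that arithmetic (the `burst_cycles` helper is an illustrative name):

```python
def burst_cycles(timing):
    """Total clock ticks for a burst written as e.g. '3-2-2-2'."""
    return sum(int(part) for part in timing.split("-"))

xeon_write = burst_cycles("3-2-2-2")
ppro_write = burst_cycles("3-1-1-1")
print(xeon_write, ppro_write, xeon_write / ppro_write)  # 9 6 1.5
```

Taken at face value, the supposed write timing costs 9 ticks per burst against 6 on PPro, while reads cost the same 6 ticks on both.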

Core clock speeds: 400 [512K or 1M L2 cache] or 450MHz [512K, 1M or 2M L2 cache] (100MHz FSB).

Note: Drake could operate with 4x (400 and 450MHz CPUs) or 4.5x multiplier (450MHz only).

4.21. Intel Celeron (Mendocino)

Since Covington demonstrated very poor performance, and thereby met very limited market demand, it was quickly modified with an L2 cache of 128K, integrated into the core. After this change, performance was close to that of equally-clocked Pentium II. Considering the still low price, the new Celeron was a very good choice for desktop systems. Form-factor: Slot 1 (initially) or Socket 370 (since November of 1998).

Interestingly, Mendocino overclocked even better than Covington. Numerous reports showed that 300-366MHz chips with adequate cooling could run at 100MHz FSB, outperforming Pentium II.

Core clock speeds: 300, 333, 366, 400 or 433MHz (Slot 1, 66MHz FSB), 300, 333, 366, 400, 433, 466, 500 or 533MHz (Socket 370, 66MHz FSB).

Note: since Mendocino was launched at 300MHz, to show the difference between it and 300MHz Covington, new Celeron was called "300A".

4.22. Intel Pentium II OverDrive (P6T)

Designed to upgrade PPro-based systems. Core from Deschutes + 512K L2 cache, running at the full core speed. Supported the 2-way SMP. Form-factor: Socket 8.

As a matter of fact, the L2 cache appeared to be somewhat slower than that of PPro (supposedly, 3-2-2-2 when writing and 3-1-1-1 when reading, against 3-1-1-1 and 3-1-1-1, respectively). Moreover, it seems that the L2 cache chip is the same as that in PIIXeon. Overall, PIIOD looked to me like a PIIXeon running on 66MHz FSB, and reshaped for Socket 8.

Core clock speed: 333MHz (66MHz FSB).

4.23. IDT WinChip 2 (C6+)

An enhanced version of C6, with two MMX and two 3DNow! pipelines. The FPU became partially pipelined. The branch prediction logic was also implemented. Form-factor: Socket 7.

Was manufactured both by IDT (0.35µ and 0.25µ processes), and IBM (0.25µ process).

Core clock speeds: 240MHz (60MHz FSB), 200 or 233MHz (66MHz FSB), 225MHz (75MHz FSB), 233MHz (100MHz FSB).

Note: some chips could operate with 2.33x and 2.66x multipliers (motherboard jumpers 5x and 5.5x, respectively).

4.24. Intel Pentium III (Katmai)

A modified version of Deschutes. The integer and floating-point units, as well as the branch prediction logic, were unchanged. The MMX command set was extended with new (mostly 32-bit) instructions, called MMX2. The memory streaming feature was added, to enhance the FSB throughput. Form-factor: Slot 1.

The key difference between Katmai and Deschutes was SSE -- Streaming SIMD (Single Instruction -- Multiple Data) Extensions, a set of 70 instructions for accelerated integer & floating-point calculations. Its logic operated with 8 new 128-bit registers, which were implemented in the core. The dark side of this innovation was that, to enable SSE, existing operating systems had to be modified in order to save the new registers during task switching. But there was a bright side too: since SSE did not share the floating-point registers, the SSE unit (being entirely pipelined) could operate in parallel with either the MMX or the FP unit.
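
The register figures above translate into simple numbers. The sketch below assumes 32-bit single-precision lanes (the packed format SSE floating-point instructions work on) and emulates one packed addition in plain Python; `packed_add` and the constants are illustrative names, and the byte count covers only the raw register contents an operating system would have to preserve.

```python
import struct

XMM_REGISTERS = 8     # new registers, as stated in the text
REGISTER_BITS = 128   # width of each register, as stated in the text
FLOAT_BITS = 32       # assumption: single-precision lanes

lanes = REGISTER_BITS // FLOAT_BITS                # floats per register
state_bytes = XMM_REGISTERS * REGISTER_BITS // 8   # raw register state to save

def packed_add(a, b):
    """Emulate one packed add: every lane is processed by a single instruction."""
    if not (len(a) == len(b) == lanes):
        raise ValueError(f"expected {lanes} values per operand")
    sums = [x + y for x, y in zip(a, b)]
    # Round-trip through 32-bit floats to mimic single-precision storage.
    return list(struct.unpack(f"{lanes}f", struct.pack(f"{lanes}f", *sums)))

print(lanes, state_bytes)                                        # 4 128
print(packed_add([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))    # [1.5, 2.5, 3.5, 4.5]
```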

Core clock speeds: 450, 500, 550 or 600MHz (100MHz FSB), 533 or 600MHz (133MHz FSB).

Note: Katmai was planned to run at 100MHz FSB only, but due to manufacturing problems with upcoming Coppermine two versions that utilised 133MHz FSB were released, called 533B and 600B.

4.25. AMD K6-III (Sharptooth)

A K6-2 enhanced with the on-chip L2 cache of 256Kb running at the full core clock speed. Form-factor: Socket 7.

Reported to be up to 40% faster than K6-2 on some tasks, but was quite expensive ($476 for 450MHz K6-III while 450MHz K6-2 cost $203, in March of 1999). A year later, production ceased in favour of K6-2+.

Core clock speeds: 475MHz (95MHz FSB), 400, 450 or 500MHz (100MHz FSB).

4.26. Rise iDragon mP6

A Pentium-class CPU from Rise Technology. Featured three 8-stage ALUs, one highly-pipelined FPU (could execute up to 2 instructions per cycle), one MMX unit consisting of 3 pipelines (ALU, multiplier, shifter) and able to process up to 3 instructions per cycle, also the branch prediction logic with the 512-entry branch history table.

Unfortunately, appeared too late running at too low clock speeds, and didn't obtain any significant market share. Though, was quite cheap and featured a very low power dissipation (Kirin, 4W maximum at 250MHz). Form-factor: Socket 7.

Since Rise had no manufacturing facilities of its own, all CPUs were produced by TSMC, Taiwan.

The next Socket 7 CPU from Rise, mP6 II, was to feature higher clock speeds (250-350MHz) and the L2 cache of 256Kb (integrated into the core), but was never released, and Rise left the PC processor market.

Core clock speeds: 166MHz (83MHz FSB), 190MHz (95MHz FSB), 200 or 250MHz (100MHz FSB).

Note: featured an unlocked clock multiplier.

4.27. Intel Pentium III Xeon (Tanner)

Frankly speaking, differs from Drake in the same way as Katmai from Deschutes. Had the off-chip full-speed L2 cache of 512Kb, 1Mb or 2Mb. Form-factor: Slot 2.

Core clock speeds: 500 or 550MHz (100MHz FSB).

4.28. AMD Athlon (K7)

K7 was AMD's long-awaited bet to break Intel's leadership on the PC market. The processor core was redesigned completely when compared to K6 series, and many architectural components were borrowed from DEC Alpha 21164 and 21264 (EV5 and EV6, respectively). Three 10-stage integer and three 15-stage floating-point pipelines, plus three address generators (one per integer pipeline) and two data transfer units (universal load & store), operating independently from other execution units. Also two MMX and two 3DNow! units were located at the floating-point pipelines. The 3DNow! command set was enhanced with 24 additional instructions, mostly to improve integer arithmetic. Both integer and floating-point performance increased very much compared to K6 series, significantly outperforming Intel's Katmai and staying on par with Coppermine. The L1 cache of 128Kb (the largest L1 cache ever integrated into an x86 CPU) was split equally between data and instructions, the L2 cache of 512Kb was located off-chip [but within the CPU cartridge], and operated at 1/3, 2/5 or 1/2 of the core clock speed; the L2 cache bus was 64 bits wide. Besides, the L1 cache was exclusive (i.e. information within the L1 cache didn't need to be mirrored within the L2 cache, unlike in PPro and derivatives), and the victim buffer was implemented to facilitate transfers between both cache levels (like in DEC EV5 and EV6). Form-factor: Slot A (physically, but not electrically, compatible with Slot 1).

Another key feature of K7 was the system bus architecture (S2K), which derived directly from DEC EV6; running at 100MHz, it could transfer data at both rising and falling edges of the clock signal, thereby doubling the bus bandwidth (nowadays this feature is called DDR -- Double Data Rate), but addresses were transferred at the nominal frequency.

Core clock speeds: 500, 550, 600, 650, 700, 750, 800, 850, 900, 950 or 1000MHz (100MHz DDR FSB).
L2 cache multipliers: 1/2 (500-700MHz), 2/5 (750-850MHz) or 1/3 (900-1000MHz).
[Figure: AMD K7 architecture]
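
The L2 cache multipliers listed above determine the actual clock of the off-chip cache for each core speed. A minimal sketch of that lookup; the table layout and the `l2_clock_mhz` helper are illustrative, only the ranges and fractions come from the list:

```python
from fractions import Fraction

# (lowest core MHz, highest core MHz, L2 multiplier) as listed above.
L2_MULTIPLIERS = [
    (500, 700, Fraction(1, 2)),
    (750, 850, Fraction(2, 5)),
    (900, 1000, Fraction(1, 3)),
]

def l2_clock_mhz(core_mhz):
    """Return the L2 cache clock for a given K7 core clock."""
    for low, high, multiplier in L2_MULTIPLIERS:
        if low <= core_mhz <= high:
            return float(core_mhz * multiplier)
    raise ValueError(f"no multiplier listed for {core_mhz}MHz")

for core in (500, 700, 750, 850, 900, 1000):
    print(core, round(l2_clock_mhz(core)))
```

With these figures the cache clock stays between 250 and 350MHz across the whole range, which suggests the lower multipliers were chosen to keep the external cache chips within a similar speed as the core clock grew.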

4.29. Intel Pentium III (Coppermine)

Featured several functional changes when compared to Katmai, but the most significant was the Advanced Transfer Cache (ATC). It meant that the formerly off-core L2 cache was integrated into the die, ran at the full core clock speed, and communicated with the core through a 256-bit bus. Also, buffering was significantly enhanced: 4 write-back buffers compared to 1 in Katmai, 6 fill buffers compared to 4, 8 bus queue entries compared to 4. Supported 2-way SMP. Form-factor: Slot 1 or Socket 370.

Core clock speeds: 550, 600, 650, 700, 750, 800, 850 or 1000MHz (Slot 1, 100MHz FSB); 533, 600, 666, 733, 800, 866, 933, 1000 or 1133MHz (Slot 1, 133MHz FSB); 500, 550, 600, 650, 700, 750, 800, 850, 900, 1000 or 1100MHz (Socket 370, 100MHz FSB); 533, 600, 666, 733, 800, 866, 933, 1000 or 1133MHz (Socket 370, 133MHz FSB).

Note 1: to distinguish Coppermine from speed-matching Katmai, an "E" mark was introduced, i.e. 533MHz Coppermine was promoted as PIII-533EB; a "B" mark was also used to label chips running at 133MHz FSB.

Note 2: there were also Celeron and Xeon CPUs based upon Coppermine core. Pentium III Xeon CPUs were almost exactly the same as Pentium III, but reshaped for the Slot 2 form-factor, utilised 133MHz FSB, and supported 2-way SMP. Celeron CPUs were Coppermines with the 128Kb L2 cache (to be precise, there was an L2 cache of 256Kb, but one half was disabled); multipliers were locked for 66MHz FSB, and the SMP feature was disabled as well.

Note 3: Celeron's core clock speeds: 533, 566, 600, 633, 666, 700, 733 or 766MHz (66MHz FSB), 800, 850, 900, 950, 1000 or 1100MHz (100MHz FSB).

Note 4: Pentium III Xeon's core clock speeds: 600, 666, 733, 800, 866, 933 and 1000MHz (133MHz FSB).

Note 5: to distinguish 533MHz Mendocino from Coppermine (Celeron), the latter was called 533A.

4.30. Intel Pentium III Xeon (Cascades)

A real successor of Drakes and Tanners. Coppermine with the L2 cache of 2Mb integrated into the core, and supporting 4-way SMP. Form-factor: Slot 2.

Core clock speeds: 700 and 900MHz (100MHz FSB).

Note: 700MHz version was also available with the L2 cache of 1Mb, i.e. with another megabyte disabled.

4.31. AMD Athlon (Thunderbird)

A K7 (or K75, to be precise) with the L2 cache of 256Kb, which was moved from outside the core right into it. The bus interface unit was modified for faster memory I/O operations. Form-factor: Slot A or Socket A.

Core clock speeds: 700, 750, 800, 850, 900, 950 or 1000MHz (Slot A, 100MHz DDR FSB), 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300 or 1400MHz (Socket A, 100MHz DDR FSB), 1000, 1133, 1200, 1333 or 1400MHz (Socket A, 133MHz DDR FSB).

4.32. AMD Duron (Spitfire)

A budget version of Thunderbird, with the L2 cache of 64Kb. Form-factor: Socket A.

Core clock speeds: 600, 650, 700, 750, 800, 850, 900 or 950MHz (100MHz DDR FSB).

4.33. AMD K6-2+

Since K6-III was too expensive for most potential buyers, K6-2+ was developed. Basically, it was a K6-2 reshaped for a 0.18µ technology, which additionally featured the L2 cache running at the full core clock speed (like K6-III), PowerNow! (advanced power-saving), and the extended 3DNow! command set (like K7). Was good for desktop as well as portable systems. Form-factor: Socket 7.

Core clock speeds: 475MHz (95MHz FSB), 533MHz (97MHz FSB), 450, 500 or 550MHz (100MHz FSB).

4.34. AMD K6-III+

Almost the same as K6-2+, but with the L2 cache of 256Kb. Form-factor: Socket 7.

Core clock speeds: 475MHz (95MHz FSB), 450 or 500MHz (100MHz FSB).

Note: due to unlocked multipliers and the compact core, was highly overclockable (the 450MHz model was reported to run stably at 600MHz with regular air cooling).

4.35. VIA Cyrix III (Samuel, C5)

After acquiring both Cyrix and Centaur in August, 1999, VIA became a processor-developing company. Soon Cyrix's Joshua (also known as Jalapeno or Mojave) was finally scrapped in favour of Centaur's Samuel (C5). So, in spite of its name, VIA Cyrix III had almost no connection with Cyrix. The core featured one 12-stage ALU, one 16-stage partially-pipelined FPU, the MMX and 3DNow! units, and the branch prediction logic with the 4096-entry branch history table. Didn't require active cooling at all, due to low power dissipation. Though it was relatively low-priced, it couldn't stay on par with Intel's or AMD's low-end processors by any measure of performance. Form-factor: Socket 370.

Core clock speeds: 500, 550, 600 or 650MHz (100MHz FSB), 533 or 666MHz (133MHz FSB).

4.36. Intel Pentium 4 (Willamette)

Intel engineers considered the core of Pentium Pro (that was used with some enhancements up to Pentium III) as limited and obsolete. So, a new architecture (called NetBurst) was developed and had very little in common with the previous one (P6), only keeping traditional software compatibility. The CPU featured very long execution pipelines, and thereby could be clocked very high, though actual performance did not increase as much. Two ALUs were running internally at twice the core frequency. In addition to MMX and SSE, SSE2 was implemented, featuring 144 new instructions. The L1 data cache of 8Kb (write-through), and the L1-like cache of 12Kuops (decoded instructions, RISC-like). The L2 cache of 256Kb (write-back) was located on-die. The CPU was interfaced to a system bus that was able to transfer data 4 times per clock tick (like AGP 4x), i.e. featuring a theoretical transfer rate of 3.2Gb/sec at 100MHz.
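
The quoted peak transfer rate follows from clock x transfers per tick x bytes per transfer. The sketch below assumes a 64-bit (8-byte) data bus, which this section does not state explicitly; the function name is illustrative.

```python
def peak_bandwidth_bytes(clock_hz, transfers_per_tick, bus_width_bits=64):
    """Theoretical peak: clock x transfers per tick x bytes per transfer.

    The 64-bit default bus width is an assumption, not taken from the text.
    """
    return clock_hz * transfers_per_tick * bus_width_bits // 8

p4_qdr = peak_bandwidth_bytes(100_000_000, 4)
print(p4_qdr / 1e9)  # 3.2
```

The same helper with `transfers_per_tick=1` gives 0.8 for a conventional 100MHz bus of the same width, so the quad-pumped bus offers four times the theoretical peak.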

Overall, the new hyper-pipelined architecture proved to be very efficient when running streaming multimedia applications, due to high ALU frequencies and the FSB bandwidth. On the other hand, the FPU performance was rather poor, though SSE2 could help somewhat. Form-factor: Socket 423 or Socket 478.

Core clock speeds: 1300, 1400, 1500, 1600, 1700, 1800, 1900 or 2000MHz (Socket 423), 1400, 1500, 1600, 1700, 1800, 1900 or 2000MHz (Socket 478), all 100MHz QDR FSB.

Note 1: there were also Celeron chips based upon Willamette core (since May, 2002); were clocked at 1700 or 1800MHz with 100MHz QDR FSB, and the 128Kb L2 cache (1/2 of 256Kb was disabled).

Note 2: there were also Xeon chips based upon Willamette core (since May, 2001), codenamed as Foster; were capable of dual-processing, designed for Socket 603, and clocked at 1400, 1500, 1700 or 2000MHz with 100MHz QDR FSB.

4.37. VIA C3 (Samuel 2, C5B)

An enhanced version of Samuel, with the L2 cache of 64Kb, integrated into the core. Another "notable" improvement: the FPU was launched at half the core clock speed (note that it wasn't of excellent performance even before). Form-factor: Socket 370.

Core clock speeds: 750 or 800MHz (100MHz FSB), 733MHz (133MHz FSB).

4.38. AMD Athlon XP/MP (Palomino)

An enhanced version of Thunderbird, with SSE support and improved data prefetch logic. Athlon MP was the first CPU from AMD to work in multiprocessor configurations. Notably, Athlon XP/MP were marketed under Pentium ratings, like K5 in the good old times. Form-factor: Socket A.

Core clock speeds: 1333, 1400, 1466, 1533, 1600, 1666 or 1733MHz (Athlon XP, 133MHz DDR FSB), 1000, 1200, 1333, 1400, 1533, 1600, 1666 or 1733MHz (Athlon MP, 133MHz DDR FSB).

4.39. Intel Pentium III (Tualatin)

The last series of Pentium III processors. No architectural differences from Coppermine. Was available with the L2 cache of either 256Kb or 512Kb; 256Kb models simply had 512Kb with one half disabled. Form-factor: Socket 370.

Core clock speeds: 1000, 1133, 1200, 1333 or 1400MHz (256Kb L2 cache, 133MHz FSB), 700 or 900MHz (512Kb L2 cache, 100MHz FSB), 1133, 1266 or 1400MHz (512Kb L2 cache, 133MHz FSB).

Note 1: there were also Celeron chips based upon the Tualatin core (since May, 2002); they were clocked at 900, 1000, 1100, 1200, 1300 or 1400MHz with 100MHz FSB, and had the 256Kb L2 cache (1/2 of 512Kb was disabled) and the 32-bit address bus.

Note 2: only Tualatins with the 512Kb L2 cache, the so-called PIII-S (server edition), supported 2-way SMP.

4.40. AMD Duron (Morgan)

An enhanced version of Spitfire, with SSE support and the data prefetch logic improved. Form-factor: Socket A.

The first CPUs (model 6) were made out of Athlons (Palomino) that had failed quality tests and were thus resold with 3/4 of the L2 cache disabled. Later CPUs (model 7) were manufactured separately and had no disabled L2 cache blocks.

Core clock speeds: 1000, 1100, 1200, or 1300MHz (100MHz DDR FSB).

4.41. Intel Pentium 4 (Northwood)

A Willamette redesigned for a 0.13µ technology, with the 512Kb L2 cache. Also, Jackson's Hyperthreading Technology was enabled in some processors; it had been present in all P4 chips since Willamette, but disabled in hardware. It was supposed to turn a physical core into two logical ones and thereby increase performance, but it required existing software to be recompiled and generally seemed to be just a marketing trick. Form-factor: Socket 478.

Core clock speeds: 1600, 1800, 2000, 2400, 2500 or 2600MHz (100MHz QDR FSB), 2266, 2400, 2533, 2666, 2800 or 3066MHz (133MHz QDR FSB), 2800, 3000 or 3200MHz (200MHz QDR FSB).

Note 1: to distinguish Northwood from Willamette at the same clock speed, an "A" mark was introduced, i.e. the 1600MHz Northwood was marketed as P4-1.6A; a "C" mark was used with HT-capable CPUs.

Note 2: there were also Celeron chips based upon the Northwood core (since September, 2002); they were clocked at 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700 or 2800MHz with 100MHz QDR FSB, and had the 128Kb L2 cache (3/4 of 512Kb was disabled).

Note 3: there were also Xeon chips based upon the Northwood core (since February, 2002), codenamed Prestonia; they were capable of dual-processing, designed for Socket 603, and clocked at 1600, 1800, 2000, 2200, 2400, 2600 or 2800MHz (100MHz QDR FSB), 2400, 2666, 2800 or 3066MHz (133MHz QDR FSB).
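The "QDR" and "DDR" labels on the FSB figures mean four and two data transfers per bus clock respectively. A minimal sketch of what that implies for the effective transfer rate and peak bandwidth, assuming the 64-bit data bus given in the datasheet and taking the nominal bus clocks at face value (the "133MHz" bus is really slightly faster, so real figures are a little higher; the function names are hypothetical):

```python
# Data transfers per bus clock for each signalling mode.
TRANSFERS_PER_CLOCK = {"SDR": 1, "DDR": 2, "QDR": 4}


def effective_rate_mts(bus_mhz, mode):
    """Effective transfer rate in millions of transfers per second."""
    return bus_mhz * TRANSFERS_PER_CLOCK[mode]


def peak_bandwidth_mb_s(bus_mhz, mode, bus_width_bits=64):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return effective_rate_mts(bus_mhz, mode) * bus_width_bits // 8


# Northwood FSB variants listed above (nominal bus clocks, in MHz).
for bus_mhz in (100, 133, 200):
    rate = effective_rate_mts(bus_mhz, "QDR")
    bandwidth = peak_bandwidth_mb_s(bus_mhz, "QDR")
    print(f"{bus_mhz}MHz QDR: {rate} MT/s, {bandwidth} MB/s peak")
```

The same function covers the Athlon buses, e.g. `effective_rate_mts(133, "DDR")` for the 133MHz DDR FSB. These are theoretical peaks only; sustained throughput is lower.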

4.42. Intel Xeon MP (Foster MP)

A Foster (or Willamette) enhanced with the 1Mb L3 cache and Jackson's Hyperthreading Technology. However, the L3 cache was asynchronous and thus much slower than the L2. Form-factor: Socket 603.

Core clock speeds: 1400, 1500 or 1600MHz (100MHz QDR FSB).

Note: 1400 and 1500MHz models had the 512Kb L3 cache (1/2 of 1Mb was disabled).

4.43. AMD Athlon XP/MP (Thoroughbred)

A Palomino that was redesigned for a 0.13µ technology. Form-factor: Socket A.

Core clock speeds: 1400, 1466, 1533, 1600, 1666, 1733, 1800, 2000 or 2133MHz (Athlon XP, 133MHz DDR FSB), 2083, 2166 or 2250MHz (Athlon XP, 166MHz DDR FSB), 1666, 1800, 2000 or 2133MHz (Athlon MP, 133MHz DDR FSB).

Note: there were also Duron chips based upon the Thoroughbred core (codenamed Applebred, since August, 2003); they were clocked at 1400, 1600 or 1800MHz with 133MHz DDR FSB, and had the 64Kb L2 cache (3/4 of 256Kb was disabled).
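Several of the budget and low-end chips described so far are full dies with a fraction of the cache disabled. A minimal sketch of that arithmetic, using only the figures quoted in the notes above and assuming 1Mb = 1024Kb (the function and table names are hypothetical):

```python
from fractions import Fraction

# (chip, physical cache in Kb, fraction disabled), as quoted in the text.
CUT_DOWN_CHIPS = [
    ("Celeron (Willamette), L2", 256, Fraction(1, 2)),
    ("Celeron (Tualatin), L2", 512, Fraction(1, 2)),
    ("Duron (Morgan, model 6), L2", 256, Fraction(3, 4)),
    ("Celeron (Northwood), L2", 512, Fraction(3, 4)),
    ("Xeon MP (Foster MP, 1400/1500MHz), L3", 1024, Fraction(1, 2)),
    ("Duron (Applebred), L2", 256, Fraction(3, 4)),
]


def enabled_cache_kb(physical_kb, disabled_fraction):
    """Return the usable cache size in Kb after disabling a fraction of it."""
    return int(physical_kb * (1 - disabled_fraction))


for chip, physical_kb, disabled in CUT_DOWN_CHIPS:
    usable = enabled_cache_kb(physical_kb, disabled)
    print(f"{chip}: {usable}Kb usable of {physical_kb}Kb")
```

Exact fractions are used instead of floats so the results stay whole numbers of kilobytes.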

4.44. Intel Xeon MP (Gallatin)

A Prestonia (or Northwood) enhanced with the 2Mb L3 cache. However, the L3 cache was still asynchronous, like in Foster MP. Form-factor: Socket 603.

Core clock speeds: 1500, 1900, 2000, 2200, 2500 or 2800MHz (100MHz QDR FSB).

Note 1: some chips had the 1Mb L3 cache, with the other megabyte disabled.

Note 2: it was also introduced for Socket 478 (3200MHz with the 2Mb L3 cache, 200MHz QDR FSB, the so-called "Pentium 4 Extreme Edition").

4.45. AMD Athlon XP (Barton)

A high-end version of Thoroughbred, with the 512Kb L2 cache. Form-factor: Socket A.

Core clock speeds: 1833, 1916, 1533, 2083 or 2166MHz (166MHz DDR FSB), 2100 or 2200MHz (200MHz DDR FSB).

Note: there were also Athlon XP chips with the 256Kb L2 cache (1/2 of 512Kb was disabled), codenamed Thorton.


Technical datasheet
CPU | Speeds | Released | L1 cache (D+I) | L2 cache | Mem. bus | Addr. bus | Trans. | Tech. | Voltage | Instr. set
P5 | 60 and 66MHz (1) | Mar,1993 | 8K+8K WB 2-way | none | 64-bit | 32-bit | 3.1 mln. | 0.8µ | 5V | i586
P54C | 75 to 100MHz | Mar,1994 | 8K+8K WB 2-way | none | 64-bit | 32-bit | 3.2 mln. | 0.6µ | 3.1-3.6V | i586
Nx586 | 70 to 93MHz | Mar,1994 | 16K+16K WT 4-way | none | 64-bit | 32-bit | 3.5 mln. | 0.5µ | 4V | i386
P54CQS | 120MHz | Mar,1995 | 8K+8K WB 2-way | none | 64-bit | 32-bit | 3.3 mln. | 0.35µ | 3.1-3.6V | i586
6x86 | 80 to 150MHz | Oct,1995 | 16K WB 4-way | none | 64-bit | 32-bit | 3.0 mln. | 0.65µ (2) | 3.3V | i486
Nx686 | 180MHz | Oct,1995 | 32K+16K WB | none | 64-bit | 32-bit | 6.6 mln. | 0.35µ | ? | i586
P6 | 150 to 200MHz | Nov,1995 | 8K WB 2-way + 8K WB 4-way | 256K to 1M (3) WB 4-way | 64-bit | 36-bit | 5.5 mln. (4) | 0.6µ (5) or 0.35µ | 3.1-3.5V | i686
Nx586FP | 120 and 133MHz | Nov,1995 | 16K+16K WT 4-way | none | 64-bit | 32-bit | 4.2 mln. | 0.44µ | 4V core, 5V I/O | i386
K5 | 75 to 133MHz | Mar,1996 | 8K+16K WB 4-way | none | 64-bit | 32-bit | 4.3 mln. | 0.35µ | 3.5V (6) | i586
P54CS | 133 to 200MHz | Jun,1996 | 8K+8K WB 2-way | none | 64-bit | 32-bit | 3.3 mln. | 0.35µ | 3.1-3.6V | i586
P55C | 166 to 233MHz | Jan,1997 | 16K+16K WB 4-way | none | 64-bit | 32-bit | 4.5 mln. | 0.35µ | 2.8V core, 3.3V I/O | i586
6x86L | 100 to 150MHz | Feb,1997 | 16K WB 4-way | none | 64-bit | 32-bit | 3.0 mln. | 0.35µ | 2.8V core, 3.3V I/O | i486
MediaGX | 120 to 266MHz | Feb,1997 | 16K WB 4-way | none | 64-bit | 32-bit | 2.4 mln. | 0.35µ (7) | 3.3V, 2.5V | i586
Klamath | 233 to 300MHz | Apr,1997 | 16K+16K WB 4-way | 512K WB 4-way | 64-bit | 32-bit | 7.5 mln. (8) | 0.35µ | 2.8V core, 3.3V I/O | i686
K6 | 166 to 300MHz | Apr,1997 | 32K+32K WB 2-way | none | 64-bit | 32-bit | 8.8 mln. | 0.35µ or 0.25µ (9) | 2.9V core (10), 3.3V I/O | i586
6x86MX | 133 to 263MHz | May,1997 | 64K WB 4-way | none | 64-bit | 32-bit | 6.0 mln. | 0.35µ (11) | 2.9V core, 3.3V I/O | i686
C6 | 180 to 240MHz | Oct,1997 | 32K+32K WB 2-way | none | 32-bit | 64-bit | 5.4 mln. | 0.35µ | 3.3V or 3.5V | i586
Deschutes | 266 to 450MHz | Jan,1998 | 16K+16K WB 4-way | 512K WB 4-way | 64-bit | 36-bit | 7.5 mln. (8) | 0.25µ | 2.0V core, 3.3V I/O | i686
Covington | 266 and 300MHz | Apr,1998 | 16K+16K WB 4-way | none | 64-bit | 32-bit | 7.5 mln. | 0.25µ | 2.0V | i686
6x86MII | 200 to 300MHz | Apr,1998 | 64K WB 4-way | none | 64-bit | 32-bit | 6.0 mln. | 0.30µ, 0.25µ, or 0.18µ (12) | 2.9V or 2.2V core (12), 3.3V I/O | i686
K6-2 | 233 to 475MHz | May,1998 | 32K+32K WB 2-way | none | 64-bit | 32-bit | 9.3 mln. | 0.25µ | 2.2-2.3V core, 3.3V I/O | i586
Drake | 400 and 450MHz | Jun,1998 | 16K+16K WB 4-way | 512K to 2M WB 4-way | 64-bit | 36-bit | 7.5 mln. (8) | 0.25µ | 2.0V core, 2.5-2.7V I/O | i686
Mendocino | 300 to 533MHz | Jun,1998 | 16K+16K WB 4-way | 128K WB 4-way | 64-bit | 32-bit | 19 mln. | 0.25µ | 2.0V | i686
P6T | 333MHz | Aug,1998 | 16K+16K WB 4-way | 512K WB 4-way | 64-bit | 36-bit | 38.5 mln. | 0.25µ | 3.3V | i686
C6+ | 200 to 240MHz | Oct,1998 | 32K WB 4-way + 32K WB 2-way | none | 32-bit | 64-bit | 6.0 mln. | 0.35µ or 0.25µ | 3.3V or 3.5V | i586
Katmai | 450 to 600MHz | Feb,1999 | 16K+16K WB 4-way | 512K WB 4-way | 64-bit | 36-bit | 9.5 mln. (8) | 0.25µ | 2.0V core, 3.3V I/O | i686
K6-III | 400 to 475MHz | Feb,1999 | 32K+32K WB 2-way | 256K WB 4-way | 64-bit | 32-bit | 21.3 mln. | 0.25µ | 2.4V core, 3.3V I/O | i586
mP6 | 166 to 250MHz | Feb,1999 | 8K+8K WB 2-way | none | 64-bit | 32-bit | 3.6 mln. | 0.25µ or 0.18µ | 2.8V or 2.0V core, 3.3V I/O | i586
Tanner | 500 and 550MHz | Mar,1999 | 16K+16K WB 4-way | 512K to 2M WB 4-way | 64-bit | 36-bit | 9.5 mln. (8) | 0.25µ | 2.0V core, 2.0-2.7V I/O | i686
K7 | 500 to 1000MHz | Apr,1999 | 64K+64K WB 2-way | 512K WB 2-way | 64-bit | 36-bit | 22 mln. (8) | 0.25µ (13) | 1.6-1.8V core, 3.3V I/O | i686
Coppermine | 500 to 1133MHz | Oct,1999 | 16K+16K WB 4-way | 256K WB 8-way | 64-bit | 32-bit | 28 mln. | 0.18µ | 1.6-1.8V | i686
Cascades | 700 to 900MHz | Oct,1999 | 16K+16K WB 4-way | 1M or 2M WB 8-way | 64-bit | 36-bit | 140 mln. | 0.18µ | OCVR (14) | i686
K6-2+ | 450 to 550MHz | Apr,2000 | 32K+32K WB 2-way | 128K WB 4-way | 64-bit | 32-bit | 15.3 mln. | 0.18µ | 2.0V core, 3.3V I/O | i586
K6-III+ | 400 to 500MHz | Apr,2000 | 32K+32K WB 2-way | 256K WB 4-way | 64-bit | 32-bit | 23 mln. | 0.18µ | 2.0V core, 3.3V I/O | i586
Thunderbird | 700 to 1400MHz | Jun,2000 | 64K+64K WB 2-way | 256K WB 16-way | 64-bit | 36-bit | 37 mln. | 0.18µ | 1.7-1.75V | i686
Spitfire | 600 to 950MHz | Jun,2000 | 64K+64K WB 2-way | 64K WB 16-way | 64-bit | 36-bit | 25 mln. | 0.18µ | 1.5-1.6V | i686
Samuel | 500 to 700MHz | Jun,2000 | 64K WB 4-way + 64K WB 2-way | none | 64-bit | 32-bit | 11.3 mln. | 0.18µ | 1.9-2.0V | i586
Willamette | 1300 to 2000MHz | Jan,2001 | 8K WT 4-way + 12Kuops 8-way | 256K WB 8-way | 64-bit | 36-bit | 42 mln. | 0.18µ | 1.7-1.75V | i686
Samuel 2 | 733 to 800MHz | Mar,2001 | 64K WB 4-way + 64K WB 2-way | 64K WB 4-way | 64-bit | 32-bit | 15.2 mln. | 0.15µ | 1.5-1.6V | i586
Palomino | 1000 to 1733MHz | Jun,2001 | 64K+64K WB 2-way | 256K WB 16-way | 64-bit | 36-bit | 37.5 mln. | 0.18µ | 1.75V | i686
Tualatin | 700 to 1400MHz | Jul,2001 | 16K+16K WB 4-way | 256K or 512K WB 8-way | 64-bit | 36-bit | 44 mln. | 0.13µ | 1.45-1.5V | i686
Morgan | 1000 to 1300MHz | Aug,2001 | 64K+64K WB 2-way | 64K WB 16-way | 64-bit | 36-bit | 25.2 mln. | 0.18µ | 1.75V | i686
Northwood | 1600 to 3200MHz | Jan,2002 | 8K WT 4-way + 12Kuops 8-way | 512K WB 8-way | 64-bit | 36-bit | 55 mln. | 0.13µ | 1.475-1.55V | i686
Foster MP | 1400 to 1600MHz | Mar,2002 | 8K WT 4-way + 12Kuops 8-way | 256K [L2] WB 8-way + 1M [L3] WB 8-way | 64-bit | 36-bit | 108 mln. | 0.18µ | 1.7V | i686
Thoroughbred | 1400 to 2250MHz | Jun,2002 | 64K+64K WB 2-way | 256K WB 16-way | 64-bit | 36-bit | 37.2 mln. | 0.13µ | 1.5-1.65V | i686
Gallatin | 1500 to 3200MHz | Mar,2002 | 8K WT 4-way + 12Kuops 8-way | 512K [L2] WB 8-way + 2M [L3] WB 8-way | 64-bit | 36-bit | 169 mln. | 0.13µ | 1.475-1.55V | i686
Barton | 1833 to 2200MHz | Feb,2003 | 64K+64K WB 2-way | 512K WB 16-way | 64-bit | 36-bit | 54.3 mln. | 0.13µ | 1.65V | i686

1. Many of the first 66MHz CPUs had overheating troubles and were released as 60MHz.

2. 6x86 at 150MHz was manufactured with a 0.44µ technology.

3. In August of 1997 a 200MHz version with two 512Kb L2 cache dies per chip (1Mb in total) was released.

4. Core only; the 256Kb L2 cache consisted of 15.5 mln. transistors, 512Kb -- of 31 mln.

5. Only the 150MHz model was manufactured with a 0.6µ process.

6. Many CPUs featured dual voltage: 2.5-2.9V core with 3.4V I/O.

7. MediaGX at 200, 233 and 266MHz were manufactured with a 0.25µ technology.

8. Core only, without L2 cache.

9. K6 Model 6 was manufactured with a 0.35µ process, Model 7 -- 0.25µ.

10. K6 Model 6 at 233MHz required 3.2Vcore, all K6 Model 7 -- 2.2Vcore.

11. 6x86MX chips were also manufactured by National Semiconductor with a 0.30µ technology, since Q2 of 1998; 225, 233, 250 and 263MHz chips were manufactured by IBM with a 0.25µ technology.

12. Some CPUs were labeled as 6x86MIIv, manufactured with a 0.18µ process (285 and 300MHz, maybe others too), and required 2.2Vcore.

13. Initial 500-700MHz models were manufactured with a 0.25µ process, but 0.18µ 550-1000MHz models (codenamed K75) entered production in January-March of 2000.

14. Every Cascades (and PIIIXeon of Coppermine core) had an OCVR (On Cartridge Voltage Regulator) that drew either 2.8 or 5/12V.
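Footnote 4 gives the P6 core and L2 transistor counts separately, so the per-package totals can be worked out. A minimal sketch, assuming 1Kb = 1024 bytes of 8 bits each (the per-bit figure is only a rough average of this example, since the die also holds tags and control logic; the function names are hypothetical):

```python
# Figures from the datasheet and footnote 4, in millions of transistors.
P6_CORE_MLN = 5.5
P6_L2_MLN = {256: 15.5, 512: 31.0}  # L2 size in Kb -> L2 die transistors


def total_transistors_mln(l2_kb):
    """Core plus L2 die transistor count, in millions."""
    return P6_CORE_MLN + P6_L2_MLN[l2_kb]


def l2_transistors_per_bit(l2_kb):
    """Average L2 die transistors per bit of cache capacity."""
    bits = l2_kb * 1024 * 8
    return P6_L2_MLN[l2_kb] * 1_000_000 / bits


for l2_kb in (256, 512):
    total = total_transistors_mln(l2_kb)
    per_bit = l2_transistors_per_bit(l2_kb)
    print(f"P6 with {l2_kb}Kb L2: {total} mln. total, {per_bit:.2f} per bit")
```

The 1Mb version of footnote 3 is left out because the source gives no transistor count for it.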

to be continued...



Designed and maintained by Alasir Enterprises, 1999-2005
rhett at alasir.com