Alasir Enterprises
 
A Concise Guide to the MMX Technology

Paul V. Bolotoff
 
Release date: 31st of January 2005
Last modify date: 25th of January 2007

 
Introduction

MMX stands either for MultiMedia eXtension, for Matrix Math eXtension, or for nothing at all; it doesn't really matter. The technology was developed by Intel to accelerate a wide range of multimedia tasks, i.e. those related to video and audio processing. To that end, a new instruction set was introduced to perform these tasks directly on a general-purpose x86 processor. Previously, performance improvements in specific areas, for instance MPEG video compression and decompression, required expensive third-party DSP-based hardware. The MMX technology achieves the same or a comparable effect at no additional hardware cost: only some parts of a multimedia application need to be rewritten. The primary architects of this technology are Alex Peleg and Uri Weiser of Intel's Israel Design Center in Haifa.
 
The technology was first implemented in the Pentium MMX (P55C) processor in January 1997 and then in the Pentium II (Klamath) in May 1997. It has also been supported by competing products such as the AMD K6, Cyrix 6x86MX and IDT WinChip. In fact, practically all x86 processors manufactured after 1997 can execute MMX-optimised code, and the technology has been very successful among multimedia software developers. MMX isn't the first technology of this kind, though. The pioneers are the Intel i860 (1989) and Motorola 88110 (1991-92) processors, which introduced limited instruction extensions to accelerate basic 3D graphics functions. There are also the Hewlett-Packard PA-7100LC (1994), Sun UltraSPARC (1995) and MIPS R10000 (1996) processors, as well as their derivatives, which feature support for MAX-1 (Multimedia Acceleration eXtensions - 1), VIS (Visual Instruction Set) and MDMX (MIPS Digital Media eXtension) respectively. Of these, MDMX may be considered the closest relative of MMX because it was also intended to accelerate multimedia calculations and follows a register mapping technique like the one explained below.
 
The MMX technology follows a vectorised execution approach also known as SIMD (Single Instruction, Multiple Data). There are 8 new 64-bit MMX registers, MM0 to MM7, as well as 47 new MMX instructions divided into 7 groups (arithmetic, logical, comparison, shift, conversion, data transfer, state management). It should be mentioned, however, that the MMX registers are mapped onto the mantissa space (i.e. the lower 64 bits) of the respective floating-point registers, ST(0) to ST(7), which are 80 bits wide in accordance with the double extended format defined in the IEEE 754 standard. The exponent space and the sign bit (i.e. the upper 16 bits) are set to all ones on every MMX write to a particular hardware register, so the register reads as NaN (Not a Number) or infinity if its MMX data happens to be treated as FP data.
 
i387-compatible floating-point register
 bit 79    bits 78-64    bits 63-0
 sign      exponent      mantissa
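
The register mapping described above can be modelled in plain C. This is only a sketch of the behaviour stated in the text, not MMX code: the names fp_reg80, regs, mmx_write, mmx_read and fp_view_is_nan_or_inf are hypothetical, and the model ignores everything else the hardware does (such as the FP tag word).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 80-bit floating-point hardware register. */
typedef struct {
    uint64_t mantissa;       /* bits 63..0, shared with the MMX register */
    uint16_t sign_exponent;  /* bits 79..64: sign bit plus 15-bit exponent */
} fp_reg80;

/* Eight hardware registers, zero-initialised. */
fp_reg80 regs[8];

/* An MMX write stores the data in the mantissa and, as the text states,
   sets the upper 16 bits to all ones. */
void mmx_write(unsigned index, uint64_t value) {
    assert(index < 8);
    regs[index].mantissa = value;
    regs[index].sign_exponent = 0xFFFF;
}

/* An MMX read sees only the lower 64 bits. */
uint64_t mmx_read(unsigned index) {
    assert(index < 8);
    return regs[index].mantissa;
}

/* An exponent field of all ones marks NaN or infinity when the same
   register is interpreted as floating-point data. */
int fp_view_is_nan_or_inf(unsigned index) {
    assert(index < 8);
    return (regs[index].sign_exponent & 0x7FFF) == 0x7FFF;
}
```

After mmx_write(3, x), mmx_read(3) returns x unchanged, while the floating-point view of the same register no longer looks like an ordinary number.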

The MMX registers can be accessed in random order, unlike the FP registers, which are addressed as a stack. Nevertheless, since both the MMX and FP register sets are implemented on the same hardware register set, MMX and FP code cannot be freely intermixed. That's a significant drawback, though in practice most subroutines incorporating MMX code don't need to perform any FP operations. On the other hand, operating systems don't need to be updated to support the technology because the MMX registers can be saved and restored during task switches as if they were the FP registers (see the description of the EMMS instruction for further information on this matter).
 
The MMX instructions operate on several new data types: packed byte, packed word, packed doubleword and quadword. Since the MMX registers are 64 bits wide, it takes either 8 bytes (packed byte), 4 words (packed word), 2 doublewords (packed doubleword) or 1 quadword to fill a register. Vectorisation means that all data elements in a register are processed in parallel. In other words, if there are 4 words of data in a particular register, they can be processed at once by a single MMX instruction. With traditional integer or floating-point code, those words would have to be processed one by one, so the performance loss may be substantial. The following example illustrates how the PADDW instruction works.
 
 63                                                               0
 word3            word2            word1            word0             src
 word7            word6            word5            word4             dst

 word3 + word7    word2 + word6    word1 + word5    word0 + word4     dst (new)
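
The diagram above can also be expressed as a scalar C model. This sketch assumes the wrap-around behaviour of PADDW and processes the four words in a loop, which is exactly the one-by-one work a single MMX instruction avoids; the function name paddw is borrowed from the mnemonic purely for illustration.

```c
#include <stdint.h>

/* Scalar model of PADDW: add four independent 16-bit words held in two
   64-bit values. Each lane wraps around on its own, so a carry never
   crosses into the neighbouring word. */
uint64_t paddw(uint64_t dst, uint64_t src) {
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t a = (uint16_t)(dst >> (16 * lane));
        uint16_t b = (uint16_t)(src >> (16 * lane));
        uint16_t sum = (uint16_t)(a + b);  /* carry out of the lane is dropped */
        result |= (uint64_t)sum << (16 * lane);
    }
    return result;
}
```

Note the difference from an ordinary 64-bit addition: adding 1 to a register whose lowest word is 0xFFFF clears that word and leaves the next word untouched.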

 
In addition, the technology introduces both wrap-around and saturating instructions. With wrap-around, results that overflow or underflow are truncated and only the lower (least significant) bits are returned. With saturation, results that overflow or underflow are clipped (saturated) to the limit of the data range for the given type. For instance, unsigned word calculations are saturated to the maximum possible value (0xFFFF) in case of overflow and to the minimum possible value (0x0000) in case of underflow; signed word calculations are saturated to 0x7FFF and 0x8000 respectively. For some tasks it's very important to choose the right kind of instruction.
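
The difference between the two behaviours is easy to see on single words. The following C sketch models one 16-bit lane only; the helper names are hypothetical, and the comments name the MMX instructions (PADDW, PADDUSW, PSUBUSW, PADDSW) whose per-lane behaviour each helper is meant to imitate. The signed limits 0x7FFF and 0x8000 are those of a signed 16-bit word.

```c
#include <stdint.h>

/* Wrap-around addition, as in PADDW: only the low 16 bits survive. */
uint16_t add_wrap_u16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}

/* Unsigned saturating addition, as in PADDUSW: clip to 0xFFFF. */
uint16_t add_sat_u16(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + b;
    return sum > 0xFFFF ? 0xFFFF : (uint16_t)sum;
}

/* Unsigned saturating subtraction, as in PSUBUSW: clip to 0x0000. */
uint16_t sub_sat_u16(uint16_t a, uint16_t b) {
    return a > b ? (uint16_t)(a - b) : 0;
}

/* Signed saturating addition, as in PADDSW: clip to [-32768, 32767]. */
int16_t add_sat_s16(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

For pixel or audio sample arithmetic the saturating forms are usually the safer choice: 0xFFF0 + 0x0020 wraps to 0x0010 (a bright pixel turning dark), whereas the saturated result stays at 0xFFFF.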
 
All MMX instructions have a 1-cycle latency except for PMUL (Packed MULtiply) and PMADD (Packed Multiply and ADD), which take 3 cycles to complete, though they're fully pipelined and a new one can be issued to a functional unit every cycle. Data transfer instructions may be processed in a single cycle, though the actual execution time depends on where the data resides. EMMS (Empty MMX State) is the only instruction with an undefined (implementation dependent) execution time. The information above applies to Intel processors only; other designs might behave differently.
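
To illustrate what "fully pipelined" means for the 3-cycle multiply instructions, here is a small arithmetic sketch in C. It assumes an idealised model (independent instructions, one functional unit, no stalls or memory effects), so the figures are illustrative rather than measurements of any real processor; both function names are hypothetical.

```c
/* Cycles needed to complete n independent instructions on a fully
   pipelined unit: the first result appears after `latency` cycles,
   and each following one arrives a cycle later. */
unsigned pipelined_cycles(unsigned n, unsigned latency) {
    return n == 0 ? 0 : latency + (n - 1);
}

/* The same work on a unit that cannot overlap instructions. */
unsigned unpipelined_cycles(unsigned n, unsigned latency) {
    return n * latency;
}
```

Under these assumptions, 8 independent packed multiplies with a 3-cycle latency complete in 10 cycles instead of 24.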
 
The simplest way to find out whether a particular processor supports MMX is to use the CPUID instruction. Standard function 1 returns with bit 23 of EDX set if MMX is supported. The following code illustrates this procedure:
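	movl	$1, %eax		# select CPUID standard function 1
	cpuid				# feature flags in EDX (EAX, EBX, ECX are overwritten too)
	testl	$0x00800000, %edx	# isolate bit 23, the MMX flag
	jz	.mmx_unsupported	# flag clear: MMX is not available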
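
The same check can be sketched in C. This version assumes a GCC or Clang toolchain, whose <cpuid.h> header provides __get_cpuid(); on other compilers or non-x86 targets the sketch simply reports that MMX is unavailable. The names HAVE_CPUID_H, CPUID_EDX_MMX_BIT, edx_has_mmx and cpu_supports_mmx are hypothetical.

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>   /* GCC/Clang helper header, an assumption of this sketch */
#define HAVE_CPUID_H 1
#endif

#define CPUID_EDX_MMX_BIT 23

/* Test bit 23 of an EDX value returned by CPUID standard function 1. */
int edx_has_mmx(uint32_t edx) {
    return (int)((edx >> CPUID_EDX_MMX_BIT) & 1u);
}

/* Returns 1 if the running processor reports MMX support, 0 otherwise
   (including the case where CPUID cannot be queried at all). */
int cpu_supports_mmx(void) {
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;    /* function 1 is not supported */
    return edx_has_mmx(edx);
#else
    return 0;
#endif
}
```

Keeping the bit test separate from the CPUID query makes it possible to check the mask (0x00800000, as in the assembly above) without depending on the machine the code runs on.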

No operands are labelled as "first" and "second" throughout this reference. It is less ambiguous to define them as "source" and "destination" because the AT&T and Intel assembly syntaxes place these operands in opposite order. Which one to prefer is a matter of taste, but the AT&T style dominates on UNIX-like systems, while the Intel style is widespread where Windows reigns.
 
Abbreviations used:
 src     source operand
 dst     destination operand
 mm      MMX register
 int32   integer register (32-bit)
 imm8    immediate value (8-bit)
 mem32   adjacent 32 bits in memory
 mem64   adjacent 64 bits in memory
 
 

Copyright (c) Paul V. Bolotoff, 2005-07. All rights reserved.
A full or partial reprint without a permission received from the author is prohibited.
 