SSE-HOWTO
================================================================================

The Pentium III (and later) PC processors have an additional set of registers
and an associated arithmetic unit that can be addressed through the Streaming
SIMD Extension (SSE) instructions. This instruction set includes cache
manipulation functions and instructions for parallel single precision
arithmetic. An extended instruction set (SSE2) that enables the use of the SSE
registers for double precision arithmetic is supported by the Pentium 4
processor. SSE and SSE2 instructions will also be understood by the forthcoming
generations of AMD processors.

For full details on the architecture and capabilities of the Pentium
processors, the best address is

  http://developer.intel.com/software/idap/processor/

Just click on the processor family you are interested in, and you will be
guided to a host of useful information. In particular, the manuals that can be
downloaded in pdf format contain all that is needed to exploit the advanced
features of these processors.

On a Linux PC the SSE unit can be fully used provided

- Linux kernel 2.4.0 or later, properly configured for PIII+ machines,

- gcc version 2.95.2 or later, and

- binutils snapshot 010122 or later

are installed. The memory prefetch instructions also work with earlier versions
of the kernel, but the SSE registers cannot be accessed without a kernel patch
(not recommended). Current stable versions of binutils support SSE but not
SSE2.

If you would like to write your own programs with SSE instructions, I suggest
first looking at the files linalg.c and sse.h in the directory 64bit/. To learn
about gcc inline assembly, it suffices to browse through the gcc info pages
(-> C Extensions -> Extended ASM) and the macros in the file sse.h.

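As an illustration of the memory prefetch instructions mentioned above, here is
a minimal sketch of how one of them might be issued from gcc inline assembly.
It is not taken from sse.h: the macro name PREFETCH_T0, the function
sum_with_prefetch and the prefetch distance of 16 floats are assumptions made
for this example only. On compilers or targets without SSE support the macro
expands to a no-op, so the code still compiles and gives the same result.

```c
#include <stddef.h>

#if defined(__GNUC__) && defined(__SSE__)
/* Hypothetical helper (not from sse.h): ask the processor to fetch the
   cache line containing addr into all cache levels. */
#define PREFETCH_T0(addr) \
   __asm__ __volatile__ ("prefetcht0 %0" : : "m" (*(const char *)(addr)))
#else
/* Fallback for targets without SSE: do nothing. */
#define PREFETCH_T0(addr) ((void)(addr))
#endif

/* Sum n floats, requesting data a fixed distance ahead of the
   element currently being read. */
float sum_with_prefetch(const float *x, size_t n)
{
   const size_t ahead = 16;   /* assumed distance: 16 floats = 64 bytes */
   float s = 0.0f;
   size_t i;

   for (i = 0; i < n; i++) {
      /* issue one prefetch per block of 16 floats, staying inside the array */
      if ((i % ahead) == 0 && (i + ahead) < n)
         PREFETCH_T0(x + i + ahead);

      s += x[i];
   }

   return s;
}
```

A prefetch is only a hint and does not change the result of the computation.
Whether it pays off, and which distance is best, depends on the processor and
has to be determined by timing the program.
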
A complete list of the available SSE and SSE2 instructions, with exhaustive
explanations of what precisely they do, is given in the Intel reference manuals
that can be downloaded from the above site (some 800 pages, not to be printed
out!).

An important point to note is that the order of the operands in the Intel
syntax and in the gcc (AT&T) syntax is exactly reversed. For an instruction
that requires a source and a destination register, for example, the notations
are

  Intel:   instruction destination source

  gcc:     instruction source, destination

Some instructions have a third argument, a so-called immediate integer, and the
rules in this case are

  Intel:   instruction destination source immediate

  gcc:     instruction immediate, source, destination

In the gcc format the integer can be specified as $0x1f, where the $ is a
syntactic key that marks an immediate value, 0x switches to hexadecimal base
and 1f is the (hexadecimal) number that is to be passed to the instruction.
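
The operand order in the gcc format can be illustrated by the following short
sketch. It is not taken from sse.h: the type vec4 and the functions vec4_add
and vec4_reverse are hypothetical names introduced for this example. The first
function uses a two-operand instruction (addps), the second one an instruction
with an immediate integer (shufps). Inside a gcc extended asm statement with
operands, register names are written with a double percent sign. On 32-bit
machines the option -msse is assumed to be set. Without SSE support a plain C
fallback is compiled instead.

```c
typedef struct { float c[4]; } vec4;   /* hypothetical container for 4 floats */

/* r = a + b, component by component */
void vec4_add(vec4 *r, const vec4 *a, const vec4 *b)
{
#if defined(__GNUC__) && defined(__SSE__)
   __asm__ __volatile__ (
      "movups %1, %%xmm0 \n\t"        /* source: a, destination: xmm0   */
      "movups %2, %%xmm1 \n\t"        /* source: b, destination: xmm1   */
      "addps %%xmm1, %%xmm0 \n\t"     /* xmm0 = xmm0 + xmm1             */
                                      /* (Intel order: addps xmm0 xmm1) */
      "movups %%xmm0, %0"             /* source: xmm0, destination: r   */
      : "=m" (*r)
      : "m" (*a), "m" (*b)
      : "xmm0", "xmm1");
#else
   int i;
   for (i = 0; i < 4; i++)
      r->c[i] = a->c[i] + b->c[i];
#endif
}

/* r = (a[3], a[2], a[1], a[0]) */
void vec4_reverse(vec4 *r, const vec4 *a)
{
#if defined(__GNUC__) && defined(__SSE__)
   __asm__ __volatile__ (
      "movups %1, %%xmm0 \n\t"
      "shufps $0x1b, %%xmm0, %%xmm0 \n\t"   /* immediate, source, destination */
      "movups %%xmm0, %0"
      : "=m" (*r)
      : "m" (*a)
      : "xmm0");
#else
   int i;
   vec4 t = *a;   /* copy first, so that r and a may point to the same object */
   for (i = 0; i < 4; i++)
      r->c[i] = t.c[3 - i];
#endif
}
```

The immediate 0x1b is 00 01 10 11 in binary. Read from the lowest pair of bits
upwards, it selects the components 3, 2, 1, 0 of the source, which reverses the
order of the four numbers when source and destination are the same register.
The unaligned moves (movups) are used here so that no alignment of the data has
to be assumed.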