Inline SSE routines for MILC V6
Jan 28, 2002

This tar file contains SSE versions of a number of the C-language
matrix-vector and matrix-matrix routines:

    mult_su3_nn
    mult_su3_na
    mult_su3_an
    mult_su3_mat_vec
    mult_adj_su3_mat_vec
    mult_su3_mat_vec_sum_4dir
    mult_adj_su3_mat_vec_4dir
    mult_adj_su3_mat_4vec
    su3_projector
    mult_su3_mat_hwvec
    mult_adj_su3_mat_hwvec
    sub_four_su3_vecs
    add_su3_vector
    scalar_mult_add_su3_vector
    scalar_mult_add_su3_matrix

These have all been implemented using the NASM assembler (see
http://nasm.2y.net). However, using this assembler implies that these
routines must be called as subroutines. Compared with equivalent inline
assembler code, which can be implemented with gcc, the subroutine calls
carry a substantial overhead (about 25 cycles). Consequently I've
implemented a translator (nas2m2c.pl) which generates gcc inline macros
from each of these codes.

To use these inline routines, do the following:

1. Unpack this tar file in the top level MILC directory

2. In libraries/, do: `make -f Make_sse all`

3. In any MILC C program where you want to substitute these inline
   macros for their corresponding C routines, add the following near
   the top of the file:

       #define SSE_SUBS
       #include "../include/inline_sse.h"

Note that in step 3, the SSE_SUBS macro will enable macros similar to
the following:

    #define mult_su3_nn(args...) _inline_sse_mult_su3_nn(## args)

The effect of these macros is to transparently substitute the inline
macros for the original C invocations. If you prefer, do not define
SSE_SUBS and instead edit each C invocation in the code, adding an
"_inline_sse_" prefix.

To use these inline routines in all of the modules of a build, add the
lines in step 3 above to include/su3.h (or any other header file that
is universally included).

You may also use the NASM assembler to create object files. Invoke the
assembler as follows:

    nasm -f elf sse_mult_nn.nas

Include the resulting sse_mult_nn.o file in your link step. You'll have
to edit all invocations of the C routines to add "sse_" prefixes, or
use macros like those shown above (note, however, that only gcc allows
varargs-type macro constructions). Using nasm-generated object files is
useful when you're building these codes with a non-gcc compiler.

Don Holmgren, Fermilab
djholm@fnal.gov
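
The substitution that SSE_SUBS enables can be sketched in a small,
self-contained C file. This is only an illustration under stated
assumptions: it does not include inline_sse.h, every demo_ name below
is hypothetical, a real 2x2 matrix stands in for su3_matrix, and the
macro is written with C99 __VA_ARGS__ rather than the gcc-specific
"args..." form quoted above.

```c
#include <assert.h>

/* Hypothetical stand-in for MILC's su3_matrix (real 2x2, to keep this short). */
typedef struct { double e[2][2]; } demo_matrix;

/* Counts calls that were routed to the substituted routine. */
static int demo_inline_calls = 0;

/* The "original" C routine: c = a * b. */
static void demo_mult_nn(const demo_matrix *a, const demo_matrix *b,
                         demo_matrix *c)
{
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            c->e[i][j] = a->e[i][0] * b->e[0][j] + a->e[i][1] * b->e[1][j];
}

/* Stand-in for a generated inline version: same result, plus a counter.
   It is defined before the macro below, so its call to demo_mult_nn()
   still reaches the original routine. */
static inline void inline_demo_mult_nn(const demo_matrix *a,
                                       const demo_matrix *b,
                                       demo_matrix *c)
{
    demo_inline_calls++;
    demo_mult_nn(a, b, c);
}

/* Analogue of "#define SSE_SUBS": comment this out to keep the C routine. */
#define DEMO_SUBS

#ifdef DEMO_SUBS
/* From here on, every demo_mult_nn(...) call is renamed by the preprocessor. */
#define demo_mult_nn(...) inline_demo_mult_nn(__VA_ARGS__)
#endif

/* Caller code is unchanged: it still says demo_mult_nn(). */
static double demo_run(void)
{
    demo_matrix a = {{{1.0, 2.0}, {3.0, 4.0}}};
    demo_matrix b = {{{0.0, 1.0}, {1.0, 0.0}}};
    demo_matrix c;

    demo_mult_nn(&a, &b, &c);   /* expands to inline_demo_mult_nn(&a, &b, &c) */
    assert(c.e[1][1] == 3.0);   /* same product either way: 3*1 + 4*0 */
    return c.e[0][0];           /* 1*0 + 2*1 */
}
```

Note that the renaming macro has to appear after the declaration of the
routine it replaces, otherwise the declaration itself would be renamed.
With a compiler that lacks variadic macros, a fixed-arity form such as
"#define demo_mult_nn(a, b, c) inline_demo_mult_nn(a, b, c)" should
serve the same purpose.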