Description of compiler flags for Intel C++ Compiler 8.0
-------------------------------------------------------

-O2             Optimizes for speed. The -O2 option has the same effect as
                specifying the following options: -Og, -Oi, -Ot, -Oy, -Ob1,
                -Gf, -Gs, and -Gy. This option defaults to ON.

-O3             Optimizes for speed. Enables high-level optimization. This
                level does not guarantee higher performance, and using it
                may increase the compilation time. Impact on performance is
                application dependent; some applications may not see a
                performance improvement.

-Oa[-]          Assume [not assume] no aliasing.

-Obn            Controls the compiler's inline expansion. The amount of
                inline expansion performed varies with the value of n as
                follows:
                0: Disables inlining.
                1: Enables (default) inlining of functions declared with
                   the __inline keyword. Also enables inlining according
                   to the C++ language.
                2: Enables inlining of any function. However, the compiler
                   decides which functions to inline. Enables
                   interprocedural optimizations and has the same effect
                   as -Qip.

-Og             Enables global optimizations.

-Ot             Enables all speed optimizations.

-Oi[-]          Enables [disables] inline expansion of intrinsic functions.

-Ow[-]          Assume [not assume] no cross-function aliasing.

-Oy[-]          Enables [disables] the use of the EBP register in
                optimizations. When you disable with -Oy-, the EBP register
                is used as frame pointer.

-Gf             Enables string-pooling optimization.

-Gs[n]          Disables stack-checking for routines with n or more bytes
                of local variables and compiler temporaries.
                Default: n=4096

-Gy             Packages functions to enable linker optimization.

-Qax{K|W|N}     Generates specialized code for processor-specific codes
                K, W, N while also generating generic IA-32 code.
                K = Intel Pentium III and compatible Intel processors
                W = Intel Pentium 4 and compatible Intel processors
                N = Intel Pentium 4 and compatible Intel processors
                These options also enable advanced data layout and code
                restructuring optimizations to improve memory accesses for
                Intel processors.

-Qx{K|W|N}      Generates specialized code to run exclusively on processors
                supporting the extensions indicated by the processor-specific
                codes described above.

-Qip            Enables single-file interprocedural optimizations.

-Qipo           Enables multi-file interprocedural optimizations that include:
                - inline function expansion
                - interprocedural constant propagation
                - monitoring module-level static variables
                - dead code elimination
                - propagation of function characteristics
                - passing arguments in registers
                - loop-invariant code motion

-Qprof_gen      Instruments the program for profiling, to get the execution
                count of each basic block.

-Qprof_use      Enables the use of profiling dynamic feedback information
                during optimization.

-Qrcd           Enables [disables] fast conversions of floating-point to
                integer. This option does not guarantee that any particular
                rounding mode will be used.

-Qansi_alias[-] -Qansi_alias directs the compiler to assume the following:
                - Arrays are not accessed out of bounds.
                - Pointers are not cast to non-pointer types, and vice versa.
                - References to objects of two different scalar types cannot
                  alias. For example, an object of type int cannot alias
                  with an object of type float, or an object of type float
                  cannot alias with an object of type double.
                If your program satisfies the above conditions, setting the
                -Qansi_alias flag will help the compiler better optimize the
                program. However, if your program does not satisfy one of
                the above conditions, the -Qansi_alias flag may lead the
                compiler to generate incorrect code.

-GR[-]          Enables [disables] C++ Run Time Type Information (RTTI).
                Default is -GR-

-GX[-]          Enables [disables] C++ Exception Handling.
                Default is -GX-

-fast           Maximizes speed across the entire program. Turns on -O3
                and -Qipo.

/Qfp_port       Rounds floating-point results at assignments and casts
                (some speed impact).

/Qprefetch      Accepted with a warning and ignored by the Intel C/C++
                Compiler.

-Qunroll[n]     Specifies the maximum number of times to unroll a loop.
                n=0 disables loop unrolling.
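As an illustration of the -Qansi_alias conditions listed above, the following C sketch reads the bit pattern of a float without accessing it through a pointer to a different scalar type. The function name float_bits is hypothetical, and the example assumes a 32-bit IEEE 754 float; it is not taken from any benchmark source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bits of a float.
   Copying with memcpy never reads the float object through an
   integer-typed lvalue, so it stays within the aliasing assumptions
   that -Qansi_alias describes. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);  /* assumption: 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The pattern that the flag's conditions rule out would look like
       uint32_t bits = *(uint32_t *)&f;
   which reads an object of type float as an integer type. */
```

Code written in the second style is the kind that may be miscompiled when the compiler is allowed to assume that objects of different scalar types do not alias.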

/Qoption,tool,optlist
                /Qoption passes an option specified by optlist to a tool,
                where optlist is a comma-separated list of options.

                tool     Description
                ------------------------------------
                cpp      Specifies the compiler front-end preprocessor
                c        Specifies the C++ compiler
                asm      Specifies the assembler
                link     Specifies the linker

                optlist  Indicates one or more valid argument strings for
                         the designated program. If the argument is a
                         command-line option, you must include the hyphen.
                         If the argument contains a space or tab character,
                         you must enclose the entire argument in quotation
                         characters (""). You must separate multiple
                         arguments with commas.

                /Qoption can be used with the -Qipo flag to refine IPO.
                The valid options that can be used for this purpose are:

                -ip_args_in_regs=0
                    Disables the passing of arguments in registers.

                -ip_ninl_max_stats=n
                    Sets the valid max number of intermediate language
                    statements for a function that is expanded inline.
                    The number n is a positive integer. The number of
                    intermediate language statements usually exceeds the
                    actual number of source language statements. The
                    default value for n is 230. The compiler uses a larger
                    limit for user inline functions.

                -ip_ninl_min_stats=n
                    Sets the valid min number of intermediate language
                    statements for a function that is expanded inline.
                    The number n is a positive integer. The default value
                    for ip_ninl_min_stats is:
                    IA-32 compiler: ip_ninl_min_stats = 7

                -ip_ninl_max_total_stats=n
                    Sets the maximum increase in size of a function,
                    measured in intermediate language statements, due to
                    inlining. n is a positive integer whose default value
                    is 2000.

shlW32M6.lib:   MicroQuill SmartHeap Library 6.0, available from
                http://www.microquill.com/

-Zp{1|2|4|8|16} Specifies the strictest alignment constraint for structure
                and union types as 1, 2, 4, 8, or 16 bytes. Default is 16.

-arch:SSE       Enables the compiler to use SSE instructions.

-arch:SSE2      Enables the compiler to use SSE2 instructions.
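To make the effect of the -Zp alignment constraint described above more concrete, here is a minimal C sketch that uses the widely supported #pragma pack directive to apply 1-byte packing to a single structure. Treating this pragma as a per-type counterpart of -Zp1 is an assumption made for illustration, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler may insert padding before 'value'. */
struct natural_rec {
    char    tag;
    int32_t value;
};

/* 1-byte packing for this one type, assumed to be similar in effect to
   what -Zp1 requests for every structure in the compilation. */
#pragma pack(push, 1)
struct packed_rec {
    char    tag;
    int32_t value;
};
#pragma pack(pop)

/* Hypothetical helper: bytes of padding that the packing removed. */
size_t padding_saved(void)
{
    assert(sizeof(struct packed_rec) <= sizeof(struct natural_rec));
    return sizeof(struct natural_rec) - sizeof(struct packed_rec);
}
```

Tighter packing reduces structure size but can leave members misaligned, which is one reason the choice of -Zp value can affect performance.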

Description of compiler flags for Intel Fortran Compiler 8.0
------------------------------------------------------------

-O2             Optimizes for maximum speed. The -O2 option has the same
                effect as -Ox. This option defaults to ON.

-O3             Enables -O2 option with more aggressive optimization, for
                example, loop transformation. Optimizes for maximum speed
                but may not improve performance for some programs.

-Oa[-]          Assume [not assume] no aliasing.

-Ob{0|1|2}      Controls the compiler's inline expansion. The amount of
                inline expansion performed varies as follows:
                -Ob0: Disables inlining.
                -Ob1: Disables (default) inlining unless -Qip or -Ob2 is
                      specified. Enables inlining of functions.
                -Ob2: Enables inlining of any function. However, the
                      compiler decides which functions to inline. Enables
                      interprocedural optimizations and has the same
                      effect as -Qip.

-Og             Enables global optimizations.

-Ot             Enables all speed optimizations.

-Oi[-]          Enables [disables] inline expansion of intrinsic functions.

-Ow[-]          Assume [not assume] no cross-function aliasing.

-Ox             Same as the -O2 option: enables -Gs, -Ob1, -Og, -Oy, -Ot,
                and -Oi.

-Oy[-]          Enables [disables] the use of the EBP register in
                optimizations. When you disable with -Oy-, the EBP register
                is used as frame pointer.

-Gf             Enables string-pooling optimization.

-Gs[n]          Disables stack-checking for routines with n or more bytes
                of local variables and compiler temporaries.
                Default: n=4096

-Gy             Packages functions to enable linker optimization.

-fast           Maximizes speed across the entire program. Turns on -O3
                and -Qipo.

-Qax{K|W|N}     Generates specialized code for processor-specific codes
                K, W, N while also generating generic IA-32 code.
                K = Intel Pentium III and compatible Intel processors
                W = Intel Pentium 4 and compatible Intel processors
                N = Intel Pentium 4 and compatible Intel processors
                These options also enable advanced data layout and code
                restructuring optimizations to improve memory accesses for
                Intel processors.

-Qx{K|W|N}      Generates specialized code to run exclusively on processors
                supporting the extensions indicated by the processor-specific
                codes described above.

-Qip            Enables single-file interprocedural optimizations.

-Qipo           Enables multi-file interprocedural optimizations that include:
                - inline function expansion
                - interprocedural constant propagation
                - monitoring module-level static variables
                - dead code elimination
                - propagation of function characteristics
                - passing arguments in registers
                - loop-invariant code motion

-Qprof_gen      Instruments the program for profiling, to get the execution
                count of each basic block.

-Qprof_use      Enables the use of profiling dynamic feedback information
                during optimization.

-Qrcd           Enables [disables] fast conversions of floating-point to
                integer. This option does not guarantee that any particular
                rounding mode will be used.

-Qansi_alias    Enables (default) or disables the compiler's assumption
                that the program adheres to the ANSI Fortran type
                aliasability rules. For example, an object of type real
                cannot be accessed as an integer. See the ANSI standard
                for the complete set of rules.

-Qscalar_rep[-] Enables [disables] scalar replacement performed during
                loop transformations (requires /O3).

-Qunroll[n]     Specifies the maximum number of times to unroll a loop.
                n=0 disables loop unrolling.

-Qprefetch[-]   Enables or disables prefetch insertion (requires -O3).

/Qoption,tool,optlist
                /Qoption passes an option specified by optlist to a tool,
                where optlist is a comma-separated list of options.

                tool     Description
                ------------------------------------
                fpp      Specifies the Fortran preprocessor
                f        Specifies the Fortran compiler
                asm      Specifies the assembler
                link     Specifies the linker

                optlist  Indicates one or more valid argument strings for
                         the designated tool. You must separate multiple
                         arguments with commas.

                /Qoption can be used with the -Qipo flag to refine IPO.
                The valid options that can be used for this purpose are:

                -ip_args_in_regs=0
                    Disables the passing of arguments in registers.

                -ip_ninl_max_stats=n
                    Sets the valid max number of intermediate language
                    statements for a function that is expanded inline.
                    The number n is a positive integer. The number of
                    intermediate language statements usually exceeds the
                    actual number of source language statements. The
                    default value for n is 230. The compiler uses a larger
                    limit for user inline functions.

                -ip_ninl_min_stats=n
                    Sets the valid min number of intermediate language
                    statements for a function that is expanded inline.
                    The number n is a positive integer. The default value
                    for ip_ninl_min_stats is:
                    IA-32 compiler: ip_ninl_min_stats = 7

                -ip_ninl_max_total_stats=n
                    Sets the maximum increase in size of a function,
                    measured in intermediate language statements, due to
                    inlining. n is a positive integer whose default value
                    is 2000.

shlW32M6.lib:   MicroQuill SmartHeap Library 6.0, available from
                http://www.microquill.com/

-Zp{1|2|4|8|16} Specifies the strictest alignment constraint for structure
                and union types as 1, 2, 4, 8, or 16 bytes. Default is 16.


Other Notes:
------------
"/" and "-" are both allowable starting tokens for flags passed to the
compiler, i.e. -QxK and /QxK are identical switches.


Compiler options for PGI Fortran compiler 5.1 for Windows XP
-------------------------------------------------------------
The optimization levels and their meanings are as follows:

+ACML           Link with the AMD Core Math Library 2.0. Available from
                www.amd.com

-O0             A basic block is generated for each Fortran statement. No
                scheduling is done between statements. No global
                optimizations are performed.

-O1             Scheduling within extended basic blocks is performed. Some
                register allocation is performed. No global optimizations
                are performed.

-O2             All level 1 optimizations are performed.
                In addition, scalar optimizations such as induction
                recognition and loop invariant motion are performed by the
                global optimizer.

-O3             This level performs all level-one and level-two
                optimizations and enables more aggressive hoisting and
                scalar replacement optimizations.

-fast           Equivalent to "-O2 -Munroll=c:1 -Mnoframe -Mlre"

-fastsse        Equivalent to "-fast -Mscalarsse -Mvect=sse -Mcache_align
                -Mflushz"

-Mcache_align   Align unconstrained objects of length greater than or
                equal to 16 bytes on cache-line boundaries. An
                unconstrained object is a data object that is not a member
                of an aggregate structure or common block. This option
                does not affect the alignment of allocatable or automatic
                arrays.
                Note: To effect cache-line alignment of stack-based local
                variables, the main program or function must be compiled
                with -Mcache_align.

-Mfixed         Process source using Fortran fixed-form specifications.

-Mflushz        Set SSE MXCSR register to flush-to-zero mode.

-Mipa=[option]  Enables interprocedural analysis with the specified
                option. The valid options are:

                -Mipa=align      Instructs the IPA to recognize when
                                 pointer targets are all cache-line
                                 aligned, allowing better SSE code
                                 generation.
                -Mipa=arg        Instructs the IPA to remove arguments
                                 replaced by -Mipa=ptr,const
                -Mipa=const      Enables propagation of constants across
                                 procedure calls.
                -Mipa=fast       Equivalent to:
                                 -Mipa=align,arg,const,globals,f90ptr,shape,localarg,ptr,vestigial
                -Mipa=globals    Instructs the IPA to optimize references
                                 to globals when not used in procedure
                                 calls.
                -Mipa=localarg   Externalizes local variables for use with
                                 -Mipa=arg
                -Mipa=ptr        Instructs the IPA to perform pointer
                                 disambiguation across procedure calls.
                -Mipa=vestigial  Instructs the IPA to eliminate functions
                                 that are not called.

-Mnoframe       Eliminate operations that set up a true stack frame
                pointer for functions.

-Mnosmart       Don't run the Smart assembly re-write tool to enable
                post-compilation linear assembly scheduling and
                optimization.

-Mscalarsse     Utilize the SSE (Streaming SIMD Extensions; SIMD = Single
                Instruction, Multiple Data) and SSE2 instructions to
                perform the operations coded. This implies -Mflushz.

-Munix          Use UNIX calling conventions, no trailing underscores.

-Munroll        Invokes the loop unroller. This also sets the optimization
                level to 2 if the level is set to less than 2.
                c:m  Instructs the compiler to completely unroll loops
                     with a constant loop count less than or equal to m,
                     a supplied constant. If this value is not supplied,
                     the m count is set to 4.
                n:u  Instructs the compiler to unroll u times a loop that
                     is not completely unrolled or that has a non-constant
                     loop count. If u is not supplied, the unroller
                     computes the number of times a candidate loop is
                     unrolled.

-Mvect=sse      Instructs the vectorizer to search for loops and, where
                possible, use the SSE or SSE2 and prefetch instructions
                (depending on which processor is targeted).


Portability options for CPU2000:
-------------------------------

176.gcc:
    -Dalloca=_alloca    : Uses the built-in optimized alloca.
    /Fn                 : 176.gcc uses alloca, and this option tells the
                          linker to pre-allocate n bytes of stack. The
                          default amount of stack allocated is not enough,
                          and 176.gcc crashes with a run-time error.

178.galgel:
    -Mfixed             : Assume fixed-format source.

186.crafty:
    -DNT_i386           : Specifies that it is a Windows NT Intel
                          processor-based system, which makes the compiler
                          use "long long" as the 64-bit variable that
                          186.crafty needs.

253.perlbmk:
    -DSPEC_CPU2000_NTOS : Enables the code changes for porting to Windows.
    -DPERLDLL           : On Windows, we need a single perl.exe instead of
                          a perl.exe plus perl.dll. This pre-define applies
                          the changes necessary to get a single, UNIX-style
                          executable, avoiding the indirect calls that can
                          cause a 10% performance degradation.
                          This allows the Windows-based executable to be as
                          close as possible to the Unix-based one.
    /MT                 : Use the static multi-threaded library; otherwise
                          the benchmark will not compile.

254.gap:
    -DSYS_HAS_CALLOC_PROTO
    -DSYS_HAS_MALLOC_PROTO
                        : These two pre-defines indicate that calloc and
                          malloc prototypes already exist.
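The following C sketch shows one way pre-defines such as SYS_HAS_MALLOC_PROTO and SYS_HAS_CALLOC_PROTO can guard legacy declarations. It is a hypothetical illustration, not the actual 254.gap source: the fallback declarations and the helper zeroed_buffer are assumptions, and the #define lines merely stand in for passing the -D options on the command line.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for -DSYS_HAS_MALLOC_PROTO and -DSYS_HAS_CALLOC_PROTO on the
   command line, so that this sketch compiles on its own. */
#ifndef SYS_HAS_MALLOC_PROTO
#define SYS_HAS_MALLOC_PROTO
#endif
#ifndef SYS_HAS_CALLOC_PROTO
#define SYS_HAS_CALLOC_PROTO
#endif

/* Hypothetical legacy fallbacks: declared only when the system headers
   are assumed NOT to provide prototypes. With the pre-defines set, these
   lines are skipped and the <stdlib.h> prototypes are used instead. */
#ifndef SYS_HAS_MALLOC_PROTO
extern char *malloc();
#endif
#ifndef SYS_HAS_CALLOC_PROTO
extern char *calloc();
#endif

/* Hypothetical helper: allocates a zero-filled buffer of n bytes.
   Returns NULL if the allocation fails. */
char *zeroed_buffer(size_t n)
{
    assert(n > 0);  /* precondition for this sketch */
    return (char *)calloc(n, 1);
}
```

Without such guards, an old-style declaration can conflict with the prototype supplied by the system headers, which is the kind of compile error these pre-defines are meant to avoid.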