Sun flags file for SPEC benchmark suite SPEC OMP2001

This file is for flags used with Sun Opteron-based systems.

Flags described below are for the compilers:
    Sun Studio 10
    Sun Studio 11

And for the OS:
    Solaris 10

Revised 4 November 2005

----------------------------------------------------------------------------
Compiler flags
----------------------------------------------------------------------------

Flag                    Description
----                    -----------

-D                      Set definition for preprocessor.

-dalign                 Selects generation of faster double word load/store
                        instructions, and alignment of double and quad data
                        on their natural boundaries in common blocks.

-depend=yes             Selects dependence analysis to better optimize DO
                        loops.

-e                      Accept extended (132 character) input source lines
                        (FORTRAN)

-fast                   A convenience option that selects a set of
                        optimizations for performance. It chooses the
                        following switches, which are described elsewhere
                        in this file:
                        (C)
                          -fns -fsimple=2 -fsingle -ftrap=%none -nofstore
                          -xalias_level=basic -xbuiltin=%all -xdepend
                          -xlibmil -xlibmopt -xO5 -xregs=frameptr
                          -xtarget=native
                        (Fortran)
                          -xtarget=native -xO5 -xlibmil -fsimple=2 -dalign
                          -xlibmopt -depend=yes -fns -ftrap=common
                          -pad=local -xvector=yes -xprefetch=yes
                          -xprefetch_level=2 -nofstore

-fixed                  Accept fixed-format input source files (FORTRAN)

-fns                    Select non-standard floating point mode. This flag
                        causes the nonstandard floating point mode to be
                        enabled when a program begins execution. By
                        default, the nonstandard floating point mode will
                        not be enabled automatically.
                        Warning: When nonstandard mode is enabled, floating
                        point arithmetic may produce results that do not
                        conform to the requirements of the IEEE 754
                        standard. See the Numerical Computation Guide for
                        more information (see docs.sun.com).

-fsimple=2              Selects aggressive floating-point optimizations.
                        This option might be unsuited for programs
                        requiring strict IEEE 754 standards compliance.

-fsingle                (-Xt and -Xs modes only) Causes the compiler to
                        evaluate float expressions as single precision,
                        rather than double precision. (This option has no
                        effect if the compiler is used in either -Xa or -Xc
                        modes, as float expressions are already evaluated
                        as single precision.)

-ftrap=t                Sets the IEEE 754 trapping modes that are
                        established at program initialization. t is a
                        comma-separated list that consists of one or more
                        of the following: %all, %none, common,
                        [no%]invalid, [no%]overflow, [no%]underflow,
                        [no%]division, [no%]inexact. Processing is
                        left-to-right. The default is -ftrap=%none.
                          common - invalid, division by zero, and overflow.
                          %none  - the default, turns off all trapping
                                   modes.
                        Do not use this option for programs that depend on
                        IEEE standard exception handling; you can get
                        different numerical results, premature program
                        termination, or unexpected SIGFPE signals.

-lm                     Link with math library

-lmopt                  This chooses the math library that is optimized for
                        speed

-lmtmalloc              Link with library libmtmalloc.a or libmtmalloc.so

-lmvec                  Link with vector math library

-nofstore               Cancels forcing expressions to have the precision
                        of the result.

-pad=local              Local padding to improve use of cache.

-Qoption                Pass option list to the compiler phase
                        (Fortran, C++):
                          f90comp - Fortran first pass
                          iropt   - Global optimizer
                          cg      - Code generator

-qoption                Same as -Qoption; the letter q is not case
                        sensitive.

-Qoption iropt -Aujam:inner=g
                        Increase the probability that small-trip-count
                        inner loops will be fully unrolled.

-Qoption iropt -xprefetch_level[=1|2|3]
                        Controls the aggressiveness of automatic prefetch
                        generation. -xprefetch_level=1 enables automatic
                        generation of prefetch instructions.
                        -xprefetch_level=2 enables additional generation
                        beyond level 1 and -xprefetch_level=3 enables
                        additional generation beyond level 2.

-xalias_level=          Allows compiler to perform type-based alias
                        analysis at the given alias level (C).
                          basic  - assume ISO C9X aliasing rules for basic
                                   types only.
                          std    - assume ISO C9X aliasing rules.
                          strong - assume all pointers are type safe
                                   (strongly typed).

-xarch=isa              This option limits the code generated by the
                        compiler to the instructions of the specified
                        instruction set architecture.
                          amd64 - Compile for 64-bit Solaris x86 platforms.

-xbuiltin=%all          Substitute intrinsic functions or inline system
                        functions where profitable for performance.

-Xc                     Assume strict ANSI C conformance.

-xcrossfile[=n]         Enable optimization and inlining across source
                        files, n={0|1}. The default is -xcrossfile=0, which
                        specifies that no cross file optimizations are
                        performed. -xcrossfile is equivalent to
                        -xcrossfile=1. Normally, the scope of the
                        compiler's analysis is limited to each separate
                        file on the command line. With -xcrossfile, the
                        compiler analyzes all the files named on the
                        command line as if they had been concatenated into
                        a single source file.

-xdepend                Analyze loops for data dependencies.

-xipo[=n]               Enable optimization and inlining across source
                        files, n={0|1|2}. At -xipo=2, the compiler performs
                        interprocedural aliasing analysis as well as
                        optimization of memory allocation and layout to
                        improve cache performance.

-xlibmil                Selects inlining of certain math library routines.

-xlibmopt               Selects linking the optimized math library.

-xlic_lib=sunperf       Link in the Sun supplied performance libraries

-xO1                    Does basic local optimization (peephole).

-xO2                    -xO1 plus more local and global optimizations.

-xO3                    Besides what -xO2 does, it optimizes references or
                        definitions for external variables. Loop unrolling
                        and software pipelining are also performed.

-xO4                    -xO3 plus function inlining.

-xO5                    Besides what -xO4 does, it enables speculative code
                        motion.

-xopenmp=               Enable OpenMP language extensions. The value is one
                        of {noopt|parallel|none}. If you specify -xopenmp,
                        but do not include a value, the compiler assumes
                        -xopenmp=parallel.
                          parallel - Enables recognition of OpenMP pragmas.
                                     The optimization level under
                                     -xopenmp=parallel is -xO3. The
                                     compiler changes the optimization
                                     level to -xO3 if necessary and issues
                                     a warning.

-xpagesize=             Set the preferred page size for running the
                        program.

-xprefetch_level[=n]    Controls the aggressiveness of the -xprefetch=auto
                        option (n={1|2|3})

-xprefetch[=val[,val]]  Enable prefetch instructions on those architectures
                        that support prefetch.
                          auto        - Enable automatic generation of
                                        prefetch instructions.
                          no%auto     - Disable automatic generation of
                                        prefetch instructions
                          explicit    - Enable explicit prefetch macros
                          no%explicit - Disable explicit prefetch macros
                          yes         - -xprefetch=yes is the same as
                                        -xprefetch=auto,explicit
                          no          - -xprefetch=no is the same as
                                        -xprefetch=no%auto,no%explicit
                        Defaults: If -xprefetch is not specified,
                        -xprefetch=no%auto,explicit is assumed. If only
                        -xprefetch is specified, -xprefetch=auto,explicit
                        is assumed.

-xprofile               Use the profile feature; shorthand for the process
                        described below.

-xprofile=              Collect data for a profile or use a profile to
                        optimize. The value is {{collect,use}[:name],tcov}.
                          collect[:name] - Collects and saves execution
                                           frequency for later use by the
                                           optimizer with -xprofile=use.
                                           The compiler generates code to
                                           measure statement
                                           execution-frequency.
                          use[:name]     - Uses execution frequency data to
                                           optimize strategically. The name
                                           is the name of the executable
                                           that is being analyzed.

-xregs=                 Specify the usage of optional registers

-xtarget=native         Sets the hardware target. If the program is
                        intended to run on a different target than the
                        compilation machine, follow the -fast with the
                        appropriate -xtarget= option. For example:
                          f95 -fast -xtarget=ultra

-xvector=simd           Automatic generation of the vector SIMD
                        instructions

-xvector=yes            Selects the vectorized math library.

-xprofile=              Collect or optimize with runtime profiling data.
                        The value must be collect[:nm], use[:nm], or tcov.
                        At runtime a program compiled with
                        -xprofile=collect:nm will create the subdirectory
                        nm.profile to hold the runtime feedback
                        information. nm is an optional name.

-xprofile=collect       Collect profile data for feedback directed
                        optimizations.

-xprofile=use           Use data collected for profile feedback.

----------------------------------------------------------------------------
Operating System
----------------------------------------------------------------------------

Environment Variables   Description
---------------------   -----------

LD_PRELOAD=mpss.so.1    Allow use of the mpss.so.1 shared object, which
                        provides a means by which preferred stack and/or
                        heap page sizes can be selected. Once preloaded,
                        the mpss.so.1 shared object reads environment
                        variables MPSSHEAP and MPSSSTACK to determine any
                        preferred page sizes.

MPSSHEAP=               Specify the preferred page size for heap. The
                        specified page size is applied to all created
                        processes.

MPSSSTACK=              Specify the preferred page size for stack. The
                        specified page size is applied to all created
                        processes.

OMP_DYNAMIC             Enables (TRUE) or disables (FALSE) dynamic
                        adjustment of the number of threads available for
                        execution of parallel regions.

OMP_NUM_THREADS         Sets the number of threads to use during execution,
                        unless that number is explicitly changed by calling
                        the OMP_SET_NUM_THREADS subroutine.

SUNW_MP_PROCBIND        This environment variable can be used to bind the
                        LWPs (lightweight processes) managed by the
                        microtasking library, libmtsk, to processors.
                        Performance can be enhanced with processor binding,
                        but performance degradation will occur if multiple
                        LWPs are bound to the same processor.
                        The value for SUNW_MP_PROCBIND can be
                          - the string TRUE or FALSE (in any case).
                          - a non-negative integer.
                          - a list of two or more non-negative integers
                            separated by one or more spaces (" ").
                          - two non-negative integers, n1 and n2, separated
                            by a minus ("-"); n1 must be less than or equal
                            to n2.
                        Integers in the above denote the "logical"
                        processor IDs to which the LWPs are to be bound.
                        Logical processor IDs are consecutive integers that
                        start with 0, and may or may not be identical to
                        the actual processor IDs. If n processors are
                        available online, then their logical processor IDs
                        are 0, 1, ..., n-1.
                        By default, LWPs are not bound to processors. It is
                        left up to the operating system, Solaris, to
                        schedule LWPs onto processors.

STACKSIZE               A default stacksize of 4 MB (for 32-bit programs)
                        and 8 MB (for 64-bit programs) is used for
                        additional threads created in an OpenMP program.
                        The environment variable STACKSIZE can be used to
                        set it to a different value. For example,
                          setenv STACKSIZE 2048
                        creates threads with stacksize of 2 MB each.

OMP_NESTED              Enables or disables nested parallelism. Value is
                        either TRUE or FALSE.

SUNW_MP_THR_IDLE=SPIN   Controls the end-of-task status of each helper
                        thread executing the parallel part of a program.
                        You can set the value to spin, sleep ns, or
                        sleep nms. The default is SPIN: the thread spins
                        (or busy-waits) after completing a parallel task
                        until a new parallel task arrives.
                        Choosing SLEEP time specifies the amount of time a
                        helper thread should spin-wait after completing a
                        parallel task. If, while a thread is spinning, a
                        new task arrives for the thread, the thread
                        executes the new task immediately. Otherwise, the
                        thread goes to sleep and is awakened when a new
                        task arrives. time may be specified in seconds,
                        (ns) or just (n), or milliseconds, (nms).
                        SLEEP with no argument puts the thread to sleep
                        immediately after completing a parallel task.
                        SLEEP, SLEEP (0), SLEEP (0s), and SLEEP (0ms) are
                        all equivalent.

- - - - - - - - - - - - - - - - - - - - - - - - -

Shell Variables         Description
---------------         -----------

ulimit -s unlimited     Set size of stack segment to unlimited

----------------------------------------------------------------------------
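
As an illustration of how the operating system settings described above fit together, the following is a minimal Bourne-shell sketch of a run-time setup for a 4-thread OpenMP run. The thread count, the processor range 0-3, and the choice of SPIN are assumed example values, not settings taken from any particular result; STACKSIZE 2048 repeats the 2 MB example given in the STACKSIZE entry.

```shell
#!/bin/sh
# Hypothetical run-time setup for a 4-thread OpenMP run (example values only).

# Use 4 threads and keep the runtime from adjusting that number.
OMP_NUM_THREADS=4
OMP_DYNAMIC=FALSE
OMP_NESTED=FALSE

# Bind LWPs to logical processors 0 through 3 (the "n1-n2" form).
SUNW_MP_PROCBIND=0-3

# Idle helper threads busy-wait between parallel tasks (the default).
SUNW_MP_THR_IDLE=SPIN

# 2 MB stack for each additional thread, as in the STACKSIZE example.
STACKSIZE=2048

export OMP_NUM_THREADS OMP_DYNAMIC OMP_NESTED
export SUNW_MP_PROCBIND SUNW_MP_THR_IDLE STACKSIZE

# Raise the stack segment limit; this can fail if the hard limit is lower.
ulimit -s unlimited 2>/dev/null || echo "stack limit left unchanged"

echo "threads=$OMP_NUM_THREADS procbind=$SUNW_MP_PROCBIND stacksize=$STACKSIZE"
```

These lines would normally be placed in, or sourced by, the script that launches the benchmark binary, so that the variables are present in the environment of the benchmark process.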