# Invocation command line:
# /home/cpu2017-1.1.8-amd-aocc400-genoa-B1b/bin/harness/runcpu --configfile amd_speed_aocc400_genoa_B1.cfg --tune all --reportable --iterations 3 --nopower --runmode speed --tune base:peak --size test:train:refspeed fpspeed
# output_root was not used for this run
############################################################################
################################################################################
# AMD AOCC 400 SPEC CPU 2017 V1.1.8 Speed Configuration File for 64-bit Linux
#
# File name                : amd_speed_aocc400_genoa_B1.cfg
# Creation Date            : October 6, 2022
# CPU 2017 Version         : 1.1.8
# Supported benchmarks     : All Speed benchmarks (intspeed, fpspeed)
# Compiler name/version    : AOCC 4.0.0
# Operating system version : RHEL 8.6
# Supported OS's           : Ubuntu 22.04, RHEL 8.6/9, SLE 15 SP4
# Hardware                 : AMD Genoa (AMD64)
# FP Base Pointer Size     : 64-bit
# FP Peak Pointer Size     : 64-bit
# INT Base Pointer Size    : 64-bit
# INT Peak Pointer Size    : 64-bit
# Auto Parallelization     : No
#
# Note: DO NOT EDIT THIS FILE, the only edits required to properly run these
# binaries are made in the ini Python file. Please consult Readme.amd_speed_aocc400_genoa_B1.txt
# for a few uncommon exceptions which require edits to this file.
#
# Description:
#
# This binary package automates away many of the complexities necessary to set
# up and run SPEC CPU 2017 under optimized conditions on AMD Genoa-based
# server platforms within Linux (AMD64).
#
# The binary package was built specifically for AMD Genoa microprocessors and
# is not intended to run on other products.
#
# Please install the binary package by following the instructions in
# "Readme.amd_speed_aocc400_genoa_B1.txt" under the "How To Use the Binaries" section.
#
# The binary package is designed to work without alteration on one socket AMD
# Genoa-based servers with 96 cores, SMT enabled and 768 (64x12) GB of DDR5
# memory distributed evenly among all 12 channels using 64 GiB DIMMs.
#
# To run the binary package on other Genoa configurations, please review
# "Readme.amd_speed_aocc400_genoa_B1.txt". In general, Genoa CPUs
# should be autodetected with no action required by the user.
#
# In most cases, it should be unnecessary to edit "amd_speed_aocc400_genoa_B1.cfg" or any
# other file besides "ini_amd_speed_aocc400_genoa_B1.py" where reporting fields
# and run conditions are set.
#
# The run script automatically sets the optimal number of speed copies and binds
# them appropriately.
#
# The run script and accompanying binary package are designed to work on Ubuntu
# 22.04, RHEL 8.6/9, and SLE 15 SP4.
#
# Important! If you write your own run script, please set the stack size to
# "unlimited" when executing this binary package. Failure to do so may cause
# some benchmarks to overflow the stack. For example, to set stack size within
# the bash shell, include the following line somewhere at the top of your run
# script before the runcpu invocation:
#
# ulimit -s unlimited
#
# Modification of this config file should only be necessary if you intend to
# rebuild the binaries. General instructions for rebuilding the binaries are
# found in-line below.
#
### add power options
power_analyzer = WIN:9888
temp_meter = WIN:9889
################################################################################
# Modifiable macros:
################################################################################
# "allow_build" switch:
# Change the following line to true if you intend to REBUILD the binaries (AMD
# does not support this). Valid values are "true" or "false" (no quotes).
%define allow_build false

# Only change these macros if you are rebuilding the binary package:
%define compiler_name aocc400
%define binary_package_name amd_speed_%{compiler_name}_genoa_B
%define binary_package_ext %{binary_package_name}
%define binary_package_revision 1
%define build_path ${SPEC}
%define flags_file_name %{compiler_name}-flags.xml
# Do NOT change build_lib_dir after the build or it will trigger a
# rebuild of the xalanc. It should also remain literal:
%define build_lib_dir amd_speed_aocc400_genoa_B_lib
# To enable the platform file, be sure to uncomment the flagsurl02 header line
# below in the Header settings.
%define platform_file_name INVALID_platform_%{binary_package_name}.xml

################################################################################
# You should never have to change binary_package_full_name:
%define binary_package_full_name %{binary_package_name}%{binary_package_revision}

################################################################################
# Include file name
################################################################################
# The include file contains fields that are commonly changed. This file is
# auto-generated based upon INI file settings and should not need user
# modification for runs.
%define inc_file_name %{binary_package_full_name}.inc
%define flags_inc_file_name %{binary_package_full_name}_flags.inc

# Binary label extension:
# Only modify the binary label extension if you plan to rebuild the binaries.
# If you plan to recompile these CPU 2017 binaries, please choose a new extension
# name below to avoid confusion with the current binary set on your system
# under test, and to avoid confusion for SPEC submission reviewers. You will
# also need to set "allow_build" to true above. Finally, you must modify the
# Paths section below to point to your library locations if the paths are not
# already set up in your build environment.
# Note that AMD calls an external script to set up the compiler and library
# paths before initiating the build.
%define ext %{binary_package_ext}

################################################################################
# Paths and Environment Variables
# ** MODIFY AS NEEDED (modification should not be necessary for runs) **
################################################################################

# Allow environment variables to be set before runs:
preenv = 1

# retain:true is necessary to avoid gcc out-of-memory exceptions on certain SUTs:
# oversize_threshold is required to support jemalloc 5.2.x+
preENV_MALLOC_CONF = oversize_threshold:0,retain:true
preENV_LIBOMP_NUM_HIDDEN_HELPER_THREADS = 0

# OpenMP environment variables:
preENV_OMP_SCHEDULE = static
preENV_OMP_DYNAMIC = false
preENV_OMP_STACKSIZE = 128M

# Define the name of the directory that holds AMD library files:
%define lib_dir %{binary_package_name}_lib

# Set the shared object library path for runs and builds:
preENV_LD_LIBRARY_PATH = $[top]/%{lib_dir}/lib:%{ENV_LD_LIBRARY_PATH}

%if '%{allow_build}' eq 'false'
# The include file is only needed for runs, but not for builds.
# include: %{inc_file_name}
# ----- Begin inclusion of 'amd_speed_aocc400_genoa_B1.inc'
############################################################################
################################################################################
################################################################################
# File name: amd_speed_aocc400_genoa_B1.inc
# File generation code date: October 11, 2022
# File generation date/time: December 18, 2022 / 02:36:02
#
# This file is automatically generated during a SPEC CPU2017 run.
#
# To modify inc file generation, please consult the readme file or the run
# script.
################################################################################
################################################################################
################################################################################
################################################################################
# The following macros are generated for use in the cfg file.
################################################################################
################################################################################
%define logical_core_count 192
%define physical_core_count 96
%define physical_core_max 95
%define logical_core_max 191
################################################################################
################################################################################
# The following inc blocks set the speed thread counts and affinity settings.
#
# intspeed benchmarks: 600.perlbench_s,602.gcc_s,605.mcf_s,620.omnetpp_s,
# 623.xalancbmk_s,625.x264_s,631.deepsjeng_s,641.leela_s,648.exchange2_s,
# 657.xz_s
# fpspeed benchmarks: 603.bwaves_s,607.cactuBSSN_s,619.lbm_s,621.wrf_s,
# 627.cam4_s,628.pop2_s,638.imagick_s,644.nab_s,649.fotonik3d_s,
# 654.roms_s
#
# Selected thread counts from '9454' section of CPU info
################################################################################
# default preENV thread settings:
default:
preENV_OMP_THREAD_LIMIT = 192
preENV_GOMP_CPU_AFFINITY = 0-191
################################################################################
################################################################################
# intspeed base thread counts:
intspeed=base:
threads = 96
ENV_GOMP_CPU_AFFINITY = 0-95
bind0 = numactl --physcpubind=0-95
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
################################################################################
# fpspeed base thread counts:
fpspeed=base:
threads = 96
ENV_GOMP_CPU_AFFINITY = 0-95
bind0 = numactl --physcpubind=0-95
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
################################################################################
# peak thread counts: 1
600.perlbench_s,602.gcc_s,605.mcf_s,620.omnetpp_s,623.xalancbmk_s,625.x264_s,631.deepsjeng_s,641.leela_s,648.exchange2_s=peak:
threads = 1
ENV_GOMP_CPU_AFFINITY = 15
bind0 = numactl --physcpubind=15
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
################################################################################
# peak thread counts: 96
603.bwaves_s,619.lbm_s,621.wrf_s,628.pop2_s,649.fotonik3d_s=peak:
threads = 96
ENV_GOMP_CPU_AFFINITY = 0-95
bind0 = numactl --physcpubind=0-95
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
################################################################################
# peak thread counts: 192
607.cactuBSSN_s,627.cam4_s,638.imagick_s,644.nab_s,657.xz_s=peak:
threads = 192
ENV_GOMP_CPU_AFFINITY = 0-191
bind0 = numactl --physcpubind=0-191
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
################################################################################
# peak thread counts: 192
654.roms_s=peak:
threads = 192
ENV_GOMP_CPU_AFFINITY = 0 96 1 97 2 98 3 99 4 100 5 101 6 102 7 103 8 104 9 105 10 106 11 107 12 108 13 109 14 110 15 111 16 112 17 113 18 114 19 115 20 116 21 117 22 118 23 119 24 120 25 121 26 122 27 123 28 124 29 125 30 126 31 127 32 128 33 129 34 130 35 131 36 132 37 133 38 134 39 135 40 136 41 137 42 138 43 139 44 140 45 141 46 142 47 143 48 144 49 145 50 146 51 147 52 148 53 149 54 150 55 151 56 152 57 153 58 154 59 155 60 156 61 157 62 158 63 159 64 160 65 161 66 162 67 163 68 164 69 165 70 166 71 167 72 168 73 169 74 170 75 171 76 172 77 173 78 174 79 175 80 176 81 177 82 178 83 179 84 180 85 181 86 182 87 183 88 184 89 185 90 186 91 187 92 188 93 189 94 190 95 191
bind0 = numactl --physcpubind=0-191
submit = echo "$command" > run.sh ; $BIND bash run.sh
################################################################################
################################################################################
################################################################################
# Switch back to default:
default:
################################################################################
################################################################################
################################################################################
# The remainder of this file defines CPU2017 report parameters.
################################################################################
################################################################################
################################################################################
# SPEC CPU 2017 report header
################################################################################
license_num =9017
tester =Lenovo Global Technology
test_sponsor =Lenovo Global Technology
hw_vendor =Lenovo Global Technology
hw_model000 =ThinkSystem SR665 V3
hw_model001 =(2.75 GHz,AMD EPYC 9454)
#--------- If you install new compilers, edit this section --------------------
sw_compiler =C/C++/Fortran: Version 4.0.0 of AOCC
################################################################################
################################################################################
# Hardware, firmware and software information
################################################################################
hw_avail =Feb-2023
sw_avail =Nov-2022
hw_cpu_name =AMD EPYC 9454
hw_cpu_nominal_mhz =2750
hw_cpu_max_mhz =3800
hw_ncores =96
hw_nthreadspercore =2
hw_ncpuorder =1,2 chips
hw_other =None # Other perf-relevant hw, or "None"
fw_bios =Lenovo BIOS Version KAE105F 1.20 released Dec-2022
sw_base_ptrsize =64-bit
hw_pcache =32 KB I + 32 KB D on chip per core
hw_scache =1 MB I+D on chip per core
hw_tcache000 =256 MB I+D on chip per chip,
hw_tcache001 = 32 MB shared / 6 cores
hw_ocache =None
sw_other =None
################################################################################
# Notes
################################################################################
# Enter notes_000 through notes_100 here.
notes_000 =Binaries were compiled on a system with 2x AMD EPYC 9174F CPU + 1.5TiB Memory using RHEL 8.6
notes_005 =
notes_010 =NA: The test sponsor attests, as of date of publication, that CVE-2017-5754 (Meltdown)
notes_015 =is mitigated in the system as tested and documented.
notes_020 =Yes: The test sponsor attests, as of date of publication, that CVE-2017-5753 (Spectre variant 1)
notes_025 =is mitigated in the system as tested and documented.
notes_030 =Yes: The test sponsor attests, as of date of publication, that CVE-2017-5715 (Spectre variant 2)
notes_035 =is mitigated in the system as tested and documented.
notes_040 =
notes_submit_000 ='numactl' was used to bind copies to the cores.
notes_submit_005 =See the configuration file for details.
notes_submit_010 =
notes_os_000 ='ulimit -s unlimited' was used to set environment stack size limit
notes_os_005 ='ulimit -l 2097152' was used to set environment locked pages in memory limit
notes_os_010 =
notes_os_015 =runcpu command invoked through numactl i.e.:
notes_os_020 =numactl --interleave=all runcpu
notes_os_025 =
notes_os_030 =To limit dirty cache to 8% of memory, 'sysctl -w vm.dirty_ratio=8' run as root.
notes_os_035 =To limit swap usage to minimum necessary, 'sysctl -w vm.swappiness=1' run as root.
notes_os_040 =To free node-local memory and avoid remote memory usage,
notes_os_045 ='sysctl -w vm.zone_reclaim_mode=1' run as root.
notes_os_050 =To clear filesystem caches, 'sync; sysctl -w vm.drop_caches=3' run as root.
notes_os_055 =To disable address space layout randomization (ASLR) to reduce run-to-run
notes_os_060 =variability, 'sysctl -w kernel.randomize_va_space=0' run as root.
notes_os_065 =
notes_os_thp_000 =To enable Transparent Hugepages (THP) for all allocations,
notes_os_thp_005 ='echo always > /sys/kernel/mm/transparent_hugepage/enabled' and
notes_os_thp_010 ='echo always > /sys/kernel/mm/transparent_hugepage/defrag' run as root.
notes_comp_000 =The AMD64 AOCC Compiler Suite is available at
notes_comp_005 =http://developer.amd.com/amd-aocc/
notes_comp_010 =
# notes_jemalloc_000 =jemalloc: configured and built with GCC v4.8.2 in RHEL 7.4 (No options specified)
# notes_jemalloc_005 =jemalloc 5.1.0 is available here:
# notes_jemalloc_010 =https://github.com/jemalloc/jemalloc/releases/download/5.1.0/jemalloc-5.1.0.tar.bz2
# notes_jemalloc_015 =
# sw_other000 =jemalloc: jemalloc memory allocator library v5.1.0
################################################################################
# The following note fields describe platform settings.
################################################################################
# example: (edit and uncomment as necessary)
# notes_plat_000 =BIOS settings:
# notes_plat_002 = TDP: 400
# notes_plat_004 = Determinism Slider set to Power
# notes_plat_006 = PPT: 400
# notes_plat_010 = NPS: 4
# notes_plat_011 = Workload Profile = CPU Intensive
# notes_plat_012 = TSME = Disabled
# notes_plat_014 = SEV Control = Disabled
# notes_plat_015 = Fan Speed: Maximum
################################################################################
# The following are custom fields:
################################################################################
# Use custom_fields to enter lines that are not listed here. For example:
# notes_plat_100 = Energy Bias set to Max Performance
# new_field = Ambient temperature set to 10C
################################################################################
# The following fields must be set here for only Int benchmarks.
################################################################################
intspeed:
sw_peak_ptrsize =64-bit
notes_os_thp_003 =
################################################################################
# The following fields must be set here for FP benchmarks.
################################################################################
fpspeed:
sw_peak_ptrsize =64-bit
notes_os_thp_015 =To always enable THP for peak runs of:
notes_os_thp_020 =603.bwaves_s, 607.cactuBSSN_s, 619.lbm_s, 627.cam4_s, 628.pop2_s, 638.imagick_s, 644.nab_s, 649.fotonik3d_s:
notes_os_thp_025 ='echo madvise > /sys/kernel/mm/transparent_hugepage/enabled; echo always > /sys/kernel/mm/transparent_hugepage/defrag'
notes_os_thp_030 =run as root.
notes_os_thp_035 =To disable THP for peak runs of 621.wrf_s:
notes_os_thp_040 ='echo never > /sys/kernel/mm/transparent_hugepage/enabled; echo always > /sys/kernel/mm/transparent_hugepage/defrag'
notes_os_thp_045 =run as root.
notes_os_thp_050 =To enable THP only on request for peak runs of 654.roms_s:
notes_os_thp_055 ='echo madvise > /sys/kernel/mm/transparent_hugepage/enabled; echo madvise > /sys/kernel/mm/transparent_hugepage/defrag'
notes_os_thp_060 =run as root.
################################################################################
# The following fields must be set here or they will be overwritten by sysinfo.
################################################################################
intspeed,fpspeed:
hw_disk =1 x 480 GB SATA SSD
hw_memory =768 GB (24 x 32 GB 2Rx8 PC5-4800B-R)
hw_nchips =2
prepared_by =Lenovo Global Technology
sw_file =xfs
sw_os000 =SUSE Linux Enterprise Server 15 SP4 (x86_64)
sw_os001 =Kernel 5.14.21-150400.22-default
sw_state =Run level 3 (multi-user)
################################################################################
# End of inc file
################################################################################
# Switch back to the default block after the include file:
default:
# ---- End inclusion of '/home/cpu2017-1.1.8-amd-aocc400-genoa-B1b/config/amd_speed_aocc400_genoa_B1.inc'

# Switch back to default block after the include file:
default:
fail_build = 1

%elif '%{allow_build}' eq 'true'
# If you intend to rebuild, be sure to set the library paths either in the
# build script or here:
preENV_LIBRARY_PATH = $[top]/%{build_lib_dir}/lib:%{ENV_LIBRARY_PATH}
%   define build_ncpus 16 # controls number of simultaneous compiles
fail_build = 0
makeflags = --jobs=%{build_ncpus} --load-average=%{build_ncpus}
%else
%   error The value of "allow_build" is %{allow_build}, but it can only be "true" or "false". This error was generated
%endif

################################################################################
# Enable automated data collection per benchmark
################################################################################
# Data collection is not enabled for reportable runs.

# teeout is necessary to get data collection stdout into the logs. Best
# practices for the individual data collection items would be to have
# them store important output in separate files. Filenames could be
# constructed from $SPEC (environment), $lognum (result number from runcpu),
# and benchmark name/number.
teeout = yes

# Run runcpu with '-v 35' (or greater) to log lists of variables which can
# be used in substitutions as below.
# For CPU2006, change $label to $ext
%define data-collection-parameters benchname='$name' benchnum='$num' benchmark='$benchmark' iteration=$iter size='$size' tune='$tune' label='$label' log='$log' lognum='$lognum' from_runcpu='$from_runcpu'
%define data-collection-start $[top]/data-collection/data-collection start %{data-collection-parameters}
%define data-collection-stop $[top]/data-collection/data-collection stop %{data-collection-parameters}

monitor_specrun_wrapper = %{data-collection-start} ; $command ; %{data-collection-stop}

################################################################################
# Header settings
################################################################################
backup_config = 0 # set to 0 if you do not want backup files
bench_post_setup = sync
# command_add_redirect: If set, the generated ${command} will include
# redirection operators (stdout, stderr), which are passed along to the shell
# that executes the command. If this variable is not set, specinvoke does the
# redirection.
command_add_redirect = yes
env_vars = yes
flagsurl000 = http://www.spec.org/cpu2017/flags/Lenovo-Platform-SPECcpu2017-Flags-V1.2-Genoa-O.xml
flagsurl001 = http://www.spec.org/cpu2017/flags/aocc400-flags.xml
#flagsurl02 = $[top]/%{platform_file_name}
# label: User defined extension string that tags your binaries & directories:
label = %{ext}
line_width = 1020
log_line_width = 1020
mean_anyway = yes
output_format = all
reportable = yes
size = test,train,ref
teeout = yes
teerunout = yes
tune = base,peak
use_submit_for_speed = yes

################################################################################
# Include the flags file:
################################################################################
#include: %{flags_inc_file_name}
# ----- Begin inclusion of 'amd_speed_aocc400_genoa_B1_flags.inc'
############################################################################
################################################################################
# AMD AOCC 4.0.0 SPEC CPU2017 V1.1.8 Speed Configuration Flags for AMD64 Linux
################################################################################
# Compilers
################################################################################
default:
CC = clang -m64
CXX = clang++ -m64
FC = flang -m64
CLD = clang -m64
CXXLD = clang++ -m64
FLD = flang -m64
CC_VERSION_OPTION = --version
CXX_VERSION_OPTION = --version
FC_VERSION_OPTION = --version

################################################################################
# Portability Flags
################################################################################
default:
# data model applies to all benchmarks
EXTRA_PORTABILITY = -DSPEC_LP64

# *** Benchmark-specific portability ***
# Anything other than the data model is only allowed where a need is proven.
# (ordered by last 2 digits of benchmark number)
600.perlbench_s: #lang='C'
PORTABILITY = -DSPEC_LINUX_X64

621.wrf_s: #lang='F,C'
CPORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio

623.xalancbmk_s: #lang='CXX'
PORTABILITY = -DSPEC_LINUX

627.cam4_s: #lang='F,C'
PORTABILITY = -DSPEC_CASE_FLAG

628.pop2_s: #lang='F,C'
CPORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio

################################################################################
# Default libraries and variables
################################################################################
default:
# Libraries:
EXTRA_LIBS = -fopenmp=libomp \
    -lomp \
    -lamdalloc \
    -lamdlibm \
    -lm
MATHLIBOPT = #clearing this variable or else SPEC will set it to -lm
VECMATHLIB = -fveclib=AMDLIBM
# Variables:
OPT_ROOT = -march=znver4 \
    $(VECMATHLIB) \
    -ffast-math \
    -fopenmp
OPT_ROOT_BASE = -O3 \
    $(OPT_ROOT)
OPT_ROOT_PEAK = -Ofast \
    $(OPT_ROOT) \
    -flto
THP_ALWAYS = echo always > /sys/kernel/mm/transparent_hugepage/enabled; echo always > /sys/kernel/mm/transparent_hugepage/defrag
THP_NEVER = echo never > /sys/kernel/mm/transparent_hugepage/enabled; echo never > /sys/kernel/mm/transparent_hugepage/defrag
THP_MADVISE = echo madvise > /sys/kernel/mm/transparent_hugepage/enabled; echo madvise > /sys/kernel/mm/transparent_hugepage/defrag
DEFAULT_SUBMIT = echo "$command" > run.sh ; $BIND bash run.sh

###############################################################################
# AOCC 4.0.0 workarounds that do not count as PORTABILITY
################################################################################
# The workarounds in this section would not qualify under the SPEC CPU
# PORTABILITY rule.
# - In peak, they can be set as needed for individual benchmarks.
# - In base, individual settings are not allowed; set for whole suite.
# Use EXTRA_CFLAGS, EXTRA_CXXFLAGS, and EXTRA_FFLAGS for them.
#
# See:
# https://www.spec.org/cpu2017/Docs/runrules.html#portability
# https://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags

#######################
# Default workarounds #
#######################
default:
# Allow unused compile/link arguments without triggering warnings during build:
EXTRA_CFLAGS = -Wno-unused-command-line-argument
EXTRA_CXXFLAGS = -Wno-unused-command-line-argument
EXTRA_FFLAGS = -Wno-unused-command-line-argument
LDOPTIONS = -Wno-unused-command-line-argument

####################
# Base workarounds #
####################
#
# *** NONE ***
#

##############################
# Integer workarounds - base #
##############################
intrate=base:
# The following is necessary for 602 gcc:
EXTRA_LDFLAGS = -z muldefs

#########################
# FP workarounds - base #
#########################
#
# *** NONE ***
#

####################
# Peak workarounds #
####################
#
# *** NONE ***
#

##############################
# Integer workarounds - peak #
##############################
602.gcc_s=peak: #lang='C'
EXTRA_LDFLAGS = -z muldefs

#####################################
# Floating Point workarounds - peak #
#####################################
#
# *** NONE ***
#

################################################################################
# Tuning Flags
################################################################################

#####################
# Base tuning flags #
#####################
default=base:
COPTIMIZE = $(OPT_ROOT_BASE) \
    -flto \
    -fstruct-layout=7 \
    -mllvm -unroll-threshold=50 \
    -mllvm -inline-threshold=1000 \
    -fremap-arrays \
    -fstrip-mining \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -Wno-return-type \
    -zopt
CXXOPTIMIZE = $(OPT_ROOT_BASE) \
    -flto \
    -mllvm -unroll-threshold=100 \
    -finline-aggressive \
    -mllvm -loop-unswitch-threshold=200000 \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -zopt
FOPTIMIZE = $(OPT_ROOT_BASE) \
    -flto \
    -Mrecursive \
    -funroll-loops \
    -mllvm -lsr-in-nested-loop \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -zopt
LDCXXFLAGS = -Wl,-mllvm -Wl,-x86-use-vzeroupper=false
LDFLAGS = -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6 \
    -Wl,-mllvm -Wl,-reduce-array-computations=3
LDFFLAGS = -Wl,-mllvm -Wl,-enable-X86-prefetching
#other libraries
# Put OpenMP and math libraries here:
# -lm needed at the end for some transcendental functions:
EXTRA_LIBS = -fopenmp=libomp \
    -lomp \
    -lamdlibm \
    -lamdalloc \
    -lflang \
    -lm
EXTRA_FLIBS =
# Don't put the AMD and mvec math libraries in MATHLIBOPT because it will trigger a reporting issue
# because GCC won't use them. Forcefeed all benchmarks the math libraries in EXTRA_LIBS and clear
# out MATHLIBOPT.
MATHLIBOPT =

#########################
# intspeed tuning flags #
#########################
intspeed:
FOPTIMIZE = $(OPT_ROOT_BASE) \
    -flto \
    -mllvm -optimize-strided-mem-cost
EXTRA_FFLAGS = -mllvm -unroll-aggressive \
    -mllvm -unroll-threshold=150
EXTRA_CXXFLAGS = -fvirtual-function-elimination \
    -fvisibility=hidden
LDFLAGS = -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6 \
    -Wl,-mllvm -Wl,-reduce-array-computations=3
LDCFLAGS = -Wl,-allow-multiple-definition
LDCXXFLAGS =
LDFFLAGS = -Wl,-mllvm -Wl,-inline-recursion=4 \
    -Wl,-mllvm -Wl,-lsr-in-nested-loop \
    -Wl,-mllvm -Wl,-enable-iv-split

##############################
# intspeed base tuning flags #
##############################
intspeed=base:
EXTRA_LIBS = -fopenmp=libomp \
    -lomp \
    -lamdlibm \
    -lflang \
    -lm
EXTRA_CLIBS = -lamdalloc
EXTRA_CXXLIBS = -lamdalloc-ext
EXTRA_FLIBS = -lamdalloc
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

##############################
# intspeed peak tuning flags #
##############################
intspeed=peak:
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

#############################
# fpspeed base tuning flags #
#############################
fpspeed=base:
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

#############################
# fpspeed peak tuning flags #
#############################
fpspeed=peak:
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

#####################
# Peak tuning flags #
#####################
default=peak:
COPTIMIZE = $(OPT_ROOT_PEAK) -fstruct-layout=9 \
    -mllvm -unroll-threshold=50 \
    -fremap-arrays \
    -fstrip-mining \
    -mllvm -inline-threshold=1000 \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP -Wno-return-type \
    -zopt
CXXOPTIMIZE = $(OPT_ROOT_PEAK) -finline-aggressive \
    -mllvm -unroll-threshold=100 \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -zopt
FOPTIMIZE = $(OPT_ROOT_PEAK) -Mrecursive \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -zopt
LDFLAGS = -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6 \
    -Wl,-mllvm -Wl,-reduce-array-computations=3
LDFFLAGS = -Wl,-mllvm -Wl,-enable-X86-prefetching
LDCXXFLAGS = -Wl,-mllvm -Wl,-x86-use-vzeroupper=false
EXTRA_LIBS = -fopenmp=libomp \
    -lomp \
    -lamdlibm \
    -lamdalloc \
    -lflang \
    -lm
feedback = 0
PASS1_CFLAGS = -fprofile-instr-generate
PASS2_CFLAGS = -fprofile-instr-use
PASS1_FFLAGS = -fprofile-generate
PASS2_FFLAGS = -fprofile-use
PASS1_CXXFLAGS = -fprofile-instr-generate
PASS2_CXXFLAGS = -fprofile-instr-use
PASS1_LDFLAGS = -fprofile-instr-generate
PASS2_LDFLAGS = -fprofile-instr-use
fdo_run1 = $command ; llvm-profdata merge --output=default.profdata *.profraw

# Benchmark specific peak tuning flags:
603.bwaves_s=peak: #lang='F'
FOPTIMIZE = -Ofast \
    $(OPT_ROOT) \
    -Mrecursive \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -fvector-transform \
    -fscalar-transform
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

607.cactuBSSN_s=peak: #lang='CXX,C,F'
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

619.lbm_s=peak:
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

620.omnetpp_s=peak: #lang='CXX'
EXTRA_LIBS = -fopenmp=libomp \
    -lomp \
    -lamdlibm \
    -lamdalloc-ext \
    -lflang -lm

621.wrf_s=peak: #lang='F,C'
FOPTIMIZE = $(OPT_ROOT_BASE) \
    -Mrecursive \
    -funroll-loops \
    -mllvm -lsr-in-nested-loop \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -zopt
submit = ${THP_NEVER}; ${DEFAULT_SUBMIT}

623.xalancbmk_s=peak: #lang='CXX'
EXTRA_CXXFLAGS = -mllvm -do-block-reorder=aggressive \
    -fvirtual-function-elimination -fvisibility=hidden
LDFLAGS = -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6 \
    -Wl,-mllvm -Wl,-reduce-array-computations=3 \
    -Wl,-mllvm -Wl,-do-block-reorder=aggressive
EXTRA_LIBS = -fopenmp=libomp \
    -lomp \
    -lamdlibm \
    -lamdalloc-ext \
    -lflang \
    -lm

627.cam4_s=peak: #lang='F,C'
LDFLAGS = -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6 \
    -Wl,-mllvm -Wl,-reduce-array-computations=3
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

628.pop2_s=peak: #lang='F,C'
FOPTIMIZE = $(OPT_ROOT) \
    -Ofast \
    -Mrecursive \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -fvector-transform \
    -fscalar-transform
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

638.imagick_s=peak: #lang='C'
LDFLAGS = -Wl,-mllvm -Wl,-align-all-nofallthru-blocks=6 \
    -Wl,-mllvm -Wl,-reduce-array-computations=3
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

644.nab_s=peak: #lang='C'
LDFLAGS = -Wl,-mllvm -Wl,-region-vectorize
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

649.fotonik3d_s=peak: #lang='F'
ENV_PGHPF_ZMEM = yes
submit = ${THP_ALWAYS}; ${DEFAULT_SUBMIT}

654.roms_s=peak: #lang='F'
FOPTIMIZE = -Ofast \
    $(OPT_ROOT) \
    -Mrecursive \
    -mllvm -reduce-array-computations=3 \
    -DSPEC_OPENMP \
    -fvector-transform \
    -fscalar-transform
submit = ${THP_MADVISE}; ${DEFAULT_SUBMIT}

657.xz_s=peak: #lang='C'
ENV_LIBOMP_NUM_HIDDEN_HELPER_THREADS = 8

# ---- End inclusion of '/home/cpu2017-1.1.8-amd-aocc400-genoa-B1b/config/amd_speed_aocc400_genoa_B1_flags.inc'

# The following settings were obtained by running the sysinfo_program
# 'specperl $[top]/bin/sysinfo' (sysinfo:SHA:679c83684f6f4fc369a093999b6661d0a378911de2a006d3245423ad80d3fb9a)
default:
notes_plat_sysinfo_000 =
notes_plat_sysinfo_005 = Sysinfo program /home/cpu2017-1.1.8-amd-aocc400-genoa-B1b/bin/sysinfo
notes_plat_sysinfo_010 = Rev: r6622 of 2021-04-07 982a61ec0915b55891ef0e16acafc64d
notes_plat_sysinfo_015 = running on localhost Sun Dec 18 02:36:11 2022
notes_plat_sysinfo_020 =
   notes_plat_sysinfo_025 = SUT (System Under Test) info as seen by some common utilities.
   notes_plat_sysinfo_030 = For more information on this section, see
   notes_plat_sysinfo_035 = https://www.spec.org/cpu2017/Docs/config.html#sysinfo
   notes_plat_sysinfo_040 =
   notes_plat_sysinfo_045 = From /proc/cpuinfo
   notes_plat_sysinfo_050 = model name : AMD EPYC 9454 48-Core Processor
   notes_plat_sysinfo_055 = 2 "physical id"s (chips)
   notes_plat_sysinfo_060 = 192 "processors"
   notes_plat_sysinfo_065 = cores, siblings (Caution: counting these is hw and system dependent. The following
   notes_plat_sysinfo_070 = excerpts from /proc/cpuinfo might not be reliable. Use with caution.)
   notes_plat_sysinfo_075 = cpu cores : 48
   notes_plat_sysinfo_080 = siblings : 96
   notes_plat_sysinfo_085 = physical 0: cores 0 1 2 3 4 5 10 11 12 13 16 17 18 19 20 21 24 25 26 27 28 29 32 33
   notes_plat_sysinfo_090 = 34 35 36 37 40 41 42 43 44 45 48 49 50 51 52 53 56 57 58 59 60 61
   notes_plat_sysinfo_095 = physical 1: cores 0 1 2 3 4 5 10 11 12 13 16 17 18 19 20 21 24 25 26 27 28 29 32 33
   notes_plat_sysinfo_100 = 34 35 36 37 40 41 42 43 44 45 48 49 50 51 52 53 56 57 58 59 60 61
   notes_plat_sysinfo_105 =
   notes_plat_sysinfo_110 = From lscpu from util-linux 2.37.2:
   notes_plat_sysinfo_115 = Architecture: x86_64
   notes_plat_sysinfo_120 = CPU op-mode(s): 32-bit, 64-bit
   notes_plat_sysinfo_125 = Address sizes: 52 bits physical, 57 bits virtual
   notes_plat_sysinfo_130 = Byte Order: Little Endian
   notes_plat_sysinfo_135 = CPU(s): 192
   notes_plat_sysinfo_140 = On-line CPU(s) list: 0-191
   notes_plat_sysinfo_145 = Vendor ID: AuthenticAMD
   notes_plat_sysinfo_150 = Model name: AMD EPYC 9454 48-Core Processor
   notes_plat_sysinfo_155 = CPU family: 25
   notes_plat_sysinfo_160 = Model: 17
   notes_plat_sysinfo_165 = Thread(s) per core: 2
   notes_plat_sysinfo_170 = Core(s) per socket: 48
   notes_plat_sysinfo_175 = Socket(s): 2
   notes_plat_sysinfo_180 = Stepping: 1
   notes_plat_sysinfo_185 = Frequency boost: enabled
   notes_plat_sysinfo_190 = CPU max MHz: 3810.7910
   notes_plat_sysinfo_195 = CPU min MHz: 1500.0000
   notes_plat_sysinfo_200 = BogoMIPS: 5491.85
   notes_plat_sysinfo_205 = Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
   notes_plat_sysinfo_210 = pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
   notes_plat_sysinfo_215 = pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid
   notes_plat_sysinfo_220 = aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe
   notes_plat_sysinfo_225 = popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
   notes_plat_sysinfo_230 = misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb
   notes_plat_sysinfo_235 = bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs
   notes_plat_sysinfo_240 = ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f
   notes_plat_sysinfo_245 = avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw
   notes_plat_sysinfo_250 = avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
   notes_plat_sysinfo_255 = cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt
   notes_plat_sysinfo_260 = lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
   notes_plat_sysinfo_265 = pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl avx512vbmi umip pku ospke
   notes_plat_sysinfo_270 = avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57
   notes_plat_sysinfo_275 = rdpid overflow_recov succor smca fsrm flush_l1d
   notes_plat_sysinfo_280 = Virtualization: AMD-V
   notes_plat_sysinfo_285 = L1d cache: 3 MiB (96 instances)
   notes_plat_sysinfo_290 = L1i cache: 3 MiB (96 instances)
   notes_plat_sysinfo_295 = L2 cache: 96 MiB (96 instances)
   notes_plat_sysinfo_300 = L3 cache: 512 MiB (16 instances)
   notes_plat_sysinfo_305 = NUMA node(s): 2
   notes_plat_sysinfo_310 = NUMA node0 CPU(s): 0-47,96-143
   notes_plat_sysinfo_315 = NUMA node1 CPU(s): 48-95,144-191
   notes_plat_sysinfo_320 = Vulnerability Itlb multihit: Not affected
   notes_plat_sysinfo_325 = Vulnerability L1tf: Not affected
   notes_plat_sysinfo_330 = Vulnerability Mds: Not affected
   notes_plat_sysinfo_335 = Vulnerability Meltdown: Not affected
   notes_plat_sysinfo_340 = Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via
   notes_plat_sysinfo_345 = prctl and seccomp
   notes_plat_sysinfo_350 = Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user
   notes_plat_sysinfo_355 = pointer sanitization
   notes_plat_sysinfo_360 = Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW,
   notes_plat_sysinfo_365 = STIBP always-on, RSB filling
   notes_plat_sysinfo_370 = Vulnerability Srbds: Not affected
   notes_plat_sysinfo_375 = Vulnerability Tsx async abort: Not affected
   notes_plat_sysinfo_380 =
   notes_plat_sysinfo_385 = From lscpu --cache:
   notes_plat_sysinfo_390 = NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL  SETS PHY-LINE COHERENCY-SIZE
   notes_plat_sysinfo_395 = L1d       32K       3M    8 Data            1    64        1             64
   notes_plat_sysinfo_400 = L1i       32K       3M    8 Instruction     1    64        1             64
   notes_plat_sysinfo_405 = L2         1M      96M    8 Unified         2  2048        1             64
   notes_plat_sysinfo_410 = L3        32M     512M   16 Unified         3 32768        1             64
   notes_plat_sysinfo_415 =
   notes_plat_sysinfo_420 = /proc/cpuinfo cache data
   notes_plat_sysinfo_425 = cache size : 1024 KB
   notes_plat_sysinfo_430 =
   notes_plat_sysinfo_435 = From numactl --hardware
   notes_plat_sysinfo_440 = WARNING: a numactl 'node' might or might not correspond to a physical chip.
   notes_plat_sysinfo_445 = available: 2 nodes (0-1)
   notes_plat_sysinfo_450 = node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
   notes_plat_sysinfo_455 = 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 96 97 98 99 100 101 102 103
   notes_plat_sysinfo_460 = 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
   notes_plat_sysinfo_465 = 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
   notes_plat_sysinfo_470 = node 0 size: 386600 MB
   notes_plat_sysinfo_475 = node 0 free: 384422 MB
   notes_plat_sysinfo_480 = node 1 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
   notes_plat_sysinfo_485 = 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 144 145 146 147
   notes_plat_sysinfo_490 = 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169
   notes_plat_sysinfo_495 = 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
   notes_plat_sysinfo_500 = node 1 size: 386810 MB
   notes_plat_sysinfo_505 = node 1 free: 386237 MB
   notes_plat_sysinfo_510 = node distances:
   notes_plat_sysinfo_515 = node   0   1
   notes_plat_sysinfo_520 =   0:  10  32
   notes_plat_sysinfo_525 =   1:  32  10
   notes_plat_sysinfo_530 =
   notes_plat_sysinfo_535 = From /proc/meminfo
   notes_plat_sysinfo_540 = MemTotal: 791972832 kB
   notes_plat_sysinfo_545 = HugePages_Total: 0
   notes_plat_sysinfo_550 = Hugepagesize: 2048 kB
   notes_plat_sysinfo_555 =
   notes_plat_sysinfo_560 = /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor has
   notes_plat_sysinfo_565 = performance
   notes_plat_sysinfo_570 =
   notes_plat_sysinfo_575 = From /etc/*release* /etc/*version*
   notes_plat_sysinfo_580 = os-release:
   notes_plat_sysinfo_585 = NAME="SLES"
   notes_plat_sysinfo_590 = VERSION="15-SP4"
   notes_plat_sysinfo_595 = VERSION_ID="15.4"
   notes_plat_sysinfo_600 = PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
   notes_plat_sysinfo_605 = ID="sles"
   notes_plat_sysinfo_610 = ID_LIKE="suse"
   notes_plat_sysinfo_615 = ANSI_COLOR="0;32"
   notes_plat_sysinfo_620 = CPE_NAME="cpe:/o:suse:sles:15:sp4"
   notes_plat_sysinfo_625 =
   notes_plat_sysinfo_630 = uname -a:
   notes_plat_sysinfo_635 = Linux localhost 5.14.21-150400.22-default #1 SMP PREEMPT_DYNAMIC Wed May 11 06:57:18
   notes_plat_sysinfo_640 = UTC 2022 (49db222) x86_64 x86_64 x86_64 GNU/Linux
   notes_plat_sysinfo_645 =
   notes_plat_sysinfo_650 = Kernel self-reported vulnerability status:
   notes_plat_sysinfo_655 =
   notes_plat_sysinfo_660 = CVE-2018-12207 (iTLB Multihit): Not affected
   notes_plat_sysinfo_665 = CVE-2018-3620 (L1 Terminal Fault): Not affected
   notes_plat_sysinfo_670 = Microarchitectural Data Sampling: Not affected
   notes_plat_sysinfo_675 = CVE-2017-5754 (Meltdown): Not affected
   notes_plat_sysinfo_680 = CVE-2018-3639 (Speculative Store Bypass): Mitigation: Speculative Store
   notes_plat_sysinfo_685 = Bypass disabled via prctl and
   notes_plat_sysinfo_690 = seccomp
   notes_plat_sysinfo_695 = CVE-2017-5753 (Spectre variant 1): Mitigation: usercopy/swapgs
   notes_plat_sysinfo_700 = barriers and __user pointer
   notes_plat_sysinfo_705 = sanitization
   notes_plat_sysinfo_710 = CVE-2017-5715 (Spectre variant 2): Mitigation: Retpolines, IBPB:
   notes_plat_sysinfo_715 = conditional, IBRS_FW, STIBP:
   notes_plat_sysinfo_720 = always-on, RSB filling
   notes_plat_sysinfo_725 = CVE-2020-0543 (Special Register Buffer Data Sampling): Not affected
   notes_plat_sysinfo_730 = CVE-2019-11135 (TSX Asynchronous Abort): Not affected
   notes_plat_sysinfo_735 =
   notes_plat_sysinfo_740 = run-level 3 Dec 18 00:56
   notes_plat_sysinfo_745 =
   notes_plat_sysinfo_750 = SPEC is set to: /home/cpu2017-1.1.8-amd-aocc400-genoa-B1b
   notes_plat_sysinfo_755 = Filesystem     Type  Size  Used Avail Use% Mounted on
   notes_plat_sysinfo_760 = /dev/sda2      xfs   446G   31G  416G   7% /
   notes_plat_sysinfo_765 =
   notes_plat_sysinfo_770 = From /sys/devices/virtual/dmi/id
   notes_plat_sysinfo_775 = Vendor: Lenovo
   notes_plat_sysinfo_780 = Product: ThinkSystem SR665 V3 MB,Genoa,Kauai,DDR5,Kauai,2U
   notes_plat_sysinfo_785 = Product Family: ThinkSystem
   notes_plat_sysinfo_790 = Serial: 1234567890
   notes_plat_sysinfo_795 =
   notes_plat_sysinfo_800 = Additional information from dmidecode 3.2 follows. WARNING: Use caution when you
   notes_plat_sysinfo_805 = interpret this section. The 'dmidecode' program reads system data which is "intended to
   notes_plat_sysinfo_810 = allow hardware to be accurately determined", but the intent may not be met, as there are
   notes_plat_sysinfo_815 = frequent changes to hardware, firmware, and the "DMTF SMBIOS" standard.
   notes_plat_sysinfo_820 = Memory:
   notes_plat_sysinfo_825 = 5x SK Hynix HMCG88AEBRA115N 32 GB 2 rank 4800
   notes_plat_sysinfo_830 = 19x SK Hynix HMCG88AEBRA168N 32 GB 2 rank 4800
   notes_plat_sysinfo_835 =
   notes_plat_sysinfo_840 = BIOS:
   notes_plat_sysinfo_845 = BIOS Vendor: Lenovo
   notes_plat_sysinfo_850 = BIOS Version: KAE105F-1.20
   notes_plat_sysinfo_855 = BIOS Date: 12/01/2022
   notes_plat_sysinfo_860 = BIOS Revision: 1.20
   notes_plat_sysinfo_865 = Firmware Revision: 1.20
   notes_plat_sysinfo_870 =
   notes_plat_sysinfo_875 = (End of data from sysinfo program)
   hw_cpu_name    = AMD EPYC 9454
   hw_disk        = 446 GB add more disk info here
   hw_memory001   = 755.284 GB fixme: If using DDR4, the format is:
   hw_memory002   = 'N GB (N x N GB nRxn PC4-nnnnX-X)'
   hw_nchips      = 2
   prepared_by    = root (is never output, only tags rawfile)
   sw_file        = xfs
   sw_os001       = NAME="SLES"
   sw_state       = Run level 3 (add definition here)
# End of settings added by sysinfo_program

607.cactuBSSN_s:
# The following setting was inserted automatically as a result of
# post-run basepeak application.
basepeak = 1

603.bwaves_s:
# The following setting was inserted automatically as a result of
# post-run basepeak application.
basepeak = 1

# The following section was added automatically, and contains settings that
# did not appear in the original configuration file, but were added to the
# raw file after the run.
default:
   power_management000 = BIOS and OS set to prefer performance at the cost
   power_management001 = of additional power usage
   notes_plat_form_000 = BIOS configuration:
   notes_plat_form_005 = Operating Mode set to Maximum Performance