Version 1.05
Last modified: March 15, 2006
This document specifies how Release 1.0 of the SPECjbb2005 benchmark is to be run for measuring and publicly reporting performance results. These rules abide by the norms laid down by SPEC. This ensures that results generated with this benchmark are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.
SPEC intends that this benchmark measure the overall performance of systems that provide environments for running server-side Java applications. It is not a J2EE benchmark and therefore it does not measure Enterprise Java Beans (EJBs), servlets, Java Server Pages (JSPs), etc.
The general philosophy behind the rules for running the SPECjbb2005 benchmark is to ensure that an independent party can reproduce the reported results.
For results to be publishable, SPEC expects:
Proper use of the SPEC benchmark tools as provided.
Availability of an appropriate full disclosure report.
Support for all of the appropriate APIs.
SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. However, with the rules below, SPEC wants to increase awareness among implementors and end users of the issue of unwanted benchmark-specific optimizations, which would be incompatible with SPEC's goal of fair benchmarking.
Hardware and software used to run the SPECjbb2005 benchmark must provide a suitable environment for running typical server-side Java programs. (Note that this may be different from a typical environment for client Java programs.)
Optimizations must generate correct code for a class of programs, where the class of programs must be larger than a single SPEC benchmark.
Optimizations must improve performance for a class of programs, where the class of programs must be larger than a single SPEC benchmark.
The vendor encourages the implementation for general use.
The implementation is generally available, documented and supported by the providing vendor.
Furthermore, SPEC expects that any public use of results from this benchmark shall be for configurations that are appropriate for public consumption and comparison.
In the case where it appears that the above guidelines have not been followed, SPEC may investigate such a claim and request that the offending optimization (e.g. a SPEC-benchmark specific pattern matching) be backed off and the results resubmitted. Or, SPEC may request that the vendor correct the deficiency (e.g. make the optimization more general purpose or correct problems with code generation) before submitting results based on the optimization.
SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECjbb2005 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to the benchmark and may rename the metrics. In the event that the workload or metric is changed, SPEC reserves the right to republish in summary form "adapted" results for previously published systems, converted to the new metric. In the case of other changes, a republication may necessitate retesting and may require support from the original test sponsor.
Tested systems must provide an environment suitable for running typical server-side J2SE 5.0 applications and must be generally available for that purpose. Any tested system must include an implementation of the Java (tm) Virtual Machine as described by the following references, or as amended by SPEC for later Java versions:
Java Virtual Machine Specification (ISBN 0-201-63452-X)
The following are specifically allowed, within the bounds of the Java Platform:
Precompilation and on-disk storage of compiled executables are specifically allowed. However, support for dynamic loading is required. Additional rules are defined in section 2.1.1. See section 2.4 for details about allowable flags for compilation.
Using 80-bit intermediate floating point values is specifically allowed. However, this benchmark has minimal floating-point computation.
The system must include a complete implementation of those classes that are referenced by this benchmark as in the J2SE 5.0 specification.
SPEC does not intend to check for implementation of APIs not used in the benchmark. For example, the benchmark does not use AWT, and SPEC does not intend to check for implementation of AWT.
Feedback directed optimization and precompilation from the Java bytecodes are allowed, subject to the restrictions regarding benchmark-specific optimizations in section 1.2. Precompilation and feedback-optimization before the measured invocation of the benchmark are allowed. Such optimizations must be fully disclosed (see section 3.5).
The SPECjbb2005 benchmark binaries are provided in jar files containing the Java classes. Valid runs must use the provided jar files, and these files must not be updated or modified in any way. While the source code of the benchmark is provided for reference, the benchmarker must not recompile any of the provided .java files. Any run that uses recompiled class files is not valid and cannot be reported or published.
A set of sequential points is run, starting at 1 warehouse and continuing up through at least 8 warehouses or 2 * N warehouses, whichever is higher. N is the expected peak number of warehouses, which by default is the value returned by the java.lang.Runtime.getRuntime().availableProcessors() API. N may be overridden by setting the input.expected_peak_warehouse property, provided that the result is submitted to SPEC for review and an acceptable reason is given in the config.sw.notes section. An example of an acceptable reason to override the default value of N would be that Runtime.getRuntime().availableProcessors() does not return an accurate or valid value for the hardware architecture of the SUT. An example of an unacceptable reason would be decreasing the value of N from the default to hide scalability problems and artificially obtain a higher score.
The sequence must increment by 1. The test may be configured to run beyond 2 * N warehouses by setting the input.ending_number_warehouses property. The points beyond the 2 * N point will appear in the report and on the graph, but will not be used to calculate the metric.
In some cases, the system under test may not be able to run all the points up to 2*N. If the system is able to run up to M warehouses, where N < M < 2*N, the test will still be marked valid and the missing points from M+1 to 2*N will be considered to have a throughput of 0 SPECjbb2005 bops for the purposes of metric computation. In this situation, the tester is strongly encouraged to contact the HW and SW vendors for fixes that would allow the system to run to 2*N. If such fixes are not available, the tester also has the option of disabling some CPUs and rerunning the test. Since the location of the peak, N, is strongly correlated to the number of CPUs in the system, reducing the number of CPUs will reduce N, and correspondingly, 2*N.
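As an informal illustration of how the sequence of points is derived, consider the minimal Java sketch below. It is not part of the benchmark kit: the class name and printed output are hypothetical, and reading input.expected_peak_warehouse from the JVM system properties is a simplification (the benchmark itself reads it from SPECjbb.props). The default for N and the range of required points follow the rules above.

public class WarehouseSequence {
    public static void main(String[] args) {
        // Default N: the processor count reported by the Java runtime.
        // input.expected_peak_warehouse may override this, subject to the
        // review requirement described above.
        String override = System.getProperty("input.expected_peak_warehouse");
        int n = (override != null)
                ? Integer.parseInt(override)
                : Runtime.getRuntime().availableProcessors();

        // Points run from 1 warehouse up through at least 8 or 2*N,
        // whichever is higher; only the points from N to 2*N contribute
        // to the metric.
        int last = Math.max(8, 2 * n);
        for (int warehouses = 1; warehouses <= last; warehouses++) {
            boolean counted = warehouses >= n && warehouses <= 2 * n;
            System.out.println(warehouses + " warehouse(s)"
                    + (counted ? "  [counts toward the metric]" : ""));
        }
    }
}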
The following are required for a valid run and are automatically checked:
The Java environment must pass the partial conformance testing done by the benchmark prior to running any points.
The run contains all required points as defined in the previous section.
The actual measurement interval for each warehouse iteration must be no less than 238.8 seconds and no greater than 264 seconds. (Note: the measurement interval specified in the properties file must be 240 seconds. This rule only allows for some variation in communicating the end of measurement to the threads.)
For each warehouse iteration, the start of the measurement interval of the first instance must be greater than or equal to the start of the rampup interval of the last instance and the end of the measurement interval of every instance must be less than or equal to the end of the rampdown interval of the instance that ended first. Comment: the intent of this requirement is to ensure that the measurement interval for all instances occurs during the time all instances are running.
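The two timing requirements above can be expressed informally as in the following hedged sketch. The Instance class and the method names are hypothetical; the 238.8 and 264 second bounds and the overlap condition come from this section.

import java.util.List;

public class TimingChecks {
    /** Per-instance timestamps, in seconds from a common origin. */
    public static class Instance {
        double rampupStart, measureStart, measureEnd, rampdownEnd;
        Instance(double rampupStart, double measureStart,
                 double measureEnd, double rampdownEnd) {
            this.rampupStart = rampupStart;
            this.measureStart = measureStart;
            this.measureEnd = measureEnd;
            this.rampdownEnd = rampdownEnd;
        }
    }

    /** The measurement interval must be between 238.8 and 264 seconds. */
    static boolean validInterval(Instance i) {
        double length = i.measureEnd - i.measureStart;
        return length >= 238.8 && length <= 264.0;
    }

    /** Every measurement interval must lie inside the window during which
        all instances are running. */
    static boolean validOverlap(List<Instance> instances) {
        double latestRampupStart = Double.NEGATIVE_INFINITY;
        double earliestRampdownEnd = Double.POSITIVE_INFINITY;
        for (Instance i : instances) {
            latestRampupStart = Math.max(latestRampupStart, i.rampupStart);
            earliestRampdownEnd = Math.min(earliestRampdownEnd, i.rampdownEnd);
        }
        for (Instance i : instances) {
            if (i.measureStart < latestRampupStart) return false;
            if (i.measureEnd > earliestRampdownEnd) return false;
        }
        return true;
    }
}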
The SPECjbb2005 benchmark runs on one machine (in a single OS image) as one or more instances of a single Java application. Use of clusters or aggregations of machines is specifically disallowed. No network, database, or web server components are required, only a Java environment.
Any deviations from the standard, default configuration for the SUT will need to be documented so an independent party would be able to reproduce the result without further assistance.
These changes must be "generally available", i.e., available, supported and documented. For example, if a special tool is needed to change the OS state, it must be provided to users and documented.
There are a number of parameters, in two properties files, that control the operation of SPECjbb2005. Parameter usage is explained in the SPECjbb2005 User Guide. The properties in the "Fixed Input Parameters" section of the file "SPECjbb.props" must not be changed from the values as provided by SPEC. The properties in the "Changeable Input Parameters" section may be set to the appropriate values for a valid run.
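For illustration only, the following is a minimal Java sketch of reading run parameters from SPECjbb.props. The file name and the two property names appear elsewhere in these rules; the code itself is hypothetical and not part of the benchmark kit, and the authoritative split between fixed and changeable parameters is the one in the file as shipped by SPEC.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class LoadProps {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("SPECjbb.props")) {
            props.load(in);
        }
        // Changeable input parameters may be set to appropriate values, e.g.:
        String endWarehouses = props.getProperty("input.ending_number_warehouses");
        String expectedPeak = props.getProperty("input.expected_peak_warehouse");
        System.out.println("ending_number_warehouses = " + endWarehouses);
        System.out.println("expected_peak_warehouse  = " + expectedPeak);
        // Properties in the "Fixed Input Parameters" section must keep the
        // values provided by SPEC.
    }
}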
All benchmark settings must be reported, as well as the command line used for the reported run, and for precompilation, if any.
Both JVMs and native compilers are capable of modifying their behavior based on flags. Flags which do not break conformance to section 2.1 are allowed.
SPECjbb2005 produces two throughput metrics, as follows:
The total throughput measurement, SPECjbb2005 bops
The average throughput per JVM instance, SPECjbb2005 bops/JVM
The throughput metrics are calculated as follows:
For each JVM instance, all points (numbers of warehouses) are run, from 1 up to at least twice the number N of warehouses expected to produce the peak throughput. At a minimum all points from 1 to 8 must be run.
For all points from N to 2*N warehouses, the scores for the individual JVM instances are added.
The summed throughputs for all the points from N warehouses to 2*N inclusive warehouses are averaged (arithmetic mean is used). This average is the SPECjbb2005 bops metric. As explained in section 2.3, results from systems that are unable to run all points up to 2*N warehouses are still considered valid. For any missing points in the range N to 2*N, the throughput is considered to be 0 SPECjbb2005 bops in the metric computation.
The SPECjbb2005 bops/JVM is obtained by dividing the SPECjbb2005 bops metric by the number of JVM instances.
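The computation can be summarized with the following hedged sketch; the array layout and method names are hypothetical, while the averaging range and the treatment of missing points follow the rules above.

public class Metrics {
    /**
     * scores[i][w-1] = throughput of JVM instance i at w warehouses.
     * Points that could not be run are represented as 0.0 (or simply
     * absent), matching the rule that missing points in N..2*N count
     * as 0 SPECjbb2005 bops.
     */
    static double specjbb2005Bops(double[][] scores, int expectedPeakN) {
        double sumOverPoints = 0.0;
        for (int w = expectedPeakN; w <= 2 * expectedPeakN; w++) {
            double pointTotal = 0.0;
            for (double[] instance : scores) {
                // Sum across JVM instances at this point.
                pointTotal += (w - 1 < instance.length) ? instance[w - 1] : 0.0;
            }
            sumOverPoints += pointTotal;
        }
        // Arithmetic mean over the N..2*N points, inclusive (N+1 points).
        return sumOverPoints / (expectedPeakN + 1);
    }

    static double bopsPerJvm(double totalBops, int jvmInstances) {
        return totalBops / jvmInstances;
    }
}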
The reporting tool contained within SPECjbb2005 produces a graph of the throughput at all the measured points, with warehouses on the horizontal axis and the summed throughputs on the vertical axis. All points from 1 up to 8 or 2*N warehouses, whichever is higher, are required to be run and reported. Missing points in the range N to 2*N will be reported to have a throughput of 0 SPECjbb2005 bops. The points being averaged for the metric will be marked on the report.
The run must meet all requirements described in section 2.4 to be a valid run.
All components, both hardware and software, must be generally available within 3 months of the publication date in order to be a valid publication. However, if JVM licensing issues cause a change in software availability date after publication date, the change will be allowed to be made without penalty, subject to subcommittee review.
If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the performance of the released system to be 5% lower than that reported for the pre-release system, then the sponsor is requested to report a corrected test result.
All configuration properties contained in the descriptive properties file must be accurate.
The descriptive properties file contains a parameter config.sw.tuning which should be used to document any system tuning information. SPEC is aware that mechanisms for doing this include environment flags, command line flags, configuration files, registries, etc. Whatever the mechanism, it must be fully disclosed here in sufficient detail to enable the results to be reproduced. Examples of tuning information which should be documented include, but are not limited to:
Description of system tuning, including any special OS parameters set and any changes to standard daemons (services on Microsoft Windows)
List of flags used
Precompilation or feedback optimization employed
Any special per-JVM tuning for multi-JVM running (e.g. pinning JVMs to specific processors)
SPEC is aware that sometimes the spelling of command line switches or environment variables, or even their presence, changes between beta releases and final releases. For example, suppose that during a product beta the tester specifies:
java -XX:fast -XX:architecture_level=3 -XX:unroll 16
but the tester knows that in the final release the architecture level will be automatically set by -XX:fast, and the product is going to change to set the default unroll level to 16. In that case, the actual command line used for the run should be recorded in the command-line parameter, config.command_line, and the final form of the command line should be reported in the config.sw.tuning parameter of the descriptive properties file.
The tester is expected to exercise due diligence regarding such flag reporting, to ensure that the disclosure correctly records the intended final product.
In order to publicly disclose SPECjbb2005 results, the tester must adhere to these reporting rules in addition to having followed the run rules above. The goal of the reporting rules is to ensure the system under test is sufficiently documented such that someone could reproduce the test and its results.
Any SPECjbb2005 result produced in compliance with these run and reporting rules may be publicly disclosed and represented as valid SPECjbb2005 results.
Any test result not in full compliance with the run and reporting rules must not be represented using the SPECjbb2005 metrics.
Results for which the value of the input.expected_peak_warehouse property has been set must be submitted and reviewed by SPEC to determine compliance with section 2.3. Future publications by the vendor using input.expected_peak_warehouse do not require review unless the technical reason for setting the flag differs from what was previously accepted by the subcommittee.
Once you have a compliant run and wish to submit it to SPEC for review, you will need to provide the raw file(s) created by the run.
Once you have the submission ready, please e-mail it to [email protected]
SPEC encourages the submission of results to SPEC for review by the relevant subcommittee and subsequent publication on SPEC's website. Vendors may publish compliant results independently, provided that the first use of input.expected_peak_warehouse property by the vendor be reviewed by the subcommittee to determine compliance with section 2.3. Future publications by the vendor using input.expected_peak_warehouse do not require review unless the technical reason for setting the flag differs from what was previously accepted by the subcommittee. However any SPEC member may request a full disclosure report for that result and the tester must comply within 10 business days. Issues raised concerning a result's compliance to the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.
Estimated results are not allowed to be publicly disclosed.
SPECjbb2005 results must not be publicly compared to results from any other benchmark. This would be a violation of the SPECjbb2005 reporting rules and, in the case of the TPC benchmarks, a serious violation of the TPC "fair use policy."
Consistency and fairness are guiding principles for SPEC. To assure these principles are sustained, the following guidelines have been created with the intent that they serve as specific guidance for any organization (or individual) who chooses to make public comparisons using SPEC benchmark results.
When any organization or individual makes public claims using SPEC benchmark results, SPEC requires that the following guidelines be observed:
[1] Reference is made to the SPEC trademark. Such reference may be included in a notes section with other trademark references (see http://www.spec.org/spec/trademarks.html for all SPEC trademarks and service marks).
[2] The SPEC web site (http://www.spec.org) or a suitable sub page is noted as the source for more information.
[3] If competitive comparisons are made the following rules apply:
a. the results compared must utilize SPEC metrics and be compliant with that SPEC benchmark's run and reporting rules,
b. the basis for comparison must be stated,
c. the source of the competitive data must be stated, and the licensee (tester) must be identified or be clearly identifiable from the source,
d. the date competitive data was retrieved must be stated,
e. all data used in comparisons must be publicly available (from SPEC or elsewhere),
f. the number of JVMs used in the benchmark must be stated,
g. both throughput metrics (SPECjbb2005 bops and SPECjbb2005 bops/JVM) must be stated.
[4] Comparisons with or between non-compliant test results can only be made within academic or research documents where the deviations from the rules for any non-compliant results have been disclosed.
The following paragraph is an example of acceptable language when publicly using SPEC benchmarks for competitive comparisons:
Example:
Server X (4 chips, 4 cores, 4 threads) 25,000 SPECjbb2005 bops, 25,000 SPECjbb2005 bops/JVM; Server Y (2 chips, 4 cores, 8 threads) 40,000 SPECjbb2005 bops, 40,000 SPECjbb2005 bops/JVM. SPEC and the benchmark name SPECjbb2005 are trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on http://www.spec.org as of November 30, 2005. For the latest SPECjbb2005 benchmark results, visit http://www.spec.org/osg/jbb2005.
SPEC encourages use of the SPECjbb2005 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of licensees submitting to the SPEC web site. For example, a research environment may use early prototype hardware that simply cannot be expected to stay up for the length of time required to run the required number of points, or may use research compilers that are unsupported and are not generally available.
Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results. Where the rules cannot be followed, SPEC requires the results be clearly distinguished from results officially submitted to SPEC, by disclosing the deviations from the rules.
Copyright 2005 Standard Performance Evaluation Corporation