Running the Tests
=================

All the tests are executed using the "Run" script in the top-level directory.

The simplest way to generate results is with the command:

    ./Run

This will run a standard "index" test (see "The BYTE Index" below), and
save the report in the "results" directory, with a filename like

    hostname-2007-09-23-01

An HTML version is also saved.

If you want to generate both the basic system index and the graphics index,
then do:

    ./Run gindex

If your system has more than one CPU, the tests will be run twice -- once
with a single copy of each test running at once, and once with N copies,
where N is the number of CPUs. Some categories of tests, however (currently
the graphics tests), will only run with a single copy.

Since the tests are based on constant time (variable work), a "system"
run usually takes about 29 minutes, and the "graphics" part about 18
minutes. A "gindex" run on a dual-core machine will therefore do two
"system" passes (single- and dual-processing) and one "graphics" pass,
for a total of around an hour and a quarter.

============================================================================

Detailed Usage
==============

The Run script takes a number of options which you can use to customise a
test, and you can specify the names of the tests to run. The full usage is:

    Run [ -q | -v ] [-i <n>] [-c <n> [-c <n> ...]] [test ...]

The option flags are:

    -q          Run in quiet mode.
    -v          Run in verbose mode.
    -i <count>  Run <count> iterations for each test -- slower tests
                use <count> / 3, but at least 1. Defaults to 10 (3 for
                slow tests).
    -c <n>      Run <n> copies of each test in parallel.

The -c option can be given multiple times; for example:

    ./Run -c 1 -c 4

will run a single-streamed pass, then a 4-streamed pass. Note that some
tests (currently the graphics tests) will only run in a single-streamed pass.

The remaining non-flag arguments are taken to be the names of tests to run.
The default is to run "index". See "Tests" below.

When running the tests, I do *not* recommend switching to single-user mode
("init 1"). This seems to change the results in ways I don't understand,
and it's not realistic (unless your system will actually be running in this
mode, of course). However, if using a windowing system, you may want to
switch to a minimal window setup (for example, log in to a "twm" session),
so that randomly-churning background processes don't randomise the results
too much. This is particularly true for the graphics tests.

Where the output goes, and in what form, can be controlled by setting the
following environment variables:

 * "UB_RESULTDIR"        : Absolute path of the directory in which to save
                           the result files.
 * "UB_TMPDIR"           : Absolute path of the directory to use for the
                           I/O tests' temporary files.
 * "UB_OUTPUT_FILE_NAME" : Name of the output file; if it already exists,
                           it will be overwritten.
 * "UB_OUTPUT_CSV"       : If set to "true", also write the results (scores
                           only) to a .csv file.
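
For example, to keep the result files and temporary files out of the source
tree and also get a CSV summary, you might run something like this (the
paths here are just placeholders, not defaults):

    UB_RESULTDIR=/var/tmp/ub-results \
    UB_TMPDIR=/var/tmp/ub-scratch \
    UB_OUTPUT_CSV=true \
    ./Run index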

============================================================================

Tests
=====

The available tests are organised into categories; when generating index
scores (see "The BYTE Index" below) the results for each category are
produced separately. The categories are:

    system      The original Unix system tests (not all are actually
                in the index)
    2d          2D graphics tests (not all are actually in the index)
    3d          3D graphics tests
    misc        Various non-indexed tests

The following individual tests are available:

  system:
    dhry2reg            Dhrystone 2 using register variables
    whetstone-double    Double-Precision Whetstone
    syscall             System Call Overhead
    pipe                Pipe Throughput
    context1            Pipe-based Context Switching
    spawn               Process Creation
    execl               Execl Throughput
    fstime-w            File Write 1024 bufsize 2000 maxblocks
    fstime-r            File Read 1024 bufsize 2000 maxblocks
    fstime              File Copy 1024 bufsize 2000 maxblocks
    fsbuffer-w          File Write 256 bufsize 500 maxblocks
    fsbuffer-r          File Read 256 bufsize 500 maxblocks
    fsbuffer            File Copy 256 bufsize 500 maxblocks
    fsdisk-w            File Write 4096 bufsize 8000 maxblocks
    fsdisk-r            File Read 4096 bufsize 8000 maxblocks
    fsdisk              File Copy 4096 bufsize 8000 maxblocks
    shell1              Shell Scripts (1 concurrent) (runs "looper 60 multi.sh 1")
    shell8              Shell Scripts (8 concurrent) (runs "looper 60 multi.sh 8")
    shell16             Shell Scripts (16 concurrent) (runs "looper 60 multi.sh 16")

  2d:
    2d-rects            2D graphics: rectangles
    2d-lines            2D graphics: lines
    2d-circle           2D graphics: circles
    2d-ellipse          2D graphics: ellipses
    2d-shapes           2D graphics: polygons
    2d-aashapes         2D graphics: aa polygons
    2d-polys            2D graphics: complex polygons
    2d-text             2D graphics: text
    2d-blit             2D graphics: images and blits
    2d-window           2D graphics: windows

  3d:
    ubgears             3D graphics: gears

  misc:
    C                   C Compiler Throughput ("looper 60 $cCompiler cctest.c")
    arithoh             Arithoh (huh?)
    short               Arithmetic Test (short) (this is arith.c configured for
                        "short" variables; ditto for the ones below)
    int                 Arithmetic Test (int)
    long                Arithmetic Test (long)
    float               Arithmetic Test (float)
    double              Arithmetic Test (double)
    dc                  Dc: sqrt(2) to 99 decimal places (runs
                        "looper 30 dc < dc.dat", using your system's copy of "dc")
    hanoi               Recursion Test -- Tower of Hanoi
    grep                Grep for a string in a large file, using your system's
                        copy of "grep"
    sysexec             Exercise fork() and exec().

The following pseudo-test names are aliases for combinations of other
tests:

    arithmetic          Runs arithoh, short, int, long, float, double,
                        and whetstone-double
    dhry                Alias for dhry2reg
    dhrystone           Alias for dhry2reg
    whets               Alias for whetstone-double
    whetstone           Alias for whetstone-double
    load                Runs shell1, shell8, and shell16
    misc                Runs C, dc, and hanoi
    speed               Runs the arithmetic and system groups
    oldsystem           Runs execl, fstime, fsbuffer, fsdisk, pipe, context1,
                        spawn, and syscall
    system              Runs oldsystem plus shell1, shell8, and shell16
    fs                  Runs fstime-w, fstime-r, fstime, fsbuffer-w,
                        fsbuffer-r, fsbuffer, fsdisk-w, fsdisk-r, and fsdisk
    shell               Runs shell1, shell8, and shell16

    index               Runs the tests which constitute the official index:
                        the oldsystem group, plus dhry2reg, whetstone-double,
                        shell1, and shell8
                        See "The BYTE Index" below for more information.
    graphics            Runs the tests which constitute the graphics index:
                        2d-rects, 2d-ellipse, 2d-aashapes, 2d-text, 2d-blit,
                        2d-window, and ubgears
    gindex              Runs the index and graphics groups, to generate both
                        sets of index results

    all                 Runs all tests
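
For example, the pseudo-test names can be combined with the option flags
described above; the following runs just the filesystem tests and the shell
script tests, with three iterations each (one for the slower tests):

    ./Run -i 3 fs shell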

============================================================================

The BYTE Index
==============

The purpose of this test is to provide a basic indicator of the performance
of a Unix-like system; hence, multiple tests are used to test various
aspects of the system's performance. These test results are then compared
to the scores from a baseline system to produce an index value, which is
generally easier to handle than the raw scores. The entire set of index
values is then combined to make an overall index for the system.

Since 1995, the baseline system has been "George", a SPARCstation 20-61
with 128 MB RAM, a SPARC Storage Array, and Solaris 2.3, whose ratings
were set at 10.0. (So a system which scores 520 is 52 times faster than
this machine.) Since the numbers are really only useful in a relative
sense, there's no particular reason to update the baseline system, so for
the sake of consistency it's probably best to leave it alone. George's
scores are in the file "pgms/index.base"; this file is used to calculate
the index scores for any particular run.

Over the years, various changes have been made to the set of tests in the
index. Although there is a desire for a consistent baseline, various tests
have been found to be misleading and have been removed, and a few
alternatives have been added. These changes are detailed in the README,
and should be borne in mind when looking at old scores.

A number of tests are included in the benchmark suite which are not part of
the index, for various reasons; these tests can of course be run manually.
See "Tests" above.
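
As a worked example: each test's index is its result divided by George's
result for the same test, multiplied by 10, so a machine with 52 times
George's throughput on a test gets an index of 520 for that test. The
per-test indices are then combined into the overall score -- as far as I
know, by taking their geometric mean -- so, for instance, three indices of
400, 500 and 625 combine to 500:

    awk 'BEGIN { print exp((log(400) + log(500) + log(625)) / 3) }'   # prints 500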

============================================================================

Graphics Tests
==============

As of version 5.1, UnixBench now contains some graphics benchmarks. These
are intended to give a rough idea of the general graphics performance of
a system.

The graphics tests are in categories "2d" and "3d", so the index scores
for these tests are separate from the basic system index. This seems
like a sensible division, since the graphics performance of a system
depends largely on the graphics adaptor.

The tests currently consist of some 2D "x11perf" tests and "ubgears".

* The 2D tests are a selection of the x11perf tests, using the host
  system's x11perf command (which must be installed and in the search
  path). Only a few of the x11perf tests are used, in the interests
  of completing a test run in a reasonable time; if you want to do
  detailed diagnosis of an X server or graphics chip, then use x11perf
  directly.

* The 3D test is "ubgears", a modified version of the familiar "glxgears".
  This version runs for 5 seconds to "warm up", then performs a timed
  run and displays the average frames-per-second.

On multi-CPU systems, the graphics tests will only run in single-processing
mode. This is because the meaning of running two copies of a test at once
is dubious; and the test windows tend to overlay each other, meaning that
the window behind isn't actually doing any work.


============================================================================

Multiple CPUs
=============

If your system has multiple CPUs, the default behaviour is to run the
selected tests twice -- once with one copy of each test program running at
a time, and once with N copies, where N is the number of CPUs. (You can
override this with the "-c" option; see "Detailed Usage" above.) This is
designed to allow you to assess:

 - the performance of your system when running a single task
 - the performance of your system when running multiple tasks
 - the gain from your system's implementation of parallel processing

The results, however, need to be handled with care. Here are the results
of two runs on a dual-processor system, one in single-processing mode, one
in dual-processing mode:

    Test                      Single     Dual    Gain
    --------------------      ------   ------    ----
    Dhrystone 2                562.5   1110.3     97%
    Double Whetstone           320.0    640.4    100%
    Execl Throughput           450.4    880.3     95%
    File Copy 1024             759.4    595.9    -22%
    File Copy 256              535.8    438.8    -18%
    File Copy 4096            1261.8   1043.4    -17%
    Pipe Throughput            481.0    979.3    104%
    Pipe-based Switching       326.8   1229.0    276%
    Process Creation           917.2   1714.1     87%
    Shell Scripts (1)         1064.9   1566.3     47%
    Shell Scripts (8)         1567.7   1709.9      9%
    System Call Overhead       944.2   1445.5     53%
    --------------------      ------   ------    ----
    Index Score:               678.2   1026.2     51%

As expected, the heavily CPU-dependent tasks -- dhrystone, whetstone,
execl, pipe throughput, process creation -- show close to 100% gain when
running 2 copies in parallel.

The Pipe-based Context Switching test measures context switching overhead
by sending messages back and forth between 2 processes. I don't know why
it shows such a huge gain with 2 copies (i.e. 4 processes in total)
running, but it seems to be consistent on my system. I think this may be
an issue with the SMP implementation.

The System Call Overhead shows a lesser gain, presumably because it uses a
lot of CPU time in single-threaded kernel code. The shell scripts test with
8 concurrent processes shows no gain -- because the test itself runs 8
scripts in parallel, it's already using both CPUs, even when the benchmark
is run in single-stream mode. The same test with one process per copy
shows a real gain.

The filesystem throughput tests show a loss, instead of a gain, when
multi-processing. That there's no gain is to be expected, since the tests
are presumably constrained by the throughput of the I/O subsystem and the
disk drive itself; the drop in performance is presumably down to the
increased contention for resources, and perhaps greater disk head movement.

So what tests should you use, how many copies should you run, and how
should you interpret the results? Well, that's up to you, since it depends
on what it is you're trying to measure.

Implementation
--------------

The multi-processing mode is implemented at the level of test iterations.
During each iteration of a test, N slave processes are started using
fork(). Each of these slaves executes the test program using fork() and
exec(), reads and stores the entire output, times the run, and prints all
the results to a pipe. The Run script reads the pipes for each of the
slaves in turn to get the results and times. The scores are added, and the
times averaged.

The result is that each test program has N copies running at once. They
should all finish at around the same time, since they run for constant
time.

If a test program itself starts off K multiple processes (as with the
shell8 test), then the effect will be that there are N * K processes
running at once. This is probably not very useful for testing multi-CPU
performance.
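
Reduced to its bare bones, the pattern looks something like the shell
sketch below. This is only an illustration -- the real logic lives in the
Perl "Run" script, and "./pgms/sometest" is a hypothetical stand-in for
whichever test binary (and arguments) is being run:

    N=4                                       # number of copies to run
    for i in $(seq 1 "$N"); do
        # each slave runs one copy of the test, capturing its output and
        # its timing separately
        ( time ./pgms/sometest > "slave.$i.out" 2>&1 ) 2> "slave.$i.time" &
    done
    wait                         # all copies finish at about the same time
    # the scores parsed from slave.*.out are then summed, and the elapsed
    # times from slave.*.time are averaged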

============================================================================

The Language Setting
====================

The $LANG environment variable determines how programs and library
routines interpret text. This can have a big impact on the test results.

If $LANG is set to POSIX, or is left unset, text is treated as ASCII; if
it is set to en_US.UTF-8, for example, then text is treated as being
encoded in UTF-8, which is more complex and therefore slower. Setting it
to other languages can have varying results.

To ensure consistency between test runs, the Run script now (as of version
5.1.1) sets $LANG to "en_US.utf8".

This setting is configured with the variable "$language". You should not
change this if you want to share your results to allow comparisons between
systems; however, you may want to change it to see how different language
settings affect performance.

Each test report now includes the language settings in use. The reported
language is what is set in $LANG, and is not necessarily supported by the
system; but we also report the character mapping and collation order which
are actually in use (as reported by "locale").
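
To see which settings are actually in effect on your system -- which is
what the report records -- you can ask "locale" directly; for example:

    locale | grep -E '^(LANG|LC_CTYPE|LC_COLLATE)='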

============================================================================

Interpreting the Results
========================

Interpreting the results of these tests is tricky, and totally depends on
what you're trying to measure.

For example, are you trying to measure how fast your CPU is? Or how good
your compiler is? Because these tests are all recompiled using your host
system's compiler, the performance of the compiler will inevitably impact
the performance of the tests. Is this a problem? If you're choosing a
system, you probably care about its overall speed, which may well depend
on how good its compiler is; so including that in the test results may be
the right answer. But you may want to ensure that the right compiler is
used to build the tests.

On the other hand, with the vast majority of Unix systems being x86 / PC
compatibles, running Linux and the GNU C compiler, the results will tend
to be more dependent on the hardware; but the versions of the compiler and
OS can make a big difference. (I measured a 50% gain between SUSE 10.1
and OpenSUSE 10.2 on the same machine.) So you may want to make sure that
all your test systems are running the same version of the OS, or at least
publish the OS and compiler versions with your results. Then again, it may
be compiler performance that you're interested in.

The C test is very dubious -- it tests the speed of compilation. If you're
running the exact same compiler on each system, OK; but otherwise, the
results should probably be discarded. A slower compilation doesn't say
anything about the speed of your system, since the compiler may simply be
spending more time to super-optimise the code, which would actually make
it faster.

This will be particularly true on architectures like IA-64 (Itanium etc.),
where the compiler spends huge amounts of effort scheduling instructions
to run in parallel, with a resultant significant gain in execution speed.

Some tests are even more dubious in terms of host-dependency -- for
example, the "dc" test uses the host's version of dc (a calculator
program). The version of this which is available can make a huge
difference to the score, which is why it's not in the index group. Read
through the release notes for more on these kinds of issues.

Another age-old issue is that of the benchmarks being too trivial to be
meaningful. With compilers getting ever smarter, and performing more
wide-ranging flow path analyses, the danger of parts of the benchmarks
simply being optimised out of existence is always present.

All in all, the "index" and "gindex" tests (see above) are designed to
give a reasonable measure of overall system performance; but the results
of any test run should always be used with care.