Running the Tests
=================

All the tests are executed using the "Run" script in the top-level directory.

The simplest way to generate results is with the command:
    ./Run

This will run a standard "index" test (see "The BYTE Index" below), and
save the report in the "results" directory, with a filename like
    hostname-2007-09-23-01
An HTML version is also saved.
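
For example, to read the reports from such a run (the filename is
illustrative; yours is built from your hostname and the date of the run):

    less results/hostname-2007-09-23-01

The HTML version of the report (typically the same name with an ".html"
suffix) can be opened in any web browser.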

If you want to generate both the basic system index and the graphics index,
then do:
    ./Run gindex

If your system has more than one CPU, the tests will be run twice -- once
with a single copy of each test running at once, and once with N copies,
where N is the number of CPUs.  Some categories of tests, however (currently
the graphics tests), will only run with a single copy.

Since the tests are based on constant time (variable work), a "system"
run usually takes about 29 minutes; the "graphics" part about 18 minutes.
A "gindex" run on a dual-core machine will do 2 "system" passes (single-
and dual-processing) and one "graphics" run, for a total of around one and
a quarter hours.
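
As a rough worked example for that dual-core case:

    2 "system" passes  x 29 minutes  =  58 minutes
    1 "graphics" pass  x 18 minutes  =  18 minutes
                                        ----------
                               total   ~76 minutes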

============================================================================

Detailed Usage
==============

The Run script takes a number of options which you can use to customise a
test, and you can specify the names of the tests to run.  The full usage
is:

    Run [ -q | -v ] [-i <n> ] [-c <n> [-c <n> ...]] [test ...]

The option flags are:

  -q            Run in quiet mode.
  -v            Run in verbose mode.
  -i <count>    Run <count> iterations for each test -- slower tests
                use <count> / 3, but at least 1.  Defaults to 10 (3 for
                slow tests).
  -c <n>        Run <n> copies of each test in parallel.

The -c option can be given multiple times; for example:

    ./Run -c 1 -c 4

will run a single-streamed pass, then a 4-streamed pass.  Note that some
tests (currently the graphics tests) will only run in a single-streamed pass.

The remaining non-flag arguments are taken to be the names of tests to run.
The default is to run "index".  See "Tests" below.
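
For example, a quick verbose run of just two of the CPU tests, comparing a
single copy against sixteen copies (the iteration count, copy counts, and
test names here are purely illustrative), could be:

    ./Run -v -i 3 -c 1 -c 16 dhry2reg whetstone-double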

When running the tests, I do *not* recommend switching to single-user mode
("init 1").  This seems to change the results in ways I don't understand,
and it's not realistic (unless your system will actually be running in this
mode, of course).  However, if using a windowing system, you may want to
switch to a minimal window setup (for example, log in to a "twm" session),
so that randomly-churning background processes don't randomise the results
too much.  This is particularly true for the graphics tests.


Output can be controlled by setting the following environment variables
(see the example below the list):

  * "UB_RESULTDIR"        : Absolute path of the directory in which the
                            result files are written.
  * "UB_TMPDIR"           : Absolute path of the directory used for the
                            I/O tests' temporary files.
  * "UB_OUTPUT_FILE_NAME" : Name of the output file.  If the file already
                            exists, it will be overwritten.
  * "UB_OUTPUT_CSV"       : If set to "true", the results (scores only)
                            are also written to a .csv file.
============================================================================

Tests
=====

The available tests are organised into categories; when generating index
scores (see "The BYTE Index" below) the results for each category are
produced separately.  The categories are:

   system          The original Unix system tests (not all are actually
                   in the index)
   2d              2D graphics tests (not all are actually in the index)
   3d              3D graphics tests
   misc            Various non-indexed tests

The following individual tests are available:

  system:
    dhry2reg         Dhrystone 2 using register variables
    whetstone-double Double-Precision Whetstone
    syscall          System Call Overhead
    pipe             Pipe Throughput
    context1         Pipe-based Context Switching
    spawn            Process Creation
    execl            Execl Throughput
    fstime-w         File Write 1024 bufsize 2000 maxblocks
    fstime-r         File Read 1024 bufsize 2000 maxblocks
    fstime           File Copy 1024 bufsize 2000 maxblocks
    fsbuffer-w       File Write 256 bufsize 500 maxblocks
    fsbuffer-r       File Read 256 bufsize 500 maxblocks
    fsbuffer         File Copy 256 bufsize 500 maxblocks
    fsdisk-w         File Write 4096 bufsize 8000 maxblocks
    fsdisk-r         File Read 4096 bufsize 8000 maxblocks
    fsdisk           File Copy 4096 bufsize 8000 maxblocks
    shell1           Shell Scripts (1 concurrent) (runs "looper 60 multi.sh 1")
    shell8           Shell Scripts (8 concurrent) (runs "looper 60 multi.sh 8")
    shell16          Shell Scripts (16 concurrent) (runs "looper 60 multi.sh 16")

  2d:
    2d-rects         2D graphics: rectangles
    2d-lines         2D graphics: lines
    2d-circle        2D graphics: circles
    2d-ellipse       2D graphics: ellipses
    2d-shapes        2D graphics: polygons
    2d-aashapes      2D graphics: aa polygons
    2d-polys         2D graphics: complex polygons
    2d-text          2D graphics: text
    2d-blit          2D graphics: images and blits
    2d-window        2D graphics: windows

  3d:
    ubgears          3D graphics: gears

  misc:
    C                C Compiler Throughput ("looper 60 $cCompiler cctest.c")
    arithoh          Arithoh (huh?)
    short            Arithmetic Test (short) (this is arith.c configured for
                     "short" variables; ditto for the ones below)
    int              Arithmetic Test (int)
    long             Arithmetic Test (long)
    float            Arithmetic Test (float)
    double           Arithmetic Test (double)
    dc               Dc: sqrt(2) to 99 decimal places (runs
                     "looper 30 dc < dc.dat", using your system's copy of "dc")
    hanoi            Recursion Test -- Tower of Hanoi
    grep             Grep for a string in a large file, using your system's
                     copy of "grep"
    sysexec          Exercise fork() and exec().

The following pseudo-test names are aliases for combinations of other
tests:

    arithmetic       Runs arithoh, short, int, long, float, double,
                     and whetstone-double
    dhry             Alias for dhry2reg
    dhrystone        Alias for dhry2reg
    whets            Alias for whetstone-double
    whetstone        Alias for whetstone-double
    load             Runs shell1, shell8, and shell16
    misc             Runs C, dc, and hanoi
    speed            Runs the arithmetic and system groups
    oldsystem        Runs execl, fstime, fsbuffer, fsdisk, pipe, context1,
                     spawn, and syscall
    system           Runs oldsystem plus shell1, shell8, and shell16
    fs               Runs fstime-w, fstime-r, fstime, fsbuffer-w,
                     fsbuffer-r, fsbuffer, fsdisk-w, fsdisk-r, and fsdisk
    shell            Runs shell1, shell8, and shell16

    index            Runs the tests which constitute the official index:
                     the oldsystem group, plus dhry2reg, whetstone-double,
                     shell1, and shell8
                     See "The BYTE Index" below for more information.
    graphics         Runs the tests which constitute the graphics index:
                     2d-rects, 2d-ellipse, 2d-aashapes, 2d-text, 2d-blit,
                     2d-window, and ubgears
    gindex           Runs the index and graphics groups, to generate both
                     sets of index results

    all              Runs all tests
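
For example, to run just the "fs" group of filesystem tests in a 4-copy
parallel pass (the copy count here is illustrative):

    ./Run -c 4 fs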


============================================================================

The BYTE Index
==============

The purpose of this test is to provide a basic indicator of the performance
of a Unix-like system; hence, multiple tests are used to test various
aspects of the system's performance.  These test results are then compared
to the scores from a baseline system to produce an index value, which is
generally easier to handle than the raw scores.  The entire set of index
values is then combined to make an overall index for the system.

Since 1995, the baseline system has been "George", a SPARCstation 20-61
with 128 MB RAM, a SPARC Storage Array, and Solaris 2.3, whose ratings
were set at 10.0.  (So a system which scores 520 is 52 times faster than
this machine.)  Since the numbers are really only useful in a relative
sense, there's no particular reason to update the base system, so for the
sake of consistency it's probably best to leave it alone.  George's scores
are in the file "pgms/index.base"; this file is used to calculate the
index scores for any particular run.
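
As a sketch of the arithmetic involved (the figures here are made up): if
George's baseline result for a particular test were 42.0 lps and your
system measured 2184.0 lps, that test's index value would be

    2184.0 / 42.0 * 10  =  520.0

and the per-test index values are then combined (the Run script uses a
geometric mean) into the overall index.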

Over the years, various changes have been made to the set of tests in the
index.  Although there is a desire for a consistent baseline, various tests
have been determined to be misleading, and have been removed; and a few
alternatives have been added.  These changes are detailed in the README,
and should be borne in mind when looking at old scores.

A number of tests are included in the benchmark suite which are not part of
the index, for various reasons; these tests can of course be run manually.
See "Tests" above.


============================================================================

Graphics Tests
==============

As of version 5.1, UnixBench contains some graphics benchmarks.  These
are intended to give a rough idea of the general graphics performance of
a system.

The graphics tests are in categories "2d" and "3d", so the index scores
for these tests are separate from the basic system index.  This seems
like a sensible division, since the graphics performance of a system
depends largely on the graphics adaptor.

The tests currently consist of some 2D "x11perf" tests and "ubgears".

* The 2D tests are a selection of the x11perf tests, using the host
  system's x11perf command (which must be installed and in the search
  path; a quick check is shown below).  Only a few of the x11perf tests
  are used, in the interests of completing a test run in a reasonable
  time; if you want to do detailed diagnosis of an X server or graphics
  chip, then use x11perf directly.

* The 3D test is "ubgears", a modified version of the familiar "glxgears".
  This version runs for 5 seconds to "warm up", then performs a timed
  run and displays the average frames-per-second.
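
To check in advance that x11perf is available on your search path, you
can run, for example:

    which x11perf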

On multi-CPU systems, the graphics tests will only run in single-processing
mode.  This is because the meaning of running two copies of a test at once
is dubious; and the test windows tend to overlay each other, meaning that
the window behind isn't actually doing any work.


============================================================================

Multiple CPUs
=============

If your system has multiple CPUs, the default behaviour is to run the selected
tests twice -- once with one copy of each test program running at a time,
and once with N copies, where N is the number of CPUs.  (You can override
this with the "-c" option; see "Detailed Usage" above.)  This is designed to
allow you to assess:

 - the performance of your system when running a single task
 - the performance of your system when running multiple tasks
 - the gain from your system's implementation of parallel processing
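
For example, on a dual-CPU machine the default behaviour amounts to the
same thing as running:

    ./Run -c 1 -c 2 index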

The results, however, need to be handled with care.  Here are the results
of two runs on a dual-processor system, one in single-processing mode, one
dual-processing:

  Test                    Single     Dual   Gain
  --------------------    ------   ------   ----
  Dhrystone 2              562.5   1110.3    97%
  Double Whetstone         320.0    640.4   100%
  Execl Throughput         450.4    880.3    95%
  File Copy 1024           759.4    595.9   -22%
  File Copy 256            535.8    438.8   -18%
  File Copy 4096          1261.8   1043.4   -17%
  Pipe Throughput          481.0    979.3   104%
  Pipe-based Switching     326.8   1229.0   276%
  Process Creation         917.2   1714.1    87%
  Shell Scripts (1)       1064.9   1566.3    47%
  Shell Scripts (8)       1567.7   1709.9     9%
  System Call Overhead     944.2   1445.5    53%
  --------------------    ------   ------   ----
  Index Score:             678.2   1026.2    51%

As expected, the heavily CPU-dependent tasks -- dhrystone, whetstone,
execl, pipe throughput, process creation -- show close to 100% gain when
running 2 copies in parallel.

The Pipe-based Context Switching test measures context switching overhead
by sending messages back and forth between 2 processes.  I don't know why
it shows such a huge gain with 2 copies (i.e. 4 processes total) running,
but it seems to be consistent on my system.  I think this may be an issue
with the SMP implementation.

The System Call Overhead shows a lesser gain, presumably because it uses a
lot of CPU time in single-threaded kernel code.  The shell scripts test with
8 concurrent processes shows almost no gain -- because the test itself runs
8 scripts in parallel, it's already using both CPUs, even when the benchmark
is run in single-stream mode.  The same test with one process per copy
shows a real gain.

The filesystem throughput tests show a loss, instead of a gain, when
multi-processing.  That there's no gain is to be expected, since the tests
are presumably constrained by the throughput of the I/O subsystem and the
disk drive itself; the drop in performance is presumably down to the
increased contention for resources, and perhaps greater disk head movement.

So what tests should you use, how many copies should you run, and how should
you interpret the results?  Well, that's up to you, since it depends on
what it is you're trying to measure.

Implementation
--------------

The multi-processing mode is implemented at the level of test iterations.
During each iteration of a test, N slave processes are started using fork().
Each of these slaves executes the test program using fork() and exec(),
reads and stores the entire output, times the run, and prints all the
results to a pipe.  The Run script reads the pipes for each of the slaves
in turn to get the results and times.  The scores are added, and the times
averaged.
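
As a very rough sketch of the idea -- this is not the Run script's actual
code, and "some_test_program" is just a placeholder -- running N copies in
parallel from a shell might look like:

    # Illustrative only: start N copies of a test program in the
    # background, capture each copy's output separately, then wait
    # for all of them to finish.
    N=4
    for i in $(seq 1 "$N"); do
        ./some_test_program > "copy.$i.out" 2>&1 &
    done
    wait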

The result is that each test program has N copies running at once.  They
should all finish at around the same time, since they run for constant time.

If a test program itself starts off K multiple processes (as with the shell8
test), then the effect will be that there are N * K processes running at
once.  This is probably not very useful for testing multi-CPU performance.


============================================================================

The Language Setting
====================

The $LANG environment variable determines how programs and library
routines interpret text.  This can have a big impact on the test results.

If $LANG is set to POSIX, or is left unset, text is treated as ASCII; if
it is set to en_US.UTF-8, for example, then text is treated as being
encoded in UTF-8, which is more complex and therefore slower.  Setting
it to other languages can have varying results.

To ensure consistency between test runs, the Run script now (as of version
5.1.1) sets $LANG to "en_US.utf8".

This setting is configured with the variable "$language" in the Run script.
You should not change this if you want to share your results to allow
comparisons between systems; however, you may want to change it to see
how different language settings affect performance.

Each test report now includes the language settings in use.  The reported
language is what is set in $LANG, and is not necessarily supported by the
system; but we also report the character mapping and collation order which
are actually in use (as reported by "locale").
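
You can see the character mapping and collation order that are actually in
effect on your system with the "locale" command, for example:

    locale | grep -E 'LC_CTYPE|LC_COLLATE'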


============================================================================

Interpreting the Results
========================

Interpreting the results of these tests is tricky, and totally depends on
what you're trying to measure.

For example, are you trying to measure how fast your CPU is?  Or how good
your compiler is?  Because these tests are all recompiled using your host
system's compiler, the performance of the compiler will inevitably impact
the performance of the tests.  Is this a problem?  If you're choosing a
system, you probably care about its overall speed, which may well depend
on how good its compiler is; so including that in the test results may be
the right answer.  But you may want to ensure that the right compiler is
used to build the tests.

On the other hand, with the vast majority of Unix systems being x86 / PC
compatibles, running Linux and the GNU C compiler, the results will tend
to be more dependent on the hardware; but the versions of the compiler and
OS can make a big difference.  (I measured a 50% gain between SUSE 10.1
and OpenSUSE 10.2 on the same machine.)  So you may want to make sure that
all your test systems are running the same version of the OS; or at least
publish the OS and compiler versions with your results.  Then again, it may
be compiler performance that you're interested in.

The C test is very dubious -- it tests the speed of compilation.  If you're
running the exact same compiler on each system, OK; but otherwise, the
results should probably be discarded.  A slower compilation doesn't say
anything about the speed of your system, since the compiler may simply be
spending more time to super-optimise the code, which would actually make
the compiled code faster.

This will be particularly true on architectures like IA-64 (Itanium etc.)
where the compiler spends huge amounts of effort scheduling instructions
to run in parallel, with a resultant significant gain in execution speed.

Some tests are even more dubious in terms of host-dependency -- for example,
the "dc" test uses the host's version of dc (a calculator program).  The
version of this which is available can make a huge difference to the score,
which is why it's not in the index group.  Read through the release notes
for more on these kinds of issues.

Another age-old issue is that of the benchmarks being too trivial to be
meaningful.  With compilers getting ever smarter, and performing more
wide-ranging flow path analyses, the danger of parts of the benchmarks
simply being optimised out of existence is always present.

All in all, the "index" and "gindex" tests (see above) are designed to
give a reasonable measure of overall system performance; but the results
of any test run should always be used with care.