perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify the number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------

SUBSYSTEM
---------

'sched'::
	Scheduler and IPC mechanisms.

'syscall'::
	System call performance (throughput).

'mem'::
	Memory access performance.

'numa'::
	NUMA scheduling and MM benchmarks.

'futex'::
	Futex stressing benchmarks.

'epoll'::
	Eventpoll (epoll) stressing benchmarks.

'internals'::
	Benchmark internal perf functionality.

'all'::
	All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Be multi-threaded instead of multi-process.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec
---------------------

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating performance of core system call throughput (both usecs/op and ops/sec metrics).
This uses a single thread simply doing getppid(2), which is a simple syscall where the result is not
cached by glibc.


SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]