perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
The 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify the number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------
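
As an illustration of script-side processing, the following Python sketch
reads 'simple' style output. It assumes the output is a single number of
seconds on its own line, as in the 'sched pipe' example; other suites may
print something different. The name `parse_simple_output` is hypothetical
and is not part of perf.

```python
def parse_simple_output(text: str) -> float:
    """Return the total time in seconds from 'simple' style output.

    Assumption: the output holds exactly one non-empty line containing
    a decimal number, as in the 'sched pipe' example.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one value, got {len(lines)} lines")
    return float(lines[0])


if __name__ == "__main__":
    # Sample value copied from the 'simple' example.
    print(parse_simple_output("5.988\n"))
```

A caller would typically capture the standard output of
`perf bench --format=simple sched pipe` and pass it to this function.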

SUBSYSTEM
---------

'sched'::
	Scheduler and IPC mechanisms.

'syscall'::
	System call performance (throughput).

'mem'::
	Memory access performance.

'numa'::
	NUMA scheduling and MM benchmarks.

'futex'::
	Futex stressing benchmarks.

'epoll'::
	Eventpoll (epoll) stressing benchmarks.

'internals'::
	Benchmark internal perf functionality.

'all'::
	All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use threads instead of processes.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------
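
The task counts in the example follow from the group layout: each group
has 20 senders and 20 receivers, so 40 tasks per group. A minimal Python
sketch of that arithmetic, where `messaging_task_count` is a hypothetical
helper and not part of perf:

```python
def messaging_task_count(groups: int, per_role: int = 20) -> int:
    """Total tasks started by 'sched messaging'.

    Each group holds per_role senders plus per_role receivers; the
    default of 20 is taken from the example output.
    """
    if groups < 0 or per_role < 0:
        raise ValueError("groups and per_role must be non-negative")
    return groups * 2 * per_role


if __name__ == "__main__":
    print(messaging_task_count(10))  # default run in the example
    print(messaging_task_count(20))  # run with -g 20
```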

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec
---------------------
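
The two derived figures in the example are consistent with simple
arithmetic on the total time and the loop count. The Python sketch below
reproduces them. It is an illustration only and makes no claim about how
perf computes the values internally; `pipe_metrics` is a hypothetical
name. Because "Total time" is printed with three decimals, the
full-precision totals used here are recovered from the usecs/op lines.

```python
from typing import Tuple


def pipe_metrics(total_seconds: float, operations: int) -> Tuple[float, int]:
    """Return (usecs per operation, operations per second)."""
    if total_seconds <= 0 or operations <= 0:
        raise ValueError("total_seconds and operations must be positive")
    usecs_per_op = total_seconds * 1_000_000 / operations
    # Truncated to an integer, which matches the sample output.
    ops_per_sec = int(operations / total_seconds)
    return usecs_per_op, ops_per_sec


if __name__ == "__main__":
    print(pipe_metrics(8.091833, 1_000_000))  # default run in the example
    print(pipe_metrics(0.016948, 1000))       # run with -l 1000
```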

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating performance of core system call throughput (both usecs/op and ops/sec metrics).
This uses a single thread simply doing getppid(2), which is a simple syscall where the result is not
cached by glibc.

SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of the gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of the gettimeofday syscall.
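
To illustrate the size argument accepted by *memcpy* and *memset*, the
Python sketch below converts a string such as `1MB` into a byte count. It
assumes binary multiples (1KB = 1024 bytes) and a whole-number value;
both are assumptions of this sketch rather than statements from this
page. `parse_size` is a hypothetical helper and not part of perf.

```python
import re

# Assumption: units are binary multiples (1KB = 1024 bytes).
_UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a size such as '1MB' or '4kb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"malformed size: {text!r}")
    number, unit = match.groups()
    factor = _UNITS.get(unit.upper())  # units are case insensitive
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(number) * factor


if __name__ == "__main__":
    print(parse_size("1MB"))  # the documented default size
```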

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]