perf-stat(1)
============

NAME
----
perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
--------
[verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
'perf stat' report [-i file]

DESCRIPTION
-----------
This command runs a command and gathers performance counter statistics
from it.


OPTIONS
-------
<command>...::
	Any command you can specify in a shell.

record::
	See STAT RECORD.

report::
	See STAT REPORT.

-e::
--event=::
	Select the PMU event. Selection can be:

	- a symbolic event name (use 'perf list' to list all events)

	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
	  hexadecimal event descriptor.

	- a symbolic or raw PMU event followed by an optional colon
	  and a list of event modifiers, e.g., cpu-cycles:p. See the
	  linkperf:perf-list[1] man page for details on event modifiers.

	- a symbolically formed event like 'pmu/param1=0x3,param2/' where
	  param1 and param2 are defined as formats for the PMU in
	  /sys/bus/event_source/devices/<pmu>/format/*

	  'percore' is an event qualifier that sums up the event counts for all
	  hardware threads in a core. For example:
	  perf stat -A -a -e cpu/event,percore=1/,otherevent ...

	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
	  where M, N, K are numbers (in decimal, hex or octal format).
	  Acceptable values for each of the 'config', 'config1' and 'config2'
	  parameters are defined by corresponding entries in
	  /sys/bus/event_source/devices/<pmu>/format/*

	Note that the last two syntaxes support prefix and glob matching in
	the PMU name to simplify creation of events across multiple instances
	of the same type of PMU in large systems (e.g. memory controller PMUs).
	Multiple PMU instances are typical for uncore PMUs, so the prefix
	'uncore_' is also ignored when performing this match.
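
Each entry in a PMU's format directory is a small text file that names the config field and bit range a parameter maps to. The following is a minimal sketch for listing these entries, assuming a POSIX shell. `show_pmu_formats` is a hypothetical helper and is not part of perf:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): print each format entry of a PMU as
# "parameter -> field:bits", e.g. "event -> config:0-7".
# Usage: show_pmu_formats /sys/bus/event_source/devices/<pmu>/format
show_pmu_formats() (
    for f in "$1"/*; do
        # Skip the unexpanded pattern when the directory is empty or missing.
        [ -f "$f" ] || continue
        printf '%s -> %s\n' "${f##*/}" "$(cat "$f")"
    done
)
```

For example, on a system that has a PMU named `cpu` you could run `show_pmu_formats /sys/bus/event_source/devices/cpu/format`. Such a PMU is common on x86, but its name on other architectures is an assumption.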

-i::
--no-inherit::
	Child tasks do not inherit counters.

-p::
--pid=<pid>::
	Stat events on existing process ids (comma-separated list).

-t::
--tid=<tid>::
	Stat events on existing thread ids (comma-separated list).

ifdef::HAVE_LIBPFM[]
--pfm-events events::
Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
including support for event filters. For example '--pfm-events
inst_retired:any_p:u:c=1:i'. More than one event can be passed to the
option using the comma separator. Hardware events and generic hardware
events cannot be mixed together. The latter must be used with the -e
option. The -e option and this one can be mixed and matched. Events
can be grouped using the {} notation.
endif::HAVE_LIBPFM[]

-a::
--all-cpus::
	System-wide collection from all CPUs (default if no target is specified).

--no-scale::
	Don't scale/normalize counter values.
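
When more events are requested than the hardware can count at once, the kernel multiplexes them. By default, perf stat then scales each raw count by the ratio of the time the event was enabled to the time it was actually running. --no-scale disables that scaling. The following is a minimal sketch of the arithmetic, using made-up values. `scale_count` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): scale a raw count the way
# multiplexed counts are normalized by default, raw * enabled / running,
# and report the share of the enabled time the counter was running.
scale_count() {
    awk -v raw="$1" -v ena="$2" -v run="$3" 'BEGIN {
        if (run == 0) { print "not counted"; exit }
        scaled = (run < ena) ? raw * ena / run : raw
        printf "%.0f (%.2f%%)\n", scaled, 100 * run / ena
    }'
}

scale_count 500000 1000000 250000   # made-up values; prints 2000000 (25.00%)
```

The percentage corresponds to the "percentage of measurement time the counter was running" field described under CSV FORMAT.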

-d::
--detailed::
	Print more detailed statistics; can be specified up to 3 times:

	   -d:          detailed events, L1 and LLC data cache
	   -d -d:       more detailed events, dTLB and iTLB events
	   -d -d -d:    very detailed events, adding prefetch events

-r::
--repeat=<n>::
	Repeat the command and print the average + stddev (max: 100). 0 means forever.

-B::
--big-num::
	Print large numbers with thousands' separators according to locale.
	Enabled by default. Use "--no-big-num" to disable.
	The default setting can be changed with "perf config stat.big-num=false".

-C::
--cpu=::
Count only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
In per-thread mode, this option is ignored. The -a option is still necessary
to activate system-wide monitoring. Default is to count on all CPUs.
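
The CPU list syntax above can be illustrated with a small shell function that expands a list into individual CPU numbers. This is a sketch for illustration only. `expand_cpu_list` is hypothetical and does not validate its input:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): expand a CPU list written in the
# -C/--cpu syntax, e.g. "0-2,5", into one CPU number per line.
# The body runs in a subshell so that IFS and "set -f" stay local.
expand_cpu_list() (
    IFS=,
    set -f
    for item in $1; do
        case $item in
            *-*) first=${item%-*} last=${item#*-} ;;
            *)   first=$item last=$item ;;
        esac
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            echo "$cpu"
            cpu=$((cpu + 1))
        done
    done
)

expand_cpu_list 0-2,5   # prints 0, 1, 2 and 5, one per line
```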

-A::
--no-aggr::
Do not aggregate counts across all monitored CPUs.

-n::
--null::
	Null run: don't start any counters.

-v::
--verbose::
	Be more verbose (show counter open errors, etc.).

-x SEP::
--field-separator SEP::
Print counts using a CSV-style output to make it easy to import directly into
spreadsheets. Columns are separated by the string specified in SEP.

--table:: Display time for each run (-r option), in a table format, e.g.:

  $ perf stat --null -r 5 --table perf bench sched pipe

   Performance counter stats for 'perf bench sched pipe' (5 runs):

             # Table of individual measurements:
             5.189 (-0.293) #
             5.189 (-0.294) #
             5.186 (-0.296) #
             5.663 (+0.181) ##
             6.186 (+0.703) ####

             # Final result:
             5.483 +- 0.198 seconds time elapsed  ( +-  3.62% )
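
The final line of this example is consistent with a specific calculation. It shows the mean of the runs together with the standard error of the mean (the sample standard deviation divided by the square root of the number of runs), not the plain standard deviation. The following minimal awk sketch reproduces those numbers from the five measurements above. `mean_stderr` is a hypothetical helper, and it is a reconstruction of the arithmetic rather than perf's own code:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): read one elapsed time per line and
# print the mean, the standard error of the mean and the relative error.
mean_stderr() {
    awk '
    {
        # Welford update of the running mean and of M2, the sum of squared
        # deviations from the mean.
        n++
        delta = $1 - mean
        mean += delta / n
        m2 += delta * ($1 - mean)
    }
    END {
        # Sample variance is m2 / (n - 1); dividing by n again gives the
        # variance of the mean.
        err = (n < 2) ? 0 : sqrt(m2 / (n - 1) / n)
        pct = (mean != 0) ? 100 * err / mean : 0
        printf "%.3f +- %.3f ( +- %5.2f%% )\n", mean, err, pct
    }'
}

printf '%s\n' 5.189 5.189 5.186 5.663 6.186 | mean_stderr
```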

-G name::
--cgroup name::
Monitor only in the container (cgroup) called "name". This option is available only
in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
to first event, second cgroup to second event and so on. It is possible to provide
an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
corresponding events, i.e., they always refer to events defined earlier on the command
line. To track multiple events for a specific cgroup, use either
'-e e1 -e e2 -G foo,foo' or just '-e e1 -e e2 -G foo'.

To monitor, say, 'cycles' both for a cgroup and system-wide, use this
command line: 'perf stat -e cycles -G cgroup_name -a -e cycles'.

--for-each-cgroup name::
Expand the event list for each cgroup in "name" (multiple cgroups can be given,
separated by commas). This has the same effect as repeating the -e and -G options
for each event x name combination. This option cannot be used with the -G/--cgroup option.
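
Conceptually, --for-each-cgroup clones the event list once per cgroup. The following minimal sketch prints an equivalent -e/-G argument list. `expand_for_each_cgroup` is hypothetical and assumes that event specifications contain no commas. The cgroup-major order of the output is an assumption about how perf orders the cloned events:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): print -e/-G arguments equivalent to
# "--for-each-cgroup CGROUPS" applied to the comma-separated EVENTS.
# Usage: expand_for_each_cgroup EVENTS CGROUPS
# The body runs in a subshell so that IFS and "set -f" stay local.
expand_for_each_cgroup() (
    events=$1
    cgroups=$2
    e_args=''
    g_list=''
    sep=''
    IFS=,
    set -f                      # do not glob-expand event or cgroup names
    for cg in $cgroups; do
        for ev in $events; do
            e_args="$e_args -e $ev"
            g_list="$g_list$sep$cg"
            sep=','
        done
    done
    printf '%s -G %s\n' "${e_args# }" "$g_list"
)

expand_for_each_cgroup cycles,instructions A,B
```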

-o file::
--output file::
Print the output into the designated file.

--append::
Append to the output file designated with the -o option. Ignored if -o is not specified.

--log-fd::

Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
with it. --append may be used here. Examples:

     3>results  perf stat --log-fd 3          -- $cmd
     3>>results perf stat --log-fd 3 --append -- $cmd

--control=fifo:ctl-fifo[,ack-fifo]::
--control=fd:ctl-fd[,ack-fd]::
ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
Listen on the ctl-fd descriptor for commands that control the measurement
('enable': enable events, 'disable': disable events). Measurements can be
started with events disabled using the --delay=-1 option. Optionally, send
a control command completion message ('ack\n') to the ack-fd descriptor to
synchronize with the controlling process. Example of a bash shell script to
enable and disable events during measurements:

 #!/bin/bash

 ctl_dir=/tmp/

 ctl_fifo=${ctl_dir}perf_ctl.fifo
 test -p ${ctl_fifo} && unlink ${ctl_fifo}
 mkfifo ${ctl_fifo}
 exec {ctl_fd}<>${ctl_fifo}

 ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
 test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
 mkfifo ${ctl_ack_fifo}
 exec {ctl_fd_ack}<>${ctl_ack_fifo}

 perf stat -D -1 -e cpu-cycles -a -I 1000       \
           --control fd:${ctl_fd},${ctl_fd_ack} \
           -- sleep 30 &
 perf_pid=$!

 sleep 5  && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
 sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"

 exec {ctl_fd_ack}>&-
 unlink ${ctl_ack_fifo}

 exec {ctl_fd}>&-
 unlink ${ctl_fifo}

 wait ${perf_pid}
 exit $?


--pre::
--post::
	Pre and post measurement hooks, e.g.:

perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/ clean' -- make -s -j64 O=defconfig-build/ bzImage

-I msecs::
--interval-print msecs::
Print count deltas every msecs milliseconds (minimum: 1ms).
The overhead percentage could be high in some cases, for instance with small, sub-100ms intervals. Use with caution.
	example: 'perf stat -I 1000 -e cycles -a sleep 5'

If a metric exists, it is calculated from the counts collected in this interval and printed after #.

--interval-count times::
Print count deltas a fixed number of times.
This option should be used together with the "-I" option.
	example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'

--interval-clear::
Clear the screen before the next interval.

--timeout msecs::
Stop the 'perf stat' session and print count deltas after msecs milliseconds (minimum: 10 ms).
This option is not supported with the "-I" option.
	example: 'perf stat --timeout 2000 -e cycles -a'

--metric-only::
Only print computed metrics. Print them in a single line.
Don't show any raw values. Not supported with --per-thread.

--per-socket::
Aggregate counts per processor socket for system-wide mode measurements. This
is a useful mode to detect imbalance between sockets. To enable this mode,
use --per-socket in addition to -a (system-wide). The output includes the
socket number and the number of online processors on that socket. This is
useful to gauge the amount of aggregation.

--per-die::
Aggregate counts per processor die for system-wide mode measurements. This
is a useful mode to detect imbalance between dies. To enable this mode,
use --per-die in addition to -a (system-wide). The output includes the
die number and the number of online processors on that die. This is
useful to gauge the amount of aggregation.

--per-core::
Aggregate counts per physical processor for system-wide mode measurements. This
is a useful mode to detect imbalance between physical cores. To enable this mode,
use --per-core in addition to -a (system-wide). The output includes the
core number and the number of online logical processors on that physical processor.

--per-thread::
Aggregate counts per monitored thread, when monitoring threads (-t option)
or processes (-p option).

--per-node::
Aggregate counts per NUMA node for system-wide mode measurements. This
is a useful mode to detect imbalance between NUMA nodes. To enable this
mode, use --per-node in addition to -a (system-wide).

-D msecs::
--delay msecs::
After starting the program, wait msecs before measuring (-1: start with events
disabled). This is useful to filter out the startup phase of the program,
which is often very different.

-T::
--transaction::

Print statistics of transactional execution if supported.

--metric-no-group::
By default, the events used to compute a metric are placed in weak groups.
A group tries to enforce scheduling either all or none of its events. The
--metric-no-group option places events outside of groups, which may
increase the chance of each event being scheduled and so improve
accuracy. However, because the events may then not be scheduled together,
accuracy for metrics like instructions per cycle can be lower, as both
events may no longer be measured at the same time.

--metric-no-merge::
By default, metric events in different weak groups can be shared if one
group contains all the events needed by another. In such cases one
group is eliminated, reducing event multiplexing and making it so
that certain groups of metrics sum to 100%. A downside to sharing a
group is that the group may require multiplexing, and so accuracy for a
small group that need not have multiplexing is lowered. This option
forbids the event merging logic from sharing events between groups and
may be used to increase accuracy in this case.

STAT RECORD
-----------
Stores stat data into a perf data file.

-o file::
--output file::
Output file name.

STAT REPORT
-----------
Reads and reports stat data from a perf data file.

-i file::
--input file::
Input file name.

--per-socket::
Aggregate counts per processor socket for system-wide mode measurements.

--per-die::
Aggregate counts per processor die for system-wide mode measurements.

--per-core::
Aggregate counts per physical processor for system-wide mode measurements.

-M::
--metrics::
Print metrics or metricgroups specified in a comma-separated list.
For a metricgroup, all metrics from the group are added.
The events from the metrics are automatically measured.
See the perf list output for the possible metrics and metricgroups.

-A::
--no-aggr::
Do not aggregate counts across all monitored CPUs.

--topdown::
Print top down level 1 metrics if supported by the CPU. This allows one to
determine bottlenecks in the CPU pipeline for CPU-bound workloads,
by breaking the cycles consumed down into frontend bound, backend bound,
bad speculation and retiring.

Frontend bound means that the CPU cannot fetch and decode instructions fast
enough. Backend bound means that computation or memory access is the
bottleneck. Bad speculation means that the CPU wasted cycles due to branch
mispredictions and similar issues. Retiring means that the CPU computed without
an apparent bottleneck. The bottleneck is only the real bottleneck
if the workload is actually bound by the CPU and not by something else.
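
Older Intel CPUs provide the topdown-total-slots, topdown-fetch-bubbles, topdown-slots-issued, topdown-slots-retired and topdown-recovery-bubbles events. On those CPUs, the four level 1 categories can be derived from these counts as in the sketch below. The formulas are a reconstruction of the usual level 1 breakdown, not a copy of perf's code. The event names and the counts are assumptions, and `topdown_l1` is a hypothetical helper. Backend bound is whatever remains after the other three fractions are subtracted from 1:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): top down level 1 breakdown from the
# raw counts of the older topdown events, passed in this order:
#   $1 topdown-total-slots     $2 topdown-fetch-bubbles
#   $3 topdown-slots-issued    $4 topdown-slots-retired
#   $5 topdown-recovery-bubbles
topdown_l1() {
    awk -v slots="$1" -v fetch="$2" -v issued="$3" -v retired="$4" \
        -v recovery="$5" 'BEGIN {
        if (slots == 0) { print "no slots counted"; exit 1 }
        fe  = fetch / slots                          # frontend bound
        bad = (issued - retired + recovery) / slots  # bad speculation
        ret = retired / slots                        # retiring
        be  = 1 - (fe + bad + ret)                   # backend bound: the rest
        printf "frontend %.1f%% bad-spec %.1f%% retiring %.1f%% backend %.1f%%\n",
            100 * fe, 100 * bad, 100 * ret, 100 * be
    }'
}

topdown_l1 1000 200 600 500 50   # made-up counts
```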

For best results it is usually a good idea to use it with interval
mode like -I 1000, as the bottleneck of workloads can change often.

This enables --metric-only, unless overridden with --no-metric-only.

The following restrictions only apply to older Intel CPUs and Atom;
on newer CPUs (IceLake and later) TopDown can be collected for any thread:

The top down metrics are collected per core instead of per
CPU thread. Per core mode is automatically enabled
and -a (global monitoring) is needed, requiring root rights or
kernel.perf_event_paranoid=-1.

Topdown uses the full Performance Monitoring Unit and, for best results,
needs the NMI watchdog to be disabled (as root):

 echo 0 > /proc/sys/kernel/nmi_watchdog

Otherwise the bottlenecks may be inconsistent on workloads with changing phases.

To interpret the results it is usually necessary to know which
CPUs the workload runs on. If needed, the CPUs can be forced using
taskset.

--no-merge::
Do not merge results from the same PMUs.

When multiple events are created from a single event specification,
stat will, by default, aggregate the event counts and show the result
in a single row. This option disables that behavior and shows
the individual events and counts.

Multiple events are created from a single event specification when:

1. Prefix or glob matching is used for the PMU name.
2. Aliases, which are listed immediately after the Kernel PMU events
   by perf list, are used.

--smi-cost::
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

During the measurement, /sys/devices/cpu/freeze_on_smi will be set to
freeze core counters on SMI.
The aperf counter will not be affected by the setting.
The cost of SMI can be measured by (aperf - unhalted core cycles).

In practice, the percentage of SMI cycles is very useful for performance
oriented analysis. --metric-only will be applied by default.
The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf.

Users who want to get the actual values can apply --no-metric-only.
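
The SMI cycles% formula above is simple arithmetic on two counts. The following minimal sketch uses made-up values. `smi_cycles_pct` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): SMI cycles% from two raw counts,
# computed as (aperf - unhalted core cycles) / aperf.
smi_cycles_pct() {
    awk -v aperf="$1" -v cycles="$2" 'BEGIN {
        if (aperf == 0) { print "aperf is zero"; exit 1 }
        printf "%.2f%%\n", 100 * (aperf - cycles) / aperf
    }'
}

smi_cycles_pct 2000000 1990000   # made-up counts; prints 0.50%
```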

--all-kernel::
Configure all used events to run in kernel space.

--all-user::
Configure all used events to run in user space.

--percore-show-thread::
The event modifier "percore" sums up the event counts for all hardware
threads in a core and shows the counts per core.

This option, with the event modifier "percore" enabled, also sums up the
event counts for all hardware threads in a core, but shows the summed
counts per hardware thread. This is essentially a replacement for the
'any' bit and is convenient for post-processing.

--summary::
Print summary for interval mode (-I).

EXAMPLES
--------

$ perf stat -- make

   Performance counter stats for 'make':

        83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
           3,228,188      page-faults:u             #    0.039 M/sec
     229,570,665,834      cycles:u                  #    2.742 GHz
     313,163,853,778      instructions:u            #    1.36  insn per cycle
      69,704,684,856      branches:u                #  832.559 M/sec
       2,078,861,393      branch-misses:u           #    2.98% of all branches

        83.409183620 seconds time elapsed

        74.684747000 seconds user
         8.739217000 seconds sys
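
The annotations after # are derived from the raw counts. The following sketch recomputes four of them from the numbers in the output above. It assumes these formulas:

- CPUs utilized = task-clock / elapsed time
- GHz = cycles / task-clock
- insn per cycle = instructions / cycles
- branch-miss percentage = branch-misses / branches

The results match the printed values, but this is a reconstruction of the arithmetic, not perf's own code. `example_metrics` is a hypothetical name:

```sh
#!/bin/sh
# Recompute four of the "#" annotations in the example above from its raw
# counts. The formulas are a reconstruction, not perf's source code.
example_metrics() {
    awk 'BEGIN {
        task_clock_ms = 83723.452481   # task-clock:u (msec)
        elapsed_s     = 83.409183620   # seconds time elapsed
        cycles        = 229570665834
        instructions  = 313163853778
        branches      = 69704684856
        branch_misses = 2078861393

        printf "%.3f CPUs utilized\n", task_clock_ms / (elapsed_s * 1000)
        printf "%.3f GHz\n", cycles / (task_clock_ms * 1e6)
        printf "%.2f insn per cycle\n", instructions / cycles
        printf "%.2f%% of all branches\n", 100 * branch_misses / branches
    }'
}

example_metrics
```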

TIMINGS
-------
As shown in the example above, perf stat can display 3 types of timings.
The time the counters were enabled/alive is always displayed:

        83.409183620 seconds time elapsed

For workload sessions, the time the workload spent in user and system
mode is also displayed:

        74.684747000 seconds user
         8.739217000 seconds sys

These times are the same as those displayed by the 'time' tool.

CSV FORMAT
----------

With -x, perf stat is able to output a not-quite-CSV format.
Commas in the output are not put into quotes (""). To make the output
easy to parse, it is recommended to use a different separator character,
like -x \;

The fields are in this order:

	- optional usec time stamp in fractions of second (with -I xxx)
	- optional CPU, core, or socket identifier
	- optional number of logical CPUs aggregated
	- counter value
	- unit of the counter value or empty
	- event name
	- run time of counter
	- percentage of measurement time the counter was running
	- optional variance if multiple values are collected with -r
	- optional metric value
	- optional unit of metric

Additional metrics may be printed with all earlier fields being empty.
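
Because the field order is fixed, the output is easy to post-process. The following minimal sketch assumes ';' as the separator and no -I, -A or --per-* options, so the counter value is the first field and the event name is the third. `csv_events` is hypothetical, and the sample lines are made up rather than real measurements:

```sh
#!/bin/sh
# Hypothetical helper (not part of perf): print "event value" pairs from
# perf stat -x ';' output read on stdin. Assumes no -I, -A or --per-* option,
# so the counter value is field 1 and the event name is field 3. Lines that
# only carry an additional metric (empty earlier fields) and rows whose value
# is not a plain number, e.g. "<not counted>", are skipped.
csv_events() {
    awk -F';' '$3 != "" && $1 ~ /^[0-9]+(\.[0-9]+)?$/ { print $3, $1 }'
}

# Made-up sample lines, not real measurements.
printf '%s\n' \
    '1500.25;msec;task-clock;1500250000;100.00;1.000;CPUs utilized' \
    '4000000000;;cycles;1500250000;100.00;2.666;GHz' \
    '5000000000;;instructions;1500250000;100.00;1.25;insn per cycle' \
    ';;;;;0.75;stalled cycles per insn' \
    '<not counted>;;branch-misses;0;0.00;;' |
    csv_events
```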

SEE ALSO
--------
linkperf:perf-top[1], linkperf:perf-list[1]