perf-report(1)
==============

NAME
----
perf-report - Read perf.data (created by perf record) and display the profile

SYNOPSIS
--------
[verse]
'perf report' [-i <file> | --input=file]

DESCRIPTION
-----------
This command displays the performance counter profile information recorded
via perf record.

OPTIONS
-------
-i::
--input=::
	Input file name. (default: perf.data unless stdin is a fifo)

-v::
--verbose::
	Be more verbose. (show symbol address, etc.)

-q::
--quiet::
	Do not show any message.  (Suppress -v)

-n::
--show-nr-samples::
	Show the number of samples for each symbol.

--show-cpu-utilization::
	Show sample percentage for different cpu modes.

-T::
--threads::
	Show per-thread event counters.  The input data file should be recorded
	with the -s option.

-c::
--comms=::
	Only consider symbols in these comms. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

--pid=::
	Only show events for given process ID (comma separated list).

--tid=::
	Only show events for given thread ID (comma separated list).

-d::
--dsos=::
	Only consider symbols in these dsos. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

-S::
--symbols=::
	Only consider these symbols. CSV that understands
	file://filename entries.  This option will affect the percentage of
	the overhead column.  See --percentage for more info.

--symbol-filter=::
	Only show symbols that match (partially) with this filter.

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-s::
--sort=::
	Sort histogram entries by given key(s) - multiple keys can be specified
	in CSV format.  The following sort keys are available:
	pid, comm, dso, symbol, parent, cpu, socket, srcline, weight,
	local_weight, cgroup_id.

	Each key has the following meaning:

	- comm: command (name) of the task which can be read via /proc/<pid>/comm
	- pid: command and tid of the task
	- dso: name of library or module executed at the time of sample
	- dso_size: size of library or module executed at the time of sample
	- symbol: name of function executed at the time of sample
	- symbol_size: size of function executed at the time of sample
	- parent: name of function matched to the parent regex filter. Unmatched
	  entries are displayed as "[other]".
	- cpu: cpu number the task ran at the time of sample
	- socket: processor socket number the task ran at the time of sample
	- srcline: filename and line number executed at the time of sample.  The
	  DWARF debugging info must be provided.
	- srcfile: file name of the source file of the samples. Requires DWARF
	  information.
	- weight: Event specific weight, e.g. memory latency or transaction
	  abort cost. This is the global weight.
	- local_weight: Local weight version of the weight above.
	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
	- cgroup: cgroup pathname in the cgroupfs.
	- transaction: Transaction abort flags.
	- overhead: Overhead percentage of sample
	- overhead_sys: Overhead percentage of sample running in system mode
	- overhead_us: Overhead percentage of sample running in user mode
	- overhead_guest_sys: Overhead percentage of sample running in system mode
	  on guest machine
	- overhead_guest_us: Overhead percentage of sample running in user mode on
	  guest machine
	- sample: Number of samples
	- period: Raw event count of the samples
	- time: Separate the samples by time stamp with the resolution specified by
	  --time-quantum (default 100ms). Specify it together with overhead,
	  placing it before overhead.

	By default, comm, dso and symbol keys are used.
	(i.e. --sort comm,dso,symbol)

	If the --branch-stack option is used, the following sort keys are also
	available:

	- dso_from: name of library or module branched from
	- dso_to: name of library or module branched to
	- symbol_from: name of function branched from
	- symbol_to: name of function branched to
	- srcline_from: source file and line branched from
	- srcline_to: source file and line branched to
	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
	- in_tx: branch in TSX transaction
	- abort: TSX transaction abort.
	- cycles: Cycles in basic block

	The default sort keys are then changed to comm, dso_from, symbol_from,
	dso_to and symbol_to, see '--branch-stack'.

	When the sort key symbol is specified, columns "IPC" and "IPC Coverage"
	are enabled automatically. Column "IPC" reports the average IPC per function
	and column "IPC Coverage" reports the percentage of instructions with
	sampled IPC in this function. IPC means Instructions Per Cycle. If it's low,
	it indicates there may be a performance bottleneck when the function is
	executed, such as a memory access bottleneck. If a function has high overhead
	and low IPC, it's worth further analyzing it to optimize its performance.

	If the --mem-mode option is used, the following sort keys are also available
	(incompatible with --branch-stack):
	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.

	- symbol_daddr: name of data symbol being executed on at the time of sample
	- dso_daddr: name of library or module containing the data being executed
	  on at the time of the sample
	- locked: whether the bus was locked at the time of the sample
	- tlb: type of tlb access for the data at the time of the sample
	- mem: type of memory access for the data at the time of the sample
	- snoop: type of snoop (if any) for the data at the time of the sample
	- dcacheline: the cacheline the data address is on at the time of the sample
	- phys_daddr: physical address of data being executed on at the time of sample

	The default sort keys are then changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.

	If the data file has tracepoint event(s), the following (dynamic) sort keys
	are also available:
	trace, trace_fields, [<event>.]<field>[/raw]

	- trace: pretty printed trace output in a single column
	- trace_fields: fields in tracepoints in separate columns
	- <field name>: optional event and field name for a specific field

	The last form consists of event and field names.  If the event name is
	omitted, all events are searched for a matching field name.  The matched
	field will be shown only for events that have the field.  The event name
	supports substring match so the user doesn't need to specify the full
	subsystem and event name every time.  For example, the 'sched:sched_switch'
	event can be shortened to 'switch' as long as it's not ambiguous.  An event
	can also be specified by its index (starting from 1) preceded by '%'.
	So '%1' is the first event, '%2' is the second, and so on.
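The event selection rules above (unambiguous substring, or '%' followed by a 1-based index) can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not perf's implementation: the helper `resolve_event` and the event list are hypothetical, and perf's real matching may differ in detail.

```python
def resolve_event(spec, events):
    """Pick one event name from `events` using a sort-key event spec.

    spec is either '%N' (1-based index) or a substring of an event name.
    Hypothetical helper for illustration only.
    """
    if spec.startswith("%"):
        index = int(spec[1:])
        if not 1 <= index <= len(events):
            raise ValueError(f"no event at index {index}")
        return events[index - 1]
    matches = [name for name in events if spec in name]
    if len(matches) != 1:
        # Zero matches or an ambiguous substring cannot be resolved.
        raise ValueError(f"{spec!r} matches {len(matches)} events")
    return matches[0]


events = ["sched:sched_switch", "sched:sched_wakeup"]
print(resolve_event("switch", events))
print(resolve_event("%2", events))
```

In this model the substring 'sched' would be rejected because it matches both events.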

	The field name can have a '/raw' suffix which disables pretty printing
	and shows raw field values like hex numbers.  The --raw-trace option
	has the same effect for all dynamic sort keys.

	The default sort keys are changed to 'trace' if all events in the data
	file are tracepoints.

-F::
--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	The following fields are available:
	overhead, overhead_sys, overhead_us, overhead_children, sample and period.
	It can also contain any sort key(s).

	By default, every sort key not specified in -F will be appended
	automatically.

	If the key list starts with the prefix '+', then the specified field(s)
	are appended to the default field order.
	For example: perf report -F +period,sample.

-p::
--parent=<regex>::
	A regex filter to identify the parent. The parent is a caller of this
	function and is searched for through the callchain, so callchain
	information must have been recorded. The pattern is in the extended
	regex format and defaults to "\^sys_|^do_page_fault", see '--sort parent'.
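As a rough illustration of how a parent could be picked for one sample, the Python sketch below walks a list of callers and returns the first symbol that matches the pattern, falling back to "[other]". The function `parent_of` and the symbol names are hypothetical, and the search order is an assumption. The default pattern is written here without the leading backslash, on the assumption that the backslash is markup escaping in this page.

```python
import re

# Assumed default pattern (backslash from the page source dropped).
DEFAULT_PARENT = r"^sys_|^do_page_fault"


def parent_of(callers, pattern=DEFAULT_PARENT):
    """Return the first caller whose name matches `pattern`.

    callers lists symbol names from the nearest caller outward.
    Unmatched samples are reported as "[other]".
    """
    regex = re.compile(pattern)
    for symbol in callers:
        if regex.search(symbol):
            return symbol
    return "[other]"


print(parent_of(["vfs_read", "sys_read", "entry_SYSCALL_64"]))
print(parent_of(["main"]))
```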

-x::
--exclude-other::
	Only display entries with parent-match.

-w::
--column-widths=<width[,width...]>::
	Force each column width to the provided list, for large terminal
	readability.  0 means no limit (default behavior).

-t::
--field-separator=::
	Use a special separator character and don't pad with spaces, replacing
	all occurrences of this separator in symbol names (and other output)
	with a '.' character, which is therefore the only invalid separator.
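The replacement rule for the separator can be pictured with a small Python sketch. It is illustrative only: `format_row` is a hypothetical helper and the row contents are made up; it simply shows why a separator that occurs inside a symbol name has to be rewritten to '.' for the output to stay parseable.

```python
def format_row(fields, sep):
    """Join fields with `sep`, rewriting any embedded `sep` to '.'."""
    cleaned = [str(field).replace(sep, ".") for field in fields]
    return sep.join(cleaned)


row = ["12.50%", "app", "std::map<int,int>::find"]
print(format_row(row, ","))
```

With ',' as the separator, the comma inside the template argument list becomes a '.', so the line still splits into exactly three fields.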

-D::
--dump-raw-trace::
	Dump raw trace in ASCII.

-g::
--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
	Display call chains using type, min percent threshold, print limit,
	call order, sort key, optional branch and value.  Note that ordering
	is not fixed so any parameter can be given in an arbitrary order.
	One exception is the print_limit which should be preceded by threshold.

	print_type can be either:
	- flat: single column, linear exposure of call chains.
	- graph: use a graph tree, displaying absolute overhead rates. (default)
	- fractal: like graph, but displays relative rates. Each branch of
		 the tree is considered as a new profiled object.
	- folded: call chains are displayed in a line, separated by semicolons
	- none: disable call chain display.

	threshold is a percentage value which specifies a minimum percent to be
	included in the output call graph.  Default is 0.5 (%).

	print_limit is only applied when the stdio interface is used.  It limits
	the number of call graph entries in a single hist entry.  Note that it
	needs to be given after threshold (but not necessarily consecutively).
	Default is 0 (unlimited).

	order can be either:
	- callee: callee based call graph.
	- caller: inverted caller based call graph.
	Default is 'caller' when --children is used, otherwise 'callee'.

	sort_key can be:
	- function: compare on functions (default)
	- address: compare on individual code addresses
	- srcline: compare on source filename and line number

	branch can be:
	- branch: include last branch information in callgraph when available.
	          Usually more convenient to use --branch-history for this.

	value can be:
	- percent: display overhead percent (default)
	- period: display event period
	- count: display event count

--children::
	Accumulate callchain of children to parent entry so that they can
	show up in the output.  The output will have a new "Children" column
	and will be sorted on that data.  It requires that callchains were
	recorded.  See the `overhead calculation' section for more details.
	Enabled by default, disable with --no-children.
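The difference between the "Self" and "Children" values can be sketched in Python. This is a simplified model rather than perf's code: `self_and_children` is a hypothetical helper, each sample is given as a callchain from the outermost caller to the sampled function, and a sample's period is credited once to every function on its chain.

```python
from collections import defaultdict


def self_and_children(samples):
    """Return {function: (children %, self %)} for (callchain, period) samples."""
    total = sum(period for _, period in samples)
    self_period = defaultdict(int)
    children_period = defaultdict(int)
    for chain, period in samples:
        # Self overhead goes to the function that was actually sampled.
        self_period[chain[-1]] += period
        # Children overhead goes to every function on the chain, once each.
        for func in set(chain):
            children_period[func] += period
    return {
        func: (100 * children_period[func] / total,
               100 * self_period[func] / total)
        for func in children_period
    }


samples = [(["main", "bar", "foo"], 60), (["main", "bar"], 40)]
result = self_and_children(samples)
for func, (children, own) in sorted(result.items(), key=lambda kv: -kv[1][0]):
    print(f"{children:7.2f}% {own:7.2f}%  {func}")
```

In this made-up profile main has no self overhead at all, yet it still reaches 100% in the children column because every sample passes through it.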

--max-stack::
	Set the stack depth limit when parsing the callchain, anything
	beyond the specified depth will be ignored. This is a trade-off
	between information loss and faster processing especially for
	workloads that can have a very long callchain stack.
	Note that when using the --itrace option the synthesized callchain size
	will override this value if the synthesized callchain size is bigger.

	Default: 127

-G::
--inverted::
	Alias for inverted caller based call graph.

--ignore-callees=<regex>::
	Ignore callees of the function(s) matching the given regex.
	This has the effect of collecting the callers of each such
	function into one place in the call-graph tree.

--pretty=<key>::
	Pretty printing style.  key: normal, raw

--stdio:: Use the stdio interface.

--stdio-color::
	'always', 'never' or 'auto', allowing configuring color output
	via the command line, in addition to via "color.ui" .perfconfig.
	Use '--stdio-color always' to generate color even when redirecting
	to a pipe or file. Using just '--stdio-color' is equivalent to
	using 'always'.

--tui:: Use the TUI interface, which is integrated with annotate and allows
	zooming into DSOs or threads, among other features. Use of --tui
	requires a tty; if one is not present, as when piping to other
	commands, the stdio interface is used.

--gtk:: Use the GTK2 interface.

-k::
--vmlinux=<file>::
	vmlinux pathname

--ignore-vmlinux::
	Ignore vmlinux files.

--kallsyms=<file>::
	kallsyms pathname

-m::
--modules::
	Load module symbols. WARNING: This should only be used with -k and
	a LIVE kernel.

-f::
--force::
	Don't do ownership validation.

--symfs=<directory>::
	Look for files with symbols relative to this directory.

-C::
--cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
	be provided as a comma-separated list with no space: 0,1. Ranges of
	CPUs are specified with -: 0-2. Default is to report samples on all
	CPUs.
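The CPU list syntax is simple enough to expand by hand. The Python sketch below shows one way to read it; `parse_cpu_list` is a hypothetical helper written for illustration and does no validation beyond integer conversion.

```python
def parse_cpu_list(text):
    """Expand a list such as '0,1' or '0-2,5' into sorted CPU numbers."""
    cpus = set()
    for part in text.split(","):
        if "-" in part:
            # A range 'first-last' is inclusive on both ends.
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))
print(parse_cpu_list("0-2"))
```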

-M::
--disassembler-style=:: Set disassembler style for objdump.

--source::
	Interleave source code with assembly code. Enabled by default,
	disable with --no-source.

--asm-raw::
	Show raw instruction encoding of assembly instructions.

--show-total-period:: Show a column with the sum of periods.

-I::
--show-info::
	Display extended information about the perf.data file. This adds
	information which may be very large and thus may clutter the display.
	It currently includes: cpu and numa topology of the host system.

-b::
--branch-stack::
	Use the addresses of sampled taken branches instead of the instruction
	address to build the histograms. To generate meaningful output, the
	perf.data file must have been obtained using perf record -b or
	perf record --branch-filter xxx where xxx is a branch filter option.
	perf report is able to auto-detect whether a perf.data file contains
	branch stacks and it will automatically switch to the branch view mode,
	unless --no-branch-stack is used.

--branch-history::
	Add the addresses of sampled taken branches to the callstack.
	This allows examining the path the program took to each sample.
	The data collection must have used -b (or -j) and -g.

--objdump=<path>::
	Path to objdump binary.

--prefix=PREFIX::
--prefix-strip=N::
	Remove first N entries from source file path names in executables
	and add PREFIX. This allows displaying source code compiled on systems
	with a different file system layout.
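The effect of combining the two options can be pictured as a path rewrite. The Python sketch below is an assumption-laden model, not the actual implementation: `remap_source_path` is a hypothetical helper, the paths are invented, and the exact handling of separators and relative paths may differ.

```python
def remap_source_path(path, prefix, strip):
    """Drop the first `strip` path entries, then prepend `prefix`."""
    parts = [part for part in path.split("/") if part]
    return "/".join([prefix.rstrip("/")] + parts[strip:])


# Path recorded on the build machine, remapped to a local checkout.
print(remap_source_path("/build/worker/src/lib/foo.c", "/home/me/project", 3))
```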

--group::
	Show event group information together. It forces group output even
	if there are no groups defined in the data file.

--group-sort-idx::
	Sort the output by the event at the index n in group. If n is invalid,
	sort by the first event. It can support multiple groups with different
	numbers of events. WARNING: This should be used on grouped events.

--demangle::
	Demangle symbol names to human readable form. It's enabled by default,
	disable with --no-demangle.

--demangle-kernel::
	Demangle kernel symbol names to human readable form (for C++ kernels).

--mem-mode::
	Use the data addresses of samples in addition to instruction addresses
	to build the histograms.  To generate meaningful output, the perf.data
	file must have been obtained using perf record -d -W and using a
	special event -e cpu/mem-loads/p or -e cpu/mem-stores/p. See
	'perf mem' for simpler access.

--percent-limit::
	Do not show entries which have an overhead under that percent.
	(Default: 0).  Note that this option also sets the percent limit (threshold)
	of callchains.  However the default value of callchain threshold is
	different than the default value of hist entries.  Please see the
	--call-graph option for details.

--percentage::
	Determine how to display the overhead percentage of filtered entries.
	Filters can be applied by --comms, --dsos and/or --symbols options and
	Zoom operations on the TUI (thread, dso, etc).

	"relative" means it's relative to filtered entries only so that the
	sum of shown entries will be always 100%.  "absolute" means it retains
	the original value before and after the filter is applied.
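The two modes differ only in the denominator used for the percentage. A minimal Python sketch, with a hypothetical helper `percentages` and invented periods, makes the arithmetic concrete:

```python
def percentages(periods, keep, mode):
    """Overhead of the entries in `keep`, in 'relative' or 'absolute' mode.

    periods maps an entry name to its event period; keep is the set of
    entries that pass the filter.
    """
    shown = {name: value for name, value in periods.items() if name in keep}
    if mode == "relative":
        base = sum(shown.values())      # only the filtered entries
    else:
        base = sum(periods.values())    # everything that was recorded
    return {name: 100 * value / base for name, value in shown.items()}


periods = {"a": 50, "b": 30, "c": 20}
print(percentages(periods, {"a", "b"}, "relative"))
print(percentages(periods, {"a", "b"}, "absolute"))
```

With entry c filtered out, the relative values sum to 100% while the absolute values keep their pre-filter sizes and sum to 80%.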

--header::
	Show header information in the perf.data file.  This includes
	various information like hostname, OS and perf version, cpu/mem
	info, perf command line, event list and so on.  Currently only
	--stdio output supports this feature.

--header-only::
	Show only perf.data header (forces --stdio).

--time::
	Only analyze samples within given time window: <start>,<stop>. Times
	have the format seconds.nanoseconds. If start is not given (i.e. time
	string is ',x.y') then analysis starts at the beginning of the file. If
	stop time is not given (i.e. time string is 'x.y,') then analysis goes
	to end of file. Multiple ranges can be separated by spaces, which
	requires the argument to be quoted e.g. --time "1234.567,1234.789 1235,"

	Time percentages with multiple time ranges are also supported. The time
	string is 'a%/n,b%/m,...' or 'a%-b%,c%-d%,...'.

	For example:
	Select the second 10% time slice:

	  perf report --time 10%/2

	Select from 0% to 10% time slice:

	  perf report --time 0%-10%

	Select the first and second 10% time slices:

	  perf report --time 10%/1,10%/2

	Select from 0% to 10% and 30% to 40% slices:

	  perf report --time 0%-10%,30%-40%
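To see which absolute windows the percent forms select, the following Python sketch converts them for a recording that, by assumption, spans timestamps 1000.0 to 2000.0 seconds. The helpers `percent_window` and `percent_windows` are hypothetical and only model the two documented forms.

```python
def percent_window(spec, start, end):
    """Translate one percent spec into an absolute (begin, finish) pair."""
    total = end - start
    if "/" in spec:
        # 'a%/n': the n-th slice of width a%, counting from 1.
        size, index = spec.split("/")
        width = float(size.rstrip("%")) * total / 100
        begin = start + (int(index) - 1) * width
        return begin, begin + width
    # 'a%-b%': from a% to b% of the recorded time span.
    low, high = spec.split("-")
    return (start + float(low.rstrip("%")) * total / 100,
            start + float(high.rstrip("%")) * total / 100)


def percent_windows(arg, start, end):
    """Handle a comma-separated list of percent specs."""
    return [percent_window(spec, start, end) for spec in arg.split(",")]


print(percent_windows("10%/2", 1000.0, 2000.0))
print(percent_windows("0%-10%,30%-40%", 1000.0, 2000.0))
```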

--switch-on EVENT_NAME::
	Only consider events after this event is found.

	This may be interesting to measure a workload only after some initialization
	phase is over, e.g. by inserting a perf probe at that point and then using
	this option with that probe.

--switch-off EVENT_NAME::
	Stop considering events after this event is found.

--show-on-off-events::
	Show the --switch-on/off events too. This currently has no effect in
	'perf report'. In the future the default may become not to show the
	switch-on/off events in --group mode and, if there is only one event
	besides the off/on ones, to go straight to the histogram browser, just
	like 'perf report' with no events explicitly specified does.

--itrace::
	Options for decoding instruction tracing data. The options are:

include::itrace.txt[]

	To disable decoding entirely, use --no-itrace.

--full-source-path::
	Show the full path for source files for srcline output.

--show-ref-call-graph::
	When multiple events are sampled, it may not be necessary to collect
	callgraphs for all of them. The sample sites are usually nearby,
	and it's enough to collect the callgraphs on a reference event.
	So the user can use the "call-graph=no" event modifier to disable
	callgraphs for other events to reduce the overhead.
	However, perf report cannot show callgraphs for events whose callgraph
	was disabled.
	This option extends perf report to show the reference callgraphs,
	which were collected by the reference event, for the events without
	callgraphs.

--stitch-lbr::
	Show callgraph with stitched LBRs, which may have more complete
	callgraph. The perf.data file must have been obtained using
	perf record --call-graph lbr.
	Disabled by default. In common cases with call stack overflows,
	it can recreate better call stacks than the default lbr call stack
	output. But this approach is not foolproof. There can be cases
	where it creates incorrect call stacks from incorrect matches.
	The known limitations include exception handling such as
	setjmp/longjmp, where calls and returns do not match.

--socket-filter::
	Only report the samples on the processor socket that match with this filter.

--samples=N::
	Save N individual samples for each histogram entry to show context in perf
	report tui browser.

--raw-trace::
	When displaying traceevent output, do not use print fmt or plugins.

--hierarchy::
	Enable hierarchical output.

--inline::
	If a callgraph address belongs to an inlined function, the inline stack
	will be printed. Each entry is function name or file/line. Enabled by
	default, disable with --no-inline.

--mmaps::
	Show --tasks output plus mmap information in a format similar to
	/proc/<PID>/maps.

	Please note that not all mmaps are stored; options affecting which ones
	are stored include 'perf record --data', for instance.

--ns::
	Show time stamps in nanoseconds.

--stats::
	Display overall events statistics without any further processing.
	(like the one at the end of the perf report -D command)

--tasks::
	Display monitored tasks stored in perf data. Displaying pid/tid/ppid
	plus the command string aligned to distinguish parent and child tasks.

--percent-type::
	Set annotation percent type from the following choices:
	  global-period, local-period, global-hits, local-hits

	The local/global keywords set if the percentage is computed
	in the scope of the function (local) or the whole data (global).
	The period/hits keywords set the base the percentage is computed
	on - the samples period or the number of samples (hits).
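The four choices are the combinations of two independent decisions: which scope supplies the denominator, and which quantity is counted. A small Python sketch with invented totals shows the arithmetic; `annotation_percent` is a hypothetical helper, not part of perf.

```python
def annotation_percent(line, function, whole, percent_type):
    """Percentage for one annotated line.

    Each of line, function and whole is a dict with 'period' and 'hits'
    totals; percent_type is e.g. 'local-period' or 'global-hits'.
    """
    scope, base = percent_type.split("-")
    denominator = function if scope == "local" else whole
    return 100 * line[base] / denominator[base]


line = {"period": 200, "hits": 2}
function = {"period": 1000, "hits": 8}
whole = {"period": 4000, "hits": 40}

for kind in ("global-period", "local-period", "global-hits", "local-hits"):
    print(kind, annotation_percent(line, function, whole, kind))
```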

--time-quantum::
	Configure time quantum for time sort key. Default 100ms.
	Accepts s, us, ms, ns units.
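One way to think about the quantum is as a bucket width for sample timestamps. The Python sketch below parses a quantum with one of the listed units and rounds a timestamp down to the start of its bucket. Both helpers are hypothetical, and how perf itself labels each bucket is an assumption here.

```python
UNITS_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}


def quantum_ns(text):
    """Convert a string such as '100ms' into nanoseconds."""
    # Check the two-letter suffixes first so '100ms' is not read as 's'.
    for unit in ("ms", "us", "ns", "s"):
        if text.endswith(unit):
            return int(text[: -len(unit)]) * UNITS_NS[unit]
    raise ValueError(f"unknown unit in {text!r}")


def bucket(timestamp_ns, quantum):
    """Round a timestamp down to the start of its quantum."""
    return timestamp_ns - timestamp_ns % quantum


print(quantum_ns("100ms"))
print(bucket(1_234_567_890, quantum_ns("100ms")))
```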

--total-cycles::
	When --total-cycles is specified, it supports sorting for all blocks by
	'Sampled Cycles%'. This is useful to concentrate on the globally hottest
	blocks. In output, there are some new columns:

	'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles
	'Sampled Cycles'  - block sampled cycles aggregation
	'Avg Cycles%'     - block average sampled cycles / sum of total block average
			    sampled cycles
	'Avg Cycles'      - block average sampled cycles
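The column definitions above can be worked through with made-up numbers. In the Python sketch below, `block_columns` is a hypothetical helper, each block is described by its aggregated sampled cycles and the number of times it was sampled, and "total sampled cycles" is assumed to be the sum over the listed blocks.

```python
def block_columns(blocks):
    """blocks maps a block name to (sampled_cycles, times_sampled)."""
    total_cycles = sum(cycles for cycles, _ in blocks.values())
    averages = {name: cycles / count for name, (cycles, count) in blocks.items()}
    total_average = sum(averages.values())
    return {
        name: {
            "Sampled Cycles%": 100 * cycles / total_cycles,
            "Sampled Cycles": cycles,
            "Avg Cycles%": 100 * averages[name] / total_average,
            "Avg Cycles": averages[name],
        }
        for name, (cycles, _) in blocks.items()
    }


columns = block_columns({"A": (600, 6), "B": (400, 1)})
# Sort by 'Sampled Cycles%', largest first.
for name, row in sorted(columns.items(), key=lambda kv: -kv[1]["Sampled Cycles%"]):
    print(name, row)
```

Block A dominates the sampled cycles because it is hit often, while block B has the larger average cost per execution; the two percentage columns separate these cases.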

include::callchain-overhead-calculation.txt[]

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1],
linkperf:perf-intel-pt[1]