perf-report(1)
==============

NAME
----
perf-report - Read perf.data (created by perf record) and display the profile

SYNOPSIS
--------
[verse]
'perf report' [-i <file> | --input=file]

DESCRIPTION
-----------
This command displays the performance counter profile information recorded
via perf record.

OPTIONS
-------
-i::
--input=::
	Input file name. (default: perf.data unless stdin is a fifo)

-v::
--verbose::
	Be more verbose. (show symbol address, etc)

-q::
--quiet::
	Do not show any message. (Suppress -v)

-n::
--show-nr-samples::
	Show the number of samples for each symbol

--show-cpu-utilization::
	Show sample percentage for different cpu modes.

-T::
--threads::
	Show per-thread event counters. The input data file should be recorded
	with -s option.

-c::
--comms=::
	Only consider symbols in these comms. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.

--pid=::
	Only show events for given process ID (comma separated list).

--tid=::
	Only show events for given thread ID (comma separated list).

-d::
--dsos=::
	Only consider symbols in these dsos. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.

-S::
--symbols=::
	Only consider these symbols. CSV that understands
	file://filename entries. This option will affect the percentage of
	the overhead column. See --percentage for more info.

--symbol-filter=::
	Only show symbols that match (partially) with this filter.

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-s::
--sort=::
	Sort histogram entries by given key(s) - multiple keys can be specified
	in CSV format. Following sort keys are available:
	pid, comm, dso, symbol, parent, cpu, socket, srcline, weight,
	local_weight, cgroup_id.

	Each key has the following meaning:

	- comm: command (name) of the task which can be read via /proc/<pid>/comm
	- pid: command and tid of the task
	- dso: name of library or module executed at the time of sample
	- dso_size: size of library or module executed at the time of sample
	- symbol: name of function executed at the time of sample
	- symbol_size: size of function executed at the time of sample
	- parent: name of function matched to the parent regex filter. Unmatched
	  entries are displayed as "[other]".
	- cpu: cpu number the task ran at the time of sample
	- socket: processor socket number the task ran at the time of sample
	- srcline: filename and line number executed at the time of sample. The
	  DWARF debugging info must be provided.
	- srcfile: file name of the source file of the samples. Requires DWARF
	  information.
	- weight: Event specific weight, e.g. memory latency or transaction
	  abort cost. This is the global weight.
	- local_weight: Local weight version of the weight above.
	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
	- cgroup: cgroup pathname in the cgroupfs.
	- transaction: Transaction abort flags.
	- overhead: Overhead percentage of sample
	- overhead_sys: Overhead percentage of sample running in system mode
	- overhead_us: Overhead percentage of sample running in user mode
	- overhead_guest_sys: Overhead percentage of sample running in system mode
	  on guest machine
	- overhead_guest_us: Overhead percentage of sample running in user mode on
	  guest machine
	- sample: Number of samples
	- period: Raw number of event count of sample
	- time: Separate the samples by time stamp with the resolution specified by
	  --time-quantum (default 100ms). Specify with overhead and before it.

	By default, comm, dso and symbol keys are used.
	(i.e. --sort comm,dso,symbol)

	If the --branch-stack option is used, the following sort keys are also
	available:

	- dso_from: name of library or module branched from
	- dso_to: name of library or module branched to
	- symbol_from: name of function branched from
	- symbol_to: name of function branched to
	- srcline_from: source file and line branched from
	- srcline_to: source file and line branched to
	- mispredict: "N" for predicted branch, "Y" for mispredicted branch
	- in_tx: branch in TSX transaction
	- abort: TSX transaction abort.
	- cycles: Cycles in basic block

	And default sort keys are changed to comm, dso_from, symbol_from, dso_to
	and symbol_to, see '--branch-stack'.

	When the sort key symbol is specified, columns "IPC" and "IPC Coverage"
	are enabled automatically. Column "IPC" reports the average IPC per function
	and column "IPC Coverage" reports the percentage of instructions with
	sampled IPC in this function. IPC means Instructions Per Cycle. If it's low,
	it indicates there may be a performance bottleneck when the function is
	executed, such as a memory access bottleneck. If a function has high overhead
	and low IPC, it's worth further analyzing it to optimize its performance.

	If the --mem-mode option is used, the following sort keys are also available
	(incompatible with --branch-stack):
	symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.

	- symbol_daddr: name of data symbol being executed on at the time of sample
	- dso_daddr: name of library or module containing the data being executed
	  on at the time of the sample
	- locked: whether the bus was locked at the time of the sample
	- tlb: type of tlb access for the data at the time of the sample
	- mem: type of memory access for the data at the time of the sample
	- snoop: type of snoop (if any) for the data at the time of the sample
	- dcacheline: the cacheline the data address is on at the time of the sample
	- phys_daddr: physical address of data being executed on at the time of sample

	And the default sort keys are changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, see '--mem-mode'.

	If the data file has tracepoint event(s), the following (dynamic) sort keys
	are also available:
	trace, trace_fields, [<event>.]<field>[/raw]

	- trace: pretty printed trace output in a single column
	- trace_fields: fields in tracepoints in separate columns
	- <field name>: optional event and field name for a specific field

	The last form consists of event and field names. If the event name is
	omitted, it searches all events for a matching field name. The matched
	field will be shown only for events that have the field. The event name
	supports substring match so the user doesn't need to specify the full
	subsystem and event name every time. For example, 'sched:sched_switch' event can
	be shortened to 'switch' as long as it's not ambiguous. Also an event can
	be specified by its index (starting from 1) preceded by the '%'.
	So '%1' is the first event, '%2' is the second, and so on.

	The field name can have a '/raw' suffix which disables pretty printing
	and shows the raw field value like hex numbers. The --raw-trace option
	has the same effect for all dynamic sort keys.

	The default sort keys are changed to 'trace' if all events in the data
	file are tracepoints.

-F::
--fields=::
	Specify output field - multiple keys can be specified in CSV format.
	Following fields are available:
	overhead, overhead_sys, overhead_us, overhead_children, sample and period.
	Also it can contain any sort key(s).

	By default, all sort keys not specified in -F will be appended
	automatically.

	If the keys start with a prefix '+', then it will append the specified
	field(s) to the default field order. For example: perf report -F +period,sample.

-p::
--parent=<regex>::
	A regex filter to identify parent. The parent is a caller of this
	function and searched through the callchain, thus it requires callchain
	information recorded. The pattern is in the extended regex format and
	defaults to "\^sys_|^do_page_fault", see '--sort parent'.

-x::
--exclude-other::
	Only display entries with parent-match.

-w::
--column-widths=<width[,width...]>::
	Force each column width to the provided list, for large terminal
	readability. 0 means no limit (default behavior).

-t::
--field-separator=::
	Use a special separator character and don't pad with spaces, replacing
	all occurrences of this separator in symbol names (and other output)
	with a '.' character, which is therefore the only invalid separator.

-D::
--dump-raw-trace::
	Dump raw trace in ASCII.
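
As an illustration of the --parent description above, the following is a minimal sketch of how the default pattern could classify callchains for '--sort parent'. It is not perf's implementation: Python's `re` module stands in for the extended regex matcher, a callchain is assumed to be a plain list of function names ordered from the sampled function up to its callers, and the function names and the helper `parent_of` are hypothetical.

```python
import re

# Default pattern quoted in the --parent description.
DEFAULT_PARENT = r"^sys_|^do_page_fault"


def parent_of(callchain, pattern=DEFAULT_PARENT):
    """Return the first function in the callchain that matches the pattern.

    Callchains without any match are reported as "[other]", which is the
    label the 'parent' sort key uses for unmatched entries.
    """
    regex = re.compile(pattern)
    for func in callchain:
        if regex.search(func):
            return func
    return "[other]"


# Hypothetical callchains, listed from the sampled function up to its callers.
chains = [
    ["copy_user", "vfs_read", "sys_read"],
    ["memcpy", "main"],
]
parents = [parent_of(chain) for chain in chains]
print(parents)
```

In this model, --exclude-other would correspond to dropping every entry whose parent is "[other]".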

-g::
--call-graph=<print_type,threshold[,print_limit],order,sort_key[,branch],value>::
	Display call chains using type, min percent threshold, print limit,
	call order, sort key, optional branch and value. Note that ordering
	is not fixed so any parameter can be given in an arbitrary order.
	One exception is the print_limit which should be preceded by threshold.

	print_type can be either:
	- flat: single column, linear exposure of call chains.
	- graph: use a graph tree, displaying absolute overhead rates. (default)
	- fractal: like graph, but displays relative rates. Each branch of
		 the tree is considered as a new profiled object.
	- folded: call chains are displayed in a line, separated by semicolons
	- none: disable call chain display.

	threshold is a percentage value which specifies a minimum percent to be
	included in the output call graph. Default is 0.5 (%).

	print_limit is only applied when stdio interface is used. It's to limit
	number of call graph entries in a single hist entry. Note that it needs
	to be given after threshold (but not necessarily consecutive).
	Default is 0 (unlimited).

	order can be either:
	- callee: callee based call graph.
	- caller: inverted caller based call graph.
	Default is 'caller' when --children is used, otherwise 'callee'.

	sort_key can be:
	- function: compare on functions (default)
	- address: compare on individual code addresses
	- srcline: compare on source filename and line number

	branch can be:
	- branch: include last branch information in callgraph when available.
	          Usually more convenient to use --branch-history for this.

	value can be:
	- percent: display overhead percent (default)
	- period: display event period
	- count: display event count

--children::
	Accumulate callchain of children to parent entry so that they can
	show up in the output. The output will have a new "Children" column
	and will be sorted on the data. It requires that callchains are recorded.
	See the `overhead calculation' section for more details. Enabled by
	default, disable with --no-children.

--max-stack::
	Set the stack depth limit when parsing the callchain, anything
	beyond the specified depth will be ignored. This is a trade-off
	between information loss and faster processing especially for
	workloads that can have a very long callchain stack.
	Note that when using the --itrace option the synthesized callchain size
	will override this value if the synthesized callchain size is bigger.

	Default: 127

-G::
--inverted::
	alias for inverted caller based call graph.
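
The accumulation performed by --children can be pictured with a small model. The sketch below is an assumption-laden simplification, not perf's code: it counts samples rather than event periods, treats each callchain as a list of function names ordered from the outermost caller to the sampled function, and uses hypothetical names (`overhead_columns`, `main`, `foo`, `bar`).

```python
from collections import Counter


def overhead_columns(samples):
    """Return {function: (children_percent, self_percent)}.

    'self' counts samples taken in the function itself; 'children' counts
    every sample whose callchain contains the function anywhere.
    """
    total = len(samples)
    self_hits = Counter()
    children_hits = Counter()
    for chain in samples:
        self_hits[chain[-1]] += 1      # last entry is the sampled function
        for func in set(chain):        # count each function once per sample
            children_hits[func] += 1
    return {
        func: (100.0 * children_hits[func] / total,
               100.0 * self_hits[func] / total)
        for func in children_hits
    }


# Hypothetical profile: 6 samples in bar (called via foo), 4 samples in foo.
samples = [["main", "foo", "bar"]] * 6 + [["main", "foo"]] * 4
table = overhead_columns(samples)

# Sort on the accumulated column, as the "Children" view does.
for func, (children, own) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(f"{children:6.2f}% {own:6.2f}%  {func}")
```

Here `main` has no samples of its own but a children value of 100%, which is why it appears at the top of the output when --children is enabled and near the bottom with --no-children.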

--ignore-callees=<regex>::
	Ignore callees of the function(s) matching the given regex.
	This has the effect of collecting the callers of each such
	function into one place in the call-graph tree.

--pretty=<key>::
	Pretty printing style. key: normal, raw

--stdio:: Use the stdio interface.

--stdio-color::
	'always', 'never' or 'auto', allowing configuring color output
	via the command line, in addition to via "color.ui" .perfconfig.
	Use '--stdio-color always' to generate color even when redirecting
	to a pipe or file. Using just '--stdio-color' is equivalent to
	using 'always'.

--tui:: Use the TUI interface, that is integrated with annotate and allows
	zooming into DSOs or threads, among other features. Use of --tui
	requires a tty, if one is not present, as when piping to other
	commands, the stdio interface is used.

--gtk:: Use the GTK2 interface.

-k::
--vmlinux=<file>::
	vmlinux pathname

--ignore-vmlinux::
	Ignore vmlinux files.

--kallsyms=<file>::
	kallsyms pathname

-m::
--modules::
	Load module symbols. WARNING: This should only be used with -k and
	a LIVE kernel.

-f::
--force::
	Don't do ownership validation.

--symfs=<directory>::
	Look for files with symbols relative to this directory.

-C::
--cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
	be provided as a comma-separated list with no space: 0,1. Ranges of
	CPUs are specified with -: 0-2. Default is to report samples on all
	CPUs.

-M::
--disassembler-style=:: Set disassembler style for objdump.

--source::
	Interleave source code with assembly code. Enabled by default,
	disable with --no-source.

--asm-raw::
	Show raw instruction encoding of assembly instructions.

--show-total-period:: Show a column with the sum of periods.

-I::
--show-info::
	Display extended information about the perf.data file. This adds
	information which may be very large and thus may clutter the display.
	It currently includes: cpu and numa topology of the host system.

-b::
--branch-stack::
	Use the addresses of sampled taken branches instead of the instruction
	address to build the histograms. To generate meaningful output, the
	perf.data file must have been obtained using perf record -b or
	perf record --branch-filter xxx where xxx is a branch filter option.
	perf report is able to auto-detect whether a perf.data file contains
	branch stacks and it will automatically switch to the branch view mode,
	unless --no-branch-stack is used.

--branch-history::
	Add the addresses of sampled taken branches to the callstack.
	This allows examining the path the program took to each sample.
	The data collection must have used -b (or -j) and -g.

--objdump=<path>::
	Path to objdump binary.

--prefix=PREFIX::
--prefix-strip=N::
	Remove first N entries from source file path names in executables
	and add PREFIX. This allows displaying source code compiled on systems
	with a different file system layout.

--group::
	Show event group information together. It forces group output also
	if there are no groups defined in data file.

--group-sort-idx::
	Sort the output by the event at the index n in group. If n is invalid,
	sort by the first event. It can support multiple groups with different
	numbers of events. WARNING: This should be used on grouped events.

--demangle::
	Demangle symbol names to human readable form. It's enabled by default,
	disable with --no-demangle.

--demangle-kernel::
	Demangle kernel symbol names to human readable form (for C++ kernels).
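
The path rewriting described for --prefix and --prefix-strip above can be sketched as follows. This only models the documented behaviour (drop the first N path entries, then prepend PREFIX); the helper name `remap_source_path` and the example paths are hypothetical, and the real rewriting is assumed to happen in the disassembler rather than in code like this.

```python
def remap_source_path(path, prefix, strip=0):
    """Drop the first `strip` directory entries of `path`, then add `prefix`."""
    # Split an absolute path into its entries, ignoring empty components.
    components = [part for part in path.split("/") if part]
    remaining = components[strip:]
    return prefix.rstrip("/") + "/" + "/".join(remaining)


# Hypothetical layout: the binary was built under /build/host, but the
# sources on the analysis machine live under /home/dev.
recorded = "/build/host/src/app/main.c"
local = remap_source_path(recorded, prefix="/home/dev", strip=2)
print(local)
```

With these assumed paths, the equivalent invocation would pass --prefix=/home/dev together with --prefix-strip=2.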

--mem-mode::
	Use the data addresses of samples in addition to instruction addresses
	to build the histograms. To generate meaningful output, the perf.data
	file must have been obtained using perf record -d -W and using a
	special event -e cpu/mem-loads/p or -e cpu/mem-stores/p. See
	'perf mem' for simpler access.

--percent-limit::
	Do not show entries which have an overhead under that percent.
	(Default: 0). Note that this option also sets the percent limit (threshold)
	of callchains. However the default value of callchain threshold is
	different than the default value of hist entries. Please see the
	--call-graph option for details.

--percentage::
	Determine how to display the overhead percentage of filtered entries.
	Filters can be applied by --comms, --dsos and/or --symbols options and
	Zoom operations on the TUI (thread, dso, etc).

	"relative" means it's relative to filtered entries only so that the
	sum of shown entries will be always 100%. "absolute" means it retains
	the original value before and after the filter is applied.

--header::
	Show header information in the perf.data file. This includes
	various information like hostname, OS and perf version, cpu/mem
	info, perf command line, event list and so on. Currently only
	--stdio output supports this feature.
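
The difference between the "relative" and "absolute" modes of --percentage described above comes down to which total the percentage is divided by. The following sketch works that arithmetic through on made-up numbers; the helper `overhead_percent`, the DSO names and the period values are all hypothetical and only illustrate the definition.

```python
def overhead_percent(periods, keep, mode="relative"):
    """Compute the overhead column for the entries that survive a filter.

    periods: {entry name: event period} for the whole profile
    keep:    names that pass the filter (e.g. a --dsos list)
    mode:    "relative" divides by the filtered total,
             "absolute" divides by the unfiltered total
    """
    shown = {name: value for name, value in periods.items() if name in keep}
    if mode == "relative":
        base = sum(shown.values())
    else:
        base = sum(periods.values())
    return {name: 100.0 * value / base for name, value in shown.items()}


# Hypothetical per-DSO periods before filtering.
periods = {"libc.so": 50, "app": 30, "[kernel]": 20}
keep = {"app", "[kernel]"}

relative = overhead_percent(periods, keep, "relative")
absolute = overhead_percent(periods, keep, "absolute")
print(relative)
print(absolute)
```

In relative mode the shown entries add up to 100%; in absolute mode they keep the values they had before filtering, so they add up to less.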

--header-only::
	Show only perf.data header (forces --stdio).

--time::
	Only analyze samples within given time window: <start>,<stop>. Times
	have the format seconds.nanoseconds. If start is not given (i.e. time
	string is ',x.y') then analysis starts at the beginning of the file. If
	stop time is not given (i.e. time string is 'x.y,') then analysis goes
	to end of file. Multiple ranges can be separated by spaces, which
	requires the argument to be quoted e.g. --time "1234.567,1234.789 1235,"

	Time percent with multiple time ranges is also supported. The time string is
	'a%/n,b%/m,...' or 'a%-b%,c%-d%,...'.

	For example:
	Select the second 10% time slice:

	  perf report --time 10%/2

	Select from 0% to 10% time slice:

	  perf report --time 0%-10%

	Select the first and second 10% time slices:

	  perf report --time 10%/1,10%/2

	Select from 0% to 10% and 30% to 40% slices:

	  perf report --time 0%-10%,30%-40%

--switch-on EVENT_NAME::
	Only consider events after this event is found.

	This may be interesting to measure a workload only after some initialization
	phase is over, e.g. insert a perf probe at that point and then use this
	option with that probe.
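
To make the percent forms of --time concrete, here is a small sketch that turns a percent string into percentage ranges and then into timestamps. It only mirrors the examples given for --time ('a%/n' taken as the n-th slice of size a%, 'a%-b%' taken as an explicit range) and does not reproduce perf's parser; the function names and the trace boundaries are assumptions.

```python
def parse_time_percent(spec):
    """Parse 'a%/n,...' or 'a%-b%,...' into a list of (start%, stop%) pairs."""
    ranges = []
    for item in spec.split(","):
        if "/" in item:
            # 'a%/n': the n-th slice of size a%, counting from 1.
            size_text, index_text = item.split("/")
            size = float(size_text.rstrip("%"))
            index = int(index_text)
            ranges.append(((index - 1) * size, index * size))
        else:
            # 'a%-b%': an explicit range.
            start_text, stop_text = item.split("-")
            ranges.append((float(start_text.rstrip("%")),
                           float(stop_text.rstrip("%"))))
    return ranges


def to_timestamps(ranges, first, last):
    """Map percentage ranges onto a trace spanning [first, last] seconds."""
    span = last - first
    return [(first + span * start / 100.0, first + span * stop / 100.0)
            for start, stop in ranges]


# Hypothetical trace that starts at 1000.0 s and ends at 1100.0 s.
second_slice = parse_time_percent("10%/2")
windows = to_timestamps(second_slice, 1000.0, 1100.0)
print(second_slice, windows)
```

Under these assumptions, `--time 10%/2` on that trace covers the same window as the absolute form `--time 1010.0,1020.0`.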

--switch-off EVENT_NAME::
	Stop considering events after this event is found.

--show-on-off-events::
	Show the --switch-on/off events too. This has no effect in 'perf report' now
	but probably we'll make the default not to show the switch-on/off events
	on the --group mode and if there is only one event besides the off/on ones,
	go straight to the histogram browser, just like 'perf report' with no events
	explicitly specified does.

--itrace::
	Options for decoding instruction tracing data. The options are:

include::itrace.txt[]

	To disable decoding entirely, use --no-itrace.

--full-source-path::
	Show the full path for source files for srcline output.

--show-ref-call-graph::
	When multiple events are sampled, it may not be necessary to collect
	callgraphs for all of them. The sample sites are usually nearby,
	and it's enough to collect the callgraphs on a reference event.
	So the user can use the "call-graph=no" event modifier to disable callgraph
	for other events to reduce the overhead.
	However, perf report cannot show callgraphs for events whose
	callgraph is disabled.
	This option extends perf report to show the reference callgraphs,
	which are collected by the reference event, in events without a callgraph.

--stitch-lbr::
	Show callgraph with stitched LBRs, which may have more complete
	callgraph. The perf.data file must have been obtained using
	perf record --call-graph lbr.
	Disabled by default. In common cases with call stack overflows,
	it can recreate better call stacks than the default lbr call stack
	output. But this approach is not foolproof. There can be cases
	where it creates incorrect call stacks from incorrect matches.
	The known limitations include exception handling, such as
	setjmp/longjmp, where calls and returns will not match.

--socket-filter::
	Only report the samples on the processor socket that match with this filter.

--samples=N::
	Save N individual samples for each histogram entry to show context in perf
	report tui browser.

--raw-trace::
	When displaying traceevent output, do not use print fmt or plugins.

--hierarchy::
	Enable hierarchical output.

--inline::
	If a callgraph address belongs to an inlined function, the inline stack
	will be printed. Each entry is function name or file/line. Enabled by
	default, disable with --no-inline.

--mmaps::
	Show --tasks output plus mmap information in a format similar to
	/proc/<PID>/maps.

	Please note that not all mmaps are stored; options that affect which ones
	are stored include 'perf record --data', for instance.

--ns::
	Show time stamps in nanoseconds.

--stats::
	Display overall events statistics without any further processing.
	(like the one at the end of the perf report -D command)

--tasks::
	Display monitored tasks stored in perf data. Displaying pid/tid/ppid
	plus the command string aligned to distinguish parent and child tasks.

--percent-type::
	Set annotation percent type from following choices:
	  global-period, local-period, global-hits, local-hits

	The local/global keywords set if the percentage is computed
	in the scope of the function (local) or the whole data (global).
	The period/hits keywords set the base the percentage is computed
	on - the samples period or the number of samples (hits).

--time-quantum::
	Configure time quantum for time sort key. Default 100ms.
	Accepts s, us, ms, ns units.

--total-cycles::
	When --total-cycles is specified, it supports sorting for all blocks by
	'Sampled Cycles%'. This is useful to concentrate on the globally hottest
	blocks.
	In output, there are some new columns:

	'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles
	'Sampled Cycles'  - block sampled cycles aggregation
	'Avg Cycles%'     - block average sampled cycles / sum of total block average
			    sampled cycles
	'Avg Cycles'      - block average sampled cycles

include::callchain-overhead-calculation.txt[]

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-annotate[1], linkperf:perf-record[1],
linkperf:perf-intel-pt[1]