1*4882a593Smuzhiyun.. SPDX-License-Identifier: CC-BY-SA-2.0-UK 2*4882a593Smuzhiyun.. highlight:: shell 3*4882a593Smuzhiyun 4*4882a593Smuzhiyun*************************************************************** 5*4882a593SmuzhiyunBasic Usage (with examples) for each of the Yocto Tracing Tools 6*4882a593Smuzhiyun*************************************************************** 7*4882a593Smuzhiyun 8*4882a593Smuzhiyun| 9*4882a593Smuzhiyun 10*4882a593SmuzhiyunThis chapter presents basic usage examples for each of the tracing 11*4882a593Smuzhiyuntools. 12*4882a593Smuzhiyun 13*4882a593Smuzhiyunperf 14*4882a593Smuzhiyun==== 15*4882a593Smuzhiyun 16*4882a593SmuzhiyunThe 'perf' tool is the profiling and tracing tool that comes bundled 17*4882a593Smuzhiyunwith the Linux kernel. 18*4882a593Smuzhiyun 19*4882a593SmuzhiyunDon't let the fact that it's part of the kernel fool you into thinking 20*4882a593Smuzhiyunthat it's only for tracing and profiling the kernel - you can indeed use 21*4882a593Smuzhiyunit to trace and profile just the kernel, but you can also use it to 22*4882a593Smuzhiyunprofile specific applications separately (with or without kernel 23*4882a593Smuzhiyuncontext), and you can also use it to trace and profile the kernel and 24*4882a593Smuzhiyunall applications on the system simultaneously to gain a system-wide view 25*4882a593Smuzhiyunof what's going on. 26*4882a593Smuzhiyun 27*4882a593SmuzhiyunIn many ways, perf aims to be a superset of all the tracing and 28*4882a593Smuzhiyunprofiling tools available in Linux today, including all the other tools 29*4882a593Smuzhiyuncovered in this HOWTO. The past couple of years have seen perf subsume a 30*4882a593Smuzhiyunlot of the functionality of those other tools and, at the same time, 31*4882a593Smuzhiyunthose other tools have removed large portions of their previous 32*4882a593Smuzhiyunfunctionality and replaced it with calls to the equivalent functionality 33*4882a593Smuzhiyunnow implemented by the perf subsystem. Extrapolation suggests that at 34*4882a593Smuzhiyunsome point those other tools will simply become completely redundant and 35*4882a593Smuzhiyungo away; until then, we'll cover those other tools in these pages and in 36*4882a593Smuzhiyunmany cases show how the same things can be accomplished in perf and the 37*4882a593Smuzhiyunother tools when it seems useful to do so. 38*4882a593Smuzhiyun 39*4882a593SmuzhiyunThe coverage below details some of the most common ways you'll likely 40*4882a593Smuzhiyunwant to apply the tool; full documentation can be found either within 41*4882a593Smuzhiyunthe tool itself or in the man pages at 42*4882a593Smuzhiyun`perf(1) <https://linux.die.net/man/1/perf>`__. 43*4882a593Smuzhiyun 44*4882a593SmuzhiyunPerf Setup 45*4882a593Smuzhiyun---------- 46*4882a593Smuzhiyun 47*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup 48*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" section. 49*4882a593Smuzhiyun 50*4882a593SmuzhiyunIn particular, you'll get the most mileage out of perf if you profile an 51*4882a593Smuzhiyunimage built with the following in your ``local.conf`` file:: 52*4882a593Smuzhiyun 53*4882a593Smuzhiyun INHIBIT_PACKAGE_STRIP = "1" 54*4882a593Smuzhiyun 55*4882a593Smuzhiyunperf runs on the target system for the most part. You can archive 56*4882a593Smuzhiyunprofile data and copy it to the host for analysis, but for the rest of 57*4882a593Smuzhiyunthis document we assume you've ssh'ed to the host and will be running 58*4882a593Smuzhiyunthe perf commands on the target. 59*4882a593Smuzhiyun 60*4882a593SmuzhiyunBasic Perf Usage 61*4882a593Smuzhiyun---------------- 62*4882a593Smuzhiyun 63*4882a593SmuzhiyunThe perf tool is pretty much self-documenting. To remind yourself of the 64*4882a593Smuzhiyunavailable commands, simply type 'perf', which will show you basic usage 65*4882a593Smuzhiyunalong with the available perf subcommands:: 66*4882a593Smuzhiyun 67*4882a593Smuzhiyun root@crownbay:~# perf 68*4882a593Smuzhiyun 69*4882a593Smuzhiyun usage: perf [--version] [--help] COMMAND [ARGS] 70*4882a593Smuzhiyun 71*4882a593Smuzhiyun The most commonly used perf commands are: 72*4882a593Smuzhiyun annotate Read perf.data (created by perf record) and display annotated code 73*4882a593Smuzhiyun archive Create archive with object files with build-ids found in perf.data file 74*4882a593Smuzhiyun bench General framework for benchmark suites 75*4882a593Smuzhiyun buildid-cache Manage build-id cache. 76*4882a593Smuzhiyun buildid-list List the buildids in a perf.data file 77*4882a593Smuzhiyun diff Read two perf.data files and display the differential profile 78*4882a593Smuzhiyun evlist List the event names in a perf.data file 79*4882a593Smuzhiyun inject Filter to augment the events stream with additional information 80*4882a593Smuzhiyun kmem Tool to trace/measure kernel memory(slab) properties 81*4882a593Smuzhiyun kvm Tool to trace/measure kvm guest os 82*4882a593Smuzhiyun list List all symbolic event types 83*4882a593Smuzhiyun lock Analyze lock events 84*4882a593Smuzhiyun probe Define new dynamic tracepoints 85*4882a593Smuzhiyun record Run a command and record its profile into perf.data 86*4882a593Smuzhiyun report Read perf.data (created by perf record) and display the profile 87*4882a593Smuzhiyun sched Tool to trace/measure scheduler properties (latencies) 88*4882a593Smuzhiyun script Read perf.data (created by perf record) and display trace output 89*4882a593Smuzhiyun stat Run a command and gather performance counter statistics 90*4882a593Smuzhiyun test Runs sanity tests. 91*4882a593Smuzhiyun timechart Tool to visualize total system behavior during a workload 92*4882a593Smuzhiyun top System profiling tool. 93*4882a593Smuzhiyun 94*4882a593Smuzhiyun See 'perf help COMMAND' for more information on a specific command. 95*4882a593Smuzhiyun 96*4882a593Smuzhiyun 97*4882a593SmuzhiyunUsing perf to do Basic Profiling 98*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 99*4882a593Smuzhiyun 100*4882a593SmuzhiyunAs a simple test case, we'll profile the 'wget' of a fairly large file, 101*4882a593Smuzhiyunwhich is a minimally interesting case because it has both file and 102*4882a593Smuzhiyunnetwork I/O aspects, and at least in the case of standard Yocto images, 103*4882a593Smuzhiyunit's implemented as part of BusyBox, so the methods we use to analyze it 104*4882a593Smuzhiyuncan be used in a very similar way to the whole host of supported BusyBox 105*4882a593Smuzhiyunapplets in Yocto. :: 106*4882a593Smuzhiyun 107*4882a593Smuzhiyun root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \ 108*4882a593Smuzhiyun wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 109*4882a593Smuzhiyun 110*4882a593SmuzhiyunThe quickest and easiest way to get some basic overall data about what's 111*4882a593Smuzhiyungoing on for a particular workload is to profile it using 'perf stat'. 112*4882a593Smuzhiyun'perf stat' basically profiles using a few default counters and displays 113*4882a593Smuzhiyunthe summed counts at the end of the run:: 114*4882a593Smuzhiyun 115*4882a593Smuzhiyun root@crownbay:~# perf stat wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 116*4882a593Smuzhiyun Connecting to downloads.yoctoproject.org (140.211.169.59:80) 117*4882a593Smuzhiyun linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA 118*4882a593Smuzhiyun 119*4882a593Smuzhiyun Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2': 120*4882a593Smuzhiyun 121*4882a593Smuzhiyun 4597.223902 task-clock # 0.077 CPUs utilized 122*4882a593Smuzhiyun 23568 context-switches # 0.005 M/sec 123*4882a593Smuzhiyun 68 CPU-migrations # 0.015 K/sec 124*4882a593Smuzhiyun 241 page-faults # 0.052 K/sec 125*4882a593Smuzhiyun 3045817293 cycles # 0.663 GHz 126*4882a593Smuzhiyun <not supported> stalled-cycles-frontend 127*4882a593Smuzhiyun <not supported> stalled-cycles-backend 128*4882a593Smuzhiyun 858909167 instructions # 0.28 insns per cycle 129*4882a593Smuzhiyun 165441165 branches # 35.987 M/sec 130*4882a593Smuzhiyun 19550329 branch-misses # 11.82% of all branches 131*4882a593Smuzhiyun 132*4882a593Smuzhiyun 59.836627620 seconds time elapsed 133*4882a593Smuzhiyun 134*4882a593SmuzhiyunMany times such a simple-minded test doesn't yield much of 135*4882a593Smuzhiyuninterest, but sometimes it does (see Real-world Yocto bug (slow 136*4882a593Smuzhiyunloop-mounted write speed)). 137*4882a593Smuzhiyun 138*4882a593SmuzhiyunAlso, note that 'perf stat' isn't restricted to a fixed set of counters 139*4882a593Smuzhiyun- basically any event listed in the output of 'perf list' can be tallied 140*4882a593Smuzhiyunby 'perf stat'. For example, suppose we wanted to see a summary of all 141*4882a593Smuzhiyunthe events related to kernel memory allocation/freeing along with cache 142*4882a593Smuzhiyunhits and misses:: 143*4882a593Smuzhiyun 144*4882a593Smuzhiyun root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 145*4882a593Smuzhiyun Connecting to downloads.yoctoproject.org (140.211.169.59:80) 146*4882a593Smuzhiyun linux-2.6.19.2.tar.b 100% |***************************************************| 41727k 0:00:00 ETA 147*4882a593Smuzhiyun 148*4882a593Smuzhiyun Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2': 149*4882a593Smuzhiyun 150*4882a593Smuzhiyun 5566 kmem:kmalloc 151*4882a593Smuzhiyun 125517 kmem:kmem_cache_alloc 152*4882a593Smuzhiyun 0 kmem:kmalloc_node 153*4882a593Smuzhiyun 0 kmem:kmem_cache_alloc_node 154*4882a593Smuzhiyun 34401 kmem:kfree 155*4882a593Smuzhiyun 69920 kmem:kmem_cache_free 156*4882a593Smuzhiyun 133 kmem:mm_page_free 157*4882a593Smuzhiyun 41 kmem:mm_page_free_batched 158*4882a593Smuzhiyun 11502 kmem:mm_page_alloc 159*4882a593Smuzhiyun 11375 kmem:mm_page_alloc_zone_locked 160*4882a593Smuzhiyun 0 kmem:mm_page_pcpu_drain 161*4882a593Smuzhiyun 0 kmem:mm_page_alloc_extfrag 162*4882a593Smuzhiyun 66848602 cache-references 163*4882a593Smuzhiyun 2917740 cache-misses # 4.365 % of all cache refs 164*4882a593Smuzhiyun 165*4882a593Smuzhiyun 44.831023415 seconds time elapsed 166*4882a593Smuzhiyun 167*4882a593SmuzhiyunSo 'perf stat' gives us a nice easy 168*4882a593Smuzhiyunway to get a quick overview of what might be happening for a set of 169*4882a593Smuzhiyunevents, but normally we'd need a little more detail in order to 170*4882a593Smuzhiyununderstand what's going on in a way that we can act on in a useful way. 171*4882a593Smuzhiyun 172*4882a593SmuzhiyunTo dive down into a next level of detail, we can use 'perf record'/'perf 173*4882a593Smuzhiyunreport' which will collect profiling data and present it to use using an 174*4882a593Smuzhiyuninteractive text-based UI (or simply as text if we specify ``--stdio`` to 175*4882a593Smuzhiyun'perf report'). 176*4882a593Smuzhiyun 177*4882a593SmuzhiyunAs our first attempt at profiling this workload, we'll simply run 'perf 178*4882a593Smuzhiyunrecord', handing it the workload we want to profile (everything after 179*4882a593Smuzhiyun'perf record' and any perf options we hand it - here none - will be 180*4882a593Smuzhiyunexecuted in a new shell). perf collects samples until the process exits 181*4882a593Smuzhiyunand records them in a file named 'perf.data' in the current working 182*4882a593Smuzhiyundirectory. :: 183*4882a593Smuzhiyun 184*4882a593Smuzhiyun root@crownbay:~# perf record wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 185*4882a593Smuzhiyun 186*4882a593Smuzhiyun Connecting to downloads.yoctoproject.org (140.211.169.59:80) 187*4882a593Smuzhiyun linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA 188*4882a593Smuzhiyun [ perf record: Woken up 1 times to write data ] 189*4882a593Smuzhiyun [ perf record: Captured and wrote 0.176 MB perf.data (~7700 samples) ] 190*4882a593Smuzhiyun 191*4882a593SmuzhiyunTo see the results in a 192*4882a593Smuzhiyun'text-based UI' (tui), simply run 'perf report', which will read the 193*4882a593Smuzhiyunperf.data file in the current working directory and display the results 194*4882a593Smuzhiyunin an interactive UI:: 195*4882a593Smuzhiyun 196*4882a593Smuzhiyun root@crownbay:~# perf report 197*4882a593Smuzhiyun 198*4882a593Smuzhiyun.. image:: figures/perf-wget-flat-stripped.png 199*4882a593Smuzhiyun :align: center 200*4882a593Smuzhiyun 201*4882a593SmuzhiyunThe above screenshot displays a 'flat' profile, one entry for each 202*4882a593Smuzhiyun'bucket' corresponding to the functions that were profiled during the 203*4882a593Smuzhiyunprofiling run, ordered from the most popular to the least (perf has 204*4882a593Smuzhiyunoptions to sort in various orders and keys as well as display entries 205*4882a593Smuzhiyunonly above a certain threshold and so on - see the perf documentation 206*4882a593Smuzhiyunfor details). Note that this includes both userspace functions (entries 207*4882a593Smuzhiyuncontaining a [.]) and kernel functions accounted to the process (entries 208*4882a593Smuzhiyuncontaining a [k]). (perf has command-line modifiers that can be used to 209*4882a593Smuzhiyunrestrict the profiling to kernel or userspace, among others). 210*4882a593Smuzhiyun 211*4882a593SmuzhiyunNotice also that the above report shows an entry for 'busybox', which is 212*4882a593Smuzhiyunthe executable that implements 'wget' in Yocto, but that instead of a 213*4882a593Smuzhiyunuseful function name in that entry, it displays a not-so-friendly hex 214*4882a593Smuzhiyunvalue instead. The steps below will show how to fix that problem. 215*4882a593Smuzhiyun 216*4882a593SmuzhiyunBefore we do that, however, let's try running a different profile, one 217*4882a593Smuzhiyunwhich shows something a little more interesting. The only difference 218*4882a593Smuzhiyunbetween the new profile and the previous one is that we'll add the -g 219*4882a593Smuzhiyunoption, which will record not just the address of a sampled function, 220*4882a593Smuzhiyunbut the entire callchain to the sampled function as well:: 221*4882a593Smuzhiyun 222*4882a593Smuzhiyun root@crownbay:~# perf record -g wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 223*4882a593Smuzhiyun Connecting to downloads.yoctoproject.org (140.211.169.59:80) 224*4882a593Smuzhiyun linux-2.6.19.2.tar.b 100% |************************************************| 41727k 0:00:00 ETA 225*4882a593Smuzhiyun [ perf record: Woken up 3 times to write data ] 226*4882a593Smuzhiyun [ perf record: Captured and wrote 0.652 MB perf.data (~28476 samples) ] 227*4882a593Smuzhiyun 228*4882a593Smuzhiyun 229*4882a593Smuzhiyun root@crownbay:~# perf report 230*4882a593Smuzhiyun 231*4882a593Smuzhiyun.. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png 232*4882a593Smuzhiyun :align: center 233*4882a593Smuzhiyun 234*4882a593SmuzhiyunUsing the callgraph view, we can actually see not only which functions 235*4882a593Smuzhiyuntook the most time, but we can also see a summary of how those functions 236*4882a593Smuzhiyunwere called and learn something about how the program interacts with the 237*4882a593Smuzhiyunkernel in the process. 238*4882a593Smuzhiyun 239*4882a593SmuzhiyunNotice that each entry in the above screenshot now contains a '+' on the 240*4882a593Smuzhiyunleft-hand side. This means that we can expand the entry and drill down 241*4882a593Smuzhiyuninto the callchains that feed into that entry. Pressing 'enter' on any 242*4882a593Smuzhiyunone of them will expand the callchain (you can also press 'E' to expand 243*4882a593Smuzhiyunthem all at the same time or 'C' to collapse them all). 244*4882a593Smuzhiyun 245*4882a593SmuzhiyunIn the screenshot above, we've toggled the ``__copy_to_user_ll()`` entry 246*4882a593Smuzhiyunand several subnodes all the way down. This lets us see which callchains 247*4882a593Smuzhiyuncontributed to the profiled ``__copy_to_user_ll()`` function which 248*4882a593Smuzhiyuncontributed 1.77% to the total profile. 249*4882a593Smuzhiyun 250*4882a593SmuzhiyunAs a bit of background explanation for these callchains, think about 251*4882a593Smuzhiyunwhat happens at a high level when you run wget to get a file out on the 252*4882a593Smuzhiyunnetwork. Basically what happens is that the data comes into the kernel 253*4882a593Smuzhiyunvia the network connection (socket) and is passed to the userspace 254*4882a593Smuzhiyunprogram 'wget' (which is actually a part of BusyBox, but that's not 255*4882a593Smuzhiyunimportant for now), which takes the buffers the kernel passes to it and 256*4882a593Smuzhiyunwrites it to a disk file to save it. 257*4882a593Smuzhiyun 258*4882a593SmuzhiyunThe part of this process that we're looking at in the above call stacks 259*4882a593Smuzhiyunis the part where the kernel passes the data it has read from the socket 260*4882a593Smuzhiyundown to wget i.e. a copy-to-user. 261*4882a593Smuzhiyun 262*4882a593SmuzhiyunNotice also that here there's also a case where the hex value is 263*4882a593Smuzhiyundisplayed in the callstack, here in the expanded ``sys_clock_gettime()`` 264*4882a593Smuzhiyunfunction. Later we'll see it resolve to a userspace function call in 265*4882a593Smuzhiyunbusybox. 266*4882a593Smuzhiyun 267*4882a593Smuzhiyun.. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png 268*4882a593Smuzhiyun :align: center 269*4882a593Smuzhiyun 270*4882a593SmuzhiyunThe above screenshot shows the other half of the journey for the data - 271*4882a593Smuzhiyunfrom the wget program's userspace buffers to disk. To get the buffers to 272*4882a593Smuzhiyundisk, the wget program issues a ``write(2)``, which does a ``copy-from-user`` to 273*4882a593Smuzhiyunthe kernel, which then takes care via some circuitous path (probably 274*4882a593Smuzhiyunalso present somewhere in the profile data), to get it safely to disk. 275*4882a593Smuzhiyun 276*4882a593SmuzhiyunNow that we've seen the basic layout of the profile data and the basics 277*4882a593Smuzhiyunof how to extract useful information out of it, let's get back to the 278*4882a593Smuzhiyuntask at hand and see if we can get some basic idea about where the time 279*4882a593Smuzhiyunis spent in the program we're profiling, wget. Remember that wget is 280*4882a593Smuzhiyunactually implemented as an applet in BusyBox, so while the process name 281*4882a593Smuzhiyunis 'wget', the executable we're actually interested in is BusyBox. So 282*4882a593Smuzhiyunlet's expand the first entry containing BusyBox: 283*4882a593Smuzhiyun 284*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-expanded-stripped.png 285*4882a593Smuzhiyun :align: center 286*4882a593Smuzhiyun 287*4882a593SmuzhiyunAgain, before we expanded we saw that the function was labeled with a 288*4882a593Smuzhiyunhex value instead of a symbol as with most of the kernel entries. 289*4882a593SmuzhiyunExpanding the BusyBox entry doesn't make it any better. 290*4882a593Smuzhiyun 291*4882a593SmuzhiyunThe problem is that perf can't find the symbol information for the 292*4882a593Smuzhiyunbusybox binary, which is actually stripped out by the Yocto build 293*4882a593Smuzhiyunsystem. 294*4882a593Smuzhiyun 295*4882a593SmuzhiyunOne way around that is to put the following in your ``local.conf`` file 296*4882a593Smuzhiyunwhen you build the image:: 297*4882a593Smuzhiyun 298*4882a593Smuzhiyun INHIBIT_PACKAGE_STRIP = "1" 299*4882a593Smuzhiyun 300*4882a593SmuzhiyunHowever, we already have an image with the binaries stripped, so 301*4882a593Smuzhiyunwhat can we do to get perf to resolve the symbols? Basically we need to 302*4882a593Smuzhiyuninstall the debuginfo for the BusyBox package. 303*4882a593Smuzhiyun 304*4882a593SmuzhiyunTo generate the debug info for the packages in the image, we can add 305*4882a593Smuzhiyun``dbg-pkgs`` to :term:`EXTRA_IMAGE_FEATURES` in ``local.conf``. For example:: 306*4882a593Smuzhiyun 307*4882a593Smuzhiyun EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs" 308*4882a593Smuzhiyun 309*4882a593SmuzhiyunAdditionally, in order to generate the type of debuginfo that perf 310*4882a593Smuzhiyununderstands, we also need to set 311*4882a593Smuzhiyun:term:`PACKAGE_DEBUG_SPLIT_STYLE` 312*4882a593Smuzhiyunin the ``local.conf`` file:: 313*4882a593Smuzhiyun 314*4882a593Smuzhiyun PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory' 315*4882a593Smuzhiyun 316*4882a593SmuzhiyunOnce we've done that, we can install the 317*4882a593Smuzhiyundebuginfo for BusyBox. The debug packages once built can be found in 318*4882a593Smuzhiyun``build/tmp/deploy/rpm/*`` on the host system. Find the busybox-dbg-...rpm 319*4882a593Smuzhiyunfile and copy it to the target. For example:: 320*4882a593Smuzhiyun 321*4882a593Smuzhiyun [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2_32/busybox-dbg-1.20.2-r2.core2_32.rpm root@192.168.1.31: 322*4882a593Smuzhiyun busybox-dbg-1.20.2-r2.core2_32.rpm 100% 1826KB 1.8MB/s 00:01 323*4882a593Smuzhiyun 324*4882a593SmuzhiyunNow install the debug rpm on the target:: 325*4882a593Smuzhiyun 326*4882a593Smuzhiyun root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2_32.rpm 327*4882a593Smuzhiyun 328*4882a593SmuzhiyunNow that the debuginfo is installed, we see that the BusyBox entries now display 329*4882a593Smuzhiyuntheir functions symbolically: 330*4882a593Smuzhiyun 331*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-debuginfo.png 332*4882a593Smuzhiyun :align: center 333*4882a593Smuzhiyun 334*4882a593SmuzhiyunIf we expand one of the entries and press 'enter' on a leaf node, we're 335*4882a593Smuzhiyunpresented with a menu of actions we can take to get more information 336*4882a593Smuzhiyunrelated to that entry: 337*4882a593Smuzhiyun 338*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-dso-zoom-menu.png 339*4882a593Smuzhiyun :align: center 340*4882a593Smuzhiyun 341*4882a593SmuzhiyunOne of these actions allows us to show a view that displays a 342*4882a593Smuzhiyunbusybox-centric view of the profiled functions (in this case we've also 343*4882a593Smuzhiyunexpanded all the nodes using the 'E' key): 344*4882a593Smuzhiyun 345*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-dso-zoom.png 346*4882a593Smuzhiyun :align: center 347*4882a593Smuzhiyun 348*4882a593SmuzhiyunFinally, we can see that now that the BusyBox debuginfo is installed, 349*4882a593Smuzhiyunthe previously unresolved symbol in the ``sys_clock_gettime()`` entry 350*4882a593Smuzhiyunmentioned previously is now resolved, and shows that the 351*4882a593Smuzhiyunsys_clock_gettime system call that was the source of 6.75% of the 352*4882a593Smuzhiyuncopy-to-user overhead was initiated by the ``handle_input()`` BusyBox 353*4882a593Smuzhiyunfunction: 354*4882a593Smuzhiyun 355*4882a593Smuzhiyun.. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png 356*4882a593Smuzhiyun :align: center 357*4882a593Smuzhiyun 358*4882a593SmuzhiyunAt the lowest level of detail, we can dive down to the assembly level 359*4882a593Smuzhiyunand see which instructions caused the most overhead in a function. 360*4882a593SmuzhiyunPressing 'enter' on the 'udhcpc_main' function, we're again presented 361*4882a593Smuzhiyunwith a menu: 362*4882a593Smuzhiyun 363*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-annotate-menu.png 364*4882a593Smuzhiyun :align: center 365*4882a593Smuzhiyun 366*4882a593SmuzhiyunSelecting 'Annotate udhcpc_main', we get a detailed listing of 367*4882a593Smuzhiyunpercentages by instruction for the udhcpc_main function. From the 368*4882a593Smuzhiyundisplay, we can see that over 50% of the time spent in this function is 369*4882a593Smuzhiyuntaken up by a couple tests and the move of a constant (1) to a register: 370*4882a593Smuzhiyun 371*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-annotate-udhcpc.png 372*4882a593Smuzhiyun :align: center 373*4882a593Smuzhiyun 374*4882a593SmuzhiyunAs a segue into tracing, let's try another profile using a different 375*4882a593Smuzhiyuncounter, something other than the default 'cycles'. 376*4882a593Smuzhiyun 377*4882a593SmuzhiyunThe tracing and profiling infrastructure in Linux has become unified in 378*4882a593Smuzhiyuna way that allows us to use the same tool with a completely different 379*4882a593Smuzhiyunset of counters, not just the standard hardware counters that 380*4882a593Smuzhiyuntraditional tools have had to restrict themselves to (of course the 381*4882a593Smuzhiyuntraditional tools can also make use of the expanded possibilities now 382*4882a593Smuzhiyunavailable to them, and in some cases have, as mentioned previously). 383*4882a593Smuzhiyun 384*4882a593SmuzhiyunWe can get a list of the available events that can be used to profile a 385*4882a593Smuzhiyunworkload via 'perf list':: 386*4882a593Smuzhiyun 387*4882a593Smuzhiyun root@crownbay:~# perf list 388*4882a593Smuzhiyun 389*4882a593Smuzhiyun List of pre-defined events (to be used in -e): 390*4882a593Smuzhiyun cpu-cycles OR cycles [Hardware event] 391*4882a593Smuzhiyun stalled-cycles-frontend OR idle-cycles-frontend [Hardware event] 392*4882a593Smuzhiyun stalled-cycles-backend OR idle-cycles-backend [Hardware event] 393*4882a593Smuzhiyun instructions [Hardware event] 394*4882a593Smuzhiyun cache-references [Hardware event] 395*4882a593Smuzhiyun cache-misses [Hardware event] 396*4882a593Smuzhiyun branch-instructions OR branches [Hardware event] 397*4882a593Smuzhiyun branch-misses [Hardware event] 398*4882a593Smuzhiyun bus-cycles [Hardware event] 399*4882a593Smuzhiyun ref-cycles [Hardware event] 400*4882a593Smuzhiyun 401*4882a593Smuzhiyun cpu-clock [Software event] 402*4882a593Smuzhiyun task-clock [Software event] 403*4882a593Smuzhiyun page-faults OR faults [Software event] 404*4882a593Smuzhiyun minor-faults [Software event] 405*4882a593Smuzhiyun major-faults [Software event] 406*4882a593Smuzhiyun context-switches OR cs [Software event] 407*4882a593Smuzhiyun cpu-migrations OR migrations [Software event] 408*4882a593Smuzhiyun alignment-faults [Software event] 409*4882a593Smuzhiyun emulation-faults [Software event] 410*4882a593Smuzhiyun 411*4882a593Smuzhiyun L1-dcache-loads [Hardware cache event] 412*4882a593Smuzhiyun L1-dcache-load-misses [Hardware cache event] 413*4882a593Smuzhiyun L1-dcache-prefetch-misses [Hardware cache event] 414*4882a593Smuzhiyun L1-icache-loads [Hardware cache event] 415*4882a593Smuzhiyun L1-icache-load-misses [Hardware cache event] 416*4882a593Smuzhiyun . 417*4882a593Smuzhiyun . 418*4882a593Smuzhiyun . 419*4882a593Smuzhiyun rNNN [Raw hardware event descriptor] 420*4882a593Smuzhiyun cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] 421*4882a593Smuzhiyun (see 'perf list --help' on how to encode it) 422*4882a593Smuzhiyun 423*4882a593Smuzhiyun mem:<addr>[:access] [Hardware breakpoint] 424*4882a593Smuzhiyun 425*4882a593Smuzhiyun sunrpc:rpc_call_status [Tracepoint event] 426*4882a593Smuzhiyun sunrpc:rpc_bind_status [Tracepoint event] 427*4882a593Smuzhiyun sunrpc:rpc_connect_status [Tracepoint event] 428*4882a593Smuzhiyun sunrpc:rpc_task_begin [Tracepoint event] 429*4882a593Smuzhiyun skb:kfree_skb [Tracepoint event] 430*4882a593Smuzhiyun skb:consume_skb [Tracepoint event] 431*4882a593Smuzhiyun skb:skb_copy_datagram_iovec [Tracepoint event] 432*4882a593Smuzhiyun net:net_dev_xmit [Tracepoint event] 433*4882a593Smuzhiyun net:net_dev_queue [Tracepoint event] 434*4882a593Smuzhiyun net:netif_receive_skb [Tracepoint event] 435*4882a593Smuzhiyun net:netif_rx [Tracepoint event] 436*4882a593Smuzhiyun napi:napi_poll [Tracepoint event] 437*4882a593Smuzhiyun sock:sock_rcvqueue_full [Tracepoint event] 438*4882a593Smuzhiyun sock:sock_exceed_buf_limit [Tracepoint event] 439*4882a593Smuzhiyun udp:udp_fail_queue_rcv_skb [Tracepoint event] 440*4882a593Smuzhiyun hda:hda_send_cmd [Tracepoint event] 441*4882a593Smuzhiyun hda:hda_get_response [Tracepoint event] 442*4882a593Smuzhiyun hda:hda_bus_reset [Tracepoint event] 443*4882a593Smuzhiyun scsi:scsi_dispatch_cmd_start [Tracepoint event] 444*4882a593Smuzhiyun scsi:scsi_dispatch_cmd_error [Tracepoint event] 445*4882a593Smuzhiyun scsi:scsi_eh_wakeup [Tracepoint event] 446*4882a593Smuzhiyun drm:drm_vblank_event [Tracepoint event] 447*4882a593Smuzhiyun drm:drm_vblank_event_queued [Tracepoint event] 448*4882a593Smuzhiyun drm:drm_vblank_event_delivered [Tracepoint event] 449*4882a593Smuzhiyun random:mix_pool_bytes [Tracepoint event] 450*4882a593Smuzhiyun random:mix_pool_bytes_nolock [Tracepoint event] 451*4882a593Smuzhiyun random:credit_entropy_bits [Tracepoint event] 452*4882a593Smuzhiyun gpio:gpio_direction [Tracepoint event] 453*4882a593Smuzhiyun gpio:gpio_value [Tracepoint event] 454*4882a593Smuzhiyun block:block_rq_abort [Tracepoint event] 455*4882a593Smuzhiyun block:block_rq_requeue [Tracepoint event] 456*4882a593Smuzhiyun block:block_rq_issue [Tracepoint event] 457*4882a593Smuzhiyun block:block_bio_bounce [Tracepoint event] 458*4882a593Smuzhiyun block:block_bio_complete [Tracepoint event] 459*4882a593Smuzhiyun block:block_bio_backmerge [Tracepoint event] 460*4882a593Smuzhiyun . 461*4882a593Smuzhiyun . 462*4882a593Smuzhiyun writeback:writeback_wake_thread [Tracepoint event] 463*4882a593Smuzhiyun writeback:writeback_wake_forker_thread [Tracepoint event] 464*4882a593Smuzhiyun writeback:writeback_bdi_register [Tracepoint event] 465*4882a593Smuzhiyun . 466*4882a593Smuzhiyun . 467*4882a593Smuzhiyun writeback:writeback_single_inode_requeue [Tracepoint event] 468*4882a593Smuzhiyun writeback:writeback_single_inode [Tracepoint event] 469*4882a593Smuzhiyun kmem:kmalloc [Tracepoint event] 470*4882a593Smuzhiyun kmem:kmem_cache_alloc [Tracepoint event] 471*4882a593Smuzhiyun kmem:mm_page_alloc [Tracepoint event] 472*4882a593Smuzhiyun kmem:mm_page_alloc_zone_locked [Tracepoint event] 473*4882a593Smuzhiyun kmem:mm_page_pcpu_drain [Tracepoint event] 474*4882a593Smuzhiyun kmem:mm_page_alloc_extfrag [Tracepoint event] 475*4882a593Smuzhiyun vmscan:mm_vmscan_kswapd_sleep [Tracepoint event] 476*4882a593Smuzhiyun vmscan:mm_vmscan_kswapd_wake [Tracepoint event] 477*4882a593Smuzhiyun vmscan:mm_vmscan_wakeup_kswapd [Tracepoint event] 478*4882a593Smuzhiyun vmscan:mm_vmscan_direct_reclaim_begin [Tracepoint event] 479*4882a593Smuzhiyun . 480*4882a593Smuzhiyun . 481*4882a593Smuzhiyun module:module_get [Tracepoint event] 482*4882a593Smuzhiyun module:module_put [Tracepoint event] 483*4882a593Smuzhiyun module:module_request [Tracepoint event] 484*4882a593Smuzhiyun sched:sched_kthread_stop [Tracepoint event] 485*4882a593Smuzhiyun sched:sched_wakeup [Tracepoint event] 486*4882a593Smuzhiyun sched:sched_wakeup_new [Tracepoint event] 487*4882a593Smuzhiyun sched:sched_process_fork [Tracepoint event] 488*4882a593Smuzhiyun sched:sched_process_exec [Tracepoint event] 489*4882a593Smuzhiyun sched:sched_stat_runtime [Tracepoint event] 490*4882a593Smuzhiyun rcu:rcu_utilization [Tracepoint event] 491*4882a593Smuzhiyun workqueue:workqueue_queue_work [Tracepoint event] 492*4882a593Smuzhiyun workqueue:workqueue_execute_end [Tracepoint event] 493*4882a593Smuzhiyun signal:signal_generate [Tracepoint event] 494*4882a593Smuzhiyun signal:signal_deliver [Tracepoint event] 495*4882a593Smuzhiyun timer:timer_init [Tracepoint event] 496*4882a593Smuzhiyun timer:timer_start [Tracepoint event] 497*4882a593Smuzhiyun timer:hrtimer_cancel [Tracepoint event] 498*4882a593Smuzhiyun timer:itimer_state [Tracepoint event] 499*4882a593Smuzhiyun timer:itimer_expire [Tracepoint event] 500*4882a593Smuzhiyun irq:irq_handler_entry [Tracepoint event] 501*4882a593Smuzhiyun irq:irq_handler_exit [Tracepoint event] 502*4882a593Smuzhiyun irq:softirq_entry [Tracepoint event] 503*4882a593Smuzhiyun irq:softirq_exit [Tracepoint event] 504*4882a593Smuzhiyun irq:softirq_raise [Tracepoint event] 505*4882a593Smuzhiyun printk:console [Tracepoint event] 506*4882a593Smuzhiyun task:task_newtask [Tracepoint event] 507*4882a593Smuzhiyun task:task_rename [Tracepoint event] 508*4882a593Smuzhiyun syscalls:sys_enter_socketcall [Tracepoint event] 509*4882a593Smuzhiyun syscalls:sys_exit_socketcall [Tracepoint event] 510*4882a593Smuzhiyun . 511*4882a593Smuzhiyun . 512*4882a593Smuzhiyun . 513*4882a593Smuzhiyun syscalls:sys_enter_unshare [Tracepoint event] 514*4882a593Smuzhiyun syscalls:sys_exit_unshare [Tracepoint event] 515*4882a593Smuzhiyun raw_syscalls:sys_enter [Tracepoint event] 516*4882a593Smuzhiyun raw_syscalls:sys_exit [Tracepoint event] 517*4882a593Smuzhiyun 518*4882a593Smuzhiyun.. admonition:: Tying it Together 519*4882a593Smuzhiyun 520*4882a593Smuzhiyun These are exactly the same set of events defined by the trace event 521*4882a593Smuzhiyun subsystem and exposed by ftrace/tracecmd/kernelshark as files in 522*4882a593Smuzhiyun /sys/kernel/debug/tracing/events, by SystemTap as 523*4882a593Smuzhiyun kernel.trace("tracepoint_name") and (partially) accessed by LTTng. 524*4882a593Smuzhiyun 525*4882a593SmuzhiyunOnly a subset of these would be of interest to us when looking at this 526*4882a593Smuzhiyunworkload, so let's choose the most likely subsystems (identified by the 527*4882a593Smuzhiyunstring before the colon in the Tracepoint events) and do a 'perf stat' 528*4882a593Smuzhiyunrun using only those wildcarded subsystems:: 529*4882a593Smuzhiyun 530*4882a593Smuzhiyun root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 531*4882a593Smuzhiyun Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2': 532*4882a593Smuzhiyun 533*4882a593Smuzhiyun 23323 skb:kfree_skb 534*4882a593Smuzhiyun 0 skb:consume_skb 535*4882a593Smuzhiyun 49897 skb:skb_copy_datagram_iovec 536*4882a593Smuzhiyun 6217 net:net_dev_xmit 537*4882a593Smuzhiyun 6217 net:net_dev_queue 538*4882a593Smuzhiyun 7962 net:netif_receive_skb 539*4882a593Smuzhiyun 2 net:netif_rx 540*4882a593Smuzhiyun 8340 napi:napi_poll 541*4882a593Smuzhiyun 0 sched:sched_kthread_stop 542*4882a593Smuzhiyun 0 sched:sched_kthread_stop_ret 543*4882a593Smuzhiyun 3749 sched:sched_wakeup 544*4882a593Smuzhiyun 0 sched:sched_wakeup_new 545*4882a593Smuzhiyun 0 sched:sched_switch 546*4882a593Smuzhiyun 29 sched:sched_migrate_task 547*4882a593Smuzhiyun 0 sched:sched_process_free 548*4882a593Smuzhiyun 1 sched:sched_process_exit 549*4882a593Smuzhiyun 0 sched:sched_wait_task 550*4882a593Smuzhiyun 0 sched:sched_process_wait 551*4882a593Smuzhiyun 0 sched:sched_process_fork 552*4882a593Smuzhiyun 1 sched:sched_process_exec 553*4882a593Smuzhiyun 0 sched:sched_stat_wait 554*4882a593Smuzhiyun 2106519415641 sched:sched_stat_sleep 555*4882a593Smuzhiyun 0 sched:sched_stat_iowait 556*4882a593Smuzhiyun 147453613 sched:sched_stat_blocked 557*4882a593Smuzhiyun 12903026955 sched:sched_stat_runtime 558*4882a593Smuzhiyun 0 sched:sched_pi_setprio 559*4882a593Smuzhiyun 3574 workqueue:workqueue_queue_work 560*4882a593Smuzhiyun 3574 workqueue:workqueue_activate_work 561*4882a593Smuzhiyun 0 workqueue:workqueue_execute_start 562*4882a593Smuzhiyun 0 workqueue:workqueue_execute_end 563*4882a593Smuzhiyun 16631 irq:irq_handler_entry 564*4882a593Smuzhiyun 16631 irq:irq_handler_exit 565*4882a593Smuzhiyun 28521 irq:softirq_entry 566*4882a593Smuzhiyun 28521 irq:softirq_exit 567*4882a593Smuzhiyun 28728 irq:softirq_raise 568*4882a593Smuzhiyun 1 syscalls:sys_enter_sendmmsg 569*4882a593Smuzhiyun 1 syscalls:sys_exit_sendmmsg 570*4882a593Smuzhiyun 0 syscalls:sys_enter_recvmmsg 571*4882a593Smuzhiyun 0 syscalls:sys_exit_recvmmsg 572*4882a593Smuzhiyun 14 syscalls:sys_enter_socketcall 573*4882a593Smuzhiyun 14 syscalls:sys_exit_socketcall 574*4882a593Smuzhiyun . 575*4882a593Smuzhiyun . 576*4882a593Smuzhiyun . 577*4882a593Smuzhiyun 16965 syscalls:sys_enter_read 578*4882a593Smuzhiyun 16965 syscalls:sys_exit_read 579*4882a593Smuzhiyun 12854 syscalls:sys_enter_write 580*4882a593Smuzhiyun 12854 syscalls:sys_exit_write 581*4882a593Smuzhiyun . 582*4882a593Smuzhiyun . 583*4882a593Smuzhiyun . 584*4882a593Smuzhiyun 585*4882a593Smuzhiyun 58.029710972 seconds time elapsed 586*4882a593Smuzhiyun 587*4882a593Smuzhiyun 588*4882a593Smuzhiyun 589*4882a593SmuzhiyunLet's pick one of these tracepoints 590*4882a593Smuzhiyunand tell perf to do a profile using it as the sampling event:: 591*4882a593Smuzhiyun 592*4882a593Smuzhiyun root@crownbay:~# perf record -g -e sched:sched_wakeup wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 593*4882a593Smuzhiyun 594*4882a593Smuzhiyun.. image:: figures/sched-wakeup-profile.png 595*4882a593Smuzhiyun :align: center 596*4882a593Smuzhiyun 597*4882a593SmuzhiyunThe screenshot above shows the results of running a profile using 598*4882a593Smuzhiyunsched:sched_switch tracepoint, which shows the relative costs of various 599*4882a593Smuzhiyunpaths to sched_wakeup (note that sched_wakeup is the name of the 600*4882a593Smuzhiyuntracepoint - it's actually defined just inside ttwu_do_wakeup(), which 601*4882a593Smuzhiyunaccounts for the function name actually displayed in the profile: 602*4882a593Smuzhiyun 603*4882a593Smuzhiyun.. code-block:: c 604*4882a593Smuzhiyun 605*4882a593Smuzhiyun /* 606*4882a593Smuzhiyun * Mark the task runnable and perform wakeup-preemption. 607*4882a593Smuzhiyun */ 608*4882a593Smuzhiyun static void 609*4882a593Smuzhiyun ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags) 610*4882a593Smuzhiyun { 611*4882a593Smuzhiyun trace_sched_wakeup(p, true); 612*4882a593Smuzhiyun . 613*4882a593Smuzhiyun . 614*4882a593Smuzhiyun . 615*4882a593Smuzhiyun } 616*4882a593Smuzhiyun 617*4882a593SmuzhiyunA couple of the more interesting 618*4882a593Smuzhiyuncallchains are expanded and displayed above, basically some network 619*4882a593Smuzhiyunreceive paths that presumably end up waking up wget (busybox) when 620*4882a593Smuzhiyunnetwork data is ready. 621*4882a593Smuzhiyun 622*4882a593SmuzhiyunNote that because tracepoints are normally used for tracing, the default 623*4882a593Smuzhiyunsampling period for tracepoints is 1 i.e. for tracepoints perf will 624*4882a593Smuzhiyunsample on every event occurrence (this can be changed using the -c 625*4882a593Smuzhiyunoption). This is in contrast to hardware counters such as for example 626*4882a593Smuzhiyunthe default 'cycles' hardware counter used for normal profiling, where 627*4882a593Smuzhiyunsampling periods are much higher (in the thousands) because profiling 628*4882a593Smuzhiyunshould have as low an overhead as possible and sampling on every cycle 629*4882a593Smuzhiyunwould be prohibitively expensive. 630*4882a593Smuzhiyun 631*4882a593SmuzhiyunUsing perf to do Basic Tracing 632*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 633*4882a593Smuzhiyun 634*4882a593SmuzhiyunProfiling is a great tool for solving many problems or for getting a 635*4882a593Smuzhiyunhigh-level view of what's going on with a workload or across the system. 636*4882a593SmuzhiyunIt is however by definition an approximation, as suggested by the most 637*4882a593Smuzhiyunprominent word associated with it, 'sampling'. On the one hand, it 638*4882a593Smuzhiyunallows a representative picture of what's going on in the system to be 639*4882a593Smuzhiyuncheaply taken, but on the other hand, that cheapness limits its utility 640*4882a593Smuzhiyunwhen that data suggests a need to 'dive down' more deeply to discover 641*4882a593Smuzhiyunwhat's really going on. In such cases, the only way to see what's really 642*4882a593Smuzhiyungoing on is to be able to look at (or summarize more intelligently) the 643*4882a593Smuzhiyunindividual steps that go into the higher-level behavior exposed by the 644*4882a593Smuzhiyuncoarse-grained profiling data. 645*4882a593Smuzhiyun 646*4882a593SmuzhiyunAs a concrete example, we can trace all the events we think might be 647*4882a593Smuzhiyunapplicable to our workload:: 648*4882a593Smuzhiyun 649*4882a593Smuzhiyun root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:* 650*4882a593Smuzhiyun -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write 651*4882a593Smuzhiyun wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 652*4882a593Smuzhiyun 653*4882a593SmuzhiyunWe can look at the raw trace output using 'perf script' with no 654*4882a593Smuzhiyunarguments:: 655*4882a593Smuzhiyun 656*4882a593Smuzhiyun root@crownbay:~# perf script 657*4882a593Smuzhiyun 658*4882a593Smuzhiyun perf 1262 [000] 11624.857082: sys_exit_read: 0x0 659*4882a593Smuzhiyun perf 1262 [000] 11624.857193: sched_wakeup: comm=migration/0 pid=6 prio=0 success=1 target_cpu=000 660*4882a593Smuzhiyun wget 1262 [001] 11624.858021: softirq_raise: vec=1 [action=TIMER] 661*4882a593Smuzhiyun wget 1262 [001] 11624.858074: softirq_entry: vec=1 [action=TIMER] 662*4882a593Smuzhiyun wget 1262 [001] 11624.858081: softirq_exit: vec=1 [action=TIMER] 663*4882a593Smuzhiyun wget 1262 [001] 11624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200 664*4882a593Smuzhiyun wget 1262 [001] 11624.858177: sys_exit_read: 0x200 665*4882a593Smuzhiyun wget 1262 [001] 11624.858878: kfree_skb: skbaddr=0xeb248d80 protocol=0 location=0xc15a5308 666*4882a593Smuzhiyun wget 1262 [001] 11624.858945: kfree_skb: skbaddr=0xeb248000 protocol=0 location=0xc15a5308 667*4882a593Smuzhiyun wget 1262 [001] 11624.859020: softirq_raise: vec=1 [action=TIMER] 668*4882a593Smuzhiyun wget 1262 [001] 11624.859076: softirq_entry: vec=1 [action=TIMER] 669*4882a593Smuzhiyun wget 1262 [001] 11624.859083: softirq_exit: vec=1 [action=TIMER] 670*4882a593Smuzhiyun wget 1262 [001] 11624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 671*4882a593Smuzhiyun wget 1262 [001] 11624.859192: sys_exit_read: 0x1d7 672*4882a593Smuzhiyun wget 1262 [001] 11624.859228: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 673*4882a593Smuzhiyun wget 1262 [001] 11624.859233: sys_exit_read: 0x0 674*4882a593Smuzhiyun wget 1262 [001] 11624.859573: sys_enter_read: fd: 0x0003, buf: 0xbf82c580, count: 0x0200 675*4882a593Smuzhiyun wget 1262 [001] 11624.859584: sys_exit_read: 0x200 676*4882a593Smuzhiyun wget 1262 [001] 11624.859864: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 677*4882a593Smuzhiyun wget 1262 [001] 11624.859888: sys_exit_read: 0x400 678*4882a593Smuzhiyun wget 1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400 679*4882a593Smuzhiyun wget 1262 [001] 11624.859944: sys_exit_read: 0x400 680*4882a593Smuzhiyun 681*4882a593SmuzhiyunThis gives us a detailed timestamped sequence of events that occurred within the 682*4882a593Smuzhiyunworkload with respect to those events. 683*4882a593Smuzhiyun 684*4882a593SmuzhiyunIn many ways, profiling can be viewed as a subset of tracing - 685*4882a593Smuzhiyuntheoretically, if you have a set of trace events that's sufficient to 686*4882a593Smuzhiyuncapture all the important aspects of a workload, you can derive any of 687*4882a593Smuzhiyunthe results or views that a profiling run can. 688*4882a593Smuzhiyun 689*4882a593SmuzhiyunAnother aspect of traditional profiling is that while powerful in many 690*4882a593Smuzhiyunways, it's limited by the granularity of the underlying data. Profiling 691*4882a593Smuzhiyuntools offer various ways of sorting and presenting the sample data, 692*4882a593Smuzhiyunwhich make it much more useful and amenable to user experimentation, but 693*4882a593Smuzhiyunin the end it can't be used in an open-ended way to extract data that 694*4882a593Smuzhiyunjust isn't present as a consequence of the fact that conceptually, most 695*4882a593Smuzhiyunof it has been thrown away. 696*4882a593Smuzhiyun 697*4882a593SmuzhiyunFull-blown detailed tracing data does however offer the opportunity to 698*4882a593Smuzhiyunmanipulate and present the information collected during a tracing run in 699*4882a593Smuzhiyunan infinite variety of ways. 700*4882a593Smuzhiyun 701*4882a593SmuzhiyunAnother way to look at it is that there are only so many ways that the 702*4882a593Smuzhiyun'primitive' counters can be used on their own to generate interesting 703*4882a593Smuzhiyunoutput; to get anything more complicated than simple counts requires 704*4882a593Smuzhiyunsome amount of additional logic, which is typically very specific to the 705*4882a593Smuzhiyunproblem at hand. For example, if we wanted to make use of a 'counter' 706*4882a593Smuzhiyunthat maps to the value of the time difference between when a process was 707*4882a593Smuzhiyunscheduled to run on a processor and the time it actually ran, we 708*4882a593Smuzhiyunwouldn't expect such a counter to exist on its own, but we could derive 709*4882a593Smuzhiyunone called say 'wakeup_latency' and use it to extract a useful view of 710*4882a593Smuzhiyunthat metric from trace data. Likewise, we really can't figure out from 711*4882a593Smuzhiyunstandard profiling tools how much data every process on the system reads 712*4882a593Smuzhiyunand writes, along with how many of those reads and writes fail 713*4882a593Smuzhiyuncompletely. If we have sufficient trace data, however, we could with the 714*4882a593Smuzhiyunright tools easily extract and present that information, but we'd need 715*4882a593Smuzhiyunsomething other than pre-canned profiling tools to do that. 716*4882a593Smuzhiyun 717*4882a593SmuzhiyunLuckily, there is a general-purpose way to handle such needs, called 718*4882a593Smuzhiyun'programming languages'. Making programming languages easily available 719*4882a593Smuzhiyunto apply to such problems given the specific format of data is called a 720*4882a593Smuzhiyun'programming language binding' for that data and language. Perf supports 721*4882a593Smuzhiyuntwo programming language bindings, one for Python and one for Perl. 722*4882a593Smuzhiyun 723*4882a593Smuzhiyun.. admonition:: Tying it Together 724*4882a593Smuzhiyun 725*4882a593Smuzhiyun Language bindings for manipulating and aggregating trace data are of 726*4882a593Smuzhiyun course not a new idea. One of the first projects to do this was IBM's 727*4882a593Smuzhiyun DProbes dpcc compiler, an ANSI C compiler which targeted a low-level 728*4882a593Smuzhiyun assembly language running on an in-kernel interpreter on the target 729*4882a593Smuzhiyun system. This is exactly analogous to what Sun's DTrace did, except 730*4882a593Smuzhiyun that DTrace invented its own language for the purpose. Systemtap, 731*4882a593Smuzhiyun heavily inspired by DTrace, also created its own one-off language, 732*4882a593Smuzhiyun but rather than running the product on an in-kernel interpreter, 733*4882a593Smuzhiyun created an elaborate compiler-based machinery to translate its 734*4882a593Smuzhiyun language into kernel modules written in C. 735*4882a593Smuzhiyun 736*4882a593SmuzhiyunNow that we have the trace data in perf.data, we can use 'perf script 737*4882a593Smuzhiyun-g' to generate a skeleton script with handlers for the read/write 738*4882a593Smuzhiyunentry/exit events we recorded:: 739*4882a593Smuzhiyun 740*4882a593Smuzhiyun root@crownbay:~# perf script -g python 741*4882a593Smuzhiyun generated Python script: perf-script.py 742*4882a593Smuzhiyun 743*4882a593SmuzhiyunThe skeleton script simply creates a python function for each event type in the 744*4882a593Smuzhiyunperf.data file. The body of each function simply prints the event name along 745*4882a593Smuzhiyunwith its parameters. For example: 746*4882a593Smuzhiyun 747*4882a593Smuzhiyun.. code-block:: python 748*4882a593Smuzhiyun 749*4882a593Smuzhiyun def net__netif_rx(event_name, context, common_cpu, 750*4882a593Smuzhiyun common_secs, common_nsecs, common_pid, common_comm, 751*4882a593Smuzhiyun skbaddr, len, name): 752*4882a593Smuzhiyun print_header(event_name, common_cpu, common_secs, common_nsecs, 753*4882a593Smuzhiyun common_pid, common_comm) 754*4882a593Smuzhiyun 755*4882a593Smuzhiyun print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name), 756*4882a593Smuzhiyun 757*4882a593SmuzhiyunWe can run that script directly to print all of the events contained in the 758*4882a593Smuzhiyunperf.data file:: 759*4882a593Smuzhiyun 760*4882a593Smuzhiyun root@crownbay:~# perf script -s perf-script.py 761*4882a593Smuzhiyun 762*4882a593Smuzhiyun in trace_begin 763*4882a593Smuzhiyun syscalls__sys_exit_read 0 11624.857082795 1262 perf nr=3, ret=0 764*4882a593Smuzhiyun sched__sched_wakeup 0 11624.857193498 1262 perf comm=migration/0, pid=6, prio=0, success=1, target_cpu=0 765*4882a593Smuzhiyun irq__softirq_raise 1 11624.858021635 1262 wget vec=TIMER 766*4882a593Smuzhiyun irq__softirq_entry 1 11624.858074075 1262 wget vec=TIMER 767*4882a593Smuzhiyun irq__softirq_exit 1 11624.858081389 1262 wget vec=TIMER 768*4882a593Smuzhiyun syscalls__sys_enter_read 1 11624.858166434 1262 wget nr=3, fd=3, buf=3213019456, count=512 769*4882a593Smuzhiyun syscalls__sys_exit_read 1 11624.858177924 1262 wget nr=3, ret=512 770*4882a593Smuzhiyun skb__kfree_skb 1 11624.858878188 1262 wget skbaddr=3945041280, location=3243922184, protocol=0 771*4882a593Smuzhiyun skb__kfree_skb 1 11624.858945608 1262 wget skbaddr=3945037824, location=3243922184, protocol=0 772*4882a593Smuzhiyun irq__softirq_raise 1 11624.859020942 1262 wget vec=TIMER 773*4882a593Smuzhiyun irq__softirq_entry 1 11624.859076935 1262 wget vec=TIMER 774*4882a593Smuzhiyun irq__softirq_exit 1 11624.859083469 1262 wget vec=TIMER 775*4882a593Smuzhiyun syscalls__sys_enter_read 1 11624.859167565 1262 wget nr=3, fd=3, buf=3077701632, count=1024 776*4882a593Smuzhiyun syscalls__sys_exit_read 1 11624.859192533 1262 wget nr=3, ret=471 777*4882a593Smuzhiyun syscalls__sys_enter_read 1 11624.859228072 1262 wget nr=3, fd=3, buf=3077701632, count=1024 778*4882a593Smuzhiyun syscalls__sys_exit_read 1 11624.859233707 1262 wget nr=3, ret=0 779*4882a593Smuzhiyun syscalls__sys_enter_read 1 11624.859573008 1262 wget nr=3, fd=3, buf=3213018496, count=512 780*4882a593Smuzhiyun syscalls__sys_exit_read 1 11624.859584818 1262 wget nr=3, ret=512 781*4882a593Smuzhiyun syscalls__sys_enter_read 1 11624.859864562 1262 wget nr=3, fd=3, buf=3077701632, count=1024 782*4882a593Smuzhiyun syscalls__sys_exit_read 1 11624.859888770 1262 wget nr=3, ret=1024 783*4882a593Smuzhiyun syscalls__sys_enter_read 1 11624.859935140 1262 wget nr=3, fd=3, buf=3077701632, count=1024 784*4882a593Smuzhiyun syscalls__sys_exit_read 1 11624.859944032 1262 wget nr=3, ret=1024 785*4882a593Smuzhiyun 786*4882a593SmuzhiyunThat in itself isn't very useful; after all, we can accomplish pretty much the 787*4882a593Smuzhiyunsame thing by simply running 'perf script' without arguments in the same 788*4882a593Smuzhiyundirectory as the perf.data file. 789*4882a593Smuzhiyun 790*4882a593SmuzhiyunWe can however replace the print statements in the generated function 791*4882a593Smuzhiyunbodies with whatever we want, and thereby make it infinitely more 792*4882a593Smuzhiyunuseful. 793*4882a593Smuzhiyun 794*4882a593SmuzhiyunAs a simple example, let's just replace the print statements in the 795*4882a593Smuzhiyunfunction bodies with a simple function that does nothing but increment a 796*4882a593Smuzhiyunper-event count. When the program is run against a perf.data file, each 797*4882a593Smuzhiyuntime a particular event is encountered, a tally is incremented for that 798*4882a593Smuzhiyunevent. For example: 799*4882a593Smuzhiyun 800*4882a593Smuzhiyun.. code-block:: python 801*4882a593Smuzhiyun 802*4882a593Smuzhiyun def net__netif_rx(event_name, context, common_cpu, 803*4882a593Smuzhiyun common_secs, common_nsecs, common_pid, common_comm, 804*4882a593Smuzhiyun skbaddr, len, name): 805*4882a593Smuzhiyun inc_counts(event_name) 806*4882a593Smuzhiyun 807*4882a593SmuzhiyunEach event handler function in the generated code 808*4882a593Smuzhiyunis modified to do this. For convenience, we define a common function 809*4882a593Smuzhiyuncalled inc_counts() that each handler calls; inc_counts() simply tallies 810*4882a593Smuzhiyuna count for each event using the 'counts' hash, which is a specialized 811*4882a593Smuzhiyunhash function that does Perl-like autovivification, a capability that's 812*4882a593Smuzhiyunextremely useful for kinds of multi-level aggregation commonly used in 813*4882a593Smuzhiyunprocessing traces (see perf's documentation on the Python language 814*4882a593Smuzhiyunbinding for details): 815*4882a593Smuzhiyun 816*4882a593Smuzhiyun.. code-block:: python 817*4882a593Smuzhiyun 818*4882a593Smuzhiyun counts = autodict() 819*4882a593Smuzhiyun 820*4882a593Smuzhiyun def inc_counts(event_name): 821*4882a593Smuzhiyun try: 822*4882a593Smuzhiyun counts[event_name] += 1 823*4882a593Smuzhiyun except TypeError: 824*4882a593Smuzhiyun counts[event_name] = 1 825*4882a593Smuzhiyun 826*4882a593SmuzhiyunFinally, at the end of the trace processing run, we want to print the 827*4882a593Smuzhiyunresult of all the per-event tallies. For that, we use the special 828*4882a593Smuzhiyun'trace_end()' function: 829*4882a593Smuzhiyun 830*4882a593Smuzhiyun.. code-block:: python 831*4882a593Smuzhiyun 832*4882a593Smuzhiyun def trace_end(): 833*4882a593Smuzhiyun for event_name, count in counts.iteritems(): 834*4882a593Smuzhiyun print "%-40s %10s\n" % (event_name, count) 835*4882a593Smuzhiyun 836*4882a593SmuzhiyunThe end result is a summary of all the events recorded in the trace:: 837*4882a593Smuzhiyun 838*4882a593Smuzhiyun skb__skb_copy_datagram_iovec 13148 839*4882a593Smuzhiyun irq__softirq_entry 4796 840*4882a593Smuzhiyun irq__irq_handler_exit 3805 841*4882a593Smuzhiyun irq__softirq_exit 4795 842*4882a593Smuzhiyun syscalls__sys_enter_write 8990 843*4882a593Smuzhiyun net__net_dev_xmit 652 844*4882a593Smuzhiyun skb__kfree_skb 4047 845*4882a593Smuzhiyun sched__sched_wakeup 1155 846*4882a593Smuzhiyun irq__irq_handler_entry 3804 847*4882a593Smuzhiyun irq__softirq_raise 4799 848*4882a593Smuzhiyun net__net_dev_queue 652 849*4882a593Smuzhiyun syscalls__sys_enter_read 17599 850*4882a593Smuzhiyun net__netif_receive_skb 1743 851*4882a593Smuzhiyun syscalls__sys_exit_read 17598 852*4882a593Smuzhiyun net__netif_rx 2 853*4882a593Smuzhiyun napi__napi_poll 1877 854*4882a593Smuzhiyun syscalls__sys_exit_write 8990 855*4882a593Smuzhiyun 856*4882a593SmuzhiyunNote that this is 857*4882a593Smuzhiyunpretty much exactly the same information we get from 'perf stat', which 858*4882a593Smuzhiyungoes a little way to support the idea mentioned previously that given 859*4882a593Smuzhiyunthe right kind of trace data, higher-level profiling-type summaries can 860*4882a593Smuzhiyunbe derived from it. 861*4882a593Smuzhiyun 862*4882a593SmuzhiyunDocumentation on using the `'perf script' python 863*4882a593Smuzhiyunbinding <https://linux.die.net/man/1/perf-script-python>`__. 864*4882a593Smuzhiyun 865*4882a593SmuzhiyunSystem-Wide Tracing and Profiling 866*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 867*4882a593Smuzhiyun 868*4882a593SmuzhiyunThe examples so far have focused on tracing a particular program or 869*4882a593Smuzhiyunworkload - in other words, every profiling run has specified the program 870*4882a593Smuzhiyunto profile in the command-line e.g. 'perf record wget ...'. 871*4882a593Smuzhiyun 872*4882a593SmuzhiyunIt's also possible, and more interesting in many cases, to run a 873*4882a593Smuzhiyunsystem-wide profile or trace while running the workload in a separate 874*4882a593Smuzhiyunshell. 875*4882a593Smuzhiyun 876*4882a593SmuzhiyunTo do system-wide profiling or tracing, you typically use the -a flag to 877*4882a593Smuzhiyun'perf record'. 878*4882a593Smuzhiyun 879*4882a593SmuzhiyunTo demonstrate this, open up one window and start the profile using the 880*4882a593Smuzhiyun-a flag (press Ctrl-C to stop tracing):: 881*4882a593Smuzhiyun 882*4882a593Smuzhiyun root@crownbay:~# perf record -g -a 883*4882a593Smuzhiyun ^C[ perf record: Woken up 6 times to write data ] 884*4882a593Smuzhiyun [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ] 885*4882a593Smuzhiyun 886*4882a593SmuzhiyunIn another window, run the wget test:: 887*4882a593Smuzhiyun 888*4882a593Smuzhiyun root@crownbay:~# wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2 889*4882a593Smuzhiyun Connecting to downloads.yoctoproject.org (140.211.169.59:80) 890*4882a593Smuzhiyun linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA 891*4882a593Smuzhiyun 892*4882a593SmuzhiyunHere we see entries not only for our wget load, but for 893*4882a593Smuzhiyunother processes running on the system as well: 894*4882a593Smuzhiyun 895*4882a593Smuzhiyun.. image:: figures/perf-systemwide.png 896*4882a593Smuzhiyun :align: center 897*4882a593Smuzhiyun 898*4882a593SmuzhiyunIn the snapshot above, we can see callchains that originate in libc, and 899*4882a593Smuzhiyuna callchain from Xorg that demonstrates that we're using a proprietary X 900*4882a593Smuzhiyundriver in userspace (notice the presence of 'PVR' and some other 901*4882a593Smuzhiyununresolvable symbols in the expanded Xorg callchain). 902*4882a593Smuzhiyun 903*4882a593SmuzhiyunNote also that we have both kernel and userspace entries in the above 904*4882a593Smuzhiyunsnapshot. We can also tell perf to focus on userspace but providing a 905*4882a593Smuzhiyunmodifier, in this case 'u', to the 'cycles' hardware counter when we 906*4882a593Smuzhiyunrecord a profile:: 907*4882a593Smuzhiyun 908*4882a593Smuzhiyun root@crownbay:~# perf record -g -a -e cycles:u 909*4882a593Smuzhiyun ^C[ perf record: Woken up 2 times to write data ] 910*4882a593Smuzhiyun [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ] 911*4882a593Smuzhiyun 912*4882a593Smuzhiyun.. image:: figures/perf-report-cycles-u.png 913*4882a593Smuzhiyun :align: center 914*4882a593Smuzhiyun 915*4882a593SmuzhiyunNotice in the screenshot above, we see only userspace entries ([.]) 916*4882a593Smuzhiyun 917*4882a593SmuzhiyunFinally, we can press 'enter' on a leaf node and select the 'Zoom into 918*4882a593SmuzhiyunDSO' menu item to show only entries associated with a specific DSO. In 919*4882a593Smuzhiyunthe screenshot below, we've zoomed into the 'libc' DSO which shows all 920*4882a593Smuzhiyunthe entries associated with the libc-xxx.so DSO. 921*4882a593Smuzhiyun 922*4882a593Smuzhiyun.. image:: figures/perf-systemwide-libc.png 923*4882a593Smuzhiyun :align: center 924*4882a593Smuzhiyun 925*4882a593SmuzhiyunWe can also use the system-wide -a switch to do system-wide tracing. 926*4882a593SmuzhiyunHere we'll trace a couple of scheduler events:: 927*4882a593Smuzhiyun 928*4882a593Smuzhiyun root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup 929*4882a593Smuzhiyun ^C[ perf record: Woken up 38 times to write data ] 930*4882a593Smuzhiyun [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ] 931*4882a593Smuzhiyun 932*4882a593SmuzhiyunWe can look at the raw output using 'perf script' with no arguments:: 933*4882a593Smuzhiyun 934*4882a593Smuzhiyun root@crownbay:~# perf script 935*4882a593Smuzhiyun 936*4882a593Smuzhiyun perf 1383 [001] 6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 937*4882a593Smuzhiyun perf 1383 [001] 6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 938*4882a593Smuzhiyun kworker/1:1 21 [001] 6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 939*4882a593Smuzhiyun swapper 0 [000] 6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 940*4882a593Smuzhiyun swapper 0 [000] 6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 941*4882a593Smuzhiyun kworker/0:3 1209 [000] 6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 942*4882a593Smuzhiyun perf 1383 [001] 6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 943*4882a593Smuzhiyun perf 1383 [001] 6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 944*4882a593Smuzhiyun kworker/1:1 21 [001] 6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120 945*4882a593Smuzhiyun perf 1383 [001] 6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 946*4882a593Smuzhiyun 947*4882a593SmuzhiyunFiltering 948*4882a593Smuzhiyun^^^^^^^^^ 949*4882a593Smuzhiyun 950*4882a593SmuzhiyunNotice that there are a lot of events that don't really have anything to 951*4882a593Smuzhiyundo with what we're interested in, namely events that schedule 'perf' 952*4882a593Smuzhiyunitself in and out or that wake perf up. We can get rid of those by using 953*4882a593Smuzhiyunthe '--filter' option - for each event we specify using -e, we can add a 954*4882a593Smuzhiyun--filter after that to filter out trace events that contain fields with 955*4882a593Smuzhiyunspecific values:: 956*4882a593Smuzhiyun 957*4882a593Smuzhiyun root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf' 958*4882a593Smuzhiyun ^C[ perf record: Woken up 38 times to write data ] 959*4882a593Smuzhiyun [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ] 960*4882a593Smuzhiyun 961*4882a593Smuzhiyun 962*4882a593Smuzhiyun root@crownbay:~# perf script 963*4882a593Smuzhiyun 964*4882a593Smuzhiyun swapper 0 [000] 7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 965*4882a593Smuzhiyun kworker/0:3 1209 [000] 7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 966*4882a593Smuzhiyun perf 1407 [001] 7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 967*4882a593Smuzhiyun perf 1407 [001] 7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 968*4882a593Smuzhiyun perf 1407 [001] 7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 969*4882a593Smuzhiyun perf 1407 [001] 7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 970*4882a593Smuzhiyun perf 1407 [001] 7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 971*4882a593Smuzhiyun perf 1407 [001] 7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 972*4882a593Smuzhiyun swapper 0 [001] 7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001 973*4882a593Smuzhiyun swapper 0 [001] 7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120 974*4882a593Smuzhiyun kworker/1:1 21 [001] 7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120 975*4882a593Smuzhiyun swapper 0 [000] 7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000 976*4882a593Smuzhiyun swapper 0 [000] 7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120 977*4882a593Smuzhiyun kworker/0:3 1209 [000] 7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 978*4882a593Smuzhiyun 979*4882a593SmuzhiyunIn this case, we've filtered out all events that have 980*4882a593Smuzhiyun'perf' in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice that 981*4882a593Smuzhiyunthere are still events recorded for perf, but notice that those events 982*4882a593Smuzhiyundon't have values of 'perf' for the filtered fields. To completely 983*4882a593Smuzhiyunfilter out anything from perf will require a bit more work, but for the 984*4882a593Smuzhiyunpurpose of demonstrating how to use filters, it's close enough. 985*4882a593Smuzhiyun 986*4882a593Smuzhiyun.. admonition:: Tying it Together 987*4882a593Smuzhiyun 988*4882a593Smuzhiyun These are exactly the same set of event filters defined by the trace 989*4882a593Smuzhiyun event subsystem. See the ftrace/tracecmd/kernelshark section for more 990*4882a593Smuzhiyun discussion about these event filters. 991*4882a593Smuzhiyun 992*4882a593Smuzhiyun.. admonition:: Tying it Together 993*4882a593Smuzhiyun 994*4882a593Smuzhiyun These event filters are implemented by a special-purpose 995*4882a593Smuzhiyun pseudo-interpreter in the kernel and are an integral and 996*4882a593Smuzhiyun indispensable part of the perf design as it relates to tracing. 997*4882a593Smuzhiyun kernel-based event filters provide a mechanism to precisely throttle 998*4882a593Smuzhiyun the event stream that appears in user space, where it makes sense to 999*4882a593Smuzhiyun provide bindings to real programming languages for postprocessing the 1000*4882a593Smuzhiyun event stream. This architecture allows for the intelligent and 1001*4882a593Smuzhiyun flexible partitioning of processing between the kernel and user 1002*4882a593Smuzhiyun space. Contrast this with other tools such as SystemTap, which does 1003*4882a593Smuzhiyun all of its processing in the kernel and as such requires a special 1004*4882a593Smuzhiyun project-defined language in order to accommodate that design, or 1005*4882a593Smuzhiyun LTTng, where everything is sent to userspace and as such requires a 1006*4882a593Smuzhiyun super-efficient kernel-to-userspace transport mechanism in order to 1007*4882a593Smuzhiyun function properly. While perf certainly can benefit from for instance 1008*4882a593Smuzhiyun advances in the design of the transport, it doesn't fundamentally 1009*4882a593Smuzhiyun depend on them. Basically, if you find that your perf tracing 1010*4882a593Smuzhiyun application is causing buffer I/O overruns, it probably means that 1011*4882a593Smuzhiyun you aren't taking enough advantage of the kernel filtering engine. 1012*4882a593Smuzhiyun 1013*4882a593SmuzhiyunUsing Dynamic Tracepoints 1014*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~ 1015*4882a593Smuzhiyun 1016*4882a593Smuzhiyunperf isn't restricted to the fixed set of static tracepoints listed by 1017*4882a593Smuzhiyun'perf list'. Users can also add their own 'dynamic' tracepoints anywhere 1018*4882a593Smuzhiyunin the kernel. For instance, suppose we want to define our own 1019*4882a593Smuzhiyuntracepoint on do_fork(). We can do that using the 'perf probe' perf 1020*4882a593Smuzhiyunsubcommand:: 1021*4882a593Smuzhiyun 1022*4882a593Smuzhiyun root@crownbay:~# perf probe do_fork 1023*4882a593Smuzhiyun Added new event: 1024*4882a593Smuzhiyun probe:do_fork (on do_fork) 1025*4882a593Smuzhiyun 1026*4882a593Smuzhiyun You can now use it in all perf tools, such as: 1027*4882a593Smuzhiyun 1028*4882a593Smuzhiyun perf record -e probe:do_fork -aR sleep 1 1029*4882a593Smuzhiyun 1030*4882a593SmuzhiyunAdding a new tracepoint via 1031*4882a593Smuzhiyun'perf probe' results in an event with all the expected files and format 1032*4882a593Smuzhiyunin /sys/kernel/debug/tracing/events, just the same as for static 1033*4882a593Smuzhiyuntracepoints (as discussed in more detail in the trace events subsystem 1034*4882a593Smuzhiyunsection:: 1035*4882a593Smuzhiyun 1036*4882a593Smuzhiyun root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al 1037*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Oct 28 11:42 . 1038*4882a593Smuzhiyun drwxr-xr-x 3 root root 0 Oct 28 11:42 .. 1039*4882a593Smuzhiyun -rw-r--r-- 1 root root 0 Oct 28 11:42 enable 1040*4882a593Smuzhiyun -rw-r--r-- 1 root root 0 Oct 28 11:42 filter 1041*4882a593Smuzhiyun -r--r--r-- 1 root root 0 Oct 28 11:42 format 1042*4882a593Smuzhiyun -r--r--r-- 1 root root 0 Oct 28 11:42 id 1043*4882a593Smuzhiyun 1044*4882a593Smuzhiyun root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format 1045*4882a593Smuzhiyun name: do_fork 1046*4882a593Smuzhiyun ID: 944 1047*4882a593Smuzhiyun format: 1048*4882a593Smuzhiyun field:unsigned short common_type; offset:0; size:2; signed:0; 1049*4882a593Smuzhiyun field:unsigned char common_flags; offset:2; size:1; signed:0; 1050*4882a593Smuzhiyun field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 1051*4882a593Smuzhiyun field:int common_pid; offset:4; size:4; signed:1; 1052*4882a593Smuzhiyun field:int common_padding; offset:8; size:4; signed:1; 1053*4882a593Smuzhiyun 1054*4882a593Smuzhiyun field:unsigned long __probe_ip; offset:12; size:4; signed:0; 1055*4882a593Smuzhiyun 1056*4882a593Smuzhiyun print fmt: "(%lx)", REC->__probe_ip 1057*4882a593Smuzhiyun 1058*4882a593SmuzhiyunWe can list all dynamic tracepoints currently in 1059*4882a593Smuzhiyunexistence:: 1060*4882a593Smuzhiyun 1061*4882a593Smuzhiyun root@crownbay:~# perf probe -l 1062*4882a593Smuzhiyun probe:do_fork (on do_fork) 1063*4882a593Smuzhiyun probe:schedule (on schedule) 1064*4882a593Smuzhiyun 1065*4882a593SmuzhiyunLet's record system-wide ('sleep 30' is a 1066*4882a593Smuzhiyuntrick for recording system-wide but basically do nothing and then wake 1067*4882a593Smuzhiyunup after 30 seconds):: 1068*4882a593Smuzhiyun 1069*4882a593Smuzhiyun root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30 1070*4882a593Smuzhiyun [ perf record: Woken up 1 times to write data ] 1071*4882a593Smuzhiyun [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ] 1072*4882a593Smuzhiyun 1073*4882a593SmuzhiyunUsing 'perf script' we can see each do_fork event that fired:: 1074*4882a593Smuzhiyun 1075*4882a593Smuzhiyun root@crownbay:~# perf script 1076*4882a593Smuzhiyun 1077*4882a593Smuzhiyun # ======== 1078*4882a593Smuzhiyun # captured on: Sun Oct 28 11:55:18 2012 1079*4882a593Smuzhiyun # hostname : crownbay 1080*4882a593Smuzhiyun # os release : 3.4.11-yocto-standard 1081*4882a593Smuzhiyun # perf version : 3.4.11 1082*4882a593Smuzhiyun # arch : i686 1083*4882a593Smuzhiyun # nrcpus online : 2 1084*4882a593Smuzhiyun # nrcpus avail : 2 1085*4882a593Smuzhiyun # cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz 1086*4882a593Smuzhiyun # cpuid : GenuineIntel,6,38,1 1087*4882a593Smuzhiyun # total memory : 1017184 kB 1088*4882a593Smuzhiyun # cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30 1089*4882a593Smuzhiyun # event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern 1090*4882a593Smuzhiyun = 0, id = { 5, 6 } 1091*4882a593Smuzhiyun # HEADER_CPU_TOPOLOGY info available, use -I to display 1092*4882a593Smuzhiyun # ======== 1093*4882a593Smuzhiyun # 1094*4882a593Smuzhiyun matchbox-deskto 1197 [001] 34211.378318: do_fork: (c1028460) 1095*4882a593Smuzhiyun matchbox-deskto 1295 [001] 34211.380388: do_fork: (c1028460) 1096*4882a593Smuzhiyun pcmanfm 1296 [000] 34211.632350: do_fork: (c1028460) 1097*4882a593Smuzhiyun pcmanfm 1296 [000] 34211.639917: do_fork: (c1028460) 1098*4882a593Smuzhiyun matchbox-deskto 1197 [001] 34217.541603: do_fork: (c1028460) 1099*4882a593Smuzhiyun matchbox-deskto 1299 [001] 34217.543584: do_fork: (c1028460) 1100*4882a593Smuzhiyun gthumb 1300 [001] 34217.697451: do_fork: (c1028460) 1101*4882a593Smuzhiyun gthumb 1300 [001] 34219.085734: do_fork: (c1028460) 1102*4882a593Smuzhiyun gthumb 1300 [000] 34219.121351: do_fork: (c1028460) 1103*4882a593Smuzhiyun gthumb 1300 [001] 34219.264551: do_fork: (c1028460) 1104*4882a593Smuzhiyun pcmanfm 1296 [000] 34219.590380: do_fork: (c1028460) 1105*4882a593Smuzhiyun matchbox-deskto 1197 [001] 34224.955965: do_fork: (c1028460) 1106*4882a593Smuzhiyun matchbox-deskto 1306 [001] 34224.957972: do_fork: (c1028460) 1107*4882a593Smuzhiyun matchbox-termin 1307 [000] 34225.038214: do_fork: (c1028460) 1108*4882a593Smuzhiyun matchbox-termin 1307 [001] 34225.044218: do_fork: (c1028460) 1109*4882a593Smuzhiyun matchbox-termin 1307 [000] 34225.046442: do_fork: (c1028460) 1110*4882a593Smuzhiyun matchbox-deskto 1197 [001] 34237.112138: do_fork: (c1028460) 1111*4882a593Smuzhiyun matchbox-deskto 1311 [001] 34237.114106: do_fork: (c1028460) 1112*4882a593Smuzhiyun gaku 1312 [000] 34237.202388: do_fork: (c1028460) 1113*4882a593Smuzhiyun 1114*4882a593SmuzhiyunAnd using 'perf report' on the same file, we can see the 1115*4882a593Smuzhiyuncallgraphs from starting a few programs during those 30 seconds: 1116*4882a593Smuzhiyun 1117*4882a593Smuzhiyun.. image:: figures/perf-probe-do_fork-profile.png 1118*4882a593Smuzhiyun :align: center 1119*4882a593Smuzhiyun 1120*4882a593Smuzhiyun.. admonition:: Tying it Together 1121*4882a593Smuzhiyun 1122*4882a593Smuzhiyun The trace events subsystem accommodate static and dynamic tracepoints 1123*4882a593Smuzhiyun in exactly the same way - there's no difference as far as the 1124*4882a593Smuzhiyun infrastructure is concerned. See the ftrace section for more details 1125*4882a593Smuzhiyun on the trace event subsystem. 1126*4882a593Smuzhiyun 1127*4882a593Smuzhiyun.. admonition:: Tying it Together 1128*4882a593Smuzhiyun 1129*4882a593Smuzhiyun Dynamic tracepoints are implemented under the covers by kprobes and 1130*4882a593Smuzhiyun uprobes. kprobes and uprobes are also used by and in fact are the 1131*4882a593Smuzhiyun main focus of SystemTap. 1132*4882a593Smuzhiyun 1133*4882a593SmuzhiyunPerf Documentation 1134*4882a593Smuzhiyun------------------ 1135*4882a593Smuzhiyun 1136*4882a593SmuzhiyunOnline versions of the man pages for the commands discussed in this 1137*4882a593Smuzhiyunsection can be found here: 1138*4882a593Smuzhiyun 1139*4882a593Smuzhiyun- The `'perf stat' manpage <https://linux.die.net/man/1/perf-stat>`__. 1140*4882a593Smuzhiyun 1141*4882a593Smuzhiyun- The `'perf record' 1142*4882a593Smuzhiyun manpage <https://linux.die.net/man/1/perf-record>`__. 1143*4882a593Smuzhiyun 1144*4882a593Smuzhiyun- The `'perf report' 1145*4882a593Smuzhiyun manpage <https://linux.die.net/man/1/perf-report>`__. 1146*4882a593Smuzhiyun 1147*4882a593Smuzhiyun- The `'perf probe' manpage <https://linux.die.net/man/1/perf-probe>`__. 1148*4882a593Smuzhiyun 1149*4882a593Smuzhiyun- The `'perf script' 1150*4882a593Smuzhiyun manpage <https://linux.die.net/man/1/perf-script>`__. 1151*4882a593Smuzhiyun 1152*4882a593Smuzhiyun- Documentation on using the `'perf script' python 1153*4882a593Smuzhiyun binding <https://linux.die.net/man/1/perf-script-python>`__. 1154*4882a593Smuzhiyun 1155*4882a593Smuzhiyun- The top-level `perf(1) manpage <https://linux.die.net/man/1/perf>`__. 1156*4882a593Smuzhiyun 1157*4882a593SmuzhiyunNormally, you should be able to invoke the man pages via perf itself 1158*4882a593Smuzhiyune.g. 'perf help' or 'perf help record'. 1159*4882a593Smuzhiyun 1160*4882a593SmuzhiyunTo have the perf manpages installed on your target, modify your 1161*4882a593Smuzhiyunconfiguration as follows:: 1162*4882a593Smuzhiyun 1163*4882a593Smuzhiyun IMAGE_INSTALL:append = " perf perf-doc" 1164*4882a593Smuzhiyun DISTRO_FEATURES:append = " api-documentation" 1165*4882a593Smuzhiyun 1166*4882a593SmuzhiyunThe man pages in text form, along with some other files, such as a set 1167*4882a593Smuzhiyunof examples, can also be found in the 'perf' directory of the kernel tree:: 1168*4882a593Smuzhiyun 1169*4882a593Smuzhiyun tools/perf/Documentation 1170*4882a593Smuzhiyun 1171*4882a593SmuzhiyunThere's also a nice perf tutorial on the perf 1172*4882a593Smuzhiyunwiki that goes into more detail than we do here in certain areas: `Perf 1173*4882a593SmuzhiyunTutorial <https://perf.wiki.kernel.org/index.php/Tutorial>`__ 1174*4882a593Smuzhiyun 1175*4882a593Smuzhiyunftrace 1176*4882a593Smuzhiyun====== 1177*4882a593Smuzhiyun 1178*4882a593Smuzhiyun'ftrace' literally refers to the 'ftrace function tracer' but in reality 1179*4882a593Smuzhiyunthis encompasses a number of related tracers along with the 1180*4882a593Smuzhiyuninfrastructure that they all make use of. 1181*4882a593Smuzhiyun 1182*4882a593Smuzhiyunftrace Setup 1183*4882a593Smuzhiyun------------ 1184*4882a593Smuzhiyun 1185*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup 1186*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" section. 1187*4882a593Smuzhiyun 1188*4882a593Smuzhiyunftrace, trace-cmd, and kernelshark run on the target system, and are 1189*4882a593Smuzhiyunready to go out-of-the-box - no additional setup is necessary. For the 1190*4882a593Smuzhiyunrest of this section we assume you've ssh'ed to the host and will be 1191*4882a593Smuzhiyunrunning ftrace on the target. kernelshark is a GUI application and if 1192*4882a593Smuzhiyunyou use the '-X' option to ssh you can have the kernelshark GUI run on 1193*4882a593Smuzhiyunthe target but display remotely on the host if you want. 1194*4882a593Smuzhiyun 1195*4882a593SmuzhiyunBasic ftrace usage 1196*4882a593Smuzhiyun------------------ 1197*4882a593Smuzhiyun 1198*4882a593Smuzhiyun'ftrace' essentially refers to everything included in the /tracing 1199*4882a593Smuzhiyundirectory of the mounted debugfs filesystem (Yocto follows the standard 1200*4882a593Smuzhiyunconvention and mounts it at /sys/kernel/debug). Here's a listing of all 1201*4882a593Smuzhiyunthe files found in /sys/kernel/debug/tracing on a Yocto system:: 1202*4882a593Smuzhiyun 1203*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# ls 1204*4882a593Smuzhiyun README kprobe_events trace 1205*4882a593Smuzhiyun available_events kprobe_profile trace_clock 1206*4882a593Smuzhiyun available_filter_functions options trace_marker 1207*4882a593Smuzhiyun available_tracers per_cpu trace_options 1208*4882a593Smuzhiyun buffer_size_kb printk_formats trace_pipe 1209*4882a593Smuzhiyun buffer_total_size_kb saved_cmdlines tracing_cpumask 1210*4882a593Smuzhiyun current_tracer set_event tracing_enabled 1211*4882a593Smuzhiyun dyn_ftrace_total_info set_ftrace_filter tracing_on 1212*4882a593Smuzhiyun enabled_functions set_ftrace_notrace tracing_thresh 1213*4882a593Smuzhiyun events set_ftrace_pid 1214*4882a593Smuzhiyun free_buffer set_graph_function 1215*4882a593Smuzhiyun 1216*4882a593SmuzhiyunThe files listed above are used for various purposes 1217*4882a593Smuzhiyun- some relate directly to the tracers themselves, others are used to set 1218*4882a593Smuzhiyuntracing options, and yet others actually contain the tracing output when 1219*4882a593Smuzhiyuna tracer is in effect. Some of the functions can be guessed from their 1220*4882a593Smuzhiyunnames, others need explanation; in any case, we'll cover some of the 1221*4882a593Smuzhiyunfiles we see here below but for an explanation of the others, please see 1222*4882a593Smuzhiyunthe ftrace documentation. 1223*4882a593Smuzhiyun 1224*4882a593SmuzhiyunWe'll start by looking at some of the available built-in tracers. 1225*4882a593Smuzhiyun 1226*4882a593Smuzhiyuncat'ing the 'available_tracers' file lists the set of available tracers:: 1227*4882a593Smuzhiyun 1228*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers 1229*4882a593Smuzhiyun blk function_graph function nop 1230*4882a593Smuzhiyun 1231*4882a593SmuzhiyunThe 'current_tracer' file contains the tracer currently in effect:: 1232*4882a593Smuzhiyun 1233*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer 1234*4882a593Smuzhiyun nop 1235*4882a593Smuzhiyun 1236*4882a593SmuzhiyunThe above listing of current_tracer shows that the 1237*4882a593Smuzhiyun'nop' tracer is in effect, which is just another way of saying that 1238*4882a593Smuzhiyunthere's actually no tracer currently in effect. 1239*4882a593Smuzhiyun 1240*4882a593Smuzhiyunecho'ing one of the available_tracers into current_tracer makes the 1241*4882a593Smuzhiyunspecified tracer the current tracer:: 1242*4882a593Smuzhiyun 1243*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer 1244*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer 1245*4882a593Smuzhiyun function 1246*4882a593Smuzhiyun 1247*4882a593SmuzhiyunThe above sets the current tracer to be the 'function tracer'. This tracer 1248*4882a593Smuzhiyuntraces every function call in the kernel and makes it available as the 1249*4882a593Smuzhiyuncontents of the 'trace' file. Reading the 'trace' file lists the 1250*4882a593Smuzhiyuncurrently buffered function calls that have been traced by the function 1251*4882a593Smuzhiyuntracer:: 1252*4882a593Smuzhiyun 1253*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# cat trace | less 1254*4882a593Smuzhiyun 1255*4882a593Smuzhiyun # tracer: function 1256*4882a593Smuzhiyun # 1257*4882a593Smuzhiyun # entries-in-buffer/entries-written: 310629/766471 #P:8 1258*4882a593Smuzhiyun # 1259*4882a593Smuzhiyun # _-----=> irqs-off 1260*4882a593Smuzhiyun # / _----=> need-resched 1261*4882a593Smuzhiyun # | / _---=> hardirq/softirq 1262*4882a593Smuzhiyun # || / _--=> preempt-depth 1263*4882a593Smuzhiyun # ||| / delay 1264*4882a593Smuzhiyun # TASK-PID CPU# |||| TIMESTAMP FUNCTION 1265*4882a593Smuzhiyun # | | | |||| | | 1266*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867169: ktime_get_real <-intel_idle 1267*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867170: getnstimeofday <-ktime_get_real 1268*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867171: ns_to_timeval <-intel_idle 1269*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867171: ns_to_timespec <-ns_to_timeval 1270*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867172: smp_apic_timer_interrupt <-apic_timer_interrupt 1271*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867172: native_apic_mem_write <-smp_apic_timer_interrupt 1272*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867172: irq_enter <-smp_apic_timer_interrupt 1273*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867172: rcu_irq_enter <-irq_enter 1274*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867173: rcu_idle_exit_common.isra.33 <-rcu_irq_enter 1275*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867173: local_bh_disable <-irq_enter 1276*4882a593Smuzhiyun <idle>-0 [004] d..1 470.867173: add_preempt_count <-local_bh_disable 1277*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867174: tick_check_idle <-irq_enter 1278*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867174: tick_check_oneshot_broadcast <-tick_check_idle 1279*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867174: ktime_get <-tick_check_idle 1280*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867174: tick_nohz_stop_idle <-tick_check_idle 1281*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867175: update_ts_time_stats <-tick_nohz_stop_idle 1282*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867175: nr_iowait_cpu <-update_ts_time_stats 1283*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867175: tick_do_update_jiffies64 <-tick_check_idle 1284*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867175: _raw_spin_lock <-tick_do_update_jiffies64 1285*4882a593Smuzhiyun <idle>-0 [004] d.s1 470.867176: add_preempt_count <-_raw_spin_lock 1286*4882a593Smuzhiyun <idle>-0 [004] d.s2 470.867176: do_timer <-tick_do_update_jiffies64 1287*4882a593Smuzhiyun <idle>-0 [004] d.s2 470.867176: _raw_spin_lock <-do_timer 1288*4882a593Smuzhiyun <idle>-0 [004] d.s2 470.867176: add_preempt_count <-_raw_spin_lock 1289*4882a593Smuzhiyun <idle>-0 [004] d.s3 470.867177: ntp_tick_length <-do_timer 1290*4882a593Smuzhiyun <idle>-0 [004] d.s3 470.867177: _raw_spin_lock_irqsave <-ntp_tick_length 1291*4882a593Smuzhiyun . 1292*4882a593Smuzhiyun . 1293*4882a593Smuzhiyun . 1294*4882a593Smuzhiyun 1295*4882a593SmuzhiyunEach line in the trace above shows what was happening in the kernel on a given 1296*4882a593Smuzhiyuncpu, to the level of detail of function calls. Each entry shows the function 1297*4882a593Smuzhiyuncalled, followed by its caller (after the arrow). 1298*4882a593Smuzhiyun 1299*4882a593SmuzhiyunThe function tracer gives you an extremely detailed idea of what the 1300*4882a593Smuzhiyunkernel was doing at the point in time the trace was taken, and is a 1301*4882a593Smuzhiyungreat way to learn about how the kernel code works in a dynamic sense. 1302*4882a593Smuzhiyun 1303*4882a593Smuzhiyun.. admonition:: Tying it Together 1304*4882a593Smuzhiyun 1305*4882a593Smuzhiyun The ftrace function tracer is also available from within perf, as the 1306*4882a593Smuzhiyun ftrace:function tracepoint. 1307*4882a593Smuzhiyun 1308*4882a593SmuzhiyunIt is a little more difficult to follow the call chains than it needs to 1309*4882a593Smuzhiyunbe - luckily there's a variant of the function tracer that displays the 1310*4882a593Smuzhiyuncallchains explicitly, called the 'function_graph' tracer:: 1311*4882a593Smuzhiyun 1312*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# echo function_graph > current_tracer 1313*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# cat trace | less 1314*4882a593Smuzhiyun 1315*4882a593Smuzhiyun tracer: function_graph 1316*4882a593Smuzhiyun 1317*4882a593Smuzhiyun CPU DURATION FUNCTION CALLS 1318*4882a593Smuzhiyun | | | | | | | 1319*4882a593Smuzhiyun 7) 0.046 us | pick_next_task_fair(); 1320*4882a593Smuzhiyun 7) 0.043 us | pick_next_task_stop(); 1321*4882a593Smuzhiyun 7) 0.042 us | pick_next_task_rt(); 1322*4882a593Smuzhiyun 7) 0.032 us | pick_next_task_fair(); 1323*4882a593Smuzhiyun 7) 0.030 us | pick_next_task_idle(); 1324*4882a593Smuzhiyun 7) | _raw_spin_unlock_irq() { 1325*4882a593Smuzhiyun 7) 0.033 us | sub_preempt_count(); 1326*4882a593Smuzhiyun 7) 0.258 us | } 1327*4882a593Smuzhiyun 7) 0.032 us | sub_preempt_count(); 1328*4882a593Smuzhiyun 7) + 13.341 us | } /* __schedule */ 1329*4882a593Smuzhiyun 7) 0.095 us | } /* sub_preempt_count */ 1330*4882a593Smuzhiyun 7) | schedule() { 1331*4882a593Smuzhiyun 7) | __schedule() { 1332*4882a593Smuzhiyun 7) 0.060 us | add_preempt_count(); 1333*4882a593Smuzhiyun 7) 0.044 us | rcu_note_context_switch(); 1334*4882a593Smuzhiyun 7) | _raw_spin_lock_irq() { 1335*4882a593Smuzhiyun 7) 0.033 us | add_preempt_count(); 1336*4882a593Smuzhiyun 7) 0.247 us | } 1337*4882a593Smuzhiyun 7) | idle_balance() { 1338*4882a593Smuzhiyun 7) | _raw_spin_unlock() { 1339*4882a593Smuzhiyun 7) 0.031 us | sub_preempt_count(); 1340*4882a593Smuzhiyun 7) 0.246 us | } 1341*4882a593Smuzhiyun 7) | update_shares() { 1342*4882a593Smuzhiyun 7) 0.030 us | __rcu_read_lock(); 1343*4882a593Smuzhiyun 7) 0.029 us | __rcu_read_unlock(); 1344*4882a593Smuzhiyun 7) 0.484 us | } 1345*4882a593Smuzhiyun 7) 0.030 us | __rcu_read_lock(); 1346*4882a593Smuzhiyun 7) | load_balance() { 1347*4882a593Smuzhiyun 7) | find_busiest_group() { 1348*4882a593Smuzhiyun 7) 0.031 us | idle_cpu(); 1349*4882a593Smuzhiyun 7) 0.029 us | idle_cpu(); 1350*4882a593Smuzhiyun 7) 0.035 us | idle_cpu(); 1351*4882a593Smuzhiyun 7) 0.906 us | } 1352*4882a593Smuzhiyun 7) 1.141 us | } 1353*4882a593Smuzhiyun 7) 0.022 us | msecs_to_jiffies(); 1354*4882a593Smuzhiyun 7) | load_balance() { 1355*4882a593Smuzhiyun 7) | find_busiest_group() { 1356*4882a593Smuzhiyun 7) 0.031 us | idle_cpu(); 1357*4882a593Smuzhiyun . 1358*4882a593Smuzhiyun . 1359*4882a593Smuzhiyun . 1360*4882a593Smuzhiyun 4) 0.062 us | msecs_to_jiffies(); 1361*4882a593Smuzhiyun 4) 0.062 us | __rcu_read_unlock(); 1362*4882a593Smuzhiyun 4) | _raw_spin_lock() { 1363*4882a593Smuzhiyun 4) 0.073 us | add_preempt_count(); 1364*4882a593Smuzhiyun 4) 0.562 us | } 1365*4882a593Smuzhiyun 4) + 17.452 us | } 1366*4882a593Smuzhiyun 4) 0.108 us | put_prev_task_fair(); 1367*4882a593Smuzhiyun 4) 0.102 us | pick_next_task_fair(); 1368*4882a593Smuzhiyun 4) 0.084 us | pick_next_task_stop(); 1369*4882a593Smuzhiyun 4) 0.075 us | pick_next_task_rt(); 1370*4882a593Smuzhiyun 4) 0.062 us | pick_next_task_fair(); 1371*4882a593Smuzhiyun 4) 0.066 us | pick_next_task_idle(); 1372*4882a593Smuzhiyun ------------------------------------------ 1373*4882a593Smuzhiyun 4) kworker-74 => <idle>-0 1374*4882a593Smuzhiyun ------------------------------------------ 1375*4882a593Smuzhiyun 1376*4882a593Smuzhiyun 4) | finish_task_switch() { 1377*4882a593Smuzhiyun 4) | _raw_spin_unlock_irq() { 1378*4882a593Smuzhiyun 4) 0.100 us | sub_preempt_count(); 1379*4882a593Smuzhiyun 4) 0.582 us | } 1380*4882a593Smuzhiyun 4) 1.105 us | } 1381*4882a593Smuzhiyun 4) 0.088 us | sub_preempt_count(); 1382*4882a593Smuzhiyun 4) ! 100.066 us | } 1383*4882a593Smuzhiyun . 1384*4882a593Smuzhiyun . 1385*4882a593Smuzhiyun . 1386*4882a593Smuzhiyun 3) | sys_ioctl() { 1387*4882a593Smuzhiyun 3) 0.083 us | fget_light(); 1388*4882a593Smuzhiyun 3) | security_file_ioctl() { 1389*4882a593Smuzhiyun 3) 0.066 us | cap_file_ioctl(); 1390*4882a593Smuzhiyun 3) 0.562 us | } 1391*4882a593Smuzhiyun 3) | do_vfs_ioctl() { 1392*4882a593Smuzhiyun 3) | drm_ioctl() { 1393*4882a593Smuzhiyun 3) 0.075 us | drm_ut_debug_printk(); 1394*4882a593Smuzhiyun 3) | i915_gem_pwrite_ioctl() { 1395*4882a593Smuzhiyun 3) | i915_mutex_lock_interruptible() { 1396*4882a593Smuzhiyun 3) 0.070 us | mutex_lock_interruptible(); 1397*4882a593Smuzhiyun 3) 0.570 us | } 1398*4882a593Smuzhiyun 3) | drm_gem_object_lookup() { 1399*4882a593Smuzhiyun 3) | _raw_spin_lock() { 1400*4882a593Smuzhiyun 3) 0.080 us | add_preempt_count(); 1401*4882a593Smuzhiyun 3) 0.620 us | } 1402*4882a593Smuzhiyun 3) | _raw_spin_unlock() { 1403*4882a593Smuzhiyun 3) 0.085 us | sub_preempt_count(); 1404*4882a593Smuzhiyun 3) 0.562 us | } 1405*4882a593Smuzhiyun 3) 2.149 us | } 1406*4882a593Smuzhiyun 3) 0.133 us | i915_gem_object_pin(); 1407*4882a593Smuzhiyun 3) | i915_gem_object_set_to_gtt_domain() { 1408*4882a593Smuzhiyun 3) 0.065 us | i915_gem_object_flush_gpu_write_domain(); 1409*4882a593Smuzhiyun 3) 0.065 us | i915_gem_object_wait_rendering(); 1410*4882a593Smuzhiyun 3) 0.062 us | i915_gem_object_flush_cpu_write_domain(); 1411*4882a593Smuzhiyun 3) 1.612 us | } 1412*4882a593Smuzhiyun 3) | i915_gem_object_put_fence() { 1413*4882a593Smuzhiyun 3) 0.097 us | i915_gem_object_flush_fence.constprop.36(); 1414*4882a593Smuzhiyun 3) 0.645 us | } 1415*4882a593Smuzhiyun 3) 0.070 us | add_preempt_count(); 1416*4882a593Smuzhiyun 3) 0.070 us | sub_preempt_count(); 1417*4882a593Smuzhiyun 3) 0.073 us | i915_gem_object_unpin(); 1418*4882a593Smuzhiyun 3) 0.068 us | mutex_unlock(); 1419*4882a593Smuzhiyun 3) 9.924 us | } 1420*4882a593Smuzhiyun 3) + 11.236 us | } 1421*4882a593Smuzhiyun 3) + 11.770 us | } 1422*4882a593Smuzhiyun 3) + 13.784 us | } 1423*4882a593Smuzhiyun 3) | sys_ioctl() { 1424*4882a593Smuzhiyun 1425*4882a593SmuzhiyunAs you can see, the function_graph display is much easier 1426*4882a593Smuzhiyunto follow. Also note that in addition to the function calls and 1427*4882a593Smuzhiyunassociated braces, other events such as scheduler events are displayed 1428*4882a593Smuzhiyunin context. In fact, you can freely include any tracepoint available in 1429*4882a593Smuzhiyunthe trace events subsystem described in the next section by simply 1430*4882a593Smuzhiyunenabling those events, and they'll appear in context in the function 1431*4882a593Smuzhiyungraph display. Quite a powerful tool for understanding kernel dynamics. 1432*4882a593Smuzhiyun 1433*4882a593SmuzhiyunAlso notice that there are various annotations on the left hand side of 1434*4882a593Smuzhiyunthe display. For example if the total time it took for a given function 1435*4882a593Smuzhiyunto execute is above a certain threshold, an exclamation point or plus 1436*4882a593Smuzhiyunsign appears on the left hand side. Please see the ftrace documentation 1437*4882a593Smuzhiyunfor details on all these fields. 1438*4882a593Smuzhiyun 1439*4882a593SmuzhiyunThe 'trace events' Subsystem 1440*4882a593Smuzhiyun---------------------------- 1441*4882a593Smuzhiyun 1442*4882a593SmuzhiyunOne especially important directory contained within the 1443*4882a593Smuzhiyun/sys/kernel/debug/tracing directory is the 'events' subdirectory, which 1444*4882a593Smuzhiyuncontains representations of every tracepoint in the system. Listing out 1445*4882a593Smuzhiyunthe contents of the 'events' subdirectory, we see mainly another set of 1446*4882a593Smuzhiyunsubdirectories:: 1447*4882a593Smuzhiyun 1448*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# cd events 1449*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing/events# ls -al 1450*4882a593Smuzhiyun drwxr-xr-x 38 root root 0 Nov 14 23:19 . 1451*4882a593Smuzhiyun drwxr-xr-x 5 root root 0 Nov 14 23:19 .. 1452*4882a593Smuzhiyun drwxr-xr-x 19 root root 0 Nov 14 23:19 block 1453*4882a593Smuzhiyun drwxr-xr-x 32 root root 0 Nov 14 23:19 btrfs 1454*4882a593Smuzhiyun drwxr-xr-x 5 root root 0 Nov 14 23:19 drm 1455*4882a593Smuzhiyun -rw-r--r-- 1 root root 0 Nov 14 23:19 enable 1456*4882a593Smuzhiyun drwxr-xr-x 40 root root 0 Nov 14 23:19 ext3 1457*4882a593Smuzhiyun drwxr-xr-x 79 root root 0 Nov 14 23:19 ext4 1458*4882a593Smuzhiyun drwxr-xr-x 14 root root 0 Nov 14 23:19 ftrace 1459*4882a593Smuzhiyun drwxr-xr-x 8 root root 0 Nov 14 23:19 hda 1460*4882a593Smuzhiyun -r--r--r-- 1 root root 0 Nov 14 23:19 header_event 1461*4882a593Smuzhiyun -r--r--r-- 1 root root 0 Nov 14 23:19 header_page 1462*4882a593Smuzhiyun drwxr-xr-x 25 root root 0 Nov 14 23:19 i915 1463*4882a593Smuzhiyun drwxr-xr-x 7 root root 0 Nov 14 23:19 irq 1464*4882a593Smuzhiyun drwxr-xr-x 12 root root 0 Nov 14 23:19 jbd 1465*4882a593Smuzhiyun drwxr-xr-x 14 root root 0 Nov 14 23:19 jbd2 1466*4882a593Smuzhiyun drwxr-xr-x 14 root root 0 Nov 14 23:19 kmem 1467*4882a593Smuzhiyun drwxr-xr-x 7 root root 0 Nov 14 23:19 module 1468*4882a593Smuzhiyun drwxr-xr-x 3 root root 0 Nov 14 23:19 napi 1469*4882a593Smuzhiyun drwxr-xr-x 6 root root 0 Nov 14 23:19 net 1470*4882a593Smuzhiyun drwxr-xr-x 3 root root 0 Nov 14 23:19 oom 1471*4882a593Smuzhiyun drwxr-xr-x 12 root root 0 Nov 14 23:19 power 1472*4882a593Smuzhiyun drwxr-xr-x 3 root root 0 Nov 14 23:19 printk 1473*4882a593Smuzhiyun drwxr-xr-x 8 root root 0 Nov 14 23:19 random 1474*4882a593Smuzhiyun drwxr-xr-x 4 root root 0 Nov 14 23:19 raw_syscalls 1475*4882a593Smuzhiyun drwxr-xr-x 3 root root 0 Nov 14 23:19 rcu 1476*4882a593Smuzhiyun drwxr-xr-x 6 root root 0 Nov 14 23:19 rpm 1477*4882a593Smuzhiyun drwxr-xr-x 20 root root 0 Nov 14 23:19 sched 1478*4882a593Smuzhiyun drwxr-xr-x 7 root root 0 Nov 14 23:19 scsi 1479*4882a593Smuzhiyun drwxr-xr-x 4 root root 0 Nov 14 23:19 signal 1480*4882a593Smuzhiyun drwxr-xr-x 5 root root 0 Nov 14 23:19 skb 1481*4882a593Smuzhiyun drwxr-xr-x 4 root root 0 Nov 14 23:19 sock 1482*4882a593Smuzhiyun drwxr-xr-x 10 root root 0 Nov 14 23:19 sunrpc 1483*4882a593Smuzhiyun drwxr-xr-x 538 root root 0 Nov 14 23:19 syscalls 1484*4882a593Smuzhiyun drwxr-xr-x 4 root root 0 Nov 14 23:19 task 1485*4882a593Smuzhiyun drwxr-xr-x 14 root root 0 Nov 14 23:19 timer 1486*4882a593Smuzhiyun drwxr-xr-x 3 root root 0 Nov 14 23:19 udp 1487*4882a593Smuzhiyun drwxr-xr-x 21 root root 0 Nov 14 23:19 vmscan 1488*4882a593Smuzhiyun drwxr-xr-x 3 root root 0 Nov 14 23:19 vsyscall 1489*4882a593Smuzhiyun drwxr-xr-x 6 root root 0 Nov 14 23:19 workqueue 1490*4882a593Smuzhiyun drwxr-xr-x 26 root root 0 Nov 14 23:19 writeback 1491*4882a593Smuzhiyun 1492*4882a593SmuzhiyunEach one of these subdirectories 1493*4882a593Smuzhiyuncorresponds to a 'subsystem' and contains yet again more subdirectories, 1494*4882a593Smuzhiyuneach one of those finally corresponding to a tracepoint. For example, 1495*4882a593Smuzhiyunhere are the contents of the 'kmem' subsystem:: 1496*4882a593Smuzhiyun 1497*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem 1498*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al 1499*4882a593Smuzhiyun drwxr-xr-x 14 root root 0 Nov 14 23:19 . 1500*4882a593Smuzhiyun drwxr-xr-x 38 root root 0 Nov 14 23:19 .. 1501*4882a593Smuzhiyun -rw-r--r-- 1 root root 0 Nov 14 23:19 enable 1502*4882a593Smuzhiyun -rw-r--r-- 1 root root 0 Nov 14 23:19 filter 1503*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 kfree 1504*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc 1505*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 kmalloc_node 1506*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc 1507*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_alloc_node 1508*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 kmem_cache_free 1509*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc 1510*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_extfrag 1511*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_alloc_zone_locked 1512*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free 1513*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_free_batched 1514*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 mm_page_pcpu_drain 1515*4882a593Smuzhiyun 1516*4882a593SmuzhiyunLet's see what's inside the subdirectory for a 1517*4882a593Smuzhiyunspecific tracepoint, in this case the one for kmalloc:: 1518*4882a593Smuzhiyun 1519*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc 1520*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al 1521*4882a593Smuzhiyun drwxr-xr-x 2 root root 0 Nov 14 23:19 . 1522*4882a593Smuzhiyun drwxr-xr-x 14 root root 0 Nov 14 23:19 .. 1523*4882a593Smuzhiyun -rw-r--r-- 1 root root 0 Nov 14 23:19 enable 1524*4882a593Smuzhiyun -rw-r--r-- 1 root root 0 Nov 14 23:19 filter 1525*4882a593Smuzhiyun -r--r--r-- 1 root root 0 Nov 14 23:19 format 1526*4882a593Smuzhiyun -r--r--r-- 1 root root 0 Nov 14 23:19 id 1527*4882a593Smuzhiyun 1528*4882a593SmuzhiyunThe 'format' file for the 1529*4882a593Smuzhiyuntracepoint describes the event in memory, which is used by the various 1530*4882a593Smuzhiyuntracing tools that now make use of these tracepoint to parse the event 1531*4882a593Smuzhiyunand make sense of it, along with a 'print fmt' field that allows tools 1532*4882a593Smuzhiyunlike ftrace to display the event as text. Here's what the format of the 1533*4882a593Smuzhiyunkmalloc event looks like:: 1534*4882a593Smuzhiyun 1535*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format 1536*4882a593Smuzhiyun name: kmalloc 1537*4882a593Smuzhiyun ID: 313 1538*4882a593Smuzhiyun format: 1539*4882a593Smuzhiyun field:unsigned short common_type; offset:0; size:2; signed:0; 1540*4882a593Smuzhiyun field:unsigned char common_flags; offset:2; size:1; signed:0; 1541*4882a593Smuzhiyun field:unsigned char common_preempt_count; offset:3; size:1; signed:0; 1542*4882a593Smuzhiyun field:int common_pid; offset:4; size:4; signed:1; 1543*4882a593Smuzhiyun field:int common_padding; offset:8; size:4; signed:1; 1544*4882a593Smuzhiyun 1545*4882a593Smuzhiyun field:unsigned long call_site; offset:16; size:8; signed:0; 1546*4882a593Smuzhiyun field:const void * ptr; offset:24; size:8; signed:0; 1547*4882a593Smuzhiyun field:size_t bytes_req; offset:32; size:8; signed:0; 1548*4882a593Smuzhiyun field:size_t bytes_alloc; offset:40; size:8; signed:0; 1549*4882a593Smuzhiyun field:gfp_t gfp_flags; offset:48; size:4; signed:0; 1550*4882a593Smuzhiyun 1551*4882a593Smuzhiyun print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc, 1552*4882a593Smuzhiyun (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( 1553*4882a593Smuzhiyun gfp_t)0x20000u) | (( gfp_t)0x02u) | (( gfp_t)0x08u)) | (( gfp_t)0x4000u) | (( gfp_t)0x10000u) | (( gfp_t)0x1000u) | (( gfp_t)0x200u) | (( 1554*4882a593Smuzhiyun gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | (( 1555*4882a593Smuzhiyun gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( 1556*4882a593Smuzhiyun gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( 1557*4882a593Smuzhiyun gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), GFP_TEMPORARY"}, 1558*4882a593Smuzhiyun {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)), 1559*4882a593Smuzhiyun "GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)(( 1560*4882a593Smuzhiyun gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)(( 1561*4882a593Smuzhiyun gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned 1562*4882a593Smuzhiyun long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"}, {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"}, 1563*4882a593Smuzhiyun {(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u), 1564*4882a593Smuzhiyun "GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned 1565*4882a593Smuzhiyun long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"}, 1566*4882a593Smuzhiyun {(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT" 1567*4882a593Smuzhiyun 1568*4882a593SmuzhiyunThe 'enable' file 1569*4882a593Smuzhiyunin the tracepoint directory is what allows the user (or tools such as 1570*4882a593Smuzhiyuntrace-cmd) to actually turn the tracepoint on and off. When enabled, the 1571*4882a593Smuzhiyuncorresponding tracepoint will start appearing in the ftrace 'trace' file 1572*4882a593Smuzhiyundescribed previously. For example, this turns on the kmalloc tracepoint:: 1573*4882a593Smuzhiyun 1574*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable 1575*4882a593Smuzhiyun 1576*4882a593SmuzhiyunAt the moment, we're not interested in the function tracer or 1577*4882a593Smuzhiyunsome other tracer that might be in effect, so we first turn it off, but 1578*4882a593Smuzhiyunif we do that, we still need to turn tracing on in order to see the 1579*4882a593Smuzhiyunevents in the output buffer:: 1580*4882a593Smuzhiyun 1581*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer 1582*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on 1583*4882a593Smuzhiyun 1584*4882a593SmuzhiyunNow, if we look at the 'trace' file, we see nothing 1585*4882a593Smuzhiyunbut the kmalloc events we just turned on:: 1586*4882a593Smuzhiyun 1587*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing# cat trace | less 1588*4882a593Smuzhiyun # tracer: nop 1589*4882a593Smuzhiyun # 1590*4882a593Smuzhiyun # entries-in-buffer/entries-written: 1897/1897 #P:8 1591*4882a593Smuzhiyun # 1592*4882a593Smuzhiyun # _-----=> irqs-off 1593*4882a593Smuzhiyun # / _----=> need-resched 1594*4882a593Smuzhiyun # | / _---=> hardirq/softirq 1595*4882a593Smuzhiyun # || / _--=> preempt-depth 1596*4882a593Smuzhiyun # ||| / delay 1597*4882a593Smuzhiyun # TASK-PID CPU# |||| TIMESTAMP FUNCTION 1598*4882a593Smuzhiyun # | | | |||| | | 1599*4882a593Smuzhiyun dropbear-1465 [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL 1600*4882a593Smuzhiyun <idle>-0 [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1601*4882a593Smuzhiyun <idle>-0 [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1602*4882a593Smuzhiyun matchbox-termin-1361 [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT 1603*4882a593Smuzhiyun Xorg-1264 [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY 1604*4882a593Smuzhiyun Xorg-1264 [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO 1605*4882a593Smuzhiyun Xorg-1264 [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO 1606*4882a593Smuzhiyun matchbox-termin-1361 [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT 1607*4882a593Smuzhiyun Xorg-1264 [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY 1608*4882a593Smuzhiyun Xorg-1264 [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO 1609*4882a593Smuzhiyun Xorg-1264 [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO 1610*4882a593Smuzhiyun <idle>-0 [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1611*4882a593Smuzhiyun dropbear-1465 [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL 1612*4882a593Smuzhiyun <idle>-0 [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1613*4882a593Smuzhiyun <idle>-0 [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1614*4882a593Smuzhiyun dropbear-1465 [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL 1615*4882a593Smuzhiyun <idle>-0 [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1616*4882a593Smuzhiyun <idle>-0 [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1617*4882a593Smuzhiyun dropbear-1465 [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL 1618*4882a593Smuzhiyun <idle>-0 [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1619*4882a593Smuzhiyun matchbox-termin-1361 [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT 1620*4882a593Smuzhiyun Xorg-1264 [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY 1621*4882a593Smuzhiyun Xorg-1264 [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO 1622*4882a593Smuzhiyun Xorg-1264 [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO 1623*4882a593Smuzhiyun <idle>-0 [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1624*4882a593Smuzhiyun dropbear-1465 [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL 1625*4882a593Smuzhiyun <idle>-0 [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1626*4882a593Smuzhiyun <idle>-0 [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1627*4882a593Smuzhiyun dropbear-1465 [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_http://rostedt.homelinux.com/kernelshark/req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL 1628*4882a593Smuzhiyun <idle>-0 [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC 1629*4882a593Smuzhiyun matchbox-termin-1361 [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT 1630*4882a593Smuzhiyun 1631*4882a593SmuzhiyunTo again disable the kmalloc event, we need to send 0 to the enable file:: 1632*4882a593Smuzhiyun 1633*4882a593Smuzhiyun root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable 1634*4882a593Smuzhiyun 1635*4882a593SmuzhiyunYou can enable any number of events or complete subsystems (by 1636*4882a593Smuzhiyunusing the 'enable' file in the subsystem directory) and get an 1637*4882a593Smuzhiyunarbitrarily fine-grained idea of what's going on in the system by 1638*4882a593Smuzhiyunenabling as many of the appropriate tracepoints as applicable. 1639*4882a593Smuzhiyun 1640*4882a593SmuzhiyunA number of the tools described in this HOWTO do just that, including 1641*4882a593Smuzhiyuntrace-cmd and kernelshark in the next section. 1642*4882a593Smuzhiyun 1643*4882a593Smuzhiyun.. admonition:: Tying it Together 1644*4882a593Smuzhiyun 1645*4882a593Smuzhiyun These tracepoints and their representation are used not only by 1646*4882a593Smuzhiyun ftrace, but by many of the other tools covered in this document and 1647*4882a593Smuzhiyun they form a central point of integration for the various tracers 1648*4882a593Smuzhiyun available in Linux. They form a central part of the instrumentation 1649*4882a593Smuzhiyun for the following tools: perf, lttng, ftrace, blktrace and SystemTap 1650*4882a593Smuzhiyun 1651*4882a593Smuzhiyun.. admonition:: Tying it Together 1652*4882a593Smuzhiyun 1653*4882a593Smuzhiyun Eventually all the special-purpose tracers currently available in 1654*4882a593Smuzhiyun /sys/kernel/debug/tracing will be removed and replaced with 1655*4882a593Smuzhiyun equivalent tracers based on the 'trace events' subsystem. 1656*4882a593Smuzhiyun 1657*4882a593Smuzhiyuntrace-cmd/kernelshark 1658*4882a593Smuzhiyun--------------------- 1659*4882a593Smuzhiyun 1660*4882a593Smuzhiyuntrace-cmd is essentially an extensive command-line 'wrapper' interface 1661*4882a593Smuzhiyunthat hides the details of all the individual files in 1662*4882a593Smuzhiyun/sys/kernel/debug/tracing, allowing users to specify specific particular 1663*4882a593Smuzhiyunevents within the /sys/kernel/debug/tracing/events/ subdirectory and to 1664*4882a593Smuzhiyuncollect traces and avoid having to deal with those details directly. 1665*4882a593Smuzhiyun 1666*4882a593SmuzhiyunAs yet another layer on top of that, kernelshark provides a GUI that 1667*4882a593Smuzhiyunallows users to start and stop traces and specify sets of events using 1668*4882a593Smuzhiyunan intuitive interface, and view the output as both trace events and as 1669*4882a593Smuzhiyuna per-CPU graphical display. It directly uses 'trace-cmd' as the 1670*4882a593Smuzhiyunplumbing that accomplishes all that underneath the covers (and actually 1671*4882a593Smuzhiyundisplays the trace-cmd command it uses, as we'll see). 1672*4882a593Smuzhiyun 1673*4882a593SmuzhiyunTo start a trace using kernelshark, first start kernelshark:: 1674*4882a593Smuzhiyun 1675*4882a593Smuzhiyun root@sugarbay:~# kernelshark 1676*4882a593Smuzhiyun 1677*4882a593SmuzhiyunThen bring up the 'Capture' dialog by 1678*4882a593Smuzhiyunchoosing from the kernelshark menu:: 1679*4882a593Smuzhiyun 1680*4882a593Smuzhiyun Capture | Record 1681*4882a593Smuzhiyun 1682*4882a593SmuzhiyunThat will display the following dialog, which allows you to choose one or more 1683*4882a593Smuzhiyunevents (or even one or more complete subsystems) to trace: 1684*4882a593Smuzhiyun 1685*4882a593Smuzhiyun.. image:: figures/kernelshark-choose-events.png 1686*4882a593Smuzhiyun :align: center 1687*4882a593Smuzhiyun 1688*4882a593SmuzhiyunNote that these are exactly the same sets of events described in the 1689*4882a593Smuzhiyunprevious trace events subsystem section, and in fact is where trace-cmd 1690*4882a593Smuzhiyungets them for kernelshark. 1691*4882a593Smuzhiyun 1692*4882a593SmuzhiyunIn the above screenshot, we've decided to explore the graphics subsystem 1693*4882a593Smuzhiyuna bit and so have chosen to trace all the tracepoints contained within 1694*4882a593Smuzhiyunthe 'i915' and 'drm' subsystems. 1695*4882a593Smuzhiyun 1696*4882a593SmuzhiyunAfter doing that, we can start and stop the trace using the 'Run' and 1697*4882a593Smuzhiyun'Stop' button on the lower right corner of the dialog (the same button 1698*4882a593Smuzhiyunwill turn into the 'Stop' button after the trace has started): 1699*4882a593Smuzhiyun 1700*4882a593Smuzhiyun.. image:: figures/kernelshark-output-display.png 1701*4882a593Smuzhiyun :align: center 1702*4882a593Smuzhiyun 1703*4882a593SmuzhiyunNotice that the right-hand pane shows the exact trace-cmd command-line 1704*4882a593Smuzhiyunthat's used to run the trace, along with the results of the trace-cmd 1705*4882a593Smuzhiyunrun. 1706*4882a593Smuzhiyun 1707*4882a593SmuzhiyunOnce the 'Stop' button is pressed, the graphical view magically fills up 1708*4882a593Smuzhiyunwith a colorful per-cpu display of the trace data, along with the 1709*4882a593Smuzhiyundetailed event listing below that: 1710*4882a593Smuzhiyun 1711*4882a593Smuzhiyun.. image:: figures/kernelshark-i915-display.png 1712*4882a593Smuzhiyun :align: center 1713*4882a593Smuzhiyun 1714*4882a593SmuzhiyunHere's another example, this time a display resulting from tracing 'all 1715*4882a593Smuzhiyunevents': 1716*4882a593Smuzhiyun 1717*4882a593Smuzhiyun.. image:: figures/kernelshark-all.png 1718*4882a593Smuzhiyun :align: center 1719*4882a593Smuzhiyun 1720*4882a593SmuzhiyunThe tool is pretty self-explanatory, but for more detailed information 1721*4882a593Smuzhiyunon navigating through the data, see the `kernelshark 1722*4882a593Smuzhiyunwebsite <https://kernelshark.org/Documentation.html>`__. 1723*4882a593Smuzhiyun 1724*4882a593Smuzhiyunftrace Documentation 1725*4882a593Smuzhiyun-------------------- 1726*4882a593Smuzhiyun 1727*4882a593SmuzhiyunThe documentation for ftrace can be found in the kernel Documentation 1728*4882a593Smuzhiyundirectory:: 1729*4882a593Smuzhiyun 1730*4882a593Smuzhiyun Documentation/trace/ftrace.txt 1731*4882a593Smuzhiyun 1732*4882a593SmuzhiyunThe documentation for the trace event subsystem can also be found in the kernel 1733*4882a593SmuzhiyunDocumentation directory:: 1734*4882a593Smuzhiyun 1735*4882a593Smuzhiyun Documentation/trace/events.txt 1736*4882a593Smuzhiyun 1737*4882a593SmuzhiyunThere is a nice series of articles on using ftrace and trace-cmd at LWN: 1738*4882a593Smuzhiyun 1739*4882a593Smuzhiyun- `Debugging the kernel using Ftrace - part 1740*4882a593Smuzhiyun 1 <https://lwn.net/Articles/365835/>`__ 1741*4882a593Smuzhiyun 1742*4882a593Smuzhiyun- `Debugging the kernel using Ftrace - part 1743*4882a593Smuzhiyun 2 <https://lwn.net/Articles/366796/>`__ 1744*4882a593Smuzhiyun 1745*4882a593Smuzhiyun- `Secrets of the Ftrace function 1746*4882a593Smuzhiyun tracer <https://lwn.net/Articles/370423/>`__ 1747*4882a593Smuzhiyun 1748*4882a593Smuzhiyun- `trace-cmd: A front-end for 1749*4882a593Smuzhiyun Ftrace <https://lwn.net/Articles/410200/>`__ 1750*4882a593Smuzhiyun 1751*4882a593SmuzhiyunSee also `KernelShark's documentation <https://kernelshark.org/Documentation.html>`__ 1752*4882a593Smuzhiyunfor further usage details. 1753*4882a593Smuzhiyun 1754*4882a593SmuzhiyunAn amusing yet useful README (a tracing mini-HOWTO) can be found in 1755*4882a593Smuzhiyun``/sys/kernel/debug/tracing/README``. 1756*4882a593Smuzhiyun 1757*4882a593Smuzhiyunsystemtap 1758*4882a593Smuzhiyun========= 1759*4882a593Smuzhiyun 1760*4882a593SmuzhiyunSystemTap is a system-wide script-based tracing and profiling tool. 1761*4882a593Smuzhiyun 1762*4882a593SmuzhiyunSystemTap scripts are C-like programs that are executed in the kernel to 1763*4882a593Smuzhiyungather/print/aggregate data extracted from the context they end up being 1764*4882a593Smuzhiyuninvoked under. 1765*4882a593Smuzhiyun 1766*4882a593SmuzhiyunFor example, this probe from the `SystemTap 1767*4882a593Smuzhiyuntutorial <https://sourceware.org/systemtap/tutorial/>`__ simply prints a 1768*4882a593Smuzhiyunline every time any process on the system open()s a file. For each line, 1769*4882a593Smuzhiyunit prints the executable name of the program that opened the file, along 1770*4882a593Smuzhiyunwith its PID, and the name of the file it opened (or tried to open), 1771*4882a593Smuzhiyunwhich it extracts from the open syscall's argstr. 1772*4882a593Smuzhiyun 1773*4882a593Smuzhiyun.. code-block:: none 1774*4882a593Smuzhiyun 1775*4882a593Smuzhiyun probe syscall.open 1776*4882a593Smuzhiyun { 1777*4882a593Smuzhiyun printf ("%s(%d) open (%s)\n", execname(), pid(), argstr) 1778*4882a593Smuzhiyun } 1779*4882a593Smuzhiyun 1780*4882a593Smuzhiyun probe timer.ms(4000) # after 4 seconds 1781*4882a593Smuzhiyun { 1782*4882a593Smuzhiyun exit () 1783*4882a593Smuzhiyun } 1784*4882a593Smuzhiyun 1785*4882a593SmuzhiyunNormally, to execute this 1786*4882a593Smuzhiyunprobe, you'd simply install systemtap on the system you want to probe, 1787*4882a593Smuzhiyunand directly run the probe on that system e.g. assuming the name of the 1788*4882a593Smuzhiyunfile containing the above text is trace_open.stp:: 1789*4882a593Smuzhiyun 1790*4882a593Smuzhiyun # stap trace_open.stp 1791*4882a593Smuzhiyun 1792*4882a593SmuzhiyunWhat systemtap does under the covers to run this probe is 1) parse and 1793*4882a593Smuzhiyunconvert the probe to an equivalent 'C' form, 2) compile the 'C' form 1794*4882a593Smuzhiyuninto a kernel module, 3) insert the module into the kernel, which arms 1795*4882a593Smuzhiyunit, and 4) collect the data generated by the probe and display it to the 1796*4882a593Smuzhiyunuser. 1797*4882a593Smuzhiyun 1798*4882a593SmuzhiyunIn order to accomplish steps 1 and 2, the 'stap' program needs access to 1799*4882a593Smuzhiyunthe kernel build system that produced the kernel that the probed system 1800*4882a593Smuzhiyunis running. In the case of a typical embedded system (the 'target'), the 1801*4882a593Smuzhiyunkernel build system unfortunately isn't typically part of the image 1802*4882a593Smuzhiyunrunning on the target. It is normally available on the 'host' system 1803*4882a593Smuzhiyunthat produced the target image however; in such cases, steps 1 and 2 are 1804*4882a593Smuzhiyunexecuted on the host system, and steps 3 and 4 are executed on the 1805*4882a593Smuzhiyuntarget system, using only the systemtap 'runtime'. 1806*4882a593Smuzhiyun 1807*4882a593SmuzhiyunThe systemtap support in Yocto assumes that only steps 3 and 4 are run 1808*4882a593Smuzhiyunon the target; it is possible to do everything on the target, but this 1809*4882a593Smuzhiyunsection assumes only the typical embedded use-case. 1810*4882a593Smuzhiyun 1811*4882a593SmuzhiyunSo basically what you need to do in order to run a systemtap script on 1812*4882a593Smuzhiyunthe target is to 1) on the host system, compile the probe into a kernel 1813*4882a593Smuzhiyunmodule that makes sense to the target, 2) copy the module onto the 1814*4882a593Smuzhiyuntarget system and 3) insert the module into the target kernel, which 1815*4882a593Smuzhiyunarms it, and 4) collect the data generated by the probe and display it 1816*4882a593Smuzhiyunto the user. 1817*4882a593Smuzhiyun 1818*4882a593Smuzhiyunsystemtap Setup 1819*4882a593Smuzhiyun--------------- 1820*4882a593Smuzhiyun 1821*4882a593SmuzhiyunThose are a lot of steps and a lot of details, but fortunately Yocto 1822*4882a593Smuzhiyunincludes a script called 'crosstap' that will take care of those 1823*4882a593Smuzhiyundetails, allowing you to simply execute a systemtap script on the remote 1824*4882a593Smuzhiyuntarget, with arguments if necessary. 1825*4882a593Smuzhiyun 1826*4882a593SmuzhiyunIn order to do this from a remote host, however, you need to have access 1827*4882a593Smuzhiyunto the build for the image you booted. The 'crosstap' script provides 1828*4882a593Smuzhiyundetails on how to do this if you run the script on the host without 1829*4882a593Smuzhiyunhaving done a build:: 1830*4882a593Smuzhiyun 1831*4882a593Smuzhiyun $ crosstap root@192.168.1.88 trace_open.stp 1832*4882a593Smuzhiyun 1833*4882a593Smuzhiyun Error: No target kernel build found. 1834*4882a593Smuzhiyun Did you forget to create a local build of your image? 1835*4882a593Smuzhiyun 1836*4882a593Smuzhiyun 'crosstap' requires a local sdk build of the target system 1837*4882a593Smuzhiyun (or a build that includes 'tools-profile') in order to build 1838*4882a593Smuzhiyun kernel modules that can probe the target system. 1839*4882a593Smuzhiyun 1840*4882a593Smuzhiyun Practically speaking, that means you need to do the following: 1841*4882a593Smuzhiyun - If you're running a pre-built image, download the release 1842*4882a593Smuzhiyun and/or BSP tarballs used to build the image. 1843*4882a593Smuzhiyun - If you're working from git sources, just clone the metadata 1844*4882a593Smuzhiyun and BSP layers needed to build the image you'll be booting. 1845*4882a593Smuzhiyun - Make sure you're properly set up to build a new image (see 1846*4882a593Smuzhiyun the BSP README and/or the widely available basic documentation 1847*4882a593Smuzhiyun that discusses how to build images). 1848*4882a593Smuzhiyun - Build an -sdk version of the image e.g.: 1849*4882a593Smuzhiyun $ bitbake core-image-sato-sdk 1850*4882a593Smuzhiyun OR 1851*4882a593Smuzhiyun - Build a non-sdk image but include the profiling tools: 1852*4882a593Smuzhiyun [ edit local.conf and add 'tools-profile' to the end of 1853*4882a593Smuzhiyun the EXTRA_IMAGE_FEATURES variable ] 1854*4882a593Smuzhiyun $ bitbake core-image-sato 1855*4882a593Smuzhiyun 1856*4882a593Smuzhiyun Once you've build the image on the host system, you're ready to 1857*4882a593Smuzhiyun boot it (or the equivalent pre-built image) and use 'crosstap' 1858*4882a593Smuzhiyun to probe it (you need to source the environment as usual first): 1859*4882a593Smuzhiyun 1860*4882a593Smuzhiyun $ source oe-init-build-env 1861*4882a593Smuzhiyun $ cd ~/my/systemtap/scripts 1862*4882a593Smuzhiyun $ crosstap root@192.168.1.xxx myscript.stp 1863*4882a593Smuzhiyun 1864*4882a593Smuzhiyun.. note:: 1865*4882a593Smuzhiyun 1866*4882a593Smuzhiyun SystemTap, which uses 'crosstap', assumes you can establish an ssh 1867*4882a593Smuzhiyun connection to the remote target. Please refer to the crosstap wiki 1868*4882a593Smuzhiyun page for details on verifying ssh connections at 1869*4882a593Smuzhiyun . Also, the ability to ssh into the target system is not enabled by 1870*4882a593Smuzhiyun default in \*-minimal images. 1871*4882a593Smuzhiyun 1872*4882a593SmuzhiyunSo essentially what you need to 1873*4882a593Smuzhiyundo is build an SDK image or image with 'tools-profile' as detailed in 1874*4882a593Smuzhiyunthe ":ref:`profile-manual/intro:General Setup`" section of this 1875*4882a593Smuzhiyunmanual, and boot the resulting target image. 1876*4882a593Smuzhiyun 1877*4882a593Smuzhiyun.. note:: 1878*4882a593Smuzhiyun 1879*4882a593Smuzhiyun If you have a build directory containing multiple machines, you need 1880*4882a593Smuzhiyun to have the MACHINE you're connecting to selected in local.conf, and 1881*4882a593Smuzhiyun the kernel in that machine's build directory must match the kernel on 1882*4882a593Smuzhiyun the booted system exactly, or you'll get the above 'crosstap' message 1883*4882a593Smuzhiyun when you try to invoke a script. 1884*4882a593Smuzhiyun 1885*4882a593SmuzhiyunRunning a Script on a Target 1886*4882a593Smuzhiyun---------------------------- 1887*4882a593Smuzhiyun 1888*4882a593SmuzhiyunOnce you've done that, you should be able to run a systemtap script on 1889*4882a593Smuzhiyunthe target:: 1890*4882a593Smuzhiyun 1891*4882a593Smuzhiyun $ cd /path/to/yocto 1892*4882a593Smuzhiyun $ source oe-init-build-env 1893*4882a593Smuzhiyun 1894*4882a593Smuzhiyun ### Shell environment set up for builds. ### 1895*4882a593Smuzhiyun 1896*4882a593Smuzhiyun You can now run 'bitbake <target>' 1897*4882a593Smuzhiyun 1898*4882a593Smuzhiyun Common targets are: 1899*4882a593Smuzhiyun core-image-minimal 1900*4882a593Smuzhiyun core-image-sato 1901*4882a593Smuzhiyun meta-toolchain 1902*4882a593Smuzhiyun meta-ide-support 1903*4882a593Smuzhiyun 1904*4882a593Smuzhiyun You can also run generated QEMU images with a command like 'runqemu qemux86-64' 1905*4882a593Smuzhiyun 1906*4882a593SmuzhiyunOnce you've done that, you can cd to whatever 1907*4882a593Smuzhiyundirectory contains your scripts and use 'crosstap' to run the script:: 1908*4882a593Smuzhiyun 1909*4882a593Smuzhiyun $ cd /path/to/my/systemap/script 1910*4882a593Smuzhiyun $ crosstap root@192.168.7.2 trace_open.stp 1911*4882a593Smuzhiyun 1912*4882a593SmuzhiyunIf you get an error connecting to the target e.g.:: 1913*4882a593Smuzhiyun 1914*4882a593Smuzhiyun $ crosstap root@192.168.7.2 trace_open.stp 1915*4882a593Smuzhiyun error establishing ssh connection on remote 'root@192.168.7.2' 1916*4882a593Smuzhiyun 1917*4882a593SmuzhiyunTry ssh'ing to the target and see what happens:: 1918*4882a593Smuzhiyun 1919*4882a593Smuzhiyun $ ssh root@192.168.7.2 1920*4882a593Smuzhiyun 1921*4882a593SmuzhiyunA lot of the time, connection 1922*4882a593Smuzhiyunproblems are due specifying a wrong IP address or having a 'host key 1923*4882a593Smuzhiyunverification error'. 1924*4882a593Smuzhiyun 1925*4882a593SmuzhiyunIf everything worked as planned, you should see something like this 1926*4882a593Smuzhiyun(enter the password when prompted, or press enter if it's set up to use 1927*4882a593Smuzhiyunno password): 1928*4882a593Smuzhiyun 1929*4882a593Smuzhiyun.. code-block:: none 1930*4882a593Smuzhiyun 1931*4882a593Smuzhiyun $ crosstap root@192.168.7.2 trace_open.stp 1932*4882a593Smuzhiyun root@192.168.7.2's password: 1933*4882a593Smuzhiyun matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600) 1934*4882a593Smuzhiyun matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600) 1935*4882a593Smuzhiyun 1936*4882a593Smuzhiyunsystemtap Documentation 1937*4882a593Smuzhiyun----------------------- 1938*4882a593Smuzhiyun 1939*4882a593SmuzhiyunThe SystemTap language reference can be found here: `SystemTap Language 1940*4882a593SmuzhiyunReference <https://sourceware.org/systemtap/langref/>`__ 1941*4882a593Smuzhiyun 1942*4882a593SmuzhiyunLinks to other SystemTap documents, tutorials, and examples can be found 1943*4882a593Smuzhiyunhere: `SystemTap documentation 1944*4882a593Smuzhiyunpage <https://sourceware.org/systemtap/documentation.html>`__ 1945*4882a593Smuzhiyun 1946*4882a593SmuzhiyunSysprof 1947*4882a593Smuzhiyun======= 1948*4882a593Smuzhiyun 1949*4882a593SmuzhiyunSysprof is a very easy to use system-wide profiler that consists of a 1950*4882a593Smuzhiyunsingle window with three panes and a few buttons which allow you to 1951*4882a593Smuzhiyunstart, stop, and view the profile from one place. 1952*4882a593Smuzhiyun 1953*4882a593SmuzhiyunSysprof Setup 1954*4882a593Smuzhiyun------------- 1955*4882a593Smuzhiyun 1956*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup 1957*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" section. 1958*4882a593Smuzhiyun 1959*4882a593SmuzhiyunSysprof is a GUI-based application that runs on the target system. For 1960*4882a593Smuzhiyunthe rest of this document we assume you've ssh'ed to the host and will 1961*4882a593Smuzhiyunbe running Sysprof on the target (you can use the '-X' option to ssh and 1962*4882a593Smuzhiyunhave the Sysprof GUI run on the target but display remotely on the host 1963*4882a593Smuzhiyunif you want). 1964*4882a593Smuzhiyun 1965*4882a593SmuzhiyunBasic Sysprof Usage 1966*4882a593Smuzhiyun------------------- 1967*4882a593Smuzhiyun 1968*4882a593SmuzhiyunTo start profiling the system, you simply press the 'Start' button. To 1969*4882a593Smuzhiyunstop profiling and to start viewing the profile data in one easy step, 1970*4882a593Smuzhiyunpress the 'Profile' button. 1971*4882a593Smuzhiyun 1972*4882a593SmuzhiyunOnce you've pressed the profile button, the three panes will fill up 1973*4882a593Smuzhiyunwith profiling data: 1974*4882a593Smuzhiyun 1975*4882a593Smuzhiyun.. image:: figures/sysprof-copy-to-user.png 1976*4882a593Smuzhiyun :align: center 1977*4882a593Smuzhiyun 1978*4882a593SmuzhiyunThe left pane shows a list of functions and processes. Selecting one of 1979*4882a593Smuzhiyunthose expands that function in the right pane, showing all its callees. 1980*4882a593SmuzhiyunNote that this caller-oriented display is essentially the inverse of 1981*4882a593Smuzhiyunperf's default callee-oriented callchain display. 1982*4882a593Smuzhiyun 1983*4882a593SmuzhiyunIn the screenshot above, we're focusing on ``__copy_to_user_ll()`` and 1984*4882a593Smuzhiyunlooking up the callchain we can see that one of the callers of 1985*4882a593Smuzhiyun``__copy_to_user_ll`` is sys_read() and the complete callpath between them. 1986*4882a593SmuzhiyunNotice that this is essentially a portion of the same information we saw 1987*4882a593Smuzhiyunin the perf display shown in the perf section of this page. 1988*4882a593Smuzhiyun 1989*4882a593Smuzhiyun.. image:: figures/sysprof-copy-from-user.png 1990*4882a593Smuzhiyun :align: center 1991*4882a593Smuzhiyun 1992*4882a593SmuzhiyunSimilarly, the above is a snapshot of the Sysprof display of a 1993*4882a593Smuzhiyuncopy-from-user callchain. 1994*4882a593Smuzhiyun 1995*4882a593SmuzhiyunFinally, looking at the third Sysprof pane in the lower left, we can see 1996*4882a593Smuzhiyuna list of all the callers of a particular function selected in the top 1997*4882a593Smuzhiyunleft pane. In this case, the lower pane is showing all the callers of 1998*4882a593Smuzhiyun``__mark_inode_dirty``: 1999*4882a593Smuzhiyun 2000*4882a593Smuzhiyun.. image:: figures/sysprof-callers.png 2001*4882a593Smuzhiyun :align: center 2002*4882a593Smuzhiyun 2003*4882a593SmuzhiyunDouble-clicking on one of those functions will in turn change the focus 2004*4882a593Smuzhiyunto the selected function, and so on. 2005*4882a593Smuzhiyun 2006*4882a593Smuzhiyun.. admonition:: Tying it Together 2007*4882a593Smuzhiyun 2008*4882a593Smuzhiyun If you like sysprof's 'caller-oriented' display, you may be able to 2009*4882a593Smuzhiyun approximate it in other tools as well. For example, 'perf report' has 2010*4882a593Smuzhiyun the -g (--call-graph) option that you can experiment with; one of the 2011*4882a593Smuzhiyun options is 'caller' for an inverted caller-based callgraph display. 2012*4882a593Smuzhiyun 2013*4882a593SmuzhiyunSysprof Documentation 2014*4882a593Smuzhiyun--------------------- 2015*4882a593Smuzhiyun 2016*4882a593SmuzhiyunThere doesn't seem to be any documentation for Sysprof, but maybe that's 2017*4882a593Smuzhiyunbecause it's pretty self-explanatory. The Sysprof website, however, is 2018*4882a593Smuzhiyunhere: `Sysprof, System-wide Performance Profiler for 2019*4882a593SmuzhiyunLinux <http://sysprof.com/>`__ 2020*4882a593Smuzhiyun 2021*4882a593SmuzhiyunLTTng (Linux Trace Toolkit, next generation) 2022*4882a593Smuzhiyun============================================ 2023*4882a593Smuzhiyun 2024*4882a593SmuzhiyunLTTng Setup 2025*4882a593Smuzhiyun----------- 2026*4882a593Smuzhiyun 2027*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup 2028*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" section. 2029*4882a593SmuzhiyunLTTng is run on the target system by ssh'ing to it. 2030*4882a593Smuzhiyun 2031*4882a593SmuzhiyunCollecting and Viewing Traces 2032*4882a593Smuzhiyun----------------------------- 2033*4882a593Smuzhiyun 2034*4882a593SmuzhiyunOnce you've applied the above commits and built and booted your image 2035*4882a593Smuzhiyun(you need to build the core-image-sato-sdk image or use one of the other 2036*4882a593Smuzhiyunmethods described in the ":ref:`profile-manual/intro:General Setup`" section), you're ready to start 2037*4882a593Smuzhiyuntracing. 2038*4882a593Smuzhiyun 2039*4882a593SmuzhiyunCollecting and viewing a trace on the target (inside a shell) 2040*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2041*4882a593Smuzhiyun 2042*4882a593SmuzhiyunFirst, from the host, ssh to the target:: 2043*4882a593Smuzhiyun 2044*4882a593Smuzhiyun $ ssh -l root 192.168.1.47 2045*4882a593Smuzhiyun The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established. 2046*4882a593Smuzhiyun RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e. 2047*4882a593Smuzhiyun Are you sure you want to continue connecting (yes/no)? yes 2048*4882a593Smuzhiyun Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts. 2049*4882a593Smuzhiyun root@192.168.1.47's password: 2050*4882a593Smuzhiyun 2051*4882a593SmuzhiyunOnce on the target, use these steps to create a trace:: 2052*4882a593Smuzhiyun 2053*4882a593Smuzhiyun root@crownbay:~# lttng create 2054*4882a593Smuzhiyun Spawning a session daemon 2055*4882a593Smuzhiyun Session auto-20121015-232120 created. 2056*4882a593Smuzhiyun Traces will be written in /home/root/lttng-traces/auto-20121015-232120 2057*4882a593Smuzhiyun 2058*4882a593SmuzhiyunEnable the events you want to trace (in this case all kernel events):: 2059*4882a593Smuzhiyun 2060*4882a593Smuzhiyun root@crownbay:~# lttng enable-event --kernel --all 2061*4882a593Smuzhiyun All kernel events are enabled in channel channel0 2062*4882a593Smuzhiyun 2063*4882a593SmuzhiyunStart the trace:: 2064*4882a593Smuzhiyun 2065*4882a593Smuzhiyun root@crownbay:~# lttng start 2066*4882a593Smuzhiyun Tracing started for session auto-20121015-232120 2067*4882a593Smuzhiyun 2068*4882a593SmuzhiyunAnd then stop the trace after awhile or after running a particular workload that 2069*4882a593Smuzhiyunyou want to trace:: 2070*4882a593Smuzhiyun 2071*4882a593Smuzhiyun root@crownbay:~# lttng stop 2072*4882a593Smuzhiyun Tracing stopped for session auto-20121015-232120 2073*4882a593Smuzhiyun 2074*4882a593SmuzhiyunYou can now view the trace in text form on the target:: 2075*4882a593Smuzhiyun 2076*4882a593Smuzhiyun root@crownbay:~# lttng view 2077*4882a593Smuzhiyun [23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { } 2078*4882a593Smuzhiyun [23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 } 2079*4882a593Smuzhiyun [23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C } 2080*4882a593Smuzhiyun [23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 } 2081*4882a593Smuzhiyun [23:21:56.989329345] (+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot = 3, flags = 131362, fd = 4294967295, pgoff = 0 } 2082*4882a593Smuzhiyun [23:21:56.989351694] (+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 } 2083*4882a593Smuzhiyun [23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags = 0x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 } 2084*4882a593Smuzhiyun [23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388 } 2085*4882a593Smuzhiyun [23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm = "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 } 2086*4882a593Smuzhiyun [23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer = 3970832076, clockid = 1, mode = 1 } 2087*4882a593Smuzhiyun [23:21:56.989770462] (+0.000262044) hrtimer_cancel: { 1 }, { hrtimer = 3993865440 } 2088*4882a593Smuzhiyun [23:21:56.989771580] (+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 } 2089*4882a593Smuzhiyun [23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, { hrtimer = 3993865440, now = 79815980007057, function = 3238465232 } 2090*4882a593Smuzhiyun [23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, { hrtimer = 3993812192, now = 79815980008174, function = 3238465232 } 2091*4882a593Smuzhiyun [23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 } 2092*4882a593Smuzhiyun [23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 } 2093*4882a593Smuzhiyun [23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 } 2094*4882a593Smuzhiyun [23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098 } 2095*4882a593Smuzhiyun [23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime = 36976733240 } 2096*4882a593Smuzhiyun [23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: { 0 }, { hrtimer = 3993812192 } 2097*4882a593Smuzhiyun [23:21:56.989819631] (+0.000001117) hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 } 2098*4882a593Smuzhiyun [23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer = 3993812192, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 } 2099*4882a593Smuzhiyun [23:21:56.989822984] (+0.000001118) hrtimer_start: { 1 }, { hrtimer = 3993865440, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 } 2100*4882a593Smuzhiyun [23:21:56.989832762] (+0.000009778) softirq_entry: { 1 }, { vec = 1 } 2101*4882a593Smuzhiyun [23:21:56.989833879] (+0.000001117) softirq_entry: { 0 }, { vec = 1 } 2102*4882a593Smuzhiyun [23:21:56.989838069] (+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 } 2103*4882a593Smuzhiyun [23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer = 3993818708 } 2104*4882a593Smuzhiyun [23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1 }, { timer = 3993871956, now = 79515980, function = 3238277552 } 2105*4882a593Smuzhiyun [23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer = 3993818708, now = 79515980, function = 3238277552 } 2106*4882a593Smuzhiyun [23:21:56.989854831] (+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 49237, vruntime = 43368363335 } 2107*4882a593Smuzhiyun [23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 } 2108*4882a593Smuzhiyun [23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm = "kworker/1:1", tid = 21, delay = 9451318 } 2109*4882a593Smuzhiyun [23:21:56.989862374] (+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4, delay = 9958820 } 2110*4882a593Smuzhiyun [23:21:56.989868241] (+0.000005867) sched_wakeup: { 0 }, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu = 0 } 2111*4882a593Smuzhiyun [23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm = "kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 } 2112*4882a593Smuzhiyun [23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer = 3993871956 } 2113*4882a593Smuzhiyun [23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0 }, { timer = 3993818708 } 2114*4882a593Smuzhiyun . 2115*4882a593Smuzhiyun . 2116*4882a593Smuzhiyun . 2117*4882a593Smuzhiyun 2118*4882a593SmuzhiyunYou can now safely destroy the trace 2119*4882a593Smuzhiyunsession (note that this doesn't delete the trace - it's still there in 2120*4882a593Smuzhiyun~/lttng-traces):: 2121*4882a593Smuzhiyun 2122*4882a593Smuzhiyun root@crownbay:~# lttng destroy 2123*4882a593Smuzhiyun Session auto-20121015-232120 destroyed at /home/root 2124*4882a593Smuzhiyun 2125*4882a593SmuzhiyunNote that the trace is saved in a directory of the same name as returned by 2126*4882a593Smuzhiyun'lttng create', under the ~/lttng-traces directory (note that you can change this by 2127*4882a593Smuzhiyunsupplying your own name to 'lttng create'):: 2128*4882a593Smuzhiyun 2129*4882a593Smuzhiyun root@crownbay:~# ls -al ~/lttng-traces 2130*4882a593Smuzhiyun drwxrwx--- 3 root root 1024 Oct 15 23:21 . 2131*4882a593Smuzhiyun drwxr-xr-x 5 root root 1024 Oct 15 23:57 .. 2132*4882a593Smuzhiyun drwxrwx--- 3 root root 1024 Oct 15 23:21 auto-20121015-232120 2133*4882a593Smuzhiyun 2134*4882a593SmuzhiyunCollecting and viewing a userspace trace on the target (inside a shell) 2135*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2136*4882a593Smuzhiyun 2137*4882a593SmuzhiyunFor LTTng userspace tracing, you need to have a properly instrumented 2138*4882a593Smuzhiyunuserspace program. For this example, we'll use the 'hello' test program 2139*4882a593Smuzhiyungenerated by the lttng-ust build. 2140*4882a593Smuzhiyun 2141*4882a593SmuzhiyunThe 'hello' test program isn't installed on the root filesystem by the lttng-ust 2142*4882a593Smuzhiyunbuild, so we need to copy it over manually. First cd into the build 2143*4882a593Smuzhiyundirectory that contains the hello executable:: 2144*4882a593Smuzhiyun 2145*4882a593Smuzhiyun $ cd build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs 2146*4882a593Smuzhiyun 2147*4882a593SmuzhiyunCopy that over to the target machine:: 2148*4882a593Smuzhiyun 2149*4882a593Smuzhiyun $ scp hello root@192.168.1.20: 2150*4882a593Smuzhiyun 2151*4882a593SmuzhiyunYou now have the instrumented lttng 'hello world' test program on the 2152*4882a593Smuzhiyuntarget, ready to test. 2153*4882a593Smuzhiyun 2154*4882a593SmuzhiyunFirst, from the host, ssh to the target:: 2155*4882a593Smuzhiyun 2156*4882a593Smuzhiyun $ ssh -l root 192.168.1.47 2157*4882a593Smuzhiyun The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established. 2158*4882a593Smuzhiyun RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e. 2159*4882a593Smuzhiyun Are you sure you want to continue connecting (yes/no)? yes 2160*4882a593Smuzhiyun Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts. 2161*4882a593Smuzhiyun root@192.168.1.47's password: 2162*4882a593Smuzhiyun 2163*4882a593SmuzhiyunOnce on the target, use these steps to create a trace:: 2164*4882a593Smuzhiyun 2165*4882a593Smuzhiyun root@crownbay:~# lttng create 2166*4882a593Smuzhiyun Session auto-20190303-021943 created. 2167*4882a593Smuzhiyun Traces will be written in /home/root/lttng-traces/auto-20190303-021943 2168*4882a593Smuzhiyun 2169*4882a593SmuzhiyunEnable the events you want to trace (in this case all userspace events):: 2170*4882a593Smuzhiyun 2171*4882a593Smuzhiyun root@crownbay:~# lttng enable-event --userspace --all 2172*4882a593Smuzhiyun All UST events are enabled in channel channel0 2173*4882a593Smuzhiyun 2174*4882a593SmuzhiyunStart the trace:: 2175*4882a593Smuzhiyun 2176*4882a593Smuzhiyun root@crownbay:~# lttng start 2177*4882a593Smuzhiyun Tracing started for session auto-20190303-021943 2178*4882a593Smuzhiyun 2179*4882a593SmuzhiyunRun the instrumented hello world program:: 2180*4882a593Smuzhiyun 2181*4882a593Smuzhiyun root@crownbay:~# ./hello 2182*4882a593Smuzhiyun Hello, World! 2183*4882a593Smuzhiyun Tracing... done. 2184*4882a593Smuzhiyun 2185*4882a593SmuzhiyunAnd then stop the trace after awhile or after running a particular workload 2186*4882a593Smuzhiyunthat you want to trace:: 2187*4882a593Smuzhiyun 2188*4882a593Smuzhiyun root@crownbay:~# lttng stop 2189*4882a593Smuzhiyun Tracing stopped for session auto-20190303-021943 2190*4882a593Smuzhiyun 2191*4882a593SmuzhiyunYou can now view the trace in text form on the target:: 2192*4882a593Smuzhiyun 2193*4882a593Smuzhiyun root@crownbay:~# lttng view 2194*4882a593Smuzhiyun [02:31:14.906146544] (+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0, netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 } 2195*4882a593Smuzhiyun [02:31:14.906170360] (+0.000023816) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1, longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 } 2196*4882a593Smuzhiyun [02:31:14.906183140] (+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2, netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 } 2197*4882a593Smuzhiyun [02:31:14.906194385] (+0.000011245) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3, longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 } 2198*4882a593Smuzhiyun . 2199*4882a593Smuzhiyun . 2200*4882a593Smuzhiyun . 2201*4882a593Smuzhiyun 2202*4882a593SmuzhiyunYou can now safely destroy the trace session (note that this doesn't delete the 2203*4882a593Smuzhiyuntrace - it's still there in ~/lttng-traces):: 2204*4882a593Smuzhiyun 2205*4882a593Smuzhiyun root@crownbay:~# lttng destroy 2206*4882a593Smuzhiyun Session auto-20190303-021943 destroyed at /home/root 2207*4882a593Smuzhiyun 2208*4882a593SmuzhiyunLTTng Documentation 2209*4882a593Smuzhiyun------------------- 2210*4882a593Smuzhiyun 2211*4882a593SmuzhiyunYou can find the primary LTTng Documentation on the `LTTng 2212*4882a593SmuzhiyunDocumentation <https://lttng.org/docs/>`__ site. The documentation on 2213*4882a593Smuzhiyunthis site is appropriate for intermediate to advanced software 2214*4882a593Smuzhiyundevelopers who are working in a Linux environment and are interested in 2215*4882a593Smuzhiyunefficient software tracing. 2216*4882a593Smuzhiyun 2217*4882a593SmuzhiyunFor information on LTTng in general, visit the `LTTng 2218*4882a593SmuzhiyunProject <https://lttng.org/lttng2.0>`__ site. You can find a "Getting 2219*4882a593SmuzhiyunStarted" link on this site that takes you to an LTTng Quick Start. 2220*4882a593Smuzhiyun 2221*4882a593Smuzhiyunblktrace 2222*4882a593Smuzhiyun======== 2223*4882a593Smuzhiyun 2224*4882a593Smuzhiyunblktrace is a tool for tracing and reporting low-level disk I/O. 2225*4882a593Smuzhiyunblktrace provides the tracing half of the equation; its output can be 2226*4882a593Smuzhiyunpiped into the blkparse program, which renders the data in a 2227*4882a593Smuzhiyunhuman-readable form and does some basic analysis: 2228*4882a593Smuzhiyun 2229*4882a593Smuzhiyunblktrace Setup 2230*4882a593Smuzhiyun-------------- 2231*4882a593Smuzhiyun 2232*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup 2233*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" 2234*4882a593Smuzhiyunsection. 2235*4882a593Smuzhiyun 2236*4882a593Smuzhiyunblktrace is an application that runs on the target system. You can run 2237*4882a593Smuzhiyunthe entire blktrace and blkparse pipeline on the target, or you can run 2238*4882a593Smuzhiyunblktrace in 'listen' mode on the target and have blktrace and blkparse 2239*4882a593Smuzhiyuncollect and analyze the data on the host (see the 2240*4882a593Smuzhiyun":ref:`profile-manual/usage:Using blktrace Remotely`" section 2241*4882a593Smuzhiyunbelow). For the rest of this section we assume you've ssh'ed to the host and 2242*4882a593Smuzhiyunwill be running blkrace on the target. 2243*4882a593Smuzhiyun 2244*4882a593SmuzhiyunBasic blktrace Usage 2245*4882a593Smuzhiyun-------------------- 2246*4882a593Smuzhiyun 2247*4882a593SmuzhiyunTo record a trace, simply run the 'blktrace' command, giving it the name 2248*4882a593Smuzhiyunof the block device you want to trace activity on:: 2249*4882a593Smuzhiyun 2250*4882a593Smuzhiyun root@crownbay:~# blktrace /dev/sdc 2251*4882a593Smuzhiyun 2252*4882a593SmuzhiyunIn another shell, execute a workload you want to trace. :: 2253*4882a593Smuzhiyun 2254*4882a593Smuzhiyun root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2; sync 2255*4882a593Smuzhiyun Connecting to downloads.yoctoproject.org (140.211.169.59:80) 2256*4882a593Smuzhiyun linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA 2257*4882a593Smuzhiyun 2258*4882a593SmuzhiyunPress Ctrl-C in the blktrace shell to stop the trace. It 2259*4882a593Smuzhiyunwill display how many events were logged, along with the per-cpu file 2260*4882a593Smuzhiyunsizes (blktrace records traces in per-cpu kernel buffers and simply 2261*4882a593Smuzhiyundumps them to userspace for blkparse to merge and sort later). :: 2262*4882a593Smuzhiyun 2263*4882a593Smuzhiyun ^C=== sdc === 2264*4882a593Smuzhiyun CPU 0: 7082 events, 332 KiB data 2265*4882a593Smuzhiyun CPU 1: 1578 events, 74 KiB data 2266*4882a593Smuzhiyun Total: 8660 events (dropped 0), 406 KiB data 2267*4882a593Smuzhiyun 2268*4882a593SmuzhiyunIf you examine the files saved to disk, you see multiple files, one per CPU and 2269*4882a593Smuzhiyunwith the device name as the first part of the filename:: 2270*4882a593Smuzhiyun 2271*4882a593Smuzhiyun root@crownbay:~# ls -al 2272*4882a593Smuzhiyun drwxr-xr-x 6 root root 1024 Oct 27 22:39 . 2273*4882a593Smuzhiyun drwxr-sr-x 4 root root 1024 Oct 26 18:24 .. 2274*4882a593Smuzhiyun -rw-r--r-- 1 root root 339938 Oct 27 22:40 sdc.blktrace.0 2275*4882a593Smuzhiyun -rw-r--r-- 1 root root 75753 Oct 27 22:40 sdc.blktrace.1 2276*4882a593Smuzhiyun 2277*4882a593SmuzhiyunTo view the trace events, simply invoke 'blkparse' in the directory 2278*4882a593Smuzhiyuncontaining the trace files, giving it the device name that forms the 2279*4882a593Smuzhiyunfirst part of the filenames:: 2280*4882a593Smuzhiyun 2281*4882a593Smuzhiyun root@crownbay:~# blkparse sdc 2282*4882a593Smuzhiyun 2283*4882a593Smuzhiyun 8,32 1 1 0.000000000 1225 Q WS 3417048 + 8 [jbd2/sdc-8] 2284*4882a593Smuzhiyun 8,32 1 2 0.000025213 1225 G WS 3417048 + 8 [jbd2/sdc-8] 2285*4882a593Smuzhiyun 8,32 1 3 0.000033384 1225 P N [jbd2/sdc-8] 2286*4882a593Smuzhiyun 8,32 1 4 0.000043301 1225 I WS 3417048 + 8 [jbd2/sdc-8] 2287*4882a593Smuzhiyun 8,32 1 0 0.000057270 0 m N cfq1225 insert_request 2288*4882a593Smuzhiyun 8,32 1 0 0.000064813 0 m N cfq1225 add_to_rr 2289*4882a593Smuzhiyun 8,32 1 5 0.000076336 1225 U N [jbd2/sdc-8] 1 2290*4882a593Smuzhiyun 8,32 1 0 0.000088559 0 m N cfq workload slice:150 2291*4882a593Smuzhiyun 8,32 1 0 0.000097359 0 m N cfq1225 set_active wl_prio:0 wl_type:1 2292*4882a593Smuzhiyun 8,32 1 0 0.000104063 0 m N cfq1225 Not idling. st->count:1 2293*4882a593Smuzhiyun 8,32 1 0 0.000112584 0 m N cfq1225 fifo= (null) 2294*4882a593Smuzhiyun 8,32 1 0 0.000118730 0 m N cfq1225 dispatch_insert 2295*4882a593Smuzhiyun 8,32 1 0 0.000127390 0 m N cfq1225 dispatched a request 2296*4882a593Smuzhiyun 8,32 1 0 0.000133536 0 m N cfq1225 activate rq, drv=1 2297*4882a593Smuzhiyun 8,32 1 6 0.000136889 1225 D WS 3417048 + 8 [jbd2/sdc-8] 2298*4882a593Smuzhiyun 8,32 1 7 0.000360381 1225 Q WS 3417056 + 8 [jbd2/sdc-8] 2299*4882a593Smuzhiyun 8,32 1 8 0.000377422 1225 G WS 3417056 + 8 [jbd2/sdc-8] 2300*4882a593Smuzhiyun 8,32 1 9 0.000388876 1225 P N [jbd2/sdc-8] 2301*4882a593Smuzhiyun 8,32 1 10 0.000397886 1225 Q WS 3417064 + 8 [jbd2/sdc-8] 2302*4882a593Smuzhiyun 8,32 1 11 0.000404800 1225 M WS 3417064 + 8 [jbd2/sdc-8] 2303*4882a593Smuzhiyun 8,32 1 12 0.000412343 1225 Q WS 3417072 + 8 [jbd2/sdc-8] 2304*4882a593Smuzhiyun 8,32 1 13 0.000416533 1225 M WS 3417072 + 8 [jbd2/sdc-8] 2305*4882a593Smuzhiyun 8,32 1 14 0.000422121 1225 Q WS 3417080 + 8 [jbd2/sdc-8] 2306*4882a593Smuzhiyun 8,32 1 15 0.000425194 1225 M WS 3417080 + 8 [jbd2/sdc-8] 2307*4882a593Smuzhiyun 8,32 1 16 0.000431968 1225 Q WS 3417088 + 8 [jbd2/sdc-8] 2308*4882a593Smuzhiyun 8,32 1 17 0.000435251 1225 M WS 3417088 + 8 [jbd2/sdc-8] 2309*4882a593Smuzhiyun 8,32 1 18 0.000440279 1225 Q WS 3417096 + 8 [jbd2/sdc-8] 2310*4882a593Smuzhiyun 8,32 1 19 0.000443911 1225 M WS 3417096 + 8 [jbd2/sdc-8] 2311*4882a593Smuzhiyun 8,32 1 20 0.000450336 1225 Q WS 3417104 + 8 [jbd2/sdc-8] 2312*4882a593Smuzhiyun 8,32 1 21 0.000454038 1225 M WS 3417104 + 8 [jbd2/sdc-8] 2313*4882a593Smuzhiyun 8,32 1 22 0.000462070 1225 Q WS 3417112 + 8 [jbd2/sdc-8] 2314*4882a593Smuzhiyun 8,32 1 23 0.000465422 1225 M WS 3417112 + 8 [jbd2/sdc-8] 2315*4882a593Smuzhiyun 8,32 1 24 0.000474222 1225 I WS 3417056 + 64 [jbd2/sdc-8] 2316*4882a593Smuzhiyun 8,32 1 0 0.000483022 0 m N cfq1225 insert_request 2317*4882a593Smuzhiyun 8,32 1 25 0.000489727 1225 U N [jbd2/sdc-8] 1 2318*4882a593Smuzhiyun 8,32 1 0 0.000498457 0 m N cfq1225 Not idling. st->count:1 2319*4882a593Smuzhiyun 8,32 1 0 0.000503765 0 m N cfq1225 dispatch_insert 2320*4882a593Smuzhiyun 8,32 1 0 0.000512914 0 m N cfq1225 dispatched a request 2321*4882a593Smuzhiyun 8,32 1 0 0.000518851 0 m N cfq1225 activate rq, drv=2 2322*4882a593Smuzhiyun . 2323*4882a593Smuzhiyun . 2324*4882a593Smuzhiyun . 2325*4882a593Smuzhiyun 8,32 0 0 58.515006138 0 m N cfq3551 complete rqnoidle 1 2326*4882a593Smuzhiyun 8,32 0 2024 58.516603269 3 C WS 3156992 + 16 [0] 2327*4882a593Smuzhiyun 8,32 0 0 58.516626736 0 m N cfq3551 complete rqnoidle 1 2328*4882a593Smuzhiyun 8,32 0 0 58.516634558 0 m N cfq3551 arm_idle: 8 group_idle: 0 2329*4882a593Smuzhiyun 8,32 0 0 58.516636933 0 m N cfq schedule dispatch 2330*4882a593Smuzhiyun 8,32 1 0 58.516971613 0 m N cfq3551 slice expired t=0 2331*4882a593Smuzhiyun 8,32 1 0 58.516982089 0 m N cfq3551 sl_used=13 disp=6 charge=13 iops=0 sect=80 2332*4882a593Smuzhiyun 8,32 1 0 58.516985511 0 m N cfq3551 del_from_rr 2333*4882a593Smuzhiyun 8,32 1 0 58.516990819 0 m N cfq3551 put_queue 2334*4882a593Smuzhiyun 2335*4882a593Smuzhiyun CPU0 (sdc): 2336*4882a593Smuzhiyun Reads Queued: 0, 0KiB Writes Queued: 331, 26,284KiB 2337*4882a593Smuzhiyun Read Dispatches: 0, 0KiB Write Dispatches: 485, 40,484KiB 2338*4882a593Smuzhiyun Reads Requeued: 0 Writes Requeued: 0 2339*4882a593Smuzhiyun Reads Completed: 0, 0KiB Writes Completed: 511, 41,000KiB 2340*4882a593Smuzhiyun Read Merges: 0, 0KiB Write Merges: 13, 160KiB 2341*4882a593Smuzhiyun Read depth: 0 Write depth: 2 2342*4882a593Smuzhiyun IO unplugs: 23 Timer unplugs: 0 2343*4882a593Smuzhiyun CPU1 (sdc): 2344*4882a593Smuzhiyun Reads Queued: 0, 0KiB Writes Queued: 249, 15,800KiB 2345*4882a593Smuzhiyun Read Dispatches: 0, 0KiB Write Dispatches: 42, 1,600KiB 2346*4882a593Smuzhiyun Reads Requeued: 0 Writes Requeued: 0 2347*4882a593Smuzhiyun Reads Completed: 0, 0KiB Writes Completed: 16, 1,084KiB 2348*4882a593Smuzhiyun Read Merges: 0, 0KiB Write Merges: 40, 276KiB 2349*4882a593Smuzhiyun Read depth: 0 Write depth: 2 2350*4882a593Smuzhiyun IO unplugs: 30 Timer unplugs: 1 2351*4882a593Smuzhiyun 2352*4882a593Smuzhiyun Total (sdc): 2353*4882a593Smuzhiyun Reads Queued: 0, 0KiB Writes Queued: 580, 42,084KiB 2354*4882a593Smuzhiyun Read Dispatches: 0, 0KiB Write Dispatches: 527, 42,084KiB 2355*4882a593Smuzhiyun Reads Requeued: 0 Writes Requeued: 0 2356*4882a593Smuzhiyun Reads Completed: 0, 0KiB Writes Completed: 527, 42,084KiB 2357*4882a593Smuzhiyun Read Merges: 0, 0KiB Write Merges: 53, 436KiB 2358*4882a593Smuzhiyun IO unplugs: 53 Timer unplugs: 1 2359*4882a593Smuzhiyun 2360*4882a593Smuzhiyun Throughput (R/W): 0KiB/s / 719KiB/s 2361*4882a593Smuzhiyun Events (sdc): 6,592 entries 2362*4882a593Smuzhiyun Skips: 0 forward (0 - 0.0%) 2363*4882a593Smuzhiyun Input file sdc.blktrace.0 added 2364*4882a593Smuzhiyun Input file sdc.blktrace.1 added 2365*4882a593Smuzhiyun 2366*4882a593SmuzhiyunThe report shows each event that was 2367*4882a593Smuzhiyunfound in the blktrace data, along with a summary of the overall block 2368*4882a593SmuzhiyunI/O traffic during the run. You can look at the 2369*4882a593Smuzhiyun`blkparse <https://linux.die.net/man/1/blkparse>`__ manpage to learn the 2370*4882a593Smuzhiyunmeaning of each field displayed in the trace listing. 2371*4882a593Smuzhiyun 2372*4882a593SmuzhiyunLive Mode 2373*4882a593Smuzhiyun~~~~~~~~~ 2374*4882a593Smuzhiyun 2375*4882a593Smuzhiyunblktrace and blkparse are designed from the ground up to be able to 2376*4882a593Smuzhiyunoperate together in a 'pipe mode' where the stdout of blktrace can be 2377*4882a593Smuzhiyunfed directly into the stdin of blkparse:: 2378*4882a593Smuzhiyun 2379*4882a593Smuzhiyun root@crownbay:~# blktrace /dev/sdc -o - | blkparse -i - 2380*4882a593Smuzhiyun 2381*4882a593SmuzhiyunThis enables long-lived tracing sessions 2382*4882a593Smuzhiyunto run without writing anything to disk, and allows the user to look for 2383*4882a593Smuzhiyuncertain conditions in the trace data in 'real-time' by viewing the trace 2384*4882a593Smuzhiyunoutput as it scrolls by on the screen or by passing it along to yet 2385*4882a593Smuzhiyunanother program in the pipeline such as grep which can be used to 2386*4882a593Smuzhiyunidentify and capture conditions of interest. 2387*4882a593Smuzhiyun 2388*4882a593SmuzhiyunThere's actually another blktrace command that implements the above 2389*4882a593Smuzhiyunpipeline as a single command, so the user doesn't have to bother typing 2390*4882a593Smuzhiyunin the above command sequence:: 2391*4882a593Smuzhiyun 2392*4882a593Smuzhiyun root@crownbay:~# btrace /dev/sdc 2393*4882a593Smuzhiyun 2394*4882a593SmuzhiyunUsing blktrace Remotely 2395*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~ 2396*4882a593Smuzhiyun 2397*4882a593SmuzhiyunBecause blktrace traces block I/O and at the same time normally writes 2398*4882a593Smuzhiyunits trace data to a block device, and in general because it's not really 2399*4882a593Smuzhiyuna great idea to make the device being traced the same as the device the 2400*4882a593Smuzhiyuntracer writes to, blktrace provides a way to trace without perturbing 2401*4882a593Smuzhiyunthe traced device at all by providing native support for sending all 2402*4882a593Smuzhiyuntrace data over the network. 2403*4882a593Smuzhiyun 2404*4882a593SmuzhiyunTo have blktrace operate in this mode, start blktrace on the target 2405*4882a593Smuzhiyunsystem being traced with the -l option, along with the device to trace:: 2406*4882a593Smuzhiyun 2407*4882a593Smuzhiyun root@crownbay:~# blktrace -l /dev/sdc 2408*4882a593Smuzhiyun server: waiting for connections... 2409*4882a593Smuzhiyun 2410*4882a593SmuzhiyunOn the host system, use the -h option to connect to the target system, 2411*4882a593Smuzhiyunalso passing it the device to trace:: 2412*4882a593Smuzhiyun 2413*4882a593Smuzhiyun $ blktrace -d /dev/sdc -h 192.168.1.43 2414*4882a593Smuzhiyun blktrace: connecting to 192.168.1.43 2415*4882a593Smuzhiyun blktrace: connected! 2416*4882a593Smuzhiyun 2417*4882a593SmuzhiyunOn the target system, you should see this:: 2418*4882a593Smuzhiyun 2419*4882a593Smuzhiyun server: connection from 192.168.1.43 2420*4882a593Smuzhiyun 2421*4882a593SmuzhiyunIn another shell, execute a workload you want to trace. :: 2422*4882a593Smuzhiyun 2423*4882a593Smuzhiyun root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2; sync 2424*4882a593Smuzhiyun Connecting to downloads.yoctoproject.org (140.211.169.59:80) 2425*4882a593Smuzhiyun linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA 2426*4882a593Smuzhiyun 2427*4882a593SmuzhiyunWhen it's done, do a Ctrl-C on the host system to stop the 2428*4882a593Smuzhiyuntrace:: 2429*4882a593Smuzhiyun 2430*4882a593Smuzhiyun ^C=== sdc === 2431*4882a593Smuzhiyun CPU 0: 7691 events, 361 KiB data 2432*4882a593Smuzhiyun CPU 1: 4109 events, 193 KiB data 2433*4882a593Smuzhiyun Total: 11800 events (dropped 0), 554 KiB data 2434*4882a593Smuzhiyun 2435*4882a593SmuzhiyunOn the target system, you should also see a trace summary for the trace 2436*4882a593Smuzhiyunjust ended:: 2437*4882a593Smuzhiyun 2438*4882a593Smuzhiyun server: end of run for 192.168.1.43:sdc 2439*4882a593Smuzhiyun === sdc === 2440*4882a593Smuzhiyun CPU 0: 7691 events, 361 KiB data 2441*4882a593Smuzhiyun CPU 1: 4109 events, 193 KiB data 2442*4882a593Smuzhiyun Total: 11800 events (dropped 0), 554 KiB data 2443*4882a593Smuzhiyun 2444*4882a593SmuzhiyunThe blktrace instance on the host will 2445*4882a593Smuzhiyunsave the target output inside a hostname-timestamp directory:: 2446*4882a593Smuzhiyun 2447*4882a593Smuzhiyun $ ls -al 2448*4882a593Smuzhiyun drwxr-xr-x 10 root root 1024 Oct 28 02:40 . 2449*4882a593Smuzhiyun drwxr-sr-x 4 root root 1024 Oct 26 18:24 .. 2450*4882a593Smuzhiyun drwxr-xr-x 2 root root 1024 Oct 28 02:40 192.168.1.43-2012-10-28-02:40:56 2451*4882a593Smuzhiyun 2452*4882a593Smuzhiyuncd into that directory to see the output files:: 2453*4882a593Smuzhiyun 2454*4882a593Smuzhiyun $ ls -l 2455*4882a593Smuzhiyun -rw-r--r-- 1 root root 369193 Oct 28 02:44 sdc.blktrace.0 2456*4882a593Smuzhiyun -rw-r--r-- 1 root root 197278 Oct 28 02:44 sdc.blktrace.1 2457*4882a593Smuzhiyun 2458*4882a593SmuzhiyunAnd run blkparse on the host system using the device name:: 2459*4882a593Smuzhiyun 2460*4882a593Smuzhiyun $ blkparse sdc 2461*4882a593Smuzhiyun 2462*4882a593Smuzhiyun 8,32 1 1 0.000000000 1263 Q RM 6016 + 8 [ls] 2463*4882a593Smuzhiyun 8,32 1 0 0.000036038 0 m N cfq1263 alloced 2464*4882a593Smuzhiyun 8,32 1 2 0.000039390 1263 G RM 6016 + 8 [ls] 2465*4882a593Smuzhiyun 8,32 1 3 0.000049168 1263 I RM 6016 + 8 [ls] 2466*4882a593Smuzhiyun 8,32 1 0 0.000056152 0 m N cfq1263 insert_request 2467*4882a593Smuzhiyun 8,32 1 0 0.000061600 0 m N cfq1263 add_to_rr 2468*4882a593Smuzhiyun 8,32 1 0 0.000075498 0 m N cfq workload slice:300 2469*4882a593Smuzhiyun . 2470*4882a593Smuzhiyun . 2471*4882a593Smuzhiyun . 2472*4882a593Smuzhiyun 8,32 0 0 177.266385696 0 m N cfq1267 arm_idle: 8 group_idle: 0 2473*4882a593Smuzhiyun 8,32 0 0 177.266388140 0 m N cfq schedule dispatch 2474*4882a593Smuzhiyun 8,32 1 0 177.266679239 0 m N cfq1267 slice expired t=0 2475*4882a593Smuzhiyun 8,32 1 0 177.266689297 0 m N cfq1267 sl_used=9 disp=6 charge=9 iops=0 sect=56 2476*4882a593Smuzhiyun 8,32 1 0 177.266692649 0 m N cfq1267 del_from_rr 2477*4882a593Smuzhiyun 8,32 1 0 177.266696560 0 m N cfq1267 put_queue 2478*4882a593Smuzhiyun 2479*4882a593Smuzhiyun CPU0 (sdc): 2480*4882a593Smuzhiyun Reads Queued: 0, 0KiB Writes Queued: 270, 21,708KiB 2481*4882a593Smuzhiyun Read Dispatches: 59, 2,628KiB Write Dispatches: 495, 39,964KiB 2482*4882a593Smuzhiyun Reads Requeued: 0 Writes Requeued: 0 2483*4882a593Smuzhiyun Reads Completed: 90, 2,752KiB Writes Completed: 543, 41,596KiB 2484*4882a593Smuzhiyun Read Merges: 0, 0KiB Write Merges: 9, 344KiB 2485*4882a593Smuzhiyun Read depth: 2 Write depth: 2 2486*4882a593Smuzhiyun IO unplugs: 20 Timer unplugs: 1 2487*4882a593Smuzhiyun CPU1 (sdc): 2488*4882a593Smuzhiyun Reads Queued: 688, 2,752KiB Writes Queued: 381, 20,652KiB 2489*4882a593Smuzhiyun Read Dispatches: 31, 124KiB Write Dispatches: 59, 2,396KiB 2490*4882a593Smuzhiyun Reads Requeued: 0 Writes Requeued: 0 2491*4882a593Smuzhiyun Reads Completed: 0, 0KiB Writes Completed: 11, 764KiB 2492*4882a593Smuzhiyun Read Merges: 598, 2,392KiB Write Merges: 88, 448KiB 2493*4882a593Smuzhiyun Read depth: 2 Write depth: 2 2494*4882a593Smuzhiyun IO unplugs: 52 Timer unplugs: 0 2495*4882a593Smuzhiyun 2496*4882a593Smuzhiyun Total (sdc): 2497*4882a593Smuzhiyun Reads Queued: 688, 2,752KiB Writes Queued: 651, 42,360KiB 2498*4882a593Smuzhiyun Read Dispatches: 90, 2,752KiB Write Dispatches: 554, 42,360KiB 2499*4882a593Smuzhiyun Reads Requeued: 0 Writes Requeued: 0 2500*4882a593Smuzhiyun Reads Completed: 90, 2,752KiB Writes Completed: 554, 42,360KiB 2501*4882a593Smuzhiyun Read Merges: 598, 2,392KiB Write Merges: 97, 792KiB 2502*4882a593Smuzhiyun IO unplugs: 72 Timer unplugs: 1 2503*4882a593Smuzhiyun 2504*4882a593Smuzhiyun Throughput (R/W): 15KiB/s / 238KiB/s 2505*4882a593Smuzhiyun Events (sdc): 9,301 entries 2506*4882a593Smuzhiyun Skips: 0 forward (0 - 0.0%) 2507*4882a593Smuzhiyun 2508*4882a593SmuzhiyunYou should see the trace events and summary just as you would have if you'd run 2509*4882a593Smuzhiyunthe same command on the target. 2510*4882a593Smuzhiyun 2511*4882a593SmuzhiyunTracing Block I/O via 'ftrace' 2512*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2513*4882a593Smuzhiyun 2514*4882a593SmuzhiyunIt's also possible to trace block I/O using only 2515*4882a593Smuzhiyun:ref:`profile-manual/usage:The 'trace events' Subsystem`, which 2516*4882a593Smuzhiyuncan be useful for casual tracing if you don't want to bother dealing with the 2517*4882a593Smuzhiyunuserspace tools. 2518*4882a593Smuzhiyun 2519*4882a593SmuzhiyunTo enable tracing for a given device, use /sys/block/xxx/trace/enable, 2520*4882a593Smuzhiyunwhere xxx is the device name. This for example enables tracing for 2521*4882a593Smuzhiyun/dev/sdc:: 2522*4882a593Smuzhiyun 2523*4882a593Smuzhiyun root@crownbay:/sys/kernel/debug/tracing# echo 1 > /sys/block/sdc/trace/enable 2524*4882a593Smuzhiyun 2525*4882a593SmuzhiyunOnce you've selected the device(s) you want 2526*4882a593Smuzhiyunto trace, selecting the 'blk' tracer will turn the blk tracer on:: 2527*4882a593Smuzhiyun 2528*4882a593Smuzhiyun root@crownbay:/sys/kernel/debug/tracing# cat available_tracers 2529*4882a593Smuzhiyun blk function_graph function nop 2530*4882a593Smuzhiyun 2531*4882a593Smuzhiyun root@crownbay:/sys/kernel/debug/tracing# echo blk > current_tracer 2532*4882a593Smuzhiyun 2533*4882a593SmuzhiyunExecute the workload you're interested in:: 2534*4882a593Smuzhiyun 2535*4882a593Smuzhiyun root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt 2536*4882a593Smuzhiyun 2537*4882a593SmuzhiyunAnd look at the output (note here that we're using 'trace_pipe' instead of 2538*4882a593Smuzhiyuntrace to capture this trace - this allows us to wait around on the pipe 2539*4882a593Smuzhiyunfor data to appear):: 2540*4882a593Smuzhiyun 2541*4882a593Smuzhiyun root@crownbay:/sys/kernel/debug/tracing# cat trace_pipe 2542*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276361: 8,32 Q R 1699848 + 8 [cat] 2543*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276410: 8,32 m N cfq3587 alloced 2544*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276415: 8,32 G R 1699848 + 8 [cat] 2545*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276424: 8,32 P N [cat] 2546*4882a593Smuzhiyun cat-3587 [001] d..2 3023.276432: 8,32 I R 1699848 + 8 [cat] 2547*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276439: 8,32 m N cfq3587 insert_request 2548*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276445: 8,32 m N cfq3587 add_to_rr 2549*4882a593Smuzhiyun cat-3587 [001] d..2 3023.276454: 8,32 U N [cat] 1 2550*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276464: 8,32 m N cfq workload slice:150 2551*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276471: 8,32 m N cfq3587 set_active wl_prio:0 wl_type:2 2552*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276478: 8,32 m N cfq3587 fifo= (null) 2553*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276483: 8,32 m N cfq3587 dispatch_insert 2554*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276490: 8,32 m N cfq3587 dispatched a request 2555*4882a593Smuzhiyun cat-3587 [001] d..1 3023.276497: 8,32 m N cfq3587 activate rq, drv=1 2556*4882a593Smuzhiyun cat-3587 [001] d..2 3023.276500: 8,32 D R 1699848 + 8 [cat] 2557*4882a593Smuzhiyun 2558*4882a593SmuzhiyunAnd this turns off tracing for the specified device:: 2559*4882a593Smuzhiyun 2560*4882a593Smuzhiyun root@crownbay:/sys/kernel/debug/tracing# echo 0 > /sys/block/sdc/trace/enable 2561*4882a593Smuzhiyun 2562*4882a593Smuzhiyunblktrace Documentation 2563*4882a593Smuzhiyun---------------------- 2564*4882a593Smuzhiyun 2565*4882a593SmuzhiyunOnline versions of the man pages for the commands discussed in this 2566*4882a593Smuzhiyunsection can be found here: 2567*4882a593Smuzhiyun 2568*4882a593Smuzhiyun- https://linux.die.net/man/8/blktrace 2569*4882a593Smuzhiyun 2570*4882a593Smuzhiyun- https://linux.die.net/man/1/blkparse 2571*4882a593Smuzhiyun 2572*4882a593Smuzhiyun- https://linux.die.net/man/8/btrace 2573*4882a593Smuzhiyun 2574*4882a593SmuzhiyunThe above manpages, along with manpages for the other blktrace utilities 2575*4882a593Smuzhiyun(btt, blkiomon, etc) can be found in the /doc directory of the blktrace 2576*4882a593Smuzhiyuntools git repo:: 2577*4882a593Smuzhiyun 2578*4882a593Smuzhiyun $ git clone git://git.kernel.dk/blktrace.git 2579