xref: /OK3568_Linux_fs/yocto/poky/documentation/profile-manual/usage.rst (revision 4882a59341e53eb6f0b4789bf948001014eff981)
1*4882a593Smuzhiyun.. SPDX-License-Identifier: CC-BY-SA-2.0-UK
2*4882a593Smuzhiyun.. highlight:: shell
3*4882a593Smuzhiyun
4*4882a593Smuzhiyun***************************************************************
5*4882a593SmuzhiyunBasic Usage (with examples) for each of the Yocto Tracing Tools
6*4882a593Smuzhiyun***************************************************************
7*4882a593Smuzhiyun
8*4882a593Smuzhiyun|
9*4882a593Smuzhiyun
10*4882a593SmuzhiyunThis chapter presents basic usage examples for each of the tracing
11*4882a593Smuzhiyuntools.
12*4882a593Smuzhiyun
13*4882a593Smuzhiyunperf
14*4882a593Smuzhiyun====
15*4882a593Smuzhiyun
16*4882a593SmuzhiyunThe 'perf' tool is the profiling and tracing tool that comes bundled
17*4882a593Smuzhiyunwith the Linux kernel.
18*4882a593Smuzhiyun
19*4882a593SmuzhiyunDon't let the fact that it's part of the kernel fool you into thinking
20*4882a593Smuzhiyunthat it's only for tracing and profiling the kernel - you can indeed use
21*4882a593Smuzhiyunit to trace and profile just the kernel, but you can also use it to
22*4882a593Smuzhiyunprofile specific applications separately (with or without kernel
23*4882a593Smuzhiyuncontext), and you can also use it to trace and profile the kernel and
24*4882a593Smuzhiyunall applications on the system simultaneously to gain a system-wide view
25*4882a593Smuzhiyunof what's going on.
26*4882a593Smuzhiyun
27*4882a593SmuzhiyunIn many ways, perf aims to be a superset of all the tracing and
28*4882a593Smuzhiyunprofiling tools available in Linux today, including all the other tools
29*4882a593Smuzhiyuncovered in this HOWTO. The past couple of years have seen perf subsume a
30*4882a593Smuzhiyunlot of the functionality of those other tools and, at the same time,
31*4882a593Smuzhiyunthose other tools have removed large portions of their previous
32*4882a593Smuzhiyunfunctionality and replaced it with calls to the equivalent functionality
33*4882a593Smuzhiyunnow implemented by the perf subsystem. Extrapolation suggests that at
34*4882a593Smuzhiyunsome point those other tools will simply become completely redundant and
35*4882a593Smuzhiyungo away; until then, we'll cover those other tools in these pages and in
36*4882a593Smuzhiyunmany cases show how the same things can be accomplished in perf and the
37*4882a593Smuzhiyunother tools when it seems useful to do so.
38*4882a593Smuzhiyun
39*4882a593SmuzhiyunThe coverage below details some of the most common ways you'll likely
40*4882a593Smuzhiyunwant to apply the tool; full documentation can be found either within
41*4882a593Smuzhiyunthe tool itself or in the man pages at
42*4882a593Smuzhiyun`perf(1) <https://linux.die.net/man/1/perf>`__.
43*4882a593Smuzhiyun
44*4882a593SmuzhiyunPerf Setup
45*4882a593Smuzhiyun----------
46*4882a593Smuzhiyun
47*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup
48*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" section.
49*4882a593Smuzhiyun
50*4882a593SmuzhiyunIn particular, you'll get the most mileage out of perf if you profile an
51*4882a593Smuzhiyunimage built with the following in your ``local.conf`` file::
52*4882a593Smuzhiyun
53*4882a593Smuzhiyun   INHIBIT_PACKAGE_STRIP = "1"
54*4882a593Smuzhiyun
55*4882a593Smuzhiyunperf runs on the target system for the most part. You can archive
56*4882a593Smuzhiyunprofile data and copy it to the host for analysis, but for the rest of
57*4882a593Smuzhiyunthis document we assume you've ssh'ed to the host and will be running
58*4882a593Smuzhiyunthe perf commands on the target.
59*4882a593Smuzhiyun
60*4882a593SmuzhiyunBasic Perf Usage
61*4882a593Smuzhiyun----------------
62*4882a593Smuzhiyun
63*4882a593SmuzhiyunThe perf tool is pretty much self-documenting. To remind yourself of the
64*4882a593Smuzhiyunavailable commands, simply type 'perf', which will show you basic usage
65*4882a593Smuzhiyunalong with the available perf subcommands::
66*4882a593Smuzhiyun
67*4882a593Smuzhiyun   root@crownbay:~# perf
68*4882a593Smuzhiyun
69*4882a593Smuzhiyun   usage: perf [--version] [--help] COMMAND [ARGS]
70*4882a593Smuzhiyun
71*4882a593Smuzhiyun   The most commonly used perf commands are:
72*4882a593Smuzhiyun     annotate        Read perf.data (created by perf record) and display annotated code
73*4882a593Smuzhiyun     archive         Create archive with object files with build-ids found in perf.data file
74*4882a593Smuzhiyun     bench           General framework for benchmark suites
75*4882a593Smuzhiyun     buildid-cache   Manage build-id cache.
76*4882a593Smuzhiyun     buildid-list    List the buildids in a perf.data file
77*4882a593Smuzhiyun     diff            Read two perf.data files and display the differential profile
78*4882a593Smuzhiyun     evlist          List the event names in a perf.data file
79*4882a593Smuzhiyun     inject          Filter to augment the events stream with additional information
80*4882a593Smuzhiyun     kmem            Tool to trace/measure kernel memory(slab) properties
81*4882a593Smuzhiyun     kvm             Tool to trace/measure kvm guest os
82*4882a593Smuzhiyun     list            List all symbolic event types
83*4882a593Smuzhiyun     lock            Analyze lock events
84*4882a593Smuzhiyun     probe           Define new dynamic tracepoints
85*4882a593Smuzhiyun     record          Run a command and record its profile into perf.data
86*4882a593Smuzhiyun     report          Read perf.data (created by perf record) and display the profile
87*4882a593Smuzhiyun     sched           Tool to trace/measure scheduler properties (latencies)
88*4882a593Smuzhiyun     script          Read perf.data (created by perf record) and display trace output
89*4882a593Smuzhiyun     stat            Run a command and gather performance counter statistics
90*4882a593Smuzhiyun     test            Runs sanity tests.
91*4882a593Smuzhiyun     timechart       Tool to visualize total system behavior during a workload
92*4882a593Smuzhiyun     top             System profiling tool.
93*4882a593Smuzhiyun
94*4882a593Smuzhiyun   See 'perf help COMMAND' for more information on a specific command.
95*4882a593Smuzhiyun
96*4882a593Smuzhiyun
97*4882a593SmuzhiyunUsing perf to do Basic Profiling
98*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
99*4882a593Smuzhiyun
100*4882a593SmuzhiyunAs a simple test case, we'll profile the 'wget' of a fairly large file,
101*4882a593Smuzhiyunwhich is a minimally interesting case because it has both file and
102*4882a593Smuzhiyunnetwork I/O aspects, and at least in the case of standard Yocto images,
103*4882a593Smuzhiyunit's implemented as part of BusyBox, so the methods we use to analyze it
104*4882a593Smuzhiyuncan be used in a very similar way to the whole host of supported BusyBox
105*4882a593Smuzhiyunapplets in Yocto. ::
106*4882a593Smuzhiyun
107*4882a593Smuzhiyun   root@crownbay:~# rm linux-2.6.19.2.tar.bz2; \
108*4882a593Smuzhiyun                    wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
109*4882a593Smuzhiyun
110*4882a593SmuzhiyunThe quickest and easiest way to get some basic overall data about what's
111*4882a593Smuzhiyungoing on for a particular workload is to profile it using 'perf stat'.
112*4882a593Smuzhiyun'perf stat' basically profiles using a few default counters and displays
113*4882a593Smuzhiyunthe summed counts at the end of the run::
114*4882a593Smuzhiyun
115*4882a593Smuzhiyun   root@crownbay:~# perf stat wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
116*4882a593Smuzhiyun   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
117*4882a593Smuzhiyun   linux-2.6.19.2.tar.b 100% |***************************************************| 41727k  0:00:00 ETA
118*4882a593Smuzhiyun
119*4882a593Smuzhiyun   Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2':
120*4882a593Smuzhiyun
121*4882a593Smuzhiyun         4597.223902 task-clock                #    0.077 CPUs utilized
122*4882a593Smuzhiyun               23568 context-switches          #    0.005 M/sec
123*4882a593Smuzhiyun                  68 CPU-migrations            #    0.015 K/sec
124*4882a593Smuzhiyun                 241 page-faults               #    0.052 K/sec
125*4882a593Smuzhiyun          3045817293 cycles                    #    0.663 GHz
126*4882a593Smuzhiyun     <not supported> stalled-cycles-frontend
127*4882a593Smuzhiyun     <not supported> stalled-cycles-backend
128*4882a593Smuzhiyun           858909167 instructions              #    0.28  insns per cycle
129*4882a593Smuzhiyun           165441165 branches                  #   35.987 M/sec
130*4882a593Smuzhiyun            19550329 branch-misses             #   11.82% of all branches
131*4882a593Smuzhiyun
132*4882a593Smuzhiyun        59.836627620 seconds time elapsed
133*4882a593Smuzhiyun
134*4882a593SmuzhiyunMany times such a simple-minded test doesn't yield much of
135*4882a593Smuzhiyuninterest, but sometimes it does (see Real-world Yocto bug (slow
136*4882a593Smuzhiyunloop-mounted write speed)).
137*4882a593Smuzhiyun
138*4882a593SmuzhiyunAlso, note that 'perf stat' isn't restricted to a fixed set of counters
139*4882a593Smuzhiyun- basically any event listed in the output of 'perf list' can be tallied
140*4882a593Smuzhiyunby 'perf stat'. For example, suppose we wanted to see a summary of all
141*4882a593Smuzhiyunthe events related to kernel memory allocation/freeing along with cache
142*4882a593Smuzhiyunhits and misses::
143*4882a593Smuzhiyun
144*4882a593Smuzhiyun   root@crownbay:~# perf stat -e kmem:* -e cache-references -e cache-misses wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
145*4882a593Smuzhiyun   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
146*4882a593Smuzhiyun   linux-2.6.19.2.tar.b 100% |***************************************************| 41727k  0:00:00 ETA
147*4882a593Smuzhiyun
148*4882a593Smuzhiyun   Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2':
149*4882a593Smuzhiyun
150*4882a593Smuzhiyun                5566 kmem:kmalloc
151*4882a593Smuzhiyun              125517 kmem:kmem_cache_alloc
152*4882a593Smuzhiyun                   0 kmem:kmalloc_node
153*4882a593Smuzhiyun                   0 kmem:kmem_cache_alloc_node
154*4882a593Smuzhiyun               34401 kmem:kfree
155*4882a593Smuzhiyun               69920 kmem:kmem_cache_free
156*4882a593Smuzhiyun                 133 kmem:mm_page_free
157*4882a593Smuzhiyun                  41 kmem:mm_page_free_batched
158*4882a593Smuzhiyun               11502 kmem:mm_page_alloc
159*4882a593Smuzhiyun               11375 kmem:mm_page_alloc_zone_locked
160*4882a593Smuzhiyun                   0 kmem:mm_page_pcpu_drain
161*4882a593Smuzhiyun                   0 kmem:mm_page_alloc_extfrag
162*4882a593Smuzhiyun            66848602 cache-references
163*4882a593Smuzhiyun             2917740 cache-misses              #    4.365 % of all cache refs
164*4882a593Smuzhiyun
165*4882a593Smuzhiyun        44.831023415 seconds time elapsed
166*4882a593Smuzhiyun
167*4882a593SmuzhiyunSo 'perf stat' gives us a nice easy
168*4882a593Smuzhiyunway to get a quick overview of what might be happening for a set of
169*4882a593Smuzhiyunevents, but normally we'd need a little more detail in order to
170*4882a593Smuzhiyununderstand what's going on in a way that we can act on in a useful way.
171*4882a593Smuzhiyun
172*4882a593SmuzhiyunTo dive down into a next level of detail, we can use 'perf record'/'perf
173*4882a593Smuzhiyunreport' which will collect profiling data and present it to use using an
174*4882a593Smuzhiyuninteractive text-based UI (or simply as text if we specify ``--stdio`` to
175*4882a593Smuzhiyun'perf report').
176*4882a593Smuzhiyun
177*4882a593SmuzhiyunAs our first attempt at profiling this workload, we'll simply run 'perf
178*4882a593Smuzhiyunrecord', handing it the workload we want to profile (everything after
179*4882a593Smuzhiyun'perf record' and any perf options we hand it - here none - will be
180*4882a593Smuzhiyunexecuted in a new shell). perf collects samples until the process exits
181*4882a593Smuzhiyunand records them in a file named 'perf.data' in the current working
182*4882a593Smuzhiyundirectory. ::
183*4882a593Smuzhiyun
184*4882a593Smuzhiyun   root@crownbay:~# perf record wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
185*4882a593Smuzhiyun
186*4882a593Smuzhiyun   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
187*4882a593Smuzhiyun   linux-2.6.19.2.tar.b 100% |************************************************| 41727k  0:00:00 ETA
188*4882a593Smuzhiyun   [ perf record: Woken up 1 times to write data ]
189*4882a593Smuzhiyun   [ perf record: Captured and wrote 0.176 MB perf.data (~7700 samples) ]
190*4882a593Smuzhiyun
191*4882a593SmuzhiyunTo see the results in a
192*4882a593Smuzhiyun'text-based UI' (tui), simply run 'perf report', which will read the
193*4882a593Smuzhiyunperf.data file in the current working directory and display the results
194*4882a593Smuzhiyunin an interactive UI::
195*4882a593Smuzhiyun
196*4882a593Smuzhiyun   root@crownbay:~# perf report
197*4882a593Smuzhiyun
198*4882a593Smuzhiyun.. image:: figures/perf-wget-flat-stripped.png
199*4882a593Smuzhiyun   :align: center
200*4882a593Smuzhiyun
201*4882a593SmuzhiyunThe above screenshot displays a 'flat' profile, one entry for each
202*4882a593Smuzhiyun'bucket' corresponding to the functions that were profiled during the
203*4882a593Smuzhiyunprofiling run, ordered from the most popular to the least (perf has
204*4882a593Smuzhiyunoptions to sort in various orders and keys as well as display entries
205*4882a593Smuzhiyunonly above a certain threshold and so on - see the perf documentation
206*4882a593Smuzhiyunfor details). Note that this includes both userspace functions (entries
207*4882a593Smuzhiyuncontaining a [.]) and kernel functions accounted to the process (entries
208*4882a593Smuzhiyuncontaining a [k]). (perf has command-line modifiers that can be used to
209*4882a593Smuzhiyunrestrict the profiling to kernel or userspace, among others).
210*4882a593Smuzhiyun
211*4882a593SmuzhiyunNotice also that the above report shows an entry for 'busybox', which is
212*4882a593Smuzhiyunthe executable that implements 'wget' in Yocto, but that instead of a
213*4882a593Smuzhiyunuseful function name in that entry, it displays a not-so-friendly hex
214*4882a593Smuzhiyunvalue instead. The steps below will show how to fix that problem.
215*4882a593Smuzhiyun
216*4882a593SmuzhiyunBefore we do that, however, let's try running a different profile, one
217*4882a593Smuzhiyunwhich shows something a little more interesting. The only difference
218*4882a593Smuzhiyunbetween the new profile and the previous one is that we'll add the -g
219*4882a593Smuzhiyunoption, which will record not just the address of a sampled function,
220*4882a593Smuzhiyunbut the entire callchain to the sampled function as well::
221*4882a593Smuzhiyun
222*4882a593Smuzhiyun   root@crownbay:~# perf record -g wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
223*4882a593Smuzhiyun   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
224*4882a593Smuzhiyun   linux-2.6.19.2.tar.b 100% |************************************************| 41727k  0:00:00 ETA
225*4882a593Smuzhiyun   [ perf record: Woken up 3 times to write data ]
226*4882a593Smuzhiyun   [ perf record: Captured and wrote 0.652 MB perf.data (~28476 samples) ]
227*4882a593Smuzhiyun
228*4882a593Smuzhiyun
229*4882a593Smuzhiyun   root@crownbay:~# perf report
230*4882a593Smuzhiyun
231*4882a593Smuzhiyun.. image:: figures/perf-wget-g-copy-to-user-expanded-stripped.png
232*4882a593Smuzhiyun   :align: center
233*4882a593Smuzhiyun
234*4882a593SmuzhiyunUsing the callgraph view, we can actually see not only which functions
235*4882a593Smuzhiyuntook the most time, but we can also see a summary of how those functions
236*4882a593Smuzhiyunwere called and learn something about how the program interacts with the
237*4882a593Smuzhiyunkernel in the process.
238*4882a593Smuzhiyun
239*4882a593SmuzhiyunNotice that each entry in the above screenshot now contains a '+' on the
240*4882a593Smuzhiyunleft-hand side. This means that we can expand the entry and drill down
241*4882a593Smuzhiyuninto the callchains that feed into that entry. Pressing 'enter' on any
242*4882a593Smuzhiyunone of them will expand the callchain (you can also press 'E' to expand
243*4882a593Smuzhiyunthem all at the same time or 'C' to collapse them all).
244*4882a593Smuzhiyun
245*4882a593SmuzhiyunIn the screenshot above, we've toggled the ``__copy_to_user_ll()`` entry
246*4882a593Smuzhiyunand several subnodes all the way down. This lets us see which callchains
247*4882a593Smuzhiyuncontributed to the profiled ``__copy_to_user_ll()`` function which
248*4882a593Smuzhiyuncontributed 1.77% to the total profile.
249*4882a593Smuzhiyun
250*4882a593SmuzhiyunAs a bit of background explanation for these callchains, think about
251*4882a593Smuzhiyunwhat happens at a high level when you run wget to get a file out on the
252*4882a593Smuzhiyunnetwork. Basically what happens is that the data comes into the kernel
253*4882a593Smuzhiyunvia the network connection (socket) and is passed to the userspace
254*4882a593Smuzhiyunprogram 'wget' (which is actually a part of BusyBox, but that's not
255*4882a593Smuzhiyunimportant for now), which takes the buffers the kernel passes to it and
256*4882a593Smuzhiyunwrites it to a disk file to save it.
257*4882a593Smuzhiyun
258*4882a593SmuzhiyunThe part of this process that we're looking at in the above call stacks
259*4882a593Smuzhiyunis the part where the kernel passes the data it has read from the socket
260*4882a593Smuzhiyundown to wget i.e. a copy-to-user.
261*4882a593Smuzhiyun
262*4882a593SmuzhiyunNotice also that here there's also a case where the hex value is
263*4882a593Smuzhiyundisplayed in the callstack, here in the expanded ``sys_clock_gettime()``
264*4882a593Smuzhiyunfunction. Later we'll see it resolve to a userspace function call in
265*4882a593Smuzhiyunbusybox.
266*4882a593Smuzhiyun
267*4882a593Smuzhiyun.. image:: figures/perf-wget-g-copy-from-user-expanded-stripped.png
268*4882a593Smuzhiyun   :align: center
269*4882a593Smuzhiyun
270*4882a593SmuzhiyunThe above screenshot shows the other half of the journey for the data -
271*4882a593Smuzhiyunfrom the wget program's userspace buffers to disk. To get the buffers to
272*4882a593Smuzhiyundisk, the wget program issues a ``write(2)``, which does a ``copy-from-user`` to
273*4882a593Smuzhiyunthe kernel, which then takes care via some circuitous path (probably
274*4882a593Smuzhiyunalso present somewhere in the profile data), to get it safely to disk.
275*4882a593Smuzhiyun
276*4882a593SmuzhiyunNow that we've seen the basic layout of the profile data and the basics
277*4882a593Smuzhiyunof how to extract useful information out of it, let's get back to the
278*4882a593Smuzhiyuntask at hand and see if we can get some basic idea about where the time
279*4882a593Smuzhiyunis spent in the program we're profiling, wget. Remember that wget is
280*4882a593Smuzhiyunactually implemented as an applet in BusyBox, so while the process name
281*4882a593Smuzhiyunis 'wget', the executable we're actually interested in is BusyBox. So
282*4882a593Smuzhiyunlet's expand the first entry containing BusyBox:
283*4882a593Smuzhiyun
284*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-expanded-stripped.png
285*4882a593Smuzhiyun   :align: center
286*4882a593Smuzhiyun
287*4882a593SmuzhiyunAgain, before we expanded we saw that the function was labeled with a
288*4882a593Smuzhiyunhex value instead of a symbol as with most of the kernel entries.
289*4882a593SmuzhiyunExpanding the BusyBox entry doesn't make it any better.
290*4882a593Smuzhiyun
291*4882a593SmuzhiyunThe problem is that perf can't find the symbol information for the
292*4882a593Smuzhiyunbusybox binary, which is actually stripped out by the Yocto build
293*4882a593Smuzhiyunsystem.
294*4882a593Smuzhiyun
295*4882a593SmuzhiyunOne way around that is to put the following in your ``local.conf`` file
296*4882a593Smuzhiyunwhen you build the image::
297*4882a593Smuzhiyun
298*4882a593Smuzhiyun   INHIBIT_PACKAGE_STRIP = "1"
299*4882a593Smuzhiyun
300*4882a593SmuzhiyunHowever, we already have an image with the binaries stripped, so
301*4882a593Smuzhiyunwhat can we do to get perf to resolve the symbols? Basically we need to
302*4882a593Smuzhiyuninstall the debuginfo for the BusyBox package.
303*4882a593Smuzhiyun
304*4882a593SmuzhiyunTo generate the debug info for the packages in the image, we can add
305*4882a593Smuzhiyun``dbg-pkgs`` to :term:`EXTRA_IMAGE_FEATURES` in ``local.conf``. For example::
306*4882a593Smuzhiyun
307*4882a593Smuzhiyun   EXTRA_IMAGE_FEATURES = "debug-tweaks tools-profile dbg-pkgs"
308*4882a593Smuzhiyun
309*4882a593SmuzhiyunAdditionally, in order to generate the type of debuginfo that perf
310*4882a593Smuzhiyununderstands, we also need to set
311*4882a593Smuzhiyun:term:`PACKAGE_DEBUG_SPLIT_STYLE`
312*4882a593Smuzhiyunin the ``local.conf`` file::
313*4882a593Smuzhiyun
314*4882a593Smuzhiyun   PACKAGE_DEBUG_SPLIT_STYLE = 'debug-file-directory'
315*4882a593Smuzhiyun
316*4882a593SmuzhiyunOnce we've done that, we can install the
317*4882a593Smuzhiyundebuginfo for BusyBox. The debug packages once built can be found in
318*4882a593Smuzhiyun``build/tmp/deploy/rpm/*`` on the host system. Find the busybox-dbg-...rpm
319*4882a593Smuzhiyunfile and copy it to the target. For example::
320*4882a593Smuzhiyun
321*4882a593Smuzhiyun   [trz@empanada core2]$ scp /home/trz/yocto/crownbay-tracing-dbg/build/tmp/deploy/rpm/core2_32/busybox-dbg-1.20.2-r2.core2_32.rpm root@192.168.1.31:
322*4882a593Smuzhiyun   busybox-dbg-1.20.2-r2.core2_32.rpm                     100% 1826KB   1.8MB/s   00:01
323*4882a593Smuzhiyun
324*4882a593SmuzhiyunNow install the debug rpm on the target::
325*4882a593Smuzhiyun
326*4882a593Smuzhiyun   root@crownbay:~# rpm -i busybox-dbg-1.20.2-r2.core2_32.rpm
327*4882a593Smuzhiyun
328*4882a593SmuzhiyunNow that the debuginfo is installed, we see that the BusyBox entries now display
329*4882a593Smuzhiyuntheir functions symbolically:
330*4882a593Smuzhiyun
331*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-debuginfo.png
332*4882a593Smuzhiyun   :align: center
333*4882a593Smuzhiyun
334*4882a593SmuzhiyunIf we expand one of the entries and press 'enter' on a leaf node, we're
335*4882a593Smuzhiyunpresented with a menu of actions we can take to get more information
336*4882a593Smuzhiyunrelated to that entry:
337*4882a593Smuzhiyun
338*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-dso-zoom-menu.png
339*4882a593Smuzhiyun   :align: center
340*4882a593Smuzhiyun
341*4882a593SmuzhiyunOne of these actions allows us to show a view that displays a
342*4882a593Smuzhiyunbusybox-centric view of the profiled functions (in this case we've also
343*4882a593Smuzhiyunexpanded all the nodes using the 'E' key):
344*4882a593Smuzhiyun
345*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-dso-zoom.png
346*4882a593Smuzhiyun   :align: center
347*4882a593Smuzhiyun
348*4882a593SmuzhiyunFinally, we can see that now that the BusyBox debuginfo is installed,
349*4882a593Smuzhiyunthe previously unresolved symbol in the ``sys_clock_gettime()`` entry
350*4882a593Smuzhiyunmentioned previously is now resolved, and shows that the
351*4882a593Smuzhiyunsys_clock_gettime system call that was the source of 6.75% of the
352*4882a593Smuzhiyuncopy-to-user overhead was initiated by the ``handle_input()`` BusyBox
353*4882a593Smuzhiyunfunction:
354*4882a593Smuzhiyun
355*4882a593Smuzhiyun.. image:: figures/perf-wget-g-copy-to-user-expanded-debuginfo.png
356*4882a593Smuzhiyun   :align: center
357*4882a593Smuzhiyun
358*4882a593SmuzhiyunAt the lowest level of detail, we can dive down to the assembly level
359*4882a593Smuzhiyunand see which instructions caused the most overhead in a function.
360*4882a593SmuzhiyunPressing 'enter' on the 'udhcpc_main' function, we're again presented
361*4882a593Smuzhiyunwith a menu:
362*4882a593Smuzhiyun
363*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-annotate-menu.png
364*4882a593Smuzhiyun   :align: center
365*4882a593Smuzhiyun
366*4882a593SmuzhiyunSelecting 'Annotate udhcpc_main', we get a detailed listing of
367*4882a593Smuzhiyunpercentages by instruction for the udhcpc_main function. From the
368*4882a593Smuzhiyundisplay, we can see that over 50% of the time spent in this function is
369*4882a593Smuzhiyuntaken up by a couple tests and the move of a constant (1) to a register:
370*4882a593Smuzhiyun
371*4882a593Smuzhiyun.. image:: figures/perf-wget-busybox-annotate-udhcpc.png
372*4882a593Smuzhiyun   :align: center
373*4882a593Smuzhiyun
374*4882a593SmuzhiyunAs a segue into tracing, let's try another profile using a different
375*4882a593Smuzhiyuncounter, something other than the default 'cycles'.
376*4882a593Smuzhiyun
377*4882a593SmuzhiyunThe tracing and profiling infrastructure in Linux has become unified in
378*4882a593Smuzhiyuna way that allows us to use the same tool with a completely different
379*4882a593Smuzhiyunset of counters, not just the standard hardware counters that
380*4882a593Smuzhiyuntraditional tools have had to restrict themselves to (of course the
381*4882a593Smuzhiyuntraditional tools can also make use of the expanded possibilities now
382*4882a593Smuzhiyunavailable to them, and in some cases have, as mentioned previously).
383*4882a593Smuzhiyun
384*4882a593SmuzhiyunWe can get a list of the available events that can be used to profile a
385*4882a593Smuzhiyunworkload via 'perf list'::
386*4882a593Smuzhiyun
387*4882a593Smuzhiyun   root@crownbay:~# perf list
388*4882a593Smuzhiyun
389*4882a593Smuzhiyun   List of pre-defined events (to be used in -e):
390*4882a593Smuzhiyun    cpu-cycles OR cycles                               [Hardware event]
391*4882a593Smuzhiyun    stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
392*4882a593Smuzhiyun    stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
393*4882a593Smuzhiyun    instructions                                       [Hardware event]
394*4882a593Smuzhiyun    cache-references                                   [Hardware event]
395*4882a593Smuzhiyun    cache-misses                                       [Hardware event]
396*4882a593Smuzhiyun    branch-instructions OR branches                    [Hardware event]
397*4882a593Smuzhiyun    branch-misses                                      [Hardware event]
398*4882a593Smuzhiyun    bus-cycles                                         [Hardware event]
399*4882a593Smuzhiyun    ref-cycles                                         [Hardware event]
400*4882a593Smuzhiyun
401*4882a593Smuzhiyun    cpu-clock                                          [Software event]
402*4882a593Smuzhiyun    task-clock                                         [Software event]
403*4882a593Smuzhiyun    page-faults OR faults                              [Software event]
404*4882a593Smuzhiyun    minor-faults                                       [Software event]
405*4882a593Smuzhiyun    major-faults                                       [Software event]
406*4882a593Smuzhiyun    context-switches OR cs                             [Software event]
407*4882a593Smuzhiyun    cpu-migrations OR migrations                       [Software event]
408*4882a593Smuzhiyun    alignment-faults                                   [Software event]
409*4882a593Smuzhiyun    emulation-faults                                   [Software event]
410*4882a593Smuzhiyun
411*4882a593Smuzhiyun    L1-dcache-loads                                    [Hardware cache event]
412*4882a593Smuzhiyun    L1-dcache-load-misses                              [Hardware cache event]
413*4882a593Smuzhiyun    L1-dcache-prefetch-misses                          [Hardware cache event]
414*4882a593Smuzhiyun    L1-icache-loads                                    [Hardware cache event]
415*4882a593Smuzhiyun    L1-icache-load-misses                              [Hardware cache event]
416*4882a593Smuzhiyun    .
417*4882a593Smuzhiyun    .
418*4882a593Smuzhiyun    .
419*4882a593Smuzhiyun    rNNN                                               [Raw hardware event descriptor]
420*4882a593Smuzhiyun    cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
421*4882a593Smuzhiyun     (see 'perf list --help' on how to encode it)
422*4882a593Smuzhiyun
423*4882a593Smuzhiyun    mem:<addr>[:access]                                [Hardware breakpoint]
424*4882a593Smuzhiyun
425*4882a593Smuzhiyun    sunrpc:rpc_call_status                             [Tracepoint event]
426*4882a593Smuzhiyun    sunrpc:rpc_bind_status                             [Tracepoint event]
427*4882a593Smuzhiyun    sunrpc:rpc_connect_status                          [Tracepoint event]
428*4882a593Smuzhiyun    sunrpc:rpc_task_begin                              [Tracepoint event]
429*4882a593Smuzhiyun    skb:kfree_skb                                      [Tracepoint event]
430*4882a593Smuzhiyun    skb:consume_skb                                    [Tracepoint event]
431*4882a593Smuzhiyun    skb:skb_copy_datagram_iovec                        [Tracepoint event]
432*4882a593Smuzhiyun    net:net_dev_xmit                                   [Tracepoint event]
433*4882a593Smuzhiyun    net:net_dev_queue                                  [Tracepoint event]
434*4882a593Smuzhiyun    net:netif_receive_skb                              [Tracepoint event]
435*4882a593Smuzhiyun    net:netif_rx                                       [Tracepoint event]
436*4882a593Smuzhiyun    napi:napi_poll                                     [Tracepoint event]
437*4882a593Smuzhiyun    sock:sock_rcvqueue_full                            [Tracepoint event]
438*4882a593Smuzhiyun    sock:sock_exceed_buf_limit                         [Tracepoint event]
439*4882a593Smuzhiyun    udp:udp_fail_queue_rcv_skb                         [Tracepoint event]
440*4882a593Smuzhiyun    hda:hda_send_cmd                                   [Tracepoint event]
441*4882a593Smuzhiyun    hda:hda_get_response                               [Tracepoint event]
442*4882a593Smuzhiyun    hda:hda_bus_reset                                  [Tracepoint event]
443*4882a593Smuzhiyun    scsi:scsi_dispatch_cmd_start                       [Tracepoint event]
444*4882a593Smuzhiyun    scsi:scsi_dispatch_cmd_error                       [Tracepoint event]
445*4882a593Smuzhiyun    scsi:scsi_eh_wakeup                                [Tracepoint event]
446*4882a593Smuzhiyun    drm:drm_vblank_event                               [Tracepoint event]
447*4882a593Smuzhiyun    drm:drm_vblank_event_queued                        [Tracepoint event]
448*4882a593Smuzhiyun    drm:drm_vblank_event_delivered                     [Tracepoint event]
449*4882a593Smuzhiyun    random:mix_pool_bytes                              [Tracepoint event]
450*4882a593Smuzhiyun    random:mix_pool_bytes_nolock                       [Tracepoint event]
451*4882a593Smuzhiyun    random:credit_entropy_bits                         [Tracepoint event]
452*4882a593Smuzhiyun    gpio:gpio_direction                                [Tracepoint event]
453*4882a593Smuzhiyun    gpio:gpio_value                                    [Tracepoint event]
454*4882a593Smuzhiyun    block:block_rq_abort                               [Tracepoint event]
455*4882a593Smuzhiyun    block:block_rq_requeue                             [Tracepoint event]
456*4882a593Smuzhiyun    block:block_rq_issue                               [Tracepoint event]
457*4882a593Smuzhiyun    block:block_bio_bounce                             [Tracepoint event]
458*4882a593Smuzhiyun    block:block_bio_complete                           [Tracepoint event]
459*4882a593Smuzhiyun    block:block_bio_backmerge                          [Tracepoint event]
460*4882a593Smuzhiyun    .
461*4882a593Smuzhiyun    .
462*4882a593Smuzhiyun    writeback:writeback_wake_thread                    [Tracepoint event]
463*4882a593Smuzhiyun    writeback:writeback_wake_forker_thread             [Tracepoint event]
464*4882a593Smuzhiyun    writeback:writeback_bdi_register                   [Tracepoint event]
465*4882a593Smuzhiyun    .
466*4882a593Smuzhiyun    .
467*4882a593Smuzhiyun    writeback:writeback_single_inode_requeue           [Tracepoint event]
468*4882a593Smuzhiyun    writeback:writeback_single_inode                   [Tracepoint event]
469*4882a593Smuzhiyun    kmem:kmalloc                                       [Tracepoint event]
470*4882a593Smuzhiyun    kmem:kmem_cache_alloc                              [Tracepoint event]
471*4882a593Smuzhiyun    kmem:mm_page_alloc                                 [Tracepoint event]
472*4882a593Smuzhiyun    kmem:mm_page_alloc_zone_locked                     [Tracepoint event]
473*4882a593Smuzhiyun    kmem:mm_page_pcpu_drain                            [Tracepoint event]
474*4882a593Smuzhiyun    kmem:mm_page_alloc_extfrag                         [Tracepoint event]
475*4882a593Smuzhiyun    vmscan:mm_vmscan_kswapd_sleep                      [Tracepoint event]
476*4882a593Smuzhiyun    vmscan:mm_vmscan_kswapd_wake                       [Tracepoint event]
477*4882a593Smuzhiyun    vmscan:mm_vmscan_wakeup_kswapd                     [Tracepoint event]
478*4882a593Smuzhiyun    vmscan:mm_vmscan_direct_reclaim_begin              [Tracepoint event]
479*4882a593Smuzhiyun    .
480*4882a593Smuzhiyun    .
481*4882a593Smuzhiyun    module:module_get                                  [Tracepoint event]
482*4882a593Smuzhiyun    module:module_put                                  [Tracepoint event]
483*4882a593Smuzhiyun    module:module_request                              [Tracepoint event]
484*4882a593Smuzhiyun    sched:sched_kthread_stop                           [Tracepoint event]
485*4882a593Smuzhiyun    sched:sched_wakeup                                 [Tracepoint event]
486*4882a593Smuzhiyun    sched:sched_wakeup_new                             [Tracepoint event]
487*4882a593Smuzhiyun    sched:sched_process_fork                           [Tracepoint event]
488*4882a593Smuzhiyun    sched:sched_process_exec                           [Tracepoint event]
489*4882a593Smuzhiyun    sched:sched_stat_runtime                           [Tracepoint event]
490*4882a593Smuzhiyun    rcu:rcu_utilization                                [Tracepoint event]
491*4882a593Smuzhiyun    workqueue:workqueue_queue_work                     [Tracepoint event]
492*4882a593Smuzhiyun    workqueue:workqueue_execute_end                    [Tracepoint event]
493*4882a593Smuzhiyun    signal:signal_generate                             [Tracepoint event]
494*4882a593Smuzhiyun    signal:signal_deliver                              [Tracepoint event]
495*4882a593Smuzhiyun    timer:timer_init                                   [Tracepoint event]
496*4882a593Smuzhiyun    timer:timer_start                                  [Tracepoint event]
497*4882a593Smuzhiyun    timer:hrtimer_cancel                               [Tracepoint event]
498*4882a593Smuzhiyun    timer:itimer_state                                 [Tracepoint event]
499*4882a593Smuzhiyun    timer:itimer_expire                                [Tracepoint event]
500*4882a593Smuzhiyun    irq:irq_handler_entry                              [Tracepoint event]
501*4882a593Smuzhiyun    irq:irq_handler_exit                               [Tracepoint event]
502*4882a593Smuzhiyun    irq:softirq_entry                                  [Tracepoint event]
503*4882a593Smuzhiyun    irq:softirq_exit                                   [Tracepoint event]
504*4882a593Smuzhiyun    irq:softirq_raise                                  [Tracepoint event]
505*4882a593Smuzhiyun    printk:console                                     [Tracepoint event]
506*4882a593Smuzhiyun    task:task_newtask                                  [Tracepoint event]
507*4882a593Smuzhiyun    task:task_rename                                   [Tracepoint event]
508*4882a593Smuzhiyun    syscalls:sys_enter_socketcall                      [Tracepoint event]
509*4882a593Smuzhiyun    syscalls:sys_exit_socketcall                       [Tracepoint event]
510*4882a593Smuzhiyun    .
511*4882a593Smuzhiyun    .
512*4882a593Smuzhiyun    .
513*4882a593Smuzhiyun    syscalls:sys_enter_unshare                         [Tracepoint event]
514*4882a593Smuzhiyun    syscalls:sys_exit_unshare                          [Tracepoint event]
515*4882a593Smuzhiyun    raw_syscalls:sys_enter                             [Tracepoint event]
516*4882a593Smuzhiyun    raw_syscalls:sys_exit                              [Tracepoint event]
517*4882a593Smuzhiyun
518*4882a593Smuzhiyun.. admonition:: Tying it Together
519*4882a593Smuzhiyun
520*4882a593Smuzhiyun   These are exactly the same set of events defined by the trace event
521*4882a593Smuzhiyun   subsystem and exposed by ftrace/tracecmd/kernelshark as files in
522*4882a593Smuzhiyun   /sys/kernel/debug/tracing/events, by SystemTap as
523*4882a593Smuzhiyun   kernel.trace("tracepoint_name") and (partially) accessed by LTTng.
524*4882a593Smuzhiyun
525*4882a593SmuzhiyunOnly a subset of these would be of interest to us when looking at this
526*4882a593Smuzhiyunworkload, so let's choose the most likely subsystems (identified by the
527*4882a593Smuzhiyunstring before the colon in the Tracepoint events) and do a 'perf stat'
528*4882a593Smuzhiyunrun using only those wildcarded subsystems::
529*4882a593Smuzhiyun
530*4882a593Smuzhiyun   root@crownbay:~# perf stat -e skb:* -e net:* -e napi:* -e sched:* -e workqueue:* -e irq:* -e syscalls:* wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
531*4882a593Smuzhiyun   Performance counter stats for 'wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2':
532*4882a593Smuzhiyun
533*4882a593Smuzhiyun               23323 skb:kfree_skb
534*4882a593Smuzhiyun                   0 skb:consume_skb
535*4882a593Smuzhiyun               49897 skb:skb_copy_datagram_iovec
536*4882a593Smuzhiyun                6217 net:net_dev_xmit
537*4882a593Smuzhiyun                6217 net:net_dev_queue
538*4882a593Smuzhiyun                7962 net:netif_receive_skb
539*4882a593Smuzhiyun                   2 net:netif_rx
540*4882a593Smuzhiyun                8340 napi:napi_poll
541*4882a593Smuzhiyun                   0 sched:sched_kthread_stop
542*4882a593Smuzhiyun                   0 sched:sched_kthread_stop_ret
543*4882a593Smuzhiyun                3749 sched:sched_wakeup
544*4882a593Smuzhiyun                   0 sched:sched_wakeup_new
545*4882a593Smuzhiyun                   0 sched:sched_switch
546*4882a593Smuzhiyun                  29 sched:sched_migrate_task
547*4882a593Smuzhiyun                   0 sched:sched_process_free
548*4882a593Smuzhiyun                   1 sched:sched_process_exit
549*4882a593Smuzhiyun                   0 sched:sched_wait_task
550*4882a593Smuzhiyun                   0 sched:sched_process_wait
551*4882a593Smuzhiyun                   0 sched:sched_process_fork
552*4882a593Smuzhiyun                   1 sched:sched_process_exec
553*4882a593Smuzhiyun                   0 sched:sched_stat_wait
554*4882a593Smuzhiyun       2106519415641 sched:sched_stat_sleep
555*4882a593Smuzhiyun                   0 sched:sched_stat_iowait
556*4882a593Smuzhiyun           147453613 sched:sched_stat_blocked
557*4882a593Smuzhiyun         12903026955 sched:sched_stat_runtime
558*4882a593Smuzhiyun                   0 sched:sched_pi_setprio
559*4882a593Smuzhiyun                3574 workqueue:workqueue_queue_work
560*4882a593Smuzhiyun                3574 workqueue:workqueue_activate_work
561*4882a593Smuzhiyun                   0 workqueue:workqueue_execute_start
562*4882a593Smuzhiyun                   0 workqueue:workqueue_execute_end
563*4882a593Smuzhiyun               16631 irq:irq_handler_entry
564*4882a593Smuzhiyun               16631 irq:irq_handler_exit
565*4882a593Smuzhiyun               28521 irq:softirq_entry
566*4882a593Smuzhiyun               28521 irq:softirq_exit
567*4882a593Smuzhiyun               28728 irq:softirq_raise
568*4882a593Smuzhiyun                   1 syscalls:sys_enter_sendmmsg
569*4882a593Smuzhiyun                   1 syscalls:sys_exit_sendmmsg
570*4882a593Smuzhiyun                   0 syscalls:sys_enter_recvmmsg
571*4882a593Smuzhiyun                   0 syscalls:sys_exit_recvmmsg
572*4882a593Smuzhiyun                  14 syscalls:sys_enter_socketcall
573*4882a593Smuzhiyun                  14 syscalls:sys_exit_socketcall
574*4882a593Smuzhiyun                     .
575*4882a593Smuzhiyun                     .
576*4882a593Smuzhiyun                     .
577*4882a593Smuzhiyun               16965 syscalls:sys_enter_read
578*4882a593Smuzhiyun               16965 syscalls:sys_exit_read
579*4882a593Smuzhiyun               12854 syscalls:sys_enter_write
580*4882a593Smuzhiyun               12854 syscalls:sys_exit_write
581*4882a593Smuzhiyun                     .
582*4882a593Smuzhiyun                     .
583*4882a593Smuzhiyun                     .
584*4882a593Smuzhiyun
585*4882a593Smuzhiyun        58.029710972 seconds time elapsed
586*4882a593Smuzhiyun
587*4882a593Smuzhiyun
588*4882a593Smuzhiyun
589*4882a593SmuzhiyunLet's pick one of these tracepoints
590*4882a593Smuzhiyunand tell perf to do a profile using it as the sampling event::
591*4882a593Smuzhiyun
592*4882a593Smuzhiyun   root@crownbay:~# perf record -g -e sched:sched_wakeup wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
593*4882a593Smuzhiyun
594*4882a593Smuzhiyun.. image:: figures/sched-wakeup-profile.png
595*4882a593Smuzhiyun   :align: center
596*4882a593Smuzhiyun
597*4882a593SmuzhiyunThe screenshot above shows the results of running a profile using
598*4882a593Smuzhiyunsched:sched_switch tracepoint, which shows the relative costs of various
599*4882a593Smuzhiyunpaths to sched_wakeup (note that sched_wakeup is the name of the
600*4882a593Smuzhiyuntracepoint - it's actually defined just inside ttwu_do_wakeup(), which
601*4882a593Smuzhiyunaccounts for the function name actually displayed in the profile:
602*4882a593Smuzhiyun
603*4882a593Smuzhiyun.. code-block:: c
604*4882a593Smuzhiyun
605*4882a593Smuzhiyun     /*
606*4882a593Smuzhiyun      * Mark the task runnable and perform wakeup-preemption.
607*4882a593Smuzhiyun      */
608*4882a593Smuzhiyun     static void
609*4882a593Smuzhiyun     ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
610*4882a593Smuzhiyun     {
611*4882a593Smuzhiyun          trace_sched_wakeup(p, true);
612*4882a593Smuzhiyun          .
613*4882a593Smuzhiyun          .
614*4882a593Smuzhiyun          .
615*4882a593Smuzhiyun     }
616*4882a593Smuzhiyun
617*4882a593SmuzhiyunA couple of the more interesting
618*4882a593Smuzhiyuncallchains are expanded and displayed above, basically some network
619*4882a593Smuzhiyunreceive paths that presumably end up waking up wget (busybox) when
620*4882a593Smuzhiyunnetwork data is ready.
621*4882a593Smuzhiyun
622*4882a593SmuzhiyunNote that because tracepoints are normally used for tracing, the default
623*4882a593Smuzhiyunsampling period for tracepoints is 1 i.e. for tracepoints perf will
624*4882a593Smuzhiyunsample on every event occurrence (this can be changed using the -c
625*4882a593Smuzhiyunoption). This is in contrast to hardware counters such as for example
626*4882a593Smuzhiyunthe default 'cycles' hardware counter used for normal profiling, where
627*4882a593Smuzhiyunsampling periods are much higher (in the thousands) because profiling
628*4882a593Smuzhiyunshould have as low an overhead as possible and sampling on every cycle
629*4882a593Smuzhiyunwould be prohibitively expensive.
630*4882a593Smuzhiyun
631*4882a593SmuzhiyunUsing perf to do Basic Tracing
632*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
633*4882a593Smuzhiyun
634*4882a593SmuzhiyunProfiling is a great tool for solving many problems or for getting a
635*4882a593Smuzhiyunhigh-level view of what's going on with a workload or across the system.
636*4882a593SmuzhiyunIt is however by definition an approximation, as suggested by the most
637*4882a593Smuzhiyunprominent word associated with it, 'sampling'. On the one hand, it
638*4882a593Smuzhiyunallows a representative picture of what's going on in the system to be
639*4882a593Smuzhiyuncheaply taken, but on the other hand, that cheapness limits its utility
640*4882a593Smuzhiyunwhen that data suggests a need to 'dive down' more deeply to discover
641*4882a593Smuzhiyunwhat's really going on. In such cases, the only way to see what's really
642*4882a593Smuzhiyungoing on is to be able to look at (or summarize more intelligently) the
643*4882a593Smuzhiyunindividual steps that go into the higher-level behavior exposed by the
644*4882a593Smuzhiyuncoarse-grained profiling data.
645*4882a593Smuzhiyun
646*4882a593SmuzhiyunAs a concrete example, we can trace all the events we think might be
647*4882a593Smuzhiyunapplicable to our workload::
648*4882a593Smuzhiyun
649*4882a593Smuzhiyun   root@crownbay:~# perf record -g -e skb:* -e net:* -e napi:* -e sched:sched_switch -e sched:sched_wakeup -e irq:*
650*4882a593Smuzhiyun    -e syscalls:sys_enter_read -e syscalls:sys_exit_read -e syscalls:sys_enter_write -e syscalls:sys_exit_write
651*4882a593Smuzhiyun    wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
652*4882a593Smuzhiyun
653*4882a593SmuzhiyunWe can look at the raw trace output using 'perf script' with no
654*4882a593Smuzhiyunarguments::
655*4882a593Smuzhiyun
656*4882a593Smuzhiyun   root@crownbay:~# perf script
657*4882a593Smuzhiyun
658*4882a593Smuzhiyun         perf  1262 [000] 11624.857082: sys_exit_read: 0x0
659*4882a593Smuzhiyun         perf  1262 [000] 11624.857193: sched_wakeup: comm=migration/0 pid=6 prio=0 success=1 target_cpu=000
660*4882a593Smuzhiyun         wget  1262 [001] 11624.858021: softirq_raise: vec=1 [action=TIMER]
661*4882a593Smuzhiyun         wget  1262 [001] 11624.858074: softirq_entry: vec=1 [action=TIMER]
662*4882a593Smuzhiyun         wget  1262 [001] 11624.858081: softirq_exit: vec=1 [action=TIMER]
663*4882a593Smuzhiyun         wget  1262 [001] 11624.858166: sys_enter_read: fd: 0x0003, buf: 0xbf82c940, count: 0x0200
664*4882a593Smuzhiyun         wget  1262 [001] 11624.858177: sys_exit_read: 0x200
665*4882a593Smuzhiyun         wget  1262 [001] 11624.858878: kfree_skb: skbaddr=0xeb248d80 protocol=0 location=0xc15a5308
666*4882a593Smuzhiyun         wget  1262 [001] 11624.858945: kfree_skb: skbaddr=0xeb248000 protocol=0 location=0xc15a5308
667*4882a593Smuzhiyun         wget  1262 [001] 11624.859020: softirq_raise: vec=1 [action=TIMER]
668*4882a593Smuzhiyun         wget  1262 [001] 11624.859076: softirq_entry: vec=1 [action=TIMER]
669*4882a593Smuzhiyun         wget  1262 [001] 11624.859083: softirq_exit: vec=1 [action=TIMER]
670*4882a593Smuzhiyun         wget  1262 [001] 11624.859167: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
671*4882a593Smuzhiyun         wget  1262 [001] 11624.859192: sys_exit_read: 0x1d7
672*4882a593Smuzhiyun         wget  1262 [001] 11624.859228: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
673*4882a593Smuzhiyun         wget  1262 [001] 11624.859233: sys_exit_read: 0x0
674*4882a593Smuzhiyun         wget  1262 [001] 11624.859573: sys_enter_read: fd: 0x0003, buf: 0xbf82c580, count: 0x0200
675*4882a593Smuzhiyun         wget  1262 [001] 11624.859584: sys_exit_read: 0x200
676*4882a593Smuzhiyun         wget  1262 [001] 11624.859864: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
677*4882a593Smuzhiyun         wget  1262 [001] 11624.859888: sys_exit_read: 0x400
678*4882a593Smuzhiyun         wget  1262 [001] 11624.859935: sys_enter_read: fd: 0x0003, buf: 0xb7720000, count: 0x0400
679*4882a593Smuzhiyun         wget  1262 [001] 11624.859944: sys_exit_read: 0x400
680*4882a593Smuzhiyun
681*4882a593SmuzhiyunThis gives us a detailed timestamped sequence of events that occurred within the
682*4882a593Smuzhiyunworkload with respect to those events.
683*4882a593Smuzhiyun
684*4882a593SmuzhiyunIn many ways, profiling can be viewed as a subset of tracing -
685*4882a593Smuzhiyuntheoretically, if you have a set of trace events that's sufficient to
686*4882a593Smuzhiyuncapture all the important aspects of a workload, you can derive any of
687*4882a593Smuzhiyunthe results or views that a profiling run can.
688*4882a593Smuzhiyun
689*4882a593SmuzhiyunAnother aspect of traditional profiling is that while powerful in many
690*4882a593Smuzhiyunways, it's limited by the granularity of the underlying data. Profiling
691*4882a593Smuzhiyuntools offer various ways of sorting and presenting the sample data,
692*4882a593Smuzhiyunwhich make it much more useful and amenable to user experimentation, but
693*4882a593Smuzhiyunin the end it can't be used in an open-ended way to extract data that
694*4882a593Smuzhiyunjust isn't present as a consequence of the fact that conceptually, most
695*4882a593Smuzhiyunof it has been thrown away.
696*4882a593Smuzhiyun
697*4882a593SmuzhiyunFull-blown detailed tracing data does however offer the opportunity to
698*4882a593Smuzhiyunmanipulate and present the information collected during a tracing run in
699*4882a593Smuzhiyunan infinite variety of ways.
700*4882a593Smuzhiyun
701*4882a593SmuzhiyunAnother way to look at it is that there are only so many ways that the
702*4882a593Smuzhiyun'primitive' counters can be used on their own to generate interesting
703*4882a593Smuzhiyunoutput; to get anything more complicated than simple counts requires
704*4882a593Smuzhiyunsome amount of additional logic, which is typically very specific to the
705*4882a593Smuzhiyunproblem at hand. For example, if we wanted to make use of a 'counter'
706*4882a593Smuzhiyunthat maps to the value of the time difference between when a process was
707*4882a593Smuzhiyunscheduled to run on a processor and the time it actually ran, we
708*4882a593Smuzhiyunwouldn't expect such a counter to exist on its own, but we could derive
709*4882a593Smuzhiyunone called say 'wakeup_latency' and use it to extract a useful view of
710*4882a593Smuzhiyunthat metric from trace data. Likewise, we really can't figure out from
711*4882a593Smuzhiyunstandard profiling tools how much data every process on the system reads
712*4882a593Smuzhiyunand writes, along with how many of those reads and writes fail
713*4882a593Smuzhiyuncompletely. If we have sufficient trace data, however, we could with the
714*4882a593Smuzhiyunright tools easily extract and present that information, but we'd need
715*4882a593Smuzhiyunsomething other than pre-canned profiling tools to do that.
716*4882a593Smuzhiyun
717*4882a593SmuzhiyunLuckily, there is a general-purpose way to handle such needs, called
718*4882a593Smuzhiyun'programming languages'. Making programming languages easily available
719*4882a593Smuzhiyunto apply to such problems given the specific format of data is called a
720*4882a593Smuzhiyun'programming language binding' for that data and language. Perf supports
721*4882a593Smuzhiyuntwo programming language bindings, one for Python and one for Perl.
722*4882a593Smuzhiyun
723*4882a593Smuzhiyun.. admonition:: Tying it Together
724*4882a593Smuzhiyun
725*4882a593Smuzhiyun   Language bindings for manipulating and aggregating trace data are of
726*4882a593Smuzhiyun   course not a new idea. One of the first projects to do this was IBM's
727*4882a593Smuzhiyun   DProbes dpcc compiler, an ANSI C compiler which targeted a low-level
728*4882a593Smuzhiyun   assembly language running on an in-kernel interpreter on the target
729*4882a593Smuzhiyun   system. This is exactly analogous to what Sun's DTrace did, except
730*4882a593Smuzhiyun   that DTrace invented its own language for the purpose. Systemtap,
731*4882a593Smuzhiyun   heavily inspired by DTrace, also created its own one-off language,
732*4882a593Smuzhiyun   but rather than running the product on an in-kernel interpreter,
733*4882a593Smuzhiyun   created an elaborate compiler-based machinery to translate its
734*4882a593Smuzhiyun   language into kernel modules written in C.
735*4882a593Smuzhiyun
736*4882a593SmuzhiyunNow that we have the trace data in perf.data, we can use 'perf script
737*4882a593Smuzhiyun-g' to generate a skeleton script with handlers for the read/write
738*4882a593Smuzhiyunentry/exit events we recorded::
739*4882a593Smuzhiyun
740*4882a593Smuzhiyun   root@crownbay:~# perf script -g python
741*4882a593Smuzhiyun   generated Python script: perf-script.py
742*4882a593Smuzhiyun
743*4882a593SmuzhiyunThe skeleton script simply creates a python function for each event type in the
744*4882a593Smuzhiyunperf.data file. The body of each function simply prints the event name along
745*4882a593Smuzhiyunwith its parameters. For example:
746*4882a593Smuzhiyun
747*4882a593Smuzhiyun.. code-block:: python
748*4882a593Smuzhiyun
749*4882a593Smuzhiyun   def net__netif_rx(event_name, context, common_cpu,
750*4882a593Smuzhiyun          common_secs, common_nsecs, common_pid, common_comm,
751*4882a593Smuzhiyun          skbaddr, len, name):
752*4882a593Smuzhiyun                  print_header(event_name, common_cpu, common_secs, common_nsecs,
753*4882a593Smuzhiyun                          common_pid, common_comm)
754*4882a593Smuzhiyun
755*4882a593Smuzhiyun                  print "skbaddr=%u, len=%u, name=%s\n" % (skbaddr, len, name),
756*4882a593Smuzhiyun
757*4882a593SmuzhiyunWe can run that script directly to print all of the events contained in the
758*4882a593Smuzhiyunperf.data file::
759*4882a593Smuzhiyun
760*4882a593Smuzhiyun   root@crownbay:~# perf script -s perf-script.py
761*4882a593Smuzhiyun
762*4882a593Smuzhiyun   in trace_begin
763*4882a593Smuzhiyun   syscalls__sys_exit_read     0 11624.857082795     1262 perf                  nr=3, ret=0
764*4882a593Smuzhiyun   sched__sched_wakeup      0 11624.857193498     1262 perf                  comm=migration/0, pid=6, prio=0,      success=1, target_cpu=0
765*4882a593Smuzhiyun   irq__softirq_raise       1 11624.858021635     1262 wget                  vec=TIMER
766*4882a593Smuzhiyun   irq__softirq_entry       1 11624.858074075     1262 wget                  vec=TIMER
767*4882a593Smuzhiyun   irq__softirq_exit        1 11624.858081389     1262 wget                  vec=TIMER
768*4882a593Smuzhiyun   syscalls__sys_enter_read     1 11624.858166434     1262 wget                  nr=3, fd=3, buf=3213019456,      count=512
769*4882a593Smuzhiyun   syscalls__sys_exit_read     1 11624.858177924     1262 wget                  nr=3, ret=512
770*4882a593Smuzhiyun   skb__kfree_skb           1 11624.858878188     1262 wget                  skbaddr=3945041280,           location=3243922184, protocol=0
771*4882a593Smuzhiyun   skb__kfree_skb           1 11624.858945608     1262 wget                  skbaddr=3945037824,      location=3243922184, protocol=0
772*4882a593Smuzhiyun   irq__softirq_raise       1 11624.859020942     1262 wget                  vec=TIMER
773*4882a593Smuzhiyun   irq__softirq_entry       1 11624.859076935     1262 wget                  vec=TIMER
774*4882a593Smuzhiyun   irq__softirq_exit        1 11624.859083469     1262 wget                  vec=TIMER
775*4882a593Smuzhiyun   syscalls__sys_enter_read     1 11624.859167565     1262 wget                  nr=3, fd=3, buf=3077701632,      count=1024
776*4882a593Smuzhiyun   syscalls__sys_exit_read     1 11624.859192533     1262 wget                  nr=3, ret=471
777*4882a593Smuzhiyun   syscalls__sys_enter_read     1 11624.859228072     1262 wget                  nr=3, fd=3, buf=3077701632,      count=1024
778*4882a593Smuzhiyun   syscalls__sys_exit_read     1 11624.859233707     1262 wget                  nr=3, ret=0
779*4882a593Smuzhiyun   syscalls__sys_enter_read     1 11624.859573008     1262 wget                  nr=3, fd=3, buf=3213018496,      count=512
780*4882a593Smuzhiyun   syscalls__sys_exit_read     1 11624.859584818     1262 wget                  nr=3, ret=512
781*4882a593Smuzhiyun   syscalls__sys_enter_read     1 11624.859864562     1262 wget                  nr=3, fd=3, buf=3077701632,      count=1024
782*4882a593Smuzhiyun   syscalls__sys_exit_read     1 11624.859888770     1262 wget                  nr=3, ret=1024
783*4882a593Smuzhiyun   syscalls__sys_enter_read     1 11624.859935140     1262 wget                  nr=3, fd=3, buf=3077701632,      count=1024
784*4882a593Smuzhiyun   syscalls__sys_exit_read     1 11624.859944032     1262 wget                  nr=3, ret=1024
785*4882a593Smuzhiyun
786*4882a593SmuzhiyunThat in itself isn't very useful; after all, we can accomplish pretty much the
787*4882a593Smuzhiyunsame thing by simply running 'perf script' without arguments in the same
788*4882a593Smuzhiyundirectory as the perf.data file.
789*4882a593Smuzhiyun
790*4882a593SmuzhiyunWe can however replace the print statements in the generated function
791*4882a593Smuzhiyunbodies with whatever we want, and thereby make it infinitely more
792*4882a593Smuzhiyunuseful.
793*4882a593Smuzhiyun
794*4882a593SmuzhiyunAs a simple example, let's just replace the print statements in the
795*4882a593Smuzhiyunfunction bodies with a simple function that does nothing but increment a
796*4882a593Smuzhiyunper-event count. When the program is run against a perf.data file, each
797*4882a593Smuzhiyuntime a particular event is encountered, a tally is incremented for that
798*4882a593Smuzhiyunevent. For example:
799*4882a593Smuzhiyun
800*4882a593Smuzhiyun.. code-block:: python
801*4882a593Smuzhiyun
802*4882a593Smuzhiyun   def net__netif_rx(event_name, context, common_cpu,
803*4882a593Smuzhiyun          common_secs, common_nsecs, common_pid, common_comm,
804*4882a593Smuzhiyun          skbaddr, len, name):
805*4882a593Smuzhiyun              inc_counts(event_name)
806*4882a593Smuzhiyun
807*4882a593SmuzhiyunEach event handler function in the generated code
808*4882a593Smuzhiyunis modified to do this. For convenience, we define a common function
809*4882a593Smuzhiyuncalled inc_counts() that each handler calls; inc_counts() simply tallies
810*4882a593Smuzhiyuna count for each event using the 'counts' hash, which is a specialized
811*4882a593Smuzhiyunhash function that does Perl-like autovivification, a capability that's
812*4882a593Smuzhiyunextremely useful for kinds of multi-level aggregation commonly used in
813*4882a593Smuzhiyunprocessing traces (see perf's documentation on the Python language
814*4882a593Smuzhiyunbinding for details):
815*4882a593Smuzhiyun
816*4882a593Smuzhiyun.. code-block:: python
817*4882a593Smuzhiyun
818*4882a593Smuzhiyun     counts = autodict()
819*4882a593Smuzhiyun
820*4882a593Smuzhiyun     def inc_counts(event_name):
821*4882a593Smuzhiyun            try:
822*4882a593Smuzhiyun                    counts[event_name] += 1
823*4882a593Smuzhiyun            except TypeError:
824*4882a593Smuzhiyun                    counts[event_name] = 1
825*4882a593Smuzhiyun
826*4882a593SmuzhiyunFinally, at the end of the trace processing run, we want to print the
827*4882a593Smuzhiyunresult of all the per-event tallies. For that, we use the special
828*4882a593Smuzhiyun'trace_end()' function:
829*4882a593Smuzhiyun
830*4882a593Smuzhiyun.. code-block:: python
831*4882a593Smuzhiyun
832*4882a593Smuzhiyun     def trace_end():
833*4882a593Smuzhiyun            for event_name, count in counts.iteritems():
834*4882a593Smuzhiyun                    print "%-40s %10s\n" % (event_name, count)
835*4882a593Smuzhiyun
836*4882a593SmuzhiyunThe end result is a summary of all the events recorded in the trace::
837*4882a593Smuzhiyun
838*4882a593Smuzhiyun   skb__skb_copy_datagram_iovec                  13148
839*4882a593Smuzhiyun   irq__softirq_entry                             4796
840*4882a593Smuzhiyun   irq__irq_handler_exit                          3805
841*4882a593Smuzhiyun   irq__softirq_exit                              4795
842*4882a593Smuzhiyun   syscalls__sys_enter_write                      8990
843*4882a593Smuzhiyun   net__net_dev_xmit                               652
844*4882a593Smuzhiyun   skb__kfree_skb                                 4047
845*4882a593Smuzhiyun   sched__sched_wakeup                            1155
846*4882a593Smuzhiyun   irq__irq_handler_entry                         3804
847*4882a593Smuzhiyun   irq__softirq_raise                             4799
848*4882a593Smuzhiyun   net__net_dev_queue                              652
849*4882a593Smuzhiyun   syscalls__sys_enter_read                      17599
850*4882a593Smuzhiyun   net__netif_receive_skb                         1743
851*4882a593Smuzhiyun   syscalls__sys_exit_read                       17598
852*4882a593Smuzhiyun   net__netif_rx                                     2
853*4882a593Smuzhiyun   napi__napi_poll                                1877
854*4882a593Smuzhiyun   syscalls__sys_exit_write                       8990
855*4882a593Smuzhiyun
856*4882a593SmuzhiyunNote that this is
857*4882a593Smuzhiyunpretty much exactly the same information we get from 'perf stat', which
858*4882a593Smuzhiyungoes a little way to support the idea mentioned previously that given
859*4882a593Smuzhiyunthe right kind of trace data, higher-level profiling-type summaries can
860*4882a593Smuzhiyunbe derived from it.
861*4882a593Smuzhiyun
862*4882a593SmuzhiyunDocumentation on using the `'perf script' python
863*4882a593Smuzhiyunbinding <https://linux.die.net/man/1/perf-script-python>`__.
864*4882a593Smuzhiyun
865*4882a593SmuzhiyunSystem-Wide Tracing and Profiling
866*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
867*4882a593Smuzhiyun
868*4882a593SmuzhiyunThe examples so far have focused on tracing a particular program or
869*4882a593Smuzhiyunworkload - in other words, every profiling run has specified the program
870*4882a593Smuzhiyunto profile in the command-line e.g. 'perf record wget ...'.
871*4882a593Smuzhiyun
872*4882a593SmuzhiyunIt's also possible, and more interesting in many cases, to run a
873*4882a593Smuzhiyunsystem-wide profile or trace while running the workload in a separate
874*4882a593Smuzhiyunshell.
875*4882a593Smuzhiyun
876*4882a593SmuzhiyunTo do system-wide profiling or tracing, you typically use the -a flag to
877*4882a593Smuzhiyun'perf record'.
878*4882a593Smuzhiyun
879*4882a593SmuzhiyunTo demonstrate this, open up one window and start the profile using the
880*4882a593Smuzhiyun-a flag (press Ctrl-C to stop tracing)::
881*4882a593Smuzhiyun
882*4882a593Smuzhiyun   root@crownbay:~# perf record -g -a
883*4882a593Smuzhiyun   ^C[ perf record: Woken up 6 times to write data ]
884*4882a593Smuzhiyun   [ perf record: Captured and wrote 1.400 MB perf.data (~61172 samples) ]
885*4882a593Smuzhiyun
886*4882a593SmuzhiyunIn another window, run the wget test::
887*4882a593Smuzhiyun
888*4882a593Smuzhiyun   root@crownbay:~# wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2
889*4882a593Smuzhiyun   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
890*4882a593Smuzhiyun   linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA
891*4882a593Smuzhiyun
892*4882a593SmuzhiyunHere we see entries not only for our wget load, but for
893*4882a593Smuzhiyunother processes running on the system as well:
894*4882a593Smuzhiyun
895*4882a593Smuzhiyun.. image:: figures/perf-systemwide.png
896*4882a593Smuzhiyun   :align: center
897*4882a593Smuzhiyun
898*4882a593SmuzhiyunIn the snapshot above, we can see callchains that originate in libc, and
899*4882a593Smuzhiyuna callchain from Xorg that demonstrates that we're using a proprietary X
900*4882a593Smuzhiyundriver in userspace (notice the presence of 'PVR' and some other
901*4882a593Smuzhiyununresolvable symbols in the expanded Xorg callchain).
902*4882a593Smuzhiyun
903*4882a593SmuzhiyunNote also that we have both kernel and userspace entries in the above
904*4882a593Smuzhiyunsnapshot. We can also tell perf to focus on userspace but providing a
905*4882a593Smuzhiyunmodifier, in this case 'u', to the 'cycles' hardware counter when we
906*4882a593Smuzhiyunrecord a profile::
907*4882a593Smuzhiyun
908*4882a593Smuzhiyun   root@crownbay:~# perf record -g -a -e cycles:u
909*4882a593Smuzhiyun   ^C[ perf record: Woken up 2 times to write data ]
910*4882a593Smuzhiyun   [ perf record: Captured and wrote 0.376 MB perf.data (~16443 samples) ]
911*4882a593Smuzhiyun
912*4882a593Smuzhiyun.. image:: figures/perf-report-cycles-u.png
913*4882a593Smuzhiyun   :align: center
914*4882a593Smuzhiyun
915*4882a593SmuzhiyunNotice in the screenshot above, we see only userspace entries ([.])
916*4882a593Smuzhiyun
917*4882a593SmuzhiyunFinally, we can press 'enter' on a leaf node and select the 'Zoom into
918*4882a593SmuzhiyunDSO' menu item to show only entries associated with a specific DSO. In
919*4882a593Smuzhiyunthe screenshot below, we've zoomed into the 'libc' DSO which shows all
920*4882a593Smuzhiyunthe entries associated with the libc-xxx.so DSO.
921*4882a593Smuzhiyun
922*4882a593Smuzhiyun.. image:: figures/perf-systemwide-libc.png
923*4882a593Smuzhiyun   :align: center
924*4882a593Smuzhiyun
925*4882a593SmuzhiyunWe can also use the system-wide -a switch to do system-wide tracing.
926*4882a593SmuzhiyunHere we'll trace a couple of scheduler events::
927*4882a593Smuzhiyun
928*4882a593Smuzhiyun   root@crownbay:~# perf record -a -e sched:sched_switch -e sched:sched_wakeup
929*4882a593Smuzhiyun   ^C[ perf record: Woken up 38 times to write data ]
930*4882a593Smuzhiyun   [ perf record: Captured and wrote 9.780 MB perf.data (~427299 samples) ]
931*4882a593Smuzhiyun
932*4882a593SmuzhiyunWe can look at the raw output using 'perf script' with no arguments::
933*4882a593Smuzhiyun
934*4882a593Smuzhiyun   root@crownbay:~# perf script
935*4882a593Smuzhiyun
936*4882a593Smuzhiyun              perf  1383 [001]  6171.460045: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
937*4882a593Smuzhiyun              perf  1383 [001]  6171.460066: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
938*4882a593Smuzhiyun       kworker/1:1    21 [001]  6171.460093: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
939*4882a593Smuzhiyun           swapper     0 [000]  6171.468063: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
940*4882a593Smuzhiyun           swapper     0 [000]  6171.468107: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
941*4882a593Smuzhiyun       kworker/0:3  1209 [000]  6171.468143: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
942*4882a593Smuzhiyun              perf  1383 [001]  6171.470039: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
943*4882a593Smuzhiyun              perf  1383 [001]  6171.470058: sched_switch: prev_comm=perf prev_pid=1383 prev_prio=120 prev_state=R+ ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
944*4882a593Smuzhiyun       kworker/1:1    21 [001]  6171.470082: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=perf next_pid=1383 next_prio=120
945*4882a593Smuzhiyun              perf  1383 [001]  6171.480035: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
946*4882a593Smuzhiyun
947*4882a593SmuzhiyunFiltering
948*4882a593Smuzhiyun^^^^^^^^^
949*4882a593Smuzhiyun
950*4882a593SmuzhiyunNotice that there are a lot of events that don't really have anything to
951*4882a593Smuzhiyundo with what we're interested in, namely events that schedule 'perf'
952*4882a593Smuzhiyunitself in and out or that wake perf up. We can get rid of those by using
953*4882a593Smuzhiyunthe '--filter' option - for each event we specify using -e, we can add a
954*4882a593Smuzhiyun--filter after that to filter out trace events that contain fields with
955*4882a593Smuzhiyunspecific values::
956*4882a593Smuzhiyun
957*4882a593Smuzhiyun   root@crownbay:~# perf record -a -e sched:sched_switch --filter 'next_comm != perf && prev_comm != perf' -e sched:sched_wakeup --filter 'comm != perf'
958*4882a593Smuzhiyun   ^C[ perf record: Woken up 38 times to write data ]
959*4882a593Smuzhiyun   [ perf record: Captured and wrote 9.688 MB perf.data (~423279 samples) ]
960*4882a593Smuzhiyun
961*4882a593Smuzhiyun
962*4882a593Smuzhiyun   root@crownbay:~# perf script
963*4882a593Smuzhiyun
964*4882a593Smuzhiyun           swapper     0 [000]  7932.162180: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
965*4882a593Smuzhiyun       kworker/0:3  1209 [000]  7932.162236: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
966*4882a593Smuzhiyun              perf  1407 [001]  7932.170048: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
967*4882a593Smuzhiyun              perf  1407 [001]  7932.180044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
968*4882a593Smuzhiyun              perf  1407 [001]  7932.190038: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
969*4882a593Smuzhiyun              perf  1407 [001]  7932.200044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
970*4882a593Smuzhiyun              perf  1407 [001]  7932.210044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
971*4882a593Smuzhiyun              perf  1407 [001]  7932.220044: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
972*4882a593Smuzhiyun           swapper     0 [001]  7932.230111: sched_wakeup: comm=kworker/1:1 pid=21 prio=120 success=1 target_cpu=001
973*4882a593Smuzhiyun           swapper     0 [001]  7932.230146: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/1:1 next_pid=21 next_prio=120
974*4882a593Smuzhiyun       kworker/1:1    21 [001]  7932.230205: sched_switch: prev_comm=kworker/1:1 prev_pid=21 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
975*4882a593Smuzhiyun           swapper     0 [000]  7932.326109: sched_wakeup: comm=kworker/0:3 pid=1209 prio=120 success=1 target_cpu=000
976*4882a593Smuzhiyun           swapper     0 [000]  7932.326171: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/0:3 next_pid=1209 next_prio=120
977*4882a593Smuzhiyun       kworker/0:3  1209 [000]  7932.326214: sched_switch: prev_comm=kworker/0:3 prev_pid=1209 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
978*4882a593Smuzhiyun
979*4882a593SmuzhiyunIn this case, we've filtered out all events that have
980*4882a593Smuzhiyun'perf' in their 'comm' or 'comm_prev' or 'comm_next' fields. Notice that
981*4882a593Smuzhiyunthere are still events recorded for perf, but notice that those events
982*4882a593Smuzhiyundon't have values of 'perf' for the filtered fields. To completely
983*4882a593Smuzhiyunfilter out anything from perf will require a bit more work, but for the
984*4882a593Smuzhiyunpurpose of demonstrating how to use filters, it's close enough.
985*4882a593Smuzhiyun
986*4882a593Smuzhiyun.. admonition:: Tying it Together
987*4882a593Smuzhiyun
988*4882a593Smuzhiyun   These are exactly the same set of event filters defined by the trace
989*4882a593Smuzhiyun   event subsystem. See the ftrace/tracecmd/kernelshark section for more
990*4882a593Smuzhiyun   discussion about these event filters.
991*4882a593Smuzhiyun
992*4882a593Smuzhiyun.. admonition:: Tying it Together
993*4882a593Smuzhiyun
994*4882a593Smuzhiyun   These event filters are implemented by a special-purpose
995*4882a593Smuzhiyun   pseudo-interpreter in the kernel and are an integral and
996*4882a593Smuzhiyun   indispensable part of the perf design as it relates to tracing.
997*4882a593Smuzhiyun   kernel-based event filters provide a mechanism to precisely throttle
998*4882a593Smuzhiyun   the event stream that appears in user space, where it makes sense to
999*4882a593Smuzhiyun   provide bindings to real programming languages for postprocessing the
1000*4882a593Smuzhiyun   event stream. This architecture allows for the intelligent and
1001*4882a593Smuzhiyun   flexible partitioning of processing between the kernel and user
1002*4882a593Smuzhiyun   space. Contrast this with other tools such as SystemTap, which does
1003*4882a593Smuzhiyun   all of its processing in the kernel and as such requires a special
1004*4882a593Smuzhiyun   project-defined language in order to accommodate that design, or
1005*4882a593Smuzhiyun   LTTng, where everything is sent to userspace and as such requires a
1006*4882a593Smuzhiyun   super-efficient kernel-to-userspace transport mechanism in order to
1007*4882a593Smuzhiyun   function properly. While perf certainly can benefit from for instance
1008*4882a593Smuzhiyun   advances in the design of the transport, it doesn't fundamentally
1009*4882a593Smuzhiyun   depend on them. Basically, if you find that your perf tracing
1010*4882a593Smuzhiyun   application is causing buffer I/O overruns, it probably means that
1011*4882a593Smuzhiyun   you aren't taking enough advantage of the kernel filtering engine.
1012*4882a593Smuzhiyun
1013*4882a593SmuzhiyunUsing Dynamic Tracepoints
1014*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~
1015*4882a593Smuzhiyun
1016*4882a593Smuzhiyunperf isn't restricted to the fixed set of static tracepoints listed by
1017*4882a593Smuzhiyun'perf list'. Users can also add their own 'dynamic' tracepoints anywhere
1018*4882a593Smuzhiyunin the kernel. For instance, suppose we want to define our own
1019*4882a593Smuzhiyuntracepoint on do_fork(). We can do that using the 'perf probe' perf
1020*4882a593Smuzhiyunsubcommand::
1021*4882a593Smuzhiyun
1022*4882a593Smuzhiyun   root@crownbay:~# perf probe do_fork
1023*4882a593Smuzhiyun   Added new event:
1024*4882a593Smuzhiyun     probe:do_fork        (on do_fork)
1025*4882a593Smuzhiyun
1026*4882a593Smuzhiyun   You can now use it in all perf tools, such as:
1027*4882a593Smuzhiyun
1028*4882a593Smuzhiyun     perf record -e probe:do_fork -aR sleep 1
1029*4882a593Smuzhiyun
1030*4882a593SmuzhiyunAdding a new tracepoint via
1031*4882a593Smuzhiyun'perf probe' results in an event with all the expected files and format
1032*4882a593Smuzhiyunin /sys/kernel/debug/tracing/events, just the same as for static
1033*4882a593Smuzhiyuntracepoints (as discussed in more detail in the trace events subsystem
1034*4882a593Smuzhiyunsection::
1035*4882a593Smuzhiyun
1036*4882a593Smuzhiyun   root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# ls -al
1037*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Oct 28 11:42 .
1038*4882a593Smuzhiyun   drwxr-xr-x    3 root     root             0 Oct 28 11:42 ..
1039*4882a593Smuzhiyun   -rw-r--r--    1 root     root             0 Oct 28 11:42 enable
1040*4882a593Smuzhiyun   -rw-r--r--    1 root     root             0 Oct 28 11:42 filter
1041*4882a593Smuzhiyun   -r--r--r--    1 root     root             0 Oct 28 11:42 format
1042*4882a593Smuzhiyun   -r--r--r--    1 root     root             0 Oct 28 11:42 id
1043*4882a593Smuzhiyun
1044*4882a593Smuzhiyun   root@crownbay:/sys/kernel/debug/tracing/events/probe/do_fork# cat format
1045*4882a593Smuzhiyun   name: do_fork
1046*4882a593Smuzhiyun   ID: 944
1047*4882a593Smuzhiyun   format:
1048*4882a593Smuzhiyun           field:unsigned short common_type;	offset:0;	size:2;	signed:0;
1049*4882a593Smuzhiyun           field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
1050*4882a593Smuzhiyun           field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
1051*4882a593Smuzhiyun           field:int common_pid;	offset:4;	size:4;	signed:1;
1052*4882a593Smuzhiyun           field:int common_padding;	offset:8;	size:4;	signed:1;
1053*4882a593Smuzhiyun
1054*4882a593Smuzhiyun           field:unsigned long __probe_ip;	offset:12;	size:4;	signed:0;
1055*4882a593Smuzhiyun
1056*4882a593Smuzhiyun   print fmt: "(%lx)", REC->__probe_ip
1057*4882a593Smuzhiyun
1058*4882a593SmuzhiyunWe can list all dynamic tracepoints currently in
1059*4882a593Smuzhiyunexistence::
1060*4882a593Smuzhiyun
1061*4882a593Smuzhiyun   root@crownbay:~# perf probe -l
1062*4882a593Smuzhiyun    probe:do_fork (on do_fork)
1063*4882a593Smuzhiyun    probe:schedule (on schedule)
1064*4882a593Smuzhiyun
1065*4882a593SmuzhiyunLet's record system-wide ('sleep 30' is a
1066*4882a593Smuzhiyuntrick for recording system-wide but basically do nothing and then wake
1067*4882a593Smuzhiyunup after 30 seconds)::
1068*4882a593Smuzhiyun
1069*4882a593Smuzhiyun   root@crownbay:~# perf record -g -a -e probe:do_fork sleep 30
1070*4882a593Smuzhiyun   [ perf record: Woken up 1 times to write data ]
1071*4882a593Smuzhiyun   [ perf record: Captured and wrote 0.087 MB perf.data (~3812 samples) ]
1072*4882a593Smuzhiyun
1073*4882a593SmuzhiyunUsing 'perf script' we can see each do_fork event that fired::
1074*4882a593Smuzhiyun
1075*4882a593Smuzhiyun   root@crownbay:~# perf script
1076*4882a593Smuzhiyun
1077*4882a593Smuzhiyun   # ========
1078*4882a593Smuzhiyun   # captured on: Sun Oct 28 11:55:18 2012
1079*4882a593Smuzhiyun   # hostname : crownbay
1080*4882a593Smuzhiyun   # os release : 3.4.11-yocto-standard
1081*4882a593Smuzhiyun   # perf version : 3.4.11
1082*4882a593Smuzhiyun   # arch : i686
1083*4882a593Smuzhiyun   # nrcpus online : 2
1084*4882a593Smuzhiyun   # nrcpus avail : 2
1085*4882a593Smuzhiyun   # cpudesc : Intel(R) Atom(TM) CPU E660 @ 1.30GHz
1086*4882a593Smuzhiyun   # cpuid : GenuineIntel,6,38,1
1087*4882a593Smuzhiyun   # total memory : 1017184 kB
1088*4882a593Smuzhiyun   # cmdline : /usr/bin/perf record -g -a -e probe:do_fork sleep 30
1089*4882a593Smuzhiyun   # event : name = probe:do_fork, type = 2, config = 0x3b0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern
1090*4882a593Smuzhiyun    = 0, id = { 5, 6 }
1091*4882a593Smuzhiyun   # HEADER_CPU_TOPOLOGY info available, use -I to display
1092*4882a593Smuzhiyun   # ========
1093*4882a593Smuzhiyun   #
1094*4882a593Smuzhiyun    matchbox-deskto  1197 [001] 34211.378318: do_fork: (c1028460)
1095*4882a593Smuzhiyun    matchbox-deskto  1295 [001] 34211.380388: do_fork: (c1028460)
1096*4882a593Smuzhiyun            pcmanfm  1296 [000] 34211.632350: do_fork: (c1028460)
1097*4882a593Smuzhiyun            pcmanfm  1296 [000] 34211.639917: do_fork: (c1028460)
1098*4882a593Smuzhiyun    matchbox-deskto  1197 [001] 34217.541603: do_fork: (c1028460)
1099*4882a593Smuzhiyun    matchbox-deskto  1299 [001] 34217.543584: do_fork: (c1028460)
1100*4882a593Smuzhiyun             gthumb  1300 [001] 34217.697451: do_fork: (c1028460)
1101*4882a593Smuzhiyun             gthumb  1300 [001] 34219.085734: do_fork: (c1028460)
1102*4882a593Smuzhiyun             gthumb  1300 [000] 34219.121351: do_fork: (c1028460)
1103*4882a593Smuzhiyun             gthumb  1300 [001] 34219.264551: do_fork: (c1028460)
1104*4882a593Smuzhiyun            pcmanfm  1296 [000] 34219.590380: do_fork: (c1028460)
1105*4882a593Smuzhiyun    matchbox-deskto  1197 [001] 34224.955965: do_fork: (c1028460)
1106*4882a593Smuzhiyun    matchbox-deskto  1306 [001] 34224.957972: do_fork: (c1028460)
1107*4882a593Smuzhiyun    matchbox-termin  1307 [000] 34225.038214: do_fork: (c1028460)
1108*4882a593Smuzhiyun    matchbox-termin  1307 [001] 34225.044218: do_fork: (c1028460)
1109*4882a593Smuzhiyun    matchbox-termin  1307 [000] 34225.046442: do_fork: (c1028460)
1110*4882a593Smuzhiyun    matchbox-deskto  1197 [001] 34237.112138: do_fork: (c1028460)
1111*4882a593Smuzhiyun    matchbox-deskto  1311 [001] 34237.114106: do_fork: (c1028460)
1112*4882a593Smuzhiyun               gaku  1312 [000] 34237.202388: do_fork: (c1028460)
1113*4882a593Smuzhiyun
1114*4882a593SmuzhiyunAnd using 'perf report' on the same file, we can see the
1115*4882a593Smuzhiyuncallgraphs from starting a few programs during those 30 seconds:
1116*4882a593Smuzhiyun
1117*4882a593Smuzhiyun.. image:: figures/perf-probe-do_fork-profile.png
1118*4882a593Smuzhiyun   :align: center
1119*4882a593Smuzhiyun
1120*4882a593Smuzhiyun.. admonition:: Tying it Together
1121*4882a593Smuzhiyun
1122*4882a593Smuzhiyun   The trace events subsystem accommodate static and dynamic tracepoints
1123*4882a593Smuzhiyun   in exactly the same way - there's no difference as far as the
1124*4882a593Smuzhiyun   infrastructure is concerned. See the ftrace section for more details
1125*4882a593Smuzhiyun   on the trace event subsystem.
1126*4882a593Smuzhiyun
1127*4882a593Smuzhiyun.. admonition:: Tying it Together
1128*4882a593Smuzhiyun
1129*4882a593Smuzhiyun   Dynamic tracepoints are implemented under the covers by kprobes and
1130*4882a593Smuzhiyun   uprobes. kprobes and uprobes are also used by and in fact are the
1131*4882a593Smuzhiyun   main focus of SystemTap.
1132*4882a593Smuzhiyun
1133*4882a593SmuzhiyunPerf Documentation
1134*4882a593Smuzhiyun------------------
1135*4882a593Smuzhiyun
1136*4882a593SmuzhiyunOnline versions of the man pages for the commands discussed in this
1137*4882a593Smuzhiyunsection can be found here:
1138*4882a593Smuzhiyun
1139*4882a593Smuzhiyun-  The `'perf stat' manpage <https://linux.die.net/man/1/perf-stat>`__.
1140*4882a593Smuzhiyun
1141*4882a593Smuzhiyun-  The `'perf record'
1142*4882a593Smuzhiyun   manpage <https://linux.die.net/man/1/perf-record>`__.
1143*4882a593Smuzhiyun
1144*4882a593Smuzhiyun-  The `'perf report'
1145*4882a593Smuzhiyun   manpage <https://linux.die.net/man/1/perf-report>`__.
1146*4882a593Smuzhiyun
1147*4882a593Smuzhiyun-  The `'perf probe' manpage <https://linux.die.net/man/1/perf-probe>`__.
1148*4882a593Smuzhiyun
1149*4882a593Smuzhiyun-  The `'perf script'
1150*4882a593Smuzhiyun   manpage <https://linux.die.net/man/1/perf-script>`__.
1151*4882a593Smuzhiyun
1152*4882a593Smuzhiyun-  Documentation on using the `'perf script' python
1153*4882a593Smuzhiyun   binding <https://linux.die.net/man/1/perf-script-python>`__.
1154*4882a593Smuzhiyun
1155*4882a593Smuzhiyun-  The top-level `perf(1) manpage <https://linux.die.net/man/1/perf>`__.
1156*4882a593Smuzhiyun
1157*4882a593SmuzhiyunNormally, you should be able to invoke the man pages via perf itself
1158*4882a593Smuzhiyune.g. 'perf help' or 'perf help record'.
1159*4882a593Smuzhiyun
1160*4882a593SmuzhiyunTo have the perf manpages installed on your target, modify your
1161*4882a593Smuzhiyunconfiguration as follows::
1162*4882a593Smuzhiyun
1163*4882a593Smuzhiyun   IMAGE_INSTALL:append = " perf perf-doc"
1164*4882a593Smuzhiyun   DISTRO_FEATURES:append = " api-documentation"
1165*4882a593Smuzhiyun
1166*4882a593SmuzhiyunThe man pages in text form, along with some other files, such as a set
1167*4882a593Smuzhiyunof examples, can also be found in the 'perf' directory of the kernel tree::
1168*4882a593Smuzhiyun
1169*4882a593Smuzhiyun   tools/perf/Documentation
1170*4882a593Smuzhiyun
1171*4882a593SmuzhiyunThere's also a nice perf tutorial on the perf
1172*4882a593Smuzhiyunwiki that goes into more detail than we do here in certain areas: `Perf
1173*4882a593SmuzhiyunTutorial <https://perf.wiki.kernel.org/index.php/Tutorial>`__
1174*4882a593Smuzhiyun
1175*4882a593Smuzhiyunftrace
1176*4882a593Smuzhiyun======
1177*4882a593Smuzhiyun
1178*4882a593Smuzhiyun'ftrace' literally refers to the 'ftrace function tracer' but in reality
1179*4882a593Smuzhiyunthis encompasses a number of related tracers along with the
1180*4882a593Smuzhiyuninfrastructure that they all make use of.
1181*4882a593Smuzhiyun
1182*4882a593Smuzhiyunftrace Setup
1183*4882a593Smuzhiyun------------
1184*4882a593Smuzhiyun
1185*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup
1186*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" section.
1187*4882a593Smuzhiyun
1188*4882a593Smuzhiyunftrace, trace-cmd, and kernelshark run on the target system, and are
1189*4882a593Smuzhiyunready to go out-of-the-box - no additional setup is necessary. For the
1190*4882a593Smuzhiyunrest of this section we assume you've ssh'ed to the host and will be
1191*4882a593Smuzhiyunrunning ftrace on the target. kernelshark is a GUI application and if
1192*4882a593Smuzhiyunyou use the '-X' option to ssh you can have the kernelshark GUI run on
1193*4882a593Smuzhiyunthe target but display remotely on the host if you want.
1194*4882a593Smuzhiyun
1195*4882a593SmuzhiyunBasic ftrace usage
1196*4882a593Smuzhiyun------------------
1197*4882a593Smuzhiyun
1198*4882a593Smuzhiyun'ftrace' essentially refers to everything included in the /tracing
1199*4882a593Smuzhiyundirectory of the mounted debugfs filesystem (Yocto follows the standard
1200*4882a593Smuzhiyunconvention and mounts it at /sys/kernel/debug). Here's a listing of all
1201*4882a593Smuzhiyunthe files found in /sys/kernel/debug/tracing on a Yocto system::
1202*4882a593Smuzhiyun
1203*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# ls
1204*4882a593Smuzhiyun   README                      kprobe_events               trace
1205*4882a593Smuzhiyun   available_events            kprobe_profile              trace_clock
1206*4882a593Smuzhiyun   available_filter_functions  options                     trace_marker
1207*4882a593Smuzhiyun   available_tracers           per_cpu                     trace_options
1208*4882a593Smuzhiyun   buffer_size_kb              printk_formats              trace_pipe
1209*4882a593Smuzhiyun   buffer_total_size_kb        saved_cmdlines              tracing_cpumask
1210*4882a593Smuzhiyun   current_tracer              set_event                   tracing_enabled
1211*4882a593Smuzhiyun   dyn_ftrace_total_info       set_ftrace_filter           tracing_on
1212*4882a593Smuzhiyun   enabled_functions           set_ftrace_notrace          tracing_thresh
1213*4882a593Smuzhiyun   events                      set_ftrace_pid
1214*4882a593Smuzhiyun   free_buffer                 set_graph_function
1215*4882a593Smuzhiyun
1216*4882a593SmuzhiyunThe files listed above are used for various purposes
1217*4882a593Smuzhiyun- some relate directly to the tracers themselves, others are used to set
1218*4882a593Smuzhiyuntracing options, and yet others actually contain the tracing output when
1219*4882a593Smuzhiyuna tracer is in effect. Some of the functions can be guessed from their
1220*4882a593Smuzhiyunnames, others need explanation; in any case, we'll cover some of the
1221*4882a593Smuzhiyunfiles we see here below but for an explanation of the others, please see
1222*4882a593Smuzhiyunthe ftrace documentation.
1223*4882a593Smuzhiyun
1224*4882a593SmuzhiyunWe'll start by looking at some of the available built-in tracers.
1225*4882a593Smuzhiyun
1226*4882a593Smuzhiyuncat'ing the 'available_tracers' file lists the set of available tracers::
1227*4882a593Smuzhiyun
1228*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# cat available_tracers
1229*4882a593Smuzhiyun   blk function_graph function nop
1230*4882a593Smuzhiyun
1231*4882a593SmuzhiyunThe 'current_tracer' file contains the tracer currently in effect::
1232*4882a593Smuzhiyun
1233*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
1234*4882a593Smuzhiyun   nop
1235*4882a593Smuzhiyun
1236*4882a593SmuzhiyunThe above listing of current_tracer shows that the
1237*4882a593Smuzhiyun'nop' tracer is in effect, which is just another way of saying that
1238*4882a593Smuzhiyunthere's actually no tracer currently in effect.
1239*4882a593Smuzhiyun
1240*4882a593Smuzhiyunecho'ing one of the available_tracers into current_tracer makes the
1241*4882a593Smuzhiyunspecified tracer the current tracer::
1242*4882a593Smuzhiyun
1243*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# echo function > current_tracer
1244*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# cat current_tracer
1245*4882a593Smuzhiyun   function
1246*4882a593Smuzhiyun
1247*4882a593SmuzhiyunThe above sets the current tracer to be the 'function tracer'. This tracer
1248*4882a593Smuzhiyuntraces every function call in the kernel and makes it available as the
1249*4882a593Smuzhiyuncontents of the 'trace' file. Reading the 'trace' file lists the
1250*4882a593Smuzhiyuncurrently buffered function calls that have been traced by the function
1251*4882a593Smuzhiyuntracer::
1252*4882a593Smuzhiyun
1253*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# cat trace | less
1254*4882a593Smuzhiyun
1255*4882a593Smuzhiyun   # tracer: function
1256*4882a593Smuzhiyun   #
1257*4882a593Smuzhiyun   # entries-in-buffer/entries-written: 310629/766471   #P:8
1258*4882a593Smuzhiyun   #
1259*4882a593Smuzhiyun   #                              _-----=> irqs-off
1260*4882a593Smuzhiyun   #                             / _----=> need-resched
1261*4882a593Smuzhiyun   #                            | / _---=> hardirq/softirq
1262*4882a593Smuzhiyun   #                            || / _--=> preempt-depth
1263*4882a593Smuzhiyun   #                            ||| /     delay
1264*4882a593Smuzhiyun   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
1265*4882a593Smuzhiyun   #              | |       |   ||||       |         |
1266*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867169: ktime_get_real <-intel_idle
1267*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867170: getnstimeofday <-ktime_get_real
1268*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867171: ns_to_timeval <-intel_idle
1269*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867171: ns_to_timespec <-ns_to_timeval
1270*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867172: smp_apic_timer_interrupt <-apic_timer_interrupt
1271*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867172: native_apic_mem_write <-smp_apic_timer_interrupt
1272*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867172: irq_enter <-smp_apic_timer_interrupt
1273*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867172: rcu_irq_enter <-irq_enter
1274*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867173: rcu_idle_exit_common.isra.33 <-rcu_irq_enter
1275*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867173: local_bh_disable <-irq_enter
1276*4882a593Smuzhiyun            <idle>-0     [004] d..1   470.867173: add_preempt_count <-local_bh_disable
1277*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867174: tick_check_idle <-irq_enter
1278*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867174: tick_check_oneshot_broadcast <-tick_check_idle
1279*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867174: ktime_get <-tick_check_idle
1280*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867174: tick_nohz_stop_idle <-tick_check_idle
1281*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867175: update_ts_time_stats <-tick_nohz_stop_idle
1282*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867175: nr_iowait_cpu <-update_ts_time_stats
1283*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867175: tick_do_update_jiffies64 <-tick_check_idle
1284*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867175: _raw_spin_lock <-tick_do_update_jiffies64
1285*4882a593Smuzhiyun            <idle>-0     [004] d.s1   470.867176: add_preempt_count <-_raw_spin_lock
1286*4882a593Smuzhiyun            <idle>-0     [004] d.s2   470.867176: do_timer <-tick_do_update_jiffies64
1287*4882a593Smuzhiyun            <idle>-0     [004] d.s2   470.867176: _raw_spin_lock <-do_timer
1288*4882a593Smuzhiyun            <idle>-0     [004] d.s2   470.867176: add_preempt_count <-_raw_spin_lock
1289*4882a593Smuzhiyun            <idle>-0     [004] d.s3   470.867177: ntp_tick_length <-do_timer
1290*4882a593Smuzhiyun            <idle>-0     [004] d.s3   470.867177: _raw_spin_lock_irqsave <-ntp_tick_length
1291*4882a593Smuzhiyun            .
1292*4882a593Smuzhiyun            .
1293*4882a593Smuzhiyun            .
1294*4882a593Smuzhiyun
1295*4882a593SmuzhiyunEach line in the trace above shows what was happening in the kernel on a given
1296*4882a593Smuzhiyuncpu, to the level of detail of function calls. Each entry shows the function
1297*4882a593Smuzhiyuncalled, followed by its caller (after the arrow).
1298*4882a593Smuzhiyun
1299*4882a593SmuzhiyunThe function tracer gives you an extremely detailed idea of what the
1300*4882a593Smuzhiyunkernel was doing at the point in time the trace was taken, and is a
1301*4882a593Smuzhiyungreat way to learn about how the kernel code works in a dynamic sense.
1302*4882a593Smuzhiyun
1303*4882a593Smuzhiyun.. admonition:: Tying it Together
1304*4882a593Smuzhiyun
1305*4882a593Smuzhiyun   The ftrace function tracer is also available from within perf, as the
1306*4882a593Smuzhiyun   ftrace:function tracepoint.
1307*4882a593Smuzhiyun
1308*4882a593SmuzhiyunIt is a little more difficult to follow the call chains than it needs to
1309*4882a593Smuzhiyunbe - luckily there's a variant of the function tracer that displays the
1310*4882a593Smuzhiyuncallchains explicitly, called the 'function_graph' tracer::
1311*4882a593Smuzhiyun
1312*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# echo function_graph > current_tracer
1313*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# cat trace | less
1314*4882a593Smuzhiyun
1315*4882a593Smuzhiyun    tracer: function_graph
1316*4882a593Smuzhiyun
1317*4882a593Smuzhiyun    CPU  DURATION                  FUNCTION CALLS
1318*4882a593Smuzhiyun    |     |   |                     |   |   |   |
1319*4882a593Smuzhiyun   7)   0.046 us    |      pick_next_task_fair();
1320*4882a593Smuzhiyun   7)   0.043 us    |      pick_next_task_stop();
1321*4882a593Smuzhiyun   7)   0.042 us    |      pick_next_task_rt();
1322*4882a593Smuzhiyun   7)   0.032 us    |      pick_next_task_fair();
1323*4882a593Smuzhiyun   7)   0.030 us    |      pick_next_task_idle();
1324*4882a593Smuzhiyun   7)               |      _raw_spin_unlock_irq() {
1325*4882a593Smuzhiyun   7)   0.033 us    |        sub_preempt_count();
1326*4882a593Smuzhiyun   7)   0.258 us    |      }
1327*4882a593Smuzhiyun   7)   0.032 us    |      sub_preempt_count();
1328*4882a593Smuzhiyun   7) + 13.341 us   |    } /* __schedule */
1329*4882a593Smuzhiyun   7)   0.095 us    |  } /* sub_preempt_count */
1330*4882a593Smuzhiyun   7)               |  schedule() {
1331*4882a593Smuzhiyun   7)               |    __schedule() {
1332*4882a593Smuzhiyun   7)   0.060 us    |      add_preempt_count();
1333*4882a593Smuzhiyun   7)   0.044 us    |      rcu_note_context_switch();
1334*4882a593Smuzhiyun   7)               |      _raw_spin_lock_irq() {
1335*4882a593Smuzhiyun   7)   0.033 us    |        add_preempt_count();
1336*4882a593Smuzhiyun   7)   0.247 us    |      }
1337*4882a593Smuzhiyun   7)               |      idle_balance() {
1338*4882a593Smuzhiyun   7)               |        _raw_spin_unlock() {
1339*4882a593Smuzhiyun   7)   0.031 us    |          sub_preempt_count();
1340*4882a593Smuzhiyun   7)   0.246 us    |        }
1341*4882a593Smuzhiyun   7)               |        update_shares() {
1342*4882a593Smuzhiyun   7)   0.030 us    |          __rcu_read_lock();
1343*4882a593Smuzhiyun   7)   0.029 us    |          __rcu_read_unlock();
1344*4882a593Smuzhiyun   7)   0.484 us    |        }
1345*4882a593Smuzhiyun   7)   0.030 us    |        __rcu_read_lock();
1346*4882a593Smuzhiyun   7)               |        load_balance() {
1347*4882a593Smuzhiyun   7)               |          find_busiest_group() {
1348*4882a593Smuzhiyun   7)   0.031 us    |            idle_cpu();
1349*4882a593Smuzhiyun   7)   0.029 us    |            idle_cpu();
1350*4882a593Smuzhiyun   7)   0.035 us    |            idle_cpu();
1351*4882a593Smuzhiyun   7)   0.906 us    |          }
1352*4882a593Smuzhiyun   7)   1.141 us    |        }
1353*4882a593Smuzhiyun   7)   0.022 us    |        msecs_to_jiffies();
1354*4882a593Smuzhiyun   7)               |        load_balance() {
1355*4882a593Smuzhiyun   7)               |          find_busiest_group() {
1356*4882a593Smuzhiyun   7)   0.031 us    |            idle_cpu();
1357*4882a593Smuzhiyun   .
1358*4882a593Smuzhiyun   .
1359*4882a593Smuzhiyun   .
1360*4882a593Smuzhiyun   4)   0.062 us    |        msecs_to_jiffies();
1361*4882a593Smuzhiyun   4)   0.062 us    |        __rcu_read_unlock();
1362*4882a593Smuzhiyun   4)               |        _raw_spin_lock() {
1363*4882a593Smuzhiyun   4)   0.073 us    |          add_preempt_count();
1364*4882a593Smuzhiyun   4)   0.562 us    |        }
1365*4882a593Smuzhiyun   4) + 17.452 us   |      }
1366*4882a593Smuzhiyun   4)   0.108 us    |      put_prev_task_fair();
1367*4882a593Smuzhiyun   4)   0.102 us    |      pick_next_task_fair();
1368*4882a593Smuzhiyun   4)   0.084 us    |      pick_next_task_stop();
1369*4882a593Smuzhiyun   4)   0.075 us    |      pick_next_task_rt();
1370*4882a593Smuzhiyun   4)   0.062 us    |      pick_next_task_fair();
1371*4882a593Smuzhiyun   4)   0.066 us    |      pick_next_task_idle();
1372*4882a593Smuzhiyun   ------------------------------------------
1373*4882a593Smuzhiyun   4)   kworker-74   =>    <idle>-0
1374*4882a593Smuzhiyun   ------------------------------------------
1375*4882a593Smuzhiyun
1376*4882a593Smuzhiyun   4)               |      finish_task_switch() {
1377*4882a593Smuzhiyun   4)               |        _raw_spin_unlock_irq() {
1378*4882a593Smuzhiyun   4)   0.100 us    |          sub_preempt_count();
1379*4882a593Smuzhiyun   4)   0.582 us    |        }
1380*4882a593Smuzhiyun   4)   1.105 us    |      }
1381*4882a593Smuzhiyun   4)   0.088 us    |      sub_preempt_count();
1382*4882a593Smuzhiyun   4) ! 100.066 us  |    }
1383*4882a593Smuzhiyun   .
1384*4882a593Smuzhiyun   .
1385*4882a593Smuzhiyun   .
1386*4882a593Smuzhiyun   3)               |  sys_ioctl() {
1387*4882a593Smuzhiyun   3)   0.083 us    |    fget_light();
1388*4882a593Smuzhiyun   3)               |    security_file_ioctl() {
1389*4882a593Smuzhiyun   3)   0.066 us    |      cap_file_ioctl();
1390*4882a593Smuzhiyun   3)   0.562 us    |    }
1391*4882a593Smuzhiyun   3)               |    do_vfs_ioctl() {
1392*4882a593Smuzhiyun   3)               |      drm_ioctl() {
1393*4882a593Smuzhiyun   3)   0.075 us    |        drm_ut_debug_printk();
1394*4882a593Smuzhiyun   3)               |        i915_gem_pwrite_ioctl() {
1395*4882a593Smuzhiyun   3)               |          i915_mutex_lock_interruptible() {
1396*4882a593Smuzhiyun   3)   0.070 us    |            mutex_lock_interruptible();
1397*4882a593Smuzhiyun   3)   0.570 us    |          }
1398*4882a593Smuzhiyun   3)               |          drm_gem_object_lookup() {
1399*4882a593Smuzhiyun   3)               |            _raw_spin_lock() {
1400*4882a593Smuzhiyun   3)   0.080 us    |              add_preempt_count();
1401*4882a593Smuzhiyun   3)   0.620 us    |            }
1402*4882a593Smuzhiyun   3)               |            _raw_spin_unlock() {
1403*4882a593Smuzhiyun   3)   0.085 us    |              sub_preempt_count();
1404*4882a593Smuzhiyun   3)   0.562 us    |            }
1405*4882a593Smuzhiyun   3)   2.149 us    |          }
1406*4882a593Smuzhiyun   3)   0.133 us    |          i915_gem_object_pin();
1407*4882a593Smuzhiyun   3)               |          i915_gem_object_set_to_gtt_domain() {
1408*4882a593Smuzhiyun   3)   0.065 us    |            i915_gem_object_flush_gpu_write_domain();
1409*4882a593Smuzhiyun   3)   0.065 us    |            i915_gem_object_wait_rendering();
1410*4882a593Smuzhiyun   3)   0.062 us    |            i915_gem_object_flush_cpu_write_domain();
1411*4882a593Smuzhiyun   3)   1.612 us    |          }
1412*4882a593Smuzhiyun   3)               |          i915_gem_object_put_fence() {
1413*4882a593Smuzhiyun   3)   0.097 us    |            i915_gem_object_flush_fence.constprop.36();
1414*4882a593Smuzhiyun   3)   0.645 us    |          }
1415*4882a593Smuzhiyun   3)   0.070 us    |          add_preempt_count();
1416*4882a593Smuzhiyun   3)   0.070 us    |          sub_preempt_count();
1417*4882a593Smuzhiyun   3)   0.073 us    |          i915_gem_object_unpin();
1418*4882a593Smuzhiyun   3)   0.068 us    |          mutex_unlock();
1419*4882a593Smuzhiyun   3)   9.924 us    |        }
1420*4882a593Smuzhiyun   3) + 11.236 us   |      }
1421*4882a593Smuzhiyun   3) + 11.770 us   |    }
1422*4882a593Smuzhiyun   3) + 13.784 us   |  }
1423*4882a593Smuzhiyun   3)               |  sys_ioctl() {
1424*4882a593Smuzhiyun
1425*4882a593SmuzhiyunAs you can see, the function_graph display is much easier
1426*4882a593Smuzhiyunto follow. Also note that in addition to the function calls and
1427*4882a593Smuzhiyunassociated braces, other events such as scheduler events are displayed
1428*4882a593Smuzhiyunin context. In fact, you can freely include any tracepoint available in
1429*4882a593Smuzhiyunthe trace events subsystem described in the next section by simply
1430*4882a593Smuzhiyunenabling those events, and they'll appear in context in the function
1431*4882a593Smuzhiyungraph display. Quite a powerful tool for understanding kernel dynamics.
1432*4882a593Smuzhiyun
1433*4882a593SmuzhiyunAlso notice that there are various annotations on the left hand side of
1434*4882a593Smuzhiyunthe display. For example if the total time it took for a given function
1435*4882a593Smuzhiyunto execute is above a certain threshold, an exclamation point or plus
1436*4882a593Smuzhiyunsign appears on the left hand side. Please see the ftrace documentation
1437*4882a593Smuzhiyunfor details on all these fields.
1438*4882a593Smuzhiyun
1439*4882a593SmuzhiyunThe 'trace events' Subsystem
1440*4882a593Smuzhiyun----------------------------
1441*4882a593Smuzhiyun
1442*4882a593SmuzhiyunOne especially important directory contained within the
1443*4882a593Smuzhiyun/sys/kernel/debug/tracing directory is the 'events' subdirectory, which
1444*4882a593Smuzhiyuncontains representations of every tracepoint in the system. Listing out
1445*4882a593Smuzhiyunthe contents of the 'events' subdirectory, we see mainly another set of
1446*4882a593Smuzhiyunsubdirectories::
1447*4882a593Smuzhiyun
1448*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# cd events
1449*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing/events# ls -al
1450*4882a593Smuzhiyun   drwxr-xr-x   38 root     root             0 Nov 14 23:19 .
1451*4882a593Smuzhiyun   drwxr-xr-x    5 root     root             0 Nov 14 23:19 ..
1452*4882a593Smuzhiyun   drwxr-xr-x   19 root     root             0 Nov 14 23:19 block
1453*4882a593Smuzhiyun   drwxr-xr-x   32 root     root             0 Nov 14 23:19 btrfs
1454*4882a593Smuzhiyun   drwxr-xr-x    5 root     root             0 Nov 14 23:19 drm
1455*4882a593Smuzhiyun   -rw-r--r--    1 root     root             0 Nov 14 23:19 enable
1456*4882a593Smuzhiyun   drwxr-xr-x   40 root     root             0 Nov 14 23:19 ext3
1457*4882a593Smuzhiyun   drwxr-xr-x   79 root     root             0 Nov 14 23:19 ext4
1458*4882a593Smuzhiyun   drwxr-xr-x   14 root     root             0 Nov 14 23:19 ftrace
1459*4882a593Smuzhiyun   drwxr-xr-x    8 root     root             0 Nov 14 23:19 hda
1460*4882a593Smuzhiyun   -r--r--r--    1 root     root             0 Nov 14 23:19 header_event
1461*4882a593Smuzhiyun   -r--r--r--    1 root     root             0 Nov 14 23:19 header_page
1462*4882a593Smuzhiyun   drwxr-xr-x   25 root     root             0 Nov 14 23:19 i915
1463*4882a593Smuzhiyun   drwxr-xr-x    7 root     root             0 Nov 14 23:19 irq
1464*4882a593Smuzhiyun   drwxr-xr-x   12 root     root             0 Nov 14 23:19 jbd
1465*4882a593Smuzhiyun   drwxr-xr-x   14 root     root             0 Nov 14 23:19 jbd2
1466*4882a593Smuzhiyun   drwxr-xr-x   14 root     root             0 Nov 14 23:19 kmem
1467*4882a593Smuzhiyun   drwxr-xr-x    7 root     root             0 Nov 14 23:19 module
1468*4882a593Smuzhiyun   drwxr-xr-x    3 root     root             0 Nov 14 23:19 napi
1469*4882a593Smuzhiyun   drwxr-xr-x    6 root     root             0 Nov 14 23:19 net
1470*4882a593Smuzhiyun   drwxr-xr-x    3 root     root             0 Nov 14 23:19 oom
1471*4882a593Smuzhiyun   drwxr-xr-x   12 root     root             0 Nov 14 23:19 power
1472*4882a593Smuzhiyun   drwxr-xr-x    3 root     root             0 Nov 14 23:19 printk
1473*4882a593Smuzhiyun   drwxr-xr-x    8 root     root             0 Nov 14 23:19 random
1474*4882a593Smuzhiyun   drwxr-xr-x    4 root     root             0 Nov 14 23:19 raw_syscalls
1475*4882a593Smuzhiyun   drwxr-xr-x    3 root     root             0 Nov 14 23:19 rcu
1476*4882a593Smuzhiyun   drwxr-xr-x    6 root     root             0 Nov 14 23:19 rpm
1477*4882a593Smuzhiyun   drwxr-xr-x   20 root     root             0 Nov 14 23:19 sched
1478*4882a593Smuzhiyun   drwxr-xr-x    7 root     root             0 Nov 14 23:19 scsi
1479*4882a593Smuzhiyun   drwxr-xr-x    4 root     root             0 Nov 14 23:19 signal
1480*4882a593Smuzhiyun   drwxr-xr-x    5 root     root             0 Nov 14 23:19 skb
1481*4882a593Smuzhiyun   drwxr-xr-x    4 root     root             0 Nov 14 23:19 sock
1482*4882a593Smuzhiyun   drwxr-xr-x   10 root     root             0 Nov 14 23:19 sunrpc
1483*4882a593Smuzhiyun   drwxr-xr-x  538 root     root             0 Nov 14 23:19 syscalls
1484*4882a593Smuzhiyun   drwxr-xr-x    4 root     root             0 Nov 14 23:19 task
1485*4882a593Smuzhiyun   drwxr-xr-x   14 root     root             0 Nov 14 23:19 timer
1486*4882a593Smuzhiyun   drwxr-xr-x    3 root     root             0 Nov 14 23:19 udp
1487*4882a593Smuzhiyun   drwxr-xr-x   21 root     root             0 Nov 14 23:19 vmscan
1488*4882a593Smuzhiyun   drwxr-xr-x    3 root     root             0 Nov 14 23:19 vsyscall
1489*4882a593Smuzhiyun   drwxr-xr-x    6 root     root             0 Nov 14 23:19 workqueue
1490*4882a593Smuzhiyun   drwxr-xr-x   26 root     root             0 Nov 14 23:19 writeback
1491*4882a593Smuzhiyun
1492*4882a593SmuzhiyunEach one of these subdirectories
1493*4882a593Smuzhiyuncorresponds to a 'subsystem' and contains yet again more subdirectories,
1494*4882a593Smuzhiyuneach one of those finally corresponding to a tracepoint. For example,
1495*4882a593Smuzhiyunhere are the contents of the 'kmem' subsystem::
1496*4882a593Smuzhiyun
1497*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem
1498*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al
1499*4882a593Smuzhiyun   drwxr-xr-x   14 root     root             0 Nov 14 23:19 .
1500*4882a593Smuzhiyun   drwxr-xr-x   38 root     root             0 Nov 14 23:19 ..
1501*4882a593Smuzhiyun   -rw-r--r--    1 root     root             0 Nov 14 23:19 enable
1502*4882a593Smuzhiyun   -rw-r--r--    1 root     root             0 Nov 14 23:19 filter
1503*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 kfree
1504*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmalloc
1505*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmalloc_node
1506*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmem_cache_alloc
1507*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmem_cache_alloc_node
1508*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmem_cache_free
1509*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_alloc
1510*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_alloc_extfrag
1511*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_alloc_zone_locked
1512*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_free
1513*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_free_batched
1514*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_pcpu_drain
1515*4882a593Smuzhiyun
1516*4882a593SmuzhiyunLet's see what's inside the subdirectory for a
1517*4882a593Smuzhiyunspecific tracepoint, in this case the one for kmalloc::
1518*4882a593Smuzhiyun
1519*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc
1520*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al
1521*4882a593Smuzhiyun   drwxr-xr-x    2 root     root             0 Nov 14 23:19 .
1522*4882a593Smuzhiyun   drwxr-xr-x   14 root     root             0 Nov 14 23:19 ..
1523*4882a593Smuzhiyun   -rw-r--r--    1 root     root             0 Nov 14 23:19 enable
1524*4882a593Smuzhiyun   -rw-r--r--    1 root     root             0 Nov 14 23:19 filter
1525*4882a593Smuzhiyun   -r--r--r--    1 root     root             0 Nov 14 23:19 format
1526*4882a593Smuzhiyun   -r--r--r--    1 root     root             0 Nov 14 23:19 id
1527*4882a593Smuzhiyun
1528*4882a593SmuzhiyunThe 'format' file for the
1529*4882a593Smuzhiyuntracepoint describes the event in memory, which is used by the various
1530*4882a593Smuzhiyuntracing tools that now make use of these tracepoint to parse the event
1531*4882a593Smuzhiyunand make sense of it, along with a 'print fmt' field that allows tools
1532*4882a593Smuzhiyunlike ftrace to display the event as text. Here's what the format of the
1533*4882a593Smuzhiyunkmalloc event looks like::
1534*4882a593Smuzhiyun
1535*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
1536*4882a593Smuzhiyun   name: kmalloc
1537*4882a593Smuzhiyun   ID: 313
1538*4882a593Smuzhiyun   format:
1539*4882a593Smuzhiyun           field:unsigned short common_type;	offset:0;	size:2;	signed:0;
1540*4882a593Smuzhiyun           field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
1541*4882a593Smuzhiyun           field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
1542*4882a593Smuzhiyun           field:int common_pid;	offset:4;	size:4;	signed:1;
1543*4882a593Smuzhiyun           field:int common_padding;	offset:8;	size:4;	signed:1;
1544*4882a593Smuzhiyun
1545*4882a593Smuzhiyun           field:unsigned long call_site;	offset:16;	size:8;	signed:0;
1546*4882a593Smuzhiyun           field:const void * ptr;	offset:24;	size:8;	signed:0;
1547*4882a593Smuzhiyun           field:size_t bytes_req;	offset:32;	size:8;	signed:0;
1548*4882a593Smuzhiyun           field:size_t bytes_alloc;	offset:40;	size:8;	signed:0;
1549*4882a593Smuzhiyun           field:gfp_t gfp_flags;	offset:48;	size:4;	signed:0;
1550*4882a593Smuzhiyun
1551*4882a593Smuzhiyun   print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc,
1552*4882a593Smuzhiyun   (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
1553*4882a593Smuzhiyun   gfp_t)0x20000u) | (( gfp_t)0x02u) | (( gfp_t)0x08u)) | (( gfp_t)0x4000u) | (( gfp_t)0x10000u) | (( gfp_t)0x1000u) | (( gfp_t)0x200u) | ((
1554*4882a593Smuzhiyun   gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | ((
1555*4882a593Smuzhiyun   gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
1556*4882a593Smuzhiyun   gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
1557*4882a593Smuzhiyun   gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), GFP_TEMPORARY"},
1558*4882a593Smuzhiyun   {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)),
1559*4882a593Smuzhiyun   "GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)((
1560*4882a593Smuzhiyun   gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)((
1561*4882a593Smuzhiyun   gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned
1562*4882a593Smuzhiyun   long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"},      {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"},
1563*4882a593Smuzhiyun   {(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u),
1564*4882a593Smuzhiyun   "GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned
1565*4882a593Smuzhiyun   long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"},
1566*4882a593Smuzhiyun   {(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT"
1567*4882a593Smuzhiyun
1568*4882a593SmuzhiyunThe 'enable' file
1569*4882a593Smuzhiyunin the tracepoint directory is what allows the user (or tools such as
1570*4882a593Smuzhiyuntrace-cmd) to actually turn the tracepoint on and off. When enabled, the
1571*4882a593Smuzhiyuncorresponding tracepoint will start appearing in the ftrace 'trace' file
1572*4882a593Smuzhiyundescribed previously. For example, this turns on the kmalloc tracepoint::
1573*4882a593Smuzhiyun
1574*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable
1575*4882a593Smuzhiyun
1576*4882a593SmuzhiyunAt the moment, we're not interested in the function tracer or
1577*4882a593Smuzhiyunsome other tracer that might be in effect, so we first turn it off, but
1578*4882a593Smuzhiyunif we do that, we still need to turn tracing on in order to see the
1579*4882a593Smuzhiyunevents in the output buffer::
1580*4882a593Smuzhiyun
1581*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer
1582*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on
1583*4882a593Smuzhiyun
1584*4882a593SmuzhiyunNow, if we look at the 'trace' file, we see nothing
1585*4882a593Smuzhiyunbut the kmalloc events we just turned on::
1586*4882a593Smuzhiyun
1587*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing# cat trace | less
1588*4882a593Smuzhiyun   # tracer: nop
1589*4882a593Smuzhiyun   #
1590*4882a593Smuzhiyun   # entries-in-buffer/entries-written: 1897/1897   #P:8
1591*4882a593Smuzhiyun   #
1592*4882a593Smuzhiyun   #                              _-----=> irqs-off
1593*4882a593Smuzhiyun   #                             / _----=> need-resched
1594*4882a593Smuzhiyun   #                            | / _---=> hardirq/softirq
1595*4882a593Smuzhiyun   #                            || / _--=> preempt-depth
1596*4882a593Smuzhiyun   #                            ||| /     delay
1597*4882a593Smuzhiyun   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
1598*4882a593Smuzhiyun   #              | |       |   ||||       |         |
1599*4882a593Smuzhiyun          dropbear-1465  [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
1600*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1601*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1602*4882a593Smuzhiyun   matchbox-termin-1361  [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
1603*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
1604*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
1605*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
1606*4882a593Smuzhiyun   matchbox-termin-1361  [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
1607*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
1608*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
1609*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
1610*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1611*4882a593Smuzhiyun          dropbear-1465  [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
1612*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1613*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1614*4882a593Smuzhiyun          dropbear-1465  [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
1615*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1616*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1617*4882a593Smuzhiyun          dropbear-1465  [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
1618*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1619*4882a593Smuzhiyun   matchbox-termin-1361  [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
1620*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
1621*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
1622*4882a593Smuzhiyun              Xorg-1264  [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
1623*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1624*4882a593Smuzhiyun          dropbear-1465  [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
1625*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1626*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1627*4882a593Smuzhiyun          dropbear-1465  [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_http://rostedt.homelinux.com/kernelshark/req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
1628*4882a593Smuzhiyun            <idle>-0     [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
1629*4882a593Smuzhiyun   matchbox-termin-1361  [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
1630*4882a593Smuzhiyun
1631*4882a593SmuzhiyunTo again disable the kmalloc event, we need to send 0 to the enable file::
1632*4882a593Smuzhiyun
1633*4882a593Smuzhiyun   root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable
1634*4882a593Smuzhiyun
1635*4882a593SmuzhiyunYou can enable any number of events or complete subsystems (by
1636*4882a593Smuzhiyunusing the 'enable' file in the subsystem directory) and get an
1637*4882a593Smuzhiyunarbitrarily fine-grained idea of what's going on in the system by
1638*4882a593Smuzhiyunenabling as many of the appropriate tracepoints as applicable.
1639*4882a593Smuzhiyun
1640*4882a593SmuzhiyunA number of the tools described in this HOWTO do just that, including
1641*4882a593Smuzhiyuntrace-cmd and kernelshark in the next section.
1642*4882a593Smuzhiyun
1643*4882a593Smuzhiyun.. admonition:: Tying it Together
1644*4882a593Smuzhiyun
1645*4882a593Smuzhiyun   These tracepoints and their representation are used not only by
1646*4882a593Smuzhiyun   ftrace, but by many of the other tools covered in this document and
1647*4882a593Smuzhiyun   they form a central point of integration for the various tracers
1648*4882a593Smuzhiyun   available in Linux. They form a central part of the instrumentation
1649*4882a593Smuzhiyun   for the following tools: perf, lttng, ftrace, blktrace and SystemTap
1650*4882a593Smuzhiyun
1651*4882a593Smuzhiyun.. admonition:: Tying it Together
1652*4882a593Smuzhiyun
1653*4882a593Smuzhiyun   Eventually all the special-purpose tracers currently available in
1654*4882a593Smuzhiyun   /sys/kernel/debug/tracing will be removed and replaced with
1655*4882a593Smuzhiyun   equivalent tracers based on the 'trace events' subsystem.
1656*4882a593Smuzhiyun
1657*4882a593Smuzhiyuntrace-cmd/kernelshark
1658*4882a593Smuzhiyun---------------------
1659*4882a593Smuzhiyun
1660*4882a593Smuzhiyuntrace-cmd is essentially an extensive command-line 'wrapper' interface
1661*4882a593Smuzhiyunthat hides the details of all the individual files in
1662*4882a593Smuzhiyun/sys/kernel/debug/tracing, allowing users to specify specific particular
1663*4882a593Smuzhiyunevents within the /sys/kernel/debug/tracing/events/ subdirectory and to
1664*4882a593Smuzhiyuncollect traces and avoid having to deal with those details directly.
1665*4882a593Smuzhiyun
1666*4882a593SmuzhiyunAs yet another layer on top of that, kernelshark provides a GUI that
1667*4882a593Smuzhiyunallows users to start and stop traces and specify sets of events using
1668*4882a593Smuzhiyunan intuitive interface, and view the output as both trace events and as
1669*4882a593Smuzhiyuna per-CPU graphical display. It directly uses 'trace-cmd' as the
1670*4882a593Smuzhiyunplumbing that accomplishes all that underneath the covers (and actually
1671*4882a593Smuzhiyundisplays the trace-cmd command it uses, as we'll see).
1672*4882a593Smuzhiyun
1673*4882a593SmuzhiyunTo start a trace using kernelshark, first start kernelshark::
1674*4882a593Smuzhiyun
1675*4882a593Smuzhiyun   root@sugarbay:~# kernelshark
1676*4882a593Smuzhiyun
1677*4882a593SmuzhiyunThen bring up the 'Capture' dialog by
1678*4882a593Smuzhiyunchoosing from the kernelshark menu::
1679*4882a593Smuzhiyun
1680*4882a593Smuzhiyun   Capture | Record
1681*4882a593Smuzhiyun
1682*4882a593SmuzhiyunThat will display the following dialog, which allows you to choose one or more
1683*4882a593Smuzhiyunevents (or even one or more complete subsystems) to trace:
1684*4882a593Smuzhiyun
1685*4882a593Smuzhiyun.. image:: figures/kernelshark-choose-events.png
1686*4882a593Smuzhiyun   :align: center
1687*4882a593Smuzhiyun
1688*4882a593SmuzhiyunNote that these are exactly the same sets of events described in the
1689*4882a593Smuzhiyunprevious trace events subsystem section, and in fact is where trace-cmd
1690*4882a593Smuzhiyungets them for kernelshark.
1691*4882a593Smuzhiyun
1692*4882a593SmuzhiyunIn the above screenshot, we've decided to explore the graphics subsystem
1693*4882a593Smuzhiyuna bit and so have chosen to trace all the tracepoints contained within
1694*4882a593Smuzhiyunthe 'i915' and 'drm' subsystems.
1695*4882a593Smuzhiyun
1696*4882a593SmuzhiyunAfter doing that, we can start and stop the trace using the 'Run' and
1697*4882a593Smuzhiyun'Stop' button on the lower right corner of the dialog (the same button
1698*4882a593Smuzhiyunwill turn into the 'Stop' button after the trace has started):
1699*4882a593Smuzhiyun
1700*4882a593Smuzhiyun.. image:: figures/kernelshark-output-display.png
1701*4882a593Smuzhiyun   :align: center
1702*4882a593Smuzhiyun
1703*4882a593SmuzhiyunNotice that the right-hand pane shows the exact trace-cmd command-line
1704*4882a593Smuzhiyunthat's used to run the trace, along with the results of the trace-cmd
1705*4882a593Smuzhiyunrun.
1706*4882a593Smuzhiyun
1707*4882a593SmuzhiyunOnce the 'Stop' button is pressed, the graphical view magically fills up
1708*4882a593Smuzhiyunwith a colorful per-cpu display of the trace data, along with the
1709*4882a593Smuzhiyundetailed event listing below that:
1710*4882a593Smuzhiyun
1711*4882a593Smuzhiyun.. image:: figures/kernelshark-i915-display.png
1712*4882a593Smuzhiyun   :align: center
1713*4882a593Smuzhiyun
1714*4882a593SmuzhiyunHere's another example, this time a display resulting from tracing 'all
1715*4882a593Smuzhiyunevents':
1716*4882a593Smuzhiyun
1717*4882a593Smuzhiyun.. image:: figures/kernelshark-all.png
1718*4882a593Smuzhiyun   :align: center
1719*4882a593Smuzhiyun
1720*4882a593SmuzhiyunThe tool is pretty self-explanatory, but for more detailed information
1721*4882a593Smuzhiyunon navigating through the data, see the `kernelshark
1722*4882a593Smuzhiyunwebsite <https://kernelshark.org/Documentation.html>`__.
1723*4882a593Smuzhiyun
1724*4882a593Smuzhiyunftrace Documentation
1725*4882a593Smuzhiyun--------------------
1726*4882a593Smuzhiyun
1727*4882a593SmuzhiyunThe documentation for ftrace can be found in the kernel Documentation
1728*4882a593Smuzhiyundirectory::
1729*4882a593Smuzhiyun
1730*4882a593Smuzhiyun   Documentation/trace/ftrace.txt
1731*4882a593Smuzhiyun
1732*4882a593SmuzhiyunThe documentation for the trace event subsystem can also be found in the kernel
1733*4882a593SmuzhiyunDocumentation directory::
1734*4882a593Smuzhiyun
1735*4882a593Smuzhiyun   Documentation/trace/events.txt
1736*4882a593Smuzhiyun
1737*4882a593SmuzhiyunThere is a nice series of articles on using ftrace and trace-cmd at LWN:
1738*4882a593Smuzhiyun
1739*4882a593Smuzhiyun-  `Debugging the kernel using Ftrace - part
1740*4882a593Smuzhiyun   1 <https://lwn.net/Articles/365835/>`__
1741*4882a593Smuzhiyun
1742*4882a593Smuzhiyun-  `Debugging the kernel using Ftrace - part
1743*4882a593Smuzhiyun   2 <https://lwn.net/Articles/366796/>`__
1744*4882a593Smuzhiyun
1745*4882a593Smuzhiyun-  `Secrets of the Ftrace function
1746*4882a593Smuzhiyun   tracer <https://lwn.net/Articles/370423/>`__
1747*4882a593Smuzhiyun
1748*4882a593Smuzhiyun-  `trace-cmd: A front-end for
1749*4882a593Smuzhiyun   Ftrace <https://lwn.net/Articles/410200/>`__
1750*4882a593Smuzhiyun
1751*4882a593SmuzhiyunSee also `KernelShark's documentation <https://kernelshark.org/Documentation.html>`__
1752*4882a593Smuzhiyunfor further usage details.
1753*4882a593Smuzhiyun
1754*4882a593SmuzhiyunAn amusing yet useful README (a tracing mini-HOWTO) can be found in
1755*4882a593Smuzhiyun``/sys/kernel/debug/tracing/README``.
1756*4882a593Smuzhiyun
1757*4882a593Smuzhiyunsystemtap
1758*4882a593Smuzhiyun=========
1759*4882a593Smuzhiyun
1760*4882a593SmuzhiyunSystemTap is a system-wide script-based tracing and profiling tool.
1761*4882a593Smuzhiyun
1762*4882a593SmuzhiyunSystemTap scripts are C-like programs that are executed in the kernel to
1763*4882a593Smuzhiyungather/print/aggregate data extracted from the context they end up being
1764*4882a593Smuzhiyuninvoked under.
1765*4882a593Smuzhiyun
1766*4882a593SmuzhiyunFor example, this probe from the `SystemTap
1767*4882a593Smuzhiyuntutorial <https://sourceware.org/systemtap/tutorial/>`__ simply prints a
1768*4882a593Smuzhiyunline every time any process on the system open()s a file. For each line,
1769*4882a593Smuzhiyunit prints the executable name of the program that opened the file, along
1770*4882a593Smuzhiyunwith its PID, and the name of the file it opened (or tried to open),
1771*4882a593Smuzhiyunwhich it extracts from the open syscall's argstr.
1772*4882a593Smuzhiyun
1773*4882a593Smuzhiyun.. code-block:: none
1774*4882a593Smuzhiyun
1775*4882a593Smuzhiyun   probe syscall.open
1776*4882a593Smuzhiyun   {
1777*4882a593Smuzhiyun           printf ("%s(%d) open (%s)\n", execname(), pid(), argstr)
1778*4882a593Smuzhiyun   }
1779*4882a593Smuzhiyun
1780*4882a593Smuzhiyun   probe timer.ms(4000) # after 4 seconds
1781*4882a593Smuzhiyun   {
1782*4882a593Smuzhiyun           exit ()
1783*4882a593Smuzhiyun   }
1784*4882a593Smuzhiyun
1785*4882a593SmuzhiyunNormally, to execute this
1786*4882a593Smuzhiyunprobe, you'd simply install systemtap on the system you want to probe,
1787*4882a593Smuzhiyunand directly run the probe on that system e.g. assuming the name of the
1788*4882a593Smuzhiyunfile containing the above text is trace_open.stp::
1789*4882a593Smuzhiyun
1790*4882a593Smuzhiyun   # stap trace_open.stp
1791*4882a593Smuzhiyun
1792*4882a593SmuzhiyunWhat systemtap does under the covers to run this probe is 1) parse and
1793*4882a593Smuzhiyunconvert the probe to an equivalent 'C' form, 2) compile the 'C' form
1794*4882a593Smuzhiyuninto a kernel module, 3) insert the module into the kernel, which arms
1795*4882a593Smuzhiyunit, and 4) collect the data generated by the probe and display it to the
1796*4882a593Smuzhiyunuser.
1797*4882a593Smuzhiyun
1798*4882a593SmuzhiyunIn order to accomplish steps 1 and 2, the 'stap' program needs access to
1799*4882a593Smuzhiyunthe kernel build system that produced the kernel that the probed system
1800*4882a593Smuzhiyunis running. In the case of a typical embedded system (the 'target'), the
1801*4882a593Smuzhiyunkernel build system unfortunately isn't typically part of the image
1802*4882a593Smuzhiyunrunning on the target. It is normally available on the 'host' system
1803*4882a593Smuzhiyunthat produced the target image however; in such cases, steps 1 and 2 are
1804*4882a593Smuzhiyunexecuted on the host system, and steps 3 and 4 are executed on the
1805*4882a593Smuzhiyuntarget system, using only the systemtap 'runtime'.
1806*4882a593Smuzhiyun
1807*4882a593SmuzhiyunThe systemtap support in Yocto assumes that only steps 3 and 4 are run
1808*4882a593Smuzhiyunon the target; it is possible to do everything on the target, but this
1809*4882a593Smuzhiyunsection assumes only the typical embedded use-case.
1810*4882a593Smuzhiyun
1811*4882a593SmuzhiyunSo basically what you need to do in order to run a systemtap script on
1812*4882a593Smuzhiyunthe target is to 1) on the host system, compile the probe into a kernel
1813*4882a593Smuzhiyunmodule that makes sense to the target, 2) copy the module onto the
1814*4882a593Smuzhiyuntarget system and 3) insert the module into the target kernel, which
1815*4882a593Smuzhiyunarms it, and 4) collect the data generated by the probe and display it
1816*4882a593Smuzhiyunto the user.
1817*4882a593Smuzhiyun
1818*4882a593Smuzhiyunsystemtap Setup
1819*4882a593Smuzhiyun---------------
1820*4882a593Smuzhiyun
1821*4882a593SmuzhiyunThose are a lot of steps and a lot of details, but fortunately Yocto
1822*4882a593Smuzhiyunincludes a script called 'crosstap' that will take care of those
1823*4882a593Smuzhiyundetails, allowing you to simply execute a systemtap script on the remote
1824*4882a593Smuzhiyuntarget, with arguments if necessary.
1825*4882a593Smuzhiyun
1826*4882a593SmuzhiyunIn order to do this from a remote host, however, you need to have access
1827*4882a593Smuzhiyunto the build for the image you booted. The 'crosstap' script provides
1828*4882a593Smuzhiyundetails on how to do this if you run the script on the host without
1829*4882a593Smuzhiyunhaving done a build::
1830*4882a593Smuzhiyun
1831*4882a593Smuzhiyun   $ crosstap root@192.168.1.88 trace_open.stp
1832*4882a593Smuzhiyun
1833*4882a593Smuzhiyun   Error: No target kernel build found.
1834*4882a593Smuzhiyun   Did you forget to create a local build of your image?
1835*4882a593Smuzhiyun
1836*4882a593Smuzhiyun   'crosstap' requires a local sdk build of the target system
1837*4882a593Smuzhiyun   (or a build that includes 'tools-profile') in order to build
1838*4882a593Smuzhiyun   kernel modules that can probe the target system.
1839*4882a593Smuzhiyun
1840*4882a593Smuzhiyun   Practically speaking, that means you need to do the following:
1841*4882a593Smuzhiyun    - If you're running a pre-built image, download the release
1842*4882a593Smuzhiyun      and/or BSP tarballs used to build the image.
1843*4882a593Smuzhiyun    - If you're working from git sources, just clone the metadata
1844*4882a593Smuzhiyun      and BSP layers needed to build the image you'll be booting.
1845*4882a593Smuzhiyun    - Make sure you're properly set up to build a new image (see
1846*4882a593Smuzhiyun      the BSP README and/or the widely available basic documentation
1847*4882a593Smuzhiyun      that discusses how to build images).
1848*4882a593Smuzhiyun    - Build an -sdk version of the image e.g.:
1849*4882a593Smuzhiyun        $ bitbake core-image-sato-sdk
1850*4882a593Smuzhiyun    OR
1851*4882a593Smuzhiyun    - Build a non-sdk image but include the profiling tools:
1852*4882a593Smuzhiyun        [ edit local.conf and add 'tools-profile' to the end of
1853*4882a593Smuzhiyun          the EXTRA_IMAGE_FEATURES variable ]
1854*4882a593Smuzhiyun        $ bitbake core-image-sato
1855*4882a593Smuzhiyun
1856*4882a593Smuzhiyun   Once you've build the image on the host system, you're ready to
1857*4882a593Smuzhiyun   boot it (or the equivalent pre-built image) and use 'crosstap'
1858*4882a593Smuzhiyun   to probe it (you need to source the environment as usual first):
1859*4882a593Smuzhiyun
1860*4882a593Smuzhiyun      $ source oe-init-build-env
1861*4882a593Smuzhiyun      $ cd ~/my/systemtap/scripts
1862*4882a593Smuzhiyun      $ crosstap root@192.168.1.xxx myscript.stp
1863*4882a593Smuzhiyun
1864*4882a593Smuzhiyun.. note::
1865*4882a593Smuzhiyun
1866*4882a593Smuzhiyun   SystemTap, which uses 'crosstap', assumes you can establish an ssh
1867*4882a593Smuzhiyun   connection to the remote target. Please refer to the crosstap wiki
1868*4882a593Smuzhiyun   page for details on verifying ssh connections at
1869*4882a593Smuzhiyun   . Also, the ability to ssh into the target system is not enabled by
1870*4882a593Smuzhiyun   default in \*-minimal images.
1871*4882a593Smuzhiyun
1872*4882a593SmuzhiyunSo essentially what you need to
1873*4882a593Smuzhiyundo is build an SDK image or image with 'tools-profile' as detailed in
1874*4882a593Smuzhiyunthe ":ref:`profile-manual/intro:General Setup`" section of this
1875*4882a593Smuzhiyunmanual, and boot the resulting target image.
1876*4882a593Smuzhiyun
1877*4882a593Smuzhiyun.. note::
1878*4882a593Smuzhiyun
1879*4882a593Smuzhiyun   If you have a build directory containing multiple machines, you need
1880*4882a593Smuzhiyun   to have the MACHINE you're connecting to selected in local.conf, and
1881*4882a593Smuzhiyun   the kernel in that machine's build directory must match the kernel on
1882*4882a593Smuzhiyun   the booted system exactly, or you'll get the above 'crosstap' message
1883*4882a593Smuzhiyun   when you try to invoke a script.
1884*4882a593Smuzhiyun
1885*4882a593SmuzhiyunRunning a Script on a Target
1886*4882a593Smuzhiyun----------------------------
1887*4882a593Smuzhiyun
1888*4882a593SmuzhiyunOnce you've done that, you should be able to run a systemtap script on
1889*4882a593Smuzhiyunthe target::
1890*4882a593Smuzhiyun
1891*4882a593Smuzhiyun   $ cd /path/to/yocto
1892*4882a593Smuzhiyun   $ source oe-init-build-env
1893*4882a593Smuzhiyun
1894*4882a593Smuzhiyun   ### Shell environment set up for builds. ###
1895*4882a593Smuzhiyun
1896*4882a593Smuzhiyun   You can now run 'bitbake <target>'
1897*4882a593Smuzhiyun
1898*4882a593Smuzhiyun   Common targets are:
1899*4882a593Smuzhiyun            core-image-minimal
1900*4882a593Smuzhiyun            core-image-sato
1901*4882a593Smuzhiyun            meta-toolchain
1902*4882a593Smuzhiyun            meta-ide-support
1903*4882a593Smuzhiyun
1904*4882a593Smuzhiyun   You can also run generated QEMU images with a command like 'runqemu qemux86-64'
1905*4882a593Smuzhiyun
1906*4882a593SmuzhiyunOnce you've done that, you can cd to whatever
1907*4882a593Smuzhiyundirectory contains your scripts and use 'crosstap' to run the script::
1908*4882a593Smuzhiyun
1909*4882a593Smuzhiyun   $ cd /path/to/my/systemap/script
1910*4882a593Smuzhiyun   $ crosstap root@192.168.7.2 trace_open.stp
1911*4882a593Smuzhiyun
1912*4882a593SmuzhiyunIf you get an error connecting to the target e.g.::
1913*4882a593Smuzhiyun
1914*4882a593Smuzhiyun   $ crosstap root@192.168.7.2 trace_open.stp
1915*4882a593Smuzhiyun   error establishing ssh connection on remote 'root@192.168.7.2'
1916*4882a593Smuzhiyun
1917*4882a593SmuzhiyunTry ssh'ing to the target and see what happens::
1918*4882a593Smuzhiyun
1919*4882a593Smuzhiyun   $ ssh root@192.168.7.2
1920*4882a593Smuzhiyun
1921*4882a593SmuzhiyunA lot of the time, connection
1922*4882a593Smuzhiyunproblems are due specifying a wrong IP address or having a 'host key
1923*4882a593Smuzhiyunverification error'.
1924*4882a593Smuzhiyun
1925*4882a593SmuzhiyunIf everything worked as planned, you should see something like this
1926*4882a593Smuzhiyun(enter the password when prompted, or press enter if it's set up to use
1927*4882a593Smuzhiyunno password):
1928*4882a593Smuzhiyun
1929*4882a593Smuzhiyun.. code-block:: none
1930*4882a593Smuzhiyun
1931*4882a593Smuzhiyun   $ crosstap root@192.168.7.2 trace_open.stp
1932*4882a593Smuzhiyun   root@192.168.7.2's password:
1933*4882a593Smuzhiyun   matchbox-termin(1036) open ("/tmp/vte3FS2LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
1934*4882a593Smuzhiyun   matchbox-termin(1036) open ("/tmp/vteJMC7LW", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600)
1935*4882a593Smuzhiyun
1936*4882a593Smuzhiyunsystemtap Documentation
1937*4882a593Smuzhiyun-----------------------
1938*4882a593Smuzhiyun
1939*4882a593SmuzhiyunThe SystemTap language reference can be found here: `SystemTap Language
1940*4882a593SmuzhiyunReference <https://sourceware.org/systemtap/langref/>`__
1941*4882a593Smuzhiyun
1942*4882a593SmuzhiyunLinks to other SystemTap documents, tutorials, and examples can be found
1943*4882a593Smuzhiyunhere: `SystemTap documentation
1944*4882a593Smuzhiyunpage <https://sourceware.org/systemtap/documentation.html>`__
1945*4882a593Smuzhiyun
1946*4882a593SmuzhiyunSysprof
1947*4882a593Smuzhiyun=======
1948*4882a593Smuzhiyun
1949*4882a593SmuzhiyunSysprof is a very easy to use system-wide profiler that consists of a
1950*4882a593Smuzhiyunsingle window with three panes and a few buttons which allow you to
1951*4882a593Smuzhiyunstart, stop, and view the profile from one place.
1952*4882a593Smuzhiyun
1953*4882a593SmuzhiyunSysprof Setup
1954*4882a593Smuzhiyun-------------
1955*4882a593Smuzhiyun
1956*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup
1957*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" section.
1958*4882a593Smuzhiyun
1959*4882a593SmuzhiyunSysprof is a GUI-based application that runs on the target system. For
1960*4882a593Smuzhiyunthe rest of this document we assume you've ssh'ed to the host and will
1961*4882a593Smuzhiyunbe running Sysprof on the target (you can use the '-X' option to ssh and
1962*4882a593Smuzhiyunhave the Sysprof GUI run on the target but display remotely on the host
1963*4882a593Smuzhiyunif you want).
1964*4882a593Smuzhiyun
1965*4882a593SmuzhiyunBasic Sysprof Usage
1966*4882a593Smuzhiyun-------------------
1967*4882a593Smuzhiyun
1968*4882a593SmuzhiyunTo start profiling the system, you simply press the 'Start' button. To
1969*4882a593Smuzhiyunstop profiling and to start viewing the profile data in one easy step,
1970*4882a593Smuzhiyunpress the 'Profile' button.
1971*4882a593Smuzhiyun
1972*4882a593SmuzhiyunOnce you've pressed the profile button, the three panes will fill up
1973*4882a593Smuzhiyunwith profiling data:
1974*4882a593Smuzhiyun
1975*4882a593Smuzhiyun.. image:: figures/sysprof-copy-to-user.png
1976*4882a593Smuzhiyun   :align: center
1977*4882a593Smuzhiyun
1978*4882a593SmuzhiyunThe left pane shows a list of functions and processes. Selecting one of
1979*4882a593Smuzhiyunthose expands that function in the right pane, showing all its callees.
1980*4882a593SmuzhiyunNote that this caller-oriented display is essentially the inverse of
1981*4882a593Smuzhiyunperf's default callee-oriented callchain display.
1982*4882a593Smuzhiyun
1983*4882a593SmuzhiyunIn the screenshot above, we're focusing on ``__copy_to_user_ll()`` and
1984*4882a593Smuzhiyunlooking up the callchain we can see that one of the callers of
1985*4882a593Smuzhiyun``__copy_to_user_ll`` is sys_read() and the complete callpath between them.
1986*4882a593SmuzhiyunNotice that this is essentially a portion of the same information we saw
1987*4882a593Smuzhiyunin the perf display shown in the perf section of this page.
1988*4882a593Smuzhiyun
1989*4882a593Smuzhiyun.. image:: figures/sysprof-copy-from-user.png
1990*4882a593Smuzhiyun   :align: center
1991*4882a593Smuzhiyun
1992*4882a593SmuzhiyunSimilarly, the above is a snapshot of the Sysprof display of a
1993*4882a593Smuzhiyuncopy-from-user callchain.
1994*4882a593Smuzhiyun
1995*4882a593SmuzhiyunFinally, looking at the third Sysprof pane in the lower left, we can see
1996*4882a593Smuzhiyuna list of all the callers of a particular function selected in the top
1997*4882a593Smuzhiyunleft pane. In this case, the lower pane is showing all the callers of
1998*4882a593Smuzhiyun``__mark_inode_dirty``:
1999*4882a593Smuzhiyun
2000*4882a593Smuzhiyun.. image:: figures/sysprof-callers.png
2001*4882a593Smuzhiyun   :align: center
2002*4882a593Smuzhiyun
2003*4882a593SmuzhiyunDouble-clicking on one of those functions will in turn change the focus
2004*4882a593Smuzhiyunto the selected function, and so on.
2005*4882a593Smuzhiyun
2006*4882a593Smuzhiyun.. admonition:: Tying it Together
2007*4882a593Smuzhiyun
2008*4882a593Smuzhiyun   If you like sysprof's 'caller-oriented' display, you may be able to
2009*4882a593Smuzhiyun   approximate it in other tools as well. For example, 'perf report' has
2010*4882a593Smuzhiyun   the -g (--call-graph) option that you can experiment with; one of the
2011*4882a593Smuzhiyun   options is 'caller' for an inverted caller-based callgraph display.
2012*4882a593Smuzhiyun
2013*4882a593SmuzhiyunSysprof Documentation
2014*4882a593Smuzhiyun---------------------
2015*4882a593Smuzhiyun
2016*4882a593SmuzhiyunThere doesn't seem to be any documentation for Sysprof, but maybe that's
2017*4882a593Smuzhiyunbecause it's pretty self-explanatory. The Sysprof website, however, is
2018*4882a593Smuzhiyunhere: `Sysprof, System-wide Performance Profiler for
2019*4882a593SmuzhiyunLinux <http://sysprof.com/>`__
2020*4882a593Smuzhiyun
2021*4882a593SmuzhiyunLTTng (Linux Trace Toolkit, next generation)
2022*4882a593Smuzhiyun============================================
2023*4882a593Smuzhiyun
2024*4882a593SmuzhiyunLTTng Setup
2025*4882a593Smuzhiyun-----------
2026*4882a593Smuzhiyun
2027*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup
2028*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`" section.
2029*4882a593SmuzhiyunLTTng is run on the target system by ssh'ing to it.
2030*4882a593Smuzhiyun
2031*4882a593SmuzhiyunCollecting and Viewing Traces
2032*4882a593Smuzhiyun-----------------------------
2033*4882a593Smuzhiyun
2034*4882a593SmuzhiyunOnce you've applied the above commits and built and booted your image
2035*4882a593Smuzhiyun(you need to build the core-image-sato-sdk image or use one of the other
2036*4882a593Smuzhiyunmethods described in the ":ref:`profile-manual/intro:General Setup`" section), you're ready to start
2037*4882a593Smuzhiyuntracing.
2038*4882a593Smuzhiyun
2039*4882a593SmuzhiyunCollecting and viewing a trace on the target (inside a shell)
2040*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2041*4882a593Smuzhiyun
2042*4882a593SmuzhiyunFirst, from the host, ssh to the target::
2043*4882a593Smuzhiyun
2044*4882a593Smuzhiyun   $ ssh -l root 192.168.1.47
2045*4882a593Smuzhiyun   The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
2046*4882a593Smuzhiyun   RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
2047*4882a593Smuzhiyun   Are you sure you want to continue connecting (yes/no)? yes
2048*4882a593Smuzhiyun   Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
2049*4882a593Smuzhiyun   root@192.168.1.47's password:
2050*4882a593Smuzhiyun
2051*4882a593SmuzhiyunOnce on the target, use these steps to create a trace::
2052*4882a593Smuzhiyun
2053*4882a593Smuzhiyun   root@crownbay:~# lttng create
2054*4882a593Smuzhiyun   Spawning a session daemon
2055*4882a593Smuzhiyun   Session auto-20121015-232120 created.
2056*4882a593Smuzhiyun   Traces will be written in /home/root/lttng-traces/auto-20121015-232120
2057*4882a593Smuzhiyun
2058*4882a593SmuzhiyunEnable the events you want to trace (in this case all kernel events)::
2059*4882a593Smuzhiyun
2060*4882a593Smuzhiyun   root@crownbay:~# lttng enable-event --kernel --all
2061*4882a593Smuzhiyun   All kernel events are enabled in channel channel0
2062*4882a593Smuzhiyun
2063*4882a593SmuzhiyunStart the trace::
2064*4882a593Smuzhiyun
2065*4882a593Smuzhiyun   root@crownbay:~# lttng start
2066*4882a593Smuzhiyun   Tracing started for session auto-20121015-232120
2067*4882a593Smuzhiyun
2068*4882a593SmuzhiyunAnd then stop the trace after awhile or after running a particular workload that
2069*4882a593Smuzhiyunyou want to trace::
2070*4882a593Smuzhiyun
2071*4882a593Smuzhiyun   root@crownbay:~# lttng stop
2072*4882a593Smuzhiyun   Tracing stopped for session auto-20121015-232120
2073*4882a593Smuzhiyun
2074*4882a593SmuzhiyunYou can now view the trace in text form on the target::
2075*4882a593Smuzhiyun
2076*4882a593Smuzhiyun   root@crownbay:~# lttng view
2077*4882a593Smuzhiyun   [23:21:56.989270399] (+?.?????????) sys_geteuid: { 1 }, { }
2078*4882a593Smuzhiyun   [23:21:56.989278081] (+0.000007682) exit_syscall: { 1 }, { ret = 0 }
2079*4882a593Smuzhiyun   [23:21:56.989286043] (+0.000007962) sys_pipe: { 1 }, { fildes = 0xB77B9E8C }
2080*4882a593Smuzhiyun   [23:21:56.989321802] (+0.000035759) exit_syscall: { 1 }, { ret = 0 }
2081*4882a593Smuzhiyun   [23:21:56.989329345] (+0.000007543) sys_mmap_pgoff: { 1 }, { addr = 0x0, len = 10485760, prot = 3, flags = 131362, fd = 4294967295, pgoff = 0 }
2082*4882a593Smuzhiyun   [23:21:56.989351694] (+0.000022349) exit_syscall: { 1 }, { ret = -1247805440 }
2083*4882a593Smuzhiyun   [23:21:56.989432989] (+0.000081295) sys_clone: { 1 }, { clone_flags = 0x411, newsp = 0xB5EFFFE4, parent_tid = 0xFFFFFFFF, child_tid = 0x0 }
2084*4882a593Smuzhiyun   [23:21:56.989477129] (+0.000044140) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 681660, vruntime = 43367983388 }
2085*4882a593Smuzhiyun   [23:21:56.989486697] (+0.000009568) sched_migrate_task: { 1 }, { comm = "lttng-consumerd", tid = 1193, prio = 20, orig_cpu = 1, dest_cpu = 1 }
2086*4882a593Smuzhiyun   [23:21:56.989508418] (+0.000021721) hrtimer_init: { 1 }, { hrtimer = 3970832076, clockid = 1, mode = 1 }
2087*4882a593Smuzhiyun   [23:21:56.989770462] (+0.000262044) hrtimer_cancel: { 1 }, { hrtimer = 3993865440 }
2088*4882a593Smuzhiyun   [23:21:56.989771580] (+0.000001118) hrtimer_cancel: { 0 }, { hrtimer = 3993812192 }
2089*4882a593Smuzhiyun   [23:21:56.989776957] (+0.000005377) hrtimer_expire_entry: { 1 }, { hrtimer = 3993865440, now = 79815980007057, function = 3238465232 }
2090*4882a593Smuzhiyun   [23:21:56.989778145] (+0.000001188) hrtimer_expire_entry: { 0 }, { hrtimer = 3993812192, now = 79815980008174, function = 3238465232 }
2091*4882a593Smuzhiyun   [23:21:56.989791695] (+0.000013550) softirq_raise: { 1 }, { vec = 1 }
2092*4882a593Smuzhiyun   [23:21:56.989795396] (+0.000003701) softirq_raise: { 0 }, { vec = 1 }
2093*4882a593Smuzhiyun   [23:21:56.989800635] (+0.000005239) softirq_raise: { 0 }, { vec = 9 }
2094*4882a593Smuzhiyun   [23:21:56.989807130] (+0.000006495) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 330710, vruntime = 43368314098 }
2095*4882a593Smuzhiyun   [23:21:56.989809993] (+0.000002863) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 1015313, vruntime = 36976733240 }
2096*4882a593Smuzhiyun   [23:21:56.989818514] (+0.000008521) hrtimer_expire_exit: { 0 }, { hrtimer = 3993812192 }
2097*4882a593Smuzhiyun   [23:21:56.989819631] (+0.000001117) hrtimer_expire_exit: { 1 }, { hrtimer = 3993865440 }
2098*4882a593Smuzhiyun   [23:21:56.989821866] (+0.000002235) hrtimer_start: { 0 }, { hrtimer = 3993812192, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
2099*4882a593Smuzhiyun   [23:21:56.989822984] (+0.000001118) hrtimer_start: { 1 }, { hrtimer = 3993865440, function = 3238465232, expires = 79815981000000, softexpires = 79815981000000 }
2100*4882a593Smuzhiyun   [23:21:56.989832762] (+0.000009778) softirq_entry: { 1 }, { vec = 1 }
2101*4882a593Smuzhiyun   [23:21:56.989833879] (+0.000001117) softirq_entry: { 0 }, { vec = 1 }
2102*4882a593Smuzhiyun   [23:21:56.989838069] (+0.000004190) timer_cancel: { 1 }, { timer = 3993871956 }
2103*4882a593Smuzhiyun   [23:21:56.989839187] (+0.000001118) timer_cancel: { 0 }, { timer = 3993818708 }
2104*4882a593Smuzhiyun   [23:21:56.989841492] (+0.000002305) timer_expire_entry: { 1 }, { timer = 3993871956, now = 79515980, function = 3238277552 }
2105*4882a593Smuzhiyun   [23:21:56.989842819] (+0.000001327) timer_expire_entry: { 0 }, { timer = 3993818708, now = 79515980, function = 3238277552 }
2106*4882a593Smuzhiyun   [23:21:56.989854831] (+0.000012012) sched_stat_runtime: { 1 }, { comm = "lttng-consumerd", tid = 1193, runtime = 49237, vruntime = 43368363335 }
2107*4882a593Smuzhiyun   [23:21:56.989855949] (+0.000001118) sched_stat_runtime: { 0 }, { comm = "lttng-sessiond", tid = 1181, runtime = 45121, vruntime = 36976778361 }
2108*4882a593Smuzhiyun   [23:21:56.989861257] (+0.000005308) sched_stat_sleep: { 1 }, { comm = "kworker/1:1", tid = 21, delay = 9451318 }
2109*4882a593Smuzhiyun   [23:21:56.989862374] (+0.000001117) sched_stat_sleep: { 0 }, { comm = "kworker/0:0", tid = 4, delay = 9958820 }
2110*4882a593Smuzhiyun   [23:21:56.989868241] (+0.000005867) sched_wakeup: { 0 }, { comm = "kworker/0:0", tid = 4, prio = 120, success = 1, target_cpu = 0 }
2111*4882a593Smuzhiyun   [23:21:56.989869358] (+0.000001117) sched_wakeup: { 1 }, { comm = "kworker/1:1", tid = 21, prio = 120, success = 1, target_cpu = 1 }
2112*4882a593Smuzhiyun   [23:21:56.989877460] (+0.000008102) timer_expire_exit: { 1 }, { timer = 3993871956 }
2113*4882a593Smuzhiyun   [23:21:56.989878577] (+0.000001117) timer_expire_exit: { 0 }, { timer = 3993818708 }
2114*4882a593Smuzhiyun   .
2115*4882a593Smuzhiyun   .
2116*4882a593Smuzhiyun   .
2117*4882a593Smuzhiyun
2118*4882a593SmuzhiyunYou can now safely destroy the trace
2119*4882a593Smuzhiyunsession (note that this doesn't delete the trace - it's still there in
2120*4882a593Smuzhiyun~/lttng-traces)::
2121*4882a593Smuzhiyun
2122*4882a593Smuzhiyun   root@crownbay:~# lttng destroy
2123*4882a593Smuzhiyun   Session auto-20121015-232120 destroyed at /home/root
2124*4882a593Smuzhiyun
2125*4882a593SmuzhiyunNote that the trace is saved in a directory of the same name as returned by
2126*4882a593Smuzhiyun'lttng create', under the ~/lttng-traces directory (note that you can change this by
2127*4882a593Smuzhiyunsupplying your own name to 'lttng create')::
2128*4882a593Smuzhiyun
2129*4882a593Smuzhiyun   root@crownbay:~# ls -al ~/lttng-traces
2130*4882a593Smuzhiyun   drwxrwx---    3 root     root          1024 Oct 15 23:21 .
2131*4882a593Smuzhiyun   drwxr-xr-x    5 root     root          1024 Oct 15 23:57 ..
2132*4882a593Smuzhiyun   drwxrwx---    3 root     root          1024 Oct 15 23:21 auto-20121015-232120
2133*4882a593Smuzhiyun
2134*4882a593SmuzhiyunCollecting and viewing a userspace trace on the target (inside a shell)
2135*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2136*4882a593Smuzhiyun
2137*4882a593SmuzhiyunFor LTTng userspace tracing, you need to have a properly instrumented
2138*4882a593Smuzhiyunuserspace program. For this example, we'll use the 'hello' test program
2139*4882a593Smuzhiyungenerated by the lttng-ust build.
2140*4882a593Smuzhiyun
2141*4882a593SmuzhiyunThe 'hello' test program isn't installed on the root filesystem by the lttng-ust
2142*4882a593Smuzhiyunbuild, so we need to copy it over manually. First cd into the build
2143*4882a593Smuzhiyundirectory that contains the hello executable::
2144*4882a593Smuzhiyun
2145*4882a593Smuzhiyun   $ cd build/tmp/work/core2_32-poky-linux/lttng-ust/2.0.5-r0/git/tests/hello/.libs
2146*4882a593Smuzhiyun
2147*4882a593SmuzhiyunCopy that over to the target machine::
2148*4882a593Smuzhiyun
2149*4882a593Smuzhiyun   $ scp hello root@192.168.1.20:
2150*4882a593Smuzhiyun
2151*4882a593SmuzhiyunYou now have the instrumented lttng 'hello world' test program on the
2152*4882a593Smuzhiyuntarget, ready to test.
2153*4882a593Smuzhiyun
2154*4882a593SmuzhiyunFirst, from the host, ssh to the target::
2155*4882a593Smuzhiyun
2156*4882a593Smuzhiyun   $ ssh -l root 192.168.1.47
2157*4882a593Smuzhiyun   The authenticity of host '192.168.1.47 (192.168.1.47)' can't be established.
2158*4882a593Smuzhiyun   RSA key fingerprint is 23:bd:c8:b1:a8:71:52:00:ee:00:4f:64:9e:10:b9:7e.
2159*4882a593Smuzhiyun   Are you sure you want to continue connecting (yes/no)? yes
2160*4882a593Smuzhiyun   Warning: Permanently added '192.168.1.47' (RSA) to the list of known hosts.
2161*4882a593Smuzhiyun   root@192.168.1.47's password:
2162*4882a593Smuzhiyun
2163*4882a593SmuzhiyunOnce on the target, use these steps to create a trace::
2164*4882a593Smuzhiyun
2165*4882a593Smuzhiyun   root@crownbay:~# lttng create
2166*4882a593Smuzhiyun   Session auto-20190303-021943 created.
2167*4882a593Smuzhiyun   Traces will be written in /home/root/lttng-traces/auto-20190303-021943
2168*4882a593Smuzhiyun
2169*4882a593SmuzhiyunEnable the events you want to trace (in this case all userspace events)::
2170*4882a593Smuzhiyun
2171*4882a593Smuzhiyun   root@crownbay:~# lttng enable-event --userspace --all
2172*4882a593Smuzhiyun   All UST events are enabled in channel channel0
2173*4882a593Smuzhiyun
2174*4882a593SmuzhiyunStart the trace::
2175*4882a593Smuzhiyun
2176*4882a593Smuzhiyun   root@crownbay:~# lttng start
2177*4882a593Smuzhiyun   Tracing started for session auto-20190303-021943
2178*4882a593Smuzhiyun
2179*4882a593SmuzhiyunRun the instrumented hello world program::
2180*4882a593Smuzhiyun
2181*4882a593Smuzhiyun   root@crownbay:~# ./hello
2182*4882a593Smuzhiyun   Hello, World!
2183*4882a593Smuzhiyun   Tracing... done.
2184*4882a593Smuzhiyun
2185*4882a593SmuzhiyunAnd then stop the trace after awhile or after running a particular workload
2186*4882a593Smuzhiyunthat you want to trace::
2187*4882a593Smuzhiyun
2188*4882a593Smuzhiyun   root@crownbay:~# lttng stop
2189*4882a593Smuzhiyun   Tracing stopped for session auto-20190303-021943
2190*4882a593Smuzhiyun
2191*4882a593SmuzhiyunYou can now view the trace in text form on the target::
2192*4882a593Smuzhiyun
2193*4882a593Smuzhiyun   root@crownbay:~# lttng view
2194*4882a593Smuzhiyun   [02:31:14.906146544] (+?.?????????) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 0, intfield2 = 0x0, longfield = 0, netintfield = 0, netintfieldhex = 0x0, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4,  seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
2195*4882a593Smuzhiyun   [02:31:14.906170360] (+0.000023816) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 1, intfield2 = 0x1, longfield = 1, netintfield = 1, netintfieldhex = 0x1, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
2196*4882a593Smuzhiyun   [02:31:14.906183140] (+0.000012780) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 2, intfield2 = 0x2, longfield = 2, netintfield = 2, netintfieldhex = 0x2, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
2197*4882a593Smuzhiyun   [02:31:14.906194385] (+0.000011245) hello:1424 ust_tests_hello:tptest: { cpu_id = 1 }, { intfield = 3, intfield2 = 0x3, longfield = 3, netintfield = 3, netintfieldhex = 0x3, arrfield1 = [ [0] = 1, [1] = 2, [2] = 3 ], arrfield2 = "test", _seqfield1_length = 4, seqfield1 = [ [0] = 116, [1] = 101, [2] = 115, [3] = 116 ], _seqfield2_length = 4, seqfield2 = "test", stringfield = "test", floatfield = 2222, doublefield = 2, boolfield = 1 }
2198*4882a593Smuzhiyun   .
2199*4882a593Smuzhiyun   .
2200*4882a593Smuzhiyun   .
2201*4882a593Smuzhiyun
2202*4882a593SmuzhiyunYou can now safely destroy the trace session (note that this doesn't delete the
2203*4882a593Smuzhiyuntrace - it's still there in ~/lttng-traces)::
2204*4882a593Smuzhiyun
2205*4882a593Smuzhiyun   root@crownbay:~# lttng destroy
2206*4882a593Smuzhiyun   Session auto-20190303-021943 destroyed at /home/root
2207*4882a593Smuzhiyun
2208*4882a593SmuzhiyunLTTng Documentation
2209*4882a593Smuzhiyun-------------------
2210*4882a593Smuzhiyun
2211*4882a593SmuzhiyunYou can find the primary LTTng Documentation on the `LTTng
2212*4882a593SmuzhiyunDocumentation <https://lttng.org/docs/>`__ site. The documentation on
2213*4882a593Smuzhiyunthis site is appropriate for intermediate to advanced software
2214*4882a593Smuzhiyundevelopers who are working in a Linux environment and are interested in
2215*4882a593Smuzhiyunefficient software tracing.
2216*4882a593Smuzhiyun
2217*4882a593SmuzhiyunFor information on LTTng in general, visit the `LTTng
2218*4882a593SmuzhiyunProject <https://lttng.org/lttng2.0>`__ site. You can find a "Getting
2219*4882a593SmuzhiyunStarted" link on this site that takes you to an LTTng Quick Start.
2220*4882a593Smuzhiyun
2221*4882a593Smuzhiyunblktrace
2222*4882a593Smuzhiyun========
2223*4882a593Smuzhiyun
2224*4882a593Smuzhiyunblktrace is a tool for tracing and reporting low-level disk I/O.
2225*4882a593Smuzhiyunblktrace provides the tracing half of the equation; its output can be
2226*4882a593Smuzhiyunpiped into the blkparse program, which renders the data in a
2227*4882a593Smuzhiyunhuman-readable form and does some basic analysis:
2228*4882a593Smuzhiyun
2229*4882a593Smuzhiyunblktrace Setup
2230*4882a593Smuzhiyun--------------
2231*4882a593Smuzhiyun
2232*4882a593SmuzhiyunFor this section, we'll assume you've already performed the basic setup
2233*4882a593Smuzhiyunoutlined in the ":ref:`profile-manual/intro:General Setup`"
2234*4882a593Smuzhiyunsection.
2235*4882a593Smuzhiyun
2236*4882a593Smuzhiyunblktrace is an application that runs on the target system. You can run
2237*4882a593Smuzhiyunthe entire blktrace and blkparse pipeline on the target, or you can run
2238*4882a593Smuzhiyunblktrace in 'listen' mode on the target and have blktrace and blkparse
2239*4882a593Smuzhiyuncollect and analyze the data on the host (see the
2240*4882a593Smuzhiyun":ref:`profile-manual/usage:Using blktrace Remotely`" section
2241*4882a593Smuzhiyunbelow). For the rest of this section we assume you've ssh'ed to the host and
2242*4882a593Smuzhiyunwill be running blkrace on the target.
2243*4882a593Smuzhiyun
2244*4882a593SmuzhiyunBasic blktrace Usage
2245*4882a593Smuzhiyun--------------------
2246*4882a593Smuzhiyun
2247*4882a593SmuzhiyunTo record a trace, simply run the 'blktrace' command, giving it the name
2248*4882a593Smuzhiyunof the block device you want to trace activity on::
2249*4882a593Smuzhiyun
2250*4882a593Smuzhiyun   root@crownbay:~# blktrace /dev/sdc
2251*4882a593Smuzhiyun
2252*4882a593SmuzhiyunIn another shell, execute a workload you want to trace. ::
2253*4882a593Smuzhiyun
2254*4882a593Smuzhiyun   root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2; sync
2255*4882a593Smuzhiyun   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
2256*4882a593Smuzhiyun   linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA
2257*4882a593Smuzhiyun
2258*4882a593SmuzhiyunPress Ctrl-C in the blktrace shell to stop the trace. It
2259*4882a593Smuzhiyunwill display how many events were logged, along with the per-cpu file
2260*4882a593Smuzhiyunsizes (blktrace records traces in per-cpu kernel buffers and simply
2261*4882a593Smuzhiyundumps them to userspace for blkparse to merge and sort later). ::
2262*4882a593Smuzhiyun
2263*4882a593Smuzhiyun   ^C=== sdc ===
2264*4882a593Smuzhiyun    CPU  0:                 7082 events,      332 KiB data
2265*4882a593Smuzhiyun    CPU  1:                 1578 events,       74 KiB data
2266*4882a593Smuzhiyun    Total:                  8660 events (dropped 0),      406 KiB data
2267*4882a593Smuzhiyun
2268*4882a593SmuzhiyunIf you examine the files saved to disk, you see multiple files, one per CPU and
2269*4882a593Smuzhiyunwith the device name as the first part of the filename::
2270*4882a593Smuzhiyun
2271*4882a593Smuzhiyun   root@crownbay:~# ls -al
2272*4882a593Smuzhiyun   drwxr-xr-x    6 root     root          1024 Oct 27 22:39 .
2273*4882a593Smuzhiyun   drwxr-sr-x    4 root     root          1024 Oct 26 18:24 ..
2274*4882a593Smuzhiyun   -rw-r--r--    1 root     root        339938 Oct 27 22:40 sdc.blktrace.0
2275*4882a593Smuzhiyun   -rw-r--r--    1 root     root         75753 Oct 27 22:40 sdc.blktrace.1
2276*4882a593Smuzhiyun
2277*4882a593SmuzhiyunTo view the trace events, simply invoke 'blkparse' in the directory
2278*4882a593Smuzhiyuncontaining the trace files, giving it the device name that forms the
2279*4882a593Smuzhiyunfirst part of the filenames::
2280*4882a593Smuzhiyun
2281*4882a593Smuzhiyun   root@crownbay:~# blkparse sdc
2282*4882a593Smuzhiyun
2283*4882a593Smuzhiyun    8,32   1        1     0.000000000  1225  Q  WS 3417048 + 8 [jbd2/sdc-8]
2284*4882a593Smuzhiyun    8,32   1        2     0.000025213  1225  G  WS 3417048 + 8 [jbd2/sdc-8]
2285*4882a593Smuzhiyun    8,32   1        3     0.000033384  1225  P   N [jbd2/sdc-8]
2286*4882a593Smuzhiyun    8,32   1        4     0.000043301  1225  I  WS 3417048 + 8 [jbd2/sdc-8]
2287*4882a593Smuzhiyun    8,32   1        0     0.000057270     0  m   N cfq1225 insert_request
2288*4882a593Smuzhiyun    8,32   1        0     0.000064813     0  m   N cfq1225 add_to_rr
2289*4882a593Smuzhiyun    8,32   1        5     0.000076336  1225  U   N [jbd2/sdc-8] 1
2290*4882a593Smuzhiyun    8,32   1        0     0.000088559     0  m   N cfq workload slice:150
2291*4882a593Smuzhiyun    8,32   1        0     0.000097359     0  m   N cfq1225 set_active wl_prio:0 wl_type:1
2292*4882a593Smuzhiyun    8,32   1        0     0.000104063     0  m   N cfq1225 Not idling. st->count:1
2293*4882a593Smuzhiyun    8,32   1        0     0.000112584     0  m   N cfq1225 fifo=  (null)
2294*4882a593Smuzhiyun    8,32   1        0     0.000118730     0  m   N cfq1225 dispatch_insert
2295*4882a593Smuzhiyun    8,32   1        0     0.000127390     0  m   N cfq1225 dispatched a request
2296*4882a593Smuzhiyun    8,32   1        0     0.000133536     0  m   N cfq1225 activate rq, drv=1
2297*4882a593Smuzhiyun    8,32   1        6     0.000136889  1225  D  WS 3417048 + 8 [jbd2/sdc-8]
2298*4882a593Smuzhiyun    8,32   1        7     0.000360381  1225  Q  WS 3417056 + 8 [jbd2/sdc-8]
2299*4882a593Smuzhiyun    8,32   1        8     0.000377422  1225  G  WS 3417056 + 8 [jbd2/sdc-8]
2300*4882a593Smuzhiyun    8,32   1        9     0.000388876  1225  P   N [jbd2/sdc-8]
2301*4882a593Smuzhiyun    8,32   1       10     0.000397886  1225  Q  WS 3417064 + 8 [jbd2/sdc-8]
2302*4882a593Smuzhiyun    8,32   1       11     0.000404800  1225  M  WS 3417064 + 8 [jbd2/sdc-8]
2303*4882a593Smuzhiyun    8,32   1       12     0.000412343  1225  Q  WS 3417072 + 8 [jbd2/sdc-8]
2304*4882a593Smuzhiyun    8,32   1       13     0.000416533  1225  M  WS 3417072 + 8 [jbd2/sdc-8]
2305*4882a593Smuzhiyun    8,32   1       14     0.000422121  1225  Q  WS 3417080 + 8 [jbd2/sdc-8]
2306*4882a593Smuzhiyun    8,32   1       15     0.000425194  1225  M  WS 3417080 + 8 [jbd2/sdc-8]
2307*4882a593Smuzhiyun    8,32   1       16     0.000431968  1225  Q  WS 3417088 + 8 [jbd2/sdc-8]
2308*4882a593Smuzhiyun    8,32   1       17     0.000435251  1225  M  WS 3417088 + 8 [jbd2/sdc-8]
2309*4882a593Smuzhiyun    8,32   1       18     0.000440279  1225  Q  WS 3417096 + 8 [jbd2/sdc-8]
2310*4882a593Smuzhiyun    8,32   1       19     0.000443911  1225  M  WS 3417096 + 8 [jbd2/sdc-8]
2311*4882a593Smuzhiyun    8,32   1       20     0.000450336  1225  Q  WS 3417104 + 8 [jbd2/sdc-8]
2312*4882a593Smuzhiyun    8,32   1       21     0.000454038  1225  M  WS 3417104 + 8 [jbd2/sdc-8]
2313*4882a593Smuzhiyun    8,32   1       22     0.000462070  1225  Q  WS 3417112 + 8 [jbd2/sdc-8]
2314*4882a593Smuzhiyun    8,32   1       23     0.000465422  1225  M  WS 3417112 + 8 [jbd2/sdc-8]
2315*4882a593Smuzhiyun    8,32   1       24     0.000474222  1225  I  WS 3417056 + 64 [jbd2/sdc-8]
2316*4882a593Smuzhiyun    8,32   1        0     0.000483022     0  m   N cfq1225 insert_request
2317*4882a593Smuzhiyun    8,32   1       25     0.000489727  1225  U   N [jbd2/sdc-8] 1
2318*4882a593Smuzhiyun    8,32   1        0     0.000498457     0  m   N cfq1225 Not idling. st->count:1
2319*4882a593Smuzhiyun    8,32   1        0     0.000503765     0  m   N cfq1225 dispatch_insert
2320*4882a593Smuzhiyun    8,32   1        0     0.000512914     0  m   N cfq1225 dispatched a request
2321*4882a593Smuzhiyun    8,32   1        0     0.000518851     0  m   N cfq1225 activate rq, drv=2
2322*4882a593Smuzhiyun    .
2323*4882a593Smuzhiyun    .
2324*4882a593Smuzhiyun    .
2325*4882a593Smuzhiyun    8,32   0        0    58.515006138     0  m   N cfq3551 complete rqnoidle 1
2326*4882a593Smuzhiyun    8,32   0     2024    58.516603269     3  C  WS 3156992 + 16 [0]
2327*4882a593Smuzhiyun    8,32   0        0    58.516626736     0  m   N cfq3551 complete rqnoidle 1
2328*4882a593Smuzhiyun    8,32   0        0    58.516634558     0  m   N cfq3551 arm_idle: 8 group_idle: 0
2329*4882a593Smuzhiyun    8,32   0        0    58.516636933     0  m   N cfq schedule dispatch
2330*4882a593Smuzhiyun    8,32   1        0    58.516971613     0  m   N cfq3551 slice expired t=0
2331*4882a593Smuzhiyun    8,32   1        0    58.516982089     0  m   N cfq3551 sl_used=13 disp=6 charge=13 iops=0 sect=80
2332*4882a593Smuzhiyun    8,32   1        0    58.516985511     0  m   N cfq3551 del_from_rr
2333*4882a593Smuzhiyun    8,32   1        0    58.516990819     0  m   N cfq3551 put_queue
2334*4882a593Smuzhiyun
2335*4882a593Smuzhiyun   CPU0 (sdc):
2336*4882a593Smuzhiyun    Reads Queued:           0,        0KiB	 Writes Queued:         331,   26,284KiB
2337*4882a593Smuzhiyun    Read Dispatches:        0,        0KiB	 Write Dispatches:      485,   40,484KiB
2338*4882a593Smuzhiyun    Reads Requeued:         0		 Writes Requeued:         0
2339*4882a593Smuzhiyun    Reads Completed:        0,        0KiB	 Writes Completed:      511,   41,000KiB
2340*4882a593Smuzhiyun    Read Merges:            0,        0KiB	 Write Merges:           13,      160KiB
2341*4882a593Smuzhiyun    Read depth:             0        	 Write depth:             2
2342*4882a593Smuzhiyun    IO unplugs:            23        	 Timer unplugs:           0
2343*4882a593Smuzhiyun   CPU1 (sdc):
2344*4882a593Smuzhiyun    Reads Queued:           0,        0KiB	 Writes Queued:         249,   15,800KiB
2345*4882a593Smuzhiyun    Read Dispatches:        0,        0KiB	 Write Dispatches:       42,    1,600KiB
2346*4882a593Smuzhiyun    Reads Requeued:         0		 Writes Requeued:         0
2347*4882a593Smuzhiyun    Reads Completed:        0,        0KiB	 Writes Completed:       16,    1,084KiB
2348*4882a593Smuzhiyun    Read Merges:            0,        0KiB	 Write Merges:           40,      276KiB
2349*4882a593Smuzhiyun    Read depth:             0        	 Write depth:             2
2350*4882a593Smuzhiyun    IO unplugs:            30        	 Timer unplugs:           1
2351*4882a593Smuzhiyun
2352*4882a593Smuzhiyun   Total (sdc):
2353*4882a593Smuzhiyun    Reads Queued:           0,        0KiB	 Writes Queued:         580,   42,084KiB
2354*4882a593Smuzhiyun    Read Dispatches:        0,        0KiB	 Write Dispatches:      527,   42,084KiB
2355*4882a593Smuzhiyun    Reads Requeued:         0		 Writes Requeued:         0
2356*4882a593Smuzhiyun    Reads Completed:        0,        0KiB	 Writes Completed:      527,   42,084KiB
2357*4882a593Smuzhiyun    Read Merges:            0,        0KiB	 Write Merges:           53,      436KiB
2358*4882a593Smuzhiyun    IO unplugs:            53        	 Timer unplugs:           1
2359*4882a593Smuzhiyun
2360*4882a593Smuzhiyun   Throughput (R/W): 0KiB/s / 719KiB/s
2361*4882a593Smuzhiyun   Events (sdc): 6,592 entries
2362*4882a593Smuzhiyun   Skips: 0 forward (0 -   0.0%)
2363*4882a593Smuzhiyun   Input file sdc.blktrace.0 added
2364*4882a593Smuzhiyun   Input file sdc.blktrace.1 added
2365*4882a593Smuzhiyun
2366*4882a593SmuzhiyunThe report shows each event that was
2367*4882a593Smuzhiyunfound in the blktrace data, along with a summary of the overall block
2368*4882a593SmuzhiyunI/O traffic during the run. You can look at the
2369*4882a593Smuzhiyun`blkparse <https://linux.die.net/man/1/blkparse>`__ manpage to learn the
2370*4882a593Smuzhiyunmeaning of each field displayed in the trace listing.
2371*4882a593Smuzhiyun
2372*4882a593SmuzhiyunLive Mode
2373*4882a593Smuzhiyun~~~~~~~~~
2374*4882a593Smuzhiyun
2375*4882a593Smuzhiyunblktrace and blkparse are designed from the ground up to be able to
2376*4882a593Smuzhiyunoperate together in a 'pipe mode' where the stdout of blktrace can be
2377*4882a593Smuzhiyunfed directly into the stdin of blkparse::
2378*4882a593Smuzhiyun
2379*4882a593Smuzhiyun   root@crownbay:~# blktrace /dev/sdc -o - | blkparse -i -
2380*4882a593Smuzhiyun
2381*4882a593SmuzhiyunThis enables long-lived tracing sessions
2382*4882a593Smuzhiyunto run without writing anything to disk, and allows the user to look for
2383*4882a593Smuzhiyuncertain conditions in the trace data in 'real-time' by viewing the trace
2384*4882a593Smuzhiyunoutput as it scrolls by on the screen or by passing it along to yet
2385*4882a593Smuzhiyunanother program in the pipeline such as grep which can be used to
2386*4882a593Smuzhiyunidentify and capture conditions of interest.
2387*4882a593Smuzhiyun
2388*4882a593SmuzhiyunThere's actually another blktrace command that implements the above
2389*4882a593Smuzhiyunpipeline as a single command, so the user doesn't have to bother typing
2390*4882a593Smuzhiyunin the above command sequence::
2391*4882a593Smuzhiyun
2392*4882a593Smuzhiyun   root@crownbay:~# btrace /dev/sdc
2393*4882a593Smuzhiyun
2394*4882a593SmuzhiyunUsing blktrace Remotely
2395*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~
2396*4882a593Smuzhiyun
2397*4882a593SmuzhiyunBecause blktrace traces block I/O and at the same time normally writes
2398*4882a593Smuzhiyunits trace data to a block device, and in general because it's not really
2399*4882a593Smuzhiyuna great idea to make the device being traced the same as the device the
2400*4882a593Smuzhiyuntracer writes to, blktrace provides a way to trace without perturbing
2401*4882a593Smuzhiyunthe traced device at all by providing native support for sending all
2402*4882a593Smuzhiyuntrace data over the network.
2403*4882a593Smuzhiyun
2404*4882a593SmuzhiyunTo have blktrace operate in this mode, start blktrace on the target
2405*4882a593Smuzhiyunsystem being traced with the -l option, along with the device to trace::
2406*4882a593Smuzhiyun
2407*4882a593Smuzhiyun   root@crownbay:~# blktrace -l /dev/sdc
2408*4882a593Smuzhiyun   server: waiting for connections...
2409*4882a593Smuzhiyun
2410*4882a593SmuzhiyunOn the host system, use the -h option to connect to the target system,
2411*4882a593Smuzhiyunalso passing it the device to trace::
2412*4882a593Smuzhiyun
2413*4882a593Smuzhiyun   $ blktrace -d /dev/sdc -h 192.168.1.43
2414*4882a593Smuzhiyun   blktrace: connecting to 192.168.1.43
2415*4882a593Smuzhiyun   blktrace: connected!
2416*4882a593Smuzhiyun
2417*4882a593SmuzhiyunOn the target system, you should see this::
2418*4882a593Smuzhiyun
2419*4882a593Smuzhiyun   server: connection from 192.168.1.43
2420*4882a593Smuzhiyun
2421*4882a593SmuzhiyunIn another shell, execute a workload you want to trace. ::
2422*4882a593Smuzhiyun
2423*4882a593Smuzhiyun   root@crownbay:/media/sdc# rm linux-2.6.19.2.tar.bz2; wget &YOCTO_DL_URL;/mirror/sources/linux-2.6.19.2.tar.bz2; sync
2424*4882a593Smuzhiyun   Connecting to downloads.yoctoproject.org (140.211.169.59:80)
2425*4882a593Smuzhiyun   linux-2.6.19.2.tar.b 100% \|*******************************\| 41727k 0:00:00 ETA
2426*4882a593Smuzhiyun
2427*4882a593SmuzhiyunWhen it's done, do a Ctrl-C on the host system to stop the
2428*4882a593Smuzhiyuntrace::
2429*4882a593Smuzhiyun
2430*4882a593Smuzhiyun   ^C=== sdc ===
2431*4882a593Smuzhiyun    CPU  0:                 7691 events,      361 KiB data
2432*4882a593Smuzhiyun    CPU  1:                 4109 events,      193 KiB data
2433*4882a593Smuzhiyun    Total:                 11800 events (dropped 0),      554 KiB data
2434*4882a593Smuzhiyun
2435*4882a593SmuzhiyunOn the target system, you should also see a trace summary for the trace
2436*4882a593Smuzhiyunjust ended::
2437*4882a593Smuzhiyun
2438*4882a593Smuzhiyun   server: end of run for 192.168.1.43:sdc
2439*4882a593Smuzhiyun   === sdc ===
2440*4882a593Smuzhiyun    CPU  0:                 7691 events,      361 KiB data
2441*4882a593Smuzhiyun    CPU  1:                 4109 events,      193 KiB data
2442*4882a593Smuzhiyun    Total:                 11800 events (dropped 0),      554 KiB data
2443*4882a593Smuzhiyun
2444*4882a593SmuzhiyunThe blktrace instance on the host will
2445*4882a593Smuzhiyunsave the target output inside a hostname-timestamp directory::
2446*4882a593Smuzhiyun
2447*4882a593Smuzhiyun   $ ls -al
2448*4882a593Smuzhiyun   drwxr-xr-x   10 root     root          1024 Oct 28 02:40 .
2449*4882a593Smuzhiyun   drwxr-sr-x    4 root     root          1024 Oct 26 18:24 ..
2450*4882a593Smuzhiyun   drwxr-xr-x    2 root     root          1024 Oct 28 02:40 192.168.1.43-2012-10-28-02:40:56
2451*4882a593Smuzhiyun
2452*4882a593Smuzhiyuncd into that directory to see the output files::
2453*4882a593Smuzhiyun
2454*4882a593Smuzhiyun   $ ls -l
2455*4882a593Smuzhiyun   -rw-r--r--    1 root     root        369193 Oct 28 02:44 sdc.blktrace.0
2456*4882a593Smuzhiyun   -rw-r--r--    1 root     root        197278 Oct 28 02:44 sdc.blktrace.1
2457*4882a593Smuzhiyun
2458*4882a593SmuzhiyunAnd run blkparse on the host system using the device name::
2459*4882a593Smuzhiyun
2460*4882a593Smuzhiyun   $ blkparse sdc
2461*4882a593Smuzhiyun
2462*4882a593Smuzhiyun    8,32   1        1     0.000000000  1263  Q  RM 6016 + 8 [ls]
2463*4882a593Smuzhiyun    8,32   1        0     0.000036038     0  m   N cfq1263 alloced
2464*4882a593Smuzhiyun    8,32   1        2     0.000039390  1263  G  RM 6016 + 8 [ls]
2465*4882a593Smuzhiyun    8,32   1        3     0.000049168  1263  I  RM 6016 + 8 [ls]
2466*4882a593Smuzhiyun    8,32   1        0     0.000056152     0  m   N cfq1263 insert_request
2467*4882a593Smuzhiyun    8,32   1        0     0.000061600     0  m   N cfq1263 add_to_rr
2468*4882a593Smuzhiyun    8,32   1        0     0.000075498     0  m   N cfq workload slice:300
2469*4882a593Smuzhiyun    .
2470*4882a593Smuzhiyun    .
2471*4882a593Smuzhiyun    .
2472*4882a593Smuzhiyun    8,32   0        0   177.266385696     0  m   N cfq1267 arm_idle: 8 group_idle: 0
2473*4882a593Smuzhiyun    8,32   0        0   177.266388140     0  m   N cfq schedule dispatch
2474*4882a593Smuzhiyun    8,32   1        0   177.266679239     0  m   N cfq1267 slice expired t=0
2475*4882a593Smuzhiyun    8,32   1        0   177.266689297     0  m   N cfq1267 sl_used=9 disp=6 charge=9 iops=0 sect=56
2476*4882a593Smuzhiyun    8,32   1        0   177.266692649     0  m   N cfq1267 del_from_rr
2477*4882a593Smuzhiyun    8,32   1        0   177.266696560     0  m   N cfq1267 put_queue
2478*4882a593Smuzhiyun
2479*4882a593Smuzhiyun   CPU0 (sdc):
2480*4882a593Smuzhiyun    Reads Queued:           0,        0KiB	 Writes Queued:         270,   21,708KiB
2481*4882a593Smuzhiyun    Read Dispatches:       59,    2,628KiB	 Write Dispatches:      495,   39,964KiB
2482*4882a593Smuzhiyun    Reads Requeued:         0		 Writes Requeued:         0
2483*4882a593Smuzhiyun    Reads Completed:       90,    2,752KiB	 Writes Completed:      543,   41,596KiB
2484*4882a593Smuzhiyun    Read Merges:            0,        0KiB	 Write Merges:            9,      344KiB
2485*4882a593Smuzhiyun    Read depth:             2        	 Write depth:             2
2486*4882a593Smuzhiyun    IO unplugs:            20        	 Timer unplugs:           1
2487*4882a593Smuzhiyun   CPU1 (sdc):
2488*4882a593Smuzhiyun    Reads Queued:         688,    2,752KiB	 Writes Queued:         381,   20,652KiB
2489*4882a593Smuzhiyun    Read Dispatches:       31,      124KiB	 Write Dispatches:       59,    2,396KiB
2490*4882a593Smuzhiyun    Reads Requeued:         0		 Writes Requeued:         0
2491*4882a593Smuzhiyun    Reads Completed:        0,        0KiB	 Writes Completed:       11,      764KiB
2492*4882a593Smuzhiyun    Read Merges:          598,    2,392KiB	 Write Merges:           88,      448KiB
2493*4882a593Smuzhiyun    Read depth:             2        	 Write depth:             2
2494*4882a593Smuzhiyun    IO unplugs:            52        	 Timer unplugs:           0
2495*4882a593Smuzhiyun
2496*4882a593Smuzhiyun   Total (sdc):
2497*4882a593Smuzhiyun    Reads Queued:         688,    2,752KiB	 Writes Queued:         651,   42,360KiB
2498*4882a593Smuzhiyun    Read Dispatches:       90,    2,752KiB	 Write Dispatches:      554,   42,360KiB
2499*4882a593Smuzhiyun    Reads Requeued:         0		 Writes Requeued:         0
2500*4882a593Smuzhiyun    Reads Completed:       90,    2,752KiB	 Writes Completed:      554,   42,360KiB
2501*4882a593Smuzhiyun    Read Merges:          598,    2,392KiB	 Write Merges:           97,      792KiB
2502*4882a593Smuzhiyun    IO unplugs:            72        	 Timer unplugs:           1
2503*4882a593Smuzhiyun
2504*4882a593Smuzhiyun   Throughput (R/W): 15KiB/s / 238KiB/s
2505*4882a593Smuzhiyun   Events (sdc): 9,301 entries
2506*4882a593Smuzhiyun   Skips: 0 forward (0 -   0.0%)
2507*4882a593Smuzhiyun
2508*4882a593SmuzhiyunYou should see the trace events and summary just as you would have if you'd run
2509*4882a593Smuzhiyunthe same command on the target.
2510*4882a593Smuzhiyun
2511*4882a593SmuzhiyunTracing Block I/O via 'ftrace'
2512*4882a593Smuzhiyun~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2513*4882a593Smuzhiyun
2514*4882a593SmuzhiyunIt's also possible to trace block I/O using only
2515*4882a593Smuzhiyun:ref:`profile-manual/usage:The 'trace events' Subsystem`, which
2516*4882a593Smuzhiyuncan be useful for casual tracing if you don't want to bother dealing with the
2517*4882a593Smuzhiyunuserspace tools.
2518*4882a593Smuzhiyun
2519*4882a593SmuzhiyunTo enable tracing for a given device, use /sys/block/xxx/trace/enable,
2520*4882a593Smuzhiyunwhere xxx is the device name. This for example enables tracing for
2521*4882a593Smuzhiyun/dev/sdc::
2522*4882a593Smuzhiyun
2523*4882a593Smuzhiyun   root@crownbay:/sys/kernel/debug/tracing# echo 1 > /sys/block/sdc/trace/enable
2524*4882a593Smuzhiyun
2525*4882a593SmuzhiyunOnce you've selected the device(s) you want
2526*4882a593Smuzhiyunto trace, selecting the 'blk' tracer will turn the blk tracer on::
2527*4882a593Smuzhiyun
2528*4882a593Smuzhiyun   root@crownbay:/sys/kernel/debug/tracing# cat available_tracers
2529*4882a593Smuzhiyun   blk function_graph function nop
2530*4882a593Smuzhiyun
2531*4882a593Smuzhiyun   root@crownbay:/sys/kernel/debug/tracing# echo blk > current_tracer
2532*4882a593Smuzhiyun
2533*4882a593SmuzhiyunExecute the workload you're interested in::
2534*4882a593Smuzhiyun
2535*4882a593Smuzhiyun   root@crownbay:/sys/kernel/debug/tracing# cat /media/sdc/testfile.txt
2536*4882a593Smuzhiyun
2537*4882a593SmuzhiyunAnd look at the output (note here that we're using 'trace_pipe' instead of
2538*4882a593Smuzhiyuntrace to capture this trace - this allows us to wait around on the pipe
2539*4882a593Smuzhiyunfor data to appear)::
2540*4882a593Smuzhiyun
2541*4882a593Smuzhiyun   root@crownbay:/sys/kernel/debug/tracing# cat trace_pipe
2542*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276361:   8,32   Q   R 1699848 + 8 [cat]
2543*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276410:   8,32   m   N cfq3587 alloced
2544*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276415:   8,32   G   R 1699848 + 8 [cat]
2545*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276424:   8,32   P   N [cat]
2546*4882a593Smuzhiyun               cat-3587  [001] d..2  3023.276432:   8,32   I   R 1699848 + 8 [cat]
2547*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276439:   8,32   m   N cfq3587 insert_request
2548*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276445:   8,32   m   N cfq3587 add_to_rr
2549*4882a593Smuzhiyun               cat-3587  [001] d..2  3023.276454:   8,32   U   N [cat] 1
2550*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276464:   8,32   m   N cfq workload slice:150
2551*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276471:   8,32   m   N cfq3587 set_active wl_prio:0 wl_type:2
2552*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276478:   8,32   m   N cfq3587 fifo=  (null)
2553*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276483:   8,32   m   N cfq3587 dispatch_insert
2554*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276490:   8,32   m   N cfq3587 dispatched a request
2555*4882a593Smuzhiyun               cat-3587  [001] d..1  3023.276497:   8,32   m   N cfq3587 activate rq, drv=1
2556*4882a593Smuzhiyun               cat-3587  [001] d..2  3023.276500:   8,32   D   R 1699848 + 8 [cat]
2557*4882a593Smuzhiyun
2558*4882a593SmuzhiyunAnd this turns off tracing for the specified device::
2559*4882a593Smuzhiyun
2560*4882a593Smuzhiyun   root@crownbay:/sys/kernel/debug/tracing# echo 0 > /sys/block/sdc/trace/enable
2561*4882a593Smuzhiyun
2562*4882a593Smuzhiyunblktrace Documentation
2563*4882a593Smuzhiyun----------------------
2564*4882a593Smuzhiyun
2565*4882a593SmuzhiyunOnline versions of the man pages for the commands discussed in this
2566*4882a593Smuzhiyunsection can be found here:
2567*4882a593Smuzhiyun
2568*4882a593Smuzhiyun-  https://linux.die.net/man/8/blktrace
2569*4882a593Smuzhiyun
2570*4882a593Smuzhiyun-  https://linux.die.net/man/1/blkparse
2571*4882a593Smuzhiyun
2572*4882a593Smuzhiyun-  https://linux.die.net/man/8/btrace
2573*4882a593Smuzhiyun
2574*4882a593SmuzhiyunThe above manpages, along with manpages for the other blktrace utilities
2575*4882a593Smuzhiyun(btt, blkiomon, etc) can be found in the /doc directory of the blktrace
2576*4882a593Smuzhiyuntools git repo::
2577*4882a593Smuzhiyun
2578*4882a593Smuzhiyun   $ git clone git://git.kernel.dk/blktrace.git
2579