=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================

.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or going to user space.
  This requires extra care about what can be done inside a callback. A
  callback can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursion and RCU,
but callbacks must still be written with great care.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
tells ftrace which function should be called as the callback, as well as
which protections the callback will perform itself and therefore does not
require ftrace to handle.

Only one field must be set when registering an ftrace_ops with ftrace:

.. code-block:: c

 struct ftrace_ops ops = {
       .func			= my_callback_func,
       .flags			= MY_FTRACE_FLAGS,
       .private			= any_private_data_structure,
 };

Both .flags and .private are optional. Only .func is required.

To enable tracing call::

    register_ftrace_function(&ops);

To disable tracing call::

    unregister_ftrace_function(&ops);

The above is defined by including the header::

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
at which callbacks start being called depends on the architecture and on the
scheduling of services. The callback itself will have to handle any
synchronization if it must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is no longer
being called by functions after it returns. Note that to provide this
guarantee, unregister_ftrace_function() may take some time to finish.

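The registration steps above could be combined into a loadable module roughly
as follows. This is only a minimal sketch, not a reference implementation: the
names my_hits, my_ops, my_hook_init and my_hook_exit are made up for
illustration, the callback prototype is the one this document gives for v4.14
(later kernels may differ), and the code only builds inside a kernel tree.

.. code-block:: c

   #include <linux/module.h>
   #include <linux/ftrace.h>
   #include <linux/atomic.h>

   /* Hypothetical counter, not part of the ftrace API. */
   static atomic_long_t my_hits = ATOMIC_LONG_INIT(0);

   /* Marked notrace so that the callback itself is not traced. */
   static void notrace my_callback_func(unsigned long ip,
                                        unsigned long parent_ip,
                                        struct ftrace_ops *op,
                                        struct pt_regs *regs)
   {
           /* Keep the body minimal: it may run in any context. */
           atomic_long_inc(&my_hits);
   }

   /* Static storage: the ops must stay valid while it is registered. */
   static struct ftrace_ops my_ops = {
           .func = my_callback_func,
   };

   static int __init my_hook_init(void)
   {
           /* No filter is set, so every traceable function calls back. */
           return register_ftrace_function(&my_ops);
   }

   static void __exit my_hook_exit(void)
   {
           /* After this returns, the callback is no longer being called. */
           unregister_ftrace_function(&my_ops);
           pr_info("my_hook: %ld calls seen\n", atomic_long_read(&my_hits));
   }

   module_init(my_hook_init);
   module_exit(my_hook_exit);
   MODULE_LICENSE("GPL");

Because no .flags are set here, the sketch relies on the default protections
that ftrace wraps around the callback.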

The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

   void callback_func(unsigned long ip, unsigned long parent_ip,
                      struct ftrace_ops *op, struct pt_regs *regs);

@ip
	This is the instruction pointer of the function that is being traced
	(where the fentry or mcount is within the function).

@parent_ip
	This is the instruction pointer of the function that called the
	function being traced (where the call of the function occurred).

@op
	This is a pointer to the ftrace_ops that was used to register the
	callback. This can be used to pass data to the callback via the
	private pointer.

@regs
	If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
	flags are set in the ftrace_ops structure, then this will be pointing
	to the pt_regs structure like it would be if a breakpoint was placed
	at the start of the function where ftrace was tracing. Otherwise it
	either contains garbage, or NULL.

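As an illustration of the @op parameter, the following sketch passes per-ops
state to the callback through the private pointer. The structure
my_hook_state and the names my_state and my_ops are hypothetical, and regs is
deliberately left untouched because no SAVE_REGS flag is set.

.. code-block:: c

   #include <linux/ftrace.h>
   #include <linux/atomic.h>
   #include <linux/compiler.h>

   /* Hypothetical per-ops state, reached through op->private. */
   struct my_hook_state {
           atomic_long_t hits;
           unsigned long last_parent_ip;
   };

   static struct my_hook_state my_state;

   static void notrace my_callback_func(unsigned long ip,
                                        unsigned long parent_ip,
                                        struct ftrace_ops *op,
                                        struct pt_regs *regs)
   {
           struct my_hook_state *state = op->private;

           atomic_long_inc(&state->hits);
           /* Racy on purpose: only a hint about the most recent caller. */
           WRITE_ONCE(state->last_parent_ip, parent_ip);
   }

   static struct ftrace_ops my_ops = {
           .func    = my_callback_func,
           .private = &my_state,
   };

Keeping the state behind op->private, rather than in globals used directly by
the callback, would let one callback function serve several ftrace_ops.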

The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
	If the callback requires reading or modifying the pt_regs
	passed to the callback, then it must set this flag. Registering
	an ftrace_ops with this flag set on an architecture that does not
	support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
	Similar to SAVE_REGS, but registering an ftrace_ops on an
	architecture that does not support passing of regs will not fail
	with this flag set. The callback must then check whether regs is
	NULL to determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
	By default, a wrapper is added around the callback to
	make sure that recursion of the function does not occur. That is,
	if a function that is called as a result of the callback's execution
	is also traced, ftrace will prevent the callback from being called
	again. But this wrapper adds some overhead, and if the callback is
	safe from recursion, it can set this flag to disable the ftrace
	protection.

	Note, if this flag is set, and recursion does occur, it could cause
	the system to crash, and possibly reboot via a triple fault.

	It is OK if another callback traces a function that is called by a
	callback that is marked recursion safe. Recursion safe callbacks
	must never trace any functions that are called by the callback
	itself or any nested functions that those functions call.

	If this flag is set, it is possible that the callback will also
	be called with preemption enabled (when CONFIG_PREEMPTION is set),
	but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
	Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
	the traced function (have another function called instead of the
	traced function), it requires setting this flag. This is what live
	kernel patching uses. Without this flag, pt_regs->ip cannot be
	modified.

	Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
	registered to any given function at a time.

FTRACE_OPS_FL_RCU
	If this is set, then the callback will only be called by functions
	where RCU is "watching". This is required if the callback function
	performs any rcu_read_lock() operation.

	RCU stops watching when the system goes idle, while a CPU is taken
	down and brought back online, and when transitioning from kernel
	to user space and back to kernel space. During these transitions,
	a callback may be executed and RCU synchronization will not protect
	it.

FTRACE_OPS_FL_PERMANENT
	If this is set on any ftrace ops, then tracing cannot be disabled by
	writing 0 to the proc sysctl ftrace_enabled. Equally, a callback with
	the flag set cannot be registered if ftrace_enabled is 0.

	Livepatch uses it so that the function redirection is not lost and
	the system stays protected.

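To illustrate how a flag changes what the callback may rely on, here is a
hedged sketch of an ops that asks for registers only where the architecture
can supply them. The counters and the names my_regs_callback and my_regs_ops
are invented for this example; the flag and the NULL check follow the
description of FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED above.

.. code-block:: c

   #include <linux/ftrace.h>
   #include <linux/atomic.h>

   /* Hypothetical counters, only here to make the two paths visible. */
   static atomic_long_t my_hits_with_regs;
   static atomic_long_t my_hits_without_regs;

   static void notrace my_regs_callback(unsigned long ip,
                                        unsigned long parent_ip,
                                        struct ftrace_ops *op,
                                        struct pt_regs *regs)
   {
           if (!regs) {
                   /* Architecture could not pass pt_regs to the callback. */
                   atomic_long_inc(&my_hits_without_regs);
                   return;
           }

           /*
            * regs may be read here. Changing the instruction pointer in
            * regs would additionally need FTRACE_OPS_FL_IPMODIFY, which
            * in turn requires FTRACE_OPS_FL_SAVE_REGS.
            */
           atomic_long_inc(&my_hits_with_regs);
   }

   static struct ftrace_ops my_regs_ops = {
           .func  = my_regs_callback,
           .flags = FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED,
   };

With FTRACE_OPS_FL_SAVE_REGS instead, registration itself would fail on an
architecture without support, so the NULL check would not be the mechanism
that reports the missing feature.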

Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or ip if it is known.

.. code-block:: c

   int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                         int len, int reset);

@ops
	The ops to set the filter with

@buf
	The string that holds the function filter text.

@len
	The length of the string.

@reset
	Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

   ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

   ret = ftrace_set_filter(&ops, NULL, 0, 1);

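The calls above can be chained to build up a filter before the callback is
registered. The helper below is a sketch under two assumptions that this
document does not state: that a non-zero return value from
ftrace_set_filter() indicates failure, and that the caller passes an
ftrace_ops whose .func is already set. The names my_hook_start, my_first and
my_second are hypothetical.

.. code-block:: c

   #include <linux/ftrace.h>

   /* Filter strings kept as unsigned char to match the prototype. */
   static unsigned char my_first[]  = "schedule";
   static unsigned char my_second[] = "try_to_wake_up";

   static int my_hook_start(struct ftrace_ops *ops)
   {
           int ret;

           /* reset = 1: drop any earlier filter, then add "schedule". */
           ret = ftrace_set_filter(ops, my_first, sizeof(my_first) - 1, 1);
           if (ret)
                   return ret;

           /* reset = 0: keep "schedule" and add "try_to_wake_up". */
           ret = ftrace_set_filter(ops, my_second, sizeof(my_second) - 1, 0);
           if (ret)
                   return ret;

           /* Only the two filtered functions will call the callback. */
           return register_ftrace_function(ops);
   }

Setting the filter first means the callback is never attached to every
function, not even briefly.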

Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

   ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

   int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to not be traced. This is a separate list from the
filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is done the same way as clearing the filter list:

.. code-block:: c

  ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, and the @reset is non-zero, and @buf contains a
glob that matches functions, the switch will happen during the time of
the ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, NULL, 0, 1);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window during which all functions call the
callback, between the time of the reset and the time of the
new setting of the filter.