=======================
Kernel Probes (Kprobes)
=======================

:Author: Jim Keniston <jkenisto@us.ibm.com>
:Author: Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com>
:Author: Masami Hiramatsu <mhiramat@redhat.com>

.. CONTENTS

  1. Concepts: Kprobes, and Return Probes
  2. Architectures Supported
  3. Configuring Kprobes
  4. API Reference
  5. Kprobes Features and Limitations
  6. Probe Overhead
  7. TODO
  8. Kprobes Example
  9. Kretprobes Example
  10. Deprecated Features
  Appendix A: The kprobes debugfs interface
  Appendix B: The kprobes sysctl interface
  Appendix C: References

Concepts: Kprobes and Return Probes
===================================

Kprobes enables you to dynamically break into any kernel routine and
collect debugging and performance information non-disruptively. You
can trap at almost any kernel code address [1]_, specifying a handler
routine to be invoked when the breakpoint is hit.

.. [1] some parts of the kernel code can not be trapped, see
       :ref:`kprobes_blacklist`

There are currently two types of probes: kprobes, and kretprobes
(also called return probes).  A kprobe can be inserted on virtually
any instruction in the kernel.  A return probe fires when a specified
function returns.

In the typical case, Kprobes-based instrumentation is packaged as
a kernel module.  The module's init function installs ("registers")
one or more probes, and the exit function unregisters them.  A
registration function such as register_kprobe() specifies where
the probe is to be inserted and what handler is to be called when
the probe is hit.
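
In the spirit of samples/kprobes/kprobe_example.c, a minimal sketch of
such a module might look as follows (the probed symbol "kernel_clone"
and all names here are illustrative only; pick a symbol that exists in
your kernel)::

	#include <linux/kernel.h>
	#include <linux/module.h>
	#include <linux/kprobes.h>

	/* Runs just before the probed instruction is executed. */
	static int handler_pre(struct kprobe *p, struct pt_regs *regs)
	{
		pr_info("kprobe hit at %s\n", p->symbol_name);
		return 0;	/* let Kprobes single-step the original instruction */
	}

	static struct kprobe kp = {
		.symbol_name	= "kernel_clone",	/* example symbol only */
		.pre_handler	= handler_pre,
	};

	static int __init kprobe_init(void)
	{
		int ret = register_kprobe(&kp);

		if (ret < 0)
			pr_err("register_kprobe failed: %d\n", ret);
		return ret;
	}

	static void __exit kprobe_exit(void)
	{
		unregister_kprobe(&kp);
	}

	module_init(kprobe_init);
	module_exit(kprobe_exit);
	MODULE_LICENSE("GPL");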

There are also ``register_/unregister_*probes()`` functions for batch
registration/unregistration of a group of ``*probes``. These functions
can speed up the unregistration process when you have to unregister
a lot of probes at once.

The next four subsections explain how the different types of
probes work and how jump optimization works.  They explain certain
things that you'll need to know in order to make the best use of
Kprobes -- e.g., the difference between a pre_handler and
a post_handler, and how to use the maxactive and nmissed fields of
a kretprobe.  But if you're in a hurry to start using Kprobes, you
can skip ahead to :ref:`kprobes_archs_supported`.

How Does a Kprobe Work?
-----------------------

When a kprobe is registered, Kprobes makes a copy of the probed
instruction and replaces the first byte(s) of the probed instruction
with a breakpoint instruction (e.g., int3 on i386 and x86_64).

When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
registers are saved, and control passes to Kprobes via the
notifier_call_chain mechanism.  Kprobes executes the "pre_handler"
associated with the kprobe, passing the handler the addresses of the
kprobe struct and the saved registers.

Next, Kprobes single-steps its copy of the probed instruction.
(It would be simpler to single-step the actual instruction in place,
but then Kprobes would have to temporarily remove the breakpoint
instruction.  This would open a small time window when another CPU
could sail right past the probepoint.)

After the instruction is single-stepped, Kprobes executes the
"post_handler," if any, that is associated with the kprobe.
Execution then continues with the instruction following the probepoint.

Changing Execution Path
-----------------------

Since kprobes can probe into running kernel code, it can change the
register set, including the instruction pointer. This operation requires
maximum care, such as preserving the stack frame and recovering the
execution path. Since it operates on a running kernel and needs deep
knowledge of computer architecture and concurrent computing, you can
easily shoot yourself in the foot.

If you change the instruction pointer (and set up other related
registers) in a pre_handler, you must return !0 so that kprobes stops
single stepping and just returns to the given address.
This also means the post_handler will not be called.

Note that this operation may be harder on some architectures which use
a TOC (Table of Contents) for function calls, since you have to set up a
new TOC for your function in your module, and recover the old one after
returning from it.
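
As a rough, x86-flavoured sketch (my_fixup() is a hypothetical routine
standing in for wherever you want execution to continue; note that jump
optimization must be suppressed for this to work, as explained in the
NOTE at the end of this chapter)::

	static void my_fixup(void);	/* hypothetical replacement routine */

	static int redirect_pre_handler(struct kprobe *p, struct pt_regs *regs)
	{
		/* Divert execution to my_fixup() instead of the probed code. */
		regs->ip = (unsigned long)my_fixup;

		/*
		 * Returning !0 tells Kprobes to skip the single-step and the
		 * post_handler and to resume at the address set above.
		 */
		return 1;
	}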

Return Probes
-------------

How Does a Return Probe Work?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When you call register_kretprobe(), Kprobes establishes a kprobe at
the entry to the function.  When the probed function is called and this
probe is hit, Kprobes saves a copy of the return address, and replaces
the return address with the address of a "trampoline."  The trampoline
is an arbitrary piece of code -- typically just a nop instruction.
At boot time, Kprobes registers a kprobe at the trampoline.

When the probed function executes its return instruction, control
passes to the trampoline and that probe is hit.  Kprobes' trampoline
handler calls the user-specified return handler associated with the
kretprobe, then sets the saved instruction pointer to the saved return
address, and that's where execution resumes upon return from the trap.

While the probed function is executing, its return address is
stored in an object of type kretprobe_instance.  Before calling
register_kretprobe(), the user sets the maxactive field of the
kretprobe struct to specify how many instances of the specified
function can be probed simultaneously.  register_kretprobe()
pre-allocates the indicated number of kretprobe_instance objects.

For example, if the function is non-recursive and is called with a
spinlock held, maxactive = 1 should be enough.  If the function is
non-recursive and can never relinquish the CPU (e.g., via a semaphore
or preemption), NR_CPUS should be enough.  If maxactive <= 0, it is
set to a default value.  If CONFIG_PREEMPT is enabled, the default
is max(10, 2*NR_CPUS).  Otherwise, the default is NR_CPUS.

It's not a disaster if you set maxactive too low; you'll just miss
some probes.  In the kretprobe struct, the nmissed field is set to
zero when the return probe is registered, and is incremented every
time the probed function is entered but there is no kretprobe_instance
object available for establishing the return probe.
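
For instance, a sketch of a module exit path that reports missed
instances (the probed symbol, the handler and the value 20 are all
illustrative)::

	static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
	{
		return 0;
	}

	static struct kretprobe my_kretprobe = {
		.kp.symbol_name	= "kernel_clone",	/* example symbol only */
		.handler	= ret_handler,
		.maxactive	= 20,	/* up to 20 concurrent activations */
	};

	static void __exit my_exit(void)
	{
		unregister_kretprobe(&my_kretprobe);
		/* nmissed counts entries for which no free instance was left */
		pr_info("missed %d instances of %s\n",
			my_kretprobe.nmissed, my_kretprobe.kp.symbol_name);
	}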

Kretprobe entry-handler
^^^^^^^^^^^^^^^^^^^^^^^

Kretprobes also provides an optional user-specified handler which runs
on function entry. This handler is specified by setting the entry_handler
field of the kretprobe struct. Whenever the kprobe placed by the kretprobe
at the function entry is hit, the user-defined entry_handler, if any, is
invoked. If the entry_handler returns 0 (success) then a corresponding
return handler is guaranteed to be called upon function return. If the
entry_handler returns a non-zero error then Kprobes leaves the return
address as is, and the kretprobe has no further effect for that particular
function instance.

Multiple entry and return handler invocations are matched using the unique
kretprobe_instance object associated with them. Additionally, a user
may also specify per return-instance private data to be part of each
kretprobe_instance object. This is especially useful when sharing private
data between corresponding user entry and return handlers. The size of each
private data object can be specified at kretprobe registration time by
setting the data_size field of the kretprobe struct. This data can be
accessed through the data field of each kretprobe_instance object.

If the probed function is entered but no kretprobe_instance object is
available, then in addition to incrementing the nmissed count, the user
entry_handler invocation is also skipped.
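
A sketch of an entry_handler and a return handler cooperating through
per-instance data, in the spirit of samples/kprobes/kretprobe_example.c
(the probed symbol is again only an example)::

	#include <linux/kernel.h>
	#include <linux/kprobes.h>
	#include <linux/ktime.h>

	/* per-instance private data, sized via the data_size field */
	struct my_data {
		ktime_t entry_stamp;
	};

	static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
	{
		struct my_data *data = (struct my_data *)ri->data;

		data->entry_stamp = ktime_get();
		return 0;	/* 0 means "do attach a return handler to this instance" */
	}

	static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
	{
		struct my_data *data = (struct my_data *)ri->data;
		s64 delta = ktime_to_ns(ktime_sub(ktime_get(), data->entry_stamp));

		pr_info("probed function took %lld ns to return\n", delta);
		return 0;
	}

	static struct kretprobe my_kretprobe = {
		.kp.symbol_name	= "kernel_clone",	/* example symbol only */
		.entry_handler	= entry_handler,
		.handler	= ret_handler,
		.data_size	= sizeof(struct my_data),
	};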

.. _kprobes_jump_optimization:

How Does Jump Optimization Work?
--------------------------------

If your kernel is built with CONFIG_OPTPROBES=y (currently this flag
is automatically set to 'y' on x86/x86-64 for non-preemptive kernels) and
the "debug.kprobes_optimization" kernel parameter is set to 1 (see
sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
instruction instead of a breakpoint instruction at each probepoint.

Init a Kprobe
^^^^^^^^^^^^^

When a probe is registered, before attempting this optimization,
Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
address. So, even if it's not possible to optimize this particular
probepoint, there'll be a probe there.

Safety Check
^^^^^^^^^^^^

Before optimizing a probe, Kprobes performs the following safety checks:

- Kprobes verifies that the region that will be replaced by the jump
  instruction (the "optimized region") lies entirely within one function.
  (A jump instruction is multiple bytes, and so may overlay multiple
  instructions.)

- Kprobes analyzes the entire function and verifies that there is no
  jump into the optimized region.  Specifically:

  - the function contains no indirect jump;
  - the function contains no instruction that causes an exception (since
    the fixup code triggered by the exception could jump back into the
    optimized region -- Kprobes checks the exception tables to verify this);
  - there is no near jump to the optimized region (other than to the first
    byte).

- For each instruction in the optimized region, Kprobes verifies that
  the instruction can be executed out of line.

Preparing Detour Buffer
^^^^^^^^^^^^^^^^^^^^^^^

Next, Kprobes prepares a "detour" buffer, which contains the following
instruction sequence:

- code to push the CPU's registers (emulating a breakpoint trap)
- a call to the trampoline code which calls the user's probe handlers.
- code to restore registers
- the instructions from the optimized region
- a jump back to the original execution path.

Pre-optimization
^^^^^^^^^^^^^^^^

After preparing the detour buffer, Kprobes verifies that none of the
following situations exist:

- The probe has a post_handler.
- Other instructions in the optimized region are probed.
- The probe is disabled.

In any of the above cases, Kprobes won't start optimizing the probe.
Since these are temporary situations, Kprobes tries to start
optimizing it again once the situation changes.

If the kprobe can be optimized, Kprobes enqueues the kprobe on an
optimizing list and kicks the kprobe-optimizer workqueue to optimize
it.  If the to-be-optimized probepoint is hit before being optimized,
Kprobes returns control to the original instruction path by setting
the CPU's instruction pointer to the copied code in the detour buffer
-- thus at least avoiding the single-step.

Optimization
^^^^^^^^^^^^

The Kprobe-optimizer doesn't insert the jump instruction immediately;
rather, it calls synchronize_rcu() for safety first, because it's
possible for a CPU to be interrupted in the middle of executing the
optimized region [3]_.  As you know, synchronize_rcu() can ensure
that all interruptions that were active when synchronize_rcu()
was called are done, but only if CONFIG_PREEMPT=n.  So, this version
of kprobe optimization supports only kernels with CONFIG_PREEMPT=n [4]_.

After that, the Kprobe-optimizer calls stop_machine() to replace
the optimized region with a jump instruction to the detour buffer,
using text_poke_smp().

Unoptimization
^^^^^^^^^^^^^^

When an optimized kprobe is unregistered, disabled, or blocked by
another kprobe, it will be unoptimized.  If this happens before
the optimization is complete, the kprobe is just dequeued from the
optimized list.  If the optimization has been done, the jump is
replaced with the original code (except for an int3 breakpoint in
the first byte) by using text_poke_smp().

.. [3] Please imagine that the 2nd instruction is interrupted and then
   the optimizer replaces the 2nd instruction with the jump *address*
   while the interrupt handler is running. When the interrupt
   returns to the original address, there is no valid instruction,
   and it causes an unexpected result.

.. [4] This optimization-safety checking may be replaced with the
   stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
   kernel.

NOTE for geeks:
The jump optimization changes the kprobe's pre_handler behavior.
Without optimization, the pre_handler can change the kernel's execution
path by changing regs->ip and returning 1.  However, when the probe
is optimized, that modification is ignored.  Thus, if you want to
tweak the kernel's execution path, you need to suppress optimization,
using one of the following techniques:

- Specify an empty function for the kprobe's post_handler (see the
  sketch after this list).

or

- Execute 'sysctl -w debug.kprobes_optimization=n'
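
A minimal sketch of the first option, reusing the hypothetical
redirect_pre_handler shown earlier (all names are illustrative)::

	/* An empty post_handler is enough to keep the probe unoptimized. */
	static void dummy_post_handler(struct kprobe *p, struct pt_regs *regs,
				       unsigned long flags)
	{
	}

	static struct kprobe kp = {
		.symbol_name	= "kernel_clone",	/* example symbol only */
		.pre_handler	= redirect_pre_handler,	/* changes regs->ip */
		.post_handler	= dummy_post_handler,	/* suppresses optimization */
	};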

.. _kprobes_blacklist:

Blacklist
---------

Kprobes can probe most of the kernel except itself. This means
that there are some functions which kprobes cannot probe. Probing
(trapping) such functions can cause a recursive trap (e.g. a double
fault) or the nested probe handler may never be called.
Kprobes manages such functions as a blacklist.
If you want to add a function to the blacklist, you just need
to (1) include linux/kprobes.h and (2) use the NOKPROBE_SYMBOL() macro
to specify the blacklisted function.
Kprobes checks the given probe address against the blacklist and
rejects the registration if the given address is in the blacklist.
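
For example, to blacklist a function of your own (the hypothetical
my_nmi_helper() below), you would write something like::

	#include <linux/kprobes.h>

	static void my_nmi_helper(void)
	{
		/* code that must never be probed */
	}
	NOKPROBE_SYMBOL(my_nmi_helper);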

.. _kprobes_archs_supported:

Architectures Supported
=======================

Kprobes and return probes are implemented on the following
architectures:

- i386 (Supports jump optimization)
- x86_64 (AMD-64, EM64T) (Supports jump optimization)
- ppc64
- ia64 (Does not support probes on instruction slot1.)
- sparc64 (Return probes not yet implemented.)
- arm
- ppc
- mips
- s390
- parisc

Configuring Kprobes
===================

When configuring the kernel using make menuconfig/xconfig/oldconfig,
ensure that CONFIG_KPROBES is set to "y". Under "General setup", look
for "Kprobes".

So that you can load and unload Kprobes-based instrumentation modules,
make sure "Loadable module support" (CONFIG_MODULES) and "Module
unloading" (CONFIG_MODULE_UNLOAD) are set to "y".

Also make sure that CONFIG_KALLSYMS and perhaps even CONFIG_KALLSYMS_ALL
are set to "y", since kallsyms_lookup_name() is used by the in-kernel
kprobe address resolution code.

If you need to insert a probe in the middle of a function, you may find
it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
so you can use "objdump -d -l vmlinux" to see the source-to-object
code mapping.

API Reference
=============

The Kprobes API includes a "register" function and an "unregister"
function for each type of probe. The API also includes "register_*probes"
and "unregister_*probes" functions for (un)registering arrays of probes.
Here are terse, mini-man-page specifications for these functions and
the associated probe handlers that you'll write. See the files in the
samples/kprobes/ sub-directory for examples.

register_kprobe
---------------

::

	#include <linux/kprobes.h>
	int register_kprobe(struct kprobe *kp);

Sets a breakpoint at the address kp->addr.  When the breakpoint is
hit, Kprobes calls kp->pre_handler.  After the probed instruction
is single-stepped, Kprobes calls kp->post_handler.  If a fault
occurs during execution of kp->pre_handler or kp->post_handler,
or during single-stepping of the probed instruction, Kprobes calls
kp->fault_handler.  Any or all handlers can be NULL.  If
KPROBE_FLAG_DISABLED is set in kp->flags, the kprobe is registered
but disabled, so its handlers aren't called until enable_kprobe(kp)
is called.

.. note::

   1. With the introduction of the "symbol_name" field to struct kprobe,
      the probepoint address resolution will now be taken care of by the kernel.
      The following will now work::

	kp.symbol_name = "symbol_name";

      (64-bit powerpc intricacies such as function descriptors are handled
      transparently)

   2. Use the "offset" field of struct kprobe if the offset into the symbol
      to install a probepoint is known. This field is used to calculate the
      probepoint.

   3. Specify either the kprobe "symbol_name" OR the "addr". If both are
      specified, kprobe registration will fail with -EINVAL.

   4. With CISC architectures (such as i386 and x86_64), the kprobes code
      does not validate if the kprobe.addr is at an instruction boundary.
      Use "offset" with caution.

register_kprobe() returns 0 on success, or a negative errno otherwise.
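
For instance, registration by symbol plus offset might look like this
(reusing the handler_pre sketched earlier; the symbol and the 0x10
offset are purely illustrative, and the offset must land on an
instruction boundary). Registration itself then proceeds exactly as in
the module sketch shown earlier::

	static struct kprobe kp = {
		.symbol_name	= "kernel_clone",	/* example symbol only */
		.offset		= 0x10,			/* illustrative offset */
		.pre_handler	= handler_pre,
	};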

User's pre-handler (kp->pre_handler)::

	#include <linux/kprobes.h>
	#include <linux/ptrace.h>
	int pre_handler(struct kprobe *p, struct pt_regs *regs);

Called with p pointing to the kprobe associated with the breakpoint,
and regs pointing to the struct containing the registers saved when
the breakpoint was hit.  Return 0 here unless you're a Kprobes geek.

User's post-handler (kp->post_handler)::

	#include <linux/kprobes.h>
	#include <linux/ptrace.h>
	void post_handler(struct kprobe *p, struct pt_regs *regs,
			  unsigned long flags);

p and regs are as described for the pre_handler.  flags always seems
to be zero.

User's fault-handler (kp->fault_handler)::

	#include <linux/kprobes.h>
	#include <linux/ptrace.h>
	int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr);

p and regs are as described for the pre_handler.  trapnr is the
architecture-specific trap number associated with the fault (e.g.,
on i386, 13 for a general protection fault or 14 for a page fault).
Returns 1 if it successfully handled the exception.

register_kretprobe
------------------

::

	#include <linux/kprobes.h>
	int register_kretprobe(struct kretprobe *rp);

Establishes a return probe for the function whose address is
rp->kp.addr.  When that function returns, Kprobes calls rp->handler.
You must set rp->maxactive appropriately before you call
register_kretprobe(); see "How Does a Return Probe Work?" for details.

register_kretprobe() returns 0 on success, or a negative errno
otherwise.

User's return-probe handler (rp->handler)::

	#include <linux/kprobes.h>
	#include <linux/ptrace.h>
	int kretprobe_handler(struct kretprobe_instance *ri,
			      struct pt_regs *regs);

regs is as described for kprobe.pre_handler.  ri points to the
kretprobe_instance object, of which the following fields may be
of interest:

- ret_addr: the return address
- rp: points to the corresponding kretprobe object
- task: points to the corresponding task struct
- data: points to per return-instance private data; see "Kretprobe
	entry-handler" for details.

The regs_return_value(regs) macro provides a simple abstraction to
extract the return value from the appropriate register as defined by
the architecture's ABI.

The handler's return value is currently ignored.
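
For example, a return handler that logs the probed function's return
value via regs_return_value() might look like this sketch (the message
text is arbitrary)::

	static int report_retval(struct kretprobe_instance *ri, struct pt_regs *regs)
	{
		unsigned long retval = regs_return_value(regs);

		pr_info("%s returned %lu\n", ri->rp->kp.symbol_name, retval);
		return 0;	/* the return value is currently ignored */
	}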

unregister_*probe
-----------------

::

	#include <linux/kprobes.h>
	void unregister_kprobe(struct kprobe *kp);
	void unregister_kretprobe(struct kretprobe *rp);

Removes the specified probe.  The unregister function can be called
at any time after the probe has been registered.

.. note::

   If the functions find an incorrect probe (e.g., an unregistered probe),
   they clear the addr field of the probe.

register_*probes
----------------

::

	#include <linux/kprobes.h>
	int register_kprobes(struct kprobe **kps, int num);
	int register_kretprobes(struct kretprobe **rps, int num);

Registers each of the num probes in the specified array.  If any
error occurs during registration, all probes in the array, up to
the bad probe, are safely unregistered before the register_*probes
function returns.

- kps/rps: an array of pointers to ``*probe`` data structures
- num: the number of array entries.

.. note::

   You have to allocate (or define) an array of pointers and set all
   of the array entries before using these functions.
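
A sketch of batch registration (kp1 and kp2 stand for struct kprobe
objects initialized elsewhere; ARRAY_SIZE comes from linux/kernel.h)::

	static struct kprobe kp1, kp2;	/* initialized elsewhere with handlers */
	static struct kprobe *my_kprobes[] = { &kp1, &kp2 };

	static int __init my_init(void)
	{
		/* Registers both probes; on failure, none stay registered. */
		return register_kprobes(my_kprobes, ARRAY_SIZE(my_kprobes));
	}

	static void __exit my_exit(void)
	{
		unregister_kprobes(my_kprobes, ARRAY_SIZE(my_kprobes));
	}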

unregister_*probes
------------------

::

	#include <linux/kprobes.h>
	void unregister_kprobes(struct kprobe **kps, int num);
	void unregister_kretprobes(struct kretprobe **rps, int num);

Removes each of the num probes in the specified array at once.

.. note::

   If the functions find some incorrect probes (e.g., unregistered
   probes) in the specified array, they clear the addr field of those
   incorrect probes. However, the other probes in the array are
   unregistered correctly.

disable_*probe
--------------

::

	#include <linux/kprobes.h>
	int disable_kprobe(struct kprobe *kp);
	int disable_kretprobe(struct kretprobe *rp);

Temporarily disables the specified ``*probe``. You can enable it again by
using enable_*probe(). You must specify a probe that has already been
registered.

enable_*probe
-------------

::

	#include <linux/kprobes.h>
	int enable_kprobe(struct kprobe *kp);
	int enable_kretprobe(struct kretprobe *rp);

Enables a ``*probe`` which has been disabled by disable_*probe(). You must
specify a probe that has already been registered.
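
For example, a registered kprobe can be parked and resumed like this
(error handling elided; both calls return 0 on success)::

	static void pause_probe(struct kprobe *kp, bool pause)
	{
		/* kp must already have been registered with register_kprobe() */
		if (pause)
			disable_kprobe(kp);	/* handlers stop running */
		else
			enable_kprobe(kp);	/* handlers run again */
	}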

Kprobes Features and Limitations
================================

Kprobes allows multiple probes at the same address. Also,
a probepoint for which there is a post_handler cannot be optimized.
So if you install a kprobe with a post_handler at an optimized
probepoint, the probepoint will be unoptimized automatically.

In general, you can install a probe anywhere in the kernel.
In particular, you can probe interrupt handlers.  Known exceptions
are discussed in this section.

The register_*probe functions will return -EINVAL if you attempt
to install a probe in the code that implements Kprobes (mostly
kernel/kprobes.c and ``arch/*/kernel/kprobes.c``, but also functions such
as do_page_fault and notifier_call_chain).

If you install a probe in an inline-able function, Kprobes makes
no attempt to chase down all inline instances of the function and
install probes there.  gcc may inline a function without being asked,
so keep this in mind if you're not seeing the probe hits you expect.

A probe handler can modify the environment of the probed function
-- e.g., by modifying kernel data structures, or by modifying the
contents of the pt_regs struct (which are restored to the registers
upon return from the breakpoint).  So Kprobes can be used, for example,
to install a bug fix or to inject faults for testing.  Kprobes, of
course, has no way to distinguish the deliberately injected faults
from the accidental ones.  Don't drink and probe.

Kprobes makes no attempt to prevent probe handlers from stepping on
each other -- e.g., probing printk() and then calling printk() from a
probe handler.  If a probe handler hits a probe, that second probe's
handlers won't be run in that instance, and the kprobe.nmissed member
of the second probe will be incremented.

As of Linux v2.6.15-rc1, multiple handlers (or multiple instances of
the same handler) may run concurrently on different CPUs.

Kprobes does not use mutexes or allocate memory except during
registration and unregistration.

Probe handlers are run with preemption disabled or interrupts disabled,
depending on the architecture and optimization state (e.g., kretprobe
handlers and optimized kprobe handlers run with interrupts enabled on
x86/x86-64).  In any case, your handler should not yield the CPU (e.g.,
by attempting to acquire a semaphore, or waiting for I/O).

Since a return probe is implemented by replacing the return
address with the trampoline's address, stack backtraces and calls
to __builtin_return_address() will typically yield the trampoline's
address instead of the real return address for kretprobed functions.
(As far as we can tell, __builtin_return_address() is used only
for instrumentation and error reporting.)

If the number of times a function is called does not match the number
of times it returns, registering a return probe on that function may
produce undesirable results. In such a case, a line such as::

	kretprobe BUG!: Processing kretprobe d000000000041aa8 @ c00000000004f48c

gets printed. With this information, you will be able to correlate the
exact instance of the kretprobe that caused the problem. We have the
do_exit() case covered. do_execve() and do_fork() are not an issue.
We're unaware of other specific cases where this could be a problem.

If, upon entry to or exit from a function, the CPU is running on
a stack other than that of the current task, registering a return
probe on that function may produce undesirable results.  For this
reason, Kprobes doesn't support return probes (or kprobes)
on the x86_64 version of __switch_to(); the registration functions
return -EINVAL.

On x86/x86-64, since the Jump Optimization of Kprobes modifies
instructions widely, there are some limitations to optimization. To
explain it, we introduce some terminology. Imagine a 3-instruction
sequence consisting of two 2-byte instructions and one 3-byte
instruction.

::

		IA
		|
	[-2][-1][0][1][2][3][4][5][6][7]
		[ins1][ins2][  ins3 ]
		[<-     DCR       ->]
		[<- JTPR ->]

	ins1: 1st Instruction
	ins2: 2nd Instruction
	ins3: 3rd Instruction
	IA:  Insertion Address
	JTPR: Jump Target Prohibition Region
	DCR: Detoured Code Region

The instructions in DCR are copied to the out-of-line buffer
of the kprobe, because the bytes in DCR are replaced by
a 5-byte jump instruction. So there are several limitations.

a) The instructions in DCR must be relocatable.
b) The instructions in DCR must not include a call instruction.
c) JTPR must not be targeted by any jump or call instruction.
d) DCR must not straddle the border between functions.

Anyway, these limitations are checked by the in-kernel instruction
decoder, so you don't need to worry about them.

Probe Overhead
==============

On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
microseconds to process.  Specifically, a benchmark that hits the same
probepoint repeatedly, firing a simple handler each time, reports 1-2
million hits per second, depending on the architecture.  A return-probe
hit typically takes 50-75% longer than a kprobe hit.
When you have a return probe set on a function, adding a kprobe at
the entry to that function adds essentially no overhead.

Here are sample overhead figures (in usec) for different architectures::

  k = kprobe; r = return probe; kr = kprobe + return probe
  on same function

  i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
  k = 0.57 usec; r = 0.92; kr = 0.99

  x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
  k = 0.49 usec; r = 0.80; kr = 0.82

  ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
  k = 0.77 usec; r = 1.26; kr = 1.45

Optimized Probe Overhead
------------------------

Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
process. Here are sample overhead figures (in usec) for x86 architectures::

  k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
  r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.

  i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
  k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33

  x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
  k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30

TODO
====

a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
   programming interface for probe-based instrumentation.  Try it out.
b. Kernel return probes for sparc64.
c. Support for other architectures.
d. User-space probes.
e. Watchpoint probes (which fire on data references).

Kprobes Example
===============

See samples/kprobes/kprobe_example.c

Kretprobes Example
==================

See samples/kprobes/kretprobe_example.c

Deprecated Features
===================

Jprobes is now a deprecated feature. People who depend on it should
migrate to other tracing features or use older kernels. Please consider
migrating your tool to one of the following options:

- Use trace-event to trace the target function with arguments.

  trace-event is a low-overhead (and almost no visible overhead if it
  is off) statically defined event interface. You can define new events
  and trace them via ftrace or any other tracing tools.

  See the following URLs:

    - https://lwn.net/Articles/379903/
    - https://lwn.net/Articles/381064/
    - https://lwn.net/Articles/383362/

- Use ftrace dynamic events (kprobe event) with perf-probe.

  If you build your kernel with debug info (CONFIG_DEBUG_INFO=y), you can
  find which register or stack slot is assigned to which local variable or
  argument by using perf-probe and set up a new event to trace it.

  See the following documents:

  - Documentation/trace/kprobetrace.rst
  - Documentation/trace/events.rst
  - tools/perf/Documentation/perf-probe.txt


The kprobes debugfs interface
=============================

With recent kernels (> 2.6.20) the list of registered kprobes is visible
under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is
mounted at /sys/kernel/debug).

/sys/kernel/debug/kprobes/list: Lists all registered probes on the system::

	c015d71a  k  vfs_read+0x0
	c03dedc5  r  tcp_v4_rcv+0x0

The first column provides the kernel address where the probe is inserted.
The second column identifies the type of probe (k - kprobe and r - kretprobe)
while the third column specifies the symbol+offset of the probe.
If the probed function belongs to a module, the module name is also
specified. The following columns show the probe status. If the probe is on
a virtual address that is no longer valid (module init sections, module
virtual addresses that correspond to modules that've been unloaded),
such probes are marked with [GONE]. If the probe is temporarily disabled,
such probes are marked with [DISABLED]. If the probe is optimized, it is
marked with [OPTIMIZED]. If the probe is ftrace-based, it is marked with
[FTRACE].

/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.

Provides a knob to globally and forcibly turn registered kprobes ON or OFF.
By default, all kprobes are enabled. By echoing "0" to this file, all
registered probes will be disarmed, until a "1" is echoed to this
file. Note that this knob just disarms and arms all kprobes and doesn't
change each probe's disabling state. This means that disabled kprobes (marked
[DISABLED]) will not be enabled even if you turn ON all kprobes with this knob.


The kprobes sysctl interface
============================

/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.

When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
a knob to globally and forcibly turn jump optimization (see section
:ref:`kprobes_jump_optimization`) ON or OFF. By default, jump optimization
is allowed (ON). If you echo "0" to this file or set
"debug.kprobes_optimization" to 0 via sysctl, all optimized probes will be
unoptimized, and any new probes registered after that will not be optimized.

Note that this knob *changes* the optimized state. This means that optimized
probes (marked [OPTIMIZED]) will be unoptimized (the [OPTIMIZED] tag will be
removed). If the knob is turned on again, they will be optimized again.

References
==========

For additional information on Kprobes, refer to the following URLs:

- https://www.ibm.com/developerworks/library/l-kprobes/index.html
- https://www.kernel.org/doc/ols/2006/ols2006v2-pages-109-124.pdf
