=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
.. Author: Steven Rostedt <srostedt@goodmis.org>
.. License: The GNU Free Documentation License, Version 1.2
..   (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes how to
use ftrace to implement your own function callbacks.


The ftrace context
==================

.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or going to user space.
  This requires extra care in what can be done inside a callback. A callback
  can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursion and RCU,
but one must still be very careful how the callbacks are used.



The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback
as well as what protections the callback will perform and not require
ftrace to handle.

Only one field needs to be set when registering an ftrace_ops with ftrace:

.. code-block:: c

  struct ftrace_ops ops = {
        .func    = my_callback_func,
        .flags   = MY_FTRACE_FLAGS,
        .private = any_private_data_structure,
  };

Both .flags and .private are optional. Only .func is required.

To enable tracing, call::

  register_ftrace_function(&ops);

To disable tracing, call::

  unregister_ftrace_function(&ops);

These functions are declared in the header::

  #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called is dependent upon architecture and scheduling
of services. The callback itself will have to handle any synchronization if it
must begin at an exact moment.

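
Because the exact moment the callback starts firing is not fixed, a callback
that must only count events after a known point can gate itself on a flag that
the caller sets once register_ftrace_function() has returned. The following is
a minimal userspace C model of that idea, not kernel code: ``tracing_armed``,
``demo_callback()``, ``arm_tracing()`` and ``get_hits()`` are hypothetical
names, and C11 atomics stand in for what kernel code would typically express
with smp_store_release() and smp_load_acquire().

.. code-block:: c

  #include <assert.h>
  #include <stdatomic.h>
  #include <stdbool.h>

  /* Hypothetical userspace model: hits that arrive before the caller
   * "arms" the callback are ignored, so counting starts at a known point. */
  static atomic_bool tracing_armed = false;
  static atomic_long hits = 0;

  /* Plain C stand-in for an ftrace callback (not a real ftrace_ops func). */
  void demo_callback(unsigned long ip, unsigned long parent_ip)
  {
      (void)ip;
      (void)parent_ip;

      /* Acquire pairs with the release exchange in arm_tracing(). */
      if (!atomic_load_explicit(&tracing_armed, memory_order_acquire))
          return; /* still "registering": drop this hit */

      atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
  }

  /* Called once registration has returned and counting should begin. */
  void arm_tracing(void)
  {
      bool was_armed = atomic_exchange_explicit(&tracing_armed, true,
                                                memory_order_release);

      assert(!was_armed); /* arming twice would be a caller bug */
      (void)was_armed;    /* keeps NDEBUG builds warning-free */
  }

  long get_hits(void)
  {
      return atomic_load_explicit(&hits, memory_order_relaxed);
  }

In a module using this pattern, arm_tracing() would be called right after
register_ftrace_function() returns; any hits that arrive earlier are simply
dropped instead of being counted at an unpredictable starting point.
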

unregister_ftrace_function() guarantees that the callback is no longer
being called by any function once unregister_ftrace_function() returns.
To provide this guarantee, unregister_ftrace_function() may take some
time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

  void callback_func(unsigned long ip, unsigned long parent_ip,
                     struct ftrace_ops *op, struct pt_regs *regs);

@ip
  This is the instruction pointer of the function that is being traced
  (where the fentry or mcount call is within the function).

@parent_ip
  This is the instruction pointer of the function that called
  the function being traced (where the call of the function occurred).

@op
  This is a pointer to the ftrace_ops that was used to register the callback.
  This can be used to pass data to the callback via the private pointer.

@regs
  If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
  flags are set in the ftrace_ops structure, then this will be pointing
  to the pt_regs structure like it would be if a breakpoint was placed
  at the start of the function where ftrace was tracing. Otherwise it
  either contains garbage, or NULL.

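
The @op and @regs arguments are easiest to see in a small model. The sketch
below is plain userspace C, not kernel code: ``struct mock_ftrace_ops``,
``struct mock_pt_regs``, ``struct hit_counter``, ``count_hits()`` and
``fire()`` are hypothetical stand-ins that only mirror the shape of the real
prototype. It shows a callback reaching its own state through op->private, and
treating regs as usable only when it is non-NULL, which is the check needed
with FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED. Without either SAVE_REGS flag, regs
may contain garbage and should not be dereferenced at all.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  /* Hypothetical userspace stand-ins; not the kernel's pt_regs/ftrace_ops. */
  struct mock_pt_regs {
      unsigned long ip;
  };

  struct mock_ftrace_ops;

  typedef void (*mock_func_t)(unsigned long ip, unsigned long parent_ip,
                              struct mock_ftrace_ops *op,
                              struct mock_pt_regs *regs);

  struct mock_ftrace_ops {
      mock_func_t func;
      void *private;   /* handed back to the callback through @op */
  };

  /* Callback-specific state, reached via op->private. */
  struct hit_counter {
      unsigned long last_ip;
      unsigned long last_parent_ip;
      int hits;
      int hits_with_regs;
  };

  void count_hits(unsigned long ip, unsigned long parent_ip,
                  struct mock_ftrace_ops *op, struct mock_pt_regs *regs)
  {
      struct hit_counter *hc = op->private;

      hc->last_ip = ip;
      hc->last_parent_ip = parent_ip;
      hc->hits++;

      /* With SAVE_REGS_IF_SUPPORTED, NULL means "regs not available". */
      if (regs)
          hc->hits_with_regs++;
  }

  /* Simulates ftrace calling the registered callback for one traced call. */
  void fire(struct mock_ftrace_ops *op, unsigned long ip,
            unsigned long parent_ip, struct mock_pt_regs *regs)
  {
      assert(op != NULL && op->func != NULL);
      op->func(ip, parent_ip, op, regs);
  }

Keeping the per-callback state behind .private means the same callback
function can be shared by several ftrace_ops, each carrying its own data.
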


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
  If the callback requires reading or modifying the pt_regs
  passed to the callback, then it must set this flag. Registering
  an ftrace_ops with this flag set on an architecture that does not
  support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
  Similar to SAVE_REGS, but registering an ftrace_ops with this flag
  set on an architecture that does not support passing of regs will
  not fail. Instead, the callback must check whether regs is NULL to
  determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
  By default, a wrapper is added around the callback to
  make sure that recursion of the function does not occur. That is,
  if a function that is called as a result of the callback's execution
  is also traced, ftrace will prevent the callback from being called
  again. But this wrapper adds some overhead, and if the callback is
  safe from recursion, it can set this flag to disable the ftrace
  protection.

  Note, if this flag is set, and recursion does occur, it could cause
  the system to crash, and possibly reboot via a triple fault.

  It is OK if another callback traces a function that is called by a
  callback that is marked recursion safe. Recursion safe callbacks
  must never trace any functions that are called by the callback
  itself or any nested functions that those functions call.

  If this flag is set, it is possible that the callback will also
  be called with preemption enabled (when CONFIG_PREEMPTION is set),
  but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
  Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
  the traced function (have another function called instead of the
  traced function), it requires setting this flag. This is what live
  kernel patches use. Without this flag, pt_regs->ip cannot be
  modified.

  Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
  registered to any given function at a time.

FTRACE_OPS_FL_RCU
  If this is set, then the callback will only be called by functions
  where RCU is "watching". This is required if the callback function
  performs any rcu_read_lock() operation.

  RCU stops watching when the system goes idle, while a CPU is being
  taken down or brought back online, and during the transitions from
  kernel to user space and back to kernel space. During these
  transitions, a callback may be executed and RCU synchronization will
  not protect it.

FTRACE_OPS_FL_PERMANENT
  If this is set on any ftrace_ops, then tracing cannot be disabled by
  writing 0 to the proc sysctl ftrace_enabled. Equally, a callback with
  this flag set cannot be registered if ftrace_enabled is 0.

  Livepatch uses it so that the function redirection is not lost and
  the system stays protected.


Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. Filters are added by name, or by ip if it is known.

.. code-block:: c

  int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                        int len, int reset);

@ops
  The ops to set the filter with

@buf
  The string that holds the function filter text.

@len
  The length of the string.

@reset
  Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

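
As a rough illustration of which names a glob selects, the following userspace
C sketch marks every entry in a list of function names that matches a pattern.
It uses POSIX fnmatch(3) as an approximation; ftrace has its own glob matcher,
so corner cases may differ. ``glob_filter()`` is a hypothetical helper, and the
function names are plain strings rather than real kernel symbols.

.. code-block:: c

  #include <assert.h>
  #include <fnmatch.h>
  #include <stdbool.h>
  #include <stddef.h>

  /*
   * Mark each name in @funcs that matches @glob and return how many matched.
   * fnmatch(3) approximates ftrace's own glob matching here.
   */
  size_t glob_filter(const char *glob, const char *const *funcs, size_t n,
                     bool *enabled)
  {
      size_t count = 0;

      assert(glob != NULL && funcs != NULL && enabled != NULL);

      for (size_t i = 0; i < n; i++) {
          enabled[i] = fnmatch(glob, funcs[i], 0) == 0;
          if (enabled[i])
              count++;
      }
      return count;
  }

With the names schedule, schedule_timeout, try_to_wake_up and do_sched_yield,
the pattern "schedule*" selects the first two, while "\*sched\*" also picks up
do_sched_yield.
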

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

  ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

  ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

  ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list.
If the two lists are non-empty and contain the same functions, the callback
will not be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

  int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                         int len, int reset);

This takes the same parameters as ftrace_set_filter() but adds the functions
it finds to the list of functions not to be traced. This is a separate list
from the filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

The "notrace" list is cleared the same way as the filter list:

.. code-block:: c

  ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch to the new filter happens within the
ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

  ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

  register_ftrace_function(&ops);

  msleep(10);

  ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

  ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

  register_ftrace_function(&ops);

  msleep(10);

  ftrace_set_filter(&ops, NULL, 0, 1);

  ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window, between the reset and the setting of the new
filter, during which all functions will call the callback.
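
The rules above can be checked with a small userspace model of the two lists
of one ftrace_ops. This is a sketch under stated assumptions, not the kernel
implementation: ``struct filter_model``, ``model_set_filter()``,
``model_set_notrace()`` and ``model_should_call()`` are hypothetical,
fnmatch(3) approximates ftrace's glob matching, and the model is single
threaded, so the "short window" between two calls shows up as a state that can
be inspected. It encodes that "notrace" takes precedence, that an empty filter
list enables every function, and that a reset followed by a separate add
briefly leaves the filter list empty.

.. code-block:: c

  #include <assert.h>
  #include <fnmatch.h>
  #include <stdbool.h>
  #include <stddef.h>

  #define MAX_PATTERNS 8

  /* Hypothetical userspace model of the two lists of one ftrace_ops. */
  struct filter_model {
      const char *filter[MAX_PATTERNS];
      size_t nr_filter;
      const char *notrace[MAX_PATTERNS];
      size_t nr_notrace;
  };

  static bool list_matches(const char *const *list, size_t n, const char *func)
  {
      for (size_t i = 0; i < n; i++)
          if (fnmatch(list[i], func, 0) == 0)
              return true;
      return false;
  }

  /* Shared reset/append logic: a non-zero @reset empties the list first,
   * and a NULL @glob adds nothing (so NULL plus reset just clears it). */
  static void set_list(const char **list, size_t *n, const char *glob, int reset)
  {
      if (reset)
          *n = 0;
      if (glob) {
          assert(*n < MAX_PATTERNS);
          list[(*n)++] = glob;
      }
  }

  void model_set_filter(struct filter_model *m, const char *glob, int reset)
  {
      set_list(m->filter, &m->nr_filter, glob, reset);
  }

  void model_set_notrace(struct filter_model *m, const char *glob, int reset)
  {
      set_list(m->notrace, &m->nr_notrace, glob, reset);
  }

  /* "notrace" wins; an empty filter list means every function is enabled. */
  bool model_should_call(const struct filter_model *m, const char *func)
  {
      if (list_matches(m->notrace, m->nr_notrace, func))
          return false;
      if (m->nr_filter == 0)
          return true;
      return list_matches(m->filter, m->nr_filter, func);
  }

In this model, checking any name right after model_set_filter(&m, NULL, 1)
returns true, which is exactly the window the reset-then-add sequence opens;
a single call with a glob and a non-zero reset never exposes that state.
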