1*4882a593Smuzhiyun============= 2*4882a593SmuzhiyunEvent Tracing 3*4882a593Smuzhiyun============= 4*4882a593Smuzhiyun 5*4882a593Smuzhiyun:Author: Theodore Ts'o 6*4882a593Smuzhiyun:Updated: Li Zefan and Tom Zanussi 7*4882a593Smuzhiyun 8*4882a593Smuzhiyun1. Introduction 9*4882a593Smuzhiyun=============== 10*4882a593Smuzhiyun 11*4882a593SmuzhiyunTracepoints (see Documentation/trace/tracepoints.rst) can be used 12*4882a593Smuzhiyunwithout creating custom kernel modules to register probe functions 13*4882a593Smuzhiyunusing the event tracing infrastructure. 14*4882a593Smuzhiyun 15*4882a593SmuzhiyunNot all tracepoints can be traced using the event tracing system; 16*4882a593Smuzhiyunthe kernel developer must provide code snippets which define how the 17*4882a593Smuzhiyuntracing information is saved into the tracing buffer, and how the 18*4882a593Smuzhiyuntracing information should be printed. 19*4882a593Smuzhiyun 20*4882a593Smuzhiyun2. Using Event Tracing 21*4882a593Smuzhiyun====================== 22*4882a593Smuzhiyun 23*4882a593Smuzhiyun2.1 Via the 'set_event' interface 24*4882a593Smuzhiyun--------------------------------- 25*4882a593Smuzhiyun 26*4882a593SmuzhiyunThe events which are available for tracing can be found in the file 27*4882a593Smuzhiyun/sys/kernel/debug/tracing/available_events. 28*4882a593Smuzhiyun 29*4882a593SmuzhiyunTo enable a particular event, such as 'sched_wakeup', simply echo it 30*4882a593Smuzhiyunto /sys/kernel/debug/tracing/set_event. For example:: 31*4882a593Smuzhiyun 32*4882a593Smuzhiyun # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event 33*4882a593Smuzhiyun 34*4882a593Smuzhiyun.. Note:: '>>' is necessary, otherwise it will firstly disable all the events. 35*4882a593Smuzhiyun 36*4882a593SmuzhiyunTo disable an event, echo the event name to the set_event file prefixed 37*4882a593Smuzhiyunwith an exclamation point:: 38*4882a593Smuzhiyun 39*4882a593Smuzhiyun # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event 40*4882a593Smuzhiyun 41*4882a593SmuzhiyunTo disable all events, echo an empty line to the set_event file:: 42*4882a593Smuzhiyun 43*4882a593Smuzhiyun # echo > /sys/kernel/debug/tracing/set_event 44*4882a593Smuzhiyun 45*4882a593SmuzhiyunTo enable all events, echo ``*:*`` or ``*:`` to the set_event file:: 46*4882a593Smuzhiyun 47*4882a593Smuzhiyun # echo *:* > /sys/kernel/debug/tracing/set_event 48*4882a593Smuzhiyun 49*4882a593SmuzhiyunThe events are organized into subsystems, such as ext4, irq, sched, 50*4882a593Smuzhiyunetc., and a full event name looks like this: <subsystem>:<event>. The 51*4882a593Smuzhiyunsubsystem name is optional, but it is displayed in the available_events 52*4882a593Smuzhiyunfile. All of the events in a subsystem can be specified via the syntax 53*4882a593Smuzhiyun``<subsystem>:*``; for example, to enable all irq events, you can use the 54*4882a593Smuzhiyuncommand:: 55*4882a593Smuzhiyun 56*4882a593Smuzhiyun # echo 'irq:*' > /sys/kernel/debug/tracing/set_event 57*4882a593Smuzhiyun 58*4882a593Smuzhiyun2.2 Via the 'enable' toggle 59*4882a593Smuzhiyun--------------------------- 60*4882a593Smuzhiyun 61*4882a593SmuzhiyunThe events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy 62*4882a593Smuzhiyunof directories. 63*4882a593Smuzhiyun 64*4882a593SmuzhiyunTo enable event 'sched_wakeup':: 65*4882a593Smuzhiyun 66*4882a593Smuzhiyun # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable 67*4882a593Smuzhiyun 68*4882a593SmuzhiyunTo disable it:: 69*4882a593Smuzhiyun 70*4882a593Smuzhiyun # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable 71*4882a593Smuzhiyun 72*4882a593SmuzhiyunTo enable all events in sched subsystem:: 73*4882a593Smuzhiyun 74*4882a593Smuzhiyun # echo 1 > /sys/kernel/debug/tracing/events/sched/enable 75*4882a593Smuzhiyun 76*4882a593SmuzhiyunTo enable all events:: 77*4882a593Smuzhiyun 78*4882a593Smuzhiyun # echo 1 > /sys/kernel/debug/tracing/events/enable 79*4882a593Smuzhiyun 80*4882a593SmuzhiyunWhen reading one of these enable files, there are four results: 81*4882a593Smuzhiyun 82*4882a593Smuzhiyun - 0 - all events this file affects are disabled 83*4882a593Smuzhiyun - 1 - all events this file affects are enabled 84*4882a593Smuzhiyun - X - there is a mixture of events enabled and disabled 85*4882a593Smuzhiyun - ? - this file does not affect any event 86*4882a593Smuzhiyun 87*4882a593Smuzhiyun2.3 Boot option 88*4882a593Smuzhiyun--------------- 89*4882a593Smuzhiyun 90*4882a593SmuzhiyunIn order to facilitate early boot debugging, use boot option:: 91*4882a593Smuzhiyun 92*4882a593Smuzhiyun trace_event=[event-list] 93*4882a593Smuzhiyun 94*4882a593Smuzhiyunevent-list is a comma separated list of events. See section 2.1 for event 95*4882a593Smuzhiyunformat. 96*4882a593Smuzhiyun 97*4882a593Smuzhiyun3. Defining an event-enabled tracepoint 98*4882a593Smuzhiyun======================================= 99*4882a593Smuzhiyun 100*4882a593SmuzhiyunSee The example provided in samples/trace_events 101*4882a593Smuzhiyun 102*4882a593Smuzhiyun4. Event formats 103*4882a593Smuzhiyun================ 104*4882a593Smuzhiyun 105*4882a593SmuzhiyunEach trace event has a 'format' file associated with it that contains 106*4882a593Smuzhiyuna description of each field in a logged event. This information can 107*4882a593Smuzhiyunbe used to parse the binary trace stream, and is also the place to 108*4882a593Smuzhiyunfind the field names that can be used in event filters (see section 5). 109*4882a593Smuzhiyun 110*4882a593SmuzhiyunIt also displays the format string that will be used to print the 111*4882a593Smuzhiyunevent in text mode, along with the event name and ID used for 112*4882a593Smuzhiyunprofiling. 113*4882a593Smuzhiyun 114*4882a593SmuzhiyunEvery event has a set of ``common`` fields associated with it; these are 115*4882a593Smuzhiyunthe fields prefixed with ``common_``. The other fields vary between 116*4882a593Smuzhiyunevents and correspond to the fields defined in the TRACE_EVENT 117*4882a593Smuzhiyundefinition for that event. 118*4882a593Smuzhiyun 119*4882a593SmuzhiyunEach field in the format has the form:: 120*4882a593Smuzhiyun 121*4882a593Smuzhiyun field:field-type field-name; offset:N; size:N; 122*4882a593Smuzhiyun 123*4882a593Smuzhiyunwhere offset is the offset of the field in the trace record and size 124*4882a593Smuzhiyunis the size of the data item, in bytes. 125*4882a593Smuzhiyun 126*4882a593SmuzhiyunFor example, here's the information displayed for the 'sched_wakeup' 127*4882a593Smuzhiyunevent:: 128*4882a593Smuzhiyun 129*4882a593Smuzhiyun # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format 130*4882a593Smuzhiyun 131*4882a593Smuzhiyun name: sched_wakeup 132*4882a593Smuzhiyun ID: 60 133*4882a593Smuzhiyun format: 134*4882a593Smuzhiyun field:unsigned short common_type; offset:0; size:2; 135*4882a593Smuzhiyun field:unsigned char common_flags; offset:2; size:1; 136*4882a593Smuzhiyun field:unsigned char common_preempt_count; offset:3; size:1; 137*4882a593Smuzhiyun field:int common_pid; offset:4; size:4; 138*4882a593Smuzhiyun field:int common_tgid; offset:8; size:4; 139*4882a593Smuzhiyun 140*4882a593Smuzhiyun field:char comm[TASK_COMM_LEN]; offset:12; size:16; 141*4882a593Smuzhiyun field:pid_t pid; offset:28; size:4; 142*4882a593Smuzhiyun field:int prio; offset:32; size:4; 143*4882a593Smuzhiyun field:int success; offset:36; size:4; 144*4882a593Smuzhiyun field:int cpu; offset:40; size:4; 145*4882a593Smuzhiyun 146*4882a593Smuzhiyun print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid, 147*4882a593Smuzhiyun REC->prio, REC->success, REC->cpu 148*4882a593Smuzhiyun 149*4882a593SmuzhiyunThis event contains 10 fields, the first 5 common and the remaining 5 150*4882a593Smuzhiyunevent-specific. All the fields for this event are numeric, except for 151*4882a593Smuzhiyun'comm' which is a string, a distinction important for event filtering. 152*4882a593Smuzhiyun 153*4882a593Smuzhiyun5. Event filtering 154*4882a593Smuzhiyun================== 155*4882a593Smuzhiyun 156*4882a593SmuzhiyunTrace events can be filtered in the kernel by associating boolean 157*4882a593Smuzhiyun'filter expressions' with them. As soon as an event is logged into 158*4882a593Smuzhiyunthe trace buffer, its fields are checked against the filter expression 159*4882a593Smuzhiyunassociated with that event type. An event with field values that 160*4882a593Smuzhiyun'match' the filter will appear in the trace output, and an event whose 161*4882a593Smuzhiyunvalues don't match will be discarded. An event with no filter 162*4882a593Smuzhiyunassociated with it matches everything, and is the default when no 163*4882a593Smuzhiyunfilter has been set for an event. 164*4882a593Smuzhiyun 165*4882a593Smuzhiyun5.1 Expression syntax 166*4882a593Smuzhiyun--------------------- 167*4882a593Smuzhiyun 168*4882a593SmuzhiyunA filter expression consists of one or more 'predicates' that can be 169*4882a593Smuzhiyuncombined using the logical operators '&&' and '||'. A predicate is 170*4882a593Smuzhiyunsimply a clause that compares the value of a field contained within a 171*4882a593Smuzhiyunlogged event with a constant value and returns either 0 or 1 depending 172*4882a593Smuzhiyunon whether the field value matched (1) or didn't match (0):: 173*4882a593Smuzhiyun 174*4882a593Smuzhiyun field-name relational-operator value 175*4882a593Smuzhiyun 176*4882a593SmuzhiyunParentheses can be used to provide arbitrary logical groupings and 177*4882a593Smuzhiyundouble-quotes can be used to prevent the shell from interpreting 178*4882a593Smuzhiyunoperators as shell metacharacters. 179*4882a593Smuzhiyun 180*4882a593SmuzhiyunThe field-names available for use in filters can be found in the 181*4882a593Smuzhiyun'format' files for trace events (see section 4). 182*4882a593Smuzhiyun 183*4882a593SmuzhiyunThe relational-operators depend on the type of the field being tested: 184*4882a593Smuzhiyun 185*4882a593SmuzhiyunThe operators available for numeric fields are: 186*4882a593Smuzhiyun 187*4882a593Smuzhiyun==, !=, <, <=, >, >=, & 188*4882a593Smuzhiyun 189*4882a593SmuzhiyunAnd for string fields they are: 190*4882a593Smuzhiyun 191*4882a593Smuzhiyun==, !=, ~ 192*4882a593Smuzhiyun 193*4882a593SmuzhiyunThe glob (~) accepts a wild card character (\*,?) and character classes 194*4882a593Smuzhiyun([). For example:: 195*4882a593Smuzhiyun 196*4882a593Smuzhiyun prev_comm ~ "*sh" 197*4882a593Smuzhiyun prev_comm ~ "sh*" 198*4882a593Smuzhiyun prev_comm ~ "*sh*" 199*4882a593Smuzhiyun prev_comm ~ "ba*sh" 200*4882a593Smuzhiyun 201*4882a593SmuzhiyunIf the field is a pointer that points into user space (for example 202*4882a593Smuzhiyun"filename" from sys_enter_openat), then you have to append ".ustring" to the 203*4882a593Smuzhiyunfield name:: 204*4882a593Smuzhiyun 205*4882a593Smuzhiyun filename.ustring ~ "password" 206*4882a593Smuzhiyun 207*4882a593SmuzhiyunAs the kernel will have to know how to retrieve the memory that the pointer 208*4882a593Smuzhiyunis at from user space. 209*4882a593Smuzhiyun 210*4882a593Smuzhiyun5.2 Setting filters 211*4882a593Smuzhiyun------------------- 212*4882a593Smuzhiyun 213*4882a593SmuzhiyunA filter for an individual event is set by writing a filter expression 214*4882a593Smuzhiyunto the 'filter' file for the given event. 215*4882a593Smuzhiyun 216*4882a593SmuzhiyunFor example:: 217*4882a593Smuzhiyun 218*4882a593Smuzhiyun # cd /sys/kernel/debug/tracing/events/sched/sched_wakeup 219*4882a593Smuzhiyun # echo "common_preempt_count > 4" > filter 220*4882a593Smuzhiyun 221*4882a593SmuzhiyunA slightly more involved example:: 222*4882a593Smuzhiyun 223*4882a593Smuzhiyun # cd /sys/kernel/debug/tracing/events/signal/signal_generate 224*4882a593Smuzhiyun # echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter 225*4882a593Smuzhiyun 226*4882a593SmuzhiyunIf there is an error in the expression, you'll get an 'Invalid 227*4882a593Smuzhiyunargument' error when setting it, and the erroneous string along with 228*4882a593Smuzhiyunan error message can be seen by looking at the filter e.g.:: 229*4882a593Smuzhiyun 230*4882a593Smuzhiyun # cd /sys/kernel/debug/tracing/events/signal/signal_generate 231*4882a593Smuzhiyun # echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter 232*4882a593Smuzhiyun -bash: echo: write error: Invalid argument 233*4882a593Smuzhiyun # cat filter 234*4882a593Smuzhiyun ((sig >= 10 && sig < 15) || dsig == 17) && comm != bash 235*4882a593Smuzhiyun ^ 236*4882a593Smuzhiyun parse_error: Field not found 237*4882a593Smuzhiyun 238*4882a593SmuzhiyunCurrently the caret ('^') for an error always appears at the beginning of 239*4882a593Smuzhiyunthe filter string; the error message should still be useful though 240*4882a593Smuzhiyuneven without more accurate position info. 241*4882a593Smuzhiyun 242*4882a593Smuzhiyun5.2.1 Filter limitations 243*4882a593Smuzhiyun------------------------ 244*4882a593Smuzhiyun 245*4882a593SmuzhiyunIf a filter is placed on a string pointer ``(char *)`` that does not point 246*4882a593Smuzhiyunto a string on the ring buffer, but instead points to kernel or user space 247*4882a593Smuzhiyunmemory, then, for safety reasons, at most 1024 bytes of the content is 248*4882a593Smuzhiyuncopied onto a temporary buffer to do the compare. If the copy of the memory 249*4882a593Smuzhiyunfaults (the pointer points to memory that should not be accessed), then the 250*4882a593Smuzhiyunstring compare will be treated as not matching. 251*4882a593Smuzhiyun 252*4882a593Smuzhiyun5.3 Clearing filters 253*4882a593Smuzhiyun-------------------- 254*4882a593Smuzhiyun 255*4882a593SmuzhiyunTo clear the filter for an event, write a '0' to the event's filter 256*4882a593Smuzhiyunfile. 257*4882a593Smuzhiyun 258*4882a593SmuzhiyunTo clear the filters for all events in a subsystem, write a '0' to the 259*4882a593Smuzhiyunsubsystem's filter file. 260*4882a593Smuzhiyun 261*4882a593Smuzhiyun5.3 Subsystem filters 262*4882a593Smuzhiyun--------------------- 263*4882a593Smuzhiyun 264*4882a593SmuzhiyunFor convenience, filters for every event in a subsystem can be set or 265*4882a593Smuzhiyuncleared as a group by writing a filter expression into the filter file 266*4882a593Smuzhiyunat the root of the subsystem. Note however, that if a filter for any 267*4882a593Smuzhiyunevent within the subsystem lacks a field specified in the subsystem 268*4882a593Smuzhiyunfilter, or if the filter can't be applied for any other reason, the 269*4882a593Smuzhiyunfilter for that event will retain its previous setting. This can 270*4882a593Smuzhiyunresult in an unintended mixture of filters which could lead to 271*4882a593Smuzhiyunconfusing (to the user who might think different filters are in 272*4882a593Smuzhiyuneffect) trace output. Only filters that reference just the common 273*4882a593Smuzhiyunfields can be guaranteed to propagate successfully to all events. 274*4882a593Smuzhiyun 275*4882a593SmuzhiyunHere are a few subsystem filter examples that also illustrate the 276*4882a593Smuzhiyunabove points: 277*4882a593Smuzhiyun 278*4882a593SmuzhiyunClear the filters on all events in the sched subsystem:: 279*4882a593Smuzhiyun 280*4882a593Smuzhiyun # cd /sys/kernel/debug/tracing/events/sched 281*4882a593Smuzhiyun # echo 0 > filter 282*4882a593Smuzhiyun # cat sched_switch/filter 283*4882a593Smuzhiyun none 284*4882a593Smuzhiyun # cat sched_wakeup/filter 285*4882a593Smuzhiyun none 286*4882a593Smuzhiyun 287*4882a593SmuzhiyunSet a filter using only common fields for all events in the sched 288*4882a593Smuzhiyunsubsystem (all events end up with the same filter):: 289*4882a593Smuzhiyun 290*4882a593Smuzhiyun # cd /sys/kernel/debug/tracing/events/sched 291*4882a593Smuzhiyun # echo common_pid == 0 > filter 292*4882a593Smuzhiyun # cat sched_switch/filter 293*4882a593Smuzhiyun common_pid == 0 294*4882a593Smuzhiyun # cat sched_wakeup/filter 295*4882a593Smuzhiyun common_pid == 0 296*4882a593Smuzhiyun 297*4882a593SmuzhiyunAttempt to set a filter using a non-common field for all events in the 298*4882a593Smuzhiyunsched subsystem (all events but those that have a prev_pid field retain 299*4882a593Smuzhiyuntheir old filters):: 300*4882a593Smuzhiyun 301*4882a593Smuzhiyun # cd /sys/kernel/debug/tracing/events/sched 302*4882a593Smuzhiyun # echo prev_pid == 0 > filter 303*4882a593Smuzhiyun # cat sched_switch/filter 304*4882a593Smuzhiyun prev_pid == 0 305*4882a593Smuzhiyun # cat sched_wakeup/filter 306*4882a593Smuzhiyun common_pid == 0 307*4882a593Smuzhiyun 308*4882a593Smuzhiyun5.4 PID filtering 309*4882a593Smuzhiyun----------------- 310*4882a593Smuzhiyun 311*4882a593SmuzhiyunThe set_event_pid file in the same directory as the top events directory 312*4882a593Smuzhiyunexists, will filter all events from tracing any task that does not have the 313*4882a593SmuzhiyunPID listed in the set_event_pid file. 314*4882a593Smuzhiyun:: 315*4882a593Smuzhiyun 316*4882a593Smuzhiyun # cd /sys/kernel/debug/tracing 317*4882a593Smuzhiyun # echo $$ > set_event_pid 318*4882a593Smuzhiyun # echo 1 > events/enable 319*4882a593Smuzhiyun 320*4882a593SmuzhiyunWill only trace events for the current task. 321*4882a593Smuzhiyun 322*4882a593SmuzhiyunTo add more PIDs without losing the PIDs already included, use '>>'. 323*4882a593Smuzhiyun:: 324*4882a593Smuzhiyun 325*4882a593Smuzhiyun # echo 123 244 1 >> set_event_pid 326*4882a593Smuzhiyun 327*4882a593Smuzhiyun 328*4882a593Smuzhiyun6. Event triggers 329*4882a593Smuzhiyun================= 330*4882a593Smuzhiyun 331*4882a593SmuzhiyunTrace events can be made to conditionally invoke trigger 'commands' 332*4882a593Smuzhiyunwhich can take various forms and are described in detail below; 333*4882a593Smuzhiyunexamples would be enabling or disabling other trace events or invoking 334*4882a593Smuzhiyuna stack trace whenever the trace event is hit. Whenever a trace event 335*4882a593Smuzhiyunwith attached triggers is invoked, the set of trigger commands 336*4882a593Smuzhiyunassociated with that event is invoked. Any given trigger can 337*4882a593Smuzhiyunadditionally have an event filter of the same form as described in 338*4882a593Smuzhiyunsection 5 (Event filtering) associated with it - the command will only 339*4882a593Smuzhiyunbe invoked if the event being invoked passes the associated filter. 340*4882a593SmuzhiyunIf no filter is associated with the trigger, it always passes. 341*4882a593Smuzhiyun 342*4882a593SmuzhiyunTriggers are added to and removed from a particular event by writing 343*4882a593Smuzhiyuntrigger expressions to the 'trigger' file for the given event. 344*4882a593Smuzhiyun 345*4882a593SmuzhiyunA given event can have any number of triggers associated with it, 346*4882a593Smuzhiyunsubject to any restrictions that individual commands may have in that 347*4882a593Smuzhiyunregard. 348*4882a593Smuzhiyun 349*4882a593SmuzhiyunEvent triggers are implemented on top of "soft" mode, which means that 350*4882a593Smuzhiyunwhenever a trace event has one or more triggers associated with it, 351*4882a593Smuzhiyunthe event is activated even if it isn't actually enabled, but is 352*4882a593Smuzhiyundisabled in a "soft" mode. That is, the tracepoint will be called, 353*4882a593Smuzhiyunbut just will not be traced, unless of course it's actually enabled. 354*4882a593SmuzhiyunThis scheme allows triggers to be invoked even for events that aren't 355*4882a593Smuzhiyunenabled, and also allows the current event filter implementation to be 356*4882a593Smuzhiyunused for conditionally invoking triggers. 357*4882a593Smuzhiyun 358*4882a593SmuzhiyunThe syntax for event triggers is roughly based on the syntax for 359*4882a593Smuzhiyunset_ftrace_filter 'ftrace filter commands' (see the 'Filter commands' 360*4882a593Smuzhiyunsection of Documentation/trace/ftrace.rst), but there are major 361*4882a593Smuzhiyundifferences and the implementation isn't currently tied to it in any 362*4882a593Smuzhiyunway, so beware about making generalizations between the two. 363*4882a593Smuzhiyun 364*4882a593Smuzhiyun.. Note:: 365*4882a593Smuzhiyun Writing into trace_marker (See Documentation/trace/ftrace.rst) 366*4882a593Smuzhiyun can also enable triggers that are written into 367*4882a593Smuzhiyun /sys/kernel/tracing/events/ftrace/print/trigger 368*4882a593Smuzhiyun 369*4882a593Smuzhiyun6.1 Expression syntax 370*4882a593Smuzhiyun--------------------- 371*4882a593Smuzhiyun 372*4882a593SmuzhiyunTriggers are added by echoing the command to the 'trigger' file:: 373*4882a593Smuzhiyun 374*4882a593Smuzhiyun # echo 'command[:count] [if filter]' > trigger 375*4882a593Smuzhiyun 376*4882a593SmuzhiyunTriggers are removed by echoing the same command but starting with '!' 377*4882a593Smuzhiyunto the 'trigger' file:: 378*4882a593Smuzhiyun 379*4882a593Smuzhiyun # echo '!command[:count] [if filter]' > trigger 380*4882a593Smuzhiyun 381*4882a593SmuzhiyunThe [if filter] part isn't used in matching commands when removing, so 382*4882a593Smuzhiyunleaving that off in a '!' command will accomplish the same thing as 383*4882a593Smuzhiyunhaving it in. 384*4882a593Smuzhiyun 385*4882a593SmuzhiyunThe filter syntax is the same as that described in the 'Event 386*4882a593Smuzhiyunfiltering' section above. 387*4882a593Smuzhiyun 388*4882a593SmuzhiyunFor ease of use, writing to the trigger file using '>' currently just 389*4882a593Smuzhiyunadds or removes a single trigger and there's no explicit '>>' support 390*4882a593Smuzhiyun('>' actually behaves like '>>') or truncation support to remove all 391*4882a593Smuzhiyuntriggers (you have to use '!' for each one added.) 392*4882a593Smuzhiyun 393*4882a593Smuzhiyun6.2 Supported trigger commands 394*4882a593Smuzhiyun------------------------------ 395*4882a593Smuzhiyun 396*4882a593SmuzhiyunThe following commands are supported: 397*4882a593Smuzhiyun 398*4882a593Smuzhiyun- enable_event/disable_event 399*4882a593Smuzhiyun 400*4882a593Smuzhiyun These commands can enable or disable another trace event whenever 401*4882a593Smuzhiyun the triggering event is hit. When these commands are registered, 402*4882a593Smuzhiyun the other trace event is activated, but disabled in a "soft" mode. 403*4882a593Smuzhiyun That is, the tracepoint will be called, but just will not be traced. 404*4882a593Smuzhiyun The event tracepoint stays in this mode as long as there's a trigger 405*4882a593Smuzhiyun in effect that can trigger it. 406*4882a593Smuzhiyun 407*4882a593Smuzhiyun For example, the following trigger causes kmalloc events to be 408*4882a593Smuzhiyun traced when a read system call is entered, and the :1 at the end 409*4882a593Smuzhiyun specifies that this enablement happens only once:: 410*4882a593Smuzhiyun 411*4882a593Smuzhiyun # echo 'enable_event:kmem:kmalloc:1' > \ 412*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger 413*4882a593Smuzhiyun 414*4882a593Smuzhiyun The following trigger causes kmalloc events to stop being traced 415*4882a593Smuzhiyun when a read system call exits. This disablement happens on every 416*4882a593Smuzhiyun read system call exit:: 417*4882a593Smuzhiyun 418*4882a593Smuzhiyun # echo 'disable_event:kmem:kmalloc' > \ 419*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger 420*4882a593Smuzhiyun 421*4882a593Smuzhiyun The format is:: 422*4882a593Smuzhiyun 423*4882a593Smuzhiyun enable_event:<system>:<event>[:count] 424*4882a593Smuzhiyun disable_event:<system>:<event>[:count] 425*4882a593Smuzhiyun 426*4882a593Smuzhiyun To remove the above commands:: 427*4882a593Smuzhiyun 428*4882a593Smuzhiyun # echo '!enable_event:kmem:kmalloc:1' > \ 429*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger 430*4882a593Smuzhiyun 431*4882a593Smuzhiyun # echo '!disable_event:kmem:kmalloc' > \ 432*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger 433*4882a593Smuzhiyun 434*4882a593Smuzhiyun Note that there can be any number of enable/disable_event triggers 435*4882a593Smuzhiyun per triggering event, but there can only be one trigger per 436*4882a593Smuzhiyun triggered event. e.g. sys_enter_read can have triggers enabling both 437*4882a593Smuzhiyun kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc 438*4882a593Smuzhiyun versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if 439*4882a593Smuzhiyun bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they 440*4882a593Smuzhiyun could be combined into a single filter on kmem:kmalloc though). 441*4882a593Smuzhiyun 442*4882a593Smuzhiyun- stacktrace 443*4882a593Smuzhiyun 444*4882a593Smuzhiyun This command dumps a stacktrace in the trace buffer whenever the 445*4882a593Smuzhiyun triggering event occurs. 446*4882a593Smuzhiyun 447*4882a593Smuzhiyun For example, the following trigger dumps a stacktrace every time the 448*4882a593Smuzhiyun kmalloc tracepoint is hit:: 449*4882a593Smuzhiyun 450*4882a593Smuzhiyun # echo 'stacktrace' > \ 451*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 452*4882a593Smuzhiyun 453*4882a593Smuzhiyun The following trigger dumps a stacktrace the first 5 times a kmalloc 454*4882a593Smuzhiyun request happens with a size >= 64K:: 455*4882a593Smuzhiyun 456*4882a593Smuzhiyun # echo 'stacktrace:5 if bytes_req >= 65536' > \ 457*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 458*4882a593Smuzhiyun 459*4882a593Smuzhiyun The format is:: 460*4882a593Smuzhiyun 461*4882a593Smuzhiyun stacktrace[:count] 462*4882a593Smuzhiyun 463*4882a593Smuzhiyun To remove the above commands:: 464*4882a593Smuzhiyun 465*4882a593Smuzhiyun # echo '!stacktrace' > \ 466*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 467*4882a593Smuzhiyun 468*4882a593Smuzhiyun # echo '!stacktrace:5 if bytes_req >= 65536' > \ 469*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 470*4882a593Smuzhiyun 471*4882a593Smuzhiyun The latter can also be removed more simply by the following (without 472*4882a593Smuzhiyun the filter):: 473*4882a593Smuzhiyun 474*4882a593Smuzhiyun # echo '!stacktrace:5' > \ 475*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger 476*4882a593Smuzhiyun 477*4882a593Smuzhiyun Note that there can be only one stacktrace trigger per triggering 478*4882a593Smuzhiyun event. 479*4882a593Smuzhiyun 480*4882a593Smuzhiyun- snapshot 481*4882a593Smuzhiyun 482*4882a593Smuzhiyun This command causes a snapshot to be triggered whenever the 483*4882a593Smuzhiyun triggering event occurs. 484*4882a593Smuzhiyun 485*4882a593Smuzhiyun The following command creates a snapshot every time a block request 486*4882a593Smuzhiyun queue is unplugged with a depth > 1. If you were tracing a set of 487*4882a593Smuzhiyun events or functions at the time, the snapshot trace buffer would 488*4882a593Smuzhiyun capture those events when the trigger event occurred:: 489*4882a593Smuzhiyun 490*4882a593Smuzhiyun # echo 'snapshot if nr_rq > 1' > \ 491*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/block/block_unplug/trigger 492*4882a593Smuzhiyun 493*4882a593Smuzhiyun To only snapshot once:: 494*4882a593Smuzhiyun 495*4882a593Smuzhiyun # echo 'snapshot:1 if nr_rq > 1' > \ 496*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/block/block_unplug/trigger 497*4882a593Smuzhiyun 498*4882a593Smuzhiyun To remove the above commands:: 499*4882a593Smuzhiyun 500*4882a593Smuzhiyun # echo '!snapshot if nr_rq > 1' > \ 501*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/block/block_unplug/trigger 502*4882a593Smuzhiyun 503*4882a593Smuzhiyun # echo '!snapshot:1 if nr_rq > 1' > \ 504*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/block/block_unplug/trigger 505*4882a593Smuzhiyun 506*4882a593Smuzhiyun Note that there can be only one snapshot trigger per triggering 507*4882a593Smuzhiyun event. 508*4882a593Smuzhiyun 509*4882a593Smuzhiyun- traceon/traceoff 510*4882a593Smuzhiyun 511*4882a593Smuzhiyun These commands turn tracing on and off when the specified events are 512*4882a593Smuzhiyun hit. The parameter determines how many times the tracing system is 513*4882a593Smuzhiyun turned on and off. If unspecified, there is no limit. 514*4882a593Smuzhiyun 515*4882a593Smuzhiyun The following command turns tracing off the first time a block 516*4882a593Smuzhiyun request queue is unplugged with a depth > 1. If you were tracing a 517*4882a593Smuzhiyun set of events or functions at the time, you could then examine the 518*4882a593Smuzhiyun trace buffer to see the sequence of events that led up to the 519*4882a593Smuzhiyun trigger event:: 520*4882a593Smuzhiyun 521*4882a593Smuzhiyun # echo 'traceoff:1 if nr_rq > 1' > \ 522*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/block/block_unplug/trigger 523*4882a593Smuzhiyun 524*4882a593Smuzhiyun To always disable tracing when nr_rq > 1:: 525*4882a593Smuzhiyun 526*4882a593Smuzhiyun # echo 'traceoff if nr_rq > 1' > \ 527*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/block/block_unplug/trigger 528*4882a593Smuzhiyun 529*4882a593Smuzhiyun To remove the above commands:: 530*4882a593Smuzhiyun 531*4882a593Smuzhiyun # echo '!traceoff:1 if nr_rq > 1' > \ 532*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/block/block_unplug/trigger 533*4882a593Smuzhiyun 534*4882a593Smuzhiyun # echo '!traceoff if nr_rq > 1' > \ 535*4882a593Smuzhiyun /sys/kernel/debug/tracing/events/block/block_unplug/trigger 536*4882a593Smuzhiyun 537*4882a593Smuzhiyun Note that there can be only one traceon or traceoff trigger per 538*4882a593Smuzhiyun triggering event. 539*4882a593Smuzhiyun 540*4882a593Smuzhiyun- hist 541*4882a593Smuzhiyun 542*4882a593Smuzhiyun This command aggregates event hits into a hash table keyed on one or 543*4882a593Smuzhiyun more trace event format fields (or stacktrace) and a set of running 544*4882a593Smuzhiyun totals derived from one or more trace event format fields and/or 545*4882a593Smuzhiyun event counts (hitcount). 546*4882a593Smuzhiyun 547*4882a593Smuzhiyun See Documentation/trace/histogram.rst for details and examples. 548*4882a593Smuzhiyun 549*4882a593Smuzhiyun7. In-kernel trace event API 550*4882a593Smuzhiyun============================ 551*4882a593Smuzhiyun 552*4882a593SmuzhiyunIn most cases, the command-line interface to trace events is more than 553*4882a593Smuzhiyunsufficient. Sometimes, however, applications might find the need for 554*4882a593Smuzhiyunmore complex relationships than can be expressed through a simple 555*4882a593Smuzhiyunseries of linked command-line expressions, or putting together sets of 556*4882a593Smuzhiyuncommands may be simply too cumbersome. An example might be an 557*4882a593Smuzhiyunapplication that needs to 'listen' to the trace stream in order to 558*4882a593Smuzhiyunmaintain an in-kernel state machine detecting, for instance, when an 559*4882a593Smuzhiyunillegal kernel state occurs in the scheduler. 560*4882a593Smuzhiyun 561*4882a593SmuzhiyunThe trace event subsystem provides an in-kernel API allowing modules 562*4882a593Smuzhiyunor other kernel code to generate user-defined 'synthetic' events at 563*4882a593Smuzhiyunwill, which can be used to either augment the existing trace stream 564*4882a593Smuzhiyunand/or signal that a particular important state has occurred. 565*4882a593Smuzhiyun 566*4882a593SmuzhiyunA similar in-kernel API is also available for creating kprobe and 567*4882a593Smuzhiyunkretprobe events. 568*4882a593Smuzhiyun 569*4882a593SmuzhiyunBoth the synthetic event and k/ret/probe event APIs are built on top 570*4882a593Smuzhiyunof a lower-level "dynevent_cmd" event command API, which is also 571*4882a593Smuzhiyunavailable for more specialized applications, or as the basis of other 572*4882a593Smuzhiyunhigher-level trace event APIs. 573*4882a593Smuzhiyun 574*4882a593SmuzhiyunThe API provided for these purposes is describe below and allows the 575*4882a593Smuzhiyunfollowing: 576*4882a593Smuzhiyun 577*4882a593Smuzhiyun - dynamically creating synthetic event definitions 578*4882a593Smuzhiyun - dynamically creating kprobe and kretprobe event definitions 579*4882a593Smuzhiyun - tracing synthetic events from in-kernel code 580*4882a593Smuzhiyun - the low-level "dynevent_cmd" API 581*4882a593Smuzhiyun 582*4882a593Smuzhiyun7.1 Dyamically creating synthetic event definitions 583*4882a593Smuzhiyun--------------------------------------------------- 584*4882a593Smuzhiyun 585*4882a593SmuzhiyunThere are a couple ways to create a new synthetic event from a kernel 586*4882a593Smuzhiyunmodule or other kernel code. 587*4882a593Smuzhiyun 588*4882a593SmuzhiyunThe first creates the event in one step, using synth_event_create(). 589*4882a593SmuzhiyunIn this method, the name of the event to create and an array defining 590*4882a593Smuzhiyunthe fields is supplied to synth_event_create(). If successful, a 591*4882a593Smuzhiyunsynthetic event with that name and fields will exist following that 592*4882a593Smuzhiyuncall. For example, to create a new "schedtest" synthetic event:: 593*4882a593Smuzhiyun 594*4882a593Smuzhiyun ret = synth_event_create("schedtest", sched_fields, 595*4882a593Smuzhiyun ARRAY_SIZE(sched_fields), THIS_MODULE); 596*4882a593Smuzhiyun 597*4882a593SmuzhiyunThe sched_fields param in this example points to an array of struct 598*4882a593Smuzhiyunsynth_field_desc, each of which describes an event field by type and 599*4882a593Smuzhiyunname:: 600*4882a593Smuzhiyun 601*4882a593Smuzhiyun static struct synth_field_desc sched_fields[] = { 602*4882a593Smuzhiyun { .type = "pid_t", .name = "next_pid_field" }, 603*4882a593Smuzhiyun { .type = "char[16]", .name = "next_comm_field" }, 604*4882a593Smuzhiyun { .type = "u64", .name = "ts_ns" }, 605*4882a593Smuzhiyun { .type = "u64", .name = "ts_ms" }, 606*4882a593Smuzhiyun { .type = "unsigned int", .name = "cpu" }, 607*4882a593Smuzhiyun { .type = "char[64]", .name = "my_string_field" }, 608*4882a593Smuzhiyun { .type = "int", .name = "my_int_field" }, 609*4882a593Smuzhiyun }; 610*4882a593Smuzhiyun 611*4882a593SmuzhiyunSee synth_field_size() for available types. 612*4882a593Smuzhiyun 613*4882a593SmuzhiyunIf field_name contains [n], the field is considered to be a static array. 614*4882a593Smuzhiyun 615*4882a593SmuzhiyunIf field_names contains[] (no subscript), the field is considered to 616*4882a593Smuzhiyunbe a dynamic array, which will only take as much space in the event as 617*4882a593Smuzhiyunis required to hold the array. 618*4882a593Smuzhiyun 619*4882a593SmuzhiyunBecause space for an event is reserved before assigning field values 620*4882a593Smuzhiyunto the event, using dynamic arrays implies that the piecewise 621*4882a593Smuzhiyunin-kernel API described below can't be used with dynamic arrays. The 622*4882a593Smuzhiyunother non-piecewise in-kernel APIs can, however, be used with dynamic 623*4882a593Smuzhiyunarrays. 624*4882a593Smuzhiyun 625*4882a593SmuzhiyunIf the event is created from within a module, a pointer to the module 626*4882a593Smuzhiyunmust be passed to synth_event_create(). This will ensure that the 627*4882a593Smuzhiyuntrace buffer won't contain unreadable events when the module is 628*4882a593Smuzhiyunremoved. 629*4882a593Smuzhiyun 630*4882a593SmuzhiyunAt this point, the event object is ready to be used for generating new 631*4882a593Smuzhiyunevents. 632*4882a593Smuzhiyun 633*4882a593SmuzhiyunIn the second method, the event is created in several steps. This 634*4882a593Smuzhiyunallows events to be created dynamically and without the need to create 635*4882a593Smuzhiyunand populate an array of fields beforehand. 636*4882a593Smuzhiyun 637*4882a593SmuzhiyunTo use this method, an empty or partially empty synthetic event should 638*4882a593Smuzhiyunfirst be created using synth_event_gen_cmd_start() or 639*4882a593Smuzhiyunsynth_event_gen_cmd_array_start(). For synth_event_gen_cmd_start(), 640*4882a593Smuzhiyunthe name of the event along with one or more pairs of args each pair 641*4882a593Smuzhiyunrepresenting a 'type field_name;' field specification should be 642*4882a593Smuzhiyunsupplied. For synth_event_gen_cmd_array_start(), the name of the 643*4882a593Smuzhiyunevent along with an array of struct synth_field_desc should be 644*4882a593Smuzhiyunsupplied. Before calling synth_event_gen_cmd_start() or 645*4882a593Smuzhiyunsynth_event_gen_cmd_array_start(), the user should create and 646*4882a593Smuzhiyuninitialize a dynevent_cmd object using synth_event_cmd_init(). 647*4882a593Smuzhiyun 648*4882a593SmuzhiyunFor example, to create a new "schedtest" synthetic event with two 649*4882a593Smuzhiyunfields:: 650*4882a593Smuzhiyun 651*4882a593Smuzhiyun struct dynevent_cmd cmd; 652*4882a593Smuzhiyun char *buf; 653*4882a593Smuzhiyun 654*4882a593Smuzhiyun /* Create a buffer to hold the generated command */ 655*4882a593Smuzhiyun buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL); 656*4882a593Smuzhiyun 657*4882a593Smuzhiyun /* Before generating the command, initialize the cmd object */ 658*4882a593Smuzhiyun synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN); 659*4882a593Smuzhiyun 660*4882a593Smuzhiyun ret = synth_event_gen_cmd_start(&cmd, "schedtest", THIS_MODULE, 661*4882a593Smuzhiyun "pid_t", "next_pid_field", 662*4882a593Smuzhiyun "u64", "ts_ns"); 663*4882a593Smuzhiyun 664*4882a593SmuzhiyunAlternatively, using an array of struct synth_field_desc fields 665*4882a593Smuzhiyuncontaining the same information:: 666*4882a593Smuzhiyun 667*4882a593Smuzhiyun ret = synth_event_gen_cmd_array_start(&cmd, "schedtest", THIS_MODULE, 668*4882a593Smuzhiyun fields, n_fields); 669*4882a593Smuzhiyun 670*4882a593SmuzhiyunOnce the synthetic event object has been created, it can then be 671*4882a593Smuzhiyunpopulated with more fields. Fields are added one by one using 672*4882a593Smuzhiyunsynth_event_add_field(), supplying the dynevent_cmd object, a field 673*4882a593Smuzhiyuntype, and a field name. For example, to add a new int field named 674*4882a593Smuzhiyun"intfield", the following call should be made:: 675*4882a593Smuzhiyun 676*4882a593Smuzhiyun ret = synth_event_add_field(&cmd, "int", "intfield"); 677*4882a593Smuzhiyun 678*4882a593SmuzhiyunSee synth_field_size() for available types. If field_name contains [n] 679*4882a593Smuzhiyunthe field is considered to be an array. 680*4882a593Smuzhiyun 681*4882a593SmuzhiyunA group of fields can also be added all at once using an array of 682*4882a593Smuzhiyunsynth_field_desc with add_synth_fields(). For example, this would add 683*4882a593Smuzhiyunjust the first four sched_fields:: 684*4882a593Smuzhiyun 685*4882a593Smuzhiyun ret = synth_event_add_fields(&cmd, sched_fields, 4); 686*4882a593Smuzhiyun 687*4882a593SmuzhiyunIf you already have a string of the form 'type field_name', 688*4882a593Smuzhiyunsynth_event_add_field_str() can be used to add it as-is; it will 689*4882a593Smuzhiyunalso automatically append a ';' to the string. 690*4882a593Smuzhiyun 691*4882a593SmuzhiyunOnce all the fields have been added, the event should be finalized and 692*4882a593Smuzhiyunregistered by calling the synth_event_gen_cmd_end() function:: 693*4882a593Smuzhiyun 694*4882a593Smuzhiyun ret = synth_event_gen_cmd_end(&cmd); 695*4882a593Smuzhiyun 696*4882a593SmuzhiyunAt this point, the event object is ready to be used for tracing new 697*4882a593Smuzhiyunevents. 698*4882a593Smuzhiyun 699*4882a593Smuzhiyun7.2 Tracing synthetic events from in-kernel code 700*4882a593Smuzhiyun------------------------------------------------ 701*4882a593Smuzhiyun 702*4882a593SmuzhiyunTo trace a synthetic event, there are several options. The first 703*4882a593Smuzhiyunoption is to trace the event in one call, using synth_event_trace() 704*4882a593Smuzhiyunwith a variable number of values, or synth_event_trace_array() with an 705*4882a593Smuzhiyunarray of values to be set. A second option can be used to avoid the 706*4882a593Smuzhiyunneed for a pre-formed array of values or list of arguments, via 707*4882a593Smuzhiyunsynth_event_trace_start() and synth_event_trace_end() along with 708*4882a593Smuzhiyunsynth_event_add_next_val() or synth_event_add_val() to add the values 709*4882a593Smuzhiyunpiecewise. 710*4882a593Smuzhiyun 711*4882a593Smuzhiyun7.2.1 Tracing a synthetic event all at once 712*4882a593Smuzhiyun------------------------------------------- 713*4882a593Smuzhiyun 714*4882a593SmuzhiyunTo trace a synthetic event all at once, the synth_event_trace() or 715*4882a593Smuzhiyunsynth_event_trace_array() functions can be used. 716*4882a593Smuzhiyun 717*4882a593SmuzhiyunThe synth_event_trace() function is passed the trace_event_file 718*4882a593Smuzhiyunrepresenting the synthetic event (which can be retrieved using 719*4882a593Smuzhiyuntrace_get_event_file() using the synthetic event name, "synthetic" as 720*4882a593Smuzhiyunthe system name, and the trace instance name (NULL if using the global 721*4882a593Smuzhiyuntrace array)), along with an variable number of u64 args, one for each 722*4882a593Smuzhiyunsynthetic event field, and the number of values being passed. 723*4882a593Smuzhiyun 724*4882a593SmuzhiyunSo, to trace an event corresponding to the synthetic event definition 725*4882a593Smuzhiyunabove, code like the following could be used:: 726*4882a593Smuzhiyun 727*4882a593Smuzhiyun ret = synth_event_trace(create_synth_test, 7, /* number of values */ 728*4882a593Smuzhiyun 444, /* next_pid_field */ 729*4882a593Smuzhiyun (u64)"clackers", /* next_comm_field */ 730*4882a593Smuzhiyun 1000000, /* ts_ns */ 731*4882a593Smuzhiyun 1000, /* ts_ms */ 732*4882a593Smuzhiyun smp_processor_id(),/* cpu */ 733*4882a593Smuzhiyun (u64)"Thneed", /* my_string_field */ 734*4882a593Smuzhiyun 999); /* my_int_field */ 735*4882a593Smuzhiyun 736*4882a593SmuzhiyunAll vals should be cast to u64, and string vals are just pointers to 737*4882a593Smuzhiyunstrings, cast to u64. Strings will be copied into space reserved in 738*4882a593Smuzhiyunthe event for the string, using these pointers. 739*4882a593Smuzhiyun 740*4882a593SmuzhiyunAlternatively, the synth_event_trace_array() function can be used to 741*4882a593Smuzhiyunaccomplish the same thing. It is passed the trace_event_file 742*4882a593Smuzhiyunrepresenting the synthetic event (which can be retrieved using 743*4882a593Smuzhiyuntrace_get_event_file() using the synthetic event name, "synthetic" as 744*4882a593Smuzhiyunthe system name, and the trace instance name (NULL if using the global 745*4882a593Smuzhiyuntrace array)), along with an array of u64, one for each synthetic 746*4882a593Smuzhiyunevent field. 747*4882a593Smuzhiyun 748*4882a593SmuzhiyunTo trace an event corresponding to the synthetic event definition 749*4882a593Smuzhiyunabove, code like the following could be used:: 750*4882a593Smuzhiyun 751*4882a593Smuzhiyun u64 vals[7]; 752*4882a593Smuzhiyun 753*4882a593Smuzhiyun vals[0] = 777; /* next_pid_field */ 754*4882a593Smuzhiyun vals[1] = (u64)"tiddlywinks"; /* next_comm_field */ 755*4882a593Smuzhiyun vals[2] = 1000000; /* ts_ns */ 756*4882a593Smuzhiyun vals[3] = 1000; /* ts_ms */ 757*4882a593Smuzhiyun vals[4] = smp_processor_id(); /* cpu */ 758*4882a593Smuzhiyun vals[5] = (u64)"thneed"; /* my_string_field */ 759*4882a593Smuzhiyun vals[6] = 398; /* my_int_field */ 760*4882a593Smuzhiyun 761*4882a593SmuzhiyunThe 'vals' array is just an array of u64, the number of which must 762*4882a593Smuzhiyunmatch the number of field in the synthetic event, and which must be in 763*4882a593Smuzhiyunthe same order as the synthetic event fields. 764*4882a593Smuzhiyun 765*4882a593SmuzhiyunAll vals should be cast to u64, and string vals are just pointers to 766*4882a593Smuzhiyunstrings, cast to u64. Strings will be copied into space reserved in 767*4882a593Smuzhiyunthe event for the string, using these pointers. 768*4882a593Smuzhiyun 769*4882a593SmuzhiyunIn order to trace a synthetic event, a pointer to the trace event file 770*4882a593Smuzhiyunis needed. The trace_get_event_file() function can be used to get 771*4882a593Smuzhiyunit - it will find the file in the given trace instance (in this case 772*4882a593SmuzhiyunNULL since the top trace array is being used) while at the same time 773*4882a593Smuzhiyunpreventing the instance containing it from going away:: 774*4882a593Smuzhiyun 775*4882a593Smuzhiyun schedtest_event_file = trace_get_event_file(NULL, "synthetic", 776*4882a593Smuzhiyun "schedtest"); 777*4882a593Smuzhiyun 778*4882a593SmuzhiyunBefore tracing the event, it should be enabled in some way, otherwise 779*4882a593Smuzhiyunthe synthetic event won't actually show up in the trace buffer. 780*4882a593Smuzhiyun 781*4882a593SmuzhiyunTo enable a synthetic event from the kernel, trace_array_set_clr_event() 782*4882a593Smuzhiyuncan be used (which is not specific to synthetic events, so does need 783*4882a593Smuzhiyunthe "synthetic" system name to be specified explicitly). 784*4882a593Smuzhiyun 785*4882a593SmuzhiyunTo enable the event, pass 'true' to it:: 786*4882a593Smuzhiyun 787*4882a593Smuzhiyun trace_array_set_clr_event(schedtest_event_file->tr, 788*4882a593Smuzhiyun "synthetic", "schedtest", true); 789*4882a593Smuzhiyun 790*4882a593SmuzhiyunTo disable it pass false:: 791*4882a593Smuzhiyun 792*4882a593Smuzhiyun trace_array_set_clr_event(schedtest_event_file->tr, 793*4882a593Smuzhiyun "synthetic", "schedtest", false); 794*4882a593Smuzhiyun 795*4882a593SmuzhiyunFinally, synth_event_trace_array() can be used to actually trace the 796*4882a593Smuzhiyunevent, which should be visible in the trace buffer afterwards:: 797*4882a593Smuzhiyun 798*4882a593Smuzhiyun ret = synth_event_trace_array(schedtest_event_file, vals, 799*4882a593Smuzhiyun ARRAY_SIZE(vals)); 800*4882a593Smuzhiyun 801*4882a593SmuzhiyunTo remove the synthetic event, the event should be disabled, and the 802*4882a593Smuzhiyuntrace instance should be 'put' back using trace_put_event_file():: 803*4882a593Smuzhiyun 804*4882a593Smuzhiyun trace_array_set_clr_event(schedtest_event_file->tr, 805*4882a593Smuzhiyun "synthetic", "schedtest", false); 806*4882a593Smuzhiyun trace_put_event_file(schedtest_event_file); 807*4882a593Smuzhiyun 808*4882a593SmuzhiyunIf those have been successful, synth_event_delete() can be called to 809*4882a593Smuzhiyunremove the event:: 810*4882a593Smuzhiyun 811*4882a593Smuzhiyun ret = synth_event_delete("schedtest"); 812*4882a593Smuzhiyun 813*4882a593Smuzhiyun7.2.2 Tracing a synthetic event piecewise 814*4882a593Smuzhiyun----------------------------------------- 815*4882a593Smuzhiyun 816*4882a593SmuzhiyunTo trace a synthetic using the piecewise method described above, the 817*4882a593Smuzhiyunsynth_event_trace_start() function is used to 'open' the synthetic 818*4882a593Smuzhiyunevent trace:: 819*4882a593Smuzhiyun 820*4882a593Smuzhiyun struct synth_trace_state trace_state; 821*4882a593Smuzhiyun 822*4882a593Smuzhiyun ret = synth_event_trace_start(schedtest_event_file, &trace_state); 823*4882a593Smuzhiyun 824*4882a593SmuzhiyunIt's passed the trace_event_file representing the synthetic event 825*4882a593Smuzhiyunusing the same methods as described above, along with a pointer to a 826*4882a593Smuzhiyunstruct synth_trace_state object, which will be zeroed before use and 827*4882a593Smuzhiyunused to maintain state between this and following calls. 828*4882a593Smuzhiyun 829*4882a593SmuzhiyunOnce the event has been opened, which means space for it has been 830*4882a593Smuzhiyunreserved in the trace buffer, the individual fields can be set. There 831*4882a593Smuzhiyunare two ways to do that, either one after another for each field in 832*4882a593Smuzhiyunthe event, which requires no lookups, or by name, which does. The 833*4882a593Smuzhiyuntradeoff is flexibility in doing the assignments vs the cost of a 834*4882a593Smuzhiyunlookup per field. 835*4882a593Smuzhiyun 836*4882a593SmuzhiyunTo assign the values one after the other without lookups, 837*4882a593Smuzhiyunsynth_event_add_next_val() should be used. Each call is passed the 838*4882a593Smuzhiyunsame synth_trace_state object used in the synth_event_trace_start(), 839*4882a593Smuzhiyunalong with the value to set the next field in the event. After each 840*4882a593Smuzhiyunfield is set, the 'cursor' points to the next field, which will be set 841*4882a593Smuzhiyunby the subsequent call, continuing until all the fields have been set 842*4882a593Smuzhiyunin order. The same sequence of calls as in the above examples using 843*4882a593Smuzhiyunthis method would be (without error-handling code):: 844*4882a593Smuzhiyun 845*4882a593Smuzhiyun /* next_pid_field */ 846*4882a593Smuzhiyun ret = synth_event_add_next_val(777, &trace_state); 847*4882a593Smuzhiyun 848*4882a593Smuzhiyun /* next_comm_field */ 849*4882a593Smuzhiyun ret = synth_event_add_next_val((u64)"slinky", &trace_state); 850*4882a593Smuzhiyun 851*4882a593Smuzhiyun /* ts_ns */ 852*4882a593Smuzhiyun ret = synth_event_add_next_val(1000000, &trace_state); 853*4882a593Smuzhiyun 854*4882a593Smuzhiyun /* ts_ms */ 855*4882a593Smuzhiyun ret = synth_event_add_next_val(1000, &trace_state); 856*4882a593Smuzhiyun 857*4882a593Smuzhiyun /* cpu */ 858*4882a593Smuzhiyun ret = synth_event_add_next_val(smp_processor_id(), &trace_state); 859*4882a593Smuzhiyun 860*4882a593Smuzhiyun /* my_string_field */ 861*4882a593Smuzhiyun ret = synth_event_add_next_val((u64)"thneed_2.01", &trace_state); 862*4882a593Smuzhiyun 863*4882a593Smuzhiyun /* my_int_field */ 864*4882a593Smuzhiyun ret = synth_event_add_next_val(395, &trace_state); 865*4882a593Smuzhiyun 866*4882a593SmuzhiyunTo assign the values in any order, synth_event_add_val() should be 867*4882a593Smuzhiyunused. Each call is passed the same synth_trace_state object used in 868*4882a593Smuzhiyunthe synth_event_trace_start(), along with the field name of the field 869*4882a593Smuzhiyunto set and the value to set it to. The same sequence of calls as in 870*4882a593Smuzhiyunthe above examples using this method would be (without error-handling 871*4882a593Smuzhiyuncode):: 872*4882a593Smuzhiyun 873*4882a593Smuzhiyun ret = synth_event_add_val("next_pid_field", 777, &trace_state); 874*4882a593Smuzhiyun ret = synth_event_add_val("next_comm_field", (u64)"silly putty", 875*4882a593Smuzhiyun &trace_state); 876*4882a593Smuzhiyun ret = synth_event_add_val("ts_ns", 1000000, &trace_state); 877*4882a593Smuzhiyun ret = synth_event_add_val("ts_ms", 1000, &trace_state); 878*4882a593Smuzhiyun ret = synth_event_add_val("cpu", smp_processor_id(), &trace_state); 879*4882a593Smuzhiyun ret = synth_event_add_val("my_string_field", (u64)"thneed_9", 880*4882a593Smuzhiyun &trace_state); 881*4882a593Smuzhiyun ret = synth_event_add_val("my_int_field", 3999, &trace_state); 882*4882a593Smuzhiyun 883*4882a593SmuzhiyunNote that synth_event_add_next_val() and synth_event_add_val() are 884*4882a593Smuzhiyunincompatible if used within the same trace of an event - either one 885*4882a593Smuzhiyuncan be used but not both at the same time. 886*4882a593Smuzhiyun 887*4882a593SmuzhiyunFinally, the event won't be actually traced until it's 'closed', 888*4882a593Smuzhiyunwhich is done using synth_event_trace_end(), which takes only the 889*4882a593Smuzhiyunstruct synth_trace_state object used in the previous calls:: 890*4882a593Smuzhiyun 891*4882a593Smuzhiyun ret = synth_event_trace_end(&trace_state); 892*4882a593Smuzhiyun 893*4882a593SmuzhiyunNote that synth_event_trace_end() must be called at the end regardless 894*4882a593Smuzhiyunof whether any of the add calls failed (say due to a bad field name 895*4882a593Smuzhiyunbeing passed in). 896*4882a593Smuzhiyun 897*4882a593Smuzhiyun7.3 Dyamically creating kprobe and kretprobe event definitions 898*4882a593Smuzhiyun-------------------------------------------------------------- 899*4882a593Smuzhiyun 900*4882a593SmuzhiyunTo create a kprobe or kretprobe trace event from kernel code, the 901*4882a593Smuzhiyunkprobe_event_gen_cmd_start() or kretprobe_event_gen_cmd_start() 902*4882a593Smuzhiyunfunctions can be used. 903*4882a593Smuzhiyun 904*4882a593SmuzhiyunTo create a kprobe event, an empty or partially empty kprobe event 905*4882a593Smuzhiyunshould first be created using kprobe_event_gen_cmd_start(). The name 906*4882a593Smuzhiyunof the event and the probe location should be specfied along with one 907*4882a593Smuzhiyunor args each representing a probe field should be supplied to this 908*4882a593Smuzhiyunfunction. Before calling kprobe_event_gen_cmd_start(), the user 909*4882a593Smuzhiyunshould create and initialize a dynevent_cmd object using 910*4882a593Smuzhiyunkprobe_event_cmd_init(). 911*4882a593Smuzhiyun 912*4882a593SmuzhiyunFor example, to create a new "schedtest" kprobe event with two fields:: 913*4882a593Smuzhiyun 914*4882a593Smuzhiyun struct dynevent_cmd cmd; 915*4882a593Smuzhiyun char *buf; 916*4882a593Smuzhiyun 917*4882a593Smuzhiyun /* Create a buffer to hold the generated command */ 918*4882a593Smuzhiyun buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL); 919*4882a593Smuzhiyun 920*4882a593Smuzhiyun /* Before generating the command, initialize the cmd object */ 921*4882a593Smuzhiyun kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN); 922*4882a593Smuzhiyun 923*4882a593Smuzhiyun /* 924*4882a593Smuzhiyun * Define the gen_kprobe_test event with the first 2 kprobe 925*4882a593Smuzhiyun * fields. 926*4882a593Smuzhiyun */ 927*4882a593Smuzhiyun ret = kprobe_event_gen_cmd_start(&cmd, "gen_kprobe_test", "do_sys_open", 928*4882a593Smuzhiyun "dfd=%ax", "filename=%dx"); 929*4882a593Smuzhiyun 930*4882a593SmuzhiyunOnce the kprobe event object has been created, it can then be 931*4882a593Smuzhiyunpopulated with more fields. Fields can be added using 932*4882a593Smuzhiyunkprobe_event_add_fields(), supplying the dynevent_cmd object along 933*4882a593Smuzhiyunwith a variable arg list of probe fields. For example, to add a 934*4882a593Smuzhiyuncouple additional fields, the following call could be made:: 935*4882a593Smuzhiyun 936*4882a593Smuzhiyun ret = kprobe_event_add_fields(&cmd, "flags=%cx", "mode=+4($stack)"); 937*4882a593Smuzhiyun 938*4882a593SmuzhiyunOnce all the fields have been added, the event should be finalized and 939*4882a593Smuzhiyunregistered by calling the kprobe_event_gen_cmd_end() or 940*4882a593Smuzhiyunkretprobe_event_gen_cmd_end() functions, depending on whether a kprobe 941*4882a593Smuzhiyunor kretprobe command was started:: 942*4882a593Smuzhiyun 943*4882a593Smuzhiyun ret = kprobe_event_gen_cmd_end(&cmd); 944*4882a593Smuzhiyun 945*4882a593Smuzhiyunor:: 946*4882a593Smuzhiyun 947*4882a593Smuzhiyun ret = kretprobe_event_gen_cmd_end(&cmd); 948*4882a593Smuzhiyun 949*4882a593SmuzhiyunAt this point, the event object is ready to be used for tracing new 950*4882a593Smuzhiyunevents. 951*4882a593Smuzhiyun 952*4882a593SmuzhiyunSimilarly, a kretprobe event can be created using 953*4882a593Smuzhiyunkretprobe_event_gen_cmd_start() with a probe name and location and 954*4882a593Smuzhiyunadditional params such as $retval:: 955*4882a593Smuzhiyun 956*4882a593Smuzhiyun ret = kretprobe_event_gen_cmd_start(&cmd, "gen_kretprobe_test", 957*4882a593Smuzhiyun "do_sys_open", "$retval"); 958*4882a593Smuzhiyun 959*4882a593SmuzhiyunSimilar to the synthetic event case, code like the following can be 960*4882a593Smuzhiyunused to enable the newly created kprobe event:: 961*4882a593Smuzhiyun 962*4882a593Smuzhiyun gen_kprobe_test = trace_get_event_file(NULL, "kprobes", "gen_kprobe_test"); 963*4882a593Smuzhiyun 964*4882a593Smuzhiyun ret = trace_array_set_clr_event(gen_kprobe_test->tr, 965*4882a593Smuzhiyun "kprobes", "gen_kprobe_test", true); 966*4882a593Smuzhiyun 967*4882a593SmuzhiyunFinally, also similar to synthetic events, the following code can be 968*4882a593Smuzhiyunused to give the kprobe event file back and delete the event:: 969*4882a593Smuzhiyun 970*4882a593Smuzhiyun trace_put_event_file(gen_kprobe_test); 971*4882a593Smuzhiyun 972*4882a593Smuzhiyun ret = kprobe_event_delete("gen_kprobe_test"); 973*4882a593Smuzhiyun 974*4882a593Smuzhiyun7.4 The "dynevent_cmd" low-level API 975*4882a593Smuzhiyun------------------------------------ 976*4882a593Smuzhiyun 977*4882a593SmuzhiyunBoth the in-kernel synthetic event and kprobe interfaces are built on 978*4882a593Smuzhiyuntop of a lower-level "dynevent_cmd" interface. This interface is 979*4882a593Smuzhiyunmeant to provide the basis for higher-level interfaces such as the 980*4882a593Smuzhiyunsynthetic and kprobe interfaces, which can be used as examples. 981*4882a593Smuzhiyun 982*4882a593SmuzhiyunThe basic idea is simple and amounts to providing a general-purpose 983*4882a593Smuzhiyunlayer that can be used to generate trace event commands. The 984*4882a593Smuzhiyungenerated command strings can then be passed to the command-parsing 985*4882a593Smuzhiyunand event creation code that already exists in the trace event 986*4882a593Smuzhiyunsubystem for creating the corresponding trace events. 987*4882a593Smuzhiyun 988*4882a593SmuzhiyunIn a nutshell, the way it works is that the higher-level interface 989*4882a593Smuzhiyuncode creates a struct dynevent_cmd object, then uses a couple 990*4882a593Smuzhiyunfunctions, dynevent_arg_add() and dynevent_arg_pair_add() to build up 991*4882a593Smuzhiyuna command string, which finally causes the command to be executed 992*4882a593Smuzhiyunusing the dynevent_create() function. The details of the interface 993*4882a593Smuzhiyunare described below. 994*4882a593Smuzhiyun 995*4882a593SmuzhiyunThe first step in building a new command string is to create and 996*4882a593Smuzhiyuninitialize an instance of a dynevent_cmd. Here, for instance, we 997*4882a593Smuzhiyuncreate a dynevent_cmd on the stack and initialize it:: 998*4882a593Smuzhiyun 999*4882a593Smuzhiyun struct dynevent_cmd cmd; 1000*4882a593Smuzhiyun char *buf; 1001*4882a593Smuzhiyun int ret; 1002*4882a593Smuzhiyun 1003*4882a593Smuzhiyun buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL); 1004*4882a593Smuzhiyun 1005*4882a593Smuzhiyun dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_FOO, 1006*4882a593Smuzhiyun foo_event_run_command); 1007*4882a593Smuzhiyun 1008*4882a593SmuzhiyunThe dynevent_cmd initialization needs to be given a user-specified 1009*4882a593Smuzhiyunbuffer and the length of the buffer (MAX_DYNEVENT_CMD_LEN can be used 1010*4882a593Smuzhiyunfor this purpose - at 2k it's generally too big to be comfortably put 1011*4882a593Smuzhiyunon the stack, so is dynamically allocated), a dynevent type id, which 1012*4882a593Smuzhiyunis meant to be used to check that further API calls are for the 1013*4882a593Smuzhiyuncorrect command type, and a pointer to an event-specific run_command() 1014*4882a593Smuzhiyuncallback that will be called to actually execute the event-specific 1015*4882a593Smuzhiyuncommand function. 1016*4882a593Smuzhiyun 1017*4882a593SmuzhiyunOnce that's done, the command string can by built up by successive 1018*4882a593Smuzhiyuncalls to argument-adding functions. 1019*4882a593Smuzhiyun 1020*4882a593SmuzhiyunTo add a single argument, define and initialize a struct dynevent_arg 1021*4882a593Smuzhiyunor struct dynevent_arg_pair object. Here's an example of the simplest 1022*4882a593Smuzhiyunpossible arg addition, which is simply to append the given string as 1023*4882a593Smuzhiyuna whitespace-separated argument to the command:: 1024*4882a593Smuzhiyun 1025*4882a593Smuzhiyun struct dynevent_arg arg; 1026*4882a593Smuzhiyun 1027*4882a593Smuzhiyun dynevent_arg_init(&arg, NULL, 0); 1028*4882a593Smuzhiyun 1029*4882a593Smuzhiyun arg.str = name; 1030*4882a593Smuzhiyun 1031*4882a593Smuzhiyun ret = dynevent_arg_add(cmd, &arg); 1032*4882a593Smuzhiyun 1033*4882a593SmuzhiyunThe arg object is first initialized using dynevent_arg_init() and in 1034*4882a593Smuzhiyunthis case the parameters are NULL or 0, which means there's no 1035*4882a593Smuzhiyunoptional sanity-checking function or separator appended to the end of 1036*4882a593Smuzhiyunthe arg. 1037*4882a593Smuzhiyun 1038*4882a593SmuzhiyunHere's another more complicated example using an 'arg pair', which is 1039*4882a593Smuzhiyunused to create an argument that consists of a couple components added 1040*4882a593Smuzhiyuntogether as a unit, for example, a 'type field_name;' arg or a simple 1041*4882a593Smuzhiyunexpression arg e.g. 'flags=%cx':: 1042*4882a593Smuzhiyun 1043*4882a593Smuzhiyun struct dynevent_arg_pair arg_pair; 1044*4882a593Smuzhiyun 1045*4882a593Smuzhiyun dynevent_arg_pair_init(&arg_pair, dynevent_foo_check_arg_fn, 0, ';'); 1046*4882a593Smuzhiyun 1047*4882a593Smuzhiyun arg_pair.lhs = type; 1048*4882a593Smuzhiyun arg_pair.rhs = name; 1049*4882a593Smuzhiyun 1050*4882a593Smuzhiyun ret = dynevent_arg_pair_add(cmd, &arg_pair); 1051*4882a593Smuzhiyun 1052*4882a593SmuzhiyunAgain, the arg_pair is first initialized, in this case with a callback 1053*4882a593Smuzhiyunfunction used to check the sanity of the args (for example, that 1054*4882a593Smuzhiyunneither part of the pair is NULL), along with a character to be used 1055*4882a593Smuzhiyunto add an operator between the pair (here none) and a separator to be 1056*4882a593Smuzhiyunappended onto the end of the arg pair (here ';'). 1057*4882a593Smuzhiyun 1058*4882a593SmuzhiyunThere's also a dynevent_str_add() function that can be used to simply 1059*4882a593Smuzhiyunadd a string as-is, with no spaces, delimeters, or arg check. 1060*4882a593Smuzhiyun 1061*4882a593SmuzhiyunAny number of dynevent_*_add() calls can be made to build up the string 1062*4882a593Smuzhiyun(until its length surpasses cmd->maxlen). When all the arguments have 1063*4882a593Smuzhiyunbeen added and the command string is complete, the only thing left to 1064*4882a593Smuzhiyundo is run the command, which happens by simply calling 1065*4882a593Smuzhiyundynevent_create():: 1066*4882a593Smuzhiyun 1067*4882a593Smuzhiyun ret = dynevent_create(&cmd); 1068*4882a593Smuzhiyun 1069*4882a593SmuzhiyunAt that point, if the return value is 0, the dynamic event has been 1070*4882a593Smuzhiyuncreated and is ready to use. 1071*4882a593Smuzhiyun 1072*4882a593SmuzhiyunSee the dynevent_cmd function definitions themselves for the details 1073*4882a593Smuzhiyunof the API. 1074