xref: /OK3568_Linux_fs/kernel/Documentation/trace/events.rst (revision 4882a59341e53eb6f0b4789bf948001014eff981)
1*4882a593Smuzhiyun=============
2*4882a593SmuzhiyunEvent Tracing
3*4882a593Smuzhiyun=============
4*4882a593Smuzhiyun
5*4882a593Smuzhiyun:Author: Theodore Ts'o
6*4882a593Smuzhiyun:Updated: Li Zefan and Tom Zanussi
7*4882a593Smuzhiyun
8*4882a593Smuzhiyun1. Introduction
9*4882a593Smuzhiyun===============
10*4882a593Smuzhiyun
11*4882a593SmuzhiyunTracepoints (see Documentation/trace/tracepoints.rst) can be used
12*4882a593Smuzhiyunwithout creating custom kernel modules to register probe functions
13*4882a593Smuzhiyunusing the event tracing infrastructure.
14*4882a593Smuzhiyun
15*4882a593SmuzhiyunNot all tracepoints can be traced using the event tracing system;
16*4882a593Smuzhiyunthe kernel developer must provide code snippets which define how the
17*4882a593Smuzhiyuntracing information is saved into the tracing buffer, and how the
18*4882a593Smuzhiyuntracing information should be printed.
19*4882a593Smuzhiyun
20*4882a593Smuzhiyun2. Using Event Tracing
21*4882a593Smuzhiyun======================
22*4882a593Smuzhiyun
23*4882a593Smuzhiyun2.1 Via the 'set_event' interface
24*4882a593Smuzhiyun---------------------------------
25*4882a593Smuzhiyun
26*4882a593SmuzhiyunThe events which are available for tracing can be found in the file
27*4882a593Smuzhiyun/sys/kernel/debug/tracing/available_events.
28*4882a593Smuzhiyun
29*4882a593SmuzhiyunTo enable a particular event, such as 'sched_wakeup', simply echo it
30*4882a593Smuzhiyunto /sys/kernel/debug/tracing/set_event. For example::
31*4882a593Smuzhiyun
32*4882a593Smuzhiyun	# echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
33*4882a593Smuzhiyun
34*4882a593Smuzhiyun.. Note:: '>>' is necessary, otherwise it will firstly disable all the events.
35*4882a593Smuzhiyun
36*4882a593SmuzhiyunTo disable an event, echo the event name to the set_event file prefixed
37*4882a593Smuzhiyunwith an exclamation point::
38*4882a593Smuzhiyun
39*4882a593Smuzhiyun	# echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
40*4882a593Smuzhiyun
41*4882a593SmuzhiyunTo disable all events, echo an empty line to the set_event file::
42*4882a593Smuzhiyun
43*4882a593Smuzhiyun	# echo > /sys/kernel/debug/tracing/set_event
44*4882a593Smuzhiyun
45*4882a593SmuzhiyunTo enable all events, echo ``*:*`` or ``*:`` to the set_event file::
46*4882a593Smuzhiyun
47*4882a593Smuzhiyun	# echo *:* > /sys/kernel/debug/tracing/set_event
48*4882a593Smuzhiyun
49*4882a593SmuzhiyunThe events are organized into subsystems, such as ext4, irq, sched,
50*4882a593Smuzhiyunetc., and a full event name looks like this: <subsystem>:<event>.  The
51*4882a593Smuzhiyunsubsystem name is optional, but it is displayed in the available_events
52*4882a593Smuzhiyunfile.  All of the events in a subsystem can be specified via the syntax
53*4882a593Smuzhiyun``<subsystem>:*``; for example, to enable all irq events, you can use the
54*4882a593Smuzhiyuncommand::
55*4882a593Smuzhiyun
56*4882a593Smuzhiyun	# echo 'irq:*' > /sys/kernel/debug/tracing/set_event
57*4882a593Smuzhiyun
58*4882a593Smuzhiyun2.2 Via the 'enable' toggle
59*4882a593Smuzhiyun---------------------------
60*4882a593Smuzhiyun
61*4882a593SmuzhiyunThe events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy
62*4882a593Smuzhiyunof directories.
63*4882a593Smuzhiyun
64*4882a593SmuzhiyunTo enable event 'sched_wakeup'::
65*4882a593Smuzhiyun
66*4882a593Smuzhiyun	# echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
67*4882a593Smuzhiyun
68*4882a593SmuzhiyunTo disable it::
69*4882a593Smuzhiyun
70*4882a593Smuzhiyun	# echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
71*4882a593Smuzhiyun
72*4882a593SmuzhiyunTo enable all events in sched subsystem::
73*4882a593Smuzhiyun
74*4882a593Smuzhiyun	# echo 1 > /sys/kernel/debug/tracing/events/sched/enable
75*4882a593Smuzhiyun
76*4882a593SmuzhiyunTo enable all events::
77*4882a593Smuzhiyun
78*4882a593Smuzhiyun	# echo 1 > /sys/kernel/debug/tracing/events/enable
79*4882a593Smuzhiyun
80*4882a593SmuzhiyunWhen reading one of these enable files, there are four results:
81*4882a593Smuzhiyun
82*4882a593Smuzhiyun - 0 - all events this file affects are disabled
83*4882a593Smuzhiyun - 1 - all events this file affects are enabled
84*4882a593Smuzhiyun - X - there is a mixture of events enabled and disabled
85*4882a593Smuzhiyun - ? - this file does not affect any event
86*4882a593Smuzhiyun
87*4882a593Smuzhiyun2.3 Boot option
88*4882a593Smuzhiyun---------------
89*4882a593Smuzhiyun
90*4882a593SmuzhiyunIn order to facilitate early boot debugging, use boot option::
91*4882a593Smuzhiyun
92*4882a593Smuzhiyun	trace_event=[event-list]
93*4882a593Smuzhiyun
94*4882a593Smuzhiyunevent-list is a comma separated list of events. See section 2.1 for event
95*4882a593Smuzhiyunformat.
96*4882a593Smuzhiyun
97*4882a593Smuzhiyun3. Defining an event-enabled tracepoint
98*4882a593Smuzhiyun=======================================
99*4882a593Smuzhiyun
100*4882a593SmuzhiyunSee The example provided in samples/trace_events
101*4882a593Smuzhiyun
102*4882a593Smuzhiyun4. Event formats
103*4882a593Smuzhiyun================
104*4882a593Smuzhiyun
105*4882a593SmuzhiyunEach trace event has a 'format' file associated with it that contains
106*4882a593Smuzhiyuna description of each field in a logged event.  This information can
107*4882a593Smuzhiyunbe used to parse the binary trace stream, and is also the place to
108*4882a593Smuzhiyunfind the field names that can be used in event filters (see section 5).
109*4882a593Smuzhiyun
110*4882a593SmuzhiyunIt also displays the format string that will be used to print the
111*4882a593Smuzhiyunevent in text mode, along with the event name and ID used for
112*4882a593Smuzhiyunprofiling.
113*4882a593Smuzhiyun
114*4882a593SmuzhiyunEvery event has a set of ``common`` fields associated with it; these are
115*4882a593Smuzhiyunthe fields prefixed with ``common_``.  The other fields vary between
116*4882a593Smuzhiyunevents and correspond to the fields defined in the TRACE_EVENT
117*4882a593Smuzhiyundefinition for that event.
118*4882a593Smuzhiyun
119*4882a593SmuzhiyunEach field in the format has the form::
120*4882a593Smuzhiyun
121*4882a593Smuzhiyun     field:field-type field-name; offset:N; size:N;
122*4882a593Smuzhiyun
123*4882a593Smuzhiyunwhere offset is the offset of the field in the trace record and size
124*4882a593Smuzhiyunis the size of the data item, in bytes.
125*4882a593Smuzhiyun
126*4882a593SmuzhiyunFor example, here's the information displayed for the 'sched_wakeup'
127*4882a593Smuzhiyunevent::
128*4882a593Smuzhiyun
129*4882a593Smuzhiyun	# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
130*4882a593Smuzhiyun
131*4882a593Smuzhiyun	name: sched_wakeup
132*4882a593Smuzhiyun	ID: 60
133*4882a593Smuzhiyun	format:
134*4882a593Smuzhiyun		field:unsigned short common_type;	offset:0;	size:2;
135*4882a593Smuzhiyun		field:unsigned char common_flags;	offset:2;	size:1;
136*4882a593Smuzhiyun		field:unsigned char common_preempt_count;	offset:3;	size:1;
137*4882a593Smuzhiyun		field:int common_pid;	offset:4;	size:4;
138*4882a593Smuzhiyun		field:int common_tgid;	offset:8;	size:4;
139*4882a593Smuzhiyun
140*4882a593Smuzhiyun		field:char comm[TASK_COMM_LEN];	offset:12;	size:16;
141*4882a593Smuzhiyun		field:pid_t pid;	offset:28;	size:4;
142*4882a593Smuzhiyun		field:int prio;	offset:32;	size:4;
143*4882a593Smuzhiyun		field:int success;	offset:36;	size:4;
144*4882a593Smuzhiyun		field:int cpu;	offset:40;	size:4;
145*4882a593Smuzhiyun
146*4882a593Smuzhiyun	print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
147*4882a593Smuzhiyun		   REC->prio, REC->success, REC->cpu
148*4882a593Smuzhiyun
149*4882a593SmuzhiyunThis event contains 10 fields, the first 5 common and the remaining 5
150*4882a593Smuzhiyunevent-specific.  All the fields for this event are numeric, except for
151*4882a593Smuzhiyun'comm' which is a string, a distinction important for event filtering.
152*4882a593Smuzhiyun
153*4882a593Smuzhiyun5. Event filtering
154*4882a593Smuzhiyun==================
155*4882a593Smuzhiyun
156*4882a593SmuzhiyunTrace events can be filtered in the kernel by associating boolean
157*4882a593Smuzhiyun'filter expressions' with them.  As soon as an event is logged into
158*4882a593Smuzhiyunthe trace buffer, its fields are checked against the filter expression
159*4882a593Smuzhiyunassociated with that event type.  An event with field values that
160*4882a593Smuzhiyun'match' the filter will appear in the trace output, and an event whose
161*4882a593Smuzhiyunvalues don't match will be discarded.  An event with no filter
162*4882a593Smuzhiyunassociated with it matches everything, and is the default when no
163*4882a593Smuzhiyunfilter has been set for an event.
164*4882a593Smuzhiyun
165*4882a593Smuzhiyun5.1 Expression syntax
166*4882a593Smuzhiyun---------------------
167*4882a593Smuzhiyun
168*4882a593SmuzhiyunA filter expression consists of one or more 'predicates' that can be
169*4882a593Smuzhiyuncombined using the logical operators '&&' and '||'.  A predicate is
170*4882a593Smuzhiyunsimply a clause that compares the value of a field contained within a
171*4882a593Smuzhiyunlogged event with a constant value and returns either 0 or 1 depending
172*4882a593Smuzhiyunon whether the field value matched (1) or didn't match (0)::
173*4882a593Smuzhiyun
174*4882a593Smuzhiyun	  field-name relational-operator value
175*4882a593Smuzhiyun
176*4882a593SmuzhiyunParentheses can be used to provide arbitrary logical groupings and
177*4882a593Smuzhiyundouble-quotes can be used to prevent the shell from interpreting
178*4882a593Smuzhiyunoperators as shell metacharacters.
179*4882a593Smuzhiyun
180*4882a593SmuzhiyunThe field-names available for use in filters can be found in the
181*4882a593Smuzhiyun'format' files for trace events (see section 4).
182*4882a593Smuzhiyun
183*4882a593SmuzhiyunThe relational-operators depend on the type of the field being tested:
184*4882a593Smuzhiyun
185*4882a593SmuzhiyunThe operators available for numeric fields are:
186*4882a593Smuzhiyun
187*4882a593Smuzhiyun==, !=, <, <=, >, >=, &
188*4882a593Smuzhiyun
189*4882a593SmuzhiyunAnd for string fields they are:
190*4882a593Smuzhiyun
191*4882a593Smuzhiyun==, !=, ~
192*4882a593Smuzhiyun
193*4882a593SmuzhiyunThe glob (~) accepts a wild card character (\*,?) and character classes
194*4882a593Smuzhiyun([). For example::
195*4882a593Smuzhiyun
196*4882a593Smuzhiyun  prev_comm ~ "*sh"
197*4882a593Smuzhiyun  prev_comm ~ "sh*"
198*4882a593Smuzhiyun  prev_comm ~ "*sh*"
199*4882a593Smuzhiyun  prev_comm ~ "ba*sh"
200*4882a593Smuzhiyun
201*4882a593SmuzhiyunIf the field is a pointer that points into user space (for example
202*4882a593Smuzhiyun"filename" from sys_enter_openat), then you have to append ".ustring" to the
203*4882a593Smuzhiyunfield name::
204*4882a593Smuzhiyun
205*4882a593Smuzhiyun  filename.ustring ~ "password"
206*4882a593Smuzhiyun
207*4882a593SmuzhiyunAs the kernel will have to know how to retrieve the memory that the pointer
208*4882a593Smuzhiyunis at from user space.
209*4882a593Smuzhiyun
210*4882a593Smuzhiyun5.2 Setting filters
211*4882a593Smuzhiyun-------------------
212*4882a593Smuzhiyun
213*4882a593SmuzhiyunA filter for an individual event is set by writing a filter expression
214*4882a593Smuzhiyunto the 'filter' file for the given event.
215*4882a593Smuzhiyun
216*4882a593SmuzhiyunFor example::
217*4882a593Smuzhiyun
218*4882a593Smuzhiyun	# cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
219*4882a593Smuzhiyun	# echo "common_preempt_count > 4" > filter
220*4882a593Smuzhiyun
221*4882a593SmuzhiyunA slightly more involved example::
222*4882a593Smuzhiyun
223*4882a593Smuzhiyun	# cd /sys/kernel/debug/tracing/events/signal/signal_generate
224*4882a593Smuzhiyun	# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
225*4882a593Smuzhiyun
226*4882a593SmuzhiyunIf there is an error in the expression, you'll get an 'Invalid
227*4882a593Smuzhiyunargument' error when setting it, and the erroneous string along with
228*4882a593Smuzhiyunan error message can be seen by looking at the filter e.g.::
229*4882a593Smuzhiyun
230*4882a593Smuzhiyun	# cd /sys/kernel/debug/tracing/events/signal/signal_generate
231*4882a593Smuzhiyun	# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
232*4882a593Smuzhiyun	-bash: echo: write error: Invalid argument
233*4882a593Smuzhiyun	# cat filter
234*4882a593Smuzhiyun	((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
235*4882a593Smuzhiyun	^
236*4882a593Smuzhiyun	parse_error: Field not found
237*4882a593Smuzhiyun
238*4882a593SmuzhiyunCurrently the caret ('^') for an error always appears at the beginning of
239*4882a593Smuzhiyunthe filter string; the error message should still be useful though
240*4882a593Smuzhiyuneven without more accurate position info.
241*4882a593Smuzhiyun
242*4882a593Smuzhiyun5.2.1 Filter limitations
243*4882a593Smuzhiyun------------------------
244*4882a593Smuzhiyun
245*4882a593SmuzhiyunIf a filter is placed on a string pointer ``(char *)`` that does not point
246*4882a593Smuzhiyunto a string on the ring buffer, but instead points to kernel or user space
247*4882a593Smuzhiyunmemory, then, for safety reasons, at most 1024 bytes of the content is
248*4882a593Smuzhiyuncopied onto a temporary buffer to do the compare. If the copy of the memory
249*4882a593Smuzhiyunfaults (the pointer points to memory that should not be accessed), then the
250*4882a593Smuzhiyunstring compare will be treated as not matching.
251*4882a593Smuzhiyun
252*4882a593Smuzhiyun5.3 Clearing filters
253*4882a593Smuzhiyun--------------------
254*4882a593Smuzhiyun
255*4882a593SmuzhiyunTo clear the filter for an event, write a '0' to the event's filter
256*4882a593Smuzhiyunfile.
257*4882a593Smuzhiyun
258*4882a593SmuzhiyunTo clear the filters for all events in a subsystem, write a '0' to the
259*4882a593Smuzhiyunsubsystem's filter file.
260*4882a593Smuzhiyun
261*4882a593Smuzhiyun5.3 Subsystem filters
262*4882a593Smuzhiyun---------------------
263*4882a593Smuzhiyun
264*4882a593SmuzhiyunFor convenience, filters for every event in a subsystem can be set or
265*4882a593Smuzhiyuncleared as a group by writing a filter expression into the filter file
266*4882a593Smuzhiyunat the root of the subsystem.  Note however, that if a filter for any
267*4882a593Smuzhiyunevent within the subsystem lacks a field specified in the subsystem
268*4882a593Smuzhiyunfilter, or if the filter can't be applied for any other reason, the
269*4882a593Smuzhiyunfilter for that event will retain its previous setting.  This can
270*4882a593Smuzhiyunresult in an unintended mixture of filters which could lead to
271*4882a593Smuzhiyunconfusing (to the user who might think different filters are in
272*4882a593Smuzhiyuneffect) trace output.  Only filters that reference just the common
273*4882a593Smuzhiyunfields can be guaranteed to propagate successfully to all events.
274*4882a593Smuzhiyun
275*4882a593SmuzhiyunHere are a few subsystem filter examples that also illustrate the
276*4882a593Smuzhiyunabove points:
277*4882a593Smuzhiyun
278*4882a593SmuzhiyunClear the filters on all events in the sched subsystem::
279*4882a593Smuzhiyun
280*4882a593Smuzhiyun	# cd /sys/kernel/debug/tracing/events/sched
281*4882a593Smuzhiyun	# echo 0 > filter
282*4882a593Smuzhiyun	# cat sched_switch/filter
283*4882a593Smuzhiyun	none
284*4882a593Smuzhiyun	# cat sched_wakeup/filter
285*4882a593Smuzhiyun	none
286*4882a593Smuzhiyun
287*4882a593SmuzhiyunSet a filter using only common fields for all events in the sched
288*4882a593Smuzhiyunsubsystem (all events end up with the same filter)::
289*4882a593Smuzhiyun
290*4882a593Smuzhiyun	# cd /sys/kernel/debug/tracing/events/sched
291*4882a593Smuzhiyun	# echo common_pid == 0 > filter
292*4882a593Smuzhiyun	# cat sched_switch/filter
293*4882a593Smuzhiyun	common_pid == 0
294*4882a593Smuzhiyun	# cat sched_wakeup/filter
295*4882a593Smuzhiyun	common_pid == 0
296*4882a593Smuzhiyun
297*4882a593SmuzhiyunAttempt to set a filter using a non-common field for all events in the
298*4882a593Smuzhiyunsched subsystem (all events but those that have a prev_pid field retain
299*4882a593Smuzhiyuntheir old filters)::
300*4882a593Smuzhiyun
301*4882a593Smuzhiyun	# cd /sys/kernel/debug/tracing/events/sched
302*4882a593Smuzhiyun	# echo prev_pid == 0 > filter
303*4882a593Smuzhiyun	# cat sched_switch/filter
304*4882a593Smuzhiyun	prev_pid == 0
305*4882a593Smuzhiyun	# cat sched_wakeup/filter
306*4882a593Smuzhiyun	common_pid == 0
307*4882a593Smuzhiyun
308*4882a593Smuzhiyun5.4 PID filtering
309*4882a593Smuzhiyun-----------------
310*4882a593Smuzhiyun
311*4882a593SmuzhiyunThe set_event_pid file in the same directory as the top events directory
312*4882a593Smuzhiyunexists, will filter all events from tracing any task that does not have the
313*4882a593SmuzhiyunPID listed in the set_event_pid file.
314*4882a593Smuzhiyun::
315*4882a593Smuzhiyun
316*4882a593Smuzhiyun	# cd /sys/kernel/debug/tracing
317*4882a593Smuzhiyun	# echo $$ > set_event_pid
318*4882a593Smuzhiyun	# echo 1 > events/enable
319*4882a593Smuzhiyun
320*4882a593SmuzhiyunWill only trace events for the current task.
321*4882a593Smuzhiyun
322*4882a593SmuzhiyunTo add more PIDs without losing the PIDs already included, use '>>'.
323*4882a593Smuzhiyun::
324*4882a593Smuzhiyun
325*4882a593Smuzhiyun	# echo 123 244 1 >> set_event_pid
326*4882a593Smuzhiyun
327*4882a593Smuzhiyun
328*4882a593Smuzhiyun6. Event triggers
329*4882a593Smuzhiyun=================
330*4882a593Smuzhiyun
331*4882a593SmuzhiyunTrace events can be made to conditionally invoke trigger 'commands'
332*4882a593Smuzhiyunwhich can take various forms and are described in detail below;
333*4882a593Smuzhiyunexamples would be enabling or disabling other trace events or invoking
334*4882a593Smuzhiyuna stack trace whenever the trace event is hit.  Whenever a trace event
335*4882a593Smuzhiyunwith attached triggers is invoked, the set of trigger commands
336*4882a593Smuzhiyunassociated with that event is invoked.  Any given trigger can
337*4882a593Smuzhiyunadditionally have an event filter of the same form as described in
338*4882a593Smuzhiyunsection 5 (Event filtering) associated with it - the command will only
339*4882a593Smuzhiyunbe invoked if the event being invoked passes the associated filter.
340*4882a593SmuzhiyunIf no filter is associated with the trigger, it always passes.
341*4882a593Smuzhiyun
342*4882a593SmuzhiyunTriggers are added to and removed from a particular event by writing
343*4882a593Smuzhiyuntrigger expressions to the 'trigger' file for the given event.
344*4882a593Smuzhiyun
345*4882a593SmuzhiyunA given event can have any number of triggers associated with it,
346*4882a593Smuzhiyunsubject to any restrictions that individual commands may have in that
347*4882a593Smuzhiyunregard.
348*4882a593Smuzhiyun
349*4882a593SmuzhiyunEvent triggers are implemented on top of "soft" mode, which means that
350*4882a593Smuzhiyunwhenever a trace event has one or more triggers associated with it,
351*4882a593Smuzhiyunthe event is activated even if it isn't actually enabled, but is
352*4882a593Smuzhiyundisabled in a "soft" mode.  That is, the tracepoint will be called,
353*4882a593Smuzhiyunbut just will not be traced, unless of course it's actually enabled.
354*4882a593SmuzhiyunThis scheme allows triggers to be invoked even for events that aren't
355*4882a593Smuzhiyunenabled, and also allows the current event filter implementation to be
356*4882a593Smuzhiyunused for conditionally invoking triggers.
357*4882a593Smuzhiyun
358*4882a593SmuzhiyunThe syntax for event triggers is roughly based on the syntax for
359*4882a593Smuzhiyunset_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
360*4882a593Smuzhiyunsection of Documentation/trace/ftrace.rst), but there are major
361*4882a593Smuzhiyundifferences and the implementation isn't currently tied to it in any
362*4882a593Smuzhiyunway, so beware about making generalizations between the two.
363*4882a593Smuzhiyun
364*4882a593Smuzhiyun.. Note::
365*4882a593Smuzhiyun     Writing into trace_marker (See Documentation/trace/ftrace.rst)
366*4882a593Smuzhiyun     can also enable triggers that are written into
367*4882a593Smuzhiyun     /sys/kernel/tracing/events/ftrace/print/trigger
368*4882a593Smuzhiyun
369*4882a593Smuzhiyun6.1 Expression syntax
370*4882a593Smuzhiyun---------------------
371*4882a593Smuzhiyun
372*4882a593SmuzhiyunTriggers are added by echoing the command to the 'trigger' file::
373*4882a593Smuzhiyun
374*4882a593Smuzhiyun  # echo 'command[:count] [if filter]' > trigger
375*4882a593Smuzhiyun
376*4882a593SmuzhiyunTriggers are removed by echoing the same command but starting with '!'
377*4882a593Smuzhiyunto the 'trigger' file::
378*4882a593Smuzhiyun
379*4882a593Smuzhiyun  # echo '!command[:count] [if filter]' > trigger
380*4882a593Smuzhiyun
381*4882a593SmuzhiyunThe [if filter] part isn't used in matching commands when removing, so
382*4882a593Smuzhiyunleaving that off in a '!' command will accomplish the same thing as
383*4882a593Smuzhiyunhaving it in.
384*4882a593Smuzhiyun
385*4882a593SmuzhiyunThe filter syntax is the same as that described in the 'Event
386*4882a593Smuzhiyunfiltering' section above.
387*4882a593Smuzhiyun
388*4882a593SmuzhiyunFor ease of use, writing to the trigger file using '>' currently just
389*4882a593Smuzhiyunadds or removes a single trigger and there's no explicit '>>' support
390*4882a593Smuzhiyun('>' actually behaves like '>>') or truncation support to remove all
391*4882a593Smuzhiyuntriggers (you have to use '!' for each one added.)
392*4882a593Smuzhiyun
393*4882a593Smuzhiyun6.2 Supported trigger commands
394*4882a593Smuzhiyun------------------------------
395*4882a593Smuzhiyun
396*4882a593SmuzhiyunThe following commands are supported:
397*4882a593Smuzhiyun
398*4882a593Smuzhiyun- enable_event/disable_event
399*4882a593Smuzhiyun
400*4882a593Smuzhiyun  These commands can enable or disable another trace event whenever
401*4882a593Smuzhiyun  the triggering event is hit.  When these commands are registered,
402*4882a593Smuzhiyun  the other trace event is activated, but disabled in a "soft" mode.
403*4882a593Smuzhiyun  That is, the tracepoint will be called, but just will not be traced.
404*4882a593Smuzhiyun  The event tracepoint stays in this mode as long as there's a trigger
405*4882a593Smuzhiyun  in effect that can trigger it.
406*4882a593Smuzhiyun
407*4882a593Smuzhiyun  For example, the following trigger causes kmalloc events to be
408*4882a593Smuzhiyun  traced when a read system call is entered, and the :1 at the end
409*4882a593Smuzhiyun  specifies that this enablement happens only once::
410*4882a593Smuzhiyun
411*4882a593Smuzhiyun	  # echo 'enable_event:kmem:kmalloc:1' > \
412*4882a593Smuzhiyun	      /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
413*4882a593Smuzhiyun
414*4882a593Smuzhiyun  The following trigger causes kmalloc events to stop being traced
415*4882a593Smuzhiyun  when a read system call exits.  This disablement happens on every
416*4882a593Smuzhiyun  read system call exit::
417*4882a593Smuzhiyun
418*4882a593Smuzhiyun	  # echo 'disable_event:kmem:kmalloc' > \
419*4882a593Smuzhiyun	      /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
420*4882a593Smuzhiyun
421*4882a593Smuzhiyun  The format is::
422*4882a593Smuzhiyun
423*4882a593Smuzhiyun      enable_event:<system>:<event>[:count]
424*4882a593Smuzhiyun      disable_event:<system>:<event>[:count]
425*4882a593Smuzhiyun
426*4882a593Smuzhiyun  To remove the above commands::
427*4882a593Smuzhiyun
428*4882a593Smuzhiyun	  # echo '!enable_event:kmem:kmalloc:1' > \
429*4882a593Smuzhiyun	      /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
430*4882a593Smuzhiyun
431*4882a593Smuzhiyun	  # echo '!disable_event:kmem:kmalloc' > \
432*4882a593Smuzhiyun	      /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
433*4882a593Smuzhiyun
434*4882a593Smuzhiyun  Note that there can be any number of enable/disable_event triggers
435*4882a593Smuzhiyun  per triggering event, but there can only be one trigger per
436*4882a593Smuzhiyun  triggered event. e.g. sys_enter_read can have triggers enabling both
437*4882a593Smuzhiyun  kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
438*4882a593Smuzhiyun  versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
439*4882a593Smuzhiyun  bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
440*4882a593Smuzhiyun  could be combined into a single filter on kmem:kmalloc though).
441*4882a593Smuzhiyun
442*4882a593Smuzhiyun- stacktrace
443*4882a593Smuzhiyun
444*4882a593Smuzhiyun  This command dumps a stacktrace in the trace buffer whenever the
445*4882a593Smuzhiyun  triggering event occurs.
446*4882a593Smuzhiyun
447*4882a593Smuzhiyun  For example, the following trigger dumps a stacktrace every time the
448*4882a593Smuzhiyun  kmalloc tracepoint is hit::
449*4882a593Smuzhiyun
450*4882a593Smuzhiyun	  # echo 'stacktrace' > \
451*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
452*4882a593Smuzhiyun
453*4882a593Smuzhiyun  The following trigger dumps a stacktrace the first 5 times a kmalloc
454*4882a593Smuzhiyun  request happens with a size >= 64K::
455*4882a593Smuzhiyun
456*4882a593Smuzhiyun	  # echo 'stacktrace:5 if bytes_req >= 65536' > \
457*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
458*4882a593Smuzhiyun
459*4882a593Smuzhiyun  The format is::
460*4882a593Smuzhiyun
461*4882a593Smuzhiyun      stacktrace[:count]
462*4882a593Smuzhiyun
463*4882a593Smuzhiyun  To remove the above commands::
464*4882a593Smuzhiyun
465*4882a593Smuzhiyun	  # echo '!stacktrace' > \
466*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
467*4882a593Smuzhiyun
468*4882a593Smuzhiyun	  # echo '!stacktrace:5 if bytes_req >= 65536' > \
469*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
470*4882a593Smuzhiyun
471*4882a593Smuzhiyun  The latter can also be removed more simply by the following (without
472*4882a593Smuzhiyun  the filter)::
473*4882a593Smuzhiyun
474*4882a593Smuzhiyun	  # echo '!stacktrace:5' > \
475*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
476*4882a593Smuzhiyun
477*4882a593Smuzhiyun  Note that there can be only one stacktrace trigger per triggering
478*4882a593Smuzhiyun  event.
479*4882a593Smuzhiyun
480*4882a593Smuzhiyun- snapshot
481*4882a593Smuzhiyun
482*4882a593Smuzhiyun  This command causes a snapshot to be triggered whenever the
483*4882a593Smuzhiyun  triggering event occurs.
484*4882a593Smuzhiyun
485*4882a593Smuzhiyun  The following command creates a snapshot every time a block request
486*4882a593Smuzhiyun  queue is unplugged with a depth > 1.  If you were tracing a set of
487*4882a593Smuzhiyun  events or functions at the time, the snapshot trace buffer would
488*4882a593Smuzhiyun  capture those events when the trigger event occurred::
489*4882a593Smuzhiyun
490*4882a593Smuzhiyun	  # echo 'snapshot if nr_rq > 1' > \
491*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/block/block_unplug/trigger
492*4882a593Smuzhiyun
493*4882a593Smuzhiyun  To only snapshot once::
494*4882a593Smuzhiyun
495*4882a593Smuzhiyun	  # echo 'snapshot:1 if nr_rq > 1' > \
496*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/block/block_unplug/trigger
497*4882a593Smuzhiyun
498*4882a593Smuzhiyun  To remove the above commands::
499*4882a593Smuzhiyun
500*4882a593Smuzhiyun	  # echo '!snapshot if nr_rq > 1' > \
501*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/block/block_unplug/trigger
502*4882a593Smuzhiyun
503*4882a593Smuzhiyun	  # echo '!snapshot:1 if nr_rq > 1' > \
504*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/block/block_unplug/trigger
505*4882a593Smuzhiyun
506*4882a593Smuzhiyun  Note that there can be only one snapshot trigger per triggering
507*4882a593Smuzhiyun  event.
508*4882a593Smuzhiyun
509*4882a593Smuzhiyun- traceon/traceoff
510*4882a593Smuzhiyun
511*4882a593Smuzhiyun  These commands turn tracing on and off when the specified events are
512*4882a593Smuzhiyun  hit. The parameter determines how many times the tracing system is
513*4882a593Smuzhiyun  turned on and off. If unspecified, there is no limit.
514*4882a593Smuzhiyun
515*4882a593Smuzhiyun  The following command turns tracing off the first time a block
516*4882a593Smuzhiyun  request queue is unplugged with a depth > 1.  If you were tracing a
517*4882a593Smuzhiyun  set of events or functions at the time, you could then examine the
518*4882a593Smuzhiyun  trace buffer to see the sequence of events that led up to the
519*4882a593Smuzhiyun  trigger event::
520*4882a593Smuzhiyun
521*4882a593Smuzhiyun	  # echo 'traceoff:1 if nr_rq > 1' > \
522*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/block/block_unplug/trigger
523*4882a593Smuzhiyun
524*4882a593Smuzhiyun  To always disable tracing when nr_rq  > 1::
525*4882a593Smuzhiyun
526*4882a593Smuzhiyun	  # echo 'traceoff if nr_rq > 1' > \
527*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/block/block_unplug/trigger
528*4882a593Smuzhiyun
529*4882a593Smuzhiyun  To remove the above commands::
530*4882a593Smuzhiyun
531*4882a593Smuzhiyun	  # echo '!traceoff:1 if nr_rq > 1' > \
532*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/block/block_unplug/trigger
533*4882a593Smuzhiyun
534*4882a593Smuzhiyun	  # echo '!traceoff if nr_rq > 1' > \
535*4882a593Smuzhiyun		/sys/kernel/debug/tracing/events/block/block_unplug/trigger
536*4882a593Smuzhiyun
537*4882a593Smuzhiyun  Note that there can be only one traceon or traceoff trigger per
538*4882a593Smuzhiyun  triggering event.
539*4882a593Smuzhiyun
540*4882a593Smuzhiyun- hist
541*4882a593Smuzhiyun
542*4882a593Smuzhiyun  This command aggregates event hits into a hash table keyed on one or
543*4882a593Smuzhiyun  more trace event format fields (or stacktrace) and a set of running
544*4882a593Smuzhiyun  totals derived from one or more trace event format fields and/or
545*4882a593Smuzhiyun  event counts (hitcount).
546*4882a593Smuzhiyun
547*4882a593Smuzhiyun  See Documentation/trace/histogram.rst for details and examples.
548*4882a593Smuzhiyun
549*4882a593Smuzhiyun7. In-kernel trace event API
550*4882a593Smuzhiyun============================
551*4882a593Smuzhiyun
552*4882a593SmuzhiyunIn most cases, the command-line interface to trace events is more than
553*4882a593Smuzhiyunsufficient.  Sometimes, however, applications might find the need for
554*4882a593Smuzhiyunmore complex relationships than can be expressed through a simple
555*4882a593Smuzhiyunseries of linked command-line expressions, or putting together sets of
556*4882a593Smuzhiyuncommands may be simply too cumbersome.  An example might be an
557*4882a593Smuzhiyunapplication that needs to 'listen' to the trace stream in order to
558*4882a593Smuzhiyunmaintain an in-kernel state machine detecting, for instance, when an
559*4882a593Smuzhiyunillegal kernel state occurs in the scheduler.
560*4882a593Smuzhiyun
561*4882a593SmuzhiyunThe trace event subsystem provides an in-kernel API allowing modules
562*4882a593Smuzhiyunor other kernel code to generate user-defined 'synthetic' events at
563*4882a593Smuzhiyunwill, which can be used to either augment the existing trace stream
564*4882a593Smuzhiyunand/or signal that a particular important state has occurred.
565*4882a593Smuzhiyun
566*4882a593SmuzhiyunA similar in-kernel API is also available for creating kprobe and
567*4882a593Smuzhiyunkretprobe events.
568*4882a593Smuzhiyun
569*4882a593SmuzhiyunBoth the synthetic event and k/ret/probe event APIs are built on top
570*4882a593Smuzhiyunof a lower-level "dynevent_cmd" event command API, which is also
571*4882a593Smuzhiyunavailable for more specialized applications, or as the basis of other
572*4882a593Smuzhiyunhigher-level trace event APIs.
573*4882a593Smuzhiyun
574*4882a593SmuzhiyunThe API provided for these purposes is describe below and allows the
575*4882a593Smuzhiyunfollowing:
576*4882a593Smuzhiyun
577*4882a593Smuzhiyun  - dynamically creating synthetic event definitions
578*4882a593Smuzhiyun  - dynamically creating kprobe and kretprobe event definitions
579*4882a593Smuzhiyun  - tracing synthetic events from in-kernel code
580*4882a593Smuzhiyun  - the low-level "dynevent_cmd" API
581*4882a593Smuzhiyun
582*4882a593Smuzhiyun7.1 Dyamically creating synthetic event definitions
583*4882a593Smuzhiyun---------------------------------------------------
584*4882a593Smuzhiyun
585*4882a593SmuzhiyunThere are a couple ways to create a new synthetic event from a kernel
586*4882a593Smuzhiyunmodule or other kernel code.
587*4882a593Smuzhiyun
588*4882a593SmuzhiyunThe first creates the event in one step, using synth_event_create().
589*4882a593SmuzhiyunIn this method, the name of the event to create and an array defining
590*4882a593Smuzhiyunthe fields is supplied to synth_event_create().  If successful, a
591*4882a593Smuzhiyunsynthetic event with that name and fields will exist following that
592*4882a593Smuzhiyuncall.  For example, to create a new "schedtest" synthetic event::
593*4882a593Smuzhiyun
594*4882a593Smuzhiyun  ret = synth_event_create("schedtest", sched_fields,
595*4882a593Smuzhiyun                           ARRAY_SIZE(sched_fields), THIS_MODULE);
596*4882a593Smuzhiyun
597*4882a593SmuzhiyunThe sched_fields param in this example points to an array of struct
598*4882a593Smuzhiyunsynth_field_desc, each of which describes an event field by type and
599*4882a593Smuzhiyunname::
600*4882a593Smuzhiyun
601*4882a593Smuzhiyun  static struct synth_field_desc sched_fields[] = {
602*4882a593Smuzhiyun        { .type = "pid_t",              .name = "next_pid_field" },
603*4882a593Smuzhiyun        { .type = "char[16]",           .name = "next_comm_field" },
604*4882a593Smuzhiyun        { .type = "u64",                .name = "ts_ns" },
605*4882a593Smuzhiyun        { .type = "u64",                .name = "ts_ms" },
606*4882a593Smuzhiyun        { .type = "unsigned int",       .name = "cpu" },
607*4882a593Smuzhiyun        { .type = "char[64]",           .name = "my_string_field" },
608*4882a593Smuzhiyun        { .type = "int",                .name = "my_int_field" },
609*4882a593Smuzhiyun  };
610*4882a593Smuzhiyun
611*4882a593SmuzhiyunSee synth_field_size() for available types.
612*4882a593Smuzhiyun
613*4882a593SmuzhiyunIf field_name contains [n], the field is considered to be a static array.
614*4882a593Smuzhiyun
615*4882a593SmuzhiyunIf field_names contains[] (no subscript), the field is considered to
616*4882a593Smuzhiyunbe a dynamic array, which will only take as much space in the event as
617*4882a593Smuzhiyunis required to hold the array.
618*4882a593Smuzhiyun
619*4882a593SmuzhiyunBecause space for an event is reserved before assigning field values
620*4882a593Smuzhiyunto the event, using dynamic arrays implies that the piecewise
621*4882a593Smuzhiyunin-kernel API described below can't be used with dynamic arrays.  The
622*4882a593Smuzhiyunother non-piecewise in-kernel APIs can, however, be used with dynamic
623*4882a593Smuzhiyunarrays.
624*4882a593Smuzhiyun
625*4882a593SmuzhiyunIf the event is created from within a module, a pointer to the module
626*4882a593Smuzhiyunmust be passed to synth_event_create().  This will ensure that the
627*4882a593Smuzhiyuntrace buffer won't contain unreadable events when the module is
628*4882a593Smuzhiyunremoved.
629*4882a593Smuzhiyun
630*4882a593SmuzhiyunAt this point, the event object is ready to be used for generating new
631*4882a593Smuzhiyunevents.
632*4882a593Smuzhiyun
633*4882a593SmuzhiyunIn the second method, the event is created in several steps.  This
634*4882a593Smuzhiyunallows events to be created dynamically and without the need to create
635*4882a593Smuzhiyunand populate an array of fields beforehand.
636*4882a593Smuzhiyun
637*4882a593SmuzhiyunTo use this method, an empty or partially empty synthetic event should
638*4882a593Smuzhiyunfirst be created using synth_event_gen_cmd_start() or
639*4882a593Smuzhiyunsynth_event_gen_cmd_array_start().  For synth_event_gen_cmd_start(),
640*4882a593Smuzhiyunthe name of the event along with one or more pairs of args each pair
641*4882a593Smuzhiyunrepresenting a 'type field_name;' field specification should be
642*4882a593Smuzhiyunsupplied.  For synth_event_gen_cmd_array_start(), the name of the
643*4882a593Smuzhiyunevent along with an array of struct synth_field_desc should be
644*4882a593Smuzhiyunsupplied. Before calling synth_event_gen_cmd_start() or
645*4882a593Smuzhiyunsynth_event_gen_cmd_array_start(), the user should create and
646*4882a593Smuzhiyuninitialize a dynevent_cmd object using synth_event_cmd_init().
647*4882a593Smuzhiyun
648*4882a593SmuzhiyunFor example, to create a new "schedtest" synthetic event with two
649*4882a593Smuzhiyunfields::
650*4882a593Smuzhiyun
651*4882a593Smuzhiyun  struct dynevent_cmd cmd;
652*4882a593Smuzhiyun  char *buf;
653*4882a593Smuzhiyun
654*4882a593Smuzhiyun  /* Create a buffer to hold the generated command */
655*4882a593Smuzhiyun  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
656*4882a593Smuzhiyun
657*4882a593Smuzhiyun  /* Before generating the command, initialize the cmd object */
658*4882a593Smuzhiyun  synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
659*4882a593Smuzhiyun
660*4882a593Smuzhiyun  ret = synth_event_gen_cmd_start(&cmd, "schedtest", THIS_MODULE,
661*4882a593Smuzhiyun                                  "pid_t", "next_pid_field",
662*4882a593Smuzhiyun                                  "u64", "ts_ns");
663*4882a593Smuzhiyun
664*4882a593SmuzhiyunAlternatively, using an array of struct synth_field_desc fields
665*4882a593Smuzhiyuncontaining the same information::
666*4882a593Smuzhiyun
667*4882a593Smuzhiyun  ret = synth_event_gen_cmd_array_start(&cmd, "schedtest", THIS_MODULE,
668*4882a593Smuzhiyun                                        fields, n_fields);
669*4882a593Smuzhiyun
670*4882a593SmuzhiyunOnce the synthetic event object has been created, it can then be
671*4882a593Smuzhiyunpopulated with more fields.  Fields are added one by one using
672*4882a593Smuzhiyunsynth_event_add_field(), supplying the dynevent_cmd object, a field
673*4882a593Smuzhiyuntype, and a field name.  For example, to add a new int field named
674*4882a593Smuzhiyun"intfield", the following call should be made::
675*4882a593Smuzhiyun
676*4882a593Smuzhiyun  ret = synth_event_add_field(&cmd, "int", "intfield");
677*4882a593Smuzhiyun
678*4882a593SmuzhiyunSee synth_field_size() for available types. If field_name contains [n]
679*4882a593Smuzhiyunthe field is considered to be an array.
680*4882a593Smuzhiyun
681*4882a593SmuzhiyunA group of fields can also be added all at once using an array of
682*4882a593Smuzhiyunsynth_field_desc with add_synth_fields().  For example, this would add
683*4882a593Smuzhiyunjust the first four sched_fields::
684*4882a593Smuzhiyun
685*4882a593Smuzhiyun  ret = synth_event_add_fields(&cmd, sched_fields, 4);
686*4882a593Smuzhiyun
687*4882a593SmuzhiyunIf you already have a string of the form 'type field_name',
688*4882a593Smuzhiyunsynth_event_add_field_str() can be used to add it as-is; it will
689*4882a593Smuzhiyunalso automatically append a ';' to the string.
690*4882a593Smuzhiyun
691*4882a593SmuzhiyunOnce all the fields have been added, the event should be finalized and
692*4882a593Smuzhiyunregistered by calling the synth_event_gen_cmd_end() function::
693*4882a593Smuzhiyun
694*4882a593Smuzhiyun  ret = synth_event_gen_cmd_end(&cmd);
695*4882a593Smuzhiyun
696*4882a593SmuzhiyunAt this point, the event object is ready to be used for tracing new
697*4882a593Smuzhiyunevents.
698*4882a593Smuzhiyun
699*4882a593Smuzhiyun7.2 Tracing synthetic events from in-kernel code
700*4882a593Smuzhiyun------------------------------------------------
701*4882a593Smuzhiyun
702*4882a593SmuzhiyunTo trace a synthetic event, there are several options.  The first
703*4882a593Smuzhiyunoption is to trace the event in one call, using synth_event_trace()
704*4882a593Smuzhiyunwith a variable number of values, or synth_event_trace_array() with an
705*4882a593Smuzhiyunarray of values to be set.  A second option can be used to avoid the
706*4882a593Smuzhiyunneed for a pre-formed array of values or list of arguments, via
707*4882a593Smuzhiyunsynth_event_trace_start() and synth_event_trace_end() along with
708*4882a593Smuzhiyunsynth_event_add_next_val() or synth_event_add_val() to add the values
709*4882a593Smuzhiyunpiecewise.
710*4882a593Smuzhiyun
711*4882a593Smuzhiyun7.2.1 Tracing a synthetic event all at once
712*4882a593Smuzhiyun-------------------------------------------
713*4882a593Smuzhiyun
714*4882a593SmuzhiyunTo trace a synthetic event all at once, the synth_event_trace() or
715*4882a593Smuzhiyunsynth_event_trace_array() functions can be used.
716*4882a593Smuzhiyun
717*4882a593SmuzhiyunThe synth_event_trace() function is passed the trace_event_file
718*4882a593Smuzhiyunrepresenting the synthetic event (which can be retrieved using
719*4882a593Smuzhiyuntrace_get_event_file() using the synthetic event name, "synthetic" as
720*4882a593Smuzhiyunthe system name, and the trace instance name (NULL if using the global
721*4882a593Smuzhiyuntrace array)), along with an variable number of u64 args, one for each
722*4882a593Smuzhiyunsynthetic event field, and the number of values being passed.
723*4882a593Smuzhiyun
724*4882a593SmuzhiyunSo, to trace an event corresponding to the synthetic event definition
725*4882a593Smuzhiyunabove, code like the following could be used::
726*4882a593Smuzhiyun
727*4882a593Smuzhiyun  ret = synth_event_trace(create_synth_test, 7, /* number of values */
728*4882a593Smuzhiyun                          444,             /* next_pid_field */
729*4882a593Smuzhiyun                          (u64)"clackers", /* next_comm_field */
730*4882a593Smuzhiyun                          1000000,         /* ts_ns */
731*4882a593Smuzhiyun                          1000,            /* ts_ms */
732*4882a593Smuzhiyun                          smp_processor_id(),/* cpu */
733*4882a593Smuzhiyun                          (u64)"Thneed",   /* my_string_field */
734*4882a593Smuzhiyun                          999);            /* my_int_field */
735*4882a593Smuzhiyun
736*4882a593SmuzhiyunAll vals should be cast to u64, and string vals are just pointers to
737*4882a593Smuzhiyunstrings, cast to u64.  Strings will be copied into space reserved in
738*4882a593Smuzhiyunthe event for the string, using these pointers.
739*4882a593Smuzhiyun
740*4882a593SmuzhiyunAlternatively, the synth_event_trace_array() function can be used to
741*4882a593Smuzhiyunaccomplish the same thing.  It is passed the trace_event_file
742*4882a593Smuzhiyunrepresenting the synthetic event (which can be retrieved using
743*4882a593Smuzhiyuntrace_get_event_file() using the synthetic event name, "synthetic" as
744*4882a593Smuzhiyunthe system name, and the trace instance name (NULL if using the global
745*4882a593Smuzhiyuntrace array)), along with an array of u64, one for each synthetic
746*4882a593Smuzhiyunevent field.
747*4882a593Smuzhiyun
748*4882a593SmuzhiyunTo trace an event corresponding to the synthetic event definition
749*4882a593Smuzhiyunabove, code like the following could be used::
750*4882a593Smuzhiyun
751*4882a593Smuzhiyun  u64 vals[7];
752*4882a593Smuzhiyun
753*4882a593Smuzhiyun  vals[0] = 777;                  /* next_pid_field */
754*4882a593Smuzhiyun  vals[1] = (u64)"tiddlywinks";   /* next_comm_field */
755*4882a593Smuzhiyun  vals[2] = 1000000;              /* ts_ns */
756*4882a593Smuzhiyun  vals[3] = 1000;                 /* ts_ms */
757*4882a593Smuzhiyun  vals[4] = smp_processor_id();   /* cpu */
758*4882a593Smuzhiyun  vals[5] = (u64)"thneed";        /* my_string_field */
759*4882a593Smuzhiyun  vals[6] = 398;                  /* my_int_field */
760*4882a593Smuzhiyun
761*4882a593SmuzhiyunThe 'vals' array is just an array of u64, the number of which must
762*4882a593Smuzhiyunmatch the number of field in the synthetic event, and which must be in
763*4882a593Smuzhiyunthe same order as the synthetic event fields.
764*4882a593Smuzhiyun
765*4882a593SmuzhiyunAll vals should be cast to u64, and string vals are just pointers to
766*4882a593Smuzhiyunstrings, cast to u64.  Strings will be copied into space reserved in
767*4882a593Smuzhiyunthe event for the string, using these pointers.
768*4882a593Smuzhiyun
769*4882a593SmuzhiyunIn order to trace a synthetic event, a pointer to the trace event file
770*4882a593Smuzhiyunis needed.  The trace_get_event_file() function can be used to get
771*4882a593Smuzhiyunit - it will find the file in the given trace instance (in this case
772*4882a593SmuzhiyunNULL since the top trace array is being used) while at the same time
773*4882a593Smuzhiyunpreventing the instance containing it from going away::
774*4882a593Smuzhiyun
775*4882a593Smuzhiyun       schedtest_event_file = trace_get_event_file(NULL, "synthetic",
776*4882a593Smuzhiyun                                                   "schedtest");
777*4882a593Smuzhiyun
778*4882a593SmuzhiyunBefore tracing the event, it should be enabled in some way, otherwise
779*4882a593Smuzhiyunthe synthetic event won't actually show up in the trace buffer.
780*4882a593Smuzhiyun
781*4882a593SmuzhiyunTo enable a synthetic event from the kernel, trace_array_set_clr_event()
782*4882a593Smuzhiyuncan be used (which is not specific to synthetic events, so does need
783*4882a593Smuzhiyunthe "synthetic" system name to be specified explicitly).
784*4882a593Smuzhiyun
785*4882a593SmuzhiyunTo enable the event, pass 'true' to it::
786*4882a593Smuzhiyun
787*4882a593Smuzhiyun       trace_array_set_clr_event(schedtest_event_file->tr,
788*4882a593Smuzhiyun                                 "synthetic", "schedtest", true);
789*4882a593Smuzhiyun
790*4882a593SmuzhiyunTo disable it pass false::
791*4882a593Smuzhiyun
792*4882a593Smuzhiyun       trace_array_set_clr_event(schedtest_event_file->tr,
793*4882a593Smuzhiyun                                 "synthetic", "schedtest", false);
794*4882a593Smuzhiyun
795*4882a593SmuzhiyunFinally, synth_event_trace_array() can be used to actually trace the
796*4882a593Smuzhiyunevent, which should be visible in the trace buffer afterwards::
797*4882a593Smuzhiyun
798*4882a593Smuzhiyun       ret = synth_event_trace_array(schedtest_event_file, vals,
799*4882a593Smuzhiyun                                     ARRAY_SIZE(vals));
800*4882a593Smuzhiyun
801*4882a593SmuzhiyunTo remove the synthetic event, the event should be disabled, and the
802*4882a593Smuzhiyuntrace instance should be 'put' back using trace_put_event_file()::
803*4882a593Smuzhiyun
804*4882a593Smuzhiyun       trace_array_set_clr_event(schedtest_event_file->tr,
805*4882a593Smuzhiyun                                 "synthetic", "schedtest", false);
806*4882a593Smuzhiyun       trace_put_event_file(schedtest_event_file);
807*4882a593Smuzhiyun
808*4882a593SmuzhiyunIf those have been successful, synth_event_delete() can be called to
809*4882a593Smuzhiyunremove the event::
810*4882a593Smuzhiyun
811*4882a593Smuzhiyun       ret = synth_event_delete("schedtest");
812*4882a593Smuzhiyun
813*4882a593Smuzhiyun7.2.2 Tracing a synthetic event piecewise
814*4882a593Smuzhiyun-----------------------------------------
815*4882a593Smuzhiyun
816*4882a593SmuzhiyunTo trace a synthetic using the piecewise method described above, the
817*4882a593Smuzhiyunsynth_event_trace_start() function is used to 'open' the synthetic
818*4882a593Smuzhiyunevent trace::
819*4882a593Smuzhiyun
820*4882a593Smuzhiyun       struct synth_trace_state trace_state;
821*4882a593Smuzhiyun
822*4882a593Smuzhiyun       ret = synth_event_trace_start(schedtest_event_file, &trace_state);
823*4882a593Smuzhiyun
824*4882a593SmuzhiyunIt's passed the trace_event_file representing the synthetic event
825*4882a593Smuzhiyunusing the same methods as described above, along with a pointer to a
826*4882a593Smuzhiyunstruct synth_trace_state object, which will be zeroed before use and
827*4882a593Smuzhiyunused to maintain state between this and following calls.
828*4882a593Smuzhiyun
829*4882a593SmuzhiyunOnce the event has been opened, which means space for it has been
830*4882a593Smuzhiyunreserved in the trace buffer, the individual fields can be set.  There
831*4882a593Smuzhiyunare two ways to do that, either one after another for each field in
832*4882a593Smuzhiyunthe event, which requires no lookups, or by name, which does.  The
833*4882a593Smuzhiyuntradeoff is flexibility in doing the assignments vs the cost of a
834*4882a593Smuzhiyunlookup per field.
835*4882a593Smuzhiyun
836*4882a593SmuzhiyunTo assign the values one after the other without lookups,
837*4882a593Smuzhiyunsynth_event_add_next_val() should be used.  Each call is passed the
838*4882a593Smuzhiyunsame synth_trace_state object used in the synth_event_trace_start(),
839*4882a593Smuzhiyunalong with the value to set the next field in the event.  After each
840*4882a593Smuzhiyunfield is set, the 'cursor' points to the next field, which will be set
841*4882a593Smuzhiyunby the subsequent call, continuing until all the fields have been set
842*4882a593Smuzhiyunin order.  The same sequence of calls as in the above examples using
843*4882a593Smuzhiyunthis method would be (without error-handling code)::
844*4882a593Smuzhiyun
845*4882a593Smuzhiyun       /* next_pid_field */
846*4882a593Smuzhiyun       ret = synth_event_add_next_val(777, &trace_state);
847*4882a593Smuzhiyun
848*4882a593Smuzhiyun       /* next_comm_field */
849*4882a593Smuzhiyun       ret = synth_event_add_next_val((u64)"slinky", &trace_state);
850*4882a593Smuzhiyun
851*4882a593Smuzhiyun       /* ts_ns */
852*4882a593Smuzhiyun       ret = synth_event_add_next_val(1000000, &trace_state);
853*4882a593Smuzhiyun
854*4882a593Smuzhiyun       /* ts_ms */
855*4882a593Smuzhiyun       ret = synth_event_add_next_val(1000, &trace_state);
856*4882a593Smuzhiyun
857*4882a593Smuzhiyun       /* cpu */
858*4882a593Smuzhiyun       ret = synth_event_add_next_val(smp_processor_id(), &trace_state);
859*4882a593Smuzhiyun
860*4882a593Smuzhiyun       /* my_string_field */
861*4882a593Smuzhiyun       ret = synth_event_add_next_val((u64)"thneed_2.01", &trace_state);
862*4882a593Smuzhiyun
863*4882a593Smuzhiyun       /* my_int_field */
864*4882a593Smuzhiyun       ret = synth_event_add_next_val(395, &trace_state);
865*4882a593Smuzhiyun
866*4882a593SmuzhiyunTo assign the values in any order, synth_event_add_val() should be
867*4882a593Smuzhiyunused.  Each call is passed the same synth_trace_state object used in
868*4882a593Smuzhiyunthe synth_event_trace_start(), along with the field name of the field
869*4882a593Smuzhiyunto set and the value to set it to.  The same sequence of calls as in
870*4882a593Smuzhiyunthe above examples using this method would be (without error-handling
871*4882a593Smuzhiyuncode)::
872*4882a593Smuzhiyun
873*4882a593Smuzhiyun       ret = synth_event_add_val("next_pid_field", 777, &trace_state);
874*4882a593Smuzhiyun       ret = synth_event_add_val("next_comm_field", (u64)"silly putty",
875*4882a593Smuzhiyun                                 &trace_state);
876*4882a593Smuzhiyun       ret = synth_event_add_val("ts_ns", 1000000, &trace_state);
877*4882a593Smuzhiyun       ret = synth_event_add_val("ts_ms", 1000, &trace_state);
878*4882a593Smuzhiyun       ret = synth_event_add_val("cpu", smp_processor_id(), &trace_state);
879*4882a593Smuzhiyun       ret = synth_event_add_val("my_string_field", (u64)"thneed_9",
880*4882a593Smuzhiyun                                 &trace_state);
881*4882a593Smuzhiyun       ret = synth_event_add_val("my_int_field", 3999, &trace_state);
882*4882a593Smuzhiyun
883*4882a593SmuzhiyunNote that synth_event_add_next_val() and synth_event_add_val() are
884*4882a593Smuzhiyunincompatible if used within the same trace of an event - either one
885*4882a593Smuzhiyuncan be used but not both at the same time.
886*4882a593Smuzhiyun
887*4882a593SmuzhiyunFinally, the event won't be actually traced until it's 'closed',
888*4882a593Smuzhiyunwhich is done using synth_event_trace_end(), which takes only the
889*4882a593Smuzhiyunstruct synth_trace_state object used in the previous calls::
890*4882a593Smuzhiyun
891*4882a593Smuzhiyun       ret = synth_event_trace_end(&trace_state);
892*4882a593Smuzhiyun
893*4882a593SmuzhiyunNote that synth_event_trace_end() must be called at the end regardless
894*4882a593Smuzhiyunof whether any of the add calls failed (say due to a bad field name
895*4882a593Smuzhiyunbeing passed in).
896*4882a593Smuzhiyun
897*4882a593Smuzhiyun7.3 Dyamically creating kprobe and kretprobe event definitions
898*4882a593Smuzhiyun--------------------------------------------------------------
899*4882a593Smuzhiyun
900*4882a593SmuzhiyunTo create a kprobe or kretprobe trace event from kernel code, the
901*4882a593Smuzhiyunkprobe_event_gen_cmd_start() or kretprobe_event_gen_cmd_start()
902*4882a593Smuzhiyunfunctions can be used.
903*4882a593Smuzhiyun
904*4882a593SmuzhiyunTo create a kprobe event, an empty or partially empty kprobe event
905*4882a593Smuzhiyunshould first be created using kprobe_event_gen_cmd_start().  The name
906*4882a593Smuzhiyunof the event and the probe location should be specfied along with one
907*4882a593Smuzhiyunor args each representing a probe field should be supplied to this
908*4882a593Smuzhiyunfunction.  Before calling kprobe_event_gen_cmd_start(), the user
909*4882a593Smuzhiyunshould create and initialize a dynevent_cmd object using
910*4882a593Smuzhiyunkprobe_event_cmd_init().
911*4882a593Smuzhiyun
912*4882a593SmuzhiyunFor example, to create a new "schedtest" kprobe event with two fields::
913*4882a593Smuzhiyun
914*4882a593Smuzhiyun  struct dynevent_cmd cmd;
915*4882a593Smuzhiyun  char *buf;
916*4882a593Smuzhiyun
917*4882a593Smuzhiyun  /* Create a buffer to hold the generated command */
918*4882a593Smuzhiyun  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
919*4882a593Smuzhiyun
920*4882a593Smuzhiyun  /* Before generating the command, initialize the cmd object */
921*4882a593Smuzhiyun  kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
922*4882a593Smuzhiyun
923*4882a593Smuzhiyun  /*
924*4882a593Smuzhiyun   * Define the gen_kprobe_test event with the first 2 kprobe
925*4882a593Smuzhiyun   * fields.
926*4882a593Smuzhiyun   */
927*4882a593Smuzhiyun  ret = kprobe_event_gen_cmd_start(&cmd, "gen_kprobe_test", "do_sys_open",
928*4882a593Smuzhiyun                                   "dfd=%ax", "filename=%dx");
929*4882a593Smuzhiyun
930*4882a593SmuzhiyunOnce the kprobe event object has been created, it can then be
931*4882a593Smuzhiyunpopulated with more fields.  Fields can be added using
932*4882a593Smuzhiyunkprobe_event_add_fields(), supplying the dynevent_cmd object along
933*4882a593Smuzhiyunwith a variable arg list of probe fields.  For example, to add a
934*4882a593Smuzhiyuncouple additional fields, the following call could be made::
935*4882a593Smuzhiyun
936*4882a593Smuzhiyun  ret = kprobe_event_add_fields(&cmd, "flags=%cx", "mode=+4($stack)");
937*4882a593Smuzhiyun
938*4882a593SmuzhiyunOnce all the fields have been added, the event should be finalized and
939*4882a593Smuzhiyunregistered by calling the kprobe_event_gen_cmd_end() or
940*4882a593Smuzhiyunkretprobe_event_gen_cmd_end() functions, depending on whether a kprobe
941*4882a593Smuzhiyunor kretprobe command was started::
942*4882a593Smuzhiyun
943*4882a593Smuzhiyun  ret = kprobe_event_gen_cmd_end(&cmd);
944*4882a593Smuzhiyun
945*4882a593Smuzhiyunor::
946*4882a593Smuzhiyun
947*4882a593Smuzhiyun  ret = kretprobe_event_gen_cmd_end(&cmd);
948*4882a593Smuzhiyun
949*4882a593SmuzhiyunAt this point, the event object is ready to be used for tracing new
950*4882a593Smuzhiyunevents.
951*4882a593Smuzhiyun
952*4882a593SmuzhiyunSimilarly, a kretprobe event can be created using
953*4882a593Smuzhiyunkretprobe_event_gen_cmd_start() with a probe name and location and
954*4882a593Smuzhiyunadditional params such as $retval::
955*4882a593Smuzhiyun
956*4882a593Smuzhiyun  ret = kretprobe_event_gen_cmd_start(&cmd, "gen_kretprobe_test",
957*4882a593Smuzhiyun                                      "do_sys_open", "$retval");
958*4882a593Smuzhiyun
959*4882a593SmuzhiyunSimilar to the synthetic event case, code like the following can be
960*4882a593Smuzhiyunused to enable the newly created kprobe event::
961*4882a593Smuzhiyun
962*4882a593Smuzhiyun  gen_kprobe_test = trace_get_event_file(NULL, "kprobes", "gen_kprobe_test");
963*4882a593Smuzhiyun
964*4882a593Smuzhiyun  ret = trace_array_set_clr_event(gen_kprobe_test->tr,
965*4882a593Smuzhiyun                                  "kprobes", "gen_kprobe_test", true);
966*4882a593Smuzhiyun
967*4882a593SmuzhiyunFinally, also similar to synthetic events, the following code can be
968*4882a593Smuzhiyunused to give the kprobe event file back and delete the event::
969*4882a593Smuzhiyun
970*4882a593Smuzhiyun  trace_put_event_file(gen_kprobe_test);
971*4882a593Smuzhiyun
972*4882a593Smuzhiyun  ret = kprobe_event_delete("gen_kprobe_test");
973*4882a593Smuzhiyun
974*4882a593Smuzhiyun7.4 The "dynevent_cmd" low-level API
975*4882a593Smuzhiyun------------------------------------
976*4882a593Smuzhiyun
977*4882a593SmuzhiyunBoth the in-kernel synthetic event and kprobe interfaces are built on
978*4882a593Smuzhiyuntop of a lower-level "dynevent_cmd" interface.  This interface is
979*4882a593Smuzhiyunmeant to provide the basis for higher-level interfaces such as the
980*4882a593Smuzhiyunsynthetic and kprobe interfaces, which can be used as examples.
981*4882a593Smuzhiyun
982*4882a593SmuzhiyunThe basic idea is simple and amounts to providing a general-purpose
983*4882a593Smuzhiyunlayer that can be used to generate trace event commands.  The
984*4882a593Smuzhiyungenerated command strings can then be passed to the command-parsing
985*4882a593Smuzhiyunand event creation code that already exists in the trace event
986*4882a593Smuzhiyunsubystem for creating the corresponding trace events.
987*4882a593Smuzhiyun
988*4882a593SmuzhiyunIn a nutshell, the way it works is that the higher-level interface
989*4882a593Smuzhiyuncode creates a struct dynevent_cmd object, then uses a couple
990*4882a593Smuzhiyunfunctions, dynevent_arg_add() and dynevent_arg_pair_add() to build up
991*4882a593Smuzhiyuna command string, which finally causes the command to be executed
992*4882a593Smuzhiyunusing the dynevent_create() function.  The details of the interface
993*4882a593Smuzhiyunare described below.
994*4882a593Smuzhiyun
995*4882a593SmuzhiyunThe first step in building a new command string is to create and
996*4882a593Smuzhiyuninitialize an instance of a dynevent_cmd.  Here, for instance, we
997*4882a593Smuzhiyuncreate a dynevent_cmd on the stack and initialize it::
998*4882a593Smuzhiyun
999*4882a593Smuzhiyun  struct dynevent_cmd cmd;
1000*4882a593Smuzhiyun  char *buf;
1001*4882a593Smuzhiyun  int ret;
1002*4882a593Smuzhiyun
1003*4882a593Smuzhiyun  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
1004*4882a593Smuzhiyun
1005*4882a593Smuzhiyun  dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_FOO,
1006*4882a593Smuzhiyun                    foo_event_run_command);
1007*4882a593Smuzhiyun
1008*4882a593SmuzhiyunThe dynevent_cmd initialization needs to be given a user-specified
1009*4882a593Smuzhiyunbuffer and the length of the buffer (MAX_DYNEVENT_CMD_LEN can be used
1010*4882a593Smuzhiyunfor this purpose - at 2k it's generally too big to be comfortably put
1011*4882a593Smuzhiyunon the stack, so is dynamically allocated), a dynevent type id, which
1012*4882a593Smuzhiyunis meant to be used to check that further API calls are for the
1013*4882a593Smuzhiyuncorrect command type, and a pointer to an event-specific run_command()
1014*4882a593Smuzhiyuncallback that will be called to actually execute the event-specific
1015*4882a593Smuzhiyuncommand function.
1016*4882a593Smuzhiyun
1017*4882a593SmuzhiyunOnce that's done, the command string can by built up by successive
1018*4882a593Smuzhiyuncalls to argument-adding functions.
1019*4882a593Smuzhiyun
1020*4882a593SmuzhiyunTo add a single argument, define and initialize a struct dynevent_arg
1021*4882a593Smuzhiyunor struct dynevent_arg_pair object.  Here's an example of the simplest
1022*4882a593Smuzhiyunpossible arg addition, which is simply to append the given string as
1023*4882a593Smuzhiyuna whitespace-separated argument to the command::
1024*4882a593Smuzhiyun
1025*4882a593Smuzhiyun  struct dynevent_arg arg;
1026*4882a593Smuzhiyun
1027*4882a593Smuzhiyun  dynevent_arg_init(&arg, NULL, 0);
1028*4882a593Smuzhiyun
1029*4882a593Smuzhiyun  arg.str = name;
1030*4882a593Smuzhiyun
1031*4882a593Smuzhiyun  ret = dynevent_arg_add(cmd, &arg);
1032*4882a593Smuzhiyun
1033*4882a593SmuzhiyunThe arg object is first initialized using dynevent_arg_init() and in
1034*4882a593Smuzhiyunthis case the parameters are NULL or 0, which means there's no
1035*4882a593Smuzhiyunoptional sanity-checking function or separator appended to the end of
1036*4882a593Smuzhiyunthe arg.
1037*4882a593Smuzhiyun
1038*4882a593SmuzhiyunHere's another more complicated example using an 'arg pair', which is
1039*4882a593Smuzhiyunused to create an argument that consists of a couple components added
1040*4882a593Smuzhiyuntogether as a unit, for example, a 'type field_name;' arg or a simple
1041*4882a593Smuzhiyunexpression arg e.g. 'flags=%cx'::
1042*4882a593Smuzhiyun
1043*4882a593Smuzhiyun  struct dynevent_arg_pair arg_pair;
1044*4882a593Smuzhiyun
1045*4882a593Smuzhiyun  dynevent_arg_pair_init(&arg_pair, dynevent_foo_check_arg_fn, 0, ';');
1046*4882a593Smuzhiyun
1047*4882a593Smuzhiyun  arg_pair.lhs = type;
1048*4882a593Smuzhiyun  arg_pair.rhs = name;
1049*4882a593Smuzhiyun
1050*4882a593Smuzhiyun  ret = dynevent_arg_pair_add(cmd, &arg_pair);
1051*4882a593Smuzhiyun
1052*4882a593SmuzhiyunAgain, the arg_pair is first initialized, in this case with a callback
1053*4882a593Smuzhiyunfunction used to check the sanity of the args (for example, that
1054*4882a593Smuzhiyunneither part of the pair is NULL), along with a character to be used
1055*4882a593Smuzhiyunto add an operator between the pair (here none) and a separator to be
1056*4882a593Smuzhiyunappended onto the end of the arg pair (here ';').
1057*4882a593Smuzhiyun
1058*4882a593SmuzhiyunThere's also a dynevent_str_add() function that can be used to simply
1059*4882a593Smuzhiyunadd a string as-is, with no spaces, delimeters, or arg check.
1060*4882a593Smuzhiyun
1061*4882a593SmuzhiyunAny number of dynevent_*_add() calls can be made to build up the string
1062*4882a593Smuzhiyun(until its length surpasses cmd->maxlen).  When all the arguments have
1063*4882a593Smuzhiyunbeen added and the command string is complete, the only thing left to
1064*4882a593Smuzhiyundo is run the command, which happens by simply calling
1065*4882a593Smuzhiyundynevent_create()::
1066*4882a593Smuzhiyun
1067*4882a593Smuzhiyun  ret = dynevent_create(&cmd);
1068*4882a593Smuzhiyun
1069*4882a593SmuzhiyunAt that point, if the return value is 0, the dynamic event has been
1070*4882a593Smuzhiyuncreated and is ready to use.
1071*4882a593Smuzhiyun
1072*4882a593SmuzhiyunSee the dynevent_cmd function definitions themselves for the details
1073*4882a593Smuzhiyunof the API.
1074