.. SPDX-License-Identifier: GPL-2.0

==================================
relay interface (formerly relayfs)
==================================

The relay interface provides a means for kernel applications to
efficiently log and transfer large quantities of data from the kernel
to userspace via user-defined 'relay channels'.

A 'relay channel' is a kernel->user data relay mechanism implemented
as a set of per-cpu kernel buffers ('channel buffers'), each
represented as a regular file ('relay file') in user space.  Kernel
clients write into the channel buffers using efficient write
functions; these automatically log into the current cpu's channel
buffer.  User space applications mmap() or read() from the relay files
and retrieve the data as it becomes available.  The relay files
themselves are files created in a host filesystem, e.g. debugfs, and
are associated with the channel buffers using the API described below.

The format of the data logged into the channel buffers is completely
up to the kernel client; the relay interface does however provide
hooks which allow kernel clients to impose some structure on the
buffer data.  The relay interface doesn't implement any form of data
filtering - this also is left to the kernel client.  The purpose is to
keep things as simple as possible.

This document provides an overview of the relay interface API.  The
details of the function parameters are documented along with the
functions in the relay interface code - please see that for details.

Semantics
=========

Each relay channel has one buffer per CPU; each buffer has one or more
sub-buffers.  Messages are written to the first sub-buffer until it is
too full to contain a new message, in which case the message is written
to the next sub-buffer (if available).  Messages are never split across
sub-buffers.  At this point, userspace can be notified so it empties the
first sub-buffer, while the kernel continues writing to the next.

When notified that a sub-buffer is full, the kernel knows how many
bytes of it are padding, i.e. unused space occurring because a complete
message couldn't fit into a sub-buffer.  Userspace can use this
knowledge to copy only valid data.
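
The sub-buffer rules above can be modelled in user space.  The following
is a minimal sketch, not the kernel implementation: the names
``model_buf`` and ``model_write`` and the tiny sizes are hypothetical,
and the model stops at the last sub-buffer instead of wrapping around.
It shows that a message which doesn't fit is moved whole to the next
sub-buffer and that the unused tail of the previous one is recorded as
padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 16 /* hypothetical, tiny for illustration */
#define N_SUBBUFS   2  /* hypothetical */

struct model_buf {
	unsigned char data[N_SUBBUFS][SUBBUF_SIZE];
	size_t padding[N_SUBBUFS]; /* unused tail bytes per sub-buffer */
	size_t cur;                /* index of the sub-buffer being written */
	size_t offset;             /* write offset inside that sub-buffer */
};

/*
 * Append one message.  Returns 0 on success, -1 if the message cannot
 * be stored (larger than a sub-buffer, or no further sub-buffer left).
 */
static int model_write(struct model_buf *b, const void *msg, size_t len)
{
	if (len > SUBBUF_SIZE)
		return -1;

	if (b->offset + len > SUBBUF_SIZE) {
		if (b->cur + 1 >= N_SUBBUFS)
			return -1;
		/* The message doesn't fit: record the padding and switch. */
		b->padding[b->cur] = SUBBUF_SIZE - b->offset;
		b->cur++;
		b->offset = 0;
	}

	/* Messages are never split across sub-buffers. */
	assert(b->offset + len <= SUBBUF_SIZE);
	memcpy(&b->data[b->cur][b->offset], msg, len);
	b->offset += len;
	return 0;
}
```

A reader of the first sub-buffer would copy only ``SUBBUF_SIZE -
padding[0]`` bytes from it.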

After copying it, userspace can notify the kernel that a sub-buffer
has been consumed.

A relay channel can operate in a mode where it will overwrite data not
yet collected by userspace, and not wait for it to be consumed.

The relay channel itself does not provide for communication of such
state (sub-buffer full, padding, consumed) between userspace and kernel,
allowing the kernel side to remain simple and not impose a single
interface on userspace.  It does provide a set of examples and a
separate helper though, described below.

The read() interface both removes padding and internally consumes the
read sub-buffers; thus in cases where read(2) is being used to drain
the channel buffers, special-purpose communication between kernel and
user isn't necessary for basic operation.

One of the major goals of the relay interface is to provide a low
overhead mechanism for conveying kernel data to userspace.  While the
read() interface is easy to use, it's not as efficient as the mmap()
approach; the example code attempts to make the tradeoff between the
two approaches as small as possible.

klog and relay-apps example code
================================

The relay interface itself is ready to use, but to make things easier,
a couple of simple utility functions and a set of examples are provided.

The relay-apps example tarball, available on the relay sourceforge
site, contains a set of self-contained examples, each consisting of a
pair of .c files containing boilerplate code for each of the user and
kernel sides of a relay application.  When combined these two sets of
boilerplate code provide glue to easily stream data to disk, without
having to bother with mundane housekeeping chores.

The 'klog debugging functions' patch (klog.patch in the relay-apps
tarball) provides a couple of high-level logging functions to the
kernel which allow writing formatted text or raw data to a channel,
regardless of whether a channel to write into exists or not, or even
whether the relay interface is compiled into the kernel or not.  These
functions allow you to put unconditional 'trace' statements anywhere
in the kernel or kernel modules; only when there is a 'klog handler'
registered will data actually be logged (see the klog and kleak
examples for details).

It is of course possible to use the relay interface from scratch,
i.e. without using any of the relay-apps example code or klog, but
you'll have to implement communication between userspace and kernel,
allowing both to convey the state of buffers (full, empty, amount of
padding).  As noted above, this isn't necessary for basic operation
when read(2) is used to drain the channel buffers, since read() removes
padding and consumes the sub-buffers internally.  Things such as
buffer-full conditions would still need to be communicated via some
channel though.

klog and the relay-apps examples can be found in the relay-apps
tarball on http://relayfs.sourceforge.net

The relay interface user space API
==================================

The relay interface implements basic file operations for user space
access to relay channel buffer data.  Here are the file operations
that are available and some comments regarding their behavior:

=========== ============================================================
open()      enables user to open an *existing* channel buffer.

mmap()      results in channel buffer being mapped into the caller's
            memory space. Note that you can't do a partial mmap - you
            must map the entire file, which is NRBUF * SUBBUFSIZE.

read()      read the contents of a channel buffer.  The bytes read are
            'consumed' by the reader, i.e. they won't be available
            again to subsequent reads.  If the channel is being used
            in no-overwrite mode (the default), it can be read at any
            time even if there's an active kernel writer.  If the
            channel is being used in overwrite mode and there are
            active channel writers, results may be unpredictable -
            users should make sure that all logging to the channel has
            ended before using read() with overwrite mode.  Sub-buffer
            padding is automatically removed and will not be seen by
            the reader.

sendfile()  transfer data from a channel buffer to an output file
            descriptor. Sub-buffer padding is automatically removed
            and will not be seen by the reader.

poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are
            notified when sub-buffer boundaries are crossed.

close()     decrements the channel buffer's refcount.  When the refcount
            reaches 0, i.e. when no process or kernel client has the
            buffer open, the channel buffer is freed.
=========== ============================================================
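
As an illustration of the poll() and read() entries above, a user space
reader might wait for a sub-buffer boundary and then read once.  This is
only a sketch using the standard poll(2) and read(2) calls; the helper
name ``read_when_ready`` is hypothetical, and ``fd`` is assumed to be a
descriptor already obtained by open()ing a relay file.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable, then read once.
 * Returns the number of bytes read, 0 on timeout or when no data is
 * available, and -1 on error.
 */
static ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(buf != NULL);

	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0)
		return ret; /* 0: timeout, -1: poll() failed */
	if (pfd.revents & POLLERR)
		return -1;

	return read(fd, buf, len);
}
```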

In order for a user application to make use of relay files, the
host filesystem must be mounted.  For example::

    mount -t debugfs debugfs /sys/kernel/debug

.. Note::

    the host filesystem doesn't need to be mounted for kernel
    clients to create or use channels - it only needs to be
    mounted when user space applications need access to the buffer
    data.
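
Once the host filesystem is mounted, draining a relay file with read()
needs nothing beyond ordinary file I/O.  The helper below is a sketch
under assumptions: the name ``drain_fd`` is hypothetical, and the caller
is assumed to have opened a relay file (for instance a path such as
/sys/kernel/debug/cpu0, which depends entirely on how the kernel client
named its channel) and an output file.  It copies until read() returns
0, i.e. until no more data is currently available.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything currently readable from in_fd to out_fd.
 * Returns the total number of bytes copied, or -1 on error.
 */
static long drain_fd(int in_fd, int out_fd)
{
	char chunk[4096];
	long total = 0;
	ssize_t n;

	while ((n = read(in_fd, chunk, sizeof(chunk))) != 0) {
		ssize_t done = 0;

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted, retry the read */
			return -1;
		}

		/* write() may be partial, so loop until the chunk is out. */
		while (done < n) {
			ssize_t w = write(out_fd, chunk + done,
					  (size_t)(n - done));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			done += w;
		}
		assert(done == n);
		total += n;
	}

	return total;
}
```

Because read() strips padding and consumes sub-buffers itself, the
output contains only the logged data.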


The relay interface kernel API
==============================

Here's a summary of the API the relay interface provides to in-kernel clients:

Channel management functions::

    relay_open(base_filename, parent, subbuf_size, n_subbufs,
               callbacks, private_data)
    relay_close(chan)
    relay_flush(chan)
    relay_reset(chan)

Channel management typically called on instigation of userspace::

    relay_subbufs_consumed(chan, cpu, subbufs_consumed)

Write functions::

    relay_write(chan, data, length)
    __relay_write(chan, data, length)
    relay_reserve(chan, length)

Callbacks::

    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
    buf_mapped(buf, filp)
    buf_unmapped(buf, filp)
    create_buf_file(filename, parent, mode, buf, is_global)
    remove_buf_file(dentry)

Helper functions::

    relay_buf_full(buf)
    subbuf_start_reserve(buf, length)


Creating a channel
------------------

relay_open() is used to create a channel, along with its per-cpu
channel buffers.  Each channel buffer will have an associated file
created for it in the host filesystem, which can be mmapped or
read from in user space.  The files are named basename0...basenameN-1
where N is the number of online cpus, and by default will be created
in the root of the filesystem (if the parent param is NULL).  If you
want a directory structure to contain your relay files, you should
create it using the host filesystem's directory creation function,
e.g. debugfs_create_dir(), and pass the parent directory to
relay_open().  Users are responsible for cleaning up any directory
structure they create, when the channel is closed - again the host
filesystem's directory removal functions should be used for that,
e.g. debugfs_remove().

In order for a channel to be created and the host filesystem's files
associated with its channel buffers, the user must provide definitions
for two callback functions, create_buf_file() and remove_buf_file().
create_buf_file() is called once for each per-cpu buffer from
relay_open() and allows the user to create the file which will be used
to represent the corresponding channel buffer.  The callback should
return the dentry of the file created to represent the channel buffer.
remove_buf_file() must also be defined; it's responsible for deleting
the file(s) created in create_buf_file() and is called during
relay_close().

Here are some typical definitions for these callbacks, in this case
using debugfs::

    /*
     * create_buf_file() callback.  Creates relay file in debugfs.
     */
    static struct dentry *create_buf_file_handler(const char *filename,
                                                  struct dentry *parent,
                                                  umode_t mode,
                                                  struct rchan_buf *buf,
                                                  int *is_global)
    {
            return debugfs_create_file(filename, mode, parent, buf,
                                       &relay_file_operations);
    }

    /*
     * remove_buf_file() callback.  Removes relay file from debugfs.
     */
    static int remove_buf_file_handler(struct dentry *dentry)
    {
            debugfs_remove(dentry);

            return 0;
    }

    /*
     * relay interface callbacks
     */
    static struct rchan_callbacks relay_callbacks =
    {
            .create_buf_file = create_buf_file_handler,
            .remove_buf_file = remove_buf_file_handler,
    };

And an example relay_open() invocation using them::

  chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);

If the create_buf_file() callback fails, or isn't defined, channel
creation and thus relay_open() will fail.

The total size of each per-cpu buffer is calculated by multiplying the
number of sub-buffers by the sub-buffer size passed into relay_open().
The idea behind sub-buffers is that they're basically an extension of
double-buffering to N buffers, and they also allow applications to
easily implement random-access-on-buffer-boundary schemes, which can
be important for some high-volume applications.  The number and size
of sub-buffers are completely dependent on the application and even for
the same application, different conditions will warrant different
values for these parameters at different times.  Typically, the right
values to use are best decided after some experimentation; in general,
though, it's safe to assume that having only 1 sub-buffer is a bad
idea - you're guaranteed to either overwrite data or lose events
depending on the channel mode being used.
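
When experimenting with these parameters it helps to keep the memory
cost in view.  The arithmetic described above can be sketched as
follows; the function names are hypothetical, and the sketch ignores any
rounding the kernel may apply when it allocates the buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes in one channel buffer: n_subbufs sub-buffers of subbuf_size. */
static size_t example_buf_bytes(size_t subbuf_size, size_t n_subbufs)
{
	/* Guard against overflow of the multiplication. */
	assert(n_subbufs == 0 || subbuf_size <= SIZE_MAX / n_subbufs);
	return subbuf_size * n_subbufs;
}

/* Bytes for a whole per-cpu channel: one channel buffer per cpu. */
static size_t example_chan_bytes(size_t subbuf_size, size_t n_subbufs,
				 unsigned int n_cpus)
{
	size_t per_buf = example_buf_bytes(subbuf_size, n_subbufs);

	assert(n_cpus == 0 || per_buf <= SIZE_MAX / n_cpus);
	return per_buf * n_cpus;
}
```

For example, 4 sub-buffers of 256 KiB give 1 MiB per channel buffer, or
8 MiB in total on a machine with 8 cpus.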

The create_buf_file() implementation can also be defined in such a way
as to allow the creation of a single 'global' buffer instead of the
default per-cpu set.  This can be useful for applications interested
mainly in seeing the relative ordering of system-wide events without
the need to bother with saving explicit timestamps for the purpose of
merging/sorting per-cpu files in a postprocessing step.

To have relay_open() create a global buffer, the create_buf_file()
implementation should set the value of the is_global outparam to a
non-zero value in addition to creating the file that will be used to
represent the single buffer.  In the case of a global buffer,
create_buf_file() and remove_buf_file() will be called only once.  The
normal channel-writing functions, e.g. relay_write(), can still be
used - writes from any cpu will transparently end up in the global
buffer - but since it is a global buffer, callers should make sure
they use the proper locking for such a buffer, either by wrapping
writes in a spinlock, or by copying a write function from relay.h and
creating a local version that internally does the proper locking.

The private_data passed into relay_open() allows clients to associate
user-defined data with a channel, and is immediately available
(including in create_buf_file()) via chan->private_data or
buf->chan->private_data.

Buffer-only channels
--------------------

These channels have no files associated and can be created with
relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
as early tracing in the kernel, before the VFS is up. In these
cases, one may open a buffer-only channel and then call
relay_late_setup_files() when the kernel is ready to handle files,
to expose the buffered data to userspace.

Channel 'modes'
---------------

Relay channels can be used in either of two modes - 'overwrite' or
'no-overwrite'.  The mode is entirely determined by the implementation
of the subbuf_start() callback, as described below.  The default if no
subbuf_start() callback is defined is 'no-overwrite' mode.  If the
default mode suits your needs, and you plan to use the read()
interface to retrieve channel data, you can ignore the details of this
section, as it pertains mainly to mmap() implementations.

In 'overwrite' mode, also known as 'flight recorder' mode, writes
continuously cycle around the buffer and will never fail, but will
unconditionally overwrite old data regardless of whether it's actually
been consumed.  In no-overwrite mode, writes will fail, i.e. data will
be lost, if the number of unconsumed sub-buffers equals the total
number of sub-buffers in the channel.  It should be clear that if
there is no consumer or if the consumer can't consume sub-buffers fast
enough, data will be lost in either case; the only difference is
whether data is lost from the beginning or the end of a buffer.

As explained above, a relay channel is made up of one or more
per-cpu channel buffers, each implemented as a circular buffer
subdivided into one or more sub-buffers.  Messages are written into
the current sub-buffer of the channel's current per-cpu buffer via the
write functions described below.  Whenever a message can't fit into
the current sub-buffer, because there's no room left for it, the
client is notified via the subbuf_start() callback that a switch to a
new sub-buffer is about to occur.  The client uses this callback to 1)
initialize the next sub-buffer if appropriate 2) finalize the previous
sub-buffer if appropriate and 3) return a boolean value indicating
whether or not to actually move on to the next sub-buffer.

To implement 'no-overwrite' mode, the kernel client would provide
an implementation of the subbuf_start() callback something like the
following::

    static int subbuf_start(struct rchan_buf *buf,
                            void *subbuf,
                            void *prev_subbuf,
                            size_t prev_padding)
    {
            /* Store the padding count in the previous sub-buffer's header. */
            if (prev_subbuf)
                    *((unsigned *)prev_subbuf) = prev_padding;

            /* No free sub-buffer: refuse the switch. */
            if (relay_buf_full(buf))
                    return 0;

            /* Reserve header space in the new sub-buffer. */
            subbuf_start_reserve(buf, sizeof(unsigned int));

            return 1;
    }

If the current buffer is full, i.e. all sub-buffers remain unconsumed,
the callback returns 0 to indicate that the buffer switch should not
occur yet, i.e. until the consumer has had a chance to read the
current set of ready sub-buffers.  For the relay_buf_full() function
to make sense, the consumer is responsible for notifying the relay
interface when sub-buffers have been consumed via
relay_subbufs_consumed().  Any subsequent attempts to write into the
buffer will again invoke the subbuf_start() callback with the same
parameters; only when the consumer has consumed one or more of the
ready sub-buffers will relay_buf_full() return 0, in which case the
buffer switch can continue.
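
The interplay between the full check and consumption notifications can
be modelled in user space.  This is a hedged sketch of the bookkeeping
described above, not the kernel code: the struct and the ``model_*``
names are hypothetical, and it assumes that "full" means the number of
produced but unconsumed sub-buffers has reached the number of
sub-buffers.

```c
#include <assert.h>
#include <stddef.h>

struct model_counters {
	size_t n_subbufs;        /* sub-buffers in the channel buffer */
	size_t subbufs_produced; /* sub-buffers filled by the writer */
	size_t subbufs_consumed; /* sub-buffers reported as consumed */
	int stalled;             /* last switch attempt was refused */
};

/* Full when every sub-buffer holds unconsumed data. */
static int model_buf_full(const struct model_counters *c)
{
	assert(c->subbufs_consumed <= c->subbufs_produced);
	return c->subbufs_produced - c->subbufs_consumed >= c->n_subbufs;
}

/*
 * Called when the current sub-buffer can't take the next message.
 * Returns 1 if the writer may move on (no-overwrite policy), else 0.
 */
static int model_switch(struct model_counters *c)
{
	/* Count the finished sub-buffer once, not on every retry. */
	if (!c->stalled)
		c->subbufs_produced++;
	c->stalled = model_buf_full(c);
	return !c->stalled;
}

/* Consumer side: report n sub-buffers as consumed. */
static void model_subbufs_consumed(struct model_counters *c, size_t n)
{
	assert(c->subbufs_consumed + n <= c->subbufs_produced);
	c->subbufs_consumed += n;
}
```

With two sub-buffers and no consumer, the first switch succeeds and
every later one is refused until ``model_subbufs_consumed()`` is called.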

The implementation of the subbuf_start() callback for 'overwrite' mode
would be very similar::

    static int subbuf_start(struct rchan_buf *buf,
                            void *subbuf,
                            void *prev_subbuf,
                            size_t prev_padding)
    {
            if (prev_subbuf)
                    *((unsigned *)prev_subbuf) = prev_padding;

            subbuf_start_reserve(buf, sizeof(unsigned int));

            return 1;
    }

In this case, the relay_buf_full() check is meaningless and the
callback always returns 1, causing the buffer switch to occur
unconditionally.  It's also meaningless for the client to use the
relay_subbufs_consumed() function in this mode, as it's never
consulted.

The default subbuf_start() implementation, used if the client doesn't
define any callbacks, or doesn't define the subbuf_start() callback,
implements the simplest possible 'no-overwrite' mode, i.e. it does
nothing but refuse the switch (return 0) when relay_buf_full() reports
that the buffer is full.

Header information can be reserved at the beginning of each sub-buffer
by calling the subbuf_start_reserve() helper function from within the
subbuf_start() callback.  This reserved area can be used to store
whatever information the client wants.  In the example above, room is
reserved in each sub-buffer to store the padding count for that
sub-buffer.  This is filled in for the previous sub-buffer in the
subbuf_start() implementation; the padding value for the previous
sub-buffer is passed into the subbuf_start() callback along with a
pointer to the previous sub-buffer, since the padding value isn't
known until a sub-buffer is filled.  The subbuf_start() callback is
also called for the first sub-buffer when the channel is opened, to
give the client a chance to reserve space in it.  In this case the
previous sub-buffer pointer passed into the callback will be NULL, so
the client should check the value of the prev_subbuf pointer before
writing into the previous sub-buffer.
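
On the user side, a reader of an mmap()ed buffer that follows the
header convention of the example callbacks could locate the valid data
of a finished sub-buffer as sketched below.  The helper name
``subbuf_payload`` is hypothetical, and the layout (an unsigned int
padding count at the start of each sub-buffer) is an assumption that
holds only if the kernel client writes it that way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Locate the payload of one finished sub-buffer whose first
 * sizeof(unsigned int) bytes hold its padding count.
 * Returns a pointer to the payload and stores its length in *len,
 * or returns NULL if the header is inconsistent.
 */
static const unsigned char *subbuf_payload(const void *subbuf,
					   size_t subbuf_size, size_t *len)
{
	unsigned int padding;

	assert(subbuf != NULL && len != NULL);

	if (subbuf_size < sizeof(padding))
		return NULL;

	/* memcpy avoids any assumption about the buffer's alignment. */
	memcpy(&padding, subbuf, sizeof(padding));
	if (padding > subbuf_size - sizeof(padding))
		return NULL;

	*len = subbuf_size - sizeof(padding) - padding;
	return (const unsigned char *)subbuf + sizeof(padding);
}
```

The header is only meaningful once the sub-buffer has been finished,
since the padding count is written when the following sub-buffer starts.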

Writing to a channel
--------------------

Kernel clients write data into the current cpu's channel buffer using
relay_write() or __relay_write().  relay_write() is the main logging
function - it uses local_irq_save() to protect the buffer and should be
used if you might be logging from interrupt context.  If you know
you'll never be logging from interrupt context, you can use
__relay_write(), which only disables preemption.  These functions
don't return a value, so you can't determine whether or not they
failed - the assumption is that you wouldn't want to check a return
value in the fast logging path anyway, and that they'll always succeed
unless the buffer is full and no-overwrite mode is being used, in
which case you can detect a failed write in the subbuf_start()
callback by calling the relay_buf_full() helper function.

relay_reserve() is used to reserve a slot in a channel buffer which
can be written to later.  This would typically be used in applications
that need to write directly into a channel buffer without having to
stage data in a temporary buffer beforehand.  Because the actual write
may not happen immediately after the slot is reserved, applications
using relay_reserve() can keep a count of the number of bytes actually
written, either in space reserved in the sub-buffers themselves or as
a separate array.  See the 'reserve' example in the relay-apps tarball
at http://relayfs.sourceforge.net for an example of how this can be
done.  Because the write is under control of the client and is
separated from the reserve, relay_reserve() doesn't protect the buffer
at all - it's up to the client to provide the appropriate
synchronization when using relay_reserve().
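
The reserve-then-fill pattern can be illustrated with a small user space
model.  It is not relay_reserve() itself: ``slot_buf``,
``slot_reserve`` and ``log_sample`` are hypothetical names, the buffer
is a single flat array with no sub-buffers, and there is no locking,
which mirrors the point above that synchronization is left to the
caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_BUF_SIZE 32 /* hypothetical size, for illustration only */

struct slot_buf {
	unsigned char data[SLOT_BUF_SIZE];
	size_t offset; /* next free byte */
};

/* Reserve len bytes and return their address, or NULL if they don't fit. */
static void *slot_reserve(struct slot_buf *b, size_t len)
{
	void *slot;

	assert(b->offset <= SLOT_BUF_SIZE);
	if (len > SLOT_BUF_SIZE - b->offset)
		return NULL;

	slot = b->data + b->offset;
	b->offset += len;
	return slot;
}

/* Write an (id, value) record in place, with no staging buffer. */
static int log_sample(struct slot_buf *b, unsigned int id, unsigned int value)
{
	unsigned char *slot = slot_reserve(b, sizeof(id) + sizeof(value));

	if (!slot)
		return -1;

	/* Fill the reserved slot; memcpy avoids alignment assumptions. */
	memcpy(slot, &id, sizeof(id));
	memcpy(slot + sizeof(id), &value, sizeof(value));
	return 0;
}
```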

Closing a channel
-----------------

The client calls relay_close() when it's finished using the channel.
The channel and its associated buffers are destroyed when there are no
longer any references to any of the channel buffers.  relay_flush()
forces a sub-buffer switch on all the channel buffers, and can be used
to finalize and process the last sub-buffers before the channel is
closed.

Misc
----

Some applications may want to keep a channel around and re-use it
rather than open and close a new channel for each use.  relay_reset()
can be used for this purpose - it resets a channel to its initial
state without reallocating channel buffer memory or destroying
existing mappings.  It should however only be called when it's safe to
do so, i.e. when the channel isn't currently being written to.

Finally, there are a couple of utility callbacks that can be used for
different purposes.  buf_mapped() is called whenever a channel buffer
is mmapped from user space and buf_unmapped() is called when it's
unmapped.  The client can use this notification to trigger actions
within the kernel application, such as enabling/disabling logging to
the channel.


Resources
=========

For news, example code, mailing list, etc. see the relay interface homepage:

    http://relayfs.sourceforge.net


Credits
=======

The ideas and specs for the relay interface came about as a result of
discussions on tracing involving the following:

Michel Dagenais         <michel.dagenais@polymtl.ca>
Richard Moore           <richardj_moore@uk.ibm.com>
Bob Wisniewski          <bob@watson.ibm.com>
Karim Yaghmour          <karim@opersys.com>
Tom Zanussi             <zanussi@us.ibm.com>

Also thanks to Hubertus Franke for a lot of useful suggestions and bug
reports.