==============================
General notification mechanism
==============================

The general notification mechanism is built on top of the standard pipe driver
whereby it effectively splices notification messages from the kernel into pipes
opened by userspace. This can be used in conjunction with:

  * Key/keyring notifications


The notification buffers can be enabled by:

	"General setup"/"General notification queue"
	(CONFIG_WATCH_QUEUE)

This document has the following sections:

.. contents:: :local:


Overview
========

This facility appears as a pipe that is opened in a special mode. The pipe's
internal ring buffer is used to hold messages that are generated by the kernel.
These messages are then read out by read(). Splice and similar are disabled on
such pipes because, under some circumstances, they may want to revert their
additions to the ring, which might by then be interleaved with notification
messages.

The owner of the pipe has to tell the kernel which sources it would like to
watch through that pipe. Only sources that have been connected to a pipe will
insert messages into it. Note that a source may be bound to multiple pipes and
insert messages into all of them simultaneously.

Filters may also be placed on a pipe so that certain source types and
subevents can be ignored if they're not of interest.

A message will be discarded if there isn't a slot available in the ring or if
no preallocated message buffer is available. In both of these cases, read()
will insert a WATCH_META_LOSS_NOTIFICATION message into the output buffer after
the last message currently in the buffer has been read.

Note that when producing a notification, the kernel does not wait for the
consumers to collect it, but rather just continues on. This means that
notifications can be generated whilst spinlocks are held and also protects the
kernel from being held up indefinitely by a userspace malfunction.


Message Structure
=================

Notification messages begin with a short header::

	struct watch_notification {
		__u32	type:24;
		__u32	subtype:8;
		__u32	info;
	};

"type" indicates the source of the notification record and "subtype" indicates
the type of record from that source (see the Watch Sources section below). The
type may also be "WATCH_TYPE_META". This is a special record type generated
internally by the watch queue itself.
There are two subtypes:

  * WATCH_META_REMOVAL_NOTIFICATION
  * WATCH_META_LOSS_NOTIFICATION

The first indicates that an object on which a watch was installed was removed
or destroyed and the second indicates that some messages have been lost.

"info" indicates a bunch of things, including:

  * The length of the message in bytes, including the header (mask with
    WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates
    the size of the record, which may be between 8 and 127 bytes.

  * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT).
    This indicates the caller's ID for the watch, which may be between 0
    and 255. Multiple watches may share a queue, and this provides a means to
    distinguish them.

  * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the
    notification producer to indicate some meaning specific to the type and
    subtype.

Everything in info apart from the length can be used for filtering.

The header can be followed by supplementary information. The format of this is
defined by the type and subtype.


Watch List (Notification Source) API
====================================

A "watch list" is a list of watchers that are subscribed to a source of
notifications.
A list may be attached to an object (say a key or a superblock)
or may be global (say for device events). From a userspace perspective, a
non-global watch list is typically referred to by reference to the object it
belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to
watch that specific key).

To manage a watch list, the following functions are provided:

  * ::

	void init_watch_list(struct watch_list *wlist,
			     void (*release_watch)(struct watch *watch));

    Initialise a watch list. If ``release_watch`` is not NULL, then this
    indicates a function that should be called when the watch_list object is
    destroyed to discard any references the watch list holds on the watched
    object.

  * ``void remove_watch_list(struct watch_list *wlist);``

    This removes all of the watches subscribed to a watch_list and frees them
    and then destroys the watch_list object itself.


Watch Queue (Notification Output) API
=====================================

A "watch queue" is the buffer allocated by an application that notification
records will be written into. The workings of this are hidden entirely inside
of the pipe device driver, but it is necessary to gain a reference to it to set
a watch.
These can be managed with:

  * ``struct watch_queue *get_watch_queue(int fd);``

    Since watch queues are indicated to the kernel by the fd of the pipe that
    implements the buffer, userspace must hand that fd through a system call.
    This can be used to look up an opaque pointer to the watch queue from the
    system call.

  * ``void put_watch_queue(struct watch_queue *wqueue);``

    This discards the reference obtained from ``get_watch_queue()``.


Watch Subscription API
======================

A "watch" is a subscription on a watch list, indicating the watch queue, and
thus the buffer, into which notification records should be written. The watch
queue object may also carry filtering rules for that object, as set by
userspace. Some parts of the watch struct can be set by the driver::

	struct watch {
		union {
			u32		info_id;	/* ID to be OR'd in to info field */
			...
		};
		void			*private;	/* Private data for the watched object */
		u64			id;		/* Internal identifier */
		...
	};

The ``info_id`` value should be an 8-bit number obtained from userspace and
shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of
struct watch_notification::info when and if the notification is written into
the associated watch queue buffer.
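
The packing of the ``info`` word described above can be sketched in plain C.
This is only an illustration: the masks are local mirrors of the values
believed to be in ``<linux/watch_queue.h>`` (real code should include that
header instead), and the ``example_*`` helpers are hypothetical names that do
not exist in the kernel.

```c
#include <stdint.h>

/*
 * Local mirrors of the masks from <linux/watch_queue.h>, repeated here
 * (as an assumption) only so that the sketch compiles on its own.
 */
#ifndef WATCH_INFO_LENGTH
#define WATCH_INFO_LENGTH		0x0000007fU
#define WATCH_INFO_LENGTH__SHIFT	0
#define WATCH_INFO_ID			0x0000ff00U
#define WATCH_INFO_ID__SHIFT		8
#endif

/*
 * Hypothetical helper: turn the 8-bit ID supplied by userspace into the
 * value a driver would store in watch->info_id.
 */
static inline uint32_t example_make_info_id(uint8_t user_id)
{
	return ((uint32_t)user_id << WATCH_INFO_ID__SHIFT) & WATCH_INFO_ID;
}

/* Hypothetical helper: extract the record length in bytes from info. */
static inline unsigned int example_info_length(uint32_t info)
{
	return (info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
}

/* Hypothetical helper: extract the watch ID from info. */
static inline unsigned int example_info_watch_id(uint32_t info)
{
	return (info & WATCH_INFO_ID) >> WATCH_INFO_ID__SHIFT;
}
```

A consumer that set several watches on one queue could use something like
``example_info_watch_id()`` to tell which watch a record belongs to.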

The ``private`` field is the driver's data associated with the watch_list and
is cleaned up by the ``watch_list::release_watch()`` method.

The ``id`` field is the source's ID. Notifications that are posted with a
different ID are ignored.

The following functions are provided to manage watches:

  * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);``

    Initialise a watch object, setting its pointer to the watch queue, using
    appropriate barriering to avoid lockdep complaints.

  * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);``

    Subscribe a watch to a watch list (notification source). The
    driver-settable fields in the watch struct must have been set before this
    is called.

  * ::

	int remove_watch_from_object(struct watch_list *wlist,
				     struct watch_queue *wqueue,
				     u64 id, false);

    Remove a watch from a watch list, where the watch must match the specified
    watch queue (``wqueue``) and object identifier (``id``). A notification
    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to
    indicate that the watch got removed.

  * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);``

    Remove all the watches from a watch list.
    It is expected that this will be
    called preparatory to destruction and that the watch list will be
    inaccessible to new watches by this point. A notification
    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each
    subscribed watch to indicate that the watch got removed.


Notification Posting API
========================

To post a notification to a watch list so that the subscribed watches can see
it, the following function should be used::

	void post_watch_notification(struct watch_list *wlist,
				     struct watch_notification *n,
				     const struct cred *cred,
				     u64 id);

The notification should be preformatted and a pointer to the header (``n``)
should be passed in. The notification may be larger than the header, and its
total size in bytes is noted in ``n->info & WATCH_INFO_LENGTH``.

The ``cred`` struct indicates the credentials of the source (subject) and is
passed to the LSMs, such as SELinux, to allow or suppress the recording of the
note in each individual queue according to the credentials of that queue
(object).

The ``id`` is the ID of the source object (such as the serial number on a key).
Only watches that have the same ID set in them will see this notification.
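
As a rough illustration of what "preformatted" means here, the sketch below
lays out a record consisting of the header followed by a small payload and
stores the total length in ``info``. It is written as standalone C so that it
can be compiled outside the kernel: the header struct and masks are local
mirrors (an assumption, real code would use ``<linux/watch_queue.h>``), and
``struct example_notification`` and ``example_format()`` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the header described in the Message Structure section. */
struct watch_notification {
	uint32_t	type:24;
	uint32_t	subtype:8;
	uint32_t	info;
};

#ifndef WATCH_INFO_LENGTH
#define WATCH_INFO_LENGTH		0x0000007fU
#define WATCH_INFO_LENGTH__SHIFT	0
#endif

/* Hypothetical record: the header followed by source-specific data. */
struct example_notification {
	struct watch_notification watch;	/* Must come first */
	uint32_t	object_id;		/* ID of the object that changed */
	uint32_t	aux;			/* Extra, subtype-specific value */
};

/* Hypothetical helper: fill in a record ready to be posted. */
static void example_format(struct example_notification *n,
			   uint32_t type, uint8_t subtype,
			   uint32_t object_id, uint32_t aux)
{
	memset(n, 0, sizeof(*n));
	n->watch.type = type;
	n->watch.subtype = subtype;
	/* The length covers the header plus the payload, in bytes. */
	n->watch.info = (sizeof(*n) << WATCH_INFO_LENGTH__SHIFT) &
			WATCH_INFO_LENGTH;
	n->object_id = object_id;
	n->aux = aux;
	/*
	 * A kernel-side source would now hand the header to
	 * post_watch_notification(wlist, &n->watch, cred, object_id).
	 */
}
```

The watch ID bits of ``info`` are left clear because, as described in the
Watch Subscription API section, the ``info_id`` of each watch is OR'd in when
the record is written to that watch's queue.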


Watch Sources
=============

Any particular buffer can be fed from multiple sources. Sources include:

  * WATCH_TYPE_KEY_NOTIFY

    Notifications of this type indicate changes to keys and keyrings, including
    the changes of keyring contents or the attributes of keys.

    See Documentation/security/keys/core.rst for more information.


Event Filtering
===============

Once a watch queue has been created, a set of filters can be applied to limit
the events that are received using::

	struct watch_notification_filter filter = {
		...
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);

The filter description is a variable of type::

	struct watch_notification_filter {
		__u32	nr_filters;
		__u32	__reserved;
		struct watch_notification_type_filter filters[];
	};

Where "nr_filters" is the number of filters in filters[] and "__reserved"
should be 0.
The "filters" array has elements of the following type::

	struct watch_notification_type_filter {
		__u32	type;
		__u32	info_filter;
		__u32	info_mask;
		__u32	subtype_filter[8];
	};

Where:

  * ``type`` is the event type to filter for and should be something like
    "WATCH_TYPE_KEY_NOTIFY"

  * ``info_filter`` and ``info_mask`` act as a filter on the info field of the
    notification record. The notification is only written into the buffer if::

	(watch.info & info_mask) == info_filter

    This could be used, for example, to ignore events that are not exactly on
    the watched point in a mount tree.

  * ``subtype_filter`` is a bitmask indicating the subtypes that are of
    interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to
    subtype 1, and so on.

If the argument to the ioctl() is NULL, then the filters will be removed and
all events from the watched sources will come through.
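
The filter rules above can be modelled in a few lines of C. The sketch below
is only a userspace model of a single filter entry, not the kernel's
implementation: the struct is a local mirror of the one shown above, the
``example_*`` helpers are hypothetical, and it assumes that the subtype
bitmask continues into subtype_filter[1] for subtype 32 and so on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the filter element described above. */
struct watch_notification_type_filter {
	uint32_t	type;
	uint32_t	info_filter;
	uint32_t	info_mask;
	uint32_t	subtype_filter[8];
};

/* Hypothetical helper: mark one subtype (0-255) as being of interest. */
static void example_filter_want_subtype(struct watch_notification_type_filter *f,
					uint8_t subtype)
{
	f->subtype_filter[subtype / 32] |= UINT32_C(1) << (subtype % 32);
}

/*
 * Hypothetical helper: would a record with this type, subtype and info
 * word pass this one filter entry?
 */
static bool example_filter_matches(const struct watch_notification_type_filter *f,
				   uint32_t type, uint8_t subtype,
				   uint32_t info)
{
	if (f->type != type)
		return false;
	if (!(f->subtype_filter[subtype / 32] &
	      (UINT32_C(1) << (subtype % 32))))
		return false;
	return (info & f->info_mask) == f->info_filter;
}
```

Leaving ``info_mask`` and ``info_filter`` both at 0 makes the info comparison
always succeed, so only the type and subtype are then taken into account.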


Userspace Code Example
======================

A buffer is created with something like the following::

	pipe2(fds, O_NOTIFICATION_PIPE);
	ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256);

It can then be set to receive keyring change notifications::

	keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);

The notifications can then be consumed by something like the following::

	#include <string.h>
	#include <unistd.h>
	#include <linux/watch_queue.h>

	static void consumer(int rfd)
	{
		unsigned char buffer[128];
		ssize_t buf_len;

		while (buf_len = read(rfd, buffer, sizeof(buffer)),
		       buf_len > 0
		       ) {
			unsigned char *p = buffer;
			unsigned char *end = buffer + buf_len;

			while (p < end) {
				union {
					struct watch_notification n;
					unsigned char buf1[128];
				} n;
				size_t largest, len;

				/* Copy out at most one maximally sized record. */
				largest = end - p;
				if (largest > 128)
					largest = 128;
				if (largest < sizeof(struct watch_notification))
					return;
				memcpy(&n, p, largest);

				len = (n.n.info & WATCH_INFO_LENGTH) >>
					WATCH_INFO_LENGTH__SHIFT;
				if (len == 0 || len > largest)
					return;

				switch (n.n.type) {
				case WATCH_TYPE_META:
					got_meta(&n.n);
					break;
				case WATCH_TYPE_KEY_NOTIFY:
					saw_key_change(&n.n);
					break;
				}

				p += len;
			}
		}
	}
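
The consumer above leaves ``got_meta()`` undefined. One possible shape for
such a handler is sketched below, purely as an illustration: the struct,
masks and enum values are local mirrors of what ``<linux/watch_queue.h>`` is
believed to define (real code should include that header), and
``example_describe_meta()`` is a hypothetical name.

```c
#include <stdint.h>

/* Local mirrors, repeated only so that the sketch compiles on its own. */
struct watch_notification {
	uint32_t	type:24;
	uint32_t	subtype:8;
	uint32_t	info;
};

#ifndef WATCH_INFO_ID
#define WATCH_INFO_ID		0x0000ff00U
#define WATCH_INFO_ID__SHIFT	8
#endif

enum {
	WATCH_TYPE_META			= 0,
	WATCH_META_REMOVAL_NOTIFICATION	= 0,
	WATCH_META_LOSS_NOTIFICATION	= 1,
};

/*
 * Hypothetical handler: report which kind of meta record was received
 * and hand back the watch ID carried in the info word.
 */
static const char *example_describe_meta(const struct watch_notification *n,
					 unsigned int *watch_id)
{
	*watch_id = (n->info & WATCH_INFO_ID) >> WATCH_INFO_ID__SHIFT;

	switch (n->subtype) {
	case WATCH_META_REMOVAL_NOTIFICATION:
		return "watch removed";
	case WATCH_META_LOSS_NOTIFICATION:
		return "notifications lost";
	default:
		return "unknown meta record";
	}
}
```

On a loss notification an application would typically need to rescan the
watched objects, since it cannot tell which events were dropped.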