=======================================================
Configfs - Userspace-driven Kernel Object Configuration
=======================================================

Joel Becker <joel.becker@oracle.com>

Updated: 31 March 2005

Copyright (c) 2005 Oracle Corporation,
	Joel Becker <joel.becker@oracle.com>


What is configfs?
=================

configfs is a ram-based filesystem that provides the converse of
sysfs's functionality. Where sysfs is a filesystem-based view of
kernel objects, configfs is a filesystem-based manager of kernel
objects, or config_items.

With sysfs, an object is created in kernel (for example, when a device
is discovered) and it is registered with sysfs. Its attributes then
appear in sysfs, allowing userspace to read the attributes via
readdir(3)/read(2). It may allow some attributes to be modified via
write(2). The important point is that the object is created and
destroyed in kernel, the kernel controls the lifecycle of the sysfs
representation, and sysfs is merely a window on all this.

A configfs config_item is created via an explicit userspace operation:
mkdir(2). It is destroyed via rmdir(2). The attributes appear at
mkdir(2) time, and can be read or modified via read(2) and write(2).
As with sysfs, readdir(3) queries the list of items and/or attributes.
symlink(2) can be used to group items together. Unlike sysfs, the
lifetime of the representation is completely driven by userspace. The
kernel modules backing the items must respond to this.

Both sysfs and configfs can and should exist together on the same
system. One is not a replacement for the other.

Using configfs
==============

configfs can be compiled as a module or into the kernel. You can access
it by doing::

	mount -t configfs none /config

The configfs tree will be empty unless client modules are also loaded.
These are modules that register their item types with configfs as
subsystems. Once a client subsystem is loaded, it will appear as a
subdirectory (or more than one) under /config. Like sysfs, the
configfs tree is always there, whether mounted on /config or not.

An item is created via mkdir(2). The item's attributes will also
appear at this time. readdir(3) can determine what the attributes are,
read(2) can query their default values, and write(2) can store new
values. Don't mix more than one attribute in one attribute file.

There are two types of configfs attributes:

* Normal attributes, which, similar to sysfs attributes, are small ASCII text
  files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
  only one value per file should be used, and the same caveats from sysfs apply.
  Configfs expects write(2) to store the entire buffer at once. When writing to
  normal configfs attributes, userspace processes should first read the entire
  file, modify the portions they wish to change, and then write the entire
  buffer back.

* Binary attributes, which are somewhat similar to sysfs binary attributes,
  but with a few slight changes to semantics. The PAGE_SIZE limitation does not
  apply, but the whole binary item must fit in a single kernel vmalloc'ed buffer.
  The write(2) calls from user space are buffered, and the attribute's
  write_bin_attribute method will be invoked on the final close; therefore, it is
  imperative for user-space to check the return code of close(2) in order to
  verify that the operation finished successfully.
  To avoid a malicious user OOMing the kernel, there's a per-binary attribute
  maximum buffer value.

When an item needs to be destroyed, remove it with rmdir(2). An
item cannot be destroyed if any other item has a link to it (via
symlink(2)). Links can be removed via unlink(2).

Configuring FakeNBD: an Example
===============================

Imagine there's a Network Block Device (NBD) driver that allows you to
access remote block devices. Call it FakeNBD. FakeNBD uses configfs
for its configuration. Obviously, there will be a nice program that
sysadmins use to configure FakeNBD, but somehow that program has to tell
the driver about it. Here's where configfs comes in.
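
As a rough illustration of what such a program might do when it talks to
the driver, the userspace sketch below stores one value into an attribute
file. It sends the whole value in a single write(2) and checks the result
of close(2), which matters most for binary attributes. The helper name
write_attr() is hypothetical and not part of any configfs interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of configfs): store one value into an
 * attribute file. The whole value goes out in a single write(2), and
 * the result of close(2) is checked as well.
 * Returns 0 on success or a negative errno value.
 */
int write_attr(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write: the value was not stored whole */

	/* Report a close(2) failure unless an earlier error is pending. */
	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

With a FakeNBD connection directory in place, a call such as
``write_attr("/config/fakenbd/disk1/target", "10.0.0.1\n")`` would play the
role of a shell ``echo`` redirected into the attribute file.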

When the FakeNBD driver is loaded, it registers itself with configfs.
readdir(3) sees this just fine::

	# ls /config
	fakenbd

A fakenbd connection can be created with mkdir(2). The name is
arbitrary, but likely the tool will make some use of the name. Perhaps
it is a uuid or a disk name::

	# mkdir /config/fakenbd/disk1
	# ls /config/fakenbd/disk1
	target device rw

The target attribute contains the IP address of the server FakeNBD will
connect to. The device attribute is the device on the server.
Predictably, the rw attribute determines whether the connection is
read-only or read-write::

	# echo 10.0.0.1 > /config/fakenbd/disk1/target
	# echo /dev/sda1 > /config/fakenbd/disk1/device
	# echo 1 > /config/fakenbd/disk1/rw

That's it. That's all there is. Now the device is configured, via the
shell no less.

Coding With configfs
====================

Every object in configfs is a config_item. A config_item reflects an
object in the subsystem. It has attributes that match values on that
object. configfs handles the filesystem representation of that object
and its attributes, allowing the subsystem to ignore all but the
basic show/store interaction.

Items are created and destroyed inside a config_group.
A group is a collection of items that share the same attributes and
operations. Items are created by mkdir(2) and removed by rmdir(2), but
configfs handles that. The group has a set of operations to perform
these tasks.

A subsystem is the top level of a client module. During initialization,
the client module registers the subsystem with configfs, and the subsystem
appears as a directory at the top of the configfs filesystem. A
subsystem is also a config_group, and can do everything a config_group
can.

struct config_item
==================

::

	struct config_item {
		char			*ci_name;
		char			ci_namebuf[UOBJ_NAME_LEN];
		struct kref		ci_kref;
		struct list_head	ci_entry;
		struct config_item	*ci_parent;
		struct config_group	*ci_group;
		struct config_item_type	*ci_type;
		struct dentry		*ci_dentry;
	};

	void config_item_init(struct config_item *);
	void config_item_init_type_name(struct config_item *,
					const char *name,
					struct config_item_type *type);
	struct config_item *config_item_get(struct config_item *);
	void config_item_put(struct config_item *);

Generally, struct config_item is embedded in a container structure, a
structure that actually represents what the subsystem is doing.
The config_item portion of that structure is how the object interacts with
configfs.

Whether statically defined in a source file or created by a parent
config_group, a config_item must have one of the _init() functions
called on it. This initializes the reference count and sets up the
appropriate fields.

All users of a config_item should have a reference on it via
config_item_get(), and drop the reference when they are done via
config_item_put().

By itself, a config_item cannot do much more than appear in configfs.
Usually a subsystem wants the item to display and/or store attributes,
among other things. For that, it needs a type.

struct config_item_type
=======================

::

	struct configfs_item_operations {
		void (*release)(struct config_item *);
		int (*allow_link)(struct config_item *src,
				  struct config_item *target);
		void (*drop_link)(struct config_item *src,
				  struct config_item *target);
	};

	struct config_item_type {
		struct module				*ct_owner;
		struct configfs_item_operations		*ct_item_ops;
		struct configfs_group_operations	*ct_group_ops;
		struct configfs_attribute		**ct_attrs;
		struct configfs_bin_attribute		**ct_bin_attrs;
	};

The most basic function of a
config_item_type is to define what
operations can be performed on a config_item. All items that have been
allocated dynamically will need to provide the ct_item_ops->release()
method. This method is called when the config_item's reference count
reaches zero.

struct configfs_attribute
=========================

::

	struct configfs_attribute {
		char		*ca_name;
		struct module	*ca_owner;
		umode_t		ca_mode;
		ssize_t (*show)(struct config_item *, char *);
		ssize_t (*store)(struct config_item *, const char *, size_t);
	};

When a config_item wants an attribute to appear as a file in the item's
configfs directory, it must define a configfs_attribute describing it.
It then adds the attribute to the NULL-terminated array
config_item_type->ct_attrs. When the item appears in configfs, the
attribute file will appear with the configfs_attribute->ca_name
filename. configfs_attribute->ca_mode specifies the file permissions.

If an attribute is readable and provides a ->show method, that method will
be called whenever userspace asks for a read(2) on the attribute. If an
attribute is writable and provides a ->store method, that method will be
called whenever userspace asks for a write(2) on the attribute.
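
The shape of a ->show/->store pair can be sketched in plain userspace C.
Everything below is an illustration under stated assumptions: struct
config_item and container_of() are minimal stand-ins so the code builds
outside the kernel, and struct fakenbd_disk with its rw field is a
hypothetical container for the FakeNBD example. A real module would use
the definitions from <linux/configfs.h> and the kernel's own helpers
instead.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Userspace stand-ins so the sketch builds outside the kernel. */
struct config_item { const char *ci_name; };
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical container structure embedding a config_item. */
struct fakenbd_disk {
	struct config_item item;
	int rw;
};

static struct fakenbd_disk *to_disk(struct config_item *item)
{
	return container_of(item, struct fakenbd_disk, item);
}

/* ->show: format the value into the buffer and return its length. */
ssize_t fakenbd_disk_rw_show(struct config_item *item, char *page)
{
	return sprintf(page, "%d\n", to_disk(item)->rw);
}

/*
 * ->store: parse the whole buffer and return the number of bytes
 * consumed, or a negative errno value. This sketch assumes the buffer
 * is NUL-terminated.
 */
ssize_t fakenbd_disk_rw_store(struct config_item *item,
			      const char *page, size_t count)
{
	char *end;
	long val;

	val = strtol(page, &end, 10);
	if (end == page || (*end != '\n' && *end != '\0'))
		return -EINVAL;
	if (val != 0 && val != 1)
		return -EINVAL;

	to_disk(item)->rw = (int)val;
	return count;
}
```

Both methods receive only the config_item, so the first step in each is
to recover the container structure that holds the actual value.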

struct configfs_bin_attribute
=============================

::

	struct configfs_bin_attribute {
		struct configfs_attribute	cb_attr;
		void				*cb_private;
		size_t				cb_max_size;
	};

A binary attribute is used when a binary blob needs to appear as the
contents of a file in the item's configfs directory.
To do so, add the binary attribute to the NULL-terminated array
config_item_type->ct_bin_attrs. When the item appears in configfs, the
attribute file will appear with the configfs_bin_attribute->cb_attr.ca_name
filename. configfs_bin_attribute->cb_attr.ca_mode specifies the file
permissions.
The cb_private member is provided for use by the driver, while the
cb_max_size member specifies the maximum amount of vmalloc buffer
to be used.

If a binary attribute is readable and the config_item provides a
ct_item_ops->read_bin_attribute() method, that method will be called
whenever userspace asks for a read(2) on the attribute. The converse
will happen for write(2). The reads/writes are buffered so only a
single read/write will occur; the attribute need not concern itself
with it.

struct config_group
===================

A config_item cannot live in a vacuum. The only way one can be created
is via mkdir(2) on a config_group.
This will trigger creation of a child item::

	struct config_group {
		struct config_item		cg_item;
		struct list_head		cg_children;
		struct configfs_subsystem	*cg_subsys;
		struct list_head		default_groups;
		struct list_head		group_entry;
	};

	void config_group_init(struct config_group *group);
	void config_group_init_type_name(struct config_group *group,
					 const char *name,
					 struct config_item_type *type);


The config_group structure contains a config_item. Properly configuring
that item means that a group can behave as an item in its own right.
However, it can do more: it can create child items or groups. This is
accomplished via the group operations specified on the group's
config_item_type::

	struct configfs_group_operations {
		struct config_item *(*make_item)(struct config_group *group,
						 const char *name);
		struct config_group *(*make_group)(struct config_group *group,
						   const char *name);
		int (*commit_item)(struct config_item *item);
		void (*disconnect_notify)(struct config_group *group,
					  struct config_item *item);
		void (*drop_item)(struct config_group *group,
				  struct config_item *item);
	};

A group creates child items by providing the
ct_group_ops->make_item() method.
If provided, this method is called from
mkdir(2) in the group's directory. The subsystem allocates a new
config_item (or more likely, its container structure), initializes it,
and returns it to configfs. Configfs will then populate the filesystem
tree to reflect the new item.

If the subsystem wants the child to be a group itself, the subsystem
provides ct_group_ops->make_group(). Everything else behaves the same,
using the group _init() functions on the group.

Finally, when userspace calls rmdir(2) on the item or group,
ct_group_ops->drop_item() is called. As a config_group is also a
config_item, a separate drop_group() method is not necessary.
The subsystem must config_item_put() the reference that was initialized
upon item allocation. If a subsystem has no work to do, it may omit
the ct_group_ops->drop_item() method, and configfs will call
config_item_put() on the item on behalf of the subsystem.

Important:
   drop_item() is void, and as such cannot fail. When rmdir(2)
   is called, configfs WILL remove the item from the filesystem tree
   (assuming that it has no children to keep it busy). The subsystem is
   responsible for responding to this. If the subsystem has references to
   the item in other threads, the memory is safe. It may take some time
   for the item to actually disappear from the subsystem's usage. But it
   is gone from configfs.
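
The allocation and reference flow around make_item(), drop_item() and
release() can be sketched as a small userspace model. All types and the
two config_item_*() functions below are simplified stand-ins written for
this sketch, not the kernel implementations (a plain counter replaces
ci_kref, for instance), and struct fakenbd_disk and the
fakenbd_live_disks counter are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel types; assumptions only. */
struct config_item;
struct configfs_item_operations {
	void (*release)(struct config_item *);
};
struct config_item_type {
	struct configfs_item_operations *ct_item_ops;
};
struct config_item {
	char ci_namebuf[20];		/* size is arbitrary in this sketch */
	int ci_refs;			/* stands in for ci_kref */
	struct config_item_type *ci_type;
};
struct config_group { struct config_item cg_item; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in: set name and type, and start with one reference. */
static void config_item_init_type_name(struct config_item *item,
				       const char *name,
				       struct config_item_type *type)
{
	snprintf(item->ci_namebuf, sizeof(item->ci_namebuf), "%s", name);
	item->ci_refs = 1;
	item->ci_type = type;
}

/* Stand-in: drop a reference; call ->release() when it reaches zero. */
static void config_item_put(struct config_item *item)
{
	if (--item->ci_refs == 0 && item->ci_type->ct_item_ops->release)
		item->ci_type->ct_item_ops->release(item);
}

/* Hypothetical container for one FakeNBD connection. */
struct fakenbd_disk {
	struct config_item item;
	int rw;
};

int fakenbd_live_disks;	/* bookkeeping for the sketch only */

/* ->release(): runs at refcount zero and frees the container. */
static void fakenbd_disk_release(struct config_item *item)
{
	fakenbd_live_disks--;
	free(container_of(item, struct fakenbd_disk, item));
}

static struct configfs_item_operations fakenbd_disk_item_ops = {
	.release = fakenbd_disk_release,
};
static struct config_item_type fakenbd_disk_type = {
	.ct_item_ops = &fakenbd_disk_item_ops,
};

/* ->make_item(): allocate the container, initialize the embedded item. */
struct config_item *fakenbd_make_item(struct config_group *group,
				      const char *name)
{
	struct fakenbd_disk *disk = calloc(1, sizeof(*disk));

	(void)group;
	if (!disk)
		return NULL;	/* in-kernel code returns ERR_PTR(-ENOMEM) */

	config_item_init_type_name(&disk->item, name, &fakenbd_disk_type);
	fakenbd_live_disks++;
	return &disk->item;
}

/* ->drop_item(): drop the reference that was set up at allocation. */
void fakenbd_drop_item(struct config_group *group, struct config_item *item)
{
	(void)group;
	config_item_put(item);
}
```

Note that drop_item() does not free anything itself. If another user
still holds a reference, the container stays valid until the final
config_item_put() triggers release().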

When drop_item() is called, the item's linkage has already been torn
down. It no longer has a reference on its parent and has no place in
the item hierarchy. If a client needs to do some cleanup before this
teardown happens, the subsystem can implement the
ct_group_ops->disconnect_notify() method. The method is called after
configfs has removed the item from the filesystem view but before the
item is removed from its parent group. Like drop_item(),
disconnect_notify() is void and cannot fail. Client subsystems should
not drop any references here, as they still must do it in drop_item().

A config_group cannot be removed while it still has child items. This
is implemented in the configfs rmdir(2) code. ->drop_item() will not be
called, as the item has not been dropped. rmdir(2) will fail, as the
directory is not empty.

struct configfs_subsystem
=========================

A subsystem must register itself, usually at module_init time. This
tells configfs to make the subsystem appear in the file tree::

	struct configfs_subsystem {
		struct config_group	su_group;
		struct mutex		su_mutex;
	};

	int configfs_register_subsystem(struct configfs_subsystem *subsys);
	void configfs_unregister_subsystem(struct configfs_subsystem *subsys);

A subsystem consists of a toplevel config_group and a mutex.
The group is where child config_items are created. For a subsystem,
this group is usually defined statically. Before calling
configfs_register_subsystem(), the subsystem must have initialized the
group via the usual group _init() functions, and it must also have
initialized the mutex.

When the register call returns, the subsystem is live, and it
will be visible via configfs. At that point, mkdir(2) can be called and
the subsystem must be ready for it.

An Example
==========

The best example of these basic concepts is the simple_children
subsystem/group and the simple_child item in
samples/configfs/configfs_sample.c. It shows a trivial object displaying
and storing an attribute, and a simple group creating and destroying
these children.

Hierarchy Navigation and the Subsystem Mutex
============================================

There is an extra bonus that configfs provides. The config_groups and
config_items are arranged in a hierarchy due to the fact that they
appear in a filesystem. A subsystem is NEVER to touch the filesystem
parts, but the subsystem might be interested in this hierarchy. For
this reason, the hierarchy is mirrored via the config_group->cg_children
and config_item->ci_parent structure members.

A subsystem can navigate the cg_children list and the ci_parent pointer
to see the tree created by the subsystem.
This can race with configfs'
management of the hierarchy, so configfs uses the subsystem mutex to
protect modifications. Whenever a subsystem wants to navigate the
hierarchy, it must do so under the protection of the subsystem
mutex.

A subsystem will be prevented from acquiring the mutex while a newly
allocated item has not been linked into this hierarchy. Similarly, it
will not be able to acquire the mutex while a dropping item has not
yet been unlinked. This means that an item's ci_parent pointer will
never be NULL while the item is in configfs, and that an item will only
be in its parent's cg_children list for the same duration. This allows
a subsystem to trust ci_parent and cg_children while it holds the
mutex.

Item Aggregation Via symlink(2)
===============================

configfs provides a simple group via the group->item parent/child
relationship. Often, however, a larger environment requires aggregation
outside of the parent/child connection. This is implemented via
symlink(2).

A config_item may provide the ct_item_ops->allow_link() and
ct_item_ops->drop_link() methods. If the ->allow_link() method exists,
symlink(2) may be called with the config_item as the source of the link.
These links are only allowed between configfs config_items. Any
symlink(2) attempt outside the configfs filesystem will be denied.
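
One possible shape for such a pair of methods is sketched below in
userspace C. The item types, struct fakenbd_pool and its nr_disks
counter are hypothetical, the struct definitions are minimal stand-ins
for the kernel ones, and the choice of -EINVAL for a rejected link is an
assumption of this sketch. The source item accepts a link by returning 0
and only does so when the target is of the type it expects.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; assumptions for this sketch. */
struct config_item_type { const char *name; };
struct config_item { struct config_item_type *ci_type; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical item types: a disk, and a pool that aggregates disks. */
struct config_item_type fakenbd_disk_type = { "disk" };
struct config_item_type fakenbd_pool_type = { "pool" };

struct fakenbd_pool {
	struct config_item item;
	int nr_disks;		/* number of disks currently linked */
};

/* ->allow_link(): accept only links whose target is a FakeNBD disk. */
int fakenbd_pool_allow_link(struct config_item *src,
			    struct config_item *target)
{
	struct fakenbd_pool *pool = container_of(src, struct fakenbd_pool, item);

	if (target->ci_type != &fakenbd_disk_type)
		return -EINVAL;	/* the errno choice is an assumption */

	pool->nr_disks++;
	return 0;
}

/* ->drop_link(): void, so it can only record that the link is gone. */
void fakenbd_pool_drop_link(struct config_item *src,
			    struct config_item *target)
{
	struct fakenbd_pool *pool = container_of(src, struct fakenbd_pool, item);

	(void)target;
	pool->nr_disks--;
}
```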

When symlink(2) is called, the source config_item's ->allow_link()
method is called with itself and a target item. If the source item
allows linking to the target item, it returns 0. A source item may wish to
reject a link if it only wants links to a certain type of object (say,
in its own subsystem).

When unlink(2) is called on the symbolic link, the source item is
notified via the ->drop_link() method. Like the ->drop_item() method,
this is a void function and cannot return failure. The subsystem is
responsible for responding to the change.

A config_item cannot be removed while it links to any other item, nor
can it be removed while an item links to it. Dangling symlinks are not
allowed in configfs.

Automatically Created Subgroups
===============================

A new config_group may want to have two types of child config_items.
While this could be codified by magic names in ->make_item(), it is much
more explicit to have a method whereby userspace sees this divergence.

Rather than have a group where some items behave differently than
others, configfs provides a method whereby one or many subgroups are
automatically created inside the parent at its creation. Thus,
mkdir("parent") results in "parent", "parent/subgroup1", up through
"parent/subgroupN".
Items of type 1 can now be created in
"parent/subgroup1", and items of type N can be created in
"parent/subgroupN".

These automatic subgroups, or default groups, do not preclude other
children of the parent group. If ct_group_ops->make_group() exists,
other child groups can be created on the parent group directly.

A configfs subsystem specifies default groups by adding them using the
configfs_add_default_group() function to the parent config_group
structure. Each added group is populated in the configfs tree at the same
time as the parent group. Similarly, they are removed at the same time
as the parent. No extra notification is provided. When a ->drop_item()
method call notifies the subsystem the parent group is going away, it
also means every default group child associated with that parent group.

As a consequence of this, default groups cannot be removed directly via
rmdir(2). They also are not considered when rmdir(2) on the parent
group is checking for children.

Dependent Subsystems
====================

Sometimes other drivers depend on particular configfs items. For
example, ocfs2 mounts depend on a heartbeat region item. If that
region item is removed with rmdir(2), the ocfs2 mount must BUG or go
readonly. Not happy.

configfs provides two additional API calls: configfs_depend_item() and
configfs_undepend_item().
A client driver can call
configfs_depend_item() on an existing item to tell configfs that it is
depended on. configfs will then return -EBUSY from rmdir(2) for that
item. When the item is no longer depended on, the client driver calls
configfs_undepend_item() on it.

These APIs cannot be called underneath any configfs callbacks, as
they will conflict. They can block and allocate. A client driver
probably shouldn't be calling them of its own gumption. Rather it should
be providing an API that external subsystems call.

How does this work? Imagine the ocfs2 mount process. When it mounts,
it asks for a heartbeat region item. This is done via a call into the
heartbeat code. Inside the heartbeat code, the region item is looked
up. Here, the heartbeat code calls configfs_depend_item(). If it
succeeds, then heartbeat knows the region is safe to give to ocfs2.
If it fails, it was being torn down anyway, and heartbeat can gracefully
pass up an error.

Committable Items
=================

Note:
     Committable items are currently unimplemented.

Some config_items cannot have a valid initial state. That is, no
default values can be specified for the item's attributes such that the
item can do its work. Userspace must configure one or more attributes,
after which the subsystem can start whatever entity this item
represents.

Consider the FakeNBD device from above. Without a target address *and*
a target device, the subsystem has no idea what block device to import.
The simple example assumes that the subsystem merely waits until all the
appropriate attributes are configured, and then connects. This will,
indeed, work, but now every attribute store must check if the attributes
are initialized. Every attribute store must fire off the connection if
that condition is met.

Far better would be an explicit action notifying the subsystem that the
config_item is ready to go. More importantly, an explicit action allows
the subsystem to provide feedback as to whether the attributes are
initialized in a way that makes sense. configfs provides this as
committable items.

configfs still uses only normal filesystem operations. An item is
committed via rename(2). The item is moved from a directory where it
can be modified to a directory where it cannot.

Any group that provides the ct_group_ops->commit_item() method has
committable items. When this group appears in configfs, mkdir(2) will
not work directly in the group. Instead, the group will have two
subdirectories: "live" and "pending". The "live" directory does not
support mkdir(2) or rmdir(2) either. It only allows rename(2). The
"pending" directory does allow mkdir(2) and rmdir(2). An item is
created in the "pending" directory. Its attributes can be modified at
will.
Userspace commits the item by renaming it into the "live"
directory. At this point, the subsystem receives the ->commit_item()
callback. If all required attributes are filled to satisfaction, the
method returns zero and the item is moved to the "live" directory.

As rmdir(2) does not work in the "live" directory, an item must be
shut down, or "uncommitted". Again, this is done via rename(2), this
time from the "live" directory back to the "pending" one. The subsystem
is notified by the ct_group_ops->uncommit_object() method.