=======================================================
Configfs - Userspace-driven Kernel Object Configuration
=======================================================

Joel Becker <joel.becker@oracle.com>

Updated: 31 March 2005

Copyright (c) 2005 Oracle Corporation,
	Joel Becker <joel.becker@oracle.com>


What is configfs?
=================

configfs is a ram-based filesystem that provides the converse of
sysfs's functionality.  Where sysfs is a filesystem-based view of
kernel objects, configfs is a filesystem-based manager of kernel
objects, or config_items.

With sysfs, an object is created in kernel (for example, when a device
is discovered) and it is registered with sysfs.  Its attributes then
appear in sysfs, allowing userspace to read the attributes via
readdir(3)/read(2).  It may allow some attributes to be modified via
write(2).  The important point is that the object is created and
destroyed in kernel, the kernel controls the lifecycle of the sysfs
representation, and sysfs is merely a window on all this.

A configfs config_item is created via an explicit userspace operation:
mkdir(2).  It is destroyed via rmdir(2).  The attributes appear at
mkdir(2) time, and can be read or modified via read(2) and write(2).
As with sysfs, readdir(3) queries the list of items and/or attributes.
symlink(2) can be used to group items together.  Unlike sysfs, the
lifetime of the representation is completely driven by userspace.  The
kernel modules backing the items must respond to this.

Both sysfs and configfs can and should exist together on the same
system.  One is not a replacement for the other.

Using configfs
==============

configfs can be compiled as a module or into the kernel.  You can access
it by doing::

	mount -t configfs none /config

The configfs tree will be empty unless client modules are also loaded.
These are modules that register their item types with configfs as
subsystems.  Once a client subsystem is loaded, it will appear as a
subdirectory (or more than one) under /config.  Like sysfs, the
configfs tree is always there, whether mounted on /config or not.

An item is created via mkdir(2).  The item's attributes will also
appear at this time.  readdir(3) can determine what the attributes are,
read(2) can query their default values, and write(2) can store new
values.  Don't mix more than one attribute in one attribute file.
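
As a minimal userspace sketch of the readdir(3) step, the helper below
lists whatever entries a directory contains.  The function name
list_item_entries() is made up for this illustration; nothing in it is
specific to configfs, so it behaves the same on any directory.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Print every entry of an item directory and return how many there are
 * ("." and ".." excluded), or -1 if the directory cannot be opened.
 * list_item_entries() is a hypothetical helper, not a configfs API.
 */
int list_item_entries(const char *item_dir)
{
	struct dirent *de;
	int count = 0;
	DIR *dir;

	assert(item_dir != NULL);
	dir = opendir(item_dir);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		/* Skip the two entries every directory has. */
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		printf("%s\n", de->d_name);
		count++;
	}
	closedir(dir);
	return count;
}
```

Pointed at a freshly created item directory under /config, this would be
expected to print one line per attribute file.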

There are two types of configfs attributes:

* Normal attributes, which, similar to sysfs attributes, are small ASCII text
  files with a maximum size of one page (PAGE_SIZE, 4096 on i386).  Preferably
  only one value per file should be used, and the same caveats from sysfs apply.
  Configfs expects write(2) to store the entire buffer at once.  When writing to
  normal configfs attributes, userspace processes should first read the entire
  file, modify the portions they wish to change, and then write the entire
  buffer back.

* Binary attributes, which are somewhat similar to sysfs binary attributes,
  but with a few slight changes to semantics.  The PAGE_SIZE limitation does not
  apply, but the whole binary item must fit in a single kernel vmalloc'ed buffer.
  The write(2) calls from user space are buffered, and the attribute's
  write_bin_attribute method will be invoked on the final close.  It is therefore
  imperative for user space to check the return code of close(2) in order to
  verify that the operation finished successfully.
  To avoid a malicious user OOMing the kernel, there's a per-binary attribute
  maximum buffer value.
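
Both rules above (hand over the whole buffer in one write(2), and check
close(2)) can be sketched from userspace as follows.  attr_store() is a
hypothetical helper name, and treating a short write as -EIO is a choice
made for this example rather than something configfs prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Store a complete value into an attribute file with a single write(2).
 * Returns 0 on success or a negative errno value.  A short write is
 * reported as -EIO (an assumption of this sketch).
 */
int attr_store(const char *path, const void *buf, size_t len)
{
	ssize_t n;
	int err = 0;
	int fd;

	assert(path != NULL && buf != NULL);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;

	/*
	 * For binary attributes the data is only handed to the driver on
	 * the final close, so a failure may first show up here.
	 */
	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}
```

A caller updating part of a normal attribute would read the file first,
edit its copy, and then pass the full edited buffer to attr_store().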

When an item needs to be destroyed, remove it with rmdir(2).  An
item cannot be destroyed if any other item has a link to it (via
symlink(2)).  Links can be removed via unlink(2).

Configuring FakeNBD: an Example
===============================

Imagine there's a Network Block Device (NBD) driver that allows you to
access remote block devices.  Call it FakeNBD.  FakeNBD uses configfs
for its configuration.  Obviously, there will be a nice program that
sysadmins use to configure FakeNBD, but somehow that program has to tell
the driver about it.  Here's where configfs comes in.

When the FakeNBD driver is loaded, it registers itself with configfs.
readdir(3) sees this just fine::

	# ls /config
	fakenbd

A fakenbd connection can be created with mkdir(2).  The name is
arbitrary, but likely the tool will make some use of the name.  Perhaps
it is a uuid or a disk name::

	# mkdir /config/fakenbd/disk1
	# ls /config/fakenbd/disk1
	target device rw

The target attribute contains the IP address of the server FakeNBD will
connect to.  The device attribute is the device on the server.
Predictably, the rw attribute determines whether the connection is
read-only or read-write::

	# echo 10.0.0.1 > /config/fakenbd/disk1/target
	# echo /dev/sda1 > /config/fakenbd/disk1/device
	# echo 1 > /config/fakenbd/disk1/rw

That's it.  That's all there is.  Now the device is configured, via the
shell no less.

Coding With configfs
====================

Every object in configfs is a config_item.  A config_item reflects an
object in the subsystem.  It has attributes that match values on that
object.  configfs handles the filesystem representation of that object
and its attributes, allowing the subsystem to ignore all but the
basic show/store interaction.

Items are created and destroyed inside a config_group.  A group is a
collection of items that share the same attributes and operations.
Items are created by mkdir(2) and removed by rmdir(2), but configfs
handles that.  The group has a set of operations to perform these tasks.

A subsystem is the top level of a client module.  During initialization,
the client module registers the subsystem with configfs, and the
subsystem then appears as a directory at the top of the configfs
filesystem.  A subsystem is also a config_group, and can do everything a
config_group can.

struct config_item
==================

::

	struct config_item {
		char                    *ci_name;
		char                    ci_namebuf[UOBJ_NAME_LEN];
		struct kref             ci_kref;
		struct list_head        ci_entry;
		struct config_item      *ci_parent;
		struct config_group     *ci_group;
		struct config_item_type *ci_type;
		struct dentry           *ci_dentry;
	};

	void config_item_init(struct config_item *);
	void config_item_init_type_name(struct config_item *,
					const char *name,
					struct config_item_type *type);
	struct config_item *config_item_get(struct config_item *);
	void config_item_put(struct config_item *);

Generally, struct config_item is embedded in a container structure, a
structure that actually represents what the subsystem is doing.  The
config_item portion of that structure is how the object interacts with
configfs.
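
The embedding pattern can be tried out in plain userspace C.  In the
sketch below, struct mock_item stands in for struct config_item, and
struct fakenbd_conn and to_fakenbd_conn() are invented names for a
container and its accessor; only the container_of() idea itself is
taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct config_item (name only). */
struct mock_item {
	const char *ci_name;
};

/* Hypothetical container: the object the subsystem really works with. */
struct fakenbd_conn {
	struct mock_item item;	/* embedded handle that configfs would see */
	int rw;
};

/* Same idea as the kernel's container_of(), written with offsetof(). */
#define mock_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the container from the embedded item passed to a callback. */
static inline struct fakenbd_conn *to_fakenbd_conn(struct mock_item *item)
{
	assert(item != NULL);
	return mock_container_of(item, struct fakenbd_conn, item);
}
```

Callbacks receive only the embedded item, so an accessor of this shape
is typically the first line of each of them.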

Whether statically defined in a source file or created by a parent
config_group, a config_item must have one of the _init() functions
called on it.  This initializes the reference count and sets up the
appropriate fields.

All users of a config_item should have a reference on it via
config_item_get(), and drop the reference when they are done via
config_item_put().

By itself, a config_item cannot do much more than appear in configfs.
Usually a subsystem wants the item to display and/or store attributes,
among other things.  For that, it needs a type.

struct config_item_type
=======================

::

	struct configfs_item_operations {
		void (*release)(struct config_item *);
		int (*allow_link)(struct config_item *src,
				  struct config_item *target);
		void (*drop_link)(struct config_item *src,
				 struct config_item *target);
	};

	struct config_item_type {
		struct module                           *ct_owner;
		struct configfs_item_operations         *ct_item_ops;
		struct configfs_group_operations        *ct_group_ops;
		struct configfs_attribute               **ct_attrs;
		struct configfs_bin_attribute		**ct_bin_attrs;
	};

The most basic function of a config_item_type is to define what
operations can be performed on a config_item.  All items that have been
allocated dynamically will need to provide the ct_item_ops->release()
method.  This method is called when the config_item's reference count
reaches zero.
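
The get/put/release life cycle can be modelled in userspace with a plain
integer count.  Everything prefixed mock_ below is invented for this
sketch; the kernel uses a struct kref and the real config_item_get() and
config_item_put(), and is safe against concurrent callers, which this
model is not.

```c
#include <assert.h>
#include <stdlib.h>

struct mock_item;

struct mock_item_ops {
	void (*release)(struct mock_item *item);
};

struct mock_item {
	int refcount;				/* stands in for ci_kref */
	const struct mock_item_ops *ops;	/* stands in for ct_item_ops */
};

int mock_released;	/* counts release() calls, for demonstration only */

/* release(): the only place a dynamically allocated item is freed. */
static void mock_release(struct mock_item *item)
{
	mock_released++;
	free(item);
}

static const struct mock_item_ops mock_ops = { .release = mock_release };

/* Allocation plus the job of the _init() functions: count starts at one. */
struct mock_item *mock_item_alloc(void)
{
	struct mock_item *item = calloc(1, sizeof(*item));

	if (!item)
		return NULL;
	item->refcount = 1;
	item->ops = &mock_ops;
	return item;
}

struct mock_item *mock_item_get(struct mock_item *item)
{
	assert(item != NULL && item->refcount > 0);
	item->refcount++;
	return item;
}

void mock_item_put(struct mock_item *item)
{
	assert(item != NULL && item->refcount > 0);
	/* Dropping the last reference triggers release(). */
	if (--item->refcount == 0 && item->ops && item->ops->release)
		item->ops->release(item);
}
```

The item must not be touched after the put that drops the last
reference, since release() may already have freed it.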

struct configfs_attribute
=========================

::

	struct configfs_attribute {
		char                    *ca_name;
		struct module           *ca_owner;
		umode_t                  ca_mode;
		ssize_t (*show)(struct config_item *, char *);
		ssize_t (*store)(struct config_item *, const char *, size_t);
	};

When a config_item wants an attribute to appear as a file in the item's
configfs directory, it must define a configfs_attribute describing it.
It then adds the attribute to the NULL-terminated array
config_item_type->ct_attrs.  When the item appears in configfs, the
attribute file will appear with the configfs_attribute->ca_name
filename.  configfs_attribute->ca_mode specifies the file permissions.

If an attribute is readable and provides a ->show method, that method will
be called whenever userspace asks for a read(2) on the attribute.  If an
attribute is writable and provides a ->store method, that method will be
called whenever userspace asks for a write(2) on the attribute.
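
A userspace model of a show/store pair for a single integer attribute
might look like the sketch below.  The names mock_child, storeme and
MOCK_PAGE_SIZE are illustrative, the callbacks take the container
directly instead of a struct config_item pointer to keep the example
short, and the buffer handed to the store function is assumed to be
NUL-terminated.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define MOCK_PAGE_SIZE 4096	/* illustrative stand-in for PAGE_SIZE */

struct mock_child {
	int storeme;
};

/* show: format the value into a page-sized buffer, return its length. */
ssize_t mock_storeme_show(const struct mock_child *c, char *page)
{
	assert(c != NULL && page != NULL);
	return snprintf(page, MOCK_PAGE_SIZE, "%d\n", c->storeme);
}

/* store: parse the whole buffer as one integer, return bytes consumed. */
ssize_t mock_storeme_store(struct mock_child *c, const char *page,
			   size_t count)
{
	char *end;
	long val;

	assert(c != NULL && page != NULL);
	errno = 0;
	val = strtol(page, &end, 10);
	/* Reject empty input and trailing junk other than one newline. */
	if (end == page || errno || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (val < INT_MIN || val > INT_MAX)
		return -ERANGE;
	c->storeme = (int)val;
	return (ssize_t)count;
}
```

Returning the full count from the store function tells the caller that
the entire buffer was accepted, matching the one-value-per-file rule.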

struct configfs_bin_attribute
=============================

::

	struct configfs_bin_attribute {
		struct configfs_attribute	cb_attr;
		void				*cb_private;
		size_t				cb_max_size;
	};

A binary attribute is used when a binary blob needs to appear as the
contents of a file in the item's configfs directory.
To do so, add the binary attribute to the NULL-terminated array
config_item_type->ct_bin_attrs.  When the item appears in configfs, the
attribute file will appear with the configfs_bin_attribute->cb_attr.ca_name
filename.  configfs_bin_attribute->cb_attr.ca_mode specifies the file
permissions.
The cb_private member is provided for use by the driver, while the
cb_max_size member specifies the maximum amount of vmalloc buffer
to be used.

If a binary attribute is readable and the config_item provides a
ct_item_ops->read_bin_attribute() method, that method will be called
whenever userspace asks for a read(2) on the attribute.  The converse
will happen for write(2).  The reads/writes are buffered, so only a
single read/write will occur; the attribute's methods need not concern
themselves with it.
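
The buffering can be pictured with a small userspace model: writes only
accumulate, and the blob reaches the attribute once, at close.  All
mock_ names are invented, and the error codes (-EFBIG for exceeding the
limit, -ENOMEM for allocation failure) are assumptions of this sketch,
not a statement about what configfs returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct mock_bin_file {
	char *buf;		/* everything written so far */
	size_t len;
	size_t max_size;	/* plays the role of cb_max_size, 0 = no limit */
	int (*commit)(const void *data, size_t len);	/* runs once, at close */
};

size_t mock_committed_len;	/* records what the demo commit hook saw */

/* Demo commit hook: a real driver would parse the blob here. */
int mock_commit(const void *data, size_t len)
{
	(void)data;
	mock_committed_len = len;
	return 0;
}

/* One write(2): only buffered, the attribute is not touched yet. */
long mock_bin_write(struct mock_bin_file *f, const void *data, size_t count)
{
	char *nbuf;

	assert(f != NULL && data != NULL);
	if (count == 0)
		return 0;
	if (f->max_size && f->len + count > f->max_size)
		return -EFBIG;
	nbuf = realloc(f->buf, f->len + count);
	if (!nbuf)
		return -ENOMEM;
	memcpy(nbuf + f->len, data, count);
	f->buf = nbuf;
	f->len += count;
	return (long)count;
}

/* close(2): the whole blob is handed over once; its result is the verdict. */
int mock_bin_close(struct mock_bin_file *f)
{
	int ret = 0;

	assert(f != NULL);
	if (f->len && f->commit)
		ret = f->commit(f->buf, f->len);
	free(f->buf);
	f->buf = NULL;
	f->len = 0;
	return ret;
}
```

Because the commit hook runs only in mock_bin_close(), every earlier
write can succeed and the operation as a whole can still fail, which is
why the return value of close(2) matters for binary attributes.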

struct config_group
===================

A config_item cannot live in a vacuum.  The only way one can be created
is via mkdir(2) on a config_group.  This will trigger creation of a
child item::

	struct config_group {
		struct config_item		cg_item;
		struct list_head		cg_children;
		struct configfs_subsystem 	*cg_subsys;
		struct list_head		default_groups;
		struct list_head		group_entry;
	};

	void config_group_init(struct config_group *group);
	void config_group_init_type_name(struct config_group *group,
					 const char *name,
					 struct config_item_type *type);


The config_group structure contains a config_item.  Properly configuring
that item means that a group can behave as an item in its own right.
However, it can do more: it can create child items or groups.  This is
accomplished via the group operations specified on the group's
config_item_type::

	struct configfs_group_operations {
		struct config_item *(*make_item)(struct config_group *group,
						 const char *name);
		struct config_group *(*make_group)(struct config_group *group,
						   const char *name);
		int (*commit_item)(struct config_item *item);
		void (*disconnect_notify)(struct config_group *group,
					  struct config_item *item);
		void (*drop_item)(struct config_group *group,
				  struct config_item *item);
	};

A group creates child items by providing the
ct_group_ops->make_item() method.  If provided, this method is called from
mkdir(2) in the group's directory.  The subsystem allocates a new
config_item (or more likely, its container structure), initializes it,
and returns it to configfs.  Configfs will then populate the filesystem
tree to reflect the new item.

If the subsystem wants the child to be a group itself, the subsystem
provides ct_group_ops->make_group().  Everything else behaves the same,
using the group _init() functions on the group.

Finally, when userspace calls rmdir(2) on the item or group,
ct_group_ops->drop_item() is called.  As a config_group is also a
config_item, a separate drop_group() method is not necessary.
The subsystem must config_item_put() the reference that was initialized
upon item allocation.  If a subsystem has no work to do, it may omit
the ct_group_ops->drop_item() method, and configfs will call
config_item_put() on the item on behalf of the subsystem.
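
The mkdir(2)/rmdir(2) flow through make_item() and drop_item() can be
modelled in userspace as below.  The mock_ types, simple_make_item() and
the error codes are all assumptions made for this sketch; the real
callbacks operate on struct config_group and struct config_item.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct mock_group;

struct mock_item {
	char name[32];
	int refcount;
	struct mock_item *next;		/* sibling link, like ci_entry */
};

struct mock_group {
	struct mock_item *children;	/* like cg_children */
	struct mock_item *(*make_item)(struct mock_group *g, const char *name);
	void (*drop_item)(struct mock_group *g, struct mock_item *item);
};

int mock_live_items;	/* allocated and not yet freed, for demonstration */

void mock_item_put(struct mock_item *item)
{
	if (--item->refcount == 0) {
		mock_live_items--;
		free(item);
	}
}

/* A make_item() implementation: allocate, initialise, hand back. */
struct mock_item *simple_make_item(struct mock_group *g, const char *name)
{
	struct mock_item *item = calloc(1, sizeof(*item));

	(void)g;
	if (!item)
		return NULL;
	snprintf(item->name, sizeof(item->name), "%s", name);
	item->refcount = 1;	/* the reference dropped again at rmdir time */
	mock_live_items++;
	return item;
}

/* What mkdir(2) in the group's directory boils down to in this model. */
int mock_mkdir(struct mock_group *g, const char *name)
{
	struct mock_item *item;

	assert(g != NULL && name != NULL);
	if (!g->make_item)
		return -EPERM;
	for (item = g->children; item; item = item->next)
		if (!strcmp(item->name, name))
			return -EEXIST;
	item = g->make_item(g, name);
	if (!item)
		return -ENOMEM;
	item->next = g->children;
	g->children = item;
	return 0;
}

/* rmdir(2): unlink first, then let the subsystem (or the core) drop it. */
int mock_rmdir(struct mock_group *g, const char *name)
{
	struct mock_item **pp, *item;

	assert(g != NULL && name != NULL);
	for (pp = &g->children; (item = *pp) != NULL; pp = &item->next) {
		if (strcmp(item->name, name))
			continue;
		*pp = item->next;
		item->next = NULL;
		if (g->drop_item)
			g->drop_item(g, item);	/* hook must put the item */
		else
			mock_item_put(item);	/* done on its behalf */
		return 0;
	}
	return -ENOENT;
}
```

A group that leaves drop_item unset, as in the default path above,
relies on the core to drop the allocation-time reference for it.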

Important:
   drop_item() is void, and as such cannot fail.  When rmdir(2)
   is called, configfs WILL remove the item from the filesystem tree
   (assuming that it has no children to keep it busy).  The subsystem is
   responsible for responding to this.  If the subsystem has references to
   the item in other threads, the memory is safe.  It may take some time
   for the item to actually disappear from the subsystem's usage.  But it
   is gone from configfs.

When drop_item() is called, the item's linkage has already been torn
down.  It no longer has a reference on its parent and has no place in
the item hierarchy.  If a client needs to do some cleanup before this
teardown happens, the subsystem can implement the
ct_group_ops->disconnect_notify() method.  The method is called after
configfs has removed the item from the filesystem view but before the
item is removed from its parent group.  Like drop_item(),
disconnect_notify() is void and cannot fail.  Client subsystems should
not drop any references here, as they still must do it in drop_item().

A config_group cannot be removed while it still has child items.  This
is implemented in the configfs rmdir(2) code.  ->drop_item() will not be
called, as the item has not been dropped.  rmdir(2) will fail, as the
directory is not empty.

struct configfs_subsystem
=========================

A subsystem must register itself, usually at module_init time.  This
tells configfs to make the subsystem appear in the file tree::

	struct configfs_subsystem {
		struct config_group	su_group;
		struct mutex		su_mutex;
	};

	int configfs_register_subsystem(struct configfs_subsystem *subsys);
	void configfs_unregister_subsystem(struct configfs_subsystem *subsys);

A subsystem consists of a toplevel config_group and a mutex.
The group is where child config_items are created.  For a subsystem,
this group is usually defined statically.  Before calling
configfs_register_subsystem(), the subsystem must have initialized the
group via the usual group _init() functions, and it must also have
initialized the mutex.

When the register call returns, the subsystem is live, and it
will be visible via configfs.  At that point, mkdir(2) can be called and
the subsystem must be ready for it.

An Example
==========

The best example of these basic concepts is the simple_children
subsystem/group and the simple_child item in
samples/configfs/configfs_sample.c. It shows a trivial object displaying
and storing an attribute, and a simple group creating and destroying
these children.

Hierarchy Navigation and the Subsystem Mutex
============================================

There is an extra bonus that configfs provides.  The config_groups and
config_items are arranged in a hierarchy due to the fact that they
appear in a filesystem.  A subsystem is NEVER to touch the filesystem
parts, but the subsystem might be interested in this hierarchy.  For
this reason, the hierarchy is mirrored via the config_group->cg_children
and config_item->ci_parent structure members.

A subsystem can navigate the cg_children list and the ci_parent pointer
to see the tree created by the subsystem.  This can race with configfs'
management of the hierarchy, so configfs uses the subsystem mutex to
protect modifications.  Whenever a subsystem wants to navigate the
hierarchy, it must do so under the protection of the subsystem
mutex.

A subsystem will be prevented from acquiring the mutex while a newly
allocated item has not been linked into this hierarchy.  Similarly, it
will not be able to acquire the mutex while a dropping item has not
yet been unlinked.  This means that an item's ci_parent pointer will
never be NULL while the item is in configfs, and that an item will only
be in its parent's cg_children list for the same duration.  This allows
a subsystem to trust ci_parent and cg_children while it holds the
mutex.
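
The locking rule can be sketched in userspace with a pthread mutex
standing in for su_mutex and a singly linked list standing in for
cg_children.  All mock_ names are invented for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct mock_group;

struct mock_item {
	struct mock_item *next;		/* stands in for the ci_entry linkage */
	struct mock_group *parent;	/* stands in for ci_parent */
};

struct mock_group {
	struct mock_item *children;	/* stands in for cg_children */
};

struct mock_subsystem {
	struct mock_group group;	/* stands in for su_group */
	pthread_mutex_t mutex;		/* stands in for su_mutex */
};

/* Walk the children only while holding the subsystem mutex. */
size_t mock_count_children(struct mock_subsystem *subsys)
{
	struct mock_item *item;
	size_t n = 0;

	assert(subsys != NULL);
	pthread_mutex_lock(&subsys->mutex);
	for (item = subsys->group.children; item; item = item->next) {
		/* Under the lock, a listed item always has its parent set. */
		assert(item->parent == &subsys->group);
		n++;
	}
	pthread_mutex_unlock(&subsys->mutex);
	return n;
}
```

Any pointer obtained during the walk should either be used before the
mutex is released or be pinned with its own reference first.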

Item Aggregation Via symlink(2)
===============================

configfs provides a simple group via the group->item parent/child
relationship.  Often, however, a larger environment requires aggregation
outside of the parent/child connection.  This is implemented via
symlink(2).

A config_item may provide the ct_item_ops->allow_link() and
ct_item_ops->drop_link() methods.  If the ->allow_link() method exists,
symlink(2) may be called with the config_item as the source of the link.
These links are only allowed between configfs config_items.  Any
symlink(2) attempt outside the configfs filesystem will be denied.

When symlink(2) is called, the source config_item's ->allow_link()
method is called with itself and a target item.  If the source item
allows linking to target item, it returns 0.  A source item may wish to
reject a link if it only wants links to a certain type of object (say,
in its own subsystem).

When unlink(2) is called on the symbolic link, the source item is
notified via the ->drop_link() method.  Like the ->drop_item() method,
this is a void function and cannot return failure.  The subsystem is
responsible for responding to the change.

A config_item cannot be removed while it links to any other item, nor
can it be removed while an item links to it.  Dangling symlinks are not
allowed in configfs.

Automatically Created Subgroups
===============================

A new config_group may want to have two types of child config_items.
While this could be codified by magic names in ->make_item(), it is much
more explicit to have a method whereby userspace sees this divergence.

Rather than have a group where some items behave differently than
others, configfs provides a method whereby one or many subgroups are
automatically created inside the parent at its creation.  Thus,
mkdir("parent") results in "parent", "parent/subgroup1", up through
"parent/subgroupN".  Items of type 1 can now be created in
"parent/subgroup1", and items of type N can be created in
"parent/subgroupN".

These automatic subgroups, or default groups, do not preclude other
children of the parent group.  If ct_group_ops->make_group() exists,
other child groups can be created on the parent group directly.

A configfs subsystem specifies default groups by adding them using the
configfs_add_default_group() function to the parent config_group
structure.  Each added group is populated in the configfs tree at the same
time as the parent group.  Similarly, they are removed at the same time
as the parent.  No extra notification is provided.  When a ->drop_item()
method call notifies the subsystem the parent group is going away, it
also means every default group child associated with that parent group.

As a consequence of this, default groups cannot be removed directly via
rmdir(2).  They also are not considered when rmdir(2) on the parent
group is checking for children.

Dependent Subsystems
====================

Sometimes other drivers depend on particular configfs items.  For
example, ocfs2 mounts depend on a heartbeat region item.  If that
region item is removed with rmdir(2), the ocfs2 mount must BUG or go
readonly.  Not happy.

configfs provides two additional API calls: configfs_depend_item() and
configfs_undepend_item().  A client driver can call
configfs_depend_item() on an existing item to tell configfs that it is
depended on.  configfs will then return -EBUSY from rmdir(2) for that
item.  When the item is no longer depended on, the client driver calls
configfs_undepend_item() on it.

These APIs cannot be called underneath any configfs callbacks, as
they will conflict.  They can block and allocate.  A client driver
probably shouldn't be calling them of its own gumption.  Rather it
should be providing an API that external subsystems call.

How does this work?  Imagine the ocfs2 mount process.  When it mounts,
it asks for a heartbeat region item.  This is done via a call into the
heartbeat code.  Inside the heartbeat code, the region item is looked
up.  Here, the heartbeat code calls configfs_depend_item().  If it
succeeds, then heartbeat knows the region is safe to give to ocfs2.
If it fails, it was being torn down anyway, and heartbeat can gracefully
pass up an error.

Committable Items
=================

Note:
     Committable items are currently unimplemented.

Some config_items cannot have a valid initial state.  That is, no
default values can be specified for the item's attributes such that the
item can do its work.  Userspace must configure one or more attributes,
after which the subsystem can start whatever entity this item
represents.

Consider the FakeNBD device from above.  Without a target address *and*
a target device, the subsystem has no idea what block device to import.
The simple example assumes that the subsystem merely waits until all the
appropriate attributes are configured, and then connects.  This will,
indeed, work, but now every attribute store must check if the attributes
are initialized.  Every attribute store must fire off the connection if
that condition is met.

Far better would be an explicit action notifying the subsystem that the
config_item is ready to go.  More importantly, an explicit action allows
the subsystem to provide feedback as to whether the attributes are
initialized in a way that makes sense.  configfs provides this as
committable items.

configfs still uses only normal filesystem operations.  An item is
committed via rename(2).  The item is moved from a directory where it
can be modified to a directory where it cannot.

Any group that provides the ct_group_ops->commit_item() method has
committable items.  When this group appears in configfs, mkdir(2) will
not work directly in the group.  Instead, the group will have two
subdirectories: "live" and "pending".  The "live" directory does not
support mkdir(2) or rmdir(2) either.  It only allows rename(2).  The
"pending" directory does allow mkdir(2) and rmdir(2).  An item is
created in the "pending" directory.  Its attributes can be modified at
will.  Userspace commits the item by renaming it into the "live"
directory.  At this point, the subsystem receives the ->commit_item()
callback.  If all required attributes are filled to satisfaction, the
method returns zero and the item is moved to the "live" directory.

As rmdir(2) does not work in the "live" directory, an item must be
shut down, or "uncommitted".  Again, this is done via rename(2), this
time from the "live" directory back to the "pending" one.  The subsystem
is notified by the ct_group_ops->uncommit_object() method.
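
Since committable items are unimplemented, the following is only a
sketch of what the userspace side could look like if the "pending" and
"live" directories existed as described.  The helper names move_item(),
commit_item() and uncommit_item() are invented, and the idea that a
rejected commit surfaces as a failed rename(2) is an inference from the
text rather than documented behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Move <group>/<from>/<name> to <group>/<to>/<name> with rename(2). */
static int move_item(const char *group, const char *from, const char *to,
		     const char *name)
{
	char src[4096], dst[4096];
	int n;

	assert(group != NULL && name != NULL);
	n = snprintf(src, sizeof(src), "%s/%s/%s", group, from, name);
	if (n < 0 || (size_t)n >= sizeof(src))
		return -ENAMETOOLONG;
	n = snprintf(dst, sizeof(dst), "%s/%s/%s", group, to, name);
	if (n < 0 || (size_t)n >= sizeof(dst))
		return -ENAMETOOLONG;
	/* Presumably a rejected commit would be reported as a failure here. */
	return rename(src, dst) < 0 ? -errno : 0;
}

/* Commit: pending -> live. */
int commit_item(const char *group, const char *name)
{
	return move_item(group, "pending", "live", name);
}

/* Uncommit: live -> pending. */
int uncommit_item(const char *group, const char *name)
{
	return move_item(group, "live", "pending", name);
}
```

Keeping commit and uncommit as the same rename in opposite directions
mirrors the symmetry of the design described above.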