.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
	1) Overview
	2) Features
	3) Setting mount states
	4) Use cases
	5) Detailed semantics
	6) Quiz
	7) FAQ
	8) Implementation
16*4882a593Smuzhiyun
17*4882a593Smuzhiyun
1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish the above.

Shared subtrees also provide the building blocks for features such as
per-user namespaces and versioned filesystems.
29*4882a593Smuzhiyun
2) Features
-----------

Shared subtrees provide four different flavors of mounts (struct vfsmount, to
be precise):

	a. shared mount
	b. slave mount
	c. private mount
	d. unbindable mount


2a) A shared mount can be replicated to any number of mountpoints, and all
the replicas remain exactly the same.

	Here is an example:

	Let's say /mnt has a mount that is shared::

	    # mount --make-shared /mnt

	Note: mount(8) command now supports the --make-shared flag,
	so the sample 'smount' program is no longer needed and has been
	removed.

	::

	    # mount --bind /mnt /tmp

	The above command replicates the mount at /mnt to the mountpoint /tmp
	and the contents of both the mounts remain identical.

	::

	    # ls /mnt
	    a b c

	    # ls /tmp
	    a b c

	Now let's say we mount a device at /tmp/a::

	    # mount /dev/sd0  /tmp/a

	    # ls /tmp/a
	    t1 t2 t3

	    # ls /mnt/a
	    t1 t2 t3

	Note that the mount has propagated to the mount at /mnt as well.

	The same is true when /dev/sd0 is mounted on /mnt/a instead: the
	contents will be visible under /tmp/a too.
84*4882a593Smuzhiyun
85*4882a593Smuzhiyun
2b) A slave mount is like a shared mount except that mount and umount events
	only propagate towards it.

	All slave mounts have a master mount, which is a shared mount.

	Here is an example:

	Let's say /mnt has a mount which is shared::

	    # mount --make-shared /mnt

	Let's bind mount /mnt to /tmp::

	    # mount --bind /mnt /tmp

	The new mount at /tmp becomes a shared mount and it is a replica of
	the mount at /mnt.

	Now let's make the mount at /tmp a slave of /mnt::

	    # mount --make-slave /tmp

	Let's mount /dev/sd0 on /mnt/a::

	    # mount /dev/sd0 /mnt/a

	    # ls /mnt/a
	    t1 t2 t3

	    # ls /tmp/a
	    t1 t2 t3

	Note that the mount event has propagated to the mount at /tmp.

	However, let's see what happens if we mount something on the mount at
	/tmp::

	    # mount /dev/sd1 /tmp/b

	    # ls /tmp/b
	    s1 s2 s3

	    # ls /mnt/b

	Note how the mount event has not propagated to the mount at
	/mnt.
127*4882a593Smuzhiyun
128*4882a593Smuzhiyun
2c) A private mount does not forward or receive propagation.

	This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is a private mount that cannot be bind mounted.

	Let's say we have a mount at /mnt and we make it unbindable::

	    # mount --make-unbindable /mnt

	Let's try to bind mount this mount somewhere else::

	    # mount --bind /mnt /tmp
	    mount: wrong fs type, bad option, bad superblock on /mnt,
		    or too many mounted file systems

	Binding an unbindable mount is an invalid operation.
147*4882a593Smuzhiyun
148*4882a593Smuzhiyun
3) Setting mount states
-----------------------

	The mount command (util-linux package) can be used to set mount
	states::

	    mount --make-shared mountpoint
	    mount --make-slave mountpoint
	    mount --make-private mountpoint
	    mount --make-unbindable mountpoint
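
Each of these flags also has a recursive variant in mount(8)
(``--make-rshared``, ``--make-rslave``, ``--make-rprivate`` and
``--make-runbindable``), which applies the state to every mount below the
given mountpoint. As a rough sketch, the helper below only prints the command
for a given state and never runs it. The function name ``propagation_cmd`` and
its arguments are hypothetical and exist only for illustration.

```shell
# Print (do not run) the mount(8) command that sets a propagation state.
# $1 = shared | slave | private | unbindable
# $2 = mountpoint
# $3 = "yes" to use the recursive variant (default: no)
propagation_cmd() {
    case "$1" in
        shared|slave|private|unbindable) ;;
        *) echo "unknown state: $1" >&2; return 1 ;;
    esac
    if [ "${3:-no}" = yes ]; then
        _r=r            # --make-rshared, --make-rslave, ...
    else
        _r=
    fi
    echo "mount --make-${_r}$1 $2"
}

propagation_cmd shared /mnt
propagation_cmd slave /myprivatetree yes
```

On systems with a reasonably recent util-linux, running
``findmnt -o TARGET,PROPAGATION`` afterwards should show the state each mount
ended up in.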
158*4882a593Smuzhiyun
159*4882a593Smuzhiyun
160*4882a593Smuzhiyun4) Use cases
161*4882a593Smuzhiyun------------
162*4882a593Smuzhiyun
163*4882a593Smuzhiyun	A) A process wants to clone its own namespace, but still wants to
164*4882a593Smuzhiyun	   access the CD that got mounted recently.
165*4882a593Smuzhiyun
166*4882a593Smuzhiyun	   Solution:
167*4882a593Smuzhiyun
168*4882a593Smuzhiyun		The system administrator can make the mount at /cdrom shared::
169*4882a593Smuzhiyun
170*4882a593Smuzhiyun		    mount --bind /cdrom /cdrom
171*4882a593Smuzhiyun		    mount --make-shared /cdrom
172*4882a593Smuzhiyun
173*4882a593Smuzhiyun		Now any process that clones off a new namespace will have a
174*4882a593Smuzhiyun		mount at /cdrom which is a replica of the same mount in the
175*4882a593Smuzhiyun		parent namespace.
176*4882a593Smuzhiyun
177*4882a593Smuzhiyun		So when a CD is inserted and mounted at /cdrom that mount gets
178*4882a593Smuzhiyun		propagated to the other mount at /cdrom in all the other clone
179*4882a593Smuzhiyun		namespaces.
180*4882a593Smuzhiyun
	B) A process wants its mounts invisible to any other process, but
	   still be able to see the other system mounts.

	   Solution:

		To begin with, the administrator can mark the entire mount tree
		as shareable::

		    mount --make-rshared /

		A new process can clone off a new namespace and mark some part
		of its namespace as slave::

		    mount --make-rslave /myprivatetree

		Henceforth any mounts within /myprivatetree done by the
		process will not show up in any other namespace. However, mounts
		done in the parent namespace under /myprivatetree still show
		up in the process's namespace.
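
With util-linux, use case B can be approximated in one step with unshare(1).
The sketch below rests on some assumptions: ``slave_ns`` and ``DRY_RUN`` are
hypothetical names, and ``--propagation slave`` marks every mount in the new
namespace as a slave, not only those under /myprivatetree. Creating the
namespace normally requires root, so the helper can print the command instead
of running it.

```shell
# Run a command in a new mount namespace whose mounts are all slaves of
# the corresponding mounts in the parent namespace.
# With DRY_RUN=1 the command line is printed instead of executed.
slave_ns() {
    set -- unshare --mount --propagation slave -- "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show what would be run for an interactive shell in such a namespace.
DRY_RUN=1
slave_ns sh
```

Mounts made inside that shell stay invisible to the parent namespace, while
mounts made in the parent still show up inside it, which matches the behavior
described for /myprivatetree.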
200*4882a593Smuzhiyun
201*4882a593Smuzhiyun
	Apart from the above semantics, this feature provides the
	building blocks to solve the following problems:

	C)  Per-user namespace

		The above semantics provide a way to share mounts across
		namespaces.  But namespaces are associated with processes. If
		namespaces are made first-class objects with a user API to
		associate/disassociate a namespace with a userid, then each user
		could have his/her own namespace and tailor it to his/her
		requirements. This needs to be supported in PAM.

	D)  Versioned files

		If the entire mount tree is visible at multiple locations, then
		an underlying versioning file system can return different
		versions of the file depending on the path used to access that
		file.

		An example is::

		    mount --make-shared /
		    mount --rbind / /view/v1
		    mount --rbind / /view/v2
		    mount --rbind / /view/v3
		    mount --rbind / /view/v4

		and if /usr has a versioning filesystem mounted, then that
		mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
		/view/v4/usr too.

		A user can request the v3 version of the file
		/usr/fs/namespace.c by accessing /view/v3/usr/fs/namespace.c.
		The underlying versioning filesystem can then decipher that
		the v3 version of the filesystem is being requested and return
		the corresponding inode.
238*4882a593Smuzhiyun
5) Detailed semantics
---------------------

	The section below explains the detailed semantics of
	bind, rbind, move, mount, umount and clone-namespace operations.

	Note: the word 'vfsmount' and the noun 'mount' have been used
	to mean the same thing, throughout this document.

5a) Mount states

	A given mount can be in one of the following states:

	1) shared
	2) slave
	3) shared and slave
	4) private
	5) unbindable

	A 'propagation event' is defined as an event generated on a vfsmount
	that leads to mount or unmount actions in other vfsmounts.

	A 'peer group' is defined as a group of vfsmounts that propagate
	events to each other.
262*4882a593Smuzhiyun
	(1) Shared mounts

		A 'shared mount' is defined as a vfsmount that belongs to a
		'peer group'.

		For example::

			mount --make-shared /mnt
			mount --bind /mnt /tmp

		The mount at /mnt and that at /tmp are both shared and belong
		to the same peer group. Anything mounted or unmounted under
		/mnt or /tmp is reflected in all the other mounts of its peer
		group.


	(2) Slave mounts

		A 'slave mount' is defined as a vfsmount that receives
		propagation events and does not forward propagation events.

		A slave mount, as the name implies, has a master mount from
		which mount/unmount events are received. Events do not
		propagate from the slave mount to the master.  Only a shared
		mount can be made a slave, by executing the following command::

			mount --make-slave mount

		A shared mount that is made a slave is no longer shared unless
		modified to become shared.
293*4882a593Smuzhiyun
	(3) Shared and Slave

		A vfsmount can be both shared and slave.  This state
		indicates that the mount is a slave of some vfsmount, and
		has its own peer group too.  This vfsmount receives propagation
		events from its master vfsmount, and also forwards propagation
		events to its 'peer group' and to its slave vfsmounts.

		Strictly speaking, the vfsmount is shared, having its own
		peer group, and this peer group is a slave of some other
		peer group.

		Only a slave vfsmount can be made 'shared and slave', either
		by executing the following command::

			mount --make-shared mount

		or by moving the slave vfsmount under a shared vfsmount.

	(4) Private mount

		A 'private mount' is defined as a vfsmount that does not
		receive or forward any propagation events.

	(5) Unbindable mount

		An 'unbindable mount' is defined as a vfsmount that does not
		receive or forward any propagation events and cannot
		be bind mounted.
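
On a running system these states can be read back from the optional fields of
/proc/self/mountinfo, which use the tags ``shared:N``, ``master:N`` and
``unbindable``; a mount with none of them is private. A minimal sketch follows.
``propagation_of_line`` is a hypothetical helper, and the sample line passed to
it is made up for illustration.

```shell
# Classify one /proc/self/mountinfo line by its optional fields.
# The first six fields are fixed; optional fields follow until a lone "-".
propagation_of_line() {
    set -f                      # no pathname expansion while splitting
    set -- $1                   # split the line into positional parameters
    set +f
    [ "$#" -ge 6 ] || return 1  # not a mountinfo line
    shift 6                     # skip the six fixed fields
    _state=""
    while [ "$#" -gt 0 ] && [ "$1" != "-" ]; do
        case "$1" in
            shared:*)   _state="${_state:+$_state,}shared" ;;
            master:*)   _state="${_state:+$_state,}slave" ;;
            unbindable) _state="unbindable" ;;
        esac
        shift
    done
    echo "${_state:-private}"
}

# A made-up line describing a mount that is both shared and slave.
propagation_of_line '36 35 98:0 / /mnt rw shared:2 master:1 - ext3 /dev/root rw'
```

To classify every mount in the current namespace, feed the function one line
at a time from /proc/self/mountinfo, for example from a ``while read`` loop.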
323*4882a593Smuzhiyun
324*4882a593Smuzhiyun
	State diagram:

	The state diagram below explains the state transition of a mount,
	in response to various commands::

	    --------------------------------------------------------------------------
	    |             |make-shared |make-slave    |make-private  |make-unbindable|
	    |-------------|------------|--------------|--------------|---------------|
	    |shared       |shared      |*slave/private|private       |unbindable     |
	    |-------------|------------|--------------|--------------|---------------|
	    |slave        |shared      |**slave       |private       |unbindable     |
	    |             |and slave   |              |              |               |
	    |-------------|------------|--------------|--------------|---------------|
	    |shared       |shared      |slave         |private       |unbindable     |
	    |and slave    |and slave   |              |              |               |
	    |-------------|------------|--------------|--------------|---------------|
	    |private      |shared      |**private     |private       |unbindable     |
	    |-------------|------------|--------------|--------------|---------------|
	    |unbindable   |shared      |**unbindable  |private       |unbindable     |
	    --------------------------------------------------------------------------

	    * If the shared mount is the only mount in its peer group, making it
	    slave makes it private automatically, because there is no master
	    to which it can be slaved.

	    ** Slaving a non-shared mount has no effect on the mount.

	Apart from the commands listed above, the 'move' operation also changes
	the state of a mount depending on the type of the destination mount.
	It's explained in section 5d.
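
The state diagram above can be read as a lookup from (current state, command)
to the next state. The sketch below encodes exactly that table and nothing
more; it does not touch any real mount. ``next_state`` is a hypothetical
function, and its third argument (the size of the peer group) is an assumption
introduced to model the first footnote.

```shell
# $1 = current state: shared | slave | "shared and slave" | private | unbindable
# $2 = command: make-shared | make-slave | make-private | make-unbindable
# $3 = number of mounts in the peer group (default 2); it only matters
#      when a shared mount is made a slave
next_state() {
    case "$1" in
        shared|slave|"shared and slave"|private|unbindable) ;;
        *) echo "unknown state: $1" >&2; return 1 ;;
    esac
    case "$2" in
        make-private)    echo private ;;
        make-unbindable) echo unbindable ;;
        make-shared)
            case "$1" in
                slave|"shared and slave") echo "shared and slave" ;;
                *) echo shared ;;
            esac ;;
        make-slave)
            case "$1" in
                shared)
                    # Footnote *: a lone shared mount has no master.
                    if [ "${3:-2}" -le 1 ]; then echo private; else echo slave; fi ;;
                "shared and slave") echo slave ;;
                *) echo "$1" ;;   # Footnote **: no effect on non-shared mounts.
            esac ;;
        *) echo "unknown command: $2" >&2; return 1 ;;
    esac
}

next_state shared make-slave 1
next_state slave make-shared
```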
356*4882a593Smuzhiyun
357*4882a593Smuzhiyun5b) Bind semantics
358*4882a593Smuzhiyun
359*4882a593Smuzhiyun	Consider the following command::
360*4882a593Smuzhiyun
361*4882a593Smuzhiyun	    mount --bind A/a  B/b
362*4882a593Smuzhiyun
363*4882a593Smuzhiyun	where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
364*4882a593Smuzhiyun	is the destination mount and 'b' is the dentry in the destination mount.
365*4882a593Smuzhiyun
366*4882a593Smuzhiyun	The outcome depends on the type of mount of 'A' and 'B'. The table
367*4882a593Smuzhiyun	below contains quick reference::
368*4882a593Smuzhiyun
369*4882a593Smuzhiyun	    --------------------------------------------------------------------------
370*4882a593Smuzhiyun	    |         BIND MOUNT OPERATION                                           |
371*4882a593Smuzhiyun	    |************************************************************************|
372*4882a593Smuzhiyun	    |source(A)->| shared      |       private  |       slave    | unbindable |
373*4882a593Smuzhiyun	    | dest(B)  |              |                |                |            |
374*4882a593Smuzhiyun	    |   |      |              |                |                |            |
375*4882a593Smuzhiyun	    |   v      |              |                |                |            |
376*4882a593Smuzhiyun	    |************************************************************************|
377*4882a593Smuzhiyun	    |  shared  | shared       |     shared     | shared & slave |  invalid   |
378*4882a593Smuzhiyun	    |          |              |                |                |            |
379*4882a593Smuzhiyun	    |non-shared| shared       |      private   |      slave     |  invalid   |
380*4882a593Smuzhiyun	    **************************************************************************
381*4882a593Smuzhiyun
	Details:

    1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C',
	which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2', 'C3' ...
	are created and mounted at the dentry 'b' on all mounts that 'B'
	propagates to. A new propagation tree containing 'C1',..,'Cn' is
	created. This propagation tree is identical to the propagation tree of
	'B'.  And finally the peer group of 'C' is merged with the peer group
	of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C',
	which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2', 'C3' ...
	are created and mounted at the dentry 'b' on all mounts that 'B'
	propagates to. A new propagation tree is set up containing all new
	mounts 'C', 'C1', .., 'Cn' with exactly the same configuration as the
	propagation tree for 'B'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
	mount 'C', which is a clone of 'A', is created. Its root dentry is 'a'.
	'C' is mounted on mount 'B' at dentry 'b'. Also, new mounts 'C1', 'C2',
	'C3' ... are created and mounted at the dentry 'b' on all mounts that
	'B' propagates to. A new propagation tree containing the new mounts
	'C','C1',..  'Cn' is created. This propagation tree is identical to the
	propagation tree for 'B'. And finally the mount 'C' and its peer group
	are made the slave of mount 'Z'.  In other words, mount 'C' is in the
	state 'slave and shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
	invalid operation.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
	unbindable) mount. A new mount 'C', which is a clone of 'A', is created.
	Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C',
	which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
	peer group of 'A'.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
	new mount 'C', which is a clone of 'A', is created. Its root dentry is
	'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also, 'C' is set as a
	slave mount of 'Z'. In other words, 'A' and 'C' are both slave mounts of
	'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
	mount/unmount events on 'A' do not propagate anywhere else. Similarly,
	mount/unmount events on 'C' do not propagate anywhere else.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
	invalid operation. An unbindable mount cannot be bind mounted.
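
The eight cases above collapse into the quick-reference table at the start of
this section. As a sketch, that table can be written as a small lookup that
reports the state of the new mount 'C'. ``bind_result`` is a hypothetical
name; the function models the table only and performs no mount.

```shell
# State of the new mount 'C' created by "mount --bind A/a B/b".
# $1 = state of source A: shared | private | slave | unbindable
# $2 = state of destination B: shared | non-shared
bind_result() {
    case "$2" in
        shared|non-shared) ;;
        *) echo "unknown destination state: $2" >&2; return 1 ;;
    esac
    case "$1" in
        unbindable) echo invalid ;;     # cases 4 and 8
        shared)     echo shared ;;      # cases 1 and 6
        private)                        # cases 2 and 5
            if [ "$2" = shared ]; then echo shared; else echo private; fi ;;
        slave)                          # cases 3 and 7
            if [ "$2" = shared ]; then echo "shared and slave"; else echo slave; fi ;;
        *) echo "unknown source state: $1" >&2; return 1 ;;
    esac
}

bind_result slave shared
bind_result private non-shared
```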
433*4882a593Smuzhiyun
434*4882a593Smuzhiyun5c) Rbind semantics
435*4882a593Smuzhiyun
	Rbind is like bind, except that bind replicates only the specified
	mount, while rbind replicates all the mounts in the tree belonging to
	the specified mount. An rbind mount is a bind mount applied to all the
	mounts in the tree.

	If the source tree that is rbound has some unbindable mounts,
	then the subtree under the unbindable mount is pruned in the new
	location.

	For example:

	  Let's say we have the following mount tree::
447*4882a593Smuzhiyun
448*4882a593Smuzhiyun		A
449*4882a593Smuzhiyun	      /   \
450*4882a593Smuzhiyun	      B   C
451*4882a593Smuzhiyun	     / \ / \
452*4882a593Smuzhiyun	     D E F G
453*4882a593Smuzhiyun
	  Let's say all the mounts except mount C in the tree are
	  of a type other than unbindable.

	  If this tree is rbound to, say, Z, we will have the following tree
	  at the new location::
460*4882a593Smuzhiyun
461*4882a593Smuzhiyun		Z
462*4882a593Smuzhiyun		|
463*4882a593Smuzhiyun		A'
464*4882a593Smuzhiyun	       /
465*4882a593Smuzhiyun	      B'		Note how the tree under C is pruned
466*4882a593Smuzhiyun	     / \ 		in the new location.
467*4882a593Smuzhiyun	    D' E'
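
The pruning rule can be sketched with plain path prefixes, assuming (for
illustration only) that each mount in the tree is named by a path such as
/A/B/D. ``rbind_visible`` is a hypothetical helper, not a util-linux command;
it lists the mounts that would be replicated when one mount in the tree is
unbindable.

```shell
# $1   = path of the unbindable mount
# rest = paths of all mounts in the source tree
# Prints the mounts that survive the rbind, one per line.
rbind_visible() {
    _unbindable=$1
    shift
    for _m in "$@"; do
        case "$_m" in
            "$_unbindable"|"$_unbindable"/*) ;;   # pruned with its subtree
            *) echo "$_m" ;;
        esac
    done
}

# The tree from the example above, with C unbindable.
rbind_visible /A/C /A /A/B /A/C /A/B/D /A/B/E /A/C/F /A/C/G
```

For the example tree this leaves A, B, D and E, which corresponds to the
A', B', D' and E' shown at the new location.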
468*4882a593Smuzhiyun
469*4882a593Smuzhiyun
470*4882a593Smuzhiyun
471*4882a593Smuzhiyun5d) Move semantics
472*4882a593Smuzhiyun
	Consider the following command::

	    mount --move A  B/b
476*4882a593Smuzhiyun
477*4882a593Smuzhiyun	where 'A' is the source mount, 'B' is the destination mount and 'b' is
478*4882a593Smuzhiyun	the dentry in the destination mount.
479*4882a593Smuzhiyun
480*4882a593Smuzhiyun	The outcome depends on the type of the mount of 'A' and 'B'. The table
481*4882a593Smuzhiyun	below is a quick reference::
482*4882a593Smuzhiyun
483*4882a593Smuzhiyun	    ---------------------------------------------------------------------------
484*4882a593Smuzhiyun	    |         		MOVE MOUNT OPERATION                                 |
485*4882a593Smuzhiyun	    |**************************************************************************
486*4882a593Smuzhiyun	    | source(A)->| shared      |       private  |       slave    | unbindable |
487*4882a593Smuzhiyun	    | dest(B)  |               |                |                |            |
488*4882a593Smuzhiyun	    |   |      |               |                |                |            |
489*4882a593Smuzhiyun	    |   v      |               |                |                |            |
490*4882a593Smuzhiyun	    |**************************************************************************
491*4882a593Smuzhiyun	    |  shared  | shared        |     shared     |shared and slave|  invalid   |
492*4882a593Smuzhiyun	    |          |               |                |                |            |
493*4882a593Smuzhiyun	    |non-shared| shared        |      private   |    slave       | unbindable |
494*4882a593Smuzhiyun	    ***************************************************************************
495*4882a593Smuzhiyun
496*4882a593Smuzhiyun	.. Note:: moving a mount residing under a shared mount is invalid.
497*4882a593Smuzhiyun
	Details follow:

    1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
	mounted on mount 'B' at dentry 'b'.  Also, new mounts 'A1', 'A2'...'An'
	are created and mounted at dentry 'b' on all mounts that receive
	propagation from mount 'B'. A new propagation tree is created in the
	exact same configuration as that of 'B'. This new propagation tree
	contains all the new mounts 'A1', 'A2'...  'An'.  And this new
	propagation tree is appended to the already existing propagation tree
	of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
	mounted on mount 'B' at dentry 'b'. Also, new mounts 'A1', 'A2'... 'An'
	are created and mounted at dentry 'b' on all mounts that receive
	propagation from mount 'B'. The mount 'A' becomes a shared mount and a
	propagation tree is created which is identical to that of
	'B'. This new propagation tree contains all the new mounts 'A1',
	'A2'...  'An'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
	mount 'A' is mounted on mount 'B' at dentry 'b'.  Also, new mounts 'A1',
	'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
	receive propagation from mount 'B'. A new propagation tree is created
	in the exact same configuration as that of 'B'. This new propagation
	tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
	propagation tree is appended to the already existing propagation tree of
	'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
	becomes 'shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
	is invalid, because mounting anything on the shared mount 'B' can
	create new mounts that get mounted on the mounts that receive
	propagation from 'B', and since the mount 'A' is unbindable, cloning
	it to mount at other mountpoints is not possible.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
	unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
	is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
	shared mount.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
	The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
	continues to be a slave mount of mount 'Z'.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
	'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
	unbindable mount.
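
The move table differs from the bind table in two respects: an unbindable
mount may be moved onto a non-shared mount, and a mount that currently resides
under a shared mount cannot be moved at all. The sketch below models both
rules. ``move_result`` is a hypothetical name, and its optional third argument
(the state of the mount 'A' currently sits on) is an assumption introduced to
express the note under the table.

```shell
# State of mount 'A' after "mount --move A B/b".
# $1 = state of source A: shared | private | slave | unbindable
# $2 = state of destination B: shared | non-shared
# $3 = state of the mount A currently resides under (default: non-shared)
move_result() {
    case "$1" in
        shared|private|slave|unbindable) ;;
        *) echo "unknown source state: $1" >&2; return 1 ;;
    esac
    if [ "${3:-non-shared}" = shared ]; then
        echo invalid            # moving out from under a shared mount
        return 0
    fi
    case "$2:$1" in
        shared:unbindable) echo invalid ;;              # case 4
        shared:slave)      echo "shared and slave" ;;   # case 3
        shared:*)          echo shared ;;               # cases 1 and 2
        non-shared:*)      echo "$1" ;;                 # cases 5 to 8: unchanged
        *) echo "unknown destination state: $2" >&2; return 1 ;;
    esac
}

move_result unbindable non-shared
move_result private shared
```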
547*4882a593Smuzhiyun
548*4882a593Smuzhiyun5e) Mount semantics
549*4882a593Smuzhiyun
550*4882a593Smuzhiyun	Consider the following command::
551*4882a593Smuzhiyun
552*4882a593Smuzhiyun	    mount device  B/b
553*4882a593Smuzhiyun
554*4882a593Smuzhiyun	'B' is the destination mount and 'b' is the dentry in the destination
555*4882a593Smuzhiyun	mount.
556*4882a593Smuzhiyun
557*4882a593Smuzhiyun	The above operation is the same as bind operation with the exception
558*4882a593Smuzhiyun	that the source mount is always a private mount.
559*4882a593Smuzhiyun
560*4882a593Smuzhiyun
561*4882a593Smuzhiyun5f) Unmount semantics
562*4882a593Smuzhiyun
563*4882a593Smuzhiyun	Consider the following command::
564*4882a593Smuzhiyun
565*4882a593Smuzhiyun	    umount A
566*4882a593Smuzhiyun
567*4882a593Smuzhiyun	where 'A' is a mount mounted on mount 'B' at dentry 'b'.
568*4882a593Smuzhiyun
	If mount 'B' is shared, then all most-recently-mounted mounts at dentry
	'b' on mounts that receive propagation from mount 'B' and do not have
	sub-mounts within them are unmounted.

	Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
	each other.

	Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mounts
	'B1', 'B2' and 'B3' respectively.

	Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
	mounts 'B1', 'B2' and 'B3' respectively.

	If 'C1' is unmounted, all the mounts that are most-recently-mounted on
	'B1' and on the mounts that 'B1' propagates to are unmounted.

	'B1' propagates to 'B2' and 'B3'. The most recently mounted mount
	on 'B2' at dentry 'b' is 'C2', and that on mount 'B3' is 'C3'.

	So 'C1', 'C2' and 'C3' should all be unmounted.

	If either 'C2' or 'C3' has child mounts, then that mount is not
	unmounted, but all the other mounts are. However, if 'C1' itself is
	the mount being unmounted and it has sub-mounts, the umount operation
	fails entirely.
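
The unmount rule above can be sketched as a small user-space model. This is
only an illustration under simplifying assumptions: every key of ``stacks`` is
taken to be a peer that receives propagation from the others, only the
most recent mount at dentry 'b' can be requested, and the names
``umount_top`` and ``has_submounts`` are hypothetical and do not correspond to
kernel functions.

```python
def umount_top(stacks, parent, has_submounts=frozenset()):
    """Unmount the most recent mount at dentry 'b' of `parent`, with propagation.

    `stacks` maps each mount in the peer group to the list of mounts stacked
    at its dentry 'b', oldest first. Returns the names that were unmounted.
    """
    target = stacks[parent][-1]
    if target in has_submounts:
        # The explicitly requested mount is busy: nothing is unmounted.
        raise OSError(f"{target} has sub-mounts")
    removed = []
    for stack in stacks.values():
        # Only the most recently mounted mount on each peer is considered,
        # and it is skipped if it has sub-mounts of its own.
        if stack and stack[-1] not in has_submounts:
            removed.append(stack.pop())
    return removed


def example():
    """Model of the B1/B2/B3 example: A* mounted first, C* mounted next."""
    return {"B1": ["A1", "C1"], "B2": ["A2", "C2"], "B3": ["A3", "C3"]}


plain = example()
removed_plain = umount_top(plain, "B1")

busy_peer = example()
removed_busy_peer = umount_top(busy_peer, "B1", has_submounts={"C2"})
```

In the first call all of 'C1', 'C2' and 'C3' go away; in the second call 'C2'
stays because it is modelled as having a child mount.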
594*4882a593Smuzhiyun
595*4882a593Smuzhiyun5g) Clone Namespace
596*4882a593Smuzhiyun
	A cloned namespace contains all the mounts of the parent
	namespace.
599*4882a593Smuzhiyun
600*4882a593Smuzhiyun	Let's say 'A' and 'B' are the corresponding mounts in the parent and the
601*4882a593Smuzhiyun	child namespace.
602*4882a593Smuzhiyun
603*4882a593Smuzhiyun	If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
604*4882a593Smuzhiyun	each other.
605*4882a593Smuzhiyun
606*4882a593Smuzhiyun	If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
607*4882a593Smuzhiyun	'Z'.
608*4882a593Smuzhiyun
609*4882a593Smuzhiyun	If 'A' is a private mount, then 'B' is a private mount too.
610*4882a593Smuzhiyun
	If 'A' is an unbindable mount, then 'B' is an unbindable mount too.
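
The four cases above amount to "the child's mount inherits the propagation
state of the parent's mount". A minimal sketch, assuming a hypothetical
``Mount`` record in which a peer group is identified by a number and a master
by its name (neither is how the kernel represents them):

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Mount:
    name: str
    kind: str                          # "shared", "slave", "private" or "unbindable"
    peer_group: Optional[int] = None   # hypothetical id, set for shared mounts
    master: Optional[str] = None       # name of the master, set for slave mounts


def clone_for_child_namespace(a: Mount, name: str) -> Mount:
    """Return mount 'B' in the child namespace corresponding to 'A'."""
    # Type, peer group and master are inherited unchanged, so a shared 'B'
    # lands in the same peer group as 'A' and a slave 'B' keeps master 'Z'.
    return replace(a, name=name)


shared_b = clone_for_child_namespace(Mount("A", "shared", peer_group=7), "B")
slave_b = clone_for_child_namespace(Mount("A", "slave", master="Z"), "B")
private_b = clone_for_child_namespace(Mount("A", "private"), "B")
unbindable_b = clone_for_child_namespace(Mount("A", "unbindable"), "B")
```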
612*4882a593Smuzhiyun
613*4882a593Smuzhiyun
6) Quiz
-------
615*4882a593Smuzhiyun
616*4882a593Smuzhiyun	A. What is the result of the following command sequence?
617*4882a593Smuzhiyun
618*4882a593Smuzhiyun		::
619*4882a593Smuzhiyun
620*4882a593Smuzhiyun		    mount --bind /mnt /mnt
621*4882a593Smuzhiyun		    mount --make-shared /mnt
622*4882a593Smuzhiyun		    mount --bind /mnt /tmp
623*4882a593Smuzhiyun		    mount --move /tmp /mnt/1
624*4882a593Smuzhiyun
		What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
		Should they all be identical, or should only /mnt and /mnt/1
		be identical?
628*4882a593Smuzhiyun
629*4882a593Smuzhiyun
630*4882a593Smuzhiyun	B. What is the result of the following command sequence?
631*4882a593Smuzhiyun
632*4882a593Smuzhiyun		::
633*4882a593Smuzhiyun
634*4882a593Smuzhiyun		    mount --make-rshared /
635*4882a593Smuzhiyun		    mkdir -p /v/1
636*4882a593Smuzhiyun		    mount --rbind / /v/1
637*4882a593Smuzhiyun
		What should the contents of /v/1/v/1 be?
639*4882a593Smuzhiyun
640*4882a593Smuzhiyun
641*4882a593Smuzhiyun	C. What is the result of the following command sequence?
642*4882a593Smuzhiyun
643*4882a593Smuzhiyun		::
644*4882a593Smuzhiyun
645*4882a593Smuzhiyun		    mount --bind /mnt /mnt
646*4882a593Smuzhiyun		    mount --make-shared /mnt
647*4882a593Smuzhiyun		    mkdir -p /mnt/1/2/3 /mnt/1/test
648*4882a593Smuzhiyun		    mount --bind /mnt/1 /tmp
649*4882a593Smuzhiyun		    mount --make-slave /mnt
650*4882a593Smuzhiyun		    mount --make-shared /mnt
651*4882a593Smuzhiyun		    mount --bind /mnt/1/2 /tmp1
652*4882a593Smuzhiyun		    mount --make-slave /mnt
653*4882a593Smuzhiyun
		At this point we have the first mount at /tmp, whose
		root dentry is 1. Let's call this mount 'A'.
		Then we have a second mount at /tmp1, with root
		dentry 2. Let's call this mount 'B'.
		Finally we have a third mount at /mnt, with root dentry
		mnt. Let's call this mount 'C'.

		'B' is the slave of 'A' and 'C' is a slave of 'B'::

		    A -> B -> C

		At this point, suppose we execute the following command::

		    mount --bind /bin /tmp/test

		The mount is attempted on 'A'.

		Will the mount propagate to 'B' and 'C'?

		What would the contents of /mnt/1/test be?
674*4882a593Smuzhiyun
7) FAQ
------
676*4882a593Smuzhiyun
	Q1. Why is bind mount needed? How is it different from symbolic links?

		Symbolic links can get stale if the destination mount gets
		unmounted or moved. Bind mounts continue to exist even if the
		other mount is unmounted or moved.

	Q2. Why can't the shared subtree be implemented using exportfs?

		exportfs is a heavyweight way of accomplishing part of what
		shared subtree can do. I cannot imagine a way to implement the
		semantics of slave mount using exportfs.

	Q3. Why is unbindable mount needed?

		Let's say we want to replicate the mount tree at multiple
		locations within the same subtree.

		If one rbind mounts a tree within the same subtree 'n' times,
		the number of mounts created grows at least exponentially with
		'n'. Having unbindable mounts can help prune the unneeded bind
		mounts. Here is an example.
697*4882a593Smuzhiyun
698*4882a593Smuzhiyun		step 1:
699*4882a593Smuzhiyun		   let's say the root tree has just two directories with
700*4882a593Smuzhiyun		   one vfsmount::
701*4882a593Smuzhiyun
702*4882a593Smuzhiyun				    root
703*4882a593Smuzhiyun				   /    \
704*4882a593Smuzhiyun				  tmp    usr
705*4882a593Smuzhiyun
706*4882a593Smuzhiyun		    And we want to replicate the tree at multiple
707*4882a593Smuzhiyun		    mountpoints under /root/tmp
708*4882a593Smuzhiyun
709*4882a593Smuzhiyun		step 2:
710*4882a593Smuzhiyun		      ::
711*4882a593Smuzhiyun
712*4882a593Smuzhiyun
713*4882a593Smuzhiyun			mount --make-shared /root
714*4882a593Smuzhiyun
715*4882a593Smuzhiyun			mkdir -p /tmp/m1
716*4882a593Smuzhiyun
717*4882a593Smuzhiyun			mount --rbind /root /tmp/m1
718*4882a593Smuzhiyun
719*4882a593Smuzhiyun		      the new tree now looks like this::
720*4882a593Smuzhiyun
721*4882a593Smuzhiyun				    root
722*4882a593Smuzhiyun				   /    \
723*4882a593Smuzhiyun				 tmp    usr
724*4882a593Smuzhiyun				/
725*4882a593Smuzhiyun			       m1
726*4882a593Smuzhiyun			      /  \
727*4882a593Smuzhiyun			     tmp  usr
728*4882a593Smuzhiyun			     /
729*4882a593Smuzhiyun			    m1
730*4882a593Smuzhiyun
731*4882a593Smuzhiyun			  it has two vfsmounts
732*4882a593Smuzhiyun
		step 3:
		      ::

			    mkdir -p /tmp/m2
			    mount --rbind /root /tmp/m2

		      The new tree now looks like this::

				      root
				     /    \
				   tmp     usr
				  /    \
				m1       m2
			       / \       /  \
			     tmp  usr   tmp  usr
			     / \          /
			    m1  m2      m1
				/ \     /  \
			      tmp usr  tmp   usr
			      /        / \
			     m1       m1  m2
			    /  \
			  tmp   usr
			  /  \
			 m1   m2

		      It has 6 vfsmounts.
760*4882a593Smuzhiyun
		step 4:
		      ::

			  mkdir -p /tmp/m3
			  mount --rbind /root /tmp/m3

		      I won't draw the tree, but it has 24 vfsmounts.
767*4882a593Smuzhiyun
768*4882a593Smuzhiyun
		At step i the number of vfsmounts is V[i] = i*V[i-1].
		With V[1] = 1 this gives V[i] = i!, which grows even faster
		than an exponential function. And this tree has way more
		mounts than what we really needed in the first place.
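
The recurrence can be checked with a few lines of Python. The starting value
V[1] = 1 is taken from step 1 (one vfsmount); the function name
``vfsmount_counts`` is just for this illustration.

```python
import math


def vfsmount_counts(steps):
    """Return [V[1], ..., V[steps]] for V[1] = 1 and V[i] = i * V[i-1]."""
    counts = [1]
    for i in range(2, steps + 1):
        counts.append(i * counts[-1])
    return counts


counts = vfsmount_counts(6)
print(counts)
# The same numbers as the factorials 1!, 2!, ..., 6!
print([math.factorial(i) for i in range(1, 7)])
```

The first four values match the counts quoted for steps 1 to 4 (1, 2, 6 and
24 vfsmounts).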
772*4882a593Smuzhiyun
		One could use a series of umount at each step to prune
		out the unneeded mounts. But there is a better solution.
		Unbindable (unclonable) mounts come in handy here.
776*4882a593Smuzhiyun
777*4882a593Smuzhiyun		step 1:
778*4882a593Smuzhiyun		   let's say the root tree has just two directories with
779*4882a593Smuzhiyun		   one vfsmount::
780*4882a593Smuzhiyun
781*4882a593Smuzhiyun				    root
782*4882a593Smuzhiyun				   /    \
783*4882a593Smuzhiyun				  tmp    usr
784*4882a593Smuzhiyun
		    How do we set up the same tree at multiple locations under
		    /root/tmp?
787*4882a593Smuzhiyun
788*4882a593Smuzhiyun		step 2:
789*4882a593Smuzhiyun		      ::
790*4882a593Smuzhiyun
791*4882a593Smuzhiyun
792*4882a593Smuzhiyun			mount --bind /root/tmp /root/tmp
793*4882a593Smuzhiyun
794*4882a593Smuzhiyun			mount --make-rshared /root
795*4882a593Smuzhiyun			mount --make-unbindable /root/tmp
796*4882a593Smuzhiyun
797*4882a593Smuzhiyun			mkdir -p /tmp/m1
798*4882a593Smuzhiyun
799*4882a593Smuzhiyun			mount --rbind /root /tmp/m1
800*4882a593Smuzhiyun
801*4882a593Smuzhiyun		      the new tree now looks like this::
802*4882a593Smuzhiyun
803*4882a593Smuzhiyun				    root
804*4882a593Smuzhiyun				   /    \
805*4882a593Smuzhiyun				 tmp    usr
806*4882a593Smuzhiyun				/
807*4882a593Smuzhiyun			       m1
808*4882a593Smuzhiyun			      /  \
809*4882a593Smuzhiyun			     tmp  usr
810*4882a593Smuzhiyun
811*4882a593Smuzhiyun		step 3:
812*4882a593Smuzhiyun		      ::
813*4882a593Smuzhiyun
814*4882a593Smuzhiyun			    mkdir -p /tmp/m2
815*4882a593Smuzhiyun			    mount --rbind /root /tmp/m2
816*4882a593Smuzhiyun
817*4882a593Smuzhiyun		      the new tree now looks like this::
818*4882a593Smuzhiyun
819*4882a593Smuzhiyun				    root
820*4882a593Smuzhiyun				   /    \
821*4882a593Smuzhiyun				 tmp    usr
822*4882a593Smuzhiyun				/   \
823*4882a593Smuzhiyun			       m1     m2
824*4882a593Smuzhiyun			      /  \     / \
825*4882a593Smuzhiyun			     tmp  usr tmp usr
826*4882a593Smuzhiyun
827*4882a593Smuzhiyun		step 4:
828*4882a593Smuzhiyun		      ::
829*4882a593Smuzhiyun
830*4882a593Smuzhiyun			    mkdir -p /tmp/m3
831*4882a593Smuzhiyun			    mount --rbind /root /tmp/m3
832*4882a593Smuzhiyun
833*4882a593Smuzhiyun		      the new tree now looks like this::
834*4882a593Smuzhiyun
				      root
				     /    \
				   tmp    usr
				 /    \    \
			       m1     m2     m3
			      /  \     / \    /  \
			     tmp  usr tmp usr tmp usr
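
Why the unbindable mount keeps the tree small can be sketched with a toy
model. The assumptions are deliberately strong: propagation between shared
mounts is not modelled at all, a recursive bind is modelled as "copy the
source mount and its sub-mounts, skipping any unbindable sub-mount together
with everything below it", and ``Mount``, ``copy_tree`` and ``count`` are
hypothetical helper names, not the kernel's implementation.

```python
class Mount:
    def __init__(self, name, unbindable=False):
        self.name = name
        self.unbindable = unbindable
        self.children = []          # mounts mounted somewhere inside this one


def copy_tree(src, name):
    """Clone `src` and its sub-mounts, pruning unbindable sub-mounts."""
    clone = Mount(name)
    for child in src.children:
        if child.unbindable:
            continue                # skipped together with its whole subtree
        clone.children.append(copy_tree(child, child.name))
    return clone


def count(mount):
    """Number of mounts in the tree rooted at `mount`."""
    return 1 + sum(count(child) for child in mount.children)


# Step 2 of the example: /root/tmp is bind mounted on itself and made
# unbindable, so it is a separate mount below the root mount.
root = Mount("root")
tmp = Mount("tmp", unbindable=True)
root.children.append(tmp)

counts = []
for i in (1, 2, 3):
    clone = copy_tree(root, f"m{i}")   # mount --rbind /root /tmp/m<i>
    tmp.children.append(clone)
    counts.append(count(root))

print(counts)
```

In this model each rbind adds exactly one mount, because the copy never
descends into the unbindable 'tmp' mount where the earlier copies live.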
842*4882a593Smuzhiyun
8) Implementation
-----------------

8A) Data structure

	Four new fields are introduced to struct vfsmount:
848*4882a593Smuzhiyun
849*4882a593Smuzhiyun	*   ->mnt_share
850*4882a593Smuzhiyun	*   ->mnt_slave_list
851*4882a593Smuzhiyun	*   ->mnt_slave
852*4882a593Smuzhiyun	*   ->mnt_master
853*4882a593Smuzhiyun
	->mnt_share
		links together all the mounts to/from which this vfsmount
		sends/receives propagation events.

	->mnt_slave_list
		links all the mounts to which this vfsmount propagates.
861*4882a593Smuzhiyun
862*4882a593Smuzhiyun	->mnt_slave
863*4882a593Smuzhiyun		links together all the slaves that its master vfsmount
864*4882a593Smuzhiyun		propagates to.
865*4882a593Smuzhiyun
866*4882a593Smuzhiyun	->mnt_master
867*4882a593Smuzhiyun		points to the master vfsmount from which this vfsmount
868*4882a593Smuzhiyun		receives propagation.
869*4882a593Smuzhiyun
	->mnt_flags
		takes two more flags to indicate the propagation status of
		the vfsmount.  MNT_SHARED indicates that the vfsmount is a shared
		vfsmount.  MNT_UNBINDABLE indicates that the vfsmount cannot be
		replicated.
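
From user space the propagation status shows up as tags in the optional
fields of /proc/self/mountinfo (``shared:N``, ``master:N``, ``unbindable``; a
mount with no tag is private). The sketch below extracts those tags from one
line. The sample lines only follow the mountinfo layout; their ids, devices
and paths are made up, and ``propagation_tags`` is a hypothetical helper.

```python
def propagation_tags(mountinfo_line):
    """Return (mount point, propagation tags) for one mountinfo line."""
    # Everything before the " - " separator: fixed fields, then optional tags.
    head, _, _ = mountinfo_line.partition(" - ")
    fields = head.split()
    mount_point = fields[4]
    tags = fields[6:]              # e.g. ["shared:1"] or ["master:1"]
    return mount_point, tags or ["private"]


# Illustrative lines in the mountinfo format; ids and devices are made up.
samples = [
    "36 35 98:0 / /mnt rw,noatime shared:1 - ext3 /dev/root rw",
    "37 35 98:0 / /tmp rw,noatime master:1 - ext3 /dev/root rw",
    "38 35 98:0 / /opt rw,noatime - ext3 /dev/root rw",
]
parsed = [propagation_tags(line) for line in samples]
```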
875*4882a593Smuzhiyun
876*4882a593Smuzhiyun	All the shared vfsmounts in a peer group form a cyclic list through
877*4882a593Smuzhiyun	->mnt_share.
878*4882a593Smuzhiyun
	All vfsmounts with the same ->mnt_master form a cyclic list anchored
	in ->mnt_master->mnt_slave_list and going through ->mnt_slave.
881*4882a593Smuzhiyun
	->mnt_master can point to arbitrary (and possibly different) members
	of the master peer group.  To find all immediate slaves of a peer group
	you need to go through _all_ ->mnt_slave_list of its members.
	Conceptually it's just a single set - distribution among the
	individual lists does not affect propagation or the way the propagation
	tree is modified by operations.
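
A small model may make the "go through all the slave lists" point concrete.
Plain Python lists stand in for the cyclic kernel lists, and the class and
function names are hypothetical. The layout is a simplified peer group 'A',
'B', 'C', 'D' with slaves hanging off different members.

```python
class Mount:
    def __init__(self, name):
        self.name = name
        self.peers = [self]      # stands in for the cyclic ->mnt_share list
        self.slave_list = []     # stands in for ->mnt_slave_list
        self.master = None       # stands in for ->mnt_master


def make_peers(*mounts):
    """Put all given mounts into one peer group."""
    group = list(mounts)
    for mount in mounts:
        mount.peers = group


def make_slave(slave, master):
    """Record `slave` on the slave list of one particular peer."""
    slave.master = master
    master.slave_list.append(slave)


def immediate_slaves(mount):
    """Collect the slaves of every member of the mount's peer group."""
    return [s for peer in mount.peers for s in peer.slave_list]


mounts = {name: Mount(name) for name in "ABCDEFGHIJ"}
make_peers(*(mounts[n] for n in "ABCD"))
for slave, master in [("E", "A"), ("F", "A"), ("G", "A"),
                      ("J", "C"), ("H", "D"), ("I", "D")]:
    make_slave(mounts[slave], mounts[master])

slaves_of_group = [m.name for m in immediate_slaves(mounts["B"])]
print(slaves_of_group)
```

'B' has an empty slave list of its own, yet the peer group it belongs to has
six immediate slaves, and their ->mnt_master values point at different peers.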
888*4882a593Smuzhiyun
889*4882a593Smuzhiyun	All vfsmounts in a peer group have the same ->mnt_master.  If it is
890*4882a593Smuzhiyun	non-NULL, they form a contiguous (ordered) segment of slave list.
891*4882a593Smuzhiyun
	An example propagation tree looks as shown in the figure below.
893*4882a593Smuzhiyun	[ NOTE: Though it looks like a forest, if we consider all the shared
894*4882a593Smuzhiyun	mounts as a conceptual entity called 'pnode', it becomes a tree]::
895*4882a593Smuzhiyun
896*4882a593Smuzhiyun
897*4882a593Smuzhiyun		        A <--> B <--> C <---> D
898*4882a593Smuzhiyun		       /|\	      /|      |\
899*4882a593Smuzhiyun		      / F G	     J K      H I
900*4882a593Smuzhiyun		     /
901*4882a593Smuzhiyun		    E<-->K
902*4882a593Smuzhiyun			/|\
903*4882a593Smuzhiyun		       M L N
904*4882a593Smuzhiyun
	In the above figure, 'A', 'B', 'C' and 'D' are all shared and propagate
	to each other.  'A' has three slave mounts 'E', 'F' and 'G'; 'C' has two
	slave mounts 'J' and 'K'; and 'D' has two slave mounts 'H' and 'I'.
	'E' is also shared with 'K' and they propagate to each other.  And
	'K' has three slaves 'M', 'L' and 'N'.
910*4882a593Smuzhiyun
	A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'
912*4882a593Smuzhiyun
913*4882a593Smuzhiyun	A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'
914*4882a593Smuzhiyun
915*4882a593Smuzhiyun	E's ->mnt_share links with ->mnt_share of K
916*4882a593Smuzhiyun
917*4882a593Smuzhiyun	'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'
918*4882a593Smuzhiyun
919*4882a593Smuzhiyun	'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
920*4882a593Smuzhiyun
921*4882a593Smuzhiyun	K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'
922*4882a593Smuzhiyun
923*4882a593Smuzhiyun	C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
924*4882a593Smuzhiyun
925*4882a593Smuzhiyun	J and K's ->mnt_master points to struct vfsmount of C
926*4882a593Smuzhiyun
927*4882a593Smuzhiyun	and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
928*4882a593Smuzhiyun
929*4882a593Smuzhiyun	'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.
930*4882a593Smuzhiyun
931*4882a593Smuzhiyun
932*4882a593Smuzhiyun	NOTE: The propagation tree is orthogonal to the mount tree.
933*4882a593Smuzhiyun
8B) Locking
935*4882a593Smuzhiyun
936*4882a593Smuzhiyun	->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
937*4882a593Smuzhiyun	by namespace_sem (exclusive for modifications, shared for reading).
938*4882a593Smuzhiyun
939*4882a593Smuzhiyun	Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
940*4882a593Smuzhiyun	There are two exceptions: do_add_mount() and clone_mnt().
941*4882a593Smuzhiyun	The former modifies a vfsmount that has not been visible in any shared
942*4882a593Smuzhiyun	data structures yet.
943*4882a593Smuzhiyun	The latter holds namespace_sem and the only references to vfsmount
944*4882a593Smuzhiyun	are in lists that can't be traversed without namespace_sem.
945*4882a593Smuzhiyun
8C) Algorithm
947*4882a593Smuzhiyun
	The crux of the implementation resides in the rbind/move operation.

	The overall algorithm breaks the operation into 3 phases (look at
	attach_recursive_mnt() and propagate_mnt()):

	1. prepare phase.
	2. commit phase.
	3. abort phase.
956*4882a593Smuzhiyun
	Prepare phase:

	for each mount in the source tree:

		   a) Create the necessary number of mount trees to
		      be attached to each of the mounts that receive
		      propagation from the destination mount.
		   b) Do not attach any of the trees to its destination.
		      However, note down its ->mnt_parent and ->mnt_mountpoint.
		   c) Link all the new mounts to form a propagation tree that
		      is identical to the propagation tree of the destination
		      mount.

		   If this phase is successful, there should be 'n' new
		   propagation trees, where 'n' is the number of mounts in the
		   source tree.  Also there should be 'm' new mount trees, where
		   'm' is the number of mounts to which the destination mount
		   propagates.  Go to the commit phase.

		   If any memory allocation fails, go to the abort phase.
979*4882a593Smuzhiyun
980*4882a593Smuzhiyun	Commit phase
981*4882a593Smuzhiyun		attach each of the mount trees to their corresponding
982*4882a593Smuzhiyun		destination mounts.
983*4882a593Smuzhiyun
984*4882a593Smuzhiyun	Abort phase
985*4882a593Smuzhiyun		delete all the newly created trees.
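
The prepare/commit/abort split is an all-or-nothing pattern: everything that
can fail happens before anything becomes visible. The following sketch shows
only that pattern. It is not a rendering of attach_recursive_mnt() or
propagate_mnt(); the names ``attach_with_propagation``, ``AllocationFailed``
and the ``clone`` callback are assumptions made for the illustration, and
mounts are reduced to strings.

```python
class AllocationFailed(Exception):
    """Stands in for a failed memory allocation while cloning a mount tree."""


def attach_with_propagation(source, receivers, clone):
    """Attach a copy of `source` to every mount in `receivers`, all or nothing.

    `receivers` maps each receiving mount name to its list of children.
    `clone(source, receiver)` returns a new copy or raises AllocationFailed.
    """
    prepared = []
    try:
        # Prepare: build every copy but do not attach any of them yet.
        for receiver in receivers:
            prepared.append((receiver, clone(source, receiver)))
    except AllocationFailed:
        # Abort: discard the copies; no receiver has been modified.
        prepared.clear()
        return False
    # Commit: attach each prepared copy to its destination.
    for receiver, copy in prepared:
        receivers[receiver].append(copy)
    return True


def always_ok(source, receiver):
    return f"{source}@{receiver}"


def fail_on_b3(source, receiver):
    if receiver == "B3":
        raise AllocationFailed(receiver)
    return f"{source}@{receiver}"


ok_tree = {"B1": [], "B2": [], "B3": []}
ok = attach_with_propagation("A", ok_tree, always_ok)

failed_tree = {"B1": [], "B2": [], "B3": []}
failed = attach_with_propagation("A", failed_tree, fail_on_b3)
```

When the clone for 'B3' fails, 'B1' and 'B2' are left untouched even though
their copies had already been created.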
986*4882a593Smuzhiyun
	.. Note::
	   All the propagation-related functionality resides in the file pnode.c.
989*4882a593Smuzhiyun
990*4882a593Smuzhiyun
991*4882a593Smuzhiyun------------------------------------------------------------------------
992*4882a593Smuzhiyun
993*4882a593Smuzhiyunversion 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)
994*4882a593Smuzhiyun
995*4882a593Smuzhiyunversion 0.2  (Incorporated comments from Al Viro)