.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
    1) Overview
    2) Features
    3) Setting mount states
    4) Use cases
    5) Detailed semantics
    6) Quiz
    7) FAQ
    8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish the above.

It provides the necessary building blocks for features like per-user-namespace
and versioned filesystem.

2) Features
-----------

Shared subtree provides four different flavors of mounts; struct vfsmount to be
precise:

    a. shared mount
    b. slave mount
    c. private mount
    d. unbindable mount


2a) A shared mount can be replicated to any number of mountpoints and all the
replicas continue to be exactly the same.

    Here is an example:

    Let's say /mnt has a mount that is shared::

        mount --make-shared /mnt

    Note: mount(8) command now supports the --make-shared flag,
    so the sample 'smount' program is no longer needed and has been
    removed.

    ::

        # mount --bind /mnt /tmp

    The above command replicates the mount at /mnt to the mountpoint /tmp
    and the contents of both the mounts remain identical.

    ::

        #ls /mnt
        a b c

        #ls /tmp
        a b c

    Now let's say we mount a device at /tmp/a::

        # mount /dev/sd0 /tmp/a

        #ls /tmp/a
        t1 t2 t3

        #ls /mnt/a
        t1 t2 t3

    Note that the mount has propagated to the mount at /mnt as well.

    And the same is true even when /dev/sd0 is mounted on /mnt/a. The
    contents will be visible under /tmp/a too.


2b) A slave mount is like a shared mount except that mount and umount events
    only propagate towards it.

    All slave mounts have a master mount which is a shared.

    Here is an example:

    Let's say /mnt has a mount which is shared.

    ::

        # mount --make-shared /mnt

    Let's bind mount /mnt to /tmp::

        # mount --bind /mnt /tmp

    The new mount at /tmp becomes a shared mount and it is a replica of
    the mount at /mnt.

    Now let's make the mount at /tmp a slave of /mnt::

        # mount --make-slave /tmp

    Let's mount /dev/sd0 on /mnt/a::

        # mount /dev/sd0 /mnt/a

        #ls /mnt/a
        t1 t2 t3

        #ls /tmp/a
        t1 t2 t3

    Note the mount event has propagated to the mount at /tmp.

    However, let's see what happens if we mount something on the mount at
    /tmp::

        # mount /dev/sd1 /tmp/b

        #ls /tmp/b
        s1 s2 s3

        #ls /mnt/b

    Note how the mount event has not propagated to the mount at
    /mnt.


2c) A private mount does not forward or receive propagation.

    This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is an unbindable private mount.

    Let's say we have a mount at /mnt and we make it unbindable::

        # mount --make-unbindable /mnt

    Let's try to bind mount this mount somewhere else::

        # mount --bind /mnt /tmp
        mount: wrong fs type, bad option, bad superblock on /mnt,
               or too many mounted file systems

    Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

    The mount command (util-linux package) can be used to set mount
    states::

        mount --make-shared mountpoint
        mount --make-slave mountpoint
        mount --make-private mountpoint
        mount --make-unbindable mountpoint


4) Use cases
------------

    A) A process wants to clone its own namespace, but still wants to
       access the CD that got mounted recently.

       Solution:

           The system administrator can make the mount at /cdrom shared::

               mount --bind /cdrom /cdrom
               mount --make-shared /cdrom

           Now any process that clones off a new namespace will have a
           mount at /cdrom which is a replica of the same mount in the
           parent namespace.

           So when a CD is inserted and mounted at /cdrom that mount gets
           propagated to the other mount at /cdrom in all the other clone
           namespaces.

    B) A process wants its mounts invisible to any other process, but
       still be able to see the other system mounts.

       Solution:

           To begin with, the administrator can mark the entire mount tree
           as shareable::

               mount --make-rshared /

           A new process can clone off a new namespace and mark some part
           of its namespace as slave::

               mount --make-rslave /myprivatetree

           Henceforth any mounts within /myprivatetree done by the
           process will not show up in any other namespace. However, mounts
           done in the parent namespace under /myprivatetree still show
           up in the process's namespace.


    Apart from the above semantics this feature provides the
    building blocks to solve the following problems:

    C) Per-user namespace

       The above semantics allow a way to share mounts across
       namespaces.  But namespaces are associated with processes. If
       namespaces are made first class objects with user API to
       associate/disassociate a namespace with userid, then each user
       could have his/her own namespace and tailor it to his/her
       requirements. This needs to be supported in PAM.

    D) Versioned files

       If the entire mount tree is visible at multiple locations, then
       an underlying versioning file system can return different
       versions of the file depending on the path used to access that
       file.

       An example is::

           mount --make-shared /
           mount --rbind / /view/v1
           mount --rbind / /view/v2
           mount --rbind / /view/v3
           mount --rbind / /view/v4

       and if /usr has a versioning filesystem mounted, then that
       mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
       /view/v4/usr too

       A user can request v3 version of the file /usr/fs/namespace.c
       by accessing /view/v3/usr/fs/namespace.c . The underlying
       versioning filesystem can then decipher that v3 version of the
       filesystem is being requested and return the corresponding
       inode.

5) Detailed semantics
---------------------

    The section below explains the detailed semantics of
    bind, rbind, move, mount, umount and clone-namespace operations.

    Note: the word 'vfsmount' and the noun 'mount' have been used
    to mean the same thing, throughout this document.

5a) Mount states

    A given mount can be in one of the following states:

    1) shared
    2) slave
    3) shared and slave
    4) private
    5) unbindable

    A 'propagation event' is defined as an event generated on a vfsmount
    that leads to mount or unmount actions in other vfsmounts.

    A 'peer group' is defined as a group of vfsmounts that propagate
    events to each other.

    (1) Shared mounts

        A 'shared mount' is defined as a vfsmount that belongs to a
        'peer group'.

        For example::

            mount --make-shared /mnt
            mount --bind /mnt /tmp

        The mount at /mnt and that at /tmp are both shared and belong
        to the same peer group. Anything mounted or unmounted under
        /mnt or /tmp is reflected in all the other mounts of its peer
        group.


    (2) Slave mounts

        A 'slave mount' is defined as a vfsmount that receives
        propagation events and does not forward propagation events.

        A slave mount, as the name implies, has a master mount from which
        mount/unmount events are received. Events do not propagate from
        the slave mount to the master.
        Only a shared mount can be made
        a slave by executing the following command::

            mount --make-slave mount

        A shared mount that is made a slave is no longer shared unless
        modified to become shared.

    (3) Shared and Slave

        A vfsmount can be both shared as well as slave.  This state
        indicates that the mount is a slave of some vfsmount, and
        has its own peer group too.  This vfsmount receives propagation
        events from its master vfsmount, and also forwards propagation
        events to its 'peer group' and to its slave vfsmounts.

        Strictly speaking, the vfsmount is shared having its own
        peer group, and this peer-group is a slave of some other
        peer group.

        Only a slave vfsmount can be made 'shared and slave', either by
        executing the following command::

            mount --make-shared mount

        or by moving the slave vfsmount under a shared vfsmount.

    (4) Private mount

        A 'private mount' is defined as a vfsmount that does not
        receive or forward any propagation events.

    (5) Unbindable mount

        An 'unbindable mount' is defined as a vfsmount that does not
        receive or forward any propagation events and cannot
        be bind mounted.
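One way to see which of the five states a mount is currently in is to look at
the optional fields that /proc/self/mountinfo prints for it (``shared:N``,
``master:N``, ``unbindable``). The function below is only a minimal sketch of
that mapping: the name ``classify_mountinfo`` and the sample line are made up
for illustration, and the field layout is assumed to be the one described in
proc(5).

```shell
# classify_mountinfo LINE
# Print the propagation state encoded in one /proc/self/mountinfo line.
# Optional fields start at field 7 and end at the lone "-" separator.
# (The function name is illustrative, not part of any existing tool.)
classify_mountinfo() {
    printf '%s\n' "$1" | awk '{
        shared = 0; slave = 0; unbindable = 0
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)         shared = 1
            else if ($i ~ /^master:/)    slave = 1
            else if ($i == "unbindable") unbindable = 1
        }
        if (shared && slave)  print "shared and slave"
        else if (shared)      print "shared"
        else if (slave)       print "slave"
        else if (unbindable)  print "unbindable"
        else                  print "private"
    }'
}

# Example with a made-up line; on a live system one could feed each line
# of /proc/self/mountinfo to the function instead.
classify_mountinfo '36 35 98:0 / /mnt rw,noatime shared:1 master:2 - ext3 /dev/root rw'
```

A mount with neither a ``shared:`` nor a ``master:`` tag and no ``unbindable``
tag is reported as private, which matches the default described in 2c.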


    State diagram:

    The state diagram below explains the state transition of a mount,
    in response to various commands::

        ----------------------------------------------------------------------------
        |             |make-shared |make-slave    |make-private  |make-unbindable  |
        |-------------|------------|--------------|--------------|-----------------|
        |shared       |shared      |*slave/private|private       |unbindable       |
        |-------------|------------|--------------|--------------|-----------------|
        |slave        |shared      |**slave       |private       |unbindable       |
        |             |and slave   |              |              |                 |
        |-------------|------------|--------------|--------------|-----------------|
        |shared       |shared      |slave         |private       |unbindable       |
        |and slave    |and slave   |              |              |                 |
        |-------------|------------|--------------|--------------|-----------------|
        |private      |shared      |**private     |private       |unbindable       |
        |-------------|------------|--------------|--------------|-----------------|
        |unbindable   |shared      |**unbindable  |private       |unbindable       |
        ----------------------------------------------------------------------------

        * if the shared mount is the only mount in its peer group, making it
        slave, makes it private automatically. Note that there is no master to
        which it can be slaved to.

        ** slaving a non-shared mount has no effect on the mount.
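The transitions in the table above can be written down as a small lookup
function. This is only a sketch for reasoning about the table, not how the
kernel implements it; the function name ``next_state`` and its optional third
argument (the number of other mounts in the peer group, assumed to be 1 when
omitted) are hypothetical.

```shell
# next_state STATE COMMAND [PEERS]
# Print the state a mount ends up in, following the state diagram above.
# STATE:   shared | slave | "shared and slave" | private | unbindable
# COMMAND: make-shared | make-slave | make-private | make-unbindable
# PEERS:   other mounts in the peer group (hypothetical helper argument)
next_state() {
    state=$1 cmd=$2 peers=${3:-1}
    case "$cmd" in
        make-private)    echo "private" ;;
        make-unbindable) echo "unbindable" ;;
        make-shared)
            case "$state" in
                slave|"shared and slave") echo "shared and slave" ;;
                *)                        echo "shared" ;;
            esac ;;
        make-slave)
            case "$state" in
                shared)
                    # (*) with no peers there is no master to be slaved to
                    if [ "$peers" -eq 0 ]; then echo "private"; else echo "slave"; fi ;;
                "shared and slave") echo "slave" ;;
                *)                  echo "$state" ;;  # (**) no effect
            esac ;;
        *) echo "unknown command: $cmd" >&2; return 1 ;;
    esac
}

next_state slave make-shared
```

The two footnotes of the table show up as the ``peers`` check and the
fall-through branch of ``make-slave``.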

    Apart from the commands listed above, the 'move' operation also changes
    the state of a mount depending on the type of the destination mount. It's
    explained in section 5d.

5b) Bind semantics

    Consider the following command::

        mount --bind A/a  B/b

    where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
    is the destination mount and 'b' is the dentry in the destination mount.

    The outcome depends on the type of mount of 'A' and 'B'. The table
    below contains a quick reference::

        --------------------------------------------------------------------------
        |                          BIND MOUNT OPERATION                          |
        |************************************************************************|
        |source(A)-> | shared     | private    | slave            | unbindable   |
        | dest(B)    |            |            |                  |              |
        |    |       |            |            |                  |              |
        |    v       |            |            |                  |              |
        |************************************************************************|
        | shared     | shared     | shared     | shared & slave   | invalid      |
        |            |            |            |                  |              |
        | non-shared | shared     | private    | slave            | invalid      |
        **************************************************************************

    Details:

    1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
       which is clone of 'A', is created. Its root dentry is 'a' . 'C' is
       mounted on mount 'B' at dentry 'b'.
       Also new mounts 'C1', 'C2', 'C3' ...
       are created and mounted at the dentry 'b' on all mounts where 'B'
       propagates to. A new propagation tree containing 'C1',..,'Cn' is
       created. This propagation tree is identical to the propagation tree of
       'B'.  And finally the peer-group of 'C' is merged with the peer group
       of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
       which is clone of 'A', is created. Its root dentry is 'a'. 'C' is
       mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
       are created and mounted at the dentry 'b' on all mounts where 'B'
       propagates to. A new propagation tree is set containing all new mounts
       'C', 'C1', .., 'Cn' with exactly the same configuration as the
       propagation tree for 'B'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
       mount 'C' which is clone of 'A', is created. Its root dentry is 'a' .
       'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
       'C3' ... are created and mounted at the dentry 'b' on all mounts where
       'B' propagates to. A new propagation tree containing the new mounts
       'C','C1',..  'Cn' is created. This propagation tree is identical to the
       propagation tree for 'B'. And finally the mount 'C' and its peer group
       is made the slave of mount 'Z'.  In other words, mount 'C' is in the
       state 'slave and shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
       invalid operation.

    5. 'A' is a private mount and 'B' is a non-shared(private or slave or
       unbindable) mount. A new mount 'C' which is clone of 'A', is created.
       Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
       which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
       mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
       peer-group of 'A'.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
       new mount 'C' which is a clone of 'A' is created. Its root dentry is
       'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
       slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
       'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
       mount/unmount on 'A' do not propagate anywhere else. Similarly
       mount/unmount on 'C' do not propagate anywhere else.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
       invalid operation. An unbindable mount cannot be bind mounted.

5c) Rbind semantics

    rbind works like bind, with one difference: bind replicates only the
    specified mount, whereas rbind replicates all the mounts in the tree
    belonging to the specified mount. In other words, an rbind mount is a
    bind mount applied to all the mounts in the tree.
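Since rbind applies the bind rules to each mount in the tree, the per-mount
outcome is the one given by the quick-reference table in section 5b. As a
rough sketch, that table can be expressed as a lookup; the function name
``bind_result`` and its argument spelling are hypothetical and exist only for
this illustration.

```shell
# bind_result SRC DEST
# Print the state of the new mount 'C' created by "mount --bind A/a B/b",
# following the quick-reference table in section 5b.
# SRC:  state of 'A' (shared | private | slave | unbindable)
# DEST: state of 'B' (shared | non-shared)
bind_result() {
    case "$1" in
        unbindable) echo "invalid" ;;
        shared)     echo "shared" ;;
        private)
            if [ "$2" = shared ]; then echo "shared"; else echo "private"; fi ;;
        slave)
            if [ "$2" = shared ]; then echo "shared and slave"; else echo "slave"; fi ;;
        *)  echo "unknown source state: $1" >&2; return 1 ;;
    esac
}

bind_result slave shared
```

The sketch only reports the resulting state of 'C'; it does not model the
additional mounts 'C1' ... 'Cn' that are created when 'B' is shared.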

    If the source tree that is rbound has some unbindable mounts,
    then the subtree under the unbindable mount is pruned in the new
    location.

    eg:

      let's say we have the following mount tree::

                A
              /   \
              B   C
             / \ / \
             D E F G

      Let's say all the mounts except the mount C in the tree are
      of a type other than unbindable.

      If this tree is rbound to say Z

      We will have the following tree at the new location::

                Z
                |
                A'
               /
              B'                Note how the tree under C is pruned
             / \                in the new location.
            D' E'



5d) Move semantics

    Consider the following command::

        mount --move A  B/b

    where 'A' is the source mount, 'B' is the destination mount and 'b' is
    the dentry in the destination mount.

    The outcome depends on the type of the mount of 'A' and 'B'.
    The table
    below is a quick reference::

        --------------------------------------------------------------------------
        |                          MOVE MOUNT OPERATION                          |
        |************************************************************************|
        |source(A)-> | shared     | private    | slave            | unbindable   |
        | dest(B)    |            |            |                  |              |
        |    |       |            |            |                  |              |
        |    v       |            |            |                  |              |
        |************************************************************************|
        | shared     | shared     | shared     | shared and slave | invalid      |
        |            |            |            |                  |              |
        | non-shared | shared     | private    | slave            | unbindable   |
        **************************************************************************

    .. Note:: moving a mount residing under a shared mount is invalid.

    Details follow:

    1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
       mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1', 'A2'...'An'
       are created and mounted at dentry 'b' on all mounts that receive
       propagation from mount 'B'. A new propagation tree is created in the
       exact same configuration as that of 'B'. This new propagation tree
       contains all the new mounts 'A1', 'A2'...  'An'.  And this new
       propagation tree is appended to the already existing propagation tree
       of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
       mounted on mount 'B' at dentry 'b'. Also new mount 'A1', 'A2'...
       'An'
       are created and mounted at dentry 'b' on all mounts that receive
       propagation from mount 'B'. The mount 'A' becomes a shared mount and a
       propagation tree is created which is identical to that of
       'B'. This new propagation tree contains all the new mounts 'A1',
       'A2'...  'An'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
       mount 'A' is mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1',
       'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
       receive propagation from mount 'B'. A new propagation tree is created
       in the exact same configuration as that of 'B'. This new propagation
       tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
       propagation tree is appended to the already existing propagation tree of
       'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
       becomes 'shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
       is invalid, because mounting anything on the shared mount 'B' can
       create new mounts that get mounted on the mounts that receive
       propagation from 'B'.  And since the mount 'A' is unbindable, cloning
       it to mount at other mountpoints is not possible.

    5. 'A' is a private mount and 'B' is a non-shared(private or slave or
       unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount.
       The mount 'A'
       is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
       shared mount.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
       The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
       continues to be a slave mount of mount 'Z'.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
       'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
       unbindable mount.

5e) Mount semantics

    Consider the following command::

        mount device  B/b

    'B' is the destination mount and 'b' is the dentry in the destination
    mount.

    The above operation is the same as the bind operation with the exception
    that the source mount is always a private mount.


5f) Unmount semantics

    Consider the following command::

        umount A

    where 'A' is a mount mounted on mount 'B' at dentry 'b'.

    If mount 'B' is shared, then all most-recently-mounted mounts at dentry
    'b' on mounts that receive propagation from mount 'B' and do not have
    sub-mounts within them are unmounted.

    Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
    each other.

    let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
    'B1', 'B2' and 'B3' respectively.

    let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
    mount 'B1', 'B2' and 'B3' respectively.

    if 'C1' is unmounted, all the mounts that are most-recently-mounted on
    'B1' and on the mounts that 'B1' propagates-to are unmounted.

    'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
    on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

    So all 'C1', 'C2' and 'C3' should be unmounted.

    If any of 'C2' or 'C3' has some child mounts, then that mount is not
    unmounted, but all other mounts are unmounted. However if 'C1' is told
    to be unmounted and 'C1' has some sub-mounts, the umount operation
    fails entirely.

5g) Clone Namespace

    A cloned namespace contains all the mounts of the parent
    namespace.

    Let's say 'A' and 'B' are the corresponding mounts in the parent and the
    child namespace.

    If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
    each other.

    If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
    'Z'.

    If 'A' is a private mount, then 'B' is a private mount too.

    If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

    A. What is the result of the following command sequence?

       ::

           mount --bind /mnt /mnt
           mount --make-shared /mnt
           mount --bind /mnt /tmp
           mount --move /tmp /mnt/1

       What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
       Should they all be identical? or should /mnt and /mnt/1 be
       identical only?


    B. What is the result of the following command sequence?

       ::

           mount --make-rshared /
           mkdir -p /v/1
           mount --rbind / /v/1

       What should the content of /v/1/v/1 be?


    C. What is the result of the following command sequence?

       ::

           mount --bind /mnt /mnt
           mount --make-shared /mnt
           mkdir -p /mnt/1/2/3 /mnt/1/test
           mount --bind /mnt/1 /tmp
           mount --make-slave /mnt
           mount --make-shared /mnt
           mount --bind /mnt/1/2 /tmp1
           mount --make-slave /mnt

       At this point we have the first mount at /tmp and
       its root dentry is 1.
                Let's call this mount 'A'.
                And then we have a second mount at /tmp1 with root
                dentry 2. Let's call this mount 'B'.
                Next we have a third mount at /mnt with root dentry
                mnt. Let's call this mount 'C'.

                'B' is a slave of 'A' and 'C' is a slave of 'B'::

                        A -> B -> C

                At this point, if we execute the following command::

                        mount --bind /bin /tmp/test

                the mount is attempted on 'A'.

                Will the mount propagate to 'B' and 'C'?

                What would the contents of /mnt/1/test be?

7) FAQ
------

        Q1. Why is bind mount needed? How is it different from symbolic links?
                Symbolic links can get stale if the destination mount gets
                unmounted or moved. Bind mounts continue to exist even if the
                other mount is unmounted or moved.

        Q2. Why can't the shared subtree be implemented using exportfs?

                exportfs is a heavyweight way of accomplishing part of what
                shared subtree can do. I cannot imagine a way to implement the
                semantics of slave mount using exportfs.

        Q3. Why is unbindable mount needed?

                Let's say we want to replicate the mount tree at multiple
                locations within the same subtree.

                If one rbind mounts a tree within the same subtree 'n' times,
                the number of mounts created grows at least exponentially
                with 'n'. Having unbindable mounts can help prune the unneeded
                bind mounts. Here is an example.

                step 1:
                   let's say the root tree has just two directories with
                   one vfsmount::

                                    root
                                   /    \
                                  tmp    usr

                   And we want to replicate the tree at multiple
                   mountpoints under /root/tmp

                step 2:
                   ::

                        mount --make-shared /root

                        mkdir -p /tmp/m1

                        mount --rbind /root /tmp/m1

                   the new tree now looks like this::

                                    root
                                   /    \
                                 tmp    usr
                                 /
                                m1
                               /  \
                             tmp  usr
                             /
                            m1

                   it has two vfsmounts

                step 3:
                   ::

                        mkdir -p /tmp/m2
                        mount --rbind /root /tmp/m2

                   the new tree now looks like this::

                                      root
                                     /    \
                                   tmp     usr
                                  /    \
                                m1       m2
                               / \       /  \
                             tmp  usr   tmp  usr
                             / \          /
                            m1  m2      m1
                                / \     /  \
                              tmp usr  tmp   usr
                              /        / \
                             m1       m1  m2
                            /  \
                          tmp   usr
                          /  \
                         m1   m2

                   it has 6 vfsmounts

                step 4:
                   ::

                        mkdir -p /tmp/m3
                        mount --rbind /root /tmp/m3

                   I won't draw the tree, but it has 24 vfsmounts.


                At step i the number of vfsmounts is V[i] = i*V[i-1], that
                is, V[i] = i! when V[1] = 1. This grows even faster than an
                exponential function, and the tree has far more mounts than
                we really needed in the first place.

                One could use a series of umounts at each step to prune
                out the unneeded mounts. But there is a better solution.
                Unbindable mounts come in handy here.
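
                The recurrence V[i] = i*V[i-1] stated above can be checked with
                a few lines of Python. This is only a sketch: the function name
                is hypothetical, and V[1] = 1 is an assumption taken from the
                single vfsmount of step 1.

```python
def vfsmount_counts(steps):
    """Return [V[1], ..., V[steps]] for V[i] = i * V[i-1] with V[1] = 1.

    Assumption: V[1] = 1, i.e. the tree of step 1 has a single vfsmount.
    """
    counts = []
    current = 1                      # V[1]: the single vfsmount of step 1
    for i in range(1, steps + 1):
        if i > 1:
            current *= i             # step i multiplies the previous count by i
        counts.append(current)
    return counts


if __name__ == "__main__":
    for step, count in enumerate(vfsmount_counts(6), start=1):
        print(f"step {step}: {count} vfsmounts")
```

                The first four values are 1, 2, 6 and 24, matching the counts
                given for steps 1 to 4.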

                step 1:
                   let's say the root tree has just two directories with
                   one vfsmount::

                                    root
                                   /    \
                                  tmp    usr

                   How do we set up the same tree at multiple locations under
                   /root/tmp?

                step 2:
                   ::

                        mount --bind /root/tmp /root/tmp

                        mount --make-rshared /root
                        mount --make-unbindable /root/tmp

                        mkdir -p /tmp/m1

                        mount --rbind /root /tmp/m1

                   the new tree now looks like this::

                                    root
                                   /    \
                                 tmp    usr
                                 /
                                m1
                               /  \
                             tmp  usr

                step 3:
                   ::

                        mkdir -p /tmp/m2
                        mount --rbind /root /tmp/m2

                   the new tree now looks like this::

                                    root
                                   /    \
                                 tmp    usr
                                /   \
                              m1     m2
                             /  \   /  \
                           tmp  usr tmp usr

                step 4:
                   ::

                        mkdir -p /tmp/m3
                        mount --rbind /root /tmp/m3

                   the new tree now looks like this::

                                    root
                                   /    \
                                 tmp    usr
                              /     \    \
                           m1       m2       m3
                          /  \     /  \     /  \
                        tmp usr  tmp usr  tmp usr

8) Implementation
-----------------

8A) Datastructure

        4 new fields are introduced to struct vfsmount:

        *   ->mnt_share
        *   ->mnt_slave_list
        *   ->mnt_slave
        *   ->mnt_master

        ->mnt_share
                links together all the mounts to/from which this vfsmount
                sends/receives propagation events.

        ->mnt_slave_list
                links all the mounts to which this vfsmount propagates.

        ->mnt_slave
                links together all the slaves that its master vfsmount
                propagates to.

        ->mnt_master
                points to the master vfsmount from which this vfsmount
                receives propagation.

        ->mnt_flags
                takes two more flags to indicate the propagation status of
                the vfsmount.  MNT_SHARE indicates that the vfsmount is a shared
                vfsmount.  MNT_UNCLONABLE indicates that the vfsmount cannot be
                replicated.

        All the shared vfsmounts in a peer group form a cyclic list through
        ->mnt_share.

        All vfsmounts with the same ->mnt_master form a cyclic list anchored
        in ->mnt_master->mnt_slave_list and going through ->mnt_slave.
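
        The two lists described above can be modelled in a few lines of
        Python. This is a toy sketch, not the kernel code: the class and
        helper names are hypothetical, and the slave list is simplified to
        a plain Python list instead of a list threaded through ->mnt_slave.

```python
class Mount:
    """Toy stand-in for a vfsmount; attribute names mirror the text."""

    def __init__(self, name):
        self.name = name
        self.mnt_share = self        # cyclic peer ring: a lone mount points at itself
        self.mnt_master = None       # mount this one receives propagation from
        self.mnt_slave_list = []     # simplification: slaves anchored at this mount


def add_peer(member, new):
    """Splice `new` into the cyclic ->mnt_share ring right after `member`."""
    new.mnt_share = member.mnt_share
    member.mnt_share = new
    new.mnt_master = member.mnt_master


def add_slave(master, slave):
    """Make `slave` receive propagation from `master`."""
    slave.mnt_master = master
    master.mnt_slave_list.append(slave)


def peer_group(mount):
    """Walk the ->mnt_share ring exactly once, starting at `mount`."""
    group, cur = [mount], mount.mnt_share
    while cur is not mount:
        group.append(cur)
        cur = cur.mnt_share
    return group


def immediate_slaves(mount):
    """Collect the slave lists of every member of the peer group of `mount`."""
    return [s for peer in peer_group(mount) for s in peer.mnt_slave_list]
```

        Walking the ring from any member returns to that member, which is
        why a single pointer per mount is enough to represent a peer group.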

        ->mnt_master can point to arbitrary (and possibly different) members
        of the master peer group.  To find all immediate slaves of a peer group
        you need to go through _all_ ->mnt_slave_list of its members.
        Conceptually it's just a single set - distribution among the
        individual lists does not affect propagation or the way the propagation
        tree is modified by operations.

        All vfsmounts in a peer group have the same ->mnt_master.  If it is
        non-NULL, they form a contiguous (ordered) segment of the slave list.

        An example propagation tree looks as shown in the figure below.
        [ NOTE: Though it looks like a forest, if we consider all the shared
        mounts as a conceptual entity called 'pnode', it becomes a tree]::


                        A <--> B <--> C <---> D
                       /|\           /|       |\
                      / F G         J K       H I
                     /
                    E<-->K
                        /|\
                       M L N

        In the above figure A, B, C and D are all shared and propagate to each
        other.  'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
        mounts 'J' and 'K'; and 'D' has two slave mounts 'H' and 'I'.
        'E' is also shared with 'K' and they propagate to each other.
        And 'K' has 3 slaves 'M', 'L' and 'N'.

        A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'

        A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'

        E's ->mnt_share links with ->mnt_share of 'K'

        'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'

        'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'

        K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'

        C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'

        'J' and 'K' have their ->mnt_master point to struct vfsmount of 'C'

        and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'

        'H' and 'I' have their ->mnt_master point to struct vfsmount of 'D'.


        NOTE: The propagation tree is orthogonal to the mount tree.

8B) Locking

        ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
        by namespace_sem (exclusive for modifications, shared for reading).

        Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
        There are two exceptions: do_add_mount() and clone_mnt().
        The former modifies a vfsmount that has not been visible in any shared
        data structures yet.
        The latter holds namespace_sem and the only references to vfsmount
        are in lists that can't be traversed without namespace_sem.

8C) Algorithm

        The crux of the implementation resides in the rbind/move operation.

        The overall algorithm breaks the operation into 3 phases (look at
        attach_recursive_mnt() and propagate_mnt()):

        1. prepare phase.
        2. commit phase.
        3. abort phase.

        Prepare phase:

        For each mount in the source tree:

                a) Create the necessary number of mount trees to
                   be attached to each of the mounts that receive
                   propagation from the destination mount.
                b) Do not attach any of the trees to its destination.
                   However, note down its ->mnt_parent and ->mnt_mountpoint.
                c) Link all the new mounts to form a propagation tree that
                   is identical to the propagation tree of the destination
                   mount.

                If this phase is successful, there should be 'n' new
                propagation trees, where 'n' is the number of mounts in the
                source tree.  Go to the commit phase.

                Also there should be 'm' new mount trees, where 'm' is
                the number of mounts to which the destination mount
                propagates.

                If any memory allocation fails, go to the abort phase.
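
        The way the prepare phase hands off to either the commit or the abort
        phase can be illustrated with a toy model. This is only a sketch under
        stated assumptions, not the logic of attach_recursive_mnt(): the names
        ToyMount and attach_with_propagation are hypothetical, the source tree
        is reduced to a single mount, and an allocation failure is simulated
        with Python's MemoryError.

```python
class ToyMount:
    """Hypothetical miniature mount: a name plus the mounts attached to it."""

    def __init__(self, name):
        self.name = name
        self.children = []


def attach_with_propagation(source_name, destinations, alloc=ToyMount):
    """Create one copy of the source per destination, then attach all or none.

    `destinations` stands for the destination mount plus every mount it
    propagates to; `alloc` is an injectable allocator so a failure can be
    simulated.
    """
    staged = []                          # (parent, new mount) noted, not attached
    try:
        for dest in destinations:        # prepare phase
            staged.append((dest, alloc(f"{source_name}@{dest.name}")))
    except MemoryError:
        staged.clear()                   # abort phase: drop every new copy
        return False
    for parent, child in staged:         # commit phase: attach the copies
        parent.children.append(child)
    return True
```

        Because nothing is attached until every copy exists, a failure
        part-way through leaves the existing mount tree untouched.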

        Commit phase
                Attach each of the mount trees to its corresponding
                destination mount.

        Abort phase
                Delete all the newly created trees.

        .. Note::
           All the propagation-related functionality resides in the file pnode.c


------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)

version 0.2  (Incorporated comments from Al Viro)