==============================
Device-mapper snapshot support
==============================

Device-mapper allows you, without massive data copying:

-  To create snapshots of any block device i.e. mountable, saved states of
   the block device which are also writable without interfering with the
   original content;
-  To create device "forks", i.e. multiple different versions of the
   same data stream.
-  To merge a snapshot of a block device back into the snapshot's origin
   device.

In the first two cases, dm copies only the chunks of data that get
changed and uses a separate copy-on-write (COW) block device for
storage.

For snapshot merge the contents of the COW storage are merged back into
the origin device.


There are three dm targets available:
snapshot, snapshot-origin, and snapshot-merge.

-  snapshot-origin <origin>

which will normally have one or more snapshots based on it.
Reads will be mapped directly to the backing device. For each write, the
original data will be saved in the <COW device> of each snapshot to keep
its visible content unchanged, at least until the <COW device> fills up.


-  snapshot <origin> <COW device> <persistent?> <chunksize>
   [<# feature args> [<arg>]*]

A snapshot of the <origin> block device is created.
Changed chunks of <chunksize> sectors will be stored on the <COW device>.
Writes will only go to the <COW device>.  Reads will come from the
<COW device> or from <origin> for unchanged data.  <COW device> will often
be smaller than the origin and if it fills up the snapshot will become
useless and be disabled, returning errors.  So it is important to monitor
the amount of free space and expand the <COW device> before it fills up.

<persistent?> is P (Persistent) or N (Not persistent - will not survive
after reboot).  O (Overflow) can be added as a persistent store option
to allow userspace to advertise its support for seeing "Overflow" in the
snapshot status.  So supported store types are "P", "PO" and "N".

The difference between persistent and transient is that transient
snapshots need less metadata saved on disk - it can be kept in
memory by the kernel.

When loading or unloading the snapshot target, the corresponding
snapshot-origin or snapshot-merge target must be suspended. A failure to
suspend the origin target could result in data corruption.

Optional features:

   discard_zeroes_cow - a discard issued to the snapshot device that
   maps to entire chunks will zero the corresponding exception(s) in
   the snapshot's exception store.

   discard_passdown_origin - a discard to the snapshot device is passed
   down to the snapshot-origin's underlying device.
   This doesn't cause copy-out to the snapshot exception store because
   the snapshot-origin target is bypassed.

   The discard_passdown_origin feature depends on the discard_zeroes_cow
   feature being enabled.


-  snapshot-merge <origin> <COW device> <persistent> <chunksize>
   [<# feature args> [<arg>]*]

takes the same table arguments as the snapshot target except it only
works with persistent snapshots.  This target assumes the role of the
"snapshot-origin" target and must not be loaded if the "snapshot-origin"
is still present for <origin>.

Creates a merging snapshot that takes control of the changed chunks
stored in the <COW device> of an existing snapshot, through a handover
procedure, and merges these chunks back into the <origin>.  Once merging
has started (in the background) the <origin> may be opened and the merge
will continue while I/O is flowing to it.  Changes to the <origin> are
deferred until the merging snapshot's corresponding chunk(s) have been
merged.  Once merging has started the snapshot device, associated with
the "snapshot" target, will return -EIO when accessed.
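
The argument rules for the snapshot and snapshot-merge targets described
above can be sketched as a small table-line builder. This is only an
illustration: ``snapshot_table`` is a hypothetical helper, not part of any
dm or LVM2 tool, it assumes a single-segment table starting at sector 0,
and it checks only the constraints stated in this document (the kernel may
enforce others).

```python
from typing import Sequence

# Store types and optional features named in this document.
STORE_TYPES = {"P", "PO", "N"}
FEATURES = {"discard_zeroes_cow", "discard_passdown_origin"}


def snapshot_table(target: str, length: int, origin: str, cow: str,
                   store: str, chunksize: int,
                   features: Sequence[str] = ()) -> str:
    """Build a table line for a snapshot or snapshot-merge target.

    Hypothetical helper: assumes the mapping starts at sector 0.
    """
    if target not in ("snapshot", "snapshot-merge"):
        raise ValueError(f"unsupported target: {target}")
    if store not in STORE_TYPES:
        raise ValueError(f"store type must be one of {sorted(STORE_TYPES)}")
    # snapshot-merge only works with persistent snapshots.
    if target == "snapshot-merge" and store == "N":
        raise ValueError("snapshot-merge requires a persistent store")
    unknown = set(features) - FEATURES
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    # discard_passdown_origin depends on discard_zeroes_cow.
    if ("discard_passdown_origin" in features
            and "discard_zeroes_cow" not in features):
        raise ValueError("discard_passdown_origin needs discard_zeroes_cow")

    fields = [0, length, target, origin, cow, store, chunksize]
    if features:
        # Feature arguments are preceded by their count.
        fields += [len(features), *features]
    return " ".join(str(field) for field in fields)
```

For instance, ``snapshot_table("snapshot", 2097152, "254:11", "254:12", "P", 16)``
yields ``0 2097152 snapshot 254:11 254:12 P 16``, the same shape as the
snapshot line in the LVM2 example in this document.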


How snapshot is used by LVM2
============================
When you create the first LVM2 snapshot of a volume, four dm devices are used:

1) a device containing the original mapping table of the source volume;
2) a device used as the <COW device>;
3) a "snapshot" device, combining #1 and #2, which is the visible snapshot
   volume;
4) the "original" volume (which uses the device number used by the original
   source volume), whose table is replaced by a "snapshot-origin" mapping
   from device #1.

A fixed naming scheme is used, so with the following commands::

  lvcreate -L 1G -n base volumeGroup
  lvcreate -L 100M --snapshot -n snap volumeGroup/base

we'll have this situation (with volumes in above order)::

  # dmsetup table|grep volumeGroup

  volumeGroup-base-real: 0 2097152 linear 8:19 384
  volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
  volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
  volumeGroup-base: 0 2097152 snapshot-origin 254:11

  # ls -lL /dev/mapper/volumeGroup-*
  brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
  brw-------  1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
  brw-------  1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
  brw-------  1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
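
The numbers in the table output above are easier to read once converted
between bytes and sectors. The sketch below assumes the usual device-mapper
convention of 512-byte sectors; the helper names are made up for this
illustration, and the observation that the COW area starts right after the
base volume is simply what the example's offsets show, not a general rule.

```python
SECTOR_BYTES = 512  # assumption: dm table units are 512-byte sectors
MIB = 1024 * 1024
GIB = 1024 * MIB


def bytes_to_sectors(size_bytes: int) -> int:
    """Convert a byte count to sectors, rejecting partial sectors."""
    if size_bytes % SECTOR_BYTES:
        raise ValueError("size is not a whole number of sectors")
    return size_bytes // SECTOR_BYTES


def sectors_to_bytes(sectors: int) -> int:
    """Convert a sector count back to bytes."""
    return sectors * SECTOR_BYTES


base_len = bytes_to_sectors(1 * GIB)    # length of volumeGroup-base-real
cow_len = bytes_to_sectors(100 * MIB)   # length of volumeGroup-snap-cow
chunk_bytes = sectors_to_bytes(16)      # <chunksize> of 16 sectors, in bytes
cow_offset = 384 + base_len             # first sector after the base area

print(base_len, cow_len, chunk_bytes, cow_offset)
# prints: 2097152 204800 8192 2097536
```

So the 1G origin is 2097152 sectors, the 100M COW device is 204800 sectors,
and a chunk size of 16 sectors corresponds to 8 KiB chunks.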


How snapshot-merge is used by LVM2
==================================
A merging snapshot assumes the role of the "snapshot-origin" while
merging.  As such the "snapshot-origin" is replaced with
"snapshot-merge".  The "-real" device is not changed and the "-cow"
device is renamed to <origin name>-cow to aid LVM2's cleanup of the
merging snapshot after it completes.  The "snapshot" that hands over its
COW device to the "snapshot-merge" is deactivated (unless using lvchange
--refresh); but if it is left active it will simply return I/O errors.

A snapshot will merge into its origin with the following command::

  lvconvert --merge volumeGroup/snap

we'll now have this situation::

  # dmsetup table|grep volumeGroup

  volumeGroup-base-real: 0 2097152 linear 8:19 384
  volumeGroup-base-cow: 0 204800 linear 8:19 2097536
  volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16

  # ls -lL /dev/mapper/volumeGroup-*
  brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
  brw-------  1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
  brw-------  1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base


How to determine when a merging is complete
===========================================
The snapshot-merge and snapshot status lines end with:

  <sectors_allocated>/<total_sectors> <metadata_sectors>

Both <sectors_allocated> and <total_sectors> include both data and metadata.
During merging, the number of sectors allocated gets smaller and
smaller.  Merging has finished when the number of sectors holding data
is zero, in other words <sectors_allocated> == <metadata_sectors>.

Here is a practical example (using a hybrid of lvm and dmsetup commands)::

  # lvs
    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
    base    volumeGroup owi-a- 4.00g
    snap    volumeGroup swi-a- 1.00g base  18.97

  # dmsetup status volumeGroup-snap
  0 8388608 snapshot 397896/2097152 1560
                                    ^^^^ metadata sectors

  # lvconvert --merge -b volumeGroup/snap
    Merging of volume snap started.

  # lvs volumeGroup/snap
    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
    base    volumeGroup Owi-a- 4.00g          17.23

  # dmsetup status volumeGroup-base
  0 8388608 snapshot-merge 281688/2097152 1104

  # dmsetup status volumeGroup-base
  0 8388608 snapshot-merge 180480/2097152 712

  # dmsetup status volumeGroup-base
  0 8388608 snapshot-merge 16/2097152 16

Merging has finished.

::

  # lvs
    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
    base    volumeGroup owi-a- 4.00g
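
The completion rule from this section (<sectors_allocated> ==
<metadata_sectors>) can be checked mechanically. The following is a minimal
sketch, assuming the status line has already been captured as text, for
example from ``dmsetup status``. The names ``SnapshotStatus``,
``parse_status`` and ``merge_finished`` are invented for this illustration,
and status lines that do not end in the numeric form shown above (such as an
overflowed snapshot) are rejected rather than interpreted.

```python
from typing import NamedTuple


class SnapshotStatus(NamedTuple):
    target: str
    allocated: int  # <sectors_allocated>, data plus metadata
    total: int      # <total_sectors>
    metadata: int   # <metadata_sectors>


def parse_status(line: str) -> SnapshotStatus:
    """Parse '<start> <length> <target> <allocated>/<total> <metadata>'."""
    fields = line.split()
    if len(fields) != 5 or fields[2] not in ("snapshot", "snapshot-merge"):
        raise ValueError(f"not a numeric snapshot status line: {line!r}")
    if "/" not in fields[3]:
        raise ValueError(f"missing <allocated>/<total> field: {line!r}")
    allocated, total = fields[3].split("/", 1)
    return SnapshotStatus(fields[2], int(allocated), int(total),
                          int(fields[4]))


def data_sectors(status: SnapshotStatus) -> int:
    """Sectors still holding snapshot data (allocated minus metadata)."""
    return status.allocated - status.metadata


def merge_finished(status: SnapshotStatus) -> bool:
    """True once no data sectors remain to be merged."""
    return status.allocated == status.metadata


for line in ("0 8388608 snapshot-merge 281688/2097152 1104",
             "0 8388608 snapshot-merge 16/2097152 16"):
    status = parse_status(line)
    print(data_sectors(status), merge_finished(status))
# prints: 280584 False
#         0 True
```

A monitoring script could poll the status at an interval of its choosing and
stop once ``merge_finished`` returns true.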