.. SPDX-License-Identifier: GPL-2.0-only

========
dm-clone
========

Introduction
============

dm-clone is a device mapper target which produces a one-to-one copy of an
existing, read-only source device into a writable destination device: It
presents a virtual block device which makes all data appear immediately, and
redirects reads and writes accordingly.

The main use case of dm-clone is to clone a potentially remote, high-latency,
read-only, archival-type block device into a writable, fast, primary-type device
for fast, low-latency I/O. The cloned device is visible/mountable immediately
and the copy of the source device to the destination device happens in the
background, in parallel with user I/O.

For example, one could restore an application backup from a read-only copy,
accessible through a network storage protocol (NBD, Fibre Channel, iSCSI, AoE,
etc.), into a local SSD or NVMe device, and start using the device immediately,
without waiting for the restore to complete.

When the cloning completes, the dm-clone table can be removed altogether and be
replaced, e.g., by a linear table, mapping directly to the destination device.

The dm-clone target reuses the metadata library used by the thin-provisioning
target.

Glossary
========

   Hydration
     The process of filling a region of the destination device with data from
     the same region of the source device, i.e., copying the region from the
     source to the destination device.

Once a region gets hydrated we redirect all I/O regarding it to the destination
device.

Design
======

Sub-devices
-----------

The target is constructed by passing three devices to it (along with other
parameters detailed later):

1. A source device - the read-only device that gets cloned and source of the
   hydration.

2. A destination device - the destination of the hydration, which will become a
   clone of the source device.

3. A small metadata device - it records which regions are already valid in the
   destination device, i.e., which regions have already been hydrated, or have
   been written to directly, via user I/O.

The size of the destination device must be at least equal to the size of the
source device.

Regions
-------

dm-clone divides the source and destination devices into fixed-size regions.
Regions are the unit of hydration, i.e., the minimum amount of data copied from
the source to the destination device.

The region size is configurable when you first create the dm-clone device. The
recommended region size is the same as the file system block size, which usually
is 4KB. The region size must be between 8 sectors (4KB) and 2097152 sectors
(1GB) and a power of two.

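The region size constraints above can be checked before building a table. The
following is a minimal sketch, not part of dm-clone or dmsetup: the helper
names ``valid_region_size`` and ``region_count`` are made up for illustration,
sizes are given in 512-byte sectors, and we assume that a partial region at the
end of the device counts as one region.

```shell
#!/bin/sh
# Illustrative helpers only; these names are not part of dm-clone or dmsetup.

# Succeeds if $1 is an acceptable region size in 512-byte sectors:
# a power of two between 8 and 2097152 sectors, as stated above.
valid_region_size() (
    size=$1
    case $size in
        ''|*[!0-9]*) exit 1 ;;      # reject empty or non-numeric input
    esac
    [ "$size" -ge 8 ] && [ "$size" -le 2097152 ] || exit 1
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ $(( size & (size - 1) )) -eq 0 ]
)

# Prints the number of regions needed to cover a device of $1 sectors with
# regions of $2 sectors, rounding up (assumption: a partial last region
# still counts as a region).
region_count() (
    echo $(( ($1 + $2 - 1) / $2 ))
)
```
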
Reads and writes from/to hydrated regions are serviced from the destination
device.

A read to a not yet hydrated region is serviced directly from the source device.

A write to a not yet hydrated region is delayed until the region has been
hydrated; the hydration of that region starts immediately.

Note that a write request with size equal to region size will skip copying of
the corresponding region from the source device and overwrite the region of the
destination device directly.

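The routing rules above can be summarized as a small decision function. This is
only a model of the behaviour described in this section, not the kernel code:
``route_io`` is a hypothetical name, the hydration state is passed in by the
caller, and a write is treated as a full-region write when its size equals the
region size (we assume it is also region-aligned).

```shell
#!/bin/sh
# Model of the I/O routing rules described above; illustrative only.
#
# route_io <read|write> <hydrated: 0|1> [<io sectors> [<region sectors>]]
route_io() (
    op=$1 hydrated=$2 io_size=${3:-0} region_size=${4:-8}
    if [ "$hydrated" -eq 1 ]; then
        # Hydrated regions are always serviced from the destination.
        echo "destination"
    elif [ "$op" = read ]; then
        # Reads to not yet hydrated regions go to the source.
        echo "source"
    elif [ "$io_size" -eq "$region_size" ]; then
        # A full-region write needs no data from the source.
        echo "destination (overwrite, no copy)"
    else
        # A partial write waits for the region to be hydrated first.
        echo "hydrate region, then destination"
    fi
)
```
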
Discards
--------

dm-clone interprets a discard request to a range that hasn't been hydrated yet
as a hint to skip hydration of the regions covered by the request, i.e., it
skips copying the region's data from the source to the destination device, and
only updates its metadata.

If the destination device supports discards, then by default dm-clone will pass
down discard requests to it.

Background Hydration
--------------------

dm-clone copies continuously from the source to the destination device, until
all of the device has been copied.

Copying data from the source to the destination device uses bandwidth. The user
can set a throttle to prevent more than a certain amount of copying occurring at
any one time. Moreover, dm-clone takes into account user I/O traffic going to
the devices and pauses the background hydration when there is I/O in-flight.

A message `hydration_threshold <#regions>` can be used to set the maximum number
of regions being copied, the default being 1 region.

dm-clone employs dm-kcopyd for copying portions of the source device to the
destination device. By default, we issue copy requests of size equal to the
region size. A message `hydration_batch_size <#regions>` can be used to tune the
size of these copy requests. Increasing the hydration batch size results in
dm-clone trying to batch together contiguous regions, so we copy the data in
batches of this many regions.

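As a sketch of how the two tuning messages above could be sent, the helper
below only prints the ``dmsetup message`` command line instead of running it,
so it can be inspected first. The name ``hydration_msg_cmd`` and the device
name ``clone`` are assumptions made for illustration; the sector argument ``0``
follows the ``dmsetup message`` usage shown in the Examples section.

```shell
#!/bin/sh
# Prints (does not run) the dmsetup command that would send a hydration
# tuning message. Illustrative helper; not part of dmsetup.
#
# hydration_msg_cmd <dm device name> <key> <#regions>
hydration_msg_cmd() (
    dev=$1 key=$2 regions=$3
    case $key in
        hydration_threshold|hydration_batch_size) ;;
        *) echo "unknown key: $key" >&2; exit 1 ;;
    esac
    case $regions in
        # Only accept plain decimal digits for the region count.
        ''|*[!0-9]*) echo "#regions must be a number" >&2; exit 1 ;;
    esac
    echo "dmsetup message $dev 0 $key $regions"
)

# Example: raise the threshold and batch size for a device named "clone".
hydration_msg_cmd clone hydration_threshold 256
hydration_msg_cmd clone hydration_batch_size 16
```
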
When the hydration of the destination device finishes, a dm event will be sent
to user space.

Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written. If no
such requests are made then commits will occur every second. This means the
dm-clone device behaves like a physical disk that has a volatile write cache. If
power is lost you may lose some recent writes. The metadata should always be
consistent in spite of any crash.

Target Interface
================

Constructor
-----------

  ::

   clone <metadata dev> <destination dev> <source dev> <region size>
         [<#feature args> [<feature arg>]* [<#core args> [<core arg>]*]]

 ================ ==============================================================
 metadata dev     Fast device holding the persistent metadata
 destination dev  The destination device, where the source will be cloned
 source dev       Read only device containing the data that gets cloned
 region size      The size of a region in sectors

 #feature args    Number of feature arguments passed
 feature args     no_hydration or no_discard_passdown

 #core args       An even number of arguments corresponding to key/value pairs
                  passed to dm-clone
 core args        Key/value pairs passed to dm-clone, e.g. `hydration_threshold
                  256`
 ================ ==============================================================

Optional feature arguments are:

 ==================== =========================================================
 no_hydration         Create a dm-clone instance with background hydration
                      disabled
 no_discard_passdown  Disable passing down discards to the destination device
 ==================== =========================================================

Optional core arguments are:

 ================================ ==============================================
 hydration_threshold <#regions>   Maximum number of regions being copied from
                                  the source to the destination device at any
                                  one time, during background hydration.
 hydration_batch_size <#regions>  During background hydration, try to batch
                                  together contiguous regions, so we copy data
                                  from the source to the destination device in
                                  batches of this many regions.
 ================================ ==============================================

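To illustrate how the counts in the constructor line fit together, here is a
minimal sketch that assembles a table line from its parts. The helper name
``clone_table`` and the device paths are hypothetical. The point is the
counting: ``<#feature args>`` must be present (possibly as ``0``) whenever core
arguments follow, and ``<#core args>`` counts individual words, so it is always
even.

```shell
#!/bin/sh
# Assembles a dm-clone table line; illustrative helper, not part of dmsetup.
#
# clone_table <length> <metadata dev> <dest dev> <source dev> <region size> \
#             "<feature args, space-separated, may be empty>" \
#             "<core args, space-separated key/value pairs, may be empty>"
clone_table() (
    length=$1 meta=$2 dest=$3 src=$4 region=$5 features=$6 core=$7
    line="0 $length clone $meta $dest $src $region"
    set -- $features          # word-split the feature list to count it
    nr_features=$#
    set -- $core              # word-split the core args to count them
    nr_core=$#
    if [ $(( nr_core % 2 )) -ne 0 ]; then
        echo "core args must be key/value pairs" >&2
        exit 1
    fi
    # The feature count is needed if there are features or core args.
    if [ "$nr_features" -gt 0 ] || [ "$nr_core" -gt 0 ]; then
        line="$line $nr_features${features:+ $features}"
    fi
    if [ "$nr_core" -gt 0 ]; then
        line="$line $nr_core $core"
    fi
    echo "$line"
)

# Hypothetical device paths, for illustration only.
clone_table 1048576000 /dev/meta /dev/dest /dev/src 8 \
    "no_hydration" "hydration_threshold 256"
```
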
Status
------

  ::

   <metadata block size> <#used metadata blocks>/<#total metadata blocks>
   <region size> <#hydrated regions>/<#total regions> <#hydrating regions>
   <#feature args> <feature args>* <#core args> <core args>*
   <clone metadata mode>

 ======================= =======================================================
 metadata block size     Fixed block size for each metadata block in sectors
 #used metadata blocks   Number of metadata blocks used
 #total metadata blocks  Total number of metadata blocks
 region size             Configurable region size for the device in sectors
 #hydrated regions       Number of regions that have finished hydrating
 #total regions          Total number of regions to hydrate
 #hydrating regions      Number of regions currently hydrating
 #feature args           Number of feature arguments to follow
 feature args            Feature arguments, e.g. `no_hydration`
 #core args              Even number of core arguments to follow
 core args               Key/value pairs for tuning the core, e.g.
                         `hydration_threshold 256`
 clone metadata mode     ro if read-only, rw if read-write

                         In serious cases where even a read-only mode is deemed
                         unsafe no further I/O will be permitted and the status
                         will just contain the string 'Fail'. If the metadata
                         mode changes, a dm event will be sent to user space.
 ======================= =======================================================

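The status fields above are enough to track hydration progress from a script.
The sketch below parses the target-specific part of a status line, i.e., the
fields starting at ``<metadata block size>``. The helper names and the sample
line are made up for illustration. We also assume that ``dmsetup status``
prefixes each line with the start sector, length and target name, so those
three fields would need to be stripped first (e.g., with ``cut -d' ' -f4-``).

```shell
#!/bin/sh
# Illustrative status parsing helpers; not part of dmsetup.

# hydration_progress "<status fields>"
# Prints "<hydrated> <total> <percent>" taken from the fourth status field,
# <#hydrated regions>/<#total regions>. Fails on a 'Fail' or short status.
hydration_progress() (
    set -- $1                       # split the status into fields
    [ $# -ge 5 ] || { echo "unexpected status" >&2; exit 1; }
    hydrated=${4%/*}                # text before the slash
    total=${4#*/}                   # text after the slash
    [ "$total" -gt 0 ] || exit 1
    echo "$hydrated $total $(( hydrated * 100 / total ))"
)

# hydration_done "<status fields>"
# Succeeds when every region is hydrated and none is still being copied.
hydration_done() (
    set -- $1
    [ $# -ge 5 ] || exit 1
    [ "${4%/*}" -eq "${4#*/}" ] && [ "$5" -eq 0 ]
)

# Made-up sample status, for illustration only.
sample="8 72/1024 8 65536000/131072000 1 0 4 hydration_threshold 1 hydration_batch_size 1 rw"
hydration_progress "$sample"
```
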
Messages
--------

  `disable_hydration`
      Disable the background hydration of the destination device.

  `enable_hydration`
      Enable the background hydration of the destination device.

  `hydration_threshold <#regions>`
      Set background hydration threshold.

  `hydration_batch_size <#regions>`
      Set background hydration batch size.

Examples
========

Clone a device containing a file system
---------------------------------------

1. Create the dm-clone device.

   ::

    dmsetup create clone --table "0 1048576000 clone $metadata_dev $dest_dev \
      $source_dev 8 1 no_hydration"

2. Mount the device and trim the file system. dm-clone interprets the discards
   sent by the file system and it will not hydrate the unused space.

   ::

    mount /dev/mapper/clone /mnt/cloned-fs
    fstrim /mnt/cloned-fs

3. Enable background hydration of the destination device.

   ::

    dmsetup message clone 0 enable_hydration

4. When the hydration finishes, we can replace the dm-clone table with a linear
   table.

   ::

    dmsetup suspend clone
    dmsetup load clone --table "0 1048576000 linear $dest_dev 0"
    dmsetup resume clone

   The metadata device is no longer needed and can be safely discarded or reused
   for other purposes.

Known issues
============

1. We redirect reads to not-yet-hydrated regions to the source device. If
   reading the source device has high latency and the user repeatedly reads from
   the same regions, this behaviour could degrade performance. We should use
   these reads as hints to hydrate the relevant regions sooner. Currently, we
   rely on the page cache to cache these regions, so we hopefully don't end up
   reading them multiple times from the source device.

2. We should release in-core resources, i.e., the bitmaps tracking which
   regions are hydrated, after the hydration has finished.

3. During background hydration, if we fail to read the source or write to the
   destination device, we print an error message, but the hydration process
   continues indefinitely, until it succeeds. We should stop the background
   hydration after a number of failures and emit a dm event for user space to
   notice.

Why not...?
===========

We explored the following alternatives before implementing dm-clone:

1. Use dm-cache with cache size equal to the source device and implement a new
   cloning policy:

   * The resulting cache device is not a one-to-one mirror of the source device
     and thus we cannot remove the cache device once cloning completes.

   * dm-cache writes to the source device, which violates our requirement that
     the source device must be treated as read-only.

   * Caching is semantically different from cloning.

2. Use dm-snapshot with a COW device equal to the source device:

   * dm-snapshot stores its metadata in the COW device, so the resulting device
     is not a one-to-one mirror of the source device.

   * No background copying mechanism.

   * dm-snapshot needs to commit its metadata whenever a pending exception
     completes, to ensure snapshot consistency. In the case of cloning, we don't
     need to be so strict and can rely on committing metadata every time a FLUSH
     or FUA bio is written, or periodically, like dm-thin and dm-cache do. This
     improves the performance significantly.

3. Use dm-mirror: The mirror target has a background copying/mirroring
   mechanism, but it writes to all mirrors, thus violating our requirement that
   the source device must be treated as read-only.

4. Use dm-thin's external snapshot functionality. This approach is the most
   promising among all alternatives, as the thinly-provisioned volume is a
   one-to-one mirror of the source device and handles reads and writes to
   un-provisioned/not-yet-cloned areas the same way as dm-clone does.

   Still:

   * There is no background copying mechanism, though one could be implemented.

   * Most importantly, we want to support arbitrary block devices as the
     destination of the cloning process and not restrict ourselves to
     thinly-provisioned volumes. Thin-provisioning has an inherent metadata
     overhead, for maintaining the thin volume mappings, which significantly
     degrades performance.

   Moreover, cloning a device shouldn't force the use of thin-provisioning. On
   the other hand, if we wish to use thin provisioning, we can just use a thin
   LV as dm-clone's destination device.