===============
Persistent data
===============

Introduction
============

The more-sophisticated device-mapper targets require complex metadata
that is managed in the kernel.  In late 2010 we were seeing various
targets roll their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design of Daniel Phillips

Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.

The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets.  It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files, which can all be found
under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks.  A
read/write locking interface prevents conflicting concurrent accesses
and keeps data that is in use in the cache.

Clients of persistent-data are unlikely to use this directly.
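
To make the locking rule concrete, here is a minimal userspace sketch of
per-block read/write locking.  It is not the kernel interface: every
name in it (``toy_bm``, ``toy_read_lock`` and so on) is hypothetical, the
block size is arbitrary, and a conflicting request simply fails instead
of waiting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 64	/* hypothetical; chosen only for the sketch */
#define TOY_NR_BLOCKS  4

/* One cached block plus its lock state. */
struct toy_block {
	uint8_t data[TOY_BLOCK_SIZE];
	int readers;	/* number of read locks currently held */
	int writer;	/* 1 while a write lock is held */
};

struct toy_bm {
	struct toy_block blocks[TOY_NR_BLOCKS];
};

static void toy_bm_init(struct toy_bm *bm)
{
	memset(bm, 0, sizeof(*bm));
}

/* Shared lock: allowed unless a writer holds the block.  Returns 0 or -1. */
static int toy_read_lock(struct toy_bm *bm, unsigned b, const uint8_t **out)
{
	if (b >= TOY_NR_BLOCKS || bm->blocks[b].writer)
		return -1;
	bm->blocks[b].readers++;
	*out = bm->blocks[b].data;
	return 0;
}

/* Exclusive lock: allowed only when nobody else holds the block. */
static int toy_write_lock(struct toy_bm *bm, unsigned b, uint8_t **out)
{
	if (b >= TOY_NR_BLOCKS || bm->blocks[b].writer || bm->blocks[b].readers)
		return -1;
	bm->blocks[b].writer = 1;
	*out = bm->blocks[b].data;
	return 0;
}

/* Drop one lock held on block b (the writer's, or one reader's). */
static void toy_unlock(struct toy_bm *bm, unsigned b)
{
	if (b >= TOY_NR_BLOCKS)
		return;
	if (bm->blocks[b].writer)
		bm->blocks[b].writer = 0;
	else if (bm->blocks[b].readers > 0)
		bm->blocks[b].readers--;
}
```

In this sketch any number of readers may share a block, while a writer
needs it exclusively.  A block that is still locked is, by construction,
one the cache must not discard.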

The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one.  Shadowing a block that was
already shadowed within the same transaction is elided, so performance
is reasonable.  The commit method ensures that all data is flushed
before it writes the superblock.  On power failure your metadata will
be as it was when last committed.
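
The shadow/commit cycle described above can be sketched in userspace
roughly as follows.  This is a toy model, not the kernel code: all names
(``toy_tm``, ``toy_tm_shadow``, ``toy_tm_crash`` and so on) are
hypothetical, a single 32-bit word stands in for a whole block, and
freeing the old root on commit assumes nothing else references it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_BLOCKS 8

struct toy_tm {
	uint32_t data[TOY_NR_BLOCKS];	/* one word stands in for a block */
	uint8_t in_use[TOY_NR_BLOCKS];
	uint8_t fresh[TOY_NR_BLOCKS];	/* created in the current transaction */
	unsigned committed_root;	/* what the "superblock" points at */
};

static void toy_tm_init(struct toy_tm *tm, uint32_t root_value)
{
	memset(tm, 0, sizeof(*tm));
	tm->data[0] = root_value;
	tm->in_use[0] = 1;
	tm->committed_root = 0;
}

/* Allocate an unused block; returns its index, or -1 if none is left. */
static int toy_tm_new_block(struct toy_tm *tm)
{
	for (unsigned b = 0; b < TOY_NR_BLOCKS; b++) {
		if (!tm->in_use[b]) {
			tm->in_use[b] = 1;
			tm->fresh[b] = 1;
			return (int)b;
		}
	}
	return -1;
}

/* Return the index of a writable version of block orig (copy-on-write). */
static int toy_tm_shadow(struct toy_tm *tm, unsigned orig)
{
	int copy;

	if (tm->fresh[orig])
		return (int)orig;	/* already ours: the copy is elided */

	copy = toy_tm_new_block(tm);
	if (copy >= 0)
		tm->data[copy] = tm->data[orig];
	return copy;
}

/* Commit: a real implementation flushes all data before this point. */
static void toy_tm_commit(struct toy_tm *tm, unsigned new_root)
{
	if (new_root != tm->committed_root)
		tm->in_use[tm->committed_root] = 0;	/* toy simplification */
	tm->committed_root = new_root;
	memset(tm->fresh, 0, sizeof(tm->fresh));
}

/* Simulated power failure: everything since the last commit is lost. */
static void toy_tm_crash(struct toy_tm *tm)
{
	for (unsigned b = 0; b < TOY_NR_BLOCKS; b++) {
		if (tm->fresh[b]) {
			tm->in_use[b] = 0;
			tm->fresh[b] = 0;
		}
	}
}
```

Because committed blocks are never modified in place, a crash before the
commit leaves the old root and everything reachable from it intact.
Only blocks created during the unfinished transaction are discarded.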

The space maps
--------------

- dm-space-map.h
- dm-space-map-metadata.[hc]
- dm-space-map-disk.[hc]

These are on-disk data structures that keep track of the reference
counts of blocks.  They also act as the allocator of new blocks.  There
are currently two implementations: a simpler one for managing blocks on
a different device (e.g. thinly-provisioned data blocks), and one for
managing the metadata space.  The latter is complicated by the need to
store its own data within the space it's managing.
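
The pairing of reference counting with allocation can be illustrated
with a small in-memory sketch.  It is only an assumption-laden model:
the names (``toy_sm``, ``toy_sm_new_block`` and so on) are hypothetical,
the counts live in a plain array rather than on disk, and the
self-hosting problem of the metadata space map is not modelled at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SM_BLOCKS 4

/* In-memory stand-in for a space map: one reference count per block. */
struct toy_sm {
	uint32_t count[TOY_SM_BLOCKS];
};

static void toy_sm_init(struct toy_sm *sm)
{
	memset(sm, 0, sizeof(*sm));
}

/* Allocation: pick any block whose count is zero and give it count 1. */
static int toy_sm_new_block(struct toy_sm *sm, uint64_t *b)
{
	for (uint64_t i = 0; i < TOY_SM_BLOCKS; i++) {
		if (sm->count[i] == 0) {
			sm->count[i] = 1;
			*b = i;
			return 0;
		}
	}
	return -1;	/* no space left */
}

/* Add a reference to an already allocated block, e.g. when it is shared. */
static int toy_sm_inc(struct toy_sm *sm, uint64_t b)
{
	if (b >= TOY_SM_BLOCKS || sm->count[b] == 0)
		return -1;
	sm->count[b]++;
	return 0;
}

/* Drop a reference; a block whose count reaches zero is free again. */
static int toy_sm_dec(struct toy_sm *sm, uint64_t b)
{
	if (b >= TOY_SM_BLOCKS || sm->count[b] == 0)
		return -1;
	sm->count[b]--;
	return 0;
}

/* Number of blocks that the allocator could still hand out. */
static uint64_t toy_sm_nr_free(const struct toy_sm *sm)
{
	uint64_t n = 0;

	for (uint64_t i = 0; i < TOY_SM_BLOCKS; i++)
		if (sm->count[i] == 0)
			n++;
	return n;
}
```

A block shared by two owners has count 2 and only returns to the free
pool once both have dropped their reference.  That is why the same
structure can serve as the allocator.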

The data structures
-------------------

- dm-btree.[hc]
- dm-btree-remove.c
- dm-btree-spine.c
- dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more.  For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, so that a lookup takes multiple keys.  For example,
the thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.

Values stored in the btrees can have arbitrary size.  Keys are always
64 bits, although nesting allows you to use multiple keys.
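
The idea of nesting can be sketched as follows: a lookup consumes one
64-bit key per level, and the value found at every level but the last
is the root of the next tree.  This is a userspace illustration only.
All names (``toy_store``, ``toy_lookup``, ``toy_insert``) are
hypothetical, values are fixed at 64 bits for brevity, and a small
unsorted array stands in for each btree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 8
#define TOY_MAX_MAPS    4

struct toy_entry {
	uint64_t key;
	uint64_t value;
};

/* Stand-in for a single btree: a small unsorted array of pairs. */
struct toy_map {
	struct toy_entry e[TOY_MAX_ENTRIES];
	size_t n;
};

/* All "trees" live here; a tree's root is its index in maps[]. */
struct toy_store {
	struct toy_map maps[TOY_MAX_MAPS];
	size_t nr_maps;
};

static int toy_map_lookup(const struct toy_map *m, uint64_t key, uint64_t *value)
{
	for (size_t i = 0; i < m->n; i++) {
		if (m->e[i].key == key) {
			*value = m->e[i].value;
			return 0;
		}
	}
	return -1;
}

static int toy_map_insert(struct toy_map *m, uint64_t key, uint64_t value)
{
	for (size_t i = 0; i < m->n; i++) {
		if (m->e[i].key == key) {
			m->e[i].value = value;	/* overwrite existing key */
			return 0;
		}
	}
	if (m->n == TOY_MAX_ENTRIES)
		return -1;
	m->e[m->n].key = key;
	m->e[m->n].value = value;
	m->n++;
	return 0;
}

/* Walk one level per key; intermediate values are roots of nested maps. */
static int toy_lookup(const struct toy_store *s, uint64_t root,
		      unsigned levels, const uint64_t *keys, uint64_t *value)
{
	uint64_t v = root;

	for (unsigned l = 0; l < levels; l++) {
		if (v >= s->nr_maps)
			return -1;
		if (toy_map_lookup(&s->maps[v], keys[l], &v))
			return -1;
	}
	*value = v;
	return 0;
}

/* Insert under keys[0..levels-1], creating nested maps on demand. */
static int toy_insert(struct toy_store *s, uint64_t root, unsigned levels,
		      const uint64_t *keys, uint64_t value)
{
	uint64_t cur = root;

	if (levels == 0)
		return -1;
	for (unsigned l = 0; l + 1 < levels; l++) {
		uint64_t next;

		if (cur >= s->nr_maps)
			return -1;
		if (toy_map_lookup(&s->maps[cur], keys[l], &next)) {
			if (s->nr_maps == TOY_MAX_MAPS)
				return -1;
			next = s->nr_maps++;
			s->maps[next].n = 0;
			if (toy_map_insert(&s->maps[cur], keys[l], next))
				return -1;
		}
		cur = next;
	}
	if (cur >= s->nr_maps)
		return -1;
	return toy_map_insert(&s->maps[cur], keys[levels - 1], value);
}
```

With two levels, ``keys[0]`` plays the role of the device id and
``keys[1]`` the virtual block, mirroring the thin-provisioning layout
described above.  The final value is the physical block.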