===============
Persistent data
===============

Introduction
============

The more-sophisticated device-mapper targets require complex metadata
that is managed in the kernel.  In late 2010 we were seeing that various
different targets were rolling their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design of Daniel Phillips

Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.

The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets.  It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files, all of which can be
found under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks.  There
is a read/write locking interface to prevent concurrent accesses, and
to keep data that is being used in the cache.
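
The locking behaviour described above can be illustrated with a small
user-space sketch.  It is not the dm-block-manager interface: every
name here (``toy_block_manager``, ``toy_read_lock`` and so on) and both
size constants are hypothetical, and real locks would sleep rather than
fail.  It only shows the two rules: many readers or one writer per
block, and a block that is held cannot leave the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096	/* hypothetical fixed block size */
#define NR_BLOCKS  8	/* hypothetical tiny "device" */

/* One cached block plus its lock state. */
struct toy_block {
	uint8_t data[BLOCK_SIZE];
	unsigned readers;	/* number of read locks currently held */
	bool writer;		/* true while a write lock is held */
};

struct toy_block_manager {
	struct toy_block blocks[NR_BLOCKS];
};

/* Take a shared lock; fails (returns NULL) if the block is write locked. */
static struct toy_block *toy_read_lock(struct toy_block_manager *bm, unsigned b)
{
	struct toy_block *blk;

	if (b >= NR_BLOCKS)
		return NULL;
	blk = &bm->blocks[b];
	if (blk->writer)
		return NULL;
	blk->readers++;
	return blk;
}

/* Take an exclusive lock; fails if anyone else holds the block. */
static struct toy_block *toy_write_lock(struct toy_block_manager *bm, unsigned b)
{
	struct toy_block *blk;

	if (b >= NR_BLOCKS)
		return NULL;
	blk = &bm->blocks[b];
	if (blk->writer || blk->readers)
		return NULL;
	blk->writer = true;
	return blk;
}

/* Drop one lock, whichever kind the holder took. */
static void toy_unlock(struct toy_block *blk)
{
	if (blk->writer)
		blk->writer = false;
	else if (blk->readers)
		blk->readers--;
}

/* A block may be dropped from the cache only when nobody holds it. */
static bool toy_can_evict(const struct toy_block *blk)
{
	return !blk->writer && blk->readers == 0;
}
```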

Clients of persistent-data are unlikely to use this directly.

The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one.  Shadowing is elided within
the same transaction so performance is reasonable.  The commit method
ensures that all data is flushed before it writes the superblock.
On power failure your metadata will be as it was when last committed.

The Space Maps
--------------

dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]

These are on-disk data structures that keep track of the reference
counts of blocks.  They also act as the allocator of new blocks.
There are currently two implementations: a simpler one for managing
blocks on a different device (e.g. thinly-provisioned data blocks);
and one for managing the metadata space.  The latter is complicated by
the need to store its own data within the space it's managing.
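
How shadowing, reference counts and commit fit together can be sketched
in user-space C.  This is a toy model under stated assumptions, not the
kernel code: the ``toy_`` names, the in-memory arrays standing in for
the disk, and the ``committed_ref`` snapshot used to keep blocks freed
in the current transaction from being reused before commit are all
invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_BLOCKS  16		/* hypothetical device size in blocks */
#define BLOCK_SIZE 64		/* hypothetical block size */
#define NO_BLOCK   UINT32_MAX

struct toy_tm {
	uint8_t  data[NR_BLOCKS][BLOCK_SIZE];
	uint32_t ref[NR_BLOCKS];		/* space map: live reference counts */
	uint32_t committed_ref[NR_BLOCKS];	/* counts as of the last commit */
	bool     fresh[NR_BLOCKS];		/* allocated in this transaction */
	uint32_t committed_root;		/* stand-in for the superblock */
};

/*
 * Space map as allocator: pick a block that is unused now and was also
 * unused at the last commit, so committed data is never overwritten.
 */
static uint32_t toy_sm_new_block(struct toy_tm *tm)
{
	for (uint32_t b = 0; b < NR_BLOCKS; b++) {
		if (tm->ref[b] == 0 && tm->committed_ref[b] == 0) {
			tm->ref[b] = 1;
			tm->fresh[b] = true;
			return b;
		}
	}
	return NO_BLOCK;
}

/* Return a block that is safe to write: a copy, or orig if already private. */
static uint32_t toy_shadow(struct toy_tm *tm, uint32_t orig)
{
	uint32_t copy;

	/* Elided: created in this transaction and not shared. */
	if (tm->fresh[orig] && tm->ref[orig] == 1)
		return orig;

	copy = toy_sm_new_block(tm);
	if (copy == NO_BLOCK)
		return NO_BLOCK;
	memcpy(tm->data[copy], tm->data[orig], BLOCK_SIZE);
	tm->ref[orig]--;	/* the caller's reference moves to the copy */
	return copy;
}

/* Commit: everything else is settled first, the "superblock" goes last. */
static void toy_commit(struct toy_tm *tm, uint32_t new_root)
{
	memcpy(tm->committed_ref, tm->ref, sizeof(tm->ref));
	memset(tm->fresh, 0, sizeof(tm->fresh));
	tm->committed_root = new_root;
}
```

In this model a crash before ``toy_commit()`` leaves ``committed_root``
pointing at blocks that were never modified, which is the property the
transaction manager section describes.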

The data structures
-------------------

dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more.  For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, and take multiple keys.  For example, the
thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.

Values stored in the btrees can have arbitrary size.  Keys are always
64 bits, although nesting allows you to use multiple keys.
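
The two-level arrangement used by thin provisioning can be sketched as
follows.  This is only an illustration of nesting with one 64-bit key
per level: a small sorted array stands in for each btree, and all names
(``toy_tree``, ``toy_pool``, ``toy_lookup`` and so on) are hypothetical
rather than part of dm-btree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8	/* hypothetical capacity of each toy tree */

/* One level: sorted 64-bit keys with 64-bit values (stand-in for a btree). */
struct toy_tree {
	size_t   nr;
	uint64_t keys[MAX_ENTRIES];
	uint64_t values[MAX_ENTRIES];
};

/* Top level maps device id -> index of a nested tree in sub[]. */
struct toy_pool {
	struct toy_tree top;
	struct toy_tree sub[MAX_ENTRIES];
	size_t nr_sub;
};

/* Binary search within a single level; 0 on success, -1 if absent. */
static int toy_lookup1(const struct toy_tree *t, uint64_t key, uint64_t *value)
{
	size_t lo = 0, hi = t->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (t->keys[mid] == key) {
			*value = t->values[mid];
			return 0;
		}
		if (t->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}

/* Insert or overwrite within a single level, keeping keys sorted. */
static int toy_insert1(struct toy_tree *t, uint64_t key, uint64_t value)
{
	size_t i = 0;

	while (i < t->nr && t->keys[i] < key)
		i++;
	if (i < t->nr && t->keys[i] == key) {
		t->values[i] = value;
		return 0;
	}
	if (t->nr == MAX_ENTRIES)
		return -1;
	for (size_t j = t->nr; j > i; j--) {
		t->keys[j] = t->keys[j - 1];
		t->values[j] = t->values[j - 1];
	}
	t->keys[i] = key;
	t->values[i] = value;
	t->nr++;
	return 0;
}

/* keys[0] is the device id, keys[1] the virtual block. */
static int toy_lookup(const struct toy_pool *p, const uint64_t keys[2],
		      uint64_t *value)
{
	uint64_t sub;

	if (toy_lookup1(&p->top, keys[0], &sub))
		return -1;
	return toy_lookup1(&p->sub[sub], keys[1], value);
}

/* Create the nested tree for a device on first use, then insert into it. */
static int toy_insert(struct toy_pool *p, const uint64_t keys[2],
		      uint64_t value)
{
	uint64_t sub;

	if (toy_lookup1(&p->top, keys[0], &sub)) {
		if (p->nr_sub == MAX_ENTRIES)
			return -1;
		sub = p->nr_sub;
		if (toy_insert1(&p->top, keys[0], sub))
			return -1;
		p->sub[sub].nr = 0;
		p->nr_sub++;
	}
	return toy_insert1(&p->sub[sub], keys[1], value);
}
```

The same virtual block number can appear under different device ids
without conflict, because each device id selects its own nested tree.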