=============================
Guidance for writing policies
=============================

Try to keep transactionality out of it.  The core is careful to
avoid asking about anything that is migrating.  This is a pain, but
makes it easier to write the policies.

Mappings are loaded into the policy at construction time.

Every bio that is mapped by the target is referred to the policy.
The policy can return a simple HIT or MISS or issue a migration.

Currently there's no way for the policy to issue background work,
e.g. to start writing back dirty blocks that are going to be evicted
soon.

Because we map bios, rather than requests, it's easy for the policy
to get fooled by many small bios.  For this reason the core target
issues periodic ticks to the policy.  It's suggested that the policy
doesn't update states (e.g. hit counts) for a block more than once
per tick.  The core generates ticks by watching bios complete, and so
tries to observe when the io scheduler has let the ios run.


Overview of supplied cache replacement policies
===============================================

multiqueue (mq)
---------------

This policy is now an alias for smq (see below).

The following tunables are accepted, but have no effect::

    'sequential_threshold <#nr_sequential_ios>'
    'random_threshold <#nr_random_ios>'
    'read_promote_adjustment <value>'
    'write_promote_adjustment <value>'
    'discard_promote_adjustment <value>'

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

Compared with mq, the smq policy offers lower memory usage, improved
performance and increased adaptability in the face of changing
workloads.  smq also has no cumbersome tuning knobs.

Users may switch from "mq" to "smq" simply by reloading a DM table
that is using the cache target.  Doing so will cause all of the mq
policy's hints to be dropped.  Also, performance of the cache may
degrade slightly until smq has recalculated which of the origin
device's hotspots should be cached.

Memory usage
^^^^^^^^^^^^

The mq policy used a lot of memory: 88 bytes per cache block on a
64-bit machine.

smq uses 28-bit indexes to implement its data structures rather than
pointers.  It avoids storing an explicit hit count for each block.
It has a 'hotspot' queue, rather than a pre-cache, which uses a
quarter of the entries (each hotspot block covers a larger area than a
single cache block).

All this means smq uses ~25 bytes per cache block.  Still a lot of
memory, but a substantial improvement nonetheless.

Level balancing
^^^^^^^^^^^^^^^

mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This meant the bottom
levels generally had the most entries, and the top ones had very few.
Having unbalanced levels like this reduced the efficacy of the
multiqueue.

smq does not maintain a hit count; instead it swaps hit entries with
the least recently used entry from the level above.  The overall
ordering is a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.

Adaptability
^^^^^^^^^^^^

The mq policy maintained a hit count for each cache block.  For a
different block to get promoted to the cache, its hit count had to
exceed the lowest hit count currently in the cache.  This meant it
could take a long time for the cache to adapt between varying IO
patterns.

smq doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition it tracks the performance of the hotspot queue,
which is used to decide which blocks to promote.
If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.

Performance
^^^^^^^^^^^

Testing shows that smq performs substantially better than mq.

cleaner
-------

The cleaner policy writes back all dirty blocks in a cache so that the
cache can be decommissioned.

Examples
========

The syntax for a table is::

    cache <metadata dev> <cache dev> <origin dev> <block size>
    <#feature_args> [<feature arg>]*
    <policy> <#policy_args> [<policy arg>]*

The syntax to send a message using the dmsetup command is::

    dmsetup message <mapped device> 0 sequential_threshold 1024
    dmsetup message <mapped device> 0 random_threshold 8

Using dmsetup::

    dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
        /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"

This creates a 128 GB mapped device named 'blah' with the
sequential_threshold set to 1024 and the random_threshold set to 8.
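As a sanity check on the sizes in the example above: device-mapper
table lengths are expressed in 512-byte sectors, so the 268435456
sectors given in the table line work out to 128 GiB. A quick shell
calculation confirms the figure::

    # dmsetup table lengths are in 512-byte sectors;
    # verify that 268435456 sectors equals 128 GiB.
    sectors=268435456
    bytes=$((sectors * 512))
    gib=$((bytes / 1024 / 1024 / 1024))
    echo "${gib} GiB"    # prints "128 GiB"

The same arithmetic is useful when writing your own table lines: take
the desired cache size in bytes and divide by 512 to get the sector
count for the table.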