=============================
Guidance for writing policies
=============================

Try to keep transactionality out of it.  The core is careful to
avoid asking about anything that is migrating.  This is a pain, but
makes it easier to write the policies.

Mappings are loaded into the policy at construction time.

Every bio that is mapped by the target is referred to the policy.
The policy can return a simple HIT or MISS or issue a migration.

Currently there's no way for the policy to issue background work,
e.g. to start writing back dirty blocks that are going to be evicted
soon.

Because we map bios, rather than requests, it's easy for the policy
to get fooled by many small bios.  For this reason the core target
issues periodic ticks to the policy.  It's suggested that the policy
doesn't update states (e.g. hit counts) for a block more than once
per tick.  The core ticks by watching bios complete, and so tries to
see when the io scheduler has let the ios run.


Overview of supplied cache replacement policies
===============================================

multiqueue (mq)
---------------

This policy is now an alias for smq (see below).

The following tunables are accepted, but have no effect::

	'sequential_threshold <#nr_sequential_ios>'
	'random_threshold <#nr_random_ios>'
	'read_promote_adjustment <value>'
	'write_promote_adjustment <value>'
	'discard_promote_adjustment <value>'

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

Compared with mq, smq uses less memory, performs better, and adapts
more readily to changing workloads.  It also has no cumbersome tuning
knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target.  Doing so will cause all of the
mq policy's hints to be dropped.  Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.
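The switch can be scripted by rewriting the policy field of the
current table and reloading it.  A minimal sketch, assuming a
hypothetical device named 'my-cache' and an illustrative table (the
device numbers and feature arguments are examples, not real values):

```shell
# Hypothetical example: derive an smq table from an existing mq table
# by rewriting only the policy field, then reload it.
old_table="0 268435456 cache 253:0 253:1 253:2 512 1 writeback mq 0"
new_table=$(printf '%s\n' "$old_table" | sed 's/ mq / smq /')
printf '%s\n' "$new_table"

# Against a live device (as root) the sequence would then be:
#   dmsetup reload my-cache --table "$new_table"
#   dmsetup suspend my-cache
#   dmsetup resume my-cache
```

The suspend/resume pair activates the newly loaded table; at that
point the mq hints are dropped, as described above.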

Memory usage
^^^^^^^^^^^^

The mq policy used a lot of memory: 88 bytes per cache block on a
64-bit machine.

smq uses 28-bit indexes to implement its data structures rather than
pointers.  It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All this means smq uses ~25 bytes per cache block.  Still a lot of
memory, but a substantial improvement nonetheless.
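As a back-of-envelope illustration of the difference (the 1 TiB cache
size and 256 KiB block size below are hypothetical; the per-block
costs are the approximate figures quoted above):

```shell
# Rough metadata sizing for a hypothetical 1 TiB cache device with
# 256 KiB cache blocks, using ~88 bytes/block (mq) vs ~25 bytes/block
# (smq) as quoted above.
cache_bytes=$(( 1 << 40 ))            # 1 TiB cache device
block_bytes=$(( 256 * 1024 ))         # 256 KiB cache block size
nr_blocks=$(( cache_bytes / block_bytes ))
mq_mib=$(( nr_blocks * 88 >> 20 ))    # old mq policy
smq_mib=$(( nr_blocks * 25 >> 20 ))   # smq policy
echo "blocks=$nr_blocks mq=${mq_mib}MiB smq=${smq_mib}MiB"
# → blocks=4194304 mq=352MiB smq=100MiB
```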

Level balancing
^^^^^^^^^^^^^^^

mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This meant the bottom
levels generally had the most entries, and the top ones had very
few.  Having unbalanced levels like this reduced the efficacy of the
multiqueue.

smq does not maintain a hit count; instead it swaps hit entries with
the least recently used entry from the level above.  The overall
ordering is a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.

Adaptability
^^^^^^^^^^^^

The mq policy maintained a hit count for each cache block.  For a
new block to get promoted to the cache its hit count had to exceed
the lowest count currently in the cache.  This meant it could take a
long time for the cache to adapt between varying IO patterns.

smq doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote.  If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.

Performance
^^^^^^^^^^^

Testing smq shows substantially better performance than mq.

cleaner
-------

The cleaner writes back all dirty blocks in a cache to decommission it.
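A decommission sequence might look like the following sketch.  The
device name 'my-cache', the backing devices, and the table geometry
are all hypothetical examples; the status field layout is documented
in cache.rst.

```shell
# Hypothetical sketch: retire a cache by loading the cleaner policy,
# letting it write everything back, then mapping straight to the
# origin.  'my-cache' and the table values are illustrative only.
dmsetup reload my-cache --table \
    "0 268435456 cache /dev/sdb /dev/sdc /dev/sdd 512 0 cleaner 0"
dmsetup suspend my-cache
dmsetup resume my-cache

# Poll 'dmsetup status my-cache' until the dirty-block count reaches
# zero (see cache.rst for the status field layout), then replace the
# cache target with a plain linear mapping to the origin device:
dmsetup reload my-cache --table "0 268435456 linear /dev/sdd 0"
dmsetup suspend my-cache
dmsetup resume my-cache
```

These commands require root and real block devices, so treat them as
a template rather than something to paste verbatim.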

Examples
========

The syntax for a table is::

	cache <metadata dev> <cache dev> <origin dev> <block size>
	<#feature_args> [<feature arg>]*
	<policy> <#policy_args> [<policy arg>]*

The syntax to send a message using the dmsetup command is::

	dmsetup message <mapped device> 0 sequential_threshold 1024
	dmsetup message <mapped device> 0 random_threshold 8

Using dmsetup::

	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"

This creates a mapped device named 'blah' that is 268435456 sectors
(128 GiB) in size, with sequential_threshold set to 1024 and
random_threshold set to 8.
132