=============================
Guidance for writing policies
=============================

Try to keep transactionality out of it.  The core is careful to
avoid asking about anything that is migrating.  This is a pain, but
makes it easier to write the policies.

Mappings are loaded into the policy at construction time.

Every bio that is mapped by the target is referred to the policy.
The policy can return a simple HIT or MISS or issue a migration.

Currently there's no way for the policy to issue background work,
e.g. to start writing back dirty blocks that are going to be evicted
soon.

Because we map bios, rather than requests, it's easy for the policy
to get fooled by many small bios.  For this reason the core target
issues periodic ticks to the policy.  It's suggested that the policy
doesn't update states (e.g. hit counts) for a block more than once
per tick.  The core ticks by watching bios complete, and so tries to
see when the io scheduler has let the ios run.


Overview of supplied cache replacement policies
===============================================

multiqueue (mq)
---------------

This policy is now an alias for smq (see below).

The following tunables are accepted, but have no effect::

	'sequential_threshold <#nr_sequential_ios>'
	'random_threshold <#nr_random_ios>'
	'read_promote_adjustment <value>'
	'write_promote_adjustment <value>'
	'discard_promote_adjustment <value>'

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

Compared with mq, smq uses less memory, performs better, and adapts
more readily to changing workloads.  It also has no cumbersome tuning
knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target.  Doing so will cause all of the
mq policy's hints to be dropped.  Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.
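The switch can be scripted by rewriting the policy field of the
current table and reloading it.  A minimal sketch, assuming a
hypothetical device named 'my-cache' and an illustrative table (the
device numbers and feature arguments are examples, not real values):

```shell
# Hypothetical example: derive an smq table from an existing mq table
# by rewriting only the policy field, then reload it.
old_table="0 268435456 cache 253:0 253:1 253:2 512 1 writeback mq 0"
new_table=$(printf '%s\n' "$old_table" | sed 's/ mq / smq /')
printf '%s\n' "$new_table"

# Against a live device (as root) the sequence would then be:
#   dmsetup reload my-cache --table "$new_table"
#   dmsetup suspend my-cache
#   dmsetup resume my-cache
```

The suspend/resume pair activates the newly loaded table; at that
point the mq hints are dropped, as described above.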

Memory usage
^^^^^^^^^^^^

The mq policy used a lot of memory: 88 bytes per cache block on a
64-bit machine.

smq uses 28-bit indexes to implement its data structures rather than
pointers.  It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All this means smq uses ~25 bytes per cache block.  Still a lot of
memory, but a substantial improvement nonetheless.
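As a back-of-envelope illustration of the difference (the 1 TiB cache
size and 256 KiB block size below are hypothetical; the per-block
costs are the approximate figures quoted above):

```shell
# Rough metadata sizing for a hypothetical 1 TiB cache device with
# 256 KiB cache blocks, using ~88 bytes/block (mq) vs ~25 bytes/block
# (smq) as quoted above.
cache_bytes=$(( 1 << 40 ))            # 1 TiB cache device
block_bytes=$(( 256 * 1024 ))         # 256 KiB cache block size
nr_blocks=$(( cache_bytes / block_bytes ))
mq_mib=$(( nr_blocks * 88 >> 20 ))    # old mq policy
smq_mib=$(( nr_blocks * 25 >> 20 ))   # smq policy
echo "blocks=$nr_blocks mq=${mq_mib}MiB smq=${smq_mib}MiB"
# → blocks=4194304 mq=352MiB smq=100MiB
```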

Level balancing
^^^^^^^^^^^^^^^

mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This meant the bottom
levels generally had the most entries, and the top ones had very
few.  Having unbalanced levels like this reduced the efficacy of the
multiqueue.

smq does not maintain a hit count; instead it swaps hit entries with
the least recently used entry from the level above.  The overall
ordering is a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.

Adaptability
^^^^^^^^^^^^

The mq policy maintained a hit count for each cache block.  For a
new block to get promoted to the cache its hit count had to exceed
the lowest count currently in the cache.  This meant it could take a
long time for the cache to adapt between varying IO patterns.

smq doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote.  If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.

Performance
^^^^^^^^^^^

Testing smq shows substantially better performance than mq.

cleaner
-------

The cleaner writes back all dirty blocks in a cache to decommission it.
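A decommission sequence might look like the following sketch.  The
device name 'my-cache', the backing devices, and the table geometry
are all hypothetical examples; the status field layout is documented
in cache.rst.

```shell
# Hypothetical sketch: retire a cache by loading the cleaner policy,
# letting it write everything back, then mapping straight to the
# origin.  'my-cache' and the table values are illustrative only.
dmsetup reload my-cache --table \
    "0 268435456 cache /dev/sdb /dev/sdc /dev/sdd 512 0 cleaner 0"
dmsetup suspend my-cache
dmsetup resume my-cache

# Poll 'dmsetup status my-cache' until the dirty-block count reaches
# zero (see cache.rst for the status field layout), then replace the
# cache target with a plain linear mapping to the origin device:
dmsetup reload my-cache --table "0 268435456 linear /dev/sdd 0"
dmsetup suspend my-cache
dmsetup resume my-cache
```

These commands require root and real block devices, so treat them as
a template rather than something to paste verbatim.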

Examples
========

The syntax for a table is::

	cache <metadata dev> <cache dev> <origin dev> <block size>
	<#feature_args> [<feature arg>]*
	<policy> <#policy_args> [<policy arg>]*

The syntax to send a message using the dmsetup command is::

	dmsetup message <mapped device> 0 sequential_threshold 1024
	dmsetup message <mapped device> 0 random_threshold 8

Using dmsetup::

	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"

This creates a mapped device named 'blah' that is 268435456 sectors
(128 GiB) in size, with sequential_threshold set to 1024 and
random_threshold set to 8.
132