.. _zswap:

=====
zswap
=====

Overview
========

Zswap is a lightweight compressed cache for swap pages. It takes pages that are
in the process of being swapped out and attempts to compress them into a
dynamically allocated RAM-based memory pool.  zswap basically trades CPU cycles
for potentially reduced swap I/O.  This trade-off can also result in a
significant performance improvement if reads from the compressed cache are
faster than reads from a swap device.

.. note::
   Zswap is a new feature as of v3.11 and interacts heavily with memory
   reclaim.  This interaction has not been fully explored on the large set of
   potential configurations and workloads that exist.  For this reason, zswap
   is a work in progress and should be considered experimental.

Some potential benefits:

* Desktop/laptop users with limited RAM capacities can mitigate the
  performance impact of swapping.
* Overcommitted guests that share a common I/O resource can
  dramatically reduce their swap I/O pressure, avoiding heavy-handed I/O
  throttling by the hypervisor. This allows more work to get done with less
  impact to the guest workload and guests sharing the I/O subsystem.
* Users with SSDs as swap devices can extend the life of the device by
  drastically reducing life-shortening writes.

Zswap evicts pages from compressed cache on an LRU basis to the backing swap
device when the compressed pool reaches its size limit.  This requirement had
been identified in prior community discussions.

Whether zswap is enabled at boot time depends on whether the
``CONFIG_ZSWAP_DEFAULT_ON`` Kconfig option is enabled.
This setting can be overridden with the ``zswap.enabled=`` kernel command line
option, for example ``zswap.enabled=0``.
Zswap can also be enabled and disabled at runtime using the sysfs interface.
An example command to enable zswap at runtime, assuming sysfs is mounted
at ``/sys``, is::

	echo 1 > /sys/module/zswap/parameters/enabled

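To check the current state before changing it, the parameters can be read
back from the same directory.  The following is a minimal sketch, assuming
sysfs is mounted at ``/sys``; the ``zswap_params`` helper and its optional
directory argument are hypothetical and not part of the kernel interface::

	# Print every zswap module parameter as "name: value".
	# zswap_params is an illustrative helper; the optional argument
	# overrides the directory, which defaults to the usual sysfs path.
	zswap_params() {
		dir="${1:-/sys/module/zswap/parameters}"
		for f in "$dir"/*; do
			# Skip anything that is not a regular file, including
			# the unexpanded glob when the directory is missing.
			[ -f "$f" ] || continue
			printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
		done
	}

	zswap_params

Boolean parameters such as ``enabled`` are expected to read back as ``Y`` or
``N`` rather than as the ``1`` or ``0`` that was written.
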
When zswap is disabled at runtime it will stop storing pages that are
being swapped out.  However, it will *not* immediately write out or fault
back into memory all of the pages stored in the compressed pool.  The
pages stored in zswap will remain in the compressed pool until they are
either invalidated or faulted back into memory.  To force all pages out of
the compressed pool, run swapoff on the swap device(s); this faults back
into memory all swapped-out pages, including those in the compressed pool.

Design
======

Zswap receives pages for compression through the Frontswap API and is able to
evict pages from its own compressed pool on an LRU basis and write them back to
the backing swap device in the case that the compressed pool is full.

Zswap makes use of zpool for managing the compressed memory pool.  Each
allocation in zpool is not directly accessible by address.  Rather, a handle is
returned by the allocation routine and that handle must be mapped before being
accessed.  The compressed memory pool grows on demand and shrinks as compressed
pages are freed.  The pool is not preallocated.  By default, a zpool of the
type selected in the ``CONFIG_ZSWAP_ZPOOL_DEFAULT`` Kconfig option is created,
but it can be overridden at boot time by setting the ``zpool`` attribute,
e.g. ``zswap.zpool=zbud``. It can also be changed at runtime using the sysfs
``zpool`` attribute, e.g.::

	echo zbud > /sys/module/zswap/parameters/zpool

The zbud type zpool allocates exactly 1 page to store 2 compressed pages, which
means the compression ratio will always be 2:1 or worse (because of half-full
zbud pages).  The zsmalloc type zpool has a more complex compressed page
storage method, and it can achieve greater storage densities.  However,
zsmalloc does not implement compressed page eviction, so once zswap fills it
cannot evict the oldest page; it can only reject new pages.

When a swap page is passed from frontswap to zswap, zswap maintains a mapping
of the swap entry, a combination of the swap type and swap offset, to the zpool
handle that references that compressed swap page.  This mapping is achieved
with a red-black tree per swap type.  The swap offset is the search key for the
tree nodes.

During a page fault on a PTE that is a swap entry, frontswap calls the zswap
load function to decompress the page into the page allocated by the page fault
handler.

Once there are no PTEs referencing a swap page stored in zswap (i.e. the count
in the swap_map goes to 0) the swap code calls the zswap invalidate function,
via frontswap, to free the compressed entry.

Zswap seeks to be simple in its policies.  Sysfs attributes allow for one
user-controlled policy:

* max_pool_percent - The maximum percentage of memory that the compressed
  pool can occupy.

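As a rough illustration of what the percentage means, the sketch below
converts ``max_pool_percent`` into an approximate size limit using
``MemTotal`` from ``/proc/meminfo``.  The ``zswap_limit_kb`` helper is
hypothetical, and the kernel evaluates the limit internally in pages, so the
result should be treated as an estimate::

	# zswap_limit_kb TOTAL_KB PERCENT
	# Approximate upper bound of the compressed pool, in kB.
	# Illustrative helper, not a kernel interface.
	zswap_limit_kb() {
		echo $(( $1 * $2 / 100 ))
	}

	param=/sys/module/zswap/parameters/max_pool_percent
	# Only query the live system when both sources are readable.
	if [ -r "$param" ] && [ -r /proc/meminfo ]; then
		total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
		zswap_limit_kb "$total_kb" "$(cat "$param")"
	fi

For example, with 8000000 kB of RAM and ``max_pool_percent`` set to 20, the
compressed pool may grow to roughly 1600000 kB.
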
The default compressor is selected in the ``CONFIG_ZSWAP_COMPRESSOR_DEFAULT``
Kconfig option, but it can be overridden at boot time by setting the
``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
It can also be changed at runtime using the sysfs ``compressor``
attribute, e.g.::

	echo lzo > /sys/module/zswap/parameters/compressor

When the zpool and/or compressor parameter is changed at runtime, any existing
compressed pages are not modified; they are left in their own zpool.  When a
request is made for a page in an old zpool, it is uncompressed using its
original compressor.  Once all pages are removed from an old zpool, the zpool
and its compressor are freed.

Some of the pages in zswap are same-value filled pages (i.e. the contents of
the page have the same value or a repetitive pattern). These pages include
zero-filled pages and they are handled differently. During a store operation,
a page is checked to see whether it is a same-value filled page before it is
compressed. If so, the compressed length of the page is set to zero and the
pattern or same-filled value is stored.

The same-value filled page identification feature is enabled by default and
can be disabled at boot time by setting the ``same_filled_pages_enabled``
attribute to 0, e.g. ``zswap.same_filled_pages_enabled=0``. It can also be
enabled and disabled at runtime using the sysfs ``same_filled_pages_enabled``
attribute, e.g.::

	echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled

When zswap same-filled page identification is disabled at runtime, it will stop
checking for same-value filled pages during store operations. However, the
existing pages which are marked as same-value filled pages remain stored
unchanged in zswap until they are either loaded or invalidated.

When zswap is full and there is high pressure on swap, shrinking the pool only
flips pages in and out of the zswap pool, with no real benefit and a
performance drop for the system. To prevent this, a special parameter
implements a sort of hysteresis: once the limit has been hit, zswap refuses to
take pages into the pool until it has sufficient space again. To set the
threshold at which zswap starts accepting pages again after it became full,
use the sysfs ``accept_threshold_percent`` attribute, e.g.::

	echo 80 > /sys/module/zswap/parameters/accept_threshold_percent

Setting this parameter to 100 will disable the hysteresis.

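The threshold is a percentage of the pool limit, not of total memory.  The
sketch below works through the arithmetic, assuming the limit is
``max_pool_percent`` of total RAM; ``zswap_accept_kb`` is a hypothetical
helper, and the kernel's own page-based calculation may round differently::

	# zswap_accept_kb TOTAL_KB MAX_POOL_PERCENT ACCEPT_THRESHOLD_PERCENT
	# Approximate pool size, in kB, below which a zswap that has hit
	# its limit would accept pages again.  Illustrative helper only.
	zswap_accept_kb() {
		limit_kb=$(( $1 * $2 / 100 ))
		echo $(( limit_kb * $3 / 100 ))
	}

	zswap_accept_kb 8000000 20 80

With 8000000 kB of RAM, a pool limit of 20 percent and a threshold of 80,
this gives about 1280000 kB.  With a threshold of 100 the result equals the
pool limit itself, which is why that value disables the hysteresis.
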
A debugfs interface is provided for various statistics about pool size, number
of pages stored, same-value filled pages and various counters for the reasons
pages are rejected.

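As an example of using these statistics, the counters can be dumped and a
rough compression ratio derived from them.  The sketch assumes that debugfs
is mounted at ``/sys/kernel/debug``, that the zswap directory there is named
``zswap``, and that it exposes ``stored_pages`` (a page count) and
``pool_total_size`` (in bytes); these names are assumptions to verify
against the running kernel, and ``zswap_ratio`` is a hypothetical helper::

	# zswap_ratio STORED_PAGES POOL_BYTES [PAGE_SIZE]
	# Ratio of uncompressed to compressed size, to two decimals.
	# PAGE_SIZE defaults to 4096 when not given.
	zswap_ratio() {
		awk -v n="$1" -v b="$2" -v p="${3:-4096}" \
			'BEGIN { if (b > 0) printf "%.2f\n", n * p / b; else print "n/a" }'
	}

	d=/sys/kernel/debug/zswap
	# Reading debugfs normally requires root; skip quietly otherwise.
	if [ -r "$d/stored_pages" ] && [ -r "$d/pool_total_size" ]; then
		grep -r . "$d"
		zswap_ratio "$(cat "$d/stored_pages")" \
			"$(cat "$d/pool_total_size")" "$(getconf PAGESIZE)"
	fi

An empty pool reports ``n/a`` instead of dividing by zero.
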