.. _zswap:

=====
zswap
=====

Overview
========

Zswap is a lightweight compressed cache for swap pages. It takes pages that are
in the process of being swapped out and attempts to compress them into a
dynamically allocated RAM-based memory pool. Zswap trades CPU cycles for
potentially reduced swap I/O. This trade-off can also result in a
significant performance improvement if reads from the compressed cache are
faster than reads from a swap device.

.. note::
   Zswap is a new feature as of v3.11 and interacts heavily with memory
   reclaim. This interaction has not been fully explored on the large set of
   potential configurations and workloads that exist. For this reason, zswap
   is a work in progress and should be considered experimental.

Some potential benefits:

* Desktop/laptop users with limited RAM capacities can mitigate the
  performance impact of swapping.
* Overcommitted guests that share a common I/O resource can
  dramatically reduce their swap I/O pressure, avoiding heavy-handed I/O
  throttling by the hypervisor. This allows more work to get done with less
  impact to the guest workload and guests sharing the I/O subsystem.
* Users with SSDs as swap devices can extend the life of the device by
  drastically reducing life-shortening writes.

Zswap evicts pages from the compressed cache to the backing swap device on an
LRU basis when the compressed pool reaches its size limit. This requirement
had been identified in prior community discussions.

Whether zswap is enabled at boot time depends on whether the
``CONFIG_ZSWAP_DEFAULT_ON`` Kconfig option is enabled. This setting can be
overridden with the ``zswap.enabled=`` kernel command line option, for
example ``zswap.enabled=0``. Zswap can also be enabled and disabled at
runtime using the sysfs interface. An example command to enable zswap at
runtime, assuming sysfs is mounted at ``/sys``, is::

        echo 1 > /sys/module/zswap/parameters/enabled

When zswap is disabled at runtime, it will stop storing pages that are
being swapped out. However, it will *not* immediately write out or fault
back into memory all of the pages stored in the compressed pool. The
pages stored in zswap will remain in the compressed pool until they are
either invalidated or faulted back into memory. To force all pages out of
the compressed pool, run swapoff on the swap device(s); this faults all
swapped-out pages, including those in the compressed pool, back into memory.
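
The runtime semantics above (disabling zswap only stops new stores, while
pages already in the pool stay there until they are invalidated or flushed by
swapoff) can be illustrated with a small toy model. This is a sketch, not
kernel code: the ``toy_zswap`` structure and the ``toy_store()``,
``toy_load()``, ``toy_invalidate()`` and ``toy_swapoff()`` functions are
hypothetical, and a boolean per swap offset stands in for a real compressed
entry.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define TOY_SLOTS 8  /* number of swap offsets in this toy swap device */

    /* Hypothetical toy state; not a kernel structure. */
    struct toy_zswap {
        bool enabled;             /* plays the role of parameters/enabled */
        bool present[TOY_SLOTS];  /* true if offset i holds a compressed page */
    };

    /* Store: refused while disabled, so the page would go to the swap device. */
    static bool toy_store(struct toy_zswap *z, size_t offset)
    {
        assert(offset < TOY_SLOTS);
        if (!z->enabled)
            return false;
        z->present[offset] = true;
        return true;
    }

    /* Load: a page already in the pool is found whether or not zswap is enabled. */
    static bool toy_load(const struct toy_zswap *z, size_t offset)
    {
        assert(offset < TOY_SLOTS);
        return z->present[offset];
    }

    /* Invalidate: drop the entry once nothing references the swap slot. */
    static void toy_invalidate(struct toy_zswap *z, size_t offset)
    {
        assert(offset < TOY_SLOTS);
        z->present[offset] = false;
    }

    /* swapoff: fault every stored page back in; returns how many were pooled. */
    static size_t toy_swapoff(struct toy_zswap *z)
    {
        size_t n = 0;
        for (size_t i = 0; i < TOY_SLOTS; i++) {
            if (z->present[i]) {
                z->present[i] = false;
                n++;
            }
        }
        return n;
    }

For example, after offset 1 has been stored, clearing ``enabled`` makes
``toy_store()`` reject offset 2 while ``toy_load()`` still finds offset 1, and
``toy_swapoff()`` then empties the pool.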

Design
======

Zswap receives pages for compression through the Frontswap API and is able to
evict pages from its own compressed pool on an LRU basis and write them back to
the backing swap device in the case that the compressed pool is full.

Zswap makes use of zpool for managing the compressed memory pool. Allocations
in zpool are not directly accessible by address. Rather, a handle is returned
by the allocation routine and that handle must be mapped before being
accessed. The compressed memory pool grows on demand and shrinks as compressed
pages are freed. The pool is not preallocated. By default, a zpool of the type
selected in the ``CONFIG_ZSWAP_ZPOOL_DEFAULT`` Kconfig option is created, but
it can be overridden at boot time by setting the ``zpool`` attribute, e.g.
``zswap.zpool=zbud``. It can also be changed at runtime using the sysfs
``zpool`` attribute, e.g.::

        echo zbud > /sys/module/zswap/parameters/zpool

The zbud type zpool stores at most two compressed pages in each page it
allocates, which means the compression ratio will always be 2:1 or worse
(because of half-full zbud pages). The zsmalloc type zpool has a more complex
compressed page storage method, and it can achieve greater storage densities.
However, zsmalloc does not implement compressed page eviction, so once the
pool is full, zswap cannot evict the oldest page; it can only reject new pages.
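
The handle-based access pattern described above (allocate, map the handle to
obtain a pointer, unmap when done) can be sketched with a toy fixed-slot
allocator. The ``toy_zpool_*`` names are hypothetical and this is not the
kernel zpool API; real back ends such as zbud and zsmalloc pack
variable-sized compressed objects into pages, which this sketch does not
attempt.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define TOY_POOL_SLOTS 4   /* capacity of the toy pool */
    #define TOY_SLOT_SIZE  64  /* largest object the toy pool can hold */

    /* Hypothetical toy pool; callers only ever see opaque handles. */
    struct toy_zpool {
        unsigned char storage[TOY_POOL_SLOTS][TOY_SLOT_SIZE];
        bool used[TOY_POOL_SLOTS];
        bool mapped[TOY_POOL_SLOTS];
    };

    /* Returns a nonzero handle, or 0 if the pool is full or size is too big. */
    static unsigned long toy_zpool_malloc(struct toy_zpool *p, size_t size)
    {
        if (size > TOY_SLOT_SIZE)
            return 0;
        for (size_t i = 0; i < TOY_POOL_SLOTS; i++) {
            if (!p->used[i]) {
                p->used[i] = true;
                return i + 1;  /* handle 0 is reserved for "no allocation" */
            }
        }
        return 0;
    }

    /* Translate a handle into a usable pointer; must be paired with unmap. */
    static void *toy_zpool_map(struct toy_zpool *p, unsigned long handle)
    {
        size_t i = handle - 1;
        assert(handle != 0 && i < TOY_POOL_SLOTS && p->used[i] && !p->mapped[i]);
        p->mapped[i] = true;
        return p->storage[i];
    }

    static void toy_zpool_unmap(struct toy_zpool *p, unsigned long handle)
    {
        size_t i = handle - 1;
        assert(handle != 0 && i < TOY_POOL_SLOTS && p->mapped[i]);
        p->mapped[i] = false;
    }

    /* Release an allocation; it must not be mapped at this point. */
    static void toy_zpool_free(struct toy_zpool *p, unsigned long handle)
    {
        size_t i = handle - 1;
        assert(handle != 0 && i < TOY_POOL_SLOTS && p->used[i] && !p->mapped[i]);
        p->used[i] = false;
    }

Reserving handle 0 to mean "no allocation" lets callers detect failure without
a separate error code; that convention is a choice of this sketch.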

When a swap page is passed from frontswap to zswap, zswap maintains a mapping
of the swap entry, a combination of the swap type and swap offset, to the zpool
handle that references that compressed swap page. This mapping is achieved
with a red-black tree per swap type. The swap offset is the search key for the
tree nodes.

During a page fault on a PTE that is a swap entry, frontswap calls the zswap
load function to decompress the page into the page allocated by the page fault
handler.

Once there are no PTEs referencing a swap page stored in zswap (i.e. the count
in the ``swap_map`` goes to 0), the swap code calls the zswap invalidate
function, via frontswap, to free the compressed entry.

Zswap seeks to be simple in its policies. Sysfs attributes allow for one
user-controlled policy:

* ``max_pool_percent`` - The maximum percentage of memory that the compressed
  pool can occupy.

The default compressor is selected in the ``CONFIG_ZSWAP_COMPRESSOR_DEFAULT``
Kconfig option, but it can be overridden at boot time by setting the
``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
It can also be changed at runtime using the sysfs ``compressor``
attribute, e.g.::

        echo lzo > /sys/module/zswap/parameters/compressor

When the zpool and/or compressor parameter is changed at runtime, any existing
compressed pages are not modified; they are left in their own zpool. When a
request is made for a page in an old zpool, it is decompressed using its
original compressor. Once all pages are removed from an old zpool, the zpool
and its compressor are freed.

Some of the pages in zswap are same-value filled pages (i.e. the page consists
of the same value or a repeating pattern throughout). These pages, which
include zero-filled pages, are handled differently. During a store operation,
a page is checked for being a same-value filled page before it is compressed.
If it is, the compressed length of the page is set to zero and the pattern or
same-filled value is stored.

The same-value filled page identification feature is enabled by default and
can be disabled at boot time by setting the ``same_filled_pages_enabled``
attribute to 0, e.g. ``zswap.same_filled_pages_enabled=0``. It can also be
enabled and disabled at runtime using the sysfs ``same_filled_pages_enabled``
attribute, e.g.::

        echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled

When same-value filled page identification is disabled at runtime, zswap stops
checking for same-value filled pages during store operations.
However, existing pages that are marked as same-value filled pages remain
stored unchanged in zswap until they are either loaded or invalidated.

To prevent zswap from shrinking the pool when zswap is full and there is high
pressure on swap (which would result in pages flipping in and out of the zswap
pool without any real benefit, but with a performance drop for the system), a
special parameter implements a form of hysteresis: once the limit has been hit,
zswap refuses to take pages into the pool until it has sufficient space again.
To set the threshold at which zswap starts accepting pages again after it
became full, use the sysfs ``accept_threshold_percent`` attribute, e.g.::

        echo 80 > /sys/module/zswap/parameters/accept_threshold_percent

Setting this parameter to 100 will disable the hysteresis.

A debugfs interface is provided for various statistics about pool size, number
of pages stored, same-value filled pages, and various counters for the reasons
pages are rejected.
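
The two store-side checks described above, same-value filled page detection
and the ``max_pool_percent``/``accept_threshold_percent`` hysteresis, can be
modelled as follows. This is a simplified sketch with hypothetical names
(``toy_page_same_filled()``, ``toy_limits``, ``toy_can_accept()``), not the
kernel's code. It assumes the page is compared as an array of
``unsigned long`` words and that the accept threshold is a percentage of the
maximum pool size.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Same-value check: true if every word equals the first one. */
    static bool toy_page_same_filled(const unsigned long *page, size_t words,
                                     unsigned long *value)
    {
        assert(words > 0);
        for (size_t i = 1; i < words; i++)
            if (page[i] != page[0])
                return false;
        *value = page[0];  /* only this one value needs to be kept */
        return true;
    }

    /* Hypothetical pool-limit state for the hysteresis model. */
    struct toy_limits {
        size_t total_ram_pages;                /* RAM size in pages */
        unsigned int max_pool_percent;         /* pool cap, % of RAM */
        unsigned int accept_threshold_percent; /* % of cap to resume at */
        bool reached_full;                     /* hysteresis state */
    };

    /* Decide whether a new page may enter a pool currently using pool_pages. */
    static bool toy_can_accept(struct toy_limits *l, size_t pool_pages)
    {
        size_t max_pages = l->total_ram_pages * l->max_pool_percent / 100;
        size_t resume_pages = max_pages * l->accept_threshold_percent / 100;

        if (pool_pages >= max_pages)
            l->reached_full = true;       /* limit hit: start refusing */
        if (l->reached_full) {
            if (pool_pages >= resume_pages)
                return false;             /* still above the threshold */
            l->reached_full = false;      /* drained enough: accept again */
        }
        return true;
    }

With 1000 pages of RAM, a ``max_pool_percent`` of 20 and an
``accept_threshold_percent`` of 80, this model refuses new pages once the pool
reaches 200 pages and keeps refusing until it drops below 160. With a threshold
of 100 the two limits coincide, which is why that setting disables the
hysteresis.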