1*4882a593Smuzhiyun.. _ksm: 2*4882a593Smuzhiyun 3*4882a593Smuzhiyun======================= 4*4882a593SmuzhiyunKernel Samepage Merging 5*4882a593Smuzhiyun======================= 6*4882a593Smuzhiyun 7*4882a593SmuzhiyunKSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y, 8*4882a593Smuzhiyunadded to the Linux kernel in 2.6.32. See ``mm/ksm.c`` for its implementation, 9*4882a593Smuzhiyunand http://lwn.net/Articles/306704/ and https://lwn.net/Articles/330589/ 10*4882a593Smuzhiyun 11*4882a593SmuzhiyunThe userspace interface of KSM is described in :ref:`Documentation/admin-guide/mm/ksm.rst <admin_guide_ksm>` 12*4882a593Smuzhiyun 13*4882a593SmuzhiyunDesign 14*4882a593Smuzhiyun====== 15*4882a593Smuzhiyun 16*4882a593SmuzhiyunOverview 17*4882a593Smuzhiyun-------- 18*4882a593Smuzhiyun 19*4882a593Smuzhiyun.. kernel-doc:: mm/ksm.c 20*4882a593Smuzhiyun :DOC: Overview 21*4882a593Smuzhiyun 22*4882a593SmuzhiyunReverse mapping 23*4882a593Smuzhiyun--------------- 24*4882a593SmuzhiyunKSM maintains reverse mapping information for KSM pages in the stable 25*4882a593Smuzhiyuntree. 26*4882a593Smuzhiyun 27*4882a593SmuzhiyunIf a KSM page is shared between less than ``max_page_sharing`` VMAs, 28*4882a593Smuzhiyunthe node of the stable tree that represents such KSM page points to a 29*4882a593Smuzhiyunlist of struct rmap_item and the ``page->mapping`` of the 30*4882a593SmuzhiyunKSM page points to the stable tree node. 31*4882a593Smuzhiyun 32*4882a593SmuzhiyunWhen the sharing passes this threshold, KSM adds a second dimension to 33*4882a593Smuzhiyunthe stable tree. The tree node becomes a "chain" that links one or 34*4882a593Smuzhiyunmore "dups". Each "dup" keeps reverse mapping information for a KSM 35*4882a593Smuzhiyunpage with ``page->mapping`` pointing to that "dup". 36*4882a593Smuzhiyun 37*4882a593SmuzhiyunEvery "chain" and all "dups" linked into a "chain" enforce the 38*4882a593Smuzhiyuninvariant that they represent the same write protected memory content, 39*4882a593Smuzhiyuneven if each "dup" will be pointed by a different KSM page copy of 40*4882a593Smuzhiyunthat content. 41*4882a593Smuzhiyun 42*4882a593SmuzhiyunThis way the stable tree lookup computational complexity is unaffected 43*4882a593Smuzhiyunif compared to an unlimited list of reverse mappings. It is still 44*4882a593Smuzhiyunenforced that there cannot be KSM page content duplicates in the 45*4882a593Smuzhiyunstable tree itself. 46*4882a593Smuzhiyun 47*4882a593SmuzhiyunThe deduplication limit enforced by ``max_page_sharing`` is required 48*4882a593Smuzhiyunto avoid the virtual memory rmap lists to grow too large. The rmap 49*4882a593Smuzhiyunwalk has O(N) complexity where N is the number of rmap_items 50*4882a593Smuzhiyun(i.e. virtual mappings) that are sharing the page, which is in turn 51*4882a593Smuzhiyuncapped by ``max_page_sharing``. So this effectively spreads the linear 52*4882a593SmuzhiyunO(N) computational complexity from rmap walk context over different 53*4882a593SmuzhiyunKSM pages. The ksmd walk over the stable_node "chains" is also O(N), 54*4882a593Smuzhiyunbut N is the number of stable_node "dups", not the number of 55*4882a593Smuzhiyunrmap_items, so it has not a significant impact on ksmd performance. In 56*4882a593Smuzhiyunpractice the best stable_node "dup" candidate will be kept and found 57*4882a593Smuzhiyunat the head of the "dups" list. 58*4882a593Smuzhiyun 59*4882a593SmuzhiyunHigh values of ``max_page_sharing`` result in faster memory merging 60*4882a593Smuzhiyun(because there will be fewer stable_node dups queued into the 61*4882a593Smuzhiyunstable_node chain->hlist to check for pruning) and higher 62*4882a593Smuzhiyundeduplication factor at the expense of slower worst case for rmap 63*4882a593Smuzhiyunwalks for any KSM page which can happen during swapping, compaction, 64*4882a593SmuzhiyunNUMA balancing and page migration. 65*4882a593Smuzhiyun 66*4882a593SmuzhiyunThe ``stable_node_dups/stable_node_chains`` ratio is also affected by the 67*4882a593Smuzhiyun``max_page_sharing`` tunable, and an high ratio may indicate fragmentation 68*4882a593Smuzhiyunin the stable_node dups, which could be solved by introducing 69*4882a593Smuzhiyunfragmentation algorithms in ksmd which would refile rmap_items from 70*4882a593Smuzhiyunone stable_node dup to another stable_node dup, in order to free up 71*4882a593Smuzhiyunstable_node "dups" with few rmap_items in them, but that may increase 72*4882a593Smuzhiyunthe ksmd CPU usage and possibly slowdown the readonly computations on 73*4882a593Smuzhiyunthe KSM pages of the applications. 74*4882a593Smuzhiyun 75*4882a593SmuzhiyunThe whole list of stable_node "dups" linked in the stable_node 76*4882a593Smuzhiyun"chains" is scanned periodically in order to prune stale stable_nodes. 77*4882a593SmuzhiyunThe frequency of such scans is defined by 78*4882a593Smuzhiyun``stable_node_chains_prune_millisecs`` sysfs tunable. 79*4882a593Smuzhiyun 80*4882a593SmuzhiyunReference 81*4882a593Smuzhiyun--------- 82*4882a593Smuzhiyun.. kernel-doc:: mm/ksm.c 83*4882a593Smuzhiyun :functions: mm_slot ksm_scan stable_node rmap_item 84*4882a593Smuzhiyun 85*4882a593Smuzhiyun-- 86*4882a593SmuzhiyunIzik Eidus, 87*4882a593SmuzhiyunHugh Dickins, 17 Nov 2009 88