.. _swap_numa:

===========================================
Automatically bind swap device to NUMA node
===========================================

If the system has more than one swap device and the swap devices carry node
information, the kernel can use that information in get_swap_pages() to
decide which swap device to use, which gives better performance.


How to use this feature
=======================

Each swap device has a priority, which decides the order in which the devices
are used. To make use of automatic binding, there is no need to manipulate the
priority settings of the swap devices. For example, on a 2-node machine,
assume two swap devices, swapA and swapB, are going to be swapped on, with
swapA attached to node 0 and swapB attached to node 1. Simply swap them on::

	# swapon /dev/swapA
	# swapon /dev/swapB

Node 0 will then use the two swap devices in the order swapA, then swapB, and
node 1 will use them in the order swapB, then swapA. Note that the order in
which they are swapped on doesn't matter.
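
To check which priorities the kernel ended up assigning, the Priority column
of /proc/swaps can be inspected. The following is a minimal sketch, assuming
the usual layout of that file (one header line, then whitespace-separated
columns with the priority last). The helper name ``parse_proc_swaps`` and the
sample text are made up for illustration and were not captured from a real
system.

```python
def parse_proc_swaps(text):
    """Map each swap device name to its priority.

    Assumes the /proc/swaps layout: one header line, then
    whitespace-separated Filename, Type, Size, Used, Priority columns.
    """
    priorities = {}
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore lines that do not match the assumed layout
        priorities[fields[0]] = int(fields[-1])
    return priorities


# Hypothetical sample, not captured from a real machine.
SAMPLE = """\
Filename    Type       Size     Used  Priority
/dev/swapA  partition  8388604  0     -2
/dev/swapB  partition  8388604  0     -3
"""

print(parse_proc_swaps(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/swaps
instead of ``SAMPLE``.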

A more complex example on a 4-node machine: assume six swap devices are going
to be swapped on. swapA and swapB are attached to node 0, swapC is attached to
node 1, swapD and swapE are attached to node 2, and swapF is attached to
node 3. They are swapped on in the same way as above::

	# swapon /dev/swapA
	# swapon /dev/swapB
	# swapon /dev/swapC
	# swapon /dev/swapD
	# swapon /dev/swapE
	# swapon /dev/swapF

Node 0 will then use them in the order::

	swapA/swapB -> swapC -> swapD -> swapE -> swapF

swapA and swapB will be used in round-robin mode before any other swap device.

Node 1 will use them in the order::

	swapC -> swapA -> swapB -> swapD -> swapE -> swapF

Node 2 will use them in the order::

	swapD/swapE -> swapA -> swapB -> swapC -> swapF

Similarly, swapD and swapE will be used in round-robin mode before any
other swap device.

Node 3 will use them in the order::

	swapF -> swapA -> swapB -> swapC -> swapD -> swapE
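
The per-node orders listed above can be reproduced with a small model. This is
only a sketch, not kernel code: the function names are hypothetical, and it
assumes that every device is swapped on without an explicit priority, that a
node's own devices form one round-robin group at the front, and that the
remaining devices follow in the order in which they were swapped on.

```python
def node_swap_order(devices, node):
    """Return the order in which `node` uses the swap devices.

    `devices` is a list of (name, home_node) pairs in swapon order, all
    swapped on without an explicit priority.  The result is a list of
    groups; devices within one group are used round robin.
    """
    local = [name for name, home in devices if home == node]
    remote = [[name] for name, home in devices if home != node]
    return ([local] if local else []) + remote


def format_order(groups):
    """Render groups in the notation used by this document."""
    return " -> ".join("/".join(group) for group in groups)


# The six devices from the 4-node example, in swapon order.
DEVICES = [("swapA", 0), ("swapB", 0), ("swapC", 1),
           ("swapD", 2), ("swapE", 2), ("swapF", 3)]

for node in range(4):
    print(f"node {node}: {format_order(node_swap_order(DEVICES, node))}")
```

Under these assumptions the model prints the same four orders as the example.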


Implementation details
======================

The current code uses a priority-based list, swap_avail_list, to decide
which swap device to use; if multiple swap devices share the same priority,
they are used round robin. This change replaces the single global
swap_avail_list with a per-NUMA-node list, i.e. each NUMA node sees its own
priority-based list of available swap devices. A swap device's priority can
be promoted on its matching node's swap_avail_list.

A swap device's priority is currently set as follows: the user can set a
value >= 0, or the system will pick one, starting from -1 and counting
downwards. The priority value in swap_avail_list is the negated value of the
swap device's priority, because a plist is sorted from low to high. The new
policy doesn't change the semantics for priorities >= 0. The system-assigned
values that previously started from -1 now start from -2 and count downwards,
and -1 is reserved as the promoted value. So if multiple swap devices are
attached to the same node, they will all be promoted to priority -1 on that
node's plist and will be used round robin before any other swap devices.
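
The priority rules above can be modelled in a few lines. The following is a
hedged sketch rather than the kernel implementation: all names are
hypothetical, a per-node plist is approximated by a stable sort on the negated
priority, and a device swapped on with an explicit priority is assumed to keep
that value on every node.

```python
PROMOTED = -1  # priority reserved for a device on its own node's list


def assign_priorities(devices):
    """Assign priorities in swapon order.

    `devices` is a list of (name, home_node, requested) tuples, where
    `requested` is a user priority >= 0 or None.  System-assigned values
    start at -2 and count downwards.
    """
    least = -1
    assigned = {}
    for name, home, requested in devices:
        if requested is not None and requested >= 0:
            prio = requested
        else:
            least -= 1
            prio = least
        assigned[name] = (home, prio)
    return assigned


def plist_key(prio, home, node):
    """Key of a device on `node`'s list; lower keys are used first."""
    if prio < 0 and home == node:
        prio = PROMOTED  # promote system-assigned priority on the home node
    return -prio  # negated because the list is sorted from low to high


def avail_list(assigned, node):
    """Per-node list of (key, name) pairs, sorted from low to high."""
    entries = [(plist_key(prio, home, node), name)
               for name, (home, prio) in assigned.items()]
    return sorted(entries, key=lambda entry: entry[0])  # stable sort


assigned = assign_priorities([
    ("swapA", 0, None), ("swapB", 0, None),
    ("swapC", 1, None), ("swapX", 1, 10),  # swapX has an explicit priority
])
for node in (0, 1):
    print(node, avail_list(assigned, node))
```

Entries that share a key stand for devices used round robin. In this model
swapA and swapB both get key 1 on node 0, while swapX, with its explicit
priority of 10, gets key -10 and sorts first on both nodes, which is one
reading of the statement that priorities >= 0 keep their semantics.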