.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available, so that subsystems can
  prepare to handle it. The page allocator is still unable to allocate
  from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory, but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.

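The six notifications form two transitions. Each "going" event is later followed by either its success event or its cancel event. The sketch below is a userspace model, not kernel code. The enum values and the helpers ``closing_event()`` and ``allocatable_after()`` are hypothetical. They only encode the rules stated above, including whether the page allocator may hand out pages from the affected range once each event has been delivered.

```c
#include <stdbool.h>

/*
 * Userspace model only: the names mirror the actions in
 * include/linux/memory.h, but the values here are arbitrary.
 */
enum mem_action {
	MEM_GOING_ONLINE,
	MEM_CANCEL_ONLINE,
	MEM_ONLINE,
	MEM_GOING_OFFLINE,
	MEM_CANCEL_OFFLINE,
	MEM_OFFLINE,
};

/* The event that closes a transition opened by a MEM_GOING_* event. */
enum mem_action closing_event(enum mem_action going, bool failed)
{
	if (going == MEM_GOING_ONLINE)
		return failed ? MEM_CANCEL_ONLINE : MEM_ONLINE;
	/* Otherwise the transition was opened by MEM_GOING_OFFLINE. */
	return failed ? MEM_CANCEL_OFFLINE : MEM_OFFLINE;
}

/* May the page allocator use the range once this event has been sent? */
bool allocatable_after(enum mem_action action)
{
	switch (action) {
	case MEM_ONLINE:		/* onlining succeeded */
	case MEM_CANCEL_OFFLINE:	/* offlining was aborted */
		return true;
	default:	/* not yet online, offlining in progress, or offline */
		return false;
	}
}
```

A failed online therefore never makes the range allocatable. A failed offline, however, returns the range to the allocator.
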
A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

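The following is a simplified userspace model of a priority-sorted notifier chain that illustrates the ordering rule. ``struct model_nb``, ``model_register()`` and ``model_call_order()`` are hypothetical stand-ins, not the kernel's ``struct notifier_block`` API, and the sketch leaves out the locking a real chain needs. The model keeps the list sorted by descending priority. It places a new entry after existing entries of equal priority, so equal-priority callbacks run in registration order. That tie-breaking behaviour is an assumption of the model.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct notifier_block. */
struct model_nb {
	int (*call)(struct model_nb *self, unsigned long action, void *arg);
	int priority;
	struct model_nb *next;
};

/*
 * Insert nb into the chain at *head, keeping it sorted by descending
 * priority. The walk skips every entry whose priority is >= nb's, so an
 * entry with equal priority lands after those already registered.
 */
void model_register(struct model_nb **head, struct model_nb *nb)
{
	while (*head && nb->priority <= (*head)->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/*
 * Walk the chain head to tail, i.e. in the order the callbacks would be
 * invoked, recording each entry's priority. Returns the number recorded.
 */
size_t model_call_order(const struct model_nb *head, int *prio, size_t max)
{
	size_t n = 0;

	for (; head && n < max; head = head->next)
		prio[n++] = head->priority;
	return n;
}
```

Registering entries with priorities 0, 10, 5 and 10 (in that order) gives the call order 10, 10, 5, 0. The first priority-10 entry registered runs first.
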
A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
		int status_change_nid_normal;
		int status_change_nid_high;
		int status_change_nid;
	};

- start_pfn is the first page frame number (PFN) of the memory being onlined
  or offlined.
- nr_pages is the number of pages of the memory being onlined or offlined.
- status_change_nid_normal is set to the node id when the node's bit in the
  N_NORMAL_MEMORY nodemask is (or will be) set or cleared. If this is -1, the
  nodemask status is not changed.
- status_change_nid_high is set to the node id when the node's bit in the
  N_HIGH_MEMORY nodemask is (or will be) set or cleared. If this is -1, the
  nodemask status is not changed.
- status_change_nid is set to the node id when the node's bit in the N_MEMORY
  nodemask is (or will be) set or cleared. This happens when a new
  (memoryless) node gets memory by onlining, or when a node loses all of its
  memory by offlining. If this is -1, the nodemask status is not changed.

If status_change_nid* >= 0, the callback should create or discard structures
for the node if necessary.

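The sketch below shows how a callback might interpret the range and node fields. It copies the structure layout documented above into userspace. The helper names ``mn_end_pfn()``, ``mn_contains_pfn()`` and ``mn_node_state_changes()`` are hypothetical and not part of the kernel.

```c
#include <stdbool.h>

/* Userspace copy of the layout documented above, for illustration. */
struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
	int status_change_nid_normal;
	int status_change_nid_high;
	int status_change_nid;
};

/* One past the last PFN of the range being onlined or offlined. */
unsigned long mn_end_pfn(const struct memory_notify *mn)
{
	return mn->start_pfn + mn->nr_pages;
}

/* True if pfn lies in [start_pfn, start_pfn + nr_pages). */
bool mn_contains_pfn(const struct memory_notify *mn, unsigned long pfn)
{
	return pfn >= mn->start_pfn && pfn < mn_end_pfn(mn);
}

/*
 * True if any status_change_nid* field reports a node state change,
 * i.e. the callback may need to create or discard per-node structures.
 */
bool mn_node_state_changes(const struct memory_notify *mn)
{
	return mn->status_change_nid_normal >= 0 ||
	       mn->status_change_nid_high >= 0 ||
	       mn->status_change_nid >= 0;
}
```

For example, with 4 KiB pages a 128 MiB range starting at PFN 0x80000 has nr_pages = 0x8000. ``mn_end_pfn()`` then returns 0x88000, and PFN 0x88000 itself lies outside the range.
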
The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as a response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

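The effect of the return values can be modelled in userspace as shown below. The NOTIFY_* constants use the same values as ``include/linux/notifier.h``. The following are all hypothetical: everything prefixed with ``model_`` or ``MODEL_``, ``struct pfn_range``, ``PINNED_PFN``, ``veto_pinned()`` and ``accept_all()``. The walker stops as soon as a callback returns a value with NOTIFY_STOP_MASK set, which both NOTIFY_BAD and NOTIFY_STOP include. In the sketch, ``veto_pinned()`` returns NOTIFY_BAD to refuse MEM_GOING_OFFLINE for a range that contains a page it cannot release.

```c
#include <stddef.h>

/* Same values as include/linux/notifier.h. */
#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP		(NOTIFY_OK | NOTIFY_STOP_MASK)

/* Model-only action codes; the kernel's values live in include/linux/memory.h. */
enum { MODEL_MEM_GOING_OFFLINE = 1, MODEL_MEM_OFFLINE = 2 };

/* Hypothetical PFN that some subsystem has pinned and cannot release. */
#define PINNED_PFN 0x80010UL

struct pfn_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

typedef int (*model_cb)(unsigned long action, void *arg);

/* Refuse to let a range containing PINNED_PFN go offline. */
int veto_pinned(unsigned long action, void *arg)
{
	const struct pfn_range *r = arg;

	if (action == MODEL_MEM_GOING_OFFLINE &&
	    PINNED_PFN >= r->start_pfn &&
	    PINNED_PFN < r->start_pfn + r->nr_pages)
		return NOTIFY_BAD;	/* cancel and stop the chain */
	return NOTIFY_OK;
}

/* A lower-priority callback that never objects. */
int accept_all(unsigned long action, void *arg)
{
	(void)action;
	(void)arg;
	return NOTIFY_OK;
}

/*
 * Call cbs[0..n-1] in order (assumed already sorted by priority) until one
 * returns a value with NOTIFY_STOP_MASK set. Returns the last result and
 * stores how many callbacks ran in *nr_called.
 */
int model_call_chain(const model_cb *cbs, size_t n, unsigned long action,
		     void *arg, size_t *nr_called)
{
	int ret = NOTIFY_DONE;
	size_t i = 0;

	while (i < n) {
		ret = cbs[i++](action, arg);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	*nr_called = i;
	return ret;
}
```

For the range 0x80000 to 0x87fff, ``model_call_chain()`` returns NOTIFY_BAD after calling only ``veto_pinned()``. For a range that does not contain ``PINNED_PFN``, both callbacks run and the result is NOTIFY_OK.
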
Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. When removing memory, we likewise
  know that nobody is in a critical section.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC).

In particular, device_hotplug_lock avoids a possible lock inversion when
memory is being added and user space tries to online that memory faster than
expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock.
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before the device_lock() is taken,
this can result in a lock inversion.

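The role of device_hotplug_lock in this inversion can be illustrated with a userspace analogue using POSIX threads (build with ``-pthread``). The three mutexes are stand-ins named after the kernel locks, not the kernel primitives themselves. In the kernel, device_lock() is a per-device mutex and mem_hotplug_lock is a per-CPU rw-semaphore. The two thread functions take the inner locks in opposite orders, as device_online() and add_memory_resource() do. Because both take the outer lock first, the two sequences cannot interleave, so the inner pair cannot deadlock. ``run_both_paths()`` is a hypothetical helper.

```c
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-ins named after the kernel locks; not the kernel's API. */
static pthread_mutex_t device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mem_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

static int completed;	/* updated only with device_hotplug_lock held */

/* Models device_online(): device_lock, then mem_hotplug_lock. */
static void *online_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&device_hotplug_lock);
	pthread_mutex_lock(&device_lock);
	pthread_mutex_lock(&mem_hotplug_lock);
	completed++;
	pthread_mutex_unlock(&mem_hotplug_lock);
	pthread_mutex_unlock(&device_lock);
	pthread_mutex_unlock(&device_hotplug_lock);
	return NULL;
}

/* Models add_memory_resource(): mem_hotplug_lock, then device_lock. */
static void *add_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&device_hotplug_lock);
	pthread_mutex_lock(&mem_hotplug_lock);
	pthread_mutex_lock(&device_lock);
	completed++;
	pthread_mutex_unlock(&device_lock);
	pthread_mutex_unlock(&mem_hotplug_lock);
	pthread_mutex_unlock(&device_hotplug_lock);
	return NULL;
}

/* Run both paths concurrently, rounds times; returns how many finished. */
int run_both_paths(int rounds)
{
	pthread_t a, b;

	completed = 0;
	for (int i = 0; i < rounds; i++) {
		pthread_create(&a, NULL, online_path, NULL);
		pthread_create(&b, NULL, add_path, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
	}
	return completed;	/* safe to read: all threads have been joined */
}
```

``run_both_paths(100)`` returns 200. Removing the ``device_hotplug_lock`` calls from both thread functions reintroduces the classic AB-BA pattern, which can hang when the threads interleave.
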
Onlining/offlining of memory should be done via device_online() and
device_offline(), to make sure it is properly synchronized with actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect
online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a fairly efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect itself from that memory
vanishing.

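The read/write split can be sketched in userspace with a POSIX read-write lock (build with ``-pthread``). The lock stands in for mem_hotplug_lock, which in the kernel is a per-CPU rw-semaphore. The ``model_`` functions are hypothetical analogues. The reader side plays the role of get_online_mems()/put_online_mems(), and the writer side stands for a hotplug operation updating global state. Readers may run concurrently with each other but never overlap a writer, so a value read between get and put cannot change underneath the reader.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Stand-in for mem_hotplug_lock. */
static pthread_rwlock_t model_mem_hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for global/zone state that hotplug modifies. */
static unsigned long model_present_pages;

/* Reader side, analogous to get_online_mems()/put_online_mems(). */
static void model_get_online_mems(void)
{
	pthread_rwlock_rdlock(&model_mem_hotplug_lock);
}

static void model_put_online_mems(void)
{
	pthread_rwlock_unlock(&model_mem_hotplug_lock);
}

/* Writer side: hotplug holds the lock in write mode while updating state. */
void model_online_pages(unsigned long nr_pages)
{
	pthread_rwlock_wrlock(&model_mem_hotplug_lock);
	model_present_pages += nr_pages;
	pthread_rwlock_unlock(&model_mem_hotplug_lock);
}

void model_offline_pages(unsigned long nr_pages)
{
	pthread_rwlock_wrlock(&model_mem_hotplug_lock);
	model_present_pages -= nr_pages;
	pthread_rwlock_unlock(&model_mem_hotplug_lock);
}

/* A reader that must not see the memory vanish while it works. */
unsigned long model_read_present_pages(void)
{
	unsigned long nr;

	model_get_online_mems();
	nr = model_present_pages;	/* cannot change until the put below */
	model_put_online_mems();
	return nr;
}
```

A read-write lock fits here because readers vastly outnumber hotplug operations. Many readers can hold the lock at once, and only the rare writer has to wait for them to drain.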