.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available in order to be able to
  prepare subsystems to handle memory. The page allocator is still unable
  to allocate from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.
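
The pairing of events described above can be sketched as a small user-space
model. This is not kernel code: the ``EV_*`` names, ``listener_fn`` and
``model_transition()`` are hypothetical, the enum values are arbitrary, and the
model assumes, for illustration only, that a GOING event fails exactly when a
listener vetoes it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the six event types; values are arbitrary, not the kernel's. */
enum mem_event {
    EV_GOING_ONLINE, EV_CANCEL_ONLINE, EV_ONLINE,
    EV_GOING_OFFLINE, EV_CANCEL_OFFLINE, EV_OFFLINE,
};

/* Hypothetical listener: returns false to veto a GOING_* event. */
typedef bool (*listener_fn)(enum mem_event ev);

/*
 * Deliver a GOING_* event to every listener in order and report which event
 * would follow: the matching CANCEL_* event if any listener vetoes, otherwise
 * the completion event. 'going' must be EV_GOING_ONLINE or EV_GOING_OFFLINE.
 */
enum mem_event model_transition(enum mem_event going,
                                const listener_fn *listeners, size_t n)
{
    bool online = (going == EV_GOING_ONLINE);

    for (size_t i = 0; i < n; i++) {
        if (!listeners[i](going))
            return online ? EV_CANCEL_ONLINE : EV_CANCEL_OFFLINE;
    }
    return online ? EV_ONLINE : EV_OFFLINE;
}

/* Example listeners used to exercise the model. */
bool accept_all(enum mem_event ev) { (void)ev; return true; }
bool veto_offline(enum mem_event ev) { return ev != EV_GOING_OFFLINE; }
```

With both example listeners present, an online request completes while an
offline request is cancelled; with only ``accept_all``, the offline request
completes as well.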

A callback routine can be registered by calling::

  hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

A callback function must have the following prototype::

  int callback_func(
    struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) is a pointer to a struct memory_notify::

  struct memory_notify {
    unsigned long start_pfn;
    unsigned long nr_pages;
    int status_change_nid_normal;
    int status_change_nid_high;
    int status_change_nid;
  }

- start_pfn is the first page frame number of the memory being
  onlined/offlined.
- nr_pages is the number of pages of the memory being onlined/offlined.
- status_change_nid_normal is set to the node id when N_NORMAL_MEMORY of the
  nodemask is (or will be) set/cleared. If this is -1, then nodemask status
  is not changed.
- status_change_nid_high is set to the node id when N_HIGH_MEMORY of the
  nodemask is (or will be) set/cleared. If this is -1, then nodemask status
  is not changed.
- status_change_nid is set to the node id when N_MEMORY of the nodemask is
  (or will be) set/cleared.
  It means that a new (memoryless) node gets new memory by onlining, or that
  a node loses all of its memory. If this is -1, then nodemask status is not
  changed.

  If status_change_nid* >= 0, the callback should create/discard structures
  for the node if necessary.

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g.
  relevant for ACPI and PPC)

In particular, there is a possible lock inversion that is avoided using
device_hotplug_lock when adding memory and user space tries to online that
memory faster than expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized to actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect
online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect from that memory
vanishing.
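
The read-mode/write-mode exclusion described above can be illustrated with a
single-threaded user-space model. This is only a sketch under stated
assumptions: ``struct hotplug_gate`` and the ``gate_*()`` helpers are
hypothetical names, not kernel APIs, and where the real lock would make a
caller wait, the model simply returns ``false``.

```c
#include <stdbool.h>

/* Hypothetical model of a reader/writer gate; not the kernel's lock. */
struct hotplug_gate {
    int readers;  /* sections currently relying on memory staying online */
    bool writer;  /* a hotplug operation is in progress */
};

/* Reader entry, in the role of get_online_mems(): refused while a writer runs. */
bool gate_get(struct hotplug_gate *g)
{
    if (g->writer)
        return false;
    g->readers++;
    return true;
}

/* Reader exit, in the role of put_online_mems(). */
void gate_put(struct hotplug_gate *g)
{
    if (g->readers > 0)
        g->readers--;
}

/* Writer entry: hotplug may start only when no reader or writer is active. */
bool gate_begin_hotplug(struct hotplug_gate *g)
{
    if (g->writer || g->readers > 0)
        return false;
    g->writer = true;
    return true;
}

/* Writer exit: readers are admitted again. */
void gate_end_hotplug(struct hotplug_gate *g)
{
    g->writer = false;
}
```

In this model, a hotplug operation cannot begin while any reader is inside its
section, and no new reader is admitted until the operation ends, which is the
property that lets code accessing memory rely on that memory not vanishing.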