===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============


A preemptible kernel creates new locking issues.  The issues are the same as
those under SMP: concurrency and reentrancy.  Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms.  Thus, the kernel
requires explicit additional locking for very few additional situations.

This document is for all kernel hackers.  Developing code in the kernel
requires protecting these situations.


RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Two similar problems arise.  An example code snippet::

    struct this_needs_locking tux[NR_CPUS];
    tux[smp_processor_id()] = some_value;
    /* task is preempted here... */
    something = tux[smp_processor_id()];

First, since the data is per-CPU, it may not have explicit SMP locking, but
under preemption it nevertheless needs protection.  Second, when a preempted
task is finally rescheduled, the previous value of smp_processor_id may not
equal the current one.  You must protect these situations by disabling
preemption around them.

You can also use get_cpu() and put_cpu(), which disable and re-enable
preemption.
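
For illustration, here is a minimal sketch of the snippet above reworked
around get_cpu()/put_cpu().  The structure type and the do_something_with()
helper are hypothetical stand-ins; only the get_cpu()/put_cpu() pairing is
the point::

    /*
     * Sketch only: 'struct this_needs_locking' and do_something_with()
     * are hypothetical stand-ins from the snippet above.
     */
    #include <linux/smp.h>
    #include <linux/threads.h>

    static struct this_needs_locking tux[NR_CPUS];

    static void tux_update(struct this_needs_locking some_value)
    {
            int cpu = get_cpu();            /* disables preemption */

            tux[cpu] = some_value;
            /* the task cannot migrate off 'cpu' between these accesses */
            do_something_with(&tux[cpu]);   /* hypothetical helper */

            put_cpu();                      /* re-enables preemption */
    }

Because preemption is disabled between get_cpu() and put_cpu(), the task
cannot migrate, so both accesses refer to the same CPU's slot.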


RULE #2: CPU state must be protected.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Under preemption, the state of the CPU must be protected.  This is
arch-dependent, but includes CPU structures and state not preserved over a
context switch.  For example, on x86, entering and exiting FPU mode is now a
critical section that must occur while preemption is disabled.  Think what
would happen if the kernel is executing a floating-point instruction and is
then preempted.  Remember, the kernel does not save FPU state except for user
tasks.  Therefore, upon preemption, the FPU registers will be sold to the
lowest bidder.  Thus, preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt safe.  For example,
kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.
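
For illustration, a minimal sketch of such a region on x86, assuming the
declarations exported through <asm/fpu/api.h>::

    /*
     * kernel_fpu_begin() disables preemption (and deals with any live
     * user FPU state) before the kernel touches the FPU/SIMD registers;
     * kernel_fpu_end() undoes this.  The work in the middle is purely
     * illustrative.
     */
    #include <asm/fpu/api.h>

    static void do_simd_work(void)
    {
            kernel_fpu_begin();

            /* FPU/SIMD instructions may be used safely here */

            kernel_fpu_end();
    }

No explicit preempt_disable() is needed around these helpers; they take care
of it themselves.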


RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


A lock acquired in one task must be released by the same task.  This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it.  If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event signalled by the other task.


Solution
========


Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

    preempt_enable()              decrement the preempt counter
    preempt_disable()             increment the preempt counter
    preempt_enable_no_resched()   decrement, but do not immediately preempt
    preempt_check_resched()       if needed, reschedule
    preempt_count()               return the preempt counter

The functions are nestable.  In other words, you can call preempt_disable()
n times in a code path, and preemption will not be re-enabled until the n-th
call to preempt_enable().  The preempt statements compile away to nothing if
preemption is not enabled.

Note that you do not need to explicitly prevent preemption if you are holding
any locks or interrupts are disabled, since preemption is implicitly disabled
in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption: any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0.  A simple printk() might trigger a
reschedule.  So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this.  Best policy is to
use this only for small, atomic code that you wrote and which calls no
complex functions.

Example::

    cpucache_t *cc; /* this is per-CPU */
    preempt_disable();
    cc = cc_data(searchp);
    if (cc && cc->avail) {
        __free_block(searchp, cc_entry(cc), cc->avail);
        cc->avail = 0;
    }
    preempt_enable();
    return 0;

Notice how the preemption statements must encompass every reference of the
critical variables.  Another example::

    int buf[NR_CPUS];
    set_cpu_val(buf);
    if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
    spin_lock(&buf_lock);
    /* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines.


Preventing preemption using interrupt disabling
===============================================


It is possible to prevent a preemption event using local_irq_disable and
local_irq_save.  Note, when doing so, you must be very careful to not cause
an event that would set need_resched and result in a preemption check.  When
in doubt, rely on locking or explicit preemption disabling.

Note that in 2.5, interrupt disabling is now only per-CPU (i.e. local).

An additional concern is proper usage of local_irq_disable and local_irq_save.
These may be used to protect against preemption; however, on exit, if
preemption may be enabled, a test to see if preemption is required should be
made.  If these are called from the spin_lock and read/write lock macros, the
right thing is done.  They may also be called within a spin-lock protected
region; however, if they are ever called outside of this context, a test for
preemption should be made.  Do note that calls from interrupt context or
bottom halves/tasklets are also protected by preemption locks and so may use
the versions which do not check preemption.
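
As an illustration of the care required, here is a minimal sketch of a short
region protected purely by interrupt disabling.  The per-CPU hits[] counter
is hypothetical, and when in doubt preempt_disable()/preempt_enable() remains
the safer choice::

    /*
     * Sketch only: 'hits' is a hypothetical per-CPU counter.  With local
     * interrupts disabled the task cannot be preempted on this CPU, so
     * smp_processor_id() is stable, but nothing in here may set
     * need_resched (no complex helpers, no cond_resched(), nothing that
     * might sleep).
     */
    #include <linux/irqflags.h>
    #include <linux/smp.h>
    #include <linux/threads.h>

    static unsigned long hits[NR_CPUS];

    static void count_hit(void)
    {
            unsigned long flags;

            local_irq_save(flags);          /* implicitly blocks preemption */
            hits[smp_processor_id()]++;
            local_irq_restore(flags);       /* re-enable interrupts */
    }

Note that local_irq_restore() itself does not test for a pending reschedule;
as described above, any needed preemption check has to happen elsewhere, for
example via the spin_lock/unlock macros or an explicit preempt_enable().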