=======================
Generic Mutex Subsystem
=======================

started by Ingo Molnar <mingo@redhat.com>

updated by Davidlohr Bueso <davidlohr@hp.com>

What are mutexes?
-----------------

In the Linux kernel, mutexes refer to a particular locking primitive
that enforces serialization on shared memory systems, and not only to
the generic term referring to 'mutual exclusion' found in academia
or similar theoretical textbooks. Mutexes are sleeping locks which
behave similarly to binary semaphores, and were introduced in 2006[1]
as an alternative to these. This new data structure provided a number
of advantages, including simpler interfaces, and at that time smaller
code (see Disadvantages).

[1] https://lwn.net/Articles/164802/

Implementation
--------------

Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
and implemented in kernel/locking/mutex.c. These locks use an atomic variable
(->owner) to keep track of the lock state during its lifetime. The owner field
actually contains a 'struct task_struct *' pointer to the current lock owner,
and is therefore NULL if the lock is not currently owned. Since task_struct
pointers are aligned to at least L1_CACHE_BYTES, the lowest three bits are
used to store extra state (e.g., whether the waiter list is non-empty). In its
most basic form a mutex also includes a wait-queue and a spinlock that
serializes access to it. Furthermore, CONFIG_MUTEX_SPIN_ON_OWNER=y systems use
a spinner MCS lock (->osq), described below in (ii).
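
As a rough sketch of the ->owner encoding described above: the low bits carry
state and the remainder of the word is the owning task pointer. The macro and
helper names below are purely illustrative (the flag values follow the current
kernel/locking/mutex.c implementation and may change)::

   /* Low bits of ->owner hold extra state; the rest is the task pointer. */
   #define EX_MUTEX_FLAG_WAITERS	0x01	/* wait-queue is non-empty */
   #define EX_MUTEX_FLAGS		0x07	/* mask of all low state bits */

   static inline struct task_struct *ex_owner_task(unsigned long owner)
   {
           /* Clear the state bits to recover the task_struct pointer. */
           return (struct task_struct *)(owner & ~EX_MUTEX_FLAGS);
   }

   static inline bool ex_has_waiters(unsigned long owner)
   {
           return owner & EX_MUTEX_FLAG_WAITERS;
   }
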
When acquiring a mutex, there are three possible paths that can be
taken, depending on the state of the lock:

(i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner
    with the current task. This only works in the uncontended case (cmpxchg()
    checks against 0UL, so all 3 state bits above have to be 0). If the lock
    is contended it goes to the next possible path (a simplified sketch of
    this step appears at the end of this section).

(ii) midpath: aka optimistic spinning, tries to spin for acquisition
     while the lock owner is running and there are no other tasks ready
     to run that have higher priority (need_resched). The rationale is
     that if the lock owner is running, it is likely to release the lock
     soon. The mutex spinners are queued up using an MCS lock so that only
     one spinner can compete for the mutex.

     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
     with the desirable properties of being fair and with each cpu trying
     to acquire the lock spinning on a local variable. It avoids expensive
     cacheline bouncing that common test-and-set spinlock implementations
     incur. An MCS-like lock is specially tailored for optimistic spinning
     in sleeping lock implementations. An important feature of the customized
     MCS lock is that spinners are able to exit the MCS spinlock queue when
     they need to reschedule. This further helps avoid situations where MCS
     spinners that need to reschedule would continue waiting to spin on the
     mutex owner, only to go directly to the slowpath upon obtaining the MCS
     lock.

(iii) slowpath: last resort, if the lock is still unable to be acquired,
      the task is added to the wait-queue and sleeps until woken up by the
      unlock path. Under normal circumstances it blocks as
      TASK_UNINTERRUPTIBLE.

While formally kernel mutexes are sleepable locks, it is path (ii) that
makes them more practically a hybrid type. By simply not interrupting a
task and busy-waiting for a few cycles instead of immediately sleeping,
this technique has been seen to significantly improve performance on a
number of workloads. Note that this technique is also used for
rw-semaphores.
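
The fastpath in (i) boils down to a single atomic compare-and-exchange on the
owner word: the mutex is free iff ->owner is 0UL (no owner task, no state
bits set). A simplified sketch of that idea (the helper name is illustrative;
the real implementation lives in kernel/locking/mutex.c)::

   static inline bool ex_mutex_trylock_fast(struct mutex *lock)
   {
           unsigned long zero = 0UL;

           /* Acquire-ordered cmpxchg of 0UL -> current installs us as owner. */
           return atomic_long_try_cmpxchg_acquire(&lock->owner, &zero,
                                                  (unsigned long)current);
   }
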
Semantics
---------

The mutex subsystem checks and enforces the following rules:

 - Only one task can hold the mutex at a time.
 - Only the owner can unlock the mutex.
 - Multiple unlocks are not permitted.
 - Recursive locking/unlocking is not permitted.
 - A mutex must only be initialized via the API (see below).
 - A task may not exit with a mutex held.
 - Memory areas where held locks reside must not be freed.
 - Held mutexes must not be reinitialized.
 - Mutexes may not be used in hardware or software interrupt
   contexts such as tasklets and timers.

These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled.
In addition, the mutex debugging code also implements a number of other
features that make lock debugging easier and faster:

 - Uses symbolic names of mutexes, whenever they are printed
   in debug output.
 - Point-of-acquire tracking, symbolic lookup of function names,
   list of all locks held in the system, printout of them.
 - Owner tracking.
 - Detects self-recursing locks and prints out all relevant info.
 - Detects multi-task circular deadlocks and prints out all affected
   locks and tasks (and only those tasks).


Interfaces
----------
Statically define the mutex::

   DEFINE_MUTEX(name);

Dynamically initialize the mutex::

   mutex_init(mutex);

Acquire the mutex, uninterruptible::

   void mutex_lock(struct mutex *lock);
   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
   int mutex_trylock(struct mutex *lock);

Acquire the mutex, interruptible::

   int mutex_lock_interruptible_nested(struct mutex *lock,
                                       unsigned int subclass);
   int mutex_lock_interruptible(struct mutex *lock);

Acquire the mutex, interruptible, if the atomic counter decrements to 0::

   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);

Unlock the mutex::

   void mutex_unlock(struct mutex *lock);

Test if the mutex is taken::

   int mutex_is_locked(struct mutex *lock);

Disadvantages
-------------

Contrary to its original design and purpose, 'struct mutex' is among the
largest locks in the kernel. E.g., on x86-64 it is 32 bytes, whereas
'struct semaphore' is 24 bytes and rw_semaphore is 40 bytes. Larger
structure sizes mean more CPU cache and memory footprint.

When to use mutexes
-------------------

Unless the strict semantics of mutexes are unsuitable and/or the critical
region prevents the lock from being shared, always prefer them to any other
locking primitive.
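
As a brief illustration of the interfaces listed above, a minimal usage
sketch; the lock, data and function names here are hypothetical::

   #include <linux/mutex.h>

   /* Hypothetical state protected by a statically defined mutex. */
   static DEFINE_MUTEX(ex_lock);
   static int ex_count;

   static void ex_update(void)
   {
           mutex_lock(&ex_lock);          /* sleeps if contended */
           ex_count++;
           mutex_unlock(&ex_lock);        /* only the owner may unlock */
   }

   /* Interruptible variant: back out cleanly if a signal arrives while waiting. */
   static int ex_update_interruptible(void)
   {
           int ret = mutex_lock_interruptible(&ex_lock);

           if (ret)
                   return ret;
           ex_count++;
           mutex_unlock(&ex_lock);
           return 0;
   }
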