=======================
Generic Mutex Subsystem
=======================

started by Ingo Molnar <mingo@redhat.com>

updated by Davidlohr Bueso <davidlohr@hp.com>

What are mutexes?
-----------------

In the Linux kernel, mutexes refer to a particular locking primitive
that enforces serialization on shared memory systems, and not only to
the generic term referring to 'mutual exclusion' found in academia
or similar theoretical textbooks. Mutexes are sleeping locks which
behave similarly to binary semaphores, and were introduced in 2006[1]
as an alternative to these. This new data structure provided a number
of advantages, including simpler interfaces, and at that time smaller
code (see Disadvantages).

[1] https://lwn.net/Articles/164802/
Implementation
--------------

Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
and implemented in kernel/locking/mutex.c. These locks use an atomic variable
(->owner) to keep track of the lock state during its lifetime. The owner
field holds a `struct task_struct *` pointing to the current lock owner, and
is therefore NULL if the lock is not currently owned. Since task_struct
pointers are aligned to at least L1_CACHE_BYTES, the three low bits are used
to store extra state (e.g., if the waiter list is non-empty). In its most
basic form a mutex also includes a wait-queue and a spinlock that serializes
access to it. Furthermore, CONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner
MCS lock (->osq), described below in (ii).
When acquiring a mutex, there are three possible paths that can be
taken, depending on the state of the lock:

(i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner with
    the current task. This only works in the uncontended case (cmpxchg() checks
    against 0UL, so all 3 state bits above have to be 0). If the lock is
    contended it goes to the next possible path.
(ii) midpath: aka optimistic spinning, tries to spin for acquisition
     while the lock owner is running and there are no other tasks ready
     to run that have higher priority (need_resched). The rationale is
     that if the lock owner is running, it is likely to release the lock
     soon. The mutex spinners are queued up using an MCS lock so that only
     one spinner can compete for the mutex.

     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
     with the desirable properties of being fair and having each cpu trying
     to acquire the lock spin on a local variable. It avoids the expensive
     cacheline bouncing that common test-and-set spinlock implementations
     incur. An MCS-like lock is specially tailored for optimistic spinning
     in sleeping lock implementations. An important feature of the customized
     MCS lock is that spinners are able to exit the MCS spinlock queue when
     they need to reschedule. This further helps avoid situations where MCS
     spinners that need to reschedule would otherwise keep spinning on the
     mutex owner, only to go directly to the slowpath upon obtaining the
     MCS lock.

(iii) slowpath: last resort, if the lock is still unable to be acquired,
      the task is added to the wait-queue and sleeps until woken up by the
      unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.

While formally kernel mutexes are sleepable locks, it is path (ii) that
makes them more practically a hybrid type. By simply not interrupting a
task and busy-waiting for a few cycles instead of immediately sleeping,
this lock has been shown to significantly improve performance on a
number of workloads. Note that this technique is also used for rw-semaphores.
Semantics
---------

The mutex subsystem checks and enforces the following rules:

    - Only one task can hold the mutex at a time.
    - Only the owner can unlock the mutex.
    - Multiple unlocks are not permitted.
    - Recursive locking/unlocking is not permitted.
    - A mutex must only be initialized via the API (see below).
    - A task may not exit with a mutex held.
    - Memory areas where held locks reside must not be freed.
    - Held mutexes must not be reinitialized.
    - Mutexes may not be used in hardware or software interrupt
      contexts such as tasklets and timers.

These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled.
In addition, the mutex debugging code also implements a number of other
features that make lock debugging easier and faster:

    - Uses symbolic names of mutexes, whenever they are printed
      in debug output.
    - Point-of-acquire tracking, symbolic lookup of function names,
      list of all locks held in the system, printout of them.
    - Owner tracking.
    - Detects self-recursing locks and prints out all relevant info.
    - Detects multi-task circular deadlocks and prints out all affected
      locks and tasks (and only those tasks).


Interfaces
----------
Statically define the mutex::

   DEFINE_MUTEX(name);

Dynamically initialize the mutex::

   mutex_init(mutex);

Acquire the mutex, uninterruptible::

   void mutex_lock(struct mutex *lock);
   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
   int  mutex_trylock(struct mutex *lock);

Acquire the mutex, interruptible::

   int mutex_lock_interruptible_nested(struct mutex *lock,
				       unsigned int subclass);
   int mutex_lock_interruptible(struct mutex *lock);

Acquire the mutex, interruptible, if dec to 0::

   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);

Unlock the mutex::

   void mutex_unlock(struct mutex *lock);

Test if the mutex is taken::

   int mutex_is_locked(struct mutex *lock);
Disadvantages
-------------

Unlike its original design and purpose, 'struct mutex' is among the largest
locks in the kernel. E.g., on x86-64 it is 32 bytes, whereas 'struct semaphore'
is 24 bytes and rw_semaphore is 40 bytes. Larger structure sizes mean more
CPU cache and memory footprint.

When to use mutexes
-------------------

Unless the strict semantics of mutexes are unsuitable and/or the critical
region prevents the lock from being shared, always prefer them to any other
locking primitive.
153