======================================
Wound/Wait Deadlock-Proof Mutex Design
======================================

Please read mutex-design.rst first, as it applies to wait/wound mutexes too.

Motivation for WW-Mutexes
-------------------------

GPUs do operations that commonly involve many buffers.  Those buffers
can be shared across contexts/processes, exist in different memory
domains (for example VRAM vs system memory), and so on.  And with
PRIME / dmabuf, they can even be shared across devices.  So there are
a handful of situations where the driver needs to wait for buffers to
become ready.  If you think about this in terms of waiting on a buffer
mutex for it to become available, this presents a problem because
there is no way to guarantee that buffers appear in an execbuf/batch in
the same order in all contexts.  That is directly under the control of
userspace, and a result of the sequence of GL calls that an application
makes, which results in the potential for deadlock.  The problem gets
more complex when you consider that the kernel may need to migrate the
buffer(s) into VRAM before the GPU operates on the buffer(s), which
may in turn require evicting some other buffers (and you don't want to
evict other buffers which are already queued up to the GPU), but for a
simplified understanding of the problem you can ignore this.

The algorithm that the TTM graphics subsystem came up with for dealing with
this problem is quite simple.  For each group of buffers (execbuf) that need
to be locked, the caller would be assigned a unique reservation id/ticket,
from a global counter.  In case of deadlock while locking all the buffers
associated with an execbuf, the one with the lowest reservation ticket (i.e.
the oldest task) wins, and the one with the higher reservation id (i.e. the
younger task) unlocks all of the buffers that it has already locked, and then
tries again.

In the RDBMS literature, a reservation ticket is associated with a transaction,
and the deadlock handling approach is called Wait-Die. The name is based on
the actions of a locking thread when it encounters an already locked mutex.
If the transaction holding the lock is younger, the locking transaction waits.
If the transaction holding the lock is older, the locking transaction backs off
and dies. Hence Wait-Die.
There is also another algorithm called Wound-Wait:
If the transaction holding the lock is younger, the locking transaction
wounds the transaction holding the lock, requesting it to die.
If the transaction holding the lock is older, it waits for the other
transaction. Hence Wound-Wait.
The two algorithms are both fair in that a transaction will eventually succeed.
However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
compared to Wait-Die, but is, on the other hand, associated with more work than
Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
algorithm in that transactions are wounded by other transactions, and that
requires a reliable way to pick up the wounded condition and preempt the
running transaction. Note that this is not the same as process preemption. A
Wound-Wait transaction is considered preempted when it dies (returning
-EDEADLK) following a wound.
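
The difference between the two algorithms thus boils down to the decision taken
on contention. The following is a purely conceptual sketch of that decision,
not the kernel's implementation; the names and the stamp values (lower meaning
older) are illustrative only::

  /*
   * Conceptual sketch only: the decision transaction "self" takes when it
   * finds a w/w mutex held by transaction "holder".  A lower stamp means
   * an older transaction.
   */
  enum contention_action { WAIT_FOR_HOLDER, BACK_OFF_AND_DIE, WOUND_HOLDER };

  static enum contention_action
  on_contention(bool wound_wait, unsigned long self_stamp,
		unsigned long holder_stamp)
  {
	if (wound_wait) {
		/* Wound-Wait: the older transaction preempts the younger. */
		if (self_stamp < holder_stamp)
			return WOUND_HOLDER;	/* holder dies at its next contention */
		return WAIT_FOR_HOLDER;
	}

	/* Wait-Die: the younger transaction backs off and dies (-EDEADLK). */
	if (self_stamp < holder_stamp)
		return WAIT_FOR_HOLDER;
	return BACK_OFF_AND_DIE;
  }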

Concepts
--------

Compared to normal mutexes two additional concepts/objects show up in the lock
interface for w/w mutexes:

Acquire context: To ensure eventual forward progress it is important that a task
trying to acquire locks doesn't grab a new reservation id, but keeps the one it
acquired when starting the lock acquisition. This ticket is stored in the
acquire context. Furthermore the acquire context keeps track of debugging state
to catch w/w mutex interface abuse. An acquire context represents a
transaction.

W/w class: In contrast to normal mutexes the lock class needs to be explicit for
w/w mutexes, since it is required to initialize the acquire context. The lock
class also specifies what algorithm to use, Wound-Wait or Wait-Die.

Furthermore there are three different classes of w/w lock acquire functions:

* Normal lock acquisition with a context, using ww_mutex_lock.

* Slowpath lock acquisition on the contending lock, used by the task that just
  killed its transaction after having dropped all already acquired locks.
  These functions have the _slow postfix.

  From a simple semantics point-of-view the _slow functions are not strictly
  required, since simply calling the normal ww_mutex_lock functions on the
  contending lock (after having dropped all other already acquired locks) will
  work correctly. After all, if no other ww mutex has been acquired yet there's
  no deadlock potential and hence the ww_mutex_lock call will block and not
  prematurely return -EDEADLK. The advantage of the _slow functions is in
  interface safety:

  - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow
    has a void return type. Note that since ww mutex code needs loops/retries
    anyway the __must_check doesn't result in spurious warnings, even though the
    very first lock operation can never fail.
  - When full debugging is enabled ww_mutex_lock_slow checks that all acquired
    ww mutexes have been released (preventing deadlocks) and makes sure that we
    block on the contending lock (preventing spinning through the -EDEADLK
    slowpath until the contended lock can be acquired).

* Functions to only acquire a single w/w mutex, which results in the exact same
  semantics as a normal mutex. This is done by calling ww_mutex_lock with a NULL
  context.

  Again this is not strictly required. But often you only want to acquire a
  single lock in which case it's pointless to set up an acquire context (and so
  better to avoid grabbing a deadlock avoidance ticket).

Of course, all the usual variants for handling wake-ups due to signals are also
provided.
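
For instance, the interruptible flavours slot into the same retry patterns
shown in the Usage section below; the only additional case to handle is -EINTR.
A minimal, hypothetical sketch, where obj, contended_obj and ctx are
placeholders::

  int ret;

  ret = ww_mutex_lock_interruptible(&obj->lock, ctx);
  /* ret may be 0, -EALREADY, -EDEADLK or now also -EINTR */

  /* ... and in the backoff path, after unlocking everything else ... */
  ret = ww_mutex_lock_slow_interruptible(&contended_obj->lock, ctx);
  /* the _slow variant can still be interrupted and return -EINTR */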

Usage
-----

The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
DEFINE_WW_CLASS() (Wound-Wait) or DEFINE_WD_CLASS() (Wait-Die).
As a rough rule of thumb, use Wound-Wait iff you expect the number of
simultaneous competing transactions to be typically small, and you want to
reduce the number of rollbacks.
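
For example, the two flavours are declared with the same pattern and differ
only in the macro used; the class names here are illustrative only::

  /* Wound-Wait class */
  static DEFINE_WW_CLASS(gpu_reservation_ww_class);

  /* Wait-Die class */
  static DEFINE_WD_CLASS(gpu_reservation_wd_class);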

There are three different ways to acquire locks within the same w/w class.
Common definitions for methods #1 and #2::

  static DEFINE_WW_CLASS(ww_class);

  struct obj {
	struct ww_mutex lock;
	/* obj data */
  };

  struct obj_entry {
	struct list_head head;
	struct obj *obj;
  };

Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
This is useful if a list of required objects is already tracked somewhere.
Furthermore the lock helper can propagate the -EALREADY return code back to
the caller as a signal that an object appears twice on the list. This is useful
if the list is constructed from userspace input and the ABI requires userspace
to not have duplicate entries (e.g. for a GPU command buffer submission ioctl)::

  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
	struct obj *res_obj = NULL;
	struct obj_entry *contended_entry = NULL;
	struct obj_entry *entry;
	int ret;

	ww_acquire_init(ctx, &ww_class);

  retry:
	list_for_each_entry (entry, list, head) {
		if (entry->obj == res_obj) {
			res_obj = NULL;
			continue;
		}
		ret = ww_mutex_lock(&entry->obj->lock, ctx);
		if (ret < 0) {
			contended_entry = entry;
			goto err;
		}
	}

	ww_acquire_done(ctx);
	return 0;

  err:
	list_for_each_entry_continue_reverse (entry, list, head)
		ww_mutex_unlock(&entry->obj->lock);

	if (res_obj)
		ww_mutex_unlock(&res_obj->lock);

	if (ret == -EDEADLK) {
		/* we lost out in a seqno race, lock and retry.. */
		ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
		res_obj = contended_entry->obj;
		goto retry;
	}
	ww_acquire_fini(ctx);

	return ret;
  }

Method 2, using a list in execbuf->buffers that can be reordered. The same
semantics of duplicate-entry detection using -EALREADY as in method 1 above
apply. But the list-reordering allows for a bit more idiomatic code::

  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
	struct obj_entry *entry, *entry2;
	int ret;

	ww_acquire_init(ctx, &ww_class);

	list_for_each_entry (entry, list, head) {
		ret = ww_mutex_lock(&entry->obj->lock, ctx);
		if (ret < 0) {
			entry2 = entry;

			list_for_each_entry_continue_reverse (entry2, list, head)
				ww_mutex_unlock(&entry2->obj->lock);

			if (ret != -EDEADLK) {
				ww_acquire_fini(ctx);
				return ret;
			}

			/* we lost out in a seqno race, lock and retry.. */
			ww_mutex_lock_slow(&entry->obj->lock, ctx);

			/*
			 * Move buf to head of the list, this will point
			 * buf->next to the first unlocked entry,
			 * restarting the for loop.
			 */
			list_del(&entry->head);
			list_add(&entry->head, list);
		}
	}

	ww_acquire_done(ctx);
	return 0;
  }

Unlocking works the same way for both methods #1 and #2::

  void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
	struct obj_entry *entry;

	list_for_each_entry (entry, list, head)
		ww_mutex_unlock(&entry->obj->lock);

	ww_acquire_fini(ctx);
  }
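
A caller then typically keeps the acquire context on the stack for the duration
of the whole transaction. A minimal sketch of how the helpers above might be
driven (do_something_with_locked_objs is a placeholder for the actual work)::

  int submit(struct list_head *list)
  {
	struct ww_acquire_ctx ctx;
	int ret;

	ret = lock_objs(list, &ctx);
	if (ret)
		return ret;	/* e.g. -EALREADY; lock_objs already cleaned up */

	/* all objects on the list are now locked */
	do_something_with_locked_objs(list);

	unlock_objs(list, &ctx);	/* also calls ww_acquire_fini() */
	return 0;
  }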

Method 3 is useful if the list of objects is constructed ad-hoc and not upfront,
e.g. when adjusting edges in a graph where each node has its own ww_mutex lock,
and edges can only be changed when holding the locks of all involved nodes. w/w
mutexes are a natural fit for such a case for two reasons:

- They can handle lock-acquisition in any order, which allows us to start
  walking a graph from a starting point and then iteratively discover new edges
  and lock down the nodes those edges connect to.
- Due to the -EALREADY return code signalling that a given object is already
  held, there's no need for additional book-keeping to break cycles in the graph
  or keep track of which locks are already held (when using more than one node
  as a starting point).

Note that this approach differs in two important ways from the above methods:

- Since the list of objects is dynamically constructed (and might very well be
  different when retrying due to hitting the -EDEADLK die condition) there's
  no need to keep any object on a persistent list when it's not locked. We can
  therefore move the list_head into the object itself.
- On the other hand the dynamic object list construction also means that the
  -EALREADY return code can't be propagated.

Note also that method #3 can be combined with method #1 or #2, e.g. to first
lock a list of starting nodes (passed in from userspace) using one of the above
methods, and then lock any additional objects affected by the operations using
method #3 below. The backoff/retry procedure will be a bit more involved, since
when the dynamic locking step hits -EDEADLK we also need to unlock all the
objects acquired with the fixed list. But the w/w mutex debug checks will catch
any interface misuse for these cases.

Also, method #3 can't fail the lock acquisition step since it doesn't return
-EALREADY. Of course this would be different when using the _interruptible
variants, but that's outside the scope of these examples::

  struct obj {
	struct ww_mutex ww_mutex;
	struct list_head locked_list;
  };

  static DEFINE_WW_CLASS(ww_class);

  void __unlock_objs(struct list_head *list)
  {
	struct obj *entry, *temp;

	list_for_each_entry_safe (entry, temp, list, locked_list) {
		/*
		 * Need to do that before unlocking, since only the current
		 * lock holder is allowed to use the object.
		 */
		list_del(&entry->locked_list);
		ww_mutex_unlock(&entry->ww_mutex);
	}
  }

  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
	struct obj *obj;
	int ret;

	ww_acquire_init(ctx, &ww_class);

  retry:
	/* re-init loop start state */
	loop {
		/* magic code which walks over a graph and decides which objects
		 * to lock */

		ret = ww_mutex_lock(&obj->ww_mutex, ctx);
		if (ret == -EALREADY) {
			/* we have that one already, get to the next object */
			continue;
		}
		if (ret == -EDEADLK) {
			__unlock_objs(list);

			ww_mutex_lock_slow(&obj->ww_mutex, ctx);
			list_add(&obj->locked_list, list);
			goto retry;
		}

		/* locked a new object, add it to the list */
		list_add_tail(&obj->locked_list, list);
	}

	ww_acquire_done(ctx);
	return 0;
  }

  void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
	__unlock_objs(list);
	ww_acquire_fini(ctx);
  }

Method 4: Only lock a single object. In that case deadlock detection and
prevention is obviously overkill, since with grabbing just one lock you can't
produce a deadlock within just one class. To simplify this case the w/w mutex
API can be used with a NULL context.
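
This then behaves just like a normal mutex. A minimal sketch, reusing struct
obj from methods #1 and #2 (frob_obj is a placeholder)::

  int frob_obj(struct obj *obj)
  {
	int ret;

	/*
	 * With a NULL context this behaves like a plain mutex_lock() and
	 * can only return 0 here, but the __must_check return value still
	 * has to be consumed.
	 */
	ret = ww_mutex_lock(&obj->lock, NULL);
	if (ret)
		return ret;

	/* ... use obj ... */

	ww_mutex_unlock(&obj->lock);
	return 0;
  }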

Implementation Details
----------------------

Design:
^^^^^^^

  ww_mutex currently encapsulates a struct mutex; this means no extra overhead
  for normal mutex locks, which are far more common. As such there is only a
  small increase in code size if wait/wound mutexes are not used.

  We maintain the following invariants for the wait list:

  (1) Waiters with an acquire context are sorted by stamp order; waiters
      without an acquire context are interspersed in FIFO order.
  (2) For Wait-Die, among waiters with contexts, only the first one can have
      other locks acquired already (ctx->acquired > 0). Note that this waiter
      may come after other waiters without contexts in the list.

  The Wound-Wait preemption is implemented with a lazy-preemption scheme:
  The wounded status of the transaction is checked only when there is
  contention for a new lock and hence a true chance of deadlock. In that
  situation, if the transaction is wounded, it backs off, clears the
  wounded status and retries. A great benefit of implementing preemption in
  this way is that the wounded transaction can identify a contending lock to
  wait for before restarting the transaction. Just blindly restarting the
  transaction would likely make the transaction end up in a situation where
  it would have to back off again.

  In general, not much contention is expected. The locks are typically used to
  serialize access to resources for devices, and optimization focus should
  therefore be directed towards the uncontended cases.

Lockdep:
^^^^^^^^

  Special care has been taken to warn for as many cases of API abuse
  as possible. Some common API abuses will be caught with
  CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended.

  Some of the errors which will be warned about:
   - Forgetting to call ww_acquire_fini or ww_acquire_init.
   - Attempting to lock more mutexes after ww_acquire_done.
   - Attempting to lock the wrong mutex after -EDEADLK and
     unlocking all mutexes.
   - Attempting to lock the right mutex after -EDEADLK,
     before unlocking all mutexes.
   - Calling ww_mutex_lock_slow before -EDEADLK was returned.
   - Unlocking mutexes with the wrong unlock function.
   - Calling one of the ww_acquire_* functions twice on the same context.
   - Using a different ww_class for the mutex than for the ww_acquire_ctx.
   - Normal lockdep errors that can result in deadlocks.

  Some of the lockdep errors that can result in deadlocks:
   - Calling ww_acquire_init to initialize a second ww_acquire_ctx before
     having called ww_acquire_fini on the first.
   - 'normal' deadlocks that can occur.

FIXME:
  Update this section once we have the TASK_DEADLOCK task state flag magic
  implemented.