================================================
Completions - "wait for completion" barrier APIs
================================================

Introduction:
-------------

If you have one or more threads that must wait for some kernel activity
to have reached a point or a specific state, completions can provide a
race-free solution to this problem. Semantically they are somewhat like a
pthread_barrier() and have similar use-cases.

Completions are a code synchronization mechanism which is preferable to any
misuse of locks/semaphores and busy-loops. Any time you think of using
yield() or some quirky msleep(1) loop to allow something else to proceed,
you probably want to look into using one of the wait_for_completion*()
calls and complete() instead.

The advantage of using completions is that they have a well defined, focused
purpose which makes it very easy to see the intent of the code, but they
also result in more efficient code as all threads can continue execution
until the result is actually needed, and both the waiting and the signalling
are highly efficient using low level scheduler sleep/wakeup facilities.

Completions are built on top of the waitqueue and wakeup infrastructure of
the Linux scheduler. The event the threads on the waitqueue are waiting for
is reduced to a simple flag in 'struct completion', appropriately called "done".

As completions are scheduling related, the code can be found in
kernel/sched/completion.c.


Usage:
------

There are three main parts to using completions:

 - the initialization of the 'struct completion' synchronization object,
 - the waiting part through a call to one of the variants of wait_for_completion(),
 - the signaling side through a call to complete() or complete_all().

There are also some helper functions for checking the state of completions.
Note that while initialization must happen first, the waiting and signaling
parts can happen in any order. I.e. it's entirely normal for a thread
to have marked a completion as 'done' before another thread checks whether
it has to wait for it.

To use completions you need to #include <linux/completion.h> and
create a static or dynamic variable of type 'struct completion',
which has only two fields::

	struct completion {
		unsigned int done;
		wait_queue_head_t wait;
	};

This provides the ->wait waitqueue to place tasks on for waiting (if any), and
the ->done completion flag for indicating whether it's completed or not.

Completions should be named to refer to the event that is being synchronized on.
A good example is::

	wait_for_completion(&early_console_added);

	complete(&early_console_added);

Good, intuitive naming (as always) helps code readability. Naming a completion
'complete' is not helpful unless the purpose is super obvious...


Initializing completions:
-------------------------

Dynamically allocated completion objects should preferably be embedded in data
structures that are assured to be alive for the life-time of the function/driver,
to prevent races with asynchronous complete() calls from occurring.

Particular care should be taken when using the _timeout() or _killable()/_interruptible()
variants of wait_for_completion(), as it must be assured that memory de-allocation
does not happen until all related activities (complete() or reinit_completion())
have taken place, even if these wait functions return prematurely due to a timeout
or a signal triggering.

Initialization of dynamically allocated completion objects is done via a call to
init_completion()::

	init_completion(&dynamic_object->done);

In this call we initialize the waitqueue and set ->done to 0, i.e. "not completed"
or "not done".

The re-initialization function, reinit_completion(), simply resets the
->done field to 0 ("not done"), without touching the waitqueue.
Callers of this function must make sure that there are no racy
wait_for_completion() calls going on in parallel.

Calling init_completion() on the same completion object twice is
most likely a bug as it re-initializes the queue to an empty queue and
enqueued tasks could get "lost" - use reinit_completion() in that case,
but be aware of other races.

For static declaration and initialization, macros are available.

For static (or global) declarations in file scope you can use
DECLARE_COMPLETION()::

	static DECLARE_COMPLETION(setup_done);
	DECLARE_COMPLETION(setup_done);

Note that in this case the completion is initialized to 'not done' at boot
time (or module load time) and doesn't require an init_completion() call.

When a completion is declared as a local variable within a function,
then the initialization should always use DECLARE_COMPLETION_ONSTACK()
explicitly, not just to make lockdep happy, but also to make it clear
that limited scope has been considered and is intentional::

	DECLARE_COMPLETION_ONSTACK(setup_done);

Note that when using completion objects as local variables you must be
acutely aware of the short life time of the function stack: the function
must not return to a calling context until all activities (such as waiting
threads) have ceased and the completion object is completely unused.

To emphasise this again: in particular when using some of the waiting API variants
with more complex outcomes, such as the timeout or signalling (_timeout(),
_killable() and _interruptible()) variants, the wait might complete
prematurely while the object might still be in use by another thread - and a return
from the wait_for_completion*() caller function will deallocate the function
stack and cause subtle data corruption if a complete() is done in some
other thread. Simple testing might not trigger these kinds of races.

If unsure, use dynamically allocated completion objects, preferably embedded
in some other long lived object that has a boringly long life time which
exceeds the life time of any helper threads using the completion object,
or has a lock or other synchronization mechanism to make sure complete()
is not called on a freed object.

A naive DECLARE_COMPLETION() on the stack triggers a lockdep warning.

Waiting for completions:
------------------------

For a thread to wait for some concurrent activity to finish, it
calls wait_for_completion() on the initialized completion structure::

	void wait_for_completion(struct completion *done)

A typical usage scenario is::

	CPU#1					CPU#2

	struct completion setup_done;

	init_completion(&setup_done);
	initialize_work(...,&setup_done,...);

	/* run non-dependent code */		/* do setup */

	wait_for_completion(&setup_done);	complete(&setup_done);

This does not imply any particular order between wait_for_completion() and
the call to complete() - if the call to complete() happened before the call
to wait_for_completion() then the waiting side simply will continue
immediately as all dependencies are satisfied; if not, it will block until
completion is signaled by complete().
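
The scenario above can be mimicked in user space to see why the ordering
does not matter. The following is a minimal sketch, not kernel code:
struct completion_model, model_init(), model_complete(), model_wait(),
struct setup_ctx, do_setup() and run_scenario() are hypothetical names, and a
pthread mutex/condition variable pair merely stands in for the waitqueue.
It only illustrates that a posted completion is remembered in the counter,
so the waiter proceeds whether it arrives before or after the signal:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* User-space stand-in for 'struct completion' (NOT the kernel type). */
struct completion_model {
	unsigned int done;	/* posted, not yet consumed completions */
	pthread_mutex_t lock;	/* plays the role of the waitqueue lock */
	pthread_cond_t wait;	/* plays the role of the waitqueue */
};

static void model_init(struct completion_model *c)
{
	c->done = 0;
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->wait, NULL);
}

/* Post one completion and wake one waiter, if there is one. */
static void model_complete(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->wait);
	pthread_mutex_unlock(&c->lock);
}

/* Block until a completion has been posted, then consume it. */
static void model_wait(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)
		pthread_cond_wait(&c->wait, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}

struct setup_ctx {
	struct completion_model setup_done;
	int value;		/* result produced by the "CPU#2" side */
};

/* "CPU#2": do the setup, then signal. */
static void *do_setup(void *arg)
{
	struct setup_ctx *ctx = arg;

	ctx->value = 42;
	model_complete(&ctx->setup_done);
	return NULL;
}

/* "CPU#1": start the work, run independent code, then wait. */
static int run_scenario(void)
{
	struct setup_ctx ctx = { .value = 0 };
	pthread_t worker;

	model_init(&ctx.setup_done);
	if (pthread_create(&worker, NULL, do_setup, &ctx) != 0)
		return -1;

	/* non-dependent code would run here */

	model_wait(&ctx.setup_done);	/* returns at once if already posted */
	assert(ctx.value == 42);	/* the setup result is visible now */

	pthread_join(worker, NULL);	/* ctx must outlive the worker */
	return ctx.value;
}
```

The worker is joined before run_scenario() returns because the context
object lives on the stack of that function; this mirrors the life-time
rule for on-stack completion objects.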

Note that wait_for_completion() calls spin_lock_irq()/spin_unlock_irq(),
so it can only be called safely when you know that interrupts are enabled.
Calling it from IRQs-off atomic contexts will result in hard-to-detect
spurious enabling of interrupts.

The default behavior is to wait without a timeout and to mark the task as
uninterruptible. wait_for_completion() and its variants are only safe
in process context (as they can sleep) but not in atomic context,
interrupt context, with IRQs disabled, or with preemption disabled - see also
try_wait_for_completion() below for handling completion in atomic/interrupt
context.

All variants of wait_for_completion() can (obviously) block for a long
time depending on the nature of the activity they are waiting for, so in
most cases you probably don't want to call them with held mutexes.


wait_for_completion*() variants available:
------------------------------------------

The below variants all return status and this status should be checked in
most(/all) cases - in cases where the status is deliberately not checked you
probably want to make a note explaining this (e.g. see
arch/arm/kernel/smp.c:__cpu_up()).

A common problem is assigning the return value to a variable of the wrong
type (the return types differ between the variants), so take care to assign
return values to variables of the proper type.

Checks for the specific meaning of return values have also been found
to be quite inaccurate, e.g. constructs like::

	if (!wait_for_completion_interruptible_timeout(...))

... would execute the same code path for successful completion and for the
interrupted case (both return values are non-zero) - which is probably not
what you want::

	int wait_for_completion_interruptible(struct completion *done)

This function marks the task TASK_INTERRUPTIBLE while it is waiting.
If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise::

	unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout)

The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout'
jiffies. If a timeout occurs it returns 0, else the remaining time in
jiffies (but at least 1).

Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies(),
to make the code largely HZ-invariant.
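
To see why a raw jiffies count is not HZ-invariant, here is a small
user-space sketch. It is an assumption-laden model, not the kernel's
msecs_to_jiffies(): model_msecs_to_jiffies() and check_hz_invariance() are
hypothetical names, 'hz' stands in for the compile-time HZ, and the
conversion is assumed to round up so that a non-zero request never turns
into a zero-jiffy timeout:

```c
#include <assert.h>

/*
 * Simplified model of a milliseconds-to-jiffies conversion that rounds up.
 * NOT the kernel implementation; 'hz' replaces the build-time HZ constant.
 */
static unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/*
 * The same 10 ms request maps to a different jiffies count per HZ value,
 * so a hard-coded count would mean a different wall-clock time per build.
 */
static void check_hz_invariance(void)
{
	assert(model_msecs_to_jiffies(10, 1000) == 10);	/* 1 ms per jiffy */
	assert(model_msecs_to_jiffies(10, 250) == 3);	/* 4 ms per jiffy, rounded up */
	assert(model_msecs_to_jiffies(10, 100) == 1);	/* 10 ms per jiffy */
}
```

A literal timeout of 10 jiffies would therefore mean 10 ms with HZ=1000
but 100 ms with HZ=100, which is what the conversion helpers avoid.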

If the returned timeout value is deliberately ignored a comment should probably explain
why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc())::

	long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout)

This function takes a timeout in jiffies and marks the task as
TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS;
otherwise it returns 0 if the completion timed out, or the remaining time in
jiffies if completion occurred.
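
The three outcomes described above can be told apart with a signed 'long'
and an explicit three-way check. The sketch below is plain user-space C for
illustration only: enum wait_outcome, classify_wait_result() and
naive_check_takes_branch() are hypothetical names, and MODEL_ERESTARTSYS
is a local stand-in for the kernel-internal ERESTARTSYS value:

```c
#include <assert.h>

/* Assumption: local copy of the kernel-internal ERESTARTSYS value. */
#define MODEL_ERESTARTSYS 512

enum wait_outcome {
	WAIT_COMPLETED,		/* ret > 0: remaining jiffies */
	WAIT_TIMED_OUT,		/* ret == 0 */
	WAIT_INTERRUPTED,	/* ret < 0: -ERESTARTSYS per the text above */
};

/* 'ret' must be a signed long, matching the documented return type. */
static enum wait_outcome classify_wait_result(long ret)
{
	if (ret < 0)
		return WAIT_INTERRUPTED;
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_COMPLETED;
}

/* The "if (!ret)" style check: its branch is taken on timeout only. */
static int naive_check_takes_branch(long ret)
{
	return !ret;
}
```

With the naive check, a return of -MODEL_ERESTARTSYS and a positive
remaining time both skip the branch, so the interrupted case is handled
as if the completion had occurred. Storing the result in an unsigned
variable hides the negative value in a similar way.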

Further variants include _killable which uses TASK_KILLABLE as the
designated task state and will return -ERESTARTSYS if it is interrupted,
or 0 if completion was achieved.  There is a _timeout variant as well::

	int wait_for_completion_killable(struct completion *done)
	long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout)

The _io variants wait_for_completion_io() behave the same as the non-_io
variants, except for accounting waiting time as 'waiting on IO', which has
an impact on how the task is accounted in scheduling/IO stats::

	void wait_for_completion_io(struct completion *done)
	unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout)


Signaling completions:
----------------------

A thread that wants to signal that the conditions for continuation have been
achieved calls complete() to signal exactly one of the waiters that it can
continue::

	void complete(struct completion *done)

... or calls complete_all() to signal all current and future waiters::

	void complete_all(struct completion *done)

The signaling will work as expected even if completions are signaled before
a thread starts waiting. This is achieved by the waiter "consuming"
(decrementing) the done field of 'struct completion'. Waiting threads are
woken in the same order in which they were enqueued (FIFO order).

If complete() is called multiple times then this will allow for that number
of waiters to continue - each call to complete() will simply increment the
done field. Calling complete_all() multiple times is a bug though. Both
complete() and complete_all() can be called in IRQ/atomic context safely.
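
The counting behaviour can be traced with a single-threaded model of the
->done field. This is a sketch under assumptions, not the kernel code:
struct done_model and the model_*() and released_*() helpers are
hypothetical names, locking and wakeups are left out, and complete_all()
is assumed to saturate the counter at UINT_MAX so that it is never used up:

```c
#include <assert.h>
#include <limits.h>

/* Only the counter of 'struct completion' is modelled here. */
struct done_model {
	unsigned int done;
};

/* complete(): post one completion, unless the counter is saturated. */
static void model_complete(struct done_model *c)
{
	if (c->done != UINT_MAX)
		c->done++;
}

/* complete_all(): saturate the counter for current and future waiters. */
static void model_complete_all(struct done_model *c)
{
	c->done = UINT_MAX;
}

/* Returns 1 if a waiter may proceed, 0 if a real waiter would sleep. */
static int model_consume(struct done_model *c)
{
	if (c->done == 0)
		return 0;
	if (c->done != UINT_MAX)
		c->done--;
	return 1;
}

/* How many of 'waiters' may proceed after 'posts' calls to complete()? */
static unsigned int released_after_complete(unsigned int posts,
					    unsigned int waiters)
{
	struct done_model c = { .done = 0 };
	unsigned int i, released = 0;

	for (i = 0; i < posts; i++)
		model_complete(&c);
	for (i = 0; i < waiters; i++)
		released += model_consume(&c);
	return released;
}

/* After complete_all(), every waiter proceeds and the counter stays put. */
static unsigned int released_after_complete_all(unsigned int waiters)
{
	struct done_model c = { .done = 0 };
	unsigned int i, released = 0;

	model_complete_all(&c);
	for (i = 0; i < waiters; i++)
		released += model_consume(&c);
	assert(c.done == UINT_MAX);
	return released;
}
```

In this model three calls to complete() release three of five waiters,
while a single complete_all() releases all of them.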

There can only be one thread calling complete() or complete_all() on a
particular 'struct completion' at any time - serialized through the wait
queue spinlock. Any such concurrent calls to complete() or complete_all()
probably are a design bug.

Signaling completion from IRQ context is fine as it will appropriately
lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never
sleep.


try_wait_for_completion()/completion_done():
--------------------------------------------

The try_wait_for_completion() function will not put the thread on the wait
queue but rather returns false if it would need to enqueue (block) the thread,
else it consumes one posted completion and returns true::

	bool try_wait_for_completion(struct completion *done)

Finally, to check the state of a completion without changing it in any way,
call completion_done(), which returns false if there are no posted
completions that were not yet consumed by waiters (implying that there are
waiters) and true otherwise::

	bool completion_done(struct completion *done)

Both try_wait_for_completion() and completion_done() are safe to be called in
IRQ or atomic context.
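
The difference between the two helpers, consuming versus only querying,
can be shown with a single-threaded model of the ->done counter. As a
sketch only: struct done_model, model_try_wait(), model_completion_done()
and demo_try_wait() are hypothetical names, locking is omitted, and the
UINT_MAX check assumes that complete_all() saturates the counter:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Only the counter of 'struct completion' is modelled here. */
struct done_model {
	unsigned int done;
};

/* Non-blocking attempt: consume one posted completion if there is one. */
static bool model_try_wait(struct done_model *c)
{
	if (c->done == 0)
		return false;	/* the blocking variants would sleep here */
	if (c->done != UINT_MAX)
		c->done--;
	return true;
}

/* Pure query: reports the state and never modifies the counter. */
static bool model_completion_done(const struct done_model *c)
{
	return c->done != 0;
}

/* One posted completion, one query and two non-blocking attempts. */
static bool demo_try_wait(void)
{
	struct done_model c = { .done = 0 };

	assert(!model_completion_done(&c));	/* nothing posted yet */
	assert(!model_try_wait(&c));		/* would have to block */

	c.done++;				/* stands in for complete() */

	assert(model_completion_done(&c));	/* query sees the post ... */
	assert(c.done == 1);			/* ... without consuming it */
	assert(model_try_wait(&c));		/* consumes the post */
	assert(!model_try_wait(&c));		/* nothing left */
	return true;
}
```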