================================================
Completions - "wait for completion" barrier APIs
================================================

Introduction:
-------------

If you have one or more threads that must wait for some kernel activity
to have reached a point or a specific state, completions can provide a
race-free solution to this problem. Semantically they are somewhat like a
pthread_barrier() and have similar use-cases.

Completions are a code synchronization mechanism which is preferable to any
misuse of locks/semaphores and busy-loops. Any time you think of using
yield() or some quirky msleep(1) loop to allow something else to proceed,
you probably want to look into using one of the wait_for_completion*()
calls and complete() instead.

The advantage of using completions is that they have a well-defined, focused
purpose which makes it very easy to see the intent of the code, but they
also result in more efficient code as all threads can continue execution
until the result is actually needed, and both the waiting and the signaling
are highly efficient, using low-level scheduler sleep/wakeup facilities.

Completions are built on top of the waitqueue and wakeup infrastructure of
the Linux scheduler. The event the threads on the waitqueue are waiting for
is reduced to a simple flag in 'struct completion', appropriately called "done".

As completions are scheduling related, the code can be found in
kernel/sched/completion.c.


Usage:
------

There are three main parts to using completions:

 - the initialization of the 'struct completion' synchronization object,
 - the waiting part through a call to one of the variants of wait_for_completion(),
 - the signaling side through a call to complete() or complete_all().

There are also some helper functions for checking the state of completions.
Note that while initialization must happen first, the waiting and signaling
parts can happen in any order. I.e. it's entirely normal for a thread
to have marked a completion as 'done' before another thread checks whether
it has to wait for it.

To use completions you need to #include <linux/completion.h> and
create a static or dynamic variable of type 'struct completion',
which has only two fields::

	struct completion {
		unsigned int done;
		wait_queue_head_t wait;
	};

This provides the ->wait waitqueue to place tasks on for waiting (if any), and
the ->done completion flag for indicating whether it's completed or not.

Completions should be named to refer to the event that is being synchronized on.
A good example is::

	wait_for_completion(&early_console_added);

	complete(&early_console_added);

Good, intuitive naming (as always) helps code readability. Naming a completion
'complete' is not helpful unless the purpose is super obvious...


Initializing completions:
-------------------------

Dynamically allocated completion objects should preferably be embedded in data
structures that are assured to be alive for the life-time of the function/driver,
to prevent races with asynchronous complete() calls from occurring.

Particular care should be taken when using the _timeout() or _killable()/_interruptible()
variants of wait_for_completion(), as it must be assured that memory de-allocation
does not happen until all related activities (complete() or reinit_completion())
have taken place, even if these wait functions return prematurely due to a timeout
or a signal triggering.

Initialization of dynamically allocated completion objects is done via a call to
init_completion()::

	init_completion(&dynamic_object->done);

In this call we initialize the waitqueue and set ->done to 0, i.e. "not completed"
or "not done".

The re-initialization function, reinit_completion(), simply resets the
->done field to 0 ("not done"), without touching the waitqueue.
Callers of this function must make sure that there are no racy
wait_for_completion() calls going on in parallel.

Calling init_completion() on the same completion object twice is
most likely a bug as it re-initializes the queue to an empty queue and
enqueued tasks could get "lost" - use reinit_completion() in that case,
but be aware of other races.

For static declaration and initialization, macros are available.

For static (or global) declarations in file scope you can use
DECLARE_COMPLETION()::

	static DECLARE_COMPLETION(setup_done);
	DECLARE_COMPLETION(setup_done);

Note that in this case the completion is initialized to 'not done' at boot
time (or module load time) and doesn't require an init_completion() call.

When a completion is declared as a local variable within a function,
then the initialization should always use DECLARE_COMPLETION_ONSTACK()
explicitly, not just to make lockdep happy, but also to make it clear
that limited scope has been considered and is intentional::

	DECLARE_COMPLETION_ONSTACK(setup_done);

Note that when using completion objects as local variables you must be
acutely aware of the short life time of the function stack: the function
must not return to a calling context until all activities (such as waiting
threads) have ceased and the completion object is completely unused.

To emphasise this again: in particular when using some of the waiting API variants
with more complex outcomes, such as the timeout or signaling (_timeout(),
_killable() and _interruptible()) variants, the wait might complete
prematurely while the object might still be in use by another thread - and a return
from the wait_for_completion*() caller function will deallocate the function
stack and cause subtle data corruption if a complete() is done in some
other thread. Simple testing might not trigger these kinds of races.

If unsure, use dynamically allocated completion objects, preferably embedded
in some other long-lived object that has a boringly long life time which
exceeds the life time of any helper threads using the completion object,
or has a lock or other synchronization mechanism to make sure complete()
is not called on a freed object.

A naive DECLARE_COMPLETION() on the stack triggers a lockdep warning.

Waiting for completions:
------------------------

For a thread to wait for some concurrent activity to finish, it
calls wait_for_completion() on the initialized completion structure::

	void wait_for_completion(struct completion *done)

A typical usage scenario is::

	CPU#1                                   CPU#2

	struct completion setup_done;

	init_completion(&setup_done);
	initialize_work(...,&setup_done,...);

	/* run non-dependent code */            /* do setup */

	wait_for_completion(&setup_done);       complete(setup_done);

This does not imply any particular order between wait_for_completion() and
the call to complete() - if the call to complete() happened before the call
to wait_for_completion() then the waiting side will simply continue
immediately as all dependencies are satisfied; if not, it will block until
completion is signaled by complete().

Note that wait_for_completion() calls spin_lock_irq()/spin_unlock_irq(),
so it can only be called safely when you know that interrupts are enabled.
Calling it from IRQs-off atomic contexts will result in hard-to-detect
spurious enabling of interrupts.

The default behavior is to wait without a timeout and to mark the task as
uninterruptible. wait_for_completion() and its variants are only safe
in process context (as they can sleep) but not in atomic context,
interrupt context, with disabled IRQs, or with preemption disabled - see also
try_wait_for_completion() below for handling completion in atomic/interrupt
context.

As all variants of wait_for_completion() can (obviously) block for a long
time depending on the nature of the activity they are waiting for, in
most cases you probably don't want to call them with held mutexes.


wait_for_completion*() variants available:
------------------------------------------

The variants below all return a status, and this status should be checked in
most (if not all) cases - in cases where the status is deliberately not checked
you probably want to make a note explaining this (e.g. see
arch/arm/kernel/smp.c:__cpu_up()).

A common problem is unclean assignment of return types, so take care to
assign return values to variables of the proper type.

Checks for the specific meaning of return values have also been found
to be quite inaccurate, e.g. constructs like::

	if (!wait_for_completion_interruptible_timeout(...))

... would execute the same code path for successful completion and for the
interrupted case - which is probably not what you want::

	int wait_for_completion_interruptible(struct completion *done)

This function marks the task TASK_INTERRUPTIBLE while it is waiting.
If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise::

	unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout)

The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout'
jiffies. If a timeout occurs it returns 0, else the remaining time in
jiffies (but at least 1).

Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies(),
to make the code largely HZ-invariant.

If the returned timeout value is deliberately ignored, a comment should probably
explain why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc())::

	long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout)

This function takes a timeout in jiffies and marks the task as
TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS;
otherwise it returns 0 if the completion timed out, or the remaining time in
jiffies if completion occurred.
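
As a hedged illustration of the three outcomes described above, the following
userspace sketch classifies a value returned by
wait_for_completion_interruptible_timeout(). The enum, the helper names and
the local ERESTARTSYS definition are assumptions made for this example only;
none of them are part of the completion API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption: ERESTARTSYS is kernel-internal, so it is defined locally
 * to let this sketch build outside the kernel.
 */
#define ERESTARTSYS 512

/* Hypothetical names for the three possible outcomes. */
enum wait_outcome {
	WAIT_COMPLETED,
	WAIT_TIMED_OUT,
	WAIT_INTERRUPTED,
};

/* ret is a value returned by wait_for_completion_interruptible_timeout(). */
enum wait_outcome classify_wait_result(long ret)
{
	if (ret < 0)
		return WAIT_INTERRUPTED;	/* -ERESTARTSYS: a signal arrived */
	if (ret == 0)
		return WAIT_TIMED_OUT;		/* the timeout expired first */
	return WAIT_COMPLETED;			/* ret is the remaining jiffies */
}

/* The problematic "if (!ret)" test: true only for the timeout case. */
bool naive_check_sees_timeout(long ret)
{
	return !ret;
}

/* Shows that the naive test cannot tell completion from interruption. */
void demo_wait_result(void)
{
	assert(classify_wait_result(25) == WAIT_COMPLETED);
	assert(classify_wait_result(0) == WAIT_TIMED_OUT);
	assert(classify_wait_result(-ERESTARTSYS) == WAIT_INTERRUPTED);
	assert(naive_check_sees_timeout(25) ==
	       naive_check_sees_timeout(-ERESTARTSYS));
}
```

In kernel code the result would first be stored in a variable of type long,
since an unsigned type would hide the negative error value.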

Further variants include _killable which uses TASK_KILLABLE as the
designated task state and will return -ERESTARTSYS if it is interrupted,
or 0 if completion was achieved. There is a _timeout variant as well::

	int wait_for_completion_killable(struct completion *done)
	long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout)

The _io variants, such as wait_for_completion_io(), behave the same as the
non-_io variants, except for accounting waiting time as 'waiting on IO', which
has an impact on how the task is accounted in scheduling/IO stats::

	void wait_for_completion_io(struct completion *done)
	unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout)


Signaling completions:
----------------------

A thread that wants to signal that the conditions for continuation have been
achieved calls complete() to signal exactly one of the waiters that it can
continue::

	void complete(struct completion *done)

... or calls complete_all() to signal all current and future waiters::

	void complete_all(struct completion *done)

The signaling will work as expected even if completions are signaled before
a thread starts waiting. This is achieved by the waiter "consuming"
(decrementing) the done field of 'struct completion'. Waiting threads
are woken up in the same order in which they were enqueued (FIFO order).

If complete() is called multiple times then this will allow for that number
of waiters to continue - each call to complete() will simply increment the
done field. Calling complete_all() multiple times is a bug though. Both
complete() and complete_all() can be called in IRQ/atomic context safely.

There can only be one thread calling complete() or complete_all() on a
particular 'struct completion' at any time - serialized through the wait
queue spinlock. Any such concurrent calls to complete() or complete_all()
probably are a design bug.

Signaling completion from IRQ context is fine as it will appropriately
lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never
sleep.
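
The counting behavior described above can be sketched with a small
single-threaded userspace model of the done field. This is only an
illustration under stated assumptions: the model_* names are hypothetical,
there is no waitqueue or locking, and representing complete_all() by
saturating the counter at UINT_MAX is an assumption of this model, not a
statement about the kernel implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace model of the ->done counter only (hypothetical type). */
struct model_completion {
	unsigned int done;
};

void model_init(struct model_completion *c)
{
	c->done = 0;	/* "not done" */
}

/* Models complete(): post one completion for exactly one waiter. */
void model_complete(struct model_completion *c)
{
	if (c->done != UINT_MAX)
		c->done++;
}

/* Models complete_all(): assumed here to saturate the counter. */
void model_complete_all(struct model_completion *c)
{
	c->done = UINT_MAX;
}

/*
 * Models a waiter arriving: returns true if it may continue (consuming
 * one posted completion), false if it would have to block.
 */
bool model_waiter_may_proceed(struct model_completion *c)
{
	if (c->done == 0)
		return false;
	if (c->done != UINT_MAX)
		c->done--;	/* consume; a saturated counter stays put */
	return true;
}

void demo_counting(void)
{
	struct model_completion c;

	model_init(&c);
	model_complete(&c);
	model_complete(&c);			/* two posts ...            */
	assert(model_waiter_may_proceed(&c));
	assert(model_waiter_may_proceed(&c));	/* ... release two waiters  */
	assert(!model_waiter_may_proceed(&c));	/* the third would block    */

	model_complete_all(&c);			/* all later waiters pass   */
	assert(model_waiter_may_proceed(&c));
	assert(model_waiter_may_proceed(&c));
}
```

Note that the first two waiters in the demo proceed although complete() was
called before they arrived, which is the "signaled before waiting" case.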


try_wait_for_completion()/completion_done():
--------------------------------------------

The try_wait_for_completion() function will not put the thread on the wait
queue but rather returns false if it would need to enqueue (block) the thread,
else it consumes one posted completion and returns true::

	bool try_wait_for_completion(struct completion *done)

Finally, to check the state of a completion without changing it in any way,
call completion_done(), which returns false if there are no posted
completions that were not yet consumed by waiters (implying that there are
waiters) and true otherwise::

	bool completion_done(struct completion *done)

Both try_wait_for_completion() and completion_done() are safe to be called in
IRQ or atomic context.
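
The difference between the two helpers, one consuming a posted completion
and the other only reporting state, can be illustrated with a minimal
userspace model. The model_* names are hypothetical, locking is omitted,
and the complete_all() case is left out to keep the sketch short; it is not
the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the ->done counter only (hypothetical type). */
struct model_completion {
	unsigned int done;	/* posted, not yet consumed completions */
};

/* Models try_wait_for_completion(): never blocks, consumes one post. */
bool model_try_wait(struct model_completion *c)
{
	if (c->done == 0)
		return false;	/* a real waiter would have to block */
	c->done--;
	return true;
}

/* Models completion_done(): reports the state without changing it. */
bool model_completion_done(const struct model_completion *c)
{
	return c->done != 0;
}

void demo_try_wait(void)
{
	struct model_completion c = { .done = 1 };	/* one post pending */

	assert(model_completion_done(&c));
	assert(model_completion_done(&c));	/* state left untouched   */
	assert(model_try_wait(&c));		/* consumes the one post  */
	assert(!model_try_wait(&c));		/* nothing left to consume */
	assert(!model_completion_done(&c));
}
```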