.. SPDX-License-Identifier: GPL-2.0

================================
Review Checklist for RCU Patches
================================


This document contains a checklist for producing and reviewing patches
that make use of RCU.  Violating any of the rules listed below will
result in the same sorts of problems that leaving out a locking primitive
would cause.  This list is based on experiences reviewing such patches
over a rather long period of time, but improvements are always welcome!

0.	Is RCU being applied to a read-mostly situation?  If the data
	structure is updated more than about 10% of the time, then you
	should strongly consider some other approach, unless detailed
	performance measurements show that RCU is nonetheless the right
	tool for the job.  Yes, RCU does reduce read-side overhead by
	increasing write-side overhead, which is exactly why normal uses
	of RCU will do much more reading than updating.

	Another exception is where performance is not an issue, and RCU
	provides a simpler implementation.  An example of this situation
	is the dynamic NMI code in the Linux 2.6 kernel, at least on
	architectures where NMIs are rare.

	Yet another exception is where the low real-time latency of RCU's
	read-side primitives is critically important.

	One final exception is where RCU readers are used to prevent
	the ABA problem (https://en.wikipedia.org/wiki/ABA_problem)
	for lockless updates.  This does result in the mildly
	counter-intuitive situation where rcu_read_lock() and
	rcu_read_unlock() are used to protect updates.  However, this
	approach provides the same potential simplifications that garbage
	collectors do.

1.	Does the update code have proper mutual exclusion?

	RCU does allow -readers- to run (almost) naked, but -writers- must
	still use some sort of mutual exclusion, such as:

	a.	locking,
	b.	atomic operations, or
	c.	restricting updates to a single task.

	If you choose #b, be prepared to describe how you have handled
	memory barriers on weakly ordered machines (pretty much all of
	them -- even x86 allows later loads to be reordered to precede
	earlier stores), and be prepared to explain why this added
	complexity is worthwhile.  If you choose #c, be prepared to
	explain how this single task does not become a major bottleneck on
	large multiprocessor machines (for example, if the task is updating
	information relating to itself that other tasks can read, there
	by definition can be no bottleneck).  Note that the definition
	of "large" has changed significantly:  Eight CPUs was "large"
	in the year 2000, but a hundred CPUs was unremarkable in 2017.

2.	Do the RCU read-side critical sections make proper use of
	rcu_read_lock() and friends?  These primitives are needed
	to prevent grace periods from ending prematurely, which
	could result in data being unceremoniously freed out from
	under your read-side code, which can greatly increase the
	actuarial risk of your kernel.

	As a rough rule of thumb, any dereference of an RCU-protected
	pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
	rcu_read_lock_sched(), or by the appropriate update-side lock.
	Disabling of preemption can serve as rcu_read_lock_sched(), but
	is less readable and prevents lockdep from detecting locking issues.

	Letting RCU-protected pointers "leak" out of an RCU read-side
	critical section is every bit as bad as letting them leak out
	from under a lock.  Unless, of course, you have arranged some
	other means of protection, such as a lock or a reference count
	-before- letting them out of the RCU read-side critical section.
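
Taking a reference count before a pointer leaves the read-side critical
section might be sketched as follows.  This is a userspace C11 model rather
than kernel code: struct obj, obj_get_unless_zero(), obj_put(), and
lookup_and_pin() are hypothetical names, the C11 atomics merely stand in
for the kernel's reference-counting helpers, and comments mark where
rcu_read_lock() and rcu_read_unlock() would sit in a real reader.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RCU-protected object carrying its own reference count. */
struct obj {
	atomic_int refcnt;	/* 0 means teardown has already started */
	int value;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is refreshed and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	return old == 1;
}

/*
 * Reader-side pattern.  The returned pointer may be used after the
 * critical section only because a reference is held on it.
 */
static struct obj *lookup_and_pin(struct obj *candidate)
{
	struct obj *ret = NULL;

	/* rcu_read_lock() would go here in kernel code. */
	if (candidate && obj_get_unless_zero(candidate))
		ret = candidate;
	/* rcu_read_unlock() would go here in kernel code. */
	return ret;
}
```

A caller that gets a non-NULL pointer back from lookup_and_pin() owns one
reference and must eventually release it with obj_put().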

3.	Does the update code tolerate concurrent accesses?

	The whole point of RCU is to permit readers to run without
	any locks or atomic operations.  This means that readers will
	be running while updates are in progress.  There are a number
	of ways to handle this concurrency, depending on the situation:

	a.	Use the RCU variants of the list and hlist update
		primitives to add, remove, and replace elements on
		an RCU-protected list.  Alternatively, use the other
		RCU-protected data structures that have been added to
		the Linux kernel.

		This is almost always the best approach.

	b.	Proceed as in (a) above, but also maintain per-element
		locks (that are acquired by both readers and writers)
		that guard per-element state.  Of course, fields that
		the readers refrain from accessing can be guarded by
		some other lock acquired only by updaters, if desired.

		This works quite well, also.

	c.	Make updates appear atomic to readers.  For example,
		pointer updates to properly aligned fields will
		appear atomic, as will individual atomic primitives.
		Sequences of operations performed under a lock will -not-
		appear to be atomic to RCU readers, nor will sequences
		of multiple atomic primitives.

		This can work, but is starting to get a bit tricky.

	d.	Carefully order the updates and the reads so that
		readers see valid data at all phases of the update.
		This is often more difficult than it sounds, especially
		given modern CPUs' tendency to reorder memory references.
		One must usually liberally sprinkle memory barriers
		(smp_wmb(), smp_rmb(), smp_mb()) through the code,
		making it difficult to understand and to test.

		It is usually better to group the changing data into
		a separate structure, so that the change may be made
		to appear atomic by updating a pointer to reference
		a new structure containing updated values.
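
The advice in (3d) to group the changing data into a separate structure
can be illustrated with a small userspace C11 model.  It is a sketch under
stated assumptions, not kernel code: struct cfg, cur_cfg, cfg_update(), and
cfg_width() are hypothetical names, the release store stands in for
rcu_assign_pointer(), the acquire load stands in for rcu_dereference(),
and updaters are assumed to be serialized by some update-side lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical pair of values that readers must always see together. */
struct cfg {
	int low;
	int high;
};

/* Stand-in for an RCU-protected pointer; starts out NULL. */
static _Atomic(struct cfg *) cur_cfg;

/*
 * Updater: build a complete new structure, then switch the pointer.
 * The old structure is returned so that the caller can free it once
 * a grace period has elapsed (synchronize_rcu() or kfree_rcu() in
 * the kernel).
 */
static struct cfg *cfg_update(int low, int high)
{
	struct cfg *newc = malloc(sizeof(*newc));
	struct cfg *old;

	assert(newc);
	newc->low = low;
	newc->high = high;
	old = atomic_load_explicit(&cur_cfg, memory_order_relaxed);
	/* Release store: models rcu_assign_pointer(). */
	atomic_store_explicit(&cur_cfg, newc, memory_order_release);
	return old;
}

/* Reader: load the pointer once, then use only that snapshot. */
static int cfg_width(void)
{
	/* Acquire load: models rcu_dereference(). */
	struct cfg *c = atomic_load_explicit(&cur_cfg, memory_order_acquire);

	return c ? c->high - c->low : 0;
}
```

Because a reader loads the pointer exactly once, it sees either the old
(low, high) pair or the new one, never a mixture of the two.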

4.	Weakly ordered CPUs pose special challenges.  Almost all CPUs
	are weakly ordered -- even x86 CPUs allow later loads to be
	reordered to precede earlier stores.  RCU code must take all of
	the following measures to prevent memory-corruption problems:

	a.	Readers must maintain proper ordering of their memory
		accesses.  The rcu_dereference() primitive ensures that
		the CPU picks up the pointer before it picks up the data
		that the pointer points to.  This really is necessary
		on Alpha CPUs.  If you don't believe me, see:

			http://www.openvms.compaq.com/wizard/wiz_2637.html

		The rcu_dereference() primitive is also an excellent
		documentation aid, letting the person reading the
		code know exactly which pointers are protected by RCU.
		Please note that compilers can also reorder code, and
		they are becoming increasingly aggressive about doing
		just that.  The rcu_dereference() primitive therefore also
		prevents destructive compiler optimizations.  However,
		with a bit of devious creativity, it is possible to
		mishandle the return value from rcu_dereference().
		Please see rcu_dereference.txt in this directory for
		more information.

		The rcu_dereference() primitive is used by the
		various "_rcu()" list-traversal primitives, such
		as list_for_each_entry_rcu().  Note that it is
		perfectly legal (if redundant) for update-side code to
		use rcu_dereference() and the "_rcu()" list-traversal
		primitives.  This is particularly useful in code that
		is common to readers and updaters.  However, lockdep
		will complain if you use rcu_dereference() outside
		of an RCU read-side critical section.  See lockdep.txt
		to learn what to do about this.

		Of course, neither rcu_dereference() nor the "_rcu()"
		list-traversal primitives can substitute for a good
		concurrency design coordinating among multiple updaters.

	b.	If the list macros are being used, the list_add_tail_rcu()
		and list_add_rcu() primitives must be used in order
		to prevent weakly ordered machines from misordering
		structure initialization and pointer planting.
		Similarly, if the hlist macros are being used, the
		hlist_add_head_rcu() primitive is required.

	c.	If the list macros are being used, the list_del_rcu()
		primitive must be used to keep list_del()'s pointer
		poisoning from inflicting toxic effects on concurrent
		readers.  Similarly, if the hlist macros are being used,
		the hlist_del_rcu() primitive is required.

		The list_replace_rcu() and hlist_replace_rcu() primitives
		may be used to replace an old structure with a new one
		in their respective types of RCU-protected lists.

	d.	Rules similar to (4b) and (4c) apply to the "hlist_nulls"
		type of RCU-protected linked lists.

	e.	Updates must ensure that initialization of a given
		structure happens before pointers to that structure are
		publicized.  Use the rcu_assign_pointer() primitive
		when publicizing a pointer to a structure that can
		be traversed by an RCU read-side critical section.
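
The "initialize first, publish last" rule from (4b) and (4e), together with
the reader-side ordering from (4a), can be modeled in userspace C11 roughly
as follows.  All names here (struct node, head, push_front(), sum_keys())
are hypothetical, updaters are assumed to be serialized, and the release
and acquire operations are only stand-ins for rcu_assign_pointer() and
rcu_dereference(), which the kernel implements differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical element of a singly linked, reader-traversable list. */
struct node {
	int key;
	_Atomic(struct node *) next;
};

/* List head; starts out NULL (empty list). */
static _Atomic(struct node *) head;

/* Updater (assumed serialized): fill in the node, then publish it. */
static void push_front(struct node *n, int key)
{
	assert(n != NULL);
	n->key = key;
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&head, memory_order_relaxed),
			      memory_order_relaxed);
	/* Release store: models rcu_assign_pointer(). */
	atomic_store_explicit(&head, n, memory_order_release);
}

/* Reader: every pointer load models rcu_dereference(). */
static int sum_keys(void)
{
	int sum = 0;
	struct node *p;

	for (p = atomic_load_explicit(&head, memory_order_acquire); p;
	     p = atomic_load_explicit(&p->next, memory_order_acquire))
		sum += p->key;
	return sum;
}
```

Were the store to head issued before the stores to key and next, a
concurrent reader could follow head to a node whose fields are still
uninitialized, which is the misordering that list_add_rcu() and
rcu_assign_pointer() exist to prevent.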

5.	If call_rcu() or call_srcu() is used, the callback function will
	be called from softirq context.  In particular, it cannot block.

6.	Since synchronize_rcu() can block, it cannot be called
	from any sort of irq context.  The same rule applies
	for synchronize_srcu(), synchronize_rcu_expedited(), and
	synchronize_srcu_expedited().

	The expedited forms of these primitives have the same semantics
	as the non-expedited forms, but expediting is both expensive and
	(with the exception of synchronize_srcu_expedited()) unfriendly
	to real-time workloads.  Use of the expedited primitives should
	be restricted to rare configuration-change operations that would
	not normally be undertaken while a real-time workload is running.
	However, real-time workloads can use the rcupdate.rcu_normal
	kernel boot parameter to completely disable expedited grace
	periods, though this might have performance implications.

	In particular, if you find yourself invoking one of the expedited
	primitives repeatedly in a loop, please do everyone a favor:
	Restructure your code so that it batches the updates, allowing
	a single non-expedited primitive to cover the entire batch.
	This will very likely be faster than the loop containing the
	expedited primitive, and will be much, much easier on the rest
	of the system, especially on any real-time workloads running
	there.
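
The batching restructure recommended here might look roughly like the
following single-threaded userspace sketch.  Everything in it is
illustrative: struct item and remove_doomed() are hypothetical, and
wait_for_grace_period() is a counting stub that stands in for a single
synchronize_rcu() call.  Kernel code would unlink with list_del_rcu()
rather than with plain pointer stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct item {
	struct item *next;	/* reader-visible link, left intact on removal */
	struct item *free_next;	/* private to the updater's batch */
	int doomed;		/* nonzero if this item should be removed */
};

static int grace_periods_waited;

/* Hypothetical stand-in for synchronize_rcu(); here it only counts. */
static void wait_for_grace_period(void)
{
	grace_periods_waited++;
}

/*
 * Unlink every doomed item, wait once for the whole batch, then free.
 * Returns the number of items freed.
 */
static int remove_doomed(struct item **headp)
{
	struct item *batch = NULL;
	struct item **pp = headp;
	int freed = 0;

	assert(headp != NULL);
	while (*pp) {
		struct item *it = *pp;

		if (it->doomed) {
			*pp = it->next;		/* unlink; it->next stays valid */
			it->free_next = batch;	/* chain onto the private batch */
			batch = it;
		} else {
			pp = &it->next;
		}
	}
	if (batch)
		wait_for_grace_period();	/* one wait covers every item */
	while (batch) {
		struct item *it = batch;

		batch = it->free_next;
		free(it);
		freed++;
	}
	return freed;
}
```

The removed items keep their next pointers so that a reader still standing
on one of them can continue down the list, and the batch is chained through
a separate field for the same reason.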

7.	As of v4.20, a given kernel implements only one RCU flavor,
	which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y.
	If the updater uses call_rcu() or synchronize_rcu(),
	then the corresponding readers may use rcu_read_lock() and
	rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
	or any pair of primitives that disables and re-enables preemption,
	for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
	If the updater uses synchronize_srcu() or call_srcu(),
	then the corresponding readers must use srcu_read_lock() and
	srcu_read_unlock(), and with the same srcu_struct.  The rules for
	the expedited primitives are the same as for their non-expedited
	counterparts.  Mixing things up will result in confusion and
	broken kernels, and has even resulted in an exploitable security
	issue.

	One exception to this rule: rcu_read_lock() and rcu_read_unlock()
	may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
	in cases where local bottom halves are already known to be
	disabled, for example, in irq or softirq context.  Commenting
	such cases is a must, of course!  And the jury is still out on
	whether the increased speed is worth it.

8.	Although synchronize_rcu() is slower than is call_rcu(), it
	usually results in simpler code.  So, unless update performance is
	critically important, the updaters cannot block, or the latency of
	synchronize_rcu() is visible from userspace, synchronize_rcu()
	should be used in preference to call_rcu().  Furthermore,
	kfree_rcu() usually results in even simpler code than does
	synchronize_rcu() without synchronize_rcu()'s multi-millisecond
	latency.  So please take advantage of kfree_rcu()'s "fire and
	forget" memory-freeing capabilities where it applies.

	An especially important property of the synchronize_rcu()
	primitive is that it automatically self-limits: if grace periods
	are delayed for whatever reason, then the synchronize_rcu()
	primitive will correspondingly delay updates.  In contrast,
	code using call_rcu() should explicitly limit update rate in
	cases where grace periods are delayed, as failing to do so can
	result in excessive realtime latencies or even OOM conditions.

	Ways of gaining this self-limiting property when using call_rcu()
	include:

	a.	Keeping a count of the number of data-structure elements
		used by the RCU-protected data structure, including
		those waiting for a grace period to elapse.  Enforce a
		limit on this number, stalling updates as needed to allow
		previously deferred frees to complete.  Alternatively,
		limit only the number awaiting deferred free rather than
		the total number of elements.

		One way to stall the updates is to acquire the update-side
		mutex.  (Don't try this with a spinlock -- other CPUs
		spinning on the lock could prevent the grace period
		from ever ending.)  Another way to stall the updates
		is for the updates to use a wrapper function around
		the memory allocator, so that this wrapper function
		simulates OOM when there is too much memory awaiting an
		RCU grace period.  There are of course many other
		variations on this theme.

	b.	Limiting update rate.  For example, if updates occur only
		once per hour, then no explicit rate limiting is
		required, unless your system is already badly broken.
		Older versions of the dcache subsystem take this approach,
		guarding updates with a global lock, limiting their rate.

	c.	Trusted update -- if updates can only be done manually by
		superuser or some other trusted user, then it might not
		be necessary to automatically limit them.  The theory
		here is that superuser already has lots of ways to crash
		the machine.

	d.	Periodically invoke synchronize_rcu(), permitting a limited
		number of updates per grace period.

	The same cautions apply to call_srcu() and kfree_rcu().

	Note that although these primitives do take action to avoid memory
	exhaustion when any given CPU has too many callbacks, a determined
	user could still exhaust memory.  This is especially the case
	if a system with a large number of CPUs has been configured to
	offload all of its RCU callbacks onto a single CPU, or if the
	system has relatively little free memory.
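
Approach (8a), in its variant that limits only the number of elements
awaiting deferred free, could be sketched along the following lines.  This
is a userspace C11 model with hypothetical names (MAX_PENDING,
pending_frees, reserve_deferred_free(), deferred_free_done()), and the
limit of 2 is chosen only to keep the example small.  In kernel code the
reservation would precede call_rcu() and the release would happen in the
callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_PENDING 2	/* hypothetical limit; tune per data structure */

/* Number of elements currently waiting for a grace period. */
static atomic_int pending_frees;

/*
 * Called before queuing a deferred free.  Returns false when too many
 * elements are already waiting, in which case the caller must stall
 * or fail the update instead of queuing more work.
 */
static bool reserve_deferred_free(void)
{
	int old = atomic_fetch_add(&pending_frees, 1);

	if (old >= MAX_PENDING) {
		atomic_fetch_sub(&pending_frees, 1);	/* undo the reservation */
		return false;
	}
	return true;
}

/* Called from the callback once the element has actually been freed. */
static void deferred_free_done(void)
{
	int old = atomic_fetch_sub(&pending_frees, 1);

	assert(old > 0);
	(void)old;	/* silence unused warnings when NDEBUG is set */
}
```

An updater that gets false back can block on the update-side mutex or
return an error to its caller, but should not spin on a spinlock, for the
reason given in (8a).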

9.	All RCU list-traversal primitives, which include
	rcu_dereference(), list_for_each_entry_rcu(), and
	list_for_each_safe_rcu(), must be either within an RCU read-side
	critical section or must be protected by appropriate update-side
	locks.  RCU read-side critical sections are delimited by
	rcu_read_lock() and rcu_read_unlock(), or by similar primitives
	such as rcu_read_lock_bh() and rcu_read_unlock_bh().  In the
	latter case, the matching rcu_dereference() variant, here
	rcu_dereference_bh(), must be used in order to keep lockdep happy.

	The reason that it is permissible to use RCU list-traversal
	primitives when the update-side lock is held is that doing so
	can be quite helpful in reducing code bloat when common code is
	shared between readers and updaters.  Additional primitives
	are provided for this case, as discussed in lockdep.txt.

10.	Conversely, if you are in an RCU read-side critical section,
	and you don't hold the appropriate update-side lock, you -must-
	use the "_rcu()" variants of the list macros.  Failing to do so
	will break Alpha, cause aggressive compilers to generate bad code,
	and confuse people trying to read your code.

11.	Any lock acquired by an RCU callback must be acquired elsewhere
	with softirq disabled, e.g., via spin_lock_irqsave(),
	spin_lock_bh(), etc.  Failing to disable softirq on a given
	acquisition of that lock will result in deadlock as soon as
	the RCU softirq handler happens to run your RCU callback while
	interrupting that acquisition's critical section.

12.	RCU callbacks can be and are executed in parallel.  In many cases,
	the callback code is simply a wrapper around kfree(), so that this
	is not an issue (or, more accurately, to the extent that it is
	an issue, the memory-allocator locking handles it).  However,
	if the callbacks do manipulate a shared data structure, they
	must use whatever locking or other synchronization is required
	to safely access and/or modify that data structure.

	Do not assume that RCU callbacks will be executed on the same
	CPU that executed the corresponding call_rcu() or call_srcu().
	For example, if a given CPU goes offline while having an RCU
	callback pending, then that RCU callback will execute on some
	surviving CPU.  (If this was not the case, a self-spawning RCU
	callback would prevent the victim CPU from ever going offline.)
	Furthermore, CPUs designated by rcu_nocbs= might well -always-
	have their RCU callbacks executed on some other CPUs.  In fact,
	for some real-time workloads, this is the whole point of using
	the rcu_nocbs= kernel boot parameter.
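
A callback that wraps a free operation and also touches shared state might
be modeled as below.  This is a userspace sketch, not kernel code: struct
cb_head is a hypothetical stand-in for struct rcu_head, defer_free() is a
stand-in for call_rcu() that merely records the callback, and the atomic
counter represents "whatever synchronization is required" for the shared
statistic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct rcu_head. */
struct cb_head {
	void (*func)(struct cb_head *head);
};

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical object with an embedded callback head. */
struct foo {
	int data;
	struct cb_head cb;
};

/* Shared statistic updated by every callback, hence atomic. */
static atomic_long foo_freed;

/* Callback: recover the enclosing object, account for it, free it. */
static void foo_reclaim(struct cb_head *head)
{
	struct foo *fp = container_of(head, struct foo, cb);

	/* Safe even if several callbacks run on different CPUs at once. */
	atomic_fetch_add(&foo_freed, 1);
	free(fp);
}

/* Stand-in for call_rcu(): only records which callback to run later. */
static void defer_free(struct foo *fp)
{
	assert(fp != NULL);
	fp->cb.func = foo_reclaim;
}
```

Had foo_freed been a plain long incremented without a lock, two callbacks
running in parallel could lose an update.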

13.	Unlike other forms of RCU, it -is- permissible to block in an
	SRCU read-side critical section (demarked by srcu_read_lock()
	and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
	Please note that if you don't need to sleep in read-side critical
	sections, you should be using RCU rather than SRCU, because RCU
	is almost always faster and easier to use than is SRCU.

	Also unlike other forms of RCU, explicit initialization and
	cleanup are required either at build time via DEFINE_SRCU()
	or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
	and cleanup_srcu_struct().  These last two are passed a
	"struct srcu_struct" that defines the scope of a given
	SRCU domain.  Once initialized, the srcu_struct is passed
	to srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(),
	synchronize_srcu_expedited(), and call_srcu().  A given
	synchronize_srcu() waits only for SRCU read-side critical
	sections governed by srcu_read_lock() and srcu_read_unlock()
	calls that have been passed the same srcu_struct.  This property
	is what makes sleeping read-side critical sections tolerable --
	a given subsystem delays only its own updates, not those of other
	subsystems using SRCU.  Therefore, SRCU is less prone to OOM the
	system than RCU would be if RCU's read-side critical sections
	were permitted to sleep.

	The ability to sleep in read-side critical sections does not
	come for free.  First, corresponding srcu_read_lock() and
	srcu_read_unlock() calls must be passed the same srcu_struct.
	Second, grace-period-detection overhead is amortized only
	over those updates sharing a given srcu_struct, rather than
	being globally amortized as it is for other forms of RCU.
	Therefore, SRCU should be used in preference to rw_semaphore
	only in extremely read-intensive situations, or in situations
	requiring SRCU's read-side deadlock immunity or low read-side
	realtime latency.  You should also consider percpu_rw_semaphore
	when you need lightweight readers.

	SRCU's expedited primitive (synchronize_srcu_expedited())
	never sends IPIs to other CPUs, so it is easier on
	real-time workloads than is synchronize_rcu_expedited().

	Note that rcu_assign_pointer() relates to SRCU just as it does to
	other forms of RCU, but instead of rcu_dereference() you should
	use srcu_dereference() in order to avoid lockdep splats.

14.	The whole point of call_rcu(), synchronize_rcu(), and friends
	is to wait until all pre-existing readers have finished before
	carrying out some otherwise-destructive operation.  It is
	therefore critically important to -first- remove any path
	that readers can follow that could be affected by the
	destructive operation, and -only- -then- invoke call_rcu(),
	synchronize_rcu(), or friends.

	Because these primitives only wait for pre-existing readers, it
	is the caller's responsibility to guarantee that any subsequent
	readers will execute safely.

15.	The various RCU read-side primitives do -not- necessarily contain
	memory barriers.  You should therefore plan for the CPU
	and the compiler to freely reorder code into and out of RCU
	read-side critical sections.  It is the responsibility of the
	RCU update-side primitives to deal with this.

	For SRCU readers, you can use smp_mb__after_srcu_read_unlock()
	immediately after an srcu_read_unlock() to get a full barrier.

16.	Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
	__rcu sparse checks to validate your RCU code.  These can help
	find problems as follows:

	CONFIG_PROVE_LOCKING:
		check that accesses to RCU-protected data
		structures are carried out under the proper RCU
		read-side critical section, while holding the right
		combination of locks, or whatever other conditions
		are appropriate.

	CONFIG_DEBUG_OBJECTS_RCU_HEAD:
		check that you don't pass the
		same object to call_rcu() (or friends) before an RCU
		grace period has elapsed since the last time that you
		passed that same object to call_rcu() (or friends).

	__rcu sparse checks:
		tag the pointer to the RCU-protected data
		structure with __rcu, and sparse will warn you if you
		access that pointer without the services of one of the
		variants of rcu_dereference().

	These debugging aids can help you find problems that are
	otherwise extremely difficult to spot.

17.	If you register a callback using call_rcu() or call_srcu(), and
	pass in a function defined within a loadable module, then it is
	necessary to wait for all pending callbacks to be invoked after
	the last invocation and before unloading that module.  Note that
	it is absolutely -not- sufficient to wait for a grace period!
	The current (say) synchronize_rcu() implementation is -not-
	guaranteed to wait for callbacks registered on other CPUs.
	Or even on the current CPU if that CPU recently went offline
	and came back online.

	You instead need to use one of the barrier functions:

	-	call_rcu() -> rcu_barrier()
	-	call_srcu() -> srcu_barrier()

	However, these barrier functions are absolutely -not- guaranteed
	to wait for a grace period.  In fact, if there are no call_rcu()
	callbacks waiting anywhere in the system, rcu_barrier() is within
	its rights to return immediately.

	So if you need to wait for both an RCU grace period and for
	all pre-existing call_rcu() callbacks, you will need to execute
	both rcu_barrier() and synchronize_rcu(), if necessary, using
	something like workqueues to execute them concurrently.

	See rcubarrier.txt for more information.