.. _whatisrcu_doc:

What is RCU?  --  "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1.	What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
| 2.	What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
| 3.	RCU part 3: the RCU API      http://lwn.net/Articles/264090/
| 4.	The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
|	2010 Big API Table           http://lwn.net/Articles/419086/
| 5.	The RCU API, 2014 Edition    http://lwn.net/Articles/609904/
|	2014 Big API Table           http://lwn.net/Articles/609973/


What is RCU?

RCU is a synchronization mechanism, added to the Linux kernel during
the 2.5 development effort, that is optimized for read-mostly
situations.  Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge.  Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU.  Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU.  This document provides several
different paths, as follows:

:ref:`1.	RCU OVERVIEW <1_whatisRCU>`

:ref:`2.	WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3.	WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4.	WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5.	WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6.	ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7.	FULL LIST OF RCU APIs <7_whatisRCU>`

:ref:`8.	ANSWERS TO QUICK QUIZZES <8_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Section 6.  Section 7 serves as an index to the docbook API
documentation, and Section 8 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway.  ;-)

.. _1_whatisRCU:

1.  RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers will
see either the old or the new version of the data structure rather than a
partially updated reference.  The reclamation phase does the work of reclaiming
(e.g., freeing) the data items removed from the data structure during the
removal phase.  Because reclaiming data items can disrupt any readers
concurrently referencing those data items, the reclamation phase must
not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a.	Remove pointers to a data structure, so that subsequent
	readers cannot gain a reference to it.

b.	Wait for all previous readers to complete their RCU read-side
	critical sections.

c.	At this point, there cannot be any readers who hold references
	to the data structure, so it now may safely be reclaimed
	(e.g., kfree()d).

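For example, for an element of an RCU-protected linked list, the three
steps might map onto code roughly as in the following sketch (the _rcu
list primitives are covered in later sections; foo_lock, the ->list
field, and np are hypothetical)::

	spin_lock(&foo_lock);		/* Exclude other updaters. */
	list_del_rcu(&np->list);	/* Step (a): unlink the element. */
	spin_unlock(&foo_lock);
	synchronize_rcu();		/* Step (b): wait for readers. */
	kfree(np);			/* Step (c): reclaim the element. */
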
Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, and in some cases absolutely
no synchronization at all.  In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

.. _2_whatisRCU:

2.  WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a.	rcu_read_lock()
b.	rcu_read_unlock()
c.	synchronize_rcu() / call_rcu()
d.	rcu_assign_pointer()
e.	rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

The five core RCU APIs are described below; the other 18 will be enumerated
later.  See the kernel docbook documentation for more info, or look directly
at the function header comments.

rcu_read_lock()
^^^^^^^^^^^^^^^
	void rcu_read_lock(void);

	Used by a reader to inform the reclaimer that the reader is
	entering an RCU read-side critical section.  It is illegal
	to block while in an RCU read-side critical section, though
	kernels built with CONFIG_PREEMPT_RCU can preempt RCU
	read-side critical sections.  Any RCU-protected data structure
	accessed during an RCU read-side critical section is guaranteed to
	remain unreclaimed for the full duration of that critical section.
	Reference counts may be used in conjunction with RCU to maintain
	longer-term references to data structures.
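
	For example, a reader might promote an RCU-protected reference
	into a longer-term reference count roughly as follows (a sketch
	only; the ->ref kref field and the gp pointer are hypothetical)::

		rcu_read_lock();
		p = rcu_dereference(gp);
		if (p && !kref_get_unless_zero(&p->ref))
			p = NULL;  /* Being freed; do not use. */
		rcu_read_unlock();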

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
	void rcu_read_unlock(void);

	Used by a reader to inform the reclaimer that the reader is
	exiting an RCU read-side critical section.  Note that RCU
	read-side critical sections may be nested and/or overlapping.

synchronize_rcu()
^^^^^^^^^^^^^^^^^
	void synchronize_rcu(void);

	Marks the end of updater code and the beginning of reclaimer
	code.  It does this by blocking until all pre-existing RCU
	read-side critical sections on all CPUs have completed.
	Note that synchronize_rcu() will **not** necessarily wait for
	any subsequent RCU read-side critical sections to complete.
	For example, consider the following sequence of events::

	         CPU 0                  CPU 1                 CPU 2
	     ----------------- ------------------------- ---------------
	 1.  rcu_read_lock()
	 2.                    enters synchronize_rcu()
	 3.                                               rcu_read_lock()
	 4.  rcu_read_unlock()
	 5.                     exits synchronize_rcu()
	 6.                                              rcu_read_unlock()

	To reiterate, synchronize_rcu() waits only for ongoing RCU
	read-side critical sections to complete, not necessarily for
	any that begin after synchronize_rcu() is invoked.

	Of course, synchronize_rcu() does not necessarily return
	**immediately** after the last pre-existing RCU read-side critical
	section completes.  For one thing, there might well be scheduling
	delays.  For another thing, many RCU implementations process
	requests in batches in order to improve efficiencies, which can
	further delay synchronize_rcu().

	Since synchronize_rcu() is the API that must figure out when
	readers are done, its implementation is key to RCU.  For RCU
	to be useful in all but the most read-intensive situations,
	synchronize_rcu()'s overhead must also be quite small.

	The call_rcu() API is a callback form of synchronize_rcu(),
	and is described in more detail in a later section.  Instead of
	blocking, it registers a function and argument which are invoked
	after all ongoing RCU read-side critical sections have completed.
	This callback variant is particularly useful in situations where
	it is illegal to block or where update-side performance is
	critically important.

	However, the call_rcu() API should not be used lightly, as use
	of the synchronize_rcu() API generally results in simpler code.
	In addition, the synchronize_rcu() API has the nice property
	of automatically limiting update rate should grace periods
	be delayed.  This property results in system resilience in the
	face of denial-of-service attacks.  Code using call_rcu() should
	limit update rate in order to gain this same sort of resilience.
	See checklist.txt for some approaches to limiting the update rate.

rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
	void rcu_assign_pointer(p, typeof(p) v);

	Yes, rcu_assign_pointer() **is** implemented as a macro, though it
	would be cool to be able to declare a function in this manner.
	(Compiler experts will no doubt disagree.)

	The updater uses this function to assign a new value to an
	RCU-protected pointer, in order to safely communicate the change
	in value from the updater to the reader.  This macro does not
	evaluate to an rvalue, but it does execute any memory-barrier
	instructions required for a given CPU architecture.

	Perhaps just as important, it serves to document (1) which
	pointers are protected by RCU and (2) the point at which a
	given structure becomes accessible to other CPUs.  That said,
	rcu_assign_pointer() is most frequently used indirectly, via
	the _rcu list-manipulation primitives such as list_add_rcu().
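
	For example, an updater might publish a fully initialized
	structure as in the following sketch (gp is a hypothetical
	RCU-protected global pointer)::

		p = kmalloc(sizeof(*p), GFP_KERNEL);
		if (p) {
			p->a = 1;
			p->b = 2;
			rcu_assign_pointer(gp, p);  /* Publication point. */
		}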

rcu_dereference()
^^^^^^^^^^^^^^^^^
	typeof(p) rcu_dereference(p);

	Like rcu_assign_pointer(), rcu_dereference() must be implemented
	as a macro.

	The reader uses rcu_dereference() to fetch an RCU-protected
	pointer, which returns a value that may then be safely
	dereferenced.  Note that rcu_dereference() does not actually
	dereference the pointer; instead, it protects the pointer for
	later dereferencing.  It also executes any needed memory-barrier
	instructions for a given CPU architecture.  Currently, only Alpha
	needs memory barriers within rcu_dereference() -- on other CPUs,
	it compiles to nothing, not even a compiler directive.

	Common coding practice uses rcu_dereference() to copy an
	RCU-protected pointer to a local variable, then dereferences
	this local variable, for example as follows::

		p = rcu_dereference(head.next);
		return p->data;

	However, in this case, one could just as easily combine these
	into one statement::

		return rcu_dereference(head.next)->data;

	If you are going to be fetching multiple fields from the
	RCU-protected structure, using the local variable is of
	course preferred.  Repeated rcu_dereference() calls look
	ugly, do not guarantee that the same pointer will be returned
	if an update happened while in the critical section, and incur
	unnecessary overhead on Alpha CPUs.

	Note that the value returned by rcu_dereference() is valid
	only within the enclosing RCU read-side critical section [1]_.
	For example, the following is **not** legal::

		rcu_read_lock();
		p = rcu_dereference(head.next);
		rcu_read_unlock();
		x = p->address;	/* BUG!!! */
		rcu_read_lock();
		y = p->data;	/* BUG!!! */
		rcu_read_unlock();

	Holding a reference from one RCU read-side critical section
	to another is just as illegal as holding a reference from
	one lock-based critical section to another!  Similarly,
	using a reference outside of the critical section in which
	it was acquired is just as illegal as doing so with normal
	locking.
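
	A legal alternative is to re-fetch the pointer within each
	critical section, as in the following sketch, which may of
	course observe a different structure the second time::

		rcu_read_lock();
		p = rcu_dereference(head.next);
		x = p->address;
		rcu_read_unlock();

		rcu_read_lock();
		p = rcu_dereference(head.next);	/* Possibly a new element. */
		y = p->data;
		rcu_read_unlock();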

	As with rcu_assign_pointer(), an important function of
	rcu_dereference() is to document which pointers are protected by
	RCU, in particular, flagging a pointer that is subject to changing
	at any time, including immediately after the rcu_dereference().
	And, again like rcu_assign_pointer(), rcu_dereference() is
	typically used indirectly, via the _rcu list-manipulation
	primitives, such as list_for_each_entry_rcu() [2]_.

.. 	[1] The variant rcu_dereference_protected() can be used outside
	of an RCU read-side critical section as long as the usage is
	protected by locks acquired by the update-side code.  This variant
	avoids the lockdep warning that would happen when using (for
	example) rcu_dereference() without rcu_read_lock() protection.
	Using rcu_dereference_protected() also has the advantage
	of permitting compiler optimizations that rcu_dereference()
	must prohibit.  The rcu_dereference_protected() variant takes
	a lockdep expression to indicate which locks must be acquired
	by the caller.  If the indicated protection is not provided,
	a lockdep splat is emitted.  See Documentation/RCU/Design/Requirements/Requirements.rst
	and the API's code comments for more details and example usage.

.. 	[2] If the list_for_each_entry_rcu() instance might be used by
	update-side code as well as by RCU readers, then an additional
	lockdep expression can be added to its list of arguments.
	For example, given an additional "lockdep_is_held(&mylock)" argument,
	the RCU lockdep code would complain only if this instance was
	invoked outside of an RCU read-side critical section and without
	the protection of mylock.
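
For example, such a dual-use traversal might look like the following
sketch (mylock, the list head, the ->list member, and the helper are
hypothetical)::

	list_for_each_entry_rcu(p, &head, list, lockdep_is_held(&mylock)) {
		do_something_with(p);
	}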

The following diagram shows how each API communicates among the
reader, updater, and reclaimer.
::


	    rcu_assign_pointer()
	                            +--------+
	    +---------------------->| reader |---------+
	    |                       +--------+         |
	    |                           |              |
	    |                           |              | Protect:
	    |                           |              | rcu_read_lock()
	    |                           |              | rcu_read_unlock()
	    |        rcu_dereference()  |              |
	    +---------+                 |              |
	    | updater |<----------------+              |
	    +---------+                                V
	    |                                    +-----------+
	    +----------------------------------->| reclaimer |
	                                         +-----------+
	      Defer:
	      synchronize_rcu() & call_rcu()


The RCU infrastructure observes the time sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.

There are at least three flavors of RCU usage in the Linux kernel. The diagram
above shows the most common one. On the updater side, the rcu_assign_pointer(),
synchronize_rcu() and call_rcu() primitives used are the same for all three
flavors. However, for protection (on the reader side), the primitives used vary
depending on the flavor:

a.	rcu_read_lock() / rcu_read_unlock()
	rcu_dereference()

b.	rcu_read_lock_bh() / rcu_read_unlock_bh()
	local_bh_disable() / local_bh_enable()
	rcu_dereference_bh()

c.	rcu_read_lock_sched() / rcu_read_unlock_sched()
	preempt_disable() / preempt_enable()
	local_irq_save() / local_irq_restore()
	hardirq enter / hardirq exit
	NMI enter / NMI exit
	rcu_dereference_sched()

These three flavors are used as follows:

a.	RCU applied to normal data structures.

b.	RCU applied to networking data structures that may be subjected
	to remote denial-of-service attacks.

c.	RCU applied to scheduler and interrupt/NMI-handler tasks.

Again, most uses will be of (a).  The (b) and (c) cases are important
for specialized uses, but are relatively uncommon.
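
For example, a flavor (b) reader might look like the following sketch
(gbl_ptr and do_something_with() are hypothetical)::

	rcu_read_lock_bh();
	p = rcu_dereference_bh(gbl_ptr);
	if (p)
		do_something_with(p);
	rcu_read_unlock_bh();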

.. _3_whatisRCU:

3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-----------------------------------------------

This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure.  More-typical
uses of RCU may be found in :ref:`listRCU.rst <list_rcu_doc>`,
:ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst <NMI_rcu_doc>`.
::

	struct foo {
		int a;
		char b;
		long c;
	};
	DEFINE_SPINLOCK(foo_mutex);

	struct foo __rcu *gbl_foo;

	/*
	 * Create a new struct foo that is the same as the one currently
	 * pointed to by gbl_foo, except that field "a" is replaced
	 * with "new_a".  Points gbl_foo to the new structure, and
	 * frees up the old structure after a grace period.
	 *
	 * Uses rcu_assign_pointer() to ensure that concurrent readers
	 * see the initialized version of the new structure.
	 *
	 * Uses synchronize_rcu() to ensure that any readers that might
	 * have references to the old structure complete before freeing
	 * the old structure.
	 */
	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		synchronize_rcu();
		kfree(old_fp);
	}

	/*
	 * Return the value of field "a" of the current gbl_foo
	 * structure.  Use rcu_read_lock() and rcu_read_unlock()
	 * to ensure that the structure does not get deleted out
	 * from under us, and use rcu_dereference() to ensure that
	 * we see the initialized version of the structure (important
	 * for DEC Alpha and for people reading the code).
	 */
	int foo_get_a(void)
	{
		int retval;

		rcu_read_lock();
		retval = rcu_dereference(gbl_foo)->a;
		rcu_read_unlock();
		return retval;
	}

So, to sum up:

-	Use rcu_read_lock() and rcu_read_unlock() to guard RCU
	read-side critical sections.

-	Within an RCU read-side critical section, use rcu_dereference()
	to dereference RCU-protected pointers.

-	Use some solid scheme (such as locks or semaphores) to
	keep concurrent updates from interfering with each other.

-	Use rcu_assign_pointer() to update an RCU-protected pointer.
	This primitive protects concurrent readers from the updater,
	**not** concurrent updates from each other!  You therefore still
	need to use locking (or something similar) to keep concurrent
	rcu_assign_pointer() primitives from interfering with each other.

-	Use synchronize_rcu() **after** removing a data element from an
	RCU-protected data structure, but **before** reclaiming/freeing
	the data element, in order to wait for the completion of all
	RCU read-side critical sections that might be referencing that
	data item.

See checklist.txt for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in :ref:`listRCU.rst
<list_rcu_doc>`, :ref:`arrayRCU.rst <array_rcu_doc>`, and :ref:`NMI-RCU.rst
<NMI_rcu_doc>`.

.. _4_whatisRCU:

4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
--------------------------------------------

In the example above, foo_update_a() blocks until a grace period elapses.
This is quite simple, but in some cases one cannot afford to wait so
long -- there might be other high-priority work to be done.

In such cases, one uses call_rcu() rather than synchronize_rcu().
The call_rcu() API is as follows::

	void call_rcu(struct rcu_head *head,
		      void (*func)(struct rcu_head *head));

This function invokes func(head) after a grace period has elapsed.
This invocation might happen from either softirq or process context,
so the function is not permitted to block.  The foo struct needs to
have an rcu_head structure added, perhaps as follows::

	struct foo {
		int a;
		char b;
		long c;
		struct rcu_head rcu;
	};

The foo_update_a() function might then be written as follows::

	/*
	 * Create a new struct foo that is the same as the one currently
	 * pointed to by gbl_foo, except that field "a" is replaced
	 * with "new_a".  Points gbl_foo to the new structure, and
	 * frees up the old structure after a grace period.
	 *
	 * Uses rcu_assign_pointer() to ensure that concurrent readers
	 * see the initialized version of the new structure.
	 *
	 * Uses call_rcu() to ensure that any readers that might have
	 * references to the old structure complete before freeing the
	 * old structure.
	 */
	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		call_rcu(&old_fp->rcu, foo_reclaim);
	}

The foo_reclaim() function might appear as follows::

	void foo_reclaim(struct rcu_head *rp)
	{
		struct foo *fp = container_of(rp, struct foo, rcu);

		foo_cleanup(fp->a);

		kfree(fp);
	}

The container_of() primitive is a macro that, given a pointer into a
struct, the type of the struct, and the pointed-to field within the
struct, returns a pointer to the beginning of the struct.
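
In foo_reclaim(), for example, the expression::

	container_of(rp, struct foo, rcu)

subtracts offsetof(struct foo, rcu) from rp, yielding a pointer to the
enclosing struct foo.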

The use of call_rcu() permits the caller of foo_update_a() to
immediately regain control, without needing to worry further about the
old version of the newly updated element.  It also clearly shows the
RCU distinction between updater, namely foo_update_a(), and reclaimer,
namely foo_reclaim().

The summary of advice is the same as for the previous section, except
that we are now using call_rcu() rather than synchronize_rcu():

-	Use call_rcu() **after** removing a data element from an
	RCU-protected data structure in order to register a callback
	function that will be invoked after the completion of all RCU
	read-side critical sections that might be referencing that
	data item.

If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback::

	kfree_rcu(old_fp, rcu);

Again, see checklist.txt for additional rules governing the use of RCU.

.. _5_whatisRCU:

5.  WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
------------------------------------------------

One of the nice things about RCU is that it has extremely simple "toy"
implementations that are a good first step towards understanding the
production-quality implementations in the Linux kernel.  This section
presents two such "toy" implementations of RCU, one that is implemented
in terms of familiar locking primitives, and another that more closely
resembles "classic" RCU.  Both are way too simple for real-world use,
lacking both functionality and performance.  However, they are useful
in getting a feel for how RCU works.  See kernel/rcu/update.c for a
production-quality implementation, and see:

	http://www.rdrop.com/users/paulmck/RCU

for papers describing the Linux kernel RCU implementation.  The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
more details on the current implementation as of early 2004.


5A.  "TOY" IMPLEMENTATION #1: LOCKING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
familiar locking primitives.  Its overhead makes it a non-starter for
real-life use, as does its lack of scalability.  It is also unsuitable
for realtime use, since it allows scheduling latency to "bleed" from
one read-side critical section to another.  It also assumes recursive
reader-writer locks:  If you try this with non-recursive locks, and
you allow nested rcu_read_lock() calls, you can deadlock.

However, it is probably the easiest implementation to relate to, so is
a good starting point.

It is extremely simple::

	static DEFINE_RWLOCK(rcu_gp_mutex);

	void rcu_read_lock(void)
	{
		read_lock(&rcu_gp_mutex);
	}

	void rcu_read_unlock(void)
	{
		read_unlock(&rcu_gp_mutex);
	}

	void synchronize_rcu(void)
	{
		write_lock(&rcu_gp_mutex);
		smp_mb__after_spinlock();
		write_unlock(&rcu_gp_mutex);
	}

[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
much.  But here are simplified versions anyway.  And whatever you do,
don't forget about them when submitting patches making use of RCU!]::

	#define rcu_assign_pointer(p, v) \
	({ \
		smp_store_release(&(p), (v)); \
	})

	#define rcu_dereference(p) \
	({ \
		typeof(p) _________p1 = READ_ONCE(p); \
		(_________p1); \
	})


The rcu_read_lock() and rcu_read_unlock() primitives read-acquire
and release a global reader-writer lock.  The synchronize_rcu()
primitive write-acquires this same lock, then releases it.  This means
that once synchronize_rcu() exits, all RCU read-side critical sections
that were in progress before synchronize_rcu() was called are guaranteed
to have completed -- there is no way that synchronize_rcu() would have
been able to write-acquire the lock otherwise.  The smp_mb__after_spinlock()
promotes synchronize_rcu() to a full memory barrier in compliance with
the "Memory-Barrier Guarantees" listed in:

	Documentation/RCU/Design/Requirements/Requirements.rst

It is possible to nest rcu_read_lock(), since reader-writer locks may
be recursively acquired.  Note also that rcu_read_lock() is immune
from deadlock (an important property of RCU).  The reason for this is
that the only thing that can block rcu_read_lock() is a synchronize_rcu().
But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
so there can be no deadlock cycle.

.. _quiz_1:

Quick Quiz #1:
		Why is this argument naive?  How could a deadlock
		occur when using this algorithm in a real-world Linux
		kernel?  How could this deadlock be avoided?

:ref:`Answers to Quick Quiz <8_whatisRCU>`

5B.  "TOY" EXAMPLE #2: CLASSIC RCU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
"classic RCU".  It is also short on performance (but only for updates) and
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPT
kernels.  The definitions of rcu_dereference() and rcu_assign_pointer()
are the same as those shown in the preceding section, so they are omitted.
::

	void rcu_read_lock(void) { }

	void rcu_read_unlock(void) { }

	void synchronize_rcu(void)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			run_on(cpu);
	}

Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
This is the great strength of classic RCU in a non-preemptive kernel:
read-side overhead is precisely zero, at least on non-Alpha CPUs.
And there is absolutely no way that rcu_read_lock() can possibly
participate in a deadlock cycle!

The implementation of synchronize_rcu() simply schedules itself on each
CPU in turn.  The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive.  Of course, a somewhat less
"toy" implementation would restore the affinity upon completion rather
than just leaving all tasks running on the last CPU, but when I said
"toy", I meant **toy**!
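
For example, run_on() might be sketched as follows (toy quality, as
advertised: the task's original CPU affinity is not saved or restored)::

	static void run_on(int cpu)
	{
		sched_setaffinity(current->pid, cpumask_of(cpu));
	}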

So how the heck is this supposed to work???

Remember that it is illegal to block while in an RCU read-side critical
section.  Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical sections.
Once **all** CPUs have executed a context switch, then **all** preceding
RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then invoke
synchronize_rcu().  Once synchronize_rcu() returns, we are guaranteed
that there are no RCU read-side critical sections holding a reference
to that data item, so we can safely reclaim it.

.. _quiz_2:

Quick Quiz #2:
		Give an example where Classic RCU's read-side
		overhead is **negative**.

:ref:`Answers to Quick Quiz <8_whatisRCU>`

.. _quiz_3:

Quick Quiz #3:
		If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		PREEMPT_RT, where normal spinlocks can block???

:ref:`Answers to Quick Quiz <8_whatisRCU>`

.. _6_whatisRCU:

6.  ANALOGY WITH READER-WRITER LOCKING
--------------------------------------

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking.  The following unified
diff shows how closely related RCU and reader-writer locking can be.
::

	@@ -5,5 +5,5 @@ struct el {
	 	int data;
	 	/* Other data fields */
	 };
	-rwlock_t listmutex;
	+spinlock_t listmutex;
	 struct el head;

	@@ -13,15 +14,15 @@
		struct list_head *lp;
		struct el *p;

	-	read_lock(&listmutex);
	-	list_for_each_entry(p, head, lp) {
	+	rcu_read_lock();
	+	list_for_each_entry_rcu(p, head, lp) {
			if (p->key == key) {
				*result = p->data;
	-			read_unlock(&listmutex);
	+			rcu_read_unlock();
				return 1;
			}
		}
	-	read_unlock(&listmutex);
	+	rcu_read_unlock();
		return 0;
	 }

	@@ -29,15 +30,16 @@
	 {
		struct el *p;

	-	write_lock(&listmutex);
	+	spin_lock(&listmutex);
		list_for_each_entry(p, head, lp) {
			if (p->key == key) {
	-			list_del(&p->list);
	-			write_unlock(&listmutex);
	+			list_del_rcu(&p->list);
	+			spin_unlock(&listmutex);
	+			synchronize_rcu();
				kfree(p);
				return 1;
			}
		}
	-	write_unlock(&listmutex);
	+	spin_unlock(&listmutex);
		return 0;
	 }

Or, for those who prefer a side-by-side listing::

 1 struct el {                          1 struct el {
 2   struct list_head list;             2   struct list_head list;
 3   long key;                          3   long key;
 4   spinlock_t mutex;                  4   spinlock_t mutex;
 5   int data;                          5   int data;
 6   /* Other data fields */            6   /* Other data fields */
 7 };                                   7 };
 8 rwlock_t listmutex;                  8 spinlock_t listmutex;
 9 struct el head;                      9 struct el head;

::

  1 int search(long key, int *result)    1 int search(long key, int *result)
  2 {                                    2 {
  3   struct list_head *lp;              3   struct list_head *lp;
  4   struct el *p;                      4   struct el *p;
  5                                      5
  6   read_lock(&listmutex);             6   rcu_read_lock();
  7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
  8     if (p->key == key) {             8     if (p->key == key) {
  9       *result = p->data;             9       *result = p->data;
 10       read_unlock(&listmutex);      10       rcu_read_unlock();
 11       return 1;                     11       return 1;
 12     }                               12     }
 13   }                                 13   }
 14   read_unlock(&listmutex);          14   rcu_read_unlock();
 15   return 0;                         15   return 0;
 16 }                                   16 }

::

  1 int delete(long key)                 1 int delete(long key)
  2 {                                    2 {
  3   struct el *p;                      3   struct el *p;
  4                                      4
  5   write_lock(&listmutex);            5   spin_lock(&listmutex);
  6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
  7     if (p->key == key) {             7     if (p->key == key) {
  8       list_del(&p->list);            8       list_del_rcu(&p->list);
  9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
                                        10       synchronize_rcu();
 10       kfree(p);                     11       kfree(p);
 11       return 1;                     12       return 1;
 12     }                               13     }
 13   }                                 14   }
 14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
 15   return 0;                         16   return 0;
 16 }                                   17 }

Either way, the differences are quite small.  Read-side locking moves
to rcu_read_lock() and rcu_read_unlock(), update-side locking moves from
a reader-writer lock to a simple spinlock, and a synchronize_rcu()
precedes the kfree().

However, there is one potential catch: the read-side and update-side
critical sections can now run concurrently.  In many cases, this will
not be a problem, but it is necessary to check carefully regardless.
For example, if multiple independent list updates must be seen as
a single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block.  If this is a problem, there is a callback-based
mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
be used in place of synchronize_rcu().
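
For example, the RCU-based delete() above could avoid blocking by
replacing the synchronize_rcu()/kfree() pair with kfree_rcu(), assuming
that struct el gains an rcu_head field (named rcu here)::

	list_del_rcu(&p->list);
	spin_unlock(&listmutex);
	kfree_rcu(p, rcu);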
873*4882a593Smuzhiyun
874*4882a593Smuzhiyun.. _7_whatisRCU:
875*4882a593Smuzhiyun
876*4882a593Smuzhiyun7.  FULL LIST OF RCU APIs
877*4882a593Smuzhiyun-------------------------
878*4882a593Smuzhiyun
879*4882a593SmuzhiyunThe RCU APIs are documented in docbook-format header comments in the
880*4882a593SmuzhiyunLinux-kernel source code, but it helps to have a full list of the
881*4882a593SmuzhiyunAPIs, since there does not appear to be a way to categorize them
882*4882a593Smuzhiyunin docbook.  Here is the list, by category.
883*4882a593Smuzhiyun
884*4882a593SmuzhiyunRCU list traversal::
885*4882a593Smuzhiyun
886*4882a593Smuzhiyun	list_entry_rcu
887*4882a593Smuzhiyun	list_entry_lockless
888*4882a593Smuzhiyun	list_first_entry_rcu
889*4882a593Smuzhiyun	list_next_rcu
890*4882a593Smuzhiyun	list_for_each_entry_rcu
891*4882a593Smuzhiyun	list_for_each_entry_continue_rcu
892*4882a593Smuzhiyun	list_for_each_entry_from_rcu
893*4882a593Smuzhiyun	list_first_or_null_rcu
894*4882a593Smuzhiyun	list_next_or_null_rcu
895*4882a593Smuzhiyun	hlist_first_rcu
896*4882a593Smuzhiyun	hlist_next_rcu
897*4882a593Smuzhiyun	hlist_pprev_rcu
898*4882a593Smuzhiyun	hlist_for_each_entry_rcu
899*4882a593Smuzhiyun	hlist_for_each_entry_rcu_bh
900*4882a593Smuzhiyun	hlist_for_each_entry_from_rcu
901*4882a593Smuzhiyun	hlist_for_each_entry_continue_rcu
902*4882a593Smuzhiyun	hlist_for_each_entry_continue_rcu_bh
903*4882a593Smuzhiyun	hlist_nulls_first_rcu
904*4882a593Smuzhiyun	hlist_nulls_for_each_entry_rcu
905*4882a593Smuzhiyun	hlist_bl_first_rcu
906*4882a593Smuzhiyun	hlist_bl_for_each_entry_rcu
907*4882a593Smuzhiyun
908*4882a593SmuzhiyunRCU pointer/list update::
909*4882a593Smuzhiyun
910*4882a593Smuzhiyun	rcu_assign_pointer
911*4882a593Smuzhiyun	list_add_rcu
912*4882a593Smuzhiyun	list_add_tail_rcu
913*4882a593Smuzhiyun	list_del_rcu
914*4882a593Smuzhiyun	list_replace_rcu
915*4882a593Smuzhiyun	hlist_add_behind_rcu
916*4882a593Smuzhiyun	hlist_add_before_rcu
917*4882a593Smuzhiyun	hlist_add_head_rcu
918*4882a593Smuzhiyun	hlist_add_tail_rcu
919*4882a593Smuzhiyun	hlist_del_rcu
920*4882a593Smuzhiyun	hlist_del_init_rcu
921*4882a593Smuzhiyun	hlist_replace_rcu
922*4882a593Smuzhiyun	list_splice_init_rcu
923*4882a593Smuzhiyun	list_splice_tail_init_rcu
924*4882a593Smuzhiyun	hlist_nulls_del_init_rcu
925*4882a593Smuzhiyun	hlist_nulls_del_rcu
926*4882a593Smuzhiyun	hlist_nulls_add_head_rcu
927*4882a593Smuzhiyun	hlist_bl_add_head_rcu
928*4882a593Smuzhiyun	hlist_bl_del_init_rcu
929*4882a593Smuzhiyun	hlist_bl_del_rcu
930*4882a593Smuzhiyun	hlist_bl_set_first_rcu
931*4882a593Smuzhiyun
RCU::

	Critical sections	Grace period		Barrier

	rcu_read_lock		synchronize_net		rcu_barrier
	rcu_read_unlock		synchronize_rcu
	rcu_dereference		synchronize_rcu_expedited
	rcu_read_lock_held	call_rcu
	rcu_dereference_check	kfree_rcu
	rcu_dereference_protected

bh::

	Critical sections	Grace period		Barrier

	rcu_read_lock_bh	call_rcu		rcu_barrier
	rcu_read_unlock_bh	synchronize_rcu
	[local_bh_disable]	synchronize_rcu_expedited
	[and friends]
	rcu_dereference_bh
	rcu_dereference_bh_check
	rcu_dereference_bh_protected
	rcu_read_lock_bh_held

sched::

	Critical sections	Grace period		Barrier

	rcu_read_lock_sched	call_rcu		rcu_barrier
	rcu_read_unlock_sched	synchronize_rcu
	[preempt_disable]	synchronize_rcu_expedited
	[and friends]
	rcu_read_lock_sched_notrace
	rcu_read_unlock_sched_notrace
	rcu_dereference_sched
	rcu_dereference_sched_check
	rcu_dereference_sched_protected
	rcu_read_lock_sched_held


SRCU::

	Critical sections	Grace period		Barrier

	srcu_read_lock		call_srcu		srcu_barrier
	srcu_read_unlock	synchronize_srcu
	srcu_dereference	synchronize_srcu_expedited
	srcu_dereference_check
	srcu_read_lock_held

SRCU: Initialization/cleanup::

	DEFINE_SRCU
	DEFINE_STATIC_SRCU
	init_srcu_struct
	cleanup_srcu_struct

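As a quick illustration of the SRCU API, here is a minimal sketch of an
SRCU-protected pointer; my_srcu, my_lock, struct my_data, and
do_something_with() are hypothetical names used only for this example::

	static DEFINE_SRCU(my_srcu);
	static struct my_data __rcu *my_ptr;

	/* Reader: may block inside the SRCU read-side critical section. */
	int idx = srcu_read_lock(&my_srcu);
	struct my_data *p = srcu_dereference(my_ptr, &my_srcu);

	if (p)
		do_something_with(p);	/* Sleeping here is legal under SRCU. */
	srcu_read_unlock(&my_srcu, idx);

	/* Updater, assumed to hold the (hypothetical) my_lock: */
	old_p = rcu_dereference_protected(my_ptr, lockdep_is_held(&my_lock));
	rcu_assign_pointer(my_ptr, new_p);
	synchronize_srcu(&my_srcu);	/* Waits only for my_srcu readers. */
	kfree(old_p);
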
All: lockdep-checked RCU-protected pointer access::

	rcu_access_pointer
	rcu_dereference_raw
	RCU_LOCKDEP_WARN
	rcu_sleep_check
	RCU_NONIDLE

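For example, here is a hedged sketch of two of these accessors, assuming
a hypothetical RCU-protected global pointer gp and a hypothetical
remove_gp() helper::

	/* Pointer test only: no dereference, so no protection required. */
	if (rcu_access_pointer(gp))
		remove_gp();

	/* Complain via lockdep if this code runs outside an RCU reader. */
	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
			 "my_lookup() called without rcu_read_lock()!");
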
See the comment headers in the source code (or the docbook generated
from them) for more information.

However, given that there are no fewer than four families of RCU APIs
in the Linux kernel, how do you choose which one to use?  The following
list can be helpful:

a.	Will readers need to block?  If so, you need SRCU.

b.	What about the -rt patchset?  If readers would need to block
	in a non-rt kernel, you need SRCU.  If readers would block
	in a -rt kernel, but not in a non-rt kernel, SRCU is not
	necessary.  (The -rt patchset turns spinlocks into sleeplocks,
	hence this distinction.)

c.	Do you need to treat NMI handlers, hardirq handlers,
	and code segments with preemption disabled (whether
	via preempt_disable(), local_irq_save(), local_bh_disable(),
	or some other mechanism) as if they were explicit RCU readers?
	If so, RCU-sched is the only choice that will work for you.

d.	Do you need RCU grace periods to complete even in the face
	of softirq monopolization of one or more of the CPUs?  For
	example, is your code subject to network-based denial-of-service
	attacks?  If so, you should disable softirq across your readers,
	for example, by using rcu_read_lock_bh(), as in the sketch
	following this list.

e.	Is your workload too update-intensive for normal use of
	RCU, but inappropriate for other synchronization mechanisms?
	If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
	named SLAB_DESTROY_BY_RCU).  But please be careful!

f.	Do you need read-side critical sections that are respected
	even though they are in the middle of the idle loop, during
	user-mode execution, or on an offlined CPU?  If so, SRCU is the
	only choice that will work for you.

g.	Otherwise, use RCU.

Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.
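
For example, a reader written for case (d) above might look like the
following minimal sketch, in which my_table and my_lookup() are
hypothetical::

	rcu_read_lock_bh();		/* Also disables softirq locally. */
	p = rcu_dereference_bh(my_table);
	if (p)
		my_lookup(p, key);
	rcu_read_unlock_bh();

	/* Updater: synchronize_rcu() waits for bh readers as well
	 * (see the bh table above). */
	rcu_assign_pointer(my_table, new_table);
	synchronize_rcu();
	kfree(old_table);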

.. _8_whatisRCU:

8.  ANSWERS TO QUICK QUIZZES
----------------------------

Quick Quiz #1:
		Why is this argument naive?  How could a deadlock
		occur when using this algorithm in a real-world Linux
		kernel?  [Referring to the lock-based "toy" RCU
		algorithm.]

Answer:
		Consider the following sequence of events:

		1.	CPU 0 acquires some unrelated lock, call it
			"problematic_lock", disabling irq via
			spin_lock_irqsave().

		2.	CPU 1 enters synchronize_rcu(), write-acquiring
			rcu_gp_mutex.

		3.	CPU 0 enters rcu_read_lock(), but must wait
			because CPU 1 holds rcu_gp_mutex.

		4.	CPU 1 is interrupted, and the irq handler
			attempts to acquire problematic_lock.

		The system is now deadlocked.

		One way to avoid this deadlock is to use an approach like
		that of CONFIG_PREEMPT_RT, where all normal spinlocks
		become blocking locks, and all irq handlers execute in
		the context of special tasks.  In this case, in step 4
		above, the irq handler would block, allowing CPU 1 to
		release rcu_gp_mutex, avoiding the deadlock.

		Even in the absence of deadlock, this RCU implementation
		allows latency to "bleed" from readers to other
		readers through synchronize_rcu().  To see this,
		consider task A in an RCU read-side critical section
		(thus read-holding rcu_gp_mutex), task B blocked
		attempting to write-acquire rcu_gp_mutex, and
		task C blocked in rcu_read_lock() attempting to
		read-acquire rcu_gp_mutex.  Task A's RCU read-side
		latency is holding up task C, albeit indirectly via
		task B.

		Realtime RCU implementations therefore use a counter-based
		approach where tasks in RCU read-side critical sections
		cannot be blocked by tasks executing synchronize_rcu().

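		This sequence can be sketched in code, assuming the
		rwlock-based "toy" implementation from Section 5 (the
		unrelated problematic_lock is hypothetical)::

			/* CPU 0, process context: */
			spin_lock_irqsave(&problematic_lock, flags); /* step 1 */
			rcu_read_lock();    /* step 3: waits on rcu_gp_mutex */

			/* CPU 1, process context: */
			synchronize_rcu();  /* step 2: write-holds rcu_gp_mutex */

			/* CPU 1, irq handler interrupting synchronize_rcu(): */
			spin_lock(&problematic_lock); /* step 4: CPU 0 holds it */

		CPU 0 cannot proceed until CPU 1 releases rcu_gp_mutex,
		and CPU 1 cannot release it until its irq handler first
		acquires problematic_lock, which CPU 0 holds with irqs
		disabled.
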
:ref:`Back to Quick Quiz #1 <quiz_1>`

Quick Quiz #2:
		Give an example where Classic RCU's read-side
		overhead is **negative**.

Answer:
		Imagine a single-CPU system with a non-CONFIG_PREEMPT
		kernel where a routing table is used by process-context
		code, but can be updated by irq-context code (for example,
		by an "ICMP REDIRECT" packet).  The usual way of handling
		this would be to have the process-context code disable
		interrupts while searching the routing table.  Use of
		RCU allows such interrupt-disabling to be dispensed with.
		Thus, without RCU, you pay the cost of disabling interrupts,
		and with RCU you don't.

		One can argue that the overhead of RCU in this
		case is negative with respect to the single-CPU
		interrupt-disabling approach.  Others might argue that
		the overhead of RCU is merely zero, and that replacing
		the positive overhead of the interrupt-disabling scheme
		with the zero-overhead RCU scheme does not constitute
		negative overhead.

		In real life, of course, things are more complex.  But
		even the theoretical possibility of negative overhead for
		a synchronization primitive is a bit unexpected.  ;-)

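		The comparison can be made concrete with a minimal
		sketch, in which route_table and route_lookup() are
		hypothetical::

			/* Without RCU: every search pays for irq disable/enable. */
			local_irq_save(flags);
			route = route_lookup(route_table, addr);
			/* ... use route ... */
			local_irq_restore(flags);

			/* With RCU on a single-CPU, non-CONFIG_PREEMPT kernel: */
			rcu_read_lock();    /* compiles to nothing here */
			route = route_lookup(rcu_dereference(route_table), addr);
			/* ... use route ... */
			rcu_read_unlock();  /* likewise compiles to nothing */
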
:ref:`Back to Quick Quiz #2 <quiz_2>`

Quick Quiz #3:
		If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		PREEMPT_RT, where normal spinlocks can block???

Answer:
		Just as PREEMPT_RT permits preemption of spinlock
		critical sections, it permits preemption of RCU
		read-side critical sections.  It also permits
		spinlocks to block while in RCU read-side critical
		sections.

		Why the apparent inconsistency?  Because it is
		possible to use priority boosting to keep the RCU
		grace periods short if need be (for example, if running
		short of memory).  In contrast, if blocking waiting
		for (say) network reception, there is no way to know
		what should be boosted.  Especially given that the
		process we need to boost might well be a human being
		who just went out for a pizza or something.  And although
		a computer-operated cattle prod might arouse serious
		interest, it might also provoke serious objections.
		Besides, how does the computer know what pizza parlor
		the human being went to???

:ref:`Back to Quick Quiz #3 <quiz_3>`

ACKNOWLEDGEMENTS

My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.


For more information, see http://www.rdrop.com/users/paulmck/RCU.