=======================================================
Semantics and Behavior of Atomic and Bitmask Operations
=======================================================

:Author: David S. Miller

This document is intended to serve as a guide to Linux port
maintainers on how to implement atomic counter, bitops, and spinlock
interfaces properly.

Atomic Type And Operations
==========================

The atomic_t type should be defined as a signed integer and
the atomic_long_t type as a signed long integer.  Also, they should
be made opaque such that any kind of cast to a normal C integer type
will fail.  Something like the following should suffice::

	typedef struct { int counter; } atomic_t;
	typedef struct { long counter; } atomic_long_t;

Historically, counter has been declared volatile.  This is now discouraged.
See :ref:`Documentation/process/volatile-considered-harmful.rst
<volatile_considered_harmful>` for the complete rationale.

local_t is very similar to atomic_t. If the counter is per CPU and only
updated by one CPU, local_t is probably more appropriate. Please see
:ref:`Documentation/core-api/local_ops.rst <local_ops>` for the semantics of
local_t.

The first operations to implement for atomic_t's are the initializers and
plain writes. ::

	#define ATOMIC_INIT(i)		{ (i) }
	#define atomic_set(v, i)	((v)->counter = (i))

The first macro is used in definitions, such as::

	static atomic_t my_counter = ATOMIC_INIT(1);
The initializer is atomic in that the return values of the atomic
operations are guaranteed to correctly reflect the initialized value if
the initializer is used before runtime.  If the initializer is used at
runtime, a proper implicit or explicit read memory barrier is needed
before reading the value with atomic_read from another thread.

As with all of the ``atomic_`` interfaces, replace the leading ``atomic_``
with ``atomic_long_`` to operate on atomic_long_t.

The second interface can be used at runtime, as in::

	struct foo { atomic_t counter; };
	...

	struct foo *k;

	k = kmalloc(sizeof(*k), GFP_KERNEL);
	if (!k)
		return -ENOMEM;
	atomic_set(&k->counter, 0);

The setting is atomic in that the return values of the atomic operations by
all threads are guaranteed to correctly reflect either the value that has
been set with this operation or set with another operation.  A proper implicit
or explicit memory barrier is needed before the value set with the operation
is guaranteed to be readable with atomic_read from another thread.

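For illustration, a minimal sketch of such a hand-off between two CPUs might
look like the following (the ``ready`` flag and the field names are purely
illustrative, not part of any existing interface)::

	/* CPU 0: initialize, then publish */
	atomic_set(&k->counter, 5);
	smp_mb();			/* order the set before the flag */
	WRITE_ONCE(k->ready, 1);

	/* CPU 1: wait for publication, then read */
	while (!READ_ONCE(k->ready))
		cpu_relax();
	smp_mb();			/* pairs with the barrier on CPU 0 */
	val = atomic_read(&k->counter);	/* guaranteed to observe 5 */
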
Next, we have::

	#define atomic_read(v)	((v)->counter)

which simply reads the counter value currently visible to the calling thread.
The read is atomic in that the return value is guaranteed to be one of the
values initialized or modified with the interface operations, provided that a
proper implicit or explicit memory barrier is used after any possible runtime
initialization by another thread and that the value is modified only with the
interface operations.  atomic_read does not itself guarantee that a runtime
initialization by another thread is visible yet, so the user of the
interface must take care of that with a proper implicit or explicit memory
barrier.

.. warning::

	``atomic_read()`` and ``atomic_set()`` DO NOT IMPLY BARRIERS!

	Some architectures may choose to use the volatile keyword, barriers, or
	inline assembly to guarantee some degree of immediacy for atomic_read()
	and atomic_set().  This is not uniformly guaranteed, and may change in
	the future, so all users of atomic_t should treat atomic_read() and
	atomic_set() as simple C statements that may be reordered or optimized
	away entirely by the compiler or processor, and explicitly invoke the
	appropriate compiler and/or memory barrier for each use case.  Failure
	to do so will result in code that may suddenly break when used with
	different architectures or compiler optimizations, or even changes in
	unrelated code which changes how the compiler optimizes the section
	accessing atomic_t variables.

Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
sense as described for atomic_read() and atomic_set().  The READ_ONCE()
and WRITE_ONCE() macros should be used to prevent the compiler from using
optimizations that might otherwise optimize accesses out of existence on
the one hand, or that might create unsolicited accesses on the other.

For example, consider the following code::

	while (a > 0)
		do_something();

If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights to transform this into
the following::

	if (a > 0)
		for (;;)
			do_something();

If you don't want the compiler to do this (and you probably don't), then
you should use something like the following::

	while (READ_ONCE(a) > 0)
		do_something();

Alternatively, you could place a barrier() call in the loop.

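A compiler barrier in the loop body forces the compiler to assume that memory
may have changed, so the value of a is re-read on every iteration (a minimal
sketch)::

	while (a > 0) {
		do_something();
		barrier();	/* compiler may not cache the value of a */
	}
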
For another example, consider the following code::

	tmp_a = a;
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows::

	tmp_a = a;
	do_something_with(tmp_a);
	tmp_a = a;
	do_something_else_with(tmp_a);

This could fatally confuse your code if it expected the same value
to be passed to do_something_with() and do_something_else_with().

The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload.  To prevent the compiler from attacking your
code in this manner, write the following::

	tmp_a = READ_ONCE(a);
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed::

	if (a)
		b = 9;
	else
		b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following::

	b = 42;
	if (a)
		b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero.  To prevent
the compiler from doing this, write something like::

	if (a)
		WRITE_ONCE(b, 9);
	else
		WRITE_ONCE(b, 42);

Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!

.. warning::

	``READ_ONCE()`` and ``WRITE_ONCE()`` DO NOT IMPLY BARRIERS!

Now, we move on to the atomic operation interfaces typically implemented with
the help of assembly code. ::

	void atomic_add(int i, atomic_t *v);
	void atomic_sub(int i, atomic_t *v);
	void atomic_inc(atomic_t *v);
	void atomic_dec(atomic_t *v);

These four routines add and subtract integral values to/from the given
atomic_t value.  The first two routines pass explicit integers by
which to make the adjustment, whereas the latter two use an implicit
adjustment value of "1".

One very important aspect of these routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.

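For instance, they are a natural fit for simple statistics counters where no
ordering against other memory operations is needed (the names below are
purely illustrative)::

	static atomic_t nr_events = ATOMIC_INIT(0);

	void record_event(void)
	{
		atomic_inc(&nr_events);		/* SMP safe, no barriers implied */
	}
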
Next, we have::

	int atomic_inc_return(atomic_t *v);
	int atomic_dec_return(atomic_t *v);

These routines add 1 and subtract 1, respectively, from the given
atomic_t and return the new counter value after the operation is
performed.

Unlike the above routines, it is required that these primitives
include explicit memory barriers that are performed before and after
the operation.  It must be done such that all memory operations before
and after the atomic operation calls are strongly ordered with respect
to the atomic operation itself.

For example, it should behave as if an smp_mb() call existed both
before and after the atomic operation.

If the atomic instructions used in an implementation provide explicit
memory barrier semantics which satisfy the above requirements, that is
fine as well.

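As an illustration, value-returning increments are often used to hand out
unique sequence numbers (a hypothetical sketch)::

	static atomic_t next_id = ATOMIC_INIT(0);

	int allocate_id(void)
	{
		return atomic_inc_return(&next_id);	/* fully ordered */
	}
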
Let's move on::

	int atomic_add_return(int i, atomic_t *v);
	int atomic_sub_return(int i, atomic_t *v);

These behave just like atomic_{inc,dec}_return() except that an
explicit counter adjustment is given instead of the implicit "1".
This means that like atomic_{inc,dec}_return(), the memory barrier
semantics are required.

Next::

	int atomic_inc_and_test(atomic_t *v);
	int atomic_dec_and_test(atomic_t *v);

These two routines increment and decrement by 1, respectively, the
given atomic counter.  They return a boolean indicating whether the
resulting counter value was zero or not.

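For example, atomic_dec_and_test() is the usual building block for reference
counting (a minimal sketch; the object and field names are illustrative)::

	if (atomic_dec_and_test(&obj->refcnt))
		kfree(obj);		/* we dropped the last reference */
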
Again, these primitives provide explicit memory barrier semantics around
the atomic operation::

	int atomic_sub_and_test(int i, atomic_t *v);

This is identical to atomic_dec_and_test() except that an explicit
decrement is given instead of the implicit "1".  This primitive must
provide explicit memory barrier semantics around the operation::

	int atomic_add_negative(int i, atomic_t *v);

The given increment is added to the given atomic counter value.  A boolean
is returned which indicates whether the resulting counter value is negative.
This primitive must provide explicit memory barrier semantics around
the operation.

Then::

	int atomic_xchg(atomic_t *v, int new);

This performs an atomic exchange operation on the atomic variable v, setting
the given new value.  It returns the old value that the atomic variable v had
just before the operation.

atomic_xchg must provide explicit memory barriers around the operation. ::

	int atomic_cmpxchg(atomic_t *v, int old, int new);

This performs an atomic compare exchange operation on the atomic value v,
with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of \*v are performed through atomic_xxx operations.

atomic_cmpxchg must provide explicit memory barriers around the operation,
although if the comparison fails then no memory ordering guarantees are
required.

The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.

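As an illustration, atomic_cmpxchg() is typically used in a read-modify-compare
loop; the hypothetical helper below increments a counter only while it stays
under a limit::

	static int add_unless_over_limit(atomic_t *v, int limit)
	{
		int old, new;

		do {
			old = atomic_read(v);
			if (old >= limit)
				return 0;	/* give up, limit reached */
			new = old + 1;
		} while (atomic_cmpxchg(v, old, new) != old);

		return 1;			/* successfully incremented */
	}
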
Finally::

	int atomic_add_unless(atomic_t *v, int a, int u);

If the atomic value v is not equal to u, this function adds a to v, and
returns non-zero. If v is equal to u then it returns zero. This is done as
an atomic operation.

atomic_add_unless must provide explicit memory barriers around the
operation unless it fails (returns 0).

atomic_inc_not_zero() is equivalent to atomic_add_unless(v, 1, 0).

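A common use of atomic_inc_not_zero() is taking a new reference only if the
object has not already started dying (a minimal sketch with illustrative
names)::

	if (!atomic_inc_not_zero(&obj->refcnt))
		return NULL;	/* object is already being torn down */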


If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this::

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

Preceding a non-value-returning read-modify-write atomic operation with
smp_mb__before_atomic() and following it with smp_mb__after_atomic()
provides the same full ordering that is provided by value-returning
read-modify-write atomic operations.

For example, smp_mb__before_atomic() can be used like so::

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

It makes sure that all memory operations preceding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation.  In the above example, it guarantees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.

Without the explicit smp_mb__before_atomic() call, the
implementation could legally allow the atomic counter update to become
visible to other cpus before the "obj->dead = 1;" assignment.

A missing memory barrier in the cases where they are required by the
atomic_t implementation above can have disastrous results.  Here is
an example, which follows a pattern occurring frequently in the Linux
kernel.  It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object::

	static void obj_list_add(struct obj *obj, struct list_head *head)
	{
		obj->active = 1;
		list_add(&obj->list, head);
	}

	static void obj_list_del(struct obj *obj)
	{
		list_del(&obj->list);
		obj->active = 0;
	}

	static void obj_destroy(struct obj *obj)
	{
		BUG_ON(obj->active);
		kfree(obj);
	}

	struct obj *obj_list_peek(struct list_head *head)
	{
		if (!list_empty(head)) {
			struct obj *obj;

			obj = list_entry(head->next, struct obj, list);
			atomic_inc(&obj->refcnt);
			return obj;
		}
		return NULL;
	}

	void obj_poke(void)
	{
		struct obj *obj;

		spin_lock(&global_list_lock);
		obj = obj_list_peek(&global_list);
		spin_unlock(&global_list_lock);

		if (obj) {
			obj->ops->poke(obj);
			if (atomic_dec_and_test(&obj->refcnt))
				obj_destroy(obj);
		}
	}

	void obj_timeout(struct obj *obj)
	{
		spin_lock(&global_list_lock);
		obj_list_del(obj);
		spin_unlock(&global_list_lock);

		if (atomic_dec_and_test(&obj->refcnt))
			obj_destroy(obj);
	}

.. note::

	This is a simplification of the ARP queue management in the generic
	neighbour discovery code of the networking layer.  Olaf Kirch found a
	bug with respect to memory barriers in kfree_skb() that exposed the
	atomic_t memory barrier requirements quite clearly.

Given the above scheme, it must be the case that the obj->active
update done by the obj list deletion be visible to other processors
before the atomic counter decrement is performed.

Otherwise, the counter could fall to zero, yet obj->active would still
be set, thus triggering the assertion in obj_destroy().  The error
sequence looks like this::

	cpu 0				cpu 1
	obj_poke()			obj_timeout()
	obj = obj_list_peek();
	... gains ref to obj, refcnt=2
					obj_list_del(obj);
					obj->active = 0 ...
					... visibility delayed ...
					atomic_dec_and_test()
					... refcnt drops to 1 ...
	atomic_dec_and_test()
	... refcnt drops to 0 ...
	obj_destroy()
	BUG() triggers since obj->active
	still seen as one
					obj->active update visibility occurs

With the memory barrier semantics required of the atomic_t operations
which return values, the above sequence of memory visibility can never
happen.  Specifically, in the above case the atomic_dec_and_test()
counter decrement would not become globally visible until the
obj->active update does.

As a historical note, 32-bit Sparc used to only allow usage of
24-bits of its atomic_t type.  This was because it used 8 bits
as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
type instruction.  However, 32-bit Sparc has since been moved over
to a "hash table of spinlocks" scheme that allows the full 32-bit
counter to be realized.  Essentially, an array of spinlocks is
indexed into based upon the address of the atomic_t being operated
on, and that lock protects the atomic operation.  Parisc uses the
same scheme.

Another note is that the atomic_t operations returning values are
extremely slow on an old 386.


Atomic Bitmask
==============

We will now cover the atomic bitmask operations.  You will find that
their SMP and memory barrier semantics are similar in shape and scope
to the atomic_t ops above.

Native atomic bit operations are defined to operate on objects aligned
to the size of an "unsigned long" C data type, and are at least of that
size.  The endianness of the bits within each "unsigned long" is the
native endianness of the cpu. ::

	void set_bit(unsigned long nr, volatile unsigned long *addr);
	void clear_bit(unsigned long nr, volatile unsigned long *addr);
	void change_bit(unsigned long nr, volatile unsigned long *addr);

These routines set, clear, and change, respectively, the bit number
indicated by "nr" in the bit mask pointed to by "addr".

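For example, flag words are commonly manipulated like this (the structure and
bit number are illustrative)::

	struct obj {
		unsigned long flags;
	};

	#define OBJ_DYING	0	/* bit number within obj->flags */

	set_bit(OBJ_DYING, &obj->flags);	/* atomically set the bit */
	clear_bit(OBJ_DYING, &obj->flags);	/* atomically clear it again */
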
They must execute atomically, yet there are no implicit memory barrier
semantics required of these interfaces. ::

	int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

Like the above, except that these routines return a boolean which
indicates whether the changed bit was set _BEFORE_ the atomic bit
operation.


.. warning::

        It is incredibly important that the value be a boolean, i.e. "0" or "1".
        Do not try to be fancy and save a few instructions by declaring the
        above to return "long" and just returning something like "old_val &
        mask" because that will not work.

For one thing, this return value gets truncated to int in many code
paths using these interfaces, so on 64-bit if the bit is set in the
upper 32-bits then testers will never see that.

One great example of where this problem crops up is the thread_info
flag operations.  Routines such as test_and_set_ti_thread_flag() chop
the return value into an int.  There are other places where things
like this occur as well.

These routines, like the atomic_t counter operations returning values,
must provide explicit memory barrier semantics around their execution.
All memory operations before the atomic bit operation call must be
made visible globally before the atomic bit operation is made visible.
Likewise, the atomic bit operation must be visible globally before any
subsequent memory operation is made visible.  For example::

	obj->dead = 1;
	if (test_and_set_bit(0, &obj->flags))
		/* ... */;
	obj->killed = 1;

The implementation of test_and_set_bit() must guarantee that
"obj->dead = 1;" is visible to cpus before the atomic memory operation
done by test_and_set_bit() becomes visible.  Likewise, the atomic
memory operation done by test_and_set_bit() must become visible before
"obj->killed = 1;" is visible.

Finally, there is the basic operation::

	int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);

This returns a boolean indicating whether bit "nr" is set in the bitmask
pointed to by "addr".

If explicit memory barriers are required around {set,clear}_bit() (which do
not return a value, and thus do not need to provide memory barrier
semantics), two interfaces are provided::

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

They are used as follows, and are akin to their atomic_t operation
counterparts::

	/* All memory operations before this call will
	 * be globally visible before the clear_bit().
	 */
	smp_mb__before_atomic();
	clear_bit( ... );

	/* The clear_bit() will be visible before all
	 * subsequent memory operations.
	 */
	smp_mb__after_atomic();

There are two special bitops with lock barrier semantics (acquire/release,
same as spinlocks). These operate in the same way as their non-_lock/_unlock
suffixed variants, except that they provide acquire/release semantics,
respectively. This means they can be used for bit_spin_trylock and
bit_spin_unlock type operations without specifying any more barriers. ::

	int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
	void clear_bit_unlock(unsigned long nr, unsigned long *addr);
	void __clear_bit_unlock(unsigned long nr, unsigned long *addr);

The __clear_bit_unlock version is non-atomic, however it still implements
unlock barrier semantics. This can be useful if the lock itself is protecting
the other bits in the word.

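A minimal bit-lock sketch built on these primitives (the word and bit number
are illustrative) might look like::

	while (test_and_set_bit_lock(0, &word))
		cpu_relax();		/* spin until the bit was observed clear */

	/* ... critical section; the other bits of "word" are protected ... */

	clear_bit_unlock(0, &word);	/* release with unlock (release) semantics */
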
Finally, there are non-atomic versions of the bitmask operations
provided.  They are used in contexts where some other higher-level SMP
locking scheme is being used to protect the bitmask, and thus less
expensive non-atomic operations may be used in the implementation.
They have names similar to the above bitmask operation interfaces,
except that two underscores are prefixed to the interface name. ::

	void __set_bit(unsigned long nr, volatile unsigned long *addr);
	void __clear_bit(unsigned long nr, volatile unsigned long *addr);
	void __change_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

These non-atomic variants also do not require any special memory
barrier semantics.

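For example, if a spinlock already serializes all writers of a flag word, the
cheaper non-atomic variants are sufficient (a sketch with illustrative names)::

	spin_lock(&obj->lock);
	__set_bit(OBJ_DYING, &obj->flags);	/* the lock serializes all updates */
	spin_unlock(&obj->lock);
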
The routines xchg() and cmpxchg() must provide the same exact
memory-barrier semantics as the atomic and bit operations returning
values.

.. note::

	If someone wants to use xchg(), cmpxchg() and their variants,
	linux/atomic.h should be included rather than asm/cmpxchg.h, unless the
	code is in arch/* and can take care of itself.

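For instance, xchg() is often used to atomically take ownership of a pointer
(a minimal sketch; the variable, type, and helper names are illustrative)::

	#include <linux/atomic.h>

	static struct work *pending_work;	/* shared, illustrative */

	struct work *old;

	old = xchg(&pending_work, NULL);	/* atomically claim the pending item */
	if (old)
		process(old);
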
Spinlocks and rwlocks have memory barrier expectations as well.
The rule to follow is simple:

1) When acquiring a lock, the implementation must make it globally
   visible before any subsequent memory operation.

2) When releasing a lock, the implementation must make it such that
   all previous memory operations are globally visible before the
   lock release.

Which finally brings us to _atomic_dec_and_lock().  There is an
architecture-neutral version implemented in lib/dec_and_lock.c,
but most platforms will wish to optimize this in assembler. ::

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);

Atomically decrement the given counter; if the result will be zero,
atomically acquire the given spinlock and then perform the decrement
of the counter to zero.  If it does not drop to zero, do nothing
with the spinlock.

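A typical caller-side pattern (the object, list, and lock names are
illustrative) looks like::

	if (_atomic_dec_and_lock(&obj->refcnt, &obj_list_lock)) {
		/* Last reference dropped and obj_list_lock is now held. */
		list_del(&obj->list);
		spin_unlock(&obj_list_lock);
		kfree(obj);
	}
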
It is actually pretty simple to get the memory barrier correct.
Simply satisfy the spinlock grab requirements, which is to make
sure the spinlock operation is globally visible before any
subsequent memory operation.

We can demonstrate this operation more clearly if we define
an abstract atomic operation::

	long cas(long *mem, long old, long new);

"cas" stands for "compare and swap".  It atomically:

1) Compares "old" with the value currently at "mem".
2) If they are equal, "new" is written to "mem".
3) Regardless, the current value at "mem" is returned.

As an example usage, here is what an atomic counter update
might look like::

	void example_atomic_inc(long *counter)
	{
		long old, new, ret;

		while (1) {
			old = *counter;
			new = old + 1;

			ret = cas(counter, old, new);
			if (ret == old)
				break;
		}
	}

Let's use cas() in order to build a pseudo-C atomic_dec_and_lock()::

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
	{
		long old, new, ret;
		int went_to_zero;

		went_to_zero = 0;
		while (1) {
			old = atomic_read(atomic);
			new = old - 1;
			if (new == 0) {
				went_to_zero = 1;
				spin_lock(lock);
			}
			ret = cas(atomic, old, new);
			if (ret == old)
				break;
			if (went_to_zero) {
				spin_unlock(lock);
				went_to_zero = 0;
			}
		}

		return went_to_zero;
	}

Now, as far as memory barriers go, as long as spin_lock()
strictly orders all subsequent memory operations (including
the cas()) with respect to itself, things will be fine.

Said another way, _atomic_dec_and_lock() must guarantee that
a counter dropping to zero is never made visible before the
spinlock is acquired.

.. note::

	This also means that for the case where the counter is not
	dropping to zero, there are no memory ordering requirements.