=========================================================
Cluster-wide Power-up/power-down race avoidance algorithm
=========================================================

This file documents the algorithm which is used to coordinate CPU and
cluster setup and teardown operations and to manage hardware coherency
controls safely.

The section "Rationale" explains what the algorithm is for and why it is
needed.  "Basic model" explains general concepts using a simplified view
of the system.  The other sections explain the actual details of the
algorithm in use.


Rationale
---------

In a system containing multiple CPUs, it is desirable to have the
ability to turn off individual CPUs when the system is idle, reducing
power consumption and thermal dissipation.

In a system containing multiple clusters of CPUs, it is also desirable
to have the ability to turn off entire clusters.

Turning entire clusters off and on is a risky business, because it
involves performing potentially destructive operations affecting a group
of independently running CPUs, while the OS continues to run.  This
means that we need some coordination in order to ensure that critical
cluster-level operations are only performed when it is truly safe to do
so.

Simple locking may not be sufficient to solve this problem, because
mechanisms like Linux spinlocks may rely on coherency mechanisms which
are not immediately enabled when a cluster powers up.  Since enabling or
disabling those mechanisms may itself be a non-atomic operation (such as
writing some hardware registers and invalidating large caches), other
methods of coordination are required in order to guarantee safe
power-down and power-up at the cluster level.

This document presents a coherent-memory-based protocol for performing
the needed coordination.  It aims to be as lightweight as possible,
while providing the required safety properties.


Basic model
-----------

Each cluster and CPU is assigned a state, as follows:

	- DOWN
	- COMING_UP
	- UP
	- GOING_DOWN

::

	    +---------> UP ----------+
	    |                        v

	COMING_UP                GOING_DOWN

	    ^                        |
	    +--------- DOWN <--------+


DOWN:
	The CPU or cluster is not coherent, and is either powered off or
	suspended, or is ready to be powered off or suspended.

COMING_UP:
	The CPU or cluster has committed to moving to the UP state.
	It may be part way through the process of initialisation and
	enabling coherency.

UP:
	The CPU or cluster is active and coherent at the hardware
	level.  A CPU in this state is not necessarily being used
	actively by the kernel.

GOING_DOWN:
	The CPU or cluster has committed to moving to the DOWN
	state.  It may be part way through the process of teardown and
	coherency exit.
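
The four-state cycle above can be written down as a small state
machine.  The following is a minimal sketch for illustration only: the
type and function names (``pm_state``, ``pm_next_state()``,
``pm_transition_ok()``) are hypothetical and do not appear in the
kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

/* The four states of the basic model. */
enum pm_state { DOWN, COMING_UP, UP, GOING_DOWN };

/* The basic model is a single cycle: each state has one successor. */
static enum pm_state pm_next_state(enum pm_state s)
{
	switch (s) {
	case DOWN:
		return COMING_UP;
	case COMING_UP:
		return UP;
	case UP:
		return GOING_DOWN;
	case GOING_DOWN:
		return DOWN;
	}
	return s;	/* not reached for a valid state */
}

/* A transition is legal only if it follows the cycle. */
static bool pm_transition_ok(enum pm_state from, enum pm_state to)
{
	return pm_next_state(from) == to;
}
```

In this simplified view there is no way to go directly from UP to
DOWN, or from DOWN to UP: every power transition passes through one of
the two intermediate states.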


Each CPU has one of these states assigned to it at any point in time.
The CPU states are described in the "CPU state" section, below.

Each cluster is also assigned a state, but it is necessary to split the
state value into two parts (the "cluster" state and "inbound" state) and
to introduce additional states in order to avoid races between different
CPUs in the cluster simultaneously modifying the state.  The cluster-
level states are described in the "Cluster state" section.

To help distinguish the CPU states from cluster states in this
discussion, the state names are given a `CPU_` prefix for the CPU states,
and a `CLUSTER_` or `INBOUND_` prefix for the cluster states.


CPU state
---------

In this algorithm, each individual core in a multi-core processor is
referred to as a "CPU".  CPUs are assumed to be single-threaded:
therefore, a CPU can only be doing one thing at a single point in time.

This means that CPUs fit the basic model closely.

The algorithm defines the following states for each CPU in the system:

	- CPU_DOWN
	- CPU_COMING_UP
	- CPU_UP
	- CPU_GOING_DOWN

::

	 cluster setup and
	CPU setup complete          policy decision
	      +-----------> CPU_UP ------------+
	      |                                v

	CPU_COMING_UP                   CPU_GOING_DOWN

	      ^                                |
	      +----------- CPU_DOWN <----------+
	 policy decision           CPU teardown complete
	or hardware event


The definitions of the four states correspond closely to the states of
the basic model.

Transitions between states occur as follows.

A trigger event (spontaneous) means that the CPU can transition to the
next state as a result of making local progress only, with no
requirement for any external event to happen.


CPU_DOWN:
	A CPU reaches the CPU_DOWN state when it is ready for
	power-down.  On reaching this state, the CPU will typically
	power itself down or suspend itself, via a WFI instruction or a
	firmware call.

	Next state:
		CPU_COMING_UP
	Conditions:
		none

	Trigger events:
		a) an explicit hardware power-up operation, resulting
		   from a policy decision on another CPU;

		b) a hardware event, such as an interrupt.


CPU_COMING_UP:
	A CPU cannot start participating in hardware coherency until the
	cluster is set up and coherent.  If the cluster is not ready,
	then the CPU will wait in the CPU_COMING_UP state until the
	cluster has been set up.

	Next state:
		CPU_UP
	Conditions:
		The CPU's parent cluster must be in CLUSTER_UP.
	Trigger events:
		Transition of the parent cluster to CLUSTER_UP.

	Refer to the "Cluster state" section for a description of the
	CLUSTER_UP state.


CPU_UP:
	When a CPU reaches the CPU_UP state, it is safe for the CPU to
	start participating in local coherency.

	This is done by jumping to the kernel's CPU resume code.

	Note that the definition of this state is slightly different
	from the basic model definition: CPU_UP does not mean that the
	CPU is coherent yet, but it does mean that it is safe to resume
	the kernel.  The kernel handles the rest of the resume
	procedure, so the remaining steps are not visible as part of the
	race avoidance algorithm.

	The CPU remains in this state until an explicit policy decision
	is made to shut down or suspend the CPU.

	Next state:
		CPU_GOING_DOWN
	Conditions:
		none
	Trigger events:
		explicit policy decision


CPU_GOING_DOWN:
	While in this state, the CPU exits coherency, including any
	operations required to achieve this (such as cleaning data
	caches).

	Next state:
		CPU_DOWN
	Conditions:
		local CPU teardown complete
	Trigger events:
		(spontaneous)
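
The per-state rules above (next state, conditions, trigger events) can
be collected into a single step function.  This is a minimal sketch
under stated assumptions: the event names (``EV_NONE``, ``EV_WAKE``,
``EV_POLICY_DOWN``) and the function ``cpu_step()`` are hypothetical,
and the two boolean inputs stand in for "parent cluster is in
CLUSTER_UP" and "local CPU teardown complete".

```c
#include <assert.h>
#include <stdbool.h>

enum cpu_state { CPU_DOWN, CPU_COMING_UP, CPU_UP, CPU_GOING_DOWN };

/* Hypothetical event names, used only by this sketch. */
enum cpu_event {
	EV_NONE,	/* no external event: local progress only */
	EV_WAKE,	/* explicit power-up operation or hardware event */
	EV_POLICY_DOWN,	/* policy decision to shut down or suspend */
};

/*
 * Return the state the CPU is in after one step.  The CPU stays where
 * it is unless both the condition and the trigger for leaving the
 * current state are satisfied.
 */
static enum cpu_state cpu_step(enum cpu_state s, enum cpu_event ev,
			       bool cluster_up, bool teardown_done)
{
	switch (s) {
	case CPU_DOWN:
		/* No condition; needs a wake-up trigger. */
		return ev == EV_WAKE ? CPU_COMING_UP : s;
	case CPU_COMING_UP:
		/* Waits here until the parent cluster is CLUSTER_UP. */
		return cluster_up ? CPU_UP : s;
	case CPU_UP:
		/* No condition; needs an explicit policy decision. */
		return ev == EV_POLICY_DOWN ? CPU_GOING_DOWN : s;
	case CPU_GOING_DOWN:
		/* Spontaneous once local teardown has completed. */
		return teardown_done ? CPU_DOWN : s;
	}
	return s;
}
```

Note how CPU_COMING_UP is the only CPU state whose exit depends on the
cluster state; this is the point where the CPU-level and cluster-level
state machines interact.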


Cluster state
-------------

A cluster is a group of connected CPUs with some common resources.
Because a cluster contains multiple CPUs, it can be doing multiple
things at the same time.  This has some implications.  In particular, a
CPU can start up while another CPU is tearing the cluster down.

In this discussion, the "outbound side" is the view of the cluster state
as seen by a CPU tearing the cluster down.  The "inbound side" is the
view of the cluster state as seen by a CPU setting the cluster up.

In order to enable safe coordination in such situations, it is important
that a CPU which is setting up the cluster can advertise its state
independently of the CPU which is tearing down the cluster.  For this
reason, the cluster state is split into two parts:

	"cluster" state: The global state of the cluster; or the state
	on the outbound side:

		- CLUSTER_DOWN
		- CLUSTER_UP
		- CLUSTER_GOING_DOWN

	"inbound" state: The state of the cluster on the inbound side.

		- INBOUND_NOT_COMING_UP
		- INBOUND_COMING_UP


	The different pairings of these states result in six possible
	states for the cluster as a whole::

	                            CLUSTER_UP
	          +==========> INBOUND_NOT_COMING_UP -------------+
	          #                                               |
	                                                          |
	     CLUSTER_UP     <----+                                |
	  INBOUND_COMING_UP      |                                v

	          ^             CLUSTER_GOING_DOWN       CLUSTER_GOING_DOWN
	          #              INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP

	    CLUSTER_DOWN         |                                |
	  INBOUND_COMING_UP <----+                                |
	                                                          |
	          ^                                               |
	          +===========     CLUSTER_DOWN      <------------+
	                       INBOUND_NOT_COMING_UP

	Transitions -----> can only be made by the outbound CPU, and
	only involve changes to the "cluster" state.

	Transitions ===##> can only be made by the inbound CPU, and only
	involve changes to the "inbound" state, except where there is no
	further transition possible on the outbound side (i.e., the
	outbound CPU has put the cluster into the CLUSTER_DOWN state).
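
The ownership rule for the two halves of the cluster state can be
sketched as follows.  This is an illustration only: the names
``cluster_sync``, ``mk_sync()`` and ``side_may_write()`` are
hypothetical.  The check covers only which side may modify which field;
it does not check that the transition is one of the arrows in the
diagram.

```c
#include <assert.h>
#include <stdbool.h>

enum cluster_state { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };
enum side { OUTBOUND, INBOUND };

/* The cluster state is a pair: one field per side. */
struct cluster_sync {
	enum cluster_state cluster;	/* written by the outbound side */
	enum inbound_state inbound;	/* written by the inbound side */
};

static struct cluster_sync mk_sync(enum cluster_state c,
				   enum inbound_state i)
{
	struct cluster_sync s = { c, i };

	return s;
}

/* May "who" perform a transition that changes these fields? */
static bool side_may_write(enum side who, struct cluster_sync from,
			   struct cluster_sync to)
{
	bool cluster_changed = from.cluster != to.cluster;
	bool inbound_changed = from.inbound != to.inbound;

	if (who == OUTBOUND)
		return cluster_changed && !inbound_changed;

	/*
	 * The inbound side normally changes only the "inbound" field.
	 * The exception: once the outbound side has reached
	 * CLUSTER_DOWN, the inbound side may move the "cluster" field.
	 */
	if (cluster_changed)
		return from.cluster == CLUSTER_DOWN && !inbound_changed;
	return inbound_changed;
}
```

Keeping the two fields separate means that each is written by only one
side at any given time, which is why the two sides can advertise their
progress independently.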

	The race avoidance algorithm does not provide a way to determine
	which exact CPUs within the cluster play these roles.  This must
	be decided in advance by some other means.  Refer to the section
	"Last man and first man selection" for more explanation.


	CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
	cluster can actually be powered down.

	The parallelism of the inbound and outbound CPUs is observed by
	the existence of two different paths from CLUSTER_GOING_DOWN/
	INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
	model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
	COMING_UP in the basic model).  The second path avoids cluster
	teardown completely.

	CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
	model.  The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
	is trivial and merely resets the state machine ready for the
	next cycle.

	Details of the allowable transitions follow.

	The next state in each case is notated

		<cluster state>/<inbound state> (<transitioner>)

	where the <transitioner> is the side on which the transition
	can occur; either the inbound or the outbound side.


CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
	Next state:
		CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
	Conditions:
		none

	Trigger events:
		a) an explicit hardware power-up operation, resulting
		   from a policy decision on another CPU;

		b) a hardware event, such as an interrupt.


CLUSTER_DOWN/INBOUND_COMING_UP:

	In this state, an inbound CPU sets up the cluster, including
	enabling of hardware coherency at the cluster level and any
	other operations (such as cache invalidation) which are required
	in order to achieve this.

	The purpose of this state is to do sufficient cluster-level
	setup to enable other CPUs in the cluster to enter coherency
	safely.

	Next state:
		CLUSTER_UP/INBOUND_COMING_UP (inbound)
	Conditions:
		cluster-level setup and hardware coherency complete
	Trigger events:
		(spontaneous)


CLUSTER_UP/INBOUND_COMING_UP:

	Cluster-level setup is complete and hardware coherency is
	enabled for the cluster.  Other CPUs in the cluster can safely
	enter coherency.

	This is a transient state, leading immediately to
	CLUSTER_UP/INBOUND_NOT_COMING_UP.  All other CPUs on the cluster
	should treat these two states as equivalent.

	Next state:
		CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
	Conditions:
		none
	Trigger events:
		(spontaneous)


CLUSTER_UP/INBOUND_NOT_COMING_UP:

	Cluster-level setup is complete and hardware coherency is
	enabled for the cluster.  Other CPUs in the cluster can safely
	enter coherency.

	The cluster will remain in this state until a policy decision is
	made to power the cluster down.

	Next state:
		CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
	Conditions:
		none
	Trigger events:
		policy decision to power down the cluster


CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:

	An outbound CPU is tearing the cluster down.  The selected CPU
	must wait in this state until all CPUs in the cluster are in the
	CPU_DOWN state.

	When all CPUs are in the CPU_DOWN state, the cluster can be torn
	down, for example by cleaning data caches and exiting
	cluster-level coherency.

	To avoid wasteful unnecessary teardown operations, the outbound
	CPU should check the inbound cluster state for asynchronous
	transitions to INBOUND_COMING_UP.  Alternatively, individual
	CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.


	Next states:

	CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
		Conditions:
			cluster torn down and ready to power off
		Trigger events:
			(spontaneous)

	CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
		Conditions:
			none

		Trigger events:
			a) an explicit hardware power-up operation,
			   resulting from a policy decision on another
			   CPU;

			b) a hardware event, such as an interrupt.
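
The checks made by the outbound CPU while it waits in
CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP can be sketched as one polling
step.  This is a hedged illustration, not the kernel's code: the
function ``outbound_poll()`` and the ``outbound_verdict`` values are
hypothetical, the outbound CPU's own entry is skipped on the assumption
that it is itself still in CPU_GOING_DOWN, and returning
``OUTBOUND_ABORT`` reflects one possible policy (backing out as soon as
inbound activity is seen).

```c
#include <assert.h>
#include <stddef.h>

enum cpu_state { CPU_DOWN, CPU_COMING_UP, CPU_UP, CPU_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };

/* Hypothetical result of one polling pass by the outbound CPU. */
enum outbound_verdict {
	OUTBOUND_WAIT,		/* some CPU is still going down */
	OUTBOUND_PROCEED,	/* all other CPUs are down: tear down */
	OUTBOUND_ABORT,		/* inbound activity seen: skip teardown */
};

static enum outbound_verdict outbound_poll(const enum cpu_state *cpus,
					   size_t ncpus, size_t self,
					   enum inbound_state inbound)
{
	enum outbound_verdict verdict = OUTBOUND_PROCEED;
	size_t i;

	/* An inbound CPU has asynchronously started cluster setup. */
	if (inbound == INBOUND_COMING_UP)
		return OUTBOUND_ABORT;

	for (i = 0; i < ncpus; i++) {
		if (i == self)
			continue;	/* skip the outbound CPU itself */

		switch (cpus[i]) {
		case CPU_DOWN:
			break;
		case CPU_GOING_DOWN:
			verdict = OUTBOUND_WAIT;
			break;
		case CPU_COMING_UP:
		case CPU_UP:
			/* A CPU is (or will be) using the cluster. */
			return OUTBOUND_ABORT;
		}
	}
	return verdict;
}
```

A caller would repeat the poll while it returns ``OUTBOUND_WAIT``.  In
a real system the state array would have to be read from memory that
is visible to all CPUs regardless of their coherency state; the sketch
ignores that aspect.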


CLUSTER_GOING_DOWN/INBOUND_COMING_UP:

	The cluster is (or was) being torn down, but another CPU has
	come online in the meantime and is trying to set up the cluster
	again.

	If the outbound CPU observes this state, it has two choices:

		a) back out of teardown, restoring the cluster to the
		   CLUSTER_UP state;

		b) finish tearing the cluster down and put the cluster
		   in the CLUSTER_DOWN state; the inbound CPU will
		   set up the cluster again from there.

	Choice (a) permits the removal of some latency by avoiding
	unnecessary teardown and setup operations in situations where
	the cluster is not really going to be powered down.


	Next states:

	CLUSTER_UP/INBOUND_COMING_UP (outbound)
		Conditions:
			cluster-level setup and hardware
			coherency complete

		Trigger events:
			(spontaneous)

	CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
		Conditions:
			cluster torn down and ready to power off

		Trigger events:
			(spontaneous)
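
The complete set of allowable cluster-level transitions listed in this
section can be summarised as a lookup table.  This is a sketch for
illustration: the ``transition`` structure and
``cluster_transition_ok()`` are hypothetical names, and the table
encodes only the "next state" and "transitioner" entries above, not the
conditions or trigger events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cluster_state { CLUSTER_DOWN, CLUSTER_UP, CLUSTER_GOING_DOWN };
enum inbound_state { INBOUND_NOT_COMING_UP, INBOUND_COMING_UP };
enum side { OUTBOUND, INBOUND };

struct transition {
	enum cluster_state from_c;
	enum inbound_state from_i;
	enum cluster_state to_c;
	enum inbound_state to_i;
	enum side who;
};

/* One row per arrow in the cluster state diagram. */
static const struct transition allowed[] = {
	{ CLUSTER_DOWN, INBOUND_NOT_COMING_UP,
	  CLUSTER_DOWN, INBOUND_COMING_UP, INBOUND },
	{ CLUSTER_DOWN, INBOUND_COMING_UP,
	  CLUSTER_UP, INBOUND_COMING_UP, INBOUND },
	{ CLUSTER_UP, INBOUND_COMING_UP,
	  CLUSTER_UP, INBOUND_NOT_COMING_UP, INBOUND },
	{ CLUSTER_UP, INBOUND_NOT_COMING_UP,
	  CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP, OUTBOUND },
	{ CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP,
	  CLUSTER_DOWN, INBOUND_NOT_COMING_UP, OUTBOUND },
	{ CLUSTER_GOING_DOWN, INBOUND_NOT_COMING_UP,
	  CLUSTER_GOING_DOWN, INBOUND_COMING_UP, INBOUND },
	/* Choice (a): back out of teardown. */
	{ CLUSTER_GOING_DOWN, INBOUND_COMING_UP,
	  CLUSTER_UP, INBOUND_COMING_UP, OUTBOUND },
	/* Choice (b): finish teardown; the inbound CPU sets up again. */
	{ CLUSTER_GOING_DOWN, INBOUND_COMING_UP,
	  CLUSTER_DOWN, INBOUND_COMING_UP, OUTBOUND },
};

static bool cluster_transition_ok(enum side who,
				  enum cluster_state from_c,
				  enum inbound_state from_i,
				  enum cluster_state to_c,
				  enum inbound_state to_i)
{
	size_t i;

	for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
		const struct transition *t = &allowed[i];

		if (t->who == who && t->from_c == from_c &&
		    t->from_i == from_i && t->to_c == to_c &&
		    t->to_i == to_i)
			return true;
	}
	return false;
}
```

A table like this is mainly useful as a debugging aid or in a model
checker; the algorithm itself relies on each side following the rules
rather than on a runtime check.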


Last man and first man selection
--------------------------------

The CPU which performs cluster tear-down operations on the outbound side
is commonly referred to as the "last man".

The CPU which performs cluster setup on the inbound side is commonly
referred to as the "first man".

The race avoidance algorithm documented above does not provide a
mechanism to choose which CPUs should play these roles.


Last man:

When shutting down the cluster, all the CPUs involved are initially
executing Linux and hence coherent.  Therefore, ordinary spinlocks can
be used to select a last man safely, before the CPUs become
non-coherent.
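
One way to picture last man selection is a use count protected by a
lock: the CPU that drops the count to zero is the last man.  The sketch
below is an assumption-laden illustration, not the kernel's code.  It
uses a C11 ``atomic_flag`` spin loop as a stand-in for a Linux
spinlock, and the names ``cluster_use``, ``cluster_use_init()`` and
``cpu_leave_cluster()`` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct cluster_use {
	atomic_flag lock;		/* stand-in for a spinlock */
	unsigned int cpus_in_use;	/* CPUs still using the cluster */
};

static void cluster_use_init(struct cluster_use *c, unsigned int ncpus)
{
	atomic_flag_clear(&c->lock);
	c->cpus_in_use = ncpus;
}

static void cluster_lock(struct cluster_use *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void cluster_unlock(struct cluster_use *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/*
 * Called by a CPU that is still coherent and about to go down.
 * Returns true if the caller is the last man for this cluster.
 */
static bool cpu_leave_cluster(struct cluster_use *c)
{
	bool last_man;

	cluster_lock(c);
	assert(c->cpus_in_use > 0);
	c->cpus_in_use--;
	last_man = (c->cpus_in_use == 0);
	cluster_unlock(c);

	return last_man;
}
```

This only works because every caller is still coherent when it takes
the lock, which is exactly the property that does not hold on the
inbound side.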


First man:

Because CPUs may power up asynchronously in response to external wake-up
events, a dynamic mechanism is needed to make sure that only one CPU
attempts to play the first man role and do the cluster-level
initialisation: any other CPUs must wait for this to complete before
proceeding.

Cluster-level initialisation may involve actions such as configuring
coherency controls in the bus fabric.

The current implementation in mcpm_head.S uses a separate mutual exclusion
mechanism to do this arbitration.  This mechanism is documented in
detail in vlocks.txt.
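
The required outcome of first man arbitration (exactly one winner, all
others wait for setup to complete) can be modelled as below.  Note the
substitution: this sketch uses C11 compare-and-exchange, which assumes
coherent memory and is therefore *not* what the low-level power-up code
can rely on; the real arbitration uses vlocks.  All names here
(``first_man_election``, ``try_become_first_man()`` and so on) are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_FIRST_MAN	(-1)

struct first_man_election {
	atomic_int winner;	/* CPU number of the first man, or -1 */
	atomic_bool setup_done;	/* cluster-level setup has completed */
};

static void first_man_election_init(struct first_man_election *e)
{
	atomic_init(&e->winner, NO_FIRST_MAN);
	atomic_init(&e->setup_done, false);
}

/* Returns true for exactly one caller: the first man. */
static bool try_become_first_man(struct first_man_election *e, int cpu)
{
	int expected = NO_FIRST_MAN;

	return atomic_compare_exchange_strong(&e->winner, &expected, cpu);
}

/* Called by the first man once cluster-level setup is complete. */
static void first_man_setup_complete(struct first_man_election *e)
{
	atomic_store(&e->setup_done, true);
}

/* Losers poll this before proceeding. */
static bool cluster_setup_done(struct first_man_election *e)
{
	return atomic_load(&e->setup_done);
}
```

A CPU that loses the election would loop on ``cluster_setup_done()``,
which corresponds to waiting in CPU_COMING_UP until the cluster reaches
CLUSTER_UP.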


Features and Limitations
------------------------

Implementation:

	The current ARM-based implementation is split between
	arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
	arch/arm/common/mcpm_entry.c (everything else):

	__mcpm_cpu_going_down() signals the transition of a CPU to the
	CPU_GOING_DOWN state.

	__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
	state.

	A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
	low-level power-up code in mcpm_head.S.  This could
	involve CPU-specific setup code, but in the current
	implementation it does not.

	__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
	handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
	and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
	the case of an aborted cluster power-down).

	These functions are more complex than the __mcpm_cpu_*()
	functions due to the extra inter-CPU coordination which
	is needed for safe transitions at the cluster level.

	A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
	the low-level power-up code in mcpm_head.S.  This
	typically involves platform-specific setup code,
	provided by the platform-specific power_up_setup
	function registered via mcpm_sync_init().
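
The way the outbound helpers fit together on a power-down path can be
sketched with stand-in stubs.  Everything in this block is an
assumption made for illustration: the ``stub_*()`` functions only
record the order in which they are called and stand in for
__mcpm_cpu_going_down(), __mcpm_outbound_enter_critical(),
__mcpm_outbound_leave_critical() and __mcpm_cpu_down(); their
parameters, the placeholder ``SKETCH_CLUSTER_DOWN`` value and the
ordering are inferred from the descriptions above rather than copied
from mcpm_entry.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { SKETCH_CLUSTER_DOWN = 1 };	/* placeholder, not the kernel's value */

static char trace[16];			/* call order, one letter per stub */
static bool stub_enter_result = true;	/* false models an aborted teardown */

static void trace_reset(void)
{
	trace[0] = '\0';
}

static void record(char c)
{
	size_t n = strlen(trace);

	if (n + 1 < sizeof(trace)) {
		trace[n] = c;
		trace[n + 1] = '\0';
	}
}

/* Stand-ins for the real helpers: they only record that they ran. */
static void stub_cpu_going_down(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	record('g');
}

static bool stub_outbound_enter_critical(unsigned int cpu,
					 unsigned int cluster)
{
	(void)cpu; (void)cluster;
	record('e');
	return stub_enter_result;
}

static void stub_outbound_leave_critical(unsigned int cluster, int state)
{
	(void)cluster; (void)state;
	record('l');
}

static void stub_cpu_down(unsigned int cpu, unsigned int cluster)
{
	(void)cpu; (void)cluster;
	record('d');
}

static void power_down_sketch(unsigned int cpu, unsigned int cluster,
			      bool last_man)
{
	stub_cpu_going_down(cpu, cluster);	/* -> CPU_GOING_DOWN */

	if (last_man && stub_outbound_enter_critical(cpu, cluster)) {
		/* Cluster-level teardown would happen here. */
		stub_outbound_leave_critical(cluster, SKETCH_CLUSTER_DOWN);
	}

	/* CPU-level teardown (e.g. cleaning caches) would happen here. */
	stub_cpu_down(cpu, cluster);		/* -> CPU_DOWN */
}
```

In this sketch a CPU that is not the last man, or whose attempt to
enter the critical section is refused because an inbound CPU has
appeared, performs only the CPU-level transitions.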

Deep topologies:

	As currently described and implemented, the algorithm does not
	support CPU topologies involving more than two levels (i.e.,
	clusters of clusters are not supported).  The algorithm could be
	extended by replicating the cluster-level states for the
	additional topological levels, and modifying the transition
	rules for the intermediate (non-outermost) cluster levels.


Colophon
--------

Originally created and documented by Dave Martin for Linaro Limited, in
collaboration with Nicolas Pitre and Achin Gupta.

Copyright (C) 2012-2013  Linaro Limited
Distributed under the terms of Version 2 of the GNU General Public
License, as defined in linux/COPYING.