PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has one cluster of 4
Cortex-A53 cores and one cluster of 2 Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.
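
For reference, a suspend to the deepest level is requested through the
``power_state`` argument of ``CPU_SUSPEND``. The sketch below composes that
argument in the original (non-extended) PSCI ``power_state`` format; the
StateID value is platform-defined and shown as a hypothetical placeholder.

.. code-block:: c

    #include <stdint.h>

    /* Original PSCI power_state format: bits [25:24] hold the target power
     * level, bit [16] the state type (1 = powerdown), bits [15:0] a
     * platform-defined StateID. */
    #define PSTATE_PWR_LVL_SHIFT  24U
    #define PSTATE_TYPE_POWERDOWN (1U << 16)
    #define JUNO_SYSTEM_PWR_LVL   2U /* deepest level on Juno: system */
    #define JUNO_STATE_ID         0U /* hypothetical; real ID is platform-defined */

    static inline uint32_t deepest_powerdown_state(void)
    {
        return (JUNO_SYSTEM_PWR_LVL << PSTATE_PWR_LVL_SHIFT) |
               PSTATE_TYPE_POWERDOWN | JUNO_STATE_ID;
    }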

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.
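
Raw PMF timestamps are generic counter ticks, so at 50MHz one tick is 0.02µs.
A minimal conversion sketch (the macro name is illustrative):

.. code-block:: c

    #include <stdint.h>

    /* Generic counter frequency on Juno (50 MHz), as noted above. */
    #define JUNO_CNTFRQ_HZ 50000000ULL

    /* Convert a tick delta between two PMF timestamps to microseconds. */
    static inline uint64_t ticks_to_us(uint64_t ticks)
    {
        return (ticks * 1000000ULL) / JUNO_CNTFRQ_HZ; /* 50 ticks per us */
    }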

The following source trees and binaries were used:

- `TF-A v2.14-rc0`_
- `TFTF v2.14-rc0`_

Please see the Runtime Instrumentation
:ref:`Testing Methodology <Runtime Instrumentation Methodology>` page for more
details. The tests were run using the
`tf-psci-lava-instr/juno-enable-runtime-instr,juno-instrumentation:juno-tftf`
configuration in CI.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.14)

    +---------+------+------------------+---------------------+-------------------+
    | Cluster | Core |    Powerdown     |        Wakeup       |    Cache Flush    |
    +=========+======+==================+=====================+===================+
    |    0    |  0   |     332440.0     | 270640.0(+1031.44%) | 169500.0(+22.05%) |
    +---------+------+------------------+---------------------+-------------------+
    |    0    |  1   | 624520.0(-1.01%) |   30260.0(-88.07%)  | 166740.0(+21.76%) |
    +---------+------+------------------+---------------------+-------------------+
    |    1    |  0   | 187960.0(+1.74%) |   25460.0(+9.93%)   |  90420.0(+12.69%) |
    +---------+------+------------------+---------------------+-------------------+
    |    1    |  1   |     479100.0     |   20520.0(+10.56%)  |  87500.0(+14.38%) |
    +---------+------+------------------+---------------------+-------------------+
    |    1    |  2   | 923480.0(-1.11%) |   294160.0(+1.58%)  |  87500.0(+14.62%) |
    +---------+------+------------------+---------------------+-------------------+
    |    1    |  3   |    1106300.0     |       238320.0      |  87340.0(+14.35%) |
    +---------+------+------------------+---------------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.13)

    +---------+------+--------------------+--------------------+---------------------+
    | Cluster | Core |     Powerdown      |       Wakeup       |     Cache Flush     |
    +=========+======+====================+====================+=====================+
    |    0    |  0   | 333000.0(-52.92%)  |  23920.0(-40.11%)  |  138880.0(-17.24%)  |
    +---------+------+--------------------+--------------------+---------------------+
    |    0    |  1   | 630900.0(+145.95%) | 253720.0(-46.56%)  | 136940.0(+1987.50%) |
    +---------+------+--------------------+--------------------+---------------------+
    |    1    |  0   | 184740.0(+71.92%)  |  23160.0(-95.39%)  |  80240.0(+1283.45%) |
    +---------+------+--------------------+--------------------+---------------------+
    |    1    |  1   | 481140.0(+18.16%)  |  18560.0(-88.25%)  |  76500.0(+1520.76%) |
    +---------+------+--------------------+--------------------+---------------------+
    |    1    |  2   | 933880.0(+67.76%)  | 289580.0(+189.64%) |  76340.0(+1510.55%) |
    +---------+------+--------------------+--------------------+---------------------+
    |    1    |  3   | 1112480.0(+9.76%)  | 238420.0(+753.94%) |   76380.0(-15.32%)  |
    +---------+------+--------------------+--------------------+---------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.14)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |    Cache Flush    |
    +=========+======+==================+==================+===================+
    |    0    |  0   | 267000.0(+9.39%) | 31080.0(+26.96%) | 168520.0(+22.44%) |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  1   | 267440.0(+9.52%) | 30680.0(+28.69%) | 168480.0(+22.21%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  0   | 291300.0(-1.18%) | 25140.0(+6.80%)  |  86980.0(+13.52%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  1   | 184260.0(+2.31%) | 23140.0(+9.46%)  |  87940.0(+14.03%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  2   | 184520.0(+2.20%) | 23460.0(+12.79%) |  87520.0(+14.02%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  3   | 184700.0(+2.27%) | 23240.0(+9.62%)  |  87180.0(+13.43%) |
    +---------+------+------------------+------------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.13)

    +---------+------+-------------------+------------------+--------------------+
    | Cluster | Core |     Powerdown     |      Wakeup      |    Cache Flush     |
    +=========+======+===================+==================+====================+
    |    0    |  0   |  244080.0(-9.21%) | 24480.0(-40.00%) | 137640.0(-18.19%)  |
    +---------+------+-------------------+------------------+--------------------+
    |    0    |  1   |  244200.0(-9.06%) | 23840.0(-41.57%) | 137860.0(-17.91%)  |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  0   |  294780.0(-1.56%) | 23540.0(-14.83%) |  76620.0(-12.35%)  |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  1   | 180100.0(+74.72%) | 21140.0(-6.63%)  | 77120.0(+1533.90%) |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  2   | 180540.0(+75.25%) | 20800.0(-10.34%) | 76760.0(+1554.31%) |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  3   | 180600.0(+75.44%) | 21200.0(-7.99%)  | 76860.0(+1542.31%) |
    +---------+------+-------------------+------------------+--------------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.14)

    +---------+------+--------------------+------------------+------------------+
    | Cluster | Core |     Powerdown      |      Wakeup      |   Cache Flush    |
    +=========+======+====================+==================+==================+
    |    0    |  0   |  683780.0(-2.74%)  | 22560.0(+33.81%) | 11040.0(+38.35%) |
    +---------+------+--------------------+------------------+------------------+
    |    0    |  1   |  829620.0(-2.61%)  | 22820.0(+39.15%) | 11480.0(+42.79%) |
    +---------+------+--------------------+------------------+------------------+
    |    1    |  0   | 104520.0(-74.34%)  | 17200.0(+13.91%) | 8680.0(+20.56%)  |
    +---------+------+--------------------+------------------+------------------+
    |    1    |  1   | 249200.0(+124.54%) | 17100.0(+10.61%) | 8480.0(+29.27%)  |
    +---------+------+--------------------+------------------+------------------+
    |    1    |  2   | 393980.0(-28.95%)  | 17480.0(+13.51%) | 8320.0(+19.88%)  |
    +---------+------+--------------------+------------------+------------------+
    |    1    |  3   | 539520.0(+108.34%) | 16980.0(+9.13%)  | 8300.0(+25.00%)  |
    +---------+------+--------------------+------------------+------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.13)

    +---------+------+--------------------+------------------+-----------------+
    | Cluster | Core |     Powerdown      |      Wakeup      |   Cache Flush   |
    +=========+======+====================+==================+=================+
    |    0    |  0   | 703060.0(-17.69%)  | 16860.0(-47.87%) | 7980.0(-19.88%) |
    +---------+------+--------------------+------------------+-----------------+
    |    0    |  1   | 851880.0(+20.98%)  | 16400.0(-49.41%) | 8040.0(-17.45%) |
    +---------+------+--------------------+------------------+-----------------+
    |    1    |  0   | 407400.0(+58.99%)  | 15100.0(-26.20%) |  7200.0(-5.76%) |
    +---------+------+--------------------+------------------+-----------------+
    |    1    |  1   | 110980.0(-72.67%)  | 15460.0(-23.47%) | 6560.0(-10.87%) |
    +---------+------+--------------------+------------------+-----------------+
    |    1    |  2   |      554540.0      | 15400.0(-23.46%) |  6940.0(-2.53%) |
    +---------+------+--------------------+------------------+-----------------+
    |    1    |  3   | 258960.0(+143.06%) | 15560.0(-25.05%) |      6640.0     |
    +---------+------+--------------------+------------------+-----------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.14)

    +---------+------+------------------+------------------+-----------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |   Cache Flush   |
    +=========+======+==================+==================+=================+
    |    0    |  0   | 101100.0(-4.73%) | 22820.0(+33.45%) | 7360.0(+39.92%) |
    +---------+------+------------------+------------------+-----------------+
    |    0    |  1   | 101400.0(-5.13%) | 22720.0(+33.18%) | 7560.0(+43.18%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  0   |     291440.0     | 16880.0(+8.21%)  |      4580.0     |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  1   | 96600.0(-6.45%)  | 16860.0(+9.20%)  |  4600.0(+3.14%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  2   | 97060.0(-6.40%)  | 16980.0(+11.27%) |  4640.0(+3.11%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  3   | 96660.0(-6.77%)  | 16960.0(+7.89%)  |  4620.0(+2.67%) |
    +---------+------+------------------+------------------+-----------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.13)

    +---------+------+------------------+------------------+-----------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |   Cache Flush   |
    +=========+======+==================+==================+=================+
    |    0    |  0   | 106120.0(+1.49%) | 17100.0(-48.24%) | 5260.0(-23.77%) |
    +---------+------+------------------+------------------+-----------------+
    |    0    |  1   | 106880.0(+2.40%) | 17060.0(-47.08%) | 5280.0(-21.89%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  0   |     294360.0     | 15600.0(-20.97%) |      4560.0     |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  1   | 103260.0(+3.82%) | 15440.0(-20.41%) |  4460.0(-5.11%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  2   | 103700.0(+4.33%) | 15260.0(-24.08%) |  4500.0(-2.60%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  3   | 103680.0(+4.26%) | 15720.0(-20.53%) |  4500.0(-1.32%) |
    +---------+------+------------------+------------------+-----------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.14)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |    Cache Flush    |
    +=========+======+==================+==================+===================+
    |    0    |  0   | 267240.0(+9.97%) | 32940.0(+24.68%) | 168460.0(+22.45%) |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  1   | 267340.0(+9.46%) | 33720.0(+28.12%) | 168500.0(+22.21%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  0   | 185740.0(+1.85%) | 25120.0(+6.17%)  |  88380.0(+13.31%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  1   | 101940.0(-5.77%) | 24240.0(+6.88%)  |   4600.0(+4.07%)  |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  2   | 101800.0(-6.04%) | 23060.0(+6.17%)  |   4660.0(+9.91%)  |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  3   | 101820.0(-5.91%) | 23340.0(+7.66%)  |   4640.0(+6.91%)  |
    +---------+------+------------------+------------------+-------------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.13)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |    Cache Flush    |
    +=========+======+==================+==================+===================+
    |    0    |  0   | 243020.0(-9.14%) | 26420.0(-39.51%) | 137580.0(-17.85%) |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  1   | 244240.0(-8.87%) | 26320.0(-38.93%) | 137880.0(-17.73%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  0   | 182360.0(-2.89%) | 23660.0(-15.20%) |  78000.0(-11.08%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  1   | 108180.0(+4.68%) | 22680.0(-14.16%) |       4420.0      |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  2   | 108340.0(+4.92%) | 21720.0(-16.40%) |   4240.0(-4.93%)  |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  3   | 108220.0(+4.82%) | 21680.0(-16.16%) |   4340.0(-3.12%)  |
    +---------+------+------------------+------------------+-------------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.14)

    +---------+------+--------------------+
    | Cluster | Core |      Latency       |
    +=========+======+====================+
    |    0    |  0   |  1200.0(+20.00%)   |
    +---------+------+--------------------+
    |    0    |  1   |   1160.0(+9.43%)   |
    +---------+------+--------------------+
    |    1    |  0   |   700.0(+16.67%)   |
    +---------+------+--------------------+
    |    1    |  1   |   1040.0(+4.00%)   |
    +---------+------+--------------------+
    |    1    |  2   |   1020.0(+4.08%)   |
    +---------+------+--------------------+
    |    1    |  3   |   1080.0(+8.00%)   |
    +---------+------+--------------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.13)

    +---------+------+--------------------+
    | Cluster | Core |      Latency       |
    +=========+======+====================+
    |    0    |  0   |  1000.0(-19.35%)   |
    +---------+------+--------------------+
    |    0    |  1   |  1060.0(-17.19%)   |
    +---------+------+--------------------+
    |    1    |  0   |   600.0(-11.76%)   |
    +---------+------+--------------------+
    |    1    |  1   |   1000.0(+2.04%)   |
    +---------+------+--------------------+
    |    1    |  2   |   980.0(+4.26%)    |
    +---------+------+--------------------+
    |    1    |  3   |   1000.0(+2.04%)   |
    +---------+------+--------------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` to the
wakeup latency, and ``CFLUSH_OVERHEAD`` to the latency of the cache flush
operation.
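
For orientation, each of these quantities is the difference between a pair of
instrumentation timestamps. The sketch below uses hypothetical field names
mirroring TF-A's runtime instrumentation points; it is illustrative rather
than the exact TFTF computation.

.. code-block:: c

    #include <stdint.h>

    /* Tick values captured by the runtime instrumentation (via PMF). */
    struct rt_instr_timestamps {
        uint64_t enter_psci;       /* entry into the PSCI implementation */
        uint64_t enter_hw_low_pwr; /* just before entering the low power state */
        uint64_t exit_hw_low_pwr;  /* first point of execution after wakeup */
        uint64_t exit_psci;        /* return from the PSCI implementation */
        uint64_t enter_cflush;     /* start of the cache flush */
        uint64_t exit_cflush;      /* end of the cache flush */
    };

    /* PSCI_ENTRY: powerdown latency. */
    static uint64_t psci_entry(const struct rt_instr_timestamps *t)
    {
        return t->enter_hw_low_pwr - t->enter_psci;
    }

    /* PSCI_EXIT: wakeup latency. */
    static uint64_t psci_exit(const struct rt_instr_timestamps *t)
    {
        return t->exit_psci - t->exit_hw_low_pwr;
    }

    /* CFLUSH_OVERHEAD: cache flush latency. */
    static uint64_t cflush_overhead(const struct rt_instr_timestamps *t)
    {
        return t->exit_cflush - t->enter_cflush;
    }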

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and
release the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, so both the L1 and L2
caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache size for the big cluster is a lot larger (2MB) compared
to the little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0, but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead
cluster are large because all other CPUs in the cluster are powered down
during the test. The ``CPU_SUSPEND`` call powers down to the cluster level,
requiring a flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the
little CPUs because the L2 cache size for the big cluster is a lot larger
(2MB) compared to the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because
lead CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers
down to level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best-case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for those in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to
execute on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows (a condensed sketch follows the list):

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program wake up timer and suspend the lead CPU to the deepest power level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.
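
A condensed sketch of this sequence, using hypothetical wrappers around the
PSCI SMCs (``psci_cpu_on``, ``psci_cpu_suspend`` and the entry points below
stand in for the actual TFTF helpers):

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical helpers; names are illustrative, not the TFTF API. */
    extern int psci_cpu_on(uint64_t target_mpidr, uintptr_t entrypoint);
    extern int psci_cpu_suspend(uint32_t power_state);
    extern void program_wakeup_timer_ms(uint32_t timeout_ms);
    extern void wait_for_cpu_off(uint64_t target_mpidr);
    extern void off_entrypoint(void);     /* calls CPU_OFF on itself */
    extern void collect_entrypoint(void); /* reports this CPU's timestamps */

    #define NUM_CPUS 6
    #define LEAD_CPU 4 /* big-cluster CPU 4 is the lead CPU */

    void cpu_off_then_suspend_test(const uint64_t mpidr[NUM_CPUS],
                                   uint32_t deepest_power_state)
    {
        /* 1. CPU_ON then CPU_OFF on each non-lead CPU, one at a time. */
        for (int i = 0; i < NUM_CPUS; i++) {
            if (i == LEAD_CPU)
                continue;
            psci_cpu_on(mpidr[i], (uintptr_t)&off_entrypoint);
            wait_for_cpu_off(mpidr[i]);
        }

        /* 2. Arm a wakeup source, then suspend the lead CPU to the
         * deepest power level (system level on Juno). */
        program_wakeup_timer_ms(10);
        psci_cpu_suspend(deepest_power_state);

        /* 3. After wakeup, power the non-lead CPUs back on so each CPU
         * can report its captured timestamps. */
        for (int i = 0; i < NUM_CPUS; i++) {
            if (i != LEAD_CPU)
                psci_cpu_on(mpidr[i], (uintptr_t)&collect_entrypoint);
        }
    }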

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only
requires an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the
little CPUs because the L2 cache size for the big cluster is a lot larger
(2MB) compared to the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF.
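
For a rough idea of how such a measurement can be taken, the sketch below
reads the generic counter around a ``PSCI_VERSION`` SMC from a bare-metal EL1
test payload. The SMC function ID ``0x84000000`` is the standard
``PSCI_VERSION`` ID; the helper names are illustrative.

.. code-block:: c

    #include <stdint.h>

    #define PSCI_VERSION_FNID 0x84000000U /* PSCI_VERSION, SMC32 convention */

    /* Read the generic counter, with an ISB to limit reordering. */
    static inline uint64_t read_cntpct(void)
    {
        uint64_t cnt;
        __asm__ volatile("isb\n\tmrs %0, cntpct_el0" : "=r"(cnt));
        return cnt;
    }

    /* Issue a fast SMC with no arguments and return x0. Assumes an
     * SMCCC 1.1 conduit, where x4-x17 are preserved by the callee. */
    static inline uint64_t smc_call(uint64_t fid)
    {
        register uint64_t x0 __asm__("x0") = fid;
        __asm__ volatile("smc #0" : "+r"(x0) : : "x1", "x2", "x3", "memory");
        return x0;
    }

    /* Round-trip latency of a PSCI_VERSION call, in counter ticks. */
    static uint64_t time_psci_version(void)
    {
        uint64_t start = read_cntpct();
        (void)smc_call(PSCI_VERSION_FNID);
        return read_cntpct() - start;
    }

At the 50MHz generic counter, one tick is 20ns, so the ~520ns figure for CPU 4
below corresponds to roughly 26 ticks.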

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are smaller than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to
subtle cache effects, given that these measurements are at the nanosecond
level.

--------------

*Copyright (c) 2019-2025, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.14-rc0: https://git.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/+/refs/tags/v2.14-rc0
.. _TFTF v2.14-rc0: https://git.trustedfirmware.org/plugins/gitiles/TF-A/tf-a-tests/+/refs/tags/v2.14-rc0