PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has a cluster of 4 x
Cortex-A53 cores and a cluster of 2 x Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50 MHz on Juno.
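
A 50 MHz counter implies a timestamp granularity of 20 ns (0.02 µs), which is
consistent with the latencies in the results tables being multiples of 0.02 µs.
The following is a minimal sketch of the tick-to-microsecond conversion, not
the PMF implementation itself. The function name ``ticks_to_us`` and the sample
tick values are assumptions made purely for illustration.

.. code:: python

    # Generic counter frequency on Juno, as stated in the text above.
    COUNTER_FREQ_HZ = 50_000_000

    def ticks_to_us(start_ticks, end_ticks, freq_hz=COUNTER_FREQ_HZ):
        """Convert two counter timestamps to an elapsed time in microseconds."""
        return (end_ticks - start_ticks) * 1_000_000 / freq_hz

    # One counter tick is the smallest interval that can be measured.
    resolution_us = ticks_to_us(0, 1)

    # Hypothetical pair of timestamps, 5649 ticks apart.
    elapsed_us = ticks_to_us(1000, 6649)

    print(f"resolution: {resolution_us:.2f} us, elapsed: {elapsed_us:.2f} us")
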

The following source trees and binaries were used:

- `TF-A v2.11-rc0`_
- `TFTF v2.11-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>`
page for more details.

Procedure
---------

#. Build TFTF with runtime instrumentation enabled:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            TESTS=runtime-instrumentation all

#. Fetch Juno's SCP binary from TF-A's archive:

    .. code:: shell

        curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin``,
   ``scp_bl2.bin``.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.11)

    +---------+------+-------------------+--------------------+-------------+
    | Cluster | Core |     Powerdown     |       Wakeup       | Cache Flush |
    +---------+------+-------------------+--------------------+-------------+
    |    0    |  0   |  112.98 (-53.44%) |  26.16 (-89.33%)   |     5.48    |
    +---------+------+-------------------+--------------------+-------------+
    |    0    |  1   |       411.18      | 438.88 (+1572.56%) |    138.54   |
    +---------+------+-------------------+--------------------+-------------+
    |    1    |  0   | 261.82 (+150.88%) | 474.06 (+1649.30%) |     5.6     |
    +---------+------+-------------------+--------------------+-------------+
    |    1    |  1   |  714.76 (+86.84%) |       26.44        |     4.48    |
    +---------+------+-------------------+--------------------+-------------+
    |    1    |  2   |       862.66      |  149.34 (-45.00%)  |     4.38    |
    +---------+------+-------------------+--------------------+-------------+
    |    1    |  3   |      1045.12      |  98.12 (-55.76%)   |    79.74    |
    +---------+------+-------------------+--------------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.10)

    +---------+------+-------------------+--------+-------------+
    | Cluster | Core |     Powerdown     | Wakeup | Cache Flush |
    +---------+------+-------------------+--------+-------------+
    |    0    |  0   | 242.66 (+132.03%) | 245.1  |     5.4     |
    +---------+------+-------------------+--------+-------------+
    |    0    |  1   |  522.08 (+35.87%) | 26.24  |    138.32   |
    +---------+------+-------------------+--------+-------------+
    |    1    |  0   |  104.36 (-57.33%) |  27.1  |     5.32    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  1   |  382.56 (-42.95%) | 23.34  |     4.42    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  2   |       807.74      | 271.54 |     4.64    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  3   |       981.36      | 221.8  |    79.48    |
    +---------+------+-------------------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.11)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   244.42  | 27.42  |    138.12   |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   245.02  | 27.34  |    138.08   |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   297.66  |  26.2  |    77.68    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   108.02  | 21.94  |     4.52    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   107.48  | 21.88  |     4.46    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   107.52  | 21.86  |     4.46    |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   236.84  |  27.1  |    138.36   |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   236.96  |  27.1  |    138.32   |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   280.06  | 26.94  |     77.5    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   100.76  | 23.42  |     4.36    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   100.02  | 23.42  |     4.44    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   100.08  |  23.2  |     4.4     |
    +---------+------+-----------+--------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.11)

    +---------+------+-------------------+--------+-------------+
    | Cluster | Core |     Powerdown     | Wakeup | Cache Flush |
    +---------+------+-------------------+--------+-------------+
    |    0    |  0   |       704.46      | 19.28  |     7.86    |
    +---------+------+-------------------+--------+-------------+
    |    0    |  1   |       853.66      | 18.78  |     7.82    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  0   | 556.52 (+425.51%) | 19.06  |     7.82    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  1   |  113.28 (-70.47%) | 19.28  |     7.48    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  2   |  260.62 (-50.22%) |  19.8  |     7.26    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  3   |  408.16 (+66.94%) | 19.82  |     7.38    |
    +---------+------+-------------------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.10)

    +---------+------+-------------------+--------+-------------+
    | Cluster | Core |     Powerdown     | Wakeup | Cache Flush |
    +---------+------+-------------------+--------+-------------+
    |    0    |  0   |       801.04      | 18.66  |     8.22    |
    +---------+------+-------------------+--------+-------------+
    |    0    |  1   |       661.28      | 19.08  |     7.88    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  0   |  105.9 (-72.51%)  |  20.3  |     7.58    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  1   | 383.58 (+261.32%) |  20.4  |     7.42    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  2   |       523.52      |  20.1  |     7.74    |
    +---------+------+-------------------+--------+-------------+
    |    1    |  3   |       244.5       | 20.16  |     7.56    |
    +---------+------+-------------------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.11)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   106.78  |  19.2  |     5.32    |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   107.44  | 19.64  |     5.44    |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   295.82  | 19.14  |     4.34    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   104.34  | 19.18  |     4.28    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   103.96  | 19.34  |     4.4     |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   104.32  | 19.18  |     4.34    |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   99.84   | 18.86  |     5.54    |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   100.2   | 18.82  |     5.66    |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   278.12  | 20.56  |     4.48    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   96.68   | 20.62  |     4.3     |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   96.94   | 20.14  |     4.42    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   96.68   | 20.46  |     4.32    |
    +---------+------+-----------+--------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.11)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   243.62  | 29.84  |    137.66   |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   243.88  | 29.54  |    137.8    |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   183.26  | 26.22  |    77.76    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   107.64  | 26.74  |     4.34    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   107.52  |  25.9  |     4.32    |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   107.74  |  25.8  |     4.34    |
    +---------+------+-----------+--------+-------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.10)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +---------+------+-----------+--------+-------------+
    |    0    |  0   |   236.04  | 30.02  |    137.9    |
    +---------+------+-----------+--------+-------------+
    |    0    |  1   |   235.38  |  29.7  |    137.72   |
    +---------+------+-----------+--------+-------------+
    |    1    |  0   |   175.18  | 26.96  |    77.26    |
    +---------+------+-----------+--------+-------------+
    |    1    |  1   |   100.56  | 28.34  |     4.32    |
    +---------+------+-----------+--------+-------------+
    |    1    |  2   |   100.38  | 26.82  |     4.3     |
    +---------+------+-----------+--------+-------------+
    |    1    |  3   |   100.86  | 26.98  |     4.42    |
    +---------+------+-----------+--------+-------------+

``CPU_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_VERSION`` latency (µs) in parallel on all cores (v2.11)

    +-------------+--------+--------------+
    |   Cluster   |  Core  |   Latency    |
    +-------------+--------+--------------+
    |      0      |   0    |     1.26     |
    +-------------+--------+--------------+
    |      0      |   1    |     0.96     |
    +-------------+--------+--------------+
    |      1      |   0    |     0.54     |
    +-------------+--------+--------------+
    |      1      |   1    |     0.94     |
    +-------------+--------+--------------+
    |      1      |   2    |     0.92     |
    +-------------+--------+--------------+
    |      1      |   3    |     1.02     |
    +-------------+--------+--------------+

.. table:: ``CPU_VERSION`` latency (µs) in parallel on all cores (v2.10)

    +-------------+--------+----------------------+
    |   Cluster   |  Core  |       Latency        |
    +-------------+--------+----------------------+
    |      0      |   0    |    1.1 (-25.68%)     |
    +-------------+--------+----------------------+
    |      0      |   1    |         1.06         |
    +-------------+--------+----------------------+
    |      1      |   0    |         0.58         |
    +-------------+--------+----------------------+
    |      1      |   1    |         0.88         |
    +-------------+--------+----------------------+
    |      1      |   2    |         0.92         |
    +-------------+--------+----------------------+
    |      1      |   3    |         0.9          |
    +-------------+--------+----------------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache size for the big cluster is a lot larger (2MB) compared to
the little cluster (1MB).
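
As a rough cross-check of this explanation, the sketch below compares the ratio
of the two ``CFLUSH_OVERHEAD`` values from the table above with the ratio of
the L2 cache sizes quoted in the text. The variable names are illustrative
only, and the comparison ignores differences in core frequency and in how much
of each cache is dirty, so it should be read as indicative rather than exact.

.. code:: python

    # CFLUSH_OVERHEAD values (us) for the last CPU to power down in each
    # cluster, taken from the table above.
    cflush_us = {"cpu3_little": 94, "cpu5_big": 206}

    # L2 cache sizes (MB) as quoted in the text.
    l2_size_mb = {"little": 1, "big": 2}

    time_ratio = cflush_us["cpu5_big"] / cflush_us["cpu3_little"]
    size_ratio = l2_size_mb["big"] / l2_size_mb["little"]

    print(f"flush time ratio: {time_ratio:.2f}, L2 size ratio: {size_ratio:.2f}")
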

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.
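
The effect of this serialization is visible in the table above: the
``PSCI_ENTRY`` time of each successive little-cluster CPU grows by a roughly
constant step. The following sketch uses illustrative variable names and the
values for CPUs 0-3 from the table to compute those steps.

.. code:: python

    # PSCI_ENTRY times (us) for little-cluster CPUs 0-3, from the table above.
    psci_entry_us = [116, 204, 287, 376]

    # Difference between each CPU and the one before it.
    steps_us = [later - earlier
                for earlier, later in zip(psci_entry_us, psci_entry_us[1:])]

    mean_step_us = sum(steps_us) / len(steps_us)

    print(steps_us, round(mean_step_us, 1))

Under this simple reading, each additional CPU queued on the SCP channel adds
about one such step to its ``PSCI_ENTRY`` time.
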

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
39440d553cfSPaul Beesley
39540d553cfSPaul Beesley+-------+---------------------+--------------------+--------------------------+
39640d553cfSPaul Beesley| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
39740d553cfSPaul Beesley+=======+=====================+====================+==========================+
39840d553cfSPaul Beesley| 0     | 114                 | 20                 | 94                       |
39940d553cfSPaul Beesley+-------+---------------------+--------------------+--------------------------+
40040d553cfSPaul Beesley| 1     | 114                 | 20                 | 94                       |
40140d553cfSPaul Beesley+-------+---------------------+--------------------+--------------------------+
40240d553cfSPaul Beesley| 2     | 114                 | 20                 | 94                       |
40340d553cfSPaul Beesley+-------+---------------------+--------------------+--------------------------+
40440d553cfSPaul Beesley| 3     | 114                 | 20                 | 94                       |
40540d553cfSPaul Beesley+-------+---------------------+--------------------+--------------------------+
40640d553cfSPaul Beesley| 4     | 195                 | 22                 | 180                      |
40740d553cfSPaul Beesley+-------+---------------------+--------------------+--------------------------+
40840d553cfSPaul Beesley| 5     | 21                  | 17                 | 6                        |
40940d553cfSPaul Beesley+-------+---------------------+--------------------+--------------------------+
41040d553cfSPaul Beesley
411be653a69SPaul BeesleyThe ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
41240d553cfSPaul Beesleyare large because all other CPUs in the cluster are powered down during the
41340d553cfSPaul Beesleytest. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
41440d553cfSPaul Beesleyflush of both L1 and L2 caches.
41540d553cfSPaul Beesley
41640d553cfSPaul BeesleyThe ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
41740d553cfSPaul BeesleyCPUs because the L2 cache size for the big cluster is lot larger (2MB) compared
41840d553cfSPaul Beesleyto the little cluster (1MB).
41940d553cfSPaul Beesley
42040d553cfSPaul BeesleyThe ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
42140d553cfSPaul BeesleyCPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
42240d553cfSPaul Beesleylevel 0, which only requires L1 cache flush.
42340d553cfSPaul Beesley
``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best-case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to collect the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

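The power levels described above can be reproduced with a small model of the
test sequence. This is a simplified sketch rather than TF-A's actual
coordination code: it assumes CPUs 0-3 form the little cluster and CPUs 4-5 the
big cluster, with CPU 4 as the lead CPU, and it treats a power level as able to
power down only when the calling CPU is the last one running in it. All names
(``CLUSTERS``, ``deepest_level``, ``run_sequence``) are hypothetical.

```python
# Assumed topology: CPUs 0-3 are the little cluster, CPUs 4-5 the big cluster.
# Power levels: 0 = CPU, 1 = cluster, 2 = system.
CLUSTERS = {"little": [0, 1, 2, 3], "big": [4, 5]}
LEAD_CPU = 4


def cluster_of(cpu):
    """Return the name of the cluster that contains `cpu`."""
    for name, cpus in CLUSTERS.items():
        if cpu in cpus:
            return name
    raise ValueError(f"unknown CPU {cpu}")


def deepest_level(cpu, running):
    """Deepest level `cpu` can power down to, given the set of running CPUs.

    Simplified rule: a level powers down only if `cpu` is the last CPU
    still running inside it.
    """
    others = running - {cpu}
    if any(c in others for c in CLUSTERS[cluster_of(cpu)]):
        return 0  # a sibling is still running: CPU level only
    if others:
        return 1  # last CPU in its cluster, but another cluster is up
    return 2      # last CPU in the system


def run_sequence():
    """Replay the test sequence and record the level each CPU reaches."""
    levels = {}
    running = {LEAD_CPU}
    for cpu in [0, 1, 2, 3, 5]:  # step 1: CPU_ON then CPU_OFF, one at a time
        running.add(cpu)
        levels[cpu] = deepest_level(cpu, running)
        running.remove(cpu)
    # Step 2: the lead CPU suspends last, with nothing else running.
    levels[LEAD_CPU] = deepest_level(LEAD_CPU, running)
    return levels


print(run_sequence())
```

In this model each little CPU reaches level 1 (cluster), CPU 5 stays at level 0
because CPU 4 is still running, and CPU 4 reaches level 2, which matches the
pattern of ``CFLUSH_OVERHEAD`` values in the table.
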
The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache for the big cluster is a lot larger (2MB) than that of
the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times
are generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round-trip latency for handling a fast SMC at EL3 in TF-A.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are less than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.

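To separate clock frequency from other effects, the times above can be
converted to approximate core clock cycles. The sketch below uses the
frequencies listed in the Method section (650 MHz for Cortex-A53, 900 MHz for
Cortex-A57) and assumes CPUs 0-3 are the little cores and CPUs 4-5 the big
cores. The helper names are illustrative only, and the result is an estimate
because the timestamps themselves are not taken from a core-clock counter.

```python
# Core frequencies in MHz, from the Method section of this document.
FREQ_MHZ = {"little": 650, "big": 900}

# PSCI_VERSION round-trip times in ns, copied from the table above.
TOTAL_TIME_NS = {0: 3020, 1: 2940, 2: 2980, 3: 3060, 4: 520, 5: 720}


def cluster_of(cpu):
    """Assumption: CPUs 0-3 are little cores, CPUs 4-5 are big cores."""
    return "little" if cpu < 4 else "big"


def ns_to_cycles(time_ns, freq_mhz):
    """Convert a duration to core clock cycles: ns * MHz / 1000."""
    return round(time_ns * freq_mhz / 1000)


for cpu, time_ns in TOTAL_TIME_NS.items():
    cycles = ns_to_cycles(time_ns, FREQ_MHZ[cluster_of(cpu)])
    print(f"CPU {cpu}: {time_ns} ns is about {cycles} cycles")
```

With these inputs the little cores come out at roughly 1900 to 2000 cycles and
the big cores at roughly 470 to 650 cycles, so the gap is wider than the
difference in clock frequency alone would produce.
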
--------------

*Copyright (c) 2019-2024, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.11-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.11-rc0
.. _TFTF v2.11-rc0: https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/?h=v2.11-rc0