PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests. It has a cluster of four
Cortex-A53 cores and a cluster of two Cortex-A57 cores. The clock domains ran at
the following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power-down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.
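
At 50MHz, one counter tick corresponds to 20 ns, which sets the resolution of
every timestamp. The following minimal sketch converts a raw tick count into
microseconds. The ``ticks_to_us`` helper and the ``CNTFRQ_HZ`` variable are
illustrative names chosen here, not part of TF-A or TFTF.

.. code:: shell

    #!/bin/sh
    # Hypothetical helper: convert generic-counter ticks to microseconds,
    # assuming the 50MHz counter frequency used on Juno (1 tick = 20 ns).
    CNTFRQ_HZ=50000000

    ticks_to_us() {
        # POSIX sh arithmetic is integer-only, so let awk do the division.
        awk -v t="$1" -v f="$CNTFRQ_HZ" 'BEGIN { printf "%.2f\n", t * 1000000 / f }'
    }

    # Example: 12188 ticks is 243.76 us.
    ticks_to_us 12188

For instance, a measured delta of 12188 ticks corresponds to 243.76 µs, and a
single tick to 0.02 µs.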

The following source trees and binaries were used:

- TF-A [`v2.9-rc0`_]
- TFTF [`v2.9-rc0`_]

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>`
page for more details.

Procedure
---------

#. Build TFTF with runtime instrumentation enabled:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            TESTS=runtime-instrumentation all

#. Fetch Juno's SCP binary from TF-A's archive:

    .. code:: shell

        curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin``,
   ``scp_bl2.bin``.
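
The build steps above can be chained into one script. This is a minimal sketch,
not an official TF-A or TFTF script. ``TFTF_DIR`` and ``TFA_DIR`` are
hypothetical checkout locations. ``DRY_RUN`` is an illustrative switch that only
prints the commands. The ``tftf.bin`` location assumes TFTF's default
``build/juno/release`` output directory.

.. code:: shell

    #!/bin/sh
    # Sketch only: chains the three build steps of the procedure.
    # TFTF_DIR and TFA_DIR are hypothetical checkout locations; override
    # them (and CROSS_COMPILE) from the environment as needed.
    set -eu

    TFTF_DIR=${TFTF_DIR:-"$PWD/tf-a-tests"}
    TFA_DIR=${TFA_DIR:-"$PWD/trusted-firmware-a"}
    CROSS_COMPILE=${CROSS_COMPILE:-aarch64-none-elf-}
    SCP_URL=https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

    # Print the command instead of running it when DRY_RUN=1.
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    build_all() {
        # Step 1: TFTF with the runtime instrumentation tests.
        run make -C "$TFTF_DIR" CROSS_COMPILE="$CROSS_COMPILE" PLAT=juno \
            TESTS=runtime-instrumentation all

        # Step 2: Juno SCP_BL2 image from the TF-A download archive.
        run curl --fail --connect-timeout 5 --retry 5 -sLS \
            -o "$TFA_DIR/scp_bl2.bin" "$SCP_URL"

        # Step 3: TF-A with TFTF as BL33 and runtime instrumentation enabled.
        # The tftf.bin path assumes TFTF's default release output directory.
        run make -C "$TFA_DIR" CROSS_COMPILE="$CROSS_COMPILE" PLAT=juno \
            BL33="$TFTF_DIR/build/juno/release/tftf.bin" \
            SCP_BL2="$TFA_DIR/scp_bl2.bin" \
            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip
    }

Running ``DRY_RUN=1 build_all`` first lets you review the commands before
``build_all`` executes them. With a default release build, ``fip.bin`` should
then be under ``$TFA_DIR/build/juno/release/``.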

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   243.76  |  239.92 |     6.32    |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   663.5   |  30.32  |    167.82   |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   105.12  |  22.84  |     5.88    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |   384.16  |  19.06  |     4.7     |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   523.98  |  270.46 |     4.74    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   950.54  |  220.9  |     89.2    |
    +---------+------+-----------+---------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   266.96  |  31.74  |    167.92   |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   266.9   |  31.52  |    167.82   |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   279.86  |  23.42  |    87.52    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |   101.38  |   18.8  |     4.64    |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   101.18  |  19.28  |     4.64    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   101.32  |  19.02  |     4.62    |
    +---------+------+-----------+---------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   661.94  |  22.88  |     9.66    |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   801.64  |  23.38  |     9.62    |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   105.56  |  16.02  |     8.12    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |   245.42  |  16.26  |     7.78    |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   384.42  |   16.1  |     7.84    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   523.74  |   15.4  |     8.02    |
    +---------+------+-----------+---------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   102.16  |  23.64  |     6.7     |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   101.66  |  23.78  |     6.6     |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   277.74  |  15.96  |     4.66    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |    98.0   |  15.88  |     4.64    |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   97.66   |  15.88  |     4.62    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   97.76   |  15.38  |     4.64    |
    +---------+------+-----------+---------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This test calls ``CPU_OFF`` on all non-lead CPUs in sequence, then
``CPU_SUSPEND`` on the lead core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   265.38  |  34.12  |    167.36   |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   265.72  |  33.98  |    167.48   |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   185.3   |  23.18  |    87.42    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |   101.58  |  23.46  |     4.48    |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   101.66  |  22.02  |     4.72    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   101.48  |  22.22  |     4.52    |
    +---------+------+-----------+---------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores

    +-------------+--------+--------------+
    |   Cluster   |  Core  |   Latency    |
    +=============+========+==============+
    |      0      |   0    |     1.22     |
    +-------------+--------+--------------+
    |      0      |   1    |     1.2      |
    +-------------+--------+--------------+
    |      1      |   0    |     0.6      |
    +-------------+--------+--------------+
    |      1      |   1    |     1.08     |
    +-------------+--------+--------------+
    |      1      |   2    |     1.04     |
    +-------------+--------+--------------+
    |      1      |   3    |     1.04     |
    +-------------+--------+--------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` to the
wakeup latency, and ``CFLUSH_OVERHEAD`` to the latency of the cache flush
operation.
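
Each of these latencies can be viewed as the difference between two timestamps
taken on the same CPU. The following minimal sketch shows that arithmetic under
several assumptions. The six raw timestamps are generic-counter tick values
passed in the order shown. The powerdown latency is taken as the interval from
entering PSCI to entering the low-power state. The wakeup latency is taken as
the interval from leaving the low-power state to leaving PSCI. The counter
frequency is the 50MHz quoted in the Method section. The function names are
illustrative only. See the methodology page referenced above for the exact
timestamp definitions.

.. code:: shell

    #!/bin/sh
    # Sketch: derive the three latencies from six raw timestamps taken on one
    # CPU, in generic-counter ticks. The argument order and the pairing of
    # timestamps are assumptions made for illustration.
    CNTFRQ_HZ=50000000

    # Print (end - start) converted from ticks to microseconds.
    delta_us() {
        awk -v s="$1" -v e="$2" -v f="$CNTFRQ_HZ" \
            'BEGIN { printf "%.2f\n", (e - s) * 1000000 / f }'
    }

    # Usage: psci_latencies ENTER_PSCI ENTER_LOW_PWR EXIT_LOW_PWR EXIT_PSCI \
    #                       ENTER_CFLUSH EXIT_CFLUSH
    psci_latencies() {
        echo "PSCI_ENTRY $(delta_us "$1" "$2")"      # powerdown latency
        echo "PSCI_EXIT $(delta_us "$3" "$4")"       # wakeup latency
        echo "CFLUSH_OVERHEAD $(delta_us "$5" "$6")" # cache flush latency
    }

With the made-up tick values ``psci_latencies 1000 6000 90000 91000 2000 2250``,
the sketch prints ``PSCI_ENTRY 100.00``, ``PSCI_EXIT 20.00`` and
``CFLUSH_OVERHEAD 5.00``.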

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in its cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, so both the L1 and L2
caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is much larger than that for CPU 3
because the big cluster's L2 cache (2MB) is twice the size of the little
cluster's (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0, but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power-down commands from all CPUs.
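
One way to check this serialization is to sort the ``PSCI_ENTRY`` times of the
little CPUs from the table above and look at the step between consecutive
values. A queue through a single channel would be expected to produce a roughly
constant step. The following minimal sketch does this with ``sort`` and
``awk``. The ``step_sizes`` function is an illustrative helper, not part of any
TF-A tooling.

.. code:: shell

    #!/bin/sh
    # Hypothetical helper: read one latency per line on stdin, sort the values
    # numerically and print the difference between each consecutive pair.
    step_sizes() {
        sort -n | awk 'NR > 1 { print $1 - prev } { prev = $1 }'
    }

    # PSCI_ENTRY times (us) of the little CPUs 0-3 from the table above.
    printf '%s\n' 116 204 287 376 | step_sizes

For CPUs 0-3 this prints steps of 88, 83 and 89 µs. Applying the same function
to the Powerdown column of the newer power level 0 parallel results gives steps
of roughly 138 to 140 µs. Both patterns are consistent with one CPU at a time
holding the channel.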

On platforms with a more efficient CPU power-down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and more consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 (L1) is flushed.

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead
cluster are large because all other CPUs in the cluster are powered down during
the test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring
a flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the big cluster's L2 cache (2MB) is twice the size of the little
cluster's (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best-case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
those for CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the previous test because
the cluster remains powered on throughout the test and there is less code to
execute on power-on (for example, there is no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to retrieve the timestamps from each
   CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the big cluster's L2 cache (2MB) is twice the size of the little
cluster's (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
those for CPUs in the little cluster due to greater CPU performance. These times
are generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" than in the "suspend
finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round-trip latency for handling a fast SMC at EL3 in TF.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are lower than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.
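
To put these nanosecond figures in perspective, assume the same 50MHz generic
counter described in the Method section. Each timestamp then has a resolution
of 20 ns, so the big-CPU measurements span only a few dozen counter ticks. The
following minimal sketch computes the tick count and the share a single tick
represents. ``tick_share`` and ``TICK_NS`` are illustrative names.

.. code:: shell

    #!/bin/sh
    # Hypothetical helper: for a latency in ns, print how many 20 ns ticks of a
    # 50MHz counter it spans and what share (%) a single tick represents.
    TICK_NS=20

    tick_share() {
        awk -v ns="$1" -v t="$TICK_NS" \
            'BEGIN { printf "%d ticks, 1 tick = %.1f%%\n", ns / t, 100 * t / ns }'
    }

    # PSCI_VERSION times (ns) for CPUs 4 and 5 from the table above.
    tick_share 520
    tick_share 720

This prints ``26 ticks, 1 tick = 3.8%`` and ``36 ticks, 1 tick = 2.8%``. The
200 ns gap between CPUs 4 and 5 therefore amounts to only 10 ticks, a scale at
which small effects such as cache state can plausibly show up.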
41440d553cfSPaul Beesley
415bd97f83aSJohn Tsichritzis--------------
416bd97f83aSJohn Tsichritzis
4170cbcccc0SHarrison Mutai*Copyright (c) 2019-2023, Arm Limited and Contributors. All rights reserved.*
418bd97f83aSJohn Tsichritzis
4190cbcccc0SHarrison Mutai.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
42040d553cfSPaul Beesley.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
421a3077ae1SHarrison Mutai.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0
422