PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has one cluster of
4 x Cortex-A53 cores and one cluster of 2 x Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50 MHz on Juno.
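
A 50 MHz counter advances once every 20 ns, so latencies derived from it are
quantised to multiples of 0.02 µs. The following is a minimal sketch of the
tick-to-microsecond conversion. The function name and the sample timestamps
are illustrative assumptions; they are not taken from PMF or TFTF.

.. code:: python

    COUNTER_FREQ_HZ = 50_000_000  # generic counter frequency on Juno, as stated above


    def ticks_to_us(start_ticks, end_ticks, freq_hz=COUNTER_FREQ_HZ):
        """Convert a pair of counter timestamps to elapsed microseconds."""
        return (end_ticks - start_ticks) * 1_000_000 / freq_hz


    # Hypothetical timestamps that are 316 ticks apart (316 * 20 ns)
    print(ticks_to_us(1000, 1316))

With these sample values the elapsed time is 6.32 µs. A single tick corresponds
to 0.02 µs, which is the smallest difference the measurements can resolve.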

The following source trees and binaries were used:

- TF-A [`v2.9-rc0`_]
- TFTF [`v2.9-rc0`_]

Please see the Runtime Instrumentation `Testing Methodology`_ page for more
details.

Procedure
---------

#. Build TFTF with runtime instrumentation enabled:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            TESTS=runtime-instrumentation all

#. Fetch Juno's SCP binary from TF-A's archive:

    .. code:: shell

        curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin``,
   ``scp_bl2.bin``.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   243.76  |  239.92 |     6.32    |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   663.5   |  30.32  |    167.82   |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   105.12  |  22.84  |     5.88    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |   384.16  |  19.06  |     4.7     |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   523.98  |  270.46 |     4.74    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   950.54  |  220.9  |     89.2    |
    +---------+------+-----------+---------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   266.96  |  31.74  |    167.92   |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   266.9   |  31.52  |    167.82   |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   279.86  |  23.42  |    87.52    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |   101.38  |   18.8  |     4.64    |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   101.18  |  19.28  |     4.64    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   101.32  |  19.02  |     4.62    |
    +---------+------+-----------+---------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   661.94  |  22.88  |     9.66    |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   801.64  |  23.38  |     9.62    |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   105.56  |  16.02  |     8.12    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |   245.42  |  16.26  |     7.78    |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   384.42  |   16.1  |     7.84    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   523.74  |   15.4  |     8.02    |
    +---------+------+-----------+---------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   102.16  |  23.64  |     6.7     |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   101.66  |  23.78  |     6.6     |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   277.74  |  15.96  |     4.66    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |    98.0   |  15.88  |     4.64    |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   97.66   |  15.88  |     4.62    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   97.76   |  15.38  |     4.64    |
    +---------+------+-----------+---------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    |    0    |  0   |   265.38  |  34.12  |    167.36   |
    +---------+------+-----------+---------+-------------+
    |    0    |  1   |   265.72  |  33.98  |    167.48   |
    +---------+------+-----------+---------+-------------+
    |    1    |  0   |   185.3   |  23.18  |    87.42    |
    +---------+------+-----------+---------+-------------+
    |    1    |  1   |   101.58  |  23.46  |     4.48    |
    +---------+------+-----------+---------+-------------+
    |    1    |  2   |   101.66  |  22.02  |     4.72    |
    +---------+------+-----------+---------+-------------+
    |    1    |  3   |   101.48  |  22.22  |     4.52    |
    +---------+------+-----------+---------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores

    +-------------+--------+--------------+
    |   Cluster   |  Core  |   Latency    |
    +=============+========+==============+
    |      0      |   0    |     1.22     |
    +-------------+--------+--------------+
    |      0      |   1    |     1.2      |
    +-------------+--------+--------------+
    |      1      |   0    |     0.6      |
    +-------------+--------+--------------+
    |      1      |   1    |     1.08     |
    +-------------+--------+--------------+
    |      1      |   2    |     1.04     |
    +-------------+--------+--------------+
    |      1      |   3    |     1.04     |
    +-------------+--------+--------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` to the
wakeup latency, and ``CFLUSH_OVERHEAD`` to the latency of the cache flush
operation.

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache for the big cluster is a lot larger (2MB) than that for
the little cluster (1MB).
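
As a rough sanity check on this explanation, the ratio of the two flush times
can be compared with the ratio of the L2 sizes. The sketch below uses only the
figures quoted above. It ignores the L1 flush and the different cluster
frequencies, so only approximate agreement should be expected.

.. code:: python

    # CFLUSH_OVERHEAD (us) for the last CPU to power down in each cluster
    flush_us = {"little": 94, "big": 206}
    # L2 cache sizes (MB) quoted above
    l2_mb = {"little": 1, "big": 2}

    time_ratio = flush_us["big"] / flush_us["little"]
    size_ratio = l2_mb["big"] / l2_mb["little"]
    print(f"flush time ratio: {time_ratio:.2f}, L2 size ratio: {size_ratio:.2f}")

The flush time ratio comes out at about 2.19 against a size ratio of 2, which
is broadly in line with the L2 size being the dominant factor.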

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0, but there is
still a large variance in ``PSCI_ENTRY`` times across CPUs due to lock
contention in Juno platform code. The platform lock is used to mediate access
to a single SCP communication channel. This is compounded by the SCP firmware
waiting for each AP CPU to enter WFI before making the channel available to
other CPUs, which effectively serializes the SCP power down commands from all
CPUs.
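
The serialization shows up as a roughly constant step between successive
little-cluster ``PSCI_ENTRY`` times. The sketch below computes those steps from
the table above. It assumes the CPUs obtain the channel in index order, which
the data suggests but the test itself does not guarantee.

.. code:: python

    # PSCI_ENTRY (us) for little-cluster CPUs 0-3, from the table above
    psci_entry_us = [116, 204, 287, 376]

    # Extra time each CPU spends compared with the previous one
    steps = [b - a for a, b in zip(psci_entry_us, psci_entry_us[1:])]
    mean_step = sum(steps) / len(steps)
    print(steps, round(mean_step, 1))

The steps are 88, 83 and 89 µs, a mean of about 86.7 µs. Under the stated
assumption, this approximates the time for which one CPU holds the channel.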

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the cluster are powered down during the
test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache for the big cluster is a lot larger (2MB) than that
for the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from that CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache for the big cluster is a lot larger (2MB) than that
for the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are less than the little CPUs due to greater CPU
performance.

We suspect the time for lead CPU 4 is shorter than CPU 5 due to subtle cache
effects, given that these measurements are at the nanosecond level.
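
To put the nanosecond-level figures in context, the times can be expressed in
counter ticks. The sketch below assumes that the 50 MHz generic counter
described in the Method section (20 ns per tick) also applied to these historic
runs, which this document does not state explicitly.

.. code:: python

    TICK_NS = 20  # one tick of a 50 MHz counter (assumed for these runs)

    # TOTAL TIME (ns) per CPU, from the table above
    total_ns = {0: 3020, 1: 2940, 2: 2980, 3: 3060, 4: 520, 5: 720}

    # Express each measurement as a whole number of counter ticks
    ticks = {cpu: ns // TICK_NS for cpu, ns in total_ns.items()}
    whole_ticks = all(ns % TICK_NS == 0 for ns in total_ns.values())
    print(ticks, whole_ticks)

Every value in the table is a whole number of 20 ns ticks, and the gap between
CPU 4 and CPU 5 is 10 ticks under this assumption.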

--------------

*Copyright (c) 2019-2023, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0
.. _Testing Methodology: ../perf/psci-performance-methodology.html