PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has a cluster of 4 x
Cortex-A53 cores and a cluster of 2 x Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.
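
At 50MHz each counter tick corresponds to 20ns, so this is also the finest
resolution any reported latency can have. As a minimal sketch of the
conversion (the function and constant names are illustrative and are not part
of TF-A or TFTF, and the tick delta is a made-up value):

.. code:: python

    # Generic counter frequency on Juno, as stated above.
    COUNTER_FREQ_HZ = 50_000_000


    def ticks_to_us(ticks: int) -> float:
        """Convert the difference of two counter reads into microseconds."""
        return ticks * 1_000_000 / COUNTER_FREQ_HZ


    # One tick is 20 ns, i.e. 0.02 us.
    print(ticks_to_us(1))

    # A hypothetical delta of 5106 ticks is a little over 102 us.
    print(ticks_to_us(5106))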

The following source trees and binaries were used:

- TF-A [`v2.9-rc0`_]
- TFTF [`v2.9-rc0`_]

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>`
page for more details.

Procedure
---------

#. Build TFTF with runtime instrumentation enabled:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            TESTS=runtime-instrumentation all

#. Fetch Juno's SCP binary from TF-A's archive:

    .. code:: shell

        curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
            https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

    .. code:: shell

        make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
            BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
            ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin``,
   ``scp_bl2.bin``.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in parallel

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 243.76    | 239.92  | 6.32        |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 663.5     | 30.32   | 167.82      |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 105.12    | 22.84   | 5.88        |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 384.16    | 19.06   | 4.7         |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 523.98    | 270.46  | 4.74        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 950.54    | 220.9   | 89.2        |
    +---------+------+-----------+---------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in serial

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 266.96    | 31.74   | 167.92      |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 266.9     | 31.52   | 167.82      |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 279.86    | 23.42   | 87.52       |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 101.38    | 18.8    | 4.64        |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 101.18    | 19.28   | 4.64        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 101.32    | 19.02   | 4.62        |
    +---------+------+-----------+---------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in parallel

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 661.94    | 22.88   | 9.66        |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 801.64    | 23.38   | 9.62        |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 105.56    | 16.02   | 8.12        |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 245.42    | 16.26   | 7.78        |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 384.42    | 16.1    | 7.84        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 523.74    | 15.4    | 8.02        |
    +---------+------+-----------+---------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 102.16    | 23.64   | 6.7         |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 101.66    | 23.78   | 6.6         |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 277.74    | 15.96   | 4.66        |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 98.0      | 15.88   | 4.64        |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 97.66     | 15.88   | 4.62        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 97.76     | 15.38   | 4.64        |
    +---------+------+-----------+---------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs

    +---------+------+-----------+---------+-------------+
    | Cluster | Core | Powerdown | Wakeup  | Cache Flush |
    +=========+======+===========+=========+=============+
    | 0       | 0    | 265.38    | 34.12   | 167.36      |
    +---------+------+-----------+---------+-------------+
    | 0       | 1    | 265.72    | 33.98   | 167.48      |
    +---------+------+-----------+---------+-------------+
    | 1       | 0    | 185.3     | 23.18   | 87.42       |
    +---------+------+-----------+---------+-------------+
    | 1       | 1    | 101.58    | 23.46   | 4.48        |
    +---------+------+-----------+---------+-------------+
    | 1       | 2    | 101.66    | 22.02   | 4.72        |
    +---------+------+-----------+---------+-------------+
    | 1       | 3    | 101.48    | 22.22   | 4.52        |
    +---------+------+-----------+---------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores

    +-------------+--------+--------------+
    | Cluster     | Core   | Latency      |
    +=============+========+==============+
    | 0           | 0      | 1.22         |
    +-------------+--------+--------------+
    | 0           | 1      | 1.2          |
    +-------------+--------+--------------+
    | 1           | 0      | 0.6          |
    +-------------+--------+--------------+
    | 1           | 1      | 1.08         |
    +-------------+--------+--------------+
    | 1           | 2      | 1.04         |
    +-------------+--------+--------------+
    | 1           | 3      | 1.04         |
    +-------------+--------+--------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.
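
As a rough illustration of how three such quantities could be derived, the
sketch below assumes that each one is the difference between a pair of counter
reads taken around a single power down and wake up. The pairing, the key names
and the sample tick values are assumptions made for illustration; they are not
taken from the TF-A sources or from the measurements in this document.

.. code:: python

    COUNTER_FREQ_HZ = 50_000_000  # generic counter frequency on Juno


    def to_us(ticks):
        """Convert a tick delta into microseconds."""
        return ticks * 1_000_000 / COUNTER_FREQ_HZ


    def derive_latencies(ts):
        """Derive the three latencies (in microseconds) from raw counter reads."""
        return {
            # Time from entering the PSCI call until the CPU is handed to hardware.
            "PSCI_ENTRY": to_us(ts["enter_hw_low_pwr"] - ts["enter_psci"]),
            # Time from leaving the low power state until the PSCI call returns.
            "PSCI_EXIT": to_us(ts["exit_psci"] - ts["exit_hw_low_pwr"]),
            # Time spent in the cache flush, which is part of the entry path.
            "CFLUSH_OVERHEAD": to_us(ts["exit_cflush"] - ts["enter_cflush"]),
        }


    # Hypothetical counter reads (ticks) for one suspend and resume cycle.
    sample = {
        "enter_psci": 1000,
        "enter_cflush": 1200,
        "exit_cflush": 1500,
        "enter_hw_low_pwr": 2000,
        "exit_hw_low_pwr": 10000,
        "exit_psci": 11000,
    }
    print(derive_latencies(sample))

Under this assumption the cache flush time is contained within the powerdown
time, which is why a large ``CFLUSH_OVERHEAD`` always comes with a large
``PSCI_ENTRY`` in the tables that follow.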

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, so both the L1 and L2
caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache of the big cluster (2MB) is twice the size of the L2 cache
of the little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+


There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).
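
The serialization can be read off the figures themselves. As a quick check
using the ``PSCI_ENTRY`` values for the little cluster (CPUs 0-3) from the
table above, the step between consecutive CPUs is close to constant. The
variable names are illustrative:

.. code:: python

    # PSCI_ENTRY times (us) for little-cluster CPUs 0-3, copied from the table.
    psci_entry_us = [116, 204, 287, 376]

    # Difference between each CPU and the one before it.
    steps = [b - a for a, b in zip(psci_entry_us, psci_entry_us[1:])]
    print(steps)  # [88, 83, 89]

    # Average step between consecutive CPUs.
    mean_step = sum(steps) / len(steps)
    print(round(mean_step, 1))  # 86.7

A near-constant step of roughly 87 us per CPU is what one would expect if the
power down commands are handled one at a time rather than concurrently.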

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the cluster are powered down during the
test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is twice the size of the L2
cache of the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best-case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is twice the size of the L2
cache of the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times
are generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are less than the little CPUs due to greater CPU
performance.

We suspect the time for lead CPU 4 is shorter than CPU 5 due to subtle cache
effects, given that these measurements are at the nanosecond level.

--------------

*Copyright (c) 2019-2023, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _v2.9-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.9-rc0