PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has one cluster of 4 x
Cortex-A53 cores and one cluster of 2 x Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50 MHz on Juno.
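
As a rough illustration of what this counter frequency implies for timestamp
resolution, the following sketch converts the difference between two generic
counter readings into microseconds. It is not part of TF-A or TFTF: the names
``GENERIC_COUNTER_HZ``, ``ticks_to_us`` and ``tick_resolution_ns`` are
hypothetical, and only the 50 MHz figure comes from the text above.

.. code:: python

    # Illustrative only: these names are hypothetical and do not exist in TF-A.
    GENERIC_COUNTER_HZ = 50_000_000  # 50 MHz generic counter on Juno (from the text)


    def ticks_to_us(ticks: int, freq_hz: int = GENERIC_COUNTER_HZ) -> float:
        """Convert a tick delta between two timestamps into microseconds."""
        return ticks * 1_000_000 / freq_hz


    def tick_resolution_ns(freq_hz: int = GENERIC_COUNTER_HZ) -> float:
        """Return the duration of one counter tick in nanoseconds."""
        return 1_000_000_000 / freq_hz


    if __name__ == "__main__":
        print(tick_resolution_ns())  # duration of one tick, in ns
        print(ticks_to_us(5000))     # a delta of 5000 ticks, in µs

At 50 MHz one tick lasts 20 ns, so a single timestamp pair cannot resolve
differences smaller than that.
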

The following source trees and binaries were used:

- `TF-A v2.12-rc0`_
- `TFTF v2.12-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>`
page for more details.

Procedure
---------

#. Build TFTF with runtime instrumentation enabled:

   .. code:: shell

      make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
          TESTS=runtime-instrumentation all

#. Fetch Juno's SCP binary from TF-A's archive:

   .. code:: shell

      curl --fail --connect-timeout 5 --retry 5 -sLS -o scp_bl2.bin \
          https://downloads.trustedfirmware.org/tf-a/css_scp_2.12.0/juno/release/juno-bl2.bin

#. Build TF-A with the following build options:

   .. code:: shell

      make CROSS_COMPILE=aarch64-none-elf- PLAT=juno \
          BL33="/path/to/tftf.bin" SCP_BL2="scp_bl2.bin" \
          ENABLE_RUNTIME_INSTRUMENTATION=1 fiptool all fip

#. Load the following images onto the development board: ``fip.bin``,
   ``scp_bl2.bin``.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.12)

    +---------+------+-------------------+------------------+--------------------+
    | Cluster | Core | Powerdown         | Wakeup           | Cache Flush        |
    +=========+======+===================+==================+====================+
    | 0       | 0    | 244.52 (-65.43%)  | 26.92 (-32.60%)  | 5.54 (-96.70%)     |
    +---------+------+-------------------+------------------+--------------------+
    | 0       | 1    | 526.18 (+105.12%) | 416.1            | 138.52 (+2011.59%) |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 0    | 104.34            | 27.02 (-94.62%)  | 5.32               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 1    | 384.98            | 23.06 (-85.40%)  | 4.48               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 2    | 812.44 (+45.94%)  | 126.78           | 4.54               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 3    | 986.84            | 77.22 (+176.58%) | 79.76              |
    +---------+------+-------------------+------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.11)

    +---------+------+-------------------+--------------------+-------------+
    | Cluster | Core | Powerdown         | Wakeup             | Cache Flush |
    +=========+======+===================+====================+=============+
    | 0       | 0    | 112.98 (-53.44%)  | 26.16 (-89.33%)    | 5.48        |
    +---------+------+-------------------+--------------------+-------------+
    | 0       | 1    | 411.18            | 438.88 (+1572.56%) | 138.54      |
    +---------+------+-------------------+--------------------+-------------+
    | 1       | 0    | 261.82 (+150.88%) | 474.06 (+1649.30%) | 5.6         |
    +---------+------+-------------------+--------------------+-------------+
    | 1       | 1    | 714.76 (+86.84%)  | 26.44              | 4.48        |
    +---------+------+-------------------+--------------------+-------------+
    | 1       | 2    | 862.66            | 149.34 (-45.00%)   | 4.38        |
    +---------+------+-------------------+--------------------+-------------+
    | 1       | 3    | 1045.12           | 98.12 (-55.76%)    | 79.74       |
    +---------+------+-------------------+--------------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +=========+======+===========+=================+=============+
    | 0       | 0    | 236.36    | 27.94 (-31.52%) | 138.0       |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 236.58    | 27.86 (-31.72%) | 138.2       |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 280.68    | 27.02           | 77.6        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 101.4     | 22.52           | 4.42        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 100.92    | 22.68           | 4.4         |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 100.96    | 22.54           | 4.38        |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.11)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +=========+======+===========+========+=============+
    | 0       | 0    | 244.42    | 27.42  | 138.12      |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 245.02    | 27.34  | 138.08      |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 297.66    | 26.2   | 77.68       |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 108.02    | 21.94  | 4.52        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 107.48    | 21.88  | 4.46        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 107.52    | 21.86  | 4.46        |
    +---------+------+-----------+--------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.12)

    +--------------------------------------------------------------------+
    | test_rt_instr_cpu_susp_parallel                                    |
    +---------+------+-------------------+-----------------+-------------+
    | Cluster | Core | Powerdown         | Wakeup          | Cache Flush |
    +=========+======+===================+=================+=============+
    | 0       | 0    | 663.12            | 19.66 (-39.21%) | 8.26        |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 1    | 804.18            | 19.24 (-40.65%) | 8.1         |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 0    | 105.58 (-58.80%)  | 19.68           | 7.42        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 1    | 245.02 (-39.67%)  | 19.8            | 6.82        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 2    | 383.82 (-30.83%)  | 18.84           | 7.06        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 3    | 523.36 (+391.23%) | 19.0            | 7.3         |
    +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.11)

    +---------+------+-------------------+--------+-------------+
    | Cluster | Core | Powerdown         | Wakeup | Cache Flush |
    +=========+======+===================+========+=============+
    | 0       | 0    | 704.46            | 19.28  | 7.86        |
    +---------+------+-------------------+--------+-------------+
    | 0       | 1    | 853.66            | 18.78  | 7.82        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 0    | 556.52 (+425.51%) | 19.06  | 7.82        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 1    | 113.28 (-70.47%)  | 19.28  | 7.48        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 2    | 260.62 (-50.22%)  | 19.8   | 7.26        |
    +---------+------+-------------------+--------+-------------+
    | 1       | 3    | 408.16 (+66.94%)  | 19.82  | 7.38        |
    +---------+------+-------------------+--------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +=========+======+===========+=================+=============+
    | 0       | 0    | 100.04    | 20.32 (-38.50%) | 5.62        |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 99.78     | 20.6 (-36.10%)  | 5.42        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 278.28    | 19.52           | 4.32        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 97.3      | 19.44           | 4.26        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 97.56     | 19.52           | 4.32        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 97.52     | 19.46           | 4.26        |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.11)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +=========+======+===========+========+=============+
    | 0       | 0    | 106.78    | 19.2   | 5.32        |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 107.44    | 19.64  | 5.44        |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 295.82    | 19.14  | 4.34        |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 104.34    | 19.18  | 4.28        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 103.96    | 19.34  | 4.4         |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 104.32    | 19.18  | 4.34        |
    +---------+------+-----------+--------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +=========+======+===========+=================+=============+
    | 0       | 0    | 236.3     | 30.88 (-29.30%) | 137.76      |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 236.66    | 30.5 (-29.23%)  | 138.02      |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 175.9     | 27.0            | 77.86       |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 100.96    | 27.56           | 4.26        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 101.04    | 26.48           | 4.38        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 101.08    | 26.74           | 4.4         |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.11)

    +---------+------+-----------+--------+-------------+
    | Cluster | Core | Powerdown | Wakeup | Cache Flush |
    +=========+======+===========+========+=============+
    | 0       | 0    | 243.62    | 29.84  | 137.66      |
    +---------+------+-----------+--------+-------------+
    | 0       | 1    | 243.88    | 29.54  | 137.8       |
    +---------+------+-----------+--------+-------------+
    | 1       | 0    | 183.26    | 26.22  | 77.76       |
    +---------+------+-----------+--------+-------------+
    | 1       | 1    | 107.64    | 26.74  | 4.34        |
    +---------+------+-----------+--------+-------------+
    | 1       | 2    | 107.52    | 25.9   | 4.32        |
    +---------+------+-----------+--------+-------------+
    | 1       | 3    | 107.74    | 25.8   | 4.34        |
    +---------+------+-----------+--------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.12)

    +-------------+--------+--------------+
    | Cluster     | Core   | Latency      |
    +=============+========+==============+
    | 0           | 0      | 1.0          |
    +-------------+--------+--------------+
    | 0           | 1      | 1.02         |
    +-------------+--------+--------------+
    | 1           | 0      | 0.52         |
    +-------------+--------+--------------+
    | 1           | 1      | 0.94         |
    +-------------+--------+--------------+
    | 1           | 2      | 0.94         |
    +-------------+--------+--------------+
    | 1           | 3      | 0.92         |
    +-------------+--------+--------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.11)

    +-------------+--------+--------------+
    | Cluster     | Core   | Latency      |
    +=============+========+==============+
    | 0           | 0      | 1.26         |
    +-------------+--------+--------------+
    | 0           | 1      | 0.96         |
    +-------------+--------+--------------+
    | 1           | 0      | 0.54         |
    +-------------+--------+--------------+
    | 1           | 1      | 0.94         |
    +-------------+--------+--------------+
    | 1           | 2      | 0.92         |
    +-------------+--------+--------------+
    | 1           | 3      | 1.02         |
    +-------------+--------+--------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache of the big cluster (2MB) is larger than that of the little
cluster (1MB).
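
The comparison in the last paragraph can be checked with simple arithmetic. The
sketch below, which is illustrative and not part of the original measurements,
compares the ratio of the two ``CFLUSH_OVERHEAD`` values from the table above
with the ratio of the stated L2 cache sizes. The variable names are
hypothetical.

.. code:: python

    # Values copied from the table and text above; names are illustrative.
    CFLUSH_OVERHEAD_US = {"cpu3_little": 94, "cpu5_big": 206}
    L2_SIZE_MB = {"little": 1, "big": 2}

    flush_ratio = CFLUSH_OVERHEAD_US["cpu5_big"] / CFLUSH_OVERHEAD_US["cpu3_little"]
    size_ratio = L2_SIZE_MB["big"] / L2_SIZE_MB["little"]

    print(f"CFLUSH_OVERHEAD ratio (CPU 5 / CPU 3): {flush_ratio:.2f}")
    print(f"L2 size ratio (big / little): {size_ratio:.2f}")

The flush overhead ratio (about 2.19) is close to the 2x difference in L2 size,
which is consistent with the explanation above. Both figures also include the L1
flush, so an exact match is not expected.
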

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0, but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the cluster are powered down during the
test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is larger than that of the
little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.
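
To put a number on "slightly smaller", the following sketch averages the
``PSCI_ENTRY`` values from the table above per cluster. It is an illustration
added here, not part of the original analysis, and the variable names are
hypothetical.

.. code:: python

    from statistics import mean

    # PSCI_ENTRY times in microseconds, copied from the table above.
    PSCI_ENTRY_US = {0: 22, 1: 22, 2: 21, 3: 22, 4: 17, 5: 18}
    LITTLE_CPUS = (0, 1, 2, 3)  # Cortex-A53
    BIG_CPUS = (4, 5)           # Cortex-A57

    little_mean = mean(PSCI_ENTRY_US[cpu] for cpu in LITTLE_CPUS)
    big_mean = mean(PSCI_ENTRY_US[cpu] for cpu in BIG_CPUS)

    print(f"little cluster mean: {little_mean:.2f} us")
    print(f"big cluster mean:    {big_mean:.2f} us")
    print(f"difference:          {little_mean - big_mean:.2f} us")

With these figures the big cluster averages 17.5 us against 21.75 us for the
little cluster.
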

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is twice the size of the L2
cache of the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF-A.
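
When reading the nanosecond figures for this test, it helps to keep the timer
resolution in mind. The Method section states that the generic counter used by
PMF runs at 50MHz on Juno, so one tick is 20 ns. The sketch below shows the
conversion; the helper names are hypothetical and are not part of PMF or TFTF.

.. code:: python

    # Generic counter frequency on Juno, as stated in the Method section.
    COUNTER_FREQ_HZ = 50_000_000
    NS_PER_SECOND = 1_000_000_000


    def ticks_to_ns(ticks, freq_hz=COUNTER_FREQ_HZ):
        """Convert a tick count of the generic counter to nanoseconds."""
        return ticks * NS_PER_SECOND // freq_hz


    def ns_to_ticks(ns, freq_hz=COUNTER_FREQ_HZ):
        """Convert a duration in nanoseconds back to whole counter ticks."""
        return ns * freq_hz // NS_PER_SECOND


    print(ticks_to_ns(1))     # resolution of a single tick
    print(ns_to_ticks(3020))  # ticks behind a 3020 ns measurement

Every value reported for this test is a whole multiple of 20 ns. For example,
3020 ns corresponds to 151 ticks and 520 ns to 26 ticks, so a difference of
200 ns between two CPUs amounts to only 10 ticks.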

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are lower than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.

--------------

*Copyright (c) 2019-2024, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.12-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.12-rc0
.. _TFTF v2.12-rc0: https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/?h=v2.12-rc0