PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has a cluster of 4 x
Cortex-A53 cores and a cluster of 2 x Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.
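
As a rough illustration of what the 50MHz counter implies for the figures in
this document, the sketch below converts the difference between two counter
readings into microseconds. The function and constant names are hypothetical
and are not part of PMF or TFTF; only the 50MHz frequency comes from the text
above.

```python
# Illustrative only: these names are hypothetical, not part of PMF or TFTF.
COUNTER_FREQ_HZ = 50_000_000  # generic counter frequency on Juno, per the text


def ticks_to_us(start_ticks: int, end_ticks: int,
                freq_hz: int = COUNTER_FREQ_HZ) -> float:
    """Convert the difference between two counter readings to microseconds."""
    return (end_ticks - start_ticks) * 1_000_000 / freq_hz


# One counter tick at 50MHz corresponds to 20ns, that is 0.02us.
resolution_us = ticks_to_us(0, 1)
```

At this frequency a single timestamp has a resolution of 0.02µs, which is worth
keeping in mind when reading the sub-microsecond figures later in this page.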

The following source trees and binaries were used:

- `TF-A v2.13-rc0`_
- `TFTF v2.13-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>` page for more details. The tests were
run using the
`tf-psci-lava-instr/juno-enable-runtime-instr,juno-instrumentation:juno-tftf`
configuration in CI.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.13)

    +---------+------+------------------+-------------------+--------------------+
    | Cluster | Core | Powerdown        | Wakeup            | Cache Flush        |
    +---------+------+------------------+-------------------+--------------------+
    | 0       | 0    | 333.0 (-52.92%)  | 23.92 (-40.11%)   | 138.88             |
    +---------+------+------------------+-------------------+--------------------+
    | 0       | 1    | 630.9 (+145.95%) | 253.72 (-46.56%)  | 136.94 (+1987.50%) |
    +---------+------+------------------+-------------------+--------------------+
    | 1       | 0    | 184.74 (+71.92%) | 23.16 (-95.39%)   | 80.24 (+1283.45%)  |
    +---------+------+------------------+-------------------+--------------------+
    | 1       | 1    | 481.14           | 18.56 (-88.25%)   | 76.5 (+1520.76%)   |
    +---------+------+------------------+-------------------+--------------------+
    | 1       | 2    | 933.88 (+67.76%) | 289.58 (+189.64%) | 76.34 (+1510.55%)  |
    +---------+------+------------------+-------------------+--------------------+
    | 1       | 3    | 1112.48          | 238.42 (+753.94%) | 76.38              |
    +---------+------+------------------+-------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.12)

    +---------+------+-------------------+------------------+--------------------+
    | Cluster | Core | Powerdown         | Wakeup           | Cache Flush        |
    +---------+------+-------------------+------------------+--------------------+
    | 0       | 0    | 244.52 (-65.43%)  | 26.92 (-32.60%)  | 5.54 (-96.70%)     |
    +---------+------+-------------------+------------------+--------------------+
    | 0       | 1    | 526.18 (+105.12%) | 416.1            | 138.52 (+2011.59%) |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 0    | 104.34            | 27.02 (-94.62%)  | 5.32               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 1    | 384.98            | 23.06 (-85.40%)  | 4.48               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 2    | 812.44 (+45.94%)  | 126.78           | 4.54               |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 3    | 986.84            | 77.22 (+176.58%) | 79.76              |
    +---------+------+-------------------+------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.13)

    +---------+------+------------------+-----------------+-------------------+
    | Cluster | Core | Powerdown        | Wakeup          | Cache Flush       |
    +---------+------+------------------+-----------------+-------------------+
    | 0       | 0    | 244.08           | 24.48 (-40.00%) | 137.64            |
    +---------+------+------------------+-----------------+-------------------+
    | 0       | 1    | 244.2            | 23.84 (-41.57%) | 137.86            |
    +---------+------+------------------+-----------------+-------------------+
    | 1       | 0    | 294.78           | 23.54           | 76.62             |
    +---------+------+------------------+-----------------+-------------------+
    | 1       | 1    | 180.1 (+74.72%)  | 21.14           | 77.12 (+1533.90%) |
    +---------+------+------------------+-----------------+-------------------+
    | 1       | 2    | 180.54 (+75.25%) | 20.8            | 76.76 (+1554.31%) |
    +---------+------+------------------+-----------------+-------------------+
    | 1       | 3    | 180.6 (+75.44%)  | 21.2            | 76.86 (+1542.31%) |
    +---------+------+------------------+-----------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 236.36    | 27.94 (-31.52%) | 138.0       |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 236.58    | 27.86 (-31.72%) | 138.2       |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 280.68    | 27.02           | 77.6        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 101.4     | 22.52           | 4.42        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 100.92    | 22.68           | 4.4         |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 100.96    | 22.54           | 4.38        |
    +---------+------+-----------+-----------------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.13)

    +---------+------+-------------------+-----------------+-------------+
    | Cluster | Core | Powerdown         | Wakeup          | Cache Flush |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 0    | 703.06            | 16.86 (-47.87%) | 7.98        |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 1    | 851.88            | 16.4 (-49.41%)  | 8.04        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 0    | 407.4 (+58.99%)   | 15.1 (-26.20%)  | 7.2         |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 1    | 110.98 (-72.67%)  | 15.46           | 6.56        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 2    | 554.54            | 15.4            | 6.94        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 3    | 258.96 (+143.06%) | 15.56 (-25.05%) | 6.64        |
    +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.12)

    +--------------------------------------------------------------------+
    | test_rt_instr_cpu_susp_parallel                                    |
    +---------+------+-------------------+-----------------+-------------+
    | Cluster | Core | Powerdown         | Wakeup          | Cache Flush |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 0    | 663.12            | 19.66 (-39.21%) | 8.26        |
    +---------+------+-------------------+-----------------+-------------+
    | 0       | 1    | 804.18            | 19.24 (-40.65%) | 8.1         |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 0    | 105.58 (-58.80%)  | 19.68           | 7.42        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 1    | 245.02 (-39.67%)  | 19.8            | 6.82        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 2    | 383.82 (-30.83%)  | 18.84           | 7.06        |
    +---------+------+-------------------+-----------------+-------------+
    | 1       | 3    | 523.36 (+391.23%) | 19.0            | 7.3         |
    +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.13)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 106.12    | 17.1 (-48.24%)  | 5.26        |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 106.88    | 17.06 (-47.08%) | 5.28        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 294.36    | 15.6            | 4.56        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 103.26    | 15.44           | 4.46        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 103.7     | 15.26           | 4.5         |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 103.68    | 15.72           | 4.5         |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 100.04    | 20.32 (-38.50%) | 5.62        |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 99.78     | 20.6 (-36.10%)  | 5.42        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 278.28    | 19.52           | 4.32        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 97.3      | 19.44           | 4.26        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 97.56     | 19.52           | 4.32        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 97.52     | 19.46           | 4.26        |
    +---------+------+-----------+-----------------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.13)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 243.02    | 26.42 (-39.51%) | 137.58      |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 244.24    | 26.32 (-38.93%) | 137.88      |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 182.36    | 23.66           | 78.0        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 108.18    | 22.68           | 4.42        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 108.34    | 21.72           | 4.24        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 108.22    | 21.68           | 4.34        |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown | Wakeup          | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 0    | 236.3     | 30.88 (-29.30%) | 137.76      |
    +---------+------+-----------+-----------------+-------------+
    | 0       | 1    | 236.66    | 30.5 (-29.23%)  | 138.02      |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 0    | 175.9     | 27.0            | 77.86       |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 1    | 100.96    | 27.56           | 4.26        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 2    | 101.04    | 26.48           | 4.38        |
    +---------+------+-----------+-----------------+-------------+
    | 1       | 3    | 101.08    | 26.74           | 4.4         |
    +---------+------+-----------+-----------------+-------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.13)

    +-------------+--------+--------------+
    | Cluster     | Core   | Latency      |
    +-------------+--------+--------------+
    | 0           | 0      | 1.0          |
    +-------------+--------+--------------+
    | 0           | 1      | 1.06         |
    +-------------+--------+--------------+
    | 1           | 0      | 0.6          |
    +-------------+--------+--------------+
    | 1           | 1      | 1.0          |
    +-------------+--------+--------------+
    | 1           | 2      | 0.98         |
    +-------------+--------+--------------+
    | 1           | 3      | 1.0          |
    +-------------+--------+--------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.12)

    +-------------+--------+--------------+
    | Cluster     | Core   | Latency      |
    +-------------+--------+--------------+
    | 0           | 0      | 1.0          |
    +-------------+--------+--------------+
    | 0           | 1      | 1.02         |
    +-------------+--------+--------------+
    | 1           | 0      | 0.52         |
    +-------------+--------+--------------+
    | 1           | 1      | 0.94         |
    +-------------+--------+--------------+
    | 1           | 2      | 0.94         |
    +-------------+--------+--------------+
    | 1           | 3      | 0.92         |
    +-------------+--------+--------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.
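
The effect of contention can be pictured with a toy model in which every CPU
requests the lock at the same moment and the CPUs are then served one at a
time. This is a simplified sketch, not TF-A code: the function name and the
90µs hold time are assumptions chosen only for illustration.

```python
from typing import List


def serialized_entry_latencies(num_cpus: int, hold_time_us: float) -> List[float]:
    """Toy model: all CPUs request a lock at t=0 and are served one by one.

    The CPU served at a given position observes its own hold time plus the
    hold times of every CPU served before it.
    """
    return [hold_time_us * (position + 1) for position in range(num_cpus)]


# Hypothetical hold time of 90us for a 4-CPU cluster.
latencies = serialized_entry_latencies(4, 90.0)
```

In this model the last CPU to acquire the lock sees four times the latency of
the first. The measured ``PSCI_ENTRY`` values follow the same growing trend,
although not the exact values, since real hold times are not constant.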

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache size for the big cluster is a lot larger (2MB) compared to
the little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).
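
The serialization of SCP power down commands described earlier in this section
is visible in the table: the ``PSCI_ENTRY`` times for the little CPUs grow by a
roughly constant step. The short sketch below computes those steps from the
values for CPUs 0-3 copied from the table above; the variable names are
illustrative only.

```python
# PSCI_ENTRY times (us) for CPUs 0-3, copied from the table above.
psci_entry_us = [116, 204, 287, 376]

# Difference between each CPU's entry time and that of the previous CPU.
steps_us = [
    later - earlier
    for earlier, later in zip(psci_entry_us, psci_entry_us[1:])
]
```

The steps fall between 83µs and 89µs, which is consistent with each CPU waiting
for one additional power down command to complete before its own is issued.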

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the cluster are powered down during the
test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache size for the big cluster is a lot larger (2MB)
compared to the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program wake up timer and suspend the lead CPU to the deepest power level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.
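
The relationship between running siblings, the power level actually reached and
the caches flushed can be summarised with a small model. This is a sketch under
stated assumptions rather than TF-A code: the function names are hypothetical,
only the CPU and cluster levels are modelled, and the system level is ignored.

```python
from typing import List

CPU_LEVEL = 0      # power level 0 on Juno: CPU
CLUSTER_LEVEL = 1  # power level 1 on Juno: cluster


def reached_power_level(requested_level: int, siblings_running: int) -> int:
    """Toy model of power state coordination within one cluster.

    A CPU can only take its cluster down if no other CPU in that cluster is
    still running; otherwise it stops at the CPU level.
    """
    if requested_level >= CLUSTER_LEVEL and siblings_running == 0:
        return CLUSTER_LEVEL
    return CPU_LEVEL


def caches_flushed(level: int) -> List[str]:
    """Caches flushed for a reached level, as described in the text."""
    return ["L1", "L2"] if level >= CLUSTER_LEVEL else ["L1"]


# CPU 5 with lead CPU 4 still running: only an L1 flush is needed.
cpu5_flush = caches_flushed(reached_power_level(CLUSTER_LEVEL, siblings_running=1))
# A CPU that is the last one running in its cluster: L1 and L2 are flushed.
last_cpu_flush = caches_flushed(reached_power_level(CLUSTER_LEVEL, siblings_running=0))
```

Under this model, the small ``CFLUSH_OVERHEAD`` for CPU 5 and the large values
for the other CPUs follow directly from whether a sibling CPU is still running
when the power down request is made.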

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is much larger than that of
the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF-A.
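
Because these latencies are derived from generic counter timestamps, their
granularity is bounded by the counter period. The sketch below assumes the
50MHz counter frequency quoted in the Method section; the names ``COUNTER_HZ``,
``ticks_to_ns`` and ``resolution_ns`` are hypothetical helpers for illustration
and are not part of PMF.

```python
# Convert generic counter ticks to nanoseconds.
# Assumption: the counter runs at 50 MHz, as stated in the Method section.
COUNTER_HZ = 50_000_000
NS_PER_SECOND = 1_000_000_000


def ticks_to_ns(ticks, freq_hz=COUNTER_HZ):
    """Return the duration of `ticks` counter ticks in whole nanoseconds."""
    return ticks * NS_PER_SECOND // freq_hz


def resolution_ns(freq_hz=COUNTER_HZ):
    """Return the smallest time step the counter can resolve, in nanoseconds."""
    return NS_PER_SECOND / freq_hz


if __name__ == "__main__":
    print(f"resolution: {resolution_ns()} ns per tick")
    print(f"26 ticks = {ticks_to_ns(26)} ns")
```

Under that assumption one tick is 20 ns, so a measurement of a few hundred
nanoseconds spans only a few tens of ticks and small differences between CPUs
amount to a handful of ticks.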

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are less than the little CPUs due to greater CPU
performance.

We suspect the time for lead CPU 4 is shorter than CPU 5 due to subtle cache
effects, given that these measurements are at the nanosecond level.

--------------

*Copyright (c) 2019-2025, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.13-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.13-rc0
.. _TFTF v2.13-rc0: https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/?h=v2.13-rc0