PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has one cluster of
4 x Cortex-A53 cores and one cluster of 2 x Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.
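
As a rough illustration of what a 50MHz counter implies for timestamp
granularity, the sketch below converts a pair of counter readings into elapsed
time. The function names and the sample readings are hypothetical and are not
taken from PMF; only the 50MHz frequency comes from the text above. Counter
wrap-around is deliberately not handled.

```python
# Generic counter frequency on Juno, as stated in the text above.
COUNTER_HZ = 50_000_000


def ticks_to_us(start_ticks, end_ticks, counter_hz=COUNTER_HZ):
    """Convert two counter readings into elapsed microseconds.

    Hypothetical helper for illustration; wrap-around is not handled.
    """
    if end_ticks < start_ticks:
        raise ValueError("end_ticks must not be smaller than start_ticks")
    return (end_ticks - start_ticks) * 1_000_000 / counter_hz


def tick_resolution_ns(counter_hz=COUNTER_HZ):
    """Return the duration of a single counter tick in nanoseconds."""
    return 1_000_000_000 / counter_hz


# Made-up readings: 5650 ticks apart.
print(ticks_to_us(1000, 6650))  # 113.0
print(tick_resolution_ns())     # 20.0
```

At 50MHz one tick lasts 20 ns, so no interval can be resolved more finely than
that.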

The following source trees and binaries were used:

- `TF-A v2.14-rc0`_
- `TFTF v2.14-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>`
page for more details. The tests were run using the
`tf-psci-lava-instr/juno-enable-runtime-instr,juno-instrumentation:juno-tftf`
configuration in CI.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in parallel (v2.14)

    +---------+------+------------------+---------------------+-------------------+
    | Cluster | Core | Powerdown        | Wakeup              | Cache Flush       |
    +---------+------+------------------+---------------------+-------------------+
    | 0       | 0    | 332440.0         | 270640.0(+1031.44%) | 169500.0(+22.05%) |
    +---------+------+------------------+---------------------+-------------------+
    | 0       | 1    | 624520.0(-1.01%) | 30260.0(-88.07%)    | 166740.0(+21.76%) |
    +---------+------+------------------+---------------------+-------------------+
    | 1       | 0    | 187960.0(+1.74%) | 25460.0(+9.93%)     | 90420.0(+12.69%)  |
    +---------+------+------------------+---------------------+-------------------+
    | 1       | 1    | 479100.0         | 20520.0(+10.56%)    | 87500.0(+14.38%)  |
    +---------+------+------------------+---------------------+-------------------+
    | 1       | 2    | 923480.0(-1.11%) | 294160.0(+1.58%)    | 87500.0(+14.62%)  |
    +---------+------+------------------+---------------------+-------------------+
    | 1       | 3    | 1106300.0        | 238320.0            | 87340.0(+14.35%)  |
    +---------+------+------------------+---------------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in parallel (v2.13)

    +---------+------+--------------------+--------------------+---------------------+
    | Cluster | Core | Powerdown          | Wakeup             | Cache Flush         |
    +---------+------+--------------------+--------------------+---------------------+
    | 0       | 0    | 333000.0(-52.92%)  | 23920.0(-40.11%)   | 138880.0(-17.24%)   |
    +---------+------+--------------------+--------------------+---------------------+
    | 0       | 1    | 630900.0(+145.95%) | 253720.0(-46.56%)  | 136940.0(+1987.50%) |
    +---------+------+--------------------+--------------------+---------------------+
    | 1       | 0    | 184740.0(+71.92%)  | 23160.0(-95.39%)   | 80240.0(+1283.45%)  |
    +---------+------+--------------------+--------------------+---------------------+
    | 1       | 1    | 481140.0(+18.16%)  | 18560.0(-88.25%)   | 76500.0(+1520.76%)  |
    +---------+------+--------------------+--------------------+---------------------+
    | 1       | 2    | 933880.0(+67.76%)  | 289580.0(+189.64%) | 76340.0(+1510.55%)  |
    +---------+------+--------------------+--------------------+---------------------+
    | 1       | 3    | 1112480.0(+9.76%)  | 238420.0(+753.94%) | 76380.0(-15.32%)    |
    +---------+------+--------------------+--------------------+---------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in serial (v2.14)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core | Powerdown        | Wakeup           | Cache Flush       |
    +---------+------+------------------+------------------+-------------------+
    | 0       | 0    | 267000.0(+9.39%) | 31080.0(+26.96%) | 168520.0(+22.44%) |
    +---------+------+------------------+------------------+-------------------+
    | 0       | 1    | 267440.0(+9.52%) | 30680.0(+28.69%) | 168480.0(+22.21%) |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 0    | 291300.0(-1.18%) | 25140.0(+6.80%)  | 86980.0(+13.52%)  |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 1    | 184260.0(+2.31%) | 23140.0(+9.46%)  | 87940.0(+14.03%)  |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 2    | 184520.0(+2.20%) | 23460.0(+12.79%) | 87520.0(+14.02%)  |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 3    | 184700.0(+2.27%) | 23240.0(+9.62%)  | 87180.0(+13.43%)  |
    +---------+------+------------------+------------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in serial (v2.13)

    +---------+------+-------------------+------------------+--------------------+
    | Cluster | Core | Powerdown         | Wakeup           | Cache Flush        |
    +---------+------+-------------------+------------------+--------------------+
    | 0       | 0    | 244080.0(-9.21%)  | 24480.0(-40.00%) | 137640.0(-18.19%)  |
    +---------+------+-------------------+------------------+--------------------+
    | 0       | 1    | 244200.0(-9.06%)  | 23840.0(-41.57%) | 137860.0(-17.91%)  |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 0    | 294780.0(-1.56%)  | 23540.0(-14.83%) | 76620.0(-12.35%)   |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 1    | 180100.0(+74.72%) | 21140.0(-6.63%)  | 77120.0(+1533.90%) |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 2    | 180540.0(+75.25%) | 20800.0(-10.34%) | 76760.0(+1554.31%) |
    +---------+------+-------------------+------------------+--------------------+
    | 1       | 3    | 180600.0(+75.44%) | 21200.0(-7.99%)  | 76860.0(+1542.31%) |
    +---------+------+-------------------+------------------+--------------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in parallel (v2.14)

    +---------+------+--------------------+------------------+------------------+
    | Cluster | Core | Powerdown          | Wakeup           | Cache Flush      |
    +---------+------+--------------------+------------------+------------------+
    | 0       | 0    | 683780.0(-2.74%)   | 22560.0(+33.81%) | 11040.0(+38.35%) |
    +---------+------+--------------------+------------------+------------------+
    | 0       | 1    | 829620.0(-2.61%)   | 22820.0(+39.15%) | 11480.0(+42.79%) |
    +---------+------+--------------------+------------------+------------------+
    | 1       | 0    | 104520.0(-74.34%)  | 17200.0(+13.91%) | 8680.0(+20.56%)  |
    +---------+------+--------------------+------------------+------------------+
    | 1       | 1    | 249200.0(+124.54%) | 17100.0(+10.61%) | 8480.0(+29.27%)  |
    +---------+------+--------------------+------------------+------------------+
    | 1       | 2    | 393980.0(-28.95%)  | 17480.0(+13.51%) | 8320.0(+19.88%)  |
    +---------+------+--------------------+------------------+------------------+
    | 1       | 3    | 539520.0(+108.34%) | 16980.0(+9.13%)  | 8300.0(+25.00%)  |
    +---------+------+--------------------+------------------+------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in parallel (v2.13)

    +---------+------+--------------------+------------------+-----------------+
    | Cluster | Core | Powerdown          | Wakeup           | Cache Flush     |
    +---------+------+--------------------+------------------+-----------------+
    | 0       | 0    | 703060.0(-17.69%)  | 16860.0(-47.87%) | 7980.0(-19.88%) |
    +---------+------+--------------------+------------------+-----------------+
    | 0       | 1    | 851880.0(+20.98%)  | 16400.0(-49.41%) | 8040.0(-17.45%) |
    +---------+------+--------------------+------------------+-----------------+
    | 1       | 0    | 407400.0(+58.99%)  | 15100.0(-26.20%) | 7200.0(-5.76%)  |
    +---------+------+--------------------+------------------+-----------------+
    | 1       | 1    | 110980.0(-72.67%)  | 15460.0(-23.47%) | 6560.0(-10.87%) |
    +---------+------+--------------------+------------------+-----------------+
    | 1       | 2    | 554540.0           | 15400.0(-23.46%) | 6940.0(-2.53%)  |
    +---------+------+--------------------+------------------+-----------------+
    | 1       | 3    | 258960.0(+143.06%) | 15560.0(-25.05%) | 6640.0          |
    +---------+------+--------------------+------------------+-----------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.14)

    +---------+------+------------------+------------------+-----------------+
    | Cluster | Core | Powerdown        | Wakeup           | Cache Flush     |
    +---------+------+------------------+------------------+-----------------+
    | 0       | 0    | 101100.0(-4.73%) | 22820.0(+33.45%) | 7360.0(+39.92%) |
    +---------+------+------------------+------------------+-----------------+
    | 0       | 1    | 101400.0(-5.13%) | 22720.0(+33.18%) | 7560.0(+43.18%) |
    +---------+------+------------------+------------------+-----------------+
    | 1       | 0    | 291440.0         | 16880.0(+8.21%)  | 4580.0          |
    +---------+------+------------------+------------------+-----------------+
    | 1       | 1    | 96600.0(-6.45%)  | 16860.0(+9.20%)  | 4600.0(+3.14%)  |
    +---------+------+------------------+------------------+-----------------+
    | 1       | 2    | 97060.0(-6.40%)  | 16980.0(+11.27%) | 4640.0(+3.11%)  |
    +---------+------+------------------+------------------+-----------------+
    | 1       | 3    | 96660.0(-6.77%)  | 16960.0(+7.89%)  | 4620.0(+2.67%)  |
    +---------+------+------------------+------------------+-----------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.13)

    +---------+------+------------------+------------------+-----------------+
    | Cluster | Core | Powerdown        | Wakeup           | Cache Flush     |
    +---------+------+------------------+------------------+-----------------+
    | 0       | 0    | 106120.0(+1.49%) | 17100.0(-48.24%) | 5260.0(-23.77%) |
    +---------+------+------------------+------------------+-----------------+
    | 0       | 1    | 106880.0(+2.40%) | 17060.0(-47.08%) | 5280.0(-21.89%) |
    +---------+------+------------------+------------------+-----------------+
    | 1       | 0    | 294360.0         | 15600.0(-20.97%) | 4560.0          |
    +---------+------+------------------+------------------+-----------------+
    | 1       | 1    | 103260.0(+3.82%) | 15440.0(-20.41%) | 4460.0(-5.11%)  |
    +---------+------+------------------+------------------+-----------------+
    | 1       | 2    | 103700.0(+4.33%) | 15260.0(-24.08%) | 4500.0(-2.60%)  |
    +---------+------+------------------+------------------+-----------------+
    | 1       | 3    | 103680.0(+4.26%) | 15720.0(-20.53%) | 4500.0(-1.32%)  |
    +---------+------+------------------+------------------+-----------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.14)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core | Powerdown        | Wakeup           | Cache Flush       |
    +---------+------+------------------+------------------+-------------------+
    | 0       | 0    | 267240.0(+9.97%) | 32940.0(+24.68%) | 168460.0(+22.45%) |
    +---------+------+------------------+------------------+-------------------+
    | 0       | 1    | 267340.0(+9.46%) | 33720.0(+28.12%) | 168500.0(+22.21%) |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 0    | 185740.0(+1.85%) | 25120.0(+6.17%)  | 88380.0(+13.31%)  |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 1    | 101940.0(-5.77%) | 24240.0(+6.88%)  | 4600.0(+4.07%)    |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 2    | 101800.0(-6.04%) | 23060.0(+6.17%)  | 4660.0(+9.91%)    |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 3    | 101820.0(-5.91%) | 23340.0(+7.66%)  | 4640.0(+6.91%)    |
    +---------+------+------------------+------------------+-------------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.13)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core | Powerdown        | Wakeup           | Cache Flush       |
    +---------+------+------------------+------------------+-------------------+
    | 0       | 0    | 243020.0(-9.14%) | 26420.0(-39.51%) | 137580.0(-17.85%) |
    +---------+------+------------------+------------------+-------------------+
    | 0       | 1    | 244240.0(-8.87%) | 26320.0(-38.93%) | 137880.0(-17.73%) |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 0    | 182360.0(-2.89%) | 23660.0(-15.20%) | 78000.0(-11.08%)  |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 1    | 108180.0(+4.68%) | 22680.0(-14.16%) | 4420.0            |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 2    | 108340.0(+4.92%) | 21720.0(-16.40%) | 4240.0(-4.93%)    |
    +---------+------+------------------+------------------+-------------------+
    | 1       | 3    | 108220.0(+4.82%) | 21680.0(-16.16%) | 4340.0(-3.12%)    |
    +---------+------+------------------+------------------+-------------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.14)

    +---------+------+--------------------+
    | Cluster | Core | Latency            |
    +---------+------+--------------------+
    | 0       | 0    | 1200.0(+20.00%)    |
    +---------+------+--------------------+
    | 0       | 1    | 1160.0(+9.43%)     |
    +---------+------+--------------------+
    | 1       | 0    | 700.0(+16.67%)     |
    +---------+------+--------------------+
    | 1       | 1    | 1040.0(+4.00%)     |
    +---------+------+--------------------+
    | 1       | 2    | 1020.0(+4.08%)     |
    +---------+------+--------------------+
    | 1       | 3    | 1080.0(+8.00%)     |
    +---------+------+--------------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.13)

    +---------+------+--------------------+
    | Cluster | Core | Latency            |
    +---------+------+--------------------+
    | 0       | 0    | 1000.0(-19.35%)    |
    +---------+------+--------------------+
    | 0       | 1    | 1060.0(-17.19%)    |
    +---------+------+--------------------+
    | 1       | 0    | 600.0(-11.76%)     |
    +---------+------+--------------------+
    | 1       | 1    | 1000.0(+2.04%)     |
    +---------+------+--------------------+
    | 1       | 2    | 980.0(+4.26%)      |
    +---------+------+--------------------+
    | 1       | 3    | 1000.0(+2.04%)     |
    +---------+------+--------------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` to the
wakeup latency, and ``CFLUSH_OVERHEAD`` to the latency of the cache flush
operation.

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.
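
The effect of lock contention described above can be approximated with a very
simple queueing sketch. It assumes, purely for illustration, that every CPU
holds the lock for the same time and that the CPUs acquire it one after
another. The function name and the 90 us hold time are made up and are not
measurements from this document.

```python
def contended_entry_latencies(num_cpus, hold_us):
    """Model the entry latency (us) of each CPU, in lock-acquisition order.

    The k-th CPU (counting from 0) waits for k earlier holders and then
    spends hold_us inside the critical section itself.
    """
    if num_cpus < 0 or hold_us < 0:
        raise ValueError("num_cpus and hold_us must be non-negative")
    return [(k + 1) * hold_us for k in range(num_cpus)]


# Four CPUs contending for one lock, with a hypothetical 90 us hold time.
latencies = contended_entry_latencies(4, 90)
print(latencies)  # [90, 180, 270, 360]
```

In this model the last CPU to take the lock sees roughly N times the
uncontended latency, which matches the general trend of ``PSCI_ENTRY`` growing
from CPU 0 to CPU 3 in the table, although the measured values also include
other costs such as the cache flush.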

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache size for the big cluster is a lot larger (2MB) compared to
the little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+


There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).
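
As a quick arithmetic check of the serialization described above, the sketch
below takes the ``PSCI_ENTRY`` values for the little-cluster CPUs 0-3 from the
table at the start of this subsection and computes the step between consecutive
CPUs. The helper name is hypothetical. A near-constant step is consistent
with the power down commands being handled one at a time, but it does not
prove it.

```python
# PSCI_ENTRY times (us) for CPUs 0-3, copied from the table in this subsection.
psci_entry_us = [116, 204, 287, 376]


def consecutive_steps(values):
    """Return the difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


steps = consecutive_steps(psci_entry_us)
mean_step = sum(steps) / len(steps)
print(steps)                # [88, 83, 89]
print(round(mean_step, 1))  # 86.7
```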

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the cluster are powered down during the
test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.
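
The relationship between the requested power level, the state of the other CPUs
in the cluster and the caches that get flushed can be summarised in a small
sketch. This is a simplified model of the behaviour described in this document,
not TF-A code: the function and constant names are hypothetical, and it only
distinguishes a CPU-level power down from a cluster-level (or deeper) one.

```python
CPU_LEVEL = 0      # power level 0: a single CPU
CLUSTER_LEVEL = 1  # power level 1: a whole cluster


def caches_to_flush(requested_level, other_cpus_on_in_cluster):
    """Return the caches flushed when one CPU powers down (simplified model)."""
    if requested_level < CPU_LEVEL or other_cpus_on_in_cluster < 0:
        raise ValueError("arguments must be non-negative")
    # The cluster can only go down with its last running CPU, so a request
    # for a deeper level is limited to level 0 while siblings are still on.
    if requested_level >= CLUSTER_LEVEL and other_cpus_on_in_cluster == 0:
        return ["L1", "L2"]
    return ["L1"]


print(caches_to_flush(2, 0))  # ['L1', 'L2']
print(caches_to_flush(2, 1))  # ['L1']
print(caches_to_flush(0, 0))  # ['L1']
```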

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache size for the big cluster is a lot larger (2MB)
compared to the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best-case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.
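
The coordination behaviour described above can be illustrated with a small
model. The following Python sketch is not TF-A code: ``CLUSTERS``, ``FLUSHED``,
``power_down_level`` and ``cpu_off_in_sequence`` are hypothetical names, and it
assumes the Juno topology used in this document (CPUs 0-3 in the little cluster,
CPUs 4-5 in the big cluster, CPU 4 as lead). It only captures the rule that a
CPU powers down to cluster level when no sibling in its cluster is running.

```python
# Illustrative model only; not TF-A code. All names here are hypothetical.

# Assumed Juno topology: CPUs 0-3 are the little cluster, CPUs 4-5 the big one.
CLUSTERS = {"little": {0, 1, 2, 3}, "big": {4, 5}}

# Caches flushed for each power level reached (level 0 = CPU, level 1 = cluster).
FLUSHED = {0: ("L1",), 1: ("L1", "L2")}


def power_down_level(cpu, running):
    """Return the power level `cpu` reaches when it powers down.

    `running` is the set of CPUs that are powered on, including `cpu`.
    Cluster level (1) is reached only if no sibling is still running.
    """
    siblings = next(c for c in CLUSTERS.values() if cpu in c) - {cpu}
    return 0 if siblings & running else 1


def cpu_off_in_sequence(lead=4):
    """Model step 1: turn each non-lead CPU on, then off, one at a time."""
    levels = {}
    for cpu in sorted(set().union(*CLUSTERS.values()) - {lead}):
        running = {lead, cpu}  # only the lead CPU and this CPU are on
        levels[cpu] = power_down_level(cpu, running)
    return levels


for cpu, level in cpu_off_in_sequence().items():
    print(f"CPU {cpu}: level {level}, flush {'+'.join(FLUSHED[level])}")
```

Running the sketch reports level 1 (L1 and L2 flush) for CPUs 0 to 3 and level 0
(L1 flush only) for CPU 5. The model deliberately ignores the system power level
and the lead CPU's own suspend.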

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is larger than that of the
little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times
are generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round-trip latency for handling a fast SMC at EL3 in TF-A.
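
For scale, the PMF timestamps come from the generic counter, which runs at
50MHz on Juno (see Method), so one counter tick corresponds to 20 ns. The
following Python sketch shows only that arithmetic; ``ticks_to_ns`` is a
hypothetical helper name, not a TF-A or TFTF function.

```python
# Illustrative arithmetic only; the helper name is hypothetical.
COUNTER_FREQ_HZ = 50_000_000  # generic counter frequency on Juno (see Method)


def ticks_to_ns(ticks, freq_hz=COUNTER_FREQ_HZ):
    """Convert a difference of two counter timestamps into nanoseconds."""
    return ticks * 1_000_000_000 // freq_hz


print(ticks_to_ns(1))    # one tick at 50MHz: 20
print(ticks_to_ns(100))  # a 100-tick interval: 2000
```

Nanosecond results derived from this counter are therefore quantised to
multiples of 20 ns.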

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are lower than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.

--------------

*Copyright (c) 2019-2025, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.14-rc0: https://git.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/+/refs/tags/v2.14-rc0
.. _TFTF v2.14-rc0: https://git.trustedfirmware.org/plugins/gitiles/TF-A/tf-a-tests/+/refs/tags/v2.14-rc0