PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has a cluster of 4
Cortex-A53 cores and a cluster of 2 Cortex-A57 cores running at the following
frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50 MHz on Juno.

The following source trees and binaries were used:

- `TF-A v2.14-rc0`_
- `TFTF v2.14-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>` page for more details. The tests were
run using the
`tf-psci-lava-instr/juno-enable-runtime-instr,juno-instrumentation:juno-tftf`
configuration in CI.
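A PMF timestamp is a raw generic counter value, so at Juno's 50 MHz counter
frequency each tick corresponds to 20 ns, which also bounds the measurement
resolution. A minimal conversion sketch (the helper below is illustrative, not
part of the PMF API):

.. code:: c

   #include <stdint.h>

   /* Juno's generic counter runs at 50 MHz, i.e. 50 ticks per
    * microsecond. On target, the frequency should be read from
    * CNTFRQ_EL0 rather than hard-coded. */
   #define JUNO_CNT_FRQ_HZ 50000000ULL

   /* Convert a generic counter delta (in ticks) to microseconds. */
   static inline uint64_t ticks_to_us(uint64_t ticks)
   {
       return (ticks * 1000000ULL) / JUNO_CNT_FRQ_HZ;
   }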
Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
   parallel (v2.14)

   +---------+------+------------------+---------------------+-------------------+
   | Cluster | Core | Powerdown        | Wakeup              | Cache Flush       |
   +=========+======+==================+=====================+===================+
   | 0       | 0    | 332440.0         | 270640.0(+1031.44%) | 169500.0(+22.05%) |
   +---------+------+------------------+---------------------+-------------------+
   | 0       | 1    | 624520.0(-1.01%) | 30260.0(-88.07%)    | 166740.0(+21.76%) |
   +---------+------+------------------+---------------------+-------------------+
   | 1       | 0    | 187960.0(+1.74%) | 25460.0(+9.93%)     | 90420.0(+12.69%)  |
   +---------+------+------------------+---------------------+-------------------+
   | 1       | 1    | 479100.0         | 20520.0(+10.56%)    | 87500.0(+14.38%)  |
   +---------+------+------------------+---------------------+-------------------+
   | 1       | 2    | 923480.0(-1.11%) | 294160.0(+1.58%)    | 87500.0(+14.62%)  |
   +---------+------+------------------+---------------------+-------------------+
   | 1       | 3    | 1106300.0        | 238320.0            | 87340.0(+14.35%)  |
   +---------+------+------------------+---------------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
   parallel (v2.13)

   +---------+------+--------------------+--------------------+---------------------+
   | Cluster | Core | Powerdown          | Wakeup             | Cache Flush         |
   +=========+======+====================+====================+=====================+
   | 0       | 0    | 333000.0(-52.92%)  | 23920.0(-40.11%)   | 138880.0(-17.24%)   |
   +---------+------+--------------------+--------------------+---------------------+
   | 0       | 1    | 630900.0(+145.95%) | 253720.0(-46.56%)  | 136940.0(+1987.50%) |
   +---------+------+--------------------+--------------------+---------------------+
   | 1       | 0    | 184740.0(+71.92%)  | 23160.0(-95.39%)   | 80240.0(+1283.45%)  |
   +---------+------+--------------------+--------------------+---------------------+
   | 1       | 1    | 481140.0(+18.16%)  | 18560.0(-88.25%)   | 76500.0(+1520.76%)  |
   +---------+------+--------------------+--------------------+---------------------+
   | 1       | 2    | 933880.0(+67.76%)  | 289580.0(+189.64%) | 76340.0(+1510.55%)  |
   +---------+------+--------------------+--------------------+---------------------+
   | 1       | 3    | 1112480.0(+9.76%)  | 238420.0(+753.94%) | 76380.0(-15.32%)    |
   +---------+------+--------------------+--------------------+---------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
   serial (v2.14)

   +---------+------+------------------+------------------+-------------------+
   | Cluster | Core | Powerdown        | Wakeup           | Cache Flush       |
   +=========+======+==================+==================+===================+
   | 0       | 0    | 267000.0(+9.39%) | 31080.0(+26.96%) | 168520.0(+22.44%) |
   +---------+------+------------------+------------------+-------------------+
   | 0       | 1    | 267440.0(+9.52%) | 30680.0(+28.69%) | 168480.0(+22.21%) |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 0    | 291300.0(-1.18%) | 25140.0(+6.80%)  | 86980.0(+13.52%)  |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 1    | 184260.0(+2.31%) | 23140.0(+9.46%)  | 87940.0(+14.03%)  |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 2    | 184520.0(+2.20%) | 23460.0(+12.79%) | 87520.0(+14.02%)  |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 3    | 184700.0(+2.27%) | 23240.0(+9.62%)  | 87180.0(+13.43%)  |
   +---------+------+------------------+------------------+-------------------+
.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
   serial (v2.13)

   +---------+------+-------------------+------------------+--------------------+
   | Cluster | Core | Powerdown         | Wakeup           | Cache Flush        |
   +=========+======+===================+==================+====================+
   | 0       | 0    | 244080.0(-9.21%)  | 24480.0(-40.00%) | 137640.0(-18.19%)  |
   +---------+------+-------------------+------------------+--------------------+
   | 0       | 1    | 244200.0(-9.06%)  | 23840.0(-41.57%) | 137860.0(-17.91%)  |
   +---------+------+-------------------+------------------+--------------------+
   | 1       | 0    | 294780.0(-1.56%)  | 23540.0(-14.83%) | 76620.0(-12.35%)   |
   +---------+------+-------------------+------------------+--------------------+
   | 1       | 1    | 180100.0(+74.72%) | 21140.0(-6.63%)  | 77120.0(+1533.90%) |
   +---------+------+-------------------+------------------+--------------------+
   | 1       | 2    | 180540.0(+75.25%) | 20800.0(-10.34%) | 76760.0(+1554.31%) |
   +---------+------+-------------------+------------------+--------------------+
   | 1       | 3    | 180600.0(+75.44%) | 21200.0(-7.99%)  | 76860.0(+1542.31%) |
   +---------+------+-------------------+------------------+--------------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
   parallel (v2.14)

   +---------+------+--------------------+------------------+------------------+
   | Cluster | Core | Powerdown          | Wakeup           | Cache Flush      |
   +=========+======+====================+==================+==================+
   | 0       | 0    | 683780.0(-2.74%)   | 22560.0(+33.81%) | 11040.0(+38.35%) |
   +---------+------+--------------------+------------------+------------------+
   | 0       | 1    | 829620.0(-2.61%)   | 22820.0(+39.15%) | 11480.0(+42.79%) |
   +---------+------+--------------------+------------------+------------------+
   | 1       | 0    | 104520.0(-74.34%)  | 17200.0(+13.91%) | 8680.0(+20.56%)  |
   +---------+------+--------------------+------------------+------------------+
   | 1       | 1    | 249200.0(+124.54%) | 17100.0(+10.61%) | 8480.0(+29.27%)  |
   +---------+------+--------------------+------------------+------------------+
   | 1       | 2    | 393980.0(-28.95%)  | 17480.0(+13.51%) | 8320.0(+19.88%)  |
   +---------+------+--------------------+------------------+------------------+
   | 1       | 3    | 539520.0(+108.34%) | 16980.0(+9.13%)  | 8300.0(+25.00%)  |
   +---------+------+--------------------+------------------+------------------+
.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
   parallel (v2.13)

   +---------+------+--------------------+------------------+-----------------+
   | Cluster | Core | Powerdown          | Wakeup           | Cache Flush     |
   +=========+======+====================+==================+=================+
   | 0       | 0    | 703060.0(-17.69%)  | 16860.0(-47.87%) | 7980.0(-19.88%) |
   +---------+------+--------------------+------------------+-----------------+
   | 0       | 1    | 851880.0(+20.98%)  | 16400.0(-49.41%) | 8040.0(-17.45%) |
   +---------+------+--------------------+------------------+-----------------+
   | 1       | 0    | 407400.0(+58.99%)  | 15100.0(-26.20%) | 7200.0(-5.76%)  |
   +---------+------+--------------------+------------------+-----------------+
   | 1       | 1    | 110980.0(-72.67%)  | 15460.0(-23.47%) | 6560.0(-10.87%) |
   +---------+------+--------------------+------------------+-----------------+
   | 1       | 2    | 554540.0           | 15400.0(-23.46%) | 6940.0(-2.53%)  |
   +---------+------+--------------------+------------------+-----------------+
   | 1       | 3    | 258960.0(+143.06%) | 15560.0(-25.05%) | 6640.0          |
   +---------+------+--------------------+------------------+-----------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.14)

   +---------+------+------------------+------------------+-----------------+
   | Cluster | Core | Powerdown        | Wakeup           | Cache Flush     |
   +=========+======+==================+==================+=================+
   | 0       | 0    | 101100.0(-4.73%) | 22820.0(+33.45%) | 7360.0(+39.92%) |
   +---------+------+------------------+------------------+-----------------+
   | 0       | 1    | 101400.0(-5.13%) | 22720.0(+33.18%) | 7560.0(+43.18%) |
   +---------+------+------------------+------------------+-----------------+
   | 1       | 0    | 291440.0         | 16880.0(+8.21%)  | 4580.0          |
   +---------+------+------------------+------------------+-----------------+
   | 1       | 1    | 96600.0(-6.45%)  | 16860.0(+9.20%)  | 4600.0(+3.14%)  |
   +---------+------+------------------+------------------+-----------------+
   | 1       | 2    | 97060.0(-6.40%)  | 16980.0(+11.27%) | 4640.0(+3.11%)  |
   +---------+------+------------------+------------------+-----------------+
   | 1       | 3    | 96660.0(-6.77%)  | 16960.0(+7.89%)  | 4620.0(+2.67%)  |
   +---------+------+------------------+------------------+-----------------+
.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.13)

   +---------+------+------------------+------------------+-----------------+
   | Cluster | Core | Powerdown        | Wakeup           | Cache Flush     |
   +=========+======+==================+==================+=================+
   | 0       | 0    | 106120.0(+1.49%) | 17100.0(-48.24%) | 5260.0(-23.77%) |
   +---------+------+------------------+------------------+-----------------+
   | 0       | 1    | 106880.0(+2.40%) | 17060.0(-47.08%) | 5280.0(-21.89%) |
   +---------+------+------------------+------------------+-----------------+
   | 1       | 0    | 294360.0         | 15600.0(-20.97%) | 4560.0          |
   +---------+------+------------------+------------------+-----------------+
   | 1       | 1    | 103260.0(+3.82%) | 15440.0(-20.41%) | 4460.0(-5.11%)  |
   +---------+------+------------------+------------------+-----------------+
   | 1       | 2    | 103700.0(+4.33%) | 15260.0(-24.08%) | 4500.0(-2.60%)  |
   +---------+------+------------------+------------------+-----------------+
   | 1       | 3    | 103680.0(+4.26%) | 15720.0(-20.53%) | 4500.0(-1.32%)  |
   +---------+------+------------------+------------------+-----------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` is called on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` is
called on the lead core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.14)

   +---------+------+------------------+------------------+-------------------+
   | Cluster | Core | Powerdown        | Wakeup           | Cache Flush       |
   +=========+======+==================+==================+===================+
   | 0       | 0    | 267240.0(+9.97%) | 32940.0(+24.68%) | 168460.0(+22.45%) |
   +---------+------+------------------+------------------+-------------------+
   | 0       | 1    | 267340.0(+9.46%) | 33720.0(+28.12%) | 168500.0(+22.21%) |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 0    | 185740.0(+1.85%) | 25120.0(+6.17%)  | 88380.0(+13.31%)  |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 1    | 101940.0(-5.77%) | 24240.0(+6.88%)  | 4600.0(+4.07%)    |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 2    | 101800.0(-6.04%) | 23060.0(+6.17%)  | 4660.0(+9.91%)    |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 3    | 101820.0(-5.91%) | 23340.0(+7.66%)  | 4640.0(+6.91%)    |
   +---------+------+------------------+------------------+-------------------+
.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.13)

   +---------+------+------------------+------------------+-------------------+
   | Cluster | Core | Powerdown        | Wakeup           | Cache Flush       |
   +=========+======+==================+==================+===================+
   | 0       | 0    | 243020.0(-9.14%) | 26420.0(-39.51%) | 137580.0(-17.85%) |
   +---------+------+------------------+------------------+-------------------+
   | 0       | 1    | 244240.0(-8.87%) | 26320.0(-38.93%) | 137880.0(-17.73%) |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 0    | 182360.0(-2.89%) | 23660.0(-15.20%) | 78000.0(-11.08%)  |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 1    | 108180.0(+4.68%) | 22680.0(-14.16%) | 4420.0            |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 2    | 108340.0(+4.92%) | 21720.0(-16.40%) | 4240.0(-4.93%)    |
   +---------+------+------------------+------------------+-------------------+
   | 1       | 3    | 108220.0(+4.82%) | 21680.0(-16.16%) | 4340.0(-3.12%)    |
   +---------+------+------------------+------------------+-------------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.14)

   +---------+------+--------------------+
   | Cluster | Core | Latency            |
   +=========+======+====================+
   | 0       | 0    | 1200.0(+20.00%)    |
   +---------+------+--------------------+
   | 0       | 1    | 1160.0(+9.43%)     |
   +---------+------+--------------------+
   | 1       | 0    | 700.0(+16.67%)     |
   +---------+------+--------------------+
   | 1       | 1    | 1040.0(+4.00%)     |
   +---------+------+--------------------+
   | 1       | 2    | 1020.0(+4.08%)     |
   +---------+------+--------------------+
   | 1       | 3    | 1080.0(+8.00%)     |
   +---------+------+--------------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.13)

   +---------+------+--------------------+
   | Cluster | Core | Latency            |
   +=========+======+====================+
   | 0       | 0    | 1000.0(-19.35%)    |
   +---------+------+--------------------+
   | 0       | 1    | 1060.0(-17.19%)    |
   +---------+------+--------------------+
   | 1       | 0    | 600.0(-11.76%)     |
   +---------+------+--------------------+
   | 1       | 1    | 1000.0(+2.04%)     |
   +---------+------+--------------------+
   | 1       | 2    | 980.0(+4.26%)      |
   +---------+------+--------------------+
   | 1       | 3    | 1000.0(+2.04%)     |
   +---------+------+--------------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.
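For context, each of these intervals is delimited by a pair of PMF timestamps
captured in the TF-A suspend path. The sketch below illustrates the capture
pattern using the runtime instrumentation service; it is simplified, and the
exact call sites and cache-maintenance flags in ``lib/psci`` vary between TF-A
versions:

.. code:: c

   #include <lib/pmf/pmf.h>
   #include <lib/runtime_instr.h>

   /* Simplified view of the suspend path: PSCI_ENTRY, PSCI_EXIT and
    * CFLUSH_OVERHEAD are the deltas between the corresponding pairs of
    * timestamps. */
   void psci_suspend_instrumentation_sketch(void)
   {
       /* Start of the powerdown path (opens the PSCI_ENTRY interval). */
       PMF_CAPTURE_TIMESTAMP(rt_instr_svc, RT_INSTR_ENTER_PSCI,
                             PMF_NO_CACHE_MAINT);

       /* The cache flush interval (CFLUSH_OVERHEAD). */
       PMF_CAPTURE_TIMESTAMP(rt_instr_svc, RT_INSTR_ENTER_CFLUSH,
                             PMF_CACHE_MAINT);
       /* ... clean/invalidate the dcache to the target power level ... */
       PMF_CAPTURE_TIMESTAMP(rt_instr_svc, RT_INSTR_EXIT_CFLUSH,
                             PMF_NO_CACHE_MAINT);

       /* ... platform powerdown, wfi, warm boot on wakeup ... */

       /* End of the wakeup path (closes the PSCI_EXIT interval). */
       PMF_CAPTURE_TIMESTAMP(rt_instr_svc, RT_INSTR_EXIT_PSCI,
                             PMF_CACHE_MAINT);
   }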
``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and
release the lock before proceeding.
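The serialisation happens because each CPU must hold the locks of its ancestor
power domains (for example, the cluster node at power level 1) while its target
state is coordinated. A simplified sketch of the pattern follows; the real code
lives in ``lib/psci/psci_common.c``, and the helper names and signatures below
are approximations rather than the exact TF-A API:

.. code:: c

   /* Approximation of PSCI suspend-entry locking. Only one CPU can hold
    * the cluster-level lock at a time, so the last CPU in a cluster to
    * suspend waits for all of its siblings. coordinate_target_state()
    * and enter_low_power() are illustrative stand-ins. */
   void psci_suspend_entry_sketch(unsigned int cpu_idx,
                                  unsigned int end_pwrlvl)
   {
       /* Take the locks of all ancestor power domains up to the
        * requested power level. */
       psci_acquire_pwr_domain_locks(end_pwrlvl, cpu_idx);

       /* Coordinate the requested state and run the powerdown
        * handlers, including the cache flush, while sibling CPUs
        * spin on the locks. */
       coordinate_target_state(cpu_idx, end_pwrlvl);

       psci_release_pwr_domain_locks(end_pwrlvl, cpu_idx);

       /* The actual low-power entry (wfi) happens after the locks
        * have been released. */
       enter_low_power();
   }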
The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache size for the big cluster is a lot larger (2MB) than that
of the little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and more consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead
cluster are large because all other CPUs in the cluster are powered down during
the test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring
a flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the
little CPUs because the L2 cache size for the big cluster is a lot larger (2MB)
than that of the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down
to level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to
execute on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows (see the sketch after this list):

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.
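In PSCI terms the sequence looks roughly like the sketch below. The function
IDs come from the PSCI specification; ``smc64()`` and the other helpers are
illustrative assumptions, not TF-A or TFTF APIs:

.. code:: c

   #include <stdint.h>

   /* PSCI function IDs from the PSCI specification (SMC64 variants for
    * the calls that take 64-bit parameters). */
   #define PSCI_CPU_OFF        0x84000002U
   #define PSCI_CPU_SUSPEND64  0xC4000001U
   #define PSCI_CPU_ON64       0xC4000003U

   /* Illustrative helpers, not TF-A/TFTF APIs. */
   extern uint64_t smc64(uint32_t fid, uint64_t a1, uint64_t a2,
                         uint64_t a3);
   extern uint64_t non_lead_mpidrs[5];        /* Juno: 6 CPUs, 1 lead  */
   extern void secondary_entry(void);         /* calls CPU_OFF on self */
   extern void lead_resume_entry(void);
   extern void program_wakeup_timer(void);
   extern uint32_t deepest_power_state(void); /* platform encoding     */

   void cpu_off_then_suspend_sequence(void)
   {
       /* 1. CPU_ON then CPU_OFF on each non-lead CPU in sequence
        * (synchronisation with the target CPUs omitted). */
       for (unsigned int i = 0U; i < 5U; i++)
           (void)smc64(PSCI_CPU_ON64, non_lead_mpidrs[i],
                       (uint64_t)secondary_entry, 0U);

       /* 2. Program the wake-up timer, then suspend the lead CPU to
        * the deepest power level. */
       program_wakeup_timer();
       (void)smc64(PSCI_CPU_SUSPEND64, deepest_power_state(),
                   (uint64_t)lead_resume_entry, 0U);

       /* 3. On resume, CPU_ON each non-lead CPU again so that it can
        * report its PMF timestamps. */
   }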
+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only
requires an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the
little CPUs because the L2 cache size for the big cluster is a lot larger (2MB)
than that of the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are less than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.
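A minimal way to picture this measurement is a timestamped SMC round trip, as
sketched below. The function ID is from the PSCI specification; ``smc32()``
and ``read_cntpct()`` are illustrative helpers, not TF-A or TFTF APIs:

.. code:: c

   #include <stdint.h>

   /* PSCI_VERSION function ID (SMC32) from the PSCI specification. */
   #define PSCI_VERSION_FID 0x84000000U

   /* Illustrative helpers: issue an SMC and return x0, and read the
    * generic counter, which is the time base PMF itself uses. */
   extern uint64_t smc32(uint32_t fid);
   extern uint64_t read_cntpct(void);

   /* Round-trip latency of a fast SMC handled at EL3. At Juno's 50 MHz
    * counter, one tick is 20 ns, which bounds the resolution. */
   uint64_t psci_version_round_trip_ns(void)
   {
       uint64_t start = read_cntpct();

       (void)smc32(PSCI_VERSION_FID);

       return (read_cntpct() - start) * 20U; /* ticks -> ns at 50 MHz */
   }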
--------------

*Copyright (c) 2019-2025, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.14-rc0: https://git.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/+/refs/tags/v2.14-rc0
.. _TFTF v2.14-rc0: https://git.trustedfirmware.org/plugins/gitiles/TF-A/tf-a-tests/+/refs/tags/v2.14-rc0