PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has a cluster of 4 x
Cortex-A53 cores and a cluster of 2 x Cortex-A57 cores, running at the
following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.

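As a rough illustration of what the 50MHz counter rate means for timestamp
resolution, the sketch below converts counter ticks to time. The constant and
function names are hypothetical and are not part of PMF.

```python
# Hypothetical helpers, not part of PMF: convert generic counter ticks to time.
COUNTER_FREQ_HZ = 50_000_000  # generic counter frequency on Juno, as stated above


def ticks_to_ns(ticks: int, freq_hz: int = COUNTER_FREQ_HZ) -> float:
    """Return the duration of `ticks` counter ticks in nanoseconds."""
    return ticks * 1_000_000_000 / freq_hz


def ticks_to_us(ticks: int, freq_hz: int = COUNTER_FREQ_HZ) -> float:
    """Return the duration of `ticks` counter ticks in microseconds."""
    return ticks * 1_000_000 / freq_hz


resolution_ns = ticks_to_ns(1)  # duration of a single tick
```

One tick corresponds to 20 ns, so no interval can be resolved more finely than
that by these timestamps.
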
The following source trees and binaries were used:

- `TF-A v2.14-rc0`_
- `TFTF v2.14-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>`
page for more details. The tests were run using the
``tf-psci-lava-instr/juno-enable-runtime-instr,juno-instrumentation:juno-tftf``
configuration in CI.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.14)

    +---------+------+------------------+---------------------+-------------------+
    | Cluster | Core |    Powerdown     |        Wakeup       |    Cache Flush    |
    +---------+------+------------------+---------------------+-------------------+
    |    0    |  0   |     332440.0     | 270640.0(+1031.44%) | 169500.0(+22.05%) |
    +---------+------+------------------+---------------------+-------------------+
    |    0    |  1   | 624520.0(-1.01%) |   30260.0(-88.07%)  | 166740.0(+21.76%) |
    +---------+------+------------------+---------------------+-------------------+
    |    1    |  0   | 187960.0(+1.74%) |   25460.0(+9.93%)   |  90420.0(+12.69%) |
    +---------+------+------------------+---------------------+-------------------+
    |    1    |  1   |     479100.0     |   20520.0(+10.56%)  |  87500.0(+14.38%) |
    +---------+------+------------------+---------------------+-------------------+
    |    1    |  2   | 923480.0(-1.11%) |   294160.0(+1.58%)  |  87500.0(+14.62%) |
    +---------+------+------------------+---------------------+-------------------+
    |    1    |  3   |    1106300.0     |       238320.0      |  87340.0(+14.35%) |
    +---------+------+------------------+---------------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.13)

    +---------+------+--------------------+--------------------+---------------------+
    | Cluster | Core |     Powerdown      |       Wakeup       |     Cache Flush     |
    +---------+------+--------------------+--------------------+---------------------+
    |    0    |  0   | 333000.0(-52.92%)  |  23920.0(-40.11%)  |  138880.0(-17.24%)  |
    +---------+------+--------------------+--------------------+---------------------+
    |    0    |  1   | 630900.0(+145.95%) | 253720.0(-46.56%)  | 136940.0(+1987.50%) |
    +---------+------+--------------------+--------------------+---------------------+
    |    1    |  0   | 184740.0(+71.92%)  |  23160.0(-95.39%)  |  80240.0(+1283.45%) |
    +---------+------+--------------------+--------------------+---------------------+
    |    1    |  1   | 481140.0(+18.16%)  |  18560.0(-88.25%)  |  76500.0(+1520.76%) |
    +---------+------+--------------------+--------------------+---------------------+
    |    1    |  2   | 933880.0(+67.76%)  | 289580.0(+189.64%) |  76340.0(+1510.55%) |
    +---------+------+--------------------+--------------------+---------------------+
    |    1    |  3   | 1112480.0(+9.76%)  | 238420.0(+753.94%) |   76380.0(-15.32%)  |
    +---------+------+--------------------+--------------------+---------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.14)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |    Cache Flush    |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  0   | 267000.0(+9.39%) | 31080.0(+26.96%) | 168520.0(+22.44%) |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  1   | 267440.0(+9.52%) | 30680.0(+28.69%) | 168480.0(+22.21%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  0   | 291300.0(-1.18%) | 25140.0(+6.80%)  |  86980.0(+13.52%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  1   | 184260.0(+2.31%) | 23140.0(+9.46%)  |  87940.0(+14.03%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  2   | 184520.0(+2.20%) | 23460.0(+12.79%) |  87520.0(+14.02%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  3   | 184700.0(+2.27%) | 23240.0(+9.62%)  |  87180.0(+13.43%) |
    +---------+------+------------------+------------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.13)

    +---------+------+-------------------+------------------+--------------------+
    | Cluster | Core |     Powerdown     |      Wakeup      |    Cache Flush     |
    +---------+------+-------------------+------------------+--------------------+
    |    0    |  0   |  244080.0(-9.21%) | 24480.0(-40.00%) | 137640.0(-18.19%)  |
    +---------+------+-------------------+------------------+--------------------+
    |    0    |  1   |  244200.0(-9.06%) | 23840.0(-41.57%) | 137860.0(-17.91%)  |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  0   |  294780.0(-1.56%) | 23540.0(-14.83%) |  76620.0(-12.35%)  |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  1   | 180100.0(+74.72%) | 21140.0(-6.63%)  | 77120.0(+1533.90%) |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  2   | 180540.0(+75.25%) | 20800.0(-10.34%) | 76760.0(+1554.31%) |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  3   | 180600.0(+75.44%) | 21200.0(-7.99%)  | 76860.0(+1542.31%) |
    +---------+------+-------------------+------------------+--------------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.14)

    +---------+------+--------------------+------------------+------------------+
    | Cluster | Core |     Powerdown      |      Wakeup      |   Cache Flush    |
    +---------+------+--------------------+------------------+------------------+
    |    0    |  0   |  683780.0(-2.74%)  | 22560.0(+33.81%) | 11040.0(+38.35%) |
    +---------+------+--------------------+------------------+------------------+
    |    0    |  1   |  829620.0(-2.61%)  | 22820.0(+39.15%) | 11480.0(+42.79%) |
    +---------+------+--------------------+------------------+------------------+
    |    1    |  0   | 104520.0(-74.34%)  | 17200.0(+13.91%) | 8680.0(+20.56%)  |
    +---------+------+--------------------+------------------+------------------+
    |    1    |  1   | 249200.0(+124.54%) | 17100.0(+10.61%) | 8480.0(+29.27%)  |
    +---------+------+--------------------+------------------+------------------+
    |    1    |  2   | 393980.0(-28.95%)  | 17480.0(+13.51%) | 8320.0(+19.88%)  |
    +---------+------+--------------------+------------------+------------------+
    |    1    |  3   | 539520.0(+108.34%) | 16980.0(+9.13%)  | 8300.0(+25.00%)  |
    +---------+------+--------------------+------------------+------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.13)

    +---------+------+--------------------+------------------+-----------------+
    | Cluster | Core |     Powerdown      |      Wakeup      |   Cache Flush   |
    +---------+------+--------------------+------------------+-----------------+
    |    0    |  0   | 703060.0(-17.69%)  | 16860.0(-47.87%) | 7980.0(-19.88%) |
    +---------+------+--------------------+------------------+-----------------+
    |    0    |  1   | 851880.0(+20.98%)  | 16400.0(-49.41%) | 8040.0(-17.45%) |
    +---------+------+--------------------+------------------+-----------------+
    |    1    |  0   | 407400.0(+58.99%)  | 15100.0(-26.20%) |  7200.0(-5.76%) |
    +---------+------+--------------------+------------------+-----------------+
    |    1    |  1   | 110980.0(-72.67%)  | 15460.0(-23.47%) | 6560.0(-10.87%) |
    +---------+------+--------------------+------------------+-----------------+
    |    1    |  2   |      554540.0      | 15400.0(-23.46%) |  6940.0(-2.53%) |
    +---------+------+--------------------+------------------+-----------------+
    |    1    |  3   | 258960.0(+143.06%) | 15560.0(-25.05%) |      6640.0     |
    +---------+------+--------------------+------------------+-----------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.14)

    +---------+------+------------------+------------------+-----------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |   Cache Flush   |
    +---------+------+------------------+------------------+-----------------+
    |    0    |  0   | 101100.0(-4.73%) | 22820.0(+33.45%) | 7360.0(+39.92%) |
    +---------+------+------------------+------------------+-----------------+
    |    0    |  1   | 101400.0(-5.13%) | 22720.0(+33.18%) | 7560.0(+43.18%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  0   |     291440.0     | 16880.0(+8.21%)  |      4580.0     |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  1   | 96600.0(-6.45%)  | 16860.0(+9.20%)  |  4600.0(+3.14%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  2   | 97060.0(-6.40%)  | 16980.0(+11.27%) |  4640.0(+3.11%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  3   | 96660.0(-6.77%)  | 16960.0(+7.89%)  |  4620.0(+2.67%) |
    +---------+------+------------------+------------------+-----------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.13)

    +---------+------+------------------+------------------+-----------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |   Cache Flush   |
    +---------+------+------------------+------------------+-----------------+
    |    0    |  0   | 106120.0(+1.49%) | 17100.0(-48.24%) | 5260.0(-23.77%) |
    +---------+------+------------------+------------------+-----------------+
    |    0    |  1   | 106880.0(+2.40%) | 17060.0(-47.08%) | 5280.0(-21.89%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  0   |     294360.0     | 15600.0(-20.97%) |      4560.0     |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  1   | 103260.0(+3.82%) | 15440.0(-20.41%) |  4460.0(-5.11%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  2   | 103700.0(+4.33%) | 15260.0(-24.08%) |  4500.0(-2.60%) |
    +---------+------+------------------+------------------+-----------------+
    |    1    |  3   | 103680.0(+4.26%) | 15720.0(-20.53%) |  4500.0(-1.32%) |
    +---------+------+------------------+------------------+-----------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.14)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |    Cache Flush    |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  0   | 267240.0(+9.97%) | 32940.0(+24.68%) | 168460.0(+22.45%) |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  1   | 267340.0(+9.46%) | 33720.0(+28.12%) | 168500.0(+22.21%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  0   | 185740.0(+1.85%) | 25120.0(+6.17%)  |  88380.0(+13.31%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  1   | 101940.0(-5.77%) | 24240.0(+6.88%)  |   4600.0(+4.07%)  |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  2   | 101800.0(-6.04%) | 23060.0(+6.17%)  |   4660.0(+9.91%)  |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  3   | 101820.0(-5.91%) | 23340.0(+7.66%)  |   4640.0(+6.91%)  |
    +---------+------+------------------+------------------+-------------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.13)

    +---------+------+------------------+------------------+-------------------+
    | Cluster | Core |    Powerdown     |      Wakeup      |    Cache Flush    |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  0   | 243020.0(-9.14%) | 26420.0(-39.51%) | 137580.0(-17.85%) |
    +---------+------+------------------+------------------+-------------------+
    |    0    |  1   | 244240.0(-8.87%) | 26320.0(-38.93%) | 137880.0(-17.73%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  0   | 182360.0(-2.89%) | 23660.0(-15.20%) |  78000.0(-11.08%) |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  1   | 108180.0(+4.68%) | 22680.0(-14.16%) |       4420.0      |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  2   | 108340.0(+4.92%) | 21720.0(-16.40%) |   4240.0(-4.93%)  |
    +---------+------+------------------+------------------+-------------------+
    |    1    |  3   | 108220.0(+4.82%) | 21680.0(-16.16%) |   4340.0(-3.12%)  |
    +---------+------+------------------+------------------+-------------------+

``PSCI_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.14)

    +---------+------+--------------------+
    | Cluster | Core |      Latency       |
    +---------+------+--------------------+
    |    0    |  0   |  1200.0(+20.00%)   |
    +---------+------+--------------------+
    |    0    |  1   |   1160.0(+9.43%)   |
    +---------+------+--------------------+
    |    1    |  0   |   700.0(+16.67%)   |
    +---------+------+--------------------+
    |    1    |  1   |   1040.0(+4.00%)   |
    +---------+------+--------------------+
    |    1    |  2   |   1020.0(+4.08%)   |
    +---------+------+--------------------+
    |    1    |  3   |   1080.0(+8.00%)   |
    +---------+------+--------------------+

.. table:: ``PSCI_VERSION`` latency (µs) in parallel on all cores (v2.13)

    +---------+------+--------------------+
    | Cluster | Core |      Latency       |
    +---------+------+--------------------+
    |    0    |  0   |  1000.0(-19.35%)   |
    +---------+------+--------------------+
    |    0    |  1   |  1060.0(-17.19%)   |
    +---------+------+--------------------+
    |    1    |  0   |   600.0(-11.76%)   |
    +---------+------+--------------------+
    |    1    |  1   |   1000.0(+2.04%)   |
    +---------+------+--------------------+
    |    1    |  2   |   980.0(+4.26%)    |
    +---------+------+--------------------+
    |    1    |  3   |   1000.0(+2.04%)   |
    +---------+------+--------------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

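The effect of such contention can be sketched with a deliberately simplified
model. It assumes that every CPU requests a single lock at the same instant and
holds it for the same time; ``hold_us`` is a hypothetical parameter, not a
measured value, and the function is not part of TF.

```python
from typing import List


def serialized_entry_latencies(num_cpus: int, hold_us: float) -> List[float]:
    """Model the entry latency per CPU when one lock serializes all callers.

    Assumes all CPUs request the lock simultaneously and each holds it for
    `hold_us` microseconds. Results are ordered by lock acquisition.
    """
    return [(position + 1) * hold_us for position in range(num_cpus)]


# Hypothetical hold time of 25 us for a 4-CPU cluster.
latencies = serialized_entry_latencies(num_cpus=4, hold_us=25.0)
```

In this model the last CPU to take the lock sees a latency of ``num_cpus``
times the hold time. The measured values do not follow such a uniform
progression exactly, and the model does not attempt to explain the difference.
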
The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

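The "last CPU in the cluster" rule can be sketched as follows. The CPU
numbering matches this section, but the function name and the power down order
used in the example are hypothetical: the order in which the CPUs actually
powered down during the measurement is not recorded here.

```python
# CPU numbering used in this section: 0-3 little (A53), 4-5 big (A57).
CLUSTERS = {"little": [0, 1, 2, 3], "big": [4, 5]}


def caches_flushed(power_down_order):
    """Map each CPU to the caches it flushes, given the power down order."""
    remaining = {name: set(cpus) for name, cpus in CLUSTERS.items()}
    flushed = {}
    for cpu in power_down_order:
        cluster = next(name for name, cpus in CLUSTERS.items() if cpu in cpus)
        remaining[cluster].discard(cpu)
        # The last CPU to leave a cluster also flushes the shared L2 cache.
        flushed[cpu] = ("L1", "L2") if not remaining[cluster] else ("L1",)
    return flushed


# Hypothetical order in which CPUs 3 and 5 are the last in their clusters.
result = caches_flushed([0, 1, 2, 4, 3, 5])
```

With this order, only CPUs 3 and 5 flush both L1 and L2, which is consistent
with the two large ``CFLUSH_OVERHEAD`` values in the table.
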
The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache of the big cluster (2MB) is a lot larger than that of
the little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

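The serialization shows up in the little cluster figures as a near-constant
step between successive CPUs. The short calculation below uses the
``PSCI_ENTRY`` values from the table above; reading each step as the cost of
one serialized SCP command is an interpretation, not a measurement.

```python
# PSCI_ENTRY times (us) for little cluster CPUs 0-3, copied from the table above.
psci_entry_us = [116, 204, 287, 376]

# Difference between each CPU and the one before it.
steps = [later - earlier for earlier, later in zip(psci_entry_us, psci_entry_us[1:])]

print(steps)  # [88, 83, 89]
```
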
On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the cluster are powered down during the
test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is a lot larger than that
of the little cluster (1MB).

39340d553cfSPaul BeesleyThe ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
39440d553cfSPaul BeesleyCPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
39540d553cfSPaul Beesleylevel 0, which only requires L1 cache flush.
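The pattern in the table follows from a simple coordination rule: a cluster can
only power down when the CPU being suspended is the last one running in it. The
following is a minimal Python sketch of that rule, not TF-A code. It assumes
that CPUs 0-3 form the little cluster and CPUs 4-5 the big cluster, as the text
implies, and that each non-lead CPU is tested on its own while lead CPU 4 keeps
running. All names (``CLUSTERS``, ``reaches_cluster_level``,
``l2_flush_by_cpu``) are illustrative.

```python
# Illustrative model only; these names are not part of TF-A or TFTF.
# Assumed topology: CPUs 0-3 are the little cluster, CPUs 4-5 the big cluster.
CLUSTERS = {"little": {0, 1, 2, 3}, "big": {4, 5}}
LEAD_CPU = 4


def cluster_of(cpu):
    """Return the name of the cluster that contains the given CPU."""
    for name, cpus in CLUSTERS.items():
        if cpu in cpus:
            return name
    raise ValueError(f"unknown CPU {cpu}")


def reaches_cluster_level(cpu, running):
    """True if suspending `cpu` leaves no other CPU in its cluster running.

    Only then can the cluster power down, which needs an L2 flush as well
    as the L1 flush.
    """
    siblings = CLUSTERS[cluster_of(cpu)] - {cpu}
    return not (siblings & running)


def l2_flush_by_cpu():
    """Model the test: each non-lead CPU suspends while only the lead runs,
    then the lead suspends with every other CPU already powered down."""
    result = {}
    for cpu in sorted(set().union(*CLUSTERS.values())):
        running = {cpu} if cpu == LEAD_CPU else {LEAD_CPU, cpu}
        result[cpu] = reaches_cluster_level(cpu, running)
    return result


if __name__ == "__main__":
    for cpu, needs_l2 in l2_flush_by_cpu().items():
        print(cpu, "L1 + L2 flush" if needs_l2 else "L1 flush only")
```

Under these assumptions only CPU 5 stays at power level 0, which matches its
6 us ``CFLUSH_OVERHEAD`` against 94 us and 180 us for the other CPUs.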

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent because there is no contention and only
the L1 cache needs to be flushed when powering down to level 0. This is the
best-case scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the previous test because
the cluster remains powered on throughout the test and there is less code to
execute on power-on (for example, there is no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power level.

3. Call ``CPU_ON`` on the non-lead CPUs to collect the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
therefore powers down to the cluster level, which requires a flush of both the
L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running, so CPU 5 only powers down to level 0, which requires only
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is twice the size of that of
the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).
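As a rough sanity check on the L2 size explanation, the flush overhead can be
divided by the L2 size of each cluster. This is only a sketch: it assumes that
the cluster-level ``CFLUSH_OVERHEAD`` is dominated by the L2 flush, and it
ignores the L1 portion and the different CPU clock speeds. The values are taken
from the table above and the names are illustrative.

```python
# Values from the table above (us) and the L2 sizes quoted in the text (MB).
L2_SIZE_MB = {"little": 1, "big": 2}
CFLUSH_OVERHEAD_US = {"little": 93, "big": 181}  # CPUs 0-3 and CPU 4


def flush_us_per_mb(cluster):
    """Approximate flush cost per MB of L2, assuming L2 dominates."""
    return CFLUSH_OVERHEAD_US[cluster] / L2_SIZE_MB[cluster]


if __name__ == "__main__":
    for cluster in L2_SIZE_MB:
        print(f"{cluster}: {flush_us_per_mb(cluster):.1f} us/MB")
```

Both clusters come out at roughly 90 us per MB (93.0 and 90.5), which is
consistent with the overhead scaling with the L2 cache size.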

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round-trip latency for handling a fast SMC at EL3 in TF-A.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are lower than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.
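To put the nanosecond figures in perspective, they can be converted to
approximate CPU cycles using the core frequencies listed in the Method section
(650 MHz for Cortex-A53, 900 MHz for Cortex-A57). The sketch below also converts
them to timer ticks under the assumption of a 50 MHz timestamp counter (20 ns
per tick). That frequency is an assumption made here for illustration, although
every value in the table is a multiple of 20 ns. All names are illustrative.

```python
# Round-trip times from the table above, in ns, keyed by CPU number.
TOTAL_TIME_NS = {0: 3020, 1: 2940, 2: 2980, 3: 3060, 4: 520, 5: 720}

# Core frequencies from the Method section, in MHz.
CPU_FREQ_MHZ = {0: 650, 1: 650, 2: 650, 3: 650, 4: 900, 5: 900}

# Assumption: the timestamp counter runs at 50 MHz, i.e. 20 ns per tick.
COUNTER_FREQ_MHZ = 50


def ns_to_cycles(ns, freq_mhz):
    """Convert a duration in ns to clock cycles at freq_mhz (rounded)."""
    return round(ns * freq_mhz / 1000)


if __name__ == "__main__":
    for cpu, ns in TOTAL_TIME_NS.items():
        ticks = ns_to_cycles(ns, COUNTER_FREQ_MHZ)
        cycles = ns_to_cycles(ns, CPU_FREQ_MHZ[cpu])
        print(f"CPU {cpu}: {ns} ns = {ticks} ticks, about {cycles} CPU cycles")
```

With these numbers the fast SMC round trip costs roughly 1,900 to 2,000 cycles
on the little CPUs and roughly 470 to 650 cycles on the big CPUs. The 200 ns gap
between CPU 4 and CPU 5 is only 10 ticks of the assumed counter.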

--------------

*Copyright (c) 2019-2025, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.14-rc0: https://git.trustedfirmware.org/plugins/gitiles/TF-A/trusted-firmware-a/+/refs/tags/v2.14-rc0
.. _TFTF v2.14-rc0: https://git.trustedfirmware.org/plugins/gitiles/TF-A/tf-a-tests/+/refs/tags/v2.14-rc0