PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the Trusted Firmware-A Power State Coordination Interface (PSCI)
implementation, using the in-built Performance Measurement Framework (PMF) and
runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests, which has one 4 x Cortex-A53
cluster and one 2 x Cortex-A57 cluster, running at the following frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50 MHz on Juno.
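
Since the generic counter ticks at 50 MHz, one tick corresponds to 20 ns, so
latencies derived from it are multiples of 0.02 µs. A minimal sketch of the
conversion from raw counter values to microseconds is given below; the names
``COUNTER_FREQ_HZ`` and ``ticks_to_us`` are illustrative and are not part of
PMF, and counter wrap-around is ignored.

.. code-block:: python

    # Generic counter frequency on Juno, as stated above.
    COUNTER_FREQ_HZ = 50_000_000


    def ticks_to_us(start_ticks, end_ticks, freq_hz=COUNTER_FREQ_HZ):
        """Return the time between two counter readings in microseconds."""
        return (end_ticks - start_ticks) * 1_000_000 / freq_hz


    print(ticks_to_us(0, 1))        # a single tick: 0.02
    print(ticks_to_us(1000, 7944))  # 6944 ticks: 138.88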

The following source trees and binaries were used:

- `TF-A v2.13-rc0`_
- `TFTF v2.13-rc0`_

Please see the Runtime Instrumentation :ref:`Testing Methodology
<Runtime Instrumentation Methodology>`
page for more details. The tests were run using the
`tf-psci-lava-instr/juno-enable-runtime-instr,juno-instrumentation:juno-tftf`
configuration in CI.

Results
-------

``CPU_SUSPEND`` to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.13)

    +---------+------+------------------+-------------------+--------------------+
    | Cluster | Core |    Powerdown     |       Wakeup      |    Cache Flush     |
    +---------+------+------------------+-------------------+--------------------+
    |    0    |  0   | 333.0 (-52.92%)  |  23.92 (-40.11%)  |       138.88       |
    +---------+------+------------------+-------------------+--------------------+
    |    0    |  1   | 630.9 (+145.95%) |  253.72 (-46.56%) | 136.94 (+1987.50%) |
    +---------+------+------------------+-------------------+--------------------+
    |    1    |  0   | 184.74 (+71.92%) |  23.16 (-95.39%)  | 80.24 (+1283.45%)  |
    +---------+------+------------------+-------------------+--------------------+
    |    1    |  1   |      481.14      |  18.56 (-88.25%)  |  76.5 (+1520.76%)  |
    +---------+------+------------------+-------------------+--------------------+
    |    1    |  2   | 933.88 (+67.76%) | 289.58 (+189.64%) | 76.34 (+1510.55%)  |
    +---------+------+------------------+-------------------+--------------------+
    |    1    |  3   |     1112.48      | 238.42 (+753.94%) |       76.38        |
    +---------+------+------------------+-------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        parallel (v2.12)

    +---------+------+-------------------+------------------+--------------------+
    | Cluster | Core |     Powerdown     |      Wakeup      |    Cache Flush     |
    +---------+------+-------------------+------------------+--------------------+
    |    0    |  0   |  244.52 (-65.43%) | 26.92 (-32.60%)  |   5.54 (-96.70%)   |
    +---------+------+-------------------+------------------+--------------------+
    |    0    |  1   | 526.18 (+105.12%) |      416.1       | 138.52 (+2011.59%) |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  0   |       104.34      | 27.02 (-94.62%)  |        5.32        |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  1   |       384.98      | 23.06 (-85.40%)  |        4.48        |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  2   |  812.44 (+45.94%) |      126.78      |        4.54        |
    +---------+------+-------------------+------------------+--------------------+
    |    1    |  3   |       986.84      | 77.22 (+176.58%) |       79.76        |
    +---------+------+-------------------+------------------+--------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.13)

    +---------+------+------------------+-----------------+-------------------+
    | Cluster | Core |    Powerdown     |      Wakeup     |    Cache Flush    |
    +---------+------+------------------+-----------------+-------------------+
    |    0    |  0   |      244.08      | 24.48 (-40.00%) |       137.64      |
    +---------+------+------------------+-----------------+-------------------+
    |    0    |  1   |      244.2       | 23.84 (-41.57%) |       137.86      |
    +---------+------+------------------+-----------------+-------------------+
    |    1    |  0   |      294.78      |      23.54      |       76.62       |
    +---------+------+------------------+-----------------+-------------------+
    |    1    |  1   | 180.1 (+74.72%)  |      21.14      | 77.12 (+1533.90%) |
    +---------+------+------------------+-----------------+-------------------+
    |    1    |  2   | 180.54 (+75.25%) |       20.8      | 76.76 (+1554.31%) |
    +---------+------+------------------+-----------------+-------------------+
    |    1    |  3   | 180.6 (+75.44%)  |       21.2      | 76.86 (+1542.31%) |
    +---------+------+------------------+-----------------+-------------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to deepest power level in
        serial (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown |      Wakeup     | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  0   |   236.36  | 27.94 (-31.52%) |    138.0    |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  1   |   236.58  | 27.86 (-31.72%) |    138.2    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  0   |   280.68  |      27.02      |     77.6    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  1   |   101.4   |      22.52      |     4.42    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  2   |   100.92  |      22.68      |     4.4     |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  3   |   100.96  |      22.54      |     4.38    |
    +---------+------+-----------+-----------------+-------------+

``CPU_SUSPEND`` to power level 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.13)

    +---------+------+-------------------+-----------------+-------------+
    | Cluster | Core |     Powerdown     |      Wakeup     | Cache Flush |
    +---------+------+-------------------+-----------------+-------------+
    |    0    |  0   |       703.06      | 16.86 (-47.87%) |     7.98    |
    +---------+------+-------------------+-----------------+-------------+
    |    0    |  1   |       851.88      |  16.4 (-49.41%) |     8.04    |
    +---------+------+-------------------+-----------------+-------------+
    |    1    |  0   |  407.4 (+58.99%)  |  15.1 (-26.20%) |     7.2     |
    +---------+------+-------------------+-----------------+-------------+
    |    1    |  1   |  110.98 (-72.67%) |      15.46      |     6.56    |
    +---------+------+-------------------+-----------------+-------------+
    |    1    |  2   |       554.54      |       15.4      |     6.94    |
    +---------+------+-------------------+-----------------+-------------+
    |    1    |  3   | 258.96 (+143.06%) | 15.56 (-25.05%) |     6.64    |
    +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in
        parallel (v2.12)

    +--------------------------------------------------------------------+
    |                  test_rt_instr_cpu_susp_parallel                   |
    +---------+------+-------------------+-----------------+-------------+
    | Cluster | Core |     Powerdown     |      Wakeup     | Cache Flush |
    +---------+------+-------------------+-----------------+-------------+
    |    0    |  0   |       663.12      | 19.66 (-39.21%) |     8.26    |
    +---------+------+-------------------+-----------------+-------------+
    |    0    |  1   |       804.18      | 19.24 (-40.65%) |     8.1     |
    +---------+------+-------------------+-----------------+-------------+
    |    1    |  0   |  105.58 (-58.80%) |      19.68      |     7.42    |
    +---------+------+-------------------+-----------------+-------------+
    |    1    |  1   |  245.02 (-39.67%) |       19.8      |     6.82    |
    +---------+------+-------------------+-----------------+-------------+
    |    1    |  2   |  383.82 (-30.83%) |      18.84      |     7.06    |
    +---------+------+-------------------+-----------------+-------------+
    |    1    |  3   | 523.36 (+391.23%) |       19.0      |     7.3     |
    +---------+------+-------------------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.13)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown |      Wakeup     | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  0   |   106.12  |  17.1 (-48.24%) |     5.26    |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  1   |   106.88  | 17.06 (-47.08%) |     5.28    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  0   |   294.36  |       15.6      |     4.56    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  1   |   103.26  |      15.44      |     4.46    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  2   |   103.7   |      15.26      |     4.5     |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  3   |   103.68  |      15.72      |     4.5     |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_SUSPEND`` latencies (µs) to power level 0 in serial (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown |      Wakeup     | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  0   |   100.04  | 20.32 (-38.50%) |     5.62    |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  1   |   99.78   |  20.6 (-36.10%) |     5.42    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  0   |   278.28  |      19.52      |     4.32    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  1   |    97.3   |      19.44      |     4.26    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  2   |   97.56   |      19.52      |     4.32    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  3   |   97.52   |      19.46      |     4.26    |
    +---------+------+-----------+-----------------+-------------+

``CPU_OFF`` on all non-lead CPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``CPU_OFF`` on all non-lead CPUs in sequence, then ``CPU_SUSPEND`` on the lead
core to the deepest power level.

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.13)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown |      Wakeup     | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  0   |   243.02  | 26.42 (-39.51%) |    137.58   |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  1   |   244.24  | 26.32 (-38.93%) |    137.88   |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  0   |   182.36  |      23.66      |     78.0    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  1   |   108.18  |      22.68      |     4.42    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  2   |   108.34  |      21.72      |     4.24    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  3   |   108.22  |      21.68      |     4.34    |
    +---------+------+-----------+-----------------+-------------+

.. table:: ``CPU_OFF`` latencies (µs) on all non-lead CPUs (v2.12)

    +---------+------+-----------+-----------------+-------------+
    | Cluster | Core | Powerdown |      Wakeup     | Cache Flush |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  0   |   236.3   | 30.88 (-29.30%) |    137.76   |
    +---------+------+-----------+-----------------+-------------+
    |    0    |  1   |   236.66  |  30.5 (-29.23%) |    138.02   |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  0   |   175.9   |       27.0      |    77.86    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  1   |   100.96  |      27.56      |     4.26    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  2   |   101.04  |      26.48      |     4.38    |
    +---------+------+-----------+-----------------+-------------+
    |    1    |  3   |   101.08  |      26.74      |     4.4     |
    +---------+------+-----------+-----------------+-------------+

``CPU_VERSION`` in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. table:: ``CPU_VERSION`` latency (µs) in parallel on all cores (v2.13)

    +-------------+--------+--------------+
    |   Cluster   |  Core  |   Latency    |
    +-------------+--------+--------------+
    |      0      |   0    |     1.0      |
    +-------------+--------+--------------+
    |      0      |   1    |     1.06     |
    +-------------+--------+--------------+
    |      1      |   0    |     0.6      |
    +-------------+--------+--------------+
    |      1      |   1    |     1.0      |
    +-------------+--------+--------------+
    |      1      |   2    |     0.98     |
    +-------------+--------+--------------+
    |      1      |   3    |     1.0      |
    +-------------+--------+--------------+

.. table:: ``CPU_VERSION`` latency (µs) in parallel on all cores (v2.12)

    +-------------+--------+--------------+
    |   Cluster   |  Core  |   Latency    |
    +-------------+--------+--------------+
    |      0      |   0    |     1.0      |
    +-------------+--------+--------------+
    |      0      |   1    |     1.02     |
    +-------------+--------+--------------+
    |      1      |   0    |     0.52     |
    +-------------+--------+--------------+
    |      1      |   1    |     0.94     |
    +-------------+--------+--------------+
    |      1      |   2    |     0.94     |
    +-------------+--------+--------------+
    |      1      |   3    |     0.92     |
    +-------------+--------+--------------+

Annotated Historic Results
--------------------------

The following results are based on the upstream `TF master as of 31/01/2017`_.
TF-A was built using the same build instructions as detailed in the procedure
above.

In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` corresponds to the powerdown latency, ``PSCI_EXIT`` the wakeup
latency, and ``CFLUSH_OVERHEAD`` the latency of the cache flush operation.

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.
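
The effect of this contention can be illustrated with a very simple model,
assuming every CPU requests the lock at the same instant and the CPUs are
served one after another. The function name and the hold times below are
hypothetical and are not measurements.

.. code-block:: python

    from itertools import accumulate


    def serialized_entry_times(hold_times_us):
        """Completion time of each CPU when a single lock serialises entry.

        All CPUs are assumed to request the lock at time zero and to be served
        in list order, so each one finishes after everyone ahead of it.
        """
        return list(accumulate(hold_times_us))


    # Hypothetical hold time of 90 us for each CPU of a four-CPU cluster.
    print(serialized_entry_times([90, 90, 90, 90]))  # [90, 180, 270, 360]

In this model the last CPU sees four times the latency of the first, even
though each CPU does the same amount of work.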

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is much larger than that for CPU 3
because the L2 cache of the big cluster (2MB) is twice the size of that of the
little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.
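
The serialisation is visible in the table as a roughly constant step between
successive little-cluster CPUs. The short sketch below computes those steps
from the ``PSCI_ENTRY`` values of CPUs 0-3 in the table above; the helper name
is illustrative only.

.. code-block:: python

    # PSCI_ENTRY times (us) for CPUs 0-3, copied from the table above.
    PSCI_ENTRY_US = [116, 204, 287, 376]


    def step_increments(latencies):
        """Return the difference between each latency and the previous one."""
        return [b - a for a, b in zip(latencies, latencies[1:])]


    steps = step_increments(PSCI_ENTRY_US)
    print(steps)                    # [88, 83, 89]
    print(sum(steps) / len(steps))  # mean step in us

Each additional CPU adds between 83 and 89 us, which is consistent with every
power down command having to wait for the previous one to complete.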
35640d553cfSPaul Beesley
35740d553cfSPaul BeesleyOn platforms with a more efficient CPU power down mechanism, it should be
35840d553cfSPaul Beesleypossible to make the ``PSCI_ENTRY`` times smaller and consistent.
35940d553cfSPaul Beesley
36040d553cfSPaul BeesleyThe ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
36140d553cfSPaul Beesleyrequire locks at power level 0.
36240d553cfSPaul Beesley
36340d553cfSPaul BeesleyThe ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
36440d553cfSPaul Beesleythe cache associated with power level 0 is flushed (L1).
36540d553cfSPaul Beesley
36640d553cfSPaul Beesley``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
36740d553cfSPaul Beesley~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
36840d553cfSPaul Beesley
36940d553cfSPaul Beesley+-------+---------------------+--------------------+--------------------------+
37040d553cfSPaul Beesley| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
37140d553cfSPaul Beesley+=======+=====================+====================+==========================+
37240d553cfSPaul Beesley| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead cluster
are large because all other CPUs in the same cluster are powered down during the
test. The ``CPU_SUSPEND`` call therefore powers down to the cluster level,
requiring a flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is twice the size of that of
the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.
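
The reasoning above, that the level a CPU reaches depends on whether any
sibling in its cluster is still running, can be sketched in a few lines of
Python. This is an illustrative model only, not TF-A code: the ``CLUSTERS`` map,
the function names and the restriction to power levels 0 and 1 (system level 2
is ignored) are assumptions made for the example.

```python
# Illustrative model only; names and structure are hypothetical, not TF-A code.
# CPU numbering follows the tables: CPUs 0-3 are little, CPUs 4-5 are big.
CLUSTERS = {"little": {0, 1, 2, 3}, "big": {4, 5}}


def reached_power_level(cpu, running):
    """Return the power level `cpu` reaches when it powers down.

    `running` is the set of other CPUs that are still powered on.
    Level 0 is the CPU, level 1 the cluster (system level 2 is not modelled).
    """
    cluster = next(cpus for cpus in CLUSTERS.values() if cpu in cpus)
    siblings_running = (cluster - {cpu}) & set(running)
    return 0 if siblings_running else 1


def caches_to_flush(level):
    """Caches flushed for a power down to `level` (L1 only, or L1 and L2)."""
    return ["L1"] if level == 0 else ["L1", "L2"]


# CPU 5 suspends while lead CPU 4 keeps running: CPU level only.
print(caches_to_flush(reached_power_level(5, running={4})))
# A little CPU suspends while only lead CPU 4 (big cluster) is running.
print(caches_to_flush(reached_power_level(0, running={4})))
```

In this model CPU 5 needs only an L1 flush, while CPU 4 (once CPU 5 is off) and
every little CPU are the last CPU running in their cluster and need both L1 and
L2 flushed, which matches the pattern in the table.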

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the caches up to power level 0 (L1 only). This is the
best-case scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the previous test because
the cluster remains powered on throughout the test and there is less code to
execute on power on (for example, no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake-up timer and suspend the lead CPU to the deepest power level.

3. Call ``CPU_ON`` on the non-lead CPUs to get the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is much larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is twice the size of that of
the little cluster (1MB).
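
As a rough sanity check on this explanation, the ratio of the cluster-level
``CFLUSH_OVERHEAD`` times in the table above can be compared with the ratio of
the L2 cache sizes. The short Python sketch below only redoes this arithmetic
with the figures quoted in this section. It assumes, as a simplification, that
flush time scales with L2 size, and it ignores differences in core frequency and
cache contents.

```python
# Figures taken from the table and text above.
cflush_us = {"little": 93, "big": 181}  # cluster-level CFLUSH_OVERHEAD (CPU 0, CPU 4)
l2_size_mb = {"little": 1, "big": 2}    # L2 cache size per cluster

time_ratio = cflush_us["big"] / cflush_us["little"]
size_ratio = l2_size_mb["big"] / l2_size_mb["little"]

print(f"flush time ratio (big/little): {time_ratio:.2f}")
print(f"L2 size ratio (big/little):    {size_ratio:.2f}")
```

The flush time ratio of about 1.95 is close to the 2x difference in L2 size,
which is consistent with the L2 flush dominating the overhead.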

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF-A.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are lower than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.
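
Because PMF timestamps come from the generic counter, each value in the table
is a whole number of counter ticks converted to nanoseconds. The Python sketch
below shows that conversion under the assumption that the counter runs at
50 MHz on Juno, which gives a 20 ns resolution. The helper name ``ticks_to_ns``
is hypothetical and is not part of PMF.

```python
COUNTER_FREQ_HZ = 50_000_000  # assumed generic counter frequency on Juno


def ticks_to_ns(ticks, freq_hz=COUNTER_FREQ_HZ):
    """Convert a generic counter tick delta to nanoseconds."""
    return ticks * 1_000_000_000 // freq_hz


# TOTAL TIME values from the table above, in nanoseconds.
measured_ns = {0: 3020, 1: 2940, 2: 2980, 3: 3060, 4: 520, 5: 720}

resolution_ns = ticks_to_ns(1)
for cpu, ns in measured_ns.items():
    ticks = ns // resolution_ns
    print(f"CPU {cpu}: {ns} ns = {ticks} ticks of {resolution_ns} ns")
```

Every value in the table is a multiple of 20 ns, so under this assumption the
200 ns gap between CPU 4 and CPU 5 corresponds to only 10 counter ticks.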

--------------

*Copyright (c) 2019-2025, Arm Limited and Contributors. All rights reserved.*

.. _Juno R1 platform: https://developer.arm.com/documentation/100122/latest/
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d
.. _TF-A v2.13-rc0: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?h=v2.13-rc0
.. _TFTF v2.13-rc0: https://git.trustedfirmware.org/TF-A/tf-a-tests.git/tree/?h=v2.13-rc0