PSCI Performance Measurements on Arm Juno Development Platform
==============================================================

This document summarises the findings of performance measurements of key
operations in the ARM Trusted Firmware (TF) Power State Coordination Interface
(PSCI) implementation, using the in-built Performance Measurement Framework
(PMF) and runtime instrumentation timestamps.

Method
------

We used the `Juno R1 platform`_ for these tests. It has a cluster of 4 x
Cortex-A53 CPUs and a cluster of 2 x Cortex-A57 CPUs, running at the following
frequencies:

+-----------------+--------------------+
| Domain          | Frequency (MHz)    |
+=================+====================+
| Cortex-A57      | 900 (nominal)      |
+-----------------+--------------------+
| Cortex-A53      | 650 (underdrive)   |
+-----------------+--------------------+
| AXI subsystem   | 533                |
+-----------------+--------------------+

Juno supports CPU, cluster and system power down states, corresponding to power
levels 0, 1 and 2 respectively. It does not support any retention states.

We used the upstream `TF master as of 31/01/2017`_, building the platform using
the ``ENABLE_RUNTIME_INSTRUMENTATION`` option:

::

    make PLAT=juno ENABLE_RUNTIME_INSTRUMENTATION=1 \
        SCP_BL2=<path/to/scp-fw.bin>                \
        BL33=<path/to/test-fw.bin>                  \
        all fip

When using the debug build of TF, there was no noticeable difference in the
results.

The tests are based on an ARM-internal test framework. The release build of this
framework was used because the results in the debug build became skewed; the
console output prevented some of the tests from executing in parallel.

The tests consist of both parallel and sequential tests, which are broadly
described as follows:

- **Parallel Tests** This type of test powers on all the non-lead CPUs and
  brings them and the lead CPU to a common synchronization point. The lead CPU
  then initiates the test on all CPUs in parallel.

- **Sequential Tests** This type of test powers on each non-lead CPU in
  sequence. The lead CPU initiates the test on a non-lead CPU then waits for the
  test to complete before proceeding to the next non-lead CPU. The lead CPU then
  executes the test on itself.

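The two orchestration patterns can be sketched as follows. This is an
illustrative Python model only, not the ARM-internal framework: threads stand in
for CPUs, and ``run_test`` is a hypothetical per-CPU test body.

```python
import threading

def run_parallel(cpus, run_test):
    # All CPUs, lead and non-lead, meet at a common synchronization
    # point, then execute the test body at the same time.
    barrier = threading.Barrier(len(cpus))

    def worker(cpu):
        barrier.wait()      # common synchronization point
        run_test(cpu)

    threads = [threading.Thread(target=worker, args=(cpu,)) for cpu in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def run_sequential(cpus, run_test, lead):
    # The lead CPU drives each non-lead CPU in turn, waiting for each
    # test to complete, then finally runs the test on itself.
    for cpu in cpus:
        if cpu != lead:
            t = threading.Thread(target=run_test, args=(cpu,))
            t.start()
            t.join()        # wait before proceeding to the next CPU
    run_test(lead)
```

The key behavioural difference is visible in the sketch: the barrier makes the
parallel variant sensitive to lock contention (all CPUs hit the PSCI code at
once), while the sequential variant never has more than one CPU in the test at
a time.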
In the results below, CPUs 0-3 refer to CPUs in the little cluster (A53) and
CPUs 4-5 refer to CPUs in the big cluster (A57). In all cases CPU 4 is the lead
CPU.

``PSCI_ENTRY`` refers to the time taken from entering the TF PSCI implementation
to the point the hardware enters the low power state (WFI). Referring to the TF
runtime instrumentation points, this corresponds to:
``(RT_INSTR_ENTER_HW_LOW_PWR - RT_INSTR_ENTER_PSCI)``.

``PSCI_EXIT`` refers to the time taken from the point the hardware exits the low
power state to exiting the TF PSCI implementation. This corresponds to:
``(RT_INSTR_EXIT_PSCI - RT_INSTR_EXIT_HW_LOW_PWR)``.

``CFLUSH_OVERHEAD`` refers to the part of ``PSCI_ENTRY`` taken to flush the
caches. This corresponds to: ``(RT_INSTR_EXIT_CFLUSH - RT_INSTR_ENTER_CFLUSH)``.

Note there is very little variance observed in the values given (~1us), although
the values for each CPU are sometimes interchanged, depending on the order in
which locks are acquired. Also, there is very little variance observed between
executing the tests sequentially in a single boot or rebooting between tests.

Given that runtime instrumentation using PMF is invasive, there is a small
(unquantified) overhead on the results. PMF uses the generic counter for
timestamps, which runs at 50MHz on Juno.

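As a sketch of how the reported figures are derived, the three intervals can be
computed from raw counter timestamps as below. This is illustrative Python: the
timestamp names match the TF runtime instrumentation points above, but the
capture mechanism itself is abstracted into a plain dictionary.

```python
CNT_FREQ_HZ = 50_000_000  # Juno generic counter: 50 MHz, i.e. 20 ns per tick

def ticks_to_us(ticks):
    """Convert raw generic counter ticks to microseconds."""
    return ticks * 1_000_000 / CNT_FREQ_HZ

def psci_intervals(ts):
    """Derive the three reported intervals from a dict mapping
    instrumentation point names to raw counter values (ticks)."""
    return {
        "PSCI_ENTRY": ticks_to_us(
            ts["RT_INSTR_ENTER_HW_LOW_PWR"] - ts["RT_INSTR_ENTER_PSCI"]),
        "PSCI_EXIT": ticks_to_us(
            ts["RT_INSTR_EXIT_PSCI"] - ts["RT_INSTR_EXIT_HW_LOW_PWR"]),
        "CFLUSH_OVERHEAD": ticks_to_us(
            ts["RT_INSTR_EXIT_CFLUSH"] - ts["RT_INSTR_ENTER_CFLUSH"]),
    }
```

At 50 MHz, one counter tick is 20 ns, so the microsecond values in the tables
below have a resolution comfortably finer than the ~1 us variance noted above.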
Results and Commentary
----------------------

``CPU_SUSPEND`` to deepest power level on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 27                  | 20                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 86                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 202                 | 58                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 375                 | 29                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 20                  | 22                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 290                 | 18                 | 206                      |
+-------+---------------------+--------------------+--------------------------+

A large variance in ``PSCI_ENTRY`` and ``PSCI_EXIT`` times across CPUs is
observed due to TF PSCI lock contention. In the worst case, CPU 3 has to wait
for the 3 other CPUs in the cluster (0-2) to complete ``PSCI_ENTRY`` and release
the lock before proceeding.

The ``CFLUSH_OVERHEAD`` times for CPUs 3 and 5 are higher because they are the
last CPUs in their respective clusters to power down, therefore both the L1 and
L2 caches are flushed.

The ``CFLUSH_OVERHEAD`` time for CPU 5 is a lot larger than that for CPU 3
because the L2 cache of the big cluster (2MB) is a lot larger than that of the
little cluster (1MB).

``CPU_SUSPEND`` to power level 0 on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 116                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 204                 | 14                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 287                 | 13                 | 8                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 376                 | 13                 | 9                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 29                  | 15                 | 7                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 15                 | 8                        |
+-------+---------------------+--------------------+--------------------------+

There is no lock contention in TF generic code at power level 0 but the large
variance in ``PSCI_ENTRY`` times across CPUs is due to lock contention in Juno
platform code. The platform lock is used to mediate access to a single SCP
communication channel. This is compounded by the SCP firmware waiting for each
AP CPU to enter WFI before making the channel available to other CPUs, which
effectively serializes the SCP power down commands from all CPUs.

On platforms with a more efficient CPU power down mechanism, it should be
possible to make the ``PSCI_ENTRY`` times smaller and consistent.

The ``PSCI_EXIT`` times are consistent across all CPUs because TF does not
require locks at power level 0.

The ``CFLUSH_OVERHEAD`` times for all CPUs are small and consistent since only
the cache associated with power level 0 is flushed (L1).

``CPU_SUSPEND`` to deepest power level on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 114                 | 20                 | 94                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 180                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 21                  | 17                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for lead CPU 4 and all CPUs in the non-lead
cluster are large because all other CPUs in the cluster are powered down during
the test. The ``CPU_SUSPEND`` call powers down to the cluster level, requiring a
flush of both L1 and L2 caches.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is a lot larger than that of
the little cluster (1MB).

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are low because lead
CPU 4 continues to run while CPU 5 is suspended. Hence CPU 5 only powers down to
level 0, which only requires an L1 cache flush.

``CPU_SUSPEND`` to power level 0 on all CPUs in sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 1     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 2     | 21                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 3     | 22                  | 14                 | 5                        |
+-------+---------------------+--------------------+--------------------------+
| 4     | 17                  | 14                 | 6                        |
+-------+---------------------+--------------------+--------------------------+
| 5     | 18                  | 15                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

Here the times are small and consistent since there is no contention and it is
only necessary to flush the cache to power level 0 (L1). This is the best case
scenario.

The ``PSCI_ENTRY`` times for CPUs in the big cluster are slightly smaller than
for the CPUs in the little cluster due to greater CPU performance.

The ``PSCI_EXIT`` times are generally lower than in the last test because the
cluster remains powered on throughout the test and there is less code to execute
on power on (for example, there is no need to enter CCI coherency).

``CPU_OFF`` on all non-lead CPUs in sequence then ``CPU_SUSPEND`` on lead CPU to deepest power level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The test sequence here is as follows:

1. Call ``CPU_ON`` and ``CPU_OFF`` on each non-lead CPU in sequence.

2. Program the wake up timer and suspend the lead CPU to the deepest power
   level.

3. Call ``CPU_ON`` on each non-lead CPU to get the timestamps from each CPU.

+-------+---------------------+--------------------+--------------------------+
| CPU   | ``PSCI_ENTRY`` (us) | ``PSCI_EXIT`` (us) | ``CFLUSH_OVERHEAD`` (us) |
+=======+=====================+====================+==========================+
| 0     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 1     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 2     | 110                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 3     | 111                 | 28                 | 93                       |
+-------+---------------------+--------------------+--------------------------+
| 4     | 195                 | 22                 | 181                      |
+-------+---------------------+--------------------+--------------------------+
| 5     | 20                  | 23                 | 6                        |
+-------+---------------------+--------------------+--------------------------+

The ``CFLUSH_OVERHEAD`` times for all little CPUs are large because all other
CPUs in that cluster are powered down during the test. The ``CPU_OFF`` call
powers down to the cluster level, requiring a flush of both L1 and L2 caches.

The ``PSCI_ENTRY`` and ``CFLUSH_OVERHEAD`` times for CPU 5 are small because
lead CPU 4 is running and CPU 5 only powers down to level 0, which only requires
an L1 cache flush.

The ``CFLUSH_OVERHEAD`` time for CPU 4 is a lot larger than those for the little
CPUs because the L2 cache of the big cluster (2MB) is a lot larger than that of
the little cluster (1MB).

The ``PSCI_EXIT`` times for CPUs in the big cluster are slightly smaller than
for CPUs in the little cluster due to greater CPU performance. These times are
generally greater than the ``PSCI_EXIT`` times in the ``CPU_SUSPEND`` tests
because there is more code to execute in the "on finisher" compared to the
"suspend finisher" (for example, GIC redistributor register programming).

``PSCI_VERSION`` on all CPUs in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since very little code is associated with ``PSCI_VERSION``, this test
approximates the round trip latency for handling a fast SMC at EL3 in TF.

+-------+-------------------+
| CPU   | TOTAL TIME (ns)   |
+=======+===================+
| 0     | 3020              |
+-------+-------------------+
| 1     | 2940              |
+-------+-------------------+
| 2     | 2980              |
+-------+-------------------+
| 3     | 3060              |
+-------+-------------------+
| 4     | 520               |
+-------+-------------------+
| 5     | 720               |
+-------+-------------------+

The times for the big CPUs are lower than those for the little CPUs due to
greater CPU performance.

We suspect the time for lead CPU 4 is shorter than that for CPU 5 due to subtle
cache effects, given that these measurements are at the nanosecond level.

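For scale, the 50 MHz generic counter resolves 20 ns per tick, so these
round-trip figures span only a few tens of ticks. A quick illustrative
calculation:

```python
TICK_NS = 1_000_000_000 // 50_000_000  # 20 ns per tick at 50 MHz

def ns_to_ticks(ns):
    """Express a nanosecond measurement in generic counter ticks."""
    return ns // TICK_NS

# The ~520 ns round trip on lead CPU 4 is only 26 counter ticks, so a
# single tick of jitter is close to 4% of the measurement; small cache
# effects are therefore easily visible at this resolution.
```
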
.. _Juno R1 platform: https://www.arm.com/files/pdf/Juno_r1_ARM_Dev_datasheet.pdf
.. _TF master as of 31/01/2017: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/?id=c38b36d