RAS support in Trusted Firmware-A
=================================

.. contents::
    :depth: 2

.. |EHF| replace:: Exception Handling Framework
.. |TF-A| replace:: Trusted Firmware-A

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extension enables the
firmware-first paradigm for handling platform errors: exceptions resulting from
errors are routed to and handled in EL3. These errors are Synchronous External
Aborts (SEA), Asynchronous External Aborts (signalled as SErrors), and Fault
Handling and Error Recovery interrupts. The |EHF| document mentions various
`error handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and its terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.

.. _ras-figure:

.. image:: ../draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: ../getting_started/porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

-  A handler to probe error records for errors;
-  When the probing identifies an error, a handler to handle it;
-  For a memory-mapped error record, its base address and size in KB; for a
   system register-accessed record, the start index of the record and the number
   of contiguous records from that index;
-  Any node-specific auxiliary data.

With this information supplied, when the run time firmware is notified through
one of these mechanisms, the RAS framework can iterate through the error
records, probe each of them for errors, and invoke the appropriate handler for
any error found.

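The probe-then-handle iteration described above can be sketched in isolation.
This is a minimal, self-contained illustration, not the |TF-A| implementation:
``struct demo_record``, ``demo_dispatch()`` and the other ``demo_*`` names are
hypothetical stand-ins, and the handler signature is simplified. Handling every
record that reports an error, rather than stopping at the first, is also an
assumption of the sketch.

.. code:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical, simplified stand-ins; not TF-A's real definitions. */
    struct demo_record;
    typedef int (*demo_probe_t)(const struct demo_record *rec, int *probe_data);
    typedef int (*demo_handler_t)(const struct demo_record *rec, int probe_data);

    struct demo_record {
        demo_probe_t probe;      /* Returns non-zero when an error is present */
        demo_handler_t handler;  /* Invoked only after a successful probe */
        const void *aux;         /* Node-specific auxiliary data */
    };

    static int demo_last_data = -1;

    /* A probe that never finds an error. */
    int demo_probe_clean(const struct demo_record *rec, int *probe_data)
    {
        (void)rec;
        (void)probe_data;
        return 0;
    }

    /* A probe that always reports an error in (made-up) record index 3. */
    int demo_probe_faulty(const struct demo_record *rec, int *probe_data)
    {
        (void)rec;
        *probe_data = 3;
        return 1;
    }

    /* A handler that only remembers what the probe passed to it. */
    int demo_handler(const struct demo_record *rec, int probe_data)
    {
        (void)rec;
        demo_last_data = probe_data;
        return 0;
    }

    int demo_last_probe_data(void)
    {
        return demo_last_data;
    }

    /* Probe every record; handle each one that reports an error.
     * Returns the number of records for which a handler was invoked. */
    int demo_dispatch(const struct demo_record *recs, size_t n)
    {
        int handled = 0;

        assert(recs != NULL || n == 0U);

        for (size_t i = 0; i < n; i++) {
            int probe_data = 0;

            if (recs[i].probe(&recs[i], &probe_data) == 0)
                continue;

            (void)recs[i].handler(&recs[i], probe_data);
            handled++;
        }

        return handled;
    }

The value the probe writes to ``probe_data`` reaches the handler unchanged, so
the handler does not need to rediscover which record raised the error.
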
The RAS framework provides macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. Each
macro creates a structure of type ``struct err_record_info`` from its
arguments; this structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from the probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
               int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
including the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform writing
its own probe handlers. Both helpers above:

-  Return a non-zero value when an error is detected in a Standard Error Record;
-  Set ``probe_data`` to the index of the error record upon detecting an error.

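To illustrate the contract these helpers follow, here is a minimal sketch of a
memory-mapped probe that runs against a simulated register file. It is not the
|TF-A| helper: ``demo_ser_probe()`` and the ``DEMO_*`` constants are
hypothetical. The layout used (64 bytes per record, status register at byte
offset ``0x10``, valid bit at bit 30) is stated here as an assumption and should
be checked against the Arm Architecture Reference Manual.

.. code:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumed layout: each record spans eight 64-bit words (64 bytes), with
     * the status register in word 2 (byte offset 0x10) and V at bit 30. */
    #define DEMO_WORDS_PER_RECORD   8U
    #define DEMO_STATUS_WORD        2U
    #define DEMO_STATUS_V           (UINT64_C(1) << 30)

    /* Scan num_records records starting at regs. On the first record whose
     * status has the valid bit set, store its index and return 1.
     * Return 0, leaving *probe_data untouched, when no record is valid. */
    int demo_ser_probe(const volatile uint64_t *regs, unsigned int num_records,
                       int *probe_data)
    {
        assert(regs != NULL);
        assert(probe_data != NULL);

        for (unsigned int i = 0; i < num_records; i++) {
            uint64_t status =
                regs[(i * DEMO_WORDS_PER_RECORD) + DEMO_STATUS_WORD];

            if ((status & DEMO_STATUS_V) == 0U)
                continue;

            *probe_data = (int)i;
            return 1;
        }

        return 0;
    }

As with the helpers above, the return value is non-zero only when an error is
found, and ``probe_data`` receives the index of the record that reported it.
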
Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with |EHF|. See `Interaction with
Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt``, containing:

-  Interrupt number;
-  The associated error record information (pointer to the corresponding
   ``struct err_record_info``);
-  Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.

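The reason for the sorting requirement can be seen in a short binary search
sketch. The structure and function names below (``struct demo_ras_interrupt``,
``demo_find_interrupt()``) are hypothetical and much simpler than the real
``struct ras_interrupt``; the interrupt numbers in the usage note are made up.

.. code:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in: an interrupt number plus a label standing in
     * for the pointer to the associated error record information. */
    struct demo_ras_interrupt {
        unsigned int intr_number;
        const char *record_name;
    };

    /* Binary search over a table sorted by increasing intr_number.
     * Returns the matching entry, or NULL if intr is not in the table. */
    const struct demo_ras_interrupt *demo_find_interrupt(
            const struct demo_ras_interrupt *tab, size_t n, unsigned int intr)
    {
        size_t lo = 0;
        size_t hi = n;    /* Search window is [lo, hi) */

        assert(tab != NULL || n == 0U);

        while (lo < hi) {
            size_t mid = lo + ((hi - lo) / 2U);

            if (tab[mid].intr_number == intr)
                return &tab[mid];

            if (tab[mid].intr_number < intr)
                lo = mid + 1U;
            else
                hi = mid;
        }

        return NULL;
    }

For a table holding interrupts 32, 40 and 57 in that order, a lookup of 40
inspects only the middle entry. If the table were unsorted, the same search
could miss entries that are present.
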
Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions part of Armv8.4 introduced new architectural features to deal
with Double Fault conditions, specifically the ``NMEA`` and ``EASE`` bits in the
``SCR_EL3`` register. These were introduced to assist EL3 software which runs
part of its entry/exit routines with exceptions momentarily masked. In such
systems, External Aborts and SErrors are not handled immediately when they
occur, but only after the exceptions are unmasked again.

|TF-A|, for legacy reasons, executes all of its EL3 code with all exceptions
unmasked. This means that all exceptions routed to EL3 are handled immediately.
|TF-A| is thus able to detect a Double Fault condition in software, without
needing the intended advantages of the Armv8.4 Double Fault architecture
extensions.

Double faults are fatal: handling terminates at the platform double fault
handler, which does not return.

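The idea of detecting a Double Fault in software can be sketched as a
re-entrancy check. This is only an illustration: the "handling in progress" flag
and every ``demo_*`` name are assumptions, and this document does not describe
the mechanism |TF-A| actually uses. So that the sketch can be exercised, the
stand-in double fault hook merely counts invocations, whereas a real platform
handler would not return.

.. code:: c

    #include <assert.h>
    #include <stdbool.h>

    static bool demo_handling_error;        /* Set while a handler is running */
    static unsigned int demo_double_faults;

    /* Stand-in for a platform double fault handler; a real one would shut
     * the system down and never return. */
    static void demo_double_fault(void)
    {
        demo_double_faults++;
    }

    unsigned int demo_double_fault_count(void)
    {
        return demo_double_faults;
    }

    /* Handle one error. When second_error_arrives is true, a further error
     * is signalled while this one is still being handled. */
    void demo_handle_error(bool second_error_arrives)
    {
        if (demo_handling_error) {
            /* A previous error is still being handled: Double Fault. */
            demo_double_fault();
            return;
        }

        demo_handling_error = true;

        if (second_error_arrives)
            demo_handle_error(false);

        assert(demo_handling_error);
        demo_handling_error = false;
    }

Because exceptions are taken immediately, the nested call models a second error
arriving before the first handler has finished, which is the condition the check
at the top of the function catches.
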
Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

-  ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

-  ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
   `Interaction with Exception Handling Framework`_;

-  ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
   EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler`` is responsible for iterating
through the platform-supplied error records, probing them, and, when an error is
identified, looking up and invoking the corresponding error handler.

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.

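A platform override following this rule might be shaped like the sketch below.
It is self-contained and therefore uses stand-ins: ``demo_ras_ea_handler()`` and
``demo_plat_ea_handler()`` are hypothetical, and their parameter lists are
assumptions modelled on the ``flags``, ``cookie`` and ``handle`` parameters
mentioned earlier. The convention that a non-zero return means "handled", the
use of syndrome bit 0 as a RAS marker, and the fallback counter are all
illustrative only.

.. code:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    static unsigned int demo_ras_calls;
    static unsigned int demo_unhandled;

    /* Stand-in for the top-level RAS handler. Assumed convention: returns
     * non-zero when the error was handled. For the sake of the sketch, any
     * syndrome with bit 0 set is treated as a RAS error. */
    int demo_ras_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                            void *cookie, void *handle, uint64_t flags)
    {
        (void)ea_reason;
        (void)cookie;
        (void)handle;
        (void)flags;

        demo_ras_calls++;
        return ((syndrome & 1U) != 0U) ? 1 : 0;
    }

    /* Platform override: give the RAS framework the first chance to handle
     * the External Abort, then fall back to platform-specific handling. */
    void demo_plat_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                              void *cookie, void *handle, uint64_t flags)
    {
        assert(handle != NULL);

        if (demo_ras_ea_handler(ea_reason, syndrome, cookie, handle,
                                flags) != 0)
            return;

        /* Not a RAS error: a real platform might log it and panic here. */
        demo_unhandled++;
    }

    unsigned int demo_ras_call_count(void)
    {
        return demo_ras_calls;
    }

    unsigned int demo_unhandled_count(void)
    {
        return demo_unhandled;
    }

The point of the sketch is the ordering: the override forwards every External
Abort to the RAS handler before applying its own policy, so registered error
records are still probed.
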
Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation. That is, for interrupts, the
priority management is implicit; for non-interrupt exceptions, it is explicit,
using `EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities

----

*Copyright (c) 2018, Arm Limited and Contributors. All rights reserved.*