Reliability, Availability, and Serviceability (RAS) Extensions
==============================================================

.. |EHF| replace:: Exception Handling Framework
.. |TF-A| replace:: Trusted Firmware-A

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extension enables the
firmware-first paradigm for handling platform errors: exceptions resulting from
errors are routed to and handled in EL3. Said errors are Synchronous External
Abort (SEA), Asynchronous External Abort (signalled as SErrors), and Fault
Handling and Error Recovery interrupts. The |EHF| document mentions various
`error handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.

.. _ras-figure:

.. image:: ../resources/diagrams/draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: ../getting_started/porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

-  A handler to probe error records for errors;
-  When the probing identifies an error, a handler to handle it;
-  For a memory-mapped error record, its base address and size in KB; for a
   system register-accessed record, the start index of the record and the
   number of contiguous records from that index;
-  Any node-specific auxiliary data.

With this information supplied, when the run time firmware is notified through
one of these mechanisms, the RAS framework can iterate through and probe error
records for errors, and invoke the appropriate handler to handle them.
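
The iteration described above can be pictured with the following minimal
sketch. It is not the |TF-A| implementation: ``struct demo_record``,
``demo_dispatch()`` and the two example callbacks are hypothetical stand-ins
with simplified signatures, chosen so that the example builds on its own
outside a |TF-A| tree.

.. code:: c

    #include <stddef.h>

    /* Hypothetical stand-in for the per-record information (not a TF-A type). */
    struct demo_record {
        /* Returns non-zero if the record reports an error. */
        int (*probe)(const struct demo_record *rec, int *probe_data);
        /* Handles the error found by the probe; returns 0 on success. */
        int (*handler)(const struct demo_record *rec, int probe_data);
        void *aux;    /* node-specific auxiliary data */
    };

    /*
     * Probe every record and invoke the handler of each record that reports
     * an error. Returns the number of errors handled, or -1 as soon as a
     * handler fails.
     */
    int demo_dispatch(const struct demo_record *recs, size_t count)
    {
        int handled = 0;

        for (size_t i = 0; i < count; i++) {
            int probe_data = 0;

            if (recs[i].probe(&recs[i], &probe_data) == 0)
                continue;    /* no error pending in this record */

            if (recs[i].handler(&recs[i], probe_data) != 0)
                return -1;

            handled++;
        }

        return handled;
    }

    /* Example callbacks: "aux" points to an int that is non-zero on error. */
    int demo_probe(const struct demo_record *rec, int *probe_data)
    {
        const int *pending = rec->aux;

        if (*pending == 0)
            return 0;

        *probe_data = *pending;    /* pass the error code to the handler */
        return 1;
    }

    int demo_handler(const struct demo_record *rec, int probe_data)
    {
        int *pending = rec->aux;

        (void)probe_data;
        *pending = 0;    /* model clearing the error */
        return 0;
    }

With two such records, of which only the second has an error pending,
``demo_dispatch()`` returns 1; a second call then returns 0 because the example
handler cleared the error.
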

The RAS framework provides macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. Each
macro creates a structure of type ``struct err_record_info`` from its arguments;
that structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from the probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
               int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
including the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform rolling
out its own. Both helpers above:

-  Return a non-zero value when an error is detected in a Standard Error
   Record;
-  Set ``probe_data`` to the index of the error record upon detecting an error.
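
The behaviour listed above can be modelled as follows. This is a sketch, not
the |TF-A| helper itself: the group of records is represented by a plain array
of ``ERR<n>STATUS`` values instead of real register accesses, and
``struct demo_ser_group`` and ``demo_ser_probe()`` are hypothetical names. The
only architectural detail it relies on is the ``V`` (valid) bit, bit 30 of
``ERR<n>STATUS``.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* ERR<n>STATUS.V (bit 30): the record holds a valid error. */
    #define DEMO_ERR_STATUS_V    (UINT64_C(1) << 30)

    /* Hypothetical stand-in: a record group modelled as an array in memory. */
    struct demo_ser_group {
        const uint64_t *status;    /* one ERR<n>STATUS value per record */
        size_t num_records;
    };

    /*
     * Scan the group for the first record whose V bit is set. On a hit,
     * store the record index in *probe_data and return 1; otherwise leave
     * *probe_data untouched and return 0.
     */
    int demo_ser_probe(const struct demo_ser_group *grp, int *probe_data)
    {
        for (size_t i = 0; i < grp->num_records; i++) {
            if ((grp->status[i] & DEMO_ERR_STATUS_V) != 0) {
                *probe_data = (int)i;
                return 1;
            }
        }

        return 0;
    }

A handler paired with such a probe can use ``probe_data`` to go straight to the
record that reported the error, rather than scanning the group again.
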

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with the |EHF|. See `Interaction
with Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt`` containing:

-  Interrupt number;
-  The associated error record information (pointer to the corresponding
   ``struct err_record_info``);
-  Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.
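
Keeping the array sorted makes a binary search possible. The sketch below uses
the C library ``bsearch()`` over a hypothetical ``struct demo_ras_interrupt``;
the structure layout and the function names are assumptions made for
illustration, and the framework's own lookup code may differ.

.. code:: c

    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for an interrupt entry (not the TF-A type). */
    struct demo_ras_interrupt {
        unsigned int intr_number;
        const void *err_record;    /* associated error record information */
        void *cookie;              /* optional, opaque to the lookup */
    };

    /* bsearch() comparator: the key is an interrupt number. */
    static int demo_cmp(const void *key, const void *elem)
    {
        unsigned int k = *(const unsigned int *)key;
        unsigned int n = ((const struct demo_ras_interrupt *)elem)->intr_number;

        return (k > n) - (k < n);
    }

    /*
     * Look up "intr" in a table sorted by increasing interrupt number.
     * Returns the matching entry, or NULL if the interrupt is not listed.
     */
    const struct demo_ras_interrupt *
    demo_find_interrupt(const struct demo_ras_interrupt *table, size_t count,
                        unsigned int intr)
    {
        return bsearch(&intr, table, count, sizeof(table[0]), demo_cmp);
    }

The lookup is only correct if the table really is sorted, which is why the
ordering requirement is placed on the platform.
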

Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions that are part of Armv8.4 introduced new architectural
features to deal with Double Fault conditions, specifically the ``NMEA`` and
``EASE`` bits in the ``SCR_EL3`` register. These were introduced to assist EL3
software which runs part of its entry/exit routines with exceptions momentarily
masked. In such systems, External Aborts/SErrors are not handled immediately
when they occur, but only after the exceptions are unmasked again.

|TF-A|, for legacy reasons, executes all EL3 code with all exceptions unmasked.
This means that all exceptions routed to EL3 are handled immediately. |TF-A| is
thus able to detect Double Fault conditions in software, without needing the
intended advantages of the Armv8.4 Double Fault architecture extensions.

Double faults are fatal: they terminate at the platform double fault handler,
which does not return.

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

-  ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

-  ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
   `Interaction with Exception Handling Framework`_;

-  ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
   EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler`` is responsible for iterating
through the platform-supplied error records, probing them, and, when an error is
identified, looking up and invoking the corresponding error handler.

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.
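
The shape of such an override is sketched below. To keep the example buildable
outside a |TF-A| tree, ``demo_ras_ea_handler()`` is a stub standing in for
``ras_ea_handler()``, and ``demo_plat_ea_handler()`` stands in for the
platform's ``plat_ea_handler``. The parameter list and the "non-zero means
handled" return convention are assumptions to be checked against the porting
guide, and a real platform would typically report and panic where the sketch
merely counts the unhandled abort.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    int demo_ras_result = 1;        /* what the stub reports; non-zero = handled */
    unsigned int demo_unhandled;    /* aborts that nothing handled */

    /* Stub standing in for the framework's top-level RAS handler. */
    int demo_ras_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                            void *cookie, void *handle, uint64_t flags)
    {
        (void)ea_reason;
        (void)syndrome;
        (void)cookie;
        (void)handle;
        (void)flags;
        return demo_ras_result;
    }

    /* Hypothetical platform override of the External Abort handler. */
    void demo_plat_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                              void *cookie, void *handle, uint64_t flags)
    {
        /* Give the RAS framework the first chance to handle the abort. */
        if (demo_ras_ea_handler(ea_reason, syndrome, cookie, handle,
                                flags) != 0)
            return;

        /* Platform-specific fallback for aborts that RAS did not handle. */
        demo_unhandled++;
    }

The point of the pattern is that the override forwards all of its arguments
unchanged, so the registered error handlers receive the same information they
would get from the default implementation.
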

Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation. That is, for interrupts, the
priority management is implicit; for non-interrupt exceptions, it is explicit,
using `EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities

----

*Copyright (c) 2018, Arm Limited and Contributors. All rights reserved.*