Reliability, Availability, and Serviceability (RAS) Extensions
==============================================================

.. contents::
    :depth: 2

.. |EHF| replace:: Exception Handling Framework
.. |TF-A| replace:: Trusted Firmware-A

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extensions enables the
firmware-first paradigm for handling platform errors: exceptions resulting from
errors are routed to and handled in EL3. Said errors are Synchronous External
Aborts (SEA), Asynchronous External Aborts (signalled as SErrors), and Fault
Handling and Error Recovery interrupts. The |EHF| document mentions various
`error handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.

.. _ras-figure:

.. image:: ../draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: ../getting_started/porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

-  A handler to probe error records for errors;
-  When the probing identifies an error, a handler to handle it;
-  For a memory-mapped error record, its base address and size in KB; for a
   system register-accessed record, the start index of the record and the
   number of contiguous records from that index;
-  Any node-specific auxiliary data.

With this information supplied, when the run time firmware is notified through
one of these mechanisms, the RAS framework can iterate through and probe the
error records for errors, and invoke the appropriate handler to handle each one.
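
The probe-then-handle iteration described above can be sketched in isolation.
This is a minimal, self-contained model and not the framework's implementation:
the ``struct err_record_info`` below is a stand-in with hypothetical fields, the
handler is simplified to two arguments, and ``dispatch_errors``,
``sample_probe`` and ``sample_handler`` are illustrative names.

.. code:: c

    #include <stddef.h>

    /* Stand-in for illustration only; not the TF-A definition. */
    struct err_record_info;

    typedef int (*probe_fn_t)(const struct err_record_info *info,
                    int *probe_data);
    typedef int (*handler_fn_t)(const struct err_record_info *info,
                    int probe_data);

    struct err_record_info {
        probe_fn_t probe;
        handler_fn_t handler;
        const void *aux_data;   /* node-specific auxiliary data */
    };

    /*
     * Probe every record in turn. For each record whose probe reports an
     * error, invoke that record's handler. Returns the number of errors
     * that were handled successfully (handler returned 0).
     */
    static int dispatch_errors(const struct err_record_info *records,
                    size_t count)
    {
        int handled = 0;

        for (size_t i = 0; i < count; i++) {
            const struct err_record_info *info = &records[i];
            int probe_data = 0;

            /* A probe returning 0 means "no error on this record". */
            if (info->probe == NULL || info->probe(info, &probe_data) == 0)
                continue;

            if (info->handler != NULL && info->handler(info, probe_data) == 0)
                handled++;
        }

        return handled;
    }

    /* Sample probe: aux_data points to an int that is non-zero on error. */
    static int sample_probe(const struct err_record_info *info, int *probe_data)
    {
        const int *pending = info->aux_data;

        if (pending == NULL || *pending == 0)
            return 0;

        *probe_data = *pending;
        return 1;
    }

    /* Sample handler: accepts every error it is given. */
    static int sample_handler(const struct err_record_info *info, int probe_data)
    {
        (void)info;
        (void)probe_data;
        return 0;
    }

The key property is that the framework itself knows nothing about the node: it
only relies on the probe's return value to decide whether the handler runs.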

The RAS framework provides macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. Each
macro creates a structure of type ``struct err_record_info`` from its
arguments; that structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from the probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_
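
As an illustration of the probe contract, the sketch below scans a range of
simulated status registers and reports the index of the first record that holds
a valid error. It is a self-contained model under stated assumptions: the
``struct err_record_info`` here is a hypothetical stand-in whose ``status``
array replaces real register accesses, and ``example_probe`` is an illustrative
name. Bit 30 is used as the valid (``V``) bit, as in the Standard Error Record
status register.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* ERR<n>STATUS.V: the record holds a valid error. */
    #define STATUS_VALID_BIT    (UINT64_C(1) << 30)

    /* Hypothetical stand-in; the real structure is defined by TF-A. */
    struct err_record_info {
        const uint64_t *status;   /* simulated ERR<n>STATUS values */
        unsigned int idx_start;   /* first record index to probe */
        unsigned int num_idx;     /* number of contiguous records */
    };

    /*
     * Return non-zero and store the record index in *probe_data when an
     * error is found; return 0 and leave *probe_data untouched otherwise.
     */
    static int example_probe(const struct err_record_info *info,
                    int *probe_data)
    {
        for (unsigned int i = 0; i < info->num_idx; i++) {
            unsigned int idx = info->idx_start + i;

            if ((info->status[idx] & STATUS_VALID_BIT) != 0U) {
                *probe_data = (int)idx;
                return 1;
            }
        }

        return 0;
    }

Passing the index through ``probe_data`` saves the error handler from repeating
the scan to find out which record raised the error.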

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
               int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
including the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform writing
its own probe handlers. Both helpers above:

-  Return a non-zero value when an error is detected in a Standard Error Record;
-  Set ``probe_data`` to the index of the error record upon detecting an error.

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with |EHF|. See `Interaction with
Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt`` containing:

-  Interrupt number;
-  The associated error record information (pointer to the corresponding
   ``struct err_record_info``);
-  Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.
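
The sorting requirement exists so that the lookup can be a binary search. The
sketch below shows the idea on a self-contained model; it is not the
framework's code. ``struct ras_interrupt_entry`` and ``find_interrupt`` are
hypothetical names, and the ``name`` field merely stands in for the pointer to
the error record information.

.. code:: c

    #include <stddef.h>

    /* Hypothetical stand-in for one entry of the interrupt array. */
    struct ras_interrupt_entry {
        unsigned int intr_number;
        const char *name;   /* placeholder for the error record pointer */
    };

    /*
     * Binary search over a table sorted by increasing intr_number.
     * Returns the matching entry, or NULL if the interrupt is not listed.
     */
    static const struct ras_interrupt_entry *find_interrupt(
                    const struct ras_interrupt_entry *table, size_t count,
                    unsigned int intr)
    {
        size_t lo = 0;
        size_t hi = count;   /* search the half-open range [lo, hi) */

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;

            if (table[mid].intr_number == intr)
                return &table[mid];

            if (table[mid].intr_number < intr)
                lo = mid + 1;
            else
                hi = mid;
        }

        return NULL;
    }

If the table were not sorted, this search could miss an entry that is present,
which is why the ordering is a requirement on the platform rather than a
recommendation.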

Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions part of Armv8.4 introduced new architectural features to deal
with Double Fault conditions, specifically, the introduction of ``NMEA`` and
``EASE`` bits to the ``SCR_EL3`` register. These were introduced to assist EL3
software which runs part of its entry/exit routines with exceptions momentarily
masked. In such systems, External Aborts/SErrors are not handled immediately
when they occur, but only after the exceptions are unmasked again.

|TF-A|, for legacy reasons, executes entire EL3 with all exceptions unmasked.
This means that all exceptions routed to EL3 are handled immediately. |TF-A|
is thus able to detect Double Fault conditions in software, without needing
the intended advantages of the Armv8.4 Double Fault architecture extensions.

Double faults are fatal: they terminate at the platform double fault handler,
which does not return.
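
The general idea of detecting a Double Fault in software can be sketched as
follows. This is only a conceptual model and makes no claim about how |TF-A|
tracks the condition internally: the flag, the counter and all function names
below are hypothetical, and the stand-in double fault handler returns so that
the example can be exercised, whereas a real one would not return.

.. code:: c

    #include <stdbool.h>

    /* Hypothetical state: set while an error is being handled. */
    static bool error_in_progress;
    static unsigned int double_faults_seen;

    typedef void (*error_handler_t)(void);

    /*
     * Stand-in for the platform double fault handler. A real handler
     * would shut the system down and never return; this one only counts.
     */
    static void example_double_fault_handler(void)
    {
        double_faults_seen++;
    }

    /*
     * Entry point for every error. If another error is already being
     * handled, this is a Double Fault and the normal handler is skipped.
     */
    static void example_error_entry(error_handler_t handle)
    {
        if (error_in_progress) {
            example_double_fault_handler();
            return;
        }

        error_in_progress = true;
        handle();
        error_in_progress = false;
    }

    /* Handler that completes without incident. */
    static void benign_handler(void)
    {
    }

    /* Handler during which a second error is signalled. */
    static void faulting_handler(void)
    {
        example_error_entry(benign_handler);
    }

Such detection only works because the second error is taken while the first is
still being handled, which is the situation the preceding paragraphs describe.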

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

-  ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

-  ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
   `Interaction with Exception Handling Framework`_;

-  ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
   EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler`` is responsible for iterating
through the platform-supplied error records, probing them, and, when an error
is identified, looking up and invoking the corresponding error handler.

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.

Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation. That is, for interrupts, the
priority management is implicit; for non-interrupt exceptions, it is explicit,
using `EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities

----

*Copyright (c) 2018, Arm Limited and Contributors. All rights reserved.*