Reliability, Availability, and Serviceability (RAS) Extensions
==============================================================

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extension enables the
firmware-first paradigm for handling platform errors: exceptions resulting from
errors are routed to and handled in EL3. These errors are Synchronous External
Aborts (SEAs), Asynchronous External Aborts (signalled as SErrors), and Fault
Handling and Error Recovery interrupts. The |EHF| document mentions various
`error handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.

.. _ras-figure:

.. image:: ../resources/diagrams/draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Aborts,
Uncontainable Errors, Double Faults, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: ../getting_started/porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

-  A handler to probe error records for errors;
-  When the probing identifies an error, a handler to handle it;
-  For a memory-mapped error record, its base address and size in KB; for a
   system register-accessed record, the start index of the record and the
   number of contiguous records from that index;
-  Any node-specific auxiliary data.

With this information supplied, when the run time firmware receives a
notification through one of these mechanisms, the RAS framework can iterate
through and probe error records for errors, and invoke the appropriate handler
to handle them.

The RAS framework provides macros to populate error record information. The
macros are versioned, and the latest version as of this writing is 1. These
macros create a structure of type ``struct err_record_info`` from their
arguments; this structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from the probe to the error handler (see `below`__). For
example, it could carry the index of the record.

.. __: `Standard Error Record helpers`_
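
As an illustration of the probe contract, the sketch below scans a bank of
memory-mapped records and reports the first one whose status register has its
valid bit set. It is a simplified host-side model, not |TF-A| code:
``struct err_record_info`` is reduced to two hypothetical fields, and the
record layout is assumed from the Standard Error Record format. That layout
uses 64-byte records, with ``ERR<n>STATUS`` at offset 0x10 and V at bit 30.
Check it against the Arm ARM before relying on it.

.. code:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumed Standard Error Record layout: 64 bytes per record, with
     * ERR<n>STATUS at offset 0x10 and its V (valid) bit at position 30. */
    #define SER_RECORD_SIZE     64U
    #define SER_STATUS_OFFSET   0x10U
    #define SER_STATUS_V        (UINT64_C(1) << 30)

    /* Hypothetical, simplified stand-in for struct err_record_info. */
    struct err_record_info {
        uintptr_t base_addr;      /* start of the memory-mapped records */
        unsigned int num_records; /* number of records in the group */
    };

    static uint64_t read_status(uintptr_t base, unsigned int idx)
    {
        const volatile uint64_t *status = (const volatile uint64_t *)
            (base + (uintptr_t)idx * SER_RECORD_SIZE + SER_STATUS_OFFSET);

        return *status;
    }

    /* Returns non-zero if a record holds a valid error, storing that
     * record's index in *probe_data for the error handler. */
    static int example_probe_memmap(const struct err_record_info *info,
                                    int *probe_data)
    {
        assert((info != NULL) && (probe_data != NULL));

        for (unsigned int i = 0U; i < info->num_records; i++) {
            if ((read_status(info->base_addr, i) & SER_STATUS_V) != 0U) {
                *probe_data = (int)i;
                return 1;
            }
        }
        return 0;
    }

On hardware ``base_addr`` would be a device address. For a quick host-side
check, it can point at an ordinary ``uint64_t`` array standing in for the
records.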

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
               int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
including the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts
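
A matching error handler typically consumes ``probe_data`` and clears the error
so that the record is not reported again. The sketch below models this with a
simulated status array. Several details are assumptions made for illustration,
not the |TF-A| definitions:

-  the write-one-to-clear behaviour;
-  the return value of 0 for success;
-  the fields of the simplified ``struct err_handler_data``.

.. code:: c

    #include <assert.h>
    #include <stdint.h>

    #define SIM_NUM_RECORDS 4U

    /* Simulated ERR<n>STATUS registers standing in for real hardware. */
    static uint64_t sim_status[SIM_NUM_RECORDS];

    /* Models write-one-to-clear: bits written as 1 are cleared. */
    static void write_status_w1c(unsigned int idx, uint64_t val)
    {
        sim_status[idx] &= ~val;
    }

    /* Hypothetical, simplified stand-ins for the TF-A structures. */
    struct err_record_info {
        unsigned int num_records;
    };

    struct err_handler_data {
        unsigned int ea_reason;
        uint64_t syndrome;
    };

    static int example_err_handler(const struct err_record_info *info,
                                   int probe_data,
                                   const struct err_handler_data *const data)
    {
        assert((info != NULL) && (data != NULL));
        assert((probe_data >= 0) &&
               ((unsigned int)probe_data < info->num_records));

        uint64_t status = sim_status[probe_data];

        /* A real handler would log or act on status and data->syndrome
         * here. Writing back the value read clears the reported error. */
        write_status_w1c((unsigned int)probe_data, status);
        return 0;
    }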

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.
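
The same-file requirement follows naturally if the registration macro computes
the array length with ``sizeof``. That only works where the complete array type
is visible. The sketch below shows this pattern using a hypothetical
``REGISTER_ERR_RECORD_INFO_SKETCH()`` macro and simplified types. It mirrors the
idea and is not the |TF-A| macro itself.

.. code:: c

    #include <assert.h>
    #include <stddef.h>

    #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

    struct err_record_info;

    typedef int (*probe_fn)(const struct err_record_info *info, int *probe_data);

    /* Hypothetical, simplified record descriptor. */
    struct err_record_info {
        unsigned int idx_start;
        unsigned int num_idx;
        probe_fn probe;
    };

    /* Hypothetical mapping that the framework would iterate over. */
    struct err_record_mapping_sketch {
        const struct err_record_info *err_records;
        size_t num_err_records;
    };

    /* ARRAY_SIZE() needs the complete array type, so this macro only
     * works in the file that defines the array. */
    #define REGISTER_ERR_RECORD_INFO_SKETCH(_records)                      \
        const struct err_record_mapping_sketch plat_err_record_mapping = { \
            .err_records = (_records),                                     \
            .num_err_records = ARRAY_SIZE(_records),                       \
        }

    static int probe_nothing(const struct err_record_info *info, int *probe_data)
    {
        (void)info;
        (void)probe_data;
        return 0; /* placeholder: never reports an error */
    }

    static const struct err_record_info plat_err_records[] = {
        { .idx_start = 0U, .num_idx = 2U, .probe = probe_nothing },
        { .idx_start = 4U, .num_idx = 1U, .probe = probe_nothing },
    };

    REGISTER_ERR_RECORD_INFO_SKETCH(plat_err_records);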

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                int *probe_data);

When the platform enumerates error records, it may use these helpers for
records in the Standard Error Record format instead of providing its own probe
handlers. Both helpers above:

-  Return a non-zero value when an error is detected in a Standard Error Record;
-  Set ``probe_data`` to the index of the error record upon detecting an error.
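
For system register-accessed records, the probe amounts to selecting each record
in turn and checking its status. The sketch below models that loop on the host.
It makes the following simplifications:

-  simulated variables replace the ``ERRSELR_EL1`` and ``ERXSTATUS_EL1``
   accesses;
-  the V bit position (30) is assumed from the Standard Error Record format;
-  it reports the absolute record index.

It illustrates the behaviour listed above and is not the |TF-A| implementation
of ``ras_err_ser_probe_sysreg()``.

.. code:: c

    #include <assert.h>
    #include <stdint.h>

    #define SER_STATUS_V    (UINT64_C(1) << 30) /* assumed V bit position */
    #define SIM_NUM_RECORDS 8U

    /* Simulated ERRSELR_EL1 and per-record ERXSTATUS_EL1 values; real code
     * would access the System registers with MSR/MRS instead. */
    static uint64_t sim_errselr;
    static uint64_t sim_erx_status[SIM_NUM_RECORDS];

    static void write_errselr(uint64_t idx)
    {
        assert(idx < SIM_NUM_RECORDS);
        sim_errselr = idx; /* on hardware: write ERRSELR_EL1, then an ISB */
    }

    static uint64_t read_erxstatus(void)
    {
        return sim_erx_status[sim_errselr];
    }

    /* Hypothetical, simplified stand-in for struct err_record_info. */
    struct err_record_info {
        unsigned int idx_start; /* first record index */
        unsigned int num_idx;   /* number of contiguous records */
    };

    static int example_probe_sysreg(const struct err_record_info *info,
                                    int *probe_data)
    {
        assert((info != NULL) && (probe_data != NULL));

        for (unsigned int i = 0U; i < info->num_idx; i++) {
            unsigned int idx = info->idx_start + i;

            write_errselr(idx);
            if ((read_erxstatus() & SER_STATUS_V) != 0U) {
                *probe_data = (int)idx; /* absolute record index */
                return 1;
            }
        }
        return 0;
    }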

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with the |EHF|. See `Interaction
with Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt`` containing:

-  Interrupt number;
-  The associated error record information (pointer to the corresponding
   ``struct err_record_info``);
-  Optionally, a cookie.

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.
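
Because the lookup relies on this ordering, a platform may find it useful to
check it once at start-up. The sketch below is a hypothetical helper over a
simplified ``struct ras_interrupt`` whose field names are assumptions. It also
rejects duplicate interrupt numbers, on the assumption that each interrupt is
registered only once.

.. code:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct err_record_info; /* opaque in this sketch */

    /* Hypothetical, simplified view of struct ras_interrupt. */
    struct ras_interrupt {
        unsigned int intr_number;
        const struct err_record_info *err_record;
        void *cookie;
    };

    /* True if interrupt numbers are strictly increasing (no duplicates). */
    static bool ras_interrupts_sorted(const struct ras_interrupt *intrs,
                                      size_t num)
    {
        for (size_t i = 1U; i < num; i++) {
            if (intrs[i - 1U].intr_number >= intrs[i].intr_number) {
                return false;
            }
        }
        return true;
    }

    /* Intended to be called once, e.g. from platform setup code. */
    static void ras_interrupts_check(const struct ras_interrupt *intrs,
                                     size_t num)
    {
        assert(ras_interrupts_sorted(intrs, num));
    }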

Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions in Armv8.4 introduced new architectural features to deal
with Double Fault conditions, specifically the ``NMEA`` and ``EASE`` bits in the
``SCR_EL3`` register. These were introduced to assist EL3 software that runs
part of its entry/exit routines with exceptions momentarily masked. In such
systems, External Aborts/SErrors are not handled immediately when they occur,
but only after exceptions are unmasked again.

|TF-A|, for legacy reasons, executes the entirety of EL3 with all exceptions
unmasked. This means that all exceptions routed to EL3 are handled immediately.
|TF-A| is thus able to detect Double Fault conditions in software, without
needing the Armv8.4 Double Fault architecture extensions.

Double faults are fatal: they terminate in the platform double fault handler,
which does not return.
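
Since all of EL3 runs with exceptions unmasked, software can detect a Double
Fault by noting that an error arrives while a previous one is still being
handled. The sketch below models this with a single flag for one PE. The names
are hypothetical, and the flag stands in for whatever per-PE state |TF-A|
actually keeps. The double fault handler returns only so that the sketch can be
exercised on a host.

.. code:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical state for a single PE; real firmware keeps such
     * bookkeeping per PE. */
    static bool ea_in_progress;
    static int double_faults_seen; /* host-side stand-in for a shutdown */

    static void example_double_fault_handler(void)
    {
        /* A real handler shuts the system down and never returns; this
         * one only counts so that the sketch can run on a host. */
        double_faults_seen++;
    }

    static void example_handle_ea(void (*body)(void))
    {
        assert(body != NULL);

        if (ea_in_progress) {
            /* A new error arrived while the previous one is in flight. */
            example_double_fault_handler();
            return;
        }
        ea_in_progress = true;
        body();
        ea_in_progress = false;
    }

    static void handle_normally(void)
    {
    }

    /* Simulates a second error being signalled mid-handling. */
    static void nested_error(void)
    {
        example_handle_ea(handle_normally);
    }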

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

-  ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

-  ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
   `Interaction with Exception Handling Framework`_;

-  ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
   EL3.
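
The dependency between the three options can also be expressed as a
compile-time check. The sketch below uses the build option names from this
section with example default values. The guard itself is a hypothetical
illustration of the constraint, not a copy of the checks in the |TF-A| build
system. In practice the values would typically be passed on the ``make``
command line.

.. code:: c

    #include <assert.h>

    /* Example values; a build would normally pass these with -D. */
    #ifndef RAS_EXTENSION
    #define RAS_EXTENSION 1
    #endif
    #ifndef EL3_EXCEPTION_HANDLING
    #define EL3_EXCEPTION_HANDLING 1
    #endif
    #ifndef HANDLE_EA_EL3_FIRST
    #define HANDLE_EA_EL3_FIRST 1
    #endif

    /* Hypothetical guard: RAS support needs the other two options. */
    #if RAS_EXTENSION && !(EL3_EXCEPTION_HANDLING && HANDLE_EA_EL3_FIRST)
    #error "RAS_EXTENSION=1 requires EL3_EXCEPTION_HANDLING=1 and HANDLE_EA_EL3_FIRST=1"
    #endif

    static int ras_framework_enabled(void)
    {
        return RAS_EXTENSION;
    }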

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler()`` is responsible for
iterating through the platform-supplied error records, probing them, and, when
an error is identified, looking up and invoking the corresponding error handler.
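
The iteration described above can be sketched as follows. The types are
simplified, hypothetical versions of the |TF-A| ones, and a per-group
pending-error counter simulates the probe/handler pair. The sketch keeps
probing a record group until it reports no error, so that several pending
errors in one group are all handled. This is a design choice of the sketch,
and it relies on each handler clearing the error it handled.

.. code:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    struct err_record_info;

    /* Hypothetical, simplified handler data. */
    struct err_handler_data {
        unsigned int ea_reason;
        uint64_t syndrome;
    };

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                                      int *probe_data);
    typedef int (*err_record_handler_t)(const struct err_record_info *info,
            int probe_data, const struct err_handler_data *const data);

    /* Hypothetical, simplified record descriptor; aux identifies the group. */
    struct err_record_info {
        err_record_probe_t probe;
        err_record_handler_t handler;
        int aux;
    };

    /* Probes every record group and dispatches each detected error.
     * Returns the number of errors handled; 0 means none was found. */
    static int example_ras_ea_handler(const struct err_record_info *records,
                                      size_t num_records,
                                      unsigned int ea_reason,
                                      uint64_t syndrome)
    {
        const struct err_handler_data data = { ea_reason, syndrome };
        int handled = 0;

        assert((records != NULL) || (num_records == 0U));
        for (size_t i = 0U; i < num_records; i++) {
            const struct err_record_info *info = &records[i];
            int probe_data = 0;

            /* Keep probing this group until it reports no error. */
            while (info->probe(info, &probe_data) != 0) {
                if (info->handler(info, probe_data, &data) != 0) {
                    break; /* handler failed: stop to avoid looping */
                }
                handled++;
            }
        }
        return handled;
    }

    /* Simulated record groups: a count of pending errors per group. */
    static int sim_pending[2];

    static int sim_probe(const struct err_record_info *info, int *probe_data)
    {
        *probe_data = info->aux;
        return sim_pending[info->aux] > 0;
    }

    static int sim_handler(const struct err_record_info *info, int probe_data,
                           const struct err_handler_data *const data)
    {
        (void)info;
        (void)data;
        sim_pending[probe_data]--; /* "clear" one pending error */
        return 0;
    }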

Note that if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within it.
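
A platform override that keeps the RAS framework engaged could look like the
sketch below. Some details are assumptions here:

-  the parameter lists of ``plat_ea_handler()`` and ``ras_ea_handler()``;
-  the convention that a non-zero return means the error was handled.

Check them against the porting guide and headers of the |TF-A| version in use.
The sketch stubs ``ras_ea_handler()`` so that it compiles on its own, and a
counter replaces the report-and-panic path.

.. code:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Stub standing in for the framework's ras_ea_handler(), which would
     * probe the registered error records. Non-zero means handled; this
     * stub simply treats a non-zero syndrome as a handled error. */
    static int ras_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                              void *cookie, void *handle, uint64_t flags)
    {
        (void)ea_reason;
        (void)cookie;
        (void)handle;
        (void)flags;
        return syndrome != 0U;
    }

    static int unhandled_eas; /* host-side stand-in for report and panic */

    void plat_ea_handler(unsigned int ea_reason, uint64_t syndrome,
                         void *cookie, void *handle, uint64_t flags)
    {
        /* Platform-specific pre-processing could go here. */

        if (ras_ea_handler(ea_reason, syndrome, cookie, handle, flags) != 0) {
            return; /* claimed by the RAS framework */
        }

        /* Not handled: a real platform would report the error and panic. */
        unhandled_eas++;
    }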

Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.
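
The lookup can be sketched as a standard binary search over the sorted array.
The simplified ``struct ras_interrupt`` and the function name are hypothetical.
The point is that ordering by interrupt number makes each lookup O(log n) in the
number of registered interrupts.

.. code:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical, simplified stand-ins for the TF-A types. */
    struct err_record_info {
        const char *name;
    };

    struct ras_interrupt {
        unsigned int intr_number;
        const struct err_record_info *err_record;
        void *cookie;
    };

    /* Binary search over an array sorted by ascending intr_number.
     * Returns NULL if intr is not a registered RAS interrupt. */
    static const struct ras_interrupt *
    example_find_ras_interrupt(const struct ras_interrupt *intrs, size_t num,
                               unsigned int intr)
    {
        size_t lo = 0U;
        size_t hi = num; /* search window is [lo, hi) */

        assert((intrs != NULL) || (num == 0U));
        while (lo < hi) {
            size_t mid = lo + ((hi - lo) / 2U);

            if (intrs[mid].intr_number == intr) {
                return &intrs[mid];
            }
            if (intrs[mid].intr_number < intr) {
                lo = mid + 1U;
            } else {
                hi = mid;
            }
        }
        return NULL;
    }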

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation. That is, for interrupts, priority
management is implicit; for non-interrupt exceptions, it is explicit, using
`EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities
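
For the non-interrupt flow, explicit priority management means bracketing the
handling with ``ehf_activate_priority()`` and ``ehf_deactivate_priority()``. In
the sketch below those two calls are local stubs that only track a stack of
running priorities. The real EHF functions do more, such as adjusting the GIC
priority mask. The ``PLAT_RAS_PRI`` value is an arbitrary example, and lower
numerical values denote higher priorities, as with GIC priorities.

.. code:: c

    #include <assert.h>

    #define PLAT_RAS_PRI 0x10U /* example value; platform-defined */
    #define IDLE_PRI     0xFFU /* example "nothing active" value */
    #define MAX_NESTING  4U

    /* Local stubs named after the EHF calls. They only track a stack of
     * running priorities; lower values mean higher priority. */
    static unsigned int active_pri = IDLE_PRI;
    static unsigned int saved_pri[MAX_NESTING];
    static unsigned int depth;

    static void ehf_activate_priority(unsigned int pri)
    {
        assert((pri < active_pri) && (depth < MAX_NESTING));
        saved_pri[depth++] = active_pri;
        active_pri = pri;
    }

    static void ehf_deactivate_priority(unsigned int pri)
    {
        assert((depth > 0U) && (pri == active_pri));
        active_pri = saved_pri[--depth];
    }

    /* Non-interrupt flow: the RAS priority is activated explicitly
     * around the handling and deactivated afterwards. */
    static int example_handle_ras_ea(int (*handle)(void))
    {
        int ret;

        ehf_activate_priority(PLAT_RAS_PRI);
        ret = handle();
        ehf_deactivate_priority(PLAT_RAS_PRI);
        return ret;
    }

    static unsigned int observed_pri;

    static int record_priority(void)
    {
        observed_pri = active_pri;
        return 0;
    }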

--------------

*Copyright (c) 2018-2019, Arm Limited and Contributors. All rights reserved.*