RAS support in Trusted Firmware-A
=================================

.. contents::
    :depth: 2

.. |EHF| replace:: Exception Handling Framework
.. |TF-A| replace:: Trusted Firmware-A

This document describes |TF-A| support for Arm Reliability, Availability, and
Serviceability (RAS) extensions. RAS is a mandatory extension for Armv8.2 and
later CPUs, and also an optional extension to the base Armv8.0 architecture.

In conjunction with the |EHF|, support for the RAS extensions enables a
firmware-first paradigm for handling platform errors: exceptions resulting from
errors are routed to and handled in EL3. Said errors are Synchronous External
Abort (SEA), Asynchronous External Abort (signalled as SErrors), Fault Handling
and Error Recovery interrupts. The |EHF| document mentions various `error
handling use-cases`__.

.. __: exception-handling.rst#delegation-use-cases

For the description of Arm RAS extensions, Standard Error Records, and the
precise definition of RAS terminology, please refer to the Arm Architecture
Reference Manual. The rest of this document assumes familiarity with the
architecture and its terminology.

Overview
--------

As mentioned above, the RAS support in |TF-A| enables routing to and handling of
exceptions resulting from platform errors in EL3. It allows the platform to
define an External Abort handler, and to register RAS nodes and interrupts. The
RAS framework also provides `helpers`__ for accessing Standard Error Records as
introduced by the RAS extensions.

.. __: `Standard Error Record helpers`_

The build option ``RAS_EXTENSION``, when set to ``1``, includes the RAS
framework in the run time firmware; ``EL3_EXCEPTION_HANDLING`` and
``HANDLE_EA_EL3_FIRST`` must also be set to ``1``.

.. _ras-figure:

.. image:: ../draw.io/ras.svg

See more on `Engaging the RAS framework`_.

Platform APIs
-------------

The RAS framework allows the platform to define handlers for External Abort,
Uncontainable Errors, Double Fault, and errors arising from EL3 execution.
Please refer to the porting guide for the `RAS platform API descriptions`__.

.. __: ../getting_started/porting-guide.rst#external-abort-handling-and-ras-support

Registering RAS error records
-----------------------------

RAS nodes are components in the system capable of signalling errors to PEs
through one of the notification mechanisms: SEAs, SErrors, or interrupts. RAS
nodes contain one or more error records, which are registers through which the
nodes advertise various properties of the signalled error. Arm recommends that
error records are implemented in the Standard Error Record format. The RAS
architecture allows for error records to be accessible via system or
memory-mapped registers.

The platform should enumerate the error records, providing for each of them:

- A handler to probe error records for errors;
- When the probing identifies an error, a handler to handle it;
- For a memory-mapped error record, its base address and size in KB; for a
  system register-accessed record, the start index of the record and the number
  of contiguous records from that index;
- Any node-specific auxiliary data.

With this information supplied, when the run time firmware is notified through
one of these mechanisms, the RAS framework can iterate through and probe error
records for errors, and invoke the appropriate handler to handle them.

The RAS framework provides the macros to populate error record information.
The macros are versioned, and the latest version as of this writing is 1. Each
macro creates a structure of type ``struct err_record_info`` from its arguments;
that structure is later passed to the probe and error handlers.

For memory-mapped error records:

.. code:: c

    ERR_RECORD_MEMMAP_V1(base_addr, size_num_k, probe, handler, aux)

And, for system register ones:

.. code:: c

    ERR_RECORD_SYSREG_V1(idx_start, num_idx, probe, handler, aux)

The probe handler must have the following prototype:

.. code:: c

    typedef int (*err_record_probe_t)(const struct err_record_info *info,
                    int *probe_data);

The probe handler must return a non-zero value if an error was detected, or 0
otherwise. The ``probe_data`` output parameter can be used to pass any useful
information resulting from the probe to the error handler (see `below`__). For
example, it could return the index of the record.

.. __: `Standard Error Record helpers`_

The error handler must have the following prototype:

.. code:: c

    typedef int (*err_record_handler_t)(const struct err_record_info *info,
               int probe_data, const struct err_handler_data *const data);

The ``data`` constant parameter describes the various properties of the error,
including the reason for the error, exception syndrome, and also ``flags``,
``cookie``, and ``handle`` parameters from the `top-level exception handler`__.

.. __: interrupt-framework-design.rst#el3-interrupts

The platform is expected to populate an array using the macros above, and
register it with the RAS framework using the macro
``REGISTER_ERR_RECORD_INFO()``, passing it the name of the array describing the
records. Note that the macro must be used in the same file where the array is
defined.

Standard Error Record helpers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The |TF-A| RAS framework provides probe handlers for Standard Error Records, for
both memory-mapped and System Register accesses:

.. code:: c

    int ras_err_ser_probe_memmap(const struct err_record_info *info,
                    int *probe_data);

    int ras_err_ser_probe_sysreg(const struct err_record_info *info,
                    int *probe_data);

When the platform enumerates error records, for those records in the Standard
Error Record format, these helpers may be used instead of the platform rolling
out its own. Both helpers above:

- Return a non-zero value when an error is detected in a Standard Error Record;
- Set ``probe_data`` to the index of the error record upon detecting an error.

Registering RAS interrupts
--------------------------

RAS nodes can signal errors to the PE by raising Fault Handling and/or Error
Recovery interrupts. For the firmware-first handling paradigm for interrupts to
work, the platform must set up and register with |EHF|. See `Interaction with
Exception Handling Framework`_.

For each RAS interrupt, the platform has to provide a structure of type
``struct ras_interrupt``:

- Interrupt number;
- The associated error record information (pointer to the corresponding
  ``struct err_record_info``);
- Optionally, a cookie.
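
As an illustration of the three items above, the following self-contained
sketch models a table of RAS interrupts and a lookup by interrupt number. It
does not use the |TF-A| headers: every ``sketch_``-prefixed name, the field
names, and the interrupt numbers are assumptions made for this example only. The
table is kept in increasing order of interrupt number so that a binary search
can be used.

.. code:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in for the framework's error record descriptor. */
    struct sketch_err_record_info {
        const char *name;
    };

    /* Hypothetical stand-in holding the three items listed above. */
    struct sketch_ras_interrupt {
        unsigned int intr_number;                         /* Interrupt number */
        const struct sketch_err_record_info *err_record;  /* Associated record */
        void *cookie;                                     /* Optional; may be NULL */
    };

    const struct sketch_err_record_info sketch_cpu_record = { "cpu" };
    const struct sketch_err_record_info sketch_dram_record = { "dram" };

    /* Entries are listed in increasing order of interrupt number. */
    const struct sketch_ras_interrupt sketch_interrupts[] = {
        { 34U, &sketch_cpu_record, NULL },
        { 41U, &sketch_dram_record, NULL },
        { 57U, &sketch_cpu_record, NULL },
    };

    #define SKETCH_NUM_INTERRUPTS \
        (sizeof(sketch_interrupts) / sizeof(sketch_interrupts[0]))

    /* Binary search over a table sorted by intr_number; NULL if absent. */
    const struct sketch_ras_interrupt *sketch_find_interrupt(
            const struct sketch_ras_interrupt *table, size_t count,
            unsigned int intr_number)
    {
        size_t lo = 0U;
        size_t hi = count;  /* The search window is [lo, hi). */

        assert(table != NULL);

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2U;

            if (table[mid].intr_number == intr_number)
                return &table[mid];
            if (table[mid].intr_number < intr_number)
                lo = mid + 1U;
            else
                hi = mid;
        }
        return NULL;
    }

In this sketch, looking up interrupt ``41`` returns the entry that points at
``sketch_dram_record``, while an interrupt number that is not in the table
yields ``NULL``.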

The platform is expected to define an array of ``struct ras_interrupt``, and
register it with the RAS framework using the macro
``REGISTER_RAS_INTERRUPTS()``, passing it the name of the array. Note that the
macro must be used in the same file where the array is defined.

The array of ``struct ras_interrupt`` must be sorted in increasing order of
interrupt number. This allows for fast lookup of handlers in order to service
RAS interrupts.

Double-fault handling
---------------------

A Double Fault condition arises when an error is signalled to the PE while
handling of a previously signalled error is still underway. When a Double Fault
condition arises, the Arm RAS extensions only require the handler to perform an
orderly shutdown of the system, as recovery may be impossible.

The RAS extensions that are part of Armv8.4 introduced new architectural
features to deal with Double Fault conditions, specifically the ``NMEA`` and
``EASE`` bits in the ``SCR_EL3`` register. These were introduced to assist EL3
software which runs part of its entry/exit routines with exceptions momentarily
masked. In such systems, External Aborts/SErrors are not handled immediately
when they occur, but only after the exceptions are unmasked again.

|TF-A|, for legacy reasons, executes the entirety of EL3 with all exceptions
unmasked.
This means that all exceptions routed to EL3 are handled immediately. |TF-A|
is thus able to detect Double Fault conditions in software, without relying on
the Armv8.4 Double Fault architecture extensions.

Double faults are fatal: handling terminates at the platform double fault
handler, which does not return.

Engaging the RAS framework
--------------------------

Enabling RAS support is a platform choice constructed from three distinct, but
related, build options:

- ``RAS_EXTENSION=1`` includes the RAS framework in the run time firmware;

- ``EL3_EXCEPTION_HANDLING=1`` enables handling of exceptions at EL3. See
  `Interaction with Exception Handling Framework`_;

- ``HANDLE_EA_EL3_FIRST=1`` enables routing of External Aborts and SErrors to
  EL3.

The RAS support in |TF-A| introduces a default implementation of
``plat_ea_handler``, the External Abort handler in EL3. When ``RAS_EXTENSION``
is set to ``1``, it first calls the ``ras_ea_handler()`` function, which is the
top-level RAS exception handler. ``ras_ea_handler`` is responsible for iterating
through platform-supplied error records, probing them, and, when an error is
identified, looking up and invoking the corresponding error handler.
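
The probe-then-handle flow described above can be pictured with the following
self-contained sketch. It is not the |TF-A| implementation and does not use its
headers: the ``sketch_``-prefixed types and functions are hypothetical,
simplified stand-ins (the handler here omits the ``data`` parameter), and the
"status words" merely simulate error records. Stopping at the first record that
reports an error is a simplification chosen for this sketch.

.. code:: c

    #include <assert.h>
    #include <stddef.h>

    struct sketch_record;

    /* Simplified, hypothetical handler types for this sketch. */
    typedef int (*sketch_probe_t)(const struct sketch_record *info,
                                  int *probe_data);
    typedef int (*sketch_handler_t)(const struct sketch_record *info,
                                    int probe_data);

    struct sketch_record {
        sketch_probe_t probe;
        sketch_handler_t handler;
        const int *status;  /* Simulated status words; non-zero means error */
        int num_status;
    };

    /* Probe: return non-zero on error and report the failing index. */
    int sketch_probe(const struct sketch_record *info, int *probe_data)
    {
        for (int i = 0; i < info->num_status; i++) {
            if (info->status[i] != 0) {
                *probe_data = i;
                return 1;
            }
        }
        return 0;
    }

    /* Records the index it was asked to handle, for inspection. */
    int sketch_handled_index = -1;

    int sketch_handler(const struct sketch_record *info, int probe_data)
    {
        (void)info;
        sketch_handled_index = probe_data;
        return 0;
    }

    /* Probe each record in turn; on the first error, invoke its handler. */
    int sketch_ea_dispatch(const struct sketch_record *records, size_t count)
    {
        for (size_t i = 0U; i < count; i++) {
            int probe_data = 0;

            assert(records[i].probe != NULL && records[i].handler != NULL);

            if (records[i].probe(&records[i], &probe_data) != 0) {
                (void)records[i].handler(&records[i], probe_data);
                return 1;  /* An error was found and handled. */
            }
        }
        return 0;  /* No record reported an error. */
    }

    const int sketch_clean_status[] = { 0, 0 };
    const int sketch_faulty_status[] = { 0, 0, 1 };

    const struct sketch_record sketch_records[] = {
        { sketch_probe, sketch_handler, sketch_clean_status, 2 },
        { sketch_probe, sketch_handler, sketch_faulty_status, 3 },
    };

Calling ``sketch_ea_dispatch(sketch_records, 2U)`` skips the first record,
whose status words are all zero, and invokes the handler of the second record
with the index that its probe reported through ``probe_data``.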

Note that, if the platform chooses to override the ``plat_ea_handler`` function
and intends to use the RAS framework, it must explicitly call
``ras_ea_handler()`` from within.

Similarly, for RAS interrupts, the framework defines
``ras_interrupt_handler()``. The RAS framework arranges for it to be invoked
when a RAS interrupt is taken at EL3. The function bisects the platform-supplied
sorted array of interrupts to look up the error record information associated
with the interrupt number. The error handler for that record is then invoked to
handle the error.

Interaction with Exception Handling Framework
---------------------------------------------

As mentioned in earlier sections, the RAS framework interacts with the |EHF| to
arbitrate handling of RAS exceptions with others that are routed to EL3. This
means that the platform must partition a `priority level`__ for handling RAS
exceptions. The platform must then define the macro ``PLAT_RAS_PRI`` to the
priority level used for RAS exceptions. Platforms would typically want to
allocate the highest secure priority for RAS handling.

.. __: exception-handling.rst#partitioning-priority-levels

Handling of both `interrupt`__ and `non-interrupt`__ exceptions follows the
sequences outlined in the |EHF| documentation.
That is, for interrupts, priority management is implicit; for non-interrupt
exceptions, it is explicit, using `EHF APIs`__.

.. __: exception-handling.rst#interrupt-flow
.. __: exception-handling.rst#non-interrupt-flow
.. __: exception-handling.rst#activating-and-deactivating-priorities

----

*Copyright (c) 2018, Arm Limited and Contributors. All rights reserved.*