| 53644fa8 | 07-Apr-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
fix(libc): make sure __init functions are garbage collected
RECLAIM_INIT_CODE is useful to remove code that is only necessary during boot. However, these functions are generally called once and, as such, are prime candidates for inlining. When building with LTO, the compiler is pretty good at inlining every single one, making this option pointless.
So tell the compiler to not inline these functions. This ensures they are kept separate and they can be garbage collected later. This is expected to cost a little bit of speed due to the extra branching.
Change-Id: Ie83a9ec8db03cb42139742fc6d728d12ce8549d3 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
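The effect described above can be sketched roughly as follows. This is a minimal illustration rather than the TF-A definition: `EXAMPLE_INIT` and every function name below are hypothetical, and the section placement that actually allows the init code to be reclaimed is left out.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for an __init-style marker. The noinline attribute
 * (GCC/Clang) keeps the function as a separate symbol, so a linker script
 * could later place it in a region that is reclaimed after boot.
 */
#define EXAMPLE_INIT __attribute__((noinline))

static uint32_t example_boot_steps;

/* Boot-only work: called once, so it would otherwise be a likely inlining candidate. */
EXAMPLE_INIT void example_early_setup(void)
{
	example_boot_steps += 1U;
}

/* Long-lived caller: reaches the init code through a real branch. */
void example_boot(void)
{
	example_early_setup();
}

/* Accessor so the effect of the boot path can be observed. */
uint32_t example_boot_steps_done(void)
{
	return example_boot_steps;
}
```

The extra branch in `example_boot()` is the small speed cost the message mentions.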
|
| bdaf0d9b | 03-Apr-2025 |
Govindraj Raja <govindraj.raja@arm.com> |
fix(cpus): fix clang compilation issue
A potential problem with clang version < 17 can cause resolving nested 'cfi_startproc' to fail compilation.
So add a variant of check_errata/reset_macros that is compatible with clang version < 17 to ignore `cfi_startproc` and `cfi_endproc`.
This wouldn't cause any performance issue and will not affect any functional behaviour.
Change-Id: I46147af2dd0accd5be14ddb26dea03bb2f87cba8 Signed-off-by: Govindraj Raja <govindraj.raja@arm.com>
|
| 34d7f196 | 17-Mar-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(libc): use builtin implementations where possible
When conditions are right, e.g. a small memcpy of a known size and alignment, the compiler may know of a sequence that is optimal for the given constraints and inline it. If the compiler doesn't find one, it will emit a call to the generic function (in the libc), which will implement this in the most generic and unconstrained manner. That generic function is rarely optimal when constraints are known.
So give the compiler a chance to do this. Replace calls to libc functions that have builtins with the builtin and keep the generic implementation in case it decides to emit a call anyway.
An example of this in action is usage of FEAT_MOPS. When the compiler is aware of the feature (-march=armv8.8-a) then it will emit the 3 MOPS instructions instead of calls to our memcpy() and memset() implementations.
Change-Id: I9860cfada1d941b613ebd4da068e9992c387952e Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
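A rough sketch of the idea, assuming a GCC/Clang toolchain: the copy is routed through `__builtin_memcpy`, so the compiler may inline a tailored sequence when the size is a compile-time constant and otherwise emit a call to the generic `memcpy()`. The wrapper macro and function names are hypothetical and are not the names used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: route the copy through the compiler builtin. */
#define example_memcpy(dst, src, len) __builtin_memcpy((dst), (src), (len))

/* Size known at compile time: the compiler is free to inline a short sequence. */
uint32_t example_load_u32(const uint8_t *buf)
{
	uint32_t value;

	example_memcpy(&value, buf, sizeof(value));
	return value;
}

/* Size only known at run time: the compiler may emit a call to the generic memcpy(). */
void example_copy_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
	example_memcpy(dst, src, len);
}
```

Either way the result is the same; only the generated code differs.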
|
| ac9f4b4d | 25-Mar-2025 |
Govindraj Raja <govindraj.raja@arm.com> |
fix(cpus): remove errata setting PF_MODE to conservative
The erratum titled “Disabling of data prefetcher with outstanding prefetch TLB miss might cause a deadlock” should not be handled within TF-A. The current workaround attempts to follow option 2 but misapplies it. Specifically, it statically sets PF_MODE to conservative, which is not the recommended approach. According to the erratum documentation, PF_MODE should be set to conservative mode only when the data prefetcher is disabled. TF-A never disables the data prefetcher, so the workaround is not needed in TF-A.
The static setting of PF_MODE in TF-A does not correctly address the erratum and may introduce unnecessary performance degradation on platforms that adopt it without fully understanding its implications.
To prevent incorrect or unintended use, the current implementation of this erratum workaround should be removed from TF-A and not adopted by platforms.
List of impacted CPUs with erratum numbers and SDEN references:
Cortex-A78 - 2132060 - https://developer.arm.com/documentation/SDEN1401784/latest
Cortex-A78C - 2132064 - https://developer.arm.com/documentation/SDEN-2004089/latest
Cortex-A710 - 2058056 - https://developer.arm.com/documentation/SDEN-1775101/latest
Cortex-X2 - 2058056 - https://developer.arm.com/documentation/SDEN-1775100/latest
Cortex-X3 - 2070301 - https://developer.arm.com/documentation/SDEN2055130/latest
Neoverse-N2 - 2138953 - https://developer.arm.com/documentation/SDEN-1982442/latest
Neoverse-V1 - 2108267 - https://developer.arm.com/documentation/SDEN-1401781/latest
Neoverse-V2 - 2331132 - https://developer.arm.com/documentation/SDEN-2332927/latest
Change-Id: Icf4048508ae070b2df073cc46c63be058b2779df Signed-off-by: Govindraj Raja <govindraj.raja@arm.com>
|
| 518b278b | 24-Mar-2025 |
Manish Pandey <manish.pandey2@arm.com> |
Merge changes from topic "hm/handoff-aarch32" into integration
* changes:
  refactor(arm): simplify early platform setup functions
  feat(bl32): enable r3 usage for boot args
  feat(handoff): add lib to sp-min sources
  feat(handoff): add 32-bit variant of SRAM layout
  feat(handoff): add 32-bit variant of ep info
  fix(aarch32): avoid using r12 to store boot params
  fix(arm): reinit secure and non-secure tls
  refactor(handoff): downgrade error messages
|
| 38b5f93a | 20-Mar-2025 |
Madhukar Pappireddy <madhukar.pappireddy@arm.com> |
Merge "feat(lib): implement strnlen secure and strcpy secure function" into integration |
| eb088894 | 17-Mar-2025 |
Jit Loon Lim <jit.loon.lim@altera.com> |
feat(lib): implement strnlen secure and strcpy secure function
Implement a safer version of the 'strnlen' function to handle NUL-terminated strings with additional bounds checking, and a secure version of the string copy function to improve security and avoid destination buffer overflow.
Change-Id: I93916f003b192c1c6da6a4f78a627c8885db11d9 Signed-off-by: Jit Loon Lim <jit.loon.lim@altera.com> Signed-off-by: Girisha Dengi <girisha.dengi@intel.com>
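What bounds-checked variants of this kind typically look like can be sketched as below. The names, signatures and return values are assumptions made for illustration; they are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical bounded length: never reads past maxlen bytes, tolerates NULL. */
size_t example_strnlen_s(const char *str, size_t maxlen)
{
	size_t len = 0U;

	if (str == NULL) {
		return 0U;
	}
	while ((len < maxlen) && (str[len] != '\0')) {
		len++;
	}
	return len;
}

/*
 * Hypothetical bounded copy: returns 0 on success, -1 if an argument is
 * invalid or the source (plus its terminator) does not fit in dest.
 */
int example_strcpy_s(char *dest, size_t dest_size, const char *src)
{
	size_t len;

	if ((dest == NULL) || (src == NULL) || (dest_size == 0U)) {
		return -1;
	}

	len = example_strnlen_s(src, dest_size);
	if (len == dest_size) {
		/* No room for the terminator: leave an empty string behind. */
		dest[0] = '\0';
		return -1;
	}

	/* Copy len characters plus the terminating NUL. */
	for (size_t i = 0U; i <= len; i++) {
		dest[i] = src[i];
	}
	return 0;
}
```

The copy refuses to truncate silently, so a caller can tell an overflow attempt apart from a successful copy.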
|
| f2bd3528 | 19-Feb-2025 |
John Powell <john.powell@arm.com> |
fix(errata): workaround for Cortex-A510 erratum 2971420
Cortex-A510 erratum 2971420 applies to revisions r0p1, r0p2, r0p3, r1p0, r1p1, r1p2 and r1p3, and is still open.
Under some conditions, data might be corrupted if Trace Buffer Extension (TRBE) is enabled. The workaround is to disable trace collection via TRBE by programming MDCR_EL3.NSTB[1] to the opposite value of SCR_EL3.NS on a security state switch. Since we only enable TRBE for non-secure world, the workaround is to disable TRBE by setting the NSTB field to 00 so accesses are trapped to EL3 and secure state owns the buffer.
SDEN: https://developer.arm.com/documentation/SDEN-1873361/latest/
Signed-off-by: John Powell <john.powell@arm.com> Change-Id: Ia77051f6b64c726a8c50596c78f220d323ab7d97
|
| fcf2ab71 | 11-Feb-2025 |
John Powell <john.powell@arm.com> |
fix(cpus): workaround for Cortex-A715 erratum 2804830
Cortex-A715 erratum 2804830 applies to r0p0, r1p0, r1p1 and r1p2, and is fixed in r1p3.
Under some conditions, writes of a 64B-aligned, 64B granule of memory might cause data corruption without this workaround. See SDEN for details.
Since this workaround disables write streaming, it is expected to have a significant performance impact for code that is heavily reliant on write streaming, such as memcpy or memset.
SDEN: https://developer.arm.com/documentation/SDEN-2148827/latest/
Change-Id: Ia12f6c7de7c92f6ea4aec3057b228b828d48724c Signed-off-by: John Powell <john.powell@arm.com>
|
| 8001247c | 16-Dec-2024 |
Harrison Mutai <harrison.mutai@arm.com> |
feat(handoff): add 32-bit variant of SRAM layout
Introduce the 32-bit variant of the SRAM layout used by BL1 to communicate available free SRAM to BL2. This layout was added to the specification in: https://github.com/FirmwareHandoff/firmware_handoff/pull/54.
Change-Id: I559fb8a00725eaedf01856af42d73029802aa095 Signed-off-by: Harrison Mutai <harrison.mutai@arm.com>
|
| 7ffc1d6c | 16-Dec-2024 |
Harrison Mutai <harrison.mutai@arm.com> |
feat(handoff): add 32-bit variant of ep info
Add the 32-bit version of the entry_point_info structure used to pass the boot arguments for future executables, added to the spec under the PR: https://github.com/FirmwareHandoff/firmware_handoff/pull/54.
Change-Id: Id98e0f98db6ffd4790193e201f24e62101450e20 Signed-off-by: Harrison Mutai <harrison.mutai@arm.com>
|
| af1dd6e1 | 09-Mar-2025 |
Manish V Badarkhe <Manish.Badarkhe@arm.com> |
feat(lib): add EXTRACT_FIELD macro for field extraction
Introduce a new EXTRACT_FIELD macro to simplify the extraction of specific fields from a value by shifting the value right and applying the mask.
Change-Id: Iae9573d6d23067bbde13253e264e4f6f18b806c2 Signed-off-by: Manish V Badarkhe <Manish.Badarkhe@arm.com>
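A shift-then-mask extraction of this kind could look like the sketch below. The macro name, its parameter order and the field layout are assumptions for illustration and may differ from the macro added by the patch.

```c
#include <stdint.h>

/* Hypothetical equivalent: shift the value right, then apply the mask. */
#define EXAMPLE_EXTRACT_FIELD(val, mask, shift) (((val) >> (shift)) & (mask))

/* Hypothetical register layout: a 4-bit field at bits [11:8]. */
#define EXAMPLE_FIELD_SHIFT 8U
#define EXAMPLE_FIELD_MASK  UINT32_C(0xF)

uint32_t example_get_field(uint32_t reg)
{
	return EXAMPLE_EXTRACT_FIELD(reg, EXAMPLE_FIELD_MASK, EXAMPLE_FIELD_SHIFT);
}
```

Because the shift happens first, the mask is written relative to bit 0 and does not need to be pre-shifted.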
|
| bbff267b | 24-Feb-2025 |
Arvind Ram Prakash <arvind.ramprakash@arm.com> |
fix(errata-abi): add support for handling split workarounds
Certain erratum workarounds, like Neoverse N1 1542419, need part of their mitigation done in EL3 and the rest in a lower EL. But currently such workarounds return HIGHER_EL_MITIGATION, which indicates that the erratum has already been mitigated by a higher EL (EL3 in this case), which causes the lower EL to not apply its part of the mitigation.
This patch fixes this issue by adding support for split workarounds so that on certain errata we return AFFECTED even though EL3 has applied its workaround. This is done by repurposing the chosen field of the erratum_entry structure as a bitfield with two flags: Bit 0 indicates that the erratum has been enabled in the build, and Bit 1 indicates that the erratum is a split workaround and should return AFFECTED instead of HIGHER_EL_MITIGATION.
SDEN documentation: https://developer.arm.com/documentation/SDEN885747/latest
Signed-off-by: Arvind Ram Prakash <arvind.ramprakash@arm.com> Change-Id: Iec94d665b5f55609507a219a7d1771eb75e7f4a7
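The decision described above can be sketched as follows. Only the two bit meanings come from the message; the macro names, the status enum and the function are hypothetical, and the real ABI return codes are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit meanings follow the commit message; the names are hypothetical. */
#define EXAMPLE_ERRATUM_CHOSEN   (UINT8_C(1) << 0) /* workaround enabled in build */
#define EXAMPLE_ERRATUM_SPLIT_WA (UINT8_C(1) << 1) /* lower EL must do its part */

/* Hypothetical status values, not the real ABI encodings. */
enum example_status {
	EXAMPLE_NOT_AFFECTED,
	EXAMPLE_AFFECTED,
	EXAMPLE_HIGHER_EL_MITIGATION
};

enum example_status example_errata_status(uint8_t chosen, bool cpu_affected)
{
	if (!cpu_affected) {
		return EXAMPLE_NOT_AFFECTED;
	}
	if ((chosen & EXAMPLE_ERRATUM_CHOSEN) == 0U) {
		/* Nothing was applied at EL3. */
		return EXAMPLE_AFFECTED;
	}
	if ((chosen & EXAMPLE_ERRATUM_SPLIT_WA) != 0U) {
		/* EL3 did its half; the lower EL still has work to do. */
		return EXAMPLE_AFFECTED;
	}
	return EXAMPLE_HIGHER_EL_MITIGATION;
}
```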
|
| ec6f49c2 | 01-Aug-2024 |
Vinoj Soundararajan <vinojs@google.com> |
feat(ras): add eabort get helper function
Add EABORT get field helper function to obtain SET, AET (UET) values from esr_el3/disr_el1 based on PE error state recording in the exception syndrome. Refer to the RAS PE architecture in https://developer.arm.com/documentation/ddi0487/latest/
Change-Id: I0011f041a3089c9bbf670275687ad7c3362a07f9 Signed-off-by: Vinoj Soundararajan <vinojs@google.com>
|
| daeae495 | 01-Aug-2024 |
Vinoj Soundararajan <vinojs@google.com> |
feat(ras): add asynchronous error type corrected
Add asynchronous error type Corrected (CE) to error status AET based on PE error state recording in the exception syndrome. Refer to https://developer.arm.com/documentation/ddi0487/latest/ (RAS PE architecture).
Change-Id: I9f2525411b94c8fd397b4a0b8cf5dc47457a2771 Signed-off-by: Vinoj Soundararajan <vinojs@google.com>
|
| e5cd3e81 | 01-Aug-2024 |
Vinoj Soundararajan <vinojs@google.com> |
fix(ras): fix typo in uncorrectable error type UEO
Fix spelling for UEO from restable to restartable, based on PE error state recording in the exception syndrome. Refer to https://developer.arm.com/documentation/ddi0487/latest/ (RAS PE architecture).
Change-Id: I4da419f2120a7385853d4da78b409c675cdfe1c8 Signed-off-by: Vinoj Soundararajan <vinojs@google.com>
|
| 9c17687a | 01-Aug-2024 |
Vinoj Soundararajan <vinojs@google.com> |
fix(ras): fix status synchronous error type fields
Based on SET bits of ISS encoding for an exception from Data or Instruction Abort (refer to ESR_EL3):
1. Fix Synchronous error type restartable value from 1 to 3
2. Remove corrected CE field which is not applicable to SET
Change-Id: If357da9881bee962825bc3b9423ba7fc107f9b1d Signed-off-by: Vinoj Soundararajan <vinojs@google.com>
|
| 7990cc80 | 28-Feb-2025 |
Manish V Badarkhe <manish.badarkhe@arm.com> |
Merge "feat(handoff): add transfer entry printer" into integration |
| c7220035 | 03-Feb-2025 |
Manish Pandey <manish.pandey2@arm.com> |
fix(el3-runtime): replace CTX_ESR_EL3 with CTX_DOUBLE_FAULT_ESR
ESR_EL3 value is updated when an exception is taken to EL3 and its value does not change until a new exception is taken to EL3. We need to save ESR in context memory only when we expect nested exception in EL3.
The scenarios where we would expect nested EL3 execution are related to FFH_SUPPORT, namely:
1. Handling pending async EAs at EL3 boundary - It uses CTX_SAVED_ESR_EL3 to preserve the original esr_el3
2. Double fault handling - Introduce an explicit storage (CTX_DOUBLE_FAULT_ESR) for esr_el3 to take care of double faults.
As the ESR context has been removed, read the register directly instead of its context value in RD platform.
Signed-off-by: Manish Pandey <manish.pandey2@arm.com> Change-Id: I7720c5f03903f894a77413a235e3cc05c86f9c17
|
| 98c65165 | 26-Feb-2025 |
Govindraj Raja <govindraj.raja@arm.com> |
chore: rename arcadia to Cortex-A320
Cortex-A320 has been announced, rename arcadia to Cortex-A320.
Ref: https://newsroom.arm.com/blog/introducing-arm-cortex-a320-cpu https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a320
Change-Id: Ifb3743d43dca3d8caaf1e7416715ccca4fdf195f Signed-off-by: Govindraj Raja <govindraj.raja@arm.com>
|
| 937c513d | 13-Dec-2024 |
Harrison Mutai <harrison.mutai@arm.com> |
feat(handoff): add transfer entry printer
Change-Id: Ib7d370b023f92f2fffbd341bcf874914fcc1bac2 Signed-off-by: Harrison Mutai <harrison.mutai@arm.com> |
| 0a580b51 | 15-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cm): drop ZCR_EL3 saving and some ISBs and replace them with root context
SVE and SME aren't enabled symmetrically for all worlds, but EL3 needs to context switch them nonetheless. Previously, this had to happen by writing the enable bits just before reading/writing the relevant context. But since the introduction of root context, this need not be the case. We can have these enables always be present for EL3 and save on some work (and ISBs!) on every context switch.
We can also hoist ZCR_EL3 to a never changing register, as we set its value to be identical for every world, which happens to be the one we want for EL3 too.
Change-Id: I3d950e72049a298008205ba32f230d5a5c02f8b0 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|
| 83ec7e45 | 06-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness in that this is rather slow. A large part of it is written in assembly, making it opaque to the compiler for optimisations. The future-proofing requires reading registers that are effectively `volatile`, making it even harder for the compiler, as well as adding lots of implicit barriers, making it hard for the microarchitecture to optimise as well.
We can make a few assumptions, checked by a few well placed asserts, and remove a lot of this burden. For a start, at the moment there are 4 group 0 counters with static assignments. Contexting them is a trivial affair that doesn't need a loop. Similarly, there can only be up to 16 group 1 counters. Contexting them is a bit harder, but we can do with a single branch with a falling through switch. If/when both of these change, we have a pair of asserts and the feature detection mechanism to guard us against pretending that we support something we don't.
We can drop contexting of the offset registers. They are fully accessible by EL2 and as such are its responsibility to preserve on powerdown.
Another small thing we can do, is pass the core_pos into the hook. The caller already knows which core we're running on, we don't need to call this non-trivial function again.
Finally, knowing this, we don't really need the auxiliary AMUs to be described by the device tree. Linux doesn't care at the moment, and any information we need for EL3 can be neatly placed in a simple array.
All of this, combined with lifting the actual saving out of assembly, reduces the instructions to save the context from 180 to 40, including a lot fewer branches. The code is also much shorter and easier to read.
Also propagate to aarch32 so that the two don't diverge too much.
Change-Id: Ib62e6e9ba5be7fb9fb8965c8eee148d5598a5361 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
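The "single branch with a falling through switch" can be sketched as below. This is a host-side illustration under stated assumptions: the counters are simulated by an array instead of system registers, the limit is shortened to 4 for brevity (the message mentions up to 16 group 1 counters), and all names are hypothetical.

```c
#include <stdint.h>

/* Shortened for brevity; the commit message mentions up to 16 group 1 counters. */
#define EXAMPLE_MAX_GROUP1 4U

/* Simulated counters standing in for the real system registers. */
static uint64_t example_counters[EXAMPLE_MAX_GROUP1];

void example_set_counter(unsigned int idx, uint64_t value)
{
	if (idx < EXAMPLE_MAX_GROUP1) {
		example_counters[idx] = value;
	}
}

static uint64_t example_read_counter(unsigned int idx)
{
	return example_counters[idx];
}

/*
 * Save the first num_counters counters. One branch selects the entry
 * point, then execution falls through the remaining cases without a loop.
 */
void example_save_group1(uint64_t *ctx, unsigned int num_counters)
{
	switch (num_counters) {
	case 4U:
		ctx[3] = example_read_counter(3U);
		/* fallthrough */
	case 3U:
		ctx[2] = example_read_counter(2U);
		/* fallthrough */
	case 2U:
		ctx[1] = example_read_counter(1U);
		/* fallthrough */
	case 1U:
		ctx[0] = example_read_counter(0U);
		break;
	default:
		/* Zero or an unsupported count: nothing is saved. */
		break;
	}
}
```

In this sketch an out-of-range count simply saves nothing; the message describes asserts and feature detection guarding that case in the real code.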
|
| 2590e819 | 25-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(mpmm): greatly simplify MPMM enablement
MPMM is a core-specific microarchitectural feature. It has been present in every Arm core since the Cortex-A510 and has been implemented in exactly the same way. Despite that, it is enabled more like an architectural feature with a top level enable flag. This utilised the identical implementation.
This duality has left MPMM in an awkward place, where its enablement should be generic, like an architectural feature, but since it is not, it should also be core-specific if it ever changes. One choice to do this has been through the device tree.
This has worked just fine so far, however, recent implementations expose a weakness in that this is rather slow - the device tree has to be read, there's a long call stack of functions with many branches, and system registers are read. In the hot path of PSCI CPU powerdown, this has a significant and measurable impact. Besides, it is a rather large amount of code that is difficult to understand.
Since MPMM is a microarchitectural feature, its correct placement is in the reset function. The essence of the current enablement is to write CPUPPMCR_EL3.MPMM_EN if CPUPPMCR_EL3.MPMMPINCTL == 0. Replacing the C enablement with an assembly macro in each CPU's reset function achieves the same effect with just a single close branch and a grand total of 6 instructions (versus the old 2 branches and 32 instructions).
Having done this, the device tree entry becomes redundant. Should a core that doesn't support MPMM arise, this can cleanly be handled in the reset function. As such, the whole ENABLE_MPMM_FCONF and platform hooks mechanisms become obsolete and are removed.
Change-Id: I1d0475b21a1625bb3519f513ba109284f973ffdf Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
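The enable condition stated above (set MPMM_EN only when MPMMPINCTL reads as 0) can be modelled in C as below. The real change is an assembly macro operating on system registers; here the register is a plain value, the bit positions are placeholders, and all names are hypothetical.

```c
#include <stdint.h>

/* Placeholder bit positions, chosen only for this sketch. */
#define EXAMPLE_MPMM_EN_BIT    (UINT64_C(1) << 0)
#define EXAMPLE_MPMMPINCTL_BIT (UINT64_C(1) << 1)

/*
 * Model of the reset-time check: if the pin-control bit is clear, set the
 * enable bit; otherwise leave the register value untouched.
 */
uint64_t example_mpmm_enable(uint64_t reg)
{
	if ((reg & EXAMPLE_MPMMPINCTL_BIT) == 0U) {
		reg |= EXAMPLE_MPMM_EN_BIT;
	}
	return reg;
}
```

A single test and a single conditional write are all that is needed, which matches the "single close branch" the message describes.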
|
| a8a5d39d | 24-Feb-2025 |
Manish V Badarkhe <manish.badarkhe@arm.com> |
Merge changes from topic "bk/errata_speed" into integration
* changes:
  refactor(cpus): declare runtime errata correctly
  perf(cpus): make reset errata do fewer branches
  perf(cpus): inline the init_cpu_data_ptr function
  perf(cpus): inline the reset function
  perf(cpus): inline the cpu_get_rev_var call
  perf(cpus): inline cpu_rev_var checks
  refactor(cpus): register DSU errata with the errata framework's wrappers
  refactor(cpus): convert checker functions to standard helpers
  refactor(cpus): convert the Cortex-A65 to use the errata framework
  fix(cpus): declare reset errata correctly
|