| f396aec8 | 09-Sep-2025 |
Arvind Ram Prakash <arvind.ramprakash@arm.com> |
feat(cpufeat): add support for FEAT_IDTE3
This patch adds support for FEAT_IDTE3, which introduces support for handling the trapping of Group 3 and Group 5 (only GMID_EL1) registers to EL3 (unless t
feat(cpufeat): add support for FEAT_IDTE3
This patch adds support for FEAT_IDTE3, which introduces support for handling the trapping of Group 3 and Group 5 (only GMID_EL1) registers to EL3 (unless trapped to EL2). IDTE3 allows EL3 to modify the view of ID registers for lower ELs, and this capability is used to disable fields of ID registers tied to disabled features.
The ID registers are initially read as-is and stored in context. Then, based on the feature enablement status for each world, if a particular feature is disabled, its corresponding field in the cached ID register is set to Res0. When lower ELs attempt to read an ID register, the cached ID register value is returned. This allows EL3 to prevent lower ELs from accessing feature-specific system registers that are disabled in EL3, even though the hardware implements them.
The emulated ID register values are stored primarily in per-world context, except for certain debug-related ID registers such as ID_AA64DFR0_EL1 and ID_AA64DFR1_EL1, which are stored in the cpu_data and are unique to each PE. This is done to support feature asymmetry that is commonly seen in debug features.
FEAT_IDTE3 traps all Group 3 ID registers in the range op0 == 3, op1 == 0, CRn == 0, CRm == {2–7}, op2 == {0–7} and the Group 5 GMID_EL1 register. However, only a handful of ID registers contain fields used to detect features enabled in EL3. Hence, we only cache those ID registers, while the rest are transparently returned as is to the lower EL.
This patch updates the CREATE_FEATURE_FUNCS macro to generate update_feat_xyz_idreg_field() functions that disable ID register fields on a per-feature basis. The enabled_worlds scope is used to disable ID register fields for security states where the feature is not enabled.
This EXPERIMENTAL feature is controlled by the ENABLE_FEAT_IDTE3 build flag and is currently disabled by default.
Signed-off-by: Arvind Ram Prakash <arvind.ramprakash@arm.com> Change-Id: I5f998eeab81bb48c7595addc5595313a9ebb96d5
show more ...
|
| 4779becd | 06-Aug-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
refactor(el3-runtime): streamline cpu_data assembly offsets using the cpu_ops template
The cpu_data structure, just like cpu_ops, is collection of disparate data that must be accessible from both C
refactor(el3-runtime): streamline cpu_data assembly offsets using the cpu_ops template
The cpu_data structure, just like cpu_ops, is collection of disparate data that must be accessible from both C and assembly. Achieving this is tricky as there is no way to export structure offsets from C directly so they must be manually recreated with `#define`s and asserts. However, the cpu_data structure is quite old and the assembly offsets are a patchwork of additions and extremely difficult to reason with and modify. In fact, certain currently unused builds with ENABLE_RUNTIME_INSTRUMENTATION=1 fail to build.
To untangle this, convert the assembly offsets to the pattern used for the cpu_ops structure. That is, first define the sizes of every member, as generically as possible, and then chain their offsets one after the other. To make sure this is always correct, add a CASSERT for the offset of every member. This makes it easy to modify the structure and fixes the build failures.
Change-Id: I61aeb55e9c494896663a3c719c10e3c072f56349 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 34a22a02 | 05-Aug-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
refactor(el3-runtime): move context security states to context.h
The three security states (S, NS, RL) are architecturally quite consistent - anything that uses them has the same numerical assignmen
refactor(el3-runtime): move context security states to context.h
The three security states (S, NS, RL) are architecturally quite consistent - anything that uses them has the same numerical assignments (0, 1, 2) and they are quite convenient for indexing. However, we're not as consistent in tf-a and this is defined in a few places. Since cpu_data has a dependency on the context management library, use its security state convention in a few more places and take away this responsibility from cpu_data.
Change-Id: Iec73b2be2eef91975554767557de72424d0031f1 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 0a580b51 | 15-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cm): drop ZCR_EL3 saving and some ISBs and replace them with root context
SVE and SME aren't enabled symmetrically for all worlds, but EL3 needs to context switch them nonetheless. Previously,
perf(cm): drop ZCR_EL3 saving and some ISBs and replace them with root context
SVE and SME aren't enabled symmetrically for all worlds, but EL3 needs to context switch them nonetheless. Previously, this had to happen by writing the enable bits just before reading/writing the relevant context. But since the introduction of root context, this need not be the case. We can have these enables always be present for EL3 and save on some work (and ISBs!) on every context switch.
We can also hoist ZCR_EL3 to a never changing register, as we set its value to be identical for every world, which happens to be the one we want for EL3 too.
Change-Id: I3d950e72049a298008205ba32f230d5a5c02f8b0 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 83ec7e45 | 06-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness in that this is rather slow. A large part of it is written in assembly, making it opaque to the compiler for optimisations. The future proofness requires reading registers that are effectively `volatile`, making it even harder for the compiler, as well as adding lots of implicit barriers, making it hard for the microarchitecutre to optimise as well.
We can make a few assumptions, checked by a few well placed asserts, and remove a lot of this burden. For a start, at the moment there are 4 group 0 counters with static assignments. Contexting them is a trivial affair that doesn't need a loop. Similarly, there can only be up to 16 group 1 counters. Contexting them is a bit harder, but we can do with a single branch with a falling through switch. If/when both of these change, we have a pair of asserts and the feature detection mechanism to guard us against pretending that we support something we don't.
We can drop contexting of the offset registers. They are fully accessible by EL2 and as such are its responsibility to preserve on powerdown.
Another small thing we can do, is pass the core_pos into the hook. The caller already knows which core we're running on, we don't need to call this non-trivial function again.
Finally, knowing this, we don't really need the auxiliary AMUs to be described by the device tree. Linux doesn't care at the moment, and any information we need for EL3 can be neatly placed in a simple array.
All of this, combined with lifting the actual saving out of assembly, reduces the instructions to save the context from 180 to 40, including a lot fewer branches. The code is also much shorter and easier to read.
Also propagate to aarch32 so that the two don't diverge too much.
Change-Id: Ib62e6e9ba5be7fb9fb8965c8eee148d5598a5361 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|