| 83ec7e45 | 06-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness in that this is rather slow. A large part of it is written in assembly, making it opaque to the compiler for optimisations. The future proofness requires reading registers that are effectively `volatile`, making it even harder for the compiler, as well as adding lots of implicit barriers, making it hard for the microarchitecutre to optimise as well.
We can make a few assumptions, checked by a few well placed asserts, and remove a lot of this burden. For a start, at the moment there are 4 group 0 counters with static assignments. Contexting them is a trivial affair that doesn't need a loop. Similarly, there can only be up to 16 group 1 counters. Contexting them is a bit harder, but we can do with a single branch with a falling through switch. If/when both of these change, we have a pair of asserts and the feature detection mechanism to guard us against pretending that we support something we don't.
We can drop contexting of the offset registers. They are fully accessible by EL2 and as such are its responsibility to preserve on powerdown.
Another small thing we can do, is pass the core_pos into the hook. The caller already knows which core we're running on, we don't need to call this non-trivial function again.
Finally, knowing this, we don't really need the auxiliary AMUs to be described by the device tree. Linux doesn't care at the moment, and any information we need for EL3 can be neatly placed in a simple array.
All of this, combined with lifting the actual saving out of assembly, reduces the instructions to save the context from 180 to 40, including a lot fewer branches. The code is also much shorter and easier to read.
Also propagate to aarch32 so that the two don't diverge too much.
Change-Id: Ib62e6e9ba5be7fb9fb8965c8eee148d5598a5361 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 2590e819 | 25-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(mpmm): greatly simplify MPMM enablement
MPMM is a core-specific microarchitectural feature. It has been present in every Arm core since the Cortex-A510 and has been implemented in exactly the s
perf(mpmm): greatly simplify MPMM enablement
MPMM is a core-specific microarchitectural feature. It has been present in every Arm core since the Cortex-A510 and has been implemented in exactly the same way. Despite that, it is enabled more like an architectural feature with a top level enable flag. This utilised the identical implementation.
This duality has left MPMM in an awkward place, where its enablement should be generic, like an architectural feature, but since it is not, it should also be core-specific if it ever changes. One choice to do this has been through the device tree.
This has worked just fine so far, however, recent implementations expose a weakness in that this is rather slow - the device tree has to be read, there's a long call stack of functions with many branches, and system registers are read. In the hot path of PSCI CPU powerdown, this has a significant and measurable impact. Besides it being a rather large amount of code that is difficult to understand.
Since MPMM is a microarchitectural feature, its correct placement is in the reset function. The essence of the current enablement is to write CPUPPMCR_EL3.MPMM_EN if CPUPPMCR_EL3.MPMMPINCTL == 0. Replacing the C enablement with an assembly macro in each CPU's reset function achieves the same effect with just a single close branch and a grand total of 6 instructions (versus the old 2 branches and 32 instructions).
Having done this, the device tree entry becomes redundant. Should a core that doesn't support MPMM arise, this can cleanly be handled in the reset function. As such, the whole ENABLE_MPMM_FCONF and platform hooks mechanisms become obsolete and are removed.
Change-Id: I1d0475b21a1625bb3519f513ba109284f973ffdf Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| b07c317f | 19-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cpus): inline the init_cpu_data_ptr function
Similar to the reset function inline, inline this too to not do a costly branch with no extra cost.
Change-Id: I54cc399e570e9d0f373ae13c7224d32dbdf
perf(cpus): inline the init_cpu_data_ptr function
Similar to the reset function inline, inline this too to not do a costly branch with no extra cost.
Change-Id: I54cc399e570e9d0f373ae13c7224d32dbdfae1e5 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 79c0c7fa | 10-Dec-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
refactor(cm): clean up per-world context
In preparation for SMCCC_ARCH_FEATURE_AVAILABILITY, it is useful for context to be directly related to the underlying system. Currently, certain bits like SC
refactor(cm): clean up per-world context
In preparation for SMCCC_ARCH_FEATURE_AVAILABILITY, it is useful for context to be directly related to the underlying system. Currently, certain bits like SCR_EL3.APK are always set with the understanding that they will only take effect if the feature is present.
However, that is problematic for SMCCC_ARCH_FEATURE_AVAILABILITY (an SMCCC call to report which features firmware enables), as simply reading the enable bit may contradict the ID register, like the APK bit above for a system with no Pauth present.
This patch is to clean up these cases. Add a check for PAuth's presence so that the APK bit remains unset if not present. Also move SPE and TRBE enablement to only the NS context. They already only enable the features for NS only and disable them for Secure and Realm worlds. This change only makes these worlds' context read 0 for easy bitmasking.
There's only a single snag on SPE and TRBE. Currently, their fields have the same values and any world asymmetry is handled by hardware. Since we don't want to do that, the buffers' ownership will change if we just set the fields to 0 for non-NS worlds. Doing that, however, exposes Secure state to a potential denial of service attack - a malicious NS can enable profiling and call an SMC. Then, the owning security state will change and since no SPE/TRBE registers are contexted, Secure state will start generating records. Always have NS world own the buffers to prevent this.
Finally, get rid of manage_extensions_common() as it's just a level of indirection to enable a single feature.
Change-Id: I487bd4c70ac3e2105583917a0e5499e0ee248ed9 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| fc7dca72 | 16-Dec-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
refactor(cm): change owning security state when a feature is disabled
SPE and TRBE don't have an outright EL3 disable, there are only constraints on what's allowed. Since we only enable them for NS
refactor(cm): change owning security state when a feature is disabled
SPE and TRBE don't have an outright EL3 disable, there are only constraints on what's allowed. Since we only enable them for NS at the moment, we want NS to own the buffers even when the feature should be "disabled" for a world. This means that when we're running in NS everything is as normal but when running in S/RL then tracing is prohibited (since the buffers are owned by NS). This allows us to fiddle with context a bit more without having to context switch registers.
Change-Id: Ie1dc7c00e4cf9bcc746f02ae43633acca32d3758 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 73d98e37 | 02-Dec-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
fix(trbe): add a tsb before context switching
Just like for SPE, we need to synchronize TRBE samples before we change the context to ensure everything goes where it was intended to. If that is not d
fix(trbe): add a tsb before context switching
Just like for SPE, we need to synchronize TRBE samples before we change the context to ensure everything goes where it was intended to. If that is not done, the in-flight entries might use any piece of now incorrect context as there are no implicit ordering requirements.
Prior to root context, the buffer drain hooks would have done that. But now that must happen much earlier. So add a tsb to prepare_el3_entry as well.
Annoyingly, the barrier can be reordered relative to other instructions by default (rule RCKVWP). So add an isb after the psb/tsb to assure that they are ordered, at least as far as context is concerned.
Then, drop the buffer draining hooks. Everything they need to do is already done by now. There's a notable difference in that there are no dsb-s now. Since EL3 does not access the buffers or the feature specific context, we don't need to wait for them to finish.
Finally, drop a stray isb in the context saving macro. It is now absorbed into root context, but was missed.
Change-Id: I30797a40ac7f91d0bb71ad271a1597e85092ccd5 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 4ec4e545 | 06-Sep-2024 |
Jayanth Dodderi Chidanand <jayanthdodderi.chidanand@arm.com> |
feat(sctlr2): add support for FEAT_SCTLR2
Arm v8.9 introduces FEAT_SCTLR2, adding SCTLR2_ELx registers. Support this, context switching the registers and disabling traps so lower ELs can access the
feat(sctlr2): add support for FEAT_SCTLR2
Arm v8.9 introduces FEAT_SCTLR2, adding SCTLR2_ELx registers. Support this, context switching the registers and disabling traps so lower ELs can access the new registers.
Change the FVP platform to default to handling this as a dynamic option so the right decision can be made by the code at runtime.
Change-Id: I0c4cba86917b6b065a7e8dd6af7daf64ee18dcda Signed-off-by: Jayanth Dodderi Chidanand <jayanthdodderi.chidanand@arm.com> Signed-off-by: Govindraj Raja <govindraj.raja@arm.com>
show more ...
|