| 0a580b51 | 15-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cm): drop ZCR_EL3 saving and some ISBs and replace them with root context
SVE and SME aren't enabled symmetrically for all worlds, but EL3 needs to context switch them nonetheless. Previously,
perf(cm): drop ZCR_EL3 saving and some ISBs and replace them with root context
SVE and SME aren't enabled symmetrically for all worlds, but EL3 needs to context switch them nonetheless. Previously, this had to happen by writing the enable bits just before reading/writing the relevant context. But since the introduction of root context, this need not be the case. We can have these enables always be present for EL3 and save on some work (and ISBs!) on every context switch.
We can also hoist ZCR_EL3 to a never changing register, as we set its value to be identical for every world, which happens to be the one we want for EL3 too.
Change-Id: I3d950e72049a298008205ba32f230d5a5c02f8b0 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 83ec7e45 | 06-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness in that this is rather slow. A large part of it is written in assembly, making it opaque to the compiler for optimisations. The future proofness requires reading registers that are effectively `volatile`, making it even harder for the compiler, as well as adding lots of implicit barriers, making it hard for the microarchitecutre to optimise as well.
We can make a few assumptions, checked by a few well placed asserts, and remove a lot of this burden. For a start, at the moment there are 4 group 0 counters with static assignments. Contexting them is a trivial affair that doesn't need a loop. Similarly, there can only be up to 16 group 1 counters. Contexting them is a bit harder, but we can do with a single branch with a falling through switch. If/when both of these change, we have a pair of asserts and the feature detection mechanism to guard us against pretending that we support something we don't.
We can drop contexting of the offset registers. They are fully accessible by EL2 and as such are its responsibility to preserve on powerdown.
Another small thing we can do, is pass the core_pos into the hook. The caller already knows which core we're running on, we don't need to call this non-trivial function again.
Finally, knowing this, we don't really need the auxiliary AMUs to be described by the device tree. Linux doesn't care at the moment, and any information we need for EL3 can be neatly placed in a simple array.
All of this, combined with lifting the actual saving out of assembly, reduces the instructions to save the context from 180 to 40, including a lot fewer branches. The code is also much shorter and easier to read.
Also propagate to aarch32 so that the two don't diverge too much.
Change-Id: Ib62e6e9ba5be7fb9fb8965c8eee148d5598a5361 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 41ae0473 | 03-Feb-2025 |
Sona Mathew <sonarebecca.mathew@arm.com> |
fix(rmm): add support for BRBCR_EL2 register for feat_brbe
Currently BRBE is being disabled for Realm world in EL3 by switching the SBRBE bit in mdcr_el3 register to 0b00. The patch removes the swit
fix(rmm): add support for BRBCR_EL2 register for feat_brbe
Currently BRBE is being disabled for Realm world in EL3 by switching the SBRBE bit in mdcr_el3 register to 0b00. The patch removes the switching of SBRBE bits, and adds context switch of BRBCR_EL2 register.
Change-Id: I66ca13edefc37e40fa265fd438b0b66f7d09b4bb Signed-off-by: Sona Mathew <sonarebecca.mathew@arm.com>
show more ...
|
| 73d98e37 | 02-Dec-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
fix(trbe): add a tsb before context switching
Just like for SPE, we need to synchronize TRBE samples before we change the context to ensure everything goes where it was intended to. If that is not d
fix(trbe): add a tsb before context switching
Just like for SPE, we need to synchronize TRBE samples before we change the context to ensure everything goes where it was intended to. If that is not done, the in-flight entries might use any piece of now incorrect context as there are no implicit ordering requirements.
Prior to root context, the buffer drain hooks would have done that. But now that must happen much earlier. So add a tsb to prepare_el3_entry as well.
Annoyingly, the barrier can be reordered relative to other instructions by default (rule RCKVWP). So add an isb after the psb/tsb to assure that they are ordered, at least as far as context is concerned.
Then, drop the buffer draining hooks. Everything they need to do is already done by now. There's a notable difference in that there are no dsb-s now. Since EL3 does not access the buffers or the feature specific context, we don't need to wait for them to finish.
Finally, drop a stray isb in the context saving macro. It is now absorbed into root context, but was missed.
Change-Id: I30797a40ac7f91d0bb71ad271a1597e85092ccd5 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 19d52a83 | 09-Aug-2024 |
Andre Przywara <andre.przywara@arm.com> |
feat(cpufeat): add ENABLE_FEAT_LS64_ACCDATA
Armv8.6 introduced the FEAT_LS64 extension, which provides a 64 *byte* store instruction. A related instruction is ST64BV0, which will replace the lowest
feat(cpufeat): add ENABLE_FEAT_LS64_ACCDATA
Armv8.6 introduced the FEAT_LS64 extension, which provides a 64 *byte* store instruction. A related instruction is ST64BV0, which will replace the lowest 32 bits of the data with a value taken from the ACCDATA_EL1 system register (so that EL0 cannot alter them). Using that ST64BV0 instruction and accessing the ACCDATA_EL1 system register is guarded by two SCR_EL3 bits, which we should set to avoid a trap into EL3, when lower ELs use one of those.
Add the required bits and pieces to make this feature usable: - Add the ENABLE_FEAT_LS64_ACCDATA build option (defaulting to 0). - Add the CPUID and SCR_EL3 bit definitions associated with FEAT_LS64. - Add a feature check to check for the existing four variants of the LS64 feature and detect future extensions. - Add code to save and restore the ACCDATA_EL1 register on secure/non-secure context switches. - Enable the feature with runtime detection for FVP and Arm FPGA.
Please note that the *basic* FEAT_LS64 feature does not feature any trap bits, it's only the addition of the ACCDATA_EL1 system register that adds these traps and the SCR_EL3 bits.
Change-Id: Ie3e2ca2d9c4fbbd45c0cc6089accbb825579138a Signed-off-by: Andre Przywara <andre.przywara@arm.com>
show more ...
|