| 83ec7e45 | 06-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness in that this is rather slow. A large part of it is written in assembly, making it opaque to the compiler for optimisations. The future-proofing requires reading registers that are effectively `volatile`, making it even harder for the compiler, as well as adding lots of implicit barriers, making it hard for the microarchitecture to optimise as well.
We can make a few assumptions, checked by a few well placed asserts, and remove a lot of this burden. For a start, at the moment there are 4 group 0 counters with static assignments. Contexting them is a trivial affair that doesn't need a loop. Similarly, there can only be up to 16 group 1 counters. Contexting them is a bit harder, but we can do with a single branch with a falling through switch. If/when both of these change, we have a pair of asserts and the feature detection mechanism to guard us against pretending that we support something we don't.
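The fall-through switch described above can be sketched roughly as follows. This is a host-side simulation, not the code from the patch: `fake_group1_regs`, `read_group1` and `save_group1` are hypothetical names, and the array stands in for the real group 1 counter registers.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP1_MAX 16u /* upper bound on group 1 counters stated in the commit message */

/* Hypothetical stand-in for the group 1 counter registers (simulation only). */
static uint64_t fake_group1_regs[GROUP1_MAX];

static uint64_t read_group1(unsigned int idx)
{
    return fake_group1_regs[idx];
}

/*
 * Save the first `count` counters into ctx[] with a single branch: the switch
 * jumps to the highest implemented counter and falls through down to counter 0.
 */
static void save_group1(uint64_t *ctx, unsigned int count)
{
    assert(count <= GROUP1_MAX); /* guards against a future, larger limit */

    switch (count) {
    case 16: ctx[15] = read_group1(15); /* fallthrough */
    case 15: ctx[14] = read_group1(14); /* fallthrough */
    case 14: ctx[13] = read_group1(13); /* fallthrough */
    case 13: ctx[12] = read_group1(12); /* fallthrough */
    case 12: ctx[11] = read_group1(11); /* fallthrough */
    case 11: ctx[10] = read_group1(10); /* fallthrough */
    case 10: ctx[9] = read_group1(9);   /* fallthrough */
    case 9:  ctx[8] = read_group1(8);   /* fallthrough */
    case 8:  ctx[7] = read_group1(7);   /* fallthrough */
    case 7:  ctx[6] = read_group1(6);   /* fallthrough */
    case 6:  ctx[5] = read_group1(5);   /* fallthrough */
    case 5:  ctx[4] = read_group1(4);   /* fallthrough */
    case 4:  ctx[3] = read_group1(3);   /* fallthrough */
    case 3:  ctx[2] = read_group1(2);   /* fallthrough */
    case 2:  ctx[1] = read_group1(1);   /* fallthrough */
    case 1:  ctx[0] = read_group1(0);   /* fallthrough */
    case 0:  break;
    }
}
```

Each register access names its counter with a compile-time constant, which is what a real system register read would need; a loop over a runtime index could not express that without an extra dispatch per iteration.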
We can drop contexting of the offset registers. They are fully accessible by EL2 and as such are its responsibility to preserve on powerdown.
Another small thing we can do is pass the core_pos into the hook. The caller already knows which core we're running on, so we don't need to call this non-trivial function again.
Finally, knowing this, we don't really need the auxiliary AMUs to be described by the device tree. Linux doesn't care at the moment, and any information we need for EL3 can be neatly placed in a simple array.
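As an illustration of the "simple array" idea, a per-core table indexed by core_pos might look like the sketch below. The names (`CORE_COUNT`, `aux_counter_enables`, `aux_enables_for`, `aux_enabled_count`) and the mask values are assumptions made for the example, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CORE_COUNT 4u /* hypothetical platform core count */

/*
 * Hypothetical per-core bitmask of enabled auxiliary (group 1) counters.
 * Bit n set means counter n is enabled; 16 bits cover the 16-counter limit.
 */
static const uint16_t aux_counter_enables[CORE_COUNT] = {
    0x000f, /* core 0: counters 0-3 */
    0x000f, /* core 1: counters 0-3 */
    0x0003, /* core 2: counters 0-1 */
    0x0000, /* core 3: no auxiliary counters */
};

/* Look up the mask for a core; the caller passes core_pos in directly. */
static uint16_t aux_enables_for(unsigned int core_pos)
{
    assert(core_pos < CORE_COUNT);
    return aux_counter_enables[core_pos];
}

/* Count the enabled counters by clearing the lowest set bit until none remain. */
static unsigned int aux_enabled_count(unsigned int core_pos)
{
    uint16_t mask = aux_enables_for(core_pos);
    unsigned int n = 0;

    while (mask != 0) {
        mask &= (uint16_t)(mask - 1u);
        n++;
    }
    return n;
}
```

A constant array like this needs no parsing at boot, which fits the commit's point that the device tree description was not buying anything for EL3.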
All of this, combined with lifting the actual saving out of assembly, reduces the instructions to save the context from 180 to 40, including a lot fewer branches. The code is also much shorter and easier to read.
Also propagate to aarch32 so that the two don't diverge too much.
Change-Id: Ib62e6e9ba5be7fb9fb8965c8eee148d5598a5361
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|
| 73d98e37 | 02-Dec-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
fix(trbe): add a tsb before context switching
Just like for SPE, we need to synchronize TRBE samples before we change the context to ensure everything goes where it was intended to. If that is not done, the in-flight entries might use any piece of now incorrect context as there are no implicit ordering requirements.
Prior to root context, the buffer drain hooks would have done that. But now that must happen much earlier. So add a tsb to prepare_el3_entry as well.
Annoyingly, the barrier can be reordered relative to other instructions by default (rule RCKVWP). So add an isb after the psb/tsb to ensure that they are ordered, at least as far as context is concerned.
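The ordering described here, profiling and trace synchronisation first and an isb afterwards, could be sketched in C with inline assembly as below. The function name `sync_profiling_and_trace` is hypothetical and the real change lives in the EL3 entry assembly, not in a C helper. The sketch uses the hint-space encodings (`hint #17` for `psb csync`, `hint #18` for `tsb csync`) so that it assembles without the extension mnemonics, and it compiles to nothing on non-AArch64 hosts.

```c
#include <assert.h>

/*
 * Hypothetical helper: synchronise SPE and TRBE, then order those barriers
 * against later instructions with an isb. Does nothing on non-AArch64 builds.
 */
static inline void sync_profiling_and_trace(void)
{
#if defined(__aarch64__)
    __asm__ volatile(
        "hint #17\n\t" /* psb csync: synchronise statistical profiling */
        "hint #18\n\t" /* tsb csync: synchronise trace */
        "isb\n\t"      /* keep the two barriers ordered before what follows */
        ::: "memory");
#endif
}
```

Note that there is deliberately no dsb in this sequence, matching the commit's reasoning that EL3 does not touch the buffers and so need not wait for the writes to complete.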
Then, drop the buffer draining hooks. Everything they need to do is already done by now. There's a notable difference in that there are no dsb-s now. Since EL3 does not access the buffers or the feature specific context, we don't need to wait for them to finish.
Finally, drop a stray isb in the context saving macro. It is now absorbed into root context, but was missed.
Change-Id: I30797a40ac7f91d0bb71ad271a1597e85092ccd5
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|
| b36e975e | 19-Jul-2024 |
Arvind Ram Prakash <arvind.ramprakash@arm.com> |
feat(trbe): introduce trbe_disable() function
This patch adds trbe_disable(), which disables Trace buffer access from lower ELs in all security states. This function makes Secure state the owner of the Trace buffer, so that accesses from EL2/EL1 generate trap exceptions to EL3.
Signed-off-by: Arvind Ram Prakash <arvind.ramprakash@arm.com>
Change-Id: If3e3bd621684b3c28f44c3ed2fe3df30b143f8cd
|
| 651fe507 | 18-Jul-2024 |
Manish Pandey <manish.pandey2@arm.com> |
feat(spe): introduce spe_disable() function
Introduce a function to disable the SPE feature for Non-secure state and apply the default setting: Secure state owns the profiling buffers, and accesses to the profiling and profiling buffer control registers from lower ELs trap to EL3.
This functionality is required to handle asymmetric cores where SPE has to be disabled at runtime.
Signed-off-by: Manish Pandey <manish.pandey2@arm.com>
Change-Id: I2f99e922e8df06bfc900c153137aef7c9dcfd759
|