| 83ec7e45 | 06-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness: it is rather slow. A large part of it is written in assembly, making it opaque to the compiler for optimisations. The future proofing requires reading registers that are effectively `volatile`, which makes things even harder for the compiler and adds lots of implicit barriers, making it hard for the microarchitecture to optimise as well.
We can make a few assumptions, checked by a few well-placed asserts, and remove a lot of this burden. For a start, at the moment there are 4 group 0 counters with static assignments. Contexting them is a trivial affair that doesn't need a loop. Similarly, there can only be up to 16 group 1 counters. Contexting them is a bit harder, but we can manage with a single branch and a fall-through switch. If/when either of these changes, we have a pair of asserts and the feature detection mechanism to guard us against pretending that we support something we don't.
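Purely as an illustration of the shape of that fall-through switch (a sketch, not the patch itself; `amu_ctx_sketch` and `read_amevcntr1x()` are invented stand-ins, since the real accessors are one-per-register because the MRS index must be an immediate):

    #include <stdint.h>

    /* Invented per-core context layout for the sketch. */
    struct amu_ctx_sketch {
        uint64_t group1[16];    /* up to 16 auxiliary (group 1) counters */
    };

    /* Stand-in for reading AMEVCNTR1<i>_EL0. */
    uint64_t read_amevcntr1x(unsigned int i);

    void amu_save_group1_sketch(struct amu_ctx_sketch *ctx, unsigned int nr)
    {
        /* One branch, then straight-line saves falling through to counter 0. */
        switch (nr) {
        case 16: ctx->group1[15] = read_amevcntr1x(15); /* fallthrough */
        case 15: ctx->group1[14] = read_amevcntr1x(14); /* fallthrough */
        case 14: ctx->group1[13] = read_amevcntr1x(13); /* fallthrough */
        case 13: ctx->group1[12] = read_amevcntr1x(12); /* fallthrough */
        case 12: ctx->group1[11] = read_amevcntr1x(11); /* fallthrough */
        case 11: ctx->group1[10] = read_amevcntr1x(10); /* fallthrough */
        case 10: ctx->group1[9]  = read_amevcntr1x(9);  /* fallthrough */
        case 9:  ctx->group1[8]  = read_amevcntr1x(8);  /* fallthrough */
        case 8:  ctx->group1[7]  = read_amevcntr1x(7);  /* fallthrough */
        case 7:  ctx->group1[6]  = read_amevcntr1x(6);  /* fallthrough */
        case 6:  ctx->group1[5]  = read_amevcntr1x(5);  /* fallthrough */
        case 5:  ctx->group1[4]  = read_amevcntr1x(4);  /* fallthrough */
        case 4:  ctx->group1[3]  = read_amevcntr1x(3);  /* fallthrough */
        case 3:  ctx->group1[2]  = read_amevcntr1x(2);  /* fallthrough */
        case 2:  ctx->group1[1]  = read_amevcntr1x(1);  /* fallthrough */
        case 1:  ctx->group1[0]  = read_amevcntr1x(0);  break;
        default: break; /* nr == 0: nothing to save */
        }
    }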
We can drop contexting of the offset registers. They are fully accessible by EL2 and as such it is EL2's responsibility to preserve them on powerdown.
Another small thing we can do is pass the core_pos into the hook. The caller already knows which core we're running on, so we don't need to call this non-trivial function again.
Finally, knowing this, we don't really need the auxiliary AMUs to be described by the device tree. Linux doesn't care at the moment, and any information we need for EL3 can be neatly placed in a simple array.
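As a sketch of such a "simple array" (the struct and field names below are invented for illustration, not taken from the patch):

    #include <stdint.h>

    /* Invented description of one auxiliary AMU counter that EL3 cares about. */
    struct aux_amu_counter_sketch {
        uint8_t idx;        /* index within group 1 */
        uint8_t save_ctx;   /* whether EL3 should context it */
    };

    /* Build-time, per-platform table replacing the device tree description. */
    static const struct aux_amu_counter_sketch aux_amu_counters[] = {
        { .idx = 0U, .save_ctx = 1U },
        { .idx = 1U, .save_ctx = 1U },
    };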
All of this, combined with lifting the actual saving out of assembly, reduces the instructions to save the context from 180 to 40, including a lot fewer branches. The code is also much shorter and easier to read.
Also propagate to aarch32 so that the two don't diverge too much.
Change-Id: Ib62e6e9ba5be7fb9fb8965c8eee148d5598a5361
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|
| 5b46aacc | 04-Oct-2024 |
Yann Gautier <yann.gautier@st.com> |
refactor(tc): add plat_rse_comms_init
In the same way as is done for neoverse_rd, create a plat_rse_comms_init() function to call rse_comms_init().
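For reference, the wrapper is essentially a one-liner; the rse_comms_init() signature and the MHU base addresses below are assumptions made for the sketch, not values taken from this change:

    #include <stdint.h>

    /* Assumed prototype of the existing driver entry point. */
    int rse_comms_init(uintptr_t mhu_sender_base, uintptr_t mhu_receiver_base);

    /* Hypothetical MHU frame addresses; real platforms use their own values. */
    #define MHU_SND_FRAME_BASE_SKETCH    0x2a840000UL
    #define MHU_RCV_FRAME_BASE_SKETCH    0x2a850000UL

    int plat_rse_comms_init(void)
    {
        /* Platform hook, so common setup code no longer calls the driver directly. */
        return rse_comms_init(MHU_SND_FRAME_BASE_SKETCH, MHU_RCV_FRAME_BASE_SKETCH);
    }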
Signed-off-by: Yann Gautier <yann.gautier@st.com>
Change-Id: I12f3b8a38a5369decb4b97f8aceeb0dc81cbea28
|
| 2ae197ac | 16-May-2024 |
Leo Yan <leo.yan@arm.com> |
feat(tc): enable trng
Enable the TRNG on the platform, which can be used by other features. The `rng-seed` property has been removed and `FEAT_RNG_TRAP` enabled so that accesses to the RNDR and RNDRRS system registers trap to EL3.
Change-Id: Ibde39115f285e67d31b14863c75beaf37493deca
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Icen Zeyada <Icen.Zeyada2@arm.com>
|
| 45c7328c | 20-Sep-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
fix(cpus): avoid SME related loss of context on powerdown
Travis' and Gelas' TRMs tell us to disable SME (set PSTATE.{ZA, SM} to 0) when we're attempting to power down. What they don't tell us is that if this isn't done, the powerdown request will be rejected. On the CPU_OFF path that's not a problem - we can force SVCR to 0 and be certain the core will power off.
On the suspend to powerdown path, however, we cannot do this. The TRM also tells us that the sequence could be aborted on, e.g., GIC interrupts. If this were to happen after we had overwritten SVCR to 0, the caller would experience a loss of context upon return. We know that at least Linux may call into PSCI with SVCR != 0. One option is to save the entire SME context, which would be quite expensive just to work around this. Another option is to downgrade the request to a normal suspend when SME was left on. The latter is better: this is expected to happen rarely enough that the wasted power can be ignored, and we don't want to burden the generic (correct) path with needless context management.
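In rough pseudo-C (every name below is a stand-in rather than the actual arch or PSCI code), the decision reduces to checking SVCR before committing to a powerdown:

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for the real feature detection and sysreg accessors. */
    bool is_feat_sme_supported(void);
    uint64_t read_svcr(void);

    /* Evaluated on the suspend-to-powerdown path, before SME state is touched. */
    bool powerdown_allowed_with_sme_sketch(void)
    {
        /* If the caller left streaming mode or ZA live (SVCR != 0), forcing
         * them off could lose context on an aborted powerdown, so the request
         * is downgraded to a normal (non-powerdown) suspend instead. */
        return !is_feat_sme_supported() || (read_svcr() == 0ULL);
    }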
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Change-Id: I698fa8490ebf51461f6aa8bba84f9827c5c46ad4
|
| 2b5e00d4 | 19-Dec-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
feat(psci): allow cores to wake up from powerdown
The simplistic view of a core's powerdown sequence is that power is atomically cut upon calling `wfi`. However, it turns out that it has lots to do - it has to talk to the interconnect to exit coherency, clean caches, check for RAS errors, etc. These take significant amounts of time and are certainly not atomic. As such there is a significant window of opportunity for external events to happen. Many of these steps are not destructive to context, so theoretically, the core can just "give up" half way (or roll certain actions back) and carry on running. The point in this sequence after which roll back is not possible is called the point of no return.
One of these actions is the checking for RAS errors. It is possible for one to happen during this lengthy sequence, or at least remain undiscovered until that point. If the core were to continue powerdown when that happens, there would be no (easy) way to inform anyone about it. Rejecting the powerdown and letting software handle the error is the best way to implement this.
Arm cores since at least the A510 have included this exact feature. So far it hasn't been deemed necessary to account for it in firmware due to the low likelihood of this happening. However, events like GIC wakeup requests are much more probable. Older cores will power down and immediately power back up when this happens. Travis and Gelas include a feature similar to the RAS case above, called powerdown abandon. The idea is that this will improve the latency to service the interrupt by saving on work which the core and software need to do.
So far firmware has relied on the `wfi` being the point of no return and, if it doesn't explicitly detect a pending interrupt quite early on, it will embark on a sequence that it expects to end with shutdown. To accommodate the `wfi` not being a point of no return, we must undo all of the system management we did, just like in the warm boot entrypoint.
To achieve that, the pwr_domain_pwr_down_wfi hook must not be terminal. Most recent platforms do some platform management and finish on the standard `wfi`, followed by a panic or an endless loop as this is expected to not return. To make this generic, any platform that wishes to support wakeups must instead let common code call `psci_power_down_wfi()` right after. Besides wakeups, this lets common code handle powerdown errata better as well.
Then, the CPU_OFF case is simple - PSCI does not allow it to return. So the best that can be done is to attempt the `wfi` a few times (the choice of 32 is arbitrary) in the hope that the wakeup is transient. If it isn't, the only choice is to panic, as the system is likely to be in a bad state, e.g. interrupts weren't routed away. The same applies to SYSTEM_OFF, SYSTEM_RESET, and SYSTEM_RESET2. There the panic won't matter as the system is going offline one way or another. The RAS case will be considered in a separate patch.
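A minimal sketch of that tail end (the `wfi` written as inline assembly here, with the retry count matching the arbitrary 32 above):

    /* Stand-in for the firmware's panic(); declared to keep the sketch self-contained. */
    void panic_sketch(void) __attribute__((noreturn));

    static inline void wfi_sketch(void)
    {
        __asm__ volatile("wfi" ::: "memory");
    }

    void cpu_off_tail_sketch(void)
    {
        /* Retry in case the wakeup that aborted the powerdown was transient. */
        for (int i = 0; i < 32; i++) {
            wfi_sketch();
        }

        /* Still running: interrupts were presumably not routed away and the
         * system is likely in a bad state, so the only option is to panic. */
        panic_sketch();
    }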
Now, the CPU_SUSPEND case is more involved. First, to power down, the core must wipe its context as it is not written on warm boot. But it cannot be overwritten in case of a wakeup. To avoid the catch-22, save a copy that will only be used if powerdown fails. That is about 500 bytes on the stack, so it hopefully doesn't tip anyone over any limits. In future that can be avoided by having a core manage its own context.
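A hedged sketch of that copy-before-wipe step; the context type and accessor below are invented stand-ins rather than the real context-management code:

    /* Invented stand-in roughly the size mentioned above (~500 bytes). */
    typedef struct {
        unsigned long long regs[64];
    } ctx_sketch_t;

    /* Stand-in for fetching this core's non-secure context. */
    ctx_sketch_t *get_ns_context_sketch(void);

    void suspend_to_powerdown_sketch(void)
    {
        /* Copy the live context to the stack before it is overwritten for the
         * warm boot path; the copy is only used if the powerdown is abandoned
         * and the core wakes up instead. */
        ctx_sketch_t saved = *get_ns_context_sketch();

        (void)saved; /* ...attempt powerdown; on wakeup, copy `saved` back... */
    }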
Second, when the core wakes up, it must undo anything it did to prepare for poweroff, which, for the cores we care about, is writing CPUPWRCTLR_EL1.CORE_PWRDN_EN. The least intrusive way of doing this for the cpu library is to simply call the power off hook again and have the hook toggle the bit. If in the future there need to be more complex sequences, their direction can be decided based on the value of this bit.
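A sketch of such a hook written as a toggle (the accessors are stand-ins, and treat the bit position as an assumption to be checked against each core's TRM):

    #include <stdint.h>

    /* Assumed: CORE_PWRDN_EN is bit 0 of CPUPWRCTLR_EL1 on the cores in question. */
    #define CORE_PWRDN_EN_SKETCH    (1ULL << 0)

    /* Stand-ins for the per-core sysreg accessors. */
    uint64_t read_cpupwrctlr_el1(void);
    void write_cpupwrctlr_el1(uint64_t val);

    /* Written as a toggle: called once to prepare for powerdown, and called a
     * second time after an abandoned powerdown to undo that preparation. */
    void core_pwr_dwn_sketch(void)
    {
        write_cpupwrctlr_el1(read_cpupwrctlr_el1() ^ CORE_PWRDN_EN_SKETCH);
    }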
Third, do the actual "resume". Most of the logic is already there for the retention suspend, so that only needs a small touch-up to apply to the powerdown case as well. The missing bit is the powerdown-specific state management. Luckily, the warm boot entrypoint already does exactly that too, so steal that and we're done.
All of this is hidden behind a FEAT_PABANDON flag since it has a large memory and runtime cost that we don't want to burden non-pabandon cores with.
Finally, do some function renaming to better reflect their purpose and make names a little bit more consistent.
Change-Id: I2405b59300c2e24ce02e266f91b7c51474c1145f
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|
| 7b41acaf | 05-Jul-2024 |
Jagdish Gediya <jagdish.gediya@arm.com> |
fix(tc): enable Last-level cache (LLC) for tc4
The EXTLLC bit, in CPUECTLR_EL1 for non-Gelas CPUs and in CPUECTLR2_EL1 for the Gelas CPU, enables the external last-level cache in the system.
The external LLC is present on TC4 systems in the MCN, but it is not enabled in the CPU registers, so enable it.
On TC4, Gelas and non-Gelas CPUs use different bits to enable EXTLLC, so take care of that as well.
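A hedged sketch of the dispatch; the MIDR part-number extraction is standard, but the part number and EXTLLC bit position below are placeholders rather than the real TC4 values:

    #include <stdint.h>

    /* Placeholders: take the real bit positions and part number from the TRMs. */
    #define EXTLLC_BIT_SKETCH       (1ULL << 0)
    #define GELAS_PARTNUM_SKETCH    0x0d00U

    /* Stand-ins for the EL3 sysreg accessors. */
    uint64_t read_midr_el1_sketch(void);
    uint64_t read_cpuectlr_el1_sketch(void);
    void write_cpuectlr_el1_sketch(uint64_t val);
    uint64_t read_cpuectlr2_el1_sketch(void);
    void write_cpuectlr2_el1_sketch(uint64_t val);

    void enable_extllc_sketch(void)
    {
        /* MIDR_EL1[15:4] is the primary part number. */
        uint64_t part = (read_midr_el1_sketch() >> 4) & 0xfffU;

        if (part == GELAS_PARTNUM_SKETCH) {
            write_cpuectlr2_el1_sketch(read_cpuectlr2_el1_sketch() | EXTLLC_BIT_SKETCH);
        } else {
            write_cpuectlr_el1_sketch(read_cpuectlr_el1_sketch() | EXTLLC_BIT_SKETCH);
        }
    }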
Change-Id: Ic6a74b4af110a3c34d19131676e51901ea2bf6e3
Signed-off-by: Jagdish Gediya <jagdish.gediya@arm.com>
Signed-off-by: Icen.Zeyada <Icen.Zeyada2@arm.com>
|
| 54289385 | 13-Aug-2024 |
Jagdish Gediya <jagdish.gediya@arm.com> |
fix(tc): set console baud rate to 38400 for fvp as well
Set the console baud rate to 38400 for FVP as well, for code simplicity.
Change-Id: I58ba6b7043541f6eb67e32257307da4eba0bb28a
Signed-off-by: Jagdish Gediya <jagdish.gediya@arm.com>
Signed-off-by: Icen.Zeyada <Icen.Zeyada2@arm.com>
|
| d1062c47 | 19-Jun-2024 |
Jagdish Gediya <jagdish.gediya@arm.com> |
feat(tc): enable MCN non-secure access to pmu counters on TC4
MCN PMU counters are by default not accessible from the non-secure world, so enable non-secure access to those PMU counters so that the Linux perf driver can read them.
Signed-off-by: Jagdish Gediya <jagdish.gediya@arm.com>
Signed-off-by: Icen Zeyada <Icen.Zeyada2@arm.com>
Change-Id: I1cf1f88f97e9062592fd5603a78fd36f15a15f89
|