| ef738d19 | 21-Jun-2024 |
Manish Pandey <manish.pandey2@arm.com> |
feat(psci): remove cpu context init by index
Currently, the calling core (meaning the core which received the call to CPU_ON or the powerdown path of CPU_SUSPEND on the same core) is in charge of initialising the context for the waking core (the warmboot entrypoint for both). This is convenient because the calling core can write the context while in coherency and the waking core will only need the context after it has entered coherency. This avoids any cache maintenance and makes communication simple.
However, this has 3 main problems: a) asymmetric feature support is problematic - the calling core has no way of knowing the feature set of the waking core. If the two diverge, architectural feature discovery via the ID registers breaks down. We've thus far "fixed" this on a case-by-case basis, which doesn't scale and introduces redundancy.
b) powerdown abandon (pabandon) introduces a contradiction - the calling core has to initialise the context for when the core wakes up, but should the core not power down, it needs its old context intact. The only way to work around this is to keep two copies of the context, which incurs a runtime and memory overhead.
c) cm_prepare_el3_exit[_ns]() doesn't have access to the entrypoint but needs it to make initialisation decisions. We can infer some of this from registers that have already been written, but this is awkwardly limiting for what we can do. It also necessitates the split from the context initialisation.
We can solve all three by making a core fully own its own context. The calling core then writes only entrypoint information and nothing else. The waking core then initialises its own context as it sees fit, with full knowledge of the whole picture.
The only tricky bit is cache coherency - the waking core has to be able to coherently observe its new entrypoint. Calling cores will write to the shared region with coherent caches on. If we make sure to read the context only after the waking core has entered coherency, then we can avoid cache operations and let hardware handle everything.
We can skip the spsr check for FEAT_TCR2 as it doesn't make a difference. We can also skip enabling it twice from generic code.
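To make the split concrete, here is a minimal C sketch of the idea; the types, names and per-core array are hypothetical stand-ins, not the actual TF-A context-management code:

    #include <stddef.h>
    #include <stdint.h>

    #define PLAT_CORE_COUNT 8U              /* stand-in for the platform core count */

    struct entry_point_info {
        uintptr_t pc;
        uint64_t spsr;
        uint64_t args[4];
    };

    struct cpu_context {
        struct entry_point_info ep;         /* written by the calling core */
        uint64_t sysregs[64];               /* owned by the waking core */
    };

    static struct cpu_context per_cpu_ctx[PLAT_CORE_COUNT];

    /* Calling core (the one handling CPU_ON or suspending itself): publish
     * only the entrypoint, while its caches are on and coherent. */
    void publish_entrypoint(unsigned int target, const struct entry_point_info *ep)
    {
        per_cpu_ctx[target].ep = *ep;
    }

    /* Waking core, after it has entered coherency: read the entrypoint and
     * initialise the rest of the context locally, consulting its own ID
     * registers, so asymmetric feature sets are handled naturally. */
    void init_own_context(unsigned int me)
    {
        struct cpu_context *ctx = &per_cpu_ctx[me];

        for (size_t i = 0U; i < 64U; i++)
            ctx->sysregs[i] = 0U;

        /* e.g. seed the saved SPSR from the published entrypoint */
        ctx->sysregs[0] = ctx->ep.spsr;
    }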
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Signed-off-by: Manish Pandey <manish.pandey2@arm.com>
Change-Id: I86e7fe8b698191fc3b469e5ced1fd010f8754b0e
|
| 4a7916a5 | 26-Mar-2025 |
Sudeep Holla <sudeep.holla@arm.com> |
docs: clarify multiple UUID support in ffa manifest
If a partition supports multiple UUIDs, the UUID property in the list of partition properties in the FF-A manifest must be a list or array. Update the document to clarify this.
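For illustration, a partition exposing two UUIDs might list both in its manifest node like this (the UUID values below are made up; as in the existing binding, each UUID is given as four 32-bit words):

    uuid = <0xb4b5671e 0x4a904fe1 0xb81ffb13 0xdae1dacb>,
           <0x23eb0100 0xe32a4497 0x9052d5d5 0xcdd9c9f9>;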
Change-Id: Id13fe8dbf57a00e5cd186158270b716a4a9aedf7
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
|
| 0a580b51 | 15-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cm): drop ZCR_EL3 saving and some ISBs and replace them with root context
SVE and SME aren't enabled symmetrically for all worlds, but EL3 needs to context switch them nonetheless. Previously, this had to happen by writing the enable bits just before reading/writing the relevant context. But since the introduction of root context, this need not be the case. We can have these enables always be present for EL3 and save on some work (and ISBs!) on every context switch.
We can also hoist ZCR_EL3 into the set of never-changing registers, as we set its value identically for every world, which happens to be the value we want for EL3 too.
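A rough sketch of the idea, assuming a per-world context structure (the structure and function names are illustrative, not the actual TF-A interfaces):

    #include <stdint.h>

    #define CPTR_EZ_BIT      (1ULL << 8)    /* SVE accesses not trapped to EL3 */
    #define CPTR_ESM_BIT     (1ULL << 12)   /* SME accesses not trapped to EL3 */
    #define ZCR_EL3_LEN_MAX  0xfULL         /* request the maximum vector length */

    struct per_world_context {
        uint64_t cptr_el3;
        uint64_t zcr_el3;
    };

    /* Run once per world at boot. The same values suit EL3 itself, so the
     * context-switch path no longer has to toggle the enables (or issue
     * ISBs) around every save/restore of the SVE/SME state. */
    void per_world_context_init(struct per_world_context *ctx)
    {
        ctx->cptr_el3 = CPTR_EZ_BIT | CPTR_ESM_BIT;
        ctx->zcr_el3 = ZCR_EL3_LEN_MAX;
    }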
Change-Id: I3d950e72049a298008205ba32f230d5a5c02f8b0
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|
| 83ec7e45 | 06-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(amu): greatly simplify AMU context management
The current code is incredibly resilient to updates to the spec and has worked quite well so far. However, recent implementations expose a weakness: it is rather slow. A large part of it is written in assembly, making it opaque to the compiler for optimisations. The future-proofing requires reading registers that are effectively `volatile`, making it even harder for the compiler, as well as adding lots of implicit barriers, making it hard for the microarchitecture to optimise as well.
We can make a few assumptions, checked by a few well-placed asserts, and remove a lot of this burden. For a start, at the moment there are 4 group 0 counters with static assignments. Contexting them is a trivial affair that doesn't need a loop. Similarly, there can only be up to 16 group 1 counters. Contexting them is a bit harder, but we can make do with a single branch into a falling-through switch. If/when either of these changes, we have a pair of asserts and the feature detection mechanism to guard us against pretending that we support something we don't.
We can drop contexting of the offset registers. They are fully accessible from EL2, and as such it is EL2's responsibility to preserve them across powerdown.
Another small thing we can do is pass core_pos into the hook. The caller already knows which core we're running on; we don't need to call this non-trivial function again.
Finally, knowing this, we don't really need the auxiliary AMUs to be described by the device tree. Linux doesn't care at the moment, and any information we need for EL3 can be neatly placed in a simple array.
All of this, combined with lifting the actual saving out of assembly, reduces the instructions to save the context from 180 to 40, including a lot fewer branches. The code is also much shorter and easier to read.
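A minimal sketch of the resulting save path (the counter accessors are stubbed out and the names are illustrative, not the actual TF-A code):

    #include <assert.h>
    #include <stdint.h>

    #define AMU_GROUP0_NR_COUNTERS   4U
    #define AMU_GROUP1_MAX_COUNTERS  16U
    #define PLAT_CORE_COUNT          8U

    struct amu_ctx {
        uint64_t group0[AMU_GROUP0_NR_COUNTERS];
        uint64_t group1[AMU_GROUP1_MAX_COUNTERS];
    };

    static struct amu_ctx amu_ctxs[PLAT_CORE_COUNT];

    /* Stubs standing in for the per-counter system register reads. */
    static uint64_t read_amevcntr0(unsigned int i) { return 0x1000U + i; }
    static uint64_t read_amevcntr1(unsigned int i) { return 0x2000U + i; }

    /* core_pos is passed in by the caller, which already knows it. */
    void amu_context_save(unsigned int core_pos, unsigned int nr_group1)
    {
        struct amu_ctx *ctx = &amu_ctxs[core_pos];

        assert(nr_group1 <= AMU_GROUP1_MAX_COUNTERS);

        /* Group 0: four statically assigned counters, no loop needed. */
        ctx->group0[0] = read_amevcntr0(0);
        ctx->group0[1] = read_amevcntr0(1);
        ctx->group0[2] = read_amevcntr0(2);
        ctx->group0[3] = read_amevcntr0(3);

        /* Group 1: one branch into a falling-through switch; each case
         * saves its counter and falls into the next lower one. */
        switch (nr_group1) {
        case 16U: ctx->group1[15] = read_amevcntr1(15); /* fallthrough */
        case 15U: ctx->group1[14] = read_amevcntr1(14); /* fallthrough */
        case 14U: ctx->group1[13] = read_amevcntr1(13); /* fallthrough */
        case 13U: ctx->group1[12] = read_amevcntr1(12); /* fallthrough */
        case 12U: ctx->group1[11] = read_amevcntr1(11); /* fallthrough */
        case 11U: ctx->group1[10] = read_amevcntr1(10); /* fallthrough */
        case 10U: ctx->group1[9]  = read_amevcntr1(9);  /* fallthrough */
        case 9U:  ctx->group1[8]  = read_amevcntr1(8);  /* fallthrough */
        case 8U:  ctx->group1[7]  = read_amevcntr1(7);  /* fallthrough */
        case 7U:  ctx->group1[6]  = read_amevcntr1(6);  /* fallthrough */
        case 6U:  ctx->group1[5]  = read_amevcntr1(5);  /* fallthrough */
        case 5U:  ctx->group1[4]  = read_amevcntr1(4);  /* fallthrough */
        case 4U:  ctx->group1[3]  = read_amevcntr1(3);  /* fallthrough */
        case 3U:  ctx->group1[2]  = read_amevcntr1(2);  /* fallthrough */
        case 2U:  ctx->group1[1]  = read_amevcntr1(1);  /* fallthrough */
        case 1U:  ctx->group1[0]  = read_amevcntr1(0);  /* fallthrough */
        default:  break;
        }
    }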
Also propagate to aarch32 so that the two don't diverge too much.
Change-Id: Ib62e6e9ba5be7fb9fb8965c8eee148d5598a5361
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|
| 3406ff00 | 18-Sep-2024 |
Madhukar Pappireddy <madhukar.pappireddy@arm.com> |
docs: fix ff-a manifest binding document
The support for runtime-model has never been implemented by any SPMC. Hence, remove the corresponding field from the binding document.
Also, fix the incorrect description of the `managed-exit-virq` property.
Signed-off-by: Madhukar Pappireddy <madhukar.pappireddy@arm.com>
Change-Id: I0a5ef3f08202a8c76edd9a6e1ac680ac3a38ca60
|
| 42cf6026 | 10-Jul-2024 |
Juan Pablo Conde <juanpablo.conde@arm.com> |
refactor(rmmd): plat token requests in pieces
Until now, the attestation token size was limited by the size of the shared buffer between RMM and TF-A. With this change, RMM can request the token in pieces, so that each piece fits in the shared buffer. A new output parameter was added to the SMC call, which returns (along with the number of bytes copied into the buffer) the number of bytes of the token that remain to be retrieved.
TF-A keeps an offset variable that indicates the position in the token from which the next call will retrieve bytes. This offset is increased on every call by the number of bytes copied. If the received hash size is not 0, TF-A resets the offset to 0 and copies from that position onwards.
The SMC call now returns at most the size of the shared buffer in bytes on every call. Therefore, multiple SMC calls may need to be issued if the token size exceeds the shared buffer size.
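The offset handling could look roughly like the sketch below; the names are illustrative, and the SMC plumbing and error handling of the real RMMD code are omitted:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static size_t token_offset;     /* position of the next piece to hand out */

    /* Copy the next piece of the attestation token into the shared buffer.
     * A non-zero hash_size marks a fresh request, so the offset is rewound
     * to 0. Returns the number of bytes copied; *remaining reports how much
     * of the token is left, so the caller knows whether another SMC is
     * needed. */
    size_t copy_token_piece(uint8_t *shared_buf, size_t buf_size,
                            const uint8_t *token, size_t token_size,
                            size_t hash_size, size_t *remaining)
    {
        size_t left, copied;

        if (hash_size != 0U)
            token_offset = 0U;      /* new token request: start over */

        left = token_size - token_offset;
        copied = (left < buf_size) ? left : buf_size;

        memcpy(shared_buf, &token[token_offset], copied);
        token_offset += copied;

        *remaining = token_size - token_offset;
        return copied;
    }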
Change-Id: I591f7013d06f64e98afaf9535dbea6f815799723
Signed-off-by: Juan Pablo Conde <juanpablo.conde@arm.com>
|