| 8c6fa3d4 | 03-Feb-2026 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cpufeat): check the lock isn't held before trying to acquire it
Some context: `ldxr` primes the monitor, `cas` is allowed to clear the monitor even when it fails, `wfe` wakes up on clearing the monitor.
Consider 3 participants A, B, and C. A has acquired the lock. When B tries to acquire the lock afterwards, it will run `cas`, fail, do `ldxr` to prime the monitor and `wfe` to sleep. If C then tries to acquire the lock it will also run `cas` as its first order of business, promptly waking B up, fail, do `ldxr` to prime its monitor and sleep too at `wfe`. Then, when B wakes up, it will run `cas`, wake C up, and so on.
On real platforms the only drawback of this is excessive power consumption. With only a handful of participants we can be sure to have enough wake up noise so that no secondary cores will sleep. On virtual platforms (mainly FVP) this gets into a pathological case where from the model's perspective each core is active and actively amplifying each other's workloads, slowing the working core down by orders of magnitude.
To fix this, use a slightly different atomic lock acquisition algorithm. Instead of doing `cas` to try to acquire the lock first and only then falling back to the `ldxr` to go to sleep, reverse the order: first check with `ldxr` that the lock isn't held (sleeping if it is) and only try to lock it when we know we have a chance.
In effect, this patch implements the exact assembly sequence as described in "Learn the architecture - Implementation Software Synchronization Primitives in A64", chapter 3.3.
Finally, this patch adds a `sev` instruction to wake cores if they use atomics since they will not do so automatically and now they won't be spuriously woken.
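The reordering described above can be sketched in portable C11 atomics. This is a rough illustration only, not the patch's A64 assembly: the type `sketch_spinlock_t`, the functions `sketch_spin_lock()`, `sketch_spin_unlock()` and `sketch_cpu_relax()` are hypothetical names, a plain load stands in for `ldxr`, the empty relax hook stands in for `wfe`, and the place where `sev` would go is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
	atomic_uint lock;
} sketch_spinlock_t;

/* Stand-in for `wfe`; a real implementation would sleep here. */
static inline void sketch_cpu_relax(void)
{
}

static void sketch_spin_lock(sketch_spinlock_t *l)
{
	for (;;) {
		/*
		 * Step 1: read-only check (the role `ldxr` plays in the
		 * commit message). Waiters do not write to the lock, so
		 * they do not disturb each other while it is held.
		 */
		while (atomic_load_explicit(&l->lock,
					    memory_order_relaxed) != 0U) {
			sketch_cpu_relax();
		}

		/* Step 2: only now attempt the compare-and-swap. */
		unsigned int expected = 0U;

		if (atomic_compare_exchange_strong_explicit(
			    &l->lock, &expected, 1U,
			    memory_order_acquire, memory_order_relaxed)) {
			return;
		}
		/* Lost the race to another core: go back to waiting. */
	}
}

static void sketch_spin_unlock(sketch_spinlock_t *l)
{
	/* Unlocking a lock that is not held is a caller bug. */
	assert(atomic_load_explicit(&l->lock, memory_order_relaxed) == 1U);
	atomic_store_explicit(&l->lock, 0U, memory_order_release);
	/* Assumption: this is where an explicit `sev` would wake waiters. */
}
```

The design point is that a contended waiter only performs loads until it observes the lock as free, so the compare-and-swap (the operation that may clear other cores' monitors) is attempted only when it has a chance of succeeding.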
Change-Id: I9cf3fd1d7b5cf56f08c705114bc6cada80961388
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|
| 07ba153f | 19-Dec-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
feat(locks): make spin_trylock with exclusives spin until it knows the state of the lock
spin_trylock() is meant to be a non-blocking equivalent of spin_lock(). When we have atomics this is easy - the `cas` will directly return the state of the lock (held or not held). However, when using exclusives, there's a third state - failed to take the lock. This happens when the store exclusive couldn't complete the write and bailed. The current implementation pigeonholes this state into "not acquired", which loses this subtlety: spin_trylock() can return failure even though no one holds the lock.
This patch makes it so that the operation is retried until the core is certain it either holds the lock or someone else does. This keeps the nonblocking nature of the trylock call but may incur a delay until the state of the lock settles.
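A rough analogue of this retry loop can be written with C11 atomics, where `atomic_compare_exchange_weak` is allowed to fail spuriously much like a store exclusive. This is a sketch under that analogy, not the patch's code: `demo_lock_t`, `demo_spin_trylock()` and `demo_spin_release()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
	atomic_uint lock;
} demo_lock_t;

/*
 * Returns true if the caller now holds the lock, false if someone
 * else definitely holds it. Never returns on an "unknown" state.
 */
static bool demo_spin_trylock(demo_lock_t *l)
{
	unsigned int expected = 0U;

	/*
	 * The weak CAS may fail even though the lock was free. On any
	 * failure it writes the value it observed back into `expected`.
	 */
	while (!atomic_compare_exchange_weak_explicit(
			&l->lock, &expected, 1U,
			memory_order_acquire, memory_order_relaxed)) {
		if (expected != 0U) {
			/* Observed a holder: report failure right away. */
			return false;
		}
		/* Observed "free" but the write did not land: retry. */
	}

	return true;
}

static void demo_spin_release(demo_lock_t *l)
{
	/* Releasing a lock that is not held is a caller bug. */
	assert(atomic_load_explicit(&l->lock, memory_order_relaxed) == 1U);
	atomic_store_explicit(&l->lock, 0U, memory_order_release);
}
```

The loop only repeats in the ambiguous case (lock seen as free, write not completed), so the call still never waits for another holder to release the lock.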
Change-Id: I1a57de22557e13c22f6a6afdef4c28f679dbe7f2
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
|