| 89dba82d | 22-Jan-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cpus): make reset errata do fewer branches
Errata application is painful for performance. For a start, it's done when the core has just come out of reset, which means branch predictors and caches will be empty so a branch to a workaround function must be fetched from memory and that round trip is very slow. Then it also runs with the I-cache off, which means that the loop to iterate over the workarounds must also be fetched from memory on each iteration.
We can remove both branches. First, we can simply apply every erratum directly instead of defining a workaround function and jumping to it. Currently, no erratum needs to be applied at both reset and runtime with the same workaround function. If the need arises in future, this should be achievable with a reset + runtime wrapper combo.
Then, we can construct a function that applies each erratum linearly instead of looping over the list. If this function is part of the reset function, then the only "far" branches at reset will be for the checker functions. Importantly, this mitigates the slowdown even when an erratum is disabled.
The result is ~50% speedup on N1SDP and ~20% on AArch64 Juno on wakeup from PSCI calls that end in powerdown. This is roughly back to the baseline of v2.9, before the errata framework regressed on performance (or a little better). It is important to note that there are other slowdowns since then that remain unknown.
Change-Id: Ie4d5288a331b11fd648e5c4a0b652b74160b07b9
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
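The two shapes described in this commit message can be contrasted with a minimal C sketch. It is not the actual assembly from the change: the checker and workaround names, the revision thresholds, and the simulated register `sim_reg` are all hypothetical, chosen only to show a table walk with an indirect branch per erratum next to a straight-line function with the workaround bodies applied in place.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated control register that the workarounds modify. */
static uint64_t sim_reg;

/* Hypothetical checkers: these remain separate ("far") functions. */
static int check_erratum_a(unsigned int rev) { return rev <= 0x10; }
static int check_erratum_b(unsigned int rev) { return rev == 0x00; }

/* Hypothetical out-of-line workarounds, used only by the looped variant. */
static void workaround_a(void) { sim_reg |= 1u << 0; }
static void workaround_b(void) { sim_reg |= 1u << 1; }

struct erratum {
    int (*check)(unsigned int rev);
    void (*apply)(void);
};

static const struct erratum errata_list[] = {
    { check_erratum_a, workaround_a },
    { check_erratum_b, workaround_b },
};

/* Before: iterate over a list and branch to each workaround function. */
uint64_t reset_errata_looped(unsigned int rev)
{
    sim_reg = 0;
    for (size_t i = 0; i < sizeof(errata_list) / sizeof(errata_list[0]); i++) {
        if (errata_list[i].check(rev)) {
            errata_list[i].apply();
        }
    }
    return sim_reg;
}

/* After: no loop, and each workaround body is applied directly. */
uint64_t reset_errata_linear(unsigned int rev)
{
    sim_reg = 0;
    if (check_erratum_a(rev)) {
        sim_reg |= 1u << 0;
    }
    if (check_erratum_b(rev)) {
        sim_reg |= 1u << 1;
    }
    return sim_reg;
}
```

Both variants leave the simulated register in the same state for any given revision; only the control flow differs. In the linear form the only calls left are to the checkers, which matches the "far" branches the message says remain.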
|
| 6bb96fa6 | 27-Jan-2023 |
Boyan Karatotev <boyan.karatotev@arm.com> |
refactor(cpus): rename errata_report.h to errata.h
The ERRATA_XXX macros, used in cpu_helpers.S, are necessary for the check_errata_xxx family of functions. The CPU_REV should be used in the cpu files but for whatever reason the values have been hard-coded so far (at the cost of readability). It's evident this file is not strictly for status reporting.
The new purpose of this file is to be a one-stop shop for all things errata.
Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
Change-Id: I1ce22dd36df5aa0bcfc5f2772251f91af8703dfb
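The readability point about hard-coded revision values can be shown with a short C sketch. The macro definitions below are assumptions made for the example (a revision packed as major in the upper nibble and minor in the lower nibble, and two result codes), and `check_erratum_ls` is a hypothetical helper, not a function named in the commit.

```c
/* Assumed result codes for the example; the real values live in errata.h. */
#define ERRATA_NOT_APPLIES 0
#define ERRATA_APPLIES     1

/* Assumed encoding: rXpY packed as (X << 4) | Y. */
#define CPU_REV(r, p) (((r) << 4) | (p))

/*
 * Hypothetical checker: the erratum applies to every revision up to and
 * including max_rev.
 */
int check_erratum_ls(unsigned int cpu_rev, unsigned int max_rev)
{
    return (cpu_rev <= max_rev) ? ERRATA_APPLIES : ERRATA_NOT_APPLIES;
}
```

Under these assumptions, `check_erratum_ls(rev, CPU_REV(1, 0))` reads as "applies up to r1p0", whereas the equivalent hard-coded `check_erratum_ls(rev, 0x10)` leaves the reader to decode the constant.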
|
| 5668db72 | 12-Jan-2023 |
Andrew Davis <afd@ti.com> |
feat(ti): set snoop-delayed exclusive handling on A72 cores
Snoop requests should not be responded to during atomic operations. This can be handled by the interconnect using its global monitor or by the core's SCU delaying to check for the corresponding atomic monitor state.
TI SoCs take the second approach. Set the snoop-delayed exclusive handling bit to inform the core it needs to delay responses to perform this check.
As J784s4 is currently the only SoC with multiple A72 clusters, limit this delay to only that device.
Signed-off-by: Andrew Davis <afd@ti.com>
Change-Id: I875f64e4f53d47a9a0ccbf3415edc565be7f84d9
|
| 81858a35 | 10-Jan-2023 |
Andrew Davis <afd@ti.com> |
feat(ti): set L2 cache ECC and parity on A72 cores
The Cortex-A72 based cores on K3 platforms have cache ECC and parity protection; enable these.
Signed-off-by: Andrew Davis <afd@ti.com>
Change-Id: Icd00bc4aa9c1c48f0fb2a10ea66e75e0b146ef3c
|