| 2590e819 | 25-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(mpmm): greatly simplify MPMM enablement
MPMM is a core-specific microarchitectural feature. It has been present in every Arm core since the Cortex-A510 and has been implemented in exactly the s
perf(mpmm): greatly simplify MPMM enablement
MPMM is a core-specific microarchitectural feature. It has been present in every Arm core since the Cortex-A510 and has been implemented in exactly the same way. Despite that, it is enabled more like an architectural feature with a top level enable flag. This utilised the identical implementation.
This duality has left MPMM in an awkward place, where its enablement should be generic, like an architectural feature, but since it is not, it should also be core-specific if it ever changes. One choice to do this has been through the device tree.
This has worked just fine so far, however, recent implementations expose a weakness in that this is rather slow - the device tree has to be read, there's a long call stack of functions with many branches, and system registers are read. In the hot path of PSCI CPU powerdown, this has a significant and measurable impact. Besides it being a rather large amount of code that is difficult to understand.
Since MPMM is a microarchitectural feature, its correct placement is in the reset function. The essence of the current enablement is to write CPUPPMCR_EL3.MPMM_EN if CPUPPMCR_EL3.MPMMPINCTL == 0. Replacing the C enablement with an assembly macro in each CPU's reset function achieves the same effect with just a single close branch and a grand total of 6 instructions (versus the old 2 branches and 32 instructions).
Having done this, the device tree entry becomes redundant. Should a core that doesn't support MPMM arise, this can cleanly be handled in the reset function. As such, the whole ENABLE_MPMM_FCONF and platform hooks mechanisms become obsolete and are removed.
Change-Id: I1d0475b21a1625bb3519f513ba109284f973ffdf Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| a8a5d39d | 24-Feb-2025 |
Manish V Badarkhe <manish.badarkhe@arm.com> |
Merge changes from topic "bk/errata_speed" into integration
* changes: refactor(cpus): declare runtime errata correctly perf(cpus): make reset errata do fewer branches perf(cpus): inline the i
Merge changes from topic "bk/errata_speed" into integration
* changes: refactor(cpus): declare runtime errata correctly perf(cpus): make reset errata do fewer branches perf(cpus): inline the init_cpu_data_ptr function perf(cpus): inline the reset function perf(cpus): inline the cpu_get_rev_var call perf(cpus): inline cpu_rev_var checks refactor(cpus): register DSU errata with the errata framework's wrappers refactor(cpus): convert checker functions to standard helpers refactor(cpus): convert the Cortex-A65 to use the errata framework fix(cpus): declare reset errata correctly
show more ...
|
| 89dba82d | 22-Jan-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cpus): make reset errata do fewer branches
Errata application is painful for performance. For a start, it's done when the core has just come out of reset, which means branch predictors and cach
perf(cpus): make reset errata do fewer branches
Errata application is painful for performance. For a start, it's done when the core has just come out of reset, which means branch predictors and caches will be empty so a branch to a workaround function must be fetched from memory and that round trip is very slow. Then it also runs with the I-cache off, which means that the loop to iterate over the workarounds must also be fetched from memory on each iteration.
We can remove both branches. First, we can simply apply every erratum directly instead of defining a workaround function and jumping to it. Currently, no errata that need to be applied at both reset and runtime, with the same workaround function, exist. If the need arose in future, this should be achievable with a reset + runtime wrapper combo.
Then, we can construct a function that applies each erratum linearly instead of looping over the list. If this function is part of the reset function, then the only "far" branches at reset will be for the checker functions. Importantly, this mitigates the slowdown even when an erratum is disabled.
The result is ~50% speedup on N1SDP and ~20% on AArch64 Juno on wakeup from PSCI calls that end in powerdown. This is roughly back to the baseline of v2.9, before the errata framework regressed on performance (or a little better). It is important to note that there are other slowdowns since then that remain unknown.
Change-Id: Ie4d5288a331b11fd648e5c4a0b652b74160b07b9 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| b07c317f | 19-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cpus): inline the init_cpu_data_ptr function
Similar to the reset function inline, inline this too to not do a costly branch with no extra cost.
Change-Id: I54cc399e570e9d0f373ae13c7224d32dbdf
perf(cpus): inline the init_cpu_data_ptr function
Similar to the reset function inline, inline this too to not do a costly branch with no extra cost.
Change-Id: I54cc399e570e9d0f373ae13c7224d32dbdfae1e5 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 0d020822 | 19-Nov-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cpus): inline the reset function
Similar to the cpu_rev_var and cpu_ger_rev_var functions, inline the call_reset_handler handler. This way we skip the costly branch at no extra cost as this is
perf(cpus): inline the reset function
Similar to the cpu_rev_var and cpu_ger_rev_var functions, inline the call_reset_handler handler. This way we skip the costly branch at no extra cost as this is the only place where this is called.
While we're at it, drop the options for CPU_NO_RESET_FUNC. The only cpus that need that are virtual cpus which can spare the tiny bit of performance lost. The rest are real cores which can save on the check for zero.
Now is a good time to put the assert for a missing cpu in the get_cpu_ops_ptr function so that it's a bit better encapsulated.
Change-Id: Ia7c3dcd13b75e5d7c8bafad4698994ea65f42406 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 41ae0473 | 03-Feb-2025 |
Sona Mathew <sonarebecca.mathew@arm.com> |
fix(rmm): add support for BRBCR_EL2 register for feat_brbe
Currently BRBE is being disabled for Realm world in EL3 by switching the SBRBE bit in mdcr_el3 register to 0b00. The patch removes the swit
fix(rmm): add support for BRBCR_EL2 register for feat_brbe
Currently BRBE is being disabled for Realm world in EL3 by switching the SBRBE bit in mdcr_el3 register to 0b00. The patch removes the switching of SBRBE bits, and adds context switch of BRBCR_EL2 register.
Change-Id: I66ca13edefc37e40fa265fd438b0b66f7d09b4bb Signed-off-by: Sona Mathew <sonarebecca.mathew@arm.com>
show more ...
|
| 36eeb59f | 04-Dec-2024 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cpus): inline the cpu_get_rev_var call
Similar to the cpu_rev_var_xy functions, branching far away so early in the reset sequence incurs significant slowdowns. Inline the function.
Change-Id:
perf(cpus): inline the cpu_get_rev_var call
Similar to the cpu_rev_var_xy functions, branching far away so early in the reset sequence incurs significant slowdowns. Inline the function.
Change-Id: Ifc349015902cd803e11a1946208141bfe7606b89 Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 7791ce21 | 21-Jan-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
perf(cpus): inline cpu_rev_var checks
We strive to apply errata as close to reset as possible with as few things enabled as possible. Importantly, the I-cache will not be enabled. This means that re
perf(cpus): inline cpu_rev_var checks
We strive to apply errata as close to reset as possible with as few things enabled as possible. Importantly, the I-cache will not be enabled. This means that repeated branches to these tiny functions must be re-fetched all the way from memory each time which has glacial speed. Cores are allowed to fetch things ahead of time though as long as execution is fairly linear. So we can trade a little bit of space (3 to 7 instructions per erratum) to keep things linear and not have to go to memory.
While we're at it, optimise the the cpu_rev_var_{ls, hs, range} functions to take up less space. Dropping the moves allows for a bit of assembly magic that produces the same result in 2 and 3 instructions respectively.
Change-Id: I51608352f23b2244ea7a99e76c10892d257f12bf Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| b62673c6 | 23-Jan-2025 |
Boyan Karatotev <boyan.karatotev@arm.com> |
refactor(cpus): register DSU errata with the errata framework's wrappers
The existing DSU errata workarounds hijack the errata framework's inner workings to register with it. However, that is undesi
refactor(cpus): register DSU errata with the errata framework's wrappers
The existing DSU errata workarounds hijack the errata framework's inner workings to register with it. However, that is undesirable as any change to the framework may end up missing these workarounds. So convert the checks and workarounds to macros and have them included with the standard wrappers.
The only problem with this is the is_scu_present_in_dsu weak function. Fortunately, it is only needed for 2 of the errata and only on 3 cores. So drop it, assuming the default behaviour and have the callers handle the exception.
Change-Id: Iefa36325804ea093e938f867b9a6f49a6984b8ae Signed-off-by: Boyan Karatotev <boyan.karatotev@arm.com>
show more ...
|
| 99b2ae26 | 20-Feb-2025 |
Manish V Badarkhe <manish.badarkhe@arm.com> |
Merge changes from topic "jw/gic-lca-support" into integration
* changes: fix(rdn2): add LCA multichip data for RD-N2-Cfg2 fix(rdv3): add LCA multichip data for RD-V3-Cfg2 feat(gic): add suppo
Merge changes from topic "jw/gic-lca-support" into integration
* changes: fix(rdn2): add LCA multichip data for RD-N2-Cfg2 fix(rdv3): add LCA multichip data for RD-V3-Cfg2 feat(gic): add support for local chip addressing
show more ...
|
| cf6d73d4 | 28-Jan-2025 |
Ghennadi Procopciuc <ghennadi.procopciuc@nxp.com> |
feat(nxp-clk): add clock modules for uSDHC
One of the uSDHC module's clock lines is attached to the CGM_MUX 14 divider, which is connected to PERIPH_DFS3. The other one is attached to XBAR_DIV3.
feat(nxp-clk): add clock modules for uSDHC
One of the uSDHC module's clock lines is attached to the CGM_MUX 14 divider, which is connected to PERIPH_DFS3. The other one is attached to XBAR_DIV3.
Change-Id: I23f468a3e5f7daa832c0841b55211a048284a7f0 Signed-off-by: Ghennadi Procopciuc <ghennadi.procopciuc@nxp.com>
show more ...
|
| 63d536fe | 23-Jan-2025 |
Ghennadi Procopciuc <ghennadi.procopciuc@nxp.com> |
feat(nxp-clk): add clock objects for CGM dividers
The CGM dividers are controllable dividers attached to a CGM mux. Its divison factor can be controlled through the MC_CGM's registers.
Change-Id: I
feat(nxp-clk): add clock objects for CGM dividers
The CGM dividers are controllable dividers attached to a CGM mux. Its divison factor can be controlled through the MC_CGM's registers.
Change-Id: Id2786a46c5a1d389ca32a4839c7158949aec3b0a Signed-off-by: Ghennadi Procopciuc <ghennadi.procopciuc@nxp.com>
show more ...
|
| 29f8a952 | 20-Jan-2025 |
Ghennadi Procopciuc <ghennadi.procopciuc@nxp.com> |
feat(nxp-clk): add base address for PERIPH_DFS
The PERIPH_DFS module is used to clock the SD and QSPI modules.
Change-Id: I440fd806d71acab641f0003a7f2a5ce720b469c6 Signed-off-by: Ghennadi Procopciu
feat(nxp-clk): add base address for PERIPH_DFS
The PERIPH_DFS module is used to clock the SD and QSPI modules.
Change-Id: I440fd806d71acab641f0003a7f2a5ce720b469c6 Signed-off-by: Ghennadi Procopciuc <ghennadi.procopciuc@nxp.com>
show more ...
|
| 7ce483e1 | 16-Feb-2025 |
Manish V Badarkhe <Manish.Badarkhe@arm.com> |
fix(libc): remove __Nonnull type specifier
Clang's nullability completeness checks were triggered after adding the _Nonnull specifier to one function. Removing it prevents Clang from flagging atexit
fix(libc): remove __Nonnull type specifier
Clang's nullability completeness checks were triggered after adding the _Nonnull specifier to one function. Removing it prevents Clang from flagging atexit() for missing a nullability specifier.
include/lib/libc/stdlib.h:25:25: error: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Werror,-Wnullability-completeness] 25 | extern int atexit(void (*func)(void));
This change ensures compliance with the C standard while preventing unexpected build errors.
Change-Id: I2f881c55b36b692d22c3db22149c6402c32e8c3e Signed-off-by: Manish V Badarkhe <Manish.Badarkhe@arm.com>
show more ...
|
| e0be63c8 | 13-Feb-2025 |
Manish V Badarkhe <manish.badarkhe@arm.com> |
Merge changes I712712d7,I1932500e,I75dda77e,I12f3b8a3,Ia72e5900 into integration
* changes: refactor(rse)!: remove rse_comms_init refactor(arm): switch to rse_mbx_init refactor(rse): put MHU c
Merge changes I712712d7,I1932500e,I75dda77e,I12f3b8a3,Ia72e5900 into integration
* changes: refactor(rse)!: remove rse_comms_init refactor(arm): switch to rse_mbx_init refactor(rse): put MHU code in a dedicated file refactor(tc): add plat_rse_comms_init refactor(arm)!: rename PLAT_MHU_VERSION flag
show more ...
|
| b51436c2 | 20-Nov-2024 |
Levi Yun <yeoreum.yun@arm.com> |
feat(spm_mm): move mm_communication header define to general header
To support TPM start method with SIP, SIP handler dispatch request to secure partition via MM_COMMUNICATE abi. That means spm_mm s
feat(spm_mm): move mm_communication header define to general header
To support TPM start method with SIP, SIP handler dispatch request to secure partition via MM_COMMUNICATE abi. That means spm_mm sip handler should generate mm communication header.
Move mm_communication header's definition to spm_mm_svc header.
Change-Id: I40567c16e67b068ee83a39eff050d6578aecfb2c Signed-off-by: Levi Yun <yeoreum.yun@arm.com>
show more ...
|
| 6db9aac6 | 12-Feb-2025 |
Lauren Wehrmeister <lauren.wehrmeister@arm.com> |
Merge changes from topic "mb/drtm" into integration
* changes: fix(drtm): fix DLME data size check fix(drtm): sort the address-map in ascending order feat(libc): import qsort implementation |
| e1362231 | 12-Feb-2025 |
Soby Mathew <soby.mathew@arm.com> |
Merge changes from topic "memory_bank" into integration
* changes: fix(qemu): statically allocate bitlocks array feat(qemu): update for renamed struct memory_bank feat(fvp): increase GPT PPS t
Merge changes from topic "memory_bank" into integration
* changes: fix(qemu): statically allocate bitlocks array feat(qemu): update for renamed struct memory_bank feat(fvp): increase GPT PPS to 1TB feat(gpt): statically allocate bitlocks array chore(gpt): define PPS in platform header files feat(fvp): allocate L0 GPT at the top of SRAM feat(fvp): change size of PCIe memory region 2 feat(rmm): add PCIe IO info to Boot manifest feat(fvp): define single Root region
show more ...
|
| 91c7a952 | 07-Feb-2025 |
Yann Gautier <yann.gautier@st.com> |
refactor(rse)!: remove rse_comms_init
The function to use is now rse_mbx_init(), that does the same if using MHU.
Change-Id: I712712d7d1bcd8c96d26951e176b877afb65209d Signed-off-by: Yann Gautier <y
refactor(rse)!: remove rse_comms_init
The function to use is now rse_mbx_init(), that does the same if using MHU.
Change-Id: I712712d7d1bcd8c96d26951e176b877afb65209d Signed-off-by: Yann Gautier <yann.gautier@st.com>
show more ...
|
| 36416b1e | 23-Sep-2024 |
Yann Gautier <yann.gautier@st.com> |
refactor(rse): put MHU code in a dedicated file
To be able to use RSE comms without MHU, a first step is to disentangle the rse_comms.c file with MHU code direct calls. This is done with the creatio
refactor(rse): put MHU code in a dedicated file
To be able to use RSE comms without MHU, a first step is to disentangle the rse_comms.c file with MHU code direct calls. This is done with the creation of a new file rse_comms_mhu.c. New APIs are created to initialize the mailbox, get max message size and send and receive data.
Signed-off-by: Yann Gautier <yann.gautier@st.com> Change-Id: I75dda77e1886beaa6ced6f92c311617125918cfa
show more ...
|
| fcb80d7d | 11-Feb-2025 |
Manish Pandey <manish.pandey2@arm.com> |
Merge changes I765a7fa0,Ic33f0b6d,I8d1a88c7,I381f96be,I698fa849, ... into integration
* changes: fix(cpus): clear CPUPWRCTLR_EL1.CORE_PWRDN_EN_BIT on reset chore(docs): drop the "wfi" from `pwr_
Merge changes I765a7fa0,Ic33f0b6d,I8d1a88c7,I381f96be,I698fa849, ... into integration
* changes: fix(cpus): clear CPUPWRCTLR_EL1.CORE_PWRDN_EN_BIT on reset chore(docs): drop the "wfi" from `pwr_domain_pwr_down_wfi` chore(psci): drop skip_wfi variable feat(arm): convert arm platforms to expect a wakeup fix(cpus): avoid SME related loss of context on powerdown feat(psci): allow cores to wake up from powerdown refactor: panic after calling psci_power_down_wfi() refactor(cpus): undo errata mitigations feat(cpus): add sysreg_bit_toggle
show more ...
|
| a32a77f9 | 11-Feb-2025 |
Jean-Philippe Brucker <jean-philippe@linaro.org> |
fix(qemu): statically allocate bitlocks array
gpt_runtime_init() now takes the bitlock array's address and size as argument. Rather than reserving space at the end of the L0 GPT for storing bitlocks
fix(qemu): statically allocate bitlocks array
gpt_runtime_init() now takes the bitlock array's address and size as argument. Rather than reserving space at the end of the L0 GPT for storing bitlocks, allocate a static array and pass its address to gpt_runtime_init(). This frees up a little bit of space formerly reserved for alignment of the GPT.
Change-Id: I48a1a2bc230f64e13e3ed08b18ebdc2d387d77d0 Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
show more ...
|
| aeec55c8 | 05-Feb-2025 |
AlexeiFedorov <Alexei.Fedorov@arm.com> |
feat(fvp): increase GPT PPS to 1TB
- Increase PPS for FVP from 64GB to 1TB. - GPT L0 table for 1TB PPS requires 8KB memory. - Set FVP_TRUSTED_SRAM_SIZE to 384 with ENABLE_RME=1 option. - Add 256MB
feat(fvp): increase GPT PPS to 1TB
- Increase PPS for FVP from 64GB to 1TB. - GPT L0 table for 1TB PPS requires 8KB memory. - Set FVP_TRUSTED_SRAM_SIZE to 384 with ENABLE_RME=1 option. - Add 256MB of PCIe memory region 1 and 3GB of PCIe memory region 2 to FVP PAS regions array.
Change-Id: Icadd528576f53c55b5d461ff4dcd357429ba622a Signed-off-by: AlexeiFedorov <Alexei.Fedorov@arm.com>
show more ...
|
| b0f1c840 | 24-Jan-2025 |
AlexeiFedorov <Alexei.Fedorov@arm.com> |
feat(gpt): statically allocate bitlocks array
Statically allocate 'gpt_bitlock' array of fine-grained 'bitlock_t' data structures in arm_bl31_setup.c. The amount of memory needed for this array is c
feat(gpt): statically allocate bitlocks array
Statically allocate 'gpt_bitlock' array of fine-grained 'bitlock_t' data structures in arm_bl31_setup.c. The amount of memory needed for this array is controlled by 'RME_GPT_BITLOCK_BLOCK' build option and 'PLAT_ARM_PPS' macro defined in platform_def.h which specifies the size of protected physical address space in bytes. 'PLAT_ARM_PPS' takes values from 4GB to 4PB supported by Arm architecture.
Change-Id: Icf620b5039e45df6828d58fca089cad83b0bc669 Signed-off-by: AlexeiFedorov <Alexei.Fedorov@arm.com>
show more ...
|
| ac07f3ab | 22-Jan-2025 |
AlexeiFedorov <Alexei.Fedorov@arm.com> |
chore(gpt): define PPS in platform header files
Define protected physical address size in bytes PLAT_ARM_PPS macro for FVP and RDV3 in platform_def.h files.
Change-Id: I7f6529dfbb8df864091fbefc0813
chore(gpt): define PPS in platform header files
Define protected physical address size in bytes PLAT_ARM_PPS macro for FVP and RDV3 in platform_def.h files.
Change-Id: I7f6529dfbb8df864091fbefc08131a0e6d689eb6 Signed-off-by: AlexeiFedorov <Alexei.Fedorov@arm.com>
show more ...
|