docs/components/numa-per-cpu.rst

16 example, PSCI or SPM context) is stored in a global array or contiguous region,
17 usually located in the memory of a single node. This approach introduces two key
18 issues in multi-node systems:
23   limits scalability in systems where each node has limited local memory.
26      :alt: Diagram showing the BL31 binary section layout in TF-A within local
34   *Figure: Typical BL31/BL32 binary storage in local memory*
46 platforms place them in the nodes with the lowest access latency.
52 **allocating**, **defining**, and **accessing** per-CPU data in a NUMA-aware
54 platforms while optimizing for performance in multi-node systems.
60 to **allocate** per-CPU global variables and ensure that these objects reside in
62 objects are allocated in the local memory of their respective nodes.
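The sketch below illustrates per-node placement. It computes the address of a per-CPU object when each node keeps its own copy of the per-CPU region in local memory. Everything here is assumed for illustration and is not the TF-A API:

- the names ``SKETCH_NODE_COUNT``, ``node_percpu_base`` and ``percpu_addr``;
- the base addresses;
- CPUs being numbered contiguously, node by node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical topology used only for this sketch. */
#define SKETCH_NODE_COUNT     2U
#define SKETCH_CPUS_PER_NODE  4U
#define SKETCH_CACHE_LINE     64U

/* Hypothetical per-CPU data; a real image would place this in a dedicated
 * linker section rather than describe it with a C struct. */
struct percpu_unit {
	uint64_t psci_state;
	uint64_t spm_state;
};

/* Round each CPU's slot up to whole cache lines so neighbouring CPUs never
 * share a line. */
#define PERCPU_STRIDE \
	((sizeof(struct percpu_unit) + SKETCH_CACHE_LINE - 1U) & \
	 ~(size_t)(SKETCH_CACHE_LINE - 1U))

/* Hypothetical base of the per-CPU region in each node's local memory. */
static const uint64_t node_percpu_base[SKETCH_NODE_COUNT] = {
	0x80000000ULL,
	0x880000000ULL,
};

/* Address of the object at offset_in_unit inside CPU 'cpu''s slot, taken
 * from the copy that lives in that CPU's own node. */
static uint64_t percpu_addr(unsigned int cpu, size_t offset_in_unit)
{
	unsigned int node = cpu / SKETCH_CPUS_PER_NODE;
	unsigned int local_idx = cpu % SKETCH_CPUS_PER_NODE;

	assert(node < SKETCH_NODE_COUNT);
	assert(offset_in_unit < sizeof(struct percpu_unit));
	return node_percpu_base[node] +
	       (uint64_t)local_idx * PERCPU_STRIDE + offset_in_unit;
}
```

With this layout, CPU 5 resolves to the second slot of node 1's region. Its accesses therefore stay in node-local memory.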
75 *Figure: BL31/BL32 binary storage in the local memory of each node when per-CPU NUMA
82 This linker section also addresses a common performance issue in modern
85 accessed variables may be logically independent, their proximity in memory can
86 result in repeated cache invalidations and reloads. Cache-coherency mechanisms
101 *Figure: Two processors modifying different variables placed too closely in
116 per-CPU objects efficiently in multi-node systems.
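The false-sharing effect described for closely placed variables can be shown with a small C sketch. It assumes a 64-byte cache line (real code should use the line size the CPU reports). It compares two per-CPU counters packed next to each other with the same counters padded to one line each. The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64U  /* assumed line size for this sketch */

/* Packed: both counters fall in the same cache line, so a write by one CPU
 * invalidates the other CPU's cached copy (false sharing). */
struct counters_packed {
	uint64_t cpu0;
	uint64_t cpu1;
};

/* Padded: alignment forces each counter onto its own cache line. */
struct padded_counter {
	_Alignas(CACHE_LINE_BYTES) uint64_t value;
};

struct counters_padded {
	struct padded_counter cpu[2];
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE_BYTES,
	      "each padded counter occupies exactly one cache line");

/* Cache-line index of a byte offset, assuming the enclosing object starts
 * on a line boundary. */
static size_t line_of(size_t byte_offset)
{
	return byte_offset / CACHE_LINE_BYTES;
}
```

The padded layout costs memory (64 bytes per 8-byte counter) in exchange for removing coherency traffic between CPUs. This trade-off is why per-CPU objects are usually separated rather than packed.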
132 efficiently in multi-node systems.
143 For use in assembly routines, a corresponding macro version is provided:
148 duplicating addressing logic in assembly files.
154 requirements in order for the runtime to correctly set up per-CPU sections on
160 Set ``PLATFORM_NODE_COUNT`` to 2 or greater in the platform
168 default ``PLATFORM_NODE_COUNT`` is 1. The NUMA framework is not supported in
182 - encode them in platform-specific tables compiled into the image.
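A platform that encodes node information in tables compiled into the image might pair the table with a runtime record of which nodes were actually found. The sketch below is illustrative only. ``SKETCH_NODE_COUNT`` stands in for ``PLATFORM_NODE_COUNT``, and the following are all hypothetical:

- the addresses;
- the ``nodes_present`` bitmap;
- the ``sketch_*`` helpers;
- returning 0 for a missing node.

None of these are TF-A's actual hook signatures or contract.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for PLATFORM_NODE_COUNT in this sketch (assumed value). */
#define SKETCH_NODE_COUNT 4U

/* Hypothetical table compiled into the image: base of the per-CPU region
 * in each node's local memory. Addresses are made up. */
static const uint64_t node_percpu_base[SKETCH_NODE_COUNT] = {
	0x0080000000ULL,
	0x0880000000ULL,
	0x1080000000ULL,
	0x1880000000ULL,
};

/* Nodes detected at runtime: bit n set means node n is populated. */
static uint32_t nodes_present;

/* Hypothetical early-boot helper that records a detected node. */
static void sketch_mark_node_present(unsigned int node)
{
	assert(node < SKETCH_NODE_COUNT);
	nodes_present |= 1U << node;
}

/* Hypothetical lookup: per-CPU base for a node, or 0 if the index is out of
 * range or the node was not populated at runtime (a sentinel chosen for this
 * sketch only). */
static uint64_t sketch_node_percpu_base(unsigned int node)
{
	if (node >= SKETCH_NODE_COUNT || (nodes_present & (1U << node)) == 0U) {
		return 0U;
	}
	return node_percpu_base[node];
}
```

Keeping the static table separate from the runtime bitmap allows one image to boot on systems where some described nodes are absent. Callers then only need to check for the sentinel.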
184 If a node described in platform data is not populated at runtime, the hooks may
217   coherency early in the boot process.