==============================
Memory Layout on AArch64 Linux
==============================

Author: Catalin Marinas <catalin.marinas@arm.com>

This document describes the virtual memory layout used by the AArch64
Linux kernel. The architecture allows up to 4 levels of translation
tables with a 4KB page size and up to 3 levels with a 64KB page size.

AArch64 Linux uses either 3 levels or 4 levels of translation tables
with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
(256TB) virtual addresses, respectively, for both user and kernel. With
64KB pages, only 2 levels of translation tables are used, allowing
42-bit (4TB) virtual addresses, but the memory layout is the same.

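The address widths quoted above can be reproduced with a little
arithmetic. The following is a minimal sketch, assuming that every
translation table fills exactly one page and that descriptors are
8 bytes wide; the helper names ``va_bits`` and ``va_space_bytes`` are
illustrative and do not exist in the kernel.

.. code-block:: c

   #include <stdint.h>

   /*
    * Assumption: one table occupies one page of 8-byte descriptors, so
    * each level resolves (page_shift - 3) bits of the virtual address.
    */
   static unsigned int va_bits(unsigned int page_shift, unsigned int levels)
   {
       return page_shift + levels * (page_shift - 3);
   }

   /* Size in bytes of an address space that is `bits` wide (bits < 64). */
   static uint64_t va_space_bytes(unsigned int bits)
   {
       return UINT64_C(1) << bits;
   }

With these helpers, ``va_bits(12, 3)`` gives 39, ``va_bits(12, 4)``
gives 48 and ``va_bits(16, 2)`` gives 42. The formula is an upper bound
only: a top-level table may be partially populated, as in the 64KB
lookup diagram further down, where the L1 index covers fewer than 13
bits.
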
ARMv8.2 adds optional support for Large Virtual Address space. This is
only available when running with a 64KB page size and expands the
number of descriptors in the first level of translation.

User addresses have bits 63:48 set to 0 while the kernel addresses have
the same bits set to 1. TTBRx selection is given by bit 63 of the
virtual address. The swapper_pg_dir contains only kernel (global)
mappings while the user pgd contains only user (non-global) mappings.
The swapper_pg_dir address is written to TTBR1 and never written to
TTBR0.

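The rule above can be expressed as a few bit tests. This is a sketch for
the 48-bit configuration only; the function names are hypothetical and
are not kernel interfaces.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Bit 63 selects the table base register: 0 -> TTBR0, 1 -> TTBR1. */
   static unsigned int ttbr_for_va(uint64_t va)
   {
       return (unsigned int)(va >> 63);
   }

   /* With 48-bit VAs, a user address has bits 63:48 all clear. */
   static bool is_user_va48(uint64_t va)
   {
       return (va >> 48) == 0;
   }

   /* With 48-bit VAs, a kernel address has bits 63:48 all set. */
   static bool is_kernel_va48(uint64_t va)
   {
       return (va >> 48) == 0xffff;
   }

An address for which both predicates return false, such as
``0x0001000000000000``, lies outside both halves of the 48-bit address
space.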

AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::

  Start			End			Size		Use
  -----------------------------------------------------------------------
  0000000000000000	0000ffffffffffff	 256TB		user
  ffff000000000000	ffff7fffffffffff	 128TB		kernel logical memory map
  ffff800000000000	ffff9fffffffffff	  32TB		kasan shadow region
  ffffa00000000000	ffffa00007ffffff	 128MB		bpf jit region
  ffffa00008000000	ffffa0000fffffff	 128MB		modules
  ffffa00010000000	fffffdffbffeffff	 ~93TB		vmalloc
  fffffdffbfff0000	fffffdfffe5f8fff	~998MB		[guard region]
  fffffdfffe5f9000	fffffdfffe9fffff	4124KB		fixed mappings
  fffffdfffea00000	fffffdfffebfffff	   2MB		[guard region]
  fffffdfffec00000	fffffdffffbfffff	  16MB		PCI I/O space
  fffffdffffc00000	fffffdffffdfffff	   2MB		[guard region]
  fffffdffffe00000	ffffffffffdfffff	   2TB		vmemmap
  ffffffffffe00000	ffffffffffffffff	   2MB		[guard region]

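The Size column follows directly from the Start and End columns, since
End is the last byte of each region. A small sketch for checking a row
(``region_size`` is an illustrative helper, not a kernel function):

.. code-block:: c

   #include <stdint.h>

   /*
    * Size in bytes of the inclusive range [start, end]. Assumes the
    * range does not cover the entire 64-bit space, which would overflow.
    */
   static uint64_t region_size(uint64_t start, uint64_t end)
   {
       return end - start + 1;
   }

For the fixed mappings row this yields 4124KB, and for the kasan shadow
row 32TB. The entries marked with ``~`` are approximate; the vmalloc
row, for example, works out to a little under 94TB.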

AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::

  Start			End			Size		Use
  -----------------------------------------------------------------------
  0000000000000000	000fffffffffffff	   4PB		user
  fff0000000000000	fff7ffffffffffff	   2PB		kernel logical memory map
  fff8000000000000	fffd9fffffffffff	1440TB		[gap]
  fffda00000000000	ffff9fffffffffff	 512TB		kasan shadow region
  ffffa00000000000	ffffa00007ffffff	 128MB		bpf jit region
  ffffa00008000000	ffffa0000fffffff	 128MB		modules
  ffffa00010000000	fffff81ffffeffff	 ~88TB		vmalloc
  fffff81fffff0000	fffffc1ffe58ffff	  ~3TB		[guard region]
  fffffc1ffe590000	fffffc1ffe9fffff	4544KB		fixed mappings
  fffffc1ffea00000	fffffc1ffebfffff	   2MB		[guard region]
  fffffc1ffec00000	fffffc1fffbfffff	  16MB		PCI I/O space
  fffffc1fffc00000	fffffc1fffdfffff	   2MB		[guard region]
  fffffc1fffe00000	ffffffffffdfffff	3968GB		vmemmap
  ffffffffffe00000	ffffffffffffffff	   2MB		[guard region]


Translation table lookup with 4KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |         |         |         |         |
   |                 |         |         |         |         v
   |                 |         |         |         |   [11:0]  in-page offset
   |                 |         |         |         +-> [20:12] L3 index
   |                 |         |         +-----------> [29:21] L2 index
   |                 |         +---------------------> [38:30] L1 index
   |                 +-------------------------------> [47:39] L0 index
   +-------------------------------------------------> [63] TTBR0/1

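The 4KB diagram translates into shifts and masks as sketched below. The
structure and function names are made up for illustration; the bit
ranges are taken from the diagram.

.. code-block:: c

   #include <stdint.h>

   struct va_fields_4k {
       unsigned int ttbr;   /* bit 63 */
       unsigned int l0;     /* bits 47:39 */
       unsigned int l1;     /* bits 38:30 */
       unsigned int l2;     /* bits 29:21 */
       unsigned int l3;     /* bits 20:12 */
       unsigned int offset; /* bits 11:0 */
   };

   /* Split a virtual address into its 4KB-page lookup fields. */
   static struct va_fields_4k split_va_4k(uint64_t va)
   {
       struct va_fields_4k f;

       f.ttbr   = (unsigned int)(va >> 63);
       f.l0     = (unsigned int)((va >> 39) & 0x1ff); /* 9-bit index */
       f.l1     = (unsigned int)((va >> 30) & 0x1ff);
       f.l2     = (unsigned int)((va >> 21) & 0x1ff);
       f.l3     = (unsigned int)((va >> 12) & 0x1ff);
       f.offset = (unsigned int)(va & 0xfff);         /* 12-bit offset */
       return f;
   }

For example, the start of the modules region in the 48-bit layout,
``0xffffa00008000000``, splits into TTBR1, L0 index 320, L1 index 0,
L2 index 64, L3 index 0 and offset 0.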

Translation table lookup with 64KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |    |               |              |
   |                 |    |               |              v
   |                 |    |               |            [15:0]  in-page offset
   |                 |    |               +----------> [28:16] L3 index
   |                 |    +--------------------------> [41:29] L2 index
   |                 +-------------------------------> [47:42] L1 index (48-bit)
   |                                                   [51:42] L1 index (52-bit)
   +-------------------------------------------------> [63] TTBR0/1

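For 64KB pages only the width of the L1 index depends on the VA size.
The sketch below takes that width as a parameter; all function names
are illustrative, and ``va_bits`` is assumed to be either 48 or 52.

.. code-block:: c

   #include <stdint.h>

   /* L1 index: bits [47:42] for 48-bit VAs, bits [51:42] for 52-bit VAs. */
   static unsigned int l1_index_64k(uint64_t va, unsigned int va_bits)
   {
       uint64_t mask = (UINT64_C(1) << (va_bits - 42)) - 1;

       return (unsigned int)((va >> 42) & mask);
   }

   /* L2 index: bits [41:29], 13 bits wide. */
   static unsigned int l2_index_64k(uint64_t va)
   {
       return (unsigned int)((va >> 29) & 0x1fff);
   }

   /* L3 index: bits [28:16], 13 bits wide. */
   static unsigned int l3_index_64k(uint64_t va)
   {
       return (unsigned int)((va >> 16) & 0x1fff);
   }

   /* In-page offset: bits [15:0]. */
   static unsigned int page_offset_64k(uint64_t va)
   {
       return (unsigned int)(va & 0xffff);
   }

The top-level table therefore holds 64 entries in the 48-bit case and
1024 entries in the 52-bit case, which is the expansion of the first
level of translation that the Large Virtual Address feature provides.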

When using KVM without the Virtualization Host Extensions, the
hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 get mapped next to the HYP idmap page, as do vectors when
ARM64_SPECTRE_V3A is enabled for particular CPUs.

When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.

52-bit VA support in the kernel
-------------------------------
If the ARMv8.2-LVA optional feature is present, and we are running
with a 64KB page size, then it is possible to use 52 bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

This fallback mechanism requires the kernel .text to be placed in the
higher addresses so that its addresses are invariant to 48/52-bit VAs.
Because the kasan shadow is a fraction of the entire kernel VA space,
the end of the kasan shadow must also be in the higher half of the
kernel VA space for both 48/52-bit. (When switching from 48-bit to
52-bit, the end of the kasan shadow is invariant and dependent on ~0UL,
whilst the start address will "grow" towards the lower addresses.)

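The two layout tables above illustrate this: the shadow region ends at
the same address in both, and only its start moves. A minimal sketch
that reproduces the table values, assuming a shadow scale of one shadow
byte per 8 bytes of kernel VA space (``SHADOW_SCALE_SHIFT``,
``SHADOW_END`` and the functions are illustrative names, not the
kernel's own definitions):

.. code-block:: c

   #include <stdint.h>

   /* Assumption: one shadow byte covers 8 bytes of kernel VA space. */
   #define SHADOW_SCALE_SHIFT 3

   /* First address past the shadow region, as read from both tables. */
   #define SHADOW_END UINT64_C(0xffffa00000000000)

   /* The shadow covers the whole kernel VA space, scaled down. */
   static uint64_t shadow_size(unsigned int va_bits)
   {
       return UINT64_C(1) << (va_bits - SHADOW_SCALE_SHIFT);
   }

   /* The end is fixed, so a larger VA space moves the start downwards. */
   static uint64_t shadow_start(unsigned int va_bits)
   {
       return SHADOW_END - shadow_size(va_bits);
   }

``shadow_start(48)`` evaluates to ``0xffff800000000000`` and
``shadow_start(52)`` to ``0xfffda00000000000``, matching the 32TB and
512TB rows of the tables.
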
In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
is kept constant at 0xFFF0000000000000 (corresponding to 52-bit);
this obviates the need for an extra variable read. The physvirt
offset and vmemmap offsets are computed at early boot to enable
this logic.

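As a rough illustration of the idea, and not the kernel's actual
implementation, the conversion can be modelled as a single offset that
is computed once and then applied with one addition or subtraction. All
``sketch_`` names and the example physical base address are
hypothetical.

.. code-block:: c

   #include <stdint.h>

   /* Constant linear map base, as stated in the text above. */
   #define SKETCH_PAGE_OFFSET UINT64_C(0xFFF0000000000000)

   /* Hypothetical stand-in for the offset computed at early boot. */
   static uint64_t sketch_physvirt_offset;

   /* Record where physical memory starts; done once during boot. */
   static void sketch_init_offsets(uint64_t phys_base)
   {
       /* Unsigned arithmetic wraps modulo 2^64, which is intended. */
       sketch_physvirt_offset = phys_base - SKETCH_PAGE_OFFSET;
   }

   static uint64_t sketch_phys_to_virt(uint64_t pa)
   {
       return pa - sketch_physvirt_offset;
   }

   static uint64_t sketch_virt_to_phys(uint64_t va)
   {
       return va + sketch_physvirt_offset;
   }

With a physical base of ``0x80000000``, physical address ``0x80001000``
maps to ``0xfff0000000001000``, and converting back returns the original
physical address.
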
As a single binary will need to support both 48-bit and 52-bit VA
spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
also must be sized large enough to accommodate a fixed PAGE_OFFSET.

Most code in the kernel should not need to consider VA_BITS. For
code that does need to know the VA size, the variables are
defined as follows:

VA_BITS		constant	the *maximum* VA space size

VA_BITS_MIN	constant	the *minimum* VA space size

vabits_actual	variable	the *actual* VA space size


Maximum and minimum sizes can be useful to ensure that buffers are
sized large enough or that addresses are positioned close enough for
the "worst" case.

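The following sketch shows one way the three values might be used
together. The constant values assume a kernel built with 52-bit support,
the initial value of ``vabits_actual`` is only a placeholder for what
would be probed at boot, and the helper functions are hypothetical.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   #define VA_BITS     52 /* assumed: kernel built with 52-bit support */
   #define VA_BITS_MIN 48

   /* Placeholder; a real kernel sets this from the hardware at boot. */
   static unsigned int vabits_actual = VA_BITS_MIN;

   /* Lowest kernel virtual address for a given VA width. */
   static uint64_t kernel_va_base(unsigned int bits)
   {
       return ~UINT64_C(0) << bits;
   }

   /* Worst case: usable whichever VA size is selected at boot. */
   static bool va_ok_for_all_sizes(uint64_t va)
   {
       return va >= kernel_va_base(VA_BITS_MIN);
   }

   /* Usable with the VA size the running system actually selected. */
   static bool va_ok_for_running_size(uint64_t va)
   {
       return va >= kernel_va_base(vabits_actual);
   }

An address such as ``0xfff8000000000000`` passes the second check only
when ``vabits_actual`` is 52, whereas ``0xffffa00008000000`` passes the
worst-case check and is therefore safe in either configuration.
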
52-bit userspace VAs
--------------------
To maintain compatibility with software that relies on the ARMv8.0
VA space maximum size of 48 bits, the kernel will, by default,
return virtual addresses to userspace from a 48-bit range.

Software can "opt-in" to receiving VAs from a 52-bit space by
specifying an mmap hint parameter that is larger than 48 bits.

For example:

.. code-block:: c

   /* The hint is the first (address) argument, so it must be a pointer. */
   maybe_high_address = mmap((void *)~0UL, size, prot, flags, ...);

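A more complete userspace sketch of the same call is shown below,
assuming an anonymous private mapping. The helper names are
illustrative; whether the returned address actually lies above the
48-bit range depends on the hardware and kernel configuration, so the
caller should check rather than assume.

.. code-block:: c

   #define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= modes */
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <sys/mman.h>

   /*
    * Request anonymous memory with a hint above the 48-bit range.
    * Returns MAP_FAILED on error, like mmap() itself.
    */
   static void *map_with_high_hint(size_t size)
   {
       return mmap((void *)~0UL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   }

   /* True if the address does not fit in 48 bits. */
   static bool above_48_bits(const void *addr)
   {
       return ((uintptr_t)addr >> 48) != 0;
   }

A caller would test the result against ``MAP_FAILED`` first and then
use ``above_48_bits()`` to find out whether a high address was handed
out. On systems without 52-bit support the call is still expected to
succeed and return an address from the 48-bit range.
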
It is also possible to build a debug kernel that returns addresses
from a 52-bit space by enabling the following kernel config options:

.. code-block:: sh

   CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y

Note that this option is only intended for debugging applications
and should not be used in production.