.. SPDX-License-Identifier: GPL-2.0

=================
Memory Management
=================

Complete virtual memory map with 4-level page tables
====================================================

.. note::

 - Negative addresses such as "-23 TB" are absolute addresses in bytes, counted down
   from the top of the 64-bit address space. It's easier to understand the layout
   when seen both in absolute addresses and in distance-from-top notation.

   For example, 0xffffe90000000000 == -23 TB: it is 23 TB lower than the top of the
   64-bit address space (ffffffffffffffff).

   Note that as we get closer to the top of the address space, the notation changes
   from TB to GB and then MB/KB.

 - "16M TB" might look weird at first sight, but it's an easier way to visualize size
   notation than "16 EB", which few will recognize at first sight as 16 exabytes.
   It also shows nicely how incredibly large the 64-bit address space is.

::

  ========================================================================================================================
      Start addr    |   Offset   |     End addr     |  Size   | VM area description
  ========================================================================================================================
                    |            |                  |         |
   0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
  __________________|____________|__________________|_________|___________________________________________________________
                    |            |                  |         |
   0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
                    |            |                  |         |     virtual memory addresses up to the -128 TB
                    |            |                  |         |     starting offset of kernel mappings.
  __________________|____________|__________________|_________|___________________________________________________________
                                                              |
                                                              | Kernel-space virtual memory, shared between all processes:
  ____________________________________________________________|___________________________________________________________
                    |            |                  |         |
   ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
   ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
   ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
   ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
   ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
   ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
   ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
   ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
   ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
  __________________|____________|__________________|_________|___________________________________________________________
                                                              |
                                                              | Identical layout to the 56-bit one from here on:
  ____________________________________________________________|___________________________________________________________
                    |            |                  |         |
   fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                    |            |                  |         | vaddr_end for KASLR
   fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
   fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
   ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
   ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
   ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
   ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
   ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
   ffffffff80000000 |-2048    MB |                  |         |
   ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
   ffffffffff000000 |  -16    MB |                  |         |
      FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
   ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
   ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
  __________________|____________|__________________|_________|___________________________________________________________

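The distance-from-top notation in the Offset column can be reproduced by
subtracting 2^64 from an absolute address. The following Python sketch is
only an illustration; the helper name ``offset_from_top()`` and the ``UNITS``
table are hypothetical and not part of the kernel:

.. code-block:: python

   TOP = 1 << 64  # one past the highest 64-bit address (ffffffffffffffff)

   # Units are tried from largest to smallest, mirroring how the table
   # switches from TB to GB and then MB/KB near the top of the address space.
   UNITS = (("TB", 1 << 40), ("GB", 1 << 30), ("MB", 1 << 20), ("KB", 1 << 10))

   def offset_from_top(addr):
       """Return (value, unit) for the negative distance of addr from the top."""
       distance = addr - TOP  # always negative for a valid 64-bit address
       for unit, size in UNITS:
           if -distance >= size:
               return distance / size, unit
       return float(distance), "B"

For example, ``offset_from_top(0xffffe90000000000)`` returns ``(-23.0, 'TB')``
and ``offset_from_top(0xffff888000000000)`` returns ``(-119.5, 'TB')``, matching
the corresponding rows of the table.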

Complete virtual memory map with 5-level page tables
====================================================

.. note::

 - With 56-bit addresses, user-space memory gets expanded by a factor of 512x,
   from 0.125 PB to 64 PB. All kernel mappings shift down to the -64 PB starting
   offset and many of the regions expand to accommodate the much larger physical
   memory supported.

::

  ========================================================================================================================
      Start addr    |   Offset   |     End addr     |  Size   | VM area description
  ========================================================================================================================
                    |            |                  |         |
   0000000000000000 |    0       | 00ffffffffffffff |   64 PB | user-space virtual memory, different per mm
  __________________|____________|__________________|_________|___________________________________________________________
                    |            |                  |         |
   0100000000000000 |  +64    PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
                    |            |                  |         |     virtual memory addresses up to the -64 PB
                    |            |                  |         |     starting offset of kernel mappings.
  __________________|____________|__________________|_________|___________________________________________________________
                                                              |
                                                              | Kernel-space virtual memory, shared between all processes:
  ____________________________________________________________|___________________________________________________________
                    |            |                  |         |
   ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
   ff10000000000000 |  -60    PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
   ff11000000000000 |  -59.75 PB | ff90ffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
   ff91000000000000 |  -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
   ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
   ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
   ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
   ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
   ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
  __________________|____________|__________________|_________|___________________________________________________________
                                                              |
                                                              | Identical layout to the 47-bit one from here on:
  ____________________________________________________________|___________________________________________________________
                    |            |                  |         |
   fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                    |            |                  |         | vaddr_end for KASLR
   fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
   fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
   ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
   ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
   ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
   ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
   ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
   ffffffff80000000 |-2048    MB |                  |         |
   ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
   ffffffffff000000 |  -16    MB |                  |         |
      FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
   ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
   ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
  __________________|____________|__________________|_________|___________________________________________________________

The architecture defines a 64-bit virtual address. Implementations can support
fewer bits. Currently supported are 48- and 57-bit virtual addresses. Bits 63
down to the most-significant implemented bit are sign extended.
This creates a hole between user-space and kernel addresses if you interpret
them as unsigned.
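
The sign-extension rule can be expressed as a check on the upper address
bits. The following Python sketch is an illustration only; ``is_canonical()``
and ``user_space_size()`` are hypothetical helper names, not kernel
functions. Each half of the address space is ``va_bits - 1`` bits wide, which
is why the tables above refer to the 47-bit and 56-bit layouts:

.. code-block:: python

   def is_canonical(addr, va_bits=48):
       """True if bits 63 down to va_bits - 1 are all zeros or all ones."""
       upper = addr >> (va_bits - 1)               # bits 63 .. va_bits - 1
       all_ones = (1 << (64 - (va_bits - 1))) - 1  # same field, fully set
       return upper in (0, all_ones)

   def user_space_size(va_bits):
       """Size in bytes of the lower (user-space) canonical half."""
       return 1 << (va_bits - 1)

With these definitions, ``is_canonical(0x0000800000000000)`` is ``False`` (the
first address of the 4-level hole), ``is_canonical(0xffff800000000000)`` is
``True``, and ``user_space_size(57) // user_space_size(48)`` evaluates to 512.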

The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).
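
Within the direct mapping, a physical address and its kernel virtual address
differ by a constant base. The sketch below assumes the non-randomized
4-level base and the 64 TB size listed in the table above; the constant and
function names are hypothetical and chosen for illustration only:

.. code-block:: python

   # Assumed values, taken from the 4-level table (no CONFIG_RANDOMIZE_MEMORY).
   PAGE_OFFSET_BASE_4LEVEL = 0xffff888000000000
   DIRECT_MAP_SIZE = 64 << 40  # 64 TB

   def phys_to_virt(phys, base=PAGE_OFFSET_BASE_4LEVEL):
       """Map a physical address to its direct-mapping virtual address."""
       if not 0 <= phys < DIRECT_MAP_SIZE:
           raise ValueError("physical address outside the direct mapping")
       return base + phys

   def virt_to_phys(virt, base=PAGE_OFFSET_BASE_4LEVEL):
       """Inverse of phys_to_virt() for addresses inside the direct mapping."""
       if not base <= virt < base + DIRECT_MAP_SIZE:
           raise ValueError("virtual address outside the direct mapping")
       return virt - base

For example, ``hex(phys_to_virt(0x1000))`` gives ``'0xffff888000001000'``.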

vmalloc space is lazily synchronized into the different PML4/PML5 pages of
the processes using the page fault handler, with init_top_pgt as
reference.

We map EFI runtime services in the 'efi_pgd' PGD in a 64 GB large virtual
memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.

Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, vmalloc/ioremap space and virtual memory map are randomized.
Their order is preserved but their base will be offset early at boot time.
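
The idea of shifting region bases while keeping their order can be sketched
as follows. This is not the kernel's algorithm, only a simplified
illustration: the function name, the 1 GB alignment, and the use of the
table's region sizes and of fffffe0000000000 as the upper bound are all
assumptions made for the example:

.. code-block:: python

   import random

   TB = 1 << 40
   ALIGN = 1 << 30  # assumed 1 GB alignment of region bases

   def randomize_bases(sizes, start, end, rng):
       """Place regions in order inside [start, end) with random aligned gaps."""
       slack = end - start - sum(sizes)  # free space available for gaps
       if slack < 0:
           raise ValueError("regions do not fit into the range")
       bases = []
       cursor = start
       for index, size in enumerate(sizes):
           remaining = len(sizes) - index
           # Draw a gap from this region's share of the remaining slack.
           gap = rng.randrange(slack // remaining + 1)
           gap -= gap % ALIGN
           cursor += gap
           slack -= gap
           bases.append(cursor)
           cursor += size
       return bases

Calling ``randomize_bases([64 * TB, 32 * TB, 1 * TB], 0xffff888000000000,
0xfffffe0000000000, random.Random())`` yields three increasing, non-overlapping
bases for the direct mapping, vmalloc/ioremap space and virtual memory map.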

Be very careful vs. KASLR when changing anything here. The KASLR address
range must not overlap with anything except the KASAN shadow area, which is
acceptable because KASAN disables KASLR.

For both 4- and 5-level layouts, the STACKLEAK_POISON value lies in the last
2 MB hole: ffffffffffff4111
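
That placement can be checked against the last row of either table. The
sketch below uses the poison value quoted above and the hole boundaries from
the tables; the constant and function names are chosen for illustration only:

.. code-block:: python

   MASK64 = (1 << 64) - 1

   STACKLEAK_POISON = 0xffffffffffff4111  # value quoted in this document
   LAST_HOLE_START = 0xffffffffffe00000   # -2 MB, from the tables
   LAST_HOLE_END = 0xffffffffffffffff

   def in_last_hole(addr):
       """True if addr falls inside the final 2 MB unused hole."""
       return LAST_HOLE_START <= addr <= LAST_HOLE_END

   # Interpreted as a signed 64-bit number, the poison value equals -0xBEEF.
   poison_as_signed = STACKLEAK_POISON - (1 << 64)

Here ``in_last_hole(STACKLEAK_POISON)`` is ``True`` and ``poison_as_signed``
equals ``-0xBEEF``.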