==============================
Memory Layout on AArch64 Linux
==============================

Author: Catalin Marinas <catalin.marinas@arm.com>

This document describes the virtual memory layout used by the AArch64
Linux kernel. The architecture allows up to 4 levels of translation
tables with a 4KB page size and up to 3 levels with a 64KB page size.

AArch64 Linux uses either 3 levels or 4 levels of translation tables
with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
(256TB) virtual addresses, respectively, for both user and kernel. With
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
virtual addresses, are used but the memory layout is the same.

ARMv8.2 adds optional support for Large Virtual Address space. This is
only available when running with a 64KB page size and expands the
number of descriptors in the first level of translation.

User addresses have bits 63:48 set to 0 while the kernel addresses have
the same bits set to 1. TTBRx selection is given by bit 63 of the
virtual address. The swapper_pg_dir contains only kernel (global)
mappings while the user pgd contains only user (non-global) mappings.
The swapper_pg_dir address is written to TTBR1 and never written to
TTBR0.

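The user/kernel split described above can be illustrated with a small
sketch. It assumes the 48-bit configuration, and every name in it
(``EXAMPLE_VA_BITS``, ``example_is_user_va`` and so on) is hypothetical
and chosen for illustration; none of them are kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit VA configuration. Names are illustrative only. */
#define EXAMPLE_VA_BITS 48

/* Mask covering bits 63:48 of a virtual address. */
#define EXAMPLE_TOP_MASK (~UINT64_C(0) << EXAMPLE_VA_BITS)

/* User addresses have bits 63:48 all clear. */
static inline bool example_is_user_va(uint64_t va)
{
	return (va & EXAMPLE_TOP_MASK) == 0;
}

/* Kernel addresses have bits 63:48 all set. */
static inline bool example_is_kernel_va(uint64_t va)
{
	return (va & EXAMPLE_TOP_MASK) == EXAMPLE_TOP_MASK;
}

/* Bit 63 selects the translation table base: 0 for TTBR0, 1 for TTBR1. */
static inline int example_ttbr_index(uint64_t va)
{
	return (int)(va >> 63);
}
```

With these helpers, ``0x0000ffffffffffff`` classifies as a user address
and ``0xffff000000000000`` as a kernel address, while a value such as
``0x0001000000000000`` is neither, because its top bits are mixed.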

AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::

  Start                 End                    Size     Use
  -----------------------------------------------------------------------
  0000000000000000      0000ffffffffffff      256TB     user
  ffff000000000000      ffff7fffffffffff      128TB     kernel logical memory map
  ffff800000000000      ffff9fffffffffff       32TB     kasan shadow region
  ffffa00000000000      ffffa00007ffffff      128MB     bpf jit region
  ffffa00008000000      ffffa0000fffffff      128MB     modules
  ffffa00010000000      fffffdffbffeffff      ~93TB     vmalloc
  fffffdffbfff0000      fffffdfffe5f8fff     ~998MB     [guard region]
  fffffdfffe5f9000      fffffdfffe9fffff     4124KB     fixed mappings
  fffffdfffea00000      fffffdfffebfffff        2MB     [guard region]
  fffffdfffec00000      fffffdffffbfffff       16MB     PCI I/O space
  fffffdffffc00000      fffffdffffdfffff        2MB     [guard region]
  fffffdffffe00000      ffffffffffdfffff        2TB     vmemmap
  ffffffffffe00000      ffffffffffffffff        2MB     [guard region]


AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::

  Start                 End                    Size     Use
  -----------------------------------------------------------------------
  0000000000000000      000fffffffffffff        4PB     user
  fff0000000000000      fff7ffffffffffff        2PB     kernel logical memory map
  fff8000000000000      fffd9fffffffffff     1440TB     [gap]
  fffda00000000000      ffff9fffffffffff      512TB     kasan shadow region
  ffffa00000000000      ffffa00007ffffff      128MB     bpf jit region
  ffffa00008000000      ffffa0000fffffff      128MB     modules
  ffffa00010000000      fffff81ffffeffff      ~88TB     vmalloc
  fffff81fffff0000      fffffc1ffe58ffff       ~3TB     [guard region]
  fffffc1ffe590000      fffffc1ffe9fffff     4544KB     fixed mappings
  fffffc1ffea00000      fffffc1ffebfffff        2MB     [guard region]
  fffffc1ffec00000      fffffc1fffbfffff       16MB     PCI I/O space
  fffffc1fffc00000      fffffc1fffdfffff        2MB     [guard region]
  fffffc1fffe00000      ffffffffffdfffff     3968GB     vmemmap
  ffffffffffe00000      ffffffffffffffff        2MB     [guard region]


Translation table lookup with 4KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |         |         |         |         |
   |                 |         |         |         |         v
   |                 |         |         |         |   [11:0]  in-page offset
   |                 |         |         |         +-> [20:12] L3 index
   |                 |         |         +-----------> [29:21] L2 index
   |                 |         +---------------------> [38:30] L1 index
   |                 +-------------------------------> [47:39] L0 index
   +-------------------------------------------------> [63] TTBR0/1


Translation table lookup with 64KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |    |               |              |
   |                 |    |               |              v
   |                 |    |               |            [15:0]  in-page offset
   |                 |    |               +----------> [28:16] L3 index
   |                 |    +--------------------------> [41:29] L2 index
   |                 +-------------------------------> [47:42] L1 index (48-bit)
   |                                                   [51:42] L1 index (52-bit)
   +-------------------------------------------------> [63] TTBR0/1


When using KVM without the Virtualization Host Extensions, the
hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 get mapped next to the HYP idmap page, as do vectors when
ARM64_SPECTRE_V3A is enabled for particular CPUs.

When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.

52-bit VA support in the kernel
-------------------------------
If the ARMv8.2-LVA optional feature is present and we are running
with a 64KB page size, then it is possible to use 52 bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

This fallback mechanism requires the kernel .text to be placed in the
higher addresses, so that its addresses are invariant to 48/52-bit VAs.
Due to the kasan shadow being a fraction of the entire kernel VA space,
the end of the kasan shadow must also be in the higher half of the
kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit,
the end of the kasan shadow is invariant and dependent on ~0UL,
whilst the start address will "grow" towards the lower addresses).

In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
is kept constant at 0xFFF0000000000000 (corresponding to 52-bit);
this obviates the need for an extra variable read. The physvirt
offset and vmemmap offsets are computed at early boot to enable
this logic.

As a single binary will need to support both 48-bit and 52-bit VA
spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
also must be sized large enough to accommodate a fixed PAGE_OFFSET.

Most code in the kernel should not need to consider the VA_BITS. For
code that does need to know the VA size, the variables are
defined as follows:

VA_BITS         constant        the *maximum* VA space size

VA_BITS_MIN     constant        the *minimum* VA space size

vabits_actual   variable        the *actual* VA space size


Maximum and minimum sizes can be useful to ensure that buffers are
sized large enough or that addresses are positioned close enough for
the "worst" case.

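As a rough illustration of how the VA size moves region boundaries,
the sketch below recomputes a few addresses from the layout tables in
this document for 48-bit and 52-bit VAs. All names are hypothetical and
are not the kernel's own macros. The 1/8 shadow ratio (a scale shift
of 3) is an assumption about the KASAN configuration; the text above
only states that the shadow is "a fraction" of the kernel VA space.

```c
#include <stdint.h>

/* Illustrative names only; these are not the kernel's own macros. */

/* Lowest kernel address for a given VA size (all bits above va_bits set). */
static inline uint64_t example_kernel_va_start(unsigned int va_bits)
{
	return ~UINT64_C(0) << va_bits;
}

/* Exclusive end of the linear map: the lower half of the kernel VA space. */
static inline uint64_t example_linear_map_end(unsigned int va_bits)
{
	return ~UINT64_C(0) << (va_bits - 1);
}

/* Assumption: one shadow byte covers 8 bytes of kernel VA space. */
#define EXAMPLE_KASAN_SHADOW_SCALE_SHIFT 3

/* Exclusive end of the kasan shadow, taken from the layout tables. */
#define EXAMPLE_KASAN_SHADOW_END UINT64_C(0xffffa00000000000)

/* The shadow end is fixed; the start grows downwards with va_bits. */
static inline uint64_t example_kasan_shadow_start(unsigned int va_bits)
{
	uint64_t size = UINT64_C(1) << (va_bits - EXAMPLE_KASAN_SHADOW_SCALE_SHIFT);

	return EXAMPLE_KASAN_SHADOW_END - size;
}
```

Under these assumptions, the shadow start comes out as
``0xffff800000000000`` for 48-bit and ``0xfffda00000000000`` for
52-bit, matching the "kasan shadow region" rows of the two tables,
while the shadow end stays at the same address in both cases.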

52-bit userspace VAs
--------------------
To maintain compatibility with software that relies on the ARMv8.0
VA space maximum size of 48 bits, the kernel will, by default,
return virtual addresses to userspace from a 48-bit range.

Software can "opt-in" to receiving VAs from a 52-bit space by
specifying an mmap hint parameter that is larger than 48-bit.

For example:

.. code-block:: c

   maybe_high_address = mmap(~0UL, size, prot, flags,...);

It is also possible to build a debug kernel that returns addresses
from a 52-bit space by enabling the following kernel config options:

.. code-block:: sh

   CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y

Note that this option is only intended for debugging applications
and should not be used in production.
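
The mmap hint opt-in described in this section might look like the
following in a userspace program. This is a minimal sketch: the helper
names are hypothetical, and the protection and mapping flags (an
anonymous, private, read/write mapping) are assumptions chosen for
illustration. The kernel is free to ignore the hint, so the returned
address may still lie within the 48-bit range, for instance on
hardware without 52-bit support.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: request an anonymous mapping with a hint above
 * the 48-bit range. The hint is advisory only (no MAP_FIXED).
 */
static void *example_map_maybe_high(size_t size)
{
	void *hint = (void *)~(uintptr_t)0;

	return mmap(hint, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Returns non-zero if the pointer has any of bits 63:48 set. */
static int example_is_above_48bit(const void *p)
{
	return ((uintptr_t)p >> 48) != 0;
}
```

A caller would check the result against ``MAP_FAILED``, optionally test
it with ``example_is_above_48bit()``, and release it with ``munmap()``
when done.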