=====================
Booting AArch64 Linux
=====================

Author: Will Deacon <will.deacon@arm.com>

Date  : 07 September 2012

This document is based on the ARM booting document by Russell King and
is relevant to all public releases of the AArch64 Linux kernel.

The AArch64 exception model is made up of a number of exception levels
(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
counterpart.  EL2 is the hypervisor level and exists only in non-secure
mode. EL3 is the highest priority level and exists only in secure mode.

For the purposes of this document, we will use the term `boot loader`
simply to define all software that executes on the CPU(s) before control
is passed to the Linux kernel.  This may include secure monitor and
hypervisor code, or it may just be a handful of instructions for
preparing a minimal boot environment.

Essentially, the boot loader should provide (as a minimum) the
following:

1. Set up and initialise the RAM
2. Set up the device tree
3. Decompress the kernel image
4. Call the kernel image


1. Set up and initialise RAM
----------------------------

Requirement: MANDATORY

The boot loader is expected to find and initialise all RAM that the
kernel will use for volatile data storage in the system.  It performs
this in a machine dependent manner.  (It may use internal algorithms
to automatically locate and size all RAM, or it may use knowledge of
the RAM in the machine, or any other method the boot loader designer
sees fit.)


2. Set up the device tree
-------------------------

Requirement: MANDATORY

The device tree blob (dtb) must be placed on an 8-byte boundary and must
not exceed 2 megabytes in size. Since the dtb will be mapped cacheable
using blocks of up to 2 megabytes in size, it must not be placed within
any 2M region which must be mapped with any specific attributes.

NOTE: versions prior to v4.2 also require that the DTB be placed within
the 512 MB region starting at text_offset bytes below the kernel Image.


3. Decompress the kernel image
------------------------------

Requirement: OPTIONAL

The AArch64 kernel does not currently provide a decompressor and
therefore requires decompression (gzip etc.) to be performed by the boot
loader if a compressed Image target (e.g. Image.gz) is used.  For
bootloaders that do not implement this requirement, the uncompressed
Image target is available instead.


4. Call the kernel image
------------------------

Requirement: MANDATORY

The decompressed kernel image contains a 64-byte header as follows::

  u32 code0;                    /* Executable code */
  u32 code1;                    /* Executable code */
  u64 text_offset;              /* Image load offset, little endian */
  u64 image_size;               /* Effective Image size, little endian */
  u64 flags;                    /* kernel flags, little endian */
  u64 res2      = 0;            /* reserved */
  u64 res3      = 0;            /* reserved */
  u64 res4      = 0;            /* reserved */
  u32 magic     = 0x644d5241;   /* Magic number, little endian, "ARM\x64" */
  u32 res5;                     /* reserved (used for PE COFF offset) */


Header notes:

- As of v3.17, all fields are little endian unless stated otherwise.

- code0/code1 are responsible for branching to stext.

- When booting through EFI, code0/code1 are initially skipped.
  res5 is an offset to the PE header and the PE header has the EFI
  entry point (efi_stub_entry).  When the stub has done its work, it
  jumps to code0 to resume the normal boot process.

- Prior to v3.17, the endianness of text_offset was not specified.  In
  these cases image_size is zero and text_offset is 0x80000 in the
  endianness of the kernel.  Where image_size is non-zero, image_size is
  little-endian and must be respected.  Where image_size is zero,
  text_offset can be assumed to be 0x80000.

- The flags field (introduced in v3.17) is a little-endian 64-bit field
  composed as follows:

  ============= ===============================================================
  Bit 0         Kernel endianness.  1 if BE, 0 if LE.
  Bits 1-2      Kernel page size.

                * 0 - Unspecified.
                * 1 - 4K
                * 2 - 16K
                * 3 - 64K
  Bit 3         Kernel physical placement

                0
                  2MB aligned base should be as close as possible
                  to the base of DRAM, since memory below it is not
                  accessible via the linear mapping
                1
                  2MB aligned base may be anywhere in physical
                  memory
  Bits 4-63     Reserved.
  ============= ===============================================================

- When image_size is zero, a bootloader should attempt to keep as much
  memory as possible free for use by the kernel immediately after the
  end of the kernel image. The amount of space required will vary
  depending on selected features, and is effectively unbounded.

The Image must be placed text_offset bytes from a 2MB aligned base
address anywhere in usable system RAM and called there. The region
between the 2 MB aligned base address and the start of the image has no
special significance to the kernel, and may be used for other purposes.
At least image_size bytes from the start of the image must be free for
use by the kernel.

NOTE: versions prior to v4.6 cannot make use of memory below the
physical offset of the Image so it is recommended that the Image be
placed as close as possible to the start of system RAM.

If an initrd/initramfs is passed to the kernel at boot, it must reside
entirely within a 1 GB aligned physical memory window of up to 32 GB in
size that fully covers the kernel Image as well.

Any memory described to the kernel (even that below the start of the
image) which is not marked as reserved from the kernel (e.g., with a
memreserve region in the device tree) will be considered as available to
the kernel.

Before jumping into the kernel, the following conditions must be met:

- Quiesce all DMA capable devices so that memory does not get
  corrupted by bogus network packets or disk data.  This will save
  you many hours of debugging.

- Primary CPU general-purpose register settings:

  - x0 = physical address of device tree blob (dtb) in system RAM.
  - x1 = 0 (reserved for future use)
  - x2 = 0 (reserved for future use)
  - x3 = 0 (reserved for future use)

- CPU mode

  All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
  IRQ and FIQ).
  The CPU must be in either EL2 (RECOMMENDED in order to have access to
  the virtualisation extensions) or non-secure EL1.

- Caches, MMUs

  The MMU must be off.

  The instruction cache may be on or off, and must not hold any stale
  entries corresponding to the loaded kernel image.

  The address range corresponding to the loaded kernel image must be
  cleaned to the PoC. In the presence of a system cache or other
  coherent masters with caches enabled, this will typically require
  cache maintenance by VA rather than set/way operations.
  System caches which respect the architected cache maintenance by VA
  operations must be configured and may be enabled.
  System caches which do not respect architected cache maintenance by VA
  operations (not recommended) must be configured and disabled.

- Architected timers

  CNTFRQ must be programmed with the timer frequency and CNTVOFF must
  be programmed with a consistent value on all CPUs.  If entering the
  kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where
  available.

- Coherency

  All CPUs to be booted by the kernel must be part of the same coherency
  domain on entry to the kernel.  This may require IMPLEMENTATION DEFINED
  initialisation to enable the receiving of maintenance operations on
  each CPU.

- System registers

  All writable architected system registers at the exception level where
  the kernel image will be entered must be initialised by software at a
  higher exception level to prevent execution in an UNKNOWN state.

  - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
    executing on.
  - The value of SCR_EL3.FIQ must be the same as the one present at boot
    time whenever the kernel is executing.

  For systems with a GICv3 interrupt controller to be used in v3 mode:

  - If EL3 is present:

    - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1.
    - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1.
    - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across
      all CPUs the kernel is executing on, and must stay constant
      for the lifetime of the kernel.

  - If the kernel is entered at EL1:

    - ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1.
    - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.

  - The DT or ACPI tables must describe a GICv3 interrupt controller.

  For systems with a GICv3 interrupt controller to be used in
  compatibility (v2) mode:

  - If EL3 is present:

    ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0.

  - If the kernel is entered at EL1:

    ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0.

  - The DT or ACPI tables must describe a GICv2 interrupt controller.

  For CPUs with pointer authentication functionality:

  - If EL3 is present:

    - SCR_EL3.APK (bit 16) must be initialised to 0b1
    - SCR_EL3.API (bit 17) must be initialised to 0b1

  - If the kernel is entered at EL1:

    - HCR_EL2.APK (bit 40) must be initialised to 0b1
    - HCR_EL2.API (bit 41) must be initialised to 0b1

  For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:

  - If EL3 is present:

    - CPTR_EL3.TAM (bit 30) must be initialised to 0b0
    - CPTR_EL2.TAM (bit 30) must be initialised to 0b0
    - AMCNTENSET0_EL0 must be initialised to 0b1111
    - AMCNTENSET1_EL0 must be initialised to a platform specific value
      having 0b1 set for the corresponding bit for each of the auxiliary
      counters present.

  - If the kernel is entered at EL1:

    - AMCNTENSET0_EL0 must be initialised to 0b1111
    - AMCNTENSET1_EL0 must be initialised to a platform specific value
      having 0b1 set for the corresponding bit for each of the auxiliary
      counters present.

The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs.  All CPUs must
enter the kernel in the same exception level.

The boot loader is expected to enter the kernel on each CPU in the
following manner:

- The primary CPU must jump directly to the first instruction of the
  kernel image.  The device tree blob passed by this CPU must contain
  an 'enable-method' property for each cpu node.  The supported
  enable-methods are described below.

  It is expected that the bootloader will generate these device tree
  properties and insert them into the blob prior to kernel entry.

- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr'
  property in their cpu node.  This property identifies a
  naturally-aligned 64-bit zero-initialised memory location.

  These CPUs should spin outside of the kernel in a reserved area of
  memory (communicated to the kernel by a /memreserve/ region in the
  device tree) polling their cpu-release-addr location, which must be
  contained in the reserved region.  A wfe instruction may be inserted
  to reduce the overhead of the busy-loop and a sev will be issued by
  the primary CPU.  When a read of the location pointed to by the
  cpu-release-addr returns a non-zero value, the CPU must jump to this
  value.  The value will be written as a single 64-bit little-endian
  value, so CPUs must convert the read value to their native endianness
  before jumping to it.

- CPUs with a "psci" enable method should remain outside of
  the kernel (i.e. outside of the regions of memory described to the
  kernel in the memory node, or in a reserved area of memory described
  to the kernel by a /memreserve/ region in the device tree).  The
  kernel will issue CPU_ON calls as described in ARM document number ARM
  DEN 0022A ("Power State Coordination Interface System Software on ARM
  processors") to bring CPUs into the kernel.

  The device tree should contain a 'psci' node, as described in
  Documentation/devicetree/bindings/arm/psci.yaml.

- Secondary CPU general-purpose register settings

  - x0 = 0 (reserved for future use)
  - x1 = 0 (reserved for future use)
  - x2 = 0 (reserved for future use)
  - x3 = 0 (reserved for future use)
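
As an illustration of the 64-byte header and the header notes in section 4,
the following is a minimal host-side sketch of how a loader or inspection
script might decode the fields and derive a load address.  It is not taken
from the kernel or from any boot loader: the names parse_image_header and
load_address are hypothetical, and the handling of a zero image_size simply
follows the rule above that text_offset can then be assumed to be 0x80000.

```python
import struct

# Layout of the 64-byte header from section 4; every field is little endian.
HEADER = struct.Struct("<IIQQQQQQII")
MAGIC = 0x644D5241              # "ARM\x64"
DEFAULT_TEXT_OFFSET = 0x80000   # assumed when image_size is zero
ALIGN_2MB = 2 * 1024 * 1024
PAGE_SIZES = {0: None, 1: 4096, 2: 16384, 3: 65536}  # flags bits 1-2


def parse_image_header(data):
    """Decode the fields a loader needs (hypothetical helper, illustration only)."""
    if len(data) < HEADER.size:
        raise ValueError("image is shorter than the 64-byte header")
    (_code0, _code1, text_offset, image_size, flags,
     _res2, _res3, _res4, magic, _res5) = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic number, not an AArch64 Image")
    if image_size == 0:
        # Pre-v3.17 header: the endianness of text_offset is unspecified,
        # so fall back to the documented default.
        text_offset = DEFAULT_TEXT_OFFSET
    return {
        "text_offset": text_offset,
        "image_size": image_size,
        "big_endian": bool(flags & 1),               # bit 0
        "page_size": PAGE_SIZES[(flags >> 1) & 3],   # bits 1-2
        "place_anywhere": bool((flags >> 3) & 1),    # bit 3
    }


def load_address(base, text_offset):
    """Return the Image address for a 2 MB aligned base (hypothetical helper)."""
    if base % ALIGN_2MB:
        raise ValueError("base address must be 2 MB aligned")
    return base + text_offset
```

A caller would typically read the first 64 bytes of the decompressed Image,
pass them to parse_image_header, and then choose a 2 MB aligned base in
usable RAM, keeping in mind that bit 3 of flags only describes where that
base may be placed.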
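
The placement rules for the device tree blob (section 2) and for an
initrd/initramfs (section 4) can be checked with simple arithmetic.  The
sketch below is one possible reading of those rules, not an authoritative
implementation: the function names are hypothetical, end addresses are
treated as exclusive, and the "1 GB aligned window" is interpreted as a
window whose base is rounded down to a 1 GB boundary.  The requirement that
the dtb avoid 2M regions needing specific attributes is platform knowledge
and is not modelled here.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def dtb_placement_ok(dtb_addr, dtb_size):
    """True if the dtb is on an 8-byte boundary and no larger than 2 MB."""
    return dtb_addr % 8 == 0 and dtb_size <= 2 * MB


def initrd_window_ok(image_start, image_end, initrd_start, initrd_end):
    """True if one 1 GB aligned window of at most 32 GB covers both ranges.

    Assumption: end addresses are exclusive and the window base is the
    lowest address rounded down to a 1 GB boundary.
    """
    lowest = min(image_start, initrd_start)
    highest = max(image_end, initrd_end)
    window_base = lowest - (lowest % GB)
    return highest - window_base <= 32 * GB
```

A boot loader could run checks of this kind just before kernel entry and
refuse to boot, or relocate the initrd, if either returns False.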
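
The "spin-table" description above requires a secondary CPU to treat the
value at cpu-release-addr as little-endian regardless of its own
endianness.  The following host-side simulation sketches that conversion
and the polling loop.  It is illustrative only: le64_to_cpu and
poll_release are hypothetical names, read_word stands in for a native
64-bit load from cpu-release-addr, and real firmware would do this in
assembly with wfe in the loop rather than a bounded Python loop.

```python
def le64_to_cpu(loaded, cpu_big_endian):
    """Convert a natively loaded 64-bit word that memory holds as little endian."""
    if not cpu_big_endian:
        return loaded
    # A big-endian load sees the bytes in reverse order, so swap them back.
    return int.from_bytes(loaded.to_bytes(8, "big"), "little")


def poll_release(read_word, cpu_big_endian, max_polls=1000):
    """Poll until the release word is non-zero and return the jump target.

    read_word is a callable modelling a native 64-bit load (an assumption of
    this sketch).  Returns None if nothing was written within max_polls.
    """
    for _ in range(max_polls):
        loaded = read_word()
        if loaded != 0:
            return le64_to_cpu(loaded, cpu_big_endian)
    return None
```

Zero needs no conversion because it is the same in either byte order, which
is why the loop can test the raw loaded value before swapping.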