=====================
Booting AArch64 Linux
=====================

Author: Will Deacon <will.deacon@arm.com>

Date  : 07 September 2012

This document is based on the ARM booting document by Russell King and
is relevant to all public releases of the AArch64 Linux kernel.

The AArch64 exception model is made up of a number of exception levels
(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
counterpart.  EL2 is the hypervisor level and exists only in non-secure
mode. EL3 is the highest priority level and exists only in secure mode.

For the purposes of this document, we will use the term `boot loader`
simply to define all software that executes on the CPU(s) before control
is passed to the Linux kernel.  This may include secure monitor and
hypervisor code, or it may just be a handful of instructions for
preparing a minimal boot environment.

Essentially, the boot loader should provide (as a minimum) the
following:

1. Set up and initialise the RAM
2. Set up the device tree
3. Decompress the kernel image
4. Call the kernel image


1. Set up and initialise RAM
----------------------------

Requirement: MANDATORY

The boot loader is expected to find and initialise all RAM that the
kernel will use for volatile data storage in the system.  It performs
this in a machine dependent manner.  (It may use internal algorithms
to automatically locate and size all RAM, or it may use knowledge of
the RAM in the machine, or any other method the boot loader designer
sees fit.)


2. Set up the device tree
-------------------------

Requirement: MANDATORY

The device tree blob (dtb) must be placed on an 8-byte boundary and must
not exceed 2 megabytes in size. Since the dtb will be mapped cacheable
using blocks of up to 2 megabytes in size, it must not be placed within
any 2M region which must be mapped with any specific attributes.

NOTE: versions prior to v4.2 also require that the DTB be placed within
the 512 MB region starting at text_offset bytes below the kernel Image.
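As an illustration only, the alignment and size limits above could be
checked by a boot loader along the following lines.  The helper name
``dtb_placement_ok`` and the macro names are hypothetical and not part of
any kernel or boot loader interface.  The sketch does not cover the
requirement about 2M regions with specific mapping attributes, nor the
pre-v4.2 placement rule, since both depend on platform knowledge.

```c
#include <stdbool.h>
#include <stdint.h>

#define DTB_ALIGN     8ull                  /* required alignment, bytes */
#define DTB_MAX_SIZE  (2ull * 1024 * 1024)  /* dtb must not exceed 2 MB  */

/*
 * Hypothetical helper: returns true if a dtb of 'size' bytes placed at
 * physical address 'phys_addr' satisfies the 8-byte alignment and the
 * 2 MB size limit.  Other placement rules are not checked here.
 */
static bool dtb_placement_ok(uint64_t phys_addr, uint64_t size)
{
    if (phys_addr % DTB_ALIGN != 0)
        return false;
    if (size > DTB_MAX_SIZE)
        return false;
    return true;
}
```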

3. Decompress the kernel image
------------------------------

Requirement: OPTIONAL

The AArch64 kernel does not currently provide a decompressor and
therefore requires decompression (gzip etc.) to be performed by the boot
loader if a compressed Image target (e.g. Image.gz) is used.  For
bootloaders that do not implement this requirement, the uncompressed
Image target is available instead.


4. Call the kernel image
------------------------

Requirement: MANDATORY

The decompressed kernel image contains a 64-byte header as follows::

  u32 code0;                    /* Executable code */
  u32 code1;                    /* Executable code */
  u64 text_offset;              /* Image load offset, little endian */
  u64 image_size;               /* Effective Image size, little endian */
  u64 flags;                    /* kernel flags, little endian */
  u64 res2      = 0;            /* reserved */
  u64 res3      = 0;            /* reserved */
  u64 res4      = 0;            /* reserved */
  u32 magic     = 0x644d5241;   /* Magic number, little endian, "ARM\x64" */
  u32 res5;                     /* reserved (used for PE COFF offset) */


Header notes:

- As of v3.17, all fields are little endian unless stated otherwise.

- code0/code1 are responsible for branching to stext.

- When booting through EFI, code0/code1 are initially skipped.
  res5 is an offset to the PE header and the PE header has the EFI
  entry point (efi_stub_entry).  When the stub has done its work, it
  jumps to code0 to resume the normal boot process.

- Prior to v3.17, the endianness of text_offset was not specified.  In
  these cases image_size is zero and text_offset is 0x80000 in the
  endianness of the kernel.  Where image_size is non-zero, image_size is
  little-endian and must be respected.  Where image_size is zero,
  text_offset can be assumed to be 0x80000.

- The flags field (introduced in v3.17) is a little-endian 64-bit field
  composed as follows:

  ============= ===============================================================
  Bit 0         Kernel endianness.  1 if BE, 0 if LE.
  Bits 1-2      Kernel Page size.

                        * 0 - Unspecified.
                        * 1 - 4K
                        * 2 - 16K
                        * 3 - 64K
  Bit 3         Kernel physical placement

                        0
                          2MB aligned base should be as close as possible
                          to the base of DRAM, since memory below it is not
                          accessible via the linear mapping
                        1
                          2MB aligned base may be anywhere in physical
                          memory
  Bits 4-63     Reserved.
  ============= ===============================================================

- When image_size is zero, a bootloader should attempt to keep as much
  memory as possible free for use by the kernel immediately after the
  end of the kernel image. The amount of space required will vary
  depending on selected features, and is effectively unbounded.
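The header layout and notes above can be sketched in C as follows.  This
is a minimal illustration, not kernel or boot loader source: the struct
``arm64_image_info`` and the functions ``rd_le32``, ``rd_le64`` and
``arm64_parse_header`` are hypothetical names.  Byte offsets are derived
from the field order in the header listing (text_offset at 8, image_size
at 16, flags at 24, magic at 56), and fields are read byte by byte so the
result does not depend on the endianness of the CPU running the boot
loader.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARM64_IMAGE_MAGIC  0x644d5241u  /* "ARM\x64", little endian */
#define ARM64_HEADER_SIZE  64u

/* Hypothetical container for the decoded header fields. */
struct arm64_image_info {
    uint64_t text_offset;
    uint64_t image_size;
    bool     big_endian;      /* flags bit 0 */
    unsigned page_size_code;  /* flags bits 1-2: 0 unspecified, 1 4K, 2 16K, 3 64K */
    bool     place_anywhere;  /* flags bit 3 */
};

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a little-endian 64-bit value regardless of host endianness. */
static uint64_t rd_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns false if the buffer is too short or the magic does not match. */
static bool arm64_parse_header(const uint8_t *img, size_t len,
                               struct arm64_image_info *out)
{
    if (len < ARM64_HEADER_SIZE)
        return false;
    if (rd_le32(img + 56) != ARM64_IMAGE_MAGIC)
        return false;

    uint64_t flags = rd_le64(img + 24);

    out->image_size = rd_le64(img + 16);
    /* Pre-v3.17 images: image_size is zero, so assume 0x80000. */
    out->text_offset = out->image_size ? rd_le64(img + 8) : 0x80000;
    out->big_endian = flags & 1;
    out->page_size_code = (unsigned)((flags >> 1) & 3);
    out->place_anywhere = (flags >> 3) & 1;
    return true;
}
```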

The Image must be placed text_offset bytes from a 2MB aligned base
address anywhere in usable system RAM and called there. The region
between the 2MB aligned base address and the start of the image has no
special significance to the kernel, and may be used for other purposes.
At least image_size bytes from the start of the image must be free for
use by the kernel.

NOTE: versions prior to v4.6 cannot make use of memory below the
physical offset of the Image so it is recommended that the Image be
placed as close as possible to the start of system RAM.
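A minimal sketch of the placement arithmetic, assuming the boot loader
simply picks the first 2MB aligned base at or above the start of usable
RAM.  The names ``align_up_2m`` and ``image_load_addr`` are hypothetical,
and address overflow is not handled.

```c
#include <stdint.h>

#define SZ_2M 0x200000ull

/* Round addr up to the next 2 MB boundary (unchanged if already aligned). */
static uint64_t align_up_2m(uint64_t addr)
{
    return (addr + SZ_2M - 1) & ~(SZ_2M - 1);
}

/*
 * Hypothetical helper: address at which to place the Image, using the
 * first 2 MB aligned base at or above ram_base plus text_offset from
 * the Image header.
 */
static uint64_t image_load_addr(uint64_t ram_base, uint64_t text_offset)
{
    return align_up_2m(ram_base) + text_offset;
}
```

The caller would still need to confirm that at least image_size bytes
starting at the returned address are free.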

If an initrd/initramfs is passed to the kernel at boot, it must reside
entirely within a 1 GB aligned physical memory window of up to 32 GB in
size that fully covers the kernel Image as well.
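One way to test this constraint is sketched below.  It assumes that
"1 GB aligned" refers to the start of the window, so the window is taken
to begin at the lower of the two start addresses rounded down to 1 GB.
The function name ``initrd_window_ok`` is hypothetical, and the end
addresses are exclusive.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G   (1ull << 30)
#define SZ_32G  (32ull << 30)

/*
 * Hypothetical helper: returns true if both [kernel_start, kernel_end)
 * and [initrd_start, initrd_end) fit in one window of at most 32 GB
 * whose base is 1 GB aligned (an assumption about how the alignment
 * requirement applies).
 */
static bool initrd_window_ok(uint64_t kernel_start, uint64_t kernel_end,
                             uint64_t initrd_start, uint64_t initrd_end)
{
    uint64_t lo = kernel_start < initrd_start ? kernel_start : initrd_start;
    uint64_t hi = kernel_end > initrd_end ? kernel_end : initrd_end;
    uint64_t window_base = lo & ~(SZ_1G - 1);

    return hi - window_base <= SZ_32G;
}
```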

Any memory described to the kernel (even that below the start of the
image) which is not marked as reserved from the kernel (e.g., with a
memreserve region in the device tree) will be considered as available to
the kernel.

Before jumping into the kernel, the following conditions must be met:

- Quiesce all DMA capable devices so that memory does not get
  corrupted by bogus network packets or disk data.  This will save
  you many hours of debugging.

- Primary CPU general-purpose register settings:

    - x0 = physical address of device tree blob (dtb) in system RAM.
    - x1 = 0 (reserved for future use)
    - x2 = 0 (reserved for future use)
    - x3 = 0 (reserved for future use)

- CPU mode

  All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
  IRQ and FIQ).
  The CPU must be in either EL2 (RECOMMENDED in order to have access to
  the virtualisation extensions) or non-secure EL1.

- Caches, MMUs

  The MMU must be off.

  The instruction cache may be on or off, and must not hold any stale
  entries corresponding to the loaded kernel image.

  The address range corresponding to the loaded kernel image must be
  cleaned to the PoC. In the presence of a system cache or other
  coherent masters with caches enabled, this will typically require
  cache maintenance by VA rather than set/way operations.
  System caches which respect the architected cache maintenance by VA
  operations must be configured and may be enabled.
  System caches which do not respect architected cache maintenance by VA
  operations (not recommended) must be configured and disabled.

- Architected timers

  CNTFRQ must be programmed with the timer frequency and CNTVOFF must
  be programmed with a consistent value on all CPUs.  If entering the
  kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where
  available.

- Coherency

  All CPUs to be booted by the kernel must be part of the same coherency
  domain on entry to the kernel.  This may require IMPLEMENTATION DEFINED
  initialisation to enable the receiving of maintenance operations on
  each CPU.

- System registers

  All writable architected system registers at the exception level where
  the kernel image will be entered must be initialised by software at a
  higher exception level to prevent execution in an UNKNOWN state.

  - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
    executing on.
  - The value of SCR_EL3.FIQ must be the same as the one present at boot
    time whenever the kernel is executing.

  For systems with a GICv3 interrupt controller to be used in v3 mode:

  - If EL3 is present:

      - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1.
      - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1.
      - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across
        all CPUs the kernel is executing on, and must stay constant
        for the lifetime of the kernel.

  - If the kernel is entered at EL1:

      - ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1.
      - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1.

  - The DT or ACPI tables must describe a GICv3 interrupt controller.

  For systems with a GICv3 interrupt controller to be used in
  compatibility (v2) mode:

  - If EL3 is present:

      ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0.

  - If the kernel is entered at EL1:

      ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0.

  - The DT or ACPI tables must describe a GICv2 interrupt controller.

  For CPUs with pointer authentication functionality:

  - If EL3 is present:

    - SCR_EL3.APK (bit 16) must be initialised to 0b1
    - SCR_EL3.API (bit 17) must be initialised to 0b1

  - If the kernel is entered at EL1:

    - HCR_EL2.APK (bit 40) must be initialised to 0b1
    - HCR_EL2.API (bit 41) must be initialised to 0b1

  For CPUs with Activity Monitors Unit v1 (AMUv1) extension present:

  - If EL3 is present:

    - CPTR_EL3.TAM (bit 30) must be initialised to 0b0
    - CPTR_EL2.TAM (bit 30) must be initialised to 0b0
    - AMCNTENSET0_EL0 must be initialised to 0b1111
    - AMCNTENSET1_EL0 must be initialised to a platform specific value
      having 0b1 set for the corresponding bit for each of the auxiliary
      counters present.

  - If the kernel is entered at EL1:

    - AMCNTENSET0_EL0 must be initialised to 0b1111
    - AMCNTENSET1_EL0 must be initialised to a platform specific value
      having 0b1 set for the corresponding bit for each of the auxiliary
      counters present.
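Some of the bit positions listed above can be expressed as masks, as in
the sketch below.  It covers only the pointer authentication, AMU trap
and EL1PCTEN bits, and it operates on register values passed in by the
caller rather than reading system registers itself.  The macro and
function names are illustrative and are not taken from kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT64(n)  (1ull << (n))

/* Bit positions as given in the requirements above. */
#define SCR_EL3_APK           BIT64(16)
#define SCR_EL3_API           BIT64(17)
#define HCR_EL2_APK           BIT64(40)
#define HCR_EL2_API           BIT64(41)
#define CPTR_ELX_TAM          BIT64(30)  /* same position in CPTR_EL3 and CPTR_EL2 */
#define CNTHCTL_EL2_EL1PCTEN  BIT64(0)

/* Pointer authentication: both APK and API must be 0b1 in SCR_EL3. */
static bool ptrauth_scr_el3_ok(uint64_t scr_el3)
{
    const uint64_t need = SCR_EL3_APK | SCR_EL3_API;
    return (scr_el3 & need) == need;
}

/* Pointer authentication: both APK and API must be 0b1 in HCR_EL2. */
static bool ptrauth_hcr_el2_ok(uint64_t hcr_el2)
{
    const uint64_t need = HCR_EL2_APK | HCR_EL2_API;
    return (hcr_el2 & need) == need;
}

/* AMUv1: the TAM trap bit must be 0b0. */
static bool amu_cptr_ok(uint64_t cptr)
{
    return (cptr & CPTR_ELX_TAM) == 0;
}

/* Timers: EL1PCTEN must be set when the kernel is entered at EL1. */
static bool timer_cnthctl_el2_ok(uint64_t cnthctl_el2)
{
    return (cnthctl_el2 & CNTHCTL_EL2_EL1PCTEN) != 0;
}
```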

The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs.  All CPUs must
enter the kernel in the same exception level.

The boot loader is expected to enter the kernel on each CPU in the
following manner:

- The primary CPU must jump directly to the first instruction of the
  kernel image.  The device tree blob passed by this CPU must contain
  an 'enable-method' property for each cpu node.  The supported
  enable-methods are described below.

  It is expected that the bootloader will generate these device tree
  properties and insert them into the blob prior to kernel entry.

- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr'
  property in their cpu node.  This property identifies a
  naturally-aligned 64-bit zero-initialised memory location.

  These CPUs should spin outside of the kernel in a reserved area of
  memory (communicated to the kernel by a /memreserve/ region in the
  device tree) polling their cpu-release-addr location, which must be
  contained in the reserved region.  A wfe instruction may be inserted
  to reduce the overhead of the busy-loop and a sev will be issued by
  the primary CPU.  When a read of the location pointed to by the
  cpu-release-addr returns a non-zero value, the CPU must jump to this
  value.  The value will be written as a single 64-bit little-endian
  value, so CPUs must convert the read value to their native endianness
  before jumping to it.

- CPUs with a "psci" enable-method should remain outside of
  the kernel (i.e. outside of the regions of memory described to the
  kernel in the memory node, or in a reserved area of memory described
  to the kernel by a /memreserve/ region in the device tree).  The
  kernel will issue CPU_ON calls as described in ARM document number ARM
  DEN 0022A ("Power State Coordination Interface System Software on ARM
  processors") to bring CPUs into the kernel.

  The device tree should contain a 'psci' node, as described in
  Documentation/devicetree/bindings/arm/psci.yaml.

- Secondary CPU general-purpose register settings

  - x0 = 0 (reserved for future use)
  - x1 = 0 (reserved for future use)
  - x2 = 0 (reserved for future use)
  - x3 = 0 (reserved for future use)
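
The "spin-table" polling and endianness conversion described above might
look roughly like the following on a secondary CPU.  This is a sketch
under stated assumptions: the function names are hypothetical, the wfe
instruction is represented only by a comment so that the code stays
portable, and the final jump to the returned address (with x0-x3 cleared)
is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Detect the endianness of the CPU executing this code. */
static bool cpu_is_little_endian(void)
{
    const union { uint16_t v; uint8_t b[2]; } probe = { .v = 1 };
    return probe.b[0] == 1;
}

/* Reverse the byte order of a 64-bit value. */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Convert a value stored as little-endian to native byte order. */
static uint64_t le64_to_native(uint64_t raw)
{
    return cpu_is_little_endian() ? raw : swap64(raw);
}

/*
 * Hypothetical secondary-CPU wait loop: poll the cpu-release-addr
 * location with a single 64-bit read until it is non-zero, then return
 * the entry address in native byte order.
 */
static uint64_t spin_table_wait(const volatile uint64_t *release_addr)
{
    uint64_t raw;

    while ((raw = *release_addr) == 0) {
        /* A real boot loader would typically execute wfe here. */
    }
    return le64_to_native(raw);
}
```

Zero reads as zero in either byte order, so the loop can test the raw
value and convert only once a release address has been observed.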