===================================================
Scalable Vector Extension support for AArch64 Linux
===================================================

Author: Dave Martin <Dave.Martin@arm.com>

Date:   4 August 2017

This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE).

This is an outline of the most important features and issues only and is not
intended to be exhaustive.

This document does not aim to describe the SVE architecture or programmer's
model.  To aid understanding, a minimal description of relevant programmer's
model features for SVE is included in Appendix A.


1.  General
-----------

* SVE registers Z0..Z31, P0..P15 and FFR, and the current vector length VL,
  are tracked per-thread.

* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
  AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
  instructions and registers, and the Linux-specific system interfaces
  described in this document.  SVE is reported in /proc/cpuinfo as "sve".

* Support for the execution of SVE instructions in userspace can also be
  detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
  instruction, and checking that the value of the SVE field is nonzero. [3]

  A nonzero SVE field does not guarantee the presence of the system interfaces
  described in the following sections: software that needs to verify that
  those interfaces are present must check for HWCAP_SVE instead.

* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
  be reported in the AT_HWCAP2 aux vector entry.  In addition to this,
  optional extensions to SVE2 may be reported by the presence of:

	HWCAP2_SVE2
	HWCAP2_SVEAES
	HWCAP2_SVEPMULL
	HWCAP2_SVEBITPERM
	HWCAP2_SVESHA3
	HWCAP2_SVESM4

  This list may be extended over time as the SVE architecture evolves.

  These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
  which userspace can read using an MRS instruction.  See elf_hwcaps.rst and
  cpu-feature-registers.rst for details.

* Debuggers should restrict themselves to interacting with the target via the
  NT_ARM_SVE regset.  The recommended way of detecting support for this regset
  is to connect to a target process first and then attempt a
  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).

* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
  between userspace and the kernel, the register value is encoded in memory in
  an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
  byte offset i from the start of the memory representation.  This affects,
  for example, the signal frame (struct sve_context) and ptrace interface
  (struct user_sve_header) and associated data.

  Beware that on big-endian systems this results in a different byte order than
  for the FPSIMD V-registers, which are stored as single host-endian 128-bit
  values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
  byte offset i.  (struct fpsimd_context, struct user_fpsimd_state).

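The HWCAP-based detection described in the list above could be sketched in C
as follows.  This is a minimal sketch rather than a definitive implementation:
the helper names are hypothetical, and the fallback bit values for HWCAP_SVE
and HWCAP2_SVE2 are assumed to match the arm64 uapi header <asm/hwcap.h>,
which should be preferred when building for arm64.

```c
#include <assert.h>
#include <sys/auxv.h>

/* Fallbacks for non-arm64 builds; assumed to mirror <asm/hwcap.h>. */
#ifndef HWCAP_SVE
#define HWCAP_SVE	(1UL << 22)
#endif
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2	(1UL << 1)
#endif

/* Hypothetical helper: nonzero if an AT_HWCAP value advertises SVE. */
static inline int hwcap_has_sve(unsigned long hwcap)
{
	return (hwcap & HWCAP_SVE) != 0;
}

/* Hypothetical helper: SVE2 is only considered when SVE itself is present. */
static inline int hwcap_has_sve2(unsigned long hwcap, unsigned long hwcap2)
{
	return hwcap_has_sve(hwcap) && (hwcap2 & HWCAP2_SVE2) != 0;
}

/* Check the aux vector of the running process. */
static inline int sve_available(void)
{
	return hwcap_has_sve(getauxval(AT_HWCAP));
}
```

A program would typically call sve_available() once at startup and only then
use the prctl() and signal frame interfaces described in later sections.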

2.  Vector length terminology
-----------------------------

The size of an SVE vector (Z) register is referred to as the "vector length".

To avoid confusion about the units used to express vector length, the kernel
adopts the following conventions:

* Vector length (VL) = size of a Z-register in bytes

* Vector quadwords (VQ) = size of a Z-register in units of 128 bits

(So, VL = 16 * VQ.)

The VQ convention is used where the underlying granularity is important, such
as in data structure definitions.  In most other situations, the VL convention
is used.  This is consistent with the meaning of the "VL" pseudo-register in
the SVE instruction set architecture.

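The relationship VL = 16 * VQ can be illustrated with a few conversion
helpers.  This is a sketch only: the function names are hypothetical, and the
well-formedness check covers just the "nonzero multiple of 16" rule, leaving
any upper limit to the kernel.

```c
#include <assert.h>

#define VQ_BYTES 16u	/* one quadword = 128 bits = 16 bytes */

/* Hypothetical helper: a VL in bytes must be a nonzero multiple of 16. */
static inline int vl_well_formed(unsigned int vl)
{
	return vl >= VQ_BYTES && vl % VQ_BYTES == 0;
}

/* Convert a vector length in bytes to a count of 128-bit quadwords. */
static inline unsigned int vq_from_vl(unsigned int vl)
{
	assert(vl_well_formed(vl));
	return vl / VQ_BYTES;
}

/* Convert a count of 128-bit quadwords to a vector length in bytes. */
static inline unsigned int vl_from_vq(unsigned int vq)
{
	return vq * VQ_BYTES;
}
```

For example, an implementation with 512-bit Z-registers has VL = 64 and
VQ = 4.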

3.  System call behaviour
-------------------------

* On syscall, V0..V31 are preserved (as without SVE).  Thus, bits [127:0] of
  Z0..Z31 are preserved.  All other bits of Z0..Z31, and all of P0..P15 and FFR
  become unspecified on return from a syscall.

* The SVE registers are not used to pass arguments to or receive results from
  any syscall.

* In practice the affected registers/bits will be preserved or will be replaced
  with zeros on return from a syscall, but userspace should not make
  assumptions about this.  The kernel behaviour may vary on a case-by-case
  basis.

* All other SVE state of a thread, including the currently configured vector
  length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
  length (if any), is preserved across all syscalls, subject to the specific
  exceptions for execve() described in section 6.

  In particular, on return from a fork() or clone(), the parent and new child
  process or thread share identical SVE configuration, matching that of the
  parent before the call.


4.  Signal handling
-------------------

* A new signal frame record sve_context encodes the SVE registers on signal
  delivery. [1]

* This record is supplementary to fpsimd_context.  The FPSR and FPCR registers
  are only present in fpsimd_context.  For convenience, the content of V0..V31
  is duplicated between sve_context and fpsimd_context.

* The signal frame record for SVE always contains basic metadata, in particular
  the thread's vector length (in sve_context.vl).

* The SVE registers may or may not be included in the record, depending on
  whether the registers are live for the thread.  The registers are present if
  and only if:
  sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).

* If the registers are present, the remainder of the record has a vl-dependent
  size and layout.  Macros SVE_SIG_* are defined [1] to facilitate access to
  the members.

* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
  layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
  start of the register's representation in memory.

* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
  space is allocated on the stack and an extra_context record referencing this
  space is written in __reserved[].  sve_context is then written in the extra
  space.  Refer to [1] for further details about this mechanism.

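The endianness-invariant layout used for the scalable registers in the signal
frame can be made concrete with a small worked example.  The sketch below is
illustrative only: the function names are hypothetical, and passing the
register value as 64-bit limbs (limbs[0] holding bits [63:0]) is an assumption
made for the example.  The second function shows, for contrast, how a
big-endian host would lay out a 128-bit FPSIMD V-register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store a scalable register value in the endianness-invariant layout:
 * byte i of dst receives bits [(8 * i + 7) : (8 * i)] of the value.
 */
static inline void sve_store_invariant(uint8_t *dst, const uint64_t *limbs,
				       size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		dst[i] = (uint8_t)(limbs[i / 8] >> (8 * (i % 8)));
}

/*
 * For contrast: a 128-bit V-register as a big-endian host stores it, with
 * bits [(127 - 8 * i) : (120 - 8 * i)] at byte offset i.
 */
static inline void fpsimd_store_big_endian(uint8_t dst[16], uint64_t lo,
					   uint64_t hi)
{
	for (size_t i = 0; i < 16; i++) {
		size_t bit = 120 - 8 * i;	/* lowest bit held by byte i */
		uint64_t limb = bit >= 64 ? hi : lo;

		dst[i] = (uint8_t)(limb >> (bit % 64));
	}
}
```

On a little-endian host the two layouts coincide for the low 128 bits; on a
big-endian host the byte order within each 128-bit value is reversed.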

5.  Signal return
-----------------

When returning from a signal handler:

* If there is no sve_context record in the signal frame, or if the record is
  present but contains no register data as described in the previous section,
  then the SVE registers/bits become non-live and take unspecified values.

* If sve_context is present in the signal frame and contains full register
  data, the SVE registers become live and are populated with the specified
  data.  However, for backward compatibility reasons, bits [127:0] of Z0..Z31
  are always restored from the corresponding members of fpsimd_context.vregs[]
  and not from sve_context.  The remaining bits are restored from sve_context.

* Inclusion of fpsimd_context in the signal frame remains mandatory,
  irrespective of whether sve_context is present or not.

* The vector length cannot be changed via signal return.  If sve_context.vl in
  the signal frame does not match the current vector length, the signal return
  attempt is treated as illegal, resulting in a forced SIGSEGV.

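The three cases distinguished above (no sve_context record, a record without
register data, a record with full register data) could be told apart by
walking the records of the signal frame.  The following is a sketch under
stated assumptions: struct ctx_head is a local stand-in for the record header
defined in [1], EX_SVE_MAGIC is assumed to equal SVE_MAGIC, the function name
is hypothetical, and extra_context records are not followed.  The caller is
expected to compute full_size with
SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(vl)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the record header: 32-bit magic, 32-bit size (assumed). */
struct ctx_head {
	uint32_t magic;
	uint32_t size;
};

#define EX_SVE_MAGIC	0x53564501u	/* assumed value of SVE_MAGIC */

enum sve_frame_state { SVE_REC_ABSENT, SVE_REC_HEADER_ONLY, SVE_REC_FULL };

/*
 * Walk the records in buf (standing in for sigcontext.__reserved[]) until
 * the zero terminator, and classify the sve_context record if one is found.
 */
static inline enum sve_frame_state
classify_sve_record(const uint8_t *buf, size_t len, size_t full_size)
{
	size_t off = 0;

	while (off + sizeof(struct ctx_head) <= len) {
		struct ctx_head head;

		memcpy(&head, buf + off, sizeof(head));
		if (head.magic == 0 || head.size < sizeof(head))
			break;		/* terminator or malformed record */
		if (head.magic == EX_SVE_MAGIC)
			return head.size >= full_size ? SVE_REC_FULL
						      : SVE_REC_HEADER_ONLY;
		off += head.size;
	}
	return SVE_REC_ABSENT;
}
```

Real code should use the structures and SVE_SIG_* macros from [1] instead of
the local mirror shown here.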

6.  prctl extensions
--------------------

Some new prctl() calls are added to allow programs to manage the SVE vector
length:

prctl(PR_SVE_SET_VL, unsigned long arg)

    Sets the vector length of the calling thread and related flags, where
    arg == vl | flags.  Other threads of the calling process are unaffected.

    vl is the desired vector length, where sve_vl_valid(vl) must be true.

    flags:

	PR_SVE_VL_INHERIT

	    Inherit the current vector length across execve().  Otherwise, the
	    vector length is reset to the system default at execve().  (See
	    Section 9.)

	PR_SVE_SET_VL_ONEXEC

	    Defer the requested vector length change until the next execve()
	    performed by this thread.

	    The effect is equivalent to implicit execution of the following
	    call immediately after the next execve() (if any) by the thread:

		prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)

	    This allows launching of a new program with a different vector
	    length, while avoiding runtime side effects in the caller.


	    Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
	    immediately.


    Return value: a nonnegative value on success, or a negative value on error:
	EINVAL: SVE not supported, invalid vector length requested, or
	    invalid flags.


    On success:

    * Either the calling thread's vector length or the deferred vector length
      to be applied at the next execve() by the thread (dependent on whether
      PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
      supported by the system that is less than or equal to vl.  If vl ==
      SVE_VL_MAX, the value set will be the largest value supported by the
      system.

    * Any previously outstanding deferred vector length change in the calling
      thread is cancelled.

    * The returned value describes the resulting configuration, encoded as for
      PR_SVE_GET_VL.  The vector length reported in this value is the new
      current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
      present in arg; otherwise, the reported vector length is the deferred
      vector length that will be applied at the next execve() by the calling
      thread.

    * Changing the vector length causes all of P0..P15, FFR and all bits of
      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
      unspecified.  Calling PR_SVE_SET_VL with vl equal to the thread's current
      vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC
      flag, does not constitute a change to the vector length for this purpose.

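The arg == vl | flags encoding described above could be built as in the
following sketch.  The helper names are hypothetical, and the fallback
constant values are assumed to mirror <linux/prctl.h> for the benefit of older
userspace headers.  Through the C library wrapper, a failing prctl() returns
-1 with errno set.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Fallbacks for old headers; assumed to mirror <linux/prctl.h>. */
#ifndef PR_SVE_SET_VL
#define PR_SVE_SET_VL		50
#define PR_SVE_SET_VL_ONEXEC	(1 << 18)
#define PR_SVE_VL_INHERIT	(1 << 17)
#endif

/* Hypothetical helper: build arg == vl | flags for PR_SVE_SET_VL. */
static inline unsigned long sve_set_vl_arg(unsigned int vl, int inherit,
					   int onexec)
{
	unsigned long arg = vl;

	if (inherit)
		arg |= PR_SVE_VL_INHERIT;
	if (onexec)
		arg |= PR_SVE_SET_VL_ONEXEC;
	return arg;
}

/*
 * Request vl for the next execve() only, leaving the caller's current
 * vector length untouched.  Returns the prctl() result.
 */
static inline int sve_defer_vl_to_exec(unsigned int vl)
{
	return prctl(PR_SVE_SET_VL, sve_set_vl_arg(vl, 0, 1));
}
```

A launcher would typically call sve_defer_vl_to_exec() in the child between
fork() and execve(), so that the parent's registers are not disturbed.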

prctl(PR_SVE_GET_VL)

    Gets the vector length of the calling thread.

    The following flag may be OR-ed into the result:

	PR_SVE_VL_INHERIT

	    Vector length will be inherited across execve().

    There is no way to determine whether there is an outstanding deferred
    vector length change (which would only normally be the case between a
    fork() or vfork() and the corresponding execve() in typical use).

    To extract the vector length from the result, bitwise-AND it with
    PR_SVE_VL_LEN_MASK.

    Return value: a nonnegative value on success, or a negative value on error:
	EINVAL: SVE not supported.

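Decoding the PR_SVE_GET_VL result as described above might look like the
following sketch.  struct sve_vl_config and both function names are
hypothetical, and the fallback constant values are assumed to mirror
<linux/prctl.h>.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Fallbacks for old headers; assumed to mirror <linux/prctl.h>. */
#ifndef PR_SVE_GET_VL
#define PR_SVE_GET_VL		51
#endif
#ifndef PR_SVE_VL_LEN_MASK
#define PR_SVE_VL_LEN_MASK	0xffff
#endif
#ifndef PR_SVE_VL_INHERIT
#define PR_SVE_VL_INHERIT	(1 << 17)
#endif

struct sve_vl_config {		/* hypothetical container for the result */
	unsigned int vl;	/* vector length in bytes */
	int inherit;		/* nonzero if PR_SVE_VL_INHERIT is set */
};

/* Decode a nonnegative PR_SVE_GET_VL (or PR_SVE_SET_VL) return value. */
static inline struct sve_vl_config sve_decode_vl(int ret)
{
	struct sve_vl_config cfg;

	assert(ret >= 0);
	cfg.vl = (unsigned int)ret & PR_SVE_VL_LEN_MASK;
	cfg.inherit = (ret & PR_SVE_VL_INHERIT) != 0;
	return cfg;
}

/* Returns 0 and fills *cfg on success, or -1 if prctl() fails. */
static inline int sve_get_vl(struct sve_vl_config *cfg)
{
	int ret = prctl(PR_SVE_GET_VL);

	if (ret < 0)
		return -1;
	*cfg = sve_decode_vl(ret);
	return 0;
}
```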

7.  ptrace extensions
---------------------

* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
  PTRACE_SETREGSET.

  Refer to [2] for definitions.

The regset data starts with struct user_sve_header, containing:

    size

	Size of the complete regset, in bytes.
	This depends on vl and possibly on other things in the future.

	If a call to PTRACE_GETREGSET requests less data than the value of
	size, the caller can allocate a larger buffer and retry in order to
	read the complete regset.

    max_size

	Maximum size in bytes that the regset can grow to for the target
	thread.  The regset won't grow bigger than this even if the target
	thread changes its vector length etc.

    vl

	Target thread's current vector length, in bytes.

    max_vl

	Maximum possible vector length for the target thread.

    flags

	either

	    SVE_PT_REGS_FPSIMD

		SVE registers are not live (GETREGSET) or are to be made
		non-live (SETREGSET).

		The payload is of type struct user_fpsimd_state, with the same
		meaning as for NT_PRFPREG, starting at offset
		SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.

		Extra data might be appended in the future: the size of the
		payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).

		vq should be obtained using sve_vq_from_vl(vl).

		or

	    SVE_PT_REGS_SVE

		SVE registers are live (GETREGSET) or are to be made live
		(SETREGSET).

		The payload contains the SVE register data, starting at offset
		SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
		size SVE_PT_SVE_SIZE(vq, flags).

	... OR-ed with zero or more of the following flags, which have the same
	meaning and behaviour as the corresponding PR_SVE_* flags
	(PR_SVE_VL_INHERIT and PR_SVE_SET_VL_ONEXEC):

	    SVE_PT_VL_INHERIT

	    SVE_PT_VL_ONEXEC (SETREGSET only).

* The effects of changing the vector length and/or flags are equivalent to
  those documented for PR_SVE_SET_VL.

  The caller must make a further GETREGSET call if it needs to know what VL is
  actually set by SETREGSET, unless it is known in advance that the requested
  VL is supported.

* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on
  the header fields.  The SVE_PT_SVE_*() macros are provided to facilitate
  access to the members.

* In either case, for SETREGSET it is permissible to omit the payload, in which
  case only the vector length and flags are changed (along with any
  consequences of those changes).

* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
  requested VL is not supported, the effect will be the same as if the
  payload were omitted, except that an EIO error is reported.  No
  attempt is made to translate the payload data to the correct layout
  for the vector length actually set.  The thread's FPSIMD state is
  preserved, but the remaining bits of the SVE registers become
  unspecified.  It is up to the caller to translate the payload layout
  for the actual VL and retry.

* The effect of writing a partial, incomplete payload is unspecified.

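The retry rule for the size field and the interpretation of flags could be
sketched as below.  This is an illustration under stated assumptions: struct
sve_header_mirror is a local stand-in assumed to match struct user_sve_header
from [2], the EX_SVE_PT_* values are assumed to equal SVE_PT_REGS_MASK and
SVE_PT_REGS_SVE, and the function names are hypothetical.  Real code on arm64
should include <asm/ptrace.h> instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct user_sve_header (assumed layout). */
struct sve_header_mirror {
	uint32_t size;		/* size of the complete regset, in bytes */
	uint32_t max_size;	/* maximum size the regset can grow to */
	uint16_t vl;		/* current vector length, in bytes */
	uint16_t max_vl;	/* maximum possible vector length */
	uint16_t flags;
	uint16_t reserved;
};

/* Assumed values of SVE_PT_REGS_MASK and SVE_PT_REGS_SVE. */
#define EX_SVE_PT_REGS_MASK	(1u << 0)
#define EX_SVE_PT_REGS_SVE	(1u << 0)

/*
 * After a PTRACE_GETREGSET into a buffer of buf_len bytes, return the
 * buffer size needed for a retry, or 0 if the regset was read completely.
 */
static inline size_t sve_regset_retry_size(const struct sve_header_mirror *h,
					   size_t buf_len)
{
	return h->size > buf_len ? h->size : 0;
}

/* Nonzero if the payload holds full SVE register data. */
static inline int sve_payload_is_sve(const struct sve_header_mirror *h)
{
	return (h->flags & EX_SVE_PT_REGS_MASK) == EX_SVE_PT_REGS_SVE;
}
```

A debugger could first read only the header, then allocate the number of
bytes returned by sve_regset_retry_size() and repeat the GETREGSET call.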

8.  ELF coredump extensions
---------------------------

* A NT_ARM_SVE note will be added to each coredump for each thread of the
  dumped process.  The contents will be equivalent to the data that would have
  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
  when the coredump was generated.


9.  System runtime configuration
--------------------------------

* To mitigate the ABI impact of expansion of the signal frame, a policy
  mechanism is provided for administrators, distro maintainers and developers
  to set the default vector length for userspace processes:

/proc/sys/abi/sve_default_vector_length

    Writing the text representation of an integer to this file sets the system
    default vector length to the specified value, unless the value is greater
    than the maximum vector length supported by the system in which case the
    default vector length is set to that maximum.

    The result can be determined by reopening the file and reading its
    contents.

    At boot, the default vector length is initially set to 64 or the maximum
    supported vector length, whichever is smaller.  This determines the initial
    vector length of the init process (PID 1).

    Reading this file returns the current system default vector length.

* At every execve() call, the vector length of the new process is set to
  the system default vector length, unless

    * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
      calling thread, or

    * a deferred vector length change is pending, established via the
      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).

* Modifying the system default vector length does not affect the vector length
  of any existing process or thread that does not make an execve() call.

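Reading the system default vector length from the file described above could
be done as in the following sketch.  The helper names are hypothetical.  The
second function only models the one write rule stated above (a request above
the system maximum is set to that maximum) and makes no claim about how the
kernel treats other values.

```c
#include <assert.h>
#include <stdio.h>

#define SVE_DEFAULT_VL_PATH "/proc/sys/abi/sve_default_vector_length"

/*
 * Hypothetical helper: read the default vector length, in bytes, from path.
 * Returns -1 if the file cannot be opened or parsed.
 */
static inline long sve_read_default_vl(const char *path)
{
	FILE *f = fopen(path, "r");
	long vl = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &vl) != 1)
		vl = -1;
	fclose(f);
	return vl;
}

/* Model of the stated write rule: requests above the maximum are clamped. */
static inline long sve_clamp_default_vl(long requested, long system_max)
{
	return requested > system_max ? system_max : requested;
}
```

A caller would pass SVE_DEFAULT_VL_PATH to sve_read_default_vl() and treat a
result of -1 as "default vector length not available".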

Appendix A.  SVE programmer's model (informative)
=================================================

This section provides a minimal description of the additions made by SVE to the
ARMv8-A programmer's model that are relevant to this document.

Note: This section is for information only and is not intended to be complete
or to replace any architectural specification.

A.1.  Registers
---------------

In A64 state, SVE adds the following:

* 32 8VL-bit vector registers Z0..Z31
  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.

  A register write using a Vn register name zeros all bits of the corresponding
  Zn except for bits [127:0].

* 16 VL-bit predicate registers P0..P15

* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")

* a VL "pseudo-register" that determines the size of each vector register

  The SVE instruction set architecture provides no way to write VL directly.
  Instead, it can be modified only by EL1 and above, by writing appropriate
  system registers.

* The value of VL can be configured at runtime by EL1 and above:
  16 <= VL <= VLmax, where VL must be a multiple of 16.

* The maximum vector length is determined by the hardware:
  16 <= VLmax <= 256.

  (The SVE architecture specifies 256, but permits future architecture
  revisions to raise this limit.)

* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
  operations in a similar way to the way in which they interact with ARMv8
  floating-point operations::

         8VL-1                       128               0  bit index
        +----          ////            -----------------+
     Z0 |                               :       V0      |
      :                                          :
     Z7 |                               :       V7      |
     Z8 |                               :     * V8      |
      :                                       :  :
    Z15 |                               :     *V15      |
    Z16 |                               :      V16      |
      :                                          :
    Z31 |                               :      V31      |
        +----          ////            -----------------+
                                                 31    0
         VL-1                  0                +-------+
        +----       ////      --+          FPSR |       |
     P0 |                       |               +-------+
      : |                       |         *FPCR |       |
    P15 |                       |               +-------+
        +----       ////      --+
    FFR |                       |               +-----+
        +----       ////      --+            VL |     |
                                                +-----+

(*) callee-save:
    This only applies to bits [63:0] of Z-/V-registers.
    FPCR contains callee-save and caller-save bits.  See [4] for details.

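The register sizes listed above translate into byte counts as in the
following worked sketch, where VL is in bytes as elsewhere in this document.
The function names are hypothetical, and the total covers only Z0..Z31,
P0..P15 and FFR, not FPSR, FPCR or any padding an interface may add.

```c
#include <assert.h>
#include <stddef.h>

/* A Z-register is 8VL bits wide, i.e. vl bytes. */
static inline size_t sve_zreg_bytes(size_t vl)
{
	return vl;
}

/* A predicate register (and FFR) is VL bits wide, i.e. vl / 8 bytes. */
static inline size_t sve_preg_bytes(size_t vl)
{
	return vl / 8;
}

/* Total bytes of Z0..Z31, P0..P15 and FFR for one thread. */
static inline size_t sve_state_bytes(size_t vl)
{
	assert(vl >= 16 && vl % 16 == 0);
	return 32 * sve_zreg_bytes(vl) + (16 + 1) * sve_preg_bytes(vl);
}
```

With the minimum VL of 16 this gives 546 bytes, and with VL = 256 it gives
8736 bytes.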

A.2.  Procedure call standard
-----------------------------

The ARMv8-A base procedure call standard is extended as follows with respect to
the additional SVE register state:

* All SVE register bits that are not shared with FP/SIMD are caller-save.

* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.

  This follows from the way these bits are mapped to bits [63:0] of V8..V15,
  which are callee-save in the base procedure call standard.


Appendix B.  ARMv8-A FP/SIMD programmer's model
===============================================

Note: This section is for information only and is not intended to be complete
or to replace any architectural specification.

Refer to [4] for more information.

ARMv8-A defines the following floating-point / SIMD register state:

* 32 128-bit vector registers V0..V31
* 2 32-bit status/control registers FPSR, FPCR

::

         127           0  bit index
        +---------------+
     V0 |               |
      : :               :
     V7 |               |
   * V8 |               |
   :  : :               :
   *V15 |               |
    V16 |               |
      : :               :
    V31 |               |
        +---------------+

                 31    0
                +-------+
           FPSR |       |
                +-------+
          *FPCR |       |
                +-------+

(*) callee-save:
    This only applies to bits [63:0] of V-registers.
    FPCR contains a mixture of callee-save and caller-save bits.


References
==========

[1] arch/arm64/include/uapi/asm/sigcontext.h
    AArch64 Linux signal ABI definitions

[2] arch/arm64/include/uapi/asm/ptrace.h
    AArch64 Linux ptrace ABI definitions

[3] Documentation/arm64/cpu-feature-registers.rst

[4] ARM IHI0055C
    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)