===================================================
Scalable Vector Extension support for AArch64 Linux
===================================================

Author: Dave Martin <Dave.Martin@arm.com>

Date:   4 August 2017

This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE).

This is an outline of the most important features and issues only and is not
intended to be exhaustive.

This document does not aim to describe the SVE architecture or programmer's
model.  To aid understanding, a minimal description of relevant programmer's
model features for SVE is included in Appendix A.


1.  General
-----------

* SVE registers Z0..Z31, P0..P15 and FFR, and the current vector length VL, are
  tracked per-thread.

* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
  AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
  instructions and registers, and the Linux-specific system interfaces
  described in this document.  SVE is reported in /proc/cpuinfo as "sve".

* Support for the execution of SVE instructions in userspace can also be
  detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
  instruction, and checking that the value of the SVE field is nonzero.
  [3]

  This does not guarantee the presence of the system interfaces described in
  the following sections: software that needs to verify that those interfaces
  are present must check for HWCAP_SVE instead.

* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
  be reported in the AT_HWCAP2 aux vector entry.  In addition to this,
  optional extensions to SVE2 may be reported by the presence of:

        HWCAP2_SVE2
        HWCAP2_SVEAES
        HWCAP2_SVEPMULL
        HWCAP2_SVEBITPERM
        HWCAP2_SVESHA3
        HWCAP2_SVESM4

  This list may be extended over time as the SVE architecture evolves.

  These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
  which userspace can read using an MRS instruction.  See elf_hwcaps.rst and
  cpu-feature-registers.rst for details.

* Debuggers should restrict themselves to interacting with the target via the
  NT_ARM_SVE regset.  The recommended way of detecting support for this regset
  is to connect to a target process first and then attempt a
  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).

* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
  between userspace and the kernel, the register value is encoded in memory in
  an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
  byte offset i from the start of the memory representation.
  This affects, for example, the signal frame (struct sve_context) and the
  ptrace interface (struct user_sve_header) and associated data.

  Beware that on big-endian systems this results in a different byte order than
  for the FPSIMD V-registers, which are stored as single host-endian 128-bit
  values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
  byte offset i (see struct fpsimd_context and struct user_fpsimd_state).


2.  Vector length terminology
-----------------------------

The size of an SVE vector (Z) register is referred to as the "vector length".

To avoid confusion about the units used to express vector length, the kernel
adopts the following conventions:

* Vector length (VL) = size of a Z-register in bytes

* Vector quadwords (VQ) = size of a Z-register in units of 128 bits

(So, VL = 16 * VQ.)

The VQ convention is used where the underlying granularity is important, such
as in data structure definitions.  In most other situations, the VL convention
is used.  This is consistent with the meaning of the "VL" pseudo-register in
the SVE instruction set architecture.


3.  System call behaviour
-------------------------

* On syscall, V0..V31 are preserved (as without SVE).  Thus, bits [127:0] of
  Z0..Z31 are preserved.
  All other bits of Z0..Z31, and all of P0..P15 and FFR
  become unspecified on return from a syscall.

* The SVE registers are not used to pass arguments to or receive results from
  any syscall.

* In practice the affected registers/bits will be preserved or will be replaced
  with zeros on return from a syscall, but userspace should not make
  assumptions about this.  The kernel behaviour may vary on a case-by-case
  basis.

* All other SVE state of a thread, including the currently configured vector
  length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
  length (if any), is preserved across all syscalls, subject to the specific
  exceptions for execve() described in section 6.

  In particular, on return from a fork() or clone(), the parent and new child
  process or thread share identical SVE configuration, matching that of the
  parent before the call.


4.  Signal handling
-------------------

* A new signal frame record sve_context encodes the SVE registers on signal
  delivery. [1]

* This record is supplementary to fpsimd_context.  The FPSR and FPCR registers
  are only present in fpsimd_context.  For convenience, the content of V0..V31
  is duplicated between sve_context and fpsimd_context.

* The signal frame record for SVE always contains basic metadata, in particular
  the thread's vector length (in sve_context.vl).

* The SVE registers may or may not be included in the record, depending on
  whether the registers are live for the thread.  The registers are present if
  and only if:
  sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).

* If the registers are present, the remainder of the record has a vl-dependent
  size and layout.  Macros SVE_SIG_* are defined [1] to facilitate access to
  the members.

* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
  layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
  start of the register's representation in memory.

* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
  space is allocated on the stack, and an extra_context record is written in
  __reserved[] referencing this space.  sve_context is then written in the
  extra space.  Refer to [1] for further details about this mechanism.


5.  Signal return
-----------------

When returning from a signal handler:

* If there is no sve_context record in the signal frame, or if the record is
  present but contains no register data as described in the previous section,
  then the SVE registers/bits become non-live and take unspecified values.

* If sve_context is present in the signal frame and contains full register
  data, the SVE registers become live and are populated with the specified
  data.  However, for backward compatibility reasons, bits [127:0] of Z0..Z31
  are always restored from the corresponding members of fpsimd_context.vregs[]
  and not from sve_context.  The remaining bits are restored from sve_context.

* Inclusion of fpsimd_context in the signal frame remains mandatory,
  irrespective of whether sve_context is present or not.

* The vector length cannot be changed via signal return.  If sve_context.vl in
  the signal frame does not match the current vector length, the signal return
  attempt is treated as illegal, resulting in a forced SIGSEGV.


6.  prctl extensions
--------------------

Some new prctl() calls are added to allow programs to manage the SVE vector
length:

prctl(PR_SVE_SET_VL, unsigned long arg)

    Sets the vector length of the calling thread and related flags, where
    arg == vl | flags.  Other threads of the calling process are unaffected.

    vl is the desired vector length, where sve_vl_valid(vl) must be true.

    flags:

        PR_SVE_VL_INHERIT

            Inherit the current vector length across execve().  Otherwise, the
            vector length is reset to the system default at execve().
            (See Section 9.)

        PR_SVE_SET_VL_ONEXEC

            Defer the requested vector length change until the next execve()
            performed by this thread.

            The effect is equivalent to implicit execution of the following
            call immediately after the next execve() (if any) by the thread:

                prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)

            This allows launching of a new program with a different vector
            length, while avoiding runtime side effects in the caller.


            Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
            immediately.


    Return value: a nonnegative value on success, or a negative value on error:
        EINVAL:  SVE not supported, invalid vector length requested, or
            invalid flags.


    On success:

    * Either the calling thread's vector length or the deferred vector length
      to be applied at the next execve() by the thread (dependent on whether
      PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
      supported by the system that is less than or equal to vl.  If vl ==
      SVE_VL_MAX, the value set will be the largest value supported by the
      system.

    * Any previously outstanding deferred vector length change in the calling
      thread is cancelled.

    * The returned value describes the resulting configuration, encoded as for
      PR_SVE_GET_VL.  The vector length reported in this value is the new
      current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
      present in arg; otherwise, the reported vector length is the deferred
      vector length that will be applied at the next execve() by the calling
      thread.

    * Changing the vector length causes all of P0..P15, FFR and all bits of
      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
      unspecified.  Calling PR_SVE_SET_VL with vl equal to the thread's current
      vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC
      flag, does not constitute a change to the vector length for this purpose.


prctl(PR_SVE_GET_VL)

    Gets the vector length of the calling thread.

    The following flag may be OR-ed into the result:

        PR_SVE_VL_INHERIT

            Vector length will be inherited across execve().

    There is no way to determine whether there is an outstanding deferred
    vector length change (which would only normally be the case between a
    fork() or vfork() and the corresponding execve() in typical use).

    To extract the vector length from the result, bitwise-AND it with
    PR_SVE_VL_LEN_MASK.
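
    As a minimal sketch of decoding the PR_SVE_GET_VL result, a program might
    do something like the following.  The helper names sve_result_vl() and
    sve_result_inherit() are hypothetical and exist only for this
    illustration; the fallback #defines merely repeat the values from
    <linux/prctl.h> for the benefit of older userspace headers::

        #include <stdio.h>
        #include <sys/prctl.h>

        /* Fallbacks for older userspace headers (values from linux/prctl.h). */
        #ifndef PR_SVE_GET_VL
        #define PR_SVE_GET_VL 51
        #endif
        #ifndef PR_SVE_VL_LEN_MASK
        #define PR_SVE_VL_LEN_MASK 0xffff
        #endif
        #ifndef PR_SVE_VL_INHERIT
        #define PR_SVE_VL_INHERIT (1 << 17)
        #endif

        /* Hypothetical helper: vector length in bytes from a prctl() result. */
        static unsigned int sve_result_vl(int ret)
        {
                return (unsigned int)ret & PR_SVE_VL_LEN_MASK;
        }

        /* Hypothetical helper: 1 if PR_SVE_VL_INHERIT is set in the result. */
        static int sve_result_inherit(int ret)
        {
                return (ret & PR_SVE_VL_INHERIT) != 0;
        }

        int main(void)
        {
                int ret = prctl(PR_SVE_GET_VL);

                if (ret < 0) {
                        /* Typically EINVAL: SVE not supported. */
                        printf("SVE vector length not available\n");
                        return 0;
                }
                printf("vl=%u bytes, inherit=%d\n",
                       sve_result_vl(ret), sve_result_inherit(ret));
                return 0;
        }

    The same decoding applies to a successful PR_SVE_SET_VL return value,
    since that value is encoded as for PR_SVE_GET_VL.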

    Return value: a nonnegative value on success, or a negative value on error:
        EINVAL:  SVE not supported.


7.  ptrace extensions
---------------------

* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
  PTRACE_SETREGSET.

  Refer to [2] for definitions.

The regset data starts with struct user_sve_header, containing:

    size

        Size of the complete regset, in bytes.
        This depends on vl and possibly on other things in the future.

        If a call to PTRACE_GETREGSET requests less data than the value of
        size, the caller can allocate a larger buffer and retry in order to
        read the complete regset.

    max_size

        Maximum size in bytes that the regset can grow to for the target
        thread.  The regset won't grow bigger than this even if the target
        thread changes its vector length etc.

    vl

        Target thread's current vector length, in bytes.

    max_vl

        Maximum possible vector length for the target thread.

    flags

        either

            SVE_PT_REGS_FPSIMD

                SVE registers are not live (GETREGSET) or are to be made
                non-live (SETREGSET).

                The payload is of type struct user_fpsimd_state, with the same
                meaning as for NT_PRFPREG, starting at offset
                SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.

                Extra data might be appended in the future: the size of the
                payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).

                vq should be obtained using sve_vq_from_vl(vl).

            or

            SVE_PT_REGS_SVE

                SVE registers are live (GETREGSET) or are to be made live
                (SETREGSET).

                The payload contains the SVE register data, starting at offset
                SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
                size SVE_PT_SVE_SIZE(vq, flags);

        ... OR-ed with zero or more of the following flags, which have the same
        meaning and behaviour as the corresponding prctl() flags
        (PR_SVE_VL_INHERIT and PR_SVE_SET_VL_ONEXEC):

            SVE_PT_VL_INHERIT

            SVE_PT_VL_ONEXEC (SETREGSET only).

* The effects of changing the vector length and/or flags are equivalent to
  those documented for PR_SVE_SET_VL.

  The caller must make a further GETREGSET call if it needs to know what VL is
  actually set by SETREGSET, unless it is known in advance that the requested
  VL is supported.

* In the SVE_PT_REGS_SVE case, the size and layout of the payload depend on
  the header fields.  The SVE_PT_SVE_*() macros are provided to facilitate
  access to the members.

* In either case, for SETREGSET it is permissible to omit the payload, in which
  case only the vector length and flags are changed (along with any
  consequences of those changes).

* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
  requested VL is not supported, the effect will be the same as if the
  payload were omitted, except that an EIO error is reported.  No
  attempt is made to translate the payload data to the correct layout
  for the vector length actually set.  The thread's FPSIMD state is
  preserved, but the remaining bits of the SVE registers become
  unspecified.  It is up to the caller to translate the payload layout
  for the actual VL and retry.

* The effect of writing a partial, incomplete payload is unspecified.


8.  ELF coredump extensions
---------------------------

* An NT_ARM_SVE note will be added to each coredump for each thread of the
  dumped process.
  The contents will be equivalent to the data that would have
  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
  when the coredump was generated.


9.  System runtime configuration
--------------------------------

* To mitigate the ABI impact of expansion of the signal frame, a policy
  mechanism is provided for administrators, distro maintainers and developers
  to set the default vector length for userspace processes:

/proc/sys/abi/sve_default_vector_length

    Writing the text representation of an integer to this file sets the system
    default vector length to the specified value, unless the value is greater
    than the maximum vector length supported by the system in which case the
    default vector length is set to that maximum.

    The result can be determined by reopening the file and reading its
    contents.

    At boot, the default vector length is initially set to 64 or the maximum
    supported vector length, whichever is smaller.  This determines the initial
    vector length of the init process (PID 1).

    Reading this file returns the current system default vector length.
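
    Reading the current default from a program can be sketched as follows.
    This is only an illustration: read_default_vl() is a hypothetical helper,
    and the sketch assumes the file holds a single decimal integer, as
    described above::

        #include <stdio.h>

        #define SVE_DEFAULT_VL_PATH "/proc/sys/abi/sve_default_vector_length"

        /*
         * Hypothetical helper: read the system default vector length, in
         * bytes.  Returns -1 if the file cannot be opened (for example on a
         * system without SVE support) or does not contain an integer.
         */
        static long read_default_vl(const char *path)
        {
                FILE *f = fopen(path, "r");
                long vl = -1;

                if (!f)
                        return -1;
                if (fscanf(f, "%ld", &vl) != 1)
                        vl = -1;
                fclose(f);
                return vl;
        }

        int main(void)
        {
                long vl = read_default_vl(SVE_DEFAULT_VL_PATH);

                if (vl < 0)
                        printf("default SVE vector length not available\n");
                else
                        printf("default SVE vector length: %ld bytes\n", vl);
                return 0;
        }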

* At every execve() call, the vector length of the new process is set to
  the system default vector length, unless

    * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
      calling thread, or

    * a deferred vector length change is pending, established via the
      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).

* Modifying the system default vector length does not affect the vector length
  of any existing process or thread that does not make an execve() call.


Appendix A.  SVE programmer's model (informative)
=================================================

This section provides a minimal description of the additions made by SVE to the
ARMv8-A programmer's model that are relevant to this document.

Note: This section is for information only and not intended to be complete or
to replace any architectural specification.

A.1.  Registers
---------------

In A64 state, SVE adds the following:

* 32 8VL-bit vector registers Z0..Z31
  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.

  A register write using a Vn register name zeros all bits of the corresponding
  Zn except for bits [127:0].

* 16 VL-bit predicate registers P0..P15

* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")

* a VL "pseudo-register" that determines the size of each vector register

  The SVE instruction set architecture provides no way to write VL directly.
  Instead, it can be modified only by EL1 and above, by writing appropriate
  system registers.

* The value of VL can be configured at runtime by EL1 and above:
  16 <= VL <= VLmax, where VL must be a multiple of 16.

* The maximum vector length is determined by the hardware:
  16 <= VLmax <= 256.

  (The SVE architecture specifies 256, but permits future architecture
  revisions to raise this limit.)

* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
  operations in a similar way to the way in which they interact with ARMv8
  floating-point operations::

             8VL-1                        128              0  bit index
            +----          ////            -----------------+
         Z0 |                               :       V0      |
          :                                          :
         Z7 |                               :       V7      |
         Z8 |                               :     * V8      |
          :                                       :  :
        Z15 |                               :     *V15      |
        Z16 |                               :      V16      |
          :                                          :
        Z31 |                               :      V31      |
            +----          ////            -----------------+
                                                     31    0
             VL-1                  0                +-------+
            +----       ////      --+          FPSR |       |
         P0 |                       |               +-------+
          : |                       |         *FPCR |       |
        P15 |                       |               +-------+
            +----       ////      --+
        FFR |                       |               +-----+
            +----       ////      --+            VL |     |
                                                    +-----+

(*) callee-save:
    This only applies to bits [63:0] of Z-/V-registers.
    FPCR contains callee-save and caller-save bits.  See [4] for details.


A.2.  Procedure call standard
-----------------------------

The ARMv8-A base procedure call standard is extended as follows with respect to
the additional SVE register state:

* All SVE register bits that are not shared with FP/SIMD are caller-save.

* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.

  This follows from the way these bits are mapped to V8..V15, which are
  callee-save in the base procedure call standard.


Appendix B.  ARMv8-A FP/SIMD programmer's model
===============================================

Note: This section is for information only and not intended to be complete or
to replace any architectural specification.

Refer to [4] for more information.

ARMv8-A defines the following floating-point / SIMD register state:

* 32 128-bit vector registers V0..V31
* 2 32-bit status/control registers FPSR, FPCR

::

              127           0  bit index
             +---------------+
          V0 |               |
           : :               :
          V7 |               |
        * V8 |               |
        :  : :               :
        *V15 |               |
         V16 |               |
           : :               :
         V31 |               |
             +---------------+

                  31    0
                 +-------+
            FPSR |       |
                 +-------+
           *FPCR |       |
                 +-------+

(*) callee-save:
    This only applies to bits [63:0] of V-registers.
    FPCR contains a mixture of callee-save and caller-save bits.
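
The register inventories in Appendices A and B translate into raw storage
sizes as sketched below.  This is only a worked illustration of the
arithmetic: the function names are hypothetical, and the figures cover the
register contents alone, not the headers or padding used by the signal frame
or ptrace layouts::

    #include <stdio.h>

    /* Bytes needed for Z0..Z31, P0..P15 and FFR at vector length vl (bytes). */
    static unsigned int sve_reg_bytes(unsigned int vl)
    {
            unsigned int z_bytes = 32 * vl;        /* 32 registers of 8*VL bits */
            unsigned int p_bytes = 16 * (vl / 8);  /* 16 registers of VL bits */
            unsigned int ffr_bytes = vl / 8;       /* 1 register of VL bits */

            return z_bytes + p_bytes + ffr_bytes;
    }

    /* Bytes needed for V0..V31 (32 registers of 128 bits). */
    static unsigned int fpsimd_vreg_bytes(void)
    {
            return 32 * (128 / 8);
    }

    int main(void)
    {
            unsigned int vl;

            /* A few sample VLs within 16 <= VL <= 256 (see Appendix A.1). */
            for (vl = 16; vl <= 256; vl *= 2)
                    printf("VL=%3u: SVE %4u bytes, FP/SIMD V-regs %u bytes\n",
                           vl, sve_reg_bytes(vl), fpsimd_vreg_bytes());
            return 0;
    }

For instance, at the minimum VL of 16 bytes the SVE registers occupy
512 + 32 + 2 = 546 bytes, against 512 bytes for V0..V31.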


References
==========

[1] arch/arm64/include/uapi/asm/sigcontext.h
    AArch64 Linux signal ABI definitions

[2] arch/arm64/include/uapi/asm/ptrace.h
    AArch64 Linux ptrace ABI definitions

[3] Documentation/arm64/cpu-feature-registers.rst

[4] ARM IHI0055C
    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)