xref: /OK3568_Linux_fs/kernel/Documentation/arm/kernel_mode_neon.rst (revision 4882a59341e53eb6f0b4789bf948001014eff981)
1*4882a593Smuzhiyun================
2*4882a593SmuzhiyunKernel mode NEON
3*4882a593Smuzhiyun================
4*4882a593Smuzhiyun
5*4882a593SmuzhiyunTL;DR summary
6*4882a593Smuzhiyun-------------
7*4882a593Smuzhiyun* Use only NEON instructions, or VFP instructions that don't rely on support
8*4882a593Smuzhiyun  code
9*4882a593Smuzhiyun* Isolate your NEON code in a separate compilation unit, and compile it with
10*4882a593Smuzhiyun  '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
11*4882a593Smuzhiyun* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
12*4882a593Smuzhiyun  NEON code
13*4882a593Smuzhiyun* Don't sleep in your NEON code, and be aware that it will be executed with
14*4882a593Smuzhiyun  preemption disabled
15*4882a593Smuzhiyun
16*4882a593Smuzhiyun
17*4882a593SmuzhiyunIntroduction
18*4882a593Smuzhiyun------------
19*4882a593SmuzhiyunIt is possible to use NEON instructions (and in some cases, VFP instructions) in
20*4882a593Smuzhiyuncode that runs in kernel mode. However, for performance reasons, the NEON/VFP
21*4882a593Smuzhiyunregister file is not preserved and restored at every context switch or taken
22*4882a593Smuzhiyunexception like the normal register file is, so some manual intervention is
23*4882a593Smuzhiyunrequired. Furthermore, special care is required for code that may sleep [i.e.,
24*4882a593Smuzhiyunmay call schedule()], as NEON or VFP instructions will be executed in a
25*4882a593Smuzhiyunnon-preemptible section for reasons outlined below.
26*4882a593Smuzhiyun
27*4882a593Smuzhiyun
28*4882a593SmuzhiyunLazy preserve and restore
29*4882a593Smuzhiyun-------------------------
30*4882a593SmuzhiyunThe NEON/VFP register file is managed using lazy preserve (on UP systems) and
31*4882a593Smuzhiyunlazy restore (on both SMP and UP systems). This means that the register file is
32*4882a593Smuzhiyunkept 'live', and is only preserved and restored when multiple tasks are
33*4882a593Smuzhiyuncontending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
34*4882a593Smuzhiyunanother core). Lazy restore is implemented by disabling the NEON/VFP unit after
35*4882a593Smuzhiyunevery context switch, resulting in a trap when subsequently a NEON/VFP
36*4882a593Smuzhiyuninstruction is issued, allowing the kernel to step in and perform the restore if
37*4882a593Smuzhiyunnecessary.
38*4882a593Smuzhiyun
39*4882a593SmuzhiyunAny use of the NEON/VFP unit in kernel mode should not interfere with this, so
40*4882a593Smuzhiyunit is required to do an 'eager' preserve of the NEON/VFP register file, and
41*4882a593Smuzhiyunenable the NEON/VFP unit explicitly so no exceptions are generated on first
42*4882a593Smuzhiyunsubsequent use. This is handled by the function kernel_neon_begin(), which
43*4882a593Smuzhiyunshould be called before any kernel mode NEON or VFP instructions are issued.
44*4882a593SmuzhiyunLikewise, the NEON/VFP unit should be disabled again after use to make sure user
45*4882a593Smuzhiyunmode will hit the lazy restore trap upon next use. This is handled by the
46*4882a593Smuzhiyunfunction kernel_neon_end().
47*4882a593Smuzhiyun
48*4882a593Smuzhiyun
49*4882a593SmuzhiyunInterruptions in kernel mode
50*4882a593Smuzhiyun----------------------------
51*4882a593SmuzhiyunFor reasons of performance and simplicity, it was decided that there shall be no
52*4882a593Smuzhiyunpreserve/restore mechanism for the kernel mode NEON/VFP register contents. This
53*4882a593Smuzhiyunimplies that interruptions of a kernel mode NEON section can only be allowed if
54*4882a593Smuzhiyunthey are guaranteed not to touch the NEON/VFP registers. For this reason, the
55*4882a593Smuzhiyunfollowing rules and restrictions apply in the kernel:
56*4882a593Smuzhiyun* NEON/VFP code is not allowed in interrupt context;
57*4882a593Smuzhiyun* NEON/VFP code is not allowed to sleep;
58*4882a593Smuzhiyun* NEON/VFP code is executed with preemption disabled.
59*4882a593Smuzhiyun
60*4882a593SmuzhiyunIf latency is a concern, it is possible to put back to back calls to
61*4882a593Smuzhiyunkernel_neon_end() and kernel_neon_begin() in places in your code where none of
62*4882a593Smuzhiyunthe NEON registers are live. (Additional calls to kernel_neon_begin() should be
63*4882a593Smuzhiyunreasonably cheap if no context switch occurred in the meantime)
64*4882a593Smuzhiyun
65*4882a593Smuzhiyun
66*4882a593SmuzhiyunVFP and support code
67*4882a593Smuzhiyun--------------------
68*4882a593SmuzhiyunEarlier versions of VFP (prior to version 3) rely on software support for things
69*4882a593Smuzhiyunlike IEEE-754 compliant underflow handling etc. When the VFP unit needs such
70*4882a593Smuzhiyunsoftware assistance, it signals the kernel by raising an undefined instruction
71*4882a593Smuzhiyunexception. The kernel responds by inspecting the VFP control registers and the
72*4882a593Smuzhiyuncurrent instruction and arguments, and emulates the instruction in software.
73*4882a593Smuzhiyun
74*4882a593SmuzhiyunSuch software assistance is currently not implemented for VFP instructions
75*4882a593Smuzhiyunexecuted in kernel mode. If such a condition is encountered, the kernel will
76*4882a593Smuzhiyunfail and generate an OOPS.
77*4882a593Smuzhiyun
78*4882a593Smuzhiyun
79*4882a593SmuzhiyunSeparating NEON code from ordinary code
80*4882a593Smuzhiyun---------------------------------------
81*4882a593SmuzhiyunThe compiler is not aware of the special significance of kernel_neon_begin() and
82*4882a593Smuzhiyunkernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
83*4882a593Smuzhiyunbetween calls to these respective functions. Furthermore, GCC may generate NEON
84*4882a593Smuzhiyuninstructions of its own at -O3 level if -mfpu=neon is selected, and even if the
85*4882a593Smuzhiyunkernel is currently compiled at -O2, future changes may result in NEON/VFP
86*4882a593Smuzhiyuninstructions appearing in unexpected places if no special care is taken.
87*4882a593Smuzhiyun
88*4882a593SmuzhiyunTherefore, the recommended and only supported way of using NEON/VFP in the
89*4882a593Smuzhiyunkernel is by adhering to the following rules:
90*4882a593Smuzhiyun
91*4882a593Smuzhiyun* isolate the NEON code in a separate compilation unit and compile it with
92*4882a593Smuzhiyun  '-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
93*4882a593Smuzhiyun* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
94*4882a593Smuzhiyun  into the unit containing the NEON code from a compilation unit which is *not*
95*4882a593Smuzhiyun  built with the GCC flag '-mfpu=neon' set.
96*4882a593Smuzhiyun
97*4882a593SmuzhiyunAs the kernel is compiled with '-msoft-float', the above will guarantee that
98*4882a593Smuzhiyunboth NEON and VFP instructions will only ever appear in designated compilation
99*4882a593Smuzhiyununits at any optimization level.
100*4882a593Smuzhiyun
101*4882a593Smuzhiyun
102*4882a593SmuzhiyunNEON assembler
103*4882a593Smuzhiyun--------------
104*4882a593SmuzhiyunNEON assembler is supported with no additional caveats as long as the rules
105*4882a593Smuzhiyunabove are followed.
106*4882a593Smuzhiyun
107*4882a593Smuzhiyun
108*4882a593SmuzhiyunNEON code generated by GCC
109*4882a593Smuzhiyun--------------------------
110*4882a593SmuzhiyunThe GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
111*4882a593Smuzhiyunparallelism, and generates NEON code from ordinary C source code. This is fully
112*4882a593Smuzhiyunsupported as long as the rules above are followed.
113*4882a593Smuzhiyun
114*4882a593Smuzhiyun
115*4882a593SmuzhiyunNEON intrinsics
116*4882a593Smuzhiyun---------------
117*4882a593SmuzhiyunNEON intrinsics are also supported. However, as code using NEON intrinsics
118*4882a593Smuzhiyunrelies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
119*4882a593Smuzhiyunobserve the following in addition to the rules above:
120*4882a593Smuzhiyun
121*4882a593Smuzhiyun* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
122*4882a593Smuzhiyun  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
123*4882a593Smuzhiyun  does not supply);
124*4882a593Smuzhiyun* Include <arm_neon.h> last, or at least after <linux/types.h>
125