.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

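As a rough illustration, the interval in effect can be inspected from
userspace. The sketch below assumes that the parameter is exported at
``/sys/module/kfence/parameters/sample_interval`` (the usual sysfs location for
a ``kfence.``-prefixed parameter, but not something this document states) and
that it is readable by the caller; the helper names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the kfence.sample_interval parameter.
PARAM_PATH = Path("/sys/module/kfence/parameters/sample_interval")


def read_sample_interval(path=PARAM_PATH):
    """Return the sample interval in milliseconds, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing, not readable (it may be root-only), or not a number.
        return None


def kfence_enabled(interval_ms):
    """A sample interval of 0 disables KFENCE; None means 'unknown'."""
    return interval_ms is not None and interval_ms > 0


if __name__ == "__main__":
    ms = read_sample_interval()
    print("sample interval:", ms, "ms; enabled:", kfence_enabled(ms))
```
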
The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, results in
dedicating 2 MiB to the KFENCE memory pool.

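The arithmetic can be checked with a few lines of Python. This is only a
worked example of the formula above; the function name is hypothetical, and
the defaults mirror the 255 objects and 4 KiB pages used in the text.

```python
def kfence_pool_bytes(num_objects=255, page_size=4096):
    """Total pool size in bytes: (#objects + 1) * 2 * PAGE_SIZE."""
    return (num_objects + 1) * 2 * page_size


if __name__ == "__main__":
    size = kfence_pool_bytes()
    print(f"{size} bytes = {size // (1024 * 1024)} MiB")
```

With the defaults this evaluates to 256 * 2 * 4096 = 2097152 bytes, i.e. 2 MiB.
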
Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b

    Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17):
     test_out_of_bounds_read+0xa3/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_out_of_bounds_read+0x98/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.

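When triaging many reports, the header line can be split into its parts
mechanically. The following sketch is based solely on the header format seen in
the example reports in this document; the regular expression and the function
name are illustrative assumptions, not an interface provided by KFENCE.

```python
import re

# Matches e.g. "BUG: KFENCE: out-of-bounds read in test_fn+0xa3/0x22b".
HEADER_RE = re.compile(
    r"BUG: KFENCE: (?P<kind>.+?) in "
    r"(?P<func>[^\s+]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_kfence_header(line):
    """Return the error kind and faulting function, or None if no match."""
    m = HEADER_RE.search(line)
    if m is None:
        return None
    return {
        "kind": m["kind"],                # e.g. "out-of-bounds read"
        "function": m["func"],            # symbol containing the access
        "offset": int(m["offset"], 16),   # offset of the access in the symbol
        "size": int(m["size"], 16),       # total size of the symbol
    }


if __name__ == "__main__":
    print(parse_kfence_header(
        "BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b"
    ))
```
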
Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffffffffb6741000:
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_double_free+0x76/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the other side of an object's guard
page, to detect out-of-bounds writes on the unprotected side of the object.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above ``0xac`` is
the value written to the invalid address at offset 0, and the remaining '.'
denote that no following bytes have been touched. Note that real values are
only shown if the kernel was booted with ``no_hash_pointers``; to avoid
information disclosure otherwise, '!' is used instead to denote invalidly
written bytes.

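The byte notation can be decoded mechanically. The sketch below assumes the
exact line format shown in the example report (an address followed by
space-separated tokens in square brackets); the regular expression and function
name are illustrative and not part of KFENCE.

```python
import re

# Matches e.g. "Corrupted memory at 0x... [ 0xac . . . . . . ]".
CORRUPT_RE = re.compile(r"Corrupted memory at (0x[0-9a-f]+) \[ (.*?) \]")


def parse_corruption(line):
    """Return a list of (address, value) pairs for invalidly written bytes.

    value is None when the report hides it behind '!'.
    """
    m = CORRUPT_RE.search(line)
    if m is None:
        return None
    base = int(m.group(1), 16)
    written = []
    for offset, token in enumerate(m.group(2).split()):
        if token == ".":
            continue  # untouched byte
        value = None if token == "!" else int(token, 16)
        written.append((base + offset, value))
    return written


if __name__ == "__main__":
    line = ("Corrupted memory at 0xffffffffb6797ff9 "
            "[ 0xac . . . . . . ] (in kfence-#69):")
    for addr, value in parse_corruption(line):
        print(hex(addr), value)
```
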
And finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

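A monitoring script might collect the statistics as follows. This document does
not specify the layout of the ``stats`` file, so the sketch assumes one
``name: value`` pair per line with integer values and skips anything else; the
function names and the counter names in the usage example are hypothetical.

```python
def parse_kfence_stats(text):
    """Parse 'name: value' lines into a dict, ignoring lines that do not fit."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # value is not an integer
    return stats


def read_kfence_stats(path="/sys/kernel/debug/kfence/stats"):
    """Read and parse the stats file; requires debugfs to be mounted."""
    with open(path) as f:
        return parse_kfence_stats(f.read())


if __name__ == "__main__":
    # Made-up sample input, not captured from a real system.
    print(parse_kfence_stats("enabled: 1\ntotal bugs: 3\n"))
```
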
Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval. To "gate" a
KFENCE allocation through the main allocator's fast-path without overhead,
KFENCE relies on static branches via the static keys infrastructure. The static
branch is toggled to redirect the allocation to KFENCE.

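The sampling behaviour can be modelled in userspace with a toy simulation. The
sketch below is not kernel code: it replaces the timer and static branch with a
plain timestamp comparison, and the class and method names are hypothetical. It
only illustrates that at most one allocation per interval is redirected to
KFENCE.

```python
class SamplingGate:
    """Toy model of the sample-interval gate (not kernel code)."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        # Time at which the gate opens for the next guarded allocation.
        self.next_sample_at = interval_ms

    def allocate(self, now_ms):
        """Return 'kfence' if this allocation is guarded, else 'slab'."""
        if self.interval_ms == 0:
            return "slab"  # an interval of 0 disables sampling
        if now_ms >= self.next_sample_at:
            # Interval expired: guard this allocation, then reset the timer.
            self.next_sample_at = now_ms + self.interval_ms
            return "kfence"
        return "slab"


if __name__ == "__main__":
    gate = SamplingGate(interval_ms=100)
    for t in (0, 50, 120, 130, 219, 220):
        print(t, gate.allocate(t))
```

In the kernel the per-allocation check is a static branch rather than a time
comparison, which is what keeps the fast-path overhead near zero.
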
KFENCE objects each reside on a dedicated page, at either the left or right
page boundaries selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access. Such page faults are then
intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

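The placement of an object within its page, and the resulting redzones, can be
sketched as follows. This is a simplified model under stated assumptions: a
4 KiB page, a power-of-two alignment supplied by the caller, and right-side
placement computed by aligning the start address down. The function name and
parameters are hypothetical and do not correspond to kernel symbols.

```python
PAGE_SIZE = 4096  # assumed page size


def place_object(size, align=8, right=True, page_size=PAGE_SIZE):
    """Return ((start, end), redzones) as half-open offsets in the object page."""
    if not 0 < size <= page_size:
        raise ValueError("size must be between 1 and the page size")
    if right:
        # Push the object towards the right guard page, honoring alignment.
        start = (page_size - size) // align * align
    else:
        # Object begins directly after the left guard page.
        start = 0
    end = start + size
    # All non-object memory in the page is redzoned.
    redzones = [(lo, hi) for lo, hi in ((0, start), (end, page_size)) if lo < hi]
    return (start, end), redzones


if __name__ == "__main__":
    obj, redzones = place_object(73, align=8, right=True)
    print("object:", [hex(x) for x in obj])
    print("redzones:", [(hex(lo), hex(hi)) for lo, hi in redzones])
```

With ``size=73`` and 8-byte alignment, the object occupies page offsets
``0xfb0`` to ``0xff8`` and the first redzone byte after it is at ``0xff9``,
which is consistent with the addresses in the memory corruption report shown
earlier. The 7-byte gap before the right guard page exists only because of the
alignment requirement.
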
Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chance of detecting use-after-frees of recently freed objects
is increased.

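The effect of tail insertion can be seen in a small model of the freelist. The
sketch uses a ``collections.deque`` as a stand-in for the kernel's list and
hypothetical names throughout; it shows only the reuse order, not the page
protection.

```python
from collections import deque


class FreeList:
    """Toy FIFO freelist: allocate from the head, free to the tail."""

    def __init__(self, num_objects):
        self.free = deque(range(num_objects))

    def alloc(self):
        """Return the least recently freed object index, or None if exhausted."""
        return self.free.popleft() if self.free else None

    def release(self, index):
        """Append to the tail so this object is reused as late as possible."""
        self.free.append(index)


if __name__ == "__main__":
    fl = FreeList(3)
    first = fl.alloc()
    second = fl.alloc()
    fl.release(first)
    # The just-freed object is handed out only after every other free object.
    print(first, second, fl.alloc(), fl.alloc(), fl.alloc())
```

Because a freed object stays protected until it is handed out again, delaying
its reuse lengthens the window in which a stale pointer still faults.
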
Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, that also inspired the name "KFENCE", can be
found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
is more precise, relying on compiler instrumentation, this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: because KFENCE has a lower chance of
detecting the error, debugging with it would require more effort. Deployments
at scale that cannot afford to enable KASAN, however, would benefit from using
KFENCE to discover bugs due to code paths not exercised by test cases or
fuzzers.