.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, results in
dedicating 2 MiB to the KFENCE memory pool.

Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.
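
As a quick check of the pool size formula above, the following Python sketch
evaluates it for the default configuration. The function name
``kfence_pool_bytes`` is made up for illustration and is not a kernel
interface; the 4 KiB page size is the same assumption used in the text.

```python
def kfence_pool_bytes(num_objects, page_size=4096):
    """Pool size per the formula (#objects + 1) * 2 * PAGE_SIZE."""
    return (num_objects + 1) * 2 * page_size


# Default CONFIG_KFENCE_NUM_OBJECTS=255 with an assumed 4 KiB page size.
size = kfence_pool_bytes(255)
print(size, "bytes =", size // (1024 * 1024), "MiB")  # 2097152 bytes = 2 MiB
```

Doubling the object count roughly doubles the pool, so the same helper can be
used to estimate the memory cost before changing the Kconfig option.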

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access report looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b

    Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17):
     test_out_of_bounds_read+0xa3/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_out_of_bounds_read+0x98/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.
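
To show how the numbers in such a report relate to each other, here is a small
Python sketch that parses the object line of the example and recomputes the
"1B left" distance and the object size. The regular expression and the helper
``parse_object_line`` are illustrative assumptions based only on the example
output above; they are not part of KFENCE or of any kernel tooling.

```python
import re

# Matches the object line of a report, e.g.
# "kfence-#17 [0x...-0x..., size=32, cache=kmalloc-32] allocated by task 507:"
OBJECT_RE = re.compile(
    r"kfence-#(?P<index>\d+) "
    r"\[0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+), "
    r"size=(?P<size>\d+), cache=(?P<cache>[\w-]+)\]"
)


def parse_object_line(line):
    """Extract index, address range, size and cache name from an object line."""
    m = OBJECT_RE.search(line)
    if m is None:
        raise ValueError("not a KFENCE object line")
    return {
        "index": int(m["index"]),
        "start": int(m["start"], 16),
        "end": int(m["end"], 16),  # address of the last byte of the object
        "size": int(m["size"]),
        "cache": m["cache"],
    }


line = ("kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, "
        "size=32, cache=kmalloc-32] allocated by task 507:")
obj = parse_object_line(line)
access = 0xffffffffb672efff  # faulting address from the report

print(obj["start"] - access)          # 1, the "1B left of kfence-#17"
print(obj["end"] - obj["start"] + 1)  # 32, matches size=32
```

The end address in the bracket is inclusive, which is why one is added when
recomputing the size.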

Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffffffffb6741000:
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_double_free+0x76/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the other side of an object's guard
page, to detect out-of-bounds writes on the unprotected side of the object.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above ``0xac`` is
the value written to the invalid address at offset 0, and the remaining '.'
denote that no following bytes have been touched. Note that real values are
only shown if the kernel was booted with ``no_hash_pointers``; to avoid
information disclosure otherwise, '!' is used instead to denote invalidly
written bytes.

Finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval. To "gate" a
KFENCE allocation through the main allocator's fast-path without overhead,
KFENCE relies on static branches via the static keys infrastructure. The static
branch is toggled to redirect the allocation to KFENCE.

KFENCE objects each reside on a dedicated page, at either the left or right
page boundary, selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access. Such page faults are then
intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.
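
The placement and redzone rules described above can be modelled with a short
Python sketch. This is a simplified illustration, not the kernel's code: the
helpers ``place_object`` and ``redzones``, the 4 KiB page size, and the rule
of rounding a right-aligned object down to its alignment are assumptions. The
rounding rule was chosen because it reproduces the ``kfence-#69`` example
report (``size=73``, object at page offset ``0xfb0``) for an assumed 8-byte
alignment.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumed page size


def place_object(size: int, align: int, right: bool) -> int:
    """Return the object's byte offset within its dedicated page.

    Left placement starts at offset 0. Right placement puts the object as
    close to the end of the page as the alignment allows (assumption: the
    start offset is rounded down to a multiple of `align`).
    """
    if not 0 < size <= PAGE_SIZE:
        raise ValueError("size must be in 1..PAGE_SIZE")
    if not right:
        return 0
    offset = PAGE_SIZE - size
    return offset - (offset % align)


def redzones(size: int, offset: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) ranges of all non-object bytes in the page."""
    zones = []
    if offset > 0:
        zones.append((0, offset))
    if offset + size < PAGE_SIZE:
        zones.append((offset + size, PAGE_SIZE))
    return zones


# A 73-byte object placed at the right boundary with 8-byte alignment.
off = place_object(73, 8, right=True)
print(hex(off))  # 0xfb0
print([(hex(a), hex(b)) for a, b in redzones(73, off)])
```

In this model the 73-byte object leaves a 7-byte gap before the right guard
page. A write into that gap does not fault, which is why it can only be caught
through the redzone pattern when the object is freed.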

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chance of detecting use-after-frees of recently freed objects
is increased.

Interface
---------

The following describes the functions that are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach that also inspired the name "KFENCE" can be
found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
is more precise, relying on compiler instrumentation, this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: due to the lower chance to detect the
error, it would require more effort using KFENCE to debug. Deployments at scale
that cannot afford to enable KASAN, however, would benefit from using KFENCE to
discover bugs due to code paths not exercised by test cases or fuzzers.