Kernel Memory Leak Detector
===========================

Kmemleak provides a way of detecting possible kernel memory leaks in a
way similar to a `tracing garbage collector
<https://en.wikipedia.org/wiki/Tracing_garbage_collection>`_,
with the difference that the orphan objects are not freed but only
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
Valgrind tool (``memcheck --leak-check``) to detect the memory leaks in
user-space applications.

Usage
-----

CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
thread scans the memory every 10 minutes (by default) and prints the
number of new unreferenced objects found. If the ``debugfs`` isn't already
mounted, mount with::

  # mount -t debugfs nodev /sys/kernel/debug/

To display the details of all the possible scanned memory leaks::

  # cat /sys/kernel/debug/kmemleak

To trigger an intermediate memory scan::

  # echo scan > /sys/kernel/debug/kmemleak

To clear the list of all current possible memory leaks::

  # echo clear > /sys/kernel/debug/kmemleak

New leaks will then come up upon reading ``/sys/kernel/debug/kmemleak``
again.
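
The scan and display commands above can be combined into a small helper.
The following is only a minimal sketch and is not shipped with the kernel:
the function name ``kmemleak_report`` and the ``KMEMLEAK`` variable are
assumptions made for illustration. It triggers a scan and then prints the
current report::

  #!/bin/sh
  # Hypothetical helper, not part of kmemleak itself.
  # KMEMLEAK may be overridden; it defaults to the debugfs path used above.
  KMEMLEAK="${KMEMLEAK:-/sys/kernel/debug/kmemleak}"

  kmemleak_report() {
      # Fail early if the file is unavailable (debugfs not mounted,
      # kmemleak not enabled, or insufficient privileges).
      if [ ! -w "$KMEMLEAK" ]; then
          echo "kmemleak: $KMEMLEAK is not writable" >&2
          return 1
      fi
      echo scan > "$KMEMLEAK" || return 1   # trigger an intermediate scan
      cat "$KMEMLEAK"                       # print the suspected leaks
  }

Calling ``kmemleak_report`` as root prints the same output as the ``cat``
command above, after a fresh scan.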

Note that the orphan objects are listed in the order they were allocated
and one object at the beginning of the list may cause other subsequent
objects to be reported as orphan.

Memory scanning parameters can be modified at run-time by writing to the
``/sys/kernel/debug/kmemleak`` file. The following parameters are supported:

- off
    disable kmemleak (irreversible)
- stack=on
    enable the task stacks scanning (default)
- stack=off
    disable the task stacks scanning
- scan=on
    start the automatic memory scanning thread (default)
- scan=off
    stop the automatic memory scanning thread
- scan=<secs>
    set the automatic memory scanning period in seconds
    (default 600, 0 to stop the automatic scanning)
- scan
    trigger a memory scan
- clear
    clear list of current memory leak suspects, done by
    marking all current reported unreferenced objects gray,
    or free all kmemleak objects if kmemleak has been disabled.
- dump=<addr>
    dump information about the object found at <addr>

Kmemleak can also be disabled at boot-time by passing ``kmemleak=off`` on
the kernel command line.

Memory may be allocated or freed before kmemleak is initialised and
these actions are stored in an early log buffer.
The size of this buffer
is configured via the CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE option.

If CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is enabled, kmemleak is
disabled by default. Passing ``kmemleak=on`` on the kernel command
line enables it.

If you are getting errors like "Error while writing to stdout" or "write_loop:
Invalid argument", make sure kmemleak is properly enabled.

Basic Algorithm
---------------

The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`,
:c:func:`kmem_cache_alloc` and
friends are traced and the pointers, together with additional
information like size and stack trace, are stored in a rbtree.
The corresponding freeing function calls are tracked and the pointers
removed from the kmemleak data structures.

An allocated block of memory is considered orphan if no pointer to its
start address or to any location inside the block can be found by
scanning the memory (including saved registers). This means that there
might be no way for the kernel to pass the address of the allocated
block to a freeing function and therefore the block is considered a
memory leak.

The scanning algorithm steps:

  1. mark all objects as white (remaining white objects will later be
     considered orphan)
  2. scan the memory starting with the data section and stacks, checking
     the values against the addresses stored in the rbtree. If
     a pointer to a white object is found, the object is added to the
     gray list
  3. scan the gray objects for matching addresses (some white objects
     can become gray and added at the end of the gray list) until the
     gray set is finished
  4. the remaining white objects are considered orphan and reported via
     /sys/kernel/debug/kmemleak

Some allocated memory blocks have pointers stored in the kernel's
internal data structures and they cannot be detected as orphans. To
avoid this, kmemleak can also store the number of values pointing to an
address inside the block address range that need to be found so that the
block is not considered a leak. One example is __vmalloc().

Testing specific sections with kmemleak
---------------------------------------

Upon initial bootup your /sys/kernel/debug/kmemleak output page may be
quite extensive. This can also be the case if you have very buggy code
when doing development. To work around these situations you can use the
'clear' command to clear all reported unreferenced objects from the
/sys/kernel/debug/kmemleak output. By issuing a 'scan' after a 'clear'
you can find new unreferenced objects; this should help with testing
specific sections of code.

To test a critical section on demand with a clean kmemleak do::

  # echo clear > /sys/kernel/debug/kmemleak
  ... test your kernel or modules ...
  # echo scan > /sys/kernel/debug/kmemleak

Then, as usual, get your report with::

  # cat /sys/kernel/debug/kmemleak

Freeing kmemleak internal objects
---------------------------------

To allow access to previously found memory leaks after kmemleak has been
disabled by the user or due to a fatal error, internal kmemleak objects
won't be freed when kmemleak is disabled, and those objects may occupy
a large part of physical memory.

In this situation, you may reclaim memory with::

  # echo clear > /sys/kernel/debug/kmemleak

Kmemleak API
------------

See the include/linux/kmemleak.h header for the function prototypes.

- ``kmemleak_init``		 - initialize kmemleak
- ``kmemleak_alloc``		 - notify of a memory block allocation
- ``kmemleak_alloc_percpu``	 - notify of a percpu memory block allocation
- ``kmemleak_vmalloc``		 - notify of a vmalloc() memory allocation
- ``kmemleak_free``		 - notify of a memory block freeing
- ``kmemleak_free_part``	 - notify of a partial memory block freeing
- ``kmemleak_free_percpu``	 - notify of a percpu memory block freeing
- ``kmemleak_update_trace``	 - update object allocation stack trace
- ``kmemleak_not_leak``	 - mark an object as not a leak
- ``kmemleak_ignore``		 - do not scan or report an object as leak
- ``kmemleak_scan_area``	 - add scan areas inside a memory block
- ``kmemleak_no_scan``	 - do not scan a memory block
- ``kmemleak_erase``		 - erase an old value in a pointer variable
- ``kmemleak_alloc_recursive`` - as kmemleak_alloc but checks the recursiveness
- ``kmemleak_free_recursive``	 - as kmemleak_free but checks the recursiveness

The following functions take a physical address as the object pointer
and only perform the corresponding action if the address has a lowmem
mapping:

- ``kmemleak_alloc_phys``
- ``kmemleak_free_part_phys``
- ``kmemleak_not_leak_phys``
- ``kmemleak_ignore_phys``

Dealing with false positives/negatives
--------------------------------------

False negatives are real memory leaks (orphan objects) that are not
reported by kmemleak because values found during the memory scanning
point to such objects. To reduce the number of false negatives, kmemleak
provides the kmemleak_ignore, kmemleak_scan_area, kmemleak_no_scan and
kmemleak_erase functions (see above). The task stacks also increase the
number of false negatives; their scanning can be disabled with
``stack=off``.

False positives are objects wrongly reported as being memory leaks
(orphan). For objects known not to be leaks, kmemleak provides the
kmemleak_not_leak function. kmemleak_ignore can also be used if
the memory block is known not to contain other pointers; the block will
then no longer be scanned.

Some of the reported leaks are only transient, especially on SMP
systems, because of pointers temporarily stored in CPU registers or
stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000) representing
the minimum age of an object to be reported as a memory leak.

Limitations and Drawbacks
-------------------------

The main drawback is the reduced performance of memory allocation and
freeing. To avoid other penalties, the memory scanning is performed only
periodically by the kernel thread or when explicitly triggered by writing
``scan`` to the /sys/kernel/debug/kmemleak file. In any case, this tool is
intended for debugging purposes where the performance might not be the
most important requirement.

To keep the algorithm simple, kmemleak scans for values pointing to any
address inside a block's address range.
This may lead to an increased
number of false negatives. However, it is likely that a real memory leak
will eventually become visible.

Another source of false negatives is the data stored in non-pointer
values. In a future version, kmemleak could scan only the pointer
members in the allocated structures. This feature would solve many of
the false negative cases described above.

The tool can report false positives. These are cases where an allocated
block doesn't need to be freed (some cases in the init_call functions),
the pointer is calculated by other methods than the usual container_of
macro or the pointer is stored in a location not scanned by kmemleak.

Page allocations and ioremap are not tracked.

Testing with kmemleak-test
--------------------------

To check if you have all set up to use kmemleak, you can use the kmemleak-test
module, a module that deliberately leaks memory. Set CONFIG_DEBUG_KMEMLEAK_TEST
as module (it can't be used as built-in) and boot the kernel with kmemleak
enabled. Load the module and perform a scan with::

  # modprobe kmemleak-test
  # echo scan > /sys/kernel/debug/kmemleak

Note that you may not get results instantly or on the first scan. When
kmemleak gets results, it'll log ``kmemleak: <count of leaks> new suspected
memory leaks``.
Then read the file to see them::

  # cat /sys/kernel/debug/kmemleak
  unreferenced object 0xffff89862ca702e8 (size 32):
    comm "modprobe", pid 2088, jiffies 4294680594 (age 375.486s)
    hex dump (first 32 bytes):
      6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
    backtrace:
      [<00000000e0a73ec7>] 0xffffffffc01d2036
      [<000000000c5d2a46>] do_one_initcall+0x41/0x1df
      [<0000000046db7e0a>] do_init_module+0x55/0x200
      [<00000000542b9814>] load_module+0x203c/0x2480
      [<00000000c2850256>] __do_sys_finit_module+0xba/0xe0
      [<000000006564e7ef>] do_syscall_64+0x43/0x110
      [<000000007c873fa6>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ...

Removing the module with ``rmmod kmemleak_test`` should also trigger some
kmemleak results.
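
When the report is long, it can help to count the entries before reading
them. The following is a minimal sketch, assuming that each entry starts
with an ``unreferenced object`` header line as in the sample output above.
The function name ``kmemleak_count`` is hypothetical and not part of
kmemleak::

  #!/bin/sh
  # Hypothetical helper: count the entries in a kmemleak report.
  # The optional first argument is a report file; it defaults to the
  # debugfs path used throughout this document.
  kmemleak_count() {
      # grep -c prints the number of matching lines (0 if none match).
      grep -c '^unreferenced object' "${1:-/sys/kernel/debug/kmemleak}"
  }

Running ``kmemleak_count`` before loading the test module and again after
the scan gives a quick indication of whether new suspects were recorded.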