Kernel Memory Leak Detector
===========================

Kmemleak provides a way of detecting possible kernel memory leaks in a
way similar to a `tracing garbage collector
<https://en.wikipedia.org/wiki/Tracing_garbage_collection>`_,
with the difference that the orphan objects are not freed but only
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
Valgrind tool (``memcheck --leak-check``) to detect the memory leaks in
user-space applications.

Usage
-----

CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
thread scans the memory every 10 minutes (by default) and prints the
number of new unreferenced objects found. If the ``debugfs`` isn't already
mounted, mount with::

  # mount -t debugfs nodev /sys/kernel/debug/

To display the details of all the possible scanned memory leaks::

  # cat /sys/kernel/debug/kmemleak

To trigger an intermediate memory scan::

  # echo scan > /sys/kernel/debug/kmemleak

To clear the list of all current possible memory leaks::

  # echo clear > /sys/kernel/debug/kmemleak

New leaks will then come up upon reading ``/sys/kernel/debug/kmemleak``
again.

Note that the orphan objects are listed in the order they were allocated
and one object at the beginning of the list may cause other subsequent
objects to be reported as orphan.

Memory scanning parameters can be modified at run-time by writing to the
``/sys/kernel/debug/kmemleak`` file. The following parameters are supported:

- off
    disable kmemleak (irreversible)
- stack=on
    enable the task stacks scanning (default)
- stack=off
    disable the task stacks scanning
- scan=on
    start the automatic memory scanning thread (default)
- scan=off
    stop the automatic memory scanning thread
- scan=<secs>
    set the automatic memory scanning period in seconds
    (default 600, 0 to stop the automatic scanning)
- scan
    trigger a memory scan
- clear
    clear the list of current memory leak suspects, done by
    marking all currently reported unreferenced objects gray,
    or free all kmemleak objects if kmemleak has been disabled.
- dump=<addr>
    dump information about the object found at <addr>

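As a rough illustration, the parameters above can also be written from a
script. The following Python sketch is not part of kmemleak: the helper
names ``scan_period`` and ``kmemleak_write`` are hypothetical, and it
assumes that debugfs is mounted at ``/sys/kernel/debug`` and that the
caller runs as root:

.. code-block:: python

    from pathlib import Path

    # Control file named in the text above; needs root and a mounted debugfs.
    KMEMLEAK_FILE = Path("/sys/kernel/debug/kmemleak")


    def scan_period(seconds):
        """Build the ``scan=<secs>`` command; 0 stops the automatic scanning."""
        if seconds < 0:
            raise ValueError("scan period must not be negative")
        return f"scan={seconds}"


    def kmemleak_write(command, control_file=KMEMLEAK_FILE):
        """Write a single command such as "scan", "clear" or "stack=off"."""
        # Mirror ``echo <command> > file``: one newline-terminated command
        # per write.
        with open(control_file, "w") as f:
            f.write(command + "\n")

For example, ``kmemleak_write(scan_period(60))`` would be equivalent to
``echo scan=60 > /sys/kernel/debug/kmemleak``.
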
Kmemleak can also be disabled at boot-time by passing ``kmemleak=off`` on
the kernel command line.

Memory may be allocated or freed before kmemleak is initialised and
these actions are stored in an early log buffer. The size of this buffer
is configured via the CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE option.

If CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is enabled, kmemleak is
disabled by default. Passing ``kmemleak=on`` on the kernel command
line enables the function.

If you are getting errors like "Error while writing to stdout" or "write_loop:
Invalid argument", make sure kmemleak is properly enabled.

Basic Algorithm
---------------

The memory allocations via :c:func:`kmalloc`, :c:func:`vmalloc`,
:c:func:`kmem_cache_alloc` and
friends are traced and the pointers, together with additional
information like size and stack trace, are stored in an rbtree.
The corresponding freeing function calls are tracked and the pointers
removed from the kmemleak data structures.

An allocated block of memory is considered orphan if no pointer to its
start address or to any location inside the block can be found by
scanning the memory (including saved registers). This means that there
might be no way for the kernel to pass the address of the allocated
block to a freeing function and therefore the block is considered a
memory leak.

The scanning algorithm steps:

  1. mark all objects as white (remaining white objects will later be
     considered orphan)
  2. scan the memory starting with the data section and stacks, checking
     the values against the addresses stored in the rbtree. If
     a pointer to a white object is found, the object is added to the
     gray list
  3. scan the gray objects for matching addresses (some white objects
     can become gray and added at the end of the gray list) until the
     gray set is finished
  4. the remaining white objects are considered orphan and reported via
     /sys/kernel/debug/kmemleak

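The steps above can be modelled in user space. The sketch below is a toy
Python model, not the kernel implementation: objects are represented by
names, ``objects`` maps each tracked object to the objects its contents
point to, and ``roots`` stands for the pointer values found in the data
section and stacks. All of these names are hypothetical:

.. code-block:: python

    from collections import deque


    def find_orphans(objects, roots):
        """Return the set of tracked objects that no scanned value points to.

        objects -- dict: object name -> iterable of names its contents point to
        roots   -- names found while scanning the data section and stacks
        """
        white = set(objects)            # step 1: every object starts white
        gray = deque()

        for target in roots:            # step 2: scan data section and stacks
            if target in white:         # values not matching an object are skipped
                white.remove(target)
                gray.append(target)

        while gray:                     # step 3: scan the gray objects in turn
            obj = gray.popleft()
            for target in objects[obj]:
                if target in white:
                    white.remove(target)
                    gray.append(target)  # added at the end of the gray list

        return white                    # step 4: remaining white objects

In this model, two objects that only reference each other are still
reported as orphans, because neither can be reached from the roots.
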
Some allocated memory blocks have pointers stored in the kernel's
internal data structures and they cannot be detected as orphans. To
avoid this, kmemleak can also store the number of values pointing to an
address inside the block address range that need to be found so that the
block is not considered a leak. One example is __vmalloc().

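The effect of such a reference-count threshold can be shown with a toy
model of the scan. This Python sketch is only an illustration under
simplifying assumptions: objects are names, and ``min_count`` is a
hypothetical per-object mapping that defaults to one required reference.
It does not reproduce the kernel's data structures:

.. code-block:: python

    def find_orphans_min_count(objects, roots, min_count=None):
        """Return the objects referenced fewer times than required.

        objects   -- dict: object name -> iterable of names its contents point to
        roots     -- names found while scanning the data section and stacks
        min_count -- dict: object name -> references needed (at least 1,
                     default 1)
        """
        min_count = min_count or {}
        count = dict.fromkeys(objects, 0)   # references seen so far
        gray = []                           # objects referenced often enough

        def note_reference(target):
            if target not in count:
                return                      # not a tracked object
            count[target] += 1
            # The object turns gray exactly once, when the threshold is reached.
            if count[target] == min_count.get(target, 1):
                gray.append(target)

        for target in roots:
            note_reference(target)

        index = 0
        while index < len(gray):            # scan gray objects in turn
            for target in objects[gray[index]]:
                note_reference(target)
            index += 1

        return set(objects) - set(gray)

With ``min_count={"area": 2}``, the object ``area`` is reported unless at
least two values pointing to it are found during the scan.
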
Testing specific sections with kmemleak
---------------------------------------

Upon initial bootup your /sys/kernel/debug/kmemleak output page may be
quite extensive. This can also be the case if you have very buggy code
when doing development. To work around these situations you can use the
'clear' command to clear all reported unreferenced objects from the
/sys/kernel/debug/kmemleak output. By issuing a 'scan' after a 'clear'
you can find new unreferenced objects; this should help with testing
specific sections of code.

To test a critical section on demand with a clean kmemleak do::

  # echo clear > /sys/kernel/debug/kmemleak
  ... test your kernel or modules ...
  # echo scan > /sys/kernel/debug/kmemleak

Then, as usual, get your report with::

  # cat /sys/kernel/debug/kmemleak

Freeing kmemleak internal objects
---------------------------------

To allow access to previously found memory leaks after kmemleak has been
disabled by the user or due to a fatal error, internal kmemleak objects
won't be freed when kmemleak is disabled, and those objects may occupy
a large part of physical memory.

In this situation, you may reclaim memory with::

  # echo clear > /sys/kernel/debug/kmemleak

Kmemleak API
------------

See the include/linux/kmemleak.h header for the function prototypes.

- ``kmemleak_init``		 - initialize kmemleak
- ``kmemleak_alloc``		 - notify of a memory block allocation
- ``kmemleak_alloc_percpu``	 - notify of a percpu memory block allocation
- ``kmemleak_vmalloc``		 - notify of a vmalloc() memory allocation
- ``kmemleak_free``		 - notify of a memory block freeing
- ``kmemleak_free_part``	 - notify of a partial memory block freeing
- ``kmemleak_free_percpu``	 - notify of a percpu memory block freeing
- ``kmemleak_update_trace``	 - update object allocation stack trace
- ``kmemleak_not_leak``	 - mark an object as not a leak
- ``kmemleak_ignore``		 - do not scan or report an object as leak
- ``kmemleak_scan_area``	 - add scan areas inside a memory block
- ``kmemleak_no_scan``	 - do not scan a memory block
- ``kmemleak_erase``		 - erase an old value in a pointer variable
- ``kmemleak_alloc_recursive`` - as kmemleak_alloc but checks the recursiveness
- ``kmemleak_free_recursive``	 - as kmemleak_free but checks the recursiveness

The following functions take a physical address as the object pointer
and only perform the corresponding action if the address has a lowmem
mapping:

- ``kmemleak_alloc_phys``
- ``kmemleak_free_part_phys``
- ``kmemleak_not_leak_phys``
- ``kmemleak_ignore_phys``

Dealing with false positives/negatives
--------------------------------------

The false negatives are real memory leaks (orphan objects) that are not
reported by kmemleak because values found during the memory scanning
point to such objects. To reduce the number of false negatives, kmemleak
provides the kmemleak_ignore, kmemleak_scan_area, kmemleak_no_scan and
kmemleak_erase functions (see above). The task stacks also increase the
number of false negatives; their scanning is enabled by default and can
be turned off by writing ``stack=off`` to ``/sys/kernel/debug/kmemleak``.

The false positives are objects wrongly reported as being memory leaks
(orphan). For objects known not to be leaks, kmemleak provides the
kmemleak_not_leak function. The kmemleak_ignore could also be used if
the memory block is known not to contain other pointers and it will no
longer be scanned.

Some of the reported leaks are only transient, especially on SMP
systems, because of pointers temporarily stored in CPU registers or
stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000) representing
the minimum age of an object to be reported as a memory leak.

Limitations and Drawbacks
-------------------------

The main drawback is the reduced performance of memory allocation and
freeing. To avoid other penalties, the memory scanning is only performed
periodically by the kernel thread or when a scan is requested through
the /sys/kernel/debug/kmemleak file. In any case, this tool is
intended for debugging purposes where performance might not be the
most important requirement.

To keep the algorithm simple, kmemleak scans for values pointing to any
address inside a block's address range. This may lead to an increased
number of false negatives. However, it is likely that a real memory leak
will eventually become visible.

Another source of false negatives is the data stored in non-pointer
values. In a future version, kmemleak could scan only the pointer
members in the allocated structures. This feature would solve many of
the false negative cases described above.

The tool can report false positives. These are cases where an allocated
block doesn't need to be freed (some cases in the init_call functions),
the pointer is calculated by other methods than the usual container_of
macro or the pointer is stored in a location not scanned by kmemleak.

Page allocations and ioremap are not tracked.

Testing with kmemleak-test
--------------------------

To check if you have everything set up to use kmemleak, you can use the kmemleak-test
module, a module that deliberately leaks memory. Set CONFIG_DEBUG_KMEMLEAK_TEST
as a module (it can't be used as built-in) and boot the kernel with kmemleak
enabled. Load the module and perform a scan with::

        # modprobe kmemleak-test
        # echo scan > /sys/kernel/debug/kmemleak

Note that you may not get results instantly or on the first scan. When
kmemleak gets results, it'll log ``kmemleak: <count of leaks> new suspected
memory leaks``. Then read the file to see them::

        # cat /sys/kernel/debug/kmemleak
        unreferenced object 0xffff89862ca702e8 (size 32):
          comm "modprobe", pid 2088, jiffies 4294680594 (age 375.486s)
          hex dump (first 32 bytes):
            6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
            6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
          backtrace:
            [<00000000e0a73ec7>] 0xffffffffc01d2036
            [<000000000c5d2a46>] do_one_initcall+0x41/0x1df
            [<0000000046db7e0a>] do_init_module+0x55/0x200
            [<00000000542b9814>] load_module+0x203c/0x2480
            [<00000000c2850256>] __do_sys_finit_module+0xba/0xe0
            [<000000006564e7ef>] do_syscall_64+0x43/0x110
            [<000000007c873fa6>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ...

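Reports in this format can be post-processed with a script. The following
Python sketch is an illustration only: it assumes that the header and
``comm`` lines look like the sample above, and the function name
``parse_report`` is hypothetical:

.. code-block:: python

    import re

    # Patterns derived from the sample report above (an assumption about
    # the exact output format, which may differ between kernel versions).
    HEADER = re.compile(r"unreferenced object (0x[0-9a-f]+) \(size (\d+)\):")
    COMM = re.compile(r'comm "([^"]*)", pid (\d+)')


    def parse_report(text):
        """Return one dict per "unreferenced object" entry found in text."""
        leaks = []
        for line in text.splitlines():
            match = HEADER.search(line)
            if match:
                leaks.append({
                    "addr": int(match.group(1), 16),
                    "size": int(match.group(2)),
                    "comm": None,
                    "pid": None,
                })
                continue
            match = COMM.search(line)
            if match and leaks:
                # The comm/pid line belongs to the most recent header.
                leaks[-1]["comm"] = match.group(1)
                leaks[-1]["pid"] = int(match.group(2))
        return leaks

Grouping the returned entries by ``comm`` or ``size`` can help to spot
which code path is responsible for most of the suspected leaks.
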
Removing the module with ``rmmod kmemleak_test`` should also trigger some
kmemleak results.