.. _gfp_mask_from_fs_io:

=================================
GFP masks used from FS/IO context
=================================

:Date: May, 2018
:Author: Michal Hocko <mhocko@kernel.org>

Introduction
============

Code paths in the filesystem and IO stacks must be careful when
allocating memory to prevent recursion deadlocks caused by direct
memory reclaim calling back into the FS or IO paths and blocking on
already held resources (e.g. locks, most commonly those used for the
transaction context).

The traditional way to avoid this deadlock problem is to clear __GFP_FS
or __GFP_IO (note that clearing the latter implies clearing the former as
well) in the gfp mask when calling an allocator. GFP_NOFS and GFP_NOIO,
respectively, can be used as shortcuts. It turned out, though, that this
approach has led to abuse: the restricted gfp mask is used "just in case"
without deeper consideration, and excessive use of GFP_NOFS/GFP_NOIO can
lead to memory over-reclaim or other memory reclaim issues.
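
The relationship between these masks can be sketched as a small userspace
model. The bit values and the ``MODEL_`` names below are placeholders
invented for illustration, not the kernel's definitions; only the set
relationships are meant to match (GFP_KERNEL allows reclaim, IO and FS,
GFP_NOFS drops __GFP_FS, GFP_NOIO drops both __GFP_IO and __GFP_FS).

```c
#include <stdbool.h>

/* Placeholder bit values for a userspace model; NOT the kernel's values. */
enum {
	MODEL_GFP_RECLAIM = 1u << 0,	/* stands in for __GFP_RECLAIM */
	MODEL_GFP_IO      = 1u << 1,	/* stands in for __GFP_IO */
	MODEL_GFP_FS      = 1u << 2,	/* stands in for __GFP_FS */
};

#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)

/* Reclaim may call back into filesystem code only if the FS bit is set. */
bool may_enter_fs(unsigned int gfp)
{
	return (gfp & MODEL_GFP_FS) != 0;
}

/* Reclaim may start physical IO only if the IO bit is set. */
bool may_start_io(unsigned int gfp)
{
	return (gfp & MODEL_GFP_IO) != 0;
}
```

In this model a GFP_NOIO request can neither start IO nor enter the
filesystem, which is why clearing __GFP_IO is described above as implying
the clearing of __GFP_FS.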

New API
========

Since 4.12 there is a generic scope API for both NOFS and NOIO contexts:
``memalloc_nofs_save`` and ``memalloc_nofs_restore``, and ``memalloc_noio_save``
and ``memalloc_noio_restore``, respectively. These allow a scope to be marked
as a critical section from a filesystem or I/O point of view. Any allocation
from that scope will implicitly drop __GFP_FS or __GFP_IO, respectively, from
the given mask, so no memory allocation can recurse back into the FS/IO paths.
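
The effect of a scope on the allocation mask can be sketched in userspace.
This is a simplified model, assuming the scope is tracked as a per-task
flag; ``task_flags``, the ``model_`` functions and all bit values are
invented for this example and are not the kernel's code.

```c
/* Placeholder bit values for a userspace model; NOT the kernel's values. */
enum { MODEL_GFP_IO = 1u << 1, MODEL_GFP_FS = 1u << 2 };
enum { MODEL_PF_NOIO = 1u << 0, MODEL_PF_NOFS = 1u << 1 };

/* Stands in for the flags of the currently running task. */
unsigned int task_flags;

/* Enter a NOFS scope and return the previous NOFS state. */
unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_NOFS;

	task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave a NOFS scope by putting back the state returned by save. */
void model_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_PF_NOFS) | old;
}

/* Mask the allocator would effectively use for a request made right now. */
unsigned int effective_gfp(unsigned int gfp)
{
	if (task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}
```

Callers inside the scope keep passing their usual mask; the restriction is
applied at allocation time, so code deeper in the call chain does not need
to know about the scope.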

.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_nofs_save memalloc_nofs_restore
.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_noio_save memalloc_noio_restore

FS/IO code then simply calls the appropriate save function before
any section that is critical with respect to reclaim starts, e.g.
before taking a lock shared with the reclaim context, or when transaction
context nesting would be possible via reclaim. The restore function should
be called when the critical section ends. Ideally, all of this comes with
an explanation of what the reclaim context is, for easier maintenance.

Please note that the proper pairing of save/restore functions
allows nesting, so it is safe to call ``memalloc_nofs_save`` or
``memalloc_noio_save`` from within an existing NOIO or NOFS scope, as
long as each call is paired with its matching restore.
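
Why pairing makes nesting safe can be seen in a userspace sketch. It
assumes each save returns the previous state of its flag and each restore
puts that state back; ``task_flags``, the ``model_`` functions and the bit
values are invented for this example.

```c
/* Placeholder bit values for a userspace model; NOT the kernel's values. */
enum { MODEL_PF_NOIO = 1u << 0, MODEL_PF_NOFS = 1u << 1 };

/* Stands in for the flags of the currently running task. */
unsigned int task_flags;

/* Set one scope bit and return whether it was already set. */
unsigned int model_scope_save(unsigned int bit)
{
	unsigned int old = task_flags & bit;

	task_flags |= bit;
	return old;
}

/* Put one scope bit back to the state returned by the paired save. */
void model_scope_restore(unsigned int bit, unsigned int old)
{
	task_flags = (task_flags & ~bit) | old;
}

/* Walk through nested scopes, recording task_flags at each step. */
void nested_scopes(unsigned int out[4])
{
	unsigned int outer, inner, again;

	outer = model_scope_save(MODEL_PF_NOFS);	/* enter NOFS */
	out[0] = task_flags;

	inner = model_scope_save(MODEL_PF_NOIO);	/* NOIO inside NOFS */
	out[1] = task_flags;

	again = model_scope_save(MODEL_PF_NOFS);	/* NOFS inside NOFS */
	model_scope_restore(MODEL_PF_NOFS, again);	/* outer NOFS survives */
	out[2] = task_flags;

	model_scope_restore(MODEL_PF_NOIO, inner);
	model_scope_restore(MODEL_PF_NOFS, outer);	/* back to the start */
	out[3] = task_flags;
}
```

The inner restore only puts back what its own save observed, so leaving a
nested scope never ends the enclosing one.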

What about __vmalloc(GFP_NOFS)
==============================

vmalloc doesn't support GFP_NOFS semantics because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
almost always a bug. The good news is that the NOFS/NOIO semantics can be
achieved with the scope API.

In an ideal world, upper layers should already mark dangerous contexts,
so no special care is required and vmalloc can be called without
any problems. Sometimes, if the context is not really clear or there are
layering violations, the recommended workaround is to wrap ``vmalloc``
in the scope API with a comment explaining the problem.
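
A minimal sketch of such a wrapper follows. So that it compiles outside
the kernel, ``memalloc_nofs_save``, ``memalloc_nofs_restore``, ``vmalloc``
and ``vfree`` are replaced here by userspace stand-ins with the same
signatures; ``example_alloc_table`` and the locking scenario in its comment
are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* --- Userspace stand-ins so the sketch compiles outside the kernel --- */
bool nofs_scope;			/* models the per-task NOFS marker */
bool last_vmalloc_in_nofs_scope;	/* records what the stand-in saw */

unsigned int memalloc_nofs_save(void)
{
	unsigned int old = nofs_scope;

	nofs_scope = true;
	return old;
}

void memalloc_nofs_restore(unsigned int flags)
{
	nofs_scope = flags != 0;
}

void *vmalloc(unsigned long size)
{
	last_vmalloc_in_nofs_scope = nofs_scope;
	return malloc(size);
}

void vfree(const void *addr)
{
	free((void *)addr);
}

/* --- The pattern itself (hypothetical helper) --- */
void *example_alloc_table(unsigned long size)
{
	unsigned int nofs_flags;
	void *table;

	/*
	 * The caller holds a lock that reclaim also takes, but the upper
	 * layer does not mark the scope. Mark it here so that the
	 * allocations made inside vmalloc cannot recurse into the FS.
	 */
	nofs_flags = memalloc_nofs_save();
	table = vmalloc(size);
	memalloc_nofs_restore(nofs_flags);

	return table;
}
```

Keeping the save and restore in the same function as the comment makes it
easier to remove the wrapper once the upper layer marks the scope itself.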