.. _gfp_mask_from_fs_io:

=================================
GFP masks used from FS/IO context
=================================

:Date: May, 2018
:Author: Michal Hocko <mhocko@kernel.org>

Introduction
============

Code paths in the filesystem and IO stacks must be careful when
allocating memory to prevent recursion deadlocks caused by direct
memory reclaim calling back into the FS or IO paths and blocking on
already held resources (e.g. locks - most commonly those used for the
transaction context).

The traditional way to avoid this deadlock problem is to clear __GFP_FS
or __GFP_IO (note the latter implies clearing the former as well) in the
gfp mask when calling an allocator. GFP_NOFS and GFP_NOIO can be used as
shortcuts. It turned out, though, that this approach has led to abuses
when the restricted gfp mask is used "just in case" without deeper
consideration, which causes problems because excessive use of
GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory
reclaim issues.

New API
=======

Since 4.12 we have a generic scope API for both NOFS and NOIO contexts:
``memalloc_nofs_save``, ``memalloc_nofs_restore`` and ``memalloc_noio_save``,
``memalloc_noio_restore`` respectively, which allow marking a scope as a
critical section from a filesystem or I/O point of view. Any allocation
from that scope will inherently drop __GFP_FS or __GFP_IO respectively
from the given mask, so no memory allocation can recurse back into the
FS/IO layer.

.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_nofs_save memalloc_nofs_restore
.. kernel-doc:: include/linux/sched/mm.h
   :functions: memalloc_noio_save memalloc_noio_restore

FS/IO code then simply calls the appropriate save function before any
critical section with respect to reclaim is started - e.g. before taking
a lock shared with the reclaim context or where transaction context
nesting would be possible via reclaim. The restore function should be
called when the critical section ends. All that should ideally come with
an explanation of what the reclaim context is, for easier maintenance.

Please note that the proper pairing of save/restore functions allows
nesting, so it is safe to call ``memalloc_noio_save`` or
``memalloc_nofs_save`` from an existing NOIO or NOFS scope
respectively.
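A minimal sketch of the pattern, assuming a made-up ``foo`` filesystem
whose transaction state is shared with the reclaim path (the ``foo_*``
names and ``struct foo_transaction`` layout are purely illustrative and
not taken from any real filesystem)::

  #include <linux/sched/mm.h>
  #include <linux/slab.h>

  struct foo_transaction {
          unsigned int nofs_flags;
          /* ... transaction state ... */
  };

  static void foo_trans_begin(struct foo_transaction *trans)
  {
          /*
           * Reclaim recursing into the filesystem could block on the
           * transaction state taken below, so open a NOFS scope for
           * the whole critical section.
           */
          trans->nofs_flags = memalloc_nofs_save();
          /* ... acquire transaction locks, start journaling ... */
  }

  static int foo_trans_do_work(struct foo_transaction *trans)
  {
          /*
           * Plain GFP_KERNEL is fine here - __GFP_FS is dropped
           * implicitly because we are inside the NOFS scope.
           */
          void *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);

          if (!buf)
                  return -ENOMEM;
          /* ... do the work ... */
          kfree(buf);
          return 0;
  }

  static void foo_trans_end(struct foo_transaction *trans)
  {
          /* ... release transaction locks ... */
          memalloc_nofs_restore(trans->nofs_flags);
  }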

What about __vmalloc(GFP_NOFS)
==============================

vmalloc doesn't support the GFP_NOFS semantic because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite
non-trivial to fix up. That means that calling ``vmalloc`` with
GFP_NOFS/GFP_NOIO is almost always a bug. The good news is that the
NOFS/NOIO semantic can be achieved by the scope API.

In an ideal world, upper layers would already mark the dangerous
contexts, so no special care is required and vmalloc can be called
without any problems. Sometimes, if the context is not really clear or
there are layering violations, the recommended workaround is to wrap
``vmalloc`` in the scope API with a comment explaining the problem.
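A sketch of that workaround, assuming a hypothetical ``foo_alloc_big_table``
helper whose callers may hold locks visible to reclaim but are not
guaranteed to run inside an existing NOFS scope::

  #include <linux/sched/mm.h>
  #include <linux/vmalloc.h>

  static void *foo_alloc_big_table(size_t size)
  {
          unsigned int flags;
          void *table;

          /*
           * Callers may hold fs locks that the reclaim path could block
           * on, but the layering does not guarantee they already run in
           * a NOFS scope. Open one here instead of passing GFP_NOFS to
           * vmalloc, which would not be honored by its internal
           * GFP_KERNEL allocations.
           */
          flags = memalloc_nofs_save();
          table = vmalloc(size);
          memalloc_nofs_restore(flags);

          return table;
  }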