====================
Changes since 2.5.0:
====================

---

**recommended**

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---

**recommended**

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i.

Declare::

	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i.

Add foo_alloc_inode() and foo_destroy_inode(): the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that you now need explicit initialization of private data,
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.
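
The FOO_I() pattern relies on list_entry(), i.e. container_of(), to get from
the embedded ->vfs_inode back to the containing structure.  Below is a
userspace sketch of that round trip only, under stated assumptions: struct
inode is a hypothetical one-field stand-in, container_of() is re-created with
offsetof(), the super_block argument of the real ->alloc_inode() is omitted,
and malloc()/free() stand in for the allocator an in-tree filesystem would
normally use::

	#include <stddef.h>
	#include <stdlib.h>

	/* Hypothetical stand-in for the kernel's struct inode. */
	struct inode {
		unsigned long i_ino;
	};

	/* Userspace re-creation of container_of(), which list_entry() wraps. */
	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	struct foo_inode_info {
		int foo_private;		/* fs-private stuff */
		struct inode vfs_inode;		/* embedded VFS inode */
	};

	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return container_of(inode, struct foo_inode_info, vfs_inode);
	}

	/* ->alloc_inode() analogue: allocate the container, return the inode. */
	static struct inode *foo_alloc_inode(void)
	{
		struct foo_inode_info *fi = calloc(1, sizeof(*fi));

		return fi ? &fi->vfs_inode : NULL;
	}

	/* ->destroy_inode() analogue: free the container, not the member. */
	static void foo_destroy_inode(struct inode *inode)
	{
		free(FOO_I(inode));
	}

Because vfs_inode is not the first member here, passing the inode pointer
itself to free() would be a bug; freeing FOO_I(inode) releases the block that
was actually allocated.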

---

**mandatory**

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more.  Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that returns 0 on success and
a negative number on error (-EINVAL unless you have a more informative
error value to report).  Call it foo_fill_super().  Now declare::

  int foo_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
  {
	return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
			   mnt);
  }

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with an explicit initializer and set ->get_sb to
foo_get_sb.

---

**mandatory**

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose, you need to
change your internal locking.  Otherwise the exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---

**informational**

There is now exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()).  If you used to need that exclusion and did
it by internal locking (most filesystems couldn't care less), you
can relax your locking.

---

**mandatory**

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now.  Grab it on entry and drop it
upon return; that will guarantee the same locking you used to have.  If your
method or its parts do not need BKL, better yet: you can now shift
lock_kernel() and unlock_kernel() so that they protect exactly what needs to
be protected.

---

**mandatory**

BKL is also moved from around sb operations.  BKL should have been shifted
into individual fs sb_op functions.  If you don't need it, remove it.

---

**informational**

The check for ->link() target not being a directory is done by callers.  Feel
free to drop it...

---

**informational**

->link() callers hold ->i_mutex on the object we are linking to.  Some of your
problems might be over...

---

**mandatory**

New file_system_type method: kill_sb(superblock).  If you are converting
an existing filesystem, set it according to ->fs_flags::

	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super

FS_LITTER is gone - just remove it from fs_flags.

---

**mandatory**

FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/).  Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---

**mandatory**

->setattr() is called without BKL now.  Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---

**recommended**

New super_block field ``struct export_operations *s_export_op`` for
explicit support for exporting, e.g. via NFS.  The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/exporting.rst.

Briefly, it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh, and provide filesystem-specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

---

**mandatory**

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs and fat
can be used as examples of very different filesystems.

---

**mandatory**

iget4() and the read_inode2 callback have been superseded by iget5_locked(),
which has the following prototype::

    struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				int (*test)(struct inode *, void *),
				int (*set)(struct inode *, void *),
				void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode to allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked.  The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked() function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.::

	inode = iget_locked(sb, ino);
	if (!inode)
		return -ENOMEM;	/* no memory for a new inode */
	if (inode->i_state & I_NEW) {
		/* read_inode_from_disk() stands for your fs-specific reader */
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.
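
To make the split between 'test' and 'set' concrete, here is a userspace model
of the lookup logic only.  Everything in it is hypothetical: struct inode is a
stand-in with an extra generation field and a model-only hash key,
iget5_model() replaces iget5_locked() with a linear scan over a fixed array
(no hashing, locking, I_NEW state or reference counting; ``*is_new`` plays the
role of I_NEW), and foo_test()/foo_set() are example callbacks for a
filesystem whose objects are identified by (inode number, generation)::

	#include <stdbool.h>
	#include <stddef.h>

	struct inode {			/* hypothetical stand-in */
		bool in_use;
		unsigned long hashval;	/* model-only: the 'ino' passed in */
		unsigned long i_ino;
		unsigned int generation;
	};

	struct foo_key {		/* what 'data' points to */
		unsigned long ino;
		unsigned int generation;
	};

	#define CACHE_SIZE 8
	static struct inode cache[CACHE_SIZE];

	static int foo_test(struct inode *inode, void *data)
	{
		const struct foo_key *key = data;

		return inode->i_ino == key->ino &&
		       inode->generation == key->generation;
	}

	/* Initializes just enough of a new inode for foo_test() to match it;
	 * note that the filesystem, not the lookup, sets i_ino. */
	static int foo_set(struct inode *inode, void *data)
	{
		const struct foo_key *key = data;

		inode->i_ino = key->ino;
		inode->generation = key->generation;
		return 0;
	}

	static struct inode *iget5_model(unsigned long ino,
					 int (*test)(struct inode *, void *),
					 int (*set)(struct inode *, void *),
					 void *data, bool *is_new)
	{
		struct inode *free_slot = NULL;

		for (size_t i = 0; i < CACHE_SIZE; i++) {
			struct inode *inode = &cache[i];

			if (!inode->in_use) {
				if (!free_slot)
					free_slot = inode;
			} else if (inode->hashval == ino && test(inode, data)) {
				*is_new = false;	/* existing inode */
				return inode;
			}
		}
		if (!free_slot || set(free_slot, data) != 0)
			return NULL;
		free_slot->in_use = true;
		free_slot->hashval = ino;
		*is_new = true;			/* caller must finish setup */
		return free_slot;
	}

Two keys with the same inode number but different generations end up as two
distinct cached inodes, which plain iget_locked(), matching on the inode
number alone, would not distinguish.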

---

**recommended**

->getattr() is finally getting used.  See instances in nfs, minix, etc.

---

**mandatory**

->revalidate() is gone.  If your filesystem had it, provide ->getattr()
and let it call whatever you had as ->revalidate() and (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---

**mandatory**

->d_parent changes are not protected by BKL anymore.  Read access is safe
if at least one of the following is true:

	* filesystem has no cross-directory rename()
	* we know that parent had been locked (e.g. we are looking at
	  ->d_parent of ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held

Audit your code and add locking if needed.  Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups.  Old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---

**mandatory**

FS_NOMOUNT is gone.  If you use it - just set SB_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---

**recommended**

Use bdev_read_only(bdev) instead of is_read_only(kdev).  The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---

**mandatory**

->permission() is called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have.  If
your method or its parts do not need BKL - better yet, now you can
shift lock_kernel() and unlock_kernel() so that they would protect
exactly what needs to be protected.

---

**mandatory**

->statfs() is now called without BKL held.  BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it.  If you don't need it, remove it.

---

**mandatory**

is_read_only() is gone; use bdev_read_only() instead.

---

**mandatory**

destroy_buffers() is gone; use invalidate_bdev().

---

**mandatory**

fsync_dev() is gone; use fsync_bdev().  NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code, fixing will become trivial; until then nothing can be
done.

---

**mandatory**

Block truncation on error exit from ->write_begin, and ->direct_IO
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers.  Take a look at
ext2_write_failed and callers for an example.

---

**mandatory**

->truncate is gone.  The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes.  Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to
be in order of zeroing blocks using block_truncate_page or similar helpers,
size update and finally on-disk truncation, which should not fail.
setattr_prepare (which used to be inode_change_ok) now includes the size checks
for ATTR_SIZE and must be called at the beginning of ->setattr unconditionally.

---

**mandatory**

->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead.  It gets called whenever the inode is evicted, whether it has
remaining links or not.  Caller does *not* evict the pagecache or inode-associated
metadata buffers; the method has to use truncate_inode_pages_final() to get rid
of those. Caller makes sure async writeback cannot be running for the inode while
(or after) ->evict_inode() is called.

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped.  As before, generic_drop_inode() is still the default and it's been
updated appropriately.  generic_delete_inode() is also alive and it consists
simply of ``return 1``.  Note that all actual eviction work is done by caller after
->drop_inode() returns.

As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()).  Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().

NOTE: checking i_nlink at the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough.  Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.

---

**mandatory**

.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes
to 0.  Even on a 0 refcount transition, it must be able to tolerate being
called 0, 1, or more times (e.g. constant, idempotent).

---

**mandatory**

.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

.d_hash() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

dcache_lock is gone, replaced by fine-grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---

**mandatory**

Filesystems must RCU-free their inodes, if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback.  It used to be necessary to clean it there, but not anymore
(starting at 3.2).

---

**recommended**

vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
(above) are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, shown below. Filesystems should take advantage of this
where possible.

---

**mandatory**

d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which requires dropping out of rcu-walk mode. This
may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be
returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.rst for more details.

permission is an inode permission check that is called on many or all
directory inodes on the way down a path walk (to check for exec permission). It
must now be rcu-walk aware (mask & MAY_NOT_BLOCK).  See
Documentation/filesystems/vfs.rst for more details.
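
As an illustration of what "rcu-walk aware" means for ->permission(), here is
a userspace sketch.  The ``MAY_*`` values are illustrative copies (the real
definitions live in include/linux/fs.h), and struct foo_inode with its
acl_cached/exec_allowed fields is hypothetical.  The idea shown is that work
which may sleep is refused with -ECHILD when MAY_NOT_BLOCK is set, so that the
VFS can call the method again in ref-walk mode::

	#include <errno.h>
	#include <stdbool.h>

	#define MAY_EXEC	0x00000001	/* illustrative values */
	#define MAY_NOT_BLOCK	0x00000080

	struct foo_inode {		/* hypothetical per-inode state */
		bool acl_cached;	/* ACL already in memory? */
		bool exec_allowed;	/* result of the ACL check */
	};

	static int foo_permission(struct foo_inode *inode, int mask)
	{
		if (!inode->acl_cached) {
			/* Reading the ACL from disk may block. */
			if (mask & MAY_NOT_BLOCK)
				return -ECHILD;	/* ask for a ref-walk retry */
			inode->acl_cached = true;	/* stands in for the read */
		}
		if ((mask & MAY_EXEC) && !inode->exec_allowed)
			return -EACCES;
		return 0;
	}

The first rcu-walk call on an inode whose ACL is not cached fails with
-ECHILD; once the blocking path has populated the cache, the same rcu-walk
call succeeds without dropping out.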

---

**mandatory**

In ->fallocate() you must check the mode option passed in.  If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end of
a file off.
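
A minimal sketch of that mode check, written as a plain userspace function so
it compiles on its own.  foo_fallocate_check_mode() is hypothetical and models
a filesystem that can preallocate (with or without FALLOC_FL_KEEP_SIZE) but
cannot punch holes; a real ->fallocate() would do this before any allocation
work.  The flag values match <linux/falloc.h> and are defined here only as a
fallback::

	#include <errno.h>

	#ifndef FALLOC_FL_KEEP_SIZE
	#define FALLOC_FL_KEEP_SIZE	0x01
	#endif
	#ifndef FALLOC_FL_PUNCH_HOLE
	#define FALLOC_FL_PUNCH_HOLE	0x02
	#endif

	static int foo_fallocate_check_mode(int mode)
	{
		/* No hole punching in this filesystem. */
		if (mode & FALLOC_FL_PUNCH_HOLE)
			return -EOPNOTSUPP;
		/* Reject any other mode bit we do not implement either. */
		if (mode & ~FALLOC_FL_KEEP_SIZE)
			return -EOPNOTSUPP;
		return 0;
	}

Rejecting unknown bits too is a cautious choice: a mode the filesystem does
not understand is refused rather than silently treated as plain
preallocation.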

---

**mandatory**

->get_sb() is gone.  Switch to use of ->mount().  Typically it's just
a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
the function type.  If you were doing it manually, just switch from setting
->mnt_root to some pointer to returning that pointer.  On errors return
ERR_PTR(...).
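
The error convention for ->mount() is the kernel's error-pointer encoding: a
negative errno is returned in place of the pointer.  The sketch below
re-creates ERR_PTR(), PTR_ERR() and IS_ERR() in userspace (modelled on
include/linux/err.h, where the top MAX_ERRNO addresses are reserved for
errors) and uses them in a hypothetical foo_mount(); struct dentry is a
stand-in and the single static root replaces a real superblock setup::

	#include <errno.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>

	#define MAX_ERRNO	4095

	static inline void *ERR_PTR(long error)
	{
		return (void *)(intptr_t)error;
	}

	static inline long PTR_ERR(const void *ptr)
	{
		return (long)(intptr_t)ptr;
	}

	static inline bool IS_ERR(const void *ptr)
	{
		return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
	}

	struct dentry {			/* hypothetical stand-in */
		int dummy;
	};

	static struct dentry foo_root;

	/* Returns the root dentry instead of setting ->mnt_root, and an
	 * encoded errno instead of an int status. */
	static struct dentry *foo_mount(const char *dev_name)
	{
		if (!dev_name || !*dev_name)
			return ERR_PTR(-EINVAL);
		return &foo_root;
	}

Callers test the result with IS_ERR() and recover the errno with PTR_ERR();
NULL is not an error pointer under this encoding.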

---

**mandatory**

->permission() and generic_permission() have lost the flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.

generic_permission() has also lost the check_acl argument; ACL checking
has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
to read an ACL from disk.

---

**mandatory**

If you implement your own ->llseek() you must handle SEEK_HOLE and
SEEK_DATA.  You can handle this by returning -EINVAL, but it would be nicer to
support it in some way.  The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file.  So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file.  If the offset is i_size or greater return -ENXIO in either case.
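
The generic rule can be written down as a small userspace model.
generic_hole_data_model() is a hypothetical name, it only covers SEEK_DATA
and SEEK_HOLE (anything else gets -EINVAL here purely to keep the sketch
short), and the ``SEEK_*`` values are the Linux ones, defined as a fallback::

	#include <errno.h>
	#include <stdio.h>

	#ifndef SEEK_DATA
	#define SEEK_DATA	3
	#endif
	#ifndef SEEK_HOLE
	#define SEEK_HOLE	4
	#endif

	/* Whole file is data, with one virtual hole starting at i_size.
	 * Returns the resulting offset or a negative errno. */
	static long long generic_hole_data_model(long long offset, int whence,
						 long long i_size)
	{
		if (whence != SEEK_DATA && whence != SEEK_HOLE)
			return -EINVAL;
		/* Offsets at or past EOF (and, in this model, negative ones). */
		if (offset < 0 || offset >= i_size)
			return -ENXIO;
		return whence == SEEK_DATA ? offset : i_size;
	}

For a 100-byte file, SEEK_DATA from offset 42 stays at 42, SEEK_HOLE from 42
lands on 100, and either one from offset 100 onwards gives -ENXIO.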

---

**mandatory**

If you have your own ->fsync() you must make sure to call
filemap_write_and_wait_range() so that all dirty pages are synced out properly.
You must also keep in mind that ->fsync() is not called with i_mutex held
anymore, so if you require i_mutex locking you must make sure to take it and
release it yourself.

---

**mandatory**

d_alloc_root() is gone, along with a lot of bugs caused by code
misusing it.  Replacement: d_make_root(inode).  On success d_make_root(inode)
allocates and returns a new dentry instantiated with the passed in inode.
On failure NULL is returned and the passed in inode is dropped, so the reference
to inode is consumed in all cases and failure handling need not do any cleanup
for the inode.  If d_make_root(inode) is passed a NULL inode it returns NULL
and also requires no further error handling. Typical usage is::

	inode = foofs_new_inode(....);
	s->s_root = d_make_root(inode);
	if (!s->s_root)
		/* Nothing needed for the inode cleanup */
		return -ENOMEM;
	...
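
The ownership rule (the inode reference is consumed whether or not the call
succeeds) is easy to get wrong, so here is a userspace model of just that
contract.  d_make_root_model(), iput_model() and the fail_alloc switch are
hypothetical; the real d_make_root() takes only the inode and fails only when
its dentry allocation does::

	#include <stdlib.h>

	struct inode {			/* hypothetical stand-ins */
		int i_count;
	};

	struct dentry {
		struct inode *d_inode;
	};

	static int inodes_freed;	/* lets the example observe the final put */

	static void iput_model(struct inode *inode)
	{
		if (--inode->i_count == 0) {
			free(inode);
			inodes_freed++;
		}
	}

	/* fail_alloc simulates the dentry allocation failing. */
	static struct dentry *d_make_root_model(struct inode *inode, int fail_alloc)
	{
		struct dentry *root = NULL;

		if (!inode)
			return NULL;		/* nothing to consume */
		if (!fail_alloc)
			root = malloc(sizeof(*root));
		if (root)
			root->d_inode = inode;	/* reference moves into the dentry */
		else
			iput_model(inode);	/* reference dropped on failure */
		return root;
	}

This is why the typical usage shown for d_make_root() only checks s->s_root
and never calls iput() on the error path.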

---

**mandatory**

The witch is dead!  Well, 2/3 of it, anyway.  ->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.

---

**mandatory**

->create() doesn't take ``struct nameidata *``; unlike the previous
two, it gets "is it an O_EXCL or equivalent?" boolean argument.  Note that
local filesystems can ignore that argument - they are guaranteed that the
object doesn't exist.  It's remote/distributed ones that might care...

---

**mandatory**

FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.

---

**mandatory**

vfs_readdir() is gone; switch to iterate_dir() instead.

---

**mandatory**

->readdir() is gone now; switch to ->iterate().

---

**mandatory**

vfs_follow_link has been removed.  Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.

---

**mandatory**

iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did).  inode_hash_lock is still held,
of course, so they are still serialized wrt removal from inode hash,
as well as wrt set() callback of iget5_locked().

---

**mandatory**

d_materialise_unique() is gone; d_splice_alias() does everything you
need now.  Remember that they have opposite orders of arguments ;-/

---

**mandatory**

f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.

---

**mandatory**

Never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.

---

**mandatory**

Do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
instead.

---

**mandatory**

->aio_read/->aio_write are gone.  Use ->read_iter/->write_iter.

---

**recommended**

For embedded ("fast") symlinks just set inode->i_link to wherever the
symlink body is and use simple_follow_link() as ->follow_link().

---

**mandatory**

Calling conventions for ->follow_link() have changed.  Instead of returning
a cookie and using nd_set_link() to store the body to traverse, we return
the body to traverse and store the cookie using an explicit void ** argument.
nameidata isn't passed at all - nd_jump_link() doesn't need it and
nd_[gs]et_link() is gone.
603*4882a593Smuzhiyun
604*4882a593Smuzhiyun---
605*4882a593Smuzhiyun
606*4882a593Smuzhiyun**mandatory**
607*4882a593Smuzhiyun
608*4882a593Smuzhiyuncalling conventions for ->put_link() have changed.  It gets inode instead of
609*4882a593Smuzhiyundentry,  it does not get nameidata at all and it gets called only when cookie
610*4882a593Smuzhiyunis non-NULL.  Note that link body isn't available anymore, so if you need it,
611*4882a593Smuzhiyunstore it as cookie.

---

**mandatory**

any symlink that might use page_follow_link_light()/page_put_link() must
have inode_nohighmem(inode) called before anything might start touching
its pagecache.  No highmem pages should end up in the pagecache of such
symlinks.  That includes any preseeding that might be done during symlink
creation.  __page_symlink() will honour the mapping gfp flags, so once
you've done inode_nohighmem() it's safe to use, but if you allocate and
insert the page manually, make sure to use the right gfp flags.

---

**mandatory**

->follow_link() is replaced with ->get_link(); same API, except that

	* ->get_link() gets the inode as a separate argument
	* ->get_link() may be called in RCU mode - in that case a NULL
	  dentry is passed

---

**mandatory**

->get_link() now gets a struct delayed_call ``*done`` argument, and should
call set_delayed_call() where it used to set ``*cookie``.

->put_link() is gone - just give the destructor to set_delayed_call()
in ->get_link().
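
A userspace sketch of the ``delayed_call`` pattern.  ``struct
toy_delayed_call`` has the same two fields as the kernel's ``struct
delayed_call`` (a destructor and its argument), but the ``toy_`` helpers are
local re-implementations for illustration, and ``toy_get_link()`` is a
hypothetical method that reports failure as NULL instead of ERR_PTR()::

	#include <assert.h>
	#include <stddef.h>
	#include <stdlib.h>
	#include <string.h>

	struct toy_delayed_call {
		void (*fn)(void *);
		void *arg;
	};

	static void toy_set_delayed_call(struct toy_delayed_call *call,
					 void (*fn)(void *), void *arg)
	{
		assert(call != NULL);
		call->fn = fn;
		call->arg = arg;
	}

	/* Runs the destructor, if one was registered. */
	static void toy_do_delayed_call(struct toy_delayed_call *call)
	{
		if (call->fn)
			call->fn(call->arg);
	}

	static int toy_bodies_freed;

	static void toy_free_body(void *arg)
	{
		free(arg);
		toy_bodies_freed++;
	}

	/* Hypothetical ->get_link() for a body built on the heap: where it
	 * once set *cookie for ->put_link(), it now registers the
	 * destructor in *done. */
	static const char *toy_get_link(const char *target,
					struct toy_delayed_call *done)
	{
		size_t len = strlen(target) + 1;
		char *body = malloc(len);

		if (!body)
			return NULL;	/* real code: ERR_PTR(-ENOMEM) */
		memcpy(body, target, len);
		toy_set_delayed_call(done, toy_free_body, body);
		return body;
	}

The caller runs the destructor once it is done with the body.  A ->get_link()
whose body needs no cleanup simply leaves ``*done`` untouched, and running
it is then a no-op.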

---

**mandatory**

->getxattr() and xattr_handler.get() get dentry and inode passed separately.
The dentry might not be attached to the inode yet, so do _not_ use its
->d_inode in the instances.  Rationale: !@#!@# security_d_instantiate() needs
to be called before we attach dentry to inode.
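
A hedged userspace sketch of an xattr ``get`` handler in the new shape: it
reads from the ``inode`` argument and never touches ``dentry->d_inode``.
``struct toy_inode``, ``struct toy_dentry`` and ``toy_xattr_get()`` are
hypothetical and much simpler than the real ``xattr_handler`` interface (no
handler or name arguments); the size-0 query and ``-ERANGE`` follow the
usual getxattr(2) conventions::

	#include <assert.h>
	#include <errno.h>
	#include <string.h>

	struct toy_inode { const char *label; };
	struct toy_dentry { struct toy_inode *d_inode; };

	/* The inode arrives as its own argument; the dentry is left alone
	 * because dentry->d_inode may still be NULL at this point. */
	static int toy_xattr_get(struct toy_dentry *dentry, struct toy_inode *inode,
				 void *buffer, size_t size)
	{
		size_t len;

		(void)dentry;
		assert(inode != NULL);
		len = strlen(inode->label);
		if (size == 0)
			return (int)len;	/* size query */
		if (len > size)
			return -ERANGE;
		memcpy(buffer, inode->label, len);
		return (int)len;
	}

Calling it with a dentry whose ``d_inode`` is still NULL is safe, since only
the inode argument is used.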

---

**mandatory**

symlinks are no longer the only inodes that do *not* have the
i_bdev/i_cdev/i_pipe/i_link union zeroed out at inode eviction.  As a result,
you can't assume that a non-NULL value in ->i_link at ->destroy_inode()
implies that it's a symlink.  Checking ->i_mode is really needed now.  In-tree
we had to fix shmem_destroy_callback() that used to take that kind of
shortcut; watch out, since that shortcut is no longer valid.
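
A small userspace sketch of the safe check.  ``struct toy_inode`` and
``toy_destroy_inode()`` are hypothetical; the anonymous union only imitates
the kernel's arrangement, in which ``i_link`` shares storage with pointers
such as ``i_pipe``, so the file type must come from ``i_mode`` (here via
``S_ISLNK()`` from ``<sys/stat.h>``)::

	#define _XOPEN_SOURCE 700
	#include <assert.h>
	#include <stdlib.h>
	#include <sys/stat.h>
	#include <sys/types.h>

	struct toy_inode {
		mode_t i_mode;
		union {
			char *i_link;
			void *i_pipe;	/* stand-in for the other union members */
		};
	};

	static int toy_links_freed;

	/* Check the file type first; only then trust the union member. */
	static void toy_destroy_inode(struct toy_inode *inode)
	{
		if (S_ISLNK(inode->i_mode) && inode->i_link) {
			free(inode->i_link);
			toy_links_freed++;
		}
		inode->i_link = NULL;
	}

A FIFO whose ``i_pipe`` is set looks "non-NULL" through ``i_link`` too; the
``S_ISLNK()`` test is what keeps its pointer from being freed as a link body.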

---

**mandatory**

->i_mutex is replaced with ->i_rwsem now.  inode_lock() et al. work as
they used to - they just take it exclusive.  However, ->lookup() may be
called with the parent locked shared.  Its instances must not

	* use d_instantiate() and d_rehash() separately - use d_add() or
	  d_splice_alias() instead.
	* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
	* in the unlikely case when (read-only) access to filesystem
	  data structures needs exclusion for some reason, arrange it
	  yourself.  None of the in-tree filesystems needed that.
	* rely on ->d_parent and ->d_name not changing after dentry has
	  been fed to d_add() or d_splice_alias().  Again, none of the
	  in-tree instances relied upon that.

We are guaranteed that lookups of the same name in the same directory
will not happen in parallel ("same" in the sense of your ->d_compare()).
Lookups of different names in the same directory can and do happen in
parallel now.

---

**recommended**

->iterate_shared() is added; it's a parallel variant of ->iterate().
Exclusion at the struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.

Often enough ->iterate() can serve as ->iterate_shared() without any
changes - it is a read-only operation, after all.  If you have any
per-inode or per-dentry in-core data structures modified by ->iterate(),
you might need something to serialize the access to them.  If you
do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
that; look for in-tree examples.

The old method is only used if the new one is absent; eventually it will
be removed.  Switch while you still can; the old one won't stay.

---

**mandatory**

->atomic_open() calls without O_CREAT may happen in parallel.

---

**mandatory**

->setxattr() and xattr_handler.set() get dentry and inode passed separately.
The dentry might not be attached to the inode yet, so do _not_ use its
->d_inode in the instances.  Rationale: !@#!@# security_d_instantiate() needs
to be called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$
smack ->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.

---

**mandatory**

->d_compare() doesn't get the parent as a separate argument anymore.  If you
used it for finding the struct super_block involved, dentry->d_sb will
work just as well; if it's something more complicated, use dentry->d_parent.
Just be careful not to assume that fetching it more than once will yield
the same value - in RCU mode it could change under you.

---

**mandatory**

->rename() has an added flags argument.  Any flags not handled by the
filesystem should result in -EINVAL being returned.
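
A minimal sketch of the flag check, as userspace C.  The ``TOY_RENAME_*``
constants and ``toy_rename()`` are hypothetical stand-ins (the real flags are
``RENAME_NOREPLACE`` and friends); the shape ``if (flags & ~SUPPORTED)
return -EINVAL;`` is the part that matters, and in-tree helpers such as
simple_rename() use it for ``RENAME_NOREPLACE``::

	#include <assert.h>
	#include <errno.h>

	/* Illustrative stand-ins, named to avoid clashing with system headers. */
	#define TOY_RENAME_NOREPLACE	(1u << 0)
	#define TOY_RENAME_EXCHANGE	(1u << 1)
	#define TOY_RENAME_WHITEOUT	(1u << 2)

	/* A filesystem that supports only NOREPLACE rejects any other bit
	 * before doing any work. */
	static int toy_rename(unsigned int flags)
	{
		if (flags & ~TOY_RENAME_NOREPLACE)
			return -EINVAL;
		/* ... the actual rename would be performed here ... */
		return 0;
	}

In this model the "target already exists" check that gives
``RENAME_NOREPLACE`` its meaning is assumed to happen in the caller, so the
filesystem only has to accept the bit.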

---

**recommended**

->readlink is optional for symlinks.  Don't set it unless the filesystem
needs to fake something for readlink(2).

---

**mandatory**

->getattr() is now passed a struct path rather than a vfsmount and
dentry separately, and it now has request_mask and query_flags arguments
to specify the fields and sync type requested by statx.  Filesystems not
supporting any statx-specific features may ignore the new arguments.

---

**mandatory**

->atomic_open() calling conventions have changed.  Gone is ``int *opened``,
along with FILE_OPENED/FILE_CREATED.  In place of those we have
FMODE_OPENED/FMODE_CREATED, set in file->f_mode.  Additionally, the return
value for the 'called finish_no_open(), open it yourself' case has become
0, not 1.  Since finish_no_open() itself is returning 0 now, that part
does not need any changes in ->atomic_open() instances.
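
A hedged userspace sketch of the new convention: outcomes are recorded in
``f_mode``, and a deferred open is signalled by returning 0 without the
OPENED bit.  ``TOY_FMODE_OPENED``, ``TOY_FMODE_CREATED``, ``struct
toy_file`` and ``toy_atomic_open()`` are hypothetical, with integer knobs
standing in for the real lookup state::

	#include <assert.h>
	#include <errno.h>

	/* Hypothetical stand-ins for FMODE_OPENED/FMODE_CREATED. */
	#define TOY_FMODE_OPENED	0x1u
	#define TOY_FMODE_CREATED	0x2u

	struct toy_file { unsigned int f_mode; };

	/* Outcomes go into f_mode instead of an int *opened argument.
	 * Returning 0 with TOY_FMODE_OPENED clear plays the role of
	 * finish_no_open(): "open it yourself". */
	static int toy_atomic_open(struct toy_file *file, int exists, int o_creat,
				   int defer_to_caller)
	{
		if (defer_to_caller)
			return 0;
		if (!exists) {
			if (!o_creat)
				return -ENOENT;
			file->f_mode |= TOY_FMODE_CREATED;
		}
		file->f_mode |= TOY_FMODE_OPENED;
		return 0;
	}

A caller tells the "open it yourself" case apart by testing the OPENED bit,
not by the return value, which is 0 either way.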

---

**mandatory**

alloc_file() has become static now; two wrappers are to be used instead.
alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
when a dentry needs to be created; that's the majority of old alloc_file()
users.  Calling conventions: on success a reference to the new struct file
is returned and the caller's reference to the inode is subsumed by that.  On
failure, ERR_PTR() is returned and no caller's references are affected,
so the caller needs to drop the inode reference it held.
alloc_file_clone(file, flags, ops) does not affect any caller's references.
On success you get a new struct file sharing the mount/dentry with the
original; on failure, ERR_PTR().
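
The reference rules for alloc_file_pseudo() can be modelled with a plain
counter.  This userspace sketch is hypothetical throughout
(``toy_alloc_file_pseudo()`` takes a ``fail`` knob instead of the real
arguments and returns NULL instead of ERR_PTR()); it only shows who drops the
inode reference on each path::

	#include <assert.h>
	#include <stdlib.h>

	struct toy_inode { int refcount; };
	struct toy_file { struct toy_inode *inode; };

	static void toy_iput(struct toy_inode *inode)
	{
		assert(inode->refcount > 0);
		inode->refcount--;
	}

	/* On success the caller's inode reference moves into the new file;
	 * on failure nothing is touched. */
	static struct toy_file *toy_alloc_file_pseudo(struct toy_inode *inode,
						       int fail)
	{
		struct toy_file *file;

		if (fail)
			return NULL;		/* the real helper returns ERR_PTR() */
		file = malloc(sizeof(*file));
		if (!file)
			return NULL;
		file->inode = inode;		/* takes over the reference */
		return file;
	}

	static void toy_fput(struct toy_file *file)
	{
		toy_iput(file->inode);		/* the file's reference goes with it */
		free(file);
	}

	/* Caller pattern: drop our own inode reference only on failure. */
	static struct toy_file *toy_make_file(struct toy_inode *inode, int fail)
	{
		struct toy_file *file;

		inode->refcount++;		/* the reference we hand over */
		file = toy_alloc_file_pseudo(inode, fail);
		if (!file)
			toy_iput(inode);	/* still ours on failure */
		return file;
	}

On success the inode reference is dropped later by ``toy_fput()``; on failure
the caller drops it right away, so the count returns to where it started
either way.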

---

**mandatory**

->clone_file_range() and ->dedupe_file_range() have been replaced with
->remap_file_range().  See Documentation/filesystems/vfs.rst for more
information.

---

**recommended**

->lookup() instances doing an equivalent of::

	if (IS_ERR(inode))
		return ERR_CAST(inode);
	return d_splice_alias(inode, dentry);

don't need to bother with the check - d_splice_alias() will do the
right thing when given ERR_PTR(...) as inode.  Moreover, passing a NULL
inode to d_splice_alias() will also do the right thing (the equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special case
doesn't need separate treatment either.
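
A userspace model of why the check is redundant.  ``toy_err_ptr()``,
``toy_is_err()`` and ``toy_ptr_err()`` imitate the kernel's ERR_PTR()
encoding, and ``toy_d_splice_alias()`` is a much-simplified hypothetical
stand-in that ignores alias handling; the ->lookup() tail then needs no
``IS_ERR()`` test of its own::

	#include <assert.h>
	#include <errno.h>
	#include <stddef.h>
	#include <stdint.h>

	/* Small negative errno values encoded in the top of the address space. */
	#define TOY_MAX_ERRNO 4095

	static void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
	static long toy_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
	static int toy_is_err(const void *ptr)
	{
		return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
	}

	struct toy_inode { int ino; };
	struct toy_dentry { struct toy_inode *inode; int hashed; };

	/* Errors pass straight through, NULL gives a negative dentry,
	 * anything else is attached.  (No alias lookup is modelled.) */
	static struct toy_dentry *toy_d_splice_alias(struct toy_inode *inode,
						     struct toy_dentry *dentry)
	{
		if (toy_is_err(inode))
			return (struct toy_dentry *)inode;	/* like ERR_CAST() */
		dentry->inode = inode;				/* may be NULL */
		dentry->hashed = 1;
		return NULL;
	}

	/* ->lookup() tail without the redundant IS_ERR() check. */
	static struct toy_dentry *toy_lookup(struct toy_dentry *dentry,
					     struct toy_inode *inode)
	{
		return toy_d_splice_alias(inode, dentry);
	}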

---

**strongly recommended**

move the RCU-delayed parts of ->destroy_inode() into a new method,
->free_inode().  If ->destroy_inode() becomes empty - all the better,
just get rid of it.  Synchronous work (e.g. the stuff that can't
be done from an RCU callback, or any WARN_ON() where we want the
stack trace) *might* be movable to ->evict_inode(); however,
that goes only for the things that are not needed to balance something
done by ->alloc_inode().  IOW, if it's cleaning up the stuff that
might have accumulated over the life of the in-core inode, ->evict_inode()
might be a fit.

Rules for inode destruction:

	* if ->destroy_inode() is non-NULL, it gets called
	* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
	* a combination of NULL ->destroy_inode and NULL ->free_inode is
	  treated as NULL/free_inode_nonrcu, to preserve compatibility.

Note that the callback (be it via ->free_inode() or explicit call_rcu()
in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
as a matter of fact, the superblock and all associated structures
might already be gone.  The filesystem driver is guaranteed to be still
there, but that's it.  Freeing memory in the callback is fine; doing
more than that is possible, but requires a lot of care and is best
avoided.
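
The three destruction rules can be sketched in userspace, with a per-inode
"pending" slot standing in for call_rcu() and ``toy_grace_period()`` standing
in for the end of a grace period.  All ``toy_`` names are hypothetical; this
models the dispatch only, not real RCU::

	#include <assert.h>
	#include <stddef.h>

	struct toy_inode;

	struct toy_super_ops {
		void (*destroy_inode)(struct toy_inode *);
		void (*free_inode)(struct toy_inode *);
	};

	struct toy_inode {
		const struct toy_super_ops *ops;
		void (*pending)(struct toy_inode *);	/* queued "RCU" callback */
	};

	static int toy_destroy_calls, toy_free_calls, toy_nonrcu_calls;

	static void toy_count_destroy(struct toy_inode *i) { (void)i; toy_destroy_calls++; }
	static void toy_count_free(struct toy_inode *i) { (void)i; toy_free_calls++; }
	static void toy_free_inode_nonrcu(struct toy_inode *i) { (void)i; toy_nonrcu_calls++; }

	/* ->destroy_inode() runs synchronously; ->free_inode() is deferred;
	 * with both NULL the default free is deferred instead. */
	static void toy_destroy(struct toy_inode *inode)
	{
		const struct toy_super_ops *ops = inode->ops;

		inode->pending = NULL;
		if (ops->destroy_inode) {
			ops->destroy_inode(inode);
			if (!ops->free_inode)
				return;
		}
		inode->pending = ops->free_inode ? ops->free_inode
						 : toy_free_inode_nonrcu;
	}

	/* Stand-in for the end of a grace period: run whatever was queued. */
	static void toy_grace_period(struct toy_inode *inode)
	{
		void (*fn)(struct toy_inode *) = inode->pending;

		inode->pending = NULL;
		if (fn)
			fn(inode);
	}

``toy_destroy()`` returns early when only ->destroy_inode() is set, and
queues ``toy_free_inode_nonrcu()`` when neither method exists.  The deferred
callbacks touch nothing but their argument, in line with the warning above
that the superblock may already be gone when they run.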

---

**mandatory**

DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
default.  DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
business doing so.

---

**mandatory**

d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules).  Such uses are very likely to
be misspelled d_alloc_anon().

---

**mandatory**

[should've been added in 2016] stale comment in finish_open()
notwithstanding, failure exits in ->atomic_open() instances should *NOT*
fput() the file, no matter what.  Everything is handled by the caller.

---

**mandatory**

clone_private_mount() returns a longterm mount now, so the proper destructor of
its result is kern_unmount() or kern_unmount_array().