====================
Changes since 2.5.0:
====================

---

**recommended**

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table().)

---

**recommended**

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i.

Declare::

	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i.

Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that you now need explicit initialization of private data,
typically between calling iget_locked() and unlocking the inode.
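The embedding pattern above can be tried outside the kernel. What follows is a userspace sketch, not kernel code: the one-field ``struct inode``, the hypothetical ``foo_flags`` member, and malloc()/free() standing in for a slab cache are assumptions made so the example compiles on its own, and the real ->alloc_inode() also receives the super_block, which is omitted here. list_entry() expands to container_of(), which the sketch re-implements with offsetof() to show how FOO_I() recovers the enclosing foo_inode_info from a pointer to the embedded ->vfs_inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical one-field stand-in for the kernel's struct inode. */
struct inode {
	unsigned long i_ino;
};

/* Userspace re-implementation of container_of(); list_entry() expands to it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	unsigned int foo_flags;		/* hypothetical fs-private field */
	struct inode vfs_inode;		/* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Models ->alloc_inode(): allocate the whole container and hand back the
 * embedded inode; malloc() stands in for a slab cache. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = malloc(sizeof(*fi));

	if (!fi)
		return NULL;
	fi->foo_flags = 0;		/* private data needs explicit init */
	fi->vfs_inode.i_ino = 0;
	return &fi->vfs_inode;
}

/* Models ->destroy_inode(): free the container, not the embedded inode. */
static void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

Because FOO_I() is plain pointer arithmetic, ->vfs_inode does not have to be the first member, and freeing FOO_I(inode) rather than the inode pointer is what releases the allocation made by ->alloc_inode().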

At some point that will become mandatory.

---

**mandatory**

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that would return 0 in case of
success and a negative number in case of error (-EINVAL unless you have a more
informative error value to report). Call it foo_fill_super(). Now declare::

	int foo_get_sb(struct file_system_type *fs_type,
		int flags, const char *dev_name, void *data, struct vfsmount *mnt)
	{
		return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
				   mnt);
	}

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with an explicit initializer and have ->get_sb set as
foo_get_sb.

---

**mandatory**

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose - you need to
change your internal locking. Otherwise exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---

**informational**

Now we have the exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()). If you used to need that exclusion and did
it by internal locking (most filesystems couldn't care less) - you
can relax your locking.

---

**mandatory**

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have. If your method
or its parts do not need BKL - better yet, now you can shift lock_kernel() and
unlock_kernel() so that they would protect exactly what needs to be
protected.

---

**mandatory**

BKL is also moved from around sb operations. BKL should have been shifted into
individual fs sb_op functions. If you don't need it, remove it.

---

**informational**

The check for ->link() target not being a directory is done by callers. Feel
free to drop it...

---

**informational**

->link() callers hold ->i_mutex on the object we are linking to. Some of your
problems might be over...

---

**mandatory**

New file_system_type method - kill_sb(superblock). If you are converting
an existing filesystem, set it according to ->fs_flags::

	FS_REQUIRES_DEV    -  kill_block_super
	FS_LITTER          -  kill_litter_super
	neither            -  kill_anon_super

FS_LITTER is gone - just remove it from fs_flags.

---

**mandatory**

FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/). Just remove it from fs_flags
(and see the ->get_sb() entry for other actions).

---

**mandatory**

->setattr() is called without BKL now. The caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---

**recommended**

New super_block field ``struct export_operations *s_export_op`` for
explicit support for exporting, e.g. via NFS. The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/exporting.rst.

Briefly, it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh and to provide filesystem-specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

**mandatory**

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs and fat
can be used as examples of very different filesystems.

---

**mandatory**

iget4() and the read_inode2 callback have been superseded by iget5_locked()
which has the following prototype::

	struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				   int (*test)(struct inode *, void *),
				   int (*set)(struct inode *, void *),
				   void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode that the test function needs in order to succeed.
'data' is passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked.
The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked() function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.::

	inode = iget_locked(sb, ino);
	if (!inode)
		return -ENOMEM;
	if (inode->i_state & I_NEW) {
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.

---

**recommended**

->getattr() is finally getting used. See instances in nfs, minix, etc.

---

**mandatory**

->revalidate() is gone. If your filesystem had it - provide ->getattr()
and let it call whatever you had as ->revalidate(), and (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---

**mandatory**

->d_parent changes are not protected by BKL anymore. Read access is safe
if at least one of the following is true:

  * filesystem has no cross-directory rename().
  * we know that parent had been locked (e.g. we are looking at
    ->d_parent of ->lookup() argument).
  * we are called from ->rename().
  * the child's ->d_lock is held.

Audit your code and add locking if needed. Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups. The old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---

**mandatory**

FS_NOMOUNT is gone. If you use it - just set SB_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---

**recommended**

Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed, is_read_only() will die.

---

**mandatory**

->permission() is called without BKL now.
Grab it on entry, drop upon
return - that will guarantee the same locking you used to have. If
your method or its parts do not need BKL - better yet, now you can
shift lock_kernel() and unlock_kernel() so that they would protect
exactly what needs to be protected.

---

**mandatory**

->statfs() is now called without BKL held. BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it. If you don't need it, remove it.

---

**mandatory**

is_read_only() is gone; use bdev_read_only() instead.

---

**mandatory**

destroy_buffers() is gone; use invalidate_bdev().

---

**mandatory**

fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code, fixing will become trivial; until then nothing can be
done.

**mandatory**

Block truncation on error exit from ->write_begin and ->direct_IO has
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at
ext2_write_failed() and its callers for an example.

**mandatory**

->truncate is gone. The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes. Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence
into this order: zeroing blocks using block_truncate_page or similar helpers,
updating the size, and finally the on-disk truncation, which should not fail.
setattr_prepare (which used to be inode_change_ok) now includes the size checks
for ATTR_SIZE and must be called unconditionally at the beginning of ->setattr.

**mandatory**

->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead. It gets called whenever the inode is evicted, whether it has
remaining links or not. The caller does *not* evict the pagecache or
inode-associated metadata buffers; the method has to use
truncate_inode_pages_final() to get rid of those. The caller makes sure async
writeback cannot be running for the inode while (or after) ->evict_inode() is
called.

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped. As before, generic_drop_inode() is still the default and it's been
updated appropriately. generic_delete_inode() is also alive and it consists
simply of ``return 1``.
Note that all actual eviction work is done by the caller after
->drop_inode() returns.

As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()). Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().

NOTE: checking i_nlink at the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you
blindly free the on-disk inode, you may end up doing that while ->write_inode()
is writing to it.

---

**mandatory**

.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes
to 0. Even on a 0 refcount transition, it must be able to tolerate being
called 0, 1, or more times (e.g. constant, idempotent).

---

**mandatory**

.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

.d_hash() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

dcache_lock is gone, replaced by fine-grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---

**mandatory**

Filesystems must RCU-free their inodes if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback. It used to be necessary to clean it there, but not anymore
(starting at 3.2).

---

**recommended**

vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt).
The d_hash and d_compare changes
(above) are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, as described below. Filesystems should take advantage of
this where possible.

---

**mandatory**

d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which used to require dropping out of rcu-walk
mode. It may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU), and
-ECHILD should be returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.rst for more details.

permission is an inode permission check that is called on many or all
directory inodes on the way down a path walk (to check for exec permission). It
must now be rcu-walk aware (mask & MAY_NOT_BLOCK). See
Documentation/filesystems/vfs.rst for more details.

---

**mandatory**

In ->fallocate() you must check the mode argument passed in. If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so i_size should not change when punching a hole, even when punching off the
end of a file.

---

**mandatory**

->get_sb() is gone. Switch to use of ->mount(). Typically it's just
a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
the function type. If you were doing it manually, just switch from setting
->mnt_root to some pointer to returning that pointer. On errors return
ERR_PTR(...).

---

**mandatory**

->permission() and generic_permission() have lost the flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.

generic_permission() has also lost the check_acl argument; ACL checking
has been moved into the VFS, and filesystems need to provide a non-NULL
->i_op->get_acl to read an ACL from disk.

---

**mandatory**

If you implement your own ->llseek(), you must handle SEEK_HOLE and
SEEK_DATA. You can handle this by returning -EINVAL, but it would be nicer to
support it in some way. The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file. So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file. If the offset is i_size or greater, return -ENXIO in either case.

**mandatory**

If you have your own ->fsync(), you must make sure to call
filemap_write_and_wait_range() so that all dirty pages are synced out properly.
You must also keep in mind that ->fsync() is not called with i_mutex held
anymore, so if you require i_mutex locking you must make sure to take it and
release it yourself.

---

**mandatory**

d_alloc_root() is gone, along with a lot of bugs caused by code
misusing it. Replacement: d_make_root(inode). On success d_make_root(inode)
allocates and returns a new dentry instantiated with the passed-in inode.
On failure NULL is returned and the passed-in inode is dropped, so the reference
to the inode is consumed in all cases and failure handling need not do any
cleanup for the inode. If d_make_root(inode) is passed a NULL inode, it returns
NULL and also requires no further error handling. Typical usage is::

	inode = foofs_new_inode(....);
	s->s_root = d_make_root(inode);
	if (!s->s_root)
		/* Nothing needed for the inode cleanup */
		return -ENOMEM;
	...

---

**mandatory**

The witch is dead! Well, 2/3 of it, anyway.
->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.

---

**mandatory**

->create() doesn't take ``struct nameidata *``; unlike the previous
two, it gets an "is it an O_EXCL or equivalent?" boolean argument. Note that
local filesystems can ignore that argument - they are guaranteed that the
object doesn't exist. It's remote/distributed ones that might care...

---

**mandatory**

FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.

---

**mandatory**

vfs_readdir() is gone; switch to iterate_dir() instead.

---

**mandatory**

->readdir() is gone now; switch to ->iterate().

**mandatory**

vfs_follow_link() has been removed. Filesystems must use nd_set_link()
from ->follow_link() for normal symlinks, or nd_jump_link() for magic
/proc/<pid>-style links.

---

**mandatory**

The test() callback of iget5_locked()/ilookup5()/ilookup5_nowait() used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did). inode_hash_lock is still held,
of course, so they are still serialized with respect to removal from the inode
hash, as well as with respect to the set() callback of iget5_locked().

---

**mandatory**

d_materialise_unique() is gone; d_splice_alias() does everything you
need now. Remember that they have opposite orders of arguments ;-/

---

**mandatory**

f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.

---

**mandatory**

Never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.

---

**mandatory**

Do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
instead.

---

**mandatory**

->aio_read/->aio_write are gone.
Use ->read_iter/->write_iter.

---

**recommended**

For embedded ("fast") symlinks just set inode->i_link to wherever the
symlink body is and use simple_follow_link() as ->follow_link().

---

**mandatory**

Calling conventions for ->follow_link() have changed. Instead of returning a
cookie and using nd_set_link() to store the body to traverse, we return
the body to traverse and store the cookie using an explicit ``void **``
argument. nameidata isn't passed at all - nd_jump_link() doesn't need it and
nd_[gs]et_link() is gone.

---

**mandatory**

Calling conventions for ->put_link() have changed. It gets the inode instead of
the dentry, it does not get nameidata at all and it gets called only when the
cookie is non-NULL. Note that the link body isn't available anymore, so if you
need it, store it as the cookie.

---

**mandatory**

Any symlink that might use page_follow_link_light()/page_put_link() must
have inode_nohighmem(inode) called before anything might start playing with
its pagecache. No highmem pages should end up in the pagecache of such
symlinks. That includes any preseeding that might be done during symlink
creation.
__page_symlink() will honour the mapping gfp flags, so once
you've done inode_nohighmem() it's safe to use, but if you allocate and
insert the page manually, make sure to use the right gfp flags.

---

**mandatory**

->follow_link() is replaced with ->get_link(); same API, except that

	* ->get_link() gets the inode as a separate argument
	* ->get_link() may be called in RCU mode - in that case a NULL
	  dentry is passed

---

**mandatory**

->get_link() gets struct delayed_call ``*done`` now, and should do
set_delayed_call() where it used to set ``*cookie``.

->put_link() is gone - just give the destructor to set_delayed_call()
in ->get_link().

---

**mandatory**

->getxattr() and xattr_handler.get() get dentry and inode passed separately.
The dentry might not be attached to the inode yet, so do _not_ use its
->d_inode in the instances. Rationale: !@#!@# security_d_instantiate() needs
to be called before we attach dentry to inode.

---

**mandatory**

symlinks are no longer the only inodes that do *not* have the
i_bdev/i_cdev/i_pipe/i_link union zeroed out at inode eviction.
As a result, you can't
assume that a non-NULL value in ->i_link at ->destroy_inode() implies that
it's a symlink. Checking ->i_mode is really needed now. In-tree we had
to fix shmem_destroy_callback() that used to take that kind of shortcut;
watch out, since that shortcut is no longer valid.

---

**mandatory**

->i_mutex is replaced with ->i_rwsem now. inode_lock() et al. work as
they used to - they just take it exclusive. However, ->lookup() may be
called with the parent locked shared. Its instances must not

	* use d_instantiate() and d_rehash() separately - use d_add() or
	  d_splice_alias() instead.
	* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
	* assume exclusion between their (read-only) accesses to filesystem
	  data structures; in the unlikely case such access needs exclusion
	  for some reason, arrange it yourself. None of the in-tree
	  filesystems needed that.
	* rely on ->d_parent and ->d_name not changing after the dentry has
	  been fed to d_add() or d_splice_alias(). Again, none of the
	  in-tree instances relied upon that.

We are guaranteed that lookups of the same name in the same directory
will not happen in parallel ("same" in the sense of your ->d_compare()).
Lookups on different names in the same directory can and do happen in
parallel now.
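
A toy model of the guarantees above, in plain userspace C. It assumes nothing about how the VFS actually implements them (roughly, ->i_rwsem plus the in-lookup dentries that d_alloc_parallel() deals with); struct model_dir and the model_*() helpers are hypothetical, a refused request simply returns false where the kernel would wait, and strcmp() stands in for ->d_compare()::

	#include <assert.h>
	#include <stdbool.h>
	#include <string.h>

	#define MODEL_MAX_INFLIGHT 8

	/*
	 * Hypothetical bookkeeping for one directory: how many lookups hold it
	 * shared, whether a modifier holds it exclusive, and which names are
	 * being looked up right now. Nothing here actually sleeps or locks.
	 */
	struct model_dir {
		int shared_holders;
		bool exclusive;
		const char *inflight[MODEL_MAX_INFLIGHT];
	};

	/* Exclusive hold, as inode_lock() takes it: nobody else may be inside. */
	static bool model_lock_exclusive(struct model_dir *d)
	{
		if (d->exclusive || d->shared_holders > 0)
			return false;	/* the kernel would wait here */
		d->exclusive = true;
		return true;
	}

	static void model_unlock_exclusive(struct model_dir *d)
	{
		assert(d->exclusive);
		d->exclusive = false;
	}

	/*
	 * Start a lookup of @name with the parent held shared. Refused while a
	 * modifier holds the directory exclusive, or while a lookup of the same
	 * name is still in flight.
	 */
	static bool model_begin_lookup(struct model_dir *d, const char *name)
	{
		int free_slot = -1;

		if (d->exclusive)
			return false;
		for (int i = 0; i < MODEL_MAX_INFLIGHT; i++) {
			if (d->inflight[i] == NULL) {
				if (free_slot < 0)
					free_slot = i;
			} else if (strcmp(d->inflight[i], name) == 0) {
				return false;	/* same name: wait for the first one */
			}
		}
		if (free_slot < 0)
			return false;	/* model table full */
		d->inflight[free_slot] = name;
		d->shared_holders++;
		return true;
	}

	static void model_end_lookup(struct model_dir *d, const char *name)
	{
		for (int i = 0; i < MODEL_MAX_INFLIGHT; i++) {
			if (d->inflight[i] != NULL && strcmp(d->inflight[i], name) == 0) {
				d->inflight[i] = NULL;
				d->shared_holders--;
				return;
			}
		}
		assert(!"model_end_lookup: no such lookup in flight");
	}

Lookups of two different names can be in flight at once, a second lookup of the same name is refused until the first one ends, and an exclusive holder (the inode_lock() case) only gets in once no lookups remain.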

---

**recommended**

->iterate_shared() is added; it's a parallel variant of ->iterate().
Exclusion at the struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.

Often enough ->iterate() can serve as ->iterate_shared() without any
changes - it is a read-only operation, after all. If you have any
per-inode or per-dentry in-core data structures modified by ->iterate(),
you might need something to serialize the access to them. If you
do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
that; look for in-tree examples.

The old method is only used if the new one is absent; eventually it will
be removed. Switch while you still can; the old one won't stay.

---

**mandatory**

->atomic_open() calls without O_CREAT may happen in parallel.

---

**mandatory**

->setxattr() and xattr_handler.set() get dentry and inode passed separately.
The dentry might not be attached to the inode yet, so do _not_ use its
->d_inode in the instances.
Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.

---

**mandatory**

->d_compare() doesn't get parent as a separate argument anymore. If you
used it for finding the struct super_block involved, dentry->d_sb will
work just as well; if it's something more complicated, use dentry->d_parent.
Just be careful not to assume that fetching it more than once will yield
the same value - in RCU mode it could change under you.

---

**mandatory**

->rename() has an added flags argument. Any flags not handled by the
filesystem should result in -EINVAL being returned.

---

**recommended**

->readlink is optional for symlinks. Don't set it unless the filesystem
needs to fake something for readlink(2).

---

**mandatory**

->getattr() is now passed a struct path rather than a vfsmount and
dentry separately, and it now has request_mask and query_flags arguments
to specify the fields and sync type requested by statx. Filesystems not
supporting any statx-specific features may ignore the new arguments.
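
Back to the ->rename() flags entry above: a sketch of the check for a hypothetical filesystem that implements RENAME_NOREPLACE and nothing else. foo_rename_check_flags() and FOO_RENAME_FLAGS_SUPPORTED are made-up names; the RENAME_* values are copied from the uapi <linux/fs.h> only so the snippet builds on its own. Something of this shape would go at the very top of ->rename(), before anything is locked or modified::

	#include <assert.h>
	#include <errno.h>

	/*
	 * Values as in the uapi <linux/fs.h>; repeated here only so the sketch
	 * builds without kernel headers.
	 */
	#ifndef RENAME_NOREPLACE
	#define RENAME_NOREPLACE	(1 << 0)
	#endif
	#ifndef RENAME_EXCHANGE
	#define RENAME_EXCHANGE		(1 << 1)
	#endif
	#ifndef RENAME_WHITEOUT
	#define RENAME_WHITEOUT		(1 << 2)
	#endif

	/* Hypothetical filesystem: honours RENAME_NOREPLACE, nothing else. */
	#define FOO_RENAME_FLAGS_SUPPORTED	RENAME_NOREPLACE

	/*
	 * Hypothetical helper for the top of foo_rename(): reject any flag bit
	 * the filesystem does not implement, before touching anything.
	 */
	static int foo_rename_check_flags(unsigned int flags)
	{
		if (flags & ~(unsigned int)FOO_RENAME_FLAGS_SUPPORTED)
			return -EINVAL;
		return 0;
	}

Any bit outside the supported set, RENAME_EXCHANGE and RENAME_WHITEOUT included, gets -EINVAL instead of being silently ignored.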

---

**mandatory**

->atomic_open() calling conventions have changed. Gone is ``int *opened``,
along with FILE_OPENED/FILE_CREATED. In place of those we have
FMODE_OPENED/FMODE_CREATED, set in file->f_mode. Additionally, the return
value for the 'called finish_no_open(), open it yourself' case has become
0, not 1. Since finish_no_open() itself is returning 0 now, that part
does not need any changes in ->atomic_open() instances.

---

**mandatory**

alloc_file() has become static now; two wrappers are to be used instead.
alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
when a dentry needs to be created; that's the majority of old alloc_file()
users. Calling conventions: on success a reference to the new struct file
is returned and the caller's reference to the inode is subsumed by that.
On failure, ERR_PTR() is returned and no caller's references are affected,
so the caller needs to drop the inode reference it held.
alloc_file_clone(file, flags, ops) does not affect any caller's references.
On success you get a new struct file sharing the mount/dentry with the
original, on failure - ERR_PTR().

---

**mandatory**

->clone_file_range() and ->dedupe_file_range() have been replaced with
->remap_file_range(). See Documentation/filesystems/vfs.rst for more
information.
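
The alloc_file_pseudo() calling conventions above matter mostly on the error path. A userspace model of them, where model_alloc_file_pseudo(), model_fput(), model_iput(), model_open_pseudo() and the model_err_ptr()/model_is_err()/model_ptr_err() helpers are hypothetical stand-ins (the last three imitate ERR_PTR()/IS_ERR()/PTR_ERR() from include/linux/err.h) and @fail just forces the failure path. What it shows is the caller: after success the inode reference belongs to the file, after failure it is still the caller's to drop::

	#include <assert.h>
	#include <errno.h>
	#include <stdint.h>
	#include <stdlib.h>

	/* Userspace imitation of ERR_PTR()/IS_ERR()/PTR_ERR(). */
	#define MODEL_MAX_ERRNO 4095

	static void *model_err_ptr(long error)
	{
		return (void *)error;
	}

	static long model_ptr_err(const void *ptr)
	{
		return (long)ptr;
	}

	static int model_is_err(const void *ptr)
	{
		return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
	}

	/* Hypothetical refcounted inode and file. */
	struct model_inode {
		int refcount;
	};

	struct model_file {
		struct model_inode *inode;	/* owns one inode reference */
	};

	static void model_iput(struct model_inode *inode)
	{
		assert(inode->refcount > 0);
		inode->refcount--;
	}

	/*
	 * Follows the documented alloc_file_pseudo() contract: on success the
	 * caller's inode reference moves into the file; on failure ERR_PTR() is
	 * returned and the reference stays with the caller.
	 */
	static struct model_file *model_alloc_file_pseudo(struct model_inode *inode,
							  int fail)
	{
		struct model_file *file;

		if (fail)
			return model_err_ptr(-ENOMEM);
		file = malloc(sizeof(*file));
		if (!file)
			return model_err_ptr(-ENOMEM);
		file->inode = inode;	/* reference transferred, not taken */
		return file;
	}

	/* Dropping the file drops the inode reference it subsumed. */
	static void model_fput(struct model_file *file)
	{
		model_iput(file->inode);
		free(file);
	}

	/* Caller side: grab a reference, give it back only on failure. */
	static struct model_file *model_open_pseudo(struct model_inode *inode, int fail)
	{
		struct model_file *file;

		inode->refcount++;
		file = model_alloc_file_pseudo(inode, fail);
		if (model_is_err(file))
			model_iput(inode);
		return file;
	}

After a successful open the refcount is 1 and is dropped by model_fput(); after a failed one it is already back to 0, with no file to fput().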

---

**recommended**

->lookup() instances doing an equivalent of::

	if (IS_ERR(inode))
		return ERR_CAST(inode);
	return d_splice_alias(inode, dentry);

don't need to bother with the check - d_splice_alias() will do the
right thing when given ERR_PTR(...) as inode. Moreover, passing a NULL
inode to d_splice_alias() will also do the right thing (equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special case
doesn't need separate treatment either.

---

**strongly recommended**

take the RCU-delayed parts of ->destroy_inode() into a new method -
->free_inode(). If ->destroy_inode() becomes empty - all the better,
just get rid of it. Synchronous work (e.g. the stuff that can't
be done from an RCU callback, or any WARN_ON() where we want the
stack trace) *might* be movable to ->evict_inode(); however,
that goes only for the things that are not needed to balance something
done by ->alloc_inode(). IOW, if it's cleaning up the stuff that
might have accumulated over the life of the in-core inode, ->evict_inode()
might be a fit.
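
A userspace model of how the two methods get driven, loosely following the rules listed next and the dispatch done by destroy_inode() in fs/inode.c. All model_*() and foo_*() names are hypothetical; model_call_rcu() only queues the callback until model_grace_period() runs it, and the callbacks set flags instead of freeing memory. ->destroy_inode() runs right away, while ->free_inode() (or the free_inode_nonrcu() default) only runs after the simulated grace period::

	#include <assert.h>
	#include <stddef.h>

	struct model_inode;
	typedef void (*model_inode_fn)(struct model_inode *);

	/* Hypothetical subset of super_operations used at inode teardown. */
	struct model_super_ops {
		model_inode_fn destroy_inode;	/* runs synchronously */
		model_inode_fn free_inode;	/* runs after an RCU grace period */
	};

	struct model_inode {
		const struct model_super_ops *ops;
		int destroyed;		/* set by ->destroy_inode() */
		int freed;		/* set instead of actually freeing memory */
	};

	/*
	 * Stand-in for call_rcu(): callbacks are only queued here and run when
	 * model_grace_period() is called, i.e. after the caller has moved on.
	 */
	#define MODEL_RCU_SLOTS 16
	static struct {
		model_inode_fn fn;
		struct model_inode *inode;
	} model_rcu_queue[MODEL_RCU_SLOTS];
	static int model_rcu_pending;

	static void model_call_rcu(struct model_inode *inode, model_inode_fn fn)
	{
		assert(model_rcu_pending < MODEL_RCU_SLOTS);
		model_rcu_queue[model_rcu_pending].fn = fn;
		model_rcu_queue[model_rcu_pending].inode = inode;
		model_rcu_pending++;
	}

	static void model_grace_period(void)
	{
		for (int i = 0; i < model_rcu_pending; i++)
			model_rcu_queue[i].fn(model_rcu_queue[i].inode);
		model_rcu_pending = 0;
	}

	/* Default when neither method is set (the kernel's is free_inode_nonrcu()). */
	static void model_free_inode_nonrcu(struct model_inode *inode)
	{
		inode->freed = 1;
	}

	/* Teardown dispatch: ->destroy_inode() now, ->free_inode() via "RCU". */
	static void model_destroy_inode(struct model_inode *inode)
	{
		const struct model_super_ops *ops = inode->ops;

		if (ops->destroy_inode) {
			ops->destroy_inode(inode);
			if (!ops->free_inode)
				return;	/* old-style: ->destroy_inode() frees it */
		}
		model_call_rcu(inode, ops->free_inode ? ops->free_inode
						      : model_free_inode_nonrcu);
	}

	/* A hypothetical filesystem after the split. */
	static void foo_destroy_inode(struct model_inode *inode)
	{
		inode->destroyed = 1;	/* synchronous work, e.g. WARN_ON()s */
	}

	static void foo_free_inode(struct model_inode *inode)
	{
		inode->freed = 1;	/* e.g. freeing FOO_I(inode) */
	}

With only ->destroy_inode() set, the model does nothing further, where a real old-style ->destroy_inode() would have arranged its own call_rcu().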

Rules for inode destruction:

	* if ->destroy_inode() is non-NULL, it gets called
	* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
	* a combination of NULL ->destroy_inode and NULL ->free_inode is
	  treated as if ->destroy_inode were NULL and ->free_inode were
	  free_inode_nonrcu, to preserve compatibility.

Note that the callback (be it via ->free_inode() or an explicit call_rcu()
in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
as a matter of fact, the superblock and all associated structures
might already be gone. The filesystem driver is guaranteed to be still
there, but that's it. Freeing memory in the callback is fine; doing
more than that is possible, but requires a lot of care and is best
avoided.

---

**mandatory**

DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
default. DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
business doing so.

---

**mandatory**

d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules). Such uses are very likely to
be misspelled d_alloc_anon().

---

**mandatory**

[should've been added in 2016] stale comment in finish_open() notwithstanding,
failure exits in ->atomic_open() instances should *NOT* fput() the file,
no matter what. Everything is handled by the caller.

---

**mandatory**

clone_private_mount() returns a longterm mount now, so the proper destructor of
its result is kern_unmount() or kern_unmount_array().