.. SPDX-License-Identifier: GPL-2.0

======
NILFS2
======

NILFS2 is a log-structured file system (LFS) supporting continuous
snapshotting.  In addition to the versioning capability of the entire
file system, users can even restore files mistakenly overwritten or
destroyed just a few seconds ago.  Since NILFS2 can keep consistency
like a conventional LFS, it achieves quick recovery after system
crashes.

NILFS2 creates a number of checkpoints every few seconds or on a per
synchronous write basis (unless there is no change).  Users can select
significant versions among continuously created checkpoints, and can
change them into snapshots which will be preserved until they are
changed back to checkpoints.

There is no limit on the number of snapshots until the volume gets
full.  Each snapshot is mountable as a read-only file system
concurrently with its writable mount, and this feature is convenient
for online backup.

The userland tools are included in the nilfs-utils package, which is
available from the following download page.  At least "mkfs.nilfs2",
"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (the so-called
cleaner or garbage collector) are required.  Details on the tools are
described in the man pages included in the package.

:Project web page: https://nilfs.sourceforge.io/
:Download page:    https://nilfs.sourceforge.io/en/download.html
:List info:        http://vger.kernel.org/vger-lists.html#linux-nilfs

Caveats
=======

Features which NILFS2 does not support yet:

	- atime
	- extended attributes
	- POSIX ACLs
	- quotas
	- fsck
	- defragmentation

Mount options
=============

NILFS2 supports the following mount options:
(*) == default

======================= =======================================================
barrier(*)              This enables/disables the use of write barriers.  This
nobarrier               requires an IO stack which can support barriers, and
                        if nilfs gets an error on a barrier write, it will
                        disable barriers again with a warning.
errors=continue         Keep going on a filesystem error.
errors=remount-ro(*)    Remount the filesystem read-only on an error.
errors=panic            Panic and halt the machine if an error occurs.
cp=n                    Specify the checkpoint number of the snapshot to be
                        mounted.  Checkpoints and snapshots are listed by the
                        lscp user command.  Only checkpoints marked as
                        snapshots are mountable with this option.  A snapshot
                        is read-only, so a read-only mount option must be
                        specified together with it.
order=relaxed(*)        Apply relaxed order semantics that allow modified data
                        blocks to be written to disk without making a
                        checkpoint if no metadata update is in progress.  This
                        mode is equivalent to the ordered data mode of the ext3
                        filesystem, except that updates on data blocks still
                        preserve atomicity.  This improves synchronous write
                        performance for overwriting.
order=strict            Apply strict in-order semantics that preserve the
                        sequence of all file operations, including overwriting
                        of data blocks.  This guarantees that no overtaking of
                        events occurs in the recovered file system after a
                        crash.
norecovery              Disable recovery of the filesystem on mount.
                        This disables every write access on the device for
                        read-only mounts or snapshots.  This option will fail
                        for r/w mounts on an unclean volume.
discard                 This enables/disables the use of discard/TRIM commands.
nodiscard(*)            The discard/TRIM commands are sent to the underlying
                        block device when blocks are freed.  This is useful
                        for SSD devices and sparse/thinly-provisioned LUNs.
======================= =======================================================

Ioctls
======

There is some NILFS2 specific functionality which can be accessed by
applications through the system call interfaces.  The list of all NILFS2
specific ioctls is shown in the table below.

Table of NILFS2 specific ioctls:

 ============================== ===============================================
 Ioctl                          Description
 ============================== ===============================================
 NILFS_IOCTL_CHANGE_CPMODE      Change mode of given checkpoint between
                                checkpoint and snapshot state. This ioctl is
                                used in chcp and mkcp utilities.

 NILFS_IOCTL_DELETE_CHECKPOINT  Remove checkpoint from NILFS2 file system.
                                This ioctl is used in rmcp utility.

 NILFS_IOCTL_GET_CPINFO         Return info about requested checkpoints. This
                                ioctl is used in lscp utility and by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_CPSTAT         Return checkpoints statistics. This ioctl is
                                used by lscp, rmcp utilities and by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_SUINFO         Return segment usage info about requested
                                segments. This ioctl is used in lssu,
                                nilfs_resize utilities and by nilfs_cleanerd
                                daemon.

 NILFS_IOCTL_SET_SUINFO         Modify segment usage info of requested
                                segments. This ioctl is used by
                                nilfs_cleanerd daemon to skip unnecessary
                                cleaning operation of segments and reduce
                                performance penalty or wear of flash device
                                due to redundant move of in-use blocks.

 NILFS_IOCTL_GET_SUSTAT         Return segment usage statistics. This ioctl
                                is used in lssu, nilfs_resize utilities and
                                by nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_VINFO          Return information on virtual block addresses.
                                This ioctl is used by nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_BDESCS         Return information about descriptors of disk
                                block numbers. This ioctl is used by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_CLEAN_SEGMENTS     Do garbage collection operation in the
                                environment of requested parameters from
                                userspace. This ioctl is used by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_SYNC               Make a checkpoint. This ioctl is used in
                                mkcp utility.

 NILFS_IOCTL_RESIZE             Resize NILFS2 volume. This ioctl is used
                                by nilfs_resize utility.

 NILFS_IOCTL_SET_ALLOC_RANGE    Define lower limit of segments in bytes and
                                upper limit of segments in bytes. This ioctl
                                is used by nilfs_resize utility.
 ============================== ===============================================

NILFS2 usage
============

To use nilfs2 as a local file system, simply::

 # mkfs -t nilfs2 /dev/block_device
 # mount -t nilfs2 /dev/block_device /dir

This will also invoke the cleaner through the mount helper program
(mount.nilfs2).

Checkpoints and snapshots are managed by the following commands.
Their manpages are included in the nilfs-utils package above.

  ====     ===========================================================
  lscp     list checkpoints or snapshots.
  mkcp     make a checkpoint or a snapshot.
  chcp     change an existing checkpoint to a snapshot or vice versa.
  rmcp     invalidate specified checkpoint(s).
  ====     ===========================================================

To mount a snapshot::

 # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir

where <cno> is the checkpoint number of the snapshot.

To unmount the NILFS2 mount point or snapshot, simply::

 # umount /dir

Then, the cleaner daemon is automatically shut down by the umount
helper program (umount.nilfs2).

Disk format
===========

A nilfs2 volume is equally divided into a number of segments except
for the super block (SB) and segment #0.  A segment is the container
of logs.  Each log is composed of summary information blocks, payload
blocks, and an optional super root block (SR)::

   ______________________________________________________
  | |SB| | Segment | Segment | Segment | ... | Segment | |
  |_|__|_|____0____|____1____|____2____|_____|____N____|_|
  0 +1K +4K       +8M       +16M      +24M  +(8MB x N)
       .             .            (Typical offsets for 4KB-block)
    .                  .
  .______________________.
  | log | log |... | log |
  |__1__|__2__|____|__m__|
  .       .
  .               .
  .                       .
  .______________________________.
  | Summary | Payload blocks  |SR|
  |_blocks__|_________________|__|

The payload blocks are organized per file, and each file consists of
data blocks and B-tree node blocks::

  |<---       File-A        --->|<---       File-B        --->|
 _______________________________________________________________
  | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
 _|_____________|_______________|_____________|_______________|_


Since only the modified blocks are written in the log, it may have
files without data blocks or B-tree node blocks.

The organization of the blocks is recorded in the summary information
blocks, which contain a header structure (nilfs_segment_summary), per
file structures (nilfs_finfo), and per block structures (nilfs_binfo)::

  _________________________________________________________________________
 | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
 |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___


The logs include regular files, directory files, symbolic link files
and several metadata files.  The metadata files are the files used
to maintain file system metadata.
The current version of NILFS2 uses
the following metadata files::

 1) Inode file (ifile)            -- Stores on-disk inodes
 2) Checkpoint file (cpfile)      -- Stores checkpoints
 3) Segment usage file (sufile)   -- Stores allocation state of segments
 4) Data address translation file -- Maps virtual block numbers to usual
    (DAT)                            block numbers.  This file serves to
                                     make on-disk blocks relocatable.

The following figure shows a typical organization of the logs::

  _________________________________________________________________________
 | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR|
 |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|


To stride over segment boundaries, this sequence of files may be split
into multiple logs.  The sequence of logs that should be treated as
logically one log is delimited with flags marked in the segment
summary.  The recovery code of nilfs2 looks at this boundary
information to ensure atomicity of updates.

The super root block is inserted for every checkpoint.  It includes
three special inodes: the inodes for the DAT, cpfile, and sufile.
Inodes of regular files, directories, symlinks and other special files
are included in the ifile.  The inode of the ifile itself is included
in the corresponding checkpoint entry in the cpfile.
Thus, the hierarchy
among NILFS2 files can be depicted as follows::

  Super block (SB)
         |
         v
  Super root block (the latest cno=xx)
  |-- DAT
  |-- sufile
  `-- cpfile
      |-- ifile (cno=c1)
      |-- ifile (cno=c2) ---- file (ino=i1)
      :        :          |-- file (ino=i2)
      `-- ifile (cno=xx)  |-- file (ino=i3)
                          :        :
                          `-- file (ino=yy)
                            ( regular file, directory, or symlink )

For details on the format of each file, please see nilfs2_ondisk.h
located in the include/uapi/linux directory.

There are no patents or other intellectual property that we protect
with regard to the design of NILFS2.  It is allowed to replicate the
design in hopes that other operating systems could share (mount, read,
write, etc.) data stored in this format.
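
As an illustration of the segment layout described under "Disk format",
the following minimal Python sketch computes the byte range of a segment
using only the typical offsets shown in the first figure (4 KB blocks,
8 MB segments, segment #0 starting at +4K).  The constants and the
function names are assumptions made for this example; they are not part
of the kernel or of nilfs-utils, and a real volume records its actual
geometry in the super block.

```python
"""Sketch: byte ranges of segments for the typical NILFS2 layout."""

# Assumed values, taken from the "typical offsets for 4KB-block" figure.
BLOCK_SIZE = 4 * 1024              # 4 KB blocks
SEGMENT_SIZE = 8 * 1024 * 1024     # 8 MB per segment
FIRST_SEGMENT_START = 4 * 1024     # segment #0 starts at +4K, after the SB


def segment_byte_range(segnum):
    """Return (start, end) byte offsets of a segment; end is exclusive."""
    if segnum < 0:
        raise ValueError("segment number must be non-negative")
    # Segment #0 is shorter because the super block precedes it.
    start = FIRST_SEGMENT_START if segnum == 0 else segnum * SEGMENT_SIZE
    end = (segnum + 1) * SEGMENT_SIZE
    return start, end


def blocks_in_segment(segnum):
    """Return the number of whole blocks that fit in the segment."""
    start, end = segment_byte_range(segnum)
    return (end - start) // BLOCK_SIZE


if __name__ == "__main__":
    for n in range(3):
        start, end = segment_byte_range(n)
        print(f"segment {n}: bytes [{start}, {end}), "
              f"{blocks_in_segment(n)} blocks")
```

Under these assumed values, segment #0 holds one block fewer than the
other segments, which matches the statement that the volume is equally
divided except for the super block and segment #0.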