.. SPDX-License-Identifier: GPL-2.0

======
NILFS2
======

NILFS2 is a log-structured file system (LFS) supporting continuous
snapshotting.  In addition to the versioning capability of the entire
file system, users can even restore files mistakenly overwritten or
destroyed just a few seconds ago.  Since NILFS2 can keep consistency
like a conventional LFS, it achieves quick recovery after system
crashes.

NILFS2 creates a number of checkpoints every few seconds or on a per
synchronous write basis (unless there is no change).  Users can select
significant versions among the continuously created checkpoints, and
can change them into snapshots which will be preserved until they are
changed back to checkpoints.

There is no limit on the number of snapshots until the volume gets
full.  Each snapshot is mountable as a read-only file system
concurrently with its writable mount, and this feature is convenient
for online backup.

The userland tools are included in the nilfs-utils package, which is
available from the following download page.  At least "mkfs.nilfs2",
"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (the so-called
cleaner or garbage collector) are required.  Details on the tools are
described in the man pages included in the package.

:Project web page:    https://nilfs.sourceforge.io/
:Download page:       https://nilfs.sourceforge.io/en/download.html
:List info:           http://vger.kernel.org/vger-lists.html#linux-nilfs

Caveats
=======

Features which NILFS2 does not support yet:

	- atime
	- extended attributes
	- POSIX ACLs
	- quotas
	- fsck
	- defragmentation

Mount options
=============

NILFS2 supports the following mount options:
(*) == default

======================= =======================================================
barrier(*)              This enables/disables the use of write barriers.  This
nobarrier               requires an IO stack which can support barriers, and
                        if nilfs gets an error on a barrier write, it will
                        disable barriers again with a warning.
errors=continue         Keep going on a filesystem error.
errors=remount-ro(*)    Remount the filesystem read-only on an error.
errors=panic            Panic and halt the machine if an error occurs.
cp=n                    Specify the checkpoint number of the snapshot to be
                        mounted.  Checkpoints and snapshots are listed by the
                        lscp user command.  Only the checkpoints marked as
                        snapshots are mountable with this option.  A snapshot
                        is read-only, so a read-only mount option must be
                        specified together with it.
order=relaxed(*)        Apply relaxed order semantics that allow modified data
                        blocks to be written to disk without making a
                        checkpoint if no metadata update is in progress.  This
                        mode is equivalent to the ordered data mode of the
                        ext3 filesystem except that updates on data blocks
                        still preserve atomicity.  This will improve
                        synchronous write performance for overwriting.
order=strict            Apply strict in-order semantics that preserve the
                        sequence of all file operations including overwriting
                        of data blocks.  That means it is guaranteed that no
                        overtaking of events occurs in the recovered file
                        system after a crash.
norecovery              Disable recovery of the filesystem on mount.
                        This disables every write access on the device for
                        read-only mounts or snapshots.  This option will fail
                        for r/w mounts on an unclean volume.
discard                 This enables/disables the use of discard/TRIM commands.
nodiscard(*)            The discard/TRIM commands are sent to the underlying
                        block device when blocks are freed.  This is useful
                        for SSD devices and sparse/thinly-provisioned LUNs.
======================= =======================================================
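
As a rough illustration of how the options in the table combine, the
following Python sketch assembles an argument list for mount(8).  The
function name ``nilfs2_mount_argv`` and its keyword parameters are
hypothetical helpers invented for this example; only the option strings
themselves come from the table.  The sketch enforces the one rule the
table states explicitly (``cp=n`` needs a read-only mount) and does not
run any command.

```python
import shlex

# Values taken from the mount option table; everything else is illustrative.
VALID_ERRORS = {"continue", "remount-ro", "panic"}
VALID_ORDER = {"relaxed", "strict"}


def nilfs2_mount_argv(device, mountpoint, *, cp=None, read_only=False,
                      order=None, errors=None, barrier=None, discard=None,
                      norecovery=False):
    """Return an argv list for mount(8); nothing is executed here."""
    if cp is not None and not read_only:
        # A snapshot is read-only, so cp=n must come with a read-only mount.
        raise ValueError("cp=n requires a read-only mount")
    if order is not None and order not in VALID_ORDER:
        raise ValueError(f"unknown order mode: {order!r}")
    if errors is not None and errors not in VALID_ERRORS:
        raise ValueError(f"unknown errors mode: {errors!r}")

    opts = []
    if cp is not None:
        opts.append(f"cp={int(cp)}")
    if order is not None:
        opts.append(f"order={order}")
    if errors is not None:
        opts.append(f"errors={errors}")
    if barrier is not None:
        opts.append("barrier" if barrier else "nobarrier")
    if discard is not None:
        opts.append("discard" if discard else "nodiscard")
    if norecovery:
        opts.append("norecovery")

    argv = ["mount", "-t", "nilfs2"]
    if read_only:
        argv.append("-r")
    if opts:
        argv += ["-o", ",".join(opts)]
    return argv + [device, mountpoint]


# Print the command line for a snapshot mount of checkpoint 5.
print(shlex.join(nilfs2_mount_argv("/dev/block_device", "/snap_dir",
                                   cp=5, read_only=True)))
```

Options left as ``None`` are simply omitted, so the defaults marked with
(*) in the table apply.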

Ioctls
======

There is some NILFS2-specific functionality which can be accessed by
applications through the system call interfaces.  The list of all
NILFS2-specific ioctls is shown in the table below.

Table of NILFS2-specific ioctls:

 ============================== ===============================================
 Ioctl                          Description
 ============================== ===============================================
 NILFS_IOCTL_CHANGE_CPMODE      Change mode of given checkpoint between
                                checkpoint and snapshot state. This ioctl is
                                used in chcp and mkcp utilities.

 NILFS_IOCTL_DELETE_CHECKPOINT  Remove checkpoint from NILFS2 file system.
                                This ioctl is used in rmcp utility.

 NILFS_IOCTL_GET_CPINFO         Return info about requested checkpoints. This
                                ioctl is used in lscp utility and by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_CPSTAT         Return checkpoint statistics. This ioctl is
                                used by lscp, rmcp utilities and by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_SUINFO         Return segment usage info about requested
                                segments. This ioctl is used in lssu,
                                nilfs_resize utilities and by nilfs_cleanerd
                                daemon.

 NILFS_IOCTL_SET_SUINFO         Modify segment usage info of requested
                                segments. This ioctl is used by
                                nilfs_cleanerd daemon to skip unnecessary
                                cleaning operation of segments and reduce
                                performance penalty or wear of flash device
                                due to redundant move of in-use blocks.

 NILFS_IOCTL_GET_SUSTAT         Return segment usage statistics. This ioctl
                                is used in lssu, nilfs_resize utilities and
                                by nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_VINFO          Return information on virtual block addresses.
                                This ioctl is used by nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_BDESCS         Return information about descriptors of disk
                                block numbers. This ioctl is used by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_CLEAN_SEGMENTS     Do garbage collection operation in the
                                environment of requested parameters from
                                userspace. This ioctl is used by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_SYNC               Make a checkpoint. This ioctl is used in
                                mkcp utility.

 NILFS_IOCTL_RESIZE             Resize NILFS2 volume. This ioctl is used
                                by nilfs_resize utility.

 NILFS_IOCTL_SET_ALLOC_RANGE    Define lower limit of segments in bytes and
                                upper limit of segments in bytes. This ioctl
                                is used by nilfs_resize utility.
 ============================== ===============================================

NILFS2 usage
============

To use nilfs2 as a local file system, simply::

 # mkfs -t nilfs2 /dev/block_device
 # mount -t nilfs2 /dev/block_device /dir

This will also invoke the cleaner through the mount helper program
(mount.nilfs2).

Checkpoints and snapshots are managed by the following commands.
Their man pages are included in the nilfs-utils package above.

  ====     ===========================================================
  lscp     list checkpoints or snapshots.
  mkcp     make a checkpoint or a snapshot.
  chcp     change an existing checkpoint to a snapshot or vice versa.
  rmcp     invalidate specified checkpoint(s).
  ====     ===========================================================

To mount a snapshot::

 # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir

where <cno> is the checkpoint number of the snapshot.

To unmount the NILFS2 mount point or snapshot, simply::

 # umount /dir

Then, the cleaner daemon is automatically shut down by the umount
helper program (umount.nilfs2).

Disk format
===========

A nilfs2 volume is equally divided into a number of segments except
for the super block (SB) and segment #0.  A segment is the container
of logs.  Each log is composed of summary information blocks, payload
blocks, and an optional super root block (SR)::

   ______________________________________________________
  | |SB| | Segment | Segment | Segment | ... | Segment | |
  |_|__|_|____0____|____1____|____2____|_____|____N____|_|
  0 +1K +4K       +8M       +16M      +24M  +(8MB x N)
       .             .            (Typical offsets for 4KB-block)
    .                  .
  .______________________.
  | log | log |... | log |
  |__1__|__2__|____|__m__|
        .       .
      .               .
    .                       .
  .______________________________.
  | Summary | Payload blocks  |SR|
  |_blocks__|_________________|__|
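
The typical offsets in the figure can be reproduced with a little
arithmetic.  The sketch below is only an illustration: the function
name ``segment_byte_range`` is made up, and the defaults (4KB blocks,
2048 blocks per segment, segment #0 starting at the second block) are
assumptions chosen to match the "+4K", "+8M", "+16M" labels above.  A
real volume records its own geometry in the super block.

```python
def segment_byte_range(segnum, block_size=4096, blocks_per_segment=2048,
                       first_data_block=1):
    """Return (start, end) byte offsets of a segment; end is exclusive.

    Assumed geometry: every segment spans block_size * blocks_per_segment
    bytes, except segment #0, which starts later to leave room for the
    super block at the head of the volume.
    """
    if segnum < 0:
        raise ValueError("segment number must be non-negative")
    segment_bytes = block_size * blocks_per_segment
    if segnum == 0:
        start = first_data_block * block_size   # "+4K" in the figure
    else:
        start = segnum * segment_bytes          # "+(8MB x N)"
    end = (segnum + 1) * segment_bytes
    return start, end


# Print the ranges of the first three segments in bytes.
for n in range(3):
    print(n, segment_byte_range(n))
```

With these defaults segment #0 is slightly shorter than the others,
which is why the text above treats it as an exception.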

The payload blocks are organized per file, and each file consists of
data blocks and B-tree node blocks::

    |<---       File-A        --->|<---       File-B        --->|
   _______________________________________________________________
    | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
   _|_____________|_______________|_____________|_______________|_


Since only the modified blocks are written in the log, a log may
contain files without data blocks or without B-tree node blocks.

The organization of the blocks is recorded in the summary information
blocks, which contain a header structure (nilfs_segment_summary),
per-file structures (nilfs_finfo), and per-block structures
(nilfs_binfo)::

  _________________________________________________________________________
 | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
 |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___


The logs include regular files, directory files, symbolic link files
and several meta data files.  The meta data files are the files used
to maintain file system meta data.  The current version of NILFS2 uses
the following meta data files::

 1) Inode file (ifile)             -- Stores on-disk inodes
 2) Checkpoint file (cpfile)       -- Stores checkpoints
 3) Segment usage file (sufile)    -- Stores allocation state of segments
 4) Data address translation file  -- Maps virtual block numbers to usual
    (DAT)                             block numbers.  This file serves to
                                      make on-disk blocks relocatable.
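
To see why an address translation layer makes blocks relocatable,
consider the following toy model in Python.  It is a conceptual sketch
only: the class ``ToyDAT`` and its methods are invented for this
example and do not mirror the on-disk DAT format.  The point is that a
pointer stored elsewhere (for instance in a B-tree node) holds a
virtual block number, so moving the block only requires updating one
translation entry.

```python
class ToyDAT:
    """In-memory stand-in for a virtual-to-disk block number map."""

    def __init__(self):
        self._map = {}          # virtual block number -> disk block number
        self._next_vblocknr = 1

    def allocate(self, blocknr):
        """Assign a fresh virtual block number to a disk block."""
        vblocknr = self._next_vblocknr
        self._next_vblocknr += 1
        self._map[vblocknr] = blocknr
        return vblocknr

    def translate(self, vblocknr):
        """Resolve a virtual block number to its current disk block."""
        return self._map[vblocknr]

    def relocate(self, vblocknr, new_blocknr):
        """Record that the block now lives at a different disk location."""
        if vblocknr not in self._map:
            raise KeyError(vblocknr)
        self._map[vblocknr] = new_blocknr


dat = ToyDAT()
pointer_in_btree = dat.allocate(blocknr=5000)   # the file refers to this value

# Something like the cleaner copies the block into a new segment.
dat.relocate(pointer_in_btree, new_blocknr=9000)

# The stored pointer is unchanged, yet it now resolves to the new location.
print(pointer_in_btree, dat.translate(pointer_in_btree))
```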

The following figure shows a typical organization of the logs::

  _________________________________________________________________________
 | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR|
 |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|


To stride over segment boundaries, this sequence of files may be split
into multiple logs.  The sequence of logs that should be treated as
logically one log is delimited with flags marked in the segment
summary.  The recovery code of nilfs2 looks at this boundary
information to ensure atomicity of updates.
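
The idea of delimiting a logical log with begin and end flags can be
sketched as follows.  This is a simplified model, not the kernel's
recovery code: the flag names ``LOG_BEGIN`` and ``LOG_END`` and their
values are placeholders chosen for this example (see nilfs2_ondisk.h
for the real segment summary flags), and the actual recovery code
performs further checks that are not modeled here.  The sketch accepts
only sequences that have both a beginning and an end, so a sequence cut
short by a crash is ignored as a whole.

```python
# Placeholder flag bits for this sketch only.
LOG_BEGIN = 0x1   # this log starts a logical log
LOG_END = 0x2     # this log finishes a logical log


def complete_logical_logs(log_flags):
    """Group log indices into logical logs, dropping incomplete ones.

    log_flags is a list of flag words, one per log, in write order.
    """
    complete = []
    current = None
    for index, flags in enumerate(log_flags):
        if flags & LOG_BEGIN:
            current = [index]        # any unfinished sequence is discarded
        elif current is not None:
            current.append(index)
        else:
            continue                 # stray log without a beginning
        if flags & LOG_END:
            complete.append(current)
            current = None
    return complete


# A one-log update, a three-log update, and an update interrupted by a crash.
flags = [LOG_BEGIN | LOG_END, LOG_BEGIN, 0, LOG_END, LOG_BEGIN]
print(complete_logical_logs(flags))
```

In this model the last log is never applied because its sequence has no
end marker, which is the all-or-nothing behaviour the text describes.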

The super root block is inserted for every checkpoint.  It includes
three special inodes: the inodes for the DAT, cpfile, and sufile.
Inodes of regular files, directories, symlinks and other special files
are included in the ifile.  The inode of the ifile itself is included
in the corresponding checkpoint entry in the cpfile.  Thus, the
hierarchy among NILFS2 files can be depicted as follows::

  Super block (SB)
       |
       v
  Super root block (the latest cno=xx)
       |-- DAT
       |-- sufile
       `-- cpfile
              |-- ifile (cno=c1)
              |-- ifile (cno=c2) ---- file (ino=i1)
              :        :          |-- file (ino=i2)
              `-- ifile (cno=xx)  |-- file (ino=i3)
                                  :        :
                                  `-- file (ino=yy)
                                    ( regular file, directory, or symlink )
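
The hierarchy above amounts to a two-step lookup: pick a checkpoint in
the cpfile, then pick an inode in that checkpoint's ifile.  The nested
dictionaries below are a toy stand-in for those files, with invented
checkpoint numbers, inode numbers, and contents; they only illustrate
how each checkpoint carries its own view of the same inode number.

```python
# Toy super root: cpfile maps cno -> checkpoint entry, whose ifile maps
# ino -> file.  All numbers and strings here are made up for illustration.
super_root = {
    "DAT": {},
    "sufile": {},
    "cpfile": {
        1: {"ifile": {11: "notes.txt (first version)"}},
        2: {"ifile": {11: "notes.txt (second version)",
                      12: "todo.txt"}},
    },
}


def lookup(root, cno, ino):
    """Resolve a file through cpfile -> ifile for one checkpoint."""
    checkpoint = root["cpfile"][cno]    # checkpoint entry holds the ifile
    return checkpoint["ifile"][ino]     # ifile holds the file's inode


def latest_cno(root):
    """Return the most recent checkpoint number in the toy cpfile."""
    return max(root["cpfile"])


# The same inode number resolves differently depending on the checkpoint.
print(lookup(super_root, 1, 11))
print(lookup(super_root, latest_cno(super_root), 11))
```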

For details on the format of each file, please see nilfs2_ondisk.h
located in the include/uapi/linux directory.

There are no patents or other intellectual property that we protect
with regard to the design of NILFS2.  It is allowed to replicate the
design in hopes that other operating systems could share (mount, read,
write, etc.) data stored in this format.