.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
by a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected by a
reference count (->f_count).

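The sharing rule described above (every task sharing the table holds a
reference, and whoever drops the last one frees it) can be modelled in
user space. The following is a minimal sketch under stated assumptions,
not kernel code: the demo_* names are hypothetical, C11 <stdatomic.h>
replaces the kernel's atomic_t, and struct demo_files keeps only the
share count, with a placeholder array standing in for the real table.
The kernel's put_files_struct() performs the same "was I the last
user?" check with atomic_dec_and_test()::

	#include <assert.h>
	#include <stdatomic.h>
	#include <stdlib.h>

	/* Hypothetical stand-in for files_struct; only the share count is real. */
	struct demo_files {
		atomic_int count;	/* number of tasks sharing this table */
		int *fds;		/* placeholder for the actual table contents */
	};

	/* The creating task holds the first reference. */
	static struct demo_files *demo_files_alloc(void)
	{
		struct demo_files *files = malloc(sizeof(*files));

		if (!files)
			return NULL;
		files->fds = calloc(16, sizeof(*files->fds));
		if (!files->fds) {
			free(files);
			return NULL;
		}
		atomic_init(&files->count, 1);
		return files;
	}

	/* Models a CLONE_FILES child starting to share the table. */
	static void demo_get_files(struct demo_files *files)
	{
		atomic_fetch_add(&files->count, 1);
	}

	/* Drops one reference; returns 1 if this call freed the table. */
	static int demo_put_files(struct demo_files *files)
	{
		/* atomic_fetch_sub() returns the old value: 1 means we were last. */
		if (atomic_fetch_sub(&files->count, 1) == 1) {
			free(files->fds);
			free(files);
			return 1;
		}
		return 0;
	}

Because atomic_fetch_sub() returns the value from before the decrement,
exactly one caller observes 1, and only that caller frees the table.
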
In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for updates to appear atomic to a lock-free
reader, all the elements of the file descriptor table are
kept in a separate structure, struct fdtable. files_struct
contains a pointer to struct fdtable through which the
actual fd table is accessed. Initially the fdtable is
embedded in files_struct itself. On a subsequent expansion
of the fdtable, a new fdtable structure is allocated and
files->fdt is made to point to the new structure. The old
fdtable structure is freed with RCU, so lock-free readers
see either the old fdtable or the new one, which makes the
update appear atomic. Here are the locking rules for the
fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	if (n < fdt->max_fds)
		....
	...
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care
   of the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the fcheck() or the fcheck_files() API.
   These take care of the barrier requirements of lock-free lookup.

   An example::

	struct file *file;

	rcu_read_lock();
	file = fcheck(fd);
	if (file) {
		...
	}
	....
	rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that a look-up races with the last put() operation on the
   file structure. This is avoided by using atomic_long_inc_not_zero()
   on ->f_count::

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		if (atomic_long_inc_not_zero(&file->f_count))
			*fput_needed = 1;
		else
			/* Didn't get the reference, someone's freed */
			file = NULL;
	}
	rcu_read_unlock();
	....
	return file;

   atomic_long_inc_not_zero() increments the refcount only if it is
   not zero, so it detects a refcount that is already zero or drops
   to zero concurrently. In that case, fget()/fget_light() fails.

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and fcheck()/fcheck_files(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread can expand the fd table, thereby creating
   a new fdtable and making the earlier fdtable pointer stale.

   For example::

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
		spin_unlock(&files->file_lock);
		.....
	}

   Since locate_fd() can drop and reacquire ->file_lock,
   the fdtable pointer (fdt) must be loaded after locate_fd() returns.

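The rules above can be tied together in a small user-space model. This
is a sketch under explicit assumptions, not kernel code: all demo_*
names are hypothetical, C11 <stdatomic.h> supplies the atomics, an
acquire load stands in for rcu_dereference() (as used by
files_fdtable() and fcheck_files()), a release store stands in for
rcu_assign_pointer(), a spinning atomic_flag stands in for
files->file_lock, and a compare-and-swap loop stands in for
atomic_long_inc_not_zero(). RCU grace periods are not modelled: the
table replaced by an expansion is handed back to the caller, which must
not free it while lock-free readers may still be using it::

	#include <assert.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdlib.h>

	/* Hypothetical stand-ins for struct file, struct fdtable, files_struct. */
	struct demo_file {
		atomic_long f_count;			/* models file->f_count */
	};
	struct demo_fdtable {
		unsigned int max_fds;
		_Atomic(struct demo_file *) *fd;	/* slots published atomically */
	};
	struct demo_files {
		_Atomic(struct demo_fdtable *) fdt;	/* models files->fdt */
		atomic_flag file_lock;			/* models files->file_lock */
	};

	static struct demo_fdtable *demo_fdtable_alloc(unsigned int nr)
	{
		struct demo_fdtable *fdt = malloc(sizeof(*fdt));

		if (!fdt || !(fdt->fd = malloc(nr * sizeof(*fdt->fd)))) {
			free(fdt);
			return NULL;
		}
		for (unsigned int i = 0; i < nr; i++)
			atomic_init(&fdt->fd[i], NULL);
		fdt->max_fds = nr;
		return fdt;
	}

	/* files_fdtable() analogue: acquire load of the published table. */
	static struct demo_fdtable *demo_files_fdtable(struct demo_files *files)
	{
		return atomic_load_explicit(&files->fdt, memory_order_acquire);
	}

	/* fcheck_files() analogue: bounds check, then load the slot. */
	static struct demo_file *demo_fcheck_files(struct demo_files *files, unsigned int fd)
	{
		struct demo_fdtable *fdt = demo_files_fdtable(files);

		if (fd < fdt->max_fds)
			return atomic_load_explicit(&fdt->fd[fd], memory_order_acquire);
		return NULL;
	}

	/* fget() analogue: fcheck, then an atomic_long_inc_not_zero() CAS loop. */
	static struct demo_file *demo_fget(struct demo_files *files, unsigned int fd)
	{
		struct demo_file *file = demo_fcheck_files(files, fd);
		long old = file ? atomic_load(&file->f_count) : 0;

		while (old != 0)	/* a failed CAS reloads old; 0 is never revived */
			if (atomic_compare_exchange_weak(&file->f_count, &old, old + 1))
				return file;
		return NULL;		/* empty slot, or the last reference is gone */
	}

	/* Grow to nr slots; caller holds file_lock. Returns the old table. */
	static struct demo_fdtable *demo_expand(struct demo_files *files, unsigned int nr)
	{
		struct demo_fdtable *old = atomic_load_explicit(&files->fdt, memory_order_relaxed);
		struct demo_fdtable *new;

		/* Only ever grow; this also rejects an nr that wrapped to 0. */
		if (nr <= old->max_fds || !(new = demo_fdtable_alloc(nr)))
			return NULL;
		for (unsigned int i = 0; i < old->max_fds; i++)
			atomic_store_explicit(&new->fd[i],
				atomic_load_explicit(&old->fd[i], memory_order_relaxed),
				memory_order_relaxed);
		/* rcu_assign_pointer() analogue: publish only a fully built table. */
		atomic_store_explicit(&files->fdt, new, memory_order_release);
		return old;
	}

	/* Install file at fd under file_lock, expanding the table if needed. */
	static bool demo_install(struct demo_files *files, unsigned int fd,
				 struct demo_file *file, struct demo_fdtable **retired)
	{
		*retired = NULL;
		while (atomic_flag_test_and_set_explicit(&files->file_lock, memory_order_acquire))
			;	/* spin: stands in for spin_lock(&files->file_lock) */
		if (fd >= demo_files_fdtable(files)->max_fds)
			*retired = demo_expand(files, fd + 1);
		/* Rule 7: load fdt only after any expansion has happened. */
		struct demo_fdtable *fdt = demo_files_fdtable(files);
		bool ok = fd < fdt->max_fds;

		if (ok)
			atomic_store_explicit(&fdt->fd[fd], file, memory_order_release);
		atomic_flag_clear_explicit(&files->file_lock, memory_order_release);
		return ok;
	}

The acquire load is stronger than what rcu_dereference() strictly needs
(the kernel relies on address-dependency ordering), but it keeps the
sketch portable C11. demo_install() reloads the table pointer after a
possible expansion, mirroring the reload after locate_fd() in rule 7.
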