.. SPDX-License-Identifier: GPL-2.0

===============================================================
Inotify - A Powerful yet Simple File Change Notification System
===============================================================

Document started 15 Mar 2005 by Robert Love <rml@novell.com>

Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>

   - Deleted the obsolete interface; refer to the manpages for the user interface.

(i) Rationale

Q:
   What is the design decision behind not tying the watch to the open fd of
   the watched object?

A:
   Watches are associated with an open inotify device, not an open file.
   This solves the primary problem with dnotify: keeping the file open pins
   the file and thus, worse, pins the mount. Dnotify is therefore infeasible
   for use on a desktop system with removable media, as the media cannot be
   unmounted. Watching a file should not require that it be open.

Q:
   What is the design decision behind using an fd-per-instance as opposed to
   an fd-per-watch?

A:
   An fd-per-watch quickly consumes more file descriptors than are allowed,
   more fd's than are feasible to manage, and more fd's than are optimally
   select()-able.
   Yes, root can bump the per-process fd limit, and yes, users
   can use epoll, but requiring both is silly and extraneous.
   A watch consumes less memory than an open file, so separating the two
   number spaces is sensible. The current design is what user-space
   developers want: users initialize inotify once and add n watches,
   requiring but one fd and no twiddling with fd limits. Initializing an
   inotify instance two thousand times is silly. If we can implement
   user-space's preferences cleanly (and we can: the idr layer makes stuff
   like this trivial), then we should.

   There are other good arguments. With a single fd, there is a single
   item to block on, which is mapped to a single queue of events. The single
   fd returns all watch events and also any potential out-of-band data. If
   every fd were a separate watch:

   - There would be no way to get event ordering. Events on file foo and
     file bar would pop poll() on both fd's, but there would be no way to tell
     which happened first. A single queue trivially gives you ordering. Such
     ordering is crucial to existing applications such as Beagle. Imagine
     "mv a b ; mv b a" events without ordering.

   - We'd have to maintain n fd's and n internal queues with state,
     versus just one. It is a lot messier in the kernel. A single, linear
     queue is the data structure that makes sense.

   - User-space developers prefer the current API. The Beagle guys, for
     example, love it. Trust me, I asked.
     It is not a surprise: who would want to manage and block on
     1000 fd's via select?

   - There would be no way to get out-of-band data.

   - 1024 is still too low. ;-)

   When you talk about designing a file change notification system that
   scales to 1000s of directories, juggling 1000s of fd's just does not seem
   like the right interface. It is too heavy.

   Additionally, it _is_ possible to create more than one instance and
   juggle more than one queue and thus more than one associated fd. There
   need not be a one-fd-per-process mapping; it is one-fd-per-queue, and a
   process can easily want more than one queue.

Q:
   Why the system call approach?

A:
   The poor user-space interface is the second biggest problem with dnotify.
   Signals are a terrible, terrible interface for file notification. Or for
   anything, for that matter. The ideal solution, from all perspectives, is a
   file descriptor-based one that allows basic file I/O and poll/select.
   Obtaining the fd and managing the watches could have been done either via a
   device file or a family of new system calls. We decided to implement a
   family of system calls because that is the preferred approach for new kernel
   interfaces. The only real difference was whether we wanted to use open(2)
   and ioctl(2) or a couple of new system calls. System calls beat ioctls.
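The single-fd, single-queue design argued for above can be illustrated with a
short C sketch. It is not part of the original rationale and is not the
authoritative interface description (see the inotify(7) manpage for that); it
assumes Linux with glibc's <sys/inotify.h>, and the helper name
ordered_creates() is hypothetical. One inotify instance holds watches on two
directories, a file is created in each, and both IN_CREATE events are read
back from the one fd in the order they were queued.

```c
#define _GNU_SOURCE            /* for mkdtemp() in the usage example */
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: watch two directories
 * through ONE inotify instance, create a file in `first` and then in
 * `second`, and record which watch each IN_CREATE event came from, in
 * the order the events were queued (1 = first, 2 = second, 0 = other).
 * Returns the number of events recorded (at most max), or -1 on error.
 */
static int ordered_creates(const char *first, const char *second,
                           int *order, int max)
{
    assert(order != NULL && max > 0);

    /* One instance: one fd and one event queue for every watch. */
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;

    int wd_first = inotify_add_watch(fd, first, IN_CREATE);
    int wd_second = inotify_add_watch(fd, second, IN_CREATE);
    if (wd_first < 0 || wd_second < 0) {
        close(fd);
        return -1;
    }

    /* Generate one IN_CREATE in each directory: first, then second. */
    const char *dirs[2] = { first, second };
    for (int i = 0; i < 2; i++) {
        char path[PATH_MAX];
        snprintf(path, sizeof path, "%s/f%d", dirs[i], i);
        int f = open(path, O_CREAT | O_WRONLY, 0600);
        if (f < 0) {
            close(fd);
            return -1;
        }
        close(f);
    }

    /* Aligned buffer for struct inotify_event, as in inotify(7). */
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = 0;

    /* Block on the single fd; events from both watches arrive on it. */
    while (n < max && poll(&pfd, 1, 1000) > 0) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len <= 0)
            break;
        for (char *p = buf; p < buf + len && n < max;) {
            const struct inotify_event *ev =
                (const struct inotify_event *)p;
            order[n++] = ev->wd == wd_first  ? 1
                       : ev->wd == wd_second ? 2
                                             : 0;
            p += sizeof(struct inotify_event) + ev->len;
        }
    }

    close(fd);   /* closing the instance also removes its watches */
    return n;
}
```

A caller might create two directories with mkdtemp(3) and pass them in with
max set to 2; the order array is then expected to be {1, 2}, because the
create in the first directory is issued before the one in the second and both
events land, in sequence, on the same queue. With one fd per watch, the caller
would instead see two readable descriptors and no way to tell which event
happened first.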