===============================
Documentation for /proc/sys/fs/
===============================

kernel version 2.2.10

Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>

Copyright (c) 2009, Shen Feng <shen@cn.fujitsu.com>

For general info and legal blurb, please look in intro.rst.

------------------------------------------------------------------------------

This file contains documentation for the sysctl files in
/proc/sys/fs/ and is valid for Linux kernel version 2.2.

The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
kernel. Since some of the files *can* be used to screw up your
system, it is advisable to read both documentation and source
before actually making adjustments.

1. /proc/sys/fs
===============

Currently, these files are in /proc/sys/fs:

- aio-max-nr
- aio-nr
- dentry-state
- dquot-max
- dquot-nr
- file-max
- file-nr
- inode-max
- inode-nr
- inode-state
- mount-max
- nr_open
- overflowuid
- overflowgid
- pipe-user-pages-hard
- pipe-user-pages-soft
- protected_fifos
- protected_hardlinks
- protected_regular
- protected_symlinks
- suid_dumpable
- super-max
- super-nr


aio-nr & aio-max-nr
-------------------

aio-nr is the running total of the number of events specified on the
io_setup system call for all currently active aio contexts. If aio-nr
reaches aio-max-nr, then io_setup will fail with EAGAIN. Note that
raising aio-max-nr does not result in the pre-allocation or re-sizing
of any kernel data structures.
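The limit check above can be modelled in a few lines of C. This is a
simplified sketch, not kernel code: aio_setup_would_fail() is a hypothetical
helper, its inputs are assumed to be the values read from /proc/sys/fs/aio-nr
and /proc/sys/fs/aio-max-nr, and it assumes the requested event count is
charged unchanged, whereas the kernel may round the request up internally::

    #include <stdbool.h>

    /*
     * Simplified model of the aio-nr/aio-max-nr check: io_setup() fails
     * with EAGAIN when charging nr_events to the system-wide aio-nr total
     * would exceed aio-max-nr. Assumption: nr_events is charged as-is.
     */
    bool aio_setup_would_fail(unsigned long aio_nr, unsigned long aio_max_nr,
                              unsigned long nr_events)
    {
        /* Treat unsigned wrap-around as exceeding the limit. */
        if (aio_nr + nr_events < aio_nr)
            return true;
        return aio_nr + nr_events > aio_max_nr;
    }

With aio-nr at 65000 and aio-max-nr at 65536, a request for 536 events still
fits, while a request for 537 would fail; once aio-nr equals aio-max-nr, any
further request fails.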


dentry-state
------------

From linux/include/linux/dcache.h::

    struct dentry_stat_t {
        int nr_dentry;
        int nr_unused;
        int age_limit;   /* age in seconds */
        int want_pages;  /* pages requested by system */
        int nr_negative; /* # of unused negative dentries */
        int dummy;       /* Reserved for future use */
    };

Dentries are dynamically allocated and deallocated.

nr_dentry shows the total number of dentries allocated (active + unused).
nr_unused shows the number of dentries that are not actively used, but
are saved in the LRU list for future reuse.

age_limit is the age in seconds after which dcache entries
can be reclaimed when memory is short. want_pages is
nonzero when shrink_dcache_pages() has been called and the
dcache isn't pruned yet.

nr_negative shows the number of unused dentries that are also
negative dentries, which do not map to any files. Instead,
they help speed up the rejection of lookups for non-existent
files requested by users.


dquot-max & dquot-nr
--------------------

The file dquot-max shows the maximum number of cached disk
quota entries.

The file dquot-nr shows the number of allocated disk quota
entries and the number of free disk quota entries.

If the number of free cached disk quotas is very low and
you have a very large number of simultaneous system users,
you might want to raise the limit.


file-max & file-nr
------------------

The value in file-max denotes the maximum number of file
handles that the Linux kernel will allocate. When you get lots
of error messages about running out of file handles, you might
want to increase this limit.

Historically, the kernel was able to allocate file handles
dynamically, but not to free them again. The three values in
file-nr denote the number of allocated file handles, the number
of allocated but unused file handles, and the maximum number of
file handles. Linux 2.6 always reports 0 as the number of free
file handles. This is not an error; it just means that the
number of allocated file handles exactly matches the number of
used file handles.

Attempts to allocate more file descriptors than file-max are
reported with printk; look for "VFS: file-max limit <number>
reached".


nr_open
-------

This denotes the maximum number of file handles a process can
allocate. The default value is 1024*1024 (1048576), which should be
enough for most machines. The actual limit also depends on the
RLIMIT_NOFILE resource limit.


inode-max, inode-nr & inode-state
---------------------------------

As with file handles, the kernel allocates the inode structures
dynamically, but can't free them yet.

The value in inode-max denotes the maximum number of inode
handles. This value should be 3-4 times larger than the value
in file-max, since stdin, stdout and network sockets also
need an inode struct to handle them. When you regularly run
out of inodes, you need to increase this value.

The file inode-nr contains the first two items from
inode-state, so we'll skip to that file...

inode-state contains three actual numbers and four dummies.
The actual numbers are, in order of appearance, nr_inodes,
nr_free_inodes and preshrink.

nr_inodes stands for the number of inodes the system has
allocated; this can be slightly more than inode-max because
Linux allocates them one pageful at a time.

nr_free_inodes represents the number of free inodes (?) and
preshrink is nonzero when nr_inodes > inode-max and the
system needs to prune the inode list instead of allocating
more.
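Since inode-state is a single line of whitespace-separated integers, reading
it amounts to a scanf-style parse. The sketch below assumes exactly the layout
described above (three meaningful numbers followed by four dummies);
struct inode_state and parse_inode_state() are hypothetical names used only
for illustration::

    #include <stdio.h>

    /* Hypothetical holder for the three meaningful inode-state fields. */
    struct inode_state {
        long nr_inodes;      /* inodes allocated by the system */
        long nr_free_inodes; /* free inodes */
        long preshrink;      /* nonzero when the inode list must be pruned */
    };

    /*
     * Parse the text of /proc/sys/fs/inode-state: three actual numbers
     * followed by four dummies. Returns 0 on success, or -1 if fewer than
     * seven integers could be read.
     */
    int parse_inode_state(const char *text, struct inode_state *out)
    {
        long dummy[4]; /* reserved fields, read but ignored */
        int n = sscanf(text, "%ld %ld %ld %ld %ld %ld %ld",
                       &out->nr_inodes, &out->nr_free_inodes, &out->preshrink,
                       &dummy[0], &dummy[1], &dummy[2], &dummy[3]);

        return n == 7 ? 0 : -1;
    }

Because inode-nr holds the first two of these items, the same approach with a
two-field format string would read that file as well.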


overflowgid & overflowuid
-------------------------

Some filesystems only support 16-bit UIDs and GIDs, although in Linux
UIDs and GIDs are 32 bits. When one of these filesystems is mounted
with writes enabled, any UID or GID that would exceed 65535 is translated
to a fixed value before being written to disk.

These sysctls allow you to change the value of the fixed UID and GID.
The default is 65534.


pipe-user-pages-hard
--------------------

Maximum total number of pages a non-privileged user may allocate for pipes.
Once this limit is reached, no new pipes may be allocated until usage goes
below the limit again. When set to 0, no limit is applied, which is the default
setting.


pipe-user-pages-soft
--------------------

Maximum total number of pages a non-privileged user may allocate for pipes
before the pipe size gets limited to a single page. Once this limit is reached,
new pipes will be limited to a single page in size for this user in order to
limit total memory usage, and trying to increase them using fcntl() will be
denied until usage goes below the limit again. The default value allows
allocating up to 1024 pipes at their default size. When set to 0, no limit is
applied.
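The interaction of the two pipe limits for a non-privileged user can be
summarised as a small decision function. This is a simplified model of the
behaviour described above, not the kernel's actual accounting:
pipe_alloc_outcome() and its enum are hypothetical, user_pages stands for the
pages the user already holds in pipes, and a limit of 0 means no limit::

    /* Hypothetical outcome of allocating a new pipe, per the text above. */
    enum pipe_alloc_result {
        PIPE_ALLOC_DEFAULT_SIZE, /* below both limits */
        PIPE_ALLOC_ONE_PAGE,     /* soft limit reached: pipe shrunk to one page */
        PIPE_ALLOC_DENIED        /* hard limit reached: no new pipe */
    };

    /*
     * Simplified model for a non-privileged user. A limit of 0 disables
     * that check. Assumption: only current usage is compared, which may
     * differ in detail from the kernel's own accounting.
     */
    enum pipe_alloc_result pipe_alloc_outcome(unsigned long user_pages,
                                              unsigned long soft_limit,
                                              unsigned long hard_limit)
    {
        if (hard_limit != 0 && user_pages >= hard_limit)
            return PIPE_ALLOC_DENIED;
        if (soft_limit != 0 && user_pages >= soft_limit)
            return PIPE_ALLOC_ONE_PAGE;
        return PIPE_ALLOC_DEFAULT_SIZE;
    }

As a worked example, assuming a default-size pipe occupies 16 pages, a soft
limit of 16384 pages is reached after 16384 / 16 = 1024 default-size pipes,
which matches the figure quoted for pipe-user-pages-soft.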


protected_fifos
---------------

The intent of this protection is to avoid unintentional writes to
an attacker-controlled FIFO, where a program expected to create a regular
file.

When set to "0", writing to FIFOs is unrestricted.

When set to "1", don't allow O_CREAT open on FIFOs that we don't own
in world-writable sticky directories, unless they are owned by the
owner of the directory.

When set to "2", it also applies to group-writable sticky directories.

This protection is based on the restrictions in Openwall.


protected_hardlinks
--------------------

A long-standing class of security issues is the hardlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given hardlink (i.e. a
root process follows a hardlink created by another user). Additionally,
on systems without separated partitions, this protection stops unauthorized
users from "pinning" vulnerable setuid/setgid files against being upgraded
by the administrator, or linking to special files.

When set to "0", hardlink creation behavior is unrestricted.

When set to "1", hardlinks cannot be created by users unless they
already own the source file or have read/write access to it.

This protection is based on the restrictions in Openwall and grsecurity.


protected_regular
-----------------

This protection is similar to protected_fifos, but it
avoids writes to an attacker-controlled regular file, where a program
expected to create one.

When set to "0", writing to regular files is unrestricted.

When set to "1", don't allow O_CREAT open on regular files that we
don't own in world-writable sticky directories, unless they are
owned by the owner of the directory.

When set to "2", it also applies to group-writable sticky directories.


protected_symlinks
------------------

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user).
For a likely incomplete list of hundreds of examples across the years,
please see: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

When set to "0", symlink following behavior is unrestricted.

When set to "1", symlinks are permitted to be followed only when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

This protection is based on the restrictions in Openwall and grsecurity.


suid_dumpable
-------------

This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are:

= ========== ===============================================================
0 (default)  traditional behaviour. Any process which has changed
             privilege levels or is execute only will not be dumped.
1 (debug)    all processes dump core when possible. The core dump is
             owned by the current user and no security is applied. This is
             intended for system debugging situations only.
             Ptrace is unchecked.
             This is insecure as it allows regular users to examine the
             memory contents of privileged processes.
2 (suidsafe) any binary which normally would not be dumped is dumped
             anyway, but only if the "core_pattern" kernel sysctl is set to
             either a pipe handler or a fully qualified path. (For more
             details on this limitation, see CVE-2006-2451.) This mode is
             appropriate when administrators are attempting to debug
             problems in a normal environment, and either have a core dump
             pipe handler that knows to treat privileged core dumps with
             care, or a specific directory defined for catching core dumps.
             If a core dump happens without a pipe handler or fully
             qualified path, a message will be emitted to syslog warning
             about the lack of a correct setting.
= ========== ===============================================================


super-max & super-nr
--------------------

These numbers control the maximum number of superblocks, and
thus the maximum number of mounted filesystems the kernel
can have. You only need to increase super-max if you need to
mount more filesystems than the current value in super-max
allows you to.


aio-nr & aio-max-nr
-------------------

aio-nr shows the current system-wide number of asynchronous I/O
requests. aio-max-nr allows you to change the maximum value
aio-nr can grow to.


mount-max
---------

This denotes the maximum number of mounts that may exist
in a mount namespace.



2. /proc/sys/fs/binfmt_misc
===========================

Documentation for the files in /proc/sys/fs/binfmt_misc is
in Documentation/admin-guide/binfmt-misc.rst.


3. /proc/sys/fs/mqueue - POSIX message queues filesystem
========================================================


The "mqueue" filesystem provides the necessary kernel features to enable the
creation of a user space library that implements the POSIX message queues
API (as noted by the MSG tag in the POSIX 1003.1-2001 version of the System
Interfaces specification).

The "mqueue" filesystem contains values for determining/setting the amount of
resources used by the file system.

/proc/sys/fs/mqueue/queues_max is a read/write file for setting/getting the
maximum number of message queues allowed on the system.

/proc/sys/fs/mqueue/msg_max is a read/write file for setting/getting the
maximum number of messages in a queue. It is in fact the upper bound for
another (user) limit, which is set in the mq_open invocation: that attribute
of a queue must be less than or equal to msg_max.

/proc/sys/fs/mqueue/msgsize_max is a read/write file for setting/getting the
maximum message size (an attribute of every message queue, set during its
creation).
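The msg_max and msgsize_max limits act on the attributes an unprivileged
process passes to mq_open with O_CREAT. The sketch below expresses that
check; mq_attr_within_limits() is a hypothetical helper, the two limits are
assumed to have been read from /proc/sys/fs/mqueue/ by the caller, and
processes with CAP_SYS_RESOURCE are not subject to these upper bounds::

    #include <mqueue.h>
    #include <stdbool.h>

    /*
     * Would these mq_open() attributes be accepted for an unprivileged
     * process? Both fields must be positive, mq_maxmsg must not exceed
     * msg_max and mq_msgsize must not exceed msgsize_max.
     */
    bool mq_attr_within_limits(const struct mq_attr *attr,
                               long msg_max, long msgsize_max)
    {
        if (attr->mq_maxmsg <= 0 || attr->mq_msgsize <= 0)
            return false;
        return attr->mq_maxmsg <= msg_max && attr->mq_msgsize <= msgsize_max;
    }

Only the struct mq_attr type is used here, so the check can be evaluated
before calling mq_open itself.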

/proc/sys/fs/mqueue/msg_default is a read/write file for setting/getting the
default number of messages in a queue if the attr parameter of mq_open(2) is
NULL. If it exceeds msg_max, the default value is initialized to msg_max.

/proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
the default message size if the attr parameter of mq_open(2) is NULL. If it
exceeds msgsize_max, the default value is initialized to msgsize_max.

4. /proc/sys/fs/epoll - Configuration options for the epoll interface
=====================================================================

This directory contains configuration options for the epoll(7) interface.

max_user_watches
----------------

Every epoll file descriptor can store a number of files to be monitored
for event readiness. Each one of these monitored files constitutes a "watch".
This configuration option sets the maximum number of "watches" that are
allowed for each user.
Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
on a 64-bit one.
The current default value for max_user_watches is 1/32 of the available
low memory, divided by the "watch" cost in bytes.
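The default described above is plain arithmetic. The sketch below restates
it; default_max_user_watches() is a hypothetical helper, and the low-memory
size and per-watch cost are inputs supplied by the caller rather than values
read from the kernel::

    /*
     * Default max_user_watches: 1/32 of available low memory, divided by
     * the per-watch cost in bytes. Returns 0 if watch_cost is 0.
     */
    unsigned long long default_max_user_watches(unsigned long long lowmem_bytes,
                                                unsigned long long watch_cost)
    {
        if (watch_cost == 0)
            return 0;
        return (lowmem_bytes / 32) / watch_cost;
    }

For instance, with 4 GiB of low memory on a 64-bit kernel,
4294967296 / 32 = 134217728 bytes are set aside, and dividing by 160 bytes per
watch gives 838860 watches.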